Mu Sigma Confidential
Chicago, IL
Bangalore, India
www.mu-sigma.com
Proprietary Information
"This document and its attachments are confidential. Any unauthorized copying, disclosure or distribution of the material is strictly forbidden"
Do The Math
HPC User Forum
2013
Trends Discussion
zubin.dowlaty@mu-sigma.com
Twitter: @crunchdata
The information explosion is leading to an unprecedented ability to store, manage and analyze data; we need to run ahead.
Information Ecosystem
Data Source Explosion
– Digital Data in 2011: ~1,750 Exabytes
– Facebook: 100+ million users per day
– Mobile Phone Subscriptions: 4.6 billion
– 15 Million Bing recommendation records
– RFID Tags: 40 billion
Information Cloud
– Click-stream Data, Purchase Intent Data, Social Media, Blogs, Email Archives, RFID & Geospatial Information, Events
Expanding Applications
– Opinion Mining, Social Graph Targeting, Influencer Marketing, Offer Optimization, Clinical Data Analysis, Fraud Detection, SKU Rationalization
Technology Evolution
– Massively Large Databases, Search Technologies, Complex Event Processing, Cloud Computing, Software as a Service, Interactive Visualization, In-Memory Analytics
Advanced Analytical Techniques
– CART, Adaptive Regression Splines, Machine Learning, Radial Basis Functions, Support Vector Machines, Neural Networks, Geospatial Predictive Modeling
Three Key Macro Analytics Trends Targeting Consumption
Macro Trends: Agenda in All Three
Analytics Trends
Computational Enablement
– Scalable Analytics, Big Data & NoSQL Technologies, GP-GPU, In-Memory & High-Performance Computing
– Master Data Management, Cloud, and Federated Data: the end of the central EDW?
– Analytics as a Service and the Cloud
– Open Source Maturity
– Rise of Agile Self-Service
– Data Model Disruption: ETL vs. ELT
Usability and Visualization
– Dashboards will Evolve into Cockpits
– Improved Interactive Data Visualization & Aesthetics
– Augmented / Virtual Reality / Natural User Interfaces
– Graph Analytics sparked by Social Data
– Discovery with Computational Geometry
– Mobile BI and the Integration of Consumer & Enterprise Devices
– Gamification: Do you want to play a game?
Intelligent Systems (“Anticipation Denotes Intelligence”)
– Operational Smart Systems are Back-Propagating: Pervasive Analytics
– Real-Time Analytics and Event Streams
– Artificial Intelligence (AI) is not what we expected
– Don’t forget the 4th tier, it’s Juicy
– Metaheuristics and the Ants
– Operational Analytics: You are the Exception
– Experiments in the Enterprise and the Quasi
– Just Say No to OLS
– Energy Informatics is Radiating
– Intelligent Search
Big Data’s recent enterprise popularity can be attributed to two key factors
Big Data Technology Evolution (timeline, 2003 to 2012)
– Phases: Early Research; Open Source Dev Momentum; Initial Success Stories; Commercialization & Adoption
– Google Trigger: releases “MapReduce” & “GFS” papers
– Lucene subproject
– Apache: Hadoop project
– Google: Bigtable paper
– Yahoo: 10K-core cluster
– Cloudera named most promising startup funded in 2009
– Elastic MapReduce
– EMC Acquires Greenplum
– IBM Acquires
– HP Acquires Vertica
– Teradata buys Aster (MapReduce DB)
– 2012: Oracle, MSOFT, SAS
• Technology Trigger
– Robust open source technology for parallel computation on commodity hardware (Hadoop / NoSQL)
• Frustration Trigger
– Tyranny of the data model; cost & complexity of data integration; agility; impact; bureaucracy; the difficulty and cost of scaling existing EDW technology
Faced with storing massive data sets from indexing the web, Google searches for, and finds, a way to store and retrieve the data on commodity hardware.
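The Google trigger rests on the MapReduce programming model. As a rough illustration only, the Python sketch below runs word count through the three phases in a single process: a map that emits (word, 1) pairs, a shuffle that groups pairs by key (the step a framework such as Hadoop performs across machines), and a reduce that sums each group. The function names are hypothetical, this is not Hadoop's API, and it assumes everything fits in memory.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def map_phase(docs: Iterable[str]) -> List[Tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in every document."""
    return [(word, 1) for doc in docs for word in doc.lower().split()]

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: combine each key's values; for word count, sum them."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(docs: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases end to end."""
    return reduce_phase(shuffle(map_phase(docs)))

print(word_count(["big data big ideas", "data beats opinions"]))
# {'big': 2, 'data': 2, 'ideas': 1, 'beats': 1, 'opinions': 1}
```

Because each mapper looks at one document and each reducer at one key, both phases can be spread across commodity machines; only the shuffle has to move data between them.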
A Mindset for Enabling Decision Sciences at Scale
Dataset -> Toolset -> Skillset -> Mindset
Big Data – What is it?
• Skillset
– MapReduce, HDFS, Hive, R, Pig, NoSQL, Unstructured Data
– Parallel Computation & Stream Processing
– Data Scientist and Data Explorer
– “…big data job postings on Dice more than tripled year-over-year” (Jan 2013)
• Mindset
– Automation and Scale
– Higher utilization of machines for decisions
– Agile
– Learning vs. Knowing
– Consumption
– Algorithmic and Heuristic (Man & Machine)
Trend: Dashboards will Evolve to be More Actionable and Forward-Looking
Dashboredom?
http://www.information-management.com/issues/20_7/dashboards_data_management_quality_business_intelligence-10019101-1.html
“As we move from the information age into the intelligence age”...
Usability and Visualization
Open Source: D3 (Data-Driven Documents) http://d3js.org/
1. Helps bring data to life using HTML5, SVG and CSS.
2. It is the successor to “Protovis”, a very popular JavaScript library.
3. With minimal overhead, D3 is extremely fast, supporting large datasets and dynamic behavior for interaction and animation.
4. It provides many built-in reusable functions and function factories, such as graphical primitives for area, line and pie charts.
Usability and Visualization
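The “function factories” in point 4 are easiest to see in code. D3 itself is JavaScript; the Python sketch below only mirrors the pattern behind D3's line generator (d3.svg.line() in D3 3.x, d3.line() in later releases): a factory is configured once with x/y accessor functions and returns a reusable function that turns an array of data into an SVG path string. make_line, units_line and the sample data are hypothetical, not part of D3.

```python
from typing import Any, Callable, Sequence

Accessor = Callable[[Any], float]

def make_line(x: Accessor, y: Accessor) -> Callable[[Sequence[Any]], str]:
    """Factory: capture the accessors and return a reusable path generator."""
    def line(data: Sequence[Any]) -> str:
        # "M" moves to the first point; each "L" draws a straight segment to the next one.
        points = [f"{x(d):g},{y(d):g}" for d in data]
        return "M" + "L".join(points) if points else ""
    return line

# Configure once (10 px per month, y measured down from 100), then reuse on any dataset.
units_line = make_line(x=lambda d: d["month"] * 10, y=lambda d: 100 - d["units"])
path = units_line([{"month": 0, "units": 40}, {"month": 1, "units": 55}, {"month": 2, "units": 70}])
print(path)  # M0,60L10,45L20,30
```

The returned string is what would go into the d attribute of an SVG path element; the design point is that the mapping from data to pixels is decided once, in the factory, and the generated function stays data-agnostic.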
Open Source:
• R is the premier language for data scientists
– Strength in data visualization
– http://www.r-project.org/
Usability and Visualization
The current business scenario demands automating and injecting intelligence into various enterprise touch points
Business Situation
– Customer journey: Yearly Bonus -> Search (“Buy laptop”) -> Webpage -> Add to Cart -> Checkout
– Customer reactions along the way: “Let me buy a new laptop”; “Hey! I have some good deals being provided here”; “Wow! Great offers... Since I have some more money, let me have a look”; “Yippee! 5% discount... Yeah, I think I need these headphones too”
– Intelligence at the touch points: Taxonomy, Associations, Nearest Neighbor; Popup Optimization; Need State Personalization; Collaborative Filtering; Recommender Systems; Offer Optimization
Behind the Scenes
– Email, Marketing Mix, CRM, Revenue Management, Customer Segmentation, Propensity Score, Attribution Methods to Optimize Media Spend, Pricing Elasticity
Add Intelligent Analytics Everywhere You Can to Unlock the Value in Information
Intelligent Systems
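Collaborative filtering, one of the touch-point techniques above, can be sketched in a few lines. The example below is a minimal item-based variant under simplifying assumptions (binary purchase histories, cosine similarity between items, made-up baskets); it is not the specific method behind this slide, and recommend, item_similarities and the sample data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set

def item_similarities(baskets: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """Cosine similarity between items, representing each item by the set of users who bought it."""
    buyers: Dict[str, Set[str]] = defaultdict(set)
    for user, items in baskets.items():
        for item in items:
            buyers[item].add(user)
    sims: Dict[str, Dict[str, float]] = defaultdict(dict)
    for a in buyers:
        for b in buyers:
            overlap = len(buyers[a] & buyers[b])
            if a != b and overlap:
                sims[a][b] = overlap / math.sqrt(len(buyers[a]) * len(buyers[b]))
    return sims

def recommend(user: str, baskets: Dict[str, Set[str]], k: int = 2) -> List[str]:
    """Score items the user lacks by summed similarity to items they own; return the top k."""
    sims = item_similarities(baskets)
    owned = baskets[user]
    scores: Dict[str, float] = defaultdict(float)
    for item in owned:
        for other, s in sims.get(item, {}).items():
            if other not in owned:
                scores[other] += s
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]

baskets = {
    "ann": {"laptop", "headphones", "mouse"},
    "bob": {"laptop", "headphones"},
    "cat": {"laptop", "laptop bag"},
    "dan": {"laptop"},
}
print(recommend("dan", baskets))  # ['headphones', 'laptop bag']
```

A customer who has only a laptop in the cart is steered toward headphones because other laptop buyers frequently bought them too. Production recommenders would precompute the similarity table offline rather than rebuilding it on every call.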
Current News on Intelligent Systems
CEP vendors acquired in June 2013: Streambase (acquired by Tibco) and Progress Apama (acquired by Software AG)
USA Today, September 16, 2012: review of “Automate This”, which probes math in global trading
– http://www.usatoday.com/money/business/story/2012/09/16/review-automate-this-probes-math-in-global-trading/57782600/1
• The Two-Second Advantage (published 2011)
– An excellent portrayal of intelligent systems built on patterns from the human brain: mental models
– Wayne Gretzky’s edge: he predicted, a little better than anyone else, where the hockey puck would be.
– Enterprise 2.0: every transaction became a bit of digital data
– Enterprise 3.0: every EVENT can become a bit of digital data
– There will always be value in making projections days, months, or years out using predictive analytics, data mining, and analytics systems on historical data: batch systems
– But in today’s world, enterprises need something more: an instantaneous, proactive, predictive capability, i.e. real-time systems
• GE Minds and Machines: November 2012
» Industrial Internet, Intelligent Systems, and Intelligent Machines
» http://www.ge.com/docs/chapters/Industrial_Internet.pdf
» http://www.youtube.com/watch?v=SvI3Pmv-DhE
Intelligent Systems
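The contrast between batch systems and real-time systems can be illustrated with a small streaming detector: instead of re-scanning history, it updates running statistics as each event arrives and flags outliers at once. This is a minimal sketch assuming numeric events and a z-score threshold, using Welford's online mean/variance update; StreamingDetector, scan and their parameters are hypothetical and are not any CEP product's API.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Tuple

@dataclass
class StreamingDetector:
    """Flags events far from the running mean, updating its statistics in O(1) per event."""
    threshold: float = 3.0   # z-score above which an event is flagged
    warmup: int = 5          # events to observe before flagging anything
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0          # running sum of squared deviations (Welford)

    def update(self, value: float) -> bool:
        """Score the event against history so far, then fold it into the statistics."""
        flagged = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            flagged = std > 0 and abs(value - self.mean) / std > self.threshold
        # Welford's update of count, mean, and sum of squared deviations.
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)
        return flagged

def scan(events: Iterable[float], detector: StreamingDetector) -> List[Tuple[int, float]]:
    """Return (position, value) for every flagged event, touching each event once."""
    return [(i, v) for i, v in enumerate(events) if detector.update(v)]

orders_per_second = [10, 12, 11, 9, 10, 11, 10, 48, 12, 11]
print(scan(orders_per_second, StreamingDetector()))  # [(7, 48)]
```

One design choice worth noting: flagged events are still folded into the running statistics, so a sustained shift eventually becomes the new normal; a real deployment might exclude or down-weight outliers instead.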
Decision Support Stack for Operationalizing Analytics
Stack Diagram (Mu Sigma Future State Reference Analytics Design): Platforms, Assets/Products
– Decision Sciences: muESP™, muBuzz™, muFlow™, muMix™, muPDNA™, muRx™
– Data Sciences: muXo™, muHPC™, muText™
– Math + Business + Technology; Technology; Math + Technology
Mu Sigma’s I&D Big Data team has developed MapReduce implementations of frequently used algorithms in R
muHPC™ Packages: MapReduce implementations of frequently used statistical algorithms in R
– muEDA: built on RHIVE (a package for executing Hive queries through R); consists of functions to perform frequency analysis, univariate analysis and other EDA techniques
– muGLM: built on RMR (a package for writing MapReduce code in R); consists of functions to fit linear and generalized linear models
– muKMeans: built on RMR; consists of functions to perform data clustering using K-Means
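K-Means fits MapReduce naturally because each iteration splits into an embarrassingly parallel map (assign each point to its nearest centroid) and a reduce that only needs per-cluster coordinate sums and counts. The single-process Python sketch below shows that decomposition under stated assumptions (2-D points, given initial centroids, a fixed iteration count); it is not muKMeans or the RMR API, and kmeans_map, kmeans_reduce, kmeans and the sample splits are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Emitted = Tuple[int, Tuple[float, float, int]]

def kmeans_map(points: Sequence[Point], centroids: Sequence[Point]) -> List[Emitted]:
    """Mapper: emit (nearest-centroid index, (x, y, 1)) for each point in one data split."""
    out = []
    for x, y in points:
        nearest = min(range(len(centroids)),
                      key=lambda k: (x - centroids[k][0]) ** 2 + (y - centroids[k][1]) ** 2)
        out.append((nearest, (x, y, 1)))
    return out

def kmeans_reduce(pairs: Sequence[Emitted], centroids: Sequence[Point]) -> List[Point]:
    """Reducer: sum coordinates and counts per cluster, then average into new centroids."""
    sums: Dict[int, List[float]] = defaultdict(lambda: [0.0, 0.0, 0])
    for k, (x, y, c) in pairs:
        sums[k][0] += x
        sums[k][1] += y
        sums[k][2] += c
    # A cluster that received no points keeps its previous centroid.
    return [(sums[k][0] / sums[k][2], sums[k][1] / sums[k][2]) if k in sums else centroids[k]
            for k in range(len(centroids))]

def kmeans(splits: Sequence[Sequence[Point]], centroids: Sequence[Point], iterations: int = 10) -> List[Point]:
    """Driver: each iteration maps every split independently, then reduces all emitted pairs."""
    current = list(centroids)
    for _ in range(iterations):
        emitted = [pair for split in splits for pair in kmeans_map(split, current)]
        current = kmeans_reduce(emitted, current)
    return current

splits = [[(1.0, 1.0), (1.5, 2.0), (8.0, 8.0)], [(9.0, 8.5), (1.0, 0.5), (8.5, 9.5)]]
print(kmeans(splits, centroids=[(0.0, 0.0), (10.0, 10.0)]))
```

In a real Hadoop job, a combiner would typically pre-sum (x, y, count) within each split so that only one small record per cluster per mapper crosses the network; the driver loop would launch one MapReduce job per iteration.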
Thank You
