© Cloudera, Inc. All rights reserved.
Understanding Your Data Journey
Dave Shuman
Industry Leader: Retail, Manufacturing, & IoT
Larkin Kay
Industry Leader: Communication

Data is abundant and cheap.
Keep all data online as long as needed.

Computation is affordable.
Ask bigger questions as fast as you can.

By 2017:
▪ 60% of big data projects will fail to go beyond the pilot phase.
▪ 50% or fewer organizations will have made the cultural or business model adjustments to benefit from big data.
Source: Gartner, “Predicts 2015: Big Data Challenges Move From Technology to the Organization”, November 2014

Get the right people.
Ratify your process.
Adopt modern technology.

Assemble the right team
Tightly aligned mix of experts and innovators
Traditional BI and Analytics: Analytics, Data Management, Infrastructure
Big Data Analytics: Big Data Management; Data Scientists, Apps Developers, Analysts; Data Engineers; Architects

Big Data Functional Areas of Responsibility
Executive
Data Science
Data Engineering
Architecture
Development and Insights
Foundation and Strategy
Run and Support
Vision and Goals

Staff for Success
Description
A traditional BI and analytics organization consists of three main components.
Analytics
▪ Business-oriented teams that use data models and data analysis tools to develop reports and find insights, often using samples of data
Data Management
▪ Data modellers who take requests from business users, find data to satisfy those requests, develop models to answer the users' questions, and load those models into a data warehouse
Infrastructure
▪ Hardware and software specialists responsible for the network, storage, server, and software components needed to satisfy the analytics needs of the organization

Staff for Success
Description
In the Big Data world, the Data Engineering team becomes strategic, since data is your most important asset and can be transformed and used in many different ways.
Analytics, Data Management, Infrastructure
Big Data Management
Architects, Data Scientists, Data Engineers
It is critical that these three roles be tightly aligned.

Your Data Scientist Team(s)
The hybrid data scientist
Data Science: Math & Statistical Knowledge, Hacking skills, Subject Matter Expertise
Curiosity: important character trait
• Subject Matter Expertise lies in the business
• Hacking skills can come from existing IT staff or new hires
• Staff at least one true Ph.D. statistician for model oversight across all teams
Finding one or more data scientists who cross these disciplines is a luxury.

Staff for Success: Data Science-as-a-Service
Description
Often a centralized Data Science team can partner with the business to identify data that differentiates, explore use cases to solve, and help jumpstart business teams. Be mindful not to overbuild centrally.
Agility
▪ The team must be able to learn quickly and adapt
Skills
▪ Hybrid skills of computer science (hacking) and domain expertise, plus at least one true statistician. Data Science training.
Teams
▪ Businesses often find the domain expertise in-house, add MS/Ph.D. candidates from local universities, and hire that one true statistician
Experts
▪ This team must be the “data experts” for the entire company in order to fulfil the vision of sharing data for maximum innovation

Staff for Success: Training is essential
Description
Training is one of the biggest keys to success. The opportunity to be trained attracts talent and shapes careers. Leverage on-site training for team-building and skills development.
Data Scientist
▪ Data Science at Scale using Spark and Hadoop (3 days)
▪ CCP: Data Scientist (Certification)
Analyst
▪ Data Analyst Training (4 days)
Data Engineer, DevOps
▪ Developer Training for Spark and Hadoop (4 days)
▪ CCP Data Engineer (Certification)
Administrator
▪ Admin Training for Apache Hadoop (4 days)
▪ Cloudera Certified Administrator for Apache Hadoop (Certification)

Get the right people.
Ratify your process.
Adopt modern technology.

Adopt an agile approach
Successful projects start small and iterate to success
Cycle: Get Data, Explore and Analyze, Deploy
1. Get data you already have, or create new data.
2. Explore and analyze, quickly.
3. Deploy your application.
…and repeat. Add: more data, more users, more use cases, more complex analytics; go real-time!

The Data Journey: Iterate to Success
Add over time:
▪ Collect, Create: new data sources to your data asset
▪ Explore, Enrich, Analyze: more complex analysis
▪ Operationalize: more ways to serve your insights

Single Data Set Analysis
▪ Collect, Create: site clickstream; CRM transactions
▪ Explore, Enrich, Analyze: unique visitors, enter/exit URLs; service trends
▪ Operationalize: reports, email outreach; dashboards, page design
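As a minimal sketch of the clickstream analysis named on this slide, the Python below counts unique visitors and tallies enter and exit URLs. The event layout (`visitor_id`, `timestamp`, `url`), the function name `summarize_clickstream`, and the sample data are assumptions made for illustration; none of them come from the deck. For simplicity it treats all of a visitor's events as one session.

```python
from collections import Counter

def summarize_clickstream(events):
    """Return unique-visitor count plus entry and exit URL counts.

    `events` is an iterable of (visitor_id, timestamp, url) tuples;
    this layout is a hypothetical one chosen for the example.
    """
    first_hit = {}  # visitor_id -> (timestamp, url) of earliest event
    last_hit = {}   # visitor_id -> (timestamp, url) of latest event
    for visitor_id, timestamp, url in events:
        if visitor_id not in first_hit or timestamp < first_hit[visitor_id][0]:
            first_hit[visitor_id] = (timestamp, url)
        if visitor_id not in last_hit or timestamp >= last_hit[visitor_id][0]:
            last_hit[visitor_id] = (timestamp, url)
    return {
        "unique_visitors": len(first_hit),
        "entry_urls": Counter(url for _, url in first_hit.values()),
        "exit_urls": Counter(url for _, url in last_hit.values()),
    }

# Illustrative sample data, not taken from the presentation.
sample_events = [
    ("v1", 1, "/home"),
    ("v1", 2, "/products"),
    ("v1", 3, "/checkout"),
    ("v2", 1, "/home"),
    ("v2", 5, "/support"),
    ("v3", 4, "/products"),
]
summary = summarize_clickstream(sample_events)
print(summary["unique_visitors"], dict(summary["entry_urls"]))
```

A summary like this is the kind of result that could feed the dashboards and page-design decisions listed under Operationalize.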

Multi-Data Set Analysis
▪ Collect, Create: clickstream + CRM
▪ Explore, Enrich, Analyze: cross-dataset identity matching, path analysis of potential issues
▪ Operationalize: enriched profiles, issue identification
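One way to picture cross-dataset identity matching is a join between CRM records and clickstream events on a shared key. The sketch below assumes that both datasets carry an email address and uses a normalized email as that key. The field names (`customer_id`, `email`, `open_issue`, `timestamp`, `url`) and both function names are hypothetical, and real identity matching is usually fuzzier than an exact-key join.

```python
def normalize_email(email):
    """Lowercase and trim an email so it can serve as a join key."""
    return email.strip().lower()

def enrich_profiles(crm_records, click_events):
    """Attach each customer's ordered page path to a copy of their CRM record.

    Both inputs are lists of dicts with hypothetical field names:
    CRM rows carry `customer_id`, `email`, `open_issue`;
    click events carry `email`, `timestamp`, `url`.
    """
    paths = {}
    for event in sorted(click_events, key=lambda e: e["timestamp"]):
        key = normalize_email(event["email"])
        paths.setdefault(key, []).append(event["url"])

    enriched = []
    for record in crm_records:
        profile = dict(record)  # copy so the input is not mutated
        profile["path"] = paths.get(normalize_email(record["email"]), [])
        enriched.append(profile)
    return enriched

# Illustrative sample data, not taken from the presentation.
crm = [
    {"customer_id": 101, "email": "Ana@Example.com", "open_issue": True},
    {"customer_id": 102, "email": "bo@example.com", "open_issue": False},
]
clicks = [
    {"email": "ana@example.com ", "timestamp": 2, "url": "/support"},
    {"email": "ana@example.com", "timestamp": 1, "url": "/billing"},
    {"email": "cy@example.com", "timestamp": 1, "url": "/home"},
]
profiles = enrich_profiles(crm, clicks)
# Customers with an open issue and a recorded path are candidates for path analysis.
flagged = [p["customer_id"] for p in profiles if p["open_issue"] and p["path"]]
print(flagged)
```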

Predictive Modeling
▪ Collect, Create: enriched profile + inventory
▪ Explore, Enrich, Analyze: preference matching, predictive offer creation
▪ Operationalize: reports, page design, email outreach, recommendation engines
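The sketch below illustrates preference matching between an enriched profile and inventory. It is a simple rule-based stand-in, not a trained predictive model: it ranks in-stock items by how many categories they share with the customer's preferences. The function name `recommend_offers`, the field names, and the sample data are all assumptions made for this example.

```python
def recommend_offers(profile, inventory, top_n=2):
    """Rank in-stock items by category overlap with the profile's preferences.

    `profile` and `inventory` use hypothetical field names chosen for
    illustration: `preferred_categories`, `sku`, `categories`, `stock`.
    """
    preferred = set(profile["preferred_categories"])
    scored = []
    for item in inventory:
        if item["stock"] <= 0:
            continue  # never offer something that cannot be fulfilled
        score = len(preferred & set(item["categories"]))
        if score > 0:
            scored.append((score, item["sku"]))
    # Highest overlap first; ties broken alphabetically by SKU.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [sku for _, sku in scored[:top_n]]

# Illustrative sample data, not taken from the presentation.
profile = {"customer_id": 101, "preferred_categories": ["outdoor", "footwear"]}
inventory = [
    {"sku": "BOOT-1", "categories": ["outdoor", "footwear"], "stock": 4},
    {"sku": "TENT-2", "categories": ["outdoor"], "stock": 0},
    {"sku": "SOCK-3", "categories": ["footwear"], "stock": 9},
    {"sku": "LAMP-4", "categories": ["home"], "stock": 3},
]
offers = recommend_offers(profile, inventory)
print(offers)
```

In a later iteration the overlap score could be replaced by the output of a model trained on past purchases, which is where the recommendation engines on this slide would come in.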

“Companies that don’t continue to experiment, companies that don’t embrace failure, they eventually get in a desperate position where the only thing they can do is a Hail Mary bet at the very end of their corporate existence. Whereas companies that are making bets all along, even big bets, but not bet-the-company bets, prevail.”
Jeff Bezos, Business Insider interview, December 2014

Leverage Agile Methodology to Reduce Risk
Description
Agile methodology provides actionable results more rapidly and measures the value gained at each step, in small iterations. Agile should be applied to data and insights project workstreams.
Lower risk
▪ The risk of funding long-running projects with limited business value is small. Use daily results to improve the process or change course.
Lower costs
▪ Infrastructure, data, and insights workstreams can run in parallel, which avoids a large build-out of infrastructure and data before insights.
Communication
▪ Clear short-term results enable a continuous communications stream showcasing results or failures.
Team
▪ Start with a small team and add scrum teams as value is determined and investment becomes available.

Agile Approach: Quick, Iterative and Necessary
• Individuals and interactions (over processes and tools): self-organization and motivation are important, as are interactions like co-location (physical and virtual) and teaming people to create a data scientist (each person brings 2 out of 3 key skills: computer science, math, business knowledge).
• Working software (over comprehensive documentation): live software, data, schemas, etc. are more useful and welcome than documents in all meetings.
• Customer collaboration (over contract negotiation): requirements cannot be fully collected at the beginning of the use case development cycle, so continuous business user involvement is vital.
• Responding to change (over following a plan): agile methods are focused on quick responses to change and continuous development.

Building a Big Data Culture

The Important Role of the Executive Sponsor
Description
An essential key to success is having a strong Executive Sponsor for the overall Big Data mission, including advocacy for creating and collecting data, and Business Stakeholders for individual use cases.
Profile
▪ An executive focused on change and willing to take risks to ensure the success of the business via the Big Data initiatives.
Education
▪ Use every opportunity to bring the topic in front of potential sponsors and stakeholders. Share industry and business potential ROI models (heeding the warning not to overstate).
Advocacy
▪ Build big data success stories from within the business. Advocate for the use of data in new ways. Support the proactive collection of data and lead the charge to assign value to data.

Communications and Collaboration
Description
Use different vehicles and forms to enable collaboration
Meetups
▪ Bring together the larger big data community across the company to share interests, learnings, and wins.
▪ Team led
Big Data Days
▪ Transfer of information through executive-led thought leadership.
▪ Include experts from across the business units, vendors, and partners.
▪ Cross-domain focused.
Hackathons
▪ Allow developers to build new applications designed to boost business.

Get the right people.
Ratify your process.
Adopt modern technology.

How an EDH is Different than Traditional Approaches
Extreme performance and efficiency
1. Economically feasible to store more data
2. Powered to predictably process large data sets
3. Ability to build your data asset at linear scale
Analytic agility
1. Collect data in native format, which enables agility
2. Build history of activity by collecting data prior to its use
3. You can have near real-time access to data, plus a view of history
4. Security at the data layer increases flexibility and ability to protect privacy
5. Create community data and drive innovation by sharing across your business
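The agility points about collecting data in native format, and collecting it before its use is known, can be pictured as applying a schema at read time rather than at load time. The following is a minimal in-memory sketch under that assumption. `raw_store`, `collect`, and `read_with_schema` are hypothetical names, and a Python list stands in for files landed in the hub; nothing here is an EDH or Cloudera API.

```python
import json

raw_store = []  # stands in for raw files landed in the hub, kept untouched

def collect(raw_line):
    """Store a record exactly as it arrived; no schema is enforced here."""
    raw_store.append(raw_line)

def read_with_schema(fields):
    """Project each stored record onto the fields a given use case needs."""
    for line in raw_store:
        record = json.loads(line)
        yield {field: record.get(field) for field in fields}

# Illustrative sample data, not taken from the presentation.
collect('{"user": "v1", "url": "/home", "ts": 1}')
collect('{"user": "v2", "url": "/cart", "ts": 2, "device": "mobile"}')

# The first use case only cares about pages visited.
first_view = list(read_with_schema(["user", "url"]))
# A later use case asks about devices; history is already there to query.
later_view = list(read_with_schema(["user", "device"]))
print(first_view, later_view)
```

Because nothing was dropped at collection time, the later question can be answered without reloading data, at the cost of handling missing fields (here `None`) at read time.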

Start at Cloudera University
Training options: On-site training, OnDemand training, Classroom, Virtual Classroom
www.university.cloudera.com

Becoming Data-Driven Through Cultural Change
