1© Cloudera, Inc. All rights reserved.
Optimize your cloud strategy
for machine learning and analytics
Mike Olson
CSO and co-founder, Cloudera
James Curtis
Senior analyst, 451 Research
Optimizing Cloud Strategy for ML & Analytics
James Curtis, Senior Analyst, Data Platforms & Analytics
451 Research is a leading IT research & advisory company
Founded in 2000
300+ employees, including over 120 analysts
2,000+ clients: Technology & Service providers, corporate
advisory, finance, professional services, and IT decision makers
70,000+ IT professionals, business users and consumers in our research community
Over 52 million data points published each quarter and 4,500+ reports published
each year
3,000+ technology & service providers under coverage
451 Research and its sister company, Uptime Institute, are the two divisions of The
451 Group
Headquartered in New York City, with offices in London, Boston, San Francisco,
Washington DC, Mexico, Costa Rica, Brazil, Spain, UAE, Russia, Taiwan, Singapore
and Malaysia
Research & Data
Advisory
Events
Go 2 Market
“Every morning in Africa, an antelope wakes up. It knows it must
outrun the fastest lion, or it will be killed. Every morning in Africa,
a lion wakes up. It knows it must run faster than the antelope, or it
will starve....”
“…It doesn’t matter if you’re a lion or an antelope—when the sun
comes up, you’d better be running.”
Approximately what percent of your workloads are deployed
in the following environment today? In 2 years?
Source: 451 Research, Voice of the Enterprise:
Workloads and Key Projects, Cloud
Transformation, 2017.
Thinking of all applications your organization runs, what
percentage run in which environments? In 2 years?
Key Points
• Cloud deployments will be the dominant environment in every category
• Every cloud deployment environment will see increases in every workload
category
• Analytics and App Development areas are expected to see strong gains
Source: 451 Research, Voice of the Enterprise:
Workloads and Key Projects, Cloud
Transformation, 2017.
Drivers for Cloud Adoption
DATA GRAVITY
TRANSFORMATIONAL CHANGE
IT REJUVENATION
FLEXIBILITY
COST AVOIDANCE
Challenges for Cloud Adoption
COST
PEOPLE AND PROCESS CHANGE
PERFORMANCE
LIABILITY
SECURITY ISSUES (PERCEIVED AND REAL)
Big Data Cloud Segmentation
USER
CSP/VENDOR
Responsibility
Low
High
IaaS PaaS SaaS
Depending on the segment, the responsibilities borne by the user and
the CSP/vendor vary considerably.
Big Data on Infrastructure-as-a-Service
USER
CSP/VENDOR
IaaS PaaS SaaS
• Provides cloud infrastructure
• Configures environment
• Selects resources
• Develops jobs
• Adopts financial risk
IaaS
DEPLOYMENT OPTIONS
• Manually deployed
• Marketplace image
• On bare metal
WHEN IT MAKES SENSE
• Control over the environment is required
• Customized or specialized use cases
• Procurement is a barrier
Big Data on Infrastructure-as-a-Service + PLUS
USER
CSP/VENDOR
IaaS PaaS SaaS
• Automated tools config/resources
• Provides cloud infrastructure
• Aided/Configures environment
• Aided/Selects resources
• Develops jobs
• Adopts financial risk
IaaS PLUS
DEPLOYMENT OPTIONS
• CSP/vendor-specific tools for deploying and configuring the environment
WHEN IT MAKES SENSE
• Control over the environment is required
• Customized or specialized use cases
• Procurement is a barrier
Big Data-as-a-Service
USER
CSP/VENDOR
IaaS PaaS SaaS
• Automated tools config/resources
• Provides cloud infrastructure
• Aided/Configures environment
• Aided/Selects resources
• Develops jobs
• Adopts financial risk
IaaS PLUS
AVAILABILITY
• By CSP (single cloud)
• By vendor that leverages cloud infrastructure on behalf of user
• Configures environment
• Develops jobs
• Adopts financial risk
• Provides cloud infrastructure
• Masks complexity
-as-a-Service
WHEN IT MAKES SENSE
• Full control not required
• Limited resources or don’t want responsibility for certain tasks
• Alignment with a service provider
Managed Big Data Services
USER
CSP/VENDOR
IaaS PaaS SaaS
• Automated tools config/resources
• Provides cloud infrastructure
• Aided/Configures environment
• Aided/Selects resources
• Develops jobs
• Adopts financial risk
IaaS PLUS
• Configures environment
• Develops jobs
• Adopts financial risk
• Provides cloud infrastructure
• Masks complexity
-as-a-Service
• Develops jobs and manages workloads
• Provides cloud infrastructure
• Masks complexity
• Configures
• Adopts financial risk
Managed Service
CONSIDERATIONS
• Processing engines, features, and capabilities can vary
• Professional services optional
• Pricing varies
WHEN IT MAKES SENSE
• Focus is on the job instead of infrastructure
• The ‘managed’ services serve the organization
• Resources are not available or the organization is not willing
to invest in in-house skills
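The "when it makes sense" criteria on the segmentation slides can be read as a rough decision rule. The sketch below is only an illustration of that reading: the `Requirements` fields, the `suggest_segment` function, and the order of the checks are hypothetical and are not part of the original deck.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical summary of an organization's needs (names are illustrative)."""
    needs_full_control: bool   # control over the environment, specialized use cases
    has_inhouse_skills: bool   # staff available to configure and run the environment


def suggest_segment(req: Requirements) -> str:
    """Map requirements to a segment, loosely following the slides' criteria."""
    if req.needs_full_control:
        # Slides: IaaS and IaaS PLUS fit when control over the environment is required.
        return "IaaS or IaaS PLUS"
    if not req.has_inhouse_skills:
        # Slides: managed services fit when in-house resources or skills are unavailable.
        return "Managed Service"
    # Slides: -as-a-Service fits when full control is not required.
    return "-as-a-Service"


print(suggest_segment(Requirements(needs_full_control=False, has_inhouse_skills=False)))
```

A real selection would weigh more factors (procurement, pricing, alignment with a provider); the point is only that the criteria are mutually comparable.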
Managed Big Data Services
USER
CSP/VENDOR
IaaS PaaS SaaS
• Automated tools config/resources
• Provides cloud infrastructure
• Aided/Configures environment
• Aided/Selects resources
• Develops jobs
• Adopts financial risk
IaaS PLUS
• Configures environment
• Develops jobs
• Adopts financial risk
• Provides cloud infrastructure
• Masks complexity
• Develops jobs and manages workloads
• Performs data processing and analysis
• Provides cloud infrastructure
• Masks complexity
• Configures
• Adopts financial risk
• Provides cloud infrastructure
• Masks complexity
• Configures environment
• Adopts financial risk
-as-a-Service
Managed Service
Managed Proc.
WHERE ARE WE HEADED?
• Processing frameworks, engines, databases do not matter to organizations
• Automation, advanced methods leveraging machine learning will be integrated
• Focus on an ‘outcome’ desired by the organization
• User base can be expanded as complexity is abstracted out of the system
Why not just use HDFS, either on Amazon EC2 or Azure VMs or via Hadoop as a cloud
service?
Big data analytics in the cloud
Vs
Cost
• It costs significantly less to store data in S3 (to use AWS as an
example) than HDFS running on EC2
• HDFS requires storing three copies of each block of data for
resiliency
• S3 offers automated backups and file compression
• Users only pay for the compute resources they consume as
and when they analyze the data.
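To make the replication point concrete, here is a minimal sketch of the storage arithmetic. It assumes the object store bills on the logical size of the data while HDFS must provision raw disk for every replica; the function names and the prices in the usage lines are hypothetical placeholders, not real quotes.

```python
def raw_capacity_tb(logical_tb: float, replication_factor: int = 3) -> float:
    """Raw capacity needed to hold `logical_tb` of data at a given replication factor."""
    if logical_tb < 0 or replication_factor < 1:
        raise ValueError("logical_tb must be >= 0 and replication_factor >= 1")
    return logical_tb * replication_factor


def monthly_storage_cost(logical_tb: float,
                         price_per_tb_month: float,
                         replication_factor: int = 1) -> float:
    """Monthly cost of the capacity that actually has to be paid for."""
    return raw_capacity_tb(logical_tb, replication_factor) * price_per_tb_month


# Hypothetical placeholder prices, chosen only to show the shape of the comparison.
hdfs_cost = monthly_storage_cost(100.0, price_per_tb_month=50.0, replication_factor=3)
object_cost = monthly_storage_cost(100.0, price_per_tb_month=25.0)
print(f"HDFS on block storage: {hdfs_cost:.2f} per month")
print(f"Object storage:        {object_cost:.2f} per month")
```

Even at equal unit prices, three-way replication triples the capacity the HDFS deployment has to pay for.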
Scalability
• HDFS relies on local storage
• HDFS in the cloud requires manual configuration and
management of associated storage.
• Cloud storage is designed to automatically scale as more data
is added, without any direct user involvement.
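Because HDFS capacity comes from disks attached to nodes, growth has to be planned in whole nodes. The sketch below illustrates that planning step under stated assumptions: `datanodes_needed` is a hypothetical helper, and the 25% headroom and three-way replication defaults are illustrative values rather than figures from the deck.

```python
import math


def datanodes_needed(logical_tb: float,
                     disk_per_node_tb: float,
                     replication_factor: int = 3,
                     headroom: float = 0.25) -> int:
    """Number of nodes required to hold the data on local disks.

    `headroom` is the fraction of each node's disk kept free for
    temporary data and growth (an assumed planning margin).
    """
    if disk_per_node_tb <= 0 or not 0 <= headroom < 1:
        raise ValueError("disk_per_node_tb must be > 0 and 0 <= headroom < 1")
    raw_tb = logical_tb * replication_factor
    usable_per_node_tb = disk_per_node_tb * (1 - headroom)
    return math.ceil(raw_tb / usable_per_node_tb)


print(datanodes_needed(100, disk_per_node_tb=40))  # nodes for 100 TB of data
print(datanodes_needed(101, disk_per_node_tb=40))  # one more TB can mean one more node
```

With object storage this step disappears from the user's side: capacity is not tied to a node count.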
Durability and persistence
• Data in EC2 instance storage persists only for the life of the
instance itself, whereas data in S3 persists independently of any instance.
• If you’re running HDFS on EC2, it’s highly likely that you’ll be
storing the data in a persistent data store like S3 anyway and
moving it to and fro for the purposes of analysis.
• S3 is also designed to deliver durability of 99.999999999%,
which would be hard for even the most highly skilled Hadoop
administrator to match.
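As a rough sense of scale for the durability figure, the sketch below treats 99.999999999% as an independent per-object, per-year survival probability. That interpretation is an assumption made for illustration, and the function names are hypothetical.

```python
from fractions import Fraction

# 99.999999999% expressed exactly, to avoid floating-point rounding.
DESIGNED_DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(num_objects: int,
                           durability: Fraction = DESIGNED_DURABILITY) -> Fraction:
    """Expected number of objects lost per year under the stated assumption."""
    return num_objects * (1 - durability)


def years_per_expected_loss(num_objects: int,
                            durability: Fraction = DESIGNED_DURABILITY) -> Fraction:
    """Average number of years between single-object losses."""
    if num_objects <= 0:
        raise ValueError("num_objects must be positive")
    return 1 / expected_annual_losses(num_objects, durability)


print(expected_annual_losses(10_000_000))   # expected objects lost per year
print(years_per_expected_loss(10_000_000))  # years per expected single loss
```

Under this reading, ten million stored objects would be expected to lose one object roughly every ten thousand years.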
ARTIFICIAL INTELLIGENCE
The quest to build software running on
machines that can ‘think’ and act like
humans
MACHINE LEARNING
A subset of artificial intelligence focused on
using algorithms that learn and improve
without being explicitly programmed to do
so
DEEP LEARNING
A branch of machine learning based on a
specific set of algorithms that attempt to
mimic the human brain in the form of
multi-layered neural networks
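A toy illustration of "learning without being explicitly programmed": the rule y = 2x + 1 is never written into the code below; it is recovered from example data by ordinary least squares. The `fit_line` helper and the sample data are hypothetical, and this is far simpler than anything a production ML system would use.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept to the data by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Training examples generated by a rule the program is never told about.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(f"learned rule: y = {slope:.1f} * x + {intercept:.1f}")
```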
ML and the Cloud
Success in ML depends on a
combination of data, algorithms, skills
and compute resources.
While the public cloud is by no means
essential for ML, low-cost storage and
compute services enable storing and
processing data at larger volumes.
Deep learning – which typically
involves modeling many layers of
neural networks and, thus, is highly
resource-intensive – particularly
benefits from recent computing
advancements and increasing comfort
levels with the cloud.
Some final thoughts to consider
• Organizations succeed when they match their cloud requirements to their
resources when selecting a cloud deployment model, playing to the
strengths of both the user and the CSP/vendor.
• The journey to the cloud requires significant planning, but the goal remains the
same: to better manage and leverage the data. The cloud is a means
to that end.
• Analytics in the cloud works best when organizations exploit the
infrastructure advantages of the cloud, such as the ability to scale and secure
large amounts of compute and storage, especially for ML.
Thank you
james.curtis@451research.com
@jmscrts
www.451research.com
Cloudera & The Cloud
Mike Olson
CSO and co-founder, Cloudera
My organization
is moving to the cloud,
why should we
consider Cloudera?
Current State
Stakeholders
KNOWLEDGE WORKERS
Instant, self-service access to data and IT resources
Application performance
Job-oriented tools
Choice and integration
INFRASTRUCTURE TEAM
Secure, controlled provisioning of data and IT resources
Predictable infrastructure costs
Systems-oriented tools
Standardization and portability
CLOUD BENEFITS (+)
• Speed of deployment
• Tenant isolation
• Self-service
• Workload elasticity
• Shared storage
• Pay-as-you-go
• Bring your own tools
• Bring your own data
• Powerful network
CLOUD SETBACKS (-)
• Proliferation of data copies
• Multiple security frameworks
• Difficult to troubleshoot workloads
• No shared metadata
• Unable to track data lineage
• Disjointed services
• Few on-premise integration services
• Proprietary services
• Cloud lock-in
Deployment Options
ON-PREMISE
CLOUD
INFRASTRUCTURE SERVICES
PRIVATE CLOUD
BARE METAL
Deployment model choices
Bare Metal         Private Cloud      IaaS               PaaS
Applications       Applications       Applications       Applications
Clusters           Clusters           Clusters           Clusters
Operating System   Operating System   Operating System   Operating System
Network            Network            Network            Network
Storage            Storage            Storage            Storage
Servers            Servers            Servers            Servers
Customer managed   Vendor managed
Traditional applications
Data
Exploration
STORAGE
SECURITY
GOVERNANCE
WORKLOAD MGMT
INGEST &
REPLICATION
DATA CATALOG
SQL & BI
Analytics
STORAGE
SECURITY
GOVERNANCE
WORKLOAD MGMT
INGEST & REPLICATION
DATA CATALOG
Operational
Real-Time DB
STORAGE
SECURITY
GOVERNANCE
WORKLOAD MGMT
INGEST & REPLICATION
DATA CATALOG
ETL & Data
Processing
STORAGE
SECURITY
GOVERNANCE
WORKLOAD MGMT
INGEST &
REPLICATION
DATA CATALOG
Custom
Functions
STORAGE
SECURITY
GOVERNANCE
WORKLOAD MGMT
INGEST & REPLICATION
DATA CATALOG
Many data silos, each with its own proprietary tools and infrastructure
Different vendors, products, and services on-premises versus in cloud
A fragmented approach is difficult, expensive, and risky
The Answer
● The modern platform for machine
learning and analytics
● with multiple deployment options
● and one shared data experience
One platform. Multiple workloads
DATA ENGINEERING
OPERATIONAL DATABASE
ANALYTIC DATABASE
DATA SCIENCE
DATA PROCESSING
• Cost efficient
• Reliable
• Scalable
• Based on Spark, MapReduce, Hive & Pig
• Supported by Workload Analytics
FAST BI & SQL
• Flexibility
• Elastic scale
• Go beyond SQL
• Based on Impala & Hive
• SQL dev environment
• Supported by Workload Analytics
MACHINE LEARNING
• Fast dev to production
• Secure self-serve
• Based on Python, R, and Spark
• ML dev environment (CDSW)
ONLINE & REAL-TIME
• High throughput, low latency
• Strongly consistent
• Based on HBase, Kudu & Spark Streaming
Multiple deployment options
OPERATIONAL
DATABASE
DATA
SCIENCE
ANALYTIC
DATABASE
DATA
ENGINEERING
DATA
ENGINEERING
ANALYTIC
DATABASE
PRIVATE CLOUD
BARE METAL INFRASTRUCTURE SERVICES
(in beta soon)
• Shared catalog
• Unified security
• Consistent governance
• Easy workload management
• Flexible ingest and replication
Open platform services
Built for multi-function analytics | Optimized for cloud
The modern platform for machine learning and analytics optimized for the cloud
DATA CATALOG
SECURITY
GOVERNANCE
WORKLOAD MANAGEMENT
INGEST & REPLICATION
EXTENSIBLE SERVICES
CORE SERVICES
DATA ENGINEERING
OPERATIONAL DATABASE
ANALYTIC DATABASE
DATA SCIENCE
STORAGE SERVICES
S3 ADLS HDFS KUDU
Cloudera Enterprise
DEPLOYMENT OPTIONS
BARE METAL
PRIVATE CLOUD
INFRASTRUCTURE SERVICES
Run anywhere. Deploy any way.
Simple Unified Enterprise
Proven at scale
Trusted security
Hybrid or multi cloud
Platform-as-a-Service
Simplifies operations
Works with your tools
"Better to bet on cloud providers
for infrastructure, Cloudera for data,
compute and security fabric, and
leave the rest to the ecosystem"
--- Sean Owen, Director, Data Science at Cloudera
Thank you
Mike Olson
@mikeolson

Optimize your cloud strategy for machine learning and analytics

  • 1. 1© Cloudera, Inc. All rights reserved. Optimize your cloud strategy for machine learning and analytics Mike Olson CSO co-founder, Cloudera James Curtis Senior analyst, 451 Research
  • 2. 2© Cloudera, Inc. All rights reserved. Optimizing Cloud Strategy for ML & Analytics James Curtis, Senior Analyst, Data Platforms & Analytics
  • 3. 3© Cloudera, Inc. All rights reserved. 3 451 Research is a leading IT research & advisory company Founded in 2000 300+ employees, including over 120 analysts 2,000+ clients: Technology & Service providers, corporate advisory, finance, professional services, and IT decision makers 70,000+ IT professionals, business users and consumers in our research community Over 52 million data points published each quarter and 4,500+ reports published each year 3,000+ technology & service providers under coverage 451 Research and its sister company, Uptime Institute, are the two divisions of The 451 Group Headquartered in New York City, with offices in London, Boston, San Francisco, Washington DC, Mexico, Costa Rica, Brazil, Spain, UAE, Russia, Taiwan, Singapore and Malaysia Research & Data Advisory Events Go 2 Market
  • 4. 4© Cloudera, Inc. All rights reserved. 4 “Every morning in Africa, an antelope wakes up. It knows it must outrun the fastest lion, or it will be killed. Every morning in Africa, a lion wakes up. It knows it must run faster than the antelope, or it will starve....”
  • 5. 5© Cloudera, Inc. All rights reserved. 5 “…It doesn’t matter if you’re a lion or an antelope—when the sun comes up, you’d better be running.”
  • 6. 6© Cloudera, Inc. All rights reserved. 6 Approximately what percent of your workloads are deployed in the following environment today? In 2 years? Source: 451 Research, Voice of the Enterprise: Workloads and Key Projects, Cloud Transformation, 2017.
  • 7. 7© Cloudera, Inc. All rights reserved. 7 Thinking of all applications your organization runs, what percentage run in which environments? In 2 years? Key Points  Cloud deployments will be the dominant environment in every category  Every cloud deployment environment will see increases in every workload category  Analytics and App Development Development areas expected strong gains Source: 451 Research, Voice of the Enterprise: Workloads and Key Projects, Cloud Transformation, 2017.
  • 8. 8© Cloudera, Inc. All rights reserved. 8 8 Drivers for Cloud Adoption DATA GRAVITY TRANSFORMATIONAL CHANGE IT REJUVENATION FLEXIBILITY COST AVOIDANCE
  • 9. 9© Cloudera, Inc. All rights reserved. Challenges for Cloud Adoption 9 COST PEOPLE AND PROCESS CHANGE PERFORMANCE LIABILITY SECURITY ISSUES (PERCEIVED AND REAL)
  • 10. 10© Cloudera, Inc. All rights reserved. 10 Big Data Cloud Segmentation USER CSP/VENDOR Responsibility Low High IaaS PaaS SaaS
  • 11. 11© Cloudera, Inc. All rights reserved. 11 Big Data Cloud Segmentation USER CSP/VENDOR Responsibility Low High IaaS PaaS SaaS Depending on the segment, the responsibilities born by the user and CSP/vendor vary considerably.
  • 12. 12© Cloudera, Inc. All rights reserved. 12 Big Data on Infrastructure-as-a-Service USER CSP/VENDOR IaaS PaaS SaaS  Provides cloud infrastructure  Configures environment  Selects resources  Develops jobs  Adopts financial risk IaaS DEPLOYMENT OPTIONS  Manually deployed  Marketplace image  On bare metal WHEN IT MAKES SENSE  Control over the environment is required  Customized or specialized use cases  Procurement is a barrier
  • 13. 13© Cloudera, Inc. All rights reserved. 13 Big Data on Infrastructure-as-a-Service + PLUS USER CSP/VENDOR IaaS PaaS SaaS  Automated tools config/resources  Provides cloud infrastructure  Aided/Configures environment  Aided/Selects resources  Develops jobs  Adopts financial risk IaaS PLUS DEPLOYMENT OPTIONS  CSP/vendor-specific tools for deploying and configuring the environment WHEN IT MAKES SENSE  Control over the environment is required  Customized or specialized use cases  Procurement is a barrier
  • 14. 14© Cloudera, Inc. All rights reserved. 14 Big Data-as-a-Service USER CSP/VENDOR IaaS PaaS SaaS  Automated tools config/resources  Provides cloud infrastructure  Aided/Configures environment  Aided/Selects resources  Develops jobs  Adopts financial risk IaaS PLUS AVAILABILITY  By CSP (single cloud)  By vendor that leverages cloud infrastructure on behalf of user  Configures environment  Develops jobs  Adopts financial risk  Provides cloud infrastructure  Masks complexity -as-a-Service WHEN IT MAKES SENSE  Full control not required  Limited resources or don’t want responsibility for certain tasks  Alignment with a service provider
  • 15. 15© Cloudera, Inc. All rights reserved. 15 Managed Big Data Services USER CSP/VENDOR IaaS PaaS SaaS  Automated tools config/resources  Provides cloud infrastructure  Aided/Configures environment  Aided/Selects resources  Develops jobs  Adopts financial risk IaaS PLUS  Configures environment  Develops jobs  Adopts financial risk  Provides cloud infrastructure  Masks complexity -as-a-Service  Develops jobs and manages workloads  Provides cloud infrastructure  Masks complexity  Configures  Adopts financial risk Managed Service CONSIDERATIONS  Processing engines, features, and capabilities can vary  Professional services optional  Pricing varies WHEN IT MAKES SENSE  Focus is on the job instead of infrastructure  The ‘managed’ services serve the organization  Resources are not available or organization not willing to invest is in-house skills
  • 16. 16© Cloudera, Inc. All rights reserved. 16 Managed Big Data Services USER CSP/VENDOR IaaS PaaS SaaS  Automated tools config/resources  Provides cloud infrastructure  Aided/Configures environment  Aided/Selects resources  Develops jobs  Adopts financial risk IaaS PLUS  Configures environment  Develops jobs  Adopts financial risk  Provides cloud infrastructure  Masks complexity  Develops jobs and manages workloads  Perform data processing and analysis  Provides cloud infrastructure  Masks complexity  Configures  Adopts financial risk  Provides cloud infrastructure  Masks complexity  Configures environment  Adopts financial risk -as-a-Service Managed Service Managed Proc. WHERE ARE WE HEADED?  Processing frameworks, engines, databases do not matter to organizations  Automation, advanced methods leveraging machine learning will be integrated  Focus on an ‘outcome’ desired by the organization  User base can be expanded as complexity is abstracted out of the system
  • 17. 17© Cloudera, Inc. All rights reserved. 17 Why not just use HDFS, either on Amazon EC2 or Azure VMs or via Hadoop as a cloud service? Big data analytics in the cloud Vs
  • 18. 18© Cloudera, Inc. All rights reserved. 18 Why not just use HDFS, either on Amazon EC2 or Azure VMs or via Hadoop as a cloud service? Big data analytics in the cloud Cost • It costs significantly less to store data in S3 (to use AWS as an example) than HDFS running on EC2 • HDFS requires storing three copies of each block of data for resiliency • S3 offers automated backups and file compression • Users only pay for the compute resources they consume as and when they analyze the data.
  • 19. 19© Cloudera, Inc. All rights reserved. 19 Why not just use HDFS, either on Amazon EC2 or Azure VMs or via Hadoop as a cloud service? Big data analytics in the cloud Scalability • HDFS relies on local storage • HDFS in the cloud requires manual configuration and management of associated storage. • Cloud storage is designed to automatically scale as more data is added, without any direct user involvement.
  • 20. Big data analytics in the cloud: Why not just use HDFS, either on Amazon EC2 or Azure VMs or via Hadoop as a cloud service? Durability and persistence • Data in EC2 instance storage persists only for the life of the instance itself, whereas data in S3 always persists. • If you’re running HDFS on EC2, it’s highly likely that you’ll be storing the data in a persistent data store like S3 anyway and moving it back and forth for analysis. • S3 is also designed to deliver durability of 99.999999999%, which would be hard for even the most highly skilled Hadoop administrator to match.
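To give a sense of scale for the 99.999999999% durability figure, here is a minimal sketch. It assumes the figure is read as an annual, per-object durability and that object losses are independent; the function names and the object count are hypothetical. `Decimal` is used because eleven nines sits close to the limits of binary floating point.

```python
from decimal import Decimal

# Eleven nines, as quoted on the slide; treated here as annual per-object durability.
DURABILITY = Decimal("0.99999999999")


def expected_annual_losses(object_count: int,
                           durability: Decimal = DURABILITY) -> Decimal:
    """Expected number of objects lost per year under the assumptions above."""
    return Decimal(object_count) * (Decimal(1) - durability)


def years_per_expected_loss(object_count: int,
                            durability: Decimal = DURABILITY) -> Decimal:
    """Average number of years between single-object losses."""
    return Decimal(1) / expected_annual_losses(object_count, durability)


objects = 10_000_000  # hypothetical number of stored objects
print("Expected losses per year:", expected_annual_losses(objects))
print("Years per expected loss: ", years_per_expected_loss(objects))
```

Under these assumptions, ten million objects correspond to an expected loss of about one object every ten thousand years.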
  • 21. ARTIFICIAL INTELLIGENCE The quest to build software running on machines that can ‘think’ and act like humans MACHINE LEARNING A subset of artificial intelligence focused on using algorithms that learn and improve without being explicitly programmed to do so DEEP LEARNING A branch of machine learning based on a specific set of algorithms that attempt to mimic the human brain in the form of multi-layered neural networks
  • 22. ML and the Cloud Success in ML depends on a combination of data, algorithms, skills and compute resources. While the public cloud is by no means essential for ML, low-cost storage and compute services enable storing and processing data at larger volumes. Deep learning – which typically involves modeling many layers of neural networks and, thus, is highly resource-intensive – particularly benefits from recent computing advancements and increasing comfort levels with the cloud.
  • 23. Some final thoughts to consider • Organizations succeed when they match their cloud requirements with their resources in selecting their cloud deployment preferences, thus playing to the strengths of both the user and the CSP/vendor. • The journey to the cloud requires significant planning, but the goal remains the same: to better manage and leverage the data. The cloud is a means to that end. • Carrying out analytics in the cloud works best when organizations utilize the infrastructure advantages of the cloud, such as the ability to scale and secure large amounts of compute and storage, especially for ML.
  • 24. Thank you @jmscrts www.451research.com
  • 25. Cloudera & The Cloud Mike Olson CSO co-founder, Cloudera
  • 26. My organization is moving to the cloud. Why should we consider Cloudera?
  • 27. Current State
  • 28. Stakeholders KNOWLEDGE WORKERS: Instant, self-service access to data and IT resources; Application performance; Job-oriented tools; Choice and integration INFRASTRUCTURE TEAM: Secure, controlled provisioning of data and IT resources; Predictable infrastructure costs; Systems-oriented tools; Standardization and portability
  • 29. CLOUD BENEFITS (+) • Speed of deployment • Tenant isolation • Self-service • Workload elasticity • Shared storage • Pay-as-you-go • Bring your own tools • Bring your own data • Powerful network CLOUD SETBACKS (–) • Proliferation of data copies • Multiple security frameworks • Difficult to troubleshoot workloads • No shared metadata • Unable to track data lineage • Disjointed services • Few on-premise integration services • Proprietary services • Cloud lock-in
  • 30. Deployment Options ON-PREMISE CLOUD INFRASTRUCTURE SERVICES PRIVATE CLOUD BARE METAL
  • 31. Deployment model choices: Bare Metal, Private Cloud, IaaS, PaaS. Layers shown for each model: Applications, Clusters, Operating System, Network, Storage, Servers. Legend: Customer managed, Vendor managed
  • 32. Traditional applications: Data Exploration, SQL & BI Analytics, Operational Real-Time DB, ETL & Data Processing, Custom Functions. Each application has its own STORAGE, SECURITY, GOVERNANCE, WORKLOAD MGMT, INGEST & REPLICATION, and DATA CATALOG. Many data silos, each with its own proprietary tools and infrastructure. Different vendors, products, and services on-premises versus in cloud. A fragmented approach is difficult, expensive, and risky.
  • 33. The Answer
  • 34. ● The modern platform for machine learning and analytics ● with multiple deployment options ● and one shared data experience
  • 35. One platform. Multiple workloads DATA ENGINEERING (DATA PROCESSING) • Cost efficient • Reliable • Scalable • Based on Spark, MapReduce, Hive & Pig • Supported by Workload Analytics ANALYTIC DATABASE (FAST BI & SQL) • Flexibility • Elastic scale • Go beyond SQL • Based on Impala & Hive • SQL dev environment • Supported by Workload Analytics DATA SCIENCE (MACHINE LEARNING) • Fast dev to production • Secure self-serve • Based on Python, R, and Spark • ML dev environment (CDSW) OPERATIONAL DATABASE (ONLINE & REAL-TIME) • High throughput, low latency • Strongly consistent • Based on HBase, Kudu & Spark Streaming
  • 36. Multiple deployment options OPERATIONAL DATABASE DATA SCIENCE ANALYTIC DATABASE DATA ENGINEERING DATA ENGINEERING ANALYTIC DATABASE PRIVATE CLOUD BARE METAL INFRASTRUCTURE SERVICES (in beta soon)
  • 37. • Shared catalog • Unified security • Consistent governance • Easy workload management • Flexible ingest and replication Open platform services Built for multi-function analytics | Optimized for cloud
  • 38. Cloudera Enterprise: The modern platform for machine learning and analytics optimized for the cloud DATA CATALOG SECURITY GOVERNANCE WORKLOAD MANAGEMENT INGEST & REPLICATION EXTENSIBLE SERVICES CORE SERVICES DATA ENGINEERING OPERATIONAL DATABASE ANALYTIC DATABASE DATA SCIENCE STORAGE SERVICES: S3, ADLS, HDFS, KUDU DEPLOYMENT OPTIONS: BARE METAL, PRIVATE CLOUD, INFRASTRUCTURE, SERVICES
  • 39. Run anywhere. Deploy any way. Simple Unified Enterprise Proven at scale Trusted security Hybrid or multi cloud Platform-as-a-Service Simplifies operations Works with your tools
  • 40. "Better to bet on cloud providers for infrastructure, Cloudera for data, compute and security fabric, and leave the rest to the ecosystem" --- Sean Owen, Director, Data Science at Cloudera
  • 41. Thank you Mike Olson @mikeolson