BIG Data on AWS
Paul Duffy
Characteristics of Big Data
How the Cloud Is Big Data’s Best Friend
Big Data on the Cloud In the Real World
Characteristics of Big Data
The cost of data generation is falling rapidly



Dramatic increase in volume, velocity and variety of data
BIG DATA
A collection of tools, techniques and technologies that
allow you to work productively with data at any scale.
Big Data is Getting Bigger

• 2.7 zettabytes in 2012
• Over 90% will be unstructured
• Data spread across a wide array of silos
Features driven by MapReduce
Variable data structures and sources
Computer Generated
• Application server logs (web sites, games)
• Sensor data (weather, water, smart grids)
• Images/videos (traffic, security cameras)

Human Generated
• Twitter “Fire Hose”: 50m tweets/day, 1,400% growth per year
• Blogs/Reviews/Emails/Pictures
• Social Graphs: Facebook, LinkedIn, Contacts
The Role of Data
  is Changing
Traditional analytics required a fixed data model, based on pre-known questions.

Big Data promotes data exploration and experimentation, which leads to innovation.
Generation → Collection & storage → Computation & analytics → Collaboration & sharing

Generation: lower costs, faster throughput
Collection & storage, Computation & analytics, Collaboration & sharing: increased pressure on traditional IT and tools
Require tools designed for data
 collection and computation at
any volume, velocity or format.
Software
 •   Designed for distribution
 •   Easy programming models
 •   Flexible language choice
 •   Platform for abstraction and ecosystem


 • Good example: Hadoop
Infrastructure
  •   Designed for distribution
  •   Easy programming models
  •   Flexible language choice
  •   Platform for abstraction and ecosystem


  • Good example: Cloud computing
Software




           Infrastructure
How the Cloud Is
Big Data’s Best Friend
How do we define the cloud?
       By Benefits!
Cloud:
 •   No Cap Ex
 •   Pay Per Use
 •   Elasticity
 •   Fast Time to Market
 •   Focus on core competency
Why is the Cloud
Big Data’s Best Friend?
We know we want to collect, store, organize, analyze and share it.

But we have limited resources.
The Cloud Optimizes
Precious IT Resources
i.e. Skilled People
“Over the next decade, the number of files or containers that
encapsulate the information in the digital universe will grow by
75x.
While the pool of IT staff available to manage them will grow
only slightly. At 1.5x”
                                  - 2011 IDC Digital Universe Study
Deploying a Hadoop cluster is hard
Cloud computing

The Old IT World:            30% Using Big Data, 70% Managing All of the “Undifferentiated Heavy Lifting”
Cloud-Based Infrastructure:  70% Analyzing and Using Big Data, 30% Configuring Cloud Assets
Managed
Reusability
              Services


Scale         Innovation
The Cloud Optimizes
Capacity Resources
Elastic Compute Capacity
 •   On and Off
 •   Fast Growth
 •   Variable peaks
 •   Predictable peaks
With fixed capacity, these usage patterns lead to either WASTE or CUSTOMER DISSATISFACTION.
Elastic Compute Capacity
[Chart: capacity over time, comparing traditional IT capacity and elastic cloud capacity against your IT needs]
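The gap between fixed capacity and actual need can be sketched in a few lines. This is only an illustration: the hourly demand figures and the fixed capacity of 8 instances are invented, and `fixed_capacity_gap` is a hypothetical helper, not something from the deck.

```python
from typing import List, Tuple


def fixed_capacity_gap(demand: List[int], capacity: int) -> Tuple[int, int]:
    """Return (waste, shortfall) in instance-hours for a fixed capacity.

    waste     = idle capacity in hours where demand is below capacity
    shortfall = unmet demand in hours where demand exceeds capacity
    """
    waste = sum(max(capacity - d, 0) for d in demand)
    shortfall = sum(max(d - capacity, 0) for d in demand)
    return waste, shortfall


# Hypothetical hourly demand (instances needed), invented for illustration.
hourly_demand = [2, 3, 10, 4, 2, 12, 3, 2]

waste, shortfall = fixed_capacity_gap(hourly_demand, capacity=8)

# Idealized elastic capacity follows demand exactly, hour by hour.
elastic_hours = sum(hourly_demand)
fixed_hours = 8 * len(hourly_demand)

print(f"fixed: {fixed_hours} instance-hours, waste={waste}, shortfall={shortfall}")
print(f"elastic: {elastic_hours} instance-hours")
```

Real elastic capacity lags demand somewhat, so treat the elastic figure as a best case.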
The Cloud Empowers Users
to Balance Cost and Time
1 instance for 500 hours
=
500 instances for 1 hour
                           I like this!
                             I scale
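The equality above is instance-hour arithmetic. A minimal sketch, assuming the job parallelizes perfectly and is billed per instance-hour; the price of 0.10 per instance-hour is a made-up number, not an AWS rate.

```python
def instance_hours(instances: int, hours: int) -> int:
    """Total compute consumed, in instance-hours."""
    return instances * hours


def job_cost(instances: int, hours: int, price_per_instance_hour: float) -> float:
    """Cost under simple per-instance-hour billing (an assumption)."""
    return instance_hours(instances, hours) * price_per_instance_hour


# Hypothetical price, for illustration only.
HYPOTHETICAL_PRICE = 0.10

slow = job_cost(instances=1, hours=500, price_per_instance_hour=HYPOTHETICAL_PRICE)
fast = job_cost(instances=500, hours=1, price_per_instance_hour=HYPOTHETICAL_PRICE)

# Same instance-hours, same cost; only the wait changes (500 hours vs 1 hour).
print(slow == fast)
```

The cost is the same either way, so the user chooses how long to wait.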
The Cloud
Reduces Cost
For Experimentation
The Cloud
Enables Collection and Storage
of Big Data
Storage Costs are Declining
Simple Storage Service
[Chart: growth to 1 Trillion; vertical axis from 0 to 1,000,000]
750k+ peak transactions per second
Global Accessibility

Regions:
 •   US-EAST (Virginia)
 •   US-WEST (N. California)
 •   US-WEST (Oregon)
 •   GOV CLOUD
 •   EU-WEST (Ireland)
 •   ASIA PAC (Tokyo)
 •   ASIA PAC (Singapore)
 •   SOUTH AMERICA (Sao Paulo)
Amazon DynamoDB
Managed NoSQL database service
Unlimited size
Unlimited scale
Flexible key/value store
Consistent, low latencies (single digit milliseconds, SSD)
Robust, durable data storage
Integrated analytics with Elastic MapReduce
Amazon Elastic MapReduce
On-demand, managed analytics platform
Powered by Hadoop
Integrated with Spot instances to lower costs
Vibrant ecosystem of tools
Elastic clusters
Flexible programming model (Java, Python, Ruby etc)
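To make the MapReduce programming model concrete, here is a minimal single-process sketch of its map, shuffle and reduce phases in plain Python. It does not use Hadoop or Elastic MapReduce APIs; the function names and the sample log lines are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (token, 1) pair for every token in one input line."""
    for token in line.lower().split():
        yield token, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def token_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Invented sample input standing in for application server logs.
sample_logs = ["GET /index GET /cart", "GET /index POST /cart"]
counts = token_count(sample_logs)
print(counts)
```

On a real cluster the map and reduce steps run on many machines at once and the framework handles the shuffle; only the two small functions are written by the user.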
Big Data on the Cloud
In the Real World
Big Data Verticals

Media/Advertising:      Targeted Advertising; Image and Video Processing
Oil & Gas:              Seismic Analysis
Retail:                 Recommendations; Transactions Analysis
Life Sciences:          Genome Analysis
Financial Services:     Monte Carlo Simulations; Risk Analysis
Security:               Anti-virus; Fraud Detection; Image Recognition
Social Network/Gaming:  User Demographics; Usage analysis; In-game metrics
Visualizations
Bank – Monte Carlo Simulations
23 Hours to 20 Minutes

“The AWS platform was a good fit for its unlimited and flexible computational power to our risk-simulation process requirements. With AWS, we now have the power to decide how fast we want to obtain simulation results, and, more importantly, we have the ability to run simulations not possible before due to the large amount of infrastructure required.” – Castillo, Director, Bankinter
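Monte Carlo risk simulation suits elastic capacity because its trials are independent and can be split into batches. The sketch below is a toy, not Bankinter's model: the normal loss distribution, its parameters, the batch count and the helper names are all assumptions made for illustration.

```python
import random
from typing import List


def simulate_losses(trials: int, seed: int) -> List[float]:
    """Run one independent batch of simulated daily losses.

    Assumption: profit and loss is normal with mean 0 and standard
    deviation 1,000 (hypothetical figures). A loss is the negated P&L.
    """
    rng = random.Random(seed)
    return [-rng.gauss(0.0, 1000.0) for _ in range(trials)]


def value_at_risk(losses: List[float], confidence: float) -> float:
    """Empirical value at risk: the loss at the given quantile."""
    ordered = sorted(losses)
    index = min(int(confidence * len(ordered)), len(ordered) - 1)
    return ordered[index]


# Each batch has its own seed, so batches could run on separate
# instances and be merged afterwards. Here they run one after another.
batches = [simulate_losses(10_000, seed=s) for s in range(4)]
all_losses = [loss for batch in batches for loss in batch]

var_95 = value_at_risk(all_losses, confidence=0.95)
print(f"95% VaR over {len(all_losses)} trials: {var_95:.0f}")
```

Because batches do not depend on each other, adding instances shortens the run roughly in proportion, which is the kind of speed-up the quote describes.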
Recommendations




The Taste Test http://www.etsy.com/tastetest
Recommendations
Gift Ideas for Facebook Friends




etsy.com/gifts
Big Data on AWS
Click Stream Analysis
User recently purchased a sports movie and is searching for video games → Targeted Ad (1.7 Million per day)
Thank you…
