© 2014 MapR Technologies 1
© 2014 MapR Technologies 2
MapR Distribution for Hadoop Overview
Top Ranked
Exponential Growth
500+ Customers
Cloud Leaders
3X bookings Q1 ‘13 – Q1 ‘14
80% of accounts expand 3X
90% software licenses
<1% lifetime churn
>$1B in incremental revenue generated by 1 customer
© 2014 MapR Technologies 3
Topics for Today
• Hadoop Trends and Realities
• Hadoop Deployment Model
• Integrating Hadoop into Your IT Environment
© 2014 MapR Technologies 4
3 Trends
Forcing a revolution in enterprise architecture
© 2014 MapR Technologies 5
TREND 1: Industry Leaders Compete and Win with Data
More Data Beats Better Algorithms
Collecting interaction data from ecommerce, social media, offline, and call centers enables a “customer 360 view” and consumer intimacy
Competitive Advantage is Decided by 0.5%
Consumer financial services: 1% improvement in fraud detection means hundreds of millions of dollars
Advertising and retail: 0.5% improvement in lift means millions of dollars increase in profitability
© 2014 MapR Technologies 6
TREND 2: Big Data is Overwhelming Traditional Systems
Operational systems: production requirements
• Mission-critical reliability
• Transaction guarantees
• Deep security
• Real-time performance
• Backup and recovery
Analytical systems: production requirements
• Interactive SQL
• Rich analytics
• Workload management
• Data governance
• Backup and recovery
[Diagram: Enterprise Data Architecture. Labels: enterprise users, operational systems, analytical systems, outside sources]
© 2014 MapR Technologies 7
TREND 3: Hadoop: The Disruptive Technology at the Core of Big Data
[Chart: job trends from Indeed.com, Jan ‘06 to Jan ‘14]
© 2014 MapR Technologies 8
Common Use Cases: Taking Advantage of Hadoop
ENTERPRISE DATA HUB
• Multi-structured data staging & archive
• ETL / DW optimization
• Mainframe optimization
• Data exploration
MARKETING OPTIMIZATION
• Recommendation engines & targeting
• Customer 360
• Click-stream analysis
• Social media analysis
• Ad optimization
RISK & SECURITY OPTIMIZATION
• Network security monitoring
• Security information & event management
• Fraudulent behavioral analysis
OPERATIONS INTELLIGENCE
• Supply chain & logistics
• System log analysis
• Manufacturing quality assurance
• Preventative maintenance
• Smart meter analysis
© 2014 MapR Technologies 9
And 2 Realities
© 2014 MapR Technologies 10
REALITY 1: Hadoop now on the critical path
• Data staging
• Archive
• Data transformation
• Data exploration
• Streaming, interactions
Keys for Production Success
1 Reliability and DR
2 Interoperability
3 High performance
4 Supports operations and analytics
[Diagram labels: operational systems, analytical systems, enterprise users]
© 2014 MapR Technologies 11
REALITY 2: Moving towards operational applications
Google’s operational data store (BigTable) has enabled multiple revolutions within the company:
(1) The transition from batch to real-time
• 2004: Web index is batch (GFS/MapReduce)
• 2010: Web index is real-time (BigTable)
(2) The explosion in operational applications
[Timeline: 2003 GFS, 2004 MapReduce, 2006 BigTable]
© 2014 MapR Technologies 12
Hadoop Deployment Model
© 2014 MapR Technologies 13
Modern Data Architecture for Hadoop
Sources: RELATIONAL, SAAS, MAINFRAME; DOCUMENTS, EMAILS; LOG FILES, CLICKSTREAMS; SENSORS; BLOGS, TWEETS, LINK DATA
DATA WAREHOUSE
Data Movement
Data Access
Analytics
Search
Schema-less data exploration
BI, reporting
Ad-hoc integrated analytics
Data Transformation, Enrichment and Integration
MAPR DISTRIBUTION FOR HADOOP
Streaming (Spark Streaming, Storm)
NoSQL ODBMS (HBase, Accumulo, …)
Batch / Search (MR, Spark, Hive, Pig, …)
Data Storage Platform
Operational Apps
Recommendations
Fraud Detection
Logistics
Optimized Data Architecture
Machine Learning
© 2014 MapR Technologies 14
Data Warehouse Optimization
FORTUNE 100 TELCO
Improve data services to customers while reducing enterprise architecture costs
OBJECTIVES
• Provide cloud, security, managed services, data center, & comms
• Report on customer usage, profiles, billing, and sales metrics
• Improve service: Measure service quality and repair metrics
• Reduce customer churn – identify and address IP network hotspots
CHALLENGES
• Cost of ETL & DW storage for growing IP and clickstream data; >3 months
• Reliability & cost of Hadoop alternatives limited ETL & storage offload
SOLUTION
• MapR Data Platform for data staging, ETL, and storage at 1/10th the cost
• MapR provided smallest datacenter footprint with best DR solution
• Enterprise-grade: NFS file management, consistent snapshots & mirroring
Business Impact
• Increased scale to handle network IP and clickstream data
• Reduced workload on DW to maintain reporting SLAs to business
• Unlocked new insights into network usage and customer preferences
© 2014 MapR Technologies 15
Operational Apps: Push Messaging Platform
MapR: Enabling the “smartest, most aware, precise, easy-to-use, scalable, secure and powerful push messaging platform on the planet”
OBJECTIVES
• Enable organizations to build one-on-one brand relationships
• Push messaging and geo-location targeting that
CHALLENGES
• Support large numbers of customers in a multi-tenant platform
• Target specific consumers in real time with relevant offers
• Increase reliability of push messaging while lowering data center costs
SOLUTION
• MapR Distribution for Hadoop with Apache HBase for operational workloads
• Data placement control enables efficient cluster resource management
Business Impact
• Increasing engagement and customer loyalty for 100s of leading brands
• Reduced hardware footprint by 50%
• Consolidated 8 Hadoop clusters into 1 MapR cluster
© 2014 MapR Technologies 16
Integrating Hadoop into Enterprise Environments
© 2014 MapR Technologies 17
Hadoop Success Depends on
• Enterprise-Grade Functionality
• Scaling for the Future
© 2014 MapR Technologies 18
Integrating Hadoop into the Enterprise
Enterprise-Grade Functionality + Future Proofing
Enterprise Requirements
1. Low TCO
© 2014 MapR Technologies 19
TCO: Core to Hadoop evolution
• Hadoop TAM comes from disrupting enterprise data warehouse and storage spending
[Chart: data growing at 40% vs. IT budgets growing at 2.5%, 2013 to 2017]
[Chart: $ per terabyte. Enterprise storage: $9,000; database warehouse: $40,000; Hadoop: <$1,000]
Sources:
• Gartner, “Forecast Analysis: Enterprise IT Spending by Vertical Industry Market, Worldwide, 2010-2016, 3Q12 Update.”
• Wall Street Journal, “Financial Services Companies Firms See Results from Big Data Push”, Jan. 27, 2014
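The gap between the two growth rates on this slide compounds quickly. The sketch below is only an illustration: it takes the two rates from the slide (data at 40% per year, IT budgets at 2.5% per year), indexes both to 1.0 in 2013, and prints how far they diverge by 2017. The function name and the indexing to 1.0 are assumptions made for the example, not part of the deck.

```python
def compound(rate: float, years: int, start: float = 1.0) -> float:
    """Value after `years` of growth at `rate` per year (0.40 means 40%)."""
    return start * (1.0 + rate) ** years


DATA_GROWTH = 0.40     # from the slide: data growing at 40%
BUDGET_GROWTH = 0.025  # from the slide: IT budgets growing at 2.5%

for year in range(2013, 2018):
    n = year - 2013
    data = compound(DATA_GROWTH, n)
    budget = compound(BUDGET_GROWTH, n)
    # "gap" is how much more data there is per budget dollar than in 2013
    print(f"{year}: data x{data:.2f}, budget x{budget:.2f}, gap x{data / budget:.2f}")
```

Under these rates, data volume is roughly 3.8 times its 2013 level by 2017 while the budget is about 1.1 times its 2013 level, which is the pressure on cost per terabyte that the slide points to.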
© 2014 MapR Technologies 20
Better Performance with Less Hardware
MapR: With a Fraction of the Hardware
NEW MINUTESORT WORLD RECORD: 1.65 TB in 1 minute, 298 nodes
PREVIOUS RECORD: 1.6 TB with 2200 nodes
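The "fraction of the hardware" claim can be checked with simple arithmetic on the four numbers on the slide. The sketch below is illustrative only; the helper name and the decimal conversion of 1 TB to 1000 GB are assumptions made for the example.

```python
def per_node_gb(total_tb: float, nodes: int) -> float:
    """Gigabytes sorted per node within the one-minute window (1 TB = 1000 GB)."""
    return total_tb * 1000.0 / nodes


previous = per_node_gb(1.6, 2200)  # previous record, from the slide
new = per_node_gb(1.65, 298)       # new record, from the slide

print(f"previous record: {previous:.2f} GB per node")
print(f"new record:      {new:.2f} GB per node")
print(f"node count ratio: {2200 / 298:.1f}x fewer nodes")
print(f"per-node throughput ratio: {new / previous:.1f}x")
```

The node count ratio comes out at about 7.4, which matches the "1/7th the hardware" figure quoted in the speaker notes.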
© 2014 MapR Technologies 21
Integrating Hadoop into the Enterprise
Enterprise-Grade Functionality + Future Proofing
Enterprise Requirements
1. Low TCO
2. Trusted Data
© 2014 MapR Technologies 22
Data Protection: Replication and Snapshots
Replication
• Protect from hardware failures
• File chunks, table regions and metadata are automatically replicated (3x by default)
• At least one replica on a different rack
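The replication rule above (three replicas by default, at least one on a different rack) can be sketched as a small placement function. This is a toy model written for illustration, not MapR's placement algorithm: the function name, the rack and node names, and the round-robin strategy are all assumptions.

```python
from itertools import cycle


def place_replicas(nodes_by_rack: dict[str, list[str]], replication: int = 3) -> list[str]:
    """Pick `replication` distinct nodes, spreading them across racks first."""
    racks = [rack for rack, nodes in nodes_by_rack.items() if nodes]
    total = sum(len(nodes) for nodes in nodes_by_rack.values())
    if total < replication:
        raise ValueError("not enough nodes for the requested replication factor")

    remaining = {rack: list(nodes_by_rack[rack]) for rack in racks}
    chosen: list[str] = []
    # Round-robin over racks: with two or more non-empty racks, the first two
    # replicas always land on different racks.
    for rack in cycle(racks):
        if len(chosen) == replication:
            break
        if remaining[rack]:
            chosen.append(remaining[rack].pop(0))
    return chosen


# Hypothetical two-rack layout
layout = {"rack1": ["n1", "n2"], "rack2": ["n3"]}
print(place_replicas(layout))
```

Round-robin is the simplest strategy that satisfies the "different rack" rule; a real system would also weigh free space and load when choosing nodes.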
Snapshots
• Protect from user and application errors
• Point-in-time recovery
• Redirect on write
• No performance or scale impact
• Read files and tables directly from snapshot
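Redirect on write means that new data is written to a fresh location and the live view is pointed at it, so blocks referenced by a snapshot are never modified in place. The toy class below illustrates only that idea. It is a sketch under stated assumptions, not MapR's implementation: the `Volume` class, its method names, and the per-file granularity are hypothetical.

```python
from typing import Dict, Optional


class Volume:
    """Toy volume: maps file names to immutable blocks of data."""

    def __init__(self) -> None:
        self._live: Dict[str, bytes] = {}
        self._snapshots: Dict[str, Dict[str, bytes]] = {}

    def write(self, name: str, data: bytes) -> None:
        # Redirect on write: the live map is pointed at a new block; any block
        # still referenced by a snapshot is left untouched.
        self._live[name] = data

    def snapshot(self, label: str) -> None:
        # Only the name-to-block map is copied; the blocks themselves are shared.
        self._snapshots[label] = dict(self._live)

    def read(self, name: str, snapshot: Optional[str] = None) -> bytes:
        # Reads go either to the live view or directly to a named snapshot.
        source = self._live if snapshot is None else self._snapshots[snapshot]
        return source[name]


vol = Volume()
vol.write("results.csv", b"version 1")
vol.snapshot("before-change")
vol.write("results.csv", b"version 2")
print(vol.read("results.csv"))
print(vol.read("results.csv", snapshot="before-change"))
```

Because the snapshot shares blocks with the live view, taking it copies no file data, which is the intuition behind the "no performance or scale impact" bullet.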
[Diagram: file chunks C1 to C7, each shown as three replicas spread across nodes]
© 2014 MapR Technologies 23
Hadoop Security
• Authorization to ensure the right access to files and databases
• Authentication for users and user-created job requests
• Encryption to ensure user credentials and data are always secure
• Integration with existing security infrastructure
© 2014 MapR Technologies 24
Integrating Hadoop into the Enterprise
Enterprise-Grade Functionality + Future Proofing
Enterprise Requirements
1. Low TCO
2. Trusted Data
3. Application SLAs
© 2014 MapR Technologies 25
High Availability (HA) Everywhere
Metadata HA
• Distributed metadata can self-heal
• No practical limit on # of files
MapReduce/YARN HA
• Jobs are not impacted by failures
• Meet your data processing SLAs
Instant recovery
• Files and tables are accessible within seconds of a node failure or cluster restart
Rolling upgrades
• Upgrade the software with no downtime
HA is built in
• No special configuration to enable HA
© 2014 MapR Technologies 26
Disaster Recovery: Mirroring
• Flexible
– Choose the volumes/directories to mirror
– You don’t need to mirror the entire cluster
– Active/active
• Fast
– No performance impact
– Automatic compression
• Safe
– Point-in-time consistency
– End-to-end checksums
• Easy
– Graceful handling of network issues
– No third-party software
– Takes less than two minutes to configure!
[Diagram labels: Production, Research, Datacenter 1, Datacenter 2, WAN, EC2]
© 2014 MapR Technologies 27
Integrating Hadoop into the Enterprise
Enterprise-Grade Functionality + Future Proofing
Enterprise Requirements
1. Low TCO
2. Trusted Data
3. Application SLAs
4. Open Standards
© 2014 MapR Technologies 28
Seamless Integration with NFS
• POSIX compliance
– Random reads/writes
– Simultaneous reading and writing to a file
– Compression is automatic and transparent
• Industry-standard NFS interface (in addition to HDFS API)
– Stream data into the cluster
– Leverage thousands of tools and applications
– Easier to use non-Java programming languages
– No need for most proprietary Hadoop connectors
Hadoop
© 2014 MapR Technologies 29
When Hadoop Looks Like a NAS…
• Data ingestion is easy
– Popular online gaming company changed data ingestion from a complex Flume cluster to a 17-line Python script
• Database bulk import/export with standard vendor tools
– Large telco saved $30M on EDW costs (5 years) by leveraging MapR to pre-process and store raw data prior to loading into EDW
• 1000s of applications/tools
– Existing Linux commands, browsers work out of the box
Application servers
$ find . | grep log
$ cp
$ vi results.csv
$ scp
$ tail -f part-00000
Logs
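When the cluster is mounted over NFS, ingestion can be an ordinary file copy, which is why a short Python script can stand in for a dedicated ingestion pipeline. The sketch below is not the gaming company's script. It is a minimal illustration that assumes the cluster is already mounted at some local path; the function name, the `*.log` pattern, and the paths in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def ingest_logs(source_dir: str, cluster_dir: str, pattern: str = "*.log") -> list[Path]:
    """Copy matching files into a directory on the NFS-mounted cluster."""
    destination = Path(cluster_dir)
    destination.mkdir(parents=True, exist_ok=True)

    copied = []
    for log_file in sorted(Path(source_dir).glob(pattern)):
        target = destination / log_file.name
        # An ordinary file copy; the NFS mount turns it into a cluster write.
        shutil.copy2(log_file, target)
        copied.append(target)
    return copied
```

A call such as `ingest_logs("/var/log/app", "/mnt/cluster/ingest/logs")` would copy every `.log` file from the first directory into the second; both paths are placeholders for wherever the application logs live and wherever the cluster is mounted.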
© 2014 MapR Technologies 30
Integrating Hadoop into the Enterprise
Enterprise-Grade Functionality + Future Proofing
Enterprise Requirements
1. Low TCO
2. Trusted Data
3. Application SLAs
4. Open Standards
Future Proofing
1. Freedom of Choice
© 2014 MapR Technologies 31
Pick the
Right Tool
for the Job
© 2014 MapR Technologies 32
Freedom of Choice
Management
MapR Data Platform
APACHE HADOOP AND OSS ECOSYSTEM
Security
YARN
Pig
Cascading
Spark
Batch
Spark Streaming
Storm*
Streaming
HBase
Solr
NoSQL & Search
Juju
Provisioning & Coordination
Savannah*
Mahout
MLLib
ML, Graph
GraphX
MR v1 & v2
EXECUTION ENGINES
DATA GOVERNANCE AND OPERATIONS
Workflow & Data Governance
Tez*
Accumulo*
Hive
Impala
Shark
Drill*
SQL
Sentry*
Oozie
ZooKeeper
Sqoop
Knox*
Whirr
Falcon*
Flume
Data Integration & Access
HttpFS
Hue
* 2014 TIMELINE
© 2014 MapR Technologies 33
Integrating Hadoop into the Enterprise
Enterprise-Grade Functionality + Future Proofing
Enterprise Requirements
1. Low TCO
2. Trusted Data
3. Application SLAs
4. Open Standards
Future Proofing
1. Freedom of Choice
2. Multiple Users
© 2014 MapR Technologies 34
Volumes
100K volumes are OK, create as many as needed
Volumes dramatically simplify management of multiple users:
• Replication factor
• Scheduled mirroring
• Scheduled snapshots
• Data placement control
• User access and tracking
• Administrative permissions
/projects
    /tahoe
    /yosemite
/user
    /msmith
    /bjohnson
© 2014 MapR Technologies 35
Multi-tenancy
Isolation
• Tasks sandboxed so they don’t impact other tasks or system daemons
• System resources protected from runaway jobs
• Volume-based data placement
• Label-based job scheduling
Quotas
• Storage quotas by volume/user/group
• CPU and memory quotas by queue/user/group
Security and delegation
• Wire-level authentication and encryption (Kerberos not required)
• Fine-grained administration permissions including volume-level delegation
• Authenticate users to AD, LDAP and Kerberos via Linux PAM
Reporting
• Detailed reporting on resource usage (75+ different metrics)
• All reports are available via UI, CLI and REST API
© 2014 MapR Technologies 36
Integrating Hadoop into the Enterprise
Enterprise-Grade Functionality + Future Proofing
Enterprise Requirements
1. Low TCO
2. Trusted Data
3. Application SLAs
4. Open Standards
Future Proofing
1. Freedom of Choice
2. Multiple Users
3. Operational Applications
© 2014 MapR Technologies 37
Operations + Analytics on One Platform
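[Diagram: online transactions, fraud detection, personalized offers, clickstream analysis, and a fraud investigation tool share one Hadoop platform that holds the fraud model and the recommendations table. Users shown: fraud investigator, interactive marketer. The two halves are labeled “Real-time Operational Applications” and “Analytics”.]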
© 2014 MapR Technologies 38
Recap
© 2014 MapR Technologies 39
Integrating Hadoop into the Enterprise
Enterprise-Grade Functionality + Future Proofing
Enterprise Requirements
1. Low TCO
2. Trusted Data
3. Application SLAs
4. Open Standards
Future Proofing
1. Freedom of Choice
2. Multiple Users
3. Operational Applications
© 2014 MapR Technologies 40
From Redundant Processing Silos and Data Science Experiments…
Opportunity to Revolutionize Enterprise Data Architecture
© 2014 MapR Technologies 41
… to Consolidated Operational and Analytical Workloads
The Production Enterprise Data Hub
Hadoop
© 2014 MapR Technologies 42
Q&A
@mapr maprtech
nitin@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies

Integrating Hadoop into your enterprise IT environment


Editor's Notes

  • #3 The MapR distribution for Hadoop is globally recognized as the technology leader. Forrester published a Wave for Big Data Hadoop Solutions in which it placed MapR as the highest-ranking product based on current offering as well as roadmap. Cloud: MapR has been selected by two of the companies most experienced with MapReduce technology, which is a testament to the technology advantages of MapR’s distribution. Amazon, through its Elastic MapReduce service (EMR), hosted over 2 million clusters in the past year. Amazon selected MapR to complement EMR as the only commercial Hadoop distribution being offered, sold and supported as a service by Amazon to its customers. Google, the pioneer of MapReduce and the company whose white paper on MapReduce inspired the creation of Hadoop, has also selected MapR, making our distribution available on Google Compute Engine.
  • #5 Hadoop is making CIOs rethink their data architecture. It is a fundamental shift in the economics of data storage, processing, and analytics, and it is opening up entirely new business opportunities. Let’s talk about 3 key trends we are seeing, as well as 2 realities, or implications for your business and its “readiness” to harness the power of big data and Hadoop.
  • #6 The first trend is that the industry leaders have shown how to use big data to compete and win in their markets. It’s no longer a nice-to-have: you need big data to compete. Google pioneered MapReduce processing on commodity hardware and used it to catapult itself into the position of leading search engine, even though it was 19th in the market. Yahoo! leveraged these ideas to create Hadoop to keep up with Google, and many mainstream companies have followed with new data-driven applications such as “people you may know” (started by LinkedIn and now used by Facebook, Twitter, and every social application), product recommendation engines, contextual and personalized music services (Beats), measuring digital media effectiveness (comScore), serving more relevant and targeted ads (Comcast, Rubicon Project), fraud and risk detection, healthcare efficacy, and more. What makes the difference? A lot of attention is given to data science and developing sophisticated new algorithms, but in many cases just having more data beats better algorithms. (Make the point on collecting more consumer interaction data as well as transaction data, as an example.) In addition, competitive advantage is decided by very small percentages. Just a 1% improvement in fraud detection can mean hundreds of millions of dollars in savings. A 0.5% lift in advertising effectiveness means millions in new product sales and profitability. The same can be applied to customer churn, disease diagnosis, and more.
  • #7 A second trend in enterprise architecture is big data overwhelming the existing workload-specific systems that are in production. (The list of requirements for each of these is in the text on the side.) People started with mainframes or operational systems that run ERP, finance, CRM and other mission-critical applications. They require… (pick out the attributes you want to stress on the left). You also have data warehouses, marts, data mining, and other analytical systems that pull data from these operational and other systems to provide insights to the business for decision making. The amount and variety of data has been overloading these systems. As you try to ingest new types of data, you reach a point where these systems are not cost-effective to scale to terabytes or petabytes of data.
  • #8 Hadoop has become the de facto big data platform that allows organizations to keep up with big data and feed data-driven applications and processes. This chart shows the percentage growth of jobs from Indeed.com. Compared to other popular technologies such as MongoDB and Cassandra, Hadoop is not only the fastest-growing big data technology, it’s one of the fastest-growing technologies, period. Hadoop has the most robust ecosystem and momentum and is the big data platform of choice for industry-leading, data-driven companies. (Also of interest: Indeed.com, a subsidiary of a Japanese-owned company, is a customer of MapR. They harness and analyze all of the job trends data using MapR.)
  • #9 Hadoop is being used in lots of different use cases across a variety of industries. One way to think of this is by functional areas of an organization (from left to right: CIO/chief data officer; CMO (marketing); CSO or CRO (chief security or risk officer); and the COO, head of quality, or IT operations). We have many customers in each of these areas. Here are some example customers of MapR (give example snippets of each). You can also put different use cases in each column that are relevant for your customer.
  • #11 The first reality is that as people put Hadoop into production to relieve the pressure from other systems in their enterprise architecture, it needs to be reliable. Hadoop needs to be held to the same enterprise standards as your Oracle, SAP, Teradata, NetApp storage, or any other enterprise system. Many organizations are putting Hadoop into their data center to provide (list of use cases underneath)… It can do all of this and more, but for Hadoop to act as a system of record, it must provide the same guarantees for SLAs, performance, data protection, and more. Most importantly, Hadoop has the potential for both analytics AND operations. It can be used to optimize the data warehouse and provide batch data refining or storage. But Hadoop can also provide many operational analytics or database operations/jobs when done right.
  • #15 Verizon Teradata example. Less than 10% of CDRs analyzed.
  • #16 Push messaging: Starbucks or ESPN applications, and others. MapR is the only software that they pay for. They have HBase committers on staff. They have taken 8 application clusters and moved them into 1 MapR cluster; they have 1 cluster with 8 sub-clusters running on different sets of nodes. Data placement control enables this. They went from 12 CDH servers down to 6, just for HBase tables. (They won’t use M7 since they are HBase committers.)
  • #21 They ran the MinuteSort benchmark, a test that shows how much data you can sort in 1 minute. The previous MinuteSort world record was set by Yahoo by sorting 1.6 terabytes with 2200 nodes. This MapR customer broke the record by sorting 1.65 TB with 298 nodes. That’s about 1/7th the hardware, which translates into tremendous cost, space, and management savings.
  • #29 MapR enables integration by providing industry-standard interfaces. More 3rd party solutions work with MapR than any other distribution. Proprietary connectors not needed. NFS: All file-based applications can read and write data. Examples: Linux utilities, file browsers, Informatica UltraMessaging. ODBC 3.52: All BI applications can leverage Hive. Examples: Excel, Crystal Reports, Tableau, MicroStrategy. Linux PAM: Any authentication provider can be used. Examples: LDAP, Kerberos, 3rd party.
  • #38 Because only MapR can reliably run both operational and analytical applications on one platform/cluster, MapR enables a faster closed-loop process between operational applications and analytics. This means two things. Interactive marketers and algorithms can update the rules engines more quickly and provide more real-time targeting of offers and relevant content to consumers. Fraud models are kept more up to date with the latest patterns, to better detect anomalies and take action more quickly on bad actors.
  • #39 More important than our product is ensuring customer success.
  • #41 MapR creates a new opportunity for enterprises: the opportunity to revolutionize the enterprise data architecture. From ‘redundant processing silos’ and ‘data science experiments’, where you need separate Hadoop clusters for streaming, HDFS/Hive, HBase and more. To…
  • #42 To… a ‘converged data & processing hub’ that provides a TRUE PRODUCTION enterprise data hub. This allows you to consolidate operational and analytical workloads, not only across Hadoop use cases and applications, but also for optimizing your enterprise data architecture.