Ask Bigger Questions with Cloudera and Apache Hadoop
Graham Gear
graham@cloudera.com
June 2013

Data Has Changed in the Last 30 Years
[Chart: DATA GROWTH, 1980 to 2012. Labels: end-user applications, the internet, mobile devices, sophisticated machines. Structured data: 10%; unstructured data: 90%.]

Data Management Strategies Have Stayed the Same
• Raw data on SAN, NAS and tape
• Data moved from storage to compute
• Relational models with predesigned schemas

Too Much Data, Too Many Sources
• Can’t ingest fast enough
• Costs too much to store
• Exists in different places
• Archived data is lost

Can’t Use It the Way You Want To
• Analysis and processing take too long
• Data exists in silos
• Can’t ask new questions
• Can’t analyze unstructured data

Transform the Way You Think About Data
Cloudera

Ask Bigger Questions
When customer x visits my store, what can I recommend based on their recent web behavior across our various brand websites?
What is the best location in North America to efficiently produce both tomato plants and corn?
What does every fraudulent activity in the last 2 years have in common that will help us identify and proactively prevent the next incident?
Are hotel room sales at Christmas slow because of inventory or competitive pricing?

What did customer x view on their last website visit?
What makes tomato plants more fruitful than others?
What incidents of fraud did we detect last year?
What search terms are used most often when looking for hotels in NYC?

The Cloudera Approach
Meet enterprise demands with a new way to think about data.

THE OLD WAY: COMPLEX, FRAGMENTED, COSTLY
Multiple platforms for multiple workloads
• Data silos by department or LOB
• Lots of data stored in expensive specialized systems
• Analysts pull select data into EDW
• No one has a complete view

THE CLOUDERA WAY: SIMPLIFIED, UNIFIED, EFFICIENT
Single data platform to support BI, Reporting & App Serving
• Bulk of data stored on scalable low cost platform
• Perform end-to-end workflows
• Specialized systems reserved for specialized workloads
• Provides data access across departments or LOB

Cloudera Enterprise: The Platform for Big Data
A Revolutionary Solution Built on Apache Hadoop
INGEST, STORE, EXPLORE, PROCESS, ANALYZE, SERVE
Components: CDH, CLOUDERA MANAGER, CLOUDERA NAVIGATOR, CLOUDERA SUPPORT
• BRINGS STORAGE & COMPUTE TOGETHER
• WORKS WITH EVERY TYPE OF DATA
• CHANGES THE ECONOMICS OF DATA MANAGEMENT

Cloudera Enterprise
Includes Advanced System Management & Support for the Core CDH Projects

CDH: 100% OPEN SOURCE HADOOP DISTRIBUTION
• CORE PROJECTS: HDFS, MAPREDUCE, FLUME, HCATALOG, HIVE, HUE, MAHOUT, OOZIE, PIG, SQOOP, WHIRR, ZOOKEEPER
• PREMIUM PROJECTS: HBASE, IMPALA, SEARCH (BETA)
• CONNECTORS: MICROSTRATEGY, NETEZZA, ORACLE, QLIKVIEW, TABLEAU, TERADATA

CLOUDERA MANAGER: END-TO-END SYSTEM MANAGEMENT
• DEPLOYMENT, MONITORING, API, SNMP, CONFIG ROLLBACKS, PHONE HOME
• SERVICE MGMT, DIAGNOSTICS, ROLLING UPGRADES, LDAP, REPORTING, BACKUP/DR

CLOUDERA NAVIGATOR: END-TO-END DATA MANAGEMENT
• ACCESS MGMT, DATA AUDIT

CLOUDERA SUPPORT: BEST-IN-CLASS TECHNICAL SUPPORT, COMMUNITY ADVOCACY & INDEMNIFICATION

Subscription components shown: CORE HADOOP PROJECTS, CLOUDERA MANAGER, CLOUDERA NAVIGATOR, HBASE, IMPALA, SEARCH

RTD Subscription
Includes Support & Indemnity for Apache HBase

RTQ Subscription
Includes Support & Indemnity for Cloudera Impala

RTS Subscription
Includes Support & Indemnity for Cloudera Search

BDR Subscription
Includes Centralized Management for Disaster Recovery Workflows

Navigator Subscription
Enables Cloudera Navigator for Automated Data Management

Customer Case Studies

A multinational bank saves millions by optimizing DW for analytics & reducing data storage costs by 99%.
Ask Bigger Questions: How can we optimize our data warehouse investment?

Cloudera optimizes the EDW, saves millions
The Challenge:
• Teradata EDW at capacity: ETL processes consume 7 days; takes 5 weeks to make historical data available for analysis
• Performance issues in business critical apps; little room for discovery, analytics, ROI from opportunities
The Solution:
• Cloudera Enterprise offloads data storage, processing & some analytics from EDW
• Teradata can focus on operational functions & analytics
Multinational bank saves millions by optimizing existing DW for analytics & reducing data storage costs by 99%.

A semiconductor manufacturer uses predictive analytics to take preventative action on chips likely to fail.
Ask Bigger Questions: Which semiconductor chips will fail?

Cloudera enables better predictions
The Challenge:
• Want to capture more granular and historical data for more accurate predictive yield modeling
• Storing 9 months’ data on Oracle is expensive
The Solution:
• Dell | Cloudera solution for Apache Hadoop
• 53 nodes; plan to store up to 10 years (~10PB)
• Capturing & processing data from each phase of manufacturing process
Semiconductor manufacturer can prevent chip failure with more accurate predictive yield models.
CONFIDENTIAL - RESTRICTED

The quant risk LOB within a multinational bank saves millions through better risk exposure analysis & fraud prevention.
Ask Bigger Questions: How can we prevent fraud?

Cloudera delivers savings through fraud prevention
The Challenge:
• Fraud detection is a cumbersome, multi-step analytic process requiring data sampling
• 2B transactions/month necessitate constant revisions to risk profiles
• Highly tuned 100TB Teradata DW drives over-budget capital reserves & lower investment returns
The Solution:
• Cloudera Enterprise data factory for fraud prevention, credit & operational risk analysis
• Look at every incidence of fraud for 5 years for each person
• Reduced costs; expensive CPU no longer consumed by data processing
Quant risk LOB in multinational bank saves millions through better risk exposure analysis & fraud prevention.

BlackBerry eliminates data sampling & simplifies data processing for better, more comprehensive analysis.
Ask Bigger Questions: How do we retain customers in a competitive market?

Cloudera delivers ROI through storage alone
The Challenge:
• BlackBerry Services generates 0.5PB (50-60TB compressed) data per day
• RDBMS is expensive: limited to 1% data sampling for analytics
The Solution:
• Cloudera Enterprise manages global data set of ~100PB
• Collecting device content, machine-generated log data, audit details
• 90% ETL code base reduction
BlackBerry can analyze all their data vs. relying on 1% sample for better network capacity trending & management.

A global retailer’s customers benefit from more personalized communications and offers based on interactions across all channels.
Ask Bigger Questions: How can we offer customers the best experience?

Cloudera optimizes the DW for improved ROI
The Challenge:
• Need to correlate online/offline data across disparate, costly legacy DWs
• It takes up to 4 weeks to get data from one group, which inhibits productivity
The Solution:
• Cloudera Enterprise with Impala: 1PB over 250 nodes
• Consolidated platform for Big Data with single environment for query and machine learning
Global retailer’s customers benefit from more personalized communications based on interactions across all channels.

Any Questions, Big or Small?


Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013

  • 1. 1 Ask  Bigger  Ques,ons   with  Cloudera   and  Apache  Hadoop   Graham  Gear   [email protected]   JUNE  2013      
  • 2. Data  Has  Changed  in  the  Last  30  Years  DATA  GROWTH   END-­‐USER   APPLICATIONS   THE  INTERNET   MOBILE  DEVICES   SOPHISTICATED   MACHINES   STRUCTURED  DATA  –  10%   1980   2012   UNSTRUCTURED  DATA  –  90%  
  • 3. Data  Management  Strategies   Have  Stayed  the  Same     •  Raw  data  on  SAN,  NAS   and  tape     •  Data  moved  from   storage  to  compute     •  Rela,onal  models  with   predesigned  schemas  
  • 4. Too Much Data, Too Many Sources • Can’t ingest fast enough
  • 5. Too Much Data, Too Many Sources • Can’t ingest fast enough • Costs too much to store
  • 6. Too Much Data, Too Many Sources • Can’t ingest fast enough • Costs too much to store • Exists in different places
  • 7. Too Much Data, Too Many Sources • Can’t ingest fast enough • Costs too much to store • Exists in different places • Archived data is lost
  • 8. Can’t Use It The Way You Want To • Analysis and processing takes too long
  • 9. Can’t Use It The Way You Want To • Analysis and processing takes too long • Data exists in silos
  • 10. Can’t Use It The Way You Want To • Analysis and processing takes too long • Data exists in silos • Can’t ask new questions
  • 11. Can’t Use It The Way You Want To • Analysis and processing takes too long • Data exists in silos • Can’t ask new questions • Can’t analyze unstructured data
  • 12. Transform The Way You Think About Data. Cloudera
  • 13. Ask Bigger Questions • When customer x visits my store, what can I recommend based on their recent web behavior across our various brand websites? • What is the best location in North America to efficiently produce both tomato plants and corn? • What does every fraudulent activity in the last 2 years have in common that will help us identify and proactively prevent the next incident? • Are hotel room sales at Christmas slow because of inventory or competitive pricing? • What did customer x view on their last website visit? • What makes tomato plants more fruitful than others? • What incidents of fraud did we detect last year? • What search terms are used most often when looking for hotels in NYC?
  • 14. The Cloudera Approach: Meet enterprise demands with a new way to think about data. THE OLD WAY (multiple platforms for multiple workloads): complex, fragmented, costly • Data silos by department or LOB • Lots of data stored in expensive specialized systems • Analysts pull select data into EDW • No one has a complete view. THE CLOUDERA WAY (single data platform to support BI, reporting & app serving): simplified, unified, efficient • Bulk of data stored on scalable low-cost platform • Perform end-to-end workflows • Specialized systems reserved for specialized workloads • Provides data access across departments or LOB
  • 15. Cloudera Enterprise: The Platform for Big Data. A Revolutionary Solution Built on Apache Hadoop. [Diagram: CDH, Cloudera Manager, Cloudera Navigator, and Cloudera Support spanning ingest, store, explore, process, analyze, serve.] • Brings storage & compute together • Works with every type of data • Changes the economics of data management
  • 16. Cloudera Enterprise: Includes Advanced System Management & Support for the Core CDH Projects • CDH (100% open source Hadoop distribution): core projects HDFS, MapReduce, Flume, HCatalog, Hive, Hue, Mahout, Oozie, Pig, Sqoop, Whirr, ZooKeeper; premium projects HBase, Impala, Search (beta); connectors MicroStrategy, Netezza, Oracle, QlikView, Tableau, Teradata • Cloudera Manager (end-to-end system management): deployment, monitoring, API, SNMP, config rollbacks, phone home, service mgmt, diagnostics, rolling upgrades, LDAP, reporting, backup/DR • Cloudera Navigator (end-to-end data management): access mgmt, data audit • Cloudera Support: best-in-class technical support, community advocacy & indemnification
  • 17. RTD Subscription: Includes Support & Indemnity for Apache HBase. [Cloudera Enterprise component diagram, as on slide 16]
  • 18. RTQ Subscription: Includes Support & Indemnity for Cloudera Impala. [Cloudera Enterprise component diagram, as on slide 16]
  • 19. RTS Subscription: Includes Support & Indemnity for Cloudera Search. [Cloudera Enterprise component diagram, as on slide 16]
  • 20. BDR Subscription: Includes Centralized Management For Disaster Recovery Workflows. [Cloudera Enterprise component diagram, as on slide 16]
  • 21. Navigator Subscription: Enables Cloudera Navigator for Automated Data Management. [Cloudera Enterprise component diagram, as on slide 16]
  • 23. Ask Bigger Questions: How can we optimize our data warehouse investment? A multinational bank saves millions by optimizing DW for analytics & reducing data storage costs by 99%.
  • 24. Cloudera optimizes the EDW, saves millions. The Challenge: • Teradata EDW at capacity: ETL processes consume 7 days; takes 5 weeks to make historical data available for analysis • Performance issues in business-critical apps; little room for discovery, analytics, ROI from opportunities. The Solution: • Cloudera Enterprise offloads data storage, processing & some analytics from EDW • Teradata can focus on operational functions & analytics. Multinational bank saves millions by optimizing existing DW for analytics & reducing data storage costs by 99%.
  • 25. Ask Bigger Questions: Which semiconductor chips will fail? A semiconductor manufacturer uses predictive analytics to take preventative action on chips likely to fail.
  • 26. Cloudera enables better predictions. The Challenge: • Want to capture more granular and historical data for more accurate predictive yield modeling • Storing 9 months’ data on Oracle is expensive. The Solution: • Dell | Cloudera solution for Apache Hadoop • 53 nodes; plan to store up to 10 years (~10PB) • Capturing & processing data from each phase of manufacturing process. Semiconductor manufacturer can prevent chip failure with more accurate predictive yield models.
  • 27. Ask Bigger Questions: How can we prevent fraud? The quant risk LOB within a multinational bank saves millions through better risk exposure analysis & fraud prevention.
  • 28. Cloudera delivers savings through fraud prevention. The Challenge: • Fraud detection is a cumbersome, multi-step analytic process requiring data sampling • 2B transactions/month necessitate constant revisions to risk profiles • Highly tuned 100TB Teradata DW drives over-budget capital reserves & lower investment returns. The Solution: • Cloudera Enterprise data factory for fraud prevention, credit & operational risk analysis • Look at every incident of fraud for 5 years for each person • Reduced costs; expensive CPU no longer consumed by data processing. Quant risk LOB in multinational bank saves millions through better risk exposure analysis & fraud prevention.
  • 29. Ask Bigger Questions: How do we retain customers in a competitive market? BlackBerry eliminates data sampling & simplifies data processing for better, more comprehensive analysis.
  • 30. Cloudera delivers ROI through storage alone. The Challenge: • BlackBerry Services generates 0.5PB (50-60TB compressed) data per day • RDBMS is expensive: limited to 1% data sampling for analytics. The Solution: • Cloudera Enterprise manages global data set of ~100PB • Collecting device content, machine-generated log data, audit details • 90% ETL code base reduction. BlackBerry can analyze all their data vs. relying on 1% sample for better network capacity trending & management.
  • 31. Ask Bigger Questions: How can we offer customers the best experience? A global retailer’s customers benefit from more personalized communications and offers based on interactions across all channels.
  • 32. Cloudera optimizes the DW for improved ROI. The Challenge: • Need to correlate online/offline data across disparate, costly legacy DWs • Takes up to 4 weeks to get data from one group, which inhibits productivity. The Solution: • Cloudera Enterprise with Impala: 1PB over 250 nodes • Consolidated platform for Big Data with single environment for query and machine learning. Global retailer’s customers benefit from more personalized communications based on interactions across all channels.
  • 33. Any Questions, Big or Small?