Introduction to WSO2 Analytics Platform
Srinath Perera
VP Research
WSO2 Inc.
Analytics is Growing Up
▪ It is no longer about doing your first analytics use case.
▪ It is about
▪ How to do it every day, efficiently?
▪ How to recover?
▪ How to make decisions?
▪ How to do other forms, like real-time, interactive, and predictive analytics?
Analytics 2.0 Platform
▪ One platform for all four forms of analytics
▪ Single consistent programming model
▪ One analytics archive format
▪ Support for the lifecycle of analytics apps
Integrate well with the rest of the enterprise!!
Collect Data
▪ One Sensor API to publish events
-  REST, Thrift, JMS, Kafka
-  Java clients, JavaScript clients*
▪ First you define streams (think of a stream as an infinite table in a SQL DB)
▪ Then send events via the Sensor API
Can send to the batch pipeline, the realtime pipeline, or both via configuration!
Collecting Data: Example
§  Java example: create and send events
§  Events are sent asynchronously
§  See the client given in http://goo.gl/vIJzqc for more info
// Initialize Agent
Agent agent = new Agent(agentConfiguration);
AsyncDataPublisher publisher = new AsyncDataPublisher("tcp://hostname:7612", .. );
// Define Stream
StreamDefinition definition = new StreamDefinition(STREAM_NAME, VERSION);
definition.addPayloadData("sid", STRING);
...
publisher.addStreamDefinition(definition);
...
// Send events
Event event = new Event();
event.setPayloadData(eventData);
publisher.publish(STREAM_NAME, VERSION, event);
Analysis: Batch Analytics
Complex Event Processing
Analytics logic with SQL-like Queries
▪ Both BAM and CEP provide a SQL-like data processing language
▪ Since many people understand SQL, these languages have made large-scale (Big Data) processing accessible to many
▪ Expressive, short, and sweet.
▪ Define core operations that cover 90% of problems
▪ Let experts dig in when they like! (via user-defined functions)
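To make the idea of a short declarative query concrete, here is a minimal plain-Java sketch of what a query along the lines of "keep events whose price exceeds a limit, hold the last N of them, and emit their average" has to do underneath. It is not the BAM or CEP query language; the class name `LengthWindowAverage`, the `price` attribute, and the window semantics are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a filter + length-window + average query.
// Not the BAM/CEP query language; names and semantics are illustrative only.
class LengthWindowAverage {
    private final int windowSize;
    private final double minPrice;
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum = 0.0;

    LengthWindowAverage(int windowSize, double minPrice) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
        this.minPrice = minPrice;
    }

    // Feeds one event and returns the average over the current window,
    // or NaN while no event has passed the filter yet.
    double onEvent(double price) {
        if (price > minPrice) {                // the filter ("where") step
            window.addLast(price);
            sum += price;
            if (window.size() > windowSize) {  // evict the oldest event
                sum -= window.removeFirst();
            }
        }
        return window.isEmpty() ? Double.NaN : sum / window.size();
    }
}
```

A query language expresses the same logic in one or two lines, which is the accessibility argument made above; the user-defined function hook is where experts would plug in logic that the core operators do not cover.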
Scaling CEP Queries on top of Storm
▪ Accepts CEP queries with hints about how to partition streams
▪ Partitions the streams, builds an Apache Storm topology running CEP nodes as Storm Spouts, and runs it (see http://goo.gl/pP3kdX)
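The partitioning hint can be pictured as a key-to-node mapping: every event carrying the same partition key must reach the same CEP node so that per-key state stays in one place. The sketch below shows hash partitioning in plain Java under that assumption. `StreamPartitioner` and its methods are hypothetical names; this is not the Storm API or the WSO2 implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of hash-partitioning a stream by key across CEP nodes.
class StreamPartitioner {
    // Maps a partition key to a node index in [0, nodeCount).
    static int nodeFor(String key, int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    // Splits a batch of keys into one bucket per node.
    static List<List<String>> partition(List<String> keys, int nodeCount) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(nodeFor(key, nodeCount)).add(key);
        }
        return buckets;
    }
}
```

Because the mapping depends only on the key, a query that keeps state per key (for example, per sensor) can run independently on each node.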
Predictive Analytics
▪ Predictive analytics learns a decision function (a model) from examples
▪ Is this fraud?
▪ How to drive?
▪ Handwritten text
▪ Build models and use them with WSO2 CEP, BAM, and ESB using the WSO2 Machine Learner product (2015 Q3)
▪ Build models using R, export them as PMML, and use them within WSO2 CEP
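As a toy illustration of "learning a decision function from examples", the sketch below trains a one-feature threshold model for the fraud question: it places the threshold midway between the mean amount of the fraud examples and the mean amount of the normal examples. This is a deliberately simple stand-in, not the algorithm used by WSO2 Machine Learner, R, or PMML; the class `ThresholdModel` and the assumption that fraudulent amounts tend to be larger are illustrative only.

```java
// Hypothetical one-feature model: predicts "fraud" when the amount exceeds
// a threshold learned from labeled examples. Illustrative only.
class ThresholdModel {
    private final double threshold;

    ThresholdModel(double threshold) {
        this.threshold = threshold;
    }

    double threshold() {
        return threshold;
    }

    // The learned decision function.
    boolean predict(double amount) {
        return amount > threshold;
    }

    // Learns the threshold as the midpoint between the two class means.
    // Assumes fraud examples have larger amounts than normal ones.
    static ThresholdModel train(double[] amounts, boolean[] isFraud) {
        if (amounts.length != isFraud.length) {
            throw new IllegalArgumentException("one label per example is required");
        }
        double fraudSum = 0.0;
        double normalSum = 0.0;
        int fraudCount = 0;
        int normalCount = 0;
        for (int i = 0; i < amounts.length; i++) {
            if (isFraud[i]) {
                fraudSum += amounts[i];
                fraudCount++;
            } else {
                normalSum += amounts[i];
                normalCount++;
            }
        }
        if (fraudCount == 0 || normalCount == 0) {
            throw new IllegalArgumentException("examples of both classes are required");
        }
        return new ThresholdModel((fraudSum / fraudCount + normalSum / normalCount) / 2.0);
    }
}
```

The split between `train` (done offline on examples) and `predict` (applied to each incoming event) mirrors the build-then-use workflow described above.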
WSO2 Machine Learner
▪ A wizard to sample, explore, and understand data through visualizations
▪ A wizard to configure and train machine learning models, and select the best model
▪ Find and use those models with WSO2 CEP, BAM, and ESB
▪ Powered by Apache Spark MLlib
Communicate: Dashboards
▪ The idea is to give an “overall idea” at a glance (e.g. a car dashboard)
▪ Support for personalization: you can build your own dashboard.
▪ Also the entry point for drill-down
▪ How to build?
-  Dashboard via Google Gadgets and content via HTML5 + JavaScript
-  Use charting libraries like Vega or D3
Communicate: Alerts
▪ Detecting conditions can be done via CEP queries
▪ Key is the “Last Mile”
-  Email
-  SMS
-  Push notifications to a UI
-  Pager
-  Trigger physical Alarm
▪ How?
-  Select the Email sender “Output Adaptor” from CEP, or send from CEP to ESB, which has many connectors
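The split between detecting a condition and delivering it over the "last mile" can be sketched as a rule that fans a message out to pluggable channels. The code below is a plain-Java illustration only; `AlertRule` is a hypothetical class, and each channel stands in for an output adaptor or ESB connector rather than reproducing any WSO2 API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.DoublePredicate;

// Hypothetical sketch: a detection condition plus pluggable delivery channels.
class AlertRule {
    private final DoublePredicate condition;
    private final List<Consumer<String>> channels = new ArrayList<>();

    AlertRule(DoublePredicate condition) {
        this.condition = condition;
    }

    // Registers a delivery channel (email, SMS, push, ...).
    void addChannel(Consumer<String> channel) {
        channels.add(channel);
    }

    // Evaluates one reading; on a match, sends the alert to every channel.
    // Returns true if an alert was raised.
    boolean onReading(String sensorId, double value) {
        if (!condition.test(value)) {
            return false;
        }
        String message = "ALERT " + sensorId + " value=" + value;
        for (Consumer<String> channel : channels) {
            channel.accept(message);
        }
        return true;
    }
}
```

Keeping the channels behind one small interface means a new delivery mechanism can be added without touching the detection logic.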
Communicate: APIs
▪ With mobile apps, most data is exposed and shared as APIs (REST/JSON) to end users.
▪ Need to expose analytics results as APIs
▪ Some of the challenges:
-  Security and permissions
-  API discovery
-  Billing, throttling, quotas & SLA
▪ How?
-  Write data to a database from CEP event tables
-  Build services via WSO2 Data Service
-  Expose them as APIs via API Manager
Event Stream Store
▪ One-stop place for all event stream definitions
▪ Lets users
▪  Publish and consume through multiple protocols like REST, JMS, Thrift, WebSockets, etc.
▪  Discover event streams
▪  Enforce security and authorization
▪  Pay-per-use subscriptions
▪  Effectively an Event Stream Marketplace!!
▪ This will automate the API creation discussed in the previous slide.
What is it good for?
▪ Batch Analytics
▪ Realtime Streaming analytics
▪ Realtime Interactive analytics
▪ Lambda Architecture
▪ Train and use a ML model
▪ Selective Detailed Analysis
Selective Detailed Analysis
•  Too expensive to do detailed analysis on all the data
•  Instead, detect the condition and dig into related data
•  Fraud toolbox
•  Other use cases
–  Dynamic offers at a retail site
–  Weather
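A minimal sketch of the pattern, assuming a cheap detector that runs on every event and an expensive analysis that runs only on the events it flags. `SelectiveAnalysis` and its parameters are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch: cheap detection on all events, costly analysis on few.
class SelectiveAnalysis {
    // Runs detailedAnalysis only for events accepted by cheapDetector.
    static <E, R> List<R> analyze(List<E> events,
                                  Predicate<E> cheapDetector,
                                  Function<E, R> detailedAnalysis) {
        List<R> results = new ArrayList<>();
        for (E event : events) {
            if (cheapDetector.test(event)) {
                results.add(detailedAnalysis.apply(event));
            }
        }
        return results;
    }
}
```

In a fraud setting the detector could be a simple rule evaluated in real time, while the detailed step pulls in the related history for just the flagged cases.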
Lambda Architecture
•  Same code in both batch and realtime layers
•  Idea is to fill the time between two batch runs
•  Batch layer writes the data to a DB
•  Realtime layer merges with batch data via Event Tables
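The merge step can be sketched as a view that answers queries from the last batch result plus whatever the realtime layer has counted since. The code below is a simplified plain-Java illustration: `LambdaView` is a hypothetical class, the in-memory maps stand in for the DB and the event tables, and it assumes each batch run covers every event seen before it completes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a Lambda-style count: batch view + realtime delta.
class LambdaView {
    private final Map<String, Long> batchView = new HashMap<>();
    private final Map<String, Long> realtimeDelta = new HashMap<>();

    // Realtime layer: counts each event as it arrives.
    void onRealtimeEvent(String key) {
        realtimeDelta.merge(key, 1L, Long::sum);
    }

    // Batch layer: a finished run replaces the batch view. The realtime delta
    // is reset because (by assumption) the run already covers those events.
    void onBatchRunComplete(Map<String, Long> recomputedCounts) {
        batchView.clear();
        batchView.putAll(recomputedCounts);
        realtimeDelta.clear();
    }

    // Query: merges both layers so results stay current between batch runs.
    long count(String key) {
        return batchView.getOrDefault(key, 0L) + realtimeDelta.getOrDefault(key, 0L);
    }
}
```

The realtime delta is exactly the "fill the time between two batch runs" part: it only ever holds events newer than the last completed batch.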
Real Life Use Cases
▪ Health, Smart Parking solutions
▪ Financial Monitoring
▪ Smart City project, Vehicle tracking, Building monitoring
▪ Railway monitoring
▪ Throttling and Anomaly Detection
▪ API Analytics (13+ customers)
▪ Connected Car
Case Study: DEBS Grand Challenges
▪ The DEBS (Distributed Event-Based Systems) Grand Challenge is a yearly event processing challenge.
▪ 2014 Challenge:
▪ Smart Home electricity data: 2000 sensors, 40 houses, 4 billion events. We posted 400K events/sec, and close to one million events/sec distributed throughput with 4 nodes.
▪ One of the four finalists
▪ 2015 Challenge:
▪ Based on taxi activities collected from New York City over the year 2013: 14,144 taxis, 173 million taxi trip records. We posted 300K events/sec on a single node and were one of the finalists.
https://www.flickr.com/photos/shedboy/3681317392/
Case Study: Realtime Soccer Analysis
Watch at:
https://www.youtube.com/watch?v=nRI6buQ0NOM
Case Study: TFL Traffic Analysis
Built using TFL (Transport for London) open data feeds.
http://goo.gl/04tX6k
http://goo.gl/9xNiCm
Select the Product
Product | Features
WSO2 Data Analytics Server (DAS) | Everything: Batch, Realtime, Interactive, and Predictive Analytics
WSO2 Complex Event Processor (CEP) | Realtime Analytics only
WSO2 Machine Learner | Predictive Analytics only
Questions?
Thank You

WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
