Data in Action
Natalino Busa
Pleased to meet you!
Italian by birth (43)
In the Netherlands since 1998
Married, 3 kids (Ryan, Nemo, Maya)
Likes: Movies, Music, Photography, Design, Fashion,
Windsurfing, Architecture, Traveling, Salsa, Sailing.
On the web: https://www.google.nl/search?q=natalino+busa
1/3 slide history of me:
Researcher:
Compilers, VLIW processors, Multimedia
MPEG4, Assisted driving, Audio compression
2/3 slide history of me:
Product Development:
VLSI layout optimization,
DRC rule checking & correction
Lots of math:
Simplex_algorithm
Computational_geometry
Lots of traveling:
Japan, South Korea, US, Germany
3/3 slide history of me:
Big Data and Data Science:
Two verticals:
- Music, Video industry (broadcasters)
- Banking and Finance
Distributed computing
Microservices
Anomaly detection,
Time series prediction
about:
how to grok data with machines
and generate business value
data is the fabric of our
lives
all our relationships are based on data
Data is the new ...
Data as a Relationship
Trust
Transparency of Use
Customer First
Regulations and Laws
Respect and Protect
Providing a Service
Help the customer
Propose, Advise, Select, Filter, Connect, Simplify
Protect the customer
Detect, Prevent, Alert, Block, Defend, Identify, Authorize
Actionable Data, Ethical approach
Techs
● Massive multi-rack systems
● 100’s of Computing Cores
● 100’s Terabytes of Storage
● Distributed computing
● Shared-nothing Architecture
● Advanced Query Plans
● Columnar Data Models
● Re-programmable hardware
The Advent of MPP OLAPs (Early 00s)
● Simpler programming paradigm
● Distributed, Replicated File System
Map-Reduce and Hadoop (Early 00s)
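The "simpler programming paradigm" can be sketched in a few lines. This is a single-process illustration of the map, shuffle and reduce phases on a word count, not Hadoop's API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this sketch.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["data in action", "Data is data"]))
```

In a real cluster the map and reduce calls run on different machines and the shuffle moves data over the network; here everything stays in one process to keep the idea visible.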
Streaming and Real-Time Analytics
Real Time APIs
Streaming Data
Data Sources,
Files, DB extracts
Batched Data
Training, Scoring and Exposing models
in-memory computing
is winning!
Spark is emerging as an
improved, faster, better,
“new” Hadoop.
https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html
The RAM is the new Disk
Unified Distributed
Computing paradigm:
SQL,
Statistics
Machine Learning
Graph Analytics
Polyglot
Programming:
R
Python
Scala
Java
Integrated Data Science
https://spark.apache.org/
Spark
Streaming SQL MLlib GraphX
Analytics, Statistics, Data
Science, Model Training
HDFS NoSQL SQL
Data Sources
Map-Reduce
HDFS
Kafka
Spark: Hadoop evolved
Kafka + Spark + Cassandra + Akka
(NoSQL stack, Fast Data)
MPP + HDFS + Spark
(“new” Hadoop / Data Lake)
Popular Operational Analytics Stacks (10s)
Science
Exploratory Data Analysis
In 1977, Tukey published Exploratory Data Analysis,
arguing that more emphasis needed to be placed on using
data to suggest hypotheses to test and that Exploratory
Data Analysis and Confirmatory Data Analysis “can—and
should—proceed side by side.”
Analytics goes mainstream (70s, 80s)
Knowledge Discovery in Databases (1996)
http://flowingdata.com/2009/06/04/rise-of-the-data-scientist/
http://medriscoll.com/post/4740157098/the-three-sexy-skills-of-data-geeks
The Rise of the Data Scientist (00s)
Applications: Kickstarter
Little demo.
you’re not selling analytics, you’re selling biz development !!!
Advice on how to set up a better Kickstarter page
Projections for investors and B-series angels
Prediction based on Big Data
PredPol: Los Angeles police
department and University of
California
13 Million recorded crimes
Prediction based on historical
data, geo location and recency
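A minimal sketch of how historical data, geolocation and recency could be combined into one score. This is not PredPol's method: it simply bins events into a grid and weights each event by an exponential decay on its age. The function names, the cell size and the half-life are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def hotspot_scores(events, now, cell_size=0.01, half_life_days=30.0):
    """Score grid cells by recency-weighted event counts.

    events: iterable of (lat, lon, day) tuples, with day as a number
    (for example, days since an arbitrary start date).
    An event loses half of its weight every half_life_days.
    """
    decay = math.log(2) / half_life_days
    scores = defaultdict(float)
    for lat, lon, day in events:
        # Geolocation: map the coordinates onto a coarse grid cell.
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        # Recency: older events contribute less to the cell score.
        age = max(0.0, now - day)
        scores[cell] += math.exp(-decay * age)
    return dict(scores)


def top_cells(scores, k=3):
    """Return the k highest-scoring cells as (cell, score) pairs."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
```

With a 30-day half-life, an event from today counts as 1.0, one from 30 days ago as 0.5, and one from 60 days ago as 0.25.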
Prediction based on geolocated events
Clustering geolocated data
using Spark and DBSCAN
How to group users’ events using machine learning and distributed computing
By Natalino Busa
January 28, 2016
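For readers who have not seen DBSCAN, here is a small single-machine sketch of the algorithm. It is not the Spark implementation from the article: it uses plain Euclidean distance (real latitude/longitude data would more likely need a great-circle distance) and a brute-force neighbor search. The names dbscan, region_query and NOISE are chosen for this sketch.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already visited
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may still become a border point later
            continue
        # points[i] is a core point: start a new cluster and expand it.
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
    return labels


if __name__ == "__main__":
    events = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(events, eps=1.5, min_pts=3))
```

The two parameters do the work: eps sets how close two events must be to count as neighbors, and min_pts sets how many neighbors make a dense region. Isolated events end up labeled as noise instead of being forced into a cluster.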
Data Science: Interpretability vs Accuracy
Traditional
Regression
ARIMA
ANOVA
Naive Bayes
Decision Trees
Splines
Modern
Interpretability prevails
Random Forests
Cons: feature engineering
Modern
Accuracy prevails
Neural Networks
Cons: hard to explain why
Modern
Clustering
SVMs
Gaussian Processes
Bagging
Boosting
- Deep Learning
ConvNets and the rebirth of neural networks
Deep Learning to assist doctors treating and classifying cancer
http://www.enlitic.com/
Cancer Classification and Treatment
DL4J
http://deeplearning4j.org/
Theano
http://deeplearning.net/software/theano/
TensorFlow
http://tensorflow.org/
Data Science: Deep Learning
t-SNE and dimensionality reduction
https://www.oreilly.com/learning/an-illustrated-introduction-to-the-t-sne-algorithm
- Topological Data Analysis
Analyze high-dimensional data, visually
http://datarefiner.com/
Analysis of Netflix Prize Dataset.
Dataset statistics:
● 100,480,507 ratings
● 480,189 users
● 17,770 movies
● 2.8 GB CSV file size
Topological Data Analysis
Fraud Detection and Graph Analysis
“ Find those cases where the doctor or the
examiner is also a participant in another case,
and phone numbers have been reused ”
Subgraph isomorphism problem
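The quoted query describes one fixed pattern, so it can be sketched without a general subgraph-isomorphism solver (the general problem is NP-complete). The example below checks every ordered pair of cases for that pattern. The data layout, the CASES records and the function name suspicious_pairs are hypothetical; all names and phone numbers are invented.

```python
from itertools import permutations

# Hypothetical toy data: every name, role and phone number is invented.
CASES = {
    "case-1": {"doctor": "ann", "examiner": "bob",
               "participants": {"carl", "dina"},
               "phones": {"555-0101", "555-0102"}},
    "case-2": {"doctor": "eve", "examiner": "fred",
               "participants": {"ann", "gus"},
               "phones": {"555-0102", "555-0103"}},
    "case-3": {"doctor": "hal", "examiner": "ivy",
               "participants": {"jon"},
               "phones": {"555-0104"}},
}


def suspicious_pairs(cases):
    """Find (case_a, case_b, person, shared_phones) tuples where the doctor
    or examiner of case_a is a participant in case_b and the two cases
    share at least one phone number."""
    hits = []
    for a, b in permutations(cases, 2):
        officials = {cases[a]["doctor"], cases[a]["examiner"]}
        overlap = officials & cases[b]["participants"]
        shared = cases[a]["phones"] & cases[b]["phones"]
        if overlap and shared:
            for person in sorted(overlap):
                hits.append((a, b, person, sorted(shared)))
    return hits


if __name__ == "__main__":
    for hit in suspicious_pairs(CASES):
        print(hit)
```

Comparing all pairs of cases grows quadratically with the number of cases, which is one reason a graph database or a distributed graph engine becomes attractive at scale.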
Takeaways
Think Business and ROI
Better Interaction
Streaming Computing: Data in Action!
Better predictions and models:
Machine Learning + SQL
Better summarization and Feature Extraction
Deep Learning
Better Analysis of Relations
Graph Databases and Algorithms
