
Week 1 Lecture 2

The document discusses the differences between batch and real-time data processing, highlighting their respective features and use cases in analytics. It also covers the limitations of MapReduce in Hadoop, the architecture of big data platforms, and the technologies involved in data ingestion, storage, processing, and visualization. Additionally, it presents a case study on Netflix's use of big data and data science to enhance user experience and recommend content.

Uploaded by

parth25stat

Data Analytics

Landscape

6/25/22 11:59 AM 1
Data Processing

2
Batch Vs Real-time Processing

Batch Processing Pipeline: Data -> Storage -> Analytics -> Insight

Real-time Processing Pipeline: Data -> Analytics -> Insight / Storage -> Consumer

3
Batch Processing vs Real-Time Processing
The features below compare batch and real-time analytics in enterprise use cases.

Batch Processing
• A large group of data/transactions is processed in a single run.
• Jobs run without any manual intervention.
• The entire data set is pre-selected and fed in using command-line parameters and scripts.
• It is used to execute multiple operations, handle heavy data loads, reporting, and offline data workflows.
• Example: regular reports that require decision making

Real-Time Processing
• Data processing takes place instantaneously upon data entry or command receipt.
• It must respond within stringent time constraints.
• Example: fraud detection
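
The contrast between the two styles can be sketched in a few lines of Python. This is only an illustration: the transaction amounts, the function names, and the fraud threshold are assumptions made for the example, not part of the lecture.

```python
from typing import Iterable, Iterator

# Hypothetical transaction amounts; not taken from the slides.
TRANSACTIONS = [120.0, 35.5, 9800.0, 42.0, 15000.0]


def batch_report(transactions: Iterable[float]) -> dict:
    """Batch style: the whole pre-selected data set is processed in one run."""
    values = list(transactions)
    return {"count": len(values), "total": sum(values), "max": max(values)}


def realtime_fraud_alerts(stream: Iterable[float], limit: float) -> Iterator[str]:
    """Real-time style: each record is handled as soon as it arrives."""
    for amount in stream:
        if amount > limit:  # assumed rule: amounts above the limit look suspicious
            yield f"ALERT: suspicious amount {amount:.2f}"
```

The batch function needs all the data before it can answer, while the generator emits an alert the moment a suspicious record is seen.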

4
Should You Do Batch Processing or Stream Processing?

• It is a good idea to start with batch processing.
• Batch processing is the foundation of every good big data platform.
• A batch processing architecture is simple, and therefore quick to set up. Platform simplicity means it will also be relatively cheap to run.
• A batch processing platform will enable you to quickly ask the big questions. They will give you invaluable insight into your data and customers.
• When the time comes and you also need to do analytics on the fly, then add a streaming pipeline to your batch processing big data platform.

5
Limitations of MapReduce in Hadoop

The limitations of MapReduce in Hadoop are listed below:


• Unsuitable for OLTP (Online Transaction Processing): OLTP requires a large number of short transactions, whereas MapReduce works on a batch-oriented framework.

• Unfit for processing graphs: the Apache Giraph library processes graphs, which adds additional complexity on top of MapReduce.

• Unfit for iterative execution: being a stateless execution model, MapReduce doesn't fit use cases like k-means that need iterative execution.
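
To see why iteration is awkward, k-means can be written as repeated map and reduce passes. The sketch below is plain Python, not Hadoop code, and works on one-dimensional points for brevity; the function names are assumptions for this example. The driver loop must carry the centroids from one pass to the next, and that is the state a single stateless MapReduce job does not keep.

```python
from collections import defaultdict


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce: average the points assigned to each centroid."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    # A centroid with no assigned points keeps its previous position.
    return [sum(groups[i]) / len(groups[i]) if groups[i] else c
            for i, c in enumerate(centroids)]


def kmeans_1d(points, centroids, max_iter=20):
    """Driver loop: one full map/reduce pass per iteration."""
    for _ in range(max_iter):
        new_centroids = reduce_phase(map_phase(points, centroids), centroids)
        if new_centroids == centroids:  # converged, no further passes needed
            break
        centroids = new_centroids
    return centroids
```

In a real MapReduce setting every pass would be a separate job that re-reads its input from storage, which is where the overhead comes from.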

6
Hadoop Ecosystem

[Diagram: Hadoop ecosystem data flow]
Data Sources: Log Files, ERP, RDBMS, Social Channel, Sensor, Machine, Mobile
Data Collection: Staging, Raw Data (Reservoir)
(Analytical) Data Processing: Batch Compute Engine
Result Store: Computed Information
Data Consumer: Reports, Services, Analytic Query Tools, Alerting Tools
Legend: data in motion, data at rest

7
Big Data Platform Blueprint

1. Ingestion is all about getting the data in from the source and making it available to later stages.
2. The analyze/process stage is where the actual analytics is done, in the form of stream and batch processing.
3. Store: this is the typical big data storage where you just store everything. It enables you to analyze the big picture.
4. Display: displaying data is as important as ingesting, storing, and analyzing it. People need to be able to make data-driven decisions.

8
Big Data Technologies Landscape
Ingestion
Ingestion Architecture:
• Scalable and extensible, to capture streaming and batch data.
• Provides the capability to apply business logic, filters, validations, data quality, routing, and other business requirements.
Technology Stack:
• Apache Flume
• Apache Kafka
• Apache Storm
• Apache Sqoop
• NFS Gateway

Storage/Retention
Data Storage:
• Depending on the requirements, data is placed into Hadoop HDFS, Hive, HBase, Elasticsearch, or in-memory.
• Metadata management
• Policy-based data retention is provided.
Technology Stack:
• HDFS
• Hive Tables
• HBase / MapR DB
• Elasticsearch

Processing
Data Processing:
• Processing is provided for both batch and near-time use cases
• Provision workflows for repeatable data processing
• Provide late data arrival handling
Technology Stack:
• MapReduce
• Hive
• Apache Spark
• AWS Elastic MapReduce

Access
Visualization and APIs:
• Dashboards and applications that provide valuable business insights
• Data will be made available to consumers using API, MQ feed, and DB access
Technology Stack:
• Qlik / Tableau / Spotfire / MicroStrategy
• REST APIs

Management, Monitoring, Governance
Ambari, Cloudera Manager, Cloudera Navigator, MapR MCS
9
Batch Processing Pipeline
[Diagram: batch processing pipeline. Layers shown: Sources, Raw Data Layer / Data Lake, Processing, Active Layer, Curated Layer, Visualization.]
[Components shown: Source 1 to Source N (push/pull), Informatica, HDFS, HQL (Hive Query Language) processing engine, Hive DB, ML model (Python, Spark code), Impala, MicroStrategy. Time-defined batch data is offloaded to the raw data layer. Data is stored in the Apache Parquet file format.]

10
Real Time Streaming Pipeline
[Diagram: real-time streaming pipeline. Layers shown: Source, Raw Data Layer / Data Lake, Processing, Curated Layer, Visualization.]
[Components shown: Source 1 to Source N, Kafka topics, HDFS, processing engine, ML model (Python code), Hive DB, Impala, MicroStrategy. Data is stored in the Apache Parquet file format.]
11
AI and Data Platform Architecture Diagram – Complete

[Diagram: AI and data platform architecture. Stages shown: Data Sources, Data Ingestion, Raw Data Lake / AI & ML / Processing, Processed Data Stores, Consumer Apps.]
Data sources: Operational Databases (MySQL instance, SQL Server instance), Other Data Sources
Platform components: Interactive Query, AI & ML Toolkit, AI & ML Ops (Deployment), Search Engine, Self Service BI, Alerts and Notifications, Key-Value Document Store, Data Transformations and Data Quality Scripts, Data Streams (Real Time), Data Warehouse, Batch Processing & Job Management, Raw Data Lake
Consumers: User Mobile App, Users, Apps

12
AI and Data Platform Architecture Diagram – Complete With AWS Technologies

[Diagram: AI and data platform architecture with AWS technologies. Stages shown: Data Sources, Data Ingestion, Raw Data Lake / AI & ML / Processing, Processed Data Stores, Consumer Apps.]
Data sources: Operational Databases (MySQL instance, SQL Server instance), Other Data Sources
Platform components: Amazon Athena, AWS Deep Learning AMIs, Amazon SageMaker, Amazon Elasticsearch Service, Amazon QuickSight, Tableau, Amazon Simple Notification Service, Amazon DynamoDB, AWS Lambda, Amazon Kinesis Data Streams, Amazon Redshift, AWS Glue, Amazon Simple Storage Service
Consumers: User Mobile App, Users, Apps

13
Netflix Use Case

How did Netflix add value to its business with the help of big data and data science?

14
About Netflix

Netflix is a streaming service that allows customers to watch a wide variety of award-winning TV shows, movies, documentaries, and more on thousands of internet-connected devices.

• 118 million users
• 140 million hours every day
• 3 trillion user events per day
• 12 petabytes of logs every day

15
The Goal Of Netflix (Keep The Goal in Mind)

The goal for Netflix


is to keep you
subscribed and to get
new subscribers.

16
How Does Netflix Gather Big Data?

• Rating
• Searches
• Date the movie/show was watched
• On which device it is watched
• Does the nature of the shows watched vary depending on the device?
• When is a program paused?
• Do portions of programs get re-watched?
• Do the credits get skipped?
• Etc.

17
Batch Processing: For Finding Hit Movie to Recommend Users

Inputs:
• Location
• Country
• Impression Events
• Play Events
• Completion Events

Inputs -> Batch Processing (batch processing using historical data) -> Hit Movies

Batch processing: Netflix knows the exact episode of a TV show that gets you hooked, not only globally but for every country.
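
A minimal sketch of this kind of batch job, assuming a simplified event record of (country, title, event type). The records and the rule "most completions wins" are invented for illustration and are not Netflix's actual method.

```python
from collections import Counter, defaultdict

# Hypothetical historical events: (country, title, event_type).
EVENTS = [
    ("US", "Show A", "play"), ("US", "Show A", "completion"),
    ("US", "Show B", "play"), ("US", "Show A", "play"),
    ("US", "Show A", "completion"),
    ("DE", "Show B", "play"), ("DE", "Show B", "completion"),
    ("DE", "Show A", "impression"),
]


def hit_titles_by_country(events):
    """Count completion events per country and return the top title for each."""
    completions = defaultdict(Counter)
    for country, title, event_type in events:
        if event_type == "completion":
            completions[country][title] += 1
    return {country: counts.most_common(1)[0][0]
            for country, counts in completions.items()}
```

The whole history is scanned in one run, which is what makes this a batch computation.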
18
The Netflix Old Batch Processing Pipeline
When Netflix started out, they had a very simple batch processing system architecture, explained in this figure.

Event 1, Event 2, Event 3, ..., Event 4 -> Chukwa (Ingestion) -> Amazon S3 (Storage) -> Elastic MapReduce (Analytics)

Chukwa (a scalable data collection system) wrote incoming messages into Hadoop sequence files, stored in Amazon S3. These files could then be analyzed by Elastic MapReduce jobs (on a daily and hourly basis).
19
The Trending Now Feature
Click Events, Play Events, Impression Events, Date & Time Logs -> Trending Now

Play events (while you watch):
• The title you watched last, where you stopped watching,
• where you used the 30s rewind,
• etc.

Impression events (not watching):
Browsing the Netflix library, such as scrolling up and down, scrolling left or right, clicking on a movie, etc.
20
Real Time Processing To Recommend The Movies
Looking at the past is not enough! Let's do real-time processing.

Inputs:
• Location
• Country
• Impression Events
• Play Events
• Completion Events

Inputs -> Stream Processing (based on the user's incoming data) -> Recommended Movies for a Particular User

• Used the Kafka platform with the Cassandra database
• They replaced their custom analytics tool with Apache Spark.

21
Netflix Streaming Pipeline based on Spark
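[Diagram: Netflix streaming pipeline based on Spark]
Devices -> Beacon -> Kafka (Ingestion) -> Spark (Analytics) -> Cassandra (Storage)
Event types: impression events, play events
Stored data: trending data, viewing history
Consumer: Recommender System (live data)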
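
The stream-processing idea behind a feature like Trending Now can be sketched without Kafka or Spark: state is updated incrementally for each incoming event instead of re-scanning the history. The class below is a plain-Python illustration; the class name and the fixed-size sliding window are assumptions for this example, not a description of Netflix's implementation.

```python
from collections import Counter, deque


class TrendingNow:
    """Keeps play counts over the most recent `window` events."""

    def __init__(self, window: int):
        self.window = window
        self.recent = deque()
        self.counts = Counter()

    def on_event(self, title: str) -> None:
        """Update state incrementally as each play event arrives."""
        self.recent.append(title)
        self.counts[title] += 1
        if len(self.recent) > self.window:
            expired = self.recent.popleft()  # oldest event leaves the window
            self.counts[expired] -= 1
            if self.counts[expired] == 0:
                del self.counts[expired]

    def top(self, n: int = 1):
        """Return the n most played titles in the current window."""
        return [title for title, _ in self.counts.most_common(n)]
```

Each call to `on_event` does a constant amount of work, so the ranking is always current without a batch run.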
What Did Netflix Achieve by Using Data Science?

Finding the next smash-hit series

Personalized video ranker

Top-N video ranker

Trending now

Continue watching

Video-Video Similarity Algorithm


23
24
The Science

People Science
Data

Data Analytics

Technology Processes Business

25
The Science – Algorithms
How can data science algorithms/software help in decision making?

Software algorithms in decision making:

Rule-based decision making
Boolean data (yes or no)
Examples:
⮚ Time or threshold-based alarms
⮚ Simple pattern matching

Statistical reasoning
Simple regression; numerical data allowing for curve fitting
Examples:
⮚ Interpolation
⮚ Outlier detection
⮚ Predictive maintenance
⮚ Quality control using various metrics

Machine learning
Classification tasks; arbitrary data that needs to be abstracted into numbers
Examples:
⮚ Identification of relevant features from large input datasets

Artificial intelligence
Dynamic adaptation to novelty; autonomous selection of the best methodology when presented with arbitrary data
Examples:
⮚ Autonomous vehicles
⮚ Human-like conversational skills
⮚ Intelligent digital assistant
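
The first two categories can be contrasted in a few lines. The sketch below pairs a threshold alarm (rule-based) with a z-score outlier check (statistical reasoning); the function names and the default z-score limit of 2.0 are assumptions chosen for this example.

```python
from statistics import mean, stdev


def threshold_alarm(reading: float, limit: float) -> bool:
    """Rule-based: a Boolean yes/no decision against a fixed threshold."""
    return reading > limit


def zscore_outliers(values, z_limit: float = 2.0):
    """Statistical: flag values far from the sample mean, in standard deviations."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:  # all values identical, nothing can be an outlier
        return []
    return [v for v in values if abs(v - mu) / sigma > z_limit]
```

The rule needs a limit chosen by a person, while the statistical check derives what counts as unusual from the data itself.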
26
Machine Learning Algorithms
Deep Learning
• Deep Boltzmann Machine
• Deep Belief Networks
• Convolutional Neural Networks
• Stacked Auto-Encoders

Ensemble
• Random Forest
• Gradient Boosting Machines
• Boosting
• Bootstrapped Aggregation
• AdaBoost
• Stacked Generalization
• Gradient Boosted Regression Trees

Neural Networks
• Radial Basis Function Network
• Perceptron
• Back-Propagation
• Hopfield Network

Regularization
• Ridge Regression
• Least Absolute Shrinkage & Selection Operator
• Elastic Net
• Least Angle Regression

Rule System
• Cubist
• One Rule
• Zero Rule
• Repeated Incremental Pruning to Produce Error Reduction

Regression
• Linear Regression
• Ordinary Least Squares Regression
• Stepwise Regression
• Multivariate Adaptive Regression Splines
• Locally Estimated Scatterplot Smoothing
• Logistic Regression

Bayesian
• Naïve Bayes
• Averaged One-Dependence Estimators
• Bayesian Belief Network
• Gaussian Naïve Bayes
• Multinomial Naïve Bayes
• Bayesian Network

Decision Tree
• Classification & Regression Tree
• Iterative Dichotomiser 3
• C4.5
• C5.0
• Chi-squared Automatic Interaction Detection
• Decision Stump
• Conditional Decision Trees
• M5

Dimensionality Reduction
• Principal Component Analysis
• Partial Least Squares Regression
• Sammon Mapping
• Multidimensional Scaling
• Projection Pursuit
• Principal Component Regression
• Partial Least Squares Discriminant Analysis
• Mixture Discriminant Analysis
• Quadratic Discriminant Analysis
• Regularized Discriminant Analysis
• Flexible Discriminant Analysis
• Linear Discriminant Analysis

Instance Based
• K-Nearest Neighbor
• Learning Vector Quantization
• Self-Organizing Map
• Locally Weighted Learning

Clustering
• K-means
• K-medians
• Expectation Maximization
• Hierarchical Clustering
BUSINESS PROBLEM TO DATA MINING TASKS

Classification
• Among all the customers, which are likely to respond to a given offer?

Regression
• How much will a given customer use the service?

Similarity Matching
• Attempts to identify similar individuals based on what is known about them.

Clustering
• Do our customers form natural groups or segments?

Profiling
• What is the typical cell phone usage of this customer segment?

Link Prediction
• Recommending movies to customers on the basis of watched and rated movies.

Co-occurrence
• Which items are commonly purchased together?

Data Reduction
• What is important to trade off for improved insight?

Answering Business Questions
• Who are the most profitable customers?
• Is there really a difference between the profitable customers and the average customer?
• But who really are these customers? Can I characterize/classify them?
• Will some particular new customer be profitable? How much revenue should I expect this customer to generate?

Reference: Data Science for Business by Foster Provost & Tom Fawcett
28
1. CLASSIFICATION: Customers Classification
• Linear Classifiers
• Support Vector Machines
• Decision Trees
• Random Forest
• Neural Networks

Customers Data -> Classifier Model -> Diamond / Gold / Silver

• Our challenge is to classify the customers into different categories and predict their class.
• Build the classifier model on labeled data; it will classify the customers into different groups.
• The classifier model will classify the customers into groups, e.g. Diamond, Gold, or Silver.
• These classes can help us encourage the customers to buy things.
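
As a minimal sketch of the workflow (fit on labeled data, then predict a tier), the example below uses a nearest-centroid classifier as a simple stand-in for the classifier families listed above. The two features, the training rows, and the function names are assumptions invented for illustration.

```python
from collections import defaultdict
from math import dist

# Hypothetical labeled customers: (annual_spend, orders_per_year) -> tier.
TRAINING = [
    ((9000.0, 40.0), "Diamond"), ((8500.0, 35.0), "Diamond"),
    ((4000.0, 18.0), "Gold"), ((4500.0, 20.0), "Gold"),
    ((500.0, 3.0), "Silver"), ((800.0, 5.0), "Silver"),
]


def fit_centroids(samples):
    """Average the feature vectors of each labeled class."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {label: tuple(sum(col) / len(col) for col in zip(*rows))
            for label, rows in grouped.items()}


def predict(centroids, features):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))
```

In practice the features would usually be scaled first, since spend dominates the distance here.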

29
2. REGRESSION: Predict Sales
• Linear Regression
• Lasso Regression
• Logistic Regression
• Support Vector Machine
• Multivariate Regression Algorithm

Sales Data -> Regression Model -> plot of Revenue (millions) against the X axis

• Predicting how much revenue the company will make in upcoming years based on historical data.
• How many resources do we need next year?
• Build a regression model that will predict the revenue for the upcoming period.
• Now we can see our estimated revenue and plan accordingly.
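
The simplest version of this is a straight-line fit by ordinary least squares. The sketch below uses the closed-form slope and intercept; the yearly revenue figures are invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical yearly revenue in millions; not taken from the slides.
YEARS = [2018, 2019, 2020, 2021]
REVENUE = [10.0, 12.0, 14.0, 16.0]
```

With the fitted pair `slope, intercept = fit_line(YEARS, REVENUE)`, the forecast for a future year is `slope * year + intercept`.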

30
3. SIMILARITY MATCHING
• Nearest Neighbor Distance
• Levenshtein Distance
• Damerau-Levenshtein Distance
• Needleman-Wunsch Distance
• Hamming Distance

Customers Data -> Model

• Targeting the people who are similar to your existing profitable customers.
• Finding which people are similar to existing profitable customers by using classification, regression and clustering models.
• Reach out to those types of customers and market your business.
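
Two of the distances listed above are short enough to write out. The sketch below gives the Hamming distance and a standard dynamic-programming Levenshtein distance that keeps only two rows of the table.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length inputs")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution or match
            ))
        previous = current
    return previous[-1]
```

String distances like these are typically useful for matching customer records, such as names or addresses that differ by typos.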

31
4. CLUSTERING: Business problems & clustering
• K-means
• Mean-Shift Clustering
• DBSCAN
• EM-Clustering

Clustering Model

• Do customers form natural groups or clusters?
• What product should we offer?
• How should our sales team be structured?
• Use k-means or other clustering machine learning algorithms to address the challenge.
• You can identify the clusters using a clustering algorithm, as shown in the figure, on unlabeled data based on particular attributes.

32
5. PROFILING: Profiling Customers
• Classification
• Clustering
• Exploratory Data Analysis

Cell phone usage data of customers -> Data profiling

• What is the typical cell phone usage of a particular customer segment?
• Profiling requires complex calculations. For example, profiling cell phone usage might require a complex description of night and weekend airtime averages, international usage, roaming charges, text minutes, and so on.
• Through customer profiling you can make new policies and offer call and message packages.

33
6. LINK PREDICTIONS: Suggestions
• Common Neighbors
• Adamic-Adar
• Preferential Attachment
• Resource Allocation
• Same Community
• Total Neighbors

Customer's movies data -> Link Prediction Models -> The movies you might enjoy

• Recommending movies to customers: one can think of a graph between customers and the movies they have watched or rated.
• Use graph distance or another machine learning link prediction model to find out which type of movies a particular customer wants to watch.
• Link prediction is very common on social media websites like Facebook, Twitter, etc. Through this you attract users to watch new movies or use new products.
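
A minimal sketch of the common-neighbors idea on a customer-movie graph: an unseen movie scores higher when it was watched by customers who share many titles with the target customer. The viewing data, names, and scoring rule are assumptions for illustration.

```python
from collections import Counter

# Hypothetical viewing data: customer -> set of watched movies.
WATCHED = {
    "ann": {"Movie A", "Movie B"},
    "bob": {"Movie A", "Movie B", "Movie C"},
    "cat": {"Movie B", "Movie C"},
    "dan": {"Movie D"},
}


def recommend(watched, customer, top_n=1):
    """Score each unseen movie by how many shared titles link it to the customer."""
    seen = watched[customer]
    scores = Counter()
    for other, movies in watched.items():
        if other == customer:
            continue
        shared = len(seen & movies)  # common neighbors in the customer-movie graph
        for movie in movies - seen:
            scores[movie] += shared
    return [movie for movie, score in scores.most_common(top_n) if score > 0]
```

A customer with no overlap with anyone else gets no recommendation from this rule, which is the usual cold-start limitation of neighbor-based scores.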

34
7. CO-OCCURRENCE:

• Market basket analysis
• Association Rules

Sales Data -> Data Science

• What items are purchased together?
• Using association rules or another machine learning algorithm, we will identify which items are purchased together.
• We can offer discounts on sets of items that customers purchase together; this can help us increase our revenue.
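
The first step of market basket analysis is counting how often pairs of items appear in the same basket (the support of the pair). The sketch below does only that step; the baskets are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets; not taken from the slides.
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def pair_support(baskets):
    """Fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical order, e.g. ("bread", "butter").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}
```

Pairs with high support are the candidates for bundle discounts; full association-rule mining would add confidence and lift on top of this count.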

35
8. DATA REDUCTION
• Correlation analysis
• Identifying important and less important features
• Drop duplicate information

Data -> Data Science -> Optimized Dataset

• Converting the large dataset into smaller datasets to process data in less time and in an effective way.
• Making sure the integrity of the data remains the same.
• Decide which attributes are important to you using correlation or other techniques, and remove less important attributes.
• Group the similar attributes.
• Now you can perform analytics easily on a smaller dataset, save time, and reduce cost.
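
One simple way to drop duplicate information is to remove a column that is almost perfectly correlated with one already kept. The sketch below uses the Pearson correlation; the 0.95 threshold and the function names are assumptions for this example, and it assumes no column is constant (a constant column would cause a division by zero).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def drop_redundant(columns, threshold=0.95):
    """Keep a column only if it is not highly correlated with one already kept."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept
```

For instance, a price stored in both dollars and cents carries the same information twice, so only the first of the two columns survives.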

36
Machine Learning in Production

Machine
Learning

Stream
Batch
Processing

Why is machine learning in production harder than you think?

37
Machine Learning Models Do Not Work Forever

• Machine learning model training is a never-ending job. Every time new data comes in, we need to retrain the model on the latest dataset.

• What you do in development or education is create a model and fit it to the data. Then that model is basically done forever?

• In the IoT world, the problem is that machines are very different. They behave very differently.

38
Which Platform Supports Retraining Model Automatically?

• Automatic re-training and re-deploying is a very big problem for a lot of companies, because most existing platforms don't have this capability.

• Look at AWS machine learning, for instance. The process is: build, train, tune, deploy. Where's the loop of retraining?

• You can create models and then use them in production. But this loop is almost nowhere to be seen.

39
Machine Learning Training Parameter Management

• To train a model, you are manipulating the input parameters of the model.
• For example, in deep learning:
• How many layers do you use? What is the width of the layers, meaning how many neurons you have in a layer? What activation function do you use, how long are you training, and so on.
• You also need to keep track of what data you used to train which model.
• All those parameters need to be manipulated automatically, and models trained and tested.
• To do all that, you basically need a database that keeps track of those variables.
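
A minimal sketch of such a tracking database, using Python's built-in sqlite3 module. The table layout, the function names, and the parameter names are assumptions for illustration; dedicated experiment-tracking tools offer far more.

```python
import json
import sqlite3


def open_tracker(path=":memory:"):
    """Create the runs table if needed and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        "id INTEGER PRIMARY KEY, params TEXT, dataset TEXT, score REAL)"
    )
    return conn


def log_run(conn, params, dataset, score):
    """Store one training run: hyperparameters, data version, test score."""
    cur = conn.execute(
        "INSERT INTO runs (params, dataset, score) VALUES (?, ?, ?)",
        (json.dumps(params, sort_keys=True), dataset, score),
    )
    conn.commit()
    return cur.lastrowid


def best_run(conn):
    """Return (params, dataset, score) of the highest-scoring run."""
    row = conn.execute(
        "SELECT params, dataset, score FROM runs ORDER BY score DESC LIMIT 1"
    ).fetchone()
    return json.loads(row[0]), row[1], row[2]
```

Storing the dataset version next to the parameters is what lets you answer "which data trained which model" later.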

40
Data is Stronger Than Opinions
You Have The Data. USE IT
"This doesn't work" -> You bring the data -> Show the statistics -> Discussion ends there.

41
42
The Business

People Science
Data

Data Science

Technology Processes Business

43
DATA-DRIVEN DECISION MAKING (DDD)

DDD refers to the practice of making decisions using data, rather than purely on intuition:

• Using data and trending historical data
• Validating assumptions, if any
• Using champion/challenger to test scenarios
• Using experiments
• Use a baseline
• Continuous improvement
₋ Customer experiences
₋ Costs
₋ Revenues

If you can't measure it, you can't manage it.

[Diagram: Data Engineering and Processing (including "Big Data" technologies) -> Data Science -> Data-Driven Decision Making (across the firm), with Automated DDD. Data engineering and processing also has other positive effects (e.g., faster transaction processing).]

Reference: Data Science for business by foster provost & tom fawcett
VALUE: LEVERAGING DATA FOR VALUE-ADDED HEALTHCARE

• Increase Revenue
• Proactive Decision Making
• Strategic Partnerships
• Increase the Value of Data as an Asset (Data Monetization)
• Enhance Customer Experience
• Enhance Operational Efficiency
• Develop/Enhance Products & Services
• Innovation

6 WAYS A DATA SCIENTIST CAN ADD VALUE TO ANY BUSINESS 1/2

• Empowering management and officers to make better decisions.
• Data scientists direct the action based on trends, which in turn helps in defining goals.
• Data scientists challenge the staff to adopt best practices and focus on the issues that matter.
• Identifying opportunities.

https://www.simplilearn.com/why-and-how-data-science-matters-to-business-article 46
The Applications – Some Areas

• Logistics
• Banking
• Insurance
• Customer Analytics
• Energy Efficiency & IoT
• Retail
• Marketing
• Manufacturing
• Healthcare
• Telecom
• Tourism

47
Big Data In Telecom
• Customer Analytics
• Direct • Customer Journey
• Shop Visit • Footfall analysis
• IVR
• Chat
• Web
Cross-
• Social Media
channel CDR
Interactions

• 3G UE Agents
• Demographic
• APN, Probes
• Age, Gender
• Mobile Access Nodes
Customer • Ethnicity
• Core Network Network Data
Attributes • Geography
• CDRs/XDRs

Teleco • Segment
DPI/Drive Tests
• Service Degradations m • Data usage
• Platform Outages
Customer Transactional
Insights Data

• Churn / Propensity Other • Ordering / fulfillment


• NPS / Customer satisfaction score Enterprise • Trouble Ticketing
• Loyalty Data • Billing
• CLV
• revenue

• Federal Agencies
• City Councils
• Municipalities
• Business Metrics 48
Big Data & Tourism Department

BOOKING PRE-ARRIVAL STAY CHECK OUT OPERATIONS

• Booking Activity by Channel • Segmentation & Clustering •• Top Guests


Guest by Revenue
Satisfaction Score • Top Guests By Revenue • Wage Cost
• Loyalty Points Spend pattern
• Cancellations & Reschedule • Campaign ROI • Social Media Follower Base • Loyalty Points Spend Pattern • Total Labour Cost
• Repeat Customer Revenue
• Upgrades / Downgrade • Cross sell/Up Sell • Customer Retention Rate • Repeat Customer Revenue • Labour Turnover
• Guest Acquisition Cost
• ADR & Occupancy Ratio • Improved Loyalty Signups •• Processing Costs per
Guest Segmentation • Guest Acquisition Cost • Food Cost
• Look-to-book Ratio • Propensity Modeling • Feedbacks
Transaction& Complaints • Guest Segmentation • Average per room cost
• Advance Booking Ratio • Affinity Modeling •• Guest Spending Pattern
Social Sentiment Score • Feedbacks & Complaints • Average hourly Pay
• New Guest Market vs
• No Show • Influence Modeling • Most Preferred Channel • Guest Spending Pattern
Return Guest

49
Big Data & Airports

50
USE CASE: Data Science in Insurance Industry

Customer Fraud Customer


Experience Detection Insights

Marketing Automation

51
Big Data In Banking
RISK DATA AGGREGATION &
REPORTING

PREDICTIVE WEALTH
MANAGEMENT &
AML
COMPLIANCE
ANALYTICS PRIVATE BANKING

ACROSS
BANKING DATA SCIENCE &
PREDICTIVE ANALYTICS
CONSUMER IN BANKING VARIOUS
BANKING TYPES OF
COLOR KEY FRAUD

DEFENSIVE
SAVE THE BANK

OFFENSIVE
DRIVE PROFITS &
COMPETITIVE
PAYMENTS
CYBER
ADVANTAGE SECURITY

FINANCIAL TRADING
APPLICATIONS

52
Big Data In Retail
CROSS SELLING & UP RECOMMENDATION
SELLING ENGINE

PURCHASE ATTRIBUTION
LIKELIHOOD MODELING
Market Collaborative
Basket Filtering
Analysis
Markov
Propensity Chain Monte
Model Carlo

CHURN Survival Retail Optimization PRICING


ANALYTICS Analysis Techniques
ANALYTICS
RFM Panel Data
Analysis Regression
Multi-
Cluster
variate Time
Analysis
Series
MARKETING
CUSTOMER
MIX MODEL
ANALYTICS

DEMAND CUSTOMER, STORE AND


FORECASTING PRODUCT SEGMENTATION
53
Big Data In Manufacturing

• Reduction of Supply Chain Risk
• Optimization of Operations to a Higher Degree than Ever
• Perfecting Quality as a Competitive Advantage
• Predictive Maintenance to Reduce Cost
• After-Sales Improvements
• Mass and Individual Customization
• New Data-Driven Revenue Sources and Business Models
• From Local to Enterprise-Level Data Analytics

54
Big Data In Healthcare

Personalized Medicine
Merge and analyze data sets from multiple sources to create personalized treatment.

Patient Data Analysis
Track patients' activities, movements, and symptoms to discover or identify diseases.

Disease Modeling and Mapping
One of the flashiest uses of data science in the past few years has been in tracking (and finding ways to halt or prevent) diseases.

Medical Test Automation
Data science enables automation of medical tests and provides real-time analytics, for example for BP, diabetes, etc.

55
USE CASE: Energy Efficiency (IOT Example)

Challenge
• Our mission is to provide an innovative and affordable solution that accelerates the transition to sustainable buildings

Data Science & Big Data
• Identify inefficiencies of energy consumption
• Statistical correlation between the presence of people and inefficiencies
• Control policies for devices to reduce the energy consumption

Results
• Product optimization
• Less energy consumption
• Customer satisfaction

56
USE CASE: Smart Lighting (IOT Example)

Challenge
• Presence sensors to ensure lights are not in use when rooms are empty.
• Daylight harvesting to employ natural lighting to minimize artificial lighting needs.
• Personal dimming to allow individuals the option to directly control the lighting in the room or space

Data Science & Big Data
• Reactive/predictive maintenance
• Adaptive lighting solution (presence detection)
• Learn occupants' individual lighting preferences
• Management and control by measuring the light intensity

Results
• Product optimization
• Personal settings
• Scheduling
• Less energy consumption
• Customer satisfaction

57
USE CASE: Customers Segmentation

Challenge
• Identifying the customers who are most likely to purchase your product.
• Identifying the type of customers.
• Classification of customers based on their purchases

Data Science & Big Data
• Customer segmentation based on their attributes.
• Multivariate analysis to find the customers who are most likely to purchase your products
• Exploratory data analysis and finding the information in the data.

Results
• Increase sales
• Reach your customers easily
• Target customers
• Offer discounts to your customers based on their category, like gold or silver.
• Customer satisfaction

58
USE CASES: Customer Analytics

Marketing & Advertising
• Personalized marketing
• The right offer, at the right time, in the right location and context, to the right person

Customer Service
• Identifying customer pain points and issues proactively
• Updating their FAQs or other communications with existing customers.

Retention & Loyalty
• Deploy and refine predictive models that help them retain customers with proactive approaches.
• Investments, in terms of offers and upgrades, can be made at the right time to increase the likelihood of retaining desirable customers

Customer Experience
• Sentiment analysis
• Identifying the connection between the customers' experience and the company's financial performance.
59
Currency Conversion

05/18/2025 07:38 AM 60
Fraudulent wire transfer
Excess Reserves
Credit Card Customers
Misinformation in loan applications
Potential Best Customers
Customers At Risk
Target Potential Customers
Automated Documentation
Long Loan-cycle times
Goal Setting
Insurance claims
Liquidity Forecasts
Risk Scoring
Real time Blocking
Rule based AML
Fraud Detection systems
Spot Identity Fraud
Energy and
Utilities

05/18/2025 07:38 AM Dr. Ehsan Ullah Warriach 78


Use Case Overview: Solar Generation Forecast

ML / AI

05/18/2025 07:38 AM 79
Data Platform should have these key Value Propositions

05/18/2025 07:38 AM 80
3 operational levers for utilities to employ AI use cases for better performance

05/18/2025 07:38 AM 81
Prioritizing use cases should consider both total potential values as well as
feasibility for maximizing impact

05/18/2025 07:38 AM 82
International Utilities Company: Improve maintenance by increasing the number
of resolutions at first visit

Context: Maintenance engineers are a scarce resource, and when maintenance is required, the details are not always precise enough. These requests come from e-mails and phone calls. Before AI/ML, at least two visits were often required due to the lack of the right tools.

Approach: Using AI/ML, they were able to predict the faulty parts of machines from the e-mails and phone calls received.

Impact:
• Lower engineering and inventory costs due to a higher resolution ratio.
• Resolution ratio jumped from 15% to 60% for all maintenance operations.
• Realized impact: $450k / year due to better use of engineering resources and inventory

05/18/2025 07:38 AM 83
International Water Management Company: Reduces regulatory cost associated
with N2O measurement

Context: This company is commissioned by many Japanese municipalities to operate water purification plants. Japan's water purification
plants have set standards and regulations for greenhouse gas emissions for each treatment method. The Japanese local government
has also requested this company to take measures based on the measurement results for N2O. The cost of the measurement
equipment is $100,000 per unit, which is a large cost burden if many units are installed.

Approach: Using ML/AI, they forecasted N2O concentration using time-series data from the water plant, such as temperature, transparency, pH, and quantity of chemicals, and were able to add weather data. Instead of a large number of water quality sensors, this approach combines a small number of water quality sensors with soft sensors backed by predictive models.

Impact:
MAPE: <15% for all water plants
ROI estimate: $1 million per year in savings
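
MAPE (mean absolute percentage error) is the average of |actual - predicted| / |actual|, expressed in percent. A minimal sketch, assuming that no actual value is zero; the function name is chosen for this example.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must have the same length")
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)
```

A MAPE below 15% means the forecast is, on average, within 15% of the measured value.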

05/18/2025 07:38 AM 84
QUESTIONS & ANSWER SESSION

05/18/2025 85
Healthcare

05/18/2025 07:38 AM 86
There are Hundreds of Opportunities to Optimize Every Division of A Health Care
Player

05/18/2025 07:38 AM 87
Healthcare Organizations are scaling Values using AI

05/18/2025 07:38 AM 88
Companies Adopting AI

05/18/2025 07:38 AM 89
Moving Through the AI Maturity Curve

05/18/2025 07:38 AM 90
.

05/18/2025 07:38 AM 91
Large US provider Identifies FWA and secures Large ROI

Rule-driven system allowed potential cases of fraud, waste & abuse to fall through the cracks.

• Create a new model that could leverage historical results


• Now able to find more suspicious claims
• ID potential losses before payout
• Estimated ROI = $15M

In addition to using supervised machine learning to identify known behaviors of overpayments,


unsupervised machine learning models identify unknown behaviors of overpayments by
discovering claims that appear to be anomalous. Investigators can use this information to prioritize
the review of anomalous claims and to retrain their supervised machine learning models with
results from their latest investigations

05/18/2025 07:38 AM 92
