From Science to Engineering, 

Process of a Machine Learning Product
Bruce Kuo

bruce3557@gmail.com
Who Am I?
• Bruce Kuo

• Experience:

• Yahoo software engineer in the Data team and Global Search (2014-2017)

• Codementor Data Scientist (2017 - 2019)
Target Audience
• Anyone interested in machine learning product development

• Junior / Mid Level machine learning engineers

• Data scientists / engineers
Goals of This Talk
• Share an overview of a machine learning project

• Share how business problems connect to machine learning problems

• Share the engineering side of a machine learning product
Machine Learning Project Overview
• Science: Science Levels, Requirements, Research Steps (Define Problem & Objectives, Offline Evaluation, Solution Research)

• Engineering: Model Serialization, ML Data Pipeline, Model Serving, Performance Tracking, CI & Monitoring

Science
Two Different Science Levels
• Unknown business problem

• Example: do we need fast recommendations after a user views a product page?

• This is another topic

• Known business problem, unknown solutions

• Example: we have a supply problem in the matching algorithm; how can we improve conversion through recommendation?

• More ML steps here

This talk focuses on known business problems. Unknown business problems are another story…
Where ML Requirements Come From

• Data analysis (PM, Analysts): “We need a recommendation module!”

• Qualitative analysis, i.e. user feedback (PM, Designer, Sales, Marketing): “We need to improve our tag suggestion module!”
ML Science on Business
• Applying ML science to business problems is like “experimental science”

• Different datasets call for different algorithms to solve / learn the problem.

• Designing experiments is important.
ML Problem Steps
• Goal: enhance a specific business metric

• Steps: Define Problem & Objectives; Solution Research & Experiments; Define Evaluation Metrics
Define the Problem
• What is the business problem?

• News triggering

• Mentor matching

• Which type of ML problem can be used to solve the business problem?

• Classification?

• Recommendation?

• …
Define Objectives
• In the algorithm, we focus on the loss

• 0/1 loss

• Mean Square Error (MSE)

• Mean Absolute Error (MAE)

• Cross Entropy

• …
https://cloud.tencent.com/developer/article/1092365

• In business, we focus on the business goal.

• Interest rate

• Conversion 

• CTR

• …
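The contrast between algorithm losses and business goals can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names are hypothetical, and the cross entropy shown is the binary form, assuming labels in {0, 1} and predicted probabilities of the positive class.

```python
import math


def zero_one_loss(y_true, y_pred):
    """Fraction of predictions that differ from the true label."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_entropy(y_true, p_pred, eps=1e-12):
    """Binary cross entropy; p_pred holds predicted P(y = 1)."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


def click_through_rate(clicks, impressions):
    """Business metric: clicks divided by impressions."""
    return clicks / impressions if impressions else 0.0
```

The losses are what the training procedure minimizes; a metric such as CTR is what the business observes, and the two do not always move together.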
Design Offline Evaluation
• After defining the problem and objectives, we need to design the offline evaluation.

• Offline evaluation metrics are usually the business goals (CTR, interest rate, …)

• This is also the first version of the data pipeline design and the online evaluation design.

• It provides confidence before we start integrating the algorithm into the online service.

• Supervised offline evaluation is easy; unsupervised is hard.
Solution Research
• Papers, papers, papers

• Learn how similar problems have been solved and what ideas we can take from those solutions

• Research areas of machine learning

• For different purposes: classification / regression / clustering …

• Algorithm optimization: which gradient descent variant works better
Solution Research (Cont.)
• In a startup, we usually focus on the high-level parts because of:

• Tuning speed

• Integration: we need to choose a mature implementation for production use, e.g., scikit-learn or Keras.

• Feature engineering becomes very important when we only select among existing algorithms

• Keep the goals of solution engineering small: the model should be easy to retrain
Example: Product Recommendation
• Problem: given a user, we want to recommend products to that user

• Ranking problem or recommendation problem

• Objectives:

• Business metric: top-k interest rate

• Loss function: depends on the solution

• Offline metric: we evaluate top-k interest rate as the performance metric after optimization

• Solution research: Matrix Factorization, Factorization Machines, Deep Learning, Learning to Rank…
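An offline top-k metric for this example could be computed roughly as follows. The slides do not define "top-k interest rate" precisely, so this sketch assumes one plausible definition: the fraction of users for whom at least one of the top k recommended products is a product they actually showed interest in. The function and argument names are hypothetical.

```python
def top_k_interest_rate(recommendations, interested, k):
    """Fraction of users with at least one hit among their top-k items.

    recommendations: dict of user_id -> ranked list of product ids
    interested: dict of user_id -> set of product ids the user showed
                interest in (assumed to come from logged data)
    """
    users = [u for u in recommendations if u in interested]
    if not users:
        return 0.0
    hits = sum(
        1 for u in users
        if set(recommendations[u][:k]) & interested[u]
    )
    return hits / len(users)
```

Usage: with `recs = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"]}` and `seen = {"u1": {"b"}, "u2": {"f"}}`, calling `top_k_interest_rate(recs, seen, 2)` counts only `u1` as a hit, because `f` is ranked third for `u2`.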
Engineering
Why Is Engineering Needed?
• Model results should be used in your products

• How?

From Science to Engineering
• Science: Model Training

• Engineering: Serialization, Data Pipeline, API Serving

• Need CI / Monitoring

• First step: export the model
Serialization - Export Model
• Goal: serialize your model into a binary file or a general format so that anyone can use it for prediction.

• Serialization methods differ between algorithms, but a given machine learning package exposes the same interface for all of them

• Scikit-learn: https://scikit-learn.org/stable/modules/model_persistence.html

• Keras: https://jovianlin.io/saving-loading-keras-models/

• …

• Lower-level design: http://dmg.org/pmml/products.html
Serialization - Export Model
• Example: how do we serialize a logistic regression model?

• scikit-learn: joblib.dump(model, path)

• From scratch: we need to implement the model equation

• Equation:

$$\Pr(Y_i = y \mid X_i) = \frac{e^{\beta \cdot X_i \, y}}{1 + e^{\beta \cdot X_i}}, \quad y \in \{0, 1\}$$

• Only save $\beta$, the linear weight vector; with it we can compute the prediction function.

• PMML tries to define a serialization interface for each algorithm
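The from-scratch route for logistic regression can be sketched as follows: only the weight vector β is written to disk, and prediction evaluates the model equation for y = 1. This is a minimal sketch under stated assumptions, not the speaker's implementation: JSON is chosen here as the "general format", the function names are hypothetical, and any intercept is assumed to be folded into β through a constant feature.

```python
import json
import math


def save_weights(beta, path):
    """Persist only the weight vector; nothing else is needed to predict."""
    with open(path, "w") as f:
        json.dump({"beta": list(beta)}, f)


def load_weights(path):
    """Read the weight vector back from the JSON file."""
    with open(path) as f:
        return json.load(f)["beta"]


def predict_proba(beta, x):
    """P(Y = 1 | x) = e^(beta.x) / (1 + e^(beta.x))."""
    z = sum(b * xi for b, xi in zip(beta, x))
    # Two algebraically equal forms, chosen so exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

Because the file holds plain numbers, a service written in another language can load the same weights and reproduce the prediction, which is harder with a package-specific binary dump.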
Serialization - Export Latent Features
• We extract hidden vectors to represent users / items

• Extract photo features with an autoencoder

• Extract user features with matrix factorization …

• Two ways to export latent features

• Save the model, e.g., the autoencoder

• Save the feature vectors, e.g., the matrix factorization vectors
Example - Matrix Factorization
picture: https://buildingrecommenders.wordpress.com/2015/11/18/overview-of-recommender-algorithms-part-2/

• Save by user

• Save by product
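Saving matrix factorization output "by user" and "by product" might look like the sketch below: each factor matrix is stored as a mapping from id to latent vector, and a recommendation is the set of products with the highest dot product against the user's vector. The storage format (two JSON files) and all names are assumptions made for illustration.

```python
import json


def export_factors(user_factors, product_factors, user_path, product_path):
    """Write latent vectors keyed by user id and by product id."""
    with open(user_path, "w") as f:
        json.dump(user_factors, f)
    with open(product_path, "w") as f:
        json.dump(product_factors, f)


def score(user_vec, product_vec):
    """Predicted affinity: dot product of the two latent vectors."""
    return sum(u * p for u, p in zip(user_vec, product_vec))


def recommend(user_vec, product_factors, k):
    """Return the ids of the k products with the highest score."""
    ranked = sorted(
        product_factors,
        key=lambda pid: score(user_vec, product_factors[pid]),
        reverse=True,
    )
    return ranked[:k]
```

Keying the vectors by id means the serving side can fetch a single user's vector without loading the whole matrix.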
How to Use Model Results?
• The model predicts inside the data API

• The model predicts inside the data pipeline
Predict in Data API
• The model predicts inside the Data API

• Prepare data in the data pipeline

• Data in the request payload

• Can provide real-time prediction

• Latency is a challenge

[Diagram: Data Warehouse (user data), DataPipeline, Serving Database (user features), Data API with Model (user data)]
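Predicting inside the Data API can be sketched as a request handler that combines features precomputed by the pipeline with data from the request payload and scores them at request time. Everything here is hypothetical: the in-memory dict stands in for the serving database, the weights are made up, and a logistic regression stands in for whatever model is actually served.

```python
import math

# Hypothetical stand-in for the serving database: user_id -> features
# that the data pipeline prepared ahead of time.
SERVING_DB = {"u1": [0.2, 1.5]}

# Hypothetical weights, assumed to be loaded once when the API starts.
BETA = [0.8, -0.4, 1.1]


def sigmoid(z):
    """Logistic function, written so exp() never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def handle_predict(payload):
    """Score one request: stored user features plus payload features."""
    stored = SERVING_DB.get(payload["user_id"])
    if stored is None:
        return {"error": "unknown user"}
    features = stored + payload["realtime_features"]
    z = sum(b * x for b, x in zip(BETA, features))
    return {"user_id": payload["user_id"], "score": sigmoid(z)}
```

The model runs on every request, which is what makes real-time prediction possible and also why latency becomes the main concern.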
Predict in Data Pipeline
• The model predicts inside the Data Pipeline

• Predict results in the pipeline and save them to the database

• The backend implements its logic on its own side

• Better API speed

• Lower flexibility

[Diagram: Data Warehouse (user data), DataPipeline with Model (predict), Serving Database, Data API (extract result), Components]
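Predicting inside the pipeline moves the model call out of the request path: a batch job scores every user and stores the results, and the API only performs a lookup. The sketch below assumes a logistic regression model and uses a plain dict as a stand-in for the serving database; all names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, written so exp() never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def run_batch_prediction(warehouse_rows, beta):
    """Pipeline step: score every row and return user_id -> score."""
    results = {}
    for row in warehouse_rows:
        z = sum(b * x for b, x in zip(beta, row["features"]))
        results[row["user_id"]] = sigmoid(z)
    return results


def api_get_score(serving_db, user_id, default=None):
    """API step: a lookup only, no model call at request time."""
    return serving_db.get(user_id, default)
```

The lookup is fast, but scores are only as fresh as the last pipeline run, and inputs that arrive with the request cannot influence them; that is the flexibility being traded away.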
Other Concerns in Engineering
• How quickly do we need to provide model results to users?

• How do we handle data changes?

• Online performance tracking

• Monitoring

• CI / CD

These are factors to consider when designing your pipeline.
Conclusion
• An overview of a machine learning project

• How business problems connect to machine learning problems

• Engineering details in a machine learning project
Q & A

Thanks for
Listening!
