2-ML Principles
«Statistical significance» as intended in classical statistics is of little use in the ML context: the
large amount of data on which models are trained yields very high «statistical power», so even negligible effects turn out «significant».
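A minimal sketch of this point on synthetic data (the effect size of 0.005 and the sample size are arbitrary assumptions): with a million observations per group, a practically irrelevant mean shift still produces a tiny p-value.

```python
# Sketch: with very large samples even a negligible effect is
# "statistically significant" (synthetic data, assumed shift of 0.005).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 1_000_000
a = rng.normal(loc=0.0, scale=1.0, size=n)
b = rng.normal(loc=0.005, scale=1.0, size=n)  # practically irrelevant shift

t, p = stats.ttest_ind(a, b)
print(f"t = {t:.2f}, p = {p:.1e}")  # p far below 0.05 despite the tiny effect
```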
ML in Actuarial Science: use cases
▪Fine tuning of frequency and severity modeling for non-life pricing, since ML models better
handle interactions between variables and non-linearities.
▪Individual claim reserving («claim level analytics»)
▪As above, for retention and conversion modeling
▪Fraud risk assessment
▪Marketing analytics
▪Recommender systems
ML Projects workflow
▪Business scope definition:
▪ business context
▪ Type of approach (supervised / unsupervised), available predictions, potential deployment issues
▪Data preparation:
▪ ETL: extraction, transformation and load;
▪ Initial descriptive analysis (univariate, bivariate plots and statistics, possible variable transformation)
▪«Predictive performance» shall be evaluated in terms of generalizability (will the fitted model
work well on unseen data?).
Model validation
▪«hold-out» approach: random (or reasoned) split of the available data into train, validation and/or
test sets. Models are fit on the train set, possibly chosen on the validation one, and predictive
performance is evaluated on the test set
▪«cross-validation» approach:
▪ An integer k (e.g. 10, 5, …) is chosen and the original sample is split into k random folds (k-fold CV);
▪ k model «runs» are fit, each time leaving out one «hold-out» fold on which predictive performance is
calculated;
▪ The estimated predictive performance is the average of the k estimates (see the sketch below).
Suggestions:
◦ Use «hold-out» data when possible
◦ Try to evaluate the performance of the current approach (as a baseline)
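Both ideas in a minimal scikit-learn sketch, on a synthetic data set (the model and the choice of k = 5 are arbitrary here):

```python
# Sketch: hold-out split plus 5-fold cross-validation with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)

# Hold-out: keep a test set aside for the final assessment.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)

# k model "runs": each fold is left out once and scored on it;
# the estimated performance is the average of the k scores.
scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
print("cross-validated accuracy:", scores.mean())

# Final check on the untouched hold-out data.
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```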
Deployment & Life - cycle
▪Implement all data preparation and scoring «on-line» on the production IT
infrastructure
▪Necessary checks:
▪ Reasonableness of results
▪ Numerical checks on IT testing environments
▪Trees:
▪ Single trees: e.g. C5.0
▪ Bagging: Bagged Trees, Random Forest
▪ Boosted Trees: gbm, xgboost, lightgbm
Supervised learning: classification
▪Linear models:
▪ GLM (logistic, multinomial) possibly using non-linear or additive (splines) terms;
▪ Linear/Quadratic Discriminant Analysis
▪Non-linear models:
▪ MARS Splines
▪ KNN
▪ SVM
▪ Naive Bayes
▪ (Deep) Neural Networks
▪Tree based approaches:
▪ Single trees: CHAID, C50
▪ Bagging (Random Forest)
▪ Boosting (GBM, XGBoost, LightGBM)
Unsupervised modeling
Clustering:
◦ Hierarchical clustering
◦ KMeans
◦ DBSCAN, OPTICS, …
Dimension reduction:
◦ PCA, Factor Analysis,…
◦ GLRM
Hybrid models:
◦ Arules
◦ Word2vec
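A minimal sketch of two of the techniques above, PCA for dimension reduction followed by k-means clustering (the iris data set and the choice of 3 clusters are arbitrary assumptions):

```python
# Sketch: dimension reduction (PCA) followed by clustering (k-means).
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)

X_2d = PCA(n_components=2).fit_transform(X)  # project onto 2 principal components
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)
print(labels[:10])  # cluster assignment of the first samples
```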
Linear Discriminant Analysis
Support Vector Machines
▪A mathematical function (possibly non-linear) creating separating regions in the variable space.
▪Can be used in both classification and regression problems.
▪Issues:
▪ Computational complexity (O(n³))
▪ No automatic feature selection
▪ Hard to interpret
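A minimal sketch with scikit-learn (RBF kernel on a toy non-linear problem; scaling is included because SVMs are sensitive to feature scale, and the C and gamma values are defaults, not tuned choices):

```python
# Sketch: SVM classification with a non-linear (RBF) kernel.
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

# The kernel implicitly maps the data into a space where the
# separating region becomes (close to) linear.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X, y)
print("train accuracy:", clf.score(X, y))
```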
SVM: Kernels
MARS Splines
▪Multivariate Adaptive Regression Splines are based on hinge functions, e.g.
k₁·max(0, x − c) + k₂·max(0, c − x), to model the relation between predictors and the outcome.
▪Pros:
▪ Handling both numeric and categorical data; interpretable non-linearity handling
▪ Allow for feature selection
▪Cons:
▪ More performant ML models exist
▪ Computational complexity reduces their ability to handle large data sets.
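A minimal sketch of the hinge-function building block, with the two basis functions built by hand at an assumed knot c (a real MARS implementation searches for the knots automatically):

```python
# Sketch: the MARS building block -- two hinge functions at knot c,
# whose coefficients k1, k2 are fit by ordinary least squares.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=500)
y = 2 * np.maximum(0, x - 1) + 0.5 * np.maximum(0, 1 - x) + rng.normal(0, 0.1, size=500)

c = 1.0  # assumed knot location
H = np.column_stack([np.maximum(0, x - c),   # max(0, x - c)
                     np.maximum(0, c - x)])  # max(0, c - x)

fit = LinearRegression().fit(H, y)
print("k1, k2 =", fit.coef_.round(2))  # recovers roughly (2, 0.5)
```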
KNN
▪KNN predicts new samples from the average value of their k nearest neighbours; the choice of k depends on the data set.
▪Can be used both for regression and classification
▪Pros:
▪ Easy and intuitive
▪Cons:
▪ Usually the fit is inferior to that of other models.
▪ Computational complexity: O(n·d + k·n) per prediction (brute-force search)
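A minimal sketch with scikit-learn (the data set and the values of k tried are arbitrary; features are scaled because KNN is distance-based):

```python
# Sketch: k-nearest-neighbours classification for a few values of k.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

for k in (1, 5, 15):
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    print(k, cross_val_score(knn, X, y, cv=5).mean().round(3))
```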
Hierarchical Clustering
General algorithm:
1. A distance metric is defined
2. All n·(n − 1)/2 distance pairs are computed
3. Closest pairs are combined and the algorithm starts again
▪Pros:
▪ Different distance metrics can be used
▪ Visual output (dendrogram)
▪Cons:
▪ O(n³·d) vs the O(n·k·d) of k-means
▪ Subjective choice of distance threshold to define the cluster.
Hierarchical Clustering: dendrogram
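A minimal sketch of the algorithm with scipy (synthetic two-group data; Ward linkage and the cut at 2 clusters are arbitrary choices, and the dendrogram plot needs matplotlib):

```python
# Sketch: agglomerative clustering -- pairwise distances, iterative
# merging of closest pairs, dendrogram as visual output.
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])

d = pdist(X, metric="euclidean")   # steps 1-2: all n(n-1)/2 distances
Z = linkage(d, method="ward")      # step 3: repeatedly merge closest pairs
labels = fcluster(Z, t=2, criterion="maxclust")  # subjective cut: 2 clusters
dendrogram(Z)                      # the visual output (requires matplotlib)
```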
ARULES
Market basket analysis. Typically used to suggest the most probable element that completes a set (e.g.
different insurance covers for personal business).
It infers rules from a binary transaction set based on probabilistic rules.
It can infer if-then rules of the form «If you own A and B, then you may be interested in C».
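arules is an R package; a comparable minimal sketch in Python uses mlxtend (the transaction table and cover names below are hypothetical toy data):

```python
# Sketch: if-then association rules from a binary transaction table.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Toy data: which insurance covers each client owns (hypothetical).
df = pd.DataFrame(
    {"motor": [1, 1, 1, 0, 1], "home": [1, 1, 0, 1, 1], "life": [1, 1, 0, 0, 1]},
    dtype=bool,
)

frequent = apriori(df, min_support=0.4, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```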
TOOLS: H2O
▪Java-based ML library that efficiently implements optimized versions of broadly used ML algorithms:
▪ Supervised: GLM, Random Forest, GBM, XGBoost, Deep Learning, Naive Bayes and Stacked
Ensemble.
▪ Unsupervised: PCA, KMEANS, GLRM
▪Features:
▪ Open-source tool;
▪ It interfaces with R and Python using dedicated libraries;
▪ Docs: http://docs.h2o.ai/
▪It can be used:
▪ On desktop workstation (using multicore approach);
▪ On PC clusters;
▪ A dedicated version implements GPU calculations.
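A minimal sketch of the Python interface (the file name, predictors and response are hypothetical; a local Java runtime is assumed):

```python
# Sketch: training an H2O GBM from Python.
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator

h2o.init()  # starts a local (multicore) H2O cluster

frame = h2o.import_file("claims.csv")            # hypothetical data set
train, test = frame.split_frame(ratios=[0.8], seed=1)

gbm = H2OGradientBoostingEstimator(ntrees=100, seed=1)
gbm.train(x=["age", "region", "vehicle_type"],   # hypothetical predictors
          y="claim_count",                       # hypothetical response
          training_frame=train)
print(gbm.model_performance(test))
```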
TOOLS: ML Models wrappers
• ML libraries that allow a tidy implementation of ETL, model tuning and performance
assessment.
• Different ML models can be fit and compared using a unified approach.
• Available libraries are:
• R: caret and mlr
• Python: scikit-learn
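A minimal sketch of the unified approach with scikit-learn (the data set, model and tuning grid are arbitrary):

```python
# Sketch: preprocessing + model in one Pipeline, tuned and assessed
# through the same unified scikit-learn API.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([("scale", StandardScaler()),
                 ("model", GradientBoostingClassifier(random_state=0))])

grid = GridSearchCV(pipe,
                    param_grid={"model__n_estimators": [100, 300],
                                "model__learning_rate": [0.05, 0.1]},
                    cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.score(X_test, y_test))
```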
ML Interpretability
▪The opacity of ML models has negatively affected their diffusion and popularity in many contexts,
despite the significantly superior performance they often offer compared to traditional methods.
▪Thus, recent research has focused on implementing algorithms that ease model assessment and
interpretability (both global and local). Statistical libraries that implement such algorithms are LIME
and DALEX.
▪Tools to ease interpretability are:
▪ Residuals distribution;
▪ Variable importance analysis;
▪ Partial dependency plots;
▪ Predictions breakdown
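LIME and DALEX are the libraries named above; a comparable minimal sketch with scikit-learn's own model-agnostic tools covers two of the listed techniques, variable importance and a partial dependence plot (the data set and model are arbitrary, and the plot needs matplotlib):

```python
# Sketch: variable importance analysis and a partial dependence plot.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import PartialDependenceDisplay, permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# Variable importance: drop in score when one column is shuffled.
imp = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
print(dict(zip(X.columns, imp.importances_mean.round(3))))

# Partial dependence: marginal effect of a single predictor.
PartialDependenceDisplay.from_estimator(model, X_test, ["bmi"])
```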
ML Interpretability: residuals analysis
ML interpretability: variable importance analysis
ML interpretability: marginal effects plot
(Figures: marginal effects plot; partial groups of categorical predictors)