MachineLearningSparkML.pptx
Machine Learning with Spark MLlib
Manuel Martín Márquez
Antonio Romero Marin
Joeri Hermans
Hadoop Tutorials
Machine Learning (ML)
• ML is a branch of artificial intelligence:
• Uses computer-based systems to make sense of data
• Extracting patterns, fitting data to functions, classifying data, etc.
• ML systems can learn and improve
• With historical data, time and experience
• Bridges theoretical computer science and real, noisy data.
ML in real life
Supervised and Unsupervised Learning
• Unsupervised Learning
• There is no predefined, known set of outcomes
• Looks for hidden patterns and relations in the data
• A typical example: Clustering
Figure: scatter plot of the iris data, Petal.Length (x-axis, 2 to 6) against Petal.Width (y-axis, 0.0 to 2.5), with points colored by irisCluster$cluster (1, 2, 3)
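As a rough illustration of clustering, the sketch below runs k-means on a handful of made-up (petal length, petal width) pairs. k-means is an assumption here, since the slide does not name the algorithm, and the points, the starting centroids and the function name `kmeans` are all hypothetical. It is plain Python, not Spark MLlib.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: recompute each centroid as the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Made-up (petal length, petal width) measurements, for illustration only.
points = [(1.4, 0.2), (1.5, 0.3), (1.3, 0.2),
          (4.5, 1.5), (4.7, 1.4), (4.4, 1.3),
          (6.0, 2.5), (5.9, 2.1), (6.1, 2.3)]
labels, centroids = kmeans(points, centroids=[points[0], points[3], points[6]])
print(labels)
```

No outcome column is passed in: the groups come only from the distances between points, which is what makes this unsupervised.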
Supervised and Unsupervised Learning
• Supervised Learning
• For every example in the data there is always a predefined outcome
• Models the relations between a set of descriptive features and a target (fits data to a function)
• Two groups of problems:
• Classification
• Regression
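"Fits data to a function" can be made concrete with the simplest regression case: a straight line fitted by ordinary least squares. The sketch below is plain Python rather than Spark MLlib, and the function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up feature/target pairs that lie exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
prediction = slope * 4.0 + intercept  # predict the target for an unseen x
print(slope, intercept, prediction)
```

Here the known targets `ys` play the role of the predefined outcome, and the fitted line is the function used to predict new values.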
Supervised Learning
• Classification
• Predicts which class (a discrete value) a given sample of data, described by its descriptive features, belongs to.
• Regression
• Predicts continuous values.
Figure: confusion matrix for the iris classes setosa, versicolor and virginica (Actual vs. Predicted), shaded by Percent on a 0 to 100 scale. Cell values shown: 100.0, 96.0, 4.0 and 0.0.
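A classifier's predictions are commonly summarised as a confusion matrix of actual against predicted classes, expressed here as row percentages. The sketch below is plain Python, and the label lists are invented for illustration; they are not the data behind the slide's figure.

```python
from collections import Counter

def confusion_percent(actual, predicted, classes):
    """Return {actual_class: {predicted_class: percent of that actual class}}."""
    counts = Counter(zip(actual, predicted))
    totals = Counter(actual)
    return {
        a: {p: 100.0 * counts[(a, p)] / totals[a] for p in classes}
        for a in classes
    }

classes = ["setosa", "versicolor", "virginica"]
# Invented labels: 4 samples per class, one versicolor misclassified.
actual = ["setosa"] * 4 + ["versicolor"] * 4 + ["virginica"] * 4
predicted = (["setosa"] * 4
             + ["versicolor"] * 3 + ["virginica"]
             + ["virginica"] * 4)
matrix = confusion_percent(actual, predicted, classes)
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
for a in classes:
    print(a, matrix[a])
```

Each row sums to 100, so the diagonal entries read directly as the share of each class that was predicted correctly.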
Machine Learning as a Process
• Define Objectives
- Define measurable and quantifiable goals
- Use this stage to learn about the problem
• Data Preparation
- Normalization
- Transformation
- Missing Values
- Outliers
• Model Building
- Data Splitting
- Feature Engineering
- Estimating Performance
- Evaluation and Model Selection
• Model Evaluation
- Study the models' accuracy
- Do they work better than the naïve approach or the previous system?
- Do the results make sense in the context of the problem?
• Model Deployment
ML as a Process: Data Preparation
• Needed for several reasons
• Some models have strict data requirements
• Scale of the data, data point intervals, etc.
• Some characteristics of the data may dramatically affect model performance
• Time on data preparation should not be underestimated
• Raw Data
- Missing Values
- Error Values
- Different Scales
- Dimensionality
- Type Problems
- Many others
• Data Transformation
- Scaling
- Centering
- Skewness
- Outliers
- Missing Values
- Errors
• Data Ready for the modeling phase
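Two of the steps named on this slide, handling missing values and centering/scaling, can be sketched in plain Python as follows. Mean imputation and z-score standardization are assumed choices here (the slide does not prescribe a method), and the function names and values are hypothetical.

```python
import statistics

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]

def standardize(values):
    """Center to mean 0 and scale to unit (population) standard deviation.
    Assumes the column is not constant, otherwise the deviation is 0."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

raw = [2.0, None, 4.0, 6.0]   # made-up feature column with one missing value
filled = impute_mean(raw)      # [2.0, 4.0, 4.0, 6.0]
scaled = standardize(filled)
print(filled, scaled)
```

After this step every feature is on a comparable scale, which matters for models that are sensitive to the scale of their inputs.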
ML as a Process: Feature engineering
• Determining the predictors (features) to be used is one of the most critical questions
• Sometimes we need to add predictors
• Reduce the number:
• Fewer predictors give a more interpretable and less costly model
• Most models are affected by high dimensionality, especially by non-informative predictors
• Binning predictors
• Wrappers
- Multiple models, adding and removing parameters
- Algorithms that use models as input and performance as output
- Genetic algorithms
• Filters
- Evaluate the relevance of the predictor
- Normally based on correlations
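A filter in the sense described above can be as simple as scoring each predictor by the absolute value of its Pearson correlation with the target and keeping those above a threshold. The sketch below is plain Python; the threshold of 0.5, the predictor names and the data are all made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def filter_predictors(predictors, target, threshold=0.5):
    """Keep predictors whose |correlation| with the target exceeds the threshold."""
    scores = {name: pearson(col, target) for name, col in predictors.items()}
    kept = [name for name, r in scores.items() if abs(r) > threshold]
    return kept, scores

target = [1.0, 2.0, 3.0, 4.0, 5.0]
predictors = {
    "informative": [2.1, 3.9, 6.2, 8.0, 9.8],   # roughly 2 * target
    "inverse":     [5.0, 4.0, 3.0, 2.0, 1.0],   # decreases linearly with target
    "noise":       [3.0, 1.0, 3.0, 1.0, 3.0],   # uncorrelated with target
}
kept, scores = filter_predictors(predictors, target)
print(kept, scores)
```

Unlike a wrapper, this never trains a model: each predictor is scored on its own, which is cheap but can miss predictors that only help in combination.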
ML as a Process: Model Building
• Data Splitting
• Allocate data to different tasks
• model training
• performance evaluation
• Define Training, Validation and Test sets
• Feature Selection (review the decisions made previously)
• Estimating Performance
• Visualization of results: discover interesting areas of the problem space
• Statistics and performance measures
• Evaluation and Model selection
• The ‘no free lunch’ theorem: no a priori assumptions can be made
• Avoid favorite models if needed
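The split into training, validation and test sets listed under Data Splitting can be sketched in plain Python as below. The 60/20/20 proportions, the fixed seed and the function name `split_data` are assumptions made for illustration, not values taken from the slides.

```python
import random

def split_data(rows, train=0.6, validation=0.2, seed=42):
    """Shuffle a copy of rows and cut it into train/validation/test parts.
    Whatever is left after train and validation becomes the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

rows = list(range(10))  # stand-in for 10 data records
train_set, validation_set, test_set = split_data(rows)
print(len(train_set), len(validation_set), len(test_set))
```

A typical use is to fit candidate models on the training part, compare them on the validation part, and touch the test part only once to estimate the performance of the model finally selected.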
• Thank you
11/8/2022 Document reference 12

Editor's Notes

  • #6: ML methods fall into two learning types. Unsupervised: suppose you want to segment your customers into general categories of people with similar buying patterns.
  • #7: More formally, fits data to a function or a function approximation
  • #8: More formally, fits data to a function or a function approximation
  • #9: More formally, fits data to a function or a function approximation. Adding roles
  • #10: Add Examples
  • #11: Random Forest (tree-based), MARS and LASSO internally perform predictor selection. Add examples
  • #12: There is no single model that works better than any other a priori