Lecture 2.2 Example Data Preparation Feature Engineering

The document provides an overview of machine learning, explaining its goal of learning patterns from examples and generalizing them to new instances. It distinguishes between supervised and unsupervised learning, detailing the processes involved in model fitting, including data splitting, tuning, and evaluation. Additionally, it discusses various algorithms used in supervised learning and emphasizes the importance of avoiding overfitting while maximizing model performance.

Uploaded by

revaldochetie092

Feature Engineering

What is Machine Learning?

Simple definition: How machines learn rules from examples.

Goal of any machine learning:

• Learn patterns from examples

• Be able to generalize them to new examples

Supervised and unsupervised machine learning:

→ In both cases learning is achieved through examples!

What is Machine Learning?
Formal definition: A program is said to learn from experience E with regard to task T and performance measure P if its performance on task T, as measured by P, improves with experience E.

#   Task                              Experience                   Performance measure
1   Recognize handwritten digits      Set of digits with labels    Percent of correct recognitions
2   Predict length of hospital stay   Patient histories            Mean prediction error
3   Recommend Netflix shows           Viewing histories            # users viewing show
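The first two performance measures in the table can be computed in a few lines. This is a minimal sketch: the function names and the sample values are made up for illustration, and "mean prediction error" is assumed here to mean the mean absolute error.

```python
def percent_correct(predicted, actual):
    """Share of predictions that match the true labels, in percent."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)


def mean_absolute_error(predicted, actual):
    """Average size of the prediction error, ignoring its sign (an assumed reading of 'mean prediction error')."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Task 1: handwritten digits (hypothetical labels, 4 of 5 predicted correctly)
digits_true = [3, 7, 1, 0, 9]
digits_pred = [3, 7, 7, 0, 9]
print(percent_correct(digits_pred, digits_true))

# Task 2: length of hospital stay in days (hypothetical values, errors of 1, 0, 2, 0)
stay_true = [2, 5, 10, 4]
stay_pred = [3, 5, 8, 4]
print(mean_absolute_error(stay_pred, stay_true))
```

A program "learns" in the sense of the formal definition if these numbers improve as it sees more examples.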
Some motivating examples

• Early detection of disease outbreaks
• Identifying vulnerable buildings for retrofit
• Predicting transport demand
• Preventing violent crime
• Reducing CO2 emissions
• Targeting fire risk inspections

Acknowledgment: D. Neill, Machine Learning for Cities, CUSP NYU

An approach to model fitting
Five main steps:

1. Use Case & Data: Determine question of interest, get informative data.
2. Model training: Split data into training and test sets. Fit model to training set.
3. Tune (calibrate): Tune model parameters.
4. Predict: Use the tuned model to form predictions about your test set.
5. Evaluate: Compare the predictions with the actual values for the test set.

Variables of interest are either categorical (handled by classification) or numerical (handled by regression).
Unsupervised Learning
• The only thing we have is input data.
• Labels are not provided by a supervisor.

What does an unsupervised algorithm do?

Extract patterns in the data.
Create clusters whose members are similar (based on some set of measurements).

→ Example: Take raw data on visitors to my website.

→ Segment them into groups that share the same characteristics; target ads.
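The segmentation idea can be sketched with k-means, one common clustering algorithm (the slides do not name a specific one). The example below is a simplified one-dimensional version; the function names and the visitor data (minutes spent on the site) are made up for illustration.

```python
def assign(values, centres):
    """Put every value into the cluster of its nearest centre."""
    clusters = [[] for _ in centres]
    for v in values:
        nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
        clusters[nearest].append(v)
    return clusters


def k_means_1d(values, k, iterations=20):
    """Group numbers into k clusters by repeatedly moving each centre to the mean of its members."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centres across the sorted data.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = assign(values, centres)
        # Keep the old centre if a cluster ends up empty.
        centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
    return centres, assign(values, centres)


# Hypothetical data: minutes each visitor spent on the website. No labels are given.
minutes_on_site = [1, 2, 3, 2, 30, 32, 29, 31]
centres, segments = k_means_1d(minutes_on_site, k=2)
print(centres)   # one centre for short visits, one for long visits
print(segments)
```

Note that the algorithm is never told which visitors are "casual" or "engaged"; the two groups emerge from the measurements alone.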
Supervised Learning
• The machine learns from examples that have already been labelled.
• Each example has input values (attributes) and an output value.

Example: A spam classifier learns rules from this training set of emails.

Goal:
→ Use known output values to learn the patterns of the input.
→ Predict the output value of new examples.

Image credit: Géron, Hands-On Machine Learning
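As a rough illustration of the spam example, the sketch below learns word frequencies from a handful of labelled emails and uses them to label new ones. It uses a small naive Bayes classifier with add-one smoothing, which is one possible technique and not necessarily the one behind the figure; the emails and function names are invented, and the code assumes both labels appear in the training set.

```python
import math
from collections import Counter


def train(examples):
    """examples: list of (text, label) pairs, label is 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocabulary


def predict(model, text):
    """Return the label with the higher log-probability score."""
    word_counts, label_counts, vocabulary = model
    total = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)
        for word in text.lower().split():
            if word in vocabulary:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / (n_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical labelled training set (inputs plus known outputs).
training_set = [
    ("win a free prize now", "spam"),
    ("free money win big", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train(training_set)
print(predict(model, "free prize inside"))
print(predict(model, "team meeting on monday"))
```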

Supervised Learning algorithms

Linear regression
• Models output as linear combination of inputs
→ Fast to train, effective on high-dimensional data.
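A minimal sketch of linear regression with a single input, fitted by ordinary least squares. With several inputs the output is a weighted sum of all of them; one input is used here only to keep the arithmetic visible. The function name and data points are made up.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data that lies exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)

# Predict the output for a new, unseen input.
print(slope * 5 + intercept)
```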

Support Vector Machines

• Learns a decision boundary (linear or non-linear)
→ Suits complex, medium-size datasets

Decision trees and Random Forest

• Builds flow-chart style rules that maximize information gain
→ High predictive power, requires less data preparation.
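Information gain, the quantity a decision tree tries to maximize when choosing a rule, can be computed as the drop in entropy after a split. The sketch below assumes entropy measured in bits; the labels and the two candidate splits are made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Impurity of a set of labels in bits: 0 when all labels agree."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(labels, groups):
    """Entropy before the split minus the size-weighted entropy after it."""
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - remainder


# Hypothetical target values for four commuters.
modes = ["bike", "bike", "car", "car"]

# A rule that separates the classes perfectly gains the full 1 bit.
print(information_gain(modes, [["bike", "bike"], ["car", "car"]]))

# A rule that leaves both groups mixed gains nothing.
print(information_gain(modes, [["bike", "car"], ["bike", "car"]]))
```

A tree-building algorithm would evaluate many candidate rules this way and keep the one with the largest gain at each node.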

Neural networks
• Algorithms inspired by structure and function of the brain.
• Scalable, highly accurate on tasks like image recognition.
Building a model
Build labeled dataset for question of interest
Split training and test data

When fitting ML algorithms, it is common to separate data into training and test sets.

Split the dataset (e.g. 70/30 ratio):
• Training set (70% of records): build model on the training set
• Test set (30%): evaluate model on the test set

Image credit: D. Ziganto “Standard Deviations” blog
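A 70/30 split can be sketched with the standard library alone. The function name, the fixed seed, and the placeholder records are assumptions for illustration; shuffling first avoids a split that simply follows the order in which the data was collected.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and cut it into (training set, test set)."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset of 100 records, identified here only by a number.
dataset = list(range(100))
training_set, test_set = train_test_split(dataset)
print(len(training_set), len(test_set))
```

The model is built only on `training_set`; `test_set` is held back until evaluation.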

Complexity vs. accuracy
We can build models of lower or higher complexity by
changing their hyper-parameters.
Aim for the ‘sweet spot’ that maximizes performance but
avoids overfitting.

* Overfitting: a complex model that memorizes the training set (including noise in it)
but fails to generalize to new data.
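The effect of a hyper-parameter on overfitting can be illustrated with a k-nearest-neighbours classifier, chosen here only because it is short to write (the slides do not prescribe a model). With k = 1 the model is at its most complex and reproduces every training label, including the noisy ones. The data below is invented: the true rule is "label 1 if x > 5", and two training points are deliberately mislabelled.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, data, k):
    """Fraction of points in data that the k-NN model labels correctly."""
    hits = sum(knn_predict(train, x, k) == label for x, label in data)
    return hits / len(data)


# Hypothetical training set; the points at x = 3 and x = 8 carry noisy labels.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 1), (8, 0), (9, 1)]
# Hypothetical held-out points with correct labels.
held_out = [(2.6, 0), (3.4, 0), (7.6, 1), (8.4, 1), (1.5, 0), (9.5, 1)]

for k in (1, 3):
    print(k, accuracy(train, train, k), accuracy(train, held_out, k))
```

On this toy data, k = 1 scores perfectly on the training set but poorly on the held-out points, while k = 3 gives up some training accuracy and generalizes better. Comparing settings on data the model has not seen is what locates the 'sweet spot'.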
Make predictions

With the model tuned and fitted to training data, we can predict outcomes for the test set, check that its performance is satisfactory, and deploy.

Figure: Object detection in images

Image: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (2016)
EXAMPLE: Predicting mode of transport and music taste
Decision tree model for transport planning
Scenario: The World Bank has hired a cohort of 100 new staff, who start this
summer. GSD needs to decide how many bike racks or parking spaces to build for
them.

Attributes (𝑿𝟏 … 𝑿𝑵 ) Target variable (y)

From this training set, construct a set of rules to predict the mode of transport for unseen examples.
Decision tree model for transport planning
Scenario: The World Bank has hired a talented cohort of 100 new staff, who start
after Thanksgiving. GSD needs to decide how many bike racks or parking spaces to
build for them.

Mix of home states and ages

High enjoyment of Netflix
