Programming for Data Science
Lecture 7 – Supervised Learning, Continued.
Thomas Lavastida
University of Texas at Dallas
[email protected]
Spring 2023
Agenda
• Assignment 2 Review
• Quick review of Supervised Learning and Linear Regression
• Linear Regression in Python
• Start Regularization and Cross Validation
Assignment 2 Review
Supervised Learning and Regression Review
Supervised Learning
• Given – labelled data points $(x_1, y_1), \dots, (x_n, y_n)$
• $x_i$ – features, independent variables, predictors, columns, etc.
• $y_i$ – target, dependent variable, outcome, etc.
• Continuous $y$ -> then we call this regression
• Discrete/categorical $y$ -> then we call this classification
• Goal: Find a mapping/function $f$ from $x$’s to $y$’s such that $f(x_i) \approx y_i$
Linear Regression
• Simple class of regression models
• Let $x_1, \dots, x_k$ be the independent variables
• Model parameters $\omega_1, \dots, \omega_k$ (one for each indep. variable), plus an intercept $b$
• Predicted outcome computed via a linear function: $\hat{y} = \omega_1 x_1 + \dots + \omega_k x_k + b$
• Compute the $\omega$’s (and $b$) by minimizing the average squared error
Overfitting
• As a model gets more complex, it can fit the data in hand more closely
• New data we see (and want to make predictions about) may not be fit well (i.e., high error)
• This is called overfitting
• Main idea to deal with this -> split into a training set and a test set
• Training set – used to compute the model parameters
• Test set – used to estimate the accuracy of the model on new data
PYTHON PRACTICE
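A minimal sketch of what this practice could look like (the data here are synthetic and the variable names are illustrative, not the actual in-class demo): fit scikit-learn's LinearRegression after a train/test split and compare training vs. test MSE.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 2))                      # two predictors
y = 1.5 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Hold out 30% of the rows as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LinearRegression().fit(X_train, y_train)           # estimates intercept b and coefficients w
print("intercept:", model.intercept_, "coefficients:", model.coef_)
print("train MSE:", mean_squared_error(y_train, model.predict(X_train)))
print("test MSE: ", mean_squared_error(y_test, model.predict(X_test)))

The training MSE is computed on the same data used to estimate the parameters; the test MSE is the honest estimate of performance on new data.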
Review: Overfitting
• Model with an overfitting problem
• Nice performance on the data in hand
• Poor predictive accuracy on new data
• Solution 1 – Splitting data
• Training set: train the model (get parameters)
• Test set: evaluate performance
• Solution 2 – Regularization
Regularization – Intuition
• Overfitting occurrence: too many variables
• True relationship: $y = \beta_0 + \beta_1 x + \varepsilon$
• Fit the data with a 10th-degree polynomial instead:
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_{10} x^{10} + \varepsilon$
• Remedy: fewer variables
Regularization – Intuition (Cont.)
• Overfitting occurrence: Large variance/fluctuation
• Large coefficient => large fluctuation
• Under the same scale
• Green: $f(x) = -x^4 + 7x^3 - 5x^2 - 31x + 30$
• Blue: $g(x) = -\tfrac{1}{5} f(x)$
• Remedy: smaller coefficients
https://www.datacamp.com/community/tutorials/towards-preventing-overfitting-regularization
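A small illustration of the same intuition (synthetic data, not the tutorial's figures): fitting a 10th-degree polynomial to data generated from a linear relationship drives the in-sample error toward zero while inflating the coefficients.

import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 15)
y = 2.0 + 3.0 * x + rng.normal(scale=0.3, size=x.size)     # true relationship is linear

coef_deg1 = np.polyfit(x, y, deg=1)                        # 2 coefficients
coef_deg10 = np.polyfit(x, y, deg=10)                      # 11 coefficients

mse1 = np.mean((np.polyval(coef_deg1, x) - y) ** 2)
mse10 = np.mean((np.polyval(coef_deg10, x) - y) ** 2)
print("degree-1 in-sample MSE: ", mse1)                    # roughly the noise level
print("degree-10 in-sample MSE:", mse10)                   # much smaller: the curve chases the noise
print("largest degree-10 coefficient magnitude:", np.abs(coef_deg10).max())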
Regularization – Intuition (Cont.)
• What we need
• Smaller coefficients (coefficients closer to 0)
• Fewer variables (some coefficients exactly 0)
• Penalize the magnitude of coefficients
• Regularization
• Modify our original linear regression model
• Add terms to penalize the magnitude of coefficients
Regularization
• Linear regression (fit only)
• Minimize the error between the actual and predicted values:
$f(\boldsymbol{\omega}) = \sum_{i=1}^{n} \left( y_i - (\boldsymbol{\omega} \cdot x_i + b) \right)^2$
• Regularization (fit + control of overfitting)
• Minimize the error between the predicted and actual values
• Penalize the magnitude of the feature coefficients:
$f(\boldsymbol{\omega}) = \sum_{i=1}^{n} \left( y_i - (\boldsymbol{\omega} \cdot x_i + b) \right)^2 + \mathrm{Penalty}(\boldsymbol{\omega})$
Regularization – Two Methods
$f(\boldsymbol{\omega}) = \sum_{i=1}^{n} \left( y_i - (\boldsymbol{\omega} \cdot x_i + b) \right)^2 + \mathrm{Penalty}(\boldsymbol{\omega})$
• The $\mathrm{Penalty}(\boldsymbol{\omega})$ term is the shrinkage penalty
• Two formulations of the shrinkage penalty
• L2 regularization: penalty equal to the sum of squared coefficient magnitudes => Ridge regression
• L1 regularization: penalty equal to the sum of absolute coefficient magnitudes => Lasso regression
Ridge Regression
• Linear regression with L2 regularization (squares of the parameters)
• Minimize the function:
$f(\boldsymbol{\omega}) = \sum_{i=1}^{n} \left( y_i - (\boldsymbol{\omega} \cdot x_i + b) \right)^2 + \lambda \sum_{j=1}^{k} \omega_j^2$, where $\lambda \ge 0$
• The second term is the shrinkage penalty
• Larger coefficient magnitudes increase the amount of penalty
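A minimal ridge regression sketch, assuming scikit-learn (its alpha argument plays the role of $\lambda$; the data and alpha value are illustrative):

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.5, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)                        # larger alpha => stronger shrinkage
print("OLS coefficients:  ", np.round(ols.coef_, 3))
print("Ridge coefficients:", np.round(ridge.coef_, 3))     # shrunk toward 0, but rarely exactly 0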
Ridge Regression – Tuning Parameter
$f(\boldsymbol{\omega}) = \sum_{i=1}^{n} \left( y_i - (\boldsymbol{\omega} \cdot x_i + b) \right)^2 + \lambda \sum_{j=1}^{k} \omega_j^2$
• $\lambda$ controls the amount of penalty
• $\lambda = 0$ => plain linear regression
• $\lambda \to \infty$ => all coefficients pushed toward zero
• Higher $\lambda$, more penalty, smaller coefficients
• $\lambda$ – a hyperparameter
• NOT estimated together with the other parameters
• Set “manually” before model estimation (the effect of different values is sketched below)
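An illustrative look at the $\lambda$ effect (synthetic data, an arbitrary alpha grid): as alpha grows, the ridge coefficients shrink toward zero.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([4.0, -3.0, 2.0]) + rng.normal(scale=0.5, size=100)

for alpha in [0.001, 1.0, 100.0, 10000.0]:                 # alpha near 0 behaves like plain least squares
    coefs = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:>8}: {np.round(coefs, 3)}")
# Coefficients shrink toward (but not exactly to) zero as alpha grows.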
LASSO
• Linear regression with L1 regularization (absolute values of the parameters)
$f(\boldsymbol{\omega}) = \sum_{i=1}^{n} \left( y_i - (\boldsymbol{\omega} \cdot x_i + b) \right)^2 + \lambda \sum_{j=1}^{k} |\omega_j|$, where $\lambda \ge 0$
• The L1 penalty can force some coefficient estimates to be exactly zero
• Combines the shrinkage advantage of ridge regression with variable selection
• LASSO: Least Absolute Shrinkage and Selection Operator
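A hedged LASSO sketch on synthetic data (the alpha value is arbitrary), showing the exact zeros that give the variable-selection effect:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)   # only 2 of 8 features matter

lasso = Lasso(alpha=0.1).fit(X, y)
print(np.round(lasso.coef_, 3))    # most of the 8 entries come out exactly 0.0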
Hyperparameter Tuning and Cross Validation
Hyperparameter Tuning
• Hyperparameters – set before running the model
• Examples
• LASSO and Ridge – the penalty weight $\lambda$
• Polynomial regression – the degree of the polynomial ($n$)
• Intuition of tuning (polynomial case; see the sketch after this list)
• Start with some potential values, e.g., $n = 1, 2, 3, \dots$
• For each $n$, run the model
• Select the model with the best performance
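One possible version of that tuning loop (synthetic data, an illustrative degree grid, and a simple held-out split rather than CV):

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=(150, 1))
y = 1.0 + 2.0 * x[:, 0] - 1.5 * x[:, 0] ** 2 + rng.normal(scale=0.3, size=150)

x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.3, random_state=0)

for degree in [1, 2, 3, 5, 10]:                            # candidate hyperparameter values
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    mse = mean_squared_error(y_val, model.predict(x_val))
    print(f"degree {degree:2d}: held-out MSE = {mse:.4f}")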
Tuning Method – Grid Search
• Try all possible hyperparameters of interest
• Most commonly used method for hyperparameter tuning
• Polynomial regression case
• Define a set of potential polynomial degrees
• Estimate, evaluate, choose
[Table: candidate polynomial degrees and their MSE values – the degree with the lowest MSE is the selected model]
• Select the model with best performance … on which dataset?
Data Splitting – Model Training
• [Diagram: the labeled data are split into a training set and a test set; the training set feeds model training (parameter estimates), the test set feeds prediction and evaluation]
• The performance measure (e.g., MSE) on the test set is unbiased, since the test set is untouched new data
• Model selection?
• For each model, get the performance measure on the test set
• Select the model with the best performance on the test data
• Problem
• The “best model”? Really just the best fit for the test set!
• We end up overfitting the test set
• Solution: more splits
Data Splitting – Model Selection
• [Diagram: the original training set is split again into a training set and a validation set; the test set stays aside]
• Training set: used for model training (parameter estimates)
• Validation set:
• Used for model selection (e.g., hyperparameter tuning)
• Test set:
• Untouched during training and selection
• Used for model assessment (generalizability)
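One common way to produce this three-way split with scikit-learn (the 60/20/20 proportions are only an example):

import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = rng.normal(size=1000)

# First carve off the untouched test set, then split the rest into training and validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)
print(len(X_train), len(X_val), len(X_test))               # 600 / 200 / 200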
Limitations of Single Splitting (Partition)
• Data waste: the model is trained and evaluated on less of the data
• If there is not enough data – unreliable results
• Small training set
• Small test set
• Solution: Cross Validation
K-Fold Cross Validation
• Randomly cut the dataset into $K$ segments (folds)
• Use the $k$-th segment as the test set, the rest as the training set
• Obtain $MSE_k$, the mean squared error on the $k$-th segment (test set)
• After $K$ iterations, calculate the mean of $MSE_1, \dots, MSE_K$
• [Diagram: with $K = 5$, each of the five segments takes a turn as the test fold, yielding $MSE_1, MSE_2, \dots, MSE_5$]
K-Fold Cross Validation
• No data put to waste
• Works even with a small dataset
• Involves more of the data in training the model
• Reliable: takes the mean of multiple $MSE_k$ estimates
• Model selection
• Uses more data to evaluate the performance of each model
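A minimal K-fold CV sketch using scikit-learn's cross_val_score (synthetic data, $K = 5$):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=100)

scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="neg_mean_squared_error")
fold_mse = -scores                 # scikit-learn reports negative MSE so that higher is better
print("per-fold MSE:", np.round(fold_mse, 3))
print("mean CV MSE: ", fold_mse.mean())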
CV for Model Selection
• Combine CV with grid search
• Example: polynomial regression, grid search over the degree, selection by CV
• Leave a portion aside as the test set
• Set a grid for the hyperparameter (let $n$ be the polynomial degree)
• Select the model via CV
• [Table: candidate degrees and their CV (MSE) scores – the degree with the lowest CV score is selected, and that model is then applied to the test set]
Grid Search with CV
• Manually set a grid of discrete hyperparameter values
• Set a metric for model performance
• Search exhaustively through the grid
• For each set of hyperparameters, evaluate each model’s CV score
• The optimal hyperparameters are those of the model achieving the best CV score
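A hedged GridSearchCV sketch, here tuning ridge regression's alpha over an illustrative grid with 5-fold CV and synthetic data:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = X @ np.array([3.0, -2.0, 1.0, 0.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}      # the grid is set manually
search = GridSearchCV(Ridge(), param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)                               # 5 alphas x 5 folds = 25 model fits
print("best alpha:", search.best_params_["alpha"])
print("test-set MSE of refit model:", -search.score(X_test, y_test))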
Tuning is expensive
• The model is run repeatedly
• A grid of N values with K-fold CV => N·K model fits
• Example: a grid of 20 values with 5-fold CV => 100 fits
• Computationally expensive
• Sometimes for only a very slight improvement
PYTHON PRACTICE