Foundations of Machine Learning
DSA 5102 • Lecture 3
Li Qianxiao
Department of Mathematics
Last time
So far, our hypothesis spaces have consisted of smooth functions, or a
simple sign function composed with a smooth function.
Today, we are going to look at another class of supervised
learning hypothesis spaces, consisting of piecewise constant
functions. We will also discuss how to combine them to form strong
classifiers and regressors.
Decision Trees
Should I go to DSA5102?
Is it a Tuesday evening?
• No → Do DSA5102 homework instead
• Yes → Go out with friends?
  • No friends → Go to DSA5102
  • Have friends → Do they wanna learn ML?
    • Yes → Go to DSA5102
    • No → Reconsider friendship options
Decision Tree Basics
(Figure: a tree of depth 3 illustrating the terminology: the root node at the top, internal nodes, leaf nodes, and the branches connecting them.)
Decision Trees
Decision trees are very simple and useful ways to build models.
Key ideas
• Stratify the input space into distinct, non-overlapping
regions
• Assign a chosen, constant prediction to each region
A One-Dimensional Example
Suppose we want to approximate some oracle function of a single input \( x \).
A depth-1 decision tree is the piecewise constant function
\[ f(x) = \begin{cases} a, & x \le \theta_0 \\ b, & x > \theta_0 \end{cases} \]
(Plot: \( f \) takes the value \( a \) to the left of \( x = \theta_0 \) and jumps to the value \( b \) to the right of it.)
This corresponds to the following decision tree: the root node tests \( x \le \theta_0 \); the branch \( x \le \theta_0 \) leads to the leaf \( a \), and the branch \( x > \theta_0 \) leads to the leaf \( b \).
We can also further split the input space to form deeper trees
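As a concrete illustration, here is a minimal sketch in Python/NumPy (our own toy code, not from the lecture; the stand-in oracle function and the threshold value are arbitrary choices) of evaluating a depth-1 tree and choosing its leaf values as label averages:

```python
import numpy as np

def fit_leaf_values(x, y, theta0):
    """Given a fixed threshold theta0, the best constants are the label means on each side."""
    a = y[x <= theta0].mean()   # left leaf value
    b = y[x > theta0].mean()    # right leaf value
    return a, b

def depth1_tree(x, theta0, a, b):
    """The piecewise constant function f(x) = a if x <= theta0, else b."""
    return np.where(x <= theta0, a, b)

# toy usage: approximate a stand-in oracle function on [0, 1]
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x)                      # stand-in oracle, not from the slides
a, b = fit_leaf_values(x, y, theta0=0.5)
print(a, b, depth1_tree(np.array([0.2, 0.8]), 0.5, a, b))
```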
Classification and Regression Trees
Suppose that the input space is \( \mathcal{X} \subset \mathbb{R}^d \). A partition of \( \mathcal{X} \) is a collection of
subsets \( R_1, \dots, R_M \subset \mathcal{X} \) such that
\[ R_i \cap R_j = \emptyset \ \text{ for } i \ne j, \qquad \bigcup_{j=1}^{M} R_j = \mathcal{X}. \]
The general decision tree hypothesis space is
\[ \mathcal{H} = \Big\{ f(x) = \sum_{j=1}^{M} c_j \, \mathbb{1}[x \in R_j] \ :\ \{R_j\} \text{ a partition of } \mathcal{X},\ c_j \in \mathbb{R} \Big\}. \]
In theory, we can consider very general partitions, but in practice
it is convenient to restrict to high-dimensional rectangles
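As a small illustration (our own representation, not code from the lecture), a tree over axis-aligned rectangles can be stored as a list of (rectangle, constant) pairs and evaluated as a piecewise constant function:

```python
import numpy as np

# Each region R_j is an axis-aligned rectangle (lower/upper bounds per dimension)
# with a constant prediction c_j; together the rectangles should partition the input space.
regions = [
    {"low": np.array([0.0, 0.0]), "high": np.array([0.5, 1.0]), "c": 1.0},
    {"low": np.array([0.5, 0.0]), "high": np.array([1.0, 1.0]), "c": -1.0},
]

def piecewise_predict(x, regions):
    """Return the constant c_j of the rectangle R_j containing x (half-open boxes)."""
    for r in regions:
        if np.all(r["low"] <= x) and np.all(x < r["high"]):
            return r["c"]
    raise ValueError("x is not covered by the partition")

print(piecewise_predict(np.array([0.25, 0.7]), regions))   # -> 1.0
print(piecewise_predict(np.array([0.75, 0.2]), regions))   # -> -1.0
```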
Learning Decision Trees
A decision tree model
\[ f(x) = \sum_{j=1}^{M} c_j \, \mathbb{1}[x \in R_j] \]
depends on both the partition \( \{R_j\} \) and the constants \( \{c_j\} \).
Given the partition \( \{R_j\} \), the constants \( \{c_j\} \) are easy to fix:
• Regression: we take the average label value in each region, \( c_j = \text{mean}\{\, y_i : x_i \in R_j \,\} \)
• Classification: we take the modal (most frequent) label value in each region
Suppose we are dealing with regression. Then we can fix \( \{c_j\} \) as
before and solve the following empirical risk minimization over partitions:
\[ \min_{\{R_j\}} \ \sum_{j=1}^{M} \sum_{i:\, x_i \in R_j} \big( y_i - c_j \big)^2 \]
Even restricting to rectangular partitions, this is very hard to solve!
(Figure: the Bell numbers, which count the number of partitions of a set, grow extremely quickly. Source: https://en.wikipedia.org/wiki/Bell_number)
Recursive Binary Splitting
Instead, we can resort to the following greedy algorithm, which
essentially repeats the following two steps:
1. Pick a dimension of the input space (randomly or …)
2. Find the best value to split this input dimension into
two parts and assign new constant values to these new
regions
This grows the tree by adding two leaf nodes at a time, and hence
the name recursive binary splitting
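A from-scratch sketch of recursive binary splitting for regression (a minimal implementation under our own choices of squared-error loss, a depth cap, and a minimum leaf size; not the exact code used in the course):

```python
import numpy as np

def best_split(X, y):
    """Search every input dimension and threshold for the split minimizing squared error."""
    best = None
    for j in range(X.shape[1]):                    # step 1: pick a dimension (here: try all)
        for theta in np.unique(X[:, j])[:-1]:      # step 2: candidate threshold values
            left = X[:, j] <= theta
            sse = ((y[left] - y[left].mean()) ** 2).sum() \
                + ((y[~left] - y[~left].mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, theta)
    return best  # None if no valid split exists

def grow_tree(X, y, depth=0, max_depth=3, min_samples=5):
    """Recursive binary splitting: each leaf predicts the mean label of its region."""
    split = None if (depth == max_depth or len(y) < min_samples) else best_split(X, y)
    if split is None:
        return {"leaf": y.mean()}
    _, j, theta = split
    left = X[:, j] <= theta
    return {"dim": j, "theta": theta,
            "left": grow_tree(X[left], y[left], depth + 1, max_depth, min_samples),
            "right": grow_tree(X[~left], y[~left], depth + 1, max_depth, min_samples)}

def tree_predict(node, x):
    """Route a single input x down the tree to its leaf value."""
    if "leaf" in node:
        return node["leaf"]
    child = node["left"] if x[node["dim"]] <= node["theta"] else node["right"]
    return tree_predict(child, x)

# toy usage
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1]
tree = grow_tree(X, y, max_depth=3)
print(tree_predict(tree, X[0]), y[0])
```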
Does this find the optimal solution?
Greedy vs Optimal Solution
Decision Trees for Classification
The greedy algorithm can be carried out analogously, except that
we need to define a proper loss function based on the class proportions, e.g. an impurity
measure such as the misclassification error, the Gini index \( \sum_k \hat p_{jk}(1 - \hat p_{jk}) \), or the cross-entropy,
where \( \hat p_{jk} \) is the proportion of samples in region \( R_j \) belonging to class \( k \).
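For instance, a sketch of the Gini impurity and the resulting size-weighted splitting criterion (one common choice, not necessarily the exact one used in the lecture):

```python
import numpy as np

def gini(labels):
    """Gini impurity of one region: sum_k p_hat_k * (1 - p_hat_k)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1.0 - p)))

def split_impurity(y_left, y_right):
    """Splitting criterion: impurity of the two child regions, weighted by their sizes."""
    n = len(y_left) + len(y_right)
    return len(y_left) / n * gini(y_left) + len(y_right) / n * gini(y_right)

print(gini(np.array([0, 0, 1, 1])))   # 0.5  (maximally mixed, two classes)
print(gini(np.array([1, 1, 1, 1])))   # 0.0  (pure region)
```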
Advantages and Disadvantages
Advantages:
• Can readily visualize and understand predictions
• Implicit feature selection via analyzing contribution of splits to
reduction of error/impurity
• Flexible across data types, supervised learning tasks (regression and
classification), and nonlinear relationships
Disadvantages:
• Greedy algorithms may find sub-optimal solutions
• Sensitive to data variation and balancing
• Prone to overfitting
Overfitting
The biggest drawback of decision trees is overfitting
Model Ensembling
Ensemble Methods
An effective way to reduce overfitting and increase approximation
power is to combine models. This is called model ensembling.
We will now introduce two classes of such methods
1. Bagging
2. Boosting
Bootstrap Aggregating (Bagging)
The first method for combining models is also the simplest: we
simply train models on random subsamples of the training data.
We can then combine them in the obvious way to make
predictions:
1. Regression: average the individual predictions, \( \hat f(x) = \frac{1}{B} \sum_{b=1}^{B} \hat f_b(x) \)
2. Classification: take a majority vote among the individual predictions \( \hat f_1(x), \dots, \hat f_B(x) \)
Example
Dataset: a training set of labelled samples
Subsample and train:
1. Draw a random subsample, train a model \( \hat f_1 \)
2. Draw another random subsample, train a model \( \hat f_2 \)
3. Draw another random subsample, train a model \( \hat f_3 \)
Aggregate: combine \( \hat f_1, \hat f_2, \hat f_3 \) as above (average for regression, majority vote for classification)
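A minimal bagging sketch (bootstrap subsampling with replacement; the base learner, its depth, and `n_models` are our own illustrative choices):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def bagging_fit(X, y, n_models=10):
    """Train each base model on a bootstrap subsample (drawn with replacement)."""
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(y), size=len(y))          # random subsample
        models.append(DecisionTreeRegressor(max_depth=3).fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Regression: average the individual predictions."""
    return np.mean([m.predict(X) for m in models], axis=0)

# toy usage
X = rng.uniform(0.0, 1.0, size=(200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.3 * rng.standard_normal(200)
models = bagging_fit(X, y)
print(bagging_predict(models, X[:5]))
```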
What does bagging do?
Consider a simple model where each trained predictor errs from the truth by a noise term:
\[ \hat f_b(x) = f^*(x) + \epsilon_b(x), \qquad b = 1, \dots, B \]
Assume the noise satisfies
• Zero mean and common variance: \( \mathbb{E}[\epsilon_b(x)] = 0 \) and \( \mathbb{E}[\epsilon_b(x)^2] = \sigma^2 \)
• Uncorrelated: \( \mathbb{E}[\epsilon_b(x)\,\epsilon_{b'}(x)] = 0 \) for \( b \ne b' \)
Form the aggregate model \( \hat f(x) = \frac{1}{B} \sum_{b=1}^{B} \hat f_b(x) \)
Define the errors \( E_{\text{single}} = \mathbb{E}[\epsilon_b(x)^2] \) and \( E_{\text{bag}} = \mathbb{E}[(\hat f(x) - f^*(x))^2] \)
We can show that \( E_{\text{bag}} = E_{\text{single}} / B \)
A significant reduction! But…
• What is the most unrealistic assumption?
• What happens when there is bias, so that \( \mathbb{E}[\epsilon_b(x)] \ne 0 \)?
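The \( \sigma^2 / B \) reduction can be checked numerically under exactly these assumptions (zero-mean, uncorrelated, equal-variance noise); a small simulation sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
B, sigma, n_trials = 10, 1.0, 100_000

# each "model" errs by independent zero-mean noise eps_b with variance sigma^2
eps = sigma * rng.standard_normal((n_trials, B))

single_model_error = np.mean(eps[:, 0] ** 2)     # ~ sigma^2
bagged_error = np.mean(eps.mean(axis=1) ** 2)    # ~ sigma^2 / B

print(single_model_error, bagged_error)          # approx 1.0 and 0.1
```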
Bias and Variance
(Figure: the error of \( \hat f \) relative to the oracle \( f^* \) decomposes into a variance part and a bias / approximation error part; bagging targets the variance part. What targets the bias part?)
Boosting
Unlike bagging whose purpose is to reduce variance, boosting
aims to reduce bias.
It answers an important question in the affirmative:
Can weak learners be combined to form a strong learner?
We will introduce the simplest setting of the Adaptive Boosting or
AdaBoost algorithm
Key Ideas of AdaBoost
1. Initialize with uniform weights across all training samples
2. Train a classifier/regressor
3. Identify the samples that the model got wrong (classification) or on which it has large errors (regression)
4. Weight these samples more heavily and train again on this reweighted dataset
5. Repeat steps 3–4
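A minimal sketch of the classification version with decision stumps as weak learners (labels in {−1, +1}; this is our own simplified implementation of a standard AdaBoost update, not necessarily the exact variant presented in class):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=20):
    """AdaBoost with depth-1 trees (stumps); labels y must take values in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                          # step 1: uniform sample weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)             # step 2: train on the weighted data
        pred = stump.predict(X)
        err = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)   # step 3: weighted error
        alpha = 0.5 * np.log((1.0 - err) / err)
        w = w * np.exp(-alpha * y * pred)            # step 4: upweight misclassified samples
        w = w / w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)

def adaboost_predict(stumps, alphas, X):
    """Strong classifier: sign of the alpha-weighted vote of the weak learners."""
    scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(scores)

# toy usage with labels in {-1, +1}
X = np.array([[0.1], [0.35], [0.4], [0.6], [0.8], [0.9]])
y = np.array([-1, -1, -1, 1, 1, 1])
stumps, alphas = adaboost_fit(X, y, n_rounds=5)
print(adaboost_predict(stumps, alphas, X))
```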
Bagging reduces variance
Boosting reduces bias
Demo: Decision Trees with Ensembling
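The live demo itself is not reproduced here; a hedged sketch of what such a comparison might look like with scikit-learn (the dataset, models, and hyperparameters below are our own placeholder choices):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "single deep tree": DecisionTreeClassifier(random_state=0),
    "bagged trees": BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0),
    "AdaBoost (stumps)": AdaBoostClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name:18s} train acc = {model.score(X_tr, y_tr):.2f}, "
          f"test acc = {model.score(X_te, y_te):.2f}")
```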
Cross Validation
Recall: test data is the ultimate test of our model performance
But, should we always rely on test data for model evaluation and
selection?
1. It is not always available
2. A single test set gives no averaged error estimate
3. We might “overfit” on test data
A more robust idea: cross validation
Given a training set, we can further split it into training and
validation datasets. In fact, we can do this \( K \) times:
K-Fold Cross-Validation
Illustration with K = 4:
Split 1: Validation | Training | Training | Training → Score 1
Split 2: Training | Validation | Training | Training → Score 2
Split 3: Training | Training | Validation | Training → Score 3
Split 4: Training | Training | Training | Validation → Score 4
Average the K scores to obtain the cross-validation score.
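A K-fold cross-validation sketch with scikit-learn (K = 4 to match the picture; the dataset and model are placeholder choices):

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=400, noise=0.3, random_state=0)

# each fold serves as validation data exactly once; train on the other K-1 folds
kfold = KFold(n_splits=4, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(max_depth=3), X, y, cv=kfold)

print("fold scores:  ", np.round(scores, 3))
print("average score:", scores.mean())
```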
Summary
Decision trees
• Piecewise constant predictors
• Learned by greedy algorithms (recursive binary splitting)
Model Ensembling
• Bagging: reduce variance
• Boosting: reduce bias
Evaluate models using cross validation
Homework and Project
Homework 2 is online (Due in 2 weeks)
Project
• Incorporate cross-validation as a more robust model
evaluation technique
• Ensembling methods
• Tune hyperparameters using cross validation
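For hyperparameter tuning with cross-validation, one convenient option is scikit-learn's `GridSearchCV` (a sketch only; the model, grid, and number of folds are our own example choices, not project requirements):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_moons(n_samples=400, noise=0.3, random_state=0)

# try every combination in the grid, scoring each by 5-fold cross-validation
param_grid = {"n_estimators": [50, 200], "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```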
Test
• Week 7