
Week 1 Lecture Slide and Notes

Ensemble methods in machine learning combine predictions from multiple models to enhance performance, based on the idea that a group of experts can outperform a single expert. Techniques like bagging and random forests utilize sampling with replacement to create diverse training subsets, improving robustness and accuracy. Decision trees, while prone to overfitting, can be stabilized by aggregating predictions from multiple trees.

Uploaded by subhadeepseal1
Copyright © All Rights Reserved

Ensemble Methods

• Ensembles are machine learning methods for combining predictions from multiple separate models.

• The central motivation is rooted in the belief that a committee of experts working together can perform better than a single expert.

[Figure: Training Data is used to fit Model-1, Model-2, Model-3, …, Model-n. On the Test Data, each model produces its own output (Prediction-1, Prediction-2, Prediction-3, …, Prediction-n), and these are merged into a Combined Prediction.]
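A minimal Python sketch of the combine step in the figure, assuming each fitted model is simply a function from an input to a class label and that the predictions are merged by majority vote (one common choice; averaging is another). The three threshold rules are hypothetical stand-ins for Model-1, Model-2 and Model-3, not models from the slides.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# Assumption: a fitted "model" is any callable mapping an input x to a label.
Model = Callable[[float], Hashable]


def ensemble_predict(models: Sequence[Model], x: float) -> Hashable:
    """Collect Prediction-1 ... Prediction-n for x and return the majority label."""
    predictions = [model(x) for model in models]
    # most_common(1) returns [(label, count)]; on a tie, the label seen first wins.
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical threshold rules standing in for Model-1, Model-2, Model-3.
models = [
    lambda x: "Y" if x > 0.4 else "N",
    lambda x: "Y" if x > 0.5 else "N",
    lambda x: "Y" if x > 0.7 else "N",
]

print(ensemble_predict(models, 0.6))  # Y: two of the three models say "Y"
```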
This file is meant for personal use by [email protected] only. Proprietary content. © Great Learning. All Rights Reserved. Unauthorized use or distribution, or sharing or publishing the contents in part or full, is liable for legal action.
Ensemble Methods

[Figure: a table with a Truth column of Y/N labels (Y, Y, N, …, Y, N) next to the predictions of ten models, M1, M2, M3, …, M10, each marked as 90% accurate; the X marks fall on different rows for different models.]
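The figure's point can be checked with a little arithmetic. Assuming (a strong assumption, rarely true in practice) that the ten models make their errors independently and each is right 90% of the time, a strict majority vote (at least 6 of 10) is right unless five or more models err on the same row. The function name is hypothetical, and ties are counted as errors.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a strict majority of n_models independent models is correct,
    when each model is correct with probability p. Ties count as errors."""
    need = n_models // 2 + 1  # smallest strict majority, e.g. 6 of 10
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(need, n_models + 1)
    )


print(round(majority_vote_accuracy(10, 0.9), 4))  # 0.9984
```

Under these assumptions the vote is right about 99.8% of the time, against 90% for any single model. Correlated errors shrink this gain, which is why ensembles try to make their members diverse.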

Bagging

[Figure: the training data is resampled into several bootstrap samples; each sample trains its own model (M1, M2, M3), and the models are combined to predict.]
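A small from-scratch sketch of the bagging pipeline in the figure, in plain Python. The base learner is a hypothetical one-split decision stump on a single numeric feature, the data are made up, and the combine step is a majority vote; none of these specifics come from the slides.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Hypothetical base learner: the single threshold split with the fewest training errors."""
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def bagging_fit(xs, ys, n_models, rng):
    """Fit one stump per bootstrap sample (n rows drawn with replacement)."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Combine step: majority vote over the individual models' predictions."""
    return Counter(model(x) for model in models).most_common(1)[0][0]


rng = random.Random(0)
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
models = bagging_fit(xs, ys, n_models=25, rng=rng)
print(bagging_predict(models, 0.05), bagging_predict(models, 0.95))
```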

Why Sampling with Replacement?

[Figure: a Dataset of four observations, A, B, C, D, with samples of size n drawn from it, such as C, A, D, B; A, B, C, D; A, D, B, C; C, B, C, A; A, D, A, A; B, A, C, D; and a longer C, B, C, A, D, B. Panel labels: Small, Large.]
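One way to read the figure: drawing n items from n without replacement can only reorder the dataset, so every "subset" would train the same model, while drawing with replacement gives subsets whose contents actually differ. A quick check with Python's standard `random` module; the seed and the four-item dataset are arbitrary choices for illustration.

```python
import random

data = ["A", "B", "C", "D"]
n = len(data)
rng = random.Random(42)

# Without replacement: a size-n sample is always a permutation of the data.
without_replacement = [rng.sample(data, n) for _ in range(3)]

# With replacement: draws are independent, so items can repeat or be missing.
with_replacement = [rng.choices(data, k=n) for _ in range(3)]

for s in without_replacement:
    assert sorted(s) == sorted(data)  # same contents, only the order changes

print(without_replacement)
print(with_replacement)
```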

Random Forest

[Figure: a data table (Data) feeding several decision trees (Tree).]

Tree to a Forest

● Decision trees are very sensitive to even small changes in the data, which is why they are usually called unstable.

● Can we get a whole bunch of decision trees to work together to yield a better and more robust prediction?

● Then, for prediction, we could use the mean for regression trees and the mode for classification trees.

● While individual trees tend to overfit the training data, averaging many trees reduces the variance this causes.
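The aggregation rule from the last two bullets, sketched with Python's `statistics` module. The function name is hypothetical and the per-tree predictions are made-up values; in a real forest they would come from the fitted trees.

```python
from statistics import mean, mode


def aggregate(tree_predictions, task):
    """Combine the trees' predictions for a single observation."""
    if task == "regression":
        return mean(tree_predictions)  # average of the numeric outputs
    if task == "classification":
        return mode(tree_predictions)  # most frequent class label
    raise ValueError(f"unknown task: {task!r}")


print(aggregate([2.0, 3.0, 4.0], "regression"))  # 3.0
print(aggregate(["Y", "N", "Y"], "classification"))  # Y
```

Since Python 3.8, `statistics.mode` returns the first mode it meets when classes tie, so an explicit tie-breaking rule may be worth adding.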

The General Ideas

● The general procedure of using multiple models (trees, in this case) to obtain better predictive performance is called ensemble learning.

● Bootstrap aggregating, also called bagging:

○ Generate new training subsets from the original data, each of the same size (usually the size of the data), by sampling with replacement.

○ Because sampling is with replacement, some observations may appear more than once in a subset, and others not at all.
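A sketch of drawing one bootstrap subset, plus the standard back-of-the-envelope count of how much of the original data it contains: each observation is missed by a single draw with probability 1 - 1/n, so it is missed by all n draws with probability (1 - 1/n)^n, which approaches 1/e ≈ 0.368. A bootstrap subset therefore holds roughly 63.2% of the distinct observations. The helper names and the 1000-row dataset are hypothetical.

```python
import random


def bootstrap_sample(data, rng):
    """Same size as the original data, drawn with replacement."""
    return rng.choices(data, k=len(data))


def expected_unique_fraction(n):
    """Expected share of the n original observations that appear at least once."""
    return 1 - (1 - 1 / n) ** n


rng = random.Random(7)
data = list(range(1000))
sample = bootstrap_sample(data, rng)
print(len(sample))  # 1000, same size as the original
print(len(set(sample)) / len(data))  # observed share of distinct rows, near 0.63
print(round(expected_unique_fraction(1000), 3))  # 0.632
```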

Random Forest

Original data: columns A, B, C, D, E, F, J; rows 1, 2, 3, 4, .., n

Bootstrap samples D1, D2, …, Dk (each keeps all columns A, B, C, D, E, F, J; rows are drawn with replacement):
D1: rows 67, 43, 32, …
D2: rows 17, 14, 32, 47, …
Dk: rows 32, 95, 32, 64, …

Random Forest

Data: columns A, B, C, D, E, F, J, … (M = number of independent variables); rows 1, 2, 3, 4, .., n

Only m < M columns are made available at a time, say m = 3 of M = 10:
• Say A, E, I
• Allow only C, E, F
• Allow only B, D, J
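A sketch of the column restriction, assuming (as in Breiman's original random forest) that a fresh random subset of m columns is drawn at each split. The ten column names A … J match M = 10; choosing m ≈ √M, here 3, is a common default for classification, not something the slide states. The helper name is hypothetical.

```python
import math
import random

FEATURES = list("ABCDEFGHIJ")  # M = 10 independent variables, A ... J


def candidate_features(features, m, rng):
    """The m < M columns a single split is allowed to consider."""
    return sorted(rng.sample(features, m))


rng = random.Random(1)
M = len(FEATURES)
m = math.isqrt(M)  # common heuristic m ≈ sqrt(M); gives 3 here
for split in range(3):
    print(f"split {split}: allow only {candidate_features(FEATURES, m, rng)}")
```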

Random Forest

Say M = 10 => columns A B C D … J

1. m = M = 10 => high tree correlation
2. m = 2 => your trees are weak

Good => a value of m between these two extremes
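Why both extremes hurt can be seen from a standard identity: if B trees each have prediction variance σ² and every pair of trees has correlation ρ, the variance of their average is ρσ² + (1 - ρ)σ²/B. Adding trees only shrinks the second term, so with m = M (high ρ) most of the variance remains; a smaller m lowers ρ but, as the slide warns, makes each tree weaker, which this formula does not capture. The function is a hypothetical helper that just evaluates the identity.

```python
def forest_variance(rho, sigma2, n_trees):
    """Variance of the mean of n_trees identically distributed predictions,
    each with variance sigma2 and pairwise correlation rho."""
    return rho * sigma2 + (1 - rho) * sigma2 / n_trees


for rho in (0.9, 0.5, 0.1):
    # More trees cannot push the variance below rho * sigma2.
    print(f"rho={rho}: {forest_variance(rho, sigma2=1.0, n_trees=100):.3f}")
```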
