
17 Ensemble Learning

- Dr. Sifat Momen (SfM1)


Learning goals
• After this presentation, you should be able to:
• Appreciate the use of ensemble techniques
• Understand the general idea of Ensemble Approach
• Understand the idea of Bagging and Pasting
• Understand OOB evaluation
• Understand and Apply Random Forest Algorithm
• Understand Boosting Approach
• Understand AdaBoost Algorithm



Jupyter Notebook
• Please note that there is an associated Jupyter notebook with this
presentation
• Please use both in parallel for optimal understanding



Wisdom of the crowd



Ensemble Methods
• Construct a set of base classifiers learned from the training data

• Predict class label of test records by combining the predictions made by multiple classifiers (e.g., by taking majority vote)

Necessary Conditions for Ensemble Methods

• Ensemble Methods work better than a single base classifier if:


1. All base classifiers are independent of each other
2. All base classifiers perform better than random guessing
(error rate < 0.5 for binary classification)

General Approach of Ensemble Learning

Combine predictions using majority vote or weighted majority vote (weighted according to their accuracy or relevance)
Constructing Ensemble Classifiers
• By manipulating training set
  • Example: bagging, boosting, random forests
• By manipulating input features
  • Example: random forests
• By manipulating class labels
  • Example: error-correcting output coding
• By manipulating learning algorithm
  • Example: injecting randomness in the initial weights of ANN
Voting Classifiers – Training Diverse Classifiers

• Somewhat surprisingly, this voting classifier often achieves a higher accuracy than the best classifier in the
ensemble.
• In fact, even if each classifier is a weak learner (meaning it does only slightly better than random guessing),
the ensemble can still be a strong learner (achieving high accuracy), provided there are a sufficient number
of weak learners in the ensemble and they are sufficiently diverse.
• If all classifiers are able to estimate class probabilities (i.e., if they all have a predict_proba() method), then
you can tell Scikit-Learn to predict the class with the highest class probability, averaged over all the
individual classifiers. This is called soft voting.
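
As an illustration (not taken from the slides), here is a minimal Scikit-Learn sketch of hard versus soft voting; the make_moons data set and the three base classifiers are arbitrary choices:

# Minimal sketch of hard vs. soft voting (illustrative data set and models).
from sklearn.datasets import make_moons
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Hard voting: each classifier casts one vote for a class label.
voting_clf = VotingClassifier(
    estimators=[("lr", LogisticRegression()),
                ("dt", DecisionTreeClassifier()),
                ("svc", SVC(probability=True))],  # probability=True so soft voting also works
    voting="hard",
)
voting_clf.fit(X_train, y_train)
print("hard voting accuracy:", voting_clf.score(X_test, y_test))

# Soft voting: average the predicted class probabilities and pick the argmax.
voting_clf.voting = "soft"
voting_clf.fit(X_train, y_train)
print("soft voting accuracy:", voting_clf.score(X_test, y_test))
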
Voting Classifiers – Bagging and Pasting
• One way to get a diverse set of classifiers is to use very different
training algorithms, as just discussed.
• Another approach is to use the same training algorithm for every
predictor but train them on different random subsets of the training
set.
• When sampling is performed with replacement, this method is called bagging (short for bootstrap aggregating).
• When sampling is performed without replacement, it is called pasting.

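
A minimal sketch of how bagging and pasting can be configured with Scikit-Learn's BaggingClassifier; the hyperparameter values below are illustrative, not prescribed by the slides:

# Minimal sketch: bagging vs. pasting with BaggingClassifier (illustrative parameters).
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# bootstrap=True -> sampling with replacement (bagging)
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    max_samples=100, bootstrap=True, n_jobs=-1, random_state=42)

# bootstrap=False -> sampling without replacement (pasting)
paste_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    max_samples=100, bootstrap=False, n_jobs=-1, random_state=42)

Both ensembles are then trained with fit() and used with predict() like any other Scikit-Learn classifier.
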


Voting Classifiers – Bagging and Pasting



Sampling with replacement



Sampling without replacement



Out of Bag Evaluation
• With bagging, some training instances may be sampled several times
for any given predictor, while others may not be sampled at all.
• It can be shown mathematically that only about 63% of the training instances are sampled on average for each predictor.
• The probability of a training instance being selected at least once in a bootstrap sample is:
  1 − (1 − 1/n)^n   (n: number of training instances)
  ≈ 0.632 when n is large

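
A quick numerical check of this formula (the values of n are arbitrary):

# Probability that a given training instance appears at least once
# in a bootstrap sample of size n drawn from n instances.
def prob_selected(n):
    return 1 - (1 - 1 / n) ** n

for n in (10, 100, 1000, 10000):
    print(n, round(prob_selected(n), 4))
# The value approaches 1 - 1/e ≈ 0.632 as n grows.
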


Out of Bag Evaluation
[Figure: probability of a training instance being selected, 1 − (1 − 1/n)^n, plotted against the number of training instances n.]
• The remaining 37% of the training instances that are not sampled are called out-of-bag (OOB) instances. Note that they are not the same 37% for all predictors.


Out of Bag Evaluation



Out of Bag Evaluation
• A bagging ensemble can be evaluated using OOB instances, without
the need for a separate validation set: indeed, if there are enough
estimators, then each instance in the training set will likely be an OOB
instance of several estimators, so these estimators can be used to
make a fair ensemble prediction for that instance.
• Once you have a prediction for each instance, you can compute the
ensemble’s prediction accuracy (or any other metric).
• In Scikit-Learn, you can set oob_score=True when creating a
BaggingClassifier to request an automatic OOB evaluation after
training.

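
A minimal sketch of OOB evaluation in Scikit-Learn; the data set and hyperparameters are illustrative:

# Minimal sketch of OOB evaluation with oob_score=True (illustrative data set).
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    bootstrap=True, oob_score=True, n_jobs=-1, random_state=42)
bag_clf.fit(X, y)

print("OOB accuracy estimate:", bag_clf.oob_score_)
# Per-instance OOB class probabilities are also available:
print(bag_clf.oob_decision_function_[:3])
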


Bagging Example
• Consider 1-dimensional data set:
Original Data:
x 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
y 1 1 1 -1 -1 -1 -1 1 1 1

• Classifier is a decision stump (decision tree of size 1)


• Decision rule: x ≤ k versus x > k
• Split point k is chosen based on entropy
[Stump diagram: test x ≤ k; if true, predict y_left; if false, predict y_right]
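
For illustration only (the slides do not prescribe this implementation), a decision stump can be fit to the original data as a depth-1 tree in Scikit-Learn, using entropy as the split criterion:

# Minimal sketch: fit a decision stump (depth-1 tree) to the 1-D data set.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

x = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]).reshape(-1, 1)
y = np.array([1, 1, 1, -1, -1, -1, -1, 1, 1, 1])

stump = DecisionTreeClassifier(max_depth=1, criterion="entropy")
stump.fit(x, y)

# Learned split point k and the stump's predictions on the training data.
print("split point k:", stump.tree_.threshold[0])
print("predictions:", stump.predict(x))
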
Bagging Example
Bagging Round 1:
x 0.1 0.2 0.2 0.3 0.4 0.4 0.5 0.6 0.9 0.9    x <= 0.35 → y = 1
y 1 1 1 1 -1 -1 -1 -1 1 1    x > 0.35 → y = -1

Bagging Round 2:
x 0.1 0.2 0.3 0.4 0.5 0.5 0.9 1 1 1
y 1 1 1 -1 -1 -1 1 1 1 1

Bagging Round 3:
x 0.1 0.2 0.3 0.4 0.4 0.5 0.7 0.7 0.8 0.9
y 1 1 1 -1 -1 -1 -1 -1 1 1

Bagging Round 4:
x 0.1 0.1 0.2 0.4 0.4 0.5 0.5 0.7 0.8 0.9
y 1 1 1 -1 -1 -1 -1 -1 1 1

Bagging Round 5:
x 0.1 0.1 0.2 0.5 0.6 0.6 0.6 1 1 1
y 1 1 1 -1 -1 -1 -1 1 1 1

Bagging Example
Bagging Round 1:
x 0.1 0.2 0.2 0.3 0.4 0.4 0.5 0.6 0.9 0.9    x <= 0.35 → y = 1
y 1 1 1 1 -1 -1 -1 -1 1 1    x > 0.35 → y = -1

Bagging Round 2:
x 0.1 0.2 0.3 0.4 0.5 0.5 0.9 1 1 1    x <= 0.7 → y = 1
y 1 1 1 -1 -1 -1 1 1 1 1    x > 0.7 → y = 1

Bagging Round 3:
x 0.1 0.2 0.3 0.4 0.4 0.5 0.7 0.7 0.8 0.9    x <= 0.35 → y = 1
y 1 1 1 -1 -1 -1 -1 -1 1 1    x > 0.35 → y = -1

Bagging Round 4:
x 0.1 0.1 0.2 0.4 0.4 0.5 0.5 0.7 0.8 0.9    x <= 0.3 → y = 1
y 1 1 1 -1 -1 -1 -1 -1 1 1    x > 0.3 → y = -1

Bagging Round 5:
x 0.1 0.1 0.2 0.5 0.6 0.6 0.6 1 1 1    x <= 0.35 → y = 1
y 1 1 1 -1 -1 -1 -1 1 1 1    x > 0.35 → y = -1

Bagging Example
Bagging Round 6:
x 0.2 0.4 0.5 0.6 0.7 0.7 0.7 0.8 0.9 1    x <= 0.75 → y = -1
y 1 -1 -1 -1 -1 -1 -1 1 1 1    x > 0.75 → y = 1

Bagging Round 7:
x 0.1 0.4 0.4 0.6 0.7 0.8 0.9 0.9 0.9 1    x <= 0.75 → y = -1
y 1 -1 -1 -1 -1 1 1 1 1 1    x > 0.75 → y = 1

Bagging Round 8:
x 0.1 0.2 0.5 0.5 0.5 0.7 0.7 0.8 0.9 1    x <= 0.75 → y = -1
y 1 1 -1 -1 -1 -1 -1 1 1 1    x > 0.75 → y = 1

Bagging Round 9:
x 0.1 0.3 0.4 0.4 0.6 0.7 0.7 0.8 1 1    x <= 0.75 → y = -1
y 1 1 -1 -1 -1 -1 -1 1 1 1    x > 0.75 → y = 1

Bagging Round 10:
x 0.1 0.1 0.1 0.1 0.3 0.3 0.8 0.8 0.9 0.9    x <= 0.05 → y = 1
y 1 1 1 1 1 1 1 1 1 1    x > 0.05 → y = 1

Bagging Example
• Summary of Trained Decision Stumps:
Round Split Point Left Class Right Class
1 0.35 1 -1
2 0.7 1 1
3 0.35 1 -1
4 0.3 1 -1
5 0.35 1 -1
6 0.75 -1 1
7 0.75 -1 1
8 0.75 -1 1
9 0.75 -1 1
10 0.05 1 1

Bagging Example
• Use majority vote (sign of sum of predictions) to determine class of
ensemble classifier
Round x=0.1 x=0.2 x=0.3 x=0.4 x=0.5 x=0.6 x=0.7 x=0.8 x=0.9 x=1.0
1 1 1 1 -1 -1 -1 -1 -1 -1 -1
2 1 1 1 1 1 1 1 1 1 1
3 1 1 1 -1 -1 -1 -1 -1 -1 -1
4 1 1 1 -1 -1 -1 -1 -1 -1 -1
5 1 1 1 -1 -1 -1 -1 -1 -1 -1
6 -1 -1 -1 -1 -1 -1 -1 1 1 1
7 -1 -1 -1 -1 -1 -1 -1 1 1 1
8 -1 -1 -1 -1 -1 -1 -1 1 1 1
9 -1 -1 -1 -1 -1 -1 -1 1 1 1
10 1 1 1 1 1 1 1 1 1 1
Sum 2 2 2 -6 -6 -6 -6 2 2 2
Predicted class (sign) 1 1 1 -1 -1 -1 -1 1 1 1

• Bagging can also increase the complexity (representation capacity) of simple classifiers such as decision stumps
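
A minimal sketch of the bagging procedure used in this worked example: draw bootstrap samples, fit a stump on each, and take the sign of the summed votes. The random seed and the use of Scikit-Learn stumps are illustrative choices, so the individual rounds will not match the table above exactly:

# Minimal sketch: bagging of decision stumps with a majority vote (sign of the sum).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
x = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]).reshape(-1, 1)
y = np.array([1, 1, 1, -1, -1, -1, -1, 1, 1, 1])

n_rounds = 10
stumps = []
for _ in range(n_rounds):
    idx = rng.integers(0, len(x), size=len(x))   # bootstrap sample (with replacement)
    stump = DecisionTreeClassifier(max_depth=1, criterion="entropy")
    stumps.append(stump.fit(x[idx], y[idx]))

# Majority vote = sign of the sum of the individual +1/-1 predictions.
votes = np.sum([s.predict(x) for s in stumps], axis=0)
print(np.sign(votes))   # with enough diverse stumps this often recovers 1 1 1 -1 -1 -1 -1 1 1 1
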
Random Forest Algorithm
• Construct an ensemble of decision trees by manipulating training set
as well as features
• Use bootstrap sample to train every decision tree (similar to Bagging)
• Use the following tree induction algorithm:
• At every internal node of decision tree, randomly sample p attributes for selecting split
criterion
• Repeat this procedure until all leaves are pure (unpruned tree)

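
A minimal Scikit-Learn sketch of a random forest; the data set and hyperparameters are illustrative, and max_features plays the role of p above:

# Minimal sketch of a random forest (illustrative data set and hyperparameters).
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

# max_features controls how many attributes are sampled at each split
# (the "p" in the algorithm above); "sqrt" is a common choice.
rnd_clf = RandomForestClassifier(
    n_estimators=500, max_features="sqrt", n_jobs=-1, random_state=42)
rnd_clf.fit(X, y)
print(rnd_clf.predict([[0.5, 0.2]]))
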
Boosting (originally called hypothesis boosting)
• Refers to an ensemble method that can combine several weak
learners into a strong learner.
• The general idea of boosting is to train predictors sequentially, each
trying to correct its predecessor.
• AdaBoost (adaptive boosting)
• Gradient Boosting
• XGBoost
• LightGBM (Light gradient boosting machine)



AdaBoost
• One way for a new predictor to correct its predecessor is to pay a bit
more attention to the training instances that the predecessor
underfit.
• This results in new predictors focusing more and more on the hard
cases.
• This is the technique used by AdaBoost.

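
A minimal Scikit-Learn sketch of AdaBoost with decision stumps as the weak learners; the data set and hyperparameters are illustrative:

# Minimal sketch of AdaBoost (illustrative data set and hyperparameters).
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # weak learner: a decision stump
    n_estimators=200, learning_rate=0.5, random_state=42)
ada_clf.fit(X, y)
print(ada_clf.predict([[0.5, 0.2]]))
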


AdaBoost



AdaBoost in detail (slightly different from the version in the textbook)



Voting Power
[Figure: the voting power (predictor's weight) plotted as a function of the predictor's error rate.]

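
The curve above is consistent with the commonly used AdaBoost predictor weight α = ½ ln((1 − ε)/ε); since the slides note a slightly different derivation from the textbook, treat the following as a sketch of the common variant rather than the exact formula used in class:

# Assumed standard AdaBoost voting power: alpha = 0.5 * ln((1 - error) / error).
import numpy as np

def voting_power(error):
    return 0.5 * np.log((1 - error) / error)

for err in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(err, round(voting_power(err), 3))
# error < 0.5 -> positive voting power; error = 0.5 -> zero; error > 0.5 -> negative.
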


How is the overall classifier assembled?
• The overall classifier is assembled in series of rounds
• For each round:
• Pick the best “weak” classifier, h(x), to add to the overall classifier, H(x)
• Best – classifier that makes the fewest errors
• Assign the classifier a voting power
• Append the term αh(x) to our overall classifier

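
A sketch of this round-by-round assembly, assuming the common AdaBoost weight-update rule and a hypothetical list of weak learners given as callables returning ±1 predictions (details may differ from the slides and the Excel file):

# Sketch of the round-by-round assembly described above: in each round pick the weak
# classifier with the fewest weighted errors, give it a voting power alpha, and add
# the term alpha * h(x) to the overall classifier H(x).
import numpy as np

def adaboost(X, y, weak_learners, n_rounds):
    # weak_learners: hypothetical list of callables h(X) -> array of +1/-1 predictions
    w = np.full(len(y), 1.0 / len(y))          # instance weights, initially uniform
    terms = []                                 # list of (voting_power, h) pairs
    for _ in range(n_rounds):
        errors = [np.sum(w * (h(X) != y)) for h in weak_learners]
        best = int(np.argmin(errors))          # "best" = fewest weighted errors
        h = weak_learners[best]
        err = float(np.clip(errors[best], 1e-10, 1 - 1e-10))  # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)  # voting power of this round's classifier
        w = w * np.exp(-alpha * y * h(X))      # up-weight the misclassified instances
        w = w / w.sum()
        terms.append((alpha, h))
    # Overall classifier: H(x) = sign( sum_t alpha_t * h_t(x) )
    return lambda X_new: np.sign(sum(a * h(X_new) for a, h in terms))
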


AdaBoost classifier

Check the corresponding Excel file to see how the AdaBoost classifier works.

