
Ensemble Methods

Ensemble Methods in Machine Learning: What are They and Why Use Them?
What are Ensemble Methods?
Ensemble methods are techniques that improve predictive accuracy by combining several models into one more reliable model instead of relying on a single model. The combined models usually predict significantly more accurately than any individual model, which has made ensemble methods very popular in machine learning.

The most popular ensemble methods are boosting, bagging, and stacking.

Ensemble methods work well for both regression and classification, where they reduce bias and variance to boost the accuracy of models.
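A minimal sketch of the idea, assuming scikit-learn and a synthetic dataset; the three base models and their hyperparameters are illustrative choices, not part of the example that follows:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Three different base models vote on each prediction.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=3)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
print("Ensemble accuracy:", ensemble.score(X_test, y_test))
```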


Categories of Ensemble Methods

Ensemble methods fall into two broad categories, i.e., sequential ensemble
techniques and parallel ensemble techniques.

Sequential ensemble techniques generate base learners in a sequence, e.g., Adaptive Boosting (AdaBoost). Sequential generation encourages dependence between the base learners: the performance of the model is then improved by assigning higher weights to previously misclassified examples.

In parallel ensemble techniques, base learners are generated in parallel, e.g., random forest. Parallel methods rely on the independence between the base learners: averaging many independent predictions cancels out much of their individual error.
Types of Ensemble Methods
BAGGing, or Bootstrap AGGregating.
BAGGing gets its name because it combines Bootstrapping and Aggregation to form one ensemble model. Given a sample of data, multiple bootstrapped subsamples are pulled, and a decision tree is formed on each bootstrapped subsample. After each subsample decision tree has been formed, the trees are aggregated to form the most efficient predictor: the results of each tree are combined to yield the strongest, most accurate predictor.
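A minimal sketch of bagging, assuming scikit-learn and a synthetic dataset (the hyperparameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# BaggingClassifier's default base learner is a decision tree; each tree is
# trained on a bootstrapped subsample and the predictions are aggregated by vote.
bagging = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=0)
print("Bagged trees CV accuracy:", cross_val_score(bagging, X, y, cv=5).mean())
```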
Random Forest Models

Random Forest models can be thought of as BAGGing, with a slight tweak.

When deciding where to split and how to make decisions, BAGGed decision trees have the full set of features at their disposal. Therefore, although the bootstrapped samples may be slightly different, the data largely ends up splitting on the same features in every model.

In contrast, Random Forest models decide where to split based on a random selection of features. Rather than splitting on similar features at each node throughout, Random Forest models introduce a level of differentiation because each tree splits on different features. This differentiation provides a greater variety of trees to aggregate over, producing a more accurate predictor: bootstrapped subsamples are pulled from a larger dataset, a decision tree is formed on each subsample, and each tree is split on a different random subset of features.
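A minimal random forest sketch under the same assumptions (scikit-learn, synthetic data); max_features is the knob that restricts each split to a random subset of features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# max_features controls how many randomly chosen features each split may consider,
# which is the "slight tweak" that differentiates a random forest from plain bagging.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
print("Random forest CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```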
Boosting
Boosting is an ensemble technique that learns from the mistakes of previous predictors to make better predictions in the future. The technique combines several weak base learners to form one strong learner, significantly improving the predictive power of the model.

Boosting works by arranging weak learners in a sequence, such that each weak learner learns from the mistakes of the previous learner in the sequence, creating progressively better predictive models.

Boosting takes many forms, including Adaptive Boosting (AdaBoost), Gradient Boosting, and XGBoost (Extreme Gradient Boosting).


AdaBoost
Adaptive boosting, or AdaBoost, is one of the simplest boosting algorithms. Usually, decision trees are used for modelling. Multiple sequential models are created, each correcting the errors of the previous model. AdaBoost assigns higher weights to the observations that were predicted incorrectly, and the subsequent model works to predict these values correctly.

Below are the steps for performing the AdaBoost algorithm:

1. Initially, all observations in the dataset are given equal weights.
2. A model is built on a subset of the data.
3. Using this model, predictions are made on the whole dataset.
4. Errors are calculated by comparing the predictions with the actual values.
5. While creating the next model, higher weights are given to the data points that were predicted incorrectly.
6. Weights are determined from the error value: the higher the error, the greater the weight assigned to the observation.
7. This process is repeated until the error function stops changing or the maximum number of estimators is reached.
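A minimal sketch of these steps, assuming scikit-learn and a synthetic dataset; AdaBoostClassifier's default weak learner is a decision stump:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# The default weak learner is a decision stump (a tree of depth 1); each new
# stump focuses on the samples the previous stumps got wrong.
ada = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)
print("AdaBoost CV accuracy:", cross_val_score(ada, X, y, cv=5).mean())
```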
Most blogs and books use the term "weak learner"; in AdaBoost, the weak learners are typically stumps.

A tree with just one node (a single split) and two leaves is called a stump, as shown in Fig. 1.

1. AdaBoost combines a lot of weak learners (stumps) to make classifications.
2. Some stumps get more say (information) in the classification than others.
3. Each stump is made by taking the previous stump's mistakes into account.

Example:
The first thing we do is assign a weight to each sample (data point) that indicates how important it is that it be correctly classified. We create a new column with the name "Sample Weight".

We start by seeing how well "Chest Pain" classifies the samples, and then do the same for the other variables (Blocked Arteries, Patient Weight).

Chest Pain

Of the 5 samples with Heart Disease, 3 were correctly classified as having Heart Disease and 2 were incorrectly classified.

Of the 3 samples without Heart Disease, 2 were correctly classified as not having Heart Disease and 1 was incorrectly classified.
Blocked Arteries

Of the 6 samples with Heart Disease, 3 were correctly classified as having Heart Disease and 3 were incorrectly classified.

Of the 2 samples without Heart Disease, 1 was correctly classified as not having Heart Disease and 1 was incorrectly classified.


Patient Weight

Of the 3 samples with Heart Disease, 3 were correctly classified as having Heart Disease and 0 were incorrectly classified.

Of the 5 samples without Heart Disease, 4 were correctly classified as not having Heart Disease and 1 was incorrectly classified.


Now we will calculate the Gini Index for these three stumps.

1. Gini for the "Yes" subnode of Chest Pain = 1 - ((3/5)^2 + (2/5)^2) = 0.48, and Gini for the "No" subnode = 1 - ((2/3)^2 + (1/3)^2) = 0.444.
2. Weighted Gini Index for the split "Chest Pain" = (5/8)*0.48 + (3/8)*0.444 = 0.47.

In the same way, we calculate the weighted Gini Index for "Blocked Arteries" (0.5) and "Patient Weight" (0.2):

Chest Pain: 0.47
Blocked Arteries: 0.5
Patient Weight: 0.2


The Gini Index for “Patient Weight” is the lowest, so this
would be the first stump.
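The three weighted Gini values can be double-checked from the counts above with a short helper (the function name is just for illustration):

```python
def weighted_gini(nodes):
    """nodes: list of (correct, incorrect) counts, one pair per leaf."""
    total = sum(c + i for c, i in nodes)
    gini = 0.0
    for c, i in nodes:
        n = c + i
        node_gini = 1 - (c / n) ** 2 - (i / n) ** 2  # Gini impurity of the leaf
        gini += (n / total) * node_gini              # weight by leaf size
    return gini

print(round(weighted_gini([(3, 2), (2, 1)]), 2))  # Chest Pain       -> 0.47
print(round(weighted_gini([(3, 3), (1, 1)]), 2))  # Blocked Arteries -> 0.5
print(round(weighted_gini([(3, 0), (4, 1)]), 2))  # Patient Weight   -> 0.2
```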

Now we need to determine how much "Amount of Say" (information) this stump will have in the final classification.

The formula to calculate the "Amount of Say" is:

Amount of Say = (1/2) * ln((1 - Total Error) / Total Error)

First, we need to calculate the "Total Error" for each stump. The Total Error for a stump is the sum of the weights associated with the incorrectly classified samples.

Total Error for Chest Pain: it made 3 errors, i.e., 1/8 + 1/8 + 1/8 = 3/8.

Total Error for Blocked Arteries: it made 4 errors, i.e., 1/8 + 1/8 + 1/8 + 1/8 = 4/8.

Total Error for Patient Weight: it made 1 error, i.e., 1/8.

Note: because all the sample weights add up to 1, the Total Error will always be between 0 and 1. 0 indicates a perfect stump, 1 indicates a horrible stump.

Now, the Amount of Say for "Patient Weight" is 0.97 [(1/2) ln(7) ≈ 0.97].
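A quick check of this formula with the Total Errors above:

```python
import math

def amount_of_say(total_error):
    # Amount of Say = (1/2) * ln((1 - Total Error) / Total Error)
    return 0.5 * math.log((1 - total_error) / total_error)

print(round(amount_of_say(1 / 8), 2))  # Patient Weight -> 0.97
print(round(amount_of_say(3 / 8), 2))  # Chest Pain     -> 0.26
```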
Boosting increases the sample weight of incorrectly classified samples and decreases the sample weight of correctly classified samples.

The formula for updating a sample's weight is:

New Sample Weight = Sample Weight * e^(± Amount of Say)

The sign in the exponent (the amount of say, alpha) is negative when the sample is correctly classified and positive when the sample is misclassified.

There is one misclassified sample; its sample weight is 0.125, and the Amount of Say of the Patient Weight stump is 0.97.
After adding the New Sample Weight column to our dataset, observe that the new sample weight of the misclassified sample increases from 0.125 to 0.33, while the weight of each correctly classified sample decreases from 0.125 to 0.05.

For the misclassified sample: New Sample Weight = 0.125 * e^0.97 = 0.125 * 2.64 = 0.33, which is more than the old weight.

There are seven correctly classified samples, each with a sample weight of 1/8, and the Amount of Say of Patient Weight is 0.97.

For each correctly classified sample: New Sample Weight = 0.125 * e^(-0.97) = 0.125 * 0.38 = 0.05, which is less than the old weight.
We know that the total sum of the sample weights must equal 1, but if we sum up all the new sample weights we get 0.68. To bring this sum back to 1, we normalize the weights by dividing each one by the total sum of the updated weights, 0.68. After normalizing the sample weights, the sum is again equal to 1.
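A small numeric check of the update and normalization, using the rounded 0.33 and 0.05 values from above:

```python
import math

alpha = 0.97          # Amount of Say of the Patient Weight stump
old_weight = 1 / 8    # every sample starts at 1/8

wrong = round(old_weight * math.exp(alpha), 2)    # misclassified sample -> 0.33
right = round(old_weight * math.exp(-alpha), 2)   # correctly classified -> 0.05

new_weights = [wrong] + [right] * 7               # one wrong, seven right
total = sum(new_weights)
print(round(total, 2))                            # 0.68 -- no longer sums to 1

normalized = [w / total for w in new_weights]     # divide by the total to renormalize
print(round(sum(normalized), 2))                  # 1.0 after normalization
```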
Now, we need to make a new dataset to see if the errors decreased or not. For
this, based on the “new sample weights,” we will divide our data points into
buckets.
Step 6: New Dataset

The algorithm now selects random numbers between 0 and 1. Since incorrectly classified records have higher sample weights, the probability of selecting those records is very high.

Suppose the 8 random numbers are 0.78, 0.56, 0.94, 0.24, 0.68, 0.32, 0.13, 0.73.

We check which bucket each random number falls into and build the new dataset accordingly.

Ultimately, the wrongly classified sample is added to the new collection of samples (the new dataset) 4 times, reflecting its larger sample weight.
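The bucket procedure is equivalent to sampling with replacement in proportion to the new weights; a minimal NumPy sketch (the seed and the exact draw are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = np.array([0.33] + [0.05] * 7)     # new sample weights from the example
probs = weights / weights.sum()             # normalize to a probability distribution

# Draw 8 indices with replacement; heavily weighted (misclassified) samples
# are likely to appear several times in the new dataset.
indices = rng.choice(len(probs), size=len(probs), replace=True, p=probs)
print(indices)
```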
From now on we use the new collection of samples as the dataset and repeat the same procedure as above, i.e.:

1. Assign equal weights to all samples.
2. Find the stump that does the best job of classifying the new collection of samples.
3. Calculate the Total Error and Amount of Say, and use them to compute the new sample weights.
4. Normalize the new sample weights.
5. Repeat the above steps until all the samples are correctly classified or the maximum number of stumps is reached.

So that is how the errors the first tree makes influence how the second tree is made, the errors the second tree makes influence how the third tree is made, and so on.
Finally, we need to talk about how the forest of stumps created by AdaBoost makes a classification.

Imagine 6 stumps are created by the AdaBoost algorithm. Out of the 6 stumps, 4 classify the patient as having Heart Disease, and the other 2 classify the patient as not having Heart Disease.

The Amounts of Say of the first 4 stumps add up to 0.97 + 0.32 + 0.78 + 0.63 = 2.7, and the Amounts of Say of the other 2 stumps add up to 0.41 + 0.82 = 1.23.

Ultimately, the patient is classified as having Heart Disease because of the larger total Amount of Say (2.7).
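A minimal sketch of this weighted vote, using the Amounts of Say from the example:

```python
# Each stump votes with its Amount of Say; the class with the larger total wins.
says_has_disease = [0.97, 0.32, 0.78, 0.63]
says_no_disease = [0.41, 0.82]

prediction = ("Has Heart Disease"
              if sum(says_has_disease) > sum(says_no_disease)
              else "No Heart Disease")
print(round(sum(says_has_disease), 2),  # 2.7
      round(sum(says_no_disease), 2),   # 1.23
      prediction)                       # Has Heart Disease
```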
Gradient Boosting (GBM)
Gradient Boosting, or GBM, is another ensemble machine learning algorithm that works for both regression and classification problems. GBM uses the boosting technique, combining a number of weak learners to form a strong learner. Regression trees are used as the base learners, and each subsequent tree in the series is built on the errors calculated by the previous tree.
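A minimal gradient boosting sketch for regression, assuming scikit-learn and a synthetic dataset (the hyperparameters are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbm = GradientBoostingRegressor(
    n_estimators=200,
    learning_rate=0.1,   # shrinks each tree's contribution
    max_depth=3,         # shallow regression trees as weak learners
    random_state=0,
)
gbm.fit(X_train, y_train)
print("R^2 on test data:", gbm.score(X_test, y_test))
```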
XGBoost
XGBoost (Extreme Gradient Boosting) is an advanced implementation of the gradient boosting algorithm and has proved to be a highly effective ML algorithm.

XGBoost has high predictive power and is commonly reported to be up to 10 times faster than other gradient boosting implementations.

It also includes a variety of regularization techniques that reduce overfitting and improve overall performance, which is why it is also known as a "regularized boosting" technique.
Refer https://www.geeksforgeeks.org/xgboost/
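A minimal sketch with the xgboost package (assuming it is installed; the dataset and hyperparameters are illustrative), showing the regularization parameters mentioned above:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

xgb = XGBClassifier(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=3,
    reg_lambda=1.0,   # L2 regularization on leaf weights
    reg_alpha=0.0,    # L1 regularization on leaf weights
)
xgb.fit(X_train, y_train)
print("Test accuracy:", xgb.score(X_test, y_test))
```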
