
Ensemble Methods

Ensemble Methods in Machine Learning: What are They and Why Use Them?
What are Ensemble Methods?
Ensemble methods are techniques that improve predictive accuracy by combining several models into one more reliable model instead of relying on a single model. The combined models usually predict significantly more accurately than any individual model, which has made ensemble methods very popular in machine learning.

The most popular ensemble methods are boosting, bagging, and stacking.

Ensemble methods work well for both regression and classification, where they reduce bias and variance to boost the accuracy of models.
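A minimal sketch of the idea, assuming scikit-learn and a synthetic dataset; the three base models and their hyperparameters are illustrative choices, not part of the example that follows:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Three different base models vote on each prediction.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=3)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
print("Ensemble accuracy:", ensemble.score(X_test, y_test))
```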


Categories of Ensemble Methods

Ensemble methods fall into two broad categories, i.e., sequential ensemble
techniques and parallel ensemble techniques.

Sequential ensemble techniques generate base learners in a sequence, e.g., Adaptive Boosting (AdaBoost). Sequential generation encourages dependence between the base learners: the performance of the model is then improved by assigning higher weights to previously misclassified examples.

In parallel ensemble techniques, base learners are generated in parallel, e.g., random forest. Parallel methods rely on the independence between the base learners: averaging many independent predictions cancels out much of their individual error.
Types of Ensemble Methods
BAGGing, or Bootstrap AGGregating.
BAGGing gets its name because it combines Bootstrapping and Aggregation to form one ensemble model. Given a sample of data, multiple bootstrapped subsamples are pulled, and a decision tree is formed on each bootstrapped subsample. After each subsample decision tree has been formed, the trees are aggregated to form the most efficient predictor: the results of each tree are combined to yield the strongest, most accurate predictor.
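A minimal sketch of bagging, assuming scikit-learn and a synthetic dataset (the hyperparameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# BaggingClassifier's default base learner is a decision tree; each tree is
# trained on a bootstrapped subsample and the predictions are aggregated by vote.
bagging = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=0)
print("Bagged trees CV accuracy:", cross_val_score(bagging, X, y, cv=5).mean())
```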
Random Forest Models

Random Forest models can be thought of as BAGGing, with a slight tweak.

When deciding where to split and how to make decisions, BAGGed decision trees have the full set of features at their disposal. Therefore, although the bootstrapped samples may be slightly different, the data largely ends up splitting on the same features in every model.

In contrast, Random Forest models decide where to split based on a random selection of features. Rather than splitting on similar features at each node throughout, Random Forest models introduce a level of differentiation because each tree splits on different features. This differentiation provides a greater variety of trees to aggregate over, producing a more accurate predictor: bootstrapped subsamples are pulled from a larger dataset, a decision tree is formed on each subsample, and each tree is split on a different random subset of features.
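A minimal random forest sketch under the same assumptions (scikit-learn, synthetic data); max_features is the knob that restricts each split to a random subset of features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# max_features controls how many randomly chosen features each split may consider,
# which is the "slight tweak" that differentiates a random forest from plain bagging.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
print("Random forest CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```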
Boosting
Boosting is an ensemble technique that learns from the mistakes of previous predictors to make better predictions in the future. The technique combines several weak base learners to form one strong learner, significantly improving the predictive power of the model.

Boosting works by arranging weak learners in a sequence, such that each weak learner learns from the mistakes of the previous learner in the sequence, creating progressively better predictive models.

Boosting takes many forms, including Adaptive Boosting (AdaBoost), Gradient Boosting, and XGBoost (Extreme Gradient Boosting).


AdaBoost
Adaptive boosting, or AdaBoost, is one of the simplest boosting algorithms. Usually, decision trees are used for modelling. Multiple sequential models are created, each correcting the errors of the previous model. AdaBoost assigns higher weights to the observations that were predicted incorrectly, and the subsequent model works to predict these values correctly.

Below are the steps for performing the AdaBoost algorithm:

1. Initially, all observations in the dataset are given equal weights.
2. A model is built on a subset of the data.
3. Using this model, predictions are made on the whole dataset.
4. Errors are calculated by comparing the predictions with the actual values.
5. While creating the next model, higher weights are given to the data points that were predicted incorrectly.
6. Weights are determined from the error value: the higher the error, the greater the weight assigned to the observation.
7. This process is repeated until the error function stops changing or the maximum number of estimators is reached.
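A minimal sketch of these steps, assuming scikit-learn and a synthetic dataset; AdaBoostClassifier's default weak learner is a decision stump:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# The default weak learner is a decision stump (a tree of depth 1); each new
# stump focuses on the samples the previous stumps got wrong.
ada = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)
print("AdaBoost CV accuracy:", cross_val_score(ada, X, y, cv=5).mean())
```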
Most blogs and books use the term "weak learner"; in AdaBoost, the weak learners are typically stumps.

A tree with just one node (a single split) and two leaves is called a stump, as shown in Fig. 1.

1. AdaBoost combines a lot of weak learners (stumps) to make classifications.
2. Some stumps get more say (information) in the classification than others.
3. Each stump is made by taking the previous stump's mistakes into account.

Example:
The first thing we do is assign a weight to each sample (data point) that indicates how important it is that it be correctly classified. We create a new column with the name "Sample Weight".

We start by seeing how well "Chest Pain" classifies the samples, and then do the same for the other variables (Blocked Arteries, Patient Weight).

Chest Pain

Of the 5 samples with Heart Disease, 3 were correctly classified as having Heart Disease and 2 were incorrectly classified.

Of the 3 samples without Heart Disease, 2 were correctly classified as not having Heart Disease and 1 was incorrectly classified.
Blocked Arteries

Of the 6 samples with Heart Disease, 3 were correctly classified as having Heart Disease and 3 were incorrectly classified.

Of the 2 samples without Heart Disease, 1 was correctly classified as not having Heart Disease and 1 was incorrectly classified.


Patient Weight

Of the 3 samples with Heart Disease, 3 were correctly classified as having Heart Disease and 0 were incorrectly classified.

Of the 5 samples without Heart Disease, 4 were correctly classified as not having Heart Disease and 1 was incorrectly classified.


Now we will calculate the Gini Index for these three stumps.

1. Gini for the "Yes" subnode of Chest Pain = 1 - ((3/5)^2 + (2/5)^2) = 0.48, and Gini for the "No" subnode = 1 - ((2/3)^2 + (1/3)^2) = 0.444.
2. Weighted Gini Index for the split "Chest Pain" = (5/8)*0.48 + (3/8)*0.444 = 0.47.

In the same way, we calculate the weighted Gini Index for "Blocked Arteries" (0.5) and "Patient Weight" (0.2):

Chest Pain: 0.47
Blocked Arteries: 0.5
Patient Weight: 0.2


The Gini Index for “Patient Weight” is the lowest, so this
would be the first stump.
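The three weighted Gini values can be double-checked from the counts above with a short helper (the function name is just for illustration):

```python
def weighted_gini(nodes):
    """nodes: list of (correct, incorrect) counts, one pair per leaf."""
    total = sum(c + i for c, i in nodes)
    gini = 0.0
    for c, i in nodes:
        n = c + i
        node_gini = 1 - (c / n) ** 2 - (i / n) ** 2  # Gini impurity of the leaf
        gini += (n / total) * node_gini              # weight by leaf size
    return gini

print(round(weighted_gini([(3, 2), (2, 1)]), 2))  # Chest Pain       -> 0.47
print(round(weighted_gini([(3, 3), (1, 1)]), 2))  # Blocked Arteries -> 0.5
print(round(weighted_gini([(3, 0), (4, 1)]), 2))  # Patient Weight   -> 0.2
```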

Now we need to determine how much "Amount of Say" (information) this stump will have in the final classification.

The formula to calculate the "Amount of Say" is:

Amount of Say = (1/2) * ln((1 - Total Error) / Total Error)

First, we need to calculate the "Total Error" for each stump. The Total Error for a stump is the sum of the weights associated with the incorrectly classified samples.

Total Error for Chest Pain: it made 3 errors, i.e., 1/8 + 1/8 + 1/8 = 3/8.

Total Error for Blocked Arteries: it made 4 errors, i.e., 1/8 + 1/8 + 1/8 + 1/8 = 4/8.

Total Error for Patient Weight: it made 1 error, i.e., 1/8.

Note: because all the sample weights add up to 1, the Total Error will always be between 0 and 1. 0 indicates a perfect stump, 1 indicates a horrible stump.

Now, the Amount of Say for "Patient Weight" is 0.97 [(1/2) ln(7) ≈ 0.97].
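A quick check of this formula with the Total Errors above:

```python
import math

def amount_of_say(total_error):
    # Amount of Say = (1/2) * ln((1 - Total Error) / Total Error)
    return 0.5 * math.log((1 - total_error) / total_error)

print(round(amount_of_say(1 / 8), 2))  # Patient Weight -> 0.97
print(round(amount_of_say(3 / 8), 2))  # Chest Pain     -> 0.26
```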
Boosting increases the sample weight of incorrectly classified samples and decreases the sample weight of correctly classified samples.

The formula for updating a sample's weight is:

New Sample Weight = Sample Weight * e^(± Amount of Say)

The sign in the exponent (the amount of say, alpha) is negative when the sample is correctly classified and positive when the sample is misclassified.

There is one misclassified sample; its sample weight is 0.125, and the Amount of Say of the Patient Weight stump is 0.97.
After adding the New Sample Weight column to our dataset, observe that the new sample weight of the misclassified sample increases from 0.125 to 0.33, while the weight of each correctly classified sample decreases from 0.125 to 0.05.

For the misclassified sample: New Sample Weight = 0.125 * e^0.97 = 0.125 * 2.64 = 0.33, which is more than the old weight.

There are seven correctly classified samples, each with a sample weight of 1/8, and the Amount of Say of Patient Weight is 0.97.

For each correctly classified sample: New Sample Weight = 0.125 * e^(-0.97) = 0.125 * 0.38 = 0.05, which is less than the old weight.
We know that the total sum of the sample weights must equal 1, but if we sum up all the new sample weights we get 0.68. To bring this sum back to 1, we normalize the weights by dividing each one by the total sum of the updated weights, 0.68. After normalizing the sample weights, the sum is again equal to 1.
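A small numeric check of the update and normalization, using the rounded 0.33 and 0.05 values from above:

```python
import math

alpha = 0.97          # Amount of Say of the Patient Weight stump
old_weight = 1 / 8    # every sample starts at 1/8

wrong = round(old_weight * math.exp(alpha), 2)    # misclassified sample -> 0.33
right = round(old_weight * math.exp(-alpha), 2)   # correctly classified -> 0.05

new_weights = [wrong] + [right] * 7               # one wrong, seven right
total = sum(new_weights)
print(round(total, 2))                            # 0.68 -- no longer sums to 1

normalized = [w / total for w in new_weights]     # divide by the total to renormalize
print(round(sum(normalized), 2))                  # 1.0 after normalization
```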
Now, we need to make a new dataset to see if the errors decreased or not. For
this, based on the “new sample weights,” we will divide our data points into
buckets.
Step 6: New Dataset

The algorithm now selects random numbers between 0 and 1. Since incorrectly classified records have higher sample weights, the probability of selecting those records is very high.

Suppose the 8 random numbers are 0.78, 0.56, 0.94, 0.24, 0.68, 0.32, 0.13, 0.73.

We check which bucket each random number falls into and build the new dataset accordingly.

Ultimately, the wrongly classified sample is added to the new collection of samples (the new dataset) 4 times, reflecting its larger sample weight.
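The bucket procedure is equivalent to sampling with replacement in proportion to the new weights; a minimal NumPy sketch (the seed and the exact draw are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = np.array([0.33] + [0.05] * 7)     # new sample weights from the example
probs = weights / weights.sum()             # normalize to a probability distribution

# Draw 8 indices with replacement; heavily weighted (misclassified) samples
# are likely to appear several times in the new dataset.
indices = rng.choice(len(probs), size=len(probs), replace=True, p=probs)
print(indices)
```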
From now on we use the new collection of samples as the dataset and repeat the same procedure as above, i.e.:

1. Assign equal weights to all samples.
2. Find the stump that does the best job of classifying the new collection of samples.
3. Calculate the Total Error and Amount of Say, and use them to compute the new sample weights.
4. Normalize the new sample weights.
5. Repeat the above steps until all the samples are correctly classified or the maximum number of stumps is reached.

So that is how the errors the first tree makes influence how the second tree is made, the errors the second tree makes influence how the third tree is made, and so on.
Finally, we need to talk about how the forest of stumps created by AdaBoost makes a classification.

Imagine 6 stumps are created by the AdaBoost algorithm. Out of the 6 stumps, 4 classify the patient as having Heart Disease, and the other 2 classify the patient as not having Heart Disease.

The Amounts of Say of the first 4 stumps add up to 0.97 + 0.32 + 0.78 + 0.63 = 2.7, and the Amounts of Say of the other 2 stumps add up to 0.41 + 0.82 = 1.23.

Ultimately, the patient is classified as having Heart Disease because of the larger total Amount of Say (2.7).
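A minimal sketch of this weighted vote, using the Amounts of Say from the example:

```python
# Each stump votes with its Amount of Say; the class with the larger total wins.
says_has_disease = [0.97, 0.32, 0.78, 0.63]
says_no_disease = [0.41, 0.82]

prediction = ("Has Heart Disease"
              if sum(says_has_disease) > sum(says_no_disease)
              else "No Heart Disease")
print(round(sum(says_has_disease), 2),  # 2.7
      round(sum(says_no_disease), 2),   # 1.23
      prediction)                       # Has Heart Disease
```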
Gradient Boosting (GBM)
Gradient Boosting, or GBM, is another ensemble machine learning algorithm that works for both regression and classification problems. GBM uses the boosting technique, combining a number of weak learners to form a strong learner. Regression trees are used as the base learners, and each subsequent tree in the series is built on the errors calculated by the previous tree.
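A minimal gradient boosting sketch for regression, assuming scikit-learn and a synthetic dataset (the hyperparameters are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbm = GradientBoostingRegressor(
    n_estimators=200,
    learning_rate=0.1,   # shrinks each tree's contribution
    max_depth=3,         # shallow regression trees as weak learners
    random_state=0,
)
gbm.fit(X_train, y_train)
print("R^2 on test data:", gbm.score(X_test, y_test))
```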
XGBoost
XGBoost (Extreme Gradient Boosting) is an advanced implementation of the gradient boosting algorithm and has proved to be a highly effective ML algorithm.

XGBoost has high predictive power and is commonly reported to be up to 10 times faster than other gradient boosting implementations.

It also includes a variety of regularization techniques that reduce overfitting and improve overall performance, which is why it is also known as a "regularized boosting" technique.
Refer https://www.geeksforgeeks.org/xgboost/
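A minimal sketch with the xgboost package (assuming it is installed; the dataset and hyperparameters are illustrative), showing the regularization parameters mentioned above:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

xgb = XGBClassifier(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=3,
    reg_lambda=1.0,   # L2 regularization on leaf weights
    reg_alpha=0.0,    # L1 regularization on leaf weights
)
xgb.fit(X_train, y_train)
print("Test accuracy:", xgb.score(X_test, y_test))
```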
