Ensemble Methods
Ensemble Methods in Machine Learning: What Are They and Why Use Them?
What are Ensemble Methods?
Ensemble methods are techniques that improve the accuracy of a model's results by combining multiple models instead of using a single model.
The most popular ensemble methods are boosting, bagging, and stacking.
Ensemble methods are ideal for regression and classification, where they reduce bias and variance to improve the accuracy of predictions.
Ensemble methods fall into two broad categories: sequential ensemble techniques and parallel ensemble techniques, contrasted in the sketch below.
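As a minimal sketch of the two categories, the snippet below trains a parallel ensemble (bagging) and a sequential ensemble (boosting) on a toy dataset; the dataset and hyperparameter values are illustrative choices, not taken from the text.

```python
# Contrasting a parallel ensemble (bagging) with a sequential one (boosting).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Parallel: base learners are trained independently on bootstrap samples.
bagging = BaggingClassifier(n_estimators=50, random_state=0)

# Sequential: each learner is trained to correct its predecessor's mistakes.
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```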
Boosting works by arranging weak learners in a sequence, such that each weak learner learns from the mistakes of the previous learner in the sequence, to create better predictive models.
A tree with just one node and two leaves is called a stump, as shown in Fig. 1.
In the same way, we calculate the Gini index for "Blood Arteries" and "Patient Weight": the Gini index for Blood Arteries is 0.5, and for Patient Weight it is 0.2 (see the sketch below).
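To make that calculation concrete, here is a small sketch of the weighted Gini index used to compare candidate stumps; the leaf counts are hypothetical, since the underlying patient table is not reproduced here, but they happen to yield the 0.2 value quoted for Patient Weight.

```python
def gini(counts):
    """Gini impurity of one leaf, given class counts, e.g. [yes, no]."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)

def stump_gini(left_counts, right_counts):
    """Weighted average of the two leaves' impurities."""
    n_left, n_right = sum(left_counts), sum(right_counts)
    n = n_left + n_right
    return (n_left / n) * gini(left_counts) + (n_right / n) * gini(right_counts)

# Hypothetical split: the purer the leaves, the lower the Gini index.
print(stump_gini(left_counts=[4, 1], right_counts=[0, 3]))  # 0.2
```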
Total Error for Blood Arteries: it made 4 errors, i.e., 1/8 + 1/8 + 1/8 + 1/8 = 4/8.
Note: because all the sample weights add up to 1, the Total Error will always be between 0 and 1.
Now, the Amount of Say for "Patient Weight" is 0.97, from the formula Amount of Say = 1/2 log((1 - Total Error)/Total Error) = 1/2 log(7).
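This value follows directly from that formula, as a quick check shows:

```python
# Checking the Amount of Say for the "Patient Weight" stump:
# alpha = 1/2 * log((1 - Total Error) / Total Error), with Total Error = 1/8.
import math

total_error = 1 / 8
alpha = 0.5 * math.log((1 - total_error) / total_error)  # 1/2 * ln(7)
print(round(alpha, 2))  # 0.97
```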
Basically, boosting increases the sample weights of incorrectly classified samples and decreases the sample weights of correctly classified samples.
In the weight-update exponent, the Amount of Say (alpha) enters with a negative sign when the sample is correctly classified, and with a positive sign when the sample is misclassified: New Sample Weight = Sample Weight x e^(-alpha) for correct samples, and Sample Weight x e^(+alpha) for misclassified ones.
There is one misclassified sample. The sample weight of that sample is 0.125, so its new weight is 0.125 x e^0.97 ≈ 0.33, meaning this sample counts for more when the next stump is built.
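The same update can be written out as a short sketch: misclassified samples are scaled by e^alpha, correctly classified ones by e^-alpha, and all weights are then renormalized so they again sum to 1.

```python
# AdaBoost sample-weight update for 8 samples with one misclassification.
import math

alpha = 0.97
weights = [0.125] * 8                 # 8 samples, equal initial weights
misclassified = [False] * 7 + [True]  # the one sample the stump got wrong

weights = [w * math.exp(alpha if miss else -alpha)
           for w, miss in zip(weights, misclassified)]
print(round(weights[-1], 2))          # 0.125 * e^0.97 ≈ 0.33

total = sum(weights)
weights = [w / total for w in weights]  # renormalize so weights sum to 1
```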
That is how the errors the first tree makes influence how the second tree is made, the errors the second tree makes influence how the third tree is made, and so on.
Finally, we need to talk about how the forest of stumps created by AdaBoost makes classifications.
Imagine 6 stumps are created by the AdaBoost algorithm. Out of those 6 stumps, 4 classify the patient as having Heart Disease, and the other 2 classify the patient as not having Heart Disease.
The Amounts of Say for the first 4 stumps add up to 0.97 + 0.32 + 0.78 + 0.63 = 2.7, and the Amounts of Say for the other 2 stumps add up to 0.41 + 0.82 = 1.23.
Ultimately, the patient is classified as Has Heart Disease because that group has the larger total Amount of Say (2.7 vs. 1.23).
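A small sketch of this weighted vote, using the Amounts of Say quoted above:

```python
# Each stump's vote is weighted by its Amount of Say; the larger total wins.
votes = [("Has Heart Disease", 0.97), ("Has Heart Disease", 0.32),
         ("Has Heart Disease", 0.78), ("Has Heart Disease", 0.63),
         ("No Heart Disease", 0.41), ("No Heart Disease", 0.82)]

totals = {}
for label, amount_of_say in votes:
    totals[label] = totals.get(label, 0.0) + amount_of_say

for label, total in totals.items():
    print(label, round(total, 2))       # Has Heart Disease 2.7, No Heart Disease 1.23
print(max(totals, key=totals.get))      # Has Heart Disease
```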
Gradient Boosting (GBM)
Gradient Boosting or GBM is another ensemble machine
learning algorithm that works for both regression and
classification problems.
GBM uses the boosting technique, combining a number of
weak learners to form a strong learner.
Regression trees are used as the base learner; each subsequent tree in the series is built on the errors calculated by the previous tree, as in the sketch below.
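A minimal from-scratch sketch of this residual-fitting loop for squared-error regression follows; the dataset, tree depth, and learning rate are illustrative choices, not taken from the text.

```python
# Gradient boosting for regression: each tree fits the current residuals.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # start from the mean prediction
trees = []

for _ in range(100):
    residuals = y - prediction                     # errors of the model so far
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)  # nudge toward the residuals
    trees.append(tree)

print(f"training MSE: {np.mean((y - prediction) ** 2):.4f}")
```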
XGBoost
XGBoost (Extreme Gradient Boosting) is an advanced implementation of the gradient boosting algorithm. XGBoost has proved to be a highly effective ML algorithm.
XGBoost has high predictive power and is often reported to be almost 10 times faster than other gradient boosting implementations.
It also includes a variety of regularization techniques that reduce overfitting and improve overall performance. Hence it is also known as a "regularized boosting" technique.
Refer to https://www.geeksforgeeks.org/xgboost/ for more details.
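As a rough sketch of how those regularization knobs are exposed, the snippet below uses XGBoost's scikit-learn wrapper; the dataset and parameter values are illustrative, not taken from the text.

```python
# Regularized boosting with XGBoost's scikit-learn wrapper.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=4,
    reg_lambda=1.0,  # L2 regularization on leaf weights
    reg_alpha=0.1,   # L1 regularization on leaf weights
)
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.3f}")
```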