unit 5 ML
1. Bagging – Reducing Variance and Preventing Overfitting
How It Works:
Bagging creates multiple subsets of the training data by randomly selecting data
points with replacement (bootstrap sampling).
Each subset is used to train a separate model (e.g., decision trees in Random Forest).
The final prediction is obtained by averaging the outputs (for regression) or majority
voting (for classification).
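A minimal sketch of the bootstrap sampling step, assuming a toy NumPy array as the dataset: each subset is drawn with replacement and has the same size as the original data, so individual points can appear more than once.

```python
# A minimal sketch of bootstrap sampling (illustrative toy data, not from the notes).
import numpy as np

rng = np.random.default_rng(seed=0)
X = np.arange(10)  # toy "dataset" of 10 samples

n_models = 3
for i in range(n_models):
    # Draw indices with replacement: same size as the data, duplicates allowed.
    idx = rng.choice(len(X), size=len(X), replace=True)
    print(f"bootstrap sample {i}:", X[idx])
```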
Steps in Bagging:
1. Create multiple random subsets of the dataset (bootstrap samples).
2. Train a separate model for each subset independently.
3. Run all models in parallel.
4. Combine their predictions by averaging (for regression) or voting (for classification).
Example: Random Forest is a popular bagging algorithm that trains multiple decision
trees and averages their predictions.
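A minimal scikit-learn sketch of bagging and Random Forest, assuming a synthetic dataset from make_classification; the hyperparameters are illustrative, not tuned.

```python
# Bagging vs. Random Forest on a synthetic dataset (illustrative settings only).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: each model sees a bootstrap sample; the default base estimator is a decision tree.
bag = BaggingClassifier(n_estimators=50, random_state=0)
bag.fit(X_train, y_train)
print("bagging accuracy:", bag.score(X_test, y_test))

# Random Forest: bagged decision trees plus random feature selection at each split.
rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(X_train, y_train)
print("random forest accuracy:", rf.score(X_test, y_test))
```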
Advantages of Bagging:
Reduces variance and prevents overfitting.
Works well with high-variance models (e.g., decision trees).
Can be used for both classification and regression tasks.
2. Boosting – Reducing Bias and Improving Performance
Goal: Convert weak learners into strong learners by focusing on misclassified data.
[Figure: an illustration of the intuition behind the boosting algorithm, showing the sequentially trained learners and the re-weighted dataset.]
How It Works:
Unlike bagging, boosting trains models sequentially, where each new model focuses
on correcting the errors of the previous one.
In each step, misclassified data points are given higher weights, so the next model
learns better.
The final model is obtained by combining the predictions of all models, weighted by
their accuracy.
Steps in Boosting:
1. Train the first model on the entire dataset.
2. Identify misclassified data points and assign them higher weights.
3. Train the next model, focusing more on the misclassified points.
4. Repeat the process, continuously improving the model.
5. The final prediction is obtained by weighted averaging (regression) or weighted
voting (classification).
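The reweighting steps above can be sketched by hand. The following simplified AdaBoost-style loop assumes binary labels recoded as -1/+1 and decision stumps as weak learners; it is an illustration of the idea, not a production implementation.

```python
# Simplified AdaBoost-style boosting: misclassified points get larger weights,
# and each weak learner is weighted by its accuracy in the final vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
y = np.where(y == 1, 1, -1)               # recode labels as -1 / +1

n_rounds = 10
w = np.full(len(y), 1.0 / len(y))         # step 1: start from uniform weights
stumps, alphas = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=w)      # weak learner trained on current weights
    pred = stump.predict(X)
    err = np.sum(w * (pred != y)) / np.sum(w)
    err = np.clip(err, 1e-10, 1 - 1e-10)  # guard against division by zero
    alpha = 0.5 * np.log((1 - err) / err) # accurate stumps get a larger say
    w *= np.exp(-alpha * y * pred)        # steps 2-3: upweight misclassified points
    w /= w.sum()
    stumps.append(stump)
    alphas.append(alpha)

# Step 5: final prediction is the sign of the weighted vote of all stumps.
scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print("training accuracy:", np.mean(np.sign(scores) == y))
```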
Popular Boosting Algorithms:
AdaBoost (Adaptive Boosting) – Increases weights of misclassified points.
Gradient Boosting – Minimizes residual errors using gradient descent.
XGBoost – Optimized version of Gradient Boosting (faster, handles missing data).
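For reference, AdaBoost and Gradient Boosting ship with scikit-learn, while XGBoost lives in the separate xgboost package; the sketch below uses illustrative settings on a synthetic dataset.

```python
# Typical library entry points for the boosting algorithms listed above
# (synthetic data and untuned hyperparameters, for illustration only).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

models = {
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=0),
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())

# XGBoost comes from the external xgboost package (pip install xgboost):
# from xgboost import XGBClassifier
# xgb = XGBClassifier(n_estimators=100)
```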
Advantages of Boosting:
Increases accuracy by correcting previous errors.
Works well with complex datasets.
Handles imbalanced data better than bagging.
3. Stacking – Combining Models for Better Predictions
Goal: Improve predictive performance by learning how to best combine multiple models.
How It Works:
Stacking trains multiple models (base learners) in parallel on the same dataset.
Their predictions are used as inputs for a meta-model, which makes the final
prediction.
The meta-model learns how to best combine the outputs of the base models.
Steps in Stacking:
1. Train multiple models (e.g., Decision Tree, SVM, Neural Network) on the same
dataset.
2. Collect predictions from all models.
3. Train a meta-learner (e.g., Logistic Regression) using these predictions as input.
4. The meta-model makes the final prediction.
Example:
A stacking model might combine:
Decision Trees (capture non-linear relationships)
Support Vector Machines (SVM) (good for high-dimensional data)
Neural Networks (good for complex patterns)
Meta-learner: Logistic Regression (learns how to combine all the models)
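The example above maps directly onto scikit-learn's StackingClassifier; the sketch below assumes a synthetic dataset and illustrative hyperparameters.

```python
# Stacking the example models: tree + SVM + neural network, combined by Logistic Regression.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_learners = [
    ("tree", DecisionTreeClassifier(max_depth=5)),   # non-linear relationships
    ("svm", SVC()),                                  # high-dimensional data
    ("mlp", MLPClassifier(max_iter=1000)),           # complex patterns
]

# The meta-learner is trained on the base models' cross-validated predictions
# and learns how to combine them.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
print("stacking accuracy:", stack.score(X_test, y_test))
```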
Advantages of Stacking:
Uses different algorithms to capture different patterns.
Can outperform individual models when properly tuned.
Works well for both classification and regression.
Comparison of Bagging, Boosting, and Stacking
Final Prediction:
Bagging – majority voting (classification) or averaging (regression)
Boosting – weighted sum of the weak models
Stacking – the meta-model learns the best combination
Example Models:
Bagging – Random Forest
Boosting – AdaBoost, Gradient Boosting, XGBoost
Stacking – Logistic Regression as the meta-learner
Voting:
Voting ensembles are an ensemble technique that trains multiple machine learning models and then combines the predictions of all the individual models into a single output. Voting can be used in both classification and regression tasks. A Voting Classifier is a machine learning model that trains an ensemble of several models and predicts the output class with the strongest support (the most votes, or the highest combined probability). The idea is that, instead of creating separate dedicated models and evaluating the accuracy of each one, we create a single model that wraps these models and predicts the output based on their combined majority vote for each output class.
In the table below, three classifiers each predict +1 or -1 for four samples (C1 to C4); the ensemble output for each sample is the majority vote of the three predictions.
Class   Classifier 1   Classifier 2   Classifier 3
C1           1             -1              1
C2          -1              1              1
C3           1              1             -1
C4          -1             -1             -1
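A small sketch of hard (majority) voting applied to the +1/-1 predictions in the table above; scikit-learn's VotingClassifier wraps the same idea around trained models (the estimator names in the comment are placeholders).

```python
# Majority voting over the table above: the sign of the vote total decides the class.
preds = {                     # rows: samples C1..C4, columns: Classifier 1..3
    "C1": [ 1, -1,  1],
    "C2": [-1,  1,  1],
    "C3": [ 1,  1, -1],
    "C4": [-1, -1, -1],
}
for sample, votes in preds.items():
    majority = 1 if sum(votes) > 0 else -1
    print(sample, "->", majority)   # C1..C3 -> +1, C4 -> -1

# scikit-learn wraps the same idea around fitted models, e.g.:
# from sklearn.ensemble import VotingClassifier
# ensemble = VotingClassifier(estimators=[("lr", lr), ("dt", dt), ("svm", svm)], voting="hard")
```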
Stacking:
Stacking is a way to ensemble multiple classification or regression models. There are many ways to ensemble models; the most widely known are bagging and boosting. Bagging averages multiple similar high-variance models to decrease variance. Boosting builds models incrementally to decrease bias while keeping variance small.
Stacking (sometimes called Stacked Generalization) is a different paradigm. The point of stacking is to explore a space of different models for the same problem. The idea is that you can attack a learning problem with different types of models, each capable of learning some part of the problem but not the whole problem space. So you build multiple different learners and use them to produce intermediate predictions, one prediction for each learned model. Then you add a new model which learns the same target from those intermediate predictions.
This final model is said to be stacked on top of the others, hence the name. Stacking can thus improve overall performance, and you often end up with a model that is better than any individual intermediate model. Notice, however, that it offers no guarantee of improvement, as is often the case with machine learning techniques.
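The "intermediate predictions" idea can be sketched by hand with cross-validated predictions; the dataset and model choices below are illustrative assumptions.

```python
# Stacking by hand: out-of-fold predictions from the base learners become the
# features of a meta-model that learns the same target.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

base_models = [DecisionTreeClassifier(max_depth=5), SVC()]

# One intermediate prediction per learned model, produced out-of-fold so the
# meta-model never sees a prediction made on a point the base model trained on.
meta_features = np.column_stack(
    [cross_val_predict(m, X, y, cv=5) for m in base_models]
)

meta_model = LogisticRegression()          # stacked on top of the others
meta_model.fit(meta_features, y)
print("meta-model training accuracy:", meta_model.score(meta_features, y))
```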