Ensemble Classification
AI42001
Different classifiers, different results
f1(X) = Y1
f2(X) = Y2
...
fK(X) = YK
• How to choose a single answer??
Simple Voting
• Classification: mode of all predictions!
• Regression: mean of all predictions!
• Example: six classifiers vote on the label of one test example
  classifier:  f1  f2  f3  f4  f5  f6
  label:        1   1   2   1   3   2
• Final prediction: Y = 1, the mode of the six votes (label 1 is predicted three times); a small sketch of this voting rule follows
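A minimal sketch of simple voting in NumPy, assuming the predictions of the K classifiers have already been collected into an array (the function and variable names are illustrative, not from the slides):

```python
import numpy as np

def vote_classification(preds):
    """preds: (K, n) array of integer labels from K classifiers for n examples.
    Returns the per-example mode (majority vote)."""
    # np.bincount counts the votes for each label; argmax picks the most frequent
    return np.array([np.bincount(col).argmax() for col in preds.T])

def vote_regression(preds):
    """preds: (K, n) array of real-valued predictions. Returns the per-example mean."""
    return preds.mean(axis=0)

# The example above: six classifiers predict labels 1, 1, 2, 1, 3, 2 for one example
votes = np.array([[1], [1], [2], [1], [3], [2]])
print(vote_classification(votes))   # -> [1], the mode of the six votes
```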
How to have many classifiers?
• Same training set, different classifier functions
• Split the dataset into multiple parts; train a classifier on each part
• Select different subsets of features; train a classifier for each subset
[Figure: examples of random feature subsets drawn from features X1–X10, with a separate classifier trained on each subset]
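To make the first strategy concrete (same training set, different classifier functions), here is a sketch using scikit-learn; the library, the toy data, and the choice of three model types are assumptions for illustration only:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))               # toy feature matrix
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # toy binary labels

# Same training set, three different classifier functions
models = [
    DecisionTreeClassifier(max_depth=3).fit(X, y),
    LogisticRegression().fit(X, y),
    KNeighborsClassifier(n_neighbors=5).fit(X, y),
]

# Collect the K predictions and take their mode (simple voting)
preds = np.stack([m.predict(X) for m in models])                        # shape (K, n)
ensemble_pred = np.array([np.bincount(col).argmax() for col in preds.T])
```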
Bootstrap Aggregation
• Given a training set of size N:
• Choose N examples from it with replacement
• i-th draw: each training example may be chosen with probability 1/N
• Same example may be chosen multiple times!
• Each bootstrap “version” still contains N training examples, but some examples appear more than once and others (about 37% on average) are left out entirely
[Figure: an original training set X(1)–X(5) alongside several bootstrap samples drawn from it with replacement; within each sample some examples repeat and others are absent]
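A minimal sketch of the bootstrap-sampling step in NumPy; the `train` argument is a placeholder for any base-classifier training routine and is not from the slides:

```python
import numpy as np

def bagging_fit(X, y, train, K=10, seed=0):
    """Train K classifiers, each on a bootstrap sample of the N training examples."""
    rng = np.random.default_rng(seed)
    N = len(X)
    models = []
    for _ in range(K):
        # N draws with replacement: each draw picks any example with probability 1/N,
        # so some examples repeat and others are left out of this "version"
        idx = rng.integers(0, N, size=N)
        models.append(train(X[idx], y[idx]))
    return models
```

With scikit-learn, `train` could be something like `lambda X, y: DecisionTreeClassifier().fit(X, y)`, which is close to how random forests build their trees (random forests additionally subsample the features considered at each split).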
BAGGing of features
• Instead of choosing subsets of training examples
• We can choose subsets of the features!
• Each “version” has all the training examples, but only some of the
features!
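A sketch of feature bagging under the same assumptions: every “version” keeps all N examples but only a random subset of m columns (m is an illustrative choice, and `train` is again a placeholder):

```python
import numpy as np

def feature_bagging_fit(X, y, train, K=10, m=3, seed=0):
    """Train K classifiers, each on all examples but a random subset of m features."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    models = []
    for _ in range(K):
        cols = rng.choice(d, size=m, replace=False)    # pick m distinct feature indices
        models.append((cols, train(X[:, cols], y)))    # remember which columns were used
    return models

def feature_bagging_predict(models, X):
    # Each classifier votes using only its own feature subset
    preds = np.stack([clf.predict(X[:, cols]) for cols, clf in models])
    return np.array([np.bincount(col).argmax() for col in preds.T])
```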
• Some classifiers can be trained directly on weighted examples, which boosting (below) relies on:
• Weighted K-NN: take a weighted vote of each class label among the K nearest neighbors
• Weighted Decision Tree: at each node, use the “weighted relative frequencies” of each class label
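For instance, a weighted K-NN vote can be written directly in NumPy; the example-weight vector `w` plays the role of the per-example weights that boosting maintains (all names here are illustrative):

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, w, x, K=5):
    """Predict the label of a single query x by a weighted vote among its K nearest neighbors."""
    dist = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every training example
    nn = np.argsort(dist)[:K]                    # indices of the K nearest neighbors
    # Sum the example weights per class label instead of counting one vote each
    scores = np.bincount(y_train[nn], weights=w[nn])
    return scores.argmax()
```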
Boosting
• 1) Take a “weak” classifier (just better than random)
• 2) Compute the error of the model on each training example
• 3) Identify the examples on which it makes mistakes
• 4) Increase the “weights” of such examples and retrain classifier
• 5) Add the updated classifier to the ensemble
• 6) Go back to step 2 and repeat
AdaBoost Algorithm
• Initialize weights of each training sample wi = 1/N
• For iteration t = 1 to max_iter
• Learn a classifier ht on training examples according to weights w
• Calculate the weighted error: et = ∑i wi · I(ht(xi) ≠ yi)
• Set the weight of ht: at = 0.5 · log((1 − et) / et)
• Update the weight of each training example
wi ← wi · exp(−at) if ht(xi) = yi (correct classification: decrease weight)
wi ← wi · exp(at) if ht(xi) ≠ yi (wrong classification: increase weight)
• Normalize the weights so that they add up to 1
• End
• Output: all the classifiers “h” along with their weights “a”
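A compact sketch of the loop above, using scikit-learn decision stumps as the weak classifier (a common but assumed choice) and the slide's weight-update rule:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, max_iter=50):
    """Returns a list of (weak classifier h, weight a) pairs."""
    N = len(X)
    w = np.full(N, 1.0 / N)                     # initialize weights wi = 1/N
    ensemble = []
    for t in range(max_iter):
        # Weak learner trained on the weighted examples
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        wrong = h.predict(X) != y
        e = np.sum(w * wrong)                   # weighted error et
        if e <= 0 or e >= 0.5:                  # perfect, or no better than random: stop
            if e <= 0:
                ensemble.append((h, 1.0))
            break
        a = 0.5 * np.log((1 - e) / e)           # weight of this classifier
        # Decrease the weights of correct examples, increase the weights of mistakes
        w = w * np.exp(np.where(wrong, a, -a))
        w = w / w.sum()                         # normalize so the weights add up to 1
        ensemble.append((h, a))
    return ensemble

def adaboost_predict(ensemble, X, classes=(0, 1)):
    # Weighted vote of the weak classifiers
    scores = np.zeros((len(X), len(classes)))
    for h, a in ensemble:
        pred = h.predict(X)
        for k, c in enumerate(classes):
            scores[:, k] += a * (pred == c)
    return np.asarray(classes)[scores.argmax(axis=1)]
```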
Mixture of Experts
• Different classifiers may be more effective in different parts of feature
space
• Weights of classifiers should be dependent on features
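A toy sketch of this idea: a gating function maps the input features to a set of classifier weights, so different experts dominate in different regions of feature space. The linear gate parameters here are illustrative placeholders; in a real mixture of experts they are learned jointly with the experts:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def moe_predict(experts, gate_W, gate_b, X):
    """experts: list of K fitted classifiers exposing predict_proba;
    gate_W (d, K) and gate_b (K,): a linear gating function over the d features."""
    gates = softmax(X @ gate_W + gate_b)                               # (n, K): weight per expert
    probs = np.stack([e.predict_proba(X) for e in experts], axis=1)    # (n, K, C): class probabilities
    mixed = (gates[:, :, None] * probs).sum(axis=1)                    # feature-dependent weighted average
    return mixed.argmax(axis=1)                                        # predicted class index
```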
Summary
• Complex learning problems can often be solved by effectively combining multiple weak models via ensemble learning methods
• Simple ones: Voting/averaging or stacking
• Bagging: Random Forests
• Boosting: AdaBoost
• Mixture of Experts
• These models help reduce variance or overfitting, and may have
computational benefits over more complex classification algorithms
One-vs-one Classification
• Handle multi-class problems by training one binary classifier for every pair of classes; the final label is chosen by voting over all pairwise classifiers
One-vs-all Classification
• Handle multi-class problems by training one binary classifier per class (that class vs. all the rest); the final label is the class whose classifier is most confident