Data Mining - Ensemble Methods

ENSEMBLE METHODS: INCREASING THE CLASSIFICATION ACCURACY

 Ensemble methods
 An ensemble for classification is a composite model, made up of a combination of classifiers.
 Use a combination of models to increase accuracy

 Popular ensemble methods
 Bagging: averaging the prediction over a collection of classifiers
 Boosting: weighted vote with a collection of classifiers
 Random Forests: collection of decision trees
ENSEMBLE METHODS

 An ensemble combines a series of k learned models, M1, M2, …, Mk, with the aim of creating an improved model M*.
 A given data set, D, is used to create k training sets, D1, D2, …, Dk, where Di (1 ≤ i ≤ k) is used to generate classifier Mi.
 Given a new data tuple to classify, each base classifier votes by returning a class prediction.
 The ensemble returns a class prediction based on the votes of the base classifiers.
 An ensemble tends to be more accurate than its base classifiers.
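
A minimal sketch of this voting combination, assuming k already-trained base classifiers that expose a scikit-learn-style predict() method (the function and parameter names here are illustrative, not from the slides):

from collections import Counter

def ensemble_predict(base_classifiers, x):
    """Combine k trained base classifiers M1..Mk by majority vote.

    Each model is assumed to expose a scikit-learn-style predict().
    """
    votes = [Mi.predict([x])[0] for Mi in base_classifiers]
    return Counter(votes).most_common(1)[0][0]   # class predicted by most Mi

Bagging, boosting, and random forests differ mainly in how the training sets Di are produced and in whether each classifier's vote is weighted.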
BAGGING: BOOTSTRAP AGGREGATION

 Analogy: diagnosis based on the majority vote of multiple doctors
 Training
 Given a set D of d tuples, at each iteration i, a training set Di of d tuples is sampled with replacement from D (i.e., a bootstrap sample)
 A classifier model Mi is learned from each training set Di
 Classification: to classify an unknown sample X
 Each classifier Mi returns its class prediction
 The bagged classifier M* counts the votes and assigns the class with the most votes to X
 Prediction: can also be applied to the prediction of continuous values by taking the average of the predictions for a given test tuple
 Accuracy
 Often significantly better than a single classifier derived from D
 For noisy data: not considerably worse, and more robust
 Proven improved accuracy in prediction
ALGORITHM - BAGGING
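
A minimal sketch of the bagging procedure described above, assuming NumPy arrays for the data and scikit-learn decision trees as base learners (an illustrative choice; any classifier could serve as Mi):

import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k=10, random_state=0):
    """Learn k models, each from a bootstrap sample Di of size d drawn from D = (X, y)."""
    rng = np.random.default_rng(random_state)
    d = len(X)
    models = []
    for _ in range(k):
        idx = rng.integers(0, d, size=d)        # sample d tuples with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, x):
    """Each Mi votes; the bagged classifier M* assigns the class with the most votes to x.

    For continuous targets, the average of the predictions would be returned instead.
    """
    votes = [Mi.predict([x])[0] for Mi in models]
    return Counter(votes).most_common(1)[0][0]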
BOOSTING

 Analogy: consult several doctors and combine their weighted diagnoses, where each weight is assigned based on that doctor's previous diagnosis accuracy
 How does boosting work?
 Weights are assigned to each training tuple
 A series of k classifiers is iteratively learned
 After a classifier Mi is learned, the weights are updated to allow the subsequent classifier, Mi+1, to pay more attention to the training tuples that were misclassified by Mi
 The final M* combines the votes of each individual classifier, where the weight of each classifier's vote is a function of its accuracy
 The basic idea is that when we build a classifier, we want it to focus more on the tuples misclassified in the previous round.
 Some classifiers may be better than others at classifying certain "difficult" tuples.
 In this way, we build a series of classifiers that complement each other.
ADAPTIVE BOOSTING (AdaBoost)

 Suppose we are given D, a data set of d class-labeled tuples, (X1, y1), (X2, y2), …, (Xd, yd), where yi is the class label of tuple Xi.
 Initially, AdaBoost assigns each training tuple an equal weight of 1/d.
 Generating k classifiers for the ensemble requires k rounds through the rest of the algorithm.
 In round i, the tuples from D are sampled to form a training set, Di, of size d.
 Sampling with replacement is used; the same tuple may be selected more than once.
 A classifier model, Mi, is derived from the training tuples of Di.
 If a tuple was incorrectly classified, its weight is increased.
 If a tuple was correctly classified, its weight is decreased.
 These weights will be used to generate the training samples for the classifier of the next round.
ADAPTIVE BOOSTING (AdaBoost)

 A tuple's weight reflects how difficult it is to classify: the higher the weight, the more often it has been misclassified.
 To compute the error rate of model Mi, we sum the weights of the tuples in Di that Mi misclassified:

   error(Mi) = Σj wj × err(Xj)

 where err(Xj) is the misclassification error of tuple Xj: if the tuple was misclassified, then err(Xj) is 1; otherwise, it is 0.
ADAPTIVE BOOSTING (AdaBoost)

 "Once boosting is complete, how is the ensemble of classifiers used to predict the class label of a tuple, X?"
 Unlike bagging, where each classifier is assigned an equal vote, boosting assigns a weight to each classifier's vote, based on how well the classifier performed.
 The lower a classifier's error rate, the more accurate it is, and therefore the higher its weight for voting should be.
 The weight of classifier Mi's vote is

   log((1 - error(Mi)) / error(Mi))

 For example, if error(Mi) = 0.25, the weight of Mi's vote is log(0.75/0.25) = log 3.
 For each class, c, we sum the weights of each classifier that assigned class c to X.
 The class with the highest sum is the "winner" and is returned as the class prediction for tuple X.
ALGORITHM - AdaBoost
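
A minimal sketch of the AdaBoost procedure described above (weighted sampling, error rate, weight update, and weighted voting), assuming NumPy arrays and scikit-learn decision stumps as base learners; names and the simplified error > 0.5 handling are illustrative choices:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, k=10, random_state=0):
    """Train k classifiers; return them with their vote weights."""
    rng = np.random.default_rng(random_state)
    d = len(X)
    w = np.full(d, 1.0 / d)                       # each tuple starts with weight 1/d
    models, vote_weights = [], []
    for _ in range(k):
        # Round i: sample Di of size d from D with replacement, according to the tuple weights.
        idx = rng.choice(d, size=d, replace=True, p=w)
        Mi = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
        err = (Mi.predict(X) != y).astype(float)  # err(Xj): 1 if misclassified, else 0
        error = float(np.sum(w * err))            # error(Mi): sum of weights of misclassified tuples
        if error > 0.5:
            continue                              # discard a classifier this poor; a fuller version re-samples and retries
        error = max(error, 1e-10)                 # guard against division by zero when Mi is perfect
        # Decrease the weights of correctly classified tuples, then normalize,
        # so misclassified tuples get relatively more attention next round.
        w *= np.where(err == 1.0, 1.0, error / (1.0 - error))
        w /= w.sum()
        models.append(Mi)
        vote_weights.append(np.log((1.0 - error) / error))   # weight of Mi's vote
    return models, vote_weights

def adaboost_predict(models, vote_weights, x):
    """For each class c, sum the vote weights of the classifiers that assigned c to x."""
    totals = {}
    for Mi, a in zip(models, vote_weights):
        c = Mi.predict([x])[0]
        totals[c] = totals.get(c, 0.0) + a
    return max(totals, key=totals.get)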
RANDOM FOREST (BREIMAN 2001)
 Random Forest:
 Each classifier in the ensemble is a decision tree classifier, generated using a random selection of attributes at each node to determine the split
 During classification, each tree votes and the most popular class is returned
 Two methods to construct a Random Forest:
 Forest-RI (random input selection): randomly select, at each node, F attributes as candidates for the split at that node. The CART methodology is used to grow the trees to maximum size
 Forest-RC (random linear combinations): creates new attributes (or features) that are linear combinations of the existing attributes (reduces the correlation between individual classifiers)
 Comparable in accuracy to AdaBoost, but more robust to errors and outliers
 Insensitive to the number of attributes selected for consideration at each split, and faster than bagging or boosting

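A minimal Forest-RI-style sketch using scikit-learn's RandomForestClassifier, whose max_features parameter controls the number of randomly chosen attributes considered at each split; the synthetic data set is purely illustrative:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Each tree is grown on a bootstrap sample; at every node only a random
# subset of attributes (max_features) is considered for the split,
# corresponding to the F candidate attributes of Forest-RI.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
print(cross_val_score(forest, X, y, cv=5).mean())
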
CLASSIFICATION OF CLASS-IMBALANCED DATA SETS

 Class-imbalance problem: rare positive examples but numerous negative ones, e.g., medical diagnosis, fraud detection, oil-spill detection, fault detection, etc.
 Traditional methods assume a balanced distribution of classes and equal error costs, so they are not suitable for class-imbalanced data
 Typical methods for imbalanced data in two-class classification (a sketch of two of them follows this list):
 Oversampling: re-sampling of data from the positive class
 Under-sampling: randomly eliminate tuples from the negative class
 Threshold-moving: move the decision threshold, t, so that the rare-class tuples are easier to classify, and hence there is less chance of costly false negative errors
 Ensemble techniques: combine multiple classifiers, as introduced above
 The class-imbalance problem remains difficult on multiclass tasks
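
A minimal sketch of random oversampling and threshold-moving, assuming binary labels where 1 is the rare positive class and a trained classifier with a scikit-learn-style predict_proba() whose columns are ordered [0, 1]; the threshold value 0.3 is an illustrative choice, not from the slides:

import numpy as np

def random_oversample(X, y, random_state=0):
    """Duplicate randomly chosen positive tuples until both classes are the same size."""
    rng = np.random.default_rng(random_state)
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    extra = rng.choice(pos, size=len(neg) - len(pos), replace=True)
    idx = rng.permutation(np.concatenate([neg, pos, extra]))
    return X[idx], y[idx]

def predict_with_threshold(model, X, t=0.3):
    """Threshold-moving: predict the positive class when P(positive) >= t.

    Using t < 0.5 makes the rare class easier to predict, reducing costly
    false negative errors at the cost of more false positives.
    """
    p_pos = model.predict_proba(X)[:, 1]
    return (p_pos >= t).astype(int)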