Data Mining - Ensemble Methods

ENSEMBLE METHODS: INCREASING THE CLASSIFICATION ACCURACY

 Ensemble methods
 An ensemble for classification is a composite model, made up of a combination of classifiers.
 Use a combination of models to increase accuracy

 Popular ensemble methods
 Bagging: averaging the prediction over a collection of classifiers
 Boosting: weighted vote with a collection of classifiers
 Random Forests: collection of decision trees
ENSEMBLE METHODS

 An ensemble combines a series of k learned models, M1, M2, …, Mk, with the aim of creating an improved model M*.
 A given data set, D, is used to create k training sets, D1, D2, …, Dk, where Di (1 ≤ i ≤ k) is used to generate classifier Mi.
 Given a new data tuple to classify, each base classifier votes by returning a class prediction.
 The ensemble returns a class prediction based on the votes of the base classifiers.
 An ensemble tends to be more accurate than its base classifiers.
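
A minimal sketch of this voting combination, assuming k already-trained base classifiers that expose a scikit-learn-style predict() method (the function and parameter names here are illustrative, not from the slides):

from collections import Counter

def ensemble_predict(base_classifiers, x):
    """Combine k trained base classifiers M1..Mk by majority vote.

    Each model is assumed to expose a scikit-learn-style predict().
    """
    votes = [Mi.predict([x])[0] for Mi in base_classifiers]
    return Counter(votes).most_common(1)[0][0]   # class predicted by most Mi

Bagging, boosting, and random forests differ mainly in how the training sets Di are produced and in whether each classifier's vote is weighted.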
BAGGING: BOOTSTRAP AGGREGATION

 Analogy: diagnosis based on the majority vote of multiple doctors
 Training
 Given a set D of d tuples, at each iteration i, a training set Di of d tuples is sampled with replacement from D (i.e., a bootstrap sample)
 A classifier model Mi is learned from each training set Di
 Classification: to classify an unknown sample X
 Each classifier Mi returns its class prediction
 The bagged classifier M* counts the votes and assigns the class with the most votes to X
 Prediction: can also be applied to the prediction of continuous values by taking the average of the predictions for a given test tuple
 Accuracy
 Often significantly better than a single classifier derived from D
 For noisy data: not considerably worse, and more robust
 Proven improved accuracy in prediction
ALGORITHM - BAGGING
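
A minimal sketch of the bagging procedure described above, assuming NumPy arrays for the data and scikit-learn decision trees as base learners (an illustrative choice; any classifier could serve as Mi):

import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k=10, random_state=0):
    """Learn k models, each from a bootstrap sample Di of size d drawn from D = (X, y)."""
    rng = np.random.default_rng(random_state)
    d = len(X)
    models = []
    for _ in range(k):
        idx = rng.integers(0, d, size=d)        # sample d tuples with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, x):
    """Each Mi votes; the bagged classifier M* assigns the class with the most votes to x.

    For continuous targets, the average of the predictions would be returned instead.
    """
    votes = [Mi.predict([x])[0] for Mi in models]
    return Counter(votes).most_common(1)[0][0]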
BOOSTING

 Analogy: consult several doctors and combine their weighted diagnoses, where each weight is assigned based on that doctor's previous diagnosis accuracy
 How does boosting work?
 Weights are assigned to each training tuple
 A series of k classifiers is iteratively learned
 After a classifier Mi is learned, the weights are updated to allow the subsequent classifier, Mi+1, to pay more attention to the training tuples that were misclassified by Mi
 The final M* combines the votes of each individual classifier, where the weight of each classifier's vote is a function of its accuracy
 The basic idea is that when we build a classifier, we want it to focus more on the tuples misclassified in the previous round.
 Some classifiers may be better than others at classifying certain "difficult" tuples.
 In this way, we build a series of classifiers that complement each other.
ADAPTIVE BOOSTING (AdaBoost)

 Suppose we are given D, a data set of d class-labeled tuples, (X1, y1), (X2, y2), …, (Xd, yd), where yi is the class label of tuple Xi.
 Initially, AdaBoost assigns each training tuple an equal weight of 1/d.
 Generating k classifiers for the ensemble requires k rounds through the rest of the algorithm.
 In round i, the tuples from D are sampled to form a training set, Di, of size d.
 Sampling with replacement is used; the same tuple may be selected more than once.
 A classifier model, Mi, is derived from the training tuples of Di.
 If a tuple was incorrectly classified, its weight is increased.
 If a tuple was correctly classified, its weight is decreased.
 These weights will be used to generate the training samples for the classifier of the next round.
ADAPTIVE BOOSTING (AdaBoost)

 A tuple's weight reflects how difficult it is to classify: the higher the weight, the more often it has been misclassified.
 To compute the error rate of model Mi, we sum the weights of the tuples in Di that Mi misclassified:

   error(Mi) = Σj wj × err(Xj)

 where err(Xj) is the misclassification error of tuple Xj: if the tuple was misclassified, then err(Xj) is 1; otherwise, it is 0.
ADAPTIVE BOOSTING (AdaBoost)

 "Once boosting is complete, how is the ensemble of classifiers used to predict the class label of a tuple, X?"
 Unlike bagging, where each classifier is assigned an equal vote, boosting assigns a weight to each classifier's vote, based on how well the classifier performed.
 The lower a classifier's error rate, the more accurate it is, and therefore the higher its weight for voting should be.
 The weight of classifier Mi's vote is

   log((1 - error(Mi)) / error(Mi))

 For example, if error(Mi) = 0.25, the weight of Mi's vote is log(0.75/0.25) = log 3.
 For each class, c, we sum the weights of each classifier that assigned class c to X.
 The class with the highest sum is the "winner" and is returned as the class prediction for tuple X.
ALGORITHM - AdaBoost
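
A minimal sketch of the AdaBoost procedure described above (weighted sampling, error rate, weight update, and weighted voting), assuming NumPy arrays and scikit-learn decision stumps as base learners; names and the simplified error > 0.5 handling are illustrative choices:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, k=10, random_state=0):
    """Train k classifiers; return them with their vote weights."""
    rng = np.random.default_rng(random_state)
    d = len(X)
    w = np.full(d, 1.0 / d)                       # each tuple starts with weight 1/d
    models, vote_weights = [], []
    for _ in range(k):
        # Round i: sample Di of size d from D with replacement, according to the tuple weights.
        idx = rng.choice(d, size=d, replace=True, p=w)
        Mi = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
        err = (Mi.predict(X) != y).astype(float)  # err(Xj): 1 if misclassified, else 0
        error = float(np.sum(w * err))            # error(Mi): sum of weights of misclassified tuples
        if error > 0.5:
            continue                              # discard a classifier this poor; a fuller version re-samples and retries
        error = max(error, 1e-10)                 # guard against division by zero when Mi is perfect
        # Decrease the weights of correctly classified tuples, then normalize,
        # so misclassified tuples get relatively more attention next round.
        w *= np.where(err == 1.0, 1.0, error / (1.0 - error))
        w /= w.sum()
        models.append(Mi)
        vote_weights.append(np.log((1.0 - error) / error))   # weight of Mi's vote
    return models, vote_weights

def adaboost_predict(models, vote_weights, x):
    """For each class c, sum the vote weights of the classifiers that assigned c to x."""
    totals = {}
    for Mi, a in zip(models, vote_weights):
        c = Mi.predict([x])[0]
        totals[c] = totals.get(c, 0.0) + a
    return max(totals, key=totals.get)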
RANDOM FOREST (BREIMAN 2001)
 Random Forest:
 Each classifier in the ensemble is a decision tree classifier, generated using a random selection of attributes at each node to determine the split
 During classification, each tree votes and the most popular class is returned
 Two methods to construct a Random Forest:
 Forest-RI (random input selection): randomly select, at each node, F attributes as candidates for the split at that node. The CART methodology is used to grow the trees to maximum size
 Forest-RC (random linear combinations): creates new attributes (or features) that are linear combinations of the existing attributes (reduces the correlation between individual classifiers)
 Comparable in accuracy to AdaBoost, but more robust to errors and outliers
 Insensitive to the number of attributes selected for consideration at each split, and faster than bagging or boosting

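A minimal Forest-RI-style sketch using scikit-learn's RandomForestClassifier, whose max_features parameter controls the number of randomly chosen attributes considered at each split; the synthetic data set is purely illustrative:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Each tree is grown on a bootstrap sample; at every node only a random
# subset of attributes (max_features) is considered for the split,
# corresponding to the F candidate attributes of Forest-RI.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
print(cross_val_score(forest, X, y, cv=5).mean())
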
CLASSIFICATION OF CLASS-IMBALANCED DATA SETS

 Class-imbalance problem: rare positive examples but numerous negative ones, e.g., medical diagnosis, fraud detection, oil-spill detection, fault detection, etc.
 Traditional methods assume a balanced distribution of classes and equal error costs, so they are not suitable for class-imbalanced data
 Typical methods for imbalanced data in two-class classification (a sketch of two of them follows this list):
 Oversampling: re-sampling of data from the positive class
 Under-sampling: randomly eliminate tuples from the negative class
 Threshold-moving: move the decision threshold, t, so that the rare-class tuples are easier to classify, and hence there is less chance of costly false negative errors
 Ensemble techniques: combine multiple classifiers, as introduced above
 The class-imbalance problem remains difficult on multiclass tasks
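
A minimal sketch of random oversampling and threshold-moving, assuming binary labels where 1 is the rare positive class and a trained classifier with a scikit-learn-style predict_proba() whose columns are ordered [0, 1]; the threshold value 0.3 is an illustrative choice, not from the slides:

import numpy as np

def random_oversample(X, y, random_state=0):
    """Duplicate randomly chosen positive tuples until both classes are the same size."""
    rng = np.random.default_rng(random_state)
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    extra = rng.choice(pos, size=len(neg) - len(pos), replace=True)
    idx = rng.permutation(np.concatenate([neg, pos, extra]))
    return X[idx], y[idx]

def predict_with_threshold(model, X, t=0.3):
    """Threshold-moving: predict the positive class when P(positive) >= t.

    Using t < 0.5 makes the rare class easier to predict, reducing costly
    false negative errors at the cost of more false positives.
    """
    p_pos = model.predict_proba(X)[:, 1]
    return (p_pos >= t).astype(int)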