Ensemble
Introduction
• We are almost at the end of the semester, and the final competition is coming.
• https://inclass.kaggle.com/c/ml2016-cyber-security-attack-defender/leaderboard
• https://www.kaggle.com/c/outbrain-click-prediction/leaderboard
• https://www.kaggle.com/c/transfer-learning-on-stack-exchange-tags/leaderboard
• You have already developed several algorithms and pieces of code, and you may not want to modify them much.
• Ensemble: improving your machine with little modification.
Framework of Ensemble
• Get a set of classifiers
  • f1(x), f2(x), f3(x), ……
  • They should be diverse, like the tank, healer, and damage dealer in a team.
• Aggregate the classifiers (properly)
  • When fighting a boss, every player has a role and a position to stand in.
Ensemble: Bagging
Review: Bias v.s. Variance
• The observed error comes from both bias and variance.
  • Underfitting: large bias, small variance. Overfitting: small bias, large variance.
• A complex model trained in different "universes" (different training sets) gives very different f*, so it has large variance.
• But if we average all the f*, is the average close to the target f̂? For a complex (low-bias) model, E[f*] = f̂.
• Idea: we can average many complex models to reduce variance.
Bagging
• From the N training examples, sample N′ examples with replacement to form a new set (usually N′ = N).
• Repeat this to get Set 1, Set 2, Set 3, Set 4, and train Function 1, 2, 3, 4 on them separately.
• This approach is helpful when your model is complex and easy to overfit, e.g. a decision tree.
• Testing: feed the testing data x into Function 1–4 to get y1, y2, y3, y4, then average (regression) or vote (classification).
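A minimal sketch of this procedure, assuming scikit-learn and NumPy are available; the dataset here (make_moons) is synthetic and purely illustrative.

```python
# Minimal bagging sketch (illustrative; assumes scikit-learn and NumPy).
import numpy as np
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_moons(n_samples=500, noise=0.3, random_state=0)

n_models = 10
models = []
for _ in range(n_models):
    # Sample N' = N examples with replacement (a bootstrap set).
    idx = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeClassifier()          # complex model, easy to overfit
    tree.fit(X[idx], y[idx])
    models.append(tree)

# Aggregate by voting: average the predictions and threshold at 0.5.
votes = np.mean([m.predict(X) for m in models], axis=0)
y_bagged = (votes > 0.5).astype(int)
print("training accuracy of the bagged ensemble:", np.mean(y_bagged == y))
```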
Decision Tree
• Assume each object x is represented by a 2-dim vector (x1, x2).
• Example tree: if x1 < 0.5, ask whether x2 < 0.3; otherwise ask whether x2 < 0.7. The four leaves output Class 1, Class 2, Class 2, Class 1.
• The corresponding decision boundaries are the axis-aligned lines x1 = 0.5, x2 = 0.3, and x2 = 0.7 in the (x1, x2) plane.
• There are many design choices in training: the questions asked at each node, the number of branches, the branching criterion, when to stop splitting, …
• The questions can also be more complex than single-feature thresholds.
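To make the split structure concrete, here is the example tree above written out by hand (a sketch, not a learned model; the thresholds 0.5, 0.3, 0.7 and leaf labels come from the slide).

```python
# The example decision tree above, hard-coded rather than learned from data.
def example_tree(x1: float, x2: float) -> int:
    """Return the predicted class (1 or 2) for a 2-dim input (x1, x2)."""
    if x1 < 0.5:
        return 1 if x2 < 0.3 else 2   # leaves: Class 1 / Class 2
    else:
        return 2 if x2 < 0.7 else 1   # leaves: Class 2 / Class 1

print(example_tree(0.2, 0.1))  # x1 < 0.5 and x2 < 0.3  -> Class 1
print(example_tree(0.8, 0.9))  # x1 >= 0.5 and x2 >= 0.7 -> Class 1
```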
Experiment: Function of Miku
• Data: http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/theano/miku
  (1st column: x, 2nd column: y, 3rd column: output (1 or 0))
• [Figure: results of a single decision tree with depth = 5, 10, 15, and 20.]
Random Forest
• Decision tree:
  • Easy to achieve 0% error rate on training data
  • e.g. if each training example has its own leaf ……
• Random forest: bagging of decision trees
  • Resampling the training data alone is not sufficient (the trees are still too similar)
  • Also randomly restrict the features/questions used in each split
• Out-of-bag (OOB) validation for bagging
  • Suppose the bootstrap sets are as below (O: the example was used to train that function, X: it was not):

        trained on   f1   f2   f3   f4
        x1           O    X    O    X
        x2           O    X    X    O
        x3           X    O    O    X
        x4           X    O    X    O

  • Use RF = f2 + f4 to test x1
  • Use RF = f2 + f3 to test x2
  • Use RF = f1 + f4 to test x3
  • Use RF = f1 + f3 to test x4
  • The out-of-bag (OOB) error is a good error estimation of the testing set, without needing a held-out validation set.
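A short sketch of a random forest with out-of-bag validation, assuming scikit-learn (which implements both the bootstrap resampling and the random feature restriction at each split); the dataset is again synthetic.

```python
# Random forest with out-of-bag validation (illustrative; assumes scikit-learn).
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,     # bagging: 100 bootstrap-resampled trees
    max_features="sqrt",  # randomly restrict the features considered at each split
    oob_score=True,       # estimate test accuracy from out-of-bag examples
    random_state=0,
)
rf.fit(X, y)

print("training accuracy:", rf.score(X, y))
print("out-of-bag accuracy (estimate of testing accuracy):", rf.oob_score_)
```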
Experiment: Function of Miku
• [Figure: results of a random forest (100 trees) with depth = 5, 10, 15, and 20.]
Ensemble: Boosting
Boosting: Improving Weak Classifiers
• Training data: {(x^1, ŷ^1), …, (x^N, ŷ^N)}, ŷ ∈ {+1, −1} (binary classification)
• Guarantee:
  • If your ML algorithm can produce classifiers with error rate smaller than 50% on the training data,
  • then you can obtain a 0% error rate classifier after boosting.
• Framework of boosting:
  • Obtain the first classifier f1(x)
  • Find another function f2(x) to help f1(x)
    • However, if f2(x) is similar to f1(x), it will not help a lot.
    • We want f2(x) to be complementary to f1(x). (How?)
  • Obtain the second classifier f2(x)
  • …… Finally, combine all the classifiers.
• Unlike bagging, the classifiers are learned sequentially.
How to obtain different classifiers?
• Train on different training data sets.
• How to have different training data sets?
  • Re-sampling your training data to form a new set
  • Re-weighting your training data to form a new set
  • In real implementations, you only have to change the cost/objective function:
• Example: three training examples with weights u^n, each initially 1, re-weighted to new values
  • (x^1, ŷ^1, u^1):  u^1 = 1  →  0.4
  • (x^2, ŷ^2, u^2):  u^2 = 1  →  2.1
  • (x^3, ŷ^3, u^3):  u^3 = 1  →  0.7
• Unweighted objective:  L(f) = Σ_n l( f(x^n), ŷ^n )
• Weighted objective:    L(f) = Σ_n u^n l( f(x^n), ŷ^n )
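A small sketch, assuming scikit-learn, of how this re-weighting is done in practice: most learners accept per-example weights directly (here via sample_weight), which is exactly the weighted objective above. The feature values of x^1, x^2, x^3 below are made up for illustration.

```python
# Re-weighting training data = minimizing a weighted objective.
# Illustrative sketch; assumes scikit-learn and NumPy.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0.1, 0.2], [0.8, 0.5], [0.4, 0.9]])  # x^1, x^2, x^3 (toy values)
y = np.array([1, -1, 1])                            # labels ŷ^1, ŷ^2, ŷ^3

u = np.array([0.4, 2.1, 0.7])   # example weights u^1, u^2, u^3

# The classifier now minimizes sum_n u^n * l(f(x^n), ŷ^n)
# instead of the unweighted sum_n l(f(x^n), ŷ^n).
clf = DecisionTreeClassifier(max_depth=1)
clf.fit(X, y, sample_weight=u)
```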
Idea of Adaboost
• Idea: train f2(x) on a new training set on which f1(x) fails.
• How to find a new training set that fails f1(x)?
• ε1: the (weighted) error rate of f1(x) on its training data:

  ε1 = Σ_n u1^n δ( f1(x^n) ≠ ŷ^n ) / Z1,   where Z1 = Σ_n u1^n   and ε1 < 0.5

• Change the example weights from u1^n to u2^n such that

  Σ_n u2^n δ( f1(x^n) ≠ ŷ^n ) / Z2 = 0.5

  i.e. on the re-weighted data the performance of f1 is like random guessing.
• Then train f2(x) based on the new weights u2^n.
Re-weighting Training Data
• Idea: train f2(x) on a new training set on which f1(x) fails.
• How to find a new training set that fails f1(x)?
• Toy example with four training examples, all starting with weight 1:
  • (x^1, ŷ^1, u^1): u^1 = 1 → 1/√3   (correctly classified by f1)
  • (x^2, ŷ^2, u^2): u^2 = 1 → √3     (misclassified by f1)
  • (x^3, ŷ^3, u^3): u^3 = 1 → 1/√3   (correctly classified by f1)
  • (x^4, ŷ^4, u^4): u^4 = 1 → 1/√3   (correctly classified by f1)
• With the original weights the error rate of f1(x) is 0.25; with the new weights it becomes 0.5.
• f2(x) is then trained on the re-weighted data.
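A quick numeric check of this toy example (pure NumPy): with d1 = √((1−ε1)/ε1) = √3, the re-weighted error rate of f1 is exactly 0.5.

```python
# Numeric check of the re-weighting toy example (pure NumPy).
import numpy as np

u1 = np.array([1.0, 1.0, 1.0, 1.0])             # original weights
wrong = np.array([False, True, False, False])   # f1 misclassifies only x^2

eps1 = u1[wrong].sum() / u1.sum()   # 0.25
d1 = np.sqrt((1 - eps1) / eps1)     # sqrt(3)

u2 = np.where(wrong, u1 * d1, u1 / d1)   # multiply if wrong, divide if right
print(u2)                                # [0.577..., 1.732..., 0.577..., 0.577...]
print(u2[wrong].sum() / u2.sum())        # 0.5
```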
Re-weighting Training Data
• Idea: train f2(x) on a new training set on which f1(x) fails.
• How to find a new training set that fails f1(x)?
• If x^n is misclassified by f1, i.e. f1(x^n) ≠ ŷ^n:
  • u2^n ← u1^n multiplied by d1   (the weight increases)
• If x^n is correctly classified by f1, i.e. f1(x^n) = ŷ^n:
  • u2^n ← u1^n divided by d1      (the weight decreases)
• f2(x) will be learned based on the new example weights u2^n.
• What is the value of d1?
Re-weighting Training Data
• Definitions:

  ε1 = Σ_n u1^n δ( f1(x^n) ≠ ŷ^n ) / Z1,   Z1 = Σ_n u1^n

• New weights: u2^n = u1^n d1 if f1(x^n) ≠ ŷ^n (multiplied by d1), and u2^n = u1^n / d1 if f1(x^n) = ŷ^n (divided by d1).
• We want

  Σ_n u2^n δ( f1(x^n) ≠ ŷ^n ) / Z2 = 0.5

• The numerator is

  Σ_{f1(x^n) ≠ ŷ^n} u2^n = Σ_{f1(x^n) ≠ ŷ^n} u1^n d1

• The denominator is

  Z2 = Σ_{f1(x^n) ≠ ŷ^n} u2^n + Σ_{f1(x^n) = ŷ^n} u2^n
     = Σ_{f1(x^n) ≠ ŷ^n} u1^n d1 + Σ_{f1(x^n) = ŷ^n} u1^n / d1

• Therefore the condition becomes

  Σ_{f1(x^n) ≠ ŷ^n} u1^n d1 / ( Σ_{f1(x^n) ≠ ŷ^n} u1^n d1 + Σ_{f1(x^n) = ŷ^n} u1^n / d1 ) = 0.5
Re-weighting Training Data
• The condition above implies

  Σ_{f1(x^n) = ŷ^n} u1^n / d1 = Σ_{f1(x^n) ≠ ŷ^n} u1^n d1

  i.e.  (1/d1) Σ_{f1(x^n) = ŷ^n} u1^n = d1 Σ_{f1(x^n) ≠ ŷ^n} u1^n

• Since ε1 = Σ_{f1(x^n) ≠ ŷ^n} u1^n / Z1, we have

  Σ_{f1(x^n) ≠ ŷ^n} u1^n = Z1 ε1,   Σ_{f1(x^n) = ŷ^n} u1^n = Z1 (1 − ε1)

• Therefore  Z1 (1 − ε1) / d1 = Z1 ε1 d1, which gives

  d1 = √( (1 − ε1) / ε1 )  > 1   (because ε1 < 0.5)
Algorithm for AdaBoost
• Given training data {(x^1, ŷ^1, u1^1), …, (x^N, ŷ^N, u1^N)}
  • ŷ ∈ {+1, −1} (binary classification), u1^n = 1 (equal initial weights)
• For t = 1, …, T:
  • Train weak classifier f_t(x) with weights {u_t^1, …, u_t^N}
  • ε_t is the error rate of f_t(x) with weights {u_t^1, …, u_t^N}
  • For n = 1, …, N:
    • If x^n is misclassified by f_t(x), i.e. ŷ^n ≠ f_t(x^n):
      u_{t+1}^n = u_t^n × d_t = u_t^n × exp(α_t)
    • Else:
      u_{t+1}^n = u_t^n / d_t = u_t^n × exp(−α_t)
  • where d_t = √((1 − ε_t)/ε_t) and α_t = ln √((1 − ε_t)/ε_t)
• Both cases can be written as a single update:  u_{t+1}^n ← u_t^n × exp( −ŷ^n f_t(x^n) α_t )
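A compact from-scratch sketch of this algorithm in NumPy, using depth-1 decision trees (decision stumps) from scikit-learn as the weak classifiers; variable names follow the slide (u, eps, alpha) and the dataset is synthetic.

```python
# AdaBoost from scratch (illustrative sketch; assumes NumPy and scikit-learn).
import numpy as np
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
y = 2 * y - 1                       # labels in {+1, -1}

N, T = len(X), 20
u = np.ones(N)                      # u_1^n = 1 (equal initial weights)
classifiers, alphas = [], []

for t in range(T):
    # Weak classifier: a decision stump trained with the current weights.
    f_t = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=u)
    pred = f_t.predict(X)

    eps_t = u[pred != y].sum() / u.sum()            # weighted error rate
    alpha_t = np.log(np.sqrt((1 - eps_t) / eps_t))

    # Unified weight update: u_{t+1}^n = u_t^n * exp(-y^n f_t(x^n) alpha_t)
    u = u * np.exp(-y * pred * alpha_t)

    classifiers.append(f_t)
    alphas.append(alpha_t)

# Final classifier: H(x) = sign(sum_t alpha_t f_t(x))
g = sum(a * f.predict(X) for a, f in zip(alphas, classifiers))
print("training accuracy:", np.mean(np.sign(g) == y))
```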
Algorithm for AdaBoost
• We obtain a set of functions: f1(x), …, fT(x)
• How to aggregate them?
  • Uniform weight:  H(x) = sign( Σ_{t=1}^T f_t(x) )
  • Non-uniform weight:  H(x) = sign( Σ_{t=1}^T α_t f_t(x) ),  with α_t = ln √((1 − ε_t)/ε_t)
• A smaller error ε_t gives a larger weight α_t in the final voting:
  • ε_t = 0.1 ⇒ α_t = 1.10;  ε_t = 0.4 ⇒ α_t = 0.20
• (The weight update u_{t+1}^n = u_t^n × exp(−ŷ^n f_t(x^n) α_t) uses the same α_t.)
Toy Example (T = 3, weak classifier = decision stump)
• t = 1: train the first stump f1(x) on the equally weighted data.
  • ε1 = 0.30, d1 = 1.53, α1 = 0.42
  • Examples misclassified by f1 have their weights increased from 1.0 to 1.53 (×d1); correctly classified examples have their weights decreased from 1.0 to 0.65 (÷d1).
• [Figure: the +/− examples with their weights before and after re-weighting, and the decision boundary of f1(x).]
Toy Example (T = 3, weak classifier = decision stump)
• t = 2: train the second stump f2(x) on the data re-weighted by f1 (α1 = 0.42).
  • ε2 = 0.21, d2 = 1.94, α2 = 0.66
  • Examples misclassified by f2 have their weights multiplied by 1.94; the others are divided by 1.94 (e.g. 1.53 → 0.78, 0.65 → 0.33 or 1.26).
• [Figure: the example weights before and after re-weighting, and the decision boundary of f2(x).]
Toy Example (T = 3, weak classifier = decision stump)
• t = 3: train the third stump f3(x) on the data re-weighted by f1 and f2 (α1 = 0.42, α2 = 0.66).
  • ε3 = 0.13, d3 = 2.59, α3 = 0.95
• [Figure: the example weights (0.78, 0.33, 1.26, …) and the decision boundary of f3(x).]
Toy Example
• Final classifier:  H(x) = sign( 0.42 f1(x) + 0.66 f2(x) + 0.95 f3(x) )
• [Figure: the three stump boundaries combined; the weighted vote of the three stumps classifies all training examples (+/−) correctly.]
Warning of Math
• Claim: as we add more and more f_t (as T increases), H(x) achieves a smaller and smaller error rate on the training data, where

  H(x) = sign( Σ_{t=1}^T α_t f_t(x) ),   α_t = ln √((1 − ε_t)/ε_t)
Error Rate of Final Classifier
• Final classifier: H(x) = sign( g(x) ), where g(x) = Σ_{t=1}^T α_t f_t(x) and α_t = ln √((1 − ε_t)/ε_t)
• Training data error rate:

  (1/N) Σ_n δ( H(x^n) ≠ ŷ^n )
  = (1/N) Σ_n δ( ŷ^n g(x^n) < 0 )
  ≤ (1/N) Σ_n exp( −ŷ^n g(x^n) )

  (the exponential exp(−ŷ^n g(x^n)) is an upper bound of the 0/1 loss δ(ŷ^n g(x^n) < 0))
• We will show that this upper bound equals (1/N) Z_{T+1}.
• Z_{T+1}: the summation of the weights of the training data that would be used to train f_{T+1}:

  Z_{T+1} = Σ_n u_{T+1}^n

• Since u_1^n = 1 and u_{t+1}^n = u_t^n × exp( −ŷ^n f_t(x^n) α_t ),

  u_{T+1}^n = Π_{t=1}^T exp( −ŷ^n f_t(x^n) α_t )

• Therefore

  Z_{T+1} = Σ_n Π_{t=1}^T exp( −ŷ^n f_t(x^n) α_t )
          = Σ_n exp( −ŷ^n Σ_{t=1}^T α_t f_t(x^n) )
          = Σ_n exp( −ŷ^n g(x^n) )

  which is exactly N times the upper bound above, so the training error ≤ (1/N) Z_{T+1}.
• Training data error rate ≤ (1/N) Z_{T+1}, with g(x) = Σ_{t=1}^T α_t f_t(x) and α_t = ln √((1 − ε_t)/ε_t)
• How does Z_t evolve?  Z_1 = N (equal weights), and

  Z_{t+1} = Z_t ε_t exp(α_t) + Z_t (1 − ε_t) exp(−α_t)

  (the first term is the misclassified portion of Z_t, the second the correctly classified portion)

        = Z_t ε_t √((1 − ε_t)/ε_t) + Z_t (1 − ε_t) √(ε_t/(1 − ε_t))
        = Z_t × 2 √( ε_t (1 − ε_t) )

• Therefore  Z_{T+1} = N Π_{t=1}^T 2 √( ε_t (1 − ε_t) )
• Training data error rate ≤ Π_{t=1}^T 2 √( ε_t (1 − ε_t) ); each factor 2√(ε_t(1 − ε_t)) < 1 (since ε_t < 0.5), so the bound gets smaller and smaller as T increases.
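A quick numeric check of this bound (plain Python), plugging in the error rates from the toy example above (ε = 0.30, 0.21, 0.13); the product is an upper bound on the training error of H(x), not the error itself.

```python
# Numeric check of the training-error bound prod_t 2*sqrt(eps_t*(1-eps_t)).
import math

eps = [0.30, 0.21, 0.13]   # error rates from the toy example above
bound = 1.0
for e in eps:
    bound *= 2 * math.sqrt(e * (1 - e))
print(bound)  # ~0.50; the bound keeps shrinking toward 0 as T grows
```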
End of Warning
Large Margin?
• Even when the training error of H(x) is already 0, the testing error still decreases as T grows. Why?
• Define the margin of an example as ŷ g(x), where H(x) = sign( g(x) ).
• Training data error rate
  = (1/N) Σ_n δ( H(x^n) ≠ ŷ^n )
  ≤ (1/N) Σ_n exp( −ŷ^n g(x^n) )
  = Π_{t=1}^T 2 √( ε_t (1 − ε_t) )
• This upper bound keeps getting smaller as T increases even after the 0/1 training error reaches 0: Adaboost keeps pushing the margins ŷ^n g(x^n) to be larger.
• [Figure: the 0/1 loss and its upper bounds, the exponential loss (Adaboost), the logistic loss (logistic regression), and the hinge loss (SVM), plotted against the margin ŷ^n g(x^n).]
Experiment: Function of Miku
• [Figure: Adaboost + decision tree (depth = 5) with T = 10, 20, 50, and 100 trees.]
To learn more …
• Introduction of Adaboost:
  • Freund, Y.; Schapire, R. (1999). "A Short Introduction to Boosting".
• Multiclass/Regression:
  • Freund, Y.; Schapire, R. (1995). "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting".
  • Schapire, Robert E.; Singer, Yoram (1998). "Improved Boosting Algorithms Using Confidence-rated Predictions". In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, pages 80–91.
• Gentle Boost:
  • Schapire, Robert; Singer, Yoram (1999). "Improved Boosting Algorithms Using Confidence-rated Predictions".
General Formulation of Boosting
• Initial function: g_0(x) = 0
• For t = 1 to T:
  • Find a function f_t(x) and a weight α_t to improve g_{t−1}(x), where g_{t−1}(x) = Σ_{i=1}^{t−1} α_i f_i(x)
  • g_t(x) = g_{t−1}(x) + α_t f_t(x)
• Output: H(x) = sign( g_T(x) )
• What is the learning target of g(x)?
  • Minimize  L(g) = Σ_n l( ŷ^n, g(x^n) ) = Σ_n exp( −ŷ^n g(x^n) )   (here l is the exponential loss)
Gradient Boosting
• Find g(x) that minimizes L(g) = Σ_n exp( −ŷ^n g(x^n) ).
• If we already have g_{t−1}(x), how do we update it?
• Gradient descent in function space:

  g_t(x) = g_{t−1}(x) − η ∂L(g)/∂g(x) |_{g(x) = g_{t−1}(x)}

  where the negative gradient −∂L(g)/∂g(x) |_{g = g_{t−1}} corresponds to  Σ_n exp( −ŷ^n g_{t−1}(x^n) ) ŷ^n

• Boosting instead updates  g_t(x) = g_{t−1}(x) + α_t f_t(x); we want α_t f_t(x) to point in the same direction as the negative gradient.
Gradient Boosting
• We want f_t(x) to have the same direction as the negative gradient, i.e. to find the f_t(x) maximizing

  Σ_n exp( −ŷ^n g_{t−1}(x^n) ) ŷ^n f_t(x^n)

• So ŷ^n and f_t(x^n) should have the same sign on as many examples as possible, each example weighted by

  u_t^n = exp( −ŷ^n g_{t−1}(x^n) )
        = exp( −ŷ^n Σ_{i=1}^{t−1} α_i f_i(x^n) )
        = Π_{i=1}^{t−1} exp( −ŷ^n α_i f_i(x^n) )

• That is, f_t(x) minimizes the weighted error with the weights u_t^n, and these are exactly the weights we obtain in Adaboost.
Gradient Boosting
• Given f_t(x), find the α_t that minimizes L(g).
  • α_t plays the role of a learning rate, but instead of fixing it we optimize it (like a line search along f_t).
• g_t(x) = g_{t−1}(x) + α_t f_t(x)
• Find α_t minimizing

  L(g) = Σ_n exp( −ŷ^n ( g_{t−1}(x^n) + α_t f_t(x^n) ) )
       = Σ_n exp( −ŷ^n g_{t−1}(x^n) ) exp( −ŷ^n α_t f_t(x^n) )
       = Σ_{ŷ^n ≠ f_t(x^n)} exp( −ŷ^n g_{t−1}(x^n) ) exp( α_t )
         + Σ_{ŷ^n = f_t(x^n)} exp( −ŷ^n g_{t−1}(x^n) ) exp( −α_t )

• Setting ∂L(g)/∂α_t = 0 gives

  α_t = ln √( (1 − ε_t)/ε_t )

• This is exactly Adaboost: Adaboost is gradient boosting with the exponential loss!
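A short NumPy sketch of this exponential-loss formulation: it builds g_t(x) by adding stumps weighted by the closed-form α_t, and the per-example weights u = exp(−ŷ g_{t−1}) reproduce the AdaBoost sketch earlier. Names such as g_train and ensemble are just illustrative.

```python
# Gradient boosting with the exponential loss (illustrative sketch).
import numpy as np
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
y = 2 * y - 1                       # labels in {+1, -1}

g_train = np.zeros(len(X))          # g_0(x) = 0 on the training points
ensemble = []                       # list of (alpha_t, f_t)

for t in range(20):
    u = np.exp(-y * g_train)        # u_t^n = exp(-y^n g_{t-1}(x^n))
    f_t = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=u)
    pred = f_t.predict(X)

    eps_t = u[pred != y].sum() / u.sum()            # weighted error
    alpha_t = np.log(np.sqrt((1 - eps_t) / eps_t))  # optimal step size

    g_train += alpha_t * pred       # g_t(x) = g_{t-1}(x) + alpha_t f_t(x)
    ensemble.append((alpha_t, f_t))

print("training accuracy:", np.mean(np.sign(g_train) == y))
```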
Cool Demo
• http://arogozhnikov.github.io/2016/07/05/gradient_boosting_playground.html
Ensemble: Stacking
Voting
• Four systems from different people: Xiao-Ming's system, Lao-Wang's system, Lao-Li's system, and Xiao-Mao's system, each mapping x to a prediction y.
• Feed x into all four systems and take the majority vote of their outputs.

Stacking
• Split the data into: training data for the four systems, separate training data for the final classifier, validation data, and testing data.
• Feed x into the four systems and treat their outputs y as a new feature vector.
• Train a final classifier on this new feature vector, using data the four front-end systems did not see, so it can learn how much to trust each system.
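A minimal stacking sketch, assuming scikit-learn: three different front-end classifiers stand in for the individual systems, their predictions on a held-out split become the new features, and a logistic regression serves as the final classifier.

```python
# Minimal stacking sketch (illustrative; assumes scikit-learn and NumPy).
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=1000, noise=0.3, random_state=0)
# Front-end systems and the final classifier are trained on different splits.
X_front, X_stack, y_front, y_stack = train_test_split(
    X, y, test_size=0.5, random_state=0)

# The individual "systems" (stand-ins for each person's classifier).
systems = [
    DecisionTreeClassifier(max_depth=3).fit(X_front, y_front),
    KNeighborsClassifier(n_neighbors=5).fit(X_front, y_front),
    LogisticRegression().fit(X_front, y_front),
]

# Their outputs become the new features for the final classifier.
new_features = np.column_stack([s.predict(X_stack) for s in systems])
final_clf = LogisticRegression().fit(new_features, y_stack)

# At test time, run the systems first, then the final classifier.
x_test = X[:5]
test_features = np.column_stack([s.predict(x_test) for s in systems])
print(final_clf.predict(test_features))
```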
Happy New Year 2017! (新年快樂)