
Ensemble

Introduction
• We are almost at the end of the semester and the final competition.
• https://inclass.kaggle.com/c/ml2016-cyber-security-attack-defender/leaderboard
• https://www.kaggle.com/c/outbrain-click-prediction/leaderboard
• https://www.kaggle.com/c/transfer-learning-on-stack-exchange-tags/leaderboard
• You have already developed some algorithms and code, and may not want to modify them much.
• Ensemble: improving your machine with little modification.
Framework of Ensemble
• Get a set of classifiers f_1(x), f_2(x), f_3(x), ……
  • They should be diverse.
• Aggregate the classifiers (properly).
  • Like a boss fight in an online game: every team member has their own position to hold.
Ensemble: Bagging
Review: Bias v.s. Variance
• The error we observe comes from two sources: bias and variance.
• Underfitting: large bias, small variance. Overfitting: small bias, large variance.
• A complex model trained on different training sets (Universe 1, Universe 2, Universe 3, …) gives very different functions $f^*$: it has large variance.
• If we average all the $f^*$, is the result close to $\hat{f}$?  $E[f^*] = \hat{f}$
• We can average complex models to reduce variance.
Bagging
• Sample N' examples from the N training examples with replacement (usually N' = N).
• Repeat to obtain Set 1, Set 2, Set 3, Set 4, and train Function 1, Function 2, Function 3, Function 4 on them separately.
• This approach is helpful when your model is complex and easy to overfit, e.g. a decision tree.

Bagging (testing)
• Feed the testing data x into Function 1, Function 2, Function 3, Function 4 to obtain y1, y2, y3, y4.
• Combine the outputs by averaging (regression) or voting (classification).
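As an illustration only (not code from the lecture), here is a minimal bagging sketch in Python: bootstrap the training set, fit one complex model per bootstrap sample, and combine the predictions by voting. The helper names and the use of scikit-learn decision trees are my own choices; X and y are assumed to be numpy arrays with labels in {-1, +1}.

```python
# Minimal bagging sketch (labels assumed to be -1/+1; names are illustrative).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_models=4, seed=0):
    rng = np.random.default_rng(seed)
    N = len(X)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, N, size=N)      # sample N' = N examples with replacement
        tree = DecisionTreeClassifier()       # complex base model, easy to overfit
        tree.fit(X[idx], y[idx])
        models.append(tree)
    return models

def bagging_predict(models, X):
    votes = np.stack([m.predict(X) for m in models])   # shape: (n_models, n_test)
    return np.sign(votes.sum(axis=0))                  # majority vote (use the mean for regression)
```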
Decision Tree
• Assume each object x is represented by a 2-dim vector (x1, x2).
• Example tree: at the root ask "x1 < 0.5?". If yes, ask "x2 < 0.3?" (yes → Class 1, no → Class 2); if no, ask "x2 < 0.7?" (yes → Class 2, no → Class 1). The resulting decision boundaries are the lines x1 = 0.5, x2 = 0.3, and x2 = 0.7.
• Choices made in training: the number of branches, the branching criteria, …
• The questions can be more complex than single-feature thresholds.
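For concreteness, the small tree in the figure can be written out directly as nested questions. This sketch is my own transcription of the figure, reading the leaf labels left to right as Class 1, Class 2, Class 2, Class 1.

```python
# The depth-2 tree from the figure, transcribed as nested if/else questions.
def tree_predict(x1, x2):
    if x1 < 0.5:
        return 1 if x2 < 0.3 else 2   # left subtree: x2 < 0.3?
    else:
        return 2 if x2 < 0.7 else 1   # right subtree: x2 < 0.7?
```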
Experiment: Function of Miku
• http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/theano/miku
• (1st column: x, 2nd column: y, 3rd column: output (1 or 0))
Experiment: Function of Miku
• Single decision tree; results shown for depth = 5, 10, 15, and 20.
Random Forest
• Decision tree: easy to achieve 0% error rate on the training data (e.g. if each training example has its own leaf).
• Random forest: bagging of decision trees.
  • Resampling the training data alone is not sufficient.
  • Also randomly restrict the features/questions used in each split.
• Out-of-bag (OOB) validation for bagging. Which examples were sampled into which function's training set (O: used, X: not used):

  train   f1   f2   f3   f4
  x1      O    X    O    X
  x2      O    X    X    O
  x3      X    O    O    X
  x4      X    O    X    O

  • Use RF = f2 + f4 to test x1.
  • Use RF = f2 + f3 to test x2.
  • Use RF = f1 + f4 to test x3.
  • Use RF = f1 + f3 to test x4.
• The resulting out-of-bag (OOB) error is a good error estimation of the testing set (a library-based sketch follows below).
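A library-based sketch (one possible implementation, not the lecture's code): scikit-learn's RandomForestClassifier combines bagging with random feature restriction at each split and can report the out-of-bag score directly. The names X_train, y_train and the hyperparameter values are illustrative.

```python
# Random forest = bagging of decision trees + random feature restriction per split.
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(
    n_estimators=100,      # number of trees (100, as in the Miku experiment)
    max_depth=10,          # illustrative depth
    max_features="sqrt",   # randomly restrict the features considered at each split
    bootstrap=True,        # resample the training data for every tree
    oob_score=True,        # score each example only with trees that never saw it
)
# forest.fit(X_train, y_train)     # X_train, y_train: your training set
# print(forest.oob_score_)         # OOB accuracy: a good estimate of test accuracy
```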
Experiment: Function of Miku
• Random forest (100 trees); results shown for depth = 5, 10, 15, and 20.
Ensemble: Boosting
Boosting
Improving Weak Classifiers
• Training data: $\{(x^1, \hat{y}^1), \dots, (x^N, \hat{y}^N)\}$, $\hat{y} = \pm 1$ (binary classification)
• Guarantee:
  • If your ML algorithm can produce a classifier with error rate smaller than 50% on the training data,
  • you can obtain a 0% error rate classifier after boosting.
• Framework of boosting:
  • Obtain the first classifier f_1(x).
  • Find another function f_2(x) to help f_1(x).
    • However, if f_2(x) is similar to f_1(x), it will not help a lot.
    • We want f_2(x) to be complementary to f_1(x). (How?)
  • Obtain the second classifier f_2(x).
  • …… Finally, combine all the classifiers.
• The classifiers are learned sequentially.
How to obtain different classifiers?
• Train on different training data sets.
• How to have different training data sets:
  • Re-sampling your training data to form a new set
  • Re-weighting your training data to form a new set
• In a real implementation, you only have to change the cost/objective function (see the sketch below):
  • Unweighted: $L(f) = \sum_n l\big(f(x^n), \hat{y}^n\big)$
  • Weighted: $L(f) = \sum_n u^n\, l\big(f(x^n), \hat{y}^n\big)$
  • Example: three examples $(x^1, \hat{y}^1, u^1), (x^2, \hat{y}^2, u^2), (x^3, \hat{y}^3, u^3)$ start with weights $u^1 = u^2 = u^3 = 1$ and can be re-weighted to, say, $u^1 = 0.4$, $u^2 = 2.1$, $u^3 = 0.7$.
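To make the "re-weighting only changes the objective" point concrete, here is a tiny sketch (my own illustration, not the lecture's code): the weighted loss is just the plain loss with each term multiplied by $u^n$, and most libraries expose this directly as per-example weights.

```python
# Weighted objective: L(f) = sum_n u^n * l(f(x^n), y^n).
import numpy as np

def weighted_loss(per_example_losses, u):
    # per_example_losses[n] = l(f(x^n), y^n); u[n] = weight of example n
    return np.sum(u * per_example_losses)

# Equivalently, pass the weights to the training routine, e.g. in scikit-learn:
# clf.fit(X, y, sample_weight=u)
```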
Idea of Adaboost
• Idea: train f_2(x) on a new training set on which f_1(x) fails.
• How to find such a training set?
• $\varepsilon_1$: the error rate of f_1(x) on its training data,

$$\varepsilon_1 = \frac{\sum_n u_1^n\, \delta\big(f_1(x^n) \neq \hat{y}^n\big)}{Z_1}, \qquad Z_1 = \sum_n u_1^n, \qquad \varepsilon_1 < 0.5$$

• Change the example weights from $u_1^n$ to $u_2^n$ such that

$$\frac{\sum_n u_2^n\, \delta\big(f_1(x^n) \neq \hat{y}^n\big)}{Z_2} = 0.5$$

  The performance of f_1 under the new weights would be random (like random guessing).
• Train f_2(x) based on the new weights $u_2^n$.


Re-weighting Training Data
• Idea: train f_2(x) on a new training set on which f_1(x) fails.
• Toy example with four training examples $(x^1, \hat{y}^1, u^1), \dots, (x^4, \hat{y}^4, u^4)$, all starting with weight $u^n = 1$:
  • f_1(x) classifies x^1, x^3, x^4 correctly and misclassifies x^2, so $\varepsilon_1 = 0.25$.
  • Re-weight: the misclassified example gets a larger weight, $u^2: 1 \to \sqrt{3}$; the correctly classified ones get smaller weights, $u^1, u^3, u^4: 1 \to 1/\sqrt{3}$.
  • With the new weights, the error rate of f_1(x) becomes 0.5; f_2(x) is then trained on the re-weighted data.
Re-weighting Training Data
• If x^n is misclassified by f_1, i.e. $f_1(x^n) \neq \hat{y}^n$: multiply its weight by $d_1$ — the weight increases.
• If x^n is correctly classified by f_1, i.e. $f_1(x^n) = \hat{y}^n$: divide its weight by $d_1$ — the weight decreases.
• f_2(x) will be learned based on the new example weights $u_2^n$.
• What is the value of $d_1$?
Re-weighting Training Data
• Recall $\varepsilon_1 = \dfrac{\sum_n u_1^n\, \delta\big(f_1(x^n) \neq \hat{y}^n\big)}{Z_1}$ with $Z_1 = \sum_n u_1^n$, and we want the new weights $u_2^n$ to satisfy

$$\frac{\sum_n u_2^n\, \delta\big(f_1(x^n) \neq \hat{y}^n\big)}{Z_2} = 0.5$$

  where $u_2^n = u_1^n d_1$ if $f_1(x^n) \neq \hat{y}^n$ (multiplied by $d_1$) and $u_2^n = u_1^n / d_1$ if $f_1(x^n) = \hat{y}^n$ (divided by $d_1$).
• The numerator is $\sum_{f_1(x^n) \neq \hat{y}^n} u_1^n d_1$, and the denominator is

$$Z_2 = \sum_n u_2^n = \sum_{f_1(x^n) \neq \hat{y}^n} u_1^n d_1 + \sum_{f_1(x^n) = \hat{y}^n} u_1^n / d_1$$

• The condition therefore becomes

$$\frac{\sum_{f_1(x^n) \neq \hat{y}^n} u_1^n d_1}{\sum_{f_1(x^n) \neq \hat{y}^n} u_1^n d_1 + \sum_{f_1(x^n) = \hat{y}^n} u_1^n / d_1} = 0.5$$

  which is equivalent to

$$\sum_{f_1(x^n) = \hat{y}^n} u_1^n / d_1 = \sum_{f_1(x^n) \neq \hat{y}^n} u_1^n d_1
\quad\Longleftrightarrow\quad
\frac{1}{d_1} \sum_{f_1(x^n) = \hat{y}^n} u_1^n = d_1 \sum_{f_1(x^n) \neq \hat{y}^n} u_1^n$$

• Since $\sum_{f_1(x^n) \neq \hat{y}^n} u_1^n = Z_1 \varepsilon_1$ and $\sum_{f_1(x^n) = \hat{y}^n} u_1^n = Z_1 (1 - \varepsilon_1)$, this gives

$$Z_1 (1 - \varepsilon_1) / d_1 = Z_1 \varepsilon_1 d_1
\quad\Longrightarrow\quad
d_1 = \sqrt{(1 - \varepsilon_1)/\varepsilon_1} > 1$$

  (greater than 1 because $\varepsilon_1 < 0.5$).
Algorithm for AdaBoost
• Given training data $\{(x^1, \hat{y}^1, u_1^1), \dots, (x^N, \hat{y}^N, u_1^N)\}$, $\hat{y}^n \in \{+1, -1\}$ (binary classification), $u_1^n = 1$ (equal initial weights).
• For t = 1, …, T:
  • Train weak classifier f_t(x) with weights $\{u_t^1, \dots, u_t^N\}$.
  • $\varepsilon_t$ is the error rate of f_t(x) with weights $\{u_t^n\}$.
  • For n = 1, …, N:
    • If x^n is misclassified by f_t, i.e. $\hat{y}^n \neq f_t(x^n)$: $u_{t+1}^n = u_t^n \times d_t = u_t^n \times \exp(\alpha_t)$
    • Else: $u_{t+1}^n = u_t^n / d_t = u_t^n \times \exp(-\alpha_t)$
    • where $d_t = \sqrt{(1 - \varepsilon_t)/\varepsilon_t}$ and $\alpha_t = \ln \sqrt{(1 - \varepsilon_t)/\varepsilon_t}$.
  • Both cases can be written as $u_{t+1}^n \leftarrow u_t^n \times \exp\big(-\hat{y}^n f_t(x^n)\, \alpha_t\big)$.
Algorithm for AdaBoost
• We obtain a set of functions f_1(x), …, f_T(x).
• How to aggregate them?
  • Uniform weight: $H(x) = \mathrm{sign}\left(\sum_{t=1}^T f_t(x)\right)$
  • Non-uniform weight: $H(x) = \mathrm{sign}\left(\sum_{t=1}^T \alpha_t f_t(x)\right)$ with $\alpha_t = \ln \sqrt{(1 - \varepsilon_t)/\varepsilon_t}$
    • A smaller error $\varepsilon_t$ gives a larger weight $\alpha_t$ for the final voting: e.g. $\varepsilon_t = 0.1 \Rightarrow \alpha_t = 1.10$, while $\varepsilon_t = 0.4 \Rightarrow \alpha_t = 0.20$.
• A code sketch of the full algorithm follows below.
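A compact sketch of the algorithm from the two slides above (my own code, not the lecture's), using depth-1 decision trees (stumps) as the weak classifiers. It assumes X and y are numpy arrays with labels in {-1, +1} and that every weak classifier achieves an error rate strictly between 0 and 0.5.

```python
# AdaBoost sketch: weighted weak learners + weighted majority vote (labels in {-1, +1}).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, y, T=3):
    N = len(X)
    u = np.ones(N)                                     # u_1^n = 1 (equal initial weights)
    classifiers, alphas = [], []
    for t in range(T):
        f = DecisionTreeClassifier(max_depth=1)        # decision stump
        f.fit(X, y, sample_weight=u)                   # train with weights u_t
        pred = f.predict(X)
        eps = np.sum(u * (pred != y)) / np.sum(u)      # weighted error rate eps_t
        alpha = np.log(np.sqrt((1 - eps) / eps))       # alpha_t = ln sqrt((1 - eps_t)/eps_t)
        u = u * np.exp(-y * pred * alpha)              # u_{t+1}^n = u_t^n exp(-y^n f_t(x^n) alpha_t)
        classifiers.append(f)
        alphas.append(alpha)
    return classifiers, alphas

def adaboost_predict(classifiers, alphas, X):
    g = sum(a * f.predict(X) for f, a in zip(classifiers, alphas))
    return np.sign(g)                                  # H(x) = sign(sum_t alpha_t f_t(x))
```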
Toy Example (T = 3, weak classifier = decision stump)
• t = 1: all ten examples start with weight 1.0. f_1(x) misclassifies three of them, so $\varepsilon_1 = 0.30$, $d_1 = 1.53$, $\alpha_1 = 0.42$. The weights of the misclassified examples become 1.0 × 1.53 = 1.53; the weights of the correctly classified ones become 1.0 / 1.53 = 0.65.
• t = 2: with the new weights, f_2(x) has $\varepsilon_2 = 0.21$, $d_2 = 1.94$, $\alpha_2 = 0.66$. Misclassified examples have their weights multiplied by 1.94 (e.g. 0.65 → 1.26); correctly classified ones are divided by 1.94 (e.g. 1.53 → 0.78, 0.65 → 0.33).
• t = 3: with the updated weights, f_3(x) has $\varepsilon_3 = 0.13$, $d_3 = 2.59$, $\alpha_3 = 0.95$.
• Final classifier: $H(x) = \mathrm{sign}\big(0.42\, f_1(x) + 0.66\, f_2(x) + 0.95\, f_3(x)\big)$, which classifies all the training examples correctly even though each individual stump does not.
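The $d_t$ and $\alpha_t$ values in the toy example follow directly from the $\varepsilon_t$ values. A quick check (an illustrative script, not part of the lecture):

```python
# Recomputing the toy example's d_t and alpha_t from its error rates.
import numpy as np

for t, eps in enumerate([0.30, 0.21, 0.13], start=1):
    d = np.sqrt((1 - eps) / eps)      # d_t = sqrt((1 - eps_t)/eps_t)
    alpha = np.log(d)                 # alpha_t = ln d_t
    print(f"t={t}: d_t={d:.2f}, alpha_t={alpha:.2f}")
# t=1: d_t=1.53, alpha_t=0.42
# t=2: d_t=1.94, alpha_t=0.66
# t=3: d_t=2.59, alpha_t=0.95
```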
Warning of Math

$$H(x) = \mathrm{sign}\left(\sum_{t=1}^T \alpha_t f_t(x)\right), \qquad \alpha_t = \ln \sqrt{(1 - \varepsilon_t)/\varepsilon_t}$$

As we add more and more classifiers f_t(x) (as T increases), H(x) achieves a smaller and smaller error rate on the training data.
Error Rate of Final Classifier
• Final classifier: $H(x) = \mathrm{sign}(g(x))$ with $g(x) = \sum_{t=1}^T \alpha_t f_t(x)$ and $\alpha_t = \ln \sqrt{(1 - \varepsilon_t)/\varepsilon_t}$.
• Training data error rate:

$$\frac{1}{N} \sum_n \delta\big(H(x^n) \neq \hat{y}^n\big)
= \frac{1}{N} \sum_n \delta\big(\hat{y}^n g(x^n) < 0\big)
\le \frac{1}{N} \sum_n \exp\big(-\hat{y}^n g(x^n)\big)$$

  (the exponential upper-bounds the 0/1 loss as a function of $\hat{y}^n g(x^n)$).
• Claim: $\frac{1}{N} \sum_n \exp\big(-\hat{y}^n g(x^n)\big) = \frac{1}{N} Z_{T+1}$, where $Z_{T+1} = \sum_n u_{T+1}^n$ is the summation of the weights of the training data that would be used to train $f_{T+1}$.
• What is $Z_{T+1}$? Since $u_1^n = 1$ and $u_{t+1}^n = u_t^n \times \exp\big(-\hat{y}^n f_t(x^n)\, \alpha_t\big)$,

$$u_{T+1}^n = \prod_{t=1}^T \exp\big(-\hat{y}^n f_t(x^n)\, \alpha_t\big)
= \exp\Big(-\hat{y}^n \sum_{t=1}^T \alpha_t f_t(x^n)\Big)
= \exp\big(-\hat{y}^n g(x^n)\big)$$

  so $Z_{T+1} = \sum_n \exp\big(-\hat{y}^n g(x^n)\big)$, which proves the claim.
• Now compute $Z_{T+1}$ recursively. $Z_1 = N$ (equal weights), and

$$Z_{t+1} = Z_t\, \varepsilon_t \exp(\alpha_t) + Z_t (1 - \varepsilon_t) \exp(-\alpha_t)$$

  (the misclassified portion of $Z_t$ is multiplied by $\exp(\alpha_t)$, the correctly classified portion by $\exp(-\alpha_t)$). Plugging in $\exp(\alpha_t) = \sqrt{(1 - \varepsilon_t)/\varepsilon_t}$:

$$Z_{t+1} = Z_t\, \varepsilon_t \sqrt{(1 - \varepsilon_t)/\varepsilon_t} + Z_t (1 - \varepsilon_t) \sqrt{\varepsilon_t / (1 - \varepsilon_t)}
= Z_t \times 2 \sqrt{\varepsilon_t (1 - \varepsilon_t)}$$

• Therefore $Z_{T+1} = N \prod_{t=1}^T 2 \sqrt{\varepsilon_t (1 - \varepsilon_t)}$, and

$$\text{Training data error rate} \le \prod_{t=1}^T 2 \sqrt{\varepsilon_t (1 - \varepsilon_t)} < 1$$

  Each factor is smaller than 1 (because $\varepsilon_t < 0.5$), so the bound gets smaller and smaller as T increases.
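As a quick numeric check of the bound (an illustrative script, using the error rates from the toy example): after three rounds the bound on the training error rate is already about 0.50, and every additional round with $\varepsilon_t < 0.5$ multiplies it by a factor smaller than 1.

```python
# Training error rate of H(x) <= prod_t 2*sqrt(eps_t * (1 - eps_t)).
import numpy as np

eps = np.array([0.30, 0.21, 0.13])             # toy example error rates
bound = np.prod(2 * np.sqrt(eps * (1 - eps)))
print(round(bound, 2))                         # ~0.50 after T = 3 rounds
```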
End of Warning
Large Margin?
• Even though the training error is already 0, the testing error can still decrease as T grows. Why?
• Define the margin of an example as $\hat{y}\, g(x)$. AdaBoost keeps increasing $\hat{y}^n g(x^n)$ (pushing examples further from the decision boundary of $H(x)$) even after the training error reaches 0.
• Training data error rate

$$= \frac{1}{N} \sum_n \delta\big(H(x^n) \neq \hat{y}^n\big)
\le \frac{1}{N} \sum_n \exp\big(-\hat{y}^n g(x^n)\big)
= \prod_{t=1}^T 2 \sqrt{\epsilon_t (1 - \epsilon_t)}$$

  and the upper bound keeps getting smaller and smaller as T increases.
• Plotted as a function of $\hat{y}^n g(x^n)$, AdaBoost's exponential loss upper-bounds the 0/1 loss, just like the losses used by logistic regression and SVM.
Experiment: Function of Miku
• Adaboost + decision tree (depth = 5); results shown for T = 10, 20, 50, and 100.
To learn more …
• Introduction of Adaboost:
  • Freund, Y.; Schapire, R. (1999). "A Short Introduction to Boosting".
• Multiclass/Regression:
  • Y. Freund and R. Schapire, "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting", 1995.
  • Robert E. Schapire and Yoram Singer. "Improved Boosting Algorithms Using Confidence-rated Predictions". In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, pages 80–91, 1998.
• Gentle Boost:
  • Schapire, Robert; Singer, Yoram (1999). "Improved Boosting Algorithms Using Confidence-rated Predictions".
General Formulation of Boosting
• Initial function: $g_0(x) = 0$.
• For t = 1 to T:
  • Find a function $f_t(x)$ and a weight $\alpha_t$ to improve $g_{t-1}(x) = \sum_{i=1}^{t-1} \alpha_i f_i(x)$, and set $g_t(x) = g_{t-1}(x) + \alpha_t f_t(x)$.
• Output: $H(x) = \mathrm{sign}\big(g_T(x)\big)$.
• What is the learning target of g(x)? Minimize

$$L(g) = \sum_n l\big(\hat{y}^n, g(x^n)\big) = \sum_n \exp\big(-\hat{y}^n g(x^n)\big)$$
Gradient Boosting
• Find g minimizing L(g). If we already have $g_{t-1}(x)$, how do we update it?
• Gradient descent (on the function g):

$$g_t(x) = g_{t-1}(x) - \eta\, \frac{\partial L(g)}{\partial g(x)} \bigg|_{g(x) = g_{t-1}(x)}$$

  where the negative gradient, evaluated at the training points, is $-\dfrac{\partial L(g)}{\partial g(x)}\bigg|_{g(x)=g_{t-1}(x)} = \sum_n \exp\big(-\hat{y}^n g_{t-1}(x^n)\big)\, \hat{y}^n$.
• Boosting instead updates $g_t(x) = g_{t-1}(x) + \alpha_t f_t(x)$; we want $\alpha_t f_t(x)$ to point in the same direction as the gradient-descent update.
Gradient Boosting
• We want $f_t(x)$ to have the same direction as $\sum_n \exp\big(-\hat{y}^n g_{t-1}(x^n)\big)\, \hat{y}^n$, i.e. we want to find the $f_t(x)$ maximizing

$$\sum_n \exp\big(-\hat{y}^n g_{t-1}(x^n)\big)\, \hat{y}^n f_t(x^n)$$

  Each term wants $f_t(x^n)$ to have the same sign as $\hat{y}^n$, weighted by the example weight $\exp\big(-\hat{y}^n g_{t-1}(x^n)\big)$ — so maximizing this is minimizing the weighted error of $f_t$.
• These example weights are

$$u_t^n = \exp\big(-\hat{y}^n g_{t-1}(x^n)\big)
= \exp\Big(-\hat{y}^n \sum_{i=1}^{t-1} \alpha_i f_i(x^n)\Big)
= \prod_{i=1}^{t-1} \exp\big(-\hat{y}^n \alpha_i f_i(x^n)\big)$$

  — exactly the weights we obtain in Adaboost.
Gradient Boosting
• Find $\alpha_t$ minimizing L(g); $\alpha_t$ is something like a learning rate, with $g_t(x) = g_{t-1}(x) + \alpha_t f_t(x)$.
• After $f_t(x)$ is found, choose $\alpha_t$ by minimizing

$$L(g) = \sum_n \exp\Big(-\hat{y}^n \big(g_{t-1}(x^n) + \alpha_t f_t(x^n)\big)\Big)
= \sum_n \exp\big(-\hat{y}^n g_{t-1}(x^n)\big) \exp\big(-\hat{y}^n \alpha_t f_t(x^n)\big)$$

$$= \sum_{\hat{y}^n \neq f_t(x^n)} \exp\big(-\hat{y}^n g_{t-1}(x^n)\big) \exp(\alpha_t)
+ \sum_{\hat{y}^n = f_t(x^n)} \exp\big(-\hat{y}^n g_{t-1}(x^n)\big) \exp(-\alpha_t)$$

• Find the $\alpha_t$ such that $\dfrac{\partial L(g)}{\partial \alpha_t} = 0$, which gives $\alpha_t = \ln \sqrt{(1 - \varepsilon_t)/\varepsilon_t}$ — exactly Adaboost!
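Since gradient boosting on the exponential loss recovers AdaBoost, one way to play with this connection in practice (a sketch with illustrative hyperparameters and data names, not the lecture's code) is scikit-learn's GradientBoostingClassifier, whose documentation notes that the "exponential" loss recovers AdaBoost:

```python
# Gradient boosting on the exponential loss exp(-y * g(x)), with stumps as weak learners.
from sklearn.ensemble import GradientBoostingClassifier

gb = GradientBoostingClassifier(
    loss="exponential",   # the loss used in the derivation above
    n_estimators=100,     # T, the number of boosting rounds
    learning_rate=1.0,    # fixed step size; AdaBoost instead solves for alpha_t exactly
    max_depth=1,          # depth-1 trees, i.e. decision stumps
)
# gb.fit(X_train, y_train)   # X_train, y_train: your (binary) training set
```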
Cool Demo
• http://arogozhnikov.github.io/2016/07/05/
gradient_boosting_playground.html
Ensemble: Stacking
Voting
• Four people's systems (Xiao-Ming's, Lao-Wang's, Lao-Li's, and Xiao-Mao's) each take the input x and output their own prediction y.
• The final answer is obtained by a majority vote over the four outputs.
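A one-line sketch of the vote (my own illustration; it assumes the systems output labels in {-1, +1}):

```python
# Majority vote over several systems' predictions.
import numpy as np

def majority_vote(predictions):
    # predictions: array of shape (n_systems, n_examples), entries in {-1, +1}
    return np.sign(np.sum(predictions, axis=0))
```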
Stacking
• Split the data into training data (for the four systems), more training data (for the final classifier), validation data, and testing data.
• The input x is fed into the four systems (Xiao-Ming's, Lao-Wang's, Lao-Li's, and Xiao-Mao's); their outputs y are treated as new features.
• A final classifier takes these new features and produces the final prediction; it is trained on data that was not used to train the four front-end systems.
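A minimal stacking sketch (my own illustration; the split names, the helper function, and the choice of logistic regression as the final classifier are all assumptions): the front-end systems are trained on one part of the training data, and the final classifier is trained on their outputs on a different part, so it can learn how much to trust each system.

```python
# Stacking sketch: the systems' outputs become the features of a final classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

def stack_features(systems, X):
    # One new feature per system: its prediction on each example.
    return np.column_stack([s.predict(X) for s in systems])

# systems = [...]                       # already trained on the first training split
# final_clf = LogisticRegression()
# final_clf.fit(stack_features(systems, X_holdout), y_holdout)    # second split
# y_pred = final_clf.predict(stack_features(systems, X_test))     # testing data
```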
2017
Happy New Year
