Chapter 4: Machine Learning, Part 3
´ Linear Regression
´ Logistic Regression
´ Naïve Bayes Classifier
´ Decision Tree
´ Random Forest
Decision Tree
Student   'A' last year?  Black hair?  Works hard?  Drinks?  'A' this year?
Richard   Yes             Yes          No           Yes      No
Alan      Yes             Yes          Yes          No       Yes
Alison    No              No           Yes          No       No
Jeff      No              Yes          No           Yes      No
Gail      Yes             No           Yes          Yes      Yes
Simon     No              Yes          Yes          Yes      No

[Figure: a decision tree for this table. The root tests 'A' last year?; its no branch gives Output = No, and its yes branch leads to a second test, Works hard?, with yes/no branches.]
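To make the figure concrete, here is a minimal Python sketch (the variable names are mine, not from the slides) that encodes the table and the two-test tree and checks that the tree reproduces every training label:

```python
students = [
    # (name, a_last_year, black_hair, works_hard, drinks, a_this_year)
    ("Richard", True,  True,  False, True,  False),
    ("Alan",    True,  True,  True,  False, True),
    ("Alison",  False, False, True,  False, False),
    ("Jeff",    False, True,  False, True,  False),
    ("Gail",    True,  False, True,  True,  True),
    ("Simon",   False, True,  True,  True,  False),
]

def predict(a_last_year, works_hard):
    """The two-test tree from the figure: 'A' last year?, then Works hard?"""
    if not a_last_year:   # no branch of the root: Output = No
        return False
    return works_hard     # yes branch: the second test decides

for name, a_last, _, hard, _, label in students:
    assert predict(a_last, hard) == label, name
print("The two-test tree fits all six training examples.")
```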
´ In other words… always favor the simplest answer that correctly fits
the training data
´ i.e. the smallest tree on average
´ This type of assumption is called inductive bias
´ inductive bias = making a choice beyond what the training
instances contain
Finding the Best Tree
[Figure: the space of decision trees being searched: starting from the empty tree (just a class label), then a one-test tree for each feature at the root (F1?, F2?, ..., F7?), each of whose branches may end in a class or be extended with further tests (F2?, F3?, ...).]
´ Patron:
´ If value is Some… all outputs=Yes
´ If value is None… all outputs=No
´ If value is Full… we need more tests
´ Type:
´ If value is French… we need more tests
´ If value is Italian… we need more tests
´ If value is Thai… we need more tests
´ If value is Burger… we need more tests
´ …
´ So Patron may lead to a shorter tree…
Next Feature
´ 4 tests instead of 9
´ 11 branches instead of 21
Choosing the Next Attribute
[Figure: entropy of a coin flip plotted as a function of P(head).]
Choosing the Best Feature
H(S | Shape) = (3/4)(0.918) + (1/4)(0) = 0.6885

gain(Shape) = H(S) - H(S | Shape) = 1 - 0.6885 = 0.3115
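These numbers are easy to verify; a small Python sketch (the helper is mine, not the slides'), using the standard entropy definition:

```python
import math

def entropy(probs):
    """H(p1, ..., pk) = -sum p * log2(p), with 0 * log2(0) taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# S: 4 examples, 2 positive and 2 negative  ->  H(S) = 1 bit
H_S = entropy([2/4, 2/4])

# Splitting on Shape: circles = {+, +, -}, squares = {-}
H_circle = entropy([2/3, 1/3])   # ~0.918
H_square = entropy([1.0])        # 0
H_S_given_shape = (3/4) * H_circle + (1/4) * H_square

# The slides round 0.918 first, giving 0.6885 and 0.3115.
print(round(H_S_given_shape, 4))        # 0.6887
print(round(H_S - H_S_given_shape, 4))  # gain(Shape) ~ 0.3113
```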
A Small Example 3

Size   Color  Shape   Output
Big    Red    Circle  +
Small  Red    Circle  +
Small  Red    Square  -
Big    Blue   Circle  -

H(S) = -( (2/4) log2(2/4) + (2/4) log2(2/4) ) = 1

H(S | Size) = (1/2)(1) + (1/2)(1) = 1

gain(Size) = H(S) - H(S | Size) = 1 - 1 = 0
A Small Example 4

Size   Color  Shape   Output
Big    Red    Circle  +
Small  Red    Circle  +
Small  Red    Square  -
Big    Blue   Circle  -

gain(Shape) = 0.3115
gain(Color) = 0.3115
gain(Size)  = 0

´ So first separate according to either Color or Shape
(root of the tree)
[Figure: choosing Color as the root. The blue branch is pure (-), while the red branch holds the subset S2 = {Big Red Circle +, Small Red Circle +, Small Red Square -}, which still needs a test: Size? or Shape?]

H(S2) = -( (2/3) log2(2/3) + (1/3) log2(1/3) ) = 0.918

For each value v of Values(Size):
H(S2 | Size = big)   = H(1/1, 0/1) = 0
H(S2 | Size = small) = H(1/2, 1/2) = 1
H(S2 | Size) = (1/3)(0) + (2/3)(1) = 2/3

For each value v of Values(Shape):
H(S2 | Shape = circle) = H(2/2, 0/2) = 0
H(S2 | Shape = square) = H(0/1, 1/1) = 0
H(S2 | Shape) = (2/3)(0) + (1/3)(0) = 0

gain(Size)  = H(S2) - H(S2 | Size)  = 0.918 - 2/3 ≈ 0.252
gain(Shape) = H(S2) - H(S2 | Shape) = 0.918 - 0 = 0.918
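A quick check of the S2 computation in Python (a sketch; the helper below is mine):

```python
import math

def entropy(pos, neg):
    """Binary entropy of a (pos, neg) split, with 0 * log2(0) = 0."""
    total = pos + neg
    return -sum(c / total * math.log2(c / total)
                for c in (pos, neg) if c > 0)

# S2 = the red branch: {Big Circle +, Small Circle +, Small Square -}
H_S2 = entropy(2, 1)                                        # ~0.918

# Split S2 on Size: big = {+}, small = {+, -}
H_S2_size = (1/3) * entropy(1, 0) + (2/3) * entropy(1, 1)   # 2/3

# Split S2 on Shape: circle = {+, +}, square = {-}
H_S2_shape = (2/3) * entropy(2, 0) + (1/3) * entropy(0, 1)  # 0

print(round(H_S2 - H_S2_size, 3))   # gain(Size)  ~ 0.252
print(round(H_S2 - H_S2_shape, 3))  # gain(Shape) ~ 0.918 -> split on Shape
```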
Back to the Restaurant

´ Training data:
[Table: 12 training examples with attributes Alt, Bar, Fri, Hun, Pat, Price, Rain, Res, Type and Est, and the output: whether the customer will wait for a table.]
The Restaurant Example
gain(alt) = ...   gain(bar) = ...   gain(fri) = ...   gain(hun) = ...

gain(pat) = 1 - [ (2/12) × H(0/2, 2/2) + (4/12) × H(0/4, 4/4) + (6/12) × H(2/6, 4/6) ]
          = 1 - [ (2/12) × -( (0/2) log2(0/2) + (2/2) log2(2/2) )
                + (4/12) × -( (0/4) log2(0/4) + (4/4) log2(4/4) ) + ... ] ≈ 0.541 bits

gain(price) = ...   gain(rain) = ...   gain(res) = ...

gain(type) = 1 - [ (2/12) × H(1/2, 1/2) + (2/12) × H(1/2, 1/2)
                 + (4/12) × H(2/4, 2/4) + (4/12) × H(2/4, 2/4) ] = 0 bits

gain(est) = ...
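The two worked gains can be reproduced from the class counts alone; a short Python sketch (the helper names are mine):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def gain(branches, total=12):
    """Information gain of a split of the 12 examples (6 +, 6 -).
    branches: one (pos, neg) count pair per attribute value."""
    remainder = sum((p + n) / total * entropy([p, n]) for p, n in branches)
    return entropy([6, 6]) - remainder

# Pat: None = (0+, 2-), Some = (4+, 0-), Full = (2+, 4-)
print(round(gain([(0, 2), (4, 0), (2, 4)]), 3))          # ~0.541 bits

# Type: French (1+,1-), Italian (1+,1-), Thai (2+,2-), Burger (2+,2-)
print(round(gain([(1, 1), (1, 1), (2, 2), (2, 2)]), 3))  # 0.0 bits
```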
Decision Boundaries

[Figure: a tree with the single test "Feature 2 > t1": a boundary at t1 splits the (Feature 1, Feature 2) plane into two regions, one of which (marked ??) is still mixed.]

[Figure: adding the test "Feature 1 > t2" under the first split: a second, perpendicular boundary at t2; one region (??) still needs a test.]

[Figure: adding the test "Feature 2 > t3": three axis-parallel boundaries (t1, t2, t3), one per internal test of the tree.]
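Since every internal node tests a single feature against a threshold, the boundaries are axis-parallel. A sketch of a three-test tree like the one in the figures (the thresholds, branch order, and class labels here are invented for illustration):

```python
def predict(f1, f2, t1=0.7, t2=0.4, t3=0.3):
    """A three-test tree; thresholds and labels are made up.
    Every test compares ONE feature to a threshold, so each split
    draws an axis-parallel boundary in the (Feature 1, Feature 2) plane."""
    if f2 > t1:                       # horizontal boundary at t1
        return "+"
    if f1 > t2:                       # vertical boundary at t2, below t1
        return "-"
    return "+" if f2 > t3 else "-"    # horizontal boundary at t3
```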
Supervised Learning Algorithms
´ Linear Regression
´ Logistic Regression
´ Naïve Bayes Classifier
´ Decision Tree
´ Random Forest
Random Forest

´ More accuracy
´ Reduces the variance of the predictions by combining the results of
multiple decision trees built on different samples of the data set.
´ Example: use the following data set to create a Random Forest that
predicts whether a person has heart disease or not.
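A minimal sketch of this pipeline with scikit-learn; the feature names follow the slides' heart-disease example, but the data below is randomly generated stand-in data, so the printed accuracy only demonstrates the API:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stand-in features: chest pain, good blood circulation,
# blocked arteries (0/1), and weight (continuous).
X = np.column_stack([
    rng.integers(0, 2, 200),    # chest pain
    rng.integers(0, 2, 200),    # good blood circulation
    rng.integers(0, 2, 200),    # blocked arteries
    rng.normal(170, 25, 200),   # weight
])
y = rng.integers(0, 2, 200)     # heart disease: yes = 1, no = 0

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is trained on a bootstrap sample and considers a random
# subset of features at every split; the forest combines their votes.
forest = RandomForestClassifier(n_estimators=100, bootstrap=True,
                                random_state=0)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```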
Creating a Random Forest

[Figure: growing a decision tree from a bootstrapped sample; one of the candidate features is Blocked Arteries.]
[Figure: tallying the trees' votes for Heart Disease (Yes: 1, No: 0) to produce the forest's output.]
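The aggregation step is just a majority vote over the trees; a tiny sketch (the vote list is hypothetical):

```python
from collections import Counter

# Hypothetical predictions from the individual trees for one new patient.
tree_votes = ["Yes", "No", "Yes", "Yes", "No", "Yes"]

tally = Counter(tree_votes)        # Counter({'Yes': 4, 'No': 2})
print(tally.most_common(1)[0][0])  # the forest's output: "Yes"
```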
[Figure: a row left out of the bootstrapped sample (Abnormal, Yes, Yes, 180, Yes) belongs to the OOB data set (the testing data set).]
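scikit-learn can report this out-of-bag estimate directly via oob_score=True (again with stand-in data):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))  # stand-in features
y = rng.integers(0, 2, 200)    # stand-in labels (heart disease yes/no)

# Each row is scored only by the trees whose bootstrap sample
# did NOT include it, giving a built-in test-set estimate.
forest = RandomForestClassifier(n_estimators=100, bootstrap=True,
                                oob_score=True, random_state=0)
forest.fit(X, y)
print("OOB accuracy estimate:", forest.oob_score_)
```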