2 Supervised Learning
Machine Learning
Prof Dr. Hammad Afzal
[email protected]
Agenda
• Supervised Learning
• Decision Tree
Supervised Learning
Lecture 3
Classification Process
Classification Applications
• Classification
  – Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of each observation.
  – New data is classified based on the training set.
Classification—A Two-Step Process
• Model construction: describing a set of predetermined classes
• Model usage:
  – For classifying future or unknown objects
  – Estimate the accuracy of the model:
    ▪ The known label of each test sample is compared with the model's classification result.
    ▪ The accuracy rate is the percentage of test-set samples that are correctly classified by the model.
    ▪ The test set is independent of the training set (otherwise over-fitting results).
  – If the accuracy is acceptable, use the model to classify new data.
Process (1): Model Construction
Training data are fed to a classification algorithm, which learns a classifier (model). The model is later applied to testing data and to unseen data, e.g. the tuple (Jeff, Professor, 4) with the question "Tenured?".

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes
Dividing a Dataset
• We need independent data sets to train, set parameters, and test performance
Model Evaluation and Selection
Model Evaluation and Selection
• Evaluation metrics: How can we measure accuracy? What other metrics should we consider?
• Use a test set of class-labeled tuples, instead of the training set, when assessing accuracy.
Classifier Evaluation Metrics: Confusion Matrix
• True Positives (TP):
  – Positive tuples correctly classified as positive.
• True Negatives (TN):
  – Negative tuples correctly classified as negative.
• False Positives (FP):
  – Negative tuples incorrectly classified as positive.
• False Negatives (FN):
  – Positive tuples incorrectly classified as negative.
Accuracy/Error Rate

Actual \ Predicted |  C  | ¬C  | Total
C                  | TP  | FN  |  P
¬C                 | FP  | TN  |  N
Total              | P'  | N'  |  All

• Accuracy = (TP + TN) / All
• Error rate = (FP + FN) / All = 1 − Accuracy
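As a minimal sketch, the two rates can be computed directly from the four cells of the confusion matrix. TP = 90, FP = 140, and FN = 210 follow from the worked example later in the deck (90/230 and 90/300); TN = 9560 is an assumed value chosen so the totals are round numbers.

```python
# Accuracy and error rate from confusion-matrix cells.
def accuracy(tp, tn, fp, fn):
    """Fraction of all tuples the classifier got right."""
    return (tp + tn) / (tp + tn + fp + fn)

def error_rate(tp, tn, fp, fn):
    """Fraction of all tuples the classifier got wrong (1 - accuracy)."""
    return (fp + fn) / (tp + tn + fp + fn)

# TP/FP/FN match the later slide's example; TN = 9560 is assumed.
tp, tn, fp, fn = 90, 9560, 140, 210
print(accuracy(tp, tn, fp, fn))    # (90 + 9560) / 10000
print(error_rate(tp, tn, fp, fn))  # (140 + 210) / 10000
```

With these counts the classifier looks very accurate (96.5%) even though it finds only a minority of the positives, which motivates the per-class metrics on the next slides.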
Sensitivity and Specificity
◼ Sensitivity = TP / P (true positive recognition rate)
◼ Specificity = TN / N (true negative recognition rate)
◼ Class Imbalance Problem:
  ◼ One class may be rare, e.g. fraud or the HIV-positive class; overall accuracy is then misleading, so sensitivity and specificity are reported per class.
Precision and Recall, and F-measures
• Precision = TP / (TP + FP): the fraction of tuples labeled positive that are actually positive (exactness).
• Recall = TP / (TP + FN): the fraction of positive tuples that are labeled positive (completeness).
• F-measure (F1) = 2 × Precision × Recall / (Precision + Recall): the harmonic mean of precision and recall.
Classifier Evaluation Metrics: Example
– Precision = 90/230 = 39.13%; Recall = 90/300 = 30.00%
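The numbers above can be reproduced in a few lines (FP = 140 and FN = 210 follow from TP = 90 with TP + FP = 230 and TP + FN = 300); the F1 value is added here for illustration and is not computed on the slide.

```python
# Reproduce the worked example: TP = 90, TP + FP = 230, TP + FN = 300.
tp, fp, fn = 90, 140, 210

precision = tp / (tp + fp)  # 90 / 230
recall = tp / (tp + fn)     # 90 / 300
f1 = 2 * precision * recall / (precision + recall)

print(f"Precision = {precision:.2%}")  # 39.13%
print(f"Recall    = {recall:.2%}")     # 30.00%
print(f"F1        = {f1:.4f}")
```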
Holdout Method
• Holdout method
– Given data is randomly partitioned into two independent sets
▪ Training set (e.g., 2/3) for model construction
▪ Test set (e.g., 1/3) for accuracy estimation
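The random 2/3–1/3 partition described above can be sketched in plain Python; the function name and seed are illustrative, not from the deck.

```python
import random

def holdout_split(data, train_frac=2/3, seed=42):
    """Randomly partition `data` into independent training and test sets."""
    shuffled = data[:]  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

records = list(range(30))  # stand-in for class-labeled tuples
train, test = holdout_split(records)
print(len(train), len(test))  # 20 10
```

Shuffling before cutting is what makes the two sets independent random samples rather than, say, the first and last thirds of a sorted file.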
Cross-Validation Methods
• Cross-validation (k-fold, where k = 10 is most popular):
  – Randomly partition the data into k mutually exclusive subsets, each of approximately equal size.
  – At the i-th iteration, use subset Di as the test set and the remaining subsets together as the training set.
Cross-Validation Methods
– Leave-one-out: k folds where k = number of tuples; used for small-sized data.
– Stratified cross-validation: folds are stratified so that the class distribution in each fold is approximately the same as that in the initial data.
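A minimal sketch of how k mutually exclusive folds can be built (round-robin assignment; stratification is not handled here, and the helper name is made up for illustration):

```python
def kfold_indices(n, k=10):
    """Partition indices 0..n-1 into k mutually exclusive, near-equal folds."""
    folds = [[] for _ in range(k)]
    for i in range(n):
        folds[i % k].append(i)  # round-robin assignment
    return folds

# At iteration i, fold i is the test set and the other k-1 folds
# together form the training set.
folds = kfold_indices(25, k=5)
for i, test_fold in enumerate(folds):
    train_idx = [j for f in folds if f is not test_fold for j in f]
    # ... train a model on train_idx, evaluate on test_fold ...
print([len(f) for f in folds])  # [5, 5, 5, 5, 5]
```

Every tuple appears in exactly one test fold, so each tuple is used for testing once and for training k − 1 times.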
Decision Tree
Apply Model To Test Data
Example Decision Tree
Decision Tree Classification
Algorithm for Decision Tree Induction
How to Determine the Best Split
Brief Review of Entropy
• Entropy of a distribution over m classes: H = − Σi pi log2(pi); it is maximal (log2 m) when the classes are equally likely and 0 when one class is certain.
Attribute Selection Measure: Information Gain (ID3/C4.5)
Computing the Entropy

Class P: buys_computer = "yes" (9 tuples); Class N: buys_computer = "no" (5 tuples).

Info(D) = I(9,5) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940

Infoage(D) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694

(5/14) I(2,3) means "age <= 30" covers 5 out of 14 samples, with 2 yes's and 3 no's. Hence

Gain(age) = Info(D) − Infoage(D) = 0.246

Similarly, Gain(income) = 0.029, Gain(student) = 0.151, Gain(credit_rating) = 0.048.

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31…40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31…40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31…40  medium  no       excellent      yes
31…40  high    yes      fair           yes
>40    medium  no       excellent      no
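The whole calculation can be reproduced in a few lines of Python on the 14-tuple table (a sketch; the deck itself gives no code):

```python
from math import log2
from collections import Counter

# The 14 tuples from the slide: (age, income, student, credit_rating, buys_computer)
data = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31…40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31…40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no", "excellent", "yes"),
    ("31…40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]
ATTRS = {"age": 0, "income": 1, "student": 2, "credit_rating": 3}

def entropy(labels):
    """Info(D) = -sum p_i log2(p_i) over the class distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(attr):
    """Gain(A) = Info(D) - Info_A(D)."""
    i = ATTRS[attr]
    info_d = entropy([row[-1] for row in data])
    info_a = 0.0
    for value in {row[i] for row in data}:
        part = [row[-1] for row in data if row[i] == value]
        info_a += len(part) / len(data) * entropy(part)
    return info_d - info_a

for a in ATTRS:
    print(f"Gain({a}) = {gain(a):.3f}")
```

Without intermediate rounding, Gain(age) prints as 0.247; the slide's 0.246 comes from rounding 0.940 and 0.694 first. Age has the highest gain either way, so it is chosen as the splitting attribute at the root.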
A Decision Tree for "Buys_Computer"

age?
├─ <=30   → student?  (no → no, yes → yes)
├─ 31..40 → yes
└─ >40    → credit_rating?  (excellent → no, fair → yes)
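The buys_computer tree can be encoded as nested dicts and used to classify a tuple; this is an illustrative sketch, not code from the deck.

```python
# Internal nodes map an attribute name to {value: subtree}; leaves are labels.
tree = {"age": {
    "<=30": {"student": {"no": "no", "yes": "yes"}},
    "31..40": "yes",
    ">40": {"credit_rating": {"excellent": "no", "fair": "yes"}},
}}

def classify(node, tuple_):
    """Walk from the root, following the branch for each attribute value."""
    while isinstance(node, dict):
        attr, branches = next(iter(node.items()))
        node = branches[tuple_[attr]]
    return node

print(classify(tree, {"age": "<=30", "student": "yes"}))             # yes
print(classify(tree, {"age": ">40", "credit_rating": "excellent"}))  # no
```

Classifying a tuple costs one dictionary lookup per level, i.e. time proportional to the depth of the tree.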
Extracting Classification Rules from Trees
• Each path from the root to a leaf yields one IF–THEN rule, e.g.:
  – IF age = "<=30" AND student = "no" THEN buys_computer = "no"
  – IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
  – IF age = "31..40" THEN buys_computer = "yes"
  – IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "no"
  – IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "yes"
Gain Ratio (C4.5)
• The information gain measure is biased towards attributes with a large number of values.
• C4.5 (a successor of ID3) uses gain ratio to overcome the problem (a normalization of information gain):

  SplitInfoA(D) = − Σ (j = 1..v) (|Dj| / |D|) × log2(|Dj| / |D|)

  – GainRatio(A) = Gain(A) / SplitInfoA(D)
• Ex. SplitInfoincome(D) = 1.557, so GainRatio(income) = 0.029 / 1.557 = 0.019.
• Comparison:
  – Information gain: biased towards multi-valued attributes.
  – Gain ratio: tends to prefer unbalanced splits in which one partition is much smaller than the others.
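The split-information example can be checked in a couple of lines; the partition sizes 4/6/4 are the counts of income = high/medium/low in the 14-tuple table, and the helper name is illustrative.

```python
from math import log2

def split_info(sizes):
    """SplitInfo_A(D) = -sum (|Dj|/|D|) log2(|Dj|/|D|) over the partition sizes."""
    total = sum(sizes)
    return -sum(s / total * log2(s / total) for s in sizes)

# income splits the 14 tuples into partitions of size 4 (high), 6 (medium), 4 (low).
si = split_info([4, 6, 4])
gain_income = 0.029  # from the information-gain slide
print(round(si, 3))               # 1.557
print(round(gain_income / si, 3)) # 0.019
```

Note that SplitInfo depends only on how many tuples fall into each branch, not on their class labels, which is exactly why it penalizes many-valued attributes.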
Other Attribute Selection Measures
• CHAID: a popular decision tree algorithm; its measure is based on the χ2 test for independence.
• C-SEP: performs better than information gain and the Gini index in certain cases.
• MDL (Minimal Description Length principle):
  – The best tree is the one that requires the fewest number of bits to both (1) encode the tree, and (2) encode the exceptions to the tree.