2 Supervised Learning
Machine Learning
Prof Dr. Hammad Afzal
[email protected]
Agenda
• Supervised Learning
• Decision Tree
Supervised Learning
Lecture 3
Classification Process
Classification Applications
• Classification
  – Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of each observation.
  – New data is classified based on the training set.
Classification—A Two-Step Process
• Model construction: describing a set of predetermined classes
• Model usage:
  – For classifying future or unknown objects
  – Estimate the accuracy of the model:
    ▪ The known label of each test sample is compared with the model's classification result.
    ▪ The accuracy rate is the percentage of test-set samples that are correctly classified by the model.
    ▪ The test set is independent of the training set (otherwise over-fitting results).
  – If the accuracy is acceptable, use the model to classify new data.
Process (1): Model Construction
Training data are fed to a classification algorithm, which learns a classifier (model). The model is later applied to testing data and to unseen data, e.g. the tuple (Jeff, Professor, 4) with the question "Tenured?".

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes
Dividing a Dataset
• We need independent data sets to train, set parameters, and test performance
Model Evaluation and Selection
Model Evaluation and Selection
• Evaluation metrics: How can we measure accuracy? What other metrics should we consider?
• Use a test set of class-labeled tuples, instead of the training set, when assessing accuracy.
Classifier Evaluation Metrics: Confusion Matrix
• True Positives (TP):
  – Positive tuples correctly classified as positive.
• True Negatives (TN):
  – Negative tuples correctly classified as negative.
• False Positives (FP):
  – Negative tuples incorrectly classified as positive.
• False Negatives (FN):
  – Positive tuples incorrectly classified as negative.
Accuracy/Error Rate

Actual \ Predicted |  C  | ¬C  | Total
C                  | TP  | FN  |  P
¬C                 | FP  | TN  |  N
Total              | P'  | N'  |  All

• Accuracy = (TP + TN) / All
• Error rate = (FP + FN) / All = 1 − Accuracy
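As a minimal sketch, the two rates can be computed directly from the four cells of the confusion matrix. TP = 90, FP = 140, and FN = 210 follow from the worked example later in the deck (90/230 and 90/300); TN = 9560 is an assumed value chosen so the totals are round numbers.

```python
# Accuracy and error rate from confusion-matrix cells.
def accuracy(tp, tn, fp, fn):
    """Fraction of all tuples the classifier got right."""
    return (tp + tn) / (tp + tn + fp + fn)

def error_rate(tp, tn, fp, fn):
    """Fraction of all tuples the classifier got wrong (1 - accuracy)."""
    return (fp + fn) / (tp + tn + fp + fn)

# TP/FP/FN match the later slide's example; TN = 9560 is assumed.
tp, tn, fp, fn = 90, 9560, 140, 210
print(accuracy(tp, tn, fp, fn))    # (90 + 9560) / 10000
print(error_rate(tp, tn, fp, fn))  # (140 + 210) / 10000
```

With these counts the classifier looks very accurate (96.5%) even though it finds only a minority of the positives, which motivates the per-class metrics on the next slides.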
Sensitivity and Specificity
◼ Sensitivity = TP / P (true positive recognition rate)
◼ Specificity = TN / N (true negative recognition rate)
◼ Class Imbalance Problem:
  ◼ One class may be rare, e.g. fraud or the HIV-positive class; overall accuracy is then misleading, so sensitivity and specificity are reported per class.
Precision and Recall, and F-measures
• Precision = TP / (TP + FP): the fraction of tuples labeled positive that are actually positive (exactness).
• Recall = TP / (TP + FN): the fraction of positive tuples that are labeled positive (completeness).
• F-measure (F1) = 2 × Precision × Recall / (Precision + Recall): the harmonic mean of precision and recall.
Classifier Evaluation Metrics: Example
– Precision = 90/230 = 39.13%; Recall = 90/300 = 30.00%
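The numbers above can be reproduced in a few lines (FP = 140 and FN = 210 follow from TP = 90 with TP + FP = 230 and TP + FN = 300); the F1 value is added here for illustration and is not computed on the slide.

```python
# Reproduce the worked example: TP = 90, TP + FP = 230, TP + FN = 300.
tp, fp, fn = 90, 140, 210

precision = tp / (tp + fp)  # 90 / 230
recall = tp / (tp + fn)     # 90 / 300
f1 = 2 * precision * recall / (precision + recall)

print(f"Precision = {precision:.2%}")  # 39.13%
print(f"Recall    = {recall:.2%}")     # 30.00%
print(f"F1        = {f1:.4f}")
```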
Holdout Method
• Holdout method
– Given data is randomly partitioned into two independent sets
▪ Training set (e.g., 2/3) for model construction
▪ Test set (e.g., 1/3) for accuracy estimation
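The random 2/3–1/3 partition described above can be sketched in plain Python; the function name and seed are illustrative, not from the deck.

```python
import random

def holdout_split(data, train_frac=2/3, seed=42):
    """Randomly partition `data` into independent training and test sets."""
    shuffled = data[:]  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

records = list(range(30))  # stand-in for class-labeled tuples
train, test = holdout_split(records)
print(len(train), len(test))  # 20 10
```

Shuffling before cutting is what makes the two sets independent random samples rather than, say, the first and last thirds of a sorted file.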
Cross-Validation Methods
• Cross-validation (k-fold, where k = 10 is most popular):
  – Randomly partition the data into k mutually exclusive subsets, each of approximately equal size.
  – At the i-th iteration, use subset Di as the test set and the remaining subsets together as the training set.
Cross-Validation Methods
– Leave-one-out: k folds where k = number of tuples; used for small-sized data.
– Stratified cross-validation: folds are stratified so that the class distribution in each fold is approximately the same as that in the initial data.
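A minimal sketch of how k mutually exclusive folds can be built (round-robin assignment; stratification is not handled here, and the helper name is made up for illustration):

```python
def kfold_indices(n, k=10):
    """Partition indices 0..n-1 into k mutually exclusive, near-equal folds."""
    folds = [[] for _ in range(k)]
    for i in range(n):
        folds[i % k].append(i)  # round-robin assignment
    return folds

# At iteration i, fold i is the test set and the other k-1 folds
# together form the training set.
folds = kfold_indices(25, k=5)
for i, test_fold in enumerate(folds):
    train_idx = [j for f in folds if f is not test_fold for j in f]
    # ... train a model on train_idx, evaluate on test_fold ...
print([len(f) for f in folds])  # [5, 5, 5, 5, 5]
```

Every tuple appears in exactly one test fold, so each tuple is used for testing once and for training k − 1 times.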
Decision Tree
Apply Model To Test Data
Example Decision Tree
Decision Tree Classification
Algorithm for Decision Tree Induction
How to Determine the Best Split
Brief Review of Entropy
• Entropy of a distribution over m classes: H = − Σi pi log2(pi); it is maximal (log2 m) when the classes are equally likely and 0 when one class is certain.
Attribute Selection Measure: Information Gain (ID3/C4.5)
Computing the Entropy

Class P: buys_computer = "yes" (9 tuples); Class N: buys_computer = "no" (5 tuples).

Info(D) = I(9,5) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940

Infoage(D) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694

(5/14) I(2,3) means "age <= 30" covers 5 out of 14 samples, with 2 yes's and 3 no's. Hence

Gain(age) = Info(D) − Infoage(D) = 0.246

Similarly, Gain(income) = 0.029, Gain(student) = 0.151, Gain(credit_rating) = 0.048.

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31…40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31…40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31…40  medium  no       excellent      yes
31…40  high    yes      fair           yes
>40    medium  no       excellent      no
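The whole calculation can be reproduced in a few lines of Python on the 14-tuple table (a sketch; the deck itself gives no code):

```python
from math import log2
from collections import Counter

# The 14 tuples from the slide: (age, income, student, credit_rating, buys_computer)
data = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31…40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31…40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no", "excellent", "yes"),
    ("31…40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]
ATTRS = {"age": 0, "income": 1, "student": 2, "credit_rating": 3}

def entropy(labels):
    """Info(D) = -sum p_i log2(p_i) over the class distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(attr):
    """Gain(A) = Info(D) - Info_A(D)."""
    i = ATTRS[attr]
    info_d = entropy([row[-1] for row in data])
    info_a = 0.0
    for value in {row[i] for row in data}:
        part = [row[-1] for row in data if row[i] == value]
        info_a += len(part) / len(data) * entropy(part)
    return info_d - info_a

for a in ATTRS:
    print(f"Gain({a}) = {gain(a):.3f}")
```

Without intermediate rounding, Gain(age) prints as 0.247; the slide's 0.246 comes from rounding 0.940 and 0.694 first. Age has the highest gain either way, so it is chosen as the splitting attribute at the root.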
A Decision Tree for "Buys_Computer"

age?
├─ <=30   → student?  (no → no, yes → yes)
├─ 31..40 → yes
└─ >40    → credit_rating?  (excellent → no, fair → yes)
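The buys_computer tree can be encoded as nested dicts and used to classify a tuple; this is an illustrative sketch, not code from the deck.

```python
# Internal nodes map an attribute name to {value: subtree}; leaves are labels.
tree = {"age": {
    "<=30": {"student": {"no": "no", "yes": "yes"}},
    "31..40": "yes",
    ">40": {"credit_rating": {"excellent": "no", "fair": "yes"}},
}}

def classify(node, tuple_):
    """Walk from the root, following the branch for each attribute value."""
    while isinstance(node, dict):
        attr, branches = next(iter(node.items()))
        node = branches[tuple_[attr]]
    return node

print(classify(tree, {"age": "<=30", "student": "yes"}))             # yes
print(classify(tree, {"age": ">40", "credit_rating": "excellent"}))  # no
```

Classifying a tuple costs one dictionary lookup per level, i.e. time proportional to the depth of the tree.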
Extracting Classification Rules from Trees
• Each path from the root to a leaf yields one IF–THEN rule, e.g.:
  – IF age = "<=30" AND student = "no" THEN buys_computer = "no"
  – IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
  – IF age = "31..40" THEN buys_computer = "yes"
  – IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "no"
  – IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "yes"
Gain Ratio (C4.5)
• The information gain measure is biased towards attributes with a large number of values.
• C4.5 (a successor of ID3) uses gain ratio to overcome the problem (a normalization of information gain):

  SplitInfoA(D) = − Σ (j = 1..v) (|Dj| / |D|) × log2(|Dj| / |D|)

  – GainRatio(A) = Gain(A) / SplitInfoA(D)
• Ex. SplitInfoincome(D) = 1.557, so GainRatio(income) = 0.029 / 1.557 = 0.019.
• Comparison:
  – Information gain: biased towards multi-valued attributes.
  – Gain ratio: tends to prefer unbalanced splits in which one partition is much smaller than the others.
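The split-information example can be checked in a couple of lines; the partition sizes 4/6/4 are the counts of income = high/medium/low in the 14-tuple table, and the helper name is illustrative.

```python
from math import log2

def split_info(sizes):
    """SplitInfo_A(D) = -sum (|Dj|/|D|) log2(|Dj|/|D|) over the partition sizes."""
    total = sum(sizes)
    return -sum(s / total * log2(s / total) for s in sizes)

# income splits the 14 tuples into partitions of size 4 (high), 6 (medium), 4 (low).
si = split_info([4, 6, 4])
gain_income = 0.029  # from the information-gain slide
print(round(si, 3))               # 1.557
print(round(gain_income / si, 3)) # 0.019
```

Note that SplitInfo depends only on how many tuples fall into each branch, not on their class labels, which is exactly why it penalizes many-valued attributes.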
Other Attribute Selection Measures
• CHAID: a popular decision tree algorithm; its measure is based on the χ2 test for independence.
• C-SEP: performs better than information gain and the Gini index in certain cases.
• MDL (Minimal Description Length principle):
  – The best tree is the one that requires the fewest number of bits to both (1) encode the tree, and (2) encode the exceptions to the tree.