
CS-471

Machine Learning
Dr. Hammad Afzal
[email protected]

Prof (NUST)
Data and Text Processing Lab
www.codteem.com

1
Agenda
• Supervised Learning

• Model Selection and Evaluation

• Decision Tree

2
Supervised Learning
Lecture 3

3
Classification Process

4
Classification Applications
• Classification
– Supervision: The training data (observations, measurements, etc.) are
accompanied by labels indicating the class of the observations.
– New data is classified based on the training set

5
Classification—A Two-Step Process
• Model construction: describing a set of predetermined classes
  – Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
  – The set of tuples used for model construction is the training set
  – The model is represented as classification rules, decision trees, or mathematical formulae

6
Classification—A Two-Step Process
• Model usage:
– For classifying future or unknown objects
– Estimate accuracy of the model
▪ The known label of test sample is compared with
the classified result from the model
▪ Accuracy rate is the percentage of test set
samples that are correctly classified by the model
▪ Test set is independent of training set (otherwise
over-fitting)
– If the accuracy is acceptable, use the model to classify new data

7
Process (1): Model Construction

Training Data + Classification Algorithm → Classifier (Model)

NAME   RANK             YEARS   TENURED
Mike   Assistant Prof   3       no
Mary   Assistant Prof   7       yes
Bill   Professor        2       yes
Jim    Associate Prof   7       yes
Dave   Assistant Prof   6       no
Anne   Associate Prof   3       no

Learned model:
IF rank = ‘professor’ OR years > 6
THEN tenured = ‘yes’

8
Process (2): Model Usage for Prediction

Classifier (learned model):
IF rank = ‘professor’ OR years > 6
THEN tenured = ‘yes’

Testing Data:
NAME      RANK             YEARS   TENURED
Tom       Assistant Prof   2       no
Merlisa   Associate Prof   7       no
George    Professor        5       yes
Joseph    Assistant Prof   7       yes

Unseen Data: (Jeff, Professor, 4) → Tenured?

9
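The learned rule from the previous slide can be applied directly in code. Below is a minimal sketch; the function name predict_tenured and the tuple encoding are illustrative, not part of the lecture.

```python
def predict_tenured(rank: str, years: int) -> str:
    """Learned rule: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"

# Unseen tuple from the slide: (Jeff, Professor, 4)
print(predict_tenured("Professor", 4))  # -> yes

# Estimate accuracy by comparing predictions with the known labels of the testing data
testing_data = [("Tom", "Assistant Prof", 2, "no"), ("Merlisa", "Associate Prof", 7, "no"),
                ("George", "Professor", 5, "yes"), ("Joseph", "Assistant Prof", 7, "yes")]
correct = sum(predict_tenured(rank, years) == label for _, rank, years, label in testing_data)
print(correct / len(testing_data))  # fraction of test tuples classified correctly
```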
Dividing a Dataset
• We need independent data sets to train, set
parameters, and test performance

• Thus we will often divide a data set into three


– Training set
– Parameter selection set (Validation Dataset)
– Test set

• These must be independent


• The second data set (the validation set) is not always necessary

10
Model Evaluation and Selection

11
Model Evaluation and Selection
• Evaluation metrics: How can we measure accuracy? Other metrics to consider?
• Use a test set of class-labeled tuples, rather than the training set, when assessing accuracy
• Some of the measures are:
  – Accuracy – suitable when class tuples are evenly distributed
  – Precision – suitable when class tuples are not evenly distributed
  – Recall – also known as sensitivity

12
Classifier Evaluation Metrics: Confusion Matrix
Confusion Matrix:

Actual class \ Predicted class   Yes                    No
Yes                              True Positives (TP)    False Negatives (FN)
No                               False Positives (FP)   True Negatives (TN)

Actual class \ Predicted class   C1                     ¬C1
C1                               True Positives (TP)    False Negatives (FN)
¬C1                              False Positives (FP)   True Negatives (TN)

• Given m classes, an entry CM(i,j) in a confusion matrix indicates the number of tuples in class i that were labeled by the classifier as class j
• May have extra rows/columns to provide totals

13
Classifier Evaluation Metrics: Confusion Matrix

• True Positives:
– Positive tuples correctly classified as positive.

• True Negatives:
– Negative tuples correctly classified as negative.

• False Positives:
– Negative tuples incorrectly classified as positives.

• False Negatives:
– Positive tuples incorrectly classified as negatives
14
Classifier Evaluation Metrics: Confusion Matrix
Confusion Matrix:
Actual class \ Predicted class   C1                     ¬C1
C1                               True Positives (TP)    False Negatives (FN)
¬C1                              False Positives (FP)   True Negatives (TN)

Example of Confusion Matrix:

Actual class \ Predicted class   buy_computer = yes   buy_computer = no   Total
buy_computer = yes               6954                 46                  7000
buy_computer = no                412                  2588                3000
Total                            7366                 2634                10000

• Given m classes, an entry CM(i,j) in a confusion matrix indicates the number of tuples in class i that were labeled by the classifier as class j
• May have extra rows/columns to provide totals

15
Accuracy/Error Rate

A \ P   C     ¬C
C       TP    FN    P
¬C      FP    TN    N
        P’    N’    All

• Classifier accuracy, or recognition rate: percentage of test set tuples that are correctly classified
  Accuracy = (TP + TN) / All
• Error rate: 1 – accuracy, or
  Error rate = (FP + FN) / All

16
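As a quick sketch, both rates can be computed directly from the confusion-matrix counts; the function names below are illustrative, and the numbers are the buy_computer counts from the earlier confusion-matrix slide.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (TP + TN) / All."""
    return (tp + tn) / (tp + tn + fp + fn)

def error_rate(tp: int, tn: int, fp: int, fn: int) -> float:
    """Error rate = (FP + FN) / All = 1 - accuracy."""
    return (fp + fn) / (tp + tn + fp + fn)

# buy_computer example: TP = 6954, TN = 2588, FP = 412, FN = 46
print(accuracy(6954, 2588, 412, 46))    # 0.9542
print(error_rate(6954, 2588, 412, 46))  # 0.0458
```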
Sensitivity and Specificity
◼ Class Imbalance Problem:
  ◼ One class may be rare, e.g. fraud, or HIV-positive
  ◼ Significant majority of the negative class and minority of the positive class
◼ Sensitivity: True Positive recognition rate
  ◼ Sensitivity = TP / P
◼ Specificity: True Negative recognition rate
  ◼ Specificity = TN / N

17
Precision and Recall, and F-measures

• Precision: exactness – what % of tuples that the classifier labeled as positive are actually positive
  Precision = TP / (TP + FP)
• Recall: completeness – what % of positive tuples did the classifier label as positive?
  Recall = TP / (TP + FN)
• Perfect score is 1.0

18
Precision and Recall, and F-measures

• Inverse relationship between precision & recall
• F measure (F1 or F-score): harmonic mean of precision and recall
  F1 = 2 × Precision × Recall / (Precision + Recall)
• Fβ: weighted measure of precision and recall
  – assigns β times as much weight to recall as to precision
  Fβ = (1 + β²) × Precision × Recall / (β² × Precision + Recall)

19
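A minimal sketch of these three measures; the helper names are illustrative.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of tuples labeled positive that are actually positive."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Fraction of actual positive tuples that were labeled positive (sensitivity)."""
    return tp / (tp + fn)

def f_beta(p: float, r: float, beta: float = 1.0) -> float:
    """F-measure: harmonic mean of p and r when beta = 1; beta > 1 weights recall more."""
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```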
Classifier Evaluation Metrics: Example

Actual class \ Predicted class   cancer = yes   cancer = no   Total   Recognition (%)
cancer = yes                     90             210           300     30.00 (sensitivity)
cancer = no                      140            9560          9700    98.56 (specificity)
Total                            230            9770          10000   96.50 (accuracy)

– Precision = 90/230 = 39.13%    Recall = 90/300 = 30.00%

20
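The numbers on this slide can be checked from the raw counts (TP = 90, FN = 210, FP = 140, TN = 9560); the short sketch below is only an arithmetic check.

```python
tp, fn, fp, tn = 90, 210, 140, 9560

print("precision   =", tp / (tp + fp))                   # 90/230    = 0.3913
print("recall      =", tp / (tp + fn))                   # 90/300    = 0.30 (sensitivity)
print("specificity =", tn / (tn + fp))                   # 9560/9700 = 0.9856
print("accuracy    =", (tp + tn) / (tp + tn + fp + fn))  # 9650/10000 = 0.965
```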
Holdout Method

• Holdout method
– Given data is randomly partitioned into two independent sets
▪ Training set (e.g., 2/3) for model construction
▪ Test set (e.g., 1/3) for accuracy estimation

– Random sampling: a variation of holdout


▪ Repeat holdout k times, accuracy = avg. of the accuracies
obtained

21
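A sketch of the holdout split with scikit-learn (assuming scikit-learn is installed; the iris data and the 2/3–1/3 ratio are only for illustration).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Hold out 1/3 of the data for testing, train on the remaining 2/3
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("holdout accuracy:", clf.score(X_test, y_test))
```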
Cross-Validation Methods
• Cross-validation (k-fold, where k = 10 is most popular)
  – Randomly partition the data into k mutually exclusive subsets D1, …, Dk, each of approximately equal size
  – Training and testing are performed k times
  – At the i-th iteration, use Di as the test set and the remaining subsets as the training set
  – Each subset is used k − 1 times for training and exactly once for testing
  – Accuracy = overall number of correct classifications over the k iterations / total number of tuples

22
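A sketch of 10-fold cross-validation with scikit-learn; cross_val_score handles the partitioning, training, and testing, and returns one accuracy per fold (the iris data is only for illustration).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# k = 10 folds: each fold is the test set once and part of the training set nine times
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print("per-fold accuracy:", scores)
print("mean accuracy    :", scores.mean())
```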
5-Fold Cross-Validation
Cross-Validation Methods
– Leave-one-out: k folds where k = # of tuples, for small-sized data
– Stratified cross-validation: folds are stratified so that the class distribution in each fold is approximately the same as that in the initial data

24
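Both variants have ready-made splitters in scikit-learn; a minimal sketch, again on the iris data for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0)

# Stratified 10-fold: each fold keeps roughly the original class distribution
strat_scores = cross_val_score(clf, X, y, cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0))
print("stratified 10-fold mean accuracy:", strat_scores.mean())

# Leave-one-out: k = number of tuples (practical only for small data sets)
loo_scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print("leave-one-out mean accuracy     :", loo_scores.mean())
```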
Decision Tree

25
Decision Tree

26
Decision Tree

27
Apply Model To Test Data

28
Apply Model To Test Data

29
Apply Model To Test Data

30
Apply Model To Test Data

31
Apply Model To Test Data

32
Apply Model To Test Data

33
Apply Model To Test Data

34
Example Decision Tree

35
Decision Tree Classification

36
Algorithm for Decision Tree Induction
• Basic algorithm (a greedy algorithm)
  – Tree is constructed in a top-down recursive divide-and-conquer manner
  – At start, all the training examples are at the root
  – Attributes are categorical (works best with categorical attributes – if values are continuous, they can be discretized)
  – Examples are partitioned recursively based on selected attributes
  – Test attributes are selected on the basis of a statistical measure (e.g., information gain)

37
Algorithm for Decision Tree Induction

• Conditions for stopping partitioning
  – All samples for a given node belong to the same class
  – There are no remaining attributes for further partitioning – majority voting is employed for classifying the leaf

38
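A compact sketch of this greedy, top-down induction, using information gain to pick the test attribute and the two stopping conditions above. This is an illustrative ID3-style implementation, not code from the lecture; rows are assumed to be dicts mapping attribute names to categorical values.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Expected information needed to classify a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Reduction in entropy obtained by partitioning on attribute attr."""
    n = len(labels)
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    if len(set(labels)) == 1:                       # stop: all samples in one class
        return labels[0]
    if not attrs:                                   # stop: no attributes left -> majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    tree = {best: {}}
    for value in set(r[best] for r in rows):        # partition recursively on the best attribute
        sub = [(r, lab) for r, lab in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = [r for r, _ in sub], [lab for _, lab in sub]
        tree[best][value] = build_tree(sub_rows, sub_labels, [a for a in attrs if a != best])
    return tree
```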
Decision Tree Classification

39
Decision Tree Classification

40
How to Determine the best Split

41
How to Determine the best Split

42
Brief Review of Entropy

43
Attribute Selection Measure: Information Gain (ID3/C4.5)

Computing the Entropy / Information Gain

• Class P: buys_computer = “yes” (9 tuples)
• Class N: buys_computer = “no” (5 tuples)

Info(D) = I(9,5) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940

Info_age(D) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694

(5/14) I(2,3) means “age <=30” has 5 out of 14 samples, with 2 yes’es and 3 no’s. Hence

Gain(age) = Info(D) − Info_age(D) = 0.246

Similarly,
Gain(income) = 0.029
Gain(student) = 0.151
Gain(credit_rating) = 0.048

age     income   student   credit_rating   buys_computer
<=30    high     no        fair            no
<=30    high     no        excellent       no
31…40   high     no        fair            yes
>40     medium   no        fair            yes
>40     low      yes       fair            yes
>40     low      yes       excellent       no
31…40   low      yes       excellent       yes
<=30    medium   no        fair            no
<=30    low      yes       fair            yes
>40     medium   yes       fair            yes
<=30    medium   yes       excellent       yes
31…40   medium   no        excellent       yes
31…40   high     yes       fair            yes
>40     medium   no        excellent       no

46
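The figures above can be reproduced from the counts on the slide (9 yes / 5 no overall; for age: 2/3, 4/0, and 3/2 yes/no per group). A small sketch, only as an arithmetic check:

```python
from math import log2

def I(*counts):
    """Expected information I(c1, c2, ...) for a tuple of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

info_d = I(9, 5)                                             # 0.940
info_age = 5/14 * I(2, 3) + 4/14 * I(4, 0) + 5/14 * I(3, 2)  # 0.694
gain_age = info_d - info_age                                 # about 0.247 (the slide rounds to 0.246)
print(round(info_d, 3), round(info_age, 3), round(gain_age, 3))
```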
A Decision Tree for “Buys_Computer”

age?
  <=30   → student?
             no  → no
             yes → yes
  31..40 → yes
  >40    → credit_rating?
             excellent → no
             fair      → yes

47
Extracting Classification Rules from Trees

• Represent the knowledge in the form of IF-THEN rules


• One rule is created for each path from the root to a leaf
• Each attribute-value pair along a path forms a conjunction
• The leaf node holds the class prediction
• Rules are easier for humans to understand
• Example
IF age = “<=30” AND student = “no” THEN buys_computer = “no”
IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”
IF age = “31…40” THEN buys_computer = “yes”
IF age = “>40” AND credit_rating = “excellent” THEN buys_computer = “no”
IF age = “>40” AND credit_rating = “fair” THEN buys_computer = “yes”

48
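A sketch of this extraction for the buys_computer tree two slides back, with the tree represented as nested dicts. The representation and the function name are illustrative, not from the lecture.

```python
def extract_rules(tree, path=(), target="buys_computer"):
    """Walk every root-to-leaf path and print one IF-THEN rule per leaf."""
    if not isinstance(tree, dict):                 # a leaf holds the class prediction
        conditions = " AND ".join(f'{attr} = "{value}"' for attr, value in path)
        print(f'IF {conditions} THEN {target} = "{tree}"')
        return
    (attr, branches), = tree.items()               # internal node: one attribute test
    for value, subtree in branches.items():        # each branch adds one conjunct to the path
        extract_rules(subtree, path + ((attr, value),), target)

buys_computer_tree = {
    "age": {
        "<=30": {"student": {"no": "no", "yes": "yes"}},
        "31…40": "yes",
        ">40": {"credit_rating": {"excellent": "no", "fair": "yes"}},
    }
}
extract_rules(buys_computer_tree)
```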
Gain Ratio (C4.5)
• Information gain measure is biased towards attributes with a large number of values
• C4.5 (a successor of ID3) uses gain ratio to overcome the problem (a normalization of information gain)

  SplitInfo_A(D) = − Σ (j = 1..v) (|Dj| / |D|) · log2(|Dj| / |D|)

  – Gain_Ratio(A) = Gain(A) / SplitInfo_A(D)
• Ex.
  – Income splits D into subsets of size 4 (high), 6 (medium), and 4 (low), so SplitInfo_income(D) = 1.557
  – Gain_Ratio(income) = 0.029 / 1.557 = 0.019
• The attribute with the maximum gain ratio is selected as the splitting attribute

50
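A sketch checking the 1.557 and 0.019 figures; the partition sizes 4/6/4 come from the income column of the buys_computer table.

```python
from math import log2

def split_info(partition_sizes):
    """SplitInfo_A(D) = -sum over the v partitions of (|Dj|/|D|) * log2(|Dj|/|D|)."""
    total = sum(partition_sizes)
    return -sum(s / total * log2(s / total) for s in partition_sizes)

si = split_info([4, 6, 4])           # income = high / medium / low
print(round(si, 3))                  # 1.557
print(round(0.029 / si, 3))          # Gain_Ratio(income) = 0.019
```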
Comparing Attribute Selection
Measures
• These measures, in general, return good results, but

– Information gain:
▪ biased towards multi-valued attributes

– Gain ratio:
▪ tends to prefer unbalanced splits in which one partition is
much smaller than the others

51
Other Attribute Selection
Measures
• CHAID: a popular decision tree algorithm, measure based on χ2 test for
independence

• C-SEP: performs better than info. gain and gini index in certain cases

• G-statistic: has a close approximation to χ2 distribution

• MDL (Minimal Description Length) principle (i.e., the simplest solution is


preferred):

– The best tree as the one that requires the fewest # of bits to both (1)
encode the tree, and (2) encode the exceptions to the tree

• Multivariate splits (partition based on multiple variable combinations)

• CART: finds multivariate splits based on a linear comb. of attrs.


• Which attribute selection measure is the best?
  – Most give good results; none is significantly superior to the others

52
Thank You

