Lecture03. Classification (Chapter 3)

Uploaded by

emad qedies

Lecture 4

Classification (Chapter 3)

CSC 484 / 584, DA 515

FALL 2024
ML Algorithms

DA 515

DA 535

Deep Learning: more resources

Main Points of Chapter 3
• Classification:
  • Binary (1/0 or True/False)
  • Multiclass (0, 1, …, 9)

• Evaluation Metrics:
  • Cross-Validation
  • Accuracy, Precision, Recall, F1
  • ROC/AUC

ATTENTION: some models might take more than 30 minutes to train on your computer.

MNIST: Modified National Institute of Standards and Technology
• 70,000 small images of handwritten digits
• Each image is labeled: 0, 1, …, 9

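Load the dataset
import numpy as np
from sklearn.datasets import fetch_openml

# as_frame=False returns NumPy arrays, so X[0] selects the first image
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
• The result behaves like a dictionary: read the "data" key and the "target" key

X, y = mnist["data"], mnist["target"]  # X: image data, y: labels
y = y.astype(np.uint8)  # labels are loaded as strings; convert them to integers

• All images: 70,000 rows, each with 784 columns (features; 784 = 28 x 28)
X.shape  # (70000, 784)
y.shape  # (70000,)  labels from 0 to 9, 70,000 in total
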
One image: the digit 5
# from 1-D (784 values) to 2-D (28 x 28)
X[0].reshape(28, 28)
[[0 0 0 … 0],
 [0 0 0 … 0],
 [… 64 …],
 [0 … 0]]

• Intensity value: grayscale, from 0 to 255
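
The 1-D to 2-D step above can be sketched without NumPy. This is only an illustration under the assumption of row-major order (NumPy's default for reshape); the helper name to_2d and the stand-in values 0..783 are made up for the example.

```python
SIDE = 28                                  # MNIST images are 28 x 28 pixels

def to_2d(flat):
    """Split a flat list of 784 values into 28 rows of 28 values (row-major)."""
    return [flat[r * SIDE:(r + 1) * SIDE] for r in range(SIDE)]

flat = list(range(SIDE * SIDE))            # stand-in for one image: values 0..783
image = to_2d(flat)

row, col = divmod(300, SIDE)               # where flat index 300 lands in the grid
print(len(image), len(image[0]))           # 28 28
print(row, col, image[row][col])           # 10 20 300
```

Flat index i always lands at row i // 28 and column i % 28, which is why the 784 features can be shown as an image again.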

Check one image
import matplotlib.pyplot as plt
some_digit = X[0]
some_digit_image = some_digit.reshape(28, 28)
plt.imshow(some_digit_image, cmap="binary")
plt.axis("off")
plt.show()

print("This is digit:", y[0])
# This is digit: 5

We focus on evaluating ML models
For now, we skip the following steps:
• Missing value handling
• Outlier detection
• Data scaling
• Feature selection
• PCA
• Imbalanced data
• …

Split the data set: training vs. testing
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

• This is not random sampling, but the training set is already shuffled for us, which is good because this guarantees that all cross-validation folds will be similar.

1. Binary Classification
• A simple binary problem: one-versus-the-rest (OvR)
  • True (class 1) if the digit is a 5
  • False (class 0) if not

y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)
# True for all 5s, False for all other digits

• For a two-class problem:
  • Each class gets a probability; the class with the higher probability wins
• For a problem with many classes:
  • Softmax (each class gets a probability from 0 to 1)
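
To make the softmax idea concrete, here is a minimal sketch in plain Python. The three scores are made-up numbers, not model output, and the function is an illustration rather than the code Scikit-Learn uses internally.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)                    # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]                   # made-up scores for three classes
probs = softmax(scores)
predicted_class = probs.index(max(probs))  # the class with the highest probability wins
print([round(p, 3) for p in probs], predicted_class)
```

With only two classes this reduces to the "higher probability wins" rule from the binary case.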

Now which model?
• k-NN, Decision Tree, Random Forest, XGBoost
• SVC, Logistic Regression, Naïve Bayes
• SGD vs. mini-batch
  • SGD: Stochastic Gradient Descent (SGD) classifier
    • Handles training instances independently, one at a time
    • Capable of handling very large datasets
    • Well suited for online learning
  or
  • Mini-batch: uses a small batch of samples for each update

SGD vs. Mini-Batch vs. Full
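
The three update schemes differ only in how many samples feed each gradient step. The sketch below shows this on a toy one-parameter regression (fit y ≈ w * x); the data, learning rate, and function name fit_slope are assumptions chosen for illustration and have nothing to do with MNIST or with SGDClassifier's internals.

```python
import random

def fit_slope(xs, ys, batch_size, lr=0.01, epochs=50, seed=42):
    """Fit y ≈ w * x by gradient descent on squared error.

    batch_size=1       -> stochastic gradient descent (one sample per update)
    1 < batch_size < n -> mini-batch gradient descent
    batch_size=n       -> full-batch gradient descent
    """
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # gradient of the mean squared error over this batch with respect to w
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad
    return w

xs = [float(x) for x in range(1, 9)]   # toy data, not MNIST
ys = [3.0 * x for x in xs]             # true slope is 3
for name, size in [("SGD", 1), ("mini-batch", 4), ("full batch", len(xs))]:
    print(name, round(fit_slope(xs, ys, size), 4))
```

On this noise-free toy problem all three reach the same slope; the practical difference is the cost and noisiness of each update.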

1.1 SGD: stochastic gradient descent
• Classifier: Stochastic Gradient Descent (SGD)
• Built into Scikit-Learn
from sklearn.linear_model import SGDClassifier

# (1) define a model
sgd_clf = SGDClassifier(random_state=42)

# (2) train the model
sgd_clf.fit(X_train, y_train_5)

# (3) evaluate the model (next)

Performance Measures
• Accuracy: correct predictions / all predictions
• Cross-validation
• Confusion matrix
• Precision: accuracy of the positive predictions
• Recall (sensitivity): true positive rate
• F1: harmonic mean of precision and recall
• ROC/AUC

Test: Accuracy = correct / all
print(sgd_clf.predict(X_test)[0:50])  # predictions for the first 50 test images
print(y_test_5[0:50])                 # real labels for the first 50 test images

Predicted
[False False False False False False False False False False False False
False False False True False False False False False False False True
False False False False False False False False False False False False
False False False False False False False False False True False False
False False]
Real
[False False False False False False False False True False False False
False False False True False False False False False False False True
False False False False False False False False False False False False
False False False False False False False False False True False False
False False]

Imbalanced data
Why is the accuracy so high (>95%)?

• (# of 5s) / (# of all digits) ≈ 1/10
• You get about 90% accuracy if you classify every image as False (not 5)
• Imbalanced data: be careful
• Balanced data:
  • T:F = 50:50, so 50% accuracy is the baseline.
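
The 90% baseline can be checked with a few lines of plain Python. The labels below are hypothetical (1,000 samples, one in ten positive) and only mirror the 1/10 ratio mentioned above.

```python
# Hypothetical labels: 1 in 10 is a "5", mirroring the ratio on the slide.
y_true = [True] * 100 + [False] * 900

# A "classifier" that never predicts the positive class.
y_pred = [False] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
detected_fives = sum(t and p for t, p in zip(y_true, y_pred))
print(accuracy)        # 0.9
print(detected_fives)  # 0
```

High accuracy and zero detected 5s at the same time is the reason accuracy alone is a poor metric on imbalanced data.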

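Try models: Cross-Validation
• 3 folds, applied to the training data
• Sampling => stratified
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold

# random_state only takes effect together with shuffle=True, so it is dropped here
skfolds = StratifiedKFold(n_splits=3)
for train_index, test_index in skfolds.split(X_train, y_train_5):
    clone_clf = clone(sgd_clf)  # fresh, untrained copy for each fold
    clone_clf.fit(X_train[train_index], y_train_5[train_index])
    X_test_fold, y_test_fold = X_train[test_index], y_train_5[test_index]
    # fraction of correctly classified instances in this fold
    y_pred = clone_clf.predict(X_test_fold)
    n_correct = sum(y_pred == y_test_fold)
    print(n_correct / len(y_pred))

OUTPUT:
# 0.95035, 0.96035, 0.9604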
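
What "stratified" means can be shown with a simplified fold assignment in plain Python. This is a sketch of the idea only, not Scikit-Learn's StratifiedKFold implementation; the function name and toy labels are assumptions.

```python
from collections import Counter, defaultdict

def stratified_folds(labels, n_splits):
    """Assign each sample index to a fold so every fold keeps the class ratio."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        # deal the indices of each class round-robin across the folds
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds

labels = [True] * 30 + [False] * 270   # toy data: 10% positives
folds = stratified_folds(labels, n_splits=3)
for fold in folds:
    print(len(fold), Counter(labels[i] for i in fold))
```

Each fold ends up with the same 10% share of positives as the full data set, so the per-fold accuracies are comparable.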

Confusion Matrix
• Accuracy = (TP + TN) / ALL

• Multiple classes

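Confusion Matrix Code
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

# y_train_pred: out-of-fold predictions (assumed to come from cross_val_predict, as in the book chapter)
y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3)
confusion_matrix(y_train_5, y_train_pred)

# output
array([[53892,   687],
       [ 1891,  3530]], dtype=int64)

# The Scikit-Learn output layout (rows: real class, columns: predicted class)

           Predict N   Predict P
Real N       53892         687
Real P        1891        3530
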
Confusion Matrix Heatmap

Other metrics
• Precision: accuracy of the positive predictions
  Precision = TP / (TP + FP)

• Recall (sensitivity): true positive rate
  Recall = TP / (TP + FN)

from sklearn.metrics import precision_score, recall_score

precision_score(y_train_5, y_train_pred)  # == 3530 / (3530 + 687) ≈ 0.837
When it claims an image represents a 5, it is correct only 83.7% of the time.
recall_score(y_train_5, y_train_pred)  # == 3530 / (3530 + 1891) ≈ 0.651
It only detects 65.1% of the real 5s.

Metric F1
• F1 = 2 * precision * recall / (precision + recall)

from sklearn.metrics import f1_score

f1_score(y_train_5, y_train_pred)
# 0.7325171197343846
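
As a cross-check, the metrics can be recomputed by hand from the four counts in the confusion matrix shown earlier in these slides (53892, 687, 1891, 3530). The sketch assumes the Scikit-Learn layout: rows are real classes, columns are predicted classes.

```python
# Counts taken from the confusion matrix shown earlier in these slides.
tn, fp = 53892, 687     # real "not 5": predicted not 5 / predicted 5
fn, tp = 1891, 3530     # real "5":     predicted not 5 / predicted 5

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # how many predicted 5s are really 5s
recall = tp / (tp + fn)      # how many real 5s were detected
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

The F1 value computed this way agrees with the f1_score output above, and it sits closer to the lower of precision and recall, which is the point of using a harmonic mean.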

Classification Report

• Macro avg: simple average, (0.97 + 0.84) / 2 ≈ 0.90
• Weighted avg: weighted by class frequency, 90% * 0.97 + 10% * 0.84 ≈ 0.95
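
The two averages can be reproduced from the rounded numbers quoted above. Which score belongs to which class is an assumption here (0.97 for the majority "not 5" class, 0.84 for "5"). Because the inputs are already rounded, the last digit can differ slightly from what a classification report prints from unrounded scores.

```python
# Per-class scores and class shares as quoted on the slide (rounded values).
scores = {"not 5": 0.97, "5": 0.84}   # assumed class assignment
share = {"not 5": 0.90, "5": 0.10}    # approximate fraction of samples per class

macro_avg = sum(scores.values()) / len(scores)             # every class counts equally
weighted_avg = sum(scores[c] * share[c] for c in scores)   # classes count by frequency

print(round(macro_avg, 3), round(weighted_avg, 3))
```

The weighted average is pulled toward the majority class, so on imbalanced data the macro average is the stricter summary.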

Precision vs. Recall
(both change with the decision threshold)

Decision Threshold (best: trade-off)
• True 5s: 6 in total
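
A small made-up example shows how moving the threshold trades precision against recall. The 12 digits and their scores are hypothetical, chosen only so that 6 of them are real 5s as on the slide.

```python
# Made-up example: 12 digits sorted by classifier score; 6 of them are real 5s.
digits = [8, 7, 3, 9, 5, 2, 5, 5, 6, 5, 5, 5]
scores = list(range(1, 13))          # hypothetical decision scores, ascending

def precision_recall(threshold):
    """Precision and recall when every score above the threshold is called a 5."""
    predicted = [s > threshold for s in scores]
    actual = [d == 5 for d in digits]
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    return tp / (tp + fp), tp / (tp + fn)

for threshold in (4.5, 7.5, 9.5):
    precision, recall = precision_recall(threshold)
    print(threshold, round(precision, 2), round(recall, 2))
```

Raising the threshold from 4.5 to 9.5 lifts precision from 0.75 to 1.0 while recall falls from 1.0 to 0.5.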

Another Example: continuous distribution

ROC
• The receiver operating characteristic (ROC) curve
• The ROC curve plots the true positive rate (TPR, another name for recall) against the false positive rate (FPR). The FPR is the ratio of negative instances that are incorrectly classified as positive. It is equal to 1 – the true negative rate (TNR), which is the ratio of negative instances that are correctly classified as negative.
• The TNR is also called specificity. Hence, the ROC curve plots sensitivity (recall) versus (1 – specificity).

ROC curve

Area Under the Curve (AUC)
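
Both the ROC curve and its AUC can be computed by hand on a tiny example. The sketch below is plain Python with toy labels and scores; it assumes no tied scores and is meant as an illustration of the definitions, not a replacement for Scikit-Learn's roc_curve and roc_auc_score.

```python
def roc_points(y_true, y_score):
    """Return (FPR, TPR) pairs obtained by lowering the threshold step by step."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    # visit samples from highest to lowest score (assumes no tied scores)
    for _, label in sorted(zip(y_score, y_true), reverse=True):
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

y_true = [0, 0, 1, 1]                  # toy labels, not MNIST
y_score = [0.1, 0.4, 0.35, 0.8]        # toy classifier scores
points = roc_points(y_true, y_score)
print(points)
print(auc(points))                     # 0.75
```

A perfect classifier has AUC = 1 and a purely random one has AUC around 0.5, which is why AUC is a convenient single number for comparing models.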

ROC for different algorithms
• SGD: AUC = 0.9604938554008616
• Random Forest: AUC = 0.9983436731328145

Change the threshold: the prediction changes
(5 or not 5)
y_scores = sgd_clf.decision_function([some_digit])
y_scores
# array([2164.22030239])

(1) If we use a small threshold
threshold = 0
y_some_digit_pred = (y_scores > threshold)
y_some_digit_pred
# array([ True])   classified as 5

(2) If we use a big threshold
threshold = 3000
y_some_digit_pred = (y_scores > threshold)
y_some_digit_pred
# array([False])   the prediction has changed

Summary
• You now know how to:
  • train binary classifiers,
  • choose the appropriate metric for your task,
  • evaluate your classifiers using cross-validation,
  • select the precision/recall trade-off that fits your needs, and
  • use ROC curves and ROC AUC scores to compare various models.

• NEXT, let’s try to detect more than just the 5s.

2. Multiclass Classification (0 1 … 9)
• Some algorithms (such as Logistic Regression, Random Forest, and naive Bayes classifiers) are capable of handling multiple classes natively.
• Others (such as SGD or Support Vector Machine classifiers) are strictly binary classifiers. When you give them a multiclass task, Scikit-Learn automatically runs a one-versus-the-rest (OvR) or one-versus-one (OvO) strategy behind the scenes.

Algorithms:
(1) SGD
(2) Random Forest
(3) SVC: Support Vector Machine classifier (later)

(2.1) SGD
• The decision_function() method now returns one value per class.
sgd_clf.fit(X_train, y_train)  # retrain on all ten digit labels, not just 5 vs. not 5
sgd_clf.decision_function([some_digit])
# array([[-15955.22628, -38080.96296, -13326.66695,    573.52692, -17680.68466,
#           2412.53175, -25526.86498, -12290.15705,  -7946.05205, -10631.35889]])

• Use the cross_val_score() function to evaluate the SGDClassifier’s accuracy:
from sklearn.model_selection import cross_val_score

cross_val_score(sgd_clf, X_train, y_train, cv=3, scoring="accuracy")
# array([0.87365, 0.85835, 0.8689 ])
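
The ten decision scores translate into a prediction by picking the class with the highest score. The sketch below reuses the scores printed above and assumes that position k in the array corresponds to digit k (that is, the classifier's classes are ordered 0 through 9).

```python
# Per-class decision scores copied from the slide (assumed order: digits 0 through 9).
scores = [-15955.22628, -38080.96296, -13326.66695, 573.52692, -17680.68466,
          2412.53175, -25526.86498, -12290.15705, -7946.05205, -10631.35889]

# The predicted digit is the class with the highest score.
predicted_digit = max(range(len(scores)), key=lambda k: scores[k])
# The second-highest score shows which digit the model finds most similar.
runner_up = sorted(range(len(scores)), key=lambda k: scores[k])[-2]
print(predicted_digit, runner_up)   # 5 3
```

The runner-up being 3 hints at the 3-versus-5 confusion examined later in the error analysis.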

Standardize the data (mean 0, standard deviation 1)
• Simply scaling the inputs (as discussed in Chapter 2) increases accuracy above 89%:
import numpy as np
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train.astype(np.float64))

cross_val_score(sgd_clf, X_train_scaled, y_train, cv=3, scoring="accuracy")
# array([0.89707059, 0.8960948 , 0.90693604])
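
What the scaler does to a single feature can be sketched with the standard library: subtract the mean, then divide by the population standard deviation. The pixel values are made up and the function name standardize is an assumption for this illustration.

```python
from statistics import fmean, pstdev

def standardize(column):
    """Rescale one feature to zero mean and unit (population) standard deviation."""
    mean = fmean(column)
    std = pstdev(column)
    if std == 0:
        return [0.0 for _ in column]   # constant feature: nothing to scale
    return [(value - mean) / std for value in column]

pixel = [0, 0, 64, 128, 255]           # made-up intensities for one pixel position
scaled = standardize(pixel)
print([round(v, 3) for v in scaled])
```

After scaling, all features live on a comparable range, which helps gradient-based models such as SGD.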

2.2 RandomForestClassifier
Just switch the classifier from SGD to Random Forest:
from sklearn.ensemble import RandomForestClassifier

forest_clf = RandomForestClassifier(random_state=42)

cross_val_score(forest_clf, X_train_scaled, y_train, cv=3, scoring="accuracy")
# array([0.96445, 0.96255, 0.96645])

WE GOT MUCH BETTER RESULTS USING RANDOM FOREST

If this were a real project
• Error analysis:
  • follow the checklist (see Appendix B),
  • try out multiple models,
  • fine-tune their hyperparameters using GridSearchCV,
  • inspect the CONFUSION MATRIX.

Visualization
# conf_mx: the multiclass confusion matrix, e.g. confusion_matrix(y_train, y_train_pred)
plt.matshow(conf_mx, cmap=plt.cm.gray)
plt.show()
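
A runnable sketch of how such a matrix could be produced for error analysis follows. It makes two assumptions: the small 8x8 digits set bundled with Scikit-Learn stands in for MNIST so that it runs in seconds, and the predictions come from cross_val_predict. The numbers it prints are therefore not the ones behind the slides.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.preprocessing import StandardScaler

# Assumption: the small 8x8 digits set bundled with scikit-learn stands in for MNIST.
X, y = load_digits(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

sgd_clf = SGDClassifier(random_state=42)
y_pred = cross_val_predict(sgd_clf, X_scaled, y, cv=3)   # out-of-fold predictions
conf_mx = confusion_matrix(y, y_pred)                    # rows: real, columns: predicted

# Divide each row by the number of images in that class, then blank the
# diagonal so only the errors remain.
row_sums = conf_mx.sum(axis=1, keepdims=True)
norm_conf_mx = conf_mx / row_sums
for k in range(len(norm_conf_mx)):
    norm_conf_mx[k, k] = 0.0

print(conf_mx.shape)
print(norm_conf_mx.round(2))
```

Passing norm_conf_mx to plt.matshow instead of conf_mx makes the bright cells show which digits get confused with which, independent of how many images each class has.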

SGD model: examples of 3s and 5s

• Bad handwriting

3. (skip) Multilabel Classification
• So far: each sample is assigned to one class
• Multilabel classification: multiple classes for each instance
  • Example: face recognition
  • Three faces: Alice, Bob, and Charlie
  • Classifier output for a picture of Alice and Charlie:
    [1, 0, 1]
    (meaning “Alice yes, Bob no, Charlie yes”)
  • The classifier outputs multiple binary tags

4. (skip) Multioutput Classification
• Multioutput–multiclass classification (or simply multioutput classification).
• It is simply a generalization of multilabel classification where each label can be multiclass (i.e., it can have more than two possible values).
• Example: removing noise from images. The system takes as input a noisy digit image, and it will (hopefully) output a clean digit image, represented as an array of pixel intensities, just like the MNIST images.
• Notice that the classifier’s output is multilabel (one label per pixel) and each label can have multiple values (pixel intensity ranges from 0 to 255). It is thus an example of a multioutput classification system.

Summary
• Classification
  • Binary (yes, no)
  • Multiclass (0, 1, …, 9)

• Evaluation Metrics:
  • Confusion Matrix
  • Accuracy
  • ROC/AUC
  • Cross-Validation

Trade-off

Optional HW: Data augmentation
(not to be turned in)

END

• Read book Chapter 3

• Practice code from this Chapter
• Practice HW (No submission)
