Lecture03. Classification (Chapter 3)

Uploaded by

emad qedies

Lecture 4

Classification (Chapter 3)

CSC 484 / 584, DA 515

FALL 2024
ML Algorithms

DA 515

DA 535

2
Deep Learning: more resources

3
Main Points of Chapter 3
 Classification:
 Binary (1/0, or True/False)
 Multiclass (0, 1, …, 9)

 Evaluation Metrics:
 Cross-Validation
 Accuracy, Precision, Recall, F1
 ROC/AUC

ATTENTION: some models might take more than 30 minutes to train on your computer.

4
MNIST: Modified National Institute of Standards and Technology
 70,000 small images of handwritten digits
 Each image is labeled: 0, 1, …, 9

5
Load in dataset
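import numpy as np
from sklearn.datasets import fetch_openml
# as_frame=False returns NumPy arrays, so X[0] indexes the first image
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
 The result is dictionary-like: use the data key and the target key

X, y = mnist["data"], mnist["target"]  # X: image data, y: label
y = y.astype(np.uint8)  # labels are loaded as strings; convert to integers

 All images: 70000 rows; each has 784 columns (features, 784 = 28x28)
X.shape  # (70000, 784)
y.shape  # (70000,)  labels from 0 to 9, 70000 rows in total
6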
One image: digit 5
# from 1-d to 2-d
X[0].reshape(28, 28)
[[0 0 0 ... 0],
 [0 0 0 ... 0],
 [... 64 ...],
 [0 ... 0]]

 Intensity value: grayscale [0-255]


7
Check one image
import matplotlib.pyplot as plt
some_digit = X[0]
some_digit_image = some_digit.reshape(28, 28)
plt.imshow(some_digit_image, cmap="binary")
plt.axis("off")
plt.show()

print("this is digit:", y[0])
# this is digit: 5

8
We focus on evaluating ML models
For now, we skip the following steps:
 Missing value handling
 Outlier detection
 Data Scaling
 Feature Selection
 PCA
 Imbalanced data
 ….

9
Split data sets: Training vs. Testing
X_train, X_test, y_train, y_test = (
    X[:60000], X[60000:], y[:60000], y[60000:])

 This is not random sampling, but the training set is already shuffled for us, which is good because it guarantees that all cross-validation folds will be similar.

10
1. Binary Classification
 Simple Binary problems: one-versus-the-rest (OvR)
 True (class 1) if it is number 5
 False (class 0) if not

y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)

# True for all 5s, False for all other digits


 For a two-class problem:
 Each class gets a probability, and the class with the higher probability wins

 For multiclass problems:
 Softmax (each class gets a probability from 0 to 1, and the probabilities sum to 1)
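
As a minimal sketch of the softmax idea above: each class score is exponentiated and divided by the total, so every class ends up with a probability between 0 and 1. The scores below are made-up values for illustration, not outputs of any model in this lecture.

```python
import math

def softmax(scores):
    """Turn a list of raw class scores into probabilities that sum to 1."""
    # Subtract the largest score first for numerical stability.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three classes (illustrative values only).
probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))  # the highest probability wins
print(probs, predicted_class)
```

With two classes the same rule reduces to "the class with the higher probability wins".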
11
Now which model?
 K-NN, DT, Random Forest, XGBoost
 SVC, Logistic Regression, Naïve Bayes
 SGD vs. mini-batch
 SGD: Stochastic Gradient Descent (SGD) classifier
 Handles training instances independently, one at a time
 Capable of handling very large datasets
 Well suited for online learning

or
 Mini-batch: uses a small batch of samples for each update.
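
The difference between SGD, mini-batch, and full-batch updates can be sketched on a toy problem. This is an illustrative sketch only: it fits a single slope with plain Python, the function name `fit_slope` and the data are hypothetical, and it is not how `SGDClassifier` is implemented.

```python
import random

def fit_slope(xs, ys, batch_size, lr=0.1, epochs=100, seed=42):
    """Fit y ~ w * x by gradient descent on the squared error.

    batch_size=1        -> stochastic gradient descent (one instance per update)
    batch_size=k        -> mini-batch gradient descent
    batch_size=len(xs)  -> full-batch gradient descent
    """
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over the current batch.
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad
    return w

# Toy data generated from y = 3x (hypothetical, noise-free).
xs = [0.1 * i for i in range(1, 21)]
ys = [3.0 * x for x in xs]

w_sgd = fit_slope(xs, ys, batch_size=1)
w_mini = fit_slope(xs, ys, batch_size=5)
w_full = fit_slope(xs, ys, batch_size=len(xs))
print(w_sgd, w_mini, w_full)
```

All three variants use the same update rule; only the number of instances behind each gradient estimate changes.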

12
SGD vs. Mini-Batch vs. Full

13
1.1 SGD: stochastic gradient descent
 Classifier: Stochastic Gradient Descent (SGD)
 Built-in in Scikit-Learn
from sklearn.linear_model import SGDClassifier

# (1) define a model
sgd_clf = SGDClassifier(random_state=42)

# (2) train the model
sgd_clf.fit(X_train, y_train_5)

# (3) evaluate the model (next)

14
Performance Measures
 Accuracy: correct / all
 Cross-validation
 Confusion matrix
 Precision: accuracy of the positive predictions
 Recall (sensitivity): true positive rate
 F1: harmonic mean of precision and recall
 ROC/AUC

15
Test: Accuracy = correct / all
print(sgd_clf.predict(X_test)[0:50]) # predicted for the first 50
print(y_test_5[0:50]) # real label for the first 50

Predicted
[False False False False False False False False False False False False
False False False True False False False False False False False True
False False False False False False False False False False False False
False False False False False False False False False True False False
False False]
Real
[False False False False False False False False True False False False
False False False True False False False False False False False True
False False False False False False False False False False False False
False False False False False False False False False True False False
False False]
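
The accuracy on these 50 test instances can be checked by hand. The sketch below only re-encodes the two printed arrays by the positions of their True values; the variable names are chosen for illustration.

```python
# Positions (0-based) of the True values in the two printed arrays above.
predicted_true = {15, 23, 45}
real_true = {8, 15, 23, 45}

n = 50
predicted = [i in predicted_true for i in range(n)]
real = [i in real_true for i in range(n)]

n_correct = sum(p == r for p, r in zip(predicted, real))
accuracy = n_correct / n
print(accuracy)  # 0.98: one real 5 (position 8) was missed
```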
16
Imbalanced data
Why is the accuracy so high (>95%)?

 (# of 5s) / (all digits) ≈ 1/10

 About 90% accuracy if you classify every image as False (not 5)

 Imbalanced data: be careful

 Balanced data:
 T:F (50:50), so 50% accuracy is the baseline.
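
A small sketch of why accuracy is misleading here, assuming a hypothetical label set with exactly 10% positives and a "classifier" that never predicts a 5:

```python
# Hypothetical label set: 10% positives ("is a 5"), 90% negatives.
labels = [True] * 100 + [False] * 900

# A "classifier" that ignores its input and always answers False (not 5).
predictions = [False] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
true_positives = sum(p and y for p, y in zip(predictions, labels))
recall = true_positives / sum(labels)
print(accuracy, recall)  # 0.9 accuracy, yet recall is 0.0
```

The model is useless for finding 5s, which is why the following slides look beyond accuracy.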

17
Try models: Cross Validation
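 3 folds, applied to the training data
 Sampling => stratified
from sklearn.model_selection import StratifiedKFold
from sklearn.base import clone

# random_state is only valid together with shuffle=True, so it is omitted here
skfolds = StratifiedKFold(n_splits=3)

for train_index, test_index in skfolds.split(X_train, y_train_5):
    clone_clf = clone(sgd_clf)  # fresh, untrained copy for each fold
    clone_clf.fit(X_train[train_index], y_train_5[train_index])
    X_test_fold, y_test_fold = X_train[test_index], y_train_5[test_index]

    # count how many are correctly classified
    y_pred = clone_clf.predict(X_test_fold)
    n_correct = sum(y_pred == y_test_fold)
    print(n_correct / len(y_pred))

OUTPUT:
# 0.95035, 0.96035, 0.9604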
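
What "stratified" means can be sketched without scikit-learn. The helper below is a hand-rolled simplification (the name `stratified_folds` and the data are hypothetical, and it is not the algorithm `StratifiedKFold` uses internally): it deals the indices of each class out to the folds in turn, so every fold keeps roughly the same class ratio.

```python
from collections import defaultdict

def stratified_folds(labels, n_splits):
    """Assign each index to a fold so that every fold keeps the class ratio."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        # Deal the indices of each class out to the folds in turn.
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds

# Hypothetical labels: 6 positives among 60 instances (10%, as with the 5s).
labels = [i % 10 == 0 for i in range(60)]
folds = stratified_folds(labels, n_splits=3)
for fold in folds:
    positives = sum(labels[i] for i in fold)
    print(len(fold), positives)  # each fold: 20 instances, 2 positives
```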
18
Confusion Matrix
 Accuracy = (TP+TN)/ALL

 Multiple classes

19
Confusion Matrix Code
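from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

# out-of-fold predictions for every training instance
y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3)
confusion_matrix(y_train_5, y_train_pred)

# output
array([[53892,   687],
       [ 1891,  3530]], dtype=int64)

# The scikit-learn output layout: rows are real classes, columns are predictions
          Predict N   Predict P
Real N    53892       687
Real P    1891        3530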
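
The usual metrics follow directly from the four counts in this matrix. A minimal sketch in plain Python, using only the numbers printed above:

```python
# Counts copied from the confusion matrix above.
tn, fp = 53892, 687
fn, tp = 1891, 3530

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # how many predicted 5s are real 5s
recall = tp / (tp + fn)      # how many real 5s were found
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 4), round(precision, 4), round(recall, 4), round(f1, 4))
```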
20
Confusion Matrix Heatmap

21
Other metrics
 Precision: accuracy of the positive predictions, TP / (TP + FP)

 Recall (sensitivity): true positive rate, TP / (TP + FN)

from sklearn.metrics import precision_score, recall_score
precision_score(y_train_5, y_train_pred)  # == 3530 / (3530 + 687), about 0.8371
When it claims an image represents a 5, it is correct only 83.7% of the time.
recall_score(y_train_5, y_train_pred)  # == 3530 / (3530 + 1891), about 0.6512
It detects only 65.1% of the real 5s.
22
Metric F1

from sklearn.metrics import f1_score


f1_score(y_train_5, y_train_pred)

#0.7325171197343846

23
Classification Report

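 Macro avg: simple average, (0.97 + 0.84) / 2 ≈ 0.90

 Weighted avg: average weighted by class share, roughly 90% * 0.97 + 10% * 0.84 ≈ 0.95 (the report uses unrounded values and exact class counts)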
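
The two averages can be reproduced by hand. The sketch below rests on an assumption: that the report describes the 5-detector with the confusion matrix [[53892, 687], [1891, 3530]] shown earlier, and that 0.97 and 0.84 are the per-class precisions.

```python
# Assumption: the report is for the 5-detector whose confusion matrix is
# [[53892, 687], [1891, 3530]], and 0.97 / 0.84 are the per-class precisions.
precision_not5 = 53892 / (53892 + 1891)
precision_5 = 3530 / (3530 + 687)
support_not5 = 53892 + 687   # real "not 5" instances
support_5 = 1891 + 3530      # real "5" instances
total = support_not5 + support_5

macro_avg = (precision_not5 + precision_5) / 2
weighted_avg = (support_not5 * precision_not5 + support_5 * precision_5) / total
print(round(macro_avg, 2), round(weighted_avg, 2))
```

The macro average treats both classes equally, while the weighted average is dominated by the much larger "not 5" class.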

24
Precision Vs Recall
(changing for different thresholds)

25
Decision Threshold (best: trade-off)
 True 5: total 6
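
The trade-off can be sketched with made-up decision scores for twelve digits, six of which are real 5s as on the slide. The scores and the helper name `precision_recall` are hypothetical; raising the threshold increases precision here and lowers recall.

```python
# Hypothetical decision scores for 12 digits, six of which are real 5s.
scores = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6]
is_five = [False, False, False, False, True, False,
           True, True, False, True, True, True]

def precision_recall(threshold):
    """Precision and recall when every score above the threshold is called a 5."""
    predicted = [s > threshold for s in scores]
    tp = sum(p and y for p, y in zip(predicted, is_five))
    fp = sum(p and not y for p, y in zip(predicted, is_five))
    fn = sum(y and not p for p, y in zip(predicted, is_five))
    return tp / (tp + fp), tp / (tp + fn)

for threshold in (-1.5, 1.5, 3.5):
    precision, recall = precision_recall(threshold)
    print(threshold, round(precision, 2), round(recall, 2))
```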

26
Another Example: continuous distribution

27
ROC
 The receiver operating characteristic (ROC) curve is another common tool for binary classifiers.

 The ROC curve plots the true positive rate (TPR, another name for recall) against the false positive rate (FPR). The FPR is the ratio of negative instances that are incorrectly classified as positive. It is equal to 1 – the true negative rate (TNR), which is the ratio of negative instances that are correctly classified as negative.

 The TNR is also called specificity. Hence, the ROC curve plots sensitivity (recall) versus (1 – specificity).

28
ROC curve

29
Area Under the Curve (AUC)
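
A ROC curve and its area can be computed by hand on a tiny example. This is a sketch under stated assumptions: the scores and labels are made up, the helper names `roc_points` and `auc` are hypothetical, and the code assumes no two instances share the same score.

```python
# Made-up decision scores and true labels (True = positive class).
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
labels = [False, False, True, True, True, False]

def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by lowering the threshold step by step."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is positive
    tp = fp = 0
    for _, label in sorted(zip(scores, labels), reverse=True):
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

points = roc_points(scores, labels)
print(points)
print(auc(points))
```

A perfect classifier would reach an area of 1, and a purely random one about 0.5.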

30
ROC for different algorithms
SGD: AUC = 0.9604938554008616
Random Forest: AUC = 0.9983436731328145

31
Change threshold: prediction changes
(5 or not 5)
y_scores = sgd_clf.decision_function([some_digit])
y_scores
# array([2164.22030239])

(1) If we use a small threshold
threshold = 0
y_some_digit_pred = (y_scores > threshold)
y_some_digit_pred
# array([ True])  classified as 5
(2) Big threshold
threshold = 3000
y_some_digit_pred = (y_scores > threshold)
y_some_digit_pred
# array([False])  the prediction has changed
32
Summary
 You now know:
 how to train binary classifiers,
 choose the appropriate metric for your task,
 evaluate your classifiers using cross-validation,
 select the precision/recall tradeoff that fits your needs, and
 use ROC curves and ROC AUC scores to compare various models.

 NEXT, let’s try to detect more than just the 5s.

33
2. Multiclass Classification (0 1 … 9)
 Some Scikit-Learn classifiers (such as Logistic Regression, Random Forest, and naive Bayes classifiers) are capable of handling multiple classes natively.
 Others (such as SGD or Support Vector Machine classifiers) are strictly binary classifiers; Scikit-Learn automatically combines several of them (one-versus-the-rest or one-versus-one) for a multiclass task.

Algorithms:
(1) SGD
(2) Random Forest
(3) SVC: Support Vector Machine classifier (later)

34
(2.1) SGD
 The decision_function() method now returns one value per class.
sgd_clf.fit(X_train, y_train)  # train on all ten digit classes, not y_train_5
sgd_clf.decision_function([some_digit])
# array([[-15955.22628, -38080.96296, -13326.66695, 573.52692,
#         -17680.68466, 2412.53175, -25526.86498, -12290.15705,
#         -7946.05205, -10631.35889]])
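
The predicted class is the one with the highest score. A minimal sketch, using the ten scores printed above (index k corresponds to digit k):

```python
# Scores copied from the decision_function() output above, one per class 0..9.
scores = [-15955.22628, -38080.96296, -13326.66695, 573.52692, -17680.68466,
          2412.53175, -25526.86498, -12290.15705, -7946.05205, -10631.35889]

predicted_class = scores.index(max(scores))
runner_up = sorted(range(len(scores)), key=lambda k: scores[k])[-2]
print(predicted_class, runner_up)  # 5 3
```

Only classes 5 and 3 receive a positive score for this image, so 3 is the closest competitor.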

 Use the cross_val_score() function to evaluate the SGDClassifier’s accuracy:
from sklearn.model_selection import cross_val_score
cross_val_score(sgd_clf, X_train, y_train, cv=3, scoring="accuracy")
# array([0.87365, 0.85835, 0.8689 ])
35
Scale the data (mean 0, standard deviation 1)
 Simply scaling the inputs (as discussed in Chapter 2) increases accuracy above 89%:
import numpy as np
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train.astype(np.float64))

cross_val_score(sgd_clf, X_train_scaled, y_train, cv=3, scoring="accuracy")
# array([0.89707059, 0.8960948 , 0.90693604])
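
What standardization does to a single feature can be sketched in plain Python. The pixel values are hypothetical; the formula (subtract the mean, divide by the population standard deviation) is the one `StandardScaler` applies to each column.

```python
import statistics

# Hypothetical intensities of one pixel across five images (0-255 grayscale).
pixel_values = [0.0, 64.0, 128.0, 192.0, 255.0]

mean = statistics.fmean(pixel_values)
std = statistics.pstdev(pixel_values)  # population standard deviation

scaled = [(v - mean) / std for v in pixel_values]
print(scaled)
# After scaling, the feature has mean 0 and standard deviation 1.
print(round(statistics.fmean(scaled), 6), round(statistics.pstdev(scaled), 6))
```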

36
2.2 RandomForestClassifier
Just switch the classifier from SGD to Random Forest:
from sklearn.ensemble import RandomForestClassifier

forest_clf = RandomForestClassifier(random_state=42)

cross_val_score(forest_clf, X_train_scaled, y_train, cv=3, scoring="accuracy")
# array([0.96445, 0.96255, 0.96645])

WE GOT MUCH BETTER RESULTS USING RANDOM FOREST

37
If this were a real project
 Error Analysis:
 follow the project checklist (see Appendix B)
 try out multiple models
 fine-tune their hyperparameters using GridSearchCV
 look at the confusion matrix

38
Visualization
# conf_mx = confusion_matrix(y_train, y_train_pred), computed on the multiclass predictions
plt.matshow(conf_mx, cmap=plt.cm.gray)
plt.show()
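
Raw counts make large classes look worse than they are, so it can help to plot error rates instead. The sketch below uses a hypothetical 3-class matrix, not the MNIST result: each row is divided by the number of real instances of that class, and the diagonal is blanked so only the errors remain.

```python
# Hypothetical 3-class confusion matrix (rows: real class, columns: prediction).
conf_mx = [[50, 2, 3],
           [4, 40, 6],
           [1, 9, 90]]

error_rates = []
for i, row in enumerate(conf_mx):
    row_sum = sum(row)
    # Divide by the number of real instances of the class, and blank the
    # diagonal so only the errors remain.
    error_rates.append([0.0 if i == j else count / row_sum
                        for j, count in enumerate(row)])

worst = max((rate, i, j) for i, row in enumerate(error_rates)
            for j, rate in enumerate(row))
print(worst)  # (rate, real class, predicted class) of the most common confusion
```

In this toy matrix the largest raw error count (9) is not the largest error rate, which is the reason for normalizing first.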

39
SGD model: examples of 3s and 5s

 Bad handwriting

40
3. (skip) Multilabel Classification
 So far: each sample was assigned to just one class
 Multilabel classification: multiple classes for each instance
 Example: face recognition
 Trained on three faces: Alice, Bob, and Charlie.
 Output for a picture of Alice and Charlie:
[1, 0, 1]
(meaning “Alice yes, Bob no, Charlie yes”).
 Classifier outputs: multiple binary tags

41
4. (skip) Multioutput Classification
 Multioutput–multiclass classification (or simply multioutput classification).
 It is simply a generalization of multilabel classification where each label can be multiclass (i.e., it can have more than two possible values).
 Example: a system that removes noise from images. It will take as input a noisy digit image, and it will (hopefully) output a clean digit image, represented as an array of pixel intensities, just like the MNIST images.
 Notice that the classifier’s output is multilabel (one label per pixel) and each label can have multiple values (pixel intensity ranges from 0 to 255). It is thus an example of a multioutput classification system.
42
Summary
 Classification
 Binary (yes, no)
 Multiclass (0, 1, …, 9)

 Evaluation Metrics:
 Confusion Matrix
 Accuracy
 ROC/AUC
 Cross-Validation

43
Trade-off

44
Optional HW: Data augmentation
(no turning in)

45
END

• Read book Chapter 3


• Practice code from this Chapter
• Practice HW (No submission)
