
LAHORE UNIVERSITY OF MANAGEMENT SCIENCES

Syed Babar Ali School of Science and Engineering

EE514/CS535 Machine Learning


Spring Semester 2021

Programming Assignment 2 – k-Nearest Neighbors Classification

Issued: Tuesday 9th February, 2021

Total Marks: 100


Submission: 11:55 pm, Thursday 18th February, 2021.
Contribution to Final Percentage: 4%

Goal
The goal of this assignment is to familiarize you with k-NN classification and to give
you hands-on experience with the basic Python tools and libraries used in implementing
the algorithm. You will learn the following:
• Feature engineering, i.e., extracting features from raw data using different techniques.
• Implementation of the k-NN algorithm from scratch.
• Classification of images using the k-NN algorithm.
• Evaluation of the performance of your classifier using different evaluation metrics
such as the confusion matrix, F1 score, and accuracy.

Instructions
• Submit your code both as a notebook file (.ipynb) and as a Python script (.py) on LMS.
The name of both files should be your roll number. Failing to submit either of
them will result in a reduction of marks.
• The code MUST be implemented independently. Any plagiarism or cheating of work
from others or the internet will be immediately referred to the DC.
• 10% penalty per day for 3 days after the due date. No submissions will be accepted
after that.
• Use a procedural programming style and comment your code properly.

Part 1: Implement k-NN classifier from scratch (35 Marks)
NOTE:
You are not allowed to use scikit-learn or any other machine learning toolkit for this part.
You have to implement your own k-NN classifier from scratch. You may use Pandas,
NumPy, Matplotlib, and other standard Python libraries.

Dataset:
The dataset contains 10,000 images of dogs and cats which have already been split
(80%, 20%) into training and test data. There are two top-level directories [test set/,
training set/] corresponding to the test set and training set respectively. Each of these
directories further contains two directories [cats/, dogs/] comprising images of cats and
dogs. The class label of each image corresponds to the directory it is contained in, i.e.,
cat/dog.
Dataset: Dogs and Cats Images

Feature Extraction:
In the feature extraction step, you first have to read the images, which will give you a
multi-dimensional array containing the RGB pixel intensities of each image. Using raw
pixel values is the simplest way to create features from an image, but for this part we will
use the HOG (Histogram of Oriented Gradients) feature descriptor to extract features
from the image data. The HOG descriptor focuses on the structure or shape of an object:
it identifies whether a pixel is part of an edge, as well as the edge direction, by extracting
the gradient and orientation (or magnitude and direction) of the edges. You can use the
skimage.feature module to extract HOG features from an image.
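As an illustration, a minimal feature-extraction sketch for a single image might look as
follows (the image size and HOG parameters are illustrative choices, not requirements of
the assignment; channel_axis requires scikit-image 0.19 or later, while older versions use
multichannel=True instead):

from skimage.io import imread
from skimage.transform import resize
from skimage.feature import hog

def extract_hog(path, size=(128, 128)):
    # Read the image as an RGB pixel-intensity array and resize it so
    # that every feature vector has the same length.
    image = resize(imread(path), size)
    # hog() returns a one-dimensional feature vector describing the
    # distribution of edge orientations in the image.
    return hog(image,
               orientations=9,
               pixels_per_cell=(8, 8),
               cells_per_block=(2, 2),
               channel_axis=-1)  # last axis holds the RGB channels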

Tasks:
1. Create your own k-Nearest Neighbors classifier function by performing the following
tasks (a minimal sketch of these steps appears after this task list):
• For a test data point, find its distance from all training instances.
• Sort the calculated distances in ascending order.
• Choose the k training samples with the minimum distances from the test data point.
• Return the most frequent class among these samples.
(Your function should work with Euclidean distance as well as Manhattan
distance. Pass the distance metric as a parameter to the k-NN classifier function.
Your function should also be general enough to work with any value of k.)
2. Implement an evaluation function which calculates the classification accuracy, F1
score, and confusion matrix of your classifier on the test set. What significance
does the F1 score hold, and why is it a better metric than accuracy?
3. Run your k-NN function for the values of k = 1, 2, 3, 4, 5, 6, 7. Do this for both the
Euclidean distance and the Manhattan distance for each value of k. Formulas for

both are listed below:

Euclidean Distance:

d(\vec{p}, \vec{q}) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + (p_3 - q_3)^2 + \cdots + (p_n - q_n)^2}

Manhattan Distance:

d(\vec{p}, \vec{q}) = |p_1 - q_1| + |p_2 - q_2| + |p_3 - q_3| + \cdots + |p_n - q_n|

4. For the even values of k given in the above task, break ties by backing off to k − 1.
(Assume that you take the k = 4 nearest neighbors of a particular image, and
two of them have the label "cat" while the other two have the label "dog". In this
case you break the tie by backing off to the k = 3 nearest neighbors.)
5. Present the results as a graph with k values on the x-axis and F1 score on the y-axis.
Use a single plot to compare the two versions of the classifier (one using the Euclidean
and the other the Manhattan distance metric). The graphs should be properly labelled.
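A minimal sketch of Tasks 1 and 4, assuming the training features and labels are stored
in NumPy arrays X_train and y_train (assumed names, not part of the assignment):

import numpy as np
from collections import Counter

def knn_predict(x, X_train, y_train, k, metric="euclidean"):
    # Distance from the test point to every training instance.
    if metric == "euclidean":
        dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    elif metric == "manhattan":
        dists = np.abs(X_train - x).sum(axis=1)
    else:
        raise ValueError("metric must be 'euclidean' or 'manhattan'")
    # Sort distances in ascending order and keep the k nearest labels.
    nearest = [y_train[i] for i in np.argsort(dists)[:k]]
    counts = Counter(nearest).most_common()
    # Tie between the two most frequent classes: back off to k - 1
    # neighbors, as required by Task 4.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return knn_predict(x, X_train, y_train, k - 1, metric)
    return counts[0][0]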

Part 2: k-NN classifier using scikit-learn (20 marks)
In this part you have to use scikit-learn’s k-NN implementation to train and test your
classifier on the dataset used in Part 1. Run the k-NN classifier again for values of
k = 1, 2, 3, 4, 5, 6, 7 using both Euclidean and Manhattan distance. Use scikit-learn's
accuracy_score function to calculate the accuracy, f1_score to calculate the F1 score, and
confusion_matrix to calculate the confusion matrix on the test data. Also present the
results as a graph with k values on the x-axis and F1 score on the y-axis for both
distance metrics in a single plot.
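A sketch of this part, assuming X_train, y_train, X_test, and y_test hold the HOG
features and labels from Part 1 (assumed names), and that "cat" is taken as the positive
class for the binary F1 score:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix

for metric in ("euclidean", "manhattan"):
    for k in range(1, 8):
        clf = KNeighborsClassifier(n_neighbors=k, metric=metric)
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        print(metric, k,
              accuracy_score(y_test, y_pred),
              f1_score(y_test, y_pred, pos_label="cat"),  # assumed positive class
              confusion_matrix(y_test, y_pred))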

Part 3: Implement a k-NN classifier for a multi-class data set (45 marks)
Dataset:
You will be using the weather data set for this part, which can be found here. There are
two top-level directories [Test data/, Training data/] corresponding to the test set and
training set respectively. There are 899 images in the training data and 224 images in
the test data. Each of these directories further contains four directories [Cloudy/, Rain/,
Shine/, Sunrise/] comprising images of the relevant weather conditions. The class label
of each image corresponds to the directory it is contained in, i.e., Cloudy/Rain/Shine/Sunrise.

Feature Extraction:
In the feature extraction step, you have to read the images and then resize them to a
fixed size (32, 32), which can be done using this function. After that, flatten the RGB
pixel intensities of each image (a multi-dimensional array obtained by reading the
image) into a single list of numbers, i.e., a one-dimensional array. Doing so, you will
get a NumPy array of shape (32 × 32 × 3,) = (3072,) for each image, which will serve as
your feature vector for that image. You can use cv2 and NumPy to implement these
steps.
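A minimal sketch of this step, assuming OpenCV (cv2) is installed and path points at
one weather image:

import cv2

def raw_pixel_features(path, size=(32, 32)):
    image = cv2.imread(path)         # BGR pixel-intensity array
    image = cv2.resize(image, size)  # fixed 32 x 32 size
    return image.flatten()           # shape (32*32*3,) = (3072,)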

Tasks:
In this part you have to implement a generalized form of the k-NN classifier that can
classify a data set with N classes. You have to repeat all the steps that you implemented
in Part 1, this time ensuring that every step is scaled up from the binary case (c = 2)
to the generalized form (c = N). The evaluation function should now output the macro-
average F1 score along with the accuracy and confusion matrix.
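For reference, the macro-average F1 score is the unweighted mean of the per-class F1
scores. A sketch computed from an N × N confusion matrix, assuming rows hold the
true classes and columns the predicted classes (an assumed convention):

import numpy as np

def macro_f1(confusion):
    confusion = np.asarray(confusion, dtype=float)
    tp = np.diag(confusion)
    # Column sums = predicted counts, row sums = true counts per class;
    # the small epsilon guards against division by zero.
    precision = tp / np.maximum(confusion.sum(axis=0), 1e-12)
    recall = tp / np.maximum(confusion.sum(axis=1), 1e-12)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    # Macro average: unweighted mean over the N classes.
    return f1.mean()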
