
CHAPTER # 10

SUPERVISED LEARNING & ITS ALGORITHMS


PART # 01

COURSE INSTRUCTORS:
• ENGR. FARHEEN QAZI
• ENGR. SAJJAD IMAM
• ENGR. SUNDUS ZEHRA

DEPARTMENT OF SOFTWARE ENGINEERING


SIR SYED UNIVERSITY OF ENGINEERING & TECHNOLOGY
AGENDA

• Supervised Learning
• Categories of Supervised Learning
• K-Nearest Neighbors Algorithm
• Working of K-Nearest Neighbors Algorithm
• Graphical Example
• Applications of K-NN
• Advantages
• Disadvantages
• Summary
SUPERVISED LEARNING

• Supervised learning, as the name indicates, involves the presence of a supervisor acting as a teacher.

• Basically, supervised learning is learning in which we teach or train the machine using data that is well labeled, meaning the data is already tagged with the correct answers.
• After that, the machine is provided with a new set of examples (data), so that the supervised learning algorithm, having analyzed the training data (the set of training examples), produces a correct outcome from the labeled data.
CONTD….

• For instance, suppose you are given a basket filled with different kinds of fruit. The first step is to train the machine on all the different fruits, one by one, like this:

• If the shape of the object is rounded with a depression at the top and its color is red, then it is labeled as Apple.
• If the shape of the object is a long curving cylinder and its color is green-yellow, then it is labeled as Banana.
CONTD….

• Now suppose that, after training, the machine is given a new fruit from the basket, say a banana, and is asked to identify it.

• Since the machine has already learned from the previous data, it now has to use that knowledge wisely. It first classifies the fruit by its shape and color, confirms the fruit's name as BANANA, and puts it in the Banana category.
• Thus the machine learns from the training data (the basket of fruit) and then applies that knowledge to the test data (the new fruit).
CATEGORIES OF SUPERVISED LEARNING

Supervised learning is classified into two categories of algorithms:

• Classification: A classification problem is when the output variable is a category, such as “red” or “blue”, or “disease” and “no disease”.
• Regression: A regression problem is when the output variable is a real value, such as “dollars” or “weight”.
CATEGORIES OF SUPERVISED LEARNING

Supervised Learning
• Classification: Decision Tree, Naïve Bayes, K-Nearest Neighbor
• Regression: Linear Regression
OVERFITTING AND UNDERFITTING IN MACHINE LEARNING
• Overfitting and underfitting are the two main problems that occur in machine learning and degrade the performance of machine learning models.

OVERFITTING
• Overfitting occurs when our machine learning model tries to cover all the data points, or more data points than required, in the given dataset.
• Because of this, the model starts capturing the noise and inaccurate values present in the dataset, and all these factors reduce the efficiency and accuracy of the model.
• An overfitted model has low bias and high variance.
• Overfitting is the main problem that occurs in supervised learning.
OVERFITTING EXAMPLE

• The concept of overfitting can be understood from the graph of a linear regression output:

• As we can see from the graph, the model tries to cover all the data points present in the scatter plot. It may look efficient, but in reality it is not.
• Because the goal of a regression model is to find the best-fit line, and here we have not found the best fit, the model will generate prediction errors.
UNDERFITTING

• Underfitting occurs when our machine learning model is not able to capture the underlying trend of the data.
• To avoid overfitting, the feeding of training data can be stopped at an early stage, but as a result the model may not learn enough from the training data.
• Consequently, it may fail to find the best fit for the dominant trend in the data.
• In the case of underfitting, the model is not able to learn enough from the training data, which reduces its accuracy and produces unreliable predictions.
• An underfitted model has high bias and low variance.
UNDERFITTING EXAMPLE

• We can understand underfitting from the output of a linear regression model:

• As we can see from the diagram, the model is unable to capture the data points present in the plot.
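As a brief illustrative sketch (not from the slides; the synthetic dataset and the polynomial degrees are assumptions), the following Python snippet contrasts underfitting and overfitting by fitting polynomials of increasing degree to the same noisy data:

    # Sketch: underfitting vs. overfitting with polynomial regression.
    # The synthetic dataset and the chosen degrees are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 20)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)  # noisy trend

    for degree in (1, 3, 15):  # degree 1 underfits, 3 fits well, 15 overfits
        coeffs = np.polyfit(x, y, degree)   # least-squares polynomial fit
        y_hat = np.polyval(coeffs, x)
        mse = np.mean((y - y_hat) ** 2)     # training error keeps shrinking with degree
        print(f"degree={degree:2d}  train MSE={mse:.4f}")

The degree-15 fit drives the training error toward zero while generalizing poorly, which is exactly the low-bias, high-variance behavior described above; the degree-1 fit shows the opposite, high-bias, low-variance case.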
K-NEAREST NEIGHBORS (KNN) ALGORITHM

The K-nearest neighbors (KNN) algorithm is a type of supervised ML algorithm which can be used for both classification and regression predictive problems. However, it is mainly used for classification problems in industry. The following two properties define KNN well:
• Lazy learning algorithm − KNN is a lazy learning algorithm because it does not have a specialized training phase; it uses all of the training data at classification time.
• Non-parametric learning algorithm − KNN is also a non-parametric learning algorithm because it does not assume anything about the underlying data.
WORKING OF K-NEAREST NEIGHBORS ALGORITHM

The K-nearest neighbors (KNN) algorithm uses feature similarity to predict the values of new data points, which means that a new data point is assigned a value based on how closely it matches the points in the training set. We can understand its working with the help of the following steps:
• Step 1 − To implement any algorithm, we need a dataset. So during the first step of KNN, we must load the training as well as the test data.
• Step 2 − Next, we need to choose the value of K, i.e. the number of nearest data points to consider. K can be any integer.
• Step 3 − For each point in the test data, do the following:
3.1 − Calculate the distance between the test point and each row of the training data using one of the distance metrics, namely Euclidean, Manhattan, or Hamming distance. The most commonly used metric is Euclidean distance.
CONTD….

3.2 − Based on the distance values, sort the rows in ascending order.
3.3 − Choose the top K rows from the sorted array.
3.4 − Assign a class to the test point based on the most frequent class among these K rows.
Step 4 − End
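A minimal Python sketch of Steps 1−4 (the function name and the data layout are assumptions chosen for illustration, not slide material):

    # Minimal KNN classifier following Steps 1-4 above.
    import math
    from collections import Counter

    def knn_predict(train, query, k=3):
        """train: list of (features, label) pairs; query: sequence of feature values."""
        # Step 3.1: Euclidean distance from the query to every training row
        distances = [(math.dist(features, query), label) for features, label in train]
        # Steps 3.2 and 3.3: sort in ascending order and keep the top K rows
        nearest = sorted(distances)[:k]
        # Step 3.4: assign the most frequent class among those K neighbors
        return Counter(label for _, label in nearest).most_common(1)[0][0]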
DISTANCE MEASURE
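The distance-measure slide itself is a figure that is not reproduced in this text; as a reference, here are the standard definitions of the three metrics named in Step 3.1, sketched in Python:

    # Standard distance measures used by KNN.
    import math

    def euclidean(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    def manhattan(p, q):
        return sum(abs(a - b) for a, b in zip(p, q))

    def hamming(p, q):
        # Count of positions where the components differ (for categorical/binary data).
        return sum(a != b for a, b in zip(p, q))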
GRAPHICAL EXAMPLE
The following example illustrates the concept of K and the working of the KNN algorithm.
Suppose we have a dataset that can be plotted as follows:
CONTD….
Now we need to classify a new data point, shown as a black dot (at point (60, 60)), into the blue or red class. We assume K = 3, i.e. the algorithm finds the three nearest data points, as shown in the diagram:

We can see in the diagram the three nearest neighbors of the black-dot data point. Among those three, two lie in the red class, hence the black dot is also assigned to the red class.
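The slide's scatter data is only given as a figure, so the sketch below reproduces the idea with synthetic red and blue clusters (the cluster centers are assumptions) using scikit-learn's KNeighborsClassifier:

    # Classify the black dot at (60, 60) with K = 3 on synthetic red/blue clusters.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(1)
    red = rng.normal(loc=(65.0, 65.0), scale=5.0, size=(20, 2))   # cluster near the query
    blue = rng.normal(loc=(30.0, 30.0), scale=5.0, size=(20, 2))  # cluster far away
    X = np.vstack([red, blue])
    y = ["red"] * 20 + ["blue"] * 20

    clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
    print(clf.predict([[60, 60]])[0])  # expected: red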
EXAMPLE 1
Question: We have data from a questionnaire survey (asking people's opinions) and from objective testing, with two attributes (acid durability and strength), to classify whether a special paper tissue is good or not. Here are four training samples:

a = Acid Durability   b = Strength          Y = Classification
(seconds)             (kg/square meter)
7                     7                     Bad
7                     4                     Bad
3                     4                     Good
1                     4                     Good

Now the factory produces a new paper tissue that passes the laboratory test with a = 3 and b = 7. Without another expensive survey, can we guess what the classification of this new tissue is?
CONTD….
1. Determine the parameter K = the number of nearest neighbors.
• Suppose we use K = 3.
2. Calculate the distance between the query instance and all the training samples.
• The coordinate of the query instance is (3, 7). (In practice the squared distance is often used instead, since it avoids the square root and is faster to compute without changing the ranking; the full Euclidean distances are shown here.)

a = Acid Durability   b = Strength          Euclidean Distance to Query Instance (3, 7)
(seconds)             (kg/square meter)
7                     7                     d = √((3 − 7)² + (7 − 7)²) = 4
7                     4                     d = √((3 − 7)² + (7 − 4)²) = 5
3                     4                     d = √((3 − 3)² + (7 − 4)²) = 3
1                     4                     d = √((3 − 1)² + (7 − 4)²) ≈ 3.6

CONTD….
3. Sort the distances and determine the nearest neighbors based on the K-th minimum distance.

a = Acid Durability   b = Strength          Euclidean Distance to              Rank (Minimum   Included in 3-Nearest
(seconds)             (kg/square meter)     Query Instance (3, 7)              Distance)       Neighbors?
7                     7                     d = √((3 − 7)² + (7 − 7)²) = 4     3               Yes
7                     4                     d = √((3 − 7)² + (7 − 4)²) = 5     4               No
3                     4                     d = √((3 − 3)² + (7 − 4)²) = 3     1               Yes
1                     4                     d = √((3 − 1)² + (7 − 4)²) ≈ 3.6   2               Yes
CONTD….
4. Gather the categories of the nearest neighbors. Notice in the second row that the category of the nearest neighbor (Y) is not included, because the rank of that sample is greater than 3 (= K).

a = Acid Durability   b = Strength          Euclidean Distance to              Rank (Minimum   Included in 3-Nearest   Y = Category of
(seconds)             (kg/square meter)     Query Instance (3, 7)              Distance)       Neighbors?              Nearest Neighbor
7                     7                     d = √((3 − 7)² + (7 − 7)²) = 4     3               Yes                     Bad
7                     4                     d = √((3 − 7)² + (7 − 4)²) = 5     4               No                      -
3                     4                     d = √((3 − 3)² + (7 − 4)²) = 3     1               Yes                     Good
1                     4                     d = √((3 − 1)² + (7 − 4)²) ≈ 3.6   2               Yes                     Good
CONTD….

5. Use a simple majority of the categories of the nearest neighbors as the prediction for the query instance.
• We have 2 Good and 1 Bad; since 2 > 1, we conclude that the new paper tissue that passes the laboratory test with a = 3 and b = 7 falls into the Good category.
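The hand computation above can be checked with a short Python sketch that mirrors the worked steps (the code itself is not slide material):

    # Reproduce Example 1: classify the new tissue (a = 3, b = 7) with K = 3.
    import math
    from collections import Counter

    train = [((7, 7), "Bad"), ((7, 4), "Bad"), ((3, 4), "Good"), ((1, 4), "Good")]
    query = (3, 7)

    # Distances, sorted ascending; keep the K = 3 nearest.
    nearest = sorted((math.dist(p, query), label) for p, label in train)[:3]
    print(nearest)  # [(3.0, 'Good'), (3.6055..., 'Good'), (4.0, 'Bad')]
    print(Counter(label for _, label in nearest).most_common(1)[0][0])  # Good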
EXAMPLE 2
Question: Consider a dataset with two variables, weight (kg) and height (cm), where each point is classified as Normal or Underweight.

Weight (X1)   Height (Y1)   Class
51            167           Underweight
62            182           Normal
69            176           Normal
64            173           Normal
65            172           Normal
56            174           Underweight
58            169           Normal
57            173           Normal
55            170           Normal

On the basis of the given data, we have to classify a new, unlabeled sample as Normal or Underweight using KNN (K = 3).
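The new sample's measurements are not reproduced in this text, so the sketch below uses a hypothetical query of 57 kg and 170 cm; substitute the actual values from the slide:

    # Example 2 sketch: (57, 170) is a HYPOTHETICAL (weight kg, height cm) query,
    # since the actual new sample is not given in the text above.
    import math
    from collections import Counter

    train = [
        ((51, 167), "Underweight"), ((62, 182), "Normal"), ((69, 176), "Normal"),
        ((64, 173), "Normal"), ((65, 172), "Normal"), ((56, 174), "Underweight"),
        ((58, 169), "Normal"), ((57, 173), "Normal"), ((55, 170), "Normal"),
    ]
    query = (57, 170)  # hypothetical

    nearest = sorted((math.dist(p, query), label) for p, label in train)[:3]
    print(Counter(label for _, label in nearest).most_common(1)[0][0])  # Normal for this query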
APPLICATIONS OF K-NN
The following are some of the areas in which KNN can be applied successfully:
• Banking System
KNN can be used in a banking system to predict whether an individual is fit for loan approval, i.e. whether that individual has characteristics similar to those of defaulters.
• Calculating Credit Ratings
KNN algorithms can be used to find an individual's credit rating by comparing it with those of persons having similar traits.
• Politics
With the help of KNN algorithms, we can classify a potential voter into various classes such as “Will Vote”, “Will Not Vote”, “Will Vote for Party ‘PTI’”, or “Will Vote for Party ‘PMLN’”.
• Other areas in which the KNN algorithm can be used include speech recognition, handwriting detection, image recognition, and video recognition.
ADVANTAGES

• It is a very simple algorithm to understand and interpret.

• It is very useful for nonlinear data because the algorithm makes no assumptions about the data.
• It is a versatile algorithm, as we can use it for classification as well as regression.
• It has relatively high accuracy, although there are much better supervised learning models than KNN.
DISADVANTAGES

• It is a computationally somewhat expensive algorithm, because it stores all of the training data.

• It requires high memory storage compared to other supervised learning algorithms.
• It is very sensitive to the scale of the data as well as to irrelevant features.
SUMMARY

• KNN is conceptually simple, yet able to solve complex problems.

• It can work with relatively little information.
• Learning is simple (there is no explicit training phase at all).
• Memory and CPU costs are high.
• Feature selection is a problem.
• It is sensitive to the data representation.
