Slide 2 ML Basics

24CSA524: Machine Learning

Remya Rajesh
K Nearest Neighbors Classification
SCATTER PLOT
Points from the visualization (scatter plot):
• The two dimensions are the two features of the dataset (number_of_malignant_nodes, age)
• Target: colour-coded – survived (blue), did not survive (red)
• number_of_malignant_nodes – range of values – (0, 25]
• Age – range of values – (0, 60]
• Each point in the plot corresponds to a patient
• Number of points (50) = number of patients (50) in the dataset
• Each patient is identified by the values of the two features (number_of_malignant_nodes, age)
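A minimal sketch for producing such a plot (assuming a pandas DataFrame df with columns number_of_malignant_nodes, age, and a binary survived column; these names are illustrative):

import matplotlib.pyplot as plt

# colour each patient by outcome: blue = survived, red = did not survive
colors = df['survived'].map({1: 'blue', 0: 'red'})
plt.scatter(df['number_of_malignant_nodes'], df['age'], c=colors)
plt.xlabel('number_of_malignant_nodes')
plt.ylabel('age')
plt.show()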
K Nearest Neighbors Classification
[Figure slides illustrating KNN classification]
What is Needed to Select a KNN Model?

• Correct value for 'K'
• How to measure closeness of neighbors?
Decision Boundary
Measurement of Distance
Euclidean Distance (L2 Distance)
Manhattan Distance (L1 or City Block Distance)
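For reference, a minimal NumPy sketch of both metrics (the two points are illustrative):

import numpy as np

x = np.array([3.0, 45.0])   # e.g., (number_of_malignant_nodes, age)
y = np.array([5.0, 52.0])

# Euclidean (L2): square root of the summed squared differences
l2 = np.sqrt(np.sum((x - y) ** 2))

# Manhattan (L1): sum of the absolute differences
l1 = np.sum(np.abs(x - y))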
KNN for Classification
• Load the data
• Preprocess the data
• Choose the value of K and define the distance metric
• Compute distances between the test point and all training points using the
chosen distance metric.
• Sort the distances in ascending order.
• Select the K nearest neighbors (smallest distances).
• Vote for the most frequent class among the K neighbors (majority rule).
• Assign the class label of the majority as the prediction.
• Evaluate the model
• Optimize the model
For regression
• Compute distances between the test point and all training points.
• Sort the distances in ascending order.
• Select the K nearest neighbors.
• Calculate the average (or weighted average) of the target values of
the K neighbors.
• Use the computed value as the prediction.
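A minimal from-scratch sketch of the steps above (assuming X_train is an (n, d) NumPy array and y_train a NumPy array of labels or targets; names are illustrative):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3, classify=True):
    # distances from the test point to every training point (Euclidean here)
    dists = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    # sort ascending and keep the K nearest neighbors
    nearest = np.argsort(dists)[:k]
    if classify:
        # majority vote among the K neighbors
        return Counter(y_train[nearest]).most_common(1)[0][0]
    # regression: average of the K neighbors' target values
    return y_train[nearest].mean()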
Feature Scaling is important
Comparison of Feature Scaling Methods

• Standard Scaler: mean-center the data and scale to unit variance
v' = (v − μ_A) / σ_A

• Minimum-Maximum Scaler: scale data to a fixed range (usually 0–1)
v' = ((v − min_A) / (max_A − min_A)) · (new_max_A − new_min_A) + new_min_A
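A quick worked example with illustrative numbers: for v = 70 with mean μ_A = 60 and standard deviation σ_A = 10, the standard scaler gives v' = (70 − 60)/10 = 1.0; with min_A = 20, max_A = 70 and a target range of 0–1, the min-max scaler gives v' = (70 − 20)/(70 − 20) = 1.0.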
Python

• NumPy, SciPy, Pandas: numerical computation
• Matplotlib, Seaborn: data visualization
• Scikit-learn: machine learning
Example:
Import the class containing the scaling method
from sklearn.preprocessing import StandardScaler
Create an instance of the class
StdSc = StandardScaler()
Fit the scaling parameters and then transform the data
StdSc = StdSc.fit(X_data)
X_scaled = StdSc.transform(X_data)

Other scaling method: MinMaxScaler


K Nearest Neighbors: The Syntax

Import the class containing the classification method


from sklearn.neighbors import KNeighborsClassifier
Create an instance of the class
KNN = KNeighborsClassifier(n_neighbors=3)
Fit the instance on the data and then predict the expected value
KNN = KNN.fit(X_data, y_data)
y_predict = KNN.predict(X_data)

Regression can be done with KNeighborsRegressor
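A minimal evaluation sketch (assuming held-out X_test and y_test from the train/test split introduced later):

from sklearn.metrics import accuracy_score

y_test_pred = KNN.predict(X_test)
print(accuracy_score(y_test, y_test_pred))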


Characteristics of KNN model

• KNN is a non-parametric algorithm (it learns no fixed set of model parameters; it stores the training data instead)
• Fast to train, because fitting simply stores the data
• Slow to predict, because prediction requires many distance calculations
• Can require a lot of memory if the dataset is large
Hyperparameters to Tune

• K: Number of neighbors.
• Distance metric (e.g., Euclidean, Manhattan, etc.).
• Weighting scheme (uniform vs. distance-based), e.g., w = 1/distance
• Neighbor search algorithm (brute force, k-d tree, ball tree).

• Extra points: Use of TreeSet in Java
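A minimal tuning sketch using scikit-learn's grid search (the parameter values tried are illustrative):

from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

param_grid = {
    'n_neighbors': [1, 3, 5, 7, 9],
    'metric': ['euclidean', 'manhattan'],
    'weights': ['uniform', 'distance'],
}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid = grid.fit(X_data, y_data)
print(grid.best_params_)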


What We Talk About When We Talk About "Learning"
• Learning general models from data consisting of particular examples
• Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.
• Build a model that is a good and useful approximation/representation of the data!
• Describe/summarize the data in the form of a model

Learning: Knowledge iterates to improve

[Diagram: prior knowledge + data → Learning → knowledge; additional data feeds the next iteration]
Regression vs Classification
• Regression: the label y ∈ ℝ is a continuous variable
• e.g., price prediction
• Classification: the label y is a discrete variable
• e.g., predicting the type of residence:
(living room size, parking area size) → y ∈ {mansion, villa}?
[Diagram: Data + Algorithm → Model]
Neural Network model
Training and Test Splits
Using training and test data
Train and Test Splitting: The Syntax
• Import the train and test split function
from sklearn.model_selection import train_test_split
• Split the data and put 30% into the test set
train, test = train_test_split(data, test_size=0.3)
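• For a reproducible split, a fixed seed can be passed:
train, test = train_test_split(data, test_size=0.3, random_state=42)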
Requirements for an ML Model
• Hypothesis Function - represents the mathematical model that maps
input features (X) to output predictions (Y). Different models have
different hypothesis functions.
• Examples:
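• Standard instances: linear regression, hθ(x) = θ0 + θ1·x; logistic regression, hθ(x) = 1 / (1 + e^(−θᵀx))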

• Cost Function - represents how well the hypothesis function fits the
data. It quantifies the error between predicted and actual values.
Different models have different cost functions.
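• Standard instance: for linear regression the cost is the mean squared error, J(θ) = (1/2m) Σ_i (hθ(x_i) − y_i)²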
Supervised Learning
Classification
• Example: loan payment
• Differentiating between low-risk and high-risk customers from their income and savings

Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
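A minimal sketch of this discriminant as code (θ1 and θ2 are illustrative thresholds):

def risk(income, savings, theta1=50_000, theta2=20_000):
    # IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
    return 'low-risk' if income > theta1 and savings > theta2 else 'high-risk'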
Class C
(p1 ≤ price ≤ p2) AND (e1 ≤ engine power ≤ e2)
is a class rule for the positive examples
Hypothesis class H – the set of all possible rectangles

Choose a hypothesis h that predicts well on unseen examples (the "test set")

h(x) = 1 if h says x is positive
h(x) = 0 if h says x is negative

Generalization – how well the hypothesis classifies unseen data that is not part of the training set
Example:

Price       Engine Power   Y   h(x)
10,000,00   150            1   0
20,000,00   192            0   1
15,000,00   170            1   1
19,000,00   187            0   0

Empirical error of h – the proportion of training instances on which h(x) does not match the true label Y. In the table above, h(x) ≠ Y on 2 of the 4 instances, so the empirical error is 2/4 = 0.5.
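A minimal check of this computation on the table above:

import numpy as np

y = np.array([1, 0, 1, 0])   # true labels Y from the table
h = np.array([0, 1, 1, 0])   # predictions h(x) from the table
err = np.mean(h != y)        # 2 of 4 disagree -> 0.5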
Noise and Outliers
Noise – due to wrong data collection, wrong labelling, or other hidden (latent) attributes not considered here
Outliers – extreme cases

Linear Regression
Triple Trade-Off

• There is a trade-off between three factors (Dietterich, 2003):
1. Complexity, C, of the hypothesis class H
2. Training set size, N
3. Generalization error, Er, on new data
• As N increases, Er decreases.
• As C(H) increases, Er first decreases and then increases.
