Unit-3
● Square Root Rule: A common way to select K is to set it equal to the square root of
the number of data points in the training set.
● Testing Multiple K Values: You can try several K values and keep the one that performs best on validation data (see the sketch after this list).
● Weighted Voting: In cases where some neighbors are much closer than others, you
can give them more weight in the voting process.
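A minimal sketch of all three ideas, assuming scikit-learn is available; the iris dataset and the K range of 1–20 are illustrative stand-ins, not part of the original notes:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Square root rule: K close to sqrt(number of training points)
k_sqrt = int(np.sqrt(len(X)))
print("Square-root rule suggests K =", k_sqrt)

# Testing multiple K values: keep the K with the best cross-validated accuracy
best_k, best_score = None, -1.0
for k in range(1, 21):
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score
print("Best K by 5-fold CV:", best_k)

# Weighted voting: closer neighbors get proportionally more influence
knn_weighted = KNeighborsClassifier(n_neighbors=best_k, weights="distance")
knn_weighted.fit(X, y)
```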
Problems:
1. https://www.youtube.com/watch?v=HZT0lxD5h6k&list=PLoE_l1sqGKLRXcbobCaczeu5y4sk6yKrL&index=2
2. https://www.youtube.com/watch?v=YndCelYQfgs&list=PLoE_l1sqGKLRXcbobCaczeu5y4sk6yKrL&index=5
3. https://www.youtube.com/watch?v=T0YkfWssHjk&list=PLoE_l1sqGKLRXcbobCaczeu5y4sk6yKrL&index=6
4. https://www.youtube.com/watch?v=WgSGdukowNA&list=PLoE_l1sqGKLRXcbobCaczeu5y4sk6yKrL&index=8
5. https://www.youtube.com/watch?v=kCNpLbCUo7g&list=PLoE_l1sqGKLRXcbobCaczeu5y4sk6yKrL&index=8
Logistic Regression:
Logistic regression builds on the linear regression equation: it converts the linear output into a probability using the sigmoid function. The basic logistic regression equation is:
P(y = 1 | x) = σ(w·x + b) = 1 / (1 + e^(−(w·x + b)))
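A tiny numeric sketch of that equation; the weights, bias, and input below are made-up illustrative values, not fitted parameters:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-z))

# Linear part: z = w . x + b (illustrative values only)
w = np.array([0.8, -0.4])
b = 0.1
x = np.array([2.0, 1.5])

z = np.dot(w, x) + b
p = sigmoid(z)           # P(y = 1 | x)
label = int(p >= 0.5)    # threshold at 0.5 for a hard class decision
print(p, label)
```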
Example Problems
1.https://www.youtube.com/watch?v=2C8IqOLO1os
2.https://www.youtube.com/watch?v=UCOm-LFKX9E
A Perceptron is the simplest type of artificial neural network, serving as the foundation for
more complex neural networks. It is a supervised learning algorithm used for binary
classification, which means it classifies input data into one of two categories (e.g., 0 or 1,
Yes or No).
A perceptron is a binary, linear classifier: it predicts which of two classes an input belongs to using a linear predictor function that combines a weight vector with the feature vector.
● The Perceptron is a foundational algorithm in Machine Learning and Artificial Neural Networks.
● Invented by Frank Rosenblatt in 1957, it was designed for binary classification tasks.
● It is a linear, supervised learning algorithm used for classifying data into two categories.
Key Concepts
● Binary Classifier:
A function that determines whether an input belongs to one of two classes.
Perceptrons create a linear decision boundary for classification tasks.
● Components:
1. Input Layer: Accepts numerical input data.
2. Weights and Bias:
■ Weights: Measure the strength of the connection between input and output.
■ Bias: Allows shifting of the decision boundary.
3. Net Sum: Weighted sum of the inputs plus the bias (∑ wᵢ·xᵢ + b).
4. Activation Function: Determines the output (e.g., Step, Sign, Sigmoid functions).
How It Works
Learning Process:
● Training: The perceptron adjusts its weights using the perceptron learning rule, repeating passes over the training data until every example is classified correctly (this is guaranteed to terminate only when the data is linearly separable).
● Output: It predicts 1 or 0 depending on whether the weighted sum of inputs is above or below the threshold.
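A from-scratch NumPy sketch of that learning rule; the AND-gate data and learning rate are illustrative choices (AND is linearly separable, so the rule converges on it):

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=50):
    """Classic perceptron learning rule; converges only on linearly separable data."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = int(np.dot(w, xi) + b > 0)   # step activation
            update = lr * (target - pred)        # zero when prediction is correct
            w += update * xi
            b += update
    return w, b

# AND gate: linearly separable, so the perceptron can learn it
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([int(np.dot(w, xi) + b > 0) for xi in X])  # expected [0, 0, 0, 1]
```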
Limitations:
● Perceptrons can only solve linearly separable problems (e.g., classifying data that
can be separated by a straight line).
● They struggle with more complex tasks, which is why multi-layer perceptrons
(MLPs) were developed to solve nonlinear problems.
Applications: Basic binary classification tasks such as spam detection or image recognition
(for simple patterns).
A Single-Layer Perceptron is the simplest type of neural network. It is composed of a single layer
of neurons (or nodes) that process inputs and produce outputs. It’s used for solving binary
classification problems when the data is linearly separable.
Structure
1. Input Nodes: Receive the numerical feature values.
2. Weights: One per input, measuring each input's influence on the output.
3. Bias: Shifts the decision boundary.
4. Net Sum: The weighted sum of the inputs plus the bias.
5. Activation Function: A step function that converts the net sum into the output (0 or 1).
How It Works
1. Forward Pass:
○ Takes inputs, computes the weighted sum, and passes it through the step activation
function to produce an output (0 or 1).
2. Training:
○ Uses the Perceptron Learning Rule: for each training example, wᵢ ← wᵢ + η (y − ŷ) xᵢ and b ← b + η (y − ŷ), where η is the learning rate, y the true label, and ŷ the prediction (so nothing changes when the prediction is already correct).
Limitations
1. Linearity:
○ Can only classify data that is linearly separable (e.g., NOT, AND, OR).
○ Cannot handle non-linear problems like XOR.
2. Fixed Activation:
○ Uses a step function, which is not differentiable, limiting flexibility.
Multi-Layer Perceptron (MLP)
https://www.youtube.com/watch?v=tTjcakAuHPI&list=PL4gu8xQu0_5JBO1FKRO5p20wc8DprlOgn&index=94
A Multi-Layer Perceptron (MLP) is a type of neural network that consists of an input layer, one or
more hidden layers, and an output layer. It is capable of solving non-linear problems, making it
more powerful than the single-layer perceptron.
Structure of MLP
1. Input Layer:
○ Accepts the input features (x₁, x₂, ..., xₙ) from the dataset.
2. Hidden Layers:
○ One or more layers of neurons between the input and output layers.
○ Each neuron applies weights, biases, and a non-linear activation function (e.g., Sigmoid,
ReLU).
○ These layers enable the MLP to learn complex relationships.
3. Output Layer:
○ Produces the final output (e.g., class probabilities or regression values).
How It Works
1. Forward Propagation:
○ Inputs flow through the network layer by layer: each layer computes its weighted sum plus bias, applies its activation function, and passes the result to the next layer until the output layer produces a prediction.
2. Backward Propagation:
● Error (difference between predicted and actual output) is calculated using a loss function (e.g.,
Mean Squared Error, Cross-Entropy Loss).
● The error is propagated backward through the network using the chain rule to compute
gradients.
● Weights are updated using gradient descent or its variants (e.g., SGD, Adam): w ← w − η ∂L/∂w, where η is the learning rate and L is the loss.
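A minimal library-level sketch of this whole train loop, assuming scikit-learn; the digits dataset, single 64-unit hidden layer, and iteration count are illustrative choices:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Handwritten-digit features and labels (8x8 images flattened to 64 values)
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One hidden layer of 64 ReLU units; Adam is a gradient-descent variant
mlp = MLPClassifier(hidden_layer_sizes=(64,), activation="relu",
                    solver="adam", max_iter=500, random_state=0)
mlp.fit(X_train, y_train)  # runs forward + backward propagation internally
print("Test accuracy:", mlp.score(X_test, y_test))
```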
1. https://www.youtube.com/watch?v=tUoUdOdTkRw&list=PL4gu8xQu0_5JBO1FKRO5p20wc8DprlOgn&index=88
2. https://www.youtube.com/watch?v=n2L1J5JYgUk&list=PL4gu8xQu0_5JBO1FKRO5p20wc8DprlOgn&index=89
3. https://www.youtube.com/watch?v=AWhboi1aTxI&list=PL4gu8xQu0_5JBO1FKRO5p20wc8DprlOgn&index=90
Common Activation Functions
1. Sigmoid: Outputs values between 0 and 1, useful for probabilities.
2. ReLU (Rectified Linear Unit): Outputs max(0, x); helps avoid vanishing gradients.
3. Tanh: Outputs values between -1 and 1, useful for symmetric, zero-centered outputs.
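A quick NumPy rendering of the three functions, for reference:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # range (0, 1): good for probabilities

def relu(x):
    return np.maximum(0.0, x)          # max(0, x): cheap, avoids vanishing gradients

def tanh(x):
    return np.tanh(x)                  # range (-1, 1): zero-centered, symmetric
```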
Applications of MLP
1. Classification:
○ Handwritten digit recognition (e.g., MNIST dataset).
○ Sentiment analysis (positive/negative classification).
2. Regression:
○ Predicting housing prices.
3. Clustering and Feature Extraction:
○ Learning representations of data for tasks like anomaly detection.
4. Complex Tasks:
○ Image recognition, speech recognition, and natural language processing (with deep
extensions).
● XOR is not linearly separable, but an MLP can solve it with as few as two hidden neurons (see the sketch after this list):
1. Hidden neurons learn to split the input space into non-linear regions.
2. The output layer combines these regions to form the XOR logic.
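A from-scratch NumPy sketch of that idea: two sigmoid hidden neurons trained with backpropagation on the four XOR rows. The seed, learning rate, and epoch count are arbitrary illustrative choices, and a network this small can occasionally stall in a local minimum, in which case a different seed helps:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# XOR truth table: not linearly separable
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Two hidden neurons, one output neuron
W1 = rng.normal(size=(2, 2)); b1 = np.zeros((1, 2))
W2 = rng.normal(size=(2, 1)); b2 = np.zeros((1, 1))

lr = 1.0
for _ in range(5000):
    # Forward propagation
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward propagation (chain rule on squared error)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent weight updates
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out.ravel()))  # expected approximately [0, 1, 1, 0]
```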
Advantages
● Can learn non-linear relationships that a single-layer perceptron cannot.
● With enough hidden units, can approximate a very wide range of functions.
Limitations
● Needs more data and computation to train than a single perceptron.
● Prone to overfitting and sensitive to hyperparameters (learning rate, architecture, initialization).
● Harder to interpret than simpler models.
Perceptron:
● https://www.youtube.com/watch?v=ItkSCYzSD34&list=PL4gu8xQu0_5JBO1FKRO5p20wc8DprlOgn&index=91
● https://www.youtube.com/watch?v=du5fyS44DR8&list=PL4gu8xQu0_5JBO1FKRO5p20wc8DprlOgn&index=76
● https://www.youtube.com/watch?v=zmIzNBMsQYQ&list=PL4gu8xQu0_5JBO1FKRO5p20wc8DprlOgn&index=77
● https://www.youtube.com/watch?v=uxmNDNb0u9A&list=PL4gu8xQu0_5JBO1FKRO5p20wc8DprlOgn&index=78
● https://www.youtube.com/watch?v=dM_8Y41EgsY&list=PL4gu8xQu0_5JBO1FKRO5p20wc8DprlOgn&index=7
Support Vector Machines (SVM) is a powerful supervised machine learning algorithm used for
classification and regression tasks, though it is more commonly applied to classification
problems. SVM works by finding a hyperplane that best divides a dataset into different classes.
The goal of SVM is to find a decision boundary (or hyperplane) that maximizes the margin
between data points of different classes. The margin is the distance between the hyperplane and
the closest data points from either class, known as support vectors. The idea is that the wider
the margin, the better the generalization ability of the model.
● Linear SVM: If the data is linearly separable, a straight line (in 2D), a plane (in 3D),
or a hyperplane (in higher dimensions) can separate the two classes.
● Non-linear SVM: If the data is not linearly separable, SVM can still perform well by
transforming the data into a higher-dimensional space where a linear hyperplane can
be used for separation (through the kernel trick).
Margin: The margin is the distance between the decision boundary (hyperplane) and the closest data points from either class. SVM aims to maximize this margin to improve the model's generalization ability. A larger margin reduces the risk of overfitting.
Support Vectors: Support vectors are the data points closest to the decision boundary (hyperplane). These points are critical in defining the position and orientation of the hyperplane; they are the most important data points for constructing the model.
Hyperplane: A hyperplane is a decision boundary that separates data points of different classes. In 2D it is a line; in 3D, a plane; in higher dimensions, a hyperplane.
Linear SVM
When the data is linearly separable, SVM tries to find the hyperplane that best separates the two classes. The hyperplane can be represented as:
w·x + b = 0
Where:
● w is the weight vector (normal to the hyperplane),
● x is the input feature vector,
● b is the bias term.
Objective:
● Maximize the margin, which is defined as the distance from the hyperplane to the nearest data points (support vectors).
● The distance of a data point xᵢ from the hyperplane is given by:
|w·xᵢ + b| / ||w||
Non-linear SVM and Kernel Trick
In many real-world problems, the data is not linearly separable. In these cases, SVM uses the
kernel trick to map the data into a higher-dimensional feature space where it becomes possible
to find a linear hyperplane that separates the classes.
Using the kernel trick, the SVM's decision boundary in the higher-dimensional space is a
hyperplane, which is equivalent to a non-linear decision boundary in the original input space.
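A minimal sketch with scikit-learn, assuming the two-moons toy dataset as a stand-in for non-linearly-separable data; the gamma value is illustrative:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: not linearly separable in the original space
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma=1.0).fit(X, y)  # kernel trick: implicit high-dim mapping

print("Linear kernel accuracy:", linear_svm.score(X, y))
print("RBF kernel accuracy:   ", rbf_svm.score(X, y))
```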
SVM is inherently a binary classifier, but it can be extended to multi-class classification problems
using strategies like:
● One-vs-One (OvO): Create a classifier for every pair of classes. If there are k
classes, then the total number of classifiers is k(k−1)/2
● One-vs-Rest (OvR): Create a classifier for each class, where each classifier
distinguishes one class from all other classes.
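A short sketch of both strategies using scikit-learn's wrappers; the iris dataset, with k = 3 classes, is an illustrative choice (note that scikit-learn's SVC already applies OvO internally for multi-class input):

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # k = 3 classes, so OvO builds 3*(3-1)/2 = 3 classifiers

ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)
ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)

print("OvO pairwise classifiers:", len(ovo.estimators_))   # 3
print("OvR per-class classifiers:", len(ovr.estimators_))  # 3
```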
Advantages of SVM
● Effective in high-dimensional feature spaces.
● Memory-efficient: the decision function depends only on the support vectors.
● Versatile: different kernels adapt it to differently shaped data.
Disadvantages of SVM
● Computationally intensive: SVM can be slow for large datasets because of the
quadratic optimization problem it involves.
● Choice of kernel and hyperparameters: The performance of SVM is highly
dependent on the correct selection of kernel, C, and other hyperparameters.
Applications: Text classification (e.g., spam detection), image classification, and bioinformatics tasks such as protein classification.
Hyperparameter Tuning:
SVM models can be sensitive to hyperparameters, and improper selection can lead to poor
performance. Key hyperparameters include:
● C: Regularization parameter that controls the trade-off between margin size and
classification error.
● Kernel: Type of kernel to use (e.g., linear, RBF, polynomial).
● Gamma (for RBF kernel): A parameter that controls the influence of a single training
example on the decision boundary.
Techniques like grid search or random search can be used to tune these hyperparameters and
find the optimal configuration. Cross-validation is also essential to evaluate the model's
generalization ability.
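A minimal grid-search sketch with scikit-learn; the parameter values and the iris dataset are illustrative choices, and GridSearchCV's built-in cross-validation handles the generalization check:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {
    "C": [0.1, 1, 10, 100],            # margin size vs. classification-error trade-off
    "kernel": ["linear", "rbf"],
    "gamma": ["scale", 0.01, 0.1, 1],  # only used by the RBF kernel
}

# 5-fold cross-validation over every combination in the grid
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print("Best parameters:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
```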