Unit-2
Regularization, bias, and variance
Overfitting and underfitting are two common problems encountered in machine learning when building predictive models. They both relate to the ability of a model to generalize well to unseen data.

1. Overfitting: Overfitting occurs when a model learns the training data too well, capturing noise and random fluctuations in the data rather than the underlying patterns. As a result, an overfitted model performs very well on the training data but poorly on new, unseen data.
Characteristics of an overfitted model include excessively complex decision boundaries or relationships between features and target variables.
Overfitting often happens when a model is too flexible or has too many parameters relative to the amount of training data available.
Common remedies for overfitting include using simpler models, reducing the complexity of the model (e.g., by decreasing the number of features or parameters), or applying regularization techniques.

2. Underfitting: Underfitting occurs when a model is too simplistic to capture the underlying structure of the data. In other words, the model fails to learn the patterns in the training data and performs poorly both on the training data and new, unseen data.
Characteristics of an underfitted model include high training error and high test error, indicating that the model is not capturing the underlying relationships present in the data.
Underfitting often happens when a model is too simple or lacks the capacity to represent the complexity of the data adequately.
Common remedies for underfitting include using more complex models, increasing the number of features, or enhancing the model's capacity to capture the underlying patterns in the data.

To summarize, overfitting and underfitting represent two extremes in model performance, with overfitting occurring when a model is too complex and learns noise in the data, and underfitting occurring when a model is too simplistic and fails to capture the underlying patterns. Achieving an appropriate balance between model complexity and generalization is crucial for building effective machine learning models. Regularization techniques, cross-validation, and monitoring performance on validation or test datasets are essential strategies for mitigating overfitting and underfitting.

Regularization, bias, and variance are fundamental concepts in machine learning that are closely related to each other and play a crucial role in model performance and generalization.

1. Regularization: Regularization is a technique used to prevent overfitting in machine learning models by adding a penalty term to the model's objective function. This penalty discourages overly complex models by penalizing large parameter values.
The purpose of regularization is to find a balance between fitting the training data well and avoiding overfitting.
Common regularization techniques include L1 regularization (Lasso), L2 regularization (Ridge), and elastic net regularization, which combines the L1 and L2 penalties.
Regularization helps to improve a model's generalization performance by discouraging overly complex models that may perform well on the training data but poorly on new, unseen data.

2. Bias: Bias refers to the error introduced by approximating a real-world problem with a simplified model. It represents the difference between the average prediction of the model and the true value being predicted.
High bias models are overly simplistic and may fail to capture the underlying patterns in the data. These models often underfit the training data and perform poorly on both the training and test datasets.
Examples of high bias models include linear regression with few features or low polynomial degrees.

3. Variance: Variance refers to the amount by which the model's prediction would change if it were trained on a different dataset. It measures the sensitivity of the model to fluctuations in the training data.
High variance models are overly complex and tend to fit the training data too closely, capturing noise and random fluctuations in the data. These models often overfit the training data and perform well on the training dataset but poorly on new, unseen data.
Examples of high variance models include decision trees with deep branches or high polynomial degrees in polynomial regression.

Understanding the trade-off between bias and variance is essential in machine learning model selection and training. High bias models may benefit from increased model complexity or feature engineering to capture more complex patterns in the data, while high variance models may benefit from regularization techniques to reduce complexity and improve generalization.
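To make these ideas concrete, here is a minimal sketch, assuming Python with NumPy and scikit-learn, that fits an unregularized linear model, a Ridge (L2) model, and a Lasso (L1) model to the same noisy synthetic data; the dataset, the alpha values, and the variable names are illustrative assumptions rather than anything from these notes.

    # Minimal sketch: comparing an unregularized fit with Ridge (L2) and Lasso (L1).
    # The synthetic data and alpha values below are illustrative assumptions.
    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge, Lasso
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))                 # 20 features, only 3 informative
    true_coef = np.zeros(20)
    true_coef[:3] = [3.0, -2.0, 1.5]
    y = X @ true_coef + rng.normal(scale=0.5, size=100)   # noisy targets

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for name, model in [
        ("Unregularized", LinearRegression()),
        ("Ridge (L2)", Ridge(alpha=1.0)),          # penalizes large squared coefficients
        ("Lasso (L1)", Lasso(alpha=0.1)),          # can drive useless coefficients to zero
    ]:
        model.fit(X_train, y_train)
        mse = mean_squared_error(y_test, model.predict(X_test))
        print(f"{name}: test MSE = {mse:.3f}")

On data like this, the regularized fits usually generalize at least as well as the unregularized one, and Lasso tends to push the coefficients of the irrelevant features to exactly zero; the exact numbers depend on the random seed.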
and a hyperplane in higher-dimensional space.

3. Maximizing Margin: SVM aims to find the hyperplane that maximizes the margin between the two classes. The margin is the distance between the hyperplane and the closest data points (called support vectors) from each class. Maximizing the margin helps improve the generalization of the model.

4. Support Vectors: Support vectors are the data points closest to the hyperplane. These points are crucial for determining the hyperplane's position and orientation. Only the support vectors contribute to defining the hyperplane, while other data points can be ignored.

5. Kernel Trick: In cases where the data is not linearly separable, SVM uses a kernel trick to map the data into a higher-dimensional space where it can be separated linearly. Commonly used kernels include the polynomial kernel, radial basis function (RBF) kernel, and sigmoid kernel.

6. Classification: To classify a new data point, SVM checks which side of the hyperplane the point lies on. If it's on the positive side, the point belongs to one class; if it's on the negative side, it belongs to the other class.

Suppose you have a dataset of cats and dogs, where each data point has two features: weight and height. The goal is to classify new animals as either cats or dogs based on their weight and height.

Data Preparation: Scale the features (weight and height) to ensure they are in the same range.

Hyperplane: In this case, the hyperplane is a line in 2D space that separates cats from dogs.

Maximizing Margin: SVM finds the line that maximizes the margin between cats and dogs.

Support Vectors: The data points closest to the hyperplane are the support vectors. These points determine the hyperplane's position and orientation.

Kernel Trick: If the data is not linearly separable, SVM can use a kernel trick (e.g., polynomial kernel, RBF kernel) to transform the data into a higher-dimensional space where it can be separated linearly.

Classification: To classify a new animal, SVM checks which side of the line the animal lies on. If it's on the positive side, the animal is classified as a cat; if it's on the negative side, it's classified as a dog.
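The cats-and-dogs walk-through above can be written in a few lines of code. The sketch below is a minimal illustration, assuming Python with NumPy and scikit-learn; the weight/height values, the linear kernel, and the setting C=1.0 are made-up assumptions, not data from these notes.

    # Minimal SVM sketch for the cats-vs-dogs example.
    # The weight/height values below are made up for illustration.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Features: [weight in kg, height in cm]; labels: 0 = cat, 1 = dog
    X = np.array([[4.0, 25], [3.5, 23], [5.0, 28], [4.2, 26],        # cats
                  [20.0, 55], [25.0, 60], [18.0, 50], [30.0, 65]])   # dogs
    y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

    # Data preparation (scaling) followed by a linear-kernel SVM that maximizes the margin
    model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
    model.fit(X, y)

    # The support vectors (shown here in scaled feature space) define the separating line
    print("Support vectors:\n", model.named_steps["svc"].support_vectors_)

    # Classification: which side of the line does a new animal fall on?
    new_animal = np.array([[6.0, 30]])
    print("Predicted class:", model.predict(new_animal)[0])          # 0 = cat, 1 = dog

Swapping kernel="linear" for kernel="rbf" or kernel="poly" applies the kernel trick described above when the two classes cannot be separated by a straight line.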
Here's a figure to illustrate the concept of SVM:

In the figure, the hyperplane is the dashed line that separates the two classes (blue and red circles). The support vectors are the data points closest to the hyperplane (the filled circles on the dashed line). The margin is the distance between the hyperplane and the support vectors. SVM aims to maximize this margin while correctly classifying the data points.

Kernel methods

Kernel methods are a class of algorithms used in machine learning for various tasks, including classification, regression, and unsupervised learning. They are particularly popular in the context of Support Vector Machines (SVMs) for classification tasks. The kernel method allows SVMs to implicitly operate in a high-dimensional feature space without explicitly computing the transformation, thereby avoiding the computational burden associated with high-dimensional data.

Here's a step-by-step explanation of how the kernel method works in SVM:

5. Kernel Matrix: To use the kernel trick, we need to compute the kernel matrix, which is a matrix where each element (i, j) is the result of applying the kernel function to the ith and jth data points. The kernel matrix can be used to compute the inner products between all pairs of data points.

6. SVM in High-Dimensional Space: In the high-dimensional space, the SVM algorithm finds the hyperplane that separates the data points belonging to different classes. The hyperplane is defined by a set of weights (coefficients) and a bias term, just like in the original input space.

7. Classification: To classify a new data point, the SVM algorithm computes the inner product between the new data point and each support vector (data points closest to the hyperplane) in the high-dimensional space. The sign of the sum of these inner products determines the class of the new data point.

The kernel method is particularly useful when dealing with non-linear data, as it allows SVM to find a hyperplane that separates the data points in the original input space, even when they are not linearly separable. This makes SVM a powerful algorithm for a wide range of classification tasks.
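As a small illustration of the kernel matrix described in step 5, the sketch below, assuming Python with NumPy and scikit-learn, computes an RBF kernel matrix for a few 2D points and checks it against an element-by-element calculation; the sample points and the gamma value are illustrative assumptions.

    # Sketch: a kernel matrix whose entry (i, j) is k(x_i, x_j) for an RBF kernel.
    # The sample points and gamma value are illustrative assumptions.
    import numpy as np
    from sklearn.metrics.pairwise import rbf_kernel

    X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5], [3.0, 3.0]])

    # RBF (Gaussian) kernel: k(x, z) = exp(-gamma * ||x - z||^2)
    gamma = 0.5
    K = rbf_kernel(X, X, gamma=gamma)
    print("Kernel matrix:\n", np.round(K, 3))

    # The same matrix computed element by element, to show what each entry means
    K_manual = np.array([[np.exp(-gamma * np.sum((xi - xj) ** 2)) for xj in X] for xi in X])
    assert np.allclose(K, K_manual)

An SVM with a non-linear kernel works with exactly this kind of matrix internally, which is how it avoids mapping each data point into the high-dimensional space explicitly.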
Decision Trees are a popular and intuitive model used for both classification and regression tasks in machine learning. They are easy to understand and interpret, making them a valuable tool for many applications.

2. Decision Tree Learning: The process of constructing a decision tree involves recursively partitioning the input space into smaller regions based on the values of the input features. The goal is to create "pure" regions where all data points belong to the same class or have the same value.

3. Splitting Criteria: At each node, the decision tree algorithm selects a feature and a threshold value to split the data into two child nodes. The splitting criteria aim to maximize the "purity" of the resulting nodes, typically measured using metrics like Gini impurity or information gain.

4. Stopping Criteria: The decision tree algorithm continues to grow the tree until a stopping criterion is met, such as reaching a maximum depth, a minimum number of samples per node, or no further improvement in purity.

5. Predictions: To make predictions, the decision tree algorithm follows the decision rules from the root node to the leaf node that corresponds to the input data's feature values. The output at the leaf node is the predicted class label or value.

6. Example: Suppose we have a dataset of housing prices with features like square footage, number of bedrooms, and location. The goal is to predict the price of a house based on these features. A decision tree might have a root node that splits the data based on the square footage (e.g., if square footage > 2000, go left; otherwise, go right). The left child node might further split the data based on the number of bedrooms, and so on, until we reach leaf nodes with predicted prices (a code sketch of this example appears at the end of this section).

7. Advantages: Decision trees are easy to understand and interpret, can handle both numerical and categorical data, and can capture non-linear relationships between features and the target variable.

8. Disadvantages: Decision trees are prone to overfitting, especially with complex data, and can be sensitive to small variations in the data. They also tend to create biased trees when the class distribution is imbalanced.

Decision trees are a versatile and powerful tool in machine learning, and understanding their structure and learning process can help in building effective models for various tasks.
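To connect the housing-price example (point 6) to working code, here is a minimal sketch, assuming Python with NumPy and scikit-learn; the tiny hand-made dataset, the feature names, and the max_depth setting are illustrative assumptions.

    # Minimal decision-tree regression sketch for the housing-price example.
    # The tiny dataset and max_depth value are made up for illustration.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor, export_text

    # Features: [square footage, number of bedrooms]; target: price in thousands
    X = np.array([[1500, 2], [1800, 3], [2200, 3], [2600, 4], [3000, 4], [1200, 1]])
    y = np.array([200, 240, 310, 380, 450, 150])

    # Limiting the depth applies a stopping criterion and helps curb overfitting
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, y)

    # Inspect the learned splitting rules (root split, child splits, leaf values)
    print(export_text(tree, feature_names=["sqft", "bedrooms"]))

    # Predict the price of a new house by following the rules to a leaf node
    print(tree.predict([[2000, 3]]))

Capping max_depth, or requiring a minimum number of samples per node, is a simple way to apply the stopping criteria from point 4 and to limit the overfitting mentioned in point 8.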