Unit 3 ML

Supervised learning is a machine learning technique where models are trained on labeled data to understand the relationship between inputs and outputs. It includes two main types: regression for numeric predictions and classification for categorical predictions. Techniques like Decision Trees, Naive Bayes, and Support Vector Machines are commonly used in supervised learning for various applications.


What Is Supervised Learning?

Supervised learning is a machine learning technique in which:


We already have data whose answer is known (this answer is called the label).
We train the model so that it learns the relationship between the input (X) and the output (Y).

📦 Example:
Suppose we have some photos (input) and each one is labelled as a dog or a cat (label). We train the model so that, when it sees a new photo, it can tell whether it shows a dog or a cat.

✅ Supervised learning techniques have 2 main types:

1. Regression (when the output is a number)


📊 Used when you need to predict a numeric value.
📌 Examples:
Predicting the price of a house
Estimating the temperature
Predicting a student's marks

🧠 Common Regression Techniques:


Linear Regression – straight-line relationship
Polynomial Regression – curved relationship
Ridge/Lasso Regression – regularized regression (protects against overfitting)

2. Classification (when the output is a category)


📚 Used when you need to predict a category or class.
📌 Examples:
Whether an email is spam or not
Whether a photo shows a dog or a cat
Whether a patient has a disease or not

🧠 Common Classification Techniques:


Logistic Regression – yes/no prediction
K-Nearest Neighbors (KNN) – predicts from nearby points
Decision Tree – follows a tree structure
Random Forest – many decision trees make the decision together
Support Vector Machine (SVM) – builds the best boundary between classes
Naive Bayes – probability based (very fast)

⚙️ Supervised Learning Process – Easy Steps


1. Data Collection: Collect labeled data
2. Data Splitting: Divide the data into training and testing sets
3. Model Training: Train the model on the training data
4. Model Testing: Evaluate the model on the test data
5. Prediction: Make predictions on new data (see the sketch below)
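
A minimal sketch of these five steps using scikit-learn; the dataset (iris) and the choice of classifier (KNN) here are illustrative assumptions, not part of the notes:

```python
# Illustrative sketch of the supervised learning workflow (assumes scikit-learn is installed).
from sklearn.datasets import load_iris                 # 1. Data Collection: a built-in labeled dataset
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                      # inputs X, labels y

# 2. Data Splitting: hold out 20% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3. Model Training: fit a classifier on the training data
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# 4. Model Testing: evaluate on the unseen test data
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 5. Prediction: use the trained model on new data
print("Prediction for one new sample:", model.predict(X_test[:1]))
```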

📈 Bonus: Ways to Check Accuracy


For classification: Accuracy, Precision, Recall, F1-Score, Confusion Matrix
For regression: Mean Absolute Error (MAE), Mean Squared Error (MSE), R² Score (a short code sketch follows below)
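
A small sketch of how these metrics can be computed with scikit-learn; the `y_true`/`y_pred` arrays are made-up example values:

```python
# Illustrative sketch: common evaluation metrics with scikit-learn (example values are invented).
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix,
                             mean_absolute_error, mean_squared_error, r2_score)

# Classification metrics
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-Score :", f1_score(y_true, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))

# Regression metrics
y_actual    = [3.0, 5.0, 2.5, 7.0]
y_estimated = [2.8, 5.4, 2.0, 6.5]
print("MAE:", mean_absolute_error(y_actual, y_estimated))
print("MSE:", mean_squared_error(y_actual, y_estimated))
print("R² :", r2_score(y_actual, y_estimated))
```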
🌳 Decision Tree in Machine Learning – For Exam

✅ Definition:
A Decision Tree is a supervised learning algorithm used for both classification and regression tasks.
It splits the data into smaller parts based on certain conditions (features), forming a tree-like
structure with decision rules.

🧩 Structure of a Decision Tree:


1. Root Node – The first/main decision (based on one feature)
2. Internal Nodes – Represent the decisions (conditions) based on features
3. Branches – Show the outcome of the decision (Yes/No, True/False)
4. Leaf Nodes – Final output or result (class label or value)

🔍 How it works:
The algorithm selects the best feature to split the data.
It uses criteria like:
Gini Index
Entropy
Information Gain
It continues splitting until all data is classified or the tree reaches a stopping point.

📊 Example:
Predict if a person will play cricket based on weather:

            [Weather?]
            /        \
        Sunny        Rainy
          |            |
         Play       [Windy?]
                     /     \
                   Yes      No
             Don't Play    Play
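
A minimal sketch of the same idea with scikit-learn's DecisionTreeClassifier; the tiny "play cricket" dataset below is invented purely for illustration:

```python
# Illustrative sketch: a decision tree on a tiny, made-up "play cricket" dataset.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [weather (0 = Sunny, 1 = Rainy), windy (0 = No, 1 = Yes)]
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0, 0], [1, 1]]
y = [1, 1, 1, 0, 1, 0]   # 1 = Play, 0 = Don't Play

tree = DecisionTreeClassifier(criterion="gini", max_depth=2)  # Gini Index as the split criterion
tree.fit(X, y)

# Print the learned decision rules (root node, internal nodes, leaf nodes)
print(export_text(tree, feature_names=["weather", "windy"]))

# Predict for a new day: Rainy and not windy -> expected "Play" (1)
print(tree.predict([[1, 0]]))
```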
⭐ Advantages:
Easy to understand and interpret
No need for much data preprocessing
Can handle both numerical and categorical data
Good for small datasets

⚠️ Disadvantages:
Can overfit the data (too many branches)
Sensitive to small changes in data
Complex trees are hard to interpret

🧠 Important Terms (for theory questions):


Term – Meaning

Entropy – Measures impurity/disorder in data
Information Gain – How much entropy is reduced after a split
Gini Index – Measures data purity, similar to entropy (used in CART trees)

📌 Applications:
Medical diagnosis
Email spam filtering
Customer segmentation
Weather prediction
Loan approval

✍️ Sample Answer (write this in exam):


A Decision Tree is a tree-structured algorithm used in supervised learning for classification and regression
tasks. It works by splitting the data based on certain conditions (features) and making decisions at each node.
The tree continues splitting until it reaches a final output at the leaf node. It uses criteria like Gini Index,
Entropy, and Information Gain to find the best splits. Decision Trees are easy to understand but may overfit the
data if not pruned properly.
🔶 1. Information Gain (IG):
Imagine you have a basket containing two different fruits: mangoes and apples.
Now you want to divide the basket so that each group contains only one type of fruit.
You check whether there is some question (or feature) that can give you this clean split.

📌 Definition in Easy Words:


Information Gain tells you:

"How much confusion (impurity) was reduced by using a particular feature (question)?"

💡 Real-Life Analogy:
You are in a class. Suppose you ask a question like:

"All the students who are good at science, move to one side."

If this question clearly creates 2 groups:


One group: science toppers
Other group: average students

Then the Information Gain is high → because you achieved a clean split.

🔢 Formula:

Information Gain = Entropy (Before Split) – Weighted Entropy (After Split)

Entropy means how much randomness/confusion there is.

🔷 2. Gini Index:
This is also a kind of impurity checker.

📌 Definition in Easy Words:


The Gini Index tells you:

"How mixed is a group?"


If everyone in a group belongs to the same category → Gini is low (best).
If the group is all mixed up → Gini is high (bad).

💡 Real-Life Analogy:
If a box contains only mangoes, Gini = 0 → perfect.
If mangoes, apples, and bananas are all mixed together → Gini is high.

🔢 Formula:

Gini Index = 1 − (p₁² + p₂² + ... + pₙ²)

where p₁, p₂, ... are the probabilities of each class.
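
A small sketch computing entropy, Gini Index, and Information Gain for a toy split; the fruit counts are made up for illustration:

```python
# Illustrative sketch: entropy, Gini Index, and Information Gain for a toy split.
import math

def entropy(labels):
    """Entropy = -sum(p_i * log2(p_i)) over the classes present in `labels`."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def gini(labels):
    """Gini Index = 1 - sum(p_i^2)."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return 1 - sum(p * p for p in probs)

# Parent node: 5 mangoes and 5 apples (maximum confusion)
parent = ["mango"] * 5 + ["apple"] * 5
# A feature splits it into two clean groups
left, right = ["mango"] * 5, ["apple"] * 5

# Information Gain = entropy(before split) - weighted entropy(after split)
weighted_after = (len(left) / len(parent)) * entropy(left) + (len(right) / len(parent)) * entropy(right)
print("Entropy before split:", entropy(parent))                 # 1.0
print("Information Gain   :", entropy(parent) - weighted_after)  # 1.0 (perfect split)
print("Gini of pure group  :", gini(left))                       # 0.0
print("Gini of mixed group :", gini(parent))                     # 0.5
```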

🤔 Difference Between IG & Gini:


Feature – Information Gain – Gini Index

Based on – Entropy (log based) – Probability (square based)
Goal – How much confusion was reduced? – How pure is the group?
Value – Higher = Better – Lower = Better
Used in – ID3 algorithm – CART algorithm

📌 2–3 Lines Worth Writing in the Exam:


Information Gain is a measure that tells how much randomness is reduced after splitting the data on a particular feature. It is based on the concept of entropy.
The Gini Index tells how pure a group is. If all items in a group belong to the same class, the Gini is 0. It is used in the CART algorithm.
📘 Naive Bayes Algorithm – For Exams (Point by Point)

✅ 1. Definition:
Naive Bayes is a supervised learning classification algorithm based on Bayes Theorem.
It assumes that features are independent of each other, which is why it is called "Naive".

✅ 2. Bayes Theorem:
Naive Bayes is built on Bayes’ Theorem of probability:

P(A∣B) = [P(B∣A) ⋅ P(A)] / P(B)

P (A∣B): Probability of A given B (posterior probability)


P (B∣A): Probability of B given A (likelihood)
P (A): Probability of A (prior probability)
P (B): Probability of B (evidence)

✅ 3. Naive Assumption (Why "Naive"?):


It naively assumes that all features (input variables) are independent of each other.
In real-world data, this is rarely true, but the algorithm still works well.

✅ 4. Working Steps:
1. Convert the data into a frequency table (for categorical) or use mean/variance (for continuous).
2. Calculate Prior Probability for each class (e.g., spam or not spam).
3. Calculate Likelihood for each feature given the class.
4. Apply Bayes’ Theorem to calculate posterior probability for each class.
5. Choose the class with the highest posterior probability as the prediction.
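
A minimal sketch of these steps using scikit-learn's MultinomialNB on a tiny, invented spam example; the message texts and labels are assumptions made purely for illustration:

```python
# Illustrative sketch: Naive Bayes text classification on a tiny, made-up dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = ["win free money now", "free money offer", "meeting at noon", "lunch tomorrow?"]
labels   = ["spam", "spam", "not spam", "not spam"]

# Step 1: convert the text into a word-count (frequency) table
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

# Steps 2-4: MultinomialNB computes priors and likelihoods, then applies Bayes' Theorem
model = MultinomialNB()
model.fit(X, labels)

# Step 5: pick the class with the highest posterior probability for a new message
new_msg = vectorizer.transform(["free money"])
print(model.predict(new_msg))          # expected: ['spam']
print(model.predict_proba(new_msg))    # posterior probabilities for each class
```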

✅ 5. Types of Naive Bayes:


Type – Description

Multinomial Naive Bayes – Used for text classification (like spam detection); assumes word counts.
Bernoulli Naive Bayes – Used when features are binary (0 or 1, yes or no).
Gaussian Naive Bayes – Used for continuous data, assuming a normal distribution (like age, salary).
✅ 6. Applications:
Email spam filtering
Sentiment analysis
Document classification
Medical diagnosis
Real-time predictions

✅ 7. Advantages:
Simple and easy to implement
Works well with high-dimensional data
Fast and efficient for large datasets
Performs well even with less training data

✅ 8. Disadvantages:
Assumes independence of features (which is rarely true)
May not perform well with correlated features
Probability estimates can be poor if data is sparse

✅ 9. Simple Example (for understanding):


Let’s say we want to classify whether a message is Spam or Not Spam based on the words it contains:
Words like “free”, “win”, “money” are common in spam messages.
Naive Bayes will:
Count how many times “free” appears in spam vs non-spam.
Calculate the probability of “Spam” given “free”.
Repeat for all words and pick the class with highest probability.

✅ 10. Key Points for Exam:


Based on Bayes Theorem
Assumes independence between features
Used for classification problems
Simple, fast, and works well with large datasets
Three types: Multinomial, Bernoulli, Gaussian
📘 Support Vector Machines (SVM) – For Classification Problems (Exam-Oriented)

✅ 1. Definition:
Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification (and sometimes regression).
It aims to find the optimal boundary (hyperplane) that best separates data into different classes.

✅ 2. Goal of SVM:
To find a decision boundary (hyperplane) that separates classes with the maximum margin, i.e., the largest possible distance between the boundary and the
nearest data points from each class.

✅ 3. Important Terminologies:
Term – Explanation

Hyperplane – A line (in 2D), a plane (in 3D), or a higher-dimensional surface that separates data into classes
Support Vectors – The data points that lie closest to the hyperplane and influence its position
Margin – The distance between the hyperplane and the nearest support vectors from either class
Kernel – A mathematical function that transforms data into a higher dimension to make it linearly separable

✅ 4. Types of SVM:
1. Linear SVM
Used when the data is linearly separable, i.e., classes can be divided by a straight line or plane.
2. Non-Linear SVM
Used when the data is not linearly separable.
In this case, SVM uses a kernel function to project data into a higher-dimensional space where it can be separated linearly.

✅ 5. Kernel Functions (for Non-Linear SVM):


Kernel Type – Description

Linear Kernel – Used for linearly separable data
Polynomial Kernel – Maps data to a higher dimension using polynomial functions
Radial Basis Function (RBF) / Gaussian Kernel – Widely used; works well with non-linear data
Sigmoid Kernel – Similar to neural networks; less commonly used

✅ 6. Working of SVM (Step-by-Step):


1. Take input data with labels (classes).
2. Identify the best hyperplane that separates the classes.
3. Calculate the margin for different possible hyperplanes.
4. Choose the one with maximum margin.
5. Use support vectors (nearest points) to define the final hyperplane.
6. For non-linear data, apply a kernel function to transform it into a higher dimension.
7. Use this model to classify new/unseen data points.
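
A minimal sketch of these steps with scikit-learn's SVC; the synthetic dataset and the kernel/parameter choices are illustrative assumptions:

```python
# Illustrative sketch: SVM classification with scikit-learn (the data is synthetic).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Linear SVM: finds the maximum-margin hyperplane for (roughly) linearly separable data
linear_svm = SVC(kernel="linear", C=1.0).fit(X_train, y_train)
print("Number of support vectors:", len(linear_svm.support_vectors_))
print("Linear SVM test accuracy :", linear_svm.score(X_test, y_test))

# Non-linear SVM: the RBF kernel maps data into a higher-dimensional space
rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)
print("RBF SVM test accuracy    :", rbf_svm.score(X_test, y_test))
```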

✅ 7. Example:
Let’s say you want to classify emails as “Spam” or “Not Spam” based on keywords.
SVM will analyze the training data, find a boundary that separates spam emails from non-spam emails, and then use that boundary to classify future emails.

✅ 8. Applications of SVM:
Text classification (e.g., spam detection)
Handwriting recognition
Face and object detection
Medical diagnosis
Bioinformatics (e.g., gene classification)

✅ 9. Advantages of SVM:
Effective in high-dimensional spaces
Works well with clear margin of separation
Robust to overfitting, especially in high-dimensional data
Suitable for both linear and non-linear classification

✅ 10. Disadvantages of SVM:


Not suitable for large datasets (slow training)
Performance is sensitive to choice of kernel and parameters
Doesn’t perform well with noisy or overlapping data
Less efficient when the number of features >> number of samples

✅ 11. Key Points to Write in Exam (Short Answer Format):


SVM is a supervised classification algorithm based on maximum margin separation.
It finds an optimal hyperplane that best separates different classes.
The closest data points to the hyperplane are called support vectors.
For non-linear data, SVM uses kernel functions to transform data.
SVM is useful in text classification, image recognition, and bioinformatics.
The main goal of SVM is to maximize the margin between the two classes. The larger the
margin the better the model performs on new and unseen data.

Key Concepts of Support Vector Machine


Hyperplane: A decision boundary separating different classes in feature space and is
represented by the equation wx + b = 0 in linear classification.
Support Vectors: The closest data points to the hyperplane, crucial for determining the
hyperplane and margin in SVM.
Margin: The distance between the hyperplane and the support vectors. SVM aims to
maximize this margin for better classification performance.
Kernel: A function that maps data to a higher-dimensional space enabling SVM to handle
non-linearly separable data.
Hard Margin: A maximum-margin hyperplane that perfectly separates the data without
misclassifications.
Soft Margin: Allows some misclassifications by introducing slack variables, balancing
margin maximization and misclassification penalties when data is not perfectly separable.
C: A regularization term balancing margin maximization and misclassification penalties. A
higher C value forces stricter penalty for misclassifications.
Hinge Loss: A loss function penalizing misclassified points or margin violations and is
combined with regularization in SVM.
Dual Problem: Involves solving for Lagrange multipliers associated with support vectors,
facilitating the kernel trick and efficient computation.
Random Forest Algorithm in Machine Learning

Random Forest is a machine learning algorithm that uses many decision trees to make better predictions. Each tree looks
at different random parts of the data and their results are combined by voting for classification or averaging for regression.
This helps in improving accuracy and reducing errors.

Working of Random Forest Algorithm


Create Many Decision Trees: The algorithm makes many decision trees each using a random part of the data. So
every tree is a bit different.
Pick Random Features: When building each tree it doesn’t look at all the features (columns) at once. It picks a few
at random to decide how to split the data. This helps the trees stay different from each other.
Each Tree Makes a Prediction: Every tree gives its own answer or prediction based on what it learned from its
part of the data.
Combine the Predictions:
For classification, the final answer is the category that most trees agree on, i.e. majority voting.
For regression, the final answer is the average of all the trees' predictions.
Why It Works Well: Using random data and features for each tree helps avoid overfitting and makes the overall
prediction more accurate and trustworthy.

Random Forest is also an ensemble learning technique, which you can learn more about from: Ensemble Learning
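
A minimal sketch of this working with scikit-learn's RandomForestClassifier; the dataset (breast cancer) and hyperparameters are illustrative assumptions:

```python
# Illustrative sketch: Random Forest with scikit-learn (dataset choice is an assumption).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 100 decision trees, each trained on a random sample of rows and a random subset of features
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)
forest.fit(X_train, y_train)

# Predictions are combined by majority voting across the trees
print("Test accuracy:", forest.score(X_test, y_test))

# Feature importance: which columns contributed most to the predictions
print("Largest feature importance:", max(forest.feature_importances_))
```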

Key Features of Random Forest


Handles Missing Data: It can work even if some data is missing so you don’t always need to fill in the gaps yourself.
Shows Feature Importance: It tells you which features (columns) are most useful for making predictions which
helps you understand your data better.
Works Well with Big and Complex Data: It can handle large datasets with many features without slowing down
or losing accuracy.
Used for Different Tasks: You can use it for both classification like predicting types or labels
and regression like predicting numbers or amounts.

Assumptions of Random Forest


Each tree makes its own decisions: Every tree in the forest makes its own predictions without relying on others.
Random parts of the data are used: Each tree is built using random samples and features to reduce mistakes.
Enough data is needed: Sufficient data ensures the trees are different and learn unique patterns and variety.
Different predictions improve accuracy: Combining the predictions from different trees leads to a more
accurate final result.
📘 Linear Regression – For Regression Problems

✅ 1. Definition:
Linear Regression is a supervised machine learning algorithm used for solving regression problems.
It models the relationship between a dependent variable (output) and one or more independent variables (input) using a straight
line.

✅ 2. Objective:
To find the best-fit straight line that predicts the value of the dependent variable based on the independent variable(s).

✅ 3. Types of Linear Regression:


Type – Description

Simple Linear Regression – One independent variable (e.g., height vs weight)
Multiple Linear Regression – Two or more independent variables (e.g., house price vs size, location, number of rooms)

✅ 4. Equation of Linear Regression:


🔹 Simple Linear Regression:
Y = a + bX
Where:
Y = Predicted value (dependent variable)
X = Input value (independent variable)
a = Intercept (value of Y when X = 0)
b = Slope (rate of change in Y for one unit change in X)

🔹 Multiple Linear Regression:


Y = a + b₁X₁ + b₂X₂ + ⋯ + bₙXₙ

✅ 5. Working Steps:
1. Collect input (X) and output (Y) data.
2. Fit a line that minimizes the error between actual and predicted values.
3. Use least squares method to find the best values of a and b.
4. Predict new values of Y based on new inputs.
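
A minimal sketch of these steps with scikit-learn's LinearRegression; the study-hours-vs-marks numbers are invented for illustration:

```python
# Illustrative sketch: simple linear regression (study hours vs marks, values invented).
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])   # study hours (independent variable)
y = np.array([35, 50, 62, 74, 88])        # marks (dependent variable)

model = LinearRegression().fit(X, y)      # fits Y = a + bX by least squares
print("Intercept a:", model.intercept_)
print("Slope b    :", model.coef_[0])

# Predict marks for a student who studies 6 hours
print("Predicted marks:", model.predict([[6]])[0])
```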
✅ 6. Use Case Examples:
Problem – Regression Type

Predicting house price – Multiple Linear Regression
Predicting student marks from study hours – Simple Linear Regression
Predicting salary based on experience – Simple Linear Regression

✅ 7. Evaluation Metrics (for checking performance):


Metric – Meaning

MSE (Mean Squared Error) – Average of squared prediction errors
RMSE (Root Mean Squared Error) – Square root of MSE
R² (R-squared score) – How well the model fits the data (1 = perfect fit)

✅ 8. Advantages:
Simple to understand and implement
Fast and efficient
Works well when there is a linear relationship between variables
Good for baseline regression tasks

✅ 9. Disadvantages:
Only works when relationship is linear
Sensitive to outliers
Doesn’t work well with non-linear data
Assumes independence and no multicollinearity among variables

✅ 10. Exam-Ready 3–4 Line Answer:


Linear Regression is a supervised learning algorithm used for predicting numerical (continuous) values.
It finds the best-fit straight line to model the relationship between the dependent and independent variable(s).
It is used in applications like house price prediction, salary estimation, and student score forecasting.
📘 Ordinary Least Squares (OLS) Regression – For Exams

✅ 1. Definition:
Ordinary Least Squares (OLS) Regression is a method used in linear regression to find the best-fitting line by minimizing the sum of
the squared errors (differences between predicted and actual values).

✅ 2. Objective:
To estimate the regression line in such a way that the sum of squared residuals is as small as possible.

This is why it's called "least squares" method.

✅ 3. Equation of Linear Regression (used in OLS):


Y = a + bX + e
Where:
Y = Dependent variable (output)
X = Independent variable (input)
a = Intercept (value of Y when X = 0)
b = Slope (change in Y per unit change in X)
e = Error term (residual)

✅ 4. Residual (Error):
Residual = Actual value (Y) − Predicted value (Ŷ)
OLS tries to minimize the sum of the squares of these residuals:

Minimize: ∑(Yᵢ − Ŷᵢ)²

✅ 5. Working Steps of OLS Regression:


1. Take dataset with input (X) and output (Y).
2. Fit a linear equation Y = a + bX to the data.
3. Calculate the residuals (errors).
4. Use calculus to minimize the total squared error.
5. Find optimal values of a and b using formulas:

b = ∑(Xᵢ − X̄)(Yᵢ − Ȳ) / ∑(Xᵢ − X̄)²   and   a = Ȳ − bX̄
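A small sketch computing the OLS slope and intercept directly from these formulas with NumPy; the X/Y values are made up for illustration:

```python
# Illustrative sketch: Ordinary Least Squares from the closed-form formulas.
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)   # made-up inputs
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])      # made-up outputs

x_bar, y_bar = X.mean(), Y.mean()

# b = sum((Xi - X̄)(Yi - Ȳ)) / sum((Xi - X̄)^2),  a = Ȳ - b·X̄
b = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)
a = y_bar - b * x_bar

Y_hat = a + b * X                 # predicted values
residuals = Y - Y_hat             # residual = actual - predicted
print("a =", a, " b =", b)
print("Sum of squared residuals:", np.sum(residuals ** 2))
```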


✅ 6. Assumptions of OLS:
1. Linearity – Relationship between X and Y is linear
2. Independence – Observations are independent of each other
3. Homoscedasticity – Constant variance of errors
4. Normality – Errors are normally distributed
5. No multicollinearity – In multiple regression, independent variables are not highly correlated

✅ 7. Evaluation Metrics (for performance):


Metric – Description

R² (R-squared) – Measures how well the model explains the variation in the output
MSE (Mean Squared Error) – Average of squared residuals
RMSE – Square root of MSE (gives error in the same units as the output)

✅ 8. Applications:
Predicting house prices
Estimating salary based on experience
Forecasting sales based on marketing spend
Modeling relationships in economics or finance

✅ 9. Advantages of OLS:
Simple to understand and easy to compute
Provides best linear unbiased estimates (BLUE) under assumptions
Efficient when assumptions hold true

✅ 10. Disadvantages:
Sensitive to outliers
Assumes linear relationship
Performance declines if assumptions (like homoscedasticity, normality) are violated
Not suitable for non-linear problems without transformation

✅ 11. Exam-Ready Summary (3–4 lines):


Ordinary Least Squares (OLS) is a method in linear regression that estimates the best-fit line by minimizing the sum of squared
residuals (errors between actual and predicted values).
It gives accurate and unbiased results under specific statistical assumptions like linearity, independence, and homoscedasticity.
OLS is widely used in prediction and data modeling tasks.
Logistic Regression in Machine Learning

Logistic Regression is a supervised machine learning algorithm used for classification problems. Unlike linear regression, which predicts continuous values, it predicts the probability that an input belongs to a specific class. It is used for binary classification, where the output can be one of two possible categories such as Yes/No, True/False or 0/1. It uses the sigmoid function to convert inputs into a probability value between 0 and 1. In this article, we will see the basics of logistic regression and its core concepts.

Types of Logistic Regression


Logistic regression can be classified into three main types based on the nature of the dependent variable:

1. Binomial Logistic Regression: This type is used when the dependent variable has only two possible
categories. Examples include Yes/No, Pass/Fail or 0/1. It is the most common form of logistic regression and
is used for binary classification problems.
2. Multinomial Logistic Regression: This is used when the dependent variable has three or more possible
categories that are not ordered. For example, classifying animals into categories like "cat," "dog" or "sheep." It
extends the binary logistic regression to handle multiple classes.
3. Ordinal Logistic Regression: This type applies when the dependent variable has three or more
categories with a natural order or ranking. Examples include ratings like "low," "medium" and "high." It takes
the order of the categories into account when modeling.

Assumptions of Logistic Regression


Understanding the assumptions behind logistic regression is important to ensure the model is applied correctly,
main assumptions are:

1. Independent observations: Each data point is assumed to be independent of the others, meaning there should be no correlation or dependence between the input samples.
2. Binary dependent variable: The dependent variable is assumed to be binary, meaning it can take only two values. For more than two categories, the softmax function is used.
3. Linear relationship between independent variables and log odds: The model assumes a linear relationship between the independent variables and the log odds of the dependent variable, which means the predictors affect the log odds in a linear way.
4. No outliers: The dataset should not contain extreme outliers as they can distort the estimation of the logistic
regression coefficients.
5. Large sample size: It requires a sufficiently large sample size to produce reliable and stable results.

Understanding Sigmoid Function


1. The sigmoid function is an important part of logistic regression; it is used to convert the raw output of the model into a probability value between 0 and 1.

2. This function takes any real number and maps it into the range 0 to 1, forming an "S" shaped curve called the sigmoid curve or logistic curve. Because probabilities must lie between 0 and 1, the sigmoid function is perfect for this purpose.

3. In logistic regression, we use a threshold value, usually 0.5, to decide the class label.

If the sigmoid output is equal to or above the threshold, the input is classified as Class 1.
If it is below the threshold, the input is classified as Class 0.

This approach helps to transform continuous input values into meaningful class predictions.
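
A small sketch of the sigmoid function and the 0.5 threshold rule in NumPy; the raw output values are made up for illustration:

```python
# Illustrative sketch: the sigmoid function and the 0.5 threshold rule.
import numpy as np

def sigmoid(z):
    """Maps any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

raw_outputs = np.array([-3.0, -0.5, 0.0, 1.2, 4.0])   # example raw model outputs z
probs = sigmoid(raw_outputs)
labels = (probs >= 0.5).astype(int)                    # Class 1 if probability >= threshold

print("Probabilities    :", probs)
print("Predicted classes:", labels)
```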

How does Logistic Regression work?


The logistic regression model transforms the continuous output of the linear regression function into a categorical output using the sigmoid function, which maps any real-valued combination of the independent variables into a value between 0 and 1. This function is known as the logistic function.

Suppose we have input features represented as a matrix:

X = [ x₁₁ ⋯ x₁ₘ
      x₂₁ ⋯ x₂ₘ
      ⋮   ⋱   ⋮
      xₙ₁ ⋯ xₙₘ ]

and the dependent variable Y takes only binary values, i.e. 0 or 1:

Y = 0 if Class 1, 1 if Class 2

Then apply the multi-linear function to the input variables X:

z = (∑ᵢ₌₁ⁿ wᵢ xᵢ) + b

Here xᵢ is the i-th observation of X, w = [w₁, w₂, w₃, ⋯ , wₘ] are the weights or coefficients, and b is the bias term, also known as the intercept. This can simply be represented as the dot product of the weights and the input plus the bias:

z = w ⋅ X + b
At this stage, z is a continuous value from the linear function. Logistic regression then applies the sigmoid function to z to convert it into a probability between 0 and 1, which can be used to predict the class (i.e. the predicted y):

σ(z) = 1 / (1 + e^(−z))

[Figure: the sigmoid function curve]

As shown above, the sigmoid function converts the continuous value z into a probability between 0 and 1:
σ(z) tends towards 1 as z → ∞
σ(z) tends towards 0 as z → −∞
σ(z) is always bounded between 0 and 1

where the probability of being a class can be measured as:

P (y = 1) = σ(z)
P (y = 0) = 1 − σ(z)

Logistic Regression Equation and Odds:

It models the odds of the dependent event occurring which is the ratio of the probability of the event to the
probability of it not occurring:
p(x) / (1 − p(x)) = e^z

Taking the natural logarithm of the odds gives the log-odds or logit:

log[ p(x) / (1 − p(x)) ] = z
log[ p(x) / (1 − p(x)) ] = w⋅X + b

Exponentiating both sides:

p(x) / (1 − p(x)) = e^(w⋅X + b)
p(x) = e^(w⋅X + b) ⋅ (1 − p(x))
p(x) = e^(w⋅X + b) − e^(w⋅X + b) ⋅ p(x)
p(x) + e^(w⋅X + b) ⋅ p(x) = e^(w⋅X + b)
p(x) (1 + e^(w⋅X + b)) = e^(w⋅X + b)
p(x) = e^(w⋅X + b) / (1 + e^(w⋅X + b))

then the final logistic regression equation will be:

p(X; b, w) = e^(w⋅X + b) / (1 + e^(w⋅X + b)) = 1 / (1 + e^(−(w⋅X + b)))

This formula represents the probability of the input belonging to Class 1.

Likelihood Function for Logistic Regression

The goal is to find weights w and bias b that maximize the likelihood of observing the data.
For each data point i:
for yᵢ = 1, the predicted probability is p(xᵢ)
for yᵢ = 0, the predicted probability is 1 − p(xᵢ)

L(b, w) = ∏ᵢ₌₁ⁿ p(xᵢ)^yᵢ (1 − p(xᵢ))^(1−yᵢ)

Taking natural logs on both sides:

log L(b, w) = ∑ᵢ₌₁ⁿ [ yᵢ log p(xᵢ) + (1 − yᵢ) log(1 − p(xᵢ)) ]
            = ∑ᵢ₌₁ⁿ [ yᵢ log p(xᵢ) + log(1 − p(xᵢ)) − yᵢ log(1 − p(xᵢ)) ]
            = ∑ᵢ₌₁ⁿ log(1 − p(xᵢ)) + ∑ᵢ₌₁ⁿ yᵢ log[ p(xᵢ) / (1 − p(xᵢ)) ]
            = ∑ᵢ₌₁ⁿ −log(1 + e^(w⋅xᵢ + b)) + ∑ᵢ₌₁ⁿ yᵢ (w⋅xᵢ + b)

This is known as the log-likelihood function.

Gradient of the log-likelihood function

To find the best w and b we use gradient ascent on the log-likelihood function. The gradient with respect to each weight wⱼ is:

∂ log L(b, w) / ∂wⱼ = −∑ᵢ₌₁ⁿ [ e^(w⋅xᵢ + b) / (1 + e^(w⋅xᵢ + b)) ] xᵢⱼ + ∑ᵢ₌₁ⁿ yᵢ xᵢⱼ
                    = −∑ᵢ₌₁ⁿ p(xᵢ; b, w) xᵢⱼ + ∑ᵢ₌₁ⁿ yᵢ xᵢⱼ
                    = ∑ᵢ₌₁ⁿ (yᵢ − p(xᵢ; b, w)) xᵢⱼ
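
A small NumPy sketch of training logistic regression by gradient ascent using this gradient; the tiny one-feature dataset and the learning rate are assumptions made purely for demonstration:

```python
# Illustrative sketch: gradient ascent on the log-likelihood of logistic regression,
# using the gradient sum_i (y_i - p(x_i)) * x_ij derived above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0.5], [1.5], [2.5], [3.5]])   # one feature, four samples (made up)
y = np.array([0, 0, 1, 1])

w = np.zeros(X.shape[1])
b = 0.0
learning_rate = 0.1

for _ in range(1000):
    p = sigmoid(X @ w + b)              # predicted probabilities p(x_i; b, w)
    error = y - p                       # the (y_i - p(x_i)) term in the gradient
    w += learning_rate * (X.T @ error)  # ascend the log-likelihood w.r.t. each w_j
    b += learning_rate * error.sum()    # gradient w.r.t. the bias is sum_i (y_i - p(x_i))

print("Learned weight:", w, "bias:", b)
print("Predicted probabilities:", sigmoid(X @ w + b))
```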

Terminologies involved in Logistic Regression


Here are some common terms involved in logistic regression:

1. Independent Variables: These are the input features or predictor variables used to make predictions
about the dependent variable.
2. Dependent Variable: This is the target variable that we aim to predict. In logistic regression, the
dependent variable is categorical.
3. Logistic Function: This function transforms the independent variables into a probability between 0 and 1
which represents the likelihood that the dependent variable is either 0 or 1.
4. Odds: This is the ratio of the probability of an event happening to the probability of it not happening. It differs
from probability because probability is the ratio of occurrences to total possibilities.
5. Log-Odds (Logit): The natural logarithm of the odds. In logistic regression, the log-odds are modeled as a
linear combination of the independent variables and the intercept.
6. Coefficient: These are the parameters estimated by the logistic regression model which shows how strongly
the independent variables affect the dependent variable.
7. Intercept: The constant term in the logistic regression model which represents the log-odds when all
independent variables are equal to zero.
8. Maximum Likelihood Estimation (MLE): This method is used to estimate the coefficients of the logistic
regression model by maximizing the likelihood of observing the given data.
