ML Unit II_Final
Classification: Step 1 - Model Construction
[Figure: training data is fed to a classification algorithm, which produces the classifier (model), e.g. the learned rule:
IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’]
Step 2: Model Usage
[Figure: the classifier is applied to testing data and then to unseen data, e.g. (Jeff, Professor, 4) → Tenured?]
Decision Tree Classification
Decision Tree Terminologies
Age is the splitting attribute at the root node of the DT. Repeat the procedure to determine the splitting attribute (excluding age) along each of the branches from the root node until the stopping condition is reached, generating the final DT.
Final DT of the buys_computer dataset
Example: DT Creation
Example: Usage of Information Gain and Entropy in DT Creation
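A minimal Python sketch of how entropy and information gain are computed for the buys_computer data. The overall 9 ‘yes’ / 5 ‘no’ split appears in the Naïve Bayes example later in this unit; the per-age-group counts are assumed here from the classic buys_computer table and should be treated as an assumption.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# Whole dataset: 9 'yes' and 5 'no' tuples.
info_d = entropy([9, 5])                  # ~0.940 bits

# Partition by age: (yes, no) counts per branch -- assumed from the
# classic buys_computer table, not shown on this slide.
partitions = {"youth": (2, 3), "middle_aged": (4, 0), "senior": (3, 2)}
n = 14
info_age = sum((y + no) / n * entropy([y, no])
               for y, no in partitions.values())

gain_age = info_d - info_age              # ~0.246 bits
print(f"Info(D) = {info_d:.3f}, Info_age(D) = {info_age:.3f}, "
      f"Gain(age) = {gain_age:.3f}")
```

Age gives the highest gain among the attributes, which is why it becomes the splitting attribute at the root node.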
Decision Tree
A classifier with a tree structure, usable for both classification and regression; classification is the most common use.
The trained decision tree model acts as the classifier: a new (unlabeled, unknown) input is fed to the model, and the model classifies it into a particular class.
A decision tree has two kinds of nodes:
Decision node (branches on the outcome of a test, e.g. yes/no) [corresponds to an attribute]
Leaf node (no branches) [corresponds to a class label]
Every path ends at a leaf node, which assigns a class to the incoming sample.
Decision Tree
Ex.: Say you want to predict whether a person is fit, given information such as their age, eating habits, and physical activity.
The decision nodes are questions like ‘What is the age?’, ‘Does he exercise?’, ‘Does he eat a lot of pizza?’, and the leaves are the outcomes: ‘fit’ or ‘unfit’.
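A minimal scikit-learn sketch of this fitness example; the toy data below is hypothetical, invented only to show the mechanics.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: [age, exercises (0/1), pizzas_per_week] -> fit/unfit.
X = [[25, 1, 1], [40, 0, 6], [30, 1, 2], [55, 0, 5],
     [35, 1, 0], [60, 0, 7], [22, 0, 4], [45, 1, 1]]
y = ["fit", "unfit", "fit", "unfit", "fit", "unfit", "unfit", "fit"]

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)

# Decision nodes are attribute tests; leaves carry the class labels.
print(export_text(tree, feature_names=["age", "exercises", "pizzas_per_week"]))
print(tree.predict([[28, 1, 2]]))   # classify a new, unlabeled person
```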
A thumb rule to identify the right hyper-plane: “Select the hyper-plane which segregates the two classes better”. In this scenario, hyper-plane “B” performs this job excellently.
How does it work? The margin, i.e. the distance between the hyper-plane and the nearest data point of either class, needs to be maximized.
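A minimal scikit-learn sketch of a linear SVM on hypothetical separable points; the margin 2/||w|| printed at the end is the quantity being maximized.

```python
import numpy as np
from sklearn import svm

# Two linearly separable toy classes (hypothetical points).
X = [[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [8, 6]]
y = [0, 0, 0, 1, 1, 1]

clf = svm.SVC(kernel="linear", C=1e6)  # large C approximates a hard margin
clf.fit(X, y)

# w.x + b = 0 is the separating hyper-plane; margin width = 2 / ||w||.
w, b = clf.coef_[0], clf.intercept_[0]
print("hyper-plane w, b:", w, b, "margin:", 2 / np.linalg.norm(w))
print(clf.predict([[3, 2], [7, 6]]))   # classify two new points
```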
Naïve Bayesian Classification - Example
Classes:
C1: buys_computer = ‘yes’
C2: buys_computer = ‘no’
Data sample:
X = (age = youth, income = medium, student = yes, credit_rating = fair)
Naïve Bayesian Classification - Example
Test for X = (age = youth, income = medium, student = yes, credit_rating = fair)
• Prior probability P(Ci):
P(buys_computer = yes) = 9/14 = 0.643
P(buys_computer = no) = 5/14 = 0.357
• To compute P(X|Ci) for each class, compute following conditional probabilities
P(age = youth| buys_computer = yes) = 2/9 = 0.222
P(age = youth | buys_computer = no) = 3/5 = 0.6
P(income = medium | buys_computer = yes) = 4/9 = 0.444
P(income = medium | buys_computer = no) = 2/5 = 0.4
P(student = yes | buys_computer = yes) = 6/9 = 0.667
P(student = yes | buys_computer = no) = 1/5 = 0.2
P(credit_rating = fair | buys_computer = yes) = 6/9 = 0.667
P(credit_rating = fair | buys_computer = no) = 2/5 = 0.4
• P(X|Ci):
P(X|buys_computer = yes) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = no) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
• P(X|Ci)*P(Ci):
P(X|buys_computer = yes) * P(buys_computer = yes) = 0.044 x 0.643 = 0.028
P(X|buys_computer = no) * P(buys_computer = no) = 0.019 x 0.357 = 0.007
Since 0.028 > 0.007, X belongs to class (“buys_computer = yes”).
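The same hand calculation in a few lines of Python, with all probabilities read directly off this slide:

```python
# X = (age=youth, income=medium, student=yes, credit_rating=fair)
priors = {"yes": 9 / 14, "no": 5 / 14}
cond = {  # P(attribute value | class) from the buys_computer table
    "yes": [2 / 9, 4 / 9, 6 / 9, 6 / 9],
    "no":  [3 / 5, 2 / 5, 1 / 5, 2 / 5],
}

for cls in ("yes", "no"):
    likelihood = 1.0
    for p in cond[cls]:
        likelihood *= p               # naive independence assumption
    posterior = likelihood * priors[cls]
    print(f"P(X|{cls}) = {likelihood:.3f}, "
          f"P(X|{cls})*P({cls}) = {posterior:.3f}")
# -> 0.028 for 'yes' vs 0.007 for 'no': X is classified as buys_computer = 'yes'
```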
Comment on Naïve Bayes Classification
Advantages
Easy to implement
Good results obtained in most cases
Disadvantages
Assumes class-conditional independence, i.e. the effect of an attribute value on a given class is independent of the values of the other attributes; when this assumption is violated, accuracy is lost.
https://www.kaggle.com/code/skalskip/iris-data-visualization-
and-knn-classification
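The linked notebook walks through KNN on the Iris dataset. Below is a minimal self-contained sketch of the same idea; k = 5 and the 70/30 split are arbitrary choices here, not taken from the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

knn = KNeighborsClassifier(n_neighbors=5)  # classify by 5 nearest neighbors
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```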
Feature Scaling
• Feature scaling is a method to limit the range of variables so that they can be compared on common grounds.
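A minimal sketch with scikit-learn's two common scalers; the age/salary numbers are hypothetical and chosen only because the two columns live on very different scales.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical features on very different scales: age (years), salary (₹).
X = np.array([[25, 300000], [40, 1200000], [30, 500000], [55, 2500000]])

print(MinMaxScaler().fit_transform(X))    # rescales each column to [0, 1]
print(StandardScaler().fit_transform(X))  # zero mean, unit variance per column
```

Without scaling, distance-based methods such as KNN would be dominated by the salary column.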
What is Regression
Regression analysis is a statistical technique that attempts to explore and model the relationship between two or more variables.
Understanding Regression
• Dependent variable (The value to be predicted)
• Independent Variable (The predictors)
Types of Regression
• Linear Regression (straight-line model)
i. Simple linear regression (a single independent variable; the dependent variable is continuous)
ii. Multiple regression (more than one independent variable; the dependent variable is continuous)
• Logistic Regression (binary categorical outcome)
The logistic model is used to model the probability of a certain class or event, such as pass/fail, win/lose, alive/dead or healthy/sick.
It can be extended to model several classes of events, such as determining whether an image contains a cat, dog, lion, etc.
• For example, weight, height, and age are continuous variables, while a person's gender, occupation, or marital status are categorical (discrete) variables.
Cntd..
Example: Advertising through TV, Radio, Newspaper to increase sales
• How accurately can we predict future sales?
• Is the relationship linear?
• Is there synergy among the advertising media?
Cntd..
• Simple linear regression is a straightforward approach for predicting a quantitative response Y on the basis of a single predictor variable X.
Simple Linear Regression
[Figure: analyzing data - the independent variable (IV, experience) is used to predict the dependent variable (DV, salary)]
Simple Linear Regression
[Figure: salary (₹) plotted against experience; the fitted line is y = b0 + b1 * x1, i.e. SALARY = b0 + b1 * EXPERIENCE, with a slope of roughly +10K salary per +1 year of experience]
Simple Linear Regression
y = b0 + b1 * x1
where b0 is the constant (intercept) and b1 is the coefficient (slope along the experience axis).
Least Square Method
Worked example (x̄ = 3, ȳ = 4):

x   y   x-x̄   y-ȳ   (x-x̄)²   (x-x̄)(y-ȳ)
1   2   -2    -2     4         4
2   4   -1     0     1         0
3   5    0     1     0         0
4   4    1     0     1         0
5   5    2     1     4         2
Problem Statement
When we have a single input attribute (x) and we want to use linear regression, this is called simple linear regression.
If we had multiple input attributes (e.g. x1, x2, x3, etc.), this would be called multiple linear regression.
The procedure for simple linear regression is different from, and simpler than, that for multiple linear regression.
Cntd..
With simple linear regression we want to model our data as follows: y = B0 + B1 * x
This is a line where y is the output variable we want to predict, x is the input variable we know, and B0 and B1 are coefficients that we need to estimate to position the line.
Calculating the sum of these squared values gives us a denominator of 10. Now we can calculate the value of our slope: B1 = 8 / 10 = 0.8. The intercept then follows as B0 = ȳ - B1 * x̄ = 4 - 0.8 * 3 = 1.6.
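A short Python check of this worked example, using the x and y values from the table above:

```python
# Verifying the worked example: x = [1..5], y = [2, 4, 5, 4, 5].
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_mean, y_mean = sum(x) / n, sum(y) / n       # 3 and 4

num = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))  # 8
den = sum((xi - x_mean) ** 2 for xi in x)                         # 10

b1 = num / den             # slope = 0.8
b0 = y_mean - b1 * x_mean  # intercept = 4 - 0.8*3 = 1.6
print(f"y = {b0} + {b1} * x")
```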
Linear Regression
Simple linear regression models the relationship between a dependent variable and one independent variable using a linear function.
Use the following steps to fit a multiple linear regression model to a dataset (here, n = 8 observations with two predictors X1, X2 and a response y).
Multiple Linear Regression
Step 1: Calculate X1², X2², X1y, X2y and X1X2.
Multiple Linear Regression
Step 2: Calculate Regression Sums.
Next, make the following regression sum calculations:
Σx1² = ΣX1² - (ΣX1)² / n = 38,767 - (555)² / 8 = 263.875
Σx2² = ΣX2² - (ΣX2)² / n = 2,823 - (145)² / 8 = 194.875
Σx1y = ΣX1y - (ΣX1)(Σy) / n = 101,895 - (555)(1,452) / 8 = 1,162.5
Σx2y = ΣX2y - (ΣX2)(Σy) / n = 25,364 - (145)(1,452) / 8 = -953.5
Σx1x2 = ΣX1X2 - (ΣX1)(ΣX2) / n = 9,859 - (555)(145) / 8 = -200.375
Multiple Linear Regression
Step 3: Calculate b0, b1, and b2.
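The formulas for this step did not survive conversion. The sketch below uses the standard two-predictor least-squares formulas (an assumption, not taken from the slide) together with the regression sums from Step 2:

```python
# Regression sums from Step 2 (n = 8, ΣX1 = 555, ΣX2 = 145, Σy = 1,452).
Sx1x1, Sx2x2 = 263.875, 194.875
Sx1y, Sx2y, Sx1x2 = 1162.5, -953.5, -200.375
x1_bar, x2_bar, y_bar = 555 / 8, 145 / 8, 1452 / 8

# Standard two-predictor least-squares formulas (assumed here):
den = Sx1x1 * Sx2x2 - Sx1x2 ** 2
b1 = (Sx2x2 * Sx1y - Sx1x2 * Sx2y) / den   # ~ 3.148
b2 = (Sx1x1 * Sx2y - Sx1x2 * Sx1y) / den   # ~ -1.656
b0 = y_bar - b1 * x1_bar - b2 * x2_bar     # ~ -6.87
print(f"y_hat = {b0:.3f} + {b1:.3f}*x1 + {b2:.3f}*x2")
```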
Multiple Linear Regression
Step 5: Place b0, b1, and b2 in the estimated linear regression equation.
Residual
Residual: the degree of discrepancy between the assumed model and the observed data.
Residual (e): the difference between the observed value of the dependent variable (y) and the predicted value (ŷ):
ei = yi - ŷi
Each data point has one residual: Residual = [Observed value] - [Predicted value]
For a least-squares fit with an intercept, the residuals sum to zero (their mean is zero).
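A quick numeric illustration, reusing the earlier simple-regression fit ŷ = 1.6 + 0.8x on the worked-example data:

```python
# Residuals for the fit y_hat = 1.6 + 0.8x on x=[1..5], y=[2,4,5,4,5].
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = [1.6 + 0.8 * xi for xi in x]                 # predicted values
residuals = [yi - yhi for yi, yhi in zip(y, y_hat)]

print(residuals)       # approx [-0.4, 0.8, 1.0, -0.8, -0.6]
print(sum(residuals))  # ~0: least-squares residuals sum to zero
```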
Thinking Challenge
How would you draw a line through the points? How do you determine which line ‘fits best’?
[Scatter plot: Y (0-60) vs X (0-60)]
Thinking Challenge
How would you draw a line through the points? How do you determine which line ‘fits best’?
[Scatter plot with a candidate line: slope changed, intercept unchanged]
Thinking Challenge
How would you draw a line through the points? How do you determine which line ‘fits best’?
[Scatter plot with a candidate line: slope unchanged, intercept changed]
Thinking Challenge
How would you draw a line through the points? How do you determine which line ‘fits best’?
[Scatter plot with a candidate line: slope changed, intercept changed]
Least Squares (LS)
‘Best Fit’ means the differences between actual Y-values and predicted Y-values are a minimum.
But positive differences offset negative ones, so square the errors: LS minimizes the sum of squared errors,
Σi=1..n (Yi - Ŷi)² = Σi=1..n ε̂i²
[Figure: data points scattered about the fitted line Ŷi = β̂0 + β̂1Xi, with the residuals ε̂1 ... ε̂4 shown as vertical distances]
Coefficient Equations
• Prediction equation:
ŷi = β̂0 + β̂1 xi
• Sample slope:
β̂1 = SSxy / SSxx = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²
• Sample Y-intercept:
β̂0 = ȳ - β̂1 x̄
Linear classification
Nonlinearity
Logistic Regression
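A minimal sketch of the logistic model described earlier in this unit (probability of a binary outcome such as pass/fail); the hours-studied data here is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical binary outcome: hours studied -> pass (1) / fail (0).
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# The model outputs P(pass | hours) via the sigmoid of a linear function.
print(model.predict_proba([[2.2]]))  # class probabilities for 2.2 hours
print(model.predict([[2.2]]))        # hard class decision
```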