Unit2_ML

Uploaded by Priyanka Patil
Copyright © All Rights Reserved

Unit 2

Regression
Dr. ABHINANDAN P. SHIRAHATTI
Associate Professor,
Department of Computer Science Engineering,
KIT’s College of Engineering (Autonomous),
Kolhapur
Maharashtra – 416234

KIT | Department of Basic Sciences and Humanities | Course: INTRODUCTION TO PYTHON PROGRAMMING | Course Code : UHSES0111
Content
• Simple linear regression: hypothesis, cost function, parameter learning with gradient descent, learning rate, gradient descent for linear regression, examples.
• Simple linear regression in matrix form.
• Multivariate linear regression: multiple features, hypothesis functions.
• Gradient descent for multiple variables, feature scaling, polynomial regression.

Simple Linear Regression
• Simple Linear Regression is a type of regression algorithm that models the relationship between a dependent variable and a single independent variable.
• The relationship modelled by Simple Linear Regression is linear, i.e., a sloped straight line; "simple" refers to the fact that there is only one independent variable.
• The key point in Simple Linear Regression is that the dependent variable must be a continuous (real) value.
• The independent variable, however, can be continuous or categorical.

Simple Linear Regression
Simple Linear Regression has two main objectives:
• Modelling the relationship between the two variables, for example between income and expenditure, or between experience and salary.
• Forecasting new observations, for example forecasting the weather from temperature, or a company's revenue from its investments in a year.
Simple Linear Regression Model:
$y = a_0 + a_1 x + \varepsilon$
Where,
a0 = the intercept of the regression line (the value of y when x = 0)
a1 = the slope of the regression line; its sign tells whether the line is increasing or decreasing
ε = the error term (for a good model it is small)
Simple Linear Regression
Linear regression is a type of supervised machine learning algorithm that computes
the linear relationship between the dependent variable and one or more independent
features by fitting a linear equation to observed data.

When there is only one independent feature, it is known as Simple Linear Regression; when there is more than one feature, it is known as Multiple Linear Regression.

Similarly, when there is only one dependent variable, it is called Univariate Linear Regression, while when there is more than one dependent variable, it is known as Multivariate Regression.

What is the best Fit Line?
The best Fit Line equation provides a straight line that represents the
relationship between the dependent and independent variables.

The slope of the line indicates how much the dependent variable changes for a
unit change in the independent variable(s).

The goal of linear regression is to find the best-fit line, i.e., the line for which the error between the predicted and actual values is as small as possible: the best-fit line is the one with the least error.
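As a small illustration of "least error", the sketch below computes the sum of squared errors (SSE) of three candidate lines on a small dataset (the week/sales data used in the matrix-method example later in this unit). The function name `sse` and the candidate lines are illustrative assumptions; for this data the least-squares line is y = −1.5 + 2.2x, and it gives the smallest SSE of the three.

```python
# Small dataset: x = week, y = sales (in thousands), from the matrix-method example
x = [1, 2, 3, 4]
y = [1, 3, 4, 8]


def sse(a0, a1):
    """Sum of squared errors of the line y = a0 + a1 * x on the data above."""
    return sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))


# Compare three candidate lines; the best-fit line has the smallest error
candidates = {
    "y = 2x": (0.0, 2.0),
    "y = -1 + 2x": (-1.0, 2.0),
    "y = -1.5 + 2.2x (least squares)": (-1.5, 2.2),
}
for name, (a0, a1) in candidates.items():
    print(f"{name:35s} SSE = {sse(a0, a1):.2f}")
```

This prints SSE values of 6.00, 2.00 and 1.80 respectively: the line y = −1 + 2x looks close, but the least-squares line still has a smaller total error.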


Hypothesis function in Linear Regression
As assumed earlier, the independent feature X is the experience and the corresponding salary Y is the dependent variable.
Assuming a linear relationship between X and Y, the salary can be predicted using:
$\hat{y} = \theta_1 + \theta_2 x$
where θ1 is the intercept and θ2 is the coefficient (slope) of x.

The model gets the best regression fit line by finding the best θ1 and θ2 values.

Cost function for Linear Regression

A machine learning model should have a high level of accuracy in order to perform well in real-world applications. But how do we calculate the accuracy of the model, i.e., how well or poorly it will perform in the real world? This is where the cost function comes in.

A cost function measures how well a machine learning model performs on a given dataset. It calculates the difference between the expected (actual) values and the predicted values and represents it as a single real number.
Why use Cost Function?
Suppose we have a dataset that contains the heights and weights of cats and dogs, and we need to classify them accordingly. If we plot the records using these two features, we get a scatter plot as below:

In the above image, the green dots are cats, and the yellow dots are dogs. Below are the three possible solutions for this classification
problem.

All three classifiers have high accuracy, but the third solution is the best because it correctly classifies every data point. It is the best because it lies midway between the two classes, neither too close to nor too far from either of them.

To measure how good a solution is, we need a cost function. It calculates the difference between the actual values and the predicted values and measures how wrong the model's predictions are. By minimizing the value of the cost function, we can obtain the optimal solution.

Types of Cost Function

Regression Cost Function
Regression models are used to make predictions for continuous variables, such as house prices, weather and loan predictions. When a cost function is used with regression, it is known as a "Regression Cost Function". Here the cost is calculated from the error, i.e., the distance between the actual and the predicted output:
Error = Actual output − Predicted output
1. Mean error:
This cost function computes the error for every training example and takes the average of all these errors. Computing the mean of the errors is the most straightforward and intuitive method.
However, these errors can be negative or positive, so they can cancel each other out during the summation, and the average error of the model can be zero even when individual predictions are wrong.
Therefore, it is not a recommended cost function, but it is the basis for the other regression cost functions.
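A minimal sketch of the cancellation problem, using hypothetical values in which every prediction is off by exactly 1, yet the mean error comes out as zero:

```python
# Hypothetical actual and predicted values: every prediction is off by exactly 1
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [4.0, 4.0, 8.0, 8.0]

errors = [a - p for a, p in zip(actual, predicted)]  # Error = actual - predicted
mean_error = sum(errors) / len(errors)

print(errors)      # [-1.0, 1.0, -1.0, 1.0]
print(mean_error)  # 0.0: positive and negative errors cancel out
```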

Regression Cost Function
2. Mean squared error
Mean squared error is one of the most commonly used and earliest described regression measures. MSE is the mean of the squared differences between the predictions and the expected results. In other words, MSE is a variant of MAE that squares each difference instead of taking its absolute value, so the squared errors are never negative and cannot cancel each other out.

It is also known as L2 loss.

Because each error is squared, large deviations are penalized much more heavily than small ones when compared with MAE (errors smaller than 1 actually shrink when squared).
If the dataset has outliers that produce large prediction errors, squaring magnifies these errors many times over and leads to a much higher MSE.
Hence MSE is less robust to outliers.
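A minimal sketch of MSE and its sensitivity to outliers, with hypothetical values; the function name `mse` is illustrative. Replacing one actual value by an outlier (an error of 11 instead of 1) raises the MSE from 1.0 to 31.0, because that single error contributes 121 to the sum.

```python
def mse(actual, predicted):
    """Mean squared error: average of the squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


predicted = [4.0, 4.0, 8.0, 8.0]
actual = [3.0, 5.0, 7.0, 9.0]           # every error is +/-1
actual_outlier = [3.0, 5.0, 7.0, 19.0]  # last value replaced by an outlier (error 11)

print(mse(actual, predicted))          # 1.0
print(mse(actual_outlier, predicted))  # 31.0, i.e. (1 + 1 + 1 + 121) / 4
```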

Regression Cost Function
3. Mean absolute error
Mean absolute error is a regression metric that measures the average size of the errors for a group of predictions, without regard to their direction. In other words, the Mean Absolute Error (MAE) is the average of the absolute differences between the predicted and actual values, with all individual differences weighted equally. By calculating MAE, we can understand how wrong the model's predictions are on average.

• It is also known as L1 loss.
• It is more robust to outliers than MSE, so it can give better results when the dataset contains noise or outliers.
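A minimal sketch comparing MAE with MSE on hypothetical values, with and without one outlier; the function names are illustrative. The outlier raises MAE from 1.0 to 3.5, but raises MSE from 1.0 to 31.0, which is what "more robust to outliers" means here.

```python
def mae(actual, predicted):
    """Mean absolute error: average of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error, computed here only for comparison."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


predicted = [4.0, 4.0, 8.0, 8.0]
actual = [3.0, 5.0, 7.0, 9.0]           # every error is +/-1
actual_outlier = [3.0, 5.0, 7.0, 19.0]  # one outlier with error 11

for label, act in [("no outlier", actual), ("with outlier", actual_outlier)]:
    print(label, "MAE =", mae(act, predicted), "MSE =", mse(act, predicted))
```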
Regression Cost Function
4. Root mean squared error (RMSE)
It is the square root of the mean of the squares of all the errors, i.e., the square root of the MSE. Root Mean Square Error (RMSE) measures the error between two sets of values; in other words, it compares observed (known) values with predicted values.
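A minimal sketch of RMSE on hypothetical values; the function name `rmse` is illustrative. Taking the square root brings the error back to the same units as the target variable, which makes RMSE easier to interpret than MSE.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared error."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


actual = [3.0, 5.0, 7.0, 19.0]
predicted = [4.0, 4.0, 8.0, 8.0]
print(round(rmse(actual, predicted), 3))  # 5.568, i.e. sqrt(31)
```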

Solved Example on Regression Cost Function

Regression Cost Function


Gradient Descent in Linear Regression
Gradient Descent is an iterative optimization algorithm that tries to find the optimum value (minimum or maximum) of an objective function.
It is one of the most widely used optimization techniques in machine learning for updating the parameters of a model in order to minimize a cost function.
The main aim of gradient descent is to find the model parameters that minimize the cost function on the training data; a model that fits well and generalizes should then also achieve high accuracy on the testing data.
In gradient descent, the gradient is a vector that points in the direction of the steepest increase of the function at a given point.
Moving in the opposite direction of the gradient allows the algorithm to gradually descend towards lower values of the function and eventually reach a minimum. For a convex cost function, such as the MSE of linear regression, this is the global minimum, provided the step size is small enough.
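A minimal sketch of the idea on a one-dimensional, hypothetical objective f(w) = (w − 3)², whose minimum is at w = 3. The function name and the two step-size (learning-rate) values are illustrative assumptions. Each update moves w against the gradient f′(w) = 2(w − 3); with a learning rate of 0.1 the distance to the minimum shrinks by a factor of 0.8 per step, while with 1.1 it is multiplied by −1.2 per step, so the iterates overshoot and diverge.

```python
def gradient_descent_1d(learning_rate, steps, w=0.0):
    """Minimise f(w) = (w - 3)**2 by repeatedly stepping against its gradient."""
    for _ in range(steps):
        gradient = 2 * (w - 3)            # f'(w): points towards increasing f
        w = w - learning_rate * gradient  # move in the opposite direction
    return w


print(gradient_descent_1d(learning_rate=0.1, steps=100))  # very close to the minimum at w = 3
print(gradient_descent_1d(learning_rate=1.1, steps=50))   # step too large: w diverges
```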

Does Every ML Algorithm Rely on Gradient Descent?

Steps Required in Gradient Descent Algorithm
Step 1: Initialize the parameters of the model randomly.
Step 2: Compute the gradient of the cost function with respect to each parameter, i.e., take the partial derivative of the cost function with respect to each parameter.
Step 3: Update the parameters by taking a step in the opposite direction of the gradient. The step size is controlled by a hyperparameter called the learning rate, denoted α: each parameter θj is updated as θj := θj − α ∂J/∂θj, where J is the cost function.
Step 4: Repeat steps 2 and 3 until the cost stops decreasing (convergence), which gives the best parameters for the defined model.
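The four steps above can be sketched for simple linear regression with NumPy. This assumes the cost J = (1/2n) Σ (θ0 + θ1x − y)², where the factor 1/2 is a common convention that simplifies the derivatives; the function name, α = 0.05 and 5000 iterations are illustrative choices for the small week/sales dataset used in the matrix-method example of this unit. The result should match the closed-form solution, θ0 = −1.5 and θ1 = 2.2.

```python
import numpy as np


def gradient_descent_linreg(x, y, alpha=0.05, iterations=5000, seed=0):
    """Fit y = theta0 + theta1 * x by batch gradient descent on
    J = (1 / (2n)) * sum((theta0 + theta1 * x - y) ** 2)."""
    rng = np.random.default_rng(seed)
    theta0, theta1 = rng.normal(size=2)  # Step 1: random initialisation
    n = len(x)
    for _ in range(iterations):          # Step 4: repeat steps 2 and 3
        error = theta0 + theta1 * x - y  # predicted minus actual
        grad0 = error.sum() / n          # Step 2: dJ/dtheta0
        grad1 = (error * x).sum() / n    #         dJ/dtheta1
        theta0 -= alpha * grad0          # Step 3: step against the gradient,
        theta1 -= alpha * grad1          #         scaled by the learning rate
    return theta0, theta1


weeks = np.array([1, 2, 3, 4], dtype=float)
sales = np.array([1, 3, 4, 8], dtype=float)
theta0, theta1 = gradient_descent_linreg(weeks, sales)
print(f"theta0 = {theta0:.4f}, theta1 = {theta1:.4f}")  # theta0 = -1.5000, theta1 = 2.2000
```

The learning rate matters: for this particular data, values of α above roughly 0.24 make the updates overshoot, so calling `gradient_descent_linreg(weeks, sales, alpha=0.3, iterations=200)` makes the parameters grow without bound instead of converging.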

Linear Regression –Solved Example
1. Simple Linear Regression:
Assume that there is a single independent variable X and a dependent variable Y, and that the relationship between X and Y is modelled by:
Y = a + bX
The steps to find the values of a and b are given below:

Solution
Step 2

Step 3:

Solution
Step 1

Step 2

Linear Regression using Matrix Method
Find the linear regression of the weekly product sales data (in thousands) given in the table below, using linear regression in matrix form. Then predict the sales for the 5th and 6th weeks.

xi (week)    yi (sales in thousands)
1            1
2            3
3            4
4            8

Linear Regression Matrix method
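Here, the independent variable X is represented as a matrix:
Xᵀ = [1 2 3 4]
The dependent variable Y is represented as a matrix:
Yᵀ = [1 3 4 8]
The data can be written in matrix form as follows:
$X = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{bmatrix}, \quad Y = \begin{bmatrix} 1 \\ 3 \\ 4 \\ 8 \end{bmatrix}$
The first column (all ones) is used for the bias (intercept) term.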

Linear Regression Matrix method
The regression is given below:

Step-by-step computation is given below:
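A minimal NumPy sketch of the same computation, assuming the standard normal equation β = (XᵀX)⁻¹XᵀY. It uses `np.linalg.solve` instead of forming the inverse explicitly, which gives the same result but is numerically safer; variable names are illustrative. For this data XᵀX = [[4, 10], [10, 30]] and XᵀY = [16, 51], giving y = −1.5 + 2.2x.

```python
import numpy as np

weeks = np.array([1, 2, 3, 4], dtype=float)
sales = np.array([1, 3, 4, 8], dtype=float)

# Design matrix: first column of ones for the bias (intercept), second column = week
X = np.column_stack([np.ones_like(weeks), weeks])

XtX = X.T @ X      # [[4, 10], [10, 30]]
XtY = X.T @ sales  # [16, 51]
# Normal equation beta = (X^T X)^(-1) X^T Y, solved as the linear system (X^T X) beta = X^T Y
beta = np.linalg.solve(XtX, XtY)
intercept, slope = beta

new_weeks = np.array([5, 6], dtype=float)
predicted = intercept + slope * new_weeks
print(f"y = {intercept:.2f} + {slope:.2f} x")                          # y = -1.50 + 2.20 x
print(f"week 5: {predicted[0]:.1f}, week 6: {predicted[1]:.1f}")  # week 5: 9.5, week 6: 11.7
```

So the predicted sales are about 9.5 thousand in week 5 and 11.7 thousand in week 6.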

Linear Regression Matrix Method

Linear Regression using least squares method
The least squares method is a form of mathematical regression analysis used to
determine the line of best fit for a set of data, providing a visual demonstration
of the relationship between the data points.

Each data point represents the relationship between a known independent variable and an unknown dependent variable.

This method is commonly used by statisticians and traders who want to identify trading opportunities and trends.

Linear Regression using least squares method
• Here, X is the independent variable and Y is the dependent variable.
• Equation of linear regression: Y = a + bX, where a is the intercept and b is the slope or coefficient.
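A minimal sketch of the least squares method using the standard closed-form formulas b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and a = (Σy − bΣx) / n. The function name is illustrative. Applied to the week/sales data of the matrix-method example, it gives the same line, Y = −1.5 + 2.2X.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = (sum_y - b * sum_x) / n                                  # intercept
    return a, b


a, b = least_squares_line([1, 2, 3, 4], [1, 3, 4, 8])
print(a, b)  # -1.5 2.2
```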

Linear Regression using least square method

Multiple Linear Regression
Multiple Linear Regression is an extension of simple linear regression that models the relationship between two or more independent variables (predictors) and a dependent variable (outcome). The goal is to establish a linear relationship between the predictors and the outcome, allowing us to predict the outcome from the values of the predictors.
• Mathematical representation: the general form of a multiple linear regression model is:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \varepsilon$
Where:
y is the dependent variable (what you are trying to predict).
x1, x2, …, xn are the independent variables (predictors).
β0 is the intercept (the value of y when all xi are 0).
β1, β2, …, βn are the coefficients (the change in y for a one-unit change in the corresponding xi, holding the other predictors fixed).
ε is the error term (the difference between the predicted and actual values).
Key Concepts of Multiple Regression
• Linearity: The relationship between the dependent variable and each independent variable is linear.
• Multicollinearity: This occurs when independent variables are highly correlated with each other, which can make it difficult to determine the effect of each predictor on the dependent variable.
• Residuals: The difference between the observed value and the value predicted by the model. Ideally, residuals should be randomly distributed with a mean of zero.
• R-squared (R²): A statistical measure that represents the proportion of the variance of the dependent variable that is explained by the independent variables in the model. The value of R² ranges from 0 to 1.

Steps to Implement Multiple Linear Regression

1. Data Collection: Gather the dataset that includes the dependent variable and the independent variables.
2. Exploratory Data Analysis (EDA): Analyze the data to understand the relationships, distributions, and potential issues such as missing values.
3. Data Preprocessing:
   1. Handling missing data: Impute or remove missing values.
   2. Feature scaling: Standardize or normalize the data if necessary, especially if the predictors have different units or scales.
4. Splitting the Dataset: Divide the data into training and testing sets to evaluate the model's performance.
5. Building the Model:
   1. Choose a machine learning framework or library (e.g., scikit-learn in Python).
   2. Fit the model to the training data.
   3. Estimate the coefficients β0, β1, β2, …, βn using methods such as Ordinary Least Squares (OLS).
Multiple Linear Regression

6. Model Evaluation:
   1. Use metrics such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R² to evaluate the model.
   2. Check for multicollinearity using the Variance Inflation Factor (VIF).
   3. Analyze residuals to check assumptions such as homoscedasticity (constant variance of residuals).
7. Prediction: Use the model to make predictions on new data.
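A minimal end-to-end sketch of steps 1 and 3 to 7, written with NumPy only so that it is self-contained (the slides mention scikit-learn, which provides ready-made equivalents). The data is synthetic and made up for illustration: three predictors on very different scales with known coefficients plus noise. EDA, missing-value handling, VIF and residual analysis are omitted for brevity; the 80/20 split and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# 1. Data collection (synthetic, for illustration only): 3 predictors on very
#    different scales and a known linear relationship plus random noise.
n = 200
X = rng.normal(size=(n, 3)) * np.array([1.0, 10.0, 100.0])
true_beta = np.array([5.0, 2.0, -0.5, 0.03])  # [beta0, beta1, beta2, beta3]
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.5, size=n)

# 4. Split into training (80%) and testing (20%) sets
idx = rng.permutation(n)
train, test = idx[:160], idx[160:]
X_train, X_test, y_train, y_test = X[train], X[test], y[train], y[test]

# 3. Feature scaling: standardise using statistics of the training set only
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train_s = (X_train - mu) / sigma
X_test_s = (X_test - mu) / sigma


def add_bias(A):
    """Prepend a column of ones so that beta[0] acts as the intercept."""
    return np.column_stack([np.ones(len(A)), A])


# 5. Estimate the coefficients by ordinary least squares
beta, *_ = np.linalg.lstsq(add_bias(X_train_s), y_train, rcond=None)

# 6. Evaluate on the test set with MSE, RMSE and R^2
y_pred = add_bias(X_test_s) @ beta
mse = np.mean((y_test - y_pred) ** 2)
rmse = np.sqrt(mse)
r2 = 1 - np.sum((y_test - y_pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)
print(f"MSE = {mse:.3f}, RMSE = {rmse:.3f}, R^2 = {r2:.3f}")

# 7. Predict for a new observation (scaled with the training statistics)
x_new = np.array([[0.5, -3.0, 120.0]])
print(add_bias((x_new - mu) / sigma) @ beta)
```

Note that the coefficients in `beta` refer to the standardised features; they are not directly comparable with `true_beta`, which is expressed in the original units.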

Example
Suppose you are trying to predict the price of a house (y) from the size of the house (x1), the number of bedrooms (x2), and the age of the house (x3).
The model might look like this:

Price = β0 + β1×Size + β2×Bedrooms + β3×Age + ε

Here:
β1: how much the price increases for each additional square foot of size, holding the other features fixed.
β2: the effect on price of adding one more bedroom.
β3: how the price changes as the house gets older, per additional year of age.
Advantages:
• It can model complex relationships between the dependent variable and multiple predictors.
• The coefficients provide interpretable insights into how each predictor affects the outcome.
Disadvantages:
• Sensitive to outliers and multicollinearity.
• Assumes a linear relationship, which might not always be the case.
Applications:
• Predicting real estate prices.
• Estimating the impact of marketing strategies on sales.
• Modeling the relationship between various health indicators and patient outcomes.
Mathematical model for MR

Multi Linear Regression problem solved with matrix method

Solution Steps

Steps

Step

Steps

Example 2
• The data in the table relate the weight of a person, Y, to:
• X1, the weight of carbohydrates in grams
• X2, the weight of protein in grams
• Obtain the regression equation.
• Also predict the weight of a person with carbohydrate weight x1 = 8 and protein weight x2 = 7.6.

Solution steps
1. Represent the inputs in matrix form:

Steps

Multivariate Linear Regression
Multivariate Linear Regression and Multiple Linear Regression usually refer to the same concept in the context of linear regression modelling.
Both terms describe a linear regression model in which multiple independent variables (features) are used to predict a single dependent variable (target).
In other words, both terms mean a linear regression model with more than one predictor variable.
Linear regression is a crucial machine learning method used for predicting a continuous target variable from one or more independent features. When there is more than one independent feature, it is called multivariate linear regression.
Mathematical model:
$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_n x_{ni} + \varepsilon_i$

Multivariate Linear Regression

Multivariate Linear Regression Matrix format

Encoding Categorical Data for Machine Learning Models

Feature Scaling

Feature Scaling is a technique for bringing the independent features present in the data into a fixed range. It is performed during data pre-processing to handle features with highly varying magnitudes, values or units.

What is Feature Scaling?
• In general, a dataset contains different types of variables with different magnitudes and units (kilograms, grams, age in years, salary in thousands, etc.).
• The main issue is that these variables can differ greatly in their range of values.
• A feature with a large range of values can then dominate the other variables.
• Models can become biased towards these high-range features.
• To overcome this problem, we perform feature scaling.
• The goal of feature scaling is to put all features on roughly the same scale, so that each feature is equally important and the data is easier for most ML algorithms to process.

Why use Feature Scaling?

In machine learning, feature scaling is employed for a number of purposes:

• Scaling ensures that all features are on a comparable scale and have comparable ranges. This process is known as feature normalization.
• This matters because the magnitude of the features affects many machine learning techniques.
• Larger-scale features may dominate the learning process and have an excessive impact on the outcomes.
• Scaling the features avoids this problem and makes sure that each feature contributes equally to the learning process.
• Algorithm performance improvement: when the features are scaled, several machine learning methods, including gradient descent-based algorithms, distance-based algorithms (such as k-nearest neighbours) and support vector machines, perform better or converge more quickly.
Why use Feature Scaling?

• Preventing numerical instability: numerical instability can be prevented by avoiding large scale disparities between features.
• Examples include distance calculations or matrix operations, where features with radically different scales can lead to numerical overflow or underflow problems.
• Scaling the features mitigates these issues and keeps the computations stable.
• Scaling ensures that each feature is given the same consideration during the learning process.
• Without scaling, larger-scale features could dominate the learning and produce skewed outcomes.
• Scaling removes this bias and ensures that each feature contributes fairly to the model's predictions.

Why and how high range of features impact model performance?

In the table below, Age and Salary have very different ranges of values. When we train a model, it might give high importance to the Salary column simply because of its larger range of values. That need not reflect reality: both columns may have an equal or nearly equal impact on the target variable, for example whether a person will buy a house, which could depend on both age and salary. In the case of buying a house, age and salary are both important, so here we need feature scaling.

Let's understand the above points with some mathematics.
In machine learning everything is measured in numbers. When we want to identify nearest neighbours, or the similarity or dissimilarity of data points, we calculate the distance between them and, based on these distances, decide whether two points are similar or not.
When a feature is considered with respect to the target variable, similarity indicates how much that feature impacts the target variable.
Let's understand this with an example. There are many methods to calculate distance; here we use the Euclidean distance, d(p, q) = √(Σᵢ (pᵢ − qᵢ)²).

Why use Feature Scaling?

• The task is to identify the neighbours of emp2. From the distances above, emp3 is closer to emp2, since emp2 and emp3 are separated by the smaller distance.
• But if we simply increase the salary values (for example, by changing the unit in which salary is recorded), this distance increases, and it would then imply that emp2 and emp3 are not similar.
• So the idea behind feature scaling is that the range of values should not affect how a feature behaves.
• Before comparing two entities, we should first bring them to the same scale; only then is the comparison meaningful.
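A minimal sketch of this effect with hypothetical employees described by (age, salary); the names, values and helper function are made up for illustration. Merely recording salary in rupees instead of thousands changes which employee is nearest to emp2, because the salary column then dominates the Euclidean distance. `math.dist` requires Python 3.8 or later.

```python
import math

# Hypothetical employees described by (age in years, salary in thousands)
salary_in_thousands = {
    "emp1": (40, 50),
    "emp2": (30, 50),
    "emp3": (32, 52),
}
# The same employees with salary expressed in rupees instead of thousands
salary_in_rupees = {
    name: (age, salary * 1000) for name, (age, salary) in salary_in_thousands.items()
}


def nearest_to(target, data):
    """Name of the employee closest to `target` by Euclidean distance."""
    distances = {
        name: math.dist(data[target], point)
        for name, point in data.items()
        if name != target
    }
    return min(distances, key=distances.get)


print(nearest_to("emp2", salary_in_thousands))  # emp3
print(nearest_to("emp2", salary_in_rupees))     # emp1
```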

Feature Scaling Techniques
In machine learning, the two most commonly used feature scaling techniques are:
• Normalization
• Standardization

Normalization
Normalization is also known as min-max normalization or min-max scaling.
Normalization rescales values to the range 0 to 1 using x' = (x − min(x)) / (max(x) − min(x)).
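A minimal sketch of min-max normalization for a single feature; the function name and the age values are illustrative. Returning zeros for a constant feature (where max equals min) is an assumption made here to avoid dividing by zero.

```python
def min_max_scale(values):
    """Rescale values to [0, 1] with x' = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid division by zero (assumed convention)
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [20, 30, 40, 60]
print(min_max_scale(ages))  # [0.0, 0.25, 0.5, 1.0]
```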

Normalization

Standardization
Standardization is also known as z-score normalization.
In standardization, features are scaled to have zero mean and unit standard deviation, using z = (x − μ) / σ, where μ is the mean and σ is the standard deviation of the feature.
After standardization, each feature has mean = 0 and standard deviation = 1.
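A minimal sketch of z-score standardization for a single feature, using Python's `statistics` module; the function name and the data are illustrative. It assumes the population standard deviation (`pstdev`); using the sample standard deviation instead is also common. For this data the mean is 5 and the standard deviation is 2.

```python
import statistics


def standardize(values):
    """z = (x - mean) / std, using the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


data = [2, 4, 4, 4, 5, 5, 7, 9]
z = standardize(data)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```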

Normalization vs Standardization
If a feature (column) has outliers, min-max normalization will squeeze most of the data into a small interval, so it does not handle outliers well.
Standardization is more robust to outliers and, in many cases, it is preferable to min-max normalization.
Normalization is a good choice when your data does not follow a normal distribution.
This can be useful in algorithms that do not assume any distribution of the data, such as k-nearest neighbours and neural networks.
Standardization, on the other hand, can be helpful when the data follows a normal distribution, although this does not necessarily have to be true.
Also, unlike normalization, standardization does not have a bounding range, so outliers in your data are not forced into a fixed interval by standardization.

When to do Feature Scaling? After Train-Test Split or Before?
In general, feature scaling should be done after the train-test split to avoid data leakage.

If we scale before the split, the scaling parameters are computed using the test data as well, so the training data carries information about the test data. The model may then perform well on the test data but not when it comes to actual predictions on unseen data.
Data Leakage:
Data leakage occurs when information from outside the training dataset is used to create the model. This can allow the model to learn or know something that it otherwise would not know and, in turn, invalidate the estimated performance of the model being constructed.

Preferred way:
• Split the data into training and test sets.
• Compute the scaling parameters on the training set and scale the training set.
• Build the model on the training set.
• Use the scaling parameters (mean / standard deviation) computed on the training set to scale
the test data separately. For example, with standardization, scale the test features using the
training set's mean and standard deviation, rather than computing a new mean and standard
deviation from the test data.
• Then evaluate the model on the test set.
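The steps above can be sketched with NumPy as follows. The dataset is randomly generated and the 80/20 split ratio is an arbitrary assumption; the point is that `mu` and `sigma` come from `X_train` only and are reused unchanged on `X_test`.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 100 samples, 2 features on very different scales.
X = rng.normal(loc=[50.0, 0.5], scale=[10.0, 0.1], size=(100, 2))

# 1. Split the data (shuffle the row indices, then take an 80/20 split).
idx = rng.permutation(len(X))
X_train, X_test = X[idx[:80]], X[idx[80:]]

# 2. Compute the scaling parameters on the training set only.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

# 3. Scale both sets with the *training* mean and standard deviation.
X_train_scaled = (X_train - mu) / sigma
X_test_scaled = (X_test - mu) / sigma   # no statistics are computed from X_test

# Training features now have mean ~0 and std ~1; test features are only
# approximately standardized, which is expected.
print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))
```

In scikit-learn the same pattern is usually written as `StandardScaler().fit(X_train)` followed by `.transform(X_train)` and `.transform(X_test)`.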

ML Algorithms which need feature Scaling

Polynomial Regression
Polynomial Regression is a form of linear regression in which the relationship
between the independent variable x and dependent variable y is modelled as
an nth-degree polynomial. Polynomial regression fits a nonlinear relationship
between the value of x and the corresponding conditional mean of y.
Why Polynomial Regression?
Polynomial regression is a type of regression analysis used in statistics and machine
learning when the relationship between the independent variable (input) and the dependent
variable (output) is not linear. While simple linear regression models the relationship as a
straight line, polynomial regression allows for more flexibility by fitting a polynomial
equation to the data.
When the relationship between the variables is better represented by a curve rather
than a straight line, polynomial regression can capture the non-linear patterns in the data.

Polynomial Regression
• It is also called a special case of Multiple Linear Regression, because we add polynomial terms
(x², x³, …, xⁿ) to the Multiple Linear Regression equation to convert it into Polynomial Regression.
• It is still a linear model (linear in the coefficients), modified with polynomial terms so that it
can fit curved data more accurately.
• It is used when the training data is non-linear in nature.
• It uses a linear regression model to fit complicated, non-linear functions and datasets.
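A minimal sketch of the idea that polynomial regression is linear regression on polynomial features, assuming ordinary least squares via NumPy. The name `fit_polynomial` and the noise-free data from y = 1 + 2x + 3x² are hypothetical.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Fit y = a0 + a1*x + ... + an*x^n by ordinary least squares.

    The terms x, x^2, ..., x^n are treated as separate features,
    so the model stays linear in the coefficients a0..an.
    """
    x = np.asarray(x, dtype=float)
    # Design matrix with columns [1, x, x^2, ..., x^n].
    X = np.vander(x, N=degree + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coeffs  # [a0, a1, ..., an]

# Hypothetical data generated from y = 1 + 2x + 3x^2 (no noise).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1 + 2 * x + 3 * x**2
print(np.round(fit_polynomial(x, y, degree=2), 6))  # approximately a0=1, a1=2, a2=3
```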

Need for Polynomial Regression
• If we apply a linear model to a linear dataset, it gives a good result, as we saw in
Simple Linear Regression. But if we apply the same model without any modification to a non-linear
dataset, it fits the data poorly: the loss function increases, the error rate is high, and accuracy
decreases.
• So for such cases, where the data points are arranged in a non-linear fashion, we need the
Polynomial Regression model. We can understand this better using the comparison
diagram below of a linear dataset and a non-linear dataset.
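A small sketch of the point above, assuming a hypothetical noise-free dataset y = x² and using NumPy's `polyfit`: a straight line (degree 1) leaves a large mean squared error, while a degree-2 polynomial fits the data essentially exactly.

```python
import numpy as np

# Hypothetical non-linear dataset: y = x^2 sampled at 13 points on [-3, 3].
x = np.linspace(-3, 3, 13)
y = x**2

def mse_of_fit(degree):
    # np.polyfit returns coefficients with the highest power first.
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    return np.mean((y - y_hat) ** 2)

print("degree 1 (straight line) MSE:", mse_of_fit(1))  # about 9.625
print("degree 2 (polynomial)    MSE:", mse_of_fit(2))  # about 0
```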

Equation of the Polynomial Regression Model

Simple Linear Regression equation:
$$y = a_0 + a_1 x \qquad \text{(a)}$$
Multiple Linear Regression equation:
$$y = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 + \dots + a_n x_n \qquad \text{(b)}$$
Polynomial Regression equation:
$$y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots + a_n x^n \qquad \text{(c)}$$
or, in summation form,
$$y = \sum_{i=0}^{n} a_i x^i$$
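To see why polynomial regression is a special case of multiple linear regression, a small sketch: setting the features of equation (b) to x1 = x, x2 = x², x3 = x³ makes it evaluate to equation (c). The coefficients and the function names `multiple_linear` and `polynomial` are hypothetical.

```python
import numpy as np

def multiple_linear(a, features):
    """Equation (b): y = a0 + a1*x1 + a2*x2 + ... + an*xn."""
    return a[0] + np.dot(a[1:], features)

def polynomial(a, x):
    """Equation (c): y = sum over i of a_i * x**i."""
    return sum(a_i * x**i for i, a_i in enumerate(a))

a = np.array([1.0, -2.0, 0.5, 3.0])   # hypothetical coefficients a0..a3
x = 2.0
# Feature map: x1 = x, x2 = x^2, x3 = x^3.
features = np.array([x**i for i in range(1, len(a))])

print(multiple_linear(a, features), polynomial(a, x))  # 23.0 23.0
```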
Polynomial Regression
The problem of non-linear regression can be solved by two methods:

1. Transform the non-linear data into linear data, so that linear regression can handle
the data.
2. Use polynomial regression.
Transformations:
The trick is to convert the non-linear data into a form that can be handled by the linear regression method.
Let us consider an exponential function: $y = a e^{bx}$
The transformation can be done by applying the natural log to both sides to get:

$$\ln(y) = \ln(a) + \ln(e^{bx})$$
$$\ln(y) = \ln(a) + bx \ln(e)$$
$$\ln(y) = \ln(a) + bx$$
Since $\ln(e) = 1$, this is a straight line in x with intercept $\ln(a)$ and slope $b$, so it can be fitted with simple linear regression.
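A minimal sketch of this transformation, assuming hypothetical noise-free data from y = 2e^{0.5x} (y must be positive for the log to exist): fit a straight line to (x, ln y), then take a = e^{intercept} and b = slope.

```python
import numpy as np

# Hypothetical data from y = a * exp(b*x) with a = 2, b = 0.5 (no noise, y > 0).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * np.exp(0.5 * x)

# Take ln of both sides: ln(y) = ln(a) + b*x, a straight line in x.
Y = np.log(y)
b, ln_a = np.polyfit(x, Y, 1)   # degree-1 fit returns [slope, intercept]
a = np.exp(ln_a)                # undo the log to recover a

print(round(a, 6), round(b, 6))  # approximately 2.0 and 0.5
```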

Polynomial Regression

Generally, polynomials of degree at most 4 are used, because higher-order polynomials take
strange shapes and make the curve overly flexible.

This leads to overfitting, so higher degrees are avoided.
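A small sketch of this overfitting effect, assuming six hypothetical, roughly linear points and NumPy's `polyfit`. A degree-5 polynomial passes through every training point (training error essentially 0) but swings away from the trend just outside the data.

```python
import numpy as np

# Hypothetical data: roughly linear with small deviations (6 points).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 1.9, 4.2, 5.8, 8.1, 9.9])

for degree in (1, 5):
    coeffs = np.polyfit(x, y, degree)
    train_mse = np.mean((y - np.polyval(coeffs, x)) ** 2)
    # Prediction just outside the data range, where over-flexible curves swing.
    print(degree, round(train_mse, 6), round(np.polyval(coeffs, 6.0), 3))
```

Here the straight line predicts about 11.92 at x = 6, following the trend, while the degree-5 curve, despite its perfect training fit, predicts about 2.2.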

Consider a polynomial of degree 2.

The polynomial equation is given by: $y = a_0 + a_1 x + a_2 x^2$

$a_0$, $a_1$ and $a_2$ are calculated using the formula $a = X^{-1} B$

Where,
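A minimal sketch of $a = X^{-1}B$ for the degree-2 case, assuming X and B are the usual least-squares normal-equation matrices built from sums of powers of x (listed in the docstring). The function name and the data, which lie exactly on y = 3 + 2x + x², are hypothetical; `np.linalg.solve` is used instead of forming $X^{-1}$ explicitly, which gives the same result more stably.

```python
import numpy as np

def quadratic_normal_equations(x, y):
    """Solve for [a0, a1, a2] in y = a0 + a1*x + a2*x^2 via a = X^{-1} B.

    Assumes the standard least-squares normal equations:
        X = [[n,       sum x,   sum x^2],
             [sum x,   sum x^2, sum x^3],
             [sum x^2, sum x^3, sum x^4]]
        B = [sum y, sum x*y, sum x^2*y]
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    s = [np.sum(x**k) for k in range(5)]   # s[k] = sum of x^k, s[0] = n
    X = np.array([[s[0], s[1], s[2]],
                  [s[1], s[2], s[3]],
                  [s[2], s[3], s[4]]])
    B = np.array([np.sum(y), np.sum(x * y), np.sum(x**2 * y)])
    # Solving X a = B is numerically preferable to computing X^{-1} explicitly.
    return np.linalg.solve(X, B)

x = [1, 2, 3, 4, 5]        # hypothetical data
y = [6, 11, 18, 27, 38]    # exactly y = 3 + 2x + x^2
print(np.round(quadratic_normal_equations(x, y), 6))  # approximately [3, 2, 1]
```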

Problem Solution

END
