Unit2_ML

Uploaded by Priyanka Patil


Unit 2

Regression
Dr. ABHINANDAN P. SHIRAHATTI
Associate Professor,
Department of Computer Science Engineering,
KIT’s College of Engineering (Autonomous),
Kolhapur
Maharashtra – 416234

KIT | Department of Basic Sciences and Humanities | Course: INTRODUCTION TO PYTHON PROGRAMMING | Course Code : UHSES0111
Content
Simple linear regression – hypothesis, cost function, parameter learning
with gradient descent, learning rate, gradient descent for linear
regression, examples.
Simple linear regression in matrix form.
Multivariate linear regression – multiple features, hypothesis functions.
Gradient descent for multiple variables, feature scaling, polynomial
regression.

Simple Linear Regression
• Simple Linear Regression is a type of regression algorithm that models the
relationship between a dependent variable and a single independent variable.
• The relationship shown by a Simple Linear Regression model is linear, i.e. a sloped
straight line, hence "linear"; it is "simple" because there is only one independent variable.
• The key point in Simple Linear Regression is that the dependent variable
must be continuous (real-valued).
• However, the independent variable can be either continuous or categorical.

Simple Linear Regression
Simple Linear Regression has two main objectives:
• Modelling the relationship between the two variables, such as the relationship between
income and expenditure, or between experience and salary.
• Forecasting new observations, such as forecasting the weather from the
temperature, or a company's revenue from its investments in a year.
Simple Linear Regression Model:
$$y = a_0 + a_1 x + \varepsilon$$
Where,
$a_0$ = the intercept of the regression line (the value of y obtained by putting x = 0)
$a_1$ = the slope of the regression line; its sign tells whether the line is increasing or
decreasing, and its magnitude tells how steeply.
$\varepsilon$ = the error term (for a good model it will be small)
Simple Linear Regression
Linear regression is a type of supervised machine learning algorithm that computes
the linear relationship between the dependent variable and one or more independent
features by fitting a linear equation to observed data.

When there is only one independent feature, it is known as Simple Linear Regression.

When there is more than one feature, it is known as Multiple Linear Regression.

Similarly, when there is only one dependent variable, it is considered Univariate
Linear Regression, while when there is more than one dependent variable, it is
known as Multivariate Regression.

What is the best Fit Line?
The best-fit line equation provides a straight line that represents the
relationship between the dependent and independent variables.

The slope of the line indicates how much the dependent variable changes for a
unit change in the independent variable(s).

The goal of linear regression is to locate the best-fit line, which means that the error
between the predicted and actual values should be kept to a minimum.

The best-fit line is the line with the least total error, typically measured as the sum of the squared differences between the actual and predicted values.

Hypothesis function in Linear Regression
As assumed earlier, our independent feature is experience (X) and the
corresponding salary (Y) is the dependent variable.
Assuming a linear relationship between X and Y, the salary can
be predicted using:
$$\hat{Y} = \theta_1 + \theta_2 X$$
where $\theta_1$ is the intercept and $\theta_2$ is the coefficient (slope) of X.

The model gets the best regression fit line by finding the best θ1 and θ2 values.
Cost function for Linear Regression

A machine learning model should have a high level of accuracy in order to
perform well in real-world applications. But how do we measure the accuracy of
the model, i.e., how well or poorly it will perform in the real world? This is
where the cost function comes in.

A cost function measures how well a machine
learning model performs on a given dataset. It calculates the difference
between the expected values and the predicted values and represents it as a single
real number.
Why use Cost Function?
Suppose we have a dataset that contains the heights and weights of cats and dogs,
and we need to classify them accordingly. If we plot the records using these
two features, we get a scatter plot as below:

In the above image, the green dots are cats and the yellow dots are dogs. Below are three possible solutions for this classification
problem.

All three classifiers below may have high accuracy, but the third solution is the best because it
correctly classifies every data point. It is the best because its boundary lies
midway between the two classes, neither too close to nor too far from either of them.

To measure this, we need a cost function. It calculates the difference between the actual values and the predicted
values and measures how wrong the model's predictions are. By minimizing the value of
the cost function, we can obtain the optimal solution.

Types of Cost Function
Regression Cost Function
Regression models are used to predict continuous variables such as the price
of houses, weather prediction, loan predictions, etc. When a cost function is used with
regression, it is known as a "Regression Cost Function". Here, the cost function is
calculated from the error, i.e. the distance between the actual and predicted values:
Error = Actual output - Predicted output
1. Mean error:
This cost function computes the error for every training example and takes the average of all these
errors. Computing the mean of the errors is the most straightforward and intuitive
method.
However, these errors can be negative or positive, so they can cancel each other out during the
summation, and the average error can be close to zero even for a poor model.
Therefore, it is not a recommended cost function, but it is the basis for other regression
cost functions.

Regression Cost Function
2. Mean squared error
Mean squared error (MSE) is one of the most commonly used and earliest described regression measures. MSE
is the mean of the squared differences between the predictions and the expected results:
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
where $y_i$ is the actual value and $\hat{y}_i$ the predicted value for the i-th of n examples.
In other words, MSE is a variation of MAE (described below) that squares the difference instead of taking its absolute value.
Because each difference is squared, the errors are never negative and cannot cancel each other out.

It is also known as L2 loss.

Because each error is squared, MSE penalizes large deviations much more heavily than small ones, compared with MAE.
So if our dataset has outliers that produce large prediction errors, squaring these errors
magnifies them many times and leads to a much higher MSE.
Hence MSE is less robust to outliers.

Regression Cost Function
3. Mean absolute error
Mean absolute error (MAE) is a regression metric that measures the average size of the errors for a group of
predictions without regard to their direction. In other words, it is the average absolute difference
between the predictions and the expected results, with all individual differences given equal
weight:
$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$$
By calculating MAE, we can understand how far off the model's predictions are on average.

• It is also known as L1 loss.

• It is more robust to outliers than MSE, so it gives more reliable results when the dataset contains noise or
outliers.
Regression Cost Function
4. Root mean squared error (RMSE)
It is the square root of the mean of the squared errors:
$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2} = \sqrt{\text{MSE}}$$
RMSE measures the error between two sets of values: the observed (known) values and the predicted values. Because of the square root, it is expressed in the same units as the target variable.
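The four regression cost functions above can be computed in a few lines of NumPy. This is a minimal sketch, and the `actual` and `predicted` values are made-up illustration data, not from the slides. The first call is chosen so that the errors cancel in the mean error. The second call changes one prediction into an outlier, to show that MSE grows much faster than MAE.

```python
import numpy as np

def regression_costs(actual, predicted):
    """Return (mean error, MSE, MAE, RMSE) for two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    error = actual - predicted          # Error = actual output - predicted output
    me = error.mean()                   # signed errors can cancel each other out
    mse = np.mean(error ** 2)           # L2 loss: squares every error
    mae = np.mean(np.abs(error))        # L1 loss: absolute errors
    rmse = np.sqrt(mse)                 # same units as the target
    return me, mse, mae, rmse

# Hypothetical data: errors are -1, 1, -2, 2, so the mean error is exactly 0
clean_costs = regression_costs([3, 5, 7, 9], [4, 4, 9, 7])
# Same data, but the last prediction is far off (error -10)
outlier_costs = regression_costs([3, 5, 7, 9], [4, 4, 9, 19])

for label, costs in (("clean", clean_costs), ("one outlier", outlier_costs)):
    me, mse, mae, rmse = costs
    print(f"{label:12s} ME={me:.2f} MSE={mse:.2f} MAE={mae:.2f} RMSE={rmse:.2f}")
```

For the clean case, the mean error is 0 even though every prediction is wrong, which is why it is not used on its own. With the outlier, MSE jumps from 2.5 to 26.5 while MAE only goes from 1.5 to 3.5.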

Solved Example on Regression Cost Function
Gradient Descent in Linear Regression
Gradient Descent is an iterative optimization algorithm that tries to find the optimum
value (minimum or maximum) of an objective function.
It is one of the most widely used optimization techniques in machine learning for
updating the parameters of a model in order to minimize a cost function.
The main aim of gradient descent is to find the model parameters that minimize the cost function on the
training data; a well-chosen model should then also perform well on the test data.
In gradient descent, the gradient is a vector that points in the direction of the steepest
increase of the function at a specific point.
Moving in the opposite direction of the gradient allows the algorithm to gradually
descend towards lower values of the function, eventually reaching a minimum
of the function. For linear regression with the MSE cost, the cost function is convex, so this minimum is the global minimum.

Does Every ML Algorithm Rely on Gradient Descent?

Steps Required in Gradient Descent Algorithm
Step 1: Initialize the parameters of the model randomly.
Step 2: Compute the gradient of the cost function with respect to each parameter. This involves partial
differentiation of the cost function with respect to the parameters.
Step 3: Update the parameters of the model by taking a step in the opposite direction of the gradient. Here we choose
a hyperparameter called the learning rate, denoted by alpha (α), which decides the step size:
$$\theta_j := \theta_j - \alpha \frac{\partial J}{\partial \theta_j}$$
where J is the cost function and $\theta_j$ is the j-th parameter.
Step 4: Repeat steps 2 and 3 iteratively until the cost stops decreasing (convergence), to get the best parameters for the defined model.
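The four steps can be sketched for simple linear regression $\hat{y} = a_0 + a_1 x$ with the MSE cost $J = \frac{1}{n}\sum (\hat{y}_i - y_i)^2$. The partial derivatives used are $\partial J/\partial a_0 = \frac{2}{n}\sum(\hat{y}_i - y_i)$ and $\partial J/\partial a_1 = \frac{2}{n}\sum(\hat{y}_i - y_i)\,x_i$. This is a minimal sketch, not the slides' own code. The learning rate `alpha=0.05`, the 5000 iterations and the random seed are illustrative assumptions. The data is the week/sales table from the matrix-method example in this unit.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.05, n_iters=5000, seed=0):
    """Fit y = a0 + a1*x by batch gradient descent on the MSE cost."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    rng = np.random.default_rng(seed)
    a0, a1 = rng.normal(size=2)              # Step 1: random initialization
    for _ in range(n_iters):                 # Step 4: repeat steps 2 and 3
        error = (a0 + a1 * x) - y            # predicted minus actual
        grad_a0 = (2.0 / n) * error.sum()    # Step 2: partial derivatives of MSE
        grad_a1 = (2.0 / n) * (error * x).sum()
        a0 -= alpha * grad_a0                # Step 3: step against the gradient
        a1 -= alpha * grad_a1
    return a0, a1

weeks = [1, 2, 3, 4]
sales = [1, 3, 4, 8]
a0, a1 = gradient_descent(weeks, sales)
print(f"a0 = {a0:.4f}, a1 = {a1:.4f}")
```

This should converge to values very close to a0 = -1.5 and a1 = 2.2, the same line that the closed-form methods give. The learning rate matters here. For this data, values much above about 0.12 make the updates overshoot and diverge, while very small values need many more iterations.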

Linear Regression – Solved Example
1. Simple Linear Regression:
Assume that there is only one independent variable X and one dependent
variable Y, and that the relationship between X and Y is modelled by the equation
Y = a + bX
The steps to find the values of a and b are given below:

Linear Regression using Matrix Method
Find the linear regression of the weekly product sales (in thousands)
given in the table below, using linear regression in matrix form.
Predict the sales for the 5th and 6th weeks.

Xi (week)    Yi (sales in thousands)
1            1
2            3
3            4
4            8

Linear Regression Matrix method
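Here, the values of the independent variable are represented as a vector:
$x^T = [1\ 2\ 3\ 4]$
The values of the dependent variable are represented as a vector:
$Y^T = [1\ 3\ 4\ 8]$
The data can be given in matrix form as follows, with the design matrix X built from x:
$$X = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{bmatrix}, \qquad Y = \begin{bmatrix} 1 \\ 3 \\ 4 \\ 8 \end{bmatrix}$$
The first column of X (all ones) is used for the bias (intercept) term.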

Linear Regression Matrix method
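The regression coefficients are given by the normal equation:
$$\begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = (X^T X)^{-1} X^T Y$$
Step-by-step computation:
$$X^T X = \begin{bmatrix} 4 & 10 \\ 10 & 30 \end{bmatrix}, \qquad X^T Y = \begin{bmatrix} 16 \\ 51 \end{bmatrix}$$
$$(X^T X)^{-1} = \frac{1}{4 \cdot 30 - 10 \cdot 10}\begin{bmatrix} 30 & -10 \\ -10 & 4 \end{bmatrix} = \frac{1}{20}\begin{bmatrix} 30 & -10 \\ -10 & 4 \end{bmatrix}$$
$$\begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = \frac{1}{20}\begin{bmatrix} 30 \cdot 16 - 10 \cdot 51 \\ -10 \cdot 16 + 4 \cdot 51 \end{bmatrix} = \begin{bmatrix} -1.5 \\ 2.2 \end{bmatrix}$$
So the regression line is y = -1.5 + 2.2x. It predicts sales of -1.5 + 2.2(5) = 9.5 thousand in week 5 and -1.5 + 2.2(6) = 11.7 thousand in week 6.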
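The matrix method can be checked with NumPy. This minimal sketch builds the design matrix with a leading column of ones for the bias. It then solves the normal equation $(X^T X)\beta = X^T Y$ with `np.linalg.solve`, which is generally preferred over forming the inverse explicitly. Finally it predicts weeks 5 and 6.

```python
import numpy as np

weeks = np.array([1, 2, 3, 4], dtype=float)
sales = np.array([1, 3, 4, 8], dtype=float)   # in thousands

# Design matrix: first column of ones for the bias (intercept) term
X = np.column_stack([np.ones_like(weeks), weeks])
Y = sales

# Normal equation: (X^T X) beta = X^T Y  ->  beta = [a0, a1]
beta = np.linalg.solve(X.T @ X, X.T @ Y)
a0, a1 = beta
print(f"y = {a0:.2f} + {a1:.2f} x")

# Predict sales for weeks 5 and 6 using the same design-matrix layout
new_weeks = np.array([5, 6], dtype=float)
X_new = np.column_stack([np.ones_like(new_weeks), new_weeks])
predictions = X_new @ beta
print(predictions)   # approximately 9.5 and 11.7 (thousand)
```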

Linear Regression using least squares method
The least squares method is a form of mathematical regression analysis used to
determine the line of best fit for a set of data, providing a visual demonstration
of the relationship between the data points. It chooses the line that minimizes the sum of the squared vertical distances (residuals) between the data points and the line.

Each data point pairs a known value of the independent variable with an observed value of the dependent variable.

This method is commonly used by statisticians, and also by traders who want to
identify trading opportunities and trends.

Linear Regression using least squares method
• Here, X is the independent variable and Y is the dependent variable.
• Equation of linear regression:
Y = a + bX, where a is the intercept and b is the slope (coefficient).
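A minimal plain-Python sketch of the least-squares formulas for Y = a + bX follows. It uses the standard closed-form solution, which is assumed here because the slide's own formula is not available:
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, \qquad a = \frac{\sum y - b\sum x}{n}$$
As a consistency check, it reuses the week/sales data from the matrix-method example. The result should be the same line.

```python
def least_squares_fit(x, y):
    """Return intercept a and slope b of the least-squares line Y = a + bX."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # slope
    a = (sum_y - b * sum_x) / n                                     # intercept
    return a, b

a, b = least_squares_fit([1, 2, 3, 4], [1, 3, 4, 8])
print(f"Y = {a:.2f} + {b:.2f}X")   # Y = -1.50 + 2.20X
```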

Multi Linear Regression
Multiple Linear Regression is an extension of simple linear regression that models the
relationship between two or more independent variables (predictors) and a dependent
variable (outcome). The goal is to establish a linear relationship between the predictors
and the outcome, allowing us to predict the outcome from the values of the predictors.
• Mathematical Representation: The general form of a multiple linear
regression model is:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \epsilon$$
Where: y is the dependent variable (what you are trying to predict).
$x_1, x_2, \ldots, x_n$ are the independent variables (predictors).
$\beta_0$ is the intercept (the value of y when all $x_i$ are 0).
$\beta_1, \beta_2, \ldots, \beta_n$ are the coefficients (the change in y for a one-unit change in the corresponding $x_i$, holding the other predictors fixed).
$\epsilon$ is the error term (the difference between the predicted and actual values).
Key Concepts MR
• Linearity: The relationship between the dependent variable and each
independent variable is linear.
• Multicollinearity: This occurs when independent variables are highly
correlated with each other, which can make it difficult to determine the effect
of each predictor on the dependent variable.
• Residuals: The difference between the observed value and the value
predicted by the model. Ideally, residuals should be randomly distributed with
a mean of zero.
• R-squared ($R^2$): A statistical measure that represents the proportion of the
variance in the dependent variable that is explained by the independent
variables in the model: $R^2 = 1 - SS_{res}/SS_{tot}$. For a least-squares model with an intercept, $R^2$ on the training data ranges from 0 to 1.
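Residuals and $R^2$ can be computed directly from their definitions. This minimal sketch fits a straight line with `np.polyfit`, reusing the week/sales data from the matrix-method example. It then computes $R^2 = 1 - SS_{res}/SS_{tot}$. Because the model includes an intercept and is fitted by least squares, the residuals average to zero.

```python
import numpy as np

x = np.array([1, 2, 3, 4], dtype=float)
y = np.array([1, 3, 4, 8], dtype=float)

slope, intercept = np.polyfit(x, y, deg=1)   # least-squares straight line
y_pred = intercept + slope * x
residuals = y - y_pred                       # observed minus predicted

ss_res = np.sum(residuals ** 2)              # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)         # total sum of squares
r_squared = 1 - ss_res / ss_tot

print("residuals:", np.round(residuals, 2))
print("R^2:", round(r_squared, 4))
```

Here the residuals are 0.3, 0.1, -1.1 and 0.7. So $SS_{res} = 1.8$, $SS_{tot} = 26$ and $R^2 \approx 0.9308$.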

Steps to Implement Multiple Linear Regression

1. Data Collection: Gather the dataset that includes the dependent variable and the
independent variables.
2. Exploratory Data Analysis (EDA): Analyze the data to understand the relationships,
distributions, and potential issues such as missing values.
3. Data Preprocessing:
   1. Handling missing data: Impute or remove missing values.
   2. Feature scaling: Standardize or normalize the data if necessary, especially if the predictors
   have different units or scales.
4. Splitting the Dataset: Divide the data into training and testing sets to evaluate the
model's performance.
5. Building the Model:
   1. Choose a machine learning framework or library (e.g., scikit-learn in Python).
   2. Fit the model to the training data.
   3. Estimate the coefficients $\beta_0, \beta_1, \beta_2, \ldots, \beta_n$ using methods such as Ordinary Least Squares
   (OLS).
Multiple Linear Regression

6. Model Evaluation:
   1. Use metrics such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and
   $R^2$ to evaluate the model.
   2. Check for multicollinearity using the Variance Inflation Factor (VIF).
   3. Analyze the residuals to check assumptions such as homoscedasticity (constant variance of
   residuals).
7. Prediction: Use the model to make predictions on new data.
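Steps 4 to 7 can be sketched with scikit-learn. This is a minimal sketch on synthetic data, not a prescribed workflow. The "true" coefficients (intercept 4, slopes 2, -3, 0.5) and the noise level are assumptions chosen so the fitted values can be checked. The `vif` helper is a hypothetical function written here from the definition $VIF_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing feature j on the other features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic (hypothetical) data: y = 4 + 2*x1 - 3*x2 + 0.5*x3 + small noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
true_coef = np.array([2.0, -3.0, 0.5])
y = 4.0 + X @ true_coef + rng.normal(scale=0.1, size=200)

# Step 4: split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 5: fit an ordinary least squares model on the training data
model = LinearRegression()
model.fit(X_train, y_train)

# Step 6: evaluate on the held-out test set
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

def vif(features):
    """Hypothetical helper: VIF_j = 1 / (1 - R_j^2) for each column j."""
    scores = []
    for j in range(features.shape[1]):
        others = np.delete(features, j, axis=1)
        target = features[:, j]
        r2_j = LinearRegression().fit(others, target).score(others, target)
        scores.append(1.0 / (1.0 - r2_j))
    return np.array(scores)

print("intercept:", round(model.intercept_, 3))
print("coefficients:", np.round(model.coef_, 3))
print(f"MSE={mse:.4f} RMSE={rmse:.4f} R^2={r2:.4f}")
print("VIF:", np.round(vif(X_train), 2))

# Step 7: predict for a new (hypothetical) observation
print(model.predict([[1.0, 0.0, 2.0]]))
```

Because the synthetic features are generated independently, their VIF values should all be close to 1. Values well above 5 to 10 are commonly read as a sign of multicollinearity.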

Example
Suppose you are trying to predict the price of a house (y) based on the size of
the house (x1), the number of bedrooms (x2), and the age of the house (x3).
The model might look like this:

Price = β0 + β1×Size + β2×Bedrooms + β3×Age + ϵ

Here, each coefficient describes the effect of its feature while the other features are held fixed:
β1: how much the price increases for each additional square foot of
size.
β2: the effect of adding one more bedroom.
β3: how the price changes for each additional year of the house's age.
Advantages:
• It can model the combined effect of multiple predictors on the dependent variable.
• The coefficients provide interpretable insights into how each predictor affects the
outcome.
Disadvantages:
• Sensitive to outliers and multicollinearity.
• Assumes a linear relationship, which might not always hold.
Applications:
• Predicting real estate prices.
• Estimating the impact of marketing strategies on sales.
• Modeling the relationship between various health indicators and patient outcomes.
Mathematical model for MR
Multi Linear Regression problem solved with matrix method

Example 2
• The data in the table relate the weight of a person (Y) to:
• X1, the weight of carbohydrates in grams
• X2, the weight of protein in grams
• Obtain the regression equation.
• Also predict the weight of a person
with a given carbohydrate weight x1 = 8 and protein weight x2 = 7.6.

Solution steps
1. Represent the inputs in matrix form:

Multivariate Linear Regression
"Multivariate Linear Regression" and "Multiple Linear Regression" are often used to mean the same
concept in the context of linear regression modeling.
Both terms describe a version of linear regression in which multiple independent variables
(features) are used to predict a single dependent variable (target).
In other words, each term refers to a linear regression model with more than one predictor
variable. (Strictly speaking, as defined earlier in this unit, "multivariate" regression refers to more than one dependent variable.)
Linear regression is a key machine learning method used for predicting a continuous
target variable based on one or more independent features. When there is more
than one independent feature, it is called multivariate (multiple) linear regression.
Mathematical model, for the i-th observation:
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_n x_{ni} + \epsilon_i$$

Encoding Categorical Data for Machine Learning Models
Feature Scaling

Feature Scaling is a technique to bring the independent features present in the data onto a
common scale or fixed range. It is performed during data pre-processing to handle features with highly varying
magnitudes, values, or units.

What is Feature Scaling?
• In general, a dataset contains different types of variables with different
magnitudes and units (kilograms, grams, age in years, salary in thousands, etc.).
• A significant issue with such variables is that their ranges of values can differ widely.
• Features with a large range of values can then dominate the other
variables.
• Models can become biased towards those large-range features.
• To overcome this problem, we apply feature scaling.
• The goal of feature scaling is to put the features on roughly the
same scale, so that no feature dominates simply because of its units or range, and to make the data easier to process
for most ML algorithms.

Why use Feature Scaling?

In machine learning, feature scaling is used for a number of purposes:

• Scaling ensures that all features are on a comparable scale and have comparable
ranges. (Rescaling to a fixed range is known as feature normalization.)
• This is significant because many machine learning techniques are sensitive to
the magnitude of the features.
• Larger-scale features may dominate the learning process and have an excessive impact on
the outcomes.
• By scaling the features, you can avoid this problem and make sure that each feature contributes fairly to the
learning process.
• Algorithm performance improvement: when the features are scaled, several machine
learning methods, including gradient descent-based algorithms, distance-based algorithms
(such as k-nearest neighbours), and support vector machines, perform better or converge
more quickly.
Why use Feature Scaling?

• Preventing numerical instability: numerical instability can be prevented by avoiding
significant scale disparities between features.
• Examples include distance calculations or matrix operations, where features with
radically different scales can cause numerical overflow or underflow problems.
• Scaling the features keeps computations stable and mitigates these issues.
• Scaling ensures that each feature is given comparable consideration
during the learning process.
• Without scaling, larger-scale features could dominate the learning, producing skewed
outcomes.
• Scaling reduces this bias, helping each feature contribute
fairly to the model's predictions.

Why and how high range of features impact model performance?

In the table below, we see that Age and Salary have very different ranges of values. So when we
train a model, it might give more importance to the Salary column simply because of its larger range
of values. However, this need not reflect reality. Suppose the target variable is whether a person will
buy a house, and both age and salary have an equal or nearly equal
impact on it. In that case, we need to apply feature scaling.

Let's understand the above points by applying some mathematics.
In machine learning everything is measured in numbers. When we want to
identify nearest neighbors, or the similarity or dissimilarity of data points, we calculate the
distance between them, and based on these distances we decide whether two data points are similar or not.
The distance combines all features together, so a feature with a larger numeric range has a larger influence on it.
Let's understand this with an example. There are many methods to calculate
distance; here we will use the Euclidean distance, $d(p, q) = \sqrt{\sum_j (p_j - q_j)^2}$.

• The task is to identify the nearest neighbor of emp2. From the distances above, we can say emp3 is
closer to emp2, since emp2 and emp3 have the smaller distance between them.
• But if we simply scale up the salary values, this distance will increase,
which would then suggest that emp2 and emp3 are not similar.
• So the idea behind feature scaling is that the range of values should not affect how a feature
behaves.
• When we want to compare two entities, we should first bring them onto the
same scale; only then can we compare them fairly.
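The effect of scale on nearest neighbors can be reproduced with a short NumPy sketch. The slide's employee table is not available, so the ages and salaries below are hypothetical values chosen to show the effect. On the raw data, the salary column decides which employee is nearest to emp2. After min-max scaling each column to [0, 1], the answer changes.

```python
import numpy as np

# Hypothetical employee records: columns are [age in years, salary]
names = ["emp1", "emp2", "emp3"]
data = np.array([
    [31.0, 50_000.0],
    [30.0, 52_000.0],
    [55.0, 51_000.0],
])

def nearest_to(target_idx, features):
    """Return the name of the row closest (Euclidean distance) to row target_idx."""
    dists = np.sqrt(((features - features[target_idx]) ** 2).sum(axis=1))
    dists[target_idx] = np.inf          # ignore the point itself
    return names[int(np.argmin(dists))], dists

# Unscaled: salary differences (thousands) swamp age differences (tens)
raw_nearest, raw_d = nearest_to(1, data)

# Min-max scale each column to [0, 1] before measuring distance
scaled = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))
scaled_nearest, scaled_d = nearest_to(1, scaled)

print("raw distances from emp2:", np.round(raw_d, 1), "-> nearest:", raw_nearest)
print("scaled distances from emp2:", np.round(scaled_d, 3), "-> nearest:", scaled_nearest)
```

On the raw data emp3 looks nearest, even though it is 25 years older than emp2. After scaling, emp1 is nearest, because age and salary now contribute on comparable scales.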

Feature Scaling Techniques
In machine learning, the two most commonly used feature scaling techniques are:
• Normalization
• Standardization

Normalization
Normalization is also known as min-max normalization or min-max scaling.
Normalization rescales values to the range 0 to 1.
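Min-max normalization uses the standard formula $x' = \dfrac{x - x_{min}}{x_{max} - x_{min}}$, applied to each feature column separately. Here is a minimal NumPy sketch. The age values are hypothetical, and the guard for a constant column is a design choice added here to avoid division by zero. scikit-learn's `MinMaxScaler` applies the same per-column transformation.

```python
import numpy as np

def min_max_normalize(column):
    """Rescale values to [0, 1]: x' = (x - min) / (max - min)."""
    column = np.asarray(column, dtype=float)
    col_min, col_max = column.min(), column.max()
    if col_max == col_min:            # constant column: avoid division by zero
        return np.zeros_like(column)
    return (column - col_min) / (col_max - col_min)

ages = [20, 30, 40, 60]               # hypothetical values
print(min_max_normalize(ages))
```

For these ages, the smallest value (20) maps to 0, the largest (60) maps to 1, and 30 and 40 map to 0.25 and 0.5.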

Standardization
Standardization is also known as z-score normalization.
In standardization, features are rescaled to have zero mean and unit standard deviation.
This means that after standardization, each feature has mean = 0 and standard deviation = 1.
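Standardization uses the standard z-score formula $z = \dfrac{x - \mu}{\sigma}$, where $\mu$ and $\sigma$ are the mean and standard deviation of the feature. Here is a minimal sketch with hypothetical salary values. It uses the population standard deviation (`ddof=0`), which is also what scikit-learn's `StandardScaler` uses.

```python
import numpy as np

def standardize(column):
    """Return z-scores: (x - mean) / std, using the population std (ddof=0)."""
    column = np.asarray(column, dtype=float)
    mean, std = column.mean(), column.std()
    return (column - mean) / std

salaries = [30_000, 40_000, 50_000, 60_000, 70_000]   # hypothetical values
z = standardize(salaries)
print(np.round(z, 3))
```

Here the mean is 50,000 and the standard deviation is about 14,142. The z-scores are about -1.414, -0.707, 0, 0.707 and 1.414, which have mean 0 and standard deviation 1.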

Normalization vs Standardization
If your feature (column) has outliers, min-max normalization is strongly affected by them. The extreme values set the minimum and maximum, so most of the remaining
data is squeezed into a small interval. Hence normalization does not handle outliers well.
Standardization is less sensitive to outliers, and in many cases it is preferable to min-max
normalization.
Normalization is a good choice when your data does not follow a normal distribution.
This can be useful in algorithms that do not assume any distribution of the data, such as k-nearest
neighbors and neural networks.
Standardization, on the other hand, can be helpful when the data follows a
normal distribution, although this is not a strict requirement.
Also, unlike normalization, standardization does not have a bounding range, so outliers do not force the other values into a fixed interval (although the mean and standard deviation themselves are still influenced by outliers).

When to do Feature Scaling? After Train-Test Split or Before?
In general, feature scaling should be done after the train-test split, to avoid data leakage.

If we scale before the split, the scaling parameters (e.g. the mean and standard deviation) are computed from the test data as well, so the training process has information about the test data. The model may then appear to perform well on the test data but might not perform as well on
actual predictions for unseen data.
Data Leakage:
Data leakage occurs when information from outside the training dataset is used to create the model. This can allow the
model to learn something that it otherwise would not know, which invalidates the
estimated performance of the model being constructed.

Preferred way:
• Split the data.
• Fit the scaling on the training set.
• Build the model on the training set.
• Use the scaling parameters (mean / standard deviation) computed from the training set to scale the
test data. For example, in standardization we use the mean and standard deviation of the training data to scale the features. To scale the test data, we reuse the same mean and standard deviation rather than calculating them from the test data.
• Then evaluate the model on the test set.
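The preferred order can be sketched with scikit-learn's `StandardScaler`. The age/salary features are randomly generated hypothetical data, and model building is omitted to keep the focus on scaling. The key point is that `fit_transform` is called only on the training set, while the test set is only passed to `transform`. This way it reuses the training mean and standard deviation.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical features: [age in years, salary]
X = np.column_stack([
    rng.uniform(20, 60, 100),
    rng.uniform(20_000, 120_000, 100),
])

# 1. Split first
X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

# 2. Learn the mean and standard deviation from the training set only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# 3. (Build the model on X_train_scaled here.)

# 4. Reuse the training statistics on the test set; no refitting
X_test_scaled = scaler.transform(X_test)

print("training means used for both sets:", scaler.mean_)
print("training std devs used for both sets:", scaler.scale_)
```

The scaled test set will generally not have a mean of exactly 0. This is expected, because it is scaled with the training statistics, just as genuinely unseen data would be.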

ML Algorithms which need feature Scaling

Polynomial Regression
Polynomial Regression is a form of linear regression in which the relationship
between the independent variable x and dependent variable y is modelled as
an nth-degree polynomial. Polynomial regression fits a nonlinear relationship
between the value of x and the corresponding conditional mean of y.
Why Polynomial Regression?
Polynomial regression is a type of regression analysis used in statistics and machine
learning when the relationship between the independent variable (input) and the dependent
variable (output) is not linear. While simple linear regression models the relationship as a
straight line, polynomial regression allows for more flexibility by fitting a polynomial
equation to the data.
When the relationship between the variables is better represented by a curve rather
than a straight line, polynomial regression can capture the non-linear patterns in the data.

Polynomial Regression
• It is also called a special case of Multiple Linear Regression, because we add polynomial
terms to the Multiple Linear Regression equation to convert it into Polynomial Regression.

• It is a linear model with a modification (extra polynomial terms) that increases its accuracy
on curved data.

• The dataset used to train a Polynomial Regression model is non-linear in nature.

• It uses a linear regression model to fit complicated, non-linear functions and datasets.
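To illustrate why polynomial regression is still a linear model, the sketch below expands a single feature x into the columns 1, x, x^2, ..., x^n and then solves an ordinary least-squares problem with NumPy's `np.linalg.lstsq`. It is an illustration under stated assumptions, not the slide's own procedure; the helper name `polynomial_features` and the noise-free sample data are hypothetical.

```python
import numpy as np

def polynomial_features(x, degree):
    """Return the design matrix [1, x, x**2, ..., x**degree] for a 1-D array x."""
    return np.vander(x, N=degree + 1, increasing=True)

# Hypothetical noise-free data generated from y = 1 + 2x + 3x^2
x = np.linspace(-3, 3, 20)
y = 1 + 2 * x + 3 * x**2

X = polynomial_features(x, degree=2)          # each power of x becomes a "feature"
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary (linear) least squares
print(coef)  # approximately [1. 2. 3.]
```

The model is non-linear in x but linear in the coefficients a0, a1, a2, which is why the same least-squares machinery used for multiple linear regression applies unchanged.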

Need for Polynomial Regression
• If we apply a linear model to a linear dataset, it gives a good result, as we have seen in
Simple Linear Regression. But if we apply the same model without any modification to a non-linear
dataset, it produces a poor fit: the loss function increases, the error rate is high, and accuracy
decreases.
• So, for cases where the data points are arranged in a non-linear fashion, we need the
Polynomial Regression model. This can be understood better from the comparison diagram
of a linear dataset and a non-linear dataset below.
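The effect described above can be checked numerically. In this sketch the data are hypothetical (y = x^2 on x = 0, 1, ..., 9); a straight line and a second-degree polynomial are both fitted with NumPy's `np.polyfit`, and their mean squared errors on the same points are compared.

```python
import numpy as np

# Hypothetical non-linear data: y = x^2 on x = 0, 1, ..., 9
x = np.arange(10, dtype=float)
y = x**2

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return np.mean((y_true - y_pred) ** 2)

linear_coef = np.polyfit(x, y, deg=1)  # straight line; coefficients ordered [a1, a0]
quad_coef = np.polyfit(x, y, deg=2)    # parabola; coefficients ordered [a2, a1, a0]

linear_mse = mse(y, np.polyval(linear_coef, x))
quad_mse = mse(y, np.polyval(quad_coef, x))
print(f"linear MSE = {linear_mse:.2f}, quadratic MSE = {quad_mse:.2e}")
```

The straight line leaves a large systematic error on curved data, while the quadratic model fits it essentially exactly.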

Equation of the Polynomial Regression Model

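Simple Linear Regression equation:

y = a_0 + a_1 x \quad \text{(a)}

Multiple Linear Regression equation:

y = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 + \dots + a_n x_n \quad \text{(b)}

Polynomial Regression equation:

y(F_p) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots + a_n x^n \quad \text{(c)}

Y = \sum_{i=0}^{n} a_i X^i + F_p

In (b) each x_j is a separate feature; in (c) the powers of the single feature x play that role.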
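For a given set of coefficients, the summation form can be evaluated directly. This is a minimal sketch: the function name `predict_polynomial` is hypothetical, it uses Horner's rule to avoid computing each power of x separately, and it leaves out the F_p term.

```python
def predict_polynomial(coefficients, x):
    """Evaluate y = a0 + a1*x + a2*x**2 + ... + an*x**n.

    `coefficients` is ordered [a0, a1, ..., an]. Horner's rule rewrites the sum
    as a0 + x*(a1 + x*(a2 + ... + x*an)), which needs only n multiplications.
    """
    result = 0.0
    for a in reversed(coefficients):
        result = result * x + a
    return result

# Example: y = 1 + 2x + 3x^2 at x = 2 gives 1 + 4 + 12 = 17
print(predict_polynomial([1.0, 2.0, 3.0], 2.0))  # 17.0
```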
Polynomial Regression
The problem of non-linear regression can be solved by two methods:

1. Transform the non-linear data into linear data, so that linear regression can handle
the data.
2. Use polynomial regression.

Transformations
The trick is to convert non-linear data into a form that can be handled by the linear regression method.
Let us consider an exponential function: y = a e^{bx}
The transformation can be done by applying the natural logarithm to both sides to get:

\ln(y) = \ln(a) + \ln(e^{bx})
\ln(y) = \ln(a) + bx \ln(e)
\ln(y) = \ln(a) + bx \quad (\text{since } \ln(e) = 1)
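After this transformation, ln(y) is a straight-line function of x with intercept ln(a) and slope b, so ordinary simple linear regression applies. A minimal NumPy sketch, assuming all y values are positive (the logarithm is undefined otherwise) and using hypothetical noise-free data generated with a = 2 and b = 0.5:

```python
import numpy as np

# Hypothetical data from y = a * e^(b x) with a = 2, b = 0.5
x = np.linspace(0, 4, 9)
y = 2.0 * np.exp(0.5 * x)

# Fit a straight line to (x, ln y): ln(y) = ln(a) + b*x
slope, intercept = np.polyfit(x, np.log(y), deg=1)

b = slope
a = np.exp(intercept)  # undo the logarithm to recover a
print(f"a = {a:.3f}, b = {b:.3f}")  # a = 2.000, b = 0.500
```

Least squares on ln(y) minimizes errors in log space, not in the original units of y, so on noisy data the result can differ from a direct non-linear fit.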

Polynomial Regression
In general, polynomials of degree at most 4 are used, because higher-order polynomials can take
strange shapes and make the curve too flexible.

This leads to overfitting, so higher degrees are usually avoided.
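The overfitting risk can be seen in a small experiment. In this sketch the training data are hypothetical: y = x^2 plus an alternating +1 / -1 "noise" at x = 0, 1, ..., 7. The check points are the midpoints x = 0.5, 1.5, ..., 6.5, where the true value is exactly x^2. A degree-7 polynomial passes through all eight training points, yet it predicts the check points far worse than a degree-2 polynomial.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical training data: y = x^2 plus alternating noise of +1 / -1
x_train = np.arange(8, dtype=float)
noise = np.where(np.arange(8) % 2 == 0, 1.0, -1.0)
y_train = x_train**2 + noise

# Unseen check points between the training points, where the true value is x^2
x_check = np.arange(7, dtype=float) + 0.5
y_check = x_check**2

results = {}
for degree in (2, 7):
    coef = np.polyfit(x_train, y_train, deg=degree)
    results[degree] = (
        mse(y_train, np.polyval(coef, x_train)),  # error on the points used for fitting
        mse(y_check, np.polyval(coef, x_check)),  # error on the unseen check points
    )
    print(f"degree {degree}: train MSE = {results[degree][0]:.3f}, "
          f"check MSE = {results[degree][1]:.3f}")
```

The degree-7 model has near-zero training error because it memorizes the noise, which is exactly the flexibility the slide warns about.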

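Consider a polynomial of degree 2.

The polynomial equation is given by: y = a_0 + a_1 x + a_2 x^2

The coefficients a_0, a_1 and a_2 are calculated using the formula a = X^{-1}B,

where

X = \begin{bmatrix} n & \sum x_i & \sum x_i^2 \\ \sum x_i & \sum x_i^2 & \sum x_i^3 \\ \sum x_i^2 & \sum x_i^3 & \sum x_i^4 \end{bmatrix}, \quad B = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \end{bmatrix}, \quad a = \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}

and n is the number of data points.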
Problem Solution

END
