0% found this document useful (0 votes)

20 views

Linear Regression - Everything You Need To Know About Linear Regression

Uploaded by

Divyanshu Chandra

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

20 views

Linear Regression - Everything You Need To Know About Linear Regression

Uploaded by

Divyanshu Chandra

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 17

Everything you need to Know about Linear Regression!

×
Advance Your Career with Gen AI Curriculum, Expert Mentorship, and Industry Projects
Register Now

Home

KAVITA MALI — Updated On September 21st, 2023

Beginner Guide Linear Regression Machine Learning R

This article was published as a part of the Data Science Blogathon

Overview
Linear Regression is a simple yet powerful and mostly used algorithm in data science.
There are a plethora of real-world applications of Linear Regression.
This comprehensive guide will introduce you to Linear Regression along with an implementation in Python on a real-
world dataset.

Introduction
Models use machine learning algorithms, during which the machine learns from the data just like humans learn from
their experiences. Machine learning models can be broadly divided into two categories based on the learning algorithm
which can further be classified based on the task performed and the nature of the output.

1. Supervised learning methods: It contains past data with labels which are then used for building the model.

Regression: The output variable to be predicted is continuous in nature, e.g. scores of a student, diamond prices, etc.
Classification: The output variable to be predicted is categorical in nature, e.g.classifying incoming emails as spam
or ham, Yes or No, True or False, 0 or 1.
2. Unsupervised learning methods: It contains no predefined labels assigned to the past data.

Clustering: No predefined labels are assigned to groups/clusters formed,e.g. customer segmentation.

Linear Regression is a supervised learning algorithm in machine learning that supports finding the linear correlation
among variables. The result or output of the regression problem is a real or continuous value.

In this article, we will cover linear regression and its components comprehensively. We’ll look at simple and multiple
linear regression, why it matters, its applications, its drawbacks, and then deep dive into linear regression including how
to perform it in Python on a real-world dataset.

Table of contents
What is Linear Regression?
Simple Linear Regression
What is the best fit line?
Cost Function for Linear Regression
Gradient Descent for Linear Regression
Evaluation Metrics for Linear Regression
Coefficient
Everything of Determination
you need to Know aboutorLinear
R-Squared (R2)
Regression!
Root Mean Squared Error
Assumptions of Linear Regression
Hypothesis in Linear Regression
Multiple Linear Regression
Considerations of Multiple Linear Regression
Multicollinearity
Overfitting and Underfitting in Linear Regression
Bias Variance Tradeoff
Overfitting
Underfitting:
Hands-on Coding: Linear Regression Model
Step 1: Importing Python Libraries
Step 2: Loading the Dataset
Step 3: Visualization
Step 4: Performing Simple Linear Regression
Step 5: Performing predictions on the test set
Frequently Asked Questions

What is Linear Regression?

Linear regression is a type of statistical analysis used to predict the relationship between two variables. It assumes a
linear relationship between the independent variable and the dependent variable, and aims to find the best-fitting line
that describes the relationship. The line is determined by minimizing the sum of the squared differences between the
predicted values and the actual values.

Linear regression is commonly used in many fields, including economics, finance, and social sciences, to analyze and
predict trends in data. It can also be extended to multiple linear regression, where there are multiple independent
variables, and logistic regression, which is used for binary classification problems.

Simple Linear Regression

In a simple linear regression, there is one independent variable and one dependent variable. The model estimates the
slope and intercept of the line of best fit, which represents the relationship between the variables. The slope represents
the change in the dependent variable for each unit change in the independent variable, while the intercept represents
the predicted value of the dependent variable when the independent variable is zero.

Linear regression is a quiet and the simplest statistical regression method used for predictive analysis in machine
learning. Linear regression shows the linear relationship between the independent(predictor) variable i.e. X-axis and the
dependent(output) variable i.e. Y-axis, called linear regression. If there is a single input variable X(independent variable),
such linear regression is called simple linear regression.
Everything you need to Know about Linear Regression!

The graph above presents the linear relationship between the output(y) and predictor(X) variables. The blue line is
referred to as the best-fit straight line. Based on the given data points, we attempt to plot a line that fits the points the
best.

To calculate best-fit line linear regression uses a traditional slope-intercept form which is given below,

Yi = β 0 + β 1 X i

where Yi = Dependent variable, β0 = constant/Intercept, β1 = Slope/Intercept, Xi = Independent variable.

This algorithm explains the linear relationship between the dependent(output) variable y and the
independent(predictor) variable X using a straight line Y= B0 + B1 X.

But how the linear regression finds out which is the best fit line?

The goal of the linear regression algorithm is to get the best values for B0 and B1 to find the best fit line. The best fit line
is a line that has the least error which means the error between predicted values and actual values should be minimum.

Random Error(Residuals)

In regression, the difference between the observed value of the dependent variable(yi) and the predicted
value(predicted) is called the residuals.

εi = ypredicted – yi

where ypredicted = B0 + B1 Xi

What is the best fit line?

In simple terms, the best fit line is a line that fits the given scatter plot in the best way. Mathematically, the best fit line is
obtained by minimizing the Residual Sum of Squares(RSS).

Cost Function for Linear Regression

The cost function helps to work out the optimal values for B0 and B1, which provides the best fit line for the data points.

In Linear Regression, generally Mean Squared Error (MSE) cost function is used, which is the average of squared error
that occurred between the ypredicted and yi.

We calculate MSE using simple linear equation y=mx+b:

Everything you need to Know about Linear Regression!

Using the MSE function, we’ll update the values of B0 and B1 such that the MSE value settles at the minima. These
parameters can be determined using the gradient descent method such that the value for the cost function is minimum.

Gradient Descent for Linear Regression

Gradient Descent is one of the optimization algorithms that optimize the cost function(objective function) to reach the
optimal minimal solution. To find the optimum solution we need to reduce the cost function(MSE) for all data points.
This is done by updating the values of B0 and B1 iteratively until we get an optimal solution.

A regression model optimizes the gradient descent algorithm to update the coefficients of the line by reducing the cost
function by randomly selecting coefficient values and then iteratively updating the values to reach the minimum cost
function.

Let’s take an example to understand this. Imagine a U-shaped pit. And you are standing at the uppermost point in the pit,
and your motive is to reach the bottom of the pit. Suppose there is a treasure at the bottom of the pit, and you can only
take a discrete number of steps to reach the bottom. If you opted to take one step at a time, you would get to the bottom
of the pit in the end but, this would take a longer time. If you decide to take larger steps each time, you may achieve the
bottom sooner but, there’s a probability that you could overshoot the bottom of the pit and not even near the bottom. In
the gradient descent algorithm, the number of steps you’re taking can be considered as the learning rate, and this
decides how fast the algorithm converges to the minima.

To update B0 and B1, we take gradients from the cost function. To find these gradients, we take partial derivatives for B0
and B1.
Everything you need to Know about Linear Regression!

We need to minimize the cost function J. One of the ways to achieve this is to apply the batch gradient descent
algorithm. In batch gradient descent, the values are updated in each iteration. (Last two equations shows the updating
of values)

The partial derivates are the gradients, and they are used to update the values of B0 and B1. Alpha is the learning rate.

Evaluation Metrics for Linear Regression

The strength of any linear regression model can be assessed using various evaluation metrics. These evaluation metrics
usually provide a measure of how well the observed outputs are being generated by the model.

The most used metrics are,

1. Coefficient of Determination or R-Squared (R2)

2. Root Mean Squared Error (RSME) and Residual Standard Error (RSE)

Coefficient of Determination or R-Squared (R2)

R-Squared is a number that explains the amount of variation that is explained/captured by the developed model. It
always ranges between 0 & 1 . Overall, the higher the value of R-squared, the better the model fits the data.
Mathematically
Everything youit need
can be
torepresented
Know aboutas,
Linear Regression!

R2 = 1 – ( RSS/TSS )

Residual sum of Squares (RSS) is defined as the sum of squares of the residual for each data point in the plot/data. It
is the measure of the difference between the expected and the actual observed output.

Total Sum of Squares (TSS) is defined as the sum of errors of the data points from the mean of the response variable.
Mathematically TSS is,

where y hat is the mean of the sample data points.

The significance of R-squared is shown by the following figures,

Root Mean Squared Error

The Root Mean Squared Error is the square root of the variance of the residuals. It specifies the absolute fit of the
model to the data i.e. how close the observed data points are to the predicted values. Mathematically it can be
represented as,

To make this estimate unbiased, one has to divide the sum of the squared residuals by the degrees of freedom rather
than the total number of data points in the model. This term is then called the Residual Standard Error(RSE).
Mathematically it can be represented as,

R-squared is a better measure than RSME. Because the value of Root Mean Squared Error depends on the units of the
variables (i.e. it is not a normalized measure), it can change with the change in the unit of the variables.

Assumptions of Linear Regression

Regression is a parametric approach, which means that it makes assumptions about the data for the purpose of analysis.
For successful regression analysis, it’s essential to validate the following assumptions.

1. Linearity of residuals: There needs to be a linear relationship between the dependent variable and independent
variable(s).
Everything you need to Know about Linear Regression!

2. Independence of residuals: The error terms should not be dependent on one another (like in time-series data wherein the next value

is dependent on the previous one). There should be no correlation between the residual terms. The absence of this phenomenon is

known as Autocorrelation.

There should not be any visible patterns in the error terms.

3. Normal distribution of residuals: The mean of residuals should follow a normal distribution with a mean equal to zero or close to

zero. This is done in order to check whether the selected line is actually the line of best fit or not.

If the error terms are non-normally distributed, suggests that there are a few unusual data points that must be studied
closely to make a better model.

4. The equal variance of residuals: The error terms must have constant variance. This phenomenon is known
as Homoscedasticity.

The presence of non-constant variance in the error terms is referred to as Heteroscedasticity. Generally, non-constant
variance arises in the presence of outliers or extreme leverage values.
Hypothesis into Linear
Everything you need Know aboutRegression
Linear Regression!

Once you have fitted a straight line on the data, you need to ask, “Is this straight line a significant fit for the data?” Or
“Is the beta coefficient explain the variance in the data plotted?” And here comes the idea of hypothesis testing on the
beta coefficient. The Null and Alternate hypotheses in this case are:

H 0 : B1 = 0

H A : B1 ≠ 0

To test this hypothesis we use a t-test, test statistics for the beta coefficient is given by,

Assessing the model fit

Some other parameters to assess a model are:

1. t statistic: It is used to determine the p-value and hence, helps in determining whether the coefficient is significant
or not
2. F statistic: It is used to assess whether the overall model fit is significant or not. Generally, the higher the value of
the F-statistic, the more significant a model turns out to be.

Multiple Linear Regression

Multiple linear regression is a technique to understand the relationship between a single dependent variable
and multiple independent variables.

The formulation for multiple linear regression is also similar to simple linear regression with

the small change that instead of having one beta variable, you will now have betas for all the variables used. The formula
is given as:

Y = B 0 + B 1 X 1 + B 2 X 2 + … + B pX p + ε

Considerations of Multiple Linear Regression

All the four assumptions made for Simple Linear Regression still hold true for Multiple Linear Regression along with a
few new additional assumptions.

1. Overfitting: When more and more variables are added to a model, the model may become far too complex and
usually ends up memorizing all the data points in the training set. This phenomenon is known as the overfitting of a
model. This usually leads to high training accuracy and very low test accuracy.
2. Multicollinearity: It is the phenomenon where a model with several independent variables, may have some
variables interrelated.
3. Feature Selection: With more variables present, selecting the optimal set of predictors from the pool of given
features (many of which might be redundant) becomes an important task for building a relevant and better model.

Multicollinearity
As multicollinearity makes it difficult to find out which variable is actually contributing towards the prediction of the
response variable, it leads one to conclude incorrectly, the effects of a variable on the target variable. Though it does
not affect the precision of the predictions, it is essential to properly detect and deal with the multicollinearity present in
the model, as random
Everything you needremoval ofabout
to Know any ofLinear
these correlated variables from the model causes the coefficient values to swing
Regression!
wildly and even change signs.

Multicollinearity can be detected using the following methods.

1. Pairwise Correlations: Checking the pairwise correlations between different pairs of independent variables can
throw useful insights in detecting multicollinearity.
2. Variance Inflation Factor (VIF): Pairwise correlations may not always be useful as it is possible that just one variable
might not be able to completely explain some other variable but some of the variables combined could be ready to
do this. Thus, to check these sorts of relations between variables, one can use VIF. VIF basically explains the
relationship of one independent variable with all the other independent variables. VIF is given by,

where i refers to the ith variable which is being represented as a linear combination of the rest of the independent variables.

The common heuristic followed for the VIF values is if VIF > 10 then the value is definitely high and it should be
dropped. And if the VIF=5 then it may be valid but should be inspected first. If VIF < 5, then it is considered a good vif
value.

Overfitting and Underfitting in Linear Regression

There have always been situations where a model performs well on training data but not on the test data. While training
models on a dataset, overfitting, and underfitting are the most common problems faced by people.

Before understanding overfitting and underfitting one must know about bias and variance.

Bias:

Bias is a measure to determine how accurate is the model likely to be on future unseen data. Complex models, assuming
there is enough training data available, can do predictions accurately. Whereas the models that are too naive, are very
likely to perform badly with respect to predictions. Simply, Bias is errors made by training data.

Generally, linear algorithms have a high bias which makes them fast to learn and easier to understand but in general, are
less flexible. Implying lower predictive performance on complex problems that fail to meet the expected outcomes.

Variance:

Variance is the sensitivity of the model towards training data, that is it quantifies how much the model will react when
input data is changed.

Ideally, the model shouldn’t change too much from one training dataset to the next training data, which will mean that
the algorithm is good at picking out the hidden underlying patterns between the inputs and the output variables.

Ideally, a model should have lower variance which means that the model doesn’t change drastically after changing the
training data(it is generalizable). Having higher variance will make a model change drastically even on a small change in
the training dataset.

Let’s understand what is a bias-variance tradeoff is.

Bias Variance
Everything you need toTradeoff
Know about Linear Regression!

The aim of any supervised machine learning algorithm is to achieve low bias and low variance as it is more robust. So
that the algorithm should achieve better performance.

There is no escape from the relationship between bias and variance in machine learning.

There is an inverse relationship between bias and variance,

An increase in bias will decrease the variance.

An increase in the variance will decrease the bias.
There is a trade-off that plays between these two concepts and the algorithms must find a balance between bias and
variance.

As a matter of fact, one cannot calculate the real bias and variance error terms because we do not know the actual
underlying target function.

Now coming to the overfitting and underfitting.

Overfitting
When a model learns each and every pattern and noise in the data to such extent that it affects the performance of the
model on the unseen future dataset, it is referred to as overfitting. The model fits the data so well that it interprets
noise as patterns in the data.

When a model has low bias and higher variance it ends up memorizing the data and causing overfitting. Overfitting causes
the model to become specific rather than generic. This usually leads to high training accuracy and very low test accuracy.

Detecting overfitting is useful, but it doesn’t solve the actual problem. There are several ways to prevent overfitting,
which are stated below:

Cross-validation
If the training data is too small to train add more relevant and clean data.
If the training data is too large, do some feature selection and remove unnecessary features.
Regularization

Underfitting:
Underfitting is not often discussed as often as overfitting is discussed. When the model fails to learn from the training
dataset and is also not able to generalize the test dataset, is referred to as underfitting. This type of problem can be very
easily detected by the performance metrics.
When a modelyou
Everything hasneed
highto
bias andabout
Know low variance it ends up not generalizing the data and causing underfitting. It is unable
Linear Regression!
to find the hidden underlying patterns from the data. This usually leads to low training accuracy and very low test
accuracy. The ways to prevent underfitting are stated below,

Increase the model complexity

Increase the number of features in the training data
Remove noise from the data.

Hands-on Coding: Linear Regression Model

This is the section where you’ll find out how to perform the regression in Python. We will use Advertising sales channel
prediction data. You can access the data here.

TV Radio Newspaper Sales

230.1 37.8 69.2 22.1
44.5 39.3 45.1 10.4
17.2 45.9 69.3 12.0
151.5 41.3 58.5 16.5
180.8 10.8 58.4 17.9
8.7 48.9 75.0 7.2
57.5 32.8 23.5 11.8

‘Sales’ is the target variable that needs to be predicted. Now, based on this data, our objective is to create a predictive
model, that predicts sales based on the money spent on different platforms for marketing.

Let us straightaway get right down to some hands-on coding to get this prediction done. Please don’t feel overlooked if
you do not have experience with Python. In fact, the best way to learn is to get your hands dirty by solving a problem –
like the one we are doing.

Step 1: Importing Python Libraries

The first step is to fire up your Jupyter notebook and load all the prerequisite libraries in your Jupyter notebook. Here
are the important libraries that we will be needing for this linear regression.

NumPy (to perform certain mathematical operations)

pandas (to storethe data in a pandas DataFrames)
matplotlib.pyplot (you will use matplotlib to plot the data)
In order to load these, just start with these few lines of codes in your first cell:

#Importing all the necessary libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Supress Warnings
import warnings
warnings.filterwarnings('ignore')

The last line of code helps in suppressing the unnecessary warnings.

Step 2: Loading the Dataset

Let us now import data into a DataFrame. A DataFrame is a data type in Python. The simplest way to understand it
would be that it stores all your data in tabular format.

#Read the given CSV file, and view some sample records
advertising = pd.read_csv( "advertising.csv" )
advertising.head()
Everything you need to Know about Linear Regression!

Step 3: Visualization
Let us plot the scatter plot for target variable vs. predictor variables in a single plot to get the intuition. Also, plotting a
heatmap for all the variables,

#Importing seaborn library for visualizations

import seaborn as sns

#to plot all the scatterplots in a single plot

sns.pairplot(advertising, x_vars=[ 'TV', ' Newspaper.,'Radio' ], y_vars = 'Sales', size = 4, kind =
'scatter' )
plt.show()

#To plot heatmap to find out correlations

sns.heamap( advertising.corr(), cmap = 'YlGnBl', annot = True )
plt.show()

From the scatterplot and the heatmap, we can observe that ‘Sales’ and ‘TV’ have a higher correlation as compared to
others because it shows a linear pattern in the scatterplot as well as giving 0.9 correlation.
You can go ahead
Everything you and
needplay with the
to Know visualizations
about and can find out interesting insights from the data.
Linear Regression!

Step 4: Performing Simple Linear Regression

Here, as the TV and Sales have a higher correlation we will perform the simple linear regression for these variables.

We can use sklearn or statsmodels to apply linear regression. So we will go ahead with statmodels.

We first assign the feature variable, `TV`, during this case, to the variable `X` and the response variable, `Sales`, to the
variable `y`.

X = advertising[ 'TV' ]
y = advertising[ 'Sales' ]

And after assigning the variables you need to split our variable into training and testing sets. You’ll perform this by
importing train_test_split from the sklearn.model_selection library. It is usually a good practice to keep 70% of the
data in your train dataset and the rest 30% in your test dataset.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split( X, y, train_size = 0.7, test_size = 0.3,
random_state = 100 )

In this way, you can split the data into train and test sets.

One can check the shapes of train and test sets with the following code,

print( X_train.shape )
print( X_test.shape )
print( y_train.shape )
print( y_test.shape )

importing statmodels library to perform linear regression

import statsmodels.api as sm

By default, the statsmodels library fits a line on the dataset which passes through the origin. But in order to have an
intercept, you need to manually use the add_constant attribute of statsmodels. And once you’ve added the constant to
your X_train dataset, you can go ahead and fit a regression line using the OLS (Ordinary Least Squares) the attribute
of statsmodels as shown below,

# Add a constant to get an intercept

X_train_sm = sm.add_constant(X_train)
# Fit the resgression line using 'OLS'
lr = sm.OLS(y_train, X_train_sm).fit()

One can see the values of betas using the following code,

# Print the parameters,i.e. intercept and slope of the regression line obtained
lr.params

Here, 6.948 is the intercept, and 0.0545 is a slope for the variable TV.

Now, let’s see the evaluation metrics for this linear regression operation. You can simply view the summary using the
following code,
#Performing a summary operation lists out all different parameters of the regression line fitted
Everything you need to Know about Linear Regression!
print(lr.summary())

As you can see, this code gives you a brief summary of the linear regression. Here are some key statistics from the
summary:

1. The coefficient for TV is 0.054, with a very low p-value. The coefficient is statistically significant. So the association
is not purely by chance.
2. R – squared is 0.816 Meaning that 81.6% of the variance in `Sales` is explained by `TV`. This is a decent R-squared
value.
3. F-statistics has a very low p-value(practically low). Meaning that the model fit is statistically significant, and the
explained variance isn’t purely by chance.

Step 5: Performing predictions on the test set

Now that you have simply fitted a regression line on your train dataset, it is time to make some predictions on the test
data. For this, you first need to add a constant to the X_test data like you did for X_train and then you can simply go on
and predict the y values corresponding to X_test using the predict the attribute of the fitted regression line.

# Add a constant to X_test

X_test_sm = sm.add_constant(X_test)
# Predict the y values corresponding to X_test_sm
y_pred = lr.predict(X_test_sm)

You can see the predicted values with the following code,

y_pred.head()

To check how well the values are predicted on the test data we will check some evaluation metrics using sklearn library.
#Imporitng libraries
Everything you need to Know about Linear Regression!
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

#RMSE value
print( "RMSE: ",np.sqrt( mean_squared_error( y_test, y_pred ) )
#R-squared value
print( "R-squared: ",r2_score( y_test, y_pred ) )

We are getting a decent score for both train and test sets.

Apart from `statsmodels`, there is another package namely `sklearn` that can be used to perform linear regression. We
will use the `linear_model` library from `sklearn` to build the model. Since we have already performed a train-test split,
we don’t need to do it again.

There’s one small step that we need to add, though. When there’s only a single feature, we need to add an additional
column in order for the linear regression fit to be performed successfully. Code is given below,

X_train_lm = X_train_lm.values.reshape(-1,1)
X_test_lm = X_test_lm.values.reshape(-1,1)

One can check the change in the shape of the above data frames.

print(X_train_lm.shape)
print(X_train_lm.shape)

To fit the model, write the below code,

from sklearn.linear_model import LinearRegression

#Representing LinearRegression as lr (creating LinearRegression object)
lr = LinearRegression()
#Fit the model using lr.fit()
lr.fit( X_train_lm , y_train_lm )

You can get the intercept and slope values with sklearn using the following code,

#get intercept
print( lr.intercept_ )
#get slope
print( lr.coef_ )

This is how we can perform the simple linear regression.

Frequently Asked Questions

Q1. What are the parameters of a linear regression?
A.Everything
Linear regression
you needhas
totwo main
Know parameters:
about slope (weight) and intercept. The slope represents the change in the
Linear Regression!
dependent variable for a unit change in the independent variable. The intercept is the value of the dependent variable
when the independent variable is zero. The goal is to find the best-fitting line that minimizes the difference between
predicted and actual values.
Q2. What is the formula for linear regression line?
A. The formula for a linear regression line is:
y = mx + b
Where y is the dependent variable, x is the independent variable, m is the slope (weight), and b is the intercept. It
represents the best-fitting straight line that describes the relationship between the variables by minimizing the sum of
squared differences between actual and predicted values.

Conclusion
Having covered the most fundamental concept in machine learning, you’re now able to implement it on a number of
your datasets.

Whatever you learned in this discussion is quite sufficient for you to pick a simple dataset and go ahead to create a
linear regression model on it.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

blogathon linear regression

About the Author

KAVITA MALI
A Mathematics student turned Data Scientist. I am an aspiring data scientist who aims at learning all the necessary concepts in
Data Science in detail. I am passionate about Data Science knowing data manipulation, data visualization, data analysis, EDA,
Machine Learning, etc which will help to find valuable insights from the data.

Our Top Authors

Download
Analytics Vidhya App for the Latest blog/Article

Next Post
Intro to Rapidminer: A No-Code Development Platform for Data Mining (with Case Study)
Everything you need to Know about Linear Regression!

Leave a Reply
Your email address will not be published. Required fields are marked *

Comment

Name*

Email*

Website

Notify me of follow-up comments by email. Notify me of new posts by email.

Submit

Top Resources

10 Best AI Image Generator Tools to Use in 2023

avcontentteam - AUG 17, 2023

Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models
From Everand
Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models
Jim Frost
5/5 (4)
Machine Learning Interview Questions
From Everand
Machine Learning Interview Questions
Tech Interviews
4.5/5 (2)
Artificial Intelligence and Intellectual Property Slides
50% (2)
Artificial Intelligence and Intellectual Property Slides
29 pages
Linear Regression
No ratings yet
Linear Regression
36 pages
Linear-Regression ML
No ratings yet
Linear-Regression ML
36 pages
Everything You Need To Know About Linear Regression
No ratings yet
Everything You Need To Know About Linear Regression
19 pages
Linear Regression
No ratings yet
Linear Regression
11 pages
Linear Regression
100% (1)
Linear Regression
8 pages
Linear Regression
No ratings yet
Linear Regression
24 pages
ML - Module 2
No ratings yet
ML - Module 2
16 pages
CSL0777 L12
No ratings yet
CSL0777 L12
18 pages
Linear Regression
No ratings yet
Linear Regression
83 pages
Chapter_2_Linear and Logistic Regression
No ratings yet
Chapter_2_Linear and Logistic Regression
34 pages
Machine Learning and Deep Learning Course
No ratings yet
Machine Learning and Deep Learning Course
23 pages
Linear Regression
No ratings yet
Linear Regression
18 pages
ML Unit-2
No ratings yet
ML Unit-2
123 pages
A Practical Approach To Linear Regression in Machine Learning - by Ashwin Raj - Towards Data Science
No ratings yet
A Practical Approach To Linear Regression in Machine Learning - by Ashwin Raj - Towards Data Science
20 pages
2-Linear Regression
No ratings yet
2-Linear Regression
31 pages
Basic Interview Question of Linear Regression
No ratings yet
Basic Interview Question of Linear Regression
9 pages
Linear Regression
No ratings yet
Linear Regression
10 pages
Linear Regression
No ratings yet
Linear Regression
36 pages
Linear Regression - 1st draft (1)
No ratings yet
Linear Regression - 1st draft (1)
5 pages
MachineLearning Unit II
No ratings yet
MachineLearning Unit II
45 pages
Linear Regression
No ratings yet
Linear Regression
34 pages
Teit ML2
No ratings yet
Teit ML2
11 pages
Regression
No ratings yet
Regression
4 pages
Linear Regression
No ratings yet
Linear Regression
24 pages
Applying_Machine_Learning_Algorithms_with_Scikit-learn(Sklearn)_-_Notes
No ratings yet
Applying_Machine_Learning_Algorithms_with_Scikit-learn(Sklearn)_-_Notes
19 pages
Unit-4 DS Student
No ratings yet
Unit-4 DS Student
43 pages
Regression: Unit Iii
No ratings yet
Regression: Unit Iii
54 pages
MachineLearning_Unit-II
No ratings yet
MachineLearning_Unit-II
45 pages
Linear Regression
No ratings yet
Linear Regression
15 pages
3. Linear Regression
No ratings yet
3. Linear Regression
49 pages
Isn't Linear Regression From Statistics?
No ratings yet
Isn't Linear Regression From Statistics?
4 pages
Linear Regression
No ratings yet
Linear Regression
5 pages
Chapter4_Regression.docx
No ratings yet
Chapter4_Regression.docx
15 pages
Group_1_Practical
No ratings yet
Group_1_Practical
16 pages
Supervised Learning Algorithms
No ratings yet
Supervised Learning Algorithms
20 pages
Regression Analysis
No ratings yet
Regression Analysis
49 pages
What Is Linear Regression
No ratings yet
What Is Linear Regression
14 pages
linear regression (1)
No ratings yet
linear regression (1)
8 pages
AI Lec5
No ratings yet
AI Lec5
42 pages
02 LR
No ratings yet
02 LR
11 pages
Linear Regression: Student: Mohammed Abu Musameh Supervisor: Eng. Akram Abu Garad
No ratings yet
Linear Regression: Student: Mohammed Abu Musameh Supervisor: Eng. Akram Abu Garad
35 pages
Modern Pridictive Modelling(Regression)
No ratings yet
Modern Pridictive Modelling(Regression)
12 pages
Lecture Note #8_PEC-CS701E
No ratings yet
Lecture Note #8_PEC-CS701E
20 pages
ML MU Unit 3RegressionTechniquespdf 2025 02-07-10!56!37 (1)
No ratings yet
ML MU Unit 3RegressionTechniquespdf 2025 02-07-10!56!37 (1)
115 pages
Introduction To Machine Learning Algorithms: Linear Regression
No ratings yet
Introduction To Machine Learning Algorithms: Linear Regression
1 page
Linear Regression
No ratings yet
Linear Regression
4 pages
Lecture6 Regression
No ratings yet
Lecture6 Regression
42 pages
Regression Linear Simple
No ratings yet
Regression Linear Simple
37 pages
-18-Linear Regression
No ratings yet
-18-Linear Regression
29 pages
Unit 3c Linear Regression
No ratings yet
Unit 3c Linear Regression
98 pages
ML Lecture - 3
No ratings yet
ML Lecture - 3
47 pages
Regression Analysis in Machine Learning
No ratings yet
Regression Analysis in Machine Learning
13 pages
Progression Linaire
No ratings yet
Progression Linaire
187 pages
Regression
No ratings yet
Regression
16 pages
Supervised Machine Learning - Regression
No ratings yet
Supervised Machine Learning - Regression
34 pages
Lecture-3---Linear-Regression-imran-20022025-092939am
No ratings yet
Lecture-3---Linear-Regression-imran-20022025-092939am
46 pages
Multi-dimensional Monte Carlo Integrations Utilizing Mathematica
From Everand
Multi-dimensional Monte Carlo Integrations Utilizing Mathematica
SUJAUL CHOWDHURY
No ratings yet
Bundle Adjustment: Optimizing Visual Data for Precise Reconstruction
From Everand
Bundle Adjustment: Optimizing Visual Data for Precise Reconstruction
Fouad Sabry
No ratings yet
Introduction to Generative AI
No ratings yet
Introduction to Generative AI
6 pages
Wa0197.
No ratings yet
Wa0197.
4 pages
Ai Model Life Cycle
No ratings yet
Ai Model Life Cycle
6 pages
Farming Made Easy Using Machine Learning
No ratings yet
Farming Made Easy Using Machine Learning
5 pages
Meeting Insights Summarisation Using Speech Recognition
No ratings yet
Meeting Insights Summarisation Using Speech Recognition
8 pages
Probabilistic Models For Supervised Learning: Piyush Rai Introduction To Machine Learning (CS771A)
No ratings yet
Probabilistic Models For Supervised Learning: Piyush Rai Introduction To Machine Learning (CS771A)
32 pages
MLA Lab 6:-Implementation of Decision Tree
No ratings yet
MLA Lab 6:-Implementation of Decision Tree
16 pages
Ai for remte sensing assignments notes
No ratings yet
Ai for remte sensing assignments notes
16 pages
Mule Proposal
No ratings yet
Mule Proposal
21 pages
Unit 3
No ratings yet
Unit 3
13 pages
Drug Recommendation System Based On Sentiment Analysis of Drug Reviews Using Machine Learning
No ratings yet
Drug Recommendation System Based On Sentiment Analysis of Drug Reviews Using Machine Learning
8 pages
(Template) IBM-CBSE - AI Project Logbook
No ratings yet
(Template) IBM-CBSE - AI Project Logbook
30 pages
Syllabi MTech Artificial Intelligence
No ratings yet
Syllabi MTech Artificial Intelligence
56 pages
KMEANS
No ratings yet
KMEANS
13 pages
3RD Year Result
No ratings yet
3RD Year Result
2 pages
Special Recruitment Drive 2024 Advertisement
No ratings yet
Special Recruitment Drive 2024 Advertisement
4 pages
Raphael Sonabend PHD Thesis
No ratings yet
Raphael Sonabend PHD Thesis
345 pages
Mini Project Documentation Format
No ratings yet
Mini Project Documentation Format
48 pages
Machine Learning new
No ratings yet
Machine Learning new
41 pages
new ppt
No ratings yet
new ppt
12 pages
Detailed List & Syllabuses of Courses: Syrian Arab Republic Damascus University
No ratings yet
Detailed List & Syllabuses of Courses: Syrian Arab Republic Damascus University
14 pages
Revolutionizing Finance
No ratings yet
Revolutionizing Finance
10 pages
Fault Prediction of Transformer Using Machine Learning and DGA
No ratings yet
Fault Prediction of Transformer Using Machine Learning and DGA
5 pages
Exam 2012 Data Mining Questions and Answers
No ratings yet
Exam 2012 Data Mining Questions and Answers
14 pages
IAT-I Question Paper With Solution of 18CS71 Artificial Intelligence and Machine Learning Oct-2022-Dr - Swathi Y
No ratings yet
IAT-I Question Paper With Solution of 18CS71 Artificial Intelligence and Machine Learning Oct-2022-Dr - Swathi Y
7 pages
01A092019 Review Impact of Artificial Intelligence On Society
100% (2)
01A092019 Review Impact of Artificial Intelligence On Society
13 pages
Ganesh Vandana Ma2 03
No ratings yet
Ganesh Vandana Ma2 03
5 pages
Machine Learning Lab Assessment 5: 18BCE2301 Devangshu Mazumder
No ratings yet
Machine Learning Lab Assessment 5: 18BCE2301 Devangshu Mazumder
10 pages
Vulnerability Assessment of A Large Sized Power System Using Neural Network Considering Various Feature Extraction Methods
No ratings yet
Vulnerability Assessment of A Large Sized Power System Using Neural Network Considering Various Feature Extraction Methods
10 pages

Linear Regression - Everything You Need To Know About Linear Regression

Uploaded by

Linear Regression - Everything You Need To Know About Linear Regression

Uploaded by

Everything you need to Know about Linear Regression!

KAVITA MALI — Updated On September 21st, 2023

This article was published as a part of the Data Science Blogathon

Clustering: No predefined labels are assigned to groups/clusters formed,e.g. customer segmentation.

What is Linear Regression?

Simple Linear Regression

where Yi = Dependent variable, β0 = constant/Intercept, β1 = Slope/Intercept, Xi = Independent variable.

What is the best fit line?

Cost Function for Linear Regression

We calculate MSE using simple linear equation y=mx+b:

Gradient Descent for Linear Regression

Evaluation Metrics for Linear Regression

The most used metrics are,

1. Coefficient of Determination or R-Squared (R2)

Coefficient of Determination or R-Squared (R2)

where y hat is the mean of the sample data points.

The significance of R-squared is shown by the following figures,

Root Mean Squared Error

Assumptions of Linear Regression

There should not be any visible patterns in the error terms.

Assessing the model fit

Multiple Linear Regression

Considerations of Multiple Linear Regression

Multicollinearity can be detected using the following methods.

Overfitting and Underfitting in Linear Regression

Let’s understand what is a bias-variance tradeoff is.

There is an inverse relationship between bias and variance,

An increase in bias will decrease the variance.

Now coming to the overfitting and underfitting.

Increase the model complexity

Hands-on Coding: Linear Regression Model

TV Radio Newspaper Sales

Step 1: Importing Python Libraries

NumPy (to perform certain mathematical operations)

#Importing all the necessary libraries

The last line of code helps in suppressing the unnecessary warnings.

Step 2: Loading the Dataset

#Importing seaborn library for visualizations

#to plot all the scatterplots in a single plot

#To plot heatmap to find out correlations

Step 4: Performing Simple Linear Regression

from sklearn.model_selection import train_test_split

importing statmodels library to perform linear regression

# Add a constant to get an intercept

Step 5: Performing predictions on the test set

# Add a constant to X_test

To fit the model, write the below code,

from sklearn.linear_model import LinearRegression

This is how we can perform the simple linear regression.

Frequently Asked Questions

blogathon linear regression

About the Author

Our Top Authors

Notify me of follow-up comments by email. Notify me of new posts by email.

10 Best AI Image Generator Tools to Use in 2023

avcontentteam - AUG 17, 2023

You might also like