Linear Regression

The document provides an overview of regression analysis, focusing on linear regression as a method to predict a dependent variable based on independent variables. It outlines the assumptions, prerequisites, and properties of linear regression, including the importance of the regression equation and model performance metrics. Additionally, it includes a Python implementation example for linear regression using libraries like pandas and scikit-learn.

Uploaded by

abhijaychauhan88

LINEAR REGRESSION (Python)
WHAT IS REGRESSION?
Regression searches for relationships among variables.
For example, you can observe several employees of some company
and try to understand how their salaries depend on features
such as experience, level of education, role, the city they work in, and
so on.

In regression analysis, you consider some phenomenon of interest
and have a number of observations. Each observation has two or
more features. Assuming that (at least) one of the features depends
on the others, you try to establish a relation among them.
WHAT IS REGRESSION?
The dependent features are called the dependent variables, outputs,
or responses.
The independent features are called the independent variables,
inputs, or predictors.

Regression problems usually have one continuous and unbounded
dependent variable. The inputs, however, can be continuous, discrete,
or even categorical data such as gender, nationality, brand, and so on.
It is a common practice to denote the outputs with 𝑦 and inputs with 𝑥.
If there are two or more independent variables, they can be represented
as the vector 𝐱 = (𝑥₁, …, 𝑥ᵣ), where 𝑟 is the number of inputs.
LINEAR REGRESSION
Linear regression is probably one of the most important and widely
used regression techniques. It’s among the simplest regression
methods. One of its main advantages is the ease of interpreting
results.
Problem Formulation

When implementing linear regression of some dependent
variable 𝑦 on the set of independent variables 𝐱 = (𝑥₁, …, 𝑥ᵣ),
where 𝑟 is the number of predictors, you assume a linear
relationship between 𝑦 and 𝐱: 𝑦 = 𝛽₀ + 𝛽₁𝑥₁ + ⋯ + 𝛽ᵣ𝑥ᵣ + 𝜀. This
equation is the regression equation. 𝛽₀, 𝛽₁, …, 𝛽ᵣ are the
regression coefficients, and 𝜀 is the random error.
SIMPLE LINEAR
REGRESSION
Introduction
“Linear Regression” is a method to predict a dependent variable (Y) based on the
values of independent variables (X). It can be used in cases where we want to predict
some continuous quantity, e.g., predicting traffic in a retail store, or predicting a user’s
dwell time or number of pages visited on xyz.com.

The mathematical formula of simple linear regression can be written as

Y = B0 + B1*X, where:

B0 and B1 are known as the regression beta coefficients or parameters:

B0 is the intercept of the regression line; that is, the predicted value when X = 0.
B1 is the slope of the regression line.
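The coefficients B0 and B1 can be computed directly with the OLS closed form. A minimal sketch with hypothetical toy data (not the document's dataset):

```python
import numpy as np

# Hypothetical toy data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.1, 6.2, 7.9, 10.0])

# OLS closed form: b1 = cov(x, y) / var(x), b0 = mean(y) - b1 * mean(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

print(b0, b1)  # intercept ≈ 0.18, slope ≈ 1.96
```

These are the same estimates scikit-learn's `LinearRegression` would return for one predictor.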
PREREQUISITES
To start with Linear Regression, you must be aware of a few basic
concepts of statistics:
Correlation (r) – Explains the relationship between two variables;
possible values range from -1 to +1
Variance (σ²) – Measure of spread in your data
Standard Deviation (σ) – Measure of spread in your data (square
root of variance)
Normal distribution
Residual (error term) – Actual value – Predicted value
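Each of these prerequisite quantities is one line of NumPy. A minimal sketch, assuming hypothetical toy arrays:

```python
import numpy as np

# Hypothetical toy samples
x      = np.array([1.0, 2.0, 3.0, 4.0])   # independent variable
actual = np.array([3.0, 5.0, 7.0, 9.0])   # observed values
pred   = np.array([2.5, 5.5, 6.5, 9.5])   # model predictions

r         = np.corrcoef(x, actual)[0, 1]  # correlation, in [-1, +1]
var       = np.var(actual)                # variance (sigma squared)
std       = np.sqrt(var)                  # standard deviation
residuals = actual - pred                 # actual minus predicted
```

Here `actual` is an exact linear function of `x`, so the correlation comes out as exactly 1.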
ASSUMPTIONS OF LINEAR
REGRESSION
In order to fit a linear regression line, the data should satisfy a few
basic but important assumptions. If your data doesn’t follow the
assumptions, your results may be inaccurate and misleading.
Linearity & additivity: There should be a linear relationship
between the dependent and independent variables, and changes in
independent variable values should have an additive impact on the
dependent variable.
Normality of error distribution: The distribution of differences
between actual & predicted values (residuals) should be normal.
ASSUMPTIONS OF LINEAR
REGRESSION
Homoscedasticity: The variance of the errors should be constant with respect to:
 Time
 The predictions
 Independent variable values

Statistical independence of errors: The error terms (residuals)
should not be correlated with one another. E.g., in the case of
time series data, there shouldn’t be any correlation between
consecutive error terms.
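The independence assumption is commonly checked with the Durbin-Watson statistic, which can be computed by hand. A minimal sketch with hypothetical residuals:

```python
import numpy as np

# Hypothetical residuals from a fitted model
res = np.array([0.5, -0.3, 0.2, -0.4, 0.1])

# Durbin-Watson statistic: values near 2 suggest no autocorrelation,
# toward 0 positive autocorrelation, toward 4 negative autocorrelation
dw = np.sum(np.diff(res) ** 2) / np.sum(res ** 2)
print(dw)
```

For these alternating-sign residuals the statistic comes out above 2, hinting at mild negative autocorrelation.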
LINEAR REGRESSION LINE
While doing linear regression, our objective is to fit a line through
the distribution that is nearest to most of the points, thereby
reducing the distance (error term) of the data points from the fitted
line.
BASIC PARAMETERS
In the referenced figure (not reproduced here), the dots represent various data
points and the line represents an approximation that can explain the relationship
between the ‘x’ & ‘y’ axes. Through linear regression, we try to find such a
line. For example, if we have one dependent variable ‘Y’ and one independent
variable ‘X’, the relationship between ‘X’ & ‘Y’ can be represented in the form of the
following equation:
Y = Β0 + Β1X
Where,
Y = Dependent Variable
X = Independent Variable
Β0 = Constant term a.k.a Intercept
Β1 = Coefficient of relationship between ‘X’ & ‘Y’
FEW PROPERTIES OF LINEAR
REGRESSION LINE
The regression line always passes through the mean of the independent
variable (x) as well as the mean of the dependent variable (y).
The regression line minimizes the sum of the squared residuals. That’s
why this method of linear regression is known as “Ordinary Least
Squares (OLS)”.
Β1 explains the change in Y with a one-unit change in X. In
other words, if we increase the value of ‘X’ by one unit, Β1 tells us
the resulting change in the value of Y.
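The first property is easy to verify numerically: the OLS line evaluated at the mean of x gives the mean of y. A minimal sketch using `np.polyfit` on hypothetical toy data:

```python
import numpy as np

# Hypothetical toy data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.1, 3.8, 5.0])

b1, b0 = np.polyfit(x, y, 1)  # degree-1 least-squares fit: slope, intercept

# The fitted line passes through (mean(x), mean(y))
y_at_mean_x = b0 + b1 * x.mean()
print(abs(y_at_mean_x - y.mean()) < 1e-9)  # True
```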
MODEL PERFORMANCE
Once you build the model, the next logical question is whether your model
is good enough to predict future values, i.e., whether the relationship you
established between the dependent and independent variables holds up.
For this purpose, there are various metrics to look into:
R-Square (R²)
Adjusted R-Square
Mean Square Error (MSE)
Root Mean Square Error (RMSE)
Mean Absolute Error (MAE)
Mean Absolute Percentage Error (MAPE)
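All of these metrics follow directly from the residuals. A minimal sketch computing each one with NumPy on hypothetical toy values:

```python
import numpy as np

# Hypothetical actual and predicted values
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 310.0, 390.0])

err  = y_true - y_pred
mse  = np.mean(err ** 2)                                # Mean Square Error
rmse = np.sqrt(mse)                                     # Root Mean Square Error
mae  = np.mean(np.abs(err))                             # Mean Absolute Error
mape = np.mean(np.abs(err / y_true)) * 100              # MAPE, in percent
r2   = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)  # R-Square

print(mse, rmse, mae, mape, r2)
```

`sklearn.metrics` provides `mean_squared_error`, `mean_absolute_error`, and `r2_score` if you prefer library functions.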
MULTIPLE LINEAR
REGRESSION
Fundamentally there is no difference between ‘Simple’ & ‘Multiple’
linear regression. Both work on the OLS principle, and the procedure to
get the best line is also similar. In the case of the latter, the regression
equation takes the shape:
Y = B0 + B1X1 + B2X2 + B3X3 + ...
Where,
Bi : Different coefficients
Xi : Various independent variables
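With more than one predictor, the same OLS fit can be expressed as a least-squares solve on a design matrix with an intercept column. A minimal sketch, assuming hypothetical data generated from known coefficients:

```python
import numpy as np

# Hypothetical data constructed so that y = 1 + 2*x1 + 3*x2 exactly
X = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [1.0, 2.0],
              [3.0, 2.0]])
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

# Prepend an intercept column, then solve OLS: B = argmin ||Xb @ B - y||^2
Xb = np.column_stack([np.ones(len(X)), X])
B, *_ = np.linalg.lstsq(Xb, y, rcond=None)

print(B)  # ≈ [1, 2, 3] -> B0 (intercept), B1, B2
```

`linear_model.LinearRegression().fit(X, y)` would recover the same B1, B2 as `coef_` and B0 as `intercept_`.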
PYTHON
IMPLEMENTATION
#import libraries
import pandas as pd
import numpy as np
from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
%matplotlib inline

#load dataset
df = pd.read_csv('homeprices.csv')
df.head()
PYTHON
IMPLEMENTATION
#visualization
plt.xlabel('area')
plt.ylabel('price')
plt.scatter(df.area,df.price,color='red',marker='+')

new_df_x = df.drop('price',axis='columns')
new_df_x

price_y = df.price
price_y
# Create linear regression object
reg = linear_model.LinearRegression()
PYTHON
IMPLEMENTATION
#fit the data
reg.fit(new_df_x, price_y)
#predict
predict_price_y = reg.predict(new_df_x)
# model evaluation (mean_squared_error and r2_score come from sklearn.metrics)
rmse = np.sqrt(mean_squared_error(price_y, predict_price_y))  # RMSE = square root of MSE
r2 = r2_score(price_y, predict_price_y)
# printing values
print('Slope:', reg.coef_)
print('Intercept:', reg.intercept_)
print('Root mean squared error:', rmse)
print('R2 score:', r2)
PYTHON
IMPLEMENTATION
#(1) Predict price of a home with area = 3300 sq ft
predct_price = reg.predict([[3300]])  # recent scikit-learn may warn about missing feature names
predct_price
# plotting values
# data points
plt.scatter(new_df_x, price_y, s=10)
plt.xlabel('Area')
plt.ylabel('Price')
# predicted values
plt.plot(new_df_x, predict_price_y, color='r')
plt.show()
