Linear Regression

The document provides an overview of regression analysis, focusing on linear regression as a method to predict a dependent variable based on independent variables. It outlines the assumptions, prerequisites, and properties of linear regression, including the importance of the regression equation and model performance metrics. Additionally, it includes a Python implementation example for linear regression using libraries like pandas and scikit-learn.

Uploaded by

abhijaychauhan88

LINEAR REGRESSION (Python)
WHAT IS REGRESSION?
Regression searches for relationships among variables.
For example, you can observe several employees of some company
and try to understand how their salaries depend on features
such as experience, level of education, role, the city they work in, and
so on.

In regression analysis, you consider some phenomenon of interest
and have a number of observations. Each observation has two or
more features. Assuming that (at least) one of the features depends
on the others, you try to establish a relation among them.
WHAT IS REGRESSION?
The dependent features are called the dependent variables, outputs,
or responses.
The independent features are called the independent variables,
inputs, or predictors.

Regression problems usually have one continuous and unbounded
dependent variable. The inputs, however, can be continuous, discrete,
or even categorical data such as gender, nationality, brand, and so on.
It is a common practice to denote the outputs with 𝑦 and inputs with 𝑥.
If there are two or more independent variables, they can be represented
as the vector 𝐱 = (𝑥₁, …, 𝑥ᵣ), where 𝑟 is the number of inputs.
LINEAR REGRESSION
Linear regression is probably one of the most important and widely
used regression techniques. It’s among the simplest regression
methods. One of its main advantages is the ease of interpreting
results.
Problem Formulation

When implementing linear regression of some dependent
variable 𝑦 on the set of independent variables 𝐱 = (𝑥₁, …, 𝑥ᵣ),
where 𝑟 is the number of predictors, you assume a linear
relationship between 𝑦 and 𝐱: 𝑦 = 𝛽₀ + 𝛽₁𝑥₁ + ⋯ + 𝛽ᵣ𝑥ᵣ + 𝜀. This
equation is the regression equation. 𝛽₀, 𝛽₁, …, 𝛽ᵣ are the
regression coefficients, and 𝜀 is the random error.
SIMPLE LINEAR
REGRESSION
Introduction
“Linear Regression” is a method to predict a dependent variable (Y) based on the
values of independent variables (X). It can be used in cases where we want to predict
some continuous quantity, e.g., predicting traffic in a retail store, or predicting a user’s
dwell time or number of pages visited on xyz.com.

The mathematical formula of simple linear regression can be written as

Y = B0 + B1*X, where:

B0 and B1 are known as the regression beta coefficients or parameters:

B0 is the intercept of the regression line; that is, the predicted value when X = 0.
B1 is the slope of the regression line.
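The coefficients B0 and B1 can be computed directly with the OLS closed form. A minimal sketch with hypothetical toy data (not the document's dataset):

```python
import numpy as np

# Hypothetical toy data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.1, 6.2, 7.9, 10.0])

# OLS closed form: b1 = cov(x, y) / var(x), b0 = mean(y) - b1 * mean(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

print(b0, b1)  # intercept ≈ 0.18, slope ≈ 1.96
```

These are the same estimates scikit-learn's `LinearRegression` would return for one predictor.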
PREREQUISITES
To start with Linear Regression, you must be aware of a few basic
concepts of statistics:
Correlation (r) – Explains the relationship between two variables;
possible values range from -1 to +1
Variance (σ²) – Measure of spread in your data
Standard Deviation (σ) – Measure of spread in your data (square
root of variance)
Normal distribution
Residual (error term) – Actual value – Predicted value
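Each of these prerequisite quantities is one line of NumPy. A minimal sketch, assuming hypothetical toy arrays:

```python
import numpy as np

# Hypothetical toy samples
x      = np.array([1.0, 2.0, 3.0, 4.0])   # independent variable
actual = np.array([3.0, 5.0, 7.0, 9.0])   # observed values
pred   = np.array([2.5, 5.5, 6.5, 9.5])   # model predictions

r         = np.corrcoef(x, actual)[0, 1]  # correlation, in [-1, +1]
var       = np.var(actual)                # variance (sigma squared)
std       = np.sqrt(var)                  # standard deviation
residuals = actual - pred                 # actual minus predicted
```

Here `actual` is an exact linear function of `x`, so the correlation comes out as exactly 1.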
ASSUMPTIONS OF LINEAR
REGRESSION
In order to fit a linear regression line, the data should satisfy a few
basic but important assumptions. If your data doesn’t follow the
assumptions, your results may be inaccurate and misleading.
Linearity & additivity: There should be a linear relationship
between the dependent and independent variables, and changes in
independent variable values should have an additive impact on the
dependent variable.
Normality of error distribution: The distribution of differences
between actual & predicted values (residuals) should be normal.
ASSUMPTIONS OF LINEAR
REGRESSION
Homoscedasticity: The variance of the errors should be constant with respect to:
 Time
 The predictions
 Independent variable values

Statistical independence of errors: The error terms (residuals)
should not be correlated with one another. E.g., in the case of
time series data, there shouldn’t be any correlation between
consecutive error terms.
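The independence assumption is commonly checked with the Durbin-Watson statistic, which can be computed by hand. A minimal sketch with hypothetical residuals:

```python
import numpy as np

# Hypothetical residuals from a fitted model
res = np.array([0.5, -0.3, 0.2, -0.4, 0.1])

# Durbin-Watson statistic: values near 2 suggest no autocorrelation,
# toward 0 positive autocorrelation, toward 4 negative autocorrelation
dw = np.sum(np.diff(res) ** 2) / np.sum(res ** 2)
print(dw)
```

For these alternating-sign residuals the statistic comes out above 2, hinting at mild negative autocorrelation.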
LINEAR REGRESSION LINE
While doing linear regression, our objective is to fit a line through
the distribution that is nearest to most of the points, thereby
reducing the distance (error term) of the data points from the fitted
line.
BASIC PARAMETERS
In the referenced figure (not reproduced here), the dots represent various data
points and the line represents an approximation that can explain the relationship
between the ‘x’ & ‘y’ axes. Through linear regression, we try to find such a
line. For example, if we have one dependent variable ‘Y’ and one independent
variable ‘X’, the relationship between ‘X’ & ‘Y’ can be represented in the form of the
following equation:
Y = Β0 + Β1X
Where,
Y = Dependent Variable
X = Independent Variable
Β0 = Constant term a.k.a Intercept
Β1 = Coefficient of relationship between ‘X’ & ‘Y’
FEW PROPERTIES OF LINEAR
REGRESSION LINE
The regression line always passes through the mean of the independent
variable (x) as well as the mean of the dependent variable (y).
The regression line minimizes the sum of the squared residuals. That’s
why this method of linear regression is known as “Ordinary Least
Squares (OLS)”.
Β1 explains the change in Y with a one-unit change in X. In
other words, if we increase the value of ‘X’ by one unit, Β1 tells us
the resulting change in the value of Y.
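The first property is easy to verify numerically: the OLS line evaluated at the mean of x gives the mean of y. A minimal sketch using `np.polyfit` on hypothetical toy data:

```python
import numpy as np

# Hypothetical toy data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.1, 3.8, 5.0])

b1, b0 = np.polyfit(x, y, 1)  # degree-1 least-squares fit: slope, intercept

# The fitted line passes through (mean(x), mean(y))
y_at_mean_x = b0 + b1 * x.mean()
print(abs(y_at_mean_x - y.mean()) < 1e-9)  # True
```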
MODEL PERFORMANCE
Once you build the model, the next logical question is whether your model
is good enough to predict future values, i.e., whether the relationship you
established between the dependent and independent variables holds up.
For this purpose, there are various metrics to look into:
R-Square (R²)
Adjusted R-Square
Mean Square Error (MSE)
Root Mean Square Error (RMSE)
Mean Absolute Error (MAE)
Mean Absolute Percentage Error (MAPE)
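All of these metrics follow directly from the residuals. A minimal sketch computing each one with NumPy on hypothetical toy values:

```python
import numpy as np

# Hypothetical actual and predicted values
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 310.0, 390.0])

err  = y_true - y_pred
mse  = np.mean(err ** 2)                                # Mean Square Error
rmse = np.sqrt(mse)                                     # Root Mean Square Error
mae  = np.mean(np.abs(err))                             # Mean Absolute Error
mape = np.mean(np.abs(err / y_true)) * 100              # MAPE, in percent
r2   = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)  # R-Square

print(mse, rmse, mae, mape, r2)
```

`sklearn.metrics` provides `mean_squared_error`, `mean_absolute_error`, and `r2_score` if you prefer library functions.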
MULTIPLE LINEAR
REGRESSION
Fundamentally there is no difference between ‘Simple’ & ‘Multiple’
linear regression. Both work on the OLS principle, and the procedure to
get the best line is also similar. In the case of the latter, the regression
equation takes the shape:
Y = B0 + B1X1 + B2X2 + B3X3 + ...
Where,
Bi : Different coefficients
Xi : Various independent variables
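With more than one predictor, the same OLS fit can be expressed as a least-squares solve on a design matrix with an intercept column. A minimal sketch, assuming hypothetical data generated from known coefficients:

```python
import numpy as np

# Hypothetical data constructed so that y = 1 + 2*x1 + 3*x2 exactly
X = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [1.0, 2.0],
              [3.0, 2.0]])
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

# Prepend an intercept column, then solve OLS: B = argmin ||Xb @ B - y||^2
Xb = np.column_stack([np.ones(len(X)), X])
B, *_ = np.linalg.lstsq(Xb, y, rcond=None)

print(B)  # ≈ [1, 2, 3] -> B0 (intercept), B1, B2
```

`linear_model.LinearRegression().fit(X, y)` would recover the same B1, B2 as `coef_` and B0 as `intercept_`.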
PYTHON
IMPLEMENTATION
#import libraries
import pandas as pd
import numpy as np
from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
%matplotlib inline

#load dataset
df = pd.read_csv('homeprices.csv')
df.head()
PYTHON
IMPLEMENTATION
#visualization
plt.xlabel('area')
plt.ylabel('price')
plt.scatter(df.area,df.price,color='red',marker='+')

new_df_x = df.drop('price',axis='columns')
new_df_x

price_y = df.price
price_y
# Create linear regression object
reg = linear_model.LinearRegression()
PYTHON
IMPLEMENTATION
#fit the data
reg.fit(new_df_x, price_y)
#predict
predict_price_y = reg.predict(new_df_x)
# model evaluation (mean_squared_error and r2_score come from sklearn.metrics)
rmse = np.sqrt(mean_squared_error(price_y, predict_price_y))  # RMSE = square root of MSE
r2 = r2_score(price_y, predict_price_y)
# printing values
print('Slope:', reg.coef_)
print('Intercept:', reg.intercept_)
print('Root mean squared error:', rmse)
print('R2 score:', r2)
PYTHON
IMPLEMENTATION
#(1) Predict price of a home with area = 3300 sq ft
predct_price = reg.predict([[3300]])  # recent scikit-learn may warn about missing feature names
predct_price
# plotting values
# data points
plt.scatter(new_df_x, price_y, s=10)
plt.xlabel('Area')
plt.ylabel('Price')
# predicted values
plt.plot(new_df_x, predict_price_y, color='r')
plt.show()
