Linear Regression (Python Implementation)
Linear regression is a statistical method for modeling the relationship between a dependent variable and a given set of independent variables.
Note: We refer to dependent variables as responses and independent variables as features for simplicity.
To build a basic understanding of linear regression, we start with its most basic version: simple linear regression.
Simple Linear Regression
Simple linear regression is an approach for predicting a response using a single feature.
It is assumed that the two variables are linearly related. Hence, we try to find a linear
function that predicts the response value(y) as accurately as possible as a function of the
feature or independent variable(x).
Let us consider a dataset where we have a value of the response y for every feature x (the same data used in the code below):

x: 0  1  2  3  4  5  6  7  8  9
y: 1  3  2  5  7  8  8  9  10 12

For generality, we define:
x as the feature vector, i.e. x = [x_1, x_2, ..., x_n],
y as the response vector, i.e. y = [y_1, y_2, ..., y_n]
for n observations (in the above example, n = 10).
(Figure: scatter plot of the above dataset.)
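The fitted line has the form y = b_0 + b_1*x, where b_0 is the intercept and b_1 the slope. Minimizing the sum of squared residuals gives the standard least-squares estimates, which are exactly the quantities computed by estimate_coef in the code below:

SS_xy = Σ(x_i * y_i) - n*m_x*m_y
SS_xx = Σ(x_i * x_i) - n*m_x*m_x
b_1 = SS_xy / SS_xx
b_0 = m_y - b_1*m_x

where m_x and m_y are the means of the x and y vectors.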
Code: Python implementation of the above technique
1) Run IDLE
2) Click File > New File, type the following code, and save it as LRM.py
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x

    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)

    # predicted response vector
    y_pred = b[0] + b[1]*x

    # plotting the regression line
    plt.plot(x, y_pred, color="g")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))

    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
3) Click Run > Run Module and observe the following output and model graph.
Output:
Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697
(Figure: scatter plot of the data with the fitted regression line.)
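As a quick cross-check (not part of the original lab steps), the same coefficients can be recovered with NumPy's built-in least-squares polynomial fit:

import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# np.polyfit with degree 1 returns [slope, intercept]
b_1, b_0 = np.polyfit(x, y, 1)
print(b_0, b_1)  # expected: approximately 1.2364 and 1.1697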
4) Change x and y to different values, run the LRM.py file, and note down the output in your observations.
5) Use the diabetes data set from UCI and the Pima Indians Diabetes data set for performing linear regression modeling, and note down the steps and outputs in your observations.
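A minimal starting point for step 5 is sketched below; the CSV file name and column indices are assumptions (the commonly distributed Pima Indians Diabetes CSV has no header row, with plasma glucose in column 1 and BMI in column 5), so adapt them to the files you download:

import pandas as pd
from LRM import estimate_coef, plot_regression_line  # reuse the functions saved above

# hypothetical local copy of the downloaded Pima Indians Diabetes CSV
df = pd.read_csv("pima-indians-diabetes.csv", header=None)

x = df.iloc[:, 1].to_numpy(dtype=float)  # assumed column: plasma glucose concentration
y = df.iloc[:, 5].to_numpy(dtype=float)  # assumed column: body mass index (BMI)

b = estimate_coef(x, y)
print("b_0 = {}\nb_1 = {}".format(b[0], b[1]))
plot_regression_line(x, y, b)

The import from LRM works because the script above guards its main() call with if __name__ == "__main__".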
Multiple Linear Regression
Code: Python implementation of multiple linear regression on the Boston house-pricing dataset using scikit-learn. Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so on newer versions substitute another regression dataset, as in the sketch below.
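A minimal substitution sketch for scikit-learn 1.2 or newer (fetch_california_housing downloads its data on first use; only the loading lines change, the rest of the listing stays the same):

from sklearn.datasets import fetch_california_housing

# drop-in replacement for the load_boston(...) call in the listing below
housing = fetch_california_housing()
X = housing.data    # feature matrix
y = housing.target  # response vector (median house value, in units of $100,000)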
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model, metrics
from sklearn.model_selection import train_test_split

# load the Boston house-pricing dataset
boston = datasets.load_boston(return_X_y=False)

# defining feature matrix (X) and response vector (y)
X = boston.data
y = boston.target

# splitting X and y into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4,
                                                    random_state=1)
# create linear regression object
reg = linear_model.LinearRegression()
# train the model using the training sets
reg.fit(X_train, y_train)
# regression coefficients
print('Coefficients: ', reg.coef_)
# model accuracy: reg.score returns the R^2 (coefficient of determination); 1 means perfect prediction
print('Variance score: {}'.format(reg.score(X_test, y_test)))
# plot for residual error
## setting plot style
plt.style.use('fivethirtyeight')
## plotting residual errors in training data
plt.scatter(reg.predict(X_train), reg.predict(X_train) - y_train,
color = "green", s = 10, label = 'Train data')
## plotting residual errors in test data
plt.scatter(reg.predict(X_test), reg.predict(X_test) - y_test,
color = "blue", s = 10, label = 'Test data')
## plotting line for zero residual error
plt.hlines(y = 0, xmin = 0, xmax = 50, linewidth = 2)
## plotting legend
plt.legend(loc = 'upper right')
## plot title
plt.title("Residual errors")
## method call for showing the plot
plt.show()
Output:
Coefficients:
[ -8.80740828e-02 6.72507352e-02 5.10280463e-02 2.18879172e+00
-1.72283734e+01 3.62985243e+00 2.13933641e-03 -1.36531300e+00
2.88788067e-01 -1.22618657e-02 -8.36014969e-01 9.53058061e-03
-5.05036163e-01]
Variance score: 0.720898784611
(Figure: residual error plot for the training and test data.)
Exercise:
Use the diabetes data set from UCI and the Pima Indians Diabetes data set to perform multiple linear regression, and compare the results of the above analysis for the two data sets.
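As a hypothetical starting point for this exercise, scikit-learn also bundles a preprocessed diabetes regression dataset (distinct from the UCI and Pima CSVs, which can be loaded with pandas as in the earlier sketch):

from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split

# bundled, already-preprocessed diabetes dataset (10 numeric features, continuous target)
X, y = datasets.load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4,
                                                    random_state=1)

reg = linear_model.LinearRegression().fit(X_train, y_train)
print('Coefficients: ', reg.coef_)
print('Variance score: {}'.format(reg.score(X_test, y_test)))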
In the above example, the accuracy score reported by reg.score is the coefficient of determination, R^2, which is closely related to the explained variance score:

explained_variance_score = 1 - Var{y - y'} / Var{y}

where y' is the estimated target output, y the corresponding (correct) target output, and Var is the variance, the square of the standard deviation. The two scores coincide when the residuals y - y' have zero mean. The best possible score is 1.0; lower values are worse.
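The metrics module imported in the listing above exposes both quantities, so the distinction can be checked directly; a minimal sketch, assuming the reg, X_test, and y_test objects from the Boston example are still in scope:

from sklearn import metrics

y_pred = reg.predict(X_test)
print(metrics.r2_score(y_test, y_pred))                  # what reg.score reports
print(metrics.explained_variance_score(y_test, y_pred))  # 1 - Var{y - y'}/Var{y}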
Assumptions:
Given below are the basic assumptions that a linear regression model makes
regarding a dataset on which it is applied:
Linear relationship: The relationship between the response and the feature variables should be linear. The linearity assumption can be tested using scatter plots: linearly related variables fall roughly along a straight line, whereas clearly curved or patternless scatter indicates a non-linear relationship, for which linear regression will give poorer predictions.
Little or no multicollinearity: It is assumed that there is little or no multicollinearity in the data. Multicollinearity occurs when the features (or independent variables) are not independent of each other; a numeric check is sketched after this list.
Little or no auto-correlation: Another assumption is that there is little or no
autocorrelation in the data. Autocorrelation occurs when the residual errors
are not independent of each other.
Homoscedasticity: Homoscedasticity describes a situation in which the error term (that is, the "noise" or random disturbance in the relationship between the independent variables and the dependent variable) has the same variance across all values of the independent variables. In a plot of residuals against predicted values, homoscedastic errors form a band of roughly constant width around zero, whereas heteroscedastic errors fan out or contract.
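Two of these assumptions can be checked numerically with NumPy alone; a minimal sketch, assuming the X_train, y_train, and reg objects from the Boston example (note that the lag-1 residual correlation is only meaningful for ordered data):

import numpy as np

# multicollinearity: pairwise feature correlations (|r| close to 1 is a warning sign)
corr = np.corrcoef(X_train, rowvar=False)
print(np.max(np.abs(corr - np.eye(corr.shape[0]))))

# autocorrelation: lag-1 correlation of the residuals (values near 0 are desirable)
resid = y_train - reg.predict(X_train)
print(np.corrcoef(resid[:-1], resid[1:])[0, 1])

The residual scatter plot produced in the Boston listing is itself the standard visual check for homoscedasticity.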
Applications:
Trend lines: A trend line represents the variation in quantitative data with
the passage of time (like GDP, oil prices, etc.). These trends usually follow a
linear relationship. Hence, linear regression can be applied to predict future
values. However, this method suffers from a lack of scientific validity in
cases where other potential changes can affect the data.
Economics: Linear regression is the predominant empirical tool in
economics. For example, it is used to predict consumer spending, fixed
investment spending, inventory investment, purchases of a country’s
exports, spending on imports, the demand to hold liquid assets, labor
demand, and labor supply.
Finance: The capital asset pricing model (CAPM) uses linear regression to analyze and quantify the systematic risk of an investment.
Biology: Linear regression is used to model causal relationships between
parameters in biological systems.
References:
https://www.geeksforgeeks.org/linear-regression-python-implementation/
https://en.wikipedia.org/wiki/Linear_regression
https://en.wikipedia.org/wiki/Simple_linear_regression
http://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html
http://www.statisticssolutions.com/assumptions-of-linear-regression/