3. Multiple Linear Regression
library(tidyverse)
First, as a predictive analysis, multiple linear regression is used to explain the relationship between one
continuous dependent variable and two or more independent variables. The independent variables can be
continuous or categorical.
Second, it can be used to forecast the effects or impacts of changes. That is, multiple linear regression analysis
helps us understand how much the dependent variable will change when we change the independent
variables.
Third, multiple linear regression analysis predicts trends and future values, and it can be used to obtain point
estimates.
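The export does not show the cell that creates the data object. A minimal sketch of a plausible loading step: the file name sales_data.csv is hypothetical, and the file is assumed to already contain the sales, price, quantity_ordered and quarter_id columns used below.

# Hypothetical loading step: the file name is an assumption, and the notebook
# itself does not show how the data object was created.
data <- read_csv("sales_data.csv")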
In [12]:
head(data)
A data.frame: 6 × 25
(most of the 25 columns and rows 2-3 were lost in this export; the visible columns appear to be an order number, quantity ordered, price each, an order line number, sales, and a truncated order date)

1   10107   30   95.70    2    2871.00   2/24/20…
4   10145   45   83.26    6    3746.70   8/25/20…
5   10159   49   100.00   14   5205.27   10/10/20…
6   10168   36   96.66    1    3479.76   10/28/20…
Building model
sales (dependent) = b0 + b1*price + b2*quantity_ordered + b3*quarter_id
In [14]:
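The code for this cell was not preserved in the export. A minimal sketch of the fitting step implied by the formula above and by the later references to summary(model); the variable names are taken from that formula:

# Fit the multiple linear regression of sales on price, quantity ordered and
# quarter, then print the Call, residual summary and coefficient table.
model <- lm(sales ~ price + quantity_ordered + quarter_id, data = data)
summary(model)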
Call:
Residuals:
Coefficients:
(numeric values in the summary output were not preserved in this export)
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpretation:
It can be seen that the p-value of the F-statistic is < 2.2e-16, which is highly significant. This means that at least
one of the predictor variables is significantly related to the outcome variable.
To see which predictor variables are significant, we examine the coefficients table, which shows the estimates of the
regression beta coefficients and the associated t-statistic p-values.
In [15]:
summary(model)$coefficients
For a given predictor, the t-statistic evaluates whether or not there is a significant association between the
predictor and the outcome variable, that is, whether the beta coefficient of the predictor is significantly different
from zero.
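As an illustration, the t value in the coefficients table is the estimate divided by its standard error, and the two-sided p-value comes from a t distribution with the model's residual degrees of freedom. A sketch for a single predictor, using the price term from the formula above:

# Recompute the t value and p-value for the price coefficient from the
# Estimate and Std. Error columns of the coefficients table.
coefs <- summary(model)$coefficients
t_value <- coefs["price", "Estimate"] / coefs["price", "Std. Error"]
p_value <- 2 * pt(-abs(t_value), df = df.residual(model))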
It can be seen that quantity ordered and price are each significantly associated with changes in sales, while
quarter is not significantly associated with sales.
We found that quarter is not significant in the multiple regression model. This means that, for fixed values of
quantity ordered and price, changes in quarter do not significantly affect sales.
As the quarter_id variable is not significant, it is possible to remove it from the model:
In [16]:
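The code for this cell was also lost in the export. A minimal sketch of the reduced model implied by the text (the name model2 is an assumption):

# Refit the model without the non-significant quarter_id term.
model2 <- lm(sales ~ price + quantity_ordered, data = data)
summary(model2)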
Call:
Residuals:
Coefficients:
(numeric values in the summary output were not preserved in this export)
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
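As mentioned in the introduction, the fitted model can be used to get point estimates. A sketch using predict() on the reduced model; the new price and quantity values below are made up for illustration:

# Predict sales for two hypothetical order lines.
new_orders <- data.frame(price = c(85, 100), quantity_ordered = c(30, 45))
predict(model2, newdata = new_orders)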
R-squared
In multiple linear regression, R represents the correlation coefficient between the observed values of the
outcome variable (y) and the fitted (i.e., predicted) values of y, and R2 is its square. For this reason, the value of R
will always be positive and will range from zero to one.
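This relationship can be checked directly; a sketch, assuming the reduced model model2 and the sales column from the formula above:

# R-squared equals the squared correlation between observed and fitted sales.
cor(data$sales, fitted(model2))^2
summary(model2)$r.squared  # should match the line above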
R2 represents the proportion of variance, in the outcome variable y, that may be predicted by knowing the value
of the x variables. An R2 value close to 1 indicates that the model explains a large portion of the variance in the
outcome variable.
A problem with R2 is that it will always increase when more variables are added to the model, even if
those variables are only weakly associated with the response. A solution is to adjust R2 by taking into
account the number of predictor variables.
The adjustment in the “Adjusted R-squared” value in the summary output is a correction for the number of x
variables included in the prediction model.
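Both values can be read from the summary object, and the adjustment is a simple function of R2, the number of observations n, and the number of predictors p. A sketch, assuming the reduced model model2:

# Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)
s  <- summary(model2)
r2 <- s$r.squared
n  <- nrow(model.frame(model2))       # observations used in the fit
p  <- length(coef(model2)) - 1        # predictors, excluding the intercept
1 - (1 - r2) * (n - 1) / (n - p - 1)  # should match s$adj.r.squared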