
3. Multiple Linear Regression - Jupyter Notebook

The document discusses multiple linear regression analysis. It explains that multiple linear regression is used to describe the relationship between one continuous dependent variable and two or more independent variables. It then builds a regression model to predict sales from price, quantity ordered, and quarter. It finds that quantity ordered and price are significant predictors of sales, but quarter is not. Removing quarter simplifies the model without lowering its adjusted R-squared. Finally, it interprets R-squared and adjusted R-squared as measures of the model's explanatory power.

Uploaded by AnuvidyaKarthi
Copyright © All Rights Reserved

3/14/22, 6:16 AM 3. Multiple Linear Regression - Jupyter Notebook

Loading required R packages


In [10]:

library(tidyverse)

-- Attaching packages --------------------------------------- tidyverse 1.3.1 --
v ggplot2 3.3.5     v purrr   0.3.4
v tibble  3.1.5     v dplyr   1.0.7
v tidyr   1.1.4     v stringr 1.4.0
v readr   2.0.2     v forcats 0.5.1
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()

As a predictive analysis, multiple linear regression is used to explain the relationship between one
continuous dependent variable and two or more independent variables. The independent variables can be
continuous or categorical.

There are three major uses for multiple linear regression analysis.

First, it can be used to identify the strength of the effect that the independent variables have on a
dependent variable.

Second, it can be used to forecast the effects or impacts of changes. That is, multiple linear regression
analysis helps us understand how much the dependent variable will change when we change the independent
variables.

Third, it can be used to predict trends and future values in the form of point estimates.
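To make the three uses concrete, here is a minimal sketch on synthetic data. The variables `x1` and `x2`, the "true" coefficients 2, 3 and -1.5, and the sample size are all hypothetical choices for illustration, not values from the sales data. The fitted coefficients estimate the strength of each effect, the difference between two predictions shows how much y changes when x1 rises by one unit with x2 held fixed, and `predict()` returns a point estimate for a new observation.

```r
# Hypothetical synthetic data: y depends on x1 and x2 plus random noise
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 3 * x1 - 1.5 * x2 + rnorm(n, sd = 1)
sim <- data.frame(y, x1, x2)

# Use 1: strength of each effect (estimated regression coefficients)
fit <- lm(y ~ x1 + x2, data = sim)
coef(fit)

# Use 2: impact of a change, raising x1 by one unit while x2 stays fixed
new_base <- data.frame(x1 = 0, x2 = 0)
new_plus <- data.frame(x1 = 1, x2 = 0)
delta <- predict(fit, new_plus) - predict(fit, new_base)
delta  # equal (up to rounding) to the estimated x1 coefficient

# Use 3: point estimate of y for a new observation
point <- predict(fit, newdata = data.frame(x1 = 0.5, x2 = -1))
point
```

Because the noise is small relative to the effects, the estimated coefficients land close to the values used to generate the data.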

Loading the data


In [11]:

data=read.csv('F:/dharssini karthikeyan/COLLEGE sem IV/Predictive analytics/Lab/sales_data_

localhost:8888/notebooks/3.Multiple Linear Regression.ipynb# 1/5



In [12]:

head(data)

A data.frame: 6 × 25

  ORDERNUMBER  QUANTITYORDERED  PRICEEACH  ORDERLINENUMBER  SALES    ORDERDA…
  <int>        <int>            <dbl>      <int>            <dbl>    <c…
1 10107        30               95.70      2                2871.00  2/24/2…
2 10121        34               81.35      5                2765.90  5/7/2003 0…
3 10134        41               94.74      2                3884.34  7/1/2003 0…
4 10145        45               83.26      6                3746.70  8/25/2…
5 10159        49               100.00     14               5205.27  10/10/2…
6 10168        36               96.66      1                3479.76  10/28/2…

Building model
sales (dependent) = b0 + b1*price + b2*quantity_ordered + b3*quarter_id
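The equation above is what `lm()` fits by ordinary least squares. As a sketch on a hypothetical toy data frame (columns `price`, `qty`, `qtr` and the generating coefficients are made up, not read from the sales file), the coefficients b0 to b3 can be reproduced by building the design matrix with `model.matrix()` and solving the normal equations (X'X) b = X'y. `lm()` itself uses a QR decomposition, which is numerically safer, but it reaches the same solution.

```r
# Hypothetical toy data shaped like the sales columns
set.seed(1)
n <- 100
toy <- data.frame(
  price = runif(n, 30, 100),
  qty   = sample(20:50, n, replace = TRUE),
  qtr   = sample(1:4, n, replace = TRUE)
)
# Made-up "true" relationship, used only to generate a response
toy$sales <- -5000 + 60 * toy$price + 100 * toy$qty + 10 * toy$qtr +
  rnorm(n, sd = 900)

# Design matrix: a column of 1s for b0, then one column per predictor
X <- model.matrix(~ price + qty + qtr, data = toy)
y <- toy$sales

# Normal equations: (X'X) b = X'y
b_manual <- solve(crossprod(X), crossprod(X, y))

# Compare with lm()
fit <- lm(sales ~ price + qty + qtr, data = toy)
cbind(manual = drop(b_manual), lm = coef(fit))
```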


In [14]:

model <- lm(SALES ~ QUANTITYORDERED + QTR_ID + PRICEEACH, data = data)


summary(model)

Call:

lm(formula = SALES ~ QUANTITYORDERED + QTR_ID + PRICEEACH, data = data)

Residuals:

Min 1Q Median 3Q Max

-1488.0 -658.3 -241.9 373.6 6447.7

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -5111.6059 108.3658 -47.170 <2e-16 ***

QUANTITYORDERED 103.6180 1.8418 56.260 <2e-16 ***

QTR_ID 10.4922 14.9035 0.704 0.481

PRICEEACH 59.7755 0.8888 67.254 <2e-16 ***

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 952.5 on 2819 degrees of freedom

Multiple R-squared: 0.7329, Adjusted R-squared: 0.7326

F-statistic: 2578 on 3 and 2819 DF, p-value: < 2.2e-16

Interpretation:

Examining the F-statistic and its associated p-value:

The p-value of the F-statistic is < 2.2e-16, which is highly significant. This means that at least
one of the predictor variables is significantly related to the outcome variable.
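The overall F-statistic can be recovered from R2 alone: F = (R2/k) / ((1 - R2)/(n - k - 1)), where k is the number of predictors and n - k - 1 is the residual degrees of freedom. Plugging in the rounded values from the summary above (R2 = 0.7329, k = 3, 2819 residual degrees of freedom) reproduces F of about 2578, and `pf()` gives the upper-tail p-value.

```r
# Values copied from summary(model) above (R-squared is rounded to 4 d.p.)
r2     <- 0.7329
k      <- 3      # number of predictors
df_res <- 2819   # residual degrees of freedom, n - k - 1

# Ratio of explained to unexplained variance, each per degree of freedom
f_stat <- (r2 / k) / ((1 - r2) / df_res)
f_stat  # about 2578, matching the printed F-statistic

# Upper-tail probability of the F(k, df_res) distribution
p_val <- pf(f_stat, df1 = k, df2 = df_res, lower.tail = FALSE)
p_val < 2.2e-16  # TRUE, consistent with "p-value: < 2.2e-16"
```

The small gap from the printed 2578 comes only from R2 being rounded to four decimals.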

To see which predictor variables are significant, we use the coefficients table, which shows the estimated
regression beta coefficients and the associated t-statistic p-values.

In [15]:

summary(model)$coefficients

A matrix: 4 × 4 of type dbl

Estimate Std. Error t value Pr(>|t|)

(Intercept) -5111.60587 108.365835 -47.1699025 0.0000000

QUANTITYORDERED 103.61802 1.841774 56.2599032 0.0000000

QTR_ID 10.49216 14.903451 0.7040086 0.4814856

PRICEEACH 59.77552 0.888806 67.2537321 0.0000000

For a given predictor, the t-statistic evaluates whether or not there is a significant association between the
predictor and the outcome variable, that is, whether the beta coefficient of the predictor is significantly different
from zero.
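Each t value is the estimate divided by its standard error, and the p-value is the two-sided tail probability of a t distribution with the residual degrees of freedom. Using the QTR_ID row from the coefficient matrix above (estimate 10.49216, standard error 14.903451) and the 2819 residual degrees of freedom from the model summary:

```r
# QTR_ID row copied from summary(model)$coefficients above
estimate <- 10.49216
std_err  <- 14.903451
df_res   <- 2819   # residual degrees of freedom from summary(model)

# t-statistic for H0: beta_QTR_ID = 0
t_value <- estimate / std_err
t_value  # about 0.704

# Two-sided p-value: probability of |T| >= |t_value| when H0 is true
p_value <- 2 * pt(-abs(t_value), df = df_res)
p_value  # about 0.481, well above 0.05
```

The same two lines applied to the QUANTITYORDERED or PRICEEACH rows give t values above 50, whose p-values are printed as 0.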

It can be seen that quantity ordered and price each are significantly associated with changes in sales, while
changes in quarter are not significantly associated with sales.


We found that quarter (QTR_ID) is not significant in the multiple regression model. This means that, holding
quantity ordered and price each fixed, changes in quarter do not significantly affect sales.

As the QTR_ID variable is not significant, it is possible to remove it from the model:

sales = b0 + b1*price + b2*quantity_ordered
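Dropping QTR_ID can also be checked with a partial F-test, by passing the two nested models to `anova()`. The sketch below uses a hypothetical toy data frame with no real quarter effect (column names and coefficients are made up); in the notebook, the two fitted sales models would be passed instead. When exactly one predictor is dropped, the partial F equals the square of that predictor's t value in the full model, so the test agrees with the coefficient table.

```r
# Hypothetical toy data with no true quarter effect
set.seed(7)
n <- 300
toy <- data.frame(
  price = runif(n, 30, 100),
  qty   = sample(20:50, n, replace = TRUE),
  qtr   = sample(1:4, n, replace = TRUE)
)
toy$sales <- -5000 + 60 * toy$price + 100 * toy$qty + rnorm(n, sd = 900)

full    <- lm(sales ~ qty + qtr + price, data = toy)
reduced <- lm(sales ~ qty + price, data = toy)

# Partial F-test: does adding qtr reduce the residual sum of squares significantly?
cmp <- anova(reduced, full)
cmp

# With a single dropped term, F equals the squared t value of qtr in the full model
f_partial <- cmp$F[2]
t_qtr     <- coef(summary(full))["qtr", "t value"]
c(F = f_partial, t_squared = t_qtr^2)
```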

In [16]:

model <- lm(SALES ~ QUANTITYORDERED + PRICEEACH, data = data)


summary(model)

Call:

lm(formula = SALES ~ QUANTITYORDERED + PRICEEACH, data = data)

Residuals:

Min 1Q Median 3Q Max

-1492.8 -661.4 -243.8 372.3 6461.7

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -5081.9486 99.8336 -50.90 <2e-16 ***

QUANTITYORDERED 103.5722 1.8405 56.27 <2e-16 ***

PRICEEACH 59.7811 0.8887 67.27 <2e-16 ***

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 952.4 on 2820 degrees of freedom

Multiple R-squared: 0.7328, Adjusted R-squared: 0.7326

F-statistic: 3867 on 2 and 2820 DF, p-value: < 2.2e-16

sales = -5081.9486 + 59.7811*price + 103.5722*quantity_ordered
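Plugging values into the fitted equation gives a point prediction. As an illustration, this uses the coefficients printed above and the first row of `head(data)` (PRICEEACH = 95.70, QUANTITYORDERED = 30, SALES = 2871.00). In the notebook, `predict(model, newdata = ...)` would return nearly the same number from the unrounded coefficients.

```r
# Coefficients copied from summary(model) for SALES ~ QUANTITYORDERED + PRICEEACH
b0      <- -5081.9486
b_price <- 59.7811
b_qty   <- 103.5722

# First row of head(data)
price    <- 95.70
qty      <- 30
observed <- 2871.00

# Point prediction from the fitted equation
pred_sales <- b0 + b_price * price + b_qty * qty
pred_sales  # about 3746.27

# Residual for this order: observed minus predicted
residual <- observed - pred_sales
residual  # about -875.27
```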

R-squared
In multiple linear regression, the multiple correlation coefficient R is the correlation between the observed values
of the outcome variable (y) and the fitted (i.e., predicted) values of y, and R2 is its square. For a least-squares
fit with an intercept this correlation cannot be negative, so R, and hence R2, ranges from zero to one.

R2 represents the proportion of variance, in the outcome variable y, that may be predicted by knowing the value
of the x variables. An R2 value close to 1 indicates that the model explains a large portion of the variance in the
outcome variable.
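Both readings of R2 can be checked numerically. In this minimal sketch on synthetic data (`x1`, `x2` and the generating coefficients are hypothetical), the proportion of variance explained, 1 - SSE/SST, and the squared correlation between y and the fitted values both coincide with the `r.squared` reported by `summary()` for a least-squares fit with an intercept.

```r
# Hypothetical synthetic data
set.seed(3)
n  <- 150
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + 0.5 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

# R2 as the proportion of variance in y explained by the model
sse    <- sum(residuals(fit)^2)
sst    <- sum((y - mean(y))^2)
r2_var <- 1 - sse / sst

# R2 as the squared correlation between observed and fitted values
r2_cor <- cor(y, fitted(fit))^2

c(summary = summary(fit)$r.squared, variance = r2_var, correlation = r2_cor)
```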

A problem with R2 is that it never decreases when more variables are added to the model, even if
those variables are only weakly associated with the response. A solution is to adjust R2 by taking into
account the number of predictor variables.

The "Adjusted R-squared" value in the summary output applies this correction for the number of x
variables included in the prediction model.
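A standard form of the correction is adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1), where n is the number of observations and k the number of predictors. Here n can be recovered from the first summary as 2819 residual degrees of freedom + 3 predictors + 1 intercept = 2823. Plugging in the rounded R2 values printed above reproduces the reported 0.7326 for both models, which is why dropping QTR_ID costs essentially nothing once the number of predictors is taken into account.

```r
# Adjusted R-squared: penalises R-squared for the number of predictors k
adj_r2 <- function(r2, n, k) {
  1 - (1 - r2) * (n - 1) / (n - k - 1)
}

n <- 2823  # 2819 residual df + 3 predictors + 1 intercept

# Rounded R-squared values from the two summaries above
full    <- adj_r2(0.7329, n, k = 3)  # QUANTITYORDERED + QTR_ID + PRICEEACH
reduced <- adj_r2(0.7328, n, k = 2)  # QUANTITYORDERED + PRICEEACH
round(c(full = full, reduced = reduced), 4)  # both 0.7326
```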
