Regression Models

Serge Nyawa

October 2023
Roadmap

▶ Linear Regression
▶ Logistic Regression
Introductory examples

Given that Digital Transformation has been shown to have remarkable impacts on economic development, the factors that drive Digitalization are of great interest to researchers and policy makers. We are interested in the relationship between the digitalization level, as measured by an index of digitalization, and a set of economic, socio-demographic and institutional factors. A linear regression model is appropriate for this goal.
Introductory examples

In a linear regression model, there is a linear relation between the dependent variable (the variable to be explained) and the independent variables (the explanatory variables):
Digitalization = a0 + a1 GDP + a2 Population + a3 School + a4 Internet + ε

In the previous example, a0, a1, a2, a3 and a4 are parameters to be estimated. ε is a random error term that represents the difference between the linear model and a particular observed value of the outcome variable.
Objectives

▶ Use R to estimate a regression model


▶ Measure the impact of covariates (predictors) on a dependent variable (outcome)
▶ Use a regression model for prediction purposes
Linear Regression
▶ The model

Y = a0 + a1 X1 + a2 X2 + ... + ap Xp + ε
In this model, Y is the outcome variable, X1, ..., Xp are input variables, and ai, i = 0, 1, ..., p, are parameters to be estimated. ε is a random error term, normally distributed with zero mean and constant variance: ε ∼ N(0, σ²).
▶ Ordinary least Squares (OLS) is a common technique to
estimate parameters. The aim is to minimize the sum of
squared residuals:

Σ_{i=1}^{n} [Yi − (a0 + a1 X1i + a2 X2i + ... + ap Xpi)]²
Linear Regression

▶ The solution to this optimization problem is:

(â0, â1, ..., âp)^T = (X^T X)^{−1} X^T Y

where X = (1, X1, ..., Xp) is the design matrix (a hand-computed check of this formula is sketched after this slide).
▶ After estimating the model, it is important to confirm the
goodness of fit of the model and the statistical significance of
the estimated parameters. This includes checking the
R-squared, analysing the pattern of residuals and hypothesis
testing. Statistical significance is checked by an F-test of the
overall fit, followed by t-tests of individual parameters.
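As a quick check of the closed-form solution, here is a minimal R sketch on simulated data (all variable names and values below are illustrative, not from the case study): it computes (X^T X)^{−1} X^T Y by hand and compares it with lm().

# Minimal sketch: OLS estimates by hand versus lm(), on simulated data
set.seed(1)
n <- 100
X1 <- rnorm(n); X2 <- rnorm(n)
Y <- 1 + 2*X1 - 0.5*X2 + rnorm(n)

X <- cbind(1, X1, X2)                      # design matrix with intercept column
a_hat <- solve(t(X) %*% X) %*% t(X) %*% Y  # (X^T X)^{-1} X^T Y

cbind(a_hat, coef(lm(Y ~ X1 + X2)))        # the two columns should match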
Linear regression with R: case study

Given that Digital Transformation has been shown to have remarkable impacts on economic development, the factors that drive Digitalization are of great interest to researchers and policy makers. We are interested in the relationship between the digitalization level, as measured by an index of digitalization, and a set of economic, socio-demographic and institutional factors. A linear regression model is appropriate for this goal.
Linear regression with R: case study

▶ Variables
▶ Digitalization: measures countries’ digital adoption across
three dimensions of the economy: people, government, and
business;
▶ GDP: a monetary measure of the market value of all the final
goods and services produced in a specific time period by
countries;
▶ Population: all residents regardless of legal status or
citizenship;
▶ School: percentage of the population with successful
completion of education at the secondary level;
▶ Internet: number of individuals who have used the Internet
(from any location) in the last 3 months.
Linear regression with R: case study

▶ The regression equation to be estimated:

Digitalization = a0 + a1 GDP + a2 Population + a3 School + a4 Internet + ε


Linear regression with R: a case study

▶ Load packages

library('readxl')
library('lattice')
Linear regression with R: a case study

▶ Data loading

data_regr <- read_excel("C:/Users/[Link]/Documents/Regression_data.xlsx")

data_regr$Population<-log(data_regr$Population)

data_regr$GDP<-log(data_regr$GDP)
Linear regression with R: a case study

▶ A summary of the dataset

summary(data_regr)

## country Digital_Index Population GDP


## Length:125 Min. :0.1599 Min. :11.46 Min. :20.76
## Class :character 1st Qu.:0.4298 1st Qu.:15.23 1st Qu.:23.50
## Mode :character Median :0.6049 Median :16.16 Median :24.98
## Mean :0.5735 Mean :16.20 Mean :25.12
## 3rd Qu.:0.7142 3rd Qu.:17.33 3rd Qu.:26.48
## Max. :0.8706 Max. :21.05 Max. :30.55
## NA’s :3 NA’s :2 NA’s :2
## School Internet
## Min. : 3.506 Min. : 2.40
## 1st Qu.: 20.922 1st Qu.:31.45
## Median : 48.039 Median :60.26
## Mean : 47.057 Mean :56.86
## 3rd Qu.: 67.253 3rd Qu.:79.46
## Max. :131.541 Max. :98.24
## NA’s :2 NA’s :2
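The summary shows a few missing values. lm() drops incomplete rows by default, which is why the regression output below reports deleted observations. A quick count, as a sketch:

sum(!complete.cases(data_regr))   # rows with at least one NA (5 here, matching
                                  # the note in the regression output below)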
Linear regression with R: a case study
▶ Pair-wise relationships of the variables: the scatterplot matrix

splom(~data_regr[c(2:6)], groups=NULL,data=data_regr)

[Scatterplot matrix of Digital_Index, Population, GDP, School and Internet]


Linear regression with R: a case study
▶ Estimation of the model

results <- lm(Digital_Index ~ GDP+Population + School + Internet, data_regr)


summary(results)

##
## Call:
## lm(formula = Digital_Index ~ GDP + Population + School + Internet,
## data = data_regr)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.279285 -0.044079 0.004405 0.042522 0.176924
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.1850669 0.0977502 -1.893 0.06084 .
## GDP 0.0387886 0.0116399 3.332 0.00116 **
## Population -0.0288345 0.0122183 -2.360 0.01996 *
## School 0.0007503 0.0003494 2.148 0.03384 *
## Internet 0.0038918 0.0006219 6.258 6.89e-09 ***
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 0.06919 on 115 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.8684, Adjusted R-squared: 0.8639
## F-statistic: 189.8 on 4 and 115 DF, p-value: < 2.2e-16
Linear regression with R: a case study

▶ Confidence Intervals on the Parameters

confint(results, level = .95)

## 2.5 % 97.5 %
## (Intercept) -3.786912e-01 0.008557346
## GDP 1.573210e-02 0.061845093
## Population -5.303663e-02 -0.004632407
## School 5.830613e-05 0.001442306
## Internet 2.660068e-03 0.005123600
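These intervals are the usual estimate ± t-quantile × standard error. As a sketch, the 95% interval for GDP can be reproduced by hand from the summary output above:

est <- 0.0387886                 # GDP coefficient from summary(results)
se  <- 0.0116399                 # its standard error
df  <- 115                       # residual degrees of freedom
est + c(-1, 1) * qt(0.975, df) * se
# should reproduce the 1.573210e-02 and 6.184509e-02 bounds above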
Linear regression with R: a case study

▶ Prediction: 95% confidence interval of the Digital Adoption Index for Chad
GDP <- 23.05
Population <- 16.49
School <- 7
Internet<-5
prediction_info <- data.frame(GDP, Population, School, Internet)
Linear regression with R: a case study

▶ Prediction: 95% confidence interval on the expected Digital Adoption Index

conf_int_Digital <- predict(results, prediction_info, level = .95, interval = "confidence")


conf_int_Digital

## fit lwr upr


## 1 0.2582403 0.2316145 0.2848661
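Note that interval = "confidence" gives an interval for the expected index. For a single new observation, a prediction interval is wider; a minimal sketch:

# Prediction interval for one new observation (wider than the
# confidence interval on the mean response)
predict(results, prediction_info, level = .95, interval = "prediction")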
Linear regression with R: a case study
▶ Diagnostics
▶ Evaluating the Residuals: centered on zero with a constant
variance
with(results, {plot(fitted.values, residuals, ylim=c(-0.3,0.3))
points(c(min(fitted.values), max(fitted.values)), c(0,0), type="l")})
[Plot of residuals against fitted values, scattered around zero]
Linear regression with R: a case study
▶ Diagnostics
▶ Evaluating the Normality Assumption of residuals

hist(results$residuals, main="Histogram of residuals")

[Histogram of residuals]
Linear regression with R: a case study
▶ Evaluating the Normality Assumption of residuals

qqnorm(results$residuals, ylab="Residuals")
qqline(results$residuals)

[Normal Q–Q plot: Residuals against Theoretical Quantiles]
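As a formal complement to these visual checks, one can also apply a normality test; a minimal sketch using the Shapiro–Wilk test (an addition, not part of the original workflow):

shapiro.test(results$residuals)   # H0: residuals are normally distributed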
Logistic Regression

▶ Dependent variable Y: categorical (binary or multinomial)


▶ One or more predictor variables that may be either continuous
or categorical
▶ The goal is to model the probability of a random variable Y
being 0 or 1 given experimental data
Logistic Regression

▶ Description of the model

Y = 1 if α0 + α1 X1 + α2 X2 + ... + αp Xp + ε > 0, and Y = 0 otherwise,

where ε follows a logistic distribution, whose CDF is F(x) = 1 / (1 + e^{−x}).
▶ Y and (X1, ..., Xp) are observed;
▶ There is no direct functional link between Y and X;
▶ S(X, α) = α0 + α1 X1 + α2 X2 + ... + αp Xp is the score function.
Logistic Regression
▶ It can be shown that

hα(X) ≡ P(Y = 1 | X, α) = 1 / (1 + e^{−S(X,α)})
▶ Also we can check that

log( hα(X) / (1 − hα(X)) ) = α0 + α1 X1 + α2 X2 + ... + αp Xp
▶ The log-odds, i.e. the natural logarithm of the odds of “success”, is a linear function of the values of the predictors;
▶ This latter equation is useful for the interpretation of coefficients;
▶ MLE is used to estimate the model:

(α̂0, α̂1, ..., α̂p) = arg max_α ∏_i hα(xi)^{yi} (1 − hα(xi))^{(1−yi)}
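The logistic CDF is available in R as plogis(). A minimal sketch of the link between score and probability (the score values are illustrative):

s <- c(-3, 0, 3)        # illustrative score values S(X, alpha)
p <- plogis(s)          # 1 / (1 + exp(-s)), a probability in (0, 1)
p
## [1] 0.04742587 0.50000000 0.95257413
log(p / (1 - p))        # the log-odds recovers the scores
## [1] -3  0  3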


Logistic regression with R: Predictive Maintenance

Manufacturers have only begun to capitalize on artificial intelligence capabilities on the factory floor. Today, AI's key roles in manufacturing expand beyond robotics and automation: AI now plays a pivotal role in predictive maintenance. A company has collected data on the machine failures it has encountered and wants to fit a model able to predict machine failure.
Logistic regression with R: Predictive Maintenance

▶ Variable description
▶ Air_temperature: air temperature
▶ Process_temperature: process temperature
▶ Rotational_speed: speed of rotation
▶ Torque: torque values
▶ Tool_wear: tool wear in the process
▶ Machine_failure: binary label indicating whether the machine has failed
Logistic regression with R

▶ Load the dataset and take a first look

Maintenance_data<-read.csv2("C:/Users/[Link]/Documents/Maintenance_data.csv")

summary(Maintenance_data)

## Product.ID Type Air_temperature Process_temperature


## Length:1000 Length:1000 Min. :295.6 Min. :306.1
## Class :character Class :character 1st Qu.:297.6 1st Qu.:308.5
## Mode :character Mode :character Median :298.3 Median :309.0
## Mean :299.0 Mean :309.3
## 3rd Qu.:299.2 3rd Qu.:309.9
## Max. :304.4 Max. :313.7
## Rotational_speed Torque Tool_wear Machine_failure
## Min. :1181 Min. : 3.80 Min. : 0 Min. :0.000
## 1st Qu.:1370 1st Qu.:34.80 1st Qu.: 61 1st Qu.:0.000
## Median :1459 Median :43.50 Median :122 Median :0.000
## Mean :1524 Mean :43.07 Mean :120 Mean :0.339
## 3rd Qu.:1585 3rd Qu.:51.60 3rd Qu.:184 3rd Qu.:1.000
## Max. :2886 Max. :76.60 Max. :253 Max. :1.000
Logistic regression with R

▶ Correcting variable types after import

Maintenance_data$Type<-factor(Maintenance_data$Type)
Maintenance_data$Machine_failure<-factor(Maintenance_data$Machine_failure)
summary(Maintenance_data)

## Product.ID Type Air_temperature Process_temperature


## Length:1000 H: 99 Min. :295.6 Min. :306.1
## Class :character L:633 1st Qu.:297.6 1st Qu.:308.5
## Mode :character M:268 Median :298.3 Median :309.0
## Mean :299.0 Mean :309.3
## 3rd Qu.:299.2 3rd Qu.:309.9
## Max. :304.4 Max. :313.7
## Rotational_speed Torque Tool_wear Machine_failure
## Min. :1181 Min. : 3.80 Min. : 0 0:661
## 1st Qu.:1370 1st Qu.:34.80 1st Qu.: 61 1:339
## Median :1459 Median :43.50 Median :122
## Mean :1524 Mean :43.07 Mean :120
## 3rd Qu.:1585 3rd Qu.:51.60 3rd Qu.:184
## Max. :2886 Max. :76.60 Max. :253
Logistic regression with R

▶ The dataset is divided in two: a training set and a test set


Maintenance_data_training<-Maintenance_data[1:(dim(Maintenance_data)[1]-100),]
Maintenance_data_test<-Maintenance_data[(dim(Maintenance_data)[1]-100+1):dim(Maintenance_data)[1],]
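This split takes the first 900 rows for training and the last 100 for testing. If the rows are not already in random order, a random hold-out is often preferable; a minimal sketch (the seed and the _r names are illustrative):

set.seed(123)                                        # arbitrary seed, for reproducibility
idx <- sample(nrow(Maintenance_data), size = 100)    # hold out 100 random rows
Maintenance_data_training_r <- Maintenance_data[-idx, ]
Maintenance_data_test_r     <- Maintenance_data[idx, ]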
Logistic regression with R
logit_model <- glm (Machine_failure~Type+Air_temperature+Process_temperature+
Rotational_speed+Torque+Tool_wear,
data=Maintenance_data_training,binomial(link="logit"))
summary(logit_model)

##
## Call:
## glm(formula = Machine_failure ~ Type + Air_temperature + Process_temperature +
## Rotational_speed + Torque + Tool_wear, family = binomial(link = "logit"),
## data = Maintenance_data_training)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.2171 -0.2198 -0.0553 0.0362 3.4500
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.550e+02 6.750e+01 -8.222 < 2e-16 ***
## TypeL 1.022e+00 6.251e-01 1.634 0.102
## TypeM 6.482e-01 6.721e-01 0.964 0.335
## Air_temperature 2.100e+00 2.260e-01 9.292 < 2e-16 ***
## Process_temperature -3.905e-01 2.753e-01 -1.419 0.156
## Rotational_speed 1.760e-02 1.996e-03 8.818 < 2e-16 ***
## Torque 3.728e-01 3.609e-02 10.331 < 2e-16 ***
## Tool_wear 2.487e-02 3.217e-03 7.729 1.08e-14 ***
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1161.6 on 899 degrees of freedom
## Residual deviance: 269.3 on 892 degrees of freedom
## AIC: 285.3
##
Logistic regression with R

▶ Interpretation: a one-unit increase in Torque is associated
with an increase of about 0.373 in the log odds of machine
failure; a one-degree increase in Air_temperature increases the
log odds of failure by about 2.100. Process_temperature and
machine Type are not statistically significant at the 5% level.
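Coefficients are also commonly read as odds ratios; a minimal sketch using the fitted logit_model above:

exp(coef(logit_model))    # multiplicative change in the odds of failure
                          # for a one-unit increase in each predictor
# e.g. exp(0.3728) is about 1.45: one extra unit of Torque multiplies
# the odds of machine failure by about 1.45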
Logistic regression with R

▶ Which variable is the most important?

install.packages("caret")

library("caret")

varImp(logit_model)

## Overall
## TypeL 1.6344863
## TypeM 0.9644174
## Air_temperature 9.2923369
## Process_temperature 1.4188504
## Rotational_speed 8.8181380
## Torque 10.3305166
## Tool_wear 7.7290194
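For a glm, caret's varImp() reports the absolute value of each coefficient's z statistic, so it can be checked directly against the summary output; a minimal sketch:

# varImp() for a glm is |z value|; compare with summary(logit_model)
abs(summary(logit_model)$coefficients[-1, "z value"])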
Logistic regression with R

▶ Diagnostics
▶ Pseudo-R²: how well the fitted model explains the data
compared to the default model with no predictor variables and
only an intercept term; values closer to one indicate that the
model has good predictive power

Pseudo-R² = 1 − Residual deviance / Null deviance = 1 − 269.3 / 1161.6 ≈ 0.768
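The same quantity can be computed directly from the fitted object; a minimal sketch:

1 - logit_model$deviance / logit_model$null.deviance   # McFadden's pseudo R-squared
# gives approximately 0.768 for this model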
Logistic regression with R
▶ Diagnostics
▶ Classification Rate: how well does the model predict the
dependent variable on out-of-sample observations?

prediction_test <- predict(logit_model, newdata = Maintenance_data_test,
                           type = "response")

prop.table(table(Maintenance_data_test$Machine_failure,
                 prediction_test > 0.5))

##
## FALSE TRUE
## 0 0.72 0.01
## 1 0.02 0.25

▶ The results show that 72% of the predicted observations are true negatives and
25% are true positives;
▶ The Type II error rate is 2%: in those cases, the model predicts the machine will
not fail but it did;
▶ The Type I error rate is 1%: in those cases, the model predicts the machine will
fail but it never did.
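The overall classification rate is the sum of the diagonal of the table above; a quick check as a sketch:

# Overall out-of-sample accuracy: share of correct predictions
mean((prediction_test > 0.5) == (Maintenance_data_test$Machine_failure == "1"))
# about 0.97 here (0.72 + 0.25 from the table above)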
