D Linear Regression With R

This document discusses simple and multiple linear regression. It presents the relevant R commands for performing linear regression and uses birth weight data to demonstrate key concepts. Simple linear regression is used to model birth weight based on mother's weight. The regression line is plotted and parameters are estimated. Diagnostics like residuals and confidence intervals are also examined.

Uploaded by

Bùi Nguyên Hoàng

Simple and Multiple Linear Regression

A. Simple linear regression with R

This chapter is a brief introduction to simple and multiple linear regression and how to use this
method in a real context. We present the relevant R commands and use a real data set as a
connecting thread as we present the key concepts for this method. We treat the case of qualitative
explanatory variables, as well as interaction of explanatory variables.
We discuss model validation with a study of residuals and mention the issue of collinearity. We
also present a few methods for variable selection.
We return to the data set Birth-weight. We wish to explain the variability of child weight
at birth as a function of characteristics of the mother, of family history and of behaviour during
pregnancy. The explained variable is weight at birth (quantitative variable BWT, expressed in
grammes); the candidate explanatory variables are AGE, LWT, RACE, SMOKE, PTL, HT, UI and FVT
(see the summary output below).
This study focused on risks associated with low weight at birth; the data were collected at the
Baystate Medical Centre, Massachusetts, in 1986. Physicians have been interested in low weight
at birth for several years, because underweight babies have high rates of infant mortality and
infant anomalies. The behaviour of the mother-to-be during pregnancy (diet, smoking habits) can
have a significant impact on the chances of having a full-term pregnancy, and thus of giving
birth to a child of normal weight. The data file includes information on 189 women
(identification number: ID) who came to the centre for consultation. Weight at birth is
categorized as low if the child weighs less than 2,500 g.

Loading the data:

> mydata <- read.csv("wb.csv",header=TRUE,sep = "\t")
> summary(mydata)
       ID             AGE             LWT
 Min.   :  4.0   Min.   :14.00   Min.   : 80.0
 1st Qu.: 68.0   1st Qu.:19.00   1st Qu.:110.0
 Median :123.0   Median :23.00   Median :121.0
 Mean   :121.1   Mean   :23.24   Mean   :129.8
 3rd Qu.:176.0   3rd Qu.:26.00   3rd Qu.:140.0
 Max.   :226.0   Max.   :45.00   Max.   :250.0
      RACE            SMOKE             PTL
 Min.   :1.000   Min.   :0.0000   Min.   :0.0000
 1st Qu.:1.000   1st Qu.:0.0000   1st Qu.:0.0000
 Median :1.000   Median :0.0000   Median :0.0000
 Mean   :1.847   Mean   :0.3915   Mean   :0.1958
 3rd Qu.:3.000   3rd Qu.:1.0000   3rd Qu.:0.0000
 Max.   :3.000   Max.   :1.0000   Max.   :3.0000
       HT                UI              FVT
 Min.   :0.00000   Min.   :0.0000   Min.   :0.0000
 1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.0000
 Median :0.00000   Median :0.0000   Median :0.0000
 Mean   :0.06349   Mean   :0.1481   Mean   :0.7937
 3rd Qu.:0.00000   3rd Qu.:0.0000   3rd Qu.:1.0000
 Max.   :1.00000   Max.   :1.0000   Max.   :6.0000
      BWT            LOW
 Min.   : 709   Min.   :0.0000
 1st Qu.:2414   1st Qu.:0.0000
 Median :2977   Median :0.0000
 Mean   :2945   Mean   :0.3122
 3rd Qu.:3475   3rd Qu.:1.0000
 Max.   :4990   Max.   :1.0000
The weight of the mother is expressed in pounds. We first transform the data.frame to recode this
variable in kilogrammes (1 pound = 0.45359237 kg; the code below truncates this factor to 0.4535923).
> mydata <- transform(mydata,LWT=LWT*0.4535923)
> attach(mydata)
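The same conversion can be sketched on a small made-up data frame. The three rows below are invented for illustration and do not come from wb.csv, and the names toy and LB_TO_KG are hypothetical. The sketch uses the exact conversion factor and passes the data frame through the data argument of lm(), one possible alternative to attach() that avoids masking workspace variables.

```r
# Invented stand-in for the birth-weight file (NOT the real data).
toy <- data.frame(LWT = c(100, 120, 150),      # mother weight in pounds
                  BWT = c(2500, 3000, 3400))   # child weight in grammes

LB_TO_KG <- 0.45359237                         # exact pound-to-kilogramme factor

# Recode mother weight in kilogrammes, as in the text.
toy <- transform(toy, LWT = LWT * LB_TO_KG)

# Fit without attach(): variables are looked up in 'toy'.
toy_fit <- lm(BWT ~ LWT, data = toy)
print(toy$LWT)
print(coef(toy_fit))
```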
1. Graphical Inspection
To study the relationship between child weight at birth and weight of the mother, we first draw
the scatterplot of child weight against mother weight using the instruction plot(BWT~LWT):
> plot(BWT~LWT,xlab="Mother weight",ylab="Child weight at birth")
We observe a slight increase in child weight when mother weight increases, although this
relationship is not very clear.
2. Parameter Estimation
We now study the following model:
$\mathrm{BWT} = \beta_1 + \beta_2\,\mathrm{LWT} + \varepsilon$

> model1<-lm(BWT~LWT)
> model1

Call:
lm(formula = BWT ~ LWT)

Coefficients:
(Intercept) LWT
2369.672 9.765
The above R output gives the least squares estimates $\hat{\beta}_1 = 2369.672$ and $\hat{\beta}_2 = 9.765$.
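For simple linear regression the least squares estimates have a closed form: the slope is the sample covariance divided by the sample variance of the explanatory variable, and the intercept follows from the means. The sketch below checks this against lm() on synthetic data. The simulated values (and the names lwt_sim, bwt_sim) are assumptions made for illustration, not the birth-weight data.

```r
set.seed(1)
n <- 189
lwt_sim <- runif(n, 36, 113)                           # synthetic mother weights (kg)
bwt_sim <- 2370 + 9.8 * lwt_sim + rnorm(n, sd = 720)   # synthetic birth weights (g)

# Closed-form least squares estimates.
b2 <- cov(lwt_sim, bwt_sim) / var(lwt_sim)   # slope
b1 <- mean(bwt_sim) - b2 * mean(lwt_sim)     # intercept

# Same estimates from lm().
fit_sim <- lm(bwt_sim ~ lwt_sim)
print(c(b1, b2))
print(coef(fit_sim))
stopifnot(isTRUE(all.equal(unname(coef(fit_sim)), c(b1, b2))))
```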

We can now draw the regression line on the scatter plot, using the function abline():
> plot(BWT~LWT,xlab="Mother weight",ylab="Child weight")
> abline(model1,col="blue")

3. Tests on Parameters
Note that the function lm() performs a complete analysis of the linear model and that you can get
a summary of the calculations related to the data set with the function summary().

> summary(model1)

Call:
lm(formula = BWT ~ LWT)
Residuals:
Min 1Q Median 3Q Max
-2192.18 -503.63 -3.91 508.25 2075.53

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 2369.672    228.431  10.374   <2e-16 ***
LWT            9.765      3.777   2.586   0.0105 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 718.2 on 187 degrees of freedom

Multiple R-squared: 0.03452, Adjusted R-squared: 0.02935
F-statistic: 6.686 on 1 and 187 DF, p-value: 0.01048

Here is a description of the information in this output.

Call: formula used in the model.
Residuals: descriptive analysis of the residuals $\hat{\varepsilon}_i = y_i - \hat{y}_i$. We shall see that residuals are used to
validate the assumptions of the regression model.
Coefficients: this table has four columns:
- Estimate gives the estimates of the parameters of the regression line.
- Std. Error gives the estimated standard deviation of each parameter estimator.
- t value gives the realization of Student’s test statistic associated with the hypotheses
$H_0: \beta_i = 0$; $H_1: \beta_i \neq 0$.
- Pr(>|t|) gives the p-value of Student’s test.

Signif. codes: codes for significance levels.
Residual standard error: an estimate of the standard deviation of the noise $\varepsilon$, with the associated
degrees of freedom n − 2.

Multiple R-squared: coefficient of determination $r^2$ (proportion of variation explained by the
regression).

Adjusted R-squared: adjusted $r_a^2$ (of limited interest for simple linear regression).

F-statistic: realization of Fisher’s test statistic associated with the hypotheses
$H_0: \beta_2 = 0$; $H_1: \beta_2 \neq 0$. The associated degrees of freedom (1 and n − 2) are given, as is the
p-value.
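To make the entries of this output concrete, the sketch below recomputes several of them by hand and compares them with what summary() stores. It runs on synthetic data (an assumption for illustration, not the birth-weight file). Note that R's residuals are observed minus fitted values, and that in simple regression the F statistic equals the square of the slope's t value.

```r
set.seed(2)
n <- 189
x <- runif(n, 36, 113)                      # synthetic explanatory variable
y <- 2370 + 9.8 * x + rnorm(n, sd = 720)    # synthetic response
fit <- lm(y ~ x)
s <- summary(fit)

res <- y - fitted(fit)                            # residuals: observed minus fitted
sigma_hat <- sqrt(sum(res^2) / (n - 2))           # residual standard error
se_b2 <- sigma_hat / sqrt(sum((x - mean(x))^2))   # standard error of the slope
t_b2 <- coef(fit)[["x"]] / se_b2                  # t value for H0: slope = 0
p_b2 <- 2 * pt(abs(t_b2), df = n - 2, lower.tail = FALSE)   # two-sided p-value
r2 <- 1 - sum(res^2) / sum((y - mean(y))^2)       # coefficient of determination

# Each hand-computed quantity should match the summary() output.
stopifnot(isTRUE(all.equal(sigma_hat, s$sigma)))
stopifnot(isTRUE(all.equal(se_b2, s$coefficients["x", "Std. Error"])))
stopifnot(isTRUE(all.equal(t_b2, s$coefficients["x", "t value"])))
stopifnot(isTRUE(all.equal(p_b2, s$coefficients["x", "Pr(>|t|)"])))
stopifnot(isTRUE(all.equal(r2, s$r.squared)))
stopifnot(isTRUE(all.equal(t_b2^2, s$fstatistic[["value"]])))
```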
To get an estimate by confidence interval of the regression coefficients, we can use the function
confint().
> confint(model1)
                  2.5 %     97.5 %
(Intercept) 1919.039836 2820.30429
LWT            2.314692   17.21502
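These intervals are of the form estimate ± t quantile × standard error, with n − 2 = 187 degrees of freedom. As a rough check, the sketch below rebuilds them from the rounded values printed by summary(model1), so the bounds agree with confint() only up to rounding. The variable names are chosen here for illustration.

```r
# Estimates and standard errors copied from the summary(model1) output.
est <- c(Intercept = 2369.672, LWT = 9.765)
se  <- c(Intercept = 228.431,  LWT = 3.777)

# 97.5 % quantile of Student's t distribution with 187 degrees of freedom.
tq <- qt(0.975, df = 187)

# 95 % confidence intervals: estimate -/+ quantile * standard error.
ci <- cbind(lower = est - tq * se, upper = est + tq * se)
print(ci)
```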
4. Confidence and Prediction Intervals for a New Value
Consider a new observation $x_0$ of variable X for which we have not observed the corresponding
value $y_0$ of the response variable Y. This value $y_0$ is unknown, since it is not observed, and is a
realization of the random variable $Y_0 = \beta_1 + \beta_2 x_0 + \varepsilon_0$.

The predictor of $Y_0$ for the new value $x_0$ is given by $\hat{Y}_0^p = \hat{\beta}_1 + \hat{\beta}_2 x_0$.

We can also propose a prediction interval at level $1-\alpha$ for $Y_0$, by finding two random bounds
such that the random variable falls in the interval with probability $1-\alpha$:

$$\hat{Y}_0^p \pm t_{1-\alpha/2}^{(n-2)}\,\hat{\sigma}\,\sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}}$$

Note that the realization $\hat{y}_0^p = \hat{\beta}_1 + \hat{\beta}_2 x_0$ is called the prediction of the unobserved value
$y_0 = \beta_1 + \beta_2 x_0 + \varepsilon_0$.

Similarly, note that an estimator of the fixed and unknown value $E(Y_0 \mid X = x_0) = \beta_1 + \beta_2 x_0$ is
given by $\hat{E}(Y_0 \mid X = x_0) = \hat{Y}_0 = \hat{\beta}_1 + \hat{\beta}_2 x_0$.

We can also propose a confidence interval at level $1-\alpha$ for $E(Y_0 \mid X = x_0)$:

$$\hat{Y}_0 \pm t_{1-\alpha/2}^{(n-2)}\,\hat{\sigma}\,\sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}}$$

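The two interval formulas of this section differ only by the extra "1 +" under the square root, which accounts for the noise of the new observation. The sketch below evaluates both by hand and compares them with predict(). It uses synthetic data (an assumption for illustration, not the birth-weight file), and names such as lev are introduced here only for the example.

```r
set.seed(3)
n <- 189
x <- runif(n, 36, 113)                      # synthetic explanatory variable
y <- 2370 + 9.8 * x + rnorm(n, sd = 720)    # synthetic response
fit <- lm(y ~ x)

x0 <- 56                                    # new value of the explanatory variable
y0_hat <- sum(coef(fit) * c(1, x0))         # point prediction b1 + b2 * x0
sigma_hat <- summary(fit)$sigma             # estimate of the noise standard deviation
tq <- qt(0.975, df = n - 2)                 # quantile for a 95 % level

# Term shared by both formulas: 1/n + (x0 - mean)^2 / sum of squares.
lev <- 1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2)

pred_int <- y0_hat + c(-1, 1) * tq * sigma_hat * sqrt(1 + lev)  # for Y0
conf_int <- y0_hat + c(-1, 1) * tq * sigma_hat * sqrt(lev)      # for E(Y0 | X = x0)

# Compare with predict().
p_pred <- predict(fit, data.frame(x = x0), interval = "prediction")
p_conf <- predict(fit, data.frame(x = x0), interval = "confidence")
stopifnot(isTRUE(all.equal(unname(p_pred[1, ]), c(y0_hat, pred_int))))
stopifnot(isTRUE(all.equal(unname(p_conf[1, ]), c(y0_hat, conf_int))))
```

The prediction interval is always wider than the confidence interval at the same level, as the outputs of predict() in this section also show.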
The function predict() computes both the prediction interval and the confidence interval for a new
value x0.
Using the Birth-weight data, we calculate the prediction of the weight of a baby whose mother
weighs lwt = 56 kg.
> lwt0 <- 56
> predict(model1,data.frame(LWT=lwt0),interval="prediction")
fit lwr upr
1 2916.504 1495.699 4337.309
For the confidence interval of the mean value of the weight of babies with a mother weighing 56
kg:
> predict(model1,data.frame(LWT=lwt0),interval="confidence")
fit lwr upr
1 2916.504 2811.225 3021.783
We now represent the confidence interval and prediction interval for a series of new values of the
mother’s weight:
> # grid of mother weights spanning the observed range of LWT
> x <- seq(min(LWT),max(LWT),length.out=50)
> predint <- predict(model1,data.frame(LWT=x),interval=
+ "prediction")[,c("lwr","upr")]
> # named confband so that the function confint() is not masked
> confband <- predict(model1,data.frame(LWT=x),interval=
+ "confidence")[,c("lwr","upr")]
> plot(BWT~LWT,xlab="Mother weight",ylab="Child weight")
> abline(model1)
> matlines(x,cbind(confband,predint),lty=c(2,2,3,3),
+ col=c("red","red","blue","blue"),lwd=c(2,2,1,1))
> legend("bottomright",lty=c(2,3),lwd=c(2,1),
+ c("confidence","prediction"),col=c("red","blue"))

B. Multiple Linear Regression

1. Graphical Inspection
With the data from part A, we regress child weight at birth on the mother's age,
weight and smoking status during pregnancy.
Before estimating the model, we draw a scatter plot of all pairs of variables:
> pairs(BWT~LWT+AGE+SMOKE)

2. Parameter Estimation
As for simple linear regression, the model is estimated using function lm():
> model2 <- lm(BWT~AGE+LWT+SMOKE)
> model2

Call:
lm(formula = BWT ~ AGE + LWT + SMOKE)

Coefficients:
(Intercept) AGE LWT SMOKE
2362.720 7.093 8.860 -267.213

3. Tests on Parameters
Tests on parameters are performed by function summary().
> summary(model2)
Call:
lm(formula = BWT ~ AGE + LWT + SMOKE)

Residuals:
Min 1Q Median 3Q Max
-2069.89 -433.18 13.67 516.45 1813.75

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 2362.720    300.687   7.858 3.11e-13 ***
AGE            7.093      9.925   0.715   0.4757
LWT            8.860      3.791   2.337   0.0205 *
SMOKE       -267.213    105.802  -2.526   0.0124 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 708.8 on 185 degrees of freedom

Multiple R-squared: 0.06988, Adjusted R-squared: 0.05479
F-statistic: 4.633 on 3 and 185 DF, p-value: 0.003781

The results output by summary() are presented in the same fashion as for simple linear
regression. Parameter estimates are given in the column Estimate.
The realizations of Student’s test statistics associated with the hypotheses $H_0: \beta_j = 0$; $H_1: \beta_j \neq 0$
(one test per coefficient) are given in column t value; the associated p-values are in column Pr(>|t|). Residual
standard error gives the estimate of $\sigma$ and the number of associated degrees of freedom n − p − 1,
where p is the number of explanatory variables (here p = 3, so 189 − 3 − 1 = 185).
The coefficient of determination $R^2$ (Multiple R-squared) and an adjusted version
(Adjusted R-squared) are given, as are the realization of Fisher’s global test statistic
(F-statistic) and the associated p-value.
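The adjusted R-squared and the global F statistic can both be derived from R-squared, the sample size n = 189 and the number of explanatory variables p = 3. The sketch below applies the standard formulas to the rounded R-squared printed in the output, so the results agree with summary(model2) only up to rounding. The variable names are chosen here for illustration.

```r
# Values copied from the summary(model2) output and the data description.
r2 <- 0.06988   # Multiple R-squared
n  <- 189       # number of observations
p  <- 3         # explanatory variables: AGE, LWT, SMOKE

# Adjusted R-squared penalises the number of explanatory variables.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Global Fisher statistic, on p and n - p - 1 degrees of freedom.
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))
p_value <- pf(f_stat, p, n - p - 1, lower.tail = FALSE)

print(c(adj_r2 = adj_r2, f_stat = f_stat, p_value = p_value))
```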
4. Interpreting Results from Study “Weight at Birth”
Given the result of Fisher’s global test (p-value = 0.003781), we can conclude that at least one of
the explanatory variables is associated with child weight at birth, after adjusting for the other
variables. The individual Student tests indicate that:
• Mother weight is linearly associated with child weight, after adjusting for age and
smoking status, with risk of error less than 5 % (p-value = 0.0205). At the same age and
smoking status, an increase of 1 kg in mother weight corresponds to an increase of 8.860
g in average child weight at birth.
• The age of the mother is not significantly linearly associated with child weight at birth
when mother weight and smoking status are already taken into account (p-value =
0.4757).
• Weight at birth is significantly lower for a child born to a mother who smokes, compared
to children born to non-smoking mothers of the same age and weight, with a risk of error less
than 5 % (p-value = 0.0124). At the same age and mother weight, average child weight at birth is
267.213 g lower for a mother who smokes than for a non-smoking mother.
5. Confidence and Prediction Intervals for a New Value
Suppose we wish to predict the weight at birth of a child whose mother is 23 years old, weighs
57 kg and smokes. The function predict() gives a prediction, a prediction interval and a
confidence interval for the mean weight of children whose mothers have these characteristics.
> newdata <- data.frame(AGE=23,LWT=57,SMOKE=1)
> predict(model2,newdata,interval="pred")
fit lwr upr
1 2763.693 1355.943 4171.444
> predict(model2,newdata,interval="conf")
fit lwr upr
1 2763.693 2600.914 2926.472
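As a rough check, the fitted value can be reproduced by hand from the rounded coefficients printed for model2 (AGE = 23, LWT = 57, SMOKE = 1). The small gap from 2763.693 comes from rounding of the coefficients.

$$\hat{y}_0 = 2362.720 + 7.093 \times 23 + 8.860 \times 57 - 267.213 \times 1 \approx 2763.67$$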
6. Testing a Linear Sub-hypothesis: Partial Fisher Test
Fisher’s partial test is used to test the contribution of a subset of explanatory variables in a model
which already includes other explanatory variables. For example, consider the following two
models:
Model 1: $\mathrm{BWT} = \beta_1 + \beta_2\,\mathrm{LWT} + \varepsilon$

Model 2: $\mathrm{BWT} = \beta_1 + \beta_2\,\mathrm{LWT} + \beta_3\,\mathrm{AGE} + \beta_4\,\mathrm{SMOKE} + \varepsilon$

Fisher’s test is used to test the joint contribution of variables AGE and SMOKE in model 2. The
hypotheses of the test are $H_0: \beta_3 = \beta_4 = 0$ and $H_1$: at least one of the coefficients $\beta_3$ or $\beta_4$ is
non-zero. The following instructions are used for this test:
> anova(model1,model2)
Analysis of Variance Table

Model 1: BWT ~ LWT

Model 2: BWT ~ AGE + LWT + SMOKE
  Res.Df      RSS Df Sum of Sq      F  Pr(>F)
1    187 96468171
2    185 92935223  2   3532949 3.5164 0.03171 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The p-value of the test (Pr(>F)=0.03171) indicates that at least one of the two variables
AGE or SMOKE gives extra information to predict child weight at birth, when mother weight has
already been taken into account.
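The partial Fisher statistic compares the residual sums of squares (RSS) of the two nested models: the drop in RSS per added parameter is divided by the residual variance estimate of the larger model. The sketch below recomputes F and its p-value from the numbers printed in the anova table. The variable names are chosen here for illustration.

```r
# Values copied from the anova(model1, model2) table.
rss1 <- 96468171; df1 <- 187   # Model 1: BWT ~ LWT
rss2 <- 92935223; df2 <- 185   # Model 2: BWT ~ AGE + LWT + SMOKE

q <- df1 - df2                 # number of coefficients tested (AGE and SMOKE)

# Partial F statistic: RSS reduction per tested coefficient,
# relative to the residual mean square of the larger model.
f_partial <- ((rss1 - rss2) / q) / (rss2 / df2)
p_partial <- pf(f_partial, q, df2, lower.tail = FALSE)

print(c(F = f_partial, p = p_partial))
```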
