
Chapter 1 Simple Linear Regression

(Part 2)

1 Software R and regression analysis

R can be downloaded from http://www.r-project.org/. Some useful commands:

• setwd(’path..’) ... to change the directory for data loading and saving

• read.table .... for reading/loading data

• data$variable .... variable in the data

• plot(X, Y) ... plotting Y against X (starting a new plot);

• lines(X, Y)... to add lines on an existing plot.

• object = lm(y ~ x) ... to call “lm” to estimate a model and store the calculation
results in “object”

• Exporting the plotted figure (save as .pdf, .ps or other files)

Example 1.1 Suppose we have 10 observations for (X, Y ): (1.2, 1.91), (2.3, 4.50), (3.5,
2.13), (4.9, 5.77), (5.9, 7.40), (7.1, 6.56), (8.3, 8.79), (9.2, 6.56), (10.5, 11.14), (11.5, 9.88).
They are stored in the file data010201.dat. We hope to fit the linear regression model

Yi = β0 + β1 Xi + εi , i = 1, ..., n

R code (the words after # are comments only):

mydata = read.table(’data010201.dat’) # read the data from the file

X = mydata$V1 # select X
Y = mydata$V2 # select Y
plot(X, Y) # plot the observations (data)
myreg = lm(Y ~ X) # do the linear regression
summary(myreg) # output the estimation

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.3931     0.9726   1.432 0.189932
X             0.7874     0.1343   5.862 0.000378 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.406 on 8 degrees of freedom


Multiple R-squared: 0.8111, Adjusted R-squared: 0.7875
F-statistic: 34.36 on 1 and 8 DF, p-value: 0.0003778

lines(X, myreg$fitted) # plot the fitted


title("Scatter of (X,Y) and fitted linear regression model") # add title
# Please get to know how to make a figure file for latter use
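
For example, a minimal sketch of saving the figure as a PDF file (the file name fig010201.pdf is only an illustration):

pdf("fig010201.pdf") # open a PDF graphics device (file name is arbitrary)
plot(X, Y) # redraw the scatter plot
lines(X, myreg$fitted) # add the fitted regression line
title("Scatter of (X,Y) and fitted linear regression model")
dev.off() # close the device and write the file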

Figure 1: Scatter of (X, Y) and the fitted linear regression line

The fitted regression line/model is

Ŷ = 1.3931 + 0.7874X

For any new subject/individual with X = X∗, the prediction of E(Y) is

Ŷ = b0 + b1 X∗.

For the above data,

• If X∗ = −3, then we predict Ŷ = −0.9690

• If X∗ = 3, then we predict Ŷ = 3.7553

• If X∗ = 0.5, then we predict Ŷ = 1.7868
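
These predictions can also be computed in R from the fitted object; a minimal sketch (the name newX is arbitrary, and the column must be called X to match the regressor):

newX = data.frame(X = c(-3, 3, 0.5)) # the new X values
predict(myreg, newdata = newX) # predicted E(Y); reproduces the values above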

2 Properties of Least squares estimators

Statistical properties in theory

• LSE is unbiased: E{b1 } = β1 , E{b0 } = β0 .

Proof: By the model, we have

Ȳ = β0 + β1 X̄ + ε̄

and
b1 = Σ_{i=1}^n (Xi − X̄)(Yi − Ȳ) / Σ_{i=1}^n (Xi − X̄)²

   = Σ_{i=1}^n (Xi − X̄)(β0 + β1 Xi + εi − β0 − β1 X̄ − ε̄) / Σ_{i=1}^n (Xi − X̄)²

   = β1 + Σ_{i=1}^n (Xi − X̄)(εi − ε̄) / Σ_{i=1}^n (Xi − X̄)²

   = β1 + Σ_{i=1}^n (Xi − X̄)εi / Σ_{i=1}^n (Xi − X̄)²

Recall that Eεi = 0. It follows that

Eb1 = β1 .

For b0 ,

E(b0 ) = E(Ȳ − b1 X̄) = β0 + β1 X̄ − E(b1 )X̄ = β0 + β1 X̄ − β1 X̄

= β0

• Variance of the estimators

Var(b1) = σ² / Σ_{i=1}^n (Xi − X̄)²,    Var(b0) = σ² [ 1/n + X̄² / Σ_{i=1}^n (Xi − X̄)² ]

[Proof:

Var(b1) = Var( Σ_{i=1}^n (Xi − X̄)εi / Σ_{i=1}^n (Xi − X̄)² )

        = { Σ_{i=1}^n (Xi − X̄)² }⁻² Var{ Σ_{i=1}^n (Xi − X̄)εi }

        = { Σ_{i=1}^n (Xi − X̄)² }⁻² Σ_{i=1}^n (Xi − X̄)² σ²

        = σ² / Σ_{i=1}^n (Xi − X̄)².

We shall prove the second equation later.]

• Estimated (fitted) regression function Ŷi = b0 + b1 Xi ; Ŷi is also called the fitted value. It satisfies

E{Ŷi} = EYi

[Proof:
E(Ŷi) = E(b0 + b1 Xi) = E(b0) + E(b1)Xi = β0 + β1 Xi = EYi ]
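
Unbiasedness can also be illustrated by simulation: generate many samples from a model with known coefficients and average the slope estimates. A minimal sketch, where the true values β0 = 1, β1 = 2, σ = 1 and the X grid are arbitrary choices:

set.seed(1) # for reproducibility
beta0 = 1; beta1 = 2; sigma = 1 # true parameter values (chosen arbitrarily)
x = seq(1, 10, length.out = 20) # fixed design points
b1s = replicate(5000, { # repeat the experiment 5000 times
  y = beta0 + beta1 * x + rnorm(length(x), 0, sigma)
  coef(lm(y ~ x))[2] # the slope estimate b1
})
mean(b1s) # close to beta1 = 2, illustrating E(b1) = beta1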

Numerical properties of fitted regression line


Recall the normal equations

−2 Σ_{i=1}^n (Yi − b0 − b1 Xi) = 0

−2 Σ_{i=1}^n Xi (Yi − b0 − b1 Xi) = 0

and ei = Yi − Ŷi = Yi − b0 − b1 Xi . It follows that

Σ_{i=1}^n ei = 0

Σ_{i=1}^n Xi ei = 0

The following properties follow (a numerical check in R is sketched after this list):

• Σ_{i=1}^n ei = 0

• Σ_{i=1}^n Yi = Σ_{i=1}^n Ŷi

• Σ_{i=1}^n ei² = min_{b0, b1} {Q}

• Σ_{i=1}^n Xi ei = 0

• Σ_{i=1}^n Ŷi ei = 0

• The regression line always passes through (X̄, Ȳ)

• Yi − Ȳ = β1 (Xi − X̄) + ε̃i , where ε̃i = εi − ε̄.

• The slope and the correlation coefficient satisfy

b1 = rX,Y (sY / sX)

where

sX = sqrt( Σ_{i=1}^n (Xi − X̄)² / (n − 1) ),   sY = sqrt( Σ_{i=1}^n (Yi − Ȳ)² / (n − 1) ),

rX,Y = Σ_{i=1}^n (Xi − X̄)(Yi − Ȳ) / sqrt( Σ_{i=1}^n (Xi − X̄)² Σ_{i=1}^n (Yi − Ȳ)² )
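
These properties are easy to check numerically for the fitted model above; a minimal sketch (small deviations from zero are rounding error):

e = myreg$residuals # fitted residuals
sum(e) # approximately 0
sum(X * e) # approximately 0
sum(myreg$fitted * e) # approximately 0
cor(X, Y) * sd(Y) / sd(X) # equals the slope estimate b1 = 0.7874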

2.1 Estimation of Error Terms Variance σ²

• Sum of squares of residuals or error sum of squares (SSE)

SSE = Σ_{i=1}^n (Yi − Ŷi)² = Σ_{i=1}^n ei²

• Estimate σ² by

s² = Σ_{i=1}^n (Yi − Ŷi)² / (n − 2) = Σ_{i=1}^n ei² / (n − 2)

called the mean squared error (MSE), i.e.

MSE = Σ_{i=1}^n ei² / (n − 2)

or denoted by σ̂².

Why is it divided by n − 2? Because there are TWO constraints on the ei , i = 1, ..., n, namely the two normal equations.

• s² is an unbiased estimator of σ², i.e. E(s²) = σ²

[Proof: For any ξ1 , ..., ξn IID with mean μ and variance σ², we have

E Σ_{i=1}^n (ξi − ξ̄)² = E Σ_{i=1}^n [(ξi − μ) − (ξ̄ − μ)]²

                      = E{ Σ_{i=1}^n (ξi − μ)² − n(ξ̄ − μ)² }

                      = Σ_{i=1}^n Var(ξi) − n Var(ξ̄)

                      = nσ² − σ²

                      = (n − 1)σ²

This is why we estimate σ² by

σ̂² = Σ_{i=1}^n (ξi − ξ̄)² / (n − 1).
Consider

Var(ξ1 − ξ̄) = Var{ (1 − 1/n)ξ1 − (1/n)ξ2 − ... − (1/n)ξn }

            = (1 − 1/n)²σ² + (1/n²)σ² + ... + (1/n²)σ²    (n − 1 terms of (1/n²)σ²)

            = (1 − 2/n + 1/n²)σ² + ((n − 1)/n²)σ²

            = (1 − 1/n)σ².

Similarly, for any i,

Var(ξi − ξ̄) = (1 − 1/n)σ².

Now turn to the estimator s². Since Yi − Ŷi = Yi − Ȳ − b1 (Xi − X̄) and E(Yi − Ŷi) = 0, consider

E{ Σ_{i=1}^n (Yi − Ŷi)² } = Σ_{i=1}^n E(Yi − Ŷi)² = Σ_{i=1}^n [ Var(Yi − Ŷi) + {E(Yi − Ŷi)}² ]

= Σ_{i=1}^n Var{ Yi − Ȳ − b1 (Xi − X̄) }

= Σ_{i=1}^n { Var(Yi − Ȳ) − 2 Cov(Yi − Ȳ, b1 (Xi − X̄)) + Var(b1)(Xi − X̄)² }

= Σ_{i=1}^n { Var(Yi − Ȳ) − 2 Cov((Yi − Ȳ)(Xi − X̄), b1) + Var(b1)(Xi − X̄)² }

= Σ_{i=1}^n { Var(εi − ε̄) − 2 Cov((Yi − Ȳ)(Xi − X̄), b1) + Var(b1)(Xi − X̄)² }

= (n − 1)σ² − 2 Cov( Σ_{i=1}^n (Yi − Ȳ)(Xi − X̄), b1 ) + Var(b1) Σ_{i=1}^n (Xi − X̄)²

= (n − 1)σ² − 2 Cov( b1 Σ_{i=1}^n (Xi − X̄)², b1 ) + Var(b1) Σ_{i=1}^n (Xi − X̄)²

= (n − 1)σ² − Var(b1) Σ_{i=1}^n (Xi − X̄)² = (n − 2)σ².

Thus

E(s²) = σ²

Example For the above example, the MSE (the estimator of σ² = Var(εi)) is

MSE = Σ_{i=1}^n ei² / (n − 2) = 1.975997,

or

σ̂ = √MSE = 1.405702,

which is also called the Residual standard error.


How do we find this value in the output of R?
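
A minimal sketch of recovering these quantities from the fitted object (summary(myreg)$sigma is the “Residual standard error”):

sum(myreg$residuals^2) / (10 - 2) # MSE = 1.975997
summary(myreg)$sigma # residual standard error = 1.405702
summary(myreg)$sigma^2 # equals the MSE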

3 Regression Without Predictors

At first glance, it doesn’t seem that studying regression without predictors would be very
useful. Certainly, we are not suggesting that using regression without predictors is a major
data analysis tool. We do think that it is worthwhile to look at regression models without
predictors to see what they can tell us about the nature of the constant. Understanding the
regression constant in these simpler models will help us to understand both the constant
and the other regression coefficients in later more complex models.
Model
Yi = β0 + εi , i = 1, 2, ..., n.

where as before, we assume

εi , i = 1, 2, ..., n are IID with E(εi ) = 0 and V ar(εi ) = σ 2

(We shall call this model Regression Without Predictors)


The least squares estimator b0 is the minimizer of

Q = Σ_{i=1}^n {Yi − b0}²

Note that

dQ/db0 = −2 Σ_{i=1}^n {Yi − b0}

Letting it equal 0, we have the normal equation

Σ_{i=1}^n {Yi − b0} = 0,

which leads to the (ordinary) least squares estimator

b0 = Ȳ.

The fitted model is

Ŷi = b0 .

The fitted residuals are

ei = Yi − Ŷi = Yi − Ȳ

• Can you prove the estimator is unbiased, i.e. Eb0 = β0 ?

• How do we estimate σ²?

σ̂² = (1/(n − 1)) Σ_{i=1}^n ei²

Why is it divided by n − 1?
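
In R, this regression without predictors can be fitted by regressing Y on the constant 1; a minimal sketch using the data above (the object name myreg0 is arbitrary):

myreg0 = lm(Y ~ 1) # regression without predictors (intercept only)
coef(myreg0) # b0, which equals mean(Y)
mean(Y)
summary(myreg0)$sigma^2 # equals sum(myreg0$residuals^2)/(n - 1)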

4 Inference in regression

Next, we consider the simple linear regression model

Y1 = β0 + β1 X1 + ε1

Y2 = β0 + β1 X2 + ε2
...                                                        (1)

Yn = β0 + β1 Xn + εn

under assumptions of normal random errors.

• Xi is known, observed, and nonrandom

• ε1 , ..., εn are independent N(0, σ²); thus Yi is random

• β0 , β1 and σ 2 are parameters.

By the assumption, we have

E(Yi ) = β0 + β1 Xi

and
V ar(Yi ) = σ 2

4.1 Inference of β1

We need to check whether β1 = 0 (or equals any other specified value, say −1.5). Why?

• To check whether X and Y have a linear relationship

• To see whether the model can be simplified (if β1 = 0, the model becomes Yi = β0 + εi ,
a regression model without predictors). For example, Hypotheses H0 : β1 = 0 v.s.
Ha : β1 ≠ 0

Sample distribution of b1 . Recall

b1 = Σ_{i=1}^n (Xi − X̄)(Yi − Ȳ) / Σ_{i=1}^n (Xi − X̄)²

Theorem 4.1 For model (1) with the normal assumption on εi ,

b1 ∼ N( β1 , σ² / Σ_{i=1}^n (Xi − X̄)² )

Proof Recall the fact that any linear combination of independent normally distributed random variables is still normal. To find the distribution of b1 , we only need to find its mean and variance. Since the Yi are normal and independent, b1 is normal, and

Eb1 = β1

and (we have proved that)

Var(b1) = σ² / Σ_{i=1}^n (Xi − X̄)²

The theorem follows.

Question: what is the distribution of b1 /√Var(b1) under H0 ? Can we use this Theorem
to test the hypothesis H0 ? Why?
Estimated Variance of b1 (estimating σ² by MSE):

s²(b1) = MSE / Σ_{i=1}^n (Xi − X̄)² = [ Σ_{i=1}^n ei² / (n − 2) ] / Σ_{i=1}^n (Xi − X̄)²

s(b1) is the Standard Error (or S.E.) of b1 (also called the standard deviation).
Sample distribution of (b1 − β1)/s(b1):

(b1 − β1)/s(b1) follows t(n − 2) for model (1)

Confidence interval for β1 . Let t(1 − α/2, n − 2) (also written t1−α/2 (n − 2)) denote the (1 − α/2)-quantile of t(n − 2). Then

P( t(α/2, n − 2) ≤ (b1 − β1)/s(b1) ≤ t(1 − α/2, n − 2) ) = 1 − α

By symmetry of the distribution, we have

t(1 − α/2, n − 2) = −t(α/2, n − 2)

Thus, with confidence 1 − α, we have

−t(1 − α/2, n − 2) ≤ (b1 − β1 )/s(b1 ) ≤ t(1 − α/2, n − 2)

i.e.
b1 − t(1 − α/2, n − 2) ∗ s(b1 ) ≤ β1 ≤ b1 + t(1 − α/2, n − 2) ∗ s(b1 )

Example 4.2 For the example above, find the 95% confidence interval for β1 .
Solution: Since n = 10, we have t(1 − 0.05/2, 8) = 2.306; the S.E. for b1 is s(b1) = 0.1343.
Thus the confidence interval is

b1 ± t(1 − 0.05/2, 8) ∗ s(b1 ) = 0.7874 ± 2.306 ∗ 0.1343 = [0.4777, 1.0971]
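
The same interval can be computed in R; a minimal sketch:

qt(1 - 0.05/2, df = 8) # t(0.975, 8) = 2.306
0.7874 + c(-1, 1) * 2.306 * 0.1343 # interval from the summary output
confint(myreg, "X", level = 0.95) # interval computed directly by R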

Test of β1

• Two-sided Test: to check whether β1 is 0

H0 : β1 = 0, Ha : β1 ≠ 0

Under H0 , we have the random variable

t = b1 / s(b1) ∼ t(n − 2)

Suppose the significance level is α (usually, 0.05, 0.01). Calculate t, say t∗

– If |t∗ | ≤ t(1 − α/2; n − 2), then accept H0 .

– If |t∗ | > t(1 − α/2; n − 2), then reject H0 .

The test can also be done based on the p-value, defined as p = P(|T| > |t∗|), where T ∼ t(n − 2). It is
easy to see that

p-value < α ⇐⇒ |t∗| > t(1 − α/2; n − 2)

Thus

– If p-value ≥ α, then accept H0 .

– If p-value < α, then reject H0 .

• One-sided test: for example to check whether β1 is positive (or negative)

H0 : β1 ≥ 0, Ha : β1 < 0

Under H0 , we have

t = b1 /s(b1) = (b1 − β1)/s(b1) + β1 /s(b1) ∼ t(n − 2) + a nonnegative term
Suppose the significance level is α (usually, 0.05, 0.01). Calculate t, say t∗

– If t∗ ≥ t(α; n − 2), then accept H0 .

– If t∗ < t(α; n − 2), then reject H0 .
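
For the example above, a minimal sketch of computing the two-sided and one-sided p-values for the slope with pt():

tstar = 0.7874 / 0.1343 # t statistic for H0: beta1 = 0
2 * (1 - pt(abs(tstar), df = 8)) # two-sided p-value, about 0.000378
pt(tstar, df = 8) # one-sided p-value for Ha: beta1 < 0 (large here, so H0 is kept)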

4.2 Inference about β0

Sample distribution of b0 :

b0 = Ȳ − b1 X̄

Theorem 4.3 For model (1) with the normal assumption on εi ,

b0 ∼ N( β0 , σ² [ 1/n + X̄² / Σ_{i=1}^n (Xi − X̄)² ] )

[Proof The expectation is

Eb0 = E{Ȳ} − E(b1)X̄ = (β0 + β1 X̄) − β1 X̄ = β0

Let ki = (Xi − X̄) / Σ_{j=1}^n (Xj − X̄)². Then (see the proof at the beginning of this part)

b1 = β1 + Σ_{i=1}^n ki εi .

Thus

b0 = Ȳ − b1 X̄ = β0 + (1/n) Σ_{i=1}^n εi − X̄ Σ_{i=1}^n ki εi = β0 + Σ_{i=1}^n [1/n − ki X̄] εi

The variance is

Var(b0) = Σ_{i=1}^n [1/n − ki X̄]² σ² = [ 1/n + X̄² / Σ_{i=1}^n (Xi − X̄)² ] σ²

(using Σ_{i=1}^n ki = 0 and Σ_{i=1}^n ki² = 1 / Σ_{i=1}^n (Xi − X̄)²).

Therefore the Theorem follows.]
Estimated Variance of b0 (by replacing σ² with MSE):

s²(b0) = MSE [ 1/n + X̄² / Σ_{i=1}^n (Xi − X̄)² ]

s(b0) is the Standard Error (or S.E.) of b0 (also called the standard deviation).
Sample distribution of (b0 − β0)/s(b0):

(b0 − β0)/s(b0) follows t(n − 2) for model (1)

Confidence interval for β0 : with confidence 1 − α, we have the confidence interval

b0 − t(1 − α/2, n − 2) ∗ s(b0) ≤ β0 ≤ b0 + t(1 − α/2, n − 2) ∗ s(b0)
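
A minimal sketch of computing s(b0) and this interval in R for the example above:

MSE = sum(myreg$residuals^2) / (10 - 2)
sb0 = sqrt(MSE * (1/10 + mean(X)^2 / sum((X - mean(X))^2))) # s(b0) = 0.9726
1.3931 + c(-1, 1) * qt(0.975, df = 8) * sb0 # confidence interval for beta0
confint(myreg, "(Intercept)", level = 0.95) # the same interval from R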

Test of β0

• Two-sided Test: to check whether β0 is 0

H0 : β0 = 0, Ha : β0 ≠ 0

Under H0 , we have

t = b0 / s(b0) ∼ t(n − 2)

Suppose the significance level is α (usually, 0.05, 0.01). Calculate t, say t∗ :

– If |t∗ | ≤ t(1 − α/2; n − 2), then accept H0 .

– If |t∗ | > t(1 − α/2; n − 2), then reject H0 .

Similarly, the test can also be done based on the p-value, defined as p = P(|T| > |t∗|), where T ∼ t(n − 2).
It is easy to see that

p-value < α ⇐⇒ |t∗ | > t(1 − α/2; n − 2)

Thus

– If p-value ≥ α, then accept H0 .

– If p-value < α, then reject H0 .

• One-sided test: to check whether β0 is positive (or negative)

H0 : β0 ≤ 0, Ha : β0 > 0

Example 4.4 For the example above, with significance level 0.05,

1. Test H0 : β0 = 0 versus H1 : β0 ≠ 0

2. Test H0 : β1 = 0 versus H1 : β1 ≠ 0

3. Test H0 : β0 ≥ 0 versus H1 : β0 < 0

Answer:

1. Since n = 10, t(0.975, 8) = 2.306 and |t∗| = 1.432 < 2.306. Thus, we accept H0

(another approach: p-value = 0.1899 > 0.05, we accept H0 )

2. The t-value is |t∗| = 5.862 > 2.306, thus we reject H0 , i.e. β1 is significantly different
from 0.

(another approach: p-value = 0.000378 < 0.05, we reject H0 )

3. t(0.05, 8) = −1.860; since t∗ = 1.432 > −1.860, we accept H0

How do we find these tests in the output of the R code?
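
A minimal sketch of pulling the relevant estimates, standard errors, t-values and two-sided p-values straight from the coefficient table, together with the critical values used above:

summary(myreg)$coefficients # Estimate, Std. Error, t value, Pr(>|t|) for (Intercept) and X
qt(0.975, df = 8) # two-sided critical value 2.306
qt(0.05, df = 8) # one-sided critical value -1.860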
