
Chapter 2.

Simple linear regression

1. Definition of simple linear regression

1.1. Standard model


Y = β0 + β1 X + ε
• Y dependent variable
• X independent variable
• ε random error term
• Parameters
o β0 intercept
o β1 slope

(a) Data setting


yi = β0 + β1 xi + εi, i =1, 2, …, n

• β0, β1
o constant parameters
• xi
o nonrandom, observed with negligible error
• εi
o Random
o Zero mean, E(εi) = 0
o Constant variance, Var(εi) = σ²
o Uncorrelated, Cov(εi, εj) = 0, for i ≠ j
o Usually assumed independent identically distributed (i.i.d.)
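As a quick illustration (not part of the original notes), the data-generating process above can be simulated in Python; all numerical values below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

beta0, beta1, sigma = 10.0, 2.0, 3.0    # assumed true parameters (illustrative only)
x = np.linspace(20, 80, 30)             # nonrandom design points, observed without error
eps = rng.normal(0.0, sigma, size=x.size)   # i.i.d. errors with mean 0, variance sigma^2
y = beta0 + beta1 * x + eps             # y_i = beta0 + beta1 * x_i + eps_i
```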

(b) Properties
(i) Dependent variable yi
• yi is a random variable
• E(yi) = β0 + β1 xi
• E(Y|X) = β0 + β1 X
o Mean of Y conditional on X
• Var(yi) = σ²
o Independent of xi
• Cov(yi, yj) = 0, for i ≠ j

(ii) Regression parameters


• β0 and β1 are called regression coefficients
o Depend on the units used for Y and X
• β1 is the change in E(Y) per unit increase of X
• The interpretation of β0 depends on the range of X in the data
o When X = 0 is included, β0 = E(Y|X=0)
o When X = 0 is far away from the data, β0 has no particular meaning
o Usually just call β0 the intercept

1.2. Alternative model


• Dummy variable
yi = β0 x0 + β1 xi + εi
o x0 ≡ 1 for all i

• Centered linear regression model
yi = β0* + β1 (xi − x̄) + εi
o β0* = ?, express it in terms of the components of the base model.

2. Estimation

2.1. Least squares estimates for β 0 and β1


• The linear regression model assumes the relation between X and Y to be linear.
o What would be the exact relation between X and Y?
• β0 and β1 are the true but unknown parameters under the linear relationship.
o What are the actual values of the parameters?

(a) Definitions
• b0 = β̂0 and b1 = β̂1 are estimators of β0 and β1
• Fitted value
ŷi = b0 + b1 xi
• Residual
ei = yi − ŷi
• Fitted regression line
ŷ = b0 + b1 x
o Is it the true relation once the realizations of b0 and b1 are obtained?

(b) Least squares method


• Estimate β0 and β1 so that the fitted regression line (based on the estimated parameters) lies
“closest” to the data
• The parameter estimates are obtained by minimizing the residual sum of squares
SSE = ∑_{i=1}^{n} (yi − ŷi)² = ∑_{i=1}^{n} (yi − f(xi | b0, b1))²
o In the simple linear regression model, f(x | b0, b1) = b0 + b1 x


o The estimators are called the least squares estimators
o SSE corresponds to the squared vertical distances from the observed values to the fitted
line
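A minimal sketch of this objective function (the name sse and the array interface are ours, not from the notes):

```python
import numpy as np

def sse(b0, b1, x, y):
    """Residual sum of squares for the candidate fit y ≈ b0 + b1*x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    residuals = y - (b0 + b1 * x)   # observed minus fitted values
    return np.sum(residuals ** 2)
```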

• For the simple linear regression model, the LSE b0 and b1 satisfy
∂/∂b0 [∑_{i=1}^{n} (yi − b0 − b1 xi)²] = 0
∂/∂b1 [∑_{i=1}^{n} (yi − b0 − b1 xi)²] = 0
i.e.
−∑_{i=1}^{n} 2(yi − b0 − b1 xi) = 0
−∑_{i=1}^{n} 2 xi (yi − b0 − b1 xi) = 0
which give the normal equations
∑_{i=1}^{n} yi = n b0 + b1 ∑_{i=1}^{n} xi
∑_{i=1}^{n} xi yi = b0 ∑_{i=1}^{n} xi + b1 ∑_{i=1}^{n} xi²
o From the 1st equation
b0 = (1/n)(∑_{i=1}^{n} yi − b1 ∑_{i=1}^{n} xi) = ȳ − b1 x̄
o Substitute b0 into the 2nd equation
∑_{i=1}^{n} xi yi = ∑_{i=1}^{n} (ȳ − b1 x̄) xi + b1 ∑_{i=1}^{n} xi²
∑_{i=1}^{n} xi (yi − ȳ) = b1 ∑_{i=1}^{n} xi (xi − x̄)
∑_{i=1}^{n} (xi − x̄)(yi − ȳ) + x̄ ∑_{i=1}^{n} (yi − ȳ) = b1 [∑_{i=1}^{n} (xi − x̄)(xi − x̄) + x̄ ∑_{i=1}^{n} (xi − x̄)]
o Since ∑(yi − ȳ) = 0 and ∑(xi − x̄) = 0, this reduces to
SXY = b1 SXX
b1 = SXY / SXX

• The least squares estimators b0 and b1 are linear estimators as they are linear combinations of yi
b1 = ∑_{i=1}^{n} (xi − x̄)(yi − ȳ) / SXX = [∑_{i=1}^{n} (xi − x̄) yi − ȳ ∑_{i=1}^{n} (xi − x̄)] / SXX = ∑_{i=1}^{n} (xi − x̄) yi / SXX = ∑_{i=1}^{n} ki yi
o where ki = (xi − x̄) / SXX is independent of yi
b0 = (1/n)(∑_{i=1}^{n} yi − b1 ∑_{i=1}^{n} xi) = (1/n) ∑_{i=1}^{n} yi − x̄ ∑_{i=1}^{n} ki yi = ∑_{i=1}^{n} (1/n − ki x̄) yi
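As an illustration (the function name and interface are ours, not from the notes), the closed-form estimates can be written as a short NumPy sketch:

```python
import numpy as np

def least_squares_fit(x, y):
    """Return the least squares estimates (b0, b1) for simple linear regression."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    s_xx = np.sum((x - x.mean()) ** 2)              # SXX
    s_xy = np.sum((x - x.mean()) * (y - y.mean()))  # SXY
    b1 = s_xy / s_xx                                # slope
    b0 = y.mean() - b1 * x.mean()                   # intercept
    return b0, b1
```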

(c) Distribution of the least squares estimators when the linear model is true
• b0 and b1 are unbiased estimators for β0 and β1
E(b1) = E(∑_{i=1}^{n} (xi − x̄) yi / SXX) = ∑_{i=1}^{n} (xi − x̄) E(yi) / SXX = ∑_{i=1}^{n} (xi − x̄)(β0 + β1 xi) / SXX
      = [β0 ∑_{i=1}^{n} (xi − x̄) + β1 ∑_{i=1}^{n} (xi − x̄) xi] / SXX = β1 ∑_{i=1}^{n} (xi − x̄)² / SXX
      = β1

E(b0) = E(ȳ − b1 x̄) = E(ȳ) − x̄ E(b1)
      = (1/n) ∑_{i=1}^{n} E(yi) − x̄ β1 = (1/n) ∑_{i=1}^{n} (β0 + β1 xi) − β1 x̄
      = β0 + β1 (1/n) ∑_{i=1}^{n} xi − β1 x̄
      = β0
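A small Monte Carlo sketch (illustrative only; the "true" parameter values are arbitrary assumptions) shows this unbiasedness empirically:

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = 10.0, 2.0, 3.0          # assumed true values (illustrative)
x = np.linspace(20, 80, 25)                   # fixed, nonrandom design points

b0_draws, b1_draws = [], []
for _ in range(20000):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=x.size)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    b0_draws.append(b0)
    b1_draws.append(b1)

print(np.mean(b0_draws), np.mean(b1_draws))   # averages should be close to beta0, beta1
```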

Example

Westwood company data


• Man-hours: dependent variable
• Lot size: independent variable
• Regression model
o Man-hours = β0 + β1 Lot size + ε
• Least squares estimates
o b1 = SXY / SXX = 6800 / 3400 = 2
o b0 = ȳ − b1 x̄ = 110 − 2 × 50 = 10
• Estimated regression line:
o Man-hours = 10 + 2 Lot size
[Scatter plot of man-hours (0–180) against lot size (0–100) with the fitted line y = 2x + 10]
• b1 = +2
o Man-hours increase with lot size
o When lot size increases by 1 unit, man-hours increase by 2 units
• b0 = 10
o When lot size = 0, man-hours = 10 units
o Not reliable as the data range for X excludes zero
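As a quick check (not part of the notes), the estimates can be reproduced with a few lines of NumPy from the production-run data tabulated in the error-variance example further below:

```python
import numpy as np

# Westwood company data (lot size X, man-hours Y), as tabulated later in these notes
x = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60], dtype=float)
y = np.array([73, 50, 128, 170, 87, 108, 135, 69, 148, 132], dtype=float)

s_xy = np.sum((x - x.mean()) * (y - y.mean()))   # SXY = 6800
s_xx = np.sum((x - x.mean()) ** 2)               # SXX = 3400
b1 = s_xy / s_xx                                  # 2.0
b0 = y.mean() - b1 * x.mean()                     # 10.0
print(b0, b1)
```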

Example

Shocks data
• All observations are considered
• Time: dependent variable
• Shocks: independent variable
• Regression model
o Time = β0 + β1 Shocks + ε
• Least squares estimates
o b1 = SXY / SXX = −208.4 / 340 = −0.6129
o b0 = ȳ − b1 x̄ = 5.8875 − (−0.6129 × 7.5) = 10.4846
• Estimated regression line
o Time = 10.48456 − 0.612941 × Shocks
[Scatter plot of time (seconds) against number of shocks (0–16) with the fitted line]
• b1 = −0.6129
o Time decreases with the number of shocks
o When the number of shocks increases by 1, time decreases by 0.6129 seconds
• b0 = 10.48
o When the number of shocks = 0, time = 10.48 seconds
o Data range for X includes zero
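The same estimates can be recovered from the summary statistics quoted above (a small illustrative snippet, not from the notes):

```python
# Shocks data: estimates recomputed from the quoted summary statistics
s_xy, s_xx = -208.4, 340.0
x_bar, y_bar = 7.5, 5.8875

b1 = s_xy / s_xx           # -0.612941...
b0 = y_bar - b1 * x_bar    # 10.48456...
print(b0, b1)
```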

• Variance of the sampling distribution of b1
Var(b1) = Var(∑_{i=1}^{n} (xi − x̄) yi / SXX) = ∑_{i=1}^{n} (xi − x̄)² Var(yi) / (SXX)² = σ² ∑_{i=1}^{n} (xi − x̄)² / (SXX)² = σ² / SXX
o since Cov(yi, yj) = 0 for i ≠ j

• Variance of the sampling distribution of b0
o Consider
Cov(ȳ, b1) = Cov(ȳ, ∑_{i=1}^{n} (xi − x̄) yi / SXX) = (1/SXX) ∑_{i=1}^{n} (xi − x̄) Cov(ȳ, yi)
           = (1/SXX) ∑_{i=1}^{n} (xi − x̄) Cov((1/n) ∑_{j=1}^{n} yj, yi) = (1/SXX) ∑_{i=1}^{n} (xi − x̄) Cov(yi/n, yi)
           = (σ² / (n SXX)) ∑_{i=1}^{n} (xi − x̄)
           = 0
o Therefore, using Var(ȳ) = σ²/n and Var(b1) = σ²/SXX,
Var(b0) = Var(ȳ − b1 x̄)
        = Var(ȳ) + x̄² Var(b1) − 2 x̄ Cov(ȳ, b1)
        = σ² (1/n + x̄² / SXX)

• Covariance between b0 and b1
Cov(b0, b1) = Cov(ȳ − b1 x̄, b1) = Cov(ȳ, b1) − x̄ Cov(b1, b1) = −x̄ σ² / SXX
o We obtain SXX and x̄ from the data; how about σ²?
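For reference, a minimal sketch that evaluates these three formulas for a given design and a known σ² (the function name is ours, not from the notes):

```python
import numpy as np

def ls_sampling_moments(x, sigma2):
    """Var(b0), Var(b1) and Cov(b0, b1) implied by the formulas above, for known sigma^2."""
    x = np.asarray(x, dtype=float)
    s_xx = np.sum((x - x.mean()) ** 2)
    var_b1 = sigma2 / s_xx
    var_b0 = sigma2 * (1.0 / x.size + x.mean() ** 2 / s_xx)
    cov_b0_b1 = -x.mean() * sigma2 / s_xx
    return var_b0, var_b1, cov_b0_b1
```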

Gauss-Markov theorem

The least squares estimators b0 and b1 are unbiased and have minimum variance among all unbiased linear estimators (Exercise).

(d) Remarks
• The inference and prediction by the fitted line are only valid for X values in the range of the
data set.
• A linear relationship between two variables can exist without causation.
• The simple linear regression model applies only if the true relationship between the two
variables is a straight-line relationship.
• When the magnitude of the slope estimate b1 is close to zero, the fitted regression line will be nearly parallel to the x-axis. Then the explanatory variable X will be of little use for the prediction of Y.

2.2. Estimate of error variance


• The error variance
σ² = Var(εi) = Var(yi − (β0 + β1 xi))
• Error mean square (or mean square error, MSE) is defined as
MSE = s² = ∑_{i=1}^{n} (yi − ŷi)² / (n − 2)
o Error (or residual) degrees of freedom (df) = n − 2
o s² is unbiased under the important assumption that the model is correct: E(s²) = σ²
• Estimates of the variances and covariance of b0 and b1
o Obtained by replacing σ² by s²
s²{b1} = s² / SXX
s²{b0} = s² (1/n + x̄² / SXX)
Ĉov(b0, b1) = −x̄ · s² / SXX

Example

Westwood company data


• b0 = 10, b1 = 2
o ŷi = 10 + 2 xi

Production run  Lot size (X)  Man-hours (Y)  Predicted Man-hours (Ŷ)
1 30 73 70
2 20 50 50
3 60 128 130
4 80 170 170
5 40 87 90
6 50 108 110
7 60 135 130
8 30 69 70
9 70 148 150
10 60 132 130

• s² = (1/(10 − 2)) [(73 − 70)² + (50 − 50)² + … + (132 − 130)²] = 60/8 = 7.5
o s = √7.5 = 2.74
o Degrees of freedom = n − 2 = 8
• Sample variance of b1
o s² / SXX = 7.5 / 3400 = 0.002206
o SE(b1) = √0.002206 = 0.046967
• Sample variance of b0
o s² (1/n + x̄²/SXX) = 7.5 (1/10 + 50²/3400) = 6.264706
o SE(b0) = √6.264706 = 2.502939
• Sample covariance of b0 and b1
o −x̄ s² / SXX = −(50 × 7.5)/3400 = −0.1103

Example

Shocks data
• b0 = 10.4846, b1 = -0.6129
o ŷ = 10.4846 − 0.6129 x

X Y Predicted Y
0 11.4 10.4846
1 11.9 9.8716
2 7.1 9.2587
3 14.2 8.6457
4 5.9 8.0328
5 6.1 7.4199
… … …

• s² = (1/(16 − 2)) [(11.4 − 10.48)² + (11.9 − 9.87)² + …] = 5.0943
o s = √5.0943 = 2.257

o df = 14

• Sample variance for b1
o s² / SXX = 5.0943 / 340 = 0.0150
o SE(b1) = 0.1224
• Sample variance for b0
o s² (1/n + x̄²/SXX) = 5.0943 (1/16 + 7.5²/340) = 1.1612
o SE(b0) = 1.0776
• Sample covariance of b0 and b1
o −x̄ s² / SXX = −(7.5 × 5.0943)/340 = −0.1124
340

2.3. Maximum likelihood estimation


(a) Likelihood function
• Assume εi ~ i.i.d. N(0, σ²) with probability density function
f(εi) = φ(εi)
• The joint density function / likelihood function
L = ∏_{i=1}^{n} f(εi) = ∏_{i=1}^{n} φ(εi)
• The pdf for the normal distribution with mean 0 is
φ(x) = (1/((2π)^{1/2} σ)) exp(−x²/(2σ²))
• The likelihood function is
L = L(β0, β1, σ² | x, y)
  = (1/((2π)^{n/2} σ^n)) exp(−(1/(2σ²)) ∑_{i=1}^{n} εi²)
  = (1/((2π)^{n/2} σ^n)) exp(−(1/(2σ²)) ∑_{i=1}^{n} (yi − β0 − β1 xi)²)

(b) Maximum likelihood estimates (MLE) for β0 and β1


• The MLE for β0 and β1 maximize L, i.e. minimize
(1/(2σ²)) ∑_{i=1}^{n} (yi − β0 − β1 xi)²
o Equivalent to minimizing SSE
o Under the normal theory assumption, the MLE of the regression coefficients β0 and β1 are the least squares estimators (see the sketch below)
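A hedged numerical check of this equivalence (assuming NumPy and SciPy are available; the data are the Westwood values from the earlier example, and all variable names are ours):

```python
import numpy as np
from scipy.optimize import minimize

# Maximizing the normal log-likelihood numerically should reproduce the least squares estimates.
x = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60], dtype=float)
y = np.array([73, 50, 128, 170, 87, 108, 135, 69, 148, 132], dtype=float)

def neg_log_likelihood(theta):
    b0, b1, log_sigma2 = theta
    sigma2 = np.exp(log_sigma2)          # parameterize sigma^2 on the log scale so it stays positive
    resid = y - b0 - b1 * x
    n = y.size
    return 0.5 * n * np.log(2.0 * np.pi * sigma2) + np.sum(resid ** 2) / (2.0 * sigma2)

start = [y.mean(), 0.0, np.log(y.var())]
fit = minimize(neg_log_likelihood, start, method="Nelder-Mead",
               options={"maxiter": 5000, "xatol": 1e-9, "fatol": 1e-9})
b0_mle, b1_mle, sigma2_mle = fit.x[0], fit.x[1], np.exp(fit.x[2])
print(b0_mle, b1_mle, sigma2_mle)   # should be close to 10, 2 and (n-2)/n * s^2 = 6.0
```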

(c) MLE for error variance


• The log-likelihood is given as
l = log(L) = k − (n/2) log(σ²) − (1/(2σ²)) ∑_{i=1}^{n} (yi − β0 − β1 xi)²
o where k is free of σ²

• Substituting b0 and b1 for β0 and β1,
l = k − (n/2) log(σ²) − (1/(2σ²)) ∑_{i=1}^{n} (yi − ŷi)²
• The MLE for σ² maximizes l
∂l/∂σ² = −n/(2σ²) + (1/(2σ⁴)) ∑_{i=1}^{n} (yi − ŷi)² = 0
−n + (1/σ̂²) ∑_{i=1}^{n} (yi − ŷi)² = 0
σ̂² = ∑_{i=1}^{n} (yi − ŷi)² / n
    = ((n − 2)/n) s²
o σ̂² is a biased estimator of σ²
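For the Westwood example above, the bias factor works out as follows (a trivial check, shown only for concreteness):

```python
# MLE of sigma^2 versus the unbiased s^2, using the Westwood numbers from the earlier example
n, s2 = 10, 7.5
sigma2_hat = (n - 2) / n * s2   # 6.0: smaller than s^2 = 7.5, i.e. biased downward
print(sigma2_hat)
```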
