Lecture 2 Multivariate Linear Regression Models
➢ Specifically, the linear regression model with a single response takes the form
𝑌 = 𝛽0 + 𝛽1𝑧1 + 𝛽2𝑧2 + ⋯ + 𝛽𝑟𝑧𝑟 + 𝜀
[Response] = [mean (dependent on 𝑧1, 𝑧2, … , 𝑧𝑟)] + [error]
➢ The term "linear" refers to the fact that the mean is a linear function of the unknown parameters 𝛽0, 𝛽1, … , 𝛽𝑟.
➢ With n independent observations on 𝑌 and the associated values of 𝑧𝑖,
✓ the complete model becomes
𝑌1 = 𝛽0 + 𝛽1𝑧11 + 𝛽2𝑧12 + ⋯ + 𝛽𝑟𝑧1𝑟 + 𝜀1
𝑌2 = 𝛽0 + 𝛽1𝑧21 + 𝛽2𝑧22 + ⋯ + 𝛽𝑟𝑧2𝑟 + 𝜀2        (1)
⋮
𝑌𝑛 = 𝛽0 + 𝛽1𝑧𝑛1 + 𝛽2𝑧𝑛2 + ⋯ + 𝛽𝑟𝑧𝑛𝑟 + 𝜀𝑛
where the error terms are assumed to have the following properties:
i. 𝐸(𝜀𝑗) = 0;
ii. 𝑉𝑎𝑟(𝜀𝑗) = 𝜎² (constant); and        (2)
iii. 𝐶𝑜𝑣(𝜀𝑗, 𝜀𝑘) = 0, 𝑗 ≠ 𝑘
➢ Although the error-term assumptions in (2) are very modest,
✓ we shall later need to add the assumption of joint normality for
making confidence statements and testing hypotheses.
➢ In matrix notation, (1) becomes
[𝑌1]   [1  𝑧11  𝑧12  ⋯  𝑧1𝑟] [𝛽0]   [𝜀1]
[𝑌2] = [1  𝑧21  𝑧22  ⋯  𝑧2𝑟] [𝛽1] + [𝜀2]
[⋮ ]   [⋮   ⋮    ⋮   ⋱   ⋮ ] [⋮ ]   [⋮ ]
[𝑌𝑛]   [1  𝑧𝑛1  𝑧𝑛2  ⋯  𝑧𝑛𝑟] [𝛽𝑟]   [𝜀𝑛]
(observed response vector)  (design matrix)
or
𝑌 = 𝑍𝛽 + 𝜀        (3)
(𝑛 × 1)  (𝑛 × (𝑟 + 1))  ((𝑟 + 1) × 1)  (𝑛 × 1)
and the specification in (2) becomes
1. 𝐸(𝜀) = 0; and
2. 𝐶𝑜𝑣(𝜀) = 𝐸(𝜀𝜀′) = 𝜎²𝐼
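As a minimal sketch of the matrix form (3) and the assumptions in (2), the following Python code simulates responses from 𝑌 = 𝑍𝛽 + 𝜀 with uncorrelated, constant-variance errors; the sample size, coefficient values, and 𝜎 are illustrative choices, not taken from the lecture.

import numpy as np

rng = np.random.default_rng(0)

n, r = 50, 2                       # illustrative sample size and number of predictors
beta = np.array([1.0, 2.0, -0.5])  # illustrative coefficients (beta_0, beta_1, beta_2)
sigma = 1.0                        # illustrative error standard deviation

z = rng.uniform(0, 10, size=(n, r))     # predictor values
Z = np.column_stack([np.ones(n), z])    # design matrix: first column of ones for the intercept
eps = rng.normal(0.0, sigma, size=n)    # errors with E(eps) = 0 and Cov(eps) = sigma^2 * I
Y = Z @ beta + eps                      # model (3): Y = Z beta + eps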
Example: Determine the linear regression model for fitting a straight line
Mean response = 𝐸(𝑌) = 𝛽0 + 𝛽1 𝑧1
to the data
𝑧1:  0  1  2  3  4
𝑦:   1  4  3  8  9
➢ Before the responses 𝑌 = [𝑌1, 𝑌2, … , 𝑌5]′ are observed, the errors 𝜀 = [𝜀1, 𝜀2, … , 𝜀5]′ are random, and we can write 𝑌 = 𝑍𝛽 + 𝜀, where 𝑍 is the 5 × 2 design matrix whose first column is ones and whose second column contains the values of 𝑧1.
➢ Let 𝑏 be a vector of trial values for 𝛽.
➢ Consider the difference 𝑦𝑗 − 𝑏0 − 𝑏1𝑧𝑗1 − ⋯ − 𝑏𝑟𝑧𝑗𝑟 between the observed response 𝑦𝑗 and the value 𝑏0 + 𝑏1𝑧𝑗1 + ⋯ + 𝑏𝑟𝑧𝑗𝑟 that would be expected if 𝑏 were the "true" parameter vector.
➢ Typically, the differences 𝒚𝒋 − 𝒃𝟎 − 𝒃𝟏 𝒛𝒋𝟏 − ⋯ − 𝒃𝒓 𝒛𝒋𝒓 will not be zero,
because the response fluctuates about its expected value.
➢ The method of least squares selects b so as to minimize the sum of the
squares of the differences:
𝑆(𝑏) = ∑𝑛𝑗=1 (𝑦𝑗 − 𝑏0 − 𝑏1𝑧𝑗1 − ⋯ − 𝑏𝑟𝑧𝑗𝑟)²
➢ The deviations
𝜀̂𝑗 = 𝑦𝑗 − 𝛽̂0 − 𝛽̂1𝑧𝑗1 − ⋯ − 𝛽̂𝑟𝑧𝑗𝑟,  𝑗 = 1, 2, … , 𝑛        (5)
are called residuals.
➢ The vector of residuals 𝜀̂ = 𝑦 − 𝑍𝛽̂ contains the information about the
remaining unknown parameter 𝝈𝟐 .
➢ The least squares estimate of 𝛽 is
𝛽̂ = (𝑍′𝑍)−1𝑍′𝑦
and the residuals 𝜀̂ = 𝑦 − 𝑍𝛽̂ satisfy 𝑍′𝜀̂ = 0 and 𝑦̂′𝜀̂ = 0. Also, the
residual sum of squares = ∑𝑛𝑗=1(𝑦𝑗 − 𝛽̂0 − 𝛽̂1𝑧𝑗1 − ⋯ − 𝛽̂𝑟𝑧𝑗𝑟)² = 𝜀̂′𝜀̂ = 𝑦′𝑦 − 𝑦′𝑍𝛽̂
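A minimal NumPy sketch of these formulas (the function name and structure are illustrative): it computes 𝛽̂ = (𝑍′𝑍)−1𝑍′𝑦, the residuals, and the residual sum of squares, and checks that 𝑍′𝜀̂ = 0 and 𝑦̂′𝜀̂ = 0 up to rounding.

import numpy as np

def least_squares(Z, y):
    """Return beta_hat, residuals, and the residual sum of squares for y = Z beta + eps."""
    # beta_hat = (Z'Z)^(-1) Z'y; np.linalg.solve avoids forming the explicit inverse
    beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)
    y_hat = Z @ beta_hat              # fitted values
    resid = y - y_hat                 # residuals eps_hat
    rss = resid @ resid               # eps_hat' eps_hat
    # orthogonality conditions: Z' eps_hat = 0 and y_hat' eps_hat = 0 (up to rounding)
    assert np.allclose(Z.T @ resid, 0.0) and np.isclose(y_hat @ resid, 0.0)
    return beta_hat, resid, rss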
Example: Calculate the least squares estimate 𝛽̂, the residuals 𝜀̂, and the residual sum of squares for the linear model
𝑌𝑗 = 𝛽0 + 𝛽1𝑧𝑗1 + 𝜀𝑗
fit to the data
𝑧1:  0  1  2  3  4
𝑦:   1  4  3  8  9
Solution: We have
𝑍 =
[1 0]
[1 1]
[1 2]
[1 3]
[1 4]

𝑍′ =
[1 1 1 1 1]
[0 1 2 3 4]

𝑦 = [1, 4, 3, 8, 9]′

𝑍′𝑍 =
[ 5 10]
[10 30]

(𝑍′𝑍)−1 =
[ 0.6 −0.2]
[−0.2  0.1]

𝑍′𝑦 =
[25]
[70]
Calculation of (𝒁′ 𝒁)−𝟏
𝑍′𝑍 =
[ 5 10]
[10 30]
|𝑍′𝑍| = (5)(30) − (10)(10) = 150 − 100 = 50
Cofactor of 5 = (−1)²(30) = 30
Cofactor of 10 = (−1)³(10) = −10
Cofactor of 10 = (−1)³(10) = −10
Cofactor of 30 = (−1)⁴(5) = 5
Cofactor matrix of 𝑍′𝑍 =
[ 30 −10]
[−10   5]
Adj 𝑍′𝑍 = transpose of the cofactor matrix of 𝑍′𝑍 =
[ 30 −10]
[−10   5]
(𝑍′𝑍)−1 = (1/|𝑍′𝑍|) Adj 𝑍′𝑍 = (1/50)
[ 30 −10]   [ 0.6 −0.2]
[−10   5] = [−0.2  0.1]
Consequently,
𝛽̂ = [𝛽̂0, 𝛽̂1]′ = (𝑍′𝑍)−1𝑍′𝑦 =
[ 0.6 −0.2] [25]   [1]
[−0.2  0.1] [70] = [2]
and the fitted equation is
𝑦̂ = 𝛽̂0 + 𝛽̂1𝑧 = 1 + 2𝑧
The vector of fitted (predicted) values is
𝑦̂ = 𝑍𝛽̂ = [1 + 2(0), 1 + 2(1), 1 + 2(2), 1 + 2(3), 1 + 2(4)]′ = [1, 3, 5, 7, 9]′
so 𝜀̂ = 𝑦 − 𝑦̂ = [1, 4, 3, 8, 9]′ − [1, 3, 5, 7, 9]′ = [0, 1, −2, 1, 0]′
and the residual sum of squares is
𝜀̂′𝜀̂ = 0² + 1² + (−2)² + 1² + 0² = 6
Example: Find 𝑅2 for the above problem.
𝑦    𝑧    𝑦̂ = 1 + 2𝑧    (𝑦̂ − 𝑦̅)²    (𝑦 − 𝑦̅)²
1    0    1              16            16
4    1    3              4             1
3    2    5              0             4
8    3    7              4             9
9    4    9              16            16
∑𝑦 = 25              ∑(𝑦̂𝑗 − 𝑦̅)² = 40    ∑(𝑦𝑗 − 𝑦̅)² = 46

𝑦̅ = 25/5 = 5

𝑅² = ∑𝑛𝑗=1(𝑦̂𝑗 − 𝑦̅)² / ∑𝑛𝑗=1(𝑦𝑗 − 𝑦̅)² = 40/46 = 0.87
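As a numerical check, the estimates, residuals, residual sum of squares, and 𝑅² for these data can be reproduced with the least_squares sketch given earlier (illustrative code, not part of the lecture):

import numpy as np

z1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 4.0, 3.0, 8.0, 9.0])
Z = np.column_stack([np.ones_like(z1), z1])   # design matrix [1, z1]

beta_hat, resid, rss = least_squares(Z, y)    # from the sketch above
y_hat = Z @ beta_hat
r2 = np.sum((y_hat - y.mean())**2) / np.sum((y - y.mean())**2)

print(beta_hat)   # approximately [1. 2.]
print(resid)      # approximately [0. 1. -2. 1. 0.]
print(rss, r2)    # approximately 6.0 and 0.87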
Adjusted 𝑹𝟐
➢ Incidentally, 𝑅2 is biased upward, particularly in small samples. Therefore,
adjusted 𝑅2 is sometimes used. The formula is
𝐴𝑑𝑗𝑢𝑠𝑡𝑒𝑑 𝑅² = 1 − (𝑁 − 1)(1 − 𝑅²)/(𝑁 − 𝐾 − 1)
where 𝑁 is the number of observations and 𝐾 is the number of independent (predictor) variables; a worked illustration using the earlier straight-line example follows the notes below.
➢ Note that, unlike 𝑅2 , adjusted 𝑅 2 can actually get smaller as additional
variables are added to the model.
➢ One of the claimed benefits for adjusted 𝑅2 is that it “punishes” you for
including extraneous variables in the model.
✓ Extraneous means irrelevant or unrelated.
✓ These variables can influence the dependent variable but are beyond
the researchers' control, and sometimes even their awareness.
✓ They make it difficult to determine the actual impact of the
independent (intentionally manipulated) variable.
✓ If left uncontrolled, extraneous variables can lead to inaccurate
conclusions about the relationship between independent and
dependent variables.
➢ Also note that, as N gets larger, the difference between 𝑅2 and adjusted 𝑅2
gets smaller and smaller.
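As an illustration, applying the formula to the straight-line example above, with 𝑁 = 5 observations, 𝐾 = 1 predictor, and 𝑅² = 40/46 ≈ 0.87, gives
𝐴𝑑𝑗𝑢𝑠𝑡𝑒𝑑 𝑅² = 1 − (5 − 1)(1 − 40/46)/(5 − 1 − 1) = 1 − (4)(6/46)/3 ≈ 1 − 0.174 = 0.826
which is slightly smaller than the unadjusted 𝑅² of about 0.87, as expected.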
➢ To make confidence statements and to test hypotheses about the regression coefficients,
✓ we need to assume that the 𝜺𝒊 follow some probability distribution.
➢ For this reason, we assume that the 𝜺𝒊 follow the normal distribution with zero mean and constant variance 𝝈².
➢ Moreover, the estimators 𝛽̂0, 𝛽̂1, and 𝛽̂2 are themselves normally distributed with means equal to the true 𝛽0, 𝛽1, and 𝛽2 and with the variances
𝑣𝑎𝑟(𝛽̂0) = [1/𝑛 + (𝑋̅1² ∑𝑥2𝑖² + 𝑋̅2² ∑𝑥1𝑖² − 2𝑋̅1𝑋̅2 ∑𝑥1𝑖𝑥2𝑖) / (∑𝑥1𝑖² ∑𝑥2𝑖² − (∑𝑥1𝑖𝑥2𝑖)²)] 𝜎²
𝑠𝑒(𝛽̂0) = +√𝑣𝑎𝑟(𝛽̂0)
𝑣𝑎𝑟(𝛽̂1) = [∑𝑥2𝑖² / (∑𝑥1𝑖² ∑𝑥2𝑖² − (∑𝑥1𝑖𝑥2𝑖)²)] 𝜎²
𝑠𝑒(𝛽̂1) = +√𝑣𝑎𝑟(𝛽̂1)
𝑣𝑎𝑟(𝛽̂2) = [∑𝑥1𝑖² / (∑𝑥1𝑖² ∑𝑥2𝑖² − (∑𝑥1𝑖𝑥2𝑖)²)] 𝜎²
𝑠𝑒(𝛽̂2) = +√𝑣𝑎𝑟(𝛽̂2)
Example
• Husbands’ hours of housework per week (Y)
• Number of children (X1)
• Husbands’ years of education (X2)
Table 1
𝑦𝑖    𝑥1𝑖    𝑥2𝑖    𝑥1𝑖²    𝑥2𝑖²    𝑥1𝑖𝑥2𝑖
1     1      12     1       144     12
2     1      14     1       196     14
3     1      16     1       256     16
5     1      16     1       256     16
3     2      18     4       324     36
1     2      16     4       256     32
5     3      12     9       144     36
0     3      12     9       144     36
6     4      10     16      100     40
3     4      12     16      144     48
7     5      12     25      144     60
4     5      16     25      256     80
∑𝑦𝑖 = 40   ∑𝑥1𝑖 = 32   ∑𝑥2𝑖 = 166   ∑𝑥1𝑖² = 112   ∑𝑥2𝑖² = 2108   ∑𝑥1𝑖𝑥2𝑖 = 426
SPSS Results
[SPSS output: Model Summary with Change Statistics, ANOVA (Table 2), and Coefficients tables. Predictors: (Constant), Husbands' years of education, Number of children. In the Coefficients table, the row for Husbands' years of education shows unstandardized B = .002, Std. Error = .272, standardized Beta = .003, t = .008, Sig. = .994.]
𝑋̅1 = 32/12 = 2.67,   𝑋̅2 = 166/12 = 13.83
From Table 2 (the ANOVA table):
𝜎̂² = mean square error = 4.229
𝑣𝑎𝑟(𝛽̂0) = [1/𝑛 + (𝑋̅1² ∑𝑥2𝑖² + 𝑋̅2² ∑𝑥1𝑖² − 2𝑋̅1𝑋̅2 ∑𝑥1𝑖𝑥2𝑖) / (∑𝑥1𝑖² ∑𝑥2𝑖² − (∑𝑥1𝑖𝑥2𝑖)²)] 𝜎²
= [1/12 + ((2.67)²(2108) + (13.83)²(112) − 2(2.67)(13.83)(426)) / ((112)(2108) − (426)²)] × 4.229
= [0.083 + (7.13(2108) + 191.27(112) − 31461.04) / (236096 − 181476)] × 4.229
= [0.083 + (15030.04 + 21422.24 − 31461.04)/54620] × 4.229
= [0.083 + 4991.24/54620] × 4.229
= [0.083 + 0.091] × 4.229
= 0.74
𝑠𝑒(𝛽̂0) = +√𝑣𝑎𝑟(𝛽̂0) = +√0.74 = 0.86
𝑣𝑎𝑟(𝛽̂1) = [∑𝑥2𝑖² / (∑𝑥1𝑖² ∑𝑥2𝑖² − (∑𝑥1𝑖𝑥2𝑖)²)] 𝜎²
= [2108 / ((112)(2108) − (426)²)] × 4.229
= [2108 / (236096 − 181476)] × 4.229
= [2108/54620] × 4.229 = 0.0385 × 4.229 = 0.163

𝑣𝑎𝑟(𝛽̂2) = [∑𝑥1𝑖² / (∑𝑥1𝑖² ∑𝑥2𝑖² − (∑𝑥1𝑖𝑥2𝑖)²)] 𝜎²
= [112 / ((112)(2108) − (426)²)] × 4.229
= [112 / (236096 − 181476)] × 4.229
= [112/54620] × 4.229 = 0.0087
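These hand calculations can be reproduced with a short script (illustrative; the variable names are mine, and the sums are taken from Table 1 with 𝜎̂² = 4.229):

import numpy as np

n = 12
sigma2 = 4.229                           # estimated error variance (MSE from the ANOVA table)
x1bar, x2bar = 32 / 12, 166 / 12         # means of x1 and x2
sum_x1_sq, sum_x2_sq, sum_x1x2 = 112, 2108, 426

denom = sum_x1_sq * sum_x2_sq - sum_x1x2 ** 2          # = 54620
var_b0 = (1 / n + (x1bar**2 * sum_x2_sq + x2bar**2 * sum_x1_sq
                   - 2 * x1bar * x2bar * sum_x1x2) / denom) * sigma2
var_b1 = (sum_x2_sq / denom) * sigma2                  # ≈ 0.163
var_b2 = (sum_x1_sq / denom) * sigma2                  # ≈ 0.0087

print(var_b0, np.sqrt(var_b0))   # ≈ 0.74 and 0.86
print(var_b1, np.sqrt(var_b1))
print(var_b2, np.sqrt(var_b2))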
Significance Testing
➢ Significance testing involves testing the significance of the overall
regression equation as well as specific regression coefficients.
➢ The null hypothesis for the overall test is that the coefficient of multiple determination in the population, 𝑅²𝑝𝑜𝑝, is zero.
𝐻0: 𝑅²𝑝𝑜𝑝 = 0
𝐻𝐴: 𝑅²𝑝𝑜𝑝 ≠ 0
➢ This is equivalent to the following null hypothesis:
𝐻0: 𝛽1 = 𝛽2 = ⋯ = 𝛽𝐾 = 0
𝐻𝐴: At least one 𝛽 ≠ 0
➢ One of the formulas for F, which is mainly useful when the original data are not available, is
𝐹 = 𝑅²(𝑁 − 𝐾 − 1) / ((1 − 𝑅²)𝐾)
which follows an 𝐹(𝐾, 𝑁 − 𝐾 − 1) distribution.
➢ For the data of the above example,
𝐹 = 𝑅²(𝑁 − 𝐾 − 1) / ((1 − 𝑅²)𝐾) = 0.249(12 − 2 − 1) / ((1 − 0.249)(2)) = 2.241/1.502 = 1.49
➢ The critical value of F with K = 2 and N − K − 1 = 12 − 2 − 1 = 9 df at the 5% level of significance is 4.26 (from the F distribution table).
Comment: The calculated value F = 1.49 is less than the critical value F = 4.26, so the data do not provide evidence against the null hypothesis.
Step 4: Do not reject 𝐻0; the overall regression is not statistically significant at the 5% level.
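As a sanity check, the F statistic, its 5% critical value, and the corresponding p-value can be computed with SciPy (illustrative code):

from scipy.stats import f

R2, N, K = 0.249, 12, 2
F = R2 * (N - K - 1) / ((1 - R2) * K)    # ≈ 1.49

crit = f.ppf(0.95, K, N - K - 1)         # 5% critical value of F(2, 9) ≈ 4.26
p_value = f.sf(F, K, N - K - 1)          # p-value of the observed F

print(F, crit, p_value)                  # F < crit, so H0 is not rejected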