STAT 526 - Spring 2011 Olga Vitek
Homework 3 - Solution
Each part of each problem is worth 5 points.
1. KNNL 25.17
(Note: you can choose either the restricted or the unrestricted version of the model. Please state clearly
which model you use.)
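The unrestricted model is
$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
where
$\alpha_i$ is the fixed coats effect, $\sum_i \alpha_i = 0$, $i = 1, 2, 3$
$\beta_j$ is the random batch effect, $\beta_j \overset{iid}{\sim} N(0, \sigma_\beta^2)$, $j = 1, 2, 3, 4$
$(\alpha\beta)_{ij}$ is the interaction effect, $(\alpha\beta)_{ij} \sim N(0, \sigma_{\alpha\beta}^2)$
$\varepsilon_{ijk} \sim N(0, \sigma^2)$, $k = 1, 2, 3, 4$
$a = 3$, $b = 4$, $n = 4$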
(a) Test interaction effect :
Analysis of Variance Table
Response: value
Df Sum Sq Mean Sq F value Pr(>F)
coatF 2 150.388 75.194 15.591 1.327e-05 ***
batchF 3 152.852 50.951 10.564 3.984e-05 ***
coatF:batchF 6 1.852 0.309 0.064 0.9988
Residuals 36 173.625 4.823
- Hypothesis: $H_0: \sigma_{\alpha\beta}^2 = 0$ vs $H_a: \sigma_{\alpha\beta}^2 > 0$
- Decision rule: Reject $H_0$ if $F^* > F_{0.95,6,36} = 2.36375$
- Conclusion: Since $F^* = MSAB/MSE = 0.064 < 2.36375$, we do not reject $H_0$ and conclude that there are no interaction effects (p = 0.9988).
R code
X <- read.table("CH25PR17.txt", sep="", as.is=TRUE, header=FALSE)
colnames(X)<-c("value","coat","batch","replicate")
# Factor
X$coatF<-factor(X$coat)
X$batchF<-factor(X$batch)
#ANOVA table
fit1<-aov(value~coatF*batchF, data=X)
anova(fit1)
qf(0.95,6,36) # F(0.95,6,36)
1-pf(0.064,6,36) # p-value
(b) Test main effects:
i. Factor A (the number of coats) main effect
- Hypothesis: $H_0$: all $\alpha_i = 0$ vs $H_a$: at least one $\alpha_i$ is not 0, $i = 1, 2, 3$
- Decision rule: Reject $H_0$ if $F^* = MSA/MSAB > F_{0.95,2,6} = 5.143253$
- Conclusion: Since $F^* = 75.194/0.309 = 243.3463$, we reject $H_0$ and conclude that there is a Factor A main effect for the number of coats, with p ≈ 0.
ii. Factor B (batch) main effect
- Hypothesis: $H_0: \sigma_\beta^2 = 0$ vs $H_a: \sigma_\beta^2 > 0$
- Decision rule: Reject $H_0$ if $F^* = MSB/MSAB > F_{0.95,3,6} = 4.757063$
- Conclusion: Since $F^* = 50.951/0.309 = 164.89$, we reject $H_0$ and conclude that there is a Factor B main effect for batch, with p ≈ 0.
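As a quick check of part (b), the two F tests can be reproduced from the mean squares in the ANOVA table of part (a). This is a minimal sketch; the variable names are illustrative, and the values are typed in by hand rather than extracted from `fit1`.

```r
# Mean squares and degrees of freedom copied from the ANOVA table in part (a)
MSA <- 75.194; MSB <- 50.951; MSAB <- 0.309
dfA <- 2; dfB <- 3; dfAB <- 6

# In the unrestricted mixed model both main effects are tested against MSAB
F_A <- MSA / MSAB
F_B <- MSB / MSAB

# Critical values at the 5% level and the corresponding p-values
crit_A <- qf(0.95, dfA, dfAB)
crit_B <- qf(0.95, dfB, dfAB)
p_A <- pf(F_A, dfA, dfAB, lower.tail = FALSE)
p_B <- pf(F_B, dfB, dfAB, lower.tail = FALSE)

print(c(F_A = F_A, crit_A = crit_A, p_A = p_A))
print(c(F_B = F_B, crit_B = crit_B, p_B = p_B))
```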
(c) Comparisons
i. CI for $D_1 = \mu_{2\cdot} - \mu_{1\cdot}$:
$\hat D_1 = \bar Y_{2\cdot} - \bar Y_{1\cdot} = 76.79375 - 73.10625 = 3.6875$,
$s^2\{\hat D_1\} = \frac{MSAB}{bn} \sum c_i^2 = \frac{0.309}{4 \times 4} \times (1 + 1) = 0.038625$,
$s\{\hat D_1\} = \sqrt{0.038625} = 0.1965324$,
$t(1 - 0.05/2;\ 2 \cdot 3) = t(0.975; 6) = 2.446912$.
Therefore, the 95% confidence interval for $D_1$ is
$3.6875 \pm 2.4469(0.1965324) = (3.2066, 4.1684)$,
which does not include zero.
ii. CI for $D_2 = \mu_{3\cdot} - \mu_{2\cdot}$:
$\hat D_2 = \bar Y_{3\cdot} - \bar Y_{2\cdot} = 76.925 - 76.79375 = 0.13125$,
$s^2\{\hat D_2\} = \frac{MSAB}{bn} \sum c_i^2 = \frac{0.309}{4 \times 4} \times (1 + 1) = 0.038625$,
$s\{\hat D_2\} = \sqrt{0.038625} = 0.1965324$,
$t(1 - 0.05/2;\ 2 \cdot 3) = t(0.975; 6) = 2.446912$.
Therefore, the 95% confidence interval for $D_2$ is
$0.13125 \pm 2.4469(0.1965324) = (-0.3496, 0.6121)$,
which includes zero.
iii. Conclusion: at 90% joint family confidence, there is a significant difference between the number of coats=6 (Factor A=1) and the number of coats=8 (Factor A=2), but there is no significant difference between the number of coats=8 (Factor A=2) and the number of coats=10 (Factor A=3).
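The two intervals in part (c) follow the same recipe, so they can be checked with one small helper. This is a sketch only: `contrast_ci` is a hypothetical helper name, and the level means and MSAB are copied by hand from the solution above.

```r
# Inputs copied from parts (a) and (c)
MSAB <- 0.309
b <- 4; n <- 4
dfAB <- 6
ybar <- c(73.10625, 76.79375, 76.925)  # coat-level means for levels 1, 2, 3

# Confidence interval for a contrast of fixed-effect level means,
# using MSAB as the error term (mixed model with random batches)
contrast_ci <- function(coefs, level = 0.95) {
  est <- sum(coefs * ybar)
  se <- sqrt(MSAB / (b * n) * sum(coefs^2))
  est + c(-1, 1) * qt(1 - (1 - level) / 2, dfAB) * se
}

ci_D1 <- contrast_ci(c(-1, 1, 0))  # level 2 minus level 1
ci_D2 <- contrast_ci(c(0, -1, 1))  # level 3 minus level 2
print(ci_D1)
print(ci_D2)
```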
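(d) 95% confidence interval for $\mu_{2\cdot}$ using the Satterthwaite procedure:
$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
The point estimate is $\hat\mu_{2\cdot} = \bar Y_{2\cdot} = 73.10625 + 3.68750 = 76.79375$. Under the unrestricted mixed model,
$$\hat\mu_{2\cdot} = \frac{1}{4 \times 4}\sum_{j=1}^{4}\sum_{k=1}^{4} y_{2jk} = \mu + \alpha_2 + \frac{1}{4}\sum_{j=1}^{4}\beta_j + \frac{1}{4}\sum_{j=1}^{4}(\alpha\beta)_{2j} + \frac{1}{4 \times 4}\sum_{j=1}^{4}\sum_{k=1}^{4}\varepsilon_{2jk}$$
and therefore
$$\hat\mu_{2\cdot} \sim N\left(\mu + \alpha_2,\ \frac{1}{4}\sigma_\beta^2 + \frac{1}{4}\sigma_{\alpha\beta}^2 + \frac{1}{16}\sigma^2\right)$$
The estimated variance is
$$s^2(\hat\mu_{2\cdot}) = \frac{\hat\sigma_\beta^2}{b} + \frac{\hat\sigma_{\alpha\beta}^2}{b} + \frac{\hat\sigma^2}{bn} = \frac{an\hat\sigma_\beta^2 + an\hat\sigma_{\alpha\beta}^2 + a\hat\sigma^2}{abn} = \frac{(a-1)(n\hat\sigma_{\alpha\beta}^2 + \hat\sigma^2) + (\hat\sigma^2 + an\hat\sigma_\beta^2 + n\hat\sigma_{\alpha\beta}^2)}{abn}$$
$$= \frac{a-1}{abn} MSAB + \frac{1}{abn} MSB = \frac{3-1}{3 \times 4 \times 4}(0.309) + \frac{1}{3 \times 4 \times 4}(50.951) = 1.074354$$
$s(\hat\mu_{2\cdot}) = 1.036510$
and the degrees of freedom are
$$df = \frac{\left(\frac{2}{48} MSAB + \frac{1}{48} MSB\right)^2}{\frac{\left(\frac{2}{48} MSAB\right)^2}{6} + \frac{\left(\frac{1}{48} MSB\right)^2}{3}} = 3.072991$$
Rounding down to 3 degrees of freedom, $t(1 - 0.05/2;\ 3) = 3.182446$.
Therefore, the 95% confidence interval for $\mu_{2\cdot}$ is
76.79375 ± 3.182446(1.03651) = (73.49511, 80.09239)
Using the fractional df = 3.072991 directly in the t quantile gives the slightly narrower interval (73.53657, 80.05093).
The average market value for a pearl with 8 coats is between about 73.5 and 80.1 with 95% confidence.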
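The Satterthwaite calculation in part (d) can be sketched in a few lines of R. The variable names are illustrative and the mean squares are entered by hand; the sketch computes the interval both with the degrees of freedom rounded down and with the fractional value, since R's `qt` accepts non-integer df.

```r
# Inputs copied from part (a)
MSB <- 50.951; MSAB <- 0.309
a <- 3; b <- 4; n <- 4
dfB <- b - 1
dfAB <- (a - 1) * (b - 1)
mu2_hat <- 76.79375

# s^2 = cAB * MSAB + cB * MSB
cAB <- (a - 1) / (a * b * n)
cB <- 1 / (a * b * n)
s2 <- cAB * MSAB + cB * MSB

# Satterthwaite approximate degrees of freedom for the linear combination
df_satt <- s2^2 / ((cAB * MSAB)^2 / dfAB + (cB * MSB)^2 / dfB)

# Interval with df rounded down, and with the fractional df
ci_int <- mu2_hat + c(-1, 1) * qt(0.975, floor(df_satt)) * sqrt(s2)
ci_frac <- mu2_hat + c(-1, 1) * qt(0.975, df_satt) * sqrt(s2)
print(c(s2 = s2, df = df_satt))
print(rbind(ci_int, ci_frac))
```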
(e) 90% confidence interval for $\sigma_\beta^2$ using the MLS procedure:
The MLS procedure is designed to estimate a linear combination of two expected mean squares for balanced studies of the form
$L = c_1 E\{MS_1\} + c_2 E\{MS_2\}$, $c_1 > 0$, $c_2 < 0$
Using the unrestricted model,
$$\hat L = \hat\sigma_\beta^2 = c_1 MS_1 + c_2 MS_2 = \frac{MSB - MSAB}{an} = \frac{MSB}{an} - \frac{MSAB}{an} = \frac{50.951}{3 \times 4} - \frac{0.309}{3 \times 4} = 4.220167$$
So $c_1 = \frac{1}{12}$, $c_2 = -\frac{1}{12}$, $MS_1 = MSB$, $MS_2 = MSAB$.
Using the R code below (with the formula on p. 1045), the 90% confidence interval for $\sigma_\beta^2$ is (1.610339, 36.173883). This interval contains $\hat\sigma^2 = MSE = 4.823$, so $\sigma_\beta^2$ is not clearly different from $\sigma^2$.
R code
# 25.17(e) : MLS procedure
alpha=0.1
dfAB=6
dfB=3
c1=1/(3*4)
c2=-1/(3*4)
MSAB=0.309
MSA=75.194
MSB=50.951
F1<-qf(1-alpha/2, dfB, Inf)
F2<-qf(1-alpha/2, dfAB, Inf)
F3<-qf(1-alpha/2, Inf, dfB)
F4<-qf(1-alpha/2, Inf, dfAB)
F5<-qf(1-alpha/2, dfB, dfAB)
F6<-qf(1-alpha/2, dfAB, dfB)
G1<-1.0-1/F1
G2<-1.0-1/F2
G3<-((F5-1)^2-(G1*F5)^2-(F4-1)^2)/F5
G4<-F6*(((F6-1)/F6)^2-((F3-1)/F6)^2-G2^2)
HL<-sqrt((G1*c1*MSB)^2+((F4-1)*c2*MSAB)^2-(G3*c1*c2*MSB*MSAB))
HU<-sqrt(((F3-1)*c1*MSB)^2+(G2*c2*MSAB)^2-(G4*c1*c2*MSB*MSAB))
L=c1*MSB+c2*MSAB
c(L-HL, L+HU)
2. KNNL 25.27
(a) ML estimates of parameters :
The model fit by R is
$Y_{ijk} = \mu_{1\cdot} + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
where
$\alpha_i$ is the fixed coats effect (the mean difference from the first level), $\alpha_1 = 0$, $i = 2, 3$
$\beta_j$ is the random batch effect, $\beta_j \overset{iid}{\sim} N(0, \sigma_\beta^2)$, $j = 1, 2, 3, 4$
$(\alpha\beta)_{ij}$ is the interaction effect, $(\alpha\beta)_{ij} \sim N(0, \sigma_{\alpha\beta}^2)$
$\varepsilon_{ijk} \sim N(0, \sigma^2)$, $k = 1, 2, 3, 4$
$a = 3$, $b = 4$, $n = 4$
The estimates of the fixed-effects parameters from the R fit are
$\hat\mu_{1\cdot} = 73.4181$, $\hat\alpha_1 = 0$, $\hat\alpha_2 = 3.3757$, $\hat\alpha_3 = 3.8196$
The corresponding parameters of the unrestricted model above, where $\alpha_i$ is the deviation of level $i$ from the overall mean, are
$\hat\mu = 73.4181 + \frac{3.3757 + 3.8196}{3} = 75.81652$,
$\hat\alpha_1 = 73.418098 - 75.816526 = -2.398428$,
$\hat\alpha_2 = 73.418098 + 3.37565 - 75.816526 = 0.977222$,
$\hat\alpha_3 = 73.418098 + 3.819601 - 75.816526 = 1.421173$
(and therefore $\sum_i \hat\alpha_i = 0$, up to rounding).
The estimates of the variances are the same in both models: $\hat\sigma_\beta^2 = 2.9937$, $\hat\sigma_{\alpha\beta}^2 = 0$, $\hat\sigma^2 = 3.1030$.
Note that $\hat\sigma_{\alpha\beta}^2 = 0$. If an estimated variance component is zero, this provides evidence that the true value of the parameter is on the boundary of the parameter space, and the likelihood ratio test should not be used.
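The change of parameterization in part (a) can be sketched as follows, assuming the sum-to-zero effects are defined as each level mean minus the unweighted average of the level means. The variable names are illustrative; the coefficients are copied from the R output.

```r
# Reference-level (treatment contrast) estimates from the lmer fit
mu1 <- 73.418098                  # mean of coat level 1 (intercept)
d <- c(0, 3.37565, 3.819601)      # differences from level 1

# Level means, overall mean, and sum-to-zero effects
level_means <- mu1 + d
mu_hat <- mean(level_means)
alpha_hat <- level_means - mu_hat

print(mu_hat)
print(alpha_hat)
print(sum(alpha_hat))  # zero up to floating-point error
```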
Linear mixed model fit by maximum likelihood
Formula: value ~ 1 + coatF + (1 | batchF) + (1 | coatF:batchF)
Data: X
AIC BIC logLik deviance REMLdev
204.6 215.6 -96.3 192.6 189.3
Random effects:
Groups Name Variance Std.Dev.
coatF:batchF (Intercept) 1.3187e-13 3.6314e-07
batchF (Intercept) 2.9937e+00 1.7302e+00
Residual 3.1030e+00 1.7615e+00
Number of obs: 46, groups: coatF:batchF, 12; batchF, 4
Fixed effects:
Estimate Std. Error t value
(Intercept) 73.4181 0.9778 75.09
coatF2 3.3757 0.6338 5.33
coatF3 3.8196 0.6450 5.92
R code
library(lme4)
# Remove rows 3 and 38 (46 observations remain, as in the output above)
X<-X[-c(3,38),]
fit2.25<- lmer(value ~ 1 + coatF + (1|batchF) + (1|coatF:batchF), data=X, REML=FALSE)
summary(fit2.25)
(b) Additive model: the estimates do not change after dropping the interaction term.
Linear mixed model fit by maximum likelihood
Formula: value ~ 1 + coatF + (1 | batchF)
Data: X
AIC BIC logLik deviance REMLdev
202.6 211.7 -96.3 192.6 189.3
Random effects:
Groups Name Variance Std.Dev.
batchF (Intercept) 2.9938 1.7302
Residual 3.1030 1.7615
Number of obs: 46, groups: batchF, 4
Fixed effects:
Estimate Std. Error t value
(Intercept) 73.4181 0.9778 75.09
coatF2 3.3757 0.6338 5.33
coatF3 3.8196 0.6450 5.92
R code
fit2.25b<- lmer(value ~ 1 + coatF + (1|batchF), data=X, REML=FALSE)
summary(fit2.25b)
(c) Test factor B (batch) main effect by likelihood ratio test
- Hypothesis: H0 : σβ2 = 0 vs Ha : σβ2 > 0
- Decision rule: Reject if G2 > χ2(1−α,p−q) = χ2(0.95,1) = 3.841459
- Conclusion: G2 = 21.43516, therefore reject H0 and conclude that there is significant Factor B
(batch) effect with p ≈ 0.
R code
fit2.25c.Ha<- lmer(value ~ 1 + coatF + (1|batchF), data=X, REML=FALSE)
fit2.25c.H0<- lm(value ~ 1 + coatF, data=X)
## likelihood ratio test
testStat <- as.numeric(2*( logLik(fit2.25c.Ha) - logLik(fit2.25c.H0)))
testStat
pchisq(testStat, 1, lower=FALSE)
qchisq(1-0.05,1)
(d) Test factor A (The number of coats) main effect by likelihood ratio test
- Hypothesis: H0 : αi = 0 vs Ha : at least one αi is not 0, i = 1, 2, 3
- Decision rule: Reject if G2 > χ2(1−α,p−q) = χ2(0.95,2) = 5.991465
- Conclusion: G2 = 29.1224, therefore reject H0 and conclude that there is significant Factor A
(the number of coats) effect with p ≈ 0.
R code
fit2.25d.Ha<- lmer(value ~ 1 + coatF + (1|batchF), data=X, REML=FALSE)
fit2.25d.H0<- lmer(value ~ 1 + (1|batchF), data=X, REML=FALSE)
## likelihood ratio test
testStat <- as.numeric(2*( logLik(fit2.25d.Ha) - logLik(fit2.25d.H0)))
testStat
pchisq(testStat, 2, lower=FALSE)
qchisq(1-0.05,2)
(e) Since the sampling distribution of the ML estimates is unknown, we use the bootstrap. The resulting 95% confidence interval for $\sigma_\beta^2$ is (1.471843, 4.461730).
R code
getCoef.boot <- function(x) {
select.id <- sample(1:nrow(x), nrow(x), replace=TRUE)
x.boot <- x[sort(select.id) ,]
lmer.boot <-lmer(value ~ 1 + coatF + (1|batchF) + (1|coatF:batchF), data=x.boot, REML=FALSE)
VarCorr(lmer.boot)$batch[1,1]
}
n <- 100
sigma2b.boot <- rep(NA, n)
for (i in 1:n) {
cat("Bootstrap iteration:", i, "\n")
sigma2b.boot[i] <- getCoef.boot(X)
}
left <- VarCorr(fit2.25)$batch[1,1] - abs(quantile(sigma2b.boot, 1-0.05/2) - VarCorr(fit2.25)$batch[1,1] )
right <- VarCorr(fit2.25)$batch[1,1] + abs(quantile(sigma2b.boot,0.05/2) - VarCorr(fit2.25)$batch[1,1] )
c(left, right)
3. [Methods Qualifying Exam, January 2010: use paper and pencil.]
A 3 x 4 factorial study was designed to investigate the effects of 3 fertilization methods (factor A)
and 4 seeding rates (factor B) on the yield of sugar beets. To this end they conducted an experiment
with the combinations of all levels of both factors. Each treatment combination was replicated in three
plots. The two tables below show the mean yield values ȳij· for each treatment combination, as well
as some summaries.
Mean yield ȳij· by level of A (rows) and level of B (columns):
Level of A    B = 1    B = 2    B = 3    B = 4    ȳi··
1             15.87    16.30    17.19    17.66    16.7550
2             17.41    17.63    18.12    18.51    17.9175
3             14.81    15.73    16.32    16.80    15.9150

Source              Mean Squares
Fertilizer (A)      12.15515
Seeding rate (B)    4.605967
Interaction (AB)    0.1554333
Error               0.065175
(a) i. State the ANOVA model, and the corresponding assumptions, that can be used to answer research questions in this experiment.
Answer:
The 2-way fixed effects ANOVA model is
$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
where
$\mu$ is the overall mean
$\alpha_i$ is the deviation of fertilizer $i$ from the overall mean, $\sum_i \alpha_i = 0$
$\beta_j$ is the deviation of seeding rate $j$ from the overall mean, $\sum_j \beta_j = 0$
$(\alpha\beta)_{ij}$ is the non-additive effect (i.e. the interaction), $\sum_i (\alpha\beta)_{ij} = \sum_j (\alpha\beta)_{ij} = 0$
$\varepsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$ is the random error
ii. Test whether the change in yield due to the use of different fertilizers depends on the seeding rate. State the null and the alternative hypotheses, and the conclusions. Use the significance level of 5%.
Answer:
$H_0: (\alpha\beta)_{ij} = 0$ for all $i$ and $j$. $H_a$: at least one $(\alpha\beta)_{ij} \neq 0$
$$F = \frac{MS(AB)}{MS(Error)} = \frac{0.1554}{0.065175} = 2.38435 < F(1 - 0.05;\ 6, 24) = 2.508189$$
We fail to reject $H_0$.
iii. Test whether there is a difference in yield between the fertilizers. State the null and the alternative hypotheses, and the conclusions. Use the significance level of 5%.
Answer:
$H_0: \alpha_i = 0$ for all $i$. $H_a$: at least one $\alpha_i \neq 0$
$$F = \frac{MS(A)}{MS(Error)} = \frac{12.15515}{0.065175} = 186.5 > F(1 - 0.05;\ 2, 24) = 3.4$$
We reject $H_0$.
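Although this problem is meant for paper and pencil, the two fixed-effects F tests in (a) can be verified with a short R sketch. The names are illustrative and the mean squares are copied from the summary table; the interaction F uses the unrounded MS(AB) = 0.1554333, so it differs from the hand value in the fourth decimal.

```r
# Mean squares from the summary table
MSA <- 12.15515; MSAB <- 0.1554333; MSE <- 0.065175
a <- 3; b <- 4; n <- 3

df_A <- a - 1
df_AB <- (a - 1) * (b - 1)
df_E <- a * b * (n - 1)

# In the fixed effects model every effect is tested against MSE
F_AB <- MSAB / MSE
F_A <- MSA / MSE

print(c(F_AB = F_AB, crit = qf(0.95, df_AB, df_E)))
print(c(F_A = F_A, crit = qf(0.95, df_A, df_E)))
```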
(b) Test whether the use of fertilizers 1 and 2 results in the same mean yield. State the null and the alternative hypotheses, and the conclusions. Use the significance level of 5%.
Answer:
We test $H_0: \mu_{2\cdot} - \mu_{1\cdot} = 0$ against $H_a: \mu_{2\cdot} - \mu_{1\cdot} \neq 0$
The contrast is estimated by
$\hat\mu_{2\cdot} - \hat\mu_{1\cdot} = \bar y_{2\cdot\cdot} - \bar y_{1\cdot\cdot} = 17.9175 - 16.755 = 1.1625$
and
$$s^2\{\hat\mu_{2\cdot} - \hat\mu_{1\cdot}\} = 2 \cdot \frac{MSE}{12} = \frac{0.065175}{6} = 0.0108625$$
The test statistic is
$$\frac{1.1625}{\sqrt{0.0108625}} = 11.15 > t(1 - 0.05/2;\ 24) = 2.064$$
Therefore we reject $H_0$.
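The contrast test in (b) can be sketched in R as below, under the fixed effects model where the error term is MSE with 24 degrees of freedom. The variable names are illustrative, and the two-sided critical value and p-value are shown because the alternative is two-sided.

```r
# Inputs from the summary tables (fixed effects model)
MSE <- 0.065175
b <- 4; n <- 3
df_E <- 24

est <- 17.9175 - 16.755        # ybar_2.. - ybar_1..
se <- sqrt(2 * MSE / (b * n))  # each fertilizer mean averages b * n = 12 plots
t_stat <- est / se

t_crit <- qt(0.975, df_E)
p_val <- 2 * pt(abs(t_stat), df_E, lower.tail = FALSE)
print(c(t_stat = t_stat, t_crit = t_crit, p_val = p_val))
```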
(c) Suppose now that instead of focusing on 4 levels of seeding rate, the researchers were interested in
variations in yield that can result from varying the seeding rate over the entire operational range.
Therefore, the four seeding rates above are not the fixed levels of interest, but a random sample
of all possible levels.
i. Modify the model above to reflect the different experimental setting, and state the assumptions.
Answer: The 2-way mixed effects ANOVA model is
$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
where
$\mu$ is the overall mean
$\alpha_i$ is the deviation of fertilizer $i$ from the overall mean, $\sum_i \alpha_i = 0$
$\beta_j$ is the deviation of seeding rate $j$ from the overall mean, $\beta_j \overset{iid}{\sim} N(0, \sigma_\beta^2)$
$(\alpha\beta)_{ij}$ is the joint non-additive effect (i.e. the interaction), $(\alpha\beta)_{ij} \overset{iid}{\sim} N(0, \sigma_{\alpha\beta}^2)$
$\varepsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$ is the random error
ii. Using the modified model, test whether the change in yield due to the use of different fertilizers depends on the seeding rate. State the null and the alternative hypotheses, and the conclusions. Use the significance level of 5%.
Answer:
$H_0: \sigma_{\alpha\beta}^2 = 0$. $H_a: \sigma_{\alpha\beta}^2 > 0$. Same test statistic and conclusion as in (a,ii).
iii. Using the modified model, test whether there is a difference in yield between the fertilizers. State the null and the alternative hypotheses, and the conclusions. Use the significance level of 5%.
Answer:
Same $H_0$ and $H_a$ as in (a,iii). Test statistic
$$F = \frac{MS(A)}{MS(AB)} = \frac{12.15515}{0.1554} = 78.2 > F(1 - 0.05;\ 2, 6) = 5.143253$$
We reject $H_0$.
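iv. Test whether the use of fertilizers 1 and 2 results in the same mean yield, given this new model specification. State the null and the alternative hypotheses, and the conclusions. Use the significance level of 5%. Compare with the results of (b).
Answer:
- Hypothesis: $H_0: L = \mu_{1\cdot} - \mu_{2\cdot} = 0$, $H_a: L \neq 0$
$\hat L = \bar Y_{1\cdot\cdot} - \bar Y_{2\cdot\cdot} = -1.1625$
$$s^2\{\hat L\} = \frac{n\hat\sigma_{\alpha\beta}^2 + \hat\sigma^2}{nb} \sum c_i^2 = \frac{MSAB}{12} \times 2 = \frac{0.155433}{6} = 0.0259$$
$$\left|\frac{\hat L}{s\{\hat L\}}\right| = |-7.223| > t(0.975; 6) = 2.447$$
We reject $H_0$. Compared with (b), the standard error is now based on MSAB with 6 degrees of freedom instead of MSE with 24, so the test statistic is smaller (7.22 versus 11.15) and the critical value is larger, but the conclusion is unchanged.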
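For the mixed model, the same contrast is tested against the interaction mean square. A minimal R sketch, with illustrative names and hand-entered values:

```r
# Mixed model: seeding rate is random, so MSAB (6 df) is the error term
MSAB <- 0.1554333
b <- 4; n <- 3
df_AB <- 6

est <- 16.755 - 17.9175         # ybar_1.. - ybar_2..
se <- sqrt(2 * MSAB / (b * n))
t_stat <- est / se

t_crit <- qt(0.975, df_AB)
p_val <- 2 * pt(abs(t_stat), df_AB, lower.tail = FALSE)
print(c(t_stat = t_stat, t_crit = t_crit, p_val = p_val))
```

The estimate is the same as in the fixed effects analysis; only the standard error and the reference distribution change.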
4. [Methods Qualifying Exam, January 2005: use paper and pencil.]
The school superintendent is concerned about the development of technology skills in middle school.
Since there are 3 middle schools in his district, all of which go about this instruction differently, he
decided to assess if there were any differences across schools. He first compiled a long list of “tech
skills” and randomly selected 5 to be used in his study. He then randomly selected 20 students from
each school and assigned each to one of the five tasks so that there were 4 students per task per school.
Each student then performed the skill and a score between 0 and 100 was assigned.
(a) If a two-way ANOVA is to be used for the analysis, should it be treated as a fixed effects, random
effects, or mixed effects model? Explain.
Answer:
School should be treated as a fixed effect since all three middle schools in the district are included. Skill should be treated as a random effect since the five skills were randomly selected from a long list. The model is therefore a mixed effects model.
(b) Complete the following ANOVA table and determine which effects are significant at the α = .05
level. State your conclusions, making sure to estimate all variances and describing any additional
mean comparisons you'd like to perform.
Answer:
To obtain the distribution of the F statistics, we consider the unrestricted model
$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
where
$\mu$ is the overall mean
$\alpha_i$ is the deviation of school $i$ from the overall mean, $i = 1, 2, 3$, $\sum_i \alpha_i = 0$
$\beta_j$ is the deviation of skill $j$ from the overall mean, $j = 1, \ldots, 5$, $\beta_j \overset{iid}{\sim} N(0, \sigma_\beta^2)$
$(\alpha\beta)_{ij}$ is the non-additive effect (i.e. the interaction), $(\alpha\beta)_{ij} \sim N(0, \sigma_{\alpha\beta}^2)$
$\varepsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$ is the random error, $k = 1, 2, 3, 4$
$a = 3$, $b = 5$, $n = 4$

Source            DF                SS      MS                        EMS                               F
School (fixed)    a-1 = 2           220.0   SSA/dfA = 220/2 = 110     σ² + bn Σαi²/(a-1) + nσ²αβ        MSA/MSAB = 110/22 = 5
Skill (random)    b-1 = 4           96.0    SSB/dfB = 96/4 = 24       σ² + anσ²β + nσ²αβ                MSB/MSAB = 24/22 = 1.09
School × Skill    (a-1)(b-1) = 8    176.0   SSAB/dfAB = 176/8 = 22    σ² + nσ²αβ                        MSAB/MSE = 22/10 = 2.2
Error             ab(n-1) = 45      450.0   SSE/dfE = 450/45 = 10     σ²
i. School effect:
- Hypothesis: $H_0: \alpha_1 = \alpha_2 = \alpha_3 = 0$
- Decision rule: Reject if $F^* = MSA/MSAB > F_{(0.95,2,8)} = 4.45897$
- Conclusion: $F^* = MSA/MSAB = 110/22 = 5 > 4.45897$. Reject $H_0$ and conclude that there is a school effect.
ii. Skill effect:
- Hypothesis: $H_0: \sigma_\beta^2 = 0$
- Decision rule: Reject if $F^* = MSB/MSAB > F_{(0.95,4,8)} = 3.837853$
- Conclusion: $F^* = MSB/MSAB = 24/22 = 1.09 < 3.837853$. Do not reject $H_0$ and conclude that there is no significant skill effect.
iii. Interaction effect:
- Hypothesis: $H_0: \sigma_{\alpha\beta}^2 = 0$
- Decision rule: Reject if $F^* = MSAB/MSE > F_{(0.95,8,45)} = 2.152133$
- Conclusion: $F^* = MSAB/MSE = 22/10 = 2.2 > 2.152133$. Reject $H_0$ and conclude that there is a significant school by skill interaction effect.
The variance estimates are $\hat\sigma^2 = MSE = 10$, $\hat\sigma_{\alpha\beta}^2 = \frac{MSAB - MSE}{n} = \frac{22 - 10}{4} = 3$, and $\hat\sigma_\beta^2 = \frac{MSB - MSAB}{an} = \frac{24 - 22}{12} = 0.166667$.
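The table entries and the variance component estimates in (b) follow mechanically from the sums of squares, as this R sketch shows. The vector names are illustrative; the F ratios use MSAB as the denominator for both main effects, following the unrestricted model.

```r
a <- 3; b <- 5; n <- 4

# Sums of squares given in the problem
SS <- c(A = 220, B = 96, AB = 176, E = 450)
df <- c(A = a - 1, B = b - 1, AB = (a - 1) * (b - 1), E = a * b * (n - 1))
MS <- SS / df

# F ratios under the unrestricted mixed model
F_stat <- c(A = MS[["A"]] / MS[["AB"]],
            B = MS[["B"]] / MS[["AB"]],
            AB = MS[["AB"]] / MS[["E"]])
F_crit <- c(A = qf(0.95, df[["A"]], df[["AB"]]),
            B = qf(0.95, df[["B"]], df[["AB"]]),
            AB = qf(0.95, df[["AB"]], df[["E"]]))

# Method-of-moments (ANOVA) estimates of the variance components
sigma2 <- MS[["E"]]
sigma2_ab <- (MS[["AB"]] - MS[["E"]]) / n
sigma2_b <- (MS[["B"]] - MS[["AB"]]) / (a * n)

print(rbind(F_stat, F_crit))
print(c(sigma2 = sigma2, sigma2_ab = sigma2_ab, sigma2_b = sigma2_b))
```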
(c) If the grand skill level of the middle schools (average over the three schools) is of interest, describe
how one would construct a 95% confidence interval.
Answer:
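Let $\mu$ be the overall mean. The overall average is
$$\hat\mu = \frac{1}{60}\sum_{i=1}^{3}\sum_{j=1}^{5}\sum_{k=1}^{4} y_{ijk} = \mu + \frac{1}{5}\sum_{j=1}^{5}\beta_j + \frac{1}{15}\sum_{i=1}^{3}\sum_{j=1}^{5}(\alpha\beta)_{ij} + \frac{1}{60}\sum_{i=1}^{3}\sum_{j=1}^{5}\sum_{k=1}^{4}\varepsilon_{ijk}$$
indicating that
$$\hat\mu \sim N\left(\mu,\ \frac{1}{5}\sigma_\beta^2 + \frac{1}{15}\sigma_{\alpha\beta}^2 + \frac{1}{60}\sigma^2\right)$$
Plugging in the estimates from (b), $s^2\{\hat\mu\} = \frac{1}{5} \cdot \frac{1}{6} + \frac{1}{15} \cdot 3 + \frac{1}{60} \cdot 10 = 0.4$, which equals $MSB/(abn) = 24/60$. Because this variance estimate is a multiple of MSB alone, a 95% confidence interval for $\mu$ is $\hat\mu \pm t(0.975; 4)\, s\{\hat\mu\}$, based on the 4 degrees of freedom of MSB.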
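The variance of the grand mean in (c) can be computed either from the variance components or directly from MSB, and the two routes should agree. A minimal sketch with illustrative names; the grand mean itself is not given in the problem, so only the half-width of the interval is computed.

```r
a <- 3; b <- 5; n <- 4
MSB <- 24

# Variance component estimates from part (b)
sigma2 <- 10
sigma2_ab <- 3
sigma2_b <- 1 / 6

# Route 1: plug the components into Var(mu_hat)
s2_components <- sigma2_b / b + sigma2_ab / (a * b) + sigma2 / (a * b * n)

# Route 2: Var(mu_hat) = E(MSB) / (abn), estimated by MSB / (abn)
s2_msb <- MSB / (a * b * n)

# Half-width of the 95% interval, using the b - 1 = 4 df of MSB
half_width <- qt(0.975, b - 1) * sqrt(s2_msb)
print(c(s2_components = s2_components, s2_msb = s2_msb, half_width = half_width))
```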
5. [Methods Qualifying Exam, January 2009: use paper and pencil.]
The following dataset concerns an experiment where six pullets were placed into each of 12 pens. Four
blocks were formed from groups of three pens based on location. Three treatments were applied. The
number of eggs produced was recorded.
treat block eggs
1 O 1 330
2 O 2 288
3 O 3 295
4 O 4 313
5 E 1 372
6 E 2 340
7 E 3 343
8 E 4 341
9 F 1 359
10 F 2 337
11 F 3 373
12 F 4 302
An analysis on this dataset using lme in nlme package provides the output (Note: lme is an older
implementation of mixed effects models in R, and has a slightly different model syntax. If the syntax
presents challenges, you can refit the model with lmer using the data above.)
> remlme1 <- lme(eggs~treat,random=~1|block,data=eggprod,method="REML")
> summary(remlme1)
Linear mixed-effects model fit by REML
Data: eggprod
AIC BIC logLik
95.41439 96.40051 -42.70719
Random effects:
Formula: ~1 | block
(Intercept) Residual
StdDev: 11.39932 19.67020
Fixed effects: eggs ~ treat
Value Std.Error DF t-value p-value
(Intercept) 349.00 11.36729 6 30.702129 0.0000
treatF -6.25 13.90893 6 -0.449352 0.6690
treatO -42.50 13.90893 6 -3.055591 0.0224
(a) Describe the model used in the above analysis (in terms of Y = Xβ + Zγ + ε, where X includes
all covariates with fixed effects and Z includes all covariates with random effects), and report the
estimates of all parameters describing the model.
Answer:
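$Y = X\beta + Z\gamma + \varepsilon$, where $\gamma \sim N(0, \sigma^2 D)$ with $\sigma^2 D = \sigma_{block}^2 I_4$, and $\varepsilon \sim N(0, \sigma^2 I)$
$$Y = \begin{pmatrix} Y_{O1} \\ Y_{O2} \\ Y_{O3} \\ Y_{O4} \\ Y_{E1} \\ Y_{E2} \\ Y_{E3} \\ Y_{E4} \\ Y_{F1} \\ Y_{F2} \\ Y_{F3} \\ Y_{F4} \end{pmatrix},\quad
X = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \end{pmatrix},\quad
\hat\beta = \begin{pmatrix} \hat\beta_E \\ \hat\beta_F \\ \hat\beta_O \end{pmatrix} = \begin{pmatrix} 349.00 \\ -6.25 \\ -42.50 \end{pmatrix},\quad
Z = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},\quad
\hat\gamma = \begin{pmatrix} \hat\gamma_1 \\ \hat\gamma_2 \\ \hat\gamma_3 \\ \hat\gamma_4 \end{pmatrix}$$
$\hat\sigma = 19.67020$, $\hat\sigma_{block} = 11.39932$
$\beta_E$: reference treatment, treatE effect
$\beta_F$: the difference between treatF and treatE
$\beta_O$: the difference between treatO and treatE
$\hat\gamma$: random block effects vector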
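The design matrices in (a) can be generated in R from the data listed in the problem, which is a convenient way to check the layout. This is a sketch using base R only; it assumes treatment E is the reference level (as in the `lme` output), and it uses ordinary least squares for the fixed effects, which matches the mixed model estimates here because the design is balanced.

```r
# Data from the problem statement, in the order O1..O4, E1..E4, F1..F4
eggprod <- data.frame(
  treat = factor(rep(c("O", "E", "F"), each = 4), levels = c("E", "F", "O")),
  block = factor(rep(1:4, times = 3)),
  eggs  = c(330, 288, 295, 313, 372, 340, 343, 341, 359, 337, 373, 302)
)

# Fixed-effects design: intercept (treatE), treatF, treatO
X <- model.matrix(~ treat, data = eggprod)

# Random-effects design: one indicator column per block
Z <- model.matrix(~ 0 + block, data = eggprod)

# Least squares estimates of the fixed effects
beta_ols <- drop(solve(crossprod(X), crossprod(X, eggprod$eggs)))

print(X)
print(Z)
print(beta_ols)
```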
(b) Give reason to use REML instead of ML to fit the model.
Answer:
REML produces less biased estimates of the variance components than ML, especially for factors with a small number of levels, and in the balanced case it coincides with the ANOVA estimates.
(c) A new pen is added to the first block and treated with treatment F . Please predict the number
of eggs for this pen along with an estimate of the variability in this prediction. What if this new
pen is placed in a new location (i.e., a fifth block)?
> fixed.effects(remlme1)
(Intercept) treatF treatO
349.00 -6.25 -42.50
> random.effects(remlme1)
  (Intercept)
1   10.497605
2   -5.562476
3    2.132979
4   -7.068108
Answer:
$\hat Y_{block1} = 349 - 6.25 + 10.4976 = 353.2476$, $Var(\hat Y_{block1}) = \hat\sigma^2 = 19.6702^2 = 386.9168$
$\hat Y_{block5} = 349 - 6.25 = 342.75$, $Var(\hat Y_{block5}) = \hat\sigma^2 + \hat\sigma_{block}^2 = 19.6702^2 + 11.39932^2 = 516.8613$
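The two predictions in (c) can be reproduced from the printed estimates. This sketch enters the values by hand instead of calling `predict` on `remlme1`, and the variances follow the simple plug-in approach used in the answer, which ignores the uncertainty in the estimated fixed effects and predicted block effect.

```r
# Estimates copied from the lme output
fixed <- c(intercept = 349.00, treatF = -6.25, treatO = -42.50)
blup_block1 <- 10.497605
sd_resid <- 19.67020
sd_block <- 11.39932

# New pen in block 1: use the predicted block effect
pred_block1 <- fixed[["intercept"]] + fixed[["treatF"]] + blup_block1
var_block1 <- sd_resid^2

# New pen in a new (fifth) block: block effect predicted as 0,
# and its variance is added to the prediction variance
pred_block5 <- fixed[["intercept"]] + fixed[["treatF"]]
var_block5 <- sd_resid^2 + sd_block^2

print(c(pred_block1 = pred_block1, var_block1 = var_block1))
print(c(pred_block5 = pred_block5, var_block5 = var_block5))
```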
(d) We also tried other methods and other models on this dataset, and got the log-likelihood function values:
> remlme2 <- lme(eggs~1,random=~1|block,data=eggprod,method="REML")
> logLik(remlme2)
log Lik. -53.65615
> mlme1 <- lme(eggs~treat,random=~1|block,data=eggprod,method="ML")
> logLik(mlme1)
log Lik. -52.44424
> mlme2 <- lme(eggs~1,random=~1|block,data=eggprod,method="ML")
> logLik(mlme2)
log Lik. -56.65651
Please determine the significance of the treatment using a likelihood ratio test (assuming χ2
distribution). Should you improve the accuracy of this p-value? If yes, please state your approach.
Answer:
$G^2 = -2\{\log L(R) - \log L(F)\} = -2 \times \{-56.65651 + 52.44424\} = 8.42454 > \chi^2(0.95; 2) = 5.991465$.
Therefore, there is a significant treatment effect.
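The likelihood ratio statistic and its chi-square p-value can be computed from the two ML log-likelihoods as sketched below. The ML fits are the ones to compare here, because REML log-likelihoods are not comparable between models with different fixed effects. Variable names are illustrative.

```r
# ML log-likelihoods from the output above
ll_full <- -52.44424     # eggs ~ treat
ll_reduced <- -56.65651  # eggs ~ 1

G2 <- -2 * (ll_reduced - ll_full)
crit <- qchisq(0.95, df = 2)
p_val <- pchisq(G2, df = 2, lower.tail = FALSE)
print(c(G2 = G2, crit = crit, p_val = p_val))
```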
(e) Please propose an approach to testing whether there is a significant difference between the blocks,
and state how you would like to calculate your test statistics and the p-value.
Answer:
- Hypothesis: $H_0: \sigma_{block}^2 = 0$, $H_a: \sigma_{block}^2 > 0$
- LR test is not appropriate because the standard derivation of the asymptotic χ2 distribution for
the likelihood ratio statistic depends on the null hypothesis lying in the interior of the parameter
space and this assumption is broken when we test if a variance is zero. The null distribution in
these circumstances is unknown in general and we must resort to numerical methods if we wish
for precise testing. If you do use the χ2 distribution with the usual degrees of freedom, then the
test will tend to be conservative, p-values will tend to be larger. An alternative is to use bootstrap
to estimate the sampling distribution and p-value.
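One way to carry out the bootstrap suggested in (e) is a parametric bootstrap of the likelihood ratio statistic: simulate responses from the null model without a block effect, refit both models, and compare the observed statistic with the simulated ones. The sketch below is one possible implementation, not the only one; it assumes the `nlme` package is available, uses ML fits, an arbitrary seed, and a small number of simulations (`nsim = 200`) to keep the run time short.

```r
library(nlme)
set.seed(1)

eggprod <- data.frame(
  treat = factor(rep(c("O", "E", "F"), each = 4), levels = c("E", "F", "O")),
  block = factor(rep(1:4, times = 3)),
  eggs  = c(330, 288, 295, 313, 372, 340, 343, 341, 359, 337, 373, 302)
)

# Likelihood ratio statistic for H0: no block variance, both fits by ML
lrt_stat <- function(dat) {
  h0 <- lm(eggs ~ treat, data = dat)
  ha <- lme(eggs ~ treat, random = ~ 1 | block, data = dat, method = "ML")
  2 * (as.numeric(logLik(ha)) - as.numeric(logLik(h0)))
}

fit_h0 <- lm(eggs ~ treat, data = eggprod)
lrt_obs <- lrt_stat(eggprod)

# Simulate from the fitted null model and recompute the statistic
nsim <- 200
lrt_sim <- rep(NA_real_, nsim)
for (s in seq_len(nsim)) {
  sim <- eggprod
  sim$eggs <- simulate(fit_h0)[[1]]
  # A failed fit is recorded as NA and dropped from the p-value
  lrt_sim[s] <- tryCatch(lrt_stat(sim), error = function(e) NA_real_)
}

p_boot <- mean(lrt_sim >= lrt_obs, na.rm = TRUE)
print(c(lrt_obs = lrt_obs, p_boot = p_boot))
```

A larger `nsim` gives a more stable p-value at the cost of run time.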