Analysis of Variance
“Analysis of variance is not a mathematical theorem, but rather a convenient method of arranging the arithmetic.”
– Ronald Fisher
Introduction to ANOVA
• Analysis of Variance (ANOVA) is a hypothesis testing technique used for comparing the mean values of more than two groups simultaneously.
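The one-way ANOVA model is
$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$
where $Y_{ij}$ is the value of the outcome variable for the jth observation at the ith factor level, $\mu$ is the overall mean value of all observations, $\alpha_i$ is the effect of the ith factor level, and $\varepsilon_{ij}$ is the error, assumed to be normally distributed with mean 0 and standard deviation $\sigma$.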
Hypothesis
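Suppose three t-tests, A, B and C, are each carried out at significance level α = 0.05 (for example, the pairwise comparisons among three group means). Let
P(A) = P(Retain H0 in test A | H0 in test A is true)
P(B) = P(Retain H0 in test B | H0 in test B is true)
P(C) = P(Retain H0 in test C | H0 in test C is true)
Note that P(A) = P(B) = P(C) = 1 - α = 1 - 0.05 = 0.95.
Assuming the three tests are independent, the conditional probability of simultaneously retaining all 3 null hypotheses when they are true is P(A ∩ B ∩ C) = P(A) × P(B) × P(C) = 0.95³ ≈ 0.8574.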
Multiple t-Tests for Comparing Several Means
For the case discussed above, if we decide on the null hypothesis based on 3 individual tests, then the probability of rejecting at least one true null hypothesis (the overall Type I error) is 1 - 0.857375 ≈ 0.1426, nearly three times the nominal α = 0.05. That is, when more than 2 groups are involved, testing the population parameter values simultaneously using multiple t-tests is inappropriate, since the Type I and Type II errors will be misstated. For this reason, we use analysis of variance (ANOVA) whenever we need to compare 3 or more groups for population parameter values simultaneously.
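The inflation of the Type I error can be checked with a few lines of Python. This is a minimal sketch, assuming the m tests are independent (the same assumption used to multiply P(A), P(B) and P(C)); `familywise_error` is a hypothetical helper name, and k(k - 1)/2 is simply the number of pairwise comparisons among k groups.

```python
def familywise_error(alpha: float, m: int) -> float:
    """Probability of at least one Type I error across m independent tests,
    each run at significance level alpha: 1 - (1 - alpha) ** m."""
    return 1 - (1 - alpha) ** m


# Three separate t-tests at alpha = 0.05, as in the example above
print(round(familywise_error(0.05, 3), 4))  # 0.1426

# The overall error keeps growing with the number of groups k,
# because k groups need k * (k - 1) / 2 pairwise t-tests
for k in (3, 4, 5, 6):
    m = k * (k - 1) // 2
    print(k, m, round(familywise_error(0.05, m), 4))
```

With six groups the chance of at least one false rejection already exceeds one half, which is why a single overall test is preferred.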
One-Way Analysis of Variance (ANOVA)
The hypotheses for one-way ANOVA are:
• H0: $\mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
• HA: Not all $\mu_i$ values are equal
Note that the alternative hypothesis, ‘not all $\mu_i$ values are equal’, implies that some of them could be equal. The null hypothesis is equivalent to stating that the factor effects $\alpha_1, \alpha_2, \ldots, \alpha_k$ defined in the model $Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$ are all zero.
Figure: Comparing three means ($\mu_1$, $\mu_2$ and $\mu_3$).
If the mean values of the different groups are not equal, then the variation of cases within each group will be much smaller than the variation between groups.
We are interested in analyzing the effect of a single factor with k levels; thus we have k groups.
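Let
k = number of groups (or samples)
$n_i$ = number of observations in group i (i = 1, 2, …, k)
$n = \sum_{i=1}^{k} n_i$ = total number of observations
$Y_{ij}$ = observation j in group i
$$\mu_i = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij} \quad \text{(mean of group } i\text{)}$$
$$\mu = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} Y_{ij} \quad \text{(overall mean)}$$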
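The two means defined above can be sketched directly. This is illustrative only: the data are made up, with unequal group sizes, and `group_means` and `overall_mean` are hypothetical helper names.

```python
from typing import List


def group_means(groups: List[List[float]]) -> List[float]:
    """mu_i = (1 / n_i) * sum over j of Y_ij, for each group i."""
    return [sum(g) / len(g) for g in groups]


def overall_mean(groups: List[List[float]]) -> float:
    """mu = (1 / n) * sum over i and j of Y_ij, with n = n_1 + ... + n_k."""
    n = sum(len(g) for g in groups)
    return sum(sum(g) for g in groups) / n


# Hypothetical data: k = 3 groups with n_1 = 2, n_2 = 3, n_3 = 1
groups = [[4.0, 6.0], [5.0, 7.0, 9.0], [10.0]]
print(group_means(groups))   # [5.0, 7.0, 10.0]
print(overall_mean(groups))  # 41/6, about 6.833
```

Because μ averages over all n observations, it equals the simple average of the group means (here 22/3 ≈ 7.333) only when all groups have the same size.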
Total Variation
To arrive at the test statistic, we calculate the following measures of variation within and between groups:
• Sum of Squares of Total Variation (SST): the total variation is the sum of squared deviations of all values of the response variable ($Y_{ij}$) from the overall mean ($\mu$), given by
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij}-\mu)^2$$
The degrees of freedom for SST are (n - 1), since only the value of $\mu$ is estimated from the n observations and thus only one degree of freedom is lost. The Mean Square Total (MST) variation is given by
$$MST = \frac{SST}{n-1}$$
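A short sketch of SST and MST on hypothetical data. It also illustrates the degrees-of-freedom argument: because only μ is estimated, MST divides by n - 1 and therefore coincides with the ordinary sample variance of the pooled observations, as computed by Python's `statistics.variance`. The helper name `sst_and_mst` is hypothetical.

```python
import statistics
from typing import List, Tuple


def sst_and_mst(groups: List[List[float]]) -> Tuple[float, float]:
    """SST = sum over i, j of (Y_ij - mu)^2, and MST = SST / (n - 1)."""
    values = [y for g in groups for y in g]   # pool all n observations
    n = len(values)
    mu = sum(values) / n                      # the single estimated quantity
    sst = sum((y - mu) ** 2 for y in values)
    return sst, sst / (n - 1)


groups = [[4.0, 6.0], [5.0, 7.0, 9.0], [10.0]]  # hypothetical data
sst, mst = sst_and_mst(groups)

# MST equals the usual sample variance of the pooled data (also divides by n - 1)
pooled = [y for g in groups for y in g]
print(sst, mst, statistics.variance(pooled))
```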
Variation between groups
• Sum of Squares of Between-Group Variation (SSB): the sum of squared deviations of each group mean ($\mu_i$) from the overall mean ($\mu$), weighted by the group size $n_i$, given by
$$SSB = \sum_{i=1}^{k} n_i(\mu_i-\mu)^2$$
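The total variation decomposes into a between-group part and a within-group part:
$$\sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij}-\mu)^2 = \sum_{i=1}^{k} n_i(\mu_i-\mu)^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij}-\mu_i)^2$$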
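That is,
SST = SSB + SSW
where $SSW = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij}-\mu_i)^2$ is the sum of squares of within-group variation.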
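The identity SST = SSB + SSW can be checked numerically. A minimal sketch on made-up data with unequal group sizes; `sum_of_squares` is a hypothetical helper name.

```python
from typing import List, Tuple


def sum_of_squares(groups: List[List[float]]) -> Tuple[float, float, float]:
    """Return (SST, SSB, SSW) for a one-way layout."""
    values = [y for g in groups for y in g]
    mu = sum(values) / len(values)                  # overall mean
    means = [sum(g) / len(g) for g in groups]       # group means mu_i
    sst = sum((y - mu) ** 2 for y in values)
    ssb = sum(len(g) * (m - mu) ** 2 for g, m in zip(groups, means))
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return sst, ssb, ssw


groups = [[4.0, 6.0], [5.0, 7.0, 9.0], [10.0]]  # hypothetical data
sst, ssb, ssw = sum_of_squares(groups)
assert abs(sst - (ssb + ssw)) < 1e-9  # SST = SSB + SSW
print(sst, ssb, ssw)
```

Here SSW = 10 (from squared deviations 2, 8 and 0 within the three groups) and SSB ≈ 16.833, which add up to SST ≈ 26.833.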
Cochran’s Theorem
According to Cochran’s theorem (Kutner et al., 2013, page 70):
‘If Y1, Y2, …, Yn are drawn from a normal distribution with mean $\mu$ and standard deviation $\sigma$, and the sum of squares of total variation (SST) is decomposed into k sums of squares ($SS_r$) with degrees of freedom $df_r$, then the ratios $SS_r/\sigma^2$ are independent $\chi^2$ variables with $df_r$ degrees of freedom if $\sum_{r=1}^{k} df_r = n-1$.’
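Applying the theorem to SST = SSB + SSW, whose degrees of freedom add up as (k - 1) + (n - k) = n - 1, the ratio of the two mean squares follows an F-distribution with (k - 1, n - k) degrees of freedom when H0 is true. The test statistic is therefore
$$F = \frac{SSB/(k-1)}{SSW/(n-k)} = \frac{MSB}{MSW}$$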
Note that the F-test is one-tailed (right-tailed), since we are interested in whether the variation between groups is greater than the variation within groups.
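To see why the test is right-tailed and how it holds the Type I error at α, one can simulate data for which H0 is true and count how often F exceeds the critical value. This is a sketch under stated assumptions: three groups of 30 drawn from one normal population (mean 50 and standard deviation 5 are arbitrary choices), the critical value 3.101 for F(2, 87) at α = 0.05 taken from the Excel FINV result quoted later in this section, and a hypothetical helper `f_statistic`.

```python
import random


def f_statistic(groups):
    """One-way ANOVA statistic F = [SSB/(k-1)] / [SSW/(n-k)] = MSB / MSW."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    mu = sum(sum(g) for g in groups) / n                # overall mean
    means = [sum(g) / len(g) for g in groups]           # group means
    ssb = sum(len(g) * (m - mu) ** 2 for g, m in zip(groups, means))
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ssb / (k - 1)) / (ssw / (n - k))


F_CRIT = 3.101  # F(2, 87) critical value at alpha = 0.05 (FINV(0.05, 2, 87))

random.seed(42)
trials = 4000
rejections = 0
for _ in range(trials):
    # H0 is true: all three groups of 30 come from the same normal population
    groups = [[random.gauss(50.0, 5.0) for _ in range(30)] for _ in range(3)]
    if f_statistic(groups) > F_CRIT:  # reject only in the right tail
        rejections += 1

rejection_rate = rejections / trials
print(f"Empirical Type I error: {rejection_rate:.3f}")  # expected to be near 0.05
```

Only large F values count as evidence against H0: a small F means the group means are, if anything, closer together than the within-group noise would suggest.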
ANOVA Example
Ms Rachael Khanna, the brand manager of ENZO detergent powder at the ‘one stop’ retail, was interested in understanding whether price discounts have any impact on the sales quantity of ENZO. To test this, price discounts of 0% (no discount), 10% and 20% were given on randomly selected days. The quantity (in kilograms) of ENZO sold per day under the different discount levels is shown in the following table. Conduct a one-way ANOVA to check whether the discount had any significant impact on the sales quantity at α = 0.05.
Sales of ENZO at different price discounts
No Discount (0% discount)
39 32 25 25 37 28 26 26 40 29
37 34 28 36 38 38 34 31 39 36
34 25 33 26 33 26 26 27 32 40
10% Discount
34 41 45 39 38 33 35 41 47 34
47 44 46 38 42 33 37 45 38 44
38 35 34 34 37 39 34 34 36 41
20% Discount
42 43 44 46 41 52 43 42 50 41
41 47 55 55 47 48 41 42 45 48
40 50 52 43 47 55 49 46 55 42
Solution
In this case, the number of groups k = 3; n1 = n2 = n3 = 30; $\mu_1$ = 32, $\mu_2$ = 38.77, $\mu_3$ = 46.4; and $\mu$ = 39.05.
The sum of squares of between-groups variation (SSB) is given by
$$SSB = \sum_{i=1}^{k} n_i(\mu_i-\mu)^2 = 30\left[(32-39.05)^2 + (38.77-39.05)^2 + (46.4-39.05)^2\right] \approx 3114.156$$
So
$$MSB = \frac{SSB}{k-1} = \frac{3114.156}{2} = 1557.078$$
Solution Continued…
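The sum of squares of within-group variation is given by
$$SSW = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij}-\mu_i)^2 = \sum_{j=1}^{30}(Y_{1j}-32)^2 + \sum_{j=1}^{30}(Y_{2j}-38.77)^2 + \sum_{j=1}^{30}(Y_{3j}-46.4)^2 \approx 2056.567$$
$$MSW = \frac{SSW}{n-k} = \frac{2056.567}{90-3} \approx 23.64$$
Hence $F = MSB/MSW = 1557.078/23.64 \approx 65.87$.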
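ANOVA
Source of Variation   SS           df   MS         F          P-value    F crit
Between Groups        3114.15556   2    1557.078   65.86986   3.82E-18   3.101296
Within Groups         2056.56667   87   23.6387
Total                 5170.72222   89
Since F = 65.87 is far greater than the critical value 3.101 (equivalently, the p-value 3.82E-18 is below α = 0.05), we reject the null hypothesis and conclude that the price discount has a significant impact on the sales quantity of ENZO.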
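As a cross-check, the sketch below recomputes the one-way ANOVA for ENZO directly from the sales table. It assumes k = 3 groups, so the numerator degrees of freedom is 2; in that special case the right tail of the F(2, d2) distribution has the closed form (1 + 2F/d2)^(-d2/2), which is used here in place of Excel's FDIST so that only the Python standard library is needed. Variable names are illustrative.

```python
# Daily sales (kg) of ENZO, copied from the table above
no_discount = [39, 32, 25, 25, 37, 28, 26, 26, 40, 29,
               37, 34, 28, 36, 38, 38, 34, 31, 39, 36,
               34, 25, 33, 26, 33, 26, 26, 27, 32, 40]
discount_10 = [34, 41, 45, 39, 38, 33, 35, 41, 47, 34,
               47, 44, 46, 38, 42, 33, 37, 45, 38, 44,
               38, 35, 34, 34, 37, 39, 34, 34, 36, 41]
discount_20 = [42, 43, 44, 46, 41, 52, 43, 42, 50, 41,
               41, 47, 55, 55, 47, 48, 41, 42, 45, 48,
               40, 50, 52, 43, 47, 55, 49, 46, 55, 42]

groups = [no_discount, discount_10, discount_20]
k = len(groups)
n = sum(len(g) for g in groups)
mu = sum(sum(g) for g in groups) / n              # overall mean
means = [sum(g) / len(g) for g in groups]         # group means

ssb = sum(len(g) * (m - mu) ** 2 for g, m in zip(groups, means))
ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
msb, msw = ssb / (k - 1), ssw / (n - k)
f = msb / msw

# Valid only because k - 1 = 2: P(F(2, d2) > f) = (1 + 2f/d2) ** (-d2/2)
d2 = n - k
p_value = (1 + 2 * f / d2) ** (-d2 / 2)

print(f"SSB = {ssb:.4f}, SSW = {ssw:.4f}, F = {f:.4f}, p = {p_value:.3g}")
```

It should reproduce SSB ≈ 3114.156, SSW ≈ 2056.567, F ≈ 65.87 and p ≈ 3.82E-18 from the ANOVA table.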
Example
Share Raja Khan (SRK) is a top stockbroker who believes that the average annual stock return depends on the industrial sector. To validate his belief, SRK collected the annual returns of shares from three industrial sectors: consumer goods, services and industrial goods. The annual returns of shares in 2015-2016 for the different sectors are shown in the following table.
Annual return of stocks under different industrial sector
Annual return on 30 consumer goods stocks
6.32% 14.73% 11.95% 12.36% 10.28% 3.81% 10.15% 11.06% 6.29% 5.15%
8.44% 14.28% 8.89% 5.98% 6.96% 11.62% 5.22% 5.34% 5.93% 7.10%
10.91% 8.20% 10.19% 9.04% 8.61% 9.39% 2.63% 2.77% 4.76% 9.60%
11.48% 9.71% 11.19% 8.21% 1.64% 1.45% 10.12% 13.85% -10.27% 5.26%
12.05% 4.47% 8.71% 5.59% 10.02% 7.65% 10.03% 7.87% 6.59% 13.60%
6.74% 7.11% 5.69% 2.48% 5.42% 8.00% 2.55% 8.34% 4.99% 3.39%
8.73% 13.85% 5.29% 9.06% 2.84% 5.82% 7.66% 4.12% 9.10% 8.76%
10.77% 1.48% 4.71% 10.66% 0.44% 2.94% 6.55% 2.84% 3.90% 7.28%
Solution
In this case, the number of groups k = 3; n1 = n2 = n3 = 30; $\mu_1$ = 0.082, $\mu_2$ = 0.079, $\mu_3$ = 0.0605; and $\mu$ = 0.0743.
The sum of squares of between-groups (SSB) variation is given by
$$SSB = \sum_{i=1}^{k} n_i(\mu_i-\mu)^2 = 30\left[(0.082-0.0743)^2 + (0.079-0.0743)^2 + (0.0605-0.0743)^2\right] \approx 0.0087$$
Therefore
$$MSB = \frac{SSB}{k-1} = \frac{0.0087}{2} = 0.00435$$
Solution Continued…
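The sum of squares of within-group variation is given by
$$SSW = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij}-\mu_i)^2 = \sum_{j=1}^{30}(Y_{1j}-0.082)^2 + \sum_{j=1}^{30}(Y_{2j}-0.079)^2 + \sum_{j=1}^{30}(Y_{3j}-0.0605)^2 \approx 0.1463$$
So
$$MSW = \frac{SSW}{n-k} = \frac{0.1463}{90-3} \approx 0.00168$$
and the test statistic is
$$F_{2,87} = \frac{MSB}{MSW} \approx 2.592$$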
The critical F-value with (2, 87) degrees of freedom for α = 0.05 is 3.101 [Excel function FINV(0.05, 2, 87) or F.INV.RT(0.05, 2, 87)].
The p-value for $F_{2,87}$ = 2.592 is about 0.0806 [Excel function FDIST(2.592, 2, 87) or F.DIST.RT(2.592, 2, 87)].
Since the calculated F-statistic is less than the critical F-value (equivalently, the p-value exceeds α = 0.05), we retain the null hypothesis and conclude that there is no statistically significant difference between the average annual returns of the consumer goods, services and industrial goods sectors.
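Without Excel, the critical value and p-value for this example can be cross-checked in plain Python. This sketch relies on a special case, not on the method used above: when the numerator degrees of freedom is 2 (k = 3 groups, as in both examples), P(F > f) = (1 + 2f/d2)^(-d2/2), which can also be inverted in closed form. The helper names are hypothetical; for other numerator degrees of freedom a library routine such as SciPy's `scipy.stats.f.sf` / `scipy.stats.f.isf` would be needed instead.

```python
def f_dist_rt_df1_2(f: float, d2: float) -> float:
    """Right-tail probability P(F > f) for F with (2, d2) degrees of freedom.

    Only valid for numerator df = 2, where the survival function has the
    closed form (1 + 2f/d2) ** (-d2/2).
    """
    return (1 + 2 * f / d2) ** (-d2 / 2)


def f_inv_rt_df1_2(alpha: float, d2: float) -> float:
    """Critical value c with P(F(2, d2) > c) = alpha (inverse of the above)."""
    return (d2 / 2) * (alpha ** (-2 / d2) - 1)


crit = f_inv_rt_df1_2(0.05, 87)    # compare with F.INV.RT(0.05, 2, 87)
p = f_dist_rt_df1_2(2.592, 87)     # compare with F.DIST.RT(2.592, 2, 87)
print(round(crit, 3), round(p, 4))
```

For F = 2.592 exactly this gives a critical value of about 3.101 and a p-value of about 0.0806; the same function applied to the ENZO statistic (F ≈ 65.87) gives about 3.82E-18.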
Figure: F-distribution with critical value
ANOVA Table
Microsoft Excel ANOVA table for the example:
Source of Variation   SS         df
Total                 0.155039   89
Two-Way Analysis of Variance (ANOVA)
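$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
where $\alpha_i$ is the effect of the ith level of factor A (i = 1, …, a), $\beta_j$ is the effect of the jth level of factor B (j = 1, …, b), $(\alpha\beta)_{ij}$ is their interaction effect, and $\varepsilon_{ijk}$ is the error of the kth of the c observations in cell (i, j). The sum of squares for factor B is
$$SSB = a\,c\sum_{j=1}^{b}(\mu_j-\mu)^2$$
where $\mu_j$ is the mean of the jth level of factor B, and $\mu_{ij}$ denotes the average of the ith level of factor A and the jth level of factor B.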
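Source of Variation   SS     df               MS                              F
Factor A              SSA    a - 1            MSA = SSA/(a - 1)               F = MSA/MSW
Factor B              SSB    b - 1            MSB = SSB/(b - 1)               F = MSB/MSW
Interaction (A × B)   SSAB   (a - 1)(b - 1)   MSAB = SSAB/[(a - 1)(b - 1)]    F = MSAB/MSW
Within (error)        SSW    ab(c - 1)        MSW = SSW/[ab(c - 1)]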
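The two-way table can be illustrated with a sketch for a balanced design with c observations per cell, which is what the ab(c - 1) within-group degrees of freedom imply. Only SSB is written out in the text; the formulas used here for the others are the standard balanced-design ones (SSA = bc Σ(μ_i - μ)², SSAB = c ΣΣ(μ_ij - μ_i - μ_j + μ)², SSW = ΣΣΣ(Y_ijk - μ_ij)², with μ_i the mean of level i of factor A), stated as an assumption rather than taken from this section. `two_way_anova` and the data are hypothetical.

```python
from typing import Dict, List


def two_way_anova(y: List[List[List[float]]]) -> Dict[str, Dict[str, float]]:
    """Balanced two-way ANOVA with replication (hypothetical helper).

    y[i][j] holds the c observations for level i of factor A and level j
    of factor B, so there are a x b cells with c replicates each.
    """
    a, b, c = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(y[i][j]) / c for j in range(b)] for i in range(a)]    # mu_ij
    mu_a = [sum(cell[i]) / b for i in range(a)]                        # level means of A
    mu_b = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]   # level means of B
    mu = sum(mu_a) / a                                                 # overall mean

    ss = {
        "A": b * c * sum((m - mu) ** 2 for m in mu_a),
        "B": a * c * sum((m - mu) ** 2 for m in mu_b),
        "AB": c * sum((cell[i][j] - mu_a[i] - mu_b[j] + mu) ** 2
                      for i in range(a) for j in range(b)),
        "W": sum((obs - cell[i][j]) ** 2
                 for i in range(a) for j in range(b) for obs in y[i][j]),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "W": a * b * (c - 1)}
    msw = ss["W"] / df["W"]
    # Each effect's mean square is tested against MSW, as in the table above
    f = {src: (ss[src] / df[src]) / msw for src in ("A", "B", "AB")}
    return {"SS": ss, "df": df, "F": f}


# Hypothetical 2 x 2 layout with c = 2 observations per cell
y = [[[3.0, 5.0], [6.0, 8.0]],
     [[4.0, 6.0], [9.0, 11.0]]]
result = two_way_anova(y)
print(result["SS"])  # {'A': 8.0, 'B': 32.0, 'AB': 2.0, 'W': 8.0}
print(result["F"])   # {'A': 4.0, 'B': 16.0, 'AB': 1.0}
```

For these numbers the four sums of squares add up to the total sum of squares, 50, as the decomposition requires.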
Example
ANOVA
Source of Variation
SS df MS F P-value F crit
Sample