20-Introduction To Analysis of Variance
+265993375505
Multi-sample z- and t-tests
• We can use z- or t-tests to compare more than two samples by testing them one pair at a time.
• If there are n samples, the number of possible pairwise comparisons is nC2 = n(n − 1)/2.
Number of samples    Number of possible pairwise comparisons (nC2)
2                    1
3                    3
4                    6
5                    10
10                   45
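The table can be reproduced with a minimal sketch using Python's standard-library `math.comb`, which computes nC2 directly. The function name `pairwise_comparisons` is hypothetical.

```python
from math import comb

def pairwise_comparisons(n: int) -> int:
    """Number of distinct pairs among n samples: nC2 = n(n - 1) / 2."""
    return comb(n, 2)

# Print one row per number of samples, as in the table
for n in (2, 3, 4, 5, 10):
    print(n, pairwise_comparisons(n))
```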
A review of the Type I Error
• Remember that when you test a hypothesis at a confidence level of (1 − α), the probability of committing a Type I error is α.
• That is to say, you may wrongly reject a true null hypothesis with probability α.
• When you are dealing with a multi-sample situation and carry out k = nC2 pairwise z- or t-tests, the probability of correctly accepting a true null hypothesis in every one of them, $(1 - \alpha)^k$, becomes smaller as k grows, so overall we stand a high chance of making a wrong decision.
• The Type I error for the combined set of these comparisons is $1 - (1 - \alpha)^k$ (assuming the tests are independent) and is called the experiment-wise error.
• With α = 0.05, this equals 0.05 = α for a two-sample z- or t-test (k = 1), about 0.143 > α for a three-sample test (k = 3) and about 0.265 > α for a four-sample test (k = 6).
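The experiment-wise error figures for the multi-sample case can be checked with a short sketch. Like the formula, it assumes the pairwise tests are independent, which is only an approximation; the function name `experimentwise_error` is hypothetical.

```python
from math import comb

def experimentwise_error(n_samples: int, alpha: float = 0.05) -> float:
    """Probability of at least one Type I error over all pairwise tests.

    Assumes the nC2 pairwise z- or t-tests are independent (an approximation).
    """
    k = comb(n_samples, 2)      # number of pairwise tests
    return 1 - (1 - alpha) ** k

for n in (2, 3, 4):
    print(n, round(experimentwise_error(n), 3))
```

The error rate grows quickly with the number of samples, which is the motivation for replacing the many pairwise tests with a single F test.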
• To keep the Type I error at 0.05 for the whole set of comparisons, which is what we want, we use the F statistic to compare more than two samples in a single test.
• The test based on the F statistic is commonly called analysis of variance (ANOVA).
The F statistic
• It is defined as the ratio of the mean sum of squares due to the variability between groups to the mean sum of squares due to the variability within groups.
$$F = \frac{\text{variance between groups}}{\text{variance within groups}}$$
• The critical value of F is read from tables of the F distribution, given the Type I error (α) and the degrees of freedom between and within the groups.
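Where no printed table is at hand, the critical value can be computed. The sketch below uses only Python's standard library and relies on the standard relation between the F distribution and the regularized incomplete beta function, which it evaluates with a continued fraction (modified Lentz method) and then inverts by bisection. All function names are hypothetical; if SciPy is installed, `scipy.stats.f.ppf(1 - alpha, df1, df2)` should give the same value.

```python
from math import exp, lgamma, log

TINY = 1e-300  # guards against division by zero in the continued fraction

def _nz(v):
    return v if abs(v) > TINY else TINY

def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 / _nz(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))          # even term
        d, c = 1.0 / _nz(1.0 + aa * d), _nz(1.0 + aa / c)
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd term
        d, c = 1.0 / _nz(1.0 + aa * d), _nz(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_cdf(f, df1, df2):
    """P(F <= f) for an F distribution with df1 and df2 degrees of freedom."""
    if f <= 0.0:
        return 0.0
    return betainc_reg(df1 / 2.0, df2 / 2.0, df1 * f / (df1 * f + df2))

def f_critical(alpha, df1, df2):
    """Upper-tail critical value f such that P(F > f) = alpha."""
    lo, hi = 0.0, 1.0
    while 1.0 - f_cdf(hi, df1, df2) > alpha:  # widen the bracket
        hi *= 2.0
    for _ in range(200):                       # bisection
        mid = (lo + hi) / 2.0
        if 1.0 - f_cdf(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(f_critical(0.05, 1, 4), 2))
```

For α = 0.05 with 1 and 4 degrees of freedom this gives about 7.71, in line with standard F tables. The same `f_cdf` also yields a P-value as `1 - f_cdf(F, df1, df2)`.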
F statistic assumptions
• The populations have normal distributions.
• The populations have the same variance (equivalently, the same standard deviation).
• The samples are simple random samples.
• The samples are independent of each other.
Definition of the F statistic
• Consider the weaning weights of six kids, given in the table below.
Kid 1 2 3 4 5 6
Weight 8 10 10 8 8 10
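• The mean is $\bar{y} = \frac{54}{6} = 9$.
• The (sample) variance of the data, dividing the sum of squared deviations from the mean by n − 1 = 5, is
$$s^2 = \frac{(-1)^2 + 1^2 + 1^2 + (-1)^2 + (-1)^2 + 1^2}{5} = \frac{6}{5} = 1.2$$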
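The same mean and variance can be obtained with Python's standard `statistics` module; `statistics.variance` computes the sample variance with the n − 1 divisor used above. A minimal check:

```python
from statistics import mean, variance

weights = [8, 10, 10, 8, 8, 10]  # weaning weights of kids 1-6 from the table

y_bar = mean(weights)    # 54 / 6
s2 = variance(weights)   # sum of squared deviations / (n - 1) = 6 / 5
print(y_bar, s2)
```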
• Suppose we discover that the kids are of different sexes, as shown in the tables below.
Kid 1 2 3 4 5 6
Sex F M M F F M
Weight 8 10 10 8 8 10
Kid 1 2 3 4 5 6
Sex F M M F F M
Weight 8.4 9.8 9.9 7.7 7.9 10.3
• Each observation varies somewhat about the mean of its group, which can be written as
$$y_{ij} = \mu + s_i + e_{ij}$$
• Where
• $y_{ij}$ is the weight of individual j belonging to sex i
• $\mu$ is the (overall) population mean
• $s_i$ is the mean deviation of sex i (i = male, female) from the population mean
• $e_{ij}$ is the deviation of the weight of individual j from the overall mean that is not attributable to sex i (also called random error)
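The model can be illustrated on the second table of weights by replacing μ and s_i with their sample estimates: the overall mean, and each sex's mean minus the overall mean. This is a sketch of that estimation, not a formal fitting procedure; the variable names are illustrative.

```python
from statistics import mean

# Second table: sex and weaning weight of kids 1-6
sex = ["F", "M", "M", "F", "F", "M"]
weight = [8.4, 9.8, 9.9, 7.7, 7.9, 10.3]

mu_hat = mean(weight)  # estimate of the overall mean mu
group_mean = {g: mean([w for s, w in zip(sex, weight) if s == g]) for g in ("F", "M")}
s_hat = {g: m - mu_hat for g, m in group_mean.items()}        # estimated sex effects s_i
e_hat = [w - mu_hat - s_hat[g] for g, w in zip(sex, weight)]  # estimated random errors e_ij

# Each observation is rebuilt as mu + s_i + e_ij
for g, w, e in zip(sex, weight, e_hat):
    print(f"{w:5.1f} = {mu_hat:.2f} + ({s_hat[g]:+.2f}) + ({e:+.2f})")
```

Here the female mean is 8.0 and the male mean is 10.0, so the estimated sex effects are −1.0 and +1.0 around an overall mean of 9.0.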
ANOVA Procedure
Step 1: Formulation of the hypotheses
• Null hypothesis
$$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_i$$
where i is the number of populations to be compared
• Alternate hypothesis
H1: not all of the population means are equal (at least one differs from the others)
P-value
Step 5: Reject or fail to reject H0 based on the P-value (reject H0 if the P-value is smaller than α).
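$$TSS = \sum_{i=1}^{a}\sum_{j=1}^{n_i} x_{ij}^2 - C$$
The total sum of squares, which includes all sources of variation. This is the total SS.
$$SST = \sum_{i=1}^{a}\frac{x_{i\cdot}^2}{n_i} - C$$
The sum of squares attributable to the variable of classification. This is the between SS, the among-groups SS or the treatment SS.
Here a is the number of groups, $n_i$ is the number of observations in group i, $x_{i\cdot}$ is the total of group i, and $C = \left(\sum_{i}\sum_{j} x_{ij}\right)^2 / N$ is the correction term, with N the total number of observations.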
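The sums of squares can be computed for the second table of kid weights grouped by sex. This sketch applies the TSS and SST formulas with the correction term C, and additionally assumes the standard identities SSE = TSS − SST (within-groups SS), df between = a − 1, df within = N − a, and mean square = SS / df. The helper `one_way_anova` is hypothetical.

```python
def one_way_anova(groups):
    """One-way ANOVA sums of squares and F, via the correction-term formulas."""
    all_obs = [x for g in groups for x in g]
    N, a = len(all_obs), len(groups)
    C = sum(all_obs) ** 2 / N                              # correction term
    tss = sum(x * x for x in all_obs) - C                  # total SS
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - C    # between (treatment) SS
    sse = tss - sst                                        # within (error) SS
    df_between, df_within = a - 1, N - a
    ms_between = sst / df_between                          # mean squares
    ms_within = sse / df_within
    return {"TSS": tss, "SST": sst, "SSE": sse,
            "df": (df_between, df_within), "F": ms_between / ms_within}

female = [8.4, 7.7, 7.9]   # kids 1, 4, 5
male = [9.8, 9.9, 10.3]    # kids 2, 3, 6
result = one_way_anova([female, male])
print(result)
```

For these data TSS ≈ 6.4, SST ≈ 6.0 and SSE ≈ 0.4, giving F ≈ 6.0 / 0.1 = 60 with (1, 4) degrees of freedom. That is far above the tabulated 5% critical value of about 7.71, so this small example would lead to rejecting H0.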