STA1502/003/0/2021
Tutorial letter 003/0/2021
Statistical inference I
STA1502
Year module
Department of Statistics
STUDY UNIT 2: SUMMARY
University of South Africa
STUDY UNIT 2
2.1 Inference about the difference between two population proportions
Comparing proportions from two independent populations is analogous to comparing means from
two independent populations.
The steps are:
Step 1
The hypotheses can be one of the following:
1. $H_0: P_1 - P_2 = 0$ against $H_1: P_1 - P_2 \neq 0$
2. $H_0: P_1 - P_2 = 0$ against $H_1: P_1 - P_2 > 0$
3. $H_0: P_1 - P_2 = 0$ against $H_1: P_1 - P_2 < 0$
where $P_1$ and $P_2$ are the population proportions.
Step 2
The sample proportions are
– $\hat{P}_1 = \dfrac{X_1}{n_1}$, where $X_1$ is the number of successes and $n_1$ is the size of sample 1.
– $\hat{P}_2 = \dfrac{X_2}{n_2}$, where $X_2$ is the number of successes and $n_2$ is the size of sample 2.
Step 3
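The test statistic $Z$ for the difference between two proportions is
$$Z = \frac{\hat{P}_1 - \hat{P}_2}{\sqrt{\bar{P}\left(1 - \bar{P}\right)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where $\bar{P}$ is the pooled sample proportion, the estimate of the common proportion under $H_0$:
$$\bar{P} = \frac{X_1 + X_2}{n_1 + n_2}$$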
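Steps 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed method: the function name `two_proportion_z` and the counts used are hypothetical and do not come from the study guide.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Return (p1_hat, p2_hat, p_pooled, z) for testing H0: P1 - P2 = 0."""
    p1_hat = x1 / n1                      # sample proportion, sample 1
    p2_hat = x2 / n2                      # sample proportion, sample 2
    p_pooled = (x1 + x2) / (n1 + n2)      # pooled proportion under H0
    se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se_pooled
    return p1_hat, p2_hat, p_pooled, z

# Hypothetical data: 60 successes out of 100 versus 45 successes out of 100.
p1_hat, p2_hat, p_pooled, z = two_proportion_z(60, 100, 45, 100)
print(p1_hat, p2_hat, p_pooled, round(z, 3))
```

The pooled proportion is used here because the null hypothesis states that the two population proportions are equal.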
Step 4
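The confidence interval for the difference between the two proportions is
$$\left(\hat{P}_1 - \hat{P}_2\right) \pm Z_{\alpha/2}\sqrt{\frac{\hat{P}_1\left(1 - \hat{P}_1\right)}{n_1} + \frac{\hat{P}_2\left(1 - \hat{P}_2\right)}{n_2}}$$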
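The standard error of $\hat{P}_1 - \hat{P}_2$ is either the unpooled form, used in the confidence interval,
$$\sqrt{\frac{\hat{P}_1\left(1 - \hat{P}_1\right)}{n_1} + \frac{\hat{P}_2\left(1 - \hat{P}_2\right)}{n_2}}$$
or the pooled form, used in the test statistic,
$$\sqrt{\bar{P}\left(1 - \bar{P}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$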
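The confidence interval in Step 4 can be computed as follows. This is a sketch only: the function name `diff_proportion_ci` and the counts are hypothetical, and the critical value is taken from the standard normal distribution via `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportion_ci(x1, n1, x2, n2, alpha=0.05):
    """Return (lower, upper) limits of the (1 - alpha) interval for P1 - P2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Unpooled standard error: each sample contributes its own variance term.
    se_unpooled = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}
    diff = p1_hat - p2_hat
    return diff - z_crit * se_unpooled, diff + z_crit * se_unpooled

# Hypothetical data: 60 successes out of 100 versus 45 successes out of 100.
lower, upper = diff_proportion_ci(60, 100, 45, 100)
print(round(lower, 4), round(upper, 4))
```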
Step 5
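The decision rule
Critical value approach: reject the null hypothesis $H_0$ if the test statistic $Z$ falls in the rejection region, that is, if $|Z| > Z_{\alpha/2}$ for the alternative $P_1 - P_2 \neq 0$, if $Z > Z_{\alpha}$ for the alternative $P_1 - P_2 > 0$, or if $Z < -Z_{\alpha}$ for the alternative $P_1 - P_2 < 0$; otherwise we do not reject $H_0$.
p–value approach: reject the null hypothesis $H_0$ if the p–value is less than $\alpha$ (the level of significance).
Confidence interval approach (two-sided alternative): reject the null hypothesis $H_0$ if zero does not lie between the two confidence limits.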
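The p–value form of the decision rule can be illustrated with the standard normal distribution. The function name `z_test_decision`, the `alternative` labels and the value of $Z$ below are assumptions made for this sketch.

```python
from statistics import NormalDist

def z_test_decision(z, alpha=0.05, alternative="two-sided"):
    """Return (p_value, reject_h0) for an observed Z statistic."""
    std_normal = NormalDist()
    if alternative == "two-sided":        # H1: P1 - P2 != 0
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif alternative == "greater":        # H1: P1 - P2 > 0
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":           # H1: P1 - P2 < 0
        p_value = std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return p_value, p_value < alpha

# Hypothetical observed statistic.
p_value, reject_h0 = z_test_decision(2.124)
print(round(p_value, 4), reject_h0)
```

For the same $Z$, the conclusion can change with the alternative hypothesis, which is why the direction of $H_1$ must be fixed in Step 1.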
2.2 One–way Analysis of Variance:
Analysis of variance (ANOVA) is used when we have more than two independent samples.
The ANOVA approach allows us to compare multiple populations (or groups) by taking a sample from
each population (or group) and examining the differences among the group means.
The variation among the treatment (group) means is denoted SST.
The variation within the groups, which measures the random variation, is denoted SSE.
The total variation is $SS(\text{Total}) = SST + SSE$.
– SST: sum of squares for treatments
$$SST = \sum_{j=1}^{k} n_j\left(\bar{X}_j - \bar{\bar{X}}\right)^2$$
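– SSE: sum of squares for error
$$SSE = (n_1 - 1)S_1^2 + (n_2 - 1)S_2^2 + \cdots + (n_k - 1)S_k^2$$
where $S_j^2$ is the sample variance of group $j$. The grand mean is
$$\bar{\bar{X}} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + \cdots + n_k\bar{X}_k}{n}$$
or, equivalently,
$$\bar{\bar{X}} = \frac{\sum_{j=1}^{k}\sum_{i=1}^{n_j} x_{ij}}{n}$$
where $n = n_1 + n_2 + \cdots + n_k$.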
The hypotheses are
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ (if we have $k$ treatments)
$H_1$: At least one population mean is different from the other population means
The total sum of squares is
$$SS(\text{Total}) = \sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(X_{ij} - \bar{\bar{X}}\right)^2.$$
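The three sums of squares can be checked numerically. The sketch below uses a hypothetical data set of three groups and a hypothetical function name, `anova_sums_of_squares`; it also confirms that SST and SSE add up to SS(Total).

```python
from statistics import mean, variance

def anova_sums_of_squares(groups):
    """Return (SST, SSE, SS_Total) for a list of samples, one list per group."""
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group variation: squared distance of each group mean from the grand mean.
    sst = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: (n_j - 1) times the sample variance of each group.
    sse = sum((len(g) - 1) * variance(g) for g in groups)
    # Total variation: squared distance of every observation from the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for g in groups for x in g)
    return sst, sse, ss_total

# Hypothetical data: k = 3 groups with 3 observations each.
groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
sst, sse, ss_total = anova_sums_of_squares(groups)
print(round(sst, 6), round(sse, 6), round(ss_total, 6))
```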
The mean square for treatments is
$$MST = \frac{SST}{k - 1}$$
where $k - 1$ is the degrees of freedom for treatments and $k$ is the number of treatments (or groups).
The mean square for error is
$$MSE = \frac{SSE}{n - k}$$
The total mean square is
$$MS(\text{Total}) = \frac{SS(\text{Total})}{n - 1}$$
The test statistic for one–way ANOVA is
$$F = \frac{MST}{MSE}$$
The critical value is $F_{\alpha,\,k-1,\,n-k}$.
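A short sketch of the mean squares and the $F$ statistic follows. The function name `one_way_f` and the sums of squares are hypothetical; the critical value 5.14 is assumed to be the tabulated $F_{0.05,\,2,\,6}$ and should be read from an $F$ table for other settings.

```python
def one_way_f(sst, sse, k, n):
    """Return (MST, MSE, F) for k groups and n observations in total."""
    mst = sst / (k - 1)      # treatment mean square, k - 1 degrees of freedom
    mse = sse / (n - k)      # error mean square, n - k degrees of freedom
    return mst, mse, mst / mse

# Hypothetical sums of squares for k = 3 groups and n = 9 observations.
mst, mse, f_stat = one_way_f(sst=38.0, sse=6.0, k=3, n=9)

# Assumption: tabulated critical value for alpha = 0.05 with (2, 6) degrees of freedom.
f_critical = 5.14
reject_h0 = f_stat > f_critical
print(mst, mse, f_stat, reject_h0)
```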
ANOVA SUMMARY TABLE
| Source of variation | Degrees of freedom | Sum of squares | Mean squares | F |
|---|---|---|---|---|
| Treatment | $k - 1$ | SST | MST | $F = \dfrac{MST}{MSE}$ |
| Error | $n - k$ | SSE | MSE | |
| Total | $n - 1$ | SS(Total) | | |
2.3 Multiple comparisons
Multiple comparison methods enable us to identify which treatment means are responsible for the differences.
The ANOVA test only enables us to determine whether differences exist between two or more
population means.
There are three methods:
1. Fisher’s least significant difference method (LSD)
2. The Bonferroni method
3. Tukey’s method
2.3.1 The least significant difference (LSD) is
$$LSD = t_{\alpha/2,\,n-k}\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$
where MSE is an unbiased estimator of the common variance of the populations we are testing.
We will conclude that $\mu_i$ and $\mu_j$ differ if
$$\left|\bar{X}_i - \bar{X}_j\right| > LSD$$
where $\left|\bar{X}_i - \bar{X}_j\right|$ is the absolute pairwise difference, which is never negative.
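The LSD comparison can be sketched as below. The group means, sample sizes and MSE are hypothetical, and the critical value 2.447 is assumed to be the tabulated $t_{0.025,\,6}$; the function name `lsd` is also an assumption of this sketch.

```python
from itertools import combinations
from math import sqrt

def lsd(t_crit, mse, n_i, n_j):
    """Least significant difference for groups of sizes n_i and n_j."""
    return t_crit * sqrt(mse * (1 / n_i + 1 / n_j))

# Hypothetical summary statistics for k = 3 groups (n = 9, so n - k = 6).
means = {"A": 5.0, "B": 6.0, "C": 10.0}
sizes = {"A": 3, "B": 3, "C": 3}
mse = 1.0
t_crit = 2.447   # assumption: t_{0.025, 6} read from a t table

# Keep the pairs whose absolute difference in means exceeds the LSD.
differing = [
    (a, b)
    for a, b in combinations(means, 2)
    if abs(means[a] - means[b]) > lsd(t_crit, mse, sizes[a], sizes[b])
]
print(differing)
```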
2.3.2 Bonferroni test
The number of hypotheses to be tested is $C = \dfrac{k(k - 1)}{2}$, where $k$ is the number
of groups or treatments.
The test statistic is
$$T_{ij} = \frac{\bar{X}_i - \bar{X}_j}{\sqrt{MSE\left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)}}$$
where $n_i$ and $n_j$ are the sample sizes of the groups (treatments),
$\bar{X}_i$ and $\bar{X}_j$ are the sample means of the groups (or treatments), and
MSE is the mean square for error calculated from the ANOVA table.
The critical values are
$$-t_{\alpha/(2C),\,n-k} \quad \text{and} \quad t_{\alpha/(2C),\,n-k}$$
where $C = \dfrac{k(k - 1)}{2}$.
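The Bonferroni quantities can be computed as in the following sketch. The function names and the summary statistics are hypothetical. The critical value itself is not computed here; it would be read from a $t$ table or software using the tail area shown.

```python
from math import sqrt

def bonferroni_comparisons(k):
    """Number of pairwise hypotheses, C = k(k - 1) / 2."""
    return k * (k - 1) // 2

def bonferroni_t(mean_i, mean_j, mse, n_i, n_j):
    """Test statistic T_ij for comparing the means of groups i and j."""
    return (mean_i - mean_j) / sqrt(mse * (1 / n_i + 1 / n_j))

# Hypothetical setting: k = 3 groups, overall significance level 0.05.
k, alpha = 3, 0.05
c = bonferroni_comparisons(k)
tail_area = alpha / (2 * c)   # upper-tail area for the critical value t_{alpha/(2C), n-k}

# Hypothetical summary statistics for one pair of groups.
t_stat = bonferroni_t(mean_i=5.0, mean_j=10.0, mse=1.0, n_i=3, n_j=3)
print(c, round(tail_area, 5), round(t_stat, 3))
```

Dividing $\alpha$ by the number of comparisons keeps the overall chance of a false rejection across all $C$ tests at or below $\alpha$.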
2.4 Randomized Block (two–way) Analysis of Variance
The randomized block design identifies two factors, treatments and blocks, both of which affect
the response.
The null and alternative hypotheses are
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$H_1$: The population means are not all the same.
The ANOVA table for the randomized block design
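| Source of variation | Degrees of freedom | Sum of squares | Mean squares | F |
|---|---|---|---|---|
| Treatment | $k - 1$ | SST | $MST = \dfrac{SST}{k - 1}$ | $F = \dfrac{MST}{MSE}$ |
| Block | $b - 1$ | SSB | $MSB = \dfrac{SSB}{b - 1}$ | $F = \dfrac{MSB}{MSE}$ |
| Error | $(k - 1)(b - 1)$ | SSE | $MSE = \dfrac{SSE}{(k - 1)(b - 1)}$ | |
| Total | $kb - 1$ | SS(Total) | | |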
$k$ = number of treatments (or groups)
$b$ = number of blocks
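$$SST = b\sum_{j=1}^{k}\left(\bar{X}_{T_j} - \bar{\bar{X}}\right)^2$$
$$SSB = k\sum_{i=1}^{b}\left(\bar{X}_{B_i} - \bar{\bar{X}}\right)^2$$
$$SS(\text{Total}) = \sum_{j=1}^{k}\sum_{i=1}^{b}\left(X_{ij} - \bar{\bar{X}}\right)^2$$
where $\bar{X}_{T_j}$ is the mean of treatment $j$, $\bar{X}_{B_i}$ is the mean of block $i$, and $\bar{\bar{X}}$ is the grand mean.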
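The randomized block calculations can be sketched as follows. The function name `randomized_block_anova` and the data are hypothetical. The sketch also assumes the standard decomposition SSE = SS(Total) - SST - SSB, which is not stated explicitly in the summary above.

```python
def randomized_block_anova(data):
    """data[i][j] is the observation in block i under treatment j."""
    b, k = len(data), len(data[0])
    grand_mean = sum(sum(row) for row in data) / (k * b)
    treatment_means = [sum(row[j] for row in data) / b for j in range(k)]
    block_means = [sum(row) / k for row in data]
    sst = b * sum((m - grand_mean) ** 2 for m in treatment_means)
    ssb = k * sum((m - grand_mean) ** 2 for m in block_means)
    ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
    sse = ss_total - sst - ssb            # assumed decomposition of SS(Total)
    mse = sse / ((k - 1) * (b - 1))
    return {
        "SST": sst, "SSB": ssb, "SSE": sse, "SSTotal": ss_total,
        "F_treatment": (sst / (k - 1)) / mse,
        "F_block": (ssb / (b - 1)) / mse,
    }

# Hypothetical data: b = 3 blocks (rows) and k = 3 treatments (columns).
data = [[3, 6, 9],
        [5, 7, 9],
        [7, 8, 9]]
result = randomized_block_anova(data)
print(result)
```

Each $F$ value would then be compared with the tabulated critical value for its own degrees of freedom, $(k - 1)$ or $(b - 1)$ in the numerator and $(k - 1)(b - 1)$ in the denominator.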