B.Sc. (Hons.) Biotechnology
Core Course 13:
Basics of Bioinformatics and
Biostatistics (BIOT 3013)
Unit 3:
Test of significance: Z-test,
t-test and Chi-square test
Dr. Satarudra Prakash Singh
Department of Biotechnology
Mahatma Gandhi Central University,
Motihari
Test of significance
• It uses sample data to estimate population parameters and to draw conclusions about them.
• For example, the administrator of a large hospital may be interested in knowing the mean age of patients admitted during the last year.
• The administrator draws a random sample of size n from the patient population and computes the sample mean x̄, which is used as a point estimate of the population mean µ.
• Because random sampling involves chance, x̄ cannot be expected to equal µ exactly.
• The value of x̄ may be greater than or less than µ.
• Statistical inference (test of significance) is the method by which we reach a conclusion about a population on the basis of sample information drawn from that population.
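The point that x̄ changes from sample to sample can be illustrated with a short simulation. This is a sketch under stated assumptions: the population is assumed normal, and the population mean of 45 years, standard deviation of 12 years, and the helper name `sample_means` are hypothetical values chosen only for illustration.

```python
import random
import statistics

def sample_means(population_mean, population_sd, n, repeats, seed=0):
    """Draw `repeats` random samples of size n from a (hypothetical) normal
    population and return the sample mean x-bar of each sample."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(repeats):
        sample = [rng.gauss(population_mean, population_sd) for _ in range(n)]
        means.append(statistics.mean(sample))
    return means

# Hypothetical population: mean age 45 years, SD 12 years (illustrative only)
means = sample_means(45, 12, n=25, repeats=5)
for m in means:
    print(f"x-bar = {m:.2f}")  # each estimate misses 45 by a different amount
```

Each printed x̄ lands near, but not exactly on, the population mean, which is why a test of significance is needed to judge whether a difference between x̄ and a hypothesized µ is larger than chance alone would produce.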
Hypothesis testing about population parameters using sample statistics
• A hypothesis is a statement about one or more populations.
• For example, a hospital administrator may
want to test the hypothesis that the average
length of stay of patients admitted to the
hospital is 5 days.
Hypothesis testing
• There are two hypotheses involved in hypothesis testing:
1. Null hypothesis H0: the hypothesis to be tested.
2. Alternative hypothesis HA: a statement of what we believe is true if our sample data provide reason to reject the null hypothesis.
Steps for testing a hypothesis about the mean of a population
1. Data: identify the variable, the sample size (n), the sample mean (x̄), and the population standard deviation (σ), or the sample standard deviation (s) if σ is unknown.
2. Assumptions: there are two cases:
• Case 1: The population is normally distributed, with known or unknown variance (the sample size n may be small or large).
• Case 2: The population is not normally distributed, with known or unknown variance, and n is large (n ≥ 30).
• 3. Hypotheses: there are three cases to test, where μ0 is the hypothesized value of the mean (here μ0 = 50).
• Case I: we want to test whether the population mean is different from 50.
H0: μ = μ0
HA: μ ≠ μ0
• Case II: we want to test whether the population mean is greater than 50.
H0: μ = μ0
HA: μ > μ0
• Case III: we want to test whether the population mean is less than 50.
H0: μ = μ0
HA: μ < μ0
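4. Test Statistic:
• Case 1: The population is normally distributed.
i) If σ² is known (n large or small): $Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
ii) If σ² is unknown and n is large: $Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
iii) If σ² is unknown and n is small: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, with n − 1 degrees of freedom
• Case 2: The population is not normal and n is large.
i) If σ² is known: $Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
ii) If σ² is unknown: $Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$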
Text Book: Basic Concepts and Methodology for the Health Sciences
5. Decision rule on the basis of the level of significance (α):
i) If HA: μ ≠ μ0
Reject H0 if $Z > Z_{1-\alpha/2}$ or $Z < -Z_{1-\alpha/2}$ (Z-test),
or reject H0 if $t > t_{1-\alpha/2,\,n-1}$ or $t < -t_{1-\alpha/2,\,n-1}$ (t-test).
ii) If HA: μ > μ0
Reject H0 if $Z > Z_{1-\alpha}$ (Z-test),
or reject H0 if $t > t_{1-\alpha,\,n-1}$ (t-test).
iii) If HA: μ < μ0
Reject H0 if $Z < -Z_{1-\alpha}$ (Z-test),
or reject H0 if $t < -t_{1-\alpha,\,n-1}$ (t-test).
6. Decision: If we reject H0, we can conclude that HA is true.
7. An alternative decision rule uses the p-value:
i) If the p-value is less than or equal to α (p ≤ α), we reject the null hypothesis.
ii) If the p-value is greater than α (p > α), we do not reject the null hypothesis.
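The Z-test branch of steps 4 to 7 can be sketched in Python using only the standard library. This is an illustrative sketch, not the textbook's procedure verbatim: the function name `one_sample_z_test` and its parameters are hypothetical, and the table lookup for Z is replaced by `statistics.NormalDist().inv_cdf`. The t-test branch (normal population, σ² unknown, small n) is deliberately left out because the standard library has no t distribution; its critical values still come from a t table with n − 1 degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)

def one_sample_z_test(xbar, mu0, sd, n, alpha=0.05, alternative="two-sided"):
    """Steps 4-7 of a Z test on a population mean (hypothetical helper).

    sd is sigma when the population variance is known, or s when it is
    unknown and n is large. alternative is "two-sided" (HA: mu != mu0),
    "greater" (HA: mu > mu0) or "less" (HA: mu < mu0).
    Returns (z, critical value, p-value, reject H0?).
    """
    z = (xbar - mu0) / (sd / sqrt(n))            # step 4: test statistic
    if alternative == "two-sided":
        crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # Z_{1-alpha/2}
        reject = z > crit or z < -crit            # step 5: decision rule
        p = 2 * (1 - STD_NORMAL.cdf(abs(z)))      # step 7: two-sided p-value
    elif alternative == "greater":
        crit = STD_NORMAL.inv_cdf(1 - alpha)      # Z_{1-alpha}
        reject = z > crit
        p = 1 - STD_NORMAL.cdf(z)                 # upper-tail p-value
    elif alternative == "less":
        crit = STD_NORMAL.inv_cdf(1 - alpha)      # Z_{1-alpha}
        reject = z < -crit
        p = STD_NORMAL.cdf(z)                     # lower-tail p-value
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, crit, p, reject

# Usage: the numbers of Example 1 (n = 10, x-bar = 27, sigma^2 = 20, mu0 = 30)
z, crit, p, reject = one_sample_z_test(27, 30, sqrt(20), 10)
print(f"Z = {z:.2f}, critical = {crit:.2f}, p = {p:.4f}, reject H0: {reject}")
```

Returning both the critical-value decision and the p-value lets the two rules (step 5 and step 7) be checked against each other; for the same α they always agree.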
Example 1
• Suppose a researcher is interested in the mean age of a certain population. A random sample of 10 individuals drawn from the target population has a mean age of 27 years. Assuming that the population is normally distributed with a variance of 20, can we conclude that the mean is different from 30 years (α = 0.05)? If the p-value is 0.0340, how can we use it in making a decision?
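Solution
1. Data: the variable is age, n = 10, x̄ = 27, σ² = 20, α = 0.05.
2. Hypotheses: H0: μ = 30; HA: μ ≠ 30
3. Test statistic: $Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{27 - 30}{\sqrt{20}/\sqrt{10}} = -2.12$
4. Decision rule: because HA: μ ≠ 30 is two-sided, reject H0 if $Z > Z_{1-\alpha/2} = Z_{0.975}$ or $Z < -Z_{0.975}$, where $Z_{0.975} = 1.96$ (from standard normal table A1).
5. Decision: since Z = −2.12 < −1.96, we reject H0 and conclude that the population mean is different from 30 years. Using the p-value: p = 0.0340 ≤ α = 0.05, so we again reject H0.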
• Can we conclude that μ < 30?
Decision rule: with HA: μ < 30, reject H0 if $Z < -Z_{1-\alpha} = -Z_{0.95} = -1.645$ (from table A1 at α = 5%).
Decision: since Z = −2.12 < −1.645, we reject H0 and conclude that the population mean is smaller than 30 years.
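The arithmetic of Example 1 can be reproduced with Python's standard library, using `statistics.NormalDist` in place of table A1. This is only a numerical check of the worked solution; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Example 1 data: n = 10, sample mean 27, known population variance 20, mu0 = 30
n, xbar, mu0, variance, alpha = 10, 27, 30, 20, 0.05

z = (xbar - mu0) / sqrt(variance / n)             # (27 - 30) / sqrt(2)
z_crit_two = NormalDist().inv_cdf(1 - alpha / 2)  # Z_0.975, about 1.96
z_crit_one = NormalDist().inv_cdf(1 - alpha)      # Z_0.95, about 1.645
p_two_sided = 2 * NormalDist().cdf(-abs(z))       # two-sided p-value

print(f"Z = {z:.2f}")  # Z = -2.12
print(f"Reject H0 (mu != 30)? {abs(z) > z_crit_two}")
print(f"Reject H0 in favour of mu < 30? {z < -z_crit_one}")
print(f"two-sided p = {p_two_sided:.4f}")
```

The computed two-sided p-value comes out at about 0.034, consistent with the 0.0340 quoted in the example, and both decisions agree with the solution above.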
Example 2
• Among 157 African-American men, the sample mean systolic blood pressure was 146 mm Hg with a standard deviation of 27 mm Hg. Assuming the population distribution is not normal, can we conclude that the mean systolic blood pressure for the population of African-American men is greater than 140 mm Hg (α = 0.01)?
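Solution
1. Data: the variable is systolic blood pressure, n = 157, x̄ = 146, s = 27, α = 0.01.
2. Hypotheses: H0: μ = 140; HA: μ > 140
3. Test statistic (population not normal, σ² unknown, n large):
$Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{146 - 140}{27/\sqrt{157}} = \frac{6}{2.1548} = 2.78$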
4. Decision rule: reject H0 if $Z > Z_{1-\alpha} = Z_{0.99} = 2.33$ (from table A1).
5. Decision: since Z = 2.78 > 2.33, we reject H0 and conclude that the mean systolic blood pressure for the population of African-American men is greater than 140 mm Hg.
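As with Example 1, the numbers in Example 2 can be checked with Python's standard library; `NormalDist().inv_cdf(0.99)` stands in for the table A1 lookup of Z0.99. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Example 2 data: n = 157, sample mean 146 mm Hg, sample SD 27 mm Hg, mu0 = 140
n, xbar, s, mu0, alpha = 157, 146, 27, 140, 0.01

se = s / sqrt(n)                          # standard error of the mean, about 2.1548
z = (xbar - mu0) / se                     # test statistic, about 2.78
z_crit = NormalDist().inv_cdf(1 - alpha)  # Z_0.99, about 2.33
p_upper = 1 - NormalDist().cdf(z)         # one-sided (upper-tail) p-value

print(f"SE = {se:.4f}, Z = {z:.2f}, Z_0.99 = {z_crit:.2f}")
print(f"Reject H0: {z > z_crit}, p = {p_upper:.4f}")
```

Because HA: μ > 140 is one-sided, only the upper tail is used for both the critical value and the p-value.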
Student's t-test
• It is used to test null hypotheses about means, such as the hypothesis that there is no difference between the means of two groups.
• There are three cases:
i) One-sample t-test: to test whether a sample mean (as an estimate of a population mean) differs significantly from a given population mean. The formula for the one-sample t-test is
$t = \frac{\bar{x} - \mu}{SE}$
where x̄ is the sample mean, µ is the given population mean, and SE = s/√n is the standard error of the mean.
ii) The unpaired t-test
To test whether the population means estimated by two independent samples differ significantly. The formula for the unpaired t-test is
$t = \frac{\bar{x}_1 - \bar{x}_2}{SE}$
where x̄1 − x̄2 is the difference between the means of the two groups and SE denotes the standard error of this difference.
iii) The paired t-test
To test whether the population means estimated by two dependent samples differ significantly. It is typically used when measurements are made on the same subjects before and after a treatment. The formula for the paired t-test is
$t = \frac{\bar{d}}{SE}$
where d̄ is the mean of the paired differences and SE = s_d/√n is the standard error of this mean difference (s_d is the standard deviation of the differences and n is the number of pairs).
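The three t statistics can be sketched with Python's standard library. The function names are hypothetical, and the data in the usage lines are made up for illustration. For the unpaired test the slides do not say which standard error to use; this sketch assumes the pooled-variance SE (equal population variances), whereas Welch's unequal-variance SE is a common alternative. Each function returns t together with its degrees of freedom, so t can be compared with a tabulated critical value, since the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(x, mu):
    """t = (x-bar - mu) / SE with SE = s / sqrt(n); df = n - 1."""
    n = len(x)
    se = stdev(x) / sqrt(n)  # stdev() uses the n - 1 divisor (sample SD)
    return (mean(x) - mu) / se, n - 1

def unpaired_t(x1, x2):
    """t = (x-bar1 - x-bar2) / SE, using the pooled-variance SE
    (assumption: equal population variances); df = n1 + n2 - 2."""
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * stdev(x1) ** 2 + (n2 - 1) * stdev(x2) ** 2) / (n1 + n2 - 2)
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(x1) - mean(x2)) / se, n1 + n2 - 2

def paired_t(before, after):
    """t = d-bar / SE with SE = s_d / sqrt(n), where d = after - before;
    this is a one-sample t-test of the differences against 0."""
    d = [a - b for b, a in zip(before, after)]
    return one_sample_t(d, 0)

# Hypothetical data, for illustration only
print(one_sample_t([4, 6, 8], 5))
print(unpaired_t([4, 6, 8], [1, 3, 5]))
print(paired_t([10, 12, 14], [11, 14, 17]))
```

Writing the paired test as a one-sample test on the differences mirrors the formula above: once the differences are formed, d̄/SE is exactly the one-sample statistic with µ = 0.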
Chi-square test
It is used to analyze categorical data.
It compares observed frequencies with the frequencies expected if there were no differences between groups (H0), and tests whether the observed data differ significantly from the expected data.
It is calculated as the sum, over all categories, of the squared difference between the observed (O) and expected (E) frequencies divided by the expected (E) frequency:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
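The χ² formula translates directly into code. The function name and the example below (100 hypothetical offspring from a cross with an expected 3:1 ratio) are illustrative assumptions, not data from the source.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical example: 100 offspring, expected 3:1 ratio -> 75 : 25
observed = [70, 30]
expected = [75, 25]
stat = chi_square_statistic(observed, expected)
print(f"chi-square = {stat:.4f}")  # 25/75 + 25/25 = 1.3333
```

Each term is divided by its own E, so a given absolute deviation counts more heavily in a category with a small expected frequency.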
The Decision Rule
• The quantity χ-square will be small if the
observed and expected frequencies are close
together and will be large if the differences
are large.
• The computed value of χ-square is compared
with the tabulated value with degrees of
freedom = (r-1)(c-1) where r is the number of
rows and c is the number of columns.
• Reject H0 if χ-square is greater than or equal to the tabulated χ-square value for the chosen α.
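For an r × c contingency table, the expected frequency of each cell under H0 is (row total × column total) / grand total, and the decision rule above then compares χ² with the tabulated value for (r − 1)(c − 1) degrees of freedom. The sketch below assumes a hypothetical 2 × 2 table (treated/control versus improved/not improved) and a hypothetical function name; the critical value 3.841 is the tabulated χ² for df = 1 at α = 0.05, and no continuity (Yates) correction is applied.

```python
def chi_square_contingency(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table
    of observed counts, with E = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)  # (r - 1)(c - 1)
    return stat, df

# Hypothetical 2 x 2 table: rows = treated / control,
# columns = improved / not improved
table = [[30, 20],
         [15, 35]]
stat, df = chi_square_contingency(table)
critical = 3.841  # tabulated chi-square for df = 1, alpha = 0.05
print(f"chi-square = {stat:.2f}, df = {df}, reject H0: {stat >= critical}")
```

Here the expected counts are 22.5 and 27.5 in each row, giving χ² ≈ 9.09 with 1 degree of freedom, which exceeds 3.841, so H0 would be rejected at α = 0.05.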
References
• Daniel WW, Cross CL. Biostatistics: Basic Concepts and Methodology for the Health Sciences. 10th ed., ISV. ISBN: 9788126551897. 954 pages.
• Ali Z, Bhaskar SB. Basic statistical tools in research and data analysis. Indian J Anaesth. 2016 Sep;60(9):662-669. doi: 10.4103/0019-5049.190623. Erratum in: Indian J Anaesth. 2016 Oct;60(10):790. PMID: 27729694; PMCID: PMC5037948.
Thank you.