BMS2901 Introductory
Biostatistics and Data Analysis
Hypothesis Testing
Week 4
20 Feb 2025
Dr. Hsiang-yu Yuan
Outline
• In this chapter, we will discuss
• What are null hypothesis and alternative hypothesis, and what are type I error and
type II error.
• How to perform hypothesis testing using one-sample inference, including:
• calculation of the t statistic and critical value for a one-sided alternative
• calculation of the p-value for a one-sided alternative
• calculation of the t statistic and critical value for a two-sided alternative
• calculation of the p-value for a two-sided alternative
• How to perform hypothesis testing using two-sample inference
• How to calculate the power of a test and determine the appropriate sample size
Introduction
The hypothesis-testing framework specifies two hypotheses:
the null hypothesis and the alternative hypothesis.
Hypothesis testing provides an objective framework for making decisions
using probabilistic methods, rather than relying on subjective
impressions.
It provides a uniform decision-making criterion that is consistent.
In a one-sample problem, hypotheses are specified about a single
distribution.
In a two-sample problem, two different distributions are compared.
Example 1: Birth weight in a low
socioeconomic status area
• What is a p-value? https://www.youtube.com/watch?v=9jW9G8MO4PQ
• Average birth weight in the general population
is 120 oz. You take a sample of 100 babies born
in the hospital you work at (that is located in a
low-socioeconomic status (low-SES) area), and
find that the sample mean birth weight x̄ = 115.4
oz with a sample standard deviation s = 24.8 oz.
• You wonder: is the mean birth weight of low-
SES babies indeed lower than that in the
general population? Or is this observed
difference merely due to chance?
• We need an objective method to determine
which hypothesis is right
A Hypothesis test is a statistical test that is used to
determine whether there is enough evidence in a
sample of data to infer that a certain condition is true
for the entire population.
What is a Null
Hypothesis?
▪ E.g. parameter of interest: the mean birth weight of low-SES babies, denoted by µ
▪ We propose a value for the mean birth weight in the general population, denoted
as µ0.
▪ The null hypothesis is,
▪ H0: µ = µ0 E.g. H0: µ = 120
▪ Begin with the assumption that the null hypothesis is true
▪ E.g. H0 : the mean birth weight of low-SES babies is equal to that in the
general population
▪ Similar to the notion of innocent until proven guilty
▪ Purpose: to determine if the data leads us to reject the null hypothesis
Alternative Hypothesis
1. Is set up to represent research goal
2. Opposite of null hypothesis
E.g. Ha: the mean birth weight of low-SES babies is lower than that in the general
population
3. Ha: µ < 120.
4. Always has inequality sign: ≠, <, or >
▪ ≠ will lead to two-sided tests
▪ < , > will lead to one-sided tests
The null hypothesis, denoted by H0, is the hypothesis that is to be tested.
The alternative hypothesis, denoted by H1, is the hypothesis that in some
sense contradicts the null hypothesis.
H0: µ = µ0 vs. H1: µ < µ0
Suppose the only possible decisions are whether H0 is true or H1 is true.
For ease of notation, all outcomes in a hypothesis testing situation
generally refer to the null hypothesis.
If we decide H0 is true, then we say we accept H0. If we decide H1 is true,
then we state that H0 is not true or, equivalently, that we reject H0. Thus,
four possible outcomes can occur:
Four possible outcomes
a. Accept H0, and H0 is in fact true.
b. Accept H0, and H1 is in fact true.
c. Reject H0, and H0 is in fact true.
d. Reject H0, and H1 is in fact true.
                          H0 is true                       H1 is true (H0 is false)
Accept H0 (negative)      Correct (true negative)          Type II error (false negative)
Reject H0 (positive)      Type I error (false positive)    Correct (true positive)
➢The probability of a type I error is the probability of rejecting the null
hypothesis when H0 is true.
It is denoted by α (also called the false-positive rate) and is commonly referred to
as the significance level of a test.
➢The probability of a type II error is the probability of accepting the null
hypothesis when H1 is true.
This probability is a function of µ as well as other factors. It is usually
denoted by β (also called the false-negative rate).
The power of a test is defined as
1 − β = 1 − probability of a type II error = Pr(rejecting H0 | H1 true),
that is, the probability that the test correctly rejects the null hypothesis H0 when the
alternative hypothesis H1 is true.
The aim in hypothesis testing is to use statistical tests that make α and β
as small as possible. This goal requires compromise because making α
small involves rejecting the null hypothesis less often, whereas making β
small involves accepting the null hypothesis less often.
Determination of Statistical Significance for Results
from Hypothesis Tests
Either of the following methods can be used to establish whether results
from hypothesis tests are statistically significant:
(1) The test statistic t can be computed and compared with the critical
value t_{n-1,α} at an α level of 0.05. If H0: µ = µ0 vs. H1: µ < µ0 is being
tested and t < t_{n-1,0.05}, then H0 is rejected and the results are declared
statistically significant (p < 0.05). Otherwise, H0 is accepted and the
results are declared not statistically significant (p ≥ 0.05). This
approach is called the critical-value method.
(2) The exact p-value can be computed and, if p < 0.05, then H0 is
rejected and the results are declared statistically significant.
Otherwise, if p ≥ 0.05, then H0 is accepted and the results are
declared not statistically significant. This approach is called the
p-value method (also called the hypothesis testing approach).
One-sample t Test for the Mean of a Normal Distribution with
Unknown Variance (Alternative Mean < Null Mean)
To test the hypothesis H0: µ = µ0 vs. H1: µ < µ0 (σ unknown)
with a significance level of α, we compute
t = (x̄ − µ0)/(s/√n)
If t < t_{n-1,α}, then we reject H0.
If t ≥ t_{n-1,α}, then we accept H0.
The value of t is called a test statistic because the test procedure is
based on this statistic.
The value t_{n-1,α} is called a critical value because the outcome of the
test depends on whether the test statistic t < t_{n-1,α} (the critical value),
whereby we reject H0, or t ≥ t_{n-1,α}, whereby we accept H0.
Compute:
t = (x̄ − µ0)/(s/√n) = (115.4 − 120)/(24.8/√100)
= −1.85
With n = 100 and α = 0.05, the critical value is t_{99,0.05} ≈ −1.66.
• If t < t_{n-1,α}, then we reject H0.
• If t ≥ t_{n-1,α}, then we accept H0.
Because −1.85 < −1.66, we reject H0 at the 5% level.
The general approach in which we compute a test statistic and determine the
outcome of a test by comparing the test statistic with a critical value determined by
the type I error is called the critical-value method of hypothesis testing.
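The critical-value method for Example 1 can be sketched in Python. This is a minimal illustration that uses only the standard library: the helpers `t_cdf` and `t_quantile` are hypothetical names introduced here, and they approximate the t distribution numerically (Simpson's rule plus bisection) rather than calling a statistics package.

```python
import math

def t_cdf(t, df, steps=2000):
    """Pr(T <= t) for a t distribution with df degrees of freedom (Simpson's rule)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps                      # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3                    # area under the density between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def t_quantile(prob, df):
    """Value q with Pr(T <= q) = prob, located by bisection on t_cdf."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Example 1 values from the slides
xbar, mu0, s, n, alpha = 115.4, 120.0, 24.8, 100, 0.05

t_stat = (xbar - mu0) / (s / math.sqrt(n))
critical = t_quantile(alpha, n - 1)         # lower-tail critical value t_{n-1, alpha}
reject_h0 = t_stat < critical               # one-sided rule for H1: mu < mu0

print(f"t = {t_stat:.2f}, critical value = {critical:.2f}, reject H0: {reject_h0}")
```

In practice a statistics library would normally supply the t quantile; the numerical helpers are only there to keep the sketch self-contained.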
The p-value for any hypothesis test is the α level at which we would be
indifferent between accepting and rejecting H0 given the sample data. That is, the
p-value is the α level at which the given value of the test statistic (such as
t) is on the borderline between the acceptance and rejection regions.
Solving for p as a function of
t, we get p = Pr(t_{n-1} ≤ t)
The p-value can also be thought of as the probability of obtaining a
test statistic as extreme as or more extreme than the actual test
statistic obtained, given that the null hypothesis is true.
The p-value indicates exactly how significant the results are
without performing repeated significance tests at different levels.
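As a rough illustration of the p-value method for the one-sided test in Example 1, the sketch below evaluates p = Pr(t_{n-1} ≤ t) numerically. The helper `t_cdf` is a hypothetical name introduced here; it integrates the t density with Simpson's rule so that no external package is assumed.

```python
import math

def t_cdf(t, df, steps=2000):
    """Pr(T <= t) for a t distribution with df degrees of freedom (Simpson's rule)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps
    total = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3                    # area between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area

# Example 1 values from the slides
xbar, mu0, s, n = 115.4, 120.0, 24.8, 100

t_stat = (xbar - mu0) / (s / math.sqrt(n))
p_value = t_cdf(t_stat, n - 1)              # lower-tail area, because H1: mu < mu0

print(f"t = {t_stat:.3f}, one-sided p-value = {p_value:.3f}")
```

The p-value falls between 0.01 and 0.05, so the result would be reported as significant at the 5% level but not at the 1% level.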
Guidance for Judging the Significance of a p-Value
➢If 0.01 ≤ p < 0.05, then the results are significant.
➢If 0.001 ≤ p < 0.01, then the results are highly significant.
➢If p < 0.001, then the results are very highly significant.
➢If p ≥ 0.05, then the results are considered not statistically
significant (denoted as NS).
➢If 0.05 ≤ p < 0.1, then a trend toward statistical significance is
sometimes noted.
One-Sample t Test for the Mean of a Normal Distribution
with Unknown Variance (Alternative Mean > Null Mean)
To test the hypothesis H0: µ = µ0 vs. H1: µ > µ0
with a significance level of α, the best test is based on t, where
t = (x̄ − µ0)/(s/√n)
➢If t > t_{n-1,1-α}, then H0 is rejected
➢If t ≤ t_{n-1,1-α}, then H0 is accepted
The p-value for this test is given by p = Pr(t_{n-1} > t)
p-Value for the One-Sample t Test for the Mean of a
Normal Distribution (Two-Sided Alternative)
Let t = (x̄ − µ0)/(s/√n). Then
p = 2 × Pr(t_{n-1} ≤ t) if t ≤ 0
p = 2 × [1 − Pr(t_{n-1} ≤ t)] if t > 0
The p-value is the probability under the null hypothesis of obtaining a test
statistic as extreme as or more extreme than the observed test statistic,
where, because a two-sided alternative hypothesis is being used,
extremeness is measured by the absolute value of the test statistic.
One-Sample Test for the Mean of a Normal
Distribution: Two-Sided Alternatives
A two-tailed test is a test in which the values of the parameter being
studied (in this case µ) under the alternative hypothesis are allowed to be
either greater than or less than the values of the parameter under the null
hypothesis (µ0).
A reasonable decision rule to test for alternatives on either side of the null
mean is to reject H0 if t is either too small or too large. Another way of
stating the rule is that H0 will be rejected if t is either < c1 or > c2 for some
constants c1, c2, and H0 will be accepted if c1 ≤ t ≤ c2.
Pr(reject H0|H0 true) = Pr(t < c1 or t > c2|H0 true)
= Pr(t < c1|H0 true) + Pr(t > c2|H0 true) = α
Splitting the type I error equally between the two tails gives
Pr(t < c1|H0 true) = Pr(t > c2|H0 true) = α/2,
so c1 = t_{n-1,α/2} = −t_{n-1,1-α/2} and c2 = t_{n-1,1-α/2}.
One-sample t Test for the Mean of a Normal Distribution
with Unknown Variance (Two-Sided Alternative)
To test the hypothesis H0: µ = µ0 vs. H1: µ ≠ µ0 with a significance
level of α, the best test is based on t = (x̄ − µ0)/(s/√n).
If |t| > t_{n-1,1-α/2}, then H0 is rejected.
If |t| ≤ t_{n-1,1-α/2}, then H0 is accepted.
What is a T test? https://www.youtube.com/watch?v=nU1AbbUsY4I
Example 2: Cardiovascular Disease
• Suppose we want to compare fasting serum-cholesterol levels among
recent Asian immigrants to the United States with typical levels found
in the general U.S. population. Assume cholesterol levels in women
aged 21-40 in the United States are approximately normally
distributed with mean µ0 = 190 mg/dL. It is unknown whether
cholesterol levels among recent Asian immigrants are higher or lower
than those in the general U.S. population.
• Parameter of interest: µ= cholesterol levels among recent Asian
immigrants
• H0: µ = 190 vs H1: µ ≠ 190
How to generate hypothesis
• Formally, a statistical hypothesis testing problem includes two
hypotheses:
1. Null hypothesis (H0)
2. Alternative hypothesis (H1)
• Hypothesis must be stated before analysis
• A Belief about a population parameter
• The parameter is a population mean, proportion, or variance, NOT a sample statistic. We will
never have a hypothesis statement with either x̄ or p̂ in it.
• Blood tests are performed on 100 female Asian immigrants aged 21-40,
and the mean level is 181.25 mg/dL with standard deviation =
40 mg/dL.
• How can we test the hypothesis that the mean cholesterol level of recent
female Asian immigrants is different from the mean in the general
U.S. population?
• Sample mean x̄ = 181.25
• Population mean under H0: µ0 = 190
• Sample standard deviation s = 40
• Sample size n = 100
• Recall the standard normal statistic z = (X̄ − µ0)/(σ/√n). Because σ is unknown and
is estimated by s, we use the t statistic t = (X̄ − µ0)/(s/√n) instead, which follows
a t distribution with n − 1 degrees of freedom.
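The two-sided test for Example 2 can be sketched as follows, using the summary values listed above. The helper `t_cdf` is a hypothetical name introduced here that approximates the t distribution function by Simpson's rule; the 5% significance level is an assumption, since the slides do not state α for this example.

```python
import math

def t_cdf(t, df, steps=2000):
    """Pr(T <= t) for a t distribution with df degrees of freedom (Simpson's rule)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps
    total = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

# Example 2 values from the slides; alpha = 0.05 is an assumed level
xbar, mu0, s, n, alpha = 181.25, 190.0, 40.0, 100, 0.05

t_stat = (xbar - mu0) / (s / math.sqrt(n))
# Two-sided p-value: twice the tail area beyond |t|
p_value = 2 * t_cdf(-abs(t_stat), n - 1)
reject_h0 = p_value < alpha

print(f"t = {t_stat:.4f}, two-sided p-value = {p_value:.3f}, reject H0: {reject_h0}")
```

Doubling the tail area beyond |t| is equivalent to the piecewise two-sided p-value definition because the t distribution is symmetric about 0.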
When is a one-sided test more appropriate than a two-
sided test?
Generally, the sample mean falls in the expected direction from µ0 and it is
easier to reject H0 using a one-sided test than using a two-sided test.
A two-sided test can be more conservative because it is not necessary to guess
the appropriate side of the null hypothesis for the alternative hypothesis.
In some cases, only alternatives on one side of the null mean are of interest or
are possible, and a one-sided test is better than a two-sided test because it has
more power (since it is easier to reject H0 based on a finite sample if H1 is
actually true).
The decision whether to use a one-sided or two-sided test must be made before
the data analysis (or before data collection) begins so as not to bias conclusions
based on results of hypothesis testing.
Do not change from a two-sided to a one-sided test after looking at the data.
The Power of a Test
It tells us how likely it is that a statistically significant difference will
be detected based on a finite sample size n, if the alternative
hypothesis is true, that is, if the true mean differs from the mean
under the null hypothesis (µ0).
For a one-sided z test with known σ and an alternative mean above µ0, H0 is rejected if
(x̄ − µ0)/(σ/√n) > z_{1-α}. Multiplying both sides by σ/√n and adding µ0,
we reject H0 if x̄ > µ0 + z_{1-α}σ/√n and accept H0 if x̄ ≤ µ0 + z_{1-α}σ/√n.
Simply speaking, the power of a statistical test is the probability of
correctly rejecting the null hypothesis (H0) when the alternative
hypothesis (H1) is true.
Power for the One-Sample z Test for the Mean of a Normal
Distribution with Known Variance (One-Sided Alternative)
The power of the test for the hypothesis H0: µ = µ0 vs. H1: µ = µ1,
where the underlying distribution is normal and the population variance (σ²) is
assumed known, is given by
Φ(z_α + |µ0 − µ1|(√n/σ)) or, equivalently, Φ(−z_{1-α} + |µ0 − µ1|(√n/σ))
Factors affecting the power:
➢If the significance level is made smaller (α decreases), z_{1-α} increases (equivalently,
z_α decreases) and hence the power decreases.
➢If the alternative mean is shifted farther away from the null mean (|µ0 − µ1|
increases), then the power increases.
➢If the standard deviation of the distribution of individual observations
increases (σ increases), then the power decreases.
➢If the sample size increases (n increases), then the power increases.
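The power formula Φ(z_α + |µ0 − µ1|√n/σ) and the factors listed above can be explored with a short sketch. The numbers (µ0 = 120, µ1 = 115, σ = 24, n = 100) are hypothetical values chosen for illustration, loosely modelled on the birth-weight example; `power_one_sided_z` is a name introduced here.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the one-sided, known-variance z test: Phi(z_alpha + |mu0 - mu1| * sqrt(n) / sigma)."""
    z = NormalDist()                        # standard normal distribution
    return z.cdf(z.inv_cdf(alpha) + abs(mu0 - mu1) * sqrt(n) / sigma)

# Hypothetical illustration values (not taken from the slides)
base = power_one_sided_z(120, 115, 24, 100)
smaller_alpha = power_one_sided_z(120, 115, 24, 100, alpha=0.01)
larger_gap = power_one_sided_z(120, 110, 24, 100)
larger_sigma = power_one_sided_z(120, 115, 30, 100)
larger_n = power_one_sided_z(120, 115, 24, 200)

print(f"baseline power         = {base:.3f}")
print(f"alpha 0.05 -> 0.01     = {smaller_alpha:.3f}")
print(f"|mu0 - mu1| 5 -> 10    = {larger_gap:.3f}")
print(f"sigma 24 -> 30         = {larger_sigma:.3f}")
print(f"n 100 -> 200           = {larger_n:.3f}")
```

Each varied call changes one input at a time, so the direction of the change in power can be compared directly with the four bullet points.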
Sample-Size Determination: One-Sided Alternatives
Suppose we wish to test H0: µ = µ0 vs. H1: µ = µ1 where the data are
normally distributed with mean µ and known variance σ². The
sample size needed to conduct a one-sided test with significance
level α and probability of detecting a significant difference = 1 − β is
n = σ²(z_{1-β} + z_{1-α})²/(µ0 − µ1)²
Factors Affecting the Sample Size
➢The sample size increases as σ² increases.
➢The sample size increases as the significance level is made
smaller (α decreases).
➢The sample size increases as the required power increases (1 − β
increases).
➢The sample size decreases as the absolute value of the distance
between the null and alternative means (|µ0 − µ1|) increases.
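A minimal sketch of the one-sided sample-size calculation, assuming the standard known-variance formula n = σ²(z_{1-β} + z_{1-α})²/(µ0 − µ1)² and rounding up to the next whole subject. The inputs (µ0 = 120, µ1 = 115, σ = 24, 80% power) are hypothetical illustration values, and `sample_size_one_sided` is a name introduced here.

```python
from math import ceil
from statistics import NormalDist

def sample_size_one_sided(mu0, mu1, sigma, alpha=0.05, power=0.80):
    """Smallest n for a one-sided z test with the given significance level and power."""
    z = NormalDist()
    n_exact = sigma ** 2 * (z.inv_cdf(power) + z.inv_cdf(1 - alpha)) ** 2 / (mu0 - mu1) ** 2
    return ceil(n_exact)                    # round up: n must be a whole number

# Hypothetical illustration values (not taken from the slides)
n_needed = sample_size_one_sided(120, 115, 24)
n_more_power = sample_size_one_sided(120, 115, 24, power=0.90)
n_bigger_gap = sample_size_one_sided(120, 110, 24)

print(n_needed, n_more_power, n_bigger_gap)
```

Raising the required power increases n, while doubling the distance between the null and alternative means cuts n to roughly a quarter, in line with the factors listed above.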
Sample-Size Estimation When Testing for the Mean of a
Normal Distribution (Two-Sided Case)
Suppose we wish to test H0: µ = µ0 vs. H1: µ = µ1 where the data are
normally distributed with mean µ and known variance σ². The
sample size needed to conduct a two-sided test with significance
level α and power 1 − β is
n = σ²(z_{1-β} + z_{1-α/2})²/(µ0 − µ1)²
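For the two-sided case the only change from the one-sided calculation is that z_{1-α} is replaced by z_{1-α/2}. The sketch below assumes the standard formula n = σ²(z_{1-β} + z_{1-α/2})²/(µ0 − µ1)²; the inputs are hypothetical illustration values and `sample_size_two_sided` is a name introduced here.

```python
from math import ceil
from statistics import NormalDist

def sample_size_two_sided(mu0, mu1, sigma, alpha=0.05, power=0.80):
    """Smallest n for a two-sided z test with the given significance level and power."""
    z = NormalDist()
    n_exact = sigma ** 2 * (z.inv_cdf(power) + z.inv_cdf(1 - alpha / 2)) ** 2 / (mu0 - mu1) ** 2
    return ceil(n_exact)

# Hypothetical illustration values (not taken from the slides)
n_two_sided = sample_size_two_sided(120, 115, 24)
print(n_two_sided)
```

With the same inputs, the two-sided test needs more subjects than the one-sided test because α is split between the two tails.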
Sample-Size Estimation Based on CI Width
Suppose we wish to estimate the mean of a normal distribution with
sample variance s² and require that the two-sided 100% × (1 − α) CI
for µ be no wider than L. The number of subjects needed is
approximately
n = 4(z_{1-α/2})²s²/L²
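The CI-width calculation can be sketched in the same way, assuming the approximation n = 4(z_{1-α/2})²s²/L². The inputs (s = 40 mg/dL, as in the cholesterol example, and a maximum width of L = 10 mg/dL) are hypothetical; `sample_size_for_ci_width` is a name introduced here.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_ci_width(s, width, alpha=0.05):
    """Approximate n so that the two-sided 100(1 - alpha)% CI for the mean is no wider than width."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil(4 * z ** 2 * s ** 2 / width ** 2)

# Hypothetical illustration values
n_ci = sample_size_for_ci_width(s=40, width=10)
print(n_ci)
```

Halving the permitted width roughly quadruples the required sample size, because L enters the formula squared.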
One-sample Inference for the Binomial Distribution:
Normal-Theory Methods
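To test H0: p = p0 vs. H1: p ≠ p0, let the test statistic be
z = (p̂ − p0)/√(p0q0/n), where p̂ is the sample proportion and q0 = 1 − p0.
If z < z_{α/2} or z > z_{1-α/2}, then H0 is rejected.
If z_{α/2} ≤ z ≤ z_{1-α/2}, then H0 is accepted. This test should only be used if
np0q0 ≥ 5.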
Computation of the p-value for the one-Sample Binomial
Test: Normal Theory Method (Two-Sided Alternative)
Let the test statistic be z = (p̂ − p0)/√(p0q0/n).
If p̂ < p0, then p-value = 2 × Φ(z) = twice the area to the left of z under
an N(0,1) curve. If p̂ ≥ p0, then p-value = 2 × [1 − Φ(z)] = twice the area
to the right of z under an N(0,1) curve.
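A minimal sketch of the normal-theory binomial test, assuming the uncorrected statistic z = (p̂ − p0)/√(p0q0/n) (no continuity correction). The data (30 events in 200 subjects, p0 = 0.10) are hypothetical, and `binomial_z_test` is a name introduced here.

```python
from math import sqrt
from statistics import NormalDist

def binomial_z_test(x, n, p0):
    """Return (z, two-sided p-value) for H0: p = p0 using the normal approximation."""
    q0 = 1 - p0
    if n * p0 * q0 < 5:
        raise ValueError("normal approximation not recommended: n*p0*q0 < 5")
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * q0 / n)
    phi = NormalDist().cdf(z)
    # Twice the lower tail if p_hat < p0, otherwise twice the upper tail
    p_value = 2 * phi if p_hat < p0 else 2 * (1 - phi)
    return z, p_value

# Hypothetical data: 30 events among 200 subjects, null proportion 0.10
z_stat, p_value = binomial_z_test(30, 200, 0.10)
print(f"z = {z_stat:.3f}, p-value = {p_value:.4f}")
```

The guard on n·p0·q0 mirrors the rule of thumb on the slide; when it fails, the exact method is the appropriate choice.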
Computation of the p-Value for the One-Sample
Binomial Test: Exact Method (Two-Sided Alternative)
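If np0q0 < 5, the exact method should be used. With x = observed number of events:
If p̂ < p0, p = min[2 × Σ_{k=0}^{x} C(n,k) p0^k q0^(n−k), 1]
If p̂ ≥ p0, p = min[2 × Σ_{k=x}^{n} C(n,k) p0^k q0^(n−k), 1]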
Power for the One-Sample Binomial Test (Two-Sided
Alternative)
The power of the test of H0: p = p0 vs. H1: p ≠ p0 for the specific alternative p = p1 is given by
Φ[√(p0q0/(p1q1)) × (z_{α/2} + |p0 − p1|√n/√(p0q0))]
Sample-Size Estimation for the One-Sample Binomial Test (Two-Sided
Alternative)
Suppose we wish to test H0: p = p0 vs. H1: p ≠ p0. The sample size
needed to conduct a two-sided test with significance level α and power
1 − β vs. the specific alternative hypothesis p = p1 is
n = p0q0[z_{1-α/2} + z_{1-β}√(p1q1/(p0q0))]²/(p1 − p0)²
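Power and sample size for the one-sample binomial test can be sketched together, assuming the normal-theory formulas power = Φ[√(p0q0/(p1q1))(z_{α/2} + |p0 − p1|√n/√(p0q0))] and n = p0q0[z_{1-α/2} + z_{1-β}√(p1q1/(p0q0))]²/(p1 − p0)². The proportions (p0 = 0.02, p1 = 0.05) are hypothetical illustration values, and both function names are introduced here.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()                            # standard normal distribution

def binomial_power(p0, p1, n, alpha=0.05):
    """Approximate power of the two-sided one-sample binomial test against p = p1."""
    q0, q1 = 1 - p0, 1 - p1
    shift = abs(p0 - p1) * sqrt(n) / sqrt(p0 * q0)
    return Z.cdf(sqrt(p0 * q0 / (p1 * q1)) * (Z.inv_cdf(alpha / 2) + shift))

def binomial_sample_size(p0, p1, alpha=0.05, power=0.90):
    """Smallest n giving the requested power against p = p1 (rounded up)."""
    q0, q1 = 1 - p0, 1 - p1
    bracket = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power) * sqrt(p1 * q1 / (p0 * q0))
    return ceil(p0 * q0 * bracket ** 2 / (p1 - p0) ** 2)

# Hypothetical illustration values
power_500 = binomial_power(0.02, 0.05, n=500)
n_for_90 = binomial_sample_size(0.02, 0.05, power=0.90)

print(f"power with n = 500: {power_500:.3f}")
print(f"n needed for 90% power: {n_for_90}")
```

As a consistency check, evaluating the power at the computed sample size should return a value close to the requested 90%.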
Two-Sample t Test for Independent Samples with
Equal Variances (used for cross-sectional study)
If the difference between the two sample means
x̄1 − x̄2 is far from 0, then H0 will be rejected; otherwise, it will be
accepted.
Because the two samples are independent, X̄1 − X̄2 is normally
distributed with mean µ1 − µ2 and variance σ²(1/n1 + 1/n2). In
symbols,
X̄1 − X̄2 ~ N[µ1 − µ2, σ²(1/n1 + 1/n2)]
Under H0, µ1 = µ2, so X̄1 − X̄2 ~ N[0, σ²(1/n1 + 1/n2)]
https://www.r-bloggers.com/2009/07/two-sample-students-t-test-1/
If σ² were known, then X̄1 − X̄2 could be divided by σ√(1/n1 + 1/n2) to get
(X̄1 − X̄2)/[σ√(1/n1 + 1/n2)] ~ N(0, 1) under H0
However, σ² in general is unknown and must be estimated from the data
using the sample variances s1² and s2². The pooled estimate of the variance from
two independent samples is given by
s² = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2)
Suppose we wish to test the hypothesis H0: µ1 = µ2 vs. H1: µ1 ≠ µ2 with a
significance level of α for two normally distributed populations, where σ² is
assumed to be the same for each population. Compute the test statistic
t = (x̄1 − x̄2)/[s√(1/n1 + 1/n2)]
If t > t_{n1+n2-2,1-α/2} or t < −t_{n1+n2-2,1-α/2}, then H0 is rejected.
If −t_{n1+n2-2,1-α/2} ≤ t ≤ t_{n1+n2-2,1-α/2}, then H0 is accepted.
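The equal-variance two-sample t test can be sketched from summary statistics, assuming the pooled variance s² = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2) and t = (x̄1 − x̄2)/[s√(1/n1 + 1/n2)]. The group summaries below are hypothetical illustration values, and `t_cdf` and `pooled_t_test` are names introduced here; `t_cdf` approximates the t distribution function by Simpson's rule.

```python
import math

def t_cdf(t, df, steps=2000):
    """Pr(T <= t) for a t distribution with df degrees of freedom (Simpson's rule)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps
    total = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df, two-sided p-value) for the equal-variance two-sample t test."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    p = 2 * (1 - t_cdf(abs(t), df))
    return t, df, p

# Hypothetical summary statistics for two independent groups
t_stat, df, p_value = pooled_t_test(132.86, 15.34, 8, 127.44, 18.23, 21)
print(f"t = {t_stat:.3f} on {df} df, two-sided p-value = {p_value:.3f}")
```

The sketch takes the equal-variance assumption as given; it does not check whether the two sample variances are in fact compatible.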
Summary
In this chapter, we introduced
1. Specification of the null (H0) and alternative (H1) hypotheses;
2. type I error (α), type II error (β), and the power (1 − β) of a hypothesis test;
the p-value of a hypothesis test and the distinction between one-sided and
two-sided tests;
3. methods for estimating appropriate sample size as determined by the
prespecified null and alternative hypotheses and type I and type II errors.
4. These concepts were applied to many one-sample or two-sample
hypothesis testing cases. Each of the hypothesis tests was shown to be
conducted in one of two ways
1. Specifying critical values to determine the acceptance and rejection regions
(critical-value method) based on a specified type I error α.
2. Computing p-values (p-value method)
The End
Appendix
Paired T test
The Paired t Test (used for longitudinal study)
Assume that the systolic blood pressure (SBP) of the ith woman is
normally distributed at baseline with mean µi and variance σ² and at
follow-up with mean µi + Δ and variance σ².
Δ: mean difference in SBP between follow-up and baseline.
If Δ = 0, the difference is 0; if Δ > 0, then oral contraceptive (OC) pills are
associated with increased mean SBP; if Δ < 0, then OC pills are associated
with lowered mean SBP.
The Paired t Test (cont…)
Hypothesis settings:
• H0: Δ = 0 vs. H1: Δ ≠ 0; σd² (the variance of the differences) is unknown; difference di = xi2 − xi1
• Although BP levels are different for each woman, the difference in BP
between baseline and follow-up have the same mean and variance
over the entire population of women. Thus, it can be considered a
one-sample t test based on the differences (di).
The test statistic is defined as
t = d̄/(s_d/√n) ~ t_{n-1} under H0
• d̄ is the mean of the observed differences, d̄ = (d1 + d2 + … + dn)/n
• s_d is the sample standard deviation of the observed differences:
s_d = √[Σ(di − d̄)²/(n − 1)]
• n = number of matched pairs
If t > t_{n-1,1-α/2} or t < −t_{n-1,1-α/2}, then H0 is rejected.
If −t_{n-1,1-α/2} ≤ t ≤ t_{n-1,1-α/2}, then H0 is accepted.
Computation of the p-Value for the Paired t Test
If t < 0, p = 2 × [the area to the left of t = d̄/(s_d/√n) under a t_{n-1}
distribution]
If t ≥ 0, p = 2 × [the area to the right of t under a t_{n-1} distribution]
Interval Estimation for the Comparison of Means
from Two Paired Samples
Confidence Interval for the True Difference (Δ) Between the
Underlying Means of Two Paired Samples (Two-Sided)
A two-sided 100% × (1 − α) CI for the true mean difference (Δ)
between two paired samples is given by
(d̄ − t_{n-1,1-α/2} s_d/√n, d̄ + t_{n-1,1-α/2} s_d/√n)
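The paired t test and the confidence interval for the mean difference can be sketched together. The baseline and follow-up SBP readings below are hypothetical illustration data for 10 women, and `t_cdf` and `t_quantile` are names introduced here that approximate the t distribution numerically (Simpson's rule and bisection).

```python
import math
from statistics import mean, stdev

def t_cdf(t, df, steps=2000):
    """Pr(T <= t) for a t distribution with df degrees of freedom (Simpson's rule)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps
    total = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_quantile(prob, df):
    """Value q with Pr(T <= q) = prob, located by bisection on t_cdf."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical SBP readings (mm Hg) for 10 women
baseline = [115, 112, 107, 119, 115, 138, 126, 105, 104, 115]
follow_up = [128, 115, 106, 128, 122, 145, 132, 109, 102, 117]

diffs = [after - before for before, after in zip(baseline, follow_up)]
n = len(diffs)
d_bar, s_d = mean(diffs), stdev(diffs)      # stdev uses the n - 1 denominator
se = s_d / math.sqrt(n)

t_stat = d_bar / se
p_value = 2 * (1 - t_cdf(abs(t_stat), n - 1))

alpha = 0.05
t_crit = t_quantile(1 - alpha / 2, n - 1)
ci = (d_bar - t_crit * se, d_bar + t_crit * se)

print(f"mean difference = {d_bar:.2f}, t = {t_stat:.2f}, p-value = {p_value:.4f}")
print(f"95% CI for the mean difference: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval excludes 0 exactly when the two-sided test rejects H0 at the same α, which is why the two calculations share the standard error and the critical value.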
One-Sample Inference for the Poisson Distribution
(Small-Sample Test: p-value Method)
Let µ = expected value of a Poisson distribution. To test the hypothesis
H0: µ = µ0 vs. H1: µ ≠ µ0:
1. Compute x = observed number of deaths in the study population
2. Under H0, the random variable X will follow a Poisson distribution
with parameter µ0. Thus, the exact two-sided p-value is given by
p = min[2 × Σ_{k=0}^{x} e^(−µ0) µ0^k/k!, 1] if x < µ0
p = min[2 × (1 − Σ_{k=0}^{x−1} e^(−µ0) µ0^k/k!), 1] if x ≥ µ0
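The exact Poisson p-value can be sketched by summing Poisson probabilities: twice Pr(X ≤ x) when x < µ0, twice Pr(X ≥ x) otherwise, capped at 1. The counts (4 observed deaths against 3.3 expected) are hypothetical illustration values, and the function names are introduced here.

```python
from math import exp, factorial

def poisson_cdf(k, mu):
    """Pr(X <= k) for X ~ Poisson(mu); returns 0 when k < 0."""
    return sum(exp(-mu) * mu ** j / factorial(j) for j in range(k + 1))

def poisson_exact_p(x, mu0):
    """Exact two-sided p-value for H0: mu = mu0 given an observed count x."""
    if x < mu0:
        p = 2 * poisson_cdf(x, mu0)             # twice the lower tail
    else:
        p = 2 * (1 - poisson_cdf(x - 1, mu0))   # twice the upper tail Pr(X >= x)
    return min(p, 1.0)

# Hypothetical data: 4 deaths observed, 3.3 expected under H0
p_poisson = poisson_exact_p(4, 3.3)
print(f"exact two-sided p-value = {p_poisson:.3f}")
```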
• In a longitudinal or follow-up study, the same group of people is
followed over time.
• (Two samples are paired -> the paired t test)
• In a cross-sectional study, the participants are seen at only one point
in time.
• (Two samples are independent -> the two-sample t test)