m09-inference
Testing
Curtis Huttenhower
Jason Lloyd-Price
http://huttenhower.sph.harvard.edu/bst281
Topics
• Basic idea
• Performance evaluation
03/26/2025 2
Example
• On the phone: The lecturer for BST 281 has blond hair
Example
p-value
Null distributions
• The null hypothesis is the statement tested for possible rejection
◦ Usually that the results are due to chance
◦ I.e. there is no effect/bias/relationship/etc.
◦ E.g. There is no bias towards Heads/Tails, “this is a chicken”
◦ Denoted H0
• The null distribution is the distribution of the test statistic if the null
hypothesis is true
◦ E.g. binomial distribution, known distribution of chicken weights
◦ For test statistic T: P(T = t | H0)
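As an illustration of a null distribution, the sketch below tabulates P(T = t | H0) for the Heads/Tails example, where T is the number of heads. The choice of 10 flips and the function name `binomial_null_pmf` are assumptions made for this example, not part of the lecture.

```python
from math import comb

def binomial_null_pmf(n, p=0.5):
    """Return [P(T = t | H0) for t = 0..n] when T ~ Binomial(n, p)."""
    return [comb(n, t) * p**t * (1 - p) ** (n - t) for t in range(n + 1)]

# Hypothetical setup: 10 flips of a coin that is fair under H0.
pmf = binomial_null_pmf(10)
for t, prob in enumerate(pmf):
    print(f"P(T = {t:2d} | H0) = {prob:.4f}")
```

The probabilities sum to 1, and the most likely outcome under H0 is 5 heads.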
p-values
• The p-value is the probability of observing an equal or more extreme value of
the test statistic than observed, assuming the null hypothesis
◦ For test statistic T, this is P(T ≥ t | H0)
◦ Quantifies “surprise”
◦ Lower -> observation is more unlikely -> more surprised
• Chicken example:
◦ Adult chickens weigh 6.2 ± 0.8 lb
◦ Observed mass is 10 lb
◦ z = (10 – 6.2) / 0.8 = 4.75
◦ P(|z| ≥ 4.75 | This is a chicken) = 2.03 × 10⁻⁶
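The chicken calculation can be checked with a few lines of standard-library Python. This is a minimal sketch that assumes chicken weights are normally distributed with the stated mean and standard deviation; the helper name `two_sided_normal_p` is made up for this example.

```python
from math import erfc, sqrt

def two_sided_normal_p(z):
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return erfc(abs(z) / sqrt(2))

mean_lb, sd_lb = 6.2, 0.8   # null distribution: adult chicken weights
observed_lb = 10.0          # observed mass

z = (observed_lb - mean_lb) / sd_lb
p = two_sided_normal_p(z)
print(f"z = {z:.2f}, two-sided p = {p:.3g}")
```

The result agrees with the value on the slide to three significant figures.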
One-tail and two-tail tests
• What observations are considered “extreme”?
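One way to see the difference is to compute both p-values for the same observation. The sketch below uses a hypothetical result of 8 heads in 10 flips of a coin that is fair under H0; the numbers and the helper `binom_tail_ge` are assumptions for illustration only.

```python
from math import comb

def binom_tail_ge(k, n, p=0.5):
    """P(T >= k | H0) for T ~ Binomial(n, p)."""
    return sum(comb(n, t) * p**t * (1 - p) ** (n - t) for t in range(k, n + 1))

n, heads = 10, 8  # hypothetical observation

# One-tailed: only "too many heads" counts as extreme.
p_one = binom_tail_ge(heads, n)

# Two-tailed: "too many heads" and "too many tails" both count as extreme.
# For a fair coin the null distribution is symmetric, so the tail is doubled.
p_two = min(1.0, 2 * p_one)

print(f"one-tailed p = {p_one:.4f}, two-tailed p = {p_two:.4f}")
```

The doubling shortcut is only valid here because the null distribution is symmetric.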
Simple parametric tests
• When the population mean and standard deviation are unknown
◦ Data is still assumed normal
◦ Null distribution is “Student’s t” distribution
◦ t-test
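A minimal sketch of the one-sample t statistic follows, using only the standard library. The sample weights are invented for illustration, and `one_sample_t` is a hypothetical helper name. Turning the statistic into a p-value requires the Student's t distribution with n − 1 degrees of freedom, which is not in the standard library and is left out here.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 denominator
    return (mean(sample) - mu0) / standard_error, n - 1

weights_lb = [6.0, 6.5, 7.1, 5.8, 6.9]  # hypothetical measurements
t_stat, dof = one_sample_t(weights_lb, mu0=6.2)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```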
Nonparametric tests
• Test statistic is insensitive to the distribution (e.g. by rank transform)
◦ “Non-parametric t-test”: Mann-Whitney U test
◦ Cost is decreased sensitivity
• Permutation test
◦ p-value is calculated by repeated randomizations of the data
◦ Cost is a significant increase in computation time
◦ Can also be used to test how well your test statistic fits a
specific assumption
[Figure: Quantile-Quantile (Q-Q) plot]
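The permutation test described on this slide can be sketched as follows. This is one possible version under stated assumptions: the test statistic is the absolute difference in group means, the group values are invented, and `permutation_test` is a hypothetical helper name. The +1 in the numerator and denominator counts the observed labeling as one of the permutations, which keeps the estimate from being exactly zero.

```python
import random
from statistics import mean

def permutation_test(a, b, n_perm=2000, seed=0):
    """Estimate a two-sided p-value for a difference in means by shuffling group labels."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of values to the two groups
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

group_a = [5.9, 6.4, 6.1, 6.3, 6.0]    # hypothetical measurements
group_b = [9.8, 10.1, 9.5, 10.4, 9.9]  # hypothetical measurements
p_perm = permutation_test(group_a, group_b)
print(f"permutation p = {p_perm:.4f}")
```

The computational cost is visible in `n_perm`: the statistic is recomputed once per randomization, and resolving smaller p-values requires more randomizations.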
Performance evaluation
• How do you assess the accuracy of a hypothesis test?
• Compare with data for which the answer is known: Gold standard
◦ Negatives: null hypothesis should not be rejected (drawn from H0)
◦ Positives: null hypothesis expected to be rejected (drawn from HA)
Performance evaluation
• Probability of incorrect call
◦ False positive rate: FPR = P(reject H0 | H0)
◦ False negative rate: FNR = P(!reject H0 | HA)
• Outcomes by true state
                     H0    HA
  H0 Not Rejected    TN    FN
  H0 Rejected        FP    TP
• Precision vs recall
◦ Upper-right is good
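The quantities on this slide all follow from the four counts TN, FN, FP, and TP. The sketch below computes them for a set of invented gold-standard counts; the function name `rates` and the numbers are assumptions for illustration.

```python
def rates(tp, fp, tn, fn):
    """Performance measures from gold-standard counts of test outcomes."""
    return {
        "FPR": fp / (fp + tn),          # P(reject H0 | H0)
        "FNR": fn / (fn + tp),          # P(do not reject H0 | HA)
        "power": tp / (tp + fn),        # recall / sensitivity, equals 1 - FNR
        "precision": tp / (tp + fp),    # fraction of rejections that are true positives
        "specificity": tn / (tn + fp),  # equals 1 - FPR
    }

# Hypothetical gold standard: 100 negatives (from H0) and 50 positives (from HA).
result = rates(tp=40, fp=10, tn=90, fn=10)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```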
Receiver Operating Characteristic (ROC)
• Sensitivity/specificity plots
◦ TPR vs FPR
◦ Upper left is good
◦ Well-defined behavior when guessing: random calls fall on the diagonal (AUC = 0.5)
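An ROC curve can be traced by sweeping the rejection threshold from strict to lenient and recording (FPR, TPR) at each step. The sketch below assumes higher scores mean stronger evidence for HA (for p-values one could use 1 − p), assumes there are no tied scores, and uses invented data; `roc_points` and `auc` are hypothetical helper names.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points; labels are True for gold-standard positives."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, is_positive in ranked:  # lower the threshold one item at a time
        if is_positive:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]          # hypothetical test scores
labels = [True, True, False, True, False, False]  # hypothetical gold standard
curve = roc_points(scores, labels)
area = auc(curve)
print(f"AUC = {area:.3f}")
```

For this data the AUC is 8/9: of the nine positive/negative pairs, eight have the positive scored higher.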
Testing many hypotheses
• Recall that under the null hypothesis, p-values follow a uniform distribution on [0, 1]
◦ Each test rejects H0 with probability α, even if there’s no biological effect
◦ With m tests, about α × m false positives are expected
• Bonferroni correction
◦ Multiply p-values by the number of tests (capping the result at 1)
◦ Very strict/conservative
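The Bonferroni correction is short enough to write out directly. The p-values below are invented for illustration, and `bonferroni` is a hypothetical helper name.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

alpha = 0.05
p_values = [0.00001, 0.004, 0.03, 0.2]  # hypothetical raw p-values from 4 tests
adjusted = bonferroni(p_values)
rejected = [p_adj <= alpha for p_adj in adjusted]

for p, p_adj, rej in zip(p_values, adjusted, rejected):
    print(f"raw p = {p:<8g} adjusted p = {p_adj:<8g} reject H0: {rej}")
```

The third test would be rejected at α = 0.05 without correction but not after it, which is the conservative behavior the slide refers to.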
A simple experiment
• Create a 5000 x 15 matrix of random values
◦ Genes and samples
• Find gene most correlated with Age
• Does this gene truly predict longevity?
[Table: features (Age, Gene1, Gene2, …) in rows and samples (Sample1, Sample2, …) in columns; every cell is =RAND(), e.g. 0.150953, 0.821741, 0.158316, 0.898557]
[Figure: scatter plot of Age vs. Gene989; Pearson correlation = 0.81, p-value = 0.000252, q-value = 1]
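The experiment is easy to repeat in code. The sketch below fills a 5000 x 15 matrix and an Age vector with uniform random numbers and reports the gene most correlated with Age. The seed is arbitrary, so the winning gene and its correlation will differ from the spreadsheet on the slide, and `pearson` is a helper written for this example.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

rng = random.Random(281)  # arbitrary seed, chosen only for reproducibility
n_genes, n_samples = 5000, 15

age = [rng.random() for _ in range(n_samples)]
genes = [[rng.random() for _ in range(n_samples)] for _ in range(n_genes)]

correlations = [pearson(gene, age) for gene in genes]
best = max(range(n_genes), key=lambda i: abs(correlations[i]))
print(f"Gene{best + 1} has the strongest correlation with Age: r = {correlations[best]:.2f}")
```

Every value is random, so any strong correlation found this way arises by chance alone from searching 5000 candidates.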
Statistical significance vs biological significance
Accepting the null
• Consider:
◦ Adult chickens weigh 6.2 ± 0.8 lb
◦ On the phone: This weighs 6.1 lb
◦ ⇒ This is a chicken
Summary
• Test statistics, null hypothesis and distribution, and p-values
• Performance evaluation
◦ False positive/negative rates, power, precision and specificity
◦ Precision/recall plots
◦ Receiver Operating Characteristic (ROC) and the Area Under the Curve (AUC)