m09-inference

The document discusses inference and hypothesis testing, covering concepts such as test statistics, p-values, and performance evaluation metrics like false positive/negative rates and power. It also addresses the challenges of repeated tests in high-dimensional data and methods for multiple hypothesis correction, including Bonferroni and FDR adjustments. Additionally, it emphasizes the distinction between statistical significance and biological relevance.


Inference and Hypothesis Testing
Curtis Huttenhower
Jason Lloyd-Price

http://huttenhower.sph.harvard.edu/bst281
Topics
• Basic idea

• Test statistics and p-values

• Performance evaluation

• Repeated tests and high-dimensional data

03/26/2025 2
Example

• Eric has dark hair

• On the phone: The lecturer for BST 281 has blond hair

⇒ The lecturer is not Eric

Example

• Test statistic: weight
• Null hypothesis: this is a chicken
• Null distribution: adult chickens weigh 6.2 ± 0.8 lb
• Observation (on the phone): This weighs 10 lb

⇒ This is probably not a chicken (quantified by the p-value)

Null distributions
• The null hypothesis is the statement tested for possible rejection
◦ Usually that the results are due to chance
◦ I.e. there is no effect/bias/relationship/etc.
◦ E.g. There is no bias towards Heads/Tails, “this is a chicken”
◦ Denoted H0

• The alternative hypothesis is everything else
◦ Denoted HA

• The null distribution is the distribution of the test statistic if the null hypothesis is true
◦ E.g. binomial distribution, known distribution of chicken weights
◦ For test statistic T: P(T = t | H0)
p-values
• The p-value is the probability of observing an equal or more extreme value of
the test statistic than observed, assuming the null hypothesis
◦ For test statistic T, this is P(T ≥ t | H0)
◦ Quantifies “surprise”
◦ Lower -> observation is more unlikely -> more surprised

• When the null hypothesis is true, p-values are uniformly distributed on [0, 1]

• p-values < α are considered “significant”
◦ I.e. reject the null hypothesis
◦ α is the fraction of tests that will be significant when the null hypothesis is true
◦ Usually 0.05 or 0.01
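
A minimal simulation sketch of the two claims above: when H0 is true, p-values are uniform, so about a fraction α of tests come out "significant". It assumes a standard-normal test statistic and a two-sided p-value; the function names, the 20,000 tests, and the seed are illustrative choices, not from the slides.

```python
import math
import random

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))

def fraction_significant(n_tests=20000, alpha=0.05, seed=0):
    """Simulate tests in which H0 is true; return the fraction with p < alpha."""
    rng = random.Random(seed)
    # Under H0 the statistic is drawn straight from the null distribution.
    p_values = [two_sided_p(rng.gauss(0.0, 1.0)) for _ in range(n_tests)]
    return sum(p < alpha for p in p_values) / n_tests

frac = fraction_significant()
print(f"fraction significant under H0: {frac:.3f}")
```

The printed fraction should land close to α = 0.05, up to simulation noise.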
z-test

• If the test statistic is normally distributed under the null
◦ z = (x - µ) / σ

• Chicken example:
◦ Adult chickens weigh 6.2 ± 0.8 lb
◦ Observed mass is 10 lb
◦ z = (10 – 6.2) / 0.8 = 4.75
◦ P(|z| ≥ 4.75 | This is a chicken) = 2.03 × 10⁻⁶
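
The chicken calculation can be sketched in a few lines. This assumes the 0.8 lb is the standard deviation of a normal null distribution; the function names are illustrative.

```python
import math

def z_score(x, mu, sigma):
    """Standardize an observation against the null mean and standard deviation."""
    return (x - mu) / sigma

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))

z = z_score(10.0, 6.2, 0.8)   # observed 10 lb vs. chickens at 6.2 ± 0.8 lb
p = two_sided_p_value(z)
print(f"z = {z:.2f}, p = {p:.3g}")
```

This should reproduce z = 4.75 and a two-sided p-value of about 2.03 × 10⁻⁶.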

One-tail and two-tail tests
• What observations are considered “extreme”?

• One-sided/one-tail tests test whether the test statistic is higher (or lower) than chance, in one chosen direction
◦ H0 : µ ≥ 0, HA : µ < 0

• Two-sided/two-tail tests test whether the test statistic differs from chance in either direction
◦ H0 : µ = 0, HA : µ ≠ 0

Simple parametric tests
• When the population mean and standard deviation are unknown
◦ Data is still assumed normal
◦ Null distribution is “Student’s t” distribution
◦ t-test

• Testing Pearson correlations
◦ Usually H0 : ρ = 0, HA : ρ ≠ 0
◦ tanh⁻¹(ρ) → approximately Normal (Fisher transformation)
◦ -> z-test

• Often make very strong assumptions about the data
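
A hedged sketch of the correlation z-test outlined above. It assumes the usual Fisher approximation, in which tanh⁻¹ of the sample correlation has standard error 1/√(n − 3) under H0 : ρ = 0. The function name and the example inputs (r = 0.5, n = 30) are hypothetical.

```python
import math

def fisher_z_test(r, n):
    """Approximate two-sided test of H0: rho = 0.

    r is the sample Pearson correlation from n paired observations.
    Returns (z, p_value).
    """
    if n <= 3 or not -1.0 < r < 1.0:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = math.atanh(r) * math.sqrt(n - 3)   # transformed value / standard error
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail probability
    return z, p

z, p = fisher_z_test(0.5, 30)
print(f"z = {z:.3f}, p = {p:.4f}")
```

The approximation is rough for small n, so an exact t-based test can give a noticeably different p-value there.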

Nonparametric tests
• Test statistic is insensitive to the distribution (e.g. by rank transform)
◦ “Non-parametric t-test”: Mann-Whitney U test
◦ Cost is decreased sensitivity

• Permutation test
◦ p-value is calculated by repeated randomizations of the data
◦ Cost is a significant increase in computation time
◦ Can also be used to test how well your test statistic fits a specific assumption
→ Quantile-Quantile (Q-Q) plot
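
A minimal permutation-test sketch, assuming the test statistic is the absolute difference in group means. The function name, the number of permutations, and the example weights are all hypothetical.

```python
import random

def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided permutation p-value for a difference in means between a and b."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomize group labels
        pa, pb = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(pa) / len(pa) - sum(pb) / len(pb))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return (hits + 1) / (n_perm + 1)

group_a = [6.1, 5.8, 6.5, 6.3, 5.9, 6.4]   # hypothetical weights (lb)
group_b = [7.9, 8.4, 7.6, 8.8, 8.1, 7.7]
p_perm = permutation_test(group_a, group_b)
print(f"permutation p-value: {p_perm:.4f}")
```

The cost noted above is visible here: the statistic is recomputed once per permutation, so precision is bought with run time.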

Performance evaluation
• How do you assess the accuracy of a hypothesis test?

• Compare with data for which the answer is known: Gold standard
◦ Negatives: null hypothesis should not be rejected (drawn from H0)
◦ Positives: null hypothesis expected to be rejected (drawn from HA)

• Perform the test on the gold standard data. Possible outcomes:

                   H0 True                   H0 False
H0 Not Rejected    True Negative             False Negative (Type II)
H0 Rejected        False Positive (Type I)   True Positive

Performance evaluation
• Probability of incorrect call
◦ False positive rate: FPR = P(reject H0 | H0)
◦ False negative rate: FNR = P(!reject H0 | HA)

                   H0    HA
H0 Not Rejected    TN    FN
H0 Rejected        FP    TP

• Power: probability of detecting a true effect
◦ P(reject H0 | HA)
◦ Also called recall, true positive rate (TPR), sensitivity
• Precision: probability a detected effect is true
◦ P(HA | reject H0)
• Specificity: probability that a true null is not rejected
◦ P(!reject H0 | H0)
◦ Also true negative rate (TNR)
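
The rates above can be estimated from the four gold-standard counts. A short sketch; the function name and the example counts are hypothetical.

```python
def performance(tp, fp, tn, fn):
    """Estimate test performance from gold-standard counts."""
    return {
        "FPR": fp / (fp + tn),          # P(reject H0 | H0)
        "FNR": fn / (fn + tp),          # P(!reject H0 | HA)
        "power": tp / (tp + fn),        # P(reject H0 | HA), i.e. recall / TPR
        "precision": tp / (tp + fp),    # P(HA | reject H0)
        "specificity": tn / (tn + fp),  # P(!reject H0 | H0), i.e. TNR
    }

# Hypothetical gold standard: 900 negatives and 100 positives.
metrics = performance(tp=80, fp=45, tn=855, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that FPR + specificity = 1 and FNR + power = 1, while precision also depends on how many positives the gold standard contains.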
Precision/recall plots (PR)

• Most tests have a parameter that adjusts their sensitivity
◦ E.g. α

• Precision vs recall
◦ Upper-right is good

Receiver Operating Characteristic (ROC)

• Sensitivity/specificity plots
◦ TPR vs FPR
◦ Upper left is good
◦ Well-defined behavior when guessing: random calls fall on the diagonal

• Area Under the Curve (AUC)


◦ Perfect = 1
◦ Random = 0.5
◦ Perfectly wrong = 0
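
One way to sketch the AUC without drawing the curve: it equals the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counting one half. This assumes each gold-standard item has a score where higher means stronger evidence against H0 (for example 1 − p); the function name and scores are hypothetical.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos_scores) * len(neg_scores))

print(auc([0.9, 0.8], [0.1, 0.2]))   # perfect separation
print(auc([0.9, 0.4], [0.5, 0.1]))   # one positive ranked below a negative
print(auc([0.1, 0.2], [0.9, 0.8]))   # perfectly wrong
```

The pairwise loop is quadratic in the number of items, which is fine for a sketch but slow for large gold standards.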

Testing many hypotheses
• Recall that under the null hypothesis, p-values follow a uniform distribution in [0, 1]
◦ The probability of rejecting is α, even if there’s no biological effect

• Consider 20,000 genes measured at once by microarray/RNA-seq
◦ Test each gene for differences
◦ How many false positives are expected?

• What can we do?


◦ Can change the test statistic to “maximum difference over the dataset”
◦ Null hypothesis: “no effect for any of the features”
◦ Can instead “adjust” the p-value to account for multiple hypothesis testing
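
A back-of-the-envelope sketch for the 20,000-gene question. It assumes every null hypothesis is true, and the second function additionally assumes the tests are independent; the function names are illustrative.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of p-values below alpha when every H0 is true."""
    return n_tests * alpha

def prob_at_least_one_false_positive(n_tests, alpha):
    """Chance of one or more false positives, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

print(expected_false_positives(20000, 0.05))          # about 1000 genes
print(prob_at_least_one_false_positive(20, 0.05))     # already about 0.64 for 20 tests
```

With α = 0.05, roughly 1,000 of the 20,000 genes would be called significant by chance alone.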
Multiple hypothesis correction

• Bonferroni correction
◦ Multiply p-values by the number of tests
◦ Very strict/conservative

• Control False Discovery Rate (FDR)
◦ % of significant tests expected to be false positives
◦ FDR q-value = (# tests) * (p-value) / (p-value rank)
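
A sketch of both corrections. The Bonferroni function multiplies by the number of tests, and the FDR function applies the rank formula above in the Benjamini-Hochberg style. Capping values at 1 and forcing q-values to be monotone in the p-value rank are standard additions assumed here, not stated on the slide; the function names and example p-values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def fdr_q_values(p_values):
    """q = (# tests) * p / rank, made monotone from the largest p downwards."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # largest p-value first
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

p_vals = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(p_vals))
print(fdr_q_values(p_vals))
```

Every Bonferroni-adjusted value is at least as large as the matching q-value, which is the sense in which Bonferroni is the stricter correction.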

A simple experiment
• Create a 5000 x 15 matrix of random values
◦ Genes and samples
◦ Values generated with =RAND()

Feature   Sample1    Sample2    Sample3    Sample4
Age       0.150953   0.821741   0.158316   0.898557
Gene1     0.801808   0.491848   0.608064   0.583391
Gene2     0.146415   0.922903   0.547641   0.344042
Gene3     0.219748   0.844069   0.39559    0.314302
Gene4     0.613934   0.685797   0.512878   0.986288
Gene5     0.322305   0.149985   0.106558   0.659032
Gene6     0.667529   0.508563   0.57018    0.803856

• Find gene most correlated with Age

• Does this gene truly predict longevity?

[Scatter plot: Age (y-axis) vs Gene989 (x-axis)]
Pearson correlation = 0.81
p-value = 0.000252
q-value = 1
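
The experiment can be re-run as a rough sketch: fill a 5000 x 15 matrix with uniform random values, draw a random Age row, and report the strongest correlation found. The function names and the seed are illustrative, and a different seed gives a different "best" gene.

```python
import math
import random

def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def strongest_chance_correlation(n_genes=5000, n_samples=15, seed=0):
    """Return the largest-magnitude correlation between random genes and random Age."""
    rng = random.Random(seed)
    age = [rng.random() for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_genes):
        gene = [rng.random() for _ in range(n_samples)]  # pure noise
        r = pearson(gene, age)
        if abs(r) > abs(best):
            best = r
    return best

best_r = strongest_chance_correlation()
print(f"strongest correlation among random genes: {best_r:.2f}")
```

Searching thousands of noise features for the best match tends to produce a correlation that looks strong on its own, which is why the q-value on the slide is 1 despite the small p-value.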
Statistical significance vs biological significance

• Statistically significant ⇏ biologically significant


◦ Something can be statistically significant, but biologically irrelevant
◦ E.g. when sample size is very large or uncertainty is very low

• Not statistically significant ⇏ not biologically significant


◦ Especially when sample size is small
◦ “There is no difference at the effect size we’re powered to test”

Accepting the null

• Consider:
◦ Adult chickens weigh 6.2 ± 0.8 lb
◦ On the phone: This weighs 6.1 lb
◦ ⇒ This is a chicken

• Equivalent to a high p-value

• “Guilty” vs “not guilty”: failing to reject H0 does not prove H0 is true

Summary
• Test statistics, null hypothesis and distribution, and p-values

• Performance evaluation
◦ False positive/negative rates, power, precision and specificity
◦ Precision/recall plots
◦ Receiver Operating Characteristic (ROC) and the Area Under the Curve (AUC)

• Repeated tests and high-dimensional data


◦ Bonferroni and FDR corrections
