
Chapter 3 (Part 1) of “The Book of Why”

From Evidence to Causes—Reverend Bayes meets Mr. Holmes

Tomás Aragón and James Duren (Version F; Last compiled August 4, 2019)
June 6, 2019, San Francisco, CA
Health Officer, City & County of San Francisco
Director, Population Health Division (PHD)
San Francisco Department of Public Health
https://taragonmd.github.io/ (GitHub page)

PDF slides produced in R Markdown LaTeX Beamer (Metropolis theme)

A patient presents with chest pain to a clinical provider.
The patient has a history of coronary artery disease.

CAD → MI → TT;  MI → CP;  GERD → CP
Figure 1: A patient with a history of coronary artery disease (CAD) presents to a provider
complaining of prolonged chest pain (CP). The provider’s differential diagnosis (hypotheses) are
myocardial infarction (MI) and gastroesophageal reflux disease (GERD). The provider sends a
blood specimen for a Troponin Test (TT) to "rule out" an MI.

Sherlock Holmes: deduction vs. induction

Deduction vs. induction


“It’s elementary, my dear Watson.”
So spoke Sherlock Holmes . . . Holmes performed not just deduction, which works
from a hypothesis to a conclusion.[1] His great skill was induction, which works in the
opposite direction, from evidence to hypothesis.[2]

Multi-cause reasoning (an extension of causal reasoning)


“When you have eliminated the impossible, whatever remains, however improbable,
must be the truth.” Having induced several hypotheses, Holmes eliminated them one by
one in order to deduce (by elimination) the correct one.
[1] Causal reasoning
[2] Evidential reasoning
Synonyms

Cause • → • Effect
Hypothesis • → • Evidence

Graph   Conditional prob.   Probability       Synonym     Reasoning type

H → E   P(E | H)            Forward prob.     deduction   causal reasoning[3]
H → E   P(H | E)            “Inverse” prob.   induction   evidential reasoning[4]

[3] Also called "predictive" reasoning
[4] Also called "diagnostic" reasoning
Bayes’ theorem for causal Hypothesis • → • Evidence

Reading the graph from left to right (the causal direction), and then from right to left (the evidential direction):

Reasoning              N   Probability   Description

Causal reasoning       1   P(H)          Prior probability (marginal probability of H)
. . . leads to         2   P(E | H)      Likelihood (TP [sensitivity], FP [1 − specificity])
Evidential reasoning   3   P(E)          Marginal probability of E
. . . leads to         4   P(H | E)      Posterior probability (conditional probability)

For Bayes’ theorem, just substitute the probability expressions from the table above:

(4) = (1)(2) / (3)
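As a minimal R sketch of this substitution (the numbers below are hypothetical, chosen only for illustration):

```r
prior      <- 0.10  # (1) P(H): hypothetical prior probability of the hypothesis
likelihood <- 0.80  # (2) P(E | H): hypothetical probability of the evidence given H
marginal   <- 0.25  # (3) P(E): hypothetical marginal probability of the evidence
posterior  <- prior * likelihood / marginal  # (4) P(H | E)
posterior  # 0.32
```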

Bayes’ theorem for causal (Hypothesis) → (Evidence)

1. Prior probability is the marginal probability of causal Hypothesis
2. Causal reasoning is evaluating the likelihood (true positive [sensitivity],
   and false positive [1 − specificity])

P(H | E) = P(H) P(E | H) / P(E)

3. Marginal probability of Evidence summed over all possibilities;
   Likelihood Ratio is (2) ÷ (3)
4. Evidential reasoning is evaluating the posterior probability
Public health examples: Cause (hypothesis) • → • Effect (evidence)

Exposure • → • Disease

Disease • → • Test result

Smoking • → • Lung cancer

Jury trial: Guilty • → • Evidence

Citrus fruit consumption • → • Scurvy

Greenhouse gases • → • Global warming

Global warming • → • Extreme weather events

Implicit (nonconscious) bias • → • Discrimination

Reverend Thomas Bayes’ pool table example

L = length of pool table
x = feet from left end of table

L (cause) → x (effect)

Forward probability: P(x | L)
“Inverse” probability: P(L | x)
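A sketch of both directions in R, under the usual reading of the example: the ball stops at a point x drawn uniformly from [0, L]. The grid of candidate lengths, the flat prior over L, and the observed x are all made up for illustration:

```r
L_grid <- 1:12                                     # hypothetical candidate table lengths (feet)
prior  <- rep(1 / length(L_grid), length(L_grid))  # flat prior over L
x_obs  <- 5                                        # hypothetical observed stopping point (feet)
# Forward probability P(x | L): uniform density 1/L on [0, L], zero beyond the table
likelihood <- ifelse(L_grid >= x_obs, 1 / L_grid, 0)
# "Inverse" probability P(L | x) via Bayes' theorem (normalized over the grid)
posterior <- prior * likelihood / sum(prior * likelihood)
round(posterior, 3)  # tables shorter than x are ruled out; L near x is most probable
```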

Tea-Scones example (default: assume probabilistic dependence)

Customer   Tea   Scones
       1   Yes   Yes
       2   No    Yes
       3   No    No
       4   No    No
       5   Yes   Yes
       6   Yes   No
       7   Yes   No
       8   Yes   Yes
       9   Yes   No
      10   Yes   No
      11   No    No
      12   Yes   Yes

            Scones
Tea        No   Yes   Total
No          3     1       4
Yes         4     4       8
Total       7     5      12

P(X, Y, T) = P(X) P(T) P(Y | X, T)
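A base R sketch that reproduces the cross-tabulation from the 12 customers above:

```r
tea    <- c("Yes","No","No","No","Yes","Yes","Yes","Yes","Yes","Yes","No","Yes")
scones <- c("Yes","Yes","No","No","Yes","No","No","Yes","No","No","No","Yes")
tab <- table(Tea = tea, Scones = scones)
addmargins(tab)  # 2 x 2 counts with totals (matches the table above)
prop.table(tab)  # joint probabilities, e.g., P(Tea = Yes, Scones = Yes) = 4/12
```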
Tea-Scones example (default: assume probabilistic dependence)

[Venn diagram: circles T and S, overlapping in T ∩ S; total probability = 1]

P(Tea) = P(T)
P(Scone) = P(S)
P(Tea ∩ Scone) = P(T, S) = P(S, T)

P(S, T) = P(T) P(S | T)    (1)
P(T, S) = P(S) P(T | S)    (2)

Bayes’ Theorem (equating (1) and (2)):

P(T) P(S | T) = P(S) P(T | S)
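A quick numerical check of this identity in R, using the counts from the tea-scones table (T = tea, S = scones):

```r
p_T         <- 8 / 12       # P(T): 8 of 12 customers ordered tea
p_S         <- 5 / 12       # P(S): 5 of 12 ordered scones
p_ST        <- 4 / 12       # P(S, T): 4 of 12 ordered both
p_S_given_T <- p_ST / p_T   # P(S | T) = 4/8
p_T_given_S <- p_ST / p_S   # P(T | S) = 4/5
p_T * p_S_given_T           # 0.333... (equation 1)
p_S * p_T_given_S           # 0.333... (equation 2): both equal P(S, T)
```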

Bayes’ theorem

P(S | T) = P(S) P(T | S) / P(T)

P(Hypothesis | Evidence) = P(Hypothesis) P(Evidence | Hypothesis) / P(Evidence)

Bayes’ theorem

P(Hypothesis | Evidence) = P(Hypothesis) P(Evidence | Hypothesis) / P(Evidence)

Excerpt from “The Book of Why” (p. 101)


“This is perhaps the most important role of Bayes’ rule in statistics: we can estimate
the conditional probability directly in one direction, for which our judgment is more
reliable (deduction),[5] and use mathematics to derive the conditional probability in the
other direction, for which our judgment is rather hazy (induction).”[6]

[5] causal reasoning
[6] evidential reasoning
Bayes’ theorem

P(S | T) = P(S, T) / P(T) = P(S) P(T | S) / P(T)

P(Hypothesis | Evidence) = P(Hypothesis) P(Evidence | Hypothesis) / P(Evidence)

Excerpt from “The Book of Why” (p. 103)


[Bayes’ rule] acts, in fact, as a normative rule for updating beliefs (causal hypotheses)
in response to evidence. . . . [T]he belief a person attributes to S (cause) after
discovering T (evidence) is never lower than the degree of belief that person attributes
to S AND T before discovering T . Also, it implies that the more surprising the
evidence T —that is, the smaller P(T ) is—the more convinced one should become of
its cause S.
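A brief R illustration of the last point, with made-up numbers: holding the prior P(S) and the likelihood P(T | S) fixed, the posterior P(S | T) grows as the evidence becomes more surprising (smaller P(T)):

```r
p_S         <- 0.10                       # hypothetical prior belief in the cause S
p_T_given_S <- 0.90                       # hypothetical likelihood of evidence T given S
p_T         <- c(0.90, 0.50, 0.20, 0.10)  # increasingly surprising evidence
posterior   <- p_S * p_T_given_S / p_T    # Bayes' theorem
round(posterior, 2)                       # 0.10 0.18 0.45 0.90
```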
Example: Mammogram for breast cancer screening

Cause • → • Effect
Disease • → • Test

P(Disease | Test) = P(Disease, Test) / P(Test)
                  = P(Disease) P(Test | Disease) / P(Test)
                  = P(Disease) × [ P(Test | Disease) / P(Test) ]
                  = P(Disease) × (Likelihood Ratio)

Bayes’ theorem example: Mammogram for breast cancer screening

Cause • → • Effect
Disease • → • Test

P(Disease | Test) = P(Disease, Test) / P(Test)
                  = P(Disease) P(Test | Disease) / P(Test)
                  = P(D) P(T | D) / [ P(D) P(T | D) + P(D̄) P(T | D̄) ]

Bayes’ theorem example: Mammogram for breast cancer screening

Cause • → • Effect
Disease • → • Test

P(D+ | T+) = Positive Predictive Value
           = P(D+) P(T+ | D+) / [ P(D+) P(T+ | D+) + P(D−) P(T+ | D−) ]
           = P(D+)(True Positive) / [ P(D+)(True Positive) + P(D−)(False Positive) ]
           = P(D+)(Sensitivity) / [ P(D+)(Sensitivity) + P(D−)(1 − Specificity) ]

Bayes’ theorem example: Mammogram for breast cancer screening

Disease • → • Test

A 43-year-old woman has a positive mammogram (T+). What is the probability of breast
cancer given the positive test, P(D+ | T+)? What do we know?

P(D+) = 1/700 for 43-year-old women

P(T+ | D+) = 0.73 = True Positive = Sensitivity

P(T+ | D−) = 0.12 = False Positive = 1 − Specificity

P(D+ | T+) = P(D+)(TP) / [ P(D+)(TP) + P(D−)(FP) ]
           = (1/700)(0.73) / [ (1/700)(0.73) + (1 − 1/700)(0.12) ]
           ≈ 0.009
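A minimal R check of this calculation, using the numbers from the slide:

```r
prior <- 1 / 700  # P(D+): prevalence in 43-year-old women
sens  <- 0.73     # P(T+ | D+): sensitivity
fpr   <- 0.12     # P(T+ | D-): false positive rate (1 - specificity)
p_T_pos <- prior * sens + (1 - prior) * fpr  # P(T+), by total probability
ppv     <- prior * sens / p_T_pos            # P(D+ | T+)
round(ppv, 3)                                # 0.009
# Equivalently, prior times the likelihood ratio: prior * (sens / p_T_pos)
```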
Bayes’ theorem: Review sensitivity and specificity of a diagnostic test

Operating characteristics of a diagnostic test


Sensitivity = P(T+ | D+) = TP / (TP + FN)
Specificity = P(T− | D−) = TN / (TN + FP)

Remember “SnOut” and “SpIn”


SnOut: Use a very sensitive test (very low FN) to “rule out” a hypothesis. That is,
when we have confidence in a negative result we can use the test to rule out a
hypothesis.
SpIn: Use a very specific test (very low FP) to “rule in” a hypothesis. That is, when
we have confidence in a positive result we can use the test to rule in a hypothesis.
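A small R helper for the operating characteristics defined above; the function name and the counts in the example call are hypothetical, chosen only to match the mammogram example's sensitivity (0.73) and specificity (0.88):

```r
test_characteristics <- function(TP, FP, FN, TN) {
  c(sensitivity = TP / (TP + FN),  # P(T+ | D+)
    specificity = TN / (TN + FP))  # P(T- | D-)
}
test_characteristics(TP = 73, FP = 120, FN = 27, TN = 880)
# sensitivity = 0.73, specificity = 0.88
```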

Bayes’ theorem
Bayes’ rule is a distillation of the scientific method (TBoW, p. 108)

1. Formulate a causal hypothesis
2. Deduce a testable consequence of the hypothesis (causal reasoning)
3. Design an evaluation (study) and collect evidence
4. Update your belief in the causal hypothesis (evidential reasoning)

Bayes’ theorem for causal (Disease) → (Test)

1. Prior probability is the marginal probability of causal Hypothesis
2. Causal reasoning is evaluating the likelihood (true positive [sensitivity],
   and false positive [1 − specificity])

P(D | T) = P(D) P(T | D) / P(T)

3. Marginal probability of Evidence summed over all possibilities;
   Likelihood Ratio is (2) ÷ (3)
4. Evidential reasoning is evaluating the posterior probability
Bayesian networks are nodes with probabilistic dependence

Here is a non-causal Bayesian network. Smelling smoke increases the credibility (belief)
of a fire nearby, but smoke does not cause fire.

Smell smoke → Fire nearby

Here is a causal Bayesian network. Fire causes smoke. Smoke is evidence of a fire
(cause). Causal BNs have causal and evidential implications.
Fire → Smoke

Causal Bayesian networks are represented by directed acyclic graphs (DAGs). The
mammography example was a causal Bayesian network.
Disease → Test

Bayesian networks generalize Bayes’ theorem for complex causal graphs
Core DAG patterns for three nodes and two edges

(a) chain:    X → Y → Z
(b) fork:     Y ← X → Z
(c) collider: X → Z ← Y

Figure 2: Core DAG patterns for three nodes and two edges: (a) chain (sequential cause), (b)
fork (common cause), and (c) collider (common effect).
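A short R sketch of these three patterns, assuming the dagitty package is available (it is not used elsewhere in these slides); each pattern implies a different conditional-independence statement:

```r
library(dagitty)
chain    <- dagitty("dag { X -> Y -> Z }")  # (a) sequential cause
fork     <- dagitty("dag { Y <- X -> Z }")  # (b) X is the common cause of Y and Z
collider <- dagitty("dag { X -> Z <- Y }")  # (c) Z is the common effect of X and Y
impliedConditionalIndependencies(chain)     # X _||_ Z | Y
impliedConditionalIndependencies(fork)      # Y _||_ Z | X
impliedConditionalIndependencies(collider)  # X _||_ Y  (until we condition on Z)
```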

Recap: A patient presents with chest pain to a clinical provider.
The patient has a history of coronary artery disease.

CAD → MI → TT;  MI → CP;  GERD → CP
Figure 3: A patient with a history of coronary artery disease (CAD) presents to a provider
complaining of prolonged chest pain (CP). The provider’s differential diagnosis (hypotheses) are
myocardial infarction (MI) and gastroesophageal reflux disease (GERD). The provider sends a
blood specimen for a Troponin Test (TT) to "rule out" an MI. The pattern CAD → MI → TT is
a chain (sequential cause); TT ← MI → CP is a fork (common cause or confounder); and
MI → CP ← GERD is a collider (common effect). Providers reason like Sherlock Holmes.
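As a closing sketch (again assuming the dagitty package), the recap DAG can be written down and queried for the fork and collider described in the caption:

```r
library(dagitty)
g <- dagitty("dag {
  CAD -> MI -> TT
  MI -> CP
  GERD -> CP
}")
children(g, "MI")  # TT and CP: MI is a common cause (fork)
parents(g, "CP")   # MI and GERD: CP is a common effect (collider)
```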
