Signal Detection Theory and Bayesian Modeling: COGS 202: Computational Modeling of Cognition
Bayesian Modeling
This development may well be regarded as the most towering achievement of basic
psychological research of the last half century.
When does Signal Detection Theory apply?
1. There are two true states of the world
2. An imperfect diagnostic procedure is used to make a decision
                         Diagnostic Decision (Possible Responses)
                         Present              Absent
True State   Present     Hit                  Miss
             Absent      False Alarm          Correct Rejection
What is signal detection theory?
● A combination of two theoretical structures:
○ Statistical decision theory
− Used to treat the detection task as a decision process
− Allows the theory of signal detection (TSD) to deal with the problem of the criterion for signal existence employed by the observer
○ Theory of ideal observer
− Makes it possible to relate the level of detection performance attained by a real observer
to the mathematically ideal detection performance
Why use signal detection theory?
● Allows us to establish a framework for the experimental study of sensory systems
● The theory specifies the nature of the detection process
● Defines the experimental methods that are appropriate
● Deals with a key issue in psychophysics (and other related fields)
Most Importantly:
● It allows us to separate the observer's criterion for making a positive response from their sensitivity to the presence or absence of a stimulus
The Fundamental Detection Problem
“The observation interval contains either the noise alone, or a specified signal and
the noise … and can be represented as a unidimensional variable.”
Likelihood Ratio Criterion
Can be thought of as similar to a signal-to-noise ratio (SNR):
SNR = Power(Signal) / Power(Noise)
When the SNR is large, the decision will be that the signal is present.
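Written out (a standard formulation of the likelihood ratio decision rule, not taken verbatim from the slides): the observer responds "signal present" when the likelihood ratio of the observation x exceeds a criterion β.

```latex
\lambda(x) \;=\; \frac{p(x \mid \text{signal} + \text{noise})}{p(x \mid \text{noise})},
\qquad \text{respond ``present'' when } \lambda(x) > \beta .
```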
Discriminability: d'
The degree to which the signals associated with the target being present or absent are separated by a particular diagnostic procedure.
d' = Σ⁻¹(μ₁ − μ₂)
(This model is a good approximation to the behavior of a human observer.)
Discriminability
Discriminability
(Figure: decision boundary of the Bayes decision rule in the equal-covariance case.)
The Observer’s Criterion and the Optimal Criterion
A "strict" criterion:
Correct ID rate = 0.50
False alarm ID rate = 0.02
The Observer's Criterion and the Optimal Criterion
(Figure: signal and noise distributions under low discriminability vs. high discriminability.)
Implications for Psychophysical Methods
SDT provides a framework for the study of sensory systems that yields richer information.
1) The most salient implication is the ability to perform and analyze catch trials
(trials containing noise alone) and the treatment of positive responses
a) Not adequate: simply reminding the observer to avoid false positives whenever one occurs on a few catch trials
i) This drives the decision criterion up to a point where it cannot be measured
b) Adequate: use enough catch trials to obtain a good estimate of the true response criterion
2) SDT Requirements:
a) Measurable Noise / Measurable Response Criterion (LR)
b) Enough catch trials
c) Analysis yielding a measure of sensitivity independent of the response criterion (see the sketch below)
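As a concrete illustration of point 2c, here is a minimal sketch of the standard equal-variance Gaussian analysis; the z-transform formulas for d' and the criterion c are standard SDT results rather than formulas from the slides, and the example values are the "strict" observer's rates from the earlier slide.

```python
from statistics import NormalDist

# Standard equal-variance Gaussian SDT analysis: sensitivity d' and criterion c
# are recovered from the hit and false-alarm rates, so sensitivity can be
# reported independently of the observer's response criterion.
z = NormalDist().inv_cdf  # inverse of the standard normal CDF

def dprime_and_criterion(hit_rate, false_alarm_rate):
    d_prime = z(hit_rate) - z(false_alarm_rate)
    criterion = -0.5 * (z(hit_rate) + z(false_alarm_rate))
    return d_prime, criterion

# "Strict" observer from the slides: hit (correct ID) rate 0.50, false-alarm rate 0.02.
print(dprime_and_criterion(0.50, 0.02))   # d' ≈ 2.05, c ≈ 1.03
```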
Theory of Ideal Observers
“Makes it possible to relate the level of detection performance attained by a real
observer to the mathematically ideal detection performance”
*Relation through η: varying η determines the range over which humans adjust the parameters of their sensory system (the likelihood criterion)
Theory of Ideal Observers
New Question: What is the nature of the discrepancy between real and ideal
observer?
○ NLP
○ Vision/Motor control
○ Learning
○ Etc.
Cognition and Probability
● Cognition involves inferring new information from prior input
● Probability provides a "calculus" for this type of uncertain inference
○ Subjective Interpretation of Probability
● Connectionist network vs. symbolic rule-based processing
Subjective Probability
● ‘Frequentist’ interpretation - coin flips, dice rolls
● ‘Subjective’ interpretation - ‘degrees of belief’
○ My degree of belief that a coin hidden under the table came up heads is ≈ ½
○ It may increase to 1 once I look underneath the table
○ A friend may assign a different 'subjective' probability to the same event
Subjective Probability
● Justification for this interpretation
○ Cox's Axioms - common sense, consistency, divisibility/comparability
○ Dutch Book argument - rational agents' degrees of belief must obey the probability axioms; otherwise they can be offered a set of bets that guarantees a loss
● Applied to conditional probabilities via Bayes' Theorem
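For reference, the standard statement of Bayes' Theorem for a hypothesis h and observed data d (added here for completeness):

```latex
P(h \mid d) \;=\; \frac{P(d \mid h)\,P(h)}{P(d)}
           \;=\; \frac{P(d \mid h)\,P(h)}{\sum_{h' \in H} P(d \mid h')\,P(h')}
```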
What is a Sophisticated Probability Model?
● Cognitive agents modeled through probability distributions
● Reasoning is modeled with techniques from statistical learning
● Meant for systems like intelligent machines, computer vision, machine
learning
Sophisticated Prob Models - Early Examples
● Tenenbaum & Griffiths' causal structure judgments - Bayesian model selection with networks
● Alison Gopnik's causal learning model - models children's causal learning with Bayesian networks
Human Causal Learning
● Rescorla-Wagner associative learning model
Bayesian Inference
The idea that we can infer the nature of the source of data using tools from probability theory.
P(h | d) ∝ P(d | h) P(h), where h is one hypothesis in the hypothesis space H, P(h) is the prior, and P(d | h) is the likelihood.
With no a priori reason to favor one hypothesis over another, we can take a uniform prior over H.
2/ Comparing two simple hypotheses
According to the Bernoulli distribution, the likelihood of a sequence containing NH heads and NT tails is P(d | θ) = θ^NH (1 − θ)^NT.
The posterior odds in favor of h1 over h0 are the likelihood ratio multiplied by the prior odds:
P(h1 | d) / P(h0 | d) = [P(d | h1) / P(d | h0)] × [P(h1) / P(h0)]
Example sequences: HHHHHHHHHH and HHTHTHTTHT
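A minimal sketch of this comparison for the two example sequences; the specific parameter values h0: θ = 0.5 and h1: θ = 0.9, and the equal prior odds, are illustrative assumptions rather than values given in the slides.

```python
# Comparing two simple hypotheses about a coin via posterior odds.
# Assumed for illustration: h0 fixes theta = 0.5, h1 fixes theta = 0.9.

def bernoulli_likelihood(seq, theta):
    """P(sequence | theta) under independent Bernoulli flips."""
    n_heads = seq.count("H")
    n_tails = seq.count("T")
    return theta ** n_heads * (1 - theta) ** n_tails

def posterior_odds(seq, theta0=0.5, theta1=0.9, prior_odds=1.0):
    """Posterior odds P(h1 | seq) / P(h0 | seq) = likelihood ratio * prior odds."""
    return prior_odds * bernoulli_likelihood(seq, theta1) / bernoulli_likelihood(seq, theta0)

print(posterior_odds("HHHHHHHHHH"))   # ~357: strongly favors h1
print(posterior_odds("HHTHTHTTHT"))   # ~0.006: favors h0
```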
3/ Comparing infinitely many hypotheses
How can we infer θ from a sequence like HHHHHHHHHH?
3/ Comparing infinitely many hypotheses
Assume a prior: suppose p(θ) follows a uniform distribution.
3/ Comparing infinitely many hypotheses
Assumption 1: p(θ) follows a uniform distribution.
There are two ways to obtain a point estimate from the posterior distribution:
(1) The maximum a posteriori (MAP) estimate of θ is NH/(NH+NT), which coincides with the maximum-likelihood estimate.
(2) The posterior mean is (NH+1)/(NH+NT+2).
Example: HHHHHHHH?
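A minimal sketch of the two point estimates under the uniform prior, applied to the eight-heads example from the slide:

```python
from fractions import Fraction

# Point estimates of theta from the posterior, assuming a uniform prior on [0, 1].
def map_estimate(n_heads, n_tails):
    return Fraction(n_heads, n_heads + n_tails)       # also the maximum-likelihood estimate

def posterior_mean(n_heads, n_tails):
    return Fraction(n_heads + 1, n_heads + n_tails + 2)

# Example from the slides: eight heads in a row (HHHHHHHH).
print(map_estimate(8, 0))      # 1
print(posterior_mean(8, 0))    # 9/10
```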
3/ Comparing infinitely many hypotheses
Assumption 1: p(θ) follows a uniform distribution.
Unlike the MAP estimate, the posterior mean is sensitive to the amount of data: we might not want to put as much weight on a single head as on ten heads in a row (posterior mean 2/3 vs. 11/12, whereas the MAP estimate is 1 in both cases).
3/ Comparing infinitely many hypotheses
Assumption 2: θ ~ Beta(VH+1, VT+1), with VH = VT = 1000.
MAP for θ: (NH + VH)/(NH + NT + VH + VT), so with VH = VT = 1000 the estimate stays close to 0.5 unless the data are overwhelming.
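Spelled out (a standard Beta-Bernoulli conjugacy result consistent with the uniform-prior case above; the notation VH, VT follows the slide):

```latex
\theta \sim \mathrm{Beta}(V_H + 1,\, V_T + 1)
\;\Rightarrow\;
p(\theta \mid d) = \mathrm{Beta}(N_H + V_H + 1,\, N_T + V_T + 1),
\qquad
\hat{\theta}_{\mathrm{MAP}} = \frac{N_H + V_H}{N_H + N_T + V_H + V_T}.
```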
h1 is the hypothesis that θ takes a value drawn from a uniform distribution on [0,1], in contrast to a simple hypothesis that fixes θ at a single value.
With no a priori reason to favor one hypothesis over the other, we can take equal prior probabilities for the two hypotheses.
4/ Comparing hypotheses that differ in complexity
First, compute the likelihood under each hypothesis.
4/ Comparing hypotheses that differ in complexity
Second, we plot the posterior odds in favor of h1
4/ Comparing hypotheses that differ in complexity
Third, we plot the posterior probability as a function of θ
4/ Comparing hypotheses that differ in complexity
Conclusion:
(1) Complex hypotheses have more degrees of freedom that can be adapted to the data, and can thus always be made to fit better than simple hypotheses.
(2) But in Bayesian comparison this flexibility is penalized: the marginal likelihood averages over all parameter values, including those that fit the data poorly, so the complex hypothesis can end up with a worse fit overall (see the sketch below).
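A minimal sketch of this comparison, assuming h0 fixes θ = 0.5 and h1 draws θ from a uniform distribution on [0, 1], with equal prior probabilities; the closed form for the marginal likelihood under h1 is the standard Beta integral.

```python
from math import comb

# Comparing a simple and a complex hypothesis about a coin (Bayesian Occam's razor).
# Assumed for illustration: h0: theta = 0.5 exactly; h1: theta ~ Uniform[0, 1];
# equal prior probabilities for the two hypotheses.

def likelihood_h0(n_heads, n_tails):
    return 0.5 ** (n_heads + n_tails)

def likelihood_h1(n_heads, n_tails):
    # Marginal likelihood: integral of theta^NH * (1 - theta)^NT over [0, 1]
    # = B(NH + 1, NT + 1) = NH! NT! / (NH + NT + 1)!
    n = n_heads + n_tails
    return 1.0 / ((n + 1) * comb(n, n_heads))

def posterior_odds_h1(n_heads, n_tails):
    """Posterior odds P(h1 | d) / P(h0 | d) under equal priors."""
    return likelihood_h1(n_heads, n_tails) / likelihood_h0(n_heads, n_tails)

print(posterior_odds_h1(10, 0))  # ~93: the flexible h1 wins for an extreme sequence
print(posterior_odds_h1(5, 5))   # ~0.37: the simple h0 wins for a balanced sequence
```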
Advice:
(1) Bayes nets use acyclic graphs, which means no node can be its own
ancestor.
(2) Conditional independence is a basic tool to do inference in Bayes nets.
1/ Directed graphical models
Simple example from Prof. Saul.
B=burglary happens
E=earthquake happens
A=alarm rings
J=John calls
M=Mary calls
1/ Directed graphical models
(1) Nodes represent random variables
(2) Edges represent conditional dependencies
1/ Directed graphical models
A full joint distribution over the five binary variables has 2^5 − 1 = 31 free parameters.
Exploiting the conditional independencies in the graph, only 13 = 1 + 2 + 4 + 2 + 4 parameters are needed.
1/ Directed graphical models
Causal Graphical Models
P(Q = q | E = e) ≈ N(q)/N: a query probability can be approximated by sampling from the model and counting the fraction of evidence-consistent samples in which Q = q (see the sampling sketch below).
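A minimal rejection-sampling sketch of this approximation for the burglary example; the network structure (B and E cause A; A causes J and M) and all of the CPT values below are illustrative assumptions, not numbers from the lecture.

```python
import random

# Rejection sampling for P(Q = q | E = e) ≈ N(q) / N in a small Bayes net.
P_B = 0.01          # P(burglary)
P_E = 0.02          # P(earthquake)
P_A = {             # P(alarm | B, E)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}   # P(John calls | alarm)
P_M = {True: 0.70, False: 0.01}   # P(Mary calls | alarm)

def sample():
    """Draw one joint sample (B, E, A, J, M) from the assumed network."""
    b = random.random() < P_B
    e = random.random() < P_E
    a = random.random() < P_A[(b, e)]
    j = random.random() < P_J[a]
    m = random.random() < P_M[a]
    return b, e, a, j, m

def estimate_p_burglary_given_calls(n_samples=200_000):
    """Estimate P(B = true | J = true, M = true) by rejection sampling."""
    kept = hits = 0
    for _ in range(n_samples):
        b, e, a, j, m = sample()
        if j and m:               # keep only samples consistent with the evidence
            kept += 1
            hits += b
    return hits / kept if kept else float("nan")

print(estimate_p_burglary_given_calls())
```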
Gaussian mixture models / latent variables
(Figure: data points forming two clusters, Cluster 1 and Cluster 2; graphical model with latent cluster assignment Z generating observation X.)
2/ The Expectation-Maximization (EM) Algorithm
● Used to learn parameters by maximum-likelihood estimation when the model contains latent variables
● Finds parameters that maximize the likelihood of the observed data (a minimal sketch follows below)
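A minimal EM sketch for a two-component, one-dimensional Gaussian mixture; the synthetic data and the initial parameter guesses are illustrative assumptions.

```python
import numpy as np

# EM for a two-component 1-D Gaussian mixture model.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300),   # "cluster 1"
                    rng.normal(3.0, 1.0, 200)])   # "cluster 2"

def normal_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Initial guesses for mixing weights, means, and variances.
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

for _ in range(50):
    # E-step: responsibility of each component for each point,
    # p(z = k | x) proportional to pi_k * N(x; mu_k, var_k).
    resp = np.stack([pi[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)], axis=1)
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: re-estimate parameters from the responsibility-weighted data.
    nk = resp.sum(axis=0)
    pi = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk

print("weights:", pi, "means:", mu, "variances:", var)
```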
Differences in d’