
Signal Detection Theory and Bayesian Modeling

COGS 202: Computational Modeling of Cognition


Omar Shanta, Shuai Tang, Gautam Reddy,
Reina Mizrahi, Mehul Shah
Detection Theory and Psychophysics: A Review (Swets, 1961)
“Over ensuing decades, the SD model, with only technical modifications to accommodate
particular applications, has become almost universally accepted as a theoretical account
of decision making in research on perceptual detection and recognition and in numerous
extensions to applied domains”
- (Swets, 1988; Swets, Dawes, & Monahan, 2000)

This development may well be regarded as the most towering achievement of basic
psychological research of the last half century.
When does Signal Detection Theory apply?
1. There are two true states of the world
2. An imperfect diagnostic procedure is used to make a decision

(Macmillan & Creelman, 2005; Wixted et al., 2014)


Possible Responses

                          Diagnostic Decision
                          Present                        Absent
True State   Present      Hit                            Miss (Type II error)
             Absent       False Alarm (Type I error)     Correct Rejection
What is signal detection theory?
● A combination of two theoretical structures:
○ Statistical decision theory
− Used to treat the detection task as a decision process
− Allows TSD (the theory of signal detection) to deal with the problem of the criterion for
signal existence employed by the observer
○ Theory of ideal observer
− Makes it possible to relate the level of detection performance attained by a real observer
to the mathematically ideal detection performance
Why use signal detection theory?
● Allows us to establish a framework for the experimental study of sensory systems
● The theory specifies the nature of the detection process
● Defines the experimental methods that are appropriate
● Deals with a key issue in psychophysics (and other related fields)

Most Importantly:

● It allows us to separate the observer's criterion for making a positive response from their
sensitivity to the presence or absence of a stimulus
The Fundamental Detection Problem
“The observation interval contains either the noise alone, or a specified signal and
the noise … and can be represented as a unidimensional variable.”
Likelihood Ratio Criterion
Can be thought of as similar to SNR

Likelihood ratio:  L(y) = P(y | present) / P(y | absent),   y ∈ {present, absent}

If y = present, the ratio is analogous to SNR:

L(present) = P(present | present) / P(present | absent)
*When L is large, the decision will be that the signal is present.

SNR = Power(Signal) / Power(Noise)
*When SNR is large, the decision will be that the signal is present.
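To make the likelihood-ratio rule concrete, here is a minimal sketch (not from the slides) for the standard equal-variance Gaussian detection model, where the observation y is drawn from N(0, 1) under noise alone and from N(d', 1) when the signal is present; the parameter values are illustrative.

```python
# Minimal sketch (not from the slides): likelihood-ratio decision rule for the
# equal-variance Gaussian SDT model. All parameter values are illustrative.
from scipy.stats import norm

d_prime = 1.5   # assumed separation between the noise and signal+noise means
beta = 1.0      # decision threshold on the likelihood ratio (beta = 1 is unbiased)

def likelihood_ratio(y):
    """L(y) = P(y | signal present) / P(y | signal absent)."""
    return norm.pdf(y, loc=d_prime, scale=1.0) / norm.pdf(y, loc=0.0, scale=1.0)

def decide(y):
    """Respond 'present' when the likelihood ratio exceeds the criterion beta."""
    return "present" if likelihood_ratio(y) > beta else "absent"

for y in (-0.5, 0.75, 2.0):
    print(f"y = {y:5.2f}  L(y) = {likelihood_ratio(y):6.3f}  ->  {decide(y)}")
```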
Discriminability: d'
The degree to which the signals associated with the target being present or absent are
separated using a particular diagnostic procedure. d' is a good approximation to the
behavior of a human observer.

In the unidimensional equal-variance case, d' = (μ1 - μ2) / σ. More generally, with equal
covariance Σ, the weight vector Σ^-1(μ1 - μ2) defines the decision boundary of the Bayes
decision rule (equal covariance case).
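In practice, d' and the criterion are usually estimated from hit and false-alarm rates under the equal-variance Gaussian model as d' = z(Hit) - z(FA). A minimal sketch (a standard estimator, not taken from the slides), applied to the rates used on the criterion slides that follow:

```python
# Minimal sketch (standard equal-variance Gaussian SDT estimator): estimate d'
# and the criterion c from hit and false-alarm rates.
from scipy.stats import norm

def d_prime_and_criterion(hit_rate, fa_rate):
    z_hit = norm.ppf(hit_rate)
    z_fa = norm.ppf(fa_rate)
    d_prime = z_hit - z_fa               # separation of the two distributions
    criterion = -0.5 * (z_hit + z_fa)    # 0 = unbiased, > 0 = strict, < 0 = lax
    return d_prime, criterion

# Hit / false-alarm pairs from the "Lax", "Moderate", and "Strict" criterion slide below.
for label, (h, f) in {"Lax": (0.98, 0.50), "Moderate": (0.84, 0.16), "Strict": (0.50, 0.02)}.items():
    dp, c = d_prime_and_criterion(h, f)
    print(f"{label:9s} d' = {dp:4.2f}  c = {c:5.2f}")
```

All three criterion settings yield roughly the same d' (about 2), which is the point of the slide: sensitivity stays fixed while the criterion moves.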
The Observer's Criterion and the Optimal Criterion

Criterion      Correct ID (Hit) Rate    False Alarm Rate
"Lax"          0.98                     0.50
"Moderate"     0.84                     0.16
"Strict"       0.50                     0.02
Receiver Operating Characteristic (ROC)

[Figure: ROC curves (hit rate vs. false-alarm rate) for low discriminability and high discriminability]
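A minimal sketch (assuming the equal-variance Gaussian model with illustrative d' values) of how an ROC curve is traced out by sweeping the decision criterion from lax to strict:

```python
# Minimal sketch (equal-variance Gaussian model, illustrative d' values): trace
# ROC points by sweeping the decision criterion from lax to strict.
import numpy as np
from scipy.stats import norm

criteria = np.linspace(-2, 4, 7)             # candidate criterion placements

for d_prime in (0.5, 2.0):                   # low vs. high discriminability
    print(f"\nd' = {d_prime}  (hit vs. false-alarm rate as the criterion moves from lax to strict)")
    for c in criteria:
        hit = 1 - norm.cdf(c, loc=d_prime)   # P(y > c | signal present)
        fa = 1 - norm.cdf(c, loc=0.0)        # P(y > c | noise alone)
        print(f"  criterion {c:5.2f}:  FA = {fa:.3f}  Hit = {hit:.3f}")
```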
Implications for Psychophysical Methods
SDT provides a framework for the study of sensory systems with richer information.

1) The most salient implication is the ability to include and analyze catch trials
(trials containing noise alone) and the treatment of positive responses on them
a) Not adequate: simply reminding the observer to avoid false positives whenever one occurs
on a few catch trials
i) This drives the decision criterion up to a point where it cannot be measured
b) Adequate: use enough catch trials to obtain a good estimate of the true response criterion
2) SDT requirements:
a) Measurable noise / measurable response criterion (likelihood ratio)
b) Enough catch trials
c) An analysis yielding a measure of sensitivity independent of the response criterion
Theory of Ideal Observers
“Makes it possible to relate the level of detection performance attained by a real
observer to the mathematically ideal detection performance”

Maximum d' of the ideal observer:   d' = sqrt(2E/N0),   where E is the signal energy and N0 the noise power
d' of the real observer:            d'_obs = η · sqrt(2E/N0)

*Relation through η: varying η can determine the range over which humans adjust the
parameters of their sensory system (likelihood criterion)
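A small worked example with illustrative numbers, relating an observed d' to the ideal observer's d' = sqrt(2E/N0) through the efficiency η as defined above:

```python
# Minimal sketch with illustrative numbers: relate a real observer's d' to the
# ideal observer's d' = sqrt(2E/N0) through the efficiency eta.
import math

E = 4.0                            # assumed signal energy
N0 = 1.0                           # assumed noise power
d_ideal = math.sqrt(2 * E / N0)    # ideal-observer sensitivity

d_observed = 1.8                   # hypothetical sensitivity measured from a human observer
eta = d_observed / d_ideal         # efficiency as defined on the slide: d'_obs = eta * sqrt(2E/N0)
print(f"ideal d' = {d_ideal:.2f}, observed d' = {d_observed:.2f}, eta = {eta:.2f}")
```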
Theory of Ideal Observers
New Question: What is the nature of the discrepancy between real and ideal
observer?

1) Humans perform worse for case of “signal specified exactly”


a) Human Observer has “noisy decision process” (Variable Criterion)
b) Noise in human sensory systems (senses)
c) Faulty Memory in humans
i) Forgetting signal characteristics introduces uncertainty into the model and it will have
suboptimal performance
ii) Introducing a memory aid increases performance of human observer
Examples and Extended Applications of SDT
Potential uses include problems where:

1) The observation interval is lengthened


2) The number of observations before a decision increases
3) The number of signals in an interval increases
4) The number of observers concentrating on the same signal increases

SDT has been applied to:

1) Recognition/Identification instead of Detection


2) Deferred Decision: Observer decides to make decision now or get another
sample
3) Speech Communication
Bayesian Modeling
Probabilistic Models of Cognition: Foundation
● How is Cognitive Science related to Probability?

● What applications does this relationship have?

● What scientific works helped further the field?


Probabilistic Models of Cognition: Foundation
● ‘Sophisticated’ probabilistic models applied to graphs and grammars

● Probability theory is two-fold

○ Normative - how agents ought to reason about uncertain events (mathematics)

○ Descriptive - how people actually reason in real situations (psychology)


The Role of Mathematics
● Math usually seen as formal, disciplined, focused on patterns

● Provides framework for cognitive theories in many fields

○ NLP
○ Vision/Motor control
○ Learning
○ Etc.
Cognition and Probability
● Cognition involves inferring new information from prior input
● Probability is “calculus” for this type of uncertain inference
○ Subjective Interpretation of Probability
● Connectionist network vs. symbolic rule-based processing
Subjective Probability
● ‘Frequentist’ interpretation - coin flips, dice rolls
● ‘Subjective’ interpretation - ‘degrees of belief’
○ My degree of belief that a coin that fell under the table landed heads is ~ ½
○ It may increase to 1 once I look underneath the table
○ My friend may have a different 'subjective' probability for the same
event
Subjective Probability
● Justification for this interpretation
○ Cox’s Axioms - common sense, consistency,
divisibility/comparability
○ Dutch Book - an agent whose degrees of belief violate the probability axioms can be offered a set of bets that guarantees a loss
● Applied towards conditional probabilities: Bayes’ Theorem
What is a Sophisticated Probability Model?
● Cognitive agents modeled through probability distributions
● Reasoning is modeled with techniques from statistical learning
● Meant for systems like intelligent machines, computer vision, machine
learning
Sophisticated Prob Models - Early Examples

Ulf Grenander (Brown U., statistics and applied mathematics): created a vocabulary to help machines recognize patterns in the world.

Judea Pearl (UCLA, CS and cognitive science): championed the probabilistic approach to AI; Bayesian networks are probabilistic models defined over directed acyclic graphs.
Applied Sophisticated Probabilistic Models
● Applying towards human cognition is not straightforward
● Probabilistic mind often has poor judgement
○ Fall victim to various probabilistic fallacies
● Ideal for well-optimized cognitive processes (vision, motor control)
Probabilistic Models of Vision

● The most advanced probabilistic models in cognitive science come from vision-related applications
○ From ideal-observer analysis to Bayesian theory
Probabilistic Models of Vision
● ‘Grammatical’ models of vision
● Stochastic grammars for image parsing
● Probabilistic models of language processing, psycholinguistics
● Analysis-By-Synthesis
○ Use Bayes Decision Rule (Bayes Theorem)
Impact on Causal Learning

P. W. Cheng's causal power theory: how variables affect other variables; parameter estimation in Bayesian networks

Tenenbaum & Griffiths' causal structure judgments: Bayesian model selection over networks

Alison Gopnik's causal learning model: models children's causal learning with Bayesian networks
Human Causal Learning
● Rescorla-Wagner associative learning model

● Peter Dayan and colleagues - Animal learning behavior


● Explain structure of computational and neural mechanisms of the
brain
Marr’s Levels of Probabilistic Explanation
Computational Theory
● Cognitive Processes are naturally subjective and uncertain
● Focused on nature of problem being solved
● Does not matter how the system solves the problem
● Belief that cognitive systems operate via ‘heuristic tricks’
○ Learn through past experiences
Representation and Algorithms
● Human cognition is very flexible
● Some problems can be solved through explicit probabilistic methods
○ Stochastic grammars for language or vision
○ Bayesian Networks
● Bayesian computations are difficult when scaled up
● Sophisticated Probabilistic Models in various applications
○ Hypotheses for algorithms that make up probabilistic cognition
Hardware Implementation
● Naturally maps onto computational architectures
○ Distributed, autonomous, running in parallel
○ Qualitative features of neural architecture
● The brain and nervous system may act using probabilistic models
○ Research done in computational neuroscience
○ Further supports probabilistic cognition
Bayesian Modeling
What is Bayesian Inference?

----The idea that we can infer the nature of the source of data using tools from probability theory.

Allows us to -

Compare hypotheses using Bayes rule:

----A proper method for comparing hypotheses of varying complexity.

Impose structure on probabilistic models:

----Graphical models and efficient algorithms to perform inference and sampling.


Fundamentals of Bayesian Inference
● Basic Bayes
● Comparing two simple hypotheses
● Comparing infinitely many hypotheses
● Comparing hypotheses that differ in complexity
Notation:
H is the set of all hypotheses considered by agent

h is one hypothesis in H

d is the data observed by agent


1/ Basic Bayes

Bayes' rule:

P(h | d) = P(d | h) P(h) / P(d),   with P(d) = Σ_{h' ∈ H} P(d | h') P(h')

where P(h | d) is the posterior probability, P(d | h) the likelihood, and P(h) the prior.

This formulation of Bayes' rule makes it apparent that

the posterior probability of h is proportional to the product of the prior
probability and the likelihood.
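A minimal sketch of this computation over a discrete hypothesis set; the priors and likelihoods below are illustrative placeholders:

```python
# Minimal sketch (illustrative numbers): posterior over a discrete hypothesis set,
# computed as prior x likelihood, then renormalized.
priors = {"h0": 0.5, "h1": 0.5}                 # P(h)
likelihoods = {"h0": 0.20, "h1": 0.05}          # P(d | h) for some observed data d

unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
evidence = sum(unnormalized.values())           # P(d) = sum over h of P(d | h) P(h)
posterior = {h: unnormalized[h] / evidence for h in priors}

print(posterior)   # {'h0': 0.8, 'h1': 0.2}
```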
2/ Comparing two simple hypotheses
● d is the observation that you got. For example: HHTHTHTTHT
● h0: probability of heads is 0.5
● h1 : probability of heads is 0.9
● θ denotes the probability of heads, either 0.5 or 0.9.

With no prior reason to favor one hypothesis over the other, we can take P(h0) = P(h1) = 1/2.
2/ Comparing two simple hypotheses
According to the Bernoulli distribution, a sequence d with N_H heads and N_T tails has likelihood

P(d | θ) = θ^N_H (1 - θ)^N_T

Then, we want to compute P(h0 | d) and P(h1 | d).

Posterior odds:

P(h1 | d) / P(h0 | d) = [P(d | h1) / P(d | h0)] · [P(h1) / P(h0)]
2/ Comparing two simple hypotheses
HHHHHHHHHH (ten heads in a row): the posterior odds strongly favor h1.

HHTHTHTTHT (five heads, five tails): the posterior odds favor h0.
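A minimal sketch that reproduces the comparison for the two sequences above, using the Bernoulli likelihood and equal priors:

```python
# Minimal sketch: posterior odds P(h1 | d) / P(h0 | d) for the coin-flip example,
# with theta = 0.5 under h0 and theta = 0.9 under h1, and equal priors.
def bernoulli_likelihood(sequence, theta):
    n_heads = sequence.count("H")
    n_tails = sequence.count("T")
    return theta ** n_heads * (1 - theta) ** n_tails

for d in ("HHTHTHTTHT", "HHHHHHHHHH"):
    odds = (bernoulli_likelihood(d, 0.9) / bernoulli_likelihood(d, 0.5)) * (0.5 / 0.5)
    print(f"{d}: posterior odds in favor of h1 = {odds:.3f}")
# HHTHTHTTHT gives odds of about 0.006 (h0 favored); HHHHHHHHHH gives about 357 (h1 favored).
```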
3/ Comparing infinitely many hypotheses
How can we infer θ from a sequence like HHHHHHHHHH?
3/ Comparing infinitely many hypotheses
Assume a prior: suppose p(θ) follows a uniform distribution.
3/ Comparing infinitely many hypotheses
Assumption 1: p(θ) follows a uniform distribution on [0, 1].

There are two ways to obtain a point estimate from the posterior distribution.

(1) The maximum a posteriori (MAP) estimate for θ is N_H / (N_H + N_T), which is also the
maximum-likelihood estimate.
(2) The posterior mean is (N_H + 1) / (N_H + N_T + 2).

Example: HHHHHHHH?
3/ Comparing infinitely many hypotheses
Assumption 1: p(θ) follows a uniform distribution on [0, 1].

The posterior mean is sensitive to sample size: it does not put as much weight on a single
head as on a sequence of ten heads in a row, whereas the MAP/ML estimate gives θ = 1 in both cases.
3/ Comparing infinitely many hypotheses
Assumption 2: θ ~ Beta(V_H + 1, V_T + 1), with V_H = V_T = 1000.

After observing N_H heads and N_T tails, the posterior is Beta(N_H + V_H + 1, N_T + V_T + 1), so

MAP for θ:             (N_H + V_H) / (N_H + N_T + V_H + V_T)

Posterior mean for θ:  (N_H + V_H + 1) / (N_H + N_T + V_H + V_T + 2)
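A minimal sketch of both point estimates under the two prior assumptions, evaluated for an illustrative sequence of ten heads in a row:

```python
# Minimal sketch: MAP and posterior-mean estimates of theta under a
# Beta(VH + 1, VT + 1) prior (the uniform prior is VH = VT = 0).
def map_estimate(nh, nt, vh=0, vt=0):
    return (nh + vh) / (nh + nt + vh + vt)

def posterior_mean(nh, nt, vh=0, vt=0):
    return (nh + vh + 1) / (nh + nt + vh + vt + 2)

nh, nt = 10, 0   # ten heads in a row (illustrative data)
print("uniform prior:    MAP =", map_estimate(nh, nt),
      " mean =", round(posterior_mean(nh, nt), 3))
print("Beta(1001, 1001): MAP =", round(map_estimate(nh, nt, 1000, 1000), 4),
      " mean =", round(posterior_mean(nh, nt, 1000, 1000), 4))
# The strong Beta prior keeps both estimates close to 0.5 despite ten straight heads.
```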


4/ Comparing hypotheses that differ in complexity
h0 is the hypothesis that θ=0.5

h1 is the hypothesis that θ takes a value drawn from a uniform distribution on [0,1]

With no prior reason to favor one hypothesis over the other, we can take P(h0) = P(h1) = 1/2.
4/ Comparing hypotheses that differ in complexity
First, compute the likelihood under each hypothesis:

P(d | h0) = 0.5^(N_H + N_T)

P(d | h1) = ∫ θ^N_H (1 - θ)^N_T dθ = N_H! N_T! / (N_H + N_T + 1)!   (integral over θ from 0 to 1)
4/ Comparing hypotheses that differ in complexity
Second, we plot the posterior odds in favor of h1 as a function of the observed counts. [figure]
4/ Comparing hypotheses that differ in complexity
Third, we plot the posterior probability p(θ | d) versus θ. [figure]
4/ Comparing hypotheses that differ in complexity
Conclusion:

(1) Complex hypotheses have more degrees of freedom that can be adapted to
the data, and can thus always be made to fit better than simple hypotheses.
(2) But this flexibility also makes a worse fit possible.

Advice:

A more complex hypothesis will be favored only if its greater complexity consistently provides a better account of the data.
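A minimal sketch of this comparison: the marginal likelihood of h1 integrates over θ (the Beta integral above), so the posterior odds trade flexibility against fit. The counts below are illustrative:

```python
# Minimal sketch: posterior odds for h1 (theta uniform on [0, 1]) over h0 (theta = 0.5),
# with equal priors, as a function of the observed counts.
from math import factorial

def odds_h1_over_h0(nh, nt):
    p_d_h0 = 0.5 ** (nh + nt)
    # Marginal likelihood under h1: integral of theta^NH * (1 - theta)^NT over [0, 1]
    p_d_h1 = factorial(nh) * factorial(nt) / factorial(nh + nt + 1)
    return p_d_h1 / p_d_h0

for nh, nt in ((5, 5), (10, 0)):
    print(f"NH = {nh}, NT = {nt}: posterior odds for h1 = {odds_h1_over_h0(nh, nt):.3f}")
# Five heads and five tails favor the simpler h0 (odds < 1); ten straight heads favor h1.
```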
Representing structured probability distributions
Directed graphical models (Bayesian networks)

Undirected graphical models (Markov Random Fields)

Uses of graphical models


1/ Directed graphical models
Suppose we have n binary random variables; the joint distribution

involves O(2^n) numbers.


1/ Directed graphical models

The number of parameters for n binary random variables = 2^0 + 2^1 + 2^2 + ... + 2^(n-1) = 2^n - 1

(1) Can we have more compact representations?


(2) Can we have more efficient algorithms?
1/ Directed graphical models
Guidelines:

(1) Bayes nets use acyclic graphs, which means no node can be its own
ancestor.
(2) Conditional independence is a basic tool to do inference in Bayes nets.
1/ Directed graphical models
Simple example from Prof. Saul.

B=burglary happens

E=earthquake happens

A=alarm rings

J=John calls

M=Mary calls
1/ Directed graphical models
(1) Nodes represent random variables
(2) Edges represent conditional dependencies
1/ Directed graphical models

The total number of parameters used to represent the joint probability is now 13, rather than 31.

31 = 2^5 - 1

13 = 1 + 2 + 4 + 2 + 4
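A minimal sketch of the alarm network's factorization P(B)P(E)P(A|B,E)P(J|A)P(M|A) and of exact inference by enumeration; the conditional probability values below are hypothetical and chosen only for illustration:

```python
# Minimal sketch of the burglary-alarm network. All probability values are
# hypothetical; what matters is the factorization P(B) P(E) P(A|B,E) P(J|A) P(M|A).
from itertools import product

p_B = {1: 0.01, 0: 0.99}                      # P(B): burglary happens
p_E = {1: 0.02, 0: 0.98}                      # P(E): earthquake happens
p_A = {(1, 1): 0.95, (1, 0): 0.94,            # P(A = 1 | B, E): alarm rings
       (0, 1): 0.29, (0, 0): 0.001}
p_J = {1: 0.90, 0: 0.05}                      # P(J = 1 | A): John calls
p_M = {1: 0.70, 0: 0.01}                      # P(M = 1 | A): Mary calls

def joint(b, e, a, j, m):
    """Joint probability assembled from the small local tables."""
    pa = p_A[(b, e)] if a else 1 - p_A[(b, e)]
    pj = p_J[a] if j else 1 - p_J[a]
    pm = p_M[a] if m else 1 - p_M[a]
    return p_B[b] * p_E[e] * pa * pj * pm

# Exact inference by enumeration: P(B = 1 | J = 1, M = 1)
num = sum(joint(1, e, a, 1, 1) for e, a in product((0, 1), repeat=2))
den = sum(joint(b, e, a, 1, 1) for b, e, a in product((0, 1), repeat=3))
print("P(B = 1 | J = 1, M = 1) =", round(num / den, 4))
```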
1/ Directed graphical models
Causal Graphical Models

An edge is assumed to indicate a direct causal relationship.

(1) Representing the probabilities of events that one might observe


(2) Representing the probabilities of events that one can produce through
intervening on a system
2/ Undirected graphical models/Markov random
fields
(1) Nodes represent random variables
(2) Undirected edges define a neighborhood structure on the graph
2/ Undirected graphical models
(1) Used to model image data in vision
(2) Used as generative models for textures
(3) Used for image segmentation
3/ Uses of graphical models
● Intuitive representation of causal relationships
● Efficient algorithms for learning and inference
● Applications:
○ Language modeling
○ Mixture models
○ Artificial neural networks (Hopfield nets, Boltzmann machines)
Algorithms for approximate inference
(1) Exact inference is computationally expensive, particularly
with loopy graphs
(2) Popular algorithms:
(a) Markov Chain Monte Carlo (MCMC) methods
(b) The Expectation-Maximization (EM) Algorithm
Approximate inference
(1) The joint distribution can be written down easily as a product of conditional
distributions of each node given its parents
(2) Draw samples from the joint distribution and use them to estimate
other marginal and conditional distributions, e.g.

P(M = m | B = b) ≈ N(M = m, B = b) / N(B = b)

This is too slow when the evidence is rare; there are smarter methods.
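A minimal sketch of this sampling idea on a small hypothetical chain B → A → M (illustrative numbers): draw ancestral samples by sampling each node given its parents, then estimate the conditional query by counting.

```python
# Minimal sketch (hypothetical numbers): estimate P(M = 1 | B = 1) by drawing
# ancestral samples from a small chain B -> A -> M and counting.
import random

random.seed(0)
p_B1 = 0.3                                   # P(B = 1)
p_A1_given_B = {1: 0.9, 0: 0.05}             # P(A = 1 | B)
p_M1_given_A = {1: 0.7, 0: 0.01}             # P(M = 1 | A)

def sample():
    b = int(random.random() < p_B1)
    a = int(random.random() < p_A1_given_B[b])
    m = int(random.random() < p_M1_given_A[a])
    return b, a, m

n_b1 = n_m1_and_b1 = 0
for _ in range(100_000):
    b, a, m = sample()
    n_b1 += (b == 1)
    n_m1_and_b1 += (b == 1 and m == 1)

print("estimate:", n_m1_and_b1 / n_b1)       # N(M = 1, B = 1) / N(B = 1)
print("exact:   ", 0.9 * 0.7 + 0.1 * 0.01)   # sum over A of P(A | B = 1) P(M = 1 | A)
```

Note that every sample with B = 0 is wasted, which is why this approach becomes slow when the evidence is rare.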
1/ Markov Chain Monte Carlo (MCMC) methods
Goal: estimate a marginal distribution, the probability of the query given the evidence (data),

P(Q = q | E = e)

● Instantiate the evidence variables E.
● Initialize the other variables.
● Sample each unobserved variable from its conditional distribution given its Markov blanket.
● Repeat the sampling step N times; then

P(Q = q | E = e) ≈ N(q) / N
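A minimal Gibbs-sampling sketch for a small hypothetical chain B → A → J with the evidence J = 1 clamped; each unobserved variable is resampled from its conditional given its Markov blanket, and the query P(B = 1 | J = 1) is estimated from the visited states.

```python
# Minimal Gibbs-sampling sketch (hypothetical numbers) for the chain B -> A -> J,
# estimating P(B = 1 | J = 1). J is clamped to the evidence; B and A are resampled
# from their conditionals given their Markov blankets.
import random

random.seed(0)
p_B1 = 0.3
p_A1_given_B = {1: 0.9, 0: 0.05}
p_J1_given_A = {1: 0.7, 0: 0.01}

def bernoulli(p):
    return int(random.random() < p)

def resample_A(b, j):
    # P(A | B = b, J = j) is proportional to P(A | b) * P(j | A)
    w1 = p_A1_given_B[b] * (p_J1_given_A[1] if j else 1 - p_J1_given_A[1])
    w0 = (1 - p_A1_given_B[b]) * (p_J1_given_A[0] if j else 1 - p_J1_given_A[0])
    return bernoulli(w1 / (w1 + w0))

def resample_B(a):
    # P(B | A = a) is proportional to P(B) * P(a | B); J is not in B's Markov blanket
    w1 = p_B1 * (p_A1_given_B[1] if a else 1 - p_A1_given_B[1])
    w0 = (1 - p_B1) * (p_A1_given_B[0] if a else 1 - p_A1_given_B[0])
    return bernoulli(w1 / (w1 + w0))

j = 1                      # evidence: J = 1
b, a = 1, 1                # arbitrary initialization of the unobserved variables
count_b1 = 0
n_samples = 50_000
for _ in range(n_samples):
    a = resample_A(b, j)
    b = resample_B(a)
    count_b1 += b

print("Gibbs estimate of P(B = 1 | J = 1):", count_b1 / n_samples)
```

A fuller implementation would discard an initial burn-in period; this sketch omits it for brevity.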
Gaussian mixture models / latent variables

[Figure: data points forming Cluster 1 and Cluster 2; graphical model with latent variable Z generating observation X]
2/ The Expectation-Maximization (EM) Algorithm
● Used to learn parameters by maximum likelihood estimation.
● Finds the parameters that maximize the likelihood of the observed data.

E step: evaluate the expectation of the 'complete log-likelihood' log P(x, z | θ) with respect to P(z | x, θ)

M step: maximize the resulting quantity with respect to θ
2/ The Expectation-Maximization (EM) Algorithm
The EM algorithm applied to a Gaussian mixture model with two clusters:

1) Use the current estimate of θ to estimate Z in the form of P(z | x, θ)


2) Use the estimate of Z to update θ.
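A minimal EM sketch for a two-cluster, one-dimensional Gaussian mixture fit to synthetic data; all numbers are illustrative. The E step computes the responsibilities P(z | x, θ), and the M step re-estimates the mixing weights, means, and standard deviations.

```python
# Minimal EM sketch for a two-component 1-D Gaussian mixture on synthetic data.
# E step: responsibilities P(z | x, theta); M step: re-estimate weights, means, variances.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300),    # "cluster 1" (synthetic)
                    rng.normal(3.0, 1.0, 200)])    # "cluster 2" (synthetic)

# Initial guesses for theta = (mixing weights, means, standard deviations)
weights, means, stds = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

for _ in range(50):
    # E step: responsibility of each component for each data point
    dens = np.stack([w * norm.pdf(x, m, s) for w, m, s in zip(weights, means, stds)])
    resp = dens / dens.sum(axis=0)
    # M step: update parameters from the responsibility-weighted data
    nk = resp.sum(axis=1)
    weights = nk / len(x)
    means = (resp * x).sum(axis=1) / nk
    stds = np.sqrt((resp * (x - means[:, None]) ** 2).sum(axis=1) / nk)

print("weights:", weights.round(2), "means:", means.round(2), "stds:", stds.round(2))
```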
Conclusion
● Probabilistic models are powerful tools to model the complexity of human
cognition.
● They allow us to develop intuitive models of human rationality.
Differences in d'

Discriminability: the degree to which the signals associated with the target being present or absent are separated using a particular diagnostic procedure.

[Figure: "Target Absent" and "Target Present" distributions for a weak signal (small d') and a strong signal (large d')]