Perception is how our body and mind work together to understand the world around us.
Two major debates have shaped our understanding of perception:
Nativism: We are born with built-in perceptual abilities that do not have to be learned.
Empiricism: The mind starts as a blank slate, and experience shapes perception.
Modern View: Both are partly true. Babies are born with working sensory systems (nativism), but
learning through experience (empiricism) is essential for normal perception.
Direct: Some perception (e.g., of motion and depth) arises immediately from rich environmental
cues, without further processing (e.g., Gibson's theory).
Constructivist: The brain "builds" perception step by step using sensory inputs (e.g., Marr’s
theory).
Gestalt Psychology
Gestalt psychology explains how the brain organizes sensory input into meaningful
patterns.
Key Principles:
Law of Prägnanz: We see objects in the simplest, most organized way possible (e.g., an incomplete
oval is perceived as a circle).
Closure: The brain fills in gaps to complete shapes (e.g., a half-drawn square is seen as a
full square).
Similarity: Similar-looking objects are grouped together (e.g., same color or shape).
Ambiguity (Bistable Perception): Stimuli can be interpreted in two ways, and the brain
alternates between them (e.g., drawings with dual interpretations).
Illusions: Misinterpretations occur when sensory signals are confusing (e.g., pilots
misjudging positions in darkness).
Conclusion
Modern research shows perception is shaped by innate brain functions and learning. Our
brains organize sensory inputs into patterns, but why they do so remains an area of study.
Pattern recognition is key for survival in animals and humans. Animals need to recognize
family members, mates, predators, and familiar places. Humans use pattern recognition to
understand how we process what we see and hear and to build machines that can do the
same.
Template Matching Theory: One theory of pattern recognition is called Template Matching.
According to this theory, we compare what we see or hear to mental images (templates)
stored in our memory. If what we're seeing or hearing is similar enough to a template, we
recognize it.
For example, machines that process bank checks compare numbers to templates in their
memory.
Pre-processing: For a template match to work, the stimulus must first be adjusted to a standard
size, position, and orientation; if it is not, the brain may fail to recognize it.
Real-World Challenges: In reality, objects vary in size and orientation and may not be fully
visible. Storing a template for every possible version of every object would be far too
demanding for the brain.
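The comparison step described above can be made concrete with a toy sketch (an illustration only, not a model from these notes; the grids, the 0.8 threshold, and the template names are invented):

```python
# Toy template matcher: a stimulus is "recognized" when it overlaps
# a stored template closely enough (illustrative sketch only).

TEMPLATES = {
    "1": ["010",
          "010",
          "010"],
    "7": ["111",
          "001",
          "001"],
}

def similarity(a, b):
    """Fraction of cells that match between two equally sized grids."""
    cells = [(ra[i], rb[i]) for ra, rb in zip(a, b) for i in range(len(ra))]
    return sum(x == y for x, y in cells) / len(cells)

def recognize(stimulus, threshold=0.8):
    """Return the best-matching template name, or None if nothing is close enough."""
    best_name, best_score = None, 0.0
    for name, template in TEMPLATES.items():
        score = similarity(stimulus, template)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

# A slightly noisy "1" still matches; the same shape shifted sideways fails,
# which illustrates why templates need pre-processing (normalization).
print(recognize(["010", "010", "011"]))  # -> "1"
print(recognize(["100", "100", "100"]))  # -> None
```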
Prototype Models
Prototype models improve on template matching by being more flexible. Instead of needing
an exact match, prototypes are idealized versions of a stimulus based on past experiences.
Recognition happens when a new stimulus looks enough like one of these prototypes. If it
doesn’t match, the brain searches for another prototype, and this continues until it either
recognizes it or fails.
Posner & Keele (1970): Their experiments showed that people categorize patterns based on
how much they resemble an unseen prototype.
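The prototype idea can be sketched as follows (my illustration, not Posner & Keele's actual procedure; the two feature dimensions, the coordinates, and the 0.5 criterion are invented): each category is stored as an idealized point, and a new stimulus is assigned to the nearest prototype if it resembles it closely enough.

```python
import math

# Each prototype is an idealized point in a simple, made-up two-dimensional feature space.
PROTOTYPES = {
    "cat": (0.9, 0.2),
    "dog": (0.3, 0.8),
}

def categorize(stimulus, criterion=0.5):
    """Assign the stimulus to the nearest prototype, if it is near enough; otherwise fail."""
    name, proto = min(PROTOTYPES.items(), key=lambda kv: math.dist(stimulus, kv[1]))
    return name if math.dist(stimulus, proto) <= criterion else "unrecognized"

print(categorize((0.8, 0.3)))  # close to the "cat" prototype -> "cat"
print(categorize((0.0, 0.0)))  # resembles neither prototype enough -> "unrecognized"
```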
Real-World Application
Prototype theory helps explain how we recognize familiar objects or people even when they
appear in different ways (e.g., different clothes or hairstyle). It also explains why we might
mistake someone new for someone we know when we first see them.
Feature Theories
• Reading also depends on identifying distinctive features (line segments) of letters. For
example, the letter “A” consists of a left diagonal, a right diagonal, and a horizontal line,
while “B” consists of a vertical line and two curves. E.J. Gibson (1969) demonstrated that
letters are easier to distinguish when they have more differentiating features. For example,
the letters “G” and “W” differ greatly, leading to quicker recognition, whereas “P” and “R”
share more features, making them harder to differentiate.
• O. Selfridge proposed a feature theory of pattern recognition, known as “Pandemonium”,
which describes how visual information is processed in stages by different “demons” (a
metaphor for specialized processing units):
• Image demons internally represent a literal copy of the stimulus (e.g., a letter).
• Feature demons detect specific features (e.g., a diagonal line).
• Cognitive demons (in the letter case, one demon per letter) respond according to how well
the detected features match their letter.
• Finally, the decision demon evaluates the inputs from the cognitive demons and identifies the
most likely match.
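A toy sketch of these Pandemonium stages for two letters (the feature lists and the scoring rule are simplified assumptions, not Selfridge's actual parameters): feature demons report which features are present, cognitive demons "shout" in proportion to how many of their letter's features were found, and the decision demon picks the loudest.

```python
# Toy Pandemonium: feature demons -> cognitive (letter) demons -> decision demon.

# Simplified feature inventories for two letters (an assumption for illustration).
LETTER_FEATURES = {
    "A": {"left_diagonal", "right_diagonal", "horizontal_bar"},
    "H": {"left_vertical", "right_vertical", "horizontal_bar"},
}

def feature_demons(stimulus_features):
    """Each feature demon simply reports whether its feature is present in the stimulus."""
    return set(stimulus_features)

def cognitive_demons(detected):
    """Each letter demon 'shouts' in proportion to how many of its features were detected."""
    return {letter: len(features & detected) / len(features)
            for letter, features in LETTER_FEATURES.items()}

def decision_demon(shouts):
    """Pick the letter whose cognitive demon shouts loudest."""
    return max(shouts, key=shouts.get)

detected = feature_demons({"left_diagonal", "right_diagonal", "horizontal_bar"})
shouts = cognitive_demons(detected)
print(shouts)                  # "A" shouts at full strength, "H" only partially
print(decision_demon(shouts))  # -> "A"
```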
Structural Theories
Structural theories of pattern recognition build on feature theories by focusing on how the
features of an object are organized and related to each other. Unlike feature theories, which
focus just on the parts of an object, structural theories say that recognizing an object
depends on how its parts are arranged.
For example:
The letter “P” is recognized by how its line segments are connected.
A melody is recognized by how the notes are related to each other, no matter what key it’s
in.
Biederman’s theory (1987) takes this idea further for 3D objects. He suggested that all
objects can be broken down into about 35 basic 3D shapes called geons (such as cylinders
or cones). Object recognition happens by identifying these geons and how they fit together,
not by having a different template for every possible object.
Studies show that the relationship between features is crucial for recognition. For example,
if the middle parts of an object (like a cup) are removed, people can still recognize it. But if
parts at the connections (like where the handle meets the body of the cup) are removed,
recognition becomes much harder.
The theory is supported by people with visual-object agnosia, a condition where patients
can identify simple shapes but struggle to recognize complex objects. This suggests that
recognizing objects relies on understanding how the features are organized.
Analysis-by-Synthesis
This approach to pattern recognition explains how we can identify objects or patterns even
when the information from a stimulus alone isn’t enough. It suggests that recognition can
be improved by using different types of information, including context.
Normally, we recognize objects based on the sensory data we receive. This is called
bottom-up processing and it is driven by the features of the stimulus itself, such as shapes
or lines.
For example, when we see words like “pair” or objects like “a cup,” the information we get
from the stimulus (like the letters or the shape of the cup) is usually enough to recognize
what it is.
Sometimes, however, the stimulus alone is ambiguous, and the brain supplements it with
top-down processing, which draws on context and prior knowledge. For example, the words
“pair,” “pear,” and “pare” all sound the same, so if we hear one of them without context,
it’s hard to know which one is meant. But if you hear the word “pair” in a sentence about
shoes, you understand it means a pair of shoes, not a fruit. Context helps fill in the gaps.
Similarly, if part of a word is hidden or a sentence is unclear because of noise, the brain
uses context (top-down) to figure out the missing or unclear part of the word or sentence.
The brain combines what it sees (bottom-up) with what it knows (top-down) to understand
the full pattern.
Perceptual Consistency
This process relies on perceptual consistency, which is the brain’s ability to recognize an
object or pattern as stable even when the sensory information changes. It involves:
Sensory data (bottom-up): The current input coming from the stimulus itself.
Memory (top-down): Past experiences that help interpret what we see or hear.
Biederman’s theory highlights that the way objects are organized and the context they are
presented in can affect how easily we recognize them. The structural arrangement of
objects (like geons) and the context they are placed in are important for recognition.
Objects are easier to recognize when they are shown as part of a group or configuration
rather than by themselves. Even if the objects in the group are more complex, they are still
easier to recognize when presented together.
It’s easier to recognize 3D objects than 2D ones. This suggests that higher-order
relationships (top-down processing) provide more helpful information for recognition and
synthesis.
A single letter is harder to recognize when it is alone or part of a random set of letters. But
when that letter is part of a meaningful word, top-down processing (based on language
knowledge) helps recognize it more easily.
Words in a meaningful sentence are easier to recognize than random or nonsensical words.
This again shows the importance of context (top-down processing) in making recognition
easier.
David Marr’s theory explains how we process visual information, transforming it step-by-
step into something we can recognize. His work combines biology and computational
thinking, showing how our eyes and brain work together to understand what we see. While
Marr didn’t fully explain how memory or conscious recognition fits in, his theory laid the
foundation for understanding visual processing in stages.
Marr’s theory is mainly data-driven (bottom-up), meaning it starts with the raw visual
information we receive and builds up from there. His approach falls under the
constructivist tradition, where complex patterns are created from simpler elements. It
explains how the brain takes basic sensory data (like light entering the eye) and transforms
it into meaningful recognition of objects and scenes.
Marr proposed that visual recognition happens in stages, with each stage refining the
information further:
Raw Primal Sketch:
The first stage identifies basic features of an object, like edges or boundaries.
This is done by detecting differences in light intensity between the object and its
background. For example, in a teddy bear, the edges of the bear’s shape are identified.
Full Primal Sketch:
At this stage, the brain groups the basic features into more complex shapes or patterns.
This helps us understand the object’s structure, such as its texture or how parts are
connected.
2.5-D Sketch:
This stage is based on the viewer’s perspective, helping us understand how far away the
object is and its surface details.
3-D Representation:
In the final stage, the brain creates a complete and stable understanding of the object.
In the Raw Primal Sketch stage, the brain identifies key features called primitives, such as
edges, bars, blobs, and terminations (the ends of edges).
These features help the brain figure out things like orientation, contrast, and position.
Finding Edges
Marr’s theory explains how the brain finds edges even when the image is unclear. The brain:
Creates several blurred versions of the image with different sharpness levels.
Compares these versions to calculate how light intensity changes, helping it pinpoint the
edges.
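The blur-and-compare idea can be sketched with a difference-of-Gaussians approximation (an illustration consistent with the description above, not Marr's exact computation; the synthetic image, the blur widths, and the single-row check are assumptions):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# A tiny synthetic image: dark on the left, bright on the right,
# so there is a vertical intensity edge in the middle.
image = np.zeros((16, 16))
image[:, 8:] = 1.0

# Blur the image at two different levels of sharpness (sigma values chosen arbitrarily).
blur_fine = gaussian_filter(image, sigma=1.0)
blur_coarse = gaussian_filter(image, sigma=2.0)

# The difference of the two blurred versions is near zero in uniform regions
# and changes sign (a zero-crossing) right where the intensity changes.
dog = blur_fine - blur_coarse

row = dog[8]  # examine one row of the image
edges = [x for x in range(len(row) - 1) if row[x] * row[x + 1] < 0]
print(edges)  # the sign flips between columns 7 and 8, where the edge is
```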
Role of Context
Although Marr’s theory mainly focuses on the early stages of vision, he recognized that
memory and higher-level thinking are needed to fully recognize and name objects. This
idea was not fully developed in his theory, but it acknowledges the role of memory in later
stages of recognition.
No Pre-Set Knowledge: Unlike models that assume the brain already knows what it’s
looking for, Marr’s theory doesn’t make this assumption.
Viewer-Centered Representation: The 2.5-D sketch depends on how the viewer sees the
object, including depth and angle. Later, a stable 3-D model is formed.
In the later stages, the brain shifts to object-centered perception. This means we can
recognize the object no matter what angle or position it’s in, allowing us to identify it in
different situations.
J. J. Gibson’s Theory of Direct Perception suggests that humans don’t need higher-level
thinking, prior knowledge, or complex cognitive processes to perceive the world. Instead,
the sensory information from the environment is enough for perception. The environment
itself provides rich, useful information that helps guide perception. In this model,
perception happens directly without relying on beliefs, memories, or thought processes.
This theory is often called ecological perception because it focuses on perception in real-
world settings.
Gibson’s theory has two important ideas on how organisms perceive and react to their
environment:
Information Postulate:
Every property of the environment has sensory signals that directly match it.
For example, the texture of a surface can give us direct information about its distance or
slant.
Perception Postulate:
Perceiving is the direct pickup of this information from the environment, rather than
something constructed from beliefs, memories, or inferences.
Perception as Active
Perception isn’t just about passively observing the world. It involves actively gathering
information from the environment. Observers interact with their surroundings to collect
data from the optic array, which is the pattern of light coming from the environment. This
array provides detailed information about surfaces, textures, and the layout of the
environment.
Control of Action:
Perception also guides actions. As actions happen, they change how we perceive the
environment. For example, as you walk toward stairs, your perception of them changes
with each step, which influences your next action. This creates a feedback loop between
perception and action.
Affordances
Gibson introduced the idea of affordances, which are the opportunities for action that
objects or environments offer to an organism. For example, stairs afford climbing.
Perception helps organisms recognize these affordances and act in ways that help them
survive.
Texture Gradients:
Coarse textures (like individual blades of grass) indicate that objects are close.
Fine textures (like dense grass patterns) suggest that objects are farther away.
Optic Flow:
Optic flow refers to the changes in the optic array that occur when the observer moves.
Objects that are closer to the observer seem to move faster.
Objects that are farther away move more slowly or stay still.
Light
The human eye converts light into neural signals, enabling the brain to interpret visual
information almost instantly.
Cornea: Transparent outer layer where light enters the eye.
Lens: Focuses light onto the retina.
Iris: Colored part of the eye, controlling the amount of light that enters by adjusting pupil
size based on light intensity.
In Bright Light: Pupil constricts for sharper vision and better focus.
In Low Light: Pupil dilates to allow more light, sacrificing detail for sensitivity.
Retina
A thin layer of neural tissue at the back of the eye that converts light into neural signals.
Rods: Detect light at very low levels; responsible for vision in dim light, but not for color.
Cones: Detect colors and fine detail; work best in bright light.
Horizontal Cells: Connect neighboring photoreceptors and help adjust contrast between
adjacent parts of the image.
Bipolar Cells: Relay signals from the photoreceptors to the ganglion cells, whose axons form
the optic nerve.
Inside-Out Structure: Light passes through several retinal layers before reaching the
photoreceptors.
Blind Spot
The optic disk, where the optic nerve exits the eye, has no photoreceptors, causing a blind
spot.
The brain fills in this gap using nearby visual information (process called completion).
Fovea
The fovea is the small central region of the retina, densely packed with cones, where vision
is sharpest. Thin retinal layers at the fovea reduce light distortion, enhancing clarity.
The retina works with two systems to adapt to different lighting conditions:
Scotopic (rod) system: Rods are abundant outside the central retina, especially in the nasal
hemiretina. Many rods converge onto a single ganglion cell, which makes this system very
sensitive in dim light but gives it lower acuity and no color perception.
Photopic (cone) system: Cones are concentrated in and around the fovea and show much less
convergence, which gives this system sharp acuity and color perception but requires brighter
light.
This division allows the retina to adapt and provide vision effectively in various lighting
conditions.
These cues help us judge depth and distance using just one eye:
1. Texture Gradients
Closer: Textures with bigger, more spaced-out patterns appear near because their details
are easy to see.
Farther: Textures with smaller, denser patterns seem far away as details blend together.
2. Relative Size
Closer: Of two similar objects, the one that produces a larger image is seen as closer.
Farther: Smaller objects (even if the same actual size) are seen as farther away.
3. Interposition (Overlap)
Closer: Objects that block or partially cover others are perceived as closer.
Farther: Objects that are partially covered are perceived as farther away.
4. Linear Perspective
Closer: Parallel lines, like train tracks, appear to spread apart near the viewer.
Farther: These lines seem to come together as they go toward a vanishing point or horizon.
5. Aerial Perspective
Closer: Closer objects look clearer and sharper because there are fewer particles (like dust
or moisture) blocking the view.
Farther: Distant objects look hazier and less distinct because more atmosphere lies between
them and the viewer.
6. Height in the Visual Field
Closer: Objects higher above or lower below the horizon line look closer.
Farther: Objects nearer to the horizon line look farther away.
7. Motion Parallax
Closer: When moving, close objects seem to pass by quickly in the opposite direction.
Farther: Distant objects appear to move slowly or to travel along with the viewer.
These cues are essential for perceiving depth and distance in everyday life.
1. Binocular Convergence
Closer: When an object is near, the eyes turn inward (converge) to focus on it.
The strain felt in the eye muscles gives a sense of how close the object is.
Farther: For far-away objects, the eyes stay more relaxed and look straight ahead.
2. Binocular Disparity
Closer: The difference between the images seen by each eye (disparity) is larger when the
object is close.
Farther: The disparity is smaller for objects farther away, as the images in both eyes are
more similar.
These cues work together to give us a clear perception of depth and distance.
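The disparity cue can be made concrete with the standard stereo-geometry relation depth = (eye separation × focal distance) / disparity; this formula and the numbers below are general textbook geometry and illustrative assumptions, not figures from these notes.

```python
# Rough stereo-geometry sketch: depth is inversely related to binocular disparity.
# depth = (interocular_distance * focal_length) / disparity  (simple pinhole model)

def depth_from_disparity(disparity_m, interocular_m=0.065, focal_m=0.017):
    """Estimate distance (metres) from retinal disparity for a toy eye model.
    The 6.5 cm eye separation and 17 mm focal length are typical illustrative values."""
    return (interocular_m * focal_m) / disparity_m

# A larger disparity (the two eyes' images differ more) means a closer object.
print(round(depth_from_disparity(0.000002), 1))  # tiny disparity -> far away (~550 m)
print(round(depth_from_disparity(0.0002), 2))    # larger disparity -> much closer (~5.5 m)
```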
Viewer-Centered Representation
This theory explains how we recognize objects based on how they appear from our specific
perspective. It focuses on the position, angle, and orientation of the object as seen by the
viewer.
Key Points:
The way the object looks from our angle matters for recognition.
Mental Adjustment
When we see the object again, we might mentally rotate or adjust its image to match our
stored version.
Example:
If you’re looking at a computer, you might store an image of the screen at a certain angle,
the keyboard directly in front of you, and the mouse on the right.
Recognition Process
Recognition happens when we match the object we see to the stored image.
If the object is at a different angle, our brain adjusts the view to align with the stored one.
Supporting Evidence
Studies (like Davies-Thompson, 2009) show that neurons respond differently to various
views of the same object.
This suggests that our brain uses viewer-centered representations and adjusts based on
the viewpoint rather than relying on a fixed, unchanging image of the object.
Object-Centered Representation
This theory explains that we recognize objects by focusing on their actual properties, like
shape and structure, rather than how they appear from our viewpoint. The object’s features
are stored in a way that remains constant, no matter the angle we view it from.
Key Points:
It focuses on the object’s fixed features, like its shape, structure, and how its parts are
arranged.
The object’s main axes (major and minor lines running through its shape) are identified and
used as a guide.
Example:
A computer’s parts—screen, keyboard, and mouse—are stored with fixed positions relative
to each other, no matter where the viewer is standing.
Supporting Evidence
Research (like Hayward, 2012) shows that neurons respond differently depending on how
an object is viewed.
This suggests that object-centered representation is less common because it doesn’t align
well with how the brain naturally processes different perspectives.
Landmark-Centered Representation
This theory explains how we recognize and organize objects or locations based on a
familiar reference point, called a landmark. It is especially helpful for navigating new
environments or spaces.
Key Points:
How It Works
Objects and locations are encoded relative to a salient, familiar reference point (the
landmark), rather than relative to the viewer or to the object itself.
Example:
A traveler in a new city might remember locations by thinking, “The café is to the left of my
hotel.”
This approach is useful for spatial navigation and exploring unfamiliar areas.
Switching Strategies
People can switch between viewer-centered, object-centered, and landmark-centered
representations depending on the task and the environment.
Supporting Evidence
Research (Committeri et al., 2004) shows that the brain activates different areas depending
on which strategy is used.
This flexibility helps the brain choose the most efficient method for each situation.
Humans use two main systems to recognize patterns, according to Farah (1992, 1995).
1. Part-Based (Featural) System
How It Works:
Focuses on recognizing individual parts of an object and putting them together to form the
whole.
Examples:
Reading printed words by identifying their individual letters.
2. Configurational System
How It Works:
Processes the object as a whole; the overall arrangement (configuration) of the parts matters
more than the individual parts.
Examples:
Recognizing a friend by their whole face, even if small changes (like glasses or hairstyle) are
present.
The fusiform gyrus in the temporal lobe plays a key role in recognizing faces.
People are better at recognizing whole faces compared to individual facial features.
In contrast, for objects like houses, recognition works equally well whether viewed as a
whole or in parts.
This shows that face recognition relies on a specialized process that is different from how
we recognize other objects.
The brain processes visual information through two main pathways, which help determine
“what” we see and “where” it is.
“Where” Pathway (Dorsal Stream)
Function:
Determines where objects are in space and how they are moving.
Path:
Runs from the visual areas of the occipital lobe up to the parietal lobe.
“What” Pathway (Ventral Stream)
Function:
Identifies what an object is, including its shape, color, and identity.
Path:
Runs from the visual areas of the occipital lobe down into the temporal lobe.
Importance:
Separating “what” from “where” lets the brain identify objects and locate them at the same
time.
Lesions in Monkeys:
Classic lesion studies found that damage along the temporal (“what”) pathway impairs object
recognition, while damage along the parietal (“where”) pathway impairs locating objects in
space.
What-How Hypothesis
An alternative theory suggests these pathways explain both “what” an object is and “how”
to interact with it.
Key Points:
The ventral (“what”) stream supports identifying and recognizing objects, while the dorsal
(“how”) stream guides actions on them, such as reaching and grasping.
Supporting Evidence
The disorders described below, visual agnosia (impaired recognition) and optic ataxia
(impaired visually guided action), show that identifying objects and acting on them can break
down separately.
Agnosias are perceptual problems caused by damage to parts of the brain near the temporal
and occipital lobes or by restricted oxygen flow, often due to traumatic brain injury. People
with agnosia have normal vision but struggle to identify objects.
Visual-Object Agnosia
What Happens:
People can see parts of an object but cannot recognize the whole.
Example:
Someone may describe the parts of eyeglasses (e.g., circles, crossbars) but mistake them
for a bicycle because of similar shapes.
Simultagnosia
What Happens:
People can perceive only one object (or one part of a scene) at a time and cannot take in
the whole scene at once.
Example:
If shown multiple items, they might only notice one and miss the rest.
Prosopagnosia
What Happens:
People struggle to recognize faces, even familiar ones, although they may still recognize
other objects.
Often linked to damage in the right temporal lobe, especially the fusiform gyrus.
Long-Lasting Effects:
Example:
Cases caused by brain injuries, like carbon monoxide poisoning, have lasted for decades.
Agnosias highlight the brain’s reliance on specific areas for recognizing and interpreting
objects and faces.
Optic ataxia is a condition where a person struggles to use vision to guide their
movements. This happens due to damage in the posterior parietal cortex, a part of the
brain that helps process visual information for actions.
Difficulty reaching for or interacting with objects, even in bright or clear environments.
Example: Most people can easily see and reach for a keyhole to insert a key, but someone
with optic ataxia would find this task very challenging.
The “how” pathway (also called the dorsal stream) helps us perform immediate actions,
like reaching for an object.
When an action is delayed, the ventral stream (the “what” pathway) and other brain areas
become involved.
The brain processes “what” we see and “how” we interact with it differently.
Optic ataxia shows how important the dorsal stream is for guiding movements and how its
impairment can affect daily tasks.
This condition highlights the complex ways our brain helps us see, understand, and act in
our surroundings.
Color Vision Deficiency (CVD)
Color vision deficiencies are more common in men than in women because many forms of color
blindness are linked to the X chromosome, of which men have only one. CVD can result from
genetic factors or from damage to specific brain areas, like the ventromedial occipital and
temporal lobes.
The Role of Cones in Color Perception
The retina has three types of cone cells, each sensitive to different wavelengths of light:
• L Cones: These are sensitive to long wavelengths, around 560 nm, which correspond to
the red part of the spectrum.
• M Cones: These cones are sensitive to medium wavelengths, around 530 nm, allowing us
to perceive green hues.
• S Cones: Sensitive to short wavelengths, around 420 nm, which correspond to the blue
part of the spectrum.
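How these three signals jointly encode a colour can be illustrated with a toy model (the bell-shaped sensitivity curves below are a simplifying assumption; real cone spectra are not Gaussian, and only the peak wavelengths come from the list above):

```python
import math

# Toy cone model: each cone type's sensitivity is approximated by a bell curve
# centred on the peak wavelengths given above (an assumption, for illustration only).
CONE_PEAKS_NM = {"L": 560, "M": 530, "S": 420}

def cone_response(wavelength_nm, peak_nm, width_nm=50.0):
    """Relative response of one cone type to light of a given wavelength."""
    return math.exp(-((wavelength_nm - peak_nm) ** 2) / (2 * width_nm ** 2))

def encode(wavelength_nm):
    """Return the pattern of L, M and S responses that the brain would compare."""
    return {cone: round(cone_response(wavelength_nm, peak), 2)
            for cone, peak in CONE_PEAKS_NM.items()}

print(encode(650))  # long-wavelength ("red") light: the L cones respond most
print(encode(450))  # short-wavelength ("blue") light: the S cones respond most
```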
1. Rod Monochromacy (Achromacy): This is the rarest and only true form of color
blindness, where cones are nonfunctional. Individuals with this condition perceive the
world solely in shades of grey, relying on their rods (light-sensitive cells) instead of cones,
which are usually responsible for color vision.
2. Dichromacy: This form of color deficiency occurs when one of the three types of cone is
completely absent, leaving only two functional types. There are three types of dichromacy,
each affecting color perception differently:
• Protanopia: The most severe form of red-green color blindness, where individuals have
difficulty distinguishing red from green.
• Deuteranopia: Similar to protanopia, but the green cone is affected, causing difficulty in
distinguishing greens.
• Tritanopia: A rarer form, where individuals struggle to differentiate between blue and
green. They may also perceive yellow as light shades of red.
3. Trichromacy: Normal color vision, known as trichromacy, uses all three types of
cone cells (L, M, and S cones) functioning correctly, allowing trichromats to perceive
the full range of colors.
Anomalous Trichromacy: A form of color vision deficiency where all three cone
cell types are present but one functions improperly, resulting in partial color
blindness. People with this condition are called anomalous trichromats and
experience varying degrees of color perception issues depending on the affected
cone type:
• Protanomaly: Reduced sensitivity to red light.
• Deuteranomaly: Reduced sensitivity to green light (most common).
• Tritanomaly: Reduced sensitivity to blue light (extremely rare).
Anomalous trichromats often struggle to distinguish between certain colors. For
example, protanomaly and deuteranomaly types (collectively known as red-green
color blindness) may have difficulty differentiating reds, greens, browns, and
oranges, as well as blue and purple hues. Tritanomaly causes confusion between
blue and yellow, violet and red, and blue and green.
The severity of anomalous trichromacy can range from almost normal color
perception to nearly complete difficulty distinguishing affected colors. In well-lit
conditions, some may perceive colors better, while others, especially those with
severe forms, may have color vision similar to those with dichromacy. This condition
can be inherited (remaining stable over time) or acquired (which may change in
severity).
Color deficiency:
Red-Green Deficiency: This is the most common form of color deficiency, primarily due to
issues with the L-cones (long-wavelength, red-sensitive cones) or M-cones
(medium-wavelength, green-sensitive cones).
• Protanopia: Complete inability to perceive red light due to the absence or malfunction of
L-cones. This makes reds appear darker, and there’s confusion between reds and greens.
• Protanomaly: Reduced sensitivity to red light because L-cones are not fully functional.
Reds appear less vibrant, blending with greens and browns.
• Tritanopia: Complete inability to perceive blue light due to the absence or malfunction of
S-cones. This causes confusion between blues and greens, and yellow appears as a light
shade of red.
• Tritanomaly: Reduced sensitivity to blue light due to partially functional S-cones. This
results in difficulty distinguishing between blue and yellow.
• Blue Cone Monochromacy: Only S-cones are functional, while L- and M-cones are
nonfunctional. This rare condition causes severe color deficiency and poor visual acuity,
with vision limited to shades based on blue light.
Haptic perception
Humans are often said to have five senses: vision, hearing, smell, taste, and touch.
However, we can sense much more than these. For example, we can tell if we’re standing
upright or leaning, know where our arms and legs are even with our eyes closed, and feel
heat without touching something. This shows that we have more than just five senses.
According to Durie (2005), humans may actually have at least 21 different senses. These
include external senses, like hearing and smell, which help us understand the world
around us, and internal senses, like pain, thirst, and hunger, which help us know what’s
happening inside our body. Together, these senses give us a detailed understanding of both
our body and our environment.
Proprioception
Proprioception is the ability to sense where your body is in space and how it is positioned.
It helps you maintain balance and know the location of your body parts without needing to
see them. This sense relies on proprioceptors, which are special nerve receptors that track
the angles of your joints.
Kinesthesis
Kinesthesis is the sense of body movement. It is crucial for tasks like hand-eye
coordination and helps you perceive how your body is moving in relation to the world. While
it overlaps with proprioception, kinesthesis focuses more on movement rather than
position.
Haptic Information
Haptic information combines touch, proprioception, and kinesthesis to explore the world.
For example, when you feel an object: Touch receptors sense contact with the object.
Proprioception helps you know the position of your fingers and hands. Kinesthesis helps
you sense their movement as you explore the object. Together, these senses work to give
you a complete understanding of objects and the environment.
We often use vision to recognize objects, but haptic information (touch) is also very
effective for this.
Research shows that the brain processes both visual and haptic information in similar
ways. It uses the same mental processes to identify and categorize objects, whether we
see them or feel them (Gaissert & Wallraven, 2012).
Vision combines input from both eyes, but haptic processing doesn’t always treat
information from both hands equally.
Instead, it often relies more on the dominant hand, which is better at motor tasks.
Exploratory procedures are specific ways people use their hands to gather information
about objects.
For example: moving the fingers laterally across a surface to judge texture, pressing to
judge hardness, following an object’s contour to judge its exact shape, and holding it
unsupported to judge its weight.
Klatzky et al. (1987) found that people use these procedures consistently when exploring
objects.
This idea connects to Gibson’s concept of active perception, which suggests that we
actively interact with our surroundings to gather haptic information, rather than passively
receiving it.
Bottom-up processing:
This is when information comes directly from touch receptors, such as feeling an object’s
texture or temperature.
Top-down processing:
This is when we use prior knowledge or experience to guess what an object might be, like
identifying it while blindfolded.
Haptic feedback is being used to improve driver safety by delivering alerts through the
sense of touch. Unlike visual or auditory signals, which can be distracting or affected by
noise, haptic feedback keeps the driver’s focus on the road. Research has explored several
haptic methods:
Vibrating Steering Wheels
A steering wheel that vibrates when a driver drifts out of their lane can help improve lane-
keeping.
Kozak et al. (2006) showed that vibrating steering wheels work better than visual or sound
warnings for drowsy drivers.
Responsive Pedals
An accelerator pedal that pushes back on the driver’s foot can warn of potential collisions.
Vibrating Seat Belts
Vibrating seat belts can reduce the time it takes to react to hazards.
Scott and Gray (2008) showed this helps drivers brake faster to avoid collisions.
Vibrating Seats
For example, vibrations at the front of the seat can warn of a collision ahead.
Fitch et al. (2011) found this approach provides clear and easy-to-understand signals when
properly designed.
Comparative Research
Chang et al. (2011) compared haptic seat alerts to visual and auditory signals.
They found haptic feedback to be the most effective, but drivers needed time to get used to
it.
These methods show how haptic technology can make driving safer by providing clear, non-
distracting warnings.
Speech Perception
Speech perception might seem simple, but it’s actually a complex process. Here’s how it
works:
The auditory language system picks up sound vibrations from speech and turns them into
recognizable language.
When adults speak English, they produce about 15 sounds per second, which means
listeners process around 900 sounds per minute.
Word Boundaries
Listeners can figure out where one word ends and another begins, even if there’s no pause
or silence between them.
Variability in Pronunciation
The way sounds (phonemes) are pronounced can differ a lot depending on the speaker.
Context
Context allows listeners to guess missing sounds or words when they can’t hear them
clearly.
Visual Cues
Looking at the speaker’s mouth can help us understand unclear or tricky sounds.
These characteristics show how our brains work hard to understand speech, even when it’s
not perfect.
Word Boundaries
When we hear someone speak, it might seem like the words are clearly separated, but in
reality, spoken language often sounds like a continuous stream. Here’s how it works:
Continuous Speech
In an unfamiliar language, speech can sound like one long sentence with no breaks
between words.
This happens because, in most languages, there are no natural pauses to separate words
when people talk.
English Example
In written English, spaces clearly mark where one word ends and the next begins.
However, when people speak, actual pauses marking word boundaries occur less than
40% of the time.
Our brain uses top-down processing to figure out where words begin and end.
It quickly tries different ways to divide the stream of sounds into words.
Most of the time, our brain does this correctly without us even noticing.
This ability helps us make sense of speech, even when it feels like the words are all
jumbled together.
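The problem of "trying different ways to divide the stream" can be sketched as a small search over a word list (a toy illustration of the computational problem, not a claim about how the brain actually does it; the lexicon and the input string are invented):

```python
from functools import lru_cache

# Toy lexicon used to segment a continuous stream of letters (no spaces),
# mimicking the word-boundary problem in continuous speech.
LEXICON = {"ice", "icecream", "cream", "and", "cake"}

def segmentations(stream):
    """Return every way of dividing the stream into words from the lexicon."""
    @lru_cache(maxsize=None)
    def solve(start):
        if start == len(stream):
            return [[]]
        results = []
        for end in range(start + 1, len(stream) + 1):
            word = stream[start:end]
            if word in LEXICON:
                results.extend([word] + rest for rest in solve(end))
        return results
    return solve(0)

# The same stream supports more than one parse, just as unfamiliar speech can;
# context would normally pick the sensible one.
print(segmentations("icecreamandcake"))
# -> [['ice', 'cream', 'and', 'cake'], ['icecream', 'and', 'cake']]
```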
Phonemes are the smallest sounds in a language, like the “b” in “bat” or the “s” in “sit.”
Recognizing these sounds isn’t as simple as it seems because their pronunciation can vary
a lot.
1. Speaker Differences
Everyone has a unique voice, with differences in pitch, tone, and speaking speed.
These differences can make phonemes sound slightly different from one person to another.
2. Sloppy Pronunciation
In casual speech, sounds might be shortened, blended, or less precise, making them harder to
identify.
3. Coarticulation
When speaking, phonemes are affected by the sounds before and after them.
For example, the “d” in “idle” sounds slightly different from the “d” in “don’t” because of the
surrounding sounds.
Even with all this variation, our brain does an amazing job of helping us understand speech!
When we listen to speech, our brains don’t just passively hear the sounds; we actively use
context to help us understand words, especially when some sounds are unclear.
Phonemic Restoration
Our brain can “fill in” the missing sound based on the meaning of the sentence and guess
what was said. This is called phonemic restoration.
Classic Experiment
In a study by Warren and Warren (1970), people listened to sentences in which one sound was
replaced by a cough, for example: “It was found that the *eel was on the axle / shoe /
orange / table” (where * marks the replaced sound).
Based on the context of the sentence, listeners “filled in” the missing sound: they reported
hearing “wheel” with axle, “heel” with shoe, “peel” with orange, and “meal” with table.
Even though the actual sound was missing, their brains used the context to fill in the
correct one.
Illusion of Hearing
People think they heard a sound that wasn’t actually there because their brain used context
to “restore” the missing sound.
This shows how our brain uses top-down processing (using prior knowledge and context) to
help us understand speech, even when parts are unclear.
Watching a speaker’s lips and face helps us understand what they’re saying, especially in
noisy places or when the audio is poor (like on a bad phone call).
Seeing the speaker’s mouth gives us clues about the sounds they are making.
McGurk Effect
The McGurk Effect shows how visual and auditory information can mix during speech
perception.
In a famous study, researchers played a video of a woman’s lips moving as if saying “gag,”
but the audio said “bab.”
People reported hearing something like “dad” instead of either “gag” or “bab.”
This happens because the brain tries to combine both what we see (the lip movements)
and what we hear to make sense of the speech.
The McGurk Effect shows that we use both sight (lip movements) and sound to understand
speech.
Usually, a speaker’s lip movements match the sounds they make, which helps us correctly
understand what they’re saying.
Special Mechanism Approach
This theory suggests that humans are born with a special part of the brain designed just for
understanding speech sounds. Supporters believe this mechanism helps us quickly and
accurately recognize speech, setting it apart from other sounds like music or random
noises.
The theory proposes that humans have a “phonetic module,” a special neural mechanism
for understanding speech.
This module:
Breaks down the continuous flow of speech into recognizable phonemes (the smallest
units of sound) and words.
It helps us know where one word ends and another begins (imposing word boundaries).
Example
This module lets us understand speech even when it’s a little unclear or distorted.
Categorical Perception
Early researchers believed humans could hear speech sounds as clear categories, like a
distinct “b” or “p,” instead of hearing a mix of both.
In experiments, people heard sounds that were a blend of “b” and “p” but still clearly heard
one or the other, not a mix.
This ability to categorize sounds was thought to be unique to speech, supporting the idea
of a special speech mechanism.
Counter Evidence
Later studies found that humans also show categorical perception for some non-speech
sounds, like musical tones.
This challenges the idea that only a special speech mechanism can explain this ability.
Modularity Argument
Supporters believe the phonetic module works separately from other brain functions like
recognizing objects, memory, or problem-solving.
However, some psychologists argue that these cognitive functions are interconnected and
not separate.
In short, the Special Mechanism Approach suggests a unique brain system for speech, but
there’s evidence that some of these processes can apply to non-speech sounds too.
General Mechanism Approach
The general mechanism approach argues that speech perception doesn’t depend on a
special, inborn system. Instead, it suggests that humans use the same brain processes for
understanding speech as they do for other sounds and experiences. This view sees speech
perception as a learned skill rather than a unique biological ability.
Research shows that people can categorize certain non-speech sounds (like musical
tones) the same way they categorize speech sounds.
Studies measuring ERPs (brain responses to stimuli) show that the brain processes speech
sounds and other sounds (like music) in similar ways.
This supports the idea that the same brain processes are used for all types of sounds, not
just speech.
This shows that speech perception relies on multiple senses (sight and sound), challenging
the idea of a speech-specific brain system.
In summary, the general mechanism approach argues that speech perception is just a
learned skill that uses the same brain processes for all sounds, not a special ability.
• General mechanism theories propose that speech perception unfolds in stages, similar
to other types of perception:
• Learning: Applying past knowledge and experiences to recognize familiar sounds and
words.
• Decision Making: Deciding on the most likely interpretation of sounds based on context
and knowledge.
Auditory Perception
Auditory perception is how our brain understands sound waves we hear through our ears.
This process includes detecting, interpreting, and locating sounds to understand our
surroundings.
These waves cause vibrations in the eardrum, which are then turned into electrical signals
by the auditory system.
The eardrum’s vibrations send signals through the auditory nerve to the brain, where the
sound is processed.
Pitch: The frequency of sound waves determines pitch; higher frequencies are heard as
higher-pitched sounds.
Loudness: The amplitude (size) of sound waves affects how loud a sound is; bigger amplitude
means a louder sound.
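As a quick numerical illustration (standard acoustics, not taken from these notes): a pure tone can be written as amplitude × sin(2π × frequency × time), where frequency sets the pitch and amplitude sets the loudness, and doubling the amplitude raises the level by about 6 dB.

```python
import math

def tone_sample(amplitude, frequency_hz, t_seconds):
    """One sample of a pure tone: amplitude controls loudness, frequency controls pitch."""
    return amplitude * math.sin(2 * math.pi * frequency_hz * t_seconds)

def level_change_db(amplitude_ratio):
    """Change in sound level (decibels) when the amplitude is scaled by the given ratio."""
    return 20 * math.log10(amplitude_ratio)

print(round(level_change_db(2), 1))    # doubling amplitude -> about +6.0 dB (louder)
print(tone_sample(1.0, 440.0, 0.001))  # one sample of a 440 Hz tone (the pitch of concert A)
```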
Timbre
Timbre is the quality or color of sound that lets us tell different sources apart, even if they
have the same pitch and loudness.
For example, a piano and a violin playing the same note at the same volume will sound
different because they have different timbres.
Timbre also helps us perceive where a sound is coming from in space, creating the “sound
stage” or the location of sounds around us.
Spatial Localization
Spatial localization is the ability to figure out where a sound is coming from.
Interaural Time Difference (ITD): Sounds that come from one side reach the closer ear first,
helping us locate sounds on the horizontal plane (a rough formula for this timing cue is
sketched after these cues).
Interaural Level Difference (ILD): The sound is louder in the ear closer to the sound. This
helps with locating sounds on the left-right axis.
Head-Related Transfer Function (HRTF): The shape of your head and ears affects how you
hear sounds from different directions. This helps with locating sounds in the vertical (up-
down) and front-back directions.
Reverberation and Echoes: Sounds reflect off surfaces, giving clues about how far away a
sound source is and the size of the space. These echoes help the brain estimate distances
and interpret the acoustic space.
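The interaural time difference mentioned above can be approximated with Woodworth's spherical-head formula (a textbook approximation; the head radius and speed of sound below are typical assumed values, not figures from these notes):

```python
import math

def interaural_time_difference(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Woodworth's approximation of the ITD (in seconds) for a distant source.
    0 degrees = straight ahead, 90 degrees = directly to one side."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))

for angle in (0, 30, 60, 90):
    print(angle, "deg ->", round(interaural_time_difference(angle) * 1e6), "microseconds")
# The difference grows from 0 µs straight ahead to roughly 650 µs at the side,
# which is the timing range the brain uses to tell left from right.
```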
Auditory Scene Analysis (ASA)
Auditory scene analysis (ASA) is how the brain organizes and separates sounds in an
environment with many sources.
For example, it helps us tell apart voices and instruments even when they overlap.
ASA uses rules based on Gestalt principles (like similarity and continuity) to group related
sounds together, so we can hear each sound source as a separate auditory stream.
This allows us to focus on one sound, like a conversation in a noisy room, while filtering out
other background sounds. This requires auditory attention.
In short, auditory perception helps us detect, identify, and locate sounds, and our brain
uses several methods to organize and make sense of sounds in our environment.
Auditory Attention
Auditory attention is the ability to focus on certain sounds while ignoring others. It’s
important for communication and being aware of our surroundings.
Selective Attention
This is the ability to focus on one sound, like listening to a conversation, while blocking out
background noise.
Divided Attention
This is the ability to listen to multiple sounds at the same time, like hearing a podcast while
also noticing traffic sounds.
Sustained Attention
This refers to maintaining focus on a specific sound or sounds for a longer time, like
listening for a specific signal or announcement.
Sound Salience
This is how noticeable a sound is, based on its loudness, frequency, or how unusual it is.
Goal-Driven Attention
For example, when we need to hear an important announcement, we’re more likely to
ignore other background noises.
In summary, auditory attention helps us focus on important sounds and ignore irrelevant
ones, influenced by factors like the sound’s prominence and our goals.
Auditory Localization
Auditory localization is the brain’s ability to figure out where a sound is coming from in
three-dimensional space. It is described in three ways: azimuth (left-right), elevation
(up-down), and distance.
Azimuth (left-right):
The brain uses differences in timing and loudness between sounds reaching each ear to
determine this. For example, higher-frequency sounds are blocked more by the head, creating
an “acoustic shadow,” while lower-frequency sounds are less affected.
Elevation (up-down):
The shape of the outer ear (pinna) helps the brain figure out the height of a sound by
changing how frequencies sound. Sounds that reflect off the ears or body (like the shoulder)
also provide clues about the vertical location of the sound.
Distance:
Sound Level: The louder the sound, the closer it is. As the sound travels, it gets quieter.
Motion Parallax: When we move, nearby sounds change position faster than distant
sounds.
Reflection: Sounds that bounce off surfaces are a sign the sound source is farther away.
These sounds may sound more muffled and have a different quality (timbre) because of the
distance.
In short, the brain uses different cues (timing, loudness, frequency, reflections, etc.) to
figure out where a sound is coming from and how far away it is.
Brain Pathways in Sound Processing
The dorsal (“where”) pathway goes from the auditory cortex to the parietal lobes and is
responsible for localizing sounds in space.
The ventral (“what”) pathway helps identify the source or nature of the sound, like whether
it’s speech, music, or a noise.
Multisensory Integration
When what we see matches what we hear (like watching someone’s lips move while they
speak), it helps us localize sound more accurately.
This is particularly true for both the azimuth (left-right) and elevation (up-down) of the
sound.
In short, the brain has different pathways to figure out where a sound is coming from and
what it is, and visual cues can help improve our ability to localize sounds.