TYPE Review
PUBLISHED 10 July 2024
DOI 10.3389/fpsyg.2024.1408073

OPEN ACCESS

EDITED BY
Victoria M. Bajo Lorenzana, University of Oxford, United Kingdom

REVIEWED BY
Michael Pecka, Ludwig Maximilian University of Munich, Germany
Patrick Bruns, University of Hamburg, Germany

*CORRESPONDENCE
Alessandro Carlini
[email protected]

RECEIVED 01 April 2024
ACCEPTED 17 June 2024
PUBLISHED 10 July 2024

CITATION
Carlini A, Bordeau C and Ambard M (2024) Auditory localization: a comprehensive practical review. Front. Psychol. 15:1408073. doi: 10.3389/fpsyg.2024.1408073

COPYRIGHT
© 2024 Carlini, Bordeau and Ambard. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Auditory localization: a comprehensive practical review

Alessandro Carlini*, Camille Bordeau and Maxime Ambard
Laboratory for Research on Learning and Development (LEAD), CNRS UMR, Université de Bourgogne, Dijon, France

Auditory localization is a fundamental ability that allows us to perceive the spatial location of a sound source in the environment. The present work aims to provide a comprehensive overview of the mechanisms and acoustic cues used by the human perceptual system to achieve such accurate auditory localization. Acoustic cues are derived from the physical properties of sound waves, and many factors allow and influence auditory localization abilities. This review presents the monaural and binaural perceptual mechanisms involved in auditory localization in the three dimensions. Besides the main mechanisms of Interaural Time Difference, Interaural Level Difference and Head-Related Transfer Function, secondary important elements such as reverberation and motion are also analyzed. For each mechanism, the perceptual limits of localization abilities are presented. A section is specifically devoted to reference systems in space, and to the pointing methods used in experimental research. Finally, some cases of misperception and auditory illusion are described. More than a simple description of the perceptual mechanisms underlying localization, this paper is also intended to provide practical information for experiments and work in the auditory field.

KEYWORDS
acoustics, auditory localization, ITD, ILD, HRTF, action perception coupling

1 Introduction
A bird singing in the distance, a friend calling us, a car approaching quickly… our auditory
system constantly works to transmit information coming from our surroundings. From
exploring our environment to identifying and locating dangers, auditory localization plays a
crucial role in our daily lives, and fast, accurate localization is of vital importance.
However, how does our perceptual system locate the origin of sounds so accurately? This
review aims to provide a comprehensive overview of the capabilities and mechanisms of
auditory localization in humans. The literature has so far extensively described the fundamental
mechanisms of localization, whereas recent findings add new information about the
importance of ancillary mechanisms to resolve uncertainty conditions and increase
effectiveness. This paper aims to summarize the totality of these factors. Moreover, for the sake
of completeness, we have supplemented the review with some practical insights. We enriched
the functional description with relevant information about the methods of study, measurement,
and perceptual limits.
There is growing interest in auditory localization mechanisms, as they have a great
potential for improving the spatialization of sound in emerging immersive technologies, such
as virtual reality and 3D cinema. Even more interesting and challenging is their use in sensory
augmentation or substitution devices, used to improve the lives of people with perceptual
disabilities. This work aims to provide a concise and effective explanation of the relation between the structure of acoustic signals and human sound source localization abilities, for both theoretical research and practical applications. Accordingly, we have omitted an examination of the neural correlates involved in auditory localization. We invite readers interested in this topic to refer to the specific literature.

The body of this review is divided into three sections. In the first part, an overview of the mechanisms involved in human 3D sound localization, as well as the associated capabilities and limitations, is given in order to provide a holistic understanding of the field. In the second part, we provide a more detailed explanation of the auditory cues. Finally, we present other factors that influence the localization of sound sources, such as pointing and training methods, and sound characteristics (frequency, intensity…), alterations that can even lead to illusory phenomena.

2 Localizing a sound in space

2.1 Auditory localization is based on auditory perception

Auditory localization naturally relies on auditory perception. Its characteristics and limitations are primarily determined by the capabilities of the human perceptual system and exhibit considerable interindividual variability. Although the study of the auditory system has ancient origins, it was not until the 19th century that research started to focus on the functional characteristics of our auditory system, as well as on localization abilities. In the 20th century, the growing knowledge of the perceptual system and the adoption of more rigorous protocols revealed the complexity of the mechanisms of acoustic localization as well as the importance of using appropriate methods of investigation (Grothe and Pecka, 2014; Yost, 2017). Indeed, measures of auditory localization can be influenced by many factors. In experimental tests, for example, participants' responses depend on the type of auditory stimuli as well as the order in which they are presented, the way the sound spreads through the environment, the age of the listener, and the method used to collect responses (Stevens, 1958; Wickens, 1991; Reinhardt-Rutland, 1995; Heinz et al., 2001; Gelfand, 2017). Results of experimental research also highlight a high level of inter-subject variability that affects responses and performance in various experimental tests (Middlebrooks, 1999a; Mauermann et al., 2004; Röhl and Uppenkamp, 2012).

The cues used by the auditory system for localization are mainly based on the timing, intensity, and frequency of the perceived sound. The perceptual limits of these three quantities naturally play an important role in acoustic localization. Regarding the perception of sound intensity, the threshold varies with frequency. Given a sound of 1 kHz, the minimum pressure difference that the human hearing system can detect is approximately 20 μPa, corresponding by definition to the intensity level of 0 dB SPL (Howard and Angus, 2017). The perceived intensity of a sound does not correspond to the physical intensity of the pressure wave, and the perceptual bias varies depending on the frequency of the sound (Laird et al., 1932; Stevens, 1955). Fletcher and Munson, and Robinson and Dadson successively, carried out the best-known studies concerning the correspondence between physical and perceived sound intensity (Fletcher and Munson, 1933; Robinson and Dadson, 1956). Today, ISO 226:2003 defines the standard auditory equal-loudness level chart. To do this, it uses a protocol based on free-field frontal and central loudspeaker playback to participants aged 18–25 years from a variety of countries worldwide. With regard to the spectrum of frequencies audible to the human auditory system, the standard audible range is considered to be between 20 Hz and 20,000 Hz. However, auditory perception depends on many factors, and especially on the age of the listener. Performance is at its highest at the beginning of adulthood, at around 18 years of age, and declines rapidly: by 20 years of age, the upper limit may have dropped to 16 kHz (Howard and Angus, 2017). The reduction is continuous and progressive, mainly affecting the upper threshold. Above the age of 35~40 years, there is a significant reduction in the ability to hear frequencies above 3–4 kHz (Howarth and Shone, 2006; Fitzgibbons and Gordon-Salant, 2010; Dobreva et al., 2011). Finally, the perceived frequency of sounds does not correspond exactly to their physical frequency but instead shows systematic perceptual deviations. The best-known psychoperceptual scales that relate sound frequency to pitch are the Mel scale (Stevens and Volkmann, 1940), the Bark scale (Zwicker, 1961), and the ERB scale (Glasberg and Moore, 1990; Moore and Glasberg, 1996).
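For practical purposes, both correspondences can be computed directly. The short Python sketch below converts a sound pressure in pascals to dB SPL using the 20 μPa reference mentioned above, and a frequency in hertz to pitch on the Mel scale; the specific Mel formula (2595·log10(1 + f/700)) is one common variant among several in the literature and is used here only for illustration.

    import math

    P_REF = 20e-6  # 20 micropascals: the reference pressure defined as 0 dB SPL

    def spl_db(pressure_pa):
        """Sound pressure level (dB SPL) of an RMS pressure given in pascals."""
        return 20.0 * math.log10(pressure_pa / P_REF)

    def hz_to_mel(f_hz):
        """Pitch on the Mel scale; 2595*log10(1 + f/700) is one common variant."""
        return 2595.0 * math.log10(1.0 + f_hz / 700.0)

    print(spl_db(20e-6))    # 0.0 dB SPL by definition
    print(spl_db(1.0))      # ~94 dB SPL, a loud sound
    print(hz_to_mel(1000))  # ~1000 mel: the scale is anchored at 1 kHz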
2.2 Main characteristics of localization

Auditory localization involves several specialized and complementary mechanisms. Four main types of cue are usually mentioned: the two binaural cues (interaural time and level differences), monaural spectral cues due to the shading of the sound wave by the listener's body, and additional factors such as reverberation or relative motion that make localization more effective – and more complex to study. These mechanisms operate simultaneously or complementarily in order to compensate for weaknesses in any of the individual mechanisms, resulting in high accuracy over a wide range of frequencies (Hartmann et al., 2016). In this section, we provide an overview of the main mechanisms, which are explained in more detail in the second and third sections.

The main binaural cues that our perceptual system uses to localize sound sources more precisely are the Interaural Time Difference (ITD) and the Interaural Level Difference (ILD) (mechanisms based on differences in time and intensity, respectively, as described below). At the beginning of the last century, Lord Rayleigh proposed the existence of two different mechanisms, one operating at low frequencies and the other at high frequencies. This is known as the Duplex Theory of binaural hearing. Stevens and Newman found that localization performances are best for frequencies below about 1.5 kHz and above about 5 kHz (Rayleigh, 1907; Stevens and Newman, 1936). The smallest still perceivable interaural difference between our ears is about 10 μs for ITD, and about 1 dB for ILD (Mills, 1958; Brughera et al., 2013; Gelfand, 2017).

The localization of a sound source in space is characterized by a certain amount of uncertainty and bias, which result in estimation errors that can be measured as constant error (accuracy) and random error (precision). The type and magnitude of estimation errors depend on the properties of the emitted sound, the characteristics of the surroundings, the specific localization task, and the listener's abilities (Letowski and Letowski, 2011).

Bruns and colleagues investigated two methods (error-based and regression-based) for calculating accuracy and precision. The authors pointed out that accuracy and precision measures, while theoretically distinct in the two paradigms, can be strongly correlated in experimental datasets (Bruns et al., 2024). Garcia and colleagues proposed a comparative localization study, comparing performance before and after training. Their results show that both constant errors and variability in auditory localization tend to increase when auditory uncertainty increases. Moreover, such biases can be reduced through training with visual feedback (Garcia et al., 2017).
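As a minimal illustration of how these two error measures can be computed from raw responses (a sketch of an error-based approach, not the exact procedure used by Bruns and colleagues), consider the following Python fragment:

    import statistics

    def constant_error(responses_deg, target_deg):
        """Constant (systematic) error: mean signed deviation from the target.
        Smaller magnitude means better accuracy."""
        return statistics.mean([r - target_deg for r in responses_deg])

    def random_error(responses_deg):
        """Random error: spread of the responses around their own mean.
        Smaller means better precision; it is independent of the target."""
        return statistics.stdev(responses_deg)

    responses = [12.0, 15.5, 9.0, 14.0, 11.5]  # hypothetical azimuth judgments, target at 10 deg
    print(constant_error(responses, 10.0))  # +2.4 deg: a systematic overshoot
    print(random_error(responses))          # ~2.5 deg of trial-to-trial variability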
As we will see below, sound source localization is more accurate in the horizontal plane (azimuth) than in the vertical plane (elevation). Localization performances in the third dimension (distance) are less accurate than for either azimuth or elevation and are subject to considerable inter-subject variability (Letowski and Letowski, 2011). Moreover, these abilities change with the age of the listener. Dobreva and colleagues found that young subjects systematically overestimate (overshoot) horizontal position and systematically underestimate vertical position. Moreover, the magnitude of the effect varies with the sound frequency. In middle-aged subjects, these authors found a pronounced reduction in the precision of horizontal localization for narrow-band targets in the range 1,250–1,575 Hz. Finally, in elderly subjects, they found a generalized reduction in localization performance in terms of both accuracy and precision (Dobreva et al., 2011). Otte and colleagues also performed a comparative study of localization abilities, testing three different age groups ranging from 7 to 80 years. Their results are somewhat more positive, especially for the older age group: localization ability remains fully effective, even in the early phase of hearing loss. Interestingly, they also found that older adults with large ears had significantly better elevation localization abilities. This advantage does not appear in azimuth localization. Young subjects, with smaller ears, require higher frequencies (above 11 kHz) to accurately localize the elevation of sounds (Otte et al., 2013).

The quantitative evaluation of human performance is based on two types of localization estimation: Absolute localization (a sound source must be localized directly, usually with respect to a listener-centered reference system), and Discrimination (two sound sources have to be distinguished in the auditory signal, either simultaneously or sequentially).

Concerning absolute localization, in frontal position, peak accuracy is observed at 1–2 degrees for localization in the horizontal plane and 3–4 degrees for localization in the vertical plane (Makous and Middlebrooks, 1990; Grothe et al., 2010; Tabry et al., 2013). Investigating the frontal half-space, Rhodes – as well as Tabry and colleagues more recently – found that the azimuth and elevation errors grow linearly as the distance of the target from the central position increases. Using the head-pointing method, Tabry et al. found that the error grows up to ~20 degrees for an azimuth of ±90° and up to ~30 degrees for an elevation of −45° and +67° (Rhodes, 1987; Tabry et al., 2013).

Among the discrimination paradigms, the most commonly used is the Minimal Audible Angle (MAA), which is defined as the smallest angle that a listener can discriminate between two successively presented stationary sound sources. Mills developed the MAA paradigm and studied the human ability to discriminate lateralization (azimuth). He showed that the MAA threshold also depends on the frequency of the sound and found that MAA performance is better for frequencies below 1,500 Hz and above 2,000 Hz. The best performance is obtained in the frontal field, with an MAA accuracy in the frontal-central position equal to 1–2 degrees in azimuth. In a more recent study, Aggius-Vella and colleagues found slightly larger values: they reported an MAA threshold in azimuth of 3° in frontal position, and 5° in rear position. The above values refer to sources positioned at ear level. When they moved the sound source to foot level, Aggius-Vella and colleagues found an MAA threshold of 3° in both front and rear positions. In a more recent study, Aggius-Vella and colleagues placed the sound source 1 m above the floor and found an MAA threshold of 6° in the front position and 7° in the rear position. It is important to note that the works of Mills and Aggius-Vella used two different protocols: while Mills used audio headphones to play the sound, Aggius-Vella and colleagues used a set of aligned loudspeakers (Mills, 1958, 1960; Mills and Tobias, 1972; Aggius-Vella et al., 2018, 2020). Similarly, a discrimination paradigm known as MADD (Minimal Audible Distance Discrimination) is used for the distance dimension. Using a MADD-type paradigm, Aggius-Vella et al. (2022) reported better distance discrimination abilities in the front space (19 cm) than in the rear space (21 cm). They found a comparable effect of the spatial region using a distance bisection paradigm, which revealed a lower threshold (15 cm) in the front space than in the rear space (20 cm) (Aggius-Vella et al., 2022). It is also relevant to note that some authors have criticized the MAA paradigm, claiming that the experimental protocol enables responses to be produced based on criteria other than relative discrimination through the use of identification strategies (Hartmann and Rakerd, 1989a).

A second, and important, discrimination paradigm is the CMAA (Concurrent Minimum Audible Angle), which measures the ability to discriminate between two simultaneous stimuli. In the frontal position, Perrott found a CMAA threshold of 4°–10° (Perrott, 1984). Brungart and colleagues investigated the discrimination and localization capabilities of our auditory system when faced with multiple sources (up to 14 tonal sounds) with or without head movement allowed. They found that although localization accuracy systematically decreased as the number of concurrent sources increased, overall localization accuracy was nevertheless still above chance even in an environment with 14 concurrent sound sources. Interestingly, when there are more than five simultaneous sound sources, exploratory head movements cease to be effective in improving localization accuracy (Brungart et al., 2005). Zhong and Yost found that the maximum number of simultaneous separate stimuli that our perceptual system can easily discriminate is approximately 3 for tonal stimuli and 4 for speech stimuli (Zhong and Yost, 2017), which is in line with many studies that have shown that localization accuracy is significantly improved when localizing broadband sounds (Butler, 1986; Makous and Middlebrooks, 1990; Wightman and Kistler, 1992, 1997; Gelfand, 2017).

2.3 Reference system and localization performances

Each target in space is localized according to a reference system. In the case of human perception, experimental research suggests that our brain uses different reference systems, both egocentric and allocentric, and is able to switch easily between them (Graziano, 2001; Wang, 2007; Galati et al., 2010). More specifically, with regard to auditory perception, Majdak and colleagues describe the different mechanisms involved in the creation of the internal representation of space (Majdak et al., 2020). Moreover, research works such as those of Aggius-Vella and Viaud-Delmon also show that these mechanisms are closely related to other perceptual channels, and in particular the visual and sensorimotor channels, making it possible to calibrate the reference system more accurately and improve the spatial representation (Viaud-Delmon and Warusfel, 2014; Aggius-Vella et al., 2018).

With regard to the experimental protocols used in the field of auditory localization, almost all research works have adopted a reference system centered on the listener, generally positioning the origin at the midpoint of the segment joining the two ears (Middlebrooks et al., 1989; Macpherson and Middlebrooks, 2000; Letowski and Letowski, 2011). In contrast, some research has used an allocentric reference system in which the positions of the localized sound sources have to be reported with reference to a fictional head (tangible or digital) that represents that of the participant (tangible: Begault et al., 2001; Pernaux et al., 2003; Schoeffler et al., 2014) (digital: Gilkey et al., 1995).

The reference system and the pointing system used to provide the response are closely related. The use of an egocentric reference system is usually preferred because it prevents participants from making projection errors when giving responses. For example, Djelani and colleagues demonstrated that the God's Eye Localization Pointing (GELP) technique, an allocentric reference-and-response system in which the perceived direction of the sound is indicated by pointing at a 20 cm diameter spherical model of auditory space, brings about certain systematic errors as a consequence of the projection from the participant's head to its external representation (Djelani et al., 2000). Similarly, head or eye pointing is preferred since it avoids the parallax errors that frequently occur with pointing devices. In addition, many authors prefer to use head or gaze orientation as a pointing system, because it is considered more ecological and does not require training or habituation (Makous and Middlebrooks, 1990; Populin, 2008).

The most commonly used reference system in studies on spatial hearing is the bipolar spherical coordinate system (Figure 1). This coordinate system consists of two angular dimensions, θ (azimuth or declination) and φ (elevation), and one linear dimension, d (distance or depth) (Middlebrooks, 1999b; McIntyre et al., 2000; Jerath et al., 2015). In some cases, a cylindrical system (in which the angular elevation is replaced by a linear elevation parameter) (Febretti et al., 2013; Sherlock et al., 2021), or a Cartesian system (Parise et al., 2012), is preferred. An alternative reference system is the Interaural-polar coordinate system, which has been described by Majdak as corresponding more closely to the human perceptual system and consists of a lateral angle α, a polar angle β, and a linear distance r (Majdak et al., 2020).

FIGURE 1
Reference system. The most commonly used reference system for locating a sound source in three-dimensional space is the polar coordinate system. The reference system is centered on the listener and divides space according to two angular coordinates (azimuth in the horizontal plane, elevation in the vertical plane) and one linear coordinate (distance or depth), as shown in the figure.
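For reference, the following Python sketch converts a listener-centered Cartesian position to the spherical azimuth/elevation/distance triplet described above; the sign conventions (azimuth positive clockwise from the frontal meridian, elevation positive upward) are one common choice among those discussed in the next subsections.

    import math

    def cartesian_to_listener_spherical(x, y, z):
        """Listener-centered Cartesian (x ahead, y to the left, z up, meters)
        to (azimuth, elevation, distance). Azimuth is measured clockwise from
        the frontal meridian; elevation is positive above ear level."""
        distance = math.sqrt(x * x + y * y + z * z)
        azimuth = math.degrees(math.atan2(-y, x))
        elevation = math.degrees(math.asin(z / distance))
        return azimuth, elevation, distance

    # A source 2 m away, 45 degrees to the right, at ear level:
    az, el, d = cartesian_to_listener_spherical(
        2 * math.cos(math.radians(45)), -2 * math.sin(math.radians(45)), 0.0)
    print(round(az), round(el), round(d, 1))  # 45 0 2.0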
2.3.1 Azimuth

In spherical coordinate systems, the Azimuth is defined as the angle between the projection of the target position on the horizontal plane and a reference meridian, measured from above either clockwise (Rychtáriková et al., 2011; Parseihian et al., 2014) or, less commonly, counter-clockwise (Bronkhorst, 1995; Werner et al., 2016). The standard "zero" reference meridian is the frontal meridian (Vliegen and Van Opstal, 2004; Risoud et al., 2020). Starting from the reference meridian, the horizontal plane is then indexed on a continuous scale of 360 degrees (Iwaya et al., 2003; Oberem et al., 2020) or divided into two half-spaces of 180 degrees, i.e., left and right, with the left half-space having negative values (Makous and Middlebrooks, 1990; Boyer et al., 2013; Aggius-Vella et al., 2020). Conveniently, the horizontal plane can also simply be divided into front and rear (or back) half-spaces.

The most important cues for auditory localization in the azimuthal plane are the ILD and ITD. However, the effectiveness of ILD and ITD is subject to some limitations relating to the frequency of the sound, and other monaural or binaural strategies are required to resolve ambiguous conditions (see section ILD and ITD) (Van Wanrooij and Van Opstal, 2004).

The best localization performance in the azimuthal plane is found at about 1–2 degrees, namely in the frontal area approximately at the intersection with the sagittal plane (Makous and Middlebrooks, 1990; Perrott and Saberi, 1990).

2.3.2 Elevation

In the spherical coordinate system, the Elevation (or polar angle) is the angle between the projection of the target position on a vertical frontal plane and a zero-elevation reference vector. This is commonly represented by the intersection of the vertical frontal plane with the Azimuth plane, with positive values being assigned to the upper half-space and negative values to the lower half-space, thus obtaining a continuous scale [−90°, +90°] (Figure 1; Middlebrooks, 1999b; Trapeau and Schönwiesner, 2018; Rajendran and Gamper, 2019). Occasionally, the zero-elevation reference is assigned to the Zenith and the maximum value is assigned to the Nadir, resulting in a measurement scale consisting only of positive values [0°, +180°] (Oberem et al., 2020).

Elevation estimation relies primarily on monaural spectral cues, mainly resulting from the interaction of the sound with the auricle. These interactions cause modulations of the sound spectrum reaching the eardrum and are grouped together under the term Head-Related Transfer Functions (HRTF); see the HRTF section (Ahveninen et al., 2014; Rajendran and Gamper, 2019). Otte et al. (2013) graphically show the variation of the sound spectrum as a function of both elevation and the various individual anatomies of the outer ear. Auditory localization in the vertical plane has lower spatial resolution than that in the horizontal plane. The best localization performance in terms of elevation is of the order of 4–5 degrees (Makous and Middlebrooks, 1990).

2.3.3 Distance

In acoustic localization, distance is simply defined as the length of the line connecting the midpoint of the segment joining the two ears and the sound source. The human auditory system can use multiple acoustic cues to estimate the distance of a sound source. The two main strategies for estimating the distance from a sound source are both based on the acoustic intensity of the sound reaching the listener. The first is based on an evaluation of the absolute intensity of the direct wave. The second, called the Direct-to-Reverberant energy Ratio ("DRR"), is based on a comparison between the direct wave and the reverberated sound waves (Bronkhorst and Houtgast, 1999; Zahorik, 2002; Guo et al., 2019). In addition, other cues, such as familiarity with the source or the sound, the relative motion between listener and source, and spectral modifications, provide important indications for distance estimation (Little et al., 1992). Prior knowledge of the sound and its spectral content plays a role in the ability to correctly estimate the distance (Neuhoff, 2004; Demirkaplan and Hacıhabiboğlu, 2020). Some studies have suggested that listeners may also use binaural cues to determine the distance of sound, especially if the sound source is close to the side of the listener's head. These strategies are thought to use the ITD to localize the azimuth and the ILD to estimate the distance. Given the limitations of the ILD, these strategies would only be effective for distances less than 1 meter (Bronkhorst and Houtgast, 1999; Kopčo and Shinn-Cunningham, 2011; Ronsse and Wang, 2012). Generally speaking, the accuracy of distance estimation varies with the magnitude of the distance itself. Distance judgments are generally most accurate for sound sources approximately 1 m from the listener. Closer distances tend to be overestimated, while greater distances are generally underestimated (Fontana and Rocchesso, 2008; Kearney et al., 2012; Parseihian et al., 2014). For distant sources, the magnitude of the error increases with the distance (Brungart et al., 1999).

3 Auditory cues for sound source localization

Sound localization is based on monaural and binaural cues. Monaural cues are processed individually in one ear, mostly providing information that is useful for vertical and antero-posterior localization. Binaural cues, by contrast, result from the comparison of sounds reaching the two ears, and essentially provide information about the azimuth position of the sound source. The sections below explore these localization mechanisms.

3.1 ITD and IPD

Let us consider a sound coming, for instance, from the right side of the head: it reaches the right ear before the left ear. The difference in reception times between the two ears is called the Interaural Time Difference (ITD). It constitutes the dominant cue in estimating the azimuth of sound sources at frequencies below 1,500 Hz and loses its effectiveness at higher frequencies. ITD is actually related to two distinct processes for measuring the asynchrony between the acoustic signals received by the left and the right ears. The first process measures the temporal asynchrony of the onset between the two sounds reaching the left and right ear, or between distinctive features that serve as a reference, such as variations. The second process measures the phase difference between the two sound waves reaching each ear, which represents an indirect measure of the temporal asynchrony. We refer to this second mechanism as the Interaural Phase Difference (IPD). Panel B of Figure 2 represents the two processes in graphic form.

FIGURE 2
ILD, ITD and IPD. The fundamental binaural cues for auditory localization are based on the difference in perception between the two ears in terms of both intensity and time. A source located in front of the listener produces a sound wave that arrives at both ears identically (the direct wave arrives at the same time and with the same intensity). By contrast, a lateral source results in a difference in signal intensity between the right and left ears, respectively iR and iL (Δi, Panel A), and in arrival time (Δt, Panel B). (A) In the case of a lateral source, the sound stimulus arriving at the more distant ear is less intense, due to its greater distance from the source and the shadow effect produced by the head itself. Interaural Level Difference ("ILD") is the perceptual mechanism that estimates the position of the source as a function of the intensity difference between the two ears. (B) The ear more distant from the source receives the sound with a time delay. The Interaural Time Difference ("ITD") is the perceptual mechanism for localizing the sound source based on the time delay between the two ears. Fine variations in azimuth localization are also measured as Interaural Phase Differences ("IPD"), based on the phase differences between the waves reaching each ear.

The smallest detectable interaural time difference (i.e., the maximum ITD sensitivity) is in the order of 10 μs, both for noise or complex stimuli (9 μs) (Klumpp and Eady, 1956; Mills, 1958) and for pure tones (11 μs) (Klumpp and Eady, 1956; Brughera et al., 2013). More recently, Thavam and Dietz found a larger value with untrained listeners (18.1 μs), and a smaller value with trained listeners (6.9 μs), using a band-pass noise of 20–1,400 Hz at 70 dB (Thavam and Dietz, 2019). By contrast, the largest ITD is of the order of 660–790 μs and corresponds to the case of a sound generated in front of one ear (Middlebrooks, 1999a; Gelfand, 2017). For instance, considering the spherical model of a human head with radius Rh = 8.75 cm combined with a sound speed Vs = 34,300 cm/s (at 20°C), we obtain a maximum ITD value of (3·Rh/Vs)·sin(90°) = 765.3 μs (Hartmann and Macaulay, 2014).
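The spherical-head approximation above is straightforward to evaluate. The sketch below implements the same formula in Python, with the parameter values quoted in the text; real heads deviate from the spherical model, so this is only an estimate.

    import math

    HEAD_RADIUS_M = 0.0875   # Rh = 8.75 cm, spherical head model
    SPEED_OF_SOUND = 343.0   # Vs in m/s at 20 degrees C (34,300 cm/s)

    def itd_spherical_head(azimuth_deg):
        """ITD predicted by the spherical-head formula quoted in the text,
        ITD = (3*Rh/Vs)*sin(azimuth), returned in microseconds."""
        return 1e6 * (3.0 * HEAD_RADIUS_M / SPEED_OF_SOUND) * math.sin(math.radians(azimuth_deg))

    print(round(itd_spherical_head(90.0), 1))  # 765.3 us: source facing one ear
    print(round(itd_spherical_head(10.0), 1))  # 132.9 us: source slightly off-center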
Tests reveal that the best azimuth localization performances using only ITD/IPD are obtained with a 1,000 Hz sound, allowing an accuracy of 3–4 degrees (Carlile et al., 1997). Beyond this frequency, ITD/IPD rapidly lose effectiveness due to the relationship between the wavelength of the sound and the physical distance between the listener's ears. Early research identified the upper threshold value at which ITD loses its effectiveness at between 1,300 Hz and 1,500 Hz (Klumpp and Eady, 1956; Zwislocki and Feldman, 1956; Mills, 1958; Nordmark, 1976). More recent research has found residual efficacy for some participants at 1,400 Hz and a generalized complete loss of efficacy at 1,450 Hz (Brughera et al., 2013; Risoud et al., 2018).

Due to the cyclic nature of sound signals, an IPD value for a given frequency can be encountered at multiple azimuth positions. In such cases, the information from the IPD becomes ambiguous and can easily lead to an incorrect azimuth estimation, especially with pure tones (Rayleigh, 1907; Bernstein and Trahiotis, 1985; Hartmann et al., 2013). Various azimuthal positions may appear indistinguishable by IPD because the phase difference is equal to a multiple of the wavelength (Elpern and Naunton, 1964; Sayers, 1964; Yost, 1981; Hartmann and Rakerd, 1989a). The number and angular values of these ambiguous directions depend on the wavelength of the sound: the higher the frequency of the sound, the greater the number of ambiguous positions generated. Consequently, ITD/IPD operates more effectively at low frequencies.
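A toy numerical check, using the same spherical-head ITD model as above, makes the phase-wrapping ambiguity concrete: two ITDs are indistinguishable by IPD whenever they differ by an integer number of periods of the tone.

    # For a pure tone of frequency f, the IPD equals 2*pi*f*ITD modulo 2*pi, so two
    # ITDs are indistinguishable whenever they differ by a whole period 1/f. With a
    # maximum ITD of ~765 us, wrapping becomes possible above roughly
    # 1 / (2 * 765e-6) ~ 650 Hz, and worsens as the frequency rises.

    def same_ipd(itd_a_s, itd_b_s, freq_hz, tol_s=1e-9):
        """True if two ITDs (seconds) yield the same interaural phase at freq_hz."""
        period = 1.0 / freq_hz
        r = (itd_a_s - itd_b_s) % period
        return r < tol_s or period - r < tol_s

    print(same_ipd(100e-6, 600e-6, 2000.0))  # True: ITDs differ by one 500 us period
    print(same_ipd(100e-6, 300e-6, 2000.0))  # False: 200 us is not a whole period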
3.2 ILD

When a sound source is positioned to the side of the head, one of the ears is more exposed to it. The presence of the head produces a shadowing effect on the sound in the direction of propagation (sometimes referred to as HSE – head-shadow effect). As a result, the sound intensity (or "level") at the ear shadowed by the head is lower than at the opposite ear (see Panel A of Figure 2). The amount of shadowing depends on the angle, frequency and distance of the sound as well as on individual anatomical features. Computing the difference in intensity between the two ears provides the auditory cue named Interaural Level Difference (ILD). ILD is zero for sounds originating in the listener's sagittal plane, while for lateral sound sources it increases approximately proportionally to the sine of the azimuth angle (Mills, 1960). From a physical point of view, the head acts as an obstacle to sound propagation for wavelengths shorter than the head size. For longer wavelengths (i.e., lower frequencies), however, the sound wave passes relatively easily around the head and the difference in intensity of the sound waves reaching the two ears becomes imperceptible. Consequently, sound frequencies higher than 4,000 Hz are highly attenuated by the head shadow and the ILD is a robust cue for azimuth estimation, whereas for frequencies lower than 1,000 Hz, the ILD becomes completely ineffective (Shaw, 1974).

In a reverberant environment, as the distance from the sound source increases, the sound waves reflect off multiple surfaces, resulting in a more complex received binaural signal. This leads to fluctuations in the Interaural Level Differences (ILDs), which have been shown to affect the externalization of sound (the perception that the sound is located at a distance from the listener's head) (Catic et al., 2013).

3.3 Limits of ITD and ILD

ITD and ILD appear to be two complementary mechanisms, the former being optimized for low frequencies and the latter for high frequencies. Therefore, our acoustic system exhibits the poorest performance in terms of acoustic localization in the range between 1,500 Hz and 4,000 Hz (approximately) (Yost, 2016; Risoud et al., 2018). However, given a spherical head shape, even a perfect determination of the ILD or the ITD would not be sufficient to permit complete and unambiguous pure tone localization. The ITD depends on the difference between the distances from the sound source to each of the two ears, and the ILD depends on the angle of incidence of the sound wave relative to the axis of the ears. Thus, every point situated at the same distance and the same angle of incidence would theoretically result in the same ITD and ILD. Mathematically, the solution to both systems is not a single point, but a set of points located on a hyperbolic surface whose axis coincides with the axis of the ears. This set of points, for which the difference in distance to the two ears is constant, is called the "cone of confusion" (Figure 3). More information is required in order to obtain an unambiguous localization of the sound source. Additional factors such as reverberation, head movement, and a wider sound bandwidth greatly reduce the uncertainty of localization. In ecological conditions with complex sounds, this type of uncertainty is mainly resolved by analyzing the frequency modulation produced by the reverberation of the sound wave at the outer ear, head and shoulders: the Head-Related Transfer Function.

FIGURE 3
Cone of confusion. A sound emitted from any point on the dotted line will give rise to the same ITD because the difference between the distances to the ears is constant. This set of points forms the "cone of confusion."

3.4 Head-related transfer function (HRTF)

Our perceptual system has evolved with a special ability to decode the complex structure of the sounds reaching our ears, thus enabling us to estimate the spatial origin of sounds. Under ecological conditions, each eardrum receives not only the direct sound wave of each sound that reaches the listener's ear but also a complex series of sound waves reflected from the shoulders, head, and auricle (Figure 4). This complex set of new waves, which depends on the orientation of the head and the torso relative to the sound source, greatly enriches the spatial information contained in and carried by the sound. These reflected waves are used by the auditory system to extract spatial information and to infer the origin of the sound. This acoustic filtering can be characterized by transfer functions called the Head-Related Transfer Functions (HRTFs). HRTFs are considered monaural cues because the spectral distortions they produce depend solely on the position of the sound source relative to the orientation of the body, the head, and the ear. No comparison between the signals received by both ears is required.

FIGURE 4
HRTF. (A) Arrival of a sound wave at the outer ear and the generation of a series of secondary waves due to reflection in the auricle. (B) Each sound wave that reaches the ear thus generates a different set of reflected waves, depending on its original orientation. Using this relationship, our auditory system is able to reconstruct the origin of the sound by analyzing the set of waves that reach the eardrum.

Several studies have reported better HRTF localization performance for sound sources positioned laterally than for sources positioned frontally and rearwardly. For example, Mendonça, and later Oberem, found an improvement in lateral localization ranging from a few degrees to ten degrees, depending on the test conditions (Wightman and Kistler, 1989; Mendonça et al., 2012; Oberem et al., 2020). However, a marked interindividual variability in localization performances, as well as in the ability and time required to adapt to non-individualized HRTFs, has also been observed (Mendonça et al., 2012; Stitt et al., 2019). Begault and colleagues conducted a study on the localization of speech stimuli in which they compared individualized and non-individualized HRTFs (obtained from a dummy head). One of the aims of the research was to assess whether the relationship between listener and dummy head size was a predictor of localization errors. Contrary to initial expectations, the results showed no correlation between localization error and head size difference (Begault et al., 2001). Another interesting and rather unexpected result reported by both Begault et al. and Møller et al. was that individualized HRTFs do not bring about an advantage in speech localization accuracy compared to non-individualized HRTFs. To explain this finding, Begault and colleagues suggest that most of the spectral energy of speech is in a frequency range in which ITD cues are more prominent than HRTF spectral cues (Møller et al., 1996; Begault et al., 2001).

The way in which the sound is modified by the reflections in the outer ear and the upper body can be recorded experimentally and reproduced by transfer functions. The corresponding information can be used in practice to play sounds through headphones and create the perception that each sound is coming from a distant desired origin, thus creating a three-dimensional virtual auditory environment (Wightman and Kistler, 1989; Møller, 1992). Nowadays, HRTFs are the most frequent way of creating acoustic spatialization systems, strongly driven by the demand for higher-performance entertainment systems, games, and especially augmented/virtual reality systems (Begault, 2000; Poirier-Quinot and Katz, 2018; Geronazzo et al., 2019; Andersen et al., 2021).
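In practice, this binaural synthesis reduces to convolving a mono signal with the pair of head-related impulse responses (HRIRs, the time-domain counterpart of HRTFs) measured for the desired direction. The sketch below illustrates the principle; the HRIR arrays and the hrirs lookup are hypothetical placeholders for data taken from a measured HRTF set.

    import numpy as np
    from scipy.signal import fftconvolve

    def render_binaural(mono, hrir_left, hrir_right):
        """Spatialize a mono signal by convolving it with the left/right HRIRs
        measured for the desired direction; returns an (N, 2) array for headphones.
        Both HRIRs are assumed to share one length and the signal's sampling rate."""
        left = fftconvolve(mono, hrir_left)
        right = fftconvolve(mono, hrir_right)
        return np.stack([left, right], axis=1)

    # Hypothetical usage, with HRIRs taken from a measured database:
    # stereo = render_binaural(signal, hrirs["az45_el0"]["left"], hrirs["az45_el0"]["right"])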

Because everyone's anatomy is different and ear shapes are very individual, HRTF techniques can be divided into two main categories depending on whether they use individualized or non-individualized transforms. Although special environments and extensive calibrations are needed in order to obtain individualized transforms, they do, however, permit more accurate auditory spatial perception (Pralong and Carlile, 1996; Meshram et al., 2014; Gan et al., 2017). Individualized HRTFs also require interpolation techniques, as HRTFs are typically measured at discrete locations in space (Freeland et al., 2002; Grijalva et al., 2017; Acosta et al., 2020). Conversely, non-individualized HRTFs are generic HRTFs, obtained on the basis of averaged or shared parameters, which are then universally applied. They are easier to obtain, but are known to cause spatial discrepancies such as poor externalization, elevation errors, misperception, and front-back confusion (Wenzel et al., 1993; Begault, 2000; Berger et al., 2018). Various methods have been developed to generate individualized HRTFs based on anthropometric data: by analytically solving the interaction of the sound wave with the auricle (Zotkin et al., 2003; Zhang et al., 2011; Spagnol, 2020), using photogrammetry (Mäkivirta et al., 2020), or based on deep-learning neural networks (Chun et al., 2017; Lee and Kim, 2018; Miccini and Spagnol, 2020). At the same time, several studies have also investigated the possibility of using a training phase to improve the effectiveness of non-individualized HRTFs. Stitt and colleagues found a positive effect of training (Stitt et al., 2019). Mendonça and colleagues investigated whether feedback is necessary in the training phase. Their results clearly indicate that simple exposure to the HRTF sounds without feedback does not produce a significant improvement in acoustic localization (Mendonça et al., 2012).

3.5 Reverberation

Reverberation enriches the sound along its path with additional information concerning the environment, the sound itself, and its source. Under anechoic conditions, the listener estimates the direction and distance of the sound source based on its intensity and the spectral content of the sound. When reverberation is present, however, it provides additional cues for direction and distance estimation, thereby potentially improving localization accuracy. In fact, due to reverberation, successive waves resulting from the reflection of the sound on the surfaces and objects in the environment are added to the direct train of sound waves, acquiring and conveying information about the size and the nature of these surfaces as well as their positions relative to the sound source (Gardner, 1995).

A listener who can move their head is better able to utilize the beneficial effects of reverberation. However, under certain conditions, such as in environments with high levels of reverberation or in the Franssen effect, reverberation can negatively impact localization accuracy (Hartmann and Rakerd, 1989b; Giguere and Abel, 1993).

3.5.1 Reverberation and estimation of azimuth and elevation

In the presence of reverberation, the ITD and ILD must process both the direct wave and the trains of reflected waves, which may come from directions very different from the original direction of the sound. Although reverberation adds a great deal of complexity to auditory percepts, our nervous system has developed the ability to decode the different overlapping pieces of information. A very effective solution for localization in this context is based on the Precedence Effect. As mentioned, when a sound is emitted from a given source, our auditory system first receives the direct sound wave and then, at very short time intervals, sound waves reflected from various surfaces in the surrounding environment. The Precedence Effect is a mechanism by which our brain is able to ignore the successive reflections and correctly localize the source of a sound based on the arrival of the direct sound wave. This mechanism is crucial in supporting localization in echogenic environments (Blauert, 1996; Hartmann, 1999; Nilsson and Schenkman, 2016).

The literature reports conflicting results concerning the effect of reverberation on localization accuracy in terms of the estimation of azimuth and elevation. In a perceptual study in a reverberant room, Hartmann reported a degradation of azimuth localization due to the presence of reverberation (Hartmann, 1983).

Begault and colleagues, on the other hand, found a significant improvement in azimuth localization (of about 5°) in the presence of reverberation, although for some participants the improvement in accuracy was achieved only when head motion was allowed. However, they also found an increase in the average elevation error from 17.6° without reverberation to 28.7° with reverberation (Begault et al., 2001). Conversely, Guski compared the no-reverberation condition with a reverberation condition in which the sound reverberated from a single surface in different orientations. His results showed an overall increase in correct localizations with a sound-reflecting surface on the floor, especially in terms of elevation (Guski, 1990).

3.5.2 Reverberation and distance estimation

Reverberation has proven to be a useful aid when estimating the distance from a sound source. The reverberant wave train is reflected off surfaces, walls and objects, and this causes its energy to remain nearly constant over distance – especially indoors. Under ideal conditions, direct propagation in air causes the direct sound wave to lose 6 dB of intensity for every doubling of distance. In a study conducted in a small auditorium, Zahorik demonstrated that the intensity of reflected waves, while smaller than that of the direct wave, decreases by only 1 dB for each doubling of distance (Zahorik, 2002). As a result, the ratio between the direct-wave energy and the reflected-wave energy (called the Direct-to-Reverberant Energy Ratio, or DRR) decreases as the distance from the source increases, and has been shown to be a useful perceptual cue for distance estimation (von Békésy, 1938; Mershon and King, 1975; Bronkhorst and Houtgast, 1999).
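Combining the two attenuation rates reported above gives a simple model of how the DRR decays with distance. The reference levels at 1 m in the Python sketch below are arbitrary illustrative assumptions; only the 6 dB and 1 dB per-doubling slopes come from the cited studies.

    import math

    def drr_db(distance_m, direct_at_1m_db=70.0, reverb_at_1m_db=60.0):
        """Direct-to-Reverberant Ratio versus distance, using the per-doubling
        slopes from the text (-6 dB direct, -1 dB reverberant); the two levels
        at 1 m are arbitrary illustrative assumptions."""
        doublings = math.log2(distance_m)
        direct = direct_at_1m_db - 6.0 * doublings
        reverberant = reverb_at_1m_db - 1.0 * doublings
        return direct - reverberant

    for d in (1.0, 2.0, 4.0, 8.0):
        print(d, drr_db(d))  # DRR falls by 5 dB per doubling: 10.0, 5.0, 0.0, -5.0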
3.5.3 Reverberation and front-back confusion

Front-back (and back-front) confusion refers to the misperception of a sound position, with the sound being perceived in the wrong hemifield (front or back). This perceptual confusion is particularly common when synthetic sounds are played or audio headphones are used (in particular, when non-individualized HRTFs are used) (Begault et al., 2001; Rychtáriková et al., 2011). It is particularly critical when bone-conduction audio headphones are used since these, by exploiting an alternative communication channel to the inner ear, completely bypass the outer ear and its contribution to spatial perception (Wang et al., 2022). One way to reduce front-back confusion could be to introduce reverberation in synthesized signals. However, the experimental results are ambiguous. Some studies, such as Begault et al. (2001), find that reverberation does not significantly reduce front-back confusion, while other studies have found that the presence of acoustic reverberation waves significantly improves antero-posterior localization and reduces front-back confusion (Reed and Maher, 2009; Rychtáriková et al., 2011).

3.5.4 Reverberation and sound externalization

The presence of reverberation significantly improves the perceived externalization of sound. Externalization refers to the perception of sound as external to and distant from the listener. Poor externalization causes the listener to perceive sound as being diffused "inside his/her head" and is a typical problem when sound is played through headphones (Blauert, 1997). The three factors known to contribute the most to effective externalization are the use of individualized HRTFs, the relative motion between source and listener, and sound reverberation. When creating artificial sound environments, the addition of reverberation – thus reproducing the diffusion conditions found in the real environment – significantly increases the externalization of the sound, giving the listener a more realistic experience (Zotkin et al., 2002, 2004; Reed and Maher, 2009). Reverberation positively influences the externalization of sounds such as noise and speech (Begault et al., 2001; Catic et al., 2015; Best et al., 2020), including in the case of the hearing aids used by hearing-impaired people (Kates and Arehart, 2018). In some cases, the "early" reflections are sufficient to produce a significant effect (Begault, 1992; Durlach et al., 1992).

3.6 Action – perception coupling

Auditory perception in everyday life is strongly related to movement and active information-seeking. The gesture of "lending an ear" is probably the simplest example of action in the service of auditory perception. Experimental research has shown that our auditory system localizes sounds more accurately in two areas: in front of the listener (i.e., 0° azimuth, 0° elevation) and laterally to the listener, i.e., in front of each ear (i.e., ±90° azimuth, 0° elevation). The first position permits the most accurate ITD- and ILD-based localization (Makous and Middlebrooks, 1990; Brungart et al., 1999; Tabry et al., 2013), while the second guarantees the highest accuracy that can be obtained on the basis of the HRTF and maximizes the intensity of the sound reaching the eardrum (Mendonça et al., 2012; Oberem et al., 2020).

Unlike some animal species, the human auricle does not have the ability to move independently. As a result, listeners are obliged to move their heads in order to orient their ears. These movements allow them to align the sound in a way that creates the most favorable angle for perception. It should be noted that head movements are strongly related to the orientation of the different senses mobilized, and the resulting movement strategy can be remarkably complex. In addition, head movements are a crucial component in resolving ambiguous or confusing localization conditions (Thurlow et al., 1967; Wightman and Kistler, 1997; Begault, 1999).

The natural way for humans to hear the world is through active whole-body processes (Engel et al., 2013). Movement brings several improvements to auditory localization. Compared to static perception, a perceptual strategy that includes movement results in a richer and more varied percept. Although some early works reported equal or poorer sound localization during head movement (Wallach, 1940; Pollack and Rose, 1967; Simpson and Stanton, 1973), subsequent research has shown several benefits and has revealed the perceptual improvements permitted by perception during movement (Noble, 1981; Perrett and Noble, 1997a,b). Goossens and Van Opstal suggested that head movements could provide richer spatial information that allows listeners to update the internal representation of the sound and the environment (Goossens and Van Opstal, 1999). Some authors have also suggested that a perceptual advantage occurs only when the sound lasts long enough (Makous and Middlebrooks, 1990), with a minimum duration of the order of 2 s appearing to be necessary to allow subjects to achieve the conditions required for maximum performance (Thurlow and Mergener, 1970). Iwaya and colleagues also found that front-back confusion can only be effectively resolved with longer-lasting sounds (Iwaya et al., 2003). Some studies on acoustic localization have taken advantage of this condition for their experimental protocols: for example, by using very short stimuli (typically ≤150 ms) to ensure that the sound ends before the subject can initiate a head movement, thus making it unnecessary to restrain the participant's head (Carlile et al., 1997; Macpherson and Middlebrooks, 2000; Tabry et al., 2013; Oberem et al., 2020). Conversely, when the sound continues throughout the entire movement, the listener can implement a movement strategy within a closed-loop control paradigm (Otte et al., 2013).
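A typical stimulus for such protocols can be generated in a few lines. The sketch below produces a 150 ms broadband noise burst with 10 ms raised-cosine onset/offset ramps; the ramp duration is our assumption, chosen only to avoid clicks from abrupt edges.

    import numpy as np

    def noise_burst(duration_s=0.150, ramp_s=0.010, fs=44100, seed=0):
        """Broadband noise burst, short enough to end before a head movement
        can begin; raised-cosine onset/offset ramps prevent audible clicks."""
        rng = np.random.default_rng(seed)
        n = int(duration_s * fs)
        burst = rng.standard_normal(n)
        n_ramp = int(ramp_s * fs)
        ramp = 0.5 * (1.0 - np.cos(np.linspace(0.0, np.pi, n_ramp)))  # rises 0 -> 1
        burst[:n_ramp] *= ramp
        burst[-n_ramp:] *= ramp[::-1]
        return burst / np.abs(burst).max()  # normalized to +/-1 for playback

    stimulus = noise_burst()  # 150 ms = 6,615 samples at 44.1 kHz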
Although both conditions of relative motion between source and listener bring about a perceptual advantage, there is a relative advantage in spatial processing when it is the listener who is moving (Brimijoin and Akeroyd, 2014). The presence of motion helps resolve or reduce ambiguities, such as front-back confusion (Wightman and Kistler, 1999; Begault et al., 2001; Iwaya et al., 2003; Brimijoin et al., 2010), and this is true even for listeners with cochlear implants (mainly through head movement) (Pastore et al., 2018). Relative motion improves the perception of distance (Loomis et al., 1990; Genzel et al., 2018), the perception of elevation (Perrett and Noble, 1997b), the effectiveness of HRTF systems (Loomis et al., 1990), and the assessment of one's own movement (Speigle and Loomis, 1993).

4 Elements influencing auditory localization

4.1 Sound frequency spectrum

Acoustic localization performance is highly dependent on the frequency of the sound. Our perceptual system achieves the best localization accuracy for frequencies below 1,000 Hz and good localization accuracy for frequencies above 3,000 Hz. Localization accuracy decreases significantly in the range between 1,000 Hz and 3,000 Hz. These results are a consequence of the functional characteristics of our localization processes (see ITD and ILD). Experimental research such as that of Yost and Zhong, who tested frequencies of 250 Hz, 2,000 Hz, and 4,000 Hz, has confirmed the different localization abilities for the three frequency ranges (Yost and Zhong, 2014). ITD works best for frequencies below 1,500 Hz, while ILD is most effective for frequencies above 4,000 Hz.

Concerning the HRTF, Hebrank and Wright showed that sound information within the 4,000–16,000 Hz spectrum is necessary for good vertical localization. Langendijk and Bronkhorst consistently showed that the most important cues for vertical localization are in the 6,000–11,000 Hz frequency range. More precisely, Blauert found that the presence of frequency components from about 8,000–10,000 Hz is critical for accurate estimation of elevation. Langendijk and Bronkhorst showed that antero-posterior localization cues occur in the 8,000–16,000 Hz range (Blauert, 1969; Hebrank and Wright, 1974; Langendijk and Bronkhorst, 2002).

The bandwidth of a sound plays an important role in acoustic localization: the broader the bandwidth, the better the localization performance (Coleman, 1968; Yost and Zhong, 2014), under both open-field and reverberant-room conditions (Hartmann, 1983). Furthermore, the spectral content of the sound is an important cue for estimating the distance of the sound source. This type of cue works under two different conditions. Over long distances, high frequencies are more attenuated than low frequencies due to propagation through the air. As a result, sounds with reduced high-frequency content are perceived as being farther away (Coleman, 1968; Butler et al., 1980; Little et al., 1992). However, in order to obtain a noticeable effect, the distance between the source and the listener must be greater than 15 m (Blauert, 1997). For sound sources close to the listener's head (about 1 m), by contrast, the spectral content is modified due to the diffraction of the sound around the listener's head. For this reason, for sources in the proximal space (<1.7 m), sounds at lower frequencies (<3,000 Hz) actually result in more accurate distance estimation than sounds at higher frequencies (>5,000 Hz) (Brungart and Rabinowitz, 1999; Kopčo and Shinn-Cunningham, 2011).

Finally, sound frequency appears to play a role in front-back confusion errors. Both Stevens and Newman, and Withington, found that the number of confusion errors was much higher for sound sources below 2,500 Hz. Letowski and Letowski reported more frequent errors for sound sources located near the sagittal plane for

Petersen confirmed that the relationship between intensity reduction and distance can be assumed to be linear (Petersen, 1990). Some perceptual factors influence the accuracy with which we can estimate the distance to a sound source. The first, of course, is related to our ability to discriminate small changes in intensity. Research has shown that the smallest detectable change in intensity level for humans is about 0.4 dB for broadband noise, while this threshold increases to 1–2 dB for tonal sounds (this value varies with the frequency and sound level) (Riesz, 1932; Miller, 1947; Jesteadt et al., 1977).
Little et al., 1992). However, in order to obtain a noticeable effect, the sound intensity levels and compared artificial HRTF and free-field
distance between the source and the listener must be greater than 15 m conditions. They found that in free-field listening, localization ability
(Blauert, 1997). For sound sources close to the listener’s head (about increases and then deteriorates monotonically up to 100 dB, whereas
1 m), by contrast, the spectral content is modified due to the diffraction in the HRTF condition, performance still improves at 100 dB
of the sound around the listener’s head. For this reason, for sources in (Macpherson and Middlebrooks, 2000; Brungart and Simpson, 2008;
the proximal space (<1.7 m), sounds at lower frequencies (<3,000 Hz) Marmel et al., 2018).
actually result in more accurate distance estimation than sounds at The intensity value provides information relating to both the
higher frequencies (>5,000 Hz) (Brungart and Rabinowitz, 1999; power and distance of the source. In the absence of information
Kopčo and Shinn-Cunningham, 2011). provided by other sensory channels, such as vision, this condition can
Finally, sound frequency appears to play a role in front-back lead to a state of indecision in the measurement of the two parameters.
confusion errors. Both Stevens and Newman, and Withington, found Researchers are still examining the way the auditory system handles
that the number of confusion errors was much higher for sound the two pieces of information. The evidence produced by Zahorik and
sources below 2,500 Hz. Letowski and Letowski reported more Wightman supports the hypothesis that the two processes are separate.
frequent errors for sound sources located near the sagittal plane for These authors reported good power estimation even when distance
narrow-band sounds and for a spectral band below 8,000 Hz. The estimation was less accurate (Zahorik and Wightman, 2001). The most
number of confusion errors decreases rapidly as the energy of the commonly accepted way of resolving the confusion between power
high-frequency component increases (Stevens and Newman, 1936; and distance is based on the Direct-to-Reverberant energy Ratio,
Withington, 1999; Letowski and Letowski, 2012). which consists in a comparison between the direct wave and the
reverberant wave train (see “Reverberation”) (Zahorik and
Wightman, 2001).

4.2 Sound intensity


4.3 Pointing methods
Sound intensity plays an important role in several aspects of
auditory localization, and especially in determining the distance Research over the past 30 years has shown that pointing methods
between the listener and the sound source. It does so by underpinning can affect precision and accuracy in localization tasks. Pointing
two important mechanisms: the estimation of the intensity of the paradigms can be classified as egocentric or allocentric, with
direct wave, and the comparison between the intensities of the direct egocentric methods generally being reported to be more accurate.
wave and the reverberated waves. When defining a protocol for a localization task, several pointing/
At the theoretical level, the intensity of a spherical wave falls by localizing methods are possible: the orientation of a body part, such
6 dB with each doubling of distance (Warren, 1958; Warren et al., as pointing with a hand or a finger (Pernaux et al., 2003; Finocchietti
1958). In the real word, however, both environmental factors and et al., 2015), the orientation of the chest (Haber et al., 1993), the nose
sound source features can alter this simple mathematical relationship (Middlebrooks, 1999b), or the head (Makous and Middlebrooks,
(Zahorik et al., 2005). Experimental tests have shown that the 1990; Carlile et al., 1997; Begault et al., 2001); use of a hand-tool
reduction in intensity during propagation in air is greater than the (Langendijk et al., 2001; Cappagli and Gori, 2016), or a computer
theoretical value and that this reduction amounts to about 10 dB for interface (Pernaux et al., 2003; Schoeffler et al., 2014); walking
each doubling of distance (Stevens and Guirao, 1962; Begault, 1991). (Loomis et al., 1998); or simply using a verbal response (Klatzky et al.,
However, Blauert found an even higher value of 20 dB (Blauert, 1997). 2003; Finocchietti et al., 2015). In 1955, Sandel and colleagues
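To make these two intensity-based mechanisms concrete, the sketch below computes the theoretical free-field level drop of a spherical wave (6 dB per doubling of distance) and a Direct-to-Reverberant energy Ratio from a toy impulse response. This is a minimal illustration under stated assumptions: the 2.5 ms direct-sound window and the synthetic impulse response are arbitrary choices for demonstration, not parameters taken from the studies cited above.

```python
import math

def level_drop_db(d_ref: float, d: float) -> float:
    """Free-field level change of a spherical wave from d_ref to d (dB).

    Inverse-square law: -20 * log10(d / d_ref), i.e. -6 dB per doubling.
    """
    return -20.0 * math.log10(d / d_ref)

def drr_db(impulse_response, sample_rate_hz, direct_window_s=0.0025):
    """Direct-to-Reverberant energy Ratio of an impulse response (dB).

    Energy up to `direct_window_s` after onset is counted as the direct
    wave; everything later as the reverberant wave train.
    """
    split = int(direct_window_s * sample_rate_hz)
    direct = sum(x * x for x in impulse_response[:split])
    reverb = sum(x * x for x in impulse_response[split:])
    return 10.0 * math.log10(direct / reverb)

print(level_drop_db(1.0, 2.0))  # -6.02 dB: one doubling of distance
# Toy impulse response: a strong direct spike plus a weak decaying tail.
ir = [1.0] + [0.0] * 109 + [0.05 * 0.999 ** n for n in range(4000)]
print(drr_db(ir, sample_rate_hz=44100))
```

A larger (more positive) DRR indicates a dominant direct wave, which listeners tend to associate with a nearby source.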

4.3 Pointing methods

Research over the past 30 years has shown that pointing methods can affect precision and accuracy in localization tasks. Pointing paradigms can be classified as egocentric or allocentric, with egocentric methods generally being reported to be more accurate. When defining a protocol for a localization task, several pointing/localizing methods are possible: the orientation of a body part, such as pointing with a hand or a finger (Pernaux et al., 2003; Finocchietti et al., 2015), the orientation of the chest (Haber et al., 1993), the nose (Middlebrooks, 1999b), or the head (Makous and Middlebrooks, 1990; Carlile et al., 1997; Begault et al., 2001); use of a hand-tool (Langendijk et al., 2001; Cappagli and Gori, 2016), or a computer interface (Pernaux et al., 2003; Schoeffler et al., 2014); walking (Loomis et al., 1998); or simply using a verbal response (Klatzky et al., 2003; Finocchietti et al., 2015). In 1955, Sandel and colleagues

conducted three localization experiments in which participants gave the response using an acoustic pointer. The method made use of a mobile loudspeaker that participants could place at the location where they felt the stimulus had been emitted (Sandel et al., 1955).

Several studies have focused on evaluating or comparing different localization methods. In a study conducted on blind subjects, Haber and colleagues compared nine different response methods using pure tones as stimuli in the horizontal plane. They showed that using body parts as the pointing method provides the best performance by optimizing localization accuracy and reducing intersubject variability (Haber et al., 1993).

Interesting research was conducted by Lewald et al. (2000), who ran a series of auditory localization experiments investigating the influence of head rotation relative to the trunk. In the different experiments, they used both headphones and an array of nine speakers arranged in the azimuthal plane to deliver the sound stimuli. For the response, they tested head pointing, a laser pointer attached to the head, and a swivel pointer (which must be directed with both hands toward the sound source). The authors highlighted that sound localization is systematically underestimated (localization is biased toward the sagittal plane) when the head is oriented eccentrically. The relationship between head orientation in the azimuthal plane and the localization error appears almost linear. The presentation of virtual sources through headphones showed similar deviations. When a visual reference of the head's median plane was provided, sound localization was more accurate. Odegaard and colleagues used a very large sample of subjects (384 participants) in a study investigating the presence and direction of bias in both visual and auditory localization. They used an eye-tracking system to record participants' responses. Contrary to Lewald, in the unimodal auditory condition they found a peripherally oriented localization bias (i.e., overestimation), which was also more pronounced as stimulus eccentricity increased (Odegaard et al., 2015). Recanzone and colleagues conducted comparative research, in which they found that the eccentricity of peripheral auditory targets is typically overestimated when using hand-pointing methods and typically underestimated when using head-pointing methods. They suggested that the different relative position of the head with respect to the sound source and the trunk may explain these results (Recanzone et al., 1998).

One study shows that the dominant hand also influences responses. The study by Ocklenburg investigated the effect of laterality in a sound localization task. The protocol was based on the diffusion of auditory stimuli through a set of 21 horizontal speakers, with responses given by head orientation or hand pointing. Interestingly, the results show that both right- and left-handers have a tendency to localize sound toward the side contralateral to the dominant hand, regardless of their overall accuracy (a bias similar to that observed in visual perception, suggesting that the same supramodal neural processes are involved) (Ocklenburg et al., 2010).

Majdak and colleagues used individualized HRTFs to compare head- and hand-pointing. In a virtual environment, they found that the pointing method had no significant effect on the localization task (Majdak et al., 2010). Tabry and colleagues also assessed head- and hand-pointing performance. They assessed the participants' responses to real sound sources both in a free-field environment and in a semi-anechoic room. Under these conditions, and in contrast to Majdak's findings, they found large and significant differences in performance between the two pointing methods. More specifically, they found better performance in the horizontal plane with the hand-pointing method, while head-pointing resulted in better performance in the vertical plane (Tabry et al., 2013). In addition, they reported lower accuracy for head-pointing at extreme upward and downward elevations, probably due to the greater difficulty of the articular movements. Populin compared head- and gaze-pointing. He reported similar performances with the two methods in the most eccentric positions. However, in frontal positions, he unexpectedly found that gaze-pointing resulted in significantly larger errors than head-pointing (Populin, 2008).

Gilkey and colleagues proposed an original method using an allocentric paradigm called GELP (God's Eye Localization Pointing), designed to accelerate response collection in auditory-localization experiments. GELP uses a 20-cm-diameter sphere as a model of the listener's head, on which the participant can indicate the direction from which he/she perceives the sound coming. Test results obtained with GELP showed that it was a fast way to record participants' responses and that it was also more accurate than the verbal response method. However, when they compared their results with those of Makous and Middlebrooks (1990), the authors found that the GELP technique is significantly less accurate than the head-pointing technique (Makous and Middlebrooks, 1990; Gilkey et al., 1995). These results were subsequently confirmed by the work of Djelani et al. (2000).

To collect responses in a localization task, it is also possible to use a computer-controlled graphical interface (Graphical User Interface, GUI) through which participants can indicate the perceived direction. Pernaux and colleagues, and Schoeffler and colleagues, compared two different GUI methods, consisting of a 2D or a 3D representation, with the participants using a mouse to give their responses. Both reported that the 3D version was more effective. Moreover, Pernaux also compared the finger-pointing method with the two previous methods and showed that finger-pointing was faster and more accurate (Pernaux et al., 2003; Schoeffler et al., 2014).

Table 1 shows and classifies a selection of articles that have investigated the characteristics of different pointing methods. This table provides an overview of the main categories into which the literature on auditory localization can be pragmatically classified. It also includes a selection of key reference works that illustrate these categories. The information catalogued in the table can serve as a framework for organizing new related work.
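Because this literature systematically distinguishes accuracy (systematic bias) from precision (the scatter of repeated responses), a recurring practical step is reducing pointing responses to these two numbers. The sketch below shows one conventional way of doing this for azimuth data; the wrap-around convention and the sample responses are illustrative assumptions, not the analysis of any particular study in Table 1.

```python
import math

def signed_azimuth_error(response_deg: float, target_deg: float) -> float:
    """Signed azimuth error in degrees, wrapped into (-180, 180].

    Wrapping avoids treating a response of 359° to a 1° target as a
    358° error; positive values mean a response right of the target.
    """
    err = (response_deg - target_deg + 180.0) % 360.0 - 180.0
    return 180.0 if err == -180.0 else err

def accuracy_and_precision(responses, targets):
    """Mean signed error (accuracy/bias) and its SD (precision)."""
    errors = [signed_azimuth_error(r, t) for r, t in zip(responses, targets)]
    mean = sum(errors) / len(errors)
    variance = sum((e - mean) ** 2 for e in errors) / len(errors)
    return mean, math.sqrt(variance)

# Illustrative data: a listener with a small constant rightward bias.
targets = [0, 30, 60, 90, -30, -60, -90]
responses = [3, 33, 62, 94, -28, -57, -86]
bias, scatter = accuracy_and_precision(responses, targets)
print(f"bias = {bias:+.1f}°, precision (SD) = {scatter:.1f}°")
```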


TABLE 1 Experimental research articles on pointing methods in auditory localization.

| References | Pointing method | Spatial dimension | Auditory cue | Environ. |
|---|---|---|---|---|
| Aggius-Vella et al. (2018) | Verbally | A | | R |
| Aggius-Vella et al. (2020) | Verbally (f2) | A | | R |
| Bahu et al. (2016) | H, HF, T (a) | A, E | | R |
| Begault et al. (2001) | H; Graph_Int (c) | A, E, D | HRTF | V |
| Berthomieu et al. (2019) | Verbally | D | | V |
| Bidart and Lavandier (2016) | Scale (d), verbally | D | | V |
| Boyer et al. (2013) | (H), HF | A | | V |
| Brungart et al. (1999) | (e) | A, E | ITD, ILD, HRTF | V |
| Cappagli et al. (2017) | Verbally (f2) | D | | R |
| Cappagli and Gori (2016) | T | A | | R |
| Carlile et al. (1997) | H | A, E | | R |
| Chandler and Grantham (1992) | PB, verbally (f2) | A, E, D | | R |
| Djelani et al. (2000) | H, HF; GELP | A, E | HRTF | V |
| Dobreva et al. (2011) | Joystick | A, E | ITD, ILD | R |
| Finocchietti et al. (2015) | HF | A, E | | R |
| Getzmann (2003) | HF, verbally | E | | R |
| Gilkey et al. (1995) | GELP | A, E | | R |
| Guo et al. (2019) | Verbally | (A), D | | R |
| Han and Chen (2019) | Verbally (f2) | A | HRTF | V |
| Klingel et al. (2021) | Keyboard (f2) | A | ITD, ILD | V |
| Langendijk et al. (2001) | T (a) | A, E | | V |
| Lewald et al. (2000) | H, T; laser pointer | A | | R, V |
| Loomis et al. (1998) | Verbally, walk (g) | D | | R |
| Macpherson (1994) | Verbally | (A), E | HRTF | R |
| Macpherson and Middlebrooks (2000) | H | (A), E | | R |
| Majdak et al. (2010) | H, HF | A, E, D | HRTF | V |
| Makous and Middlebrooks (1990) | H | A, E | | R |
| Mershon and King (1975) | Writing | D | | R |
| Middlebrooks (1999b) | H (i) | A, E | HRTF | R, V |
| Nilsson and Schenkman (2016) | Keyboard (f2) | A | ITD, ILD | V |
| Oberem et al. (2020) | H, T (b) | E | HRTF | V |
| Ocklenburg et al. (2010) | H, HF | A | | R |
| Odegaard et al. (2015) | G | A | | R |
| Otte et al. (2013) | H, G | A, E | | R |
| Pernaux et al. (2003) | HF; Graph_Int (c) | A, E | | V |
| Populin (2008) | G | A, E | | R |
| Rajendran and Gamper (2019) | Keyboard (f3) | E | HRTF | V |
| Recanzone et al. (1998) | H; switch | A | | R |
| Rhodes (1987) | Verbally | A | | R |
| Risoud et al. (2020) | Verbally | A | ITD, ILD, HRTF | R |
| Rummukainen et al. (2018) | Verbally (f2) | A | ITD, ILD | V |
| Schoeffler et al. (2014) | Graph_Int (c) | A, E | | R |
| Spiousas et al. (2017) | Verbally | D | | R |
| Tabry et al. (2013) | H, HF | A, E | | R |
| Yost (2016) | Keyboard (f4) | A | | R |
| Zahorik (2002) | Writing (h) | D | | R |
| Van Wanrooij and Van Opstal (2004) | H | A, E | HRTF | R |

The table presents a selection of research articles on experimental auditory localization and the pointing methods used. The selected articles propose a comparison of different pointing methods or provide information about a specific method. The Pointing method column lists the pointing methods used or analyzed: Head (H), Gaze (G), Hand/Finger (HF), Hand Pointer Tool (T), or another method (specified). The Spatial dimension column indicates which spatial dimensions are considered: azimuth (A), elevation (E), distance (D); parentheses indicate that the variable is part of the assessment even though it is not the main object of the research. The Auditory cue column indicates whether the focus of the article is on the analysis of a specific auditory cue: ITD, ILD, HRTF. The Environ. column indicates whether the test was conducted in a real environment (R), using loudspeakers placed in real space around the listener, or in a virtual environment (V), with the listener wearing audio headphones and using HRTF techniques. "Graph_Int" stands for Graphical Interface, "PB" for Push Button. (a) Gun pointer. (b) Hand-held marker. (c) Schematic 2D and 3D views on which the subject reported his localization judgment with a mouse or a joystick. (d) Representative scale with hand selection. (e) HRTF measurement. (f2) Two-alternative forced choice. (f3) Three-alternative forced choice. (f4) Fifteen-alternative forced choice. (g) Listener had to walk to the location perceived as the source. (h) On a computer terminal with a numeric keypad. (i) Listener was instructed to "point with her/his nose."

4.4 Training

The considerable research and extensive experimental tests conducted in recent decades have shown that habituation and training are factors that significantly influence participants' performance in localization tasks. Habituation allows participants to become familiar with the task and the materials used, and a few trials are usually enough. Training is a deeper process that aims to enable participants to "appropriate" the methods and stimuli and requires a much greater number of trials (Kacelnik et al., 2006; Kumpik et al., 2010). There is no consensus on the duration required for an effective initial training phase. The training task plays an equally important role. For instance, Hüg et al. (2022) investigated the effect of training methods on auditory localization performance for the distance dimension. Comparing active and passive movements, they observed that the training was effective in improving localization performance only with the active method (Hüg et al., 2022). Deep and effective training results in much lower variability in the results. However, from an ecological point of view, deep training may affect the spontaneity of responses (Neuhoff, 2004).

Although some authors prefer not to subject their participants to a training phase, thus prioritizing unconditioned responses, this approach appears to be very uncommon (Mershon and King, 1975; Populin, 2008). Some studies have foregone the use of a habituation or training phase and have instead performed a calibration and/or verification of task understanding (Otte et al., 2013). Bahu et al. and Hüg et al. proposed a simple habituation phase consisting of 10 or 4 trials, respectively, that were identical to the task used in the subsequent test (Bahu et al., 2016; Hüg et al., 2022). Macpherson and Middlebrooks proposed training consisting of five consecutive phases, each composed of 60 trials. The five phases progressively introduced the participant to the complete task. The entire training phase lasted 10 min and was performed immediately before the tests (Macpherson and Middlebrooks, 2000).

Other studies, by contrast, have proposed a more extensive training phase. In a study specifically devoted to the effects of training on auditory localization abilities, Majdak and colleagues found that for the head- and hand-pointing methods, respectively, listeners needed 590 and 710 trials (on average) to achieve the required performance (Majdak et al., 2010). To enable their participants to learn how to use the pointing method correctly, Oberem and colleagues proposed training consisting of 600 localization trials with feedback (Oberem et al., 2020).


Middlebrooks trained participants with 1,200 trials (Middlebrooks, 1999b). Oldfield and Parker's participants were trained for at least 2 h before performing the test (Oldfield and Parker, 1984). Makous and Middlebrooks administered 10 to 20 training sessions to listeners, with and without feedback (Makous and Middlebrooks, 1990).

4.5 Auditory localization illusions

In auditory illusions, the perception or interpretation of a sound is not consistent with the actual sound in terms of its physical, spatial, or other characteristics. Some auditory illusions concern the localization or lateralization of sound. One of the earliest and best-documented auditory illusions is the Octave illusion (or Deutsch illusion), discovered by Diana Deutsch in 1973. Deutsch has identified a large number of auditory illusions of different types, of which the Octave illusion is the best known. This illusion is produced by playing a "high" and a "low" tone through stereo headphones, while alternating the sound-ear correspondence ("high" left and "low" right, and vice versa) four times per second. The two tones are an octave apart. The illusion takes the form of a perceptual alteration of the nature and lateralization of the sounds, which are perceived as a single tone that continuously alternates between the right and left ears (Deutsch, 1974, 2004). Although the explanation of this illusion is still a matter of debate, the most widely accepted solution is the one proposed by the author herself and derives from the existence of a conflict between the "what" and "where" decision-making mechanisms (Deutsch, 1975).

One of the most robust and fascinating auditory illusions is the Franssen effect, discovered by Nico Valentinus Franssen in 1960. The Franssen effect is created by playing a sound through two loudspeakers, resulting in an auditory illusion in which the listener mislocalizes the lateralization of the sound. At the beginning of the illusion, a sound is emitted from only one of the loudspeakers (it is unimportant whether this is the left or right speaker) before then being completely transferred to the opposite side. Although the first speaker has stopped playing, the listener does not perceive the change of side. The most widely accepted explanation of the Franssen effect identifies the use of a pure sound, the change in laterality through "rapid fading" from one side to the other, the dominance of onsets for localization (in accordance with the law of the first wave front) and, most importantly, the presence of reverberation as the key elements. In the absence of reverberation, the effect does not occur (Hartmann and Rakerd, 1989b). The illusion created by the Franssen effect is an excellent example of how perception (and in this particular case, auditory localization) also arises from the individual's prior experience and is not just the result of momentary stimulation.

Advances in the understanding of the functioning of the auditory system have stimulated new and more original research, and this has led to the discovery (or creation) of new auditory illusions. Bloom studied and experimented with the perception of elevation; he created an illusion of sound elevation through spectral manipulation of the sound (Bloom, 1977). A more recent auditory illusion is known as the Transverse-and-bounce illusion. This illusion uses front-back confusion and volume changes to create the perception that a single sound stimulus is in motion. When the volume increases, the sound is perceived as approaching, while when it decreases, it is perceived as moving away from the listener. This illusion can be reproduced using either speakers or headphones (Bainbridge et al., 2015). DiZio and colleagues investigated the Audiogravic Illusion (i.e., head-centered auditory localization influenced by the intensity and direction of gravity). To do this, they used an original and interesting experimental setup to manipulate the direction of gravity perceived by participants. The results of their research show that by increasing the magnitude of the resulting gravitational force and changing its direction relative to the head and torso, it is possible to obtain an apparent displacement of a sound relative to the head in the opposite direction (DiZio et al., 2001).

Some auditory illusions have subsequently been used in a number of important applications. Stereophony is perhaps the most widely used illusion. Stereophony is based on the "summing localization" effect: when two sounds reach the two ears with a 'limited incoherence' in time and level, the stimuli are merged into a single percept. Under these conditions, our brain infers a "phantom source," located away from the listener, whose location is consistent with the perceived differences between the right and left ear stimuli. The purpose of using this illusion is to achieve a wider spatial perception in the diffusion of sounds and music with headphones or speakers (Chernyak and Dubrovsky, 1968).
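As a sketch of how summing localization is exploited in practice, the snippet below implements a constant-power (sine/cosine) amplitude pan: a single mono signal is split between two loudspeakers, and the level difference alone steers the phantom source. This pan law is a common engineering convention assumed here for illustration; it is not the specific procedure used by Chernyak and Dubrovsky (1968).

```python
import math

def constant_power_pan(sample: float, pan: float) -> tuple[float, float]:
    """Split a mono sample into (left, right) values for a phantom source.

    `pan` runs from -1.0 (hard left) to +1.0 (hard right). The
    sine/cosine law keeps the summed power constant while the level
    difference between channels steers the perceived location.
    """
    angle = (pan + 1.0) * math.pi / 4.0  # map [-1, 1] onto [0, pi/2]
    return sample * math.cos(angle), sample * math.sin(angle)

# A phantom source halfway to the right: both speakers play the same
# signal, with a level difference of roughly 7.7 dB.
left, right = constant_power_pan(1.0, 0.5)
print(f"L = {left:.3f}, R = {right:.3f}, "
      f"difference ≈ {20 * math.log10(right / left):.1f} dB")
```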

5 Conclusion

The human ability to localize sounds in our surroundings is a complex and fascinating phenomenon. Through a sophisticated set of mechanisms, our auditory system enables us to perceive the spatial location of sounds and orient ourselves in the world around us.

In this article, we examined the main processes involved in auditory localization, based on monaural and binaural cues, time and intensity differences between the ears, and the frequencies that make it easier – or more difficult – to localize the source. We have supplemented the "traditional" description of these mechanisms with the most recent research findings, which show how some ancillary cues, such as reverberation or relative motion, are essential to achieve our impressive localization performance. We have also enhanced the functional description with relevant information concerning methodologies and perceptual limitations in order to provide a broader information set.

Modern applications of this knowledge make it possible today to live remarkable experiences. In particular, HRTF promises excellent spatialization results, but requires better understanding and management of its artificial reproduction. Resolving some conditions of localization uncertainty, and easily customizing equations for each listener, are still open challenges.

In the present and in the future, one of the most interesting ethical applications concerns perceptual support for people with disabilities. Providing more effective assistive devices is certainly one of the most exciting challenges, as in the case of auditory rehabilitation and assistive devices, such as sensory substitution devices for the blind (Bordeau et al., 2023).

Author contributions

AC: Conceptualization, Data curation, Methodology, Project administration, Supervision, Writing – original draft, Writing – review & editing. CB: Data curation, Formal analysis, Investigation, Writing – original draft, Writing – review & editing. MA: Conceptualization, Data curation, Funding acquisition, Resources, Supervision, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This work was supported by Unadev – AAP 2019 – convention H144; Bourgogne-Franche-Comté Region/Feder – AAP 2020 – convention 2020Y-12743. Funding sources have not played a role in any of the research phases, data collection, data analysis, or decision-making for publication.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References
Acosta, A., Grijalva, F., Alvarez, R., and Acuna, B. (2020). Bilinear and triangular spherical head-related transfer functions interpolation on non-uniform meshes. 2020 IEEE ANDESCON, Quito, Ecuador.
Aggius-Vella, E., Campus, C., and Gori, M. (2018). Different audio spatial metric representation around the body. Sci. Rep. 8, 1–9. doi: 10.1038/s41598-018-27370-9
Aggius-Vella, E., Gori, M., Campus, C., Moore, B. C. J., Pardhan, S., Kolarik, A. J., et al. (2022). Auditory distance perception in front and rear space. Hear. Res. 417:108468. doi: 10.1016/j.heares.2022.108468
Aggius-Vella, E., Kolarik, A. J., Gori, M., Cirstea, S., Campus, C., Moore, B. C. J., et al. (2020). Comparison of auditory spatial bisection and minimum audible angle in front, lateral, and back space. Sci. Rep. 10:6279. doi: 10.1038/s41598-020-62983-z
Ahveninen, J., Kopčo, N., and Jääskeläinen, I. P. (2014). Psychophysics and neuronal bases of sound localization in humans. Hear. Res. 307, 86–97. doi: 10.1016/j.heares.2013.07.008
Andersen, J. S., Miccini, R., Serafin, S., and Spagnol, S. (2021). Evaluation of individualized HRTFs in a 3D shooter game. In 2021 Immersive and 3D Audio: From Architecture to Automotive (I3DA), IEEE.
Bahu, H., Carpentier, T., Noisternig, M., and Warusfel, O. (2016). Comparison of different egocentric pointing methods for 3D sound localization experiments. Acta Acust. 102, 107–118. doi: 10.3813/AAA.918928
Bainbridge, C. M., Bainbridge, W. A., and Oliva, A. (2015). Quadri-stability of a spatially ambiguous auditory illusion. Front. Hum. Neurosci. 8:1060. doi: 10.3389/fnhum.2014.01060
Begault, D. R. (1991). Preferred sound intensity increase for sensation of half distance. Percept. Mot. Skills 72, 1019–1029. doi: 10.2466/pms.1991.72.3.1019
Begault, D. R. (1992). Perceptual effects of synthetic reverberation on three-dimensional audio systems. J. Audio Eng. Soc. 40, 895–904.
Begault, D. R. (1999). Auditory and non-auditory factors that potentially influence virtual acoustic imagery. AES 16th Int. Conf., Moffett Field, CA.
Begault, D. R. (2000). 3-D Sound for Virtual Reality and Multimedia. Moffett Field, CA: Ames Research Center.
Begault, D. R., Wenzel, E. M., and Anderson, M. R. (2001). Direct comparison of the impact of head tracking, reverberation, and individualized head-related transfer functions on the spatial perception of a virtual speech source. J. Audio Eng. Soc. 49, 904–916.
Berger, C. C., Gonzalez-Franco, M., Tajadura-Jiménez, A., Florencio, D., and Zhang, Z. (2018). Generic HRTFs may be good enough in virtual reality. Improving source localization through cross-modal plasticity. Front. Neurosci. 12:21. doi: 10.3389/fnins.2018.00021
Bernstein, L. R., and Trahiotis, C. (1985). Lateralization of low-frequency, complex waveforms: the use of envelope-based temporal disparities. J. Acoust. Soc. Am. 77, 1868–1880. doi: 10.1121/1.391938
Best, V., Baumgartner, R., Lavandier, M., Majdak, P., and Kopčo, N. (2020). Sound externalization: a review of recent research. Trends Hear. 24:2331216520948390. doi: 10.1177/2331216520948390
Berthomieu, G., Koehl, V., and Paquier, M. (2019). Loudness and distance estimates for noise bursts coming from several distances with and without visual cues to their source. Universitätsbibliothek der RWTH Aachen.
Bidart, A., and Lavandier, M. (2016). Room-induced cues for the perception of virtual auditory distance with stimuli equalized in level. Acta Acustica united with Acustica 102, 159–169.
Blauert, J. (1969). Sound localization in the median plane. Acustica 22, 205–213.
Blauert, J. (1996). Spatial Hearing: The Psychophysics of Human Sound Localization, Revised Ed. MIT Press.
Blauert, J. (1997). Spatial hearing: the psychophysics of human sound localization.
Bloom, P. J. (1977). Creating source elevation illusions by spectral manipulation. J. Audio Eng. Soc. 25, 560–565.
Bordeau, C., Scalvini, F., Migniot, C., Dubois, J., and Ambard, M. (2023). Cross-modal correspondence enhances elevation localization in visual-to-auditory sensory substitution. Front. Psychol. 14:1079998. doi: 10.3389/fpsyg.2023.1079998
Boyer, E. O., Babayan, B. M., Bevilacqua, F., Noisternig, M., Warusfel, O., Roby-Brami, A., et al. (2013). From ear to hand: the role of the auditory-motor loop in pointing to an auditory source. Front. Comput. Neurosci. 7:26. doi: 10.3389/fncom.2013.00026
Brimijoin, W. O., and Akeroyd, M. A. (2014). The moving minimum audible angle is smaller during self motion than during source motion. Front. Neurosci. 8:273. doi: 10.3389/fnins.2014.00273
Brimijoin, W. O., McShefferty, D., and Akeroyd, M. A. (2010). Auditory and visual orienting responses in listeners with and without hearing-impairment. J. Acoust. Soc. Am. 127, 3678–3688. doi: 10.1121/1.3409488
Bronkhorst, A. W. (1995). Localization of real and virtual sound sources. J. Acoust. Soc. Am. 98, 2542–2553. doi: 10.1121/1.413219
Bronkhorst, A. W., and Houtgast, T. (1999). Auditory distance perception in rooms. Nature 397, 517–520. doi: 10.1038/17374
Brughera, A., Dunai, L., and Hartmann, W. M. (2013). Human interaural time difference thresholds for sine tones: the high-frequency limit. J. Acoust. Soc. Am. 133, 2839–2855. doi: 10.1121/1.4795778
Brungart, D. S., Durlach, N. I., and Rabinowitz, W. M. (1999). Auditory localization of nearby sources. II. Localization of a broadband source. J. Acoust. Soc. Am. 106, 1956–1968. doi: 10.1121/1.427943
Brungart, D. S., and Rabinowitz, W. M. (1999). Auditory localization of nearby sources. Head-related transfer functions. J. Acoust. Soc. Am. 106, 1465–1479. doi: 10.1121/1.427180
Brungart, D. S., and Simpson, B. D. (2008). Effects of temporal fine structure on the localization of broadband sounds: potential implications for the design of spatial audio displays. Proceedings of the 14th International Conference on Auditory Display, Paris, France, June 24–27, 2008. Available at: https://www.icad.org/Proceedings/2008/BrungartSimpson2008b.pdf
Brungart, D. S., Simpson, B. D., and Kordik, A. J. (2005). Localization in the presence of multiple simultaneous sounds. Acta Acust. 91, 471–479.
Bruns, P., Thun, C., and Röder, B. (2024). Quantifying accuracy and precision from continuous response data in studies of spatial perception and crossmodal recalibration. Behav. Res. Methods 56, 3814–3830. doi: 10.3758/s13428-024-02416-1
Butler, R. A. (1986). The bandwidth effect on monaural and binaural localization. Hear. Res. 21, 67–73. doi: 10.1016/0378-5955(86)90047-X
Butler, R. A., Levy, E. T., and Neff, W. D. (1980). Apparent distance of sounds recorded in echoic and anechoic chambers. J. Exp. Psychol. Hum. Percept. Perform. 6, 745–750. doi: 10.1037/0096-1523.6.4.745
Cappagli, G., and Gori, M. (2016). Auditory spatial localization: developmental delay in children with visual impairments. Res. Dev. Disabil. 53-54, 391–398. doi: 10.1016/j.ridd.2016.02.019
Cappagli, G., Cocchi, E., and Gori, M. (2017). Auditory and proprioceptive spatial impairments in blind children and adults. Dev. Sci. 20:e12374.
Carlile, S., Leong, P., and Hyams, S. (1997). The nature and distribution of errors in sound localization by human listeners. Hear. Res. 114, 179–196. doi: 10.1016/S0378-5955(97)00161-5
Catic, J., Santurette, S., Buchholz, J. M., Gran, F., and Dau, T. (2013). The effect of interaural-level-difference fluctuations on the externalization of sound. J. Acoust. Soc. Am. 134, 1232–1241. doi: 10.1121/1.4812264
Catic, J., Santurette, S., and Dau, T. (2015). The role of reverberation-related binaural cues in the externalization of speech. J. Acoust. Soc. Am. 138, 1154–1167. doi: 10.1121/1.4928132
Chandler, D. W., and Grantham, D. W. (1992). Minimum audible movement angle in the horizontal plane as a function of stimulus frequency and bandwidth, source azimuth, and velocity. J. Acoust. Soc. Am. 91, 1624–1636.
Chernyak, R. I., and Dubrovsky, N. A. (1968). Pattern of the noise images and the binaural summation of loudness for the different interaural correlation of noise. Proceedings of the 6th International Congress on Acoustics, Tokyo.
Chun, C. J., Moon, J. M., Lee, G. W., Kim, N. K., and Kim, H. K. (2017). Deep neural network based HRTF personalization using anthropometric measurements. In 143rd Audio Engineering Society Convention 2017, AES.
Coleman, P. D. (1968). Dual rôle of frequency spectrum in determination of auditory distance. J. Acoust. Soc. Am. 44, 631–632. doi: 10.1121/1.1911132
Demirkaplan, Ö., and Hacıhabiboğlu, H. (2020). Effects of interpersonal familiarity on the auditory distance perception of level-equalized reverberant speech. Acta Acust. 4:26. doi: 10.1051/aacus/2020025
Deutsch, D. (1974). An auditory illusion. Nature 251, 307–309. doi: 10.1038/251307a0
Deutsch, D. (1975). Musical illusions. Sci. Am. 233, 92–104. doi: 10.1038/scientificamerican1075-92
Deutsch, D. (2004). The octave illusion revisited again. J. Exp. Psychol. Hum. Percept. Perform. 30, 355–364. doi: 10.1037/0096-1523.30.2.355
DiZio, P., Held, R., Lackner, J. R., Shinn-Cunningham, B., and Durlach, N. (2001). Gravitoinertial force magnitude and direction influence head-centric auditory localization. J. Neurophysiol. 85, 2455–2460. doi: 10.1152/jn.2001.85.6.2455
Djelani, T., Pörschmann, C., Sahrhage, J., and Blauert, J. (2000). An interactive virtual-environment generator for psychoacoustic research II: collection of head-related impulse responses and evaluation of auditory localization. Acustica 86, 1046–1053.
Dobreva, M. S., O'Neill, W. E., and Paige, G. D. (2011). Influence of aging on human sound localization. J. Neurophysiol. 105, 2471–2486. doi: 10.1152/jn.00951.2010
Durlach, N. I., Rigopulos, A., Pang, X. D., Woods, W. S., Kulkarni, A., Colburn, H. S., et al. (1992). On the externalization of auditory images. Presence 1, 251–257. doi: 10.1162/pres.1992.1.2.251
Elpern, B. S., and Naunton, R. F. (1964). Lateralizing effects of interaural phase differences. J. Acoust. Soc. Am. 36, 1392–1393. doi: 10.1121/1.1919215
Engel, A. K., Maye, A., Kurthen, M., and König, P. (2013). Where's the action? The pragmatic turn in cognitive science. Trends Cogn. Sci. 17, 202–209. doi: 10.1016/j.tics.2013.03.006
Febretti, A., Nishimoto, A., Thigpen, T., Talandis, J., Long, L., Pirtle, J. D., et al. (2013). CAVE2: a hybrid reality environment for immersive simulation and information analysis. In The Engineering Reality of Virtual Reality 2013, SPIE 8649, 9–20.
Finocchietti, S., Cappagli, G., and Gori, M. (2015). Encoding audio motion: spatial impairment in early blind individuals. Front. Psychol. 6:1357. doi: 10.3389/fpsyg.2015.01357
Fitzgibbons, P. J., and Gordon-Salant, S. (2010). Behavioral studies with aging humans: hearing sensitivity and psychoacoustics. The Aging Auditory System, 111–134.
Fletcher, H., and Munson, W. A. (1933). Loudness, its definition, measurement and calculation. J. Acoust. Soc. Am. 5, 82–108. doi: 10.1121/1.1915637
Fontana, F., and Rocchesso, D. (2008). Auditory distance perception in an acoustic pipe. ACM Trans. Appl. Percept. 5, 1–15. doi: 10.1145/1402236.1402240
Freeland, F., Wagner, L., and Diniz, P. (2002). Efficient HRTF interpolation in 3D moving sound. 22nd Int. Conf. Virtual, Synth. Entertain. Audio, Audio Eng. Soc.
Galati, G., Pelle, G., Berthoz, A., and Committeri, G. (2010). Multiple reference frames used by the human brain for spatial perception and memory. Exp. Brain Res. 206, 109–120. doi: 10.1007/s00221-010-2168-8
Gan, W.-S., Peksi, S., He, J., Ranjan, R., Duy Hai, N., and Kumar Chaudhary, N. (2017). Personalized HRTF measurement and 3D audio rendering for AR/VR headsets. In Audio Engineering Society Convention 142. Audio Engineering Society.
Garcia, S. E., Jones, P. R., Rubin, G. S., and Nardini, M. (2017). Auditory localisation biases increase with sensory uncertainty. Sci. Rep. 7, 1–10. doi: 10.1038/srep40567
Gardner, W. G. (1995). Efficient convolution without input-output delay. J. Audio Eng. Soc. 43, 127–136.
Gelfand, S. A. (2017). Hearing: An Introduction to Psychological and Physiological Acoustics. CRC Press.
Genzel, D., Schutte, M., Brimijoin, W. O., MacNeilage, P. R., and Wiegrebe, L. (2018). Psychophysical evidence for auditory motion parallax. Proc. Natl. Acad. Sci. U. S. A. 115, 4264–4269. doi: 10.1073/pnas.1712058115
Geronazzo, M., Sikstrom, E., Kleimola, J., Avanzini, F., De Gotzen, A., and Serafin, S. (2019). The impact of an accurate vertical localization with HRTFs on short explorations of immersive virtual reality scenarios. In Proceedings of the 2018 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), IEEE, 90–97.
Getzmann, S. (2003). The influence of the acoustic context on vertical sound localization in the median plane. Percept. Psychophys. 65, 1045–1057.
Giguere, C., and Abel, S. M. (1993). Sound localization: effects of reverberation time, speaker array, stimulus frequency, and stimulus rise/decay. J. Acoust. Soc. Am. 94, 769–776. doi: 10.1121/1.408206
Gilkey, R. H., Good, M. D., Ericson, M. A., Brinkman, J., and Stewart, J. M. (1995). A pointing technique for rapidly collecting localization responses in auditory research. Behav. Res. Methods Instrum. Comput. 27, 1–11. doi: 10.3758/BF03203614
Glasberg, B. R., and Moore, B. C. J. (1990). Derivation of auditory filter shapes from notched-noise data. Hear. Res. 47, 103–138. doi: 10.1016/0378-5955(90)90170-T
Goossens, H. H. L. M., and Van Opstal, A. J. (1999). Influence of head position on the spatial representation of acoustic targets. J. Neurophysiol. 81, 2720–2736. doi: 10.1152/jn.1999.81.6.2720
Graziano, M. S. A. (2001). Is reaching eye-centered, body-centered, hand-centered, or a combination? Rev. Neurosci. 12, 175–185. doi: 10.1515/REVNEURO.2001.12.2.175
Grijalva, F., Martini, L. C., Florencio, D., and Goldenstein, S. (2017). Interpolation of head-related transfer functions using manifold learning. IEEE Signal Process. Lett. 24, 221–225. doi: 10.1109/LSP.2017.2648794
Grothe, B., and Pecka, M. (2014). The natural history of sound localization in mammals – a story of neuronal inhibition. Front. Neural Circ. 8:116. doi: 10.3389/fncir.2014.00116
Grothe, B., Pecka, M., and McAlpine, D. (2010). Mechanisms of sound localization in mammals. Physiol. Rev. 90, 983–1012. doi: 10.1152/physrev.00026.2009
Guo, Z., Lu, Y., Wang, L., and Yu, G. (2019). Discrimination experiment of sound distance perception for a real source in near-field. In EAA Spatial Audio Signal Processing Symposium, 85–89.
Guski, R. (1990). Auditory localization: effects of reflecting surfaces. Perception 19, 819–830. doi: 10.1068/p190819
Haber, L., Haber, R. N., Penningroth, S., Novak, K., and Radgowski, H. (1993). Comparison of nine methods of indicating the direction to objects: data from blind adults. Perception 22, 35–47. doi: 10.1068/p220035
Han, Y., and Chen, F. (2019). Minimum audible movement angle in virtual auditory environment: effect of stimulus frequency. In 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), IEEE, 175–178.
Hartmann, W. M. (1983). Localization of sound in rooms. J. Acoust. Soc. Am. 74, 1380–1391. doi: 10.1121/1.390163
Hartmann, W. M. (1999). How we localize sound. Phys. Today 52, 24–29. doi: 10.1063/1.882727
Hartmann, W. M., and Macaulay, E. J. (2014). Anatomical limits on interaural time differences: an ecological perspective. Front. Neurosci. 8:34. doi: 10.3389/fnins.2014.00034
Hartmann, W. M., and Rakerd, B. (1989a). On the minimum audible angle: a decision theory approach. J. Acoust. Soc. Am. 85, 2031–2041. doi: 10.1121/1.397855
Hartmann, W. M., and Rakerd, B. (1989b). Localization of sound in rooms IV: the Franssen effect. J. Acoust. Soc. Am. 86, 1366–1373. doi: 10.1121/1.398696
Hartmann, W. M., Rakerd, B., Crawford, Z. D., and Zhang, P. X. (2016). Transaural experiments and a revised duplex theory for the localization of low-frequency tones. J. Acoust. Soc. Am. 139, 968–985. doi: 10.1121/1.4941915
Hartmann, W. M., Rakerd, B., and Macaulay, E. J. (2013). On the ecological interpretation of limits of interaural time difference sensitivity. In Proceedings of Meetings on Acoustics, Vol. 19, No. 1. AIP Publishing.
Hebrank, J., and Wright, D. (1974). Spectral cues used in the localization of sound sources on the median plane. J. Acoust. Soc. Am. 56, 1829–1834. doi: 10.1121/1.1903520
Heinz, M. G., Colburn, H. S., and Carney, L. H. (2001). Evaluating auditory performance limits: I. One-parameter discrimination using a computational model for the auditory nerve. Neural Comput. 13, 2273–2316. doi: 10.1162/089976601750541804
Howard, D. M., and Angus, J. A. S. (2017). Acoustics and Psychoacoustics. 5th Edn. Routledge.
Howarth, A., and Shone, G. R. (2006). Ageing and the auditory system. Postgrad. Med. J. 82, 166–171. doi: 10.1136/pgmj.2005.039388
Hüg, M. X., Bermejo, F., Tommasini, F. C., and Di Paolo, E. A. (2022). Effects of guided exploration on reaching measures of auditory peripersonal space. Front. Psychol. 13:983189. doi: 10.3389/fpsyg.2022.983189
Iwaya, Y., Suzuki, Y., and Kimura, D. (2003). Effects of head movement on front-back error in sound localization. Acoust. Sci. Technol. 24, 322–324. doi: 10.1250/ast.24.322
Jerath, R., Crawford, M. W., and Barnes, V. A. (2015). Functional representation of vision within the mind: a visual consciousness model based in 3D default space. J. Med. Hypotheses Ideas 9, 45–56. doi: 10.1016/j.jmhi.2015.02.001
Jesteadt, W., Wier, C. C., and Green, D. M. (1977). Intensity discrimination as a function of frequency and sensation level. J. Acoust. Soc. Am. 61, 169–177. doi: 10.1121/1.381278
Kacelnik, O., Nodal, F. R., Parsons, C. H., and King, A. J. (2006). Training-induced plasticity of auditory localization in adult mammals. PLoS Biol. 4:e71. doi: 10.1371/journal.pbio.0040071
Kates, J. M., and Arehart, K. H. (2018). Improving auditory externalization for hearing-aid remote microphones. In Conference Record of 51st Asilomar Conference on Signals, Systems and Computers (ACSSC 2017), IEEE.
Kearney, G., Gorzel, M., Rice, H., and Boland, F. (2012). Distance perception in interactive virtual acoustic environments using first and higher order ambisonic sound fields. Acta Acust United Acust 98, 61–71. doi: 10.3813/AAA.918492
Klatzky, R. L., Lippa, Y., Loomis, J. M., and Golledge, R. G. (2003). Encoding, learning, and spatial updating of multiple object locations specified by 3-D sound, spatial language, and vision. Exp. Brain Res. 149, 48–61. doi: 10.1007/s00221-002-1334-z
Klingel, M., Kopčo, N., and Laback, B. (2021). Reweighting of binaural localization cues induced by lateralization training. J. Assoc. Res. Otolaryngol. 22, 551–566.
Klumpp, R. G., and Eady, H. R. (1956). Some measurements of interaural time difference thresholds. J. Acoust. Soc. Am. 28, 859–860. doi: 10.1121/1.1908493
Kopčo, N., and Shinn-Cunningham, B. G. (2011). Effect of stimulus spectrum on distance perception for nearby sources. J. Acoust. Soc. Am. 130, 1530–1541. doi: 10.1121/1.3613705
Kumpik, D. P., Kacelnik, O., and King, A. J. (2010). Adaptive reweighting of auditory localization cues in response to chronic unilateral earplugging in humans. J. Neurosci. 30, 4883–4894. doi: 10.1523/JNEUROSCI.5488-09.2010
Laird, D. A., Taylor, E., and Wille, H. H. (1932). The apparent reduction of loudness. J. Acoust. Soc. Am. 3, 393–401. doi: 10.1121/1.1915570
Langendijk, E. H. A., and Bronkhorst, A. W. (2002). Contribution of spectral cues to human sound localization. J. Acoust. Soc. Am. 112, 1583–1596. doi: 10.1121/1.1501901
Langendijk, E. H. A., Kistler, D. J., and Wightman, F. L. (2001). Sound localization in the presence of one or two distracters. J. Acoust. Soc. Am. 109, 2123–2134. doi: 10.1121/1.1356025
Lee, G. W., and Kim, H. K. (2018). Personalized HRTF modeling based on deep neural network using anthropometric measurements and images of the ear. Appl. Sci. 8:2180. doi: 10.3390/app8112180
Letowski, T., and Letowski, S. (2011). "Localization error: accuracy and precision of auditory localization" in Advances in Sound Localization, 55–78.
Letowski, T. R., and Letowski, S. T. (2012). Auditory spatial perception: auditory localization. Army Research Laboratory, Aberdeen Proving Ground, MD, Human Research and Engineering Directorate.
Lewald, J., Dörrscheidt, G. J., and Ehrenstein, W. H. (2000). Sound localization with eccentric head position. Behav. Brain Res. 108, 105–125. doi: 10.1016/S0166-4328(99)00141-2
Little, A. D., Mershon, D. H., and Cox, P. H. (1992). Spectral content as a cue to perceived auditory distance. Perception 21, 405–416. doi: 10.1068/p210405
Loomis, J. M., Hebert, C., and Cicinelli, J. G. (1990). Active localization of virtual sounds. J. Acoust. Soc. Am. 88, 1757–1764. doi: 10.1121/1.400250
Loomis, J. M., Klatzky, R. L., Philbeck, J. W., and Golledge, R. G. (1998). Assessing auditory distance perception using perceptually directed action. Percept. Psychophys. 60, 966–980. doi: 10.3758/BF03211932
Macpherson, E. A., and Center, W. (1994). On the role of head-related transfer function spectral notches in the judgement of sound source elevation. In Proceedings of the 2nd International Conference on Auditory Display, 187–194.
Macpherson, E. A., and Middlebrooks, J. C. (2000). Localization of brief sounds: effects of level and background noise. J. Acoust. Soc. Am. 108, 1834–1849. doi: 10.1121/1.1310196
Majdak, P., Baumgartner, R., and Jenny, C. (2020). "Formation of three-dimensional auditory space" in The Technology of Binaural Understanding. Modern Acoustics and Signal Processing. eds. J. Blauert and J. Braasch (Cham: Springer).
Majdak, P., Goupell, M. J., and Laback, B. (2010). 3-D localization of virtual sound sources: effects of visual environment, pointing method, and training. Atten. Percept. Psychophys. 72, 454–469. doi: 10.3758/APP.72.2.454
Mäkivirta, A., Malinen, M., Johansson, J., Saari, V., Karjalainen, A., and Vosough, P. (2020). "Accuracy of photogrammetric extraction of the head and torso shape for personal acoustic HRTF modeling" in 148th Audio Engineering Society International Convention.
Makous, J. C., and Middlebrooks, J. C. (1990). Two-dimensional sound localization by human listeners. J. Acoust. Soc. Am. 87, 2188–2200. doi: 10.1121/1.399186
Marmel, F., Marrufo-Pérez, M. I., Heeren, J., Ewert, S., and Lopez-Poveda, E. A. (2018). Effect of sound level on virtual and free-field localization of brief sounds in the anterior median plane. Hear. Res. 365, 28–35. doi: 10.1016/j.heares.2018.06.004
Mauermann, M., Long, G. R., and Kollmeier, B. (2004). Fine structure of hearing threshold and loudness perception. J. Acoust. Soc. Am. 116, 1066–1080. doi: 10.1121/1.1760106
McIntyre, J., Stratta, F., Droulez, J., and Lacquaniti, F. (2000). Analysis of pointing errors reveals properties of data representations and coordinate transformations within the central nervous system. Neural Comput. 12, 2823–2855. doi: 10.1162/089976600300014746
Mendonça, C., Campos, G., Dias, P., Vieira, J., Ferreira, J. P., and Santos, J. A. (2012). On the improvement of localization accuracy with non-individualized HRTF-based sounds. J. Audio Eng. Soc. 60, 821–830.
Mershon, D. H., and King, L. E. (1975). Intensity and reverberation as factors in the auditory perception of egocentric distance. Percept. Psychophys. 18, 409–415. doi: 10.3758/BF03204113
Meshram, A., Mehra, R., Yang, H., Dunn, E., Frahm, J. M., and Manocha, D. (2014). P-HRTF: efficient personalized HRTF computation for high-fidelity spatial sound. In 2014 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), IEEE, 53–61.
Miccini, R., and Spagnol, S. (2020). HRTF individualization using deep learning. In Proceedings of the 2020 IEEE Conference on Virtual Reality and 3D User Interfaces (VRW), IEEE, 390–395.
Middlebrooks, J. C. (1999a). Individual differences in external-ear transfer functions reduced by scaling in frequency. J. Acoust. Soc. Am. 106, 1480–1492. doi: 10.1121/1.427176
Middlebrooks, J. C. (1999b). Virtual localization improved by scaling nonindividualized external-ear transfer functions in frequency. J. Acoust. Soc. Am. 106, 1493–1510. doi: 10.1121/1.427147
Middlebrooks, J. C., Makous, J. C., and Green, D. M. (1989). Directional sensitivity of sound-pressure levels in the human ear canal. J. Acoust. Soc. Am. 86, 89–108. doi: 10.1121/1.398224
Miller, G. A. (1947). Sensitivity to changes in the intensity of white noise and its relation to masking and loudness. J. Acoust. Soc. Am. 19, 609–619. doi: 10.1121/1.1916528
Mills, A. W. (1958). On the minimum audible angle. J. Acoust. Soc. Am. 30, 237–246. doi: 10.1121/1.1909553
Mills, A. W. (1960). Lateralization of high-frequency tones. J. Acoust. Soc. Am. 32, 132–134. doi: 10.1121/1.1907864
Mills, A. W. (1972). "Auditory localization" in Foundations of Modern Auditory Theory, Vol. 2. ed. J. V. Tobias.
Møller, H. (1992). Fundamentals of binaural technology. Appl. Acoust. 36, 171–218. doi: 10.1016/0003-682X(92)90046-U
Møller, H., Sørensen, M. F., Jensen, C. B., and Hammershøi, D. (1996). Binaural technique: do we need individual recordings? J. Audio Eng. Soc. 44, 451–469.
Moore, B. C. J., and Glasberg, B. R. (1996). A revision of Zwicker's loudness model. Acta Acust. 82, 335–345.
Neuhoff, J. G. (2004). "Auditory motion and localization" in Ecological Psychoacoustics (Elsevier), 87–111.
Nilsson, M. E., and Schenkman, B. N. (2016). Blind people are more sensitive than sighted people to binaural sound-location cues, particularly inter-aural level differences. Hear. Res. 332, 223–232. doi: 10.1016/j.heares.2015.09.012
Noble, W. G. (1981). Earmuffs, exploratory head movements, and horizontal and vertical sound localization. J. Aud. Res. 21, 1–12.
Nordmark, J. O. (1976). Binaural time discrimination. J. Acoust. Soc. Am. 60, 870–880. doi: 10.1121/1.381167
Oberem, J., Richter, J. G., Setzer, D., Seibold, J., Koch, I., and Fels, J. (2020). Experiments on localization accuracy with non-individual and individual HRTFs comparing static and dynamic reproduction methods. bioRxiv.
Ocklenburg, S., Hirnstein, M., Hausmann, M., and Lewald, J. (2010). Auditory space perception in left- and right-handers. Brain Cogn. 72, 210–217. doi: 10.1016/j.bandc.2009.08.013
Odegaard, B., Wozny, D. R., and Shams, L. (2015). Biases in visual, auditory, and audiovisual perception of space. PLoS Comput. Biol. 11:e1004649. doi: 10.1371/journal.pcbi.1004649
Oldfield, S. R., and Parker, S. P. A. (1984). Acuity of sound localisation: a topography of auditory space. II. Pinna cues absent. Perception 13, 601–617. doi: 10.1068/p130601
Otte, R. J., Agterberg, M. J. H., Van Wanrooij, M. M., Snik, A. F. M., and Van Opstal, A. J. (2013). Age-related hearing loss and ear morphology affect vertical but not horizontal sound-localization performance. J. Assoc. Res. Otolaryngol. 14, 261–273. doi: 10.1007/s10162-012-0367-7
Parise, C. V., Spence, C., and Ernst, M. O. (2012). When correlation implies causation in multisensory integration. Curr. Biol. 22, 46–49. doi: 10.1016/j.cub.2011.11.039
Parseihian, G., Jouffrais, C., and Katz, B. F. G. (2014). Reaching nearby sources: comparison between real and virtual sound and visual targets. Front. Neurosci. 8:269. doi: 10.3389/fnins.2014.00269
Pastore, M. T., Natale, S. J., Yost, W. A., and Dorman, M. F. (2018). Head movements allow listeners bilaterally implanted with cochlear implants to resolve front-back confusions. Ear Hear. 39, 1224–1231. doi: 10.1097/AUD.0000000000000581
Pernaux, J.-M., Emerit, M., and Nicol, R. (2003). Perceptual evaluation of binaural sound synthesis: the problem of reporting localization judgments. AES 114th Conv.
Perrett, S., and Noble, W. (1997a). The contribution of head motion cues to localization of low-pass noise. Percept. Psychophys. 59, 1018–1026. doi: 10.3758/BF03205517
Perrett, S., and Noble, W. (1997b). The effect of head rotations on vertical plane sound localization. J. Acoust. Soc. Am. 102, 2325–2332. doi: 10.1121/1.419642
Perrott, D. R. (1984). Concurrent minimum audible angle: a re-examination of the concept of auditory spatial acuity. J. Acoust. Soc. Am. 75, 1201–1206. doi: 10.1121/1.390771
Perrott, D. R., and Saberi, K. (1990). Minimum audible angle thresholds for sources varying in both elevation and azimuth. J. Acoust. Soc. Am. 87, 1728–1731. doi: 10.1121/1.399421
Petersen, J. (1990). Estimation of loudness and apparent distance of pure tones in a free field. Acta Acust. 70:5.
Poirier-Quinot, D., and Katz, B. F. G. (2018). "Impact of HRTF individualization on player performance in a VR shooter game II" in Proceedings of the AES International Conference on Audio for Virtual and Augmented Reality.
Pollack, I., and Rose, M. (1967). Effect of head movement on the localization of sounds in the equatorial plane. Percept. Psychophys. 2, 591–596. doi: 10.3758/BF03210274
Populin, L. C. (2008). Human sound localization: measurements in untrained, head-unrestrained subjects using gaze as a pointer. Exp. Brain Res. 190, 11–30. doi: 10.1007/s00221-008-1445-2
Pralong, D., and Carlile, S. (1996). The role of individualized headphone calibration for the generation of high fidelity virtual auditory space. J. Acoust. Soc. Am. 100, 3785–3793. doi: 10.1121/1.417337
Rajendran, V. G., and Gamper, H. (2019). Spectral manipulation improves elevation perception with non-individualized head-related transfer functions. J. Acoust. Soc. Am. 145, EL222–EL228. doi: 10.1121/1.5093641
Rakerd, B., Vander Velde, T. J., and Hartmann, W. M. (1998). Sound localization in the median sagittal plane by listeners with presbyacusis. J. Am. Acad. Audiol. 9, 466–479.
Rayleigh, L. (1907). XII. On our perception of sound direction. Philos. Mag. 13, 214–232. doi: 10.1080/14786440709463595
Recanzone, G. H., Makhamra, S. D. D. R., and Guard, D. C. (1998). Comparison of relative and absolute sound localization ability in humans. J. Acoust. Soc. Am. 103, 1085–1097. doi: 10.1121/1.421222
Reed, D. K., and Maher, R. C. (2009). "An investigation of early reflection's effect on front-back localization in spatial audio" in Audio Engineering Society Convention 127.
Simpson, W. E., and Stanton, L. D. (1973). Head movement does not facilitate perception of the distance of a source of sound. Am. J. Psychol. 86, 151–159. doi: 10.2307/1421856
Spagnol, S. (2020). HRTF selection by anthropometric regression for improving horizontal localization accuracy. IEEE Signal Process. Lett. 27, 590–594. doi: 10.1109/LSP.2020.2983633
Speigle, J. M., and Loomis, J. M. (1993). Auditory distance perception by translating observers. In Proceedings of 1993 IEEE Research Properties in Virtual Reality Symposium (VRAIS 1993).
Spiousas, I., Etchemendy, P. E., Eguia, M. C., Calcagno, E. R., Abregú, E., and Vergara, R. O. (2017). Sound spectrum influences auditory distance perception of sound sources located in a room environment. Front. Psychol. 8:251475.
Stevens, S. S. (1955). The measurement of loudness. J. Acoust. Soc. Am. 27, 815–829. doi: 10.1121/1.1908048
Stevens, S. S. (1958). Problems and methods of psychophysics. Psychol. Bull. 55, 177–196. doi: 10.1037/h0044251
Stevens, S. S., and Guirao, M. (1962). Loudness, reciprocality, and partition scales. J. Acoust. Soc. Am. 34, 1466–1471. doi: 10.1121/1.1918370
Stevens, S. S., and Newman, E. B. (1936). The localization of actual sources of sound. Am. J. Psychol. 48, 297–306. doi: 10.2307/1415748
Stevens, S. S., and Volkmann, J. (1940). The relation of pitch to frequency: a revised scale. Am. J. Psychol. 53:329. doi: 10.2307/1417526
Stitt, P., Picinali, L., and Katz, B. F. G. (2019). Auditory accommodation to poorly matched non-individual spectral localization cues through active learning. Sci. Rep. 9:1063. doi: 10.1038/s41598-018-37873-0
Tabry, V., Zatorre, R. J., and Voss, P. (2013). The influence of vision on sound localization abilities in both the horizontal and vertical planes. Front. Psychol. 4:932. doi: 10.3389/fpsyg.2013.00932
Thavam, S., and Dietz, M. (2019). Smallest perceivable interaural time differences. J. Acoust. Soc. Am. 145, 458–468. doi: 10.1121/1.5087566
Thurlow, W. R., Mangels, J. W., and Runge, P. S. (1967). Head movements during sound localization. J. Acoust. Soc. Am. 42, 489–493. doi: 10.1121/1.1910605
Thurlow, W. R., and Mergener, J. R. (1970). Effect of stimulus duration on localization of direction noise stimuli. J. Speech Hear. Res. 13, 826–838. doi: 10.1044/jshr.1304.826
Audio Engineering Society.
Trapeau, R., and Schönwiesner, M. (2018). The encoding of sound source elevation in the
Reinhardt-Rutland, A. H. (1995). Increasing-and decreasing-loudness aftereffects: human auditory cortex. J. Neurosci. 38, 3252–3264. doi: 10.1523/JNEUROSCI.2530-17.2018
asymmetrical functions for absolute rate of sound level change in adapting stimulus. J.
Gen. Psychol. 122, 187–193. doi: 10.1080/00221309.1995.9921231 Van Wanrooij, M. M., and Van Opstal, A. J. (2004). Contribution of head shadow and
Pinna cues to chronic monaural sound localization. J. Neurosci. 24, 4163–4171. doi:
Rhodes, G. (1987). Auditory attention and the representation of spatial information. 10.1523/JNEUROSCI.0048-04.2004
Percept. Psychophys. 42, 1–14. doi: 10.3758/BF03211508
Viaud-Delmon, I., and Warusfel, O. (2014). From ear to body: the auditory-motor
Riesz, R. R. (1932). A relationship between loudness and the minimum perceptible
loop in spatial cognition. Front. Neurosci. 8:283. doi: 10.3389/fnins.2014.00283
increment of intensity. J. Acoust. Soc. Am. 4:6. doi: 10.1121/1.1901961
Risoud, M., Hanson, J. N., Gauvrit, F., Renard, C., Bonne, N. X., and Vincent, C. (2020). Vliegen, J., and Van Opstal, A. J. (2004). The influence of duration and level on human
Azimuthal sound source localization of various sound stimuli under different conditions. sound localization. J. Acoust. Soc. Am. 115, 1705–1713. doi: 10.1121/1.1687423
Eur. Ann. Otorhinolaryngol. Head Neck Dis. 137, 21–29. doi: 10.1016/j.anorl.2019.09.007 von Békésy, G. (1938). Über die Entstehung der Entfernungsempfindung beim Hören.
Risoud, M., Hanson, J. N., Gauvrit, F., Renard, C., Lemesre, P. E., Bonne, N. X., et al. Akust. Zeitschrift. 3, 21–31.
(2018). Sound source localization. Eur. Ann. Otorhinolaryngol. Head Neck Dis. 135,
Wallach, H. (1940). The role of head movements and vestibular and visual cues in
259–264. doi: 10.1016/j.anorl.2018.04.009
sound localization. J. Exp. Psychol. 27, 339–368. doi: 10.1037/h0054629
Robinson, D. W., and Dadson, R. S. (1956). Equal-loudness relations, and threshold
of hearing for pure tones. J. Acoust. Soc. Am. 28, 763–764. doi: 10.1121/1.1905030 Wang, Y. (2007). On the cognitive processes of human perception with emotions,
motivations, and attitudes. Int. J. Cogn. Informat. Nat. Intell. 1, 1–13. doi: 10.4018/
Röhl, M., and Uppenkamp, S. (2012). Neural coding of sound intensity and loudness jcini.2007100101
in the human auditory system. J. Assoc. Res. Otolaryngol. 13, 369–379. doi: 10.1007/
s10162-012-0315-6 Wang, J., Lu, X., Sang, J., Cai, J., and Zheng, C. (2022). Effects of stimulation position
and frequency band on auditory spatial perception with bilateral bone conduction.
Ronsse, L. M., and Wang, L. M. (2012). Effects of room size and reverberation, receiver Trends Hear. 26:233121652210971. doi: 10.1177/23312165221097196
location, and source rotation on acoustical metrics related to source localization. Acta
Acust. United Acust. 98, 768–775. doi: 10.3813/AAA.918558 Warren, R. M. (1958). A basis for judgments of sensory intensity. Am. J. Psychol. 71,
675–687.
Rummukainen, O. S., Schlecht, S. J., and Habets, E. A. (2018). Self-translation induced
minimum audible angle. J. Acoust. Soc. Am, 144, 340–345. Warren, R., Sersen, E., and Pores, E. (1958). A basis for loudness-judgments. Am. J.
Psychol. 71, 700–709.
Rychtáriková, M., van den Bogaert, T., Vermeir, G., and Wouters, J. (2011). Perceptual
validation of virtual room acoustics: sound localisation and speech understanding. Appl. Wenzel, E. M., Arruda, M., Kistler, D. J., and Wightman, F. L. (1993). Localization
Acoust. 72, 196–204. doi: 10.1016/j.apacoust.2010.11.012 using nonindividualized head-related transfer functions. J. Acoust. Soc. Am. 94, 111–123.
doi: 10.1121/1.407089
Sandel, T. T., Teas, D. C., Feddersen, W. E., and Jeffress, L. A. (1955). Localization of sound
from single and paired sources. J. Acoust. Soc. Am. 27, 842–852. doi: 10.1121/1.1908052 Werner, S., Klein, F., and Sporer, T. (2016). “Adjustment of the direct-to-reverberant-
Sayers, B. M. (1964). Acoustic-image lateralization judgments with binaural tones. J. energy-ratio to reach externalization within a binaural synthesis system” In Audio
Acoust. Soc. Am. 36, 923–926. doi: 10.1121/1.1919121 Engineering Society Conference: 2016 AES International Conference on Audio for Virtual
and Augmented Reality. Audio Engineering Society.
Schoeffler, M., Westphal, S., Adami, A., Bayerlein, H., and Herre, J. (2014).
Comparison of a 2D-and 3D-based graphical user Interface for localization listening Wickens, C. D. (1991). Processing resources in attention. Mult. Perform.
tests. In proc. of the EAA joint symposium on Auralization and Ambisonics. Wightman, F. L., and Kistler, D. J. (1989). Headphone simulation of free-field
Shaw, E. A. G. (1974). Transformation of sound pressure level from the free field to the listening. II: psychophysical validation. J. Acoust. Soc. Am. 85, 868–878. doi:
eardrum in the horizontal plane. J. Acoust. Soc. Am. 56, 1848–1861. doi: 10.1121/1.1903522 10.1121/1.397558
Sherlock, L. G. P., Perry, T. T., and Brungart, D. S. (2021). Evaluation of extended-wear Wightman, F. L., and Kistler, D. J. (1992). The dominant role of low-frequency
hearing aids as a solution for intermittently noise-exposed listeners with hearing loss. interaural time differences in sound localization. J. Acoust. Soc. Am. 91, 1648–1661. doi:
Ear Hear. 42, 1544–1559. doi: 10.1097/AUD.0000000000001044 10.1121/1.402445

Frontiers in Psychology 18 frontiersin.org


Carlini et al. 10.3389/fpsyg.2024.1408073

Wightman, F., and Kistler, D. (1997). “Factors affecting the relative salience of sound Zahorik, P., and Wightman, F. L. (2001). Loudness constancy with varying sound
localization cues” in Binaural and spatial hearing in real and virtual environments. eds. source distance. Nat. Neurosci. 4, 78–83. doi: 10.1038/82931
R. Gilkey and T. R. Anderson (Psychology Press), 1–23. Zhang, M., Kennedy, R. A., Abhayapala, T. D., and Zhang, W. (2011). Statistical method
Wightman, F. L., and Kistler, D. J. (1999). Resolution of front–back ambiguity in to identify key anthropometric parameters in hrtf individualization. In 2011 joint
spatial hearing by listener and source movement. J. Acoust. Soc. Am. 105, 2841–2853. workshop on hands-free speech communication and microphone arrays, HSCMA’11.
doi: 10.1121/1.426899 IEEE.
Withington, D. (1999). “Localisable Alarms” in Human factors in auditory warnings. Zhong, X., and Yost, W. A. (2017). How many images are in an auditory scene? J.
eds. N. A. Stanton and J. Edworthy (Routledge: Ashgate Publishing Ltd). Acoust. Soc. Am. 141:2882. doi: 10.1121/1.4981118
Yost, W. A. (1981). Lateral position of sinusoids presented with interaural intensive Zotkin, D. N., Duraiswami, R., and Davis, L. S. (2002). Creation of virtual auditory
and temporal differences. J. Acoust. Soc. Am. 70, 397–409. doi: 10.1121/1.386775 spaces. In ICASSP, IEEE international conference on acoustics, speech and signal
processing-proceedings. (Vol. 2, pp. II-2113). IEEE.
Yost, W. A. (2016). Sound source localization identification accuracy: level and
duration dependencies. J. Acoust. Soc. Am. 140, EL14–EL19. doi: 10.1121/1.4954870 Zotkin, D. N., Duraiswami, R., and Davis, L. S. (2004). Rendering localized spatial
audio in a virtual auditory space. IEEE Trans. Multimed. 6, 553–564. doi: 10.1109/
Yost, W. A. (2017). “History of sound source localization: 1850-1950” In Proceedings of TMM.2004.827516
Meetings on Acoustics (Vol. 30, No. 1). AIP (American Institute of Physics) Publishing.
Zotkin, D. Y. N., Hwang, J., Duraiswaini, R., and Davis, L. S. (2003). “HRTF
Yost, W. A., and Zhong, X. (2014). Sound source localization identification accuracy: personalization using anthropometric measurements” in IEEE workshop on applications
bandwidth dependencies. J. Acoust. Soc. Am. 136, 2737–2746. doi: 10.1121/1.4898045 of signal processing to audio and acoustics. (IEEE Cat. No. 03TH8684). (pp. 157-160).
Zahorik, P. (2002). Assessing auditory distance perception using virtual acoustics. J. IEEE.
Acoust. Soc. Am. 111, 1832–1846. doi: 10.1121/1.1458027 Zwicker, E. (1961). Subdivision of the audible frequency range into critical bands
(Frequenzgruppen). J. Acoust. Soc. Am. 33:248. doi: 10.1121/1.1908630
Zahorik, P., Brungart, D. S., and Bronkhorst, A. W. (2005). Auditory distance
perception in humans: a summary of past and present research. Acta Acust. United Zwislocki, J., and Feldman, R. S. (1956). Just noticeable differences in dichotic phase.
Acust. 91, 409–420. J. Acoust. Soc. Am. 28, 860–864. doi: 10.1121/1.1908495

Frontiers in Psychology 19 frontiersin.org

You might also like