
AUDIO PRODUCTION

AND CRITICAL
LISTENING
Technical Ear Training

JASON COREY

Translation: Andrés Pérez Vargas


INTRODUCTION

The practice of audio engineering is both an art and a science. To be
successful in audio production, an engineer should ideally possess both an
understanding of theoretical concepts and highly developed critical listening
skills related to sound recording and production. Each recording project has
its own set of requirements, and engineers cannot rely on a single set of
recording procedures for every project. As such, they must rely on a combination of
technical knowledge and listening skills to guide their work.
Although technical knowledge of analog electronics, digital signal
processing, audio signal analysis, and theoretical aspects of audio equipment
is essential to a solid understanding of audio engineering principles, many of
the decisions taken during a recording project, such as microphone choice
and placement, mix balance, fader levels, and signal processing, are based
solely on what you hear. As such, it is often the ability to navigate subjective
impressions of audio that allows engineers to successfully improve sound
quality.
Every action taken by an engineer in relation to an audio signal will
have some effect on the sound the listener hears, no matter how subtle, and
an engineer must have a finely tuned ear, attentive to the finer details of
the timbre and quality of the sound. Most of these subjective decisions
respond to the artistic goals of a project, and engineers must determine,
based on what they hear, whether a technical choice is contributing to or
detracting from these goals. Engineers need to know how the technical
parameters of audio hardware and software devices affect perceived sonic
attributes.
In addition to possessing technical and theoretical experience,
successful audio engineers possess the ability to differentiate the timbral,
dynamic, and technical details of sound. They can translate their auditory
impressions into appropriate technical judgments and alterations. Sometimes
referred to as “Golden Ears,” these highly experienced audio professionals
possess the extraordinary ability to focus their auditory attention, resulting in
efficient and precise control of audio signals. They are expert listeners,
individuals who possess highly developed critical listening skills and who can
identify fine details in sound and make consistent judgments about what they
hear (Stone, 1993). These experienced engineers identify deficiencies that
need to be resolved and features that need to be highlighted in an audio
signal.
Engineers can gradually develop and improve critical listening skills
over time as they work in the audio field, but there are systematic methods
that can shorten the time needed to make significant progress in ear training.
As René Quesnel reported in his doctoral thesis, sound recording students
who completed systematic technical ear training outperformed experienced
audio professionals in tasks such as identifying the frequency and gain
settings of parametric equalization (Quesnel, 2001). Typically, developing an
audio engineer's listening skills occurs on the job. Although it was once
common for entry-level engineers to work with and learn from more
experienced engineers in the context of hands-on experience, the audio
industry has undergone drastic changes and the apprentice model is gradually
disappearing from audio engineering practice. Despite this evolution in the
audio industry, critical listening skills remain as important as ever, especially
as we see a decline in audio quality in many consumer audio formats. This
book presents some ideas for developing critical listening skills and
potentially reducing the time it takes to develop them.
A number of questions arise as we begin to consider the critical
listening skills related to sound recording and production:

• What listening skills do experienced sound engineers, producers,
tonemeisters, and musicians possess that allow them to make
recordings, mix sound for films or equalize sound systems better than a
novice engineer?
• What can legendary engineers and producers, who have extraordinary
abilities to identify and manipulate sonic timbres, hear that the average
person cannot?
• How do audio professionals listen and consistently identify extremely
subtle characteristics or changes in an audio signal?
• How do expert listeners translate between their perceptions of sound
and the physical control parameters available to them?
• How can non-expert listeners acquire similar skills, allowing them to
identify the physical parameters of an audio signal necessary to achieve
a desired perceptual effect?
• What specific aspects of sound should novice audio engineers be aware
of?
A significant amount has been written on the technical and theoretical
aspects of sound, sound reproduction, and auditory perception, but this book
focuses on developing the critical listening skills necessary for the successful
practice of audio engineering.
To facilitate the training process, the software modules that
accompany the book allow the reader to practice listening to the effects of
different types of audio signal processing. The software practice modules
allow progression through various levels of difficulty and provide the practical
training necessary in developing technical listening skills.

Audio attributes
The primary goal of this book and its accompanying software is to explore
critical listening as it relates to typical types of audio signal processing. Unlike
musical aural skills or music theory, technical ear training focuses on the sonic
effects of the most common types of signal processing used in sound
recording and playback systems, such as equalization, dynamic processing
and the reverberation. Knowledge of the sonic effects of audio signal
processing, along with the ability to discriminate between small changes in
sound quality, allows engineers to make effective changes to the reproduced
sound as needed. Highly developed critical listening skills allow an engineer to
identify not only the effects of deliberate signal processing, but also
unintentional or unwanted artifacts, such as noise, buzz, hum, and distortion.
Once these undesirable sounds are identified, an engineer can work to
eliminate or reduce their presence.
The book is organized according to the common audio processing tools
available to the audio engineer. In this book, we will explore the following
main audio attributes and associated devices:
• Spectral balance: parametric equalization
• Spatial attributes: delay and reverb
• Dynamic range control: compression/limiting and expansion
• Sounds or sound qualities that can detract from recordings: distortion
and noise
• Edit points in audio excerpts: source-destination editing

Book objectives
There are three main objectives of this book and software:
1. Facilitate isomorphic mapping of technical parameters and perceived
sound qualities. Isomorphic mapping is a linking of technical and
engineering parameters with auditory perceptual attributes. Engineers
must be able to diagnose problematic sonic artifacts in a recording and
understand their causes. In audio, engineers are translating between
physical control parameters (e.g., frequency in hertz, sound level in
decibels) and the perception of an audio signal (e.g., timbre, loudness).
2. To increase awareness of subtle characteristics and attributes of sound,
and promote greater ability to differentiate between minor changes in
sound quality or signal processing.
3. To increase the speed with which sound characteristics can be
identified, translate between auditory perceptions and signal
processing control parameters, and decide which physical parameters
need to be changed in a given situation.
To achieve these goals, Chapters 2, 3, 4, and 5 focus on specific types of audio
processing and artifacts: equalization, reverb and delay, dynamics processing,
and distortion and noise, respectively.
Chapter 2 focuses on the spectral balance of an audio signal and how
filtering and parametric equalization influence it. Spectral balance is the
relative level of various frequency bands within the full audio band (20 to
20,000 Hz), and this chapter focuses specifically on parametric equalizers.
Spatial properties of the reproduced sound include source panning,
reverb, echo, and delay (with and without feedback). Chapter 3 examines
training methods for spatial attributes.
Dynamic processing is widely used in recorded music. Audio processing
effects such as compression, limiting, expansion, and gating provide means to
sculpt audio signals into unique, time-varying shapes. Dynamic range
compression can be one of the most difficult types of processing for a
beginning engineer to use. In many algorithms, the controllable parameters
are interrelated to some extent and affect how they are used and heard.
Chapter 4 discusses dynamic processing and offers practice exercises on the
auditory artifacts produced by these different effects.
Distortion can be applied intentionally to a recording or elements
within a recording as an effect, such as with electric guitars, but recording
engineers generally try to avoid unintentional distortion, such as overloading
an analog gain stage or analog-to-digital converter. Chapter 5 explores
additional types of distortion, such as bit rate reduction and perceptual
coding, as well as other types of sonic artifacts that detract from a sound
recording, namely extraneous noises, clicks, pops, buzz and hum.
Chapter 6 focuses on cutoff points for audio excerpts and introduces a
novel type of ear training practice based on the source-destination editing process.
The act of finding edit points can also sharpen the ability to differentiate
changes in edit points at the millisecond level. The accompanying software
module mimics the process of finding an edit point by comparing the end
point of one clip with the end point of a second clip of identical music.
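To make this kind of exercise concrete, the following Python sketch splices two takes of the same material with a short crossfade. It is illustrative only, and the function name, crossfade length, and synthetic "takes" are assumptions rather than part of the accompanying software, but it shows how moving an edit point by a few milliseconds changes which samples are joined.

```python
# Sketch: splicing two takes with a short crossfade at a chosen edit point.
# Moving the edit point by a few milliseconds changes which samples are joined,
# which is the kind of difference the editing exercises ask the ear to detect.
import numpy as np

def splice(take_a, take_b, edit_point_s, fs, crossfade_ms=10.0):
    """Join take_a (up to the edit point) to take_b (from the edit point on)."""
    n = int(edit_point_s * fs)
    xf = int(crossfade_ms / 1000 * fs)
    fade_out = np.linspace(1.0, 0.0, xf)
    fade_in = 1.0 - fade_out
    head = take_a[:n].copy()
    head[-xf:] = head[-xf:] * fade_out + take_b[n - xf:n] * fade_in
    return np.concatenate([head, take_b[n:]])

# Example with two synthetic "takes" of the same material
fs = 48000
rng = np.random.default_rng(4)
take_a = rng.standard_normal(5 * fs)
take_b = take_a + 0.01 * rng.standard_normal(5 * fs)   # a slightly different take
edited = splice(take_a, take_b, edit_point_s=2.5, fs=fs)
```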
Finally, Chapter 7 examines analysis techniques for recorded sound.
Although there are established traditions of theoretical music analysis, there
is no standardized method for analyzing recordings from a timbral, sound
quality, spatial imaging, aesthetic, or technical point of view. This chapter
presents some methods for analyzing musical recordings and presents some
examples of analyzing commercially available recordings.
There have been significant contributions to the field of technical ear
training appearing in conference and journal articles, including "Selection and
Training of Subjects for Hearing Testing on Sound Reproduction Equipment"
by Bech (1992); “Training versus Practice in Spatial Audio Attribute
Assessment Tasks” by Kassier, Brookes, and Rumsey (2007); "Timbre Solfege:
A Technical Listening Course for Sound Engineers" by Miskiewicz (1992); “A
Method for Training Listeners and Selecting Program Material for Listening
Tests” by Olive (1994); and "Timbral Ear Trainer: Interactive and Adaptive
Training of Auditory Skills for Timbre Assessment" (1996). This book builds on
previous research and presents methods for practicing and developing critical
listening skills in the context of audio production.
The author assumes that the reader has completed some
undergraduate-level study in sound recording theory and practice and has an
understanding of basic audio theory topics such as decibels, equalization,
dynamics, microphones, and miking techniques.

The accompanying software


Due to the somewhat abstract nature of simply reading about critical listening,
several software modules have been included with this book to help the
reader practice listening to the various types of signal processing described
here. The accompanying software practice modules are interactive, allowing
the user to adjust the parameters of each type of processing and receive
immediate auditory feedback, mimicking what happens in the recording and
mixing studio. Although some of the modules simply provide examples of
sound processing, others offer exercises that involve absolute matching and
identification of processing parameters by ear. The benefit of matching
exercises lies primarily in providing the opportunity to completely trust what
is heard without having to translate it into a verbal representation of a sound.
Using digital recordings for ear training practice has an advantage over
analog recordings or acoustic sounds in that digital recordings can be played
back multiple times in exactly the same way. Some specific sound recordings
are suggested in the book, but there are other sources for obtaining sound
samples useful for practicing with different types of processing. At the time of
writing, single instrument samples and mix stems can be downloaded from
many websites, such as the following:
http://bush-of-ghosts.com/remix/bush_of_ghosts.htm
www.freesound.org
www.realworldremixed.com/download.php
www.royerlabs.com
Additionally, software programs such as Apple's Logic and GarageBand
include libraries of single-instrument sounds that can serve as sound sources
in the software's practice modules.
This book does not focus on specific models of commercially available
audio processing software or hardware, but treats each type of processing as
typical of what can be found among professional audio devices and software.
The audio processing modules that are commercially available vary from
model to model, and the author feels that the training discussed in this book
and applied in the software modules serves as a solid starting point for ear
training and can be extrapolated to most commercially available models.
What this book does not attempt to do is provide recommendations for
signal processing setup or microphone techniques for different instruments
or recording setups. It is impossible to have a one-size-fits-all approach to
audio production, and the goal is to help the reader listen more critically and
in more detail to shape each individual recording.
All software modules are included on the accompanying CD-ROM, and
software updates will be posted periodically on the author's website:
www.personal.umich.edu/~coreyja.
https://sites.google.com/a/umich.edu/jason-corey/technical-ear-training?
authuser=0
https://webtet.net/apcl/#/
Chapter 1
LISTENING

We are exposed to sound throughout every moment of every day,
whether we pay attention to it or not. The sounds we hear give us an idea not
only of their sources, but also of the nature of our physical environment
around us such as objects, walls and structures. Whether we are in a highly
reverberant environment or an anechoic chamber, the quality of reflected
sound or the lack of reflections tells us about the physical properties of our
location. The environment around us becomes audible, even if it is not
creating sound itself, by the way it affects sound, through patterns of
reflection and absorption.
Just as a light source illuminates surrounding objects, sound sources
allow us to hear the general shape and size of our physical environment.
Because we are primarily oriented toward visual stimuli, it may take a
constant and dedicated effort to focus our awareness on the auditory
domain. As anyone who works in the field of audio engineering knows, the
effort it takes to focus our listening awareness is well worth the satisfaction of
acquiring critical listening skills. Although simple in concept, the practice of
focusing attention on what you hear in a structured and organized manner is
challenging to achieve consistently.
There are many situations outside of audio production where listening
skills can be developed. For example, when walking through a construction
site, impulsive sounds like hammering may be heard. Echoes, the result of
those initial impulses reflecting off the exteriors of nearby buildings, can also
be heard shortly thereafter. The timing, location, and amplitude of the echoes
provide us with information about nearby buildings, including approximate
distances to them.
When listening in a large concert hall, we notice that the sound
continues and slowly fades out after a source stops playing. The gradual
decay of sound in a large acoustic space is called reverberation. The
sound in a concert hall can be immersive because it seems to come from all
directions, and the sound produced on the stage combines with the
reverberant sound coming from all directions.
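As a rough illustration of what reverberation time means in numbers, the sketch below models a reverberant tail as exponentially decaying noise; the sample rate and the 2-second decay time are arbitrary assumptions chosen for the example, not values from the book.

```python
# Sketch: the relationship between reverberation time (RT60) and decay rate.
# RT60 is the time it takes the reverberant sound to decay by 60 dB after the
# source stops. A simple model is exponentially decaying noise.
import numpy as np

fs = 48000
rt60 = 2.0                                     # e.g., a 2-second concert hall
t = np.arange(int(rt60 * fs)) / fs

# Decay envelope chosen so the level drops 60 dB over rt60 seconds
envelope = 10 ** (-3 * t / rt60)               # -60 dB => factor of 10**-3
reverb_tail = envelope * np.random.default_rng(3).standard_normal(len(t))

level_at_end_db = 20 * np.log10(envelope[-1])
print(f"Level after {rt60} s: {level_at_end_db:.1f} dB")   # approximately -60 dB
```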
In a completely different location, such as a carpeted living room, a
musical instrument will sound noticeably different compared to the same
instrument played in a concert hall. Physical characteristics such as
dimensions and surface treatments of a living room determine that its
acoustic characteristics are markedly different from those of a concert hall;
the reverberation time will be significantly shorter in a living room. The
relatively close proximity of the walls will reflect the sound back to the
listener within milliseconds of the arrival of the direct sound and with almost
the same amplitude.
This small difference in the arrival time and nearly equal amplitude of
direct and reflected sound at a listener's ears creates a change in the
frequency content of the sound being heard, due to a filtering of the sound
known as comb filtering. Floor coverings can also influence spectral balance: a
carpeted floor will absorb some high frequencies and a wooden floor will
reflect high frequencies.
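The sketch below illustrates comb filtering numerically by mixing a broadband signal with a 1 ms delayed copy at nearly equal amplitude. The delay and amplitude values are illustrative assumptions, but the resulting notch spacing (500 Hz, 1500 Hz, 2500 Hz, and so on) follows directly from the delay.

```python
# Minimal sketch: comb filtering from a single early reflection.
# Mixing a signal with a delayed copy of itself at nearly equal amplitude
# creates notches spaced at regular intervals in the spectrum.
import numpy as np

fs = 48000                      # sample rate in Hz
delay_ms = 1.0                  # reflection arrives 1 ms after the direct sound
delay_samples = int(fs * delay_ms / 1000)

# White noise as a broadband test signal
rng = np.random.default_rng(0)
direct = rng.standard_normal(fs)

# Delayed reflection at nearly the same amplitude
reflected = np.zeros_like(direct)
reflected[delay_samples:] = 0.9 * direct[:-delay_samples]

combined = direct + reflected

# The magnitude spectrum of the combined signal shows notches at odd
# multiples of 1/(2 * delay): 500 Hz, 1500 Hz, 2500 Hz, ... for a 1 ms delay.
spectrum = np.abs(np.fft.rfft(combined))
freqs = np.fft.rfftfreq(len(combined), 1 / fs)
print("First predicted notch:", 1000 / (2 * delay_ms), "Hz")
```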
When observing the surrounding soundscape, the listener may want to
consider questions such as the following:
• What sounds are present at any given time?
• Aside from the more obvious sounds, are there any constant,
sustained sounds, such as air handling noise or the
hum of lights, that are usually ignored?
• Where is each sound located? Are the locations clear and distinct or
diffuse and ambiguous?
• How far away are the sound sources?
• How loud are they?
• What is the character of the acoustic space? Are there echoes? What is
the reverb decay time?

It can be informative to aurally analyze recorded music heard at
any time, whether in a store, club, restaurant, or elevator. It is useful to
think of additional questions in such situations:
• How is the timbre of sound affected by the system and environment
through which it is presented?
• Are all elements of sound clearly audible? If not, what elements are
difficult to hear and which are the most prominent?
• If the music sounds familiar, does the balance seem the same as what
you've heard in other listening situations?

Active listening is essential in audio engineering, and we can take
advantage of the moments when we are not specifically working on an
audio project to increase our awareness of the listening landscape and
practice our critical listening skills. Walking down the street, sitting in a
cafe, and attending a live music concert offer us opportunities to hone
our listening skills and thus improve our work with audio. For a more
detailed study of some of these ideas, see Blesser and Salter's 2006
book, Spaces Speak, Are You Listening?, where they expand listening
to acoustic spaces into a detailed exploration of aural
architecture.

Audio engineers are concerned with capturing, mixing, and
shaping sound. Whether recording acoustic sound, such as that of
acoustic musical instruments played in a live acoustic space, or creating
electronic sounds in a digital medium, one of the goals of an engineer is
to shape the sound so that it is most appropriate for playback through
speakers and headphones and best communicates the intentions of a
musical artist. An important aspect of sound recording that an engineer
seeks to control is the relative balance of instruments or sound sources,
either through manipulation of recorded audio signals or through
microphone and ensemble placement. The way sound sources are
mixed and balanced in a recording can have a tremendous effect on the
musical feel of a composition. Musical and spectral balance is critical to
the overall impact of a recording.
Throughout the process of shaping sound, no matter what
equipment is being used or what the end goal is, the engineer's main
focus is simply listening. Engineers need to constantly analyze what
they hear to evaluate a track or mix and help make decisions about
additional adjustments to balance and processing. Listening is an active
process, challenging the engineer to remain continually aware of any
subtle or not-so-subtle perceived characteristics, changes or defects in
an audio signal.
From the producer to the third assistant engineer, active
listening is a priority for everyone involved in any audio production
process. No matter your role, practice thinking about and listening for the
following elements in every recording project:
• Timbre. Is a particular microphone in the right place for a given
application? Does it need to be equalized? Is the overall tone of a mix
appropriate?
• Dynamics. Do sound levels vary too much or not enough? Can you hear
each sound source throughout the piece? Are there times when a
sound source is lost or covered by other sounds? Is there one sound
source that is dominating the others?
• Balance. Does the balance of musical instruments and other
sound sources make sense for the music? Or is there too much of one
component and not enough of another?
• Distortion/clipping. Is any signal level too high, causing
distortion?
• Extraneous noise. Is there a buzz or hum from a faulty cable, connection,
or ground problem?
• Space. Is the reverb/delay/echo appropriate?
• Panning. How does the left/right balance of the mix come out of the
speakers?

1.1 What is technical ear training?


Just as musical ear training or solfeggio is an integral part of musical
training, technical ear training is necessary for everyone who works in
audio, whether in a recording studio, in live sound reinforcement, or in the
field of audio hardware/software development. Technical ear training is a
type of perceptual learning focused on the timbral, dynamic and spatial
attributes of sound in relation to audio recording and production. In other
words, improved listening skills can be developed that allow an engineer
to analyze and rely on auditory perceptions in a more concrete and
consistent way. As Eleanor Gibson wrote, perceptual learning refers to “an
increase in the ability to extract information from the environment, as a
result of experience and practice with the stimulation that comes from it”
(Gibson, 1969). This is not a new idea, and through years of working with
audio, recording engineers typically develop strong critical listening skills.
By paying more attention to specific types of sounds and comparing
successively smaller differences between sounds, engineers can learn to
differentiate the characteristics of sounds. When two listeners, an expert
and a novice, with identical hearing abilities, receive identical audio
signals, it is likely that an expert listener will be able to identify specific
features of the audio that a novice listener will not recognize. Through
focused practice, a novice engineer can eventually learn to identify sounds
and sound qualities that were originally indistinguishable.
A subset of technical ear training includes "timbral" ear training
which focuses on the timbre of sound. One of the goals of following this
type of training is to become more skilled at distinguishing and analyzing a
variety of timbres.
Timbre is typically defined as the characteristic of sound other than pitch
or loudness, which allows the listener to distinguish two or more sounds.
Timbre is a multidimensional attribute of sound and depends on a series of
physical factors such as the following:
• Spectral content. All frequencies present in a sound.
• Spectral balance. The relative balance of individual frequencies or
frequency ranges.
• Amplitude envelope. Mainly the attack (or onset) and decay time of
the overall sound, but also those of the individual harmonics.
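As a rough, non-authoritative illustration of these three factors, the sketch below estimates spectral content, a coarse spectral balance, and an approximate attack time from a mono recording of a single note. The file name, band limits, and 90%-of-peak attack criterion are assumptions made only for the example.

```python
# Minimal sketch (not from the book): estimating the physical factors of timbre
# listed above from a mono recording of a single note.
import numpy as np
from scipy.io import wavfile
from scipy.signal import hilbert

fs, x = wavfile.read("trumpet_note.wav")      # hypothetical mono file
x = x.astype(np.float64)
x /= np.max(np.abs(x)) + 1e-12

# Spectral content: frequencies present in the sound
spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), 1 / fs)

# Spectral balance: relative energy in broad frequency ranges
bands = [(20, 250), (250, 2000), (2000, 20000)]
total = np.sum(spectrum ** 2) + 1e-12
for lo, hi in bands:
    sel = (freqs >= lo) & (freqs < hi)
    print(f"{lo}-{hi} Hz: {100 * np.sum(spectrum[sel] ** 2) / total:.1f}% of energy")

# Amplitude envelope: attack and decay of the overall sound
envelope = np.abs(hilbert(x))
attack_time = np.argmax(envelope > 0.9 * envelope.max()) / fs
print(f"Approximate attack time: {attack_time * 1000:.1f} ms")
```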
A person without specific training in audio or music can easily
distinguish between the sound of a trumpet and a violin even if they
both play the same pitch at the same volume: the two instruments
sound different. In the world of recorded sound, engineers often work
with much more subtle timbre differences that are not entirely obvious
to a casual listener. For example, an engineer may be comparing the
sound of two different microphone preamplifiers or two digital audio
sample rates. At this level of subtlety, a novice listener may not hear
any difference, but it is the responsibility of the experienced engineer
to be able to make decisions based on such subtle details.
Technical ear training focuses on the sonic characteristics and
artifacts that are produced by various types of signal processing
commonly used in audio engineering, such as the following:
• Equalization and filtering
• Reverb and delay
• Dynamic processing
• Stereo image characteristics
It also focuses on unwanted or unintended sonic features,
characteristics, and artifacts that may be produced by faulty equipment,
particular equipment connections, or parameter settings, such as
noise, buzz or hum, and unintentional nonlinear distortion.
Through concentrated and focused listening, an engineer should be
able to identify sonic characteristics that can positively or negatively impact a
final audio mix and know how subjective impressions of timbre relate to
physical control parameters. The ability to quickly focus on and make
decisions about subtle details in sound is the main goal of an engineer.
The process of sound recording has had a profound effect on the
development of music since the mid-20th century. Music has transformed
from an art form that could only be heard through a live performance to one
in which a recorded performance can be heard over and over again through a
storage medium and playback system. Sound recordings may simply
document a musical performance, or they may play a more active role in
applying specific signal processing and timbral sculpting to recorded sounds.
With a sound recording we are creating a virtual sound stage between our
speakers, in which instrumental and vocal sounds are located. Within this
virtual stage, recording engineers can place each instrument and sound.
With technical ear training, we focus not only on hearing specific sound
characteristics, but also on identifying specific sound characteristics and types
of processing that make a characteristic audible. It's one thing to be able to
know that there is a difference between an equalized and unequalized
recording, but quite another to be able to name the specific alteration in
terms of center frequency, Q, and gain. Just as experts in visual art and
graphic design can identify subtle color hues and shades by name, audio
professionals should be able to do the same in the auditory domain.
Sound engineers, hardware and software designers, and developers of
the latest perceptual encoders rely on critical auditory skills to help make
decisions about a variety of sound characteristics and sound processing.
Many characteristics can be measured objectively with test equipment and
test signals such as pink noise and sine tones. Unfortunately, these objective
measurements do not always provide a complete picture of what the
equipment will sound like to human ears using musical cues. Some
researchers such as Geddes and Lee (2003) have noted that high levels of
nonlinear distortion measured on a device may be less noticeable to listeners
than low levels of measured distortion, depending on the nature of the
distortion and the testing methods employed. The opposite may also be
true, as listeners may strongly perceive low levels of measured distortion.
This type of situation may be true for other audio specifications, such
as frequency response. Listeners may prefer a speaker that does not have a
flat frequency response to one that does because frequency response is only
an objective measure of the total sound produced by a speaker. In other
areas of audio product design, final fine-tuning of software algorithms and
hardware designs is often performed by expert listeners. Therefore, physical
measurements alone cannot be relied upon and it is often auditory
perceptions that determine the verdict on sound quality.
Professionals who work with recorded sound on a daily basis
understand the need to listen for subtle changes in sound. It is important to
know not only how these changes occurred, but also ways to use the tools
available to remedy any problematic features.
1.1.1 Isomorphic mapping
One of
the main objectives of this book is to facilitate the isomorphic mapping
of technical and engineering parameters to perceptual attributes; to
help link auditory perceptions with control of the physical properties of
audio signals.
With audio recording technology, engineers have control over
technical parameters that correspond to the physical attributes of an
audio signal, but it is often unclear to the beginner how to map a
perceived sensation onto the objective control parameters of sound. A
parametric equalizer, for example, typically allows us to control
frequency, gain, and Q. These physical attributes, as labeled on a
device, have no natural or obvious correlation to the perceptual
attributes of an audio signal, and yet they are used by engineers to
affect a listener's perception of a signal. How does an engineer know
what a 6 dB boost at 315 Hz with a Q of 2 sounds like? Without
extensive experience with equalizers, these numbers will have little
meaning in terms of how they affect the perceived timbre of a sound.
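To make that example concrete, the sketch below renders exactly such a band, a 6 dB boost at 315 Hz with a Q of 2, using the widely circulated "Audio EQ Cookbook" peaking-filter formulas (Bristow-Johnson). This is a generic biquad implementation, not the book's accompanying software, and the noise signal is only a stand-in for real program material.

```python
# Sketch: a peaking parametric EQ band (6 dB boost at 315 Hz, Q = 2),
# using the widely used Audio EQ Cookbook biquad formulas (R. Bristow-Johnson).
import numpy as np
from scipy.signal import lfilter

def peaking_eq(fs, f0, gain_db, q):
    """Return biquad coefficients (b, a) for a peaking EQ band."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = [1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A]
    a = [1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A]
    return np.array(b) / a[0], np.array(a) / a[0]

fs = 48000
b, a = peaking_eq(fs, f0=315, gain_db=6.0, q=2.0)

# Apply to any mono signal to compare the boosted and unprocessed versions by ear
x = np.random.default_rng(1).standard_normal(fs)   # stand-in for program material
y = lfilter(b, a, x)
```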
There is an isomorphism between the audio equipment normally
used to make a recording and the type of sound that an engineer hears
and wants to obtain. An engineer can form mental links between
particular characteristics of sound quality and specific types of signal
processing or equipment. For example, a novice audio engineer may
understand what the term compression ratio means in theory, but the
engineer may not know how to adjust that parameter on a compressor
to effectively alter the sound or not fully understand how the sound
changes when that parameter is adjusted. An important component
of teaching audio engineering is illustrating the mapping between
engineering concepts and their respective effect on the sound heard.
Teaching these concepts requires the use of audio examples as well as
specific training for each type of processing. Ear training is as important
as knowing the functionality of the equipment available. Letowski, in
his article “Development of Technical Listening Skills: Timbre Solfeggio”
(1985), originally coined the term timbre solfeggio to designate training
that has similarities to musical ear training but focuses on spectral
balance or timbre.
If an engineer uses words like bright or muddy to describe the
quality of a sound, it is not clear exactly what physical characteristics
are responsible for a particular subjective quality; it could be specific
frequencies, resonances, dynamic processing, artificial reverb, or some
combination of all of these and more. There is no label on an equalizer
that indicates how to affect these subjective parameters. Likewise,
subjective descriptions by their nature are not always consistent from
person to person or between situations. A “bright” sounding drum may
mean an excess of energy around 4 to 8 kHz in one situation or a
deficiency around 125 Hz in another. It is difficult to be precise with
subjective descriptions of sound, but ambiguity can be reduced if
everyone agrees on the exact meaning of the adjectives being used.
Continuing with the example, an equalizer requires that a specific
frequency be chosen to boost or cut, but a verbal adjective chosen to
describe a sound can only give a vague indication that the actual
frequency is in the low, mid, or high frequency range. It is essential to
develop an internal map of specific frequencies for the perceptual
attributes of a signal, and what a boost or cut at specific frequencies
sounds like. With practice, it is possible to learn to estimate the
frequency of an energy deficiency or excess in the power spectrum of
an audio signal and then adjust it by ear.
Through years of practice, professional audio engineers develop
methods for translating between their perceived auditory sensations
and the technical parameters they can control with the equipment
available to them. They also develop a finely tuned awareness of the
subtle details present in sound recordings. Although there may be no
common language among recording engineers to describe specific
auditory stimuli, engineers working at a very high level have devised
their own personal translation between the sound they hear and
imagine and the signal processing tools available. Comparison of
audiological examinations between professional and novice engineers
would probably not demonstrate superior listening skills in
professionals from a clinical and objective point of view. Something
else is happening: professionals are more advanced in their ability to
focus on sound.
Ideally, a recording engineer should have as much mastery of a
recording studio and its associated signal processing capabilities as a
professional musician has mastery of his instrument. A professional
violinist knows exactly when and where to place his fingers on the
strings and exactly what effect each bow movement will have on the
sound produced. There is an intimate knowledge and anticipation of a
sound even before it is produced. An audio engineer should have this
same level of sound processing and modeling knowledge and sensitivity
before looking at an effects processor parameter, fader position, or
microphone model. It is important to know what a 3 dB boost at 4 kHz
or an increase in compression ratio is going to sound like before it is
even applied to an audio signal. There will always be times when a
unique combination of signal processing and equipment options will
not be immediately evident, but it is very inefficient for an engineer to
continually guess what standard types of studio signal processing will
sound like. By knowing in advance what effect a particular parameter change
will have on the sound quality of a recorded signal, an engineer can
work more efficiently and effectively. Working at such a high level, an
engineer is able to respond to sound quality very quickly, similar to the
speed with which musicians respond to each other in an ensemble.
A recording studio can be thought of as a musical instrument that
is "played" by a recording engineer and producer. An engineer has a
direct involvement and influence on the artistic outcome of any musical
recording in which he or she is involved. By adjusting balances and
shaping spectra, an engineer focuses the sonic scene for listeners,
guiding them toward a musically satisfying experience that expresses
the musical artist's intentions.
1.1.2 Increased awareness
The second objective of technical ear training is to increase our
awareness of the subtle details of sound and develop our ability to
discern and identify minute changes in physical parameters. An
experienced recording engineer or producer can focus attention on
sonic details that may not be apparent to an inexperienced listener.
Often, the process of making a recording from start to finish is based on
hundreds, if not thousands, of decisions about technical aspects of
sound quality and timbre. Each decision contributes to a finished
project and influences other choices. These decisions cover a wide
range of options and levels of subtlety, but generally include:
• Microphone model, location, and orientation for each
instrument being recorded.
• Preamplifier model and gain settings for each microphone.
• Recording Level: Should be set high enough to reduce noise and
quantization error, and low enough to avoid overloading a gain
stage.
• EQ model and EQ parameter settings specific to each microphone
signal.
• Noise: which can take many forms, but is generally any sound
that is not intended to be part of a recording. Examples include
clicks/pops produced by analog or digital electronic devices, tape
hiss, quantization error, air handling noise (which may be in the
form of a low rumble and therefore not immediately apparent),
external and environmental sounds such as traffic and subway
noise, and 50 or 60 Hz hum.
• Timbral quality: mainly frequency content and spectral balance.
Every analog component, from the microphone to the input of
the recording device, as well as every stage of analog-to-digital
conversion and re-quantization, will have some effect on the
timbral quality of the audio.
• Dynamic Range and Processing: Sound, musical or otherwise, will
have a certain range from loud (fortissimo) to soft (pianissimo),
and this range can be altered by dynamic processing such as
compressors and expanders.
• Balance or mix levels of recorded microphone signals.
• Spatial Features: Includes reverb, echo, reflections, delays, as
well as panning and positioning of sound sources within the
stereo or surround image.
An engineer makes decisions about these and other technical
parameters that affect the perceived audio quality and timbre of an
audio signal.
It may be tempting to consider these subtle changes as
insignificant, but because they add up to form a coherent whole, the
cumulative effect makes each stage critical to a finished project.
Whether it is the quality of each component of a sound system or each
decision made at each stage of a recording project, the additive effect
is notable and substantial. Choices made early in a project that degrade
sound quality cannot be reversed later in a project. Audio problems
cannot be solved in the mix, and as such, engineers must listen
carefully to each and every signal path and processing decision that is
made. By listening at such a concentrated level, an engineer can
respond to sound quality and timbre quickly and in the moment,
listening for potential problems that may return to haunt a project at a
later stage. To use an analogy, painters use specific paint colors and
brush strokes in subtle ways that combine to produce powerful finished
images. Similarly, recording engineers must be able to listen to and
focus on specific sonic characteristics that, when taken as a whole,
combine, blend, and support each other to create more powerful and
meaningful final sound mixes.
1.1.3 Increased detection speed
Finally, the third goal is to increase the speed with which we can
identify and decide on the appropriate engineering parameters to
change. A recording and mixing session can take up a large amount of
time, within which hundreds of subtle and not-so-subtle adjustments
can be made. The faster an engineer can locate any sonic
characteristics that need to be changed, the more effectively a given
period of time can be used. The ability to make quick judgments about sound
quality is essential during recording and mixing sessions. For example,
during a recording session, valuable time can be consumed by
comparing and changing microphones.
It is anticipated that greater sensitivity in one critical listening
area (such as equalization) will facilitate greater awareness and
sensitivity in other areas (such as compression and reverb) as a result
of overall improvement in listening skills. Because a significant part of
audio engineering (recording, mixing, mastering) is an art in which
there are no right answers, this book does not provide advice on the
"best" EQ, compression, or reverb settings for different situations.
What may be the perfect EQ for an instrument in one situation may not
be suitable for another. However, what this book attempts to do is
guide the reader in developing listening skills that will then help
identify problem areas in sound quality. A novice engineer may not
realize when there is a problem with sound quality or may have some
idea that there is a problem, but may not be able to specifically identify
it or know how to solve it. Highly developed critical listening skills help
the engineer identify timbre characteristics and sound quality quickly
and efficiently.
Standard signal processing types include equalization
(parametric, graphic and filters), compression/limiting,
expansion/gating, reverb, delay, chorus, flanger and gain changes.
Within each of these signal processing categories, numerous brands
and models are available at various price ranges and quality levels. If
we consider compressors for a moment, we know that various
brands/models of compressors perform the same basic function: they
make loud sounds quieter. Most compressor models have common
functionality that gives them similar overall sonic characteristics, but
the exact way they perform gain reduction varies from model to model.
Differences in analog electronics or digital signal processing algorithms
between compressors create a variety of sonic results, and each make
and model will have a unique sound. Through listening experience,
engineers learn that there are variations in sound quality between
different brands and models, and will choose a given model because of
its specific sound quality.
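The static gain computation that most compressors share can be sketched as below. This is a simplified model, assuming a threshold, a ratio, a hard knee, and no attack/release smoothing; it is precisely the detection and time-constant behavior left out here that varies between models and contributes to each having its own sound.

```python
# Sketch: the static gain computation common to most compressors.
# Real units differ mainly in level detection and attack/release smoothing,
# which is a large part of why each make and model sounds different.
import numpy as np

def compressor_gain_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Above threshold, output level rises 1 dB for every `ratio` dB of input."""
    level_db = np.asarray(level_db, dtype=float)
    over = np.maximum(level_db - threshold_db, 0.0)
    gain_reduction = over - over / ratio          # dB of gain reduction applied
    return -gain_reduction

# Example: a -6 dBFS peak with a -20 dB threshold and 4:1 ratio
print(compressor_gain_db(-6.0))                   # about -10.5 dB of gain reduction
```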
It is common to find plug-in software versions of many analog
signal processing devices. Often the screen image of a plugin that
models an analog device will be almost identical to the faceplate of the
device. Sometimes, because the two devices look identical, it can be
tempting to think that they also sound identical. Unfortunately, they
don't always sound the same, but it's possible to fool yourself into
thinking that the sound is replicated as perfectly as the visual
representation of the device. Usually the best option is to listen and
determine by ear whether the two sound as similar as they look. There
is not always a direct translation between analog electronics and the
computer code that performs the equivalent digital signal processing,
and there are several ways to model analog circuits; therefore, we
have differences in sound quality.
Although each signal processing model has a unique sound, it is
possible to transfer knowledge from one model to another and be able
to use an unknown model effectively after a short period of listening.
Just as pianists must adapt to every new piano they encounter,
engineers must adapt to the subtle and not-so-subtle differences
between equipment that performs a given function.

1.2 Shaping sounds


Music recordings can not only be recognized by their melodies, harmonies,
and musical structure, but they can also be recognized by the timbres of
instruments created in the recording process. Sometimes the timbre is the
most identifying characteristic of a recording. In recorded music, an
engineer and producer shape the sounds that are captured to best fit a
musical composition. Timbre shaping has become incredibly important in
recorded music, and in his book The Producer as Composer: Shaping the
Sounds of Popular Music (2005), Moorefield describes how sound
recording and processing equipment contributes to the composition
process. Timbre has become such an important factor in recorded music
that it can be used to identify a song before the musical tonality or melody
has time to develop sufficiently. In their paper titled “Name That Tune:
Identifying Popular Recordings from Short Excerpts,” Schellenberg et al.
(1999) found that listeners could correctly identify pieces of music when
presented with excerpts that were only one-tenth of a second long.
Popular music radio stations have been known to challenge listeners by
playing a short (usually less than a second) snippet of a well-known
recording and inviting listeners to call in and identify the song title and
artist. These excerpts are too short to indicate the harmonic or melodic
progression of the music. Listeners rely on the timbre or "mix" of sound
characteristics to make a correct identification. Levitin, in This Is Your
Brain on Music (2006), also illustrates the importance of timbre in
recorded sound, reporting that “Paul Simon thinks in terms of timbre; It is
the first thing he hears in his music and in the music of others.”
One effect that the recording studio has had on music is that it has
helped musicians and composers create soundscapes that are impossible
to realize acoustically. Sounds and sound images that could not have been
produced acoustically are most evident in electroacoustic and electronic
music in which sounds originate from purely electronic or digital sources
rather than through a vibrating string, membrane, or the airflow of a
conventional musical instrument. However, recordings of purely acoustic
musical instruments can be significantly altered with standard processing
equipment and recording studio plugins. Electronic processing of the
spectral, spatial and dynamic properties of recorded sound alters the
original properties of a sound source, creating new sounds that may not
exist as purely acoustic events.
In the recording and mixing process, an engineer can manipulate
any number of parameters, depending on the complexity of a mix. Many
of the parameters that are adjusted during a mix are interrelated, so
altering one track also influences the perception of other tracks. The level
of each instrument can affect the entire feel or focus of a mix, and an
engineer and producer can spend countless hours adjusting levels, down
to quarter-decibel increments, to create the right balance. For example, a
slight increase in the level of an electric bass can have a significant impact
on the sound and musical feel of a kick drum or even an entire mix as a
whole. Each parameter change applied to an audio track whether level
(gain), compression, reverb or equalization, can have an effect on the
perception of other individual instruments and the music as a whole.
Because of this interrelationship between the components of a mix, an
engineer may wish to make small incremental changes and adjustments,
gradually building and sculpting a mix.
At this point, it is not yet possible to measure all perceived audio
qualities with currently available physical measurement tools. For
example, the development of perceptual coding schemes such as MPEG-1
Layer 3, more commonly known as MP3, has required the use of expert
listening panels to identify artifacts and sonic deficiencies produced by
data reduction processes. Because perceptual coding relies on
psychoacoustic models to remove components of a sound recording that
are considered inaudible, the only reliable test for this type of processing
is the human ear. Small panels of trained listeners are more effective than
large samples of the general population because they can provide
consistent judgments about sound and can focus on the more subtle
aspects of a sound recording.
Studies such as those by Quesnel (2001) and Olive (1994, 2001)
provide strong evidence that training people to listen for specific
attributes of the reproduced sound makes a significant difference in their
ability to consistently and reliably recognize the sound characteristics, and
also increases the speed with which they can correctly identify these
characteristics. Listeners who have completed systematic timbral ear
training can work with audio more productively and effectively.
1.3 Sound reproduction system configurations
Before taking a closer look at critical listening techniques and philosophies,
it is important to describe what some of the most common sound
reproduction systems are like. Recording engineers are primarily
concerned with the sound reproduced by speakers, but it is also beneficial
to analyze sources of acoustic sound, as we will see in Chapter 7.
1.3.1 Monaural: single-channel sound reproduction
A single channel of audio played through a speaker is typically called
monaural or mono (Fig. 1.1). Even if there is more than one speaker, it is
considered monaural if all speakers produce exactly the same audio signal.
Early sound recording, playback and transmission systems used only one
audio channel, and although this method is not as common as it once was,
we still encounter situations where it is used. Mono sound reproduction
creates some restrictions for a recording engineer, but it is often this type
of system that speaker manufacturers use for subjective evaluation and
testing of their products.

Figure 1.1 Monaural or single-channel listening.


1.3.2 Stereo: two-channel sound reproduction
An evolution of monaural systems, two-channel or stereo playback
allows sound engineers greater freedom in terms of sound source
placement, panning, and image width. Stereo is the
primary configuration for sound reproduction, whether using speakers or
headphones. Figure 1.2 shows the ideal listener and speaker locations for
two-channel stereo.

Figure 1.2 Ideal two-channel stereo listening placement.

1.3.3 Headphones
Listening to headphones with two-channel audio has advantages and
disadvantages over speakers. With modestly priced headphones (relative
to the price of speakers of equivalent quality), it is possible to achieve
high-quality sound reproduction. Good quality headphones can offer more
clarity and detail than speakers, in part because they are not subject to the
acoustic effects of listening rooms, such as early reflections and room
modes. Headphones are also portable and can be easily taken to other
locations where the speaker characteristics and room acoustics may be
unfamiliar to an engineer.
The main disadvantage of headphones is that they create in-head
localization for mono sound sources. That is, centrally panned mono
sounds are perceived to originate somewhere between the ears because
the sound is transmitted directly to the ears without first diffracting around or
reflecting off the head, torso, and outer ear. To avoid in-head localization,
audio signals should be filtered with what are known as head-related
transfer functions (HRTF). Simply put, HRTFs specify filtering due to the
presence of external ears (pinna), head and shoulders, as well as interaural
time differences and interaural amplitude differences for a given sound
source location. Each location in space (elevation and azimuth) has a
unique HRTF, and typically many locations in space are sampled when
measuring HRTF. It's also worth noting that each person has a unique
HRTF based on the unique shape of the outer ear, head, and upper torso.
HRTF processing has a number of drawbacks, such as a negative effect on
sound quality and spectral balance and the fact that there is no universal
HRTF that works perfectly for everyone.
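A minimal sketch of HRTF-based rendering is shown below: a mono source is convolved with a measured left/right head-related impulse response (HRIR) pair. The file names and the 30° azimuth are hypothetical placeholders; real HRIRs come from measurement databases, and each person's set differs as noted above.

```python
# Sketch: placing a mono source at a measured direction by convolving it
# with a left/right head-related impulse response (HRIR) pair.
# The HRIR and source files here are hypothetical placeholders for measured data.
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

fs, mono = wavfile.read("dry_voice.wav")             # mono source, hypothetical file
_, hrir_left = wavfile.read("hrir_az30_left.wav")    # HRIR for 30 degrees azimuth
_, hrir_right = wavfile.read("hrir_az30_right.wav")

left = fftconvolve(mono.astype(float), hrir_left.astype(float))
right = fftconvolve(mono.astype(float), hrir_right.astype(float))

binaural = np.stack([left, right], axis=-1)
binaural /= np.max(np.abs(binaural)) + 1e-12         # normalize to avoid clipping
wavfile.write("voice_binaural.wav", fs, (binaural * 32767).astype(np.int16))
```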
1.3.4 Headphone recommendations
At the time of this writing, there are several fine headphones on the
market that are perfectly suitable for technical ear training. Before
purchasing headphones, the reader is recommended to listen to as many
different models as possible. By comparing the sound of different
headphones using familiar music recordings, it is possible to get a better
idea of the strengths and weaknesses of each model. There are no perfect
headphones and each model will sound slightly different. Because not all
readers are close to retail stores that carry high-quality headphones, here
are some suggestions at different price points:
• Audio-Technica ATH-M50. This model is a closed design, meaning it
blocks a substantial amount of external or background sound.
• Beyerdynamic DT770 Pro. This model also has a closed-back design
with a comfortable circumaural fit.
• Grado. There are a number of models in the Grado line of
headphones and they are all supra-aural designs, meaning they rest
directly on the ear, rather than being circumaural, which surrounds
the ear. Additionally, they are all open headphones, meaning they
do not block outside sound and therefore may not be appropriate
for listening in environments where there is significant background
noise. The Grado headphones are excellent value for money,
especially for the lower-end models, even though they are not the
most comfortable headphones available.
• Sennheiser HD 600 and HD 650. Both models are open-back in design
and are at the higher end of the headphone price range. They also have
a circumaural design, making them comfortable to wear.
• Sony MDR 7506 and 7509. These Sony models have become an industry
standard for studio monitoring.
1.3.5 Surround: multi-channel sound playback
Sound reproduced from more than two speakers is known as
multichannel, surround, ambisonic, or more specific notations indicating
the number of channels, such as 5.1, 7.1, 3/2 channel, and quadraphonic.
Surround audio for music-only applications has had limited popularity and
is still not as popular as stereo playback. On the other hand, surround
soundtracks for film and television are common in movie theaters and are
becoming more common in home systems.
There are many suggestions and philosophies regarding the exact
number and arrangement of speakers for surround sound playback
systems, but the most widely accepted configuration among audio
researchers is that of the International Telecommunication Union (ITU),
which recommends a five-channel loudspeaker arrangement as shown in
Figure 1.3. Users of the ITU-recommended configuration typically also use
an optional subwoofer or low-frequency effects (LFE) channel known as
the .1 channel, which reproduces only the low frequencies, typically below 120
Hz.

Figure 1.3 Ideal five-channel surround listening placement according to the ITU-R BS.775-1 recommendations (ITU-R, 1994), with the
listeners equidistant from all five loudspeakers.

With multichannel sound systems, there is much more freedom for the
placement of the sound source within the 360° horizontal plane than with
stereo. There are also more possibilities for a convincing simulation of
immersion within a virtual acoustic space. Sending the right signals to the
right channels can create a realistic sense of spaciousness and
envelopment. As Bradley and Soulodre (1995) have shown, the listener
envelopment (LEV) in a concert hall, a component of spatial impression,
depends primarily on strong lateral reflections reaching the listener 80 ms
or more after the direct sound.
There are also some challenges regarding sound localization for
certain areas within a multichannel listening area. Panning sources to
the sides (between 30° and 110°) produces sound images that are
unstable and difficult to locate accurately. On the other hand, the presence
of a center channel allows sounds to be locked in the center of the front
sound image, no matter where the listener is located. When the sources
are shifted toward the center with only two speakers in front (left and
right), the perceived location of the image depends on the location of the
listener.

Summary
In this chapter we have explored active listening and its importance in
recording projects and in everyday life. In defining technical ear training,
we also identified some goals that we are working toward through the
book and the software practice modules. We finish by giving a brief
overview of the main sound reproduction systems. Next, we'll move on to
more specific ideas and exercises focused on EQ.
Chapter 2
SPECTRAL BALANCE AND EQUALIZATION

Spectral balance refers to the frequency content of an audio signal and the
relative power of each frequency or frequency band in the audible
frequency range, 20 to 20,000 Hz. An audio signal with a flat spectral
balance would represent all frequencies at the same relative amplitude.
Audio engineers often describe the spectral balance of sound using
equalization parameters, as the equalizer is the primary tool for altering
the spectral balance of sound. An engineer can boost or cut specific
frequencies or frequency ranges with an equalizer to highlight low-level
details or to compensate for unwanted resonances.
In the context of sound recording and production, a flat spectral
balance is more likely to mean that the entire frequency range in a
recording of a sound source is represented appropriately for a given
recording project. However, it is not always clear what we mean by
representing all frequencies "adequately." Does it mean that we want
recordings of musical instruments to sound identical to how they sound
acoustically? Is that possible or even desirable? In classical music
recording, engineers generally strive to achieve some similarity to live
performances, but in most other genres of music, engineers are creating
sound images that do not exist in a live performance situation. Sounds
and timbres are created and shaped in the recording studio and digital
audio workstation, making it possible to take the recorded sound in many
possible artistic directions.
Although the equalizer is the primary tool for directly altering the
spectral balance, almost all electronic devices that audio passes through
alter the spectral balance of an audio signal to a greater or lesser extent.
Sometimes this alteration of frequency content is necessary and
completely intentional, as with the use of equalizers and filters. Other
times, a change in spectral balance is much more subtle or almost
imperceptible, as is the case with different types of microphone
preamplifiers. Vintage audio equipment is often sought after because of
the unique and pleasing alterations to the spectral balance of an audio
signal. Changes in spectral balance are sometimes caused by distortion,
which results in harmonics being added to an audio signal. Audio
engineers must be able to hear how each piece of audio equipment is
altering the spectral content of its audio signals to shape the timbre of
each sound to make it more appropriate for a given situation. The ability
to distinguish subtle but critical aspects of sound quality comes from
experience listening to various types of audio processing and forming
mental links between what one hears and the parameters that can be
controlled in an audio signal. In essence, experienced audio professionals
are like human spectral analyzers due to their ability to identify and
characterize the frequency balance of reproduced sound.
Apart from the use of equalizers, spectral balance can also be
altered to some extent by dynamic processing, which changes the
amplitude envelope of a signal and therefore its frequency content, and by
mixing a signal with a delayed version of itself, which can produce comb
filtration. Although both methods influence spectral balance, we will focus
on signal processing devices whose primary function is to alter the
frequency content of a signal.
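As a brief illustration of the comb filtering mentioned above, summing a signal with a copy of itself delayed by τ seconds produces spectral notches at odd multiples of 1/(2τ). The following Python sketch, a hypothetical helper rather than part of any tool discussed in this book, lists those notch frequencies for a given delay.

```python
def comb_notch_frequencies(delay_ms, f_max=20000.0):
    """Notch frequencies (Hz) created when a signal is summed with a copy
    of itself delayed by delay_ms milliseconds."""
    tau = delay_ms / 1000.0          # delay in seconds
    notches = []
    k = 0
    while (2 * k + 1) / (2.0 * tau) <= f_max:
        notches.append((2 * k + 1) / (2.0 * tau))
        k += 1
    return notches

# A 1 ms delay puts notches at 500, 1500, 2500, 3500, 4500 Hz, and so on.
print(comb_notch_frequencies(1.0)[:5])
```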
An engineer looks for the equalization and spectral balance that best
suits the music being recorded. For example, the appropriate spectral
balance for a jazz drum recording will likely be different from that for a
rock drum recording, and an experienced recording engineer, by listening
to two such audio samples, understands and can identify specific
timbral differences between them.
To determine the EQ or spectral balance that best suits a given
recording situation, an engineer must have well-developed listening
skills regarding frequency content and its relationship to the physical
parameters of EQ: frequency, gain, and Q. Each recording situation
requires specific engineering choices, and there are rarely general
recommendations for equalization that are applicable in multiple
situations. When approaching a recording project, an engineer should
be familiar with existing recordings of a similar musical genre or have
some idea of the timbral goals of a project to inform the decision
process during production.
An engineer monitors the spectral balance of individual
microphone signals, as well as the overall spectral balance of multiple
microphone signals combined at each stage of a recording project. It is
possible to use a real-time spectral analyzer to get an idea of the
frequency content and balance of an audio signal. A novice engineer
may want to employ a real-time spectral analyzer to visualize the
frequency content of an audio signal and apply equalization based on
what he sees. Professional recording and mixing engineers do not
typically measure the power spectrum of a musical signal, but instead
rely on their auditory perception of the spectral balance over the
course of a piece of music.1 Unfortunately, real-time analyzers do not
provide a clear enough picture of a music recording's frequency
content to rely on to make decisions about how to apply equalization
to a music signal. Additionally, there is no clear indication of what the
spectral graph “should” look like because there is no objective
reference.
Musical signals generally exhibit constant fluctuations, whether
large or small, in frequency and amplitude of each harmonic and
overtone present. Due to the constantly changing nature of a typical
musical signal, it is difficult to obtain a clear reading of the amplitude of
the harmonics. Taking a snapshot of a spectral diagram from a specific
moment in time would be clearer visually, but does not provide a broad
enough view of the overall spectral shape of an audio signal over time.
The situation becomes a little more complicated because with any
objective spectral analysis there is a trade-off between time resolution
and frequency resolution. With increases in time resolution, frequency
resolution decreases while the frequency response display updates at
such a rapid rate that it is difficult to see details accurately while
playing an audio signal. Therefore, currently available physical

1 Live sound engineers, on the other hand, who are tuning a sound system for a live music performance, will
often use real-time spectral analyzers. The difference is that they have a reference, which is often pink noise
or a recording, and the analyzer compares the spectrum of the original audio signal (a known objective
reference) to the output of the speakers. The goal in this situation is a little different than it is for recording
and mixing because a live sound engineer is adjusting the frequency response of a sound system so that the
input reference and output spectral balances of the system are as similar as possible.
measurements are not appropriate for determining what equalization
to apply to a musical signal, and the auditory system must be relied
upon to make equalization decisions.

2.1 Shaping spectral balance


2.1.1 Equalization
In its most basic characterization, spectral balance can refer to the
relative balance of bass and treble, which can be controlled with basic
tone controls on a consumer sound system. Typically, during the
recording process of an acoustic musical instrument, an engineer can
have direct control over the spectral balance of the recorded sound,
whether a single audio track or a mix of tracks, through several
different methods. Aside from an equalizer, the most direct tool for
altering frequency balance, there are other methods available for
controlling the spectral balance of a recorded audio track, as well as
indirect factors that influence the perceived spectral balance. In this
section we discuss how engineers can directly alter the spectral balance
of recorded sound, as well as ways in which spectral balance can be
altered indirectly during sound playback.
The most obvious deliberate method of shaping the spectral
balance of an audio signal is achieved with an equalizer or filter, a
device specifically designed to change the amplitude of selected
frequencies. Equalizers can be used to reduce particular frequency
resonances in a sound recording, as they can mask other frequency
components of a recorded sound and prevent the listener from hearing
the truer sound of an instrument. In addition to helping eliminate
problematic frequency regions, equalizers can also be used to
accentuate or enhance certain frequency bands to highlight the
characteristics of an instrument or mix. There is a great deal of art in
using equalization, whether for a speaker system or a recording, and an
engineer must rely on what is heard to make decisions about its
application. Precise choice of frequency, gain and Q is critical to the
successful use of EQ, and the ear is the final judge of the
appropriateness of an EQ setting.
2.1.2 Microphone Choice and Location
Another method of altering the spectral balance of an audio signal is
through a microphone. The choice of microphone type and model has a
significant effect on the spectral balance of any sound being recorded,
as each make and model of microphone has a unique frequency
response due to internal electronics and physical construction.
Microphones are analogous to the filters or lenses on a camera;
microphones affect not only the overall frequency content, but also the
perspective and clarity of the sound that is "picked up." Some
microphone models offer a very close to flat frequency response, while
others are chosen because they are decidedly not flat in their
frequency response. Engineers often choose microphones because of
their unique frequency responses and how the frequency response
relates to the sound source being recorded.
During the beginning of a recording session, a recording engineer
and a producer compare the sounds of the microphones to decide
which ones to use for a recording. By listening to different microphones
while musicians perform, they can decide which microphones have the
most appropriate sonic characteristics for a given situation. The choice
would take into account the characteristics of a musician's instrument
or voice, the space in which they are recording, and any combinations
that may need to occur with other instruments/voices that are also
being picked up by the microphone.
In addition to the frequency response of a microphone, its
physical orientation and location relative to a sound source also directly
affect the spectral balance of the audio signal, as other factors come
into play, such as the polar response of the microphone, the radiation
patterns from a sound source, and the relationship between direct
sound and reverberant sound at a given location within an acoustic
space. The placement of a microphone in relation to a musical
instrument can have a direct and clear effect on the spectral balance of
the captured sound. The sound radiated by a musical instrument does
not have the same spectral balance in all directions. For example, the
sound emanating directly in front of a trumpet bell will contain a much
higher level of high frequency harmonics than the sound from the side
of the trumpet. An engineer can affect the frequency response of a
recorded trumpet sound simply by changing the location of a
microphone relative to the instrument. In this example, having the
player point the bell of the trumpet slightly above or below a
microphone will result in a slightly darker sound than when the
trumpet is pointed directly at a microphone.
Beyond the complex sound radiation characteristics of musical
instruments, microphones themselves generally do not have the same
frequency response for all angles of sound incidence. Even
omnidirectional microphones, which are generally considered to have
the best off-axis response, have some variation in their frequency
response at various angles of sound incidence. Simply changing the
angle of orientation of a microphone can alter the spectral balance of a
sound source being recorded.
Directional microphones, such as cardioid and bidirectional polar
patterns, produce a higher level of low frequencies when placed close
to a sound source, in a phenomenon known as proximity effect or bass
boost. The response of a microphone varies in the low frequency range
according to its distance from a sound source, within a range of
approximately 1 m. It is important to be aware of changes in low
frequency response as a result of changes in the distance between a
musician and a microphone. This effect can be used to advantage to
achieve prominent low frequencies when playing a kick drum up close,
for example.
2.1.3 Indirect factors affecting spectral balance
When working on setting the spectral balance of a track or mix, there
are a few factors that will have an indirect influence on this process.
Because there is no direct connection between the brain's auditory
processing center and digital audio data or analog magnetic tape,
engineers must keep in mind that audio signals are altered in the
transmission path between a recorder and brain. Three main factors
influence our perception of the spectral balance of an audio signal in
our studio control room:
• Monitors/speakers
• Room acoustics
• Sound levels
Figure 2.1 illustrates the path of an audio signal from
electrical to acoustic energy, highlighting three of the main
modifiers of spectral balance.

Figure 2.1 The signal path showing the transmission of an audio signal as an electrical signal to a speaker where it is
converted into an acoustic signal, modified by a listening room and finally received by the ear and processed by the
auditory system. Each stage highlights the factors that influence the spectral balance of a signal, both physical and
perceptual, along the way.

2.1.3.1 Monitors and speakers


Monitors and speakers are like windows through which engineers perceive
and therefore make decisions about recorded audio signals. Although
monitors have no direct effect on the spectral balance of the signals sent to a
recorder, each type and model of monitor and speaker offers a unique
frequency response. Because engineers rely on monitors to judge the spectral
balance of audio signals, the frequency and power response of monitors can
indirectly alter the spectral balance of audio signals. When listening to a
recording through monitors that have a weak low frequency response, an
engineer may have a tendency to boost low frequencies in the recorded
audio signal. It is common for engineers to check a mix on three or more
different sets of monitors and headphones to form a more accurate
conception of what the true spectral balance of the audio signal is. Each
speaker model will give a slightly different impression, and by listening to a
variety of monitors, engineers can find the best compromise. Beyond a
speaker's inherent frequency response, almost all active speakers include
built-in user-adjustable filters, such as high- and low-frequency shelving
filters, which can compensate for things like low-frequency buildup when
monitors are placed close to a wall. Therefore, any decision made about
spectral balance will be influenced by the cumulative effect of a speaker's
inherent frequency response added to any filtering applied by the user.
Real-time analyzers can provide some indication of a speaker's
frequency response within a room, and equalizers can be used to adjust a
response until it is nearly flat. An important point to note is that unless the
frequency response is measured in an anechoic chamber, the response
presented is not purely that of the loudspeaker, but will also include room
resonances and reflections. Any type of objective frequency response
measurement performed in a listening room or studio must be averaged over
different locations in the listening area. As we will see in the next section,
frequency resonances in a room are prominent in some places and less so in
others. By measuring the frequency response of different locations, we
average the effect of location-dependent resonances.
2.1.3.2 Control room and listening room acoustics
The dimensions, volume, and surface treatments of the room in which an
engineer monitors audio signals also have a direct effect on the audio heard.
Groups such as the International Telecommunication Union (ITU) have
published recommendations on the acoustics and characteristics of listening
rooms. Recommendation ITU-R BS.1116 (ITU-R, 1997) defines a series of
physical and acoustic parameters that can be applied to a listening room to
create an acoustically neutral room. At first, it may seem that an anechoic
room free of room modes and reflections would be ideal for listening because
the room will be essentially "invisible" acoustically, but a room free of
reflections does not give us a realistic environment that reflects the types of
rooms in which we usually listen to music. Sound originating from speakers
propagates into a room, reflecting off objects and walls, and combining with
sound that propagates directly to the listener. Sound primarily radiates from
the front of a speaker, especially for high frequencies, but most speakers
become more omnidirectional as the frequency decreases. Primarily low-
frequency sound radiating from the back and sides of a speaker will be
reflected into the listening position by any walls that may be behind the
speaker. Regardless of the environment in which we are listening to the
reproduced sound, we hear not only the speakers but also the room. In
essence, the speakers and the listening environment act as a filter, altering
the sound we hear.
Room modes depend on the dimensions of a room and influence the
spectral balance of what is heard from speakers in a room. Room modes are
mostly problematic in the low frequency range, usually below 300 Hz. The
fundamental resonance frequencies that occur in one dimension (axial
modes) have wavelengths that are twice the distance between parallel walls.
Open or sloping walls do not reduce room modes; instead, resonance
frequencies are based on the average distance between opposing walls.
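As a rough numerical illustration, the axial-mode frequencies between one pair of parallel walls can be estimated as f_n = n · c / (2L), where c is the speed of sound and L is the wall-to-wall distance. The sketch below assumes a hypothetical 5 m room dimension purely for illustration.

```python
def axial_mode_frequencies(length_m, count=5, c=343.0):
    """First few axial-mode frequencies (Hz) for one pair of parallel walls
    separated by length_m meters, with c the speed of sound in m/s."""
    return [n * c / (2.0 * length_m) for n in range(1, count + 1)]

# For a hypothetical 5 m dimension: about 34.3, 68.6, 102.9, 137.2, 171.5 Hz
print([round(f, 1) for f in axial_mode_frequencies(5.0)])
```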
Because the amplitudes of room resonances vary by location, it is
important for an engineer to walk and listen at different locations within a
room. A room's listening position may have a standing wave node at a
particular frequency. Unaware of this low-frequency acoustic effect, a mixing
engineer may boost the missing frequency with an EQ, only to realize when
listening in a different location in the room that the frequency boost is too
large.
If a mixing studio is attached to an adjacent room that is available,
engineers like to stroll into the second room, leaving the adjacent door open,
and listen to a mix, now essentially filtered through two rooms. By listening to
the balance of a mix from this new location, an engineer can learn which
components of the balance change from this new perspective, which sounds
remain prominent and which are lost. It may be helpful to focus on how well
the vocals or lead instrument can be heard from a distant listening location.
Another common and useful way to work is to listen to a mix on a
second and possibly third pair of speakers and headphones, because each
pair of speakers will tell us something different about the sound quality and
balance of the mix. One set of speakers may suggest that the reverb is too loud,
while another may suggest that there isn't enough bass. Among the available
monitoring systems, a compromise can be found that one hopes will allow
the final mix to sound relatively optimal in many other systems as well.
Engineers often say that a mix "translates" well to describe how consistent a
mix remains when heard on various types and sizes of speakers. There can be
huge differences highlighted in a mix auditioned on different systems,
depending on how the mix was done. One characteristic of a well-made
recording is that it will translate well to a wide range of sound reproduction
systems, from mini systems to full-scale speaker systems.
2.1.3.3 Sound levels and spectral balance
The sound level of a sound reproduction system plays an important role in
the perception of spectral balance. The well-known equal-loudness contours
of Fletcher and Munson (1933) illustrate that not only does the human
auditory system have a wide variation in its frequency response, but also that
this response changes depending on the level of sound reproduction. In
general, the ear is less sensitive to low and high frequencies, but as the sound
level increases, the ear becomes more sensitive to these same frequencies,
relative to mid frequencies. If you mix at a high sound level, such as an
average sound pressure level of 100 dB, and then suddenly the level is
reduced much further, to 55 dB SPL, for example, the perceived spectral
balance will change. There will be a tendency to think that there are not
enough low frequencies in the mix. It is useful to listen to a mix at various
playback levels and find the best compromise in overall spectral balance,
taking into account the frequency response differences of the human
auditory system at different playback levels.
2.2 Types of filters and equalizers
Now that we've discussed ways to change spectral balance directly, as well as
the factors that are responsible for altering our perception of the reproduced
sound, it's time to focus more specifically on equalizers. There are different
types of equalizers and filters, such as high-pass filters, low-pass filters, band-
pass filters, graphic equalizers, and parametric equalizers, which allow various
levels of control over spectral balance. Filters are those devices that eliminate
a range or band of frequencies, above or below a defined cutoff frequency.
Equalizers, on the other hand, offer the ability to apply various levels of boost
or cut on selected frequencies.
2.2.1 Filters: low pass and high pass
High-pass and low-pass filters remove frequencies above or below a defined
cutoff frequency. Typically, the only adjustable parameter is the cutoff
frequency, although some models offer the ability to control the slope of a
filter, or how quickly the output drops beyond the cutoff frequency. Figures
2.2 and 2.3 show frequency response curves for low-pass and high-pass
filters, respectively. In practice, high-pass filters are generally used more
frequently than low-pass filters. High-pass filters can remove low-frequency
noise from a signal, provided that the cutoff frequency is set below the
lowest frequency produced by the musical instrument being recorded.
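As one concrete way to experiment with the behavior described here, the sketch below uses the SciPy signal-processing library (an assumption of this example, not a tool referenced in this book) to build a fourth-order high-pass filter with a hypothetical 40 Hz cutoff, the kind of setting that might be used to remove low-frequency rumble while staying below the lowest note of most instruments.

```python
import numpy as np
from scipy import signal

fs = 48000                          # sample rate in Hz (hypothetical)
cutoff = 40.0                       # cutoff frequency in Hz (hypothetical)

# Fourth-order Butterworth high-pass filter in second-order sections
sos = signal.butter(4, cutoff, btype="highpass", fs=fs, output="sos")

# Apply it to a one-channel buffer of samples (white noise as a stand-in)
x = np.random.default_rng(0).standard_normal(fs)
filtered = signal.sosfilt(sos, x)
```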

2.2.2 Graphic equalizers


Graphic equalizers allow you to control only the amount of boost or cut for a
given set of frequencies, usually with vertical sliders on the front panel of the
device. The frequencies available for manipulation are typically based on the
International Organization for Standardization (ISO) center frequencies, such
as the octave frequencies 31.5 Hz, 63 Hz, 125 Hz, 250 Hz, 500 Hz, 1000 Hz,
2000 Hz, 4000 Hz, 8000 Hz, and 16,000 Hz. It is also possible for a graphic
equalizer to have a greater number of bands with greater frequency
resolution, such as 1/3 octave or 1/12 octave frequencies. The bandwidth or
Q of each boost or cut is often predetermined by the equalizer designer and
generally cannot be changed by the user. The graphic equalizer gets its name
from the fact that the vertical sliders form the shape of the equalization curve
from the low frequencies on the left to the high frequencies on the right.
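The ISO center frequencies follow a base-2 spacing around 1 kHz, so they are straightforward to generate; the familiar nominal values (31.5 Hz, 63 Hz, 125 Hz, and so on) are standardized roundings of the exact values computed below. A minimal sketch:

```python
# Exact base-2 third-octave center frequencies from roughly 31 Hz to 16 kHz.
# The nominal ISO labels (31.5, 40, 50, 63 ... Hz) are rounded versions of these.
third_octave_centers = [1000.0 * 2 ** (n / 3.0) for n in range(-15, 13)]

# Every third value is an octave center (31.25, 62.5, 125, ... 16000 Hz exactly).
octave_centers = third_octave_centers[::3]
print([round(f, 1) for f in octave_centers])
```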
2.2.3 Parametric equalizers
A term originally coined by George Massenburg in his 1972 Audio Engineering
Society convention paper, the parametric equalizer allows completely
independent and tunable control of three parameters per band: center
frequency, Q, and amount of boost or cut in that frequency. The Q is inversely
proportional to the bandwidth of the pulse or cut and is specifically defined as
follows:
Q = Fc / bandwidth
where Fc is the center frequency and the bandwidth is defined as f2 − f1. The two
frequencies, f1 and f2, are the points at which the frequency response is 3 dB
below the maximum boost or 3 dB above the maximum cut.
Figures 2.4 and 2.5 illustrate the frequency responses of two different
parametric equalizer settings.
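To make the relationship between Q, bandwidth, and an actual filter more tangible, the sketch below first computes Q from a center frequency and its band-edge frequencies, then builds one common digital realization of a peaking (parametric) band, the biquad described in the widely circulated Audio EQ Cookbook. The specific frequencies, gain, and sample rate are hypothetical, and this is only one of many possible equalizer designs.

```python
import numpy as np
from scipy import signal

def q_from_band_edges(fc, f1, f2):
    """Q = center frequency divided by bandwidth (f2 - f1)."""
    return fc / (f2 - f1)

def peaking_biquad(fc, gain_db, q, fs):
    """Peaking-EQ biquad coefficients in the Audio EQ Cookbook form."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * fc / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
    return b / a[0], a / a[0]

# A band centered at 1 kHz with edges near 707 Hz and 1414 Hz has Q of about 1.4
print(round(q_from_band_edges(1000.0, 707.0, 1414.0), 2))

# A 12 dB boost at 1 kHz with Q = 2, checked at the center frequency
b, a = peaking_biquad(fc=1000.0, gain_db=12.0, q=2.0, fs=48000)
w, h = signal.freqz(b, a, worN=8192, fs=48000)
idx = np.argmin(np.abs(w - 1000.0))
print(round(20 * np.log10(abs(h[idx])), 1), "dB at 1 kHz")   # close to 12 dB
```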
In practice, we find that many equalizers are limited in the amount of
control they provide. For example, instead of Q being completely variable, it
can be switched between three discrete points, such as low, medium, and
high. Center frequency selection may also not be
completely variable and may instead be restricted to a predetermined set of
frequencies. Additionally, some equalizers do not allow independent control
of Q and are designed in such a way that Q changes according to the amount
of gain, with a minimum boost/cut giving the lowest Q (widest bandwidth) and
a maximum boost/cut giving the highest Q (narrowest bandwidth).
Figure 2.4 The frequency response of a parametric equalizer with a boost of 12 dB at 4000 Hz and a Q of 2.
Figure 2.5 The frequency response of a parametric equalizer with a cut of 6 dB at 1000 Hz.

2.2.4 Shelving equalizers


Sometimes confused with low-pass and high-pass filters, shelving equalizers
can be used to alter a range of frequencies by the same amount. While high-
and low-pass filters can only remove a range of frequencies, shelving
equalizers can boost or attenuate a range of frequencies to varying degrees.
This frequency range extends downward from the cutoff frequency for a low
shelving filter, or extends upward from the cutoff frequency for a high
shelving filter. Shelving filters are probably most often used as tone controls
in home or car sound systems. Consumers can alter the spectral balance of
their home sound reproduction systems by using "bass" and "treble" tone
controls, which are typically shelving filters with a fixed cutoff frequency.
High shelving filters apply a set amount of boost or cut equally to all
frequencies above the cutoff frequency, while low shelving filters apply a set
amount of boost or cut equally to all frequencies below the cutoff frequency.
In the recording studio, shelving filters are often found as a switchable
option on the lowest and highest frequency bands of a parametric equalizer.
Some equalizer models also offer high-pass and low-pass filters in addition
to shelving filters.
Examples of the frequency response of shelving filters are shown in
Figures 2.6 and 2.7.
Figure 2.7 The frequency response of a high shelving filter set to −6 dB.
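As a minimal sketch of shelving behavior (not the design used in any particular console or plug-in), a first-order low shelf can be derived from the analog prototype H(s) = (s + G·ω0) / (s + ω0), which applies a gain G below the corner frequency ω0 and leaves higher frequencies essentially unchanged. The corner frequency and shelf gain below are hypothetical values chosen for illustration.

```python
import numpy as np
from scipy import signal

fs = 48000                       # sample rate in Hz (hypothetical)
fc = 200.0                       # shelf corner frequency in Hz (hypothetical)
gain_db = -6.0                   # cut applied to the low-frequency shelf
G = 10.0 ** (gain_db / 20.0)
w0 = 2.0 * np.pi * fc

# Analog prototype H(s) = (s + G*w0) / (s + w0), discretized with the bilinear transform
b, a = signal.bilinear([1.0, G * w0], [1.0, w0], fs=fs)

w, h = signal.freqz(b, a, worN=4096, fs=fs)
for f in (20.0, 5000.0):
    idx = np.argmin(np.abs(w - f))
    print(f"{f:>6.0f} Hz: {20 * np.log10(abs(h[idx])):+.1f} dB")
# Expected: about -6 dB at 20 Hz, about 0 dB at 5 kHz
```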

2.3 Introduction to practice


It is essential for audio professionals to have a keen sense of spectral balance
and how it relates to individual instruments as well as overall mixes.
Engineers make decisions about the balance of musical elements within an
audio recording, and the spectral balance of each individual element within
the mix contributes to its ability to blend and "stick" with other elements to
form a clear, sound image. consistent. To help develop critical listening skills,
a software module is included for the reader to practice listening to the sonic
effect of various equalization parameters.

Use of the “TETPracticeEQ” technical ear training software practice


module is essential to advance EQ recognition accuracy and speed. An image
of the user interface is shown in Figure 2.8 and the functionality of the
software is described below.
Figure 2.8 A screenshot of the software user interface for the Technical Ear Trainer practice module for parametric equalization.

The key to practicing with any of the software modules is to keep short
but regular practice times daily or several times a week. In the early stages,
10 to 15 minute practice sessions are probably best to avoid becoming overly
fatigued. Because of the amount of energy required to listen with high
concentration, practicing for longer periods of time (a couple of hours or
more) usually becomes counterproductive and frustrating. Over time, as you
become accustomed to this type of focused listening, you may want to
increase the length of the practice period, but typically 45 to 60 minutes will
be the upper useful limit for a given practice period. Regular practice for
shorter periods of time several times a week is much more productive than
longer but less frequent practice sessions. Obviously, this could become a
significant time commitment, but taking even 5 minutes a day is probably
more effective than trying to cram in a 2-hour practice session once a month.
The software produced for the exercises in this book allows the reader
to practice with randomly generated equalization settings within certain
limitations chosen by the reader. A screenshot in Figure 2.8 shows the
software module for parametric equalization. The objective of the practice
module is to identify by ear the configuration of the equalization parameters
chosen by the software. The following sections describe the main functions of
the software and the available user parameters.
2.3.1 Types of practice
Starting in the top left corner of the window, just below the blue header,
there is an option to select one of four practice types: Matching, Matching
Memory, Return to Flat, and Absolute Identification:
• Matching. When working in Matching mode, the goal is to duplicate
the equalization that the software has applied. This mode allows you to
freely switch between "Question" and "Your Answer" to determine if
the chosen equalization matches the unknown equalization applied by
the computer.
• Matching Memory. This mode is similar to Matching mode with one
main difference: once the gain or frequency is changed, the “Question”
is no longer available for audition. “Question” and “Bypass” are
available to be listened to freely before making any changes to the
equalizer. Matching Memory mode helps us match sounds by memory
and can be considered moderate to very difficult depending on the
other practice parameters chosen, such as the number of bands, time
limit and frequency resolution.
• Return to Flat. In this mode, the goal is to invert or cancel the randomly
chosen equalization applied to the audio signal by the computer by
selecting the correct frequency and applying an equal but opposite gain
to that applied by the software. It's similar
in difficulty to "Matching" but requires thinking in the opposite way,
since the goal is to remove the equalization and return the sound to its
original spectral balance. For example, if you hear a 12 dB boost at
2000 Hz, the correct response would be to apply a 12 dB cut at 2000
Hz, thus returning the audio signal to its original state and sounding
identical to the "Flat" option. Because the equalization used is
reciprocal peak/dip, it is possible to completely eliminate any
frequency boost or cut by applying equal but opposite boosts or cuts to
the respective frequencies. It should be noted that, if you wish to try
these exercises in a different context outside of the included software
practice modules, not all types of parametric equalizers available are
reciprocal peak/dip and therefore will not be able to cancel a boost
with an equal but opposite cut. This is not a deficiency, but simply a
difference in design.
• Absolute Identification. This practice mode is the most difficult; the
goal is to identify the applied equalization without having the
opportunity to audition the correct answer. Only "Bypass"
(no equalization) and "Question" (the equalization randomly chosen by
the computer) can be heard.
2.3.2 Frequency resolution
There are two frequency resolutions you can choose from:
• 1 octave: the easier of the two options with 9 possible frequencies
• 1/3 octave: the more difficult of the two, with 25 possible frequencies

The frequencies correspond to the International Organization for
Standardization (ISO) frequencies that are common in all commercially
available graphic equalizers, as listed in Table 2.1. The software randomly
chooses between these frequencies to apply equalization to the audio signal.
Exercises using one-third octave frequency resolution are predictably more
difficult than those with one-octave frequencies. The third octave frequency
list includes all octave frequencies with the addition of two frequencies
between each pair of octave frequencies.
It is essential to work with octave frequencies until you excel at
identifying all nine octave frequencies. Once these frequencies are solidified,
exercises with third-octave frequencies can begin. Octave frequencies should
serve as solid anchors in the spectrum around which you can identify
third-octave frequencies.
A key strategy for identifying third-octave frequencies is to first identify
the nearest octave frequency. Based on a solid knowledge of octave
frequencies, you can identify whether the frequency in question is in fact one
of the nine octave frequencies. If the frequency in question is not an octave
frequency, you can determine whether it is above or below the nearest
octave frequency.
For example, here are two specific octave frequencies (1000 Hz and
2000 Hz) with the respective neighboring third-octave frequencies:

2500 Hz: upper neighbor
2000 Hz: octave frequency anchor
1600 Hz: lower neighbor
1250 Hz: upper neighbor
1000 Hz: octave frequency anchor
800 Hz: lower neighbor
2.3.3 Number of bands
You can choose to work with one, two or three frequency bands. This setting
refers to the number of simultaneous frequencies that are affected in a given
question. The more simultaneous frequency bands that are chosen, the more
difficult the question will be. It is important to work with one frequency band
until you are comfortable with octave and third octave frequencies. Moving
to two or three bands is much more difficult and can be frustrating if you
don't develop confidence in just one band.
When working with more than one band at a time, it can be confusing
to know which frequencies have been altered. The best way to work with two
or three bands is to first identify the most obvious frequency and then
compare your answer to the equalizer question. If the chosen frequency
actually matches one of the frequencies in the question, that particular
frequency will be less noticeable when switching between the question and
its answer, and the remaining frequencies will be easier to identify. The
software can accept the frequencies in any order. When working with less
than three frequency bands, only the leftmost equalizer faders are active.
2.3.4 Frequency range
We can limit the range of testable frequencies from the full range of 63 Hz to
16,000 Hz to a range as small as three octaves. Users are encouraged to limit
the frequency range in the initial stages to only three frequencies in the mid-
range, such as 500 to 2000 Hz. Once these frequencies are mastered, the
range can be expanded one octave at a time.
After working through the full frequency range, there may be some
frequencies left that are still giving you trouble. For example, low frequencies
(in the range of 63 Hz to 250 Hz) are often more difficult to correctly identify
when practicing with music recordings, especially with third-octave
frequencies. This low frequency range can pose problems due to a number of
possible conditions. First, music recordings do not always contain consistent
levels in the low frequency range. Secondly, the sound reproduction system
you are using may not be capable of producing very low frequencies. Third, if
you accurately reproduce low frequencies, room modes (resonant
frequencies within a room) may be interfering with what you hear. Using
headphones can eliminate any problems caused by room modes, but the
headphones may not have a flat frequency response or may have a weak low
frequency response. For recommendations on specific headphone models,
see Section 1.3.3.
2.3.5 Gain Combination
The gain combination option refers to the possible gains (boost or cut) that can
be applied at a given frequency. For each question, the software randomly
chooses a boost or cut (if there is more than one possible gain) from the
selected gain combination and applies it to a randomly selected frequency.
When there is only one gain possible, the gain will automatically jump to the
appropriate gain when a frequency is chosen.
As expected, larger changes in gain (12 dB) are easier to hear than
smaller changes in gain (3 dB). Boosts are usually easier to identify than cuts,
so it is best to start with boosts until one becomes proficient at identifying
them. It's difficult to identify something that has been removed or reduced,
but when switching from the EQ version to the bypass, it is possible to hear
the frequency in question reappear, almost as if it had been raised above
normal.
When working with a band and gain combination that includes a boost
and a cut, such as ±6 dB, it is possible that a low-frequency cut could be
confused with a high-frequency boost and vice versa. A sensitivity to relative changes in frequency
response can cause a cut in the low frequency range to sound like a boost in
the high frequency range.
2.3.6 Q
The Q is a static parameter for any exercise. The default setting of Q = 2 is the
best starting point for all exercises. Higher Qs (narrower bandwidth) are
harder to identify.
2.3.7 Sound source
The practice can be done with pink noise, which is generated internally in the
software, or with any two-channel sound file in AIFF or WAV format at sample
rates of 44,100 or 48,000 Hz. Averaged over time, pink noise has the same
power per octave, and its power spectrum appears as a flat line when plotted
logarithmically. It also sounds equally balanced from low to high frequencies
because the auditory system is sensitive to octave (logarithmic) relationships
between frequencies rather than linear differences. The range from 20 to 40
Hz represents one octave (a doubling of frequency) but a difference of only
20 Hz, while the range between 10,000 Hz and 20,000 Hz is also one octave
but a difference of 10,000 Hz. The auditory system perceives both ranges as
the same interval: an octave. In pink noise, both octave ranges (20 to 40 Hz
and 10,000 to 20,000 Hz) have the same power. By using an audio signal that
has the same power across the entire spectrum, we can be sure that a change
in one frequency will likely be as audible as a change in any other frequency.
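For readers who want to generate their own pink noise for practice outside the included software, one simple approximation is to shape white noise with a 1/√f magnitude spectrum, which yields the equal-power-per-octave behavior described above. A minimal sketch, assuming NumPy is available:

```python
import numpy as np

def pink_noise(n_samples, fs=48000, seed=0):
    """Approximate pink noise by giving white noise a 1/sqrt(f) spectrum
    (about -3 dB per octave, i.e. equal power per octave)."""
    rng = np.random.default_rng(seed)
    white = rng.standard_normal(n_samples)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    scale = np.ones_like(freqs)
    scale[1:] = 1.0 / np.sqrt(freqs[1:])       # leave the DC bin alone
    pink = np.fft.irfft(spectrum * scale, n=n_samples)
    return pink / np.max(np.abs(pink))          # normalize to the range -1..1

noise = pink_noise(5 * 48000)                   # five seconds of practice noise
```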
There is also the option to listen to the sound source in mono or stereo.
If a loaded sound file contains only one audio track (instead of two), the audio
signal will be sent out of the left output only. Pressing the mono button will
send audio to the left and right output channels.
It is best to start with pink noise when beginning any new exercise and
then practice with recordings of various genres and instruments. The greater
the variety of sound recordings used, the more able you will be to transfer
the skills gained in these exercises to other listening situations.
2.3.8 Equalizer selection
In the practice software, an audio signal (pink noise or audio file signal) is
routed to three places:
• Direct without equalization, bypassed
• Through the "Question" equalizer chosen by the computer
• Through the user equalizer ("Your response")
We can select which of these options to audition. Bypass selection
allows us to listen to the original audio signal without applying any
equalization. The selection called “Question” allows us to listen to the
equalization that has been randomly chosen by the software and applied to
the audio signal. The selection called “Your Response” is the equalization
applied by the user, according to the parameters displayed in the user
interface. See Figure 2.9, which shows a block diagram of the practice
module.

Figure 2.9 A block diagram of the signal path for the Technical Ear Trainer Practice Module for Parametric Equalization.

2.3.9 Sound file control


The Sound File Control section of the interface includes a waveform display of
the audio signal. You can select excerpts from the entire audio file by clicking
and dragging the waveform. The audio file loops automatically once it
reaches the end of the file or the end of the selected section. By simply
clicking on the waveform, the waveform is selected from the click location to
the end of the file.
2.3.10 Time limit
In the recording studio or live sound venue, time is of the essence. Engineers
often must make quick and accurate decisions about sound quality and audio
signal processing. To help prepare you for these real-world situations, a time
limit can be applied in the practice module so you can practice identifying EQ
parameters with speed and accuracy.
The keyboard shortcuts included in the software are ideal for quickly
indicating answers when using the timer. When working on exercises with
more than one frequency band, the tab key cycles through the bands. The
up/down arrows can be used to increase or decrease octave frequencies.
Alternatively, the number keys correspond to octave frequencies (0 = 20 Hz, 1
= 63 Hz, 2 = 125 Hz, 3 = 250 Hz, 4 = 500 Hz, 5 = 1000 Hz, 6 = 2000 Hz, 7 = 4000
Hz, 8 = 8000 Hz, and 9 = 16,000 Hz) and can be used to jump to an octave
frequency immediately. The left/right arrows adjust the gain of a selected
band in 3 dB increments. For exercises with only one gain option (e.g. +12dB),
the gain is automatically set when the frequency slider is changed from 20Hz
to any other frequency. Returning the frequency slider to 20Hz resets the gain
to 0 dB. For exercises with more than one gain option (e.g., ±12 dB),
the gain remains at 0 dB until the user adjusts it; it does not change
automatically when the frequency is changed.
Sometimes a time limit is useful because it forces us to respond with
our first impression instead of spending too much time thinking and
rethinking. Novice recording engineers who have spent time with the practice
module have often reported that thinking too much about a question leads to
errors and that their first impressions are often the most accurate.
2.3.11 Keyboard shortcuts
• [spacebar] toggles equalizer selection based on practice type:
o Matching: alternates between Question and Your Answer
o Matching Memory: toggles between Question and Your Answer until a
parameter is changed, after which it toggles between Bypass and
Your Answer
o Return to Flat: toggles between Your Response and Bypass
o Absolute Identification: toggles between Question and Bypass

• [enter] or [return] check the answer and go to the next question


• [q] listen to Bypass
• [w] listen to the Question
• [e] listen to Your Answer
• Numbers 1 through 9 correspond to octave frequencies of a selected
band (for example, 1 = 63 Hz, 2 = 125 Hz, 3 = 250 Hz, 4 = 500 Hz, 5 =
1000 Hz, 6 = 2000 Hz, 7 = 4000 Hz, 8 = 8000 Hz, 9 = 16 000 Hz)
• Up/down arrows change the frequency of the selected band
• Left/right arrows change the gain of the selected band
• [tab] select the frequency band to modify, if the number of bands is
more than one
• [esc] turn off the audio

2.4 Working with the EQ practice module


When you first open the EQ practice module, select pink noise in Monitor
Selection, turn on the audio, and adjust the output level to a comfortable
listening level. Make sure the equalizer selection is set to Your Response and
scroll through each octave frequency to feel the sound of each frequency.
Once you change the frequency, the gain will automatically jump to 12 dB;
this is the default gain combination setting when opening the software
module. Switch between Bypass (no equalization) and Your Response to
compare the change in timbre created by a boost at each frequency. Initially,
spend some time listening to various frequencies, alternating between flat
and equalized. After you become familiar with what the octave frequencies
sound like with pink noise, load a sound file and do the same thing again,
listening to all the octave frequencies.
When listening to a sound file, start taking note of which instruments
or instrument sound components are affected by each particular octave
frequency. For example, 125 Hz can highlight the low harmonics in a snare or
bass. At the higher end of the spectrum, 8 kHz can produce crisp cymbal
harmonics. If you are listening to a baroque ensemble recording, you may find
that an increase to 8 kHz makes a harpsichord more prominent. Boosts at
specific frequencies can sometimes bring out individual instruments in a mix,
and in fact, skilled mastering engineers use this ability to provide subtle
rebalancing of a mix.
Every recording will be affected slightly differently by a given
frequency, even with comparable instrumentation. Depending on the
frequency content and spectral balance of each individual instrument in a
recording, the effect of an EQ setting will be somewhat different from mix to
mix. This is one reason why an engineer must be attentive to what is required
in each individual recording, rather than simply relying on what may have
worked on previous recordings. For example, just because a 250 Hz cut
worked on a drum in one recording does not mean it will work on all
recordings of the drum.
Sometimes during the recording and mixing process, we may find
ourselves evaluating and questioning our processing and mixing decisions
based on the logic of what seems correct from a numerical standpoint. For
example, let's say we apply a 20 dB cut at 300 Hz on an individual instrument.
There may be a temptation to evaluate the amount of equalization and think
that 20 dB is too much, based on what would seem reasonable (i.e., thinking
to ourselves, "I've never had to do this before and it seems like an extreme
setting, so how can it be correct?") rather than on what sounds reasonable.
Evaluating a decision based on what we believe is appropriate does not
always coincide with what clearly sounds most appropriate. In the end, it
doesn't matter how ridiculous a signal processing or mixing decision may
seem as long as the sonic result fits the artistic vision we have for a project.
As engineers, we can have a direct effect on the artistic impression created
by recorded music through choices such as balance and mix levels,
timbre, dynamics, and spatial processing. Judgments about what is
suitable and appropriate should be made by ear, without second-guessing
the actual parameter values chosen.
2.4.1 Vowel sounds
Several researchers have observed that associating specific vowel sounds
with octave frequencies can help listeners identify frequencies due to the
formant frequencies present in each vowel sound (Letowski, 1985;
Miskiewicz, 1992; Opolko & Woszczyk, 1982; Quesnel, 2001; Quesnel and
Woszczyk, 1994; Slawson, 1968). The following vowel sounds correspond
approximately to octave frequencies:
• 250 Hz [u] as in boot
• 500 Hz [o] as in tow
• 1000 Hz [a] as in father
• 2000 Hz [e] as in bet
• 4000 Hz [i] as in beet
Matching frequency resonances to specific vowel sounds can help with
learning and memory of these particular frequencies. Instead of trying to
think of a frequency number, some readers will find it helpful to match the
sound they are hearing with a vowel sound. The vowel sound can be linked to
a specific octave frequency.
2.5 Recommended recordings for practice
The following list identifies some commercially available recordings from
various genres that are suitable for use as sound sources in the EQ software
practice module. They represent examples of high-quality recordings that
have good spectral balance over a wide frequency range. Compact disc
quality versions (i.e., 44.1-kHz, 16-bit linear PCM in AIFF or WAV format)
should be used for all exercises. Perceptually encoded versions (such as MP3,
Windows Media Audio, or Advanced Audio Coding) should never be used for
equalization exercises, even if they have been converted back to PCM. Once
an audio file has been perceptually encoded, its quality has been degraded
and cannot be recovered by converting it back to linear PCM.
Anderson, Arild. (2004). “Straight” from The Triangle . ECM Records.
(jazz piano trio)
Blanchard, Terence. (2001). “On the Sunny Side of the Street” from
Let's Get Lost . Sony. (jazz with vocals)
Earth, Wind & Fire. (1998). “September” from Greatest Hits . Sony.
(R&B pop)
Hellendaal, Pieter. (1991). “Concerto II—Presto” from 6 Concerti
Grossi . Perf. The European Community Baroque Orchestra. Channel Classics.
(Baroque orchestra)
Le Concert des Nations. (2002). “Marche pour la cérémonie” from
Soundtrack from the film Tous les matins du monde . Alia Vox Spain. (Baroque
orchestra)
Randall, Jon. (2005). Walking Among the Living . Epic/Sony BMG Music
Entertainment. (roots music/bluegrass)
Steely Dan. (2000). “Gaslighting Abbie” from Two Against Nature .
Giant Records. (pop)
The Police. (1983). “Every Breath You Take” from Synchronicity . A&M
Records. (rock)
There are also some artists who are making multitrack recordings available
for purchase or free download. Apple's GarageBand and Logic also offer
recordings of solo instruments that can be useful with the software.
Summary
Equalization is one of the most important tools of any audio engineer. It is
possible to learn to identify resonances and anti-resonances by ear through
practice. The included software practice module can serve as an effective tool
for progress in technical ear training and critical listening when used for
regular and consistent practice.
Chapter 3
SPATIAL ATTRIBUTES AND REVERBERATION

Reverb is used to create distance, depth and width in recordings, whether


captured with microphones during the recording process or added later
during mixing. In classical music recording, engineers strive to achieve a fairly
natural representation of a musical ensemble on stage in a reverberant
performance space. In this type of recording, microphones are placed to
capture direct sound coming directly from the instruments, as well as indirect
sound reflected from a surrounding room (walls, ceiling, floor, seats).
Engineers seek to achieve an appropriate balance of direct and indirect sound
by adjusting microphone locations and angles.
Pop, rock, electronic, and other styles of music that predominantly use
electric instruments and computer-generated sounds are not necessarily
recorded in reverberant acoustic spaces. Rather, a sense of present space is
often created through the use of artificial reverb and delays, after the music
has been recorded in a relatively dry acoustic space. Artificial reverb and delay
are used both to imitate real acoustic spaces and to create completely
unnatural sound spaces.
Delay and reverb help create a sense of depth and distance in a
recording, helping to place some sound sources farther back in the sound
image while other, less reverberant elements remain at the front of the
soundstage. An engineer can not only make sounds seem more
distant and create the impression of an acoustic space, but can also influence
the character and mood of a musical recording with careful use of reverb. In
addition to depth and distance control, the angular location of sound sources
is controlled through amplitude panning. When listening over speakers, an
engineer essentially has two dimensions within which to control the location
of a sound source: distance and angular location (azimuth).
Together, we can consider the properties of the location of the sound
source within a simulated acoustic space, the qualities of a simulated acoustic
space, as well as the coherence and spatial continuity of a sound image
collectively as the spatial attributes of a recording.

3.1 Analysis of perceived spatial attributes


The auditory system extracts information about the spatial attributes of a
sound source, whether the source is an acoustic musical instrument or a
recording of a musical instrument played over speakers. Spatial attributes
help determine with varying levels of precision the azimuth, elevation, and
distance of sound sources, as well as information about the environment or
room in which they occur. The binaural hearing system relies on interaural
time differences, interaural intensity differences, and filtering by the pinna or
outer ear to determine the location of a sound source (Moore, 1997). The
process of localizing sound images reproduced by loudspeakers is somewhat
different from the localization of individual acoustic sources, and in this
chapter we will concentrate on the spatial attributes that are relevant to the
production of audio and, therefore, the reproduction of sound by
loudspeakers.
Spatial attributes include the perceived layout of sources in a sound
image, the characteristics of the acoustic environment in which they are
placed, as well as the overall quality of a sound image produced by
loudspeakers. It is essential for a recording engineer to have a highly
developed sense for any spatial processing already present or added to a
recording. Panning and spatial effects have a large effect on the balance and
combination of elements in a mix, which in turn influences the way listeners
perceive a musical recording. For example, using a longer reverb time can
create drama and emotion in a music recording by creating the impression
that the music emanates from a large space. Alternatively, with the use of
short reverb times, an engineer can create a sense of intimacy or rawness in
the music.
The spatial arrangement of sources in a sound image can influence the
clarity and cohesion of a recording, as spatial masking plays a role in the
perceived result. Occasionally, the use of reverb in a dense sound recording
may seem inaudible or at least difficult to identify because it is mixed in and
partially masked by the direct sound. When mixing a track with a small
amount of reverb, there are times when it is useful to mute and unmute any
additional reverbs to hear their contribution to a mix.
When considering the parameters available in artificial reverb, such as
decay time, predelay time, and early reflections, we must also take into
account the subjective impressions of spatial processing as we translate
between controllable parameters and their sonic results. For example, there is
usually no parameter labeled "distance" in a reverb processor, so if we want
to make a sound source more distant, we need to control the distance
indirectly by adjusting the parameters in a coordinated way until we get the
desired sense of distance. An engineer must translate between objective
reverberation parameters to create the desired subjective impression of the
source location and the simulated acoustic environment. It is difficult to
separate the control of sound source distance from the simulation of an
acoustic environment, because an integral part of distance control is the
creation of a perceived soundstage within a mix, a virtual environment from
which musical sound sources appear to emanate.
The choice of reverb parameter settings depends on several factors,
such as the nature of the transient and width of a dry sound source, as well as
the decay and early reflection characteristics of a reverb algorithm.
Professional engineers often identify subjective qualities of each reverb that
bring them closer to their specific goals for each mix rather than simply
choosing parameter settings that worked in other situations. A particular
combination of parameter settings for a source and reverb generally cannot
be duplicated simply to obtain an identical distance and width effect with
a different source or reverb.
We can benefit from analyzing spatial properties from objective and
subjective perspectives, because the tools have objective parameters, but our
ultimate goal in recording is to achieve a great sound mix, not to identify
specific parameter settings. As with equalization, we must find ways to
translate between what we hear and the parameters available for control.
Spatial attributes can be divided into the following categories and
subcategories:
• Placement of direct/dry sound sources
• Characteristics of acoustic spaces and phantom image soundstages
• Characteristics of an overall sonic image produced by loudspeakers

3.1.1 Sound sources


3.1.1.1 Angular location
Also called azimuth, the angular location of a sound source is its perceived
location along the horizontal plane relative to the left and right speakers.
Typically, it is best to spread the sources across the stereo image so that there
is less masking and more clarity for each sound source. Sounds can mask each
other when they occupy a similar frequency range and angular location.
Each microphone signal can be panned to a specific location between
the speakers using the conventional constant power panning found on most
mixers. Panning can also be achieved by delaying the output of a signal to one
speaker channel relative to the other speaker output. The use of delay for
panning is not common because its effectiveness depends largely on the
location of the listener in relation to the speakers.
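To make the constant-power pan law concrete, here is a minimal Python sketch; the function name and the sine/cosine mapping are my own illustrative choices, not the implementation of any particular mixer. It applies gains so that the summed power of the two channels stays constant wherever the source is panned.

import numpy as np

def constant_power_pan(mono, pan):
    """Pan a mono signal between left and right with constant power.

    pan: -1.0 = hard left, 0.0 = center, +1.0 = hard right.
    Sine/cosine gains keep gL**2 + gR**2 == 1 for any pan value.
    """
    theta = (pan + 1.0) * np.pi / 4.0   # map [-1, +1] to [0, pi/2]
    g_left = np.cos(theta)
    g_right = np.sin(theta)
    return np.column_stack((g_left * mono, g_right * mono))

# Example: a 1 kHz tone panned halfway to the right.
fs = 44100
t = np.arange(fs) / fs
tone = 0.5 * np.sin(2 * np.pi * 1000 * t)
stereo = constant_power_pan(tone, pan=0.5)
print(stereo.shape)  # (44100, 2)

With this law, a center-panned source is attenuated by about 3 dB in each channel, so it does not jump in level as it moves across the image.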
Balancing the signals of some stereo miking techniques will generally
require panning each pair of microphone signals fully left and fully right. The
resulting positions of the sound sources in front of each pair of microphones
will depend on the stereo miking technique used and the respective locations
of each source.

3.1.1.2 Distance
Although human perception of absolute distance is limited, the relative
distance of sounds within a stereo image is important in giving depth to a
recording. Large ensembles recorded in acoustically live spaces are likely to
exhibit a natural sense of depth, analogous to what we would hear as an
audience member in the same space. With recordings made in acoustically
dry spaces, such as studios, engineers often look to create depth using delays
and artificial reverb. Engineers can control the perceived distance of a sound source by adjusting physical parameters such as the following (a brief numerical sketch follows the list):
• Direct sound level. Quieter sounds are perceived as farther away because sound level falls by about 6 dB for each doubling of the distance from a source. This cue can be ambiguous to the listener, because a change in loudness may be the result of a change in distance or a change in the acoustic power of the source.
• Reverb level. As a source moves farther away from the listener in a room or hall, the level of the direct sound decreases while the level of the reverberant sound remains roughly the same, reducing the ratio of direct to reverberant sound.
• Distance from microphones to sound sources. Moving the microphones farther away decreases the direct-to-reverberant ratio and therefore creates a greater sense of distance.

• Location and level of the room microphones. Microphones placed on the opposite side of a room or hall from where the musicians are standing pick up sound that is primarily reverberant or diffuse. The room microphone signals can be thought of as a reverb return on a mixer.
• Low-pass filtering of the direct sound relative to close-miked sound. High frequencies are attenuated more than lower frequencies due to air absorption. Additionally, the acoustic properties of a room's reflective surfaces affect the spectrum of reflected sound that reaches the listener's ears.
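As a rough numerical sketch of the first two cues above (assuming free-field behavior for the direct sound and a constant diffuse reverberant level, which are simplifications rather than a model of any real room), the following Python fragment shows how the direct level and the direct-to-reverberant ratio fall as distance increases:

import numpy as np

def direct_level_db(distance_m, ref_distance_m=1.0, ref_level_db=0.0):
    """Direct sound level relative to a reference distance.

    Free-field assumption: level falls 6 dB per doubling of distance
    (inverse-square law), i.e. -20*log10(d / d_ref).
    """
    return ref_level_db - 20.0 * np.log10(distance_m / ref_distance_m)

# Assume the diffuse reverberant field stays at a constant -15 dB
# (an arbitrary illustrative value) regardless of listener distance.
reverb_level_db = -15.0

for d in [1.0, 2.0, 4.0, 8.0, 16.0]:
    direct = direct_level_db(d)
    drr = direct - reverb_level_db  # direct-to-reverberant ratio in dB
    print(f"{d:5.1f} m   direct {direct:6.1f} dB   D/R ratio {drr:6.1f} dB")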

3.1.1.3 Spatial extent


Sometimes the sound source locations in a mix are precisely defined, while other times a sound source's location is blurrier and more difficult to identify. Spatial extent describes the perceived width of a source. A related concept in concert hall acoustics research is apparent source width (ASW), which is related to the strength, time, and direction of lateral reflections. Barron (1971) found that stronger lateral reflections result in a wider ASW.
The perceived width of a sound image produced by loudspeakers will vary with the microphone technique used and the sound source being recorded. Spaced microphones produce a wider sound source because the level of correlation between the two microphone signals is reduced as the microphones are spaced farther apart. As with concert hall acoustics, the perceived width of sources played through loudspeakers can also be influenced by early reflections, whether recorded with microphones or artificially generated. If artificial early reflections (in stereo) are added to a single close-miked recording of a sound source, the direct sound tends to perceptually merge with the early reflections and produces an image that is wider than the dry sound alone.

The spatial extent of sound sources can be controlled by physical parameters such as the following:
• Early reflection patterns that originate in a real acoustic space or are artificially generated with reverberation
• The type of stereo miking technique used: spaced microphones generally produce a wider spatial image than coincident microphone techniques

3.1.2 Acoustic spaces and sound stages


An engineer can control additional spatial attributes, such as the perceived
characteristics, qualities, and size of the acoustic environment in which each
sound source is placed in a stereo image. The environment or soundstage may
consist of a real acoustic space captured with room microphones, or may be
created by artificial reverb added during mixing. There may be a common type
of reverb for all sounds, or some sounds may have unique types of reverb
added to help differentiate them from the rest of the instruments. For
example, it is quite common to treat vocals or solo instruments with a
different reverb than the rest of an accompanying ensemble.

3.1.2.1 Reverb Decay Character


Decay time is one of the most common parameters in artificial reverb devices. When recording acoustic instruments in a live acoustic space, the reverb decay time is usually not adjustable; however, some recording spaces have been designed with panels on the wall and ceiling surfaces that can be rotated to expose sound-absorbing or reflective materials, allowing some variation in reverberation decay time.
Decay time is defined as the time that the sound continues to persist
after the direct sound stops playing. Longer reverb times are typically more
audible than shorter reverb times for a given reverb level. Transient sounds
like drums or percussion expose the decay time more than sustained sounds,
allowing us to hear the decay rate more clearly.
Some artificial reverb algorithms incorporate modulation into the decay to give it variation and, ideally, make it sound less artificial. A perfectly smooth decay is something we rarely hear in a real room, and an unmodulated artificial reverb can sound unnaturally smooth.

3.1.2.2 Spatial extent (width and depth) of the sound stage


A sound stage is the acoustic environment within which a sound source is heard, and it must be differentiated from the sound source itself. The environment can be a recording of a real space, or it can be created artificially using delays and artificial reverbs.

3.1.2.3 Spatiality
Spatiality represents the perception of the physical and acoustic characteristics of a recording space. In concert hall acoustics, it is related to listener envelopment, but with only two loudspeakers in stereo playback it is difficult to achieve a true sense of envelopment. We can use the term spatiality to describe the feeling of space within a recording.

3.1.3 General characteristics of stereo images


Also grouped under spatial attributes are elements that describe the general
impressions and characteristics of a stereo image reproduced by
loudspeakers. A stereo image is the illusion of sound source locations created by the loudspeakers. Although there are only two speakers for stereo, the human binaural hearing system allows for the creation of ghost images at locations between the speakers. In this section, we consider qualities of a stereo image that are more general than those specific to a source or sound stage.
3.1.3.1 Coherence and relative polarity between channels
Despite the widespread use of stereo and multichannel playback systems
among consumers, mono compatibility remains vitally important, mainly
because we can listen to music through computers and mobile phones with a
single speaker. Checking a mix for mono compatibility involves listening for
changes in timbre that result from destructive interference between the left
and right channels. In the worst case with stereo channels of opposite
polarity, summing to mono will cancel out a significant portion of a mix. Every
project an engineer mixes should be checked to ensure that the two channels
of a stereo mix do not have opposite polarity. When the left and right
channels are identical and of opposite polarity, they will completely cancel
out when added. If both channels are identical, then the mix is monophonic
and not truly stereo. Most stereo mixes include some combination of mono
and stereo components. We can describe the relationship between the signal components in the left and right channels of a mix as existing along a correlation scale between -1 and +1:
• Left and right are identical: composed of signals panned to the center of the stereo image, with a correlation of +1
• Left and right have nothing in common: signals panned hard to one side or the other, or decorrelated signals, with a correlation of 0 between channels
• Left and right are identical but of opposite polarity: the signals have a correlation of -1
Phase meters provide an objective way to determine the relative polarity of stereo channels, but if such meters are not available, an engineer must rely on the ears. Left and right channels of opposite polarity can be identified by listening for an extremely wide stereo image: when sitting in the ideal listening position (see Fig. 1.2), the sound from the speakers seems to come from the sides. Another characteristic of opposite-polarity channels is that the stereo image is unstable and tends to shift exaggeratedly with small head movements. Section 3.7.3 provides more information on listening to channels of opposite polarity.
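A correlation meter of the kind mentioned above essentially reports a normalized correlation between the left and right channels. The sketch below is a minimal, illustrative version of that measurement (not the metering algorithm of any specific product): values near +1 indicate mono-like content, values near 0 indicate decorrelated content, and values near -1 indicate opposite-polarity channels that will largely cancel when summed to mono.

import numpy as np

def interchannel_correlation(left, right):
    """Normalized correlation coefficient between two channels (-1 to +1)."""
    num = np.sum(left * right)
    den = np.sqrt(np.sum(left**2) * np.sum(right**2))
    return num / den if den > 0 else 0.0

fs = 44100
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 440 * t)

print(interchannel_correlation(sig, sig))          # ~ +1.0 (dual mono)
print(interchannel_correlation(sig, -sig))         # ~ -1.0 (opposite polarity)
noise_l, noise_r = np.random.randn(fs), np.random.randn(fs)
print(interchannel_correlation(noise_l, noise_r))  # ~ 0.0 (uncorrelated)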
Sometimes an individual instrument may be represented in a mix by two identical but opposite-polarity signals, panned left and right. If such a signal exists, a phase meter may not register it strongly enough to provide an unambiguous visual indication. Sometimes the stereo line outputs of electronic instruments are of opposite polarity, or perhaps a polarity-reversing cable was used by mistake during recording. Often the stereo outputs (left and right) of electronic instruments are not actually stereo but mono; when one output is of opposite polarity, the two channels will cancel when summed to mono.

3.1.3.2 Spatial continuity of a sound image from one speaker to another


As a general attribute, mixing engineers consider the continuity and balance
of a sound image from one speaker to another. An ideal stereo image will be
balanced between left and right and will not have too much or too little
energy located in the center. Pop and rock music mixes often have a strong center component due to the number and strength of instruments that are typically panned to the center, such as the kick, snare, bass, and vocals.
Classical and acoustic music recordings may not have an equally strong center
image, and there may be a deficiency in the amount of energy in the center,
sometimes referred to as having a "hole in the middle." Engineers strive to
have a uniform and continuous distribution of sound energy from left to right.
3.2 Basic components of digital reverb
Next, we'll explore two fundamental processes found in most digital reverb
units: time delay and reverb.

3.2.1 Time Delay


Although a simple concept, time delay can serve as a fundamental building
block for a wide variety of complex effects. Figure 3.1 shows a block diagram
of a single delay combined with a non-delayed signal. Figure 3.2 shows what
the output of the block diagram would look like if the input were a pulse.
By simply delaying an audio signal and mixing it with the original
undelayed signal, the result is a comb filter (for shorter delay times) or echo
(for longer delay times). By adding hundreds of delayed versions of a signal in
an organized way, early reflection patterns like those found in real acoustic
spaces can be mimicked. Chorus and flanger effects are created by using time-
varying delays.

Figure 3.1 A block diagram of a delay line.
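The block diagram of Figure 3.1 can be sketched in a few lines of code. The following Python fragment (an illustration only; the function name and the specific delay values are my own) mixes a signal with a single delayed copy of itself, producing comb filtering at short delay times and a discrete echo at longer ones:

import numpy as np

def delay_and_mix(x, delay_samples, delay_gain=1.0):
    """Mix a signal with a single delayed copy of itself (no feedback).

    Short delays (around 1 ms at audio rates) produce comb filtering;
    longer delays are heard as discrete echoes.
    """
    y = np.copy(x)
    y[delay_samples:] += delay_gain * x[:-delay_samples]
    return y

fs = 44100
impulse = np.zeros(fs)
impulse[0] = 1.0

combed = delay_and_mix(impulse, delay_samples=44)              # ~1 ms: comb filter
echoed = delay_and_mix(impulse, delay_samples=int(0.25 * fs))  # 250 ms: echo
print(np.nonzero(echoed)[0])  # impulse at sample 0 and its echo at 11025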

3.2.2 Reverberation

Figure 3.2 A time-based view of the output of a signal (in this case an impulse) plus a delayed version of itself.

Whether originating in a real acoustic space or an artificially generated one, reverb is a powerful effect that provides a sense of width, depth, cohesion
and distance in recordings. Reverb helps mix recorded tracks to create a
unified sound image where all components of an image reside in a common
acoustic space. In the reproduced sound, reverb can create the illusion of
being immersed in an environment that is different from our physical
environment.
On the other hand, reverb, like any other type of audio processing, can
also create problems in sound recording and production. Reverberation of too
high a level or too long a decay time can destroy the clarity of direct sounds
or, as in the case of speech, affect the intelligibility of what is said. The quality
of the reverb should be optimized to suit the musical and artistic style being
recorded.
Reverb and delay have important functions in recording music, such as helping the instruments and vocals in a recording blend or "gel" in a mix. By using reverb, an engineer can influence a listener's sense of a mix by creating the illusion of sources performing in a common acoustic space. Additional layers of reverb and delay can be added to accentuate and highlight specific lead or solo parts.
The sound of a close-miked instrument or singer played back through loudspeakers can create an intimate or perhaps even uncomfortable feeling for the listener. Listening to such a recording through headphones can create the
impression that a singer is just inches from the ear, and this is not something
listeners are used to hearing acoustically in a live music performance. Live
music performances are typically heard at some distance, meaning that sound
reflected from the walls, floor, and ceiling of a room perceptually merges with
sound coming directly from a sound source. When using a close microphone
placement in front of a musical performer, it is often helpful to add some
delay or reverb to the "dry" signal to create some perceived distance between
the listener and the sound source.
Conventional digital reverb algorithms use a network of delays, all-pass
filters and comb filters as their basic components, based on the original idea
of Schroeder (1962) (Figure 3.3). Equalization is applied to alter the spectral
content of reflections and reverb. In its simplest form, artificial reverb is
simply a combination of delays with feedback or recursion. Each time a signal
passes through the feedback loop, its level is reduced by a preset amount so
that its strength decays over time.
Figure 3.3 A block diagram of Manfred Schroeder's original digital reverberation algorithm
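A toy version of this topology can be written directly from the block diagram. The sketch below follows the general Schroeder idea of parallel feedback comb filters feeding series all-pass filters; the specific delay lengths and gains are illustrative values of my own choosing, not those of Schroeder's paper or of any commercial unit.

import numpy as np

def feedback_comb(x, delay, gain):
    """Feedback comb filter: y[n] = x[n] + gain * y[n - delay]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n] + (gain * y[n - delay] if n >= delay else 0.0)
    return y

def allpass(x, delay, gain):
    """Schroeder all-pass: y[n] = -gain*x[n] + x[n-delay] + gain*y[n-delay]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        xd = x[n - delay] if n >= delay else 0.0
        yd = y[n - delay] if n >= delay else 0.0
        y[n] = -gain * x[n] + xd + gain * yd
    return y

def schroeder_reverb(x, fs):
    """Toy Schroeder-style reverberator: parallel combs into series all-passes."""
    comb_delays = [int(fs * t) for t in (0.0297, 0.0371, 0.0411, 0.0437)]
    wet = sum(feedback_comb(x, d, 0.84) for d in comb_delays) / 4.0
    for d, g in ((int(fs * 0.005), 0.7), (int(fs * 0.0017), 0.7)):
        wet = allpass(wet, d, g)
    return wet

fs = 8000                      # low sample rate keeps the Python loops quick
impulse = np.zeros(fs)         # 1 second impulse response
impulse[0] = 1.0
ir = schroeder_reverb(impulse, fs)
print(ir[:10])

Each pass of the signal through a comb filter's feedback loop reduces its level by the loop gain, which is what produces the exponential decay described above.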

More recent reverb algorithms have been designed to convolve an impulse response of a real acoustic space with the incoming "dry" signal. Hardware units capable of convolution-based reverb have been commercially available since the late 1990s, and software implementations are now commonly released as plug-ins for digital audio workstation software. Convolution reverb is sometimes called sampling reverb because a "sample" of an acoustic space (i.e., its impulse response) is convolved with a dry audio signal. Although it is possible to perform the calculation in the time domain, the convolution is usually carried out in the frequency domain so that it is fast enough for real-time processing. The resulting audio signal from a convolution reverb is arguably more realistic sounding than is possible with conventional digital reverb. The main drawback is that there is not as much flexibility or control over the parameters of a convolution reverb as is possible with digital reverb based on comb and all-pass filters.
In conventional digital reverb units, there are several possible
parameters that can be controlled. Although these parameters vary from
manufacturer to manufacturer, some of the most common ones include the
following:
• Reverb decay time (RT60)
• Delay time
• Predelay time
• Some control over early reflection patterns, either choosing from predefined sets of early reflections or controlling individual reflections
• Low-pass filter cutoff frequency
• High-pass filter cutoff frequency
• Decay time multipliers for different frequency bands
• Gate: threshold, attack time, hold time, release or decay time, depth
Although most digital reverb algorithms represent simplified models of the acoustics of a real space, they are widely used in recorded sound to help enhance the recorded acoustic space or to create a sense of spaciousness that did not exist in the original recording environment.

3.2.2.1 Reverb Decay Time


Reverberation time is defined as the amount of time it takes for a sound to decay by 60 dB once the source stops. Usually referred to as RT60, it can be estimated with the equation proposed by W. C. Sabine for a real acoustic space (Howard & Angus, 2006):

RT60 = 0.161 V / (S α)

where V = volume in m³, S = surface area in m² for a given type of surface material, and α = absorption coefficient of the respective surface.
Because RT60 will have a value greater than zero even if α is 1.0 (100% absorption on all surfaces), the Sabine equation is generally only valid for values of α less than about 0.3. In other words, the drawback of the Sabine equation is that even in an anechoic chamber a reverberation time would still be calculated, even though no reverberation would be measured acoustically. The Norris-Eyring equation is a slight variation that is valid over a wider range of absorption values (Howard & Angus, 2006):

RT60 = 0.161 V / (−S ln(1 − α))
It is essential for an engineer to have an intuitive sense of what decay times of various values mean in terms of how they sound. A 2-second decay time will have a very different sonic effect in a mix than a 1-second decay time.
3.2.2.2 Delay Time
A direct delay without feedback or recursion of an audio signal is often mixed
with the dry signal to create a sense of space, and can complement or replace
the use of reverb.
With shorter delay times (less than about 30 milliseconds), the auditory
system tends to fuse direct and delayed sounds, judging the position of the
combined sound based on the location of the direct sound. The phenomenon is known as the precedence effect, Haas effect, or law of the first wavefront.
With delay times of more than about 30 milliseconds, the delayed signal is
heard as a distinct echo of a direct sound. The actual amount of delay time
needed to create a distinct echo depends on the nature of the audio signal
being delayed. Transient percussive signals reveal distinct echoes with much
shorter delay times (less than 30 milliseconds), while sustained steady-state
signals require much longer delay times (greater than 50 milliseconds) to
create an audible echo.

3.2.2.3 Predelay Time


Pre-delay time is typically defined as the delay time between the direct sound
and the start of the reverb. Perceptually, it can give the impression of a larger
space as the predelay time increases. In a real acoustic space with no physical
obstructions between a sound source and a listener, there will always be a
small delay between the arrival of direct and reflected sounds. The longer this
initial delay is, the larger a space is perceived to be.

3.2.3 Digital Reverb Presets


Most digital reverb units available today, whether in plug-in or hardware
form, offer hundreds, if not thousands, of reverb presets. What may not be
immediately obvious to the novice engineer is that there are typically only a
handful of different algorithms for a given reverb type or model. Presets are
simply the same algorithms repeated with variations in parameter settings
and named individually to reflect the type of space the unit is modeling or a
possible application such as big room, bright vocals, studio drums, or theater.
All presets that use a given algorithm type represent identical types of
processes and will sound identical if the parameters of each preset match.
Because engineers adjust many reverb parameters to create the most
suitable reverb for each application, it can make sense to pick any preset and
start adjusting the parameters rather than trying to find a preset that works
without any adjustments. The main drawback to trying to find the correct
preset for each instrument and voice during a mix is that the "correct" preset
may not exist and will probably require parameter adjustment anyway. It may
be best to start right away by choosing any preset and editing the parameters
to suit a mix. The process of editing parameters instead of trying to find a
preset will also help you learn the capabilities of each reverb and the sonic
result of each parameter change.
Although it may not be the best use of time to look up a preset during the mixing process, there is an advantage to reviewing presets and listening to them because it can give a clearer idea of what a reverb unit may sound like at many different parameter settings. This listening exercise should be done at a time outside of a mixing project to allow time to listen and become familiar with the hardware and software at our disposal.

3.3 Reverb in multichannel audio


From a practical point of view, my informal research and listening seem to
indicate that, in general, higher levels of reverb are possible in multichannel
audio recordings than in two-channel stereo, while maintaining an acceptable
level of clarity. More formal testing is needed to verify this point, but it may
make sense from what we know about masking. Masking of one sound by
another is reduced when the two sounds are spatially separated (Kidd et al.,
1998; Saberi et al., 1991). It appears that due to the greater spatial distribution of sound in multichannel audio relative to two-channel stereo, reverb is less likely to obscure or mask the direct sound, and therefore may be more prominent in multichannel audio.
One could argue that reverb is even more critical in mixes destined for multichannel audio playback because multichannel audio offers a much greater chance of recreating a feeling of immersion in a virtual
acoustic space than two-channel stereo. Much more research has been done
on the spatial dimension of reproduced sound in recent years as multichannel
audio has gained popularity and its distribution has grown to a broader
audience. As such, recording engineering students can benefit from a
systematic training method to learn to match artificial reverb parameter
settings "by ear" and further develop the ability to consistently identify subtle
details of the sound reproduced by the speakers.
Recording music and sound for multi-channel playback also presents
new challenges over two-channel stereo in terms of creating a detailed and
immersive sound image. One of the difficulties with multichannel audio
playback using the ITU-R BS.775 loudspeaker layout (ITU-R, 1994) is the large angular gap between the front and rear speakers (80 to 90° spacing; see Fig. 1.3). Due to this speaker spacing and the nature of our binaural sound localization capabilities, ghost images at the sides are often unstable. Additionally, it is challenging to
produce ghost images that bridge the front and rear sound image. Reverb can
be useful for creating the illusion of sound images that span the space
between speakers.

3.4 Software training module


The included software training module is a tool to help you hear subtle details of artificial digital reverb parameters, rather than an ear trainer for the
perception of room acoustics. It is possible that the skills obtained through
the use of this system assist in the perception of acoustic characteristics, but it
is not clear how well one skill transfers to the other. Most conventional digital
reverb algorithms are based on various combinations of comb and all-pass
filters according to the model developed by Schroeder, and although these
algorithms are computationally efficient and provide many controllable
parameters, they are not physical models of the behavior of sound in a real room. Therefore, it is not possible to confirm that artificial reverberation parameters such as decay time are identical to those found in
sound in a real acoustic space. It is not clear how closely the reverb decay
time (RT60) of a given artificial reverb algorithm relates to the decay time of
sound in a real room. For example, if the decay times of different artificial
reverb units or plug-ins are set to 1.5 seconds, the perceived decay time may
differ between the units. Additionally, reverberation time sometimes depends
on other parameters of an algorithm. It is not always clear exactly what other
parameters such as "size" control or why they may affect the perceived decay
time without changing the displayed decay time. Due to the variability of
perceived decay time between units and algorithms, it is perhaps best not to
learn absolute decay times, but rather to learn to listen for differences
between representative examples and be able to match parameter settings.
However, reverb is a powerful sonic tool available to recording engineers who
mix it with recorded sound to create the auditory illusion of real acoustics and
spatial context.
Just as it is essential to train audio engineers to recognize spectral
resonances, it is equally important to improve our perception of the subtleties
in artificial reverb. At least one researcher has shown that listeners can
"learn" the reverberation of a given room (Shinn-Cunningham, 2000). Other
work has also been carried out to train listeners to identify the spatial
attributes of sound. Neher et al. (2003) have documented a method for
training listeners to identify spatial attributes using verbal descriptors for the
purpose of assessing spatial audio quality.
Research has been conducted to describe the spatial attributes of
reproduced sound using graphical evaluations (such as Ford et al., 2003 and
Usher & Woszczyk, 2003). An advantage of the training system discussed here
is that you compare one spatial scene with another, by ear, and there is never
a need to translate or map an auditory sensation to a second sensory
modality and subsequently to a means of expression, such as drawing a picture or choosing a word. With the system, you can compare and match two sound scenes, within a given set of artificial reverb parameters, using only the auditory system. Therefore, no mapping between different senses and methods of communication is required. Additionally, this method
has ecological validity as it mimics the process of a sound engineer sculpting
the sonic details of a sound recording by ear rather than through graphics and
words.

3.5 Software Training Module Description


The included software training module "TETpracticeReverb" is available for
listening exercises. The computer randomizes the exercises and offers a
choice of difficulty and a selection of parameters for an exercise. It works in
the same way as the EQ module described in Chapter 2.
3.5.1 Sound sources
Readers are encouraged to begin the training course with simple, transient, or
impulsive sounds, such as percussion, and progress to more complex sounds,
such as voice recordings and music. In the same way that pink noise is used in the initial stages of frequency ear training because it reveals a given amount of equalization better than most musical samples, percussive or impulsive sounds are used for the initial levels of training in time-based effects processing because the sonic character of the reverb is more evident than with steady-state sources. The temporal character of a sound affects the ability to hear reverb qualities when the two are mixed. Typically transient or percussive sounds reveal reverberation, while steady-state, sustained musical passages tend to mask or blend with the reverberation, making judgments difficult.
3.5.2 User interface
A graphical user interface (GUI), shown in Figure 3.4, provides a control
surface for you to interact with the system. With the GUI you can do the
following:
• Choose the difficulty level
• Select the parameters to work with
• Choose a sound file
• Adjust reverb parameters
• Toggle between the reference and your answer
• Control the overall sound output level
• Submit an answer to each question and move on to the next example
The graphical interface also keeps track of the current question and the
average score up to that point, and provides the score and the correct answer
for the current question.

Figure 3.4 The "Technical Ear Trainer - Reverberation" graphical user interface, with controls for selecting parameters and loading a sound file.

3.6 Introduction to practice


The training curriculum covers some of the most common parameters in
digital reverb units, including the following:
• decay time
• Predelay time
• Reverb (mix) level
• Combinations of two or more parameters at the same time
The main task in the exercises and tests is to sonically duplicate a reference sound scene by listening to and comparing your answer with the reference and making appropriate changes to the parameters.
chooses a parameter value based on the difficulty level and the parameter
being tested, and asks you to identify the reference's reverb parameters by
setting the appropriate parameter to the value that most closely matches the
reference's sound. You can toggle between the reference question and your
answer by clicking the switches labeled “Question” and “Your Answer” (see
Fig. 3.4) or by pressing the space bar on the computer keyboard. Once the
two sound scenes match, you can click the check-answer button or press the Enter
key to submit the answer and see the correct answer. Clicking the next button
takes you to the next question.
3.6.1 Decay time
Decay times range from 0.5 to 2.5 seconds, with an initial resolution of 1.5 seconds that becomes finer, down to 0.25 seconds, as the difficulty increases.
3.6.2 Predelay time
Pre-delay time is the amount of delay time between the direct (dry) sound
and the beginning of the first reflections and reverberations. Pre-delay times
vary between 0 and 200 milliseconds, with an initial resolution of 40 ms and
decreasing to a resolution of 10 ms.
3.6.3 Mixing level
Often, when mixing reverb into recorded sound, the level of the reverb is
adjusted as an auxiliary return on the recording console or digital audio
workstation. The training system allows you to practice learning various reverb "mix" levels. A mix level of 100% means that there is no direct (dry) sound in the output of the algorithm, while a mix level of 50% represents an output with equal levels of processed and dry sound. The resolution of the mix values at the lowest difficulty level is 25% and progresses to a resolution of 5%, covering the mix range from 0 to 100%.
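The mix-level behavior described here reduces to a simple crossfade between the dry and processed signals. A minimal sketch (my own formulation; some reverbs use an equal-power rather than a linear mix law) looks like this:

import numpy as np

def mix_wet_dry(dry, wet, mix_percent):
    """Blend dry and processed signals: 0% = all dry, 100% = all wet."""
    m = mix_percent / 100.0
    return (1.0 - m) * dry + m * wet

# A 50% mix gives equal levels of dry and processed sound, as described above.
dry = np.ones(4)
wet = np.zeros(4)
print(mix_wet_dry(dry, wet, 50))   # [0.5 0.5 0.5 0.5]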

3.7 Mid-Side Matrix


Michael Gerzon (1986, 1994) has presented mathematical explanations of
matrixing and mixing stereo recordings to enhance and rebalance the
correlated and uncorrelated components of a signal. The techniques he
suggests are useful for technical ear training because they can aid in the
analysis and deconstruction of a recording by bringing to light components of
a sound image that would not otherwise be as audible.
By applying the principles of stereo mid-side miking technique to
completed stereo recordings, it is possible to rebalance aspects of a recording
and learn more about the techniques used in a recording. Although this
process takes its name from a specific stereo miking technique, any stereo
recording can be post-processed to convert the left and right channels to mid
(M) and side (S), regardless of the mixing or miking technique used.
Mastering engineers sometimes split a stereo recording into its M and S
components and then process them in some way and convert them back to L
and R once again.
The mid component can be derived by adding the left and right channels. In practice, this can be done by placing the two audio channels on two faders and panning both to the center. The L and R channels can also be split and sent to two additional channel pairs. One pair can be panned fully left with the L channel polarity-reversed. The final pair of L and R channels can be panned fully right with the R channel polarity-reversed. See Figure 3.5 for details of the signal routing. Once the signals are split into M and S, we can simply rebalance these two components, or we can apply processing to them independently. The S signal represents signal components that satisfy either of the following conditions:
• They exist only in the L channel or only in the R channel
• They are opposite in polarity, L relative to R
3.7.1 The mid component
The mid signal represents all components of a stereo mix that do not have
opposite polarity between the two channels, that is, anything that is common
to both channels or is only present on one side. As we can see from the block
diagram presented in Figure 3.5, the M component is derived from L + R.
3.7.2 The side component
The side signal is obtained by subtracting the L and R channels: side = L - R.
Anything that is common to L and R will cancel out and will not be part of the
S component. Any signal that is in the center of a mix will be canceled from
the S component.
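The two conversions can be summarized in a few lines. The sketch below scales M and S by one half so that decoding returns the original channel levels; that scaling is a common convention of my own choosing here, and the matrix can also be implemented without it, as in the fader method described above.

import numpy as np

def lr_to_ms(left, right):
    """Encode left/right to mid/side: M = (L + R)/2, S = (L - R)/2."""
    return (left + right) / 2.0, (left - right) / 2.0

def ms_to_lr(mid, side):
    """Decode mid/side back to left/right: L = M + S, R = M - S."""
    return mid + side, mid - side

fs = 44100
t = np.arange(fs) / fs
left = np.sin(2 * np.pi * 220 * t)
right = 0.5 * np.sin(2 * np.pi * 330 * t)

mid, side = lr_to_ms(left, right)

# Rebalancing: boosting S by 3 dB widens the image before decoding back to L/R.
wide_left, wide_right = ms_to_lr(mid, side * 10**(3 / 20))

# A round trip without rebalancing reproduces the original channels exactly.
l2, r2 = ms_to_lr(mid, side)
print(np.allclose(l2, left), np.allclose(r2, right))  # True True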
3.7.3 Exercise: Listen to Mid-Side Processing
The included "TETlisteningMidSide" practice module offers an easy way to
listen to the mid and side components of any stereo recording (AIFF or WAV
file formats) and hear how it sounds if rebalanced. By converting a stereo mix
(L and R) to M and S signals, it is possible to hear elements of the mix that
may have been masked in the full mix. In addition to being able to hear the
stereo reverb better, other artifacts sometimes become evident. Artifacts such as punch-ins, distortion, dynamic range compression, and edits can become more audible if we listen to just the S component. Many stereo mixes have a strong center component, and when you remove that component, you also remove anything in the center of the stereo image. Punch-ins, which tend to be a bigger problem with analog tape recordings, are more audible when listening to the S component in isolation. A punch-in is usually done during an overdub on a multitrack recording, when the lead instrument or voice has recorded a part and wants to fix a certain section of the music; a punch-in is the act of pressing the record button on the recorder for a specific track somewhere in the middle of the piece of music.
By splitting a stereo mix into its M and S components, some of the
differences created by the perceptual encoding process can be highlighted
(for example, MP3 or AAC that has been converted back to AIFF or WAV).
Although most of the artifacts are masked by the stereo audio, removing the
M component makes the artifacts more audible.
Additionally, when listening to the side component at 100%, we are
hearing a correlation of -1 because one speaker produces the original side
component and the other speaker produces an opposite polarity version of
the side component.

Summary
This chapter covers the spatial attributes of sound, focusing primarily on
reverb and mid-side processing. The goal of the reverb software practice
module is to systematically familiarize listeners with aspects of artificial reverb
and increase auditory sensitivity to time-based effects processing. When
comparing two audio scenes by ear, a listener can match one or more artificial
reverb parameters to a reference randomly chosen by the software. Listeners
can move from comparisons that use percussive sound sources and coarse
resolution between parameter values to sustained musical recordings and
finer resolution between parameter values. Often, very small changes in
reverb parameters can have a significant influence on the depth, mix, width
and clarity of the final mix of a sound recording.

Chapter 4
DYNAMIC RANGE CONTROL

Achieving an appropriate balance of a musical ensemble is essential to expressing an artist's musical intent. Conductors and composers understand
the idea of finding the optimal ensemble balance for each performance and
piece of music. If an instrumental part within an ensemble is not loud enough
to be heard clearly, listeners do not receive the full impact of a piece of music.
Overall balance depends on the control of individual vocal and instrumental
amplitudes in an ensemble.
By recording spot microphone signals on multiple tracks and mixing
those tracks, an engineer has some control over musical balance and
therefore musical expression as well. When mixing multiple tracks, it may be
necessary to continually adjust the level of certain instruments or voices to
achieve a constant balance from the beginning to the end of a track.
Dynamic range in the musical sense describes the difference between
the highest and lowest levels of an audio signal. For microphone signals that
have a wide dynamic range, adjusting fader levels over time can compensate
for variations in signal level and therefore maintain a constant perceived
loudness. Fader level adjustments made throughout the duration of a piece are equivalent to manual dynamic range compression; an engineer is manually reducing the dynamic range by boosting levels during quiet sections and attenuating loud sections. Dynamic range controllers (compressors and expanders) adjust levels automatically based on the level of an audio signal and can be applied to individual audio tracks or to a mix as a whole.

One type of sound that can have an extremely wide dynamic range is a
lead vocal, especially when recorded with a nearby microphone. In extreme
cases in pop and rock music, a singer's dynamic range can vary from the
loudest screams to just a whisper, all within a single song. If a vocal track's
fader is set at one level and left for the duration of a piece without
compression, there will be times when the vocals will be too loud and other
times when they will be too quiet. When a vocal level goes too high, it
becomes uncomfortable for the listener who may want to turn down the
entire mix. In the opposite situation, a voice with a level that is too low
becomes difficult to understand, leaving an unsatisfactory musical experience
for the listener. It is probably impossible to find a satisfactory static fader level
without compression for a sound source as dynamic as pop vocals. One way
to compensate for a wide dynamic range is to manually adjust the fader level
for each word or phrase a singer sings. Although some tracks require such
detailed manual control of the fader level, the use of compression is still
useful in reaching the goal of consistent, intelligible and musically satisfying
levels, especially for tracks with a wide dynamic range. Consistent levels for
instruments and vocals help communicate an artist's musical intentions more
effectively.
At the same time, engineers also understand that dynamic contrast is
important to help convey musical emotion. The question arises, if the level of
a vocal track is adjusted so that fortissimo passages are the same volume as
pianissimo passages, how is a listener going to hear any dynamic contrast?
The first part of the answer to this question is that the application of level control depends in part on genre. Most classical music recordings will not benefit as much from this type of active level control. For most other genres
of music, at least some amount of dynamic range control is desirable. And
specifically for pop and rock recordings, the goal is a more limited dynamic
range to be consistent with recordings of this style.
Fortunately, the perception of dynamic range will be maintained due to
timbral changes between quiet and loud dynamic levels. For almost all
instruments, including the voice, there is a significant increase in the number
and strength of high-frequency harmonics as the dynamic level goes from
quiet to loud. So even if the dynamic range of a dynamic vocal performance is
highly compressed, the perception of dynamic range remains due to changes
in the timbre of the voice. Regardless of timbral differences, it is possible to
take dynamic range reduction too far, leaving a musical performance lifeless.
Engineers must be cautious about using too much compression and limiting because it can be quite destructive when used excessively. Once a track is recorded with compression, there is no way to completely undo the effect. Some types of audio processing, such as reciprocal peak/dip equalization,
allow minor alterations to be undone with equal parameters and opposite
gain settings, but compression and limiting do not offer such transparent
flexibility.
Dynamic range control can be thought of as a type of amplitude
modulation where the modulation rate depends on the amplitude envelope
of an audio signal. Dynamic processing is simply a gain reduction applied to a signal, where the gain reduction varies over time based on variations in the signal's level, with the amount of reduction determined by how far the signal level rises above a certain threshold. Compression and expansion are examples of nonlinear processing because the amount of gain reduction applied to a signal depends on the level of the signal itself, and the gain applied changes over time. Dynamic processing such as compression, limiting, expansion, and gating offers means to sculpt and shape audio signals in unique and time-varying ways. It is time varying because the amount of gain reduction varies over time. Dynamic range control can assist in the mixing process not only by smoothing audio signal levels, but also by acting as a glue that helps add cohesion to the various musical parts in a mix.
4.1 Signal detection in dynamic processors
Dynamic processors work with objective audio signal levels, usually measured
in decibels. The first reason to measure in decibels is that the decibel is a
logarithmic scale that is comparable to the way the human auditory system
interprets changes in volume. Therefore, the decibel as a measurement scale
seems to correlate with the perception of sound due to its logarithmic scale.
The second main reason for using decibels is to scale the range of audible
sound levels to a more manageable range. For example, human hearing varies
from the threshold of hearing, at about 0.00002 Pascals, to the threshold of
pain, about 20 Pascals, a range that represents a factor of 1 million. Pascals
are a unit of pressure that measures force per unit area and are abbreviated
as Pa. When this range is converted to decibels, it scales from 0 to 120 dB
sound pressure level (SPL), a much more meaningful and manageable range.
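The conversion from pascals to dB SPL mentioned above is a one-line calculation. Here is a small sketch using the standard 20 micropascal reference pressure:

import math

P_REF = 2e-5  # reference pressure: 20 micropascals, the threshold of hearing

def pascals_to_db_spl(pressure_pa):
    """Convert sound pressure in pascals to dB SPL: 20*log10(p / p_ref)."""
    return 20.0 * math.log10(pressure_pa / P_REF)

print(pascals_to_db_spl(0.00002))  #   0 dB SPL (threshold of hearing)
print(pascals_to_db_spl(20.0))     # 120 dB SPL (threshold of pain)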
To control the level of a track, there needs to be some way to measure
and indicate the amplitude of an audio signal. It turns out that there are many
ways to measure a signal, but they are all generally based on two common representations of the audio signal level: peak level and RMS (root mean square) level. The peak level simply indicates the highest amplitude of a signal at a given time. A commonly found peak level indicator is the meter on a digital recorder, which tells an engineer how close a signal is to digital clipping.
RMS is something like an average signal level, but it is not
mathematically equivalent to the average. With audio signals where there is a
voltage that varies between positive and negative values, an average
mathematical calculation is not going to give any useful information because
the average will always be around zero. The RMS, on the other hand, will give
a useful value and is basically calculated by squaring the signal, taking the
average of a predefined time window and then taking the square root of that.
For sine tones, the RMS value is easily calculated because it will always be 3dB
below the peak level or 70.7% of the peak level. For more complex audio
signals, such as music or speech, the RMS level should
be measured directly from a signal and cannot be calculated by
subtracting 3dB from the peak value. Although RMS and average are not
mathematically identical, RMS can be considered a type of signal averaging,
and we will use the terms RMS and average interchangeably. Figures 4.1, 4.2
and 4.3 illustrate the crest factor, RMS and peak levels for three different
signals.

Figure 4.1 The RMS value of a sine wave is always 70.7% of the peak value, which is equivalent to saying that the RMS
value is 3dB below the peak level. This is only true for a sine wave. Crest factor is the difference between peak and RMS
levels, usually measured in dB. A sine wave has a crest factor of 3dB.

Figure 4.2 A square wave has equal peak and RMS levels, so the crest factor is 0 dB.
Figure 4.3 A pulse wave is similar to a square wave, except that we are shortening the time the signal is at its maximum
level. The length of the pulse determines the RMS level, where a shorter pulse will give a lower RMS level and therefore a
larger crest factor.
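Peak level, RMS level, and crest factor are straightforward to compute. The following sketch (illustrative code of my own, not taken from any metering standard) reproduces the values described in Figures 4.1 and 4.2 for a sine wave and a square wave:

import numpy as np

def peak_db(x):
    """Peak level in dB relative to full scale of 1.0."""
    return 20.0 * np.log10(np.max(np.abs(x)))

def rms_db(x):
    """RMS level in dB: square, average, square root, then convert to dB."""
    return 20.0 * np.log10(np.sqrt(np.mean(x**2)))

def crest_factor_db(x):
    """Crest factor: difference between peak and RMS levels in dB."""
    return peak_db(x) - rms_db(x)

fs = 44100
t = np.arange(fs) / fs
sine = np.sin(2 * np.pi * 100 * t)
square = np.sign(sine)

print(f"sine   crest factor {crest_factor_db(sine):.1f} dB")    # ~3.0 dB
print(f"square crest factor {crest_factor_db(square):.1f} dB")  # ~0.0 dB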

Dynamic range can have a significant effect on the loudness of recorded music. The term loudness is used to describe the perceived level rather than the measured physical sound pressure level. Several factors contribute to perceived loudness, such as the power spectrum and crest factor (the ratio between the peak level and the RMS level). Given two musical recordings with the same peak level, the one with the smaller crest factor will generally sound louder because its RMS level is higher. When judging the loudness of sounds, our ears respond more to average levels than to peak levels.
Dynamic range compression increases the average level through a two-
stage process that begins with a gain reduction of the highest or maximum
levels followed by a linear output gain, sometimes called compensation gain .
Compression and limiting essentially reduce only the peaks (the loudest parts) of an audio signal and then apply a linear gain stage to raise the entire audio signal so that the peaks sit at the maximum level possible for our recording medium (for example, 0 dB full scale [dBFS] for digital audio). The linear gain stage after compression is sometimes called compensation gain, or make-up gain, because it compensates for the peak level reduction; some compressors and limiters apply this gain automatically at the output stage.
process of compression and limiting reduces the crest factor of an audio
signal, and when compensation gain is applied to restore peaks to their
original level, the RMS level also increases, making the overall signal stronger.
Therefore, by reducing the crest factor through compression and limiting, it is
possible to make an audio signal sound louder even if its maximum level has
not changed.
It can be tempting for a novice engineer to normalize a recorded audio
signal in an attempt to make it sound louder. Normalization is a process by
which a digital audio editing program scans an audio signal, finds the highest
signal level for the entire clip, calculates the difference in dB between the
maximum recording level (0dBFS) and the maximum of an audio signal, and
then boosts the entire audio clip by this difference so that the maximum level
reaches 0dBFS. Because engineers generally want to record audio signals so
that peak levels are as close to 0dBFS as possible, they may only get a couple
of decibels of gain at best when normalizing an audio signal. This is one of the
reasons why the process of digitally normalizing a sound file will not
necessarily make a recording sound significantly louder. However, engineers
can make a signal appear louder by using compression and limiting, even if
the peaks are already reaching 0dBFS.
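Peak normalization amounts to measuring the highest sample value and applying a single static gain. Here is a minimal sketch of the process described above (the function names are my own):

import numpy as np

def normalize(x, target_dbfs=0.0):
    """Scale a signal so its peak reaches the target level (default 0 dBFS)."""
    peak = np.max(np.abs(x))
    gain_db = target_dbfs - 20.0 * np.log10(peak)
    return x * 10.0**(gain_db / 20.0), gain_db

# A clip already peaking at -2 dBFS only gains about 2 dB from normalization.
x = 10.0**(-2.0 / 20.0) * np.sin(2 * np.pi * np.linspace(0, 100, 44100))
_, applied = normalize(x)
print(f"gain applied: {applied:.2f} dB")   # ~2.00 dB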
In addition to learning to identify the changes produced by dynamic
range compression, it is also important to learn to identify static changes in
gain. If you increase the overall level of a recording, it is important to be able
to recognize the amount of gain change applied in decibels.

4.2 Compressors and limiters


To reduce the dynamic range of a recording, dynamic processing is used in the
form of compressors and limiters. Typically, a compressor or limiter will
attenuate the level of a signal once it has reached or exceeded a threshold
level.
Compressors and expanders belong to a group of sound processing
effects that are adaptive, meaning that the amount or type of processing is
determined by some component of the signal itself (Verfaille et al., 2006). In
the case of compressors and expanders, the amount of gain reduction applied
to a signal depends on the level of the signal itself or a secondary signal
known as a side-chain or key input . With other types of processing, such as
equalization and reverb, the type, amount, or quality of processing remains
the same, regardless of the characteristics of the input signal.
Depending on the nature of the signal-dependent processing, it may
sometimes be more obvious and sometimes less obvious than non-signal-
dependent processing. Any changes in processing occur synchronously with
changes in the audio signal itself, and these changes may be masked by the
real signal or our auditory system will assume that they are part of the original
sound (as in the case of compression). Alternatively, with signal-dependent quantization error at low bit depths, the distortion (error) is modulated by the signal amplitude and is therefore more noticeable than constant-amplitude noise such as dither, as we will see in Section 5.2.3.
To determine whether a signal level is above or below a specified
threshold, a dynamics processor must use some method to determine the
signal level, such as RMS or peak level detection.
Other forms of dynamic processing increase dynamic range by
attenuating lower amplitude sections of a recording. These types of
processors are often called expanders or gates. Unlike a compressor, an
expander attenuates the signal when it is below the threshold level. The use
of expanders is common when mixing drums for pop and rock music. Each
component of a drum kit is often closely mic'd, but there is still some
"leakage" of the sound from adjacent drums into each mic. To reduce this
effect, expanders or gates can be used to attenuate a microphone's signal
between hits on its respective drum.
There are many different types of compressors and limiters, and each
make and model has its own unique "sound." This sonic signature is based on
a number of factors, such as the signal detection circuit or the algorithm used
to determine the level of an input audio signal and therefore whether to apply
dynamic processing or not, and how much to apply based on the parameters
established by the engineer. In analog processors, the actual electrical
components in the audio signal chain and power supply also affect the audio
signal.
Typically, several parameters can be controlled in a compressor. These
include threshold, ratio, attack time, release time, and knee.
4.2.1 Threshold
An engineer can usually set a compressor's threshold level, although some
models have a fixed threshold level with variable input gain. A compressor
begins to reduce the gain of an input signal as soon as the amplitude of the
signal itself or a sidechain input signal exceeds the threshold. Compressors
with a side-chain or key input can accept an alternative signal input that is
analyzed in terms of its level and used to determine the gain function to be
applied to the main audio signal input. Compression of the input signal is
activated when the sidechain signal rises above the threshold, regardless of
the level of the input signal.
4.2.2 Attack Time
Although the compressor begins to reduce the gain of the audio signal as soon
as its amplitude rises above the threshold, it usually takes some time to
achieve maximum gain reduction. The actual amount of gain reduction
applied depends on the ratio and how far the signal is above the threshold. In
practice, attack time can help an engineer define or round out the attack of a
percussion sound or the beginning of a musical note. By properly adjusting
the attack time, an engineer can help a pop or rock recording sound more
"punchy."
4.2.3 Release Time
Release time is the time it takes for a compressor to stop applying gain
reduction after an audio signal has passed below the threshold. As soon as the
signal level drops below the threshold, the compressor begins to return it to
unity gain and reaches unity gain in the amount of time specified by the
release time.
4.2.4 Knee
The knee describes the transition of the gain function from below the threshold (no gain reduction) to above the threshold (gain reduction). A gradual transition from one to the other is called a soft knee, while an abrupt change at the threshold is known as a hard knee.
4.2.5 Ratio
The compression ratio determines the amount of gain reduction applied once the signal rises above the threshold. It is the ratio between the input level and the output level, in dB, above the threshold. For example, with a compression ratio of 2:1 (input:output), the portion of the output signal that is above the threshold will be half the level (in dB) of the portion of the input signal that is above the threshold. Compressors set to ratios of about 10:1 or more are generally considered limiters. Higher ratios give more gain reduction when a signal exceeds the threshold, and therefore the compression will be more evident.
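The static behavior of threshold and ratio can be expressed directly as an input/output function in dB. The sketch below is a hard-knee illustration of the arithmetic described above; it ignores attack and release, which are covered next, and the default values are arbitrary illustrative choices.

def compressed_output_db(input_db, threshold_db=-20.0, ratio=2.0):
    """Static (steady-state) compressor curve with a hard knee.

    Below the threshold the signal is untouched; above it, each dB of
    input produces only 1/ratio dB of output above the threshold.
    """
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio

for level in (-30, -20, -10, 0):
    print(level, "->", compressed_output_db(level))  # 0 dB in -> -10 dB out at 2:1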
4.2.6 Level detection time
To apply a gain function to an input signal, dynamics processors must
determine the amplitude of an audio signal and compare it to a threshold set
by an engineer. As mentioned above, there are different ways to measure the
amplitude of a signal, and some compressors allow an engineer to switch
between two or three options.
Typically, options differ in how quickly level detection responds to the level of
a signal. For example, peak level detection is good at responding to sharp
transients and RMS level detection responds to less transient signals. Some
dynamic processors (such as the GML 8900 dynamic range controller) have
fast and slow RMS detection settings, where fast RMS averages over a shorter
period of time and is therefore more responsive to transients.
When a compressor is configured to sense levels using slow RMS, it is
impossible for the compressor to respond to very short transients. Because
RMS detection averages over time, a sharp transient will not have much
influence on the average signal level.

4.2.7 Visualization of the output of a compressor


To fully understand the effect of dynamic processing on an audio signal, we
must look beyond the input/output transfer function commonly seen in
explanations of dynamic processors. It can be useful to visualize how a
compressor's output changes over time given a specific type of signal and thus
take into account the ever-critical parameters known as attack and release .
Dynamics processors change the gain of an audio signal over time so they can
be classified as non-linear time-varying devices. They are considered
nonlinear because compressing the sum of two signals will generally result in
something different than compressing the two signals individually and then
adding them (Smith, accessed August 4, 2009).
To see the effect of a compressor on an audio signal, a step function is
required as the input signal. A step function is a type of signal that instantly
changes its amplitude and remains at the new amplitude for a period of time.
By using a step function, it is possible to illustrate how a compressor responds
to an immediate change in the amplitude of an input signal and eventually
stabilizes at its objective gain.
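The following sketch generates a step-like level change and smooths the target gain reduction with separate attack and release time constants, in the spirit of the curves shown in Figure 4.4. It is an illustrative one-pole design of my own, not the detection circuit of any particular compressor.

import numpy as np

def smoothed_gain_reduction(level_db, fs, attack_s, release_s,
                            threshold_db=-6.0, ratio=4.0):
    """Gain reduction (in dB) smoothed with separate attack/release filters.

    The target reduction follows the static threshold/ratio rule; the
    one-pole smoothing determines how quickly the gain moves toward it.
    """
    a_att = np.exp(-1.0 / (attack_s * fs))
    a_rel = np.exp(-1.0 / (release_s * fs))
    gr = np.zeros(len(level_db))
    for n in range(1, len(level_db)):
        over = max(level_db[n] - threshold_db, 0.0)
        target = -over * (1.0 - 1.0 / ratio)      # desired gain reduction in dB
        coeff = a_att if target < gr[n - 1] else a_rel
        gr[n] = coeff * gr[n - 1] + (1.0 - coeff) * target
    return gr

# Step input: the level jumps from -12 dBFS to 0 dBFS for one second and back.
fs = 1000                        # control-rate samples are enough for a plot
level = np.full(2 * fs, -12.0)
level[fs // 2: fs // 2 + fs] = 0.0
gr = smoothed_gain_reduction(level, fs, attack_s=0.01, release_s=0.2)
print(np.round(gr[500:506], 2))  # gain reduction ramping in just after the step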
For the following visualizations, an amplitude-modulated sine wave acts
as a step function (see Figure 4.4a). The modulator is a square wave with a
period of 1 second. The maximum amplitude of the sine wave was chosen to
change between 1 and 0.25. An amplitude of 0.25 is 12 dB lower than an
amplitude of 1.

Figure 4.4 This figure shows the input signal to a compressor (a), an amplitude-modulated sine wave, and the output of the compressor showing the step response for three different attack and release times: long (b), medium (c), and short (d).

Figure 4.4 shows the general attack and release curves found on most
compressors. This type of visualization is not published with the specifications
of a compressor, but we can visualize it by recording the output when we
send an amplitude-modulated sine tone as the input signal. If this type of
measurement were performed on various types of analog and digital
compressors, they would be seen to have a shape similar to what we see in
Figure 4.4. Some compressor models have attack and release curves that look a little different, as in Figure 4.5. This compressor appears to overshoot in the amount of gain reduction during the attack before it settles to a constant level. Figure 4.6 shows an example of an audio signal that has been processed by a compressor and the resulting gain function that the compressor derived, depending on the input signal level and the compressor parameter settings. The gain function shows the amount of gain reduction applied over time, which varies with the amplitude of the input audio signal. The threshold was set at -6 dB, which corresponds to 0.5 in the amplitude of the audio signal, so every time the signal exceeds 0.5 in level (-6 dB), the gain function shows a reduction in level.

Figure 4.5 The same 40 Hz amplitude-modulated sine tone sent through a commercially available analog compressor with an attack time of approximately 50 ms and a release time of 200 ms. Note the difference from the curve in Figure 4.4. There appears to be an overshoot in the amount of gain reduction during the attack before it settles to a constant level. A visual representation of the attack and release times of a compressor like this is not something that would be included in the specifications of a device. The difference that is evident between Figures 4.4 and 4.5 is usually something that an engineer would hear but could not visualize without performing the measurement.
Figure 4.6 From an audio signal ( top ) sent to the input of a compressor, a gain function ( middle ) is derived based on the compressor parameters and the signal level. The resulting audio signal output ( bottom ) from the compressor is the input signal with the gain function applied. The gain function shows the amount of gain reduction applied over time, which varies with the amplitude of the input audio signal. For example, a gain of 1 ( unity gain ) produces no change in level and a gain of 0.5 reduces the signal by 6 dB. The threshold was set to -6 dB, which corresponds to 0.5 in the amplitude of the audio signal, so every time the signal exceeds 0.5 in level (-6 dB), the gain function shows a reduction in level.
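A gain function like the one in Figure 4.6 can be approximated with a simplified sketch: a static threshold/ratio curve followed by one-pole attack and release smoothing. This is an illustrative model, not the algorithm of any specific compressor; the ratio and time constants shown are assumptions, and a real detector would smooth the level before the gain computer.

import numpy as np

def compressor_gain(x, fs, threshold_db=-6.0, ratio=4.0,
                    attack_ms=10.0, release_ms=200.0):
    # Instantaneous level in dB (a real detector would smooth this first)
    level_db = 20 * np.log10(np.abs(x) + 1e-12)
    # Static gain computer: reduce anything above the threshold by the ratio
    over_db = np.maximum(level_db - threshold_db, 0.0)
    target_gain_db = -over_db * (1.0 - 1.0 / ratio)
    # One-pole smoothing with separate attack and release time constants
    a_att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
    gain_db = np.zeros_like(x)
    g = 0.0
    for n in range(len(x)):
        coeff = a_att if target_gain_db[n] < g else a_rel
        g = coeff * g + (1.0 - coeff) * target_gain_db[n]
        gain_db[n] = g
    return 10 ** (gain_db / 20.0)

# The compressor output is then input * gain, as in the bottom panel of Figure 4.6.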

4.2.8 Automated level control via compression


Dynamic range compression can be one of the most difficult types of
processing for the beginning engineer to learn to hear and use. It's probably
hard to hear because often the goal of compression is to be transparent.
Engineers use a compressor to eliminate amplitude inconsistencies in an
instrument or voice. Depending on the nature of the signal being compressed
and the parameter settings chosen, compression can range from very
transparent to completely obvious.
Perhaps another reason novice engineers find it difficult to identify
compression is that almost all recorded sound heard by listeners has been
compressed to some degree. Compression has become such an integral part
of almost all music heard through speakers that listeners can expect it to be a
part of all musical sounds. Listening to acoustic music without sound
reinforcement can help in the ear training process to refresh your perspective
and remember what music sounds like without compression.
Because dynamic processing depends on amplitude variations of an
audio signal, the amount of gain reduction varies with changes in the signal.
With the amplitude modulation of an audio signal synchronized with the
amplitude envelope of the audio signal itself, the modulation can be difficult
to hear because it is not clear whether the modulation was part of the original
signal or not. Amplitude modulation becomes almost inaudible when it
reduces the amplitude of the signal at a rate equivalent to but opposite to the
amplitude variations in an audio signal. Compression or limiting can be heard
most easily by setting a device's parameters to their maximum or minimum
values: a high ratio, a short attack time, a long release, and a low threshold.
If amplitude modulation were applied that did not vary synchronously
with an audio signal, the modulation would probably be much more apparent
because the resulting amplitude envelope would not correlate with what is
happening in the signal and would be heard as a separate event. For example,
with a sine wave modulator, the amplitude modulation is periodic and not
synchronized with any type of musical signal from an acoustic instrument and
is therefore very audible. This is not to say that sine-wave amplitude modulation should always be avoided. Amplitude modulation with a sine
wave can sometimes produce desirable effects in an audio signal, but with
that type of processing, the goal is usually to highlight the effect rather than
make it transparent.
Through the action of gain reduction, compressors can create audible artifacts in which the timbre of a sound changes in an unwanted way; in other circumstances, these artifacts are completely intentional and contribute significantly to the sound of a recording. In other situations, dynamic range control is applied without creating artifacts and without changing the timbre of the sounds. An engineer may want to turn down loud parts in a way that still controls peaks but does not disrupt the audio signal. In any case, an engineer needs to know what artifacts sound like to decide how much or how little dynamic range control to apply to a recording. In many dynamic range controllers, the user-adjustable parameters are interrelated to some extent, and this affects how an engineer uses and hears them.
4.2.9 Manual dynamic range control
Because dynamic range controllers respond to an objective measure of signal level, peak or RMS, rather than a subjective impression such as loudness, the level reduction provided by a compressor may not suit an audio signal as well as desired. The automated dynamic range control of a compressor may not be as transparent as required for a given application. The amount a compressor acts on an audio signal is based on how far it determines the signal rises above a specified threshold, and it applies gain reduction based on that objective measurement of signal level. Objective signal levels do not always correspond to subjective signal levels, and as a result, a compressor may meter a signal as being louder than it is perceived and may therefore apply more attenuation than desired.
When mixing a multitrack recording, engineers worry about the levels,
dynamics, and balance of each track, and they want to be aware of any sound
sources that become masked at any point in a piece. On a more subtle level,
even if a sound source is not masked, engineers strive to find the best
possible musical balance, adjusting as necessary over time and within each
note and musical phrase. Intentional listening helps the engineer find the best
compromise in the overall levels of each sound source. It's often a
compromise because every note from every source is not likely to be heard
with perfect clarity, even with extensive dynamic range control. If each sound source is turned up in succession so that it can be heard above all the others, a mix will end up with the same problems again, so it becomes a balancing act
where priorities must be established. For example, the vocals on a pop, rock,
country, or jazz recording are often the most important element. Generally,
an engineer wants to make sure that every word of a vocal recording is heard
clearly. Vocals are often particularly dynamic in amplitude, and adding a little
dynamic range compression can help bring each word and phrase of a
performance to a more consistent level.
With recorded sound, an engineer can influence the listener's
perspective and perception of a piece of music by using level control on
individual sound sources. A listener can be guided through a musical
performance as instruments and voices are dynamically placed in the
foreground and sent further back, as dictated by the artistic vision of a
performance. Automating the level of each sound source can create a
changing perspective. The listener may not be aware that the levels are being
manipulated, and in fact, engineers often try to make changing levels as
transparent and musical as possible. A listener should only be able to hear
that every moment of a music recording is clear and musically satisfying, not
that continuous level changes are being applied to a mix. Once again,
engineers strive to make the effect of the technology transparent in service of an artistic vision of the music being recorded.

4.3 Timbral effects of compression


In addition to being a utilitarian device for managing the dynamic range of
recording media, dynamic processing has become a tool for altering the color
and timbre of recorded sound. When applied to an entire mix, compression
and limiting can help the elements of a mix blend together. Compressed
musical parts will have what is known in auditory perception as a common
fate because their amplitude changes share a certain similarity. When two or
more elements (e.g., instruments) in a mix have synchronously changing
amplitudes, the auditory system will tend to merge these elements
perceptually. The result is that dynamic processing can help combine
elements of a mix.
In this section, we'll move beyond compression as a basic tool for
maintaining consistent signal levels, to compression as a tool for sculpting a
track's timbre.
4.3.1 Attack time effect
With a compressor set for a slow attack time, in the range of 100 milliseconds or more, with a low threshold and high ratio, we can hear the level of the sound being pulled down after the input signal exceeds the threshold. The audible effect of the sound level dropping at this rate is what is known as pumping, and it may be most audible on sounds with a strong pulse, where the signal rises clearly above the threshold and then falls back below it, such as those produced by drums, other percussion instruments and, sometimes, a double bass. If there is any lower-level sound or background noise along with the compressed main sound, a modulated background sound will be heard. Sounds that have a more constant level, such as a distorted electric guitar, will not exhibit as audible a pumping effect.
4.3.2 Effect of release time
Another related effect is present if a compressor is configured to have a long
release time, in the range of 100 milliseconds or more. Listening again with a
low threshold and high ratio, listen for the sound level to rise again after a strong pulse. The audible effect of sound coming back up in level after a significant
gain reduction is called breathing because it can sound like someone is
breathing. As with the pump effect, you may notice the effect most
prominently in background sounds, hisses, or higher overtones that sound
after a strong pulse.
Although compression tends to be explained as a process that reduces
the dynamic range of an audio signal, there are ways to use a compressor that
can accentuate the difference between transient peak levels and any
sustained resonance that may follow. In essence, what can be achieved with
compression can be similar to dynamic range expansion because loud peaks
or pulses can be highlighted relative to the quieter sounds that immediately
follow them. It may seem completely counterintuitive to try to think of
compressors that perform dynamic range expansion, but in the next section
we'll look at what happens when you experiment with various attack times.
4.3.3 Compression and drums
A recording with a strong pulse, such as drums or percussion, with a regularly
repeating transient, will trigger gain reduction in a compressor and can serve
as a useful type of sound to highlight the effect of dynamic processing. When
processing a stereo mix of an entire drum kit through a compressor at a fairly
high 6:1 ratio, the attack and release times can be adjusted to hear their effect
on the drum sound. On a typical uncompressed snare drum recording, there is
a natural attack or start, perhaps some sustain, and then a decay. The
compressor can influence all of these properties depending on how the
parameters are set. The attack time has the greatest influence on the onset of
the drum sound, allowing an engineer to reshape this particular characteristic
of the sound. By increasing the attack time from a very short time to a much
longer time, the start of each drum hit is audibly affected. A very short attack
time can eliminate the sensation of a sudden onset. By increasing the attack
time, the onset begins to gain prominence and may actually be accentuated slightly compared to the uncompressed version.
Let's explore the sonic effect on a drum kit when heard through a
compressor with a low threshold, high ratio, and very short attack time (e.g.
down to 0 milliseconds). With such a short attack time, transients drop in level
immediately, almost at the rate at which the input level increases for each
transient. When the rate of gain reduction nearly matches the rate at which a
transient signal increases in level, the transient nature of a signal is
significantly reduced. So, with very short attack times, transients are lost
because gain reduction causes the level of a signal to drop at almost the same
rate that the signal was originally rising during a transient. As a result, the
initial attack of a transient signal is reduced to the level of the sustained or
resonant part of the amplitude envelope. Very short attack times can be
useful in some cases, such as limiters used to prevent clipping. For shaping
drum and percussion sounds, short attack times are quite destructive and
tend to suck the life out of the original sounds.
By lengthening the attack time to a few milliseconds, a clicking sound
arises at the beginning of a transient. The click is produced by the few milliseconds of original audio that pass through before the gain reduction takes effect, and
the timbre of the click is directly dependent on the duration of the attack
time. The abrupt gain reduction modifies the attack of a drum hit.
By increasing the attack time further, the onset sound begins to gain
prominence relative to the sustain and decay portions of the sound, and may
be more accentuated than without processing. When compressing low
frequency drums like a kick drum, an increase in attack time will increase the
presence of low frequency harmonics. Because low frequencies have longer
periods, a longer attack time will allow more cycles of a low frequency sound
to occur before gain reduction and therefore the low frequency content will
be more audible on each rhythmic bass pulse.
The release time mainly affects the sound decay. The decay part of the
sound is the one that gets quieter after the loud start. If the release time is
long, the compressor gain does not quickly return to unity after the signal
level has fallen below the threshold (which occurs during decay). With a long
release time, the natural decay of the drum sound is significantly reduced.
When compressing a mix of a full drum kit, it becomes more apparent that
the attack time is affecting the spectral balance of the overall sound.
Increasing the attack time from a very short value to something longer
increases the low frequency energy coming from the kick drum. As the attack
time lengthens from near zero to several tens or hundreds of milliseconds, the
spectral effect is similar to adding a low-shelf filter to the mix and increasing
the low-frequency energy.
4.3.4 Compression and vocals
Because vocal performances tend to have a wide dynamic range, engineers
often find that some type of dynamic range control helps them achieve their
artistic goals for a given recording. Compression can be very useful for
reducing the dynamic range and de-essing treatment of a vocal track.
Unfortunately, compression does not always work as transparently as desired,
and artifacts from a compressor's automated gain control sometimes appear.
A couple of simple tips can help reduce dynamic range without adding
too many side effects that can detract from a performance:
• Use low ratios . The lower the ratio, the less gain reduction will be
applied. Ratios of 2:1 are a good starting point.
• Use more than one compressor in series . By chaining two or three
compressors in series on a voice, each set to a low ratio, each
compressor can provide some gain reduction and the effect is more
transparent than using a single compressor to do all the gain reduction.
To help identify when compression is being applied too aggressively,
listen for changes in timbre while watching the gain reduction meter on the compressor. If there is any change in timbre that is synchronized with the gain
reduction, the solution may be to reduce the ratio or increase the threshold
or both. Sometimes a track can sound a little darker during extreme gain
reduction, and it can be easier to identify synchronous changes when looking
at a compressor's gain reduction meter.
A slight pop at the beginning of a word or phrase may indicate that the
attack time is too slow. Generally, a very long attack time is not effective on a
vocal, as it has the effect of accentuating the attack of a vocal and can be
distracting.
Compression of a voice typically highlights low-level details in a vocal
performance, such as breaths and "s" sounds. A de-esser, which can reduce
the “s” sound, is simply a compressor that has a high-pass filtered version of
the voice (around 5 kHz) as its sidechain or key input. De-essers tend to work
most effectively with very fast attack and release times.
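A rough sketch of the de-essing idea just described follows: the gain is driven by a high-pass-filtered copy of the vocal (the sidechain), but the gain reduction is applied to the full-band signal. The filter frequency, threshold, ratio, and envelope window are illustrative assumptions to be adjusted by ear.

import numpy as np
from scipy.signal import butter, sosfilt

def de_ess(vocal, fs, hp_freq=5000.0, threshold_db=-30.0, ratio=4.0):
    # Sidechain: a high-pass filtered copy of the voice drives detection
    sos = butter(4, hp_freq, btype='highpass', fs=fs, output='sos')
    sidechain = sosfilt(sos, vocal)
    # Very fast envelope follower (about 2 ms), in line with the fast
    # attack and release suggested for de-essing
    window = max(1, int(0.002 * fs))
    envelope = np.sqrt(np.convolve(sidechain ** 2,
                                   np.ones(window) / window, mode='same'))
    level_db = 20 * np.log10(envelope + 1e-12)
    over_db = np.maximum(level_db - threshold_db, 0.0)
    gain = 10 ** (-over_db * (1.0 - 1.0 / ratio) / 20.0)
    return vocal * gain   # gain reduction applied to the full-band signal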
4.4 Expanders and gates
4.4.1 Threshold
Expanders modify the dynamic range of an audio signal by attenuating it when
its level falls below a predefined threshold, unlike compressors, which act on
signal levels above a threshold. Gates are extreme versions of expanders and
generally silence a signal when it falls below a threshold. Figure 4.7 shows the
effect of an expander on an amplitude-modulated sine wave. Like
compressors, expanders typically have sidechain inputs that can be used to
drive the gain of an audio signal with a secondary signal. For example, engineers sometimes gate a low-frequency sine tone (around 40 or 50 Hz) with a kick drum signal sent to the sidechain input of the gate. This results in the sine
tone sounding only when the kick drum hits and the two can be mixed to
create a new timbre.
Figure 4.7 This figure shows the input signal to an expander (a), an amplitude-modulated sine wave, and the output of the expander, illustrating the step response for three different attack and release times: short (d), medium (c), and long (b).
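The kick-keyed gate described above can be sketched as follows. The threshold and time constants are assumptions, not values from any particular gate; the essential point is that the key signal, not the tone itself, opens and closes the gate.

import numpy as np

def keyed_gate(tone, key, fs, threshold=0.2, attack_ms=1.0, release_ms=80.0):
    a_att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
    gain = np.zeros_like(tone)
    g = 0.0
    for n in range(len(tone)):
        target = 1.0 if abs(key[n]) > threshold else 0.0  # open only on kick hits
        coeff = a_att if target > g else a_rel            # fast open, slower close
        g = coeff * g + (1.0 - coeff) * target
        gain[n] = g
    return tone * gain   # the low-frequency tone sounds only while the kick plays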

Most of the controllable parameters in an expander are similar in function to those of a compressor, with a couple of exceptions: attack and release times. These two parameters should be considered in relation to the level of an audio signal, rather than in relation to gain reduction.
4.4.2 Attack Time
The attack time of an expander is the amount of time it takes for an audio signal to return to its original level once it has risen back above the threshold. As with a compressor, attack time is the amount of time it takes to make a gain change after a signal exceeds the threshold. In the case of a compressor, a signal is attenuated above the threshold; with an expander, a signal returns to unity gain above the threshold.
4.4.3 Release Time
The release time on an expander is the amount of time it takes for an audio
signal to completely attenuate once it has fallen below the threshold. In
general, for compressors and expanders alike, the release time does not define a particular direction of level control (boost or cut); it is defined with respect to the signal level relative to the threshold.
4.4.4 Viewing the output of an expander
Figure 4.7 shows the effect that an expander has on the amplitude of a step
function; in this case, it is an amplitude-modulated sinusoidal tone. Figure 4.8
shows a clip of a music recording with the gain function derived from the
audio signal and parameter settings and the resulting output audio signal. Low
level sections of an audio signal are further reduced in the expanded audio
signal.
Figure 4.8 From an audio signal ( top ) sent to the input of an expander, a gain function ( center ) is derived based on the expander parameters and the signal level. The resulting audio signal output ( bottom ) from the expander is the input signal with the gain function applied. The gain function shows the amount of gain reduction applied over time, which varies with the amplitude of the input audio signal. For example, a gain of 1 (unity gain) produces no change in level and a gain of 0.5 reduces the signal by 6 dB. The threshold was set at -6 dB, which corresponds to 0.5 in the amplitude of the audio signal, so every time the signal drops below 0.5 in level (-6 dB), the gain function shows a reduction in level.

4.5 Introduction to practice


The recommendations on Introduction to Practice in Section 2.3 are
applicable to all software exercises described in the book, and the reader is
encouraged to review those recommendations regarding the frequency and
duration of practice.
The overall functionality of the software practice modules focused on dynamic processing, "TETpracticeDyn" and "TETpracticeExp", is very similar to that of the equalization module. With the focus on dynamics, there are different parameters and sound qualities to explore, just as with equalization.
The dynamics modules allow you to practice with up to three test
parameters at a time: attack time, release time and ratio. The practice can
occur with each parameter alone or in combination with one or two of the
other parameters, depending on which “Parameter Combination” is chosen.
The threshold is completely variable for all exercises and controls the
threshold for both the computer generated "Question" and "Your Answer".
Because the signal level of a sound recording will determine how long a signal
spends above a threshold, and it is not known how the level of each recording
will relate to a given threshold, it is best to keep a threshold completely
variable.
In the compressor module, the threshold level should initially be set quite low so that the effect of compression is more audible. A makeup gain fader is included so that the subjective levels of the compressed and bypassed signals can be approximately matched by ear if desired.
In the case of the expander module, a higher threshold will cause the
expander to produce steeper level changes. Additionally, the input level can
be reduced to further highlight dynamic level changes.
The Difficulty Level option controls the number of options available for
a given parameter. With higher levels of difficulty, a greater number of
parameter options are available within each value range.
The parameter combination determines which parameters will be
included in a given exercise. When working with a parameter combination
that tests only one or two parameters, the remaining user-controllable
parameters that are not being tested will control the processing of the
"Question" and "Your Answer" compressors.
The dynamic range control practice modules are the only ones in the
entire collection where the computer can choose “no compression” as a
possible question. Practically, this means that a 1:1 ratio can be chosen, but
only when the parameter combination includes "ratio" as one of the options.
When you encounter a question where no dynamic range control is heard,
indicate this by selecting a ratio of 1:1, which is equivalent to bypassing the
module. If a question has a 1:1 ratio, all other parameters will be ignored in
the calculation of the question and average scores.
Figure 4.9 shows a screenshot of the dynamic range compression
software practice module.

Figure 4.9 A screenshot of the software user interface for the Technical Ear Trainer practice module for dynamic range
compression.

4.5.1 Types of practice


There are three types of practice in the dynamics software practice module: Matching, Matching Memory, and Absolute Identification:
• Matching . Working in Matching mode, the goal is to duplicate the
dynamic processing that has been applied by the software. In this
mode, the user is free to toggle between "Question" and "Your Answer"
to determine whether the chosen dynamic processing matches the
unknown processing applied by the computer.
• Matching Memory . Similar to Matching, this mode allows you to freely
switch between "Question", "Your Answer" and "Bypass" until one of
the question parameters is changed. At that point, the “Question” can
no longer be selected and its sound should have been memorized well
enough to determine if the answer is correct.
• Absolute Identification . This practice mode is the most difficult and
requires identification of applied dynamics processing without having
the opportunity to hear what is chosen as the correct answer. You can
only hear "Bypass" (no dynamics processing) and "Question" (the
processing parameters chosen randomly by the computer); you cannot audition "Your Answer."
4.5.2 Sound source
Any sound recording in AIFF or WAV format at a sampling rate of 44,100 or 48,000 Hz can be used for practice. There is also the option to listen to the
sound source in mono or stereo. If a loaded sound file contains only one audio
track (instead of two), the audio signal will be sent out of the left output only.
Pressing the mono button will send audio to the left and right output
channels.
4.5.3 Recommended recordings for practice
Some artists are making multitrack recordings available for purchase or free download. Single drum sounds are useful to start training with, and then it makes sense to advance to drum kits, as well as other lead instruments and vocals. There are some websites with free sound samples and loops that you can use for practice, such as www.freesound.org, www.realworldremixed.com/download.php and www.royerlabs.com, among
many others. There are also excerpts or loops of various solo instruments
included with GarageBand and Apple's Logic that can be used with the
software.
Summary
This chapter discusses the functionality of compressors and expanders and
their sonic effects on an audio signal. Dynamic range controllers can be used
to smooth out fluctuating levels in a track or to create interesting timbre
modifications not possible with other types of signal processing. Practice
modules of the compression and expansion software are described and can
be used by listeners to practice listening to the sonic effects of various
parameter settings.
Chapter 5
DISTORTION AND NOISE

In the recording process, engineers regularly encounter technical problems that inadvertently introduce noise or degrade audio signals. For the attentive listener, such events break the illusion of transparent audio technology, revealing the music as a recorded performance and reminding them that they are listening to a recording mediated by a once invisible but now clearly apparent
technology. It becomes more difficult for a listener to fully enjoy any artistic
statement when technological choices add unwanted sonic artifacts. When
recording technology contributes negatively to a recording, the listener's
attention focuses on the artifacts created by the technology and away from
the musical performance. There are many levels and types of sonic artifacts
that can detract from a sound recording, and gaining experience in critical
listening promotes greater sensitivity to various types of noise and distortion.
Distortion and noise are the two general categories of sonic artifacts
that engineers often try to avoid or use for creative effect. They can be
present in a variety of levels or intensities, so it is not always easy to detect
lower levels of distortion or unwanted noise. In this chapter we focus on
extraneous noises that sometimes find their way into a recording, as well as
some forms of distortion, both intentional and unintentional.

5.1 Noise
Although some composers and performers intentionally use noise for artistic
effect, we will discuss some of the types of noise that are unwanted and
therefore detract from the quality of a sound recording. Through inadequate
grounding and shielding, loud outside sounds, radio frequency interference,
and heating, ventilation, and air conditioning (HVAC) noise, there are many
sources and types of noise that engineers seek to avoid when recording in the
studio. Often the noise is at a low but still audible level and therefore will not
register significantly on a meter, especially in the presence of musical audio
signals.
Some of the various sources of noise include the following:
• Clicks . Transient sounds resulting from equipment malfunction or digital synchronization errors
• Pops . Low-frequency thumps resulting from vocal plosives
• Hum and buzz . Sounds originating from poorly grounded systems
• Hiss, which is essentially low-level white noise . Sounds originating from analog electronics, dithering, or analog tape
• Extraneous acoustic sounds . Sounds that are not intended to be recorded but exist in a recording space, such as air handling systems or sound sources outside of a recording room.

5.1.1 Clicks
Clicks are several types of short-duration transient sounds that contain
significant high-frequency energy. They can originate from malfunctioning
analog equipment, from the act of connecting or disconnecting analog signals
in a connection bay, or from synchronization errors in the interconnection of
digital equipment.
Clicks that result from malfunctioning analog equipment can often be
random and sporadic, making it difficult to identify their exact source. In this
case, meters can be useful to indicate which audio channel contains a click,
especially if the clicks occur in the absence of program material. A visual
indication from a meter with peak hold can be invaluable in tracking down problematic equipment.
With digital connections between devices, it is important to ensure that
sample rates are identical across all interconnected devices and that clock
sources are consistent. Without properly selected clock sources in digital
audio, clicks are almost inevitable and will likely occur at regular intervals,
usually spaced by several seconds. Clicks originating from inadequate clock
sources are usually quite subtle and require vigilance to audibly identify.
Depending on the digital interconnections in a studio, the clock source for each device must be set correctly to internal, digital input, or word clock.
5.1.2 Pops
Pops are low-frequency transient sounds that have a thump-like quality. Typically, pops occur as a result of vocal plosives produced in front of a microphone. Plosives are consonant sounds, such as those resulting from pronouncing the letters p , b and d , in which a burst of air is created along with the sound. A burst of air from a plosive reaching the microphone capsule produces a low-frequency sound similar to a thump. Typically, engineers try to counteract popping during vocal recording by placing a pop filter in front of the vocal microphone. Pop filters are usually made of thin fabric stretched over a circular frame.
Pops are not something heard from a singer acoustically in the same space; the pop artifact is simply the result of a microphone placed near a vocalist's mouth responding to a burst of air. Pops can distract listeners from a vocal performance because they do not expect to hear a low-frequency thump from a singer. Typically, engineers can remove a pop with a high-pass filter inserted only for the brief moment while the pop sounds.
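One way to sketch that repair is to apply a high-pass filter only over the short region containing the pop. The cutoff frequency and region bounds are assumptions, and in practice the region edges would be crossfaded rather than switched abruptly.

import numpy as np
from scipy.signal import butter, sosfilt

def filter_pop(vocal, fs, start_s, end_s, cutoff_hz=120.0):
    # High-pass filter applied only over the region containing the pop
    sos = butter(2, cutoff_hz, btype='highpass', fs=fs, output='sos')
    out = vocal.copy()
    i0, i1 = int(start_s * fs), int(end_s * fs)
    out[i0:i1] = sosfilt(sos, vocal[i0:i1])
    return out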
5.1.3 Hum and Buzz
Poorly grounded analog circuits and signal chains can introduce noise in the form of hum or buzz into analog audio signals. Both are related to the
frequency of alternating current (AC) electrical power sources, which in some
places is called grid frequency. The frequency of a power source will be 50 Hz
or 60 Hz depending on the geographic location and the power source being
used. Power distribution in North America is 60 Hz, in Europe it is 50 Hz, in
Japan it will be 50 or 60 Hz depending on the specific location within the
country, and in most other countries it is 50 Hz.
When there is a ground problem, a hum or buzz is generated with a
fundamental frequency equal to the alternating current frequency of the
power source, 50 or 60 Hz, with additional harmonics above the fundamental.
A hum is identified as a sound that contains primarily lower harmonics and a
buzz as containing more prominent upper harmonics.
Engineers want to make sure they identify any hum or buzz before
recording when the problem is easier to solve. It is possible to try to remove
these noises in post-production, but it will take longer. Because a hum or buzz
includes numerous harmonics of 50 or 60 Hz, multiple notch filters, each tuned to one harmonic, are needed to effectively eliminate all of the offending sound. Although we are not going to discuss the exact technical and wiring problems that can cause hum and buzz and how these problems might be resolved, there are many excellent references that cover the topic in great detail, such as Giddings' book Audio System Design and Installation (1990).
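As a minimal sketch of that post-production approach, a cascade of narrow notch filters can be tuned to the power-line fundamental and its harmonics. The Q and number of harmonics here are assumptions to be adjusted by ear.

import numpy as np
from scipy.signal import iirnotch, tf2sos, sosfilt

def remove_hum(audio, fs, mains_freq=60.0, n_harmonics=8, q=35.0):
    sections = []
    for k in range(1, n_harmonics + 1):
        f0 = mains_freq * k
        if f0 >= fs / 2:
            break
        b, a = iirnotch(f0, q, fs=fs)   # one narrow notch per harmonic
        sections.append(tf2sos(b, a))
    sos = np.vstack(sections)
    return sosfilt(sos, audio)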
Raising monitor levels while musicians are not playing often exposes
any low-level ground hum that may be present. If dynamic range compression
is applied to an audio signal and the gain reduction is compensated with
compensation gain, low-level sounds, including background noise, will be
raised to a more noticeable level. If an engineer can detect any ground hum
before reaching that stage, the recording will be cleaner.
5.1.4 Extraneous acoustic sounds
Despite the hope that recording spaces will be perfectly quiet, there are often
numerous sources of noise both inside and outside a recording space that
must be addressed. Some of these are relatively constant, steady-state
sounds, such as air handling noise, while other sounds are unpredictable and
somewhat random, such as car horns, people talking, footsteps, or the noise of thunderstorms.
With most of the population concentrated in cities, sound insulation
can be particularly challenging as noise levels rise and our physical proximity
to others increases. In addition to airborne noise, there is also structure-
borne noise, where vibrations are transmitted through building structures
and end up producing sound in a recording space.

5.2 Distortion
Although engineers generally want to avoid or eliminate noises like those
listed above, distortion, on the other hand, can be used creatively as an
effect, or it can appear as an unwanted artifact of an audio signal. Sometimes
distortion is applied intentionally, such as to an electric guitar signal, to
enhance the timbre of a sound, adding to the palette of options available for
musical expression. At other times, an audio signal may be distorted due to
incorrect parameter settings, faulty equipment, or low-quality equipment.
Whether distortion is intentional or not, an engineer must be able to identify
when it is present and shape it for artistic effect or eliminate it, depending on
what is appropriate for a given recording.
Fortunately, engineers have tools to help identify when a signal is clipped in an objectionable manner. Digital meters, peak meters, clip lights, or
other signal strength indicators are present on most input stages of analog-
to-digital converters, microphone preamplifiers, and many other gain stages.
When a gain stage is overdriven or a signal is clipped, a bright red light
provides a visual indication as soon as a signal rises above a clip level and
remains on until the signal drops below the clip level. A visual indication in the form of a flash of light, synchronized with the onset and duration of
a distorted sound, reinforces an engineer's awareness of signal degradation
and helps identify if and when a signal has clipped. Unfortunately, when
working with a large number of microphone signals, it can be difficult to
capture every flash of a clip light, especially in the analog domain. Digital
meters, on the other hand, allow for peak hold so that if a clip indicator light
is not visible at the time of clipping, it will continue to indicate that a clip has
occurred until manually reset by an engineer. For momentary clip indicators,
it is much more important to rely on what you hear to identify overloaded
sounds because it can be easy to miss the flash of a red light.
In the process of recording any musical performance, engineers
configure microphone preamplifiers to deliver the highest possible recording
level, as close to the clipping point as possible without exceeding it. The goal is to maximize the signal-to-noise ratio, or the ratio of signal to quantization error, by recording a signal whose peaks reach the maximum recording level, which in digital audio is 0 dB full scale. The problem is that the exact peak level of a musical
performance is not known until after it has occurred. Engineers set the
preamp gain based on a representative sound test, giving them some
headroom in case the peaks are higher than expected. When actual musical
performance occurs after a sound check, often the peak level will be higher
than during the sound check because the musicians may be performing at a
more enthusiastic and higher dynamic level than during the sound check.
Although it is ideal to have a sound check, there are many cases where
engineers don't have the opportunity to do so and must jump straight into
recording, hoping their levels are set correctly. They have to be especially
concerned about monitoring signal levels and detecting any signal clipping in
these types of situations.
There is a range of sounds or sound qualities that we can describe as distortion in a sound recording. We can expand this broad category and describe several types of distortion:
• Hard clipping or overload . This sounds harsh and is the result of a
signal's peaks squaring off when the level exceeds the maximum input
or output level of a device.
• Soft clipping or overdrive . Sounding less harsh and often more
desirable for creative expression than hard clipping, it usually results
from the activation of a specific type of circuit designed to introduce
soft clipping, such as a guitar amplifier.
• Distortion due to quantization error . The result of quantization with a low number of bits in PCM digital audio (e.g., conversion from 16 bits per sample to 8 bits per sample). Note that we are not talking about low-bit-rate perceptual encoding, but simply reducing the number of bits per sample used to quantize the signal amplitude.
• Perceptual encoder distortion . There are many different artifacts, some
more audible than others, that can occur when encoding a PCM audio
signal into a data-reduced version (for example, MP3 or AAC). Lower bit
rates show more distortion.
There are many forms and levels of distortion that can be present in
the reproduced sound. All sound reproduced by speakers is distorted to some
extent, however insignificant. Equipment with exceptionally low distortion
can be particularly expensive to produce, and therefore most average
consumer audio systems feature slightly higher distortion levels than those
used by professional audio engineers. Audio engineers and audiophile
enthusiasts go to great lengths (and expense) to reduce the amount of
distortion in their signal chain and speakers.
Most other commonly available sound reproduction devices, such as
intercoms, telephones, and inexpensive headphones connected to digital
music players, have audible distortion. For most situations, such as voice
communication, as long as the distortion is low enough to maintain
intelligibility, distortion is not really a problem. For inexpensive audio
playback systems, the level of distortion is generally not detectable by the
untrained ear. This is part of the reason for the massive success of MP3 and
other perceptually encoded audio formats found in Internet audio; most casual listeners don't notice the distortion and quality loss, but file sizes are
much more manageable and audio files are much more easily transferable
over a computer network connection than their PCM equivalents.
Distortion is usually caused by amplification of an audio signal beyond
the maximum output level of an amplifier. Distortion can also occur by
increasing the level of a signal beyond the maximum input level of an analog-
to-digital converter (ADC). When an ADC attempts to represent a signal
whose level is above 0 dB full scale (dB FS), called an over, the result is harsh-sounding signal distortion.

5.2.1 Hard Clipping and Overload


Hard clipping occurs when too much gain is applied to a signal and it attempts to go beyond the limits of a device's maximum input or output level. Peak levels
greater than a device's maximum allowable signal level are flattened, creating
new harmonics that were not present in the original waveform. For example,
if a sine wave is clipped as in Figure 5.1, the result is a square wave as in Figure 5.2, whose time-domain waveform now contains sharp edges and
whose frequency content contains additional harmonics. A square wave is a
specific type of waveform that is made up of odd harmonics (1st, 3rd, 5th,
7th, and so on). One result of distortion is an increase in the number and
levels of harmonics present in an audio signal. The technical specifications of
a device often indicate the total harmonic distortion for a given signal level,
expressed as a percentage of the overall signal level.
Because of the additional harmonics added to a signal when it is
distorted, the sound takes on greater brightness and harshness. Clipping a
signal flattens the peaks of a waveform, adding sharp corners to a clipped
peak. The new sharp corners in the time-domain waveform represent
increased high-frequency harmonic content in the signal, which would be
confirmed by frequency-domain analysis and plotting of the signal.
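A short sketch of that frequency-domain confirmation: hard-clip a 1 kHz sine and inspect the levels of the new odd harmonics. The drive amount and frequencies checked are illustrative choices.

import numpy as np

fs = 48000
t = np.arange(fs) / fs
sine = np.sin(2 * np.pi * 1000 * t)
clipped = np.clip(2.5 * sine, -1.0, 1.0)          # overdriven, then flattened at +/- 1

spectrum = np.abs(np.fft.rfft(clipped * np.hanning(len(clipped))))
freqs = np.fft.rfftfreq(len(clipped), 1.0 / fs)
for f in (1000, 3000, 5000, 7000):                # odd harmonics appear
    idx = np.argmin(np.abs(freqs - f))
    level = 20 * np.log10(spectrum[idx] / spectrum.max())
    print(f, "Hz:", round(level, 1), "dB")

The unclipped sine would show essentially nothing at 3, 5, and 7 kHz; the clipped version shows clear energy at each odd harmonic, which is heard as added brightness and harshness.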

5.2.2 Soft Clipping


A milder form of distortion known as soft clipping or overdrive is often used to
achieve a creative effect on an audio signal. Its timbre is less harsh than
clipping, and as you can see in Figure 5.3 , the shape of a saturated sine wave
does not have the sharp corners that are present in a hard-clipped sine wave (
Figure 5.2 ). As known from frequency analysis, sharp corners and steep
vertical portions of a clipped sine waveform indicate the presence of high-
frequency harmonics.
Hard clipping distortion occurs when the amplitude of a signal is raised
above the maximum output level of an amplifier. With gain stages like solid-
state mic preamps, there is an abrupt change from linear gain before clipping
to non-linear distortion. Once a signal reaches the maximum level of a gain
stage, it cannot go any higher, regardless of the input level increase;
therefore, there are flattened peaks as in Figure 5.2 . It is the abruptness of
the change from clean amplification to hard clipping that introduces such
harsh distortion.
In the case of soft clipping, there is a gradual transition, rather than an
abrupt change, between the linear gain and the maximum output level. When
a signal level is high enough to reach the transition range, there is some
flattening of the signal peaks (as in Fig. 5.3), but the result is less severe than with hard clipping.
Especially in pop and rock music recordings, there are examples of the
creative use of soft clipping and saturation that enhance sounds and create
new and interesting timbres.
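A common way to model the gradual transition described above is a smooth saturating curve such as tanh. This is an illustrative choice, not a model of any particular device; comparing it with an abrupt clip shows the difference between the two transfer characteristics.

import numpy as np

def hard_clip(x, limit=1.0):
    return np.clip(x, -limit, limit)              # abrupt flattening at the limit

def soft_clip(x, drive=2.0):
    return np.tanh(drive * x) / np.tanh(drive)    # gradual transition into saturation

t = np.arange(48000) / 48000.0
sine = 1.5 * np.sin(2 * np.pi * 1000 * t)         # pushed past the nominal limit
hard = hard_clip(sine)
soft = soft_clip(sine)
# The soft-clipped waveform has rounded rather than squared-off peaks,
# corresponding to weaker high-frequency harmonics and a less harsh timbre.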

Figure 5.3 A sine wave at 1 kHz that has been soft clipped or overdriven. Note how the shape of the waveform is somewhere in between that of the original sine wave and a square wave.
5.2.3 Quantization error distortion
In the process of converting an analog signal into a digital PCM
representation, the analog amplitude levels for each sample are quantized
into a finite number of steps. The number of data bits stored per sample
determines the number of possible quantization steps available to represent
analog voltage levels. An analog-to-digital converter records and stores
sample values using binary digits or bits, and the more bits available, the
more quantization steps are possible.
The Red Book standard for CD-quality audio specifies 16 bits per
sample, which represents 2^16 or 65,536 possible steps from the highest
positive voltage level to the lowest negative value. Higher bit depths are
usually chosen for the initial stage of a recording. Given the option, most
recording engineers will record using at least 24 bits per sample, which
corresponds to 2^24 or 16,777,216 possible amplitude steps between the
highest and lowest analog voltages. Even if the final product is only 16-bit, it
is best to initially record at 24-bit because any gain changes or signal
processing applied will require requantization. The more quantization steps
that are available to begin with, the more accurate the representation of an
analog signal will be.
Each quantized step of linear PCM digital audio is an approximation of
the original analog signal. Because it is an approximation, there will be a
certain amount of error in any digital representation. Quantization error is essentially a distortion of the audio signal. Engineers typically minimize quantization error distortion by applying dither or noise shaping, which
randomizes the error. With the random error produced by dither, distortion is
replaced by constant noise which is generally considered preferable to
distortion.
What is interesting about the amplitude quantization process is that
the signal-to-error ratio drops as the signal level is reduced. In other words,
the error becomes more significant for lower level signals. For every 6 dB that
a signal is below the maximum digital audio recording level (0 dB FS), 1 bit of
binary representation is lost. For each bit lost, the number of quantization
steps is reduced by half. A signal recorded at 16 bits per sample at an
amplitude of -12 dB FS will only use 14 of the 16 available bits, representing a total of 16,384 quantization steps.
Although the signal peaks of a recording may be close to the 0 dB FS
level, there are often other lower level sounds within a mix that may suffer
from more quantization errors. Many recordings that have a wide dynamic
range can include significant portions where the audio signals move at a level
well below 0 dB FS. An example of low-level sound within a recording is
reverb and the sense of space it creates. With excessive quantization error,
perhaps as a result of reducing the bit depth, some of the sense of depth and
width conveyed by the reverb is lost. By randomizing the quantization error
with the use of dither during bit depth reduction, some of the lost sense of
space and reverb can be regained, but at the cost of additional noise.
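A minimal sketch of bit-depth reduction with and without TPDF dither makes the trade-off audible: undithered requantization produces signal-correlated distortion, while dither replaces it with a constant, low-level noise floor. The tone level and bit depth chosen here are assumptions.

import numpy as np

def requantize(x, bits, dither=False):
    steps = 2 ** (bits - 1)          # quantization steps per polarity
    if dither:
        # Triangular (TPDF) dither of about +/- 1 LSB randomizes the error
        noise = (np.random.rand(len(x)) - np.random.rand(len(x))) / steps
        x = x + noise
    return np.round(x * steps) / steps

fs = 48000
t = np.arange(fs) / fs
quiet_tone = 0.01 * np.sin(2 * np.pi * 440 * t)   # low-level signal, where error is worst
plain_8bit = requantize(quiet_tone, 8)
dithered_8bit = requantize(quiet_tone, 8, dither=True)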

5.2.4 Software module exercises


The included "TETpracticeDist" software module, which focuses on distortion,
allows the listener to practice listening to three different types of distortion:
soft clipping, hard clipping, and bit depth reduction distortion.
There are two main types of practice with this software module: Matching and Absolute Identification. The overall functioning of the software is similar
to other modules discussed above.
5.2.5 Perceptual encoder distortion
Perceptual audio coding significantly reduces the amount of data required to
represent an audio signal with minimal degradation of audio quality. In this
section we discuss lossy audio data compression, which removes audio during
the encoding process. There are also lossless encoding formats that reduce
the size of an audio file without removing any audio. Lossless encoding is
comparable to the ZIP computer file format, where the file size is reduced but
the actual data is not removed.
When converting a linear PCM digital audio file to a compressed lossy
format such as MP3, approximately 90% of the data used to represent the digital audio signal is removed, yet the encoded version still sounds similar to the original uncompressed PCM audio file. Differences in sound quality between an
encoded version of a recording and the original PCM version are mostly
imperceptible to the average listener; however, these same differences in sound quality can be a great source of frustration for an experienced sound engineer. Due to signal degradation during the encoding process,
perceptual encoding is considered a type of distortion, but it is a type of
distortion that cannot be easily measured, at least objectively. Due to the
difficulty of obtaining meaningful objective measures of distortion and sound
quality with perceptual encoders, their development has involved expert
listeners who are adept at identifying audible artifacts resulting from the
encoding process. Expert listeners listen to music recordings encoded at
various bit rates and quality levels and then rate the audio quality on a
subjective scale. Trained expert listeners become experts at quickly
identifying distortion and artifacts produced by perceptual encoders because
they know where to focus their auditory attention and what to listen for.
With the proliferation of downloadable music from the Internet,
perceptually encoded music has become ubiquitous, the best-known version
being MP3, more technically known as MPEG-1 Audio Layer-3. There are
many other encoding-decoding (codec) schemes that go by names such as
AAC (Advanced Audio Coding), WMA (Windows Media Audio), AC-3 (also
known as Dolby Digital), and DTS (Digital Theater Systems). Codecs reduce the
amount of data needed to represent a digital audio signal by removing
components of a signal that are considered inaudible according to
psychoacoustic models. The main improvement to codecs over years of
development and progression has been that they are smarter in the way they
remove audio data and are increasingly transparent at lower bit rates. That is,
they produce fewer audible artifacts for a given bit rate than the previous
generation of codecs. The psychoacoustic models used in codecs have
become more complex and the algorithms used in signal detection and data
reduction based on these models have become more precise. Still, when
compared side by side with an original, unaltered signal, it is possible to hear
the difference between the two.
The process of converting linear PCM digital audio (such as AIFF, WAV,
or BWF) to MP3, AAC, WMA, RealAudio, or another lossy encoded format
removes components of an audio signal that an encoder thinks we cannot
hear. Encoders perform various types of analyzes to determine the frequency
content and dynamic amplitude envelope of an audio signal, and based on
psychoacoustic models of human hearing, encoders remove components of
an audio signal that are likely inaudible. Some of these components are
quieter sounds that are partially masked by louder sounds in a recording.
Sounds determined to be masked or inaudible are removed and the resulting
encoded audio signal can be represented with less data than was used to
represent the original signal. Unfortunately, the encoding process also
removes audible components of an audio signal, and therefore the encoded audio is degraded relative to the original unencoded signal.
As we explore audible artifacts and signal distortion of encoded audio,
here are some elements to focus on as we practice critical listening:
• Clarity and sharpness . Listen for loss of clarity and sharpness in percussive and transient signals. The loss of clarity can translate into the feeling that there is a thin veil covering the music. Compared with the encoded version, the unencoded linear PCM audio should sound more direct.
• Reverberation . Listen for some loss of reverberation and other low-amplitude components. Lost reverberation generally results in less depth and width in a recording, and the perceived space around the music (acoustic or artificial) becomes less evident.
• Warble . Encoded audio can sound a little warbly or swooshy. Sustained musical notes, especially with solo instruments or prominent voices, do not sound as smooth as they should, and the overall sound can take on a tinny quality.
• Lack of high-frequency harmonics . High-frequency sounds, such as cymbals, and noise-like sounds, such as audience applause, can take on a swooshy quality.

5.2.6 Exercise: Comparing Linear PCM with Encoded Audio


It is important to investigate how various perceptual encoders affect sound
quality. One of the ways to explore sound quality degradation is to encode
linear PCM sound files and compare the original with the encoded version to
identify any audible differences. There are many free programs that encode
audio signals, such as Apple's iTunes Player and Microsoft's Windows Media
Player. Deficiencies in sound quality in encoded audio may not be
immediately obvious unless we are attuned to the types of artifacts that
occur when audio is encoded. By switching between a linear PCM audio file
and an encoded version of the same audio, it is easier to hear any differences
that may be present. Once we start learning to listen for the types of artifacts
an encoder produces, they become easier to hear without doing a side-by-side comparison of the encoded audio with the linear PCM original.
Start by encoding a linear PCM audio file at various bit rates into MP3,
AAC or WMA and try to identify how an audio signal is degraded. Lower bit
rates result in a smaller file size, but also reduce audio quality. Different
codecs (MP3, AAC and WMA) provide slightly different results for a given bit
rate because the encoding method varies from codec to codec. Switch
between the original linear PCM audio and the encoded version. Try encoding
recordings of different genres of music. Note the sonic artifacts that occur for
each bitrate and encoder.
Another option is to compare streaming audio from online sources to
linear PCM versions you may have. Most radio stations and online music
players use lower bitrate audio that contains more clearly audible encoding
artifacts than those found with audio from other sources, such as through the
iTunes Store.

5.2.7 Exercise: Subtraction


Another interesting exercise to perform is to subtract an encoded audio file
from an original linear PCM version of the same audio file. To complete this
exercise, convert a linear PCM file to some encoded format and then convert
it back to linear PCM at the same sample rate. Import the original sound file
and the encoded/decoded (now linear PCM) file into a digital audio
workstation (DAW), on two different stereo tracks, taking care to align them
in time as precisely as possible. When playing synchronized stereo tracks
together, reverse the polarity of the encoded/decoded file so that it is
subtracted from the original. As long as the two stereo tracks are precisely
aligned in time, anything common to both tracks will be cancelled, and the
remaining audio heard is that which was removed by the codec. This exercise helps highlight the types of artifacts that are present in the encoded audio.
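The same subtraction can be sketched outside a DAW. The example below uses the soundfile library (an assumed choice) and hypothetical file names, and it presumes the two files are already sample-aligned with matching sample rates and channel counts.

import numpy as np
import soundfile as sf

original, fs1 = sf.read("original_pcm.wav")            # hypothetical file names
decoded, fs2 = sf.read("encoded_then_decoded.wav")
assert fs1 == fs2, "sample rates must match"
n = min(len(original), len(decoded))
# Polarity inversion and summation: whatever is common to both files cancels,
# leaving the material the codec removed or altered.
difference = original[:n] - decoded[:n]
sf.write("codec_residual.wav", difference, fs1)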

5.2.8 Exercise: Listening to encoded audio through mid-side processing


By splitting an encoded file into its mid and side (MS) components, some of
the artifacts created by the encoding process can be discovered. The
perceptual encoding process relies on masking to hide the artifacts that are
created in the process. When a stereo recording is converted to M and S
components and the M component is removed, the artifacts are often much
more audible. In many recordings, especially in the pop/rock genre, the M
component forms the majority of the audio signal and can mask a large
number of encoding artifacts. By reducing the M component, the S
component becomes more audible along with encoder artifacts.
Try encoding an audio file with a perceptual encoder at a common bit
rate, such as 128 kbps, and decoding it back to linear PCM (WAV or AIFF). It is
possible to use the MS matrix software module included with this book to hear the effect that MS decoding can have in highlighting the effects of a codec.
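For readers without the module at hand, here is a minimal mid-side sketch: derive M and S from left and right, attenuate M, and re-matrix for listening. The soundfile library, the file name, and the -12 dB mid attenuation are assumptions, the last being an arbitrary starting point for exposing codec artifacts.

import numpy as np
import soundfile as sf

stereo, fs = sf.read("decoded_128kbps.wav")       # hypothetical file name (stereo)
left, right = stereo[:, 0], stereo[:, 1]
mid = (left + right) / 2.0
side = (left - right) / 2.0
mid = mid * 10 ** (-12 / 20.0)                    # attenuate M so S and artifacts stand out
out = np.column_stack((mid + side, mid - side))   # re-matrix to left/right for listening
sf.write("side_emphasised.wav", out, fs)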

Summary
In this chapter we explore some of the undesirable sounds that can appear in
a recording. By practicing with the included distortion software ear training
module and completing the exercises, we can become more aware of some
common forms of distortion.
Chapter 6
AUDIO CLIP EDIT POINTS

In Chapter 4 we discussed modifying the amplitude envelope of an audio signal through dynamic processing. In this chapter we will explore the
amplitude envelope and technical ear training from a slightly different
perspective: that of an audio editor.
The process of editing digital audio, especially with classical or acoustic
music using a source-destination method, offers an excellent opportunity to
train your ear. Likewise, the music editing process requires an engineer to
have a good ear for seamless audio splicing. Music editing involves making
seamless connections or splices between takes of a piece of music and often
requires specifying precise editing locations by ear. In this chapter we will
explore how aspects of digital editing can be used systematically as an ear
training method, even outside the context of an editing session. The chapter
describes a software tool based on audio editing techniques that is an
effective ear trainer that offers benefits that transfer beyond audio editing.

6.1 Digital Audio Editing: The Source-Destination Technique


Before describing the software and method for ear training, it is important to
understand some digital audio editing techniques used with classical music.
Classical music requires a high level of precision, perhaps more than other
types of music, to achieve the required level of transparency.
Empirically, through hundreds of hours of classical music editing, I have
found that the process of repeatedly adjusting the placement of edit points
and creating smooth crossfades by ear not only results in a clean recording,
but can also improve listening skills that translate to other areas of critical
listening. Through the highly focused listening required for audio editing, with
the goal of matching edit points from different takes, the editing engineer is
engaging in an effective form of ear training.
Digital audio editing systems allow an editing engineer to view a visual
representation of a waveform and move, insert, copy or paste audio files to
any location along a visual timeline. For important parts of editing music
recordings, a rough estimate of an edit location is first found, followed by the
precise location of an edit point location through listening. Despite having a
visual representation of a waveform, it is often more efficient and more
accurate to find the precise location of an edit by ear.
During the editing process, an engineer receives a take list from a
recording session and assembles a complete piece of music using the best
takes from each section of a score. A common method for editing classical or
acoustic music is known as source-destination. Basically, the engineer
constructs a complete musical performance (the destination ) by taking the
best excerpts from a list of recording session takes (the source ) and stitching
them together.
In source-destination editing, the location of an edit is found by
following a musical score and placing a marker at a chosen edit point along
the timeline of the visual waveform that represents the recorded music. The
editing engineer typically auditions a short snippet (typically 0.5 to 5 seconds
long) of a recorded take, down to a specific musical note on which an edit is
to be made. The same musical excerpt is then auditioned from a different
take and compared to the first. Typically the end point of such an excerpt will
be chosen to occur precisely at the beginning of a musical note and the
connecting point will therefore be inaudible. The goal of an editing engineer is
to focus on the sonic characteristics of the note onset that occurs during the
last milliseconds of an excerpt and match the sound quality between takes by
adjusting the location of the edit point (i.e., the end point of the extract). The
edit point marker may appear as a moving bracket on the audio signal
waveform, as in Figure 6.1 .
Figure 6.1 A typical view of a waveform in a digital editor with the edit point marker indicating where the edit point will
occur and the audio will fade into a new take. The location of the marker, indicated by a large square bracket, is
adjustable in time (left/before or right/after). The arrow simply indicates that the bracket can be slid left or right. The
editing engineer will listen to the audio up to this large bracket with a default pre-roll time typically between 0.5 and 5
seconds.

Figure 6.2 The editing engineer listens to the source and destination audio files, up to a chosen edit point, usually at the
beginning of a note or beat. In an editing session, the two audio clips (source and destination) would be identical musical
material, but from different takes. The engineer auditions the audio excerpts up to a chosen edit point, usually located in
the middle of the attack of a strong note or rhythm. One of the goals of the engineer is to answer the question, does the
end point at the source match that at the destination? The greater the similarity between the timbres at the two cut points, the
more successful the edit will be. The software module presented here recreates the process of listening to a sound clip up
to a predefined point and matching that end point in a second sound clip.
Figure 6.3 The source and target waveform timelines are shown here in block form along with an example of how a set of
takes (source) could fit together to form a complete performance (target). This example assumes that takes 1, 2 and 5
would be of the same music program material and therefore a composite version of the best sections of each take could
be produced to form what is labeled as the destination in this figure.

It is the editing engineer's focus on the final milliseconds of an audio excerpt that is critical to finding an appropriate edit point. When choosing an edit
point to be at the beginning of a musical note, it is important to set the edit
point so that it actually occurs sometime during the beginning of a note
attack. Figure 6.1 shows the bracket (indicating the edit point) aligned with
the attack of a note.
When an engineer listens to an audio clip up to a chosen edit point, the
new note that starts playing, but stops immediately, can form a transient
percussive sound. The specific characteristics of the actual sound of the
clipped note will vary directly with the amount of incoming note that is
sounded before being clipped. Figure 6.2 illustrates in block form the process
of listening to source and target program material.
Once the characteristics of the last few milliseconds of audio match as
closely as possible between takes, an edit is performed with a crossfade from
one take to the next and auditioned to check for sonic anomalies. Figure 6.3
illustrates a composite version as the destination that has been extracted
from three different source takes.
During the process of auditioning a crossfade, an editing engineer also
pays close attention to the sound quality of the crossfade, which can typically
range from a few to several hundred milliseconds depending on the context
(e.g., sustained vs. transient notes). The process of re-listening to a
crossfade and adjusting crossfade parameters such as length, position, and
shape also offers an opportunity to improve critical listening skills.
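The sketch below illustrates the basic arithmetic of a linear crossfade between two takes. It is a simplified illustration, not the algorithm of any particular editing system: the take arrays are assumed to be time-aligned mono numpy arrays, and the edit point and fade length are hypothetical parameters.

```python
# Linear crossfade between two time-aligned mono takes at a chosen edit point.
# `destination`, `source`, `edit_sample`, and `fade_len` are hypothetical inputs.
import numpy as np

def crossfade(destination, source, edit_sample, fade_len):
    """Splice `source` into `destination` starting at `edit_sample`."""
    fade_out = np.linspace(1.0, 0.0, fade_len)   # gain applied to the outgoing take
    fade_in = 1.0 - fade_out                     # gain applied to the incoming take
    out = destination.copy()
    region = slice(edit_sample, edit_sample + fade_len)
    out[region] = destination[region] * fade_out + source[region] * fade_in
    out[edit_sample + fade_len:] = source[edit_sample + fade_len:]
    return out
```

In practice, the fade length and shape would be chosen by ear for the context, as described above.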
6.2 Software Exercise Module
Based on source-destination editing, the included ear training software
module was designed to mimic the process of comparing the last few
milliseconds of two short clips of identical music from different takes. The
advantage of the software practice module is that it promotes critical
listening skills without requiring an actual editing project. The main difference
when working with the practice module is that the software will work with a
single "take", which is any linear PCM sound file loaded. Because of this
difference, the two audio clips will be identical signals and therefore it is
possible to find identical sound endpoints. The benefit of working this way is
that the software has the ability to judge whether the sound clips end at
precisely the same point.
To start, the software randomly chooses a fragment or short clip (called
clip 1 or the reference) from any stereo music recording loaded into the
software. The exact length of clip 1 is not revealed, but can be auditioned.
The lengths of the excerpts, ranging from 500 milliseconds to 2 seconds, are
also chosen randomly to ensure that you are not simply training to identify
the length of audio clips. A second clip (clip 2 or your response) of known
duration, and with an identical starting point to clip 1, can also be listened to
and compared to clip 1. Clips can be listened to as many times as necessary
by pressing the appropriate button or keyboard shortcut.
The goal of the exercise is to adjust the duration of clip 2 until it ends at
exactly the same point in time as clip 1. By listening to the amplitude
envelope, timbre, and musical content of the last milliseconds of each clip, it
is possible to compare the two clips and adjust the duration of clip 2 so that
the sound at its end point matches that of clip 1.
By following a cycle of listening, matching, and adjusting the length of clip 2,
the goal is to identify the endpoint features of clip 1 and match those
features to clip 2.
The duration of clip 2 is adjusted by nudging the end point earlier or later in time. There are different nudge step sizes to choose from, so the clip duration can be adjusted in increments of 5, 10, 15, 25, 50, or 100 milliseconds. The smaller the nudge step size, the harder it is to hear the difference from one step to the next.
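For illustration only, the following sketch mimics the structure of the exercise: a reference clip of hidden length is drawn from a loaded sound file, and the answer clip's length is adjusted in nudge steps. The file name, starting point, and initial guess are all assumptions; the actual practice module handles auditioning and answer checking itself.

```python
# Rough imitation of the exercise's structure; file name, start point, and the
# initial guess are assumptions, and auditioning is left to the listener.
import numpy as np
import soundfile as sf

audio, fs = sf.read("practice_track.wav")     # any stereo linear PCM file
start = 10 * fs                               # arbitrary excerpt start (10 s in)
rng = np.random.default_rng()

reference_ms = int(rng.integers(500, 2001))   # hidden length of clip 1 (0.5 to 2 s)
answer_ms = 1000                              # starting length of clip 2
nudge_ms = 25                                 # chosen nudge step size

def clip(length_ms):
    return audio[start : start + int(fs * length_ms / 1000)]

# Audition clip(reference_ms) and clip(answer_ms), then nudge answer_ms up or
# down by nudge_ms until the end points of the two clips sound identical.
answer_ms += nudge_ms
```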
Figure 6.4 shows the waveforms of four sound clips of increasing length
from 825 ms to 900 ms in steps of 25 ms. This particular example shows how
the end of the clip can vary significantly depending on the length chosen.
Although the second (850 ms) and third (875 ms) waveforms in Figure 6.4
appear very similar, there is a noticeable difference in the percussive or
transient sound perceived at the end. With smaller nudge step sizes, the
difference between steps would be less obvious and would require more
training for correct identification.

Figure 6.4 Clips of a music recording of four different lengths: 825 ms, 850 ms, 875 ms and 900 ms. This particular
example shows how the end of the clip can vary significantly depending on the length chosen. The listener can focus on
the quality of the percussive sound at the end of the clip to determine which sounds most like the reference. The 825ms
long clip contains a faint percussive sound at the end of the clip, but because the note that starts to play (a drum beat in
this case) is almost completely cut off, it comes out as a short click. In this specific example, the listener can focus on the
percussive quality, timbre, and envelope of the incoming drum hit at the end of the clip to determine the correct length
of the sound clip.

After deciding the length of a clip, you can press the "Check Answer"
button to find the correct answer and continue listening to the two clips for
that question. The software indicates whether the answer to the previous
question was correct or not and if incorrect, it indicates whether clip 2 was
too short or too long and the magnitude of the error. Figure 6.5 shows a
screenshot of the software module.
There is no view of the waveform as you would normally see in a digital
editor because the goal is to create an environment where we must rely
solely on what is heard with minimal visual information about the audio
signal. However, there is a black bar that increases in length on a timeline,
following the playback of clip 2 in real time, as a visual indication that clip 2 is
playing. Additionally, the play buttons for the respective clips briefly turn
green while the audio is playing and then return to gray when the audio
stops.
With this ear training method, the goal is to compare one sound with
another and try to match them. There is no need to translate the sound
characteristic into a verbal description, but rather the focus is solely on the
characteristics of the audio signal. Although there is a numerical display
indicating the duration of the sound clip, this number serves only as a
reference to keep track of where the end point is set. The number has no
relation to the sound characteristics heard, except in a specific excerpt. For
example, a randomly chosen 600 ms clip will have different endpoint
characteristics than most other randomly chosen 600 ms clips.
[Screenshot: Technical Ear Trainer – Audio Clip Edit Points practice module, showing nudge size buttons (100, 50, 25, 10, and 5 ms), play buttons for the reference clip and your answer, a timeline from 0 to 2000 milliseconds, and a button for loading a sound file.]
Figure 6.5 A screenshot of the training software. The large squares with "1" and "2" are play buttons for clips 1 and 2,
respectively. Clip 1 (the reference) has an unknown length and the length of clip 2 needs to be adjusted to match clip 1.
Below the play button for clip 2 are two horizontal bars. The top one indicates, with a vertical bar, the duration of clip 2,
on the timeline from 0 to 2000 milliseconds. The bottom bar increases in length (from left to right) to the vertical line in
the top bar, following the playback of clip 2, to serve as a visual indication that clip 2 is playing.

Practice exercises should progress from the least challenging exercises with large steps of 100 ms to the most challenging exercises where the smallest step size is 5 ms.
Almost any stereo recording in the AIFF or WAV linear PCM format can
be used with the training software, as long as it is at least 30 seconds long.

6.3 Exercise Focus


With the type of training program described in this chapter, the primary goal
is to focus on the amplitude envelope of a signal at a specific point in time,
which is the end of a short audio excerpt. Although the audio is not processed
in any way, the location of the end point determines how and at what point a
musical note can be cut. In this exercise, focus on the last few milliseconds of
the first clip, keep the final sound in memory, and compare it to the second
clip.
Because the software randomly chooses the location of an excerpt, an
end point can occur almost anywhere in an audio signal. However, there are
two specific cases in which it is important to describe the location of a cut:
those that occur at the onset of a note or downbeat and those that occur
during a sustained note, between strong hits.
First, you can explore the result of a cut that falls at the beginning of a
downbeat or note. If the cutoff occurs during the attack part of a musical
note, a transient signal can be produced whose characteristics vary with where the
amplitude envelope of a note is cut, allowing the matching of a transient
sound by adjusting the cutoff point. Depending on how much of a note or
percussion sound is cut, the spectral content of that particular sound will vary
with the modified duration of the note. With respect to a note clipped at the end of an excerpt, a shorter note segment will generally have a higher spectral centroid than a longer segment and a brighter sound quality. The spectral
centroid of an audio signal is the average frequency of a spectrum and
describes where the center of mass of a spectrum is located. If there is a click
at the end of an excerpt, produced as a result of the location of the endpoint
relative to the waveform, it can serve as a cue for the location of the
endpoint. The spectral quality of the click can be evaluated and compared
based on its duration.
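The spectral centroid mentioned above can be computed directly from the magnitude spectrum; a minimal sketch follows, assuming a mono numpy array x and a sample rate fs.

```python
# Spectral centroid: the amplitude-weighted mean frequency of a spectrum.
# Assumes a mono numpy array `x` and sample rate `fs`.
import numpy as np

def spectral_centroid(x, fs):
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)

# A shorter clipped-note segment will generally return a higher value (brighter)
# than a longer segment taken from the same note onset.
```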
The case of a more sustained or decaying audio signal being cut off is
examined below. For this type of cut, attention should be focused on the
duration of the sustained signal and match its length. This could be analogous
to adjusting the hold time of a gate (dynamic processor) with a very short
release time. With this type of matching, the focus can shift more to musical
qualities, such as tempo, to determine how long the final note is held before
being muted.
With any endpoint placement, the requirement is to track the
amplitude envelope and spectral content of the end of the clip. One of the
goals of this exercise is to increase auditory acuity, making it easier to hear subtle details in a sound recording that were not evident before spending a lot of time on digital editing.
begin to highlight details in a recording that may not have been as evident
when the full piece of music was auditioned. By listening to short excerpts
out of context of the piece of music, sounds within a recording can be heard
in new ways and some sounds can be unmasked and therefore more audible.
It allows you to focus on features that may be partially or completely masked
when heard in context (i.e. much longer excerpts) or features that are simply
less obvious in a broader context. Repeating clips out of context of the entire
recording can also contribute to a change in the perception of an audio signal.
It is common for music composers to take excerpts from musical recordings
and repeat them to create a new type of sound and effect, allowing listeners
to hear new details in the sound that may not have been evident before.
This ear training method can help us focus on quieter or lower-level features (amid louder features) of a given piece of program material. The quietest features of a program are those that may be partially or mostly masked, perceptually less prominent, or considered in the background of a perceived sound scene or soundstage. Examples may include the following (some of which were listed previously):
• Reverb and delay effects for specific instruments
• Dynamic range compression artifacts for specific instruments
• Sound quality of a specific musical instrument: sounds of drum brushes
or the articulation of an acoustic double bass in a jazz piece
• Specific characteristics of each voice/musical instrument, such as the temporal nature of the amplitude envelope components (attack, decay, sustain, and release) or their spatial location
• Definition and clarity of elements within the sound image, width of
individual elements
Sounds taken out of context begin to give a new impression of the
sound quality and also the musical feel of a recording. Additional details of an
excerpt are often heard when a short piece of music is played repeatedly,
details that would not necessarily be heard in context.
Working with this practice module and a musical example that features
prominent vocals, acoustic bass, acoustic guitar, piano, and lightly played
drums (such as “Desafinado” by Stan Getz and João Gilberto [1963]), brings
new impressions of the timbres and sound qualities found in the recording
that were not previously evident.
In this recording, the percussion part is fairly quiet and more in the
background, but if an excerpt falls between vocal phrases or guitar chords,
the percussion part can perceptually move to the foreground as the matching
exercise shifts our focus. It may also be easier to focus on percussion
characteristics, such as its reverb or echo, if that particular musical part can
be heard more clearly. Once details within a small excerpt are identified, it
can make it easier to hear these characteristics within the context of the
entire recording and also transfer knowledge of these sound characteristics to
other recordings.

Summary
This chapter describes an ear training method based on the source-destination
audio editing technique. Because of the critical listening required to perform
accurate audio editing, the process of finding and matching edit points can
serve as an effective form of ear training. With the interactive software
exercise module, the goal is to practice matching the length of a sound clip to
a reference clip. By focusing on the timbre and amplitude envelope of the
final milliseconds of the clip, the end point can be determined based on the
nature of the transients or the length of the sustained signals. By not
including verbal or meaningful numerical descriptors, the exercise focuses
solely on the perceived audio signal and matching the endpoint of the audio
signals.
SOUND ANALYSIS

After focusing on the specific attributes of recorded sound, we are now ready
to explore a broader perspective of sound quality and music production. The
experience of practicing with each of the software modules and specific types
of processing described in the previous chapters prepares us to focus on
these sonic characteristics within the broader context of recorded and
acoustic sound.
A sound recording is a specific interpretation and representation of a
musical performance. Listening to a recording is different from attending a
live performance, even for recordings with little signal processing. A sound
recording can offer a more focused and clearer experience than a live
performance, while also creating a sense of space. It is a paradoxical
perspective to hear musicians with a high degree of clarity and at the same
time have the experience of listening from a more distant place due to the
level of reverberant energy. Additionally, a recording engineer and producer
often make level and processing adjustments during the course of a piece of
music that highlight the most important aspects of a piece and guide the
listener to a specific musical experience.
Each recording has something unique to tell in terms of its timbral,
spatial and dynamic qualities. It is important to listen to a wide variety of
recordings from many different musical genres and examine the production
choices that were made for each recording. An engineer can become familiar
with recording and mixing aesthetics for different genres of music that can
inform their own work. When it comes time to make a recording, an engineer
can rely on internal references for sound quality and mix balance to help
guide a project. For each recording that seems interesting from a production
and sound quality standpoint, take note of the credits of the production staff,
including the producer, recording engineer, mixing engineer, and mastering
engineer. With recordings distributed digitally, production credits are not
always listed with the audio, but can be found through various websites such
as www.allmusic.com . Finding additional recordings from engineers and
producers referenced above can assist in the process of characterizing various
production styles and techniques.

7.1 Sound analysis of electroacoustic sources


In developing critical listening skills, it is necessary to examine, explore and
analyze sound recordings to help understand the sonic signatures of a
particular artist, producer or engineer. Through the analysis process it is
possible to learn to identify which aspects of their recordings make them
particularly successful from a timbral, spatial and dynamic point of view.
The sound quality, technical fidelity, and sonic characteristics of a
recording have a significant impact on the clarity with which the musical
meaning and intentions of a recording are communicated to listeners. The
components of a stereo image can be deconstructed to learn more about the
use of reverb and delays, panning, layering and balancing, dynamics
processing, and equalization.
At its most basic level, the sound mixing process essentially involves
gain control and level changes over time. Whether those changes are full-
band or frequency-selective, static or time-varying, manual or via a
compressor, the basic component of sound mixing is control of the level or
amplitude of the sound. Individual instruments or even individual notes can
be raised or lowered in level to emphasize musical meaning.
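As a rough illustration of mixing as gain control (not drawn from any specific console or DAW), the sketch below scales each hypothetical track by its own per-sample gain envelope before summing to a stereo bus; the track and envelope arrays are placeholders.

```python
# Mixing reduced to gain control: each track is scaled by its own per-sample
# gain envelope and summed to a bus. Track and envelope arrays are hypothetical.
import numpy as np

def mix(tracks, gain_envelopes):
    """tracks: list of (n, 2) arrays; gain_envelopes: list of (n,) arrays."""
    bus = np.zeros_like(tracks[0])
    for track, gain in zip(tracks, gain_envelopes):
        bus += track * gain[:, np.newaxis]   # time-varying level change per track
    return bus
```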
In the critical process of listening and analysis, there are numerous
layers of deconstruction, from the general and total characteristics of an
entire mix to the specific details of each sound source. At a much deeper level
in analyzing a recording, an engineer who is more advanced in critical
listening skills can begin to make educated guesses about specific models of
equipment used during recording and mixing, based on the timbres and
amplitude envelopes of the components of a sound image.
A stereo image produced by a pair of speakers can be analyzed in terms
of characteristics ranging from completely obvious to almost imperceptible.
One goal of auditory training, as a type of perceptual learning, is to develop
the ability to identify and differentiate features of a reproduced sound image,
especially those that may not have been evident before performing training
exercises. We will now consider some of the specific characteristics of a
stereo or surround image that are important to analyze. The list includes the
parameters described in the European Broadcasting Union technical
document 3286 entitled “Evaluation methods for the subjective evaluation of
the quality of sound program material: music” (European Broadcasting Union
[EBU], 1997):
• Overall bandwidth
• Spectral balance
• Auditory image
• Spatial impression, reverb and time-based effects
• Dynamic range, level or gain changes, dynamic processing artifacts
(compressors/expanders)
• Noise and distortion
• Balance of elements within a mix

7.1.1 Overall bandwidth


Overall bandwidth refers to the frequency content and how far it extends into
the lower and higher frequencies of the audio spectrum. In this part of the
analysis, the goal is to determine if a recording spans from 20 Hz to 20 kHz, or
if it is band-limited in some way. FM radio extends only up to about 15 kHz
and the bandwidth of standard telephone communication ranges from about
300 to 3000 Hz. A recording can be limited by its recording medium, a sound
system can be limited by its electronic components, and a digital signal can be
reduced to a narrower bandwidth to save data transmission. The effect of
reducing bandwidth can be heard by using high-pass and low-pass filters.
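One simple way to try this, assuming the scipy and soundfile libraries and a placeholder file name, is to band-limit a recording to roughly the telephone band with a high-pass and a low-pass filter:

```python
# Band-limit a recording to roughly the telephone band (about 300 to 3000 Hz).
# The file name is a placeholder; scipy and soundfile are assumed to be available.
import soundfile as sf
from scipy.signal import butter, sosfilt

audio, fs = sf.read("full_bandwidth.wav")
sos_hp = butter(4, 300, btype="highpass", fs=fs, output="sos")
sos_lp = butter(4, 3000, btype="lowpass", fs=fs, output="sos")

band_limited = sosfilt(sos_lp, sosfilt(sos_hp, audio, axis=0), axis=0)
sf.write("telephone_band.wav", band_limited, fs)
```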
When making a judgment about high frequency extension, one must
consider the higher overtones present in the recording. The highest
fundamental pitches of music do not exceed 4,000 Hz, but the harmonics of
cymbals and brass instruments easily reach 20,000 Hz. An engineer's choice of
recording equipment or filters can intentionally reduce the bandwidth of a
sound, which differentiates the bandwidth of the acoustic and recorded
sound of an instrument.

7.1.2 Spectral balance


As we saw in Chapter 2, spectral balance refers to the relative level of
frequency bands across the audio spectrum. In its simplest analysis, it can
describe the balance of high frequencies to low frequencies, but it is possible
to be more precise and identify resonances and antiresonances of specific
frequencies. The power spectrum of an audio signal, which can help visualize
the spectral balance of a signal, can be measured in several ways. The most
common calculation of the power spectrum is probably using the fast Fourier
transform (FFT), which specifies the frequency content of a signal and the
relative amplitudes of the frequency bands. The spectral balance of pink noise
is flat when averaged over a period of time and plotted on a logarithmic
frequency scale. Pink noise is perceived to have the same energy across the
entire frequency range and therefore has a flat spectral balance.
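A rough sketch of an averaged FFT-based power spectrum follows; the frame length and window are arbitrary illustrative choices rather than a prescribed analysis method.

```python
# Averaged power spectrum via the FFT, for inspecting spectral balance.
# Assumes a mono numpy array `x`; the frame length and window are arbitrary choices.
import numpy as np

def average_power_spectrum(x, fs, frame_len=4096):
    window = np.hanning(frame_len)
    n_frames = max(len(x) // frame_len, 1)
    acc = np.zeros(frame_len // 2 + 1)
    for i in range(n_frames):
        frame = x[i * frame_len : (i + 1) * frame_len]
        if len(frame) < frame_len:
            break
        acc += np.abs(np.fft.rfft(frame * window)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    return freqs, 10 * np.log10(acc / n_frames + 1e-12)   # level in dB per FFT bin
```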

Through subjective analysis of spectral balance, listen to a recording comprehensively. Whereas the possible combinations and number of frequency resonances were simplified in Chapter 2, the analysis is now open to any frequency or combination of frequencies. Taking a broader view of a recording, the following questions are addressed:
• Are there specific frequency bands that are more prominent or
deficient than others?
• Can we identify resonances by their approximate frequency in hertz?
• Are there specific musical notes that are more prominent than others?

Frequency resonances in recordings can occur due to the deliberate use of equalization, the placement of the microphone around an instrument
being recorded, or specific characteristics of an instrument, such as the tuning
of a drum head. The location and angle of orientation of a microphone will
have a significant effect on the spectral balance of the recorded sound
produced by an instrument. Because musical instruments often have sound
radiation patterns that vary with frequency, the position of the microphone
relative to an instrument is critical in this regard. (For more information on
the sound radiation patterns of musical instruments, see Dickreiter's book
titled Tonmeister Technology: Recording Environments, Sound Sources, and
Microphone Techniques [1989].) Additionally, depending on the nature and
size of a recording space, resonant modes may be present and microphones
may pick up these modes. Resonance modes can amplify certain specific
frequencies produced by musical instruments. All of these factors contribute
to the spectral balance of a sound recording or playback system and can have
a cumulative effect if resonances from different microphones occur in the
same frequency regions.
7.1.3 Auditory image
An auditory image, as defined by Woszczyk (1993), is “a mental model of the
external world that is constructed by the listener from auditory information
(p. 198)”. Listeners can locate sound images that are produced from
combinations of audio signals emanating from pairs or sets of speakers. The
auditory impression of sounds located at various locations between two
speakers is known as stereo imaging . Despite having only two physical sound
sources in the case of stereo, it is possible to create ghost images of sources
in locations between the actual speaker locations, where no physical source
exists.
Using a full stereo image, spanning the entire range from left to right, is
an important aspect of production that is sometimes overlooked. Listening
carefully to recordings can illustrate a variety of stereo and panoramic image
treatments. The illusion of a stereo image is created by controlling amplitude
differences between channels through panning and time differences between
channels through time delay. The differences between channels do not
correspond to the interaural differences when played through speakers
because the sound from both speakers reaches both ears. Stereo miking
techniques can provide yet another method of controlling interchannel
amplitude and timing differences due to microphone polar patterns and
physical spacing between microphones.
In the study of music production and mixing techniques, several
conventions are found in panning sounds within the stereo image between
various genres of music. For example, pop and rock generally emphasize the
center part of the stereo image, because the kick drum, snare, bass, and
vocals are usually moved to the center. The guitar and keyboard parts
sometimes shift to the side, but overall there is significant energy originating
from the center. A look at a correlation meter would confirm what is heard as
well, and a recording with a strong central component will give a reading
close to 1 on a correlation meter. Likewise, if the polarity of a channel is
reversed and the left and right channels are added, a mix with a dominant
center image will have significant cancellation of the audio signal. Any audio
signal components that are equally present in the left and right channels (i.e., panned to the center, or mono) will cancel destructively when the two channels are subtracted.
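These two checks, a correlation reading and the L-minus-R subtraction, can be sketched as follows; the file name is a placeholder, and the correlation coefficient is only a rough stand-in for what a hardware correlation meter displays.

```python
# Correlation reading and L-minus-R check for a stereo mix (placeholder file name).
import numpy as np
import soundfile as sf

stereo, fs = sf.read("pop_mix.wav")          # assumed to be a two-channel file
left, right = stereo[:, 0], stereo[:, 1]

# Near +1 suggests a strong, mono-compatible center; near 0, a wide/decorrelated mix.
correlation = np.corrcoef(left, right)[0, 1]

# Subtracting the channels cancels anything panned to the center.
difference = left - right
ratio = np.sum(difference ** 2) / (np.sum((left + right) ** 2) + 1e-12)
print(f"L/R correlation: {correlation:.2f}, difference-to-sum energy ratio: {ratio:.3f}")
```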
The panning and placement of sounds in a stereo image has a definitive
effect on how clearly listeners can hear individual sounds in a mix. The
phenomenon of masking, where one sound obscures another, must also be
considered with panning. Separating sounds will result in greater clarity,
especially if they occupy similar musical registers or contain similar frequency
content. Musical mixing and balance, and therefore the meaning and musical
message of a recording, are directly affected by the panning of instruments;
proper use of panning can give an engineer more flexibility for level
adjustments.
As you listen to the width of the stereo image and the spread of an
image from one side to the other, the following questions guide your
exploration and analysis:
• Overall, does a stereo image have a balanced distribution from left to
right with all points between the speakers represented equally or are
there places where an image appears to be missing?
• How wide or monophonic is the image?
• What are the locations and widths of individual sound sources in a
recording?
• Are their locations stable and defined or ambiguous?
• How easily can the locations of sound sources be located within a
stereo image?
• Does the sound image appear to have the correct and appropriate
spatial distribution of sound sources?
By considering these types of questions for each sound recording
encountered, a stronger sense can be developed for the types of panoramic
and stereo images created by professional engineers and producers.
7.1.4 Spatial impression, reverb and time-based effects
The spatial impression of a recording is essential for conveying emotion and
drama in music. Reverb and echo help set the stage on which a musical
performance or theatrical action takes place. Listeners can mentally transport
themselves to the space in which the music exists through the strong
influence of early reflections and reverberation that surround the music in a
sound recording. Whether capturing a real acoustic space in a recording or
adding artificial reverb to mimic a real space, spatial attributes convey an
overall impression about the size of a space. A long reverb time can create
the feeling of being in a larger acoustic space, while a short reverb decay time
or a low reverb level can convey the feeling of a smaller, more intimate space.
The analysis of spatial impression can be divided into the following
subareas:
• Apparent room size:
o How big is the room?
o Is more than one type of reverb present in a recording?
o Is the reverb real or artificial?
o What is the approximate reverberation time?
o Are there echoes or long delays in the reverb and early
reflections?
• Depth perspective: Are sounds placed in front clearly distinguished
from those in the background?
• What is the spectral balance of the reverb?
• What is the direct/reverberant ratio?
• Are there strong echoes or delays?
• Are there any apparent time-based effects, such as chorus or flanger?
Classical music recordings can give listeners the opportunity to become
familiar with the reverberation of a real acoustic space. Often, orchestras and
artists with larger recording budgets will record in concert halls and churches
with acoustics that are considered very conducive to musical performance.
The depth and sense of space that can be created with the proper capture of
a real acoustic space are generally difficult to imitate with artificial reverb.
Adding artificial reverb to dry sounds is not the same as recording
instruments in a live acoustic space from the beginning. If a dry sound is
recorded in an acoustically dead space with close microphones, then the
microphones do not pick up the sound radiating away from them.
Sound radiating from the back of an instrument probably won't be picked up
in a dry studio environment. So even when the highest quality artificial reverb
is added, it won't sound the same as an instrument recorded in a live acoustic
space with close and room mics.

7.1.5 Dynamic range and level changes


Dynamic range can be critical for a music recording and different styles of
music will require different dynamic ranges. There can be wide fluctuations in
sound level throughout a piece of music, as a dynamic level rises to fortissimo
and falls to pianissimo. Likewise, the microdynamics of a signal can be
examined, the analysis of which is often aided by the use of a level meter,
such as a peak program meter (PPM) or a digital meter. For pop and rock
recordings, generally the dynamic range from a level point of view is quite
static, but we can hear (and see on a meter) small fluctuations that occur on
strong beats and between beats. A meter can fluctuate more than 20 dB
for some recordings or as little as 2 to 3 dB for others. Fluctuations of 20 dB
represent a wider dynamic range than smaller fluctuations and generally
indicate that a recording has been less compressed. Because the human
auditory system primarily responds to average levels rather than maximum
levels in loudness judgment, a recording with smaller amplitude fluctuations
will sound louder than one with larger fluctuations, even if the two have the
same maximum amplitude.
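A quick way to quantify this, sketched here with a placeholder file name, is to compare the peak and RMS (average) levels of a mix; a small difference between them suggests heavier compression or limiting.

```python
# Peak versus RMS (average) level of a mix in dB relative to full scale.
# A small peak-to-RMS difference usually indicates heavier compression/limiting.
import numpy as np
import soundfile as sf

audio, fs = sf.read("mix.wav")               # placeholder file name
mono = audio.mean(axis=1) if audio.ndim > 1 else audio

peak_db = 20 * np.log10(np.max(np.abs(mono)) + 1e-12)
rms_db = 20 * np.log10(np.sqrt(np.mean(mono ** 2)) + 1e-12)
print(f"Peak: {peak_db:.1f} dBFS, RMS: {rms_db:.1f} dBFS, crest factor: {peak_db - rms_db:.1f} dB")
```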
In this part of the analysis, listen for changes in the level of individual
instruments and of an overall stereo mix. Level changes can be the result of
manual gain changes or a signal-dependent automatic gain reduction
produced by a compressor or expander. Dynamic level changes can help
magnify musical intentions and enhance the listening experience. A
disadvantage of a wide dynamic range is that quieter sections may be partially inaudible and therefore detract from any musical impact intended by an
artist.
7.1.6 Noise and distortion
Many different types of noise can interrupt or degrade an audio signal in one
way or another and can come in different forms, such as a 50 or 60 Hz buzz or
hum, low-frequency bumps from a microphone or a stand being knocked,
external noises such as car or airplane horns, clicks and pops from inaccurate
digital synchronization, and dropouts (very short periods of silence) as a result
of faulty recording media. Generally, the goal is to avoid any accidental
instances of noise, unless of course it suits a deliberate artistic effect.
Unless a sound is intentionally distorted, engineers try to avoid clipping
any of the stages in a signal chain. Therefore, it is important to recognize
when it is occurring and reduce the level of a signal appropriately. Sometimes
it is unavoidable or escapes those involved and is present in a finished
recording.

7.1.7 Balance of components within a mix


Finally, in analyzing recorded sound, consider the mix or balance of elements within a recording. The relative balance of instruments can have a very significant influence on the musical meaning, impact and focus of a recording. The amplitude of an element within the context of a mix can also have an effect on the perception of other elements within the mix.
Think about questions like the following:
• Are the amplitude levels of the instruments balanced appropriately for
the style of music?
• Is there an instrument that is too loud or another that is too quiet?
The entire perceived sound image can be analyzed as a whole.
Likewise, the less significant features of a sound image can also be analyzed
and can be considered as a subgroup. Some of these subfeatures may
include the following:
• Specific characteristics of each component, musical voice, or instrument, such as the temporal nature of the amplitude envelope components (e.g., attack, decay, sustain, and release) or their spatial location.
• Definition and clarity of elements within a sound image
• Width and spatial extent of individual elements
Often, to an inexperienced listener, specific characteristics of the
played audio may not be obvious or immediately recognizable. A trained
listener, on the other hand, will likely be able to identify and distinguish
specific features of the played audio that are not evident to an untrained
listener. One such example exists in the world of developing perceptual
coding algorithms, which has required the use of expertly trained listeners to
identify processing deficiencies. The artifacts and distortion produced during
perceptual encoding are not necessarily immediately evident until critical
listeners, who are testing encoding software, learn what to listen for. Once a
listener can identify audio artifacts, it can be difficult not to hear them.
Unlike listening to music at a live concert, music recordings (audio only,
as opposed to accompanied by video) require listeners to rely completely on
their sense of hearing. There is no visual information to help follow a musical
soundtrack, unlike a live performance where visual information helps fill in
details that may not be as obvious in the auditory domain. As a result,
recording engineers sometimes exaggerate certain sonic characteristics of a
sound recording, through level control, dynamic range processing,
equalization, and reverb, to help engage the listener.

7.2 Analysis Examples


In this section we will study some recordings, highlighting the timbral,
dynamic, spatial and mixing options that are evident when listening. Any of
these tracks would be appropriate for practicing with the EQ software
module, listening to speakers and headphones, and performing graphical
analysis (see Section 7.3).
7.2.1 Sheryl Crow: “Strong Enough”
Crow, Sheryl. (1993). Tuesday Night Music Club . A&M Records.
Produced by Bill Bottrell.
The third track on Sheryl Crow's Tuesday Night Music Club is fascinating
for its use of numerous layers of sounds that are arranged and blended to
form a musically and timbrally interesting track. The instrumental parts
complement each other and are well balanced. Numerous listenings of the
track are required to identify all the sounds that are present.
The piece begins with a synth pad followed by two acoustic guitars
panned left and right. The guitar sound is not as crisp as you might imagine
with an acoustic guitar. In this recording, the high frequencies of these guitars
have been attenuated a bit, perhaps because the strings are old and some
signal from an acoustic guitar pickup is mixed in.
Crow's lead vocals come in with a dry but intense sound. There is very
little reverb on the voice and the timbre is quite bright. A crisp, clear 12-string
sound contrasts with the dull sound of the other two guitars. The fretless
electric bass comes in to round out the lower tones. The hand drum pans left
and right to complete the spatial component of the stereo image.
The chorus features a fairly dry ride cymbal and a high-pitched, flute-
like Hammond B3 sound fairly low in the mix. After the chorus, a pedal steel
comes in and then fades out before the next verse. The bridge features
bright, clear strumming mandolins that travel left and right. Backing vocals,
panned left and right, echo Crow's lead vocals.
The unconventional instrumentation and layering of contrasting sounds
make this recording interesting from a subjective analysis point of view. The
arrangement of the piece results in various types of instruments coming and
going to emphasize each section of the music. Despite the coming and going
of instruments and the number of layers present, the music sounds clear and
coherent.
7.2.2 Peter Gabriel: “In Your Eyes”
Gabriel, Peter. (1986). So. Produced by Daniel Lanois and Peter Gabriel.
Engineered by Kevin Killen and Daniel Lanois. The David Geffen Company.
This Peter Gabriel track is a study in the successful layering of sounds
that create a complete timbral, dynamic and spatial blend. The music begins
with a chorused piano sound, a synth pad, and percussion. The bass and
drums enter shortly after, followed by Gabriel's lead vocals.
There is an immediate sense of space in the first note of the track.
There is no obvious reverb decay at the beginning; however, the
combination of all the sounds, each with its own sense of space, creates a
feeling of openness. The reverb decay is most audible after the chorus when
the percussion and synths kick in for a few bars.
Despite the multiple layers of percussion, such as the talking drum and
triangle, along with the full rhythm section, the mix is pleasantly complete
and yet remains uncluttered. The various percussion parts and drums occupy
a large area in the stereo image, helping to create a space for the lead vocal
to sit.
The vocal timbre has a warm, but slightly harsh sound. It is fully
supported by the variety of drums, bass, percussion and synths throughout
the piece. The Senegalese singer Youssou N'Dour performs a solo at the end
of the piece, which is overlapped with other voices that unfold to the sides.
The bass line is punchy and articulate, sounding as if it were quite
compressed, and contributes significantly to the rhythmic foundation of the
piece.
Distortion is present in some sounds, starting with the slightly crunchy
drum beat on the downbeat of the piece. Other sounds are slightly distorted
in places and compression effects are audible. This is certainly not the
cleanest recording you can find, however, distortion and compression
artifacts work to add life and excitement to the recording.
Overall, this recording demonstrates a fascinating use of many layers of
sound, including acoustic percussion and electronic synthesizers, creating the
feeling of a large open space in which a musical story is being told.
7.2.3 Lyle Lovett: “Church”
Lovett, Lyle. (1992). Joshua Judges Ruth . Produced by George Massenburg,
Billy Williams, and Lyle Lovett. Recorded by George Massenburg and Nathan
Kunkel. Curb Music Company/MCA Records.
Lyle Lovett's recording of "Church" represents contrasting perspectives.
The track begins with the piano giving a gospel choir an opening note, which
they hum. Lovett's lead vocal enters immediately with the chorus's applause
on beats two and four. The piano, bass, and drums begin with sparse vocal
accompaniment and gradually build to more prominent parts. One thing that
immediately stands out on this recording is the clarity of each sound. The
timbres of instruments and voices represent evenly balanced spectrums,
emerging from the mix as a natural sound.
Lovett's vocals are direct with very little reverb, and his level in the mix
is consistent from start to finish. The drums have a crisp attack with just the
right amount of resonance. Each drum hit emerges from the mix with panned
toms across the stereo image. The cymbals are crystal clear and add brilliance
to the top end of the recording.
The choir on this recording accompanies Lovett and responds to his
singing. Interestingly, the choir sounds as if it is located in a small country
church, where the reverb is especially highlighted with applause. The chorus
and associated applause pan widely across the stereo image. As the choir
members take short solos, their individual voices come through and are noticeably drier than when they sing with the full choir.
The lead vocals and rhythm section are presented quite dryly, up front,
and this contrasts with the chorus, which is clearly in a more reverberant or at
least more distant space.
The levels and dynamic range of each instrument are adjusted
appropriately, presumably through some combination of compression and
manual fader control. Every component of the mix is audible and none of the
sounds are hidden.
Noise and distortion are completely non-existent on this recording and
obviously great care has been taken to eliminate or prevent any extraneous
noise. There is also no evidence of clipping and every sound is clean.
This recording has become a classic in terms of sound quality and has
also been mixed in surround sound as a separate release.
7.2.4 Sarah McLachlan: “Lost”
McLachlan, Sarah. (1991). Solace . Produced and recorded by Pierre
Marchand. Nettwerk/Arista Records, Bertelsmann Music Group.
This track begins with a somewhat reverberant but clear acoustic guitar
and dry brushes on a snare. A somewhat airy lead vocal comes in with a lot of
space around it. The reverb that creates the space around the vocal is quite
low level, but the decay time is probably in the 2 second range. The reverb
blends well with the vocals and seems appropriate for the character of the
piece. The timbre of the voice is clear and spectrally balanced. The mixing and
compression of the voice keep its level consistently out in front of the rest of the mix.
The mandolin and 12-string guitar shift slightly left and right after the
first verse along with the electric bass and reverb pedal. The bass plays a few
notes below the standard low range of a bass, creating an enveloping sound
that supports the rest of the mix. The backing vocals are shifted slightly left
and right and placed a little further back in the mix than the lead vocal. Synth
pads, backing vocals and delayed guitar transform the mix into a dreamy
texture for a verse and then fade out for a return of mandolin and 12-string
guitar.
The timbres of this track are clear, but not harsh. There is an overall
softness to the timbres and the low frequencies, mainly from the bass,
provide a solid foundation for the mix. (Interestingly, some sounds on other
tracks on this album are slightly harsh.) The lead vocal is the most prominent
sound in the mix with backing vocals mixed in slightly lower than the lead
vocal. Guitars, mandolin and bass are the next most prominent sound in the
mix. The drums are almost completely gone after the intro, but return at the
end. The drummer raises the energy of the final chorus by playing tom and
snare hits. The drums are mixed quite low but are still audible as a rhythmic
texture and the drums have their toms decoupled.
With the round, smooth and full bass sound, this recording is useful for
listening to the low frequency response of speakers and headphones.
There's not much bass attack to identify articulation, but its sound fits
comfortably into the music. With such a prominent and balanced voice, the
recording can also serve to help identify any mid-frequency resonance or anti-
resonance in a sound reproduction system.

7.2.5 Jon Randall: “In the Country”


Randall, Jon. (2005). Walking Among the Living . Produced by George
Massenburg and Jon Randall. Recorded by George Massenburg and David
Robinson. Epic/Sony BMG Music Entertainment.
The fullness and clarity of this track is present from the first note.
Acoustic guitar and mandolin begin the intro followed by Randall's lead
vocals. The rhythm section enters the second verse, which widens the
bandwidth with cymbals in the high frequency range and kick drum in the low
frequency range. Various musical colors, such as dobro, violin, Wurlitzer and
mandolin, stand out in the short musical features and then fade into the
background. It seems evident that great care was taken to create an ever-
evolving mix featuring musically important phrases.
The timbres of this track sound naturally clear and completely
spectrally balanced. The voice is constantly present above the instruments,
with a subtle sense of reverb to create space around it. The drums are not as
prominent as on the Lyle Lovett recording discussed above, and are a bit
understated. The cymbals are present and clear, but do not overpower other
sounds. The bass is smooth and full, with enough articulation on its part. The
violin, mandolin and guitar sounds are all full-bodied, crisp and warm. The
high harmonics of strummed mandolin and guitars mix with the harmonics of
cymbals in the upper frequency range. Aside from the track's timbral
integrity, there is no evidence of any noise or distortion.
The stereo image is used to its full extent with wide-panned mandolins,
guitars and drums. The balance of this recording is impeccable, making use of
musically appropriate spatial treatment (reverb and panning), dynamic
processing and equalization.

7.3 Graphic sound analysis


In research on the perception of sound images produced by automotive audio
systems, researchers have used graphical techniques to obtain listeners'
perceptions of the location and dimensions of sound images (Ford et al.,
2002, 2003; Mason et al., 2000). Work by Usher and Woszczyk (2003) and
Usher (2004) has sought to visualize the location, depth and width of sound
images within a multi-channel playback environment, to better understand
listeners' perceptions of the locations of sound sources in a car sound
playback environment. In the experiments, listeners were asked to draw
sound sources using elliptical shapes on a graphical computer interface.
By translating what is heard into a two-dimensional visual diagram, a
level of analysis different from verbal descriptions can be achieved. Although
there is no clear method for visually illustrating an auditory perception, the
exercise of doing so is very useful for sonic analysis and exploration.
Using a template like the one in Figure 7.1 , draw what you hear
coming from a sound system. The listening location relative to a sound system
will have a direct effect on the location of ghost images. Section 1.3.2
illustrates the ideal listening location for stereo sound reproduction that will
provide accurate ghost image locations.

Figure 7.1 The reader is encouraged to use the template shown here as a guide for graphical analysis of a sound image,
to visualize the perceived locations of sound images within a sound recording.

The images drawn on the template should not resemble the actual
shapes of musical instruments, but should be analogous to the sound images
perceived from the speakers. For example, the stereo image of a solo piano
recording will be very different from the image of a piano playing with an
ensemble, and their corresponding visual images would also look significantly
different.
Drawings of stereo images should be labeled to indicate how the visual
shapes correspond to the perceived auditory images. Without labels, they
may seem too abstract to be understood, but when considered in relation to
their respective sound recordings, they can help the listener draw a sonic
picture.
Graphical analysis allows attention to be focused on the location,
width, depth and spread of sound sources in a sound image. A visual
representation of a sound image should include not only the direct sound
from each sound source, but also any spatial effects, such as reflections and
reverberation, present in a recording.
7.4 Multichannel audio
This section will focus on the most common multichannel playback format
with 5.1 channels. Multichannel audio generally allows for more realistic
reproduction of a surround sound field, especially for recordings of purely
acoustic music in a concert hall; this type of recording can leave listeners with
the impression of sitting in a room, completely surrounded by sound.
In contrast, multichannel audio can also offer the least realistic audio reproduction because it allows an engineer to position sound sources around
a listener. There are usually no musicians behind audience members at a
concert, other than the antiphonal organ, brass, or choir, but multichannel
audio playback allows a sound mixer to place direct sound sources at the back
of the listening position. Certainly, multichannel audio has many advantages
over two-channel stereo, but there are still challenges to consider and
opportunities for critical listening to help with these challenges.
Although there are speakers in front and behind, in the ITU-R BS.775-1
recommendation (ITU-R, 1994) (see Fig. 1.3) there is a fairly wide space
between the front speaker (30°) and the nearest surround speaker (110° to
120°). The wide space between the front and rear speakers makes it difficult
to produce lateral sound images, at least with stability and placement
accuracy.

7.4.1 The center channel


A distinctive feature of the 5.1 playback environment is the presence of a
center speaker located at 0° between the left and right channels. The
advantage of a center channel is that it can help solidify and stabilize sound
images that are panned toward the center. A center ghost image in a conventional stereo speaker setup appears to come from the center only when the listener
is seated in the ideal listening location, equidistant from the speakers. When
a listener moves to one side, a central ghost image appears to move to the
same side.
Because a listener is no longer equidistant from the two speakers, sound
reaches the listener first from the closest speaker and will localize to that
speaker due to the law of first arriving wavefront, also known as the
precedence effect or Haas effect .
Soloing the center speaker of a surround mix helps give an idea of what
a mix engineer sent to the center channel. As you listen to the center channel
and explore how it integrates with the left and right channels, ask questions
such as the following:
• Does the presence or absence of the central channel make a significant
difference in the frontal image?
• Are the lead instruments or vocals the only sounds in the center
channel?
• Are there any drums or drum kit components in the center channel?
• Is the bass present in the center channel?
If a recording has prominent lead vocals and they are panned only in
the center channel, it is likely that some of the reverb, echo, and early reflections are sent to other channels. In such a mix, muting the center
channel can make it easier to hear the reverb without any direct sound.
Sometimes ghost images produced by the left and right channels are
reinforced by the center image or channel. Duplicating a center ghost image
on the center speaker can make the center image more stable and solid.
Often the signal sent to the left and right channels can be delayed or modified
in some way so that it is not an exact copy of the center channel. With all
three channels producing exactly the same audio signal, the listener can
experience comb filtering with changes in head location as signals from three
different locations are combined in the ears (Martin, 2005).
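The comb filtering described here can be illustrated by summing a signal with a slightly delayed copy of itself; the sample rate and delay in the sketch below are arbitrary assumptions chosen only to show the characteristic notches.

```python
# Comb filtering from summing a signal with a slightly delayed copy of itself.
# The sample rate and delay are arbitrary values chosen to show the notches.
import numpy as np

fs = 48000
delay = int(fs * 0.0005)             # 0.5 ms, roughly a 17 cm path-length difference

x = np.random.randn(fs)              # one second of noise as a test signal
combined = x.copy()
combined[delay:] += x[:-delay]       # the same signal arriving slightly later

# Magnitude of the sum relative to the original shows regularly spaced notches.
ratio_db = 20 * np.log10(np.abs(np.fft.rfft(combined)) /
                         (np.abs(np.fft.rfft(x)) + 1e-12) + 1e-12)
```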
The spatial quality of a phantom image produced between the left and right channels is markedly different from that of a solid center-channel image reproducing exactly the same signal on its own. Some listeners may prefer the phantom image between the left and right speakers despite its shortcomings, such as the way it shifts with the listener's position. A phantom image produced by two speakers will generally sound wider and fuller than a single center speaker producing the same sound, which may be perceived as narrower and more constrained.
It is important to compare different channels of a multichannel
recording and begin to form an internal reference for various aspects of a
multichannel sound image. By making these comparisons and listening closely
and carefully, we can form solid impressions of what types of sounds are
possible from various speakers in a surround environment.
7.4.2 The surround channels
In analyzing surround recordings, it is useful to focus on how well a 5.1-
channel surround sound recording achieves uniform front-to-back
distribution and whether side imaging exists. Side images are difficult to
produce without an actual speaker placed to the side due to the nature of
binaural hearing, which is much more accurate at localizing sounds originating
from the front.
Locate various elements in a mix and examine the placement of sounds
around the listening area by considering some questions such as:
• How are the different elements of the mix placed?
• Do they have precise locations or is it difficult to determine the exact
location because a sound seems to come from many locations at once?
• What is the character of the reverberation, and where does it appear to come from?
• Are there different levels of reverb and delay?
In surround playback systems, the rear channels are widely spaced. The
large space, coupled with binaural hearing having less spatial acuity in the
rear, makes it difficult to create a coherent and uniform rear image. It is instructive to solo the surround channels and listen to them on their own, because within the full mix the rear channels may not be as easy to hear, given the auditory system's bias toward sound arriving from the front.
7.4.3 Exercise: Comparison of Stereo with Surround Sound
Comparing a stereo and surround mix of the same musical recording can be
illuminating. A lot of detail can be heard in a surround mix that is less audible, or missing entirely, in a stereo mix. Surround playback systems allow an
engineer to place sound sources in many different locations around a
listening area. Due to the spatial separation of sound sources, there is less
masking in a surround mix. Listening to a surround mix and then returning to
its corresponding stereo mix can help highlight elements of a stereo mix that
were not heard before.

7.4.4 Exercise: Comparison of Original and Remastered Versions


Many recordings have been remastered and rereleased years after their original release. Remastering an album usually involves going back to its
original stereo mix and applying new equalization, dynamics processing, level
adjustments, mid-side processing, and possibly reverb. Comparing an original
album release to a remastered version is a useful exercise that can help
highlight the timbral, dynamic, and spatial characteristics typically altered by
a mastering engineer.

7.5 High sampling rates


There has been much heated debate about the audible benefits of high sample rates in digital audio. The compact disc format specifies a sampling rate of 44,100 Hz and a bit depth of 16 bits per sample, according to the Red Book CD standard. As recording technology has evolved, it has become possible to record and distribute audio at much higher sample rates and bit depths. There is little doubt that bit depths greater than 16 bits per sample improve audio quality, and engineers typically record with at least 24 bits per sample. As an exercise, compare a 24-bit recording with a dithered 16-bit version of the same recording and note the audible differences.
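One way to prepare this comparison yourself is sketched below (Python, using numpy and the soundfile package): the 24-bit file is requantized to 16 bits with TPDF dither so the two versions can be auditioned back to back. The file name "original_24bit.wav" is a placeholder for your own recording.

import numpy as np
import soundfile as sf

audio, fs = sf.read("original_24bit.wav")        # read as floats in the range [-1, 1]

q = 1.0 / 32768.0                                # 16-bit quantization step
tpdf = (np.random.uniform(-0.5, 0.5, audio.shape)
        + np.random.uniform(-0.5, 0.5, audio.shape)) * q   # triangular dither, +/- 1 LSB
dithered = np.round((audio + tpdf) / q) * q      # add dither, then quantize
dithered = np.clip(dithered, -1.0, 1.0 - q)      # avoid overflow at full scale

sf.write("dithered_16bit.wav", dithered, fs, subtype="PCM_16")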
The sampling rate determines the highest frequency that can be recorded and therefore the bandwidth of a recording. The sampling theorem states that the highest frequency that can be captured is half the sampling frequency (the Nyquist frequency), so higher sample rates allow a wider recording bandwidth.
Although the difference between a high sample rate (96 kHz or 192 kHz) and a 44.1 kHz sample rate is subtle and may be difficult to hear, comparing high-sample-rate audio with CD-quality audio can be useful for fine-tuning listening skills. As one progresses to perceiving finer audible differences between sounds, it may be helpful to compare sound recorded at different sampling rates. Some engineers report that a recording made at 96 kHz and downsampled to 44.1 kHz sounds better than a recording originally made at 44.1 kHz.
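The downsampling step for such a comparison can be done in a few lines. The sketch below (Python, using soundfile and scipy, both assumed available) converts a 96 kHz file to 44.1 kHz; the 147/320 ratio is exactly 44100/96000, and the file names are placeholders.

import soundfile as sf
from scipy.signal import resample_poly

audio, fs = sf.read("source_96k.wav")            # expects a 96 kHz source (fs == 96000)
converted = resample_poly(audio, up=147, down=320, axis=0)   # 96000 * 147/320 = 44100
sf.write("converted_441k.wav", converted, 44100, subtype="PCM_24")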
A Super Audio CD (SACD) recording, with its 2.8224 MHz sample rate, may differ from CD-quality audio more noticeably than 96 kHz or 192 kHz recordings do. One reported difference is improved spatial clarity: the panning of instruments and sound sources within a stereo or surround image can be more clearly defined, source locations more precise, and reverberation decay generally smoother.
With any of these comparisons, it is easier to hear differences when the audio is played through high-quality speakers or headphones; lower-quality playback devices do not reveal the full benefit of high sample rates. High-quality playback systems do not always have to be expensive, even among consumer equipment.
7.6 Exercise: Comparison of speakers and headphones
Each particular model of speaker or headphone has a unique sound.
Frequency response, power response, distortion characteristics, and other
specifications contribute to the sound an engineer hears and therefore
influence decisions during recording and mixing sessions.
For this exercise, do the following:
• Choose two different pairs of speakers, two different headphones, or a
pair of speakers and a pair of headphones.
• Choose several well-known music recordings.
• Document the make/model of the speakers/headphones and the
listening environment.
• Compare the sound quality of the two different sound playback
devices.
• Describe the audible differences with comments on the following
aspects and characteristics of the sound field:
o Timbral quality: describes differences in frequency response and
spectral balance.
- Does one model sound deficient in a specific frequency band?
- Is one model particularly resonant in a certain frequency band?

o Spatial characteristics: what does the reverb sound like?


- Does one model make the reverb more prominent than the
other?
- Is the spatial layout of the stereo image the same on both?
- Is the clarity of sound source locations the same on both? That is, can sound sources be localized in the stereo image equally well on both models?
- If comparing headphones to speakers, can you describe differences in the image components that are panned to the center?
- How do the center images compare in terms of their front/back placement and their width?

o Overall clarity of the sound image:


- Which is more defined?
- Can you hear details in one that are less audible or inaudible in
the other?
o Preference: which is generally preferred?
o General differences: Describe the differences beyond the list
presented here.
• Sound files: It is best to use only linear PCM files (AIFF or WAV) that
have not been converted from MP3 or AAC.
Every sound reproduction device and environment has a direct effect on the quality and character of the sound heard, so it is important for engineers to know their sound reproduction system (the speaker/room combination) and to have a few reference recordings they know well. Reference recordings do not have to be pristine, perfect recordings, as long as they are familiar.

7.7 Exercise: Sound enhancers in media players


Many software media players used to play audio on a computer offer so-
called sound enhancement controls. This type of control is often enabled by
default in media players like iTunes, and offers another opportunity for
critical listening. It can be informative to compare the audio quality with
sound enhancement on and off and try to determine by ear how the
algorithm is affecting the sound. The processing it uses may improve the
sound of some recordings, but degrade the sound of others.
Consider how a sound enhancer affects the stereo image and whether
the overall width of the image is affected or if the panning and placement of
sound sources is altered in any way:
• Is the reverb level affected?
• The timbre is likely to be altered in some way. Try to identify as
precisely as possible how the timbre is changed. Identify if any
equalization has been added and what specific frequencies have been
altered.
• Is there any dynamic range processing going on? Are compression artifacts present, or does the enhanced version simply sound louder?
The sound enhancement settings on media players may or may not
alter the audio in a desirable way, but they certainly offer a critical listening
exercise to determine differences in audio characteristics.
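One technique that can supplement listening by ear, not described above but worth naming plainly, is a null test: capture the player's output with the enhancer off and again with it on, then subtract one capture from the other; whatever does not cancel is what the processing changed. The rough sketch below (Python, numpy + soundfile) assumes the two captures are already sample-aligned and equally long, which in practice usually requires trimming or cross-correlating first, and the file names are placeholders.

import numpy as np
import soundfile as sf

plain, fs = sf.read("capture_enhancer_off.wav")
enhanced, _ = sf.read("capture_enhancer_on.wav")

n = min(len(plain), len(enhanced))
difference = enhanced[:n] - plain[:n]            # what the enhancer added or changed

residual_db = 20 * np.log10(np.max(np.abs(difference)) + 1e-12)
print(f"Peak level of the difference signal: {residual_db:.1f} dBFS")
sf.write("enhancer_difference.wav", difference, fs)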

7.8 Sound analysis of acoustic sources


Live acoustic music performances can be instructive and enlightening in
developing critical listening skills. Most music heard is played through
electroacoustic transducers of some type (speakers or headphones), and it
can be easy to lose sight of how an instrument sounds acoustically as it projects sound in all directions into a room or hall. At least one consumer audio manufacturer encourages its research and development staff to attend acoustic music concerts, considering the practice important for developing a reference point when tuning loudspeakers. Attending to sound quality, timbre, spatial characteristics, and dynamic range during a live concert can sharpen the technical listening skills that are then applied when working over loudspeakers.
It may seem counterintuitive to use such acoustic music performances
for training in a field that relies on sound reproduction technology, but the
sound radiation patterns of musical instruments are different from those of loudspeakers, and it is important to recalibrate the auditory system by actively listening to acoustic music. When attending jazz, classical music,
contemporary acoustic music, or folk music concerts, you can hear the result
of the natural sound radiation patterns of each instrument in the room. The
sound emanates from each instrument into the room, theater or hall and
mixes with that of other instruments and voices.
Sitting in the audience at a live music concert, focus on the aspects of
sound that are often considered when balancing tracks on a recording. Just as
the spatial layout (panning) and depth of a recording played over
loudspeakers can be analyzed, these aspects can also be examined in an
acoustic environment. Start by trying to locate the different members or
sections of the ensemble that is being presented. With your eyes closed, it
may be easier to focus on the auditory sensation and ignore what the visual
sense is reporting. Try locating instruments on a stage and think of the overall
sound in terms of a “stereo image,” as if two speakers were producing the
sound and ghosting was heard between the speakers. The location of sound
sources may not be the same for all seats in the house and may be influenced
by early reflections from side walls in the performance space. When
comparing music played through a pair of speakers to that played in a live
acoustic space, the perceived sound image will be significantly different in
terms of timbre, space and dynamics. Some questions can guide the
comparison:
• Does the live music sound wider or narrower than it does over stereo speakers?
• Is the balance of direct and reverberant sound consistent with what might be heard on a recording?
• How does the timbre compare to what is heard through the speakers? If different, describe the difference.
• How clearly do very quiet passages come across?
• How does the dynamic range compare?
• How does the feeling of spaciousness and envelopment compare?
Audience members almost always sit much further away from the
musical performers than the microphones would normally be placed, and are
outside the reverberation radius or critical distance. Therefore, most of the
sound energy they hear is indirect sound (reflections and reverberation),
making it much more reverberant than what is heard on a recording. This
level of reverb probably wouldn't be acceptable on a recording, but audience
members find it enjoyable. Perhaps because music performers are visible live,
the auditory system is more forgiving, or perhaps visual cues help audience
members engage with the music because they can see the performers' movements in sync with the notes they are playing.
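To get a feel for how far outside the critical distance a typical seat is, a quick calculation helps. The sketch below (Python) uses the common approximation for a roughly omnidirectional source, d_c ≈ 0.057 × sqrt(V / RT60), with V in cubic meters and RT60 in seconds; the hall dimensions are illustrative assumptions, not figures from the text.

import math

volume_m3 = 12000      # mid-sized concert hall (illustrative)
rt60_s = 2.0           # reverberation time in seconds (illustrative)

# Distance at which direct and reverberant energy are roughly equal
critical_distance = 0.057 * math.sqrt(volume_m3 / rt60_s)
print(f"Approximate critical distance: {critical_distance:.1f} m")
# About 4.4 m here, so audience seats 15-30 m away hear mostly reverberant energy,
# while close microphone positions sit well inside the critical distance.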
Ideally, the reverberant field (the audience seating area) should be
somewhat diffuse, meaning that indirect sound should be heard equally
coming from all directions. In an actual concert hall or other musical
performance space, this may not be the case and it may be possible to
localize the reverb. If the reverb is localizable, focus on the width and spatial
extent of it. Is it located mainly behind or does it also extend to the sides? Is it
enveloping? Is there any reverb coming from the front where the musicians
normally stand?
Early reflections can also be discerned as a characteristic of any sound
field. Although early reflections typically reach the listener within tens of
milliseconds of a direct sound and are therefore imperceptible as discrete
sounds, there are times when reflections can accumulate or focus from a
particular location and alter our perception of the location of a sound source. A curved wall will tend to focus reflections, causing them to sum and increase in amplitude, potentially to a level greater than that of the direct sound.
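The arrival time of a reflection follows directly from the geometry: the delay is the difference between the reflected and direct path lengths divided by the speed of sound. The small sketch below (Python) illustrates this with made-up distances; delays of a few tens of milliseconds or less are generally fused with the direct sound rather than heard as discrete echoes.

SPEED_OF_SOUND = 343.0   # m/s at room temperature

direct_path_m = 12.0     # listener 12 m from the source (illustrative)
reflected_path_m = 19.0  # source -> side wall -> listener (illustrative)

delay_ms = (reflected_path_m - direct_path_m) / SPEED_OF_SOUND * 1000
print(f"Reflection arrives {delay_ms:.1f} ms after the direct sound")   # ~20.4 ms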
Early lateral reflections can help broaden the perceived width of the
sound image. Although these reflections may not be perceived as discrete
echoes, try to focus on the overall width. Also focus on how the direct sound
mixes and joins the sound coming from the sides and rear. Does the sound
continuously envelop the entire environment or are there interruptions in the
sound field, as can occur when listening to multichannel recordings?
Echoes, reflections, and reverberation are sometimes more audible
when transient or percussive sounds are present. Sounds with a sharp attack and a short sustain and decay allow the indirect sound that follows immediately to be heard, because the direct sound dies away quickly and therefore does not mask the indirect sound.

Summary
The analysis of sound, whether purely acoustic or coming from loudspeakers,
presents opportunities to deconstruct and discover characteristics and
features of a sound image. The more one listens to recordings and acoustic
sounds with active participation, the more sound characteristics can be
identified and focused on. With time and continued practice, the perception
of auditory events opens up and one begins to notice sonic characteristics
that were not audible before. The more you discover through active listening,
the deeper your enjoyment of sound can become, but it requires dedicated
practice over time. Likewise, more focused and effective listening skills lead to
greater efficiency and effectiveness in recording, producing, composing,
reinforcing, and developing sound products. Technical ear training is essential
for anyone involved in audio engineering and music production, and critical
listening skills are within the reach of anyone who is willing to take the time
to pay attention to what they are hearing.
Here are some final tips: Listen to as many recordings as possible.
Listen through a wide variety of headphones and speaker systems. During
each listening session, take notes about what you hear. Find out who engineered the recordings you most admire and seek out more recordings by the same engineers. Note the similarities and differences between various
recordings by a given engineer, producer, or record label. Note the similarities
and differences between various recordings by a given artist who has worked
with a variety of engineers or producers.
The most difficult activity to do while working on any audio project is
continuous active listening. The only way to know how to make decisions
about what equipment to use, where to place microphones, and how to
configure parameters is to listen carefully to every sound that emanates from
your monitors and headphones. By actively listening at all times, one can gain
essential information to better serve the musical vision of any audio project.
In sound recording and production, the human auditory system is the final
judge of quality and artistic vision.
