Reading and Writing WAV Files in Python – Real Python

This document provides a comprehensive tutorial on reading and writing WAV files in Python using the built-in wave module. It covers the WAV file format, PCM encoding, audio sample visualization, and efficient processing of large WAV files, along with practical examples and code snippets. The tutorial is aimed at intermediate users and includes bonus materials and quizzes to enhance learning.


Reading and Writing WAV Files in Python

by Bartosz Zaczyński (intermediate)


Table of Contents
• Understand the WAV File Format
◦ The Waveform Part of WAV
◦ The Structure of a WAV File
• Get to Know Python’s wave Module
◦ Read WAV Metadata and Audio Frames
◦ Write Your First WAV File in Python
◦ Mix and Save Stereo Audio
◦ Encode With Higher Bit Depths
• Decipher the PCM-Encoded Audio Samples
◦ Enumerate the Encoding Formats
◦ Convert Audio Frames to Amplitudes
◦ Interpret the 24-Bit Depth of Audio
◦ Encode Amplitudes as Audio Frames
• Visualize Audio Samples as a Waveform
◦ Encapsulate WAV File’s Metadata
◦ Load All Audio Frames Eagerly
◦ Plot a Static Waveform Using Matplotlib
◦ Read a Slice of Audio Frames
• Process Large WAV Files in Python Efficiently
◦ Animate the Waveform Graph in Real Time
◦ Show a Real-Time Spectrogram Visualization
◦ Record an Internet Radio Station as a WAV File
◦ Widen the Stereo Field of a WAV File
• Conclusion


There’s an abundance of third-party tools and libraries for manipulating and analyzing audio WAV files in Python. At the same time, the language ships with the little-known wave module in its standard library, offering a quick and straightforward way to read and write such files. Knowing Python’s wave module can help you dip your toes into digital audio processing.

If topics like audio analysis, sound editing, or music synthesis get you excited, then you’re in for a treat, as this tutorial will give you a taste of them!

In this tutorial, you’ll learn how to:

• Read and write WAV files using pure Python


• Handle the 24-bit PCM encoding of audio samples
• Interpret and plot the underlying amplitude levels
• Record online audio streams like Internet radio stations
• Animate visualizations in the time and frequency domains
• Synthesize sounds and apply special effects

Although not required, you’ll get the most out of this tutorial if you’re familiar with NumPy and Matplotlib, which greatly simplify working with audio data. Additionally, knowing about numeric arrays in Python will help you better understand the underlying data representation in computer memory.

Click the link below to access the bonus materials, where you’ll find sample audio files for practice, as well as the source code of all the examples demonstrated in this tutorial:

Get Your Code: Click here to download the free sample code that shows you how to read and write WAV files in Python.

You can also take the quiz to test your knowledge and see how much you’ve learned:

Take the Quiz: Test your knowledge with our interactive “Reading and Writing WAV Files in Python” quiz. You’ll receive a score upon completion to help you track your learning progress:

• High Fidelity: Because most WAV files contain raw, uncompressed audio data, they’re perfect for applications that require the highest possible sound quality, such as music production or audio editing. On the flip side, they take up significant storage space compared to lossy compression formats like MP3.

It’s worth noting that WAV files are specialized kinds of the Resource Interchange File Format (RIFF), which serves as a container for audio and video streams. Other popular file formats based on RIFF include AVI and MIDI. RIFF itself is an extension of the older IFF format originally developed by Electronic Arts to store video game resources.

Before diving in, you’ll deconstruct the WAV file format itself to better understand its structure and how it represents sounds. Feel free to jump ahead if you just want to see how to use the wave module in Python.


The Waveform Part of WAV


What you perceive as sound is a disturbance of pressure traveling through a physical medium, such as air or water. At the most fundamental level, every sound is a wave that you can describe using three attributes:

1. Amplitude is the measure of the sound wave’s strength, which you perceive as loudness.
2. Frequency is the reciprocal of the wavelength or the number of oscillations per second, which corresponds to the pitch.
3. Phase is the point in the wave cycle at which the wave starts, which isn’t registered by the human ear directly.

The word waveform, which appears in the WAV file format’s name, refers to the graphical depiction of the audio signal. If you’ve ever opened a sound file using audio editing software, such as Audacity, then you’ve likely seen a visualization of the file’s content that looked something like this:

Such a waveform graph lets you assess the audio content at a glance and, for example, find quiet sections that may need editing.

Coming up next, you’ll learn how WAV files store these amplitude levels in digital form.

The Structure of a WAV File


The WAV audio file format is a binary format that exhibits the following structure on disk:

The Structure of a WAV File

As you can see, a WAV file begins with a header comprised of metadata, which describes how to interpret the sequence of audio frames that follow. Each frame consists of channels that correspond to loudspeakers, such as left and right in a stereo setup. In turn, every channel contains an encoded audio sample, which is a numeric value representing the relative amplitude level at a given point in time.

The most important parameters that you’ll find in a WAV header are:

• Encoding: The digital representation of a sampled audio signal. Available encoding types include uncompressed Pulse-Code Modulation (PCM) and a few compressed formats like ADPCM, A-Law, or µ-Law.
• Channels: The number of channels in each frame, which is usually equal to one for mono and two for stereo audio, but could be more for surround sound recordings.
• Frame Rate: The number of frames per second, also known as the sampling rate or the sampling frequency, expressed in hertz. It affects the range of representable frequencies, which has an impact on the perceived sound quality.
• Bit Depth: The number of bits per audio sample, which determines the maximum number of distinct amplitude levels. The higher the bit depth, the greater the dynamic range of the encoded signal, making subtle nuances of the sound audible.

Python’s wave module supports only the Pulse-Code Modulation (PCM) encoding, which is by far the most popular, meaning that you can usually ignore other formats. Moreover, Python is limited to integer data types, while PCM is more flexible, defining several bit depths to choose from, including floating-point ones:

Data Type  Signed  Bits  Min Value       Max Value
Integer    No      8     0               255
Integer    Yes     16    -32,768         32,767
Integer    Yes     24    -8,388,608      8,388,607
Integer    Yes     32    -2,147,483,648  2,147,483,647
Float      Yes     32    -1.0            1.0
Float      Yes     64    -1.0            1.0

After reading an audio sample into Python, you’ll typically normalize its value so that it always falls between -1.0 and 1.0 on the scale, regardless of the original range of PCM values. Then, before writing it back to a WAV file, you’ll scale and shift the value to make it fit the desired range.

The 8-bit integer encoding is the only one relying on unsigned numbers. In contrast, all remaining data types support both positive and negative sample values.

Another important detail you should take into account when reading audio samples is the byte order. The WAV format specifies that multi-byte values are stored in little-endian or start with the least significant byte first. You don’t typically need to worry about it when you use the wave module to read or write audio data in Python, but there are cases when you do!

The 8-bit, 16-bit, and 32-bit integers have standard representations in the C programming language, which the default Python interpreter builds on. However, the 24-bit integer is an outlier without a corresponding built-in C data type. Even so, 24-bit depths aren’t unheard of in music production, as they help strike a balance between size and quality. Later, you’ll learn how to emulate them in Python.

To faithfully represent music, most WAV files use stereo PCM encoding with 16-bit signed integers sampled at 44,100 frames per second. These parameters correspond to the standard CD-quality audio. Coincidentally, that sampling frequency is roughly double the highest frequency that most humans can hear. According to the Nyquist-Shannon sampling theorem, that’s sufficient to capture sounds in digital form without distortion.

Now that you know what’s inside a WAV file, it’s time to load one into Python!


Get to Know Python’s wave Module


The wave module takes care of reading and writing WAV files but is otherwise pretty minimal. It was implemented in just a few hundred lines of pure Python code, not counting the comments. Perhaps most importantly, you can’t play sounds with it. To play a sound in Python, you’ll need to install a separate library.

Note: Although the wave module itself doesn’t support audio playback, you can still listen to your sound files as you work through this tutorial. Use the media player that came with your operating system or any third-party player, such as VLC.

As mentioned earlier, the wave module only supports four integer-based, uncompressed PCM encoding formats:

• 8-bit unsigned integer
• 16-bit signed integer
• 24-bit signed integer
• 32-bit signed integer

Python

>>> import wave

>>> with wave.open("Bongo_sound.wav") as wav_file:
...     print(wav_file)
...
<wave.Wave_read object at 0x7fc07b2ab950>

When you don’t pass any additional arguments, the wave.open() function, which is the only function in the module’s public interface, opens the given file for reading and returns a Wave_read instance. You can use that object to retrieve information stored in the WAV file’s header and read the encoded audio frames:

Python

>>> with wave.open("Bongo_sound.wav") as wav_file:


... metadata = wav_file.getparams()
... frames = wav_file.readframes(metadata.nframes)
...

>>> metadata
_wave_params(
nchannels=1,
sampwidth=2,
framerate=44100,
nframes=212419,
comptype='NONE',
compname='not compressed'
)

>>> frames
b'\x01\x00\xfe\xff\x02\x00\xfe\xff\x01\x00\x01\x00\xfe\xff\x02\x00...'

>>> len(frames)
424838

You can conveniently get all parameters of your WAV file into a named tuple with descriptive attribute names, such as .framerate. Alternatively, you can call the individual methods, such as .getnchannels(), on the Wave_read object to pick the specific pieces of metadata that you’re interested in.

The underlying audio frames get exposed to you as an unprocessed bytes instance, which is a really long sequence of byte values. Unfortunately, you can’t do much beyond what you’ve seen here because the wave module returns raw bytes without providing any help in their interpretation.

Your sample recording of the Bongo drum, which is less than five seconds long and only uses one channel, takes up nearly half a million bytes! To make sense of them, you must know the encoding format and manually decode the bytes into numeric values.

According to the metadata you’ve just obtained, there’s only one channel in each frame, and each audio sample takes up two bytes, which explains why 212,419 frames occupy 424,838 bytes. Knowing that, you can decode the samples with the struct module:

>>> import struct
>>> format_string = "<" + "h" * (len(frames) // 2)
>>> pcm_samples = struct.unpack(format_string, frames)
>>> len(pcm_samples)
212419

The less-than symbol (<) in the format string explicitly indicates little-endian as the byte order of each two-byte audio sample. In contrast, an array implicitly assumes your platform’s byte order, which means that you may need to call its .byteswap() method when necessary.
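For instance, here’s a small sketch of decoding the same kind of little-endian frames with the array module; the byte values below are made up for illustration:

```python
import sys
from array import array

# Three little-endian 16-bit samples: 1, -2, and 2
frames = b"\x01\x00\xfe\xff\x02\x00"

# array("h") interprets the bytes using the platform's native byte order
pcm_samples = array("h", frames)

# On a big-endian platform, swap the bytes to match the WAV format
if sys.byteorder == "big":
    pcm_samples.byteswap()

print(list(pcm_samples))  # [1, -2, 2]
```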

Finally, you can use NumPy as an efficient and powerful alternative to Python’s standard library modules, especially if you already work with numerical data:

Python

>>> import numpy as np


>>> pcm_samples = np.frombuffer(frames, dtype="<h")
>>> normalized_amplitudes = pcm_samples / (2 ** 15)

By using a NumPy array to store your PCM samples, you can leverage its element-wise vectorized operations to normalize the amplitude of the encoded audio signal. The highest magnitude of an amplitude stored on a signed short integer is 32,768, corresponding to -2¹⁵. Dividing each sample by 2¹⁵ scales them to a right-open interval between -1.0 and 1.0, which is convenient for many audio processing tasks.

That brings you to a point where you can finally start performing interesting tasks on your audio data, such as plotting the waveform or applying special effects. Before you do, however, you should learn how to save your own sounds in a WAV file.


Write Your First WAV File in Python


Knowing how to use the wave module in Python opens up exciting possibilities, such as sound synthesis. Wouldn’t it be cool to compose your own music or sound effects from scratch and listen to them? Now, you can do just that!

Mathematically, you can represent any complex sound as the sum of sufficiently many sinusoidal waves with different frequencies, amplitudes, and phases. By mixing them in the right proportions, you can recreate the unique timbre of different musical instruments playing the same note. Later, you may combine a few musical notes into chords and use them to compose entire melodies.

Here’s the general formula for calculating the amplitude A(t) at time instant t of a sine wave with frequency f, whose maximum amplitude is A:

A(t) = A sin(2πft)

The FRAMES_PER_SECOND constant determines the frame rate of the audio, defaulting to 44.1 kHz. The helper function, sound_wave(), takes the frequency in hertz and duration in seconds of the expected wave as parameters. Based on them, it calculates the 8-bit unsigned integer PCM samples of a pure sine wave and yields them to the caller.

To determine the time instant before plugging it into the formula, you divide the current frame number by the frame rate, which gives you the current time in seconds. Then, you calculate the amplitude at that point using the simplified formula shown earlier. Finally, you shift, scale, and round the amplitude to fit the range of 0 to 255, corresponding to an 8-bit unsigned integer audio sample.

You can now synthesize, say, a pure sound of the musical note A by generating a sine wave with a frequency of 440 Hz lasting two and a half seconds. Then, you can use the wave module to save the resulting PCM audio samples to a file:

Python synth_mono.py

import math
import wave

FRAMES_PER_SECOND = 44100

def sound_wave(frequency, num_seconds):


for frame in range(round(num_seconds * FRAMES_PER_SECOND)):
time = frame / FRAMES_PER_SECOND
amplitude = math.sin(2 * math.pi * frequency * time)
yield round((amplitude + 1) / 2 * 255)

with wave.open("output.wav", mode="wb") as wav_file:


wav_file.setnchannels(1)
wav_file.setsampwidth(1)
wav_file.setframerate(FRAMES_PER_SECOND)
wav_file.writeframes(bytes(sound_wave(440, 2.5)))

You start by adding the necessary import statement and call the wave.open() function with the mode parameter set to the string literal "wb", which stands for writing in binary mode. In that case, the function returns a Wave_write instance. Note that Python always opens your WAV files in binary mode even if you don’t explicitly use the letter b in the mode string.

Next, you set the number of channels to one, which represents mono audio, and the sample width to one byte, which corresponds to PCM encoding with 8-bit unsigned integer samples. You also pass your frame rate to the file’s header. Lastly, you convert the computed PCM audio samples into a sequence of bytes before writing them to the file.

Now that you understand what the code does, you can go ahead and run the script:

Shell

$ python synth_mono.py

Running the script writes output.wav to your current directory. Next, you can extend the script to produce stereo sound by interleaving samples from two channels:

Python synth_stereo.py

 1 import itertools
 2 import math
 3 import wave
 4
 5 FRAMES_PER_SECOND = 44100
 6
 7 def sound_wave(frequency, num_seconds):
 8     for frame in range(round(num_seconds * FRAMES_PER_SECOND)):
 9         time = frame / FRAMES_PER_SECOND
10         amplitude = math.sin(2 * math.pi * frequency * time)
11         yield round((amplitude + 1) / 2 * 255)
12
13 left_channel = sound_wave(440, 2.5)
14 right_channel = sound_wave(480, 2.5)
15 stereo_frames = itertools.chain(*zip(left_channel, right_channel))
16
17 with wave.open("output.wav", mode="wb") as wav_file:
18 wav_file.setnchannels(2)
19 wav_file.setsampwidth(1)
20 wav_file.setframerate(FRAMES_PER_SECOND)
21 wav_file.writeframes(bytes(stereo_frames))

The highlighted lines represent the necessary changes. First, you import the itertools module so that you can chain the zipped pairs of audio samples from both channels on line 15. Note that it’s one of many possible ways to implement channel interleaving. While the left channel stays as before, the right one becomes another sound wave with a slightly higher frequency. Both have equal lengths of two and a half seconds.

You also update the number of channels in the file’s metadata accordingly and convert the stereo frames to bytes. When played back, your generated audio file should resemble the sound of a ringing tone used by most phones. You can simulate other tones from various countries by adjusting the two sound frequencies, 440 Hz and 480 Hz. For some of them, you may need more than two channels.

Note: Reducing your frame rate to something like 8 kHz will make the ringing tone sound even more convincing by mimicking the limited frequency range of a typical telephone line.

Alternatively, instead of allocating separate channels for your sound waves, you can mix them together to create interesting effects. For example, two sound waves with very close frequencies produce a beating interference pattern. You may have experienced this phenomenon firsthand when traveling on an airplane because the jet engines on both wings don’t rotate at exactly the same speeds. This creates a distinctive pulsating sound in the cabin.

Mixing two sound waves boils down to adding their amplitudes. Just make sure to clamp their sum so that it doesn’t exceed the available amplitude range to avoid distortion:

Python synth_beat.py
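A minimal sketch of a script along these lines, mixing a 440 Hz wave with a 441 Hz wave and clamping their sum; the specific frequencies and overall structure here are assumptions:

```python
import math
import wave

FRAMES_PER_SECOND = 44100

def sound_wave(frequency, num_seconds):
    # Yield raw amplitudes between -1.0 and 1.0
    for frame in range(round(num_seconds * FRAMES_PER_SECOND)):
        time = frame / FRAMES_PER_SECOND
        yield math.sin(2 * math.pi * frequency * time)

# Mix two waves with very close frequencies to produce a beat,
# clamping the sum to the -1.0 to 1.0 range to avoid distortion
mixed = [
    max(-1.0, min(left + right, 1.0))
    for left, right in zip(sound_wave(440, 2.5), sound_wave(441, 2.5))
]

with wave.open("output.wav", mode="wb") as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(1)
    wav_file.setframerate(FRAMES_PER_SECOND)
    # Shift and scale the amplitudes to 8-bit unsigned PCM
    wav_file.writeframes(
        bytes(round((amplitude + 1) / 2 * 255) for amplitude in mixed)
    )
```

Running the script produces a mono output.wav with an audible beat of about 1 Hz, since the beat frequency equals the difference between the two wave frequencies.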

The built-in min() and max() functions help you keep the resulting amplitude between -1.0 and 1.0, preventing distortion of the mixed sound.

Play around with the script above by increasing or decreasing the distance between both frequencies and listen to how it affects the resulting beating rhythm.


Encode With Higher Bit Depths


So far, you’ve been representing every audio sample with a single byte or eight bits to keep things simple. That gives you 256 distinct amplitude levels, which was decent enough for your needs. However, you’ll want to bump up the bit depth later to achieve a much greater dynamic range and better sound quality. This comes at a cost, though, of increased file size and memory use.

To use one of the multi-byte PCM encodings, such as the 16-bit one, you’ll need to consider the conversion between a Python int and a suitable byte representation. It involves handling the byte order and sign bit, in particular. Python offers a few tools to help with that, which you’ll explore now:

• array
• bytearray and int.to_bytes()
• numpy.ndarray

When switching to a higher bit depth, you must adjust the scaling and byte conversion accordingly. For 16-bit signed integers, you can use the array module and make the following tweaks in your synth_stereo.py script:

Python synth_stereo_16bits_array.py

 1 import math
 2 import wave
 3 from array import array
 4
 5 FRAMES_PER_SECOND = 44100
 6
 7 def sound_wave(frequency, num_seconds):
 8     for frame in range(round(num_seconds * FRAMES_PER_SECOND)):
 9         time = frame / FRAMES_PER_SECOND
10         amplitude = math.sin(2 * math.pi * frequency * time)
11         yield max(-32768, min(round(amplitude * 32768), 32767))
12
13 left_channel = sound_wave(440, 2.5)
14 right_channel = sound_wave(480, 2.5)
15
16 stereo_frames = array("h")
17 for left_sample, right_sample in zip(left_channel, right_channel):
18     stereo_frames.append(left_sample)
19     stereo_frames.append(right_sample)
20
21 with wave.open("output.wav", mode="wb") as wav_file:
22     wav_file.setnchannels(2)
23 wav_file.setsampwidth(2)
24 wav_file.setframerate(FRAMES_PER_SECOND)
25 wav_file.writeframes(stereo_frames.tobytes())

Here’s a quick breakdown of the most important pieces:

• Line 11 scales and clamps the calculated amplitude to the 16-bit signed integer range, which extends from -32,768 to 32,767. Note the asymmetry of the extreme values.
• Lines 16 to 19 define an array of signed short integers and fill it with the interleaved samples from both channels.
• Line 23 sets the sample width to two bytes, which corresponds to the 16-bit PCM encoding.
• Line 25 converts the array to a sequence of bytes and writes them to the file.

Recall that Python’s array relies on your computer’s native byte order. To have more granular control over such low-level details, you can use an alternative solution based on a bytearray:

Python synth_stereo_16bits_bytearray.py

1 import math
2 import wave
3 from functools import partial
4
5 FRAMES_PER_SECOND = 44100
6
7 def sound_wave(frequency, num_seconds):
8 for frame in range(round(num_seconds * FRAMES_PER_SECOND)):
9 time = frame / FRAMES_PER_SECOND
10 amplitude = math.sin(2 * math.pi * frequency * time)
11 yield max(-32768, min(round(amplitude * 32768), 32767))
12
13 int16 = partial(int.to_bytes, length=2, byteorder="little", signed=True)
14
15 left_channel = sound_wave(440, 2.5)
16 right_channel = sound_wave(480, 2.5)
17
18 stereo_frames = bytearray()
19 for left_sample, right_sample in zip(left_channel, right_channel):
20 stereo_frames.extend(int16(left_sample))
21 stereo_frames.extend(int16(right_sample))
22
23 with wave.open("output.wav", mode="wb") as wav_file:
24 wav_file.setnchannels(2)
25 wav_file.setsampwidth(2)
26 wav_file.setframerate(FRAMES_PER_SECOND)
27 wav_file.writeframes(stereo_frames)

Python

 1 import wave
 2 import numpy as np
 3
 4 FRAMES_PER_SECOND = 44100
 5
 6 def sound_wave(frequency, num_seconds):
 7     time = np.arange(0, num_seconds, 1 / FRAMES_PER_SECOND)
 8     amplitude = np.sin(2 * np.pi * frequency * time)
9 return np.clip(
10 np.round(amplitude * 32768),
11 -32768,
12 32767,
13 ).astype("<h")
14
15 left_channel = sound_wave(440, 2.5)
16 right_channel = sound_wave(480, 2.5)
17 stereo_frames = np.dstack((left_channel, right_channel)).flatten()
18
19 with wave.open("output.wav", mode="wb") as wav_file:
20 wav_file.setnchannels(2)
21 wav_file.setsampwidth(2)
22 wav_file.setframerate(FRAMES_PER_SECOND)
23 wav_file.writeframes(stereo_frames)

Thanks to NumPy’s vectorized operations, you eliminate the need for explicit looping in your sound_wave() function. Instead, you generate an array of time instants measured in seconds and then pass that array as input to the np.sin() function, which computes the corresponding amplitude values. Later, on line 17, you stack and interleave the two channels into a single stream of samples ready to be written into your WAV file.

The conversion of integers to 16-bit PCM samples gets accomplished by calling NumPy’s .astype() method with the string "<h" as an argument, which is the same as np.int16 on little-endian platforms. However, before doing so, you must clip your amplitude values to prevent NumPy from silently overflowing or underflowing, which could lead to audible distortions in your WAV file.

Doesn’t this code look more compact and readable than the other two versions? From now on, you’ll stick with NumPy for the remaining part of this tutorial.

That said, encoding the PCM samples by hand gets pretty cumbersome because of the necessary amplitude scaling and type conversion involved. Python’s lack of signed bytes sometimes makes it even more challenging. Anyway, you still haven’t addressed the 24-bit PCM encoding, which requires special handling. In the next section, you’ll streamline sample encoding and decoding by wrapping them in a convenient abstraction.

Decipher the PCM-Encoded Audio Samples


This part is going to get slightly more advanced, but it’ll make working with WAV files in Python much easier in the long run. Afterward, you’ll be able to build all kinds of cool audio-related applications, which you’ll explore over the next few sections. Along the way, you’ll leverage modern Python features, such as enumerations, data classes, and pattern matching.

By the end of this tutorial, you should have a custom Python package called waveio consisting of the modules that you’ll build step by step.


Enumerate the Encoding Formats


Use your favorite IDE or code editor to create a new Python package and name it waveio. Then, define a module called encoding inside your package and populate it with the following code:

Python waveio/encoding.py

from enum import IntEnum

class PCMEncoding(IntEnum):
UNSIGNED_8 = 1
SIGNED_16 = 2
SIGNED_24 = 3
SIGNED_32 = 4

The PCMEncoding class extends IntEnum, which is a special kind of enumeration that combines the Enum features with the built-in int data type. As a result, each member of this enumeration becomes synonymous with an integer value, which you can directly use in logical and arithmetic expressions:

Python

>>> from waveio.encoding import PCMEncoding


>>> 2 == PCMEncoding.SIGNED_16
True
>>> 8 * PCMEncoding.SIGNED_16
16

This will become useful for finding the number of bits and the range of values supported by an encoding format.

The values 1 through 4 in your enumeration represent the number of bytes occupied by a single audio sample in the given format. For instance, the SIGNED_16 member has a value of two because the corresponding 16-bit PCM encoding uses two bytes per audio sample. Thanks to that, you can leverage the sampwidth field of a WAV file’s header to find the suitable encoding:

Python

>>> PCMEncoding(2)
<PCMEncoding.SIGNED_16: 2>

Passing an integer value representing the desired sample width through the enumeration’s constructor gives you the corresponding encoding instance.

Python waveio/encoding.py

    @property
    def max(self):
        return 255 if self == 1 else -self.min - 1

    @property
    def min(self):
        return 0 if self == 1 else -(2 ** (self.num_bits - 1))

    @property
    def num_bits(self):
        return 8 * self

To find the maximum value representable on the current encoding, you first check if self, which signifies the number of bytes per sample, compares equal to one. When it does, it means that you’re dealing with an 8-bit unsigned integer, whose maximum value is 255. Otherwise, the maximum value of a signed integer is the negative of its minimum value decreased by one.

Note: This stems from the asymmetry of the binary two’s complement representation of integers in modern computers. For example, if the minimum value of a signed short is equal to -32,768, then its maximum value must be 32,767.

The minimum value of an unsigned integer is always zero. On the other hand, finding the minimum value of a signed integer requires the knowledge of the number of bits per sample. You calculate it in yet another property by multiplying the number of bytes, which is represented by self, by the eight bits in each byte. Then, you take the negative of two raised to the power of the number of bits minus one.

Note that to fit the code of each of the .max and .min properties on a single line, you use Python’s conditional expression, which is the equivalent of the ternary conditional operator in other programming languages.
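Putting that description together, the ranges work out as follows. This standalone sketch mirrors the class and its properties as described above, so you can sanity-check the values:

```python
from enum import IntEnum

class PCMEncoding(IntEnum):
    # Each value is the number of bytes per audio sample
    UNSIGNED_8 = 1
    SIGNED_16 = 2
    SIGNED_24 = 3
    SIGNED_32 = 4

    @property
    def num_bits(self):
        return 8 * self

    @property
    def min(self):
        # Unsigned 8-bit starts at zero; signed types at -2^(bits - 1)
        return 0 if self == 1 else -(2 ** (self.num_bits - 1))

    @property
    def max(self):
        # A signed maximum is one less than the magnitude of the minimum
        return 255 if self == 1 else -self.min - 1

for encoding in PCMEncoding:
    print(f"{encoding.name}: {encoding.min} to {encoding.max}")
```

For example, SIGNED_24 spans -8,388,608 to 8,388,607, which is the range you’ll rely on when emulating 24-bit integers later.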

Now, you have all the building blocks required to decode audio frames into numeric amplitudes. At this point, you’re ready to turn the raw bytes loaded from a WAV file into meaningful numeric amplitudes in your Python code.

Convert Audio Frames to Amplitudes


You’ll use NumPy to streamline your code and make the decoding of PCM samples more performant. To begin, go ahead and add a new method to your PCMEncoding class, which will handle all four encoding formats:

Python waveio/encoding.py

from enum import IntEnum

import numpy as np

class PCMEncoding(IntEnum):
    # ...

    def decode(self, frames):
        match self:
            case PCMEncoding.UNSIGNED_8:
                ...
            case PCMEncoding.SIGNED_16:
                ...
            case PCMEncoding.SIGNED_24:
                ...
            case PCMEncoding.SIGNED_32:
                ...
            case _:
                raise TypeError("unsupported encoding")
To handle the different encodings, you branch out your code using structural pattern matching with the match and case soft keywords introduced in Python 3.10. When none of your enumeration members match the subject, you raise a TypeError exception with a suitable message.

Also, you use the Ellipsis (...) symbol as a placeholder in each branch to prevent Python from raising a syntax error due to an empty code block. Alternatively, you could’ve used the pass statement to achieve a similar effect. Soon, you’ll replace these placeholders with code tailored to handling the relevant encodings.

Start by covering the 8-bit unsigned integer PCM encoding in your first branch:

Python waveio/encoding.py

from enum import IntEnum

import numpy as np

class PCMEncoding(IntEnum):
# ...

def decode(self, frames):


match self:
case PCMEncoding.UNSIGNED_8:
return np.frombuffer(frames, "u1") / self.max * 2 - 1
case PCMEncoding.SIGNED_16:
...
case PCMEncoding.SIGNED_24:
...
case PCMEncoding.SIGNED_32:
...
case _:
raise TypeError("unsupported encoding")

In this code branch, you turn the binary frames into a one-dimensional NumPy array of floating-point numbers ranging from -1.0 to 1.0. By calling np.frombuffer() with its second positional parameter (dtype) set to "u1", you tell NumPy to interpret the underlying values as one-byte long unsigned integers. Then, you normalize and center the samples, which causes the result to become an array of floating-point values.
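As a quick illustration of that normalization on a handful of made-up byte values:

```python
import numpy as np

max_value = 255  # The maximum of the 8-bit unsigned encoding

# 0 maps to -1.0, 255 maps to 1.0, and the midpoint lands near silence
frames = bytes([0, 128, 255])
amplitudes = np.frombuffer(frames, "u1") / max_value * 2 - 1
print(amplitudes)
```

The value 128 maps to roughly 0.0039 rather than exactly zero because the 256 levels can’t be split evenly around a midpoint.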

Note: NumPy offers several ways to declare the same data type. Some of these alternatives are more explicit than others, but they use different notations, which can become confusing. For example, to specify a value of the signed short type known from C, you’d choose one of the following:

• Built-in Alias: np.short or np.int16
• Character Code: "h"

Python

>>> import wave

>>> from waveio.encoding import PCMEncoding

>>> with wave.open("44100_pcm08_mono.wav") as wav_file:
...     metadata = wav_file.getparams()
...     frames = wav_file.readframes(metadata.nframes)
...     amplitudes = PCMEncoding(metadata.sampwidth).decode(frames)
...
>>> amplitudes
array([ 0.00392157, 0.05882353, 0.12156863, ..., -0.18431373,
-0.12156863, -0.05882353])

You open the 8-bit mono file sampled at 44.1 kHz. After reading its metadata from the file’s header and loading all audio frames into memory, you instantiate your PCMEncoding class and call its .decode() method on the frames. As a result, you get an array of scaled amplitude values.

Decoding the 16-bit and 32-bit signed integer samples works similarly, so you can implement both in one go:

Python waveio/encoding.py

from enum import IntEnum

import numpy as np

class PCMEncoding(IntEnum):
# ...

def decode(self, frames):


match self:
case PCMEncoding.UNSIGNED_8:
return np.frombuffer(frames, "u1") / self.max * 2 - 1
case PCMEncoding.SIGNED_16:
return np.frombuffer(frames, "<i2") / -self.min
case PCMEncoding.SIGNED_24:
...
case PCMEncoding.SIGNED_32:
return np.frombuffer(frames, "<i4") / -self.min
case _:
raise TypeError("unsupported encoding")

This time, you normalize the samples by dividing them by their corresponding maximum magnitude. There’s no need to shift them, as signed integers are already centered around zero. However, there’s a slight asymmetry: the minimum value has a greater absolute value than the maximum, which is why you take the negative minimum instead of the maximum as the scaling factor. This results in a right-open interval from -1.0 inclusive to 1.0 exclusive.
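You can see both the scaling and the asymmetry with two extreme 16-bit samples; the byte values here are invented for the demonstration:

```python
import numpy as np

# -32768 and 32767 encoded as little-endian signed 16-bit integers
frames = b"\x00\x80\xff\x7f"

# Divide by the magnitude of the minimum (32768) to normalize
amplitudes = np.frombuffer(frames, "<i2") / 32768
print(amplitudes)  # The maximum lands just short of 1.0
```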

Finally, it’s time to address the elephant in the room: decoding audio samples represented as 24-bit signed integers.


Interpret the 24-Bit Depth of Audio


One way to tackle the decoding of 24-bit signed integers is by repeatedly calling Python’s int.from_bytes() class method on consecutive three-byte chunks:

Python waveio/encoding.py

from enum import IntEnum

import numpy as np

class PCMEncoding(IntEnum):
    # ...

    def decode(self, frames):
        match self:
            case PCMEncoding.UNSIGNED_8:
                return np.frombuffer(frames, "u1") / self.max * 2 - 1
            case PCMEncoding.SIGNED_16:
                return np.frombuffer(frames, "<i2") / -self.min
            case PCMEncoding.SIGNED_24:
                samples = (
                    int.from_bytes(
                        frames[i : i + 3], byteorder="little", signed=True
                    )
                    for i in range(0, len(frames), 3)
                )
                return np.fromiter(samples, "<i4") / -self.min
            case PCMEncoding.SIGNED_32:
                return np.frombuffer(frames, "<i4") / -self.min
            case _:
                raise TypeError("unsupported encoding")

First, you create a generator expression that iterates over the byte stream in steps of three, corresponding to the three bytes of each audio sample. You convert such a byte triplet to a Python int, specifying the byte order and the presence of the sign bit. Next, you pass your generator expression to NumPy’s np.fromiter() function and use the same scaling factor as before.

It gets the job done and looks fairly readable but may be too slow for most practical purposes. Here’s one of the possible implementations based on NumPy, which is significantly more efficient:

Python waveio/encoding.py

from enum import IntEnum

import numpy as np

class PCMEncoding(IntEnum):
# ...

def decode(self, frames):


match self:
case PCMEncoding.UNSIGNED_8:
return np.frombuffer(frames, "u1") / self.max * 2 - 1
case PCMEncoding.SIGNED_16:
return np.frombuffer(frames, "<i2") / -self.min
case PCMEncoding.SIGNED_24:
triplets = np.frombuffer(frames, "u1").reshape(-1, 3)
padded = np.pad(triplets, ((0, 0), (0, 1)), mode="constant")
samples = padded.flatten().view("<i4")
samples[samples > self.max] += 2 * self.min
return samples / -self.min
case PCMEncoding.SIGNED_32:
return np.frombuffer(frames, "<i4") / -self.min
case _:
raise TypeError("unsupported encoding")

The trick here is to reshape your flat array of bytes into a two-dimensional matrix comprising three columns, whose rows represent the consecutive bytes of each sample. When you specify -1 as one of the dimensions, NumPy automatically calculates the size of that dimension based on the length of the array and the size of the other dimension.
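Here’s the same trick applied to a tiny hand-crafted buffer of three 24-bit samples, so you can trace what the padding, reinterpretation, and sign fix-up do; the byte values are made up for illustration:

```python
import numpy as np

# Three 24-bit little-endian signed samples: 1, -1, and 8_388_607 (the max)
frames = bytes([
    0x01, 0x00, 0x00,
    0xFF, 0xFF, 0xFF,
    0xFF, 0xFF, 0x7F,
])

triplets = np.frombuffer(frames, "u1").reshape(-1, 3)

# Append a zero byte to each row so every sample occupies four bytes
padded = np.pad(triplets, ((0, 0), (0, 1)), mode="constant")

# Reinterpret each padded row as a 32-bit little-endian integer
samples = padded.flatten().view("<i4")

# Values above the 24-bit maximum are really negatives in disguise
samples[samples > 8_388_607] += 2 * -8_388_608

print(samples.tolist())  # [1, -1, 8388607]
```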

Python

>>> import wave
>>> from pathlib import Path

>>> from waveio.encoding import PCMEncoding

>>> sounds_folder = Path("/path/to/your/downloads/folder")
>>> filenames = [
... "44100_pcm08_mono.wav",
... "44100_pcm16_mono.wav",
... "44100_pcm24_mono.wav",
... "44100_pcm32_mono.wav",
... ]

>>> for filename in filenames:


... with wave.open(str(sounds_folder / filename)) as wav_file:
... metadata = wav_file.getparams()
... frames = wav_file.readframes(metadata.nframes)
... encoding = PCMEncoding(metadata.sampwidth)
... print(encoding.decode(frames))
...
[ 0.00392157 0.05882353 0.12156863 ... -0.18431373 -0.12156863 -0.05882353]
[ 0. 0.06265259 0.12506104 ... -0.18695068 -0.12506104 -0.06265259]
[ 0. 0.0626483 0.12505054 ... -0.18696141 -0.12505054 -0.0626483 ]
[ 0. 0.06264832 0.12505052 ... -0.18696144 -0.12505052 -0.06264832]

Right off the bat, you can tell that all four files represent the same sound despite using different bit depths. The amplitudes are strikingly similar, with only slight variations due to varying precision. The 8-bit PCM encoding is noticeably less precise than the rest but still captures the overall shape of the same sound wave.

To keep the promise that your encoding module and its PCMEncoding class carry in their names, you can now implement the encoding part of this two-way conversion.

Encode Amplitudes as Audio Frames


Add the .encode() method to your class now. It’ll take the normalized amplitudes as an argument and return a bytes object with the encoded audio frames ready for writing to a WAV file. Here’s the method’s scaffolding, which mirrors the .decode() counterpart you implemented earlier:

Python waveio/encoding.py

from enum import IntEnum

import numpy as np

class PCMEncoding(IntEnum):
    # ...

    def encode(self, amplitudes):
        match self:
            case PCMEncoding.UNSIGNED_8:
                ...
            case PCMEncoding.SIGNED_16:
                ...
            case PCMEncoding.SIGNED_24:
                ...
            case PCMEncoding.SIGNED_32:
                ...
            case _:
                raise TypeError("unsupported encoding")

As in .decode(), the final case raises a TypeError for an unsupported encoding format.

However, to encode your processed amplitudes into a binary format, you’ll not only need to implement scaling by the minimum and maximum values, but also clamping to keep them within the allowed range of PCM values. Fortunately, you can call np.clip() to do the work for you:

Python waveio/encoding.py

from enum import IntEnum

import numpy as np

class PCMEncoding(IntEnum):
# ...

def _clamp(self, samples):


return np.clip(samples, self.min, self.max)

You wrap the call to np.clip() in a non-public method named ._clamp(), which expects the PCM audio samples as an argument. The minimum and maximum values of the corresponding encoding are determined by your .min and .max properties, which the method delegates to.

Below are the steps for encoding the three standard formats, 8-bit, 16-bit, and 32-bit, which share common logic:

Python waveio/encoding.py

from enum import IntEnum

import numpy as np

class PCMEncoding(IntEnum):
    # ...

    def encode(self, amplitudes):
        match self:
            case PCMEncoding.UNSIGNED_8:
                samples = np.round((amplitudes + 1) / 2 * self.max)
                return self._clamp(samples).astype("u1").tobytes()
            case PCMEncoding.SIGNED_16:
                samples = np.round(-self.min * amplitudes)
                return self._clamp(samples).astype("<i2").tobytes()
            case PCMEncoding.SIGNED_24:
                ...
            case PCMEncoding.SIGNED_32:
                samples = np.round(-self.min * amplitudes)
                return self._clamp(samples).astype("<i4").tobytes()
            case _:
                raise TypeError("unsupported encoding")
In each case, you scale your amplitudes so that they use the entire range of PCM values. Additionally, for the unsigned 8-bit format, you shift that range to get rid of the negative part. Then, you clamp and convert the scaled samples to the desired binary representation. The clamping is necessary because scaling may result in values outside the allowed range.

Producing PCM values in the 24-bit encoding format is slightly more involved. As before, you can use a generator expression, or lean on
NumPy's array reshaping to help align the data correctly. Here's the first version:

Python waveio/encoding.py

from enum import IntEnum

import numpy as np

class PCMEncoding(IntEnum):
    # ...

    def encode(self, amplitudes):
        match self:
            case PCMEncoding.UNSIGNED_8:
                samples = np.round((amplitudes + 1) / 2 * self.max)
                return self._clamp(samples).astype("u1").tobytes()
            case PCMEncoding.SIGNED_16:
                samples = np.round(-self.min * amplitudes)
                return self._clamp(samples).astype("<i2").tobytes()
            case PCMEncoding.SIGNED_24:
                return b"".join(
                    round(sample).to_bytes(3, byteorder="little", signed=True)
                    for sample in self._clamp(-self.min * amplitudes)
                )
            case PCMEncoding.SIGNED_32:
                samples = np.round(-self.min * amplitudes)
                return self._clamp(samples).astype("<i4").tobytes()
            case _:
                raise TypeError("unsupported encoding")

You use another generator expression, which iterates over the scaled and clamped samples. Each sample is rounded to the nearest
integer value and converted to a bytes instance with a call to int.to_bytes(). Then, you pass your generator expression to
the .join() method of an empty bytes literal to concatenate the individual bytes into a longer sequence.
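In isolation, and with a few made-up sample values, the pattern looks like this:

```python
# Hypothetical clamped 24-bit sample values:
samples = [-1, 0, 1, 70000]

frames = b"".join(
    round(sample).to_bytes(3, byteorder="little", signed=True)
    for sample in samples
)

print(frames.hex(" "))  # three little-endian bytes per sample
```

Each sample contributes exactly three bytes, so four samples yield a twelve-byte frame buffer.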

And, here’s the optimized version of the same task based on NumPy:

Python waveio/encoding.py

from enum import IntEnum

import numpy as np

class PCMEncoding(IntEnum):
    # ...

    def encode(self, amplitudes):
        match self:
            case PCMEncoding.UNSIGNED_8:
                samples = np.round((amplitudes + 1) / 2 * self.max)
                return self._clamp(samples).astype("u1").tobytes()
            case PCMEncoding.SIGNED_16:
                samples = np.round(-self.min * amplitudes)
                return self._clamp(samples).astype("<i2").tobytes()
            case PCMEncoding.SIGNED_24:
                samples = np.round(-self.min * amplitudes)
                return (
                    self._clamp(samples)
                    .astype("<i4")
                    .view("u1")
                    .reshape(-1, 4)[:, :3]
                    .flatten()
                    .tobytes()
                )
            case PCMEncoding.SIGNED_32:
                samples = np.round(-self.min * amplitudes)
                return self._clamp(samples).astype("<i4").tobytes()
            case _:
                raise TypeError("unsupported encoding")

After scaling and clamping the input amplitudes, you make a memory view over your array of integers, reinterpreting it as a
sequence of unsigned bytes. Next, you reshape it into a matrix consisting of four columns and disregard the last column with
slicing syntax before flattening the array.
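Here's the same byte-dropping trick applied to a tiny hand-made array, so you can watch the most significant byte of each little-endian 32-bit integer disappear:

```python
import numpy as np

samples = np.array([1, -1], dtype="<i4")  # two 32-bit little-endian integers
as_bytes = samples.view("u1")             # eight unsigned bytes in total
print(as_bytes.tolist())                  # [1, 0, 0, 0, 255, 255, 255, 255]

# Keep only the three least significant bytes of each integer:
truncated = as_bytes.reshape(-1, 4)[:, :3].flatten().tobytes()
print(truncated)                          # b'\x01\x00\x00\xff\xff\xff'
```

Because the integers are little-endian, the first three bytes of each four-byte group are the least significant ones, which is exactly what a 24-bit PCM frame needs.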

Now that you can decode and encode audio samples using various bit depths, it's time to put your new skills to use.


Visualize Audio Samples as a Waveform


In this section, you'll have an opportunity to implement the metadata and reader modules in your custom waveio package.
When you combine them with the encoding module that you built previously, you'll be able to plot the waveform of the audio
persisted in a WAV file.

Encapsulate WAV File’s Metadata


Managing the multiple parameters comprising a WAV file's metadata can be cumbersome. To make your life easier, you
can group them under a common namespace by defining a custom data class like the one below:

Python waveio/metadata.py

from dataclasses import dataclass

from waveio.encoding import PCMEncoding

@dataclass(frozen=True)
class WAVMetadata:
encoding: PCMEncoding
frames_per_second: float
num_channels: int
num_frames: int | None = None

This data class is marked as frozen, which means that you won't be able to change the values of its attributes once
you create a new instance of WAVMetadata. In other words, objects of this class are immutable, which protects the metadata
from accidental modification.

@dataclass(frozen=True)
class WAVMetadata:
encoding: PCMEncoding
frames_per_second: float
num_channels: int
num_frames: int | None = None

@property
def num_seconds(self):
if self.num_frames is None:
raise ValueError("indeterminate stream of audio frames")
return self.num_frames / self.frames_per_second

By defining a property, you'll be able to access the number of seconds as if it were just another attribute of the data
class.

Knowing how the audio frames translate to seconds lets you visualize the sound in the time domain. However, before plotting the
waveform, you must load it from a WAV file first. Instead of using the wave module directly like before, you'll hide it behind
another abstraction that you'll build next.

Load All Audio Frames Eagerly


The relative simplicity of the wave module in Python makes it an accessible gateway to sound analysis and processing.
Unfortunately, by exposing the low-level intricacies of the WAV file format, it makes you solely responsible for handling the
binary data. This can quickly become overwhelming even before you get to solving more complex audio problems.

To help with that, you can build a custom adapter that will wrap the wave module, hiding the technical details of reading
WAV files. It'll let you access and interpret the underlying sound amplitudes in a more user-friendly way.

Go ahead and create another module named reader in your waveio package, and define the following class in it:

Python waveio/reader.py

import wave

from waveio.encoding import PCMEncoding
from waveio.metadata import WAVMetadata

class WAVReader:
    def __init__(self, path):
        self._wav_file = wave.open(str(path), mode="rb")
        self.metadata = WAVMetadata(
            PCMEncoding(self._wav_file.getsampwidth()),
            self._wav_file.getframerate(),
            self._wav_file.getnchannels(),
            self._wav_file.getnframes(),
        )

    def __enter__(self):
        return self

    def __exit__(self, *args, **kwargs):
        self._wav_file.close()

The class's initializer takes a path to a WAV file, opens the
corresponding file for reading in binary mode using the wave module, and instantiates WAVMetadata from the parameters found
in the file's header.

The two special methods, .__enter__() and .__exit__(), work in tandem by performing the setup and teardown actions
associated with the WAV file. They make your class a context manager, so you can instantiate it through the with statement:

Python

>>> from waveio.reader import WAVReader

>>> with WAVReader("44100_pcm08_mono.wav") as wav:
...     print(wav.metadata)
...
WAVMetadata(
encoding=<PCMEncoding.UNSIGNED_8: 1>,
frames_per_second=44100,
num_channels=1,
num_frames=132300
)

The .__enter__() method returns the newly created WAVReader instance, while .__exit__() ensures that the underlying file gets
properly closed before leaving the current block of code.

With this new class in place, you can conveniently access the WAV file's metadata along with the PCM-encoded audio frames. In turn,
this lets you convert the raw bytes into a long sequence of numeric amplitude levels.

When dealing with relatively small files, such as the recording of a bicycle bell, it's okay to eagerly load all audio frames into memory
and have them converted into a one-dimensional NumPy array of amplitudes. To do so, you can implement the following
method in your class:

Python waveio/reader.py

import wave

from waveio.encoding import PCMEncoding
from waveio.metadata import WAVMetadata

class WAVReader:
    # ...

    def _read(self, max_frames=None):
        self._wav_file.rewind()
        frames = self._wav_file.readframes(max_frames)
        return self.metadata.encoding.decode(frames)

Because calling .readframes() on a Wave_read instance moves the internal pointer forward, you call .rewind() first, so that
you'll read the file from the beginning in case you call your method more than once. Then, you decode the raw frames.

class WAVReader:
# ...

@cached_property
@reshape("rows")
def frames(self):
return self._read(self.metadata.num_frames)

@cached_property
@reshape("columns")
def channels(self):
return self.frames

You wrap the call to your internal ._read() method in a cached property named .frames, so that you pay the cost of reading
and decoding at most once, when you first access that property. The next time you access it, you'll reuse the value remembered
by the cache. Although the second property, .channels, delegates to the first one, it applies a different parameter value to your
@reshape decorator.

You can now add the following definition of the missing decorator:

Python waveio/reader.py

import wave
from functools import cached_property, wraps

from waveio.encoding import PCMEncoding


from waveio.metadata import WAVMetadata

def reshape(shape):
if shape not in ("rows", "columns"):
raise ValueError("shape must be either 'rows' or 'columns'")

def decorator(method):
@wraps(method)
def wrapper(self, *args, **kwargs):
values = method(self, *args, **kwargs)
reshaped = values.reshape(-1, self.metadata.num_channels)
return reshaped if shape == "rows" else reshaped.T
return wrapper

return decorator

# ...

It's a parameterized decorator factory, which takes a string named shape, whose value can be either 'rows' or 'columns'.
Depending on the supplied value, it reshapes the NumPy array returned by the wrapped method into rows of frames or columns of
channels. You use minus one as the size of the first dimension to let NumPy derive it from the number of audio frames.

Because this is stereo audio, each item in .frames consists of a pair of amplitudes corresponding to the left and right channels.
On the other hand, the individual .channels comprise a sequence of amplitudes for their respective speakers, so you can
process them independently if needed.
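A tiny example with hand-picked amplitudes makes the two shapes concrete:

```python
import numpy as np

# Eight interleaved stereo samples: left, right, left, right, ...
interleaved = np.array([0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6])

frames = interleaved.reshape(-1, 2)  # one row per frame: (left, right) pairs
channels = frames.T                  # one row per channel

print(frames[0])    # [0.1 0.9] -> the first frame
print(channels[0])  # [0.1 0.2 0.3 0.4] -> the left channel
```

The same reshape with minus one as the first dimension is what the @reshape decorator performs behind the scenes.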

Great! You're now ready to visualize the complete waveform of each channel in your WAV files. Before moving on, remember
to add the following lines to the __init__.py file in your waveio package:

Python waveio/__init__.py

from waveio.reader import WAVReader

__all__ = ["WAVReader"]

The import statement will let you access the WAVReader class directly from the package, skipping the intermediate reader module. The special
variable __all__ holds the list of names available in a wildcard import.


Plot a Static Waveform Using Matplotlib


In this section, you'll combine the pieces together to visually represent the waveform of a WAV file. By building on your
waveio abstractions and leveraging the Matplotlib library, you'll be able to create graphs just like this one:

 1 from argparse import ArgumentParser
 2 from pathlib import Path
 3
 4 import matplotlib.pyplot as plt
5
6 from waveio import WAVReader
7
8 def main():
9 args = parse_args()
10 with WAVReader(args.path) as wav:
11 plot(args.path.name, wav.metadata, wav.channels)
12
13 def parse_args():
14 parser = ArgumentParser(description="Plot the waveform of a WAV file")
15 parser.add_argument("path", type=Path, help="path to the WAV file")
16 return parser.parse_args()
17
18 def plot(filename, metadata, channels):
19 fig, ax = plt.subplots(
20 nrows=metadata.num_channels,
21 ncols=1,
22 figsize=(16, 9),
23 sharex=True,
24 )
25
26 if isinstance(ax, plt.Axes):
27 ax = [ax]
28
29 for i, channel in enumerate(channels):
30 ax[i].set_title(f"Channel #{i + 1}")
31 ax[i].set_yticks([-1, -0.5, 0, 0.5, 1])
32 ax[i].plot(channel)
33
34 fig.canvas.manager.set_window_title(filename)
35 plt.tight_layout()
36 plt.show()
37
38 if __name__ == "__main__":
39 main()

You use the argparse module to read your script's arguments from the command line. Currently, there's only one
positional argument, which is the path pointing to a WAV file somewhere on your disk. You use the pathlib.Path type to
represent that file path.

Following the name-main idiom, you call your main() function, which is the script's entry point, at the bottom of the file. After
parsing the command-line arguments and opening the corresponding WAV file with your WAVReader, you proceed to
plot its audio content.

The plot() function creates a separate subplot for each audio channel and draws the corresponding amplitudes on each one. To label the horizontal axis with time rather than frame indices, extend the script as follows:

 1 from argparse import ArgumentParser
 2 from pathlib import Path
 3
 4 import matplotlib.pyplot as plt
 5 import numpy as np
 6 from matplotlib.ticker import FuncFormatter
 7
8 from waveio import WAVReader
9
10 # ...
11
12 def plot(filename, metadata, channels):
13 fig, ax = plt.subplots(
14 nrows=metadata.num_channels,
15 ncols=1,
16 figsize=(16, 9),
17 sharex=True,
18 )
19
20 if isinstance(ax, plt.Axes):
21 ax = [ax]
22
23 time_formatter = FuncFormatter(format_time)
24 timeline = np.linspace(
25 start=0,
26 stop=metadata.num_seconds,
27 num=metadata.num_frames
28 )
29
30 for i, channel in enumerate(channels):
31 ax[i].set_title(f"Channel #{i + 1}")
32 ax[i].set_yticks([-1, -0.5, 0, 0.5, 1])
33 ax[i].xaxis.set_major_formatter(time_formatter)
34 ax[i].plot(timeline, channel)
35
36 fig.canvas.manager.set_window_title(filename)
37 plt.tight_layout()
38 plt.show()
39
40 def format_time(instant, _):
41 if instant < 60:
42 return f"{instant:g}s"
43 minutes, seconds = divmod(instant, 60)
44 return f"{minutes:g}m {seconds:02g}s"
45
46 if __name__ == "__main__":
47 main()

By calling NumPy's linspace() on lines 24 to 28, you calculate a timeline comprising evenly distributed time instants
measured in seconds. The number of these time instants equals the number of audio frames, letting you plot the two against each other on
line 34. As a result, the number of seconds on the horizontal axis matches the respective amplitude values on the vertical axis.
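On a toy scale, with made-up numbers, the timeline works out like this:

```python
import numpy as np

# Pretend the file holds three seconds of audio spread over twelve frames:
num_seconds, num_frames = 3.0, 12
timeline = np.linspace(start=0, stop=num_seconds, num=num_frames)

print(timeline.size)  # 12, one time instant per audio frame
print(timeline[-1])   # 3.0, the last instant lands on the total duration
```

Because the number of points equals the number of frames, each amplitude gets paired with exactly one timestamp when you pass both arrays to Matplotlib's plot().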

Additionally, you define a custom formatter function for your timeline and hook it up to the shared horizontal axis.
# ...

def plot(filename, metadata, channels):
    try:
        plt.style.use("fivethirtyeight")
    except OSError:
        pass  # Fall back to the default style

    # ...

In this case, you pick a style sheet inspired by the visualizations found on the FiveThirtyEight website, which is known for its
opinion poll analysis. If this style is unavailable, then you can fall back on Matplotlib's default style.

Go ahead and test your plotting script against mono, stereo, and potentially even surround sound files.

Once you open a larger WAV file, the amount of information will make it difficult to examine fine details. Although Matplotlib's
interface provides zooming and panning tools, it can be quicker to slice audio frames to the time range of interest before
plotting. Later, you'll use this technique to animate your visualizations!

Read a Slice of Audio Frames


If you have a particularly long audio file, then you can reduce the time it takes to load and decode the data by
skipping ahead and narrowing down the range of audio frames of interest:

# ...

def parse_args():
    parser = ArgumentParser(description="Plot the waveform of a WAV file")
    parser.add_argument("path", type=Path, help="path to the WAV file")
    parser.add_argument(
        "-s",
        "--start",
        type=float,
default=0.0,
help="start time in seconds (default: 0.0)",
)
parser.add_argument(
"-e",
"--end",
type=float,
default=None,
help="end time in seconds (default: end of file)",
)
return parser.parse_args()

# ...

When you don't specify the start time with -s or --start, your script assumes you want to begin from
the very beginning at zero seconds. On the other hand, the default value for the -e or --end argument is None, which
you'll treat as the total duration of the entire file.

Note: You can supply negative values for either or both arguments to indicate offsets from the end of the file. For
example, --start -0.5 would reveal the last half a second of the waveform.

Next, you'll want to plot the waveforms of all audio channels sliced to the specified time range. You'll encapsulate the slicing logic
inside a new method in your WAVReader class, which you can call now:

Python plot_waveform.py

# ...

def main():
args = parse_args()
with WAVReader(args.path) as wav:
plot(
args.path.name,
wav.metadata,
wav.channels_sliced(args.start, args.end)
)

# ...

You've essentially replaced a reference to the .channels property with a call to the new .channels_sliced() method, passing
your new command-line arguments or their default values to it. When you omit the --start and --end arguments, the
script should work as before, plotting the entire waveform.

As you plot a slice of audio frames within your channels, you'll also want to match the timeline with the sliced time range.

At this point, you're done with editing the plotting script. It's time to update your waveio.reader module accordingly.

To allow for reading the WAV file from an arbitrary frame index, you can take advantage of the wave.Wave_read object's
.setpos() method instead of rewinding to the file's beginning. Go ahead and replace .rewind() with .setpos(), add a new
parameter, start_frame, to your internal ._read() method, and give it a default value of None:

Python waveio/reader.py

# ...

class WAVReader:
    # ...

    @cached_property
    @reshape("rows")
    def frames(self):
        return self._read(self.metadata.num_frames, start_frame=0)

    # ...

    def _read(self, max_frames=None, start_frame=None):
        if start_frame is not None:
            self._wav_file.setpos(start_frame)
        frames = self._wav_file.readframes(max_frames)
        return self.metadata.encoding.decode(frames)

When you call ._read() with some value for this new parameter, you'll move the internal pointer to the requested frame. Otherwise,
you'll commence from the last known position, which means that the function won't rewind the pointer anymore.
Therefore, you must explicitly pass zero as the starting frame when you read all frames eagerly in the .frames property.

Now, define the .channels_sliced() method that you called before in your plotting script:

Python waveio/reader.py

# ...

class WAVReader:
    # ...

    @reshape("columns")
    def channels_sliced(self, start_seconds=0.0, end_seconds=None):
        if end_seconds is None:
            end_seconds = self.metadata.num_seconds
        frames_slice = slice(
            round(self.metadata.frames_per_second * start_seconds),
            round(self.metadata.frames_per_second * end_seconds),
        )
        frames_range = range(*frames_slice.indices(self.metadata.num_frames))
        values = self._read(len(frames_range), start_frame=frames_range.start)
        return ArraySlice(values, frames_range)

The method takes optional start and end times expressed in seconds. When you don't
specify the end one, it defaults to the total duration of the file.

Inside that method, you create a slice() object of the corresponding frame indices. To account for negative or missing values, you turn
the slice into a range() object by providing the total number of frames. Then, you use that range to read and decode the relevant
audio frames into amplitude values.
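The slice-to-range conversion hinges on slice.indices(), which resolves negative and missing endpoints against a known length:

```python
frames_slice = slice(-3, None)  # "the last three frames"

# Resolve the slice against a hypothetical total of ten frames:
frames_range = range(*frames_slice.indices(10))

print(frames_range)       # range(7, 10)
print(len(frames_range))  # 3
```

Unlike a slice, the resulting range knows its concrete start, stop, and length, which is precisely the extra information the timeline calculation needs.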

Before returning from the function, you combine the sliced amplitudes and the corresponding range of frame indices in a custom
wrapper for the NumPy array. This wrapper will behave just like a regular array, but it'll also expose the underlying range
so that you can use it to calculate the correct timeline for plotting.

This is the wrapper's implementation, which you can add to the reader module just before your WAVReader class:

Python waveio/reader.py

# ...

class ArraySlice:
    def __init__(self, values, frames_range):
        self.values = values
        self.frames_range = frames_range

    def __iter__(self):
        return iter(self.values)

    def __getattr__(self, name):
        return getattr(self.values, name)

    def reshape(self, *args, **kwargs):
        reshaped = self.values.reshape(*args, **kwargs)
        return ArraySlice(reshaped, self.frames_range)

    @property
    def T(self):
        return ArraySlice(self.values.T, self.frames_range)

# ...

Whenever you try to access one of the NumPy array's attributes on your wrapper, the .__getattr__() method delegates to the
.values field, which is the original array of amplitude levels. The .__iter__() method makes it possible to loop over the
wrapper.

Unfortunately, you also need to override the array's .reshape() method and the .T property, which your @reshape
decorator relies on. That's because they return plain NumPy arrays instead of your wrapper, effectively erasing the extra information
about the range of frame indices. You can address this problem by wrapping their results again.

For testing purposes, you can grab a large enough WAV file from an artist who goes by the name Waesto on streaming
platforms:

Hi! I'm Waesto, a music producer creating melodic music for content creators to use here on YouTube and social
media. You may use the music for free as long as I am credited in the description! (Source)

To download Waesto's music in the WAV file format, you need to subscribe to the artist's YouTube channel or follow them on
other social media. Their music is generally copyrighted but free to use as long as it's credited appropriately.

You'll find a direct download link and the credits in each video description on YouTube. For example, one of the artist's
songs, entitled Sleepless, provides a link to the corresponding WAV file hosted on the Hypeddit platform. The file contains
uncompressed PCM stereo audio sampled at 44.1 kHz, which weighs in at about forty-six megabytes.

Feel free to use any of Waesto's tracks or a completely different WAV file that appeals to you. Just remember to pick a sufficiently
large one.

Animate the Waveform Graph in Real Time


Instead of plotting a static waveform of the whole or a part of a WAV file, you can use the sliding window technique to visualize a
small segment of the audio as it plays. This will create an interesting oscilloscope effect by updating the plot in real time.

Animating the Waveform With the Oscilloscope Effect

Such a dynamic representation of sound is an example of music visualization akin to the visual effects in the classic
Winamp player.

You don't need to modify your WAVReader object, which already provides the means necessary to jump to an arbitrary place in a
WAV file. As earlier, you'll create a new script file to handle the visualization. Name the script plot_oscilloscope.py and fill it
with the source code below:

Python plot_oscilloscope.py

help="sliding window size in seconds",
)
return parser.parse_args()

def slide_window(window_seconds, wav):
    ...

def animate(filename, seconds, windows):
    ...

if __name__ == "__main__":
try:
main()
except KeyboardInterrupt:
print("Aborted")

The overall structure of the script remains analogous to the one you created in the previous section. It parses the
command-line arguments, including the path to a WAV file and an optional sliding window duration, which can be as short as a few
milliseconds. The shorter the window, the fewer amplitudes will show up on the screen. At the same time, the animation will
become smoother by refreshing each frame more often.

Note: There are two placeholder functions, animate() and slide_window(), which you'll implement in a moment.

After opening the specified WAV file for reading, you call animate() with the filename, the window's duration, and a lazily
evaluated sequence of windows obtained from slide_window(), which is a generator function:

Python plot_oscilloscope.py

import numpy as np

from waveio import WAVReader

# ...

def slide_window(window_seconds, wav):
    num_windows = round(wav.metadata.num_seconds / window_seconds)
    for i in range(num_windows):
        begin_seconds = i * window_seconds
        end_seconds = begin_seconds + window_seconds
        channels = wav.channels_sliced(begin_seconds, end_seconds)
        yield np.mean(tuple(channels), axis=0)

# ...

The generator yields consecutive windows of audio. For each
window, it determines where it begins and ends on the timeline to slice the audio frames appropriately. Each yielded window is the
average amplitude from all channels at the given time instant.

The animation part consumes your generator of windows and plots each of them with a slight delay:

Python plot_oscilloscope.py

from argparse import ArgumentParser
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np

from waveio import WAVReader

# ...

def animate(filename, seconds, windows):
    try:
        plt.style.use("dark_background")
    except OSError:
        pass  # Fall back to the default style

    fig, ax = plt.subplots(figsize=(16, 9))
    fig.canvas.manager.set_window_title(filename)
    plt.tight_layout()
    plt.box(False)

    for window in windows:
        plt.cla()
        ax.set_xticks([])
        ax.set_yticks([])
        ax.set_ylim(-1.0, 1.0)
        plt.plot(window)
        plt.pause(seconds)

# ...

First, you pick Matplotlib's theme with a dark background for a more dramatic visual effect. Then, you prepare the figure and
Axes object, remove the default border around your visualization, and iterate over the windows. On each iteration, you clear the
plot, hide the ticks, and fix the vertical scale to keep it consistent across all windows. After plotting the current window, you
pause the loop for the duration of a single window.

Here's how you can start the animation from the command line to visualize the downloaded WAV file:

Shell

Animating the Spectrogram of a WAV File

The width of each vertical bar corresponds to a range of frequencies, or a frequency band, with its height reflecting the
energy level within that band at any given moment. Frequencies increase from left to right, with lower frequencies
on the left side of the spectrum and higher frequencies toward the right.

Note: To achieve such an engaging effect in Python, you'll need to calculate the short-term Fourier transform (STFT), which is a
sequence of Discrete Fourier Transforms (DFT) computed separately for each audio segment enclosed by the sliding
window. You'll use NumPy to leverage the Fast Fourier Transform (FFT) algorithm for improved performance.

That said, calculating the FFT remains a computationally demanding task. So, if your animation ends up being choppy, then
try to reduce the number of frequency bands by decreasing the sliding window's size.

Now copy the entire source code from plot_oscilloscope.py and paste it into a new script named plot_spectrogram.py,
which you'll modify to create a new visualization of the WAV file.

Because you'll be calculating the FFT of short audio segments, you'll want to overlap adjacent segments to minimize the spectral
leakage caused by abrupt discontinuities at the edges. They introduce phantom frequencies into the spectrum that aren't present
in the actual signal. An overlap of fifty percent is a good starting point, but you can make it configurable through another
command-line argument:

Python plot_spectrogram.py

# ...

The --overlap argument's value must be an integer between zero inclusive and one hundred exclusive, denoting a
percentage. The greater the overlap, the smoother the animation will appear.

Note: Unlike the waveform, a spectrum visualization is much more resource-intensive to compute, so you may need to decrease
the window duration to as little as a few milliseconds. That's why you've adjusted the default value of the --seconds
argument.

You can now modify your slide_window() function to accept that overlap percentage as an additional parameter:

Python plot_spectrogram.py

# ...

def slide_window(window_seconds, overlap_percentage, wav):
    step_seconds = window_seconds * (1 - overlap_percentage / 100)
    num_windows = round(wav.metadata.num_seconds / step_seconds)
    for i in range(num_windows):
        begin_seconds = i * step_seconds
        end_seconds = begin_seconds + window_seconds
        channels = wav.channels_sliced(begin_seconds, end_seconds)
        yield np.mean(tuple(channels), axis=0)

# ...

Instead of moving the window by its whole duration as before, you introduce a step that can be smaller, increasing the
number of windows in total. On the other hand, when the overlap percentage is zero, you arrange the windows one after the
other without any overlap between them.
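It's worth checking the arithmetic by hand with hypothetical numbers, say a 100-millisecond window over ten seconds of audio. The `count_windows()` helper below exists only for this check:

```python
def count_windows(total_seconds, window_seconds, overlap_percentage):
    step_seconds = window_seconds * (1 - overlap_percentage / 100)
    return round(total_seconds / step_seconds)

print(count_windows(10.0, 0.1, 0))   # 100 windows, laid end to end
print(count_windows(10.0, 0.1, 50))  # 200 windows, each half-overlapping
```

Halving the step by a fifty-percent overlap doubles the number of windows, which is why a larger overlap produces a smoother animation at a higher computational cost.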

You can now pass the overlap requested at the command line to your generator function, as well as to the animation:

Python plot_spectrogram.py

2
3 def fft(windows, wav):
4 sampling_period = 1 / wav.metadata.frames_per_second
5 for window in windows:
6 frequencies = np.fft.rfftfreq(window.size, sampling_period)
7 magnitudes = np.abs(
8 np.fft.rfft(
9 (window - np.mean(window)) * np.blackman(window.size)
10 )
11 )
12 yield frequencies, magnitudes
13
14 # ...

There’s a lot going on, so it’ll help to break this function down line by line:

• Line 4 calculates the sampling period, which is the time interval between audio samples in the original recording, derived from the
frame rate.
• Line 6 uses your window size and the sampling period to determine the frequencies in the audio spectrum. The highest
frequency in the spectrum will be exactly half the frame rate due to the Nyquist–Shannon sampling theorem.
• Lines 7 to 11 find the magnitudes of each frequency determined on line 6. Before performing the FFT on the sound
wave amplitudes, you subtract the mean value, often referred to as the DC bias, which is the non-oscillating component
representing the zero-frequency term in the Fourier transform. Additionally, you apply a window function, np.blackman(), to smooth out
the window's edges even more.
• Line 12 yields a tuple comprising the frequencies and their corresponding magnitudes.
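You can sanity-check the whole recipe on a synthetic tone. With a pure 440 Hz sine wave, the strongest bin in the resulting spectrum should land at 440 Hz:

```python
import numpy as np

frames_per_second = 44100
sampling_period = 1 / frames_per_second

# One 100 ms window (4,410 samples) of a pure 440 Hz tone:
t = np.arange(4410) * sampling_period
window = np.sin(2 * np.pi * 440 * t)

frequencies = np.fft.rfftfreq(window.size, sampling_period)
magnitudes = np.abs(
    np.fft.rfft((window - np.mean(window)) * np.blackman(window.size))
)

peak = frequencies[np.argmax(magnitudes)]
print(peak)  # 440.0 (the bin spacing here is exactly 10 Hz)
```

Because 440 Hz is an exact multiple of the 10 Hz bin spacing, the peak falls precisely on one frequency bin; for arbitrary tones, the energy would spread across neighboring bins instead.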

Lastly, you must update your animation code to draw a bar plot of frequencies at each sliding window position:

Python plot_spectrogram.py

You pass the overlap percentage in order to adjust the animation's delay between subsequent frames. Then, you unpack the
frequencies and their magnitudes, plotting them as vertical bars spaced out by the desired gap. You also scale the axes
accordingly.

Run the command below to kick off the spectrogram’s animation:

Shell

$ python plot_spectrogram.py file.wav --seconds 0.001 --overlap 95

Play around with the sliding window's duration and the overlap percentage to see how they affect your visualization. Once you get
bored and are thirsty for more practical uses of the wave module in Python, you can step up your game with something more
challenging!

Record an Internet Radio Station as a WAV File


Up until now, you've been using abstractions from your waveio package to read and decode WAV files, which have
allowed you to focus on higher-level tasks. It's now time to add the missing piece of the puzzle and implement the WAVReader
type's counterpart. You'll make a lazy writer object capable of writing chunks of audio data into a WAV file.

For this task, you'll undertake a hands-on example: stream ripping an Internet radio station to a local WAV file.

Note: Always check the terms of service to see if it's legal to record a particular live stream, especially if you intend to share the recorded
content.

If you have a premium account on di.fm, then you can use your unique security token to connect to one of their streaming
servers using third-party software. That includes your custom Python script. Alternatively, you might look for a free Internet
radio station that permits recording of their broadcast for personal and non-commercial use. For example, you can browse a
list of NPR stations.

To simplify connecting to an online stream, you'll use a tiny helper class to obtain audio frames in real time. Expand the
collapsible section below to reveal the source code and instructions on using the RadioStream class:

Source code for the RadioStream class

Now, create the writer module in your waveio package and use the code below to implement the functionality of lazily
writing audio frames into a new WAV file:

Python waveio/writer.py

import wave

class WAVWriter:
    def __init__(self, metadata, path):
        self.metadata = metadata
        self._wav_file = wave.open(str(path), mode="wb")
        self._wav_file.setframerate(metadata.frames_per_second)
        self._wav_file.setnchannels(metadata.num_channels)
        self._wav_file.setsampwidth(metadata.encoding)

    def __enter__(self):
        return self

    def __exit__(self, *args, **kwargs):
        self._wav_file.close()

    def append_channels(self, channels):
        amplitudes = channels.T.reshape(-1)
        frames = self.metadata.encoding.encode(amplitudes)
        self._wav_file.writeframes(frames)

The WAVWriter class takes a WAVMetadata instance and a path to the output WAV file. It then opens the file in write-binary
mode and uses the metadata to set the appropriate header values. Note that the number of audio frames is unknown at
this stage, so instead of specifying it up front, you let the wave module update it later when the file's closed.

Just like the reader, your writer object follows the context manager protocol. When you enter a new context with the with
keyword, the new WAVWriter instance will return itself. Conversely, exiting the context will ensure that the file gets properly
closed, even if an error occurs.

After creating an instance of WAVWriter, you can add a chunk of data to your WAV file by calling .append_channels() with a two-
dimensional NumPy array of channels as an argument. The method will reshape the channels into a flat sequence of amplitude
values and encode them using the format specified in the metadata.

Remember to add the following import statement to your waveio package's __init__.py file before moving on:

Python waveio/__init__.py

from waveio.reader import WAVReader


from waveio.writer import WAVWriter

__all__ = ["WAVReader", "WAVWriter"]

This enables direct importing of the WAVWriter class from your package, bypassing the intermediate writer module.

Finally, you can connect the dots and build your very own stream ripper in Python:

Python record_stream.py

)
return parser.parse_args()

if __name__ == "__main__":
try:
main()
except KeyboardInterrupt:
print("Aborted")

You open a radio stream using the URL provided as a command-line argument and use the obtained metadata to set up a new
WAV file. Usually, the first chunk of the stream contains information such as the media container format, sampling rate,
number of channels, and bit depth. Next, you loop over the stream and append each decoded chunk of audio frames to the
file, capturing the volatile moment of a live broadcast.

Here’s an example command showing how you can record the Classic EuroDance channel:

Shell

$ RADIO_URL=http://prem2.di.fm:80/classiceurodance?your-secret-token
$ python record_stream.py "$RADIO_URL" --output ripped.wav

Note that you won't see any output while the program's running. To stop recording the chosen radio station at any point,
hit ⌃ Ctrl + C on Linux and Windows, or ⌘ Cmd + C if you're on macOS.

While you can write a WAV file in chunks now, you haven't actually implemented proper logic for reading one in chunks
before. Even though you can load a slice of audio data delimited by the given timestamps, it isn't the same as iterating over a
sequence of fixed-size chunks in a loop. You'll have the opportunity to implement such a chunk-based reading logic in
what comes next.

Widen the Stereo Field of a WAV File


In this section, you'll simultaneously read a chunk of audio frames from one WAV file and write its modified version to another
file in a lazy fashion. To do that, you're going to need to enhance your WAVReader object by adding the following method:

Python waveio/reader.py

class WAVReader:
    # ...

    @reshape("columns")
    def channels_lazy(self, max_frames=1024):
        while True:
            chunk = self._read(max_frames)
            if chunk.size == 0:
                break
            yield chunk

It's a generator method that takes an optional parameter with the maximum number of audio
frames to read.

Note: If a chunk was a regular Python list or another sequence type, then you could simplify your loop with the
Walrus operator (:=). More specifically, you could directly evaluate the chunk as the loop's continuation condition:

Python

while chunk := self._read(max_frames):

This relies on the implicit conversion of an expression to a Boolean value in a logical context. Python treats empty
collections as falsy and non-empty ones as truthy. However, that's not the case for NumPy arrays, whose authors find such
behavior ambiguous and forbid it altogether.
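You can see the ambiguity NumPy complains about firsthand:

```python
import numpy as np

chunk = np.array([0.0, 0.5])

try:
    if chunk:  # ambiguous: does "true" mean any element, or all of them?
        pass
except ValueError as error:
    print(error)  # NumPy refuses to pick an interpretation for you
```

NumPy raises a ValueError and points you toward the explicit .any() or .all() methods instead, which is why the reading loop above tests chunk.size rather than the chunk itself.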

Like most other methods and properties in this class, .channels_lazy() is decorated with @reshape to arrange the
amplitudes in a more convenient way. Unfortunately, this decorator acts on a NumPy array, whereas .channels_lazy() returns a
generator object. To make them compatible, you must update the decorator's definition by handling both cases:

Python waveio/reader.py

import wave
from functools import cached_property, wraps
from inspect import isgeneratorfunction

from waveio.encoding import PCMEncoding
from waveio.metadata import WAVMetadata

def reshape(shape):
    if shape not in ("rows", "columns"):
        raise ValueError("shape must be either 'rows' or 'columns'")

    def decorator(method):
        if isgeneratorfunction(method):
            @wraps(method)
            def wrapper(self, *args, **kwargs):
                for values in method(self, *args, **kwargs):
                    reshaped = values.reshape(-1, self.metadata.num_channels)
                    yield reshaped if shape == "rows" else reshaped.T
        else:
            @wraps(method)
            def wrapper(self, *args, **kwargs):
                values = method(self, *args, **kwargs)
reshaped = values.reshape(-1, self.metadata.num_channels)
return reshaped if shape == "rows" else reshaped.T
return wrapper

return decorator

# ...

You use the inspect module to determine if your decorator wraps a regular method or a generator method. Both wrappers do
the same thing, but the generator wrapper yields the reshaped values in each iteration, while the regular wrapper
returns them.
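The check itself boils down to one call from the standard library's inspect module:

```python
import inspect

def eager():
    return [1, 2, 3]

def lazy():
    yield from [1, 2, 3]

print(inspect.isgeneratorfunction(eager))  # False
print(inspect.isgeneratorfunction(lazy))   # True
```

Note that the check inspects the function object itself, before it's ever called, which is what lets the decorator pick the right wrapper at class-definition time.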

Lastly, you can add another property that’ll tell you whether your WAV file is a stereo one or not:

Python waveio/reader.py

# ...

class WAVReader:
# ...

@cached_property
def stereo(self):
return 2 == self.metadata.num_channels

# ...

# ...

It checks if the number of channels declared in the file’s header is equal to two.

With these changes in place, you can read your WAV files in chunks and start applying various sound effects. For example, you
can widen or narrow the stereo field of an audio file to enhance or diminish the sensation of space. One way to do that involves
converting a conventional stereo signal comprising the left and right channels into the mid and side channels.

The mid channel (M) contains the monophonic component that's common to both sides, while the side channel (S) retains the
differences between the left (L) and right (R) channels. You can convert between both representations using the following
formulas:

M = (L + R) / 2 and S = (L - R) / 2, and conversely L = M + S and R = M - S
(Conversion Between the Mid-Side and Left-Right Channels)

"--output",
dest="output_path",
required=True,
type=str,
help="path to the output WAV file",
)
parser.add_argument(
"-s",
"--strength",
type=float,
default=1.0,
help="strength (defaults to 1)",
)
return parser.parse_args()

if __name__ == "__main__":
try:
main()
except KeyboardInterrupt:
print("Aborted")

The --strength parameter is a multiplier for the side channel. Use a value greater than one to widen the stereo field or a value
between zero and one to narrow it.

Next, implement the channel conversion formulas as Python functions:

Python stereo_booster.py

from argparse import ArgumentParser

def main():
    args = parse_args()
    # ...

def convert_to_ms(left, right):
    return (left + right) / 2, (left - right) / 2

def convert_to_lr(mid, side):
    return mid + side, mid - side

# ...

Both functions take two parameters corresponding to either the left and right or the mid and side channels, and they return the
converted channels.
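A quick round trip, repeating the two helpers so the snippet runs on its own, confirms that the conversions invert each other:

```python
def convert_to_ms(left, right):
    return (left + right) / 2, (left - right) / 2

def convert_to_lr(mid, side):
    return mid + side, mid - side

# Values chosen to be exactly representable in binary floating point:
left, right = 0.75, 0.25
mid, side = convert_to_ms(left, right)
print((mid, side))               # (0.5, 0.25)
print(convert_to_lr(mid, side))  # (0.75, 0.25), the original pair

# Doubling the side channel widens the stereo field:
print(convert_to_lr(mid, side * 2))  # (1.0, 0.0)
```

Notice that a strength multiplier touches only the side component, so a mono signal, whose side channel is zero everywhere, passes through completely unchanged.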

Finally, you can open a stereo WAV file for reading, loop through its channels in chunks, and apply the conversions during
processing:

The output WAV file has the same encoding format as the input file. After converting each chunk into the mid and side channels,
you convert them back to the left and right channels while boosting the side one only.

Notice that you append the modified channels as separate arguments now, whereas your radio recorder passed a single
NumPy array of combined channels. To make the .append_channels() method work with both types of input, you must
update your WAVWriter class as follows:

Python waveio/writer.py

import wave

import numpy as np

class WAVWriter:
    # ...

    def append_channels(self, *channels):
        match channels:
            case [combined] if combined.ndim > 1:
                self.append_amplitudes(combined.T.reshape(-1))
            case _:
                self.append_amplitudes(np.dstack(channels).reshape(-1))

    def append_amplitudes(self, amplitudes):
        frames = self.metadata.encoding.encode(amplitudes)
        self._wav_file.writeframes(frames)

    # ...

You use structural pattern matching again to differentiate between a single multi-channel array and several one-dimensional arrays, reshaping them into a flat sequence of amplitudes for encoding accordingly.
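To see why both branches produce the same layout, you can compare the two reshaping strategies on a pair of toy channels. WAV frames interleave the samples of each channel, so both paths must end up with the same ordering:

```python
import numpy as np

left = np.array([1, 2, 3])
right = np.array([10, 20, 30])

# Several one-dimensional arrays: dstack() pairs the channels frame by frame.
separate = np.dstack([left, right]).reshape(-1)

# A single two-dimensional array of shape (channels, frames): transpose
# before flattening so that samples end up interleaved per frame.
combined = np.array([left, right])
print(separate.tolist())                # [1, 10, 2, 20, 3, 30]
print(combined.T.reshape(-1).tolist())  # [1, 10, 2, 20, 3, 30]
```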

Try boosting one of the sample WAV files, such as the bicycle bell sound, by a factor of five:

Shell

$ python stereo_booster.py -i Bicycle-bell.wav -o boosted.wav -s 5

Remember to choose a stereo sound, and for the best listening experience, use your external loudspeakers rather than headphones to play back the output file. Can you hear the difference?

Conclusion

Get Your Code: Click here to download the free sample code that shows you how to read and write WAV files in Python.



About Bartosz Zaczyński

Each tutorial at Real Python is created by a team of developers so that it meets our high quality standards. The team members who worked on this tutorial are:

Aldren, Brenda, Kate, and Martin


Related Tutorials:
• Playing and Recording Sound in Python
• Bytes Objects: Handling Binary Data in Python
• Fourier Transforms With scipy.fft: Python Signal Processing
• The Ultimate Guide To Speech Recognition With Python
• Iterators and Iterables in Python: Run Efficient Iterations

© 2012–2025 DevCademy Media Inc. DBA Real Python. All rights reserved.
REALPYTHON™ is a trademark of DevCademy Media Inc.
