
Linear Predictive Coding in Speech Analysis

The project report discusses Linear Predictive Coding (LPC) as a technique for analyzing speech signals, highlighting its effectiveness in estimating speech parameters and its applications in speech recognition. The report details the implementation of LPC using Python's librosa library, including an algorithm for processing audio signals and generating synthetic speech. It concludes with results from various audio tests and notes limitations related to the processing of certain audio inputs.


CS571 Project Report

Linear Predictive Coding

Rajesh R
S21005 - MS Scholar
23 November 2021

Linear Predictive Coding

Linear predictive analysis of speech signals is one of the most powerful speech analysis techniques. It has become the predominant technique for estimating the parameters of the discrete-time model of speech production (i.e., pitch, formants, short-time spectra, vocal tract area functions) and is widely used for representing speech in low-bit-rate transmission or storage and for automatic speech and speaker recognition. The importance of this method lies both in its ability to provide accurate estimates of the speech parameters and in its relative ease of computation.¹ [1]

Basic Idea & Principle

In linear predictive analysis (LPA), a speech sample can be approximated as a linear combination of past speech samples. By minimising the sum of squared differences, over a finite interval, between the actual speech samples and the linearly predicted ones, a unique set of predictor coefficients can be determined.

H(z) = S(z)/U(z) = G / (1 - \sum_{k=1}^{p} a_k z^{-k})

[1]

¹ Text extracted from Rabiner, Lawrence R. and Schafer, Ronald W., Theory and Applications of Digital Speech Processing, Pearson/Prentice Hall.

Speech samples s(n) are related to the excitation u(n) by the simple difference equation

s(n) = \sum_{k=1}^{p} a_k s(n - k) + Gu(n)

Between the pitch pulses Gu(n) is zero, so s(n) can be predicted from a linearly weighted summation of past samples. However, if Gu(n) is included, then we can predict s(n) only approximately. Setting Gu(n) = 0 between the pulses gives

s(n) = a_1 s(n - 1) + a_2 s(n - 2) + ... + a_p s(n - p)

That is, the nth sample can be viewed as a linear combination of the p past samples.
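As a toy illustration of this relation, the nth sample of a short synthetic sequence can be predicted exactly from its p = 2 predecessors. The coefficients and samples below are hand-picked for the example, not estimated from real speech:

```python
import numpy as np

# Hypothetical predictor coefficients a_k for p = 2 (illustrative only).
a = np.array([1.3, -0.6])
# A short synthetic "signal" constructed so the recursion holds exactly:
# s(3) = 1.3*s(2) - 0.6*s(1) = 1.3*1.3 - 0.6*1.0 = 1.09
s = np.array([0.0, 1.0, 1.3, 1.09])

n = 3
# Predict s(n) as a weighted sum of the p past samples.
s_pred = sum(a[k] * s[n - 1 - k] for k in range(len(a)))
# s_pred == s[3] because the signal was built from the same recursion.
```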

Inverse Filtering

The aim is to determine the all-zero filter which is the inverse of the vocal tract model: inverse filtering of s(n) gives the error signal e(n), and direct filtering of the error signal e(n) reconstructs the original speech signal s(n).

[1]

The energy of the error signal, \sum_n e^2(n), is the quantity to be minimised. Note that the inverse of an all-pole filter is an all-zero filter.

Linear Prediction Model


The predicted sample is

\hat{s}(n) = \sum_{k=1}^{p} \alpha_k s(n - k)

where the \alpha_k are the predictor coefficients. The error between the actual signal s(n) and the predicted value \hat{s}(n) is given by

e(n) = s(n) - \hat{s}(n) = s(n) - \sum_{k=1}^{p} \alpha_k s(n - k)
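The coefficients that minimise the summed squared error can be obtained from the normal equations of this least-squares problem. The following sketch (not the report's code) uses the autocorrelation method with plain numpy; the 2nd-order test signal and its coefficients are invented for the example:

```python
import numpy as np

def lpc_autocorr(s, p):
    """Estimate predictor coefficients alpha_k by the autocorrelation
    method, i.e. solve the normal equations R alpha = r that arise from
    minimising sum_n e^2(n)."""
    # Autocorrelation values r[0..p]
    r = np.array([np.dot(s[:len(s) - k], s[k:]) for k in range(p + 1)])
    # Toeplitz matrix built from r[0..p-1]
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:])

# Example: a signal generated by a known 2nd-order recursion
# s(n) = 1.3 s(n-1) - 0.6 s(n-2) + u(n) should yield alpha ~ [1.3, -0.6].
rng = np.random.default_rng(0)
u = rng.standard_normal(4096)
s = np.zeros_like(u)
for n in range(2, len(u)):
    s[n] = 1.3 * s[n - 1] - 0.6 * s[n - 2] + u[n]

alpha = lpc_autocorr(s, 2)   # close to the true coefficients [1.3, -0.6]
```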

[1]

The figure² shows the linear predictor model for reconstructing the original signal. From the block diagram,

H(z) = S(z)/U(z) = G / (1 - \sum_{k=1}^{p} a_k z^{-k})   (vocal tract, all-pole IIR filter)

E(z)/S(z) = 1 - \sum_{k=1}^{p} a_k z^{-k}   (linear predictor, FIR filter)

E(z)/U(z) = G (1 - \sum_{k=1}^{p} a_k z^{-k}) / (1 - \sum_{k=1}^{p} a_k z^{-k}) = G

so E(z) = G U(z) and therefore e(n) = Gu(n). Ideally we need a technique to produce the coefficients \alpha_k of the speech production model. If we determine the correct coefficients, then the error signal is e(n) = Gu(n) and the linear predictor is called an inverse filter. The transfer function of the inverse filter is given by

A(z) = E(z)/S(z) = 1 - \sum_{k=1}^{p} a_k z^{-k}
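As a quick sketch of this inverse-filter/direct-filter pair (with hypothetical coefficients and a random stand-in frame, using scipy.signal.lfilter rather than the report's code): applying A(z) to s(n) yields the residual e(n), and filtering e(n) through 1/A(z) reconstructs s(n) exactly.

```python
import numpy as np
from scipy.signal import lfilter

a = np.array([1.3, -0.6])           # hypothetical predictor coefficients a_k
A = np.concatenate(([1.0], -a))     # A(z) = 1 - 1.3 z^-1 + 0.6 z^-2

rng = np.random.default_rng(1)
s = rng.standard_normal(256)        # stand-in for a speech frame

e = lfilter(A, [1.0], s)            # inverse filtering: E(z) = A(z) S(z)
s_rec = lfilter([1.0], A, e)        # direct filtering through 1/A(z)
# With zero initial conditions, s_rec reproduces s to numerical precision.
```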

A by-product of the LPC analysis is the generation of the error signal e(n), which is a good approximation to the excitation source. For voiced speech, the prediction error e(n) is expected to be large at the beginning of each pitch period. Thus the pitch period can be determined by detecting the positions of the samples of e(n) which are large, and defining the period as the difference between pairs of samples of e(n) which exceed a reasonable threshold. Alternatively, the pitch period can be determined by performing an autocorrelation analysis on e(n) and detecting the largest peak in the appropriate range. Another way of interpreting why the error signal is valuable for pitch detection is the observation that its spectrum is approximately flat; the effects of the formants have been eliminated in the error signal.

² Figure taken from Rabiner, Lawrence R. and Schafer, Ronald W., Theory and Applications of Digital Speech Processing, Pearson/Prentice Hall.
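A minimal sketch of the autocorrelation approach to pitch estimation from the residual. The lag bounds (20-200 samples, roughly 40-400 Hz at an assumed 8 kHz sampling rate) and the impulse-train test signal are assumptions for illustration:

```python
import numpy as np

def pitch_from_residual(e, lag_min=20, lag_max=200):
    """Return the lag of the largest autocorrelation peak of e(n)
    in a plausible pitch-period range."""
    # Autocorrelation for non-negative lags
    r = np.correlate(e, e, mode="full")[len(e) - 1:]
    return lag_min + int(np.argmax(r[lag_min:lag_max]))

# Example: an impulse train with period 80 samples mimics a voiced residual,
# so the detected pitch period should be 80.
period = 80
e = np.zeros(800)
e[::period] = 1.0
```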
In conclusion, except for a sample at the beginning of each pitch period, every sample of the voiced speech waveform can be predicted from the past p samples.

For voiced speech, e(n) consists of a train of impulses: it is close to zero most of the time, except at the beginning of each pitch period. The prediction is not valid at the instants where the input pulses occur. The prediction error e(n) = Gu(n) is a scaled pulse train for voiced speech frames and scaled random noise for unvoiced speech frames. Because of the time-varying nature of speech, the predictor coefficients should be estimated from short segments of the speech signal.

Implementation of LPC

The goals are to represent the spectral envelope of a digital speech signal in compressed form and to encode good-quality speech at a low bit rate.

We used short-time processing of speech signals: the signal is segregated into frames and LPC analysis is applied using the inbuilt Python module [Link](), and the LPC spectrum is plotted together with the DFT spectrum. From the LPC coefficients we obtain the inverse filter; using this inverse filter we get the residual signal e(n). Passing the residual signal through the LPC filter yields the synthetic speech frames, and finally the compressed signal is reconstructed by the overlap-add method.

Algorithm

1. Import the required packages
2. input_speech, fs = [Link]([Link])
3. window = [Link](fs, False)
4. function frameblocks(signal, window, o=0.5):
       # splits the input speech into frames
       return frames
5. for each frame:
       dft_frame = numpy.fft.fft(frame)
       plot numpy.log10(abs(dft_frame))
       lpc_coeff = librosa.lpc(frame, order)   # returns a vector of length order+1
       w, h = [Link](lpc_coeff, 1)
       plot(w, numpy.log10(abs(h)))
       poles = 1
       residual = signal.lfilter(lpc_coeff, poles, frame)    # inverse filter
       synthetic = signal.lfilter([1], lpc_coeff, residual)  # LPC filter
       return all synthetic frames
6. function add_frame_blocks(signal, window, o=0.5):
       # recreates the signal by overlap-add
       return output_signal
7. [Link](output_signal, fs)
8. end
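The loop above can be sketched as follows. The report uses librosa.lpc; here a small autocorrelation-based estimator stands in for it so the sketch depends only on numpy and scipy, and the frame length, hop, and order are assumed values, not the report's settings:

```python
import numpy as np
from scipy.signal import lfilter

def lpc(frame, order):
    """Stand-in for librosa.lpc: autocorrelation method, returning the
    inverse-filter polynomial A(z) of length order+1 (leading 1)."""
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:])
                  for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    return np.concatenate(([1.0], -a))

def analyse_synthesise(x, frame_len=320, order=12):
    """Frame the signal, inverse-filter each frame to get the residual,
    re-synthesise through the LPC filter, and overlap-add the frames."""
    hop = frame_len // 2                  # 50% overlap
    win = np.hanning(frame_len)
    y = np.zeros(len(x))
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len] * win
        A = lpc(frame, order)
        residual = lfilter(A, [1.0], frame)        # inverse filter
        synthetic = lfilter([1.0], A, residual)    # LPC (all-pole) filter
        y[start:start + frame_len] += synthetic    # overlap-add
    return y

rng = np.random.default_rng(2)
x = rng.standard_normal(3200)            # stand-in for an audio signal
y = analyse_synthesise(x)                # interior samples closely match x
```

Because each frame is inverse-filtered and then re-filtered with the same A(z) from zero initial conditions, each synthetic frame equals the windowed frame exactly, so the overlap-add output matches the input away from the edges.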

Results & Analysis

(1) Tried with the human voice audio [Link] and here are the results:

Fig (1) The LPC Envelope & DFT of frame 12. Fig (2) The LPC Envelope & DFT of frame 15

(2) Tried with a band music audio file [Link] and the results are:

Fig (1) The LPC Envelope & DFT of frame 15. Fig (2) The LPC Envelope & DFT of frame 100

(3) Tried with a sinusoidally generated audio file and the results are:

Fig (1) The LPC Envelope & DFT of frame 10. Fig (2) The LPC Envelope & DFT of frame 100

Conclusion

I have created a simple, user-friendly UI to perform all these tasks; the user manual is attached. The code works with many different types of audio signals. Through this project I learned to write code myself to implement LPC and to perform LPA to estimate speech parameters, and I also managed to create a UI.

Since I used the librosa module to estimate the LPC coefficients, the project inherits a limitation of that module: it fails on audio that leads to an ill-conditioned system, so such audio inputs are not supported. This limitation could be handled with more robust computational techniques.

GitHub Link
Have a look at the source code, user manual, results obtained, and the various resources used in the project:
[Link]

References

1. Rabiner, Lawrence R. and Schafer, Ronald W., Theory and Applications of Digital Speech Processing, Pearson/Prentice Hall.

