Firs DSP

The document discusses digital signal processing concepts like data compression, biomedical signal processing, military applications, and human language technologies. It also provides MATLAB code examples for window functions, linear predictive coding, speech production, and filter design.

Uploaded by

ber allam
Copyright
© All Rights Reserved

DSP-1 Assignment-1

Question #1:
a) Data compression removes the statistical redundancy of a signal. It is
either lossy, discarding unnecessary or less important information to reach
higher compression ratios, or lossless, in which all the information can be
recovered at the cost of lower compression ratios. The device that
compresses the signal is the encoder, and the one that recovers it is the
decoder. For images, compression ratios of 10:1 to 50:1 can be achieved.
For video, which benefits from the sequential nature of the data, a very
high compression ratio of 2000:1 can be achieved. Image compression is very
important for image storage (educational and business documents, medical
images, weather maps and fingerprints) and image transmission (remote
sensing via satellite, military communication via radar, teleconferencing
and FAX). Video compression is used worldwide in surveillance systems,
where the goal is to identify and record only the interesting video frames
while running continuously, so high compression ratios are required.

b) DSP helps us acquire and display many bioelectrical signals such as ECG,
EMG, GSR, EOG and brain signals. It is also essential for medical diagnosis
through medical imaging such as MRI, CT scans, X-rays, tomography,
ultrasound and electron microscopy. Medical imaging constructs an image from
measurements, which provides a basic tool for disease diagnosis.

c) Military applications: in RADAR systems, a transmitted signal travels
back to the source after hitting an object, and the object's distance is
calculated from the echo. DSP is also used in sonar processing, in target
detection (for example matched filtering to detect humans in IR imagery),
in automatic target detection using optical or infrared cameras, and in
pilotless aircraft for war purposes.

d) Human language technologies are heavily influenced by DSP in many
applications, such as automatic speech recognition, optical character
recognition and text-to-speech conversion.
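
The encoder/decoder idea in part (a) can be illustrated with a minimal lossless sketch. Run-length coding is used here only as an assumed example of removing redundancy (the assignment does not name a specific scheme), and the sample vector x is made up for illustration.

```matlab
% Toy signal with obvious redundancy (hypothetical data)
x = [5 5 5 5 2 2 7 7 7 7 7 7];

% Encoder: keep each value once, together with the length of its run
isStart = [true, diff(x) ~= 0];            % true where a new run begins
vals    = x(isStart);                      % value of each run
runs    = diff([find(isStart), numel(x)+1]); % length of each run

% Decoder: expand every value back to its run length
y = repelem(vals, runs);

% Lossless check and compression ratio (input samples : stored numbers)
isLossless = isequal(x, y);
ratio = numel(x) / (numel(vals) + numel(runs));
```

Because the decoder reproduces x exactly, this is a lossless scheme; a lossy scheme would trade some reconstruction error for a higher ratio.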

Question #2:
Verification classifies the person's face and compares it with the picture
stored in the database for the claimed identity, to make sure the person is who
he claims to be, so it is a one-to-one comparison. Identification is concerned
with the person's identity (name) and searches the whole list in the database to
find out whether he exists, so it is a one-to-many comparison.
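
The difference between the two comparisons can be sketched as follows. The feature vectors, names and threshold are hypothetical toy values chosen only to show the one-to-one and one-to-many structure; a real system would use features extracted from face images.

```matlab
% Hypothetical enrolled templates, one row per person (toy numbers)
db    = [0.9 0.1 0.3; 0.2 0.8 0.5; 0.4 0.4 0.9];
names = {'A', 'B', 'C'};
probe = [0.85 0.15 0.35];   % features of the face being checked (assumed)
threshold = 0.2;            % assumed acceptance distance

% Verification: one-to-one comparison against the claimed identity only
claimed    = 1;
isVerified = norm(probe - db(claimed, :)) < threshold;

% Identification: one-to-many search over the whole database
d = sqrt(sum((db - repmat(probe, size(db, 1), 1)).^2, 2));
[dmin, idx] = min(d);
if dmin < threshold
    identity = names{idx};
else
    identity = '';          % nobody in the list is close enough
end
```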

Question #3:
Using MATLAB code shown below:
M = 64; n = 0:M-1;                        % window length and sample index
window0 = ones(1,M);                      % rectangular window
window1 = 0.5*(1-cos(2*pi*n/M));          % Hanning window
window2 = 0.54 - 0.46*cos(2*pi*n/M);      % Hamming window
figure;
plot(n,window0,'r');
hold on;                                  % keep all three curves on one axis
plot(n,window1,'b');
plot(n,window2,'y');
NFFT = 1024;                              % zero-padded FFT length
X1 = fft(window0,NFFT);
X2 = fft(window1,NFFT);
X3 = fft(window2,NFFT);

Freq = (0:NFFT-1)/NFFT;                   % normalized frequency (cycles/sample)
figure;
plot(Freq, 20*log10(abs(X1)),'r');        % magnitude in dB
hold on;
plot(Freq, 20*log10(abs(X2)),'b');
plot(Freq, 20*log10(abs(X3)),'y');

Figure 1: The three windows in the time domain

Figure 2: The three windows at M=64

Figure 3: The three windows at M=200

The red curve is the rectangular window, the blue curve is the Hanning window and
the yellow curve is the Hamming window.

The rectangular window is the worst of the three: its side lobes are the highest
and decay slowly, so its spectrum does not settle towards zero away from the main
lobe. The other two are better because most of their energy is concentrated in the
main lobe and their side lobes fall to much lower levels. The Hanning window seems
better than the Hamming window far from the main lobe, where its side lobes decay
faster, although the Hamming window has the lower first side lobe.
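
The comparison above can also be checked numerically. The sketch below estimates the peak side-lobe level of each window relative to its main-lobe peak; the way the first null is located (the first point where the spectrum starts rising again) is an assumption made for this illustration, not part of the original solution.

```matlab
M = 64; n = 0:M-1; NFFT = 1024;
% Rows: rectangular, Hanning, Hamming (same definitions as in the code above)
W = [ones(1,M); 0.5*(1-cos(2*pi*n/M)); 0.54-0.46*cos(2*pi*n/M)];

psl = zeros(1,3);                              % peak side-lobe level in dB
for k = 1:3
    X   = abs(fft(W(k,:), NFFT));
    XdB = 20*log10(X(1:NFFT/2)/X(1) + eps);    % normalize to the main-lobe peak
    firstNull = find(diff(XdB) > 0, 1);        % end of the main lobe
    psl(k) = max(XdB(firstNull:end));          % highest lobe beyond it
end
```

The rectangular window gives a peak side lobe of roughly -13 dB, while the two tapered windows come out tens of dB lower.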

Question #4:

How can we produce speech? Speech is produced by an air stream from the lungs, which goes
through the trachea and the oral and nasal cavities. It involves four processes: Initiation,
phonation, oro-nasal process and articulation. The initiation process is the moment when the air
is expelled from the lungs. The phonation process occurs at the larynx. The larynx has two
horizontal folds of tissue in the passage of air; they are the vocal folds. The gap between these
folds is called the glottis.
The glottis can be closed. Then, no air can pass. Or it can have a narrow opening which can
make the vocal folds vibrate producing the “voiced sounds”. Finally, it can be wide open, as in
normal breathing, and, thus, the vibration of the vocal folds is reduced, producing the “voiceless
sounds”. After it has gone through the larynx and the pharynx, the air can go into the nasal or
the oral cavity. The velum is the part responsible for that selection. Through the oro-nasal
process we can differentiate between the nasal consonants (/m/, /n/) and other sounds.
Finally, the articulation process is the most obvious one: it takes place in the mouth and it is the
process through which we can differentiate most speech sounds. In the mouth we can
distinguish between the oral cavity, which acts as a resonator, and the articulators, which can
be active or passive: upper and lower lips, upper and lower teeth, tongue (tip, blade, front, back)
and roof of the mouth (alveolar ridge, palate and velum). So, speech sounds are distinguished
from one another in terms of the place and the manner in which they are articulated.

Question #5:
Using MATLAB:

[a1; a2] = inv([r0, r1; r1, r0]) * [r1; r2] and E = r0 - a1*r1 - a2*r2

a1 = 1.111, a2 = -0.3889, E = 0.3056 (for the normalized values r0 = 1, r1 = 0.8, r2 = 0.5 that these coefficients imply)
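
A minimal sketch of this calculation is shown below. The autocorrelation values r0 = 1, r1 = 0.8 and r2 = 0.5 are an assumption: they are the normalized values that reproduce the quoted coefficients, since the original inputs are not listed here. The error is computed in its usual form E = r0 - a1*r1 - a2*r2, and the backslash operator is used instead of inv, which is the more common way to solve a linear system in MATLAB.

```matlab
% Assumed normalized autocorrelation values (not given in the text)
r0 = 1; r1 = 0.8; r2 = 0.5;

R = [r0, r1; r1, r0];          % 2x2 autocorrelation (Toeplitz) matrix
a = R \ [r1; r2];              % predictor coefficients [a1; a2]
E = r0 - a(1)*r1 - a(2)*r2;    % minimum prediction error
```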

Question #6:
A speech waveform S has the values \( S_0, S_1, \dots, S_8 = [1, 2, 3, 1, 4, 1, 2, 4, 3] \).
For a frame of 5 samples with no pre-emphasis:

a. The autocorrelation parameters can be calculated using the equation:

\[ r_i = \sum_{n=0}^{N-1-i} S_n \, S_{n+i} \]

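So

\[ r_0 = 1\cdot 1 + 2\cdot 2 + 3\cdot 3 + 1\cdot 1 + 4\cdot 4 = 31 \]
\[ r_1 = 1\cdot 2 + 2\cdot 3 + 3\cdot 1 + 1\cdot 4 = 15 \]
\[ r_2 = 1\cdot 3 + 2\cdot 1 + 3\cdot 4 = 17 \]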
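The parameters \( a_1 \) and \( a_2 \) for LPC of order 2 can be calculated from:

\[ \begin{bmatrix} r_0 & r_1 \\ r_1 & r_0 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} r_1 \\ r_2 \end{bmatrix} \]

So

\[ \begin{bmatrix} 31 & 15 \\ 15 & 31 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} 15 \\ 17 \end{bmatrix} \]

\[ \Delta = \begin{vmatrix} 31 & 15 \\ 15 & 31 \end{vmatrix} = 736, \quad \Delta_{a_1} = \begin{vmatrix} 15 & 15 \\ 17 & 31 \end{vmatrix} = 210, \quad \Delta_{a_2} = \begin{vmatrix} 31 & 15 \\ 15 & 17 \end{vmatrix} = 302 \]

\[ a_1 = \frac{\Delta_{a_1}}{\Delta} = 0.285, \quad a_2 = \frac{\Delta_{a_2}}{\Delta} = 0.4103 \]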
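b. With an overlap of two samples, the second frame will be [1, 4, 1, 2, 4].
Using the same equations as above:

\[ r_0 = 1\cdot 1 + 4\cdot 4 + 1\cdot 1 + 2\cdot 2 + 4\cdot 4 = 38 \]
\[ r_1 = 1\cdot 4 + 4\cdot 1 + 1\cdot 2 + 2\cdot 4 = 18 \]
\[ r_2 = 1\cdot 1 + 4\cdot 2 + 1\cdot 4 = 13 \]

So

\[ \begin{bmatrix} 38 & 18 \\ 18 & 38 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} 18 \\ 13 \end{bmatrix} \]

So \( a_1 = 0.4018 \), \( a_2 = 0.1518 \)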
c. With a pre-emphasis constant of 0.95:

1. The first five samples will be:
1st sample = 1, 2nd = 2 - 0.95*1 = 1.05, 3rd = 1.1, 4th = -1.85, 5th = 3.05,
so the first five samples are [1, 1.05, 1.1, -1.85, 3.05].
As before:

\[ r_0 = 16.0375, \quad r_1 = -5.4725, \quad r_2 = 2.5125 \]

So

\[ \begin{bmatrix} 16.0375 & -5.4725 \\ -5.4725 & 16.0375 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} -5.4725 \\ 2.5125 \end{bmatrix} \]

\[ a_1 = -0.3257, \quad a_2 = 0.045526 \]

2. The second five samples will be:
1st sample = 1 - 0.95*3 = -1.85, 2nd = 4 - 0.95*1 = 3.05, 3rd = -2.8, 4th = 1.05, 5th = 2.1,
so the second five samples are [-1.85, 3.05, -2.8, 1.05, 2.1].
As before:

\[ r_0 = 26.0775, \quad r_1 = -14.9175, \quad r_2 = 2.5025 \]

So

\[ \begin{bmatrix} 26.0775 & -14.9175 \\ -14.9175 & 26.0775 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} -14.9175 \\ 2.5025 \end{bmatrix} \]

So \( a_1 = -0.7687 \), \( a_2 = -0.34376 \)
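
The hand calculations in this question can be reproduced with a short script. This is a sketch under the same assumptions as the worked solution (frames of 5 samples, the second frame starting at the fourth sample, pre-emphasis y[n] = s[n] - 0.95*s[n-1] with a zero initial sample); the helper names acorr and lpc2 are introduced here for illustration only.

```matlab
s = [1 2 3 1 4 1 2 4 3];                      % speech samples S0..S8

% Autocorrelation of a frame x at lag i: sum of x(n)*x(n+i)
acorr = @(x, i) sum(x(1:end-i) .* x(1+i:end));

% Order-2 LPC coefficients [a1; a2] from the 2x2 normal equations
lpc2 = @(x) [acorr(x,0), acorr(x,1); acorr(x,1), acorr(x,0)] \ ...
            [acorr(x,1); acorr(x,2)];

a_f1 = lpc2(s(1:5));                          % part a: first frame
a_f2 = lpc2(s(4:8));                          % part b: second frame (overlap of 2)

p = filter([1 -0.95], 1, s);                  % part c: pre-emphasized signal
a_p1 = lpc2(p(1:5));                          % first pre-emphasized frame
a_p2 = lpc2(p(4:8));                          % second pre-emphasized frame
```

The four coefficient pairs agree with the values derived by hand above.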

Question #7:
The MATLAB code:
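alpha = [-2.12,2.89,-3.4,3.55,-3.13,2.25,-1.2,0.47];  % predictor coefficients
fs = 16000; Ts = 1/fs;                                % sampling rate and period

z = tf('z',Ts);            % discrete-time variable (Control System Toolbox)
D = 1;
for n = 1:8
    D = D + alpha(n)*(z^(-n));   % D(z) = 1 + sum of alpha(n)*z^-n
end
G = 10;                    % gain
H = G/D;                   % all-pole transfer function
bode(H);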

Figure 4: Bode plot of the transfer function of Question #7
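
As a cross-check that does not need the Control System Toolbox, the same magnitude response can be evaluated directly on the unit circle with an FFT. This is a sketch; the FFT length NFFT = 1024 is an assumption chosen for a smooth curve.

```matlab
alpha = [-2.12, 2.89, -3.4, 3.55, -3.13, 2.25, -1.2, 0.47];
fs = 16000; G = 10;
NFFT = 1024;                           % assumed frequency resolution

% D(e^{jw}) = 1 + sum_k alpha(k)*e^{-jwk} is the DFT of [1, alpha]
D = fft([1, alpha], NFFT);
H = G ./ D(1:NFFT/2);                  % response from 0 up to fs/2

f     = (0:NFFT/2-1) * fs / NFFT;      % frequency axis in Hz
magdB = 20*log10(abs(H));              % magnitude in dB

% To view the result: plot(f, magdB); grid on;
```

Note that this frequency axis is in Hz, whereas bode uses rad/s by default.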

Question #8:
The MATLAB code:
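% Denominator coefficients, listed from the constant term up to the leading 1
Dem = [0.1483 ,-0.1158 , -0.016 , -0.0706, 0.5867, -0.9232,...
0.5018, -0.444,1.2491,-1.8779, 1];
Dem = fliplr(Dem);         % tf expects descending powers of z (leading 1 first)
Num = zeros(1,11);
Num(1,1) = 1;              % numerator z^10, so H(z) = 1/A(z)
H = tf(Num,Dem , 1/16000);
bode(H);
grid on;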

1st formant at 1.7k rad/s (about 0.27 kHz) with bandwidth = 0.62k rad/s
2nd formant at 12k rad/s (about 1.9 kHz) with bandwidth = 4.9k rad/s
3rd formant at 18k rad/s (about 2.9 kHz) with bandwidth = 3.5k rad/s
4th formant at 35.7k rad/s (about 5.7 kHz) with bandwidth = 5.3k rad/s

Figure 5: The first and second formants, Question #8

Figure 6: The third and fourth formants, Question #8
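
The resonances can also be located from the poles of the filter rather than read off the plot. The sketch below assumes the coefficient list in the code above runs from the constant term up to the leading 1, so the polynomial in descending powers of z starts with 1, -1.8779, and so on; the variable names are introduced here for illustration.

```matlab
% Denominator in descending powers of z (assumed ordering, see lead-in)
A = [1, -1.8779, 1.2491, -0.444, 0.5018, -0.9232, ...
     0.5867, -0.0706, -0.016, -0.1158, 0.1483];

pAll = roots(A);                 % all poles of H(z) = 1/A(z)
p    = pAll(imag(pAll) > 0);     % keep one pole from each conjugate pair
[~, order] = sort(angle(p));     % sort by increasing angle
p     = p(order);
theta = angle(p);                % pole angles in rad/sample
r     = abs(p);                  % pole radii
w     = theta * 16000;           % resonance frequencies in rad/s
```

Each pole angle, multiplied by the sampling frequency, gives a candidate resonance frequency that can be compared with the peaks of the Bode plot.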

Question #9:
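For a system with a conjugate pole pair at r*e^(±j*theta), the equivalent continuous-time
poles are s = (ln(r) ± j*theta)*ws, where ws is the sampling frequency. The response has a
peak at wn*sqrt(1-2*zeta^2), with wn = ws*sqrt(theta^2 + ln(r)^2) and
zeta = -ln(r)/sqrt(theta^2 + ln(r)^2), which is approximately theta*ws for r close to 1.
The bandwidth is -2*ln(r)*ws, which is approximately 2*(1-r)*ws.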
- For the pole at 0.9739±0.1051j = 0.979 e^(±i*0.107): w1 = 0.107*ws rad/sec
and B = -2*ln(0.979) = 0.0424*ws rad/sec

- For the pole at 0.6432±0.6121j = 0.8879 e^(±i*0.7606): w2 = 0.7606*ws rad/sec
and B = 2*(1-0.8879) = 0.2242*ws rad/sec

- For the pole at 0.3493±0.744j = 0.8219 e^(±i*1.1318): w3 = 1.1318*ws rad/sec
and B = 2*(1-0.8219) = 0.3562*ws rad/sec

- For the pole at -0.4768±0.6397j = 0.7978 e^(±i*2.2113) (the real part is negative, so the
angle is pi - 0.9303): w4 = 2.2113*ws rad/sec
and B = 2*(1-0.7978) = 0.4044*ws rad/sec

- For the pole at -0.5508±0.3904j = 0.6751 e^(±i*2.5251) (the real part is negative, so the
angle is pi - 0.6165): w5 = 2.5251*ws rad/sec
and B = 2*(1-0.6751) = 0.6498*ws rad/sec

- So overall the system has resonances at 0.107, 0.7606, 1.1318, 2.2113 and 2.5251,
with bandwidths of 0.0424, 0.2242, 0.3562, 0.4044 and 0.6498 respectively.

Formant frequency (rad/sec)        Bandwidth (rad/sec)

0.107*16000 = 1.712k               0.0424*16000 = 0.678k
0.7606*16000 = 12.169k             0.2242*16000 = 3.587k
1.1318*16000 = 18.109k             0.3562*16000 = 5.699k
2.2113*16000 = 35.381k             0.4044*16000 = 6.470k
2.5251*16000 = 40.402k             0.6498*16000 = 10.397k

After a lot of searching for the general rule for calculating formants from the conjugate
poles of a system, we found it stated that a z-plane conjugate pole pair must have r
greater than 0.7 in order to be represented as a formant, so the fifth pole pair, with
r = 0.6751, does not represent a formant. The remaining four pole pairs match the formants
read from the Bode plot in the previous question, including the one near 35k.
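
The pole-to-formant conversion can be sketched in a few lines. The sketch uses angle, which returns the four-quadrant angle and therefore handles poles with a negative real part, and the exact bandwidth -2*ln(r)*fs rather than the approximation 2*(1-r)*fs; the variable names and the conversion to Hz are additions for illustration, and the 0.7 radius threshold is the rule of thumb quoted above.

```matlab
fs = 16000;                                    % sampling frequency
p  = [ 0.9739+0.1051j;  0.6432+0.6121j;  0.3493+0.744j; ...
      -0.4768+0.6397j; -0.5508+0.3904j];       % one pole of each pair

r     = abs(p);               % pole radius
theta = angle(p);             % pole angle in rad/sample (0..pi)
w     = theta * fs;           % resonance frequency in rad/s
B     = -2*log(r) * fs;       % bandwidth in rad/s
F_Hz  = w / (2*pi);           % the same frequencies in Hz
B_Hz  = B / (2*pi);           % the same bandwidths in Hz

isFormant = r > 0.7;          % rule of thumb from the text above
```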

Question #10:

I. A phoneme is a unit of sound that distinguishes one word from another in a
particular language.
II. A vowel is a syllabic speech sound pronounced without any stricture in
the vocal tract.
III. A consonant is a speech sound that is articulated with complete or partial
closure of the vocal tract.
IV. A voiced phoneme is one produced while the vocal folds vibrate; in
phonetics the term is used mainly to describe phones, which are
particular speech sounds.
An unvoiced phoneme is one pronounced without the vocal folds in the
larynx vibrating.
V. Fricatives are consonants produced by forcing air through a narrow channel
made by placing two articulators close together. These may be the lower lip
against the upper teeth, in the case of [f]; the back of the tongue against
the soft palate, in the case of German [x] (the final consonant of Bach); or
the side of the tongue against the molars, in the case of Welsh [ɬ] (appearing
twice in the name Llanelli). This turbulent airflow is called frication.
VI. In phonetics, a plosive, also known as an occlusive or simply a stop, is
a pulmonic consonant in which the vocal tract is blocked so that
all airflow ceases.
VII. Voiced fricative >> ‘v’ as in “voting” and ‘z’ as in “zoo”.
VIII. Unvoiced fricative >> ‘f’ as in “fact” and ‘s’ as in “sitter”.
IX. A nasal phoneme >> ‘n’ as in “nick” and ‘m’ as in “man”.
X. A voiced stop phoneme >> ‘b’, ‘d’ and ‘g’.

Question #11:

At first, the outer ear catches the sound wave, which then travels through a narrow
passageway called the ear canal. The sound wave reaches the eardrum, a membrane
roughly half the size of a dime, and makes it vibrate, which in turn vibrates three tiny bones
called the malleus, incus and stapes.

These bones amplify the sound vibrations and send them to the cochlea. The
cochlea is filled with fluid, and the sound vibrations make this fluid ripple, creating
waves. Hair-like structures called stereocilia sit on top of hair cells and are grouped together as
hair-cell bundles inside the cochlea.

The hair cells inside the cochlea ride these waves and the hair bundles are moved, which
turns the movement into an electrical signal. As the hair bundles move, ions rush
into the top of the hair cells, causing the release of chemicals at the bottom of the hair cells.
The chemicals bind to the auditory nerve cells and create an electrical signal, which travels
along the auditory nerve to the brain.

Different hair cells respond to different frequencies of sound, from 20 Hz to 20 kHz:
the hair cells at the base of the cochlea detect higher-pitched sounds, the hair cells towards the
top of the spiral detect progressively lower-pitched sounds, and the lowest-pitched sounds are
detected at the top of the cochlea.

Figure 7: Frequency mapping along the cochlea

Figure 8: The components of the ear

Question #12:

An overview of the text grid

Locating the phonemes of the word “zero”
