COURSE:
DIGITAL SIGNAL PROCESSING
Instructor: Ninh Khanh Duy
CHAPTER 6:
SPEECH SIGNAL PROCESSING
Lecture 6.1: Introduction to speech signals
Lecture 6.2: Time-domain features and applications
Lecture 6.3: Frequency-domain features and applications
Duration: 6 periods
Lecture 6.1
Introduction to speech signals
! Outline:
1. Overview of speech signals
2. Basic properties of speech signals
Overview of speech signals
! Speech signals are obtained by a digital recording process
(sampling, quantizing, coding) of acoustic waves
(Example waveform of the Vietnamese utterance “Các bạn trẻ …”)
! Speech signals encode the messages of speakers, which include
linguistic information such as phonemes, sentence types, etc.
Overview of speech signals
! The acoustic wave at the mouth and nose is the output of the air flow
going from the lungs through the human vocal tract
Mechanisms of phones and voicing
(Diagram: air flow from the lungs passes the vocal cords/folds and the
resonance cavities of the vocal tract, producing phones such as /s/ and /a/)
" Speech (Output signal): include different phones and voicing
" Resonance cavities (System) ⇒ diff. phones: /a/, /m/, /s/, /z/
" Air flow after vocal cords (Input signal) ⇒ diff. voicing:
• Vocal cords vibrate: Quasi-periodic pulses ⇒ voiced phones: /a/, /m/
• Vocal cords close: Turbulence ⇒ unvoiced phones: /s/, /z/, /p/, /k/
Lecture 6.1
Introduction to speech signals
! Outline:
1. Overview of speech signals
2. Basic properties of speech signals
Basic properties of speech signals
! Randomness
" Speech (like most real-world signals) is random: impossible to
predict with certainty their future values from past values
# Deterministic signal: for each value of time we have a rule which
enables us to determine the precise value of the signal
" The value of a signal at any instant of time x(t) is a random
variable
# The actual value of a signal is only known after observation
" A signal is assumed to be generated by a random process with a
structure that can be characterized and described
Basic properties of speech signals
! Variability
" Depends on the microphone used
" Depends on the speaker (voice)
" Depends on the physical/emotional state of the same speaker
Basic properties of speech signals
! Characteristics are slowly varying in time
" Time/frequency-related features are quite stable within short
segments of 10-50 ms (roughly the duration of a phoneme)
Short-time processing technique
! Divide a signal into consecutive frames, each having a fixed duration
(e.g., 25 ms)
! Extract features frame-by-frame
! Combine the extracted features into a feature sequence (the time axis
is now the frame index); a framing sketch is given below
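A minimal sketch of this framing step, assuming the signal is a 1-D NumPy array `x` sampled at `fs` Hz; the function name `split_into_frames` and the 10 ms hop are illustrative choices, not fixed by the slides:

```python
import numpy as np

def split_into_frames(x, fs, frame_ms=25.0, hop_ms=10.0):
    """Split signal x into consecutive frames of frame_ms milliseconds,
    with successive frame starts hop_ms milliseconds apart.
    Returns an array of shape (num_frames, frame_len)."""
    frame_len = int(round(frame_ms * 1e-3 * fs))   # samples per frame
    hop = int(round(hop_ms * 1e-3 * fs))           # samples between frame starts
    num_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(num_frames)])

# Feature extraction then proceeds frame-by-frame, e.g. (some_feature is hypothetical):
# frames = split_into_frames(x, fs)
# features = np.array([some_feature(frame) for frame in frames])   # one value per frame
```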
Homework
1. Read Section 2 & 3 of “CS425 Audio and Speech Processing_Hodgkinson_2012”
2. Write a program to compute the energy and power of a recorded signal
following the formulas (2.1) & (2.2) in page 25 of the textbook
“Applied Digital Signal Processing -Theory and Practice_Manolakis-Ingle_2011”
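A possible starting point for homework 2, assuming the standard definitions of sequence energy (E = Σ|x[n]|²) and average power (P = E/N); check these against formulas (2.1) and (2.2) in the Manolakis-Ingle textbook, and note that the file name `recording.wav` is only a placeholder:

```python
import numpy as np
from scipy.io import wavfile   # assumes the recording is stored as a WAV file

def energy_and_power(x):
    """Energy E = sum |x[n]|^2 and average power P = E / N of a finite sequence."""
    x = np.asarray(x, dtype=np.float64)
    energy = np.sum(np.abs(x) ** 2)
    return energy, energy / len(x)

fs, x = wavfile.read("recording.wav")   # placeholder file name
if x.ndim > 1:                          # keep a single channel if the file is stereo
    x = x[:, 0]
E, P = energy_and_power(x)
print(f"Energy = {E:.4g}, Power = {P:.4g}")
```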
Lecture 6.2
Time-domain features and applications
! Outline:
1. Voiced/Unvoiced/Silence segmentation
2. Time-domain pitch estimation
Introduction to
Voiced/Unvoiced/Silence classification
! A recorded signal includes speech & silence regions
" Speech: regions exhibiting voice activity (phones being produced)
" Silence: regions exhibiting no phones, only environmental noise
Introduction to
Voiced/Unvoiced/Silence classification
! A speech region is divided into voiced & unvoiced segments
" Voiced: exhibits strong periodicity, resulting from vibration of the vocal folds
" Unvoiced: exhibits weak/no periodicity, resulting from vocal folds that do not vibrate
Speech/Silence discrimination
! Problem statement
" Input: a signal
" Output: the signal with vertical boundaries between speech and
silence regions
! Constraint
" The minimum length of silence region is 300ms to exclude very
short pauses when speaking
Speech/Silence discrimination
! Observation
The level of silence is usually lower than that of speech segments,
except when
" environmental noise has a level higher than that of unvoiced
fricatives (e.g., /s/)
" the recording environment has a high overall noise level (i.e., a low
Signal-to-Noise Ratio (SNR))
⇒ Use the signal level as the discrimination criterion
Speech/Silence discrimination
! Candidate attribute functions
" Short-Time Energy (STE): sum of square of the waveform values
over a finite number of samples belonging to a frame (20-25 ms)
n: frame index
m: sample index
N: frame length (samples)
Speech/Silence discrimination
! Candidate attribute functions
" Magnitude Average (MA): sum of absolute of the waveform values
over a finite number of samples belonging to a frame
n: frame index
m: sample index
N: frame length (samples)
" For practical uses, we rather use the N values centered around n,
from n−N/2 to n+N/2−1
Speech/Silence discrimination
! Candidate attribute functions
" Short-Time Energy (STE) vs. Magnitude Average (MA)
Both functions reflect the waveform envelope, but STE emphasizes large values
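A sketch of both attribute functions computed per frame, reusing the framing helper from Lecture 6.1; the function names are illustrative:

```python
import numpy as np

def short_time_energy(frame):
    """STE: sum of the squared sample values of one frame."""
    f = np.asarray(frame, dtype=np.float64)
    return np.sum(f ** 2)

def magnitude_average(frame):
    """MA: sum of the absolute sample values of one frame."""
    f = np.asarray(frame, dtype=np.float64)
    return np.sum(np.abs(f))

# frames = split_into_frames(x, fs)
# ste = np.array([short_time_energy(f) for f in frames])
# ma  = np.array([magnitude_average(f) for f in frames])
# Both contours follow the waveform envelope; the squaring makes STE react
# more strongly to large-amplitude frames than MA does.
```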
Speech/Silence discrimination
! Algorithm
" Based on some threshold of the attribute function to discriminate a
frame as speech or silence
" This threshold is to be found based on given training signals with
different environmental noise levels
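A minimal sketch of the thresholding step, assuming an STE contour `ste` (one value per frame), a threshold `thr` learned from training signals, and a 10 ms hop; relabelling short silence runs as speech is one simple way to honour the 300 ms constraint:

```python
import numpy as np

def speech_silence_decision(ste, thr, hop_ms=10.0, min_silence_ms=300.0):
    """Label each frame as speech (True) or silence (False) by thresholding STE.

    Silence runs shorter than min_silence_ms are relabelled as speech,
    so very short pauses inside speech are not split off."""
    is_speech = np.asarray(ste) > thr
    min_frames = int(round(min_silence_ms / hop_ms))
    i = 0
    while i < len(is_speech):
        if not is_speech[i]:                       # start of a silence run
            j = i
            while j < len(is_speech) and not is_speech[j]:
                j += 1
            if j - i < min_frames:                 # run too short -> keep as speech
                is_speech[i:j] = True
            i = j
        else:
            i += 1
    return is_speech
```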
Voiced/Unvoiced discrimination
! Problem statement
" Input: a signal including only speech region (assuming no silence)
" Output: the signal with vertical boundaries between voiced and
unvoiced segments
! If input signal includes some silence $ no problem because
silence is non-periodic & could be considered as unvoiced
Voiced/Unvoiced discrimination
! Same idea as the previous task
" Look for attributes that contrast strongly between the states to be
discriminated
" Set a threshold for each state based on training signals
! Difference
" Combine several features to discriminate voiced vs. unvoiced
Voiced/Unvoiced discrimination
! Discriminatory attributes and functions
" STE or MA: unvoiced segments has level generally lesser than
voiced segments
Voiced/Unvoiced discrimination
! Discriminatory attributes and functions
" Zero-Crossing Rate (ZCR): the rate at which the waveform crosses
the zero-axis
" Unvoiced segments exhibit a denser waveform, more turbulent
than voiced segments $ UV has significantly higher ZCR than V
Voiced/Unvoiced discrimination
! Discriminatory attributes and functions
" Zero-Crossing Rate (ZCR): the rate at which the waveform crosses
the zero-axis
n: frame index
m: sample index
N: frame length
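A per-frame ZCR sketch following the sign-change counting definition above (the division by 2N turns the count into a rate per sample):

```python
import numpy as np

def zero_crossing_rate(frame):
    """ZCR: fraction of adjacent sample pairs whose signs differ."""
    x = np.asarray(frame, dtype=np.float64)
    signs = np.sign(x)
    # |sgn(x[m]) - sgn(x[m-1])| equals 2 at a sign change, 0 otherwise
    # (samples that are exactly zero are rare in real recordings)
    return np.sum(np.abs(np.diff(signs))) / (2.0 * len(x))

# zcr = np.array([zero_crossing_rate(f) for f in frames])
# Unvoiced frames typically show a clearly higher ZCR than voiced frames.
```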
Voiced/Unvoiced discrimination
! Normalisation of attribute functions
" Useful when combine (e.g., adding) multiple attribute functions into
one
" Then a voicing threshold can be set for the composite function
" Otherwise, must set various thresholds for dif. attribute functions
Lecture 6.2
Time-domain features and applications
! Outline:
1. Voiced/Unvoiced/Silence discrimination
2. Time-domain pitch estimation
Pitch or Fundamental frequency (F0)
! A feature defined only for (quasi-)periodic signals (e.g., voiced segments)
! Definition
" Fundamental frequency (F0), the inverse of the fundamental period, is
the number of signal cycles per second
• For speech: F0 is the vibration frequency of the vocal cords
" Pitch is the perceptual counterpart of F0 (e.g., a high- or low-pitched
voice)
! Importance
" The pitch contour conveys the intonation of an utterance (rising/falling)
" For Vietnamese: 6 tones (ngang, huyền, ngã, hỏi, sắc, nặng)
Pitch/F0 estimation
! Problem statement
" Input: a signal (may including silence/voiced/unvoiced segments)
" Output: F0 contour of the signal (a F0 value for each frame)
! Constraint
" Valid F0 values for adult voices is from 70Hz to 450 Hz
Pitch/F0 estimation
! An example F0 contour extracted from a signal
Pitch/F0 estimation
! Two time-domain methods
" Short-Time Autocorrelation function (ACF)
" Short-Time Average Magnitude Difference Function (AMDF)
! Both are based on the following property of a periodic signal:
x(m) = x(m + N_T) for all m
N_T: pitch period/fundamental period (in samples)
! Voiced segments of speech are only quasi-periodic
⇒ exact equality never occurs (x(m) ≈ x(m + N_T))
Autocorrelation function (ACF)
! The ACF of a signal gives an indication of how similar the signal is
to shifted versions of itself
! Definition
r_x(n) = Σ_m x(m)·x(m+n)
n: lag/shift
m: sample index
! Application: for a periodic signal x, the ACF is globally
maximal at every lag that is an integer multiple of the period
" For a quasi-periodic signal ⇒ local maxima (peaks)
Autocorrelation function (ACF)
! Short-time ACF of a frame:
r(n) = Σ_{m=0}^{N-1-n} x(m)·x(m+n)
n: lag (samples)
m: sample index
N: frame length (samples)
! The ACF should be normalized to a maximum value of 1 by dividing by
the largest autocorrelation value, the one at lag zero: r(0)
! Complexity per frame: O(N²)
Short-Time Autocorrelation function (example figure: Kondoz, 2004)
Short-Time Autocorrelation function
The normalized height of the highest local peak is proportional
to the degree of voicing ⇒ it can be used for the V/U decision
Algorithm for one frame (figure: Trần Văn Tâm, 2019)
Short-Time Autocorrelation function
! Autocorrelation peak detection: F0 = fs / (lag of the highest peak),
where fs is the sampling frequency
! Reducing the scope of the search
" F0 lies between 70 Hz and 450 Hz ⇒ search only within the corresponding
lag range, from fs/450 to fs/70 samples
Short-Time Autocorrelation function
! Be careful with virtual pitch values: ACF peaks also occur at lags that are
integer multiples of the true period, and such a peak can be the highest one
" Lucky frame ⇒ the peak at the true period is picked ⇒ correct F0
" Unlucky frame ⇒ a peak at a multiple of the true period is picked ⇒
incorrect (e.g., halved) F0
Average Magnitude Difference Function
! The AMDF of a signal gives an indication of how different the signal
is from shifted versions of itself
! Definition
D_x(n) = Σ_m |x(m) - x(m+n)|
(n: lag, m: sample index)
! Application: for a periodic signal x, the AMDF is zero at every
lag that is an integer multiple of the period of the waveform
" For a quasi-periodic signal ⇒ local minima (dips)
Average Magnitude Difference Function (example with 4 frames; figure: Kondoz, 2004)
Average Magnitude Difference Function
! Short-time AMDF of a frame
D(n) = (1/N) Σ_{m=0}^{N-1-n} |x(m) - x(m+n)|
n: lag (samples)
N: frame length (samples)
! Computationally much cheaper than the ACF (no multiplications needed)
! Uses a similar algorithm and has similar problems to the ACF, but looks
for dips (minima) instead of peaks
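The corresponding AMDF-based sketch over the same lag range; here the deepest dip is taken and, unlike the ACF sketch above, no voiced/unvoiced test is included:

```python
import numpy as np

def amdf_pitch(frame, fs, f0_min=70.0, f0_max=450.0):
    """Estimate the F0 (Hz) of one frame from the deepest dip of its AMDF."""
    x = np.asarray(frame, dtype=np.float64)
    N = len(x)
    lag_min = max(1, int(fs / f0_max))
    lag_max = min(int(fs / f0_min), N - 1)
    if lag_max <= lag_min:
        return 0.0
    # Average magnitude difference for each candidate lag (no multiplications)
    d = np.array([np.mean(np.abs(x[:N - n] - x[n:]))
                  for n in range(lag_min, lag_max + 1)])
    lag = lag_min + int(np.argmin(d))        # lag of the deepest dip
    return fs / lag
```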
Homework
1. The members of each group discuss and divide up the work, stating
clearly which student does which task (no duplicates):
- 1a (speech vs. silence segmentation)
- 1b (voiced vs. unvoiced segmentation)
- 2a (F0 estimation using the autocorrelation function)
- 2b (F0 estimation using the AMDF).
Enter your task (1a/1b/2a/2b) into the group-list link.
Deadline: before next week's class.
Students who have not entered a task by this deadline are considered not
to have taken part in the group assignment and will receive 0 for the
midterm exam.
2. The group leader ticks (X) the "Nhóm trưởng" (group leader) column in the link.