Unit V Digital Signal Processors
Please read this disclaimer before proceeding:
This document is confidential and intended solely for the educational purpose of
RMK Group of Educational Institutions. If you have received this document
through email in error, please notify the system manager. This document
contains proprietary information and is intended only to the respective group /
learning community as intended. If you are not the addressee you should not
disseminate, distribute or copy through e-mail. Please notify the sender
immediately by e-mail if you have received this document by mistake and delete
this document from your system. If you are not the intended recipient you are
notified that disclosing, copying, distributing or taking any action in reliance on
the contents of this information is strictly prohibited.
R.M.K. ENGINEERING COLLEGE
Date : 1.10.2024
Table of Contents
S.No Contents
1 Course Objectives
2 Pre-Requisites
3 Syllabus
4 Course outcomes
6.1 Lecture Plan
6.2 Activity based learning
6.3 Lecture Notes
  ➢ General DSP Architecture
  ➢ Fixed Vs Floating
  ➢ Addressing Modes
  ➢ Programming
  ➢ Circular Buffering
6.4 Assignments
6.5 Part A Q & A
6.6 Part B Qs
6.7 Supportive online Certification courses
6.8 Real time Applications in day-to-day life and to Industry
6.9 Contents beyond the Syllabus
7 Assessment Schedule
1. COURSE OBJECTIVES
OBJECTIVES:
▪ To examine LTI systems using the Z-transform.
▪ To describe the characteristics of IIR filters and design IIR filters for given
specifications.
▪ To familiarize students with the different design methods available for FIR filters
and their realization structures.
2. PRE-REQUISITES
This course builds on the Laplace transform and the Z-transform and their use in
solving differential and difference equations.
3. SYLLABUS
Discrete Fourier transform (DFT) and its properties: periodicity, symmetry and
circular convolution. FFT algorithms: radix-2 DIT FFT, radix-2 DIF FFT. Overlap-save
and overlap-add methods.
Analog filters: Butterworth filters, Chebyshev Type I filters (up to 2nd order).
Transformation of analog filters into equivalent digital filters using the Bilinear
Z-Transform method. Realization structures for IIR filters: direct, cascade and
parallel forms.
Design of linear-phase FIR filters using the Fourier series and windowing methods:
rectangular, Hamming and Hanning windows. Realization structures for FIR filters:
transversal and linear-phase structures. Comparison of FIR and IIR filters.
TOTAL: 45 PERIODS
4. COURSE OUTCOMES
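5. CO-PO/PSO Mapping
PO-1 to PO-12 are Program Outcomes; PSO-1 to PSO-3 are Program Specific Outcomes.
Course Outcome | Level of CO | PO-1 | PO-2 | PO-3 | PO-4 | PO-5 | PO-6 | PO-7 | PO-8 | PO-9 | PO-10 | PO-11 | PO-12 | PSO-1 | PSO-2 | PSO-3
Level of PO/PSO | | K3 | K4 | K4 | K5 | K3, K5, K6 | A3 | A2 | A3 | A3 | A3 | A3 | A2 | K5 | K5 | K3
C305.1 | K3 | 3 | 3 | 2 | 2 | 2 | 1 | - | - | - | - | 1 | 1 | - | 2 | 3
C305.2 | K3 | 3 | 3 | 2 | 2 | 2 | 1 | - | - | - | - | 1 | 1 | - | 2 | 3
C305.3 | K3 | 3 | 3 | 2 | 2 | 2 | 1 | - | - | - | - | 1 | 1 | - | 2 | 3
C305.4 | K3 | 3 | 3 | 2 | 2 | 2 | 1 | - | - | - | - | 1 | 1 | - | 2 | 3
C305.5 | K3 | 2 | 3 | 3 | 2 | 2 | 1 | - | - | - | - | 1 | 1 | - | 1 | 2
C305.6 | K2 | 2 | 3 | 3 | 2 | 2 | 1 | - | - | - | - | 1 | 1 | - | 1 | 2
C305 | | 3 | 3 | 2 | 2 | 2 | 1 | - | - | - | - | 1 | 1 | - | 2 | 3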
UNIT-5
DIGITAL SIGNAL PROCESSORS
6.1 LECTURE PLAN
UNIT V – DIGITAL SIGNAL PROCESSORS
S.No | Topic | No. of Periods | Proposed Date | Actual Date | Pertaining CO | Taxonomy Level | Mode of Delivery | Reason for Deviation
1 | Types of Digital Signal Processors and Applications | 1 | | | CO6 | K2 | PPT |
2 | Functionalities of Digital Signal Processors | 1 | | | CO6 | K2 | PPT |
3 | MAC & Circular Buffering | 1 | | | CO6 | K2 | PPT |
4 | Architecture of TMS320C5x | 1 | | | CO6 | K2 | PPT |
5 | Architecture of TMS320C54x | 1 | | | CO6 | K2 | PPT |
6 | VLIW Architecture | 1 | | | CO6 | K2 | PPT |
8 | Instruction Sets - Introduction | 1 | | | CO6 | K2 | PPT |
9 | Programming - Introduction | 1 | | | CO6 | K3 | PPT |
6.2 ACTIVITY BASED LEARNING TO EXPERIENCE
IMAGE PROCESSING – APPLICATION OF DSP
Take an 8½ × 11 inch piece of paper, fold it and cut it as shown below.
Now have a partner hold the paper from the top. Hold your fingers apart on either side
of the paper, about two inches below your partner’s fingers. Ask your partner to let go
of the paper at a random moment. Are you able to catch the paper with your fingers
before it passes them?
If the answer is no, here is why! Your brain uses image processing to determine when
your partner let go of the paper and then sends signals to your hand so that your
fingers catch it. However, there is a delay between the moment your brain sends the
signal and the moment your muscles act, and that delay is long enough for the paper
to slip past your fingers. The whole process, from “seeing” the paper fall to
“acting”, takes time, and there is a minimum time that we cannot reduce. The event is
faster than we can process; we would need a shorter processing time to catch the paper.
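A rough free-fall estimate shows why the paper usually wins. Assuming the paper starts from rest, air resistance is ignored, and it must fall about d = 2 inches (0.0508 m) before it is clear of your fingers, the time available is:

```latex
t = \sqrt{\frac{2d}{g}} = \sqrt{\frac{2 \times 0.0508\ \mathrm{m}}{9.81\ \mathrm{m/s^2}}} \approx 0.10\ \mathrm{s}
```

Human visual reaction time is commonly quoted at around 0.2 s, roughly twice this figure, so the paper has usually passed before the fingers close.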
6.3 Lecture Notes
Unit – 5 Digital Signal Processors
UNIT-5 DIGITAL SIGNAL PROCESSORS
What is a DSP?
5.1 GENERAL DSP ARCHITECTURE
DSP Blocks
The internal hardware of a digital signal processor consists of many
blocks:
1. CPU
2. Arithmetic Logic Unit (ALU)
3. Accumulators
4. Barrel shifter
5. Multiplier unit
6. Compare Select and Store Unit ( CSSU )
7. Memory cache
8. DMA controller
Von Neumann Architecture
• Von Neumann architecture contains a single memory and a single bus for
transferring data into and out of the central processing unit (CPU).
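• Multiplying two numbers requires at least three clock cycles, one to transfer each
of the three numbers (the instruction and the two operands) over the single bus from
memory to the CPU. We don't count the time to transfer the result back to memory,
because we assume that it remains in the CPU for additional manipulation (such as the
sum of products in an FIR filter).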
• The Von Neumann design is quite satisfactory when you are content to
execute all of the required tasks in serial.
Harvard Architecture
• It has separate memories for data and program instructions, with
separate buses for each. Since the buses operate independently, program
instructions and data can be fetched at the same time, improving the
speed over the single bus design.
• This architecture increases the speed of computation as compared to
Von Neumann architecture.
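The bus argument above can be made concrete with a small Python sketch. It rests on a simplifying assumption, not on any processor's timing data: every bus transfers one word per clock cycle, and one multiply needs three words (the instruction and two operands). The function `cycles_per_mac` and the word counts are illustrative.

```python
def cycles_per_mac(words_per_bus):
    """Clock cycles for one multiply-accumulate when each bus moves one word
    per cycle. Simplified model (an assumption): the busiest bus sets the pace,
    so the cycle count is the largest number of words any single bus carries.
    """
    return max(words_per_bus)


# Von Neumann: the instruction and both operands share one bus.
von_neumann = [3]
# Harvard: the instruction uses the program bus, both operands use the data bus.
harvard = [1, 2]

print("Von Neumann:", cycles_per_mac(von_neumann), "cycles per MAC")
print("Harvard:", cycles_per_mac(harvard), "cycles per MAC")
```

The model ignores pipelining and assumes both operands come from data memory. If one operand (for example a filter coefficient) travelled over the program bus while the instruction came from a cache, each bus would carry one word and the estimate would drop to one cycle.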
INSTRUCTION CACHE
• DSP algorithms generally spend most of their execution time in loops.
• The same set of program instructions will continually pass from program
memory to the CPU.
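• By including an instruction cache in the CPU, the loop instructions are fetched from
program memory only on the first pass; later passes read them from the cache, which
speeds up execution.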
I/O CONTROLLER
• The SHARC DSPs provide both serial and parallel communication ports.
• For example, at a 40 MHz clock speed, there are two serial ports that operate at
40 Mbits/second each.
• These fast I/O ports move data into and out of the processor without slowing down
execution.
KEY FEATURES OF DSPs
• Harvard architecture
• Dedicated single-cycle Multiply-Accumulate (MAC) instruction (hardware MAC
units)
• Single-Instruction Multiple-Data (SIMD) and Very Long Instruction Word (VLIW)
architectures
• Pipelining
• Saturation arithmetic
• Zero-overhead looping
• Hardware circular addressing
• Cache
• DMA
MULTIPLIER-ACCUMULATOR UNIT ( MAC )
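[Figure: MAC unit. The multiplier forms the product a_i x_i; the adder adds it to the
running sum held in the register (for example a_i x_i + a_(i-1) x_(i-1)), and the
register output feeds back into the adder, so the register finally holds Σ a_i x_i.]
Can compute a sum of n products in n cycles (one multiply-accumulate per cycle).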
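The MAC datapath can be imitated in a few lines of Python. This is a behavioural sketch only: it assumes one multiply-accumulate per cycle and uses Python integers, so it ignores the fixed accumulator width of real hardware; `mac_sum` is a hypothetical name.

```python
def mac_sum(a, x):
    """Sum of products a[i] * x[i], computed one multiply-accumulate at a time.

    Mirrors the MAC datapath: the multiplier forms a[i] * x[i], the adder adds
    it to the running sum, and the register (acc) keeps that sum between steps.
    Returns the sum and the number of MAC steps (one per product).
    """
    if len(a) != len(x):
        raise ValueError("a and x must have the same length")
    acc = 0       # register cleared before the first product
    steps = 0
    for ai, xi in zip(a, x):
        acc += ai * xi  # multiply, then accumulate
        steps += 1      # one MAC per product, so n products take n steps
    return acc, steps


print(mac_sum([1, 2, 3, 4], [5, 6, 7, 8]))  # (70, 4)
```

Here 1·5 + 2·6 + 3·7 + 4·8 = 70 is reached in four steps, one per product.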
Single Instruction - Multiple Data (SIMD)
A technique for data-level parallelism by employing a number of processing
elements working in parallel
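[Figure: SIMD. A single instruction stream drives several processing units (PU); each
PU applies the same operation to its own data elements at the same time.]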
CISC vs. RISC vs. VLIW
Pipelining
An instruction cycle consists of four phases:
1. Fetch phase, in which the instruction is fetched from the program memory
2. Decode phase, in which the instruction is decoded
3. Memory read phase, in which the operand required for the execution of the
instruction is read from the data memory
4. Execution phase, in which the instruction is executed and the result is stored in
a register or in memory
Instruction cycles of processor with no pipelining
Instruction cycles of processor with pipelining
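The gain from pipelining the four phases can be counted with a short Python sketch. It assumes an ideal pipeline: every phase takes exactly one clock cycle and there are no stalls or branches. The function names are hypothetical.

```python
def cycles_without_pipelining(n_instructions, n_phases=4):
    """Each instruction finishes all of its phases before the next one starts."""
    return n_instructions * n_phases


def cycles_with_pipelining(n_instructions, n_phases=4):
    """Ideal pipeline with no stalls: the first instruction needs n_phases
    cycles to fill the pipeline, then one instruction completes per cycle."""
    if n_instructions <= 0:
        return 0
    return n_phases + (n_instructions - 1)


for n in (1, 4, 100):
    print(n, cycles_without_pipelining(n), cycles_with_pipelining(n))
```

For 100 instructions this gives 400 cycles without pipelining and 103 with it; for long sequences the pipelined processor approaches one instruction per cycle.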
Saturation Arithmetic
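With saturation arithmetic, a result that overflows is clamped to the largest (or smallest) representable value instead of wrapping around and flipping sign. The Python sketch below contrasts the two behaviours for 16-bit two's-complement numbers; `wrap_add16` and `sat_add16` are illustrative helpers, not processor instructions.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def wrap_add16(a, b):
    """16-bit two's-complement addition that wraps around on overflow."""
    s = (a + b) & 0xFFFF                      # keep only the low 16 bits
    return s - 0x10000 if s & 0x8000 else s   # reinterpret bit 15 as the sign


def sat_add16(a, b):
    """16-bit addition that saturates: an overflow clamps to the nearest limit."""
    return max(INT16_MIN, min(INT16_MAX, a + b))


print(wrap_add16(30000, 10000))  # -25536: wrapped, the sign has flipped
print(sat_add16(30000, 10000))   # 32767: clamped to the maximum
```

In a filter, the wrapped result turns a large positive signal into a large negative one, while the saturated result only clips it, which is far less damaging.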
Zero-Overhead Looping
Hardware support for loops with a constant number of iterations, using hardware loop
counters and loop buffers:
• No branching
• No loop overhead
• No pipeline stalls or branch prediction
• No need for loop unrolling
Hardware Circular Addressing
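• A circular buffer is a data structure implementing a fixed-length queue of
fixed-size objects, where objects are added at the head of the queue while items are
removed from the tail of the queue.
• With hardware circular addressing, the address generator wraps the pointer from the
end of the buffer back to its start automatically, so no extra instructions are needed.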
• Requires at least 2 pointers (head and tail)
• Extensively used in digital filtering
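[Figure: circular buffer holding X[n], X[n-1], X[n-2], X[n-3], with the head pointer
at the newest sample, shown before and after one cycle (Cycle 1).]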
Direct Memory Access (DMA)
DMA is the feature that allows peripherals to access main memory without the
intervention of the CPU.
Typically, the CPU initiates a DMA transfer, does other operations while the transfer
is in progress, and receives an interrupt from the DMA controller once the
operation is complete.
Can create cache coherency problems (the data in the cache may be different
from the data in the external memory after DMA)
Requires a DMA controller
DSP vs Microcontroller
FIXED POINT VS FLOATING POINT
• Fixed point arithmetic is much faster than floating point in general-purpose
computers. However, with DSPs the speed is about the same, a result of the
hardware being highly optimized for math operations. The internal architecture of
a floating point DSP is more complicated than that of a fixed point device. All the
registers and data buses must be 32 bits wide instead of only 16; the multiplier
and ALU must be able to perform floating point arithmetic quickly; the instruction
set must be larger (so that it can handle both floating and fixed point numbers);
and so on. Floating point (32 bit) has better precision and a higher dynamic range
than fixed point (16 bit). In addition, floating point programs often have a shorter
development cycle, since the programmer doesn't generally need to worry about
issues such as overflow, underflow, and round-off error.
• On the other hand, fixed point DSPs have traditionally been cheaper than floating
point devices. Nothing changes more rapidly than the price of electronics, and cost
is a key factor in understanding how DSPs are evolving.
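To make the precision and range trade-off concrete, here is a sketch that assumes the Q15 format, one common way of using a 16-bit fixed-point word for fractions (1 sign bit and 15 fractional bits, covering -1 to 1 - 2^-15 in steps of 2^-15). Q15 is not named in the notes above; it is chosen only as an example, and `to_q15`/`from_q15` are hypothetical helpers.

```python
Q15_SCALE = 1 << 15  # 32768: the integer that would represent 1.0 in Q15


def to_q15(value):
    """Round a real number to Q15 and saturate it to the 16-bit range."""
    q = round(value * Q15_SCALE)                       # nearest step of 2**-15
    return max(-Q15_SCALE, min(Q15_SCALE - 1, q))      # clamp to int16 limits


def from_q15(q):
    """Convert a Q15 integer back to a real number."""
    return q / Q15_SCALE


for v in (0.5, 0.1, -1.0, 1.0):
    q = to_q15(v)
    print(v, q, from_q15(q), abs(from_q15(q) - v))
```

Note that 0.1 cannot be stored exactly and +1.0 is out of range (it saturates to 32767). A 32-bit IEEE floating-point number, with its 24-bit significand and 8-bit exponent, has much finer relative precision and a vastly wider range, which is the advantage described above.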
• The following characteristics make the TMS320 family the ideal choice for a wide
range of processing applications:
❑ Very flexible instruction set
❑ Inherent operational flexibility
❑ High-speed performance
❑ Innovative, parallel architectural design
❑ Cost-effectiveness
❑ The ’C5x generation consists of the ’C50, ’C51, ’C52, ’C53, ’C53S, ’C56, ’C57,
and ’C57S DSPs, which are fabricated in CMOS integrated-circuit technology.
❑ Their architectural design is based on the ’C25.
❑ The operational flexibility and speed of the ’C5x result from combining an
advanced Harvard architecture (which has separate buses for program memory and
data memory), a CPU with application-specific hardware logic, on-chip peripherals,
on-chip memory, and a highly specialized instruction set.
❑ The ’C5x is designed to execute up to 50 million instructions per second
(MIPS).
Evolution of the TMS320 family
Advantages of TMS320
The ’C5x devices offer these advantages:
➢ Enhanced TMS320 architectural design for increased performance
and versatility.
➢ Modular architectural design for fast development of spin-off devices.
➢ Advanced integrated-circuit processing technology for increased performance and
low power consumption.
➢ Source code compatibility with ’C1x, ’C2x, and ’C2xx DSPs for fast and easy
performance upgrades.
➢ Reduced power consumption and increased radiation hardness because of
new static design techniques.
➢ Enhanced instruction set for faster algorithms and for optimized high-level
language operation.
ARCHITECTURE OF TMS320C5x
1. Architecture
2. Bus Structure & Memory
3. CPU
4. Addressing Modes
5. Assembly Language (AL) Syntax
Bus Structure
•Program Bus (PB): carries the instruction code and immediate operands from program
memory space to the CPU.
•Program Address Bus (PAB): provides addresses to program memory space for both reads
and writes.
•Data Read Bus (DB): interconnects various elements of the CPU to data memory space.
•Data Read Address Bus (DAB): provides the address to access the data memory space.
Central Processing Unit (CPU)
The ’C5x CPU consists of these elements:
▪ Central arithmetic logic unit (CALU)
▪ Parallel logic unit (PLU)
▪ Auxiliary register arithmetic unit (ARAU)
▪ Memory-mapped registers
▪ Program controller
• The ’C5x has 96 registers mapped into page 0 of the data memory space.
• All ’C5x DSPs have:
• 28 CPU registers and
• 16 input/output (I/O) port registers,
but they have different numbers of peripheral and reserved registers.
Program Controller
Memory
On - Chip Memory
❑ Program read-only memory (ROM)
❑ Data/program dual-access RAM (DARAM)
❑ Data/program single-access RAM (SARAM)
Memory Space
❑ 64K-word program memory space,
❑ 64K-word local data memory space,
❑ 64K-word input/ output ports,
❑ 32K-word global data memory space.
Program ROM
This memory is used for booting program code from slower external ROM
or EPROM to fast on-chip or external RAM.
Data/Program Dual-Access RAM :
All ’C5x DSPs carry a 1056-word × 16-bit on-chip dual-access RAM (DARAM).
The DARAM is divided into three individually selectable memory blocks:
➢ 512-word data or program DARAM block B0,
➢ 512-word data DARAM block B1,
➢ 32-word data DARAM block B2.
The DARAM is primarily intended to store data values but, when needed, can be
used to store programs as well.
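The DARAM improves the operational speed of the ’C5x CPU. The CPU operates with a
4-deep pipeline, and because a DARAM block can be accessed twice per machine cycle,
the CPU can read from and write to DARAM in the same machine cycle.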
Data/Program Single-Access RAM :
➢All ’C5x DSPs except the ’C52 carry a 16-bit on-chip single-access RAM (SARAM) of
various sizes
➢Code can be booted from an off-chip ROM and then executed at full
speed, once it is loaded into the on-chip SARAM.
3. TDM Serial Port:
▪ The TDM serial port available on the ’C50, ’C51, and ’C53 devices is a full-duplex
serial port that can be configured by software either for synchronous operations or
for time-division multiplexed operations.
▪ The TDM serial port is commonly used in multiprocessor applications.
4. User-Maskable Interrupts:
▪ Four external interrupt lines (INT1–INT4) and
▪ Five internal interrupts (a timer interrupt and four serial port interrupts)
are user maskable.
5. Test/Emulation:
A standard IEEE 1149.1 (JTAG) interface with boundary-scan capability is used for test
and emulation.
6. Clock Generator:
The clock generator includes a phase-locked loop (PLL) circuit. It can be driven
internally by a crystal resonator.
7. Hardware Timer:
A 16-bit hardware timer with a 4-bit pre-scaler is available. The timer can be
stopped, restarted, reset, or disabled by specific status bits.
5.4 Addressing Modes
• Direct addressing
• Indirect addressing
• Immediate addressing
• Register addressing
• Dedicated-register addressing
• Memory-mapped register addressing
• Circular addressing
Immediate Addressing mode
• The operand value is explicitly known: the data is included as part of the
instruction.
Instruction     Operation
ADD #imm        #imm + A → A
#imm: value represented by imm (a fixed number, such as a filter coefficient, known
ahead of time)
A: accumulator register
• In addition, you can modify any scratch pad RAM (DARAM B2)
location or data page 0.
• The ’C5x supports two concurrent circular buffers operating via the
ARs.
4. Load the proper AR value, and set the corresponding circular buffer enable
bit in the CBCR.
5.5 Programming
INPUT:
#SD 8000 - 0004
8001 - 0004
#GO C000
Executing....
OUTPUT:
#SD 8005 - 0008
INPUT:
#SD 8001 - 0005
8002 - 0002
#GO C000
Executing....
OUTPUT:
#SD 8003 - 000A
5.6 Circular Buffering
Digital Signal Processors are designed to quickly carry out FIR filters and similar
techniques. To understand the hardware, we must first understand
the algorithms.
To start, we need to distinguish between off-line processing and real-time
processing. In off-line processing, the entire input signal resides in the computer
at the same time. For example, a geophysicist might use a seismometer to record
the ground movement during an earthquake. After the shaking is over, the
information may be read into a computer and analyzed in some way. Another
example of off-line processing is medical imaging, such as computed tomography
and MRI. The data set is acquired while the patient is inside the machine, but the
image reconstruction may be delayed until a later time. The key point is that all of
the information is simultaneously available to the processing program. This is
common in scientific research and engineering, but not in consumer products.
Off-line processing is the realm of personal computers and mainframes.
In real-time processing, the output signal is produced at the same time that the
input signal is being acquired. For example, this is needed in telephone
communication, hearing aids, and radar. These applications must have the
information immediately available, although it can be delayed by a short amount.
For instance, a 10 millisecond delay in a telephone call cannot be detected by the
speaker or listener. Likewise, it makes no difference if a radar signal is delayed by
a few seconds before being displayed to the operator. Real-time applications input
a sample, perform the algorithm, and output a sample, over-and-over.
Alternatively, they may input a group of samples, perform the algorithm, and
output a group of samples. This is the world of Digital Signal Processors.
Let us consider an FIR filter being implemented in real time. To calculate the output
sample, we must have access to a certain number of the most recent samples from
the input. For example, suppose we use eight coefficients in this filter, a0, a1, … a7.
This means we must know the value of the eight most recent samples from the
input signal, x[n], x[n-1], … x[n-7]. These eight samples must be stored in memory
and continually updated as new samples are acquired. What is the best way to
manage these stored samples? The answer is circular buffering.
Let us consider an eight-sample circular buffer. This circular buffer is placed in
eight consecutive memory locations, 20041 to 20048. Figure (a) shows how the
eight samples from the input might be stored at one particular instant in time,
while (b) shows the changes after the next sample is acquired. The idea of
circular buffering is that the end of this linear array is connected to its beginning;
memory location 20041 is viewed as being next to 20048, just as 20044 is next to
20045. You keep track of the array by a pointer (a variable whose value is
an address) that indicates where the most recent sample resides. For instance, in
(a) the pointer contains the address 20044, while in (b) it contains 20045. When a
new sample is acquired, it replaces the oldest sample in the array, and the pointer
is moved one address ahead. Circular buffers are efficient because only one value
needs to be changed when a new sample is acquired.
Four parameters are needed to manage a circular buffer. First, there must be a
pointer that indicates the start of the circular buffer in memory (in this example,
20041). Second, there must be a pointer indicating the end of the array (e.g.,
20048), or a variable that holds its length (e.g., 8). Third, the step size of the
memory addressing must be specified. In Fig. 28-3 the step size is one; for
example, address 20043 contains one sample, address 20044 contains the next
sample, and so on. This is frequently not the case. For instance, the addressing
may refer to bytes, and each sample may require two or four bytes to hold its
value. In these cases, the step size would need to be two or four, respectively.
These three values define the size and configuration of the circular buffer and do
not change during program operation. The fourth value, the pointer to the most
recent sample, must be modified as each new sample is acquired. In other words,
there must be program logic that controls how this fourth value is updated based on
the first three values. While this logic is quite simple, it must be very fast. This
is the whole point of this discussion: DSPs should be optimized for managing
circular buffers to achieve the highest possible execution speed.
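The four parameters map directly onto a small data structure. The Python sketch below assumes memory is modelled as a dictionary keyed by address and that the buffer fills from its start address; the class and method names are hypothetical. With the buffer placed at 20041 to 20048, four samples leave the pointer at 20044 and a fifth moves it to 20045, matching the two states described above.

```python
class CircularBuffer:
    """Circular buffer managed by the four parameters described above.

    start, length and step never change; newest is the only value updated
    as samples arrive. Memory is modelled as a dict from address to sample.
    """

    def __init__(self, start, length, step=1):
        self.start = start      # 1) address where the buffer begins
        self.length = length    # 2) number of samples (fixes the end address)
        self.step = step        # 3) address increment between samples
        self.newest = start + (length - 1) * step  # 4) most recent sample
        self.memory = {start + i * step: 0 for i in range(length)}

    def _wrap(self, offset):
        """Map any offset from start back into the buffer (end joins start)."""
        return self.start + offset % (self.length * self.step)

    def push(self, sample):
        """Overwrite the oldest sample and move the pointer one step ahead."""
        self.newest = self._wrap(self.newest - self.start + self.step)
        self.memory[self.newest] = sample

    def recent(self, k):
        """Sample k steps back: recent(0) is x[n], recent(1) is x[n-1], ..."""
        return self.memory[self._wrap(self.newest - self.start - k * self.step)]


buf = CircularBuffer(start=20041, length=8)
for s in (1, 2, 3, 4):
    buf.push(s)
print(buf.newest)                    # 20044
buf.push(5)
print(buf.newest)                    # 20045
print(buf.recent(0), buf.recent(1))  # 5 4
```

Only `newest` changes on each push, which is why a single fast pointer update, done in hardware on a DSP, is all that circular buffering needs.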
Circular buffering is also useful in off-line processing. Consider a program where
both the input and the output signals are completely contained in memory.
Circular buffering isn't needed for a convolution calculation, because every sample
can be immediately accessed. However, many algorithms are implemented
in stages, with an intermediate signal being created between each stage.
For instance, a recursive filter carried out as a series of biquads operates in
this way. The brute force method is to store the entire length of each
intermediate signal in memory. Circular buffering provides another option:
store only those intermediate samples needed for the calculation at hand.
This reduces the required amount of memory, at the expense of a more
complicated algorithm. The important idea is that circular buffers
are useful for off-line processing, but critical for real-time applications.
Now the steps needed to implement an FIR filter using circular buffers for both the
input signal and the coefficients can be listed. This list may seem trivial and
overexamined, but it's not! The efficient handling of these individual tasks is what
separates a DSP from a traditional microprocessor.
For each new sample, all the following steps need to be taken:
The goal is to make these steps execute quickly. Since steps 6-12 will be repeated
many times (once for each coefficient in the filter), special attention must be given
to these operations. Traditional microprocessors must generally carry out these 14
steps in serial (one after another), while DSPs are designed to perform them
in parallel. In some cases, all of the operations within the loop (steps 6-12) can be
completed in a single clock cycle.
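The per-sample procedure can be sketched in Python, with lists and wrapped indices standing in for hardware circular addressing. This is an illustrative model, not DSP code: `fir_circular` is a hypothetical name, and the comments describe what each line does rather than reproducing the numbered step list. The inner loop is the part repeated once per coefficient; on a DSP its fetches, pointer updates and multiply-accumulate run in parallel.

```python
def fir_circular(samples, coeffs):
    """FIR filter producing one output per input sample, using circular
    buffers for both the input history and the coefficients."""
    n_taps = len(coeffs)
    if n_taps == 0:
        raise ValueError("need at least one coefficient")
    history = [0] * n_taps   # input circular buffer, initially all zeros
    newest = n_taps - 1      # pointer to the most recent input sample
    c_ptr = 0                # pointer into the coefficient circular buffer
    outputs = []
    for x in samples:
        # Move the new sample into the input buffer, overwriting the oldest one.
        newest = (newest + 1) % n_taps
        history[newest] = x
        acc = 0              # zero the accumulator
        x_ptr = newest       # begin at x[n]
        for _ in range(n_taps):           # repeated once per coefficient
            coeff = coeffs[c_ptr]         # fetch coefficient
            c_ptr = (c_ptr + 1) % n_taps  # update coefficient pointer (wraps to 0)
            sample = history[x_ptr]       # fetch sample
            x_ptr = (x_ptr - 1) % n_taps  # step back to the older sample (wraps)
            acc += coeff * sample         # multiply and accumulate
        outputs.append(acc)               # output sample y[n]
    return outputs


print(fir_circular([1, 0, 0, 0, 0], [1, 2, 3]))  # [1, 2, 3, 0, 0]
```

Because the coefficient pointer wraps back to the first coefficient after a full pass, it never needs to be reset between samples; that is the benefit of keeping the coefficients in a circular buffer too.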
LINK TO VIDEOS:
6.4 Assignments (For higher level learning and Evaluation -
Examples: Case study, Comprehensive design, etc.)
UNIT V – DIGITAL SIGNAL PROCESSORS
Q.No | Questions | CO Level | BT Level
1. | Distinguish between off-line processing and real-time processing with an example. | CO5 | K3
2. | Write an assembly language program to perform circular convolution of two 8-point sequences using instructions of TMS320C54x processors. | CO5 | K3
3. | Study the features of fixed point and floating point architecture. | CO6 | K3
6.5 Part A Q & A (with K level and CO)
UNIT V – DIGITAL SIGNAL PROCESSORS
PART - A
Q.No | Questions | CO Level | BT Level
1. | Mention the features of the DSP processor. | |
4. | What are the different stages in pipelining? | CO5 | K2
Pipelining divides the execution of an instruction into 5 stages:
•Instruction fetch,
•Instruction decode,
•Operand fetch,
•Instruction execution and
•Operand store.
5. | What is pipelining in DSP? | |
7. | What are the applications of PDSPs? | CO5 | K2
PDSPs (programmable DSPs) are designed mainly for embedded DSP applications. As
such, the user may never realize that a PDSP exists inside an information appliance.
Important applications of PDSPs include modems, hard drive controllers, cellular
phone data pumps, set-top boxes, etc.
11. | Distinguish between fixed point and floating point architecture. | CO6 | K2
S.No | Fixed point architecture | Floating point architecture
1 | Fixed-point DSPs are designed to represent and manipulate integers (positive and negative whole numbers) via a minimum of 16 bits, yielding up to 65,536 possible bit patterns (2^16). | Floating-point DSPs represent and manipulate rational numbers via a minimum of 32 bits in a manner similar to scientific notation, where a number is represented with a mantissa and an exponent, yielding up to 4,294,967,296 possible bit patterns (2^32).
6.6 Part B Qs (with K level and CO)
PART – B UNIT-V
Q.No | Questions | CO Level | BT Level
1. | Write an ALP to perform circular convolution through MAC operation in TMS320C5x. | CO6 | K3
2. | Explain in detail the types of DSP processors. | CO6 | K2
3. | Write in detail about the TMS320C5x architecture. | CO6 | K2
4. | Explain the various addressing modes of the TMS320C5x processor with examples. | CO6 | K2
5. | Compare fixed and floating point architectures. | CO6 | K2
6. | Explain the various instruction sets of the TMS320C5x processor with examples. | CO6 | K2
7. | Explain in detail the VLIW architecture. | CO6 | K2
6.7 Supportive online Certification courses (NPTEL, Swayam, Coursera, Udemy, etc.)
https://swayam.gov.in/nd1_noc19_ee50/
https://www.coursera.org/learn/dsp1
Instructor: Paolo Prandoni
https://online.stanford.edu/courses/ee264-digital-signal-processing
6.8 Real time Applications in day to day life and to Industry
1. https://www.analog.com/media/en/technical-documentation/dsp-book/dsp_book_Ch9.pdf
APPLICATIONS OF DSP
6.9 Contents beyond the Syllabus (COE related Value added courses)
1. https://www.intechopen.com/books/applications-of-digital-signal-processing-through-practical-approach/application-of-dsp-in-power-conversion-systems-a-practical-approach-for-multiphase-drives
7. Assessment Schedule
Internal Assessment 2
Revision Test 1
Model Exam
University Exam
8. Prescribed Text Books & Reference Books
TEXT BOOK:
REFERENCES:
9. Mini Project suggestions
Thank you