DSP Processors: We Have Seen That The Multiply and Accumulate (MAC) Operation Is Very Prevalent in DSP Computation

This document discusses the design of a digital signal processor (DSP). It describes five steps that optimize a basic CPU for digital signal processing: 1) adding a MAC (multiply-accumulate) instruction, 2) adding parallel pointer arithmetic units, 3) adding separate memory banks and buses, 4) adopting a Harvard architecture with separate program and data memory, and 5) adding pipelining so that successive taps overlap. The first four steps reduce the count from 10 cycles per MAC to 3; with pipelining the average reaches 1 cycle per tap. It closes with fixed-point arithmetic, guard bits, and saturation.

Uploaded by

ertwert

DSP Processors

We have seen that the Multiply and Accumulate (MAC) operation is very prevalent in DSP computation:
  computation of energy
  MA filters
  AR filters
  correlation of two signals
  FFT

A Digital Signal Processor (DSP) is a CPU that can compute each MAC tap in 1 clock cycle.
Thus the entire L-coefficient MAC takes (about) L clock cycles.
For real-time operation, the time between the input of two successive x values must be more than L clock cycles.
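
As a rough illustration of this real-time constraint, the sketch below computes how many taps fit between two input samples. The function name `max_taps`, the `cycles_per_tap` parameter, and the clock and sampling rates are assumptions chosen for illustration; none of them come from the slides, and loop overhead is ignored.

```python
def max_taps(clock_hz, sample_rate_hz, cycles_per_tap=1):
    """Largest number of taps whose MACs fit in one sample interval."""
    # Whole clock cycles available between two successive x values.
    cycles_per_sample = clock_hz // sample_rate_hz
    return cycles_per_sample // cycles_per_tap

# Hypothetical numbers: 100 MHz clock, 48 kHz sampling.
dsp_taps = max_taps(100_000_000, 48_000)                     # 1 cycle per tap
cpu_taps = max_taps(100_000_000, 48_000, cycles_per_tap=10)  # assumed 10 cycles per tap
print(dsp_taps, cpu_taps)
```

Under these assumed numbers the one-cycle machine can afford roughly ten times as many taps per sample as a processor that spends 10 cycles on each tap.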

[Figure: basic CPU block diagram. Labels: XTAL; ALU with ADD, MULT, etc.; registers a, b, c, d; bus; memory]

MACs
the basic MAC loop is

  loop over all times n
    initialize y_n ← 0
    loop over i from 1 to number of coefficients
      y_n ← y_n + a_i * x_j    (j related to i)
    output y_n

in order to implement in low-level programming
  for real-time we need to update the static buffer
    (from now on, we'll assume that the x values are in a pre-prepared vector)
  for efficiency we don't use array indexing, rather pointers
  we must explicitly increment the pointers
  we must place values into registers in order to do arithmetic

  loop over all times n
    clear y register
    set number of iterations to the number of coefficients
    loop
      update a pointer
      update x pointer
      multiply z ← a * x      (indirect addressing)
      increment y ← y + z     (register operations)
    output y
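
The low-level loop above can be mimicked in Python as a minimal sketch. Integer indices `pa` and `px` stand in for the pointers, and the relation j = n - i (the usual MA filter convention) is an assumption, since the slides only say that j is related to i. The function name `mac_filter` is hypothetical.

```python
def mac_filter(a, x):
    """Return y[n] = sum over i of a[i] * x[n - i] for each n where all taps fit."""
    num_taps = len(a)
    outputs = []
    for n in range(num_taps - 1, len(x)):  # loop over all times n
        y = 0                              # clear y register
        pa = 0                             # "pointer" to the first coefficient
        px = n                             # "pointer" to the newest x value
        for _ in range(num_taps):          # fixed number of iterations
            z = a[pa] * x[px]              # multiply (indirect addressing)
            y = y + z                      # accumulate (register operation)
            pa += 1                        # update a pointer
            px -= 1                        # update x pointer
        outputs.append(y)                  # output y
    return outputs

print(mac_filter([1, 2, 3], [1, 0, 0, 1, 1]))
```

Each pass through the inner loop is one MAC tap, which is the unit of work counted in the cycle estimates.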


Cycle counting
We still can't count cycles
  need to take fetch and decode into account
  need to take loading and storing of registers into account
  we need to know the number of cycles for each arithmetic operation
    let's assume each takes 1 cycle (multiplication typically takes more)
  assume zero-overhead loop (clears y register, sets loop counter, etc.)

Then the operations inside the loop look something like this:
1. Update pointer to ai
2. Update pointer to xj
3. Load contents of ai into register a
4. Load contents of xj into register x
5. Fetch operation (MULT)
6. Decode operation (MULT)
7. MULT a*x with result in register z
8. Fetch operation (INC)
9. Decode operation (INC)
10. INC register y by contents of register z

So it takes at least 10 cycles to perform each MAC using a regular CPU

Step 1 - new opcode

To build a DSP we need to enhance the basic CPU with new hardware (silicon)
The easiest step is to define a new opcode called MAC
Note that the result needs a special register
Example: if registers are 16 bits, the product needs 32 bits
And when summing many products we need 40 bits

[Figure: CPU block diagram with MAC added. Labels: ALU with ADD, MULT, MAC, etc.; p-registers (PC, px); accumulator y; registers a, x; bus; memory]

The code now looks like this:
1. Update pointer to ai
2. Update pointer to xj
3. Load contents of ai into register a
4. Load contents of xj into register x
5. Fetch operation (MAC)
6. Decode operation (MAC)
7. MAC: a*x added to accumulator y

However 7 > 1, so this is still NOT a DSP!
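
The register widths quoted in Step 1 (16-bit operands, 32-bit product, 40-bit accumulator) can be checked with a short calculation. This is a sketch under the assumption of signed two's complement words; the helper names `accumulator_bits` and `fits_signed` are hypothetical.

```python
def accumulator_bits(word_bits, num_products):
    """Bits needed to sum num_products worst-case products of two signed words."""
    product_bits = 2 * word_bits                  # 16 x 16 -> 32 bits
    guard_bits = (num_products - 1).bit_length()  # ceil(log2(num_products))
    return product_bits + guard_bits

def fits_signed(value, bits):
    """True if value is representable as a signed two's complement number."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1

# Worst case for 16-bit words: (-32768) * (-32768), summed 256 times.
worst_sum = 256 * (-32768) * (-32768)
print(accumulator_bits(16, 256), fits_signed(worst_sum, 32), fits_signed(worst_sum, 40))
```

Under this assumption a 40-bit accumulator provides 8 guard bits, enough for 256 worst-case products before overflow becomes possible.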

Step 2 - register arithmetic

The two operations
  Update pointer to ai
  Update pointer to xj
could be performed in parallel, but both are performed by the ALU

So we add pointer arithmetic units, one for each register

The special sign || is used in assembler to mean operations in parallel

[Figure: CPU block diagram with pointer arithmetic added. Labels: ALU with ADD, MULT, MAC, etc.; p-registers (px) with INC/DEC; accumulator; registers; bus; memory]

1. Update pointer to ai || Update pointer to xj
2. Load contents of ai into register a
3. Load contents of xj into register x
4. Fetch operation (MAC)
5. Decode operation (MAC)
6. MAC: a*x added to accumulator y

However 6 > 1, so this is still NOT a DSP!

Step 3 - memory banks and buses

We would like to perform the loads in parallel, but we can't since they both have to go over the same bus

So we add another bus, and we need to define memory banks so that there is no contention!

There is dual-port memory, but it has an arbitrator which adds delay

[Figure: CPU block diagram with two buses. Labels: ALU with ADD, MULT, MAC, etc.; p-registers (PC, pa, px) with INC/DEC; accumulator y; registers a, x; two buses; memory bank 1 and bank 2]

1. Update pointer to ai || Update pointer to xj
2. Load ai into a || Load xj into x
3. Fetch operation (MAC)
4. Decode operation (MAC)
5. MAC: a*x added to accumulator y

However 5 > 1, so this is still NOT a DSP!

Step 4 - Harvard architecture

Von Neumann architecture
  one memory for data and program
  can change program during run-time

Harvard architecture (predates VN)
  one memory for program
  one memory (or more) for data
  needn't count fetch since it happens in parallel
  we can remove decode as well (see later)

[Figure: Harvard CPU block diagram. Labels: ALU with ADD, MULT, MAC, etc.; p-registers with INC/DEC; accumulator; registers; three buses to the data 1, data 2 and program memories]

1. Update pointer to ai || Update pointer to xj
2. Load ai into a || Load xj into x
3. MAC: a*x added to accumulator y

However 3 > 1, so this is still NOT a DSP!
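
To summarize the cycle counts of the regular CPU and Steps 1 to 4, the sketch below models each design as a list of cycles, where every inner list holds the operations issued together (the ones joined by || in the slides). The dictionary layout and the names `SCHEDULES` and `cycles_per_mac` are assumptions made for illustration.

```python
# Each inner list is one cycle; operations inside it run in parallel.
SCHEDULES = {
    "regular CPU": [
        ["update pa"], ["update px"], ["load a"], ["load x"],
        ["fetch MULT"], ["decode MULT"], ["MULT"],
        ["fetch INC"], ["decode INC"], ["INC"],
    ],
    "step 1: MAC opcode": [
        ["update pa"], ["update px"], ["load a"], ["load x"],
        ["fetch MAC"], ["decode MAC"], ["MAC"],
    ],
    "step 2: pointer arithmetic units": [
        ["update pa", "update px"], ["load a"], ["load x"],
        ["fetch MAC"], ["decode MAC"], ["MAC"],
    ],
    "step 3: memory banks and buses": [
        ["update pa", "update px"], ["load a", "load x"],
        ["fetch MAC"], ["decode MAC"], ["MAC"],
    ],
    # Fetch and decode overlap with execution, so they are not counted.
    "step 4: Harvard architecture": [
        ["update pa", "update px"], ["load a", "load x"], ["MAC"],
    ],
}

def cycles_per_mac(schedule):
    """One cycle per group of parallel operations."""
    return len(schedule)

for name, schedule in SCHEDULES.items():
    print(f"{name}: {cycles_per_mac(schedule)} cycles per MAC")
```

The three remaining cycles are sequentially dependent (update, then load, then MAC), so no further parallel hardware within a single tap can remove them.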

Step 5 - pipelines
We seem to be stuck
  Update MUST be before Load
  Load MUST be before MAC

But we can use a pipelined approach

Then, on average, it takes 1 tick per tap
  actually, if pipeline depth is D, N taps take N+D-1 ticks

For large N >> D, or when we fill the pipeline, the number of ticks per tap is 1 (this is a DSP)

t    1    2    3    4    5    6    7
op   U1   U2   U3   U4   U5
          L1   L2   L3   L4   L5
               M1   M2   M3   M4   M5

(U = update pointers, L = load, M = MAC; the number is the tap index)
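
The N+D-1 tick count can be reproduced with a small simulation. This is a sketch that assumes an ideal pipeline with no stalls, in which every stage takes exactly one tick; the function name `pipeline_schedule` and the stage labels U, L and M are illustrative.

```python
def pipeline_schedule(num_taps, stages=("U", "L", "M")):
    """Return, for each tick, the list of stage operations active in that tick."""
    depth = len(stages)
    total_ticks = num_taps + depth - 1
    ticks = []
    for t in range(total_ticks):
        active = []
        for s, name in enumerate(stages):
            tap = t - s + 1  # the tap that occupies stage s at tick t
            if 1 <= tap <= num_taps:
                active.append(f"{name}{tap}")
        ticks.append(active)
    return ticks

for t, ops in enumerate(pipeline_schedule(5), start=1):
    print(t, " ".join(ops))

# Ticks per tap approaches 1 as the number of taps grows.
print(len(pipeline_schedule(1000)) / 1000)
```

With 5 taps and depth 3 the schedule occupies 7 ticks; with 1000 taps it occupies 1002, i.e. 1.002 ticks per tap.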

Fixed point
Most DSPs are fixed point, i.e. handle integer (2s complement) numbers only
  floating point is more expensive and slower
  floating point numbers can underflow
  fixed point numbers can overflow

Accumulators have guard bits to protect against overflow

When regular fixed point CPUs overflow
  numbers greater than MAXINT become negative
  numbers smaller than -MAXINT become positive

Most fixed point DSPs have a saturation arithmetic mode
  numbers larger than MAXINT become MAXINT
  numbers smaller than -MAXINT become -MAXINT
  this is still an error, but a smaller error

There is a tradeoff between safety from overflow and SNR
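
The difference between wraparound and saturation can be seen in a small sketch. It assumes a 16-bit word and, unlike the slide's symmetric -MAXINT, clamps at the two's complement minimum -32768; both choices, along with the function names `wrap16` and `sat16`, are assumptions made for illustration.

```python
MAXINT = 2**15 - 1   # 32767 for an assumed 16-bit word
MININT = -(2**15)    # -32768, the two's complement minimum (assumed clamp value)

def wrap16(value):
    """Two's complement wraparound, as on a regular fixed point CPU."""
    return ((value - MININT) % 2**16) + MININT

def sat16(value):
    """Saturation: clamp to the nearest representable value."""
    return max(MININT, min(MAXINT, value))

total = 30000 + 10000  # the true sum, 40000, does not fit in 16 bits
print(wrap16(total), sat16(total))
print(abs(total - wrap16(total)), abs(total - sat16(total)))
```

Here wraparound turns 40000 into -25536, an error of 65536, while saturation yields 32767, an error of 7233: still an error, but a smaller one.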
