Digital Design
Chapter 5:
Register-Transfer Level
(RTL) Design
Slides to accompany the textbook Digital Design, with RTL Design, VHDL,
and Verilog, 2nd Edition,
by Frank Vahid, John Wiley and Sons Publishers, 2010.
http://www.ddvahid.com
Copyright 2010 Frank Vahid
Instructors of courses requiring Vahid's Digital Design textbook (published by John Wiley and Sons) have permission to modify and use these slides for customary course-related activities, subject to keeping this copyright notice in place and unmodified. These slides may be posted as unanimated pdf versions on publicly-accessible course websites. PowerPoint source (or pdf with animations) may not be posted to publicly-accessible websites, but may be posted for students on internal protected sites or distributed directly to students by other electronic means. Instructors may make printouts of the slides available to students for a reasonable photocopying charge, without incurring royalties. Any other use requires explicit permission. Instructors may obtain PowerPoint source or obtain special use permissions from Wiley; see http://www.ddvahid.com for information.
Digital Design 2e, Copyright 2010 Frank Vahid
5.1
Introduction
Levels of digital design abstraction (higher levels first): register-transfer level (RTL), logic level, transistor level
Chpt 2
Capture comb. behavior: Equations, truth tables
Convert to circuit: AND + OR + NOT -> Comb. logic
Chpt 3
Capture sequential behavior: FSMs
Convert to circuit: Register + Comb. logic -> Controller
Chpt 4
Datapath components, simple datapaths
Chpt 5
Capture behavior: High-level state machine
Convert to circuit: Controller + Datapath -> Processor
Known as RTL (register-transfer level) design
Processors: Programmable (microprocessor), Custom
Note: Slides with animation are denoted with a small red "a" near the animated items
5.2
High-Level State Machines (HLSMs)
Some behaviors too complex for equations, truth tables, or FSMs
Ex: Soda dispenser
c: bit input, 1 when coin deposited
a: 8-bit input having value of deposited coin
s: 8-bit input having cost of a soda
d: bit output, processor sets to 1 when total value of deposited coins equals or exceeds cost of a soda
FSM can't represent
8-bit input/output
Storage of current total
Addition (e.g., 25 + 10)
[Figure: soda dispenser processor with inputs c, a, s and output d; example with s = 50 in which two coins of value 25 are deposited, tot goes to 25 and then 50, and d pulses to 1]
HLSMs
High-level state machine (HLSM) extends FSM with:
Multi-bit input/output
Local storage
Arithmetic operations
Conventions
Numbers:
Single-bit: '0' (single quotes)
Integer: 0 (no quotes)
Multi-bit: "0000" (double quotes)
== for equal, := for assignment
Multi-bit outputs must be registered via local storage
// precedes a comment
SodaDispenser
Inputs: c (bit), a (8 bits), s (8 bits)
Outputs: d (bit) // '1' dispenses soda
Local storage: tot (8 bits)
Init: d:='0', tot:=0; go to Wait
Wait: c: go to Add; c'*(tot<s): stay in Wait; c'*(tot<s)': go to Disp
Add: tot:=tot+a; go to Wait
Disp: d:='1'; go to Init
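The SodaDispenser HLSM can be stepped through in software to check its behavior. The following is a minimal sketch, not the textbook's implementation: the function name `soda_dispenser` and the per-cycle `(c, a)` input format are hypothetical, it assumes the coin value `a` is still valid in the cycle after `c` pulses (when the Add state reads it), and it treats `d` as 0 in any state that does not set it to 1.

```python
def soda_dispenser(inputs, s):
    """Simulate the SodaDispenser HLSM, one clock cycle per (c, a) pair.

    inputs: list of (c, a) tuples, c = coin-detected bit, a = coin value.
    s: cost of a soda.
    Returns the value of output d in each cycle.
    """
    state, tot = "Init", 0
    d_trace = []
    for c, a in inputs:
        d = 0                              # assumption: d is 0 unless Disp sets it
        next_state, next_tot = state, tot
        if state == "Init":
            next_tot = 0                   # tot := 0
            next_state = "Wait"
        elif state == "Wait":
            if c:
                next_state = "Add"         # coin detected
            elif tot >= s:
                next_state = "Disp"        # c' * (tot<s)'
            # otherwise c' * (tot<s): stay in Wait
        elif state == "Add":
            next_tot = (tot + a) & 0xFF    # tot := tot + a, 8-bit storage
            next_state = "Wait"
        else:                              # Disp
            d = 1
            next_state = "Init"
        d_trace.append(d)
        state, tot = next_state, next_tot  # storage updates on the clock edge
    return d_trace


# Two coins of value 25 against a cost of 50.
cycles = [(0, 0), (1, 25), (0, 25), (0, 0), (1, 25), (0, 25), (0, 0), (0, 0), (0, 0)]
print(soda_dispenser(cycles, 50))
```

The next-state and next-storage values are computed from the current values and only committed at the end of each loop iteration, which mirrors how local storage is updated on clock edges.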
Ex: Cycles-High Counter
P = total number (in binary) of cycles that m is 1
Capture behavior as HLSM
Preg required (multibit outputs must be registered)
Use to hold count
CountHigh
Inputs: m (bit)
Outputs: P (32 bits)
Local storage: Preg
S_Clr: Preg := 0 // Clear Preg to 0s
S_Wt: // Wait for m == '1'; m': stay in S_Wt; m: go to S_Inc
S_Inc: Preg := Preg + 1 // Increment Preg; m: stay in S_Inc; m': go to S_Wt
Note: Could have designed directly using an up-counter. But that methodology is ad hoc, and won't work for more complex examples, like the next one.
Example: Laser-Based Distance Measurer
Laser-based distance measurement: pulse laser, measure time T to sense reflection
Laser light travels at speed of light, 3*10^8 m/sec
Distance is thus D = (T sec * 3*10^8 m/sec) / 2
[Figure: laser pulse travels distance D to the object of interest and back to the sensor in T seconds; 2D = T sec * 3*10^8 m/sec]
Example: Laser-Based Distance Measurer
Inputs/outputs
B: bit input, from button, to begin measurement
L: bit output, activates laser
S: bit input, senses laser reflection
D: 16-bit output, to display computed distance
[Figure: laser-based distance measurer block with B (from button), L (to laser), S (from sensor), and 16-bit D (to display)]
Example: Laser-Based Distance Measurer
Declare inputs, outputs, and local storage
Dreg required for multi-bit output
Create initial state, name it S0
Initialize laser to off (L:='0')
Initialize displayed distance to 0 (Dreg:=0)
Recall: '0' means single bit, 0 means integer
DistanceMeasurer
Inputs: B (bit), S (bit)
Outputs: L (bit), D (16 bits)
Local storage: Dreg (16 bits) (required)
S0 (first state usually initializes the system):
L := '0' // laser off
Dreg := 0 // distance is 0
Example: Laser-Based Distance Measurer
Add another state, S1, that waits for a button press
B': stay in S1, keep waiting
B: go to a new state S2
Q: What should S2 do?
A: Turn on the laser
DistanceMeasurer
...
S0: L := '0', Dreg := 0
S1: B' (button not pressed): stay in S1; B (button pressed): go to S2
Example: Laser-Based Distance Measurer
Add a state S2 that turns on the laser (L:='1')
Then turn off laser (L:='0') in a state S3
Q: What do next? A: Start timer, wait to sense reflection
DistanceMeasurer
...
S0: L := '0', Dreg := 0
S1: B': stay in S1; B: go to S2
S2: L := '1' // laser on
S3: L := '0' // laser off
Example: Laser-Based Distance Measurer
Stay in S3 until sense reflection (S)
To measure time, count cycles while in S3
To count, declare local storage Dctr
Initialize Dctr to 0 in S1. In S2 would have been O.K. too.
Don't forget to initialize local storage: common mistake
Increment Dctr each cycle in S3
DistanceMeasurer
Inputs: B (bit), S (bit) Outputs: L (bit), D (16 bits)
Local storage: Dreg, Dctr (16 bits)
S0: L := '0', Dreg := 0
S1: Dctr := 0 // reset cycle count; B': stay in S1; B: go to S2
S2: L := '1'
S3: L := '0', Dctr := Dctr + 1 // count cycles; S' (no reflection): stay in S3; S (reflection): go to ?
Example: Laser-Based Distance Measurer
Once reflection detected (S), go to new state S4
Calculate distance
Assuming clock frequency is 3x10^8 Hz, Dctr holds number of meters, so Dreg:=Dctr/2
After S4, go back to S1 to wait for button again
DistanceMeasurer
Inputs: B (bit), S (bit) Outputs: L (bit), D (16 bits)
Local storage: Dreg, Dctr (16 bits)
S0: L := '0', Dreg := 0
S1: Dctr := 0; B': stay in S1; B: go to S2
S2: L := '1'
S3: L := '0', Dctr := Dctr+1; S': stay in S3; S: go to S4
S4: Dreg := Dctr/2 // calculate D; go to S1
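The completed DistanceMeasurer HLSM (states S0 to S4) can be traced cycle by cycle. The sketch below is illustrative only: the function name `distance_measurer` and the list-of-tuples input format are hypothetical, and Dctr/2 is modeled as an integer right shift.

```python
def distance_measurer(inputs):
    """Simulate the DistanceMeasurer HLSM.

    inputs: list of (B, S) bit pairs, one per clock cycle.
    Returns a list of (L, D) output pairs, one per clock cycle.
    """
    state, dreg, dctr = "S0", 0, 0
    trace = []
    for b, s in inputs:
        laser = 0                          # L is '0' in every state except S2
        nstate, ndreg, ndctr = state, dreg, dctr
        if state == "S0":
            ndreg = 0                      # Dreg := 0
            nstate = "S1"
        elif state == "S1":
            ndctr = 0                      # Dctr := 0, reset cycle count
            nstate = "S2" if b else "S1"
        elif state == "S2":
            laser = 1                      # L := '1', laser on
            nstate = "S3"
        elif state == "S3":
            ndctr = (dctr + 1) & 0xFFFF    # Dctr := Dctr + 1 (16 bits)
            nstate = "S4" if s else "S3"
        else:                              # S4
            ndreg = dctr >> 1              # Dreg := Dctr / 2
            nstate = "S1"
        trace.append((laser, dreg))        # D shows the current Dreg contents
        state, dreg, dctr = nstate, ndreg, ndctr
    return trace


# Button pressed in cycle 1, reflection sensed on the fourth cycle spent in S3.
stimulus = [(0, 0), (1, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 1), (0, 0), (0, 0)]
print(distance_measurer(stimulus))
```

With a 3x10^8 Hz clock, the four cycles counted in S3 correspond to a round trip of 4 meters, so the displayed distance settles at 2.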
HLSM Actions: Updates Occur Next Clock Cycle
Local storage updated on clock edges only
Enter state on clock edge
Storage writes in that state occur on next clock edge
Can think of as occurring on outgoing transitions
[Figure: state S3 with action Dctr := Dctr+1 is equivalent to putting the action on its outgoing transitions: S' / Dctr := Dctr+1 and S / Dctr := Dctr+1]
Thus, transition conditions use the OLD value, not the newly-written value
Example:
Inputs: B (bit)
Outputs: P (bit) // if B, 2 cycles high
Local storage: Jreg (8 bits)
S0: P := '0', Jreg := 1; B': stay in S0; B: go to S1
S1: P := '1', Jreg := Jreg + 1; Jreg<2: stay in S1; !(Jreg<2): go to S0
[Timing diagram (b): clk, state sequence S0, S1, S1, S0, and waveforms for B, Jreg, and P]
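The "old value" rule can be checked by simulating the Jreg example. This is a small sketch under stated assumptions: the function name `two_cycles_high` is hypothetical, and `None` stands in for the unknown contents of Jreg before its first write.

```python
def two_cycles_high(b_inputs):
    """Simulate the example HLSM; returns (state, Jreg, P) for each cycle."""
    state, jreg = "S0", None               # Jreg is unknown before the first write
    trace = []
    for b in b_inputs:
        if state == "S0":
            p = 0
            nstate = "S1" if b else "S0"
            njreg = 1                      # Jreg := 1
        else:                              # S1
            p = 1
            # The condition reads the OLD Jreg, not Jreg + 1.
            nstate = "S1" if jreg < 2 else "S0"
            njreg = jreg + 1               # Jreg := Jreg + 1
        trace.append((state, jreg, p))
        state, jreg = nstate, njreg        # writes take effect on the clock edge
    return trace


for row in two_cycles_high([1, 0, 0, 0]):
    print(row)
```

Because the comparison in S1 sees Jreg = 1 on the first visit and Jreg = 2 on the second, P stays high for exactly two cycles, while Jreg itself reaches 3 before S0 resets it.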
5.3
RTL Design Process
Capture behavior
Convert to circuit
Need target architecture
Datapath capable of HLSM's data operations
Controller to control datapath
[Figure: controller with external control inputs and external control outputs, connected to the datapath through DP control inputs and DP control outputs; the datapath has external data inputs and external data outputs]
Ctrl/DP Example for Earlier Cycles-High Counter
First clear Preg to 0s
Then increment Preg for each clock cycle that m is 1
We created this HLSM earlier
CountHigh
Inputs: m (bit)
Outputs: P (32 bits)
LocStr: Preg (32 bits)
S_Clr: Preg := 0 // Clear Preg to 0s
S_Wt: // Wait for m == '1'
S_Inc: Preg := Preg + 1 // Increment Preg
Create DP
Adder add1 with inputs A and B (Preg's Q and the constant 000...00001) and output S feeding Preg's I input; 32-bit register Preg with control inputs clr and ld; Preg's Q drives P
Connect with controller
Controller has input m and outputs Preg_clr, Preg_ld
Derive controller
S_Clr: Preg_clr = 1, Preg_ld = 0 // Preg := 0
S_Wt: Preg_clr = 0, Preg_ld = 0 // Wait for m == '1'
S_Inc: Preg_clr = 0, Preg_ld = 1 // Preg := Preg + 1
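The split into a controller and a datapath for the cycles-high counter can be mirrored in software. The following is a rough sketch rather than the slides' circuit: the names `Datapath`, `CONTROL`, `next_state`, and `count_high` are hypothetical, and the S_Inc transitions (m stays in S_Inc, m' returns to S_Wt) are assumed from the HLSM.

```python
class Datapath:
    """Preg register fed by add1 (Preg + 000...00001)."""

    def __init__(self):
        self.preg = 0

    def clock(self, preg_clr, preg_ld):
        if preg_clr:
            self.preg = 0
        elif preg_ld:
            self.preg = (self.preg + 1) & 0xFFFFFFFF   # 32-bit register


# Controller outputs (Preg_clr, Preg_ld) for each FSM state.
CONTROL = {"S_Clr": (1, 0), "S_Wt": (0, 0), "S_Inc": (0, 1)}


def next_state(state, m):
    if state == "S_Clr":
        return "S_Wt"
    return "S_Inc" if m else "S_Wt"        # same rule in S_Wt and S_Inc


def count_high(m_inputs):
    """Run controller and datapath together; returns the final Preg value."""
    dp, state = Datapath(), "S_Clr"
    for m in m_inputs:
        preg_clr, preg_ld = CONTROL[state]
        upcoming = next_state(state, m)
        dp.clock(preg_clr, preg_ld)        # datapath reacts to control signals
        state = upcoming
    return dp.preg


print(count_high([0, 1, 1, 0, 1, 0]))
```

The controller only produces single-bit control signals; all 32-bit work happens in the datapath, which is the point of the target architecture.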
RTL Design Process
Example: Soda Dispenser from Earlier
Quick overview example. More details of each step to come.
Step 1: HLSM
SodaDispenser
Inputs: c (bit), a (8 bits), s (8 bits)
Outputs: d (bit) // '1' dispenses soda
Local storage: tot (8 bits)
Init: d:='0', tot:=0
Wait: c: go to Add; c'*(tot<s): stay in Wait; c'*(tot<s)': go to Disp
Add: tot:=tot+a
Disp: d:='1'
Step 2A: Datapath
Register tot (8 bits) with control inputs tot_ld (ld) and tot_clr (clr); 8-bit adder computing tot + a; 8-bit < comparator of tot and s producing tot_lt_s
Step 2B: Controller with input c and outputs d, tot_ld, tot_clr, connected to the datapath, which returns tot_lt_s
Example: Soda Dispenser
Quick overview example. More details of each step to come.
Step 2C: Controller FSM derived from the Step 1 HLSM
Inputs: c, tot_lt_s (bit)
Outputs: d, tot_ld, tot_clr (bit)
Init: d=0, tot_clr=1
Wait: c: go to Add; c'*tot_lt_s: stay in Wait; c'*tot_lt_s': go to Disp
Add: tot_ld=1
Disp: d=1
Example: Soda Dispenser
Quick overview example. More details of each step to come.
Step 2C: Controller
Inputs: c, tot_lt_s (bit)
Outputs: d, tot_ld, tot_clr (bit)
Use controller design process (Ch3) to complete the design
[Table: controller truth table with one group of rows per state (Init, Wait, Add, Disp); inputs s1, s0, c, tot_lt_s; outputs n1, n0, d, tot_ld, tot_clr]
RTL Design Process, Step 2A: Create a datapath
Sub-steps
HLSM data inputs/outputs -> Datapath inputs/outputs.
HLSM local storage item -> Instantiated register
"Instantiate": Add new component ("instance") to design
Each HLSM state action and transition condition data computation -> Datapath components and connections
Also instantiate multiplexors as needed
Need component library from which to choose
reg (inputs clr, ld, I; output Q): clk^ and clr=1: Q=0; clk^ and ld=1: Q=I; else Q stays same
add (inputs A, B; output S): S = A+B (unsigned)
cmp (inputs A, B; outputs lt, eq, gt): A<B: lt=1; A=B: eq=1; A>B: gt=1
shift<L/R> (input I; output Q): shiftL1: <<1; shiftL2: <<2; shiftR1: >>1; ...
mux2x1 (inputs I1, I0, s0; output Q): s0=0: Q=I0; s0=1: Q=I1
Step 2A: Create a Datapath, Simple Examples
(a) Preg = X + Y + Z: add1 computes X+Y, add2 adds Z to give X+Y+Z, which is loaded into Preg (clr=0, ld=1); Preg's Q drives P
(b) Preg = Preg + X: add1 adds X to Preg's Q, and the sum is loaded back into Preg (clr=0, ld=1)
(c) Preg=X+Y; regQ=Y+Z: add1 computes X+Y for Preg, add2 computes Y+Z for regQ (each with clr=0, ld=1)
(d) k=0: Preg = Y + Z; k=1: Preg = X + Y: add1 computes X+Y, add2 computes Y+Z, and a mux2x1 with s0 = k selects which sum is loaded into Preg (clr=0, ld=1)
Laser-Based Distance Measurer, Step 2A: Create a Datapath
HLSM data I/O -> DP I/O
HLSM local storage -> reg
HLSM state action and transition condition data computation -> Datapath components and connections
DistanceMeasurer
Inputs: B (bit), S (bit) Outputs: L (bit), D (16 bits)
Local storage: Dreg, Dctr (16 bits)
S0: L := '0', Dreg := 0
S1: Dctr := 0; B': stay in S1
S2: L := '1'
S3: L := '0', Dctr := Dctr+1; S': stay in S3
S4: Dreg := Dctr/2 // calculate D
Datapath
Dctr: reg(16), control inputs Dctr_clr, Dctr_ld
Add1: add(16), A = Dctr's Q, B = 1, S feeds Dctr's I
Shr1: shiftR1(16), I = Dctr's Q
Dreg: reg(16), control inputs Dreg_clr, Dreg_ld, I = Shr1's Q, Q drives D (16 bits)
Laser-Based Distance Measurer, Step 2B: Connecting the Datapath to a Controller
[Figure: controller with inputs B (from button) and S (from sensor) and output L (to laser), driving the datapath control signals Dreg_clr, Dreg_ld, Dctr_clr, Dctr_ld; the datapath outputs 16-bit D (to display); 300 MHz clock]
Laser-Based Distance Measurer, Step 2C: Derive the Controller FSM
FSM has same states, transitions, and control I/O
Achieve each HLSM data operation using datapath control signals in FSM
Controller
Inputs: B, S
Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_ld
S0: L=0 (laser off), Dreg_clr = 1 (clear Dreg), Dreg_ld = 0, Dctr_clr = 0, Dctr_ld = 0
S1: L=0, Dreg_clr = 0, Dreg_ld = 0, Dctr_clr = 1 (clear count), Dctr_ld = 0
S2: L=1 (laser on), Dreg_clr = 0, Dreg_ld = 0, Dctr_clr = 0, Dctr_ld = 0
S3: L=0 (laser off), Dreg_clr = 0, Dreg_ld = 0, Dctr_clr = 0, Dctr_ld = 1 (count up)
S4: L=0, Dreg_clr = 0, Dreg_ld = 1 (load Dreg with Dctr/2), Dctr_clr = 0, Dctr_ld = 0 (stop counting)
Laser-Based Distance Measurer, Step 2C: Derive the Controller FSM
Same FSM, using convention of unassigned outputs implicitly assigned 0
Some assignments to 0 still shown, due to their importance in understanding desired controller behavior
Controller
Inputs: B, S
Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_ld
S0: L=0 (laser off), Dreg_clr = 1 (clear Dreg)
S1: Dctr_clr = 1 (clear count)
S2: L=1 (laser on)
S3: L=0 (laser off), Dctr_ld = 1 (count up)
S4: Dreg_ld = 1 (load Dreg with Dctr/2), Dctr_ld = 0 (stop counting)
5.4
More RTL Design
Additional datapath components
sub (inputs A, B; output S): S = A-B (signed)
mul (inputs A, B; output P): P = A*B (unsigned)
abs (input A; output Q): Q = |A| (unsigned)
upcnt (inputs clr, inc; output Q): clk^ and clr=1: Q=0; clk^ and inc=1: Q=Q+1; else Q stays same
RF (inputs W_d, W_a, W_e, R_a, R_e; output R_d): clk^ and W_e=1: RF[W_a] = W_d; R_e=1: R_d = RF[R_a]
RTL Design Involving Register File or Memory
HLSM array: Ordered list of items
Ex: Local storage: A[4](8-bit): four 8-bit items
Accessed using notation "A[i]", i is index
A[0] := 9; A[1] := 8; A[2] := 7; A[3] := 22
Array contents now: <9, 8, 7, 22>
X := A[1] will set X to 8
Note: First element's index is 0
Array can be mapped to instantiated register file or memory
Simple Array Example
(a) HLSM
ArrayEx
Inputs: (none)
Outputs: P (11 bits)
Local storage: A[4](11 bits), Preg (11 bits)
Init1: Preg := 0, A[0] := 9
Init2: A[1] := 12; transition conditions (A[0] == 8)' and A[0] == 8 (to Out1)
Out1: Preg := A[1]
(b) Datapath (DP)
Register file A: RF[4](11) with write inputs W_d, W_a (A_Wa1, A_Wa0), W_e (A_We) and read inputs R_a (A_Ra1, A_Ra0), R_e (A_Re); mux Amux (select A_s) chooses the 11-bit constant written to W_d; comparator Acmp compares R_d against 8 and its eq output is A_eq_8; register Preg (Preg_clr, Preg_ld) loads R_d and drives P
(c) Controller
ArrayEx
Inputs: A_eq_8
Outputs: A_s, A_Wa0, ...
Init1: Preg_clr = 1, A_s = 0, A_Wa1=0, A_Wa0=0, A_We = 1
Init2: A_s = 1, A_Wa1=0, A_Wa0=1, A_We = 1, A_Ra1=0, A_Ra0=0, A_Re = 1; transition conditions (A_eq_8)' and A_eq_8
Out1: Preg_ld = 1
RTL Example: Video Compression, Sum of Absolute Differences
Video is a series of frames (e.g., 30 per second)
Most frames similar to previous frame
Compression idea: just send difference from previous frame
[Figure (a): digitized frame 1 (1 Mbyte) and digitized frame 2 (1 Mbyte); only difference: ball moving. Figure (b): digitized frame 1 (1 Mbyte) and difference of 2 from 1 (0.01 Mbyte); just send difference]
RTL Example: Video Compression, Sum of Absolute Differences
Need to quickly determine whether two frames are similar enough to just send difference for second frame
Compare corresponding 16x16 blocks
Treat 16x16 block as 256-byte array
Compute the absolute value of the difference of each array item
Sum those differences: if above a threshold, send complete frame for second frame; if below, can use difference method (using another technique, not described)
Each is a pixel, assume represented as 1 byte (actually, a color picture might have 3 bytes per pixel, for intensity of red, green, and blue components of pixel)
Array Example: Video Compression, Sum-of-Absolute Differences
(a) SAD block with inputs A: RF[256](8), B: RF[256](8), go and output sad
(b) HLSM
Inputs: A, B [256](8 bits); go (bit)
Outputs: sad (32 bits)
Local storage: sum, sadreg (32 bits); i (9 bits)
S0: wait for go (!go: stay in S0; go: go to S1)
S1: initialize sum and index (sum := 0, i := 0)
S2: check if done (i<256: go to S3; (i<256)': go to S4)
S3: add difference to sum, increment index (sum:=sum+abs(A[i]-B[i]), i := i + 1)
S4: done, write to output sadreg (sadreg := sum)
Array Example: Video Compression, Sum-of-Absolute Differences
Inputs: A, B [256](8 bits); go (bit)
Outputs: sad (32 bits)
Local storage: sum, sadreg (32 bits); i (9 bits)
Controller FSM: each HLSM data operation is replaced by control signals
S0: go': stay in S0; go: go to S1
S1: sum=0 -> sum_clr=1; i=0 -> i_clr=1
S2: i<256 -> i_lt_256: go to S3; (i<256)' -> (i_lt_256)': go to S4
S3: sum=sum+abs(A[i]-B[i]) -> sum_ld=1, AB_rd=1; i=i+1 -> i_inc=1
S4: sadreg = sum -> sadreg_ld=1
Datapath: counter i (i_inc, i_clr) drives AB_addr and a comparator against 256 whose lt output is i_lt_256; A_data and B_data (8 bits each, read with AB_rd) pass through the difference and abs components and are accumulated into sum (32 bits; sum_ld, sum_clr); sadreg (32 bits; sadreg_ld, sadreg_clr) drives sad
Circuit vs. Microprocessor
Circuit: Two states (S2 & S3) for each i, so 256 values of i take 512 clock cycles
Microprocessor: Loop (for i = 1 to 256), but for each i, must move memory to local registers, subtract, compute absolute value, add to sum, increment i: say 6 cycles per array item, so 256*6 = 1536 cycles
Circuit is about 3 times faster (assuming equal cycle lengths)
Later, we'll see how to build SAD circuit that is much faster
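The cycle counts quoted for the SAD circuit can be reproduced by stepping through states S1 to S4 of the HLSM. This is a sketch only: the function name `sad_hlsm` is hypothetical, state S0 (waiting for go) is skipped, and the 6-cycles-per-item microprocessor figure is the slide's own estimate rather than a measurement.

```python
def sad_hlsm(a, b):
    """Run the SAD HLSM from S1; returns (sad, cycles spent in S2/S3 per item)."""
    state, total, i, loop_cycles = "S1", 0, 0, 0
    while True:
        if state == "S1":
            total, i, state = 0, 0, "S2"               # sum := 0, i := 0
        elif state == "S2":
            if i < len(a):
                loop_cycles += 1                       # one S2 cycle for this item
                state = "S3"
            else:
                state = "S4"
        elif state == "S3":
            loop_cycles += 1                           # one S3 cycle for this item
            total, i = total + abs(a[i] - b[i]), i + 1
            state = "S2"
        else:                                          # S4: sadreg := sum
            return total, loop_cycles


block_a = list(range(256))
block_b = [0] * 256
sad, circuit_cycles = sad_hlsm(block_a, block_b)
micro_cycles = 6 * len(block_a)                        # slide's estimate
print(sad, circuit_cycles, micro_cycles, micro_cycles / circuit_cycles)
```

The count deliberately excludes the single S1, final S2, and S4 cycles, which is why it matches the 512 figure rather than the slightly larger end-to-end total.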
Common RTL Design Pitfall Involving Storage Updates
Local storage: R, Q (8 bits)
A: R:=99, Q:=R
B: R:=R+1; R<100: go to C; (R<100)': go to D
Questions
Value of Q after state A?
Final state is C or D?
Answers
Q is NOT 99 after state A
R is 99 in state B (the increment to 100 takes effect only on the next clock edge), so final state is C
Storage update actions in state occur simultaneously on next clock edge
Thus, order actions are written is irrelevant
A's actions same if:
Q:=R R:=99
or
R:=99 Q:=R
[Timing diagram: clk, R, and Q across the states]
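The simultaneous-update rule maps directly onto tuple assignment, where every right-hand side is evaluated before any target is written. The sketch below is illustrative; the function name `run_pitfall` and the `r0`, `q0` power-up values are hypothetical.

```python
def run_pitfall(r0=0, q0=0):
    """Walk states A and B; returns (Q after A, R after B, final state)."""
    r, q = r0, q0                  # arbitrary contents before state A
    # State A: R := 99 and Q := R both read the OLD R, then update together.
    r, q = 99, r
    q_after_a = q
    # State B: the transition condition is evaluated with R before
    # R := R + 1 takes effect on the next clock edge.
    final_state = "C" if r < 100 else "D"
    r = r + 1
    return q_after_a, r, final_state


print(run_pitfall(r0=7))
```

Swapping the two actions of state A (`q, r = r, 99`) gives the same result, which is the slide's point that the written order is irrelevant.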
Common RTL Design Pitfall Involving Storage Updates
New HLSM using extra state so read of R occurs after write of R
Local storage : R, Q (8 bits)
R<100
A
R:=99
Q:=R
R:=R+1
Q:=R
B2
(R<100)'
R<100 (R<100)'
clk
B2
99
?
100
99
100
100
99
99
35
RTL Design Involving a Timer
Commonly need explicit time intervals
Ex: Repeatedly blink LED on 1 second, off 1 second
Pre-instantiate timer that HLSM can then use
(a) Pre-instantiated timer: 32-bit 1-microsec timer T with inputs T_M (M, 32 bits), T_ld (load), T_en (enable) and output T_Q (Q)
(b) HLSM making use of timer
BlinkLed
Outputs: L (bit)
Timer: T
Init: L:='0', T:=1000000, T_en:='0'
Off: L:='0', T_en:='1'; T_Q': stay in Off; T_Q: go to On
On: L:='1', T_en:='1'; T_Q': stay in On; T_Q: go to Off
Button Debouncing
Press button
Ideally, output changes to 1
Actually, output bounces
Due to mechanical reasons
Like ball bouncing when dropped to floor
Digital circuit can convert actual signal closer to ideal signal
[Figure: button signal B, ideal waveform vs. actual waveform with bounce]
ButtonDebouncer
Inputs: Bin (bit) Outputs: Bout (bit)
Timer: T
Init: Bout:='0', T:=20000, T_en:='0'
WaitBin: Bout:='0', T_en:='0'; Bin': stay in WaitBin; Bin: go to Wait20
Wait20: Bout:='1', T_en:='1'; T_Q': stay in Wait20; T_Q: go to WhileBin
WhileBin: Bout:='1', T_en:='0'; Bin: stay in WhileBin; Bin': go to WaitBin
Data Dominated RTL Design Example
Data dominated design: Extensive DP, simple controller
Control dominated design: Complex controller, simple DP
Example: Filter
Converts digital input stream to new digital output stream
Ex: Remove noise
180, 180, 181, 180, 240, 180, 181
240 is probably noise, filter might replace by 181
Simple filter: Output average of last N values
Small N: less filtering
Large N: more filtering, but less sharp output
[Figure: digital filter block with 12-bit input, 12-bit output, and clk]
Data Dominated RTL Design Example: FIR Filter
FIR filter
Finite Impulse Response
Simply a configurable weighted sum of past input values
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
Above known as "3 tap"
Tens of taps more common
Very general filter: User sets the constants (c0, c1, c2) to define specific filter
RTL design
Step 1: Create HLSM
Very simple states/transitions
FIR filter
Inputs: X (12 bits) Outputs: Y (12 bits)
Local storage: xt0, xt1, xt2, c0, c1, c2 (12 bits); Yreg (12 bits)
Init: Yreg := 0, xt0 := 0, xt1 := 0, xt2 := 0, c0 := 3, c1 := 2, c2 := 2
FC: Yreg := c0*xt0 + c1*xt1 + c2*xt2, xt0 := X, xt1 := xt0, xt2 := xt1
Assume constants set to 3, 2, and 2
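The FC state of the FIR HLSM updates Yreg and shifts xt0, xt1, xt2 on the same clock edge, each using the old values. A minimal sketch of that behavior follows; the function name `fir3` is hypothetical, and 12-bit overflow is not modeled.

```python
def fir3(samples, c=(3, 2, 2)):
    """Simulate the FC state of the 3-tap FIR HLSM, one sample per clock cycle.

    Returns the value on output Y (the contents of Yreg) during each cycle.
    """
    xt0 = xt1 = xt2 = 0            # cleared in Init
    yreg = 0
    outputs = []
    for x in samples:
        outputs.append(yreg)
        # All four registers load together from the OLD register values.
        yreg, xt0, xt1, xt2 = (
            c[0] * xt0 + c[1] * xt1 + c[2] * xt2,
            x,
            xt0,
            xt1,
        )
    return outputs


print(fir3([1, 0, 0, 0, 0]))
```

Feeding a single 1 followed by zeros makes the constants 3, 2, 2 appear on Y in order, delayed by the two register stages (xt0 and Yreg) between X and Y.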
FIR Filter
Step 2A: Create datapath
Step 2B: Connect Ctrlr/DP (as earlier examples)
Step 2C: Derive FSM
Set clr and ld lines appropriately
[Figure: Datapath for 3-tap FIR filter. Registers xt0, xt1, xt2 hold x(t), x(t-1), x(t-2); registers c0, c1, c2 hold the constants; multipliers (*) form the products that are summed into Yreg (Yreg_clr, Yreg_ld), which drives the 12-bit output Y; control lines include xt0_clr, xt0_ld, c0_ld, c1_ld, c2_ld, ...]
Circuit vs. Microprocessor
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
Comparing the FIR circuit to microprocessor instructions
Microprocessor
100-tap filter: 100 multiplications, 100 additions. Say 2 instructions
per multiplication, 2 per addition. Say 10 ns per instruction.
(100*2 + 100*2)*10 = 4000 ns
Circuit
Assume adder has 2 ns delay, multiplier has 20 ns delay
Longest path goes through one multiplier and two adders
20 + 2 + 2 = 24 ns delay
100-tap filter, following design on previous slide, would have about a
34 ns delay: 1 multiplier and 7 adders on longest path
Circuit is more than 100 times faster (4000/34). Wow.
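The arithmetic behind this comparison is easy to re-run with other assumptions. The helper names `microprocessor_ns` and `circuit_ns` below are hypothetical, and the default delays are simply the figures assumed on this slide.

```python
def microprocessor_ns(taps, instr_per_mul=2, instr_per_add=2, ns_per_instr=10):
    """Time for one filter output when every operation runs as instructions."""
    return (taps * instr_per_mul + taps * instr_per_add) * ns_per_instr


def circuit_ns(multipliers_on_path, adders_on_path, mul_ns=20, add_ns=2):
    """Delay of the longest path through the custom datapath."""
    return multipliers_on_path * mul_ns + adders_on_path * add_ns


software = microprocessor_ns(100)      # 100-tap filter on a microprocessor
three_tap = circuit_ns(1, 2)           # 1 multiplier, 2 adders
hundred_tap = circuit_ns(1, 7)         # 1 multiplier, 7 adders
print(software, three_tap, hundred_tap, round(software / hundred_tap, 1))
```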
5.5
Determining Clock Frequency
Designers of digital circuits often want fastest performance
Means want high clock frequency
Frequency limited by longest register-to-register delay
Known as critical path
If clock is any faster, incorrect data may be stored into register
Longest path on right is 2 ns
Ignoring wire delays, and register setup and hold times, for simplicity
[Figure: registers connected through an adder (+) with 2 ns delay, clocked by clk]
Critical Path
Example shows four paths
a to c through +: 2 ns
a to d through + and *: 7 ns
b to d through + and *: 7 ns
b to d through *: 5 ns
Longest path is thus 7 ns: Max(2,7,7,5) = 7 ns
Fastest frequency: 1 / 7 ns = 142 MHz
[Figure: adder (+) with 2 ns delay and multiplier (*) with 5 ns delay between registers]
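The critical-path calculation is a maximum over path delays followed by a reciprocal. A small sketch, with the hypothetical helper `max_clock_mhz` and the four path delays taken from this slide:

```python
def max_clock_mhz(path_delays_ns):
    """Return (critical path in ns, fastest clock frequency in MHz)."""
    critical_ns = max(path_delays_ns.values())
    return critical_ns, 1000.0 / critical_ns   # 1 / ns equals 1000 MHz


paths = {
    "a to c through +": 2,
    "a to d through + and *": 7,
    "b to d through + and *": 7,
    "b to d through *": 5,
}
critical_ns, mhz = max_clock_mhz(paths)
print(critical_ns, int(mhz))
```

Truncating rather than rounding the frequency is the safe direction, since a clock faster than 1 / 7 ns would violate the critical path.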
Critical Path Considering Wire Delays
Real wires have delay too
Must include in critical path
Example shows two paths
Each is 0.5 + 2 + 0.5 = 3 ns
Trend
1980s/1990s: Wire delays were tiny compared to logic delays
But wire delays not shrinking as fast as logic delays
Wire delays may even be greater than logic delays!
Must also consider register setup and hold times, also add to path
Then add some time to the computed path, just to be safe
e.g., if path is 3 ns, say 4 ns instead
A Circuit May Have Numerous Paths
Paths can exist
In the datapath
In the controller
Between the controller and datapath
May be hundreds or thousands of paths
Timing analysis tools that evaluate all possible paths automatically very helpful
[Figure: soda dispenser controller (combinational logic, state register with s1, s0 and next-state n1, n0) and datapath (tot register with tot_ld, tot_clr; 8-bit adder; 8-bit < comparator producing tot_lt_s), with example paths marked (a), (b), (c)]
5.6
Behavioral Level Design: C to Gates
Earlier sum-of-absolute-differences example
Started with high-level state machine
C code is an even better starting point: easier to understand
Inputs: A, B [256](8 bits); go (bit)
Outputs: sad (32 bits)
Local storage: sum, sadreg (32 bits); i (9 bits)
S0: !go: stay in S0; go: go to S1
S1: sum := 0, i := 0
S2: i<256: go to S3; (i<256)': go to S4
S3: sum:=sum+abs(A[i]-B[i]), i := i + 1
S4: sadreg := sum
C code
int SAD (byte A[256], byte B[256]) // not quite C syntax
{
   uint sum; short uint i;
   sum = 0;
   i = 0;
   while (i < 256) {
      sum = sum + abs(A[i] - B[i]);
      i = i + 1;
   }
   return sum;
}
Converting from C to High-Level State Machine
Convert each C construct to equivalent states and transitions
Assignment statement
Becomes one state with assignment
target = expression; becomes a state with action target := expression
If-then statement
Becomes state with condition check, transitioning to "then" statements if condition true, otherwise to ending state
"then" statements would also be converted to states
if (cond) {
   // then stmts
}
becomes a condition state; cond: go to (then stmts), then to (end); cond': go to (end)
Converting from C to High-Level State Machine
If-then-else
Becomes state with condition check, transitioning to "then" statements if condition true, or to "else" statements if condition false
if (cond) {
   // then stmts
}
else {
   // else stmts
}
becomes a condition state; cond: go to (then stmts); cond': go to (else stmts); both continue to (end)
While loop statement
Becomes state with condition check, transitioning to while loop's statements if true, then transitioning back to condition check
while (cond) {
   // while stmts
}
becomes a condition state; cond: go to (while stmts), then back to the condition state; cond': go to (end)
Simple Example of Converting from C to High-Level State Machine
Simple example: Computing the maximum of two numbers
Convert if-then-else statement to states (b)
Then convert assignment statements to states (c)
(a) C code
Inputs: uint X, Y
Outputs: uint Max
if (X > Y) {
   Max = X;
}
else {
   Max = Y;
}
(b) Condition state; X>Y: go to (then stmts); (X>Y)': go to (else stmts); both continue to (end)
(c) Condition state; X>Y: go to state Max:=X; (X>Y)': go to state Max:=Y; both continue to (end)
Example: SAD C code to HLSM
Convert each construct to states
Simplify, e.g., merge states
RTL design process to convert to circuit
Can thus convert C to circuit using straightforward process
Actually, subset of C (not all C constructs easily convertible)
Can use language other than C
Inputs: byte A[256],B[256]
bit go;
Output: int sad
main()
{
   uint sum; short uint i;
   while (1) {
      while (!go);
      sum = 0;
      i = 0;
      while (i < 256) {
         sum = sum + abs(A[i] - B[i]);
         i = i + 1;
      }
      sad = sum;
   }
}
[Figure (a)-(g): step-by-step conversion of the C code into states: the wait state for while (!go) with transitions go' and go, states for sum:=0 and i:=0, the inner loop with conditions i<256 and (i<256)' around sum:=sum + abs... and i := i + 1, and the final state sadreg := sum]
5.7
Memory Components
RTL design instantiates datapath components to create datapath, controlled by a controller
Some components are used outside the controller and DP
MxN memory
M words, N bits wide each
Several varieties of memory, which we now introduce
[Figure: MxN memory drawn as M words, each N bits wide]
Random Access Memory (RAM)
RAM: Readable and writable memory
"Random access memory"
Strange name: Created several decades ago to contrast with sequentially-accessed storage like tape drives
Logically same as register file: Memory with address inputs, data inputs/outputs, and control
RAM usually one port; RF usually two or more
RAM vs. RF
RAM typically larger than about 512 or 1024 words
RAM typically stores bits using a bit storage approach that is more efficient than a flip-flop
RAM typically implemented on a chip in a square rather than rectangular shape: keeps longest wires (hence delay) short
[Figure: register file from Chpt. 4: 16x32 register file with W_data (32), W_addr (4), W_en, R_data (32), R_addr (4), R_en]
[Figure: RAM block symbol: 1024x32 RAM with data (32), addr (10), rw, en]
RAM Internal Structure
Similar internal structure as register file
Decoder enables appropriate word based on address inputs
rw controls whether cell is written or read
rd and wr data lines typically combined
Let's see what's inside each RAM cell
[Figure: internal structure of the 1024x32 RAM. Let A = log2M; an AxM decoder takes addr0 ... addr(A-1) on inputs a0 ... a(A-1) and drives word enables d0 ... d(M-1); each word is a row of bit storage blocks (aka cells) with word enable, rw (to all cells), and data connections; separate wdata(N-1) ... wdata0 and rdata(N-1) ... rdata0 lines are combined into data(N-1) ... data0]
Static RAM (SRAM)
Static RAM cell
6 transistors (recall inverter is 2 transistors)
Writing this cell
word enable input comes from decoder
When 0, value d loops around inverters
That loop is where a bit stays stored
When 1, the data bit value enters the loop
data is the bit to be stored in this cell
data' enters on other side
Example shows a 1 being written into cell
[Figure: SRAM cell inside the 1024x32 RAM, with word enable, data and data' lines, and stored value d; second view shows word enable at 1 and data = 1 being written]
Static RAM (SRAM)
Static RAM cell
Reading this cell
Somewhat trickier
When rw set to read, the RAM logic sets both data and data' to 1
The stored bit d will pull either the left line or the right line down slightly below 1
Sense amplifiers detect which side is slightly pulled down
The electrical description of SRAM is really beyond our scope: just general idea here, mainly to contrast with DRAM...
[Figure: SRAM cell during a read: word enable active, data and data' both set to 1, one side pulled to <1, both lines going to sense amplifiers]
Dynamic RAM (DRAM)
Dynamic RAM cell
1 transistor (rather than 6)
Relies on large capacitor to store bit
Write: Transistor conducts, data voltage level gets stored on top plate of capacitor
Read: Just look at value of d
Problem: Capacitor discharges over time
Must refresh regularly, by reading d and then writing it right back
[Figure (a): DRAM cell with data line, word enable, and a capacitor slowly discharging. Figure (b): waveforms of data, enable, and d, showing d discharging]
Comparing Memory Types
Register file
Fastest
But biggest size
SRAM
Fast
More compact than register file
DRAM
Slowest
And refreshing takes time
But very compact
Use register file for small items, SRAM for large items, and DRAM for huge items
Note: DRAM's big capacitor requires a special chip design process, so DRAM is often a separate chip
[Figure: size comparison for same number of bits (not to scale) of an MxN memory implemented as a register file, SRAM, and DRAM]
Reading and Writing a RAM
Writing
Put address on addr lines, data on data lines, set rw=1, en=1
Reading
Set addr and en lines, but put nothing (Z) on data lines, set rw=0
Data will appear on data lines
Don't forget to obey setup and hold times
In short: keep inputs stable before and after a clock edge
[Timing diagram: clk, addr, data, rw (1 means write), en. Writes store 500 and 999, after which RAM[9] equals 500 and RAM[13] equals 999; during a read the data lines are left at Z and 500 appears after the access time; setup and hold times are marked around the clock edges]
RAM Example: Digital Sound Recorder
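[Figure: digital sound recorder. A microphone feeds an analog-to-digital converter (controls ad_ld, ad_buf), which shares a 16-bit data bus with a 4096x16 RAM and a digital-to-analog converter (control da_ld) that drives a speaker. The processor drives the RAM's 12-bit address Ra and the controls Rrw and Ren.]
Behavior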
Record: Digitize sound, store as series of 4096 12-bit digital values in RAM
We'll use a 4096x16 RAM (12-bit-wide RAM not common)
Play back later
Common behavior in telephone answering machine, toys, voice recorders
To record, processor should read a-to-d, store read values into
successive RAM words
To play, processor should read successive RAM words and enable d-to-a
RAM Example: Digital Sound Recorder
RTL design of processor
Create HLSM
Begin with the record behavior
Create local storage a
Stores current address,
ranges from 0 to 4095 (thus
need 12 bits)
Create state machine that
counts from 0 to 4095 using a
For each a
Read analog-to-digital conv.
ad_ld:=1, ad_buf:=1
Write to RAM at address a
Rareg:=a, Rrw:=1,
Ren:=1
[Figure: sound recorder block diagram with the record behavior HLSM. Local registers: a, Rareg (12 bits). State S: a:=0. State T: ad_ld:=1, ad_buf:=1, Rareg:=a, Rrw:=1, Ren:=1. State U: a:=a+1. The T/U loop repeats while a<4095 and exits when (a<4095)'.]
RAM Example: Digital Sound Recorder
Now create play behavior
Use local register a again,
create state machine that
counts from 0 to 4095 again
For each a
Read RAM
Write to digital-to-analog conv.
Note: Must write d-to-a one
cycle after reading RAM, when
the read data is available on
the data bus
The record and play state
machines would be parts of a
larger state machine controlled
by signals that determine when
to record or play
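The record and play behaviors can be sketched as two software loops. This is a loose illustration rather than a cycle-accurate HLSM: `record`, `play`, and the list-based RAM and converter stand-ins are assumptions made for the example.

```python
def record(ram, adc_samples):
    """Record loop: for each a, read the a-to-d converter and write RAM[a]."""
    a = 0
    while True:
        # ad_ld:=1, ad_buf:=1 put the sample on the bus; Rrw:=1, Ren:=1 store it.
        ram[a] = adc_samples[a] & 0xFFFF
        if not a < 4095:
            break
        a += 1


def play(ram):
    """Play loop: read RAM[a], then load the d-to-a converter one cycle later."""
    dac_out = []
    a = 0
    while True:
        data_bus = ram[a]          # first cycle: Rareg:=a, Rrw:=0, Ren:=1
        dac_out.append(data_bus)   # next cycle: da_ld:=1, data now on the bus
        if not a < 4095:
            break
        a += 1
    return dac_out


ram = [0] * 4096
samples = [(i * 7) % 4096 for i in range(4096)]  # arbitrary 12-bit test values
record(ram, samples)
played = play(ram)
```

Both loops visit addresses 0 through 4095, so playing back returns the recorded samples in order.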
[Figure: sound recorder block diagram with the 16-bit data bus labeled, and the play behavior HLSM. Local registers: a, Rareg (12 bits). State V: a:=0. State W: ad_buf:=0, Rareg:=a, Rrw:=0, Ren:=1. State X: da_ld:=1, a:=a+1. The W/X loop repeats while a<4095 and exits when (a<4095)'.]
Read-Only Memory (ROM)
Memory that can only be read from, not written to
Data lines are output only
No need for rw input
[Figure: RAM block symbol (1024x32 RAM with 32-bit data, 10-bit addr, rw, en) beside ROM block symbol (1024x32 ROM with 32-bit data, 10-bit addr, en, and no rw)]
Advantages over RAM
Compact: May be smaller
Nonvolatile: Saves bits even if power supply is turned off
Speed: May be faster (especially than DRAM)
Low power: Doesn't need power supply to save bits, so can extend battery life
Choose ROM over RAM if stored data won't change (or won't change often)
For example, a table of Celsius to Fahrenheit
conversions in a digital thermometer
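A lookup-table ROM like the thermometer example might be modeled as below. The table size, the rounding, and the names `C_TO_F_ROM` and `rom_read` are assumptions for illustration only.

```python
# Hypothetical ROM contents: word c holds the Fahrenheit value for c degrees
# Celsius. A real part would have these words fixed when the ROM is programmed.
ROM_WORDS = 128  # illustrative size, not from the slides
C_TO_F_ROM = tuple(round(c * 9 / 5 + 32) for c in range(ROM_WORDS))


def rom_read(addr, en=1):
    """Return the stored word when enabled; there is no rw input or write path."""
    if not en:
        return None
    return C_TO_F_ROM[addr]
```

Using an immutable tuple mirrors the read-only nature of the component: the address selects a word and nothing in the model can change it.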
Read-Only Memory (ROM)
[Figure: ROM block symbol (1024x32 ROM with 32-bit data, 10-bit addr, en) and internal structure. Let A = log2(M). An AxM decoder (inputs addr0 to addr(A-1), outputs d0 to d(M-1), enabled by en) drives one word enable line per word; each word is a row of bit storage blocks (aka cells) whose data outputs form rdata(N-1) to rdata0.]
Internal logical structure similar to RAM, without the data
input lines
ROM Types
If a ROM can only be read, how
are the stored bits stored in the
first place?
Storing bits in a ROM is known as programming
Several methods
Mask-programmed ROM
Bits are hardwired as 0s or 1s
during chip manufacturing
2-bit word on right stores "10"
word enable (from decoder) simply passes the hardwired value through transistor
[Figure: two mask-programmed cells on one word enable line, each passing its hardwired value to its own data line]
Notice how compact, and fast, this
memory would be
ROM Types
Fuse-Based Programmable
ROM
Each cell has a fuse
A special device, known as a
programmer, blows certain fuses
(using higher-than-normal voltage)
Those cells will be read as 0s
(involving some special electronics)
Cells with unblown fuses will be read
as 1s
2-bit word on right stores "10"
Also known as One-Time
Programmable (OTP) ROM
[Figure: two fuse-based cells on one word enable line, each with its own data line; one cell has an intact fuse and the other a blown fuse]
ROM Types
Erasable Programmable ROM (EPROM)
Uses floating-gate transistor in each cell
Special programmer device uses higher-than-normal voltage to cause electrons to tunnel into the gate
Electrons become trapped in the gate
Only done for cells that should store 0
Other cells (without electrons trapped in gate) will be 1
2-bit word on right stores "10"
Details beyond our scope; just the general idea is necessary here
To erase, shine ultraviolet light onto chip
Gives trapped electrons energy to escape
Requires chip package to have window
[Figure: two floating-gate-transistor cells on one word enable line, each with its own data line; one cell has trapped electrons (e-) in its gate]
ROM Types
Electrically-Erasable Programmable ROM (EEPROM)
Similar to EPROM
Uses floating-gate transistor, electronic programming to
trap electrons in certain cells
But erasing done electronically, not using UV light
Erasing done one word at a time
Flash memory
Like EEPROM, but all words (or large blocks of
words) can be erased simultaneously
Became very common starting in late 1990s
Both types are in-system programmable
Can be programmed with new stored bits while in the
system in which the ROM operates
Requires bi-directional data lines, and write control input
Also need busy output to indicate that erasing is in progress; erasing takes some time
[Figure: 1024x32 EEPROM block symbol with 32-bit data, 10-bit addr, en, write, and busy]
ROM Example: Talking Doll
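[Figure: talking doll. A 4096x16 ROM stores the "Hello there!" audio, divided into 4096 samples. The processor reads v from a vibration sensor, drives the ROM's Ra and Ren, and drives da_ld on a digital-to-analog converter, which takes the ROM's 16-bit data and feeds the speaker.]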
Doll plays prerecorded message, triggered by vibration
Message must be stored without power supply: use a ROM, not a RAM, because ROM is nonvolatile
And because message will never change, may use a mask-programmed ROM or
OTP ROM
Processor should wait for vibration (v=1), then read words 0 to 4095 from
the ROM, writing each to the d-to-a
ROM Example: Talking Doll
[Figure: talking doll block diagram with its HLSM. Local registers: a, Rareg (12 bits). Initial state: a:=0; waits while v', proceeds when v. State T: Rareg:=a, Ren:=1. State U: da_ld:=1, a:=a+1. The T/U loop repeats while a<4095 and exits when (a<4095)'.]
Create state machine that waits for v=1, and then counts from 0 to
4095 using a local storage a
For each a, read ROM, write to digital-to-analog converter
ROM Example: Digital Telephone Answering Machine
Using a Flash Memory
Want to record the outgoing
announcement
What type of memory?
Should store without power supply: ROM, not RAM
Should be in-system programmable: EEPROM or Flash, not EPROM, OTP ROM, or mask-programmed ROM
Will always erase entire memory when reprogramming: Flash better than EEPROM
When rec=1, record digitized sound in locations 0 to 4095
When play=1, play those stored sounds to digital-to-analog converter
[Figure: answering machine. A microphone ("We're not home.") feeds an analog-to-digital converter (ad_ld, ad_buf) on a 16-bit data bus shared with a 4096x16 Flash (data, addr, rw, en, erase, busy) and a digital-to-analog converter (da_ld) that drives the speaker. The processor has inputs rec (record button), play, and bu, and outputs the 12-bit Ra, Rrw, Ren, and er.]
ROM Example: Digital Telephone Answering Machine
Using a Flash Memory
HLSM
Once rec=1, begin
erasing flash by setting
er=1
Wait for flash to finish
erasing by waiting for
bu=0
Execute loop that sets local register a from 0 to 4095, reading analog-to-digital converter and writing to flash for each a
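The erase, wait, and record sequence above can be sketched as follows. The `Flash` class, its three-cycle erase time, and the all-ones erased value are assumptions for illustration, not details from the slides.

```python
class Flash:
    """Toy flash: erase takes several cycles, during which busy is 1."""

    ERASE_CYCLES = 3  # made-up duration

    def __init__(self, words):
        self.mem = [0xFFFF] * words   # assumed erased value
        self.busy_left = 0

    @property
    def busy(self):
        return 1 if self.busy_left > 0 else 0

    def start_erase(self):
        # Flash erases all words (or a large block) at once.
        self.mem = [0xFFFF] * len(self.mem)
        self.busy_left = self.ERASE_CYCLES

    def tick(self):
        if self.busy_left > 0:
            self.busy_left -= 1

    def write(self, addr, data):
        assert not self.busy, "cannot write while erasing"
        self.mem[addr] = data


def record_announcement(flash, adc_samples):
    """Erase, wait for busy to drop, then write samples to words 0 to 4095."""
    flash.start_erase()      # er:=1
    waited = 0
    while flash.busy:        # wait for bu=0
        flash.tick()
        waited += 1
    a = 0
    while a < 4096:          # a must be able to reach 4096, hence 13 bits
        flash.write(a, adc_samples[a])
        a += 1
    return waited
```

The loop test `a < 4096` shows why the local register needs 13 bits here: a must hold the value 4096 for the loop to terminate.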
[Figure: answering machine block diagram with the record HLSM, states S, T, U, V. Local registers: a, Rareg (13 bits). Actions shown: a:=0, er:=1, er:=0, and the record actions ad_ld:=1, ad_buf:=1, Rareg:=a, Rrw:=1, Ren:=1, a:=a+1. Transition conditions use rec, bu, and a<4096.]
Blurring of Distinction Between ROM and RAM
We said that
RAM is readable and writable
ROM is read-only
[Figure: ROM, Flash, EEPROM, NVRAM, and RAM categories]
But some ROMs act almost like RAMs
EEPROM and Flash are in-system programmable
Essentially means that writes are slow
Also, number of writes may be limited (perhaps a few million times)
And, some RAMs act almost like ROMs
Non-volatile RAMs: Can save their data without the power supply
One type: Built-in battery, may work for up to 10 years
Another type: Includes ROM backup for RAM; controller writes RAM contents to ROM before turning off
New memory technologies evolving that merge RAM and ROM benefits
e.g., MRAM
Bottom line
Lots of choices available to designer; must find best fit with design goals
5.8
Queues (FIFOs)
A queue is another component
sometimes used during RTL
design
Queue: A list written to at the back and read from the front
Like a list of waiting restaurant customers
Writing called a push, reading called a pop
Because first item written into a queue will be the first item read out, also called a FIFO (first-in first-out)
[Figure: a queue with back and front; write items to the back of the queue, read (and remove) items from the front of the queue]
Queues
Queue has addresses, and two pointers: rear and front
Initially both point to 0
Push (write)
Item written to address pointed to by rear
rear incremented
Pop (read)
Item read from address pointed to by front
front incremented
If front or rear reaches 7, next (incremented) value should be 0 (for a queue with addresses 0 to 7)
[Figure: queue with addresses 0 to 7; the rear (r) and front (f) pointers both start at 0 and advance with successive pushes and pops]
Queues
Treat memory as a circle
If front or rear reaches 7, next (incremented)
value should be 0 rather than 8 (for a queue
with addresses 0 to 7)
Two conditions of interest
Full queue: no room for more items
In 8-entry queue, means 8 items present
No further pushes allowed until a pop occurs
Causes front=rear
Empty queue: no items
No pops allowed until a push occurs
Causes front=rear
Both conditions have front=rear
To detect whether front=rear means full or
empty, need state machine that detects if
previous operation was push or pop, sets full
or empty output signal (respectively)
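The circular pointers and the full/empty distinction can be sketched in software. `Queue8` is an illustrative name, and a single flag recording the previous operation stands in for the small state machine mentioned above; raising exceptions on misuse is a choice made for the example.

```python
class Queue8:
    """Circular queue with addresses 0 to 7, tracking the previous operation."""

    SIZE = 8

    def __init__(self):
        self.mem = [None] * self.SIZE
        self.front = 0
        self.rear = 0
        self.last_was_push = False   # initially empty

    @property
    def full(self):
        # front=rear right after a push means the queue just filled up.
        return self.front == self.rear and self.last_was_push

    @property
    def empty(self):
        # front=rear right after a pop (or at reset) means nothing is left.
        return self.front == self.rear and not self.last_was_push

    def push(self, item):
        if self.full:
            raise OverflowError("push on a full queue")
        self.mem[self.rear] = item
        self.rear = (self.rear + 1) % self.SIZE   # 7 wraps around to 0
        self.last_was_push = True

    def pop(self):
        if self.empty:
            raise IndexError("pop on an empty queue")
        item = self.mem[self.front]
        self.front = (self.front + 1) % self.SIZE
        self.last_was_push = False
        return item


q = Queue8()
for x in [9, 5, 8, 5, 7, 2, 3]:
    q.push(x)
first = q.pop()    # the first item pushed comes out first
q.push(6)          # rear wraps from 7 to 0
q.push(3)          # rear catches up with front: queue is full
```

In this run front and rear both end at address 1, and only the remembered previous operation (a push) tells the model that the queue is full rather than empty.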
Queue Implementation
Can use register file for item storage
Implement rear and front using up counters
rear used as register file's write address, front as read address
Simple controller would set control lines for pushes and pops, and also detect full and empty situations
FSM for controller not shown
[Figure: 8-word 16-bit queue. An 8x16 register file with 16-bit wdata and rdata; a 3-bit up counter rear (clr, inc) drives waddr and a 3-bit up counter front (clr, inc) drives raddr; a comparator (=) produces eq from the two counters; the controller takes wr, rd, reset, and eq, drives the register file's wr and rd and the counters' clr and inc, and outputs full and empty.]
Common Uses of a Queue
Computer keyboard
Pushes pressed keys onto queue, meanwhile pops and sends to
computer
Digital video recorder
Pushes captured frames, meanwhile pops frames, compresses
them, and stores them
Computer network routers
Pushes incoming packets onto queue, meanwhile pops packets,
processes destination information, and forwards each packet out
over appropriate port
Queue Usage Example
Example series of pushes
and pops
Note how rear and front
pointers move
Note that popping doesn't really remove the data from the queue, but that data is no longer accessible
Note how rear (and front)
wraps around from address 7
to 0
Note: pushing a full queue is
an error
So is popping an empty queue
Initially empty queue: rear = front = 0
1. After pushing 9, 5, 8, 5, 7, 2, 3: rear = 7, front = 0
2. After popping (data: 9): rear = 7, front = 1
3. After pushing 6: rear = 0, front = 1
4. After pushing 3: rear = 1, front = 1 (full)
5. After pushing 4: ERROR! Pushing a full queue results in unknown state.
5.9
Multiple Processors
Using multiple processors
can ease design
Keeps distinct behaviors
separate
Ex: Laser-based distance
measurer with button
debounce
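Use two processors
[Figure: the button drives input Bin of ButtonDebouncer; its output Bout feeds the laser-based distance measurer, which has output L to the laser, input S from the sensor, and a 16-bit output to the display]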
Ex: Code detector with
button press synchronizers
(BPS)
BPS processor for each
input, plus CodeDetector
processor
[Figure: inputs Start, Red, Green, Blue (si, ri, gi, bi) and ai each pass through a BPS, producing s, r, g, b, and a for the code detector, whose output u drives the door lock]
Interfacing Multiple Processors
Use signal, register, or other component outside processors
Known as global
Common methods use global...
control signal, data signal, register, register file, queue
Typically all multiple processors and clocked globals use
same clock
Synchronized
Ex: Temperature Statistics with Multiple Processors
16-bit unsigned input T from temperature sensor, 16-bit output A. Sample T
every 1 second. Compute output A every minute, should equal average of most
recent 64 samples.
Single HLSM: Complicated
Instead, two HLSMs (and hence two processors) and shared register file
Tsample HLSM: Store T into successive RF address, once per sec.
Avg HLSM: Compute and output average of all 64 RF words, once per min.
Note that each uses distinct timer
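The two behaviors and their shared register file can be sketched as below. This is a simplification under stated assumptions: the `TempStats` class steps both behaviors from one seconds counter rather than two distinct timers, and the method names are illustrative.

```python
class TempStats:
    """Two behaviors sharing a 64-word register file, stepped once per second."""

    def __init__(self):
        self.rf = [0] * 64   # shared register file (TRF)
        self.waddr = 0       # Tsample's next write address
        self.seconds = 0
        self.A = 0           # output: average of the 64 stored samples

    def tsample(self, T):
        # Store T into successive register-file addresses, wrapping after 63.
        self.rf[self.waddr] = T & 0xFFFF
        self.waddr = (self.waddr + 1) % 64

    def avg(self):
        # Sum all 64 words; dividing by 64 is a right shift by 6 bits.
        self.A = sum(self.rf) >> 6

    def one_second(self, T):
        self.tsample(T)
        self.seconds += 1
        if self.seconds % 60 == 0:   # Avg runs once per minute
            self.avg()
```

Neither behavior needs to know about the other: Tsample only writes, Avg only reads, and the register file is the sole point of contact. Because 64 is a power of two, the averaging needs no divider.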
Keeping the
sampling and
averaging
behaviors
separate leads to
simple design
[Figure: TempStats. Tsample drives the write port (W_d, W_a, W_e) of register file TRF, RF[64](16); Avg drives the read port's R_a and R_e and receives R_d.]
Ex: Digital Camera with Mult. Processors and Queue
Read and Compress processors (Ch 1)
Compress may take longer, depends on picture
Use queue, read can push additional pics (up to 8)
Likewise, use queue between Compress and Store
[Figure: Image sensor, Read circuit, Queue [8](8), Compress circuit, Queue [8](8), Store circuit, and Memory connected in a chain by 8-bit paths; each queue has wdata, wr, and full on its write side and rdata, rd, and empty on its read side]
5.10
Hierarchy: A Key Design Concept
Hierarchy
Organization with few items at the top, with each item decomposed into other items
Common example: Country
1 item at top (the country)
Country item decomposed into state/province items
Each state/province item decomposed into city items
Hierarchy helps us manage complexity
To go from transistors to gates, muxes, decoders, registers, ALUs, controllers, datapaths, memories, queues, etc.
Imagine trying to comprehend a controller and datapath at the level of gates
[Figure: hierarchy for Country A, decomposed into Provinces 1 to 3 and Cities A to G, beside a map showing just the top two levels of hierarchy]
Hierarchy and Abstraction
Abstraction
Hierarchy often involves not just
grouping items into a new item, but also
associating higher-level behavior with
the new item, known as abstraction
Ex: 8-bit adder has understandable high-level behavior: adds two 8-bit binary numbers
Frees designer from having to
remember, or even understand, the
lower-level details
[Figure: 8-bit adder block with inputs a7..a0, b7..b0, and ci, and outputs co and s7..s0]
Hierarchy and Composing Larger Components
from Smaller Versions
A common task is to compose smaller components into a larger one
Gates: Suppose you have plenty of 3-input AND gates, but need a 9-input AND gate
Can simply compose the 9-input gate from several 3-input gates
Muxes: Suppose you have 4x1 and 2x1 muxes, but need an 8x1 mux
s2 selects either top or bottom 4x1
s1s0 select particular 4x1 input
Implements 8x1 mux: 8 data inputs, 3 selects, one output
[Figure: 8x1 mux built from two 4x1 muxes (data inputs i0 to i3 and i4 to i7, both selected by s1s0) whose outputs feed a 2x1 mux selected by s2]
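The mux composition can be checked with a short functional sketch. The function names `mux2x1`, `mux4x1`, and `mux8x1` are illustrative, and the model treats each mux as a pure function of its inputs.

```python
def mux2x1(i0, i1, s0):
    """2x1 mux: s0 picks i0 or i1."""
    return i1 if s0 else i0


def mux4x1(i0, i1, i2, i3, s1, s0):
    """4x1 mux: s1s0 picks one of four inputs."""
    return (i0, i1, i2, i3)[(s1 << 1) | s0]


def mux8x1(inputs, s2, s1, s0):
    """8x1 mux: s1s0 pick an input within each 4x1, s2 picks top or bottom."""
    top = mux4x1(*inputs[0:4], s1, s0)
    bottom = mux4x1(*inputs[4:8], s1, s0)
    return mux2x1(top, bottom, s2)
```

Stepping the three select bits through all eight combinations should pass inputs 0 through 7 to the output in order.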
Hierarchy and Composing Larger Components
from Smaller Versions
Composing memory very common
Making memory words wider
Easy: just place memories side-by-side until desired width obtained
Share address/control lines, concatenate data lines
Example: Compose 1024x8 ROMs into 1024x32 ROM
[Figure: four 1024x8 ROMs sharing the 10-bit addr and en inputs; their 8-bit data outputs are concatenated into data(31..0), forming a 1024x32 ROM]
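Widening can be sketched by reading several narrow ROMs at the same address and concatenating their outputs. `make_rom` and `widen` are hypothetical helpers, and placing the first ROM in the most significant byte is an assumption.

```python
def make_rom(words):
    """Return a read function for a ROM holding the given 8-bit words."""
    contents = tuple(w & 0xFF for w in words)

    def read(addr, en=1):
        return contents[addr] if en else None

    return read


def widen(roms):
    """Place ROMs side by side: shared addr and en, data lines concatenated."""

    def read(addr, en=1):
        if not en:
            return None
        value = 0
        for rom in roms:   # roms[0] supplies the most significant byte
            value = (value << 8) | rom(addr)
        return value

    return read


# Four 1024x8 ROMs composed into one 1024x32 ROM.
rom32 = widen([make_rom([b] * 1024) for b in (0x12, 0x34, 0x56, 0x78)])
```

Every narrow ROM sees the same address and enable; only the data lines differ, which is why widening needs no extra logic.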
Hierarchy and Composing Larger Components from Smaller Versions
Creating memory with more words
Put memories on top of one another until the number of desired words is achieved
Use decoder to select among the memories
Can use highest order address input(s) as decoder input
Although actually, any address line could be used
Example: Compose 1024x8 memories into 2048x8 memory
a10 just chooses which memory to access
[Figure: 2048x8 ROM built from two 1024x8 ROMs. Of the 11 address bits, a9..a0 go to both ROMs; a10 feeds a 1x2 decoder (enabled by en) whose outputs d0 and d1 drive the two ROMs' en inputs; both ROMs share the 8-bit data lines. Addresses 000 0000 0000 through 011 1111 1111 (a10=0) select one ROM, and 100 0000 0000 through 111 1111 1111 (a10=1) select the other.]
To create memory with more words and wider words, can first compose to enough words, then widen.
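Adding words can be sketched with the highest-order address bit acting as the decoder input. The function `rom_2048x8` and the use of plain lists for the two 1024-word memories are assumptions for illustration.

```python
def rom_2048x8(low_rom, high_rom, addr, en=1):
    """Read a 2048x8 memory composed of two 1024-word lists of 8-bit values."""
    a10 = (addr >> 10) & 1     # highest-order address bit feeds the decoder
    a9_a0 = addr & 0x3FF       # remaining ten bits go to both memories
    d0 = en and a10 == 0       # 1x2 decoder outputs drive each memory's en
    d1 = en and a10 == 1
    if d0:
        return low_rom[a9_a0]
    if d1:
        return high_rom[a9_a0]
    return None                # en=0: neither memory drives the data lines
```

Only one of the decoder outputs is ever active, so the two memories can share the same data lines without conflict.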
Chapter Summary
Modern digital design involves creating processor-level components
High-level state machines
RTL design process
1. Capture behavior: Use HLSM
2. Convert to circuit
A. Create datapath B. Connect DP to controller C. Derive controller FSM
More RTL design
More components, arrays, timers, control vs. data dominated
Determining fastest clock frequency
By finding critical path
Behavioral-level design: C to gates
By using method to convert C (subset) to high-level state machine
Memory components (RAM, ROM)
Queues
Multiple processors
Hierarchy: A key concept used throughout Chapters 2-5