Performance Characterization PDF
ECE 261
Krish Chakrabarty
Delay Definitions
tpdr: rising propagation delay
From input to rising output crossing VDD/2
Delay Estimation
We would like to be able to easily estimate delay
Not as accurate as simulation, but easier to ask "What if?"
The step response usually looks like a first-order RC response with a decaying exponential.
Use RC delay models to estimate delay:
C = total capacitance on the output node
Use an effective resistance R chosen so that tpd = RC
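As a minimal sketch of the estimate above, the function below simply multiplies an effective resistance by the total output capacitance. The function name and the numeric values are hypothetical and chosen only for illustration.

```python
def rc_delay(r_ohms: float, c_farads: float) -> float:
    """Estimate propagation delay as tpd = R * C, in seconds.

    r_ohms is the effective resistance of the driving network and
    c_farads is the total capacitance on the output node.
    """
    return r_ohms * c_farads


# Hypothetical values: 10 kOhm effective resistance, 20 fF on the output node.
tpd = rc_delay(10e3, 20e-15)
print(f"tpd = {tpd * 1e12:.0f} ps")
```

Because the model is a single product, a "what if" question such as doubling the load is answered by doubling C and reading off the new delay.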
RC Delay Models
Use equivalent circuits for MOS transistors
Ideal switch + capacitance and ON resistance
Unit nMOS has resistance R, capacitance C
Unit pMOS has resistance 2R, capacitance C
Elmore Delay
ON transistors look like resistors
Pullup or pulldown network modeled as RC ladder
Elmore delay of RC ladder
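The Elmore delay of an RC ladder weights each node capacitance by the total resistance between that node and the source. The sketch below assumes the ladder is given as two equal-length lists; the function name and the example values are hypothetical.

```python
from itertools import accumulate


def elmore_delay(resistances, capacitances):
    """Elmore delay of an RC ladder.

    resistances[i] is the series resistance leading into node i and
    capacitances[i] is the capacitance from node i to ground. Each
    capacitance is multiplied by the sum of all resistances upstream of it.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per ladder node")
    upstream = accumulate(resistances)  # R1, R1+R2, R1+R2+R3, ...
    return sum(r * c for r, c in zip(upstream, capacitances))


# Two-node ladder in units of R and C (hypothetical values):
# R1 = R2 = 1, C1 = 2, C2 = 6  ->  1*2 + (1+1)*6 = 14
print(elmore_delay([1, 1], [2, 6]))
```

Working in multiples of the unit R and C keeps the result directly comparable to the "n RC" delays quoted on the slides.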
Delay Components
Delay has two parts
Parasitic delay: 6 or 7 RC, independent of load
Effort delay: 4h RC, proportional to load capacitance
Contamination Delay
Best-case (contamination) delay can be substantially less than propagation delay.
Ex: If both inputs fall simultaneously
Diffusion Capacitance
We assumed contacted diffusion on every source/drain.
Good layout minimizes diffusion area.
Ex: NAND3 layout shares one diffusion contact
Reduces output capacitance by 2C
Merged uncontacted diffusion might help too
Layout Comparison
Which layout is better?
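For pull-up, only one transistor has to be on: $\beta_{p,\mathrm{eff}} = \min\{\beta_{p1}, \beta_{p2}, \beta_{p3}\}$
If $\beta_{p1} = \beta_{p2} = \beta_{p3} = \beta_p = \beta_n/3$, then $\beta_{n,\mathrm{eff}} = \beta_p$ and no resizing is necessary.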
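For pull-down, only one transistor has to be on: $\beta_{n,\mathrm{eff}} = \min\{\beta_{n1}, \beta_{n2}, \beta_{n3}\}$
If $\beta_{n1} = \beta_{n2} = \beta_{n3} = \beta_n = 3\beta_p$, then $\beta_{n,\mathrm{eff}} = 9\beta_{p,\mathrm{eff}}$ and considerable resizing is necessary: $W_p = 9W_n$!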
Resize the pull-up transistors to make pull-up times equal.
After resizing: a: $2\beta_p$, b: $2\beta_p$, c: $\beta_p$
Transistor Placement
[Figure: two placements of the series transistors a, b, c between output F and Gnd beneath the pull-up stack, with internal node capacitances Ca, Cb, Cc and labels ta, tb, tc; the primary inputs change simultaneously.]
Logical Effort
Chip designers face a bewildering array of choices
What is the best circuit topology for a function?
How many stages of logic give least delay?
How wide should the transistors be?
The method of logical effort helps make these decisions:
Uses a simple model of delay
Allows back-of-the-envelope calculations
Helps make rapid comparisons between alternatives
Emphasizes remarkable symmetries
g: logical effort
Measures relative ability of gate to deliver current
g ≡ 1 for inverter
Delay Plots
d = f + p
= gh + p
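The delay relation d = f + p = gh + p is linear in the electrical effort h, with slope g and intercept p. A minimal sketch, assuming the inverter values g = 1 and p = 1 used elsewhere in these slides (the function name is hypothetical):

```python
def stage_delay(g: float, h: float, p: float) -> float:
    """Normalized stage delay d = f + p, where the effort delay is f = g * h."""
    return g * h + p


# Inverter (g = 1, p = 1) for electrical efforts h = 1..4.
for h in range(1, 5):
    print(f"h = {h}: d = {stage_delay(1, h, 1)}")
```

Plotting d against h for different gates gives straight lines whose slopes compare the logical efforts directly.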
Catalog of Gates
Logical effort of common gates
                    Number of inputs
Gate type           1      2      3      4      n
Inverter            1
NAND                       4/3    5/3    6/3    (n+2)/3
NOR                        5/3    7/3    9/3    (2n+1)/3
Tristate / mux      2      2      2      2      2
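The closed-form entries in the n-input column can be evaluated directly. The sketch below encodes only the formulas listed in the catalog; the function name and the gate-name strings are hypothetical.

```python
from fractions import Fraction


def logical_effort(gate: str, n_inputs: int = 1) -> Fraction:
    """Logical effort g for the gate types listed in the catalog."""
    if gate == "inverter":
        return Fraction(1)
    if gate == "nand":
        return Fraction(n_inputs + 2, 3)
    if gate == "nor":
        return Fraction(2 * n_inputs + 1, 3)
    if gate == "tristate":
        return Fraction(2)  # independent of the number of inputs
    raise ValueError(f"unknown gate type: {gate}")


for n in (2, 3, 4):
    print(n, logical_effort("nand", n), logical_effort("nor", n))
```

Exact fractions are used so that the results match the catalog entries in value; note that `Fraction` reduces 6/3 and 9/3 to 2 and 3.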
Catalog of Gates
Parasitic delay of common gates
In multiples of pinv (≈ 1)
                    Number of inputs
Gate type           1      2      3      4      n
Inverter            1
NAND                       2      3      4      n
NOR                        2      3      4      n
Tristate / mux      2      4      6      8      2n
XOR, XNOR                  4      6      8
Logical Effort: g = 1
Electrical Effort: h = 1
Parasitic Delay: p = 1
Stage Delay: d = 2
Frequency: fosc = 1/(2*N*d) = 1/(4N)
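The frequency expression above is in units of 1/τ, where τ is the process-dependent delay unit. A sketch that converts it to hertz, assuming a caller-supplied τ; the function name and the example values of N and τ are hypothetical:

```python
def ring_oscillator_frequency(n_stages: int, tau_seconds: float,
                              g: float = 1.0, h: float = 1.0,
                              p: float = 1.0) -> float:
    """Oscillation frequency fosc = 1 / (2 * N * d), with d = g*h + p in units of tau.

    A transition must travel around the ring twice per period, which is
    where the factor of 2 comes from.
    """
    d = g * h + p                      # stage delay in units of tau
    period = 2 * n_stages * d * tau_seconds
    return 1.0 / period


# Hypothetical example: 31 inverter stages, tau = 12 ps.
f_osc = ring_oscillator_frequency(31, 12e-12)
print(f"fosc = {f_osc / 1e6:.0f} MHz")
```

With the defaults g = h = p = 1 the stage delay is d = 2, so the function reduces to 1/(4N) when τ = 1.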
g = 1
h = 4
p = 1
d = 5
The FO4 delay is about
200 ps in a 0.6 μm process
60 ps in a 180 nm process
f/3 ns in an f μm process
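The f/3 ns rule of thumb can be checked against the two quoted data points. The sketch below also divides by d = 5 to get a rough value for the delay unit τ, which follows from the FO4 delay being 5τ; the function names are hypothetical.

```python
def fo4_delay_ps(feature_size_um: float) -> float:
    """Rule-of-thumb FO4 inverter delay: f/3 ns in an f um process, in ps."""
    return feature_size_um / 3.0 * 1000.0


def tau_ps(feature_size_um: float) -> float:
    """Delay unit tau, using FO4 delay = d * tau with d = g*h + p = 5."""
    return fo4_delay_ps(feature_size_um) / 5.0


for f_um in (0.6, 0.18):
    print(f"{f_um} um: FO4 = {fo4_delay_ps(f_um):.0f} ps, tau = {tau_ps(f_um):.0f} ps")
```

This is an approximation only; actual FO4 delays depend on the specific process, voltage, and temperature.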
Branching Effort
Introduce branching effort
Accounts for branching between stages in path
Note:
Multistage Delays
Path Effort Delay
Path Parasitic Delay
Path Delay
Gate Sizes
How wide should the gates be for least delay?
Working backward, apply the capacitance transformation to find the input capacitance of each gate given the load it drives.
Check work by verifying that the input capacitance specification is met.
Logical Effort: G =
Electrical Effort: H =
Branching Effort: B =
Path Effort: F =
Best Stage Effort:
Parasitic Delay: P =
Delay: D =
$D = N F^{1/N} + P = N (64)^{1/N} + N$
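Evaluating this expression for several values of N shows where the minimum lies. The sketch below assumes a path effort F = 64 and a parasitic delay of 1 per stage, as in the formula; the function name is hypothetical.

```python
def path_delay(n_stages: int, path_effort: float = 64.0,
               p_per_stage: float = 1.0) -> float:
    """Path delay D = N * F**(1/N) + N * p for N equal-effort stages."""
    stage_effort = path_effort ** (1.0 / n_stages)
    return n_stages * stage_effort + n_stages * p_per_stage


for n in range(1, 7):
    print(f"N = {n}: D = {path_delay(n):.2f}")

best_n = min(range(1, 11), key=path_delay)
print("least delay at N =", best_n)
```

For F = 64 the minimum falls at N = 3 with D = 15, and N = 4 is only slightly slower, which illustrates how flat the delay curve is near the optimum.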
Derivation
Consider adding inverters to end of path
How many give least delay?
Review of Definitions
Term                Stage          Path
number of stages    1              N
logical effort      g              G
electrical effort   h              H
branching effort    b              B
effort              f = gh         F = GBH
effort delay        f              D_F
parasitic delay     p              P
delay               d = f + p      D = D_F + P
Interconnect
Iteration required in designs with wire
Summary
Logical effort is useful for thinking of delay in circuits
Numeric logical effort characterizes gates
NANDs are faster than NORs in CMOS
Paths are fastest when effort delays are ~4
Path delay is weakly sensitive to stages, sizes
But using fewer stages doesn't mean faster paths
Delay of path is about log_4 F FO4 inverter delays
Inverters and NAND2 best for driving large caps
Dynamic Power
Dynamic power is required to charge and discharge load capacitances when transistors switch.
One cycle involves a rising and falling output.
On rising output, charge Q = C·VDD is required
On falling output, charge is dumped to GND
This repeats T·fsw times over an interval of T
Activity Factor
Suppose the system clock frequency = f
Let fsw = αf, where α = activity factor
If the signal is a clock, α = 1
If the signal switches once per cycle, α = 1/2
Dynamic gates: switch either 0 or 2 times per cycle, α = 1/2
Static gates: depends on design, but typically α = 0.1
Dynamic power: $P_{dynamic} = \alpha C V_{DD}^2 f$
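The charge argument gives the dynamic power directly: each switching cycle draws Q = C·VDD from a supply at VDD, so it costs C·VDD² of energy, and this happens fsw = αf times per second. A minimal sketch under that reasoning; the function name and the example values are hypothetical.

```python
def dynamic_power(alpha: float, c_farads: float, vdd: float, f_hz: float) -> float:
    """Dynamic power in watts: alpha * C * VDD^2 * f.

    alpha is the activity factor, so alpha * f_hz is the switching
    frequency of the node (or the average over a group of nodes).
    """
    return alpha * c_farads * vdd ** 2 * f_hz


# Hypothetical example: 1 nF of switched capacitance, alpha = 0.1,
# VDD = 1.2 V, 1 GHz clock.
print(f"{dynamic_power(0.1, 1e-9, 1.2, 1e9) * 1e3:.0f} mW")
```

The quadratic dependence on VDD is why lowering the supply voltage is the most effective single lever on dynamic power.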
Example
200 Mtransistor chip
20M logic transistors, average width: 12 λ
180M memory transistors, average width: 4 λ
1.2 V 100 nm process
Cg = 2 fF/μm
Dynamic Example
Static CMOS logic gates: activity factor = 0.1
Memory arrays: activity factor = 0.05 (many banks!)
Estimate dynamic power consumption per MHz. Neglect wire capacitance and short-circuit current.
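One way to work this estimate is sketched below. It rests on assumptions that are not stated explicitly in the slides as extracted: the average widths (12 and 4) are taken to be in units of λ, λ is taken as half the 100 nm feature size (0.05 μm), and the gate capacitance is taken as 2 fF per μm of width. All variable names are hypothetical.

```python
# Assumptions (labelled, not from the source text as extracted):
LAMBDA_UM = 0.05            # lambda = half of the 100 nm feature size
CG_F_PER_UM = 2e-15         # gate capacitance: 2 fF per um of transistor width
VDD = 1.2                   # supply voltage in volts

# Total gate capacitance of each block: count * width (lambda) * lambda * Cg
c_logic = 20e6 * 12 * LAMBDA_UM * CG_F_PER_UM    # logic transistors
c_memory = 180e6 * 4 * LAMBDA_UM * CG_F_PER_UM   # memory transistors

# P = sum(alpha_i * C_i) * VDD^2 * f, evaluated at f = 1 MHz
switched_cap = 0.1 * c_logic + 0.05 * c_memory
power_per_mhz = switched_cap * VDD ** 2 * 1e6

print(f"C_logic  = {c_logic * 1e9:.0f} nF")
print(f"C_memory = {c_memory * 1e9:.0f} nF")
print(f"Dynamic power = {power_per_mhz * 1e3:.2f} mW/MHz")
```

Under these assumptions the logic and memory contribute 24 nF and 72 nF of gate capacitance, and the estimate comes to roughly 8.6 mW per MHz.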
Static Power
Static power is consumed even when chip is quiescent.
Ratioed circuits burn power in fight between ON transistors
Leakage draws power from nominally OFF devices
Ratio Example
The chip contains a 32 word x 48 bit ROM
Uses pseudo-nMOS decoder and bitline pullups
On average, one wordline and 24 bitlines are high
Leakage Example
The process has two threshold voltages and two oxide thicknesses.
Subthreshold leakage:
20 nA/μm for low Vt
0.02 nA/μm for high Vt
Gate leakage:
3 nA/μm for thin oxide
0.002 nA/μm for thick oxide
Memories use low-leakage transistors everywhere
Gates use low-leakage transistors on 80% of logic
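A rough static power estimate can be sketched from these figures, under several assumptions that are not spelled out in the slides as extracted: the chip is the same 200 Mtransistor example (20M logic transistors of width 12 λ, 180M memory transistors of width 4 λ, λ = 0.05 μm, VDD = 1.2 V); "low-leakage" means high Vt with thick oxide; leakage figures are per μm of width; and on average half of the transistors are OFF and contribute subthreshold leakage, while gate leakage is counted for every transistor. All names are hypothetical.

```python
# Assumptions (labelled): same example chip, lambda = 0.05 um, VDD = 1.2 V.
LAMBDA_UM = 0.05
VDD = 1.2
OFF_FRACTION = 0.5   # assumed share of transistors that are OFF at any time

logic_width_um = 20e6 * 12 * LAMBDA_UM
memory_width_um = 180e6 * 4 * LAMBDA_UM

# 20% of logic uses high-leakage devices; everything else is low-leakage.
high_leak_width_um = 0.2 * logic_width_um
low_leak_width_um = 0.8 * logic_width_um + memory_width_um


def leakage_current_amps(width_um, isub_na_per_um, igate_na_per_um):
    """Leakage current for a total width, with leakage rates given in nA/um."""
    per_um_na = OFF_FRACTION * isub_na_per_um + igate_na_per_um
    return width_um * per_um_na * 1e-9


i_static = (leakage_current_amps(high_leak_width_um, 20.0, 3.0)
            + leakage_current_amps(low_leak_width_um, 0.02, 0.002))
p_static = i_static * VDD

print(f"I_static = {i_static * 1e3:.1f} mA")
print(f"P_static = {p_static * 1e3:.1f} mW")
```

Under these assumptions nearly all of the leakage comes from the 20% of logic built with high-leakage devices, which is the usual argument for restricting them to critical paths.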