What's The Deal?: Two-Operand Addition

The document discusses various techniques for designing fast adders in digital circuits. It begins by examining half adders and full adders, noting that carry propagation is the primary limiting factor for speed. Various carry propagation techniques are then analyzed, including carry-ripple, carry-skip, carry-lookahead, and prefix adders. Carry-lookahead adders improve speed by computing all carry signals in parallel using logic gates. Prefix adders generalize this approach by building a tree of logic cells to efficiently compute carry signals spanning multiple bit positions. Overall, the document provides an overview of key adder designs and techniques for optimizing carry propagation to achieve faster addition in digital circuits.

Uploaded by

FatiLily

What's the Deal?

All we want to do is add up a couple of numbers
Chapter one tells us that we can add signed numbers just by adding the mapped positive bit vectors
ZR = (XR + YR) mod C
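
The mapping above can be sketched in a few lines of Python. This is only an illustration: it assumes an n-bit two's complement system, so C = 2^n, and the function name `add_signed` is made up for this example.

```python
def add_signed(x, y, n=8):
    """Add two signed integers through their n-bit representations."""
    C = 1 << n              # modulus C = 2^n (two's complement assumed)
    xr = x % C              # map x to its non-negative bit vector XR
    yr = y % C              # map y to YR
    zr = (xr + yr) % C      # ZR = (XR + YR) mod C
    # Map ZR back to a signed value: vectors >= C/2 stand for negatives.
    return zr - C if zr >= C // 2 else zr


print(add_signed(-3, 5))      # small mixed-sign case
print(add_signed(100, 100))   # overflows 8 bits, so the result wraps
```

The second call shows that overflow is not detected by the modular sum itself; it simply wraps around.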

TWO-OPERAND ADDITION
Chapter two

Adder Bits

Half Adder
[Figure: half adder with outputs Cout and S]

Full Adder
[Figure: full adder with input Cin and outputs Cout and S]

Big Problem (for speed)

Carry generation!
Carry at i depends on lots of inputs
Non-trivial to do fast

Fast Adders

The primary objective is to speed up the generation of the carries!
Carry Propagate Adders
Produce an answer in conventional fixed-radix NRS
Carry Save and Signed Digit adders
Avoid carry propagation by producing sums in redundant notations
Hybrid Adders
Combine as many schemes as make sense

Carry-Propagate Adders
Carry-Ripple
Switched Carry-Ripple (Manchester Carry)
Carry-Skip
Carry-Lookahead
Prefix Adders (Tree Adders)
Carry-Select (Conditional Sum)
Carry-Completion Sensing (self-timed)

Redundant Adders
Carry-Save
Signed Digit

Basic Carry-Ripple Adder (CRA)

Adder Performance
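
As a reference point for the faster schemes, here is a minimal behavioral sketch of the basic carry-ripple adder: one full adder per bit, with each carry-out feeding the next carry-in. The function names are illustrative only, and this models behavior, not gate delays.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)   # majority of the three inputs
    return s, cout


def ripple_add(x, y, n, cin=0):
    """Add two n-bit numbers by chaining n full adders."""
    carry = cin
    result = 0
    for i in range(n):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


print(ripple_add(116, 59, 8))
print(ripple_add(255, 1, 8))   # carry ripples through every position
```

The loop is the point: bit i cannot finish until bit i-1 has produced its carry, so the delay grows linearly with n.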

Adder Bits (gates)

Adder Performance

Most standard cell libraries have a Full Adder cell as a single cell
Implements Full Adder function directly in nmos and pmos transistors
Delays should be smaller

Adder Bits (CMOS)

Mirror Adder
Brute Force circuit

Factor S in terms of Cout
S = ABC + (A + B + C)(~Cout)
Critical path is usually Cin to Cout in ripple adder
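
The factoring of S in terms of Cout is easy to check exhaustively. The sketch below compares it with the usual three-input XOR over all eight input combinations; the function name is made up for this example.

```python
from itertools import product


def check_mirror_factoring():
    """Return True if S = ABC + (A + B + C)(~Cout) matches A xor B xor C."""
    for a, b, c in product((0, 1), repeat=3):
        cout = (a & b) | (a & c) | (b & c)                   # majority
        s_direct = a ^ b ^ c
        s_factored = (a & b & c) | ((a | b | c) & (1 - cout))
        if s_direct != s_factored:
            return False
    return True


print(check_mirror_factoring())
```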

Figures from David Harris

Mirror Adder

Connect for carry-ripple adder

Inversions

Critical path passes through majority gate
Built from minority + inverter
Eliminate inverter and use inverting full adder

Build a faster circuit?

Figures from David Harris

Build a faster circuit?

Complementary Pass Transistor Logic (CPL)
Slightly faster, but more area

Build a faster circuit?

Dual-rail domino
Very fast, but large and power hungry
Used in very fast multipliers

Figures from David Harris

Build a faster carry chain?

Figures from David Harris

Manchester Carry control

Manchester Carry Chain (MCC)

Use transmission gates to make carry wire

Manchester Carry Delay

tsw is the time to set all switches
tp is the time to propagate through a switch
tbuf is the delay of a buffer (a restoring buffer is needed every m bits)
ts is the time to compute the sum based on the carries

Timing of MCC

This works well if tp is small

Sizing MCC

Slide from Mark Horowitz, Stanford

Sizing MCC

Slide from Mark Horowitz, Stanford

Buffered Carry Chains

Slide from Mark Horowitz, Stanford

Timing MCC

Layout of MCC

Slide from Mark Horowitz, Stanford

Back to Adder Bits

Revisit the full adder:

Xi  Yi  Ci  Ci+1  Si  Comment
0   0   0   0     0   Kill
0   0   1   0     1   Kill
0   1   0   0     1   Propagate
0   1   1   1     0   Propagate
1   0   0   0     1   Propagate
1   0   1   1     0   Propagate
1   1   0   1     0   Generate
1   1   1   1     1   Generate

Carry Chains

Slide from Mark Horowitz, Stanford

Two types: 1-carry chain and 0-carry chain
1-carry always starts at gi=1 (or cin = 1), and propagates over consecutive positions pj=1
0-carries start at ki=1 position (or cin = 0)
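
A small sketch of the kill/propagate/generate view: classify each bit pair, then let each position either start a 1-carry (g), start a 0-carry (k), or pass along whatever arrives (p). The function names are illustrative only.

```python
def classify(x_bit, y_bit):
    """Return 'g', 'p' or 'k' for one bit position."""
    if x_bit & y_bit:
        return "g"          # both 1: generate a carry
    if x_bit ^ y_bit:
        return "p"          # exactly one 1: propagate the incoming carry
    return "k"              # both 0: kill the carry


def carries(x, y, n, cin=0):
    """Return [c0, c1, ..., cn] for an n-bit addition."""
    c = [cin]
    for i in range(n):
        kind = classify((x >> i) & 1, (y >> i) & 1)
        if kind == "g":
            c.append(1)         # a 1-carry chain starts here
        elif kind == "k":
            c.append(0)         # a 0-carry chain starts here
        else:
            c.append(c[-1])     # the chain continues through this position
    return c


print(carries(0b0011, 0b0001, 4))
```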

Group Carries

Carry equation can be generalized to groups of bits
Combine subranges recursively

Example (2.1)

Find bit 13 of the following sum
x = 0110|0010|1100|0011
y = 1011|1101|0001|1110

First compute pkg for each bit
    p|pppp|ppkp|ppgp

Now combine in groups
Extend groups to whole range
Now you can compute c13
With c13 you can compute s13
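
Example (2.1) can be replayed in code. This sketch assumes "bit 13" counts from 1 at the least significant end (which matches the 13 pkg symbols shown) and that the carry into the adder is 0; the helper names are made up for this illustration.

```python
X = "0110001011000011"
Y = "1011110100011110"


def pkg(xb, yb):
    """Classify one bit position as propagate, kill or generate."""
    if xb == "1" and yb == "1":
        return "g"
    if xb != yb:
        return "p"
    return "k"


def combine(left, right):
    """Combine a more-significant range (left) with the range below it."""
    return right if left == "p" else left


def group_signal(signals):
    """Reduce a list of signals, most significant first, to one p/k/g."""
    result = signals[0]
    for s in signals[1:]:
        result = combine(result, s)
    return result


per_bit = [pkg(a, b) for a, b in zip(X, Y)]        # most significant bit first
low12 = per_bit[4:]                                # the 12 bits below bit 13
groups = [group_signal(low12[i:i + 4]) for i in range(0, 12, 4)]
whole = group_signal(groups)                       # signal for bits 12..1
c13 = 1 if whole == "g" else 0                     # carry-in assumed 0
s13 = (1 if per_bit[3] == "p" else 0) ^ c13        # sum bit = p xor carry

print(groups, whole, c13, s13)
```

The three 4-bit groups reduce to p, k, g (high to low); the k in the middle group kills the carry generated below it, so c13 = 0 and s13 = 1.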

Carry Skip Adder

The idea is to reduce the number of cells the worst-case carry must propagate through
Divide n-bit adder into groups of m-bits
Determine group propagate for each m-bits
If the entire group p is true, skip around it

Carry Skip example

Carry Skip worst case

[Figure legend: p, k, g]
Carry travels through at most two groups: the initiating group and the terminating group.

Carry Skip delay

Worst case is when a carry is generated in the first bit of the adder
Then propagated through all bits up to but not including the high order bit
That is, skip all groups but the first and last

Problem with clearing carries

Watch out: some books show an AND/OR version that doesn't really work!
Problem is that carries might be left over from previous addition and have to dribble out

Group Size

Previous delay analysis assumes all groups are the same size
This isn't the best for speed
For fixed size: carries generated in the first group have to skip more groups!

Carry Skip with different m

If you vary the group size, with the groups at the ends shorter than the groups in the middle, you can speed things up
N=60, tc=ts=tmux=
M=6, TCSK=21
M=4,5,6,7,8,8,7,6,5,4, TCSK=17
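
The two TCSK figures can be reproduced with a simple delay model. This is a sketch under stated assumptions: the slide leaves the common value of tc, ts and tmux blank, so the code takes each as one unit delay, and it charges a worst-case carry for rippling out of its initiating group, one mux per group boundary crossed, rippling into the terminating group, and one sum delay. The function name is hypothetical.

```python
def carry_skip_delay(groups, tc=1, ts=1, tmux=1):
    """Worst-case delay of a carry-skip adder for a list of group sizes.

    Unit delays are an assumption, not values taken from the slides.
    """
    worst = 0
    for i, m_start in enumerate(groups):
        # A carry that starts and ends inside the same group just ripples.
        worst = max(worst, (m_start - 1) * tc + ts)
        for j in range(i + 1, len(groups)):
            m_end = groups[j]
            delay = (m_start * tc          # ripple out of the initiating group
                     + (j - i) * tmux      # one mux per group boundary crossed
                     + (m_end - 1) * tc    # ripple inside the terminating group
                     + ts)                 # final sum bit
            worst = max(worst, delay)
    return worst


print(carry_skip_delay([6] * 10))
print(carry_skip_delay([4, 5, 6, 7, 8, 8, 7, 6, 5, 4]))
```

Under these assumptions the model gives 21 for ten groups of 6 and 17 for the tapered sizes, the same as the figures on the slide. With tapered groups every start/end pair costs about the same, which is why the worst case drops.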

Carry Skip Another View

Slide from Mark Horowitz, Stanford

Carry Skip - Layout

Carry Lookahead

General idea: find a way to compute all carries at the same time
Generate logic for all carries in terms of just the X, Y and Cin bits
This is a switching function of 2i+1 variables

Slide from Mark Horowitz, Stanford

Carry Lookahead

Carry Lookahead equations

Remember: Ci+1 = Gi + Pi Ci
C1 = G0 + P0 C0
C2 = G1 + P1 C1
   = G1 + P1(G0 + P0 C0)
   = G1 + P1 G0 + P1 P0 C0
C3 = G2 + P2 G1 + P2 P1 G0 + P2 P1 P0 C0
C4 = G3 + P3 G2 + P3 P2 G1 + P3 P2 P1 G0 + P3 P2 P1 P0 C0
Or C4 = G3 + P3(G2 + P2(G1 + P1(G0 + P0 C0)))

Carry Lookahead equations

CLA-4 Module

Remember: Ci+1 = Gi + Ai Ci
C1 = G0 + A0 C0
C2 = G1 + A1 C1
   = G1 + A1(G0 + A0 C0)
   = G1 + A1 G0 + A1 A0 C0
C3 = G2 + A2 G1 + A2 A1 G0 + A2 A1 A0 C0
C4 = G3 + A3 G2 + A3 A2 G1 + A3 A2 A1 G0 + A3 A2 A1 A0 C0
Or C4 = G3 + A3(G2 + A2(G1 + A1(G0 + A0 C0)))
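
The flattened equations translate directly into code. In this sketch Gi = Xi AND Yi, Pi = Xi XOR Yi, and Ai is assumed to be the "alive" signal Xi OR Yi (the slides do not define Ai explicitly). Either signal gives the same carries, which the loop at the bottom checks. The function name is illustrative.

```python
def cla4_carries(x, y, c0=0, use_alive=False):
    """Carries [C0..C4] of a 4-bit adder from the flattened lookahead equations."""
    g = [((x >> i) & (y >> i)) & 1 for i in range(4)]
    if use_alive:
        p = [((x >> i) | (y >> i)) & 1 for i in range(4)]   # Ai = Xi OR Yi (assumed)
    else:
        p = [((x >> i) ^ (y >> i)) & 1 for i in range(4)]   # Pi = Xi XOR Yi
    # In Python, & binds tighter than |, so these read as sums of products.
    c1 = g[0] | p[0] & c0
    c2 = g[1] | p[1] & g[0] | p[1] & p[0] & c0
    c3 = g[2] | p[2] & g[1] | p[2] & p[1] & g[0] | p[2] & p[1] & p[0] & c0
    c4 = (g[3] | p[3] & g[2] | p[3] & p[2] & g[1]
          | p[3] & p[2] & p[1] & g[0] | p[3] & p[2] & p[1] & p[0] & c0)
    return [c0, c1, c2, c3, c4]


# Both forms agree for every input combination.
for x in range(16):
    for y in range(16):
        for c0 in (0, 1):
            assert cla4_carries(x, y, c0) == cla4_carries(x, y, c0, use_alive=True)
print(cla4_carries(0b1011, 0b0110))
```

No carry here depends on another carry except C0, so all four can be evaluated at the same time.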

CLG-4 Module

Dynamic Logic for 4-bit CLG

Motorola 1u CMOS, 4.5ns for a 64-bit adder


Slide from Mark Horowitz, Stanford

Carries - Another View

Carry Ripple revisited

[Figure: ripple chain in which each cell combines Gi, Pi with Gi-1:0 to produce Gi:0]

Carry Skip revisited

Fixed group size (4,4,4,4)
Variable group size (2,3,4,4,3)

Carry Lookahead revisited

Carry-lookahead adder computes Gi:0 for many bits in parallel.
Uses higher-valency cells with more than two inputs

Higher Valency Cell

CLA/Manchester adder

Recall C4 = G3 + P3(G2 + P2(G1 + P1(G0 + P0 C0)))

Two-Level CLA

Two-level CLA32 (n=p=4)

For large n, lots of groups so CLA can be slow
Apply CLA principle among groups
Compute G and A for groups
C(1) = G0 + A0 C0
C(2) = G1 + A1 G0 + A1 A0 C0
etc
Once the carries from the groups are produced, they are used by the first-level CLAs to produce the bit carries and sums
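
Here is a behavioral sketch of the two-level scheme for a 16-bit adder with 4-bit groups: first-level blocks produce a group G and A, a second-level lookahead turns those into the group carries, and the first-level blocks then produce bit carries and sums. It assumes A is the alive signal (X OR Y); the function names and the n=16, m=4 sizes are choices made for this example.

```python
def lookahead(g, a, c0):
    """Return [c0, c1, ..., ck] with every carry written as a flat sum of products."""
    out = [c0]
    for i in range(len(g)):
        c = 0
        for j in range(i, -1, -1):          # term: g[j] a[j+1] ... a[i]
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= a[k]
            c |= term
        term = c0                           # term: c0 a[0] ... a[i]
        for k in range(i + 1):
            term &= a[k]
        out.append(c | term)
    return out


def two_level_cla_add(x, y, n=16, m=4, c0=0):
    g = [((x >> i) & (y >> i)) & 1 for i in range(n)]
    a = [((x >> i) | (y >> i)) & 1 for i in range(n)]    # alive (assumed)
    grp_g, grp_a = [], []
    for lo in range(0, n, m):
        grp_g.append(lookahead(g[lo:lo + m], a[lo:lo + m], 0)[-1])  # group G
        alive = 1
        for k in range(lo, lo + m):
            alive &= a[k]
        grp_a.append(alive)                                         # group A
    grp_c = lookahead(grp_g, grp_a, c0)      # second level: carries into groups
    total = 0
    for idx, lo in enumerate(range(0, n, m)):
        c = lookahead(g[lo:lo + m], a[lo:lo + m], grp_c[idx])   # bit carries
        for k in range(m):
            p = ((x >> (lo + k)) ^ (y >> (lo + k))) & 1
            total |= (p ^ c[k]) << (lo + k)
    return total, grp_c[-1]


print(two_level_cla_add(0x62C3, 0xBD1E))
```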

Three-level CLA

Three-level CLA (n=8, m=2)

Extend to three or more levels by having lookahead between sections
First compute ai, pi, gi
L-1 levels of CLA to compute As and Gs
n/mL CLGs connected in ripple to compute carries of bits
One level of XOR to compute the sum

CLA Critical Path

Prefix Adders (Tree Adders)

More general form of carry lookahead tree
Built using different organizations of the same set of basic PG cells (PA cells)
All based on the fact that ci corresponds to the generate signal spanning bit positions (-1) to i-1
Prefix adder is an interconnection of cells that produce g(i-1,-1) for all i
Cells connected to produce g signals that span an increasing number of bits

PG (PA) cell

Overlapping Ranges

Starting with g,a of each bit, first level generates g,a for two bits, then four, etc.
If right input spans bits [right2,right1], and left spans [left2,left1], with right2+1 >= left1
Then output spans bits [left2,right1]
For example right[5,2] and left[8,4] means output spans bits [8,2]
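
A sketch of the PA cell and one way to wire it up. The cell combines the (g, a) pair of a left range with the pair of an adjacent or overlapping right range. The wiring chosen here is a Kogge-Stone style doubling pattern, picked only because it is short to write; the carry-in is treated as position -1 as in the slides, and a is assumed to be the alive signal. All names are illustrative.

```python
def pa_cell(left, right):
    """Combine (g, a) of a left range with an adjacent/overlapping right range."""
    g_l, a_l = left
    g_r, a_r = right
    return (g_l | (a_l & g_r), a_l & a_r)


def bit_ga(x, y, n):
    """Per-bit (g, a) pairs, index 0 = least significant bit."""
    return [(((x >> i) & (y >> i)) & 1, ((x >> i) | (y >> i)) & 1) for i in range(n)]


def range_ga(ga, hi, lo):
    """(g, a) spanning bits [hi, lo], built by folding single-bit cells."""
    result = ga[hi]
    for i in range(hi - 1, lo - 1, -1):
        result = pa_cell(result, ga[i])
    return result


def prefix_add(x, y, n, cin=0):
    # cells[0] stands for position -1 (the carry-in); cells[i + 1] for bit i.
    cells = [(cin, 0)] + bit_ga(x, y, n)
    span = 1
    while span < len(cells):
        # Each level doubles the number of positions every cell spans.
        cells = [pa_cell(cells[i], cells[i - span]) if i >= span else cells[i]
                 for i in range(len(cells))]
        span *= 2
    carries = [g for g, _ in cells]          # carries[i] = c_i = g(i-1,-1)
    total = 0
    for i in range(n):
        total |= (((x >> i) ^ (y >> i) ^ carries[i]) & 1) << i
    return total, carries[n]


print(prefix_add(116, 59, 8))
```

Because the cell tolerates overlap, combining left[8,4] with right[5,2] gives the same pair as building [8,2] directly, which is what lets the different tree organizations share one cell type.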

PG (PA) Cells

8-bit Prefix Adder

8-bit Prefix Adder

8-bit Prefix Adder

Lower fanout, increased levels
Fanout can be an issue

8-bit prefix adder

Another View of Prefix Adders

Max fanout 2
Min levels

David Harris, Harvey Mudd

Brent-Kung

Sklansky Adder

David Harris, Harvey Mudd

Kogge-Stone Adder

David Harris, Harvey Mudd

Tree Adder Taxonomy

Ideal N-bit tree adder would have
L = log2 N logic levels
Fanout never exceeding 2
No more than one wiring track between levels

Describe adder with 3-D taxonomy (l, f, t)
Logic levels: L + l
Fanout: 2^f + 1
Wiring tracks: 2^t

Known tree adders sit on plane defined by l + f + t = L-1

David Harris, Harvey Mudd
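
The taxonomy formulas are simple enough to tabulate. In the sketch below the (l, f, t) coordinates for Brent-Kung, Sklansky and Kogge-Stone are taken to be the three corners of the plane, as in Harris's published taxonomy; they are not spelled out in the slide text, so treat them as an assumption. The function name is made up.

```python
from math import log2


def tree_adder_properties(n_bits, l, f, t):
    """Logic levels, max fanout and wiring tracks for taxonomy point (l, f, t)."""
    L = int(log2(n_bits))
    if l + f + t != L - 1:
        raise ValueError("point is not on the plane l + f + t = L - 1")
    return {"logic_levels": L + l, "max_fanout": 2 ** f + 1, "wiring_tracks": 2 ** t}


N = 16
L = int(log2(N))
corners = {                      # assumed corner assignments
    "Brent-Kung": (L - 1, 0, 0),
    "Sklansky": (0, L - 1, 0),
    "Kogge-Stone": (0, 0, L - 1),
}
for name, point in corners.items():
    print(name, tree_adder_properties(N, *point))
```

Each corner is ideal in two of the three dimensions and pays for it in the third: extra levels, high fanout, or many wiring tracks.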

Tree Adder Taxonomy


Brent-Kung
Sklansky

Kogge-Stone
David Harris, Harvey Mudd

Han-Carlson

Knowles [2, 1, 1, 1]

David Harris, Harvey Mudd

Ladner-Fischer

David Harris, Harvey Mudd

Taxonomy Revisited

[Figure: taxonomy showing Sklansky, Knowles, Kogge-Stone, Ladner-Fischer, Brent-Kung, Han-Carlson]

David Harris, Harvey Mudd

Conditional Sum Adder

For each group
Compute the sum assuming that Cin is 0 and that Cin is 1
When you find out the right answer, use a MUX to select the correct result

Carry-select is 1-level select
Conditional Sum is a general case up to max levels
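
A one-level (carry-select) version can be sketched as follows. Each group computes both candidate sums, and the carry arriving from the group below only drives the select. The group size m=4 and the function name are choices made for this illustration; in hardware the two candidate sums of all groups are computed in parallel, which a sequential loop cannot show.

```python
def carry_select_add(x, y, n=16, m=4, cin=0):
    """Add two n-bit numbers using m-bit groups with a select per group."""
    mask = (1 << m) - 1
    result, carry = 0, cin
    for lo in range(0, n, m):
        xg = (x >> lo) & mask
        yg = (y >> lo) & mask
        sum0 = xg + yg          # group result assuming Cin = 0
        sum1 = xg + yg + 1      # group result assuming Cin = 1
        chosen = sum1 if carry else sum0    # the MUX
        result |= (chosen & mask) << lo
        carry = chosen >> m     # selected group carry-out
    return result, carry


print(carry_select_add(0x62C3, 0xBD1E))
```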

Carry-Select Adder

Carry-Select Another View

Slide from Mark Horowitz, Stanford

Carry-Select - Layout

Conditional Sum

Conditional principle is applied recursively
Each group is combined to double the number of bits at the next level

Slide from Mark Horowitz, Stanford

16-bit Conditional Sum Adder

Example
Step 1: Compute all the bit results
Step 2: Use the known results to select the next groups

Pipelined Adders

Variable Time Adder

Carry Completion Sensing Adder
Encode the carry in a form that lets you tell when it's finished
When all carry chains have finished, the add is finished
One choice: dual-rail encoding

Variable Time Adder

Addition Time: proportional to log2(n)
For uniformly distributed numbers, length of longest carry chain is approx log2(5n/4)
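
To get a feel for the longest-chain claim, the sketch below measures the longest 1-carry chain for a pair of operands and averages it over all operand pairs of a small width. The chain-length definition used here (a generate position plus the consecutive propagate positions above it) is an assumption made for this example, and 0-carry chains are ignored, so the average is only roughly comparable with log2(5n/4).

```python
from math import log2


def longest_carry_chain(x, y, n):
    """Length of the longest 1-carry chain: a g followed by consecutive p's."""
    longest = current = 0
    for i in range(n):
        xb, yb = (x >> i) & 1, (y >> i) & 1
        if xb & yb:
            current = 1                 # generate: a new chain starts here
        elif (xb ^ yb) and current:
            current += 1                # propagate: the chain keeps going
        else:
            current = 0                 # kill, or nothing to propagate
        longest = max(longest, current)
    return longest


def average_longest_chain(n):
    """Exhaustive average over all pairs of n-bit operands."""
    total = sum(longest_carry_chain(x, y, n)
                for x in range(1 << n) for y in range(1 << n))
    return total / (1 << (2 * n))


n = 8
print(average_longest_chain(n), log2(5 * n / 4))
```

The point of the self-timed adder is that the typical add finishes in about this many carry steps, not the n steps of the worst case.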

Aside: ALU Design

Aside: ALU +G function block

Aside: ALU +P function block

Rest of the ALU

Redundant Digit Adders

Use a redundant digit set
Operands might be in conventional or in redundant form
Main idea is to reduce the carry propagation
But, increases number of bits in the result
Useful for things like accumulation, multi-operand addition, multiplication, etc.

Carry Save Adder

Add three binary vectors using an array of one-bit adders (i.e. full adders)
But, don't propagate the carries
Output is two vectors: carry and pseudo-sum (or sum)
Several combinations of vc and vs represent the same result

Carry Save Adder

If you want to convert back to conventional numbers, add vs and vc
Because there are two bits for every conventional sum bit, you can think of the answer in Carry Save form as digits in the set {0,1,2}
Carry Save produces a reduction from three binary vectors to two, so it's also called a 3-2 reduction
Adder is a [3:2] adder
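
In software a [3:2] carry-save step is just bitwise logic on whole words, which makes the "no propagation" point concrete: every bit position is computed independently. This is a sketch with made-up names; the operands 116, 59 and 170 are the ones used in the deck's example.

```python
def carry_save_add(a, b, c):
    """Reduce three binary vectors to a pseudo-sum vs and a carry vector vc."""
    vs = a ^ b ^ c                              # per-bit sum, no carries
    vc = ((a & b) | (a & c) | (b & c)) << 1     # per-bit carry, moved up one place
    return vs, vc


vs, vc = carry_save_add(116, 59, 170)
digits = [((vs >> i) & 1) + ((vc >> i) & 1) for i in range(9)]   # each in {0,1,2}
print(vs, vc, vs + vc, digits)
```

Here vs = 229 and vc = 116, and adding them recovers 345. The least significant bit of vc is always free, so a carry-in of 1 can be dropped into it, giving vc = 117 and a total of 346.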

Carry Save Example

116 + 59 + 170 = 345
Pseudo-sum vector: 229
Carry vector: 117 (with Cin)
[Figure: bit columns with weights 256 128 64 32 16]
128 + 128 + 64 + 16 + 0 + 8 + 2 = 346

Carry Save

Carry Save [4:2]

What if two operands are both carry-save?
Then each operand is in Xs Xc form
So, you need a [4:2] adder instead of a [3:2]
Combine four vectors into two
Still no carries!
Answer is still in redundant carry-save form

Carry Save [4:2]

[4:2] Compressor Adder

Note that even though it looks like carry is propagated, the Cout from each [4:2] cell is computed directly from the A and B inputs
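
One common way to build a [4:2] cell is from two full adders, and that construction makes the "no real propagation" point easy to check. This is a sketch of that construction, not necessarily the cell drawn on the slides; the input names x1..x4 are made up here. Cout comes from the first full adder only, so it never depends on the cell's Cin, and a carry can move at most one position sideways.

```python
from itertools import product


def full_adder(a, b, c):
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compressor_4_2(x1, x2, x3, x4, cin):
    """Return (sum, carry, cout) for one [4:2] cell built from two full adders."""
    s1, cout = full_adder(x1, x2, x3)      # cout depends only on x1, x2, x3
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


def carry_save_4_2(a, b, c, d, n):
    """Reduce four n-bit vectors to two vectors (vs, vc) with vs + vc = a+b+c+d."""
    vs = vc = 0
    cin = 0
    for i in range(n + 1):                 # one extra cell absorbs the last cout
        bits = [(v >> i) & 1 for v in (a, b, c, d)]
        s, carry, cout = compressor_4_2(*bits, cin)
        vs |= s << i
        vc |= carry << (i + 1)
        cin = cout                         # goes to the neighbouring cell only
    return vs, vc


# Cout is the same whether Cin is 0 or 1, for every input combination.
for bits in product((0, 1), repeat=4):
    assert compressor_4_2(*bits, 0)[2] == compressor_4_2(*bits, 1)[2]

vs, vc = carry_save_4_2(116, 59, 170, 200, 8)
print(vs, vc, vs + vc)
```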

4:2 compressor cell

[Table: compressor outputs by input combination, with columns for Cin=0 and Cin=1, and Cout]

[4:2] Compressor Cell

Nagamatsu,
Toshiba

High Radix Carry Save

Navi and Etiemble

Regular carry-save doubles the number of bits
You can reduce the number of bits with high-radix carry-save
If r is the radix
Vs is represented in radix r
Vc has one bit per radix-r digit

Radix-8 Carry Save

[Figure residue: digit weights 512, 64, 1; values 1, 48, 02, 78; 436, 217, 138]
1*512 + ... + (7+1)*8 + (3+0)*1 = 579

Signed Digit Adders

Another form of redundant digit representation

Signed Digit Adder

Summary

I'm not going to spend more time on this one
My sense is that it's not as important in terms of actual implementations as Carry Save
Reasonably complex stuff: multiple recodings

Case Study

DEC Alpha 21064 64-bit adder

Alpha 21064

5ns cycle time in a 0.75u CMOS process
Very high performance for the day!
A mix of multiple techniques!

In 8-bit chunks: Manchester carry chain
Chain was also tapered to reduce the load caused by the remainder of the chain
Chain was pre-discharged at start of cycle
Three signals used: P, G, and K
Two Manchester chains:
One assuming Cin=0
One assuming Cin=1

Alpha 21064

Carry Lookahead used on least significant 32 bits
Implemented as distributed differential circuits
Provide carry that controls most significant 32

Conditional Sum used for most significant 32
Six 8-bit select switches used to implement conditional sum on the 8-bit level

Alpha 21064

Finally, Carry Select used to produce the most significant 32 bits.
Final selection done using NMOS carry-select byte-wide muxes

Also apparently pipelined with a row of latches after the lookahead

Alpha 21064

72-bit Pentium II Adder

Slide from Mark Horowitz, Stanford

Adder from Imagine

Adder from Imagine

Slide from Mark Horowitz, Stanford

Adder from Imagine

Slide from Mark Horowitz, Stanford

Adder from Imagine

Slide from Mark Horowitz, Stanford

Local PGK Logic (Imagine)

Group PKG (Imagine)


Manchester Carry Chains.

Slide from Mark Horowitz, Stanford

Carry Chain Sizing

Slide from Mark Horowitz, Stanford

Static Carry Chains

Slide from Mark Horowitz, Stanford

Global Carry Chain

Slide from Mark Horowitz, Stanford

Conditional Sums

Slide from Mark Horowitz, Stanford

Arithmetic for Media Processing

Segmented Add Operation

Slide from Mark Horowitz, Stanford

Modify for Segmentation

Global Carry w/ Segment

Slide from Mark Horowitz, Stanford

Absolute Difference

Slide from Mark Horowitz, Stanford

Absolute Difference

Slide from Mark Horowitz, Stanford

Sum of Absolute Differences

Saturation

Slide from Mark Horowitz, Stanford

Hardware Support for Saturation

Slide from Mark Horowitz, Stanford

Simulated Performance

Slide from Mark Horowitz, Stanford

Adder Layout

Slide from Mark Horowitz, Stanford

Related Results

Slide from Mark Horowitz, Stanford

Summary (from Harris/Weste)

If they're fast enough, use ripple-carry
Compact, simple

Carry skip and carry select work well for small bit sizes (8-16)
Hybrids combining techniques are popular

At 32, 64, and beyond, tree adders are much faster
Again, hybrids are common

Synthesized Adders (Harris/Weste)

Similar to my experiment
But with 0.18u library, Synopsys DesignWare
Synopsys can map + to carry-ripple, carry-select, carry-lookahead, and some prefix adders
Fastest are tree adders with (prelayout) speeds of 7.0 and 8.5 FO4 delays for 32 and 64 bit adders

Adder Summary

Area vs. Delay, Synthesized Adders
