21EC71 Advanced VLSI Notes Module 2
Advanced VLSI
(21EC71)
SEMESTER – VII
Module 2
Floor planning and placement: Goals and objectives, Measurement of delay in Floor
planning, Floor planning tools, Channel definition, I/O and Power planning and Clock
planning. Placement: Goals and Objectives, Min-cut Placement algorithm, Iterative
Placement Improvement, Time driven placement methods, Physical Design Flow.
Routing: Global Routing: Goals and objectives, Global Routing Methods, Global routing
between blocks, Back annotation.
Textbook 1
Vijaykumar Sajjanar
vjkr.github.io
www.bldeacet.ac.in Page | 1
21EC71 Advanced VLSI
CONTENTS
Floorplanning
    Floorplanning Goals and Objectives
    Measurement of Delay in Floorplanning
    Floorplanning Tools
    Channel Definition
    I/O and Power Planning
    Clock Planning
Placement
    Placement Terms and Definitions
    Placement Goals and Objectives
    Placement Algorithms
        Min-cut Placement Method
        Iterative Placement Improvement
    Timing-Driven Placement Methods
Physical Design Flow
Routing
Global Routing
    Goals and Objectives
    Global Routing Methods
    Global Routing Between Blocks
    Back-annotation
FLOORPLANNING
Floorplanning is a mapping between the logical description (the netlist) and the
physical description (the floorplan).
FIGURE 16.3 Interconnect and gate delays. As feature sizes decrease, both average
interconnect delay and average gate delay decrease but at different rates. This is because
interconnect capacitance tends to a limit that is independent of scaling. Interconnect delay
now dominates gate delay.
The objectives of floorplanning are to minimize the chip area and minimize delay.
FIGURE 16.4 Predicted capacitance. (a) Interconnect lengths as a function of fanout (FO)
and circuit-block size. (b) Wire-load table. There is only one capacitance value for each
fanout (typically the average value). (c) The wire-load table predicts the capacitance and
delay of a net (with a considerable error).
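The wire-load table in part (b) can be sketched as a simple lookup. This is a minimal illustration only: the capacitance values, the extrapolation slope, and the function names are assumptions made for the example and are not taken from the figure.

```python
# Hypothetical wire-load table: fanout -> predicted net capacitance in pF.
# The numbers are illustrative assumptions, not values from Figure 16.4.
WIRE_LOAD_PF = {1: 0.03, 2: 0.06, 3: 0.08, 4: 0.11}
SLOPE_PF_PER_FANOUT = 0.03  # assumed slope for fanouts beyond the table


def predicted_capacitance_pf(fanout: int) -> float:
    """Return the single (average) capacitance the table holds for a fanout."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    if fanout in WIRE_LOAD_PF:
        return WIRE_LOAD_PF[fanout]
    largest = max(WIRE_LOAD_PF)
    return WIRE_LOAD_PF[largest] + SLOPE_PF_PER_FANOUT * (fanout - largest)


def predicted_net_delay_ns(fanout: int, drive_resistance_kohm: float) -> float:
    """Lumped estimate: delay = R_drive * C_net (kOhm * pF gives ns)."""
    return drive_resistance_kohm * predicted_capacitance_pf(fanout)
```

Because the table stores one value per fanout, every net with the same fanout receives the same prediction, which is where the considerable error comes from.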
FLOORPLANNING TOOLS
Figure 16.6 (a) shows an initial random floorplan generated by a floorplanning tool. Two
of the blocks, A and C in this example, are standard-cell areas (the chip shown in Figure
16.1 is one large standard-cell area). These are flexible blocks (or variable blocks)
because, although their total area is fixed, their shape (aspect ratio) and connector
locations may be adjusted during the placement step.
We may force logic cells to be in selected flexible blocks by seeding. Seeding may be
hard or soft. A hard seed is fixed and not allowed to move during the remaining
floorplanning and placement steps. A soft seed is an initial suggestion only and can be
altered if necessary by the floorplanner.
FIGURE 16.6 Floorplanning a cell-based ASIC. (a) Initial floorplan generated by the
floorplanning tool. Two of the blocks are flexible (A and C) and contain rows of
standard cells (unplaced). A pop-up window shows the status of block A. (b) An
estimated placement for flexible blocks A and C. The connector positions are known
and a rat's nest display shows the heavy congestion below block B. (c) Moving blocks
to improve the floorplan. (d) The updated display shows the reduced congestion after
the changes.
We need to control the aspect ratio of our floorplan because we have to fit our chip into
the die cavity (a fixed-size hole, usually square) inside a package.
With practice, we can create a good initial placement by floorplanning and a pictorial
display.
FIGURE 16.7 Congestion analysis. (a) The initial floorplan with a 2:1.5 die aspect ratio.
(b) Altering the floorplan to give a 1:1 chip aspect ratio. (c) A trial floorplan with a
congestion map. Blocks A and C have been placed so that we know the terminal positions
in the channels. Shading indicates the ratio of channel density to the channel capacity.
Dark areas show regions that cannot be routed because the channel congestion exceeds the
estimated capacity. (d) Resizing flexible blocks A and C alleviates congestion.
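The shading in a congestion map can be thought of as one ratio per channel. A minimal sketch follows; the channel names and the density and capacity numbers are hypothetical.

```python
# Hypothetical channels: name -> (channel density, estimated channel capacity).
channels = {"ch1": (5, 10), "ch2": (9, 10), "ch3": (12, 10)}


def congestion(density: int, capacity: int) -> float:
    """Ratio of channel density to channel capacity; above 1.0 cannot be routed."""
    return density / capacity


congestion_map = {name: congestion(d, c) for name, (d, c) in channels.items()}

# Channels whose density exceeds the estimated capacity (the dark areas).
unroutable = [name for name, ratio in congestion_map.items() if ratio > 1.0]
print(unroutable)
```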
CHANNEL DEFINITION
During the floorplanning step we assign the areas between blocks that are to be used for
interconnect. This process is known as channel definition or channel allocation.
Figure 16.8 shows a T-shaped junction between two rectangular channels and illustrates
why we must route the stem (vertical) of the T before the bar. The general problem of
choosing the order of rectangular channels to route is channel ordering.
FIGURE 16.8 Routing a T-junction between two channels in two-level metal. The dots
represent logic cell pins. (a) Routing channel A (the stem of the T) first allows us to adjust
the width of channel B. (b) If we route channel B first (the top of the T), this fixes the
width of channel A. We have to route the stem of a T-junction before we route the top.
Figure 16.9 shows a floorplan of a chip containing several blocks. Suppose we cut along
the block boundaries slicing the chip into two pieces (Figure 16.9a). Then suppose we can
slice each of these pieces into two. If we can continue in this fashion until all the blocks are
separated, then we have a slicing floorplan (Figure 16.9b). Figure 16.9 (c) shows how the
sequence we use to slice the chip defines a hierarchy of the blocks. Reversing the slicing
order ensures that we route the stems of all the channel T-junctions first.
FIGURE 16.9 Defining the channel routing order for a slicing floorplan using a
slicing tree. (a) Make a cut all the way across the chip between circuit blocks. Continue
slicing until each piece contains just one circuit block. Each cut divides a piece into two
without cutting through a circuit block. (b) A sequence of cuts: 1, 2, 3, and 4 that
successively slices the chip until only circuit blocks are left. (c) The slicing tree
corresponding to the sequence of cuts gives the order in which to route the channels: 4, 3,
2, and finally 1.
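The routing order can be read off a slicing tree with a post-order walk, which visits the channels below a cut before the cut itself. The sketch below assumes a simple chain-shaped tree with hypothetical block names A to E; the actual tree in Figure 16.9 may have a different shape.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Block:
    name: str


@dataclass
class Cut:
    number: int                  # order in which the cut was made
    left: Union["Cut", Block]
    right: Union["Cut", Block]


def routing_order(node: Union[Cut, Block]) -> List[int]:
    """Post-order walk: route the channels below a cut before the cut itself."""
    if isinstance(node, Block):
        return []
    return routing_order(node.left) + routing_order(node.right) + [node.number]


# Hypothetical slicing tree: cut 1 is made first, cut 4 last.
tree = Cut(1, Block("A"),
           Cut(2, Block("B"),
               Cut(3, Block("C"),
                   Cut(4, Block("D"), Block("E")))))

print(routing_order(tree))  # [4, 3, 2, 1]
```

For a tree that branches on both sides, the post-order walk still routes every child cut before its parent, although the result is then not simply the cut sequence reversed.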
Figure 16.10 shows a floorplan that is not a slicing structure. We cannot cut the chip all
the way across with a knife without chopping a circuit block in two. This means we cannot
route any of the channels in this floorplan without routing all of the other channels first. We
say there is a cyclic constraint in this floorplan. There are two solutions to this problem.
One solution is to move the blocks until we obtain a slicing floorplan. The other solution is to
allow the use of L-shaped, rather than rectangular, channels (or areas with fixed
connectors on all sides, that is, a switch box).
FIGURE 16.10 Cyclic constraints. (a) A nonslicing floorplan with a cyclic constraint
that prevents channel routing. (b) In this case it is difficult to find a slicing floorplan
without increasing the chip area. (c) This floorplan may be sliced (with initial cuts 1 or 2)
and has no cyclic constraints, but it is inefficient in area use and will be very difficult to
route.
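A cyclic constraint can be detected by treating "route this channel before that one" as edges in a directed graph and attempting a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the channel names and the two example constraint sets are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def channel_order(route_first):
    """route_first maps each channel to the channels that must be routed before it.

    Returns a valid routing order, or None if the constraints are cyclic.
    """
    try:
        return list(TopologicalSorter(route_first).static_order())
    except CycleError:
        return None  # no ordering of rectangular channels exists


# Hypothetical slicing floorplan: each stem is routed before its bar.
slicing = {"ch1": {"ch2"}, "ch2": {"ch3"}, "ch3": set()}

# Hypothetical nonslicing floorplan: four channels that each wait on the next.
cyclic = {"a": {"b"}, "b": {"c"}, "c": {"d"}, "d": {"a"}}

print(channel_order(slicing))  # ['ch3', 'ch2', 'ch1']
print(channel_order(cyclic))   # None
```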
FIGURE 16.12 Pad-limited and core-limited die. (a) A pad-limited die. The number of
pads determines the die size. (b) A core-limited die: The core logic determines the die size.
(c) Using both pad-limited pads and core-limited pads for a square die.
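The difference between the two cases can be illustrated with a rough size estimate for a square die. This is a simplified sketch under stated assumptions: pads are spread evenly over four sides, corner cells are ignored, and the parameter names and numbers are hypothetical.

```python
import math


def die_edge_mm(num_pads, pad_pitch_mm, core_area_mm2, pad_height_mm):
    """Return (edge length, limiting factor) for a square die.

    Assumes pads are spread evenly over four sides and ignores corner cells.
    """
    pads_per_side = math.ceil(num_pads / 4)
    pad_limited_edge = pads_per_side * pad_pitch_mm
    core_limited_edge = math.sqrt(core_area_mm2) + 2 * pad_height_mm
    if pad_limited_edge > core_limited_edge:
        return pad_limited_edge, "pad-limited"
    return core_limited_edge, "core-limited"


print(die_edge_mm(200, 0.1, 9.0, 0.3))  # many pads, small core
print(die_edge_mm(40, 0.1, 9.0, 0.3))   # few pads, same core
```

With 200 pads the pad ring sets the edge length; with 40 pads the same core sets it instead.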
Special power pads are used for the positive supply, or VDD, power buses (or power
rails) and the ground or negative supply, VSS or GND.
Usually one set of VDD/VSS pads supplies one power ring that runs around the pad
ring and supplies power to the I/O pads only.
Another set of VDD/VSS pads connects to a second power ring that supplies the logic
core.
We sometimes call the I/O power dirty power since it has to supply large transient
currents to the output transistors. We keep dirty power separate to avoid injecting
noise into the internal-logic power (the clean power).
I/O pads also contain special circuits to protect against electrostatic discharge
(ESD). These circuits can withstand very short high-voltage (several kilovolt) pulses
that can be generated during human or machine handling.
Figure 16.13 (a) and (b) are magnified views of the southeast corner of our example chip
and show the different types of I/O cells. Figure 16.13 (c) shows a stagger-bond
arrangement using two rows of I/O pads. In this case the design rules for bond wires (the
spacing and the angle at which the bond wires leave the pads) become very important.
Figure 16.13 (d) shows an area-bump bonding arrangement (also known as flip-chip,
solder-bump, or C4, terms coined by IBM, who developed this technology [Masleid, 1991])
used, for example, with ball-grid array (BGA) packages.
FIGURE 16.13 Bonding pads. (a) This chip uses both pad-limited and core-limited
pads. (b) A hybrid corner pad. (c) A chip with stagger-bonded pads. (d) An area-bump
bonded chip (or flip-chip). The chip is turned upside down and solder bumps connect the
pads to the lead frame.
CLOCK PLANNING
Figure 16.16 (a) shows a clock spine (not to be confused with a channel spine) routing
scheme with all clock pins driven directly from the clock driver. MGAs and FPGAs often use
this fishbone type of clock distribution scheme.
FIGURE 16.16 Clock distribution. (a) A clock spine for a gate array. (b) A clock spine
for a cell-based ASIC (typical chips have thousands of clock nets).
(c) A clock spine is usually driven from one or more clock-driver cells. Delay in the
driver cell is a function of the number of stages and the ratio of output to input
capacitance for each stage (taper). (d) Clock latency and clock skew. We would like to
minimize both latency and skew.
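Latency and skew can be computed directly from the clock arrival times at the clock pins. The sketch below takes latency as the longest delay from the clock source to any pin and skew as the spread between the latest and earliest arrivals; the flip-flop names and times are hypothetical.

```python
def clock_latency_and_skew(arrival_ns):
    """arrival_ns maps each clock pin to its delay from the clock source.

    Latency is taken here as the longest source-to-pin delay; skew is the
    difference between the latest and the earliest arrival.
    """
    latest = max(arrival_ns.values())
    earliest = min(arrival_ns.values())
    return latest, latest - earliest


# Hypothetical arrival times (ns) at three flip-flop clock pins.
arrivals = {"ff1": 1.5, "ff2": 1.25, "ff3": 1.75}
latency, skew = clock_latency_and_skew(arrivals)
print(latency, skew)  # 1.75 0.5
```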
PLACEMENT
After completing a floorplan we can begin placement of the logic cells within the
flexible blocks. Placement is much more suited to automation than floorplanning. Thus we
shall need measurement techniques and algorithms.
FIGURE 16.18 Interconnect structure. (a) The two-level metal CBIC floorplan
shown in Figure 16.11b. (b) A channel from the flexible block A. This channel has a
channel height equal to the maximum channel density of 7 (there is room for seven
interconnects to run horizontally in m1). (c) A channel that uses OTC (over-the-cell)
routing in m2.
With two layers of metal, we route within the rectangular channels using the first
metal layer for horizontal routing, parallel to the channel spine, and the second metal layer
for the vertical direction (if there is a third metal layer it will normally run in the horizontal
direction again). The maximum number of horizontal interconnects that can be placed side
by side, parallel to the channel spine, is the channel capacity.
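Channel density, the quantity compared against channel capacity, is the largest number of horizontal interconnects that cross any one column of the channel. A minimal sketch, assuming each net occupies one horizontal span given by hypothetical column numbers:

```python
def channel_density(nets):
    """nets maps a net name to (left_column, right_column) of its horizontal span.

    Returns the maximum number of spans that overlap at any column.
    """
    events = []
    for left, right in nets.values():
        events.append((left, 1))        # span starts occupying a track
        events.append((right + 1, -1))  # track is free just past the right end
    density = current = 0
    for _, delta in sorted(events):     # at equal columns, frees sort before starts
        current += delta
        density = max(density, current)
    return density


# Hypothetical spans in one channel.
spans = {"n1": (0, 4), "n2": (2, 6), "n3": (5, 8), "n4": (7, 9), "n5": (3, 5)}
print(channel_density(spans))  # 3
```

The channel can be routed in the horizontal layer only if this density does not exceed the channel capacity.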
The most commonly used placement objectives are one or more of the following:
● Minimize the total estimated interconnect length
● Meet the timing requirements for critical nets
● Minimize the interconnect congestion
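One common way to estimate interconnect length before routing is the half-perimeter of the bounding box around a net's pins. The sketch below uses that measure as an assumption (other estimates exist); the cell names and coordinates are hypothetical.

```python
def half_perimeter(pins):
    """Half-perimeter of the smallest rectangle enclosing the (x, y) pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_estimated_length(nets, positions):
    """Sum the half-perimeter estimate over all nets of a placement."""
    return sum(half_perimeter([positions[cell] for cell in cells])
               for cells in nets.values())


# Hypothetical placement and netlist.
positions = {"A": (0, 0), "B": (3, 0), "C": (3, 2)}
nets = {"n1": ["A", "B"], "n2": ["A", "B", "C"]}
print(total_estimated_length(nets, positions))  # 3 + 5 = 8
```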
PLACEMENT ALGORITHMS
There are two classes of placement algorithms commonly used in commercial CAD
tools:
1. constructive placement
a. variations on the min-cut algorithm
b. eigenvalue method
2. iterative placement improvement.
Placement usually starts with a constructed solution and then improves it using an
iterative algorithm.
FIGURE 16.24 Min-cut placement. (a) Divide the chip into bins using a grid.
(b) Merge all connections to the center of each bin. (c) Make a cut and swap
logic cells between bins to minimize the cost of the cut. (d) Take the cut pieces and
throw out all the edges that are not inside the piece. (e) Repeat the process with a
new cut and continue until we reach the individual bins.
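Step (c), making a cut and swapping logic cells to reduce its cost, can be sketched as follows. This is a plain greedy pairwise swap chosen for brevity, not a production partitioning heuristic, and it covers only one cut rather than the recursive process; the netlist and cell names are hypothetical.

```python
def cut_cost(nets, left):
    """Number of nets that have cells on both sides of the cut."""
    return sum(1 for cells in nets
               if 0 < sum(c in left for c in cells) < len(cells))


def improve_cut(nets, left, right):
    """Swap one cell pair at a time across the cut while the cost keeps falling."""
    left, right = set(left), set(right)
    improved = True
    while improved:
        improved = False
        best = cut_cost(nets, left)
        for a in sorted(left):
            for b in sorted(right):
                trial = (left - {a}) | {b}
                if cut_cost(nets, trial) < best:
                    left, right = trial, (right - {b}) | {a}
                    improved = True
                    break
            if improved:
                break
    return left, right


# Hypothetical netlist (each tuple is one net) and an initial, poor partition.
nets = [("A", "B"), ("C", "D"), ("A", "B", "E"), ("C", "D", "F")]
start_left, start_right = {"A", "C", "E"}, {"B", "D", "F"}
final_left, final_right = improve_cut(nets, start_left, start_right)
print(cut_cost(nets, start_left), cut_cost(nets, final_left))  # 4 0
```

Swapping pairs keeps the two sides the same size, so the bins stay balanced while the number of cut nets drops.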
● The measurement criteria that decide whether to move the selected cells.
There are several interchange or iterative exchange methods that differ in their
selection and measurement criteria:
● pairwise interchange,
● force-directed interchange,
● force-directed relaxation, and
● force-directed pairwise relaxation.
FIGURE 16.26 Interchange. (a) Swapping the source logic cell with a destination logic
cell in pairwise interchange. (b) Sometimes we have to swap more than two logic cells
at a time to reach an optimum placement, but this is expensive in computation time.
Limiting the search to neighborhoods reduces the search time.
Logic cells within a distance ε of a logic cell form an ε-neighborhood. (c) A one-neighborhood. (d) A two-neighborhood.
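Pairwise interchange limited to a neighborhood can be sketched as below. The sketch assumes Manhattan distance for the neighborhood radius (called `eps` here), two-pin connections, and total Manhattan wire length as the measurement criterion; all names and coordinates are hypothetical.

```python
def neighborhood(cell, positions, eps):
    """Cells within Manhattan distance eps of `cell` (assumed distance measure)."""
    cx, cy = positions[cell]
    return [other for other, (x, y) in positions.items()
            if other != cell and abs(x - cx) + abs(y - cy) <= eps]


def wire_length(connections, positions):
    """Total Manhattan length of the two-pin connections."""
    return sum(abs(positions[a][0] - positions[b][0]) +
               abs(positions[a][1] - positions[b][1])
               for a, b in connections)


def pairwise_interchange(source, connections, positions, eps):
    """Try swapping `source` with each neighbour; keep the best improving swap."""
    positions = dict(positions)
    best_len = wire_length(connections, positions)
    best_dest = None
    for dest in neighborhood(source, positions, eps):
        positions[source], positions[dest] = positions[dest], positions[source]
        trial = wire_length(connections, positions)
        positions[source], positions[dest] = positions[dest], positions[source]
        if trial < best_len:
            best_len, best_dest = trial, dest
    if best_dest is not None:
        positions[source], positions[best_dest] = positions[best_dest], positions[source]
    return positions, best_len


# Hypothetical row of three cells with one connection from A to C.
start = {"A": (0, 0), "B": (1, 0), "C": (2, 0)}
placed, length = pairwise_interchange("A", [("A", "C")], start, eps=1)
print(placed, length)
```

A larger `eps` examines more candidate destinations per source cell, which is the trade between search time and placement quality described in the caption.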
FIGURE 16.27 Force-directed placement. (a) A network with nine logic cells.
(b) We make a grid (one logic cell per bin). (c) Forces are calculated as if springs were
attached to the centers of each logic cell for each connection. The two nets connecting
logic cells A and I correspond to two springs. (d) The forces are proportional to the
spring extensions.
Without external forces to counteract the pull of the springs between logic cells, the
network will collapse to a single point as it settles. An important part of force-directed
placement is fixing some of the logic cells in position. Normally ASIC designers use the I/O
pads or other external connections to act as anchor points or fixed seeds.
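The spring model and the role of the anchor points can be sketched with a simple relaxation: each free cell is moved repeatedly to the point where the spring forces on it cancel, which is the weighted mean of the cells it connects to. The pad and cell names, coordinates, and the sweep count are hypothetical, and the spring weight stands for the number of nets between two cells.

```python
def relax(positions, springs, fixed, sweeps=50):
    """Move each free cell to its zero-force point (weighted mean of neighbours)."""
    pos = dict(positions)
    for _ in range(sweeps):
        for cell in pos:
            if cell in fixed:
                continue
            total = sx = sy = 0.0
            for (a, b), weight in springs.items():
                if cell in (a, b):
                    ox, oy = pos[b if a == cell else a]
                    total += weight
                    sx += weight * ox
                    sy += weight * oy
            if total:
                pos[cell] = (sx / total, sy / total)
    return pos


# Hypothetical chain: pad P1, cells A and B, pad P2.
start = {"P1": (0.0, 0.0), "A": (1.0, 1.0), "B": (3.0, -1.0), "P2": (4.0, 0.0)}
springs = {("P1", "A"): 1.0, ("A", "B"): 1.0, ("B", "P2"): 1.0}

anchored = relax(start, springs, fixed={"P1", "P2"})
collapsed = relax(start, springs, fixed=set())
print(anchored["A"], anchored["B"])
print(collapsed)
```

With the pads fixed, A and B settle evenly between them; with nothing fixed, all four positions drift to a single point.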
We know that we can use net weights in our algorithms. The problem is to calculate
the weights. One method finds the n most critical paths (using a timing-analysis engine,
possibly in the synthesis tool). The net weights might then be the number of times each net
appears in this list. The problem with this approach is that as soon as we fix (for example)
the first 100 critical nets, suddenly another 200 become critical.
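The counting method for net weights can be sketched in a few lines. The critical-path list would come from a timing-analysis engine; here it is a hypothetical hand-written list of net names.

```python
from collections import Counter


def net_weights(critical_paths):
    """Weight of a net = number of listed critical paths that contain it."""
    weights = Counter()
    for path in critical_paths:
        weights.update(set(path))  # count a net once per path
    return weights


# Hypothetical list of the n = 3 most critical paths, each a list of nets.
paths = [["n1", "n2", "n3"], ["n2", "n4"], ["n2", "n3"]]
weights = net_weights(paths)
print(weights.most_common())
```

The weights depend entirely on which paths are in the list, so they have to be recomputed whenever fixing one set of nets makes another set critical.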
FIGURE 16.29 The zero-slack algorithm. (a) The circuit with no net delays. (b) The
zero-slack algorithm adds net delays (at the outputs of each gate, equivalent to increasing
the gate delay) to reduce the slack times to zero.
With the zero-slack algorithm we simplify but overconstrain the problem. For
example, we might be able to do a better job by making some nets a little longer than the
slack indicates if we can tighten up other nets. What we would really like to do is deal with
paths such as the critical path shown in Figure 16.29 (a) and not just nets. Path-based
algorithms have been proposed to do this, but they are complex and not all commercial
tools have this capability.
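The slack times that the zero-slack algorithm works with can be computed from arrival and required times. The sketch below covers only this slack bookkeeping with no net delays, as in part (a) of the figure; it does not implement the step that distributes slack as net delay. The gate names, delays, and required time are hypothetical, and gates are assumed to be listed in topological order with every gate reaching an output.

```python
def slacks(gate_delay, fanin, required_at_outputs):
    """Return slack = required time - arrival time at each gate output.

    gate_delay must list gates in topological order (inputs before outputs).
    """
    arrival = {}
    for g in gate_delay:                      # forward pass: latest arrival
        arrival[g] = gate_delay[g] + max(
            (arrival[p] for p in fanin.get(g, [])), default=0.0)
    required = dict(required_at_outputs)
    for g in reversed(list(gate_delay)):      # backward pass: earliest required
        for p in fanin.get(g, []):
            required[p] = min(required.get(p, float("inf")),
                              required[g] - gate_delay[g])
    return {g: required[g] - arrival[g] for g in gate_delay}


# Hypothetical circuit: gates A and B both drive gate C.
delays = {"A": 1.0, "B": 2.0, "C": 3.0}
fanin = {"C": ["A", "B"]}
result = slacks(delays, fanin, {"C": 8.0})
print(result)  # {'A': 4.0, 'B': 3.0, 'C': 3.0}
```

A positive slack on a gate output is the budget that could be converted into net delay at that output.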
ROUTING
Once the designer has floorplanned a chip and the logic cells within the flexible
blocks have been placed, it is time to make the connections by routing the chip.
Routing is usually split into two steps:
1. global routing
2. detailed routing
GLOBAL ROUTING
The details of global routing differ slightly between cell-based ASICs, gate arrays,
and FPGAs, but the principles are the same in each case. A global router does not make any
connections; it just plans them. We typically global route the whole chip (or large pieces if it
is a large chip) before detail routing the whole chip (or the pieces).
● Maximize the probability that the detailed router can complete the routing.
[Diagram: global routing methods divide into sequential routing and hierarchical routing.]
One approach to global routing takes each net in turn and calculates the shortest path
using tree-on-graph algorithms with the added restriction of using the available channels.
This process is known as sequential routing.
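For a net with only two terminals, finding the shortest path through the available channels reduces to a shortest-path search on a graph whose edges are channels weighted by length. The sketch below uses Dijkstra's algorithm for that two-terminal case; it is not a general tree-on-graph method for multi-terminal nets, and the junction names and channel lengths are hypothetical.

```python
import heapq


def shortest_channel_path(graph, source, target):
    """graph[u] = {v: channel_length}. Returns (length, path) or (inf, [])."""
    queue = [(0, source, [source])]
    seen = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == target:
            return dist, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, length in graph[node].items():
            if nxt not in seen:
                heapq.heappush(queue, (dist + length, nxt, path + [nxt]))
    return float("inf"), []


# Hypothetical channel graph: nodes are channel junctions, weights are lengths.
graph = {
    "J1": {"J2": 3, "J3": 6},
    "J2": {"J1": 3, "J3": 2},
    "J3": {"J1": 6, "J2": 2, "J4": 1},
    "J4": {"J3": 1},
}
print(shortest_channel_path(graph, "J1", "J4"))  # (6, ['J1', 'J2', 'J3', 'J4'])
```

A sequential router would repeat this kind of search net by net, which is why the order of processing can matter once channel crowding is taken into account.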
There are two different ways that a global router normally handles this problem.
Using order-independent routing, a global router proceeds by routing each net, ignoring
how crowded the channels are. Whether a particular net is processed first or last does not
matter; the channel assignment will be the same. In order-independent routing, after all the
interconnects are assigned to channels, the global router returns to those channels that are
the most crowded and reassigns some interconnects to other, less crowded, channels.
Starting at the whole chip, or highest level, and proceeding down to the logic cells is
the top-down approach. The bottom-up approach starts at the lowest level of hierarchy and
globally routes the smallest areas first.
FIGURE 17.4 Global routing for a cell-based ASIC formulated as a graph problem. (a) A
cell-based ASIC with numbered channels. (b) The channels form the edges of a graph. (c)
The channel-intersection graph. Each channel corresponds to an edge on a graph whose
weight corresponds to the channel length.
Figure 17.5 shows an example of global routing for a net with five terminals, labeled A1
through F1, for the cell-based ASIC shown in Figure 17.4. If a designer wishes to use
minimum total interconnect path length as an objective, the global router finds the minimum-
length tree shown in Figure 17.5 (b). This tree determines the channels the interconnects will
use.
FIGURE 17.5 Finding paths in global routing. (a) A cell-based ASIC (from Figure 17.4)
showing a single net with a fanout of four (five terminals). We have to order the numbered
channels to complete the interconnect path for terminals A1 through F1. (b) The terminals
are projected to the center of the nearest channel, forming a graph. A minimum-length
tree for the net that uses the channels and takes into account the channel capacities. (c)
The minimum-length tree does not necessarily correspond to minimum delay. If we wish to
minimize the delay from terminal A1 to D1, a different tree might be better.
BACK-ANNOTATION
After global routing is complete it is possible to accurately predict what the length
of each interconnect in every net will be after detailed routing, probably to within 5 percent.
The global router can give us not just an estimate of the total net length (which was all we
knew at the placement stage), but the resistance and capacitance of each path in each net.
This RC information is used to calculate net delays. We can back-annotate this net delay
information to the synthesis tool for in-place optimization or to a timing verifier to make
sure there are no timing surprises. Differences in timing predictions at this point arise due
to the different ways in which the placement algorithms estimate the paths and the way the
global router actually builds the paths.
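Turning the RC information for a path into a net delay can be sketched with the Elmore delay, a common first-order model that is used here as an assumption (the notes do not name a specific delay model). Each path is treated as a chain of segments with the segment capacitance lumped at its far end; the resistance and capacitance values are hypothetical.

```python
def elmore_delay_ns(segments, driver_resistance_kohm=0.0):
    """segments: list of (R_kohm, C_pf) from driver to sink.

    Each capacitance is charged through all the resistance upstream of it,
    so the delay is the sum of (upstream R) * C over the segments
    (kOhm * pF gives ns).
    """
    delay = 0.0
    upstream_r = driver_resistance_kohm
    for r, c in segments:
        upstream_r += r
        delay += upstream_r * c
    return delay


# Hypothetical two-segment path extracted after global routing.
path = [(0.5, 0.2), (0.5, 0.2)]
print(elmore_delay_ns(path, driver_resistance_kohm=1.0))  # about 0.7 ns
```

Delays computed this way per path are the kind of values that would be back-annotated to the synthesis tool or timing verifier.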