ASIC Module - 3

Module 3

Low Level design entry


SCHEMATIC ENTRY:
• graphical design entry
• transforms an idea to a computer file
• an “old” method that periodically regains popularity
• schematic sheets
• frame
• border
• “spades” and “shovels”
• component or device
• low-cost
HIERARCHICAL DESIGN
Schematic example showing hierarchical
design
(a) The schematic of a half-adder, the
subschematic of cell HADD
(b) A schematic symbol for the half adder
(c) A schematic that uses the half-adder cell
(d) The hierarchy of cell HADD

• Use of hierarchy to hide complexity


• hierarchical design
• subschematic
• child
• parent
• flat design
• flat netlist
THE CELL LIBRARY:
• Modules (cells, gates, macros, books)
• Schematic library (vendor-dependent)
• Retargeting
• Porting a design
• Hard macro (fixed placement and routing)
• Soft macro (defines connections only; placement is flexible)
NAMES:
• Cell name
• Cell instance
• Instance name
• Icon (picture)
• Symbol
• Name spaces
• Case sensitivity
• Hierarchical names
Schematic Icons and Symbols:
• derived icon
• derived symbol
• subcell
• vectored instance
• cardinality
A 4-bit latch:
(a) drawn as a flat schematic from gate-level primitives
(b) drawn as four instances of the cell symbol DLAT
(c) drawn using a vectored instance of the DLAT cell symbol with cardinality of 4
(d) drawn using a new cell symbol with cell name FourBit
NETS:
• local nets
• external nets
• delimiter
• Verilog and VHDL naming
Schematic Entry for ASICs and PCBs
• In PCB design:
• Components and wires
• An example of a component is the TTL SN74LS00N quad 2-input
NAND
• Each NAND gate is a component part
• Reference designator (a component label or unique name
attribute, such as R99)
• Pin number
• Part assignment (used in PCB design, not in ASIC design)
• In ASIC design a hierarchical naming scheme is used.
Eg: 1) cpu.alu.adder.and01 or
2) motherboard:cache:RAM4:Readbit4:Inverter2
Connections
• Terminals, also termed pins, connectors, or
signals
• Wire segments or nets
• Bus or buses (not busses): a group of related signals
• Bundle or array: a group of unrelated buses
• Breakout, ripper (EDIF), or extractor
• Swizzle (re-arranging the bits on a bus)
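A swizzle is easy to picture as a permutation of bit positions. The following is a toy sketch (the function name and list-based representation are purely illustrative; in a schematic editor this is done graphically with breakout/ripper symbols):

```python
def swizzle(bus, order):
    """Re-arrange the bits of a bus according to a permutation.

    bus   -- list of bit values, index 0 = bit 0
    order -- order[i] names the source bit for output bit i
    """
    return [bus[i] for i in order]

# Reverse a 4-bit bus: output bit 0 takes input bit 3, and so on.
print(swizzle([1, 0, 1, 1], [3, 2, 1, 0]))  # -> [1, 1, 0, 1]
```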
Vectored Instances and Buses

A 16-bit latch:
(a) drawn as four instances of cell FourBit
(b) drawn as a cell named SixteenBit
(c) drawn as four multiple instances of cell FourBit
Edit-in-Place:
• edit-in-place
• alias
• dictionary of names
Attributes
• name
• identifier or label
• We can attach an attribute or property that
signifies some aspect of a component, cell
instance, net, or connector.
• NFS filenames (28 characters)
Netlist Screener
• A schematic or netlist screener catches simple errors at
an early stage:
unconnected cell inputs
unconnected cell outputs
nets not driven by any cell
too many nets driven by one cell
nets driven by more than one cell
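Most of these checks are easy to sketch. The toy screener below (the data layout and names are invented for illustration, not taken from any real netlist format) flags unconnected pins, undriven nets, and multiply driven nets; a real screener would also apply a fanout limit for the "too many nets driven by one cell" check:

```python
# A toy netlist screener: each cell lists its input and output pins,
# and each pin names the net it connects to (None = unconnected).

def screen(cells):
    """Return a list of warning strings for common connectivity errors."""
    warnings = []
    drivers = {}   # net -> number of cell outputs driving it
    loads = {}     # net -> number of cell inputs loading it
    for name, cell in cells.items():
        for pin, net in cell["inputs"].items():
            if net is None:
                warnings.append(f"unconnected input {name}.{pin}")
            else:
                loads[net] = loads.get(net, 0) + 1
        for pin, net in cell["outputs"].items():
            if net is None:
                warnings.append(f"unconnected output {name}.{pin}")
            else:
                drivers[net] = drivers.get(net, 0) + 1
    for net, n in drivers.items():
        if n > 1:
            warnings.append(f"net {net} driven by {n} cells")
    for net in loads:
        if net not in drivers:
            warnings.append(f"net {net} not driven by any cell")
    return warnings

cells = {
    "U1": {"inputs": {"A": "n1", "B": None}, "outputs": {"Z": "n2"}},
    "U2": {"inputs": {"A": "n2"}, "outputs": {"Z": "n3"}},
}
for w in screen(cells):
    print(w)   # flags the floating input U1.B and the undriven net n1
```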
• handle (to find components)
• snap to grid
• wildcard matching
• automatic naming
• datapath (multiple instances)
• vectored cell instance
• vectored instance
• cell cardinality
• cardinality
• terminal polarity
• terminal direction
• fanout
• fanin
• standard load
Similarity between ASIC and town
planning:
• A microelectronic system (or system on a
chip) is the town and ASICs (or system
blocks ) are the buildings
• System partitioning corresponds to town
planning.
• Floorplanning is the architect’s job.
• Placement is done by the builder.
• Routing is done by the electrician.
Physical Design
• Divide and conquer
• System partitioning
• Floor-planning (or chip planning)
• Placement
• Routing
• Global routing (or loose routing)
• Local routing (or detailed routing).
CAD Tools
• goals and objectives for each physical design step:
System partitioning:
• Goal: Partition a system into a number of ASICs.
• Objectives: Minimize the number of external connections between
the ASICs. Keep each ASIC smaller than a maximum size.
Floorplanning:
• Goal: Calculate the sizes of all the blocks and assign them locations.
• Objective: Keep the highly connected blocks physically close to each
other.
Placement:
• Goal: Assign the interconnect areas and the location of all the logic
cells within the flexible blocks.
• Objectives: Minimize the ASIC area and the interconnect density.
Global routing:
• Goal: Determine the location of all the
interconnect.
• Objective: Minimize the total interconnect area
used.
Detailed routing:
• Goal: Completely route all the interconnect on
the chip.
• Objective: Minimize the total interconnect length
used.
Methods and Algorithms
• Methods or algorithms are exact or heuristic (the term algorithm is usually
reserved for a method that always gives a solution).
• A method has complexity O(f(n)) (order f(n)) if there are constants k and n0 such
that the running time T(n) < k·f(n) for all n > n0,
where n is a measure of the size of the problem.
• Usually n is very large for ASICs.
• The notation does not specify the units of time.
• O(n^2) in nanoseconds is better than O(n) in seconds for all n below
about 10^9, which covers most practical ASIC problems.
• O(f(n)) is an upper limit on the running time of an algorithm; a
practical algorithm might take less time than this.
• The constants k and n0 have to be selected carefully, as they can hide the
overhead in an implementation and the dependence on n, up to large
values of n.
• algorithms may be constant, logarithmic, linear, or quadratic in time
1) f(n) = constant (constant in time): the steps in the algorithm are repeated only once or a few times.
2) f(n) = log n (logarithmic in time): this happens when a big problem is transformed into a
smaller one.
3) f(n) = n (linear in time): this is a good situation for an ASIC algorithm that works with n
objects.
4) f(n) = n log n: a big problem is divided into a number of small problems, each
of which is solved independently.
5) f(n) = n^2 (quadratic in time): only practical for small ASIC problems.
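The nanoseconds-versus-seconds point can be made concrete with a short sketch (the time constants, 1 ns per step versus 1 s per step, are chosen only for illustration):

```python
# Comparing T1(n) = n^2 nanoseconds with T2(n) = n seconds.
# The quadratic method with the tiny time constant wins until
# n^2 * 1e-9 > n, i.e. until n exceeds 1e9.

def t_quadratic(n):
    """Running time in seconds of an O(n^2) method at 1 ns per step."""
    return n * n * 1e-9

def t_linear(n):
    """Running time in seconds of an O(n) method at 1 s per step."""
    return float(n)

for n in (10**4, 10**8, 10**10):
    faster = "quadratic" if t_quadratic(n) < t_linear(n) else "linear"
    print(n, faster)
```

For n = 10^4 and 10^8 the quadratic method is faster; only at n = 10^10 does the linear method win.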

• many VLSI problems are NP-complete


• we need metrics: a measurement function or objective function, a cost function or
gain function, and possibly constraints.
System Partitioning:
• CAD can help us make decisions for system
partitioning.
• What we can’t answer: “What is the cheapest way to build my
system?”
• What we can answer: “How do I split this circuit into pieces that
will fit on a chip?”
• The Sun Microsystems SPARCstation 1 is partitioned into:
– 9 ASICs
– 3 memory subsystems (CPU cache, RAM, memory cache)
– 6 ASSPs for I/O
– An ASSP for the time-of-day clock
– An EEPROM
– A video memory subsystem
– An A-to-D/DAC data-conversion ASSP
• Some of the partitioning of the system is based on
whether to use custom ASICs or ASSPs.
• Some of these design decisions are based on intangible
issues such as:
– Time to market
– Previous experience with a technology
– Ability to reuse part of the design from a previous product

• No tool can help us make these decisions, as the goals and
objectives are poorly defined and finding a way to
measure these factors is also difficult.
• Where:
• PGA - pin grid array
• CBIC - LSI Logic cell-based ASIC
• PQFP - plastic quad flat pack
• PLCC - plastic leaded chip carrier
• GA - LSI Logic channelless gate array
• FC - full custom
Estimating ASIC size:
• Estimate the die size of a 40 k-gate ASIC in a 0.35 μm gate-array,
three-level-metal process with 166 I/O pads.
• Die size includes core size and I/O size.
• Core size (logic and routing) = (gates / gate density) × routing factor
× (1 / gate-array utilization)
– Gate density = standard-cell density × gate-array utilization
• I/O size = a^2, where a is the length of one side of the die.
– One side of the die = number of I/O pads on a side × I/O pad pitch

• 1 μm (micron) = 0.0393701 mil
• (1 mil = one-thousandth of an inch)
• For an ASIC with a minimum feature size of 0.35 μm:
• λ = minimum feature size / 2 = 0.175 μm
• Using the data table, the standard-cell density = 5 × 10^-4 gate/λ^2.
• Gate density = 0.35 μm standard-cell density × (0.8 to 0.9) (gate-array utilization)
• = (5 × 10^-4 × 0.8) to (5 × 10^-4 × 0.9)
• = 4 × 10^-4 to 4.5 × 10^-4 gate/λ^2.
• This gives the core size as:
• (4 × 10^4 gates / gate density) × routing factor × (1 / gate-array utilization)
• = (4 × 10^4) / (4 × 10^-4 to 4.5 × 10^-4) × (1 to 2) × 1/(0.8 to 0.9)
• = 10^8 to 2.5 × 10^8 λ^2
• = 4840 to 11,900 mil^2.
• Now we need to add (0.175/0.5) × 2 × (15 to 20)
• = 10.5 to 21 mil/side for the pad heights.
• With a pad pitch of 5 mil and roughly 166/4 ≈ 42 I/O
pads on one side, we need a die at least
5 × 42 = 210 mil on a side for the I/Os.
• Of this area only 1.19 × 10^4 / 4.41 × 10^4 ≈ 27%
is used for core logic.
• This is a severely pad-limited design, so we should rethink the
partitioning of the system.
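The arithmetic above can be checked with a short script. The constants are taken from the worked example; the small difference from the quoted 4840 mil^2 low estimate comes from rounding:

```python
# Checking the die-size arithmetic above (40 k-gate, 0.35 um gate
# array, 166 I/O pads).
MIL_PER_UM = 0.0393701          # 1 um = 0.0393701 mil

lam = 0.35 / 2                  # lambda = 0.175 um
gates = 40e3
gate_density = (4e-4, 4.5e-4)   # gate/lambda^2 (after 80-90% utilization)
routing_factor = (1, 2)
utilization = (0.8, 0.9)

# Core area, low and high estimates, in lambda^2
core_lo = gates / gate_density[1] * routing_factor[0] / utilization[1]
core_hi = gates / gate_density[0] * routing_factor[1] / utilization[0]

# Convert lambda^2 -> mil^2
lam2_to_mil2 = (lam * MIL_PER_UM) ** 2
print(core_lo * lam2_to_mil2, core_hi * lam2_to_mil2)  # roughly 4700 to 11900 mil^2

# I/O-limited die: 166/4 = 42 pads per side at a 5 mil pitch
side = 42 * 5                   # 210 mil per side
print(side * side)              # 44100 mil^2 of die area
print(core_hi * lam2_to_mil2 / (side * side))  # ~0.27: only ~27% is core logic
```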
• The table shows typical areas for data-path
elements; we can also use these data-path elements for
floating-point arithmetic.
• Since these elements are large, you should not use
floating-point arithmetic unless you have to:
– A leading-one detector with a barrel shifter normalizes a
mantissa.
– A priority encoder corrects exponents after mantissa
normalization.
– A denormalizing barrel shifter aligns mantissas.
– A normalizing barrel shifter with a leading-one detector
normalizes mantissa subtraction.
• Most of the elements have an area per bit that
depends on the number of bits in the data path:
– Sometimes it varies linearly (multipliers and barrel
shifters).
– For some data-path elements it varies logarithmically
(base 2) with the data-path width (the leading-one,
all-ones, and all-zeros detectors).
– In some elements you might expect a
dependency on data-path width, but it is small
(comparators).
• The exact size depends on the architecture (e.g.
carry-save, carry-select, etc.).
• The area figures also exclude the routing
between data-path elements, which is difficult to
predict, as it depends on:
– the number and size of the data-path elements,
– their type,
– and how much of the logic is random and how
much is data path.
• Figure (a) above shows the typical size of an SRAM constructed
on an ASIC.
• These figures are based on the use of RAM compilers in a
standard CMOS ASIC process, typically using a six-transistor cell
(as opposed to building memory from flip-flops or latches).
• The actual size of the memory depends on:
– 1) the required access time,
– 2) the use of synchronous or asynchronous read and write,
– 3) the number and types of ports (read/write),
– 4) the use of special design rules,
– 5) the number of interconnect layers available,
– 6) the RAM architecture (the number of devices in the RAM cell),
– 7) the process technology (active pull-up devices or pull-up resistors).
• The maximum size of the SRAM in the figure above is 32 Kbit,
which occupies approximately 6.0 × 10^7 λ^2.
• In a 0.5 μm process (with λ = 0.25 μm) the area of
the 32 Kbit SRAM is
6.0 × 10^7 × 0.25 × 0.25 = 3.75 × 10^6 μm^2.
• If we need a larger SRAM than this, we need
to consult our ASIC vendor to find out how
to implement a large on-chip memory.
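As a quick check of the λ^2-to-μm^2 conversion used above:

```python
# Area of the 32 Kbit SRAM from the example, converted from lambda^2
# to um^2 for a 0.5 um process (lambda = 0.25 um).
sram_area_lambda2 = 6.0e7       # area in lambda^2, from the figure
lam_um = 0.25                   # lambda in um
area_um2 = sram_area_lambda2 * lam_um ** 2
print(area_um2)                 # 3750000.0, i.e. 3.75e6 um^2
```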
• Figure (b) shows typical sizes for
multipliers.
• Again, the actual multiplier size will depend on the
architecture, the process technology, and the
design rules.
A simple partitioning example:
• Consider 12 logic cells, labeled A-L, and 12 nets, labeled 1-12.
• At this level each logic cell is a large circuit block and might
be a RAM, a ROM, an ALU, and so on.
• Each net might be a bus or a single wire.
• But we assume each net is a single wire, to keep things simple
and to give every net equal weight.
• The objectives of partitioning are as follows:
– Use no more than 3 ASICs.
– Each ASIC is to contain no more than 4 logic cells.
– Use the minimum number of external connections for each
ASIC.
– Use the minimum total number of external connections.
• The solution can be found manually, but for large
designs the assistance of a CAD tool is required.
• Splitting a network into several pieces is a network
partitioning problem.
• Here we discuss two algorithms to solve this
problem:
– Constructive partitioning (uses a set of rules to find a
solution).
– Iterative partitioning improvement (takes an existing
solution and tries to improve it).
– We also use these partitioning algorithms to solve
floorplanning and placement problems.
Constructive partitioning:
• The most common constructive partitioning uses seed-growth
or cluster-growth algorithms.
• A simple seed-growth algorithm uses the following steps:
1. Start a new partition with a seed logic cell.
2. Consider all the logic cells that are not yet in a partition. Select
each of these logic cells in turn.
3. Calculate the gain function g(m), which measures the benefit of
adding logic cell m to the current partition. One measure of
gain is the number of connections between logic cell m and the
current partition.
4. Add the logic cell with the highest gain g(m) to the current
partition.
5. Repeat the process from step 2. If you reach the limit of logic
cells in a partition, start again at step 1.
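The steps above can be sketched in a few lines of Python. This is a toy illustration only: the netlist representation (each cell mapped to the set of cells it touches), the seed choice, and the tie-breaking are my own simplifications, and gain is counted as connections rather than nets:

```python
def seed_growth(netlist, max_cells):
    """Grow partitions one cell at a time using a connection-count gain."""
    unplaced = set(netlist)
    partitions = []
    while unplaced:
        # Step 1: start a new partition with a seed logic cell
        # (here: the unplaced cell with the most connections).
        seed = max(unplaced, key=lambda c: len(netlist[c]))
        part = {seed}
        unplaced.discard(seed)
        while unplaced and len(part) < max_cells:
            # Steps 2-3: gain g(m) = connections between m and the partition.
            gain = {m: len(netlist[m] & part) for m in unplaced}
            best = max(gain, key=gain.get)
            if gain[best] == 0:
                break               # nothing connects; start a new seed
            # Step 4: add the highest-gain cell.
            part.add(best)
            unplaced.discard(best)
        partitions.append(part)
    return partitions

netlist = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E"}, "E": {"D", "F"}, "F": {"E"},
}
print(seed_growth(netlist, 3))   # grows {A, B, C}, then {D, E, F}
```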
Constructive partitioning:
• We can choose different gain functions according to our objectives
(but we have to be careful to distinguish between connections and
nets).
• The algorithm starts with a seed logic cell.
• The logic cell with the most nets is a good choice as the seed logic
cell.
• We can also use a set of logic cells, termed a cluster, as the seed.
• Some use the term clique (from graph theory).
• A clique of a graph is a subset of nodes where each pair of nodes
is connected by an edge.
• In some tools we may use schematic pages as a starting point for
partitioning.
• If we use a high-level design language, we can use a Verilog
module or a VHDL entity/architecture as the seed.
Iterative partitioning improvement:
• The most common iterative-improvement
algorithms are based on:
interchange and
group migration.
Interchange: the process of interchanging
(swapping) logic cells in an effort to improve the
partition is an interchange method.
If the swap improves the partition, we accept the
trial interchange; otherwise we select a new set of
logic cells to swap.
• There is a limit to what we can achieve with a partitioning algorithm
based on simple interchange.
• The figure here shows a partition constructed using logic cell C as the seed.
It is difficult to get from this local minimum, with seven external
connections (2, 3, 5, 7, 9, 11, 12), to the optimum solution of Fig. 1.
Improvement in Partitioning
• To get from the solution shown in Fig. 2 to the solution of Fig.
1, which has the minimum number of external connections,
requires a complicated swap:
• the three pairs D and F, J and K, and C and L need to be swapped.
Iterative Partitioning Improvement
• Algorithms based on the interchange method and the group migration method.
• Interchange method (swapping a single logic cell): if the swap improves the
partition, accept the trial interchange; otherwise select a new set of logic cells to swap.
• Example: greedy algorithm
– It considers only one change at a time.
– It rejects a change immediately if it is not an improvement.
– It accepts a move only if it provides immediate benefit.
– Such a search gets stuck at a local minimum.
• Group migration (swapping a group of logic cells): group migration consists of
swapping groups of logic cells between partitions.
• The group migration algorithms:
• Advantage: better than simple interchange methods at improving a solution.
• Disadvantage: more complex.
• Example: the Kernighan-Lin (K-L) algorithm.
• Min-cut problem: dividing a graph into two pieces, minimizing the nets (edges) that
are cut.
The Kernighan-Lin Algorithm

If we assign a cost to each edge of the network graph, we can define a cost matrix C = cij,
where cij = cji and cii = 0.
If all connections are equal in importance, the costs are 1 or 0, and in this
special case we call the matrix a connectivity matrix.
Costs higher than 1 can represent the number of wires in a bus, multiple connections to a
single logic cell, or nets that we need to keep close for timing reasons.
Each external edge may be weighted by a cost,
and our objective is to reduce the total cost.
• Total external cost (cut cost, or cut weight):
W = the sum of cab over all a in A and b in B.
• External edge cost of a node a in A:
Ea = the sum of cab over all b in B;
for the example, E1 = 1 and E3 = 0.
• Internal edge cost of a node a in A:
Ia = the sum of cab over all b in A;
for the example, I1 = 0 and I3 = 2.
The cost difference: Da = Ea - Ia; D1 = 1 and D3 = -2.
• The reduction in cut weight (the gain) from swapping nodes a and b is:
gain = Da + Db - 2cab
(the 2cab term accounts for an edge between a and b, which remains cut after the swap).
• The K-L algorithm finds a group of node pairs to swap that
increases the gain, even though swapping individual node pairs
from that group might decrease the gain.
• The steps of the K-L algorithm are:
1. Find two nodes, ai from A and bi from B, so that the gain from
swapping them is a maximum. The gain gi is
gi = Dai + Dbi - 2caibi.
2. Next, pretend swap ai and bi, even if the gain gi is zero or negative,
and do not consider ai and bi eligible for being swapped again.
3. Repeat steps 1 and 2 a total of m times, until all the nodes of A and B
have been pretend swapped. We are back where we started, but we
have ordered the pairs of nodes in A and B according to the gain from
interchanging those pairs.
4. Now we can choose which nodes we shall actually swap. Suppose
we only swap the first n pairs of nodes that we found in the
preceding process. In other words, we swap nodes X = a1, a2, ...,
an from A with nodes Y = b1, b2, ..., bn from B. The total gain
would be
Gn = g1 + g2 + ... + gn.
5. We now choose n corresponding to the maximum value of Gn.
• If the maximum value of Gn > 0, swap the sets of nodes X
and Y, and thus reduce the cut weight by Gn.
• Use this new partitioning to start the process again at the first
step.
• If the maximum value of Gn = 0, we cannot improve the
current partitioning and we stop.
• We have found a locally optimum solution.
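A single pass of the pretend-swap procedure can be sketched in Python. This is a minimal illustration, not production code: the function name `kl_pass` and the data layout (a 0/1 connectivity matrix as a list of lists, partitions as node lists) are my own choices, and D is recomputed from scratch at each step instead of being updated incrementally as a real implementation would:

```python
def kl_pass(c, A, B):
    """One K-L pass: return the best prefix of pretend swaps and its gain."""
    A, B = set(A), set(B)
    locked = set()
    gains, pairs = [], []

    def D(x, own, other):
        E = sum(c[x][y] for y in other)   # external edge cost Ex
        I = sum(c[x][y] for y in own)     # internal edge cost Ix
        return E - I                      # cost difference Dx = Ex - Ix

    for _ in range(min(len(A), len(B))):
        # Step 1: free pair (a, b) with maximum gain g = Da + Db - 2*c[a][b]
        g, a, b = max(((D(a, A, B) + D(b, B, A) - 2 * c[a][b], a, b)
                       for a in A - locked for b in B - locked),
                      key=lambda t: t[0])
        # Step 2: pretend swap a and b, even if g <= 0, and lock them
        A.remove(a); B.remove(b); A.add(b); B.add(a)
        locked |= {a, b}
        gains.append(g)
        pairs.append((a, b))

    # Steps 4-5: pick the n that maximizes the partial sum Gn
    best_G, best_n, partial = 0, 0, 0
    for i, g in enumerate(gains, 1):
        partial += g
        if partial > best_G:
            best_G, best_n = partial, i
    return pairs[:best_n], best_G

# Two triangles (nodes 0-2 and 3-5) joined by the single edge 2-3.
c = [[0] * 6 for _ in range(6)]
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    c[i][j] = c[j][i] = 1

# Start from the poor partition {0, 1, 4} / {2, 3, 5} (cut weight 4).
print(kl_pass(c, [0, 1, 4], [2, 3, 5]))
```

On this small example the pass selects the single swap of nodes 4 and 2, with gain 3, taking the cut weight from four edges down to one.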
Fiduccia-Mattheyses (F-M) Algorithm:
• Addresses the difference between nets and edges.
• Reduces the computational time.
Key features of F-M:
• Base logic cell: only one logic cell moves at a time.
– The base logic cell is chosen to maintain balance between partitions, in order to
stop the algorithm from moving all the logic cells into one large partition.
– Balance: the ratio of the total logic cell size in one partition to the total logic cell
size in the other. Altering the balance allows us to vary the sizes of the
partitions.
• Critical nets: used to simplify the gain calculations.
– A net is a critical net if it has an attached logic cell that, when swapped,
changes the number of nets cut.
– It is only necessary to recalculate the gains of logic cells on critical nets that
are attached to the base logic cell.
• The logic cells that are free to move are stored in doubly linked lists. The lists are
sorted according to gain. This allows the logic cell with maximum gain to be found
quickly.
• Reduced computation time: the run time increases only slightly more than linearly
with the number of logic cells in the network.
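The gain-sorted free-cell lists can be sketched with a bucket structure, one bucket per possible gain value. This is an illustrative sketch only (the class and method names are invented here); real implementations use a doubly linked list per bucket plus a max-gain pointer so that insert, delete, and find-best all stay constant time:

```python
class GainBuckets:
    """Free cells kept in buckets indexed by gain."""

    def __init__(self, max_gain):
        # A cell attached to d nets has gain between -d and +d, so
        # max_gain is the maximum cell degree.
        self.max_gain = max_gain
        self.buckets = {g: [] for g in range(-max_gain, max_gain + 1)}
        self.gain_of = {}

    def insert(self, cell, gain):
        self.gain_of[cell] = gain
        self.buckets[gain].append(cell)

    def update(self, cell, new_gain):
        # Re-bucket a neighbour of a moved cell (cost ~ its degree).
        self.buckets[self.gain_of[cell]].remove(cell)
        self.insert(cell, new_gain)

    def pop_best(self):
        """Remove and return a free cell with the highest gain."""
        for g in range(self.max_gain, -self.max_gain - 1, -1):
            if self.buckets[g]:
                cell = self.buckets[g].pop()
                del self.gain_of[cell]
                return cell, g
        return None

b = GainBuckets(max_gain=3)
b.insert("A", 1); b.insert("B", 3); b.insert("C", -2)
b.update("B", 0)               # a touched cell is re-bucketed
print(b.pop_best())            # -> ('A', 1), now the highest-gain free cell
```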
Features of the F-M Algorithm:
A modification of the K-L algorithm:
– Can handle non-uniform vertex weights (areas).
– Allows unbalanced partitions.
– Extends to handle hypergraphs.
– Uses a clever way to select the vertices to move, so it runs
much faster.
Ideas of the F-M algorithm:
• Similar to K-L:
– Works in passes.
– Locks vertices after they are moved.
– Actually moves only those vertices up to the
maximum partial sum of gain.
• Different from K-L:
– Does not exchange pairs of vertices;
it moves only one vertex at a time.
– Uses a gain-bucket data structure.
F-M Partitioning:
• Moves are made based on object gain.
• Object gain: the change in cut crossings that
will occur if an object is moved from its current partition
into the other partition.
– Each object is assigned a gain.
– Objects are put into a sorted gain list.
– The object with the highest gain from the larger of the two
sides is selected and moved.
– The moved object is "locked".
– The gains of "touched" objects are
recomputed.
– The gain lists are re-sorted.
Time Complexity of F-M
• For each pass:
– Constant time to find the best vertex to move.
– After each move, the time to update the gain buckets
is proportional to the degree of the vertex moved.
– The total time is O(p), where p is the total number of
pins.
• The number of passes is usually small.
Overcoming problems in K-L using the F-M algorithm
• To generate unequal partitions:
– In the K-L algorithm, dummy logic cells with no connections
are introduced.
– In the F-M algorithm, the partition size is adjusted according
to the balance parameter.
• To fix logic cells in place during partitioning:
– In the F-M algorithm, those logic cells are simply not
considered as base logic cells.
Look-ahead Algorithm:

• Why look ahead?
• The K-L and F-M algorithms consider only the immediate gain to be
made by moving a node.
• When there is a tie between nodes with equal gain (as often
happens), there is no mechanism to make the best choice.
Algorithm
• The gain for the initial move is called the first-level gain.
• Gains from subsequent moves are then second-level and higher
gains.
• Define a gain vector that contains these gains.
• The choice of nodes to be swapped is found using the gain
vector.
• This reduces both the mean and the variation in the number of cuts
in the resulting partitions.
• An example of network partitioning that shows the need to look
ahead when selecting logic cells to be moved between partitions.
• Partitionings (a), (b), and (c) show one sequence of moves (Partition
I).
• Partitionings (d), (e), and (f) show a second sequence (Partition II).
Partition I:
• The partitioning in (a) can be improved by moving node 2 from A to
B with a gain of 1. The result of this move is shown in (b). This
partitioning can be improved by moving node 3 to B, again with a
gain of 1.
Partition II:
• The partitioning shown in (d) is the same as (a). We can move node
5 to B with a gain of 1, as shown in (e), but now we can move node 4
to B with a gain of 2.
• The algorithms discussed until now divide the
system into two partitions, but if we intend to
split the system into more than two partitions,
then we have to apply the algorithm
iteratively (recursively).
• Suppose we wish to make three partitions. We
first apply the F-M algorithm with a 2:1 balance,
and then apply F-M to the larger partition to
divide it further, giving roughly three equal parts.
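This recursive scheme can be sketched as follows. The names and the list-slicing "partitioner" are purely illustrative: `bipartition` is a placeholder that only honours the size ratio, standing in for a real 2-way partitioner such as F-M:

```python
def bipartition(cells, ratio):
    """Placeholder 2-way split honouring a size ratio (a:b)."""
    a, b = ratio
    k = len(cells) * a // (a + b)
    return cells[:k], cells[k:]

def three_way(cells):
    """Three roughly equal partitions via two bipartitions."""
    big, small = bipartition(cells, (2, 1))   # first pass: 2:1 balance
    p1, p2 = bipartition(big, (1, 1))         # split the larger piece 1:1
    return p1, p2, small

# Twelve cells A-L split into three groups of four.
print(three_way(list("ABCDEFGHIJKL")))
```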
