
Routing

The document provides an overview of FPGA routing and architecture. It discusses the routing process and dependencies on FPGA architecture. It also describes the general architecture of Xilinx FPGAs and routing resources such as connection boxes and switch boxes. The document aims to help understand FPGA routing and its relationship to target architecture.

Tutorial on FPGA Routing


Daniel Francisco Gómez Prado

Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, USA

I. INTRODUCTION

The entire CAD process that is necessary to implement a circuit in an FPGA (from the RTL description of the design) consists of the following steps:
• Logic optimization. Performs two-level or multi-level minimization of the Boolean equations to optimize area, delay, or a combination of both.
• Technology mapping. Transforms the Boolean equations into a circuit of FPGA logic blocks. This step also optimizes the total number of logic blocks required (area optimization) or the number of logic blocks in time-critical paths (delay optimization).
• Placement. Selects the specific location for each logic block in the FPGA, while trying to minimize the total length of interconnect required.
• Routing. Connects the FPGA's available routing resources (the clock net is not considered here, as it is usually routed via a dedicated routing network in commercial FPGAs) with the logic blocks distributed inside the FPGA by the placement tool, carrying signals from where they are generated to where they are used.

Routing is an important step of the process, as most of the FPGA's area is devoted to the interconnect [21] and the interconnection delays are greater than the logic delays of the designed circuit. An efficient routing algorithm therefore tries to reduce the total wiring area and the lengths of critical-path nets to improve the performance of the circuit; to do this, the router needs the interconnect information of the target FPGA architecture. This means that the routing problem is architecture dependent, and therefore the routers needed to route FPGAs are as varied as the FPGA architectures on the market.

To better understand this dependency between routing and the target architecture, an overview of one of the most important commercially available FPGAs is given below.

• Xilinx

The general architecture of Xilinx FPGAs consists of a two-dimensional array of programmable blocks, called Configurable Logic Blocks (CLBs) [24], with horizontal and vertical routing channels between the rows and columns of CLBs. The routing resources available in this architecture are:

A. Connection boxes

The C boxes connect the channel wires with the input and output pins of the CLBs. A C box has two major properties that can affect the routability of a design: its flexibility, Fc, which is the number of wires that each logic block pin can connect to; and its topology, which is the pattern of switches (pass transistors or multiplexers) that make the connections, and which matters especially when Fc is low. For example, in figure 1, for a C box with Fc = 2, topology 1 cannot connect pin A with pin B, whereas topology 2 can.

Fig. 1. Connection box topology
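The effect of the connection box topology can be illustrated with a minimal sketch. It assumes a hypothetical 4-track channel and models each C box as a mapping from a pin to the set of tracks it has a switch to; the track numbers are illustrative and are not taken from figure 1.

```python
# Hypothetical connection-box model: each pin maps to the set of channel
# tracks it has a switch to. All track numbers below are illustrative.

def shared_tracks(cbox, pin_a, pin_b):
    """Tracks through which pin_a and pin_b can be connected directly."""
    return cbox[pin_a] & cbox[pin_b]

def flexibility(cbox):
    """Fc: number of tracks each pin connects to (assumed uniform)."""
    sizes = {len(tracks) for tracks in cbox.values()}
    assert len(sizes) == 1, "all pins are assumed to have the same Fc"
    return sizes.pop()

# Channel with 4 tracks (0..3), Fc = 2 in both cases.
topology_1 = {"A": {0, 1}, "B": {2, 3}}   # disjoint switch patterns
topology_2 = {"A": {0, 1}, "B": {1, 2}}   # staggered switch patterns

for name, cbox in (("topology 1", topology_1), ("topology 2", topology_2)):
    common = shared_tracks(cbox, "A", "B")
    print(name, "Fc =", flexibility(cbox), "shared tracks:", sorted(common))
```

Both topologies have the same Fc, yet only the staggered pattern leaves a track that pins A and B have in common.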

ELECTRÓNICA - UNMSM Nº17, Agosto del 2006



B. Switch boxes

The S boxes allow signals to switch between vertical and horizontal wires. The flexibility of an S box, Fs, defines, for a wiring segment entering the S box, the number of other wiring segments it can be connected to. The topology of the S box is very important, since two different topologies with the same flexibility Fs can result in very different routabilities. For example, figure 2 shows that while topology 1 cannot connect wire A with wire B, topology 2 can.

Fig. 2. Switch box topology

Switch boxes that connect only tracks in the same domain (e.g. 0-0, 1-1) are called planar or subset switch boxes. Switch boxes that allow connections to tracks in other domains (e.g. 0-3, 1-2) are called Wilton switch boxes, and they are widely used because they provide greater routing flexibility.
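The difference between the two kinds of switch box can be sketched by computing which track domains a signal can reach after a few turns. This is a simplified model under stated assumptions: the channel width W is hypothetical, going straight is assumed to keep the track, and the `rotating` map is an illustrative domain-changing pattern, not the exact Wilton permutation.

```python
def reachable_domains(turn_map, start_track, max_turns):
    """Tracks a signal can occupy after up to max_turns turns through S boxes.

    turn_map(t) gives the track reached when a wire on track t turns
    (horizontal <-> vertical); going straight is assumed to keep the track.
    """
    seen = {start_track}
    frontier = {start_track}
    for _ in range(max_turns):
        frontier = {turn_map(t) for t in frontier} - seen
        if not frontier:
            break
        seen |= frontier
    return seen

W = 4  # hypothetical channel width

planar = lambda t: t                # subset box: a turn keeps the domain
rotating = lambda t: (t + 1) % W    # simplified domain-changing box

print(sorted(reachable_domains(planar, 0, 3)))
print(sorted(reachable_domains(rotating, 0, 3)))
```

With the planar map the signal never leaves its starting domain, whereas the domain-changing map lets it reach every track after enough turns.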

C. Single-length lines

They are intended for relatively short connections among CLBs, and each one spans only one CLB. See figure 3.b.

D. Double-length lines

They are similar to the single-length lines, except that each one spans two CLBs, offering lower routing delays for moderately long connections.

E. Long lines

They are appropriate for connections that must reach several CLBs with low skew. See figure 3.c.

Fig. 3. Island-style architecture

Increasing the flexibility of the switch box and the connection box, together with the number of wires per channel, makes routing a trivial problem [17], as all possible interconnections become available. But increasing the routing resources has the drawback of wasting area and transistors in the FPGA, since only a fraction of those resources will be used by any given design; even worse, it increases the number of interconnect transistors, which are the principal cause of delay in FPGAs.
As FPGAs have prefabricated routing resources, the router must work within the framework of the architecture's resources, deciding exactly which routing resources will be used to carry the signals between


logic blocks, and making sure that no more connections are made through a region than there are resources to support them. Thus the router must consider the congestion of signals in a channel and, through multiple iterations, rip up and re-route the congested areas and wires. This search for connections that route the placed logic blocks is not guaranteed to succeed: it is possible that after a given number of iterations (40, for example) the circuit still cannot be routed and the placement has to be redone. Therefore, together with the routing algorithm, a routability detection algorithm is clearly desirable to avoid long routing iterations on designs that will eventually be determined to be unroutable.

II. THE FPGA MODEL

Academic research has adopted as its FPGA architecture a simplified version of the island-style model from Xilinx. The main reason is that the FPGA market is divided mainly among three companies: Xilinx, with the highest share, has roughly half of the total market (FPGA market share research by www.rhk.com/rhk/research and www.icinsight.com), Altera has roughly one-third, and Actel has one-sixth. Of these three companies, Actel and Altera have solved their routing problems by, respectively, adapting channel-style ASIC routing algorithms [1] and over-assigning routing resources [2]; so the active research area left to academia is the island-style architecture of Xilinx FPGAs. This is nevertheless an important architecture, as it accounts for half of the entire FPGA production [3] on the market.
In academia the most common simplifications made to the island-style model are:

• Each logic block has 4 input pins and 1 output pin, and all logic blocks are alike.
Commercial FPGAs have logic blocks with different numbers of inputs, ranging from 3 to 7, and they provide two or more outputs.
• The C box is implemented with pass transistors rather than multiplexers for input connections. This allows two or more tracks to be electrically connected via the input pin by turning on individual switches in the C box. These are called input pin doglegs.
Commercial FPGAs implement the C box with multiplexers to save area, so only one track may be connected to the input pin and no input pin doglegs are possible. See figure 4.
• The wire segments span only one logic block before terminating. This means that every interconnection has to pass through as many C boxes and S boxes as there are logic blocks between the two connecting points.
Commercial FPGAs have double-length and long wires to speed up this kind of connection and to avoid congesting the C and S boxes.

Fig. 4. The FPGA Model

III. GENERAL BACKGROUND FOR ROUTING

Routing is an NP-complete problem [23] (no polynomial-time algorithm is known that solves it) that is generally separated into two phases using the divide-and-conquer paradigm [8]: a global routing that balances the densities of all routing channels, and a detailed routing that assigns specific wiring segments to each connection [17][18][25]. These two phases avoid congestion and optimize the performance of the circuit, making sure all nets are routed while minimizing wirelength and capacitance on the path. By running both algorithms a complete routing solution can be created.
Of course there are a number of routing algorithms that solve the problem using mixed routing, performing global and detailed routing at the same time, based on the idea that a tighter integration of the two phases can prevent inaccurate estimation and produce a better routing result. The drawback of this approach is that, as circuit size grows, mixed routing becomes more complex and less scalable [13].

A. Global routing

The global router performs a coarse route to determine, for each connection, the minimum-distance path through the routing channels. If the net to be routed has more than two terminals, the


global router will break the net into a set of two-terminal connections and route each connection independently. (By breaking a multi-terminal net into a set of two-terminal connections, the global router is most likely allowing input pin doglegs.)
The global router considers multiple ways of routing each connection and chooses the one that passes through the least congested routing channels. By keeping track of the usage of each routing channel, congestion is avoided, and the principal objective of the global router, balancing the usage of the routing channels, is achieved.
Once all connections have been coarsely routed, the solution is optimized by ripping up and re-routing each connection a small number of times. After that, the final solution is passed to the detailed router.

B. Detail routing

The detail router determines, for each two-point connection, the specific wiring segments to use in the routing channels assigned by the global router. To do this, detail routing algorithms construct a directed graph from the routing resources to represent the available connections between wires, C boxes, S boxes and logic blocks within the FPGA.
The search performed on this directed graph is usually based on Dijkstra's algorithm for finding the shortest path between two nodes. The paths are labeled according to a cost function that takes into account the usage of each wire segment and the distance between the points to be connected. The distance is estimated by calculating the wire length in the bounding box of those points using a Manhattan metric. Most routers relax the bounding box constraint and also search for solutions in the routing channels surrounding the bounding box. This avoids subsequent iterations of ripping up and re-routing when the solution lies just outside the bounding box.
The most common detail routing algorithms are:
• Maze Router
The maze routing algorithm is based on a wavefront expansion technique that attempts to find the shortest path between two points while avoiding any used routing resources [4]. The algorithm is an iterative process that rips up and re-routes some of the routes to eliminate congested routing channels.
The principal drawback of maze routing is that it routes each net without taking into account that the path found can block the routing of subsequent nets. This means that the performance of the algorithm depends on the net ordering, and different orderings will yield different results. For example, if the order in which the two nets in figure 5 are routed is reversed, a better solution is found.
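The net-ordering dependence of maze routing can be reproduced with a minimal sketch of a wavefront (breadth-first) router on a plain grid. The 3 x 7 grid, the two nets and the function names are hypothetical and chosen only for illustration; they are not the nets of figure 5, and a real router would search the routing-resource graph of the FPGA rather than a grid of cells.

```python
from collections import deque

def maze_route(rows, cols, blocked, src, dst):
    """Lee-style wavefront expansion (BFS) on a grid; returns a shortest
    path of free cells from src to dst, or None if dst is unreachable."""
    prev = {src: None}
    wavefront = deque([src])
    while wavefront:
        cell = wavefront.popleft()
        if cell == dst:
            path = []
            while cell is not None:      # backtrace from dst to src
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if (0 <= nr < rows and 0 <= nc < cols
                    and nxt not in blocked and nxt not in prev):
                prev[nxt] = cell
                wavefront.append(nxt)
    return None

def route_in_order(rows, cols, nets, order):
    """Route nets one at a time; cells used by earlier nets become obstacles."""
    used, result = set(), {}
    for name in order:
        src, dst = nets[name]
        path = maze_route(rows, cols, used, src, dst)
        result[name] = path
        if path is not None:
            used.update(path)
    return result

nets = {"A": ((1, 1), (1, 5)), "B": ((0, 3), (2, 3))}
a_first = route_in_order(3, 7, nets, ["A", "B"])
b_first = route_in_order(3, 7, nets, ["B", "A"])
print({n: (len(p) if p else None) for n, p in a_first.items()})
print({n: (len(p) if p else None) for n, p in b_first.items()})
```

When net A is routed first, net B still finds a detour around it; when net B is routed first, its straight vertical path cuts the grid in two and net A cannot be routed at all.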


Fig. 5. The Maze router wavefront (source: http://foghorn.cadlab.lafayette.edu/cadapplets/MazeRouter.html)

• A* Search Routing
Maze routing is a special case of A* routing. A* routing allows the path search to be tuned from a breadth-first search (BFS) towards a faster depth-first search (DFS). BFS is an exhaustive search that considers all possible paths and will find the best path if one exists, but it has the drawback that it can be slow; DFS, in contrast, may not find the minimum-cost path but can be fast. See figure 6.
By weighting with a scaling factor $\alpha$ between 0 and 1, A* routing tunes the search from BFS to DFS. The cost function used to evaluate each node i of the directed graph is [15]:

$$f_i = (1 - \alpha)\,(f_{i-1} + c_i) + \alpha\, d_i$$

where $c_i$ is the node cost, which indicates the current usage of the node and is used to penalize nodes occupied by previous routes; $f_{i-1}$ is the total cost of the previous path; and $d_i$ is the estimated cost of the path from node i to the destination.

Fig. 6. BFS and DFS algorithms

In FPGA architectures with planar or subset switch boxes, wires can only change domain at the output of a logic block or at an input dogleg pin; this means that the route from output to input is confined to the same track domain. Therefore in a DFS search it is important to attempt routing first in the domains that have a high probability of completion, so that subsequent DFS searches in different domains will not be needed. This is the concept of domain negotiation.
Domain negotiation consists of ranking the domains, before routing, based on the usage of their wires adjacent to the output pins. The cost function is then modified to [15]:

$$f_i = (1 - \alpha)\,(f_{i-1} + c_i) + \alpha\, d_i + r_d$$

Domains with lower congestion will have a lower rank, $r_d$, thus promoting routing in less congested domains first.

• The Pathfinder
The Pathfinder algorithm is based on the maze router, but speeds up the algorithm by routing every connection in an obstacle-free environment and allowing routing resources to be overused.
In a single iteration of the algorithm, all nets are routed once, each as if it were the only connection to be routed, and the cost of using every resource is calculated according to its demand. The cost function implemented by Pathfinder is [10]:

$$f_i = (1 + h_n\, h_{fac}) \times (1 + p_n\, p_{fac}) + b_{n,n+1}$$

where $b_{n,n+1}$ is the penalty for bending the wire, $p_n$ is the cost of using a specific wire, $h_n$ is the history term that keeps track of the usage of the wire during previous iterations, and $h_{fac}$ and $p_{fac}$ are the respective weighting factors.
Subsequent iterations rip up and re-route all nets, and the process goes on until no routing resource is overused. This process of ripping up and re-routing every net allows the Pathfinder algorithm to minimize the net ordering problem of maze routing.

IV. THE STATE OF THE ART IN ROUTING

The routers described in this section represent the trend in FPGA routing research. Even though these are academic tools and do not actually route any real FPGA, they are important because, by modifying the model architecture used, the core algorithms implemented in these tools can be effectively used to route commercial FPGAs.

A. VPR: Versatile Place and Route

The VPR router is one of the most versatile routers available in academia, as it allows the target architecture to be described. It can be used to route island-style FPGAs as well as row-based FPGAs [19]. In this


router the type of switch box for the FPGA can be chosen to be planar, Wilton or universal [20]; wires of different lengths can be defined; input dogleg pins can be allowed or disallowed; and the parameters of the cost function can be modified. The VPR router can perform either a global routing or a combined global-detailed routing; the combined router is able to change the current global routing configuration when it cannot easily find a detail routing solution [6].
This router is based on a modified version of the Pathfinder algorithm, and it can run in two different flavors that target two different main objectives:

• VPR's routability-driven router

The primary objective of this algorithm is to route a design successfully with minimum track count. To this end, the routability-driven router incorporates a modified routing cost model, as shown below [22]:

$$cost_n = b_n\, h_n\, p_n + bend_{n,m}$$

where $b_n$ is the base cost, usually 1 or 0.95 for most routing resources and 0 for sinks; the latter prevents the algorithm from continuing to search for possible connections once the sink can be reached. Note that congestion at a sink is not possible, as it would mean that the design requires an input to be driven by two different sources; a base cost of 0 for the sink therefore improves the running time of the algorithm without affecting its quality. $bend_{n,m}$ penalizes bending the wire when routing and is only taken into account by the global router. $p_n$ is the present congestion penalty, and its value is derived from the difference between the number of nets using a channel and the number of wires that can be placed in that channel. It is called present congestion because its value is updated within an iteration of the algorithm to avoid overusing a channel. Its cost is given by:

$$p_n = 1 + \max\bigl(0,\ [1 + \mathrm{occupancy}_n - \mathrm{capacity}_n]\, p_{fac}\bigr)$$

with $p_{fac}$ equal to 0.5 in the first iteration and to 1.5 or 2 times its previous value in subsequent iterations. $h_n$ is the historical congestion penalty; it keeps track of the previous cost of the resources, thus discouraging the reuse of a congested channel in subsequent iterations. It starts with a value of 1 in the first iteration and is then updated as:

$$h_n^{i} = h_n^{i-1} + \max\bigl(0,\ [1 + \mathrm{occupancy}_n - \mathrm{capacity}_n]\, h_{fac}\bigr)$$

where $h_{fac}$ is a constant value between 0.2 and 1.
The present congestion and the historical congestion are computed similarly, but the former is computed dynamically inside each iteration of the algorithm, while every net is being routed, whereas the latter is computed once at the beginning of each iteration.

• VPR's timing-driven router

The objective of this algorithm is to increase circuit speed. To do this, it adds an Elmore delay model to the cost function, so that the router gives preference to solutions with minimum delay. To set an upper bound on delay, the algorithm routes the nets with the most distant connections first. The imposed routing order produces suboptimal track counts, but faster results.
Another modification common to both approaches is that the maze wavefront expansion is modified for nets with multiple terminals. As mentioned before, the global router breaks every n-terminal net into n-1 two-terminal connections, and it performs n-1 iterations of the wavefront expansion to connect them. A normal maze router empties the wavefront after each iteration, whereas the VPR router does not: it adds all the routing resource segments required to connect the terminal just reached to the wavefront with a cost of 0 and continues expanding normally. The next terminal is therefore reached much more quickly than if the entire wavefront expansion had been restarted from scratch. Figure 7 shows a) a wavefront expansion; b) a normal maze router, which empties the wavefront when it reaches a terminal and restarts it from the beginning; and c) the VPR router, which adds the last connection found to the wavefront with a cost of 0 and continues expanding it.

Figure 7. VPR wavefront expansion

Other than the modifications stated above, the VPR algorithm behaves as a Pathfinder algorithm [19]: it routes each net by the shortest path it can find regardless of any overuse of routing resources, and then rips up and re-routes every net in the circuit, recalculating the cost of using each routing resource.
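How the present and historical congestion penalties of the routability-driven cost model grow from one iteration to the next can be seen in a minimal sketch. The formulas are the ones quoted in the text; the function names, the channel with capacity 4 and occupancy 5, the choice of hfac = 0.5 and the doubling of pfac are assumptions made for illustration, not values taken from VPR itself.

```python
def present_penalty(occupancy, capacity, pfac):
    """p_n = 1 + max(0, (1 + occupancy - capacity) * pfac)."""
    return 1.0 + max(0.0, (1 + occupancy - capacity) * pfac)

def updated_history(h_prev, occupancy, capacity, hfac):
    """h_n^i = h_n^(i-1) + max(0, (1 + occupancy - capacity) * hfac)."""
    return h_prev + max(0.0, (1 + occupancy - capacity) * hfac)

def node_cost(base, history, present, bend=0.0):
    """cost_n = b_n * h_n * p_n + bend_{n,m}."""
    return base * history * present + bend

# Hypothetical channel: capacity 4, used by 5 nets in every iteration.
capacity, occupancy = 4, 5
pfac, hfac = 0.5, 0.5      # pfac starts at 0.5; hfac lies in [0.2, 1]
history = 1.0              # h_n starts at 1
costs = []
for iteration in range(3):
    p = present_penalty(occupancy, capacity, pfac)
    costs.append(node_cost(1.0, history, p))
    # History is refreshed once, before the next iteration starts.
    history = updated_history(history, occupancy, capacity, hfac)
    pfac *= 2.0            # "1.5 or 2 times its previous value"
print(costs)
```

A channel that stays overused becomes steadily more expensive, which is what eventually pushes some of its nets onto other channels; a channel with spare capacity keeps a present penalty of 1.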


B. ROAD: An Order-Impervious Optimal Detailed Router for FPGAs

The routers described so far have been based on the rip-up and re-route paradigm. The ROAD router is based instead on the bump and refit (B&R) paradigm. The main idea of this paradigm is to modify the nets already routed when a new conflicting net is found. The router starts by routing the nets one by one until a conflict is found; if there are other tracks that can successfully route the conflicting net, the problem is solved and the next net is routed. If all routing resources have been used and no other tracks are available, the router bumps all tracks that conflict with the needed resource, and then all the resulting unrouted net segments are refitted. As at least one of them will not fit properly in the design, the unfitted segment is routed through another channel, possibly bumping other tracks. In this way, the B&R paradigm depopulates the net-congested areas.
Therefore in the B&R algorithm, when a track is bumped, the bump can propagate, producing a path with many bump searches until a vacant resource or a spare routing area is found. If all possible paths are exhausted and no solution exists, or a cycle is detected (a previous bump in the path is revisited), the search backtracks to the initial conflicting resource and another track is bumped instead.
Even though this is not a problem for an incremental router, in which some nets are added to an existing routing in an FPGA and the goal is to route the new connections without changing the global topology of the existing nets, for a detail router it is a major problem, as many more of the previously routed nets are bumped, leading to extensive and time-consuming depth-first searches.
To overcome this problem, three major enhancements are made to the search space of the B&R algorithm:
• Learning-based search space pruning: This technique records the unsuccessful bumps of a net; if later on, in another depth-first search process, the same net is about to be bumped again, the search graphs are compared. If both search graphs are found to be isomorphic, the bumping of the net is discarded as it will be unsuccessful, thus saving search time.
• Clique-based search space pruning: This technique dynamically determines the presence of cliques among nets, and the clique size k is used to determine the minimum number of different tracks needed to route the nets in the clique successfully. If we attempt to bump a net in the clique while routing a path φ, and the maximum number of tracks per channel available in the FPGA is t, then if bumping the net produces (k+m) > t the depth-first search is pruned, as the solution will be infeasible. The number m in the inequality above is the number of tracks unusable by the clique, that is, the m nets in the clique whose adjacent nets are ancestors of the path φ; they cannot be used to route the current path, as they would cause a cyclic conflict.
• Lookahead transition cost functions: This is a cost function that measures which transition of the net $n_i$ from track $T_j$ to track $T_k$, written $n_i^{T_j \to T_k}$, is more likely to succeed, so that fewer searches are performed and backtracked. The two principal factors considered in the cost function are the total wirelength of the bumped nets and the flexibility of the bumped nets. Flexibility means that if there is a solution with one net of wirelength 9, and another solution produced by 3 smaller nets each of wirelength 3, the set of smaller nets is more likely to produce a feasible solution, as it is easier to move smaller nets than a big one. The cost function is [7]:

$$TC_1\left(n_i^{T_j \to T_k}\right) = \frac{\sum_{n_j \in adj_{T_k}(n_i)} l(n_j)}{\left|adj_{T_k}(n_i)\right|}$$

where $adj_{T_k}(n_i)$ is the set of neighbors of $n_i$ that are on track $T_k$, and $l(n_j)$ is the length of $n_j$ in terms of the track segments it occupies. This function can be further improved by looking ahead to the next transitions; this is done by calculating the minimum cost of going from $T_j$ to track $T_k$ and then from $T_k$ to track $T_l$. The final cost is given by:

$$TC_2\left(n_i^{T_k \to T_l}\right) = \frac{\sum_{n_j \in adj_{T_k}(n_i)} \min_{\forall T_l} TC_1\left(n_i^{T_j \to T_k}\right)}{\left|adj_{T_k}(n_i)\right|}$$

The three enhancements do not affect the quality of the routing result, as they only prune results that are suboptimal or search spaces that do not yield any result. With these enhancements the basic B&R algorithm is sped up 604 times, which makes the algorithm fast enough to perform not only incremental but also complete detail routing.
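Two of the pruning ideas above, the clique test (k+m) > t and the first-order transition cost TC1, are simple enough to sketch directly. The sketch assumes that TC1 is the mean length of the neighbors that would be bumped on the target track, and that an empty neighbor set costs nothing; the candidate tracks, the lengths and the function names are hypothetical and are not part of ROAD.

```python
def tc1(neighbor_lengths):
    """TC1: mean length l(n_j) of the neighbors of n_i on the target track."""
    if not neighbor_lengths:
        return 0.0   # assumption: a track with no neighbors bumps nothing
    return sum(neighbor_lengths) / len(neighbor_lengths)

def clique_prune(k, m, t):
    """True when the bump can be discarded: the clique needs k tracks,
    m tracks are unusable, and only t tracks per channel exist."""
    return k + m > t

# Hypothetical candidate transitions for one net: target track -> lengths
# (in track segments) of the nets that would be bumped there.
candidates = {"T1": [9], "T2": [3, 3, 3]}
ranked = sorted(candidates, key=lambda track: tc1(candidates[track]))
print(ranked)
print(clique_prune(k=3, m=2, t=4), clique_prune(k=3, m=1, t=4))
```

Both candidate tracks bump a total wirelength of 9, but the track holding three short nets is ranked first, which matches the flexibility argument in the text.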


The ROAD detail router based on B&R is implemented so that, if no solution exists (the circuit is unroutable), one track is added to the channel; the router can thus find the minimum-track solution for the given placement and global assignment. This router is said to be independent of the order in which it routes the nets, because bumping previous routes is equivalent to reversing previous routing decisions, or to changing the order in which the nets are routed.

C. Routing Approach Via Search-Based Boolean Satisfiability

This approach addresses the routing problem in a completely different way, transforming the complex interactions of the routing constraints into a Boolean function. It has two main virtues: all paths for all nets are considered simultaneously, as the routing constraints are a set of equations that need to be satisfied simultaneously; and the Boolean function is satisfiable if and only if the design is routable. The latter means that if there is no satisfying assignment for the Boolean function, it is proven that no routing solution exists for the design, for the given placement and global route assignment.
The Satisfiability-Based Detailed Router (SDR) takes as input the connections assigned by a global router and produces two types of constraints [6]. The connectivity constraints ensure that a net has a continuous path between its terminals; these constraints form a list of the tracks and C boxes that can possibly form the path in each channel segment. The exclusivity constraints ensure that different nets are assigned to different tracks in a channel, so that no overlap occurs. At the intersections of horizontal and vertical channels (S boxes), the constraints of the same net are connected by the logic operation AND.
Once these constraints have been formulated, they are transformed and encoded into Boolean equations represented as Conjunctive Normal Form (CNF) clauses. The conjunction of all connectivity and exclusivity constraints for all nets forms the routing constraint Boolean function, which models the routing problem as a whole. The Boolean SAT solver takes this routability function as input and tries to find a satisfying assignment or to prove that the given layout is not satisfiable. If the layout is satisfiable, the solution is an assignment of binary values 1 or 0 to the Boolean variables which encode the track variables. This information is transformed into an assignment of actual routing resources (tracks, C boxes and the corresponding S boxes) to nets, which forms the actual FPGA routing solution. If it is not satisfiable, then no legal detailed routing solution exists with the given placement and global routing topology.
It is important to note that the equations are written in CNF and are not represented as a BDD (a Binary Decision Diagram is a graph representation of Boolean functions based on the Shannon expansion). A BDD satisfiability approach explicitly represents all possible satisfying assignments as paths through the BDD directed acyclic graph; any path found in this graph to 1 is said to satisfy the problem, and if no such path exists the problem is said to be unsatisfiable. The problem with BDDs is that without a good variable ordering the BDD graph can become unmanageably large in memory during intermediate computations, and finding a good variable ordering is an NP-complete problem.
Instead of BDDs, the SAT instances created from the routing constraint Boolean function are solved using GRASP [5][6][16], a generic search algorithm for the satisfiability problem. GRASP is based on search techniques that analyze conflicts on the graph and, based on this analysis, can prune large portions of the search space. The analysis yields the causes of the conflicts, and this information is recorded to recognize similar conflicts on the graph and the assignments that are necessary for a solution to be found. This means that GRASP searches to find one satisfying assignment, and must search exhaustively to conclude that no satisfying assignment exists: a trade-off of more search time for manageable memory sizes.

V. CONTRAST OF THE APPROACHES

To thoroughly understand the differences among the routers previously described, it is necessary to compare them based on the ideal objectives of any router:

• Unroutability detection
Only the SAT approach is able to prove that the layout is unroutable for a given placement and global route assignment, though reaching this conclusion can take a long time, as it has to determine that the SAT problem is unsatisfiable, which means that all possible search combinations have to be explored. VPR cannot determine routability: it stops searching after 30 iterations, assuming by then that the circuit is unroutable. During these iterations the VPR global-detailed router can modify the global assignment if this simplifies the detail routing operation. This characteristic allows VPR to find routing solutions that the SAT solver determines as unroutable, as the SAT solver relies heavily on the global


assignment, and VPR actually modifies the global assignment if needed. The ROAD approach is not really concerned with determining the routability of the given layout, as its ultimate goal is to route the circuit with the minimum track count: if it cannot find a solution with a given track count, it adds one track to the channel and continues routing the design.
The overall unroutability detection ranking of the routers is then SAT, VPR and ROAD.

• Running time
ROAD is faster than the VPR routability-driven router: with the enhancements made to the basic B&R algorithm, ROAD routes on average 2 times faster. To perform this comparison, VPR is run both as a combined global-detailed router and as a global-only router; the difference between the two running times is assumed to be the time VPR spends in detailed routing, and the comparison is made against this value.

The VPR timing-driven router is 2 to 10 times faster than the VPR routability-driven router [20], so on average the VPR timing-driven router is still 1 to 5 times faster than ROAD.
To establish the running time of SAT we compare the benchmarks provided in [6] and [7] (see Table 1); only two circuits, ALU2 and VDA, can be compared. A straightforward comparison of these circuits is misleading: it concludes that VPR is 3 times faster than SAT and that ROAD is 18 times faster than SAT for those specific circuits. Such a comparison is unfair, as the ROAD and VPR benchmarks were run on a 550 MHz Pentium III while SAT was run on a 170 MHz Ultra 5 Sparc. A fairer comparison speeds up the SAT results by a factor of 3 (the values in parentheses in Table 1), yielding approximately the same running time for SAT and VPR, with ROAD 6 times faster for those specific circuits. This normalized comparison gives only a rough idea of the running time of SAT, and more benchmarks need to be compared before a final conclusion on it can be drawn.

TABLE 1. COMPARISON FROM [6] AND [7]
Ckt name    VPR      ROAD    SAT
ALU2         8.54    1.41    26.52 (8.84)
VDA         34.13    4.99    98.14 (32.71)

The overall running time ranking of the routers is then VPR timing-driven, ROAD, VPR routability-driven and SAT.

• Minimum track count
Both ROAD and the VPR routability-driven router achieve the same minimum number of tracks per channel on all benchmarks. The VPR timing-driven router requires 5% more tracks than ROAD, and the SAT router requires about 25% more than ROAD.
These results are heavily correlated with the cost function implemented inside each router. VPR has a different cost function for each of its routers, and the routability-driven algorithm is the one concerned with achieving the minimum track count. ROAD does not really have a cost function oriented toward minimum track count; it is more oriented toward search-based pruning. However, because ROAD looks for the clique interconnectivity and adds no tracks to the channel unless it is mandatory, it always finds the minimum track count. SAT does not have any cost function implemented in its algorithm and its search is completely "flat", as it only looks for a solution regardless of its optimality.
The overall minimum track count ranking of the routers is then VPR routability-driven, ROAD, VPR timing-driven and SAT.

• Memory utilization
Even though memory utilization has not been addressed as an objective by any of the routers described, a rough comparison can be made by assuming that the memory use of an algorithm correlates with its running time. Thus the VPR timing-driven algorithm, being the fastest, has the lowest memory requirements; ROAD, with its search-based pruning and a running time faster than the VPR routability-driven router, has the second lowest; the VPR routability-driven router is third; and the non-directed search of SAT, which needs to solve all routing constraints simultaneously, has the highest memory requirements.
The overall memory utilization ranking of the routers is then VPR timing-driven, ROAD, VPR routability-driven and SAT.

• Circuit speed
The only router concerned with this objective is the VPR timing-driven algorithm, so all the other approaches will produce slower circuits with higher delays.

It can be seen that different approaches to the same problem inherently target different main objectives: VPR heuristically searches for the minimum track count by reducing the impact of the net ordering problem, while its modified version looks for fast running times, allowing slightly


higher track counts; B&R finds the optimal minimum track count by overcoming the net ordering problem; and SAT can formally determine which circuits are unroutable.

VI. SUMMARY

It has been shown that FPGA routing is a complex problem, even though historically it has been underestimated by VLSI designers on the assumption that fixed routing resources should make routing easier. The opposite has proven true: fixed routing resources make FPGA routing a much harder problem, since all of the multiple constraints have to be satisfied to successfully route the design.
Most approaches to FPGA routing have been based on the divide-and-conquer paradigm, in which routing is split into two phases: a global router that spreads the track requirements throughout the FPGA, and a detailed router that does the actual assignment of routing resources. Of these two phases, detailed routing is the much harder problem, as it has to consider the architecture of the FPGA in depth. For the detailed router, the maze algorithm has been the starting point; different modifications and improvements have been made to the basic algorithm with the A* search and the PathFinder algorithm, and this approach has reached its state of the art with the VPR tool.
Of course, the maze rip-up and reroute algorithm used by VPR has not been the only approach to the routing problem, and two other approaches have been shown: the ROAD router, based on the bump and refit (B&R) paradigm, and the SAT router, based on the satisfiability of a Boolean function equivalent to the routing problem. The performance of these three approaches is summarized in Table 2.

TABLE 2. ROUTER COMPARISON
                          VPR              VPR                   ROAD         SAT
                          timing-driven    routability-driven
Unroutability detection   Heuristic        Heuristic             None         Formal proof
Running time              Best             Good                  Very good    Bad
Minimum track count       Good             Best                  Best         Bad
Memory requirement        Best             Good                  Very good    Bad

The above comparison shows that the ROAD approach is a nice trade-off between the two flavors of VPR, and that more research has to be done on the SAT approach to make it competitive, perhaps by adding a specialized cost metric to the search to reduce the number of tracks and speed up the running time.
The approaches presented here are not the only ones, and there are many more that can outperform them in a particular objective. For example, Just-in-Time routing [13], intended to place the routing algorithm in hardware so that it can reroute and reprogram other FPGAs, achieves outstanding running times and very low memory requirements at the cost of requiring more tracks; another approach separates the so far combined delay minimization and wirelength optimization [11] by using Steiner graphs to obtain better circuit performance; and some others are capable of detecting routability as early as the first iteration of the PathFinder router using some heuristic [9] [14].
The different approaches to routing and the different performance obtained do not mean that research on some trends should be abandoned because they have not outperformed any previous router. All research in the area illuminates the routing problem from a different perspective, thus helping to refine or improve existing algorithms, or even to combine some of them.
As architectures evolve and the logic inside each logic block grows, routing resources will become scarcer and routing more constrained; therefore FPGA routing will remain an active topic of research throughout the life of FPGA technology.

REFERENCES

[1] Actel Inc., Axcelerator family FPGA, 2004.
[2] Altera Corporation, Stratix II Device Handbook, Volume 1, Jul. 2004.
[3] Electronics Weekly, ABI/INFORM Trade & Industry, Feb. 25, 2004, p. 12.
[4] F. Mo, A. Tabbara and R. Brayton, A Force-Directed Maze Router, Department of EECS, University of California at Berkeley.
[5] G. Nam, K. Sakallah and R. Rutenbar, Satisfiability-Based Layout Revisited: Detailed Routing of Complex FPGAs Via Search-Based Boolean SAT, ACM/SIGDA International Symp. on FPGA, 1999, pp. 167-175.
[6] G. Nam, K. Sakallah and R. Rutenbar, A New FPGA Detailed Routing Approach Via Search-Based Boolean Satisfiability, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 21, no. 6, June 2002.


[7] H. Arslan and S. Dutt, ROAD: An Order-Impervious Complete Detailed Router for FPGAs, ICCD 2003, pp. 350-356.
[8] H. Arslan and S. Dutt, An Effective Hop-Based Detailed Router for FPGAs for Optimizing Track Usage and Circuit Performance, GLSVLSI 2004.
[9] J. Swartz, V. Betz and J. Rose, A Fast Routability-Driven Router for FPGAs, in 6th International Workshop on FPGAs, Monterey, CA, 1998.
[10] L. McMurchie and C. Ebeling, PathFinder: A Negotiation-Based Performance-Driven Router for FPGAs, ACM FPGA Symp., 1997, pp. 111-117.
[11] M. Alexander and G. Robins, New Performance-Driven FPGA Routing Algorithms, 32nd ACM/IEEE Design Automation Conference, San Francisco, CA, 1995, pp. 562-567.
[12] R. Jayaraman, Physical Design for FPGAs, ISPD 2001, Sonoma, CA.
[13] R. Lysecky, F. Vahid and S. Tan, Dynamic FPGA Routing for Just-in-Time FPGA Compilation, DAC 2004, San Diego, CA.
[14] R. Tessier, Fast Place and Route Approaches for FPGAs, PhD thesis, Massachusetts Institute of Technology, 1999.
[15] R. Tessier, Negotiated A* Routing for FPGAs, in Proceedings of the Fifth Canadian Workshop on Field-Programmable Devices, 1998.
[16] R. Wood and R. Rutenbar, FPGA Routing and Routability Estimation Via Boolean Satisfiability, ACM 1997, Monterey, CA.
[17] S. Brown, Routing Algorithm and Architectures for Field Programmable Gate Arrays, PhD thesis, Department of Electrical Engineering, University of Toronto, 1992.
[18] S. Hauck and A. Agarwal, Software Technologies for Reconfigurable Systems, IEEE Computer, 1997.
[19] V. Betz and J. Rose, VPR: A New Packing, Placement, and Routing Tool for FPGA Research, in Proceedings, Field-Programmable Logic, Oxford, U.K., 1997.
[20] V. Betz, VPR and T-VPack User's Manual, version 4.30, 2000.
[21] V. Betz and J. Rose, FPGA Routing Architecture: Segmentation and Buffering to Optimize Speed and Density.
[22] V. Betz, J. Rose and A. Marquardt, Architecture and CAD for Deep-Submicron FPGAs, Kluwer Academic Publishers, ISBN 0-7923-8460-1, 1999.
[23] V. Gudise and G. Venayagamoorthy, FPGA Placement and Routing Using Particle Swarm Optimization, in Proceedings of the IEEE Computer Society Annual Symposium on VLSI Emerging Trends in VLSI Systems Design, ISVLSI 2004.
[24] Xilinx Inc., Virtex II Data Book, 2004.
[25] Y. Chang, S. Thakur, K. Zhu, and D. Wong, A New Global Routing Algorithm for FPGAs, ACM 1994.
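
As a hedged illustration of the running-time normalization discussed for Table 1, the following Python sketch recomputes how many times faster VPR and ROAD are than SAT, first from the raw times and then after dividing the SAT times by the factor of 3 that the text uses to compensate for the slower machine. The names TIMES, MACHINE_FACTOR and speedups, and the dictionary layout, are illustrative assumptions and not part of the cited works; only the numeric values come from Table 1.

```python
# Running times copied from Table 1 (as reported in [6] and [7]).
TIMES = {
    "ALU2": {"VPR": 8.54, "ROAD": 1.41, "SAT": 26.52},
    "VDA": {"VPR": 34.13, "ROAD": 4.99, "SAT": 98.14},
}

# Assumption taken from the text: the SAT machine is treated as roughly
# 3 times slower, so SAT times are divided by 3 before comparing.
MACHINE_FACTOR = 3.0


def speedups(times, sat_scale=1.0):
    """Return, per circuit, how many times faster VPR and ROAD are than SAT.

    sat_scale divides the SAT time first (1.0 means no normalization).
    """
    result = {}
    for circuit, t in times.items():
        sat = t["SAT"] / sat_scale
        result[circuit] = {
            "SAT_time": round(sat, 2),
            "VPR_vs_SAT": round(sat / t["VPR"], 2),
            "ROAD_vs_SAT": round(sat / t["ROAD"], 2),
        }
    return result


raw = speedups(TIMES)
normalized = speedups(TIMES, sat_scale=MACHINE_FACTOR)

for circuit in TIMES:
    print(circuit, "raw:", raw[circuit])
    print(circuit, "normalized:", normalized[circuit])
```

The raw ratios correspond to the "3 times" and "18 times" figures quoted in the text, and the normalized ratios to the "approximately the same" and "6 times" figures; the normalized SAT times match the parenthesized entries of Table 1.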
