Routing
¹ The clock net is not considered here, as it is usually routed via a dedicated routing network in commercial FPGAs.
² The switches can be pass transistors or multiplexers.
B. Switch boxes:
C. Single-length lines.
D. Double-length lines.
logic blocks, and making sure that no more connections are made through a region than there are resources to support them. Thus the router must consider the congestion of signals in a channel and, over multiple iterations, rip out and reroute the congested areas and wires. This search for connections to route the placed logic blocks is not guaranteed to succeed: after a given number of iterations (40, for example) the circuit may still be unrouted, and the placement has to be redone. Therefore, a routability detection algorithm is clearly desirable alongside the routing algorithm, to avoid long routing iterations on designs that will eventually be determined to be unroutable.

• The wire segments span only one logic block before terminating. This means that every interconnection has to pass through as many C boxes and S boxes as there are logic blocks between the two connecting points. Commercial FPGAs have double-length and long wires to speed up this kind of connection and to avoid congesting the C and S boxes.
³ FPGA market share research by www.rhk.com/rhk/research and www.icinsight.com
⁴ No polynomial-time algorithm is known that can solve the problem.
global router will break the net into a set of two-terminal⁵ connections and route each set independently. For each connection, the global router considers multiple ways of routing it and chooses the one that passes through the least congested routing channels. By keeping track of the usage of each routing channel, congestion is avoided, and the principal objective of the global router, balancing the usage of the routing channels, is achieved.

Once all connections have been coarsely routed, the solution is optimized by ripping up and re-routing each connection a small number of times. After that, the final solution is passed to the detailed router.

different results. For example, if the order in which the two nets are routed in figure 5 is reversed, a better solution is found.
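As a rough illustration of the congestion-balancing idea used by the global router, the sketch below routes two-terminal connections on a coarse grid of channel regions. The grid model, the restriction to two L-shaped candidates per connection, and the names `candidate_routes` and `global_route` are assumptions made for this example only; they are not taken from any particular router.

```python
def _span(a, b):
    """Inclusive integer range from a to b, in either direction."""
    step = 1 if b >= a else -1
    return range(a, b + step, step)


def candidate_routes(src, dst):
    """Return two L-shaped candidate routes as lists of grid cells."""
    (x0, y0), (x1, y1) = src, dst
    horizontal_first = ([(x, y0) for x in _span(x0, x1)]
                        + [(x1, y) for y in _span(y0, y1)][1:])
    vertical_first = ([(x0, y) for y in _span(y0, y1)]
                      + [(x, y1) for x in _span(x0, x1)][1:])
    return [horizontal_first, vertical_first]


def global_route(connections):
    """Coarse-route each connection through the least congested candidate.

    Assumption: congestion of a route is ranked by its most used cell,
    with the total usage along the route as a tie-breaker.
    """
    usage = {}   # cell -> number of connections routed through it
    chosen = []
    for src, dst in connections:
        best = min(
            candidate_routes(src, dst),
            key=lambda route: (max(usage.get(c, 0) for c in route),
                               sum(usage.get(c, 0) for c in route)),
        )
        for cell in best:
            usage[cell] = usage.get(cell, 0) + 1
        chosen.append(best)
    return chosen, usage


routes, usage = global_route([((0, 0), (2, 2)), ((0, 0), (2, 2))])
```

With two identical connections, the second one is steered onto the other L-shaped route, so only the shared terminals are used twice. A real global router would consider more candidate shapes and would also rip up and re-route connections afterwards, as described above.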
B. Detail routing

Fig 5. The Maze router wavefront⁶

• A* Search Routing
Maze routing is a special case of A* routing. A* routing allows the path search to be tuned from a breadth-first search (BFS) towards a shorter depth-first search (DFS). BFS is an exhaustive search that considers all possible paths and will find the best path if one exists, but it can be slow; DFS may not find the minimum-cost path, but it can be fast. See figure 6.
A scaling factor α between 0 and 1 tunes the A* search from BFS (α = 0) to DFS (α = 1). The cost function used to evaluate the directed graph for each node i is [15]:

$$f_i = (1 - \alpha)(f_{i-1} + c_i) + \alpha d_i$$

where $c_i$ is the node cost, which indicates the current usage of the node and is used to penalize nodes occupied by previous routes; $f_{i-1}$ is the total cost of the previous path; and $d_i$ is the estimated cost of the path from node i to the destination.

$$f_i = (1 - \alpha)(f_{i-1} + c_i) + \alpha d_i + r_d$$

Domains with lower congestion will have a lower rank, $r_d$, thus promoting routing in less congested domains first.

The Pathfinder

The pathfinder algorithm is based on the maze router, but speeds up the algorithm by routing every connection in an obstacle-free environment and allowing routing resources to be overused.
After a single iteration of the algorithm, all nets have been routed once as if each were the only connection to be routed, and the cost of using every resource is calculated according to its demand. The cost function implemented by the pathfinder is [10]:

$$f_n = (1 + h_n h_{fac})(1 + p_n p_{fac}) + b_{n,n+1}$$

where $b_{n,n+1}$ is the penalty for bending the wire, $p_n$ is the cost of using a specific wire, $h_n$ is the history term that keeps track of the usage of the wire during previous iterations, and $h_{fac}$ and $p_{fac}$ are the respective weighting factors.
Subsequent iterations rip up and re-route all nets, and the process goes on until no overuse of routing resources exists. This process of ripping up and re-routing every net allows the pathfinder algorithm to minimize the net ordering problem of maze routing.
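The negotiation loop of the pathfinder can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [10]: every resource is assumed to have capacity 1, the bend penalty is omitted because the toy graph has no geometry, the present-congestion weight is assumed to grow linearly with the iteration number (so the first iteration routes each net as if it were alone), and all nets are ripped up together at the start of each iteration. The names `shortest_path`, `pathfinder` and `undirected` are hypothetical.

```python
import heapq

CAPACITY = 1  # assumption: each routing resource holds one net


def shortest_path(graph, source, sink, node_cost):
    """Dijkstra search where entering a node costs node_cost(node)."""
    best = {source: 0.0}
    parent = {source: None}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == sink:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        if cost > best[node]:
            continue  # stale heap entry
        for nxt in graph[node]:
            new_cost = cost + node_cost(nxt)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    raise ValueError("sink unreachable")


def pathfinder(graph, nets, hfac=1.5, pfac=0.5, max_iterations=30):
    """Re-route all nets until no resource is overused."""
    history = {n: 0.0 for n in graph}
    for iteration in range(1, max_iterations + 1):
        present_weight = pfac * (iteration - 1)  # zero in iteration 1
        occupancy = {n: 0 for n in graph}        # rip up every net
        routes = {}
        for name, (source, sink) in nets.items():
            def cost(node):
                # overuse that would result if this net also took the node
                p = max(0, occupancy[node] + 1 - CAPACITY)
                return (1 + history[node] * hfac) * (1 + p * present_weight)
            routes[name] = shortest_path(graph, source, sink, cost)
            for node in routes[name]:
                occupancy[node] += 1
        overused = [n for n, used in occupancy.items() if used > CAPACITY]
        if not overused:
            return routes, iteration
        for n in overused:
            history[n] += occupancy[n] - CAPACITY
    raise RuntimeError("no legal routing found within the iteration limit")


def undirected(edges):
    """Build an adjacency dictionary from a list of undirected edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


graph = undirected([("A", "M"), ("M", "B"), ("C", "M"), ("M", "D"),
                    ("A", "X1"), ("X1", "X2"), ("X2", "B"),
                    ("C", "Y1"), ("Y1", "Y2"), ("Y2", "Y3"), ("Y3", "D")])
routes, iterations = pathfinder(graph, {"net1": ("A", "B"), "net2": ("C", "D")})
```

In this toy graph both nets first claim the shared node M. The history term then makes M more expensive, so the net with the cheaper detour moves away while the other keeps M, and the overuse disappears in the second iteration.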
router the type of switch boxes for the FPGA can be chosen to be [20] planar, Wilton or universal; different wire lengths can be defined; input dogleg pins can be allowed or disallowed; and the parameters of the cost function can be modified. The VPR router can perform a global routing or a combined global-detailed routing; the VPR combined router is able to change the current global routing configuration when it cannot easily find a detail routing solution [6].
This router is based on a modified version of the Pathfinder algorithm, and it can run in two different flavors to target two different main objectives:

• VPR's routability-driven router

The primary objective of this algorithm is to route a design successfully with the minimum track count. For this, the routability-driven router incorporates a modified routing cost model, as shown below [22]:

$$\mathrm{cost}_n = b_n \, h_n \, p_n + \mathrm{bend}_{n,m}$$

where $b_n$ is the base cost, usually 1 or 0.95 for most routing resources and 0 for sinks; the latter prevents the algorithm from continuing to search for possible connections once the sink can be reached. Note that congestion at the sink is not possible, as it would mean that the design requires an input to be driven by two different sources; therefore a base cost of 0 for the sink improves the running time of the algorithm without affecting its quality. $\mathrm{bend}_{n,m}$ penalizes bending the wire when routing, and it is only taken into account by the global router. $p_n$ is the present congestion penalty, and its value is the difference between the number of nets using a channel and the number of wires that can be placed in that channel. It is called present congestion because its value is updated within an iteration of the algorithm to avoid overusing a channel. Its cost is given by:

dynamically inside the iteration of the algorithm, when routing every net; and the other one is computed once at the beginning of each iteration.

• VPR's timing-driven router

The objective of this algorithm is to increase hardware circuit speed. To do this, it adds an Elmore delay model to the cost function, so the routing gives preference to solutions with minimum delay. To set an upper bound on delay, this algorithm starts by routing the nets with the most distant connections first. The imposed routing order produces suboptimal track counts, but faster results.

Another modification common to both approaches is that the maze wavefront expansion is modified for multiple-output nets. As mentioned before, the global router breaks every n-terminal net into n-1 two-terminal nets, and it performs n-1 iterations of the wavefront expansion to connect the nets. The normal maze router empties the wavefront for each iteration, whereas the VPR router does not empty the current wavefront: it adds all the routing resource segments required to connect the reached terminal to the wavefront with a cost of 0, and it continues expanding normally. Therefore, the next terminal will be reached much more quickly than if the entire wavefront expansion had been started from scratch. Figure 7 shows a) a wavefront expansion; when a normal maze router reaches a net terminal, it empties the wavefront and restarts it from the beginning, as shown in b), whereas the VPR router adds the last net found to the wavefront with a cost of 0 and continues expanding it, see c).
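The wavefront reuse for multi-terminal nets can be sketched as follows. This is a minimal illustration that assumes a unit-cost rectangular grid instead of a real routing-resource graph; the function name `route_multi_terminal` and its parameters are hypothetical. When a sink is reached, the nodes of the new path are put back on the existing wavefront with cost 0 and the expansion simply continues, rather than being restarted from an empty wavefront.

```python
import heapq


def route_multi_terminal(width, height, source, sinks, blocked=frozenset()):
    """Connect source to all sinks, keeping the wavefront between sinks."""
    remaining = set(sinks) - {source}
    dist = {source: 0}
    parent = {source: None}
    tree = {source}          # nodes already used by this net
    heap = [(0, source)]     # the wavefront
    while remaining and heap:
        cost, node = heapq.heappop(heap)
        if cost > dist[node]:
            continue  # stale entry left over from an earlier, higher cost
        if node in remaining:
            # Trace back to the routed tree and re-insert the new path
            # into the current wavefront with cost 0.
            step = node
            while step not in tree:
                tree.add(step)
                remaining.discard(step)
                dist[step] = 0
                heapq.heappush(heap, (0, step))
                step = parent[step]
            continue
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if not inside or nxt in blocked:
                continue
            new_cost = dist[node] + 1
            if new_cost < dist.get(nxt, float("inf")):
                dist[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if remaining:
        raise ValueError("some sinks are unreachable")
    return tree


tree = route_multi_terminal(5, 5, (0, 0), [(4, 0), (4, 2)])
```

Here the first sink is reached along the bottom row, and the second sink is then reached from the end of that path in two further steps, because the path nodes re-enter the wavefront at cost 0.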
B. ROAD: An Order-Impervious Optimal Detailed Router for FPGAs

The routers described so far have been based on the rip-out and reroute paradigm. The ROAD router is based instead on the bump and refit (B&R) paradigm. The main idea of this paradigm is to modify the nets already routed when a new conflicting net is found. It starts by routing the nets one by one until a conflict is found. If there are other tracks that can successfully route the conflicting net, the problem is solved and the next net is routed. If all routing resources have been used and no other tracks are available, the router bumps all tracks conflicting with the needed resource, and then all those unrouted net segments are refitted. As at least one of them won't be able to fit properly in the design, the unfitted track is routed through another channel, possibly bumping other tracks. In this way, the B&R paradigm depopulates net-congested areas.

Therefore, in the B&R algorithm, when a track is bumped, the bump can propagate, producing a path with many bump searches until a vacant resource or a spare routing area is found. If all possible paths are exhausted and no solution exists, or a cycle is detected (a previous bump in the path is revisited), the search backtracks to the initial conflicting resource and another track is bumped instead.

Even though this represents no problem for an incremental router, in which some nets have been

clique while routing a path φ, and the maximum number of tracks per channel available in the FPGA is t, then if bumping the net produces (k+m) > t the depth-first search is pruned, as the solution will be infeasible. The number m in the inequality above is the number of unusable tracks for the clique, that is, the m nets in the clique whose adjacent nets are ancestors of the path φ; they can't be used to route the current path, as they would cause a cyclic conflict.

• Lookahead transition cost functions: This is a cost function that measures which transition of the net $n_i$ from track $T_j$ to track $T_k$, written $n_i^{T_j \to T_k}$, is more likely to succeed, so that fewer searches are performed and backtracked. The two principal factors considered in the cost function are the total wirelength of the bumped nets and the flexibility of the bumped nets. Flexibility means that if there is a solution with one net of wirelength 9, and another solution produced by 3 smaller nets each of wirelength 3, the set of smaller nets will be more likely to produce a feasible solution, as it is easier to move smaller nets than a big one. The cost function will be [7]:

$$TC_1\left(n_i^{T_j \to T_k}\right) = \sum_{n_j \in adj_{T_k}(n_i)} l(n_j)$$
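As a small illustration of how such a lookahead cost could rank candidate transitions, the sketch below sums the wirelengths of the nets that would be bumped on each target track and picks the cheapest track. The data layout (a dictionary from track name to the wirelengths of the adjacent nets on that track) and the function names are assumptions for this example; the flexibility factor mentioned above is not modelled here.

```python
def lookahead_cost(adjacent_wirelengths):
    """TC1 for one transition: total wirelength of the nets to be bumped."""
    return sum(adjacent_wirelengths)


def cheapest_transition(candidates):
    """Pick the target track whose bumped nets have the least total wirelength.

    candidates maps a target track to the wirelengths l(n_j) of the nets
    adjacent to the moving net on that track (hypothetical input format).
    """
    return min(candidates, key=lambda track: lookahead_cost(candidates[track]))


candidates = {"T1": [9, 4], "T2": [3, 3, 3], "T3": [12]}
best_track = cheapest_transition(candidates)
```

With these made-up wirelengths, the transition to T2 is tried first because its bumped nets total 9 units of wire, against 13 for T1 and 12 for T3.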
The ROAD detail router based on B&R is implemented so that, if no solution exists (the circuit is unroutable), one track is added to the channel, so the router can find the minimum-track solution for the given placement and global assignment. This router is said to be independent of the order in which it routes the nets because bumping previous routes is equivalent to reversing previous routing decisions, or changing the order in which the nets are routed.

C. Routing Approach Via Search-Based Boolean Satisfiability

This approach addresses the routing problem in a completely different way, transforming the complex interactions of the routing constraints into a Boolean function. It has two main virtues: all paths for all nets are considered simultaneously, since the routing constraints are a set of equations that must be satisfied simultaneously; and the Boolean function is satisfiable if and only if the design is routable. The latter means that if there is no satisfying assignment for the Boolean function, it is proven that no routing solution exists for the design under the given placement and global route assignment.

The Satisfiability-Based Detailed Router (SDR) takes as input the connections assigned by a global router and produces two types of constraints [6]. The connectivity constraints ensure that a net has a continuous path between its terminals; these constraints form a list of tracks and C boxes that can possibly form the path in a channel segment. The exclusivity constraints ensure that different nets are assigned to different tracks in a channel, so that no overlap occurs. At the intersection of horizontal and vertical channels (S boxes), the constraints of the same net are connected by the logic operation AND.

Once these constraints have been formulated, they are transformed and encoded into Boolean equations represented as Conjunctive Normal Form (CNF) clauses. The conjunction of all connectivity and exclusivity constraints for all nets forms the routing constraint Boolean function, which models the routing problem as a whole. The Boolean SAT solver takes the routability function as input and tries to find a satisfying assignment or to prove that the given layout is not satisfiable. If the layout is satisfiable, the solution is an assignment of binary values 1 or 0 to the Boolean variables that encode the track variables. This information is transformed into an assignment of actual routing resources (tracks, C boxes and corresponding S boxes) to nets, which forms the actual FPGA routing solution. If it is not satisfiable, then no legal detailed routing solution exists with the given placement and global routing topology.

It is important to note that the equations are written in CNF form and are not represented as a BDD⁷. A BDD satisfiability approach explicitly represents all possible satisfying assignments as paths through the BDD directed acyclic graph; any path in this graph that leads to 1 satisfies the problem, and if no such path exists the problem is unsatisfiable. The problem with BDDs is that without a good variable ordering the BDD graph can become unmanageable in memory during intermediate computations, and finding a good variable ordering is an NP-complete problem.

Instead of BDDs, the SAT instances created from the routing constraint Boolean function are solved using GRASP [5][6][16], a generic search algorithm for the satisfiability problem. GRASP is based on search techniques that analyze conflicts on the graph; based on this analysis, it can prune large portions of the search space. The analysis yields the causes that produce conflicts, and this information is recorded to recognize similar conflicts on the graph and assignments that are necessary for a solution to be found. This means that GRASP searches to find one satisfying assignment, and must search exhaustively to conclude that no satisfying assignment exists: a trade-off of more search time for manageable memory sizes.

V. CONTRAST OF THE APPROACHES

To thoroughly understand the differences among the routers previously described, it is necessary to compare them based on the ideal objectives of any router:

• Unroutability detection
Only the SAT approach is able to prove that the layout is unroutable for a given placement and global route assignment, though reaching this conclusion can take a long time, as it has to determine that the SAT problem is unsatisfiable, which means that all possible search combinations have to be tried. VPR cannot determine unroutability; it stops searching after 30 iterations, assuming by then that the circuit is unroutable. During these iterations, the VPR global-detailed router can modify the global assignment if that simplifies the detail routing operation. This characteristic allows VPR to find routing solutions that the SAT solver determines to be unroutable, as the SAT solver relies heavily on the global

⁷ A Binary Decision Diagram is a graph representation of Boolean functions based on the Shannon expansion.
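To make the SAT formulation of detailed routing more concrete, the sketch below encodes a heavily simplified instance: a single channel segment with a fixed number of tracks, where each net must receive exactly one track and nets that overlap in the channel must not share one. The "exactly one track" clauses stand in for the connectivity constraints, and an exhaustive enumeration stands in for a real solver such as GRASP; the encoding, the function names and the example nets are all assumptions made for illustration.

```python
from itertools import combinations, product


def build_cnf(nets, conflicts, tracks):
    """Encode track assignment as CNF clauses over variables x[net, track]."""
    var = {key: i + 1 for i, key in enumerate(product(nets, range(tracks)))}
    clauses = []
    for net in nets:
        # Stand-in for connectivity: the net uses at least one track ...
        clauses.append([var[net, t] for t in range(tracks)])
        # ... and at most one track.
        for t1, t2 in combinations(range(tracks), 2):
            clauses.append([-var[net, t1], -var[net, t2]])
    for a, b in conflicts:
        # Exclusivity: overlapping nets may not share a track.
        for t in range(tracks):
            clauses.append([-var[a, t], -var[b, t]])
    return var, clauses


def solve(clauses, num_vars):
    """Exhaustive SAT check; returns a satisfying assignment or None."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits
    return None  # every assignment was tried: provably unsatisfiable


def detailed_route(nets, conflicts, tracks):
    """Return a net -> track mapping, or None if the instance is unroutable."""
    var, clauses = build_cnf(nets, conflicts, tracks)
    model = solve(clauses, len(var))
    if model is None:
        return None
    return {net: t for (net, t), v in var.items() if model[v - 1]}


nets = ["a", "b", "c"]
conflicts = [("a", "b"), ("a", "c"), ("b", "c")]  # all three nets overlap
with_two_tracks = detailed_route(nets, conflicts, tracks=2)
with_three_tracks = detailed_route(nets, conflicts, tracks=3)
```

With two tracks the function returns None, which in this toy setting is a proof that three mutually overlapping nets cannot be routed; with three tracks it returns one legal assignment. The exhaustive loop also shows why proving unroutability is the expensive case: every assignment has to be ruled out.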
higher track counts, B&R finds the optimal minimum track count by overcoming the net ordering problem, and SAT can formally determine which circuits are unroutable.

VI. SUMMARY

It has been shown that FPGA routing is a complex problem, even though historically it has been underestimated by VLSI designers on the grounds that its fixed routing resources should make routing easier. It has been quite the contrary: fixed routing resources make routing in FPGAs a much harder problem, since many constraints all have to be satisfied to successfully route the design.

Most approaches to FPGA routing have been based on the divide and conquer paradigm, in which the routing is split into two phases: a global router that spreads the track requirements throughout the FPGA, and a detail router that does the actual assignment of routing resources. Of these two phases, detail routing is the much harder problem, as it has to consider the architecture of the FPGA in depth. For the detail router, the maze algorithm has been the starting point; different modifications and improvements have been made to the basic algorithm with the A* search and the pathfinder algorithm, and this approach has reached its state of the art with the VPR tool.

Of course, the maze, rip-out and reroute algorithm used by VPR has not been the only approach to the routing problem, and two other approaches have been shown: the ROAD router, based on the bump & refit paradigm, and the SAT router, based on the satisfiability constraints of a Boolean function equivalent to the routing problem. The performance of these three approaches is summarized in the next table.

TABLE 2. ROUTER COMPARISON

                          VPR timing-driven   VPR routability-driven   ROAD        SAT
Unroutability detection   Heuristic           Heuristic                None        Formal proof
Running time              Best                Good                     Very good   Bad
Minimum track count       Good                Best                     Best        Bad
Memory requirement        Best                Good                     Very good   Bad

The above comparison shows that the ROAD approach is a nice trade-off between the two different flavors of VPR, and that more research has to be done on the SAT approach to make it competitive, perhaps by adding a specialized cost metric to the search to reduce the number of tracks and the running time.

The approaches presented here are not the only ones, and there are many more that can outperform the approaches described in a particular objective. For example, Just In Time routing [13], intended to place the routing algorithm in hardware so it can reroute and reprogram other FPGAs, achieves outstanding running times and very low memory requirements, with the penalty of requiring more tracks. Another approach separates delay minimization and wirelength optimization [11], which had so far been combined, by using Steiner graphs to obtain better circuit performance. Some others are capable of detecting routability as early as the first iteration of the pathfinder router using some heuristic [9] [14].

The different approaches to routing and the different performance obtained do not mean that research on some trends should be pruned because they have not outperformed any previous router. All research in the area illuminates the routing problem from a different perspective, thus helping to refine or improve existing algorithms, or even to combine some of them.

As architectures evolve and the logic inside each logic block grows, routing resources will become scarcer and routing will become more constrained; therefore FPGA routing will always be an active topic of research throughout the life of FPGA technology.

REFERENCES

[1] Actel Inc, Axcelerator family FPGA, 2004.
[2] Altera Corporation, Stratix II Device Handbook, Volume 1, Jul 2004.
[3] Electronics Weekly; ABI/INFORM Trade & Industry, Feb 25, 2004, pg. 12.
[4] Mo, A. Tabbara and R. Brayton, A Force-Directed Maze Router, Department of EECS, University of California at Berkeley.
[5] G. Nam, K. Sakallah and R. Rutenbar, Satisfiability-Based Layout Revisited: Detailed Routing of Complex FPGAs Via Search-Based Boolean SAT, ACM/SIGDA International Symp on FPGA, 1999, pp. 167-175.
[6] G. Nam, K. Sakallah and R. Rutenbar, A New FPGA Detailed Routing Approach Via Search-Based Boolean Satisfiability, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 21, no. 6, June 2002.
[7] H. Arslan and S. Dutt, ROAD: An Order-Impervious Complete Detailed Router for FPGAs, ICCD 2003, pp. 350-356.
[8] H. Arslan and S. Dutt, An Effective Hop-Based Detailed Router for FPGAs for Optimizing Track Usage and Circuit Performance, GLSVLSI 2004.
[9] J. Swartz, V. Betz and J. Rose, A Fast Routability-Driven Router for FPGAs, in 6th International Workshop on FPGAs, 1998, Monterey, CA.
[10] L. McMurchie and C. Ebeling, PathFinder: A Negotiation-Based Performance-Driven Router for FPGAs, ACM FPGA Symp. 1997, pp. 111-117.
[11] M. Alexander and G. Robins, New Performance-Driven FPGA Routing Algorithms, 32nd ACM/IEEE Design Automation Conference, San Francisco, CA, 1995, pp. 562-567.
[12] R. Jayaraman, Physical Design For FPGAs, ISPD 2001, Sonoma, CA.
[13] R. Lysecky, F. Vahid and S. Tan, Dynamic FPGA Routing for Just-in-Time FPGA Compilation, DAC 2004, San Diego, CA.
[14] R. Tessier, Fast Place and Route Approaches for FPGAs, PhD thesis, Massachusetts Institute of Technology, 1999.
[15] R. Tessier, Negotiated A* Routing for FPGAs, in Proceedings of the Fifth Canadian Workshop on Field-Programmable Devices, 1998.
[16] R. Wood and R. Rutenbar, FPGA Routing and Routability Estimation Via Boolean Satisfiability, ACM 1997, Monterey, CA.
[17] S. Brown, Routing Algorithm and Architectures for Field Programmable Gate Arrays, PhD thesis, Department of Electrical Engineering, University of Toronto, 1992.
[18] S. Hauck and A. Agarwal, Software Technologies for Reconfigurable Systems, IEEE Computer, 1997.
[19] V. Betz and J. Rose, VPR: A New Packing, Placement, and Routing Tool for FPGA Research, in Proceedings, Field-Programmable Logic, Oxford, U.K., 1997.
[20] V. Betz, VPR and T-VPack User's Manual, version 4.30, 2000.
[21] V. Betz and J. Rose, FPGA Routing Architecture: Segmentation and Buffering to Optimize Speed and Density.
[22] V. Betz, J. Rose and A. Marquardt, Architecture and CAD for Deep-Submicron FPGAs, Kluwer Academic Publishers, ISBN 0-7923-8460-1, 1999.
[23] V. Gudise and G. Venayagamoorthy, FPGA Placement and Routing Using Particle Swarm Optimization, in Proceedings of the IEEE Computer Society Annual Symposium on VLSI Emerging Trends in VLSI Systems Design, ISVLSI 2004.
[24] Xilinx Inc, Virtex II Data Book, 2004.
[25] Y. Chang, S. Thakur, K. Zhu, and D. Wong, A New Global Routing Algorithm for FPGAs, ACM 1994.