1
A Min-Cost Flow Based Detailed Router for FPGAs
Seokjin Lee*, Yongseok Cheon*, D. F. Wong+
*The University of Texas at Austin, +University of Illinois at Urbana-Champaign
2
Outline
Overview
Introduction
  FPGA Architecture, Routing resources
Problem Definitions
Algorithm Description
  Min-cost flow based router
  Lagrangian relaxation
Experimental Results
Conclusion
3
Overview
FlowRoute - A congestion-driven detailed router
Finds a feasible routing with minimum total delay for a given placed netlist
Routes all the nets connected to a LUT simultaneously by a min-cost flow algorithm
Iterative refinement with Lagrangian relaxation
4
FPGA Architecture
Logic modules
  Implement logic functions
  LUTs, flip-flops
Routing resources
  Wire segments
  Programmable switches
I/O modules
<A typical FPGA architecture>
[Figure labels: L, S, wire segments, logic module, I/O module, programmable switch]
5
FPGA Routing Resources
Prefabricated routing resources
  Congestion constraints
  Limited routability
High RC delays and large area of switches
[Figure: routing example with elements labeled a to h and blocks L1 to L4]
7
Graph Representation
Routing resource graph G(V, E)
  V: I/O pins of logic modules, wire segments
  E: feasible connections between the nodes
Routing problem: finding vertex-disjoint trees T = {T1, …, Tn}
[Figure: the example region and its routing resource graph, with nodes a to h and 1 to 16 around blocks L1 to L4]
8
Problem Definitions
The Routing for One LUT (ROL) problem
  Find routes for all the net segments connected to a LUT
  Uses the equivalence of the input pins of a LUT
FPGA detailed routing problem
  Find routes for all the nets
  Solved by solving the ROL problem for every LUT in the FPGA
9
Flow Network for ROL
Construct Gf(Vf, Ef) from G(V, E)
  Vf = V ∪ {s, s1, s2, …, sn, t}, si: subsource
  Ef = E ∪ Es ∪ Es' ∪ Et
    Es = {(s, si) | i = 1, …, n}
    Es' = {(si, v) | i = 1, …, n, v ∈ Ti}
    Et = {(pi, t) | pi ∈ Sp}
Edge capacity: rf(e) = 1 for all e ∈ Ef
Node capacity: rf(v) = 1 for all v ∈ Vf - {s, t}
Cost: cf(e) = c(e) for e ∈ E, cf(v) = c(v) for v ∈ V
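The construction above can be sketched in a few lines of Python. This is only an illustration under assumptions the slides do not state: Ti is taken to be the set of nodes already on net i's tree, Sp the set of input pins of the LUT, and unit node capacities are modeled by the standard node-splitting trick (each node v becomes an arc from (v, "in") to (v, "out") carrying capacity 1 and cost c(v)). The function name `build_flow_network` and its parameters are hypothetical.

```python
def build_flow_network(edges, node_cost, trees, sink_pins, edge_cost=None):
    """Return the arcs (tail, head, capacity, cost) of Gf.

    edges     : undirected edges (u, v) of the routing resource graph G
    node_cost : {v: c(v)} for every node of G
    trees     : [T1, ..., Tn], each an iterable of nodes (assumed meaning of Ti)
    sink_pins : LUT input pins (assumed meaning of Sp)
    edge_cost : optional {(u, v): c(e)}; missing edges cost 0 (assumption)
    """
    edge_cost = edge_cost or {}
    arcs = []
    # Node splitting: v_in -> v_out enforces node capacity 1 and charges c(v).
    for v, c in node_cost.items():
        arcs.append(((v, "in"), (v, "out"), 1, c))
    # Each undirected edge of G becomes two unit-capacity arcs.
    for u, v in edges:
        c = edge_cost.get((u, v), edge_cost.get((v, u), 0))
        arcs.append(((u, "out"), (v, "in"), 1, c))
        arcs.append(((v, "out"), (u, "in"), 1, c))
    # Es and Es': source -> subsource si -> every node of Ti.
    for i, tree in enumerate(trees, start=1):
        sub = f"s{i}"
        arcs.append(("s", sub, 1, 0))
        for v in tree:
            arcs.append((sub, (v, "in"), 1, 0))
    # Et: every LUT input pin -> sink.
    for p in sink_pins:
        arcs.append(((p, "out"), "t", 1, 0))
    return arcs


# Tiny example: one net whose tree is {a}, one LUT pin p, path a - b - p.
arcs = build_flow_network(
    edges=[("a", "b"), ("b", "p")],
    node_cost={"a": 1, "b": 2, "p": 0},
    trees=[["a"]],
    sink_pins=["p"],
)
```

Because the arc (s, si) already has capacity 1, the subsources need no splitting in this sketch.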
10
Flow Network (example)
[Figure: 4 x 4 grid of nodes 1 to 16 with source s, subsources s1 to s4 and sink t; edges labeled (1,0)]
11
ROL_NF (example)
[Figure: 4 x 4 grid of nodes 1 to 16 with source s, subsources s1 to s4 and sink t; edges labeled (1,0)]
13
ROL_NF for ROL
A min-cost max-flow f* in Gf corresponds to a solution to ROL with minimum total delay cost.
If |f*| = n, all the net segments connected to the LUT are routed.
ROL_NF solves the ROL problem exactly in polynomial time.
14
ROL_NF
Algorithm ROL_NF
1. Construct Gf(Vf, Ef)
2. Assign costs and capacities
3. Run a min-cost max-flow algorithm on Gf(Vf, Ef)
4. Derive routes for the nets from the computed flow
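Step 3 can be illustrated with a small successive-shortest-paths solver. The slides do not say which min-cost flow algorithm FlowRoute uses, so this choice, the function name `min_cost_max_flow`, and the integer node numbering are all assumptions made for the sketch. The example shows why routing the nets of a LUT simultaneously matters: a greedy router would give the first net its cheapest arc and pay more overall.

```python
from collections import deque


def min_cost_max_flow(n, arcs, s, t):
    """Successive shortest paths on nodes 0..n-1.

    arcs: list of (tail, head, capacity, cost) with non-negative costs.
    Returns (flow value, total cost, list of (tail, head) arcs carrying flow).
    """
    graph = [[] for _ in range(n)]        # graph[u] = indices of arcs leaving u
    to, cap, cost = [], [], []
    for u, v, c, w in arcs:
        graph[u].append(len(to)); to.append(v); cap.append(c); cost.append(w)
        graph[v].append(len(to)); to.append(u); cap.append(0); cost.append(-w)
    flow = total = 0
    while True:
        # Bellman-Ford with a queue: handles negative residual costs.
        dist = [float("inf")] * n
        prev = [-1] * n                   # arc used to reach each node
        queued = [False] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            queued[u] = False
            for e in graph[u]:
                v = to[e]
                if cap[e] > 0 and dist[u] + cost[e] < dist[v]:
                    dist[v] = dist[u] + cost[e]
                    prev[v] = e
                    if not queued[v]:
                        queued[v] = True
                        queue.append(v)
        if prev[t] == -1:                 # no augmenting path left
            break
        # Bottleneck along the path; arc e ^ 1 is the reverse of arc e.
        push, v = float("inf"), t
        while v != s:
            push = min(push, cap[prev[v]])
            v = to[prev[v] ^ 1]
        v = t
        while v != s:
            e = prev[v]
            cap[e] -= push
            cap[e ^ 1] += push
            v = to[e ^ 1]
        flow += push
        total += push * dist[t]
    used = [arcs[i // 2][:2] for i in range(0, len(to), 2) if cap[i + 1] > 0]
    return flow, total, used


# Nodes: 0 = s, 1 = s1, 2 = s2, 3 = a, 4 = b, 5 = t (toy costs, not from the slides).
example_arcs = [
    (0, 1, 1, 0), (0, 2, 1, 0),
    (1, 3, 1, 1), (1, 4, 1, 4),
    (2, 3, 1, 2), (2, 4, 1, 6),
    (3, 5, 1, 0), (4, 5, 1, 0),
]
flow, total, used = min_cost_max_flow(6, example_arcs, 0, 5)
```

Here both units of flow are routed (|f*| = n = 2) at total cost 6: s1 takes b and s2 takes a, whereas giving s1 its cheapest choice a first would force s2 onto b for a total of 7.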
15
Lagrangian Relaxation
General technique for solving optimization problems with difficult constraints
Original optimization problem is divided into subproblems
Each subproblem is solved by repetitive application of ROL_NF
Lagrangian multipliers guide the router
16
Lagrangian Relaxation
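Original problem:
\[ \min f(x) \quad \text{s.t.} \quad g_1(x) \le b_1,\; g_2(x) \le b_2,\; \dots,\; g_k(x) \le b_k \]
Lagrangian subproblem:
\[ \min f(x) + \lambda_1 (g_1(x) - b_1) + \lambda_2 (g_2(x) - b_2) + \dots + \lambda_k (g_k(x) - b_k) \]
Update $\lambda_1, \lambda_2, \dots, \lambda_k$:
\[ \max_{\lambda \ge 0} \left[ \min_x \{ f(x) + \lambda (g(x) - b) \} \right] \]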
17
LR for FPGA detailed routing
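Original problem:
\[ \min \sum_{k,i} c_i x_{ik} \quad \text{s.t.} \quad \sum_k x_{ik} \le 1, \quad \forall i \in V \]
Lagrangian subproblem:
\[ \min L(x), \qquad L(x) = \sum_{k,i} c_i x_{ik} + \sum_i \lambda_i \Big( \sum_k x_{ik} - 1 \Big) \]
λ update:
\[ \max_{\lambda} \{ \min_x L(x) \} \]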
18
Solving Lagrangian Subproblem
By rearranging terms, $L(x) = \sum_{k,i} (c_i + \lambda_i) x_{ik} - \sum_i \lambda_i$
$LS' = \min \{ \sum_{k,i} (c_i + \lambda_i) x_{ik} \}$
ROL_NF solves LS'
Set $(c_i + \lambda_i)$ as the cost of node i
$c_i = d_i$ (delay term) $\times\ q_i$ (congestion term)
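The rearrangement can be checked numerically. The sketch below evaluates L(x) in both forms, first as the routing cost plus the multiplier-weighted constraint violations, then as the sum over nodes of (c_i + λ_i) times usage minus the sum of the multipliers. The function names and the toy numbers are hypothetical; `usage[i]` stands for the number of nets using node i, that is, the sum over k of x_ik.

```python
def node_costs(delay, congestion):
    """c_i = d_i * q_i (delay term times congestion term)."""
    return {i: delay[i] * congestion[i] for i in delay}


def lagrangian_original(cost, lam, usage):
    """sum_i c_i * usage_i + sum_i lam_i * (usage_i - 1)."""
    return (sum(cost[i] * usage[i] for i in cost)
            + sum(lam[i] * (usage[i] - 1) for i in cost))


def lagrangian_rearranged(cost, lam, usage):
    """sum_i (c_i + lam_i) * usage_i - sum_i lam_i."""
    return (sum((cost[i] + lam[i]) * usage[i] for i in cost)
            - sum(lam[i] for i in cost))


# Toy instance: node 2 is shared by two nets, so its constraint is violated.
delay = {1: 2, 2: 3}
congestion = {1: 1, 2: 2}
lam = {1: 0, 2: 5}
usage = {1: 1, 2: 2}

cost = node_costs(delay, congestion)
value_a = lagrangian_original(cost, lam, usage)
value_b = lagrangian_rearranged(cost, lam, usage)
```

Since the second sum does not depend on x, minimizing L(x) reduces to minimizing the first sum, which is a min-cost flow with node costs c_i + λ_i.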
19
Updating Lagrangian Multipliers
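Subgradient Method
\[ \lambda_i^{r+1} = \max \Big\{ 0,\; \lambda_i^{r} + \theta_r \Big( \sum_k x_{ik} - 1 \Big) \Big\} \]
Convergence conditions:
\[ \lim_{r \to \infty} \theta_r = 0, \qquad \lim_{r \to \infty} \sum_{j=1}^{r} \theta_j = \infty \]
$\theta_r$: step size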
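A minimal sketch of the subgradient update: each multiplier grows in proportion to how far its node's usage exceeds the capacity of 1 and is clamped at zero. The names `update_multipliers` and `step_size` are hypothetical, and the 1/r schedule is only one common choice that meets the two convergence conditions (it tends to zero and its partial sums diverge); the slides do not specify the schedule actually used.

```python
def step_size(r, initial=1.0):
    """Assumed schedule: initial / r goes to 0, and its partial sums diverge."""
    return initial / r


def update_multipliers(lam, usage, step):
    """New multiplier per node: max(0, lam_i + step * (usage_i - 1)).

    usage[i] is the number of nets currently using node i; nodes missing
    from usage are treated as unused.
    """
    return {i: max(0, lam[i] + step * (usage.get(i, 0) - 1)) for i in lam}


# Node a is used by three nets, node b by none, node c is not listed at all.
lam = {"a": 0.0, "b": 1.0, "c": 0.2}
new_lam = update_multipliers(lam, {"a": 3, "b": 0}, step=0.5)
```

Overused nodes become more expensive in the next ROL_NF call, while unused nodes drift back toward their base cost.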
20
FlowRoute
1. Initialize λ
2. For each lk in L do
3. Rip up nets connected to lk
4. Call ROL_NF
5. Update costs and reset capacities
6. Update λ
7. Repeat Steps 2 to 6 until no shared resource exists
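The control flow of these steps can be sketched with a deliberately simplified toy. Heavy assumptions apply: ROL_NF is replaced by a stand-in that picks the cheapest of a few precomputed candidate routes per LUT, step 5 is not modeled, the multipliers are updated once per pass over all LUTs, and the step size is 1/r. None of this is taken from the slides beyond the step order; `flowroute_sketch` and its arguments are hypothetical.

```python
def flowroute_sketch(luts, node_cost, max_passes=50):
    """Toy negotiation loop.

    luts      : {lut name: [candidate route, ...]}, each route a list of nodes
    node_cost : {node: base cost c_i}
    Returns (routes, number of passes used).
    """
    lam = {v: 0.0 for v in node_cost}                 # step 1: initialize multipliers
    routes = {}
    for r in range(1, max_passes + 1):
        for lut, candidates in luts.items():          # step 2: visit every LUT
            routes.pop(lut, None)                     # step 3: rip up its nets
            # Step 4 stand-in for ROL_NF: cheapest candidate under c_i + lam_i.
            routes[lut] = min(
                candidates,
                key=lambda path: sum(node_cost[v] + lam[v] for v in path),
            )
        # Count how many routes use each node.
        usage = {v: 0 for v in node_cost}
        for path in routes.values():
            for v in path:
                usage[v] += 1
        if all(count <= 1 for count in usage.values()):   # step 7: nothing shared
            return routes, r
        step = 1.0 / r                                # assumed step-size schedule
        lam = {v: max(0.0, lam[v] + step * (usage[v] - 1)) for v in lam}  # step 6
    return routes, max_passes


# Two LUTs both prefer node x; the multiplier on x pushes A onto y.
routes, passes = flowroute_sketch(
    luts={"A": [["x"], ["y"]], "B": [["x"], ["w"]]},
    node_cost={"x": 1, "y": 2, "w": 5},
)
```

In this toy run both LUTs pick x for two passes; by the third pass the multiplier on x has grown to 1.5, so A moves to y and the routing becomes feasible.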
21
Experimental Results
FPGA model used
  Symmetrical-array-based FPGA
  Each logic block contains four 4-input LUTs and flip-flops
  Switch connections: Fs = 3, Fc = W
    Fs: number of connections per wire entering the switch box
    Fc: number of tracks to which each logic block pin can connect
    W: number of tracks in a channel
22
Experimental Results
Tested on MCNC benchmark circuits
Results compared with VPR router
  Used a smaller number of routing tracks
  Improvement on critical path delay up to 28.9% (average 14.1%)
  Total wire length reduced (average 8.3%)
23
Experimental Results
Channel width and delay comparison (FR = FlowRoute; percentages are delay reductions relative to VPR)

Circuit     LUTs/FFs   Tracks (VPR)   Tracks (FR)   Critical path delay (VPR)   Critical path delay (FR)
9symml      104        10             9             26.7                        25.1 (6.0%)
term1       128        13             12            25.3                        23.3 (7.9%)
apex7       252        13             13            26.1                        21.3 (18.4%)
example2    404        17             16            29.6                        23.2 (21.6%)
alu2        224        17             17            54.7                        49.2 (10.1%)
Too-lrg     208        19             19            31.2                        30.2 (3.2%)
vda         456        23             23            46.5                        38.9 (16.3%)
alu4        1560       33             33            143.4                       122.5 (14.6%)
s298        1960       27             27            274.0                       194.7 (28.9%)
24
Conclusion
A new congestion-driven routing algorithm for FPGAs
Finds a feasible routing with minimum total delay, which is expected to reduce critical path delay
Can be used in a multiple-stage routing scheme