Post on 31-Dec-2015
A Network Flow Approach to Timing-Driven Incremental Placement for ASICs
Shantanu Dutt, Huan Ren, Fenghua Yuan, and Vishal Suthar
Dept. of Electrical and Computer Engineering
University of Illinois Chicago
Outline
• Motivation & prior work
• General methodology of FlowPlace
• Net delay model
• TD analytical global placement
• TD network flow based detailed placer
• Benchmarks
• Experimental results
• Conclusions
Motivation
• Placement in high-performance designs
  • Has a large effect on performance metrics, e.g., timing and power
  • Fast timing closure is a major but often hard-to-realize goal
  • Several metrics need to be met at the same time
• Incremental timing-driven placement
  • Start from an initial placement and improve timing incrementally on critical paths
  • More accurate timing information can be acquired from the initial placement
  • Minimizes the effect on other metrics of the initial placement; convergence is a byproduct
  • Also important for ECO applications
Prior Work
• Existing timing-driven placement
  • Path-based: minimize the critical paths directly
    • Pros: timing is essentially path-based
    • Cons: excessive number of paths
  • Net-based: transform timing into net weights or net budgets
    • Pros: low complexity, flexible
    • Cons: often ignores path information; has a convergence problem
• The net-based approach is the most common method
• Kahng et al. (ISPD'02)
  • Minimize the max weighted net delay using LP, with the net weight based on the max path delay violation through the net
  • All paths meet constraints simultaneously
  • Can fit into a standard WL-driven top-down design flow
• Yang et al. (ICCAD'02)
  • New slack allocation approach that assigns more slack to nets with larger estimated WL and fanout
  • Minimizes total net delay violation using simulated annealing
  • Achieves more efficient slack usage in the final placement
Prior Work (cont.)
• Incremental TD placement
  • Wonjoon et al. (ICCAD'03)
    • Path-based constraints for every violated path p_j (with a maximum path limit)
    • [Equation: bound on the delays d of the nets n_i in path p_j by a limit; exact form lost in extraction]
    • Simple bisection method to remove overlap; no control of delay change
  • Luo et al. (DAC'06)
    • Consider both the cells in the critical path and the cells that are logically adjacent to the critical path, to control timing perturbation
    • Delay model with delay/slew propagation
  • Both algorithms use LP for re-placement, which doesn't address the quadratic part of the delay accurately
• Nw-flow based detailed placement
  • Brenner et al. (ISPD'04), Doll et al. (ICCAD'94)
    • Try to send flow from congested areas, or from cells that haven't been placed, to vacant areas with minimum cost
    • Allow temporary small illegalities (e.g., overlap or out of boundary) caused by movement according to the flow
    • WL-driven; the deterioration from the global placement results is small
Our Goals & Methodology
• Flow: initial placed circuit → STA & determine critical node set (moveC) → TD analytical global placement (TAN) on moveC → TD nw-flow based detailed placement (TIF) on moveC → new placement w/ improved performance
• Goals
  • Accurate pre-route delay estimation
  • Targeted global & detailed TD re-placement of critical & near-critical paths
  • Minimal effect on the rest of the circuit
  • Fast
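WL and Pre-Route Delay Model
• WL calculation: we use a star graph model, in which every pin u_i = (x_i, y_i) of net n_j connects to the centroid C = (x_c, y_c):
  $L(n_j) = \sum_{u_i \in n_j} \left( |x_i - x_c| + |y_i - y_c| \right)$
• Pre-route delay model for a net n_j with k pins, driver u_d, and a sink u_i at distance l_{d,i} from the driver (r, c: wire resistance and capacitance per unit length; R_d: driver resistance; C_g: sink load capacitance):
  • Driver node driving the load capacitance:
    $D_1(n_j) = R_d \left( c\,L(n_j) + (k-1)\,C_g \right)$
  • Self interconnect delay:
    $D_2(u_i, n_j) = \frac{rc}{2}\, l_{d,i}^2 + r\, l_{d,i}\, C_g$
  • Self interconnect seeing the other interconnect & load capacitance:
    $D_3(u_i, n_j) = r\,\frac{l_{d,i}}{2} \cdot \frac{1}{2} \left( c\,L(n_j) + (k-2)\,C_g \right)$
• [Figure: star graph model with driver u_d (x_d, y_d), sinks u_i (x_i, y_i), u_p (x_p, y_p), u_q (x_q, y_q), and centroid C (x_c, y_c)]
• [Figure: delay model for the same net, with the lengths l_{d,i} and l_{d,i}/2 marked; C_total is split into a fraction and its complement (the fraction symbol is lost in the source)]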
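The delay model above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the star-graph wirelength is the sum of Manhattan distances from each pin to the centroid, that l_{d,i} is the Manhattan driver-to-sink distance, and that the three terms read D1 = R_d(cL + (k-1)C_g), D2 = (rc/2)l^2 + r l C_g, D3 = r(l/2)(1/2)(cL + (k-2)C_g). The factor 1/2 in D3 is taken from the garbled slide text and is an assumption. The names `Tech`, `star_wl`, and `net_delay_terms` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tech:
    r: float    # wire resistance per unit length
    c: float    # wire capacitance per unit length
    R_d: float  # driver resistance
    C_g: float  # sink (gate) load capacitance


def star_wl(pins):
    """Star-graph WL: Manhattan distance of every pin to the pin centroid."""
    xc = sum(x for x, _ in pins) / len(pins)
    yc = sum(y for _, y in pins) / len(pins)
    return sum(abs(x - xc) + abs(y - yc) for x, y in pins)


def net_delay_terms(driver, sink, pins, tech):
    """Return (D1, D2, D3) for one driver-to-sink connection of a net."""
    k = len(pins)                      # number of pins on the net
    wl = star_wl(pins)                 # L(n_j)
    l_di = abs(driver[0] - sink[0]) + abs(driver[1] - sink[1])
    # D1: driver resistance charging all wire and sink capacitance
    d1 = tech.R_d * (tech.c * wl + (k - 1) * tech.C_g)
    # D2: delay of the driver-to-sink wire itself plus its own sink load
    d2 = 0.5 * tech.r * tech.c * l_di ** 2 + tech.r * l_di * tech.C_g
    # D3: the same wire seeing the other wire and load capacitance
    #     (factor 1/2 assumed from the slide text)
    d3 = tech.r * (l_di / 2) * 0.5 * (tech.c * wl + (k - 2) * tech.C_g)
    return d1, d2, d3


if __name__ == "__main__":
    tech = Tech(r=1.0, c=1.0, R_d=1.0, C_g=1.0)   # unit values, illustration only
    pins = [(0, 0), (4, 0), (2, 3)]               # driver, critical sink, other sink
    print(star_wl(pins), net_delay_terms(pins[0], pins[1], pins, tech))
```

With unit parameters the three terms are easy to check by hand, which makes the sketch useful for comparing alternative readings of the D3 coefficient.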
Fidelity of Our Model
• A future model, which represents nets with multiple star structures, is still under development
• Best results for [parameter value lost in the source]

Circuit            Mac64   Matrix   Vp2   Mac32   Avg. error (%)
Routed delay       3.4     3.8      4.3   6.7     0
Our curr. model    4.0     5.1      6.2   8.2     29.5
Multi-star model   3.5     4.9      5.2   7.3     15.5
TD Analytical Global Placement (TAN)
• A TD extension of a combination of Gordian and Gordian-L
  • Essentially a quadratic programming approach
  • Uses an iterative approach to model the linear terms of delay in the objective function
• Critical delay cost of a net
  • Need to focus only on the sinks on the critical paths
  • Formulation:
    $D_c(n_j) = D_1(n_j) + \sum_{u_i \in \mathrm{critical}(n_j)} \left( D_2(u_i) + D_3(u_i) \right)$
  • A net with more critical paths through it is more important to optimize: one optimization step can minimize delay on all of those paths
• [Figure: net w/ 2 critical paths through it]
TD Analytical Global Placement (contd.)
• Allocated slack of a net
  • A weight measure for determining the TD WL reduction of a net
  • Two factors need to be considered: the minimum path slack through the net and the # of nets in that path
  • Therefore we uniformly allocate path slack to each net; the allocated slack of a net is
    $S_a(n_j) = \frac{S(P_{max}(n_j))}{\#\text{ of nets in path } P_{max}(n_j)}$
• Observation: nets with the same weight in TAN tend to have the same length after optimization
• [Figure: two paths, one with three nets and one with two nets, annotated with net delays]
  • Before optimization (net delays 4, 4, 4 and 6, 6): the two paths have the same delay
  • After optimization with net slack = path slack (net delays 3, 3, 3 and 3, 3): one path is longer than the other
  • After optimization with uniformly allocated slack (net delays 2, 2, 2 and 3, 3): both paths have approx. the same delay
• Thus we can get equi-delay paths
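The uniform slack allocation described above can be sketched as follows. This is a hedged illustration, assuming that P_max(n_j) denotes the most critical (minimum-slack) path through net n_j and that each path is given as a list of net names; the function name `allocated_slack` and the data layout are hypothetical.

```python
def allocated_slack(paths, slack):
    """Allocate path slack uniformly to nets.

    paths: dict mapping a path id to the list of nets on that path
    slack: dict mapping a path id to that path's slack
    Returns a dict net -> S_a(net), using the minimum-slack path through each net.
    """
    most_critical = {}
    for pid, nets in paths.items():
        for net in nets:
            # keep the path with the smallest slack seen so far for this net
            if net not in most_critical or slack[pid] < slack[most_critical[net]]:
                most_critical[net] = pid
    return {net: slack[pid] / len(paths[pid]) for net, pid in most_critical.items()}


if __name__ == "__main__":
    # Two paths with equal slack but different net counts (made-up numbers)
    paths = {"p1": ["a", "b", "c"], "p2": ["d", "e"]}
    slack = {"p1": 6.0, "p2": 6.0}
    print(allocated_slack(paths, slack))
```

Nets on the three-net path receive a smaller allocated slack than nets on the two-net path, so they are weighted more heavily, which is the effect the equi-delay argument relies on.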
TD Analytical Global Placement (contd.)
• The delay cost of a net:
  $C_D(n_j) = \frac{D_c(n_j)}{S_a(n_j)} = \frac{D_{c,quad}(n_j) + D_{c,lin}(n_j)}{S_a(n_j)}$
• The objective function (final objective: solve min-max via min-sum):
  $F = \sum_{n_j \in moveN} \frac{D_{c,quad}(n_j) + D_{c,lin}(n_j)}{S_a(n_j)}$
• The delay cost is divided into a quadratic part and a linear part
  • Quadratic terms: can be solved by normal quadratic programming techniques
  • Linear terms: each linear term is approximated by a quadratic term as follows, where the coordinates in the denominator are the current values (marked with a hat). We iterate several times until the results converge. The linear terms in y are dealt with in the same way.
    $|x_i - x_c| \approx \frac{(x_i - x_c)^2}{|\hat{x}_i - \hat{x}_c|}$
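The iterative treatment of the linear terms can be illustrated on a one-dimensional toy problem. The sketch below assumes the approximation |x - a| ≈ (x - a)^2 / |x_cur - a|, where x_cur is the value from the previous iteration, and minimizes a sum of such terms for a single movable coordinate. The function name, the `eps` guard against division by zero, and the anchor values are assumptions made for illustration.

```python
def minimize_linear_by_reweighting(anchors, iterations=50, eps=1e-9):
    """Minimize sum(|x - a|) over x by repeatedly solving a weighted quadratic.

    Each pass fixes the weights 1 / |x_cur - a| from the current x and then
    minimizes sum(w * (x - a)**2), whose solution is the weighted mean.
    """
    x = sum(anchors) / len(anchors)            # plain quadratic solution as a start
    for _ in range(iterations):
        weights = [1.0 / max(abs(x - a), eps) for a in anchors]
        x = sum(w * a for w, a in zip(weights, anchors)) / sum(weights)
    return x


if __name__ == "__main__":
    # The linear objective is minimized at the median of the anchors
    print(minimize_linear_by_reweighting([0.0, 1.0, 10.0]))
```

For the three anchors in the usage example the pure quadratic solution is the mean, while the reweighted iterations move toward the median, which is the minimizer of the linear objective.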
TD NW-Flow Based Detailed Placement (TIF)
• General purpose
  1. Solves the overlap problem from the global placer
  2. Minimizes the deterioration of the delay improvement obtained from the global placer
  3. Legalizes the placement, satisfying WS constraints
• General nw-flow graph
  • [Figure: flow graph from Source (S) through the new cells A1 and A2, the placed cells C11-C14 (Row1), C21, C22, C24 (Row2), C31-C33 (Row3), and the white-space nodes W1, W2, W3 to Sink (T), showing the flow that legalizes A1's position; a second panel shows the cell placement after cells are moved in the flow direction]
  • Arc cost = TD cost (linear & step function)
  • Arc capacity
    • Horizontal: how much a cell can move (accuracy issues)
    • Vertical: width of the head cell
    • S → moved cell: width(cell)
    • Row → T: WS of the row
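To make the flow formulation concrete, here is a small self-contained min-cost flow sketch in Python. It uses the textbook successive-shortest-path method with Bellman-Ford, which is swapped in purely for illustration and is not claimed to be the solver used in FlowPlace. The toy graph (one new cell A1 of width 2, a row neighbor C21, a cell C31 in the next row, and white-space nodes W2 and W3), its capacities, and its costs are all hypothetical.

```python
INF = float("inf")


def min_cost_flow(n, arcs, s, t, demand):
    """Send up to `demand` units from s to t at minimum cost.

    arcs: list of (tail, head, capacity, unit_cost) with integer capacities.
    Returns (flow_sent, total_cost, flow_on_each_arc).
    """
    graph = [[] for _ in range(n)]
    to, cap, cost = [], [], []
    for u, v, c, w in arcs:            # forward arc at index 2i, reverse at 2i+1
        graph[u].append(len(to)); to.append(v); cap.append(c); cost.append(w)
        graph[v].append(len(to)); to.append(u); cap.append(0); cost.append(-w)
    sent = total = 0
    while sent < demand:
        dist, prev = [INF] * n, [-1] * n
        dist[s] = 0
        for _ in range(n - 1):         # Bellman-Ford on the residual graph
            changed = False
            for u in range(n):
                if dist[u] == INF:
                    continue
                for e in graph[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[to[e]]:
                        dist[to[e]], prev[to[e]], changed = dist[u] + cost[e], e, True
            if not changed:
                break
        if dist[t] == INF:
            break                      # no augmenting path left
        push, v = demand - sent, t
        while v != s:                  # bottleneck capacity along the path
            push = min(push, cap[prev[v]]); v = to[prev[v] ^ 1]
        v = t
        while v != s:                  # augment along the path
            e = prev[v]; cap[e] -= push; cap[e ^ 1] += push; v = to[e ^ 1]
        sent += push
        total += push * dist[t]
    return sent, total, [cap[2 * i + 1] for i in range(len(arcs))]


# Hypothetical toy instance: S=0, A1=1, C21=2, C31=3, W2=4, W3=5, T=6
ARCS = [
    (0, 1, 2, 0),  # S -> A1, capacity = width(A1)
    (1, 2, 2, 1),  # A1 -> C21, horizontal arc
    (1, 3, 2, 5),  # A1 -> C31, vertical arc
    (2, 4, 1, 1),  # C21 -> W2, limited free space in the row
    (3, 5, 3, 0),  # C31 -> W3
    (4, 6, 1, 0),  # W2 -> T, capacity = WS of the row
    (5, 6, 3, 0),  # W3 -> T
]

if __name__ == "__main__":
    print(min_cost_flow(7, ARCS, 0, 6, 2))
```

In this toy instance the cheapest solution sends one unit horizontally and one unit vertically, so the vertical arc carries only part of the cell width. A continuous solver can return such a split, which is why a legal placement needs extra handling of vertical arcs.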
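Arc Cost in TIF
• Sensitivity-based cost
  • We define the delay of a net to be the delay from its driver cell to its most critical sink cell
  • Consider the net delay change when a cell is moved, i.e., when u_i moves to u'_i and l_{d,i} changes to l'_{d,i} = l_{d,i} + \Delta l_{d,i}:
    $\Delta D_1(n_j) = R_d\, c\, \Delta l_{d,i}$
    $\Delta D_3^b(n_j) = r\,\frac{l_{d,i}}{2} \cdot \frac{1}{2} \left( c\, \Delta l_{d,i} \right)$
  • If u_i is the critical sink or the driver:
    $\Delta D_2(n_j) = rc\, l_{d,i}\, \Delta l_{d,i} + r\, \Delta l_{d,i}\, C_g$
    $\Delta D_3^a(n_j) = r\,\frac{\Delta l_{d,i}}{2} \cdot \frac{1}{2} \left( c\,L(n_j) + (k-2)\,C_g \right)$
  • Otherwise:
    $\Delta D_2(n_j) = \Delta D_3^a(n_j) = 0$
• Arc cost formulation
  • For a cell, we find the most critical nets connected to it (those belonging to the path with the smallest slack); the unit flow cost of the arcs from the cell is
    $cost(e) = \sum_{n_j \in critical\_nets} \frac{1}{S_a(n_j)} \cdot \frac{\Delta D(n_j)}{cap(e)}$
• From experiments, [parameter setting lost in the source] gives the best results
• [Figure: delay model with driver u_d, sinks u_i, u_p, u_q, the moved position u'_i, lengths l_{d,i} and l'_{d,i}, and C'_total split into a fraction and its complement]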
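The unit arc cost can be sketched as below. This is a minimal illustration that assumes the garbled formula reads cost(e) = sum over the cell's critical nets of (1 / S(n_j)) * (ΔD(n_j) / cap(e)), where ΔD(n_j) is the delay change of net n_j for moving the cell along the arc and S(n_j) is the slack allocated to the net. Both the reading of the formula and the names `arc_cost`, `delta_delay`, and `alloc_slack` are assumptions.

```python
def arc_cost(delta_delay, alloc_slack, cap):
    """Unit flow cost of an arc leaving a cell.

    delta_delay: dict net -> delay change of that critical net if the cell
                 moves by the full arc capacity
    alloc_slack: dict net -> slack allocated to that net (must be non-zero)
    cap:         arc capacity (how far the cell can move along this arc)
    """
    if cap <= 0:
        raise ValueError("arc capacity must be positive")
    weighted = sum(delta_delay[net] / alloc_slack[net] for net in delta_delay)
    return weighted / cap


if __name__ == "__main__":
    # Hypothetical numbers: two critical nets on the moved cell
    print(arc_cost({"n1": 0.6, "n2": 0.2}, {"n1": 2.0, "n2": 1.0}, cap=5))
```

Dividing by the capacity turns the total delay change into a per-unit-flow cost, so nets with little allocated slack dominate the cost of moving the cell.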
Tackling Illegalities in TIF
• The incremental detailed placement problem is a DOP (discrete optimization problem); thus certain illegalities are introduced by using a continuous optimization method. There are two major problems.
• I. Discrete flow requirement in vertical arcs
  • A vertical arc represents vertical cell movement by a discrete amount (distance to the nearest row), so the flow on it should be either full capacity (cell width) or 0
  • The nw-flow solution may not meet this requirement
  • Resulting placement problems:
    • The full cost of movement is not incurred in the nw-flow
    • The cell moved up has a larger area than the nw-flow modeled
• [Figure: initial placement and resulting placement for a non-discrete flow, with cell v and row cells w, u, x. Labels: (7, c3), (7, c2), (5, c1), w(v)=7, w(v)=5, f1=2, f2=3, disp(w)=5, disp(u)=5, disp(v)=2, disp(w)=3, overlap]
Tackling Illegalities in TIF (contd.)
• Our flow-discretizing solution for vertical arcs: the 3-step process
  • Step 1: initially, vertical arc cap = 1, cost = full cost (the full cost is incurred)
  • Step 2: after the first 1 unit of flow is passed, cap = original cap - 1, cost = 0 (encourages flow to keep going through the arc)
  • Step 3: after all flow is passed, the cost and capacity of the adjacent horizontal arcs are updated to 0 (horizontal arc cost updated)
• [Figure: Step 1, Step 2, Step 3, and final placement for cell v and row cells w, u, x. Labels: (7, c3), (7, c2), (1, full-cost), f1=1, (4, 0), f2=4, (inf, 0), w(v)=7, w(v)=5; final displacements disp(u)=5, disp(w)=5, disp(v)=5]
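The first two steps of the discretization can be modeled as a small state machine on the vertical arc. The sketch below is an assumption-laden illustration: it covers only how the arc's offered capacity and unit cost change as flow passes (Step 1 and Step 2) and leaves the update of adjacent horizontal arcs (Step 3) out. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VerticalArc:
    cell_width: int        # original arc capacity = width of the moved cell
    full_cost: float       # cost of moving the whole cell to the other row
    flow: int = 0          # units of flow passed so far
    paid: float = 0.0      # cost charged so far

    def residual(self):
        """(capacity, unit cost) currently offered to the flow solver."""
        if self.flow == 0:
            return 1, self.full_cost               # Step 1: first unit pays in full
        return self.cell_width - self.flow, 0.0    # Step 2: the rest is free

    def push(self, units):
        cap, unit_cost = self.residual()
        if units < 0 or units > cap:
            raise ValueError("flow exceeds the arc's current capacity")
        self.flow += units
        self.paid += units * unit_cost

    def is_discrete(self):
        """True when the arc carries either nothing or the full cell width."""
        return self.flow in (0, self.cell_width)


if __name__ == "__main__":
    arc = VerticalArc(cell_width=5, full_cost=3.0)
    arc.push(1)            # pays the full movement cost
    arc.push(4)            # remaining width follows at zero cost
    print(arc.flow, arc.paid, arc.is_discrete())
```

Charging the whole movement cost on the first unit means a partial vertical flow is never cheaper than a full one, which pushes the solver toward all-or-nothing flow on the arc.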
Tackling Illegalities in TIF (contd.)
• II. Split flows
  • These occur when there are flows on both the upward and the downward arcs
  • The two split flows go through tree structures to the sink
  • Two heuristics to solve the problem:
    1. Max flow: we choose the branch tree with the larger flow
    2. Min cost: we choose the branch tree with the smaller flow cost, looking at the first k levels
  • Our experiments show that the max-flow heuristic does better
• [Figure: cell A1 between the rows C21 C22 and C31 C32, with arcs (5, c1), f1=2 and (5, c2), f2=3; the flows f1 and f2 continue through Tree1 and Tree2 (via C12, C23, C33, ...) to the sink]
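The two split-flow heuristics can be expressed as a simple selection rule. This sketch assumes each branch tree is summarized by its total flow and by a list of per-level flow costs. That representation, the function name `pick_branch`, and the cost numbers are hypothetical; only the flows 2 and 3 come from the slide's example.

```python
def pick_branch(branches, heuristic="max_flow", k=2):
    """Choose the branch tree that receives the whole cell.

    branches: list of dicts with keys "name", "flow", and "level_costs"
    heuristic: "max_flow" picks the larger flow; "min_cost" picks the smaller
               flow cost summed over the first k levels of the tree
    """
    if heuristic == "max_flow":
        return max(branches, key=lambda b: b["flow"])["name"]
    if heuristic == "min_cost":
        return min(branches, key=lambda b: sum(b["level_costs"][:k]))["name"]
    raise ValueError(f"unknown heuristic: {heuristic}")


if __name__ == "__main__":
    branches = [
        {"name": "Tree1", "flow": 2, "level_costs": [1.0, 4.0, 9.0]},
        {"name": "Tree2", "flow": 3, "level_costs": [2.0, 5.0, 0.5]},
    ]
    print(pick_branch(branches, "max_flow"), pick_branch(branches, "min_cost", k=2))
```

The min-cost variant depends on the lookahead depth k, so the two heuristics can disagree, as they do in the usage example.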
Satisfying White Space Constraints
• Due to the discrete nature of the detailed placement problem, the white space constraint (the max row width does not exceed a pre-specified limit) can't be ensured by the nw-flow process
• Two methods are used to deal with this problem:
  • Dynamic row size constraint monitoring
  • Push-violation arcs in the next iteration
• [Figure: non-discrete flow example with cell v and row cells w, u, x. Labels: (7, c3), (7, c2), (5, c1), w(v)=7, w(v)=5, f1=2, f2=3, disp(w)=5, disp(u)=5; after the move a row's white space goes from WS=3 to WS=-2, a WS violation]
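A row's white space and a violation check can be written down directly. The sketch assumes white space is the row's width limit minus the total width of the cells in it, with a negative value meaning a violation; the row capacity of 10 and the row names are made up, while the cell widths 7 and 5 and the values WS=3 and WS=-2 follow the figure labels.

```python
def white_space(row_capacity, cell_widths):
    """White space of a row: width limit minus total cell width."""
    return row_capacity - sum(cell_widths)


def violated_rows(rows):
    """Names of rows whose white space is negative.

    rows: dict mapping a row name to (row_capacity, list of cell widths)
    """
    return [name for name, (cap, widths) in rows.items()
            if white_space(cap, widths) < 0]


if __name__ == "__main__":
    print(white_space(10, [7]))        # row before a width-5 cell is moved in
    print(white_space(10, [7, 5]))     # row after the move
    print(violated_rows({"upper": (10, [7, 5]), "lower": (12, [7])}))
```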
Satisfying WS Constraints (contd.)
• Dynamic WS constraint monitoring
  • Monitor the total cell width in each row after every -ve cycle-based iterative improvement of the nw flow
    • Initial flow on a vertical arc: the total cell width is moved to the target row
    • Fully reversed flow: the total cell width is moved back to the original row
  • To facilitate cell movement, we allow a temporary white space violation in a row for each direction of the flow
  • Once a violation in a direction occurs, no further violations are allowed unless it goes to 0
  • Monitored by the top and bottom violation guards, Gt and Gb
• If a violation remains in the row, then
  • Push-violation arc in the next iteration
  • Thrashing is prevented by disallowing reverse movement
• [Figure: rows with moved widths W=3, W=7, W=9, W=4 under min-cost flow; guards Gb = 0 and Gt = 0 on a full row; values 4 and -5; "Net viol = 0 4 -1"; a violated row connected from S otherwise]
Global Network Flow
• The global flow network gives a global view of how flows will generally go
• With the global flow, we can eliminate detailed-flow arcs that are not likely to carry flow
• This can greatly reduce the cycles in the detailed nw-flow, thus reducing run time without obvious deterioration of the improvement
• C_{i+1} is the probabilistic average of all left-to-right detailed horizontal arc costs in the row
• C_{i+1,i} is the weighted average of the detailed vertical arc costs between two rows
• 65% runtime reduction at the cost of 1-2% timing deterioration
• [Figure: global flow graph over Row i-1, Row i (violated row), and Row i+1, with new cells A1, A2 and the Sink; arc labels (w(A2), 0), (viol_i, 0), (w(W_{i+1}), C_{i+1}), and a vertical arc labeled with C_{i+1,i}]

TIF's High-level Flow
• Global nw flow → detailed nw flow (on induced network) → physical flow interpretation → all new cells placed & all viol fixed? If no, iterate again; if yes, end
Benchmarks
• There are three sets of benchmarks: Ibm, Faraday, and TD-Dragon
• The Ibm and Faraday suites are originally not timing benchmarks; we generate synthetic timing characteristics for them
  • The Ibm circuits don't identify FFs. We determine FFs in cycles and break all cycles with a minimum # of FFs. The average percentage of FFs is 13%
  • Neither suite has information on the resistances and capacitances of cells and interconnects. We choose the typical values of 0.18 micron technology for these parameters

Benchmark Characteristics
                       Ibm            Faraday       TD-Dragon
# of cells             12506-210341   11734-32622   3093-25616
# of nets              13636-201640   11815-33186   3200-26017
critical path length   21-220         16-25         20-60
Efficacy of TD Arc Costs
Delay improvement (%) after TAN, and after TIF with each type of arc cost

Circuit      After TAN   TD cost   0 cost   Unit cost
ibm03        33.1        30.3      25.9     27.3
ibm09        12.1        8.4       3.6      4.9
ibm08        30.7        28.4      20.2     24.1
ibm15        21.8        18.8      12.7     14.0
ibm18        36.9        34.1      29.0     31.1
Dsp1         25.1        21.3      16.7     17.1
Risc1        24.6        22.2      19.7     19.9
TDMatrix     7.9         4.0       0.5      2.2
TDMac32      12.1        7.7       3.6      5.9
TDmac64      13.9        10.0      6.9      7.8
Geom. avg    19.5        15.23     8.8      11.7
Arith. avg   21.8        18.5      13.9     15.4

• Deterioration of the global placement (TAN) results, from the geometric averages: detailed place (TD cost) 4.3%; detailed place (unit cost) 7.8%; detailed place (0 cost) 10.7%
• 45% reduction in the deterioration of the global placement results by going from unit cost to TD cost
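The averages and deterioration figures can be cross-checked with a short script. It assumes the table entries carry one decimal place that was lost in extraction (for example, "331" read as 33.1); under that assumption the arithmetic averages reproduce the quoted 21.8, 18.5, 13.9, and 15.4. The geometric averages are taken as quoted, with "1523" read as 15.23. The variable names are hypothetical.

```python
from statistics import mean

COLUMNS = ("after_tan", "td_cost", "zero_cost", "unit_cost")

# Delay improvement (%) per circuit, decimals restored by assumption
IMPROVEMENT = {
    "ibm03":    (33.1, 30.3, 25.9, 27.3),
    "ibm09":    (12.1,  8.4,  3.6,  4.9),
    "ibm08":    (30.7, 28.4, 20.2, 24.1),
    "ibm15":    (21.8, 18.8, 12.7, 14.0),
    "ibm18":    (36.9, 34.1, 29.0, 31.1),
    "Dsp1":     (25.1, 21.3, 16.7, 17.1),
    "Risc1":    (24.6, 22.2, 19.7, 19.9),
    "TDMatrix": ( 7.9,  4.0,  0.5,  2.2),
    "TDMac32":  (12.1,  7.7,  3.6,  5.9),
    "TDmac64":  (13.9, 10.0,  6.9,  7.8),
}

# Arithmetic average of each column
arith = {c: mean(row[i] for row in IMPROVEMENT.values())
         for i, c in enumerate(COLUMNS)}

# Geometric averages as quoted on the slide
GEOM = {"after_tan": 19.5, "td_cost": 15.23, "zero_cost": 8.8, "unit_cost": 11.7}

# Deterioration of the TAN result caused by each detailed-placement variant
deterioration = {c: GEOM["after_tan"] - GEOM[c] for c in COLUMNS[1:]}

# Relative reduction in deterioration when going from unit cost to TD cost
reduction = 100 * (deterioration["unit_cost"] - deterioration["td_cost"]) \
    / deterioration["unit_cost"]

if __name__ == "__main__":
    print({c: round(v, 1) for c, v in arith.items()})
    print({c: round(v, 1) for c, v in deterioration.items()}, round(reduction))
```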
Final Results
• [Chart: delay improvement for ibm benchmarks; initial placement WL-driven (Dragon)]
• [Chart: delay improvement for Faraday benchmarks; initial placement WL-driven (Dragon)]
• [Chart: delay improv. for TD-Dragon benchmarks placed by Dragon (cell delay)]
• [Chart: delay improv. for TD-Dragon benchmarks placed by Dragon (no cell delay)]
• Chart value labels as extracted (decimal points lost): 197, 242, 206, 243, 45, 37
• [Chart: delay improvement on TD-Dragon placement for different WS constraints; x-axis: matrix, vp2, mac32, mac64, avg; y-axis: 0 to 12; legend: 3, 5, 10, TAN]
• [Wonjoon & Bazargan, ICCAD'03] achieves an avg. of 2.8% improv. with 5% WS
• For 5% WS our improvement is 4.2% (50% relative improvement)
Final Results (contd.)
• [Charts: value labels as extracted (decimal points lost): 82120, 196241, 62, 102, 3845, 40]

Empirical Asymptotic Time Complexity
• [Plot: runtime vs. cells (0 to 250000); a linear curve best fits the data: y = 0.0057x + 266.51]
• [Plot: runtime vs. movable cells (0 to 2500); a linear curve best fits the data: y = 0.6857x + 51.48]
• Runtime is 18% of Dragon and 12% of TD-Dragon
• Obtains a solution for a 210K-cell circuit (ibm18) w/ 34% improv. in 24 mins
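The two fitted lines can be evaluated directly. The sketch assumes the coefficients read 0.0057 and 266.51 for total cells and 0.6857 and 51.48 for movable cells (decimal points restored from the extracted text) and that runtime is in seconds, which is an assumption. Under it, the 210341-cell ibm18 circuit comes out at a little over 24 minutes, in line with the quoted figure. The function names are hypothetical.

```python
def runtime_vs_cells(num_cells):
    """Fitted runtime (assumed seconds) as a function of total cell count."""
    return 0.0057 * num_cells + 266.51


def runtime_vs_movable_cells(num_movable):
    """Fitted runtime (assumed seconds) as a function of movable cell count."""
    return 0.6857 * num_movable + 51.48


if __name__ == "__main__":
    seconds = runtime_vs_cells(210341)      # largest Ibm circuit size
    print(round(seconds / 60, 1), "minutes")
```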
Conclusions
• Proposed a TD incremental placement flow, FlowPlace
  • Global and detailed incremental placers
  • New, accurate pre-route net delay models
  • Can optimize both the quadratic and the linear delay terms in the global placer
  • TD nw flow to solve detailed TD placement
    • Sensitivity-based TD arc costs
    • Constraint satisfaction (e.g., WS)
    • Discretization of illegal continuous solutions
    • Global nw flow graph
• Promising results
  • Delay improv. up to 34% for a 210K-cell WL-opt layout in 24 mins
  • Delay improv. up to 10% for a 26K-cell TD-opt layout in just above 5 mins
  • The average delay improvement is 1834 (digits as extracted; separator lost)
  • The WL deterioration is 8% on average
  • The average run time is only 12-18% of the original placement runtime
• TD-IBM benchmarks and placed outputs avail. at the FlowPlace page: wwweceuicedu~duttbenchmarks-etcFlowPlaceflowhtml
• Concepts can be extended to timing and power optimization with constraints, and to physical re-synthesis
Satisfying White Space Constraints: Dynamic WS Constraint Monitoring
• We monitor the total cell width in each row after every -ve cycle-based iterative improvement of the nw flow
  • Initial flow on a vertical arc: the total cell width is moved to the target row
  • Fully reversed flow: the total cell width is moved back to the original row
• To facilitate cell movement, we allow temporary white space violations under constraints
  • Viol_max = max cell width
  • Violations from above and below are calculated separately
• Because flow is allowed in step two, owing to the separate violation limits for flow from above and from below, we can finally legalize the placement
• [Figure: snapshots of rows holding cells u, v, x and the Sink, annotated with moved widths (W=5, W=7, W=5, W=5), row white space (WS=2, WS=-3, WS=-2, WS=-5, WS=5, WS=0, WS=0), and guards (vio_top=0; vio_top=3, vio_bot=0; vio_top=3, vio_bot=2; vio_top=0, vio_bot=0)]
Outline
Motivation amp prior work General methodology of FlowPlace Net delay model TD analytical global placement TD network flow based detailed placer Benchmarks Experimental results Conclusions
Motivation Placement in high performance designs
Has large effect on performance metrics eg timing power Fast timing closure is a major but often hard-to-realize goal Need to meet several metrics at the same time
Incremental timing-driven placement Initial placement improve timing incrementally on crit paths More accurate timing information can be acquired from the initi
al placement Minimize the affect to other metrics in initial placementmdashconver
gence is a byproduct Also important for ECO applications
Prior Work Existing timing driven placement
Path-based minimize the critical paths directly Pros timing is essentially path-basedCons excessive number of paths
Net-based transform timing into net-weights or net-budgetsPros low complexity flexibleCons often ignores path information has a convergence problem
Net-based approach is the most common method Kahng et al (ISPDrsquo02)
Minimize the max weighted net delay using LP with net weight based on the max path delay violation through the net
All paths meet constraints simultaneously Can fit into a standard WL-driven top-down design flow
Yang et al (ICCADrsquo02)New slack allocation approach which assigns more slack to nets
with larger estimated WL and fanoutMinimizing total net delay violation using simulated annealingAchieves a more efficient slack usage in final placement
Prior Work (cont)
Wonjoon et al (ICCADrsquo03) Path based constraints for every violated path pj (has maximum path limit)
Simple bisection method to remove overlap no control of delay change Luo et al (DACrsquo06)
Consider both cells in the critical path and cells that are logically adjacent to the critical path to control timing perturbation
Delay model with delayslew propagation Both algorithms use LP for replacement which doesnrsquot address the quadratic p
art of the delay accurately
Incremental TD placement
Brenner et al (ISPDrsquo04) Doll et al (ICCADrsquo94) Try to send flow from congested area or cells that havenrsquot been placed to vacant area with minimum cost Allow temporary small illegality (eg overlap or out of boundary) caused by movement according to the flow WL driven and the deterioration is small from global placement results
Nw flow based detailed placement
i j
i j
n p limitn p
d d d
Our Goals amp Methodology Initial placed circuit
STA amp Determine critical node set (moveC)
TD nw-flow based detailedplacement (TIF) On moveC
TD analytical global placement (TAN)on moveC
New placement w improvedperformance
bull Goalsbull Accurate pre-route delay est
bull Targeted global amp detailed TD re-placement of critical amp near-critical paths
bull Minimal effect on the rest of the circuit
bull Fast
WL and Pre-Route Delay Model
( ) ( ) ( )i j
j i c i cu n
L n x x y y
3 ( ) ( 2)((1 2)( ( ) ( 2) )i j d i j gD u n r l cL n k C
ud (xd yd)up (xp yp)
uq (xq yq)
ui (xi yi) centroidC (xc yc)
Star graph model
ud (xd yd)
ui (xi yi)
up (xp yp)
uq (xq yq)
ld i2
ld i
Delay model
WL calculationWe use a star graph model to calculate WL
Pre-route delay model1( ) ( ( ) ( 1) )j d j gD n R cL n k C
22 ( )
2i j d i d i g
rcD u n l rl C
Driver node driving load capacitance
Self interconnect delay
Self interconnect seeing other interconnect amp load capacitance
of Ctotal
(1-of Ctotal
Fidelity of our model The future model is still under development which modeling nets with multiple star structures
Best results for
Circuit Mac64 Matrix Vp2 Mac32 error
Routed delay 34 38 43 67 0
Our curr model 40 51 62 82 295
Multi-star model 35 49 52 73 155
TD Analytical Global Placement (TAN)
A TD extension of a combination of Gordian and Gordian-L Essentially a quadratic programming approach Use an iterative approach to model the linear terms of delay in objective
function
Critical delay cost of a net Need to focus only on the sinks on the critical paths Formulation
A net with more critical paths through it is more important to optimizemdashcan achieve min on all those paths w one opt step
1 2 3( )
( ) ( ) ( ) ( )i j
c j j i iu critical n
D n D n D u D u
Net w 2 critical pathsthrough it
TD Analytical Global Placement (contd)
Allocated slack of a net A weight measure for determining TD WL reduction of a net Two factor needs to be considered minimum path slack through the
net and of nets in that path
Therefore we uniformly allocate path slack to each net the allocated slack of a net is
( ) ( ( )) ( of nets in path ( ))a j max j max jS n S P n P n
4 4 4
6 6
Before optimization two paths have the same delay
3 3 3
3 3
After optimization one is longer than the other
Net slack= Path slack Observaton Nets with the same weight in TAN tend to have the same length after optimization
2 2 2
3 3
After optimization both paths have approx the same delay
Thus we can get
Equi-delaypaths
Net delay
TD Analytical Global Placement (contd)
( ) ( ) ( ) ( ) ( ( )) ( )jD j c j a j c quad n c lin j a jC n D n S n D D n S n
F ( ( ) ( )) ( )j
c quad j c lin j a jn moveN
D n D n S n
Final objective function to solve min-max via min-sum The delay cost of a net
The objective function
The delay cost part is divided into quadratic and linear part
Quadratic terms Can be solved by normal quadratic programming technique
Linear part The linear terms here is approximated by a quadratic terms as following In the formulation the coordinates in the denominator is the current value We do several iterations until the results convergentThe linear terms of y is dealt in the same way
2
( )( )
( )i c
i ci c
x xx x
x x
TD NW-Flow Based Detailed Placement (TIF)General Purpose
1 Solves the overlap problem form global placer2 Minimizes the deterioration of delay improvement obtd from global placer3 Legalizes the placement satisfying WS constraints
General nw-flow graph
Source
A2
C11 C12 C13 C14
C21 C22 W21 C24
C31 C32 C33
A1
W2
W1
W3
Sink
Row1
Row2
Row3
Flow to legalize A1 position
C12 C13 C14C11C21
C22 C24 W2A1
Cell placement after cells are moved in the flow direction
bull Arc cost = TD cost linear amp step functbull Arc capacity
bull hor how much a cell can move (accuracy issues)bull vert width of head cellbull S moved cell width(cell)bull row T WS of row
ST
Arc Cost in TIF Sensitivity based cost
We define delay of a net to be the delay from its driver cell to its most critical sink cell Consider the net delay change when a cell is moved
Arc cost formulation For a cell we find the most critical nets (belong to path with smallest
slack) connected to it the unit flow cost of the arcs from the cell is
Delay model
ud
ui
up
uq
of Crsquototal
(1-of Crsquototal
ursquoi
ldi
lrsquodi
1
3
( )
( ) ( 2)(1 2)( )
j d d i
b j d i d i
D n R c l
D n r l c l
If ui is the critical sink or driver
2
3
( )
( ) ( 2)(1 2)( ( ) ( 2) )
j d i d i d i g
a j d i j g
D n rcl l r l C
D n r l c L n k C
Otherwise
2 3( ) ( ) 0j a jD n D n
_
1( ) ( ( ))
( ) ( )j
jn critical nets j
cost e D nS n cap e
From experiments gives best results
Tackling Illegalities in TIF The incremental detailed placement problem is a DOP Thus certain illegalities
are introduced in it by using a continuous optimization method There are two
major problems I Discrete flow requirement in vertical arcs
The vertical arc represents vertical cell movement by a discrete amount (dist to nearest row)
flow on it should be either full capacity (cell width) or 0 Nw-flow solution may not meet this requirement Resulting placement problems
Initial placement Resulting Placement The full cost of movement is not incurred in nw-flow Cell moved up has larger area than the nw flow modeled
v
w u x
u
Non-discrete flow
w u x
v
(7 c3) (7 c2)
(5 c1)
w(v)=7
f1=2
f2=3disp(w)=5
disp(u)=5
disp(v)=2
disp(w)=3v
overlap
w(v)=5
w x
u v
Tackling Illegalities in TIF (contd)
Our flow discretizing soln for vertical arc The 3 step process Step1 Initially vertical arc cap=1 cost=full cost Step2 After the first 1 unit flow is passed cap=original cap-1 cost=0 Step3 After all flow is passed The cost and capacity of the adjacent
horizontal arc are updated to 0
Step1
Full cost is incurred
Step2Final placement
w u x
v
(7 c3) (7 c2)
(1 full-cost)f1=1
w(v)=7
w(v)=5w u x
v
(7 c3) (7 c2)
f1=1
w(v)=7
w(v)=5
(40)
f2=4
w u x
v
(inf0)
(inf0)
f1=1
w(v)=7
w(v)=5
(40)
f2=4 u v
w x
disp(u)=5
disp(w)=5
disp(v)=5
Encourage flow to keep going through arc
Step3
Horiz arc costupdated
Tackling illegalities in TIF (contd)II Split flows
This occurs when there are flows on both upward and downward arcs
Two heuristics to solve the problem The two split flow will go through the tree
structure to the sink There are two heuristic
1 Max flow We choose the branch tree with larger flow2 Min cost We choose the branch tree with smaller
flow cost looking at the first k levels
C21 C22
C31 C32
A1
(5c1)
(5c2)f1=2
f2=3
Our experiment shows Max flow heuristic does better
C21 C22
C31 C32
helliphellip
hellip
f1
f2
C12
C23
C33
Tree1
Tree2
Satisfying White Space Constraints
Due to the discrete nature of the detailed placement problem the white space constraint max row width does not exceed a pre-specified limit canrsquot be ensured by the nw-flow process
Two methods are used to deal with this problem Dynamic row size constraint monitoring Push-violation arcs in the next iteration
Non-discrete flow
w u x
v
(7 c3) (7 c2)
(5 c1)
w(v)=7
f1=2
f2=3disp(w)=5
disp(u)=5
w(v)=5
w
u vWS=3 WS=-2
WS violation
Satisfying WS Constraints (contd) Dynamic WS constraint monitoring
Monitor total cell width in each row after every ndashve cycle-based iter improv of nw flow
Initial flow on vertical arc the total cell width is moved to target row Fully reverse flow the total cell width is moved back to orig row To facilitate cell movement we allow temporary white space violati
on in a row for each direction of the flow Once a viol in a direction occurs no further are allowed unless it g
oes to 0 Monitored by top and bottom viol guards Gb and Gt
If violation remains in the row then Push violation arc in the next iteration Thrashing prevented by dis
allowing reverse movement
W=3
W=7
W=9
W=4Min-costflow
Min-costflow
Gb = 0
Gt = 0
Full row
4
-5
Net viol = 0 4 -1
Violated row
S
Otherwise
Global Network Flow Global flow network gives a global view of generally how
flows will go With the global flow we can eliminate detailed-flow arcs that are
not likely to have flow on it This can greatly reduce the cycles in the detailed nw-flow thus
reducing time without obvious improvement deterioration
Row i-1
Row i
Row i+1
A2
A1
Sink
(w(A2)0) (violi 0)
violated row
(w(Wi+1) Ci+1) (w(R
) C
i+1
i))
Ci+1 is probabilistic average of all left-to-right detailed horizontal arc costs in the row
Ci+1I is the weighted average of the detailed vertical arc costs between two rows
65 runtime reduction at the cost of 1-2 timing deterioration
Global nw flow
Detailed nw flow
Physical flow interpretation
All new cells placed amp all viol fixed
No
YesEnd
TIFrsquos High-level Flow
(on inducednetwork)
Benchmarks There are three set of benchmarks Ibm Faraday and TD-Drago
n The Ibm and Faraday are originally not timing benchmarks we
generate synthetic timing characteristics for them The Ibm circuits donrsquot identify FFs We determine FFs in cycles a
nd break all cycles with minimum of FFs The average percentage of FFs is 13
Both suites donrsquot have information of resistance and capacities of cells and interconnects We choose the typical value of 18 microns technique for these parameters
Benchmark Characteristics
Ibm Faraday TD-Dragon
of cells 12506-210341 11734-32622 3093-25616
of nets 13636-201640 11815-33186 3200-26017
critical path length 21-220 16-25 20-60
Efficacy of TD Arc Costs After TAN TD cost 0 cost unit cost
ibm03 331 303 259 273
ibm09 121 84 36 49
Ibm08 307 284 202 241
ibm15 218 188 127 140
ibm18 369 341 290 311
Dsp1 251 213 167 171
Risc1 246 222 197 199
TDMatrix 79 40 05 22
TDMac32 121 77 36 59
TDmac64 139 100 69 78
GeomArith Avg
195 218 1523 185 88 139 117 154
bull Global place (TAN) Detailed place (TD cost) deterioration 43 Detailed place (unit cost) deterioration 78 Detailed place (0 cost) deterioration 107bull 45 deterioration reduction of global place results by going from unit-cost TD-cost
Final Results
Delay improvement for ibm benchmarksmdashinitial placement WL-driven (Dragon)
Delay improvement for Faraday benchmarksmdashinitial placement WL-driven (Dragon)
197
242
206
243
45
37
Delay improv for TD-Dragon benchmarks placed by Dragon (cell delay)
Delay improv for TD-Dragon benchmarks placed by Dragon (no cell delay)
Delay improvement on TD-Dragon placement for different WS constraints
0
2
4
6
8
10
12
matri x vp2 mac32 mac64 avg TAN
3510TAN
bull [Wonjoon amp Bazargan ICCADrsquo03] achieves an avg of 28 improv with 5 WSbull For 5 WS our improvement is 42 (50 relative improvement)
Final Results (contd)
82120
196241
62
102
3845
40
Empirical Asymptotic Time Complexity
y = 0 0057x + 266 51
0200400600800
1000120014001600
0 50000 100000 150000 200000 250000
cel l
runt
ime
Linear curve best fits data
y = 0 6857x + 51 48
0200400600800
10001200140016001800
0 500 1000 1500 2000 2500movabl e cel l s
runt
ime
Linear curve best fits data
bull Runtime is 18 of Dragon and 12 of TD-Dragonbull Obtains a soln for a 210K cct ibm18 w 34 improv in 24 mins
Conclusions Proposed a TD incremental placement flow FlowPlace
Global and detailed incremental placer New accurate pre-route net delay models Can opt both quadratic and the linear delay terms in global placer TD nw flow to solve detailed TD placement
sensitivity-based TD arc costs constraint satisfaction (eg WS) discretization of illegal continuous solns global nw flow graph
Promising results Delay improv up to 34--for a 210K-cell WL-opt layout in 24 mins Delay improv up to 10--for a 26K-cell TD-opt layout in just above 5 mi
ns The average delay improvement is1834 The WL deterioration is an average of 8 The average run time is only 12-18 of original placement runtime
TD-IBM benchmarks and placed outputs avail at the FlowPlace page wwweceuicedu~duttbenchmarks-etcFlowPlaceflowhtml
Concepts can be extended to timing and power optimization with constraints and physical re-synthesis
Satisfying white space constraints Dynamic WS constraint monitoring
We monitor total cell width in each row after every ndashve cycle-based iter improv of nw flow
initial flow on vertical arc the total cell width is moved target row
fully reverse flow the total cell width is moved back to orig row To facilitate cell movement we allow temporary white space violati
on under constraintsW=5
Sink
vio_top=0
WS=2
vio_top=3
vio_bot=0
u
v
xWS=-3
WS=-2
W=7
W=5
vio_top=3
vio_bot=2
uvWS=-5
WS=5
W=5 vio_top=0
vio_bot=0
u
vWS=0
WS=0
Viol_max=max cell width Violation from above and bellow are calculated separately
Because the flow allowed in step two due to separate violation limit for flow from above and below we can finally legalize the placement
Motivation Placement in high performance designs
Has large effect on performance metrics eg timing power Fast timing closure is a major but often hard-to-realize goal Need to meet several metrics at the same time
Incremental timing-driven placement Initial placement improve timing incrementally on crit paths More accurate timing information can be acquired from the initi
al placement Minimize the affect to other metrics in initial placementmdashconver
gence is a byproduct Also important for ECO applications
Prior Work Existing timing driven placement
Path-based minimize the critical paths directly Pros timing is essentially path-basedCons excessive number of paths
Net-based transform timing into net-weights or net-budgetsPros low complexity flexibleCons often ignores path information has a convergence problem
Net-based approach is the most common method Kahng et al (ISPDrsquo02)
Minimize the max weighted net delay using LP with net weight based on the max path delay violation through the net
All paths meet constraints simultaneously Can fit into a standard WL-driven top-down design flow
Yang et al (ICCADrsquo02)New slack allocation approach which assigns more slack to nets
with larger estimated WL and fanoutMinimizing total net delay violation using simulated annealingAchieves a more efficient slack usage in final placement
Prior Work (cont)
Wonjoon et al (ICCADrsquo03) Path based constraints for every violated path pj (has maximum path limit)
Simple bisection method to remove overlap no control of delay change Luo et al (DACrsquo06)
Consider both cells in the critical path and cells that are logically adjacent to the critical path to control timing perturbation
Delay model with delayslew propagation Both algorithms use LP for replacement which doesnrsquot address the quadratic p
art of the delay accurately
Incremental TD placement
Nw-flow based detailed placement: Brenner et al. (ISPD'04), Doll et al. (ICCAD'94)
• Try to send flow from congested areas, or from cells that haven't been placed, to vacant areas with minimum cost
• Allow temporary small illegalities (e.g., overlap or out-of-boundary placement) caused by movement according to the flow
• WL-driven; the deterioration from global placement results is small
[Equation residue: a delay-limit constraint over nets n_i on a path p_j; the formula is not recoverable from the extracted text]
Our Goals & Methodology
Flow: initial placed circuit -> STA & determine critical node set (moveC) -> TD analytical global placement (TAN) on moveC -> TD nw-flow based detailed placement (TIF) on moveC -> new placement w/ improved performance
• Goals
• Accurate pre-route delay est.
• Targeted global & detailed TD re-placement of critical & near-critical paths
• Minimal effect on the rest of the circuit
• Fast
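WL and Pre-Route Delay Model
WL calculation: we use a star graph model to calculate WL. For a net $n_j$ with pins $u_i = (x_i, y_i)$ and centroid $C = (x_c, y_c)$:
\[ L(n_j) = \sum_{u_i \in n_j} \left( |x_i - x_c| + |y_i - y_c| \right) \]
Pre-route delay model, for driver $u_d$, sink $u_i$ and interconnect length $l_{d,i}$:
• Driver node driving load capacitance:
\[ D_1(n_j) = R_d \left( c\,L(n_j) + (k-1)\,C_g \right) \]
• Self interconnect delay:
\[ D_2(u_i, n_j) = \frac{rc}{2}\, l_{d,i}^2 + r\, l_{d,i}\, C_g \]
• Self interconnect seeing other interconnect & load capacitance:
\[ D_3(u_i, n_j) = \frac{r\, l_{d,i}}{2}\, \kappa \left( c\,L(n_j) + (k-2)\,C_g \right) \]
Here $\kappa$ stands for a weighting factor that the extracted slide text shows only as "(1 2)"; its operators and symbols are lost. It relates to the split of C_total shown in the delay-model figure.
[Figure: star graph model with pins u_d (x_d, y_d), u_i (x_i, y_i), u_p (x_p, y_p), u_q (x_q, y_q) and centroid C (x_c, y_c)]
[Figure: delay model for the u_d to u_i connection, marking l_{d,i} and l_{d,i}/2, with a fraction of C_total and the remaining (1 - fraction) of C_total]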
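The star-model wire length and the three delay terms can be sketched in Python as below. This is a minimal sketch, not the authors' implementation. It assumes that l_{d,i} is the Manhattan distance from driver to sink, and because the weighting factor inside D_3 is garbled in the slide text, it is exposed as a hypothetical parameter `split_factor`. All function and parameter names are illustrative.

```python
def star_wirelength(pins):
    """L(n_j): sum of Manhattan distances from every pin to the pin centroid."""
    xc = sum(x for x, _ in pins) / len(pins)
    yc = sum(y for _, y in pins) / len(pins)
    return sum(abs(x - xc) + abs(y - yc) for x, y in pins)


def net_delay_terms(pins, driver, sink, R_d, r, c, C_g, split_factor):
    """Return (D1, D2, D3) for one driver-to-sink connection of a net.

    pins         : list of (x, y) pin coordinates, k = len(pins)
    driver, sink : indices into pins
    R_d          : driver resistance
    r, c         : unit-length wire resistance and capacitance
    C_g          : gate (load) capacitance of one sink
    split_factor : ASSUMPTION, stands in for the garbled factor in D3
    """
    k = len(pins)
    L = star_wirelength(pins)
    (xd, yd), (xi, yi) = pins[driver], pins[sink]
    # ASSUMPTION: l_{d,i} is the Manhattan driver-to-sink distance.
    l_di = abs(xd - xi) + abs(yd - yi)
    d1 = R_d * (c * L + (k - 1) * C_g)            # driver drives the whole load
    d2 = r * c * l_di ** 2 / 2 + r * l_di * C_g   # self interconnect delay
    d3 = (r * l_di / 2) * split_factor * (c * L + (k - 2) * C_g)
    return d1, d2, d3


pins = [(0, 0), (4, 0), (2, 3)]
d1, d2, d3 = net_delay_terms(pins, driver=0, sink=1,
                             R_d=1.0, r=1.0, c=1.0, C_g=1.0,
                             split_factor=0.5)
print(star_wirelength(pins), d1, d2, d3)
```

The unit parameter values are chosen only to make the arithmetic easy to follow; real values would come from the technology data.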
Fidelity of our model
The future model, which models nets with multiple star structures, is still under development.
Best results for [parameter value lost in extraction]
Circuit            Mac64   Matrix   Vp2   Mac32   Avg. error
Routed delay       34      38       43    67      0%
Our curr model     40      51       62    82      29.5%
Multi-star model   35      49       52    73      15.5%
TD Analytical Global Placement (TAN)
• A TD extension of a combination of Gordian and Gordian-L
• Essentially a quadratic programming approach
• Uses an iterative approach to model the linear terms of delay in the objective function
Critical delay cost of a net
• Need to focus only on the sinks on the critical paths
• Formulation:
\[ D_c(n_j) = D_1(n_j) + \sum_{u_i \in critical(n_j)} \left( D_2(u_i) + D_3(u_i) \right) \]
• A net with more critical paths through it is more important to optimize: one opt. step can achieve a min. on all those paths
[Figure: net w/ 2 critical paths through it]
TD Analytical Global Placement (contd)
Allocated slack of a net
• A weight measure for determining the TD WL reduction of a net
• Two factors need to be considered: the minimum path slack through the net and the # of nets in that path
• Observation: nets with the same weight in TAN tend to have the same length after optimization
• Therefore we uniformly allocate path slack to each net; the allocated slack of a net is
\[ S_a(n_j) = \frac{S(P_{max}(n_j))}{\#\text{ of nets in path } P_{max}(n_j)} \]
[Figure: equi-delay paths, labeled with net delays. Before optimization, two paths have the same delay (net delays 4, 4, 4 vs. 6, 6). With net slack = path slack, after optimization one path is longer than the other (3, 3, 3 vs. 3, 3). With allocated slack, after optimization both paths have approx. the same delay (2, 2, 2 vs. 3, 3); thus we can get equi-delay paths]
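The uniform slack allocation can be illustrated with a short Python sketch. It assumes each path is given as a slack value plus the list of nets on it, and that the allocated slack of a net is taken from the minimum-slack path through that net, divided by the number of nets on that path. The function name, the data layout and the example slacks are hypothetical.

```python
def allocate_slack(paths):
    """paths: list of (path_slack, [net names on the path]).

    Returns {net: allocated slack}, using for every net the path with the
    smallest slack that passes through it.
    """
    most_critical = {}  # net -> (slack, number of nets) of its worst path
    for slack, nets in paths:
        for net in nets:
            if net not in most_critical or slack < most_critical[net][0]:
                most_critical[net] = (slack, len(nets))
    return {net: slack / count for net, (slack, count) in most_critical.items()}


paths = [
    (6.0, ["a1", "a2", "a3"]),  # 3-net path
    (6.0, ["b1", "b2"]),        # 2-net path with the same slack
    (2.0, ["b2", "c1"]),        # a tighter path that also uses b2
]
alloc = allocate_slack(paths)
print(alloc)
```

Nets on the longer path receive less slack each, so they are weighted more heavily, which is the effect the equi-delay illustration describes.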
TD Analytical Global Placement (contd)
Final objective function, solving min-max via min-sum
• The delay cost of a net:
\[ C_D(n_j) = \frac{D_c(n_j)}{S_a(n_j)} = \frac{D_{c\_quad}(n_j) + D_{c\_lin}(n_j)}{S_a(n_j)} \]
• The objective function:
\[ F = \sum_{n_j \in moveN} \frac{D_{c\_quad}(n_j) + D_{c\_lin}(n_j)}{S_a(n_j)} \]
• The delay cost is divided into a quadratic part and a linear part
• Quadratic terms: can be solved by normal quadratic programming techniques
• Linear part: each linear term is approximated by a quadratic term as follows, where the coordinates in the denominator take their current values. We do several iterations until the results converge. The linear terms in y are dealt with in the same way.
\[ |x_i - x_c| \approx \frac{(x_i - x_c)^2}{|x_i - x_c|_{current}} \]
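The iterative treatment of the linear terms can be shown on a one-dimensional toy problem. The sketch below minimizes a sum of absolute distances by repeatedly solving the quadratic surrogate in which each term (x - t)^2 is divided by the current value of |x - t|. It is a simplified stand-in with a single free coordinate and fixed targets, not the TAN solver; the function name, the `eps` guard against division by zero and the example data are assumptions.

```python
def minimize_abs_sum(targets, iterations=50, eps=1e-9):
    """Minimize sum_i |x - t_i| via iteratively reweighted quadratics.

    Each pass minimizes sum_i (x - t_i)^2 / |x_cur - t_i|, whose closed-form
    minimizer is the weighted average of the targets.
    """
    x = sum(targets) / len(targets)  # solution of the unweighted quadratic
    for _ in range(iterations):
        weights = [1.0 / max(abs(x - t), eps) for t in targets]
        x = sum(w * t for w, t in zip(weights, targets)) / sum(weights)
    return x


x_opt = minimize_abs_sum([0.0, 1.0, 10.0])
print(round(x_opt, 6))
```

For these three targets the linear objective is minimized at the median, 1.0, while the plain quadratic solution would be the mean, about 3.67; the reweighting is what pulls the result toward the linear optimum.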
TD NW-Flow Based Detailed Placement (TIF)
General purpose
1. Solves the overlap problem from the global placer
2. Minimizes the deterioration of the delay improvement obtained from the global placer
3. Legalizes the placement, satisfying WS constraints
[Figure: general nw-flow graph with source S and sink T. Row1: C11 C12 C13 C14; Row2: C21 C22 W21 C24; Row3: C31 C32 C33; row white-space nodes W1, W2, W3; new cells A1 and A2. A second panel shows the flow to legalize A1's position and the cell placement after cells are moved in the flow direction]
• Arc cost = TD cost (linear & step function)
• Arc capacity
• hor.: how much a cell can move (accuracy issues)
• vert.: width of head cell
• S -> moved cell: width(cell)
• row -> T: WS of row
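Arc Cost in TIF: Sensitivity-based cost
• We define the delay of a net to be the delay from its driver cell to its most critical sink cell
• Consider the net delay change when a cell $u_i$ is moved to $u'_i$, so that $l_{d,i}$ becomes $l'_{d,i}$, with $\Delta l_{d,i} = l'_{d,i} - l_{d,i}$:
\[ \Delta D_1(n_j) = R_d\, c\, \Delta l_{d,i} \]
\[ \Delta D_3^b(n_j) = \frac{r\, l_{d,i}}{2}\, \kappa\, c\, \Delta l_{d,i} \]
If $u_i$ is the critical sink or driver:
\[ \Delta D_2(n_j) = r c\, l_{d,i}\, \Delta l_{d,i} + r\, \Delta l_{d,i}\, C_g \]
\[ \Delta D_3^a(n_j) = \frac{r\, \Delta l_{d,i}}{2}\, \kappa \left( c\,L(n_j) + (k-2)\,C_g \right) \]
Otherwise:
\[ \Delta D_2(n_j) = \Delta D_3^a(n_j) = 0 \]
Here $\kappa$ stands for a weighting factor that the extracted slide text shows only as "(1 2)"; its operators and symbols are lost.
• Arc cost formulation: for a cell, we find the most critical nets (those belonging to the path with the smallest slack) connected to it; the unit flow cost of the arcs from the cell is
\[ cost(e) = \sum_{n_j \in critical\_nets} \frac{1}{S(n_j)} \cdot \frac{\Delta D(n_j)}{cap(e)} \]
• From experiments, [parameter value lost in extraction] gives best results
[Figure: delay model with u_d, u_i, u_p, u_q; cell u_i moved to u'_i, changing l_{d,i} to l'_{d,i}, with a fraction of C'_total and the remaining (1 - fraction) of C'_total]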
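One reading of the sensitivity-based arc cost is sketched below in Python: the first-order delay change of each critical net is divided by the net's slack and by the arc capacity, then summed. This is an illustration under stated assumptions, not the authors' code. It assumes the total change is the sum of the four delay-change terms, that slacks are positive, and that the garbled weighting factor in the D_3 terms can be passed in as a hypothetical parameter `split_factor`. All names and numbers are illustrative.

```python
def delay_change(l_di, delta_l, L, k, R_d, r, c, C_g, split_factor,
                 on_critical_connection):
    """First-order change of a net's delay when l_{d,i} changes by delta_l."""
    d1 = R_d * c * delta_l                              # driver sees more wire
    d3b = (r * l_di / 2) * split_factor * c * delta_l   # other wire cap changes
    if on_critical_connection:
        # The moved cell is the critical sink or the driver.
        d2 = r * c * l_di * delta_l + r * delta_l * C_g
        d3a = (r * delta_l / 2) * split_factor * (c * L + (k - 2) * C_g)
    else:
        d2 = d3a = 0.0
    return d1 + d2 + d3a + d3b


def arc_cost(critical_nets, cap):
    """critical_nets: list of (delay change, slack); cap: arc capacity."""
    return sum(d_delay / slack for d_delay, slack in critical_nets) / cap


common = dict(l_di=4.0, delta_l=1.0, L=8.0, k=3, R_d=1.0, r=1.0, c=1.0,
              C_g=1.0, split_factor=0.5)
dd_critical = delay_change(on_critical_connection=True, **common)
dd_other = delay_change(on_critical_connection=False, **common)
cost = arc_cost([(dd_critical, 2.0), (dd_other, 4.0)], cap=5.0)
print(dd_critical, dd_other, cost)
```

Dividing by the slack makes arcs that lengthen tight nets more expensive than arcs that lengthen nets with room to spare.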
Tackling Illegalities in TIF
The incremental detailed placement problem is a discrete optimization problem (DOP). Thus certain illegalities are introduced by solving it with a continuous optimization method. There are two major problems.
I. Discrete flow requirement in vertical arcs
• A vertical arc represents vertical cell movement by a discrete amount (dist. to nearest row), so the flow on it should be either full capacity (cell width) or 0
• The nw-flow solution may not meet this requirement
• Resulting placement problems: the full cost of movement is not incurred in the nw-flow, and the cell moved up has a larger area than the nw-flow modeled
[Figure: initial placement and resulting placement under non-discrete flow, with cells w, u, x in one row and v in the adjacent row. Arc labels (capacity, cost): (7, c3), (7, c2), (5, c1); widths w(v)=7 and w(v)=5 as printed; flows f1=2, f2=3; displacements disp(w)=5, disp(u)=5, disp(v)=2, disp(w)=3; the result contains an overlap]
Tackling Illegalities in TIF (contd)
Our flow-discretizing soln for a vertical arc: the 3-step process
• Step 1: initially, the vertical arc has cap = 1 and cost = full cost, so the full cost is incurred
• Step 2: after the first 1 unit of flow is passed, cap = original cap - 1 and cost = 0, which encourages flow to keep going through the arc
• Step 3: after all flow is passed, the adjacent horizontal arcs are updated to cost 0 and capacity inf, shown as (inf, 0) in the figure
[Figure: the three steps on cells w, u, x and v, with arc labels (capacity, cost). Step 1: horizontal arcs (7, c3), (7, c2), vertical arc (1, full-cost), f1=1. Step 2: vertical arc (4, 0), f2=4. Step 3: horizontal arc cost updated, arcs (inf, 0). Final placement: disp(u)=5, disp(w)=5, disp(v)=5]
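Steps 1 and 2 of the discretizing process can be sketched as a small state machine in Python. The sketch assumes integer flow units and models only the vertical arc; the class name, method names and example numbers are hypothetical, and the update of the adjacent horizontal arcs in step 3 is left out.

```python
class VerticalArc:
    """Vertical arc that charges the full movement cost on its first unit."""

    def __init__(self, cell_width, full_cost):
        self.cell_width = cell_width
        self.flow = 0
        self.opened = False
        # Step 1: only one unit may pass, at the full cost.
        self.cap = 1
        self.cost = full_cost

    def push(self, units):
        """Send up to `units` of flow; return (units sent, cost incurred)."""
        sent = min(units, self.cap)
        incurred = sent * self.cost
        self.flow += sent
        if sent > 0 and not self.opened:
            # Step 2: the rest of the cell width may now pass for free.
            self.opened = True
            self.cap = self.cell_width - 1
            self.cost = 0
        else:
            self.cap -= sent
        return sent, incurred

    def is_discrete(self):
        """True when the flow is 0 or the full cell width."""
        return self.flow in (0, self.cell_width)


arc = VerticalArc(cell_width=5, full_cost=10.0)
first = arc.push(3)    # only 1 unit is admitted, and it pays the full cost
partial = arc.is_discrete()
second = arc.push(4)   # the remaining 4 units pass at zero cost
print(first, second, arc.flow, arc.is_discrete())
```

Because the remaining capacity is free once the first unit has paid, a min-cost solver has no reason to stop at a partial flow, which is the behavior the step 2 note on the slide describes.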
Tackling Illegalities in TIF (contd)
II. Split flows
• This occurs when there are flows on both upward and downward arcs
• The two split flows go through tree structures to the sink
• Two heuristics to solve the problem:
1. Max flow: we choose the branch tree with the larger flow
2. Min cost: we choose the branch tree with the smaller flow cost, looking at the first k levels
• Our experiments show that the max-flow heuristic does better
[Figure: new cell A1 between rows C21 C22 and C31 C32, with split flows f1=2 on arc (5, c1) and f2=3 on arc (5, c2); f1 and f2 continue through Tree1 and Tree2 (C12, C23, C33, ...) to the sink]
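The two branch-selection heuristics can be written down in a few lines of Python. The flows 2 and 3 follow the figure; the per-level costs, the branch names and the reading of "flow cost looking at the first k levels" as a sum over the first k levels are assumptions made for illustration.

```python
def choose_by_max_flow(branches):
    """Pick the branch tree that carries the larger flow."""
    return max(branches, key=lambda b: b["flow"])["name"]


def choose_by_min_cost(branches, k):
    """Pick the branch tree with the smaller cost over its first k levels."""
    return min(branches, key=lambda b: sum(b["level_costs"][:k]))["name"]


branches = [
    {"name": "Tree1", "flow": 2, "level_costs": [4.0, 1.0, 1.0]},
    {"name": "Tree2", "flow": 3, "level_costs": [2.0, 5.0, 1.0]},
]
by_flow = choose_by_max_flow(branches)
by_cost_k1 = choose_by_min_cost(branches, k=1)
by_cost_k2 = choose_by_min_cost(branches, k=2)
print(by_flow, by_cost_k1, by_cost_k2)
```

With these made-up costs the min-cost choice changes between k=1 and k=2, which shows how sensitive that heuristic is to the look-ahead depth.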
Satisfying White Space Constraints
• Due to the discrete nature of the detailed placement problem, the white space constraint (the max row width does not exceed a pre-specified limit) can't be ensured by the nw-flow process
• Two methods are used to deal with this problem: dynamic row size constraint monitoring, and push-violation arcs in the next iteration
[Figure: non-discrete flow example with cells w, u, x and v; arc labels (7, c3), (7, c2), (5, c1); f1=2, f2=3; disp(w)=5, disp(u)=5. After the movement one row has WS=3 and the other has WS=-2, a WS violation]
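A row's white space is simply its width limit minus the total width of the cells placed in it, and a negative value marks a violation. The Python sketch below reproduces the WS=3 and WS=-2 values from the figure; the row limits, cell widths and names are hypothetical values chosen to produce those numbers.

```python
def row_white_space(row_limit, cell_widths):
    """White space left in a row: width limit minus total cell width."""
    return row_limit - sum(cell_widths)


def violated_rows(rows):
    """rows: {row name: (width limit, [cell widths])} -> rows with WS < 0."""
    return [name for name, (limit, widths) in rows.items()
            if row_white_space(limit, widths) < 0]


rows = {
    "upper": (20, [7, 5, 5]),  # WS = 3
    "lower": (10, [7, 5]),     # WS = -2, violated
}
ws = {name: row_white_space(limit, widths)
      for name, (limit, widths) in rows.items()}
print(ws, violated_rows(rows))
```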
Satisfying WS Constraints (contd)
Dynamic WS constraint monitoring
• Monitor the total cell width in each row after every negative-cycle-based iterative improvement of the nw-flow
• Initial flow on a vertical arc: the total cell width is moved to the target row
• Fully reversed flow: the total cell width is moved back to the orig. row
• To facilitate cell movement, we allow a temporary white space violation in a row for each direction of the flow
• Once a viol. in a direction occurs, no further violations are allowed unless it goes to 0; this is monitored by the top and bottom viol. guards Gt and Gb
• If a violation remains in the row, then push-violation arcs are used in the next iteration; thrashing is prevented by disallowing reverse movement
[Figure: min-cost flow example with cell widths W=3, W=7, W=9, W=4, guards Gb = 0 and Gt = 0, a full row and a violated row; printed labels include 4, -5 and "Net viol = 0 4 -1", with S marking the source]
Global Network Flow
• The global flow network gives a global view of generally how flows will go
• With the global flow, we can eliminate detailed-flow arcs that are not likely to have flow on them
• This can greatly reduce the cycles in the detailed nw-flow, thus reducing time without obvious deterioration of the improvement
• C_{i+1} is the probabilistic average of all left-to-right detailed horizontal arc costs in the row
• C_{i+1,i} is the weighted average of the detailed vertical arc costs between two rows
• 65% runtime reduction at the cost of 1-2% timing deterioration
[Figure: global flow graph over Row i-1, Row i (violated row) and Row i+1, with new cells A1, A2 and the sink; arc labels include (w(A2), 0), (viol_i, 0), (w(W_{i+1}), C_{i+1}) and a vertical arc with cost C_{i+1,i}]
TIF's High-level Flow
[Flowchart: global nw flow -> detailed nw flow (on induced network) -> physical flow interpretation -> all new cells placed & all viol. fixed? No: repeat; Yes: End]
Benchmarks
• There are three sets of benchmarks: Ibm, Faraday and TD-Dragon
• The Ibm and Faraday suites are originally not timing benchmarks; we generate synthetic timing characteristics for them
• The Ibm circuits don't identify FFs. We determine FFs in cycles and break all cycles with a minimum # of FFs. The average percentage of FFs is 13%
• Both suites lack information on the resistance and capacitance of cells and interconnects. We choose typical values of 0.18-micron technology for these parameters
Benchmark Characteristics
                       Ibm            Faraday       TD-Dragon
# of cells             12506-210341   11734-32622   3093-25616
# of nets              13636-201640   11815-33186   3200-26017
critical path length   21-220         16-25         20-60
Efficacy of TD Arc Costs
Delay improvement (%) after TAN, and after detailed placement with each arc cost:
Circuit      After TAN   TD cost   0 cost   unit cost
ibm03        33.1        30.3      25.9     27.3
ibm09        12.1        8.4       3.6      4.9
ibm08        30.7        28.4      20.2     24.1
ibm15        21.8        18.8      12.7     14.0
ibm18        36.9        34.1      29.0     31.1
Dsp1         25.1        21.3      16.7     17.1
Risc1        24.6        22.2      19.7     19.9
TDMatrix     7.9         4.0       0.5      2.2
TDMac32      12.1        7.7       3.6      5.9
TDMac64      13.9        10.0      6.9      7.8
Geom. avg    19.5        15.23     8.8      11.7
Arith. avg   21.8        18.5      13.9     15.4
• Deterioration from global place (TAN) results: detailed place (TD cost) 4.3%; detailed place (unit cost) 7.8%; detailed place (0 cost) 10.7%
• 45% reduction in the deterioration of global place results by going from unit cost to TD cost
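The summary rows and the deterioration figures can be re-derived from the per-circuit entries. The Python sketch below assumes the table entries are percentages with one decimal place and that the deterioration figures are differences of geometric means; under those assumptions it gives averages that agree with the slide to within rounding.

```python
import math

improvement = {
    "After TAN": [33.1, 12.1, 30.7, 21.8, 36.9, 25.1, 24.6, 7.9, 12.1, 13.9],
    "TD cost":   [30.3, 8.4, 28.4, 18.8, 34.1, 21.3, 22.2, 4.0, 7.7, 10.0],
    "0 cost":    [25.9, 3.6, 20.2, 12.7, 29.0, 16.7, 19.7, 0.5, 3.6, 6.9],
    "unit cost": [27.3, 4.9, 24.1, 14.0, 31.1, 17.1, 19.9, 2.2, 5.9, 7.8],
}


def arith_mean(values):
    return sum(values) / len(values)


def geom_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


arith = {name: arith_mean(v) for name, v in improvement.items()}
geom = {name: geom_mean(v) for name, v in improvement.items()}

# Deterioration relative to the TAN result, measured on geometric means.
deterioration = {name: geom["After TAN"] - geom[name]
                 for name in ("TD cost", "unit cost", "0 cost")}
reduction = 1 - deterioration["TD cost"] / deterioration["unit cost"]

for name in improvement:
    print(f"{name:10s} geom {geom[name]:5.2f}  arith {arith[name]:5.2f}")
print({k: round(v, 1) for k, v in deterioration.items()}, round(reduction, 2))
```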
Final Results
[Chart: delay improvement for ibm benchmarks; initial placement WL-driven (Dragon)]
[Chart: delay improvement for Faraday benchmarks; initial placement WL-driven (Dragon)]
[Chart: delay improv. for TD-Dragon benchmarks placed by Dragon (cell delay)]
[Chart: delay improv. for TD-Dragon benchmarks placed by Dragon (no cell delay)]
[Chart: delay improvement on TD-Dragon placement for different WS constraints, for matrix, vp2, mac32, mac64 and avg; y axis 0 to 12; legend as extracted: 3, 5, 10, TAN]
[Chart data labels as extracted: 197, 242, 206, 243, 45, 37]
• [Wonjoon & Bazargan ICCAD'03] achieves an avg. of 2.8% improv. with 5% WS
• For 5% WS our improvement is 4.2% (50% relative improvement)
Final Results (contd)
[Chart data labels as extracted: 82120, 196241, 62, 102, 3845, 40]
Empirical Asymptotic Time Complexity
[Plot: runtime vs. # cells, x axis 0 to 250000; linear fit y = 0.0057x + 266.51; a linear curve best fits the data]
[Plot: runtime vs. # movable cells, x axis 0 to 2500; linear fit y = 0.6857x + 51.48; a linear curve best fits the data]
• Runtime is 18% of Dragon and 12% of TD-Dragon
• Obtains a soln for a 210K-cell cct, ibm18, w/ 34% improv. in 24 mins
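The linear fit against total cell count can be checked against the reported ibm18 runtime. The Python sketch below assumes the fit reads y = 0.0057x + 266.51 with y in seconds (the plot's unit is not stated in the extracted text) and uses 210341, the largest Ibm cell count from the benchmark table, as the size of ibm18. The function name is illustrative.

```python
def predicted_runtime_seconds(num_cells, slope=0.0057, intercept=266.51):
    """Evaluate the linear runtime fit; the unit is ASSUMED to be seconds."""
    return slope * num_cells + intercept


seconds = predicted_runtime_seconds(210341)
minutes = seconds / 60
print(round(seconds, 1), round(minutes, 1))
```

Under these assumptions the fit predicts a little over 24 minutes, in line with the 24 mins quoted for the 210K-cell circuit.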
Conclusions
• Proposed a TD incremental placement flow, FlowPlace
• Global and detailed incremental placer
• New accurate pre-route net delay models
• Can opt. both the quadratic and the linear delay terms in the global placer
• TD nw flow to solve detailed TD placement: sensitivity-based TD arc costs; constraint satisfaction (e.g., WS); discretization of illegal continuous solns; global nw flow graph
• Promising results
• Delay improv. up to 34% for a 210K-cell WL-opt layout in 24 mins
• Delay improv. up to 10% for a 26K-cell TD-opt layout in just above 5 mins
• The average delay improvement is 18.34%
• The WL deterioration is an average of 8%
• The average run time is only 12-18% of the original placement runtime
• TD-IBM benchmarks and placed outputs avail. at the FlowPlace page: www.ece.uic.edu/~dutt/benchmarks-etc/FlowPlace/flow.html
• Concepts can be extended to timing and power optimization with constraints, and to physical re-synthesis
Satisfying white space constraints
Dynamic WS constraint monitoring
• We monitor the total cell width in each row after every negative-cycle-based iterative improvement of the nw-flow
• Initial flow on a vertical arc: the total cell width is moved to the target row
• Fully reversed flow: the total cell width is moved back to the orig. row
• To facilitate cell movement, we allow temporary white space violations under constraints
[Figure: step-by-step example with cells u, v, x (widths W=5, W=7, W=5) moving between rows toward the sink; each step shows the row white space WS and the guards vio_top and vio_bot, ending with vio_top=0, vio_bot=0 and WS=0 in both rows]
• Viol_max = max cell width. Violations from above and from below are calculated separately
• Because step two allows flow under separate violation limits for flow from above and from below, we can finally legalize the placement
Prior Work Existing timing driven placement
Path-based minimize the critical paths directly Pros timing is essentially path-basedCons excessive number of paths
Net-based transform timing into net-weights or net-budgetsPros low complexity flexibleCons often ignores path information has a convergence problem
Net-based approach is the most common method Kahng et al (ISPDrsquo02)
Minimize the max weighted net delay using LP with net weight based on the max path delay violation through the net
All paths meet constraints simultaneously Can fit into a standard WL-driven top-down design flow
Yang et al (ICCADrsquo02)New slack allocation approach which assigns more slack to nets
with larger estimated WL and fanoutMinimizing total net delay violation using simulated annealingAchieves a more efficient slack usage in final placement
Prior Work (cont)
Wonjoon et al (ICCADrsquo03) Path based constraints for every violated path pj (has maximum path limit)
Simple bisection method to remove overlap no control of delay change Luo et al (DACrsquo06)
Consider both cells in the critical path and cells that are logically adjacent to the critical path to control timing perturbation
Delay model with delayslew propagation Both algorithms use LP for replacement which doesnrsquot address the quadratic p
art of the delay accurately
Incremental TD placement
Brenner et al (ISPDrsquo04) Doll et al (ICCADrsquo94) Try to send flow from congested area or cells that havenrsquot been placed to vacant area with minimum cost Allow temporary small illegality (eg overlap or out of boundary) caused by movement according to the flow WL driven and the deterioration is small from global placement results
Nw flow based detailed placement
i j
i j
n p limitn p
d d d
Our Goals amp Methodology Initial placed circuit
STA amp Determine critical node set (moveC)
TD nw-flow based detailedplacement (TIF) On moveC
TD analytical global placement (TAN)on moveC
New placement w improvedperformance
bull Goalsbull Accurate pre-route delay est
bull Targeted global amp detailed TD re-placement of critical amp near-critical paths
bull Minimal effect on the rest of the circuit
bull Fast
WL and Pre-Route Delay Model
( ) ( ) ( )i j
j i c i cu n
L n x x y y
3 ( ) ( 2)((1 2)( ( ) ( 2) )i j d i j gD u n r l cL n k C
ud (xd yd)up (xp yp)
uq (xq yq)
ui (xi yi) centroidC (xc yc)
Star graph model
ud (xd yd)
ui (xi yi)
up (xp yp)
uq (xq yq)
ld i2
ld i
Delay model
WL calculationWe use a star graph model to calculate WL
Pre-route delay model1( ) ( ( ) ( 1) )j d j gD n R cL n k C
22 ( )
2i j d i d i g
rcD u n l rl C
Driver node driving load capacitance
Self interconnect delay
Self interconnect seeing other interconnect amp load capacitance
of Ctotal
(1-of Ctotal
Fidelity of our model The future model is still under development which modeling nets with multiple star structures
Best results for
Circuit Mac64 Matrix Vp2 Mac32 error
Routed delay 34 38 43 67 0
Our curr model 40 51 62 82 295
Multi-star model 35 49 52 73 155
TD Analytical Global Placement (TAN)
A TD extension of a combination of Gordian and Gordian-L Essentially a quadratic programming approach Use an iterative approach to model the linear terms of delay in objective
function
Critical delay cost of a net Need to focus only on the sinks on the critical paths Formulation
A net with more critical paths through it is more important to optimizemdashcan achieve min on all those paths w one opt step
1 2 3( )
( ) ( ) ( ) ( )i j
c j j i iu critical n
D n D n D u D u
Net w 2 critical pathsthrough it
TD Analytical Global Placement (contd)
Allocated slack of a net A weight measure for determining TD WL reduction of a net Two factor needs to be considered minimum path slack through the
net and of nets in that path
Therefore we uniformly allocate path slack to each net the allocated slack of a net is
( ) ( ( )) ( of nets in path ( ))a j max j max jS n S P n P n
4 4 4
6 6
Before optimization two paths have the same delay
3 3 3
3 3
After optimization one is longer than the other
Net slack= Path slack Observaton Nets with the same weight in TAN tend to have the same length after optimization
2 2 2
3 3
After optimization both paths have approx the same delay
Thus we can get
Equi-delaypaths
Net delay
TD Analytical Global Placement (contd)
( ) ( ) ( ) ( ) ( ( )) ( )jD j c j a j c quad n c lin j a jC n D n S n D D n S n
F ( ( ) ( )) ( )j
c quad j c lin j a jn moveN
D n D n S n
Final objective function to solve min-max via min-sum The delay cost of a net
The objective function
The delay cost part is divided into quadratic and linear part
Quadratic terms Can be solved by normal quadratic programming technique
Linear part The linear terms here is approximated by a quadratic terms as following In the formulation the coordinates in the denominator is the current value We do several iterations until the results convergentThe linear terms of y is dealt in the same way
2
( )( )
( )i c
i ci c
x xx x
x x
TD NW-Flow Based Detailed Placement (TIF)General Purpose
1 Solves the overlap problem form global placer2 Minimizes the deterioration of delay improvement obtd from global placer3 Legalizes the placement satisfying WS constraints
General nw-flow graph
Source
A2
C11 C12 C13 C14
C21 C22 W21 C24
C31 C32 C33
A1
W2
W1
W3
Sink
Row1
Row2
Row3
Flow to legalize A1 position
C12 C13 C14C11C21
C22 C24 W2A1
Cell placement after cells are moved in the flow direction
bull Arc cost = TD cost linear amp step functbull Arc capacity
bull hor how much a cell can move (accuracy issues)bull vert width of head cellbull S moved cell width(cell)bull row T WS of row
ST
Arc Cost in TIF Sensitivity based cost
We define delay of a net to be the delay from its driver cell to its most critical sink cell Consider the net delay change when a cell is moved
Arc cost formulation For a cell we find the most critical nets (belong to path with smallest
slack) connected to it the unit flow cost of the arcs from the cell is
Delay model
ud
ui
up
uq
of Crsquototal
(1-of Crsquototal
ursquoi
ldi
lrsquodi
1
3
( )
( ) ( 2)(1 2)( )
j d d i
b j d i d i
D n R c l
D n r l c l
If ui is the critical sink or driver
2
3
( )
( ) ( 2)(1 2)( ( ) ( 2) )
j d i d i d i g
a j d i j g
D n rcl l r l C
D n r l c L n k C
Otherwise
2 3( ) ( ) 0j a jD n D n
_
1( ) ( ( ))
( ) ( )j
jn critical nets j
cost e D nS n cap e
From experiments gives best results
Tackling Illegalities in TIF The incremental detailed placement problem is a DOP Thus certain illegalities
are introduced in it by using a continuous optimization method There are two
major problems I Discrete flow requirement in vertical arcs
The vertical arc represents vertical cell movement by a discrete amount (dist to nearest row)
flow on it should be either full capacity (cell width) or 0 Nw-flow solution may not meet this requirement Resulting placement problems
Initial placement Resulting Placement The full cost of movement is not incurred in nw-flow Cell moved up has larger area than the nw flow modeled
v
w u x
u
Non-discrete flow
w u x
v
(7 c3) (7 c2)
(5 c1)
w(v)=7
f1=2
f2=3disp(w)=5
disp(u)=5
disp(v)=2
disp(w)=3v
overlap
w(v)=5
w x
u v
Tackling Illegalities in TIF (contd)
Our flow discretizing soln for vertical arc The 3 step process Step1 Initially vertical arc cap=1 cost=full cost Step2 After the first 1 unit flow is passed cap=original cap-1 cost=0 Step3 After all flow is passed The cost and capacity of the adjacent
horizontal arc are updated to 0
Step1
Full cost is incurred
Step2Final placement
w u x
v
(7 c3) (7 c2)
(1 full-cost)f1=1
w(v)=7
w(v)=5w u x
v
(7 c3) (7 c2)
f1=1
w(v)=7
w(v)=5
(40)
f2=4
w u x
v
(inf0)
(inf0)
f1=1
w(v)=7
w(v)=5
(40)
f2=4 u v
w x
disp(u)=5
disp(w)=5
disp(v)=5
Encourage flow to keep going through arc
Step3
Horiz arc costupdated
Tackling illegalities in TIF (contd)II Split flows
This occurs when there are flows on both upward and downward arcs
Two heuristics to solve the problem The two split flow will go through the tree
structure to the sink There are two heuristic
1 Max flow We choose the branch tree with larger flow2 Min cost We choose the branch tree with smaller
flow cost looking at the first k levels
C21 C22
C31 C32
A1
(5c1)
(5c2)f1=2
f2=3
Our experiment shows Max flow heuristic does better
C21 C22
C31 C32
helliphellip
hellip
f1
f2
C12
C23
C33
Tree1
Tree2
Satisfying White Space Constraints
Due to the discrete nature of the detailed placement problem the white space constraint max row width does not exceed a pre-specified limit canrsquot be ensured by the nw-flow process
Two methods are used to deal with this problem Dynamic row size constraint monitoring Push-violation arcs in the next iteration
Non-discrete flow
w u x
v
(7 c3) (7 c2)
(5 c1)
w(v)=7
f1=2
f2=3disp(w)=5
disp(u)=5
w(v)=5
w
u vWS=3 WS=-2
WS violation
Satisfying WS Constraints (contd) Dynamic WS constraint monitoring
Monitor total cell width in each row after every ndashve cycle-based iter improv of nw flow
Initial flow on vertical arc the total cell width is moved to target row Fully reverse flow the total cell width is moved back to orig row To facilitate cell movement we allow temporary white space violati
on in a row for each direction of the flow Once a viol in a direction occurs no further are allowed unless it g
oes to 0 Monitored by top and bottom viol guards Gb and Gt
If violation remains in the row then Push violation arc in the next iteration Thrashing prevented by dis
allowing reverse movement
W=3
W=7
W=9
W=4Min-costflow
Min-costflow
Gb = 0
Gt = 0
Full row
4
-5
Net viol = 0 4 -1
Violated row
S
Otherwise
Global Network Flow Global flow network gives a global view of generally how
flows will go With the global flow we can eliminate detailed-flow arcs that are
not likely to have flow on it This can greatly reduce the cycles in the detailed nw-flow thus
reducing time without obvious improvement deterioration
Row i-1
Row i
Row i+1
A2
A1
Sink
(w(A2)0) (violi 0)
violated row
(w(Wi+1) Ci+1) (w(R
) C
i+1
i))
Ci+1 is probabilistic average of all left-to-right detailed horizontal arc costs in the row
Ci+1I is the weighted average of the detailed vertical arc costs between two rows
65 runtime reduction at the cost of 1-2 timing deterioration
Global nw flow
Detailed nw flow
Physical flow interpretation
All new cells placed amp all viol fixed
No
YesEnd
TIFrsquos High-level Flow
(on inducednetwork)
Benchmarks There are three set of benchmarks Ibm Faraday and TD-Drago
n The Ibm and Faraday are originally not timing benchmarks we
generate synthetic timing characteristics for them The Ibm circuits donrsquot identify FFs We determine FFs in cycles a
nd break all cycles with minimum of FFs The average percentage of FFs is 13
Both suites donrsquot have information of resistance and capacities of cells and interconnects We choose the typical value of 18 microns technique for these parameters
Benchmark Characteristics
Ibm Faraday TD-Dragon
of cells 12506-210341 11734-32622 3093-25616
of nets 13636-201640 11815-33186 3200-26017
critical path length 21-220 16-25 20-60
Efficacy of TD Arc Costs After TAN TD cost 0 cost unit cost
ibm03 331 303 259 273
ibm09 121 84 36 49
Ibm08 307 284 202 241
ibm15 218 188 127 140
ibm18 369 341 290 311
Dsp1 251 213 167 171
Risc1 246 222 197 199
TDMatrix 79 40 05 22
TDMac32 121 77 36 59
TDmac64 139 100 69 78
GeomArith Avg
195 218 1523 185 88 139 117 154
bull Global place (TAN) Detailed place (TD cost) deterioration 43 Detailed place (unit cost) deterioration 78 Detailed place (0 cost) deterioration 107bull 45 deterioration reduction of global place results by going from unit-cost TD-cost
Final Results
Delay improvement for ibm benchmarksmdashinitial placement WL-driven (Dragon)
Delay improvement for Faraday benchmarksmdashinitial placement WL-driven (Dragon)
197
242
206
243
45
37
Delay improv for TD-Dragon benchmarks placed by Dragon (cell delay)
Delay improv for TD-Dragon benchmarks placed by Dragon (no cell delay)
Delay improvement on TD-Dragon placement for different WS constraints
0
2
4
6
8
10
12
matri x vp2 mac32 mac64 avg TAN
3510TAN
bull [Wonjoon amp Bazargan ICCADrsquo03] achieves an avg of 28 improv with 5 WSbull For 5 WS our improvement is 42 (50 relative improvement)
Final Results (contd)
82120
196241
62
102
3845
40
Empirical Asymptotic Time Complexity
y = 0 0057x + 266 51
0200400600800
1000120014001600
0 50000 100000 150000 200000 250000
cel l
runt
ime
Linear curve best fits data
y = 0 6857x + 51 48
0200400600800
10001200140016001800
0 500 1000 1500 2000 2500movabl e cel l s
runt
ime
Linear curve best fits data
bull Runtime is 18 of Dragon and 12 of TD-Dragonbull Obtains a soln for a 210K cct ibm18 w 34 improv in 24 mins
Conclusions Proposed a TD incremental placement flow FlowPlace
Global and detailed incremental placer New accurate pre-route net delay models Can opt both quadratic and the linear delay terms in global placer TD nw flow to solve detailed TD placement
sensitivity-based TD arc costs constraint satisfaction (eg WS) discretization of illegal continuous solns global nw flow graph
Promising results Delay improv up to 34--for a 210K-cell WL-opt layout in 24 mins Delay improv up to 10--for a 26K-cell TD-opt layout in just above 5 mi
ns The average delay improvement is1834 The WL deterioration is an average of 8 The average run time is only 12-18 of original placement runtime
TD-IBM benchmarks and placed outputs avail at the FlowPlace page wwweceuicedu~duttbenchmarks-etcFlowPlaceflowhtml
Concepts can be extended to timing and power optimization with constraints and physical re-synthesis
Satisfying white space constraints Dynamic WS constraint monitoring
We monitor total cell width in each row after every ndashve cycle-based iter improv of nw flow
initial flow on vertical arc the total cell width is moved target row
fully reverse flow the total cell width is moved back to orig row To facilitate cell movement we allow temporary white space violati
on under constraintsW=5
Sink
vio_top=0
WS=2
vio_top=3
vio_bot=0
u
v
xWS=-3
WS=-2
W=7
W=5
vio_top=3
vio_bot=2
uvWS=-5
WS=5
W=5 vio_top=0
vio_bot=0
u
vWS=0
WS=0
Viol_max=max cell width Violation from above and bellow are calculated separately
Because the flow allowed in step two due to separate violation limit for flow from above and below we can finally legalize the placement
Prior Work (cont)
Wonjoon et al (ICCADrsquo03) Path based constraints for every violated path pj (has maximum path limit)
Simple bisection method to remove overlap no control of delay change Luo et al (DACrsquo06)
Consider both cells in the critical path and cells that are logically adjacent to the critical path to control timing perturbation
Delay model with delayslew propagation Both algorithms use LP for replacement which doesnrsquot address the quadratic p
art of the delay accurately
Incremental TD placement
Brenner et al (ISPDrsquo04) Doll et al (ICCADrsquo94) Try to send flow from congested area or cells that havenrsquot been placed to vacant area with minimum cost Allow temporary small illegality (eg overlap or out of boundary) caused by movement according to the flow WL driven and the deterioration is small from global placement results
Nw flow based detailed placement
i j
i j
n p limitn p
d d d
Our Goals amp Methodology Initial placed circuit
STA amp Determine critical node set (moveC)
TD nw-flow based detailedplacement (TIF) On moveC
TD analytical global placement (TAN)on moveC
New placement w improvedperformance
bull Goalsbull Accurate pre-route delay est
bull Targeted global amp detailed TD re-placement of critical amp near-critical paths
bull Minimal effect on the rest of the circuit
bull Fast
WL and Pre-Route Delay Model
( ) ( ) ( )i j
j i c i cu n
L n x x y y
3 ( ) ( 2)((1 2)( ( ) ( 2) )i j d i j gD u n r l cL n k C
ud (xd yd)up (xp yp)
uq (xq yq)
ui (xi yi) centroidC (xc yc)
Star graph model
ud (xd yd)
ui (xi yi)
up (xp yp)
uq (xq yq)
ld i2
ld i
Delay model
WL calculationWe use a star graph model to calculate WL
Pre-route delay model1( ) ( ( ) ( 1) )j d j gD n R cL n k C
22 ( )
2i j d i d i g
rcD u n l rl C
Driver node driving load capacitance
Self interconnect delay
Self interconnect seeing other interconnect amp load capacitance
of Ctotal
(1-of Ctotal
Fidelity of our model The future model is still under development which modeling nets with multiple star structures
Best results for
Circuit Mac64 Matrix Vp2 Mac32 error
Routed delay 34 38 43 67 0
Our curr model 40 51 62 82 295
Multi-star model 35 49 52 73 155
TD Analytical Global Placement (TAN)
A TD extension of a combination of Gordian and Gordian-L Essentially a quadratic programming approach Use an iterative approach to model the linear terms of delay in objective
function
Critical delay cost of a net Need to focus only on the sinks on the critical paths Formulation
A net with more critical paths through it is more important to optimizemdashcan achieve min on all those paths w one opt step
1 2 3( )
( ) ( ) ( ) ( )i j
c j j i iu critical n
D n D n D u D u
Net w 2 critical pathsthrough it
TD Analytical Global Placement (contd)
Allocated slack of a net A weight measure for determining TD WL reduction of a net Two factor needs to be considered minimum path slack through the
net and of nets in that path
Therefore we uniformly allocate path slack to each net the allocated slack of a net is
( ) ( ( )) ( of nets in path ( ))a j max j max jS n S P n P n
4 4 4
6 6
Before optimization two paths have the same delay
3 3 3
3 3
After optimization one is longer than the other
Net slack= Path slack Observaton Nets with the same weight in TAN tend to have the same length after optimization
2 2 2
3 3
After optimization both paths have approx the same delay
Thus we can get
Equi-delaypaths
Net delay
TD Analytical Global Placement (contd)
( ) ( ) ( ) ( ) ( ( )) ( )jD j c j a j c quad n c lin j a jC n D n S n D D n S n
F ( ( ) ( )) ( )j
c quad j c lin j a jn moveN
D n D n S n
Final objective function to solve min-max via min-sum The delay cost of a net
The objective function
The delay cost part is divided into quadratic and linear part
Quadratic terms Can be solved by normal quadratic programming technique
Linear part The linear terms here is approximated by a quadratic terms as following In the formulation the coordinates in the denominator is the current value We do several iterations until the results convergentThe linear terms of y is dealt in the same way
2
( )( )
( )i c
i ci c
x xx x
x x
TD NW-Flow Based Detailed Placement (TIF)General Purpose
1 Solves the overlap problem form global placer2 Minimizes the deterioration of delay improvement obtd from global placer3 Legalizes the placement satisfying WS constraints
General nw-flow graph
Source
A2
C11 C12 C13 C14
C21 C22 W21 C24
C31 C32 C33
A1
W2
W1
W3
Sink
Row1
Row2
Row3
Flow to legalize A1 position
C12 C13 C14C11C21
C22 C24 W2A1
Cell placement after cells are moved in the flow direction
bull Arc cost = TD cost linear amp step functbull Arc capacity
bull hor how much a cell can move (accuracy issues)bull vert width of head cellbull S moved cell width(cell)bull row T WS of row
ST
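Arc Cost in TIF: Sensitivity-Based Cost
We define the delay of a net as the delay from its driver cell to its most critical sink cell, and consider the change in net delay when a cell is moved.
[Figure: delay model. Driver $u_d$ with sinks $u_i$, $u_p$ and $u_q$; moving $u_i$ to $u'_i$ changes the interconnect length from $l_{d,i}$ to $l'_{d,i}$, with $C'_{total}$ split into two fractions.]
With $\Delta l_{d,i} = l'_{d,i} - l_{d,i}$:
$\Delta D_1(n_j) = R_d \, c \, \Delta l_{d,i}$
$\Delta D_{3b}(n_j) = (r \, l_{d,i} / 2)(1/2)(c \, \Delta l_{d,i})$
If $u_i$ is the critical sink or the driver:
$\Delta D_2(n_j) = r c \, l_{d,i} \, \Delta l_{d,i} + r \, \Delta l_{d,i} \, C_g$
$\Delta D_{3a}(n_j) = (r \, \Delta l_{d,i} / 2)(1/2)(c \, L(n_j) + (k-2) \, C_g)$
Otherwise:
$\Delta D_2(n_j) = \Delta D_{3a}(n_j) = 0$
Arc cost formulation: for a cell, we find the most critical nets connected to it (those belonging to the path with the smallest slack). The unit flow cost of the arcs from the cell is
$cost(e) = \sum_{n_j \in critical\_nets} \frac{1}{S(n_j)} \cdot \frac{\Delta D(n_j)}{cap(e)}$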
From experiments gives best results
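A minimal sketch of a sensitivity-based arc cost follows. It rests on several assumptions that the slide text does not settle: the delay-change terms are taken as the first-order changes of the pre-route delay terms with respect to the interconnect length, the net's total change is taken as the plain sum of those terms, and the arc cost divides by the net slack and the arc capacity without any extra tuning exponent. All function and parameter names are hypothetical.

```python
def delta_net_delay(r, c, R_d, C_g, l_di, dl, L_net, k, critical):
    """First-order change in net delay when interconnect length l_di changes by dl."""
    d1 = R_d * c * dl                          # driver sees more wire capacitance
    d3b = (r * l_di / 2) * 0.5 * (c * dl)      # existing wire sees more wire capacitance
    if critical:                               # moved cell is the critical sink or the driver
        d2 = r * c * l_di * dl + r * dl * C_g  # change in self interconnect delay
        d3a = (r * dl / 2) * 0.5 * (c * L_net + (k - 2) * C_g)
    else:
        d2 = d3a = 0.0
    return d1 + d2 + d3a + d3b                 # assumed: total change is the sum


def arc_cost(critical_nets, cap):
    """Unit flow cost: slack-weighted delay change per unit of arc capacity."""
    return sum(dd / (slack * cap) for dd, slack in critical_nets)


dd_crit = delta_net_delay(r=1, c=1, R_d=1, C_g=1, l_di=2, dl=1, L_net=4, k=3, critical=True)
dd_other = delta_net_delay(r=1, c=1, R_d=1, C_g=1, l_di=2, dl=1, L_net=4, k=3, critical=False)
print(dd_crit, dd_other)
print(arc_cost([(dd_crit, 2.0), (dd_other, 4.0)], cap=5))
```

Nets with small slack dominate the cost, so flow is steered away from moves that lengthen the most critical nets.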
Tackling Illegalities in TIF
The incremental detailed placement problem is a discrete optimization problem (DOP). Solving it with a continuous optimization method therefore introduces certain illegalities. There are two major problems.
I. Discrete flow requirement in vertical arcs
A vertical arc represents vertical cell movement by a discrete amount (the distance to the nearest row), so the flow on it should be either the full capacity (the cell width) or 0. The nw-flow solution may not meet this requirement.
Resulting placement problems:
• The full cost of movement is not incurred in the nw-flow.
• The cell moved up has a larger area than the nw-flow modeled.
[Figure: non-discrete flow on a vertical arc. The initial placement has cells w, u and x in one row and v in the adjacent row; arcs labeled (capacity, cost) = (7, c3), (7, c2) and (5, c1) carry flows f1 = 2 and f2 = 3; the resulting placement shows an overlap.]
Tackling Illegalities in TIF (contd)
Our flow-discretizing solution for a vertical arc is a 3-step process:
Step 1: Initially, the vertical arc has cap = 1 and cost = full cost.
Step 2: After the first unit of flow is passed, cap = original cap - 1 and cost = 0.
Step 3: After all flow is passed, the cost and capacity of the adjacent horizontal arc are updated to 0.
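The first two steps amount to a step-function cost on the vertical arc, which can be sketched as a small state machine. The class and method names below are hypothetical, and the Step 3 update of the adjacent horizontal arc is only signalled through `saturated()` rather than modeled.

```python
from dataclasses import dataclass


@dataclass
class VerticalArc:
    width: int          # original capacity, i.e. the width of the cell being moved
    full_cost: float    # total cost of moving the whole cell to the other row
    flow: int = 0

    def residual(self):
        """(capacity, unit cost) offered to the next augmentation."""
        if self.flow == 0:
            return 1, self.full_cost            # Step 1: one unit at full cost
        return self.width - self.flow, 0.0      # Step 2: the rest of the width is free

    def push(self, amount):
        """Send `amount` units through the arc and return the cost incurred."""
        cap, cost = self.residual()
        if amount > cap:
            raise ValueError("flow exceeds the currently offered capacity")
        self.flow += amount
        return amount * cost

    def saturated(self):
        """True once the whole cell width has crossed; Step 3 would then update the horizontal arc."""
        return self.flow == self.width


arc = VerticalArc(width=5, full_cost=3.0)
print(arc.push(1), arc.residual(), arc.push(4), arc.saturated())
```

Charging the full cost on the first unit and nothing afterwards means a solver that decides to move the cell at all has no reason to stop partway.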
[Figure: the three steps on the example with cells w, u, x and v. Step 1: the vertical arc is (1, full-cost) with f1 = 1, so the full cost is incurred. Step 2: the arc becomes (4, 0) with f2 = 4, which encourages flow to keep going through the arc. Step 3: the horizontal arc cost is updated, shown as (inf, 0). Final placement: disp(u) = disp(w) = disp(v) = 5.]
Tackling Illegalities in TIF (contd): II. Split flows
This occurs when there are flows on both the upward and the downward arcs of a cell.
The two split flows each go through a tree structure to the sink. There are two heuristics for choosing between them:
1. Max flow: we choose the branch tree with the larger flow.
2. Min cost: we choose the branch tree with the smaller flow cost, looking at the first k levels.
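The two heuristics reduce to a simple selection rule once each branch is summarized by its flow and its per-level costs. The sketch below assumes such a summary is available; the dictionary layout and the function name are hypothetical.

```python
def choose_branch(branches, heuristic="max_flow", k=2):
    """Pick the branch tree that a split flow should follow."""
    if heuristic == "max_flow":
        return max(branches, key=lambda b: b["flow"])["name"]
    if heuristic == "min_cost":
        # Only the first k levels of each tree are inspected.
        return min(branches, key=lambda b: sum(b["level_costs"][:k]))["name"]
    raise ValueError(f"unknown heuristic: {heuristic}")


# Hypothetical split: 2 units go up, 3 units go down.
branches = [
    {"name": "up", "flow": 2, "level_costs": [1.0, 2.0, 9.0]},
    {"name": "down", "flow": 3, "level_costs": [2.0, 2.5, 0.1]},
]
print(choose_branch(branches, "max_flow"))       # larger flow wins
print(choose_branch(branches, "min_cost", k=2))  # cheaper first two levels wins
```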
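[Figure: split flow from A1 toward cells C21, C22 (upward) and C31, C32 (downward) over arcs (5, c1) and (5, c2), with f1 = 2 and f2 = 3. The flows f1 and f2 continue through Tree1 and Tree2 (nodes C12, C23, C33, ...) to the sink.]
Our experiments show that the max-flow heuristic does better.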
Satisfying White Space Constraints
Due to the discrete nature of the detailed placement problem, the white space constraint (the maximum row width must not exceed a pre-specified limit) can't be ensured by the nw-flow process.
Two methods are used to deal with this problem:
• Dynamic row size constraint monitoring
• Push-violation arcs in the next iteration
[Figure: non-discrete flow example with cells w, u, x and v; arcs (7, c3), (7, c2) and (5, c1) carry f1 = 2 and f2 = 3, giving disp(w) = 5 and disp(u) = 5. After the move, one row has WS = 3 and the other has WS = -2, a WS violation.]
Satisfying WS Constraints (contd): Dynamic WS constraint monitoring
• Monitor the total cell width in each row after every negative-cycle-based iterative improvement of the nw-flow.
  • Initial flow on a vertical arc: the total cell width is moved to the target row.
  • Fully reversed flow: the total cell width is moved back to the original row.
• To facilitate cell movement, we allow a temporary white space violation in a row for each direction of the flow. Once a violation in a direction occurs, no further violations are allowed unless it goes to 0. This is monitored by the top and bottom violation guards Gt and Gb.
• If a violation remains in the row, then push a violation arc in the next iteration. Thrashing is prevented by disallowing reverse movement.
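One possible reading of the guard mechanism is sketched below. The bookkeeping is an assumption: each direction may cause at most one outstanding violation, a violation is measured as the width that does not fit into the remaining white space, it may not exceed a cap `viol_max`, and both guards reset once the row is legal again. The class `RowGuard` and its methods are hypothetical names.

```python
class RowGuard:
    """Tracks a row's free white space and one violation guard per flow direction."""

    def __init__(self, free, viol_max):
        self.free = free                       # remaining white space (negative = violated)
        self.viol_max = viol_max               # assumed cap on a single violation
        self.guard = {"top": 0, "bottom": 0}   # G_t and G_b

    def accept(self, direction, width):
        """Try to move `width` units of cells into the row from `direction`."""
        if self.guard[direction] > 0:
            return False                       # this direction already violated once
        overflow = max(0, width - max(self.free, 0))
        if overflow > self.viol_max:
            return False
        self.free -= width
        self.guard[direction] = overflow
        return True

    def release(self, width):
        """Cells of total `width` leave the row; guards clear when the row is legal."""
        self.free += width
        if self.free >= 0:
            self.guard = {"top": 0, "bottom": 0}


row = RowGuard(free=2, viol_max=7)
print(row.accept("top", 5), row.guard)     # temporary violation from above
print(row.accept("top", 1))                # refused until the guard clears
print(row.accept("bottom", 2), row.free)   # the other direction is tracked separately
row.release(5)
print(row.free, row.guard)
```

Keeping one guard per direction lets a row be overfilled briefly from above and from below without either flow blocking the other.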
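[Figure: dynamic WS monitoring example. Rows of width W = 3, 7, 9 and 4 exchange cells via min-cost flow; the guards start at Gb = 0 and Gt = 0; a full row and a violated row are marked, with net violation values annotated.]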
Global Network Flow
• The global flow network gives a global view of how flows will generally go.
• With the global flow, we can eliminate detailed-flow arcs that are unlikely to carry flow.
• This greatly reduces the number of cycles in the detailed nw-flow, reducing runtime without noticeably deteriorating the improvement.
[Figure: global flow network over Row i-1, Row i and Row i+1, with new cells A1 and A2 and the Sink. Arcs are labeled (capacity, cost), for example (w(A2), 0), (viol_i, 0) at the violated row, (w(W_{i+1}), C_{i+1}) along a row, and a vertical arc of cost C_{i+1,i}.]
• C_{i+1} is the probabilistic average of all left-to-right detailed horizontal arc costs in the row.
• C_{i+1,i} is the weighted average of the detailed vertical arc costs between the two rows.
• 65% runtime reduction at the cost of 1-2% timing deterioration.
TIF's High-Level Flow
1. Global nw flow
2. Detailed nw flow (on the induced network)
3. Physical flow interpretation
4. All new cells placed and all violations fixed? If yes, end; if no, iterate again.
Benchmarks
• There are three sets of benchmarks: Ibm, Faraday and TD-Dragon.
• The Ibm and Faraday suites are not originally timing benchmarks, so we generate synthetic timing characteristics for them.
• The Ibm circuits don't identify FFs. We determine FFs in cycles and break all cycles with a minimum number of FFs. The average percentage of FFs is 13%.
• Neither suite has information on the resistance and capacitance of cells and interconnects. We choose typical values for 0.18 micron technology for these parameters.
Benchmark Characteristics

                       Ibm             Faraday        TD-Dragon
# of cells             12506-210341    11734-32622    3093-25616
# of nets              13636-201640    11815-33186    3200-26017
Critical path length   21-220          16-25          20-60
Efficacy of TD Arc Costs
Delay improvement (%) after TAN and after detailed placement with each arc-cost type:

Circuit      After TAN   TD cost   0 cost   Unit cost
ibm03        33.1        30.3      25.9     27.3
ibm09        12.1        8.4       3.6      4.9
ibm08        30.7        28.4      20.2     24.1
ibm15        21.8        18.8      12.7     14.0
ibm18        36.9        34.1      29.0     31.1
Dsp1         25.1        21.3      16.7     17.1
Risc1        24.6        22.2      19.7     19.9
TDMatrix     7.9         4.0       0.5      2.2
TDMac32      12.1        7.7       3.6      5.9
TDMac64      13.9        10.0      6.9      7.8
Geom. avg    19.5        15.23     8.8      11.7
Arith. avg   21.8        18.5      13.9     15.4

• Global place (TAN) -> detailed place (TD cost): deterioration 4.3%
• Global place (TAN) -> detailed place (unit cost): deterioration 7.8%
• Global place (TAN) -> detailed place (0 cost): deterioration 10.7%
• 45% reduction in the deterioration of the global placement results by going from unit cost to TD cost
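The averages and deterioration figures can be cross-checked with a short script. It assumes the table entries are percentages with one decimal place (for example, 331 read as 33.1), that the rows are in the order listed, and that deterioration is the difference between geometric averages; `statistics.geometric_mean` requires Python 3.8 or later.

```python
from statistics import geometric_mean, mean

# Rows: ibm03, ibm09, ibm08, ibm15, ibm18, Dsp1, Risc1, TDMatrix, TDMac32, TDMac64
improvement = {
    "After TAN": [33.1, 12.1, 30.7, 21.8, 36.9, 25.1, 24.6, 7.9, 12.1, 13.9],
    "TD cost":   [30.3, 8.4, 28.4, 18.8, 34.1, 21.3, 22.2, 4.0, 7.7, 10.0],
    "0 cost":    [25.9, 3.6, 20.2, 12.7, 29.0, 16.7, 19.7, 0.5, 3.6, 6.9],
    "Unit cost": [27.3, 4.9, 24.1, 14.0, 31.1, 17.1, 19.9, 2.2, 5.9, 7.8],
}

arith = {name: mean(vals) for name, vals in improvement.items()}
geo = {name: geometric_mean(vals) for name, vals in improvement.items()}

# Deterioration of the TAN result caused by each detailed-placement variant.
deterioration = {
    name: geo["After TAN"] - geo[name]
    for name in improvement
    if name != "After TAN"
}

for name in improvement:
    print(f"{name:10s} arith={arith[name]:.1f} geom={geo[name]:.1f}")
print({name: round(val, 1) for name, val in deterioration.items()})
```

With this reading, the arithmetic averages come out at the values listed in the averages row, which supports the decimal placement used here.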
Final Results
[Figure: delay improvement for the ibm benchmarks; initial placement WL-driven (Dragon).]
[Figure: delay improvement for the Faraday benchmarks; initial placement WL-driven (Dragon).]
[Figure: delay improvement for the TD-Dragon benchmarks placed by Dragon (cell delay).]
[Figure: delay improvement for the TD-Dragon benchmarks placed by Dragon (no cell delay).]
[Figure: delay improvement on TD-Dragon placement for different WS constraints; circuits matrix, vp2, mac32, mac64 and avg; series 3, 5, 10 and TAN; y-axis from 0 to 12.]
• [Wonjoon & Bazargan, ICCAD'03] achieves an average of 2.8% improvement with 5% WS.
• For 5% WS, our improvement is 4.2% (50% relative improvement).
Empirical Asymptotic Time Complexity
[Figure: runtime vs. number of cells (0 to 250000). Linear fit: y = 0.0057x + 266.51. A linear curve best fits the data.]
[Figure: runtime vs. number of movable cells (0 to 2500). Linear fit: y = 0.6857x + 51.48. A linear curve best fits the data.]
• Runtime is 18% of Dragon and 12% of TD-Dragon.
• Obtains a solution for a 210K-cell circuit (ibm18) with 34% improvement in 24 mins.
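As a quick sanity check, the linear fit against total cell count can be evaluated at the size of the largest ibm circuit. This assumes the fitted coefficients read 0.0057 and 266.51 and that runtime is measured in seconds; neither the decimal placement nor the unit is stated outright, so treat the result as indicative only.

```python
def predicted_runtime_s(cells, slope=0.0057, intercept=266.51):
    """Runtime predicted by the linear fit (assumed to be in seconds)."""
    return slope * cells + intercept


largest_ibm = 210341  # upper end of the ibm cell-count range
minutes = predicted_runtime_s(largest_ibm) / 60
print(f"{minutes:.1f} min")
```

Under these assumptions the prediction is a little over 24 minutes, in line with the reported runtime for the 210K-cell circuit.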
Conclusions
• Proposed a TD incremental placement flow, FlowPlace:
  • Global and detailed incremental placers
  • New, accurate pre-route net delay models
  • Can optimize both the quadratic and the linear delay terms in the global placer
  • TD nw-flow to solve detailed TD placement: sensitivity-based TD arc costs, constraint satisfaction (e.g., WS), discretization of illegal continuous solutions, global nw-flow graph
• Promising results:
  • Delay improvement of up to 34% for a 210K-cell WL-optimized layout in 24 mins
  • Delay improvement of up to 10% for a 26K-cell TD-optimized layout in just above 5 mins
  • The average delay improvement is 18.34%
  • The WL deterioration is 8% on average
  • The average run time is only 12-18% of the original placement runtime
• TD-IBM benchmarks and placed outputs are available at the FlowPlace page: wwweceuicedu~duttbenchmarks-etcFlowPlaceflowhtml
• Concepts can be extended to timing and power optimization with constraints and to physical re-synthesis.
Satisfying White Space Constraints: Dynamic WS Constraint Monitoring
We monitor the total cell width in each row after every negative-cycle-based iterative improvement of the nw-flow:
• Initial flow on a vertical arc: the total cell width is moved to the target row.
• Fully reversed flow: the total cell width is moved back to the original row.
To facilitate cell movement, we allow temporary white space violations under constraints.
[Figure: example with cells u, v and x and rows of width W = 5 and W = 7, showing per-row white space (WS) and the top and bottom violation counters (vio_top, vio_bot) at successive steps, ending with WS = 0 and vio_top = vio_bot = 0.]
Viol_max = max cell width. Violations from above and below are calculated separately.
Because separate violation limits for flow from above and from below allow the flow in step two, we can finally legalize the placement.
Our Goals & Methodology
Methodology:
1. Initial placed circuit
2. STA and determination of the critical node set (moveC)
3. TD analytical global placement (TAN) on moveC
4. TD nw-flow based detailed placement (TIF) on moveC
5. New placement with improved performance
Goals:
• Accurate pre-route delay estimation
• Targeted global and detailed TD re-placement of critical and near-critical paths
• Minimal effect on the rest of the circuit
• Fast
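WL and Pre-Route Delay Model
WL calculation: we use a star graph model to calculate WL.
$L(n_j) = \sum_{u_i \in n_j} (|x_i - x_c| + |y_i - y_c|)$
[Figure: star graph model. Cells $u_d(x_d, y_d)$, $u_p(x_p, y_p)$, $u_q(x_q, y_q)$ and $u_i(x_i, y_i)$ are each connected to the centroid $C(x_c, y_c)$.]
Pre-route delay model:
• Driver node driving load capacitance: $D_1(n_j) = R_d (c L(n_j) + (k-1) C_g)$
• Self interconnect delay: $D_2(u_i, n_j) = \frac{rc}{2} l_{d,i}^2 + r \, l_{d,i} C_g$
• Self interconnect seeing other interconnect and load capacitance: $D_3(u_i, n_j) = \frac{r \, l_{d,i}}{2} \cdot \frac{1}{2} (c L(n_j) + (k-2) C_g)$
[Figure: delay model. Driver $u_d(x_d, y_d)$ and sinks $u_i(x_i, y_i)$, $u_p(x_p, y_p)$ and $u_q(x_q, y_q)$; the interconnect of length $l_{d,i}$ is shown with its midpoint $l_{d,i}/2$, and $C_{total}$ is split into two fractions.]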
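The star wirelength and the three delay terms can be evaluated directly. The sketch below makes several assumptions for illustration: the centroid is the mean of the pin coordinates, l_di is supplied by the caller (here the Manhattan distance from driver to sink), k is the number of pins on the net, and the D3 term uses a factor of 1/2 on the other capacitance. R_d is taken as the driver resistance, r and c as unit wire resistance and capacitance, and C_g as the load capacitance of one sink; all values are made-up unit quantities.

```python
def star_wirelength(pins):
    """Sum of Manhattan distances from every pin to the centroid of the pins."""
    xc = sum(x for x, _ in pins) / len(pins)
    yc = sum(y for _, y in pins) / len(pins)
    return sum(abs(x - xc) + abs(y - yc) for x, y in pins)


def d1(R_d, c, C_g, L, k):
    """Driver resistance charging all wire and sink capacitance."""
    return R_d * (c * L + (k - 1) * C_g)


def d2(r, c, C_g, l_di):
    """Delay of the driver-to-sink wire driving its own capacitance and the sink load."""
    return (r * c / 2) * l_di ** 2 + r * l_di * C_g


def d3(r, c, C_g, l_di, L, k):
    """Delay of the driver-to-sink wire due to the other wires and loads (assumed 1/2 share)."""
    return (r * l_di / 2) * 0.5 * (c * L + (k - 2) * C_g)


pins = [(0, 0), (4, 0), (0, 4), (4, 4)]   # first pin is the driver
L = star_wirelength(pins)
l_di = abs(4 - 0) + abs(0 - 0)            # driver to the sink at (4, 0)
print(L, d1(1, 1, 1, L, k=4), d2(1, 1, 1, l_di), d3(1, 1, 1, l_di, L, k=4))
```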
Fidelity of our model: a future model that represents nets with multiple star structures is still under development.
Best results for
Circuit             Mac64   Matrix   Vp2   Mac32   Error
Routed delay        34      38       43    67      0%
Our current model   40      51       62    82      29.5%
Multi-star model    35      49       52    73      15.5%
TD Analytical Global Placement (TAN)
• A TD extension of a combination of Gordian and Gordian-L.
• Essentially a quadratic programming approach.
• Uses an iterative approach to model the linear terms of delay in the objective function.
Critical delay cost of a net: we need to focus only on the sinks on the critical paths. Formulation:
$D_c(n_j) = D_1(n_j) + \sum_{u_i \in critical(n_j)} (D_2(u_i) + D_3(u_i))$
A net with more critical paths through it is more important to optimize, since one optimization step can reduce the delay on all of those paths.
[Figure: a net with 2 critical paths through it.]
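The critical delay cost adds the net-wide driver term once and the per-sink terms only for sinks that lie on critical paths. A minimal sketch, assuming the per-sink D2 and D3 values have already been computed; the names and numbers are hypothetical.

```python
def critical_delay_cost(d1, sink_terms, critical_sinks):
    """D_c = D_1 plus (D_2 + D_3) for each critical sink of the net."""
    return d1 + sum(sum(sink_terms[u]) for u in critical_sinks)


# Hypothetical net with three sinks, two of which are on critical paths.
sink_terms = {"u_i": (12.0, 18.0), "u_p": (5.0, 7.0), "u_q": (1.0, 1.0)}
print(critical_delay_cost(19.0, sink_terms, ["u_i", "u_p"]))
print(critical_delay_cost(19.0, sink_terms, ["u_i"]))
```

A net with two critical sinks carries a larger cost than the same net with one, which is how nets shared by several critical paths receive more optimization effort.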
TD Analytical Global Placement (TAN)
• A TD extension of a combination of Gordian and Gordian-L
• Essentially a quadratic programming approach
• Uses an iterative approach to model the linear terms of delay in the objective function
Critical delay cost of a net
• Need to focus only on the sinks on the critical paths
• A net with more critical paths through it is more important to optimize: one optimization step can minimize delay on all of those paths
Formulation:
$$D_c(n_j) = D_1(n_j) + \sum_{u_i \in critical(n_j)} \bigl( D_2(u_i) + D_3(u_i) \bigr)$$
[Figure: a net with 2 critical paths through it]
TD Analytical Global Placement (contd)
Allocated slack of a net
• A weight measure for determining the TD WL reduction of a net
• Two factors need to be considered: the minimum path slack through the net, and the number of nets in that path
• Therefore we allocate path slack uniformly to each net; the allocated slack of a net is
$$S_a(n_j) = \frac{S(P_{max}(n_j))}{\#\text{ of nets in path } P_{max}(n_j)}$$
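[Figure: two paths, one with 3 nets and one with 2 nets, annotated with net delays]
• Before optimization (net delays 4, 4, 4 and 6, 6): the two paths have the same delay.
• Net slack = path slack (net delays 3, 3, 3 and 3, 3): after optimization one path is longer than the other.
• Observation: nets with the same weight in TAN tend to have the same length after optimization.
• With allocated slack (net delays 2, 2, 2 and 3, 3): after optimization both paths have approximately the same delay.
Thus we can get equi-delay paths.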
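The uniform slack allocation can be illustrated with a minimal Python sketch. It assumes that P_max(n_j) is the minimum-slack path through the net; the function name, the path representation and the sample slack values are hypothetical and are not taken from the slides.

```python
def allocated_slack(paths):
    """paths: list of (path_slack, [net names on that path]).

    Returns {net: allocated slack}. For each net, the path with the
    smallest slack through it is used (assumed meaning of P_max), and
    that path's slack is divided evenly among the nets on the path.
    """
    worst = {}  # net -> (slack of its most critical path, # of nets on that path)
    for slack, nets in paths:
        for net in nets:
            if net not in worst or slack < worst[net][0]:
                worst[net] = (slack, len(nets))
    return {net: slack / count for net, (slack, count) in worst.items()}


# Two paths with equal slack but different numbers of nets (made-up values).
paths = [(6.0, ["n1", "n2", "n3"]), (6.0, ["n4", "n5"])]
alloc = allocated_slack(paths)
```

Nets on the longer path receive less slack each, so they are weighted more heavily in the objective.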
TD Analytical Global Placement (contd)
The delay cost of a net:
$$C_D(n_j) = \frac{D_c(n_j)}{S_a(n_j)} = \frac{D_{c,quad}(n_j) + D_{c,lin}(n_j)}{S_a(n_j)}$$
The objective function:
$$F = \sum_{n_j \in moveN} \frac{D_{c,quad}(n_j) + D_{c,lin}(n_j)}{S_a(n_j)}$$
• Final objective function: solve min-max via min-sum.
• The delay cost is divided into a quadratic part and a linear part.
• Quadratic terms: can be solved by a standard quadratic programming technique.
• Linear terms: each linear term is approximated by a quadratic term as follows. The coordinates in the denominator are the current values (written $\hat{x}_i$, $\hat{x}_c$). We run several iterations until the results converge. The linear terms in y are handled the same way.
$$(x_i - x_c) \approx \frac{(x_i - x_c)^2}{(\hat{x}_i - \hat{x}_c)}$$
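The iterative treatment of the linear terms can be sketched in one dimension. This is only an illustration under stated assumptions: a single movable coordinate connected to fixed pins, the absolute value of the current distance in the denominator, and a small `eps` guard against division by zero. None of these names or values come from the slides.

```python
def iterate_linear_as_quadratic(pins, x0, iters=50, eps=1e-9):
    """Minimize sum_i |x - p_i| for one movable coordinate x by repeatedly
    solving the quadratic surrogate sum_i (x - p_i)^2 / |x_cur - p_i|."""
    x = x0
    for _ in range(iters):
        # Denominators use the current coordinate; eps avoids division by zero.
        weights = [1.0 / max(abs(x - p), eps) for p in pins]
        # The minimizer of a weighted sum of squares is the weighted mean.
        x = sum(w * p for w, p in zip(weights, pins)) / sum(weights)
    return x


pins = [0.0, 1.0, 10.0]  # hypothetical fixed pin coordinates
x = iterate_linear_as_quadratic(pins, x0=sum(pins) / len(pins))
```

Starting from the quadratic optimum (the mean of the pins), the iterates move toward the median pin, which minimizes the linear wirelength in this toy case.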
TD NW-Flow Based Detailed Placement (TIF)
General purpose
1. Solves the overlap problem from the global placer
2. Minimizes the deterioration of the delay improvement obtained from the global placer
3. Legalizes the placement while satisfying WS constraints
[Figure: general nw-flow graph with Source, Sink, cells A1 and A2, cells C11-C14 in Row1, C21, C22, C24 in Row2, C31-C33 in Row3, and white spaces W1, W2, W3; the flow that legalizes A1's position; and the cell placement after cells are moved in the flow direction]
• Arc cost = TD cost (linear & step functions)
• Arc capacity
  • horizontal: how much a cell can move (accuracy issues)
  • vertical: width of the head cell
  • S → moved cell: width(cell)
  • row → T: WS of row
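To make the arc model concrete, here is a small, generic min-cost-flow sketch in Python (successive shortest paths with Bellman-Ford). It is not the TIF solver: the tiny network, its node numbering and its arc costs are invented for illustration, with capacities chosen in the spirit of the list above (cell width on the source arc, row white space on the arcs into the sink).

```python
def min_cost_flow(n, arcs, s, t, demand):
    """Successive shortest paths. arcs: list of (u, v, capacity, unit_cost).
    Returns (flow_sent, total_cost, flow_per_arc)."""
    graph = [[] for _ in range(n)]  # node -> indices into edges
    edges = []                      # [to, residual_cap, cost]; edge i^1 reverses edge i
    for u, v, cap, cost in arcs:
        graph[u].append(len(edges))
        edges.append([v, cap, cost])
        graph[v].append(len(edges))
        edges.append([u, 0, -cost])
    sent = total = 0
    while sent < demand:
        dist = [float("inf")] * n
        prev = [-1] * n             # edge used to reach each node
        dist[s] = 0
        for _ in range(n - 1):      # Bellman-Ford tolerates negative residual costs
            changed = False
            for u in range(n):
                if dist[u] == float("inf"):
                    continue
                for ei in graph[u]:
                    v, cap, cost = edges[ei]
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = ei
                        changed = True
            if not changed:
                break
        if dist[t] == float("inf"):
            break                   # no augmenting path left
        push = demand - sent
        v = t
        while v != s:               # bottleneck capacity along the path
            ei = prev[v]
            push = min(push, edges[ei][1])
            v = edges[ei ^ 1][0]
        v = t
        while v != s:               # augment along the path
            ei = prev[v]
            edges[ei][1] -= push
            edges[ei ^ 1][1] += push
            v = edges[ei ^ 1][0]
        sent += push
        total += push * dist[t]
    flows = [edges[2 * i + 1][1] for i in range(len(arcs))]
    return sent, total, flows


# Hypothetical network: 0 = S, 1 = overlapping cell A1 (width 3),
# 2 = left neighbour, 3 = right neighbour, 5 = T (node 4 unused).
arcs = [
    (0, 1, 3, 0),  # S -> A1, capacity = width of the moved cell
    (1, 2, 3, 2),  # A1 -> left neighbour, made-up TD cost 2 per unit
    (1, 3, 3, 1),  # A1 -> right neighbour, made-up TD cost 1 per unit
    (2, 5, 2, 0),  # left side -> T, capacity = available white space
    (3, 5, 2, 0),  # right side -> T, capacity = available white space
]
sent, total, flows = min_cost_flow(6, arcs, 0, 5, 3)
```

In this toy instance the cheaper right-hand branch saturates first and the remaining unit goes left, which also shows how a continuous flow solution can split a single cell's width across two directions.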
Arc Cost in TIF
Sensitivity-based cost
• We define the delay of a net to be the delay from its driver cell to its most critical sink cell.
• Consider the change in net delay when a cell is moved.
Arc cost formulation
• For a cell, we find the most critical nets connected to it (those belonging to the path with the smallest slack); the unit flow cost of the arcs from the cell is cost(e), defined after the delay model.
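Delay model
[Figure: net with driver u_d and sinks u_i, u_p, u_q; moved position u'_i; wire lengths l_{d,i} and l'_{d,i}; two fractions of C'_total]
$$\Delta D_1(n_j) = R_d \, c \, \Delta l_{d,i}$$
If u_i is the critical sink or driver:
$$\Delta D_2(n_j) = r c \, l_{d,i} \, \Delta l_{d,i} + r \, \Delta l_{d,i} \, C_g$$
Otherwise:
$$\Delta D_2(n_j) = \Delta D_3^a(n_j) = 0$$
[Expressions for \Delta D_3^a(n_j) and \Delta D_3^b(n_j), in terms of r, c, l_{d,i}, L(n_j), k and C_g: not recoverable from the extracted slide text]
$$cost(e) = \sum_{n_j \in critical\_nets} \frac{1}{S(n_j)} \cdot \frac{\Delta D(n_j)}{cap(e)}$$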
From experiments gives best results
Tackling Illegalities in TIF
The incremental detailed placement problem is a discrete optimization problem (DOP), so solving it with a continuous optimization method introduces certain illegalities. There are two major problems.
I. Discrete flow requirement in vertical arcs
• A vertical arc represents vertical cell movement by a discrete amount (the distance to the nearest row), so the flow on it should be either full capacity (the cell width) or 0.
• The nw-flow solution may not meet this requirement.
Resulting placement problems:
• The full cost of the movement is not incurred in the nw-flow.
• The cell moved up has a larger area than the nw-flow modeled.
[Figure: initial and resulting placements of cells u, v, w, x under a non-discrete flow (f1=2, f2=3) on arcs labeled (7, c3), (7, c2), (5, c1); the resulting placement contains an overlap]
Tackling Illegalities in TIF (contd)
Our flow-discretizing solution for a vertical arc is a 3-step process:
• Step 1: Initially the vertical arc has cap = 1 and cost = full cost, so the full cost is incurred.
• Step 2: After the first unit of flow is passed, cap = original cap - 1 and cost = 0. This encourages flow to keep going through the arc.
• Step 3: After all flow is passed, the cost and capacity of the adjacent horizontal arcs are updated to 0.
[Figure: the three steps on cells u, v, w, x. Step 1: vertical arc labeled (1, full-cost), f1=1. Step 2: vertical arc labeled (4, 0), f2=4. Step 3: horizontal arc costs updated, arcs labeled (inf, 0). Final placement: disp(u)=5, disp(w)=5, disp(v)=5]
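The first two steps amount to a step cost function on the vertical arc. The sketch below is a hedged illustration of that idea only; the function names and the `full_cost` value are hypothetical, and the update of adjacent horizontal arcs in Step 3 is not modeled.

```python
def vertical_arc(width, full_cost, flow_passed):
    """Return (residual capacity, unit cost) of a vertical arc after
    `flow_passed` units of flow have already gone through it."""
    if flow_passed == 0:
        return 1, full_cost            # Step 1: the first unit carries the full cost
    if flow_passed < width:
        return width - flow_passed, 0  # Step 2: the rest of the cell width is free
    return 0, 0                        # the whole cell width has moved


def cost_of_flow(width, full_cost, amount):
    """Total cost charged when `amount` units are pushed one unit at a time."""
    total, passed = 0, 0
    while passed < amount:
        cap, unit_cost = vertical_arc(width, full_cost, passed)
        if cap == 0:
            break
        total += unit_cost
        passed += 1
    return total


# Cell of width 5 with a made-up full movement cost of 10.
partial = cost_of_flow(5, 10, 1)
complete = cost_of_flow(5, 10, 5)
```

Both a partial and a complete move are charged the same full cost, so once any flow enters the arc there is no cost incentive to stop short of moving the whole cell.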
Tackling Illegalities in TIF (contd)
II. Split flows
• This occurs when there are flows on both the upward and the downward arcs.
• The two split flows go through a tree structure to the sink. There are two heuristics for choosing between them:
  1. Max flow: choose the branch tree with the larger flow.
  2. Min cost: choose the branch tree with the smaller flow cost, looking at the first k levels.
• Our experiments show that the max-flow heuristic does better.
[Figure: cell A1 with split flows f1=2 and f2=3 on arcs labeled (5, c1) and (5, c2) toward cells C21, C22 and C31, C32; the flows continue through Tree1 and Tree2 (cells C12, C23, C33, ...)]
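The two heuristics can be written down in a few lines. This is a sketch under assumptions: each branch is summarized by its flow (for max flow) or by a list of per-level flow costs (for min cost); these data structures and the sample numbers are hypothetical.

```python
def choose_branch_max_flow(branch_flows):
    """branch_flows: {branch name: flow}. Pick the branch with the larger flow."""
    return max(branch_flows, key=branch_flows.get)


def choose_branch_min_cost(level_costs, k):
    """level_costs: {branch name: [flow cost at level 1, level 2, ...]}.
    Pick the branch with the smaller flow cost over its first k levels."""
    return min(level_costs, key=lambda name: sum(level_costs[name][:k]))


flows = {"up": 2, "down": 3}                       # f1=2, f2=3 as in the figure
costs = {"up": [4, 1, 9], "down": [3, 5, 1]}       # made-up per-level costs
by_flow = choose_branch_max_flow(flows)
by_cost_k2 = choose_branch_min_cost(costs, k=2)
by_cost_k3 = choose_branch_min_cost(costs, k=3)
```

The min-cost choice depends on the look-ahead depth k, while the max-flow choice only needs the two flow values at the split.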
Satisfying White Space Constraints
Due to the discrete nature of the detailed placement problem, the white space constraint (the max row width must not exceed a pre-specified limit) can't be ensured by the nw-flow process.
Two methods are used to deal with this problem:
• Dynamic row size constraint monitoring
• Push-violation arcs in the next iteration
[Figure: a non-discrete flow (f1=2, f2=3) on arcs labeled (7, c3), (7, c2), (5, c1) leaves one row with WS=3 and another with WS=-2, a WS violation]
Satisfying WS Constraints (contd)
Dynamic WS constraint monitoring
• Monitor the total cell width in each row after every negative-cycle-based iterative improvement of the nw flow.
• Initial flow on a vertical arc: the total cell width is moved to the target row.
• Fully reversed flow: the total cell width is moved back to the original row.
• To facilitate cell movement, we allow a temporary white space violation in a row for each direction of the flow.
• Once a violation in a direction occurs, no further violations are allowed unless it goes to 0. This is monitored by the bottom and top violation guards Gb and Gt.
If a violation remains in the row:
• Push-violation arc in the next iteration
• Thrashing is prevented by disallowing reverse movement
[Figure: rows with cell widths W=3, W=7, W=9, W=4 before and after min-cost flow; guards Gb = 0 and Gt = 0; a full row; a violated row connected to S]
Global Network Flow
• The global flow network gives a global view of how flows will generally go.
• With the global flow, we can eliminate detailed-flow arcs that are unlikely to carry flow.
• This can greatly reduce the number of cycles in the detailed nw-flow, reducing runtime without noticeably deteriorating the improvement.
[Figure: global flow network over Row i-1, Row i (the violated row) and Row i+1, with cells A1 and A2 and the Sink; arcs are labeled (capacity, cost), e.g. (w(A2), 0), (viol_i, 0), (w(W_{i+1}), C_{i+1})]
• C_{i+1} is the probabilistic average of all left-to-right detailed horizontal arc costs in the row.
• C_{i+1,i} is the weighted average of the detailed vertical arc costs between two rows.
• 65% runtime reduction at the cost of 1-2% timing deterioration.
TIF's High-level Flow
[Flowchart: Global nw flow; Detailed nw flow (on induced network); Physical flow interpretation; decision "All new cells placed & all violations fixed?" with branches No and Yes (End)]
Benchmarks
• There are three sets of benchmarks: Ibm, Faraday and TD-Dragon.
• The Ibm and Faraday suites are not originally timing benchmarks; we generate synthetic timing characteristics for them.
• The Ibm circuits don't identify FFs. We determine FFs in cycles and break all cycles with a minimum number of FFs. The average percentage of FFs is 13%.
• Neither suite has information on the resistances and capacitances of cells and interconnects. We choose typical values for 0.18 micron technology for these parameters.
Benchmark Characteristics
                       Ibm            Faraday        TD-Dragon
# of cells             12506-210341   11734-32622    3093-25616
# of nets              13636-201640   11815-33186    3200-26017
critical path length   21-220         16-25          20-60
Efficacy of TD Arc Costs
Delay improvement (%) after TAN and after detailed placement with each arc cost:
Benchmark    After TAN   TD cost   0 cost   unit cost
ibm03        33.1        30.3      25.9     27.3
ibm09        12.1        8.4       3.6      4.9
ibm08        30.7        28.4      20.2     24.1
ibm15        21.8        18.8      12.7     14.0
ibm18        36.9        34.1      29.0     31.1
Dsp1         25.1        21.3      16.7     17.1
Risc1        24.6        22.2      19.7     19.9
TDMatrix     7.9         4.0       0.5      2.2
TDMac32      12.1        7.7       3.6      5.9
TDMac64      13.9        10.0      6.9      7.8
Geom. avg    19.5        15.23     8.8      11.7
Arith. avg   21.8        18.5      13.9     15.4
• Deterioration of the global placement (TAN) result after detailed placement: 4.3% with TD cost, 7.8% with unit cost, 10.7% with 0 cost.
• Going from unit cost to TD cost reduces the deterioration of the global placement results by 45%.
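The averages and deterioration figures follow directly from the per-benchmark values, assuming the table entries are delay improvements in percent with one decimal place. The short Python check below recomputes them; the variable names are illustrative only.

```python
import math

# Delay improvement (%) per benchmark: (after TAN, TD cost, 0 cost, unit cost)
rows = {
    "ibm03": (33.1, 30.3, 25.9, 27.3),
    "ibm09": (12.1, 8.4, 3.6, 4.9),
    "ibm08": (30.7, 28.4, 20.2, 24.1),
    "ibm15": (21.8, 18.8, 12.7, 14.0),
    "ibm18": (36.9, 34.1, 29.0, 31.1),
    "Dsp1": (25.1, 21.3, 16.7, 17.1),
    "Risc1": (24.6, 22.2, 19.7, 19.9),
    "TDMatrix": (7.9, 4.0, 0.5, 2.2),
    "TDMac32": (12.1, 7.7, 3.6, 5.9),
    "TDMac64": (13.9, 10.0, 6.9, 7.8),
}


def arith_mean(values):
    return sum(values) / len(values)


def geom_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


columns = list(zip(*rows.values()))  # one tuple per table column
arith = [arith_mean(c) for c in columns]
geom = [geom_mean(c) for c in columns]

# Deterioration relative to the TAN result, using the geometric averages.
det_td = geom[0] - geom[1]
det_zero = geom[0] - geom[2]
det_unit = geom[0] - geom[3]
reduction = (det_unit - det_td) / det_unit  # unit cost -> TD cost
```

The geometric averages reproduce the 4.3, 7.8 and 10.7 point deteriorations, and the relative reduction from unit cost to TD cost comes out at roughly 45%.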
Final Results
[Charts: delay improvement for ibm benchmarks, initial placement WL-driven (Dragon); delay improvement for Faraday benchmarks, initial placement WL-driven (Dragon); delay improvement for TD-Dragon benchmarks placed by Dragon, with cell delay and with no cell delay. Data labels as extracted: 197, 242, 206, 243, 45, 37]
[Chart: delay improvement on TD-Dragon placement for different WS constraints; benchmarks matrix, vp2, mac32, mac64 and avg; series for 3%, 5% and 10% WS and for TAN; vertical axis from 0 to 12]
• [Wonjoon & Bazargan, ICCAD'03] achieves an average improvement of 2.8% with 5% WS.
• For 5% WS, our improvement is 4.2% (a 50% relative improvement).
Final Results (contd)
[Charts; data labels as extracted: 82120, 196241, 62, 102, 3845, 40]
Empirical Asymptotic Time Complexity
[Plot: runtime vs. number of cells (0 to 250000); a linear curve best fits the data: y = 0.0057x + 266.51]
[Plot: runtime vs. number of movable cells (0 to 2500); a linear curve best fits the data: y = 0.6857x + 51.48]
• Runtime is 18% of Dragon's and 12% of TD-Dragon's.
• Obtains a solution for the 210K-cell circuit ibm18 with 34% improvement in 24 mins.
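As a quick consistency check, the fitted line for the first plot can be evaluated at the size of ibm18. The sketch assumes the runtime axis is in seconds, which the slides do not state; the cell count 210341 is the upper end of the Ibm range in the benchmark table.

```python
def fitted_runtime(cells, slope=0.0057, intercept=266.51):
    """Runtime predicted by the linear fit y = 0.0057x + 266.51."""
    return slope * cells + intercept


seconds = fitted_runtime(210341)  # largest Ibm circuit
minutes = seconds / 60            # assumes the fit is in seconds
```

Under that assumption the fit gives a little over 24 minutes, in line with the 24 mins reported for ibm18.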
Conclusions
• Proposed a TD incremental placement flow: FlowPlace
  • Global and detailed incremental placers
  • New, accurate pre-route net delay models
  • Can optimize both the quadratic and the linear delay terms in the global placer
  • TD nw flow to solve detailed TD placement: sensitivity-based TD arc costs, constraint satisfaction (e.g., WS), discretization of illegal continuous solutions, global nw flow graph
• Promising results
  • Delay improvement of up to 34% for a 210K-cell WL-optimized layout in 24 mins
  • Delay improvement of up to 10% for a 26K-cell TD-optimized layout in just above 5 mins
  • The average delay improvement is 18.34%
  • The average WL deterioration is 8%
  • The average runtime is only 12-18% of the original placement runtime
• TD-IBM benchmarks and placed outputs are available at the FlowPlace page: wwweceuicedu~duttbenchmarks-etcFlowPlaceflowhtml
• Concepts can be extended to timing and power optimization with constraints, and to physical re-synthesis
Satisfying White Space Constraints: Dynamic WS Constraint Monitoring
• We monitor the total cell width in each row after every negative-cycle-based iterative improvement of the nw flow.
• Initial flow on a vertical arc: the total cell width is moved to the target row.
• Fully reversed flow: the total cell width is moved back to the original row.
• To facilitate cell movement, we allow temporary white space violations under constraints.
• Viol_max = max cell width. Violations from above and below are calculated separately.
• Because of the flow allowed in step two, thanks to the separate violation limits for flow from above and from below, we can finally legalize the placement.
[Figure: three steps moving cells u, v, x between rows toward the Sink, annotated with cell widths (W=5, W=7), row white space (WS) and the per-row counters vio_top and vio_bot; in the final step WS=0 and vio_top = vio_bot = 0]
TD NW-Flow Based Detailed Placement (TIF)
General Purpose
1. Solves the overlap problem from the global placer
2. Minimizes the deterioration of the delay improvement obtained from the global placer
3. Legalizes the placement while satisfying WS constraints
General nw-flow graph
[Figure: general nw-flow graph. Node labels: Source; A1, A2; C11, C12, C13, C14; C21, C22, W21, C24; C31, C32, C33; W1, W2, W3; Sink. Rows: Row1, Row2, Row3. Panels: "Flow to legalize A1 position" and "Cell placement after cells are moved in the flow direction".]
• Arc cost = TD cost (linear & step functions)
• Arc capacity
  • hor: how much a cell can move (accuracy issues)
  • vert: width of head cell
  • S to moved cell: width(cell)
  • row to T: WS of row
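The graph described above can be illustrated with a small min-cost flow sketch. This is only a stand-in: the node names, capacities and costs below are made up, and the solver is a textbook successive-shortest-path routine rather than the authors' implementation (the slides mention negative-cycle-based iterative improvement but do not give the solver). Capacities follow the bullet list: the arc from S carries the width of the moved cell, and the arcs into T carry the white space of each row.

```python
from collections import deque


class MinCostFlow:
    """Textbook successive-shortest-path min-cost flow (SPFA for shortest paths)."""

    def __init__(self):
        self.graph = {}  # node -> list of arc records

    def add_arc(self, u, v, cap, cost):
        fwd = {"to": v, "cap": cap, "cost": cost, "flow": 0}
        rev = {"to": u, "cap": 0, "cost": -cost, "flow": 0}
        fwd["rev"], rev["rev"] = rev, fwd
        self.graph.setdefault(u, []).append(fwd)
        self.graph.setdefault(v, []).append(rev)

    def solve(self, s, t):
        total_flow = total_cost = 0
        while True:
            dist = {u: float("inf") for u in self.graph}
            dist[s] = 0
            prev = {}
            queue, queued = deque([s]), {s}
            while queue:
                u = queue.popleft()
                queued.discard(u)
                for e in self.graph[u]:
                    residual = e["cap"] - e["flow"]
                    if residual > 0 and dist[u] + e["cost"] < dist[e["to"]]:
                        dist[e["to"]] = dist[u] + e["cost"]
                        prev[e["to"]] = e
                        if e["to"] not in queued:
                            queue.append(e["to"])
                            queued.add(e["to"])
            if t not in prev:  # no augmenting path left
                return total_flow, total_cost
            # Find the bottleneck along the path, then push that much flow.
            push, v = float("inf"), t
            while v != s:
                e = prev[v]
                push = min(push, e["cap"] - e["flow"])
                v = e["rev"]["to"]
            v = t
            while v != s:
                e = prev[v]
                e["flow"] += push
                e["rev"]["flow"] -= push
                total_cost += push * e["cost"]
                v = e["rev"]["to"]
            total_flow += push


# Hypothetical instance: cell A1 (width 5) overlaps in Row2.
net = MinCostFlow()
net.add_arc("S", "A1", 5, 0)    # S to moved cell: width(cell)
net.add_arc("A1", "C22", 5, 2)  # horizontal push inside Row2
net.add_arc("C22", "W2", 5, 1)
net.add_arc("A1", "C12", 5, 4)  # vertical move into Row1
net.add_arc("C12", "W1", 5, 1)
net.add_arc("W2", "T", 3, 0)    # row to T: WS of Row2
net.add_arc("W1", "T", 4, 0)    # row to T: WS of Row1
flow, cost = net.solve("S", "T")
```

In this toy instance the cheapest solution sends 3 units through Row2 and 2 units through Row1, so the flow leaving A1 is split between a horizontal and a vertical arc.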
Arc Cost in TIF
Sensitivity-based cost
• We define the delay of a net to be the delay from its driver cell to its most critical sink cell
• Consider the net delay change when a cell is moved
Arc cost formulation
• For a cell, we find the most critical nets (those belonging to the paths with the smallest slack) connected to it; the unit flow cost of the arcs from the cell is given by cost(e), defined after the delay model
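Delay model
[Figure: net n_j with driver u_d, sink u_i and other cells u_p and u_q; wire lengths l_{d,i} and l'_{d,i}; moved sink position u'_i; the load is split into a fraction of C'_total and the remaining (1 - fraction) of C'_total.]
$\Delta D_1(n_j) = R_d \, c \, \Delta l_{d,i}$
$\Delta D_{3b}(n_j) = r \, \Delta l_{d,i} \, (\alpha/2)(1 - \alpha/2)(c \, l_{d,i})$
If u_i is the critical sink or driver:
$\Delta D_2(n_j) = r c \, l_{d,i} \, \Delta l_{d,i} + r \, \Delta l_{d,i} \, C_g$
$\Delta D_{3a}(n_j) = r \, \Delta l_{d,i} \, (\alpha/2)(1 - \alpha/2)(c \, L(n_j) + (k - 2) C_g)$
Otherwise:
$\Delta D_2(n_j) = \Delta D_{3a}(n_j) = 0$
Arc cost:
$cost(e) = \sum_{n_j \in critical\_nets} \dfrac{\Delta D(n_j)}{S(n_j) \, cap(e)}$
(Reconstructed from a garbled extraction: the $\Delta$ prefixes, the symbol $\alpha$ and the arithmetic operators are inferred, and a parameter on $S(n_j)$ may have been lost.)
From experiments [parameter value lost] gives best results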
Tackling Illegalities in TIF
The incremental detailed placement problem is a DOP (discrete optimization problem). Solving it with a continuous optimization method therefore introduces certain illegalities. There are two major problems.
I. Discrete flow requirement in vertical arcs
• A vertical arc represents vertical cell movement by a discrete amount (the distance to the nearest row), so the flow on it should be either the full capacity (the cell width) or 0
• The nw-flow solution may not meet this requirement
• Resulting placement problems:
  • The full cost of movement is not incurred in the nw-flow
  • The cell moved up has a larger area than the nw-flow modeled
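[Figure: "Initial placement" and "Resulting placement" under a non-discrete flow. Cells w, u and x with cell v in the adjacent row; arcs labeled (capacity, cost) = (7, c3), (7, c2) and (5, c1); width labels w(v) = 7 and w(v) = 5; flows f1 = 2 and f2 = 3; displacement labels disp(w) = 5, disp(u) = 5, disp(v) = 2, disp(w) = 3; the resulting placement shows an overlap.]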
Tackling Illegalities in TIF (contd)
Our flow-discretizing solution for vertical arcs: the 3-step process
• Step 1: Initially, the vertical arc has cap = 1 and cost = full cost
• Step 2: After the first unit of flow is passed, cap = original cap - 1 and cost = 0
• Step 3: After all flow is passed, the cost and capacity of the adjacent horizontal arcs are updated (labeled (inf, 0) in the figure, i.e. unbounded capacity and zero cost)
[Figure: the three steps on cells w, u, x and v, with arcs labeled (capacity, cost); horizontal arcs (7, c3) and (7, c2); width labels w(v) = 7 and w(v) = 5. Step 1: vertical arc (1, full-cost), f1 = 1; "Full cost is incurred". Step 2: vertical arc (4, 0), f2 = 4; "Encourage flow to keep going through arc". Step 3: horizontal arcs (inf, 0); "Horiz arc cost updated". Final placement: disp(u) = 5, disp(w) = 5, disp(v) = 5.]
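The staged capacity and cost of the vertical arc can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, flows are assumed to be integers, and "full cost" is assumed to mean the unit cost times the cell width.

```python
def vertical_arc_stage(cell_width, unit_cost, flow_passed):
    """Return (capacity, cost) of a vertical arc given the flow already passed.

    Step 1: before any flow, the arc admits 1 unit at the full cost.
    Step 2: afterwards, the remaining width can pass at zero cost.
    """
    if not 0 <= flow_passed <= cell_width:
        raise ValueError("flow_passed must lie between 0 and cell_width")
    if flow_passed == 0:
        full_cost = unit_cost * cell_width  # assumption: cost of moving the whole cell
        return (1, full_cost)
    return (cell_width - flow_passed, 0)


def is_discrete(flow, cell_width):
    """A vertical flow is legal only if it is 0 or the full cell width."""
    return flow in (0, cell_width)


stage1 = vertical_arc_stage(5, 3, 0)  # before any flow
stage2 = vertical_arc_stage(5, 3, 1)  # after the first unit
```

With a cell of width 5, the second stage is (4, 0), which matches the (4, 0) arc label and f2 = 4 in the figure.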
Tackling Illegalities in TIF (contd)
II. Split flows
• This occurs when there are flows on both the upward and the downward arcs
• The two split flows each go through a tree structure to the sink. Two heuristics choose between them:
  1. Max flow: we choose the branch tree with the larger flow
  2. Min cost: we choose the branch tree with the smaller flow cost, looking at the first k levels
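• Our experiments show that the max-flow heuristic does better
[Figure: cell A1 between rows (C21, C22) and (C31, C32), with arcs (5, c1), f1 = 2 and (5, c2), f2 = 3; the flows f1 and f2 continue through Tree1 and Tree2 (nodes C12, C23, C33) towards the sink.]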
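The two heuristics can be sketched in a few lines. The data layout is an assumption made for illustration: each branch is identified by a name, its flow is a single number, and its cost is a list of per-level flow costs along the branch tree.

```python
def pick_branch_max_flow(branch_flows):
    """Max-flow heuristic: keep the branch tree carrying the larger flow."""
    return max(branch_flows, key=branch_flows.get)


def pick_branch_min_cost(branch_level_costs, k):
    """Min-cost heuristic: keep the branch whose first k levels are cheapest."""
    return min(branch_level_costs, key=lambda b: sum(branch_level_costs[b][:k]))


# Flows as in the figure (f1 = 2, f2 = 3); the level costs are made up.
flows = {"Tree1": 2, "Tree2": 3}
level_costs = {"Tree1": [1.0, 2.0, 4.0], "Tree2": [2.0, 2.5, 0.5]}
by_flow = pick_branch_max_flow(flows)
by_cost = pick_branch_min_cost(level_costs, k=2)
```

In this made-up case the two heuristics disagree: Tree2 carries more flow, while Tree1 is cheaper over its first two levels.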
Satisfying White Space Constraints
• Due to the discrete nature of the detailed placement problem, the white space constraint (the row width must not exceed a pre-specified limit) can't be ensured by the nw-flow process
• Two methods are used to deal with this problem:
  • Dynamic row size constraint monitoring
  • Push-violation arcs in the next iteration
[Figure: non-discrete flow example on cells w, u, x and v, with arcs (7, c3), (7, c2) and (5, c1), flows f1 = 2 and f2 = 3, disp(w) = 5 and disp(u) = 5; the resulting rows have WS = 3 and WS = -2 (WS violation).]
Satisfying WS Constraints (contd)
Dynamic WS constraint monitoring
• Monitor the total cell width in each row after every negative-cycle-based iterative improvement of the nw-flow
  • Initial flow on a vertical arc: the total cell width is moved to the target row
  • Fully reversed flow: the total cell width is moved back to the original row
• To facilitate cell movement, we allow a temporary white space violation in a row for each direction of the flow
• Once a violation in a direction occurs, no further violations are allowed unless it goes to 0. This is monitored by the top and bottom violation guards Gt and Gb
• If a violation remains in the row, a push-violation arc is used in the next iteration. Thrashing is prevented by disallowing reverse movement
[Figure: labels as extracted: W = 3, W = 7, W = 9, W = 4; "Min-cost flow" (twice); Gb = 0, Gt = 0; "Full row"; 4, -5; "Net viol = 0, 4, -1"; "Violated row"; S; "Otherwise".]
Global Network Flow
• The global flow network gives a global view of roughly where flows will go
• With the global flow, we can eliminate detailed-flow arcs that are unlikely to carry flow
• This greatly reduces the number of cycles in the detailed nw-flow, and thus the runtime, without an obvious deterioration of the improvement
[Figure: global flow network over Row i-1, Row i (violated row) and Row i+1, with cells A1 and A2 and a Sink. Arcs labeled (capacity, cost): (w(A2), 0), (viol_i, 0), (w(W_{i+1}), C_{i+1}) and (w(R_i), C_i^{i+1}).]
• C_{i+1} is the probabilistic average of all left-to-right detailed horizontal arc costs in the row
• C_i^{i+1} is the weighted average of the detailed vertical arc costs between the two rows
• 65% runtime reduction at the cost of 1-2% timing deterioration
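One way to picture how a global flow can thin out the detailed network is sketched below. It is an assumption-laden illustration: arcs are plain tuples, the averaging weights are left to the caller because the slides do not specify them, and an arc is kept only if its row pair carries some global flow.

```python
def global_arc_cost(detailed_costs, weights=None):
    """Weighted average of detailed arc costs (uniform weights if none given)."""
    if weights is None:
        weights = [1.0] * len(detailed_costs)
    return sum(c * w for c, w in zip(detailed_costs, weights)) / sum(weights)


def induced_arcs(detailed_arcs, global_flow):
    """Keep detailed arcs (from_row, to_row, tail, head) whose row pair has global flow."""
    return [arc for arc in detailed_arcs
            if global_flow.get((arc[0], arc[1]), 0) > 0]


# Hypothetical detailed arcs leaving cell A1 in row 2.
arcs = [(2, 1, "A1", "C12"), (2, 3, "A1", "C32"), (2, 2, "A1", "C22")]
# Hypothetical global solution: flow goes up to row 1 and sideways in row 2 only.
global_flow = {(2, 1): 2, (2, 2): 3}
kept = induced_arcs(arcs, global_flow)
```

Here the arc into row 3 is dropped before the detailed nw-flow is solved, which is the kind of pruning that shortens the detailed run.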
TIF's High-level Flow
1. Global nw flow
2. Detailed nw flow (on induced network)
3. Physical flow interpretation
4. All new cells placed & all viol fixed? Yes: End. No: iterate again
Benchmarks
• There are three sets of benchmarks: Ibm, Faraday and TD-Dragon
• The Ibm and Faraday suites were not originally timing benchmarks; we generate synthetic timing characteristics for them
• The Ibm circuits don't identify FFs. We determine FFs in cycles and break all cycles with a minimum number of FFs. The average percentage of FFs is 13%
• Neither suite has resistance and capacitance information for cells and interconnects. We use typical values of a 0.18 micron technology for these parameters
Benchmark Characteristics
                       Ibm             Faraday        TD-Dragon
# of cells             12506-210341    11734-32622    3093-25616
# of nets              13636-201640    11815-33186    3200-26017
critical path length   21-220          16-25          20-60
Efficacy of TD Arc Costs
Delay improvement (%) after TAN and after detailed placement with each arc cost
Benchmark    After TAN   TD cost   0 cost   unit cost
ibm03        33.1        30.3      25.9     27.3
ibm09        12.1        8.4       3.6      4.9
ibm08        30.7        28.4      20.2     24.1
ibm15        21.8        18.8      12.7     14.0
ibm18        36.9        34.1      29.0     31.1
Dsp1         25.1        21.3      16.7     17.1
Risc1        24.6        22.2      19.7     19.9
TDMatrix     7.9         4.0       0.5      2.2
TDMac32      12.1        7.7       3.6      5.9
TDmac64      13.9        10.0      6.9      7.8
Geom. avg    19.5        15.23     8.8      11.7
Arith. avg   21.8        18.5      13.9     15.4
• Global place (TAN) to detailed place (TD cost): deterioration 4.3%
• Global place (TAN) to detailed place (unit cost): deterioration 7.8%
• Global place (TAN) to detailed place (0 cost): deterioration 10.7%
• 45% reduction in the deterioration of global placement results by going from unit cost to TD cost
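The averages and deterioration figures can be re-derived from the per-benchmark rows. The sketch below assumes the table values carry one decimal place (the decimal points were lost in extraction) and that the quoted deteriorations are differences of geometric means; both assumptions reproduce the quoted numbers.

```python
import math

# Delay improvement (%) per benchmark, in table order (ibm03 ... TDmac64).
improvement = {
    "After TAN": [33.1, 12.1, 30.7, 21.8, 36.9, 25.1, 24.6, 7.9, 12.1, 13.9],
    "TD cost":   [30.3, 8.4, 28.4, 18.8, 34.1, 21.3, 22.2, 4.0, 7.7, 10.0],
    "0 cost":    [25.9, 3.6, 20.2, 12.7, 29.0, 16.7, 19.7, 0.5, 3.6, 6.9],
    "unit cost": [27.3, 4.9, 24.1, 14.0, 31.1, 17.1, 19.9, 2.2, 5.9, 7.8],
}


def arith_mean(values):
    return sum(values) / len(values)


def geom_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


geo = {name: geom_mean(vals) for name, vals in improvement.items()}
arith = {name: arith_mean(vals) for name, vals in improvement.items()}

# Deterioration of the global-placement (TAN) result after detailed placement.
deterioration = {name: geo["After TAN"] - geo[name]
                 for name in ("TD cost", "unit cost", "0 cost")}

# Relative reduction in deterioration when moving from unit cost to TD cost.
reduction = 1 - deterioration["TD cost"] / deterioration["unit cost"]
```

Rounded, the deteriorations come out near 4.3, 7.8 and 10.7, and the reduction near 45%, in line with the bullets above.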
Final Results
[Charts: "Delay improvement for ibm benchmarks, initial placement WL-driven (Dragon)"; "Delay improvement for Faraday benchmarks, initial placement WL-driven (Dragon)"; "Delay improv for TD-Dragon benchmarks placed by Dragon (cell delay)"; "Delay improv for TD-Dragon benchmarks placed by Dragon (no cell delay)". Data labels as extracted: 197, 242, 206, 243, 45, 37.]
[Chart: "Delay improvement on TD-Dragon placement for different WS constraints"; y-axis 0 to 12; x-axis: matrix, vp2, mac32, mac64, avg; series labels as extracted: TAN, 3, 5, 10.]
• [Wonjoon & Bazargan, ICCAD'03] achieves an average of 2.8% improvement with 5% WS
• For 5% WS our improvement is 4.2% (50% relative improvement)
Final Results (contd)
[Charts: only the data labels survive, as extracted: 82120, 196241, 62, 102, 3845, 40.]
Empirical Asymptotic Time Complexity
[Plot: runtime (y-axis, 0 to 1600) vs. number of cells (x-axis, 0 to 250000). A linear curve best fits the data: y = 0.0057x + 266.51]
[Plot: runtime (y-axis, 0 to 1800) vs. number of movable cells (x-axis, 0 to 2500). A linear curve best fits the data: y = 0.6857x + 51.48]
• Runtime is 18% of Dragon and 12% of TD-Dragon
• Obtains a solution for a 210K-cell circuit (ibm18) with 34% improvement in 24 mins
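As a quick sanity check, the first linear fit can be evaluated at the size of the largest Ibm circuit. The sketch assumes the runtime axis is in seconds, which the slides do not state, and takes 210341 cells from the benchmark characteristics.

```python
def runtime_vs_cells(cells):
    """Linear fit from the first plot (runtime vs. total cells)."""
    return 0.0057 * cells + 266.51


def runtime_vs_movable(movable_cells):
    """Linear fit from the second plot (runtime vs. movable cells)."""
    return 0.6857 * movable_cells + 51.48


ibm18_seconds = runtime_vs_cells(210341)  # assumption: y is in seconds
ibm18_minutes = ibm18_seconds / 60
```

Under that assumption the fit gives a little over 24 minutes for the 210K-cell circuit, consistent with the 24 mins quoted above.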
Conclusions
• Proposed a TD incremental placement flow: FlowPlace
  • Global and detailed incremental placer
  • New accurate pre-route net delay models
  • Can optimize both the quadratic and the linear delay terms in the global placer
  • TD nw-flow to solve detailed TD placement: sensitivity-based TD arc costs, constraint satisfaction (e.g. WS), discretization of illegal continuous solutions, global nw-flow graph
• Promising results
  • Delay improvement of up to 34% for a 210K-cell WL-optimized layout in 24 mins
  • Delay improvement of up to 10% for a 26K-cell TD-optimized layout in just above 5 mins
  • The average delay improvement is 18.34%
  • The WL deterioration is 8% on average
  • The average runtime is only 12-18% of the original placement runtime
• TD-IBM benchmarks and placed outputs are available at the FlowPlace page: wwweceuicedu~duttbenchmarks-etcFlowPlaceflowhtml
• Concepts can be extended to timing and power optimization with constraints, and to physical re-synthesis
Satisfying White Space Constraints: Dynamic WS Constraint Monitoring
• We monitor the total cell width in each row after every negative-cycle-based iterative improvement of the nw-flow
  • Initial flow on a vertical arc: the total cell width is moved to the target row
  • Fully reversed flow: the total cell width is moved back to the original row
• To facilitate cell movement, we allow temporary white space violations under constraints
• Viol_max = max cell width. Violations from above and below are calculated separately
• Because the flow in step two is allowed by the separate violation limits for flow from above and from below, we can finally legalize the placement
[Figure: a row receiving flow from above and below, with cells u, v, x and a Sink. Labels as extracted: W = 5; vio_top = 0, WS = 2; vio_top = 3, vio_bot = 0, WS = -3; WS = -2; W = 7; W = 5; vio_top = 3, vio_bot = 2, WS = -5; WS = 5; W = 5, vio_top = 0, vio_bot = 0, WS = 0.]
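The separate top and bottom violation limits can be pictured with a small bookkeeping class. This is a simplified reading of the slide, not the authors' algorithm: the class and method names are hypothetical, each direction may open at most one violation of at most Viol_max, and both guards are cleared as soon as the row is legal again.

```python
class RowGuard:
    """Tracks white space and per-direction violation guards for one row."""

    def __init__(self, white_space, viol_max):
        self.ws = white_space                 # free width; negative means violated
        self.viol_max = viol_max              # e.g. the maximum cell width
        self.guard = {"top": 0, "bottom": 0}  # violation caused from each side

    def try_move_in(self, width, direction):
        """Accept a cell arriving from `direction` if the guards allow it."""
        overflow = width - max(self.ws, 0)
        if overflow > 0:
            if self.guard[direction] > 0 or overflow > self.viol_max:
                return False  # this direction already violated, or too large
            self.guard[direction] = overflow
        self.ws -= width
        return True

    def move_out(self, width):
        """A cell leaves the row; clear the guards once the row is legal."""
        self.ws += width
        if self.ws >= 0:
            self.guard = {"top": 0, "bottom": 0}


# Numbers loosely follow the figure labels: WS = 2, then -3, then -5, then 0.
row = RowGuard(white_space=2, viol_max=7)
first = row.try_move_in(5, "top")      # WS 2 -> -3, top guard = 3
second = row.try_move_in(2, "bottom")  # WS -3 -> -5, bottom guard = 2
third = row.try_move_in(1, "top")      # rejected: top guard is still nonzero
ws_before_fix = row.ws
row.move_out(5)                        # WS -5 -> 0, guards cleared
```

Because the bottom guard is tracked separately, the second move is still admitted after the first one has already violated the row from the top.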
Tackling Illegalities in TIF (contd)
Our flow discretizing soln for vertical arc The 3 step process Step1 Initially vertical arc cap=1 cost=full cost Step2 After the first 1 unit flow is passed cap=original cap-1 cost=0 Step3 After all flow is passed The cost and capacity of the adjacent
horizontal arc are updated to 0
Step1
Full cost is incurred
Step2Final placement
w u x
v
(7 c3) (7 c2)
(1 full-cost)f1=1
w(v)=7
w(v)=5w u x
v
(7 c3) (7 c2)
f1=1
w(v)=7
w(v)=5
(40)
f2=4
w u x
v
(inf0)
(inf0)
f1=1
w(v)=7
w(v)=5
(40)
f2=4 u v
w x
disp(u)=5
disp(w)=5
disp(v)=5
Encourage flow to keep going through arc
Step3
Horiz arc costupdated
Tackling illegalities in TIF (contd)II Split flows
This occurs when there are flows on both upward and downward arcs
Two heuristics to solve the problem The two split flow will go through the tree
structure to the sink There are two heuristic
1 Max flow We choose the branch tree with larger flow2 Min cost We choose the branch tree with smaller
flow cost looking at the first k levels
C21 C22
C31 C32
A1
(5c1)
(5c2)f1=2
f2=3
Our experiment shows Max flow heuristic does better
C21 C22
C31 C32
helliphellip
hellip
f1
f2
C12
C23
C33
Tree1
Tree2
Satisfying White Space Constraints
Due to the discrete nature of the detailed placement problem the white space constraint max row width does not exceed a pre-specified limit canrsquot be ensured by the nw-flow process
Two methods are used to deal with this problem Dynamic row size constraint monitoring Push-violation arcs in the next iteration
Non-discrete flow
w u x
v
(7 c3) (7 c2)
(5 c1)
w(v)=7
f1=2
f2=3disp(w)=5
disp(u)=5
w(v)=5
w
u vWS=3 WS=-2
WS violation
Satisfying WS Constraints (contd) Dynamic WS constraint monitoring
Monitor total cell width in each row after every ndashve cycle-based iter improv of nw flow
Initial flow on vertical arc the total cell width is moved to target row Fully reverse flow the total cell width is moved back to orig row To facilitate cell movement we allow temporary white space violati
on in a row for each direction of the flow Once a viol in a direction occurs no further are allowed unless it g
oes to 0 Monitored by top and bottom viol guards Gb and Gt
If violation remains in the row then Push violation arc in the next iteration Thrashing prevented by dis
allowing reverse movement
W=3
W=7
W=9
W=4Min-costflow
Min-costflow
Gb = 0
Gt = 0
Full row
4
-5
Net viol = 0 4 -1
Violated row
S
Otherwise
Global Network Flow Global flow network gives a global view of generally how
flows will go With the global flow we can eliminate detailed-flow arcs that are
not likely to have flow on it This can greatly reduce the cycles in the detailed nw-flow thus
reducing time without obvious improvement deterioration
Row i-1
Row i
Row i+1
A2
A1
Sink
(w(A2)0) (violi 0)
violated row
(w(Wi+1) Ci+1) (w(R
) C
i+1
i))
Ci+1 is probabilistic average of all left-to-right detailed horizontal arc costs in the row
Ci+1I is the weighted average of the detailed vertical arc costs between two rows
65 runtime reduction at the cost of 1-2 timing deterioration
Global nw flow
Detailed nw flow
Physical flow interpretation
All new cells placed amp all viol fixed
No
YesEnd
TIFrsquos High-level Flow
(on inducednetwork)
Benchmarks There are three set of benchmarks Ibm Faraday and TD-Drago
n The Ibm and Faraday are originally not timing benchmarks we
generate synthetic timing characteristics for them The Ibm circuits donrsquot identify FFs We determine FFs in cycles a
nd break all cycles with minimum of FFs The average percentage of FFs is 13
Both suites donrsquot have information of resistance and capacities of cells and interconnects We choose the typical value of 18 microns technique for these parameters
Benchmark Characteristics
Ibm Faraday TD-Dragon
of cells 12506-210341 11734-32622 3093-25616
of nets 13636-201640 11815-33186 3200-26017
critical path length 21-220 16-25 20-60
Efficacy of TD Arc Costs After TAN TD cost 0 cost unit cost
ibm03 331 303 259 273
ibm09 121 84 36 49
Ibm08 307 284 202 241
ibm15 218 188 127 140
ibm18 369 341 290 311
Dsp1 251 213 167 171
Risc1 246 222 197 199
TDMatrix 79 40 05 22
TDMac32 121 77 36 59
TDmac64 139 100 69 78
GeomArith Avg
195 218 1523 185 88 139 117 154
bull Global place (TAN) Detailed place (TD cost) deterioration 43 Detailed place (unit cost) deterioration 78 Detailed place (0 cost) deterioration 107bull 45 deterioration reduction of global place results by going from unit-cost TD-cost
Final Results
Delay improvement for ibm benchmarksmdashinitial placement WL-driven (Dragon)
Delay improvement for Faraday benchmarksmdashinitial placement WL-driven (Dragon)
197
242
206
243
45
37
Delay improv for TD-Dragon benchmarks placed by Dragon (cell delay)
Delay improv for TD-Dragon benchmarks placed by Dragon (no cell delay)
Delay improvement on TD-Dragon placement for different WS constraints
0
2
4
6
8
10
12
matri x vp2 mac32 mac64 avg TAN
3510TAN
bull [Wonjoon amp Bazargan ICCADrsquo03] achieves an avg of 28 improv with 5 WSbull For 5 WS our improvement is 42 (50 relative improvement)
Final Results (contd)
[Figure: result charts; data labels not recoverable from the extraction]
Empirical Asymptotic Time Complexity
[Plot: runtime vs. # of cells (0 to 250000); fitted line y = 0.0057x + 266.51; linear curve best fits data]
[Plot: runtime vs. # of movable cells (0 to 2500); fitted line y = 0.6857x + 51.48; linear curve best fits data]
• Runtime is 18% of Dragon and 12% of TD-Dragon
• Obtains a soln for a 210K-cell cct (ibm18) w/ 34% improv in 24 mins
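As a quick consistency check, the fitted lines can be evaluated directly. The sketch below assumes the runtime axis is in seconds (the unit is not preserved in the text) and uses the largest Ibm cell count from the benchmark table as the ibm18 size; the function names are illustrative.

```python
def runtime_from_cells(num_cells):
    """Fitted line from the first plot: y = 0.0057x + 266.51 (unit assumed: seconds)."""
    return 0.0057 * num_cells + 266.51

def runtime_from_movable_cells(num_movable):
    """Fitted line from the second plot: y = 0.6857x + 51.48 (unit assumed: seconds)."""
    return 0.6857 * num_movable + 51.48

ibm18_cells = 210341                      # upper end of the Ibm cell-count range
seconds = runtime_from_cells(ibm18_cells)
minutes = seconds / 60.0
print(f"predicted runtime for {ibm18_cells} cells: {seconds:.0f} s = {minutes:.1f} min")
print(f"predicted runtime for 2000 movable cells: {runtime_from_movable_cells(2000):.0f} s")
```

With seconds as the unit, the first fit gives roughly 24 minutes for the 210K-cell circuit, which is consistent with the runtime quoted for ibm18.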
Conclusions
• Proposed a TD incremental placement flow, FlowPlace
  • Global and detailed incremental placer
  • New accurate pre-route net delay models
  • Can optimize both the quadratic and the linear delay terms in the global placer
  • TD nw flow to solve detailed TD placement: sensitivity-based TD arc costs; constraint satisfaction (e.g., WS); discretization of illegal continuous solns; global nw flow graph
• Promising results
  • Delay improv up to 34% for a 210K-cell WL-opt layout in 24 mins
  • Delay improv up to 10% for a 26K-cell TD-opt layout in just above 5 mins
  • The average delay improvement is 18.34%
  • The WL deterioration is an average of 8%
  • The average run time is only 12-18% of the original placement runtime
• TD-IBM benchmarks and placed outputs avail at the FlowPlace page: www.ece.uic.edu/~dutt/benchmarks-etc/FlowPlace/flow.html
• Concepts can be extended to timing and power optimization with constraints, and to physical re-synthesis
Satisfying white space constraints: Dynamic WS constraint monitoring
• We monitor the total cell width in each row after every negative-cycle-based iterative improvement of the nw flow
  • Initial flow on a vertical arc: the total cell width is moved to the target row
  • Fully reversed flow: the total cell width is moved back to the orig row
• To facilitate cell movement, we allow temporary white space violation under constraints
[Figure: three-step example of cells u, v and x moving between rows toward the Sink, annotated with cell widths W, row white space WS, and the violation counters vio_top and vio_bot. Step 1: WS=2, vio_top=0. Step 2: vio_top=3, vio_bot=2, WS=-5. Step 3: vio_top=0, vio_bot=0, WS=0]
• Viol_max = max cell width
• Violations from above and below are calculated separately
• Because step two's flow is allowed under the separate violation limits for flow from above and from below, we can finally legalize the placement
Tackling illegalities in TIF (contd): II. Split flows
• This occurs when there are flows on both upward and downward arcs
• The two split flows go through the tree structure to the sink. There are two heuristics to solve the problem:
  1. Max flow: we choose the branch tree with the larger flow
  2. Min cost: we choose the branch tree with the smaller flow cost, looking at the first k levels
[Figure: cell A1 with split flows f1=2 and f2=3 on arcs labeled (5, c1) and (5, c2), leading into Tree1 and Tree2 through cells C12, C21, C22, C23, C31, C32, C33, ...]
• Our experiments show that the Max flow heuristic does better
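The two branch-selection heuristics can be sketched as follows. The `Branch` record, its per-level cost list and the cost numbers are hypothetical; only the flows f1=2 and f2=3 come from the slide's figure. The slide does not spell out what happens to the flow on the branch that is not chosen, so the sketch stops at the selection step.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Branch:
    name: str
    flow: float
    level_costs: List[float]   # hypothetical flow cost at each tree level, root first

def pick_max_flow(branches):
    """Max flow heuristic: keep the branch tree carrying the larger flow."""
    return max(branches, key=lambda b: b.flow)

def pick_min_cost(branches, k):
    """Min cost heuristic: keep the branch tree with the smaller flow cost,
    summed over the first k levels only."""
    return min(branches, key=lambda b: sum(b.level_costs[:k]))

tree1 = Branch("Tree1", flow=2, level_costs=[1.0, 2.0, 9.0])   # costs are made up
tree2 = Branch("Tree2", flow=3, level_costs=[2.0, 3.0, 1.0])   # costs are made up

print(pick_max_flow([tree1, tree2]).name)        # branch with the larger flow
print(pick_min_cost([tree1, tree2], k=2).name)   # cheaper over the first 2 levels
print(pick_min_cost([tree1, tree2], k=3).name)   # cheaper over the first 3 levels
```

With these made-up costs the two heuristics disagree for k=2 and agree for k=3, which shows how the lookahead depth k can change the Min cost choice.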
Satisfying White Space Constraints
• Due to the discrete nature of the detailed placement problem, the white space constraint (max row width does not exceed a pre-specified limit) can't be ensured by the nw-flow process
• Two methods are used to deal with this problem:
  • Dynamic row size constraint monitoring
  • Push-violation arcs in the next iteration
[Figure: non-discrete flow example with cells w, u, x and v; arcs labeled (7, c3), (7, c2) and (5, c1); flows f1=2 and f2=3; displacements disp(w)=5 and disp(u)=5; resulting row white space WS=3 and WS=-2 (WS violation)]
Satisfying WS Constraints (contd): Dynamic WS constraint monitoring
• Monitor the total cell width in each row after every negative-cycle-based iterative improvement of the nw flow
  • Initial flow on a vertical arc: the total cell width is moved to the target row
  • Fully reversed flow: the total cell width is moved back to the orig row
• To facilitate cell movement, we allow temporary white space violation in a row for each direction of the flow
  • Once a viol in a direction occurs, no further ones are allowed unless it goes to 0
  • Monitored by top and bottom viol guards Gt and Gb
• If a violation remains in the row, then push a violation arc in the next iteration. Thrashing is prevented by disallowing reverse movement
[Figure: rows receiving cells of widths W=3, W=7, W=9 and W=4 via min-cost flow; guards Gb = 0 and Gt = 0; a full row with flows 4 and -5; net viol values 0, 4, -1; a violated row connected to S]
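The guard rule above can be sketched as a small per-row bookkeeping class. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, the violation charged to a direction is taken to be the extra overflow that move causes, the per-direction limit is taken to be the maximum cell width (Viol_max on the earlier slide), and the order in which guards are released on a move-out is an arbitrary choice.

```python
class RowMonitor:
    """Tracks row width and one violation guard per flow direction (Gt, Gb)."""

    def __init__(self, capacity, used, viol_max):
        self.capacity = capacity          # max allowed total cell width in the row
        self.used = used                  # current total cell width
        self.viol_max = viol_max          # assumed limit per direction: max cell width
        self.guard = {"top": 0, "bottom": 0}

    def white_space(self):
        return self.capacity - self.used  # negative means a WS violation

    def _overflow(self, used):
        return max(0, used - self.capacity)

    def try_move_in(self, width, direction):
        """Accept a cell arriving from `direction` if the guard rule permits it."""
        added = self._overflow(self.used + width) - self._overflow(self.used)
        if added > 0:
            # Only one outstanding violation per direction, bounded by viol_max.
            if self.guard[direction] > 0 or added > self.viol_max:
                return False
            self.guard[direction] = added
        self.used += width
        return True

    def move_out(self, width):
        """Remove a cell; release guards as the overflow shrinks (top first, arbitrary)."""
        freed = self._overflow(self.used) - self._overflow(self.used - width)
        self.used -= width
        for direction in ("top", "bottom"):
            paid = min(self.guard[direction], freed)
            self.guard[direction] -= paid
            freed -= paid

row = RowMonitor(capacity=10, used=8, viol_max=7)   # white space of 2
first = row.try_move_in(5, "top")       # overflow 3 is charged to the top guard
second = row.try_move_in(2, "bottom")   # overflow 2 is charged to the bottom guard
third = row.try_move_in(1, "top")       # rejected: top guard is still non-zero
ws_at_peak = row.white_space()
row.move_out(5)                         # row becomes legal again, guards cleared
print(first, second, third, ws_at_peak, row.white_space(), row.guard)
```

The example numbers loosely echo the labels in the earlier WS figure (white space 2, then violations of 3 from the top and 2 from the bottom, then back to 0), but the row capacity and cell widths used here are invented.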
Global Network Flow
• The global flow network gives a global view of how flows will generally go
• With the global flow, we can eliminate detailed-flow arcs that are not likely to have flow on them
• This can greatly reduce the cycles in the detailed nw-flow, thus reducing run time without obvious deterioration of the improvement
[Figure: rows i-1, i and i+1 with cells A1 and A2, a violated row and the Sink; arcs labeled (w(A2), 0), (viol_i, 0), (w(W_{i+1}), C_{i+1}) and (w(R_{i+1,i}), C_{i+1,i})]
• C_{i+1} is the probabilistic average of all left-to-right detailed horizontal arc costs in the row
• C_{i+1,i} is the weighted average of the detailed vertical arc costs between two rows
• 65% runtime reduction at the cost of 1-2% timing deterioration
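One way to picture the arc elimination is shown below. The slide does not define the weights behind the weighted average or the exact rule for dropping arcs, so both are assumptions here: arc capacities serve as weights, and a detailed arc is kept only if the global arc between its rows carries positive flow. The tuple layout and function names are hypothetical.

```python
def weighted_average(costs, weights):
    """Weighted average of detailed arc costs (weights are an assumed choice)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(c * w for c, w in zip(costs, weights)) / total

def prune_detailed_arcs(detailed_arcs, global_flow):
    """Keep detailed arcs whose (src_row, dst_row) pair has positive global flow.

    detailed_arcs: list of (src_row, dst_row, cost, capacity) tuples
    global_flow:   dict mapping (src_row, dst_row) to the flow found on the global graph
    """
    return [arc for arc in detailed_arcs
            if global_flow.get((arc[0], arc[1]), 0) > 0]

# Hypothetical detailed vertical arcs between rows 1->2 and 2->3.
detailed = [
    (1, 2, 4.0, 1),
    (1, 2, 6.0, 3),
    (2, 3, 5.0, 2),
]

# Global arc cost between rows 1 and 2, aggregated from the detailed arcs.
c_12 = weighted_average([a[2] for a in detailed if a[:2] == (1, 2)],
                        [a[3] for a in detailed if a[:2] == (1, 2)])

# Suppose the global min-cost flow sends 3 units from row 1 to row 2 and none onward.
global_flow = {(1, 2): 3, (2, 3): 0}
induced = prune_detailed_arcs(detailed, global_flow)
print(c_12, induced)
```

The detailed network-flow step would then run only on the arcs that survive the pruning, which is where the reduction in cycles and run time would come from.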
Final Results
Delay improvement for ibm benchmarks (initial placement: WL-driven, Dragon)
Delay improvement for Faraday benchmarks (initial placement: WL-driven, Dragon)
[Chart data labels as extracted: 197, 242, 206, 243, 45, 37]
Delay improvement for TD-Dragon benchmarks placed by Dragon (cell delay)
Delay improvement for TD-Dragon benchmarks placed by Dragon (no cell delay)
Delay improvement on TD-Dragon placement for different WS constraints
[Bar chart: delay improvement (y-axis 0 to 12) for matrix, vp2, mac32, mac64, and avg; legend as extracted: 3, 5, 10, TAN.]
• [Wonjoon & Bazargan, ICCAD'03] achieves an average improvement of 2.8% with 5% WS
• For 5% WS, our improvement is 4.2% (a 50% relative improvement)
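The relative-improvement figure follows from simple arithmetic, sketched below. It reads the two averages as 2.8 and 4.2 percent, which is an assumption about decimal points lost in extraction; the ratio is the same if they are read as 28 and 42. The function name is illustrative.

```python
def relative_improvement(ours: float, baseline: float) -> float:
    """Fractional gain of `ours` over `baseline`."""
    return (ours - baseline) / baseline


# Average delay improvements at 5% WS (decimal placement is an assumption).
baseline_avg = 2.8  # prior work
our_avg = 4.2       # this work

rel = relative_improvement(our_avg, baseline_avg)
print(f"relative improvement: {rel:.0%}")
```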
Final Results (contd)
[Chart data labels as extracted: 82120, 196241, 62, 102, 3845, 40]