Timing-Driven, Over-the-Block Rectilinear Steiner Tree
Construction with Pre-Buffering and Slew Constraints
Yilin Zhang and David Z. Pan
ECE, Univ. of Texas at Austin
ISPD 2014
Outline
Background & Motivation
TOB-RSMT
› Problem Formulation
› TOB-RSMT Algorithms
Experimental Results
Conclusion
History of VLSI RSMTs
Wirelength-driven: BOI, BI1S, RV-based RST, FLUTE and GeoSteiner
Obstacle-avoiding RSMT (OA-RSMT)
› [Chow+, VLSI14] [Liu+, DAC12] [Li+, ICCAD08]
Over-the-block RSMT (OB-RSMT), proposed since 2012
› [Huang+, ICCAD12] [Zhang+, ICCAD12]
Minimum delay routing tree (MDRT): BA-Tree, etc.
RAT-driven RSMT: C-Tree, etc.
Limitations on Previous Timing-driven RST
Cluster nodes during a bottom-up method
› Such as BA-Tree and C-Tree
Clustering distance metric:
› spatial and slack
Hard to find accurate slack:
› Some segments are not fixed yet
› No segments are buffered yet
Limitations in Dealing with Blocks
Completely neglecting blocks causes slew problems
› No over-the-block buffers are allowed
Obstacle avoiding
› More congestion outside the blocks
› Detours mean more WL and worse timing
[Figure: obstacle-avoiding route with detours]
Post-buffering Topology Tuning is Necessary
Buffering plays a big role in delay reduction
› Shielding effect; linear delay on long wires
› But buffers are always placed after wiring
Changing the topology after buffering is fruitful!
[Figure labels: DSB unchanged; DSA decreased; Db2]
Our Contributions
Use pre-buffering to find practical slack for each node in the graph
Use over-the-block routing resource to improve WL, buffering cost and timing
Apply post-buffering tuning to improve timing on critical paths with little extra cost
Outline
Background & Motivation
TOB-RSMT
› Problem Formulation
› TOB-RSMT Algorithms
Experimental Results
Conclusion
Problem Formulation
N = {s0, s1, s2, ..., sn}: source s0 and n sinks
B = {b1, b2, ..., bm}: non-overlapping rectilinear blocks in two-dimensional space R
Buffered T(V, E) connects all the pins in N to optimize WNS with the lowest buffering cost
› V is the set of nodes
› E is the set of horizontal and vertical edges
Slew rate at every point in T is within constraints
› Slew mode buffering [Hu+, TCAD07]
No buffers are allowed over the blocks
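Since the objective is stated in terms of WNS, a minimal sketch of how WNS (and TNS, which appears later in the formulation) are conventionally derived from per-sink slacks may be useful. It assumes slack is required arrival time minus arrival time; the function names and the dictionary layout are hypothetical and not taken from the talk.

```python
def slacks(arrival_ps, required_ps):
    """Per-sink slack = required arrival time - actual arrival time (ps)."""
    return {s: required_ps[s] - arrival_ps[s] for s in arrival_ps}


def wns(slack):
    """Worst negative slack, taken here as the minimum slack over all sinks.

    Assumption: some tools clamp this to 0 when every slack is positive;
    this sketch does not.
    """
    return min(slack.values())


def tns(slack):
    """Total negative slack: sum of the slacks that are below zero."""
    return sum(v for v in slack.values() if v < 0)


example = slacks({"A": 120.0, "B": 90.0, "C": 130.0},
                 {"A": 100.0, "B": 100.0, "C": 100.0})
print(wns(example), tns(example))
```

In this toy case sinks A and C miss their required times, so both contribute to TNS while C alone determines WNS.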
Timing Models
Elmore Delay
Slew
› PERI Model + Bakoglu’s Metric
» (4% error [Kashyap+, ISPD03] [Bakoglu+, 90])
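The two timing models can be illustrated with a short sketch. It assumes a lumped RC tree (wire capacitance already folded into node capacitances), Bakoglu's metric in its usual form (step-input slew is about ln 9 times the Elmore delay) and the PERI combination of input slew and step slew as a root-sum-of-squares. The array layout and function names are illustrative assumptions, not the authors' implementation.

```python
import math

LN9 = math.log(9.0)  # 10%-90% factor used by Bakoglu's metric


def elmore_delays(parent, res, cap):
    """Elmore delay from the root to every node of a lumped RC tree.

    parent[i]: index of node i's parent (root is node 0 with parent -1);
               nodes are assumed ordered so that parent[i] < i.
    res[i]:    resistance of the edge parent[i] -> i (res[0] is unused).
    cap[i]:    capacitance lumped at node i.
    """
    n = len(parent)
    downstream = list(cap)
    for i in range(n - 1, 0, -1):          # accumulate subtree capacitance
        downstream[parent[i]] += downstream[i]
    delay = [0.0] * n
    for i in range(1, n):                  # each edge sees its subtree cap
        delay[i] = delay[parent[i]] + res[i] * downstream[i]
    return delay


def step_slew_bakoglu(elmore):
    """Slew for a step input, per Bakoglu's metric."""
    return LN9 * elmore


def ramp_slew_peri(input_slew, step_slew):
    """PERI extension to ramp inputs: sqrt(input_slew^2 + step_slew^2)."""
    return math.hypot(input_slew, step_slew)


delays = elmore_delays([-1, 0, 1], [0.0, 1.0, 2.0], [0.0, 3.0, 4.0])
print(delays)
print(ramp_slew_peri(30.0, step_slew_bakoglu(delays[2])))
```

Units follow whatever is passed in (ohms times farads gives seconds). Driver resistance and buffer input capacitance are left out to keep the sketch short.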
Overall Algorithm
Input: N & B
Initial timing-driven RST with pre-buffering
Find all over-the-block slew violations and fix them
Buffering
Tune the topology according to buffering information
Buffering
Return buffered T
Initial Tree Generation with Pre-Buffering
Iterative method
› Runs until it converges or oscillates between several states
Feed back the real delay to each node to find slack (criticality)
› Critical sinks identified before topology construction are the truly critical ones
› Practical slack on each node
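The stopping rule (converge, or oscillate between several states) can be sketched as a generic fixed-point loop with cycle detection. Here `step` is a hypothetical stand-in for one round of "build tree, pre-buffer, feed delays back as slack", and the state is assumed to be any hashable summary of the result, such as a tuple of sink criticalities. None of these names come from the talk.

```python
def iterate_until_stable(step, state, max_iters=50):
    """Apply `step` repeatedly; stop on a fixed point or a revisited state.

    Returns (final_state, reason, iterations) where reason is one of
    "converged", "oscillating" or "max_iters".
    """
    seen = {state}
    previous = state
    for k in range(1, max_iters + 1):
        state = step(previous)
        if state == previous:
            return state, "converged", k      # nothing changed this round
        if state in seen:
            return state, "oscillating", k    # back in an earlier state
        seen.add(state)
        previous = state
    return state, "max_iters", max_iters


# Toy stand-ins for the real tree-construction round.
print(iterate_until_stable(lambda x: min(x + 1, 3), 0))
print(iterate_until_stable(lambda x: 1 - x, 0))
```

When oscillation is detected, a real flow would still have to pick one of the visited states, for example the one with the best WNS; that choice is not specified here.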
Initial Tree with Pre-Buffering Flow
[Figure: flow chart] [Lin+, TCAD11]
Initial Tree with Pre-Buffering Example
Simple model without buffering suggests D is critical
However, with buffering, D is not critical
Now, D is inserted far from the source with less WL
Buffering-Aware Over-the-Block TD-RST
TD-RST needs over-the-block routes
› Better WL, buffer resources and timing
› Replace obstacle-avoiding detours with shorter over-the-block connections
[Figure labels: 150ps, 100ps, 120ps, 110ps]
Difference from WL-driven BOB-RSMT
[Figure panels: Original; WL driven; WL+slack]
Move non-critical paths to save slew
Protect critical paths for timing
Slew Constraints in Buffering-Aware TD-RST
The hard problem with over-the-block routing is slew
Each topology confines a set of inside trees
Use a hypothetical buffer to check whether buffering is possible
Optimization Primitives
Three optimization primitives
› Parallel sliding
› Perpendicular sliding
› EP merging [Zhang+, ICCAD12]
Formulation of Buffering-Aware TD-RST
Formulation considers slack and WL together
› Increase of TNS
› Increase of WL
WijCdEPit: delay increase for every sink downstream of EPit
Buffer-location-based Tuning Benefits
Tuning topology after buffering benefits!
Buffering resources are costly
Improving timing without increasing buffers is tempting
› With a small amount of WL increase
We propose a way to post-tune the topology based on buffer location information
Saturated/Un-saturated Buffers
Some buffers are “Saturated” and some are “Un-saturated”
› Saturated: the slew reaches the maximum
› Un-saturated: the slew does not reach the maximum
Buffer-location-based Tuning Study
Un-saturated buffer == opportunity
[Figure labels: WL increase; delay to A improves]
Buffer-location-based Tuning Condition
Δslew = slewmax - slewcur
Lmax is the max allowed distance to relocate
› If neglecting buffer input cap, Lmax = [expression not preserved]
› If considering buffer input cap, Lmax = [expression not preserved]
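The slew margin and the saturated/un-saturated distinction can be made concrete with a small sketch. It computes only Δslew and the classification; it does not compute Lmax, whose expressions are not available here. The function names, the tolerance parameter and the buffer dictionary are hypothetical, and the 70 ps limit is borrowed from the experimental setup purely as an example value.

```python
def slew_margin(slew_max_ps, slew_cur_ps):
    """Delta-slew: remaining slew budget at a buffer's load (ps)."""
    return slew_max_ps - slew_cur_ps


def is_saturated(slew_max_ps, slew_cur_ps, tol_ps=0.0):
    """A buffer is treated as saturated when no slew budget is left.

    Assumption: `tol_ps` lets near-limit buffers count as saturated.
    """
    return slew_margin(slew_max_ps, slew_cur_ps) <= tol_ps


def unsaturated_buffers(slew_by_buffer, slew_max_ps, tol_ps=0.0):
    """Names of buffers that still have slew budget to spend on tuning."""
    return [name for name, slew in slew_by_buffer.items()
            if not is_saturated(slew_max_ps, slew, tol_ps)]


measured = {"b1": 70.0, "b2": 55.0, "b3": 69.5}
print(unsaturated_buffers(measured, 70.0))
print(unsaturated_buffers(measured, 70.0, tol_ps=1.0))
```

Un-saturated buffers are the candidates for relocation, since their downstream wire can grow by some amount before the slew limit is reached.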
Buffer-location-based Tuning Flow
Input: buffered T
Sort all sinks according to slack
For each negative-slack sink n
› Satisfy Lmax constraint? If yes: Tuning
› n = n.parent
› n at source? N: check again; Y: continue with the next sink
Buffering
Return buffered T
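One possible reading of the flow chart is sketched below: negative-slack sinks are visited from most to least critical, and each one is walked up toward the source, tuning wherever the Lmax condition holds. The callbacks `satisfies_lmax` and `tune`, the parent map, and the rule that a node is tuned at most once are all assumptions made for illustration; the original chart may order its branches differently.

```python
def tune_pass(parent, slack, satisfies_lmax, tune):
    """Walk from each negative-slack sink to the source, tuning on the way.

    parent: maps each node to its parent; the source maps to None.
    slack:  maps each sink to its slack (only negative ones are processed).
    satisfies_lmax(n): hypothetical check of the Lmax relocation condition.
    tune(n):           hypothetical topology adjustment at node n.
    Returns the nodes that were tuned, in order.
    """
    tuned = []
    done = set()
    critical = sorted((s for s in slack if slack[s] < 0), key=slack.get)
    for sink in critical:                  # most negative slack first
        n = sink
        while parent[n] is not None:       # stop once n is the source
            if n not in done and satisfies_lmax(n):
                tune(n)
                tuned.append(n)
                done.add(n)
            n = parent[n]
    return tuned


tree = {"s0": None, "u": "s0", "A": "u", "B": "s0", "C": "u"}
log = []
print(tune_pass(tree, {"A": -30.0, "B": -10.0, "C": 5.0},
                lambda n: n in {"u", "B"}, log.append))
```

After such a pass the flow runs buffering again, since moved segments can change where buffers are needed.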
Outline
Background & Motivation
TOB-RSMT
› Problem Formulation
› TOB-RSMT Algorithms
Experimental Results
Conclusion
Experimental Setups
C++ programming language
Intel Core 3.0GHz Linux machine with 32GB memory
Gurobi Optimizer 5.10 for mathematical optimization
RC01-RC12 are benchmarks [Feng+, ISPD06]
Two sizes of buffers: 450 ohms and 850 ohms, 3.8 fF and 1.9 fF
Interconnect RC from ITRS; slew constraint of 70ps
Experimental Setups
SD-OARST is the baseline [Lin+, TCAD11]
TOB-RST-1 is OA-RST with pre-buffering
TOB-RST-2 is over-the-block with pre-buffering
TOB-RST is over-the-block with pre-buffering and post-buffering tuning
Experimental Results
TOB-RST-1 to SD-OARST
› Similar WL (buffering cost)
› Pre-buffering benefits the slack
TOB-RST-2 to TOB-RST-1:
› 179ps better WNS on average
› Buffering cost and WL reduced by 6% and 5%
TOB-RST to TOB-RST-2:
› 70ps better WNS on average, with less than 1% more WL
Outline
Background & Motivation
TOB-RSMT
› Problem Formulation
› TOB-RSMT Algorithms
Experimental Results
Conclusion
Conclusion
Timing-driven over-the-block rectilinear Steiner minimum tree
Use pre-buffering to find practical slack for each node
Use over-the-block routing resources to improve WL, buffering cost and timing
Apply post-buffering tuning to improve timing on critical paths with little extra cost
Significantly improve WNS for all benchmarks along with 2% less WL and 4% less buffering cost than SD-OARST
Acknowledgment
This work is supported in part by Oracle
Thanks to Dr. Salim Chowdhury, Dr. Rajendran Panda and Dr. Akshay Sharma from Oracle
Thank you! Questions?