1
Physical Hierarchy Generation with Routing Congestion Control
Chin-Chih Chang*, Jason Cong*, Zhigang (David) Pan+, and Xin Yuan*
* UCLA Computer Science Department
+ IBM T.J. Watson Research Center
This paper is supported in part by SRC, an IBM Faculty Partnership Award, a grant from Intel, and a grant from Fujitsu under the California MICRO program
2
Overview
• Motivation and problem formulation for physical hierarchy generation
• Algorithm and contributions
  • Multilevel coarse placement framework
  • Hierarchical area density control
  • Fast incremental global routing
• Experimental results
• Conclusions
3
Challenges in Deep Sub-micron VLSI Designs
• Performance problems: need to optimize the dominating factor, i.e., interconnect delay
  • See interconnects as early as possible
  • Optimize interconnects in almost all design stages
• Design convergence problems: need to eliminate mismatches between early estimations and final layouts
  • Accurately estimate and optimize interconnect delay in early design stages
  • Consider interconnect routability in early design stages
  • Consider crosstalk noise impact in early design stages
• Therefore, accurate global interconnect estimation and optimization are required in early design stages
4
Physical Hierarchy Generation Problem Formulation
[Figure: the logical hierarchy of hard IP blocks and soft modules (same color = modules of the same logical hierarchy) is assigned to a physical hierarchy, which defines the global interconnects.]
• Optimization objectives of this work:
  • Wire length minimization
  • Routing congestion minimization
• Physical hierarchy = placement bins + module locations
• Other objectives could also be used (not a complete list): performance, noise, power, etc.
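As a minimal illustration of "placement bins + module locations", the sketch below stores a bin grid and a module-to-bin map. It assumes, as our reading of the figure, that a net becomes a global interconnect when its modules fall into more than one bin; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalHierarchy:
    """Placement bins (a cols x rows grid) plus the bin assigned to each module."""
    cols: int
    rows: int
    bin_of: dict[str, tuple[int, int]] = field(default_factory=dict)

    def assign(self, module: str, col: int, row: int) -> None:
        if not (0 <= col < self.cols and 0 <= row < self.rows):
            raise ValueError(f"bin ({col}, {row}) is outside the grid")
        self.bin_of[module] = (col, row)

    def global_nets(self, nets: dict[str, list[str]]) -> list[str]:
        # Assumption: a net is "global" when its modules span more than one bin.
        return [name for name, modules in nets.items()
                if len({self.bin_of[m] for m in modules}) > 1]
```

For example, with `cpu` and `cache` in bin (0, 0) and `io` in bin (1, 1), a net joining `cpu` and `io` is reported as global, while one joining `cpu` and `cache` is not.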
5
Discussions on Previous Work on Placement with Routability Considerations
Modeling methods:
• Weighted BBOX [Cheng ICCAD’94]
• Weighted BBOX with congestion region expansion [Yang ICCAD’01]
• Reconstruction of the Steiner tree on each move [Tsay Intl. Conf. ASIC’92]
Optimization methods:
• Recursive partitioning placement with pre-computed Steiner trees [Mayrhofer ICCAD’90]
• Cell padding or region growing/shrinking [Hou ASPDAC’01], [Sadakane CICC’97], [Parakh DAC’98], [Brenner ISPD’02], [Yang ISPD’02]
The most accurate routing estimation comes from global routing itself, so a trade-off between accuracy and run time must be found.
6
Algorithm Overview: V-shape Multi-Level Coarse Placement
• Coarsening by clustering (down the V)
• Initial placement at the coarsest level (bottom of the V)
• Refinement by placement (up the V)
• Congestion-driven placement at the finest few placement levels
• Fast global routing for congestion estimation
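The V-shaped flow can be sketched as a driver loop. This is a skeleton under stated assumptions, not the authors' implementation: `coarsen`, `place_coarsest`, `decluster` and `refine` are hypothetical callbacks standing in for clustering, initial placement, declustering and (simulated-annealing) refinement, and `n_congestion_levels` is an assumed parameter for how many of the finest levels use the congestion-driven cost.

```python
def v_cycle(netlist, coarsen, place_coarsest, decluster, refine,
            n_levels, n_congestion_levels):
    """Skeleton of a V-shaped multilevel coarse placement driver.

    coarsen, place_coarsest, decluster and refine are hypothetical callbacks;
    index 0 of `hierarchy` is the finest level.
    """
    # Down the V: build the cluster hierarchy by repeated clustering.
    hierarchy = [netlist]
    for _ in range(n_levels - 1):
        hierarchy.append(coarsen(hierarchy[-1]))

    # Bottom of the V: initial placement of the coarsest clusters.
    coarsest = len(hierarchy) - 1
    placement = place_coarsest(hierarchy[coarsest])

    # Up the V: decluster one level at a time, then refine on the same bin grid.
    for level in range(coarsest, -1, -1):
        if level < coarsest:
            placement = decluster(hierarchy[level + 1], hierarchy[level], placement)
        # Congestion-driven cost only at the finest few levels.
        use_congestion = level < n_congestion_levels
        placement = refine(hierarchy[level], placement, use_congestion)
    return placement
```

With a toy "netlist" that is just a cell count, `coarsen=lambda n: n // 2`, starting from 8 cells with four levels and two congestion levels, `refine` is called on sizes 1, 2, 4 and 8 in that order, with the congestion flag set only for 4 and 8.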
7
Algorithm Overview - Clustering
[Figure: clustering from the finest cluster level to the coarsest cluster level]
Clustering: group clusters (or cells) together
• Usually under certain area constraints
• Clustering criteria: connectivity-driven, performance-driven, etc.
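The slide does not fix a clustering criterion. As one connectivity-driven example, the sketch below uses greedy heavy-edge pairing under an area cap, a common multilevel coarsening technique swapped in here for illustration; the function name and the `(u, v, weight)` edge-list input are assumptions.

```python
from collections import defaultdict


def cluster_once(areas: list[float], edges: list[tuple[int, int, float]],
                 max_area: float) -> list[int]:
    """One coarsening pass: pair each unmatched cell with its most strongly
    connected unmatched neighbor, if their combined area stays within max_area.

    edges holds (u, v, weight) connections; returns the cluster id of each cell.
    """
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((w, v))
        adj[v].append((w, u))

    cluster_of = [-1] * len(areas)
    next_id = 0
    for u in range(len(areas)):
        if cluster_of[u] != -1:
            continue  # already absorbed into a cluster
        partner = None
        for _w, v in sorted(adj[u], reverse=True):  # strongest connection first
            if v != u and cluster_of[v] == -1 and areas[u] + areas[v] <= max_area:
                partner = v
                break
        cluster_of[u] = next_id
        if partner is not None:
            cluster_of[partner] = next_id
        next_id += 1
    return cluster_of


print(cluster_once([1, 1, 1, 3], [(0, 1, 5), (1, 2, 1), (2, 3, 4)], max_area=3))
# [0, 0, 1, 2]: cells 2 and 3 are strongly connected but would exceed the area cap
```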
8
Algorithm Overview - Refinement by Placement
[Figure: initial coarsest-level placement, followed by repeated declustering and placement steps, ending with the final coarse placement solution]
• Use the same grid structure at every placement level
• Variable cluster sizes (a cluster may be bigger than a bin) are handled by hierarchical area density control
• Use fast incremental routing for congestion estimation
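Because every level shares one bin grid, declustering needs no coordinate rescaling. A minimal assumption is that each child simply starts in its parent cluster's bin before refinement moves it; the function below is a hypothetical illustration of that step.

```python
def decluster(parent_bin: list[tuple[int, int]],
              cluster_of: list[int]) -> list[tuple[int, int]]:
    """Give each finer-level item the bin of the coarse cluster that contains it.

    parent_bin[c] is the (col, row) bin of coarse cluster c;
    cluster_of[i] is the coarse cluster containing finer-level item i.
    """
    # Same grid at every level, so bin coordinates carry over unchanged.
    return [parent_bin[c] for c in cluster_of]
```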
9
Area Density Problems in Multi-level Coarse Placement
• Traditional area density control: cell area in each bin < bin area, allowing utilization with a small percentage of overflow
  • Does not work when cluster sizes vary significantly and clusters may be bigger than a bin
• What about using different grid sizes for different clustering levels?
  • Hard to find fixed percentages that work
  • Significant placement cost jump when switching grid sizes
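The failure mode can be seen with a flat per-bin check. In this sketch the 10% overflow allowance and the function name are assumptions: a cluster whose area exceeds the bin capacity plus the allowance is rejected by every bin, even an empty one.

```python
def flat_density_ok(area_in_bin: float, moved_area: float,
                    bin_capacity: float, overflow: float = 0.10) -> bool:
    """Traditional check: bin area after the move must stay within capacity plus a small overflow."""
    return area_in_bin + moved_area <= bin_capacity * (1.0 + overflow)


# A cluster of area 25 can never be placed on a grid whose bins hold 10 (+10%).
print(flat_density_ok(0.0, 25.0, 10.0))  # False, even for an empty bin
print(flat_density_ok(5.0, 5.0, 10.0))   # True
```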
10
Hierarchical Area Density Control
• Use the same grid structure for placement at all clustering levels
• Impose a hierarchy on the bin structure for area density control
• Each cluster move must satisfy the area constraint at every level of the bin hierarchy
• Area constraint for moving a cell (or cluster) of size A: the allowed overflow at each level of the bin hierarchy is kA, where k is a small constant (usually 1 or 2)
• Works well in the multilevel framework: as clusters are broken into smaller ones, A and hence the allowed overflow kA shrink, so the area constraints are gradually tightened during optimization
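A minimal sketch of the hierarchical check follows. It assumes the bin hierarchy is formed by grouping bins into 2×2 quads level by level (the slide does not say how the hierarchy is built), a uniform per-bin capacity, and a class name of our own; the allowed overflow at every level is kA as described above.

```python
from collections import defaultdict


class HierarchicalDensity:
    """Area bookkeeping on a 2**levels x 2**levels bin grid with a quad bin hierarchy.

    Hierarchy level j groups 2**j x 2**j finest bins into one super-bin
    (assumption: quad grouping; level 0 is the finest bin, level `levels` the whole grid).
    """

    def __init__(self, levels: int, bin_capacity: float, k: float = 1.0):
        self.levels = levels
        self.bin_capacity = bin_capacity  # allowed cell area per finest bin (assumed uniform)
        self.k = k                        # overflow factor: allowed overflow = k * A
        self.area = [defaultdict(float) for _ in range(levels + 1)]

    def _key(self, b: tuple[int, int], j: int) -> tuple[int, int]:
        return (b[0] >> j, b[1] >> j)

    def place(self, b: tuple[int, int], a: float) -> None:
        for j in range(self.levels + 1):
            self.area[j][self._key(b, j)] += a

    def can_move(self, src: tuple[int, int], dst: tuple[int, int], a: float) -> bool:
        """Check the area constraint at every level of the bin hierarchy."""
        for j in range(self.levels + 1):
            if self._key(src, j) == self._key(dst, j):
                continue  # the move stays inside this super-bin, so its area is unchanged
            capacity = self.bin_capacity * 4 ** j
            if self.area[j][self._key(dst, j)] + a > capacity + self.k * a:
                return False
        return True

    def move(self, src: tuple[int, int], dst: tuple[int, int], a: float) -> None:
        for j in range(self.levels + 1):
            self.area[j][self._key(src, j)] -= a
            self.area[j][self._key(dst, j)] += a


hd = HierarchicalDensity(levels=2, bin_capacity=10.0, k=1.0)
hd.place((0, 0), 25.0)                      # a cluster bigger than one bin
print(hd.can_move((0, 0), (3, 3), 25.0))    # True: 0 + 25 <= 10 + 1 * 25 at the finest level
hd.move((0, 0), (3, 3), 25.0)
hd.place((0, 1), 2.0)                       # a small cell
print(hd.can_move((0, 1), (3, 3), 2.0))     # False: 25 + 2 > 10 + 1 * 2
```

The second query shows the tightening effect: the same crowded bin that tolerated a large cluster rejects a small cell, because the allowed overflow scales with the size of the moving object.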
11
Fast Incremental A-tree Routing for Multi-pin Nets
• Simple incremental A-tree:
  • Recursively quad-partition the grid
  • Each pin recursively connects to the lower-left corner of its partition at each level
• For a net with bounding-box length B, each pin move (except a move of the root) requires at most 2 log B edge updates
• Each edge is routed by the LZ-router
[Figure: pins in the first quadrant of the root (source pin)]
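The incremental tree can be sketched with reference-counted edges. This is a minimal sketch, assuming integer pin coordinates relative to the root (source pin) at the origin, all in the first quadrant as in the figure, on a 2^levels × 2^levels grid; the names are hypothetical and routing each edge with the LZ-router is not shown. Each pin's path to the root has at most `levels` edges, so a move removes at most `levels` edges and adds at most `levels`, matching the 2 log B bound.

```python
from collections import Counter

Point = tuple[int, int]


def corner_path(p: Point, levels: int) -> list[Point]:
    """Lower-left corners of p's enclosing quad blocks, from p itself up to the root (0, 0)."""
    x, y = p
    path: list[Point] = []
    for k in range(levels + 1):
        mask = ~((1 << k) - 1)  # clears the k low bits: corner of the 2**k block
        c = (x & mask, y & mask)
        if not path or path[-1] != c:
            path.append(c)
    return path


class IncrementalQuadTree:
    def __init__(self, levels: int):
        self.levels = levels
        self.refs: Counter = Counter()  # edge (child corner, parent corner) -> pins using it

    def _edges(self, p: Point) -> list[tuple[Point, Point]]:
        if not all(0 <= c < (1 << self.levels) for c in p):
            raise ValueError("pin must lie in the first quadrant of the root, inside the grid")
        path = corner_path(p, self.levels)
        return list(zip(path, path[1:]))

    def add_pin(self, p: Point) -> set:
        added = set()
        for e in self._edges(p):
            if self.refs[e] == 0:
                added.add(e)  # edge enters the tree and would need routing
            self.refs[e] += 1
        return added

    def remove_pin(self, p: Point) -> set:
        removed = set()
        for e in self._edges(p):
            self.refs[e] -= 1
            if self.refs[e] == 0:
                del self.refs[e]
                removed.add(e)  # edge leaves the tree and its wire usage is released
        return removed

    def move_pin(self, old: Point, new: Point) -> tuple[set, set]:
        return self.remove_pin(old), self.add_pin(new)

    def edges(self) -> set:
        return set(self.refs)


tree = IncrementalQuadTree(levels=3)  # 8 x 8 grid, root at (0, 0)
tree.add_pin((5, 3))
tree.add_pin((5, 2))
removed, added = tree.move_pin((5, 3), (1, 1))
print(sorted(removed), sorted(added))  # [((5, 3), (4, 2))] [((1, 1), (0, 0))]
```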
12
Fast LZ-routing for Two-pin Connections
• Decide HVH or VHV: select the less congested layer
• Binary search on the V-stem (or H-stem):
  • Initialize the left and right regions to cover the bounding box
  • Repeat: query the wire usage of both regions and select the less congested one
• Each wire usage query can be done in O(log grid_size)
[Figure: left and right search regions for the HVH and VHV patterns]
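The binary search can be sketched for the HVH case (VHV is symmetric with rows and columns swapped). Assumptions: vertical wire usage per grid cell is kept in a 2D Fenwick (binary indexed) tree, so a region query costs O(log W · log H) rather than the O(log grid_size) quoted on the slide, whose data structure is not described here; regions are compared by average usage per column so that unequal halves compare fairly; all names are hypothetical. Like any greedy halving, it need not find the globally least congested column.

```python
class Fenwick2D:
    """2D binary indexed tree: point update and rectangle sum in O(log W * log H)."""

    def __init__(self, w: int, h: int):
        self.w, self.h = w, h
        self.t = [[0] * (h + 1) for _ in range(w + 1)]

    def add(self, x: int, y: int, delta: int) -> None:
        i = x + 1
        while i <= self.w:
            j = y + 1
            while j <= self.h:
                self.t[i][j] += delta
                j += j & -j
            i += i & -i

    def _prefix(self, x: int, y: int) -> int:
        """Sum over cells with column < x and row < y."""
        s, i = 0, x
        while i > 0:
            j = y
            while j > 0:
                s += self.t[i][j]
                j -= j & -j
            i -= i & -i
        return s

    def rect(self, x0: int, x1: int, y0: int, y1: int) -> int:
        """Sum over columns x0..x1 and rows y0..y1 (inclusive)."""
        return (self._prefix(x1 + 1, y1 + 1) - self._prefix(x0, y1 + 1)
                - self._prefix(x1 + 1, y0) + self._prefix(x0, y0))


def choose_v_stem(vusage: Fenwick2D, p: tuple[int, int], q: tuple[int, int]) -> int:
    """Binary-search the column of the vertical stem for an HVH route between p and q."""
    lo, hi = min(p[0], q[0]), max(p[0], q[0])
    ylo, yhi = min(p[1], q[1]), max(p[1], q[1])
    while lo < hi:
        mid = (lo + hi) // 2
        left = vusage.rect(lo, mid, ylo, yhi)       # columns lo..mid
        right = vusage.rect(mid + 1, hi, ylo, yhi)  # columns mid+1..hi
        # Compare average usage per column (cross-multiplied to stay in integers).
        if left * (hi - mid) <= right * (mid - lo + 1):
            hi = mid
        else:
            lo = mid + 1
    return lo


vusage = Fenwick2D(8, 8)
for x in range(4):              # columns 0..3 are heavily used
    for y in range(8):
        vusage.add(x, y, 5)
stem = choose_v_stem(vusage, (0, 2), (7, 6))
print(stem)  # 4
for y in range(2, 7):           # commit the stem's vertical wire for later queries
    vusage.add(stem, y, 1)
```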
13
Placement Cost Functions
• Wire length driven: sum of the bounding-box wire lengths of all nets
• Congestion driven: wire usages estimated by the fast global router
  • Cost = sum of the squared wire usages of all bins
  • For a fixed wire width, a bin's wire usage is proportional to the wire length in it, so this cost is equivalent to a weighted wire length in which each bin's weight is its own wire usage
• In congestion-driven runs, the congestion-driven cost is turned on only at the finest placement level
[Figure: 3 × 3 grid of bins with wire usages]
W1 W2 W3
W4 W5 W6
W7 W8 W9
Congestion cost = $W_1^2 + W_2^2 + \cdots + W_9^2$
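Both cost functions are easy to state in code. The sketch assumes a net's bounding-box wire length is the half-perimeter of its pins' bounding box, and that the router's estimate arrives as a grid of per-bin wire usages; the function names are hypothetical. The helper `delta_add_wire` shows why the squared cost acts as a usage-weighted wire length: adding d units of wire to a bin with usage W raises the cost by (W + d)² − W² = 2Wd + d², so the same wire costs more in a busier bin.

```python
def bbox_wirelength(nets: list[list[tuple[int, int]]]) -> int:
    """Wire length cost: sum over nets of the half-perimeter of the pins' bounding box."""
    total = 0
    for pins in nets:
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def congestion_cost(usage: list[list[int]]) -> int:
    """Congestion cost: sum of squared wire usage over all bins (W1^2 + ... + Wn^2)."""
    return sum(w * w for row in usage for w in row)


def delta_add_wire(w: int, d: int) -> int:
    """Cost increase from adding d units of wire to a bin whose usage is w."""
    return (w + d) ** 2 - w ** 2  # = 2*w*d + d*d


usage = [[1, 0, 2],
         [0, 3, 0],
         [1, 1, 0]]  # W1..W9 on a 3 x 3 grid
print(congestion_cost(usage))                        # 16
print(bbox_wirelength([[(0, 0), (3, 1), (1, 4)]]))   # 7
print(delta_add_wire(3, 1), delta_add_wire(0, 1))    # 7 1
```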
14
Experimental Results on Wire Length Minimization
• Multi-level simulated annealing coarse placement (mPG)
• Wire length comparison with GORDIAN-L:
  • Our engine runs with wire length optimization only
  • Both placements are legalized by DOMINO for the wire length comparison
• Our multi-level engine performs well for big circuits
• 20k-50k test cases: avqlarge, avqsmall, ibm04, ibm07
• 50k-100k test cases: ibm09, ibm10
• 100k-210k test cases: ibm14, ibm15, ibm16, ibm17, ibm18
mPG+DOM / GOR+DOM ratios (mPG vs. GORDIAN-L, both legalized by DOMINO):
Circuit size    Wire length    CPU time
20k-50k         97%            81%
50k-100k        100%           43%
100k-210k       96%            22%
15
Experimental Results on Congestion Control
Results relative to mPG (mPG = 1):
Mode        BBOX WL   Routed WL   Max boundary congestion   Total overflow   CPU
mPG         1         1           1                         1                1
mPG-cg.rd   1.05      0.97        0.93                      0.47             6.1
mPG-cg      1.05      0.94        0.87                      0.21             18.9
Test cases: ibm01, ibm04, ibm07, ibm11, ibm13, ibm15
mPG: wire-length-driven mode
mPG-cg: congestion-driven at the finest clustering level
mPG-cg.rd: alternating congestion-driven and wire-length-driven optimization at the finest clustering level
16
Conclusions
• Multi-level simulated annealing coarse placement
• Hierarchical area density control
• Fast global routing estimation
• Capable of wire length minimization with or without congestion minimization
• Compared to GORDIAN-L, mPG generates comparable solutions with a 3-6 times speedup for test cases larger than 100K
• Congestion-driven mPG reduces estimated global routing overflow by 50%-80% at 6-19 times the CPU time