1

Physical Hierarchy Generation with Routing Congestion Control

Chin-Chih Chang*, Jason Cong*, Zhigang (David) Pan+, and Xin Yuan*

* UCLA Computer Science Department
+ IBM T.J. Watson Research Center

This paper is supported in part by SRC, an IBM Faculty Partnership Award, a grant from Intel, and a grant from Fujitsu under the California MICRO program

2

Overview

Motivation and problem formulation for physical hierarchy generation

Algorithm and contributions
• Multilevel coarse placement framework
• Hierarchical area density control
• Fast incremental global routing

Experimental results

Conclusions

3

Challenges in Deep Sub-micron VLSI Designs

Performance problems: need to optimize the dominating factor, i.e., interconnect delay
• See interconnects as early as possible
• Optimize interconnects in almost all design stages

Design convergence problems: need to eliminate mismatches between early estimations and final layouts
• Accurate estimation/optimization of interconnect delay in early design stages
• Consider interconnect routability in early design stages
• Consider crosstalk noise impacts in early design stages

Both require accurate global interconnect estimation/optimization in early design stages

4

Physical Hierarchy Generation Problem Formulation

[Figure: a logical hierarchy of hard IP blocks and soft modules (modules of the same logic hierarchy share a color) is assigned to a physical hierarchy; this assignment defines the global interconnects.]

• Optimization objectives of this work:
  • wire length minimization
  • routing congestion minimization

Physical Hierarchy = Placement bins + module locations

• Other objectives could also be used (not a complete list): performance, noise, power, etc.

5

Discussions on Previous Work on Placement with Routability Considerations

Modeling methods:
• Weighted BBOX [Cheng ICCAD’94]; weighted BBOX with congestion region expansion [Yang ICCAD’01]
• Reconstruction of Steiner tree on each move [Tsay Intl. Conf. ASIC’92]

Optimization methods:
• Recursive partition placement with pre-computed Steiner tree [Mayrhofer ICCAD’90]
• Cell padding or region growing/shrinking: [Hou ASPDAC’01], [Sadakane CICC’97], [Parakh DAC’98], [Brenner ISPD’02], [Yang ISPD’02]

The most accurate routing estimation comes from global routing itself. Need to find a tradeoff between accuracy and run time.

6

Algorithm Overview: V-shape Multi-Level Coarse Placement

Coarsening by clustering

Refinement by placement

Initial placement

Congestion driven at the finest few placement levels

Fast global routing for congestion estimation

7

Algorithm Overview - Clustering

[Figure labels: finest cluster level; coarsest cluster level]

Clustering: group clusters (or cells) together
• Usually under certain area constraints
• Clustering criteria: connectivity driven, performance driven, etc.
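One coarsening pass of this kind can be sketched as a greedy, connectivity-driven pairing under an area limit. This is only an illustration: the slides do not specify the clustering algorithm, and `cluster_once`, its arguments, and the heaviest-edge-first policy are assumptions made for the example.

```python
def cluster_once(areas, edges, max_area):
    """One coarsening pass (hypothetical sketch, not the authors' method).

    areas: dict node -> area
    edges: dict (u, v) -> connection weight
    max_area: largest area allowed for a merged cluster
    Returns (cluster_of, cluster_area).
    """
    cluster_of = {}
    cluster_area = {}
    # Visit the most strongly connected pairs first (connectivity driven).
    for (u, v), _w in sorted(edges.items(), key=lambda kv: -kv[1]):
        if u in cluster_of or v in cluster_of:
            continue  # each node is merged at most once per pass
        if areas[u] + areas[v] <= max_area:  # area constraint
            cid = len(cluster_area)
            cluster_of[u] = cluster_of[v] = cid
            cluster_area[cid] = areas[u] + areas[v]
    # Nodes that found no partner become singleton clusters.
    for n in areas:
        if n not in cluster_of:
            cid = len(cluster_area)
            cluster_of[n] = cid
            cluster_area[cid] = areas[n]
    return cluster_of, cluster_area


areas = {"a": 2, "b": 2, "c": 3, "d": 1}
edges = {("a", "b"): 5, ("b", "c"): 4, ("c", "d"): 1, ("a", "c"): 3}
cluster_of, cluster_area = cluster_once(areas, edges, max_area=4)
```

Repeating such a pass on the resulting clusters would give the successive coarser levels; a performance-driven criterion would only change how the pairs are ranked.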

8

Algorithm Overview - Refinement by Placement

[Figure: initial coarsest-level placement, followed by alternating declustering and placement steps, ending in the final coarse placement solution]

• Use the same grid structure in each level of placement
• Variable cluster size (may be bigger than a bin): handled by hierarchical area density control
• Use fast incremental routing for congestion estimation

9

Area Density Problems in Multi-level Coarse Placement

Traditional area density control:
• Cell area in each bin < bin area utilization, with a small percentage of overflow allowed
• Does not work when cluster sizes vary significantly and may be bigger than a bin

How about using different grid sizes for different levels of clustering?
• Hard to find fixed percentages that work
• Significant placement cost jump when switching grid sizes

10

Hierarchical Area Density Control

Use the same grid structure for placement at all clustering levels

Impose a hierarchy on the bin structure for area density control

Each cluster move must satisfy the area constraints on each level in the bin hierarchy

Area constraint for moving a cell of size A:
• Allowed overflow on each level in the bin hierarchy = kA, where k is a small constant (usually 1 or 2)

Works well in the multi-level framework:
• Area constraints are gradually tightened during optimization, since the clusters being moved get smaller at each finer level
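The move check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the bin hierarchy is taken to be a quadtree over a 2^n × 2^n grid, every finest bin has the same capacity, and the class name `BinHierarchy` and its methods are hypothetical, not taken from the slides.

```python
class BinHierarchy:
    """Quadtree-style hierarchy over a 2^levels x 2^levels grid of bins.

    Assumption: level 0 holds the finest bins, and a level-l bin covers
    a 2^l x 2^l block of finest bins.
    """

    def __init__(self, levels, bin_capacity, k=1.0):
        self.levels = levels
        self.bin_capacity = bin_capacity
        self.k = k  # overflow factor from the slide (usually 1 or 2)
        # used[l] maps (x >> l, y >> l) -> total area inside that level-l bin
        self.used = [dict() for _ in range(levels + 1)]

    def capacity(self, level):
        return self.bin_capacity * (4 ** level)

    def can_place(self, x, y, area):
        """True if adding `area` at finest bin (x, y) keeps every level
        within capacity plus the allowed overflow k * area."""
        for l in range(self.levels + 1):
            key = (x >> l, y >> l)
            if self.used[l].get(key, 0.0) + area > self.capacity(l) + self.k * area:
                return False
        return True

    def place(self, x, y, area):
        for l in range(self.levels + 1):
            key = (x >> l, y >> l)
            self.used[l][key] = self.used[l].get(key, 0.0) + area

    def remove(self, x, y, area):
        for l in range(self.levels + 1):
            self.used[l][(x >> l, y >> l)] -= area


h = BinHierarchy(levels=2, bin_capacity=10.0, k=1.0)
big_ok = h.can_place(0, 0, 15.0)       # a cluster larger than one bin
h.place(0, 0, 15.0)
same_bin_ok = h.can_place(0, 0, 4.0)   # the finest bin is now too full
neighbor_ok = h.can_place(1, 0, 4.0)   # the neighboring bin still has room
```

Because the allowed overflow scales with the size of the moved object, large clusters at coarse levels get generous slack while individual cells at the finest level get very little, which is one way to read the "gradually tightened" remark.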

11

Fast Incremental A-tree Routing for Multi-pin Nets

Simple incremental A-tree
• Recursively quad-partition the grid
• Each pin recursively connects to the lower-left corner of each level of partition

For a net with bounding box length B, at most 2 log B edge updates for each pin move, except the root

Each edge is routed by the LZ-router

[Figure labels: first quadrant; root (source pin)]
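One reading of the lower-left-corner construction is sketched below. The assumptions are mine, not the slides': the root sits at the origin, all pins lie in the first quadrant, the grid side is 2^levels, and `corner_chain` and `net_edges` are hypothetical names.

```python
def corner_chain(x, y, levels):
    """Points visited from a pin at (x, y) toward the root at (0, 0).

    At level l the grid is cut into 2^l x 2^l cells; the pin connects to
    the lower-left corner of the cell that contains it.
    """
    chain = [(x, y)]
    for l in range(1, levels + 1):
        corner = ((x >> l) << l, (y >> l) << l)
        if corner != chain[-1]:  # skip zero-length edges
            chain.append(corner)
    return chain


def net_edges(pins, levels):
    """Union of the chain edges of all pins; shared edges appear once."""
    edges = set()
    for x, y in pins:
        chain = corner_chain(x, y, levels)
        edges.update(zip(chain, chain[1:]))
    return edges


chain = corner_chain(5, 3, 3)
edges = net_edges([(5, 3), (5, 2)], 3)
```

Every step moves only left or down, so each pin reaches the root along a shortest rectilinear path, and a chain has at most `levels` edges. Moving a pin replaces its old chain with a new one, which is consistent with the 2 log B bound on edge updates. A real implementation would also need reference counts on shared edges, which this sketch omits.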

12

Fast LZ-routing for Two-pin Connections

Decide HVH or VHV: select the less congested layer

Binary search on V-stem (or H-stem)
• Initial left region and right region cover the bounding box
• Repeat:
  • Query wire usage on both regions
  • Select the region with less congestion

Wire usage query can be done in O(log grid_size)

[Figure labels: left region, right region; HVH, VHV]
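The binary search for the V-stem of an HVH route can be sketched as below. For brevity the region query here uses a static prefix sum over columns, which is an assumption of the example; the O(log grid_size) query on the slide presumably relies on a structure that also supports incremental updates. The name `pick_v_stem` and the average-usage comparison are likewise hypothetical.

```python
from itertools import accumulate


def pick_v_stem(v_usage, x_lo, x_hi, y_lo, y_hi):
    """Choose the column for the vertical stem inside a bounding box.

    v_usage[y][x] is the vertical wire usage of bin (x, y).
    The column range is halved repeatedly, keeping the half with the
    lower average usage over rows y_lo..y_hi.
    """
    ncols = len(v_usage[0])
    col_total = [sum(v_usage[y][x] for y in range(y_lo, y_hi + 1))
                 for x in range(ncols)]
    prefix = [0] + list(accumulate(col_total))  # prefix[i] = columns [0, i)

    lo, hi = x_lo, x_hi
    while lo < hi:
        mid = (lo + hi) // 2
        left = prefix[mid + 1] - prefix[lo]        # columns lo..mid
        right = prefix[hi + 1] - prefix[mid + 1]   # columns mid+1..hi
        # Compare average usage without division (regions may differ in width).
        if left * (hi - mid) <= right * (mid - lo + 1):
            hi = mid
        else:
            lo = mid + 1
    return lo


row = [5, 5, 5, 5, 1, 0, 2, 3]
usage = [row[:], row[:], row[:]]
stem = pick_v_stem(usage, 0, 7, 0, 2)
```

The search is greedy: it follows the less congested half at each step, so it finds a lightly used column quickly but is not guaranteed to find the global minimum.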

13

Placement Cost Functions

Wire length driven:
• Summation of the bounding boxes of all nets

Congestion driven:
• Wire usages are estimated by the fast global router
• Cost = summation of the squares of the wire usages in all bins
• For fixed wire width, the cost is equivalent to a summation of weighted wire length, where the weight on a bin = wire usage of the bin

For a congestion-driven run: the congestion-driven cost is only turned on at the finest placement level

Example on a 3 × 3 grid of bins with wire usages W1 … W9:

W1 W2 W3
W4 W5 W6
W7 W8 W9

Congestion cost = $W_1^2 + W_2^2 + \dots + W_9^2$
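Both cost functions are simple to state in code. The sketch below assumes that "bounding box" means the half-perimeter of each net's bounding box, which the slides do not spell out; the function names are illustrative.

```python
def wire_length_cost(nets):
    """Sum of bounding-box half-perimeters (assumed measure) over all nets.

    nets: list of nets, each a list of (x, y) pin positions.
    """
    total = 0
    for pins in nets:
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def congestion_cost(usage):
    """Sum of squared wire usages over all bins of a 2-D grid."""
    return sum(w * w for row in usage for w in row)


nets = [[(0, 0), (2, 3)], [(1, 1), (1, 4), (3, 2)]]
wl = wire_length_cost(nets)

grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
cg = congestion_cost(grid)
```

Squaring the usage penalizes hot spots: moving one unit of wire from a bin with usage 9 to a bin with usage 1 lowers the cost, even though the total usage is unchanged.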

14

Experimental Results on Wire Length Minimization

Multi-level simulated annealing coarse placement

Wire length comparison with GORDIAN-L:
• Our engine only turns on wire length optimization
• Legalized by DOMINO for wire length comparison

Our multi-level engine performs well for big circuits

• 20k-50k test cases: avqlarge, avqsmall, ibm04, ibm07

• 50k-100k test cases: ibm09, ibm10

• 100k-210k test cases: ibm14, ibm15, ibm16, ibm17, ibm18

mPG+DOM / GOR+DOM wire length comparison (bar chart): 97% for 20k-50k, 100% for 50k-100k, 96% for 100k-210k

mPG+DOM / GOR+DOM CPU time comparison (bar chart): 81% for 20k-50k, 43% for 50k-100k, 22% for 100k-210k

15

Experimental Results on Congestion Control

All values are normalized to mPG.

            BBOX WL   Routed WL   Max boundary congestion   Total overflow   CPU
mPG         1         1           1                         1                1
mPG-cg.rd   1.05      0.97        0.93                      0.47             6.1
mPG-cg      1.05      0.94        0.87                      0.21             18.9

Test cases: ibm01, ibm04, ibm07, ibm11, ibm13, ibm15

• mPG: wire length driven mode
• mPG-cg: congestion driven at the finest clustering level
• mPG-cg.rd: alternating congestion-driven and wire-length-driven optimization at the finest clustering level

16

Conclusions

Multi-level simulated annealing coarse placement
• Hierarchical area density control
• Fast global routing estimation
• Capable of wire length minimization with/without congestion minimization

Compared to GORDIAN-L, mPG generates comparable solutions with a 3-6 times speedup for test cases > 100K

Congestion-driven mPG reduces estimated global routing overflows by 50%-80% at the cost of 6-19 times the CPU time
