
Timing-Driven, Over-the-Block Rectilinear Steiner Tree Construction with Pre-Buffering and Slew Constraints

Yilin Zhang and David Z. Pan

ECE, Univ. of Texas at Austin

ISPD 2014

Outline

Background & Motivation

TOB-RSMT
› Problem Formulation
› TOB-RSMT Algorithms

Experimental Results

Conclusion

History of VLSI RSMTs

Wirelength-driven: BOI, BI1S, RV-based RST, FLUTE, and GeoSteiner

Obstacle-avoiding RSMT (OA-RSMT)
› [Chow+, VLSI14] [Liu+, DAC12] [Li+, ICCAD08]

Over-the-block RSMT (OB-RSMT), proposed since 2012
› [Huang+, ICCAD12] [Zhang+, ICCAD12]

Minimum delay routing tree (MDRT): BA-Tree, etc.

RAT-driven RSMT: C-Tree, etc.

Limitations of Previous Timing-Driven RST

Nodes are clustered in a bottom-up method
› Such as BA-Tree and C-Tree

Clustering distance metric:
› Spatial and slack

Hard to find accurate slack:
› Some segments are not fixed yet
› No segments are buffered yet

Limitations in Dealing with Blocks

Completely neglecting blocks causes slew problems
› No over-the-block buffers are allowed

Obstacle avoiding
› More congestion outside the blocks
› Detours mean more WL and worse timing

[Figure label: detours]

Post-buffering Topology Tuning is Necessary

Buffering plays a big role in delay reduction
› Shielding effect; linear delay on long wires
› But buffers are always placed after wiring

Changing the topology after buffering is fruitful!

[Figure labels: DSB unchanged; DSA decreased; Db2]

Our Contributions

Use pre-buffering to find practical slack for each node in the graph

Use over-the-block routing resource to improve WL, buffering cost and timing

Apply post-buffering tuning to improve timing on critical paths with little extra cost



Problem Formulation

N = {s0, s1, s2, ..., sn}: source s0 and n sinks

B = {b1, b2, ..., bm}: non-overlapping rectilinear blocks in the two-dimensional space R

A buffered tree T(V, E) connects all the pins in N to optimize WNS with the lowest buffering cost
› V is the set of nodes
› E is the set of horizontal and vertical edges

The slew rate at every point in T is within constraints
› Slew-mode buffering [Hu+, TCAD07]

No buffers are allowed over the blocks
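The timing objective can be made concrete with a small sketch. It assumes each sink carries an arrival time and a required arrival time (RAT), so that slack = RAT - arrival, WNS is the smallest slack, and TNS is the sum of the negative slacks. The `Sink` class, its field names, and the numbers are illustrative and are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Sink:
    name: str
    arrival_ps: float  # delay from the source s0 to this sink
    rat_ps: float      # required arrival time at this sink


def slack(sink: Sink) -> float:
    """Slack is positive when the signal arrives earlier than required."""
    return sink.rat_ps - sink.arrival_ps


def wns(sinks) -> float:
    """Worst negative slack: the smallest slack over all sinks."""
    return min(slack(s) for s in sinks)


def tns(sinks) -> float:
    """Total negative slack: sum of the slacks that are below zero."""
    return sum(min(0.0, slack(s)) for s in sinks)


# Hypothetical three-sink net.
sinks = [Sink("s1", 150.0, 100.0), Sink("s2", 120.0, 110.0), Sink("s3", 80.0, 100.0)]
```

Here `wns(sinks)` picks out the sink that limits the net, while `tns(sinks)` also reflects how many other sinks miss their required times.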

Timing Models

Elmore Delay

Slew
› PERI model + Bakoglu’s metric
» (4% error [Kashyap+, ISPD03] [Bakoglu+, 90])
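The two timing models can be written out as a short sketch. It assumes their commonly used forms: the Elmore delay of a uniform wire with per-unit resistance r and capacitance c driving a lumped load, Bakoglu's metric (step slew = ln 9 × Elmore delay), and the PERI combination of input slew and step slew as a root sum of squares. Function and parameter names are hypothetical, and units only need to be mutually consistent.

```python
import math


def elmore_wire_delay(r, c, length, load_cap):
    """Elmore delay of a uniform wire (r, c per unit length) into a lumped load."""
    return r * length * (c * length / 2.0 + load_cap)


def elmore_stage_delay(driver_res, r, c, length, load_cap):
    """Driver resistance charging the whole wire and load, plus the wire delay."""
    return driver_res * (c * length + load_cap) + elmore_wire_delay(r, c, length, load_cap)


def bakoglu_step_slew(elmore_delay):
    """Bakoglu's metric: 10%-90% slew of a step response is ln(9) times the Elmore delay."""
    return math.log(9.0) * elmore_delay


def peri_slew(input_slew, step_slew):
    """PERI: extend a step-input slew to a ramp input by root-sum-of-squares."""
    return math.hypot(input_slew, step_slew)
```

For example, `peri_slew(s_in, bakoglu_step_slew(elmore_wire_delay(r, c, L, C_load)))` estimates the slew at the far end of a wire of length `L` whose near end sees slew `s_in`.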

Overall Algorithm

Input: N & B

› Initial timing-driven RST with pre-buffering
› Find all over-the-block slew violations and fix them
› Buffering
› Tune the topology according to buffering information
› Buffering

Return buffered T

Initial Tree Generation with Pre-Buffering

Iterative method
› Runs until it converges or oscillates between several states

Feed back the real delay to each node to find slack (criticality)
› Sinks identified as critical before topology construction are the truly critical ones
› Practical slack on each node
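The iteration can be sketched as a feedback loop that stops either at a fixed point or when a previously seen slack state recurs. This is a minimal sketch under assumptions: `build_tree` and `buffer_and_time` are hypothetical callables standing in for tree construction and for buffering plus timing analysis, and slacks are rounded before comparison so that tiny numerical differences do not hide a repeat.

```python
def _state(slacks, ndigits=3):
    """Hashable, rounded snapshot of a {sink: slack} mapping."""
    return tuple(sorted((name, round(value, ndigits)) for name, value in slacks.items()))


def prebuffered_tree(initial_slacks, build_tree, buffer_and_time, max_iters=20):
    """Rebuild the tree from fed-back slacks until they converge or oscillate.

    build_tree(slacks) -> tree            (hypothetical placeholder)
    buffer_and_time(tree) -> {sink: slack} (hypothetical placeholder)
    """
    slacks = dict(initial_slacks)
    seen = set()
    tree = None
    for _ in range(max_iters):
        tree = build_tree(slacks)
        new_slacks = buffer_and_time(tree)
        state = _state(new_slacks)
        if state == _state(slacks):
            return tree, new_slacks, "converged"
        if state in seen:
            return tree, new_slacks, "oscillating"
        seen.add(state)
        slacks = new_slacks
    return tree, slacks, "max_iters"
```

The returned status lets a caller decide what to do in the oscillating case, for instance keeping the best tree seen so far; that policy is left open here.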

Initial Tree with Pre-Buffering Flow

[Flowchart; cites [Lin+, TCAD11]]

Initial Tree with Pre-Buffering Example

A simple model without buffering suggests D is critical

However, with buffering, D is not critical

Now D is inserted far from the source, with less WL

Buffering-Aware Over-the-Block TD-RST

TD-RST needs over-the-block routes
› Better WL, buffer resources, and timing
› Replace obstacle-avoiding detours with shorter over-the-block connections

[Figure labels: 150ps, 100ps; 120ps, 110ps]

Differences from WL-driven BOB-RSMT

[Figure panels: Original; WL-driven; WL + slack]

Move non-critical paths to save slew

Protect critical paths for timing

Slew Constraints in Buffering-Aware TD-RST

The hard problem with over-the-block routing is slew

Each topology confines a set of inside trees

Use a hypothetical buffer to check whether buffering is possible

Optimization Primitives

Three optimization primitives
› Parallel sliding
› Perpendicular sliding
› EP merging [Zhang+, ICCAD12]

The formulation considers slack and WL together

Formulation of Buffering-Aware TD-RST

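[Equation with annotations: one term ("WijCdEPit" in the extracted text) is the delay increase for every sink downstream of EPi; the two annotated parts are "Increase of TNS" and "Increase of WL"]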

Buffer-location-based Tuning Benefits

Tuning the topology after buffering is beneficial!

Buffering resources are costly

Improving timing without adding buffers is tempting
› With a small WL increase

We propose a way to post-tune the topology based on buffer location information

Saturated/Un-saturated Buffers

Some buffers are “Saturated” and some are “Un-saturated”
› Saturated: the slew reaches the maximum
› Un-saturated: the slew does not reach the maximum

Buffer-location-based Tuning Study

Un-saturated buffer = opportunity

[Figure labels: WL increase; delay to A improves]

Buffer-location-based Tuning Condition

Δslew = slew_max - slew_cur

Lmax is the maximum allowed relocation distance
› If neglecting buffer input cap, Lmax = [expression missing from the transcript]
› If considering buffer input cap, Lmax = [expression missing from the transcript]
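The closed forms for Lmax are not available in this transcript, so the sketch below is only an illustration of the idea and not necessarily the slide's formula. It assumes a wire-slew model of the PERI + Bakoglu kind (end slew = root sum of squares of the input slew and ln 9 × wire Elmore delay), treats a buffer as saturated when Δslew is zero, and finds by bisection how much extra wire fits before the slew limit is reached. All names and parameters are hypothetical; passing `load_cap=0.0` corresponds to neglecting the buffer input cap.

```python
import math

LN9 = math.log(9.0)


def end_slew(input_slew, r, c, length, load_cap):
    """Assumed model: slew at the end of a wire of the given length."""
    step = LN9 * r * length * (c * length / 2.0 + load_cap)
    return math.hypot(input_slew, step)


def is_saturated(slew_cur, slew_max, tol=0.0):
    """A buffer is saturated when no slew margin is left (delta slew <= tol)."""
    return slew_max - slew_cur <= tol


def max_extra_length(input_slew, r, c, length, load_cap, slew_max, hi=1e5, iters=100):
    """Largest extra wire length (up to hi) that keeps the end slew within slew_max."""
    if end_slew(input_slew, r, c, length, load_cap) >= slew_max:
        return 0.0  # saturated: nothing can be added
    if end_slew(input_slew, r, c, length + hi, load_cap) <= slew_max:
        return hi   # limit not reached within the search range
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if end_slew(input_slew, r, c, length + mid, load_cap) <= slew_max:
            lo = mid  # mid is still feasible
        else:
            hi = mid
    return lo
```

Bisection is used because the end slew grows monotonically with wire length in this model, so the feasible lengths form a single interval starting at zero.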

Buffer-location-based Tuning Flow

Input: buffered T

› Sort all sinks according to slack
› For each negative-slack sink n:
» If n satisfies the Lmax constraint: tuning
» n = n.parent
» n at source? N: repeat with the new n; Y: continue with the next sink
› Buffering

Return buffered T
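The traversal in this flow can be sketched as a walk from each negative-slack sink up to the source. This is a minimal sketch under assumptions: the `Node` class, the `satisfies_lmax` predicate, and the `relocate` action are hypothetical placeholders, the order of the check and the step to the parent is a guess at the flowchart, and visiting each node at most once is a choice made here rather than something the slide states. The final buffering pass is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None  # None marks the source
    slack_ps: float = 0.0


def tune_by_buffer_locations(sinks, satisfies_lmax, relocate):
    """Walk each negative-slack sink toward the source, tuning where allowed."""
    tuned = []
    visited = set()
    for sink in sorted(sinks, key=lambda s: s.slack_ps):  # most critical first
        if sink.slack_ps >= 0:
            break  # remaining sinks already meet timing
        n = sink
        while n.parent is not None and id(n) not in visited:
            visited.add(id(n))
            if satisfies_lmax(n):
                relocate(n)
                tuned.append(n.name)
            n = n.parent  # move one level toward the source
    return tuned
```

Processing sinks in slack order means the most critical path gets the first chance to use any un-saturated buffer along shared ancestors.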


Experimental Setups

C++ programming language

Intel Core 3.0GHz Linux machine with 32GB memory

Gurobi Optimizer 5.10 for mathematical optimization

RC01-RC12 are the benchmarks [Feng+, ISPD06]

Two buffer sizes: 450 ohms and 850 ohms, with 3.8 fF and 1.9 fF, respectively

Interconnect RC from ITRS; slew constraint of 70ps

Experimental Setups

SD-OARST is the baseline [Lin+, TCAD11]

TOB-RST-1 is OA-RST with pre-buffering

TOB-RST-2 is over-the-block with pre-buffering

TOB-RST is over-the-block with pre-buffering and post-buffering tuning

Experimental Results

TOB-RST-1 vs. SD-OARST
› Similar WL (buffering cost)
› Pre-buffering benefits the slack

TOB-RST-2 vs. TOB-RST-1
› WNS improves by 179ps on average
› Buffering cost and WL reduced by 6% and 5%, respectively

TOB-RST vs. TOB-RST-2
› WNS improves by 70ps on average, with less than 1% more WL



Conclusion

Timing-driven over-the-block rectilinear Steiner minimum tree

Use pre-buffering to find practical slack for each node

Use over-the-block routing resources to improve WL, buffering cost and timing

Apply post-buffering tuning to improve timing on critical paths with little extra cost

Significantly improves WNS on all benchmarks, with 2% less WL and 4% less buffering cost than SD-OARST

Acknowledgment

This work is supported in part by Oracle

Thanks to Dr. Salim Chowdhury, Dr. Rajendran Panda and Dr. Akshay Sharma from Oracle

Thank you! Questions?
