Jan 2010Hard IP Reuse1 Hard IP Reuse – a Survey Shmuel Wimer Bar Ilan University, School of...

Jan 2010 Hard IP Reuse 1

Hard IP Reuse – a Survey

Shmuel Wimer

Bar Ilan University, School of Engineering


Outline

• Design and Market Considerations
– Hard and soft IP reuse
– Intel’s Tick / Tock
– Design consideration
• Layout Migration in Work
– Transistor, cell and block compaction
– Delay and circuit considerations
– Late process changes / updates
– Design For Manufacturability
• Layout Migration Algorithms and Techniques
– Hierarchy in layout
– Visibility, compaction and positive cycles
– Cell based migration


Part I

Design and Market Considerations


Hard Vs. Soft IP Reuse

• Hard IP reuse is transforming the polygons of old taped-out data into new process technology
– Net list doesn’t change
– Circuit changes are limited to resizing only
– Suitable for custom design
• Soft IP reuse is using the same RTL of old design with a new target library design in any technology
– Architecture, Verilog and RTL are not changing
– Net list is changing
– Layout is done from scratch

• Future is questionable as FPGA usage is spreading


Advantages of Hard IP Reuse

• Fabless companies

– Lower design cost

– Better Mfg. Shopping, few sources

– Competition, TTM

• Intel’s drive

– System Integration, SoC

– Manufacturing cost, volumes

– Performance enhancement

– Competition, TTM


Intel’s Tock / Tick Strategy


Intel Architecture Roadmap


Tock Vs Tick Design

• Tock
– New architecture, design implementation and layout
– Matured process technology
– New CAD tools and DA flows
– Large design team and long duration
• Tick
– Stable design, only few new features
– New process technology
– Maximizing soft and hard IP reuse
– Smaller design team and shorter period


Design Reuse at Intel

• Long history of successful projects at Intel

• 2-year cadence

– Banias 130nm => Dothan 90nm (Centrino) – 2001

– Dothan 90nm => Yonah 65nm (Centrino) – 2003

– Prescott 90nm => CedarMill 65nm (Pentium IV) – 2003

– Merom 65nm => Penryn 45nm (Core 2 Duo) – 2005

– Nehalem 45nm => Westmere 32nm – 2007

– SandyBridge 32nm => IvyBridge 22nm – 2009


Design Reuse at Intel – Past

• Emphasis on layout migration
– Yielded nominal device speedup minus 3% to 6%
– Was okay for 35% speedup across process generations
– 65nm to 45nm was very tough
• In-house development of migration flows
• Core compaction technology purchased from vendor
– Migrate and reuse polygons by “classical” compaction
– Straightforward but not fully exploiting new process, one shot
• Saved a lot of mask designer resources
• Shorter TTM
– 3Q to 5Q design duration compared to 8Q to 9Q in re-design


Design Reuse – Present and Future

• Migrate entire design rather than layout
– Optimize design factors such as timing and power
• Circuit optimization
– Cell resizing
– Interconnect optimization (driver – interconnect – receiver)
• Layout optimization
– Migrate cell library, trading off scaling and cell performance
– Cell-based: Nehalem to Westmere, SandyBridge to IvyBridge
– Xscale (Intel) to 90nm TSMC (Marvell)
• Much less process dependent than polygon migration
– Flexible for further changes and migrations


Design Optimizations in Migration

• Power and Timing tradeoffs

– Resizing and re-spacing optimization techniques

– Interconnect specs Vs. post optimization

– How much improvement is expected?

• 5% speedup

• 5% dynamic power reduction

• Reliability and DFM

• Noise immunity


Typical Layout Migration Flow

[Figure: migration flow diagram. Box labels: 65nm / 45nm LO; 45nm DR netlist + sizing; PWR grid resizing; interconnect width and space; LO quality guidelines; OPC guidelines; compaction engine (polygon or cell-based), Intel’s proprietary SW; 45nm LO.]


Challenges in Hard IP Reuse

• Significant reduction of DE and MD effort
– Combination of CAD tools, design flows and managerial decisions
– Fast TTM
• More area efficiency
• State-of-the-art manufacturing process technologies
– Discrete design rules
– Very complex DFM rules
– Migration-based process design
• Combine process simulation with layout migration tools
– Yield and reliability enhancement
• Analog design re-use


Part II

Layout Migration in Work


Transistors and Metal Comparison

[Figure: 90nm vs. 65nm transistors and metal.]


Cell Comparison

[Figure: 130nm vs. 90nm cell layout.]


Block Comparison

[Figure: 130nm drawn block, 212u X 103u, vs. 90nm drawn block, 112u X 54u.]


Timing Delivery of Migration

• 65nm Vs. 90nm nominal speedup 26%

• Cycle time speedup set to 22% (Simulation comparison)

– Yonah cycle 360p Vs. Dothan cycle 460p

• LO Migration timing speedup measure:

% of delay speedup that 80% of paths meet

• Migration yielded 19% speedup in above measure

– This is less than desired

– Similar degradation observed in Dothan
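
The migration timing measure above can be read as a percentile: the largest speedup s such that at least 80% of the paths speed up by s or more. The sketch below assumes that speedup means relative delay reduction, which is the reading under which 460p to 360p gives about 22%. The per-path delays are invented for illustration and are not Dothan or Yonah data; the function names are hypothetical.

```python
import math

def speedup(old_delay, new_delay):
    """Relative delay reduction (assumed definition of speedup)."""
    return (old_delay - new_delay) / old_delay

def speedup_met_by_fraction(old, new, fraction=0.8):
    """Largest speedup s such that at least `fraction` of the paths reach s."""
    per_path = sorted((speedup(o, n) for o, n in zip(old, new)), reverse=True)
    needed = math.ceil(fraction * len(per_path))  # paths that must meet s
    return per_path[needed - 1]

# Hypothetical path delays in picoseconds, before and after migration.
old_delays = [460, 440, 400, 380, 350]
new_delays = [360, 350, 330, 300, 290]

cycle_speedup = speedup(460, 360)  # cycle times quoted on the slide
measure = speedup_met_by_fraction(old_delays, new_delays)
print(f"cycle speedup {cycle_speedup:.1%}, 80% of paths meet {measure:.1%}")
```

With these made-up delays, four of the five paths speed up by at least 17.5%, so the measure is 17.5% even though the best path reaches the full cycle-time speedup.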


Maintaining PWR Grid

• 1st migration scaling

– M2 0.32u to 0.32u

– M3 0.88u to 0.77u (max width violation)

– M4 0.84u to 0.42u (with fixing)

• Tight monitoring of scaling success

• 2nd migration

– slotted M3 to 0.33u + 0.11u space + 0.33u

– Introduced large vias


Late Changes in DR’s

[Figure: a gate that degrades in manufacturing; the original drawing is fixed by new design rules.]


Late Changes in DR’s (Cont’d)

[Figure: layer legend: n-diff, poly, contact, metal1, via, metal2. Before: large contact, small via. Process change: small contact, large via.]


Late Changes in DR’s (Cont’d)

[Figure: 65nm layout vs. 65nm layout after changing.]


Benefits of LO migration

• Low design effort, short schedule (see Dothan Vs. Tualatin)

• Stable design, no escapees

• Fast timing convergence

• Design can start early, best utilization of HR

• Flexibility for later changes in process

– Raw migration of 90nm to 65nm

– MD’s make first cleanup

– Later 65nm to 65nm migration


Part III

Layout Migration Algorithms and Techniques


Hierarchy in Layout


Non Uniform Hierarchical Migration


[Figure: two layout examples, each shown before migration (45 nm) and after (32 nm / 0.7).]


Relations Between Layout Objects

• Relations are captured in a graph
– Sometimes called constraint graph
– Graph describes technology rules and other design constraints
    • Proximity relations due to noise and delay considerations
    • Alignment of layout pieces, called pitch matching
• Adjacency relations never change
– No swapping of objects
– 1D compaction can still change orthogonal adjacency relations
• Design rules are captured in visibility graph
– Planar by definition
– Transitivity of design rules enables transitive reduction of visibility graph


• Visibility graph of cell layout
– Nodes are cells’ center lines
– Arcs represent cells visible to each other
– Arc weights represent target cell size and spacing between cells
• Visibility graph of polygonal layout
– Nodes are polygon edges
– Arcs represent polygon interior (material) and spacing between polygons visible to each other
– Arc weights represent width and space of polygons
• Visibility graph of symbolic layout
– Nodes are stick skeletons (e.g. wires, vias) or centerlines of encapsulated polygons (e.g. transistors)


Visibility Graph in Cell Layout


Visibility Graph in Polygonal Layout


Generation of Reduced Visibility Graph

• Alternative A:
– Find visibility graph
    • Left-to-right or bottom-to-top sweep-line algorithm
– Remove transitive edges from graph
    • Equivalent to matrix multiplication
• Alternative B:
– Take advantage of problem being interval graph
– Intervals approached by sweep-line are stored in ordered tree
– Transitive edges are removed during scanning whenever adjacency relation of two leaves is broken by insertion of new node
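
The second step of Alternative A can be sketched as a plain transitive reduction of a directed acyclic graph: an arc is dropped when its head is still reachable from its tail through some other path. This is a minimal illustration, not the sweep-line or matrix-based method the slide refers to. It ignores arc weights, so it is valid only under the transitivity of design rules mentioned earlier, and the function name and node labels are hypothetical.

```python
def transitive_reduction(nodes, arcs):
    """Drop arc (u, v) when v is reachable from u by a path that avoids that arc.

    Assumes the graph is acyclic, as a left-to-right visibility graph is.
    """
    succ = {n: set() for n in nodes}
    for u, v in arcs:
        succ[u].add(v)

    def reachable(start, target, skip_arc):
        # Depth-first search that is not allowed to use skip_arc itself.
        stack, seen = [start], set()
        while stack:
            n = stack.pop()
            for m in succ[n]:
                if (n, m) == skip_arc or m in seen:
                    continue
                if m == target:
                    return True
                seen.add(m)
                stack.append(m)
        return False

    return [(u, v) for u, v in arcs if not reachable(u, v, (u, v))]

# Four objects seen left to right; a "sees" c and d past its nearer neighbors.
example_arcs = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("a", "d")]
reduced_arcs = transitive_reduction(["a", "b", "c", "d"], example_arcs)
print(reduced_arcs)
```

Only the arcs between immediate neighbors survive, which is what keeps the reduced visibility graph small enough for compaction.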


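Cycles in Edge-based Compaction

[Figure: cell with vertical edges $x_0, x_1, \dots, x_5$, where $x_0$ and $x_5$ are the cell boundaries.]

Allow cell abutment: $x_1 - x_0 \ge S_{\min}/2$, i.e. $x_0 - x_1 \le -S_{\min}/2$

Cell size is constrained: $x_5 - x_0 = A$, i.e. $x_0 - x_5 \ge -A$ and $x_5 - x_0 \ge A$

Maintain minimum wire width: $x_2 - x_1 \ge W_{\min}$, i.e. $x_1 - x_2 \le -W_{\min}$

Maintain minimum wire spacing: $x_3 - x_2 \ge S_{\min}$, i.e. $x_2 - x_3 \le -S_{\min}$

[Figure: constraint graph on $x_0, \dots, x_5$ with arcs $x_0 \to x_1$ ($S_{\min}/2$), $x_1 \to x_2$ ($W_{\min}$), $x_2 \to x_3$ ($S_{\min}$), $x_3 \to x_4$ ($W_{\min}$), $x_4 \to x_5$ ($S_{\min}/2$), $x_0 \to x_5$ ($A$) and $x_5 \to x_0$ ($-A$).]

Feasible solution exists if there’s no positive cycle in the constraint graph.

Inequalities are translated into constraint graph.

Edge locations can be obtained by finding longest paths.

Inequalities impose a linear programming problem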
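
To see where a positive cycle comes from in this example, sum the arc weights around the cycle that follows the chain from x0 to x5 and returns through the −A arc. The sketch below assumes the arc set read off the figure labels (two Smin/2 arcs, two Wmin arcs, one Smin arc, and the A / −A pair); the numeric rule values are hypothetical.

```python
def cycle_weight(arc_weights):
    """Total weight of a cycle given the weights of its arcs."""
    return sum(arc_weights)

def cell_is_feasible(cell_size, w_min, s_min):
    """True when the cycle x0 -> x1 -> ... -> x5 -> x0 is not positive."""
    chain = [s_min / 2, w_min, s_min, w_min, s_min / 2]  # x0 -> ... -> x5
    return cycle_weight(chain + [-cell_size]) <= 0       # close via the -A arc

# Hypothetical rules: minimum width 3, minimum space 2 (arbitrary units).
# The chain needs 1 + 3 + 2 + 3 + 1 = 10 units of cell width.
fits = cell_is_feasible(cell_size=10, w_min=3, s_min=2)
too_small = cell_is_feasible(cell_size=9, w_min=3, s_min=2)
print(fits, too_small)
```

When the prescribed cell size A is smaller than the width the wires and spaces require, the cycle weight becomes positive and no placement of the edges satisfies all inequalities.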


Bellman-Ford Algorithm

Bool BellmanFord(G(V,E), w, s) {
    // initialization
    for (each v ∈ V) { dist[v] = −∞ ; parent[v] = NULL ; }
    // set source longest distance to zero
    dist[s] = 0 ;
    for (i = 1 ; i ≤ |V| − 1 ; i++) {
        // examine all arcs
        for (each e(x,y) ∈ E) {
            // relaxation check
            if (dist[y] < dist[x] + w(e(x,y))) {
                dist[y] = dist[x] + w(e(x,y)) ; parent[y] = x ;
            }
        }
    }
    // positive cycle check
    for (each e(u,v) ∈ E) {
        if (dist[v] < dist[u] + w(e(u,v))) { return FALSE }
    }
    return TRUE ;
}
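
A runnable version of the pseudocode may help when experimenting. The following is a minimal Python sketch of the longest-path variant, assuming arcs are given as (tail, head, weight) triples; the function name and the numeric rule values in the usage example are hypothetical. The example reuses the six-edge cell from the compaction slide with Wmin = 3, Smin = 2 and A = 10.

```python
import math

def bellman_ford_longest(vertices, arcs, source):
    """Longest-path Bellman-Ford.

    arcs: list of (x, y, weight). Returns (ok, dist, parent), where ok is
    False when a positive cycle is reachable from the source.
    """
    dist = {v: -math.inf for v in vertices}
    parent = {v: None for v in vertices}
    dist[source] = 0
    for _ in range(len(vertices) - 1):
        for x, y, w in arcs:                 # examine all arcs
            if dist[y] < dist[x] + w:        # relaxation check
                dist[y] = dist[x] + w
                parent[y] = x
    for u, v, w in arcs:                     # positive cycle check
        if dist[v] < dist[u] + w:
            return False, dist, parent
    return True, dist, parent

def cell_arcs(cell_size):
    """Constraint arcs of the example cell for a given size A (hypothetical values)."""
    return [("x0", "x1", 1), ("x1", "x2", 3), ("x2", "x3", 2),
            ("x3", "x4", 3), ("x4", "x5", 1),
            ("x0", "x5", cell_size), ("x5", "x0", -cell_size)]

edges = ["x0", "x1", "x2", "x3", "x4", "x5"]
ok, dist, parent = bellman_ford_longest(edges, cell_arcs(10), "x0")
ok_small, _, _ = bellman_ford_longest(edges, cell_arcs(9), "x0")
print(ok, dist, ok_small)
```

With A = 10 the longest paths place the edges at 0, 1, 4, 6, 9 and 10; with A = 9 the cycle through the −A arc is positive and the function reports infeasibility.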


Correctness of Bellman-Ford Algorithm

Lemma 1: Let $G(V,E)$ be a weighted directed graph with source $s$, containing no positive cycle reachable from $s$. Then, at termination of BellmanFord, $\mathit{dist}[v] = \mathit{longest\_path\_dist}(s, v)$ for every vertex $v$ reachable from $s$.

Lemma 2: Let $G(V,E)$ be a weighted directed graph with source $s$. Then for every $v \in V$, there is a path from $s$ to $v$ iff $\mathit{dist}[v] > -\infty$ at termination of BellmanFord.

Theorem: If $G$ contains no positive cycles reachable from $s$ then BellmanFord returns TRUE, $\mathit{dist}[v] = \mathit{longest\_path\_dist}(s, v)$ for every $v \in V$, and the arcs $e(\mathit{parent}[v], v)$ are the longest-path tree rooted at $s$. If $G$ does contain a positive-weight cycle reachable from $s$, then the algorithm returns FALSE.


Proof: We'll prove only the second part. Assume that a positive cycle $c = \langle v_0, v_1, \dots, v_k \rangle$ reachable from $s$ exists, where $v_0 = v_k$. Then $\sum_{i=1}^{k} w(e(v_{i-1}, v_i)) > 0$. Assume to the contrary that the algorithm returns TRUE. Then $\mathit{dist}[v_i] \ge \mathit{dist}[v_{i-1}] + w(e(v_{i-1}, v_i))$, $i = 1, 2, \dots, k$.

Summing the inequalities along the cycle $c$ yields

$$\sum_{i=1}^{k} \mathit{dist}[v_i] \ge \sum_{i=1}^{k} \mathit{dist}[v_{i-1}] + \sum_{i=1}^{k} w(e(v_{i-1}, v_i)).$$

But $\sum_{i=1}^{k} \mathit{dist}[v_i] = \sum_{i=1}^{k} \mathit{dist}[v_{i-1}]$ because $v_0 = v_k$, hence $\sum_{i=1}^{k} w(e(v_{i-1}, v_i)) \le 0$, contradicting the positive cycle hypothesis.

Algorithm runs in $O(|V||E|)$ time.


[Figure: Bellman-Ford example on a graph with five vertices z, u, v, x, y and ten weighted arcs. Panels: Initialization (dist[z] = 0, all other vertices −∞), i = 1, i = 2.]

Graph edges are labeled a, b, …, i, j according to their order in data structure.

Find longest path from z to all vertices. Report whether positive cycle exists.

Positive cycles do not exist.

Longest path spanning tree is obtained from parent nodes.

Difference Constraints and Longest Paths

[Figure: a system of eight difference constraints over $x_1, \dots, x_5$ written in matrix form, and its constraint graph on nodes V0 to V5, with V0 the added source node.]

A system of difference constraints $Ax \ge b$ can be represented by a constraint graph. A source node $v_0$ is added and the single-source longest-paths problem is then solved by the Bellman-Ford algorithm. Nodes are labeled by the longest path weight $\delta(v_0, v_i)$, if it exists.


Theorem: Given a system $Ax \ge b$ of difference constraints, let $G(V,E)$ be its corresponding constraint graph. $x_i = \delta(v_0, v_i)$, $i = 1, 2, \dots, |V|$, is a feasible solution iff $G(V,E)$ has no positive-weight cycles.

Proof: If $G(V,E)$ has no positive cycle then the longest path $\delta(v_0, v_i)$ to every node $v_i$ exists. Hence, $\delta(v_0, v_j) \ge \delta(v_0, v_i) + w(e(v_i, v_j))$. Setting $x_i = \delta(v_0, v_i)$ yields the feasible solution $x_j - x_i \ge w(e(v_i, v_j))$.

Conversely, let $c = \langle v_{i_1}, v_{i_2}, \dots, v_{i_k} \rangle$, $i_1 = i_k$, be a positive-weight cycle, corresponding to the difference constraints $x_{i_{j+1}} - x_{i_j} \ge w(e(v_{i_j}, v_{i_{j+1}}))$, $j = 1, 2, \dots, k-1$. Summation of both sides yields

$$0 = \sum_{j=1}^{k-1} \left( x_{i_{j+1}} - x_{i_j} \right) \ge \sum_{j=1}^{k-1} w(e(v_{i_j}, v_{i_{j+1}})) > 0,$$

a contradiction. Hence, a feasible solution doesn't exist.

k


The Bellman-Ford algorithm can therefore solve a system of difference constraints with n variables and m constraints in $O(n(n+m)) = O(n^2 + mn)$ time, since the constraint graph has n+1 nodes and n+m arcs.

Bellman-Ford can be modified for the specific case of difference constraints to yield O(mn) time complexity.
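
The construction can be tried on a small system. The sketch below assumes constraints of the form x[j] − x[i] ≥ w, adds a source node 0 with zero-weight arcs to every variable, and runs the longest-path relaxation; it returns None when a positive cycle is detected. The function name and the sample constraints are hypothetical.

```python
import math

def solve_difference_constraints(n, constraints):
    """Solve x[j] - x[i] >= w for variables x[1..n].

    constraints: list of (i, j, w) with 1-based variable indices.
    Returns a feasible list [x1, ..., xn] or None if none exists.
    """
    source = 0
    arcs = [(source, i, 0) for i in range(1, n + 1)] + list(constraints)
    dist = [-math.inf] * (n + 1)
    dist[source] = 0
    for _ in range(n):                       # |V| - 1 passes, |V| = n + 1
        changed = False
        for i, j, w in arcs:
            if dist[i] + w > dist[j]:
                dist[j] = dist[i] + w
                changed = True
        if not changed:                      # converged early
            break
    for i, j, w in arcs:                     # positive cycle check
        if dist[i] + w > dist[j]:
            return None
    return dist[1:]

# x2 - x1 >= 2, x3 - x2 >= 3, x3 - x1 <= 6 (written as x1 - x3 >= -6).
feasible = solve_difference_constraints(3, [(1, 2, 2), (2, 3, 3), (3, 1, -6)])
# Tightening the last constraint to x3 - x1 <= 4 creates a positive cycle.
infeasible = solve_difference_constraints(3, [(1, 2, 2), (2, 3, 3), (3, 1, -4)])
print(feasible, infeasible)
```

The longest-path solution is the smallest one that satisfies every constraint with all variables at or above zero, which is why it corresponds to a compacted layout.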


Overview of Cell-Based Hierarchical Interconnect Migration

Five-step graph contraction procedure

1. Flatten layout visibility graph

2. Define cell call order tree T

3. Merge cell instances within parents in bottom-up T order

4. Stop if positive cycle exists, continue otherwise

5. Eliminate merged cells within parents in bottom-up T order

Positioning of interconnects within their templates

– Top-down linear programming solution by T order

– Feasibility is guaranteed by graph contraction invariance


Assumptions

• All cell sizes are known and must be adhered to
– Defined by bottom-up cell-based placement stage.
– Outcome of descendant cell sizes and interconnect scaling, specs and estimates.
• Position of son cells within parents must be adhered to
– Same reasons as above
• Infeasibility incurred at routing migration is resolved by:
– Resizing of migrated cells
– Repositioning of son cells
– Relaxing interconnect rules and constraints
    • Left for later manual fixes by circuit and mask designers
• Interconnect migration re-invoked
– Cell migration / placement – interconnect migration iterations


Flatten Layout Visibility Graph

[Figure: hierarchical layout with cells a and b and polygons numbered 1 to 4, next to its flattened visibility graph; arcs carry the weights wa and wb.]


Cell Call-Order Tree

[Figure: cell call-order tree over cells a, b, c, d and e, with numeric labels 1 to 5.]


Merge Cell Instances

[Figure: two instances V' and V'' of one template, at offsets offset' and offset'', are merged into a single template V placed at the origin (0,0). Arcs w1 and w2 of V' become w1 − offset' and w2 + offset'; arcs w3 and w4 of V'' are shifted by the offset of V'' in the same way.]

• Main idea is to solve per-template problem rather than across all instances
• A merged template located at origin represents all its instances
• All similar polygons collapse into one polygon
• Template-internal arcs remain unchanged
• Lengths of arcs connecting internal nodes to external ones are transformed by their instance offset
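
The offset transformation in the last bullet can be sketched as follows. The sketch assumes an instance node sits at offset + t, where t is its coordinate inside the template, and that an arc (tail, head, w) encodes pos[head] − pos[tail] ≥ w. Under that assumed sign convention, arcs entering the instance lose the offset and arcs leaving it gain the offset, which matches the w1 − offset' and w2 + offset' labels. The function name and node names are hypothetical.

```python
def merge_instance_arcs(arcs, internal_nodes, offset):
    """Re-express arcs of one instance in template coordinates.

    arcs: list of (tail, head, w) meaning pos[head] - pos[tail] >= w.
    internal_nodes: nodes belonging to the instance being merged.
    """
    merged = []
    for tail, head, w in arcs:
        tail_in = tail in internal_nodes
        head_in = head in internal_nodes
        if tail_in and not head_in:
            w = w + offset      # internal -> external: x_ext - t >= w + offset
        elif head_in and not tail_in:
            w = w - offset      # external -> internal: t - x_ext >= w - offset
        # arcs with both ends inside (or both outside) stay unchanged
        merged.append((tail, head, w))
    return merged

# Polygon p of an instance placed at offset 100, with external neighbors L and R.
instance_arcs = [("L", "p", 5), ("p", "R", 7), ("p", "q", 3)]
merged_arcs = merge_instance_arcs(instance_arcs, {"p", "q"}, offset=100)
print(merged_arcs)
```

Because only the reference point of the internal nodes changes, any cycle through the instance picks up +offset once and −offset once, so cycle weights are unaffected.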


Merge Cell Instances

[Figure: two instances of a template, at offsets offset' and offset'', merged into a single template at the origin (0,0). Arc labels: w, w − offset', and w − offset' + offset'' = 0.]


Visibility Graph Equivalence

[Figure: flattened visibility graph of the two instances, with arcs wa and wb and similarity arcs between corresponding polygons.]

Similar polygons are connected by similarity arcs

• Lengths equal to the offsets difference

This guarantees that all instances adhere to a single template

• Cell-based intent is preserved

• Hierarchy intents of design are preserved

Longest paths and their lengths are invariant under the merge transformation


Reduction of Merged Graph

[Figure: merged graph with weighted arcs, containing three merged templates drawn in blue, red and green.]

Blue, red and green are different merged templates of merged graph

Blue ≤ red ≤ green in cell call-order tree

Reduction of merged graph is done as follows:

• Merged templates are worked bottom up in cell call-order tree

• Nodes are eliminated by replacing every pair of in and out arcs by a bypassing arc

• Parallel arcs are replaced by the longest one among them

Let’s eliminate the blue template


[Figure: the merged graph after elimination of the blue template, with updated arc weights.]

The elimination (reduction) of any number of merged templates doesn’t change the length of the longest path between any two remaining vertices.

Reduced graphs are created successively, one per template elimination.

Eventually only the top-level cell will remain, containing only top-level polygons with modified arc lengths.
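
A single node elimination step can be sketched as below, under the assumption that the graph is stored as a dictionary from (tail, head) pairs to arc lengths. Every in-arc and out-arc pair through the eliminated node is replaced by a bypass arc whose length is the sum of the two, and among parallel arcs only the longest is kept. The function name and the sample graph are hypothetical.

```python
def eliminate_node(arcs, node):
    """Remove `node`, bypassing it with arcs of summed length.

    arcs: dict {(u, v): w}. Raises ValueError if a positive cycle through
    `node` is found, since the system is then infeasible.
    """
    ins = [(u, w) for (u, v), w in arcs.items() if v == node and u != node]
    outs = [(v, w) for (u, v), w in arcs.items() if u == node and v != node]
    reduced = {e: w for e, w in arcs.items() if node not in e}
    for u, w_in in ins:
        for v, w_out in outs:
            if u == v:
                if w_in + w_out > 0:
                    raise ValueError("positive cycle through eliminated node")
                continue                    # non-positive two-arc cycle: drop it
            bypass = w_in + w_out
            # keep the longest among parallel arcs
            reduced[(u, v)] = max(reduced.get((u, v), float("-inf")), bypass)
    return reduced

graph = {("a", "b"): 3, ("b", "c"): 4, ("a", "c"): 5, ("c", "d"): 2}
reduced_graph = eliminate_node(graph, "b")
print(reduced_graph)
```

In this example the longest path from a to d is 9 both before elimination (3 + 4 + 2) and after it (7 + 2), illustrating the invariance stated above.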


Deriving Polygons Exact positions

• No positive cycles in M(V,E) ⇒ feasible solution exists
– Preserves similarity of same-template instances (cell-based)
– Preserves cell size and location within parent
– Preserves technology width/space design rules
• Series of reduced graphs is solved top-down
– Inequalities involve only one-template polygons
– Higher level polygons (cell-call order tree nodes) are progressively known
– Solution can be obtained by any of the flattened layout solvers
