Jan 2010 Hard IP Reuse 1
Hard IP Reuse – a Survey
Shmuel Wimer
Bar Ilan University, School of Engineering
Jan 2010 Hard IP Reuse 2
Outline
• Design and Market Considerations
– Hard and soft IP reuse
– Intel’s Tick / Tock
– Design consideration
• Layout Migration in Work
– Transistor, cell and block compaction
– Delay and circuit considerations
– Late process changes / updates
– Design For Manufacturability
• Layout Migration Algorithms and Techniques
– Hierarchy in layout
– Visibility, compaction and positive cycles
– Cell based migration
Jan 2010 Hard IP Reuse 3
Part I
Design and Market Considerations
Jan 2010 Hard IP Reuse 4
Hard Vs. Soft IP Reuse
• Hard IP reuse is transforming the polygons of old taped-out data into new process technology
– Net list doesn’t change
– Circuit changes are limited to resizing only
– Suitable for custom design
• Soft IP reuse is using the same RTL of old design with a new target library design in any technology
– Architecture, Verilog and RTL are not changing
– Net list is changing
– Layout is done from scratch
• Future is questionable as FPGA usage is spreading
Jan 2010 Hard IP Reuse 5
Advantages of Hard IP Reuse
• Fabless companies
– Lower design cost
– Better Mfg. Shopping, few sources
– Competition, TTM
• Intel’s drive
– System Integration, SoC
– Manufacturing cost, volumes
– Performance enhancement
– Competition, TTM
Jan 2010 Hard IP Reuse 6
Intel’s Tock / Tick Strategy
Jan 2010 Hard IP Reuse 7
Intel Architecture Roadmap
Jan 2010 Hard IP Reuse 8
Tock Vs Tick Design
• Tock
– New architecture, design implementation and layout
– Matured process technology
– New CAD tools and DA flows
– Large design team and long duration
• Tick
– Stable design, only few new features
– New process technology
– Maximizing soft and hard IP reuse
– Smaller design team and shorter period
Jan 2010 Hard IP Reuse 9
Design Reuse at Intel
• Long history of successful projects at Intel
• 2-year cadence
– Banias 130nm => Dothan 90nm (Centrino) – 2001
– Dothan 90nm => Yonah 65nm (Centrino) – 2003
– Prescott 90nm => CedarMill 65nm (Pentium IV) – 2003
– Merom 65nm => Penryn 45nm (Core 2 Duo) – 2005
– Nehalem 45nm => Westmere 32nm – 2007
– SandyBridge 32nm => IvyBridge 22nm – 2009
Jan 2010 Hard IP Reuse 10
Design Reuse at Intel – Past
• Emphasis on layout migration
– Yielded nominal device speedup minus 3% to 6%
– Was okay for 35% speedup across process generations
– 65nm to 45nm was very tough
• In-house development of migration flows
• Core compaction technology purchased from vendor
– Migrate and reuse polygons by “classical” compaction
– Straightforward but not fully exploiting new process, one shot
• Saved a lot of mask designer resources
• Shorter TTM
– 3Q to 5Q design duration compared to 8Q to 9Q in re-design
Jan 2010 Hard IP Reuse 11
Design Reuse – Present and Future
• Migrate entire design rather than layout
– Optimize design factors such as timing and power
• Circuit optimization
– Cell resizing
– Interconnect optimization (driver – interconnect – receiver)
• Layout optimization
– Migrate cell library, trading off scaling and cell performance
– Cell-based: Nehalem to Westmere, SandyBridge to IvyBridge
– Xscale (Intel) to 90nm TSMC (Marvell)
• Much less process dependent than polygon migration
– Flexible for further changes and migrations
Jan 2010 Hard IP Reuse 12
Design Optimizations in Migration
• Power and Timing tradeoffs
– Resizing and re-spacing optimization techniques
– Interconnect specs Vs. post optimization
– How much improvement is expected?
• 5% speedup
• 5% dynamic power reduction
• Reliability and DFM
• Noise immunity
Jan 2010 Hard IP Reuse 13
Typical Layout Migration Flow
[Flow diagram: compaction engine (polygon or cell-based, Intel’s proprietary SW) with boxes labeled: 65nm / 45nm LO; 45nm DR netlist + sizing; interconnect width and space; LO quality guidelines; OPC guidelines; PWR grid resizing; 45nm LO]
Jan 2010 Hard IP Reuse 14
Challenges in Hard IP Reuse
• Significant reduction of DE and MD effort
– Combination of CAD tools, design flows and managerial decisions
– Fast TTM
• More area efficiency
• State-of-the-art manufacturing process technologies
– Discrete design rules
– Very complex DFM rules
– Migration-based process design
• Combine process simulation with layout migration tools
– Yield and reliability enhancement
• Analog design re-use
Jan 2010 Hard IP Reuse 15
Part II
Layout Migration in Work
Jan 2010 Hard IP Reuse 16
Transistors and Metal Comparison
[Figure: 90nm layout vs. 65nm layout]
Jan 2010 Hard IP Reuse 17
Transistors and Metal Comparison
[Figure: 90nm layout vs. 65nm layout]
Jan 2010 Hard IP Reuse 18
Cell Comparison
[Figure: 130nm cell vs. 90nm cell]
Jan 2010 Hard IP Reuse 19
Block Comparison
[Figure: 130nm drawn block, 212u X 103u, vs. 90nm drawn block, 112u X 54u]
Jan 2010 Hard IP Reuse 20
Timing Delivery of Migration
• 65nm Vs. 90nm nominal speedup 26%
• Cycle time speedup set to 22% (Simulation comparison)
– Yonah cycle 360p Vs. Dothan cycle 460p
• LO Migration timing speedup measure:
% of delay speedup that 80% of paths meet
• Migration yielded 19% speedup in above measure
– This is less than desired
– Similar degradation observed in Dothan
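The speedup figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes that speedup means relative delay reduction, (old - new) / old, which reproduces the 22% quoted for 460p vs. 360p, and it reads the 80% measure as the largest speedup that at least 80% of the paths reach. The function names and the five-path sample are hypothetical, not taken from the migration flow.

```python
import math

def speedup(old_delay, new_delay):
    """Relative delay reduction, e.g. 460 -> 360 gives about 0.22."""
    return (old_delay - new_delay) / old_delay

def speedup_met_by(old_delays, new_delays, fraction=0.8):
    """Largest speedup that at least `fraction` of the paths reach or exceed."""
    per_path = sorted(
        (speedup(o, n) for o, n in zip(old_delays, new_delays)), reverse=True
    )
    # Number of paths that must meet the reported speedup (guard against
    # floating-point noise in fraction * len).
    needed = math.ceil(fraction * len(per_path) - 1e-9)
    return per_path[needed - 1]

# Hypothetical sample: five paths, all 460p before migration.
old = [460] * 5
new = [350, 360, 370, 380, 400]
print(round(speedup(460, 360), 2))   # cycle-time speedup of the slide
print(speedup_met_by(old, new))      # speedup met by 4 of the 5 paths
```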
Jan 2010 Hard IP Reuse 21
Maintaining PWR Grid
• 1st migration scaling
– M2 0.32u to 0.32u
– M3 0.88u to 0.77u (max width violation)
– M4 0.84u to 0.42u (with fixing)
• Tight monitoring of scaling success
• 2nd migration
– slotted M3 to 0.33u + 0.11u space + 0.33u
– Introduced large vias
Jan 2010 Hard IP Reuse 22
Late Changes in DR’s
[Figure: original drawing, where the gate degrades in manufacturing, and its fix by new design rules]
Jan 2010 Hard IP Reuse 23
Late Changes in DR’s (Cont’d)
[Figure: layers n-diff, poly, contact, metal1, via, metal2; large contact and small via before, small contact and large via after the process change]
Jan 2010 Hard IP Reuse 24
Late Changes in DR’s (Cont’d)
[Figure: 65nm layout vs. 65nm layout after changing]
Jan 2010 Hard IP Reuse 25
Benefits of LO migration
• Low design effort, short schedule (see Dothan Vs. Tualatin)
• Stable design, no escapees
• Fast timing convergence
• Design can start early, best utilization of HR
• Flexibility for later changes in process
– Raw migration of 90nm to 65nm
– MD’s make first cleanup
– Later 65nm to 65nm migration
Jan 2010 Hard IP Reuse 26
Part III
Layout Migration Algorithms and Techniques
Jan 2010 Hard IP Reuse 27
Hierarchy in Layout
Jan 2010 Hard IP Reuse 28
Non Uniform Hierarchical Migration
Jan 2010 Hard IP Reuse 29
before: 45 nm
after: 32 nm / 0.7
Jan 2010 Hard IP Reuse 30
before: 45 nm
Jan 2010 Hard IP Reuse 31
after: 32 nm / 0.7
Jan 2010 Hard IP Reuse 32
Relations Between Layout Objects
• Relations are captured in graph
– Sometimes called constraints graph
– Graph describes technology rules and other design constraints
• Proximity relations due to noise and delay considerations
• Alignment of layout pieces called pitch matching
• Adjacency relations never change
– No swapping of objects
– 1D Compaction can still change orthogonal adjacency relations
• Design rules are captured in visibility graph
– Planar by definition
– Transitivity of design rules enables transitive reduction of visibility graph
Jan 2010 Hard IP Reuse 33
• Visibility graph of cell layout
– Nodes are cells’ center lines
– Arcs represent cells visible to each other
– Arc weights represent target cell size and spacing between cells
• Visibility graph of polygonal layout
– Nodes are polygon edges
– Arcs represent polygon interior (material) and spacing between polygons visible to each other
– Arc weights represent width and space of polygons
• Visibility graph of symbolic layout
– Nodes are stick skeletons (e.g. wires, vias) or centerlines of encapsulated polygons (e.g. transistors)
Jan 2010 Hard IP Reuse 34
Visibility Graph in Cell Layout
Jan 2010 Hard IP Reuse 35
Visibility Graph in Polygonal Layout
Jan 2010 Hard IP Reuse 36
Generation of Reduced Visibility Graph
• Alternative A:
– Find visibility graph
• Left-to-right or bottom-to-top sweep-line algorithm
– Remove transitive edges from graph
• Equivalent to matrix multiplication
• Alternative B:
– Take advantage of problem being interval graph
– Intervals approached by sweep-line are stored in ordered tree
– Transitive edges are removed during scanning whenever adjacency relation of two leaves is broken by insertion of new node
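The "remove transitive edges" step of Alternative A can be sketched as a plain transitive reduction of a directed acyclic graph. This is a minimal illustration, not the sweep-line or ordered-tree method of the slide: it is purely topological, it assumes the visibility graph is acyclic, and it assumes the design rules are transitive so that an arc implied by a longer path is redundant. The function name and the three-node sample are hypothetical.

```python
def transitive_reduction(nodes, arcs):
    """Drop every arc (u, v) for which another path u -> ... -> v exists.

    `arcs` is a set of (u, v) pairs that form a DAG.
    """
    succ = {n: set() for n in nodes}
    for u, v in arcs:
        succ[u].add(v)

    def reachable_without_direct_arc(u, v):
        # Depth-first search from u towards v that skips the arc (u, v) itself.
        stack = [w for w in succ[u] if w != v]
        seen = set(stack)
        while stack:
            n = stack.pop()
            if n == v:
                return True
            for w in succ[n]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return False

    return {(u, v) for u, v in arcs if not reachable_without_direct_arc(u, v)}

# a sees b, b sees c, and a also "sees" c: the arc (a, c) is transitive.
reduced = transitive_reduction(["a", "b", "c"], {("a", "b"), ("b", "c"), ("a", "c")})
print(sorted(reduced))
```

This version runs one search per arc, which is simple but slower than the interval-tree approach of Alternative B.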
Jan 2010 Hard IP Reuse 37
Jan 2010 Hard IP Reuse 38
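Cycles in Edge-based Compaction
[Figure: cell with edge coordinates x0, x1, x2, x3, x4, x5]
Allow cell abutment:
$x_1 - x_0 \ge S_{\min}/2 \;\Leftrightarrow\; x_0 - x_1 \le -S_{\min}/2$
Cell size is constrained:
$x_5 - x_0 \ge A,\; x_0 - x_5 \ge -A \;\Leftrightarrow\; x_5 - x_0 = A$
Maintain minimum wire width:
$x_2 - x_1 \ge W_{\min} \;\Leftrightarrow\; x_1 - x_2 \le -W_{\min}$
Maintain minimum wire spacing:
$x_3 - x_2 \ge S_{\min} \;\Leftrightarrow\; x_2 - x_3 \le -S_{\min}$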
Jan 2010 Hard IP Reuse 39
[Figure: constraint graph on nodes x0, x1, x2, x3, x4, x5 with arcs x0→x1 (Smin/2), x1→x2 (Wmin), x2→x3 (Smin), x3→x4 (Wmin), x4→x5 (Smin/2), x0→x5 (A) and x5→x0 (−A)]
Inequalities are translated into a constraint graph.
A feasible solution exists if there is no positive cycle in the constraint graph.
Edge locations can be obtained by finding longest paths.
The inequalities pose a linear programming problem.
Jan 2010 Hard IP Reuse 40
Bellman-Ford Algorithm
Bool BellmanFord ( G(V, E, w), s ) {
    // initialization
    for ( each v ∈ V ) { dist[v] = −∞ ; parent[v] = NULL ; }
    // set source longest distance to zero
    dist[s] = 0 ;
    for ( i = 1 ; i ≤ |V| − 1 ; i++ ) {
        // examine all arcs
        for ( each e(x, y) ∈ E ) {
            // relaxation check
            if ( dist[y] < dist[x] + w(e(x, y)) ) {
                dist[y] = dist[x] + w(e(x, y)) ; parent[y] = x ;
            }
        }
    }
    // positive cycle check
    for ( each e(u, v) ∈ E ) {
        if ( dist[v] < dist[u] + w(e(u, v)) ) { return FALSE }
    }
    return TRUE ;
}
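A runnable version of the longest-path Bellman-Ford routine may make the relaxation and the positive-cycle check easier to follow. The sketch below applies it to the six-edge compaction example (x0 to x5); the numeric values chosen for Smin, Wmin and A are hypothetical, and the early exit when a pass changes nothing is an optional shortcut that the slide's pseudocode does not have.

```python
import math

def bellman_ford_longest(num_nodes, arcs, source):
    """Longest paths from `source`; arcs are (x, y, w) with pos[y] - pos[x] >= w.

    Returns (ok, dist, parent); ok is False when a positive cycle is
    reachable from the source, i.e. the constraints are infeasible.
    """
    dist = [-math.inf] * num_nodes
    parent = [None] * num_nodes
    dist[source] = 0
    for _ in range(num_nodes - 1):
        changed = False
        for x, y, w in arcs:
            if dist[x] + w > dist[y]:      # relaxation check
                dist[y] = dist[x] + w
                parent[y] = x
                changed = True
        if not changed:                     # optional early exit
            break
    for u, v, w in arcs:                    # positive cycle check
        if dist[u] + w > dist[v]:
            return False, dist, parent
    return True, dist, parent

def cell_arcs(s_min, w_min, a):
    """Constraint arcs of the two-wire cell example (assumed values)."""
    return [(0, 1, s_min // 2), (1, 2, w_min), (2, 3, s_min),
            (3, 4, w_min), (4, 5, s_min // 2), (0, 5, a), (5, 0, -a)]

ok, dist, _ = bellman_ford_longest(6, cell_arcs(2, 1, 10), source=0)
print(ok, dist)       # feasible: cell width 10 leaves slack
ok_small, _, _ = bellman_ford_longest(6, cell_arcs(2, 1, 4), source=0)
print(ok_small)       # cell width 4 is too small for the wires
```

With A = 4 the cycle x0→x1→…→x5→x0 has weight 6 − 4 = 2, which is positive, so the routine reports infeasibility.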
Jan 2010 Hard IP Reuse 41
Correctness of Bellman-Ford Algorithm
Lemma 1: Let $G(V,E)$ be a weighted directed graph with source $s$, containing no positive cycle reachable from $s$. Then, at termination of BellmanFord, $\mathit{dist}[v] = \mathit{longest\_path\_dist}(s,v)$ for every vertex $v$ reachable from $s$.
Lemma 2: Let $G(V,E)$ be a weighted directed graph with source $s$. Then for every $v \in V$, there is a path from $s$ to $v$ iff $\mathit{dist}[v] > -\infty$ at termination of BellmanFord.
Theorem: If $G$ contains no positive cycles reachable from $s$, then BellmanFord returns TRUE, $\mathit{dist}[v] = \mathit{longest\_path\_dist}(s,v)$ for every $v \in V$, and the arcs $e(\mathit{parent}[v], v)$ form the longest-path tree rooted at $s$. If $G$ does contain a positive-weight cycle reachable from $s$, then the algorithm returns FALSE.
Jan 2010 Hard IP Reuse 42
Proof: We'll prove only the second part.
Assume that a positive cycle $c = \langle v_0, v_1, \dots, v_k \rangle$ reachable from $s$ exists, where $v_0 = v_k$. Then $\sum_{i=1}^{k} w(e(v_{i-1}, v_i)) > 0$. Assume to the contrary that the algorithm returns TRUE. Then
$\mathit{dist}[v_i] \ge \mathit{dist}[v_{i-1}] + w(e(v_{i-1}, v_i)), \quad i = 1, 2, \dots, k.$
Summing the inequalities along the cycle $c$ yields
$\sum_{i=1}^{k} \mathit{dist}[v_i] \ge \sum_{i=1}^{k} \mathit{dist}[v_{i-1}] + \sum_{i=1}^{k} w(e(v_{i-1}, v_i)).$
But $\sum_{i=1}^{k} \mathit{dist}[v_i] = \sum_{i=1}^{k} \mathit{dist}[v_{i-1}]$, hence $\sum_{i=1}^{k} w(e(v_{i-1}, v_i)) \le 0$, contradicting the positive cycle hypothesis.
Algorithm runs in $O(|V||E|)$ time.
Jan 2010 Hard IP Reuse 43
Initialization
[Figure: example graph on vertices z, u, v, x, y with weighted arcs labeled a to j; dist[z] = 0 and all other distances are −∞]
Graph edges are labeled a, b, …, i, j according to their order in data structure.
Find longest path from z to all vertices. Report whether positive cycle exists.
Jan 2010 Hard IP Reuse 44
i = 1
[Figure: the example graph with distance labels after iteration i = 1]
Jan 2010 Hard IP Reuse 45
i = 2
[Figure: the example graph with distance labels after iteration i = 2]
Positive cycles do not exist.
Longest path spanning tree is obtained from parent nodes.
Jan 2010 Hard IP Reuse 46
Difference Constraints and Longest Paths
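[Figure: a system of eight difference constraints $Ax \ge b$ on $x_1, \dots, x_5$ in matrix form; the rows pair $(x_1, x_2)$, $(x_1, x_5)$, $(x_2, x_5)$, $(x_1, x_3)$, $(x_1, x_4)$, $(x_3, x_4)$, $(x_3, x_5)$ and $(x_4, x_5)$. Beside it, the constraint graph on nodes $v_1, \dots, v_5$ with arc weights 0, 1, −1, 5, 4, −1, −6, −5 and an added source $v_0$ joined to the other nodes by zero-weight arcs]
A system of difference constraints $Ax \ge b$ can be represented by a constraint graph.
A source node $v_0$ is added and the single-source longest-paths problem is then solved by Bellman-Ford algorithm. Nodes are labeled by the longest path weight $\delta(v_0, v_i)$, if it exists.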
Jan 2010 Hard IP Reuse 47
Theorem: Given a system of difference constraints $Ax \ge b$, let $G(V,E)$ be its corresponding constraint graph. $x_i = \delta(v_0, v_i)$, $i = 1, 2, \dots, |V|$, is a feasible solution iff $G(V,E)$ has no positive-weight cycles.
Proof: If $G(V,E)$ has no positive cycle then the longest path $\delta(v_0, v_i)$ to every node $v_i$ exists. Hence, $\delta(v_0, v_j) \ge \delta(v_0, v_i) + w(e(v_i, v_j))$. Setting $x_i = \delta(v_0, v_i)$ yields the feasible solution $x_j - x_i \ge w(e(v_i, v_j))$.
Conversely, let $c = \langle v_{i_1}, \dots, v_{i_k} \rangle$, $v_{i_1} = v_{i_k}$, be a positive-weight cycle, corresponding to the difference constraints $x_{i_{j+1}} - x_{i_j} \ge w(e(v_{i_j}, v_{i_{j+1}}))$, $j = 1, 2, \dots, k-1$. Summation of both sides yields
$0 = \sum_{j=1}^{k-1} (x_{i_{j+1}} - x_{i_j}) \ge \sum_{j=1}^{k-1} w(e(v_{i_j}, v_{i_{j+1}})) > 0,$
a contradiction. Hence, a feasible solution doesn't exist.
Jan 2010 Hard IP Reuse 48
Bellman-Ford algorithm can therefore solve a system of difference constraints with n variables and m constraints in $O(n(n+m)) = O(n^2 + mn)$ time, since the constraint graph with the added source has n+1 nodes and n+m arcs.
Bellman-Ford can be modified for the specific case of difference constraints to yield O(mn) time complexity.
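One common way to reach the O(mn) bound, sketched below, is to skip the explicit source node and start every label at zero, which has the same effect as relaxing the n zero-weight source arcs once. Whether this is the modification the slide has in mind is an assumption; the function name, the (j, i, w) encoding of x[j] − x[i] ≥ w and the sample constraints are hypothetical.

```python
def solve_difference_constraints(n, constraints):
    """Solve x[j] - x[i] >= w for each (j, i, w); variables are 0-based.

    Returns a feasible list x, or None when the constraint graph has a
    positive cycle. Runs at most n passes over the m constraints.
    """
    # All-zero start replaces a source v0 with zero-weight arcs to every node.
    x = [0] * n
    for _ in range(n):
        changed = False
        for j, i, w in constraints:
            if x[i] + w > x[j]:
                x[j] = x[i] + w
                changed = True
        if not changed:
            return x          # converged: every constraint holds
    return None               # still changing after n passes: positive cycle

feasible = [(1, 0, 3), (2, 1, 2), (2, 0, 4), (0, 2, -5)]
print(solve_difference_constraints(3, feasible))

# Tightening the last constraint closes a cycle of weight 3 + 2 - 4 = 1 > 0.
infeasible = [(1, 0, 3), (2, 1, 2), (2, 0, 4), (0, 2, -4)]
print(solve_difference_constraints(3, infeasible))
```

The returned values are the smallest non-negative solution, which corresponds to compacting every edge as far toward the origin as the constraints allow.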
Jan 2010 Hard IP Reuse 49
Overview of Cell-Based Hierarchical Interconnect Migration
Five-step graph contraction procedure
1. Flatten layout visibility graph
2. Define cell call order tree T
3. Merge cell instances within parents in bottom-up T order
4. Stop if positive cycle exists, continue otherwise
5. Eliminate merged cells within parents in bottom-up T order
Positioning of interconnects within their templates
– Top-down linear programming solution by T order
– Feasibility is guaranteed by graph contraction invariance
Jan 2010 Hard IP Reuse 50
Assumptions
• All cell sizes are known and must be adhered to
– Defined by bottom-up cell-based placement stage.
– Outcome of descendant cell sizes and interconnect scaling, specs and estimates.
• Position of son cells within parents must be adhered to
– Same reasons as above
• Infeasibility incurred at routing migration is resolved by:
– Resizing of migrated cells
– Repositioning of son cells
– Relaxing interconnect rules and constraints
• Left for later manual fixes by circuit and mask designers
• Interconnect migration re-invoked
– Cell migration / placement – interconnect migration iterations
Jan 2010 Hard IP Reuse 51
Flatten Layout Visibility Graph
[Figure: layout with instances of cells a and b (widths wa, wb) and its flattened visibility graph]
Jan 2010 Hard IP Reuse 52
Cell Call-Order Tree
[Figure: cell call-order tree over cells a, b, c, d and e, with nodes numbered 1 to 5]
Jan 2010 Hard IP Reuse 53
Merge Cell Instances
[Figure: instances V′ (arcs w1, w2) and V″ (arcs w3, w4) merged into template V at (0,0); the arcs become w1 − offset′, w2 + offset′, w3 − offset″ and w4 + offset″]
• Main idea is to solve per-template problem rather than across all instances
• A merged template located at origin represents all its instances
• All similar polygons collapse into one polygon
• Template-internal arcs remain unchanged
• Lengths of arcs connecting internal nodes to external ones are transformed by their instance offset
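The offset transformation can be sketched in one dimension. The code below assumes that a node of an instance sits at (instance offset + template coordinate), so a flat constraint pos[v] − pos[u] ≥ w turns into a template-level constraint of length w + offset(u) − offset(v); for an arc entering an instance this gives w − offset, for an arc leaving it w + offset, and template-internal arcs are unchanged. Keeping the longest of parallel arcs after the collapse is also an assumption. All names and numbers are hypothetical.

```python
def merge_instances(arcs, offset, template_node):
    """Collapse flat nodes onto their template nodes.

    arcs: list of (u, v, w) meaning pos[v] - pos[u] >= w in the flat layout.
    offset[n]: offset of the instance owning flat node n (0 for top level).
    template_node[n]: name of the merged node that represents n.
    Returns {(merged_u, merged_v): length}, keeping the longest parallel arc.
    """
    merged = {}
    for u, v, w in arcs:
        key = (template_node[u], template_node[v])
        new_w = w + offset[u] - offset[v]
        if key not in merged or new_w > merged[key]:
            merged[key] = new_w
    return merged

# Top-level nodes p and q; template node "a" instantiated at offsets 10 and 30.
offset = {"p": 0, "q": 0, "a1": 10, "a2": 30}
template_node = {"p": "p", "q": "q", "a1": "a", "a2": "a"}
flat = [("p", "a1", 12), ("p", "a2", 31), ("a1", "a2", 20), ("a2", "q", 5)]
print(merge_instances(flat, offset, template_node))
```

In this sample the arc between the two instances has length equal to the offsets difference, so it collapses to a zero-length arc on the merged node.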
Jan 2010 Hard IP Reuse 54
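Merge Cell Instances
[Figure: two instances with offsets offset′ and offset″ merged into one template at (0,0); an arc of length w becomes w − offset′, and the arc between similar polygons of the two instances becomes w − offset′ + offset″ = 0]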
Jan 2010 Hard IP Reuse 55
Visibility Graph Equivalence
[Figure: layout with instances of cells a and b (widths wa, wb) and similarity arcs between similar polygons]
Similar polygons are connected by similarity arcs
• Lengths equal to offsets difference
Guarantees that all instances will adhere to a single template
• Cell-based intent preserved
• Hierarchy intents of design are preserved
Longest paths and their lengths are invariant under merge transformation
Jan 2010 Hard IP Reuse 56
Reduction of Merged Graph
[Figure: merged graph with weighted arcs, its nodes colored blue, red and green by template]
Blue, red and green are different merged templates of merged graph
Blue ≤ red ≤ green in cell call-order tree
Reduction of merged graph is done as follows:
• Merged templates are worked bottom up in cell call-order tree
• Nodes are eliminated by replacing every pair of incoming and outgoing arcs by a bypassing arc whose length is the sum of the pair
• Parallel arcs are replaced by the longest one among them
Let’s eliminate the blue template
Jan 2010 Hard IP Reuse 57
[Figure: reduced graph after eliminating the blue template, with bypassing arcs and updated arc lengths]
The elimination (reduction) of any number of merged templates doesn’t change the length of the longest path between any two remaining vertices.
Reduced graphs are created successively, one per template elimination.
Eventually only the top-level cell will remain, containing only top-level polygons with modified arc lengths.
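The node elimination rule can be sketched as follows. The code removes one node, adds a bypassing arc for every incoming/outgoing pair with length equal to the sum of the two, and keeps the longest among parallel arcs, so longest-path lengths between the remaining nodes stay the same. The dictionary representation and the names are hypothetical; eliminating a whole template would repeat this for each of its nodes.

```python
def eliminate_node(arcs, node):
    """Remove `node` from a weighted arc dict {(u, v): length}.

    Each in-arc (u, node) and out-arc (node, v) pair is replaced by a
    bypassing arc (u, v) of summed length; parallel arcs keep the longest.
    A bypass with u == v is kept as a self-arc: a positive length there
    would reveal a positive cycle through the removed node.
    """
    ins = [(u, w) for (u, v), w in arcs.items() if v == node and u != node]
    outs = [(v, w) for (u, v), w in arcs.items() if u == node and v != node]
    reduced = {k: w for k, w in arcs.items() if node not in k}
    for u, w_in in ins:
        for v, w_out in outs:
            length = w_in + w_out
            if (u, v) not in reduced or length > reduced[(u, v)]:
                reduced[(u, v)] = length
    return reduced

# Longest path a -> c is 3 + 4 = 7 through b, beating the direct arc of 5.
graph = {("a", "b"): 3, ("b", "c"): 4, ("a", "c"): 5}
print(eliminate_node(graph, "b"))
```

After the elimination the only remaining arc carries the old longest-path length from a to c, which is the invariance the slide states.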
Jan 2010 Hard IP Reuse 58
Deriving Polygons’ Exact Positions
• No positive cycles in M(V,E) ⇒ Feasible solution exists
– Preserves similarity of same template instances (cell-based)
– Preserves cell size and location within parent
– Preserves technology width/space design rules
• Series of reduced graphs is solved top-down
– Inequalities involve only one-template polygons
– Higher level polygons (cell-call order tree nodes) are progressively known
– Solution can be obtained by any of flattened layout solvers