Post on 22-Apr-2015
VLSI Physical Design Automation: Introduction, Partitioning
Sushil Kundu
Roll No: 20084056
Registration No: 1954
Dept. of Applied Electronics & Instrumentation Engg., University Institute of Technology, Burdwan
Intended Audience
• VLSI CAD (also known as EDA, electronic design automation) students, in particular those working on chip implementation (physical design)
• Circuit designers who want to understand how the tools work behind the scenes
• Process engineers who want to tune a process to be more circuit/physical-design friendly
• Mathematics and computer science majors who want to find tough problems to solve
– Many VLSI physical design problems can be formulated as combinatorial optimization or mathematical programming problems.
– Most CAD problems are in fact NP-complete, so heuristics are used.
Objective of this Lecture
• To review the materials used in the fabrication of VLSI devices.
• To review the structure of devices and the processes involved in fabricating different types of VLSI circuits.
• To review the basic algorithm concepts.
• To understand the process of VLSI layout design.
• To study the basic algorithms used in the layout design of VLSI circuits.
• To learn about the physical design automation techniques used in the best-known academic and commercial layout systems.
Physical Design
• Converts a circuit description into a geometric description.
– This description is used for fabrication of the chip.
• Basic steps in the physical design cycle:
1. Partitioning
2. Floorplanning
3. Placement
4. Routing
5. Compaction
So, what is Partitioning?
[Figure: System → (system-level partitioning) → PCBs → (board-level partitioning) → Chips → (chip-level partitioning) → Subcircuits/Blocks]
Why partition?
• Ask Lord Curzon
– The most effective way to solve problems of high complexity: parallel CAD development
• System-level partitioning for multi-chip designs
– Inter-chip interconnection delay dominates system performance
• I/O pin limitation
• In deep-submicron designs, partitioning defines local and global interconnect, and has a significant impact on circuit performance
Importance of Circuit Partitioning
• Divide-and-conquer methodology
– The most effective way to solve problems of high complexity
– E.g.: min-cut based placement, partitioning-based test generation, …
• System-level partitioning for multi-chip designs
– Inter-chip interconnection delay dominates system performance
• Circuit emulation / parallel simulation
– Partition a large circuit into multiple FPGAs (e.g. Quickturn), or multiple special-purpose processors (e.g. Zycad)
• Parallel CAD development
– Task decomposition and load balancing
• In deep-submicron designs, partitioning defines local and global interconnect, and has a significant impact on circuit performance
…… ……
Objectives
• Since each partition can correspond to a chip, interesting objectives are:
– Minimum number of partitions
  • Subject to a maximum size (area) of each partition
– Minimum number of interconnections between partitions
  • They correspond to off-chip wiring, which has more delay and less reliability
  • Lower pin count on ICs (a larger number of I/O pins means much higher packaging cost)
– Balanced partitioning, given a bound on the area of each partition
Partitioning
Partitioning is the task of dividing a circuit into smaller parts. The objective is to partition the circuit into parts so that the size of each component is within prescribed ranges and the number of connections between the components is minimized. Different ways to partition correspond to different circuit implementations. Therefore, a good partitioning can significantly improve circuit performance and reduce layout costs.
• Decomposition of a complex system into smaller subsystems
– Done hierarchically
– Partitioning is done until each subsystem has a manageable size
– Each subsystem can be designed independently
• Interconnections between partitions are minimized
– Less hassle interfacing the subsystems
– Communication between subsystems is usually costly
Partitioning of a Circuit
[Figure: a circuit of input size 48 divided by two cuts (Cut 1 = 4, Cut 2 = 4) into three parts of size 15, 16, and 17]
Hierarchical Partitioning
• Levels of partitioning:
– System-level partitioning: each sub-system can be designed as a single PCB
– Board-level partitioning: the circuit assigned to a PCB is partitioned into sub-circuits, each fabricated as a VLSI chip
– Chip-level partitioning: the circuit assigned to the chip is divided into manageable sub-circuits
  NOTE: physically not necessary
Delay at Different Levels of Partitions
[Figure: blocks A, B, C, and D placed across PCB1 and PCB2, with interconnect delays labeled x, 10x, and 20x]
Partitioning: Formal Definition
• Input:
– Graph or hypergraph
– Usually with vertex weights
– Usually with weighted edges
• Constraints:
– Number of partitions (K-way partitioning)
– Maximum capacity of each partition, OR maximum allowable difference between partitions
• Objective:
– Assign nodes to partitions, subject to the constraints, such that the cut size is minimized
• Tractability: the problem is NP-complete
Circuit Representation
• Netlist:
– Gates: A, B, C, D
– Nets: {A,B,C}, {B,D}, {C,D}
• Hypergraph:
– Vertices: A, B, C, D
– Hyperedges: {A,B,C}, {B,D}, {C,D}
– Vertex label: gate size/area
– Hyperedge label: importance of net (weight)
[Figure: the netlist on gates A, B, C, D and the corresponding hypergraph]
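The netlist-to-hypergraph mapping above can be sketched in a few lines of Python. This is only an illustrative sketch: the unit gate areas, the unit net weights, and the names `gate_area`, `nets`, and `cut_size` are assumptions, not part of the slides.

```python
# Hypothetical in-memory form of the slide's netlist.
# Unit areas and unit net weights are assumed for illustration.
gate_area = {"A": 1, "B": 1, "C": 1, "D": 1}   # vertex labels
nets = [
    ({"A", "B", "C"}, 1),   # (hyperedge, weight)
    ({"B", "D"}, 1),
    ({"C", "D"}, 1),
]

def cut_size(nets, block_of):
    """Sum the weights of nets whose gates lie in more than one block."""
    total = 0
    for gates, weight in nets:
        blocks = {block_of[g] for g in gates}
        if len(blocks) > 1:      # the net crosses a partition boundary
            total += weight
    return total

# Example assignment: A and B in block 0, C and D in block 1.
block_of = {"A": 0, "B": 0, "C": 1, "D": 1}
print(cut_size(nets, block_of))
```

A net counts once toward the cut no matter how many blocks it touches, which is the usual hyperedge-cut convention.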
Circuit Partitioning: Formulation
Bi-partitioning formulation: minimize interconnections between partitions X and X'
• Minimum cut: $\min c(X, X')$
• Minimum bisection: $\min c(X, X')$ with $|X| = |X'|$
• Minimum ratio-cut: $\min \dfrac{c(X, X')}{|X|\,|X'|}$
[Figure: partitions X and X' separated by a cut of cost c(X, X')]
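As a rough illustration of the three objectives, the sketch below evaluates the cut cost and the ratio-cut value for one candidate partition. The graph (two triangles joined by a single light edge) and the helper names `cut_cost`, `ratio_cut`, and `is_bisection` are hypothetical and chosen only for this example.

```python
def cut_cost(edges, X):
    """c(X, X'): total weight of edges with exactly one endpoint in X."""
    return sum(w for (u, v), w in edges.items() if (u in X) != (v in X))

def ratio_cut(edges, vertices, X):
    """c(X, X') / (|X| * |X'|), where X' is the complement of X."""
    complement = vertices - X
    return cut_cost(edges, X) / (len(X) * len(complement))

def is_bisection(vertices, X):
    """True when X and its complement have the same number of vertices."""
    return 2 * len(X) == len(vertices)

# Hypothetical graph: two heavy triangles joined by one light edge.
vertices = {"a", "b", "c", "d", "e", "f"}
edges = {
    ("a", "b"): 3, ("b", "c"): 3, ("a", "c"): 3,
    ("d", "e"): 3, ("e", "f"): 3, ("d", "f"): 3,
    ("c", "d"): 1,
}

X = {"a", "b", "c"}
print(cut_cost(edges, X), ratio_cut(edges, vertices, X), is_bisection(vertices, X))
```

The ratio-cut denominator penalizes lopsided partitions without forcing an exact bisection.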
A Bi-Partitioning Example
• Min-cut size = 13
• Min-bisection size = 300
• Min-ratio-cut size = 19
[Figure: weighted graph on nodes a, b, c, d, e, f with edge weights 4, 9, 10, and 100; the min-cut, min-bisection, and min-ratio-cut lines are marked]
Ratio-cut helps to identify natural clusters
Iterative Partitioning Algorithms
• Greedy iterative improvement method (deterministic)
– [Kernighan-Lin 1970]
• Simulated Annealing (Non-Deterministic)
Restricted Partition Problem
• Restrictions:
– For bisectioning of a circuit
– Assume all gates are of the same size
– Works only for 2-terminal nets
• If all nets are 2-terminal, the hypergraph reduces to a graph
[Figure: hypergraph representation and graph representation of the same circuit on nodes a, b, c, d]
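When every net has exactly two terminals, each net can be stored directly as a graph edge. The following sketch assumes that parallel nets between the same two gates are merged by adding their weights; the function name `nets_to_graph` and that merging rule are illustrative choices, not taken from the slides.

```python
from collections import defaultdict

def nets_to_graph(nets):
    """Turn 2-terminal nets into weighted graph edges.

    Parallel nets between the same pair of gates are merged by
    summing (an assumed convention); each net contributes weight 1.
    """
    edges = defaultdict(int)
    for net in nets:
        if len(net) != 2:
            raise ValueError(f"not a 2-terminal net: {net!r}")
        u, v = sorted(net)          # canonical order for the edge key
        edges[(u, v)] += 1
    return dict(edges)

print(nets_to_graph([("a", "b"), ("b", "a"), ("c", "d")]))
```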
Problem Formulation
• Input: a graph with
– Set of vertices V (|V| = 2n)
– Set of edges E (|E| = m)
– Cost $c_{AB}$ for each edge {A, B} in E
• Output: two partitions X and Y such that
– The total cost of the cut edges is minimized
– Each partition has n vertices
• This problem is NP-complete!
A Trivial Approach
• Try all possible bisections and find the best one
• If there are 2n vertices, # of possibilities = $\dfrac{(2n)!}{2\,(n!)^2} = n^{O(n)}$ (the factor 2 is there because swapping X and Y gives the same bisection)
• For 4 vertices (a, b, c, d), there are 3 possibilities:
1. X={a,b} & Y={c,d}
2. X={a,c} & Y={b,d}
3. X={a,d} & Y={b,c}
• For 100 vertices, there are about $5 \times 10^{28}$ possibilities
• That needs about $1.59 \times 10^{13}$ years if one can try 100M possibilities per second
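The exhaustive approach can be sketched as follows. Counting each unordered bisection once gives C(2n, n) / 2, which matches the 3 possibilities listed for 4 vertices. The function names and the small test graph are hypothetical, and `math.comb` needs Python 3.8 or later.

```python
from itertools import combinations
from math import comb

def count_bisections(num_vertices):
    """Number of distinct bisections of an even-sized vertex set: C(2n, n) / 2."""
    return comb(num_vertices, num_vertices // 2) // 2

def best_bisection(vertices, edges):
    """Try every bisection and return (cost, X) for a cheapest one."""
    vertices = sorted(vertices)
    first, rest = vertices[0], vertices[1:]
    half = len(vertices) // 2
    best = None
    # Pin the first vertex in X so that each bisection is visited only once.
    for others in combinations(rest, half - 1):
        X = {first, *others}
        cost = sum(w for (u, v), w in edges.items() if (u in X) != (v in X))
        if best is None or cost < best[0]:
            best = (cost, X)
    return best

# Hypothetical 4-vertex graph: heavy edges a-b and c-d, light edges a-c and b-d.
edges = {("a", "b"): 5, ("c", "d"): 5, ("a", "c"): 1, ("b", "d"): 1}
print(count_bisections(4))
print(best_bisection("abcd", edges))
```

This is only practical for very small graphs, which is exactly the point the slide makes.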
Definitions
• Definition 1: Consider any node a in block X. The contribution of node a to the cutset is called the external cost of a and is denoted $E_a$, where $E_a = \sum_{v \in Y} c_{av}$
• Definition 2: The internal cost $I_a$ of node a in X is defined as $I_a = \sum_{v \in X} c_{av}$
Example
• External cost (connection) $E_a = 2$
• Internal cost $I_a = 1$
[Figure: nodes a, b, c, d split between blocks X and Y]
Idea of KL Algorithm
• $D_a$ = decrease in cut value if node a is moved = $E_a - I_a$
– Moving node a from block X to block Y would decrease the value of the cutset by $E_a$ and increase it by $I_a$
[Figure: nodes a, b, c, d in blocks X and Y, before and after the move]
$D_a = 2 - 1 = 1$, $D_b = 1 - 1 = 0$
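The external cost, internal cost, and D value of a node can be computed directly from the definitions. The sketch below uses a hypothetical 4-node graph, chosen so that it reproduces the numbers on the slide (E_a = 2, I_a = 1, D_a = 1, D_b = 0); the actual edges of the slide's figure are not recoverable, so this graph is an assumption.

```python
def costs(edges, X, Y, a):
    """Return (E_a, I_a, D_a) for node a under the partition (X, Y)."""
    other = Y if a in X else X       # the block that does not contain a
    external = internal = 0
    for (u, v), w in edges.items():
        if a not in (u, v):
            continue                 # edge does not touch a
        neighbor = v if u == a else u
        if neighbor in other:
            external += w            # edge crosses the cut
        else:
            internal += w            # edge stays inside a's block
    return external, internal, external - internal

# Assumed graph: a-b inside X; a-c, a-d, and b-c cross the cut.
X, Y = {"a", "b"}, {"c", "d"}
edges = {("a", "b"): 1, ("a", "c"): 1, ("a", "d"): 1, ("b", "c"): 1}

print(costs(edges, X, Y, "a"))
print(costs(edges, X, Y, "b"))
```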
Idea of KL Algorithm
• Note that we want to keep the two partitions balanced
• If we switch A and B, gain(A,B) = $D_A + D_B - 2c_{AB}$
– $c_{AB}$: cost of edge AB
[Figure: nodes A, B, C, D in blocks X and Y, before and after switching A and B]
gain(A,B) = 1 + 0 - 2 = -1
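The term $-2c_{AB}$ in the gain can be justified with a short derivation. This is the standard argument, written here as a sketch; the symbol $z$ (total cost of cut edges that touch neither A nor B) is introduced only for this derivation, with A in X and B in Y.

```latex
\begin{align*}
T_{\text{before}} &= z + E_A + E_B - c_{AB}
  && \text{(edge $AB$ is counted in both $E_A$ and $E_B$)} \\
T_{\text{after}}  &= z + I_A + I_B + c_{AB}
  && \text{(old internal edges now cross the cut; $AB$ still does)} \\
\mathrm{gain}(A,B) &= T_{\text{before}} - T_{\text{after}} \\
  &= (E_A - I_A) + (E_B - I_B) - 2c_{AB} \\
  &= D_A + D_B - 2c_{AB}
\end{align*}
```

With $D_A = 1$, $D_B = 0$, and $c_{AB} = 1$, this gives the slide's value of $-1$.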
Idea of KL Algorithm
• Start with any initial legal partitions X and Y
• A pass (exchanging each vertex exactly once) is described below:
1. For i := 1 to n do
   From the unlocked (unexchanged) vertices, choose a pair (A,B) such that gain(A,B) is largest.
   Exchange A and B. Lock A and B.
   Let $g_i$ = gain(A,B).
2. Find the k such that $G = g_1 + \dots + g_k$ is maximized.
3. Switch the first k pairs.
• Repeat the pass until there is no improvement (G = 0)
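The pass described above can be sketched in Python as follows. This is a minimal, unoptimized sketch (roughly cubic time per pass, recomputing all D values after each tentative exchange), not a tuned implementation. The function names and the 6-node test graph (two triangles joined by one edge) are hypothetical.

```python
def cut_cost(edges, X):
    """Total weight of edges with exactly one endpoint in X."""
    return sum(w for (u, v), w in edges.items() if (u in X) != (v in X))

def weight(edges, u, v):
    """Edge cost c_uv, or 0 if u and v are not connected."""
    return edges.get((u, v), edges.get((v, u), 0))

def kl_pass(edges, X, Y):
    """Run one KL pass. Returns (new_X, new_Y, G)."""
    X, Y = set(X), set(Y)
    work_X, work_Y = set(X), set(Y)      # tentative partition during the pass
    locked, swaps, gains = set(), [], []
    for _ in range(len(X)):
        # Recompute D = E - I for every node under the tentative partition.
        D = {}
        for v in work_X | work_Y:
            own, other = (work_X, work_Y) if v in work_X else (work_Y, work_X)
            ext = sum(weight(edges, v, u) for u in other)
            inn = sum(weight(edges, v, u) for u in own if u != v)
            D[v] = ext - inn
        # Step 1: pick the unlocked pair with the largest gain.
        best = None
        for a in sorted(work_X - locked):
            for b in sorted(work_Y - locked):
                g = D[a] + D[b] - 2 * weight(edges, a, b)
                if best is None or g > best[0]:
                    best = (g, a, b)
        g, a, b = best
        work_X.remove(a); work_X.add(b)  # tentative exchange
        work_Y.remove(b); work_Y.add(a)
        locked |= {a, b}
        swaps.append((a, b)); gains.append(g)
    # Step 2: find k maximizing the prefix sum G = g_1 + ... + g_k.
    best_k, best_G, running = 0, 0, 0
    for k, g in enumerate(gains, start=1):
        running += g
        if running > best_G:
            best_G, best_k = running, k
    # Step 3: make only the first k exchanges permanent.
    for a, b in swaps[:best_k]:
        X.remove(a); X.add(b)
        Y.remove(b); Y.add(a)
    return X, Y, best_G

def kernighan_lin(edges, X, Y):
    """Repeat passes until a pass yields no improvement (G = 0)."""
    while True:
        X, Y, G = kl_pass(edges, X, Y)
        if G <= 0:
            return X, Y

# Hypothetical graph: triangles {1,2,3} and {4,5,6} joined by edge 3-4.
edges = {(1, 2): 1, (2, 3): 1, (1, 3): 1,
         (4, 5): 1, (5, 6): 1, (4, 6): 1, (3, 4): 1}
X, Y = kernighan_lin(edges, {1, 2, 4}, {3, 5, 6})
print(sorted(X), sorted(Y), cut_cost(edges, X))
```

Exchanges with negative gain are still performed tentatively inside a pass; only the best prefix is kept, which is what lets the method climb out of some local minima.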
Example
[Figure: six nodes, 1 to 6, in blocks X and Y. Original cut value = 9; in the second drawing nodes 1 and 4 appear exchanged, giving the optimal cut value = 5]
A good step-by-step example is in the SY book
Time Complexity of KL
• For each pass,
– $O(n^2)$ time to find the best pair to exchange.
– n pairs exchanged.
– Total time is $O(n^3)$ per pass.
• A better implementation can get $O(n^2 \log n)$ time per pass.
• Number of passes is usually small.
Recap of Kernighan-Lin’s Algorithm
• Pair-wise exchange of nodes to reduce cut size
• Allow cut size to increase temporarily within a pass
    Compute the gain of a swap
    Repeat
        Perform a feasible swap of max gain
        Mark swapped nodes “locked”;
        Update swap gains;
    Until no feasible swap;
    Find max prefix partial sum in gain sequence $g_1, g_2, \dots, g_m$
    Make corresponding swaps permanent.
• Start another pass if the current pass reduces the cut size (usually converges after a few passes)
[Figure: nodes u and v exchanged across the cut and then marked locked]
Other Partitioning Methods
• KL and FM (Fiduccia-Mattheyses) have each held up very well
• Min-cut / max-flow algorithms
– Ford-Fulkerson, for unconstrained partitions
• Ratio cut
• Genetic algorithm
• Simulated annealing
THANK YOU