A. B. Kahng, Physical Synthesis 2.0, IWLS-2015 Keynote
Physical Synthesis 2.0
Andrew B. Kahng
UCSD CSE and ECE Departments
http://vlsicad.ucsd.edu

ECE 260B – CSE 241A Intro and ASIC Flow, Andrew B. Kahng, UCSD
Concept: “Design Principles”
• Partition the problem: divide and conquer, hierarchy
  • Different abstraction levels: RT-level, gate-level, switch-level, transistor-level
• Orthogonalize concerns
  • Function vs. implementation
  • Logic vs. timing vs. embedding
• Solve chicken-egg conundrums
• Constrain the design space to simplify the design process
  • Balance between design complexity and performance
  • E.g., standard-cell methodology: “freedom from choice”
[UCSD ECE 260B / CSE 241A]

Concept: How the IC Design Flow is Evolving
• Flow expands in two directions
  • System-Level Design
  • Design for Manufacturability (DFM)
• More design care-abouts
  • Area, Timing, Power, Signal Integrity, Reliability, Cost
• Key challenges: loops, chicken-egg
  • “Design closure” through tight integrations
  • RTL, GDSII “signoffs” = business structure of semiconductor creation
• “One-pass flow”: required for Productivity, requires Predictability
  • By Guardbands? By “Unifications”? By Statistics? By Methodology (to avoid issues)?
[Figure: design flow diagram. Stages: Architecture Design, High Level Synthesis, Logic Synthesis, FP / Place / CTS / Opt, Routing, Extraction / Timing / Physical Verification, Manufacturing. Other labels: Verification, RTL, Gate Netlist, Updated Gate Netlist, GDSII]
[UCSD ECE 260B / CSE 241A]

Outline
• Why Physical Synthesis
• Physical Synthesis 1.0
• Example Challenges / Stressors
  • FinFET
  • Noise and Chaos
  • Clock Skew
  • Complexity and Hyperlocality
  • Better (and, more complex) Signoff
  • New Mixed-Height Sweet Spot?
• Physical Synthesis 2.0 ?

Logic Design Needs Spatial Information
• High aspect ratio floorplan: shift one macro block from left to right, and vary its shape (with constant area)
• 10% power range (post-route): center location, taller blockage = more power, more contribution of wire (delays)
• Separation of logical, temporal, spatial must crumble
[Figure: power (mW, axis 190 to 230) vs. shift of the blockage location (0% to 100%), for macro sizes 260µm x 65µm and 184µm x 92µm]

How Do We Predict Spatial Information?
• Predict by modeling
  • Machine learning, regression, etc.
  • (Don’t dismiss this!)
  • [SLIP15] http://vlsicad.ucsd.edu/Publications/Conferences/325/c325.pdf
  • [DAC00] http://vlsicad.ucsd.edu/Publications/Conferences/112/c112.pdf
  • [DATE13] http://vlsicad.ucsd.edu/Publications/Conferences/296/c296.pdf
  • [SLIP13] http://vlsicad.ucsd.edu/Publications/Conferences/300/c300.pdf
• Predict by assuming and enforcing
  • Make a prediction, then make the prediction come true
  • (Constant-delay methodology)
• Predict by doing
  • Constructive prediction
  • (Run under the hood – quick and dirty, else no leverage)
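
As a rough illustration of "predict by modeling", the sketch below fits a one-variable least-squares line and uses it to predict an unseen point. The feature (net fanout), the outcome (post-route wirelength) and every number are hypothetical placeholders chosen for illustration; they are not data from the cited papers, which use richer models.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical training data: net fanout -> post-route wirelength (um).
fanout = [1, 2, 3, 4, 5]
wirelength = [12.0, 19.0, 26.0, 33.0, 40.0]

a, b = fit_line(fanout, wirelength)
predicted = a + b * 8  # model-based estimate for an unseen fanout-8 net
```

The same pattern extends to any pre-layout feature that is cheap to compute; the value of the model depends on how well the training flow matches the flow being predicted.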

Outline
• Why Physical Synthesis
• Physical Synthesis 1.0
• Example Challenges
  • FinFET
  • Noise and Chaos
  • Clock Skew
  • Complexity and Hyperlocality
  • Better (and, more complex) Signoff
  • New Mixed-Height Sweet Spot?
• Physical Synthesis 2.0 ?

Synthesis vs. Physical Synthesis
• Synthesis (DC, RC)
  • Elaboration, mapping to generic gates
  • Clock gating
  • Apply timing constraints, remap / optimize
  • Multibit FF optimization
  • MBIST insertion
  • Scan chain stitching
  • Further optimization, area recovery
• Physical Synthesis (DCT/DCG, RCP)
  • LEF list
  • Tech file, map file
  • tluplus_{max,min}
  • floorplan DEF
  • {min,max}_routing_layer

Physical Synthesis
• In
  • RTL + SDC + Library models + Floorplan DEF
• Out
  • Better netlist (usually), at one (worst) corner
  • Better netlist (usually) + placed DEF (not legalized)
  • N.B.: very fast TAT required by customers
• Netlist (+ placed DEF) is passed to P&R + signoff
  • Place, placeOpt, CTS, CTSOpt, route, routeOpt, leakage recovery, timing closure
  • Different companies and tools in a long tool chain

Example
[Figure: flow diagram. Inputs: floorplan specified by designers (floorplan in DEF or physical guidance); libraries, LEF, tech files; RC tech file (tluplus, captable); floorplan information. Physical synthesis (e.g., DCT) uses this physical information and produces a netlist + initial placement, which feeds the P&R flow and yields routed results.]

Note: “P&R + Signoff” is Complicated!
• N. MacDonald, Broadcom Corp., “Timing Closure in Deep Submicron Designs”, 2010 DAC Knowledge Center article
[Figure: timing closure loop, about 5 iterations. Boxes: top-level netlist / SPEF; block-level netlist / SPEF; static timing analysis for all modes / corners; breakdown of timing violations on per-block basis; manual repair of timing failures; timing closed]
Violation classes addressed for each iteration (in order of priority):
(1) Electrical Rule Violations
(2) Noise Violations
(3) Setup Violations
(4) Hold Violations
Operations permitted at each iteration (in order of preference):
(1) Vt Swap, Resizing, Buffer Insertion, NDR Changes, Useful Skew
(2) Vt Swap, Resizing, Buffer Insertion, NDR Changes
(3) Vt Swap, Resizing, Buffer Insertion
(4) Vt Swap, Resizing
(5) Vt Swap
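
The shrinking set of permitted repair operations can be written down compactly. The sketch below is illustrative only: it assumes that the numbered entries (1) to (5) correspond to closure iterations 1 to 5, and the function name is hypothetical rather than part of any tool.

```python
# Repair operations in the order listed for the first iteration.
OPERATIONS = ["Vt Swap", "Resizing", "Buffer Insertion", "NDR Changes", "Useful Skew"]


def permitted_operations(iteration):
    """Return the operations allowed at a given closure iteration (1-based).

    Each later iteration drops the last remaining operation, so the final
    iteration allows only Vt swaps.
    """
    if not 1 <= iteration <= len(OPERATIONS):
        raise ValueError("iteration must be between 1 and 5")
    return OPERATIONS[: len(OPERATIONS) - iteration + 1]


for i in range(1, 6):
    print(i, ", ".join(permitted_operations(i)))
```

Read this way, the most disruptive changes (useful skew, NDR changes) are only allowed early, and later iterations are restricted to progressively less invasive edits.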

Since That Article Was Written:
[Figure: timeline of technology nodes (90nm, 65nm, 45/40nm, 28nm, 20nm, 16/14nm, 10nm, ≤7nm) annotated with signoff concerns that have been added: BTI, temp inversion, noise, MCMM, maxtrans, EM, AOCV / POCV, PBA, fixed-margin spec, multi-patterning, cell-POCV, MOL / BEOL R, dynamic IR, fill effects, layout rules, BEOL / MOL variations, signoff criteria with AVS, SOC complexity, LVF, MIS, phys-aware timing ECO, min implant]
[DAC15]

How Can Physical Synthesis Possibly Work?
• “If it sounds too good to be true, it usually is …”
• What do we do with constraints at (physical) synthesis stage?
  • Overconstrain the clock period in synthesis (was by 20%, now by ~10%)
  • Utilization: 60% target in synthesis (sometimes 50%, 55%) → 85+% post-placement
  • Which detailed placer, CTS tool, router, optimizer?
  • Complex tool “sensitivities” (noisy, chaotic behavior)
  • Information that is ignored (advanced manufacturing)
  • Information that is never available (CTS, SI)
• What explains “success”? Guardbands, low expectations…?
  • Designers’ preoccupation with area and schedule helps…
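
To make the guardband arithmetic concrete, here is a small sketch. It assumes that "overconstrain by X%" means shortening the target clock period by that fraction, and that the synthesis and post-placement utilizations refer to the same core area; both are interpretations rather than definitions from the talk, and the 1.0ns signoff period is just an example value.

```python
def synthesis_clock_period(signoff_period_ns, overconstraint):
    """Clock period handed to synthesis: signoff period tightened by a fraction."""
    return signoff_period_ns * (1.0 - overconstraint)


def implied_area_growth(synthesis_utilization, post_placement_utilization):
    """Cell-area growth factor implied by two utilizations of the same core area."""
    return post_placement_utilization / synthesis_utilization


period_old = synthesis_clock_period(1.0, 0.20)  # roughly 0.8 ns target
period_now = synthesis_clock_period(1.0, 0.10)  # roughly 0.9 ns target
growth = implied_area_growth(0.60, 0.85)        # roughly 1.42x
```

Under these assumptions, a 60% synthesis target leaves room for the cell area to grow by about 40% downstream (buffering, sizing, CTS) before reaching 85% utilization.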

Challenges
• FinFET, BEOL scaling effects
  • Drive
  • Resistivity
  • Gate-wire balance
• Clock effects
  • Skew across corners
  • Top-level clock distribution (CGCs, muxes, dividers, …)
  • Useful skews = area vs. delay tradeoffs
• “Extreme localization” effects
  • Advanced (multi-)patterning
  • Pin access, congestion, coupling
  • Breakdown of placement-optimization separation

Questions
• If Logic Synthesis can’t know outcomes at end of Physical Design, can it be doing the right thing?
  • (Simple information arguments)
  • (What margin is left on the table? Are we seeing placebo effects (association vs. causation, etc.)?)
• Can Logic Synthesis be made better aware of future Physical Design outcomes?
• Is Logic Synthesis at risk of being eclipsed by Physical Design? (Venus-Mars, Sun-Moon, etc.)

Outline
• Why Physical Synthesis
• Physical Synthesis 1.0
• Example Challenges
  • FinFET
  • Noise and Chaos
  • Clock Skew
  • Complexity and Hyperlocality
  • Better (and, more complex) Signoff
  • A Mixed-Height Sweet Spot?
• Physical Synthesis 2.0 ?

FinFET: Current Density + Discreteness
• Better electrostatic control + continued gate length scaling
  • Higher drive current → reduced cell height (e.g., 8.25T), better area density (with increased fin height)
  • Effective width 1.6x at equivalent area with planar devices
• Increased current density, plus fin discreteness challenges
[Figure: multi-fin 3D FinFET, and a cell layout with layers NWell, Fin, Active, Poly, MOL1, MOL2, VIA0 (MOL to M1), M1, VIA1 (M1 to M2), M2; dimensions labeled in poly and fin pitches (4Ppoly, 3Pfin, 2Pfin, 1Pfin)]
Sources:
http://www.synopsys.com/Company/Publications/DWTB/Pages/dwtb-finfet-jan2013.aspx
http://www.synopsys.com/Company/Publications/DWTB/Pages/dwtb-finfet-process-soc-2015q1.aspx

FinFET: Aggressive Voltage Scaling
• FinFET enables voltage scaling for reduced dynamic power
  • Better electrostatic control → better performance at low supply voltage
  • High-performance mode: wire-dominated
  • Low-performance mode: gate-dominated
Source: C. H. Lin, VLSI-TSA, 2012, p. 1-2.

Gate-Wire Balancing
• Unbalanced gate-wire delay causes severe delay variation on data and clock paths across modes
  • Delay variation in clock paths == skew variation → increased difficulty for timing closure (“ping-pong effect”)
• Minimization of skew variation is important for timing closure
  • (Our work at DAC15 uses global-local optimization and achieves 22% skew variation reduction)
• Low voltage: gate delay dominates; high voltage: wire delay dominates → skew reversal → power/area overheads

Corner             Clock latency (launch)   Clock latency (capture)   Skew
SS, 0.7V, -25°C    1.0                      1.1                       -0.1
FF, 1.1V, -25°C    0.9                      0.7                       +0.2

[Figure: datapath between a launch clock path and a capture clock path, annotated with the latencies above; skew = -0.1 / +0.2 across the two corners]
[DAC15]
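
The skew reversal in the two-corner example can be checked with a few lines of arithmetic. This is a minimal sketch that uses the latencies from the slide and assumes skew is defined as launch latency minus capture latency, which is the sign convention the slide's numbers follow; the variable names are illustrative.

```python
# Clock latencies per corner, as given in the two-corner example.
corners = {
    "SS, 0.7V, -25C": {"launch": 1.0, "capture": 1.1},
    "FF, 1.1V, -25C": {"launch": 0.9, "capture": 0.7},
}


def skew(latency):
    """Skew as launch latency minus capture latency."""
    return latency["launch"] - latency["capture"]


# Round to suppress floating-point noise in the differences.
skews = {name: round(skew(lat), 3) for name, lat in corners.items()}
skew_variation = round(max(skews.values()) - min(skews.values()), 3)
skew_reverses = min(skews.values()) < 0 < max(skews.values())
```

Here the skew changes sign between corners and the skew variation is 0.3, so a fix that helps setup in one corner can hurt it in the other.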

FinFET: Less Body Effect, Richer Libraries?
• FinFET 4-input NAND ~ planar bulk 3-input NAND
• More complex cells / higher fan-in cells could be made available to synthesis
[Figure annotations: “Number of fan-in limited by body effect”; “w/ body effect”]
Source: ‘Bulk FinFETs: Fundamentals, Modeling, and Application’, Jong-Ho Lee, SNU

Pin Accessibility Below 20nm
• Routing challenged by complex rules for multi-patterning
• Limited pin access with small track cells
• Wider power rail for reliable connection → fewer pin access points
• Complex design rules + less pin access → difficulty in routing
[Figure: 9T NAND2 layout (M1, M2, Fin, Poly, V1) showing access points, the wider power rail, an inserted via, and access points blocked by the via; rule annotations: < MinOverlap, < MinSpacing, metal pitch < via pitch]
Pin accessibility problem → conflict between area reduction and routability
[DAC15]

Outline
• Why Physical Synthesis
• Physical Synthesis 1.0
• Example Challenges
  • FinFET
  • Noise and Chaos
  • Clock Skew
  • Complexity and Hyperlocality
  • Better (and, more complex) Signoff
  • New Mixed-Height Sweet Spot?
• Physical Synthesis 2.0 ?
[ISQED02] http://vlsicad.ucsd.edu/Publications/Conferences/131/c131.pdf
[ISQED10] http://vlsicad.ucsd.edu/Publications/Conferences/267/c267.pdf

Slack vs. Layout Context
• Layout knobs: SRAM pitches and buffer keepout distances
• Post-P&R slacks of five embedded memories are “chaotic”
• Physical synthesis challenge: Logic optimization given “chaos”
Testcase: Logic from OpenCores GPU THEIA + SRAMs
[Figure: WNS of paths through SRAMs (ns, axis -1.3 to -0.7) vs. SRAM pitch (um, 0 to 30), series slack-1 to slack-5; annotation: delta slack > 300ps. Inset: floorplan with five SRAM blockages (1 to 5) spaced by sram_pitch, buffer keepouts, and the placement region for standard cells]

Slack vs. Clock Period
• ∆path slack is 81ps at signoff clock period of 1.0ns
• Changing clock period to 0.82ns changes ∆path slack to 143ps!
[Figure: max delta path slack (SI – non-SI) (ns, axis 0.06 to 0.15) vs. clock period (ns, 0.80 to 1.30); annotations: 81ps at signoff clock period, 143ps at tighter clock period]
[SLIP15]

Non-SI vs. SI
• Top-1000 critical paths from Viterbi design (clock period = 1.0ns)
• Slack diverges by 81ps! That is ~4 stages of logic at 28nm FDSOI
• Unfortunately, we don’t know coupling before routing!
[Figure: scatter plot of path slack in non-SI mode (ns) vs. path slack in SI mode (ns), with the ideal-correlation line; annotated divergence 81ps]
[SLIP15]

WLM, RC (Interconnect proxy) Effects
• Example: SOCE-based “Shrunk2D” (S2D) flow [1]
• Perform synthesis with different WLM caps, P&R with S2D flow
• Shown: total power (#buffers, #instances, instance area, WL, … similar)
[Figure: 3DIC power (mW, axis 20.8 to 23) vs. WLM cap (pF, 0 to 1.2); annotated range 1.35mW (6.43%)]
[1] Panth et al., “Design and CAD Methodologies for Low Power Gate-Level Monolithic 3D ICs”, Proc. ISLPED, 2014, pp. 171-176.
[DAC15]

Outline
• Why Physical Synthesis
• Physical Synthesis 1.0
• Example Challenges
  • FinFET
  • Noise and Chaos
  • Clock Skew
  • Complexity and Hyperlocality
  • Better (and, more complex) Signoff
  • A Mixed-Height Sweet Spot?
• Physical Synthesis 2.0 ?

Sensitivity of CTS Outcomes to Layout Contexts
• Delay varies by up to 43% with clock entry point locations
• Delay varies by up to 45% with core aspect ratio
• NDRs, fill, buffer sizes, max fanout / max trans rules, … → 100ps impacts on insertion delays, skew, slacks
[Figure: fall delay (ps, axis 0 to 800) vs. core aspect ratio (0.1, 0.125, 0.250, 0.33, 0.4, 0.5, 1.0, 2.0, 2.5, 3.0, 4.0, 8.00, 10.00), one series per clock entry point location (BL, BLM, B, RBM, R)]
[SLIP13]

Useful Skew Improves Timing
• Useful skew optimization adjusts clock sink latencies to improve timing
• Our predictive useful skew flow resolves the “chicken-and-egg loop” → further improved timing
[Figure: total negative slack over 6 testcases {3 RTLs x 2 clock periods}: zero skew -893, typical useful skew -197, predictive useful skew -60. Useful skew improves timing.]
[Figure: three-flop example (FF1, FF2, FF3) annotated with path delay/slack and clock latency. Zero skew: clock latencies 5, 5, 5; delay/slack 7/3, 10/0, 7/3. Useful skew: clock latencies 7, 6, 5; delay/slack 7/2, 10/2, 7/2.]
[ISQED14]
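
The three-flop example can be reproduced with a short calculation. The sketch below is a reconstruction under stated assumptions: the flops are taken to form a ring (FF1 to FF2, FF2 to FF3, FF3 back to FF1) with the 10-unit path closing the ring, the clock period is taken as 10 units, and flop setup time is ignored. These choices are not stated in the source; they were picked because they reproduce the slacks quoted for the example (3/0/3 with zero skew, 2/2/2 with useful skew).

```python
CLOCK_PERIOD = 10  # assumed; not given explicitly in the example

# Assumed topology: (launch flop, capture flop, path delay).
PATHS = [("FF1", "FF2", 7), ("FF2", "FF3", 7), ("FF3", "FF1", 10)]


def setup_slacks(latency):
    """Setup slack per path: period + capture latency - launch latency - delay."""
    return {
        (src, dst): CLOCK_PERIOD + latency[dst] - latency[src] - delay
        for src, dst, delay in PATHS
    }


zero_skew = setup_slacks({"FF1": 5, "FF2": 5, "FF3": 5})
useful_skew = setup_slacks({"FF1": 7, "FF2": 6, "FF3": 5})

worst_before = min(zero_skew.values())
worst_after = min(useful_skew.values())
```

In this ring the latency differences cancel around the loop, so the sum of the slacks is unchanged; useful skew only moves slack from the paths that have it to the path that needs it, raising the worst slack from 0 to 2.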

Conventional Useful Skew Optimization
• Standard useful skew flow has chicken-egg problem
• One solution: Back-annotation flows (large runtime)
[Figure: flow diagram with stages RTL netlist, Synthesis, Placement / Place Opt., Skew_opt, CTS, CTS Opt., Routing / Route Opt., and a back-annotation loop. Annotations: netlist and placement assume zero skew; useful skew optimization relies on placement]
Wang et al. in DAC06 propose to back-annotate useful skew from post-placement to before-synthesis

NOLO: No-Loop Useful Skew Optimization
• Our work: Cure the chicken-egg problem with delay prediction
• Use setup slacks from LVT-only synthesis → estimation of achievable slacks
• Use hold slacks from multi-VT synthesis → reduce pessimism
• Advantage: One-pass approach, not constrained by placement
[Figure: flow diagram with stages RTL netlist, Synthesis w/ LVT (LVT-only netlist), Synthesis w/ Multi-Vt, Predictive Useful Skew, Placement/Place Opt., CTS/CTS Opt., Routing/Route Opt.]

Experimental Results
• Predictive flow achieves similar or better timing and much smaller runtime
[Figure: four panels (aes_cipher, des_perf, jpeg_encoder, mpeg2), each plotting runtime (min) vs. TNS (ns). Legend: Back annotation (BA); Prediction (w/o LVT-only syn); Prediction (w/ LVT-only syn); Average of various BA flows]

Outline
• Why Physical Synthesis
• Physical Synthesis 1.0
• Example Challenges
  • FinFET
  • Noise and Chaos
  • Clock Skew
  • Complexity and Hyperlocality
  • Better (and, more complex) Signoff
  • A Mixed-Height Sweet Spot?
• Physical Synthesis 2.0 ?

BEOL Multi-Patterning Impacts
[Figure: mandrel/spacer patterning of Mx metal, showing the mandrel (Mwidth, Mspace), spacer (Swidth), floating fill wires, line-end extensions, and line-end cuts]
• Wire1width = Mwidth
• Wire2width = Mspace – 2*Swidth
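
The two width relations for mandrel/spacer patterning are simple enough to encode directly. The sketch below only restates those relations; the function name and the dimensions in the usage lines are hypothetical and are not taken from any real process.

```python
def wire_widths(mandrel_width, mandrel_space, spacer_width):
    """Return (wire1_width, wire2_width) for mandrel/spacer patterning.

    Wire 1 is defined by the mandrel itself; wire 2 is whatever is left of
    the mandrel-to-mandrel space after a spacer forms on each side.
    """
    wire1 = mandrel_width
    wire2 = mandrel_space - 2 * spacer_width
    if wire2 <= 0:
        raise ValueError("spacers consume the entire mandrel space")
    return wire1, wire2


# Hypothetical dimensions in nm, for illustration only.
w1, w2 = wire_widths(mandrel_width=24, mandrel_space=72, spacer_width=24)
```

Because the second wire width is a difference of three process dimensions, variation in mandrel space or spacer width shows up directly in that wire's width and hence in its resistance and capacitance.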

Placement-Sizing Interference
• New “interferences” between post-layout optimization and P&R
• Rules for device layers (FEOL) become considerably more complex and restrictive
  • Minimum implant width rules for implant region
  • Minimum notch and jog width rule for oxide diffusion (OD)
[Figure: rows of adjacent HVT and LVT cells, showing implant regions, OD, and cell boundaries]
[ICCAD15]

Placement-Sizing Interference (cont.)
• Drain-to-drain abutment (DDA)
• Example solution
[Figure: cell-level layouts (cell boundary, active region, poly, power/ground, connection; drain D and source S terminals) illustrating a DDA violation, min implant width violations, a min jog/notch width violation, and a legal (√) configuration]
Intertwine the historically separate tasks of P&R and post-route optimization
[ICCAD15]

Outline
• Why Physical Synthesis
• Physical Synthesis 1.0
• Example Challenges
  • FinFET
  • Noise and Chaos
  • Clock Skew
  • Complexity and Hyperlocality
  • Better (and, more complex) Signoff
  • A Mixed-Height Sweet Spot?
• Physical Synthesis 2.0 ?

I. Flexible Timing Models
• Setup time, hold time and clock-to-q (c2q) delay of FF ⇒ values interdependent, but NOT fixed
• Flexible FF timing model can exploit operating (function/test) modes ⇒ “Free” pessimism reduction in STA
• Sequential LP:
  • setup-c2q opt
  • hold-c2q opt
• Goal: Find best {setup, hold, c2q} for each FF instance
[Figure: c2q-setup-hold surface, with c2q vs. setup and c2q vs. hold curves; setup-hold-c2q flexible model (c2q1 ... c2qn vs. hold) compared with setup-hold-c2q fixed model]
[ISQED14]

Flexible Timing Models Recover Margin
• Independent datapaths in PBA: using fixed FF timing model loses performance optimization opportunity
[Figure: three flip-flops (FF1, FF2, FF3) connected by datapaths of 460ps, 470ps and 480ps. Flop labels: setup: 10ps / c2q: 20ps (twice) and setup: 20ps / c2q: 10ps. Each path is annotated “Total: 500ps”. Annotation: 520ps? 500ps!]

Improved Timing Signoff Flow
[Figure: signoff flow. Input: netlist (and SPEF, if routed). Boxes: extract path timing information; LP formulation with flexible flip-flop timing model; solve Sequential LP (STA_FTmax, STA_FTmin); solution; annotate new timing model for each flip-flop; timing signoff with annotated timing]
Takeaways
• Fix timing violations “for free”
• 48ps average improvement of slack over 5 designs in a foundry 65nm technology
Next
• Better exploitation of disjoint cycles/modes
• More accurate modeling of setup-hold-c2q tradeoff
• Circuit optimization should natively exploit FF timing model flexibility

II. Signoff Definition (e.g., with AVS, Aging)
• VBTI: Voltage for BTI-aging estimation
• Vlib: Supply voltage for timing library characterization
• Vfinal: Vdd of a circuit with AVS at end-of-lifetime
Chicken & Egg Loop:
• VBTI and Vlib depend on aging during AVS (Vfinal)
• Vfinal depends on circuit
• Circuit implementation depends on VBTI and Vlib
[Figure: loop diagram with nodes VBTI (|Vt|), Vlib, derated library, circuit implementation and signoff, circuit, BTI degradation and AVS, Vfinal, and a “?” closing the loop]
[DATE13]

Observations and Heuristics
• Observation #1: Vfinal is not sensitive to cells along the timing-critical path
• Heuristic #1: Use average of critical path replicas to estimate Vfinal (Vheur)
• Observation #2: ΔVt with a constant Vfinal throughout lifetime ≈ adaptive Vdd
• Heuristic #2: Approximate Vdd in AVS by constant Vheur
• Solve “Chicken & Egg Loop” by having VBTI = Vlib = Vheur ≈ Vfinal

“Knee” Point for Signoff Definition

             Low Vlib                      High Vlib
Low VBTI     Slower circuit, less aging    Faster circuit, less aging
High VBTI    Slower circuit, more aging    Faster circuit, more aging

Experiment setup:
• DC/AC BTI @ 125°C
• 32nm PTM technology
• 4 benchmark circuit implementations
Results:
• Optimistic aging library → large power penalty
• Overly pessimistic aging library → large area penalty
• Ignore AVS → larger area
• Our method finds “Knee” point for balanced area and power tradeoff

Outline
• Why Physical Synthesis
• Physical Synthesis 1.0
• Example Challenges
  • FinFET
  • Noise and Chaos
  • Clock Skew
  • Complexity and Hyperlocality
  • Better (and, more complex) Signoff
  • A Mixed-Height Sweet Spot?
• Physical Synthesis 2.0 ?

Mixed Cell Height Implementation (!)
• Large cell height → better timing, but large area and power
• Small cell height → smaller area/power per gate, but large delay and more #buffers
• Mixing cell height enables tradeoffs between performance and area/power (recall FinFET introduction!) → better design QoR
• E.g., use large-height high-fanin cells to improve pin accessibility
• Already have flop trays, etc. as problematic multi-height instances
[Figure: mixed-height layout. Technology: 28nm LP. In red are 12T cells = larger area, smaller delay. In blue are 8T cells = smaller area, larger delay.]
[ICCAD15]

Cost of Mixing Cell Heights
• “Breaker cells” are required to align regions with different cell heights → optimization must comprehend corresponding area cost
[Figure: adjacent 12T and 8T cell regions with P/G rails and cell boundaries. Assume: M2 pitch = 64nm. Labels: X directional shift; Y directional shift; four sites; 64nm, 48nm, 64nm; one M2 pitch; no routing blockage; routing blockage on M1/M2]

Optimization Flow
Flow: Synthesis → Initial placement → Partitioning → Legalization → Floorplan Update → Cell mapping → Routing / RouteOpt
• Initial placement uses modified LEF → enables optimization with a conventional flow
• Slicing-based partition with DP to divide die area into regions with different cell heights
• Internal-timer guided placement legalization
• Floorplan update with “breaker cell” penalty
• Row-based cell mapping places cells onto rows with corresponding heights

Example of Optimization Flow
[Figure: layout snapshots. Technology: 28nm LP. Design: AES. 8T cells are in blue, 12T cells are in red. Panels: initial placement (8T/12T cells are “freely” placed); partitioning (yellow blocks = regions); legalization; new floorplan; mixed-height placement]

Benefits from Mixing Cell Heights
• Technology: 28nm LP (12T/8T); Design: AES
• 25% area reduction as compared to 12T-only design
• 20% performance improvement compared to 8T-only design

Outline
• Why Physical Synthesis
• Physical Synthesis 1.0
• Example Challenges
  • FinFET
  • Noise and Chaos
  • Clock Skew
  • Complexity and Hyperlocality
  • Better (and, more complex) Signoff
  • A Mixed-Height Sweet Spot?
• Physical Synthesis 2.0 ?

Physical Synthesis 2.0
• It’s the predictability! (and, prediction is challenged…)
  • New devices and patterning technologies
  • Complex PD tool chain; chaotic behavior of tools and flows
  • Oblivious to clocks, corners, coupling → how can Physical Synthesis be doing the right thing? (= target for margin recovery!)
• What will Physical Synthesis 2.0 look like?
  • (1) Higher-level value: what Physical Design cannot do
    • Datapath architecture selection
    • Resource sharing
    • Mux mapping
  • (2) Other types of prediction (machine learning, big data, etc.)!
  • (3) Constructive prediction deeper into implementation flow
    • (More integration…) Clock and MCMM awareness
    • Hyperlocality awareness: coloring, congestion, coupling, interactions …
THANK YOU !