Post on 12-Nov-2018
1 © 2016 ANSYS, Inc. October 14, 2016 ANSYS Confidential
RedHawk large design handling using DMP
Keuncheol Lee, Sr. Application Engineer
Abstract
Today, large SoC designs strain turnaround time (TAT), cost, power noise margin, and machine capacity.
RedHawk Distributed Machine Processing (DMP) provides significant memory reduction and runtime improvement over flat runs by dividing the design into multiple partitions and processing the partitions in parallel.
IC Design Trends
Cost of Chip Failure is High!
> Increasing silicon costs
> First time silicon success is important
Silicon Design Cost Trends
Source:
> Design margins are constantly shrinking
> Design verification coverage is critical
Supply and Threshold Scaling
Source: Paolo Gargini, ITRS Past, Present and Future May 2015
Power, Noise & Reliability Requirements
Coverage
> Operating mode coverage
> EM/Thermal reliability
> ESD/EMI compliance
[Figure labels: Demand Current, Supply Current, High power, Low power]
Accuracy
> Advanced tech node support
> Silicon validated accuracy
> Complete Chip/Pkg/System
Mitigate Your IC Design Risk!
Capacity
> Full-chip (B+ instances)
> Multi-domain, Multi-Physics
> Distributed computing
…
IC Design Challenges
Higher Variability + Lower Supply Voltages = Decreased Power Noise Margin
Analysis must be ACCURATE!
Multi-Domain + Multi-Physics
High Coverage
> Overdesign!
> Missed Schedule!
> Increased Costs!
500mV supply: How do you margin?
[Figure labels: 3%, 10%, 2%]
Margins have outlived their usefulness
DMP : Distributed Machine Processing
DMP-DB Flow Description
Design is divided into vertical partitions and analyzed across multiple machines
Each partition is an individual RedHawk job launched on a smaller portion of the design requiring smaller memory
A “Master” job is launched to monitor and communicate with the partitions (slaves)
Number of jobs to launch = Number of partitions (slaves) + 1 (Master)
Every partition communicates with other partitions through MPI and creates a reduced view for the rest of the design, so accuracy is maintained
Job launch across multiple machines and communication between machines are performed using MPI (Message Passing Interface)
Partitions are aware of the exit status of other jobs. If one job dies, all other partitions and the master exit automatically
Supported over LSF/SSH/SGE/RTDA grid types
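The job-count rule above (one master plus one slave per partition) can be sketched in a few lines of Python. The function name and the role labels are hypothetical and exist only for illustration; the lower bound of two partitions is an assumption based on the design being split into multiple partitions.

```python
from typing import List


def dmp_job_roles(num_partitions: int) -> List[str]:
    """List the jobs for a DMP run: one master plus one slave per partition."""
    if num_partitions < 2:
        # Assumption: a DMP run splits the design into at least two partitions.
        raise ValueError("expected at least 2 partitions")
    return ["master"] + [f"slave{i}" for i in range(1, num_partitions + 1)]


roles = dmp_job_roles(8)
# Number of jobs to launch = number of partitions + 1 (master)
print(len(roles), roles)
```

This is bookkeeping only; it does not launch anything on a grid.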
[Figure: DMP flow showing Master, Slave1, and Slave2; labels: Node/resistor, Wire/via, Instance/net]
DMP-DB Flow Description
Jobs/Characteristics of Master machine:
Main job is to maintain the GUI and control synchronization between slaves
Master is launched in GUI mode. All slaves are launched in batch mode
Single unified GUI on master shows results from all partitions
All GUI and TCL commands are passed through the master to the slaves
No analysis (power calc, extraction, simulation, etc.) is performed on the Master
When ‘FAST_DEF_READ 0’ is set, Master reads in the top DEF and helps in partitioning of the design
Most results are concatenated from every slave and are present in the master adsRpt directory
Master memory consumption is typically much lower than that of the slaves
DMP-DB Flow Description
Jobs/Characteristics of Slave machine:
Launched in batch mode, so no GUI for each slave
Every slave has the following read behavior for each input file:
Input file   Read behavior
LEF          All partitions read all LEFs
LIB          All partitions read all LIBs
DEF          If FAST_DEF_READ 0: all partitions read all DEFs, with multi-threaded reading
             If FAST_DEF_READ <num>: each partition reads <num> DEFs at a time
Pkg/Ploc     All partitions read complete pkg/ploc
STA          Default parallelization turned on. Each partition reads a portion of the STA
SPEF         Each partition reads its own SPEF files
APL          Each partition reads all APL files
VCD File     Each partition reads its own VCD file
All other analysis steps (pwrcalc, extraction, simulation, etc.) are done within each partition
Each partition has its own adsRpt directory in the run area: adsRpt.1 for the 1st partition, adsRpt.2 for the 2nd partition, …
Results for each partition are also available in its respective adsRpt.<n> directory
DMP DB Behavior during Setup Design
Design Partitioning:
- User defines number of partitions for the design
DMP_SETUP_BOUNDARY {x_boundary1 x_boundary2 x_boundary3 …}
- RedHawk automatically sizes each partition to ensure even node count distribution
Each partition has a different width
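To illustrate why an even node count leads to partitions of different widths, here is a minimal sketch that picks vertical cut lines at quantiles of the node x-coordinates. This is not RedHawk's partitioning algorithm, which the slides do not describe; the function names and the sample coordinates are hypothetical.

```python
import bisect
from typing import List, Sequence


def even_node_boundaries(node_x: Sequence[float], num_partitions: int) -> List[float]:
    """Return num_partitions - 1 x-cuts that split the nodes into near-equal counts."""
    if num_partitions < 2:
        raise ValueError("need at least 2 partitions")
    if len(node_x) < num_partitions:
        raise ValueError("need at least one node per partition")
    xs = sorted(node_x)
    boundaries = []
    for k in range(1, num_partitions):
        idx = k * len(xs) // num_partitions
        # Place the cut midway between the two neighbouring nodes.
        boundaries.append((xs[idx - 1] + xs[idx]) / 2.0)
    return boundaries


def partition_counts(node_x: Sequence[float], boundaries: Sequence[float]) -> List[int]:
    """Count how many nodes fall into each vertical strip."""
    counts = [0] * (len(boundaries) + 1)
    for x in node_x:
        counts[bisect.bisect_right(boundaries, x)] += 1
    return counts


# Hypothetical layout: nodes are dense on the left and sparse on the right.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 20, 40, 60, 80]
cuts = even_node_boundaries(xs, 3)
print(cuts, partition_counts(xs, cuts))
```

With this sample data each strip holds four nodes, yet the strips are far from equally wide, which is the behavior the slide describes.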
DMP DB Behavior during PowerCalc
Vless and Gate VCD based power calculation:
- By default each partition calculates power for instances within its boundary
- A conservative mode provided for power calculation with full chip scope
- Accuracy matches well with flat while providing significant speed-up
RTL VCD based power calculation:
- Since RTL VCD involves propagating events/states across partition boundaries, this is an iterative process
RTL VCD: Events and state propagation data exchanged between partitions
DMP-SIM flow
Capacity : Case Study
DMP Performance Benefit Across Different Design Styles
[Chart: Flat vs. DMP for ASIC, FPGA, Mobile, and Networking designs; labels 6.8X, 10.6X, 3.5X, 5X, and "Not Possible"]
6.4X average runtime improvement
~5X average memory improvement
Scalability: up to 32 machine cluster scalability benchmarking
Accuracy: preserved flat accuracy with the runtime benefit
> Runtime improvements shown above are for 8 to 16 way DMP runs
Capacity : Case Study
DMP Performance Benefit Across Multiple Simulation Modes
4.7X average runtime improvement
~4X average memory improvement
[Chart: Flat vs. DMP-14 for Static/EM, Functional, Scan, Jitter, and ESD modes; labels 6X, 6X, 4X, 3.8X, 1.8X]
1.4B/1.0B nodes & resistors in the design
100M functional instances in the design
20+ power & ground networks solved
QoR on a Sample DMP Data
Runtime
Stage                Flat (410M Nodes)   DMP/8*
Setup Design         1:53                0:37
Power Calc + Extr.   1:50                0:20
Simulation           1:52                1:03
Post Processing      1:10                0:16
Total                6:45                2:16
Memory
Stage                Flat (410M Nodes)   DMP/8*
Setup Design         75GB                21GB
Power Calc + Extr.   126GB               32GB
Simulation           132GB               33GB
Post Processing      169GB               40GB
QoR: Flat vs DMP/8
[Figures: Instance Voltage Drop and EM Results, each with axes FLAT run and DMP run; label: DvD instances]
DvD distribution between 8-way & 16-way runs is very consistent
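The Flat vs DMP/8 figures above can be turned into per-stage ratios with a short script. This is a sketch that assumes the runtime entries are in hours:minutes; the slide does not state the unit, although the ratios are the same for minutes:seconds. The variable and function names are illustrative.

```python
def to_minutes(hmm: str) -> int:
    """Convert an 'h:mm' string to minutes (assumed unit, see note above)."""
    hours, minutes = hmm.split(":")
    return int(hours) * 60 + int(minutes)


# (flat, DMP/8) pairs copied from the slide
runtime = {
    "Setup Design": ("1:53", "0:37"),
    "Power Calc + Extr.": ("1:50", "0:20"),
    "Simulation": ("1:52", "1:03"),
    "Post Processing": ("1:10", "0:16"),
    "Total": ("6:45", "2:16"),
}
memory_gb = {
    "Setup Design": (75, 21),
    "Power Calc + Extr.": (126, 32),
    "Simulation": (132, 33),
    "Post Processing": (169, 40),
}

runtime_speedup = {
    stage: to_minutes(flat) / to_minutes(dmp) for stage, (flat, dmp) in runtime.items()
}
memory_reduction = {stage: flat / dmp for stage, (flat, dmp) in memory_gb.items()}

for stage, ratio in runtime_speedup.items():
    print(f"{stage:20s} runtime {ratio:.2f}X")
for stage, ratio in memory_reduction.items():
    print(f"{stage:20s} memory  {ratio:.2f}X")
```

For this 8-way run the total runtime ratio comes out at roughly 3X and the per-stage memory ratios at roughly 3.6X to 4.2X.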
DMP Run-times on Ultra-Large Designs
Design Type                    Features                                        Analysis           Elapsed Time
Custom Design (mixed signal)   ~1.5B nodes, ~2B resistors                      Dynamic (16-way)   ~4.5hrs
Networking ASIC                ~500M instances, ~2.5B nodes, ~4.5B resistors   Static (16-way)    ~9hrs
                                                                               Dynamic (16-way)   ~15hrs
Networking ASIC                ~4.5B nodes, ~6.5B resistors                    Static (16-way)    ~15hrs
                                                                               Dynamic (16-way)   ~22hrs
Mobile                         APU + full MMX views                            Static (16-way)    ~9hrs
                                                                               Dynamic (16-way)   ~13hrs
Graphics                       ~7B+ nodes                                      Dynamic (22-way)   ~46hrs w/ 1ps time step
• Depending on the size of a design, an optimal number of partitions can be chosen for best performance and memory
• No inherent limit on future ultra-large chip sizes
• Avg. 6.4X run time improvement
• Avg. 5X memory reduction
• Preserve accuracy w/ run time benefits
DMP License
No. of Slaves (partitions)   Licenses checked out
2 – 4                        1 RH + 1 RH_DMP
5 – 8                        1 RH + 2 RH_DMP
9 – 16                       1 RH + 3 RH_DMP
Max. no. of partitions = 2^N, N = total no. of license copies (RH + RH_DMP)
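The license table can be reproduced with a small helper. This sketch reads the rule as "up to 2^N partitions for N license copies in total", which matches all three table rows; the function name is hypothetical, and the range check reflects only the rows shown on the slide.

```python
from typing import Dict


def dmp_licenses(num_slaves: int) -> Dict[str, int]:
    """Return the license checkout for a given number of slaves (partitions)."""
    if not 2 <= num_slaves <= 16:
        # The slide only lists 2 to 16 slaves; other counts are not covered here.
        raise ValueError("slide covers 2 to 16 slaves only")
    # Smallest N with 2**N >= num_slaves, and at least 2 copies (1 RH + 1 RH_DMP).
    total = max(2, (num_slaves - 1).bit_length())
    return {"RH": 1, "RH_DMP": total - 1}


for n in (2, 4, 5, 8, 9, 16):
    print(n, dmp_licenses(n))
```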
Real-Time monitoring using dmpstat
DMP : Accuracy Comparison
Based on comparing DMP versus flat runs across multiple 16FF – 40nm designs
Worst instance drop values correlate very well (within 2mV DvD)
99% of instances differ by < 5mV in static analysis
99% of instances differ by < 20mV in dynamic analysis (minTW, minWC, avgTW, maxTW)
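A comparison of this kind can be sketched as follows, assuming per-instance voltage drop values in mV have already been extracted from a flat run and a DMP run in the same instance order. The function names and the sample values are hypothetical and are not taken from any RedHawk report.

```python
from typing import Sequence


def fraction_within(flat_mv: Sequence[float], dmp_mv: Sequence[float], tol_mv: float) -> float:
    """Fraction of instances whose flat-vs-DMP drop differs by less than tol_mv."""
    if len(flat_mv) != len(dmp_mv) or not flat_mv:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(1 for f, d in zip(flat_mv, dmp_mv) if abs(f - d) < tol_mv)
    return hits / len(flat_mv)


def worst_drop_delta(flat_mv: Sequence[float], dmp_mv: Sequence[float]) -> float:
    """Absolute difference between the worst (largest) drop in each run."""
    return abs(max(flat_mv) - max(dmp_mv))


# Hypothetical per-instance drops in mV, for illustration only
flat = [42.0, 55.5, 61.2, 38.9]
dmp = [43.1, 54.0, 62.0, 45.0]
print(fraction_within(flat, dmp, 5.0), worst_drop_delta(flat, dmp))
```

On real data the first value would be checked against the 99% figure for the chosen threshold (5mV static, 20mV dynamic) and the second against the 2mV worst-case figure.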
DMP : Supported design and features
Flip-Chip Design/Blocks
Static Analysis
Dynamic Analysis (Vectorless/RTL-VCD/Gate-VCD)
Signal-EM
Power EM
Low-power Analysis
CMM (Raw Model)
Enhancing other features
Customer List on DMP Deployment for IR/EM Sign-off
Seattle
DMP Roadmap
• Extend DMP capability on SoC to multiple dies in RH-3DIC (RH-InFO)
• Continue to optimize DMP scalability
• Further improve ease of use: log files, process monitoring utilities, and DMP robustness
• Distributed DB support on RH-DMP for ESD, particularly for ultra-large chips