D-06 Clock Tree Synthesis of SMIC40nm Low Leakage Cortex A9 with Cadence CCopt_Brite Semi.pdf
8/17/2019 D-06 Clock Tree Synthesis of SMIC40nm Low Leakage Cortex A9 with Cadence CCopt_Brite Semi.pdf
1/12
Clock Tree Synthesis of SMIC 40nm Low Leakage Cortex A9 with Cadence CCopt
Brite
Arthur Liang, Titan Wang
-
Design overview
• ARM dual‐core Cortex A9
• 32 KB I-cache and 32 KB D-cache, includes NEON
• Uses the SMIC 40nm low-leakage process
• Implemented with Cadence CCopt
-
CCopt Flow Methodology
Traditional EDI Balanced Clocks Flow
RTL → Synthesis → Netlist → Placement → Pre-CTS Opt → CTS → Post-CTS Optimization → Routing & Post-route Opt → GDSII
New CCopt Flow
RTL → Synthesis → Netlist → Placement → Pre-CTS Opt → CCOpt (Clock Concurrent Optimization) → Routing & Post-route Opt → GDSII
-
Traditional EDI CTS Methodology
Traditional Timing Optimization
[Figure: pipeline with clock arrival times P[i-1], P[i], P[i+1], maximum datapath delays G[i-1]max and G[i]max, and clock period T; the critical path is marked]
• Many iterations
• Excessive run time
• Area explosion
• Higher leakage
Balanced CTS
• Unnecessary: no fundamental timing requirement that clocks need to be balanced
• Expensive: clock buffer explosion to minimize skew; other expensive options (e.g., mesh, spine, ...)
• Severe IR drop: all flops/RAMs forced to trigger at the same time
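The cost of a balanced clock can be illustrated with a small numeric sketch. It assumes an idealized two-stage pipeline with zero skew and ignores flop setup time and clock-to-Q delay. The stage delays (1.3 and 0.7) are made-up values, not taken from the slides, and the function names are hypothetical.

```python
def balanced_setup_slacks(stage_delays, period):
    """Setup slack per stage when all clock arrivals are equal (zero skew)."""
    return [period - g for g in stage_delays]


def balanced_min_period(stage_delays):
    """With zero skew, the slowest stage alone sets the minimum clock period."""
    return max(stage_delays)


# Hypothetical G[i-1]max and G[i]max, in the same time unit as the period.
g_max = [1.3, 0.7]
period = 1.1

balanced_slacks = balanced_setup_slacks(g_max, period)
balanced_period = balanced_min_period(g_max)

print("setup slacks with balanced clocks:", balanced_slacks)
print("minimum period with balanced clocks:", balanced_period)
```

Under these assumptions the first stage has negative slack while the second has slack to spare, so only datapath optimization of the critical stage can close timing.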
-
CCopt - Clock Concurrent Optimization Flow
Concurrent useful-skew and datapath optimization
[Figure: pipeline with clock arrival times P[i-1], P[i], P[i+1], maximum datapath delays G[i-1]max and G[i]max, and clock period T; CCopt applies time borrowing]
• Faster timing closure
• Higher performance
• Lower area
• Lower leakage
• Lower IR drop: flops/RAMs triggered at different times; critical and non-critical sinks are skewed
• Efficient: significant reduction in clock buffers (no explicit requirement to balance the tree)
• MM/MC/OCV: useful skew takes into account all timing aspects, including MM, MC, OCV, setup, and hold
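Time borrowing through useful skew can be sketched with the setup relation slack = T + P[i+1] - P[i] - G[i]max. This is a simplified model, not the tool's algorithm: it ignores hold checks, setup time, clock-to-Q delay, and multi-mode/multi-corner/OCV effects. The delays and the 0.2 offset are made-up values, and `setup_slacks` is a hypothetical helper.

```python
def setup_slacks(stage_delays, arrivals, period):
    """Setup slack per stage.

    Stage i launches at arrivals[i] and is captured at arrivals[i + 1],
    so a later capture clock lends time to that stage.
    """
    return [
        period + arrivals[i + 1] - arrivals[i] - g
        for i, g in enumerate(stage_delays)
    ]


# Hypothetical G[i-1]max and G[i]max, in the same time unit as the period.
g_max = [1.3, 0.7]
period = 1.1

# Balanced tree: P[i-1] = P[i] = P[i+1].
balanced = setup_slacks(g_max, [0.0, 0.0, 0.0], period)

# Useful skew: delay P[i] by 0.2 so the long stage borrows from the short one.
skewed = setup_slacks(g_max, [0.0, 0.2, 0.0], period)

print("balanced slacks:", balanced)
print("skewed slacks:  ", skewed)
```

In this toy case, delaying the middle clock arrival removes the violation without touching the datapath, at the price of slack taken from the following stage.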
-
[Figure: clock with variable skew; datapath delays labeled Gmax]
-
A9 CPU Snapshot
-
Reference CCopt Script
setCCOptMode \
    -cts_buffer_cells {BUF_X16B_A12TR40 BUF_X13B_A12TR40 BUF_X11B_A12TR40 BUF_X6B_A12TR40} \
    -cts_inverter_cells {INV_X16B_A12TR40 INV_X13B_A12TR40 INV_X11B_A12TR40 INV_X6B_A12TR40} \
    -cts_clock_gating_cells { PREICG_X11B_A12TR40 } \
    -cts_target_slew 0.08 \
    -cts_target_nonleaf_slew 0.08 \
    -cts_target_skew 0.15 \
    -io_opt off \
    -ccopt_auto_limit_insertion_delay_factor 1.2 \
    -ccopt_enable_downsizer true \
    -erc fix \
    -cts_use_inverters true
-
CCopt Clock Tree Summary
STA Timing Summary With CCopt
STA Timing Summary With Traditional CTS Flow
Clock Tree Name : "CLK"
Clock Period : 1.10000
Number of Levels : 21
Number of Sinks : 54562
Number of CT Buffers : 1262
Total Area of CT Buffers : 4689.66
Max Global Skew : 0.2268
Clock Tree Name : "CLK"
Clock Period : 1.10000
Number of Levels : 20
Number of Sinks : 54562
Number of CT Buffers : 1178
Total Area of CT Buffers : 4716.22
Max Global Skew : 0.1356
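The two clock tree reports above can be compared with simple arithmetic. The sketch below copies the reported values and computes the differences. The labels "first" and "second" refer only to the order in which the reports appear here; they make no assumption about which flow produced which report.

```python
# Values copied from the two "CLK" clock tree reports, in order of appearance.
first = {"levels": 21, "sinks": 54562, "buffers": 1262,
         "buffer_area": 4689.66, "max_global_skew": 0.2268}
second = {"levels": 20, "sinks": 54562, "buffers": 1178,
          "buffer_area": 4716.22, "max_global_skew": 0.1356}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


buffer_change = percent_change(first["buffers"], second["buffers"])
area_change = percent_change(first["buffer_area"], second["buffer_area"])
skew_delta = second["max_global_skew"] - first["max_global_skew"]

# Average area per clock tree buffer in each report.
avg_area_first = first["buffer_area"] / first["buffers"]
avg_area_second = second["buffer_area"] / second["buffers"]

print(f"buffer count change: {buffer_change:.1f}%")
print(f"buffer area change:  {area_change:.2f}%")
print(f"max global skew delta: {skew_delta:.4f}")
print(f"average area per buffer: {avg_area_first:.3f} vs {avg_area_second:.3f}")
```

The second report has fewer buffers but slightly more total buffer area, so its buffers are larger on average.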
-
Conclusion
• CCopt is able to determine the proper clock offsets, instead of manually skewing a clock in an iterative process
• Increased the A9 CPU frequency
• Can reduce clock tree buffers
• CCopt internally makes tradeoffs between timing, power, and schedule
-
Thanks