Power Mitigation For Nanometer FPGAs
(ISLPED 2005 Tutorial)
Mike Hutton Altera San Jose
© 2005 Altera Corporation, M. Hutton
Section Overview
- FPGA Architecture Design
- Power Breakdown (90nm vs. 130nm)
- Architecture and Design for Low Power
- Commercial CAD for Low-Power
The k-Input LUT (e.g. k=4)
[Figure: 4-input LUT with inputs D, C, B, A and output Y, drawn as a tree of 2:1 multiplexers (select values 0/1) fed by configuration RAM bits (R) that hold the LUT-mask.]
LUT-mask: a’b’c’d’ + abcd + abc’d’ = 1000 0000 0000 1001 = 0x8009
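The LUT-mask arithmetic above can be checked with a short sketch. It assumes that input A is the least significant bit of the index into the mask. The slide does not state this ordering, but it is the one that reproduces 0x8009. The function names are illustrative only.

```python
def lut_mask(func, k=4):
    """Return the 2**k-bit LUT-mask of func as an integer.

    Bit i of the mask is func's output when the inputs, read as a binary
    number with input A as the least significant bit (an assumption),
    are equal to i.
    """
    mask = 0
    for index in range(2 ** k):
        # Unpack index into input bits: bits[0] is A, bits[1] is B, ...
        bits = [(index >> j) & 1 for j in range(k)]
        if func(*bits):
            mask |= 1 << index
    return mask


def slide_function(a, b, c, d):
    # a'b'c'd' + abcd + abc'd'
    return ((not a and not b and not c and not d)
            or (a and b and c and d)
            or (a and b and not c and not d))


def lut_eval(mask, a, b, c, d):
    """Look up one output bit, which is what the multiplexer tree selects."""
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (mask >> index) & 1


mask = lut_mask(slide_function)
print(f"{mask:016b} = 0x{mask:04X}")  # 1000000000001001 = 0x8009
```

The three minterms land on mask bits 0, 15 and 3, which is why only those bits are set.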
Basic Logic Element (LUT4)
Source: Stratix
- Features
- Area
- Speed
- Power
Hierarchy: LAB / Cluster / CLB
[Figure: LABs made up of LEs, with LAB lines, an H channel and a V channel.]
LAB-size? LUT-size? (Effects on area, speed, power, and layout)
Routing
[Figure: routing channels with wire labels 4, 8, 16 and 24.]
Wires:
- length
- width
- space
- muxing
LAB Interface
[Figure: LAB interface, input and output sides. Labels: LAB lines, local lines, LEs (LEA to LED), secondary signal generation, global signals, and H4, H24, V4 and V16 routing wires.]
Flexibility:
- #mux
- mux size
- stubbing
VPR Tool-set
[Figure: flow diagram. Benchmarks and an FPGA Arch. Spec feed VPR Place&Route, which reports Area/Speed/Power.]
[Betz, PhD]
Commercial Tools (Altera FMT)
Results:
[Figure: Average Channel Width (45 to 63) vs. V/H Channel Width Ratio (0.25 to 2.25); labeled points 0.5,2 / 1,1 / 2,0.5.]
[Figure: Routing Area per Tile (9.0E+03 to 2.1E+04) vs. Fraction Length 4 Wires (0 to 1); series Length 4/16 and Length 4/8.]
- Channel width changes with aspect ratio
- Even Ratio of Length 4, 8 Wires
- Heterogeneous wires reduce delay 25-30%
- LUT-size 4 (area) to 6 (speed)
[Betz, TVLSI 2000]
[Lewis, FPGA03] [Hutton, FPGA02]
[Lewis, FPGA05] [Hutton, FPL04]
[Leventis, CICC03]
Commercial Die
Power Breakdown
[Figure: Relative Power for 130 nm (Stratix) and 90 nm (Stratix II), each split into I/O, Static and Dynamic.]
130nm vs. 90nm operating power
[Figure: Power (W, 0 to 12) vs. Frequency (MHz, 0 to 100) for 90nm 2S30, 90nm 2S15 and 130nm 1S25 (DES core, ~10K LEs).]
Architecture Design Tradeoffs
- Threshold Voltage
- Oxide Thickness
- Channel W, L
- Transistor Sizing
- Duplicate Paths
- Buffer/Isolate
- Width, Spacing
- Clock Network Flexibility
- Power Gating, Multiple Vdd
- Dynamic vs. Static Circuits
- Process Steps
- Manufacturability
- Yield Risk
[Figure: the four competing axes are Cost, Performance, Power and Area.]
Core Dynamic Power
- Average over 112 Industrial (Customer) Designs
- Routing: 40%
- ALM Combinational: 23%
- ALM Registers: 16%
- RAM Blocks: 14%
- Clock Networks: 7%
- DSP Blocks: 1%*
* “5% or 0%” is more accurate
[Source: 90nm Stratix II]
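One way to read the breakdown is as a bound on what any single optimization can buy. The sketch below does that arithmetic. The percentages are from the slide; the 25% routing-power cut is a hypothetical scenario, not a result from the tutorial, and `total_saving` is an illustrative name.

```python
# Shares of core dynamic power from the slide (percent, 90nm Stratix II).
BREAKDOWN = {
    "Routing": 40,
    "ALM Combinational": 23,
    "ALM Registers": 16,
    "RAM Blocks": 14,
    "Clock Networks": 7,
    "DSP Blocks": 1,
}


def total_saving(component, component_reduction):
    """Fraction of core dynamic power saved when one component shrinks.

    component_reduction is the fractional cut within that component (0..1).
    """
    total = sum(BREAKDOWN.values())  # 101, because the slide's shares are rounded
    return BREAKDOWN[component] / total * component_reduction


# Hypothetical scenario: a 25% cut in routing power alone.
print(f"{total_saving('Routing', 0.25):.1%} of core dynamic power")
```

Even a large improvement in a small component such as the clock network moves the total by only a few percent, which is why routing is the natural first target.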
Dynamic Power Minimization
- Process Technology:
  - TSMC Black Diamond Low-k Dielectric (2.9 vs. 3.6 for FSG)
  - Reduces Metal Capacitance
    - ~14% Reduction in Dynamic Power
    - ~12% Performance Improvement
  - Standard On All TSMC 90nm Products
- I/O Region
  - Re-design to Reduce I/O Pin Capacitance
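A rough sketch of why a lower dielectric constant cuts dynamic power. It assumes dynamic power is proportional to switched capacitance at fixed voltage and frequency, and that only the metal share of that capacitance scales with k. Real wire capacitance does not scale exactly with k, so this is a back-of-envelope model, not the method behind the slide's ~14% figure. The `metal_fraction` values are hypothetical.

```python
K_FSG = 3.6    # dielectric constant of FSG (from the slide)
K_LOW_K = 2.9  # dielectric constant of the low-k dielectric (from the slide)


def dynamic_power_ratio(metal_fraction):
    """New/old dynamic power if only the metal share of capacitance scales with k.

    metal_fraction is the assumed share of switched capacitance that is
    metal (wire) capacitance; it is a free parameter of this sketch.
    """
    cap_scale = K_LOW_K / K_FSG
    return (1 - metal_fraction) + metal_fraction * cap_scale


for metal_fraction in (0.5, 0.75, 1.0):
    reduction = 1 - dynamic_power_ratio(metal_fraction)
    print(f"metal share {metal_fraction:.0%}: {reduction:.1%} lower dynamic power")
```

Under this model the saving can never exceed 1 - 2.9/3.6, about 19%, so the slide's ~14% is at least consistent with most of the switched capacitance being in metal.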
Heterogeneous RAM, DSP, Clock
[Figure: three embedded RAM block sizes: M512 Blocks (512+ bits), M4K Blocks (4Kb) and MegaRAM (512Kb). Captions: More Data Ports for Greater Memory Bandwidth; More Data Bits for Larger Memory Buffering.]
- Memory Packing/Mapping
- Programmable Clock Enables
New: Heterogeneous LE / LAB
- LAB Size 10-20, LUT-Size 4 for Area, Power
  - Low-cost Cyclone II has LAB-size 16, LUT-size 4
- LAB Size 12-16, LUT-Size 6 for Delay
  - But suffers on power and area
- Stratix II “Adaptable” Logic From 16x5 to 8x7
  - Allows critical path in 6 and 7 LUTs (10% of logic)
  - Remaining 90% logic in energy-preferred 4 and 5 LUTs
- Note: LAB-Sizing Very Layout-Dependent
ALM Configurations
[Figure: one ALM shown in each of its configurations.]
- 4-LUT + 4-LUT
- 6-LUT + 6-LUT
- 5-LUT + 4-LUT
- 5-LUT + 5-LUT
- 5-LUT + 3-LUT
- 6-LUT
- 7-LUT (1)
Stratix II ALM – High Level
[Figure: high-level ALM diagram; one annotation reads "disconnect".]
Static Power
[Figure: transistor cross-section (Source, Gate, Drain) with leakage paths numbered 1 to 3.]
1. Sub-Threshold Leakage (Dominant)
   - Increases Rapidly with Temperature
   - Highly Dependent on Process Variation
2. Gate Leakage (Still Smaller)
3. Reverse-Biased Junction Leakage (Very Small)
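The temperature sensitivity of sub-threshold leakage can be illustrated with the textbook exponential model, in which off-state current is proportional to exp(-Vt / (n·kT/q)). This is a simplified sketch: it ignores the temperature dependence of Vt itself and of the prefactor, both of which make real leakage rise faster still. The threshold voltage and slope factor `n` are illustrative assumptions, not values from the tutorial.

```python
import math

BOLTZMANN_OVER_Q = 8.617e-5  # k/q in volts per kelvin


def relative_subthreshold_leakage(vt, temp_c, n=1.5):
    """Off-state (Vgs = 0) leakage up to a constant factor.

    Uses exp(-Vt / (n * kT/q)); n is the sub-threshold slope factor
    (an assumed, illustrative value).
    """
    thermal_voltage = BOLTZMANN_OVER_Q * (temp_c + 273.15)
    return math.exp(-vt / (n * thermal_voltage))


VT = 0.30  # volts; illustrative, not from the slides
ratio = (relative_subthreshold_leakage(VT, 85)
         / relative_subthreshold_leakage(VT, 25))
print(f"85°C vs 25°C leakage ratio: {ratio:.1f}x")
```

The same exponential explains the process-variation bullet: a small shift in Vt changes leakage by a large factor.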
Raw Static Power Numbers
[Figure: Static Power (W, 0 to 6) vs. Logic Elements (Thousands, 0 to 200); curves for Typical Device, 25°C and Worst-Case Device, 85°C.]
Channel Length Variation
[Figure: Min and Max channel length drawn for a Short Gate and a Long Gate.]
- Short Gate: Large % Variation of Channel Length
- Long Gate: Small % Variation of Channel Length
Process Variation Impact On Leakage
[Figure: Relative Leakage (0 to 3) across Process Variation, from “High Vt, Increased L” through “Typical” Vt & L to “Low Vt, Reduced L”; series for Designed Channel Length (L): Short L and Long L.]
Over 40% Worst-Case Leakage Reduction
90nm Leakage Mitigation
- Multiple VT Transistors
  - High VT Off Critical Path (e.g. config) gives “easy” 10X Leakage Reduction
- Longer Channels for Most Transistors
  - Significant WC Leakage Reduction
  - Worst-Case Very Important for FPGAs due to speed binning
- Dual TOX
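How much a 10X per-transistor reduction helps the whole chip depends on how much of the leakage comes from transistors that can tolerate high VT. The sketch below works that out. The 10X factor is from the slide; the fractions tried are hypothetical, and `relative_leakage` is an illustrative name.

```python
def relative_leakage(high_vt_fraction, reduction_factor=10.0):
    """Total leakage relative to a design with no high-VT transistors.

    high_vt_fraction is the share of the original leakage that comes from
    transistors moved to high VT (assumed, not from the slides).
    """
    if not 0.0 <= high_vt_fraction <= 1.0:
        raise ValueError("high_vt_fraction must be between 0 and 1")
    return (1.0 - high_vt_fraction) + high_vt_fraction / reduction_factor


for fraction in (0.5, 0.8, 0.95):
    print(f"{fraction:.0%} of leakage moved to high VT -> "
          f"{relative_leakage(fraction):.3f} of original")
```

Configuration circuitry is a good candidate because it is off the critical path, so it can take the slower high-VT devices without a speed penalty.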
HardCopy II Leakage & Logic Power
[Figure: FPGA LEs next to HardCopy II HLEs.]
- FPGA: Programmable Routing
- HardCopy II: Custom Metal Routing (20K Less Routing Cap)
Quartus II CAD Optimizations
- Configuration Options
  - Power-Down Unused Branches of Clock Tree
  - Unused Devices Moved to Low-Leakage States
- Power-Driven Place&Route
  - Reduce Global IC For Active Nets
- Power-Driven Synthesis
  - Re-Arranging LUT-Masks, RAMs and Clustering
    - Absorb Active Nets, Reduce Toggling
  - Inference / Manipulation of Clock-Enables
    - Especially on Hard-Blocks
- Power Calculator and Modeling (Temp, Activity)
  - Measurement Is Key To Any Optimization Algorithms
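The activity-based modeling mentioned in the last bullet rests on the standard relation that each net dissipates 0.5 · C · Vdd² per transition. The sketch below is a minimal estimator built on that relation. It is not the Quartus II power calculator: the net names, capacitances, toggle rates and the 1.2 V supply are all made-up illustrative values.

```python
from dataclasses import dataclass


@dataclass
class Net:
    name: str
    capacitance_f: float   # switched capacitance in farads (hypothetical)
    toggle_rate_hz: float  # signal transitions per second (hypothetical)


def dynamic_power_w(nets, vdd):
    """Sum 0.5 * C * Vdd^2 * toggle_rate over all nets, in watts."""
    return sum(0.5 * net.capacitance_f * vdd ** 2 * net.toggle_rate_hz
               for net in nets)


VDD = 1.2  # volts; illustrative core supply
nets = [
    Net("clk_branch", 2e-12, 200e6),   # a 100 MHz clock toggles twice per cycle
    Net("data_bus_0", 1e-12, 12.5e6),
    Net("enable", 0.5e-12, 1e6),
]
total = dynamic_power_w(nets, VDD)

# Rank nets by their contribution: high-activity nets are the ones a
# power-driven placer would try to keep short.
for net in sorted(nets, key=lambda n: dynamic_power_w([n], VDD), reverse=True):
    print(f"{net.name:12s} {dynamic_power_w([net], VDD) * 1e6:8.2f} uW")
print(f"{'total':12s} {total * 1e6:8.2f} uW")
```

Ranking nets this way shows why accurate toggle rates matter: the optimization target is the product of capacitance and activity, not either one alone.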
E.g. Clock Shutdown
- Automatic In Place&Route
  - Fine Granularity (Nearly 800 Regions)
- Taking Advantage of Programmable Clock Network
[Figure: clock network. Blue: Clock Required. Only Red Parts of Clock Network Toggle.]
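The idea of shutting down unused clock branches can be sketched as a walk over the clock tree that keeps a branch powered only if some region beneath it contains a clocked element. The tree shape, node names and `required_branches` function below are hypothetical; the actual Quartus II algorithm and the device's clock topology are not described in the slides.

```python
def required_branches(tree, root, regions_with_loads):
    """Return the set of clock-tree nodes that must stay powered.

    tree maps each node to its list of child nodes; nodes without
    children are leaf clock regions. A node is needed if it is a leaf
    with clocked loads, or if any of its children is needed.
    """
    needed = set()

    def visit(node):
        children = tree.get(node, [])
        if not children:
            used = node in regions_with_loads
        else:
            # Build the full list first so every subtree is visited
            # (a bare generator would stop at the first used child).
            used = any([visit(child) for child in children])
        if used:
            needed.add(node)
        return used

    visit(root)
    return needed


# Hypothetical four-region device where only region r1 holds registers.
tree = {
    "root": ["west", "east"],
    "west": ["r0", "r1"],
    "east": ["r2", "r3"],
}
active = required_branches(tree, "root", regions_with_loads={"r1"})
print(sorted(active))  # ['r1', 'root', 'west']
```

Everything outside the returned set, here the whole east half plus region r0, can be held static, so its clock capacitance never toggles.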
E.g. RAM Slicing for Power (16x1024)
16 wide x 1k deep RAM
- Less Power Efficient: M4K: 4 wide x 1k deep (4 times)
  [Figure: Addr[0:9] drives all four blocks; each supplies 4 of the 16 bits of data[0:15].]
- More Power Efficient: M4K: 16 wide x 256 deep (4 times)
  [Figure: Addr[0:7] drives the blocks; a 2:4 Decoder on Addr[8:9] selects one block to supply Data[0:15].]
~27% Lower Power
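The difference between the two slicings is how many M4K blocks each access has to touch. The sketch below models only that block-enable behavior, assuming Addr[0] is the least significant address bit. It does not reproduce the ~27% figure, which depends on circuit details the slides do not give. All function names are illustrative.

```python
def blocks_active_width_sliced(address):
    """Four 4-wide x 1k-deep blocks: each holds 4 of the 16 data bits,
    so every access reads all four blocks regardless of address."""
    return [0, 1, 2, 3]


def split_address(address):
    """Return (block select from Addr[8:9], word offset from Addr[0:7])."""
    return (address >> 8) & 0b11, address & 0xFF


def blocks_active_depth_sliced(address):
    """Four 16-wide x 256-deep blocks: the 2:4 decoder on Addr[8:9]
    enables exactly one block per access."""
    block, _offset = split_address(address)
    return [block]


for address in (0, 255, 256, 1023):
    block, offset = split_address(address)
    print(f"addr {address:4d}: width-sliced enables "
          f"{len(blocks_active_width_sliced(address))} blocks, "
          f"depth-sliced enables block {block} at offset {offset}")
```

The depth-sliced form pays for a small decoder and an output multiplexer but clocks only a quarter of the memory on each access.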
Summary
- Power Breakdown
  - Dynamic Dominates at 90nm, Static Growing
- Architecture Enhancements
  - Logic & LAB Changes Had Significant Benefits
  - But Most Gains at 90nm From Process/Circuit
- Cost Tradeoff Is Key (Area, Yield, Risk)
  - 90nm Used Multiple VT, L and Lots of Device Tuning
  - Rejected Multiple Core VDD, TOX (for now)
- CAD
  - Early Techniques Help, But Lots To Do