
Power Mitigation For Nanometer FPGAs

(ISLPED 2005 Tutorial)

Mike Hutton Altera San Jose


© 2005 Altera Corporation, M. Hutton

Section Overview

- FPGA Architecture Design
- Power Breakdown (90nm vs. 130nm)
- Architecture and Design for Low Power
- Commercial CAD for Low-Power


The k-Input LUT (e.g. k=4)

[Figure: a 4-input LUT with inputs D, C, B, A and output Y, drawn as a tree of 2:1 multiplexers (select values 0/1) fed by configuration RAM bits (R) that hold the LUT-mask.]

LUT-mask example:

a'b'c'd' + abcd + abc'd' = 1000 0000 0000 1001 = 0x8009
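The LUT-mask arithmetic above can be checked with a short sketch. It assumes that input A is the least-significant bit of the LUT index and D the most-significant (consistent with the DCBA order on the slide); the function names `lut_mask` and `example` are illustrative and not part of any Altera tool.

```python
def lut_mask(func, k=4):
    """Return the 2**k-bit LUT-mask of a Boolean function of k inputs.

    Assumed bit ordering: input A is the least-significant index bit and
    D the most-significant, so mask bit i holds f for index i = DCBA.
    """
    mask = 0
    for index in range(2 ** k):
        # Unpack the index into input values, A first (LSB).
        inputs = [(index >> bit) & 1 for bit in range(k)]
        if func(*inputs):
            mask |= 1 << index
    return mask


def example(a, b, c, d):
    # a'b'c'd' + abcd + abc'd'
    return ((not a and not b and not c and not d)
            or (a and b and c and d)
            or (a and b and not c and not d))


mask = lut_mask(example)
print(f"{mask:016b} = 0x{mask:04X}")
```

Under that ordering the three minterms land on mask bits 0, 15 and 3, which reproduces the 0x8009 value on the slide.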

Basic Logic Element (LUT4)

[Figure: logic element schematic. Source: Stratix]

- Features
- Area
- Speed
- Power


Hierarchy: LAB / Cluster / CLB

[Figure: array of LABs between horizontal (H) and vertical (V) routing channels; each LAB contains LEs connected by LAB lines.]

LAB-size? LUT-size? (Effects on area, speed, power, and layout)

Routing

[Figure: routing wires of different lengths; labels 4, 8, 16 and 24.]

Wires:
- length
- width
- space
- muxing


LAB Interface

[Figure: LAB interface, input and output sides. Labels: LAB lines, local lines, LEs (LEA to LED), secondary signal generation, global signals, and routing wires H4, H24, V4, V16 and H,V.]

Flexibility:
- #mux
- mux size
- stubbing

VPR Tool-set

[Figure: flow in which an FPGA Arch. Spec and Benchmarks feed VPR Place&Route, which reports Area/Speed/Power.]

[Betz, PhD]


Commercial Tools (Altera FMT)

Results:

[Figure: Average Channel Width (45 to 63) vs. V/H Channel Width Ratio (0.25 to 2.25); labeled points 0.5,2; 1,1; 2,0.5. Channel width changes with aspect ratio.]

[Figure: Routing Area per Tile (9.0E+03 to 2.1E+04) vs. Fraction Length 4 Wires (0 to 1); curves Length 4/16 and Length 4/8.]

- Even Ratio of Length 4, 8 Wires
- Heterogeneous wires reduce delay 25-30%
- LUT-size 4 (area) to 6 (speed)

[Betz, TVLSI 2000] [Lewis, FPGA03] [Hutton, FPGA02] [Lewis, FPGA05] [Hutton, FPL04] [Leventis, CICC03]


Commercial Die

Power Breakdown

[Figure: Relative Power bars for 130 nm (Stratix) and 90 nm (Stratix II), each split into I/O, Static and Dynamic.]


130nm vs. 90nm operating power

[Figure: Power (W), 0 to 12, vs. Frequency (MHz), 0 to 100, for a DES core (~10K LEs) on 90nm 2S30, 90nm 2S15 and 130nm 1S25.]

Architecture Design Tradeoffs

Tradeoff axes: Cost, Performance, Power, Area

- Threshold Voltage
- Oxide Thickness
- Channel W,L
- Transistor Sizing
- Duplicate Paths
- Buffer/Isolate
- Width, Spacing
- Clock Network Flexibility
- Power Gating, Multiple Vdd
- Dynamic vs. Static Circuits
- Process Steps
- Manufacturability
- Yield Risk


Core Dynamic Power

- Average over 112 Industrial (Customer) Designs

Routing: 40%
ALM Combinational: 23%
ALM Registers: 16%
RAM Blocks: 14%
Clock Networks: 7%
DSP Blocks: 1%*

* “5% or 0%” is more accurate

[Source: 90nm Stratix II]

Dynamic Power Minimization

- Process Technology:
  - TSMC Black Diamond Low-k Dielectric (2.9 vs. 3.6 for FSG)
  - Reduces Metal Capacitance
    - ~14% Reduction in Dynamic Power
    - ~12% Performance Improvement
  - Standard On All TSMC 90nm Products
- I/O Region
  - Re-design to Reduce I/O Pin Capacitance
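The link between capacitance and dynamic power follows from the standard first-order switching-power model, P = activity x C x Vdd^2 x f. The sketch below is a minimal illustration under that textbook model; the model itself and every numeric value are assumptions for illustration, not figures from the slides.

```python
def dynamic_power(activity, capacitance_f, vdd, freq_hz):
    """First-order CMOS switching power: P = activity * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd ** 2 * freq_hz


# Illustrative values only (assumptions, not taken from the slides).
baseline = dynamic_power(activity=0.125, capacitance_f=1e-9,
                         vdd=1.2, freq_hz=100e6)

# Power is linear in C, so a 14% cut in effective switched capacitance
# gives a 14% cut in dynamic power at the same voltage and frequency.
reduced = dynamic_power(activity=0.125, capacitance_f=0.86e-9,
                        vdd=1.2, freq_hz=100e6)

saving = 1 - reduced / baseline
print(f"saving = {saving:.0%}")
```

The dielectric constant itself drops by about 19% (2.9 vs. 3.6), more than the ~14% power reduction quoted; one plausible reading, which is an assumption here, is that only part of the switched capacitance depends on the inter-metal dielectric.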


Heterogeneous RAM, DSP, Clock

[Figure: three embedded RAM block types: M512 Blocks (512+ bits), M4K Blocks (4Kb) and MegaRAM (512Kb). Callouts: "More Data Ports for Greater Memory Bandwidth" and "More Data Bits for Larger Memory Buffering".]

- Memory Packing/Mapping
- Programmable Clock Enables

New: Heterogeneous LE / LAB

- LAB Size 10-20, LUT-Size 4 for Area, Power
  - Low-cost Cyclone II has LAB-size 16, LUT-size 4
- LAB Size 12-16, LUT-Size 6 for Delay
  - But suffers on power and area
- Stratix II “Adaptable” Logic From 16x5 to 8x7
  - Allows critical path in 6 and 7 LUTs (10% of logic)
  - Remaining 90% logic in energy-preferred 4 and 5 LUTs
- Note: LAB-Sizing Very Layout-Dependent


ALM Configurations

[Figure: one ALM per configuration. LUT combinations shown: 4-LUT + 4-LUT; 5-LUT + 4-LUT; 5-LUT + 3-LUT; 5-LUT + 5-LUT; 6-LUT; 6-LUT + 6-LUT; 7-LUT (1).]

Stratix II ALM – High Level

[Figure: high-level block diagram of the Stratix II ALM, with one connection annotated "disconnect".]


Static Power

1. Sub-Threshold Leakage (Dominant)
   - Increases Rapidly with Temperature
   - Highly Dependent on Process Variation
2. Gate Leakage (Still Smaller)
3. Reverse-Biased Junction Leakage (Very Small)

[Figure: transistor cross-section (Source, Gate, Drain) with leakage paths 1, 2 and 3 marked.]
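The temperature and threshold sensitivity of sub-threshold leakage can be illustrated with the textbook exponential model. The sketch below is only that: the model form, the slope factor `n`, the prefactor `i0` and the threshold voltages are assumptions chosen for illustration, and the temperature dependence of `i0` and of Vth is ignored. None of these values come from the slides.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # electron charge, C


def subthreshold_leakage(vth, temp_c, i0=1.0, n=1.5):
    """Relative off-state (Vgs = 0) sub-threshold current.

    Textbook form I = i0 * exp(-Vth / (n * kT/q)); i0 and the slope
    factor n are illustrative assumptions, and the temperature
    dependence of i0 and Vth is ignored for simplicity.
    """
    thermal_voltage = K_B * (temp_c + 273.15) / Q_E
    return i0 * math.exp(-vth / (n * thermal_voltage))


cool = subthreshold_leakage(vth=0.30, temp_c=25)
hot = subthreshold_leakage(vth=0.30, temp_c=85)
low_vt = subthreshold_leakage(vth=0.25, temp_c=25)

print(f"85C vs 25C: {hot / cool:.1f}x")
print(f"Vth 0.25 V vs 0.30 V at 25C: {low_vt / cool:.1f}x")
```

Even this simplified model shows leakage growing by a multiple for a 60 degree rise or a 50 mV threshold shift, which is why both temperature and process variation matter so much for static power.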

Raw Static Power Numbers

[Figure: Static Power (W), 0 to 6, vs. Logic Elements (Thousands), 0 to 200; curves for Typical Device, 25°C and Worst-Case Device, 85°C.]


Channel Length Variation

[Figure: min and max channel length for a short gate and a long gate.]

- Short Gate: Large % Variation of Channel Length
- Long Gate: Small % Variation of Channel Length

Process Variation Impact On Leakage

[Figure: Relative Leakage (0 to 3) vs. Process Variation, from "High Vt, Increased L" through “Typical” Vt & L to "Low Vt, Reduced L", with curves for Short L and Long L Designed Channel Length (L). Annotation: Over 40% Worst-Case Leakage Reduction.]


90nm Leakage Mitigation

- Multiple VT Transistors
  - High VT Off Critical Path (e.g. config) gives “easy” 10X Leakage Reduction
- Longer Channels for Most Transistors
  - Significant WC Leakage Reduction
  - Worst-Case Very Important for FPGAs due to speed binning
- Dual TOX

HardCopy II Leakage & Logic Power

[Figure: FPGA LEs connected by programmable routing vs. HardCopy II HLEs connected by custom metal routing.]

- FPGA: Programmable Routing
- HardCopy II: Custom Metal Routing (20K Less Routing Cap)


Quartus II CAD Optimizations

- Configuration Options
  - Power-Down Unused Branches of Clock Tree
  - Unused Devices Moved to Low-Leakage States
- Power-Driven Place&Route
  - Reduce Global IC For Active Nets
- Power-driven synthesis
  - Re-Arranging LUT-Masks, RAMs and Clustering
    - Absorb Active Nets, Reduce Toggling
  - Inference / Manipulation of Clock-Enables
    - Especially on Hard-Blocks
- Power Calculator and Modeling (Temp, Activity)
  - Measurement Is Key To Any Optimization Algorithms

E.g. Clock Shutdown

- Automatic In Place&Route
  - Fine Granularity (Nearly 800 Regions)
- Taking Advantage of Programmable Clock Network

[Figure: clock network on the device. Blue: Clock Required. Only Red Parts of Clock Network Toggle.]
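The idea of powering down unused clock branches can be sketched as a tree pruning pass: a branch keeps toggling only if some clock region below it contains used logic. This is a hypothetical illustration of the concept, not the Quartus II algorithm; the tree shape, node names and the `active_branches` function are all assumptions.

```python
def active_branches(tree, used_regions, root="root"):
    """Return the set of clock-tree nodes that must keep toggling.

    tree maps each node to its list of children; leaves are clock
    regions. A branch stays on only if some region below it is used.
    """
    active = set()

    def visit(node):
        children = tree.get(node, [])
        if not children:                      # leaf = clock region
            needed = node in used_regions
        else:
            # Visit every child (no short-circuit) so all are classified.
            needed = any([visit(child) for child in children])
        if needed:
            active.add(node)
        return needed

    visit(root)
    return active


# Hypothetical 4-region clock tree; names are illustrative only.
tree = {
    "root": ["left", "right"],
    "left": ["r0", "r1"],
    "right": ["r2", "r3"],
}
on = active_branches(tree, used_regions={"r0"})
print(sorted(on))
```

With only region `r0` in use, the `right` subtree and region `r1` never appear in the active set, so their clock segments could be shut down.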


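E.g. RAM Slicing for Power (16x1024)

Target: 16 wide x 1k deep RAM

- Less Power Efficient: M4K: 4 wide x 1k deep (4 times)
- More Power Efficient: M4K: 16 wide x 256 deep (4 times)
- ~27% Lower Power

[Figure: in the width-sliced version, Addr[0:9] drives all four blocks, which together supply data[0:15]. In the depth-sliced version, Addr[0:7] drives the blocks and a 2:4 Decoder on Addr[8:9] selects the block that supplies Data[0:15].]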
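The difference between the two slicings can be sketched by listing which blocks are enabled on each access. This is a simplified model, not a power estimate: it assumes bits 8 and 9 are the high address bits feeding the 2:4 decoder, and the function names are illustrative. It does not reproduce the ~27% figure, which is a measured result from the slides.

```python
def width_sliced(addr):
    """4 blocks of 4 wide x 1k deep: every block is read on each access."""
    assert 0 <= addr < 1024
    # (block index, address inside that block) for every enabled block
    return [(block, addr) for block in range(4)]


def depth_sliced(addr):
    """4 blocks of 16 wide x 256 deep: a 2:4 decode enables one block."""
    assert 0 <= addr < 1024
    block = addr >> 8          # two high address bits drive the decoder
    row = addr & 0xFF          # eight low address bits go to the block
    return [(block, row)]


addr = 0x2A7
print(len(width_sliced(addr)), "blocks enabled when slicing by width")
print(len(depth_sliced(addr)), "block enabled when slicing by depth:",
      depth_sliced(addr))
```

Slicing by depth touches one block per access instead of four, at the cost of the decoder and an output multiplexer, which is the intuition behind the lower power.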

Summary

- Power Breakdown
  - Dynamic Dominates at 90nm, Static Growing
- Architecture Enhancements
  - Logic & LAB Changes Had Significant Benefits
  - But Most Gains at 90nm From Process/Circuit
- Cost Tradeoff Is Key (Area, Yield, Risk)
  - 90nm Used Multiple VT, L and Lots of Device Tuning
  - Rejected Multiple Core VDD, TOX (for now)
- CAD
  - Early Techniques Help, But Lots To Do