Lecture 16: Power Reduction Techniques November 5, 2013 ECE 636 Reconfigurable Computing Lecture 16...
Lecture 16: Power Reduction Techniques November 5, 2013
ECE 636
Reconfigurable Computing
Lecture 16
Power Reduction Techniques for FPGAs
Overview
• FPGAs generally considered power hungry compared to ASIC and processor counterparts
- Mostly due to unused interconnect
• Recent area of extensive research
• Device techniques
- Voltage scaling
- Sleep mode
• Software techniques
- Reduced switching
- Reduced capacitance
Dynamic Power
° Dynamic power is required to charge and discharge load capacitances when transistors switch.
° One cycle involves a rising and falling output.
° On rising output, charge Q = C * VDD is required
° On falling output, charge is dumped to GND
[Figure: supply current iDD(t) drawn from VDD by a load C switching at frequency fsw; the short-circuit current and the charge/discharge current are marked. Courtesy: Harris]
$$P_{\text{dynamic}} = \frac{1}{T}\int_0^T i_{DD}(t)\,V_{DD}\,dt = \frac{V_{DD}}{T}\int_0^T i_{DD}(t)\,dt = \frac{V_{DD}}{T}\left[T f_{sw} C V_{DD}\right] = C V_{DD}^2 f_{sw}$$
Short circuit power <10% of dynamic power
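As a rough numeric illustration of the dynamic power expression, the sketch below evaluates P = C * VDD^2 * fsw and shows the quadratic payoff of lowering the supply. The operating point (10 pF, 1.5 V, 100 MHz) is made up for illustration and does not come from the lecture.

```python
def dynamic_power(c_farads, vdd_volts, f_sw_hz):
    """Average dynamic power in watts: P = C * VDD^2 * f_sw."""
    return c_farads * vdd_volts ** 2 * f_sw_hz

# Hypothetical operating point, chosen only for illustration.
C = 10e-12     # switched capacitance in farads
VDD = 1.5      # supply voltage in volts
F_SW = 100e6   # switching frequency in hertz

p_nominal = dynamic_power(C, VDD, F_SW)
p_scaled = dynamic_power(C, 0.8 * VDD, F_SW)  # supply lowered by 20%

print(f"nominal: {p_nominal * 1e3:.3f} mW")
print(f"scaled:  {p_scaled * 1e3:.3f} mW "
      f"({p_scaled / p_nominal:.0%} of nominal)")
```

Because power depends on the square of VDD, a 20% supply reduction leaves roughly 64% of the original dynamic power, which is why voltage scaling along non-critical paths is attractive.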
FPGA Static Power Consumption
° Junction leakage
• Small fraction of leakage
° Gate oxide leakage
• Increases exponentially as Vgs increases
° Subthreshold leakage
• When Vgs < Vt, there is still some source-drain current
• Increases exponentially as Vt decreases
• Decreases exponentially as Vgs decreases
[Figure: technology trend. Courtesy: Nowak]
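The exponential dependence of subthreshold leakage on Vt and Vgs can be sketched with the standard first-order model I ~ exp((Vgs - Vt) / (n * vT)). The slope factor n = 1.5 and the thermal voltage of 25.85 mV (about 300 K) are assumed values, not figures from the lecture.

```python
import math

THERMAL_VOLTAGE = 0.02585  # volts at about 300 K (assumed)

def subthreshold_ratio(delta_vgs=0.0, delta_vt=0.0, n=1.5):
    """Factor by which subthreshold current changes for shifts in Vgs and Vt.

    First-order model: I ~ exp((Vgs - Vt) / (n * vT)).
    The slope factor n is an assumed, process-dependent value.
    """
    return math.exp((delta_vgs - delta_vt) / (n * THERMAL_VOLTAGE))

lower_vt = subthreshold_ratio(delta_vt=-0.1)    # Vt lowered by 100 mV
lower_vgs = subthreshold_ratio(delta_vgs=-0.1)  # Vgs lowered by 100 mV

print(f"Vt  -100 mV: leakage x{lower_vt:.1f}")
print(f"Vgs -100 mV: leakage x{lower_vgs:.3f}")
```

Under these assumptions a 100 mV drop in Vt raises leakage by roughly an order of magnitude, which is the motivation for using high-Vt transistors where speed is not critical.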
FPGA Power Reduction Goals
• Dynamic power goals
- Reduce Vdd along non-critical paths
- Low swing signalling
- Use CAD approaches to limit long high-toggle paths
- Pdynamic = 0.5 * C * Vdd^2 * f (f = transition rate; one full charge/discharge cycle per period at fsw gives C * Vdd^2 * fsw)
• Static power goals
- Cut-off Vdd for unused transistors
- Use high Vt transistors for SRAM cells
- Various other voltage biasing techniques
Traditional Routing Switch
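[Figure: traditional routing switch. SRAM configuration cells (S) select among the inputs i1..in of a multiplexer (MUX); the internal node VINT feeds a level-restoring buffer (transistors MP1, MP2) that drives OUT. Courtesy: Anderson]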
Proposed Switch Designs: Anderson
° Based on 3 observations:
• Routing switch inputs are tolerant to weak-1 signals (level-restoring buffers).
• There is considerable slack in FPGA designs: many switches can be slowed down.
• Most routing switches feed other routing switches.
- Can produce weak-1 logic signals.
“Basic” Switch Design
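Mode operation:
• high-speed: MNX & MPX ON
• low-power: MNX ON, MPX OFF
• sleep: MNX OFF, MPX OFF
[Figure: basic switch design. SRAM configuration cells (S) select among MUX inputs i1..in; the stage driving OUT is supplied from a virtual rail VVD, connected to VDD through transistors MNX and MPX. Signal labels: ~SLEEP, LOW_POWER v SLEEP, LOW_POWER, ~LOW_POWER.]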
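The three operating modes can be sketched as a small truth table. The gate assignment used here (NMOS MNX gated by ~SLEEP, PMOS MPX gated by LOW_POWER v SLEEP) is read off the figure labels and should be treated as an assumption; the voltages are illustrative placeholders.

```python
def basic_switch(sleep, low_power, vdd=1.0, vth=0.3):
    """Return (mode, mnx_on, mpx_on, vvd) for the "basic" switch.

    Assumed from the figure labels: NMOS MNX is gated by ~SLEEP and
    PMOS MPX is gated by (LOW_POWER v SLEEP). vdd and vth are
    hypothetical values used only for illustration.
    """
    mnx_on = not sleep                 # NMOS conducts when its gate is high
    mpx_on = not (low_power or sleep)  # PMOS conducts when its gate is low
    if mnx_on and mpx_on:
        return "high-speed", mnx_on, mpx_on, vdd        # VVD = VDD
    if mnx_on:
        return "low-power", mnx_on, mpx_on, vdd - vth   # VVD = VDD - VTH
    return "sleep", mnx_on, mpx_on, None                # VVD cut off from VDD

for sleep in (False, True):
    for low_power in (False, True):
        print(sleep, low_power, basic_switch(sleep, low_power))
```

In low-power mode only the NMOS device connects the virtual rail to VDD, so the rail sits one threshold drop below the supply and the output swing shrinks accordingly.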
High-Speed Mode
• MNX and MPX ON
• VVD = VDD
• Output swing: rail-to-rail
[Figure: basic switch design in high-speed mode.]
Low-Power Mode
• MNX ON, MPX OFF
• VVD = VDD - VTH
• Output swing: GND-to-(VDD-VTH)
[Figure: basic switch design in low-power mode.]
Sleep Mode
• MNX OFF, MPX OFF
[Figure: basic switch design in sleep mode.]
Leakage Power Results: Anderson
[Figure: bar chart of % leakage power reduction vs. high-speed mode (axis 0 to 70) for the "Basic" switch design. Categories: LP mode, Sleep mode, LP mode (+unused fanout), LP mode (+used fanout), Traditional switch. Bar labels in extraction order: 36, 60.8, 39.7, 38.7, 0.3; the pairing of labels with categories is not recoverable from the transcript.]
Region Constrained Placement
• Rather than just focusing on routing, consider constraining logic
• Most circuits exhibit locality
• Gayasen: FPGA’2004
Region Constrained Placement
• Several issues to consider
• Size of sleep transistor
- Too large: increases leakage, area
- Too small: affects logic performance
• Size of region
- Too large: possibly unused resources, complicates placement
- Too small: Sleep transistors take up too much room
Experimental Flow: RCP
• Different region sizes considered for flow
• Area constraints for portions of design determined by hand
• May encourage designers to create granular designs
Power Savings: RCP
• Note significant reduction in leakage power savings as region size increases
• Bottom curve primarily due to luck
Performance Limitation: RCP
• Performance limited by use of regions
• Nearly 10% clock frequency reduction for many designs
Low-swing Signalling
• Techniques we have examined so far look at tinkering with supply voltage
• Also possible to modify wire signalling to reduce voltage swing
• Most of FPGA is made up of interconnect
• Approach targets dynamic power consumption
George and Rabaey: 1997
Low-swing Signalling
• Interconnect swing is at 0.8V while rest of circuit operates at 1.5V
• Cascode circuitry used at sink to overcome slow speed issues
• 50% energy savings at cost of 25% delay
Alternate approach: Modifying FPGA CAD
• FPGA architecture modifications impact all designs, even those that don’t care about power
• Can placement and routing be modified to consider dynamic power?
- Need to know which signals are high toggle
- Attempt to minimize length of high-toggle wires
- Minimize impact on performance and area
• Techniques fit well into our previous work on placement and routing
Lamoureux and Wilton
Modifying FPGA CAD Placement
• Previous cost metrics for annealing considered bounding box wire length and timing costs
• Include additional term which considers signal switching activity
FPGA Placement for Power
• Post-route energy reduced by 3.0%. Power decreased by 7% but delay increases by 4%
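A minimal sketch of how a switching-activity term might be added to an annealing cost follows. The slides do not give the exact formula, so the net representation, the weights `w_timing` and `w_power`, and the use of activity times bounding-box length as the power term are all assumptions. Production placers typically also normalize each term by its previous value, which is omitted here.

```python
def bounding_box(pins):
    """Half-perimeter bounding-box length of a net given (x, y) pin locations."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def placement_cost(nets, timing_cost=0.0, w_timing=0.5, w_power=0.5):
    """Wire length + weighted timing + weighted activity-aware power term.

    Each net is a dict with "pins" and "activity" (hypothetical format).
    """
    wire = sum(bounding_box(n["pins"]) for n in nets)
    power = sum(n["activity"] * bounding_box(n["pins"]) for n in nets)
    return wire + w_timing * timing_cost + w_power * power

# Two placements with identical total wire length: in the second one the
# high-toggle net is the short one.
long_net_is_active = [
    {"pins": [(0, 0), (4, 3)], "activity": 0.9},
    {"pins": [(1, 1), (2, 1)], "activity": 0.1},
]
short_net_is_active = [
    {"pins": [(0, 0), (4, 3)], "activity": 0.1},
    {"pins": [(1, 1), (2, 1)], "activity": 0.9},
]
print(placement_cost(long_net_is_active))
print(placement_cost(short_net_is_active))
```

A wire-length-only cost cannot tell these two placements apart; the activity term is what steers the annealer toward keeping high-toggle nets short.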
FPGA Routing Modifications for Power
• Original routing cost function takes congestion b(n) and delay(n) into account
• Augment with factor that takes net activity into account
• Minimize length of most active nets, even in the presence of congestion.
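One plausible shape for an activity-aware routing cost is sketched below. Only the congestion term b(n) and delay(n) are named in the slides; the capacitance term, the timing criticality, and the activity criticality `act_crit` are assumptions introduced for illustration, and the blend shown is not claimed to be the formulation used by the authors.

```python
def routing_node_cost(delay, congestion, capacitance, crit, act_crit):
    """Cost of using routing node n for one net (illustrative form).

    crit     : timing criticality of the net, in [0, 1] (assumed)
    act_crit : activity criticality of the net, in [0, 1] (assumed)
    Timing-critical nets weight delay(n); the remainder is split between
    capacitance (for active nets) and congestion b(n) (for quiet nets).
    """
    non_timing = act_crit * capacitance + (1.0 - act_crit) * congestion
    return crit * delay + (1.0 - crit) * non_timing

# Hypothetical candidates: A is short (low capacitance) but congested,
# B is longer but uncongested.
node_a = dict(delay=1.0, congestion=4.0, capacitance=1.0)
node_b = dict(delay=1.0, congestion=1.0, capacitance=3.0)

def best_node(act_crit, crit=0.2):
    cost_a = routing_node_cost(crit=crit, act_crit=act_crit, **node_a)
    cost_b = routing_node_cost(crit=crit, act_crit=act_crit, **node_b)
    return "A" if cost_a < cost_b else "B"

print("high-activity net picks", best_node(0.9))
print("low-activity net picks", best_node(0.1))
```

With these made-up numbers the high-activity net accepts the congested but short resource, while the quiet net is pushed onto the longer, uncongested one.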
FPGA Routing for Power Results
• Potential benefits somewhat limited by placement
• Note that most nets have low activity
• Power is decreased by 6% but delay increased by 4%. Energy savings of about 3%
FPGA Embedded Memory Blocks
° Embedded memory blocks (EMBs) are important parts of FPGAs
° Consume roughly 14% of Altera Stratix II dynamic power *
• Increasing in recent designs
* Stratix II Low Power Applications Note, 2005
Embedded Memory Block Port Internal View
[Figure: internal view of an embedded memory block port. Labels: Write Data, Write Enable, Address, Read Enable Latch, Clk, Clk Enable, MClk, Row Decode, Column Mux, Write Buffers, Sense Amps, Bit Line Pre-charge, RAM cell (BIT lines), Read Data.]
Reducing clocking saves dynamic power
Power Optimization #1
° Convert EMB read enable/write enable signals to associated read/write clock enable signals
° Limitations
• Each port has read or write enable control signal
• Embedded memory block has read enable input
[Figure: Before and After views of an EMB with Clock, Data, Write Address, Read Address, Wren, Rden and Q. Before: Wren and Rden drive the write enable and read enable inputs, and the write and read clock enables are tied to Vcc. After: Wren and Rden drive the write and read clock enables, and the write enable and read enable inputs are tied to Vcc.]
Implementation
° Conversion mode
• Ties off R/W enable to RAM clock enables
• Doesn’t make transform if CE already present on port
° Combining mode
• AND user RAM clock enables with derived R/W clock
• Could impact performance
[Figure: AND gate. Inputs: Write Enable, User-defined Write Clk Enable. Output: Combined Write Clk Enable.]
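The two modes can be sketched as a small function that returns what drives the port's enable pin and clock-enable pin after the transformation. The function name, the `mode` argument, and the convention that `None` means "no user clock enable on this port" are hypothetical; tying the enable pin to Vcc is read from the figure labels.

```python
def transform_port(enable, user_clk_enable=None, mode="combine"):
    """Return (enable_pin, clk_enable_pin) after the optimization.

    enable          : the user's read or write enable for this cycle
    user_clk_enable : the user's clock enable, or None if the port has none
    mode            : "convert" or "combine" (hypothetical names)
    """
    if user_clk_enable is None:
        # Conversion: the enable moves to the clock enable and the
        # enable pin is tied high (Vcc).
        return True, enable
    if mode == "combine":
        # Combining: AND the user clock enable with the derived one.
        return True, enable and user_clk_enable
    # Conversion mode leaves a port that already has a CE untouched.
    return enable, user_clk_enable

def port_accesses(enable_pin, clk_enable_pin):
    """The port performs an access only when it is clocked and enabled."""
    return enable_pin and clk_enable_pin

print(transform_port(False))                       # clock gated off
print(transform_port(True, user_clk_enable=True))  # combined enable high
```

The access behaviour is unchanged in every case; the difference is that the memory clock is now gated whenever the enable is low, which is where the dynamic power saving comes from.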
FPGA RAM Processing
° FIFOs and Shift registers converted into logical RAMs
° Logical RAMs mapped to RAM blocks
[Flow: FIFO, Shift Register, RAM specification -> Create Logical Memory -> Logical RAMs/logic -> Logical-to-physical RAM processing -> RAM blocks/logic -> Memory/logic placement -> Placed Memory]
Mapping RAM to EMBs
° Implementation choice can impact design area, performance, and power.
° Some mappings may require multiple EMBs
[Figure: a user-defined (logical) memory, 4K deep x 4 wide (16K bits), and the physical (EMB) memories it can map to: four M4K blocks of 4K bits each, or a 512K MRAM.]
Memory Organization
° Each EMB can be configured to have different depth and width (e.g. Stratix II M4K)
° All hold 4K bits
° Slightly lower power consumption for wider EMB configurations (not including routing)
• 4K words deep, 1 bit wide
• 512 words deep, 8 bits wide
• 128 words deep, 32 bits wide
Area and Delay Optimal Mapping
° Configure each EMB to be as deep as possible
° Number of address bits on each EMB same as on logical memory
° Area and performance efficient: no external logic needed
° Power inefficient: All EMBs must be active during each logical RAM access
[Figure: vertical slicing. The logical memory (4K words deep and 4 bits wide, Addr[0:11], Data[0:3]) maps to four EMBs, each 4K words deep and 1 bit wide; all 4 EMBs are active during an access.]
Alternative Mapping
° Configure EMB to have width of logical RAM (e.g. 1Kx4)
• Allows shutdown of some RAMs each cycle
• But adds some logic
° Saves RAM power, adds combinational logic and register power
[Figure: horizontal slicing (more power efficient). The logical memory (4K words deep and 4 bits wide) maps to four EMBs, each 1K deep x 4 wide. Addr[0:9] goes to every EMB, an address decoder on Addr[10:11] selects one of the 4, and Data[0:3] is shared; 1 EMB is active during an access.]
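The difference between the two mappings can be sketched by counting EMBs for a given EMB depth configuration. The helper below is illustrative only: it assumes 4K-bit EMBs that can be configured to any power-of-two depth, ignores parity bits, and does not model the power of the added decode and multiplexer logic.

```python
def slice_logical_ram(depth, width, emb_depth, emb_bits=4096):
    """Map a depth x width logical RAM onto EMBs configured emb_depth deep.

    Returns total EMBs, EMBs active per access, and the number of upper
    address bits that must be decoded outside the EMBs.
    """
    emb_width = emb_bits // emb_depth
    columns = -(-width // emb_width)   # ceiling division: EMBs side by side
    rows = -(-depth // emb_depth)      # ceiling division: EMBs stacked in depth
    return {
        "embs": rows * columns,
        "active_per_access": columns,  # only one row is enabled at a time
        "decode_addr_bits": (rows - 1).bit_length(),
    }

vertical = slice_logical_ram(4096, 4, emb_depth=4096)    # 4K x 1 EMBs
horizontal = slice_logical_ram(4096, 4, emb_depth=1024)  # 1K x 4 EMBs
print("vertical:  ", vertical)
print("horizontal:", horizontal)
```

Both mappings use four EMBs for the 4K x 4 memory, but the shallower configuration enables only one per access at the cost of decoding two address bits in external logic.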
RAM Slicing - Example
° Power reduction available with different slicing
[Figure: 4kx32 dynamic power. Y axis: dynamic power (mW), 0 to 140. X axis: maximum depth (128, 256, 512, 1k, 2k, 4k). Annotations: "Multiplexer Power Increasing", "EMB Power Increasing", "Best range".]
Power Optimization #2: Power-aware RAM Partitioning
° Algorithm considers possible logical to physical RAM mappings
[Flow: FIFO, Shift Register -> Create Logical Memory -> Power-aware Physical RAM processing (with Power Library input; Insert Decode and Mux Logic) -> Memory/Logic Placement -> Completed placement]
Experimental Approach
° 40 designs evaluated
° Quartus 5.1
° Mapped to smallest possible device and target max frequency
° Simulation with test vectors
° Power analysis with PowerPlay
Memory Power
° 21.0% average reduction for all techniques (9.7% with convert/combine)
[Figure: % dynamic power reduction (axis -10 to 80) for each of the designs (x-axis ticks 1 to 39). Series: Enable convert/combine; Enable convert/combine + Mem partition.]
Overall Core Dynamic Power
° 6.8% average power reduction for all techniques (2.6% with convert/combine)
[Figure: % dynamic power reduction (axis -5 to 35) for each of the designs (x-axis ticks 1 to 39). Series: Enable convert/combine; Enable convert/combine + Mem partition.]
Design Performance
° 1.0% average performance loss for all techniques (0.1% for enable convert/combine)
[Figure: Average Design Clock Frequency. % frequency improvement (axis -30 to 10) per design. Series: Enable convert/combine; Enable convert/combine + Mem partition.]
Results Summary
° Almost 7% core dynamic power reduction across all designs
• Some designs benefit more than others
° Minimal clock frequency hit for most designs
                        Enable     Enable            Enable convert/combine
                        convert    convert/combine   + Mem partition
Core dynamic power      -1.8%      -2.6%             -6.8%
Memory dynamic power    -6.3%      -9.7%             -21.0%
Max clk freq            -0.1%      -0.2%             -1.0%
LUT count                0.0%       0.1%              0.7%
Impact of Multiple Embedded Memory Blocks
° Rerun 40 designs but only allow one type of target EMB for each mapping
° All designs targeted to Stratix II EP2S180
° Significant power impact for most designs versus EP2S180 target with no restrictions
                      M512      M4K      M-RAM
Designs completed     23        38       4
Core dynamic power    40.4%     6.6%     47.3%
Memory power          279.5%    33.3%    754.0%
Max clk freq.         -2.2%     0.6%     -1.0%
LUT count             0.4%      -0.5%    0.0%
Summary
° Key to reducing RAM power is keeping clocks disabled.
° Movement of read/write enables to clock enables limits dynamic activity
° Power-aware RAM partitioner attempts to select power-optimal mapping – combined with clock enable enhancement
° Overall
• About 21% average memory power reduction
- 10% enable convert/combine
• About 7% average dynamic power reduction
- 3% enable convert/combine
• Diversity of EMBs reduces power by 33%
Summary
• FPGA power consumption under consideration at numerous levels: architecture, circuit, CAD, and physical
• FPGA companies just now embracing power-aware CAD, power-aware architectures on the way
• Many circuit-level techniques still possible
• RTL CAD synthesis techniques provide a promising area for exploration