CiEN Seminar, May 26 2011, Paris 1
Low Power Optimization Techniques
Pr. Amara AMARA
Institut Supérieur d’Electronique de Paris

OUTLINE
  Introduction
  Power Components
  Dynamic Power Optimization
  Leakage Power Optimization
  Dynamic Power Management
  Conclusions

Motivations for Low Power
  Enables developers to address more power-constrained or thermally restrictive markets.
  Gives developers more freedom to increase capabilities within the same thermal and power budget.
  Lowers operational and material costs and results in more compact products.
  Relaxes stringent cooling requirements.
  Provides social responsibility benefits.
Component suppliers must provide developers and manufacturers with the best options to save energy.

ION and IOFF Trends (ITRS Roadmap)
[Figure: Ioff trends (ITRS) versus technology node (nm), log scale, for LOP, LSTP and HP transistors]
[Figure: Ion/Ioff trends (ITRS) versus technology node (nm), log scale, for the three transistor types (LOP, LSTP, HP)]

Introduction: Low Power Technology
  Process: Multi VTH; SOI; Material; Multi-gate (FinFET, …)
  Hardware: Low-power circuits; Operand isolation; Clock-gating; Voltage and frequency scaling; Asynchronous design; Power gating; Body bias; Stacked transistors
  Architecture: Parallelism; Memory architecture; Cache partitioning; Thermal monitor
  Software: Power aware OS; Compilers; Memory access

OUTLINE
  Introduction
  Power Components
    Dynamic Power
    Leakage Power
  Dynamic Power Optimization
  Leakage Power Optimization
  Dynamic Power Management
  Conclusions

Dynamic Power
$P = \alpha \, C_L \, V_S \, V_{DD} \, f_{CLK}$
  $C_L V_S V_{DD}$ is the energy consumed per cycle: $E = Q \, V_{DD}$ with $Q = C_L V_S$.
The switching activity α is the average fraction of the nodes in the whole chip that actually toggle 0->1.
  It includes glitches (spurious activity).
  It increases dramatically with pipelining.
CL is the total equivalent capacitance.
  It includes both gate and wire capacitance.
  It is an average capacitance (capacitances vary with biasing, crosstalk, …).
[Figure: load capacitance CL charged from the supply VDD]
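As a rough illustration of the dynamic power expression above, the sketch below evaluates it in Python. The function name and all numeric values (activity, capacitance, voltages, frequency) are hypothetical and chosen only to show the orders of magnitude and the effect of lowering the supply; they do not come from the slides.

```python
def dynamic_power(alpha, c_load, v_swing, v_dd, f_clk):
    """Dynamic power in watts: P = alpha * CL * VS * VDD * fCLK.

    alpha   -- switching activity (fraction of nodes toggling 0->1)
    c_load  -- total equivalent capacitance CL, in farads
    v_swing -- voltage swing VS, in volts
    v_dd    -- supply voltage VDD, in volts
    f_clk   -- clock frequency fCLK, in hertz
    """
    return alpha * c_load * v_swing * v_dd * f_clk


# Hypothetical chip: alpha = 0.1, CL = 1 nF, full swing (VS = VDD), 100 MHz.
p_nominal = dynamic_power(0.1, 1e-9, 1.2, 1.2, 100e6)
p_scaled = dynamic_power(0.1, 1e-9, 0.9, 0.9, 100e6)

print(f"1.2 V: {p_nominal * 1e3:.1f} mW")   # 14.4 mW
print(f"0.9 V: {p_scaled * 1e3:.1f} mW")    # 8.1 mW
```

With full swing, the power scales with the square of the supply, which is why supply reduction is usually the first lever considered.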

Dynamic Power Reduction
Lowering switching probability (α)
  Gated clock, conditional F/F, low transition coding
Lowering load capacitance (CL)
  Embedded memory, gate sizing, low-k
Lowering supply voltage (VS, VDD)
  Most effective (∝ VDD^2) and popular, but at the cost of speed degradation
  VTH should also be lowered for high-speed circuit operation
Lowering operating frequency (fCLK)
  Better algorithm, parallelism

Leakage Components
Leakage current
  Current that flows due to the inability to fully shut off a transistor
  Transistors are harder to turn off with a smaller Vth
Standby leakage
  Leakage that occurs when the overall circuit is sleeping (example: cell phone in sleep mode)
Active leakage
  Leakage that occurs while the overall circuit is operating (example: the FPU while the PC performs integer operations)

Leakage Current Trends
[Figure: MOS transistor (gate, source, drain, bulk) annotated with the leakage components Isub, IG, IGIDL, IPT and IR]
Subthreshold leakage (Isub) and gate leakage (IG) are the dominant components in current technologies.

OUTLINE
  Introduction
  Power Components
  Dynamic Power Optimization
    Clock Gating
    Encoding
    Dual Vdd, DVS, DVFS
  Leakage Power Optimization
  Dynamic Power Management
  Conclusions

Clock Gating Principle
Goal: prevent transitions from propagating to parts of the clock path (FFs, clock network and logic) under a given IDLE condition.
Principle: each sequential functional unit is associated with a block CG that inhibits the clock signal when the IDLE condition is true. The IDLE condition is computed by a function Fcg.

Clock Gating Implementation
Single gate: the simplest way to implement block CG, but subject to spikes.
Latch plus AND gate: when CLK is low, spikes are filtered by the AND; when CLK is high, spikes are filtered by the latch.
Flip-Flop-Based Design
Latch plus NOR gate: when CLK is high, spikes are filtered by the NOR; when CLK is low, spikes are filtered by the latch.
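The spike-filtering behaviour of the latch plus AND arrangement can be checked with a small behavioural model. This is only a sketch under stated assumptions: signals are sampled 0/1 sequences, the latch is taken to be transparent while CLK is low, and the function names and waveforms are hypothetical (the original schematics are not recoverable from the transcript).

```python
def gated_clock_plain_and(clk, enable):
    """Gated clock from a bare AND gate: any enable spike while CLK is high leaks through."""
    return [c & e for c, e in zip(clk, enable)]


def gated_clock_latch_and(clk, enable):
    """Gated clock from a latch followed by an AND gate.

    The latch is assumed transparent while CLK is low and opaque while CLK is high,
    so the enable seen by the AND gate cannot change during the high phase.
    """
    latched = 0
    gated = []
    for c, e in zip(clk, enable):
        if c == 0:
            latched = e          # transparent phase: follow the enable
        gated.append(c & latched)
    return gated


# Hypothetical waveforms: enable drops during a low phase, then spikes at index 7
# while CLK is high.
clk    = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
enable = [1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0]

print(gated_clock_plain_and(clk, enable))  # [0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
print(gated_clock_latch_and(clk, enable))  # [0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
```

In this model the bare AND gate produces a runt clock pulse at index 7, while the latch-based cell suppresses it.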

How effective is Clock-gating?
MPEG4 decoder (M. Ohashi, Matsushita, ISSCC 2002)
[Figure: chip layout with blocks DSP/HIF, DEU, MIF, VDE and 896Kb SRAM]
90% of the F/Fs were clock-gated.
Power: 30.6 mW without clock gating, 8.5 mW with clock gating.
About 70% power reduction by clock gating alone.

Data Path: Pre-Computation
Principle:
  Partition the inputs into pre-computed inputs and gated inputs.
  If the output f is independent of the gated inputs, the predictor G generates a signal that freezes the outputs of R2.
  Function G is not unique: the best trade-off has to be found.
[Figure: combinational circuit with input registers R1 (pre-computed inputs) and R2 (gated inputs), predictor G(X), and the output]

Data Path: Pre-Computation
Example: binary comparator (A > B)
The pre-computation logic is:
  $g_1 = A_n \cdot \overline{B_n}$
  $g_2 = \overline{A_n} \cdot B_n$
  $g_1 + g_2 = A_n \oplus B_n$
Power savings of up to 75% can be achieved with a marginal increase in delay and area.
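A minimal sketch of the comparator example, assuming that An and Bn are the most significant bits of A and B (the usual reading of this example, not stated explicitly in the transcript). The function name and the returned "loaded" flag are hypothetical; the flag stands in for whether the register holding the gated low-order inputs would have to be clocked.

```python
def compare_precomputed(a, b, n_bits):
    """Evaluate A > B for n_bits-wide unsigned values using pre-computation.

    Returns (result, low_bits_loaded). When the most significant bits differ,
    the result is known from them alone and the low-order bits are not needed.
    """
    msb = n_bits - 1
    a_n = (a >> msb) & 1
    b_n = (b >> msb) & 1
    g1 = a_n & (1 - b_n)          # An = 1, Bn = 0: output is certainly 1
    g2 = (1 - a_n) & b_n          # An = 0, Bn = 1: output is certainly 0
    if g1 | g2:                   # g1 + g2 = An xor Bn: gated inputs stay frozen
        return bool(g1), False
    low_mask = (1 << msb) - 1     # MSBs are equal: compare the remaining bits
    return (a & low_mask) > (b & low_mask), True


# Exhaustive check on 4-bit operands, counting how often the gated inputs are needed.
loads = 0
for a in range(16):
    for b in range(16):
        result, loaded = compare_precomputed(a, b, 4)
        assert result == (a > b)
        loads += loaded
print(loads, "of 256 input pairs need the low-order bits")  # 128 of 256
```

For uniformly distributed inputs the gated register is clocked only half the time in this model; the actual power saving depends on the circuit and on the input statistics.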

Data Path: Guarded Evaluation
Applicable to combinational blocks embedded within logic.
If block Y can be idle, transparent latches are inserted on all of its inputs.
Control circuitry is added to determine the IDLE condition.
The IDLE condition is used to disable the latches.

Bus Coding
Without coding: Sender -> b(t) -> Receiver
With coding: Sender -> b(t) -> Encoder -> B(t) -> Decoder -> b(t) -> Receiver (less switching activity on the bus)
b(t): source word
B(t): code word

Bus Invert Coding
The encoding depends on the Hamming distance H between the present bus value B(t-1) and the next source word b(t):
$$(B(t), INV(t)) = \begin{cases} (b(t), 0) & \text{if } H \le N/2 \\ (b'(t), 1) & \text{otherwise} \end{cases}$$
N: number of bus lines, H: Hamming distance, b'(t): bitwise complement of b(t)

Bus Invert Coding: example (N = 8)
b(t)       Binary trs   B(t)       INV   BIC trs
00101010   -            00101010   0     -
00111011   2            00111011   0     2
11010100   7            00101011   1     2
11110100   1            00001011   1     1
00001101   6            00001101   0     3
01110110   6            10001001   1     3
00010001   5            00010001   0     4
10000100   4            10000100   0     4
Total      31                            19
Binary: 31 transitions. BIC: 19 transitions (including the INV line).
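The bus-invert rule can be exercised on the example sequence with a short Python sketch. It assumes the bus starts at the first word with INV = 0 and that transitions on the INV line are counted along with the data lines; the function names are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, n_bits=8):
    """Return the list of (bus value, INV) pairs for a sequence of source words."""
    mask = (1 << n_bits) - 1
    bus, inv = words[0], 0                    # assumed initial bus state
    coded = [(bus, inv)]
    for word in words[1:]:
        if hamming(bus, word) > n_bits / 2:   # too many lines would toggle: invert
            bus, inv = ~word & mask, 1
        else:
            bus, inv = word, 0
        coded.append((bus, inv))
    return coded


def bus_invert_decode(coded, n_bits=8):
    """Recover the source words from (bus value, INV) pairs."""
    mask = (1 << n_bits) - 1
    return [(~bus & mask) if inv else bus for bus, inv in coded]


def count_transitions(coded):
    """Total toggles on the data lines plus the INV line."""
    return sum(hamming(b0, b1) + (i0 ^ i1)
               for (b0, i0), (b1, i1) in zip(coded, coded[1:]))


words = [0b00101010, 0b00111011, 0b11010100, 0b11110100,
         0b00001101, 0b01110110, 0b00010001, 0b10000100]
coded = bus_invert_encode(words)

print(count_transitions([(w, 0) for w in words]))  # 31 (plain binary)
print(count_transitions(coded))                    # 19 (bus-invert)
```

Decoding only needs the INV line, so `bus_invert_decode(coded)` returns the original sequence.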

T0 Code: Principle
Encoder:
$$(B(t), INC(t)) = \begin{cases} (B(t-1), 1) & \text{if } b(t) = b(t-1) + S \\ (b(t), 0) & \text{otherwise} \end{cases}$$
Decoder:
$$b(t) = \begin{cases} b(t-1) + S & \text{if } INC = 1 \\ B(t) & \text{if } INC = 0 \end{cases}$$
The stride S may be known in advance by both the encoder and the decoder, or sent on the bus.

T0 Code: example (S = 1)
Address   Binary word   Binary trs   T0 bus B(t)   INC   T0 trs
4         00000100      -            00000100      0     -
5         00000101      1            00000100      1     1
6         00000110      2            00000100      1     0
7         00000111      1            00000100      1     0
8         00001000      4            00000100      1     0
6         00000110      3            00000110      0     2
7         00000111      1            00000110      1     1
8         00001000      4            00000110      1     0
Binary encoding: 16 transitions. T0 encoding: 4 transitions.
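The T0 encoder and decoder can be sketched in a few lines of Python and checked against the address sequence of the example. The sketch assumes a stride S = 1 known to both sides and an initial INC of 0; function names are hypothetical.

```python
def t0_encode(words, stride=1):
    """Return (bus value, INC) pairs: the bus is frozen while addresses are sequential."""
    bus, inc = words[0], 0                  # assumed initial state
    coded = [(bus, inc)]
    for prev, cur in zip(words, words[1:]):
        if cur == prev + stride:
            inc = 1                         # sequential: keep the old bus value
        else:
            bus, inc = cur, 0               # jump: send the new address
        coded.append((bus, inc))
    return coded


def t0_decode(coded, stride=1):
    """Rebuild the address stream from (bus value, INC) pairs."""
    words = []
    for bus, inc in coded:
        words.append(words[-1] + stride if inc else bus)
    return words


def count_transitions(coded):
    """Total toggles on the bus lines plus the INC line."""
    return sum(bin(b0 ^ b1).count("1") + (i0 ^ i1)
               for (b0, i0), (b1, i1) in zip(coded, coded[1:]))


addresses = [4, 5, 6, 7, 8, 6, 7, 8]
coded = t0_encode(addresses)

print(count_transitions([(a, 0) for a in addresses]))  # 16 (plain binary)
print(count_transitions(coded))                        # 4 (T0)
```

The decoder tracks the previous decoded address rather than the frozen bus value, which is what lets a long sequential run cost no bus transitions at all.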

OUTLINE
  Introduction
  Power Components
  Dynamic Power Optimization
  Leakage Power Optimization
    Background
    Leakage minimization techniques
  Dynamic Power Management
  Conclusions

Minimum Leakage Vector
The leakage current of a logic gate is a strong function of its input values, because they affect the number of OFF transistors in the NMOS and PMOS networks of the gate.
Leakage can be reduced in standby mode by driving the circuit with the minimum leakage vector (MLV).
Advantages:
  1. No modification of the process technology is required.
  2. No change in the internal logic gates of the circuit is necessary.
  3. There is no reduction in voltage swing.
  4. Technology scaling does not have a negative effect on its effectiveness or its overhead. In fact, the stack effect becomes stronger with technology scaling as DIBL worsens.

Minimum Leakage Vector
Technology: 0.18 µm, supply voltage = 1.5 V, threshold voltage = 0.2 V
[Figure: two-input gate with inputs A0 and A1]
A0   A1   Leakage
0    0    23.60 nA   (MLV)
0    1    47.15 nA
1    0    51.42 nA
1    1    82.94 nA
The maximum leakage is about 3.5X the minimum.
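For a gate this small, the MLV can be found by simply scanning the leakage table. The sketch below does that with the four measured values from the slide; the dictionary and function names are hypothetical. For a full circuit the table would have one entry per primary-input combination, so exhaustive search is only practical for small input counts.

```python
# Leakage in nA for each (A0, A1) input vector, taken from the table above.
LEAKAGE_NA = {
    (0, 0): 23.60,
    (0, 1): 47.15,
    (1, 0): 51.42,
    (1, 1): 82.94,
}


def minimum_leakage_vector(table):
    """Return the input vector with the lowest leakage."""
    return min(table, key=table.get)


mlv = minimum_leakage_vector(LEAKAGE_NA)
ratio = max(LEAKAGE_NA.values()) / LEAKAGE_NA[mlv]

print(mlv)              # (0, 0)
print(f"{ratio:.2f}")   # 3.51
```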

Dual VTH Design Principle
Low-VTH: higher speed, higher power; used on critical paths.
High-VTH: lower speed, lower power; used on non-critical paths.
[Figure: network of flip-flops (FF) with the critical path highlighted]
[Figure: low-Vt transistor width (as % of total transistor width, 0% to 60%) versus % timing scaling from an all high-Vt design (0% to 25%). Annotation: full low-Vt performance, low-Vt usage: 34%.]
Source: Vivek De, Intel

Dual VDD/VTH reduces active leakage power
High-VDD, low-VTH, large-W: higher speed, higher power; used on critical paths.
Low-VDD, high-VTH, small-W: lower speed, lower power; used on non-critical paths.
[Figure: network of flip-flops (FF) with the critical path highlighted]
Rule of thumb: VDDL/VDDH = 0.6~0.7 minimizes Ptotal. (Kuroda, JSSC 2000)

VTCMOS
The threshold voltage is controlled through substrate biasing.
In active mode, Vbb can be used to compensate Vth fluctuation (0.27 ± 0.02 V).
In standby mode, Vbb is applied so that Vth > 0.5 V.

Design Example (1)
[Figure: chip block diagram with 16Mbit DRAM, Speech Codec, Multiplexer, MPEG-4 Video Codec, Host I/F, DRAM I/F, PLL, Cam I/F, Display I/F, Pre-filter and four VT blocks]
[Figure: IDD.leak (A), 1E-6 to 1E+0, versus (|VTH.p| + VTH.n)/2 as processed (V), -0.1 to 0.5, at T = 27 °C, for VTCMOS in active mode and VTCMOS in standby mode]
(Courtesy Kuroda)

Design Example (2)
MPEG-4 Video Core
  QCIF 10 fps codec @ 30 MHz
  0.3 µm CMOS
  3 million transistors
  9 mm x 9 mm
Three designs:
  1) Conventional design at 3.3 V
  2) 2.5 V design with VTH control by VTCMOS: 0.2 V ± 0.05 V @ active, 0.5 V ± 0.05 V @ standby
  3) Dual-VDD (2.5 V, 1.75 V) design in VTCMOS
(Courtesy Kuroda)

Design Results
[Figure: measured power dissipation (mW), 0 to 100, broken down into Logic, F/F, Clock and Memory, for three designs: 3.3V Conv., 2.5V VTCMOS, and 2.5V & 1.75V Dual-VDD & VTCMOS*. Reduction labels on the bars: -43% (four times), -30%, -37%, -51%, ±0%.]
* Additional 3 weeks in design time
* <5% area increase for logic circuits
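As a back-of-the-envelope check (not taken from the slides), the -43% labels are what a purely quadratic dependence on VDD predicts when moving from 3.3 V to 2.5 V. The sketch assumes full swing (VS = VDD) and unchanged activity, capacitance and frequency; the function name is hypothetical.

```python
def dynamic_power_ratio(v_new, v_old):
    """Power ratio when only the supply changes, assuming P is proportional to VDD^2."""
    return (v_new / v_old) ** 2


saving = 1.0 - dynamic_power_ratio(2.5, 3.3)
print(f"{saving:.0%}")   # 43%
```

The low supply of the third design also gives 1.75/2.5 = 0.7, which sits at the upper end of the VDDL/VDDH = 0.6~0.7 rule of thumb quoted earlier.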

Multiple Power Domains
[Figure: power domains controlled by the signals PD1_ON, PD2_ON and PD3_ON]

Design Example: OMAP2420
90-nm CMOS technology, ~90M transistors
1 voltage domain
  Voltage scaling
  Dual-library synthesis
  SRAM retention
5 power domains
  #1: DSP core
  #2: MCU core
  #3: Graphic accelerator
  #4: Core + peripherals
  #5: Always-on logic
(Courtesy P. Royannez, ISSCC 2005)
Design Example: OMAP2420
Dual Gate length: 2X Leakage Reduction
Voltage Scaling: 2X Leakage Reduction

OUTLINE
  Introduction
  Power Components
  Dynamic Power Optimization
  Leakage Power Optimization
  Dynamic Power Management
  Conclusions

Power Management Techniques
[Figure: software/hardware block diagram. Software: application and operating system (load prediction, speed setting). Hardware: controller and processor. Signals: required speed, control signals, clock, VDD and VBB.]
S. Lee et al., DAC, June 2000

Dynamic Voltage Scaling (DVS)
[Figure: software/hardware block diagram. Software: application and operating system (load prediction, speed setting). Hardware: controller and processor. Signals: required speed, control signals, VDD.]
[Figure: normalized power versus normalized workload (both 0.0 to 1.0) for variable Vdd and fixed Vdd]
S. Lee et al., DAC, June 2000
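The gap between the fixed-Vdd and variable-Vdd cases can be illustrated with a first-order model. This is a sketch under simplifying assumptions that are not in the slides: the clock frequency tracks the workload, the maximum frequency is proportional to Vdd (threshold voltage ignored), and power follows f·Vdd². The actual curves of the original figure are not recoverable from the transcript.

```python
def normalized_power(workload, variable_vdd):
    """Normalized power at a normalized workload in [0, 1].

    Fixed Vdd: only the frequency drops, so power falls linearly.
    Variable Vdd: the supply is lowered along with the frequency
    (assumed proportional), so power falls with the cube of the workload.
    """
    f = workload
    v = workload if variable_vdd else 1.0
    return f * v * v


for w in (0.25, 0.5, 0.75, 1.0):
    print(f"workload {w:.2f}: fixed {normalized_power(w, False):.3f}, "
          f"variable {normalized_power(w, True):.3f}")
```

At half workload this model gives 0.5 for fixed Vdd and 0.125 for variable Vdd; a real design sits somewhere in between because the supply cannot scale all the way down with frequency.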

Dynamic Frequency Scaling (DFS)
The critical path delay of the module is already guaranteed at the highest operating frequency.
A lower operating frequency can be applied when a lower-performance operation is executed.
The power consumed in the logic part falls to about 70% when the operating frequency is lowered from 266 MHz to 66 MHz.

DVFS (Dynamic Voltage & Frequency Scaling)
Operating voltage: controllable / variable
Operating frequency: controllable / variable
Energy consumption: reduced
[Figure: a controller (Ctrl.) drives a DC/DC converter, fed by the power source, that supplies VDD to the target device, and a clock generator (CLK gen.) that supplies a variable fCLK]
cf. [JSSC, Hazucha et al., 2007]
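One way to picture the controller's job is as a lookup over a table of allowed (frequency, voltage) pairs. The sketch below is purely illustrative: the operating points, names and selection policy are hypothetical and are not taken from the slides or from any specific device.

```python
# Hypothetical operating points: (clock frequency in MHz, supply voltage in V).
OPERATING_POINTS = [(66, 0.9), (133, 1.0), (200, 1.1), (266, 1.2)]


def select_operating_point(required_mhz, points=OPERATING_POINTS):
    """Pick the slowest operating point that still meets the required speed.

    Falls back to the fastest point if the request cannot be met.
    """
    for f_mhz, v_dd in sorted(points):
        if f_mhz >= required_mhz:
            return f_mhz, v_dd
    return max(points)


def relative_energy_per_cycle(v_dd, v_max):
    """Energy per cycle relative to the highest supply, assuming E is proportional to VDD^2."""
    return (v_dd / v_max) ** 2


f_mhz, v_dd = select_operating_point(100)
print(f_mhz, v_dd)                                     # 133 1.0
print(f"{relative_energy_per_cycle(v_dd, 1.2):.2f}")   # 0.69
```

Lowering the frequency alone reduces power but not the energy per operation; it is the accompanying voltage reduction that lowers energy, which is the point of scaling both together.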

OUTLINE
  Introduction
  Power Components
  Dynamic Power Optimization
  Leakage Power Optimization
  Dynamic Power Management
  Conclusions
Conclusions