VLSI Design: Power
Frank Sill Torres
Department of Electronic Engineering, Federal University of Minas Gerais,
Av. Antônio Carlos 6627, CEP: 31270-010, Belo Horizonte (MG), Brazil
http://www.cpdee.ufmg.br/~frank/
Copyright Sill Torres, 2012
TRENDS
Trend: Performance
[Figure: processor performance in MIPS (log scale from 0.01 to 1,000,000) versus year (1970 to 2020), with data points for the 8080, 8086, 386, Pentium® and Pentium® 4 processors and a 1 TIPS marker. Source: Moore, ISSCC 2003]
Trends – Power Dissipation
SoC Consumer Portable Power Trend [Source: ITRS, 2010 Update]
Trends - Power Density
[Figure: power density trend, with the power densities of a hot plate and a nuclear reactor marked for comparison. Source: http://cpudb.stanford.edu/]
Problems of High Power Dissipation
Continuously increasing performance demands
Increasing power dissipation of technical devices
Today: power dissipation is a main problem
High power dissipation leads to:
  High efforts for cooling
  Increasing operational costs
  Reduced reliability
  Reduced time of operation
  Higher weight (batteries)
  Reduced mobility
Chip Power Density Distribution
Power density is not uniformly distributed across the chip
Silicon is not a good heat conductor
Max junction temperature is determined by hot spots
Impact on packaging, cooling
[Figures: power map; on-die temperature]
“The Internet is an Electricity Hog”
Energy for the internet in 2001 in Germany: 6.8 bn kWh = 1.4 % of total energy consumption
  2.35 bn kWh for 17.3 million internet PCs
  1.91 bn kWh for servers
  1.67 bn kWh for the network
  0.87 bn kWh for UPS (uninterruptible power supplies)
Rate of growth (at the moment): 36 % per year
Prognosis for 2010: 33 bn kWh
  > 6 % of total energy consumption
  > 3 medium nuclear power plants
World: 400 million PCs, 0.16 PWh (P = peta = 10^15)
Source: Badische Zeitung, 2003
Dissipation in a Notebook
[Block diagram of a notebook: power supply (battery, DC-DC converter); processing (ASICs, memory, programmable µPs or DSPs); peripherals (disk, display); communication (WLAN, Ethernet)]
Examples for Energy Dissipation
[Figures: energy dissipation in a notebook; energy dissipation in a PDA]
Battery Capacity
Generalized Moore's Law
Capacity of batteries: 2 % to 6 % increase per year (up to the year 2000)
Intel beats Varta
Source: Timmernann, 2007
Current Progress
Batter.20 kg
Factor of 4 in the last 10 years: still far too little
POWER CONSUMPTION IN CMOS
Metrics: Energy and Power
Energy
  Measured in joules or kWh
  “Measure of the ability of a system to do work or produce a change”
  “No activity is possible without energy.”
Power
  Measured in watts or kW
  “Amount of energy required for a given unit of time.”
Average power
  Average amount of energy consumed per unit time
  Simplified to "power" in clear contexts
Instantaneous power
  Energy consumed per unit time as the time unit goes to zero
Metrics: Energy and Power cont’d
Instantaneous electrical power P(t)
  P(t) = v(t) * i(t)
  v(t): potential difference (or voltage drop) across the component
  i(t): current through the component
Electrical energy
  E = ∫ P(t) dt = ∫ v(t) * i(t) dt (for constant power: E = P * t)
Electrical energy in CMOS circuits
  Energy = Power * Delay
  Why?
Consumption in CMOS
Water analogy:
  Voltage (volt, V): water pressure (bar)
  Current (ampere, A): water quantity per second (liter/s)
  Energy: amount of water
Energy consumption is proportional to the capacitive load CL!
Consumption in CMOS cont’d
Energy for a calculation is only consumed at a 0→1 transition at the output
Energy and Instantaneous Power
[Figure: two inverters, INV1 and INV2, each driving a load CL, with delays td1 and td2]
INV1: high instantaneous power (bigger width)
INV2: low instantaneous power
Same energy (Cin ignored)
INV1 is faster
Metrics: Energy and Power cont’d
[Figures: power (watts) versus time for Approach 1 and Approach 2. Power is the height of the curve; energy is the area under the curve.]
Energy = Power * time for calculation = Power * Delay
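The "energy is the area under the power curve" statement can be checked numerically. The following is a minimal sketch with made-up power samples (the two sample lists, the 1 ns step and the function name are assumptions for illustration, not data from the slides): two approaches with different peak power but the same area, hence the same energy.

```python
def energy_joules(power_watts, dt_seconds):
    """Approximate E = integral of P(t) dt with the trapezoidal rule."""
    total = 0.0
    for p_now, p_next in zip(power_watts, power_watts[1:]):
        total += 0.5 * (p_now + p_next) * dt_seconds
    return total

# Hypothetical power samples in watts, one sample per nanosecond.
approach_1 = [0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0]  # high peak power, short
approach_2 = [0.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0]  # low peak power, long
dt = 1e-9

energy_1 = energy_joules(approach_1, dt)
energy_2 = energy_joules(approach_2, dt)
print("Approach 1: peak", max(approach_1), "W, energy", energy_1, "J")
print("Approach 2: peak", max(approach_2), "W, energy", energy_2, "J")
```

Both approaches dissipate the same energy, although the first one has twice the peak power.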
Metrics: Energy and Power cont’d
Energy dissipation
  Determines battery life in hours
  Sets packaging limits
Peak power
  Determines power ground wiring designs
  Impacts signal noise margin and reliability analysis
Metrics: PDP and EDP
Power-Delay Product
  Power P, delay tp
  Quality criterion: PDP = P * tp [J]
  P and tp have the same weight
  Two designs can have the same PDP, even if tp = 1 year
Energy-Delay Product
  EDP = PDP * tp = P * tp^2
  Delay tp has higher weight
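A small sketch of the two metrics, using two hypothetical designs (the power and delay values and all names are assumptions chosen for illustration): both have the same PDP, but the EDP separates them because the delay enters squared.

```python
def power_delay_product(power_w, delay_s):
    """PDP = P * tp, in joules."""
    return power_w * delay_s

def energy_delay_product(power_w, delay_s):
    """EDP = PDP * tp = P * tp^2, in joule-seconds."""
    return power_delay_product(power_w, delay_s) * delay_s

# Hypothetical designs: (average power in W, delay in s).
designs = {"fast": (2e-3, 1e-9), "slow": (1e-3, 2e-9)}

pdp = {name: power_delay_product(p, tp) for name, (p, tp) in designs.items()}
edp = {name: energy_delay_product(p, tp) for name, (p, tp) in designs.items()}

for name in designs:
    print(name, "PDP =", pdp[name], "J, EDP =", edp[name], "Js")
```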
Energy and Power
Average power is directly proportional to energy
In the following: power means average power
Where Does Power Go in CMOS?
Dynamic Power Consumption
Charging and discharging capacitors
Short Circuit Currents
Short circuit path between supply rails during switching
Leakage
Leaking diodes and transistors
Dynamic Power Consumption
Pdyn = CL * VDD^2 * P01 * f
P01: probability for a 0-to-1 switch of the output
f: clock frequency
α: activity
f01 = α * f
Data dependent: a function of switching activity!
[Circuit: CMOS inverter with input Vin, output Vout, load capacitance CL and supply VDD]
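As a rough numeric illustration of the dynamic power formula, here is a minimal sketch; the load capacitance, supply voltage and frequency are assumed example values, not figures from the slides.

```python
def dynamic_power(c_load, vdd, p01, f_clk):
    """Pdyn = CL * VDD^2 * P01 * f, in watts."""
    return c_load * vdd ** 2 * p01 * f_clk

# Assumed example values: 10 fF load, 1.0 V supply, P01 = 3/16, 1 GHz clock.
p_nominal = dynamic_power(10e-15, 1.0, 3 / 16, 1e9)

# Halving VDD with everything else unchanged quarters the dynamic power.
p_half_vdd = dynamic_power(10e-15, 0.5, 3 / 16, 1e9)

print("Pdyn at 1.0 V:", p_nominal, "W")
print("Pdyn at 0.5 V:", p_half_vdd, "W")
```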
Dynamic Power Consumption
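Energy drawn from the supply for one 0→1 transition of the output:
$$E = \int_0^\infty I_c(t)\, V_{DD}\, dt = \int_0^\infty C_L \frac{dV}{dt}\, V_{DD}\, dt = C_L V_{DD} \int_0^{V_{DD}} dV = C_L V_{DD}^2$$
[Circuit: load capacitance CL charged from VDD]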
Transition Probabilities for CMOS Cells
Example: Static 2 Input NOR Cell
Truth table of NOR2 cell:
A B Out
1 1 0
0 1 0
1 0 0
0 0 1
If A and B have the same input signal probability:
PA=1 = 1/2, PB=1 = 1/2
Then:
POut=0 = 3/4
POut=1 = 1/4
P0→1 = POut=0 * POut=1 = 3/4 * 1/4 = 3/16
Ceff = P0→1 * CL = 3/16 * CL
Transition Probabilities cont’d
A and B with different input signal probabilities:
  PA and PB: probability that the input is 1
  P1: probability that the output is 1
Switching activity in CMOS circuits: P01 = P0 * P1
For 2-input NOR: P1 = (1-PA)(1-PB)
Thus: P01 = (1-P1) * P1 = [1-(1-PA)(1-PB)] * (1-PA)(1-PB) (see next slide)
Cell    P01 = Pout=0 * Pout=1
NOR     (1 - (1 - PA)(1 - PB)) * (1 - PA)(1 - PB)
OR      (1 - PA)(1 - PB) * (1 - (1 - PA)(1 - PB))
NAND    PAPB * (1 - PAPB)
AND     (1 - PAPB) * PAPB
XOR     (1 - (PA + PB - 2PAPB)) * (PA + PB - 2PAPB)
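The table can be written as a small function. This is a sketch that assumes statistically independent inputs, as the formulas do; the function names are chosen for illustration.

```python
def output_one_probability(cell, pa, pb):
    """Probability that the output of a 2-input cell is 1 (independent inputs)."""
    if cell == "AND":
        return pa * pb
    if cell == "NAND":
        return 1 - pa * pb
    if cell == "OR":
        return 1 - (1 - pa) * (1 - pb)
    if cell == "NOR":
        return (1 - pa) * (1 - pb)
    if cell == "XOR":
        return pa + pb - 2 * pa * pb
    raise ValueError("unknown cell type: " + cell)

def transition_probability(cell, pa, pb):
    """P01 = P0 * P1 = (1 - P1) * P1."""
    p1 = output_one_probability(cell, pa, pb)
    return (1 - p1) * p1

for cell in ("NOR", "OR", "NAND", "AND", "XOR"):
    print(cell, transition_probability(cell, 0.5, 0.5))
```

For PA = PB = 1/2 the NOR2 cell gives 3/16, matching the example on the previous slide.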
Transition Probabilities cont’d
[Figure: transition probability of a NOR2 cell as a function of the input probabilities]
Probability of input signals → high influence on P01
Source: Timmernann, 2007
Short Circuit Power Consumption
Finite slope of the input signal
During switching: NMOS and PMOS transistors are both conducting for a short period of time (tsc)
Direct current path between VDD and GND
Psc = VDD * Isc * (P01 + P10)
[Circuit: CMOS inverter with input Vin, output Vout and load CL; short-circuit current Isc flows from VDD to GND during tsc]
Leakage Power Consumption
Most important leakage currents:
  Subthreshold leakage Isub
  Gate oxide leakage Igate
Pleak = Ileak * VDD ≈ (Isub + Igate) * VDD
[Figures: inverter between VDD and GND with load CL, showing Isub and Igate; transistor cross-section (source, gate, drain, SiO2 gate oxide, channel length L) showing Igate and Isub]
Power Equations in CMOS
P = α f CL VDD^2 + VDD Ipeak (P01 + P10) + VDD Ileak
Dynamic power, α f CL VDD^2 (≈ 40 - 70 % today and decreasing relatively)
Short-circuit power, VDD Ipeak (P01 + P10) (≈ 10 % today and decreasing absolutely)
Leakage power, VDD Ileak (≈ 20 - 50 % today and increasing)
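The three terms of the power equation can be evaluated separately to see how each contributes. The sketch below uses assumed, purely illustrative parameter values (total switched capacitance, currents and probabilities are not taken from the slides).

```python
def cmos_power(alpha, f_clk, c_load, vdd, i_peak, p01, p10, i_leak):
    """Return (dynamic, short-circuit, leakage) power in watts."""
    dynamic = alpha * f_clk * c_load * vdd ** 2
    short_circuit = vdd * i_peak * (p01 + p10)
    leakage = vdd * i_leak
    return dynamic, short_circuit, leakage

# Assumed example values for a whole block, chosen only for illustration.
dynamic, short_circuit, leakage = cmos_power(
    alpha=0.1, f_clk=1e9, c_load=1e-9, vdd=1.0,
    i_peak=0.05, p01=0.1, p10=0.1, i_leak=0.04,
)
total = dynamic + short_circuit + leakage

for name, value in (("dynamic", dynamic),
                    ("short-circuit", short_circuit),
                    ("leakage", leakage)):
    print(f"{name}: {value:.3f} W ({100 * value / total:.1f} % of total)")
```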
LEAKAGE
Trends
[Figure: technology generations from manufacturing (90 nm in 2004, 65 nm in 2006, 45 nm in 2008, 32 nm in 2010) through development to research (2012+, 20 nm down to 5 nm), with device options including SiGe S/D strained silicon, high-k metal gate, tri-gate, III-V, carbon nanotube FET and nanowire]
Trends cont’d
[Figure: power dissipation [W] of a 100 mm² chip (0 to 1400 W) versus technology (90 nm, 65 nm, 45 nm, 32 nm, 22 nm, 16 nm), showing dynamic power dissipation and power dissipation by leakage currents. Source: S. Borkar (Intel), ‘05]
Recap: Transistor Geometry
[Figure: cross-section of a transistor with n+ source and drain regions, p-type body, polysilicon gate, gate length L, gate width W and oxide thickness tox. Source: Rabaey, “Digital Integrated Circuits”, 1995]
SiO2 gate oxide (good insulator, relative permittivity εox = 3.9)
tox: thickness of oxide layer
Subthreshold Leakage
Threshold voltage Vth
  Transistor characteristic
  If the gate-source voltage Vgs is higher than Vth: channel under the gate, current between drain and source
  If Vgs is lower than Vth: (ideally) no current
Subthreshold leakage Isub
  Leakage between drain and source when Vgs < Vth
  Based on: short channels, diffusion, thermionic emission
[Figures: transistor cross-sections for Vgs < Vth and Vgs > Vth; diffusion current Isub between a region of high concentration and a region of low concentration]
Subthreshold Leakage cont’d
[Figure: log(drain current) versus gate voltage for an NMOS transistor, with the thresholds Vth’ (short-channel device) and Vth marked; Isub is the current below the threshold, above it the transistor is conducting. Source: Agarwal, 2007]
Drain Induced Barrier Lowering (DIBL)
Electrons have to overcome a potential barrier to enter the channel
Ideal: potential barrier is only controlled by the gate voltage
[Figure: potential between source and drain for Vgs < Vth and Vgs > Vth; the height of the curve is the potential barrier, which is changed by the gate voltage]
Drain Induced Barrier Lowering cont’d
In short-channel transistors the potential barrier is also affected by the drain voltage
If Vds = VDD: transistors can start to conduct even if Vgs < Vth
[Figure: potential barrier for Vds = Vth and Vds = VDD in a long-channel transistor (L > 2 µm) and in a short-channel transistor (L < 180 nm); the short-channel transistor shows a lowering of the potential barrier]
Temperature dependence
Based on thermionic emission: subthreshold leakage Isub increases with temperature
[Figure: normalized Isub/µm (0 to 20) versus temperature (0 to 120 °C); IOFF at 110 °C relative to Isub at 25 °C is annotated as 6x for 130 nm and 16x for 70 nm. Source: Chatterjee, Intel-labs]
Gate Oxide Leakage
Tunneling effect
  Electromagnetic wave striking a barrier: reflection + intrusion into the barrier
  If the barrier is thin enough: the wave partially passes through the barrier (electrons tunnel through the barrier)
Gate oxide leakage Igate
  In nanometer transistors, where Tox < 2 nm
  Electrons tunnel through the gate oxide
  Leakage current
[Figures: transistor cross-section with gate oxide thickness Tox and gate current Igate; potential energy versus position x for a barrier of thickness Tox]
Gate Oxide Thickness at 45 nm
Copyright Sill Torres, 2012 43
Gate Oxide Leakage cont’d
Components of gate oxide leakage:
  Tunneling currents through the overlap regions (gate-drain Igdo, gate-source Igso)
  Tunneling currents into the channel (toward the drain Igcd, toward the source Igcs)
  Tunneling currents between gate and bulk (Igb)
[Figure: transistor cross-section (gate, source, drain, bulk) showing Igso, Igcs, Igb, Igcd and Igdo]
Further Leakage Components
Reverse bias pn junction conduction Ipn
Gate induced drain leakage IGIDL
Drain source punchthrough IPT
Hot carrier injection IHCI
[Figure: transistor cross-section (gate, source, drain) showing IHCI, IPT, IGIDL and Ipn]
Leakage Dependencies
Leakage depends on:
  Gate width (Isub, Igate)
  Gate length (Isub, Igate)
  Gate oxide thickness (Igate)
  Threshold voltage (Isub)
  Temperature (Isub)
  Input state (Igate)
LOW POWER TECHNIQUES
Lowering Dynamic Power
Reducing VDD has a quadratic effect!
  Has a negative effect on performance, especially as VDD approaches 2VT
Lowering CL
  Improves performance as well
  Keep transistors at minimum size
Reducing the switching activity, f01 = P01 * f
  A function of signal statistics and clock rate
  Impacted by logic and architecture design decisions
Power & Delay Dependence of Vth
Delay:
$$t_d = \frac{k\,Q}{I} = \frac{k'\, C_L V_{DD}}{(W/L)\,(V_{DD} - V_{TH})^{K}}$$
Power (w.o. gate leakage):
$$P = p_t\, f_{CLK}\, C_L V_{DD}^2 + I_0 \cdot 10^{-V_{TH}/S}\, V_{DD}\, \frac{W}{W_0}$$
Source: Sakurai, ‘01
Transistor Sizing for Power Minimization
Larger sized devices: useful only when interconnects dominate
Minimum sized devices: usually optimal for low power
[Figure: trade-off between small W’s (lower capacitance, higher voltage) and large W’s (higher capacitance, lower voltage) to keep performance. Source: Timmernann, 2007]
Logic Style and Power Consumption
Voltage decreases: power-delay product improves
Best logic style minimizes power-delay for a given delay constraint
A new logic style can reduce power dissipation (if possible / available!)
Source: Jan M. Rabaey
Logic Restructuring
Logic restructuring: changing the topology of a logic network to reduce transitions
AND: P01 = P0 * P1 = (1 - PAPB) * PAPB
[Figure: F = A·B·C·D with all input signal probabilities 0.5, built from 2-input AND cells. Chain implementation: P01 = (1-0.25)*0.25 = 3/16 = 0.188, 7/64 = 0.109, 15/256. Tree implementation: P01 = 3/16, 3/16, 15/256.]
Chain implementation has a lower overall switching activity than tree implementation for random inputs
BUT: ignores glitching effects
Source: Jan M. Rabaey
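The chain versus tree comparison can be reproduced with exact fractions. This sketch assumes independent inputs with signal probability 0.5 and ignores glitching, as the slide does; the variable names are chosen for illustration.

```python
from fractions import Fraction

def and_cell(pa, pb):
    """Return (P1 of the output, P01 of the output) for a 2-input AND cell."""
    p1 = pa * pb
    return p1, (1 - p1) * p1

half = Fraction(1, 2)

# Chain: ((A AND B) AND C) AND D
p_w, act_w = and_cell(half, half)
p_x, act_x = and_cell(p_w, half)
_, act_f_chain = and_cell(p_x, half)
chain_activity = [act_w, act_x, act_f_chain]
chain_total = sum(chain_activity)

# Tree: (A AND B) AND (C AND D)
p_y, act_y = and_cell(half, half)
p_z, act_z = and_cell(half, half)
_, act_f_tree = and_cell(p_y, p_z)
tree_activity = [act_y, act_z, act_f_tree]
tree_total = sum(tree_activity)

print("chain:", chain_activity, "total", chain_total)
print("tree: ", tree_activity, "total", tree_total)
```

The per-cell values match the figure (3/16, 7/64, 15/256 for the chain; 3/16, 3/16, 15/256 for the tree), and the summed activity of the chain is lower.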
Input Ordering
Beneficial: postponing the introduction of signals with a high transition rate (signals with signal probability close to 0.5)
AND: P01 = (1 - PAPB) * PAPB
[Figure: F = A·B·C with PA = 0.5, PB = 0.2, PC = 0.1, built from two 2-input AND cells. Ordering 1: X = A·B, F = X·C; P01 at X = (1 - 0.5*0.2) * (0.5*0.2) = 0.09. Ordering 2: X = B·C, F = X·A; P01 at X = (1 - 0.2*0.1) * (0.2*0.1) = 0.0196.]
Source: Jan M. Rabaey
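The effect of input ordering can be checked with the same AND formula. A minimal sketch, assuming independent inputs and the signal probabilities from the figure (0.5, 0.2, 0.1); the helper names are illustrative.

```python
def and_cell(pa, pb):
    """Return (P1 of the output, P01 of the output) for a 2-input AND cell."""
    p1 = pa * pb
    return p1, (1 - p1) * p1

def two_stage_activity(p_first, p_second, p_late):
    """Activity at the internal node X and at the output F of a two-stage AND."""
    p_x, act_x = and_cell(p_first, p_second)
    _, act_f = and_cell(p_x, p_late)
    return act_x, act_f

# Ordering 1: the 0.5 signal enters at the first stage.
ordering_1 = two_stage_activity(0.5, 0.2, 0.1)
# Ordering 2: the 0.5 signal is postponed to the second stage.
ordering_2 = two_stage_activity(0.2, 0.1, 0.5)

print("ordering 1 (X, F):", ordering_1)
print("ordering 2 (X, F):", ordering_2)
```

The output activity is the same for both orderings; only the internal node X differs, and it switches less when the signal with probability 0.5 is introduced last.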
Glitching
[Figure: circuit with inputs A, B and C, internal node X and output Z; waveforms for the input transition ABC = 101 → 000 under a unit-delay model. Source: Jan M. Rabaey]
Example 1: Chain of NAND Cells
[Figure: chain of NAND cells with outputs out1, out2, out3, out4, out5, ...; output voltages V (0 to 6 V) versus time t (0 to 3 ns) for out1 to out8, with the VDD / 2 level marked. Source: Jan M. Rabaey]
Example 2: Adder Circuit
[Figure: adder circuit with carry input Cin and sum outputs S0 to S15; sum output voltage (0 to 3 V) versus time (0 to 12 ps) for Cin, S0 to S5, S10 and S15, with the VDD / 2 level marked. Source: Jan M. Rabaey]
How to Cope with Glitching?
Equalize lengths of timing paths through the design
[Figure: network of blocks F1, F2 and F3 with signal arrival times, before and after equalizing the path lengths. Source: Jan M. Rabaey]
Clock Gating
Power is reduced by two mechanisms
  Clock net toggles less frequently, reducing feff
  Registers’ internal clock buffering switches less often
[Figures: local gating (register with din, dout, clk and enable en) and global gating (FSM, execution unit and memory control with clk gated by enF, enE and enM). Source: Jan M. Rabaey]
Clock Gating Insertion
Local clock gating: 3 methods
  Logic synthesizer finds and implements local gating opportunities
  RTL code explicitly specifies clock gating
  Clock gating cell explicitly instantiated in RTL
Global clock gating: 2 methods
  RTL code explicitly specifies clock gating
  Clock gating cell explicitly instantiated in RTL
Source: Jan M. Rabaey
Clock Gating VHDL Code
Conventional RTL Code
    -- always clock the register
    process (clk)
    begin
      if rising_edge(clk) then     -- form the flip-flop
        if enable = '1' then
          q <= din;
        end if;
      end if;
    end process;
Low Power Clock Gated RTL Code
    -- only clock the register when enable is true
    gclk <= enable and clk;        -- gate the clock
    process (gclk)
    begin
      if rising_edge(gclk) then    -- form the flip-flop
        q <= din;
      end if;
    end process;
Instantiated Clock Gating Cell
    -- instantiate a clock gating cell from the target library
    I1: clkgx1 port map (en => enable, cp => clk, gclk_out => gclk);
    process (gclk)
    begin
      if rising_edge(gclk) then    -- form the flip-flop
        q <= din;
      end if;
    end process;
Source: Jan M. Rabaey
Clock Gating: Example
[Figure: MPEG4 decoder chip with DSP/HIF, DEU, MIF, VDE and 896 Kb SRAM blocks; power without clock gating: 30.6 mW; with clock gating: 8.5 mW]
90 % of flip-flops clock-gated
70 % power reduction by clock gating
Source: M. Ohashi, Matsushita, 2002
Data Gating
Objective
  Reduce wasted operations => reduce feff
Example
  Multiplier whose inputs change every cycle, whose output conditionally feeds an ALU
Low power version
  Inputs are prevented from rippling through the multiplier if the multiplier output is not selected
Source: Jan M. Rabaey
Data Gating Insertion
Two insertion methods
  Logic synthesizer finds and implements data gating opportunities
  RTL code explicitly specifies data gating
  Some opportunities cannot be found by synthesizers
Issues
  Extra logic in data path slows timing
  Additional area due to gating cells
Source: Jan M. Rabaey
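Data Gating Verilog Code: Operand Isolation
Conventional Code
    assign muxout = sel ? A : A*B ; // build mux
Low Power Code
    // WIDTH is an assumed parameter for the operand width
    assign multinA = {WIDTH{~sel}} & A ; // build AND cells: pass A only when the product is selected (sel = 0)
    assign multinB = {WIDTH{~sel}} & B ; // build AND cells: pass B only when the product is selected (sel = 0)
    assign muxout = sel ? A : multinA*multinB ;
[Figures: multiplier with inputs A and B feeding a mux controlled by sel with output muxout, without and with AND cells at the multiplier inputs]
Source: Jan M. Rabaey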
Influence of Threshold Voltage Vth
Threshold voltage Vth:
  Influence on subthreshold leakage Isub
  Influence on delay of logic cells
[Figure: inverter (BPTM 65 nm): leakage Isub [nA] (0 to 160) and delay [ps] (30 to 55) versus threshold voltage VthNMOS [V] (0.25 to 0.37)]
Influence of Gate Oxide Thickness Tox
Gate oxide thickness Tox:
  Influence on gate oxide leakage Igate
  Influence on delay
[Figure: inverter (BPTM 65 nm): leakage Igate [nA] (0 to 160) and delay [ps] (25 to 50) versus gate oxide thickness Tox [nm] (1.4 to 2.2)]
Recap: Data Paths
Data propagate through different data paths between registers (flip-flops, FF)
Paths mostly differ in propagation delay times
Frequency of the clock signal (CLK) depends on the path with the longest delay: the critical path
[Figure: flip-flops (FF) clocked by CLK and connected by several paths]
Recap: Slack
[Figure: gate G1 with inputs A and B drives gate G2, which has the further input C and the output Y. Timeline: all inputs of G1 arrived; delay of G1; G1 ready with evaluation; slack for G1; all inputs of G2 arrived]
Dual-Vth / Dual-Tox
Two different cell types:
“LVT / LTO” cells
  Cells consist of low-Vth or low-Tox transistors
  Low threshold voltage or thin gate oxide layer
  For critical paths
  High leakage / short delay
“HVT / HTO” cells
  Cells consist of high-Vth or high-Tox transistors
  High threshold voltage or thick gate oxide layer
  For uncritical paths
  Low leakage / long delay
Leakage reduction at constant performance (no level converter necessary)
Performance at different Dual-Vth
[Figure: normalized performance (0.0 to 1.0) of high-Vth and low-Vth cells versus supply voltage VDD (1.0 V to 0.6 V); measured for a NAND2 cell, BPTM 65 nm technology]
Leakage Isub at different Dual-Vth
[Figure: subthreshold leakage [nA] (0 to 80) of high-Vth and low-Vth cells versus supply voltage VDD (1.0 V to 0.6 V); measured for a NAND2 cell, BPTM 65 nm technology]
Dual-Vth / Dual-Tox Example
[Figure: example circuit with the critical path marked; legend: HVT and/or HTO cells, LVT and/or LTO cells]
Stack Effect
Transistor stack: at least two transistors of the same type (NMOS or PMOS) in series
Based on the behavior of internal nodes:
  The more transistors in the stack are non-conducting (off), the lower the leakage
[Figure: leakage Isub [nA] (0 to 10) versus number of transistors off in the stack (1 to 4). Source: K. Roy]
Sleep Transistors
Idea: insertion of additional transistors between the logic block and the supply lines
These transistors are controlled by the SLEEP signal
If the circuit has nothing to do: SLEEP signal is active → stack effect (additional off transistor in series with the others)
If the sleep transistors are high-Vth: approach also called Multi-Threshold CMOS (MTCMOS)
Mostly only one transistor is inserted
[Figure: low-Vth logic cells between virtual Vdd and virtual Vss, connected to Vdd and Vss through sleep transistors. Source: Kaijian Shi, Synopsys]
Sleep Transistors: Realization
Ring style sleep transistor implementation
Sleep transistors are placed around each VVDD island
[Figure: global VDD with VVDD1 and VVDD2 domains]
Source: Kaijian Shi, Synopsys
Sleep Transistors: Realization cont’d
Grid style sleep transistor implementation
VDD network across the chip; VVDD networks in each gating domain
Sleep transistors are placed in a grid connecting VDD and VVDDs
[Figure: global VDD with VVDD1 and VVDD2 networks]
Source: Kaijian Shi, Synopsys
Sleep Transistors: Problems
Sleep transistor can be modeled as a resistor R
In active mode (cell is working)
  Current I through the sleep transistor
  Voltage drop Vx = R * I over the resistor
  Output voltage reduced to VDD - Vx
  Increased delay (of following blocks)
Current I is not a leakage current!
  I is a discharging current of load capacitances
[Figure: CMOS gate / block connected to VDD through a high-Vth sleep transistor controlled by SLEEP, and the equivalent circuit with resistor R and current I]
Stackforcing
Simple method of using the stack effect
Increasing the stack by splitting transistors
Cin stays constant
Only one technology is needed
Area is (almost) the same
Drive strength (drain-source current) is reduced → delay goes up
[Figure: cell with PMOS width WP, and stack-forced version with each transistor split into two series transistors of width WP/2 and WN/2]
Stackforcing cont’d
[Figure: normalized delay versus normalized Isub, with the point without stackforcing marked. Source: Narendra, et al., ISLPED01]
Input Vector Control (IVC)
Leakage of cell depends on input vector
Input vector (In3 In2 In1) | Leakage [nA] | Transistors off in NMOS stack
0 0 0 | 0.1 | TN3, TN2, TN1
0 0 1 | 0.2 | TN3, TN2
0 1 0 | 0.2 | TN3, TN1
0 1 1 | 1.9 | TN3
1 0 0 | 0.2 | TN2, TN1
1 0 1 | 1.3 | TN2
1 1 0 | 1.2 | TN1
1 1 1 | 9.4 | -
[Figure: cell with NMOS stack TN3, TN2, TN1 driven by the inputs In3, In2, In1]
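Given such a table, the vector to apply in sleep mode is simply the one with the lowest leakage. A minimal sketch using the values from the table above (the dictionary layout and names are illustrative choices):

```python
# Leakage in nA per input vector (In3, In2, In1), taken from the table above.
leakage_na = {
    (0, 0, 0): 0.1,
    (0, 0, 1): 0.2,
    (0, 1, 0): 0.2,
    (0, 1, 1): 1.9,
    (1, 0, 0): 0.2,
    (1, 0, 1): 1.3,
    (1, 1, 0): 1.2,
    (1, 1, 1): 9.4,
}

# The sleep vector is the input vector with minimum leakage.
sleep_vector = min(leakage_na, key=leakage_na.get)
worst_vector = max(leakage_na, key=leakage_na.get)
ratio = leakage_na[worst_vector] / leakage_na[sleep_vector]

print("sleep vector:", sleep_vector, "with", leakage_na[sleep_vector], "nA")
print("worst vector:", worst_vector, "with", leakage_na[worst_vector], "nA")
print("worst / best leakage ratio:", round(ratio, 1))
```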
Input Vector Control cont’d
Every circuit has an input vector with minimum leakage
Idea: if the design is in passive mode
  SLEEP signal becomes active
  Sleep vector is applied
[Figure: multiplexer (MUX) controlled by SLEEP selects between the data and the sleep vector as input of the logic circuit]
Pin Reordering
Gate leakage in a stack depends on the input vector
Input vectors with the same number of ‘0’s and ‘1’s can result in different leakage
If the input probability is known: reorder pins so that the most probable state has minimum gate leakage
Input vector [In3,In2,In1] | T3 | T2 | T1 | Example Igate,stack (magnitude)
001 | Igdo | - | Igcs, Igso, Igcd, Igdo | 65.9 nA →
010 | Igdo | Igci, Igcs, Igdo, Igcd | - | 42.8 nA ↑
100 | - | Igdo | - | 10.3 nA ↓
101 | - | Igdo | Igcs, Igso, Igcd, Igdo | 58.7 nA →
110 | - | - | Igdo | 7.6 nA ↓
011 | Igdo | Igci, Igso, Igdo, Igcd | Igcs, Igso, Igcd, Igdo | 116.0 nA ↑
BPTM, 65 nm technology
[Figure: stack of transistors T3, T2, T1 with inputs In3, In2, In1 below VDD; gate tunneling components Igcs, Igso, Igcd and Igdo]
Delay and Power versus VDD
Dynamic power (and leakage) can be traded for delay
[Figure: relative delay td (0 to 6) and relative Pdyn (0 to 10) versus supply voltage VDD (0.8 to 2.4 V)]
Adaptive Dynamic Voltage/Frequency Scaling (DVS/DFS)
Slow down the processor to fill idle time
More delay → lower operating voltage
Runtime scheduler determines processor speed and selects appropriate voltage
Transition delay for frequencies <150s
Potential to realize 10x energy savings
E.g.: Intel SpeedStep, AMD PowerNow, Transmeta LongRun
[Figure: timeline at 3.3 V with alternating active and idle phases, compared with continuous active operation at 2.4 V]
DVS/DFS with Transmeta LongRun
[Figure: % of max power consumption (0 to 100) versus frequency (300 to 1000 MHz), with a typical operating region and a peak performance region]
Operating points:
300 MHz   0.80 V
433 MHz   0.87 V
533 MHz   0.95 V
667 MHz   1.05 V
800 MHz   1.15 V
900 MHz   1.25 V
1000 MHz  1.30 V
Source: Transmeta
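A first-order estimate of what these operating points mean for dynamic power follows from Pdyn ∝ f * VDD^2. The sketch below assumes constant capacitance and activity and ignores leakage and short-circuit power, so it is not expected to reproduce the measured curve exactly.

```python
# (frequency in MHz, supply voltage in V) operating points from the slide.
operating_points = [
    (300, 0.80), (433, 0.87), (533, 0.95), (667, 1.05),
    (800, 1.15), (900, 1.25), (1000, 1.30),
]

f_max, v_max = operating_points[-1]

def relative_dynamic_power(f_mhz, vdd):
    """Dynamic power relative to the fastest point, assuming Pdyn ~ f * VDD^2."""
    return (f_mhz * vdd ** 2) / (f_max * v_max ** 2)

relative = [relative_dynamic_power(f, v) for f, v in operating_points]

for (f, v), rel in zip(operating_points, relative):
    print(f"{f:5d} MHz at {v:.2f} V: {100 * rel:5.1f} % of max dynamic power")
```

Under these assumptions the slowest point runs at 30 % of the peak frequency but needs only about 11 % of the peak dynamic power.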
Multi-VDD
Objective
  Reduce dynamic power by reducing the VDD^2 term
  Higher supply voltage used for speed-critical logic
  Lower supply voltage used for non speed-critical logic
Example
  Memory VDD = 1.2 V
  Logic VDD = 1.0 V
  Logic dynamic power savings = 30 %
Source: Jan M. Rabaey
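The 30 % figure in the example follows directly from the quadratic dependence on VDD. A minimal check, assuming frequency, capacitance and activity stay the same when the logic supply is lowered from 1.2 V to 1.0 V:

```python
def dynamic_power_savings(vdd_high, vdd_low):
    """Fraction of dynamic power saved when VDD is lowered, with Pdyn ~ VDD^2."""
    return 1.0 - (vdd_low / vdd_high) ** 2

savings = dynamic_power_savings(1.2, 1.0)
print(f"Dynamic power savings: {100 * savings:.1f} %")
```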
Multi-VDD Issues
Partitioning
  Which blocks and modules should use which voltages?
  Physical and logical hierarchies should match as much as possible
Voltages
  Voltages should be as low as possible to minimize C * VDD^2 * f
  Voltages must be high enough to meet timing specs
Level shifters
  Needed (generally) to buffer signals crossing islands
  Added delays must be considered
Physical design
  Multiple VDD rails must be considered during floorplanning
Timing verification
  Timing verification must be performed for all corner cases across voltage islands
Source: Jan M. Rabaey
Multi-VDD Flow
Determine which blocks run at which Vdd
Multi-voltage synthesis
Determine floor plan
Multi-voltage placement
Clock tree synthesis
Route
Verify timing
Source: Jan M. Rabaey
Power-orientated Programming
Algorithms can differ in power dissipation
[Figure: switched capacitance (nF, 0 to 14000) for bubble.c, heap.c and quick.c, broken down into register file, pipeline registers, functional unit and others. Source: Irwin, 2000]