Various Low-Power SoC Design Techniques Chong-Min Kyung KAIST.
Various Low-Power SoC Design Techniques
Chong-Min Kyung, KAIST
Contents
• Introduction
• Power Management using Voltage Island Technique
• Energy (Power) Management Approach by ARM
• Low Power Design Example with Samsung AP based on ARM 920T
• IBM Low Power Design using PowerPC
• Conclusions
Why Low Power?
• Limited Battery Capacity (Mobile Devices)
• For Minimal Heat Dissipation (Heat Sink, Cooler, System Size/Weight/Cost)
• For Chip/System Reliability
• Save Energy; it’s limited after all!
Power vs. Energy
• Power-Critical Applications:
– Heat Dissipation Requirement
– Power/Ground Metal Line Width
– Power/Ground Bounce due to IR drop
• Energy-Critical Applications:
– Battery Lifetime
– Heat Dissipation Requirement
Applications for Low Power Technology
• Medical: implantable hearing aid, cardiac pacemaker
• Mobile Devices: cellular phone
• Military Devices
• Hard-to-access points: space
• Too-many-to-access points: sensors/actuators in the ubiquitous world
• Power Management using Voltage Island Technique
Typical Power Optimization Procedure
[Flowchart. Main blocks: Applications; Functional Partitioning; H/W Description and Synthesis; Standard Cell/Wire Initial Layout; Cell/Interconnect Delay and Power Modeling; Vdd, Vt, Wg, Wint Optimization; Gate-Level Power Optimization; Power-optimized Net List; Parameterized Cell/Wire Design; Place/Route and Layout; Customized Layout; Verification for Min-Power, Delay, Area, Noise (Y/N decision); Optimized Vdd, Vt, Wg, Wint. Inputs: Technology Files; Parasitics (Resistance, Capacitance); Interconnects from layout; Constraints (Delay, Power, Area, Noise); Switching Activity.]
Power Challenge
• Active power density is increasing with device scaling and increased frequency
• Leakage power density is increasing due to lower Vt and gate leakage
• Stresses packaging, cooling, battery life, etc.
• Complicates IDDq testing as well
• Thinning gate oxides increase gate tunneling leakage
Source: Bergamaschi
Low Power Levers
Structural Techniques
• Voltage islands
• Multi-threshold devices
• Multi-oxide devices
• Minimize capacitance by custom design
• Power-efficient circuits
• Parallelism in micro-architecture
Dynamic Techniques
• Clock gating
• Data gating
• Power gating
• Variable frequency
• Variable voltage supply
• Variable device threshold
Standby Mode Leakage Suppression
• Disconnect inactive logic from the supply in standby mode
• Multi-threshold: use a higher-Vt header/footer
– Suppresses logic leakage (gate and sub-threshold)
• Multi-oxide: use a thick-oxide header/footer
– Suppresses gate leakage
• Header/footer gate voltage
– Overdrive: increases frequency
– Under-drive: reduces leakage
• Header/footer well bias
– Forward bias: increases frequency
– Reverse bias: reduces leakage
Voltage Islands
Standby Power Reduction Mechanism
• On-chip supervisor manages standby power
• Clock gating
• Functional clock gating (fine clock control)
• Voltage scaling, shutdown
• SoC latch save/restore
• Timeout and interrupt driven
[Block diagram. Battery-backed domain: Suspend Ctrl Logic, RTC, Scan Ctrl Logic, Reset Logic, IIC Ctrl. Scalable VDD domain (1.0-1.8 V): SoC logic with LSSD latches and scan chains. Also shown: 3.3 V I/O, serial NVRAM (Clk, Data), DC/DC supplies (Select, Shutdown). Signals: System Clks Freeze, Clk I/O Freeze, PG, Wake, Reset, Irq.]
Voltage Island Concept
• Trade off power for delay by running functional blocks at different voltages
• Can use a mix of low and high Vt to balance performance and leakage
• Switch off inactive blocks to reduce leakage power
• Requires IP standards for power management, clock gating, etc.
[Figure: Delay vs. Voltage. Delay (ps, 0 to 30) against Vdd (0.7 to 1.3 V) for standard-Vt and low-Vt devices.]
• E.g.: a telecom ASIC with 1.0/1.2 V islands saved 16% active power and 50% standby power
[Block diagram labels: Power Management Unit; two switches; supply Vddo; island supplies Vdd1 and Vdd2; IP1 and IP2; low-VT logic and logic.]
Source: Bergamaschi
Power Management Unit
[Block diagram of the Power Management Unit:
• Bus Interfaces
• Reconfigurable Register Units
• Power Management State Machine
• Timer / Counter
• Performance Control Unit
• Clock Control Unit
• Monitor Unit
• Power Control Unit
• IP Core Interfaces
Connected blocks: DC/DC Converter, Well-bias generator, Clock generator, Clock & Power-Gating, Device Performance Monitor, Thermal Monitor]
Busses with Different Voltages
• One clock and one signaling voltage
• Some approaches:
– Temporarily scaling V and F for communication
– Separating different voltages with bridges
[Figure: Hot Bus, Cool Bus and Cold Bus joined by bridges.]
Power Management
[Floorplan figure: a chip ringed by I/O's, VReg and Gnd, with these voltage islands:
• Memory arrays (Vdd 4): high-Vt device arrays, optimized for low active power
• Memory arrays (Vdd 3): low-Vt device arrays, optimized for low active power
• Microcontroller (Vdd 2)
• DSP (Vdd 2)
• ROM (Vdd 1)
• Monitor logic (Vdd 4)
• Analog (Vdd 5)
• RLM 1, RLM 2, RLM 3]
• Independently controlled domain power switches
• Multiple on-chip voltage islands
• On-chip voltage regulators
Functional Partitioning
• Identifying functional components with similar inactive periods
• Assigning functional components to possible chip-level power sources capable of providing the required voltage level
• Identifying the optimal grouping of components, based upon power sequencing (affects static power) and operating voltage (affects active power), that minimizes chip power within the limits (such as peak power) of the SoC
• Identifying or creating, and connecting, logic signals that will be used to control power-sequencing circuitry or clock gates
• Connecting alternate voltage sources to latches or arrays used to save state across power sequencing
Controlling VDD and VTH for low power
                 Active          Stand-by
Multiple VTH     Dual-VTH        MTCMOS
Variable VTH     VTH hopping     VTCMOS
Multiple VDD     Dual-VDD        Boosted gate MOS
Variable VDD     VDD hopping
• Software-hardware cooperation; technology-circuit cooperation
• MTCMOS: Multi-Threshold CMOS; VTCMOS: Variable Threshold CMOS
• Multiple: spatial assignment; Variable: temporal assignment
Dynamic power reduction
[Figure: normalized power (P ∝ fV²) against required speed (∝ f), both on a 0 to 1 scale; the power curve is super-linear.]
[Diagram: software passes the required speed to a hardware controller, which sets the processor's clock and VDD.]
• If you don’t need to hustle, relax and save power
• Through software-hardware cooperation: OS and application programming
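The super-linear shape follows from P ∝ fV² once the supply voltage is lowered together with the clock. The sketch below is a minimal illustration, assuming (this is not stated on the slide) that the workable supply voltage scales linearly with the required speed; the function name and the linear-voltage model are hypothetical.

```python
def normalized_power(speed, scale_voltage=True):
    """Normalized dynamic power (P ~ f * V^2) at a normalized speed in (0, 1].

    Assumption: with voltage scaling, V is proportional to the required speed.
    Without it, V stays at its nominal value of 1.0.
    """
    if not 0.0 < speed <= 1.0:
        raise ValueError("speed must be in (0, 1]")
    voltage = speed if scale_voltage else 1.0
    return speed * voltage ** 2


if __name__ == "__main__":
    for speed in (1.0, 0.8, 0.6, 0.4, 0.2):
        print(f"speed {speed:.1f}: frequency only {normalized_power(speed, False):.3f}, "
              f"frequency and voltage {normalized_power(speed):.3f}")
```

Under this assumption, slowing the clock alone saves power only linearly, while slowing clock and voltage together saves it cubically.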
Voltage Scaling Mechanism
• Four power domains
• On-chip supervisor for SoC voltage supplies
• Level shifting and latching circuits at domain interfaces
[Block diagram of the four domains:
• 1.8 V battery-backed domain (persistent): Suspend Ctrl Logic, RTC Logic
• Regulated 1.0 V PLL supply domain, fed by a linear regulator
• Voltage-scalable 1.0 V-1.8 V logic supply domain: CPU core, caches, I/O interface logic, memory interface, accelerators
• Constant 3.3 V I/O domain: drivers, receivers
Battery and DC/DC supplies (3.3 V and 1.0 V-1.8 V) with Select and Shutdown controls]
Dynamic Voltage/Frequency Scaling
• Frequency changed and Vdd dropped from 1.8 V to 1.0 V
• PLL locked at 533 MHz, with the CPU clock switched from 266 MHz to 66 MHz and back to 266 MHz
• Continues to execute the Dhrystone benchmark
Low Leakage Cells – Standby Power Reduction
• Dual-Vt storage cells
– Low Vt for high performance
– High Vt for low leakage
• Gated Vdd and DRG power switch
• Sub-threshold leakage current dominates
• Energy (Power) Management Approach by ARM
Need for Energy Management
• Today’s mobile consumers want:
– longer battery life
– smaller, lighter products
• Manufacturers are adding new features and applications to add product appeal:
– media players (audio, video)
– gaming
– video capture
• Increasing processing power requirements and longer battery life are conflicting requirements
• Battery technology alone offers only incremental improvement over the next several years
Higher performance, higher power
[Figure: power consumption (mW, log scale from 1 to 1000) against Dhrystone MIPS (0 to 500) for ARM7, ARM9 and ARM10/11 cores on 0.18 um and 0.13 um processes.]
Layers of power optimizations
• Software (OS, applications)
• System – Architecture
• Micro-architecture
• Circuits
• Ambient environment
• Si conditions
• Power delivery

• Important to optimize design at each level
• ARM’s partners have widely varying design-time, technology, legacy, and cost constraints
• IEM: current focus on top two layers
– Widely applicable dynamic power optimizations
– Optimize for the requirements of the specific workload
Conventional Power Management
[State diagram: a Power Manager with states ON, IDLE and STANDBY and a RESTART transition.]
– ON is fully powered up at maximum clock frequency
– IDLE is a lower-power mode with a slow clock running
– STANDBY is off, but with state retained and clocks stopped
• Conventional power management schemes manage the transitions between defined power states
• Despite the changing software workload, the system runs at maximum performance while there is any work to be done
Optimizing for utilization characteristics
• Conventional power management optimizes power consumption when there is nothing to do (sleep modes).
• IEM optimizes power when work is being done.
– Only run fast enough to meet deadlines!
– Running fast and idling wastes power.
• The active- and sleep-mode techniques are orthogonal.
[Figure: utilization (0% to 100%) over time, without and with dynamic voltage scaling, and the energy used in each case.]
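The claim that running fast and then idling wastes power can be checked with a small energy model. This is a sketch under stated assumptions, not ARM's method: switching energy per cycle is taken as C·V², idle and leakage energy are ignored, voltage is assumed to scale in proportion to frequency, and the operating point and workload numbers are hypothetical.

```python
def run_task(cycles, freq_hz, vdd, c_eff=1e-9):
    """Return (energy_joules, finish_time_s) for a task of `cycles` clock cycles.

    Switching energy per cycle is modeled as c_eff * vdd^2; leakage and idle
    energy are ignored (assumption).
    """
    energy = c_eff * vdd ** 2 * cycles
    return energy, cycles / freq_hz


F_MAX_HZ, V_MAX = 400e6, 1.8      # hypothetical full-speed operating point
CYCLES, DEADLINE_S = 100e6, 0.5   # hypothetical workload and deadline

# Option 1: run at full speed, then idle until the deadline.
e_fast, t_fast = run_task(CYCLES, F_MAX_HZ, V_MAX)

# Option 2: run just fast enough, with voltage scaled in proportion (assumption).
f_needed = CYCLES / DEADLINE_S
e_dvs, t_dvs = run_task(CYCLES, f_needed, V_MAX * f_needed / F_MAX_HZ)

print(f"full speed: {e_fast:.3f} J, done at {t_fast:.2f} s")
print(f"scaled:     {e_dvs:.3f} J, done at {t_dvs:.2f} s")
```

Both options finish by the deadline, but the scaled run spends less energy on the same number of cycles because each cycle switches at a lower voltage.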
Meeting the Performance Requirement
• Effective energy management requires:
1. Automatic performance prediction technology
  • Determining the lowest performance level that will get the software workload done just in time
2. Performance scaling technology
  • Delivering just enough performance to meet the current requirement
  • Responding rapidly to changing performance levels
[Diagram: performance prediction and monitoring; scaling technology (voltage scaling, threshold scaling).]
Energy Management Control Components
• Software component
– To automatically predict future software workloads by interacting with instrumented operating systems and application software
– To determine the software deadlines
– To balance workload and deadlines with performance
• Hardware component
– To accurately measure the actual system performance
– To independently manage the transitions of hardware scaling blocks, e.g., clock generators and power controllers
• Together these components determine and manage the lowest performance level that gets the work done
Adaptive Voltage Scaling (AVS)
• AVS is a closed-loop control mechanism.
– Feedback from the PMU indicates the earliest opportunity to change processor frequency, based on the voltage levels being output to the SoC.
– The APC monitors the difference between the requested performance level and the actual level achieved.
• The system dynamically changes the applied voltage, taking into account variations due to differences in process technology and ambient temperature.
• Either the lowest energy consumption is achieved or a specified performance level is met.
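The closed-loop idea can be sketched as a controller that keeps lowering the supply while a performance monitor still reports the requested speed. Everything here is illustrative: the linear silicon model, the 400 mV offset, the 25 mV step and both function names are assumptions, and `speed_factor` merely stands in for process and temperature variation.

```python
def achieved_mhz(vdd_mv, speed_factor):
    """Toy silicon model (assumption): speed rises linearly above 400 mV.

    `speed_factor` (MHz per volt) stands in for process and temperature.
    """
    return max(0.0, speed_factor * (vdd_mv - 400) / 1000)


def avs_settle(target_mhz, speed_factor, v_min_mv=800, v_max_mv=1800, step_mv=25):
    """Step the supply down from v_max_mv while the monitored speed still meets the target.

    Returns the settled supply in millivolts. If the target cannot be met
    even at v_max_mv, the supply simply stays at v_max_mv.
    """
    vdd_mv = v_max_mv
    while (vdd_mv - step_mv >= v_min_mv
           and achieved_mhz(vdd_mv - step_mv, speed_factor) >= target_mhz):
        vdd_mv -= step_mv
    return vdd_mv


if __name__ == "__main__":
    for label, factor in (("fast silicon", 400), ("slow silicon", 300)):
        print(f"{label}: settles at {avs_settle(200, factor)} mV for 200 MHz")
```

The point of the feedback is visible in the two runs: the same performance request settles at a lower voltage on faster silicon, which a fixed voltage-versus-frequency table could not exploit.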
• Low Power Design Example with Samsung AP based on ARM 920T
Limited Battery Improvement
• Power Increase vs. Battery Improvement
Year                           2001  2004  2007  2010  2013  2016
Feature Size (nm)               130    90    65    45    32    22
Dynamic Power Reduction (X)       0   1.5   2.5   4.0   7.0    20
Stand-by Power Reduction (X)      2     6    15    30   150   800
[ITRS 2001]
[Figure: volumetric energy density (Whr/L) against gravimetric energy density (Whr/Kg) for Ni-MH, Li-Ion/Polymer and fuel-cell batteries, with arrows toward smaller and lighter. Callouts: cellular phone talk time 2 to 4 hours, standby about 1 week; cellular phone talk time about 12 hours, standby about 1 month.]
• Only 4~5X improvement in battery lifetime!
Problem Statement
• Power Analysis on CMOS Inverter
[Figure: three current components in a CMOS inverter.
(a) Capacitive current: the input switches to '1' or '0', charging or discharging Cload
(b) Short-circuit current: flows while Vthn < Input < VDD - |Vthp|
(c) Static leakage current: flows with the input in a steady '1' or '0' state]
Problem Statement
• Dynamic Power
$$P_{switching} = C \cdot V_{DD}^{2} \cdot f_{switching}$$
• Average Short Circuit Current
$$I_{SC} = \frac{\beta}{12\,V_{DD}} \left(V_{DD} - 2V_{th}\right)^{3} \tau_{in}\, f$$
– $\beta$: gain factor ($\beta_n = \beta_p$)
– $V_{th}$: threshold voltage ($V_{thn} = |V_{thp}| = V_{th}$)
– $\tau_{in}$: input transition time
• Sub-threshold Leakage Current
$$I_{DS} = K\, e^{(V_{GS} - V_{th})\,q / nkT} \left(1 - e^{-V_{DS}\,q / kT}\right)$$
– $K$: function of technology
– $V_{GS}$: gate-to-source voltage; $V_{DS}$: drain-to-source voltage
– $V_{th}$: threshold voltage; $q$: electronic charge
– $k$: Boltzmann constant; $T$: temperature
– $n$: nonlinearity constant (1~2)
– $kT/q \approx 0.0259$ V
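To get a feel for how strongly sub-threshold leakage depends on the threshold voltage, the expression I_DS = K·exp((V_GS − V_th)·q/nkT)·(1 − exp(−V_DS·q/kT)) can be evaluated directly. This is a minimal sketch: K = 1 (so only ratios are meaningful), n = 1.5 and the two threshold voltages are illustrative assumptions rather than values from the slides.

```python
import math

THERMAL_VOLTAGE = 0.0259  # kT/q in volts, near room temperature


def subthreshold_current(v_gs, v_ds, v_th, k=1.0, n=1.5):
    """I_DS = K * exp((V_GS - V_th) / (n * kT/q)) * (1 - exp(-V_DS / (kT/q)))."""
    return (k * math.exp((v_gs - v_th) / (n * THERMAL_VOLTAGE))
            * (1.0 - math.exp(-v_ds / THERMAL_VOLTAGE)))


# Off-state leakage (V_GS = 0) at V_DS = 1.0 V for two threshold voltages.
leak_high_vth = subthreshold_current(0.0, 1.0, v_th=0.5)
leak_low_vth = subthreshold_current(0.0, 1.0, v_th=0.3)
print(f"leakage ratio, Vth 0.3 V vs 0.5 V: {leak_low_vth / leak_high_vth:.0f}x")
```

Because V_th sits in the exponent, a 0.2 V reduction multiplies the off-state current by more than two orders of magnitude in this model.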
Problem Statement
• Domination of Leakage Current
Feature Size        > 0.25 um                  0.18/0.13/0.09 um …
Core Voltage        5.0/3.3/2.5 V              1.8/1.2/1.0 V …
VTH (Threshold)     > +/- 0.6 V                +/- 0.5, 0.4, 0.3 V …
Performance (AP)    < 200 MHz                  300/400/533 MHz, 1 GHz
TR Leakage          Negligible                 Exponentially growing (SD/Gate)
Stand-by Mode       PLL-off (Clock-off)        V/MTCMOS, High VTH/High VDD
Low Power           Focus on operating power   Focus on operating/stand-by power
Active and Leakage Power with CMOS Scaling
• As CMOS scales down, the following stand-by leakage currents rise rapidly:
– Source-to-drain leakage (diffusion + tunneling) as Lg scales down
– Gate leakage current (tunneling) as Tox scales down
– Body-to-drain leakage current (tunneling) as channel doping scales up
Two cases of Leakage Mechanism
[Figure: a transistor turned off (Vg = 0 V, Vd = Vdd) and turned on (Vg = Vdd, Vd = 0 V), annotated with the leakage paths: sub-threshold leakage, source-to-drain tunneling, drain-to-body tunneling (BTB) and gate oxide tunneling.]
Gate Leakage Current Reduction with High-K Gate Dielectric
[Figure: current density (A/cm², log scale from 10^-6 to 10^1) against Tox (20 to 40 Å) for gate leakage and drain leakage, with the high-K gate dielectric marked.]
$$C = \frac{k\,\varepsilon_{0}\,A}{T_{ox,physical}}$$
Gate Leakage Current Reduction with High-K Gate Dielectric
• As Tox scales down, gate leakage current increases exponentially, because the tunneling probability rises exponentially as the physical tunneling distance shrinks.
• A physically thicker gate dielectric lowers leakage current but also lowers oxide capacitance, which reduces on-current.
• With a high-k (dielectric constant) material, a greater physical thickness and a higher oxide capacitance can be achieved together.
• With a high-k gate dielectric, gate leakage current can be several orders of magnitude lower at a similar oxide capacitance.
Power Saving vs. Abstraction Layers
[Figure: power saving against abstraction layer, with design time as the other axis.]
• System/Algorithm/Architecture levels have a large potential!
System Level Consideration for Low Power Design
• Mobile device behavior over time (operation time is less than 10%)
[Figure: activity over time, showing idle/stand-by, periodic wakeup, and wakeup and operation.]
• “Need various power modes in system”
Power Management : Example
• General Clock Gating: controls the individual clock source for each IP block by switching the corresponding clock source enable bit on or off
• IDLE: turns off the clock source to the CPU
• STOP: turns off all of the clock sources, including the external X-tal and internal PLLs
• SLEEP: turns off all of the clock sources and also the power supply for the internal logic, except for the wake-up logic circuitry
Dynamic Voltage Scaling (DVS)
• Reduction of stand-by power in a leaky process
– By monitoring data bus congestion
– By monitoring/guessing the performance needed for a specific application
[Figure: supply voltage V against time for a task, without and with DVS; with the voltage lowered by ΔV, power gain ∝ ΔV².]
• Need to predict task execution time!
Dynamic Voltage Scaling (DVS)
• Stretch the execution by lowering the supply voltage
– Quadratic power saving
– No later than the deadline
• Processors supporting DVS
– Intel XScale
– Transmeta Crusoe
• DVS algorithms
– Can be implemented in HW or SW
– Optimal solution in the continuous voltage domain, but not in the discrete voltage domain
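The gap between the continuous and the discrete case shows up as soon as a task's ideal frequency falls between two supported settings. The sketch below picks the slowest operating point that still meets a deadline; the table of (frequency, voltage) pairs and the function name are hypothetical and do not describe XScale or Crusoe.

```python
# Hypothetical (frequency in Hz, supply voltage in V) operating points.
OPERATING_POINTS = [(100e6, 1.0), (200e6, 1.2), (300e6, 1.5), (400e6, 1.8)]


def pick_operating_point(cycles, deadline_s, points=OPERATING_POINTS):
    """Return the slowest (freq_hz, vdd) that finishes `cycles` by the deadline.

    Returns None when even the fastest point misses the deadline.
    """
    for freq_hz, vdd in sorted(points):
        if cycles / freq_hz <= deadline_s:
            return freq_hz, vdd
    return None


if __name__ == "__main__":
    cycles, deadline_s = 90e6, 0.5
    print("ideal continuous frequency:", cycles / deadline_s, "Hz")
    print("chosen discrete point:", pick_operating_point(cycles, deadline_s))
```

In this example the ideal frequency is 180 MHz, but the controller has to round up to the 200 MHz point, so the task finishes early and part of the possible saving is lost.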
Voltage Scaling for Low Power
[Diagram: low VDD gives low power (P ∝ VDD²) but low speed (Ids ∝ (VDD - Vth)^(1~2)); low Vth speeds the circuit up but causes high leakage (Ileakage ∝ e^(-C × Vth)), which calls for leakage suppression.]
Low-Leakage Solution – Technology
[Figure: dynamic power (W) against leakage power (W) on log axes, for VDD of 1.5 V and 1.0 V and VTH of 0.5 V and 0.25 V. Arrows labelled VDD control and VTH control connect the high-speed and low-speed points; MTCMOS is marked at a high-speed point.]
VTCMOS & MTCMOS
Multi-Threshold CMOS (MTCMOS)
• Schematic: low-Vth logic between VDD and GND, with a high-Vth transistor driven by Sleep
• Principle: on-off control of internal VDD or VSS; special F/Fs; two Vth's
• Merit: low leakage in stand-by mode; conventional design environment
• Demerit: large serial MOSFET; ground bounce noise; ultra-low voltage region? (1 V)
Variable-Threshold CMOS (VTCMOS)
• Schematic: low-Vt logic between VDD and GND, with a Vt control circuit biasing the wells (N-well: Vpb = VDD or V+; P-well: Vnb = 0 or V-)
• Principle: threshold control with bulk bias; triple well is desirable
• Merit: low leakage in stand-by mode; conventional design environment
• Demerit: scalability? (junction leakage); TR reliability under 0.1 μm; latch-up immunity, Vth controllability, substrate noise, gate oxide reliability; gate leakage current
MTCMOS : Reduce Stand-by Power with High Speed
• With a high-VTH switch, much lower leakage current flows between Vdd and Vss
• The high-VTH MOSFET should have much lower (>10X) leakage current than a normal-VTH MOSFET
[Figure: logic between Vdd and Vss without a high-VTH switch, and with a high-VTH switch (MTCMOS) that puts the normal- or low-VTH MOSFETs on a virtual ground.]
Multi-Threshold CMOS (MTCMOS)
• Mobile applications
– Mostly in the idle state
– Sub-threshold leakage current
• Power gating
– Low-VTH transistors for high-performance logic gates
– High-VTH transistors for low-leakage-current gates
[Figure: the sleep control (SC) signal over time as the operating mode goes active, sleep, active. Circuit labels: logic component (low-Vth MOS) between VDD and the virtual ground VGND; current cutoff switch (CCS, high-Vth MOS) driven by SC; VSS.]
CCS Sizing
• The effect of CCS size
– As the size decreases, logic performance also decreases.
– As the size increases, leakage current and chip area also increase.
– Proper sizing is very important.
– CCS size should be chosen to keep performance degradation within 2%.
[Figure: low-Vt logic between VDD and GND through a high-Vt switch with a control input. Vop = VDD - ΔV; ΔV must be sized within 2% performance degradation.]
Energy Management System – Open loop
• IEM and IEC components work together to predict the lowest acceptable processor performance level
• Power Controller, PMU and Clock Generator work together to deliver that lowest performance level
[Block diagram. On the System-on-Chip (SoC): ARM core running Apps and OS with the IEM (Intelligent Energy Manager); IEC (Intelligent Energy Controller); PC (Power Controller); DCG (Dynamic Clock Generator). Off chip: PMU (Power Management Unit). Labelled links: Performance, Comms, Vdd, CPU Clk.]
Energy Management System – Closed loop
[Block diagram. On the System-on-Chip (SoC): ARM core running Apps and OS with the IEM (Intelligent Energy Manager); IEC (Intelligent Energy Controller); APC (Adaptive Power Controller); Hardware Performance Monitor (HPM); Dynamic Clock Generator. Off chip: "PowerWise" EMU (Energy Management Unit). Labelled links: Performance, PowerWise Interface, Vdd.]
• APC operates in closed-loop control mode, using the HPM to adapt to actual process and temperature
• PowerWise™ Interface provides fast control of the EMU and feedback of status for optimum control
MPEG video playback comparison
• Classical interval-based algorithms (e.g. LongRun) are too conservative: they choose higher performance than necessary.
[Figure: Legendary MPEG. Fraction of time at each performance level (400, 500, 600 MHz) for LongRun and Vertigo. Data labels: 17.20%, 79.15%, 7.78%, 88.06%, 4.07%.]
[Figure: Danse De Cable MPEG. Fraction of time at each performance level (300, 400, 500, 600 MHz) for LongRun and Vertigo. Data labels: 5.74%, 17.04%, 29.50%, 47.72%, 51.17%, 48.34%.]
Interactive app: Konqueror
• Exactly repeating the run of interactive apps is difficult.
• Our methodology: LongRun in control, estimate what IEM would have done on that same run.
[Figure: Konqueror. Fraction of time at each performance level for LongRun and Vertigo. Data labels: 10.09%, 10.44%, 5.55%, 73.92%, 38.49%, 25.56%, 14.75%, 26.65%.]
Energy Management in Action
[Figure: MPEG video playback over 2 seconds, showing the performance level requested by the algorithm and the closest available performance level of the system. The benchmarked system has 4 performance (frequency and voltage) levels: 100%, 83%, 66% and 50%.]
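Mapping a requested performance level onto the four levels of the benchmarked system can be sketched as a lookup. The rounding direction is an assumption: this sketch always rounds up to the next available level so the request is never under-served, which may differ from how the real system picks its "closest" level; the function name is hypothetical.

```python
import bisect

# Performance levels (percent of maximum) listed for the benchmarked system.
LEVELS = [50, 66, 83, 100]


def closest_available_level(requested_pct, levels=LEVELS):
    """Return the lowest available level that is >= the requested level.

    Requests above the top level are clamped to the top level. `levels`
    must be sorted in ascending order.
    """
    index = bisect.bisect_left(levels, requested_pct)
    return levels[min(index, len(levels) - 1)]


if __name__ == "__main__":
    for request in (10, 60, 66, 90):
        print(f"requested {request}% -> running at {closest_available_level(request)}%")
```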
DVS Control Sub-system
[Block diagram. IEC blocks: DPC (Dynamic Performance Controller), DPM (Dynamic Performance Monitor), DVC (Dynamic Voltage Controller, with a voltage vs. frequency lookup table), DEM (DVS Emulation), APB configuration interface, interrupts. SoC-specific blocks: DCG (Dynamic Clock Generator), PMU. Signals: target and current performance index, Config., PWRREQ, MAXPERF, CPUCLKGEN, DPCCLKGEN, cpuclk, CLOCK, DATA.]
DVS operation (with MAXPERF Signalling)
[Timing diagram: VDD (0% to 100% in 25% steps), IECMAXPERF and IECCRNTDVCIDX[7] to IECCRNTDVCIDX[4]. Annotations:
• New performance target (50%) requested by IEM S/W
• Index changes as the voltage ramps down and each stable point is reached
• Maximum performance requested
• Index changes as VDD ramps up due to IECMAXPERF
• Back to software-programmed performance as IECMAXPERF is cleared
• Index changes as VDD ramps down as IECMAXPERF is cleared]
Prototype IEM test chip
• ARM926EJ-S core
• Multiple power domains
• Voltage and frequency scaling of CPU, caches and TCMs
• First full DVS silicon with National Semiconductor PowerWise™ technology
• NSC Adaptive Power Controller (APC) implemented in FPGA
• Includes DVS emulation mode for comparative tests
• TSMC 0.13 μm - CL013G - April Cyber Shuttle
– Packaged parts – 11 August 2003
• Developed by ARM, Synopsys and National Semiconductor using Synopsys EDA tools
Conclusions
• Along with process technology scaling, signal integrity, SoC integration and system verification, low-power design is a critical issue.
• Low-power design needs to be approached at every level, from the system level (including software and algorithms) down to the device and process level.
Thank you for your kind attention!
• IBM Low Power Design using PowerPC
Platforms for Information Appliances
• IBM PowerPC platforms enable highly integrated, power-efficient Information Appliance (IA) chips
[Diagram: the PowerPC platform (uP cores 405/440, IP cores, CoreConnect™ architecture, ASIC tools, low power optimizations) leading through SOC design to custom IA chips and application-specific IA chips.]
Scalable PowerPC 405 CPU Core
[Block diagram of the PowerPC 405 core: instruction unit, branch unit, execution unit, GPRs, MAC, load/store pipe, MMU, I-cache and D-cache with their control units, timers, interrupts, debug/trace, power management, and the 64-bit Processor Local Bus.]
CPU Goals
• Expanded operating voltage range (0.9 V to 1.95 V)
• Maintain full software and tools compatibility with the existing PowerPC 405
• Provide a high-performance core capable of high-efficiency low-power operation
CPU Optimizations
• Redesigned custom circuits within the CPU that were sensitive to low-voltage operation
• Re-optimized design and timing for the extended voltage range
• Verification of equivalence
Embedded PowerPC Cores
• PowerPC 405
– 32-bit data, 32-bit address, MMU
– Single-issue, 5-stage pipeline: 1.52 DMIPS / MHz
– 266 – 400 MHz
– L1 Cache to 16KB/16KB
– Voltage-scalable versions (405LP-1, 405LP-2)
• PowerPC 440
– 32-bit data, 36-bit address, MMU
– Dual-issue, 7-stage pipeline: 2.0 DMIPS / MHz
– 400 – 800 MHz
– L1 Cache 32KB/32KB; L2 256 KB; L3
Low Power Optimizations
• IBM low-power SOC designs include a wide range of optimizations to reduce both active and standby power
Active Power Reductions
• Voltage Scaling
• Frequency Scaling
• Flexible Clock Distribution
• Clock Gating
• Hardware Accelerators
Standby Power Reductions
• Clock Freezing
• Hibernation
• “Cryo” Standby
Voltage Scaling Benefits
• Complementary CMOS scales well over a wide voltage range
• Can be used widely over the entire chip
• Can optimize power/performance (MIPS / W) over a 4X range
Voltage Scaling Challenges
• Custom circuits, PLLs, analog, and I/O drivers don’t voltage-scale easily
• Avoiding increases in standby power in low-active-power circuits (the VTH dilemma)
• Reducing operating voltage greatly reduces active power in CMOS
• Operating at 1/2 normal Vdd increases delay 2.4-3.2X but reduces power by > 10X
[Figure: CMOS ring oscillator delay and power vs. VDD.]
IBM Low-Power SOC Designs
“Palmtops to Teraflops” in a single ISA
Optimized for high-performance handheld applications, e.g., high-end PDA
• PowerPC 405LP-1
– Joint project of IBM Research and IBM Microelectronics
– First silicon Oct. 2001
– 0.18 μm process
– Frequency-scalable, < 66 – 266 MHz
– Voltage-scalable, 1.0 – 1.8 V (0.9 – 1.65 V)
– Technology evaluation platform
– All power and performance data from 405LP-1 systems
• PowerPC 405LP-2
– 0.13 μm process
– Scalable to 333 MHz @ 1.5 V (est.)
– Optimized for multimedia processing
– Well into design
405LP-1 System on a Chip
[Block diagram. Buses: 64-bit Processor Local Bus (PLB), 32-bit On-chip Peripheral Bus (OPB), PLB-OPB bridge. Blocks: PPC405 CPU core with 16K I-cache and 16K D-cache, scalable low-power PLL, DMA controller, LCD controller, SDRAM controller, RAM/ROM/peripheral controller, code decompression, speech accelerator, crypto accelerator, CODEC interface, RTC, interrupt controller, GPIO, PCMCIA/CFII, two UARTs, IIC, clock and power management, standby power management, passive sensor interface. Legend: 3.3 V I/O supply; 1.0 V – 1.8 V logic; 1.8 V battery-backed; 1.0 V internal regulator; new core; pre-existing core.]
Reducing Standby Power
• Cryo mode uses
– Customers/designs comfortable with clock-stop standby
– Low-latency periodic sleep/wake with minimal standby power
– IP cores with hidden state can cause problems for SW-based save/restore
• Other methods under review
– Voltage islands and power gating
– State-saving latches
Standby Power Modes
Cryo mode sequence
– Shutdown:
  • Save CPU core state
  • Flush caches and TLBs
  • Clocks stopped
  • State scanned to internal/external non-volatile storage
  • Power removed from logic
– Suspend: monitor system for wake-up condition or RTC timer
– Restore: on wake indicator
  • Restore power to logic
  • State scanned in from non-volatile storage
  • Restore clocks
  • Restore CPU state
Standby power modes enable longer battery life and “instant on”
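The Cryo sequence is essentially a three-phase state machine: shut down in a fixed order, wait for a wake condition, then undo the shutdown in reverse. The sketch below only models the ordering of the steps listed above; the event names and the function are hypothetical and no real hardware interface is implied.

```python
SHUTDOWN_STEPS = [
    "save CPU core state",
    "flush caches and TLBs",
    "stop clocks",
    "scan state out to non-volatile storage",
    "remove power from logic",
]

RESTORE_STEPS = [
    "restore power to logic",
    "scan state in from non-volatile storage",
    "restore clocks",
    "restore CPU state",
]

WAKE_EVENTS = {"wake", "rtc"}  # hypothetical names for the wake-up sources


def cryo_cycle(events):
    """Return the ordered action log for one shutdown/suspend/restore cycle.

    `events` is the sequence of events observed while suspended; the first
    wake event triggers the restore steps.
    """
    log = list(SHUTDOWN_STEPS)
    for event in events:
        if event in WAKE_EVENTS:
            log.extend(RESTORE_STEPS)
            return log
        log.append(f"suspend: ignoring {event}")
    log.append("suspend: still waiting")
    return log


if __name__ == "__main__":
    for action in cryo_cycle(["noise", "rtc"]):
        print(action)
```

Keeping the two step lists as data makes the ordering constraint easy to check: power is removed only after state has been scanned out, and CPU state is restored only after power and clocks are back.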
Mode              System Clock   VDD Logic   State Saved                    Restore Time                                        Power (Logic)
Freeze Mode       0 Hz           1 V         All                            Observe wake-up condition (< 1 ms)                  CMOS leakage at 1 V
Hibernation Mode  0              0           Software state                 OS restore (100s of ms)                             ~0
Cryo Mode         0              0           Registers and software state   “Instant on”: scan restore of state (20 - 200 ms)   ~0
Dynamic Power Management
• System-Wide power management (PM) during application execution
• Examples:
– Peripheral PM, including core clock gating
– PM at idle (including low-latency sleep modes)
– Memory PM
– Dynamic voltage and frequency scaling
– Energy policy management
• DPM is proposed as an architecture for policy-guided dynamic power management.
DPM Motivation
• Embedded application requirements
– Long battery life
– System-specific policy requirements
  • Highly variable system designs
  • Watch, cell phone, personal server, PDA, tablet
  • Soft real-time (multimedia) requirements
  • Task-specific policy requirements
– General-purpose systems and applications
  • No/minimal application software changes for PM
– Minimal/variable firmware
  • PM must be in the OS/applications
DPM Motivation
• Technology
– SOC
  • CPU + peripheral PM
– Complex clocking architectures
  • Decoupled CPU/bus frequencies
– Heterogeneous processor architectures
  • Example: 405LP-2 - Asynchronous heterogeneous processing in a common voltage/memory domain
– New performance and leakage control mechanisms at the circuit level
DPM Motivation
• Linux
– Platform independence desired
– Community acceptance required
  • Simplicity – ease of maintenance
  • Integration with pre-existing facilities
– Linux Device Model
  • Minimal core kernel changes
– 5 lines of new code in the “core” kernel
– Scalability to server/SMP systems
DPM: An Architecture for Policy-Guided PM
Is:
• A generic software architecture for policy-guided dynamic power management proposed by IBM and MontaVista Software
• Flexible enough to implement a number of system-specific DVFS and static PM approaches
• Available in an embedded Linux distribution for several embedded processors
Is Not:
• PowerPC or Linux specific
• A DVFS algorithm
• Fully implemented yet
DPM Overview
[Diagram: DPM sits between software and hardware.
• Software: policy/power managers provide and manage policies; power-aware applications and the operating system signal operating/task state changes; device drivers supply requirements and power-management information
• DPM sets operating points, changing power-performance levels
• Hardware: CPU, memory controller, power supplies, system clock generation]
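The separation between operating points and policies can be pictured as two small tables: one describing what the hardware can do, the other saying which point to use in each operating state. The sketch below is not the DPM kernel interface; the class, the table names and the state names are hypothetical, and the two operating points reuse the 266/133 MHz at 1.8 V and 66/66 MHz at 1.0 V settings quoted in this deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    cpu_mhz: int
    mem_mhz: int
    vdd: float


# What the hardware offers (values taken from the 405LP-1 scaling example).
OPERATING_POINTS = {
    "fast": OperatingPoint(cpu_mhz=266, mem_mhz=133, vdd=1.8),
    "slow": OperatingPoint(cpu_mhz=66, mem_mhz=66, vdd=1.0),
}

# A policy maps each operating state to an operating point name (hypothetical).
POLICY = {"task": "fast", "idle": "slow"}


def select_operating_point(state, policy=POLICY, points=OPERATING_POINTS):
    """Return the operating point the policy assigns to the given state."""
    return points[policy[state]]


if __name__ == "__main__":
    for state in ("task", "idle"):
        print(state, "->", select_operating_point(state))
```

Swapping in a different `POLICY` table changes the power-performance behavior without touching the code that signals state changes, which is the kind of flexibility a policy-guided design is after.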
Dynamic Voltage and Frequency Scaling
• Dynamic voltage scaling: 1.8 V to 1.0 V at up to 1 V/100 us
• Dynamic frequency scaling: 266 MHz CPU to 66 MHz CPU
[Figure: logic VDD (0 to 2.0 V) with total chip power, logic power and I/O power (0 to 600 mW) over time, as the CPU/memory frequency goes from 266/133 MHz to 66/66 MHz and back to 266/133 MHz. Uninterrupted operation: Linux 2.3.17 running Dhrystone 2.1 code, 400 loops per cycle.]
• Power consumption for the CPU and logic was reduced by 13X dynamically under the control of the Linux kernel (no PLL relock and no stopping of the application)
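The 13X figure is in line with what the dynamic power relation P ∝ fV² predicts for this voltage and frequency change. The check below is a back-of-the-envelope sketch that ignores leakage and assumes the switched capacitance and activity are the same at both operating points; the function name is illustrative.

```python
def dynamic_power_ratio(v_high, v_low, f_high, f_low):
    """Ratio of dynamic power (P ~ f * V^2) between two operating points."""
    return (v_high / v_low) ** 2 * (f_high / f_low)


# 1.8 V at 266 MHz versus 1.0 V at 66 MHz, as in the measurement above.
ratio = dynamic_power_ratio(1.8, 1.0, 266, 66)
print(f"estimated reduction: {ratio:.1f}x")
```

The voltage term contributes a factor of about 3.2 and the frequency term a factor of about 4, which together come out close to the measured 13X.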
Idle Scaling Trace (MPEG4)
[Figure: core voltage and battery power traces.]
Application   Default   Idle Scaling   Sys. Savings   Core Savings
MPEG4 A/V     2.76 W    2.63 W         4.7 %          11.4 %
MP3           1.42 W    1.1 W          22.5 %         47.8 %
Load Scaling Trace (MPEG4/spmt)
[Figure: core voltage and battery power traces, with markers A, B, D, E and F.]
Application   Default   Load Scaling   System Savings
MPEG4 A/V     2.76 W    2.54 W         8.0 %
MP3           1.42 W    1.03 W         27.7 %
Application Scaling Trace
[Figure: video thread task state across successive tasks (Task-1, Task, Task+1), with regions labelled “More Performance Required” and “Working Ahead” and markers D, E and F.]
AS Results
• AS achieved close to an “ideal” LS result with a simple policy manager and a straightforward modification of the application
Application   No DPM   DPM: Application Scaling   DPM Savings   “Ideal” Savings
MPEG4 A/V     2.76 W   2.46 W                     10.8 %        10.8 %
[Figure: Operating Point Usage for MPEG4 by Strategy. Percentage of time (0 to 70%) at each operating point (Idle/33, 100, 133, 166, 200, 266) for idle scaling, load scaling, application scaling and "ideal".]
References
• Nowka et al., “A 32-bit PowerPC System-on-a-chip With Support for Dynamic Voltage Scaling and Dynamic Frequency Scaling”, IEEE Journal of Solid-State Circuits, vol. 37(11), Nov. 2002, pp. 1441-1447.
• IBM Austin Research Laboratory (www.research.ibm.com/arl)
– Dynamic Power Management for Embedded Systems (Whitepaper): http://www.research.ibm.com/arl/projects/papers/DPM_V1.1.pdf
• Linux 2.4 kernel including DPM implementation (Bitkeeper) bk://source.mvista.com/linuxppc_2_4_devel-pm