On-Line Adjustable Buffering for Runtime Power Reduction ( vlsicad.ucsd )
UC San Diego Computer Engineering • VLSI CAD Laboratory
• Presented a new technique to dynamically trade off power and performance by turning off devices that are not needed at less than peak performance
• Both leakage and dynamic power are reduced; total power reduction is 6-12% on our testcases
• By sharing LPM devices, area overhead is reduced to at most 5.57%
• No adverse effect on the performance of the circuit when the LPM signal is off
Ongoing work:
• Actual layout of the custom repeater with routing of V'DD, V'SS, and LPM nets to accurately estimate power, performance, and area overhead
• Customizing more cells, especially clock repeaters, to further improve the power-performance trade-off
Problem: High performance when the LPM signal is off → use large LPM devices → large area overhead
Solution: Share LPM devices among multiple repeaters
Fewer LPM devices, but virtual VDD (V'DD) and VSS (V'SS) need routing
Note: All LPM devices drive V'DD and V'SS
How many LPM devices are needed?
• Compute the simultaneous switching rate (SSR) by finding the max. #repeaters that have overlapping timing windows. Time = O(R log R) (R = #repeaters)
• Find the total width of all repeater devices (= W_R)
• For good performance, width of LPM devices = 2 × SSR × W_R
Typical SSR = ~10% → small area overhead
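The sizing rule above can be sketched in Python. This is a minimal illustration, not the authors' implementation. It assumes each repeater's timing window is an interval (start, end), takes SSR to be the maximum number of simultaneously overlapping windows divided by the number of repeaters R, and uses a sort-based sweep, which gives the O(R log R) running time. The function names and sample numbers are hypothetical.

```python
from typing import List, Tuple


def max_overlap(windows: List[Tuple[float, float]]) -> int:
    """Largest number of timing windows that overlap at any instant."""
    events = []
    for start, end in windows:
        events.append((start, 0))  # 0 = window opens
        events.append((end, 1))    # 1 = window closes
    # Sorting dominates the cost: O(R log R). At equal times, opens sort
    # before closes, so windows that merely touch count as overlapping.
    events.sort()
    active = best = 0
    for _, kind in events:
        if kind == 0:
            active += 1
            best = max(best, active)
        else:
            active -= 1
    return best


def lpm_width(windows: List[Tuple[float, float]], total_repeater_width: float) -> float:
    """Total LPM device width = 2 x SSR x W_R, with SSR assumed = max overlap / R."""
    if not windows:
        return 0.0
    ssr = max_overlap(windows) / len(windows)
    return 2.0 * ssr * total_repeater_width


# Hypothetical example: 10 repeaters, only the first two windows overlap,
# so SSR = 2 / 10 and the LPM width is 40% of the total repeater width.
windows = [(0.0, 1.0), (0.5, 1.5)] + [(2.0 * i, 2.0 * i + 1.0) for i in range(1, 9)]
print(max_overlap(windows))
print(lpm_width(windows, 100.0))
```

With a typical SSR near 10%, the same rule gives an LPM width of roughly 20% of W_R, shared across all repeaters.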
We add a PMOS-NMOS pair to turn half of the devices off dynamically
What power components are likely to reduce?
• Short-circuit power: during switching, PMOS & NMOS are ON momentarily → short circuit between VDD and VSS. High when transition time (slew) is large
• Subthreshold leakage: when one of the PMOS-NMOS pair between VDD and VSS is ON
Requirements:
• Low area overhead
  Added PMOS-NMOS pair (LPM devices) takes area
  LPM (low-power mode) signal to be routed or locally generated
  Layout of the new cell must be simple and have low area overhead
• High performance when LPM signal OFF
  On-resistance of LPM devices may reduce performance
• Good power-performance trade-off
On-Line Adjustable Buffering for Runtime Power Reduction
( http://vlsicad.ucsd.edu )
Puneet Sharma† ([email protected])
Advisor: Prof. Andrew B. Kahng‡†
Jointly with Mr. Sherief Reda‡
†Electrical & Computer Engineering
‡Computer Science & Engineering
CMOS Power:
• Operational – dynamic and leakage
• Standby – leakage
Approaches to reduce operational power:
• Supply voltage (VDD) scaling
• Dynamic VDD and frequency scaling (DVFS)
DVFS is used to provide a dynamic power-performance tradeoff → switch to low-power mode if high performance is not needed
VDD is already small to reduce dynamic power → dynamic voltage scaling reduces noise margins → DVFS is difficult to use due to reduced VDD
Our approach, like DVFS, provides dynamic low-power, low-performance modes → supplement or replace DVFS
Key idea: Many devices are added for performance, not functionality → turn those devices off when high performance is not needed
Poor interconnect scaling → large number of repeaters
We modify repeaters to dynamically adjust their driving capacity
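As background for the VDD-scaling bullets, the standard first-order model of CMOS switching power is P = α · C · VDD² · f. The sketch below evaluates it at the two supply voltages and the activity factor used in the experiments (1.1V, 0.9V, 0.01). The load capacitance and frequency are placeholder values chosen only for illustration, and the model ignores short-circuit and leakage power.

```python
def dynamic_power(alpha: float, c_load: float, vdd: float, freq: float) -> float:
    """First-order CMOS switching power in watts: alpha * C * VDD^2 * f."""
    return alpha * c_load * vdd ** 2 * freq


# Placeholder capacitance (1 nF total switched) and frequency (400 MHz).
p_high = dynamic_power(alpha=0.01, c_load=1e-9, vdd=1.1, freq=400e6)
p_low = dynamic_power(alpha=0.01, c_load=1e-9, vdd=0.9, freq=400e6)

# The ratio depends only on (0.9 / 1.1)^2, not on the placeholder values.
print(f"{p_low / p_high:.3f}")  # 0.669
```

The quadratic dependence on VDD is why supply scaling is effective, and also why little headroom remains once VDD is already low.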
Experimental Setup
Circuits: s38417 (8,890 cells), AES (15,272), OpenRisc (46,732)
Tools: Synopsys HSPICE (SPICE), Design Compiler (synthesis, timing and power analysis), Cadence SoC Encounter (P&R), SignalStorm (library characterization), TSMC 90nm library models
Other settings: power and timing analysis at slow corner, VDD of 1.1V and 0.9V, activity factor of 0.01.
Power Reduction Results
• Cell-level results: when LPM signal is turned ON
  • 20-20% reduction in leakage
  • 15-30% reduction in short-circuit power (for same slew)
  • 45-65% increase in delay
• Circuit-level results:
  • Both dynamic and leakage power reduce
  • 6-12% reduction in total power at low performance modes
Area Overhead Estimation
• Area overhead due to LPM devices is 0.91% to 5.57%. May be smaller as LPM devices are placeable in whitespace.
• Routing overhead: V'DD and V'SS nets routed as min. Steiner trees and found shorter than the scan chain; LPM signal has short wirelength as #LPM devices is small.
Problem: Custom repeaters are ~5% slower when LPM signal is OFF → up to ~5% reduction in circuit performance
Solution: Use custom repeaters only on non-timing-critical paths
Additional constraint: slew constraints must not be violated when the LPM signal is OFF or ON.
We characterize custom repeaters (i.e., find delay, slew, power, input capacitance) and then perform remapping with a synthesis tool subject to delay and slew constraints. → No loss in circuit performance & no slew violations
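The selection criterion can be illustrated with a simplified per-repeater check. This is only a sketch: the poster performs the remapping with a synthesis tool, whereas the code below is a stand-in that tests each repeater independently and ignores how slack is shared along a path. The `Repeater` fields, the function names, and all numbers are hypothetical. The 5% penalty is the approximate slowdown quoted above for the LPM signal OFF.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Repeater:
    name: str
    slack: float         # timing slack available at this repeater (ns)
    delay: float         # delay of the standard repeater (ns)
    slew_lpm_off: float  # output slew of the custom cell, LPM signal OFF (ns)
    slew_lpm_on: float   # output slew of the custom cell, LPM signal ON (ns)


def can_use_custom(r: Repeater, max_slew: float, off_penalty: float = 0.05) -> bool:
    """True if swapping in the custom cell keeps timing and slew legal."""
    extra_delay = off_penalty * r.delay  # slowdown with LPM signal OFF
    meets_timing = extra_delay <= r.slack
    meets_slew = r.slew_lpm_off <= max_slew and r.slew_lpm_on <= max_slew
    return meets_timing and meets_slew


def select_for_remap(repeaters: List[Repeater], max_slew: float) -> List[str]:
    """Names of repeaters that may be replaced by the custom cell."""
    return [r.name for r in repeaters if can_use_custom(r, max_slew)]


repeaters = [
    Repeater("r0", slack=0.00, delay=0.10, slew_lpm_off=0.08, slew_lpm_on=0.12),  # critical
    Repeater("r1", slack=0.05, delay=0.10, slew_lpm_off=0.08, slew_lpm_on=0.12),  # swappable
    Repeater("r2", slack=0.05, delay=0.10, slew_lpm_off=0.08, slew_lpm_on=0.20),  # slew too large
]
print(select_for_remap(repeaters, max_slew=0.15))  # ['r1']
```

A real flow would re-run timing analysis after each batch of swaps, since consuming slack on one repeater reduces the slack seen by others on the same path.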
• Power-performance for circuit AES shown
• Utilize slack to reduce power when high performance is not needed
• Power lowered or unchanged with LPM
• Alternatively, unchanged or higher performance given a power budget
• Higher performance per watt
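The power-budget view of the tradeoff can be sketched as a lookup over operating points. The tables below are made-up placeholders in arbitrary units, not measurements from the poster. They only encode the qualitative claim that, at a given frequency, power with LPM is lower than or equal to power with DVFS alone. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

OperatingPoint = Tuple[float, float]  # (frequency, total power), arbitrary units


def best_under_budget(points: List[OperatingPoint], budget: float) -> Optional[OperatingPoint]:
    """Highest-frequency operating point whose power fits within the budget."""
    feasible = [p for p in points if p[1] <= budget]
    # Tuples compare by frequency first, so max() picks the fastest point.
    return max(feasible, default=None)


# Placeholder tables: same frequencies, LPM power never higher than DVFS alone.
dvfs = [(100.0, 10.0), (90.0, 8.5), (80.0, 7.0)]
dvfs_lpm = [(100.0, 10.0), (90.0, 8.0), (80.0, 6.5)]

budget = 8.0
print(best_under_budget(dvfs, budget))      # (80.0, 7.0)
print(best_under_budget(dvfs_lpm, budget))  # (90.0, 8.0)
```

In this illustrative case the same budget admits a faster operating point once LPM is available, which is the "higher performance given a power budget" reading of the tradeoff.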
Restricting Area Overhead
Introduction
Custom Repeater Design
Ensuring High Performance
Power-Performance Tradeoff
Experimental Validation
Conclusions & Ongoing Work
[Figure: Traditional Inverter; Custom Inverter]
[Figure: LPM devices shared by two inverters]
[Figure: Power-performance w/ DVFS & DVFS combined w/ LPM. Two panels, AES and OpenRisc, each plotting Total Power (mW) against Frequency (MHz) for the DVFS and DVFS+LPM series. AES: frequencies 445, 438, 432, 389, 354, 349, 343, 337 MHz; power axis 2 to 4.5 mW. OpenRisc: frequencies 192, 187, 181, 173, 164, 159, 154, 149 MHz; power axis 7 to 11.5 mW.]