On-Line Adjustable Buffering for Runtime Power Reduction ( vlsicad.ucsd )



UC San Diego Computer Engineering • VLSI CAD Laboratory


• Presented a new technique to dynamically trade off power and performance by turning off devices that are not needed below peak performance

• Both leakage and dynamic power are reduced; total power reduction is 6-12% on our testcases

• By sharing LPM devices, area overhead is reduced to ≤5.57%
• No adverse effect on circuit performance when the LPM signal is off.

Ongoing work:
• Actual layout of the custom repeater with routing of the V’DD, V’SS, and LPM nets, to accurately estimate power, performance, and area overhead
• Customizing more cells, especially clock repeaters, to further improve the power-performance trade-off.


Problem: Keeping high performance when the LPM signal is OFF (LPM devices conducting) requires large LPM devices → large area overhead
Solution: Share LPM devices among multiple repeaters

Fewer LPM devices, but virtual VDD (V’DD) and virtual VSS (V’SS) need routing

Note: All LPM devices drive V’DD and V’SS

How many LPM devices are needed?
• Compute the simultaneous switching rate (SSR) by finding the maximum number of repeaters that have overlapping timing windows. Time = O(R log R) (R = #repeaters)
• Find the total width of all repeater devices (= W_R)
• For good performance, width of LPM devices = 2 × SSR × W_R

Typical SSR ≈ 10% → small area overhead
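The maximum number of simultaneously switching repeaters can be found in O(R log R) time with a sort-and-sweep over timing-window endpoints. The sketch below is illustrative, not the authors' code: it assumes each repeater's timing window is a closed interval [start, end], takes SSR as that maximum divided by R (our reading of "rate"), and then applies the sizing rule W_LPM = 2 × SSR × W_R. The function names are hypothetical.

```python
from typing import List, Tuple


def max_overlap(windows: List[Tuple[float, float]]) -> int:
    """Largest number of closed intervals [start, end] that overlap at one instant.

    Sorting the 2R endpoints dominates, so the sweep runs in O(R log R).
    """
    events = []
    for start, end in windows:
        events.append((start, 0))  # 0 = window opens; sorts before a close at the same time
        events.append((end, 1))    # 1 = window closes
    events.sort()
    active = best = 0
    for _, kind in events:
        if kind == 0:
            active += 1
            best = max(best, active)
        else:
            active -= 1
    return best


def lpm_width(windows: List[Tuple[float, float]], total_repeater_width: float) -> float:
    """Hypothetical sizing helper: W_LPM = 2 * SSR * W_R, with SSR = max_overlap / R (assumed)."""
    if not windows:
        return 0.0
    ssr = max_overlap(windows) / len(windows)
    return 2.0 * ssr * total_repeater_width


if __name__ == "__main__":
    # Ten disjoint windows: at most one repeater switches at a time, so SSR = 10%.
    demo = [(float(i), i + 0.9) for i in range(10)]
    print(max_overlap(demo), lpm_width(demo, total_repeater_width=500.0))
```

Opening events are ordered before closing events at the same timestamp, so windows that merely touch are counted as overlapping; this is the conservative choice for sizing shared LPM devices.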


We add a PMOS-NMOS pair to turn half of the devices off dynamically

What power components are likely to reduce?
• Short-circuit power: during switching, PMOS and NMOS are momentarily both ON → short circuit between VDD and VSS. It is high when the transition time (slew) is large
• Subthreshold leakage: flows between VDD and VSS when one device of the PMOS-NMOS pair is ON and the other is OFF

Requirements:
• Low area overhead
Added PMOS-NMOS pair (LPM devices) takes area; the LPM (low-power mode) signal must be routed or locally generated; the layout of the new cell must be simple and have low area overhead

• High performance when the LPM signal is OFF
On-resistance of LPM devices may reduce performance

• Good power-performance trade-off


On-Line Adjustable Buffering for Runtime Power Reduction ( http://vlsicad.ucsd.edu )

Puneet Sharma† ([email protected]), Advisor: Prof. Andrew B. Kahng‡†

Jointly with Mr. Sherief Reda‡

†Electrical & Computer Engineering ‡Computer Science & Engineering

CMOS Power:
• Operational: dynamic and leakage
• Standby: leakage

Approaches to reduce operational power:
• Supply voltage (VDD) scaling
• Dynamic VDD and frequency scaling (DVFS)

DVFS is used to provide a dynamic power-performance trade-off: switch to a low-power mode when high performance is not needed.

VDD is already small to reduce dynamic power, and dynamic voltage scaling reduces noise margins → DVFS is difficult to use at reduced VDD.

Our approach, like DVFS, provides dynamic low-power, low-performance modes → it can supplement or replace DVFS.
Key idea: many devices are added for performance, not functionality → turn those devices off when high performance is not needed.
Poor interconnect scaling → large number of repeaters.
We modify repeaters to dynamically adjust their driving capacity.
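To see why VDD is the main lever DVFS pulls, recall the standard first-order model of CMOS switching power, P = α · C · VDD² · f. The sketch below compares the two supply voltages used in the experiments at a fixed frequency. Only the activity factor (0.01) and the 1.1 V / 0.9 V levels come from this work; the capacitance, frequency, and function name are assumptions for illustration.

```python
def switching_power(alpha: float, cap_farads: float, vdd_volts: float, freq_hz: float) -> float:
    """First-order CMOS switching power P = alpha * C * VDD**2 * f, in watts."""
    return alpha * cap_farads * vdd_volts ** 2 * freq_hz


# Illustrative values: only ALPHA and the two VDD levels come from the experimental setup.
ALPHA = 0.01   # activity factor
CAP = 1e-9     # assumed total switched capacitance: 1 nF
FREQ = 400e6   # assumed clock frequency: 400 MHz

p_hi = switching_power(ALPHA, CAP, 1.1, FREQ)
p_lo = switching_power(ALPHA, CAP, 0.9, FREQ)
ratio = p_lo / p_hi  # equals (0.9 / 1.1) ** 2, independent of ALPHA, CAP and FREQ

print(f"P(1.1 V) = {p_hi * 1e3:.2f} mW, P(0.9 V) = {p_lo * 1e3:.2f} mW, ratio = {ratio:.3f}")
```

Dropping VDD from 1.1 V to 0.9 V cuts switching power by about one third at the same frequency. Because the gain is quadratic in VDD, shrinking headroom in VDD limits how much further DVFS can deliver.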


Experimental Setup
Circuits: s38417 (8,890 cells), AES (15,272 cells), OpenRisc (46,732 cells)
Tools: Synopsys HSPICE (SPICE), Design Compiler (synthesis, timing and power analysis), Cadence SoC Encounter (P&R), SignalStorm (library characterization), TSMC 90nm library models
Other settings: power and timing analysis at the slow corner, VDD of 1.1V and 0.9V, activity factor of 0.01.

Power Reduction Results
• Cell-level results when the LPM signal is turned ON:
  • ~20% reduction in leakage
  • 15-30% reduction in short-circuit power (for the same slew)
  • 45-65% increase in delay

• Circuit-level results:
  • Both dynamic and leakage power are reduced
  • 6-12% reduction in total power in low-performance modes

Area Overhead Estimation
• Area overhead due to LPM devices is 0.91% to 5.57%; it may be smaller because LPM devices can be placed in whitespace.
• Routing overhead: V’DD and V’SS nets were routed as minimum Steiner trees and found to be shorter than the scan chain; the LPM signal has short wirelength because the number of LPM devices is small.
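Before detailed routing, the wirelength of a V’DD or V’SS net can be bracketed cheaply with a rectilinear minimum spanning tree (RMST) over its pin locations. By Hwang's theorem, the rectilinear Steiner minimal tree length lies between RMST/1.5 and RMST. The sketch below uses Prim's algorithm with Manhattan distance. It is a proxy estimate, not the Steiner routing used in the experiments, and the pin coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def manhattan(a: Point, b: Point) -> float:
    """Rectilinear (L1) distance between two pins."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def rmst_length(pins: List[Point]) -> float:
    """Total length of a rectilinear minimum spanning tree (Prim's algorithm, O(n^2))."""
    n = len(pins)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    dist = [float("inf")] * n
    dist[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Pick the closest pin not yet connected to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: dist[i])
        in_tree[u] = True
        total += dist[u]
        # Update each remaining pin's distance to the growing tree.
        for v in range(n):
            if not in_tree[v]:
                dist[v] = min(dist[v], manhattan(pins[u], pins[v]))
    return total


if __name__ == "__main__":
    # Hypothetical LPM-device pin locations on one virtual-supply net.
    print(rmst_length([(0.0, 0.0), (2.0, 0.0), (1.0, 1.0)]))  # 4.0
```

For these three pins the Steiner tree through (1, 0) has length 3.0, inside the [4.0/1.5, 4.0] bracket the RMST provides.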


Problem: Custom repeaters are ~5% slower when the LPM signal is OFF → up to ~5% reduction in circuit performance

Solution: use custom repeaters only on non-timing-critical paths.
Additional constraint: slew constraints must not be violated when the LPM signal is OFF or ON.
We characterize the custom repeaters (i.e., find delay, slew, power, and input capacitance) and then perform remapping with the synthesis tool subject to delay and slew constraints.
→ No loss in circuit performance and no slew violations
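The selection rule can be sketched as a greedy filter. A repeater is swapped to the LPM-capable version only if its path still has enough slack to absorb the LPM-OFF delay penalty, and its output slew stays within the limit in both modes. This is a hypothetical simplification, not the synthesis tool's remapping. It assumes each repeater lies on a single path, takes per-mode delay and slew from characterization, and checks delay only in the LPM-OFF mode, on the reading that the LPM-ON mode is the deliberately slower one. All names and units (ns) are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Repeater:
    name: str
    path: str             # the single timing path this repeater is assumed to lie on
    delay_std: float      # ns, original repeater
    delay_lpm_off: float  # ns, custom repeater with LPM signal OFF
    slew_lpm_off: float   # ns, custom repeater output slew, LPM OFF
    slew_lpm_on: float    # ns, custom repeater output slew, LPM ON


def select_for_remap(reps: List[Repeater], path_slack: Dict[str, float],
                     max_slew: float) -> List[str]:
    """Greedy, hypothetical choice of repeaters to replace with LPM-capable ones."""
    slack = dict(path_slack)  # copy so the caller's slack table is not modified
    chosen: List[str] = []
    # Visit repeaters with the smallest LPM-OFF delay penalty first.
    for rep in sorted(reps, key=lambda r: r.delay_lpm_off - r.delay_std):
        penalty = rep.delay_lpm_off - rep.delay_std
        slew_ok = rep.slew_lpm_off <= max_slew and rep.slew_lpm_on <= max_slew
        if slew_ok and slack[rep.path] >= penalty:
            slack[rep.path] -= penalty  # consume slack so later swaps on this path stay safe
            chosen.append(rep.name)
    return chosen
```

Consuming slack as repeaters are swapped prevents several ~5% penalties from stacking up on the same path and turning it critical.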


• Power-performance trade-off shown for circuit AES

• Utilize slack to reduce power when high performance is not needed

• Power lowered or unchanged with LPM

• Alternatively, unchanged or higher performance for a given power budget

• Higher performance per watt
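At run time, a controller could choose among characterized operating points (DVFS alone, or DVFS combined with LPM) in either of the two ways listed above. It could take the lowest-power point that meets a required frequency, or the fastest point that fits a power budget. The sketch below is hypothetical: the `OperatingPoint` type, the function names, and the numbers in the usage example are invented for illustration. They are not the measured AES or OpenRisc curves.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class OperatingPoint:
    freq_mhz: float
    power_mw: float
    vdd: float
    lpm_on: bool


def lowest_power_meeting(points: List[OperatingPoint], min_freq_mhz: float) -> Optional[OperatingPoint]:
    """Lowest-power point whose frequency meets the requirement, or None if none does."""
    feasible = [p for p in points if p.freq_mhz >= min_freq_mhz]
    return min(feasible, key=lambda p: p.power_mw, default=None)


def fastest_within_budget(points: List[OperatingPoint], budget_mw: float) -> Optional[OperatingPoint]:
    """Highest-frequency point whose power fits the budget, or None if none does."""
    feasible = [p for p in points if p.power_mw <= budget_mw]
    return max(feasible, key=lambda p: p.freq_mhz, default=None)


if __name__ == "__main__":
    # Made-up operating points, for illustration only.
    table = [
        OperatingPoint(400.0, 4.0, 1.1, False),
        OperatingPoint(380.0, 3.7, 1.1, True),
        OperatingPoint(350.0, 3.2, 0.9, False),
        OperatingPoint(330.0, 3.0, 0.9, True),
    ]
    print(lowest_power_meeting(table, 360.0))
    print(fastest_within_budget(table, 3.5))
```

Adding LPM points to the table can only keep or improve both answers, because each query optimizes over a superset of the DVFS-only choices.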


Sections: Introduction • Custom Repeater Design • Restricting Area Overhead • Ensuring High Performance • Power-Performance Tradeoff • Experimental Validation • Conclusions & Ongoing Work

[Figures: Traditional Inverter • Custom Inverter • LPM devices shared by two inverters]

[Figure: Power-performance with DVFS and with DVFS combined with LPM; panels AES and OpenRisc; Total Power (mW) vs. Frequency (MHz); series DVFS and DVFS+LPM]