Slack Redistribution for Graceful Degradation Under Voltage Overscaling

39
UCSD VLSI CAD Laboratory and UIUC PASSAT Group - ASPDAC, Jan. 21, 2010 Slack Redistribution for Graceful Degradation Under Voltage Overscaling Andrew B. Kahng , Seokhyeong Kang , Rakesh Kumar and John Sartori VLSI CAD LABORATORY, UCSD PASSAT GROUP, UIUC

description

Slack Redistribution for Graceful Degradation Under Voltage Overscaling. Andrew B. Kahng † , Seokhyeong Kang † , Rakesh Kumar ‡ and John Sartori ‡ † VLSI CAD LABORATORY, UCSD ‡ PASSAT GROUP, UIUC. Outline. Background and motivation Voltage scaling and BTWC designs - PowerPoint PPT Presentation

Transcript of Slack Redistribution for Graceful Degradation Under Voltage Overscaling

Page 1: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

UCSD VLSI CAD Laboratory and UIUC PASSAT Group - ASPDAC, Jan. 21, 2010

Slack Redistribution for Graceful Degradation Under Voltage Overscaling

Slack Redistribution for Graceful Degradation Under Voltage Overscaling

Andrew B. Kahng†, Seokhyeong Kang†, Rakesh Kumar‡ and John Sartori‡

†VLSI CAD LABORATORY, UCSD‡PASSAT GROUP, UIUC

Page 2: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(2/25)

Outline• Background and motivation

• Voltage scaling and BTWC designs• Limitation of Traditional CAD Flow

• Power-Aware Slack Redistribution• Our design optimization goal• Related work: BlueShift• Our Heuristic

• Experimental Framework and Results• Design methodology• Testbed• Results and analysis

• Conclusions and Ongoing Work

Page 3: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(3/25)

Reducing Power with Voltage Scaling• Power is a first-order design constraint• Voltage scaling can significantly reduce power• Voltage scaling may result in timing violations

• Voltage scaling is limited because of timing errors

Pow

er

(lower voltage)Voltage

Timing errors begin to occur

Page 4: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(4/25)

CPU, Heal Thyself...

Better-Than-Worst-Case Design• Better-Than-worst-case (BTWC) design approach

• Optimize for normal operating conditions• Trade off reliability and power/performance• Have error detection/correction mechanism (e.g., Razor*)

* Ernst et al. “Razor: A low power pipeline based on circuit-level timing speculation”, Proc. MICRO 2003.

Traditional IC design BTWC design

• Does not allow timing errors in STA

• Error correction architecture allows timing errors

• Fixed target frequency and operating voltage

• Overclocking or voltage overscaling

• BTWC design allows tradeoffs between reliability and power

Page 5: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(5/25)

v

PE(v

)

v

pwr(

v)

(lower voltage)

Voltage Scaling with Error Correction

• Error correction incurs power overhead

PE(v) : Error rate at voltage vpwr(v) : Power consumption at v

Minimum powerat point b

Voltage v

A

B

Voltage v

A

B

• Overscaling is possible for Better-Than-Worst-Case designs

Page 6: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(6/25)

Limitations of Traditional CAD Flow• Conventional designs exhibit

critical operating points• Many paths have near-critical

slack → wall of (critical) slack• Scaling beyond COP causes

massive errors that cannot be corrected

• Conventional designs fail critically when voltage is scaled down

• Error rate should be increased gracefully :

gradual slope slack

‘wall’ of slack

Num

ber o

f pat

hs

Timing slackZero slack

Erro

r rat

eLower voltage

Higher frequency

COP

Page 7: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(7/25)

Outline• Background and motivation

• Voltage scaling and BTWC designs• Limitation of Traditional CAD Flow

• Power-Aware Slack Redistribution• Our design optimization goal• Related work: BlueShift• Our Heuristic

• Experimental Framework and Results• Design methodology• Testbed• Results and analysis

• Conclusions and Ongoing Work

Page 8: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(8/25)

Our Design Optimization Goal• Problem: Minimize power for a given error rate • Goal: Achieve a ‘gradual slope’ slack distribution• Approach: Frequently-exercised paths: upsize cells

Rarely-exercised paths: downsize cells

• We make a gradual slope slack distribution

‘gradual slope’ slack

‘wall’ of slack

Num

ber o

f pat

hs

Timing slackZero slack after voltage scaling

Rarely exercised

paths

Frequently exercised

paths with gradual failure characteristic

Page 9: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(9/25)

Related Work: BlueShift

• BlueShift speed up• Paths with the highest frequency of timing errors• FBB (forward body-biasing) & Timing override

* Grescamp et al. “Blueshift: Designing processors for timing speculation from the ground up”, HPCA 2009

• BlueShift* : maximize frequency for a given error rate

Computeerror rate

ER < TargetGate-level simulation

YES

NO Speed up paths

Finish

• Limitation• Repetitive gate level simulation – impractical• Design overhead of FBB

• BlueShift is impractical with modern SOC designs

Page 10: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(10/25)

Our Heuristic

Set initial voltage

Error rate estimation

ER < ERtargetOptimize Paths

Voltage scalingYES

NO

PowerReduction

Finish

• Optimize slack distribution by cell swaps, exploiting switching activity information

• Iteratively scale target voltage the until error rate exceeds a target, and optimize negative slack paths

• Our heuristic: Voltage scaling → Optimize paths → Power reduction

Page 11: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(11/25)

Heuristic Implementation – Voltage Scaling

Path A

Path B

Path C

Negative Slack of Path A at the target voltage

Nominal voltage

Target voltage (fixed)

Target voltage (fixed)

Actual voltage at the target error rate

Unnecessary cell sizing

Path A

Path B

Path C

Negative Slack of Path A at the target voltage

Nominal voltage

Target voltage with the estimated error rate

Find target voltage and optimize iteratively

New target voltage

Set initial voltage

Error rate estimation

ER < ERtargetOptimize Paths

Voltage scalingYES

NO

PowerReduction

Finish

• Optimize with fixed target voltage• Lower voltage incrementally• Load a pre-characterized library at each voltage point• With iterative voltage scaling, we can find minimum

operating voltage

Page 12: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(12/25)

Heuristic Implementation – Optimize Paths

• Main idea: increase slack of frequently-exercised paths in order of increasing switching activity

• Procedure1. Pick a critical path p with maximum switching activity2. Resize cell instance ci in p3. If slack of path p is not improved, cell change is restored4. Repeat 2. ~ 3. for all cell instances in path p 5. Repeat 2.~ 4. for all critical paths

Set initial voltage

Error rate estimation

ER < ERtargetOptimize Paths

Voltage scalingYES

NO

PowerReduction

Finish

• OptimizePaths procedure reduces error rates and enables further voltage scaling

Page 13: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(13/25)

Heuristic Implementation – Power Reduction

• Main idea: Downsize cells on rarely-exercised paths in order of decreasing toggle rate

• Procedure1. Pick a cell c with minimum toggle rate2. Downsize cell c with logically equivalent cell3. Incremental timing analysis and check error rate4. If error rate is increased, cell change is restored5. Repeat 1. ~ 4.

Set initial voltage

Error rate estimation

ER < ERtargetOptimize Paths

Voltage scalingYES

NO

PowerReduction

Finish

• PowerReduction procedure reduces power without affecting error rate

Page 14: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(14/25)

Heuristic Implementation – Error Rate Estimation

CLK

XA X

f/f f/f

P1

P2

P3 P1 P1 P1P2 P2 P3

TG(P1) = 0.3

TG(P2) = 0.2

TG(P3) = 0.1

Slack(P1) = postive

Slack(P2) = negative

Slack(P3) = positive ER(X) = TG(X) ∙

TG(X) = 0.6

Timing Error

TG(P2)

TG(P1) + TG(P2) + TG(P3) = 0.2

Set initial voltage

Error rate estimation

ER < ERtargetOptimize Paths

Voltage scalingYES

NO

PowerReduction

Finish

• Error rate contribution of one flip-flop

• Error rate of an entire design

α : compensation parameter

ALL

NEG

P

Pffff TG

TGTGER

Dff

ffD ERER

• We estimate error rates without functional simulation

• Error rate estimation: use toggle rate from SAIF(Switching Activity Interchange Format)

Page 15: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(15/25)

Erro

r Rat

e

(lower voltage)

Maximum error rate

Power consumption

Pow

er

Error rate

Operating point

Vmin

Pmin

2. ReducePower

Erro

r Rat

e

Vmin

Pmin

(lower voltage)

Maximum error rate

Power consumption

Pow

er

Error rate

Operating point Er

ror R

ate

(lower voltage)

Maximum error rate

Power consumption

Pow

er

Error rate

Operating point

Vmin

Pmin

1. OptimizePaths

Power Reduction Through Slack Redistribution• Power consumption @BTWC

• Minimum power Pmin is obtained at minimum operating voltage Vmin

1. OptimizePaths• Minimize error rate• Enable to scale

voltage further

2. ReducePower• Downsize cells • Obtain additional power

reduction

2

1

Page 16: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(16/25)

Outline• Background and motivation

• Voltage scaling and BTWC designs• Limitation of Traditional CAD Flow

• Power-Aware Slack Redistribution• Our design optimization goal• Related work: BlueShift• Our Heuristic

• Experimental Framework and Results• Design methodology• Testbed• Results and analysis

• Conclusions and Ongoing Work

Page 17: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(17/25)

• Benchmark generation• Virtutech Simics – Full system simulation and capture test vectors

• Functional simulation• Cadence NC Verilog – Gate level simulation

• Library characterization• Cadence SignalStorm – Synopsys Liberty generation for each voltage

• Heuristic (Slack Optimization)• Implement in C++ and use Tcl socket interface with Synopsys

PrimeTime

• ECO P&R• Cadence SOCEncounter – ECO implementation

Design Methodology

Page 18: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(18/25)

• Target design : sub-modules of OpenSPARC T1

• Benchmark• Ammp, bzip2, equake, sort and twolf• Make test vectors with 1 billion cycles for each sub-module

• Implementation• TSMC 65GP technology with standard SP&R flow

Testbed

Page 19: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(19/25)

• Design techniques1. SP&R with 0.8 GHz (loose constraints)2. SP&R with 1.2 GHz (tight constraints)3. Blueshift: timing override4. Slack Optimizer

• Experiments compare all design techniques with respect to:

1. Power consumption at each voltage point2. Actual error rates from gate level simulation3. Power consumption at each target error rate4. Estimated processor-wide power consumption

List of Experiments

Page 20: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(20/25)

• Error rate at each operating voltage (test case : lsu_dctl)

• Power consumption at each operating voltage

Error Rate and Power Results

Page 21: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(21/25)

• Power consumption at each target error rate

• Slack distribution

Comparison of Power and Slack Results

Page 22: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(22/25)

• Power reduction after optimization (@ 2% error rate)

• Area overhead of design approaches

Max. 32.8 %, Avg. 12.5% power reduction

Power Reduction and Area Overhead

Page 23: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(23/25)

*Kahng et al. “Designing a Processor From the Ground Up to Allow Voltage/Reliability Tradeoffs”, HPCA 2010.

*

Processor-wide Results

• Slack optimization extends range of voltage scaling and reduces Razor recovery cost

Page 24: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(24/25)

Conclusions and Ongoing Work• Showed limitations of a BTWC design• Presented design technique – slack redistribution

• Optimize frequently exercised critical paths• De-optimize rarely-exercised paths

• Demonstrated significant power benefits of gradual slack design• Reduced power 33% on maximum , 12.5% on average

• Ongoing work• Reliability-power tradeoffs for embedded memory• Applying to heterogeneous multi-core architecture

Page 25: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

UCSD VLSI CAD Laboratory and UIUC PASSAT Group - ASPDAC, Jan. 21, 2010

THANK YOU

Page 26: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

UCSD VLSI CAD Laboratory and UIUC PASSAT Group - ASPDAC, Jan. 21, 2010

BACKUP

Page 27: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(27/25)

CPU, Heal Thyself

* Razor: A low power pipeline based on circuit-level timing speculation. In International Symposium on Micro architecture, December 2003.

• Razor* system• Timing errors can be corrected• Manage the trade-off between

system voltage and error rate• New design methodology is

needed

Page 28: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(28/25)

Razor – How it works• Razor Implementation

• Razor: A low power pipeline based on circuit-level timing speculation. In International Symposium on Microarchitecture, December 2003.

• Main flip-flop latches at T, but Shadow latch latches at T+skew• If a timing violation occurs, main flip-flop will latch incorrect value, but

shadow latch should latch correct value• Comparator signals error and the late arriving value is fed back into the main

flip-flop

Page 29: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(29/25)

BTWC: Voltage Scaling

• Error correction needs additional clock cycles and incurs power overhead

fa b

c

fa fb fc

PE(f

)

f

a

b c

fa fb fc

perf

(f)

PE(f) : Error rate at frequency f perf(f) : Performance at f

Maximum performance at point c

• Overclocking case

va b

c

va vb vc

PE(v

)

v

ab c

va vb vc

pwr(

v)

(lower voltage)

PE(v) : Error rate at voltage v pwr(v) : Power consumption at v

Minimum powerat point c

• Voltage scaling case

Page 30: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(30/25)

Limitation of Voltage Scaling• At some voltage, circuit breaks down

Voltage scaling must halt after only 10% scaling.

0.0

0.1

0.2

0.3

0.4

0.5

0.50.60.70.80.91.0Voltage

Err

ors

/ C

ycle

.

Page 31: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(31/25)

Reason for Steep Error Degradation• Critical paths are bunched up in traditional designs.

Page 32: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(32/25)

Slack Re-distribution Example

Positive SlackNegative Slack

0.0

-0.1

Negative SlackPositive Slack

Error Rate = 1%Error Rate = 25%

Page 33: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(33/25)

Heuristic Implementation – Error Rate Estimation• Error rate contribution of one flip-flop

• Error rate of an entire design

• Actual vs. estimated error rates

(1)

(2) α : compensation parameter

ALL

NEG

P

Pffff TG

TGTGER

Dff

ffD ERER

Page 34: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(34/25)

Gradual Slack Distribution

Slack optimization achieves gradual slack distribution.

Page 35: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(35/25)

Processor Error Rate and Power

Designs with comparable error rates have much

higher power/area overheads.

Page 36: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(36/25)

Reliability/Power Tradeoff

Slack-optimized design enjoys continued power reduction as error rate increases.

Page 37: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(37/25)

Enhancing Razor-based Design

Slack optimization extends range of voltage scaling and reduces Razor recovery cost.

Page 38: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(38/25)

Moore’s Law

• Power consumption of processor node doubles every 18 months.

Page 39: Slack Redistribution for Graceful Degradation Under Voltage  Overscaling

(39/25)

Power Scaling• With current design techniques, processor power soon on

par with nuclear power plant