Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width...

27
Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf Postdoctoral Research Fellow Computer Architecture Laboratory Lawrence Berkeley National Laboratory

Transcript of Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width...

Page 1: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

Variable-Width Datapath for On-Chip Network Static Power Reduction

George Michelogiannakis, John Shalf

Postdoctoral Research Fellow

Computer Architecture Laboratory

Lawrence Berkeley National Laboratory

Page 2: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

In a Nutshell

Leakage power is an increasing problem in future or near threshold

voltage (NTV) technologies

Leakage power can be important even at high network loads

This work proposes variable-width datapaths

Parts of channels, buffers, and crossbars can be activated on

demand

We demonstrate an average of 33% total power reduction with

PARSEC benchmarks

Page 3: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

Today’s Menu

Leakage power / motivation

Related work

Variable-width datapaths

Results

Conclusions

Page 4: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

Leakage Power Contribution

32nm (above). 45nm (below). Orion 2.0.

[Top left] “FlexiBuffer: Reducing Leakage Power in On-

Chip Network Routers”. DAC 2011

[Rest] “NoRD: Node-Router Decoupling for Effective

Power-gating of On-Chip Routers”. MICRO 2012

Single router. 45nm. 1.0V Orion 2.0.

PARSEC benchmarks

Page 5: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

Subthreshold Leakage at NTV

0%

10%

20%

30%

40%

50%

60%

45nm 32nm 22nm 14nm 10nm 7nm 5nm

SD

Leakag

e P

ow

er

100% Vdd

75% Vdd

50% Vdd

40% Vdd

Increasing

Variations

NTV operation reduces total power, improves energy efficiency

Subthreshold leakage power is substantial portion of the total

Near-threshold voltage (NTV) design — Opportunities and challenges. DAC 2012

Assumes 20% leakage power at 100% Vdd

Page 6: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

Applications Load Network Unevenly

“Fine-grained bandwidth adaptivity in networks-on-chip using bidirectional channels”. NOCS 2012

Page 7: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

Today’s Menu

Leakage power / motivation

Related work

Variable-width datapaths

Results

Conclusions

Page 8: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

Power Gating

High threshold voltage “sleep switch” transistor

Savings when sleep time enough to overcome energy overheads

“MP3: Minimizing Performance Penalty for Power-gating of Clos Network-on-Chip”. HPCA 2014

Page 9: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

Drowsy SRAMs

Put SRAM lines into low-power (low voltage) “drowsy” mode

Preserves data

Faster activation than power-gated SRAMs (1-2 cycles)

Higher leakage current while drowsy. Higher activation penalty

“Drowsy Caches: Simple Techniques for Reducing Leakage Power”. ISCA 2002

Page 10: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

CatNap: Multiple Networks

Router

Router

Router

Router

Router

Router

Router

Router

Router

Router

Router

Router

Router

Router

Router

Router

Router

Router

“Catnap: Energy Proportional Multiple Network-on-Chip”. ISCA 2013

High traffic area

Source

Low traffic

injection

Multinets cannot share resources (e.g.,

channels) in low traffic regions

Optimal decisions at injection

are a challenge

Page 11: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

Today’s Menu

Leakage power / motivation

Related work

Variable-width datapaths

Results

Conclusions

Page 12: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

Variable-Width Channels

Router Router

Router RouterPacket 1

Packet 1

Packet 2

Same bisection bandwidth

Page 13: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

Input Buffer Gating

VC1

VC2

VC3

VC4

“Adding slow-silent virtual channels for low-power on-chip networks”. NOCS 2008

Router

Packet 1

Packet 2

VC1

VC2

If flits from any channel lane can choose any VC, that

necessasitates multiplexers

We map channel lanes to use only a subset of VCs (1-1 with equal

VCs and lanes)

VC 0

VC 1

VC 2

VC 3

Page 14: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

Crossbar Gating

“Segment gating for static energy reduction in networks-on-chip”. NoCArc 2009

VC1

VC2

VC3

VC4

VC1

VC2

VC1 VC2 VC3 VC4VC1 VC4

Output VCs

Input VCs

Page 15: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

Activation Mechanism

Flits winning switch allocation (SA) activate in the next router:

Output channels and switch lanes (3 cycles)

Input buffers (1 cycle with drowsy SRAMs)

No false activations

With the below 4-stage router pipeline, no activation stalls

InBuf VA SA ST

InBuf VA SA STSTInBuf

Page 16: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

Router Impact

ABN switch allcocators: ( Inputs x VCs ) x ( Outputs x ChanLanes )

As long as ChanLanes no greater than VCs, switch allocator no more

complex than VC allocator

If VC and switch allocators in different pipeline stages, router cycle

time does not extend

VC allocators consume 2-10mW and occupy 5000um

Both very small percentages of the router

Therefore increase of switch allocator’s cost insigificant

“Allocator implementations for network- on-chip routers”. SC 2009

Page 17: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

Today’s Menu

Leakage power / motivation

Related work

Variable-width datapaths

Results

Conclusions

Page 18: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

Methodology

Booksim network simulator

8x8 Mesh. DOR

We compare:

Single-lane: Single-lane power-gated network

ABN: Flits choose a lane based on their output VC

Multinets: Multiple power-gated subnetworks

Router pipeline previously presented

2 VCs as baseline. Normalize for buffer size by adjusting VCs

Activation and deactivation delays (65nm at 1GHz):

Channel and crossbar activation delay: 3 cycles

Channel and crossbar activation wait: 15 cycles

Channel and crossbar deactivation wait: 6 cycles

Buffer (VC) deactivation wait: 3 cycles

Page 19: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

Two Subnetworks/Lanes. Static Power

0 20 40 600

1

2

3

4

5

Injection rate (request packets/cycle * 1000)

Sta

tic p

ow

er

(W)

8x8 mesh. DOR. UR traffic

Single lane

ABN

MultiNets

UR worst case for ABNs

ABNs better powers down resources at high loads

Page 20: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

Two Subnetworks/Lanes. Dynamic Power

0 10 20 30 40 500

2

4

6

8

Injection rate (request packets/cycle * 1000)

Dyn

am

ic p

ow

er

(W)

8x8 mesh. DOR. UR traffic

Single lane

ABN

MultiNets

UR worst case for ABNs

Multinets takes advantage of lower-radix switches

Page 21: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

Two Subnetworks. Latency

0 20 40 600

50

100

150

200

Injection rate (request packets/cycle * 1000)

Ave

rag

e la

ten

cy (

clo

ck c

ycle

s)

8x8 mesh. DOR. UR traffic

Single lane

ABN

MultiNets

Multinets cannot make perfect injection decisions or use resources in

another subnetwork after injection to combat transient imbalance

Page 22: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

PARSEC Results

0

5

10

15

20

25

30

35

40

Perc

en

tag

e t

ota

l p

ow

er

red

ucti

on

(%

)

Low Medium High

Two ABN lanes and two multinet subnetworks

Page 23: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

Scaling Up

0

5

10

15

20

25

30

Perc

en

tag

e t

ota

l p

ow

er

red

ucti

on

(%

)

Low Medium High

Four ABN lanes and four multinet subnetworks

Page 24: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

Scaling Up

0 10 20 30 40 500

50

100

150

200

250

Injection rate (request packets/cycle * 1000)

Ave

rag

e la

ten

cy (

clo

ck c

ycle

s)

8x8 mesh. DOR. UR traffic

ABN two lanes

ABN four lanes

Two multinets

Four multinets

Effects of transient imbalance in multinets are intensified

Page 25: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

Today’s Menu

Leakage power / motivation

Related work

Variable-width datapaths

Results

Conclusions

Page 26: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

Conclusions

Leakage power is a growing concern in future technologies

Dividing datapaths in lanes provides more flexibility than multi-

network approaches

But there are tradeoffs

Using drowsy SRAMs allows hiding the activation delay without

false activations

Can change with shallow router pipelines

We demonstrate an average of 33% total power reduction with

PARSEC benchmarks

Page 27: Variable-Width Datapath for On-Chip Network Static Power … · 2014-09-17 · Variable-Width Datapath for On-Chip Network Static Power Reduction George Michelogiannakis, John Shalf

Questions?

Acknowledgment: D.O.E. office of science