Déjà Vu Switching for Multiplane NoCs

NOCS’12

Ahmed Abousamra, Rami Melhem, Alex Jones

University of Pittsburgh


Page 1: Déjà Vu Switching for  Multiplane NoCs

Déjà Vu Switching for Multiplane NoCs

University of Pittsburgh

Ahmed Abousamra, Rami Melhem, Alex Jones

Page 2: Déjà Vu Switching for  Multiplane NoCs

Motivation

• Power efficiency has become a primary concern in the design of CMPs.
• The NoC of Intel’s TeraFLOPS processor consumes more than 28% of the chip’s power.
• Network messages can be classified into critical and non-critical.
• It may be possible to send non-critical messages on a slower plane without hurting performance.

Page 3: Déjà Vu Switching for  Multiplane NoCs

Setting

Baseline: Single Plane NoC

Control Plane

• Carries control and coherence messages: data requests, invalidates, acknowledgments, …
• Operates at baseline’s voltage & frequency

Data Plane

• Carries data messages
• Operates at a lower voltage & frequency to save power

Page 4: Déjà Vu Switching for  Multiplane NoCs

Outline

• Motivation & Related work
• Importance of data messages to performance
• The Optimization Problem
• Proposed Solution
    • Déjà Vu Switching
    • Analysis of the acceptable data plane speed
• Evaluation
• Summary

Page 5: Déjà Vu Switching for  Multiplane NoCs

Importance of data messages to performance

Cache line having 8 data words: 1 2 3 4 5 6 7 8

Data Message: Header, followed by the words in the order 5 1 2 3 4 6 7 8

• Critical Word: 5 (sent first)
• Non-Critical Words: 1 2 3 4 6 7 8

Subsequent miss for another word in the line: Delayed cache hit
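The ordering on this slide (word 5 of the 8-word line moved to the front of the data message) can be sketched in a few lines. This is only an illustration: the function name `critical_word_first` is hypothetical, and it assumes the non-critical words keep their original relative order, as they do in the slide's example.

```python
def critical_word_first(words, critical_index):
    """Return the words with the critical one first and the rest in original order."""
    critical = words[critical_index]
    rest = words[:critical_index] + words[critical_index + 1:]
    return [critical] + rest


# The slide's example: an 8-word cache line whose critical word is word 5.
line = [1, 2, 3, 4, 5, 6, 7, 8]
message = critical_word_first(line, line.index(5))
print(message)  # [5, 1, 2, 3, 4, 6, 7, 8]
```

A later miss to any of the words after the first one is a delayed cache hit, so how long the tail of the message takes to arrive matters for performance.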

Page 6: Déjà Vu Switching for  Multiplane NoCs

Importance of data messages to performance

If delayed cache hits are overly delayed, performance can suffer.

Percentage of L1 misses that are delayed cache hits

[Bar chart. X-axis: benchmarks (barnes, blackscholes, bodytrack, fluidanimate, lu contig, lu noncontig, ocean contig, radiosity, radix, raytrace, specjbb, water nsquared, water spatial, geometric mean). Y-axis: % of L1 misses that are delayed hits, 0% to 50%.]

Page 7: Déjà Vu Switching for  Multiplane NoCs

Outline

• Motivation & Related work
• Importance of data messages to performance
• The Optimization Problem
• Proposed Solution
    • Déjà Vu Switching
    • Analysis of the acceptable data plane speed
• Evaluation
• Summary

Page 8: Déjà Vu Switching for  Multiplane NoCs

The Optimization Problem

How slow can we operate the data plane to maximize energy savings while not impacting performance?

Page 9: Déjà Vu Switching for  Multiplane NoCs

Proposed Solution

• Split NoC physically into 2 planes: control + data
• On data plane:
    • Use circuit switching to speed up communication.
    • Reduce voltage & frequency.
• Send a control message to establish the circuit once a cache hit is detected.
• Do not block the circuit establishment message: Déjà Vu Switching
• Analyze the acceptable slowdown of the data plane to minimize energy while maintaining performance.

Page 10: Déjà Vu Switching for  Multiplane NoCs

Déjà Vu Switching – Circuit Switching

[Diagram: Request Packet, Reservation Packet, and Data Packet shown on the Control Plane and the Data Plane]

1) Upon a cache miss, a data request is sent

Page 11: Déjà Vu Switching for  Multiplane NoCs

Déjà Vu Switching – Circuit Switching

[Diagram: Request Packet, Reservation Packet, and Data Packet shown on the Control Plane and the Data Plane]

2) Upon a data hit in the next cache level, the circuit reservation packet is sent

Page 12: Déjà Vu Switching for  Multiplane NoCs

Déjà Vu Switching – Circuit Switching

[Diagram: Request Packet, Reservation Packet, and Data Packet shown on the Control Plane and the Data Plane]

3) Finally, the data packet is sent to the requester on the reserved circuit
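The three steps can be written down as a small message trace. This is a sketch of the sequence described on these slides, not the authors' implementation: the function name, the tuple layout, and the node names are hypothetical, and the plane assigned to each packet follows the slides (requests and reservations on the control plane, data on the data plane).

```python
def dejavu_transaction(requester, responder):
    """List the packets of one miss as (packet, plane, source, destination)."""
    return [
        # 1) Upon a cache miss, a data request is sent.
        ("request", "control", requester, responder),
        # 2) Upon a data hit in the next cache level, the reservation is sent.
        ("reservation", "control", responder, requester),
        # 3) The data packet follows on the reserved circuit.
        ("data", "data", responder, requester),
    ]


for packet, plane, src, dst in dejavu_transaction("core0", "l2_bank5"):
    print(f"{packet:12s} on {plane:7s} plane: {src} -> {dst}")
```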

Page 13: Déjà Vu Switching for  Multiplane NoCs

Déjà Vu Switching – Conflicting Reservations

[Diagram: one router shown on the Control Plane (Reservation Packets 1, 2, 3) and on the Data Plane (Data Packets 1, 2, 3). Reserved Circuits: East-North, West-North, West-East.]

Page 14: Déjà Vu Switching for  Multiplane NoCs

Déjà Vu Switching – Reservation Scheme

Reserving the West-East connection:

• Reservation Queue of West Input Port: E
• Reservation Queue of East Output Port: W

[Diagram label: Head of the Queues]

Port Names: North, West, South, East, Local
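A minimal sketch of the bookkeeping on this slide, assuming each port keeps a FIFO queue: reserving a connection appends the output port's name to the input port's queue and the input port's name to the output port's queue. The names `make_router` and `reserve` and the dictionary layout are hypothetical.

```python
from collections import deque

PORTS = ("N", "W", "S", "E", "L")  # North, West, South, East, Local


def make_router():
    """One reservation queue per input port and per output port."""
    return {
        "in": {port: deque() for port in PORTS},
        "out": {port: deque() for port in PORTS},
    }


def reserve(router, in_port, out_port):
    """Record a reservation at both ends of the connection."""
    router["in"][in_port].append(out_port)
    router["out"][out_port].append(in_port)


router = make_router()
reserve(router, "W", "E")          # reserve the West-East connection
print(list(router["in"]["W"]))     # ['E']
print(list(router["out"]["E"]))    # ['W']
```

Because a reservation is only appended to the queues, it never has to wait for the ports to become free.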

Page 15: Déjà Vu Switching for  Multiplane NoCs

Déjà Vu Switching – Realizing Connections

[Diagram: Reservation Queues of the Input Ports and Reservation Queues of the Output Ports (North, West, South, East, Local), with the Head of the Queues marked.]

Port Names: North, West, South, East, Local

Page 16: Déjà Vu Switching for  Multiplane NoCs

Déjà Vu Switching – Realizing Connections

[Diagram: Reservation Queues of the Input Ports and Reservation Queues of the Output Ports (North, West, South, East, Local), with the Head of the Queues marked.]

Port Names: North, West, South, East, Local

Page 17: Déjà Vu Switching for  Multiplane NoCs

Déjà Vu Switching – Realizing Connections

[Diagram: Reservation Queues of the Input Ports and Reservation Queues of the Output Ports (North, West, South, East, Local), with the Head of the Queues marked.]

Port Names: North, West, South, East, Local
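One plausible reading of these slides is that a connection is realized when an input port and an output port sit at the head of each other's reservation queue. The sketch below works under that assumption. The function names are hypothetical, the three reservations reuse the circuits named on the Conflicting Reservations slide in an assumed order (West-East, West-North, East-North), and releasing a connection once its data packet has passed is also an assumption.

```python
from collections import deque


def realizable(in_queues, out_queues):
    """Return (in_port, out_port) pairs at the head of each other's queue."""
    pairs = []
    for in_port, queue in in_queues.items():
        if not queue:
            continue
        out_port = queue[0]
        out_queue = out_queues.get(out_port)
        if out_queue and out_queue[0] == in_port:
            pairs.append((in_port, out_port))
    return pairs


def release(in_queues, out_queues, in_port, out_port):
    """Drop a finished connection from the head of both queues."""
    in_queues[in_port].popleft()
    out_queues[out_port].popleft()


# Reservations made in the order W->E, W->N, E->N.
in_queues = {"W": deque(["E", "N"]), "E": deque(["N"])}
out_queues = {"E": deque(["W"]), "N": deque(["W", "E"])}

order = []
while True:
    ready = realizable(in_queues, out_queues)
    if not ready:
        break
    for in_port, out_port in ready:
        release(in_queues, out_queues, in_port, out_port)
        order.append((in_port, out_port))
print(order)  # [('W', 'E'), ('W', 'N'), ('E', 'N')]
```

In this example the East-North circuit waits behind West-North at the North output port, so the connections are realized in the order they were reserved.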

Page 18: Déjà Vu Switching for  Multiplane NoCs

Déjà Vu Switching – Ensuring Correct Routing

• Each input and output port must independently track the reserved circuits it is part of.
• Any two reservation packets that share part of their paths must traverse all the shared links in the same order.
• Data packets must be injected onto the data plane in the same order their reservation packets are injected onto the control plane.
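The second condition can be checked mechanically on a trace. The sketch below is a hypothetical checker, not part of the presented design: it assumes a log per link listing reservation ids in the order they crossed that link (each id at most once per link), and reports whether every pair of reservations keeps the same relative order on all links they share.

```python
from itertools import combinations


def consistent_order(link_logs):
    """True if every pair of reservations crosses all shared links in the same order."""
    first_seen = {}  # (a, b) with a < b -> True if a crossed before b
    for log in link_logs.values():
        position = {rid: i for i, rid in enumerate(log)}
        for a, b in combinations(sorted(position), 2):
            a_first = position[a] < position[b]
            if first_seen.setdefault((a, b), a_first) != a_first:
                return False
    return True


# Reservations 1 and 2 share two links and keep their order on both.
good = {("r0", "r1"): [1, 2], ("r1", "r2"): [1, 2]}
# Here reservation 2 overtakes reservation 1 on the second shared link.
bad = {("r0", "r1"): [1, 2], ("r1", "r2"): [2, 1]}
print(consistent_order(good), consistent_order(bad))  # True False
```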

Page 19: Déjà Vu Switching for  Multiplane NoCs

Outline

• Motivation & Related work
• Importance of data messages to performance
• The Optimization Problem
• Proposed Solution
    • Déjà Vu Switching
    • Analysis of the acceptable data plane speed
• Evaluation
• Summary

Page 20: Déjà Vu Switching for  Multiplane NoCs

Reservations should always be ahead of data packets

[Timeline: cycles 1 to 13]

• Reservation Packet injected early at cycle 1
• Data Packet injected at cycle 3

Page 21: Déjà Vu Switching for  Multiplane NoCs

Reservations should always be ahead of data packets

[Timeline: cycles 1 to 13]

• Reservation Packet injected early at cycle 1
• Data Packet injected at cycle 3
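A toy model of this requirement, under stated assumptions rather than the analysis from the talk: the reservation packet is assumed to take a fixed number of control-plane cycles per hop (3 here, borrowed from the 3-cycle router pipeline quoted for the baseline), the data packet one data-plane cycle per hop, and the reservation counts as "ahead" if it reaches each router no later than the data packet. The injection cycles 1 and 3 are the ones on the slide. The function name and parameters are hypothetical.

```python
from fractions import Fraction


def reservation_stays_ahead(hops, res_inject, data_inject,
                            res_cycles_per_hop, control_ghz, data_ghz,
                            data_cycles_per_hop=1):
    """Check every hop; all times are measured in control-plane cycles."""
    # One data-plane cycle expressed in control-plane cycles.
    data_hop = Fraction(control_ghz) / Fraction(data_ghz) * data_cycles_per_hop
    for hop in range(1, hops + 1):
        res_arrival = res_inject + hop * res_cycles_per_hop
        data_arrival = data_inject + hop * data_hop
        if res_arrival > data_arrival:
            return False
    return True


# Control plane at 4 GHz, data plane at 2 GHz (the 4/2 GHz configuration).
print(reservation_stays_ahead(2, 1, 3, 3, 4, 2))  # True
print(reservation_stays_ahead(3, 1, 3, 3, 4, 2))  # False
```

Under these assumed latencies the two-cycle head start is used up after two hops, which illustrates why the reservation's head start, the path length, and the data plane speed have to be considered together.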

Page 22: Déjà Vu Switching for  Multiplane NoCs

Acceptable data plane slowdown - Message Latency

Page 23: Déjà Vu Switching for  Multiplane NoCs

Outline

• Motivation & Related work
• Importance of data messages to performance
• The Optimization Problem
• Proposed Solution
    • Déjà Vu Switching
    • Analysis of the acceptable data plane speed
• Evaluation
• Summary

Page 24: Déjà Vu Switching for  Multiplane NoCs

Evaluation Environment

• We use the functional simulator, Simics, to simulate cache coherent CMPs of 16 and 64 cores.
• We use Orion2 to get power numbers for the interconnect routers.
• We evaluate with:
    • Synthetic traces: allow varying the network load
    • Execution-driven simulation of parallel benchmarks from the SPLASH-2 and PARSEC suites
• Interconnect Topology: Mesh

Page 25: Déjà Vu Switching for  Multiplane NoCs

Evaluation Environment

• Baseline NoC: single plane, 16-byte links, packet switched with a 3-cycle router pipeline, clocked at 4 GHz
• Evaluated NoC:
    • Control plane: 6-byte links, packet switched, 4 GHz
    • Data plane: 10-byte links, circuit switched
• Control and Coherence Packets: 1 flit
• Data Packets:
    • Baseline NoC: 5 flits
    • Data Plane: 7 flits
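The flit counts are consistent with a simple calculation, shown here as a sketch. The 64-byte cache line, the single header flit on the packet-switched baseline, and the absence of a header flit on the circuit-switched data plane are assumptions chosen because they reproduce the slide's numbers; the slide itself does not state them.

```python
import math

CACHE_LINE_BYTES = 64  # assumption: not stated on the slide


def flits(payload_bytes, link_bytes, header_flits):
    """Flits needed for the payload plus any header flits."""
    return header_flits + math.ceil(payload_bytes / link_bytes)


baseline_flits = flits(CACHE_LINE_BYTES, 16, header_flits=1)    # 4 payload + 1 header
data_plane_flits = flits(CACHE_LINE_BYTES, 10, header_flits=0)  # ceil(6.4) payload
print(baseline_flits, data_plane_flits)  # 5 7

# The two planes together are as wide as the baseline link: 6 + 10 = 16 bytes.
total_link_bytes = 6 + 10
```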

Page 26: Déjà Vu Switching for  Multiplane NoCs

Evaluation – Synthetic Traces

Normalized NoC energy and completion time of synthetic traces on a 64-core CMP

[Two bar charts. X-axis: Traffic Injection Rate (request / cycle / node): 0.01, 0.03, 0.05. Y-axes: Normalized Energy Consumption and Normalized Completion Time, 0% to 120%. Series: 4 GHz, 4/4 GHz, 4/3 GHz, 4/2.66 GHz, 4/2 GHz.]

*Each request receives a reply data packet

Page 27: Déjà Vu Switching for  Multiplane NoCs

Evaluation – Execution Driven Simulation

[Bar chart. X-axis: benchmarks (barnes, blackscholes, bodytrack, fluidanimate, lu contig, lu noncontig, ocean contig, radiosity, radix, raytrace, specjbb, water nsquared, water spatial, geometric mean). Y-axis: Normalized Execution Time, 0% to 120%. Series: 4 GHz, 4/4 GHz, 4/3 GHz, 4/2.66 GHz, 4/2 GHz.]

Normalized execution time on a 16-core CMP

Page 28: Déjà Vu Switching for  Multiplane NoCs

Evaluation – Execution Driven Simulation

[Bar chart. X-axis: benchmarks (barnes, blackscholes, bodytrack, fluidanimate, lu contig, lu noncontig, ocean contig, radiosity, radix, raytrace, specjbb, water nsquared, water spatial, geometric mean). Y-axis: Normalized Energy Consumption, 0% to 100%. Series: 4 GHz, 4/4 GHz, 4/3 GHz, 4/2.66 GHz, 4/2 GHz.]

Normalized NoC energy on a 16-core CMP

Page 29: Déjà Vu Switching for  Multiplane NoCs

Evaluation – Execution Driven Simulation

Normalized NoC energy and execution time on a 64-core CMP

[Two bar charts. X-axis: benchmarks (barnes, blackscholes, bodytrack, fluidanimate, lu contig, lu noncontig, ocean contig, radiosity, radix, specjbb, water nsquared, water spatial, geometric mean). Y-axes: Normalized Execution Time, 0% to 140%, and Normalized Energy Consumption, 0% to 100%. Series: 4 GHz, 4/4 GHz, 4/3 GHz, 4/2.66 GHz, 4/2 GHz.]

Page 30: Déjà Vu Switching for  Multiplane NoCs

Evaluation – Isolating Effect of Déjà Vu Switching

Performance relative to a CMP with a single-plane NoC, on a 16-core CMP

[Bar chart. X-axis: benchmarks (barnes, blackscholes, bodytrack, fluidanimate, lu contig, lu noncontig, ocean contig, radiosity, radix, raytrace, specjbb, water nsquared, water spatial, geometric mean). Y-axis: Performance Relative to Single Plane Baseline, 90% to 110%. Series: PS 4/4, DV 4/4, PS 4/2.66, PS+CW 4/2.66, DV 4/2.66.]

Page 31: Déjà Vu Switching for  Multiplane NoCs

Related Work

• L. Cheng et al. (ISCA'06) and A. Flores et al. (IEEE Trans. Computers'10): Heterogeneous NoC using wires of different latency and power characteristics to improve performance and reduce NoC energy. The proposal requires wide links (75 bytes); performance degrades with narrow links.
• Our work differs in:
    • Tying latency of messages to performance
    • Using Déjà Vu Switching to compensate for the slower data plane

Page 32: Déjà Vu Switching for  Multiplane NoCs

Summary

• Problem: Saving power in the NoC by reducing the data plane’s power consumption without impacting performance.
• Delayed cache hits are important to performance.
• Operating the data plane in circuit-switched mode allows it to operate at reduced frequency.
• Déjà Vu Switching allows reservation to proceed when resources are not currently available.
• The constraints governing the speed of the data plane can be estimated analytically.

Page 33: Déjà Vu Switching for  Multiplane NoCs

Thank you!