George Michelogiannakis William J. Dally Stanford University Router Designs for Elastic-Buffer On-Chip Networks

George Michelogiannakis, William J. Dally

Stanford University

Router Designs for Elastic-Buffer On-Chip Networks

Introduction

Elastic-buffer (EB) flow control was recently proposed.
• Uses the channels as distributed FIFOs.

EB routers are bufferless packet-switched routers.
• They have the benefits of circuit-switched routers, without the overhead of setting up and tearing down circuits.

This work explores the EB router design space.
• By evaluating three representative designs.

SC09: Routers for EB NoCs

The EB Flow-control Idea

[Figure panels: master-slave FF; elastic buffer; pipelined channel; channel as FIFO]

How Elastic Buffer Channels Work

Ready/valid handshake between elastic buffers:
• Ready: At least one free storage slot
• Valid: Non-empty (driving valid data)

[Figure: EB channel operation, cycles 1 to 6]
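The ready/valid rules above can be sketched as a small cycle-level model. This is an illustrative sketch rather than the authors' implementation: the names `ElasticBuffer` and `step` are hypothetical, two slots per buffer is an assumption, and all handshakes are evaluated on the state at the start of each cycle.

```python
from collections import deque


class ElasticBuffer:
    """FIFO storage with ready/valid flags (two slots is an assumption)."""

    def __init__(self, slots=2):
        self.slots = slots
        self.data = deque()

    @property
    def ready(self):
        # Ready: at least one free storage slot.
        return len(self.data) < self.slots

    @property
    def valid(self):
        # Valid: non-empty, i.e. driving valid data.
        return len(self.data) > 0


def step(channel, source, sink_ready):
    """Advance a chain of elastic buffers by one cycle.

    Returns the flit delivered to the sink this cycle, or None.
    """
    n = len(channel)
    # Decide every transfer from the start-of-cycle state, like clocked logic.
    moves = [
        channel[i].valid and (channel[i + 1].ready if i + 1 < n else sink_ready)
        for i in range(n)
    ]
    inject = bool(source) and channel[0].ready

    delivered = None
    for i in reversed(range(n)):
        if moves[i]:
            flit = channel[i].data.popleft()
            if i + 1 < n:
                channel[i + 1].data.append(flit)
            else:
                delivered = flit
    if inject:
        channel[0].data.append(source.popleft())
    return delivered
```

With the sink always ready, flits leave in the order they entered. With the sink stalled, a chain of three such buffers absorbs six flits and then back-pressures the source, which is the channel-as-FIFO behavior the slide describes.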

Use EB Flow-Control Through the Router

[Figure: VC input-buffered router compared with the EB router]

• Input buffer replaced by input EB.
• VC & SW allocators removed. Per-output arbiters instead.
• Three-slot output EB to cover for arbitration done one cycle in advance.
• Look-ahead (LA) routing also applicable to EB networks.
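A per-output arbiter of the kind mentioned above could look roughly like the following sketch. The slides do not state the arbitration policy, so round-robin priority is an assumption, as is holding the grant until the packet's tail flit leaves (on the reasoning that without virtual channels, flits of different packets cannot interleave on one output). The class and method names are hypothetical.

```python
class OutputArbiter:
    """Arbiter for a single output port.

    Assumptions (not from the slides): round-robin priority, and the grant
    is held by one input until release() is called for its tail flit.
    """

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.owner = None            # input currently holding this output
        self.last = num_inputs - 1   # most recently granted input

    def arbitrate(self, requests):
        """requests[i] is True if input i has a flit for this output.

        Returns the granted input index, or None if nothing is granted.
        """
        if self.owner is None:
            # Search inputs starting just after the last winner.
            for k in range(1, self.num_inputs + 1):
                i = (self.last + k) % self.num_inputs
                if requests[i]:
                    self.owner = self.last = i
                    break
        return self.owner

    def release(self):
        """Call when the owner's tail flit has traversed the switch."""
        self.owner = None
```

One such arbiter per output replaces the VC and switch allocators; each arbiter only ever looks at the requests addressed to its own output.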

Baseline Router - Issues

Issues constraining the clock cycle time:
• Three-slot EB FSM too complicated: output EB implemented as FIFO.
• Routing is performed serially with switch arbitration.

[Figure callouts: FIFO; Serially]

Enhanced Two-Stage Router

Look-ahead routing to shorten the critical path.

Use two-slot EBs at output and for pipelining.
• Flits are stored in the intermediate EB and wait for a grant.
• Decision to traverse switch made in the same cycle.
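To illustrate look-ahead routing: each router already knows which output the packet takes locally, and computes the output for the next router in parallel with arbitration, which removes routing from the critical path. The sketch below assumes XY dimension-order routing on a 2D mesh; the slides do not name the routing function, and the port labels and function names are hypothetical.

```python
# Coordinate change for each mesh output port (labels are hypothetical).
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def dor_output(cur, dest):
    """XY dimension-order routing: the output port taken at node `cur`."""
    (cx, cy), (dx, dy) = cur, dest
    if dx > cx:
        return "E"
    if dx < cx:
        return "W"
    if dy > cy:
        return "N"
    if dy < cy:
        return "S"
    return "EJECT"


def lookahead(cur, port_here, dest):
    """Given the port already chosen for this router, return the port
    the packet will request at the next router (None if ejecting here)."""
    if port_here == "EJECT":
        return None
    sx, sy = STEP[port_here]
    nxt = (cur[0] + sx, cur[1] + sy)
    return dor_output(nxt, dest)
```

For a packet at (0, 0) heading east toward (2, 1), `lookahead((0, 0), "E", (2, 1))` yields the port to use at (1, 0), so that router can start arbitration as soon as the head flit arrives.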

Enhanced Two-Stage Router – Sync Module

Synchronization module maintains alignment between flits and grants.

Contains an output port EB.
• Stores the chosen output port of the current packet and of any other packets in router stage 1 and the intermediate EB.

Enhanced Two-Stage Router – Sync Module

When the current packet’s tail flit is departing:
• Sync. module propagates the next output to the arbiters.
• From the appropriate location.

Sync. module propagates an update to all outputs.
• An output receiving an update from the input it is granting clocks the arbiter output regs at the next edge.

Single-Stage Router

Merges the two router stages to:
• Reduce router latency.
• Avoid pipelining overhead.

Evaluation Methodology

45nm worst-case low-power commercial library.

Synopsys DC and Cadence Encounter.
• 64-bit router datapath. 70% initial area utilization ratio.

Used a cycle-accurate network simulator.

Routers are assumed to run either each at its own maximum post-P&R frequency, or all at the same frequency.

8x8 2D mesh. 2mm-long wires. 1 cycle latency.
• Constant packet size of 512 bits.

Averaged over a set of six traffic patterns.

Swept datapath width from 28 to 171 bits.
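As a rough worked example of what the datapath sweep means for a constant 512-bit packet, the snippet below computes the number of flits per packet. It assumes one flit is exactly one datapath width and ignores any header overhead; neither detail is stated on the slide, and `flits_per_packet` is a hypothetical helper.

```python
import math

PACKET_BITS = 512  # constant packet size from the methodology slide


def flits_per_packet(width_bits):
    """Flits needed to carry one packet, assuming flit width == datapath width."""
    return math.ceil(PACKET_BITS / width_bits)


for width in (28, 64, 171):
    print(width, flits_per_packet(width))
# 28 -> 19 flits, 64 -> 8 flits, 171 -> 3 flits
```

Under these assumptions, the sweep covers packets ranging from 19 flits at the narrowest datapath down to 3 flits at the widest.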

Placement and Routing Cycle Time

The enhanced two-stage router's cycle time is 26% shorter than the single-stage router's and 42% shorter than the baseline two-stage router's.
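The cycle-time reductions can be converted into maximum-frequency ratios. The sketch below reads "X% reduced cycle time compared to Y" as T_enhanced = (1 - X) * T_Y, which is an interpretation of the slide's wording; `frequency_gain` is a hypothetical helper.

```python
def frequency_gain(cycle_time_reduction):
    """Frequency ratio implied by a fractional cycle-time reduction."""
    return 1.0 / (1.0 - cycle_time_reduction)


print(round(frequency_gain(0.26), 2))  # vs. single-stage: 1.35
print(round(frequency_gain(0.42), 2))  # vs. baseline two-stage: 1.72
```

Under that reading, the enhanced two-stage router could be clocked about 1.35 times faster than the single-stage router and about 1.72 times faster than the baseline.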

Placement and Routing Energy per Bit

The baseline two-stage router requires 9% less energy per bit than the single-stage router and 35% less than the enhanced two-stage router.

Placement and Routing Area

Single-stage occupies 30% less area than the enhanced two-stage and 44% less than the baseline two-stage.

Latency-Throughput, Max Frequencies.

Latency increase:
• Enhanced: +1%
• Baseline: +46%

Latency-Throughput, Equal Frequencies.

Latency increase:
• Enhanced: +34%
• Baseline: +32%

Which Router is the Optimal Choice?

Priority   Router Choice

Operate at maximum frequencies:
Area       Enhanced two-stage
Energy     Baseline two-stage (closely followed by single-stage)
Latency    Single-stage (depends on effect on channels)

Operate at the same frequency:
Area       Single-stage
Energy     Baseline two-stage (closely followed by single-stage)
Latency    Single-stage

Conclusion

Improved EB router designs can widen the gap compared to VC networks.
• Makes EB look even more attractive.

EB routers are simple designs. Simple designs have numerous advantages.
• A lot of the complexity of VC networks is ignored by some area and power models.

Overall, compared to VC: a 43% reduction in power per unit throughput, a 67% reduction in cycle time, and 22% higher throughput per unit area.

Questions?
