Page 1:

The Power of Communication: Energy-Efficient NoCs for FPGAs

Mohamed ABDELFATTAH, Vaughn BETZ

Page 2:

Outline

1. Why NoCs on FPGAs?

2. Embedded NoCs

3. Power Analysis

Page 3:

1. Why NoCs on FPGAs?

Motivation

[Figure: FPGA fabric. Labels: Interconnect, Logic Blocks, Switch Blocks, Wires]

Page 4:

1. Why NoCs on FPGAs?

Motivation

[Figure: FPGA fabric. Labels: Logic Blocks, Switch Blocks, Wires]

Hard Blocks:
• Memory
• Multiplier
• Processor

Page 5:

1. Why NoCs on FPGAs?

Motivation

[Figure: FPGA fabric. Labels: Logic Blocks, Switch Blocks, Wires; clock labels 1600 MHz, 200 MHz, 800 MHz]

Hard Blocks:
• Memory
• Multiplier
• Processor

Hard Interfaces: DDR/PCIe ..

Interconnect still the same

Page 6:

1. Why NoCs on FPGAs?

Motivation

[Figure: FPGA with hard interfaces. Labels: DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet; clock labels 1600 MHz, 200 MHz, 800 MHz]

Problems:
1. Bandwidth requirements for hard logic/interfaces
2. Timing closure

Page 7:

1. Why NoCs on FPGAs?

Motivation

[Figure: FPGA with hard interfaces. Labels: DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet]

Problems:
1. Bandwidth requirements for hard logic/interfaces
2. Timing closure
3. High interconnect utilization:
   – Huge CAD problem
   – Slow compilation
   – Power/area utilization
4. Wire speed not scaling:
   – Delay is interconnect-dominated

Page 8:

[Figure: aerial views of Barcelona and Los Angeles, annotated with Hard Blocks and Logic Cluster. Source: Google Earth]

Keep the “roads”, but add “freeways”.

Page 9:

1. Why NoCs on FPGAs?

FPGA with NoC

[Figure: FPGA with an embedded NoC. Labels: DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet, NoC, Routers, Links]

Problems:
1. Bandwidth requirements for hard logic/interfaces
2. Timing closure
3. High interconnect utilization:
   – Huge CAD problem
   – Slow compilation
   – Power/area utilization
4. Wire speed not scaling:
   – Delay is interconnect-dominated

Router forwards data packet

Router moves data to local interconnect
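The two router actions named on this slide (forward the packet to another router, or move the data to the local interconnect) can be illustrated with a short sketch. It assumes a 2D mesh and dimension-ordered (XY) routing; the slides do not name a routing algorithm, so `next_hop`, `route`, and the coordinate scheme are hypothetical and only meant to show the idea.

```python
def next_hop(cur, dst):
    """Return the action taken by the router at `cur` for a packet going to `dst`.

    Assumption (not from the slides): 2D mesh with XY routing, (x, y) coordinates.
    """
    cx, cy = cur
    dx, dy = dst
    if cx != dx:                       # travel along X first
        return "EAST" if dx > cx else "WEST"
    if cy != dy:                       # then along Y
        return "NORTH" if dy > cy else "SOUTH"
    return "LOCAL"                     # arrived: move data to local interconnect


def route(src, dst):
    """List the per-router actions a packet sees on its way from src to dst."""
    step = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}
    cur, actions = src, []
    while True:
        action = next_hop(cur, dst)
        actions.append(action)
        if action == "LOCAL":
            return actions
        cur = (cur[0] + step[action][0], cur[1] + step[action][1])
```

Every action except the last is a forward between routers; the final `LOCAL` action corresponds to handing the data to the fabric at the destination.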

Page 10:

1. Why NoCs on FPGAs?

FPGA with NoC

[Figure: FPGA with an embedded NoC. Labels: DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet]

Problems:
1. Bandwidth requirements for hard logic/interfaces
2. Timing closure
3. High interconnect utilization:
   – Huge CAD problem
   – Slow compilation
   – Power/area utilization
4. Wire speed not scaling:
   – Delay is interconnect-dominated
5. Abstraction favors modularity:
   – Parallel compilation
   – Partial reconfiguration
   – Multi-chip interconnect

With a NoC:
• High bandwidth endpoints known
• Pre-design NoC to requirements
• NoC links are “re-usable”
• NoC is heavily “pipelined”
• NoC abstraction favors modularity

Page 11:

1. Why NoCs on FPGAs?

FPGA with NoC

[Figure: FPGA with an embedded NoC. Labels: DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet]

Problems:
1. Bandwidth requirements for hard logic/interfaces
2. Timing closure
3. High interconnect utilization:
   – Huge CAD problem
   – Slow compilation
   – Power/area utilization
4. Wire speed not scaling:
   – Delay is interconnect-dominated
5. Abstraction favors modularity:
   – Parallel compilation
   – Partial reconfiguration
   – Multi-chip interconnect

With a NoC:
• Latency-tolerant communication
• NoC abstraction favors modularity

NoCs can simplify FPGA design

Previous work: Compelling area efficiency and performance

Does the NoC abstraction come at a high power cost?

Page 12:

Outline

1. Why NoCs on FPGAs?

2. Embedded NoCs
   – Mixed NoCs
   – Hard NoCs

3. Power Analysis

Page 13:

2. Embedded NoCs

Embedded NoCs

[Figure: FPGA with an embedded NoC. Labels: DDRx Interface, PCIe Interface, Router (Hard or Soft), Links (Hard or Soft), Fabric Port, Compute Module]

“Soft” NoC = Soft Links + Soft Routers

“Mixed” NoC = Soft Links + Hard Routers

“Hard” NoC = Hard Links + Hard Routers

Page 14:

Methodology

[Figure: evaluation flow. Labels: Soft (FPGA CAD Tools), Hard (ASIC CAD Tools, Design Compiler), Mixed, HSPICE; metrics Area, Speed, Power; Toggle rates; Gate-level simulation (shown twice)]

Page 15:

2. Embedded NoCs

Mixed NoCs

“Mixed” NoC = Soft Links + Hard Routers

[Figure: hard router embedded in the FPGA. Labels: FPGA, Router, Router Logic, Logic blocks, Programmable “soft” interconnect]

Baseline Router:

Width   VCs   Ports   Buffer
32      2     5       10/VC
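As a rough illustration of what the baseline router parameters imply, the sketch below computes total buffer storage and raw per-link bandwidth. It assumes that "Buffer 10/VC" means 10 flit-wide entries per virtual channel, that every port carries one buffer per VC, and that a link moves one flit per clock cycle; none of these assumptions are stated on the slide, and `RouterConfig` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouterConfig:
    width_bits: int = 32     # link/flit width from the slide
    num_vcs: int = 2         # virtual channels
    num_ports: int = 5       # router ports
    buffer_depth: int = 10   # buffer entries per VC

    def buffer_bits(self) -> int:
        # Assumption: one buffer per VC at every port, each entry one flit wide.
        return self.width_bits * self.num_vcs * self.num_ports * self.buffer_depth

    def link_bandwidth_gb_per_s(self, freq_mhz: float) -> float:
        # Assumption: one flit crosses a link every clock cycle.
        bytes_per_cycle = self.width_bits / 8
        return bytes_per_cycle * freq_mhz * 1e6 / 1e9


baseline = RouterConfig()
print(baseline.buffer_bits(), "bits of buffering per router")
print(baseline.link_bandwidth_gb_per_s(940), "GB/s per link at 940 MHz")
```

Under these assumptions, a wider link raises per-link bandwidth and buffer storage in the same proportion.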

Page 16:

2. Embedded NoCs

Mixed NoCs

“Mixed” NoC = Soft Links + Hard Routers

[Figure: hard routers connected through the FPGA fabric. Labels: FPGA, Router, Router Logic, Programmable Interconnect]

Page 17:

2. Embedded NoCs

Mixed NoCs

[Figure: hard routers connected through the FPGA fabric. Labels: FPGA, Router, Router Logic, Programmable Interconnect]

Special Feature: Configurable topology
• Assumed a mesh
• Can form any topology

Page 18:

2. Embedded NoCs

Hard NoCs

“Hard” NoC = Hard Links + Hard Routers

[Figure: hard router with dedicated links in the FPGA. Labels: FPGA, Router, Router Logic, Logic blocks, Dedicated “hard” interconnect, Programmable “soft” interconnect]

Page 19:

2. Embedded NoCs

Hard NoCs

“Hard” NoC = Hard Links + Hard Routers

[Figure: hard routers connected by dedicated links. Labels: FPGA, Router, Router Logic, Dedicated Interconnect]

Page 20:

2. Embedded NoCs

Hard NoCs

“Hard” NoC = Hard Links + Hard Routers

[Figure: hard routers connected by dedicated links. Labels: FPGA, Router, Router Logic, Dedicated Interconnect]

Special Feature: Low-V mode
• Supply voltage lowered from 1.1 V to 0.9 V
• Save 33% dynamic power
• ~15% slower
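The 33% figure is consistent with the standard first-order model in which dynamic power scales with the square of the supply voltage. The sketch below assumes that model (P_dyn proportional to C * V^2 * f) with capacitance and clock frequency held constant; the function name is hypothetical, and the slides do not say how the number was obtained.

```python
def dynamic_power_ratio(v_new, v_old, freq_ratio=1.0):
    """Ratio of new to old dynamic power under P_dyn ~ C * V^2 * f.

    Assumption: switched capacitance is unchanged; freq_ratio = f_new / f_old.
    """
    return (v_new / v_old) ** 2 * freq_ratio


# Voltage scaling alone, 1.1 V -> 0.9 V, same clock frequency.
saving = 1.0 - dynamic_power_ratio(0.9, 1.1)
print(f"Dynamic power saving: {saving:.0%}")
```

With the frequency ratio left at 1.0, the voltage term alone accounts for the quoted saving; the ~15% speed loss is a separate effect of running the circuit at lower voltage.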

Page 21:

Outline

1. Why NoCs on FPGAs?

2. Embedded NoCs

3. Power Analysis
   – Components Analysis
   – System Analysis

Page 22:

Soft, Mixed and Hard

Average (64-NoC):

               Soft           Mixed           Hard (Low-V)
Area Gap       33% of FPGA    ~1.5% of FPGA (20X-23X smaller)
Speed Gap      166 MHz        730-940 MHz (5X-6X faster)
Bisection BW   ~10 GB/s       ~50 GB/s
Power Gap      1X             9X              11X (15X)

Investigate BW and power together:
1. Power-aware design
2. NoC power budget
3. Comparison

Page 23:

3. Power Analysis

Power-Aware NoC Design

Total BW = 250 GB/s. Most efficient NoC?

[Figure: NoC power plot. Labels: Links Power, Routers Power]

Wider Links, Fewer Routers

Page 24:

3. Power Analysis

Power-Aware NoC Design

Total BW = 250 GB/s. Most efficient NoC?

Page 25:

3. Power Analysis

Power-Aware NoC Design

Total BW = 250 GB/s. Most efficient NoC?

Page 26:

3. Power Analysis

NoC Power Budget

Typical FPGA Dynamic Power: 17.4 W

How much is used for system-level communication?

NoC power at 250 GB/s total bandwidth, as a share of typical FPGA dynamic power:
• Soft NoC: 123%

Page 27:

3. Power Analysis

NoC Power Budget

Typical FPGA Dynamic Power: 17.4 W

NoC power at 250 GB/s total bandwidth, as a share of typical FPGA dynamic power:
• Soft NoC: 123%
• Mixed NoC: 15%

Page 28:

3. Power Analysis

NoC Power Budget

Typical FPGA Dynamic Power: 17.4 W

NoC power at 250 GB/s total bandwidth, as a share of typical FPGA dynamic power:
• Soft NoC: 123%
• Mixed NoC: 15%
• Hard NoC: 11%

Page 29:

3. Power Analysis

NoC Power Budget

Typical FPGA Dynamic Power: 17.4 W

NoC power at 250 GB/s total bandwidth, as a share of typical FPGA dynamic power:
• Soft NoC: 123%
• Mixed NoC: 15%
• Hard NoC: 11%
• Hard NoC (Low-V): 7%
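To put the percentages in absolute terms, the sketch below converts each budget share into watts, using the slide's 17.4 W typical FPGA dynamic power as the reference. The dictionary and function names are illustrative only.

```python
FPGA_DYNAMIC_POWER_W = 17.4  # "Typical FPGA Dynamic Power" from the slide

# NoC power at 250 GB/s total bandwidth, as a fraction of the reference power.
NOC_BUDGET_FRACTION = {
    "Soft NoC": 1.23,
    "Mixed NoC": 0.15,
    "Hard NoC": 0.11,
    "Hard NoC (Low-V)": 0.07,
}


def noc_power_watts(fraction, total_w=FPGA_DYNAMIC_POWER_W):
    """Convert a budget fraction into absolute power in watts."""
    return fraction * total_w


noc_power = {name: noc_power_watts(f) for name, f in NOC_BUDGET_FRACTION.items()}
for name, watts in noc_power.items():
    print(f"{name}: {watts:.2f} W")
```

A share above 100%, as for the soft NoC, means the NoC alone would draw more than the whole typical dynamic power figure.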

Page 30:

3. Power Analysis

Bandwidth in Perspective

[Figure: data streams crossing the FPGA. Labels: DDR3, PCIe, Module 1, Module 2; four streams labeled 14.6 GB/s and four labeled 17 GB/s]

Full theoretical BW

Cross whole chip!

Aggregate Bandwidth: 126 GB/s

NoC Power Budget: 3.5%
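The aggregate figure appears to be the sum of the eight labeled streams; the sketch below checks that reading and converts the 3.5% budget into watts. It assumes the budget is relative to the 17.4 W typical FPGA dynamic power quoted on the NoC Power Budget slides, which this slide does not restate.

```python
# Stream bandwidths as labeled in the figure (GB/s).
stream_bandwidths_gb_s = [14.6] * 4 + [17.0] * 4

aggregate_gb_s = sum(stream_bandwidths_gb_s)

# Assumption: the 3.5% budget is a share of 17.4 W typical FPGA dynamic power.
noc_budget_w = 0.035 * 17.4

print(f"Aggregate bandwidth: {aggregate_gb_s:.1f} GB/s")
print(f"NoC power budget: {noc_budget_w:.3f} W")
```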

Page 31:

3. Power Analysis

FPGA Interconnect

• Point-to-point Links: 1 → 1
• Broadcast: 1 → 1..n
• Multiple Masters: 1..n → Mux + Arbiter → 1
• Multiple Masters, Multiple Slaves: 1..n → Mux + Arbiter → 1..n

Interconnect = Just wires

Interconnect = Wires + Logic

Interconnect = NoC

Compare “wires” interconnect to NoCs
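The "Mux + Arbiter" case can be sketched as follows: several masters request a shared slave, an arbiter picks one, and a multiplexer passes that master's data through. The slides do not say which arbitration policy is used; round-robin is assumed here purely for illustration, and the class and function names are hypothetical.

```python
class RoundRobinArbiter:
    """Grant one of n requesters per cycle, rotating priority after each grant."""

    def __init__(self, n):
        self.n = n
        self.last = n - 1  # so that master 0 has highest priority initially

    def grant(self, requests):
        """requests: list of n booleans. Returns the granted index, or None."""
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx
                return idx
        return None


def mux(inputs, select):
    """Forward the selected master's data to the slave (None if no grant)."""
    return None if select is None else inputs[select]


arbiter = RoundRobinArbiter(3)
data = ["from master 0", "from master 1", "from master 2"]
winner = arbiter.grant([True, True, False])
print(mux(data, winner))
```

This is the "wires + logic" style of interconnect: the shared wires need extra selection logic, whereas point-to-point and broadcast links need none.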

Page 32:

3. Power Analysis

NoC Power vs. FPGA Interconnect

[Figure: power comparison plot. Labels: Length of 1 NoC Link, 200 MHz, High Performance / Packet Switched]

Hard and Mixed NoCs very compelling:
• 1% area overhead on Stratix 5
• Runs at 730-943 MHz
• Power on par with simplest FPGA interconnect

Page 33:

1. Why NoCs on FPGAs?
• Big city needs freeways to handle traffic

2. Embedded NoCs: Mixed & Hard
• Area: 20-23X
• Speed: 5-6X
• Power: 9-15X

3. Power Analysis
• Power-aware design of embedded NoCs
• Power Budget for 100 GB/s: 3-7%
• Point-to-point soft Links: 4.7 mJ/GB
• Embedded NoCs: 4.5-10.4 mJ/GB
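Energy per gigabyte converts directly to power at a given data rate, since mJ/GB multiplied by GB/s gives mW. The sketch below applies that conversion at the 250 GB/s operating point used on the NoC Power Budget slides and expresses the result as a share of the 17.4 W typical FPGA dynamic power. It assumes energy per GB stays constant with bandwidth, which is a simplification, and the function names are hypothetical.

```python
FPGA_DYNAMIC_POWER_W = 17.4  # typical FPGA dynamic power from the slides


def power_watts(energy_mj_per_gb, bandwidth_gb_per_s):
    """mJ/GB * GB/s = mJ/s = mW; divide by 1000 for watts."""
    return energy_mj_per_gb * bandwidth_gb_per_s / 1000.0


def budget_share(energy_mj_per_gb, bandwidth_gb_per_s):
    """Power as a fraction of typical FPGA dynamic power."""
    return power_watts(energy_mj_per_gb, bandwidth_gb_per_s) / FPGA_DYNAMIC_POWER_W


for label, energy in [("Embedded NoC, best case", 4.5),
                      ("Point-to-point soft links", 4.7),
                      ("Embedded NoC, worst case", 10.4)]:
    print(f"{label}: {power_watts(energy, 250):.3f} W, "
          f"{budget_share(energy, 250):.1%} of budget")
```

At 250 GB/s the two embedded NoC endpoints come out near 6.5% and 15%, which lines up with the 7% (Hard NoC, Low-V) and 15% (Mixed NoC) budget figures shown earlier.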

Page 34:

eecg.utoronto.ca/~mohamed/noc_designer.html

Page 35:

Thank You!

eecg.utoronto.ca/~mohamed/noc_designer.html
