hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently...

Hoplite-DSP: Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs

Chethan Kumar H B and Nachiket Kapre (nachiket@ieee.org)

Hoplite — FPL 2015 paper

• Jan Gray co-author

• Specs
  - 60 LUTs + 100 FFs
  - 2.9ns clock

• Smallest FPGA router available + RTL code


Router              LUTs   FFs   Clock
Penn                1.7K   541   4.5ns
CMU                 1.5K   635   9.6ns
Hoplite (FPL 2015)  60     100   2.9ns

32b payload + Virtex-6 240T

Hoplite advantage: 25x (LUTs), 5x (FFs), 1.5x (clock)

Router                  LUTs   FFs   Clock
Hoplite (FPL 2015)      70     140   2.7ns
Hoplite-DSP (FPL 2016)  13     17    2.8ns

47b payload + Virtex-7 485T

Hoplite-DSP advantage: 5x (LUTs), 8x (FFs), ~ (clock), at the cost of + 1 DSP48
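
The headline ratios in the resource comparisons above can be checked with a few lines of arithmetic. The sketch below is only a sanity check on the quoted figures; the dictionary layout and the helper name `ratios` are illustrative and do not come from the talk.

```python
# Figures quoted on the slides: (LUTs, FFs, clock period in ns).
virtex6_32b = {
    "Penn": (1700, 541, 4.5),
    "CMU": (1500, 635, 9.6),
    "Hoplite": (60, 100, 2.9),
}
virtex7_47b = {
    "Hoplite": (70, 140, 2.7),
    "Hoplite-DSP": (13, 17, 2.8),
}


def ratios(baseline, candidate):
    """Return baseline/candidate for each of (LUTs, FFs, clock period)."""
    return tuple(b / c for b, c in zip(baseline, candidate))


# Earlier routers versus LUT-based Hoplite (32b payload, Virtex-6).
for name in ("Penn", "CMU"):
    lut, ff, clk = ratios(virtex6_32b[name], virtex6_32b["Hoplite"])
    print(f"{name} / Hoplite: {lut:.1f}x LUTs, {ff:.1f}x FFs, {clk:.1f}x clock")

# LUT-based Hoplite versus Hoplite-DSP (47b payload, Virtex-7).
lut, ff, clk = ratios(virtex7_47b["Hoplite"], virtex7_47b["Hoplite-DSP"])
print(f"Hoplite / Hoplite-DSP: {lut:.1f}x LUTs, {ff:.1f}x FFs, {clk:.2f}x clock")
```

The 25x, 5x and 1.5x figures correspond to the smaller of the two baseline ratios in each column, and the clock ratio for Hoplite-DSP comes out close to 1, matching the "~" on the slide.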

Motivation

• Close the gap vs. embedded NoCs — do we really want clean-slate hard NoCs?

• Return resources to FPGA application — reduce NoC overheads

• Find clever ways to reuse existing FPGA elements


Outline

• Adapting the Hoplite arch. to the DSP48

• Scaling to 2D layouts: using DSP cascade chains

• Performance and Resource evaluation


Overview of Hoplite switch organization

• NoC organised as a unidirectional torus

• Each switch has 2 inputs, 2 outputs into the network + PE connection

• Uses deflection routing — no buffering, no allocation, etc

from: Jan Gray
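
The switch behaviour summarised above can be sketched as a single routing function. This is a minimal model, not the authors' RTL: the `Packet` fields, the function name `route`, and in particular the priority order (North over West over PE) are assumptions made for illustration.

```python
from typing import NamedTuple, Optional, Tuple


class Packet(NamedTuple):
    dst_x: int
    dst_y: int
    payload: int


def route(
    x: int,
    y: int,
    west: Optional[Packet],
    north: Optional[Packet],
    pe: Optional[Packet],
) -> Tuple[Optional[Packet], Optional[Packet], bool]:
    """One routing decision for the switch at (x, y) on a unidirectional torus.

    Returns (east_out, south_out, pe_accepted). south_out doubles as the PE
    exit: a packet on it whose destination is (x, y) is delivered locally.
    """
    east_out = None
    south_out = None

    # Assumed priority 1: traffic already in the column keeps going south.
    if north is not None:
        south_out = north

    # Assumed priority 2: west traffic turns south in its destination column
    # if the south port is free; otherwise it is deflected east and goes
    # around the ring again. Nothing is buffered.
    if west is not None:
        if west.dst_x == x and south_out is None:
            south_out = west
        else:
            east_out = west

    # Assumed priority 3: the PE injects only when the port it wants is free.
    pe_accepted = False
    if pe is not None:
        if pe.dst_x != x and east_out is None:
            east_out, pe_accepted = pe, True
        elif pe.dst_x == x and south_out is None:
            south_out, pe_accepted = pe, True

    return east_out, south_out, pe_accepted


# A west packet that wants to turn south loses to north traffic and is deflected.
turning = Packet(dst_x=1, dst_y=2, payload=0xA)
through = Packet(dst_x=1, dst_y=3, payload=0xB)
print(route(1, 0, west=turning, north=through, pe=None))
```

Because every input packet always leaves on some output in the same cycle, the model needs no queues or allocators, which is the property the slide highlights.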

Hoplite Internals

[Figure: Hoplite router datapath. Labels: W, N and PE inputs; E and S/PE outputs; 5-LUT and 6-LUT multiplexers; DOR logic driving sel0 and sel1,2.]

Hoplite summary

• Bulk of the footprint from 5-LUT, 6-LUT blocks, which implement the packet multiplexers

• DOR logic is a handful of LUTs: it only reads address fields and valid signals

• Inter-Hoplite router links are pipelined with registers

• Idea: move (1) multiplexers + (2) registers into the Xilinx DSP48 block


Xilinx DSP48 block

[Figure: Xilinx DSP48 block diagram. Inputs A (30b), D (27b), B (18b), C (48b) and cascade input PCIN (48b); X, Y and Z multiplexers feeding the ALU; output P (48b) and cascade output PCOUT; control inputs OPMODE, ALUMODE, INMODE.]

Programmable elements

• Xilinx DSP block very versatile!

• Typical use case: signal processing, streaming computations => mainly arithmetic

• INMODE: 27b multiplexer between A and D

• OPMODE: 48b multiplexers between A:B, C

• Exploit cascade links PCIN/PCOUT!
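
To make the multiplexer idea concrete, the sketch below models the OPMODE-controlled X, Y and Z multiplexers in front of the ALU. The field encodings are recalled from the 7-series DSP48E1 documentation and should be treated as assumptions to check against the vendor user guide; the function and constant names are illustrative, and INMODE and the multiplier are left out.

```python
MASK48 = (1 << 48) - 1

# Assumed DSP48E1-style OPMODE field encodings (subset, multiplier unused).
X_SEL = {0b00: "zero", 0b10: "P", 0b11: "A:B"}                   # OPMODE[1:0]
Y_SEL = {0b00: "zero", 0b11: "C"}                                # OPMODE[3:2]
Z_SEL = {0b000: "zero", 0b001: "PCIN", 0b010: "P", 0b011: "C"}   # OPMODE[6:4]


def dsp48_mux(opmode, a, b, c, pcin, p_prev=0):
    """Model P = Z + X + Y for one cycle (ALUMODE assumed 0000, carry-in 0)."""
    ab = ((a & ((1 << 30) - 1)) << 18) | (b & ((1 << 18) - 1))  # A:B concat
    sources = {
        "zero": 0,
        "A:B": ab,
        "C": c & MASK48,
        "PCIN": pcin & MASK48,
        "P": p_prev & MASK48,
    }
    x = sources[X_SEL[opmode & 0b11]]
    y = sources[Y_SEL[(opmode >> 2) & 0b11]]
    z = sources[Z_SEL[(opmode >> 4) & 0b111]]
    return (x + y + z) & MASK48


# With two of the three multiplexers forced to zero, the adder passes the
# third input through unchanged, so a dynamic OPMODE acts as a 48b select.
SELECT_AB = 0b000_00_11
SELECT_C = 0b000_11_00
SELECT_PCIN = 0b001_00_00

print(hex(dsp48_mux(SELECT_C, a=0x1, b=0x2, c=0xC0FFEE, pcin=0xBEEF)))
```

Changing OPMODE every cycle is what turns an arithmetic block into a packet multiplexer; which NoC port drives A:B, C or PCIN is not fixed by this sketch.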


Input + Multiplexer Mapping

[Figure: the Hoplite multiplexer datapath (W, N and PE inputs; E and S/PE outputs; DOR logic) shown next to the DSP48 block diagram. In the final build the DSP48 ports are labelled WEST, PE, N, S/PE and EAST.]

Multi-cycling

• Problem: Hoplite has two outputs (three in fact, with S/PE output port shared)

• Solution: must multi-pump the DSP block — runs at 2x the frequency of the PEs

• First sub-cycle — resolve EAST output

• Second sub-cycle — resolve SOUTH/PE output
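
The two sub-cycles can be sketched as one shared multiplexer used twice per PE clock cycle. This is a behavioural illustration only: the select names and the assumption that EAST chooses between West and PE while SOUTH/PE chooses among North, West and PE are mine, not taken from the authors' code.

```python
def shared_mux(select, inputs):
    """The single physical multiplexer: one selection per fast clock tick."""
    return inputs.get(select)  # None models "no valid packet"


def pe_cycle(inputs, east_sel, south_sel):
    """One PE clock cycle = two fast sub-cycles through the same multiplexer.

    east_sel and south_sel are hypothetical select values from the DOR logic;
    None means the corresponding output carries nothing this cycle.
    """
    if east_sel not in (None, "W", "PE"):
        raise ValueError("EAST is assumed to choose between W and PE")
    if south_sel not in (None, "N", "W", "PE"):
        raise ValueError("SOUTH/PE is assumed to choose among N, W and PE")

    schedule = (("EAST", east_sel), ("SOUTH/PE", south_sel))
    outputs = {}
    for port, sel in schedule:  # sub-cycle 0 first, then sub-cycle 1
        outputs[port] = shared_mux(sel, inputs) if sel is not None else None
    return outputs


packets = {"W": "west_pkt", "N": "north_pkt", "PE": "pe_pkt"}
print(pe_cycle(packets, east_sel="PE", south_sel="N"))
```

The cost of sharing one multiplexer is that the DSP clock must run at twice the PE clock, as the slide notes.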


First cycle

[Figure: DSP48 block during the first sub-cycle. Labels: PE Input, West Input, East Output on P, CE.]

Second cycle

[Figure: DSP48 block during the second sub-cycle. Labels: PE Input, West Input, North Input, South/PE Output on P, CE.]


DSP48 columnar layout

[Figure: a DSP column of DSP48E blocks linked by dedicated cascade routes (PCOUT to PCIN). The A:B, C and P ports connect to user logic and DOR logic over the programmable FPGA interconnect.]

Layout considerations

• FPGA DSPs organised into vertical columns
  - ~100s of DSPs in a column
  - ~10s of columns

• Restrictions:
  1. Cascade links only extend within column
  2. Horizontal links must use general interconnect

• Key question: Adjusting NoC size vs. DSP count
  - use passthrough DSPs
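
One way to picture the passthrough idea is to label every DSP slot in a column with a role. The sketch below is hypothetical throughout: the function name, the even-spacing policy and the choice of one turn DSP at each end of the column are assumptions for illustration, not the placement used in the talk.

```python
def assign_roles(column_height, routers):
    """Label each DSP slot in one column, bottom (index 0) to top."""
    if routers < 1 or column_height < routers + 2:
        raise ValueError("need room for the routers plus two turn DSPs")

    roles = ["pass-thru"] * column_height
    roles[0] = "bottom-turn"   # fabric enters the cascade here
    roles[-1] = "top-turn"     # cascade returns to the fabric here

    interior = column_height - 2
    for i in range(routers):
        # Spread routers evenly over the interior slots (hypothetical policy).
        slot = 1 + (i * interior) // routers
        roles[slot] = "router"
    return roles


roles = assign_roles(column_height=10, routers=4)
for index, role in enumerate(roles):
    print(index, role)
```

Any slot left as "pass-thru" simply forwards the cascade, which is how a small NoC can be stretched across a column that holds many more DSPs than routers.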


Embedded layout

[Figure: Hoplite routers embedded in DSP48E columns. Vertical links use the cascade; horizontal links use the fabric. DSP roles: Router DSPs; Pass-thru DSPs (PCOUT to PCIN); Top-Turn DSPs (PCIN to P); Bottom-Turn DSPs (A:B to PCOUT).]

Comparing Xilinx Virtex-6 and Virtex-7 Layouts

8x8 NoC (ML605 board)

16x16 NoC (VC707 board)


LUTs vs DSPs

• Simple tradeoff
  - substantially fewer LUTs vs. DSP48s
  - Importantly, FFs absorbed into DSP48

• Power and effective B/W for random traffic mostly identical

Commentary on hard NoCs

• Area:
  - Hard router = 12.45 LABs
  - 1 Altera DSP block = 11.9 LABs (Stratix-III)
  - Hoplite-DSP marginally smaller

• Speed:
  - Hard router ~996 MHz
  - Hoplite-DSP ~650 MHz (multi-pumped)
  - Hoplite-DSP limits freq advantage to 3x

• Power:
  - Hard router ~1.58 W
  - Hoplite-DSP model ~1.1 W at 15% activity
  - Hoplite-DSP uses ~50% less power

Abdelfattah + Betz [TRETS2014] (extrapolated results for 48b-wide 1VC)
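
The area and speed bullets can be reproduced with simple arithmetic. One reading of the 3x figure, assumed here, is that the multi-pumped 650 MHz DSP clock delivers one routing decision every two ticks, so the comparison is against an effective 325 MHz; the variable names below are illustrative.

```python
hard_router_labs = 12.45   # hard router area, in LABs
dsp_block_labs = 11.9      # one Altera DSP block (Stratix-III), in LABs
hard_router_mhz = 996.0
hoplite_dsp_mhz = 650.0    # DSP clock, multi-pumped
sub_cycles = 2             # assumption: two sub-cycles per PE cycle

area_ratio = dsp_block_labs / hard_router_labs
effective_mhz = hoplite_dsp_mhz / sub_cycles
freq_advantage = hard_router_mhz / effective_mhz

print(f"DSP block area / hard router area: {area_ratio:.2f}")
print(f"Effective Hoplite-DSP rate: {effective_mhz:.0f} MHz")
print(f"Hard router frequency advantage: {freq_advantage:.1f}x")
```

Under these assumptions the DSP block is a few percent smaller than the hard router and the hard router's clock advantage is close to 3x, in line with the slide.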


Wish-list for DSP48s Gen2

• Configurable Cascades
  - 48b switched bidirectional routing instead of just cascades (approach hard NoC wiring)
  - option to skip DSP blocks (segment lengths)

• DOR routing
  - pattern detection logic with multiple masks (similar to Altera DSP units)

• SIMD Multiplexing
  - fracturing 48b-wide lanes into multiple lanes

Conclusions

• Hoplite muxes mapped to DSP48 blocks — use the dynamic OPMODE feature

• Reduce cost by 5x LUTs, 8x FFs per router

• Exploit cascade links to absorb NoC wiring

• Significantly close the gap with hard NoCs


Embedded layout

• Three kinds of DSPs

• “Router DSPs”: small fraction of DSPs for switching

• “Pass-through DSPs”: glorified “pipelined wires”; multi-pumping, 50% back to user

• “Corner-turn DSPs”: connect cascades to fabric


Physical FPGA layout

[Figure: physical layout of a 2x2 NoC (ML605 board). Legend: Corner-Turn, Pass-Thru, Hoplite.]

Efficiency

DSP48s less efficient than LUT-based Hoplite!