Extension Using NoCsparallel.princeton.edu/openpiton/tutorial_slides/asplos18/openpiton... · Cache...

Post on 19-Jul-2020

11 views 0 download

Transcript of Extension Using NoCsparallel.princeton.edu/openpiton/tutorial_slides/asplos18/openpiton... · Cache...

OpenPiton in Action

Princeton University

OpenPit

http://openpiton.org

Extension Using NoCs

2

P-Mesh NoC Connected I/O and Accelerators

3

DRAM DRAM

Deserializer

Serializer

DMA

Buffer Accel

Tile

10

Gig

abit

Eth

ern

et

P-Mesh NoC: packet format

CHIPID: Highest bits indicate whether the destination is on-chip or off-chip, the rest of the bits indicates the chip ID

XPOS: The position of the destination tile in the X dimensionYPOS: The position of the destination tile in the Y dimensionFBITS: The router output port to the destinationPAYLOAD LENGTH: The number of payload packetsRESERVED: Reserved Bits used by higher-level protocols.

4

RESERVED

P-Mesh NoC: .h files

piton/design/include/network_define.hDefines the header flits b63-22(all except messageid, tag, and options 1)

piton/design/include/define.vhdefines the rest

5

Cache Coherence Protocol

Directory-based MESI coherence Protocol

- Four-hop message communication (no direct communication between private L1.5 caches)

- Uses 3 physical NoCs with point-to-point ordering to avoid deadlock

- The directory and L2 are co-located but state information are maintained separately

- Silent eviction in E and S states

- No need for acknowledgement upon write-back of dirty lines from L1.5 to L2, but writeback guard needed in some cases.

6

Memory Hierarchy Datapath

7

Private L1.5

Distributed shared

L2

Off-chipChipset

NoC1

NoC2

NoC3

NoC1

NoC2

NoC3

NoC Messages

8

L1.5 L2L1.5/

Memory

NoC1 NoC2

NoC2 NoC3

In order to avoid deadlock, NoC3 messages will never be blocked

LoadStore

Ifill…

DowngradeInv

Mem Req…

DG ackInv ack

Mem Reply…

Load AckStore Ack

Backup Slides

9

Coherence Transaction Example

10

Core 1

I → E

Core 2

I

I → E

Memory

L1.5 L1.5

L2❶Load

❷Mem Req ❸MemReply

❹ Data Ack

Core 1 Core2

Ld

Coherence Transaction Example (2)

11

Core 1

E → I

Core 2

I → M

E → M

Memory

L1.5 L1.5

L2 ❶ Store

❷Downgrade

❸ DG Ack

❹ Data Ack

Core 1 Core2

LdSt

Coherence Transaction Example (3)

12

Core 1

I

Core 2

M → I

M → I

Memory

L1.5 L1.5

L2 ❶WbGuard

❷Writeback

Core 1 Core2

LdSt

Wb

Adding to OpenPiton

• AXI-Lite

• Wishbone

• Interfacing with the Network on Chip

13

Hooking up an AXI-Lite device

14

Interfacing with the Networks-on-Chip

1.Packet format– Highlighting key packet fields

2.Definition files– .h files

3.Instantiations in Verilog design

15

NoC: packet format

64-bit flits1 packet header (64b) + X packet payload flits

(64b * X)Ex: Cache request from L1.5 to L2

Header flit + req. address flit + metadata flitEx: Cache response from L2 to L1.5

Header flit + 2x data flits (16B cache line)Ex: Instruction cache response

Header flit + 4x data flits (32B cache line)

16

NoC: instantiations

piton/design/chip/rtl/chip.v.pyvChip-wide connections between tilesAuto generated using PYHP

17

NoC: instantiationspiton/design/chip/tile/rtl/tile.v.pyv

Instantiation of NoC1/2/3piton/design/chip/tile/rtl/tile.v.pyv

Selectable between router and crossbar design

18

Cache Coherence Protocol

Directory-based MESI coherence Protocol

- Four-hop message communication (no direct communication between private L1.5 caches)

- Uses 3 physical NoCs with point-to-point ordering to avoid deadlock

19

ReqI->S

DirM->S

ReqRd

AckDt

OwnerM->S

FwdRdAck

FwdRd

L1.5 L2 L1.5

Cache Coherence Protocol (2)

Directory-based MESI coherence Protocol

- The directory and L2 are co-located but state information are maintained separately

20

L2 State Dir State Tag Data Sharer List

Cache Coherence Protocol (3)

Directory-based MESI coherence Protocol

- Silent eviction in E and S states

- No need for acknowledgement upon write-back of dirty lines from L1.5 to L2

21

ReqS->I

ReqE->I

ReqM->I

DirM->IE->I

WbGuard

Wb

Example: Add an on-chip accelerator

1. Implement the NoC interface for the accelerator

2. Design and implement the control flow for the accelerator

– Use interrupt packets to init and stop the accelerator

– Use special load and stores to config the accelerator

– Follow the coherence protocol if a coherence cache is maintained

3. Connect the accelerator to NoCs and assign it a new tile ID

4. Modify the OS code to init the accelerator if needed

5. Write tests to test the accelerator

22