Vm Vhdl Design

Post on 05-Apr-2015

VHDL design in Liberouter accelerating cards

Jan Kořenek, korenek@liberouter.org

Accelerating cards

• Combo6 card
  • Connected to the PCI bus
  • Routing and filtering functionality
• Interface card
  • Connected to the Combo6 card
  • Drives the PHYTERs
  • Buffers
  • Probably some routing functionality blocks

HW resources

• Combo6
  • FPGA Virtex-II XC2V3000
  • DDR SDRAM
  • 3 x SSRAM
  • CAM
• Interface card
  • 2 x FPGA Virtex-II XC2V1000
  • 2 x SSRAM
  • 4 x PHYTER

[Figure: hardware block diagram. Combo6 card: Virtex-II XC2V3000 FPGA with CAM, SDRAM and three SSRAMs, connected through the PLX chip to the PCI bus. Interface card: two Virtex-II XC2V1000 FPGAs, two SSRAMs, four PHYTERs and the IO interfaces.]

VHDL design entities

[Figure: block diagram of the VHDL design entities. Four PHYTERs with PHYTER control, input and output buffers, four header field extractors, FIFOs, look-up processor, replicator, priority queue, four edit engines, DRAM scheduler, and the address decoder connecting the PCI bus to and from all blocks. The blocks are divided between the interface card and the Combo6 card.]

VHDL design (ifc. card)

[Figure: design block diagram, repeated.]

PHYTER control

• Necessary PHYTER initialization
• Serial access to internal registers
  • Speed can be slow
  • Registers are driven by SW
  • Serializes data from the PCI bus and deserializes data to it

[Figure: PHYTER control block with the signals MDIO, MDC_OUT, MDC_IN, PHY_ADR, REG_ADR, CLK_IN, RDY, DATA and R/W.]
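The serial register access above can be illustrated with a small Python model. It is only a sketch: it assumes the PHYTER management interface uses the standard IEEE 802.3 Clause 22 MDIO frame (which the MDIO and MDC signal names suggest), and the function name `mdio_frame` and its parameters are hypothetical, not taken from the VHDL design.

```python
def mdio_frame(phy_adr, reg_adr, write, data=0):
    """Return the 64 bit slots of a Clause 22 MDIO frame, first bit first.

    Assumption: standard IEEE 802.3 Clause 22 framing. A slot holding None
    means the controller releases MDIO so that the PHY can drive it.
    """
    if not 0 <= phy_adr < 32 or not 0 <= reg_adr < 32:
        raise ValueError("PHY and register addresses are 5 bits wide")
    if not 0 <= data < (1 << 16):
        raise ValueError("register data is 16 bits wide")

    def bits(value, width):
        # Most significant bit is shifted out first.
        return [(value >> i) & 1 for i in range(width - 1, -1, -1)]

    frame = [1] * 32                          # preamble: 32 ones
    frame += [0, 1]                           # start of frame
    frame += [0, 1] if write else [1, 0]      # opcode: 01 = write, 10 = read
    frame += bits(phy_adr, 5) + bits(reg_adr, 5)
    if write:
        frame += [1, 0] + bits(data, 16)      # turnaround and data from the controller
    else:
        frame += [None] * 18                  # turnaround and data come from the PHY
    return frame
```

For example, `mdio_frame(1, 0, write=True, data=0x8000)` yields the bit sequence that would be shifted out on MDIO, one bit per MDC cycle, to write 0x8000 to register 0 of PHY 1.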

Input and output buffer

• The header field extractor assumes a Virtex-II Pro chip and its RocketIO transceivers
• Necessary functionality
  • The elastic buffer stores one or more packets
  • Status signals: Full, Half Full, ...
  • Compute CRC
• Not implemented yet
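As a rough behavioural illustration of the elastic buffer and its Full and Half Full signals, consider the Python sketch below. The class name, the byte-based capacity and the exact thresholds are assumptions made for the example; the slides state that the block is not implemented yet.

```python
from collections import deque


class ElasticBuffer:
    """Behavioural model of a packet buffer with Full / Half Full flags.

    Assumptions: capacity is counted in bytes and whole packets are
    accepted or refused; the real block may use different thresholds.
    """

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.packets = deque()

    @property
    def full(self):
        return self.used >= self.capacity

    @property
    def half_full(self):
        return 2 * self.used >= self.capacity

    def push(self, packet):
        """Store a packet; return False if it does not fit."""
        if self.used + len(packet) > self.capacity:
            return False
        self.packets.append(packet)
        self.used += len(packet)
        return True

    def pop(self):
        """Remove and return the oldest stored packet."""
        packet = self.packets.popleft()
        self.used -= len(packet)
        return packet
```

A producer would test `full` (or the return value of `push`) before writing, and a consumer could use `half_full` as an early back-pressure hint.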

VHDL design – HFE

[Figure: design block diagram, repeated.]

Header Field Extractor

• Small universal processor (nano-processor)
• Analyzes the packet header and retrieves the information needed for routing and filtering
• Stores this information in the unified header structure
• Sends the unified header to the Look-up processor
• Sends the packet to the DRAM scheduler
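The extraction step can be pictured with the following Python sketch. It is an illustration only: the slides do not define the unified header layout, so the field names in the returned dictionary are hypothetical, and the example assumes an untagged Ethernet frame carrying IPv6.

```python
import struct

ETHERTYPE_IPV6 = 0x86DD


def extract_unified_header(frame):
    """Pull routing and filtering fields out of an Ethernet + IPv6 frame.

    Assumptions: no VLAN tag, no IPv6 extension header parsing, and a
    made-up dictionary layout standing in for the unified header.
    """
    if len(frame) < 14 + 40:
        raise ValueError("frame too short for Ethernet + IPv6 headers")
    dst_mac, src_mac, ethertype = struct.unpack_from("!6s6sH", frame, 0)
    if ethertype != ETHERTYPE_IPV6:
        raise ValueError("not an IPv6 frame")
    # Fixed IPv6 header: version/class/flow (4 B), payload length (2 B),
    # next header (1 B), hop limit (1 B), source (16 B), destination (16 B).
    ver_tc_flow, payload_len, next_header, hop_limit = struct.unpack_from(
        "!IHBB", frame, 14)
    return {
        "dst_mac": dst_mac,
        "src_mac": src_mac,
        "version": ver_tc_flow >> 28,
        "payload_len": payload_len,
        "next_header": next_header,
        "hop_limit": hop_limit,
        "src_ip": frame[22:38],
        "dst_ip": frame[38:54],
    }
```

In the hardware, the nano-processor program plays the role of this function, and the result is written to the unified header memory instead of being returned.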

Header Field Extractor implementation

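[Figure: HFE implementation. The HFE_CORE CPU fetches instructions from an instruction memory in BlockRAM (ADDR, INSTR) and reaches the external registers through ADDR and DATA. A finite state machine (REQs, ACKs, WENs, STATE signals) handles UH_ADDRESS, UH_DATA, DRAM DATA, CONTROL and PACKET DATA.]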

HFE – main components

Processor core and instruction memory

External register set: gives the processor core access to its environment (packet data input, DRAM, …)

Finite State Machine: handles DRAM and unified header (UH) communication; may work independently of the core (intelligent peripherals)

Others: counters, … mapped into the register set

Processor core overview

Simple RISC core with 16-bit data processing

8-bit, 4-bit and 1-bit operations are also supported

Fast loop and jump support (no wait cycles)

Arithmetic operations are reduced to addition and subtraction

Everything is mapped into one memory space (inputs, outputs, control, RAM); every instruction can access any register, I/O port or RAM location

Program memory and the stack are not visible in that space (Harvard architecture), which is faster and more stable

Two-stage pipeline: decode and execute phases

Processor core structure

[Figure: HFE_CORE structure. Control and decode unit (INSTR, control signals), instruction address generation unit (IADDR, DSTADDR, stack), data address generation unit (SRC_ADDR, DST_ADDR), constants generator, ALU, "Z", internal registers, RAM, multiplexers (MX), DIN and DOUT, and pipeline registers clocked by CLK.]

Current state

• Processor core and most of the peripherals are fully implemented
• Working frequency is about 60 MHz; some optimizations are needed
• About 600 CLBs occupied (4.2 %)
• VHDL simulations with real packet data on input

Future performance improvements

• Discard the 16-bit adder: we don't need addition and subtraction
• Optimize buses
• No indirect addressing
• More clock cycles for some instructions
• …

VHDL design – LUP

[Figure: design block diagram, repeated.]

Look-up processor

• Why do we need this?
  • Recognize the unified header and select the output interfaces, packet priority and packet editing
  • Apply firewall rules
• Function description
  • Load the unified header from the input FIFO
  • Do the match and retrieve information
  • Send the information to the Replicator

Look-up abstraction

Block structure

CAM block

• Does the match in CAM memory
  • Select part of the unified header (16 registers)
  • Load the registers into the buffer (necessary for full CAM performance)
  • Do the match in the CAM
  • Retrieve the matched address and pass it to the processing unit
• Mutually exclusive access to the unified header FIFO
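The match step can be modelled in a few lines of Python. This is a sketch under assumptions: it treats the CAM as ternary (each entry has a value and a care mask) and returns the lowest matching address, which is how such devices commonly behave; the slides do not specify the CAM type or its priority rule, and `cam_match` is a hypothetical name.

```python
def cam_match(entries, key):
    """Return the address of the first entry matching key, or None.

    entries is a list of (value, mask) pairs; a mask bit of 1 means the
    bit must match, 0 means "don't care". Both conventions are assumed.
    """
    for address, (value, mask) in enumerate(entries):
        if (key & mask) == (value & mask):
            return address
    return None
```

In the described design the returned address would then be used as the program address for the processing unit.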

Processing unit

• Simple processor
• The program address is obtained from the CAM block
• Supported instructions
  • TAB: new program counter value created from the unified header
  • Jxxx: test lower and upper bounds
  • EXE: last instruction of a program; contains information for the following blocks

Current state

• HW design
• Implementation in VHDL
• Behavioral simulations
• Post-place-and-route simulations of the processing unit

VHDL design – Replicator

[Figure: design block diagram, repeated.]

Replicator

• Block that replicates the matched data structure to the Edit engine input queues
• Function description
  • Get the input data structure from the Match Engine or SW
  • Load the appropriate number of replication data structures from BlockRAM and send them, together with the packet address, to the Priority queues
  • Increment the reference count of the DRAM allocation block
  • Update statistics
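The function description above can be sketched in Python as follows. The record layout (a queue number plus opaque edit parameters) and the names `replicate`, `priority_queues` and `ref_counts` are assumptions for illustration; the slides do not give the actual data structure.

```python
from collections import defaultdict, deque


def replicate(packet_addr, records, priority_queues, ref_counts):
    """Copy one matched packet to every queue listed in records.

    records is a list of (queue_id, edit_params) pairs standing in for
    the replication data structures loaded from BlockRAM (assumed layout).
    Each copy adds one reference to the packet's DRAM block.
    """
    for queue_id, edit_params in records:
        priority_queues[queue_id].append((packet_addr, edit_params))
        ref_counts[packet_addr] += 1
    return len(records)


# Hypothetical usage: one packet sent to two output queues.
queues = defaultdict(deque)
refs = defaultdict(int)
copies = replicate(0x40, [(0, "edit A"), (3, "edit B")], queues, refs)
```

Because only the packet address is queued, the packet data itself stays in DRAM once, and the reference count tells the DRAM scheduler when the block can be freed.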

Block diagram

Data structure

VHDL design – PQ

[Figure: design block diagram, repeated.]

Priority queues

• Records for every interface are sorted by priority and passed to the Edit Engine
• Records are temporarily stored in SSRAM memory
• Replication on the interface
  • Why isn't this done by the Replicator?
    • Uses less memory
    • Simplifies the Replicator

Architecture

• Two memory components
  • 16x64b DESC: queue descriptions and status
    • Generates SSRAM addresses
    • Helps with the search for the current queue
  • 32x4b ASGN: assigns every queue to an interface (EE or SW)
    • Provides information for the SEARCH block

[Figure: ASGN and DESC memories for queues Q0 to Q15. Each DESC entry holds Start, WritePtr, ReadPtr and Length; the selected queue feeds Edit engine 1.]
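How a DESC entry could generate SSRAM addresses is sketched below in Python. The fields Start, WritePtr, ReadPtr and Length come from the figure; treating each queue as a circular region of SSRAM, and the extra `count` field used to tell a full queue from an empty one, are assumptions of this example.

```python
class QueueDescriptor:
    """Model of one DESC entry describing a circular queue in SSRAM."""

    def __init__(self, start, length):
        self.start = start        # first SSRAM address of the queue
        self.length = length      # number of record slots
        self.write_ptr = 0
        self.read_ptr = 0
        self.count = 0            # assumed helper, not shown in the figure

    def push_address(self):
        """Return the SSRAM address for the next record to store."""
        if self.count == self.length:
            raise OverflowError("queue is full")
        address = self.start + self.write_ptr
        self.write_ptr = (self.write_ptr + 1) % self.length
        self.count += 1
        return address

    def pop_address(self):
        """Return the SSRAM address of the oldest stored record."""
        if self.count == 0:
            raise IndexError("queue is empty")
        address = self.start + self.read_ptr
        self.read_ptr = (self.read_ptr + 1) % self.length
        self.count -= 1
        return address
```

With sixteen such descriptors, the SEARCH block only has to pick the highest-priority non-empty queue assigned to an interface and ask its descriptor for the next read address.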

Block structure

State diagram

Design critical points

• Searching for the current queue runs concurrently with passing records to the Edit Engine; this requires synchronization and a dual-port DESC memory
• Different behavior of the SW and Edit Engine interfaces (SW has no WB state)
• Designing a suitable pipeline
• Current state: implementation in VHDL

VHDL design – Edit engine

[Figure: design block diagram, repeated.]

Edit engine

• Block that creates the output stream for outgoing packets
  • Inserts a new L2 header
  • Decrements the Hop Limit
  • Routing Header Options
  • Encapsulates/decapsulates IPv6 packets
  • etc.
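One of the listed edits, decrementing the Hop Limit, is simple enough to show as a Python sketch. It assumes the input starts at the fixed 40-byte IPv6 header (where the Hop Limit is the byte at offset 7) and follows the usual forwarding rule that a packet whose Hop Limit would reach zero is dropped; the function name is hypothetical and says nothing about how the Edit engine instructions actually encode this.

```python
HOP_LIMIT_OFFSET = 7  # byte offset of Hop Limit in the fixed IPv6 header


def decrement_hop_limit(ipv6_packet):
    """Return a copy of the packet with Hop Limit reduced by one.

    Returns None when the packet must not be forwarded because its
    Hop Limit is already 0 or would drop to 0.
    """
    if len(ipv6_packet) < 40:
        raise ValueError("shorter than the fixed IPv6 header")
    hop_limit = ipv6_packet[HOP_LIMIT_OFFSET]
    if hop_limit <= 1:
        return None
    edited = bytearray(ipv6_packet)
    edited[HOP_LIMIT_OFFSET] = hop_limit - 1
    return bytes(edited)
```

IPv6 has no header checksum, so no other header field has to change for this edit.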

Input data

Block Diagram

Instructions

Instructions for data sending and modification

• Send data of a given size
  • SPDP, SPEP, SAPB, SAPC
• Send data to a reference position
  • SPDU, SPDE

Control instructions

• MARK, NXTO, LDEN, LDOP

Current state

VHDL design – DRAM scheduler

[Figure: design block diagram, repeated.]

DRAM scheduler

• DDR SDRAM memory control
• Loads and stores packets in DDR SDRAM memory
• Memory is divided into fixed-length blocks
• Free block control
• Bottleneck of the design: we need maximal memory speed
• Three different types of interfaces: HFE, Replicator and Edit engine
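Fixed-length blocks with free block control can be modelled as a small allocator. The Python sketch below is an illustration under assumptions: the block size, the free-list policy and the rule that a freshly allocated block starts with one reference are all chosen for the example, and the class and method names are hypothetical.

```python
class BlockAllocator:
    """Free-list allocator for fixed-length memory blocks."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        # Keep block 0 on top of the stack so it is handed out first.
        self.free_blocks = list(range(num_blocks - 1, -1, -1))
        self.refs = [0] * num_blocks

    def allocate(self):
        """Return the byte address of a free block, or None if none is left."""
        if not self.free_blocks:
            return None
        block = self.free_blocks.pop()
        self.refs[block] = 1
        return block * self.block_size

    def add_reference(self, address):
        """Record one more user of the block (for example a replicated copy)."""
        self.refs[address // self.block_size] += 1

    def release(self, address):
        """Drop one reference; return True when the block became free."""
        block = address // self.block_size
        if self.refs[block] == 0:
            raise ValueError("block is not allocated")
        self.refs[block] -= 1
        if self.refs[block] == 0:
            self.free_blocks.append(block)
            return True
        return False
```

Fixed-length blocks keep this bookkeeping trivial: there is no fragmentation, and an address maps to its block number with a single division.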

Block structure

[Figure: DRAM scheduler block structure. HFE1 to HFE4 and EE1 to EE4 connect through 32B BlockRAM buffers to the HFE interface, REP interface and EE interface. The Scheduler_core (time slots control, address control, low-level DDR SDRAM control) drives the data and control paths to the DDR SDRAM memory.]

Core structure

• Low-level SDRAM control
  • Communication with the SDRAM
• Time slots control
  • Time sharing strategy
  • Every component has a slot
• Address control
  • Number of references for every block address

[Figure: Scheduler_core with address control, time slots control and low-level DDR SDRAM control.]
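The time sharing strategy, in which every component has a slot, might look like the following Python sketch. It assumes a fixed round-robin slot table and that an unused slot simply stays idle; the slides do not say whether idle slots can be given to another component, and the names used here are hypothetical.

```python
def run_slots(clients, pending, num_slots):
    """Simulate fixed time slots over a shared memory.

    clients lists the slot owners in order; pending maps a client name to
    its number of waiting transfers and is updated in place. Returns, for
    each slot, the client that was served or None for an idle slot.
    """
    served = []
    for slot in range(num_slots):
        owner = clients[slot % len(clients)]
        if pending.get(owner, 0) > 0:
            pending[owner] -= 1
            served.append(owner)
        else:
            served.append(None)
    return served
```

A fixed slot table gives every component a guaranteed share of the memory bandwidth, at the cost of wasting slots whose owner has nothing to transfer.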

Low level DDR SDRAM control

• DRAM control
  • Load and store data
• DCM: clock generation and phase shifting
• Command generator
  • Memory initialization
  • Auto refresh and read/write cycles
• Data path
  • Time transforms (data are delayed)

[Figure: low-level DDR SDRAM control. Inputs clk_in, read, write and data_to_ddr, output data_from_ddr; two DCMs, the command generator and the data path drive the command and data lines of the DDR SDRAM.]

Current state

• DDR SDRAM test implemented
• HFE interface
  • Implemented, but no simulation
• Low-level scheduler
  • Implemented
  • Behavioral simulation
• Other blocks
  • Specified but not implemented

VHDL design – Address decoder

[Figure: design block diagram, repeated.]

Address decoder and local bus

• Address decoder
  • Hierarchical address space
  • Chip select for the next level
• Local bus
  • Connects all blocks to the PLX (PCI bus)
  • Multiplexed address and data, 16 bits
  • Long wires, so wait cycles are necessary
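A hierarchical decoder of this kind can be illustrated in Python. The sketch assumes a 16-bit address whose upper bits select a block at each level, with the remaining bits passed down as the offset; the concrete map below (which bits select what, and which blocks sit where) is invented for the example and does not reflect the real address space.

```python
# Hypothetical address map: (shift, width, children) per level.
ADDRESS_MAP = (12, 4, {                  # bits 15..12 select a block
    0x0: "PHYTER control",
    0x1: (8, 4, {                        # bits 11..8 select a sub-block
        0x0: "LUP CAM block",
        0x1: "LUP processing unit",
    }),
})


def decode(address, level=ADDRESS_MAP):
    """Walk the hierarchy; return (block name, offset) or None if unmapped."""
    offset_bits = 16
    while isinstance(level, tuple):
        shift, width, children = level
        select = (address >> shift) & ((1 << width) - 1)
        if select not in children:
            return None                  # no chip select is asserted
        level = children[select]
        offset_bits = shift              # lower bits go to the next level
    return level, address & ((1 << offset_bits) - 1)
```

Each level only looks at its own select bits and forwards the rest, so a block can lay out its internal registers without knowing where it sits in the global map.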

Current state of VHDL design

[Figure: design block diagram with a "Block phase" legend showing the current state of each block.]

Next steps

• Finish the implementation of all blocks
• Design completion (concurrently)
• Functional testing and bug fixing
• Move some blocks to the interface card
• Add new features and improve performance

The End

Thank you for your attention