Presented By: Tal Goihman, Irit Kaufman Instructor: Mony Orbach Winter 2012.
HS/DSL Project Yael Grossman, Arik Krantz Implementation and Synthesis of a 3-Port PCI-Express Switch...
Date posted: 21-Dec-2015
HS/DSL Project
Yael Grossman, Arik Krantz
Implementation and Synthesis of a 3-Port PCI-Express Switch
Supervisor: Mony Orbach
Agenda
  Project Goals
  Project Specifications
  PCI vs. PCI-Express
  Switched Environment
  PCI-Express Layers
  Top Level Design Blocks
  Local/Global Mechanisms
  "A Day in the Life of a Packet"
  Logic Blocks Walkthrough
  Problems and Potential Pitfalls
  Completed Tasks
  Second Semester Plan
Project Goals
  Familiarize ourselves with all aspects of hardware design:
    Definition of soft and hard cores
    Interaction with existing cores
    Software and hardware integration
    Simulation
    Synthesis
  Work with state-of-the-art design tools such as EDK v8.1
  Design a 3-port PCI-Express switch:
    Learn the PCI-E protocol
    Create block-diagram flow and design of the switch
  Implement and test the switch we designed
Part A
Part B
Project Specifications
  Design a 3-port PCI-Express switch.
  Design must support high transfer rates: 2.5 Gbit/sec, full-duplex, per port.
  Support simultaneous connections on all ports.
  Support complex real-time algorithms for:
    Packet routing between ports
    Error checking (CRC) and retransmission in case of error
    Packet arbitration and prioritization according to type (data / control)
  Implement the switch on a Memec FF1152 design board using a Virtex-II Pro FPGA.
PCI Bus
  PCI: Peripheral Component Interconnect.
  The standard bus used to connect peripheral devices (hard drive, CD-ROM/DVD player) to a computer motherboard.
  Typical clock frequency of 33 MHz with a throughput of 1.056 Gbit/sec.
  Problems:
    Only one device can use the bus at any given time!
    Limited throughput with scalability issues.
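The throughput figure above follows from the clock rate once a bus width is fixed. The sketch below assumes the classic 32-bit PCI data path, which the slide does not state explicitly; the function name is illustrative only.

```python
PCI_CLOCK_HZ = 33_000_000   # typical PCI clock frequency quoted on the slide
PCI_BUS_WIDTH_BITS = 32     # assumption: 32-bit data path, one transfer per clock


def pci_throughput_gbit(clock_hz: int = PCI_CLOCK_HZ,
                        width_bits: int = PCI_BUS_WIDTH_BITS) -> float:
    """Peak shared-bus throughput in Gbit/sec."""
    return clock_hz * width_bits / 1e9


print(pci_throughput_gbit())  # 1.056
```

Because the bus is shared, this peak figure is divided among all devices on it.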
PCI Bus vs. PCI-Express
  The new state-of-the-art computer I/O interconnect technology.
  A serial point-to-point (P2P) technology.
  Layered design, similar to the successful OSI model used in computer networks.
  Answers all of the specified shortcomings of the standard PCI bus:
    Many devices can interconnect simultaneously.
    Highly scalable design: adding devices does not affect the throughput of existing ones.
    Initial throughput of 2.5 Gbit/sec, easily extendable by using additional lanes.
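As a rough illustration of how throughput scales with lanes, the sketch below multiplies the 2.5 Gbit/sec per-lane rate by the lane count. The `encoding_efficiency` parameter is an addition not found on the slides: first-generation PCI-Express uses 8b/10b line encoding, so passing 0.8 gives the usable data rate rather than the raw line rate.

```python
RAW_GBIT_PER_LANE = 2.5  # per lane, per direction, as quoted on the slide


def link_rate_gbit(lanes: int, encoding_efficiency: float = 1.0) -> float:
    """Aggregate one-direction link rate in Gbit/sec for a given lane count."""
    if lanes < 1:
        raise ValueError("a link needs at least one lane")
    return RAW_GBIT_PER_LANE * lanes * encoding_efficiency


for lanes in (1, 4, 16):
    print(lanes, link_rate_gbit(lanes), link_rate_gbit(lanes, 0.8))
```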
Switched Environment
The PCI-E fabric is comprised of endpoints and switches, connected in a tree-like structure, where switches are internal nodes and endpoints are leaves.
The role of switches is to route and forward data packets between endpoints, while ensuring data integrity and quality of service.
Each link is defined exclusively between two devices, replacing the shared-bus technology.
Layered Structure
  Transaction Layer: Creates packets from the data obtained from the device during a send flow, and restores the data from the packets during a receive flow.
  Data Link Layer: Ensures data integrity during packet transmission and reception on each link.
  Physical Layer: Performs the actual, physical process of sending and receiving packets on the electrical lines.
  Each layer interacts solely with the parallel layer on the peer device.
Packet Structure
[Figure: packet fields and the layer that adds each one]
  Field            Size          Added by
  Frame            1 byte        Physical Layer
  Sequence Number  2 bytes       Data Link Layer
  Header           12-16 bytes   Transaction Layer
  Data Payload     0-4 KBytes    Transaction Layer
  LCRC Field       4 bytes       Data Link Layer
  Frame            1 byte        Physical Layer
Transaction Layer Packet / Data Link Layer Packet
[Figure: fields shown for the Data Link Layer Packet: Frame (1 byte), Sequence Number (2 bytes), LCRC Field (4 bytes), Frame (1 byte)]
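The layering of the packet fields can be sketched in a few lines. This is an illustration under stated assumptions, not the switch's implementation: the framing bytes are hypothetical placeholders, `zlib.crc32` stands in for the real LCRC computation, and the function names are invented for the example.

```python
import struct
import zlib

# Hypothetical placeholder bytes for the 1-byte start and end frame fields.
START_FRAME = b"\xfb"
END_FRAME = b"\xfd"


def wrap_tlp(seq: int, header: bytes, payload: bytes = b"") -> bytes:
    """Add Data Link Layer and Physical Layer fields around a TLP."""
    if not 12 <= len(header) <= 16:
        raise ValueError("header must be 12-16 bytes")
    if len(payload) > 4096:
        raise ValueError("payload must be at most 4 KBytes")
    if not 0 <= seq <= 0xFFFF:
        raise ValueError("sequence number must fit in 2 bytes")
    body = struct.pack(">H", seq) + header + payload    # Data Link Layer prefix
    lcrc = struct.pack(">I", zlib.crc32(body))          # stand-in 4-byte LCRC
    return START_FRAME + body + lcrc + END_FRAME        # Physical Layer framing


def unwrap_tlp(frame: bytes):
    """Return (seq, tlp_bytes) if framing and LCRC are intact, else None."""
    if len(frame) < 20 or frame[:1] != START_FRAME or frame[-1:] != END_FRAME:
        return None
    body, lcrc = frame[1:-5], frame[-5:-1]
    if struct.pack(">I", zlib.crc32(body)) != lcrc:
        return None
    (seq,) = struct.unpack(">H", body[:2])
    return seq, body[2:]
```

A receiver that gets `None` back from `unwrap_tlp` is in the situation the CRC Check block detects: the packet did not arrive intact and a NACK may be needed.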
Top Level Design
[Figure: block diagram of one switch port, showing Data In, MGT, Receive BRAM, CRC Check, ACK/NACK Generator, Routing, SQ1, SQ2, RB, ACKQ, Arbiter and Data Out]
Multi-Gigabit Transceiver (MGT)
  Handles high-speed transmission and reception of packets.
Receive Block RAM (BRAM)
  Stores received packets on arrival at the switch.
CRC Check
  Checks whether the packet arrived without errors.
ACK/NACK Generator
  Creates a DLL packet (DLLP) that indicates to the sending device whether the packet arrived intact.
ACK/NACK Queue (ACKQ)
  Stores the generated DLLPs while they await sending.
Routing Block
  Decides which outgoing port to forward the packet to, according to the lookup tables it contains.
Replay Buffer (RB)
  Contains all the packets that have been sent on the outgoing queue and have not yet received an ACK.
Send Queue 1 and 2 (SQ)
  Contain packets routed to the current port from the other two switch ports.
Arbiter
  Decides which packet to transmit on the outgoing lane.
Local/Global Mechanisms
[Diagram labels: Local, Global]
Send Queue 1 and 2 (SQ1, SQ2)
  Contain packets routed to the current port from the other two switch ports. Each queue has its own buffer in order to prevent simultaneous memory writes.
ACK/NACK Queue (ACKQ)
  Stores the DLLP generated for each received TLP while it awaits sending.
Replay Buffer (RB)
  Contains all the packets that have been sent on the outgoing queue and have not yet received an ACK.
Send/Receive Pipelines
  Include the logic for incoming and outgoing packets.
Routing Mechanism
  Includes an independent copy of the routing table.
Multi-Gigabit Transceiver (MGT)
  Allows sending and receiving packets via Rocket-I/O.
Arbitration Mechanism
  Chooses the next packet to send via the MGT.
Packet Storage in Common Memory Space
  Enables fast packet forwarding between ports.
A Day in the Life of a Packet
[Figure: animation tracing a packet through the switch. Three ports are shown (MGT 1, MGT 2, MGT 3), each with Receive BRAM, CRC Check, ACK/NACK Generation, Routing, SQ1, SQ2, RB and ACKQ. Labels on the flow: TLP, ACK, "If ACKed".]
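Receive Pipeline
[Figure: the CRC Logic Block takes CLK and the packet address (Addr) and outputs IsTLP and Valid. These signals, together with Addr, feed the TLP Handler (outputs Addr3, NTS3) and the DLLP Handler (outputs Addr4, NTS4).]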
CRC Logic Block
[Figure: flow of the block: check whether the packet is a TLP or a DLLP, then run TLP_CRC or DLLP_CRC. Signals shown: Addr, Valid, IsTLP, EN.]
  Valid  IsTLP  Description
  1      1      Packet is a valid TLP
  1      0      Packet is a valid DLLP
  0      1      Packet is an invalid TLP
  0      0      Packet is an invalid DLLP / unrecognized
  * Valid and IsTLP default to 0.
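The Valid/IsTLP truth table of the CRC Logic Block maps directly onto a lookup. A minimal sketch follows; the function name is illustrative and not taken from the design.

```python
# Truth table of the CRC Logic Block outputs, keyed by (Valid, IsTLP).
CRC_RESULT = {
    (1, 1): "valid TLP",
    (1, 0): "valid DLLP",
    (0, 1): "invalid TLP",
    (0, 0): "invalid DLLP / unrecognized",
}


def classify(valid: int = 0, is_tlp: int = 0) -> str:
    """Describe a packet from the two CRC Logic Block outputs.

    Both signals default to 0, matching the note on the slide.
    """
    return CRC_RESULT[(valid, is_tlp)]


print(classify(1, 1))  # valid TLP
print(classify())      # invalid DLLP / unrecognized
```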
TLP Handler Logic Block
[Figure: flow of the block. Inputs: Addr, IsTLP, Valid. Blocks shown: compare NRS and TLP_SEQ (deduced from Addr), with branches TLP_SEQ>NRS, TLP_SEQ=NRS and TLP_SEQ<NRS; check if a NACK needs to be sent and, if so, set NACK_SCHEDULED; ACK/NACK Send Logic (outputs Addr3, NTS3); route packet and enqueue; update expected TLP sequence number (NRS). Internal signals: EN, TLP_CTRL (2 bits), Send_Nack.]
  EN  CTRL[1]  CTRL[2]  Meaning
  0   1        1        Not a valid TLP packet
  1   0        0        TLP_SEQ (received) == NRS (expected)
  1   0        1        TLP_SEQ (received) > NRS (expected)
  1   1        0        TLP_SEQ (received) < NRS (expected)
  NRS (NEXT_RCV_SEQ): keeps track of the next expected TLP's sequence number.
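The comparison step of the TLP handler can be sketched as a function that produces EN and the 2-bit TLP_CTRL code from the table. This is a simplified model: it compares sequence numbers as plain integers and ignores wrap-around, which a real implementation would have to handle.

```python
def tlp_ctrl(valid: bool, is_tlp: bool, tlp_seq: int, nrs: int):
    """Return (EN, TLP_CTRL) where TLP_CTRL is the string CTRL[1]CTRL[2]."""
    if not (valid and is_tlp):
        return 0, "11"   # not a valid TLP packet
    if tlp_seq == nrs:
        return 1, "00"   # the expected packet
    if tlp_seq > nrs:
        return 1, "01"   # ahead of the expected sequence number
    return 1, "10"       # behind the expected sequence number


print(tlp_ctrl(True, True, 5, 5))   # (1, '00')
print(tlp_ctrl(True, True, 7, 5))   # (1, '01')
```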
NACK_SCHEDULED handler
[Figure: NACK_SCHEDULED is held in an SR flip-flop (SRFF) with inputs S and R and output Q. Signals shown: Valid, IsTLP, Invalid_TLP, TLP_CTRL1, TLP_CTRL2, "TLP_SEQ (received) > NRS (expected)", Want_To_Send_Nack, Send_Nack.]
  If Want_To_Send_Nack=1 and NACK_SCHED=0, then S=1: we set NACK_SCHED.
  Otherwise S=0: NACK_SCHED is left unchanged (either NACK_SCHED is already on, or Want_To_Send_Nack is off).
  If TLP_CTRL='00', then R=1: we reset NACK_SCHED. Otherwise R=0: no change.
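The set/reset rules for NACK_SCHEDULED can be modelled as a small state machine. The sketch below assumes that Send_Nack is simply the S input of the flip-flop, which is how the diagram labels appear to line up but is not stated outright; the class and method names are illustrative.

```python
class NackScheduled:
    """Software model of the NACK_SCHEDULED SR flip-flop."""

    def __init__(self):
        self.q = False  # NACK_SCHED, initially off

    def step(self, want_to_send_nack: bool, tlp_ctrl: str) -> bool:
        """Apply one packet's worth of inputs and return Send_Nack."""
        s = bool(want_to_send_nack and not self.q)  # set only if not yet scheduled
        r = tlp_ctrl == "00"                        # expected TLP arrived: reset
        if s:
            self.q = True
        elif r:
            self.q = False
        return s  # assumption: Send_Nack mirrors S


handler = NackScheduled()
print(handler.step(True, "01"))   # True: first NACK goes out
print(handler.step(True, "01"))   # False: a NACK is already scheduled
print(handler.step(False, "00"))  # False: good TLP clears NACK_SCHED
```

S and R cannot be asserted together here, because a wanted NACK implies TLP_CTRL is '01' or '11', never '00'.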
ACK/NACK Block
[Figure: flow of the block. Inputs: Enable, Addr, TLP_CTRL (2 bits), Send_NACK. Block 1 determines whether to send a DLLP and, if so, what type (Ack/Nack). The remaining blocks are: construct DLLP packet; update type-bit in packet; write to ACKQ. Outputs: Addr3, NTS3.]
Logic for Block No. 1:
  Send_Nack  TLP_CTRL    Meaning
  0          '01'        Don't send anything. We wanted to send a NACK, but NACK_SCHEDULED was on.
  1          '01'        Send a NACK. NACK_SCHEDULED is off, so we can send it.
  X          '00', '10'  Send an ACK. NACK_SCHEDULED is not relevant.
  0          '11'        Don't send anything. TLP_CTRL='11' means either a valid DLLP or an invalid TLP. An invalid TLP would need a NACK, but since Send_Nack is off, we don't send anything.
  1          '11'        Send a NACK. This is definitely an invalid TLP, else Send_Nack wouldn't be on.
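The decision table for Block No. 1 reduces to a short function. A minimal sketch, with a hypothetical function name and string return values chosen for readability:

```python
def ack_nack_decision(send_nack: bool, tlp_ctrl: str):
    """Return "ACK", "NACK" or None (send nothing) per the Block No. 1 table."""
    if tlp_ctrl in ("00", "10"):
        return "ACK"                           # Send_Nack is a don't-care here
    if tlp_ctrl in ("01", "11"):
        return "NACK" if send_nack else None   # NACK only if not already scheduled
    raise ValueError(f"unknown TLP_CTRL value: {tlp_ctrl!r}")


print(ack_nack_decision(True, "01"))   # NACK
print(ack_nack_decision(False, "11"))  # None
```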
Routing Logic Block
Logic block executed by the global routing mechanism.
[Figure: inputs Addr, Valid and IsTLP feed "Decide Outgoing Port", which enables either "Write to SQ_1" (EN_1) or "Write to SQ_2" (EN_2).]
Routing Flows
[Figure: Port 1, Port 2 and Port 3, each with send queues 1 and 2, and the routing flows between them]
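One way to picture the routing step is a table lookup followed by a choice of send queue on the destination port. Everything specific in the sketch below is an assumption made for illustration: the address ranges in the lookup table, the use of a destination address as the lookup key, and the rule that the lower-numbered source port writes to SQ_1 and the higher-numbered one to SQ_2.

```python
PORTS = (1, 2, 3)

# Hypothetical lookup table: (first address, last address, outgoing port).
ROUTING_TABLE = [
    (0x0000_0000, 0x3FFF_FFFF, 1),
    (0x4000_0000, 0x7FFF_FFFF, 2),
    (0x8000_0000, 0xBFFF_FFFF, 3),
]


def decide_outgoing_port(dest_addr: int):
    """Look up the outgoing port for a destination address, or None."""
    for first, last, port in ROUTING_TABLE:
        if first <= dest_addr <= last:
            return port
    return None


def send_queue_for(src_port: int, dst_port: int) -> int:
    """Assumed convention: each port has one send queue per other port."""
    others = [p for p in PORTS if p != dst_port]
    return others.index(src_port) + 1


def route(valid: bool, is_tlp: bool, src_port: int, dest_addr: int):
    """Return (outgoing port, send queue number), or None if not routable."""
    if not (valid and is_tlp):
        return None
    dst = decide_outgoing_port(dest_addr)
    if dst is None or dst == src_port:
        return None
    return dst, send_queue_for(src_port, dst)


print(route(True, True, 1, 0x4000_0000))  # (2, 1)
```

Giving each source port its own queue on the destination is what lets two ports forward to the same destination without writing to the same memory at once.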
DLLP Handler Logic Block
[Figure: flow of the block. Inputs: Addr, IsTLP, Valid. The handler raises EN and passes Addr to RB Logic, which also takes Chosen_Buffer and Addr-TLP and outputs Addr4 and NTS4.]
  Enable=1 if the packet is a valid DLLP. If not, we can't do anything with it.
  ACK_SEQ is updated to the value of DLLP_Seq_Num (timeout mechanism implemented).
  Why? Because DLLP_Seq_Num can't be smaller than ACK_SEQ; it is either equal or larger. If it is equal, no harm is done. If it is larger, we have made forward progress and need to update anyway.
  At this point we still don't know whether it is an ACK or a NACK. All we know is that it is a valid DLLP.
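The DLLP handler's two steps, raising Enable and updating ACK_SEQ, can be sketched as follows. The initial ACK_SEQ value and the class name are assumptions, and the timeout mechanism mentioned on the slide is not modelled.

```python
class DllpHandler:
    """Tracks ACK_SEQ, the last sequence number seen in a valid DLLP."""

    def __init__(self, initial_ack_seq: int = 0):
        self.ack_seq = initial_ack_seq  # assumed starting value

    def handle(self, valid: bool, is_tlp: bool, dllp_seq_num: int) -> bool:
        """Return Enable; update ACK_SEQ when the packet is a valid DLLP."""
        enable = bool(valid and not is_tlp)
        if enable:
            # Safe for ACK and NACK alike: the sequence number never moves
            # backwards, so the assignment is either a no-op or progress.
            self.ack_seq = dllp_seq_num
        return enable


handler = DllpHandler()
print(handler.handle(True, False, 4), handler.ack_seq)   # True 4
print(handler.handle(False, False, 9), handler.ack_seq)  # False 4
```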
Replay Buffer Logic Block
[Figure: "Decide on action according to DLLP type, validity and chosen buffer" takes Enable, ACK/NACK, Chosen_Buffer (2 bits), Addr and Addr-TLP, and drives "Purge RB up to DLLP_Seq_Num", "Add Address to RB" and "Resend Buffer" through the Enable and Control (2 bits) signals. Outputs: Addr4, NTS4.]
  En  A/N  Ch_Buf  Action
  0   X    10,11   Do nothing
  0   X    00,01   Add address to RB from last seq. number
  1   1    10,11   Purge RB up to DLLP_Seq_Num
  1   1    00,01   Purge RB up to DLLP_Seq_Num + add to RB from last seq. number
  1   0    10,11   Purge RB up to DLLP_Seq_Num + resend RB from DLLP_Seq_Num
  1   0    00,01   Purge RB up to DLLP_Seq_Num + resend RB from DLLP_Seq_Num
Control:
  00: Do nothing
  01: Add to RB
  10: Request to resend RB
  11: Perform RB resend
Enable:
  0: Do nothing
  1: Purge RB
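The action table can be written as a decision function next to a minimal replay buffer. The sketch reads A/N=1 as ACK and A/N=0 as NACK, which is what the resend rows imply; it returns symbolic actions rather than the 2-bit Control codes, and it treats "purge up to" as inclusive. All names are illustrative.

```python
from collections import OrderedDict


def rb_action(en: int, is_ack: int, chosen_buffer: str):
    """Return (purge, action) with action in {"none", "add", "resend"}."""
    from_send_queue = chosen_buffer in ("00", "01")
    if not en:                       # no valid DLLP: never purge
        return False, ("add" if from_send_queue else "none")
    if is_ack:                       # ACK: purge, and still record a new TLP
        return True, ("add" if from_send_queue else "none")
    return True, "resend"            # NACK: purge, then resend what remains


class ReplayBuffer:
    """Holds BRAM addresses of sent TLPs that have not been ACKed yet."""

    def __init__(self):
        self._entries = OrderedDict()  # sequence number -> packet address

    def add(self, seq: int, addr: int) -> None:
        self._entries[seq] = addr

    def purge_up_to(self, seq: int) -> None:
        # Assumes sequence numbers do not wrap within the buffer.
        for s in [s for s in self._entries if s <= seq]:
            del self._entries[s]

    def pending(self):
        return list(self._entries.items())


rb = ReplayBuffer()
for seq, addr in [(1, 0x100), (2, 0x200), (3, 0x300)]:
    rb.add(seq, addr)
rb.purge_up_to(2)
print(rb.pending())  # [(3, 768)]
```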
Send Pipeline
[Figure: Arbitration Logic receives an address and a Need To Send flag from each source: SQ_1 (Addr1, NTS1), SQ_2 (Addr2, NTS2), ACKQ (Addr3, NTS3) and RB Logic (Addr4, NTS4). It outputs Chosen_Buffer (2 bits). Other signals shown: Address_toStore, Addr-TLP, Valid+TLP (2 bits), Addr, Addr_1, Addr_2.]
  NTS: Need To Send
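A fixed-priority arbiter over the four Need To Send flags might look like the sketch below. Both the priority order (control traffic before data) and the Chosen_Buffer encoding are assumptions: the slides only say that packets are prioritized by type, and the encoding is inferred from the replay buffer table, where codes 00 and 01 lead to a new entry being added.

```python
# Assumed priority order and Chosen_Buffer codes; neither is given on the slides.
PRIORITY = [
    ("ACKQ", "10"),  # ACK/NACK DLLPs first (control)
    ("RB", "11"),    # then replay buffer resends
    ("SQ_1", "00"),  # then new TLPs from the two send queues
    ("SQ_2", "01"),
]


def arbitrate(nts: dict, addr: dict):
    """Pick the next packet to send.

    nts maps a source name to its Need To Send flag, addr maps a source
    name to the packet address it offers. Returns (Chosen_Buffer, address),
    or None when no source has anything to send.
    """
    for name, code in PRIORITY:
        if nts.get(name):
            return code, addr[name]
    return None


nts = {"SQ_1": 1, "SQ_2": 0, "ACKQ": 1, "RB": 0}
addr = {"SQ_1": 0x10, "SQ_2": 0x20, "ACKQ": 0x30, "RB": 0x40}
print(arbitrate(nts, addr))  # ('10', 48)
```

A fixed order is the simplest choice but can starve the lower-priority queues under heavy load; a round-robin between SQ_1 and SQ_2 would be one alternative.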
Potential Traps and Pitfalls
  Memory/logic constraints
    Need to make sure the required logic can fit on the development board.
    Make sure we have enough BRAM space for all required buffers.
  Speed and concurrency
    Need to make sure speed can match PCI-E speed specs. Potential issue with extensive bus access.
    Potential synchronization issues in a multiple-endpoint environment under heavy load.
Completed Tasks
  Detailed design
    Data flow diagram (transmitter/receiver) for ports
    Queue and buffer structure and size
    Adaptation of design to board requirements
  System block design
    Completed high- and low-level block design
    User core design in progress
  Work with design tools
    Learned VHDL
    Learned to work with EDK 8.1
    Created custom projects and user cores
Second Semester Plan
  Complete coding of project (4 weeks)
    Block implementation and design in VHDL
    Creating test benches and simulation
    Adaptation of VHDL code to user core format
  Debugging and integration (4 weeks)
    Synthesis
    Simulation
    Testing
    Fix bugs (if we have any, of course)