RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P....

60
[Page 1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 R. Recio draft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard Company J. Hilland Hewlett-Packard Company October 2002 An RDMA Protocol Specification (Version 1.0) 1 Status of this Memo This document is a Release Specification of the RDMA Consortium. Copies of this document and associated errata may be found at http://www.rdmaconsortium.org. 2 Abstract This document defines a Remote Direct Memory Access Protocol (RDMAP) that operates over the Direct Data Placement Protocol (DDP protocol). RDMAP provides read and write services directly to applications and enables data to be transferred directly into ULP Buffers without intermediate data copies. It also enables a kernel bypass implementation.

Transcript of RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P....

Page 1: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

[Page 1]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

R. Reciodraft-recio-iwarp-rdmap-v1.0 IBM Corporation

P. CulleyHewlett-Packard Company

D. GarciaHewlett-Packard Company

J. HillandHewlett-Packard Company

October 2002

An RDMA Protocol Specification (Version 1.0)

1 Status of this Memo

This document is a Release Specification of the RDMA Consortium.Copies of this document and associated errata may be found athttp://www.rdmaconsortium.org.

2 Abstract

This document defines a Remote Direct Memory Access Protocol (RDMAP)that operates over the Direct Data Placement Protocol (DDPprotocol). RDMAP provides read and write services directly toapplications and enables data to be transferred directly into ULPBuffers without intermediate data copies. It also enables a kernelbypass implementation.

Page 2: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 2]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

Table of Contents

1 Status of this Memo ........................................ 1 2 Abstract ................................................... 1 3 Introduction ............................................... 4 3.1 Architectural Goals ........................................ 4 3.2 Protocol Overview .......................................... 5 3.3 RDMAP Layering ............................................. 6 4 Glossary ................................................... 9 4.1 General .................................................... 9 4.2 LLP ....................................................... 10 4.3 Direct Data Placement (DDP) ............................... 11 4.4 Remote Direct Memory Access (RDMA) ........................ 12 5 ULP and Transport Attributes .............................. 16 5.1 Transport Requirements & Assumptions ...................... 16 5.2 RDMAP Interactions with the ULP ........................... 17 6 Header Format ............................................. 20 6.1 RDMAP Control and Invalidate STag Field ................... 20 6.2 RDMA Message Definitions .................................. 22 6.3 RDMA Write Header ......................................... 23 6.4 RDMA Read Request Header .................................. 24 6.5 RDMA Read Response Header ................................. 25 6.6 Send Header and Send with Solicited Event Header .......... 25 6.7 Send with Invalidate Header and Send with SE and InvalidateHeader.......................................................... 26 6.8 Terminate Header .......................................... 26 7 Data Transfer ............................................. 31 7.1 RDMA Write Message ........................................ 31 7.2 RDMA Read Operation ....................................... 32 7.2.1 RDMA Read Request Message................................ 32 7.2.2 RDMA Read Response Message............................... 33 7.3 Send Message Type ......................................... 34 7.4 Terminate Message ......................................... 35 7.5 Ordering and Completions .................................. 36 8 RDMAP Stream Management ................................... 40 8.1 Stream Initialization ..................................... 40 8.2 Stream Teardown ........................................... 41 8.2.1 RDMAP Abortive Termination............................... 41 9 RDMAP Error Management .................................... 43 9.1 RDMAP Error Surfacing ..................................... 43 9.2 Errors Detected at the Remote Peer on Incoming RDMA Messages44 10 Security Considerations ................................... 46 10.1 Protocol-specific Security Considerations................ 46 10.2 Using IPSec with RDMAP................................... 46 10.3 Other Security Considerations............................ 46 11 References ................................................ 49 11.1 Normative References..................................... 49 11.2 Informative References................................... 49 12 Appendix .................................................. 50 12.1 DDP Segment Formats for RDMA Messages.................... 50

Page 3: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 3]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

12.1.1 DDP Segment for RDMA Write ............................. 50 12.1.2 DDP Segment for RDMA Read Request ...................... 50 12.1.3 DDP Segment for RDMA Read Response ..................... 51 12.1.4 DDP Segment for Send and Send with Solicited Event ..... 52 12.1.5 DDP Segment for Send with Invalidate and Send with SE andInvalidate...................................................... 52 12.1.6 DDP Segment for Terminate .............................. 53 12.2 Ordering and Completion Table............................ 53 13 Authors Addresses ......................................... 56 14 Acknowledgments ........................................... 57 15 Full Copyright Statement .................................. 60

Table of Figures

Figure 1 RDMAP Layering.......................................... 7 Figure 2 Example of MPA, DDP, and RDMAP Header Alignment over TCP 8 Figure 3 DDP Control, RDMAP Control, and Invalidate STag Fields. 21 Figure 4 RDMA Usage of DDP Fields............................... 22 Figure 5 RDMA Message Definitions............................... 23 Figure 6 RDMA Read Request Header Format........................ 24 Figure 7 Terminate Header Format................................ 26 Figure 8 Terminate Control Field................................ 27 Figure 9 Terminate Control Field Values......................... 28 Figure 10 Error Type to RDMA Message Mapping.................... 30 Figure 11 RDMA Write, DDP Segment format........................ 50 Figure 12 RDMA Read Request, DDP Segment format................. 51 Figure 13 RDMA Read Response, DDP Segment format................ 51 Figure 14 Send and Send with Solicited Event, DDP Segment format 52 Figure 15 Send with Invalidate and Send with SE and Invalidate, DDPSegment......................................................... 52 Figure 16 Terminate, DDP Segment format......................... 53 Figure 17 Operation Ordering.................................... 55

Page 4: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 4]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

3 Introduction

Today, communications over TCP/IP typically require copy operations,which add latency and consume significant CPU and memory resources.The Remote Direct Memory Access Protocol (RDMAP) enables removal ofdata copy operations and enables reduction in latencies by allowinga local application to read or write data on a remote computer'smemory with minimal demands on memory bus bandwidth and CPUprocessing overhead, while preserving memory protection semantics.

RDMAP is layered on top of Direct Data Placement (DDP) and uses thetwo Buffer Models available from DDP [DDP].

3.1 Architectural Goals

RDMAP has been designed with the following high-level architecturalgoals:

* Provide a data transfer operation that allows a Local Peer totransfer up to 2^32 - 1 octets directly into a previouslyadvertised buffer (i.e. Tagged buffer) located at a Remote Peerwithout requiring a copy operation. This is referred to as theRDMA Write data transfer operation.

* Provide a data transfer operation that allows a Local Peer toretrieve up to 2^32 - 1 octets directly from a previouslyadvertised buffer (i.e. Tagged buffer) located at a Remote Peerwithout requiring a copy operation. This is referred to as theRDMA Read data transfer operation.

* Provide a data transfer operation that allows a Local Peer tosend up to 2^32 - 1 octets directly into a buffer located at aRemote Peer that has not been explicitly advertised. This isreferred to as the Send (Send with Invalidate, Send withSolicited Event, and Send with Solicited Event and Invalidate)data transfer operation.

* Enable the local ULP to use the Send Operation Type (includesSend, Send with Invalidate, Send with Solicited Event, and Sendwith Solicited Event and Invalidate) to signal to the remote ULPthe Completion of all previous Messages initiated by the localULP.

* Provide for all Operations on a single RDMAP Stream to bereliably transmitted in the order that they were submitted.

* Provide RDMAP capabilities independently for each Stream when theLLP supports multiple data Streams within an LLP connection.

Page 5: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 5]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

3.2 Protocol Overview

RDMAP provides seven data transfer operations. Except for the RDMARead operation, each operation generates exactly one RDMA Message.Following is a brief overview of the RDMA Operations and RDMAMessages:

1. Send - A Send operation uses a Send Message to transfer datafrom the Data Source into a buffer that has not been explicitlyAdvertised by the Data Sink. The Send Message uses the DDPUntagged Buffer Model to transfer the ULP Message into the DataSink's Untagged Buffer.

2. Send with Invalidate - A Send with Invalidate operation uses aSend with Invalidate Message to transfer data from the DataSource into a buffer that has not been explicitly Advertised bythe Data Sink. The Send with Invalidate Message includes allfunctionality of the Send Message, with one addition: an STagfield is included in the Send With Invalidate Message and afterthe message has been Placed and Delivered at the Data Sink theremote peer's buffer identified by the STag can no longer beaccessed remotely until the remote peer's ULP re-enables accessand Advertises the buffer.

3. Send with Solicited Event (Send with SE) - A Send with SolicitedEvent operation uses a Send with Solicited Event Message totransfer data from the Data Source into an Untagged Buffer atthe Data Sink. The Send with Solicited Event Message is similarto the Send Message, with one addition: when the Send withSolicited Event Message has been Placed and Delivered, an Eventmay be generated at the recipient, if the recipient isconfigured to generate such an Event.

4. Send with Solicited Event and Invalidate (Send with SE andInvalidate) - A Send with Solicited Event and Invalidateoperation uses a Send with Solicited Event and InvalidateMessage to transfer data from the Data Source into a buffer thathas not been explicitly Advertised by the Data Sink. The Sendwith Solicited Event and Invalidate Message is similar to theSend with Invalidate Message, with one addition: when the Sendwith Solicited Event and Invalidate Message has been Placed andDelivered, an Event may be generated at the recipient, if therecipient is configured to generate such an Event.

5. Remote Direct Memory Access Write - An RDMA Write operation usesan RDMA Write Message to transfer data from the Data Source to apreviously advertised buffer at the Data Sink.

The ULP at the Remote Peer, which in this case is the Data Sink,enables the Data Sink Tagged Buffer for access and Advertises

Page 6: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 6]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

the buffer's size (length), location (Tagged Offset), andSteering Tag (STag) to the Data Source through a ULP specificmechanism. The ULP at the Local Peer, which in this case is theData Source, initiates the RDMA Write operation. The RDMA WriteMessage uses the DDP Tagged Buffer Model to transfer the ULPMessage into the Data Sink's Tagged Buffer. Note: the STagassociated with the Tagged Buffer remains valid until the ULP atthe Remote Peer invalidates it or the ULP at the Local Peerinvalidates it through a Send with Invalidate or Send withSolicited Event and Invalidate.

6. Remote Direct Memory Access Read - The RDMA Read operationtransfers data to a Tagged Buffer at the Local Peer, which inthis case is the Data Sink, from a Tagged Buffer at the RemotePeer, which in this case is the Data Source. The ULP at the DataSource enables the Data Source Tagged Buffer for access andAdvertises the buffer's size (length), location (Tagged Offset),and Steering Tag (STag) to the Data Sink through a ULP specificmechanism. The ULP at the Data Sink enables the Data Sink TaggedBuffer for access and initiates the RDMA Read operation. TheRDMA Read operation consists of a single RDMA Read RequestMessage and a single RDMA Read Response Message, and the lattermay be segmented into multiple DDP Segments.

The RDMA Read Request Message uses the DDP Untagged Buffer Modelto Deliver the STag, starting Tagged Offset and length for boththe Data Source and Data Sink Tagged Buffers to the remotepeer's RDMA Read Request Queue.

The RDMA Read Response Message uses the DDP Tagged Buffer Modelto Deliver the Data Source's Tagged Buffer to the Data Sink,without any involvement from the ULP at the Data Source.

Note: the Data Source STag associated with the Tagged Bufferremains valid until the ULP at the Data Source invalidates it orthe ULP at the Data Sink invalidates it through a Send withInvalidate or Send with Solicited Event and Invalidate. The DataSink STag associated with the Tagged Buffer remains valid untilthe ULP at the Data Sink invalidates it.

7. Terminate - A Terminate operation uses a Terminate Message totransfer to the Remote Peer information associated with an errorthat occurred at the Local Peer. The Terminate Message uses theDDP Untagged Buffer Model to transfer the Message into the DataSink's Untagged Buffer.

3.3 RDMAP Layering

RDMAP is dependent on DDP, subject to the requirements defined insection 5 ULP and Transport Attributes

Page 7: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 7]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

Transport Requirements & Assumptions. Figure 1 RDMAP Layeringdepicts the relationship between Upper Layer Protocols (ULPs),RDMAP, DDP protocol, the framing layer, and the transport For LLPprotocol definitions of each LLP, see [MPA], [TCP], and [SCTP].

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| || Upper Layer Protocol (ULP) || |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| || RDMAP || |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| || DDP protocol || |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| | || MPA | || | |+-+-+-+-+-+-+-+-+-+ SCTP || | || TCP | || | |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 1 RDMAP Layering

If RDMAP is layered over DDP/MPA/TCP, then the respective headersand ULP Payload are arranged as follows (Note: For clarity, MPAheader and CRC fields are included but MPA markers are not shown):

Page 8: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 8]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

0 1 2 30 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| |// TCP Header //| |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| MPA Header | |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| |// DDP Header //| |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| |// RDMA Header //| |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| |// ULP Payload //| (shown with no pad bytes) |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| MPA CRC |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+Figure 2 Example of MPA, DDP, and RDMAP Header Alignment over TCP

Page 9: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 9]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

4 Glossary

4.1 General

Advertisement (Advertised, Advertise, Advertisements, Advertises) -the act of informing a Remote Peer that a local RDMA Buffer isavailable to it. A Node makes available an RDMA Buffer forincoming RDMA Read or RDMA Write access by informing itsRDMA/DDP peer of the Tagged Buffer identifiers (STag, baseaddress, and buffer length). This advertisement of Tagged Bufferinformation is not defined by RDMA/DDP and is left to the ULP. Atypical method would be for the Local Peer to embed the TaggedBuffer's Steering Tag, base address, and length in a SendMessage destined for the Remote Peer.

Data Sink - The peer receiving a data payload. Note that the DataSink can be required to both send and receive RDMA/DDP Messagesto transfer a data payload.

Data Source - The peer sending a data payload. Note that the DataSource can be required to both send and receive RDMA/DDPMessages to transfer a data payload.

Data Delivery (Delivery, Delivered, Delivers) - Delivery is definedas the process of informing the ULP or consumer that aparticular Message is available for use. This is specificallydifferent from "Placement", which may generally occur in anyorder, while the order of "Delivery" is strictly defined. See"Data Placement".

Fabric - The collection of links, switches, and routers that connecta set of Nodes with RDMA/DDP protocol implementations.

Fence (Fenced, Fences) - To block the current RDMA Operation fromexecuting until prior RDMA Operations have Completed.

iWARP - A suite of wire protocols comprised of RDMAP, DDP, and MPA.The iWARP protocol suite may be layered above TCP, SCTP, orother transport protocols.

Local Peer - The RDMA/DDP protocol implementation on the local endof the connection. Used to refer to the local entity whendescribing a protocol exchange or other interaction between twoNodes.

Node - A computing device attached to one or more links of a Fabric(network). A Node in this context does not refer to a specificapplication or protocol instantiation running on the computer. ANode may consist of one or more RNICs installed in a hostcomputer.

Page 10: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 10]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

Remote Peer - The RDMA/DDP protocol implementation on the oppositeend of the connection. Used to refer to the remote entity whendescribing protocol exchanges or other interactions between twoNodes.

RNIC - RDMA Network Interface Controller. In this context, thiswould be a network I/O adapter or embedded controller with iWARPand verbs functionality.

RNIC Interface (RI) - The presentation of the RNIC to the verbsConsumer as implemented through the combination of the RNIC andthe RNIC driver.

ULP - Upper Layer Protocol. The protocol layer above the protocollayer currently being referenced. The ULP for RDMA/DDP isexpected to be an OS, Application, adaptation layer, orproprietary device. The RDMA/DDP documents do not specify a ULP- they provide a set of semantics that allow a ULP to bedesigned to utilize RDMA/DDP.

ULP Payload - The ULP data that is contained within a singleprotocol segment or packet (e.g. a DDP Segment).

Verbs - An abstract description of the functionality of a RNICInterface. The OS may expose some or all of this functionalityvia one or more APIs to applications. The OS will also use someof the functionality to manage the RNIC Interface.

4.2 LLP

LLP - Lower Layer Protocol. The protocol layer beneath the protocollayer currently being referenced. For example, for DDP the LLPis SCTP, MPA, or other transport protocols. For RDMA, the LLP isDDP.

LLP Connection - Corresponds to an LLP transport-level connectionbetween the peer LLP layers on two nodes.

LLP Stream - Corresponds to a single LLP transport-level Streambetween the peer LLP layers on two Nodes. One or more LLPStreams may map to a single transport-level LLP connection. Fortransport protocols that support multiple Streams per connection(e.g. SCTP), a LLP Stream corresponds to one transport-levelStream.

MULPDU - Maximum ULPDU. The current maximum size of the record thatis acceptable for DDP to pass to the LLP for transmission.

ULPDU - Upper Layer Protocol Data Unit. The data record defined bythe layer above MPA.

Page 11: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 11]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

4.3 Direct Data Placement (DDP)

Data Placement (Placement, Placed, Places) - For DDP, this term isspecifically used to indicate the process of writing to a databuffer by a DDP implementation. DDP Segments carry Placementinformation, which may be used by the receiving DDPimplementation to perform Data Placement of the DDP Segment ULPPayload. See "Data Delivery".

DDP Abortive Teardown - The act of closing a DDP Stream withoutattempting to Complete in-progress and pending DDP Messages.

DDP Graceful Teardown - The act of closing a DDP Stream such thatall in-progress and pending DDP Messages are allowed to Completesuccessfully.

DDP Control Field - a fixed 16-bit field in the DDP Header. The DDPControl Field contains an 8-bit field whose contents arereserved for use by the ULP.

DDP Header - The header present in all DDP segments. The DDP Headercontains control and Placement fields that are used to definethe final Placement location for the ULP payload carried in aDDP Segment.

DDP Message - A ULP defined unit of data interchange, which issubdivided into one or more DDP segments. This segmentation mayoccur for a variety of reasons, including segmentation torespect the maximum segment size of the underlying transportprotocol.

DDP Segment - The smallest unit of data transfer for the DDPprotocol. It includes a DDP Header and ULP Payload (if present).A DDP Segment should be sized to fit within the underlyingtransport protocol MULPDU.

DDP Stream - a sequence of DDP Messages whose ordering is defined bythe LLP. For SCTP, a DDP Stream maps directly to an SCTP Stream.For MPA, a DDP Stream maps directly to a TCP connection and asingle DDP Stream is supported. Note that DDP has no orderingguarantees between DDP Streams.

Direct Data Placement - A mechanism whereby ULP data containedwithin DDP Segments may be Placed directly into its finaldestination in memory without processing of the ULP. This mayoccur even when the DDP Segments arrive out of order. Out oforder Placement support may require the Data Sink to implementthe LLP and DDP as one functional block.

Page 12: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 12]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

Direct Data Placement Protocol (DDP) - Also, a wire protocol thatsupports Direct Data Placement by associating explicit memorybuffer placement information with the LLP payload units.

Message Offset (MO) - For the DDP Untagged Buffer Model, specifiesthe offset, in bytes, from the start of a DDP Message.

Message Sequence Number (MSN) - For the DDP Untagged Buffer Model,specifies a sequence number that is increasing with each DDPMessage.

Queue Number (QN) - For the DDP Untagged Buffer Model, identifies adestination Data Sink queue for a DDP Segment.

Steering Tag - An identifier of a Tagged Buffer on a Node, valid asdefined within a protocol specification.

STag - Steering Tag

Tagged Buffer - A buffer that is explicitly Advertised to the RemotePeer through exchange of an STag, Tagged Offset, and length.

Tagged Buffer Model - A DDP data transfer model used to transferTagged Buffers from the Local Peer to the Remote Peer.

Tagged DDP Message - A DDP Message that targets a Tagged Buffer.

Tagged Offset (TO) - The offset within a Tagged Buffer on a Node.

Untagged Buffer - A buffer that is not explicitly Advertised to theRemote Peer.

Untagged Buffer Model - A DDP data transfer model used to transferUntagged Buffers from the Local Peer to the Remote Peer.

Untagged DDP Message - A DDP Message that targets an UntaggedBuffer.

4.4 Remote Direct Memory Access (RDMA)

Event - An indication provided by the RDMAP Layer to the ULP toindicate a Completion or other condition requiring immediateattention.

Invalidate STag - A mechanism used to prevent the Remote Peer fromreusing a previous explicitly Advertised STag, until the LocalPeer makes it available through a subsequent explicitAdvertisement. The STag cannot be accessed remotely until it isexplicit Advertised again.

Page 13: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 13]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

RDMA Completion (Completion, Completed, Complete, Completes) - ForRDMA, Completion is defined as the process of informing the ULPthat a particular RDMA Operation has performed all functionsspecified for the RDMA Operations, including Placement andDelivery. The Completion semantic of each RDMA Operation isdistinctly defined.

RDMA Message - A data transfer mechanism used to fulfill an RDMAOperation.

RDMA Operation - A sequence of RDMA Messages, including controlMessages, to transfer data from a Data Source to a Data Sink.The following RDMA Operations are defined - RDMA Writes, RDMARead, Send, Send with Invalidate, Send with Solicited Event,Send with Solicited Event and Invalidate, and Terminate.

RDMA Protocol (RDMAP) - A wire protocol that supports RDMAOperations to transfer ULP data between a Local Peer and theRemote Peer.

RDMAP Abortive Termination (Termination, Terminated, Terminate,Terminates) - The act of closing an RDMAP Stream withoutattempting to Complete in-progress and pending RDMA Operations.

RDMAP Graceful Termination - The act of closing an RDMAP Stream suchthat all in-progress and pending RDMA Operations are allowed toComplete successfully.

RDMA Read - An RDMA Operation used by the Data Sink to transfer thecontents of a source RDMA buffer from the Remote Peer to theLocal Peer. An RDMA Read operation consists of a single RDMARead Request Message and a single RDMA Read Response Message.

RDMA Read Request - An RDMA Message used by the Data Sink to requestthe Data Source to transfer the contents of an RDMA buffer. TheRDMA Read Request Message describes both the Data Source andData Sink RDMA buffers.

RDMA Read Request Queue - The queue used for processing RDMA ReadRequests. The RDMA Read Request Queue has a DDP Queue Number of1.

RDMA Read Response - An RDMA Message used by the Data Source totransfer the contents of an RDMA buffer to the Data Sink, inresponse to an RDMA Read Request. The RDMA Read Response Messageonly describes the data sink RDMA buffer.

RDMAP Stream - An association between a pair of RDMAPimplementations, possibly on different Nodes, which transfer ULPdata using RDMA Operations. There may be multiple RDMAP Streams

Page 14: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 14]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

on a single Node. An RDMAP Stream maps directly to a single DDPStream.

RDMA Write - An RDMA Operation that transfers the contents of asource RDMA Buffer from the Local Peer to a destination RDMABuffer at the Remote Peer using RDMA. The RDMA Write Messageonly describes the Data Sink RDMA buffer.

Remote Direct Memory Access (RDMA) - A method of accessing memory ona remote system in which the local system specifies the remotelocation of the data to be transferred. Employing a RNIC in theremote system allows the access to take place withoutinterrupting the processing of the CPU(s) on the system.

Send - An RDMA Operation that transfers the contents of a ULP Bufferfrom the Local Peer to an Untagged Buffer at the Remote Peer.

Send Message Type - A Send Message, Send with Invalidate Message,Send with Solicited Event Message, or Send with Solicited Eventand Invalidate Message.

Send Operation Type - A Send Operation, Send with InvalidateOperation, Send with Solicited Event Operation, or Send withSolicited Event and Invalidate Operation.

Solicited Event (SE) - A facility by which an RDMA Operation sendermay cause an Event to be generated at the recipient, if therecipient is configured to generate such an Event, when a Sendwith Solicited Event or Send with Solicited Event and InvalidateMessage is received. Note: The Local Peer's ULP can use theSolicited Event mechanism to ensure that Messages designated asimportant to the ULP are handled in an expeditious manner by theRemote Peer's ULP. The ULP at the Local Peer can indicate a givenSend Message Type is important by using the Send with SolicitedEvent Message or Send with Solicited Event and InvalidateMessage. The ULP at the Remote Peer can choose to only benotified when valid Send with Solicited Event Messages and/orSend with Solicited Event and Invalidate Messages arrive andhandle other valid incoming Send Messages or Send with InvalidateMessages at its leisure.

Terminate - An RDMA Message used by a Node to pass an errorindication to the peer Node on an RDMAP Stream. This operationis for RDMAP use only.

ULP Buffer - A buffer owned above the RDMAP Layer and advertised tothe RDMAP Layer either as a Tagged Buffer or an Untagged ULPBuffer.

Page 15: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 15]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

ULP Message - The ULP data that is handed to a specific protocollayer for transmission. Data boundaries are preserved as theyare transmitted through iWARP.

Page 16: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 16]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

5 ULP and Transport Attributes

5.1 Transport Requirements & Assumptions

RDMAP MUST be layered on top of the Direct Data Placement Protocol[DDP].

RDMAP requires the following DDP support:

* RDMAP uses three queues for Untagged Buffers:

* Queue Number 0 (used by RDMAP for Send, Send withInvalidate, Send with Solicited Event, and Send withSolicited Event and Invalidate operations).

* Queue Number 1 (used by RDMAP for RDMA Read operations).

* Queue Number 2 (used by RDMAP for Terminate operations).

* DDP maps a single RDMA Message to a single DDP Message.

* DDP uses the STag and Tagged Offset provided by the RDMAP forTagged Buffer Messages (i.e. RDMA Write and RDMA Read Response).

* When the DDP layer Delivers an Untagged DDP Message to the RDMAPlayer, DDP provides the length of the DDP Message. This ensuresthat RDMAP does not have to carry a length field in its header.

* When the RDMAP layer provides an RDMA Message to the DDP Layer,DDP must insert the RsvdULP field value provided by the RDMAPLayer into the associated DDP Message.

* When the DDP layer Delivers a DDP Message to the RDMAP layer, DDPprovides the RsvdULP field.

* The RsvdULP field must be 1 octet for DDP Tagged Messages and 5octets for DDP Untagged Messages.

* DDP propagates to RDMAP all operation or protection errors (usedby RDMAP Terminate) and, when appropriate, the DDP Header fieldsof the DDP Segment that encountered the error.

* If an RDMA Operation is aborted by DDP or a lower layer, thecontents of the Data Sink buffers associated with the operationare considered indeterminate.

* DDP in conjunction with the lower layers provide reliable, in-order Delivery.

Page 17: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 17]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

5.2 RDMAP Interactions with the ULP

RDMAP provides the ULP with access to the following RDMA Operationsas defined in this specification:

* Send

* Send with Solicited Event

* Send with Invalidate

* Send with Solicited Event and Invalidate

* RDMA Write

* RDMA Read

For Send Operation Types, the following are the interactions betweenthe RDMAP Layer and the ULP:

* At the Data Source:

* The ULP passes to the RDMAP Layer the following:

* ULP Message Length

* ULP Message

* An indication of the Send Operation Type, where thevalid types are: Send, Send with Solicited Event, Sendwith Invalidate, or Send with Solicited Event andInvalidate.

* An Invalidate STag, if the Send Operation Type was Sendwith Invalidate or Send with Solicited Event andInvalidate.

* When the Send Operation Type Completes, an indication of theCompletion results.

* At the Data Sink:

* If the Send Operation Type Completed successfully, the RDMAPLayer passes the following information to the ULP Layer:

* ULP Message Length

* ULP Message

Page 18: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 18]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

* An Event, if the Data Sink is configured to generate anEvent.

* An Invalidated STag, if the Send Operation Type was Sendwith Invalidate or Send with Solicited Event andInvalidate.

* If the Send Operation Type Completed in error, the Data SinkRDMAP Layer will pass up the corresponding error informationto the Data Sink ULP and send a Terminate Message to theData Source RDMAP Layer. The Data Source RDMAP Layer willthen pass up the Terminate Message to the ULP.

For RDMA Write Operations, the following are the interactionsbetween the RDMAP Layer and the ULP:

* At the Data Source:

* The ULP passes to the RDMAP Layer the following:

* ULP Message Length

* ULP Message

* Data Sink STag

* Data Sink Tagged Offset

* When the RDMA Write Operation Completes, an indication ofthe Completion results.

* At the Data Sink:

* If the RDMA Write completed successfully, the RDMAP Layerdoes not Deliver the RDMA Write to the ULP. It does Placethe ULP Message transferred through the RDMA Write Messageinto the ULP Buffer.

* If the RDMA Write completed in error, the Data Sink RDMAPLayer will pass up the corresponding error information tothe Data Sink ULP and send a Terminate Message to the DataSource RDMAP Layer. The Data Source RDMAP Layer will thenpass up the Terminate Message to the ULP.

For RDMA Read Operations, the following are the interactions betweenthe RDMAP Layer and the ULP:

* At the Data Sink:

* The ULP passes to the RDMAP Layer the following:

Page 19: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 19]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

* ULP Message Length

* Data Source STag

* Data Sink STag

* Data Source Tagged Offset

* Data Sink Tagged Offset

* When the RDMA Read Operation Completes, an indication of theCompletion results.

* At the Data Source:

* If no error occurred while processing the RDMA Read Request,the Data Source will not pass up any information to the ULP.

* If an error occurred while processing the RDMA Read Request,the Data Source RDMAP Layer will pass up the correspondingerror information to the Data Source ULP and send aTerminate Message to the Data Sink RDMAP Layer. The DataSink RDMAP Layer will then pass up the Terminate Message tothe ULP.

For STags made available to the RDMAP Layer, following are theinteractions between the RDMAP Layer and the ULP:

* If the ULP enables an STag, the ULP passes to the RDMAP Layerthe:

* STag;

* range of Tagged Offsets that are associated with a givenSTag;

* remote access rights (read, write, or read and write)associated with a given, valid STag; and

* association between a given STag and a given RDMAP Stream.

* If the ULP disables an STag, the ULP passes to the RDMAP Layerthe STag.

If an error occurs at the RDMAP Layer, the RDMAP Layer may pass backerror information (e.g. the content of a Terminate Message) to theULP.

Page 20: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 20]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

6 Header Format

The control information of RDMA Messages is included in DDP protocoldefined header fields, with the following exceptions:

* The first octet reserved for ULP usage on all DDP Messages in theDDP Protocol (i.e. the RsvdULP Field) is used by RDMAP to carrythe RDMA Message Opcode and the RDMAP version. This octet isknown as the RDMAP Control Field in this specification. For Sendwith Invalidate and Send with Solicited Event and Invalidate,RDMAP uses the second through fifth octets provided by DDP onUntagged DDP Messages to carry the STag that will be Invalidated.

* The RDMA Message length is passed by the RDMAP layer to the DDPlayer on all outbound transfers.

* For RDMA Read Request Messages, the RDMA Read Message Size isincluded in the RDMA Read Request Header.

* The RDMA Message length is passed to the RDMAP Layer by the DDPlayer on inbound Untagged Buffer transfers.

* Two RDMA Messages carry additional RDMAP headers. The RDMA ReadRequest carries the Data Sink and Data Source bufferdescriptions, including buffer length. The Terminate carriesadditional information associated with the error that caused theTerminate.

6.1 RDMAP Control and Invalidate STag Field

The version of RDMAP defined by this specification uses all 8 bitsof the RDMAP Control Field. The first octet reserved for ULP use inthe DDP Protocol MUST be used by the RDMAP to carry the RDMAPControl Field. The ordering of the bits in the first octet MUST beas defined in Figure 3 DDP Control, RDMAP Control, and InvalidateSTag Field. For Send with Invalidate and Send with Solicited Eventand Invalidate, the second through fifth octets of the DDP RsvdULPfield MUST be used by RDMAP to carry the Invalidate STag. Figure 3DDP Control, RDMAP Control, and Invalidate STag Field depicts theformat of the DDP Control and RDMAP Control fields. (Note: In Figure3 DDP Control, RDMAP Control, and Invalidate STag Field, the DDPHeader is offset by 16 bits to accommodate the MPA header defined in[MPA]. The MPA header is only present if DDP is layered on top ofMPA.)

Page 21: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 21]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

0 1 2 30 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+|T|L| Resrv | DV| RV|Rsv| Opcode|

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Invalidate STag |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 3 DDP Control, RDMAP Control, and Invalidate STag Fields

All RDMA Messages handed by the RDMAP Layer to the DDP layer MUSTdefine the value of the Tagged flag in the DDP Header. Figure 4 RDMAUsage of DDP Fields MUST be used to define the value of the Taggedflag that is handed to the DDP Layer for each RDMA Message.

Figure 4 RDMA Usage of DDP Fields defines the value of the RDMAOpcode field that MUST be used for each RDMA Message.

Figure 4 RDMA Usage of DDP Fields defines when the STag, QueueNumber, and Tagged Offset fields MUST be provided for each RDMAMessage.

For this version of the RDMAP, all RDMA Messages MUST have:

* Bits 24-25; RDMA Version field: 00b .

* Bits 26-27; Reserved. MUST be set to zero by sender, ignored bythe receiver.

* Bits 28-31; OpCode field: see Figure 4 RDMA Usage of DDP Fields.

* Bits 32-63; Invalidate STag. However, this field is only validfor Send with Invalidate and Send with Solicited Event andInvalidate Messages (see Figure 4 RDMA Usage of DDP Fields). ForSend, Send with Solicited Event, RDMA Read Request, andTerminate, the Invalidate STag field MUST be set to zero ontransmit and ignored by the receiver.

Page 22: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 22]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

-------+-----------+-------+------+-------+-----------+--------------RDMA | Message | Tagged| STag | Queue | Invalidate| MessageMessage| Type | Flag | and | Number| STag | LengthOpCode | | | TO | | | Communicated

| | | | | | between DDP| | | | | | and RDMAP

-------+-----------+-------+------+-------+-----------+--------------0000b | RDMA Write| 1 | Valid| N/A | N/A | Yes

| | | | | |-------+-----------+-------+------+-------+-----------+--------------0001b | RDMA Read | 0 | N/A | 1 | N/A | Yes

| Request | | | | |-------+-----------+-------+------+-------+-----------+--------------0010b | RDMA Read | 1 | Valid| N/A | N/A | Yes

| Response | | | | |-------+-----------+-------+------+-------+-----------+--------------0011b | Send | 0 | N/A | 0 | N/A | Yes

| | | | | |-------+-----------+-------+------+-------+-----------+--------------0100b | Send with | 0 | N/A | 0 | Valid | Yes

| Invalidate| | | | |-------+-----------+-------+------+-------+-----------+--------------0101b | Send with | 0 | N/A | 0 | N/A | Yes

| SE | | | | |-------+-----------+-------+------+-------+-----------+--------------0110b | Send with | 0 | N/A | 0 | Valid | Yes

| SE and | | | | || Invalidate| | | | |

-------+-----------+-------+------+-------+-----------+--------------0111b | Terminate | 0 | N/A | 2 | N/A | Yes

| | | | | |-------+-----------+-------+------+-------+-----------+--------------1000b | |to | Reserved | Not Specified1111b | |-------+-----------+-------------------------------------------------

Figure 4 RDMA Usage of DDP Fields

Note: N/A means Not Applicable.

6.2 RDMA Message Definitions

The following figure defines which RDMA Headers MUST be used on eachRDMA Message and which RDMA Messages are allowed to carry ULPpayload:

Page 23: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 23]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

-------+-----------+-------------------+-------------------------RDMA | Message | RDMA Header Used | ULP Message allowed inMessage| Type | | the RDMA MessageOpCode | | |

| | |-------+-----------+-------------------+-------------------------0000b | RDMA Write| None | Yes

| | |-------+-----------+-------------------+-------------------------0001b | RDMA Read | RDMA Read Request | No

| Request | Header |-------+-----------+-------------------+-------------------------0010b | RDMA Read | None | Yes

| Response | |-------+-----------+-------------------+-------------------------0011b | Send | None | Yes

| | |-------+-----------+-------------------+-------------------------0100b | Send with | None | Yes

| Invalidate| |-------+-----------+-------------------+-------------------------0101b | Send with | None | Yes

| SE | |-------+-----------+-------------------+-------------------------0110b | Send with | None | Yes

| SE and | || Invalidate| |

-------+-----------+-------------------+-------------------------0111b | Terminate | Terminate Header | No

| | |-------+-----------+-------------------+-------------------------1000b | |to | Reserved | Not Specified1111b | |-------+-----------+-------------------+-------------------------

Figure 5 RDMA Message Definitions

6.3 RDMA Write Header

The RDMA Write Message does not include an RDMAP header. The RDMAPlayer passes to the DDP layer an RDMAP Control Field. The RDMA WriteMessage is fully described by the DDP Headers of the DDP Segmentsassociated with the Message.

See section 12 Appendix for a description of the DDP Segment formatassociated with RDMA Write Messages.

Page 24: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 24]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

6.4 RDMA Read Request Header

The RDMA Read Request Message carries an RDMA Read Request Headerthat describes the Data Sink and Data Source Buffers used by theRDMA Read operation. The RDMA Read Request Header immediatelyfollows the DDP header. The RDMAP layer passes to the DDP layer anRDMAP Control Field. The following figure depicts the RDMA ReadRequest Header that MUST be used for all RDMA Read Request Messages:

0 1 2 30 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Data Sink STag (SinkSTag) |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| |+ Data Sink Tagged Offset (SinkTO) +| |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| RDMA Read Message Size (RDMARDSZ) |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Data Source STag (SrcSTag) |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| |+ Data Source Tagged Offset (SrcTO) +| |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 6 RDMA Read Request Header Format

Data Sink Steering Tag: 32 bits.

The Data Sink Steering Tag identifies the Data Sink's TaggedBuffer. This field MUST be copied, without interpretation, fromthe RDMA Read Request into the corresponding RDMA Read Responseand allows the Data Sink to place the returning data. The STagis associated with the RDMAP Stream through a mechanism that isoutside the scope of the RDMAP specification (see Section 10.3Other Security Considerations).

Data Sink Tagged Offset: 64 bits.

The Data Sink Tagged Offset specifies the starting offset, inoctets, from the base of the Data Sink's Tagged Buffer, wherethe data is to be written by the Data Source. This field iscopied from the RDMA Read Request into the corresponding RDMARead Response and allows the Data Sink to place the returningdata. The Data Sink Tagged Offset MAY start at an arbitraryoffset.

The Data Sink STag and Data Sink Tagged Offset fields describethe buffer to which the RDMA Read data is written.

Page 25: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 25]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

Note: the DDP Layer protects against a wrap of the Data SinkTagged Offset.

RDMA Read Message Size: 32 bits.

The RDMA Read Message Size is the amount of data, in octets,read from the Data Source. A single RDMA Read Request Messagecan retrieve from 0 to 2^32-1 data octets from the Data Source.

Data Source Steering Tag: 32 bits.

The Data Source Steering Tag identifies the Data Source'sTagged Buffer. The STag is associated with the RDMAP Streamthrough a mechanism that is outside the scope of the RDMAPspecification (see Section 10.3 Other Security Considerations).

Data Source Tagged Offset: 64 bits.

The Tagged Offset specifies the starting offset, in octets,that is to be read from the Data Source's Tagged Buffer. TheData Source Tagged Offset MAY start at an arbitrary offset.

The Data Source STag and Data Source Tagged Offset fieldsdescribe the buffer from which the RDMA Read data is read.

See Section 9.2 Errors Detected at the Remote Peer on Incoming RDMAMessages for a description of error checking required uponprocessing of an RDMA Read Request at the Data Source.

6.5 RDMA Read Response Header

The RDMA Read Response Message does not include an RDMAP header. TheRDMAP layer passes to the DDP layer an RDMAP Control Field. The RDMARead Response Message is fully described by the DDP Headers of theDDP Segments associated with the Message.

See Section 12 Appendix for a description of the DDP Segment formatassociated with RDMA Read Response Messages.

6.6 Send Header and Send with Solicited Event Header

The Send and Send with Solicited Event Message do not include anRDMAP header. The RDMAP layer passes to the DDP layer an RDMAPControl Field. The Send and Send with Solicited Event Message arefully described by the DDP Headers of the DDP Segments associatedwith the Message.

See Section 12 Appendix for a description of the DDP Segment formatassociated with Send and Send with Solicited Event Messages.

Page 26: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 26]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

6.7 Send with Invalidate Header and Send with SE and Invalidate Header

The Send with Invalidate and Send with Solicited Event andInvalidate Message do not include an RDMAP header. The RDMAP layerpasses to the DDP layer an RDMAP Control Field and the InvalidateSTag field (see section 6.1 RDMAP Control and Invalidate STagField). The Send with Invalidate and Send with Solicited Event andInvalidate Message are fully described by the DDP Headers of the DDPSegments associated with the Message.

See Section 12 Appendix for a description of the DDP Segment formatassociated with Send and Send with Solicited Event Messages.

6.8 Terminate Header

The Terminate Message carries a Terminate Header that containsadditional information associated with the cause of the Terminate.The Terminate Header immediately follows the DDP header. The RDMAPlayer passes to the DDP layer an RDMAP Control Field. The followingfigure depicts a Terminate Header that MUST be used for theTerminate Message:

0 1 2 30 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Terminate Control | Reserved |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| DDP Segment Length (if any) | |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| |// //| Terminated DDP Header (if any) |+ +| |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| |// //| Terminated RDMA Header (if any) |+ +| |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 7 Terminate Header Format

Terminate Control: 19 bits.

The Terminate Control field MUST have the format defined inFigure 8 Terminate Control Field.

Page 27: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 27]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

0 1 2 30 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Layer | EType | Error Code |HdrCt|+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 8 Terminate Control Field

* Figure 9 Terminate Control Field Values defines the validvalues that MUST be used for this field.

* Layer: 4 bits.

Identifies the layer that encountered the error.

* EType (RDMA Error Type): 4 bits.

Identifies the type of error that caused the Terminate.When the error is detected at the RDMAP Layer, the RDMAPLayer inserts the Error Type into this field. When theerror is detected at a LLP layer, a LLP layer createsthe Error Type and the DDP layer passes it up to theRDMAP Layer, and the RDMAP Layer inserts it into thisfield.

* Error Code: 8 bits.

This field identifies the specific error that caused theTerminate. When the error is detected at the RDMAPLayer, the RDMAP Layer creates the Error Code. When theerror is detected at a LLP layer, a LLP layer createsthe Error Code and the DDP layer passes it up to theRDMAP Layer, and the RDMAP Layer inserts it into thisfield.

* HdrCt: 3 bits.

Header control bits:

* M: 1 bit. DDP Segment Length valid. See Figure 10for when this bit SHOULD be set.

* D: 1 bit. DDP Header Included. See Figure 10 forwhen this bit SHOULD be set.

* R: 1 bit. RDMAP Header Included. See Figure 10 forwhen this bit SHOULD be set.

Page 28: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 28]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

-------+----------+-------+-------------+------+--------------------Layer | Layer | Error | Error Type | Error| Error Code Name

| Name | Type | Name | Code |-------+----------+-------+-------------+------+--------------------

| | 0000b | Local | None | None| | | Catastrophic| || | | Error | || +-------+-------------+------+--------------------| | | | 00X | Invalid STag| | | +------+--------------------| | | | 01X | Base or bounds| | | | | violation| | | Remote +------+--------------------| | 0001b | Protection | 02X | Access rights| | | Error | | violation| | | +------+--------------------

0000b | RDMA | | | 03X | STag not associated| | | | | with RDMAP Stream| | | +------+--------------------| | | | 04X | TO wrap| +-------+-------------+------+--------------------| | | | 05X | Invalid RDMAP| | | | | version| | | +------+--------------------| | | | 06X | Unexpected OpCode| | | Remote +------+--------------------| | 0010b | Operation | 07X | Catastrophic error,| | | Error | | localized to RDMAP| | | | | Stream| | | +------+--------------------| | | | 08X | Catastrophic error,| | | | | global

-------+----------+-------+-------------+------+--------------------0001b | DDP | See DDP Specification [DDP] for a description of

| | the values and names.-------+----------+-------+-----------------------------------------0010b | MPA | See MPA Specification [MPA] for a description of

| | the values and names.-------+----------+-------+-----------------------------------------

Figure 9 Terminate Control Field Values

Reserved: 8 bits. This field MUST be set to zero on transmit,ignored on receive.

DDP Segment Length: 16 bits

The length handed up by the DDP Layer when the error wasdetected. It MUST be valid if the M bit is set. It MUST bepresent when the D bit is set.

Page 29: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 29]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

Terminated DDP Header: 112 bits for Tagged Messages and 144 bits forUntagged Messages.

The DDP Header of the incoming Message that is associated withthe Terminate. The DDP Header is not present if the TerminateError Type is a Local Catastrophic Error. It MUST be present ifthe D bit is set.

Terminated RDMA Header: 224 bits.

The Terminated RDMA Header is only sent back if the terminateis associated with an RDMA Read Request Message. It MUST bepresent if the R bit is set.

If the terminate occurs before the first RDMA Read Request byteis processed, the original RDMA Read Request Header is sentback.

If the terminate occurs after the first RDMA Read Request byteis processed, the RDMA Read Request Header is updated toreflect the current location of the RDMA Read operation that isin process:

* Data Sink STag = Data Sink STag originally sent in theRDMA Read Request.

* Data Sink Tagged Offset = Current offset into the DataSink Tagged Buffer. For example if the RDMA Read Requestwas terminated after 2048 octets were sent, then theData Sink Tagged Offset = the original Data Sink TaggedOffset + 2048.

* Data Message size = Number of bytes left to transfer.

* Data Source STag = Data Source STag in the RDMA ReadRequest.

* Data Source Tagged Offset = Current offset into the DataSource Tagged Buffer. For example if the RDMA ReadRequest was terminated after 2048 octets were sent, thenthe Data Source Tagged Offset = the original Data SourceTagged Offset + 2048.

Figure 10 Error Type to RDMA Message Mapping maps layer name anderror types to each RDMA Message type:

Page 30: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 30]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

---------+-------------+------------+------------+-----------------Layer | Error Type | Terminate | Terminate | What type ofName | Name | Includes | Includes | RDMA Message can

| | DDP Header | RDMA Header| cause the error| | and DDP | || | Segment | || | Length | |

---------+-------------+------------+------------+-----------------| Local | No | No | Any| Catastrophic| | || Error | | |+-------------+------------+------------+-----------------| Remote | Yes, if | Yes | Only RDMA Read

RDMA | Protection | possible | | Request, Send| Error | | | with Invalidate,| | | | and Send with SE| | | | and Invalidate+-------------+------------+------------+-----------------| Remote | Yes, if | No | Any| Operation | possible | || Error | | |

---------+-------------+------------+------------+-----------------DDP | See DDP Spec| Yes | No | Any

| [DDP] | | |---------+-------------+------------+------------+-----------------MPA | See MPA Spec| No | No | Any

| [MPA] | | |Figure 10 Error Type to RDMA Message Mapping

Page 31: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 31]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

7 Data Transfer

7.1 RDMA Write Message

An RDMA Write is used by the Data Source to transfer data to apreviously Advertised Tagged Buffer at the Data Sink. The RDMA WriteMessage has the following semantics:

* AN RDMA Write Message MUST reference a Tagged Buffer. That is,the Data Source RDMAP Layer MUST request that the DDP layer markthe Message as Tagged.

* A valid RDMA Write Message MUST NOT be delivered to the DataSink's ULP (i.e. it is placed by the DDP layer).

* At the Remote Peer, when an invalid RDMA Write Message isdelivered to the Remote Peer's RDMAP Layer, an error is surfaced(see section 9.1 RDMAP Error Surfacing).

* The Tagged Offset of a Tagged Buffer MAY start at a non-zerovalue.

* AN RDMA Write Message MAY target all or part of a previouslyAdvertised buffer.

* The RDMAP does not define how the buffer(s) used by an outboundRDMA Write is defined and how it is addressed. For example, animplementation of RDMA may choose to allow a gather-list of non-contiguous data blocks to be the source of an RDMA Write. In thiscase, the data blocks would be combined by the Data Source andsent as a single RDMA Write Message to the Data Sink.

* The Data Source RDMAP Layer MUST issue RDMA Write Messages to theDDP layer in the order they were submitted by the ULP.

* At the Data Source, a subsequent Send (Send with Invalidate, Sendwith Solicited Event, or Send with Solicited Event andInvalidate) Message MAY be used to signal Delivery of previousRDMA Write Messages to the Data Sink, if desired by the ULP.

* If the Local Peer wishes to write to multiple Tagged Buffers onthe Remote Peer, the Local Peer MUST use multiple RDMA WriteMessages. That is, a single RDMA Write Message can only write toone remote Tagged Buffer.

* The Data Source MAY issue a zero length RDMA Write Message.

Page 32: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 32]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

7.2 RDMA Read Operation

The RDMA Read operation MUST consist of a single RDMA Read RequestMessage and a single RDMA Read Response Message.

7.2.1 RDMA Read Request Message

An RDMA Read Request is used by the Data Sink to transfer data froma previously Advertised Tagged Buffer at the Data Source to a TaggedBuffer at the Data Sink. The RDMA Read Request Message has thefollowing semantics:

* AN RDMA Read Request Message MUST reference an Untagged Buffer.That is, the Local Peer's RDMAP Layer MUST request that the DDPmark the Message as Untagged.

* One RDMA Read Request Message MUST consume one Untagged Buffer.

* The Remote Peer's RDMAP Layer MUST process an RDMA Read RequestMessage. A valid RDMA Read Request Message MUST NOT be deliveredto the Data Sink's ULP (i.e. it is processed by the RDMAP layer).

* At the Remote Peer, when an invalid RDMA Read Request Message isdelivered to the Remote Peer's RDMAP Layer, an error is surfaced(see section 9.1 RDMAP Error Surfacing).

* AN RDMA Read Request Message MUST reference the RDMA Read RequestQueue. That is, the Local Peer's RDMAP Layer MUST request thatthe DDP layer set the Queue Number field to one.

* The Local Peer MUST pass to the DDP Layer RDMA Read RequestMessages in the order they were submitted by the ULP.

* The Remote Peer MUST process the RDMA Read Request Messages inthe order they were sent.

* If the Local Peer wishes to read from multiple Tagged Buffers onthe Remote Peer, the Local Peer MUST use multiple RDMA ReadRequest Messages. That is, a single RDMA Read Request MessageMUST only read from one remote Tagged Buffer.

* AN RDMA Read Request Message MAY target all or part of apreviously Advertised buffer.

* If the Data Source receives a valid RDMA Read Request Message itMUST respond with a valid RDMA Read Response Message.

* The Data Sink MAY issue a zero length RDMA Read Request Message,by setting the RDMA Read Message Size field to zero in the RDMARead Request Header.

Page 33: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 33]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

* If the Data Source receives a non-zero length RDMA Read MessageSize, the Data Source RDMAP MUST validate the Data Source STagand Data Source Tagged Offset contained in the RDMA Read RequestHeader.

* If the Data Source receives an RDMA Read Request Header with theRDMA Read Message Size set to zero, the Data Source RDMAP:

* MUST NOT validate the Data Source STag and Data SourceTagged Offset contained in the RDMA Read Request Header, and

* MUST respond with a zero length RDMA Read Response Message.

7.2.2 RDMA Read Response Message

The RDMA Read Response Message uses the DDP Tagged Buffer Model toDeliver the contents of a previously requested Data Source TaggedBuffer to the Data Sink, without any involvement from the ULP at theRemote Peer. The RDMA Read Response Message has the followingsemantics:

* The RDMA Read Response Message for the associated RDMA ReadRequest Message travels in the opposite direction.

* An RDMA Read Response Message MUST reference a Tagged Buffer.That is, the Data Source RDMAP Layer MUST request that the DDPmark the Message as Tagged.

* The Data Source MUST ensure that a sufficient number of UntaggedBuffers are available on the RDMA Read Request Queue (Queue withDDP Queue Number 1) to support the maximum number of RDMA ReadRequests negotiated by the ULP.

* The RDMAP Layer MUST Deliver the RDMA Read Response Message tothe ULP.

* At the Remote Peer, when an invalid RDMA Read Response Message isdelivered to the Remote Peer's RDMAP Layer, an error is surfaced(see section 9.1 RDMAP Error Surfacing).

* The Tagged Offset of a Tagged Buffer MAY start at a non-zerovalue.

* The Data Source RDMAP Layer MUST pass RDMA Read Response Messagesto the DDP layer in the order that the RDMA Read Request Messageswere received by the RDMAP Layer at the Data Source.

* The Data Sink MAY validate that the STag, Tagged Offset, andlength of the RDMA Read Response Message are the same as the

Page 34: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 34]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

STag, Tagged Offset, and length included in the correspondingRDMA Read Request Message.

* A single RDMA Read Response Message MUST write to one remoteTagged Buffer. If the Data Sink wishes to Read multiple TaggedBuffers, the Data Sink can use multiple RDMA Read RequestMessages.

7.3 Send Message Type

The Send Message Type uses the DDP Untagged Buffer Model to transferdata from the Data Source into an Untagged Buffer at the Data Sink.

* A Send Message Type MUST reference an Untagged Buffer. That is,the Local Peer's RDMAP Layer MUST request that the DDP layer markthe Message as Untagged.

* One Send Message Type MUST consume one Untagged Buffer.

* The ULP Message sent using a Send Message Type MAY be lessthan or equal to the size of the consumed Untagged Buffer.The RDMAP Layer communicates to the ULP the size of the datawritten into the Untagged Buffer.

* If the ULP Message sent via Send Message Type is larger thanthe Data Sink's Untagged Buffer, it is an error (see section9.1 RDMAP Error Surfacing).

* At the Remote Peer, the Send Message Type MUST be Delivered tothe Remote Peer's ULP in the order they were sent.

* After the Send with Solicited Event or Send with Solicited Eventand Invalidate Message is Delivered to the ULP, the RDMAP MAYgenerate an Event, if the Data Sink is configured to generatesuch an Event.

* At the Remote Peer, when an invalid Send Message Type isDelivered to the Remote Peer's RDMAP Layer, an error is surfaced(see section 9.1 RDMAP Error Surfacing).

* The RDMAP does not define how the buffer(s) used by an outboundSend Message Type is defined and how it is addressed. Forexample, an implementation of RDMA may choose to allow a gather-list of non-contiguous data blocks to be the source of a SendMessage Type. In this case, the data blocks would be combined bythe Data Source and sent as a single Send Message Type to theData Sink.

* For a Send Message Type, the Local Peer's RDMAP Layer MUSTrequest that the DDP layer set the Queue Number field to zero.

Page 35: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 35]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

* The Local Peer MUST issue Send Message Type Messages in the orderthey were submitted by the ULP.

* The Data Source MAY pass a zero length Send Message Type. A zerolength Send Message Type MUST consume an Untagged Buffer at theData Sink.A Send with Invalidate or Send with Solicited Event andInvalidate Message MUST reference an STag. That is, the LocalPeer's RDMAP Layer MUST pass the RDMA control field and the STagthat will be Invalidated to the DDP layer.

* When the Send with Invalidate and Send with Solicited Event andInvalidate Message are Delivered to the Remote Peer's RDMAPLayer, the RDMAP Layer MUST:

* Verify the STag that is associated with the RDMAP Stream;and

* Invalidate the STag if it is associated with the RDMAPStream; or Issue a Terminate Message if the STag is notassociated with the RDMAP Stream.

7.4 Terminate Message

The Terminate Message uses the DDP Untagged Buffer Model to transfererror related information from the Data Source into an UntaggedBuffer at the Data Sink and then ceases all further communicationson the underlying DDP Stream. The Terminate Message has thefollowing semantics:

* A Terminate Message MUST reference an Untagged Buffer. That is,the Local Peer's RDMAP Layer MUST request that the DDP layer markthe Message as Untagged.

* A Terminate Message references the Terminate Queue. That is, theLocal Peer's RDMAP Layer MUST request that the DDP layer set theQueue Number field to two.

* One Terminate Message MUST consume one Untagged Buffer.

* On a single RDMAP Stream, the RDMAP layer MUST guaranteeplacement of a single Terminate Message.

* A Terminate Message MUST be Delivered to the Remote Peer's RDMAPLayer. The RDMAP Layer MUST Deliver the Terminate Message to theULP.

* At the Remote Peer, when an invalid Terminate Message isdelivered to the Remote Peer's RDMAP Layer, an error is surfaced(see section 9.1 RDMAP Error Surfacing).

Page 36: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 36]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

* The RDMAP Layer Completes in error all ULP Operations that havenot been provided to the DDP layer.

* After sending a Terminate Message on an RDMAP Stream, the LocalPeer MUST NOT send any more Messages on that specific RDMAPStream.

* After receiving a Terminate Message on an RDMAP Stream, theRemote Peer MAY stop sending Messages on that specific RDMAPStream.

7.5 Ordering and Completions

It is important to understand the difference between Placement andDelivery ordering since RDMAP provides quite different semantics forthe two.

Note that many current protocols, both as used in the Internet andelsewhere, assume that data is both Placed and Delivered in order.This allowed applications to take a variety of shortcuts by takingadvantage of this fact. For RDMAP, many of these shortcuts are nolonger safe to use, and could cause application failure.

The following rules apply to implementations of the RDMAP protocol.Note, in these rules Send includes Send, Send with Invalidate, Sendwith Solicited Event, and Send with Solicited Event and Invalidate:

1. RDMAP does not provide ordering among Messages on differentRDMAP Streams.

2. RDMAP does not provide ordering between operations that aregenerated from the two ends of an RDMAP Stream.

3. RDMA Messages that use Tagged and Untagged Buffers MAY be Placedin any order. If an application uses overlapping buffers(points different Messages or portions of a single Message atthe same buffer), then it is possible that the last incomingwrite to the Data Sink buffer will not be the last outgoing datasent from the Data Source.

4. For a Send operation, the contents of an Untagged Buffer at theData Sink MAY be indeterminate until the Send is Delivered tothe ULP at the Data Sink.

5. For an RDMA Write operation, the contents of the Tagged Bufferat the Data Sink MAY be indeterminate until a subsequent Send isDelivered to the ULP at the Data Sink.

Page 37: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 37]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

6. For an RDMA Read operation, the contents of the Tagged Buffer atthe Data Sink MAY be indeterminate until the RDMA Read ResponseMessage has been Delivered at the Local Peer.

Statements 4, 5, and 6 imply "no peeking" at the data to see ifit is done. It is possible for some data to arrive beforelogically earlier data does, and peeking may causeunpredictable application failure

7. If the ULP or Application modifies the contents of Tagged orUntagged Buffers being modified by an RDMA Operation while theRDMAP is processing the RDMA Operation, the state of the Buffersis indeterminate.

8. If the ULP or Application modifies the contents of Tagged orUntagged Buffers read by an RDMA Operation while the RDMAP isprocessing the RDMA Operation, the results of the read areindeterminate.

9. The Completion of an RDMA Write or Send Operation at the LocalPeer does not guarantee that the ULP Message has yet reached theRemote Peer ULP Buffer or been examined by the Remote ULP.

10. Send Messages MUST be Delivered to the ULP at the Remote Peerafter they are Delivered to RDMAP by DDP and in the order thatthe they were Delivered to RDMAP.

Note that DDP ordering rules ensure that this will be the sameorder that they were submitted at the Local Peer and that anyprior RDMA Writes have been submitted for ordered Placement atthe Remote Peer. This means that when the ULP sees the Deliveryof the Send, the memory buffers targeted by any preceding RDMAWrites and Sends are available to be accessed locally orremotely as authorized. If the ULP overlaps its buffers fordifferent operations, the data from the RDMA Write or Send maybe overwritten by subsequent RDMA Operations before the ULPreceives and processes the Delivery.

11. RDMA Read Response Messages MUST be Delivered to the ULP at theRemote Peer after they are Delivered to RDMAP by DDP and in theorder that the they were Delivered to RDMAP.

DDP ordering rules ensure that this will be the same order thatthey were submitted at the Local Peer. This means that when theULP sees the Delivery of the RDMA Read Response, the memorybuffers targeted by the RDMA Read Response are available to beaccessed locally or remotely as authorized. If the ULP overlapsits buffers for different operations, the data from the RDMARead Response may be overwritten by subsequent RDMA Operationsbefore the ULP receives and processes the Delivery.

Page 38: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 38]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

12. RDMA Read Request Messages, including zero-length RDMA ReadRequests, MUST NOT start processing at the Remote Peer untilthey have been Delivered to RDMAP by DDP.

Note the ULP is assured that data written can be read back. Ifan RDMA Read Request is issued by the local peer, targeting thesame ULP Buffer as a preceding Send or RDMA Write (in the samedirection as the RDMA Read Request), the remote peer will sendback the data written by the Send or RDMA Write. Thus assuringthat subsequent local or remote accesses to the ULP Buffercontain the data written by the Send or RDMA Write. Note: thisstatement requires that the buffers are not arranged such thatthe data is over-written either by the incoming data, or otherapplications.

RDMA Read Response Messages MAY be generated at the Remote Peerafter subsequent RDMA Write Messages or Send Messages have beenPlaced or Delivered. Therefore, when an application does an RDMARead Request followed by an RDMA Write (or Send) to the samebuffer, it may get the data from the later RDMA Write (or Send)in the RDMA Read Response Message, even though the operationscompleted in order at the Local Peer. If this behavior is notdesired, the Local Peer ULP must Fence the later RDMA write (orSend) by withholding the RDMA Write Message until alloutstanding RDMA Read Responses have been Delivered.

13. The RDMAP Layer MUST submit RDMA Messages to the DDP layer inthe order the RDMA Operations are submitted to the RDMAP Layerby the ULP.

14. A Send or RDMA Write Message MUST NOT be considered Complete atthe Local Peer (Data Source) until it has been successfullycompleted at the DDP layer.

15. RDMA Operations MUST be Completed at the Local Peer in the orderthat they were submitted by the ULP.

16. At the Data Sink, an incoming Send Message MUST be Delivered tothe ULP only after the DDP Message has been Delivered to theRDMAP Layer by the DDP layer.

17. RDMA Read Response Message processing at the Remote Peer(reading the specified Tagged Buffer) MUST be started only afterthe RDMA Read Request Message has been Delivered by the DDPlayer (thus all previous RDMA Messages have been properlysubmitted for ordered Placement).

Page 39: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 39]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

18. Send Messages MAY be Completed at the Remote Peer (Data Sink)before prior incoming RDMA Read Request Messages have completedtheir response processing.

19. An RDMA Read operation MUST NOT be Completed at the Local Peeruntil the DDP layer Delivers the associated incoming RDMA ReadResponse Message.

20. If more than one outstanding RDMA Read Request Message issupported by both peers, the RDMA Read Response Messages MUST besubmitted to the DDP layer on the Remote Peer in the order theRDMA Read Request Messages were Delivered by DDP, but the actualread of the buffer contents MAY take place in any order at theRemote Peer.

This simplifies Local Peer Completion processing for RDMA Readsin that a Delivered RDMA Read Response MUST be sufficient toComplete the RDMA Read Operation.

Page 40: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 40]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

8 RDMAP Stream Management

RDMAP Stream management consists of RDMAP Stream Initialization andRDMAP Stream Termination.

8.1 Stream Initialization

RDMAP Stream initialization occurs after the LLP Stream has beencreated (e.g. for DDP/MPA over TCP the first TCP Segment after theSYN, SYN/ACK exchange). The ULP is responsible for transitioning theLLP Stream into RDMA enabled mode. The switch to RDMA mode canhappen immediately at LLP Stream initialization or at any timethereafter. Once in RDMA enabled mode, an implementation MUST sendonly RDMA Messages across the transport Stream until the RDMAPStream is torn down.

For each direction of an RDMAP Stream:

* For a given RDMAP Stream, the number of outstanding RDMA ReadRequests is limited per RDMAP Stream direction.

* It is the ULP's responsibility to set the maximum number ofoutstanding, inbound RDMA Read Requests per RDMAP Streamdirection.

* The RDMAP Layer MUST provide the maximum number of outstanding,inbound RDMA Read Requests per RDMAP Stream direction that werenegotiated between the ULP and the Local Peer's RDMAP Layer. Thenegotiation mechanism is outside the scope of this specification.

* It is the ULP's responsibility to set the maximum number ofoutstanding, outbound RDMA Read Requests per RDMAP Streamdirection.

* The RDMAP Layer MUST provide the maximum number of outstanding,outbound RDMA Read Requests for the RDMAP Stream direction thatwere negotiated between the ULP and the Local Peer's RDMAP Layer.The negotiation mechanism is outside the scope of thisspecification.

* The Local Peer's ULP is responsible for negotiating with theRemote Peer's ULP the maximum number of outstanding RDMA ReadRequests for the RDMAP Stream direction. It is recommended thatthe ULP set the maximum number of outstanding, inbound RDMA ReadRequests equal to the maximum number of outstanding, outboundRDMA Read Requests for a given RDMAP Stream direction.

* For outbound RDMA Read Requests, the RDMAP Layer MUST NOT exceedthe maximum number of outstanding, outbound RDMA Read Requests

Page 41: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 41]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

that were negotiated between the ULP and the Local Peer's RDMAPLayer.

* For inbound RDMA Read Requests, the RDMAP Layer MUST NOT exceedthe maximum number of outstanding, inbound RDMA Read Requeststhat were negotiated between the ULP and the Local Peer's RDMAPLayer.

8.2 Stream Teardown

There are three methods for terminating an RDMAP Stream: ULPGraceful Termination, RDMAP Abortive Termination, and LLP AbortiveTermination.

The ULP is responsible for performing ULP Graceful Termination.After a ULP Graceful Termination, either side of the Stream caninitiate LLP Graceful Termination, using the graceful terminationmechanism provided by the LLP.

RDMAP Abortive Termination allows the RDMAP to issue a TerminateMessage describing the reason the RDMAP Stream was terminated. Thenext section (8.2.1 RDMAP Abortive Termination) describes the RDMAPAbortive Termination in detail.

LLP Abortive Termination results due to a LLP error and causes theRDMAP Stream to be torn down midstream, without an RDMAP TerminateMessage. While this last method is highly undesirable, it ispossible and the ULP should take this into consideration.

8.2.1 RDMAP Abortive Termination

RDMAP defines a Terminate operation that SHOULD be invoked wheneither an RDMAP error is encountered or a LLP error is surfaced tothe RDMAP layer by the LLP.

It is not always possible to send the Terminate Message. Forexample, certain LLP errors may occur that cause the LLP Stream tobe torn down before a) RDMAP is aware of the error, b) before RDMAPis able to send the Terminate Message, or c) after RDMAP has postedthe Terminate Message to the LLP, but it has not yet beentransmitted by the LLP.

Note that an RDMAP Abortive Termination may entail loss of data. Ingeneral, when a Terminate Message is received it is impossible totell for sure what unacknowledged RDMA Messages were Completedsuccessfully at the Remote Peer. Thus the state of all outstandingRDMA Messages is indeterminate and the Messages SHOULD be consideredCompleted in error.

Page 42: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 42]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

When a peer sends or receives a Terminate Message, it MAYimmediately teardown the LLP Stream. The peer SHOULD perform agraceful LLP teardown to ensure the Terminate Message issuccessfully Delivered.

See section 6.8 Terminate Header for a description of the TerminateMessage and its contents. See section 7.4 Terminate Message for adescription of the Terminate Message semantics.

Page 43: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 43]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

9 RDMAP Error Management

The RDMAP protocol does not have RDMAP or DDP layer error recoveryoperations built in. If everything is working, the LLP guaranteeswill ensure that the Messages are arriving at the destination.

If errors are detected at the RDMAP or DDP layer, then the RDMAP,DDP and LLP Streams are Abortively Terminated (see section 6.8Terminate Header on page 26).

In general poor implementations or improper ULP programming causesthe errors detected at the RDMAP and DDP layers. In these cases,returning a diagnostic termination error Message and closing theRDMAP Stream is far simpler than attempting to maintain the RDMAPStream, particularly when the cause of the error is not known.

If an LLP does not support teardown of a Stream independent of otherStreams and an RDMAP error results in the Termination of a specificStream, then the LLP MUST label the Stream as an erroneous Streamand MUST NOT allow any further data transfer on that Stream afterRDMAP requests the Stream to be torn down.

For a specific LLP connection, when all Streams are eithergracefully torn down or are labeled as erroneous Streams, the LLPconnection MUST be torn down.

Since errors are detected at the Remote Peer (possibly long) afterRDMA Messages are passed to DDP and the LLP at the Local Peer andCompleted, the sender cannot easily determine which of its Messageshave been received. (RDMA Reads are an exception to this rule).

For a list of errors returned to the Remote Peer as a result of anAbortive Termination, see section 6.8 Terminate Header on page 26.

9.1 RDMAP Error Surfacing

If an error occurs at the Local Peer, the RDMAP layer MUST attemptto inform the local ULP that the error has occurred.

The Local Peer MUST send a Terminate Message for each of thefollowing cases:

1. For Errors detected while creating RDMA Write, Send, Send withInvalidate, Send with Solicited Event, Send with Solicited Eventand Invalidate, or RDMA Read Requests, or other reasons notdirectly associated with an incoming Message, the TerminateMessage and Error code are sent instead of the request. In thiscase, the Error Type and Error Code fields are included in theTerminate Message, but the Terminated DDP Header and TerminatedRDMA Header fields are set to zero.

Page 44: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 44]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

2. For errors detected on an incoming RDMA Write, Send, Send withInvalidate, Send with Solicited Event, Send with Solicited Eventand Invalidate, or Read Response Message (after the Message hasbeen Delivered by DDP), the Terminate Message is sent at theearliest possible opportunity, preferably in the next outgoingRDMA Message. In this case, the Error Type, Error Code, ULP PDULength, and Terminated DDP Header fields are included in theTerminate Message, but the Terminated RDMA Header field is setto zero.

3. For errors detected on an incoming RDMA Read Request Message(after the Message has been Delivered by DDP), the TerminateMessage is sent at the earliest possible opportunity, preferablyin the next outgoing RDMA Message. In this case, the Error Type,Error Code, ULP PDU Length, Terminated DDP Header, andTerminated RDMA Header fields are included in the TerminateMessage.

4. If more than one error is detected on incoming RDMA Messages,before the Terminate Message can be sent, then the first RDMAMessage (and its associated DDP Segment) that experienced anerror MUST be captured by the Terminate Message in accordancewith rules 2 and 3 above.

9.2 Errors Detected at the Remote Peer on Incoming RDMA Messages

On incoming RDMA Writes, RDMA Read Response, Sends, Send withInvalidate, Send with Solicited Event, Send with Solicited Event andInvalidate, and Terminate Messages, the following must be validated:

1. The DDP Layer MUST validate all DDP Segment fields.

2. The RDMA OpCode MUST be valid.

3. The RDMA Version MUST be valid.

Additionally, on incoming Send with Invalidate and Send withSolicited Event and Invalidate Messages, the following must alsobe validated:

4. The Invalidate STag MUST be valid.

5. The STag MUST be associated to this RDMAP Stream.

On incoming RDMA Request Messages, the following must be validated:

1. The DDP Layer MUST validate all Untagged DDP Segment fields.

2. The RDMA OpCode MUST be valid.

Page 45: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 45]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

3. The RDMA Version MUST be valid.

4. For non-zero length RDMA Read Request Messages:

a. The Data Source STag MUST be valid.

b. The Data Source STag MUST be associated to this RDMAPStream.

c. The Data Source Tagged Offset MUST fall in the range oflegal offsets associated with the Data Source STag.

d. The sum of the Data Source Tagged Offset and the RDMA ReadMessage Size MUST fall in the range of legal offsetsassociated with the Data Source STag.

e. The sum of the Data Source Tagged Offset and the RDMA ReadMessage Size MUST NOT cause the Data Source Tagged Offset towrap.

Page 46: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 46]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

10 Security Considerations

This section discusses both protocol-specific considerations and theimplications of using RDMAP with existing security mechanisms.

10.1 Protocol-specific Security Considerations

The vulnerabilities of RDMAP to active third-party interference areno greater than any other protocol running over TCP. A third party,by injecting spoofed packets into the network that are Delivered toan RDMAP Data Sink, could launch a variety of attacks that exploitRDMAP-specific behavior. Since RDMAP directly or indirectly exposesmemory addresses on the wire, the Placement information carried ineach RDMA Message must be validated, including access rights andoctet level granularity base and bounds check, before any data isPlaced. For example, a third-party adversary could inject randompackets that appear to be valid RDMA Messages and corrupt the memoryon an RDMAP Data Sink. Since RDMAP is IP transport protocolindependent, communication security mechanisms such as IPsec [IPSEC]or TLS [TLS] may be used to prevent such attacks.

10.2 Using IPSec with RDMAP

IPsec can be used to protect against the packet injection attacksoutlined above. Because IPsec is designed to secure arbitrary IPpacket streams, including streams where packets are lost, RDMAP canrun on top of IPsec without any change. IPsec packets are processed(e.g., integrity checked and possibly decrypted) in the order theyare received, and an RDMAP Data Sink will process the decrypted RDMAMessages contained in these packets in the same manner as RDMAMessages contained in unsecured IP packets.

10.3 Other Security Considerations

RDMAP has several mechanisms that deal with a number of attacks.These include, but are not limited to:

1. Connection to/from an unauthorized or unauthenticated endpoint.

2. Highjacking of an RDMAP Stream.

3. Attempts to read or write from unauthorized memory regions.

4. Injection of RDMA Messages within a Stream on a multi-useroperating system by another application.

RDMAP relies on the LLP to establish the LLP Stream over which RDMAMessages will be carried. RDMAP itself does nothing to authenticatethe validity of the LLP Stream of either of the endpoints. It is

Page 47: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 47]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

the responsibility of the ULP to validate the LLP Stream. This ishighly desirable due to the nature of RDMA and DDP.

Hijacking of an RDMAP channel would require that the underlying LLPconnection be hijacked. This would require knowledge of Advertisedbuffers in order to directly Place data into a user buffer and istherefore constrained by the same techniques mentioned to guardagainst attempts to read or write from unauthorized memory regions.

RDMAP does not require a host to open its buffers to arbitraryattacks over the RDMAP Stream. It may access ULP memory only to theextent that the ULP has enabled and authorized it to do so. The STagaccess control model is defined by the RNIC Verbs Specification[VERBS]. Specific security operations include:

1. STags are only valid over the exact byte range established bythe ULP. RDMAP MUST provide a mechanism for the ULP to establishand revoke the TO range associated with the ULP Bufferreferenced by an STag.

2. STags are only valid for the duration established by the ULP.The ULP may revoke them at any time, in accordance with its ownupper layer protocol requirements. RDMAP MUST provide amechanism for the ULP to establish and revoke STag validity.

3. STags are only enabled for read and/or write access by explicitULP action. RDMAP MUST provide a mechanism for the ULP toestablish and revoke read, write, or read and write access tothe ULP Buffer referenced by an STag.

4. The implementation is free to choose the value of STags and isencouraged to sparsely populate them over the full rangeavailable. This is admittedly weak security protection against adeliberate attack, but does minimize the risk of accidentalmatches when an incorrect STag is used due to a ULP softwareerror.

5. RDMAP allows and encourages local interactions to restrict theusage of STags to specific Streams and/or user processes. RDMAPMUST provide a mechanism for associating a RDMAP Stream with aSTag.

6. A ULP may only expose memory to remote access to the extent thatit already had access to that memory itself.

7. RDMAP provides operations to allow the holder of an STag toindicate when it has made its last usage of that STag. Thisenables automatic deregistration and/or scope reduction of STagsas the implementation and ULP may see fit.

Page 48: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 48]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

8. If an STag is not valid on a connection, RDMAP provides amechanism for terminating the RDMAP Stream (see section 6.8Terminate Header).

9. An STag that is associated with an RDMAP Stream becomes invalidupon reception of a valid Send with Invalidate or Send withSolicited Event Message. RDMAP MUST invalidate the STag sent ina valid Send with Invalidate or Send with Solicited Event andInvalidate Message, before Completing the Send with Invalidateor Send with Solicited Event and Invalidate Message.

Further, RDMAP encourages direct Placement of incoming payloads inuser-mode ULP Buffers. This avoids the risks of prior solutions thatrelied upon exposing system buffers for incoming payloads.

There is also a clean data Delivery hand-off between RDMAP and theULP. This allows the ULP to implement additional security operationswithout restrictions or interference from RDMAP.

In summary, RDMAP enables both ULP and LLP security. It requiresthat all of its data access be enabled and authorized by the ULP. Itprovides no operations for the ULP to gain permissions not alreadygranted by the host operating system. It allows and encourages localinteractions to specify even more precise security checks on STagbinding and data transfer operations.

By remaining independent of ULP and LLP security protocols, RDMAPwill benefit from continuing improvements at those layers. Users areprovided flexibility to adapt to their specific securityrequirements and the ability to adapt to future security challenges.

Page 49: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 49]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

11 References

11.1 Normative References

[VERBS] Specification under development in the RDMA Consortium (seehttp://www.rdmaconsortium.org/).

[DDP] H. Shah et al., "Direct Data Placement over ReliableTransports", RDMA Consortium Draft Specification draft-shah-iwarp-ddp-v1.0, October 2002 (seehttp://www.rdmaconsortium.org/)

[MPA] P. Culley et al., "Markers with PDU Alignment", RDMAConsortium Draft Specification draft-culley-iwarp-mpa-v1.0,October 2002 (see http://www.rdmaconsortium.org/)

[SCTP] R. Stewart et al., "Stream Control Transmission Protocol",RFC 2960, October 2000.

[TCP] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,September 1981.

11.2 Informative References

[RFC2401] Atkinson, R., Kent, S., "Security Architecture for theInternet Protocol", RFC 2401, November 1998.

[TLS] Dierks, T. and C. Allen, "The TLS Protocol Version 1.0", RFC2246, November 1998.

Page 50: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 50]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

12 Appendix

12.1 DDP Segment Formats for RDMA Messages

This appendix is for information only and is NOT part of thestandard. It simply depicts the DDP Segment format for the variousRDMA Messages.

12.1.1 DDP Segment for RDMA Write

The following figure depicts an RDMA Write, DDP Segment:

0 1 2 30 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| DDP Control | RDMA Control |

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Data Sink STag |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Data Sink Tagged Offset |+ +| |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| RDMA Write ULP Payload |// //| |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 11 RDMA Write, DDP Segment format

12.1.2 DDP Segment for RDMA Read Request

The following figure depicts an RDMA Read Request, DDP Segment:

Page 51: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 51]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

0 1 2 30 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| DDP Control | RDMA Control |

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Reserved (Not Used) |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| DDP (RDMA Read Request) Queue Number |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| DDP (RDMA Read Request) Message Sequence Number |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| DDP (RDMA Read Request) Message Offset |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Data Sink STag (SinkSTag) |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| |+ Data Sink Tagged Offset (SinkTO) +| |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| RDMA Read Message Size (RDMARDSZ) |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Data Source STag (SrcSTag) |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| |+ Data Source Tagged Offset (SrcTO) +| |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 12 RDMA Read Request, DDP Segment format

12.1.3 DDP Segment for RDMA Read Response

The following figure depicts an RDMA Read Response, DDP Segment:

0 1 2 30 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| DDP Control | RDMA Control |

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Data Sink STag |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Data Sink Tagged Offset |+ +| |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| RDMA Read Response ULP Payload |// //| |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 13 RDMA Read Response, DDP Segment format

Page 52: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 52]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

12.1.4 DDP Segment for Send and Send with Solicited Event

The following figure depicts a Send and Send with Solicited Request,DDP Segment:

0 1 2 30 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| DDP Control | RDMA Control |

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Reserved (Not Used) |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| (Send) Queue Number |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| (Send) Message Sequence Number |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| (Send) Message Offset |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Send ULP Payload |// //| |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+Figure 14 Send and Send with Solicited Event, DDP Segment format

12.1.5 DDP Segment for Send with Invalidate and Send with SE andInvalidate

The following figure depicts a Send with invalidate and Send withSolicited and Invalidate Request, DDP Segment:

0 1 2 30 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| DDP Control | RDMA Control |

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Invalidate STag |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| (Send) Queue Number |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| (Send) Message Sequence Number |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| (Send) Message Offset |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Send ULP Payload |// //| |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+Figure 15 Send with Invalidate and Send with SE and Invalidate, DDP

Segment

Page 53: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 53]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

12.1.6 DDP Segment for Terminate

The following figure depicts a Terminate, DDP Segment:

0 1 2 30 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| DDP Control | RDMA Control |

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Reserved (Not Used) |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| DDP (Terminate) Queue Number |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| DDP (Terminate) Message Sequence Number |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| DDP (Terminate) Message Offset |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Terminate Control | Reserved |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| DDP Segment Length (if any) | |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| |+ +| Terminated DDP Header (if any) |+ +| |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| |// //| Terminated RDMA Header (if any) |+ +| |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 16 Terminate, DDP Segment format

12.2 Ordering and Completion Table

The following table summarizes the ordering relationships that aredefined in section 7.5 Ordering and Completions from the standpointof the local peer issuing the two Operations. Note, in the tablethat follows Send includes Send, Send with Invalidate, Send withSolicited Event, and Send with Solicited Event and Invalidate

------+-------+----------------+----------------+----------------First | Later | Placement | Placement | OrderingOp | Op | guarantee at | guarantee | guarantee at

| | Remote Peer | Local Peer | Remote Peer| | | |

------+-------+----------------+----------------+----------------Send | Send | No placement | Not applicable | Completed in

Page 54: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 54]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

| | guarantee. If | | order.| | guarantee is | || | necessary, see | || | footnote 1. | |

------+-------+----------------+----------------+----------------Send | RDMA | No placement | Not applicable | Not applicable

| Write | guarantee. If | || | guarantee is | || | necessary, see | || | footnote 1. | |

------+-------+----------------+----------------+----------------Send | RDMA | No placement | RDMA Read | RDMA Read

| Read | guarantee | Response | Response| | between Send | Payload will | Message will| | Payload and | not be placed | not be| | RDMA Read | at the local | generated until| | Request Header | peer until the | Send has been| | | Send Payload is| Completed| | | placed at the || | | remote peer |

------+-------+----------------+----------------+----------------RDMA | Send | No placement | Not applicable | Not applicableWrite | | guarantee. If | |

| | guarantee is | || | necessary, see | || | footnote 1. | |

------+-------+----------------+----------------+----------------RDMA | RDMA | No placement | Not applicable | Not applicableWrite | Write | guarantee. If | |

| | guarantee is | || | necessary, see | || | footnote 1. | |

------+-------+----------------+----------------+----------------RDMA | RDMA | No placement | RDMA Read | Not applicableWrite | Read | guarantee | Response |

| | between RDMA | Payload will || | Write Payload | not be placed || | and RDMA Read | at the local || | Request Header | peer until the || | | RDMA Write || | | Payload is || | | placed at the || | | remote peer |

------+-------+----------------+----------------+----------------RDMA | Send | No placement | Send Payload | Not applicableRead | | guarantee | may be placed |

| | between RDMA | at the remote || | Read Request | peer before the|| | Header and Send| RDMA Read || | payload | Response is |

Page 55: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 55]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

| | | generated. || | | If guarantee is|| | | necessary, see || | | footnote 2. |

------+-------+----------------+----------------+----------------RDMA | RDMA | No placement | RDMA Write | Not applicableRead | Write | guarantee | Payload may be |

| | between RDMA | placed at the || | Read Request | remote peer || | Header and RDMA| before the RDMA|| | Write payload | Read Response || | | is generated. || | | If guarantee is|| | | necessary, see || | | footnote 2. |

------+-------+----------------+----------------+----------------RDMA | RDMA | No placement | No placement | Second RDMARead | Read | guarantee of | guarantee of | Read Response

| | the two RDMA | the two RDMA | will not be| | Read Request | Read Response | generated until| | Headers | Payloads. | first RDMA Read| | Additionally, | | Response is| | there is no | | generated.| | guarantee that | || | the Tagged | || | Buffers | || | referenced in | || | the RDMA Read | || | will be read in| || | order | |

Figure 17 Operation Ordering

Footnote 1: If the guarantee is necessary, a ULP may insert an RDMARead Operation and wait for it to complete to act as a Fence.

Footnote 2: If the guarantee is necessary, a ULP may wait for theRDMA Read Operation to complete before performing the Send.

Page 56: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

[Page 56]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

13 Authors Addresses

Paul R. CulleyHewlett-Packard Company20555 SH 249Houston, Tx. USA 77070-2698Phone: 281-514-5543Email: [email protected]

Dave GarciaHewlett-Packard Company19333 Vallco ParkwayCupertino, Ca. USA 95014Phone: 408.285.6116Email: [email protected]

Jeff HillandHewlett-Packard Company20555 SH 249Houston, Tx. USA 77070-2698Phone: 281-514-9489Email: [email protected]

Renato J. RecioIBM Corp.11501 Burnett RoadAustin, Tx. USA 78758Phone: 512-838-3685Email: [email protected]

Page 57: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 57]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

14 Acknowledgments

Dwight BarronHewlett-Packard Company20555 SH 249Houston, Tx. USA 77070-2698Phone: 281-514-2769Email: [email protected]

John CarrierAdaptec, Inc.691 S. Milpitas Blvd.Milpitas, CA 95035 USAPhone: +1 (360) 378-8526Email: [email protected]

Ted ComptonEMC CorporationResearch Triangle Park, NC 27709, USAPhone: 919-248-6075Email: [email protected]

Uri ElzurBroadcom Corporation16215 Alton ParkwayIrvine, California 92619-7013 USAPhone: +1 (949) 585-6432Email: [email protected]

Hari GhadiaAdaptec, Inc.691 S. Milpitas Blvd.,Milpitas, CA 95035 USAPhone: +1 (408) 957-5608Email: [email protected]

Howard C. HerbertIntel CorporationMS CH7-4045000 West Chandler Blvd.Chandler, Arizona 85226Phone: 480-554-3116Email: [email protected]

Mike KoIBM650 Harry Rd.San Jose, CA 95120Phone: (408) 927-2085Email: [email protected]

Page 58: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 58]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

Mike KrauseHewlett-Packard Company43LN19410 Homestead RoadCupertino, CA 95014 USAPhone: 408-447-3191Email: [email protected]

Dave MinturnIntel CorporationMS JF1-2105200 North East Elam Young ParkwayHillsboro, Oregon 97124Phone: 503-712-4106Email: [email protected]

Mike PennaBroadcom Corporation16215 Alton ParkwayIrvine, California 92619-7013 USAPhone: +1 (949) 926-7149Email: [email protected]

Jim PinkertonMicrosoft, Inc.One Microsoft WayRedmond, WA, USA 98052Email: [email protected]

Hemal ShahIntel CorporationMS PTL11501 South Mopac Expressway, #400Austin, Texas 78746Phone: 512-732-3963Email: [email protected]

Allyn RomanowCisco Systems170 W Tasman DriveSan Jose, CA 95134 USAPhone: +1 408 525 8836Email: [email protected]

Tom TalpeyNetwork Appliance375 Totten Pond RoadWaltham, MA 02451 USAPhone: +1 (781) 768-5329EMail: [email protected]

Page 59: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 59]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

Patricia ThalerAgilent Technologies, Inc.1101 Creekside Ridge Drive, #100M/S-RG10Roseville, CA 95678Phone: +1-916-788-5662email: [email protected]

Jim WendtHewlett-Packard Company8000 Foothills Boulevard MS 5668Roseville, CA 95747-5668 USAPhone: +1 916 785 5198Email: [email protected]

Page 60: RDMA Protocol Specification 21 Oct · PDF filedraft-recio-iwarp-rdmap-v1.0 IBM Corporation P. Culley Hewlett-Packard Company D. Garcia Hewlett-Packard ... October 2002 An RDMA Protocol

RDMA Protocol Specification 21 Oct 2002

[Page 60]

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051

15 Full Copyright Statement

This document and the information contained herein is provided on an“AS IS” basis and ADAPTEC INC., AGILENT TECHNOLOGIES INC., BROADCOMCORPORATION, CISCO SYSTEMS INC., EMC CORPORATION, HEWLETT-PACKARDCOMPANY, INTERNATIONAL BUSINESS MACHINES CORPORATION, INTELCORPORATION, MICROSOFT CORPORATION, AND NETWORK APPLIANCE INC.DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOTLIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILLNOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITYOR FITNESS FOR A PARTICULAR PURPOSE.

Copyright (c) 2002 ADAPTEC INC., BROADCOM CORPORATION, CISCO SYSTEMSINC., EMC CORPORATION, HEWLETT-PACKARD COMPANY, INTERNATIONALBUSINESS MACHINES CORPORATION, INTEL CORPORATION, MICROSOFTCORPORATION, NETWORK APPLIANCE INC., All Rights Reserved.