iSCSI Notes

download iSCSI Notes

of 40

Transcript of iSCSI Notes

  • 7/31/2019 iSCSI Notes

    1/40

    iSCSIInternet Small Computer Systems Interface

    A brief document that helps developer to work qu

    Rajesh

    11th

    July

  • 7/31/2019 iSCSI Notes

    2/40

    iSCSI

    2

    TABLE OF CONTENTS

    TABLE OF CONTENTS ................................................................................................. 21. INTRODUCTION......................................................................................................... 4

    iSCSI..............................................................................................................................................4 TCP................................................................................................................................................4IP...................................................................................................................................................4 Keyissues.....................................................................................................................................4

    2. INTERNET PROTOCOL ............................................................................................ 6Operation.......................................................................................................................................6 ModelofOperation........................................................................................................................7 FunctionDescription......................................................................................................................7 InternetHeaderFormat...............................................................................................................14 Interfaces.....................................................................................................................................23

    3. TRANSPORT CONTROL PROTOCOL ................................................................. 24Operation.....................................................................................................................................24 ModelofOperation......................................................................................................................25 Interfaces.....................................................................................................................................25 FunctionalSpecification..............................................................................................................26 HeaderFormat..........................................................................................................................26

    4. INTERNET SMALL COMPUTER SYSTEMS INTERFACE .............................. 30SCSIConcepts............................................................................................................................30 iSCSIConceptsandFunctionalOverview..................................................................................30 LayersandSessions.................................................................................................................30OrderingandiSCSINumbering................................................................................................31 CommandNumberingandAcknowledging..............................................................................31 Response/StatusNumberingandAcknowledging....................................................................33 DataSequencing......................................................................................................................33 iSCSILogin...............................................................................................................................33 iSCSIFullFeaturePhase.........................................................................................................34 iSCSIConnectionTermination.................................................................................................36 iSCSINames............................................................................................................................36 PersistentState.........................................................................................................................38MessageSynchronizationandSteering...................................................................................39

    iSCSISessionTypes...................................................................................................................40

  • 7/31/2019 iSCSI Notes

    3/40

    iSCSI

    3

    SCSItoiSCSIConceptsMappingModel....................................................................................40

  • 7/31/2019 iSCSI Notes

    4/40

    iSCSI

    4

    1. Introduction

    Discussion on the topics in short notes helps the reader to understand basics as well as the

    important things about the topics. These short notes also provide bottlenecks and the criticalcases that are needed to be attended by the developer. This to some extent addresses the

    developer point of view during implementation.

    The following topics are discussed here.

    iSCSI: iSCSI means internet small computer systems interface. iSCSI sits just above TCP in thelayered architecture. iSCSI is an IP-based standard of the SCSI. iSCSI carries SCSI packets of

    data, commands in the form of PDUs (Protocol Data Units) over a TCP/IP. The advantages of

    iSCSI over other protocols that carry SCSI are the existing Internet infrastructure, Internetmanagement facilities and address distance limitations.

    TCP: TCP means Transmission Control Protocol. TCP sits just above the Internet Protocol in

    layered architecture. TCP is used for building communication between systems with aconnection orientation and thereby providing an end-to-end reliability. TCP Segments are usedto carry data between the source and the destination.

    IP: IP means Internet Protocol. IP sits above the Local Network Protocol in the TCP/IP protocollayered architecture (Data Link + Physical in OSI layerOpen Systems Interconnection Model).

    IP does not guarantee the end-to-end data reliability, flow control, sequencing or other services.

    IP only implement the addressing and fragmentation. Data transferred between a source anddestination is done through Internet datagrams.

    Key issues: There are several details required for iSCSI to be addressed when working on a

    TCP/IP based data storage systems over a network.

    1. Datagram delivery in IP is not in a fixed order as with SCSI as a channel interface. In channel

    interfaces of SCSI the packets are delivered in a sequence and a break in sequence can lead to

    data loss.

    2. The delay/latency that occurs in data transfer over the Ethernet.

    3. TCP/IP reordering and iSCSI reordering is another issue.

    4. How to extract the boundaries of the PDUs after the reordering is done.

    The above addressed for iSCSI implementation also needs the following information to have inhandy to ease the understanding

    1. iSCSI Address and Naming ConventionsSCSI devices available in a network are identified through iSCSI nodes that are nameduniquely across the world. This unique name enables quick identification of an iSCSI

    device regardless of its physical location.

  • 7/31/2019 iSCSI Notes

    5/40

    iSCSI

    5

    2. iSCSI Session ManagementSession management consists of two phases. They are login phase and a full-feature

    phase.

    3.

    iSCSI Error HandlingError recovery in iSCSI prompted the design for two considerations

    i. An iSCSI PDU may fail the digest check and be dropped after the TCP layer.ii. If a TCP connection fails during data transfer, all the active tasks should be

    continued on a different TCP connection within the same session.

    A hierarchy model error recovery level was described to reduce the complexity of eachimplementation. Based on the ErrorRecoveryLevel the associated Error capabilities are

    supported. The following table gives us a clear idea of implementation

    ErrorRecoveryLevel Associated Error recovery capabilities

    0 Session recovery class

    1 Digest failure recovery plus the capabilities of ER level 0

    2 Connection recovery class plus the capabilities if ER level 1

    4. iSCSI SecurityiSCSI security is a very important for the administrator. For authentication of iSCSI

    CHAP is used and for encryption IPSec is used.

    The later section of this document gives us a more in depth knowledge focus of the topics to

    guide us for the proper understanding. All the topics are from their respective ietf rfc. The list of

    rfc used here arerfc791,rfc793andrfc3720and also read if possiblerfc3721,rfc3722,rfc3723,rfc3454,rfc3347,rfc3783,rfc5044,rfc5046,rfc5048.

    http://tools.ietf.org/html/rfc791http://tools.ietf.org/html/rfc791http://tools.ietf.org/html/rfc791http://tools.ietf.org/html/rfc793http://tools.ietf.org/html/rfc793http://tools.ietf.org/html/rfc793http://tools.ietf.org/html/rfc3720http://tools.ietf.org/html/rfc3720http://tools.ietf.org/html/rfc3720http://tools.ietf.org/html/rfc3721http://tools.ietf.org/html/rfc3721http://tools.ietf.org/html/rfc3721http://tools.ietf.org/html/rfc3722http://tools.ietf.org/html/rfc3722http://tools.ietf.org/html/rfc3722http://tools.ietf.org/html/rfc3723http://tools.ietf.org/html/rfc3723http://tools.ietf.org/html/rfc3723http://tools.ietf.org/html/rfc3454http://tools.ietf.org/html/rfc3454http://tools.ietf.org/html/rfc5048http://tools.ietf.org/html/rfc5048http://tools.ietf.org/html/rfc5048http://tools.ietf.org/html/rfc3783http://tools.ietf.org/html/rfc3783http://tools.ietf.org/html/rfc3783http://tools.ietf.org/html/rfc5044http://tools.ietf.org/html/rfc5044http://tools.ietf.org/html/rfc5044http://tools.ietf.org/html/rfc5046http://tools.ietf.org/html/rfc5046http://tools.ietf.org/html/rfc5046http://e/tools.ietf.org/html/rfc5048http://e/tools.ietf.org/html/rfc5048http://e/tools.ietf.org/html/rfc5048http://e/tools.ietf.org/html/rfc5048http://tools.ietf.org/html/rfc5046http://tools.ietf.org/html/rfc5044http://tools.ietf.org/html/rfc3783http://tools.ietf.org/html/rfc5048http://tools.ietf.org/html/rfc3454http://tools.ietf.org/html/rfc3723http://tools.ietf.org/html/rfc3722http://tools.ietf.org/html/rfc3721http://tools.ietf.org/html/rfc3720http://tools.ietf.org/html/rfc793http://tools.ietf.org/html/rfc791
  • 7/31/2019 iSCSI Notes

    6/40

    iSCSI

    6

    2. Internet ProtocolInternet Protocol is designed for use in interconnected systems. The IP provide transferring

    blocks of data called IP datagrams between sources and destinations identified by fixed length

    address. The IP provides for fragmentation and reassembly of long datagrams, if necessary, for

    transmission through small packet networks.

    Internet can capitalize on the services of its supporting networks to provide various types of QOS(quality of service). The internet protocol calls on local network protocols to carry the internet

    datagram to the next gateway or destination host.

    Operation: The IP implements two basic functions. They are

    1. Addressing: The Internet modules use the addresses carried in the internet header to transmit

    internet datagrams toward their destinations. The selection of a path for transmission is called

    routing.

    2. Fragmentation: The Internet modules use fields in the internet header to fragment andreassemble internet datagrams when necessary for transmission through small packetnetworks.

    The model of operation is that an internet module resides in each host engaged in internet

    communication and in each gateway that interconnects network. These modules share common

    rules for interpreting address fields and for fragmenting and assembling internet datagrams .Gateways have additional procedures for making routing decisions and other functions.

    IN IP each internet datagram is an independent entity unrelated to any other internet datagram.

    The IP uses four key mechanisms in providing its service: Type of Service, Time to Live,

    Options and Header Checksum.

    The Type of Service provided by IP is used by gateways to select the actual transmissionparameters for a particular network, the network to be used for the next hop, or the next gatewaywhen routing an internet datagram. The Time to Live is an indication of an upper bound on the

    lifetime of an internet datagram. It is set by the sender and reduced at the points along the route

    where it is processed. The datagram is destroyed if the time to live reaches zero before reachingits destination. The Options provide for control functions needed or useful in some situations but

    unnecessary for the most common communications. The Options include provisions for

    timestamps, security and special routing. The Header Checksum provides verification that the

    processing internet datagram has been transmitted correctly. If the header checksum fails, the

    internet datagram is discarded at once by the entity which detects the error.

    The internet protocol does not provide a reliable communication facility as there are noacknowledgements either end-to-end or hop-by-hop. There is no error control for data, only a

    header checksum. There are no retransmissions and no flow control. Errors detected may be

    reported via the ICMP (Internet Control Message Protocol) which is implemented in the IPmodule.

  • 7/31/2019 iSCSI Notes

    7/40

    iSCSI

    7

    Model of Operation: The model of operation for transmitting an internet datagram from oneapplication program to another is illustrated by the following scenario:

    Here in this transmission one intermediate gateway is involved. The sending application preparesits data and calls on its local internet module to send that data as a datagram and passes the

    destination address and other parameters as arguments of the call. The internet module preparesthe datagram header and attaches the data to it. The internet module determines a local networkaddress for this internet address, in this case it is the address of a gateway. It sends this datagram

    and the local network address to the local network interface.

    The local network interface creates a local network header, and attaches the datagram to it, thensends the result via the local network. The internet datagram arrives at a gateway host wrapped

    in the local network header, the local network interface strips off this header and turns the

    datagram over to the internet module. The internet module determines from the internet address

    that the internet datagram is to be forwarded to another host in a second network. This internetmodule calls on the local network interface for that destination network to send the internet

    datagram. This local network interface creates a local network header and attaches the datagramsending the result to the destination host. At this destination host the internet datagram is strippedof the local net header by the local network interface and handed to the internet module. The

    internet module determines that the datagram is for an application program in this host. It passes

    the data to the application program in response to a system call, passing the source address andother parameters as results of the call.

    Application Application

    Program Program\ /

    Internet Module Internet Module Internet Module

    \ / \ /LNI-1 LNI-1 LNI-2 Local N/W Interface-2

    \ / \ /

    Local Network 1 Local Network 2

    Transmission Path

    Function Description: The function or purpose of IP is to move datagrams through aninterconnected set of networks. This is done by passing the internet datagrams from internet

    module to another until the destination is reached. The internet datagrams are routed from one

    internet module to another through individual networks based on the interpretation of an internet

    address. During routing of messages from one internet module to another datagrams may needto transverse a network whose maximum packet size is smaller than the size of the datagram. To

    overcome this difficulty, a fragmentation mechanism is provided in the internet protocol.

    Addressing: A distinction is made between names, addresses and routes.

    A name indicates what humans can understand.An address indicates where it is.

    A route indicates how to get there.

  • 7/31/2019 iSCSI Notes

    8/40

    iSCSI

    8

    The IP deals primarily with addresses and it is the task of higher level protocols to make the

    mapping from names to addresses. The internet module maps internet addresses to local netaddresses and it is the task of lower level procedures to make the mapping from local net

    addresses to routes.

    Addresses are fixed length of four octets (32 bits). An address begins with a network number,followed by local address (called the rest field). There are three formats or classes of internet

    addresses shown in the below table.

    Format /Classes of internet addresses

    (32 bits)

    Explanation of addresses

    High Order bits Network bits Local Address bits

    Class A 0 7 24

    Class B 10 14 16

    Class C 110 21 8

    111 Escape to extended addressing mode

    Class A: The high order bit is zero, the next 7 bits are the network and the last 24 bits are thelocal address. Class B: The high order two bits are one-zero, the next 14 bits are the network andthe last 16 bits are the local address. Class C: The high order three bits are one-one-zero, the next

    21 bits are the network and the last 8 bits are the local address.

    Care must be taken in mapping internet addresses to local net addresses.

    Fragmentation: Fragmentation of an internet datagram is necessary when it originates in a localnet that allows large packet size and must traverse a local net that limits packets to a smaller size

    to reach its destination. In the internet datagram there is a field that can be marked dontfragment (DF bit). Any internet datagram so marked is not to be fragmented under any

    circumstances. If such an internet datagram not delivered to its destination is to be discarded.

    The fields which may be affected by fragmentation include:

    1) options field2) more fragments flag

    3) fragment offset

    4) internet header length field5) total length field

    6) Header checksum

    The internet fragmentation and reassembly procedure needs to be able to break a datagram intoan almost arbitrary number of pieces that can be later reassembled. The receiver of the fragments

    uses the identification field to ensure that fragments of different datagrams are not mixed. Thefragment offset field tells the receiver the position of a fragment in the original datagram. Thefragment offset and length determine the portion of the original datagram covered by this fragment.

    The more-fragments flag indicate the last fragment.

    The originating protocol module of an internet datagram sets identification field to a value thatmust be unique to that source-destination pair for the time the datagram will be active in the

  • 7/31/2019 iSCSI Notes

    9/40

    iSCSI

    9

    internet system. The originating protocol module of a complete datagram also sets the more-

    fragments flag to zero and the fragment offset to zero.

    If a long datagram is to be fragmented by any internet module (example in a gateway), first

    creates two new internet datagrams and copies the contents of the internet header field from the

    long datagram into both new internet headers. After the fragmentation is done the total lengthfield is set to the length of the first datagram. The more-fragments flag is set to one. The second

    portion of the data is placed in the second new internet datagram, and the total length field is set

    to the length of the second datagram. The more-fragments flag carries the same value as the longdatagram. The fragment offset field of the second new internet datagram is set to the value of

    that field in the long datagram plus NFB (Number of Fragment Blocks). This procedure can be

    generalized for an n-way split, rather than the two-way split described.

    To assemble the fragments of an internet datagram, an IP module combines the datagram that all

    have the same value for the four fields: identification, source, destination and protocol. Based on

    the fragment offset present in that fragments header the combinations are done to place the data

    portion of each fragment in the relative position indicated. The first fragment will have thefragment offset zero and the last fragment will have the more-fragments flag reset to zero.

    An Example Fragmentation Procedure: The maximum sized datagram that can be transmitted

    through the next network is called the maximum transmission unit (MTU). If the total length is

    less than or equal the maximum transmission unit then submit this datagram to the next step in

    datagram processing; otherwise cut the datagram into two fragments, the first fragment being themaximum size, and the second fragment being the rest of the datagram. The first fragment is

    submitted to the next step in datagram processing, while the second fragment is submitted to this

    procedure in case it is still too large.

    Notation:

    FO - Fragment Offset

    IHL - Internet Header LengthDF - Don't Fragment flag

    MF - More Fragments flag

    TL - Total LengthOFO - Old Fragment Offset

    OIHL - Old Internet Header Length

    OMF - Old More Fragments flagOTL - Old Total Length

    NFB - Number of Fragment Blocks

    MTU - Maximum Transmission Unit

    Procedure:

    IF TL =< MTU THEN Submit this datagram to the next step in datagram processing ELSE IF

    DF = 1 THEN discard the datagram ELSE

    To produce the first fragment:

    (1) Copy the original internet header;

  • 7/31/2019 iSCSI Notes

    10/40

    iSCSI

    10

    (2) OIHL

  • 7/31/2019 iSCSI Notes

    11/40

    iSCSI

    11

    available and the data rate of the transmission medium; that is, data rate times timer value equals

    buffer size (e.g., 10Kb/s X 15s = 150Kb).

    Notation:

    FO - Fragment Offset

    IHL - Internet Header LengthMF - More Fragments flag

    TTL - Time To Live

    NFB - Number of Fragment BlocksTL - Total Length

    TDL - Total Data Length

    BUFID - Buffer IdentifierRCVBT - Fragment Received Bit Table

    TLB - Timer Lower Bound

    Procedure:

    (1) BUFID

  • 7/31/2019 iSCSI Notes

    12/40

    iSCSI

    12

    this source, destination pair and protocol for the time the datagram (or any fragment of it) could

    be alive in the internet.

    It seems then that a sending protocol module needs to keep a table of Identifiers, one entry for

    each destination it has communicated with in the last maximum packet lifetime for the internet.

    However, since the Identifier field allows 65,536 different values, some host may be able tosimply use unique identifiers independent of destination.

    It is appropriate for some higher level protocols to choose the identifier. For example, TCPprotocol modules may retransmit an identical TCP segment, and the probability for correct

    reception would be enhanced if the retransmission carried the same identifier as the original

    transmission since fragments of either datagram could be used to construct a correct TCPsegment.

    Type of Service: The type of service (TOS) is for internet service quality selection. The type of

    service is specified along the abstract parameters precedence, delay, throughput, and reliability.

    These abstract parameters are to be mapped into the actual service parameters of the particularnetworks the datagram traverses.

    Precedence: An independent measure of the importance of this datagram.

    Delay: Prompt delivery is important for datagrams with this indication.

    Throughput: High data rate is important for datagrams with this indication.

    Reliability: A higher level of effort to ensure delivery is important for datagrams with thisindication.

    For example, the ARPANET has a priority bit, and a choice between "standard" messages (type0) and "uncontrolled" messages (type 3), (the choice between single packet and multipacket

    messages can also be considered a service parameter). The uncontrolled messages tend to be less

    reliably delivered and suffer less delay. Suppose an internet datagram is to be sent through the

    ARPANET. Let the internet type of service be given as:Precedence : 5

    Delay : 0

    Throughput : 1Reliability : 1

    In this example, the mapping of these parameters to those available for the ARPANET would beto set the ARPANET priority bit on since the Internet precedence is in the upper half of its range,

    to select standard messages since the throughput and reliability requirements are indicated and

    delay is not. More details are given on service mappings in "Service Mappings".

    Time to Live: The time to live is set by the sender to the maximum time the datagram is allowed

    to be in the internet system. If the datagram is in the internet system longer than the time to live,

    then the datagram must be destroyed.

    This field must be decreased at each point that the internet header is processed to reflect the time

    spent processing the datagram. Even if no local information is available on the time actually

  • 7/31/2019 iSCSI Notes

    13/40

    iSCSI

    13

    spent, the field must be decremented by 1. The time is measured in units of seconds (i.e. the

    value 1 means one second). Thus, the maximum time to live is 255 seconds or 4.25 minutes.Since every module that processes a datagram must decrease the TTL by at least one even if it

    process the datagram in less than a second, the TTL must be thought of only as an upper bound

    on the time a datagram may exist. The intention is to cause undeliverable datagrams to be

    discarded, and to bound the maximum datagram lifetime.

    Some higher level reliable connection protocols are based on assumptions that old duplicate

    datagrams will not arrive after a certain time elapses. The TTL is a way for such protocols tohave an assurance that their assumption is met. The options are optional in each datagram, but

    required in implementations. That is, the presence or absence of an option is the choice of the

    sender, but each internet module must be able to parse every option. There can be several optionspresent in the option field.

    The options might not end on a 32-bit boundary. The internet header must be filled out with

    octets of zeros. The first of these would be interpreted as the end-of-options option, and the

    remainder as internet header padding.

    Every internet module must be able to act on every option. The Security Option is required ifclassified, restricted, or compartmented traffic is to be passed.

    Checksum: The internet header checksum is recomputed if the internet header is changed.

    For example, a reduction of the time to live, additions or changes to internet options, or due to

    fragmentation. This checksum at the internet level is intended to protect the internet header fields

    from transmission errors.

    There are some applications where a few data bit errors are acceptable while retransmission

    delays are not. If the internet protocol enforced data correctness such applications could not be

    supported.

    Errors: Internet protocol errors may be reported via the ICMP messages.

    Gateways: Gateways implement internet protocol to forward datagrams between networks.

    Gateways also implement the Gateway to Gateway Protocol (GGP) to coordinate routing and

    other internet control information. In gateways the higher level protocols need not beimplemented and the GGP functions are added to the IP module.

  • 7/31/2019 iSCSI Notes

    14/40

    iSCSI

    14

    Internet Header FormatA summary of the contents of the internet header follows

    0 1 2 30 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+|Version | IHL | Type of Service | Total Length |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    | Identification | Flags | Fragment Offset |

    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    | Time to Live | Protocol | Header Checksum |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    | Source Address |

    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    | Destination Address |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    | Options | Padding |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Example Internet Datagram Header

    Note that each tick mark represents one bit position.

    Version: 4 bits

    The Version field indicates the format of the internet header. This document describes version 4.

    IHL: 4 bits

    Internet Header Length is the length of the internet header in 32 bit words, and thus points to thebeginning of the data. Note that the minimum value for a correct header is 5.

    Type of Service: 8 bits

    The Type of Service provides an indication of the abstract parameters of the quality of servicedesired. These parameters are to be used to guide the selection of the actual service parameters

    when transmitting a datagram through a particular network. Several networks offer service

    precedence, which somehow treats high precedence traffic as more important than other traffic(generally by accepting only traffic above certain precedence at time of high load). The major

    choice is a three way tradeoff between low-delay, high-reliability, and high-throughput.

    Bits 0-2: Precedence.Bit 3: 0 = Normal Delay, 1 = Low Delay.

    Bits 4: 0 = Normal Throughput, 1 = High Throughput.

    Bits 5: 0 = Normal Relibility, 1 = High Relibility.Bit 6-7: Reserved for Future Use.

  • 7/31/2019 iSCSI Notes

    15/40

    iSCSI

    15

    0 1 2 3 4 5 6 7

    +-----+-----+-----+-----+-----+-----+-----+-----+| | | | | | |

    |PRECEDENCE| D | T | R | 0 | 0 |

    | | | | | | |

    +-----+-----+-----+-----+-----+-----+-----+-----+

    Precedence

    111 - Network Control110 - Internetwork Control

    101 - CRITIC/ECP

    100 - Flash Override011 - Flash

    010 - Immediate

    001 - Priority

    000 - Routine

    The use of the Delay, Throughput, and Reliability indications may increase the cost (in some

    sense) of the service. In many networks better performance for one of these parameters iscoupled with worse performance on another. Except for very unusual cases at most two of these

    three indications should be set.

    The type of service is used to specify the treatment of the datagram during its transmission

    through the internet system. Example mappings of the internet type of service to the actualservice provided on networks such as AUTODIN II, ARPANET, SATNET, and PRNET is given

    in "Service Mappings"

    The Network Control precedence designation is intended to be used within a network only. The

    actual use and control of that designation is up to each network. The Internetwork Control

    designation is intended for use by gateway control originators only. If the actual use of these

    precedence designations is of concern to a particular network, it is the responsibility of thatnetwork to control the access to, and use of, those precedence designations.

    Total Length: 16 bitsTotal Length is the length of the datagram, measured in octets, including internet header and

    data. This field allows the length of a datagram to be up to 65,535 octets. Such long datagrams

    are impractical for most hosts and networks. All hosts must be prepared to accept datagrams ofup to 576 octets (whether they arrive whole or in fragments). It is recommended that hosts only

    send datagrams larger than 576 octets if they have assurance that the destination is prepared to

    accept the larger datagrams.

    The number 576 is selected to allow a reasonable sized data block to be transmitted in addition to

    the required header information. For example, this size allows a data block of 512 octets plus 64

    header octets to fit in a datagram. The maximal internet header is 60 octets, and a typical

    internet header is 20 octets, allowing a margin for headers of higher level protocols.

  • 7/31/2019 iSCSI Notes

    16/40

    iSCSI

    16

    Identification: 16 bits

    An identifying value assigned by the sender to aid in assembling the fragments of a datagram.

    Flags: 3 bits

    Various Control Flags.

    Bit 0: reserved, must be zeroBit 1: (DF) 0 = May Fragment, 1 = Don't Fragment.

    Bit 2: (MF) 0 = Last Fragment, 1 = More Fragments.

    0 1 2

    +---+---+---+

    | | D | M || 0 | F | F |

    +---+---+---+

    Fragment Offset: 13 bits

    This field indicates where in the datagram this fragment belongs. The fragment offset ismeasured in units of 8 octets (64 bits). The first fragment has offset zero.

    Time to Live: 8 bits

    This field indicates the maximum time the datagram is allowed to remain in the internet system.

    If this field contains the value zero, then the datagram must be destroyed. This field is modified

    in internet header processing. The time is measured in units of seconds, but since every modulethat processes a datagram must decrease the TTL by at least one even if it process the datagram

    in less than a second, the TTL must be thought of only as an upper bound on the time a datagram

    may exist. The intention is to cause undeliverable datagrams to be discarded, and to bound themaximum datagram lifetime.

    Protocol: 8 bits

    This field indicates the next level protocol used in the data portion of the internet datagram. Thevalues for various protocols are specified in "Assigned Numbers".

    Header Checksum: 16 bitsA checksum on the header only. Since some header fields change (e.g., time to live), this is

    recomputed and verified at each point that the internet header is processed.

    The checksum algorithm is: The checksum field is the 16 bit one's complement of the one's

    complement sum of all 16 bit words in the header. For purposes of computing the checksum, the

    value of the checksum field is zero. This is a simple to compute checksum and experimental

    evidence indicates it is adequate, but it is provisional and may be replaced by a CRC procedure,depending on further experience.

    Source Address: 32 bits of the source address.

    Destination Address: 32 bits of the destination address.

  • 7/31/2019 iSCSI Notes

    17/40

    iSCSI

    17

    Options: variable

    The options may appear or not in datagrams. They must be implemented by all IP modules (hostand gateways). What is optional is their transmission in any particular datagram, not their

    implementation. In some environments the security option may be required in all datagrams.

    The option field is variable in length. There may be zero or more options. There are two casesfor the format of an option:

    Case 1: A single octet of option-type.

    Case 2: An option-type octet, an option-length octet, and the actual option-data octets.

    The option-length octet counts the option-type octet and the option-length octet as well as the

    option-data octets.

    The option-type octet is viewed as having 3 fields:

    1 bit copied flag,2 bits option class,

    5 bits option number.

    The copied flag indicates that this option is copied into all fragments on fragmentation.

    0 = not copied, 1 = copied

    The option classes are:0 = control

    1 = reserved for future use

    2 = debugging and measurement3 = reserved for future use

    The following internet options are defined:

    Class Number Length Description

    0 0 - End of Option list. This option occupies only 1 octet; it has no length

    octet

    0 1 - No Operation. This option occupies only 1 octet; it has no length

    octet

    0 2 11 Security. Used to carry Security, Compartmentation, User Group

    (TCC), and Handling Restriction Codes compatible with DOD

    requirements.

    0 3 var. Loose Source Routing. Used to route the internet datagram based on

    information supplied by the source0 9 var. Strict Source Routing. Used to route the internet datagram based on

    information supplied by the source

    0 7 var. Record Route. Used to trace the route an internet datagram takes

    0 8 4 Stream ID. Used to carry the stream identifier

    2 4 var. Internet Timestamp

  • 7/31/2019 iSCSI Notes

    18/40

    iSCSI

    18

    Specific Option Definitions

    End of Option List+------------+

    |00000000|

    +------------+

    Type=0

    This option indicates the end of the option list. This might not coincide with the end of the

    internet header according to the internet header length. This is used at the end of all options, notthe end of each option, and need only be used if the end of the options would not otherwise

    coincide with the end of the internet header.

    May be copied, introduced, or deleted on fragmentation, or for any other reason.

    No Operation

    +------------+

    |00000001|+------------+

    Type=1This option may be used between options, for example, to align the beginning of a subsequent

    option on a 32 bit boundary.

    May be copied, introduced, or deleted on fragmentation, or for any other reason.

    Security

    This option provides a way for hosts to send security, compartmentation, handling restrictions,and TCC (closed user group) parameters. The format for this option is as follows:

    +-----------+-----------+-----//------+------//-----+------//------+---//---+

    |10000010|00001011| SSS SSS |CCC CCC|HHH HHH| TCC |+-----------+-----------+-----//------+-----//------+-----//-------+---//---+

    Type=130 Length=11

    Security (S field): 16 bits

    Specifies one of 16 levels of security (eight of which are reserved for future use).00000000 00000000 - Unclassified

    11110001 00110101 - Confidential

    01111000 10011010 - EFTO

    10111100 01001101 - MMMM01011110 00100110 - PROG

    10101111 00010011 - Restricted

    11010111 10001000 - Secret

    01101011 11000101 - Top Secret00110101 11100010 - (Reserved for future use)

    10011010 11110001 - (Reserved for future use)

  • 7/31/2019 iSCSI Notes

    19/40

    iSCSI

    19

    01001101 01111000 - (Reserved for future use)

    00100100 10111101 - (Reserved for future use)00010011 01011110 - (Reserved for future use)

    10001001 10101111 - (Reserved for future use)

    11000100 11010110 - (Reserved for future use)

    11100010 01101011 - (Reserved for future use)

    Compartments (C field): 16 bits

    An all zero value is used when the information transmitted is not compartmented. Other valuesfor the compartments field may be obtained from the Defense Intelligence Agency.

    Handling Restrictions (H field): 16 bitsThe values for the control and release markings are alphanumeric digraphs and are defined in the

    Defense Intelligence Agency Manual DIAM 65-19, "Standard Security Markings".

    Transmission Control Code (TCC field): 24 bits

    Provides a means to segregate traffic and define controlled communities of interest amongsubscribers. The TCC values are trigraphs, and are available from HQ DCA Code 530.

    Must be copied on fragmentation. This option appears at most once in a datagram.

    Loose Source and Record Route

    +-----------+--------+---------+--------//--------+

    |10000011| length | pointer | route data |

    +-----------+--------+---------+--------//--------+Type=131

    The loose source and record route (LSRR) option provides a means for the source of an internet

    datagram to supply routing information to be used by the gateways in forwarding the datagram tothe destination, and to record the route information.

    The option begins with the option type code. The second octet is the option length whichincludes the option type code and the length octet, the pointer octet, and length-3 octets of route

    data. The third octet is the pointer into the route data indicating the octet which begins the next

    source address to be processed. The pointer is relative to this option, and the smallest legal valuefor the pointer is 4.

    A route data is composed of a series of internet addresses. Each internet address is 32 bits or 4

    octets. If the pointer is greater than the length, the source route is empty (and the recorded routefull) and the routing is to be based on the destination address field.

    If the address in destination address field has been reached and the pointer is not greater than the

    length, the next address in the source route replaces the address in the destination address field,and the recorded route address replaces the source address just used, and pointer is increased by

    four.

  • 7/31/2019 iSCSI Notes

    20/40

    iSCSI

    20

    The recorded route address is the internet module's own internet address as known in theenvironment into which this datagram is being forwarded.

    This procedure of replacing the source route with the recorded route (though it is in the reverse

    of the order it must be in to be used as a source route) means the option (and the IP header as awhole) remains a constant length as the datagram progresses through the internet. This option is

    a loose source route because the gateway or host IP is allowed to use any route of any number of

    other intermediate gateways to reach the next address in the route.

    Must be copied on fragmentation. It appears at most once in a datagram.

    Strict Source and Record Route

    +-----------+--------+--------+---------//--------+

    |10001001| length | pointer| route data |

    +-----------+--------+--------+---------//--------+Type=137

    The strict source and record route (SSRR) option provides a means for the source of an internet

    datagram to supply routing information to be used by the gateways in forwarding the datagram to

    the destination, and to record the route information.

    The option begins with the option type code. The second octet is the option length which

    includes the option type code and the length octet, the pointer octet, and length-3 octets of route

    data. The third octet is the pointer into the route data indicating the octet which begins the nextsource address to be processed. The pointer is relative to this option, and the smallest legal value

    for the pointer is 4.

    A route data is composed of a series of internet addresses. Each internet address is 32 bits or 4octets. If the pointer is greater than the length, the source route is empty (and the recorded route

    full) and the routing is to be based on the destination address field.

    If the address in destination address field has been reached and the pointer is not greater than the

    length, the next address in the source route replaces the address in the destination address field,

    and the recorded route address replaces the source address just used, and pointer is increased byfour.

    The recorded route address is the internet module's own internet address as known in the

    environment into which this datagram is being forwarded.

    This procedure of replacing the source route with the recorded route (though it is in the reverse

    of the order it must be in to be used as a source route) means the option (and the IP header as a

    whole) remains a constant length as the datagram progresses through the internet.

  • 7/31/2019 iSCSI Notes

    21/40

    iSCSI

    21

    This option is a strict source route because the gateway or host IP must send the datagram

    directly to the next address in the source route through only the directly connected networkindicated in the next address to reach the next gateway or host specified in the route.

    Must be copied on fragmentation. It appears at most once in a datagram.

    Record Route

    +------------+--------+--------+---------//--------+

    |00000111| length | pointer| route data |+------------+--------+--------+---------//--------+

    Type=7

    The record route option provides a means to record the route of an internet datagram.

    The option begins with the option type code. The second octet is the option length which

    includes the option type code and the length octet, the pointer octet, and length-3 octets of route

    data. The third octet is the pointer into the route data indicating the octet which begins the nextarea to store a route address. The pointer is relative to this option, and the smallest legal value

    for the pointer is 4.

    A recorded route is composed of a series of internet addresses. Each internet address is 32 bits or

    4 octets. If the pointer is greater than the length, the recorded route data area is full.

    The originating host must compose this option with a large enough route data area to hold all the

    address expected. The size of the option does not change due to adding addresses. The initial

    contents of the route data area must be zero.

    When an internet module routes a datagram it checks to see if the record route option is present.

    If it is, it inserts its own internet address as known in the environment into which this datagram is

    being forwarded into the recorded route begining at the octet indicated by the pointer, andincrements the pointer by four.

    If the route data area is already full (the pointer exceeds the length) the datagram is forwardedwithout inserting the address into the recorded route. If there is some room but not enough room

    for a full address to be inserted, the original datagram is considered to be in error and is

    discarded. In either case an ICMP parameter problem message may be sent to the source host.

    Not copied on fragmentation, goes in first fragment only. Appears at most once in a datagram.

    Stream Identifier+--------+--------+--------+--------+

    |10001000|00000010| Stream ID |

    +--------+--------+--------+--------+

    Type=136 Length=4

  • 7/31/2019 iSCSI Notes

    22/40

    iSCSI

    22

    This option provides a way for the 16-bit SATNET stream identifier to be carried through

    networks that do not support the stream concept.

    Must be copied on fragmentation. It appears at most once in a datagram.

    Internet Timestamp+------------+--------+--------+--------+

    |01000100| length |pointer |oflw|flg|

    +------------+--------+--------+--------+| internet address |

    +------------+--------+--------+--------+

    | timestamp |+------------+--------+--------+--------+

    | . |

    .

    .

    Type = 68

    The Option Length is the number of octets in the option counting the type, length, pointer, andoverflow/flag octets (maximum length 40).

    The Pointer is the number of octets from the beginning of this option to the end of timestamps

    plus one (i.e., it points to the octet beginning the space for next timestamp). The smallest legalvalue is 5. The timestamp area is full when the pointer is greater than the length.

    The Overflow (oflw) [4 bits] is the number of IP modules that cannot register timestamps due tolack of space.

    The Flag (flg) [4 bits] values are

    0 -- time stamps only, stored in consecutive 32-bit words,

    1 -- each timestamp is preceded with internet address of the registering entity,

    3 -- the internet address fields are prespecified. An IP module only registers its timestamp

    if it matches its own address with the next specified internet address.

    The Timestamp is a right-justified, 32-bit timestamp in milliseconds since midnight UT. If the

    time is not available in milliseconds or cannot be provided with respect to midnight UT then any

    time may be inserted as a timestamp provided the high order bit of the timestamp field is set to

    one to indicate the use of a non-standard value.

    The originating host must compose this option with a large enough timestamp data area to hold

    all the timestamp information expected. The size of the option does not change due to adding

    timestamps. The initial contents of the timestamp data area must be zero or internet address/zeropairs.

  • 7/31/2019 iSCSI Notes

    23/40

    iSCSI

    23

    If the timestamp data area is already full (the pointer exceeds the length) the datagram is

    forwarded without inserting the timestamp, but the overflow count is incremented by one.

    If there is some room but not enough room for a full timestamp to be inserted, or the overflow

    count itself overflows, the original datagram is considered to be in error and is discarded. In

    either case an ICMP parameter problem message may be sent to the source host.

    The timestamp option is not copied upon fragmentation. It is carried in the first fragment. It

    appears at most once in a datagram.

    Padding: variable

    The internet header padding is used to ensure that the internet header ends on a 32 bit boundary.The padding is zero.

    Interfaces: The functional description of user interfaces to the IP is, at best, fictional, sinceevery operating system will have different facilities. Consequently, we must warn readers that

    different IP implementations may have different user interfaces. However, all IPs must provide acertain minimum set of services to guarantee that all IP implementations can support the sameprotocol hierarchy. This section specifies the functional interfaces required of all IP

    implementations.

    Internet protocol interfaces on one side to the local network and on the other side to either a

    higher level protocol or an application program. In the following, the higher level protocol or

    application program (or even a gateway program) will be called the "user" since it is using the

    internet module. Since internet protocol is a datagram protocol, there is minimal memory or statemaintained between datagram transmissions, and each call on the internet protocol module by the

    user supplies all information necessary for the IP to perform the service requested.

  • 7/31/2019 iSCSI Notes

    24/40

    iSCSI

    24

    3. Transport Control ProtocolTransport Control Protocol is a connection oriented, end-to-end reliable protocol designed to fit

    into a layered hierarchy of protocols which support multi-network applications. The internet

    datagram envelopes provides a means for addressing source and destination TCPs in differentnetworks. The IP also deals with any fragmentation or reassembly of the TCP segments required

    to archive transport and delivery through multiple networks and interconnecting gateways. IPalso carries information on the precedence, security classification and compartmentation of theTCP segments, so this information can be communicated end-to-end across multiple networks.

    Operation: The primary purpose of TCP is to provide reliable, securable logical circuit orconnection service between pairs of processes. Basic Data Transfer, Reliability, Flow Control,

    Multiplexing, Connections, Precedence and Security are the required operations of TCP to

    provide service on top of a less reliable internet communication system.

    Basic Data Transfer: The TCP is able to transfer a continuous stream of octets in each direction

    between its users by packing some number of octets into segments for transmission through the

    internet communication system. The TCPs decide when to block and forward data at their ownconvenience. Sometimes users need to be sure that all the data they have submitted to the TCP

    has been transmitted or not. A push function causes the TCPs to promptly forward and deliverdata up to the point to the receiver. Push is not visible to receiving user and also does not supply

    a record boundary marker.

    Reliability: The TCP must recover from data that is damaged, lost, duplicated, or delivered out oforder by the internet communication system. This is achieved by assigning a sequence number to

    each octet transmitted, and requiring a positive acknowledge (ACK) from the receiving TCP. If

    the ACK is not received with in a timeout interval, the data is retransmitted. At the receiver, the

    sequence numbers are used to correctly order segments that may be received out of order and to

    eliminate duplicates. Damage is handled at the receiver by checking the checksum value that iswith each segment.

    Flow Control: TCP provides a means for the receiver to govern the amount of data sent by the

    sender. This is done by returning a window with every ACK indicating a range of acceptablesequence numbers beyond the last segment successfully received. This window helps the

    transmitter to determine the maximum allowed octets that can be sent before getting permissionfor the next transfer.

    Multiplexing: To allow for many processes within a single HOST to use TCP simultaneously for

    communication. TCP provides a set of addresses or ports to achieve this feature. Concatenation

    with the network and host addresses forms a socket. A pair of sockets uniquely identifying itstwo sides is a connection. That is, a socket may be simultaneously used in multiple connections.

    Connections: TCP needs to initialize and maintain certain status information for each data

    stream. The combination of such information, sockets, sequence numbers and window sizes is

    called a connection. If two processes wish to communicate, their TCPs mus t first establish aconnection and once communication is finished processes should terminate or close the

    connection to free the resources.

  • 7/31/2019 iSCSI Notes

    25/40

    iSCSI

    25

    Precedence and Security: The users of TCP may indicate their communication with requiredsecurity and precedence. When unused by user a default values take into effect.

    Model of Operation: Processes transmit data by calling on the TCP and passing buffers of

    data as arguments. TCP packages the data from these buffers into segments and calls on theinternet module to transmit each segment to the destination TCP. The receiving TCP places thedata from a segment into the receiving user's buffer and notifies the receiving user.

    After it comes to internet module the data is TCP segments are embedded in the data part of the

    internet datagram. From then followed according to the internet protocol standards for routing toreach the destination. A destination internet module unwraps the segment from the datagram

    (after reassembling the datagram, if necessary) and passes it to the destination TCP.

    One important feature is the type of service. This provides information to the gateway (orinternet module) to guide it in selecting the service parameters to be used in traversing the next

    network. Included in the type of service information is the precedence of the datagram.Datagrams may also carry security information to permit host and gateways that operate inmultilevel secure environments to properly segregate datagrams for security considerations.

    Interfaces: The TCP interfaces on one side to user or application processes and on the otherside to a lower level protocol such as Internet Protocol.

    The interface between an application process and the TCP is nothing but a set of calls much like

    the calls an OS provides to an application process. For example, there are calls have parameters

    for passing the address, type of service, precedence, security, and other control information to

    open and close a connection and to send and receive data on established connections.

    The interface between TCP and lower level protocol is essentially unspecified except that it is

    assumed there is a mechanism to pass information to each other asynchronously.

  • 7/31/2019 iSCSI Notes

    26/40

    iSCSI

    26

    Functional Specification

    Header Format: TCP segments are sent in the internet datagrams. The IP header carries severalinformation fields, including the source and destination host addresses. The TCP supplies

    information specific to the TCP.

    TCP Header Format0 1 2 3

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Source Port | Destination Port |

    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    | Sequence Number |

    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Acknowledgment Number |

    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    | Data | |U|A|P|R|S|F | || Offset | Reserved |R|C| S|S|Y|I | Window |

    | | |G|K|H|T|N|N| |

    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    | Checksum | Urgent Pointer |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    | Options | Padding |

    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| data |

    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    TCP Header Format

    Note that one tick mark represents one bit position.

    Source Port: 16 bits

    Indicate the source port number in 16 bits.

    Destination Port: 16 bits.

    Indicate the destination port number in 16 bits.

    Sequence Number: 32 bits

    The sequence number of the first data octet in this segment (except when SYN is present). IfSYN is present the sequence number is the initial sequence number (ISN) and the first data octet

    is ISN+1.

    Acknowledgment Number: 32 bits

    If the ACK control bit is set this field contains the value of the next sequence number the sender

    of the segment is expecting to receive. Once a connection is established this is always sent.

  • 7/31/2019 iSCSI Notes

    27/40

  • 7/31/2019 iSCSI Notes

    28/40

  • 7/31/2019 iSCSI Notes

    29/40

    iSCSI

    29

    Maximum Segment Size

    +-----------+-----------+---------+--------+|00000010|00000100| max seg size |

    +-----------+-----------+---------+--------+

    Kind=2 Length=4

    Maximum Segment Size Option Data: 16 bits

    If this option is present, then it communicates the maximum receive segment size at the TCP

    which sends this segment. This field must only be sent in the initial connection request (i.e., insegments with the SYN control bit set). If this option is not used, any segment size is allowed.

    Padding: variableThe TCP header padding is used to ensure that the TCP header ends and data begins on a 32 bit

    boundary. The padding is composed of zeros.

  • 7/31/2019 iSCSI Notes

    30/40

    iSCSI

    30

    4. Internet Small Computer Systems Interface

    SCSI Concepts: The SCSI Architecture Model-2 [SAM-2] describes in detail the architectureof the SCSI family of I/O Protocols. SCSI is a family of interfaces for requesting services from

    I/O devices, including hard drives, tape drives, cd and dvd drivers, printers and scanners. In SCSI

    terminology, an I/O device is called a logical unit (LU). SCSI is cli ent-server architecture.Clients are called initiators who issue SCSI commands to request services from logical unitsof a server known as a target. These servers accept SCSI commands and process them.

    A SCSI transport maps the client-server SCSI protocol to a specific interconnect. For the SCSI

    transport lie initiators as one endpoint and target as the other endpoint. A target can contain

    multiple logical units (LUs) and each has an address within a target called Logical Unit Number

    (LUN). A SCSI task is a SCSI command or possibly a linked set of SCSI commands. Some LUssupport multiple pending (queued) tasks, but the queue of tasks is managed by the logical unit.

    The target uses an initiator provided task tag to distinguish between tasks. Only one command

    in a task can be outstanding at any given time.

    Each SCSI commands results in an optional data phase and a required response phase. In the data

    phase, information can travel from the initiator to target (e.g., write), target to initiator (e.g.,read), or in both directions. In the response phase, the target returns the final status of the

    operation, including any errors. Command Descriptor Blocks (CDBs) are the data structures used

    to contain the command parameters that an initiator sends to a target. The CDB content and

    structure is defined by [SAM2] and device-type specific SCSI standards.

    iSCSI Concepts and Functional Overview: The iSCSI protocol is a mapping of the SCSIremote procedure invocation model (see [SAM2]) over the TCP protocol. SCSI commands are

    carried by iSCSI requests and SCSI responses and status are carried by iSCSI responses. The

    initiator and target devices carry the communication between them using iSCSI protocol dataunit (iSCSI PDU).

    ISCSI allows a phase-collapse which means there is no handshake for each transaction as in

    other protocols. To all the initiators previous requests the target provides an updated responsewhen it sends data to the initiator for conserving the bandwidth.

    iSCSI transfer direction is defined with respect to the initiator. Transfers from initiator to target

    are termed as outbound or outgoing, while from a target to initiator it is named as inbound orincoming with respect to initiator. An iSCSI task is an iSCSI request for which a response is

    expected.

    Layers and Sessions: This layering model is used to specify the way in which initiator ortarget do the transmission and receive the PDUs.

    a) The SCSI layer builds/receives SCSI CDBs and passes/receives them with remainingcommand execute parameters ([SAM2]) to/from

    b) The iSCSI layer that builds/receives iSCSI PDUs and relays/receives them to/from one ormore TCP connections; the group of connections from an initiator- target session.

  • 7/31/2019 iSCSI Notes

    31/40

    iSCSI

    31

    Communication between the initiator and target occurs over one or more TCP connections. The

    TCP connection carries iSCSI PDUs that holds control messages, SCSI commands, parametersand data. A group of TCP connections that link an initiator with a target form a session (SCSI

    I_T nexus, see SCSI Arch. Model). iSCSI targets and initiators MUST support at least one TCP

    connection and MAY support several connections in a session. For Error recovery purpose, a

    single active connection in a session SHOULD support two connections during recovery.

    A session is defined by a session ID that is composed of an initiator part and a target part. TCP

    connections can be added and removed from a session. Each connection within a session isidentified by a connection ID (CID). Across all connections initiator sees one target image anda target seems one initiator image within a session. Initiator identifying elements, such as the

    Initiator Task Tag, are global across the session regardless of the connection on which they aresent or received.

    Ordering and iSCSI Numbering: iSCSI uses Command numbering and Status numberingschemes and a Data sequencing schemes.

    Command numbering is session-wide and is used for ordered command delivery over multiple

    connections and is also used for command flow control over a session.

    Status numbering is per connection and is used to enable missing status detection and recovery.

    Data sequencing is per command or part of a command (R2T triggered sequence) and is used to

    detect missing data and/or R2T PDUs due to header digest errors.

    Typically, fields in the iSCSI PDUs communicate the Sequence Numbers between the initiatorand target. During periods when traffic on a connection is unidirectional, iSCSI NOP-Out/In

    PDUs may be utilized to synchronize the command and status ordering counters of the target and

    initiator.

    Command Numbering and Acknowledging: iSCSI performs ordered command deliverywithin a session and so all commands from initiator to target PDUs are numbered. A set of Task

    Management Function request are performed on any iSCSI task. All the tasks are identified by

    the Initiator Task Tag for the life of the task.

    The command number is carried by the iSCSI PDU as CmdSN (Command Sequence Number).

    The numbering is session-wide. Outgoing iSCSI PDUs carry this number and is allocated by the

    iSCSI initiator with a 32-bit unsigned counter. Commands meant for immediate delivery are

    marked with an immediate delivery flag; they MUST also carry the current CmdSN and thisCmdSN is not changed. This CmdSN starts with the first login request on the first connection of

    a session and is incremented by 1 for every non-immediate command issued afterwards.

    If immediate delivery is used with task management commands, these commands may reach the

    target before the tasks on which they are supposed to act. However their CmdSN serves as a

    marker of their position in the stream of commands. The initiator and target MUST ensure thatthe task management commands act as specified by [SAM2]. The number of commands used for

  • 7/31/2019 iSCSI Notes

    32/40

    iSCSI

    32

    immediate delivery is not limited and their delivery for execution is not acknowledged through

    the numbering scheme. Immediate commands MAY be rejected by the iSCSI target layer due toa lack of resources. An iSCSI target MUST be able to handle at least one immediate task

    management command and one immediate non-task-management iSCSI command per

    connection at any time.

    On any connection, the iSCSI initiator MUST send the commands in increasing order of

    CmdSN, except for commands that are retransmitted due to digest error recovery and connection

    recovery.

    For the numbering mechanism, the initiator and target maintain the following three variables for

    each session:

    - CmdSN: the current Command Sequence Number, advanced by 1 on each commandshipped except for commands marked for immediate delivery. CmdSN always contains

    the number to be assigned to the next Command PDU.

    -

    ExpCmdSN: the next expected command by the target. The target iSCSI layer sets theExpCmdSN to the largest non-immediate CmdSN that it can deliver for execution plus 1.

    The target sends response (ACKs) to initiator up to, but not including, this number.- MaxCmdSN: the maximum number to be shipped. The queuing capacity of the receiving

    iSCSI layer is MaxCmdSNExpCmdSN + 1.

    The initiators ExpCmdSN and MaxCmdSN are derived from target-to-initiator PDU fields. Thetarget MUST NOT transmit a MaxCmdSN that is less than ExpCmdSN 1. The target MUST

    ignore any non-immediate command outside of this range or n on-immediate duplicates within

    the range.

    The CmdSN carried by immediate commands may lie outside the ExpCmdSN to MaxCmdSN

    range. This is obvious because if the previous CmdSN for a non-immediate command is same as

    the MaxCmdSN sent the window is closed and the current CmdSN carried for a immediatecommand will lie outside of the ExpCmdSN and MaxCmdSN range.

    MaxCmdSN and ExpCmdSN fields are processed by the initiator as follows:

    If the PDU MaxCmdSN is less than the PDU ExpCmdSN1, they both are ignored. If the PDU MaxCmdSN is greater than the local MaxCmdSN, it updates the local value. If the PDU ExpCmdSN is greater than the local ExpCmdSN, it updates the local value.This sequence is required because updates may arrive out of order (e.g., when updates are sent

    on different TCP connections).

    iSCSI initiators and targets MUST support the command numbering scheme. A numbered iSCSI

    request will not change its allocated CmdSN, regardless of the number of times and

    circumstances in which it is reissued. At the target, CmdSN is only relevant when the commandhas not created any state related to its execution (execution state); afterwards, CmdSN becomes

    irrelevant. Testing for the execution state (represented by identifying the Initiator Task tag)

    MUST precede any other action at the target. If no execution state is found, it is followed byordering and delivery. If execution state is found, it is followed by delivery.

  • 7/31/2019 iSCSI Notes

    33/40

    iSCSI

    33

    If an initiator issues a command retry for a command with CmdSN R on a connection when the

    session CmdSN value is Q, it MUST NOT advance the CmdSN past R + 2**31 -1 unless theconnection is no longer operational (i.e., it has returned to the FREE state), the connection has

    been reinstated, or a non-immediate command with CmdSN equal or greater than Q was issues

    subsequent to the command retry on the same connection and the reception of that command is

    acknowledged by the target.

    A target MUST NOT issue a command response or Data-In PDU with status before

    acknowledging the command. However, the acknowledgement can be included in the responseor Data-In PDU.

    Response/Status Numbering and Acknowledging: Responses in transit from the target tothe initiator are numbered. The StatSN (Status Sequence Number) is used for this purpose.

    StatSN is a counter maintained per connection. ExpStatSN is used by the initiator to

    acknowledge status. The status sequence number space is 32-bit unsigned-integers and thearithmetic operations are regular mod(2**32) arithmetic. Status numbering starts with the Login

    response to the first Login request of the connection. The Login response includes an initialvalue for status numbering (any initial value is valid).

    To enable command recovery, the target MAY maintain enough state information for data and

    status recovery after a connection failure. A target doing so can safely discard all of the state

    information maintained for recovery of a command after the delivery of the status for thecommand (numbered StatSN) is acknowledged through ExpStatSN. A large absolute difference

    between StatSN and ExpStatSN may indicate a failed connection. Initiators MUST undertake

    recovery actions if the difference is greater than an implementation defined constant that MUST

    NOT exceeds 2**31-1. Initiators and Targets MUST support the response-numbering scheme.

    Data Sequencing: Data and R2T PDUs transferred as part of some command execution MUSTbe sequenced. The DataSN field is used for data sequencing. For input (read) data PDUs, dataSNstarts with 0 for the first data PDU of an input command and advances by 1 for each subsequent

    data PDU. For output data PDUs, DataSN starts with 0 for the first data PDU of a sequence andadvances by 1 for each subsequent data PDU. R2Ts are also sequenced per command. It also

    advances for each subsequent R2T.

    Unlike command and status, data PDUs and R2Ts are not acknowledged by a field in regularoutgoing PDUs. Data-In PDUs can be acknowledged on demand by a special form of the

    SNACK PDU. Data and R2T PDUs are implicitly acknowledged by status for the command. The

    DataSN/R2TSN field enables the initiator to detect missing data or R2T PDUs. For any read or

    bidirectional command, a target MUST issue less than 2**32 combined R2T and Data-In PDUs.Any output data sequence MUST contain less than 2**32 Data-Out PDUs.

    iSCSI Login: The purpose of the iSCSI login is to enable a TCP connection for iSCSI use,authentication of the parties, negotiation of the sessions parameters and marking of the

    connection as belonging to an iSCSI session. The session is used to identify to a target all the

    connections with a given initiator that belong to the same I_T nexus. The targets listen on a well-known TCP port or other TCP port for incoming connections. Initiators begin the login process

  • 7/31/2019 iSCSI Notes

    34/40

    iSCSI

    34

    by connecting to one of these TCP ports. As part of the login process, the initiator and target

    SHOULD authenticate each other and MAY also set security association protocol for the session.

    To protect the TCP connection, an IPsec security association MAY be established before the

    Login request. The iSCSI Login Phase is carried through Login requests and responses. Once all

    authentication and operational parameters have been set, the session transactions to the FULLFEATURE PHASE and the initiator may start to send SCSI commands. The Login PDU

    includes the ISID part of the session ID (SSID). The target protocol group that services the login

    is implied by the selection of the connection endpoint. For a new session, the TSIH is zero. Aspart of the Login Response, the target generates a TSIH for the initiator future communication.

    During session establishment, the target identifies the SCSI initiator port (the "I" in the "I_Tnexus") through the value pair (InitiatorName, ISID). Any persistent state on the target

    associated with this Initiator port is identified with this value pair. Any state associated with the

    SCSI target port (the T in the I_T nexus) is defined externally by TargetName and portal

    group tag. Before the Full Feature Phase is established, only Login Request and Login Response

    PDUs are allowed.

    A target receiving any PDU except a Login request before the Login phase is started MUSTimmediately terminate the connection on which the PDU is received. If the target receives any

    PDU except a login request, it MUST send a Login Reject (with status invalid during login) as

    response and disconnect. In the same way if an initiator receives any PDU except a Login

    response, it MUST immediately terminate the connection.

    iSCSI Full Feature Phase: The initiators get authorized from target after a successfulcompletion of the login phase, the iSCSI session is now in the iSCSI Full Feature Phase. Oncethe iSCSI session is established with Full Features the initiator may send SCSI commands and

    data to the various LUs on the target using PDUs.

    An iSCSI connection is not in Full Feature Phase,

    a) When it does not have an established transport connection

    orb) When it has a valid transport connection, but a successful login was not performed or the

    connection is currently logged out.

    Command Connection Allegiance: Connection allegiance means, any iSCSI request issuedover a TCP connection, the corresponding response and/or other related PDU(s) MUST be sent

    over the same connection. If the original connection fails before the command is completed, the

    connection allegiance of the command may be explicitly reassigned to a different transportconnection. Thus, if an initiator issues a READ command, the target MUST send the requested

    data, if any, followed by the status to the initiator over the same TCP connection that was used to

    deliver the SCSI command. Retransmission requests (SNACK PDUs) and the data and status thatthey generate MUST also use the same connection.

    However, consecutive commands that are part of a SCSI linked command-chain task (see[SAM2]) MAY use different connections. Connection allegiance is strictly per-command and not

  • 7/31/2019 iSCSI Notes

    35/40

    iSCSI

    35

    per-task. During the iSCSI Full Feature Phase, the initiator and target MAY interleave unrelated

    SCSI commands, their SCSI Data, and responses over the session.

    Data Transfer Overview: Outgoing SCSI data (initiator to target user data or commandparameters) is sent as either solicited data or unsolicited data. Solicited data are sent in response

    to R2T PDUs. Unsolicited data can be sent as part of an iSCSI command PDU ("immediatedata") or in separate iSCSI data PDUs. Immediate data are assumed to originate at offset0 in

    the initiator SCSI write-buffer (outgoing data buffer). All other Data PDUs have the buffer offsetset explicitly in the PDU header.

    An initiator may send unsolicited data up to FirstBurstLength as immediate (up to the negotiatedmax PDU length), in a separate PDU sequence or both. All subsequent data MUST be solicited.

    The maximum length of an individual data PDU or the immediate-part of the first unsolicited

    burst MAY be negotiated at login. A target MAY separately enable immediate data (through the

    ImmediateData key) without enabling the more general (separate data PDUs) form of unsoliciteddata (through the InitiatorR2T key). Immediate data is meant to reduce the protocol overhead.

    An iSCSI initiator MAY choose not to send unsolicited data, only immediate data or

    FirstBurstLength bytes of unsolicited data with a command. If any non-immediate unsoliciteddata is sent, the total unsolicited data MUST be either FirstBurstLength, or all of the data if the

    total amount is less than the FirstBurstLength. If initiators send unsolicited data to a target that

    operates in R2T mode is an error. A target SHOULD NOT silently discard data and then requestretransmission through R2T. Initiators SHOULD NOT keep track of the data transferred to or

    from the target because SCSI targets perform residual count calculation to check how much data

    was actually transferred to or from the device by a command.

    SCSI data packets are matched to their corresponding SCSI commands by using tags specified in

    the protocol. iSCSI initiators and targets MUST enforce some ordering rules. They are

    1. When unsolicited data is used, the order of this data on each connection MUST match theorder in which the commands on that connection are sent.

    2. Command and unsolicited data PDUs may be interleaved on a single connection as long as the

    ordering requirements of each are maintained.3. A target that receives data out of order MAY terminate the session.

    Tags and Integrity Checks: Initiator tags for pending commands are unique initiator-wide for asession. Target tags are not strictly specified by the protocol. It is assumed that target tags are

    used by the target to tag the solicited data. Target tags are generated by the target and echoed

    the initiator. As the Initiator Task Tag is used to identify a task during its execution, the iSCSI

    initiator and target MUST verify that all other fields used in task-related PDUs have values thatare consistent with the values used at the task instantiation based on the Initiator Task Tag. Using

    inconsistent field values is considered a protocol error.

    Task Management: SCSI task management assumes that individual tasks and task groups can be

    aborted solely based on the task tags (for individual tasks) or the timing of the task management

    command (for task group), and that the task management action is executed synchronouslyi.e.,no message involving an aborted task will be seen by the SCSI initiator after receiving the task

  • 7/31/2019 iSCSI Notes

    36/40

    iSCSI

    36

    management response. In iSCSI initiators and targets interact asynchronously over several

    connections.

    iSCSI Connection Termination: An iSCSI connection may be terminated by use of a transportconnection shutdown or a transport reset. Transport reset is assumed to be an exceptional event.

    Graceful TCP connection shutdowns are done by sending TCP FINs. Shutdown SHOULD onlybe initiated only when the connection is not in iSCSI Full Feature Phase. A target MAY

    terminate a Full Feature Phase connection on internal exception events, but it SHOULDannounce the fact through an Asynchronous Message PDU. Connection termination with

    outstanding commands MAY require recovery actions. If a connection is terminated while in

    FULL Feature Phase, connection cleanup is required before starting recovery to avoid receivingstale PDUs after recovery.

    iSCSI Names: Both targets and initiators require names for the purpose of identification. Inaddition, names enable iSCSI storage resources to be managed regardless of location (address).

    An iSCSI node name is also the SCSI device name of an iSCSI device. The iSCSI name of a

    SCSI device is the principle object used in authentication of targets to initiators and initiators totargets. This name is also used to identify and manage iSCSI storage resources. iSCSI namesmust be unique within the operational domain of the end user.

    The operational domain of IP network is worldwide, so the iSCSI names are unique worldwide.To assist naming authorities in the construction of worldwide unique names, iSCSI provides two

    name formats for different types of naming authorities. iSCSI names are associated with iSCSI

    nodes, and not iSCSI network adapter cards, to ensure that the replacement of network adaptercards does not require reconfiguration of all SCSI and iSCSI resources allocation information.

    Some SCSI commands require that protocol-specific identifiers be communicated within SCSI

    CDBs. Each iSCSI initiator can discover the iSCSI Targets Names to which it has access, alongwith their addresses, using SendTargets text requests, or other techniques discussed in [rfc3721].

    iSCSI Name Properties: Each iSCSI node, whether an initiator or target, MUST have an iSCSIname. Initiators and targets MUST support the receipt of iSCSI names of up to the maximum

    length of 223 bytes. The initiators MUST present the iSCSI initiator name and the iSCSI target

    name to which it wishes to connect in the first login request of a new session or connection. Theonly exception is when a discovery session is to be established. In this case iSCSI Initiator name

    is still required, but the iSCSI target name MAY be omitted.

    iSCSI names have the following properties:a) iSCSI names are globally unique. No two initiators or targets can have the same name.

    b) iSCSI name are permanent. An iSCSI initiator node or target node has the same name for its

    lifetime.c) iSCSI names do not imply a location or address. An iSCSI change of address does not imply a

    change of name.

    d) iSCSI names do not rely on a central name broker; the naming authority is distributed.

    e) iSCSI names support integration with existing unique naming schemes.

  • 7/31/2019 iSCSI Notes

    37/40

    iSCSI

    37

    f) iSCSI names rely on existing naming authorities. iSCSI does not create any new naming

    authority.

    The encoding of an iSCSI name has the following properties:

    a) iSCSI names have the same encoding method regardless of the underlying protocols.

    b) iSCSI names are relatively simple to compare. The algorithm for comparing two iSCSI namesfor equivalence does not rely on an external server.

    c) iSCSI names are composed only of displayable characters with no whitespaces in iSCSI

    names. iSCSI names allow the use of international character sets but are not case sensitive.d) iSCSI names may be transported using both binary and ASCII-based protocols.

    The iSCSI name is designed to fulfill the functional requirements for Uniform Resource Names(URN) [rfc1737]. For example, it is required that the name have a global scope, be independent

    of address or location and be persistent and globally unique. Names must be extensible and

    scalable with the use of naming authorities.

    iSCSI Name Encoding: An iSCSI name MUST be a UTF-8 encoding of a string of Unicodecharacters with the following properties:- It is in Normalization Form C (UNICODE).- It only contains characters allowed by the output of iSCSI stringrep template (rfc3722).- The following characters are used for formatting iSCSI names:

    a) dash (- = U+002d)

    b) dot (.=U+002e)

    c) colon (:=U+003a)

    - The UTF-8 encoding of the name is not larger than 223 bytes.The stringrep process is described in [rfc3454]; iSCSIs use of the stringprep process is described

    in [rfc3722]. Stringprep method is designed by the International Domain Name (IDN) working

    group to translate human-typed strings into a format that can be compared as opaque strings.Strings MUST NOT include spacing, punctuations, diacritical marks or other characters that

    could get in the way of readability. The stringprep process also converts strings into equivalent

    strings of lower-case characters.

    The stringprep process does not need to be implemented if the names are only generated using

    numeric and lower-case (any character set) alphabetic characters. Once iSCSI names encoded inUTF-8 are normalized they may be safely compared byte-for-byte.

    iSCSI Name Structure: An iSCSI name consists of two parts a type designator followed by aunique name string. The iSCSI name does not define any new naming authorities. iSCSI supportstwo existing ways of defining naming authorities: an iSCSI-Qualified Name, using domain

    names to identify a naming authority, and the EUI format, where the IEEE Registration

    Authority assists in the formation of worldwide unique names (EUI-64 format), in ASCII-encoded hexadecimal. The type designator strings currently defined are: iqn. and eui.

    Type iqn. (iSCSI Qualified Name): This iSCSI name type can be used by any organization thatowns a domain name. This naming format is useful when an end user or service provider wishes

  • 7/31/2019 iSCSI Notes

    38/40

    iSCSI

    38

    to assign iSCSI names for targets and/or initiator. To generate names of this type, the person or

    organization generating the name must own a registered domain name. This domain name doesnot have to be active, and does not have to resolve to an address; it just needs to be reserved to

    prevent others from generating iSCSI names using the same domain name.

    Since a domain name can expire, be acquired by another entity, or may be used to generateiSCSI names by both owners, the domain name must be additionally qualified by a date during

    which the naming authority owned the domain name. For this reason, a date code is provided as

    part of the iqn. format.

    The iSCSI qualified name string consists of:

    - The string iqn., used to distinguish these names from eui. formatted names.- The date code, in YYYY-MM format. The date MUST be a date during which the

    naming authority owned the domain name used in this format, and SHOULD be the first

    month in which the domain name was owned by this naming authority at 00:01 GMT of

    the first day of the month.

    -

    A dot .- The reserved domain name of the naming authority (person or organization) creating thisiSCSI name.

    - An optional, colon (:) prefixed, string within the character set and length boundaries thatthe owner of the domain name deems appropriate. This may contain product types, serial

    numbers, host identifiers or software keys (e.g, it may include colons to separate

    organization boundaries).For example, Example Storage Arrays, Inc. might own the domain name example.com.

    The following are examples that may contain the following fields.Naming String defined by

    Type Date Auth "example.com" naming authority

    +---++-------+ +-------------+ +----------------------------------------+

    iqn.2001-04.com.example:storage:diskarrays-sn-a8675309iqn.2001-04.com.example

    iqn.2001-04.com.example:storage.tape1.sys1.xyz

    Type eui. (IEEE EUI-64 format): The IEEE Registration Authority provides a service forassigning globally unique identifiers [EUI]. The EUI-64 format is used to build global identifier

    in other network protocols. For example, Fiber Channel defines a method of encoding it into a

    WorldWideName. The format is eui. Followed by an EUI-64 identifier (16 ASCII-encoded

    hexadecimal digits).

    Example iSCSI name: Type EUI-64 identifie