Multipath TCP: A Comparative Analysiskarticbhargav.weebly.com/uploads/2/4/0/0/24006251/ecen... ·...

Multipath TCP: A Comparative Analysis

by Kartic Bhargrav and Corey Morrison

Introduction

Multipath TCP (MPTCP) is being proposed as a means to exploit multiple connections to the Internet. It is not the first multipath-capable protocol in the Ethernet/IP stack (that distinction likely belongs to SCTP), nor is it the most widely used multipath protocol (that distinction likely belongs to the Linux device-mapper multipath [DM-Multipath] implementation). However, its stated goal is to “be backwards compatible with current, regular TCP, to increase its chances of deployment”, and its current implementation integrates seamlessly with existing application-level system calls and works alongside existing TCP connections [1]. These facts give the protocol the potential to quickly become one of the most widely deployed and most important multipath protocols. In this paper we analyze MPTCP and compare it against the other two protocols listed above (SCTP and DM-Multipath over FC). We attempt to determine the weaknesses and strengths of MPTCP relative to the other two protocols through theoretical and quantitative analysis. Finally, we attempt to demonstrate any weaknesses or strengths that we find using the ns2 simulator and SimSANs (a Fibre Channel network simulator).

Background

Multipath TCP (MPTCP)

MPTCP is currently an experimental protocol defined in RFC 6824. Its stated goal is to exist alongside TCP and to “do no harm” to existing TCP connections, while providing the extensions necessary so that additional paths can be discovered and utilized. Multipath TCP starts and maintains additional TCP connections and runs them as subflows underneath the main TCP connection. Figure 1 gives a quick visualization of this:

                             +-------------------------------+
                             |          Application          |
    +---------------+        +-------------------------------+
    |  Application  |        |             MPTCP             |
    +---------------+        + - - - - - - - + - - - - - - - +
    |      TCP      |        | Subflow (TCP) | Subflow (TCP) |
    +---------------+        +-------------------------------+
    |      IP       |        |      IP       |      IP       |
    +---------------+        +-------------------------------+

Figure 1: Comparison of Standard TCP and MPTCP Protocol Stacks

The IP addresses for these additional subflows are discovered in one of two ways: implicitly, when a host with a free port connects to a known port on the other host, or explicitly, using an in-band message. Each subflow is treated as an individual TCP connection with its own set of congestion control variables. Subflows can also be designated as backup subflows, which do not immediately transfer data but activate when primary flows fail [1,5].

Research has shown that running standard TCP congestion control, as defined in RFC 5681, independently on each subflow does not result in fairness with standard TCP connections if two flows from an MPTCP connection go through the same bottlenecked link. As such, there is a great deal of ongoing research into alternative congestion control schemes specifically for multipath protocols [18].
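The fairness concern can be illustrated with simple arithmetic. The sketch below assumes an idealized bottleneck that divides capacity equally among every flow running its own independent congestion control; the function name and the equal-share model are illustrative assumptions and are not taken from RFC 6824 or [18].

```python
from fractions import Fraction
from typing import Tuple


def bottleneck_shares(mptcp_subflows: int, tcp_flows: int,
                      coupled: bool) -> Tuple[Fraction, Fraction]:
    """Return (share of the whole MPTCP connection, share of each TCP flow).

    Idealized assumption: the bottleneck splits capacity equally among
    competitors. Uncoupled subflows each count as one competitor, while
    a coupled MPTCP connection counts as a single competitor.
    """
    if mptcp_subflows < 1 or tcp_flows < 0:
        raise ValueError("need at least one subflow and no negative flow count")
    mptcp_competitors = 1 if coupled else mptcp_subflows
    per_competitor = Fraction(1, mptcp_competitors + tcp_flows)
    return per_competitor * mptcp_competitors, per_competitor


# Two subflows and one regular TCP flow on the same bottleneck.
uncoupled = bottleneck_shares(2, 1, coupled=False)
coupled = bottleneck_shares(2, 1, coupled=True)
```

Under this model the uncoupled MPTCP connection takes two thirds of the link and leaves the regular TCP flow one third, while a coupled connection splits the link evenly with it.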

Stream Control Transmission Protocol (SCTP)

SCTP is a transport layer protocol in the TCP/IP stack (similar to TCP and UDP). It is message-oriented like UDP, but also ensures reliable, in-sequence transport of messages with congestion control like TCP. It uses multihoming to establish multiple redundant paths between two hosts. In its current specification, SCTP is designed to transfer data on one pair of IP addresses at a time, while the redundant pairs are used for failover and for path health or control messages [14]. However, significant research is being done to allow SCTP to use multiple paths concurrently as needed [13].

SCTP requires that endpoint IP addresses are provided to the protocol at initialization. It does not include any way for endpoints to communicate possible other paths with each other. Ports must also connect in such a way that no port on either host is used more than once for the connection.
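The one-path-at-a-time behaviour described above can be sketched as follows. This is a minimal illustration and not SCTP's actual path management: the `Path` type, the `select_path` helper, and the example addresses are hypothetical, and real implementations decide that a path is down using heartbeats and error counters.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Path:
    local: str          # local IP address supplied at initialization
    remote: str         # remote IP address supplied at initialization
    alive: bool = True  # assumed to be maintained by path health checks


def select_path(paths: List[Path]) -> Optional[Path]:
    """Return the first path still alive; the list is ordered primary first."""
    for path in paths:
        if path.alive:
            return path
    return None


paths = [Path("10.0.0.1", "10.0.1.1"), Path("10.0.2.1", "10.0.3.1")]
primary = select_path(paths)   # data flows on the primary pair only
paths[0].alive = False         # the primary path fails
fallback = select_path(paths)  # traffic moves to the redundant pair
```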

SCTP is currently not in widespread use, and as such routers and firewalls may not route SCTP packets properly. In the absence of native SCTP support in operating systems, it is possible to tunnel SCTP over UDP and to map TCP API calls to SCTP ones.

Fibre Channel (FC)

As defined by the FC-PH standard: “(FC) is logically a bidirectional point-to-point serial data channel, structured for high performance capability. Physically, the Fibre Channel can be an interconnection of multiple communication points...interconnected by a switching network, called a Fabric, or a point-to-point link” [2]. It is important to note that this standard is completely distinct from Ethernet/TCP/IP and the OSI stack. It is controlled by a completely different standards body and there is no interoperability between Ethernet and Fibre Channel, though higher layers of one stack can be (and commonly are) encapsulated within the other.

FC was designed to host multiple different higher-level protocols such as HIPPI, SCSI, and IP [2]. As a result, its base hardware and fabric protocols (which some might equate to layer-2 Ethernet protocols) include a number of innovations which are usually reserved for higher layers in the TCP/IP stack.

One such innovation is the inclusion of two hardware identifiers in each Fibre Channel port: one 64-bit identifier for the port itself, called the World Wide Port Name (WWPN), and a second 64-bit identifier for the device that the port is attached to, called the World Wide Node Name (WWNN). This second identifier is commonly used to logically associate multiple WWPNs in a fabric together, for instance in the case of a multiport storage array that is used to serve data to dozens if not hundreds of different servers [7].

A second innovation is fabric-level discovery services. Upon connection to the Fibre Channel network, all devices are required to register with a centralized name server on a port-by-port basis. This name server resides inside the network hardware and keeps track of the devices logged into the network. The Fibre Channel fabric also informs devices when other devices enter and leave the fabric (either gracefully or abruptly) using a State Change Notification construct so that higher-level applications can act appropriately [6].

A third innovation is the use of two different frame identifiers: a sequence number and an exchange ID. Every frame in Fibre Channel is sent with a unique sequence number, and a single exchange ID represents a set of one or more sequence numbers. Multiple concurrent exchange IDs can be used to achieve full bandwidth utilization between two FC endpoints. These identifiers are then used by higher-level protocols (in this case the FC link services and the SCSI over FC protocol) to reorder and relate data received in a packet [7].

One last construct that bears mentioning is the flow control mechanism used by the SCSI over FC protocol, which uses a buffer-to-buffer credit system to make sure that no two ports overload each other’s receive buffers and cause packet loss [7].
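The buffer-to-buffer credit idea can be sketched as a simple counter on the sending port. This is a simplified model under stated assumptions, not the behaviour of any particular HBA or switch: the `CreditedLink` class is hypothetical, and the only FC detail it relies on is that the receiver returns a credit (an R_RDY primitive) each time it frees a receive buffer.

```python
class CreditedLink:
    """Sender-side view of a link governed by buffer-to-buffer credits."""

    def __init__(self, credits: int) -> None:
        if credits < 1:
            raise ValueError("a link needs at least one credit")
        self.max_credits = credits  # receive buffers advertised at login
        self.credits = credits      # frames we may still send right now

    def try_send(self) -> bool:
        """Send one frame if a credit is available; otherwise hold it back."""
        if self.credits == 0:
            return False  # receiver has no free buffer, so nothing is dropped
        self.credits -= 1
        return True

    def receiver_ready(self) -> None:
        """The receiver freed a buffer and returned one credit."""
        self.credits = min(self.credits + 1, self.max_credits)


link = CreditedLink(credits=2)
sent = [link.try_send() for _ in range(3)]  # third frame must wait
link.receiver_ready()
sent_after_credit = link.try_send()
```

Because the sender stops as soon as it runs out of credits, frames wait at the source instead of being dropped at a full receive buffer, which is the property the text contrasts with loss-driven congestion control.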

Linux Device Mapper Multipathing

The Linux DM-Multipath “provides input-output (I/O) failover and load-balancing within Linux for block devices”. In this capacity, the multipath daemon identifies different logical paths to the same physical disk, groups these paths together into a single logical disk, and presents this logical disk to the operating system as a single device which can be written to and read from via the Linux SCSI interfaces. The multipath daemon then transparently selects a particular pair of Fibre Channel initiator (source) and target (destination) ports to write the blocks of data, and transparently handles load balancing and events like port failures on the initiator or target [3].

In the DM-Multipath implementation, the two hardware identifiers (WWNN and WWPN) are used in conjunction with SCSI inquiry commands to verify that two ports on a device are actually presenting the same disk to the operating system before the paths are logically combined and presented as a single disk device to the OS. As such, all existing applications can make SCSI calls to that single logical disk device using existing OS disk system calls with no knowledge of the DM-Multipath package, the underlying SCSI system, or the FC network upon which it all runs [3]. DM-Multipath also provides a user space configuration file through which system administrators can set load balancing and failover policies on the system.
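The path-grouping step can be sketched as follows. This is an illustrative model only and does not reproduce the multipath daemon's code: the record layout, the `group_paths` helper, and the example port and disk names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (initiator port, target port, disk identifier reported by a SCSI inquiry)
PathRecord = Tuple[str, str, str]


def group_paths(paths: List[PathRecord]) -> Dict[str, List[Tuple[str, str]]]:
    """Group initiator/target port pairs by the disk they lead to."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for initiator, target, disk_id in paths:
        grouped[disk_id].append((initiator, target))
    return dict(grouped)


discovered = [
    ("hba0", "array-port-a", "disk-1"),
    ("hba1", "array-port-b", "disk-1"),  # second path to the same disk
    ("hba0", "array-port-a", "disk-2"),
]
logical_disks = group_paths(discovered)
```

Each key stands for one logical disk presented to the OS, and the list under it holds the port pairs the daemon may choose between.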

Theoretical Comparison

MPTCP vs. SCTP

Handshakes

Multipath TCP uses a three-way handshake to initialize a new flow, the same way as basic TCP. SCTP, however, uses a four-way handshake for its connection setup, as shown in Figure 2. As such, SCTP places greater emphasis on authentication, using explicit verification tags. This is crucial in protecting systems against SYN flooding attacks, which are a persistent problem in TCP-based communications.

Figure 2: TCP Handshake (right) vs SCTP Handshake (left).
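One reason the four-way exchange (INIT, INIT ACK, COOKIE ECHO, COOKIE ACK) resists flooding is that the server can hand its state back to the client in a signed cookie instead of storing it. The sketch below illustrates that idea only: the secret, the cookie layout, and the helper names are hypothetical, and a real SCTP state cookie also carries a timestamp and lifetime that are omitted here.

```python
import hashlib
import hmac
from typing import Tuple

SECRET = b"hypothetical-server-secret"  # known only to the server


def make_cookie(peer: str, initiate_tag: int) -> Tuple[bytes, bytes]:
    """Build the cookie sent in INIT ACK; the server keeps no state."""
    payload = f"{peer}|{initiate_tag}".encode()
    mac = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return payload, mac


def verify_cookie(payload: bytes, mac: bytes) -> bool:
    """Check the cookie echoed back in COOKIE ECHO before allocating state."""
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, mac)


payload, mac = make_cookie("192.0.2.10:5000", 12345)
genuine = verify_cookie(payload, mac)
forged = verify_cookie(b"203.0.113.9:5000|12345", mac)
```

A client that spoofs its source address never receives the cookie and so cannot echo a valid one, which means a flood of INIT messages costs the server no connection state.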

Congestion Control

On a per-subflow basis, MPTCP and SCTP both act either identically or similarly to TCP, using slow start algorithms and congestion windows for end-to-end flow control on a path. Additionally, MPTCP and CMT-SCTP both couple all subflow congestion windows together under a global congestion window. How to use these parameters to decide which subflow to send on is a non-trivial problem and a constant subject of research [15].

However, MPTCP can have significantly more flows to manage, as MPTCP allows for fully meshed connections, unlike even CMT-SCTP. See Figure 3 for an example of a fully meshed connection in MPTCP as opposed to the parallel connections in SCTP. In this picture, each host has two ports, but the protocols set up connections between the ports in different ways. In SCTP, these connection pairs may be explicitly defined, while in MPTCP it is up to the protocol to detect and use the correct ones. As such, choosing efficient port pairs ahead of time is crucial to the operation of SCTP, and unfortunately this is neither trivial nor done automatically in most implementations. On the plus side, SCTP’s connection scheme means that it does not suffer from the unfairness problem mentioned in the background section on MPTCP.

Figure 3: Connections established in SCTP vs MPTCP
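The difference in the number of flows can be made concrete with a short sketch. The helper names and port labels are hypothetical; the only assumption is that a full mesh pairs every local port with every remote port, while the parallel scheme pairs them one to one.

```python
from itertools import product
from typing import List, Tuple


def full_mesh(local: List[str], remote: List[str]) -> List[Tuple[str, str]]:
    """Every local port paired with every remote port (MPTCP-style)."""
    return list(product(local, remote))


def parallel(local: List[str], remote: List[str]) -> List[Tuple[str, str]]:
    """Each local port paired with exactly one remote port (SCTP-style)."""
    return list(zip(local, remote))


host_a = ["A1", "A2"]
host_b = ["B1", "B2"]
mesh_pairs = full_mesh(host_a, host_b)     # 4 candidate subflows
parallel_pairs = parallel(host_a, host_b)  # 2 paths
```

With two ports per host the mesh already doubles the number of flows to manage, and the gap grows with the product of the port counts.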

Other Notes

As currently defined, SCTP is not designed for concurrent multipath transfer the same way that MPTCP is. Instead, SCTP uses only one path at a time, and it switches to another path only after the current path fails. There has been a fair amount of academic work on an SCTP extension to provide concurrent multipath transfer (CMT-SCTP) [13], and we were able to use one of these implementations in our simulation comparing CMT-SCTP and MPTCP.

MPTCP vs. FC and DM-Multipath

At first glance, Fibre Channel may not seem to be an ideal protocol to use in this comparison. Operating systems do not treat Fibre Channel controllers as network devices and are not involved in the FC network stack. FC’s primary use in industry is as a high-speed interface used by I/O controllers (mostly SCSI devices). FC’s methods of operation are also fundamentally different from those of IP networks: FC attempts to provide lossless and reliable transmission, while IP transmissions are assumed to be unreliable [2][4]. However, when FC is combined with the Linux DM-Multipath implementation and compared against the entire TCP stack, the mappings of common features become apparent.

Figure 4: Mapping of Multipath TCP to FC

There are a number of properties of FC networks and endpoints which invite some comparison between DM-Multipath over FC and MPTCP over Ethernet/IP. Both have the stated goal of providing multiple communication paths between two endpoints in a network by taking advantage of multihoming and multiple addressing. Both also strive to maintain backwards compatibility with existing transport protocols (MPTCP with legacy TCP and DM-Multipath with legacy SCSI). Both attempt to provide either increased throughput or increased resilience in their respective networks, and both attempt to do this automatically and transparently to the user. However, there are some distinct differences in the way this is achieved between the two stacks.

Dual Hardware IDs

One difference is the pair of hardware identifiers present in all FC ports, as described in the background section. Combined with the fabric discovery protocols present in all FC fabrics, this allows the DM-Multipath daemon to discover multiple paths and present them to the OS immediately. These presentations persist as long as the FC ports remain online and logged into the fabric, and tend to remain up as long as the computer is on. This would be akin to MPTCP using ARP or DNS requests on all ports to find additional paths during connection establishment and establishing these initial TCP subflows simultaneously. This does come at some cost: FC host bus adapters provide specialized hardware to offload the processing of these fabric discovery protocols and to store all this port information.

Additionally, FC’s scope is currently limited to datacenter networks (with a maximum of 2^24 devices connected to a fabric at any time), and even in these networks it is a best practice to use zoning to limit a device’s visibility into the SAN (and thus reduce the amount of time it takes to scan for new devices). Today, zoning is not an automatic process, but FC network operators are still able to manage their networks effectively due to the somewhat static nature of Fibre Channel in general. With this in mind, we believe this implementation would not scale well to Internet-sized networks without some modification.

In contrast, MPTCP requires in-band communication on an initial TCP connection to determine these additional ports (both explicit and implicit address advertisement involve the use of an existing connection by one side or the other of the link). This allows MPTCP to be more dynamic and to store connection information only for those ports it needs at any given time, but comes at the cost of speed and possibly utilization and redundancy. This makes it better suited to the more dynamic access patterns of Ethernet networks. However, we believe MPTCP could benefit from adopting extensions so that other address discovery mechanisms for a connection could be better utilized. For instance, an addition to the initial TCP call so that all IP addresses discovered via a DNS request could be passed to the underlying TCP layer and treated as though they had been received explicitly could allow for quicker multipathing on redundant networks.
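The first half of the DNS-based suggestion can be sketched with the standard resolver API. The `candidate_addresses` helper is hypothetical, and so is the idea of handing its result to the transport layer: no existing socket call accepts such a list, which is exactly the extension being proposed.

```python
import socket
from typing import List


def candidate_addresses(host: str, port: int) -> List[str]:
    """Return every distinct address the resolver reports for a host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses: List[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        address = sockaddr[0]
        if address not in addresses:  # keep resolver order, drop repeats
            addresses.append(address)
    return addresses


# A numeric host needs no DNS lookup, which keeps this example self-contained.
local_candidates = candidate_addresses("127.0.0.1", 80)
```

For a multihomed server name the list would contain one entry per published address, and each entry would be a candidate remote endpoint for an additional subflow.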

User Space Load Balancing

The fact that multipathd runs at the software layer and is easily accessible to the end user comes with some design tradeoffs as well. It is possible for end users to configure the load balancing and path selection schemes used by the DM-Multipath software on a global, device type, or WWID basis. Multipath can also be selectively turned off for certain types of devices or certain WWIDs. This is mainly due to the wildly varying properties of the different disk array controllers that have been connected to FC devices over the years (everything from high-latency, single-queue tape to highly parallel, low-latency SSD arrays). As such, finding a single algorithm which fairly load balances all these different access patterns has been impossible, and array manufacturers have spent the last decade loading the default multipath configuration file with ideal multipath settings for their devices.

MPTCP could benefit from a similar scheme, as research has shown that different network types (wireless vs. wired, high-latency vs. low-latency, high-drop-rate vs. low-drop-rate, etc.) may have different ideal multipath patterns [10,15,18]. An additional TCP option could be added by which a host or individual interface could communicate some basic information about itself (like physical layer information) to its partner, and its partner could adjust its load balancing algorithms accordingly.
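The layered configuration described above amounts to a most-specific-match lookup. The sketch below assumes that a per-WWID setting overrides a per-device-type setting, which in turn overrides the global default; the function, the policy labels, and the identifiers are hypothetical and are not taken from a real multipath configuration file.

```python
from typing import Dict


def resolve_policy(wwid: str, device_type: str,
                   per_wwid: Dict[str, str],
                   per_type: Dict[str, str],
                   default: str) -> str:
    """Pick the most specific load balancing policy that applies."""
    if wwid in per_wwid:
        return per_wwid[wwid]
    if device_type in per_type:
        return per_type[device_type]
    return default


per_wwid = {"wwid-0001": "failover-only"}
per_type = {"ssd-array": "spread-evenly", "tape": "failover-only"}

ssd_policy = resolve_policy("wwid-0002", "ssd-array", per_wwid, per_type, "spread-evenly")
pinned_policy = resolve_policy("wwid-0001", "ssd-array", per_wwid, per_type, "spread-evenly")
unknown_policy = resolve_policy("wwid-0003", "unknown", per_wwid, per_type, "spread-evenly")
```

An MPTCP equivalent would key the same kind of table on advertised link properties, such as wireless versus wired, instead of on disk identifiers.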

Link Failure Awareness

FC link layer services allow hosts to be quickly notified when their partners on the other side of the network disconnect, either explicitly (FC allows ports to log out as well as log in) or implicitly (the switch detects Loss of Signal on a port). As such, when a port goes offline, any active connections to it are notified of the change by the Fibre Channel fabric itself (out of band) and can act accordingly. In contrast, if an MPTCP port fails, any active connections to it either have to wait for a TCP timeout on the subflow or have to be notified via another existing port. This leads to a possible temporary head-of-line blocking issue when a port with a high RTT goes down and other ports on the MPTCP connection use up the remaining buffer. This is unlikely to happen in an FC network due to the explicit notifications of port failures from the network. It is possible that this problem could be solved using the dynamic congestion control algorithm mentioned earlier, so that the MPTCP connection is less reliant on those connections which could lead to issues in the event of failure.
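The cost of waiting for a timeout can be estimated with a rough sketch. It assumes the retransmission timer doubles after every expiry up to a cap, as standard TCP does; the function name and the example values are hypothetical, and real stacks differ in their initial timer, cap, and retry limit.

```python
def time_waited(initial_rto: float, timeouts: int, max_rto: float = 60.0) -> float:
    """Total seconds spent waiting across consecutive retransmission timeouts."""
    if initial_rto <= 0 or timeouts < 0:
        raise ValueError("initial_rto must be positive and timeouts non-negative")
    total = 0.0
    rto = initial_rto
    for _ in range(timeouts):
        total += min(rto, max_rto)  # wait out the current timer
        rto *= 2                    # exponential backoff before the next try
    return total


# With a 1 s initial timer, three expiries cost 1 + 2 + 4 seconds.
silent_failure = time_waited(1.0, 3)
```

A subflow on a high-RTT path starts from a larger timer, so the same number of expiries takes proportionally longer, whereas an out-of-band notification from the fabric arrives without any such wait.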

Simulation comparing CMT-SCTP and MPTCP

We decided that it might be revealing to see how these two protocols behaved in different types of networks with very different properties. As such, we developed an ns2 simulation to monitor the throughput of traffic between two nodes transferring data with these protocols.

Methods

We used ns2 as well as a few third-party libraries to simulate these two experimental protocols. We also found network specifications for four common network types with highly different properties and measured the throughput of two systems running a CMT-SCTP implementation, an MPTCP implementation, and up to three separate TCP flows running over the network [16,17].

Though the exact environments and propagation models could not be simulated, we used approximate parameter values that represent each physical layer technology. The key differences between these technologies were in downlink bandwidth, uplink bandwidth, delay, and packet loss ratio. These four parameters were altered accordingly in the simulation to yield the corresponding outputs, and their exact values are detailed in Table 1. The link(s) between two sample nodes (one transmitting and the other receiving) were measured for overall throughput. The settings for these sample nodes are given in Table 2. The results were plotted directly for each of the nodes using xgraph.

Technology        Bandwidth (Mbit/s)      Delay (ms)    Packet Loss Ratio (%)
                  Downlink    Uplink
Ethernet (ref)    100         100         2             1
4G LTE            100         100         30            5
3G HSPA+          56          22          80            9
802.11g           54          54          2             2

Table 1: Technology Parameters used in ns2 simulation of different networks
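One way to see how different these networks are is to compute the bandwidth-delay product of each, that is, the amount of data that must be in flight to keep the link full. The sketch below uses the downlink bandwidth and delay values from Table 1 and assumes the listed delay is a one-way figure; using a round-trip time instead would double each result.

```python
from typing import Dict, Tuple

# name -> (downlink bandwidth in Mbit/s, delay in ms), taken from Table 1
TECHNOLOGIES: Dict[str, Tuple[int, int]] = {
    "Ethernet (ref)": (100, 2),
    "4G LTE": (100, 30),
    "3G HSPA+": (56, 80),
    "802.11g": (54, 2),
}


def bdp_bytes(bandwidth_mbit_s: int, delay_ms: int) -> int:
    """Bandwidth-delay product in bytes for the given link parameters."""
    bits = bandwidth_mbit_s * 1000 * delay_ms  # (Mbit/s * 1e6) * (ms * 1e-3)
    return bits // 8


bdp = {name: bdp_bytes(bw, delay) for name, (bw, delay) in TECHNOLOGIES.items()}
```

The 3G HSPA+ link needs far more data in flight than the reference Ethernet link despite its lower bandwidth, which suggests why delay plays such a large role in how quickly each protocol reaches its maximum throughput.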

Setting                               Value
Outgoing packet size                  1452 bytes (based on 1500-byte MTU)
Outgoing rate                         As many packets as possible per second
Incoming packet size                  0 bytes (off)
Incoming rate                         0 (off)
Runtime                               120 seconds
Receive buffer                        7000000
Send buffer                           14000000
Packet delivery order (SCTP only)     Disabled (100% unordered)

Table 2: Endpoint Settings used in each traffic flow

Results

Our experiments gave us the following four charts plotting overall throughput as a function of time for each of the three protocols (MPTCP, CMTSCTP, and Base TCP) in a given network type.

Our results were inconsistent across the different networks, and it appears there may have been errors in our simulation or in the protocol implementations that we used. Latency appears to have an extremely adverse effect on the protocol simulations. Nevertheless, in three out of four networks it is clear that the ns2 CMT-SCTP implementation achieves a higher throughput and gets to that maximum throughput faster than the MPTCP implementation that we used. Both are able to outperform normal TCP flows, but CMT-SCTP is able to utilize more of the links between the two nodes than MPTCP. This may be due to lighter checksum costs and memory needs for CMT-SCTP.

Simulation comparing FC and MPTCP

Unfortunately, the most interesting parts of our comparison between FC and MPTCP had to do with failover, and the current implementation of SimSANs does not support simulation of failover on active connections. It is mostly used for analysis of static network configurations [19]. Additionally, we felt that attempting base performance comparisons between MPTCP and FC would be unfair due to the grossly different MTU sizes, congestion control schemes, and network behaviors. Taking timing measurements would instead require live analysis on an existing FC fabric using wireline trace analyzers. This equipment was beyond the scope of this project and could not be requisitioned in time.

Conclusions

We completed our theoretical analysis and found a couple of ways in which MPTCP may be improved. While a couple of the suggestions would only result in relatively minor speed improvements (mostly during connection initiation and slow start), communicating network or interface parameters explicitly, so that congestion control algorithms can be adjusted to the network, has the potential to help with the congestion control problems of MPTCP. We were also able to simulate the different protocols in ns2 and show that they suffered from adverse effects depending on the type of network they were in, especially with respect to network latency.

Further work

Simulation of a more network-aware MPTCP protocol would be a useful project to help test some of the theories about dynamic load balancing in the FC comparison section. Modification of the SimSANs tool to support simulation of path failure would also be useful. iSCSI (which is also compatible with the DM-Multipath package) might also be a promising protocol to compare against MPTCP. Additional debugging of our ns2 simulation would be useful to provide a more stable environment in which to test some of the other theories presented in the section comparing SCTP and MPTCP. The creation of a SYN flood module for ns2 would also be useful.

References and Related Work

[1] Ford, et al., “RFC 6824 TCP Extensions for Multipath Operation with Multiple Addresses”, RFC 6824, January 1, 2013. Accessed December 1, 2014. http://tools.ietf.org/html/rfc6824

[2] American National Standards Institute, “American National Standard for Information Technology: Fibre Channel: Physical and Signaling Interface (FC-PH)”. New York: American National Standards Institute, 1995. http://www.t11.org/ftp/t11/member/fc/ph/fcph_43.pdf

[3] Goggin, et al., “Linux Multipathing”, July 20, 2005. Accessed December 1, 2014. https://www.kernel.org/doc/ols/2005/ols2005v1pages155176.pdf

[4] Information Sciences Institute, “RFC 791 Internet Protocol Darpa Internet Program Protocol Specification”, RFC 791, 1981. https://www.ietf.org/rfc/rfc791.txt

[5] Ford, et al., “RFC 6182 Architectural Guidelines for Multipath TCP Development”, RFC 6182, March 2011. Accessed December 1, 2014. http://tools.ietf.org/html/rfc6182

[6] American National Standards Institute, “American National Standard for Information Technology: Fibre Channel: Link Services (FC-LS-2)”, 2010. http://www.t11.org/ftp/t11/pub/fc/ls2/10122v2.pdf

[7] American National Standards Institute, “Fibre Channel: Framing and Signaling 3 (FC-FS-3)”, 2010. http://www.t11.org/ftp/t11/pub/fc/fs3/10010v3.pdf

[8] R. Stewart, Q. Xie, K. Morneault, C. Sharp, H. Schwarzbauer, T. Taylor, I. Rytina, M. Kalla, L. Zhang, and V. Paxson, “Stream Control Transmission Protocol”, RFC 2960, October 2000. Obsoleted by RFC 4960, updated by RFC 3309.

[9] Andreas Wiese, ‘Beyond TCP: SCTP vs. MPTCP’, "EZAG" COLLOQUIUM, TU Dresden, January 2014.

[10] Ausama A Majeed, ‘SCTP vs. TCP: Comparing Packets Loss Rate of Transport Protocols in Best Effort Networks’, Journal of College of Education for Pure Science, Vol. 2, Issue 3, pp. 4045, 2012.

[11] Armando L. Caro Jr., Keyur Shah, Janardhan R. Iyengar, Randall R. Stewart, Paul D. Amer, ‘Congestion Control: SCTP vs TCP’, Communications and Networks Consortium, 2009.

[12] Iljitsch van Beijnum, ‘Multipath TCP’, IETF Journal, September 2009.

[13] Iyengar, J. R., et al., “Concurrent Multipath Transfer Using SCTP Multihoming”, SPECTS, 2004

[14] Stewart, et al., “RFC 4960 Stream Control Transmission Protocol”, RFC 4960, September 2007, Accessed December 1, 2014 http://tools.ietf.org/html/rfc4960

[15] Becke, et al., “Comparison of Multipath TCP and CMT-SCTP based on Intercontinental Measurements”, https://www.tdr.wiwi.unidue.de/fileadmin/fileupload/ITDR/Forschung/GLOBECOM2013.pdf

[16] “Implement multipath tcp on ns2”, Google Code Project, Aug 2010 https://code.google.com/p/multipathtcp/

[17] “SCTP Agents”, ns2 Project Online Manual, Aug 2011, http://www.isi.edu/nsnam/ns/doc/node426.html

[18] Singh, et al. “Enhancing Fairness and Congestion Control in Multipath TCP”, 6th Joint IFIP Wireless and Mobile Networking Conference, 2013

[19] “Data Center Storage Network Simulation”, SimSANs Website, http://www.simsans.org/
