Transcript: A Study of Hardware Assisted IP over InfiniBand and its Impact on Enterprise Data Center Performance (Ryan E. Grant, Pavan Balaji, Ahmad Afsahi; March 2010)

Page 1: Title (March 2010)

A Study of Hardware Assisted IP over InfiniBand and its Impact on Enterprise Data Center Performance

Ryan E. Grant¹, Pavan Balaji², Ahmad Afsahi¹

¹Department of Electrical and Computer Engineering, Queen's University, Kingston, ON, Canada K7L 3N6
²Mathematics and Computer Science, Argonne National Laboratory, Argonne, IL, USA

Page 2: Introduction

• Motivation
• Background Information
• Experimental Framework
• Experimental Results
  – Baseline Performance
  – Offloading Performance
  – Data Center Performance Results
  – Performance Bottleneck Investigation
  – Validation
• Conclusions
  – Future Work
• Questions

Page 3: Motivation

• Sockets-based protocols are used extensively in enterprise data centers

• IPoIB provides a high-performance sockets interface that does not rely on upper-layer protocol support (such as TCP)

• The future convergence of networking fabrics will use elements of both InfiniBand and Ethernet

• The behaviour and performance of such systems are important to study in order to guide the development and deployment of advanced networking technologies

Page 4: Motivation

• Why is UD/RC offloading important?

– UD offloading is new to IPoIB

– It narrows the performance gap between IPoIB-UD and Sockets Direct Protocol (SDP)

– UD offloading lets us make effective use of software that does not utilize TCP (whereas SDP requires TCP)

Page 5: Outline

• Motivation
• Background Information
• Experimental Framework
• Experimental Results
  – Baseline Performance
  – Offloading Performance
  – Data Center Performance Results
  – Performance Bottleneck Investigation
  – Validation
• Conclusions
  – Future Work
• Questions

Page 6: Background Information

• InfiniBand
  – Queue pair operation, supporting RDMA and send/receive modes (a minimal verbs sketch follows below)
  – Mellanox ConnectX host channel adapters support 4X InfiniBand operation
    • Bandwidth of 40 Gigabit/s (32 Gigabit/s of data)
  – The OpenFabrics Enterprise Distribution (OFED-1.4) InfiniBand drivers support both the SDP and IPoIB protocols
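To make the queue-pair model above concrete, here is a minimal sketch (not from the original slides) of opening an HCA and creating a queue pair with the OFED verbs library, libibverbs. The queue depths, the choice of the first listed device, and the trimmed error handling are illustrative assumptions.

```c
/* Minimal libibverbs sketch: open an HCA and create an RC queue pair.
 * Illustrative only; queue depths and error handling are simplified. */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) { fprintf(stderr, "no IB devices found\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);        /* e.g. the ConnectX HCA */
    struct ibv_pd *pd = ibv_alloc_pd(ctx);                      /* protection domain */
    struct ibv_cq *cq = ibv_create_cq(ctx, 128, NULL, NULL, 0); /* completion queue */

    struct ibv_qp_init_attr attr = {
        .send_cq = cq,
        .recv_cq = cq,
        .cap = { .max_send_wr = 128, .max_recv_wr = 128,
                 .max_send_sge = 1,  .max_recv_sge = 1 },
        .qp_type = IBV_QPT_RC,  /* Reliable Connection; IBV_QPT_UD for Unreliable Datagram */
    };
    struct ibv_qp *qp = ibv_create_qp(pd, &attr);
    printf("created %s QP %u\n",
           attr.qp_type == IBV_QPT_RC ? "RC" : "UD",
           qp ? qp->qp_num : 0);

    if (qp) ibv_destroy_qp(qp);
    ibv_destroy_cq(cq);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```

Switching qp_type between IBV_QPT_RC and IBV_QPT_UD is the only change at creation time; the transport semantics differ afterwards, as slide 9 describes.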

Page 7: Background Information

• Socket-based protocols provide IP functionality using IP over IB (IPoIB)
  – IPoIB provides large receive offload (LRO) and large send offload (LSO)
  – Large receive offload aggregates incoming packets
  – Large send offload segments large messages into appropriately sized packets in hardware

• Sockets Direct Protocol (SDP) provides RDMA capabilities (a hedged socket sketch follows below)
  – Bypasses the operating system's TCP/IP stack
  – Utilizes hardware flow control and an offloaded network and transport stack in addition to RDMA
  – Operates in buffered-copy and zero-copy modes
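As a hedged illustration of how an application ends up on SDP rather than the kernel TCP/IP stack: OFED's libsdp can be preloaded around an unmodified TCP program, or a program can request SDP explicitly through the AF_INET_SDP address family. The sketch below takes the explicit route; the family value 27, the port, and the IPoIB address are assumptions for illustration only, not the paper's configuration.

```c
/* Hedged sketch: open a stream socket over SDP instead of TCP.
 * AF_INET_SDP (27 in OFED releases of this era) is assumed; SDP is more
 * commonly enabled transparently via LD_PRELOAD of libsdp.so around an
 * unmodified TCP application. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

#ifndef AF_INET_SDP
#define AF_INET_SDP 27   /* assumption: address family registered by the SDP module */
#endif

int main(void)
{
    int fd = socket(AF_INET_SDP, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket(AF_INET_SDP)"); return 1; }

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;       /* peer address is the IPv4 address of its IPoIB interface */
    peer.sin_port = htons(5001);     /* illustrative port */
    inet_pton(AF_INET, "192.168.1.10", &peer.sin_addr);  /* illustrative IPoIB address */

    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0)
        perror("connect over SDP");
    else
        printf("connected over SDP, bypassing the kernel TCP/IP stack\n");

    close(fd);
    return 0;
}
```

The transparent libsdp route is typically preferred for unmodified server software, which keeps the TCP sockets API while moving the data path onto RDMA.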

Page 8: Background Information

Page 9: Background Information

• InfiniBand operates in several transport modes (see the sketch below):
  – Reliable Connection (RC): keeps traditional connections while offering low-level reliability and in-order delivery
  – Unreliable Datagram (UD): connectionless datagram transmission with no reliability guarantees

• IB is capable of RDMA, which is exposed to socket interfaces through SDP, replacing TCP programming semantics

• The additional IB modes, unreliable connection and reliable datagram, do not currently have hardware offloading implementations
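The difference between the two transports is most visible when posting a send: an RC queue pair is bound to a single remote peer at connection setup, while a UD queue pair supplies per-send addressing through an address handle, remote QP number, and Q_Key. The hedged fragment below, with hypothetical qp, ah, and memory-region arguments assumed to be set up earlier, sketches the UD case.

```c
/* Hedged fragment: posting a send on a UD queue pair with libibverbs.
 * qp, ah, remote_qpn, and the registered buffer/mr are assumed to have been
 * created earlier (ibv_create_qp, ibv_create_ah, ibv_reg_mr). On an RC queue
 * pair the wr.ud fields are unused because the peer is fixed at connect time. */
#include <stdint.h>
#include <infiniband/verbs.h>

int post_ud_send(struct ibv_qp *qp, struct ibv_ah *ah, uint32_t remote_qpn,
                 struct ibv_mr *mr, void *buf, uint32_t len)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,   /* registered send buffer */
        .length = len,
        .lkey   = mr->lkey,         /* local key from ibv_reg_mr() */
    };

    struct ibv_send_wr wr = {
        .wr_id      = 1,
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_SEND,
        .send_flags = IBV_SEND_SIGNALED,
        .wr = { .ud = {                  /* UD only: per-send addressing */
            .ah          = ah,           /* address handle for the destination */
            .remote_qpn  = remote_qpn,   /* destination queue pair number */
            .remote_qkey = 0x11111111,   /* illustrative Q_Key */
        } },
    };

    struct ibv_send_wr *bad_wr = NULL;
    return ibv_post_send(qp, &wr, &bad_wr);   /* returns 0 on success */
}
```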

Page 10: Outline

• Motivation
• Background Information
• Experimental Framework
• Experimental Results
  – Baseline Performance
  – Offloading Performance
  – Data Center Performance Results
  – Performance Bottleneck Investigation
  – Validation
• Conclusions
  – Future Work
• Questions

Page 11: Experimental Framework

OS: Fedora, Linux kernel 2.6.27
Processors: 2 x 2.0 GHz Quad-Core AMD Opteron
InfiniBand HCA: Mellanox ConnectX 4X DDR, firmware 2.6.11
Switch: Mellanox 24-port MT47396 InfiniScale-III
OFED version: 1.4

• Network performance data was collected using Netperf-2.4.4

• Performance was validated using iperf-2.0.4

Page 12: Baseline Performance Results

• As expected, IB Verbs RDMA and SDP (RDMA-based) show major latency advantages over IPoIB for small messages

[Figure: Single Connection Latency; latency (μs) vs. message size from 1 byte to 4 KB; series: IB Verbs RDMA, IPoIB-RC, IPoIB-LRO-LSO, SDP-BC, SDP-ZC]

Page 13: Baseline Multi-Stream Performance

• With multiple streams, both IPoIB-RC and IPoIB-UD show good performance and are comparable to SDP buffered-copy

[Figure: Multi-Stream Bandwidth; bandwidth (Mbps) vs. message size from 1 byte to 1 MB; series: IB Verbs RDMA, IPoIB-RC, IPoIB-LRO-LSO, SDP-BC, SDP-ZC]

Page 14: Offloading Performance

• IPoIB-UD with offloading provides similar latency to that of IPoIB-RC

[Figure: IPoIB Offloading Latency; latency (μs) vs. message size from 1 byte to 4 KB; series: IPoIB-UD-LRO-LSO, IPoIB-UD-LRO-noLSO, IPoIB-UD-noLRO-LSO, IPoIB-UD-noLRO-noLSO, IPoIB-RC]

Page 15: Offloading Performance

• Although offloaded IPoIB-UD outperforms non-offloaded IPoIB-UD, its single-stream bandwidth is still lower than that of IPoIB-RC

[Figure: IPoIB Offloading Single-Stream Bandwidth; bandwidth (Mbps) vs. message size from 1 byte to 1 MB; series: IPoIB-UD-LRO-LSO, IPoIB-UD-LRO-noLSO, IPoIB-UD-noLRO-LSO, IPoIB-UD-noLRO-noLSO, IPoIB-RC]

Page 16: Offloading Performance

• With multiple streams, IPoIB-UD-LRO-LSO outperforms IPoIB-RC and greatly outperforms non-offloaded IPoIB-UD

[Figure: IPoIB Offloading Multi-Stream Bandwidth; bandwidth (Mbps) vs. message size from 1 byte to 1 MB; series: IPoIB-UD-LRO-LSO, IPoIB-UD-LRO-noLSO, IPoIB-UD-noLRO-LSO, IPoIB-UD-noLRO-noLSO, IPoIB-RC]

Page 17: Offloading Performance

• Offloaded IPoIB-UD provides an 85.1% improvement in bandwidth over non-offloaded IPoIB-UD

• Offloaded IPoIB-UD outperforms multi-stream IPoIB-RC by 7.1% and provides similar latency

• Offloaded IPoIB-UD provides bandwidth only 6.5% less than that of SDP

Page 18: Data Center Performance

Page 19: Data Center Performance

Data center throughput results show that IPoIB-UD-LRO-LSO maintains the highest throughput, while SDP is unexpectedly the worst performer of the group

[Figure: Work Interactions Per Second; WIPS vs. time from 0 to 320 s; series: IPoIB-LRO-LSO, IPoIB-noLRO-noLSO, IPoIB-RC, SDP]

Page 20: Data Center Performance

[Figure: panels comparing IPoIB-UD-LRO-LSO, IPoIB-RC, IPoIB-UD-noLRO-noLSO, and SDP]

Page 21: Data Center Performance

• IPoIB-UD-LRO-LSO provides the highest sustained bandwidth of all of the protocols, beating non-offloaded IPoIB-UD by 15.4%, IPoIB-RC by 5.8%, and SDP by 29.1%

• IPoIB-UD-LRO-LSO provides similar response time to its nearest competitor, IPoIB-RC

• All of the IPoIB configurations provide higher bandwidth than SDP

Page 22: Performance Bottleneck Investigation

• SDP shows poor throughput and latency, much worse than initially expected

• Given the excellent performance of SDP at the micro-benchmark level, several tests were conducted to determine the cause of SDP's poor performance in the data center test

• It was determined that the large number of simultaneous connections was causing SDP's poor performance

• To confirm this analysis, the number of connections used by the SDP data center was reduced while the activity level of each connection was increased (a sketch of such a connection-scaling check follows below)
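A hedged sketch of this kind of connection-scaling check is shown below; it is not the paper's actual harness (which used TPC-W), just an illustration of the idea of streaming a fixed volume of data across N sockets so that aggregate throughput can be compared as N grows. The server address, port, message size, and volume are assumptions; a listening receiver is assumed to be running, and the same binary exercises SDP when run under libsdp's LD_PRELOAD or IPoIB when run plainly.

```c
/* Hedged sketch of a connection-scaling check: stream a fixed total volume
 * across N TCP sockets and report aggregate throughput. Addresses, ports,
 * and sizes are illustrative only. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#include <arpa/inet.h>
#include <sys/socket.h>

#define MSG_SIZE (64 * 1024)
#define TOTAL_MB 1024

int main(int argc, char **argv)
{
    int nconn = (argc > 1) ? atoi(argv[1]) : 50;   /* e.g. 50 vs. several hundred */
    int *fds = calloc(nconn, sizeof(int));
    char *buf = calloc(1, MSG_SIZE);

    for (int i = 0; i < nconn; i++) {              /* open N connections to the server */
        struct sockaddr_in peer = { .sin_family = AF_INET, .sin_port = htons(5001) };
        inet_pton(AF_INET, "192.168.1.10", &peer.sin_addr);   /* illustrative server */
        fds[i] = socket(AF_INET, SOCK_STREAM, 0);
        if (connect(fds[i], (struct sockaddr *)&peer, sizeof(peer)) < 0) {
            perror("connect"); return 1;
        }
    }

    long total = (long)TOTAL_MB * 1024 * 1024, sent = 0;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (sent < total)                           /* round-robin over the connections */
        for (int i = 0; i < nconn && sent < total; i++) {
            ssize_t n = write(fds[i], buf, MSG_SIZE);
            if (n <= 0) { perror("write"); return 1; }
            sent += n;
        }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%d connections: %.1f MB/s aggregate\n", nconn, sent / secs / 1e6);

    for (int i = 0; i < nconn; i++) close(fds[i]);
    free(fds); free(buf);
    return 0;
}
```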

Page 23: Performance Investigation

The resulting performance of SDP (with 50 connections) increases greatly, to a level more in line with expectations

Page 24: Performance Investigation

[Figure: SDP delay with many connections vs. SDP delay with fewer connections]

Page 25: Performance Validation

• IPoIB and SDP show performance results on SPECWeb similar to those seen with the TPC-W benchmarks

Page 26: Performance Validation

• SPECWeb response time results show that IPoIB-UD-LRO-LSO has the lowest overall response times

[Figure: Maximum Response Time; response time (ms) vs. elapsed time from 0 to 6000 s; series: LRO-LSO, noLRO-noLSO, SDP]

Page 27: Outline

• Motivation
• Background Information
• Experimental Framework
• Experimental Results
  – Baseline Performance
  – Offloading Performance
  – Data Center Performance Results
  – Performance Bottleneck Investigation
  – Validation
• Conclusions
  – Future Work
• Questions

Page 28: Conclusions

• Micro-benchmarks have shown an 85.1% improvement in bandwidth for offloaded IPoIB-UD over the non-offloaded case, and a maximum latency reduction of 26.2%

• Offloaded IPoIB-UD shows a 15.4% improvement in throughput over non-offloaded IPoIB-UD

• IPoIB-UD-LRO-LSO has a 29.1% higher throughput than SDP in our data center testing

• The benefits of using IPoIB-RC over IPoIB-UD are minimal when offloading capabilities are utilized

• Therefore, for future networks such as Converged Enhanced Ethernet (CEE), the inclusion of a reliable connection mode is most likely unnecessary

Page 29: Future Work

• Resolving the issues holding back SDP performance when using large numbers of connections

• Utilizing Quality of Service to further enhance enterprise data center performance

• Combining IPoIB-UD, QoS and Virtual Protocol Interconnect to improve overall data center performance

Page 30: Thank You

Questions?
