NFS/RDMA over 40Gbps iWARP · 2019. 12. 21.

2014 Storage Developer Conference. © Chelsio Communications, Inc. All Rights Reserved.

NFS/RDMA over 40Gbps iWARP

Wael Noureddine Chelsio Communications

Outline

• RDMA
  • Motivating trends
  • iWARP
• NFS over RDMA
  • Overview
  • Chelsio T5 support
  • Performance results

Adoption Rate of 40GbE

Source: Crehan Research - 2Q14 CREHAN Quarterly Market Share Tables

Software Defined Everything

Source: European Telecommunications Standards Institute, http://portal.etsi.org/nfv/nfv_white_paper.pdf, October 2012

Motivating Trends

• Unprecedented curve in 40GbE growth (and pricing)
• Consolidation and virtualization
• Software defined storage (everything) using commodity hardware
• Rise of the data center
• Power efficiency
• High speed, ultra low latency SSDs
• Need for high performance, high efficiency fabric
• Ethernet remains the preferred technology
• TCP/IP for scalability, reliability, robustness and reach
• iWARP RDMA over Ethernet

RDMA Overview

• Direct memory-to-memory transfer
• All protocol processing handled by the NIC
  • Must be in hardware
• Protection handled by the NIC
  • User space access requires both local and remote enforcement
• Asynchronous communication model
  • Reduced host involvement
• Performance
  • Latency – polling
  • Throughput
• Efficiency
  • Zero copy
  • Kernel bypass (user space I/O)
  • CPU bypass

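[Figure: two hosts, each with memory and a completion queue (CQ), attached through T5 adapters to a Wireless/LAN/Datacenter/WAN network. The adapters handle protection and protocol processing and exchange packets over the network; payload goes to host memory and notifications go to the CQ.]

Performance and efficiency in return for new communication paradigm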
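
The asynchronous model on this slide (post work, let the adapter move the data and enforce protection, then poll a completion queue) can be illustrated with a toy simulation. This is only a sketch and is not the RDMA verbs API: `ToyQueuePair`, `post_write`, `nic_process` and `poll_cq` are hypothetical names, and the "NIC" is a plain Python method that copies bytes between buffers.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Completion:
    wr_id: int      # identifier chosen by the application when posting
    status: str     # "ok" or "protection_error"
    byte_len: int   # bytes placed in remote memory


class ToyQueuePair:
    """Toy stand-in for an RDMA queue pair; every name here is hypothetical."""

    def __init__(self, remote_memory: bytearray, allowed: range):
        self.remote_memory = remote_memory
        self.allowed = allowed      # remote offsets registered for access
        self.send_queue = deque()   # posted work requests, not yet processed
        self.cq = deque()           # completion queue

    def post_write(self, wr_id: int, data: bytes, remote_offset: int) -> None:
        # Posting only enqueues a work request; the caller does not block.
        self.send_queue.append((wr_id, bytes(data), remote_offset))

    def nic_process(self) -> None:
        # Stands in for the adapter: check protection, move the payload,
        # then post a notification to the completion queue.
        while self.send_queue:
            wr_id, data, off = self.send_queue.popleft()
            end = off + len(data)
            out_of_bounds = off not in self.allowed or (
                len(data) > 0 and (end - 1) not in self.allowed
            )
            if out_of_bounds:
                self.cq.append(Completion(wr_id, "protection_error", 0))
                continue
            self.remote_memory[off:end] = data
            self.cq.append(Completion(wr_id, "ok", len(data)))

    def poll_cq(self):
        # Non-blocking poll: returns one completion, or None if the CQ is empty.
        return self.cq.popleft() if self.cq else None


remote = bytearray(16)
qp = ToyQueuePair(remote, allowed=range(0, 16))
qp.post_write(1, b"hello", 0)      # fits inside the registered region
qp.post_write(2, b"too long", 12)  # would run past the region
qp.nic_process()
first, second = qp.poll_cq(), qp.poll_cq()
print(first.status, second.status)
```

The application thread only posts and polls; in this toy model the copy and the bounds check both happen in `nic_process`, which mirrors the slide's point that data movement and protection are handled by the adapter rather than by the host CPU.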

iWARP RDMA over Ethernet

• IETF RFCs in 2007
  • Open standard
  • Multiple vendors
• Ongoing standardization
  • Extensions to maintain API uniformity with InfiniBand
  • Recent RFC 7306 by Broadcom, Chelsio and Intel
• Mature stack
  • 3rd generation hardware
• RDMA over TCP/IP/Ethernet
  • TCP reliability, scalability, congestion and flow control
  • IP routability
  • Ethernet ubiquity
• Wireless ready
  • Near 10Gbps, low latency
• Cloud ready
  • Standard TCP/IP foundation
  • No network restrictions
• Full featured implementation
  • All RDMA benefits
• High performance
  • High packet rate
  • Low latency (1.5 usec user-to-user)
  • Line rate 40Gb with single connection
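
For reference when reading throughput figures given in MB/sec, the 40Gb line rate can be converted to a byte rate. The helper below is a small sketch; the function name is hypothetical and it assumes decimal units (1 MB = 10**6 bytes) and counts raw link bits, so it ignores Ethernet, IP, TCP and RDMA header overhead.

```python
def line_rate_mb_per_s(gbps: float) -> float:
    """Convert a raw link rate in Gb/s to MB/s (decimal units, no header overhead)."""
    bits_per_second = gbps * 1e9
    bytes_per_second = bits_per_second / 8
    return bytes_per_second / 1e6


print(line_rate_mb_per_s(40))  # 5000.0
```

Payload throughput measured by an application will sit somewhat below this raw figure, because every packet also carries protocol headers.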

iWARP Benefits

• Convergence
  • Coexists with all other traffic on same port
  • No special treatment needed
• Familiar protocol stack
  • Standard tools for monitoring/debugging
  • Standard network function appliances (security, load balancing…)
• Plug-and-play
  • No need for lossless network operation
  • Leverages existing infrastructure
  • Less expensive network hardware
  • Easy to deploy and manage
• Leverages decades of TCP/IP experience
  • Congestion avoidance and control
    • Critical for network stability
  • Reliability at hardware speeds
    • Retransmission and re-ordering
• Routable
  • Goes wherever IP is spoken
• Scalable across
  • Network size
  • Network architecture
  • Distance

Reliable, robust, scalable

Linux RDMA Architecture

[Figure: OpenFabrics stack diagram, split into kernel space and user space, with components keyed as Common, InfiniBand or iWARP. Layers from bottom to top:]

• Hardware: InfiniBand HCA, iWARP R-NIC
• Provider: Hardware Specific Driver (one per adapter type)
• Mid-Layer: OpenFabrics Kernel Level Verbs / API, Connection Manager, Connection Manager Abstraction (CMA), MAD, SMA, SA Client
• Upper Layer Protocol: SDP, IPoIB, SRP, iSER, RDS, NFS-RDMA RPC, Cluster File Sys
• User APIs: OpenFabrics User Level Verbs / API, UDAPL, SDP Lib, User Level MAD API (with kernel bypass paths)
• Application Level: Diag Tools, Open SM, IP Based App Access, Sockets Based Access, Various MPIs, Block Storage Access, Clustered DB Access, Access to File Systems

Abbreviations used in the diagram:

  R-NIC   RDMA NIC
  HCA     Host Channel Adapter
  UDAPL   User Direct Access Programming Lib
  RDS     Reliable Datagram Service
  iSER    iSCSI RDMA Protocol (Initiator)
  SRP     SCSI RDMA Protocol (Initiator)
  SDP     Sockets Direct Protocol
  IPoIB   IP over InfiniBand
  PMA     Performance Manager Agent
  SMA     Subnet Manager Agent
  MAD     Management Datagram
  SA      Subnet Administrator

NFS over RDMA Timeline

• NetApp/Sun 2007
• IETF RFCs
  • RFC 5532 problem statement in 2009
  • RFC 5666 RDMA for RPC in 2010
  • RFC 5667 NFS DDP in 2010
• Renewed effort with rise in RDMA interest
  • Under active development – mostly client side
  • Chelsio, Emulex, Intel, LANL, Mellanox, NASA, NetApp, Oracle…

NFS over RDMA Overview

• NFS extensions to use RDMA fabric (for NFSv2,3,4)
  • Client sends RPC in RDMA messages
  • Server initiates RDMA data transfer transactions
• Reduces client side CPU utilization
• Eliminates client side data copies
• Leverages low latency fabric
• Requires NIC with RDMA offload at both server and client ends
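
The division of labor above (the client sends the RPC, the server initiates the data transfers) can be sketched with a toy model. In RFC 5666 terms, the client advertises memory regions ("chunks") in its RPC call; the server then uses an RDMA Read to pull WRITE payloads from client memory and an RDMA Write to place READ payloads into it. The classes below are hypothetical illustrations with no real networking: plain `bytearray` objects stand in for registered client memory.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A client memory region advertised in the RPC call (toy version)."""
    buffer: bytearray   # stands in for registered client memory
    offset: int
    length: int


class ToyServer:
    """Hypothetical NFS/RDMA server that moves data by touching client buffers."""

    def __init__(self):
        self.files = {}

    def handle_write(self, name: str, read_chunk: Chunk) -> int:
        # NFS WRITE: the server pulls the payload out of client memory
        # (an RDMA Read in the real protocol), then replies with a byte count.
        start = read_chunk.offset
        end = start + read_chunk.length
        self.files[name] = bytes(read_chunk.buffer[start:end])
        return read_chunk.length

    def handle_read(self, name: str, write_chunk: Chunk) -> int:
        # NFS READ: the server pushes the payload into client memory
        # (an RDMA Write in the real protocol), then replies with a byte count.
        data = self.files[name][: write_chunk.length]
        start = write_chunk.offset
        write_chunk.buffer[start:start + len(data)] = data
        return len(data)


server = ToyServer()
src = bytearray(b"payload-from-client")
n_written = server.handle_write("f", Chunk(src, 0, len(src)))

dst = bytearray(32)
n_read = server.handle_read("f", Chunk(dst, 4, len(src)))
print(n_written, n_read, bytes(dst[4:4 + n_read]))
```

In this sketch the client never copies payload bytes itself: it only describes where the data lives, which is the mechanism behind the slide's claim that client side data copies are eliminated.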

NFS Client Stack

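[Figure: NFS client stack. In the kernel, the NFS Client sits on the RPC Transport Switch, which selects either TCP/IP or UDP/IP RPC or RDMA RPC. RDMA RPC uses RDMA Verbs and the RDMA CM (IB CM, IW CM) on top of the RDMA Driver; TCP/IP or UDP/IP RPC uses the Host Stack, the TCP Offload Module and the Network Device Driver. Below the kernel, the T5 NIC provides RDMA Offload and TCP Offload.]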

Chelsio T5 Ethernet Controller ASIC

• Single processor data-flow pipelined architecture
• Up to 1M connections
• Concurrent multi-protocol operation
• Full TCP/IPv4|IPv6 offload in 4CLK @500MHz

[Figure: T5 block diagram. Blocks shown: two 1G/10G/40G MACs and two 100M/1G/10G MACs; Embedded Layer 2 Ethernet Switch; Lookup, filtering and Firewall; Cut-Through RX Memory; Cut-Through TX Memory; Data-flow Protocol Engine; Traffic Manager; Application Co-Processor TX; Application Co-Processor RX; DMA Engine; PCI-e, X8, Gen 3; General Purpose Processor; On-Chip DRAM; Memory Controller; Optional external DDR3 memory.]

T5 Storage Protocol Support

[Figure: T5 storage protocol stack.]

• Protocols: NFS, RPC, SMB, SMB Direct, NDK, iSCSI, iSER, FCoE, NVMe
• Lower Layer Driver: Network Driver, RDMA Driver, iSCSI Driver, FCoE Driver
• T5 Network Controller: NIC, TCP Offload, RDMA Offload, iSCSI Offload, FCoE Offload, T10-DIX

Test Configuration

Clients (x4)
  OS: RHEL6.5
  Kernel: 3.16.0, NFSv4 + latest NFSRDMA fixes
  Processor: Intel(R) Xeon(R) CPU E5-2687W @ 3.10GHz
  No of Processors: 2
  No of Cores: 16 total (HT Disabled)
  RAM: 64 GB
  Card Type: T580-CR
  Card Core Clock: 500MHz

Server
  OS: RHEL6.1
  Kernel: 3.16.0, NFSv4 + latest NFSRDMA fixes
  Processor: Intel(R) Xeon(R) CPU E5-2687W @ 3.10GHz
  No of Processors: 2
  No of Cores: 16 total (HT Disabled)
  RAM: 64 GB
  Card Type: T580-CR
  Card Core Clock: 500MHz
  Share: 32GB ramdisk w/ ext2 filesystem

• Clients connected through switch to server with all 40Gbps links
• Sequential I/O direct (no buffer caching)
• Need OFED 3.12+ for 40G iWARP support

[Figure: the clients connect over 40 Gb links to a 40 Gb switch, which connects over a 40 Gb link to the NFS/RDMA server.]
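
The slide does not list the mount options used in the test. As a hedged sketch of how a Linux client is commonly pointed at an NFS/RDMA export, the helper below builds a `mount` argument list with the `proto=rdma` option and port 20049, the port registered for NFS over RDMA. The function name, server name, export path and mount point are placeholders, and the function only constructs the command; it does not run it.

```python
from typing import List


def nfs_rdma_mount_argv(server: str, export: str, mountpoint: str,
                        port: int = 20049, vers: str = "4") -> List[str]:
    """Build (but do not execute) a mount command for an NFS/RDMA export.

    Assumptions: a Linux client whose NFS utilities accept proto=rdma,
    and a server listening for NFS/RDMA on the given port.
    """
    options = f"proto=rdma,port={port},vers={vers}"
    return ["mount", "-t", "nfs", "-o", options, f"{server}:{export}", mountpoint]


# Placeholder names, for illustration only.
print(" ".join(nfs_rdma_mount_argv("nfs-server", "/export", "/mnt/nfs")))
```

Returning an argument list rather than a shell string keeps quoting out of the picture if the result is later handed to `subprocess.run`.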

NFS Write – iWARP vs. L2 NIC

[Figure: "Write". Throughput in MB/sec (axis 0 to 5000) versus I/O size in KB (4 to 4096), comparing RDMA and NIC.]
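
When reading throughput against I/O size, it can help to convert a throughput figure into operations per second, since per-operation costs such as interrupts scale with the operation rate rather than with bytes moved. The helper below is a sketch with a hypothetical name; it assumes binary units (1 MB = 1024 KB), and the 4000 MB/sec input is an illustrative value, not a data point taken from the chart.

```python
def ops_per_second(throughput_mb_s: float, io_size_kb: float) -> float:
    """Operations per second implied by a throughput and an I/O size.

    Assumes 1 MB = 1024 KB; use 1000 instead if the figures are decimal.
    """
    return throughput_mb_s * 1024 / io_size_kb


# Illustrative input only: the same throughput at the smallest and largest I/O sizes.
for size_kb in (4, 4096):
    print(size_kb, ops_per_second(4000, size_kb))
```

At a fixed throughput, going from 4 KB to 4096 KB operations cuts the operation rate by a factor of 1024, which is why small I/O sizes stress the per-operation path far more than large ones.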

NFS Write Client Ints/sec – iWARP vs. L2 NIC

[Figure: "Write Ints/Sec". Interrupts/sec (axis 0 to 50000) versus I/O size in KB (4 to 4096), comparing RDMA-Clis and NIC-Clis.]

NFS Read – iWARP vs. L2 NIC

[Figure: "Read". Throughput in MB/sec (axis 0 to 5000) versus I/O size in KB (4 to 4096), comparing RDMA and NIC.]

NFS Read Client Ints/sec – iWARP vs. L2 NIC

[Figure: "Read Ints/Sec". Interrupts/sec (axis 0 to 50000) versus I/O size in KB (4 to 4096), comparing RDMA-Clis and NIC-Clis.]

NFS Write – iWARP vs. InfiniBand

[Figure: "Write Throughput". Throughput in MB/sec (axis 0 to 5000) versus I/O size in KB (4 to 4096), comparing IW and IB.]

  OS: RHEL6.4
  NFS Share: 40GB ramdisk, ext2 file system
  Kernel: 3.16.0 + NFSv4 + latest NFSRDMA/cxgb4 fixes, default settings
  CPU: Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz, 2 CPUs, 16 cores total, no HT
  RAM: 64GB
  IW HW: Chelsio Communications Inc T580-LP-CR Unified Wire Ethernet Controller
  IB HW: Mellanox Technologies MT27500 Family [ConnectX-3] FDR

NFS Write – iWARP vs. FDR InfiniBand

[Figure: "Write Idle CPU". % CPU (axis 0 to 100) versus I/O size in KB (4 to 4096), for IW-Srv, IB-Srv, IW-Clis and IB-Clis.]

NFS Read – iWARP vs. InfiniBand

[Figure: "Read Throughput". Throughput in MB/sec (axis 0 to 5000) versus I/O size in KB (4 to 4096), comparing IW and IB.]

NFS Read – iWARP vs. InfiniBand

[Figure: "Read Idle CPU". % CPU (axis 0 to 100) versus I/O size in KB (4 to 4096), for IW-Srv, IB-Srv, IW-Clis and IB-Clis.]

Conclusions

• RDMA fabric offers potential for improved efficiency
  • SMB v3.0 RDMA transport demonstrated significant gains
• Renewed interest in NFS/RDMA
  • Work in progress
  • Performance benefits compared to NIC
• iWARP RDMA is shipping at 40Gbps
  • High performance Ethernet alternative to InfiniBand
• Chelsio adapter enables simultaneous operation of RDMA, NIC, TOE, iSCSI, FCoE…
  • TCP/IP for Wireless, LAN, Datacenter and Cloud networking
  • Remains “a great all-in-one adapter”*
• Call to action
  • Contribute to RDMA and NFS/RDMA in Linux
  • Mailing lists linux-rdma and linux-nfs on vger.kernel.org

* Helen Chen et al. OFA NFS/RDMA Presentation 2007

Thank You

Please visit www.chelsio.com for more info
