pNFS over RDMA - Possibilities - SNIA
2015 Storage Developer Conference. © Insert Your Company Name. All Rights Reserved.
pNFS/RDMA: Possibilities
Chuck Lever
Oracle Corporation
2015 Storage Developer Conference. Copyright © 2015 Oracle and its affiliates. All Rights Reserved.
The opinions expressed in this presentation are the presenter’s own, and do not represent the
views of Oracle or anyone else.
What If . . . ?
❒ Given these storage trends:
  ❒ Throughput of networks is increasing
  ❒ Latency of persistent storage is dropping exponentially
  ❒ Capacity is off the charts
❒ How can NFS make good use of our new Persistent Memory overlords?
Traditional NFS
Traditional NFS Operation
❒ Each NFS file resides on one server
❒ Applications locate files via a POSIX directory structure
❒ Clients access data via NFS READ and WRITE operations
Traditional NFS Server Storage Topology
[Figure: topology diagram. Labels: NFS clients, Ethernet, NFS server, XFS, SAN.]
Traditional NFS Weaknesses
❒ One RPC issued at a time per TCP socket
❒ Typically one or a few TCP sockets are shared across a server’s shares
❒ Data throughput is constrained by the server
Traditional NFS FILE_SYNC WRITE
[Figure: sequence diagram between NFS Client and NFS Server. Labels: Application writes; TCP send (repeated); Server updates durable storage; Write is complete.]
Two-phase Commit
❒ To avoid waiting for durable storage on every WRITE, NFSv3 introduced unstable WRITE plus COMMIT
  ❒ Client flushes data to server asynchronously
  ❒ Client sends COMMIT
  ❒ Server makes written data durable
❒ Transport bottlenecks remained
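The unstable WRITE plus COMMIT sequence can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: `ToyServer` and `flush_file` are hypothetical names, not part of any NFS implementation, and the write verifier check follows the general NFSv3 idea (RFC 1813) that WRITE and COMMIT return a verifier that changes when the server restarts, so the client can tell whether cached data was lost.

```python
from dataclasses import dataclass, field

UNSTABLE, FILE_SYNC = "UNSTABLE", "FILE_SYNC"


@dataclass
class ToyServer:
    """In-memory stand-in for an NFS server (hypothetical, not a real API)."""
    boot_verifier: int = 1
    cache: dict = field(default_factory=dict)    # volatile: lost on reboot
    durable: dict = field(default_factory=dict)  # survives reboot

    def write(self, offset, data, stable):
        if stable == FILE_SYNC:
            self.durable[offset] = data          # durable before replying
        else:
            self.cache[offset] = data            # reply before data is durable
        return self.boot_verifier

    def commit(self):
        self.durable.update(self.cache)          # make cached writes durable
        self.cache.clear()
        return self.boot_verifier

    def reboot(self):
        self.cache.clear()                       # uncommitted data is gone
        self.boot_verifier += 1


def flush_file(server, chunks, crash_before_commit=False):
    """Send unstable WRITEs, then one COMMIT; resend if the verifier changed."""
    while True:
        verifiers = [server.write(off, data, UNSTABLE) for off, data in chunks.items()]
        if crash_before_commit:                  # simulate one server restart
            server.reboot()
            crash_before_commit = False
        commit_verifier = server.commit()
        if all(v == commit_verifier for v in verifiers):
            return server.durable
        # Verifier mismatch: the server lost cached data, so resend everything.
```

Calling `flush_file(ToyServer(), {0: b"aa", 2: b"bb"}, crash_before_commit=True)` exercises the resend path: the first COMMIT returns a different verifier than the WRITEs did, so the client repeats the WRITEs before committing again.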
What Is pNFS?
Data / Metadata Separation
❒ NFS protocol manages metadata
  ❒ Directory structure
  ❒ File open and lock state
  ❒ File data layout information
  ❒ Fall-back I/O mechanism
❒ Separate protocol and transports handle I/O
pNFS Layout Types
❒ A layout type:
  ❒ Specifies which transport protocol to use
  ❒ How to locate file data
  ❒ Specified separately from NFS protocol
❒ A layout instance tells where a file’s data resides
  ❒ Which NFS server and file, or
  ❒ Which SCSI LUN at which LBA
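The distinction between a layout type and a layout instance can be sketched as a small data model. The class and function names below are hypothetical, chosen only for illustration; they do not mirror the on-the-wire structures defined in the pNFS RFCs. Each class stands for one layout type, and each object stands for one layout instance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileLayoutInstance:
    """Layout instance for a file-style layout type: which NFS server and file."""
    data_server: str
    file_handle: bytes


@dataclass(frozen=True)
class BlockLayoutInstance:
    """Layout instance for a block-style layout type: which SCSI LUN at which LBA."""
    lun: int
    lba: int


def describe_io_target(layout):
    """Return a human-readable description of where the client should send I/O."""
    if isinstance(layout, FileLayoutInstance):
        return (f"NFS READ/WRITE to {layout.data_server}, "
                f"file handle {layout.file_handle.hex()}")
    if isinstance(layout, BlockLayoutInstance):
        return f"SCSI READ/WRITE to LUN {layout.lun} at LBA {layout.lba}"
    raise TypeError(f"unknown layout type: {type(layout).__name__}")
```

The point of the sketch is that the layout type (the class) fixes the transport protocol and addressing scheme, while the instance carries the concrete location for one file.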
Parallel NFS In A Nutshell
❒ Applications retain single-server view of files
❒ NFS server manages data layout
❒ Each NFS client can stripe file I/O across multiple storage services
❒ Data and metadata operations run concurrently
❒ Clients and servers share a storage fabric
  ❒ SCSI, iSCSI, iSER, SRP
  ❒ Object-based storage
  ❒ NFS
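Striping file I/O across multiple storage services amounts to mapping each file offset to a device and an offset on that device. The sketch below assumes a plain round-robin scheme with a fixed stripe unit; that scheme, and the names `locate_stripe` and `split_io`, are assumptions for illustration. Real pNFS layout types each define their own mapping.

```python
def locate_stripe(offset, stripe_unit, num_devices):
    """Map a file byte offset to (device index, byte offset on that device)."""
    stripe_number = offset // stripe_unit
    device_index = stripe_number % num_devices
    device_offset = (stripe_number // num_devices) * stripe_unit + offset % stripe_unit
    return device_index, device_offset


def split_io(offset, length, stripe_unit, num_devices):
    """Split one file I/O request into per-device (device, device_offset, length) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        # Never cross a stripe-unit boundary within a single piece.
        chunk = min(stripe_unit - offset % stripe_unit, end - offset)
        device_index, device_offset = locate_stripe(offset, stripe_unit, num_devices)
        pieces.append((device_index, device_offset, chunk))
        offset += chunk
    return pieces
```

For example, `split_io(0, 10, 4, 2)` yields three pieces that alternate between device 0 and device 1, and a client could issue those pieces concurrently.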
pNFS Server Storage Topology
[Figure: topology diagram. Labels: NFS clients, Ethernet, NFS server, XFS, SCSI, SAN.]
Example Usage Scenarios
❒ High Performance Computing
  ❒ Parallel I/O
  ❒ Greater file capacity
❒ Deployments where storage clients and servers share a storage fabric
  ❒ Each client can be directed to a particular server
  ❒ Each file can be placed on a particular server
What Is NFS/RDMA?
What Is Remote Direct Memory Access?
❒ I/O-like access of the physical memory on another host
❒ Strong ordering of operations
❒ Asynchronous: completion fires when an operation finishes
❒ Datagram channel: SEND and RECV
❒ Data transfer: READ and WRITE
RDMA Ready For 100Gbps Fabrics
❒ Zero-copy is possible on both send and receive
  ❒ No CPU cache footprint until app accesses data
❒ Transport resources are pre-allocated
  ❒ No resource allocation in data path
  ❒ Reduced opportunity for deadlock
❒ Data transfer is concurrent with other transport operations
NFS/RDMA Concepts
❒ Each RPC is conveyed by RDMA operations
  ❒ Ultra-low round-trip latency
❒ RNICs handle bulk data transfer
  ❒ Low CPU overhead
  ❒ High bandwidth
Data / Metadata Separation
❒ Non-I/O operations conveyed via RDMA SEND
  ❒ GETATTR, LOOKUP, and so on
❒ Data operations (i.e. NFS READ and WRITE) utilize RDMA READ and WRITE
  ❒ Server initiates all RDMA transfer
  ❒ After that, neither host CPU is involved
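The mapping from NFS operations to RDMA operations can be sketched as a small lookup. Because the server initiates every bulk transfer, an NFS WRITE is carried by a server-issued RDMA READ (the server pulls the payload from client memory), and an NFS READ by a server-issued RDMA WRITE (the server pushes the payload into client memory). The function name `rdma_steps` is hypothetical, and the sketch ignores the case where a small payload travels inline in the SEND itself.

```python
def rdma_steps(nfs_op):
    """Return the (initiator, RDMA operation) sequence that conveys one NFS operation."""
    call = ("client", "RDMA SEND")     # RPC call message
    reply = ("server", "RDMA SEND")    # RPC reply message
    if nfs_op == "WRITE":
        # Server pulls the write payload out of client memory.
        return [call, ("server", "RDMA READ"), reply]
    if nfs_op == "READ":
        # Server pushes the read payload into client memory.
        return [call, ("server", "RDMA WRITE"), reply]
    # Non-I/O operations such as GETATTR or LOOKUP need only the SEND pair.
    return [call, reply]
```

Note that in every case the client only ever issues an RDMA SEND; the RDMA READ and WRITE steps are always initiated by the server.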
NFS/RDMA FILE_SYNC WRITE
[Figure: sequence diagram between NFS Client and NFS Server. Labels: Application writes; RDMA SEND (twice); RDMA READ with READ result (twice); Server updates durable storage; Write is complete.]
Example Usage Scenarios
❒ Use NFS/RDMA instead of NFS/TCP on IPoIB
  ❒ See “RDMA On 100Gbps Fabrics”
❒ Latency-sensitive SLAs
❒ CPU-intensive client workloads
❒ One-time bulk-data movement (e.g. backup)
pNFS and NFS/RDMA
Why pNFS/RDMA?
❒ Client gets direct access to durable storage
  ❒ E.g. ultra-low latency Persistent Memory
❒ No protocol translation overhead
❒ Data not even read into server DRAM
Why pNFS/RDMA?
❒ Multiple transport connections per client mount point
❒ Multiple QPs
❒ Multiple RNICs
Why pNFS/RDMA?
❒ Single converged fabric shared between pNFS clients and servers
❒ Rather than “pNFS/TCP with SCSI”
❒ Instead use “pNFS/RDMA with SRP”
pNFS/RDMA Server Storage Topology
[Figure: topology diagram. Labels: NFS clients, RDMA Fabric, NFS server, XFS.]
Next Steps
What’s Needed For NFS/RDMA
❒ NFSv4.1 on RDMA is a pre-requisite
❒ Bi-directional RPC-over-RDMA
❒ Lots of backchannel session slots
❒ NFSv4.1 Upper Layer Binding to RPC-over-RDMA
What’s Needed For pNFS
❒ A new pNFS layout type is not required for operation with SRP or iSER
❒ Proposal: a new pNFS layout type for accessing remote Persistent Memory devices directly
  ❒ Device naming
  ❒ Ensuring data durability
  ❒ Error handling and fencing
  ❒ Authentication, data privacy
Questions / Discussion
Appendix
NFS Reference Material
❒ pNFS Standards
  ❒ NFSv4.1: RFC 5661
  ❒ pNFS layouts: RFCs 5662 - 5665
❒ NFS/RDMA Standards
  ❒ RPC-over-RDMA: RFC 5666
  ❒ NFS/RDMA ULB: RFC 5667