October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch [email protected].

26
June 23, 2022 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch [email protected]

Transcript of October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch [email protected].

Page 1: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

April 19, 2023

pNFS extension for NFSv4IETF-62 March 2005

Brent Welch

[email protected]

Page 2: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

IETF 61, Nov 2004Page 2

Abstract

pNFS extends NFSv4

Scalable I/O problem

(Brief review) of the proposal

Status and next steps

Object Storage and pNFS

Object security scheme

Page 3: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

IETF 61, Nov 2004Page 3

pNFS

Extension to NFSv4

NFS is THE file system standard

Fewest additions that enable parallel I/O from clients

Layouts are the key additional abstraction

Describe where data is located and how it is organized

Clients access storage directly, in parallel

Generalized support for asymmetric solutions

Files: Clean way to do filer virtualization, eliminate botteneck

Objects: Standard way to do object-based file systems

Blocks: Standard way to do block-based SAN file systems

Page 4: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

IETF 61, Nov 2004Page 4

Scalable I/O Problem

Storage for 1000’s of active clients => lots of bandwidth

Scaling capacity through 100’s of TB and into PB

File Server model has good sharing and manageability

but it is hard to scale

Many other proprietary solutions

GPFS, CXFS, StorNext, Panasas, Sustina, Sanergy, …

Everyone has their own client

Like to have a standards based solution => pNFS

Page 5: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

IETF 61, Nov 2004Page 5

Scaling and the Client

Gary Grider’s rule of thumb for HPC

1 Gbyte/sec for each Teraflop of computing power

2000 3.2 GHz processors => 6TF => 6 GB/sec

One file server with 48 GE NICs? I don’t think so.

100 GB/sec I/O system in ’08 or ’09 for 100 TF cluster

Making movies

1000 node rendering farm, plus 100’s of desktops at night

Oil and Gas

100’s to 1000’s of clients

Lots of large files (10’s of GB to TB each)

EDA, Compile Farms, Life Sciences …

Everyone has a Linux cluster these days

Page 6: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

IETF 61, Nov 2004Page 6

Asymmetric File Systems

Control Path vs. Data Path (“out-of-band” control)

Metadata servers (Control)

File name space

Access control (ACL checking)

Sharing and coordination

Storage Devices (Data)

Clients access storage directly

SAN Filesystems

CXFS, EMC Hi Road, Sanergy, SAN FS

Object Storage File Systems

Panasas, Lustre

Client

pNFSServer

StorageStorageStorageStorageStoragStorageClientClientClientClient

NFSv4 + pNFS ops

StorageProtocol

StorageManagementProtocol

Page 7: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

IETF 61, Nov 2004Page 7

NFS and Cluster File Systems

Currently, NFS head can be a client of a cluster file system (instead of a client of the local file system)

NFSv4 adds more state, makes integration with cluster file system more interesting

NFS Client

Cluster FS

NFS ClientNFS Client

NFS Client

NFS ClientNFS Client

NFS ClientNFS Client

Cluster FSCluster FS

Cluster FS

Native ClientNative Client

Native ClientNative Client

NFS “Head”

Native ClientNFS “Head”

Native ClientNFS “Head”

Native Client

Page 8: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

IETF 61, Nov 2004Page 8

pNFS and Cluster File Systems

Replace proprietary cluster file system protocols with pNFS

Lots of room for innovation inside the cluster file system

pNFS Client

Cluster FS + NFSv4

pNFS Client

Native ClientNative Client

Native ClientNative Client

Cluster FS + NFSv4Cluster FS + NFSv4

Cluster FS + NFSv4Cluster FS + NFSv4

pNFS ClientpNFS Client

pNFS ClientpNFS Client

pNFS ClientpNFS Client

Page 9: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

IETF 61, Nov 2004Page 9

pNFS Ops Summary

GETDEVINFO

Maps from opaque device ID used in layout data structures to the storage protocol type and necessary addressing information for that device

LAYOUTGET

Fetch location and access control information (i.e., capabilities)

LAYOUTCOMMIT

Commit write activity. New file size and attributes visible on storage.

LAYOUTRELEASE

Give up lease on the layout

CB_LAYOUTRETURN

Server callback to recall layout lease

Page 10: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

IETF 61, Nov 2004Page 10

Multiple Data Server Protocols

BE INCLUSIVE !!Broaden the market reach

Three (or more) flavors of out-of-band metadata attributes:

BLOCKS: SBC/FCP/FC or SBC/iSCSI… for files built on blocks

OBJECTS: OSD/iSCSI/TCP/IP/GE for files built on objects

FILES: NFS/ONCRPC/TCP/IP/GE for files built on subfiles

Inode-level encapsulation in server and client code

Local Filesystem

pNFS server

pNFS IFS

Client Apps

Layout driver

1. SBC (blocks)2. OSD (objects)3. NFS (files)

NFSv4 extendedw/ orthogonallayout metadataattributes

Layout metadatagrant & revoke

Page 11: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

IETF 61, Nov 2004Page 11

March 2005 Connectathon

NetApp server prototype by Garth Goodson

NFSv4 for the data protocol

Layouts are an array of filehandles and stateIDs

Storage Mangement protocol is also NFSv4

Client prototype by Sun

Solaris 10 variant

Page 12: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

IETF 61, Nov 2004Page 12

Object Storage

Object interface is midway between files and blocks (think inode)

Create, Delete, Read, Write, GetAttr, SetAttr, …

Objects have numeric ID, not pathnames

Objects have a capability based security scheme (details in a moment)

Objects have extensible attributes to hold high-level FS information

Based on NASD and OBSD research out of CMU (Gibson et. al)

SNIA T10 standards based. V1 complete, V2 in progress.

Page 13: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

IETF 61, Nov 2004Page 13

Objects as Building Blocks

Object

Comprised of:User DataAttributesLayout

Interface:ID <dev,grp,obj>Read/WriteCreate/DeleteGetattr/SetattrCapability-based

File Component:Stripe files acrossstorage nodes

Page 14: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

IETF 61, Nov 2004Page 14

Objects and pNFS

Clients speak pNFS to metadata server

Or they will, eventually

Clients speak iSCSI/OSD to the data servers (OSD)

Files are striped across multiple objects

Metadata manager speaks iSCSI/OSD to data server

Metadata manager uses extended attributes on objects to store metadata

High level file system attributes like owners and ACLs

Storage maps

Directories are just files stored in objects

Page 15: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

IETF 61, Nov 2004Page 15

Object Security (1)

Clean security model based on shared, secret device keys

Part of T10 standard

Metadata manager knows device keys

Metadata manager signs caps, OSD verifies them (explained next slide)

Metadata server returns capability to the client

Capability encodes: object ID, operation, expire time, data range, cap version, and a signature

Data range allows serialization over file data, as well as different rights to different attributes

Cap version allows revocation by changing it on the object

May want to protect capability with privacy (encryption) to avoid snooping the path between metadata manager and client

Page 16: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

IETF 61, Nov 2004Page 16

Object Security (2)

Capability Request (client to metadata manager)

CapRequest = object ID, operation, data range

Capability signed with device key known by metadata manager

CapSignature = {CapRequest, Expires, CapVersion} KeyDevice

CapSignature used as a signing key (!)

OSD Command (client to OSD)

Request contains {CapRequest, Expires, CapVersion} + other details + nonce to prevent replay attacks

RequestSignature = {Request} CapSignature

OSD can compute CapSignature by signing CapRequest with its own key

OSD can then verify RequestSignature

Caches and other tricks used to make this go fast

Page 17: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

IETF 61, Nov 2004Page 17

Scalable Bandwidth

Panasas Bandwidth vs. OSDs

0

2

4

6

8

10

12

0 50 100 150 200 250 300 350

Num OSDs (Panasas OSD has 2 disks, processor, and GE NIC)

GB

/se

c

Page 18: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

IETF 61, Nov 2004Page 18

Per Shelf BandwidthBandwidth vs Clients, 10 OSD

0

50

100

150

200

250

300

350

400

450

0 5 10 15 20 25

Clients

MB

/se

c

Read (64K) Write (64K)

Page 19: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

IETF 61, Nov 2004Page 19

Scalable NFSNFS Throughput (SPEC FS)

0

2

4

6

8

10

12

14

16

18

20

0 50000 100000 150000 200000 250000 300000 350000

SPEC SFS Ops/sec

Res

po

nse

Tim

e (m

sec)

60-Director ORT 10-Director ORT

Page 20: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

IETF 61, Nov 2004Page 20

Rendering load from 1000 clients

Page 21: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

IETF 61, Nov 2004Page 21

Status

pNFS ad-hoc working group

Dec ’03 Ann Arbor, April ’04 FAST, Aug ’04 IETF, Sept ’04 Pittsburgh

Internet Drafts

draft-gibson-pnfs-problem-statement-01.txt July 2004

draft-gibson-pnfs-reqs-00.txt October 2004

draft-welch-pnfs-ops-00.txt October 2004

draft-welch-pnfs-ops-01.txt March 2005

Next Steps

Add NFSv4 layout and storage protocol to current draft

Add text for additional error and recovery cases

Add separate drafts for object storage and block storage

Page 22: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

IETF 61, Nov 2004Page 22

Backup

Page 23: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

IETF 61, Nov 2004Page 23

Object Storage File System

Out of band control path

Direct data path

Page 24: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

IETF 61, Nov 2004Page 24

“Out-of-band” Value Proposition

Out-of-band allows a client to use more than one storage address for a given file, directory or closely linked set of files

Parallel I/O direct from client to multiple storage devices

Scalable capacity: file/dir uses space on all storage: can get big

Capacity balancing: file/dir uses space on all storage: evenly

Load balancing: dynamic access to file/dir over all storage: evenly

Scalable bandwidth: dynamic access to file/dir over all storage: big

Lower latency under load: no bottleneck developing deep queues

Cost-effectiveness at scale: use streamlined storage servers

pNFS standard leads to standard client SW: share client support $$$

Page 25: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

IETF 61, Nov 2004Page 25

Scaling and the Server

Tension between sharing and throughput

File server provides semantics, including sharing

Direct attach I/O provides throughput, no sharing

File server is a bottleneck between clients and storage

Pressure to make server ever faster and more expensive

Clustered NAS solutions, e.g., Spinnaker

SAN filesystems provide sharing and direct access

Asymmetric, out-of-band system with distinct control and data paths

Proprietary solutions, vendor-specific clients

Physical security model, which we’d like to improve

Client

Server

ClientClientClient

Page 26: October 2, 2015 pNFS extension for NFSv4 IETF-62 March 2005 Brent Welch welch@panasas.com.

IETF 61, Nov 2004Page 26

Symmetric File Systems

Distribute storage among all the clients

GPFS (AIX), GFS, PVFS (User Level)

Reliability Issues

Compute nodes less reliable because of the disk

Storage less reliable, unless replication schemes employed

Scalability Issues

Stealing cycles from clients, which have other work to do

Coupling of computing and storage

Like early days of engineering workstations, private storage