pNFS extension for NFSv4
IETF-62, March 2005
Brent Welch, [email protected]
IETF 61, Nov 2004, Page 2
Abstract
pNFS extends NFSv4
Scalable I/O problem
(Brief review) of the proposal
Status and next steps
Object Storage and pNFS
Object security scheme
pNFS
Extension to NFSv4
NFS is THE file system standard
Fewest additions that enable parallel I/O from clients
Layouts are the key additional abstraction
Describe where data is located and how it is organized
Clients access storage directly, in parallel
Generalized support for asymmetric solutions
Files: Clean way to do filer virtualization, eliminate bottleneck
Objects: Standard way to do object-based file systems
Blocks: Standard way to do block-based SAN file systems
Scalable I/O Problem
Storage for 1000’s of active clients => lots of bandwidth
Scaling capacity through 100’s of TB and into PB
File Server model has good sharing and manageability
but it is hard to scale
Many other proprietary solutions
GPFS, CXFS, StorNext, Panasas, Sistina, Sanergy, …
Everyone has their own client
We'd like a standards-based solution => pNFS
Scaling and the Client
Gary Grider’s rule of thumb for HPC
1 Gbyte/sec for each Teraflop of computing power
2000 3.2 GHz processors => 6TF => 6 GB/sec
One file server with 48 GE NICs? I don’t think so.
100 GB/sec I/O system in ’08 or ’09 for 100 TF cluster
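The rule of thumb above is simple arithmetic; a minimal sketch, using the slide's own figures (the per-processor FLOPS rate is an assumption backed out of the slide's 6 TF total):

```python
# Gary Grider's rule of thumb: ~1 GB/sec of I/O per teraflop of compute.
GB_PER_SEC_PER_TFLOP = 1

# Example cluster from the slide: 2000 processors, ~3 GFLOPS each => 6 TF.
teraflops = 2000 * 3 / 1000
io_bandwidth = teraflops * GB_PER_SEC_PER_TFLOP

print(f"{teraflops:.0f} TF cluster needs ~{io_bandwidth:.0f} GB/sec of I/O")
# → 6 TF cluster needs ~6 GB/sec of I/O
```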
Making movies
1000 node rendering farm, plus 100’s of desktops at night
Oil and Gas
100’s to 1000’s of clients
Lots of large files (10’s of GB to TB each)
EDA, Compile Farms, Life Sciences …
Everyone has a Linux cluster these days
Asymmetric File Systems
Control Path vs. Data Path (“out-of-band” control)
Metadata servers (Control)
File name space
Access control (ACL checking)
Sharing and coordination
Storage Devices (Data)
Clients access storage directly
SAN Filesystems
CXFS, EMC Hi Road, Sanergy, SAN FS
Object Storage File Systems
Panasas, Lustre
[Figure: multiple clients issue NFSv4 + pNFS ops to the pNFS server and access storage devices directly via a storage protocol; the server manages the devices via a storage management protocol]
NFS and Cluster File Systems
Currently, NFS head can be a client of a cluster file system (instead of a client of the local file system)
NFSv4 adds more state, makes integration with cluster file system more interesting
[Figure: NFS clients connect to NFS "heads"; each head is itself a native client of the cluster file system, alongside the other native clients]
pNFS and Cluster File Systems
Replace proprietary cluster file system protocols with pNFS
Lots of room for innovation inside the cluster file system
[Figure: pNFS clients speak to a cluster of "Cluster FS + NFSv4" server nodes directly, alongside the file system's native clients]
pNFS Ops Summary
GETDEVINFO
Maps from opaque device ID used in layout data structures to the storage protocol type and necessary addressing information for that device
LAYOUTGET
Fetch location and access control information (i.e., capabilities)
LAYOUTCOMMIT
Commit write activity. New file size and attributes visible on storage.
LAYOUTRELEASE
Give up lease on the layout
CB_LAYOUTRETURN
Server callback to recall layout lease
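The op list above implies a simple client-side flow: fetch a layout, resolve its device IDs, do direct I/O, then commit and release. A minimal sketch; the op names follow the draft, but the field names and the `server` interface are illustrative assumptions, not the draft's XDR:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DeviceInfo:          # result of GETDEVINFO
    storage_protocol: str  # e.g. "NFSv4", "OSD", "SBC"
    addresses: List[str]   # addressing information for the device

@dataclass
class Layout:              # result of LAYOUTGET
    device_ids: List[int]  # opaque IDs, resolved via GETDEVINFO
    byte_range: Tuple[int, int]  # (offset, length) the layout covers
    access: str            # "READ" or "RW"

def client_io(server, fh, offset, length):
    """Typical flow: get a layout, resolve devices, do direct parallel
    I/O, then commit so the new size/attributes become visible."""
    layout = server.layoutget(fh, offset, length)
    devices = [server.getdevinfo(d) for d in layout.device_ids]
    # ... parallel reads/writes go straight to `devices` here ...
    server.layoutcommit(fh, layout)   # publish size/attrs on storage
    server.layoutrelease(fh, layout)  # give up the lease on the layout
```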
Multiple Data Server Protocols
BE INCLUSIVE!!
Broaden the market reach
Three (or more) flavors of out-of-band metadata attributes:
BLOCKS: SBC/FCP/FC or SBC/iSCSI… for files built on blocks
OBJECTS: OSD/iSCSI/TCP/IP/GE for files built on objects
FILES: NFS/ONCRPC/TCP/IP/GE for files built on subfiles
Inode-level encapsulation in server and client code
[Figure: client apps run over a pNFS IFS with a pluggable layout driver (1. SBC (blocks), 2. OSD (objects), 3. NFS (files)); the pNFS server runs over a local file system; NFSv4 is extended with orthogonal layout metadata attributes, with layout metadata grant and revoke between client and server]
March 2005 Connectathon
NetApp server prototype by Garth Goodson
NFSv4 for the data protocol
Layouts are an array of filehandles and stateIDs
Storage Management protocol is also NFSv4
Client prototype by Sun
Solaris 10 variant
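The prototype's file-based layout described above can be pictured as a simple structure, one entry per data server holding a stripe of the file. This is an illustrative sketch, not the draft's XDR encoding:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FileLayoutEntry:
    filehandle: bytes  # NFSv4 filehandle on one data server
    stateid: bytes     # stateID authorizing I/O through that handle

# A file layout is just an array of such entries.
layout: List[FileLayoutEntry] = [
    FileLayoutEntry(b"fh-0", b"sid-0"),
    FileLayoutEntry(b"fh-1", b"sid-1"),
]
```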
Object Storage
Object interface is midway between files and blocks (think inode)
Create, Delete, Read, Write, GetAttr, SetAttr, …
Objects have numeric ID, not pathnames
Objects have a capability based security scheme (details in a moment)
Objects have extensible attributes to hold high-level FS information
Based on NASD and OBSD research out of CMU (Gibson et al.)
SNIA/T10 standards based. V1 complete, V2 in progress.
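The "midway between files and blocks" interface above can be sketched as a tiny in-memory model: numeric ID, byte-addressable data, extensible attributes. A sketch only; the real T10 command set works over iSCSI CDBs, not method calls:

```python
class StorageObject:
    """Minimal model of the OSD object interface described above."""

    def __init__(self, obj_id):
        self.obj_id = obj_id      # numeric <dev, grp, obj> ID, not a pathname
        self.data = bytearray()   # user data, byte-addressable
        self.attrs = {}           # extensible attrs for high-level FS info

    def write(self, offset, buf):
        end = offset + len(buf)
        if end > len(self.data):                       # grow sparsely,
            self.data.extend(b"\x00" * (end - len(self.data)))  # zero-filled
        self.data[offset:end] = buf

    def read(self, offset, length):
        return bytes(self.data[offset:offset + length])

    def setattr(self, key, value):
        self.attrs[key] = value

    def getattr(self, key):
        return self.attrs.get(key)
```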
Objects as Building Blocks
[Figure: an object comprises user data, attributes, and layout; its interface is an ID <dev,grp,obj> with capability-based Read/Write, Create/Delete, and Getattr/Setattr; as a file component, files are striped across objects on multiple storage nodes]
Objects and pNFS
Clients speak pNFS to metadata server
Or they will, eventually
Clients speak iSCSI/OSD to the data servers (OSD)
Files are striped across multiple objects
Metadata manager speaks iSCSI/OSD to data server
Metadata manager uses extended attributes on objects to store metadata
High level file system attributes like owners and ACLs
Storage maps
Directories are just files stored in objects
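The striping above amounts to a mapping from a file byte offset to an (object, offset) pair. A minimal sketch for round-robin striping, which is illustrative only, one of several layouts an object file system might use:

```python
def stripe_location(offset, stripe_unit, num_objects):
    """Map a file byte offset to (object index, offset within object)
    under round-robin striping with a fixed stripe unit."""
    stripe_no = offset // stripe_unit          # which stripe unit overall
    obj_index = stripe_no % num_objects        # which object holds it
    obj_offset = (stripe_no // num_objects) * stripe_unit + offset % stripe_unit
    return obj_index, obj_offset

# Example: 64 KB stripe unit across 4 objects.
print(stripe_location(0, 65536, 4))       # → (0, 0)
print(stripe_location(65536, 65536, 4))   # → (1, 0)
print(stripe_location(262144, 65536, 4))  # → (0, 65536) — wrapped around
```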
Object Security (1)
Clean security model based on shared, secret device keys
Part of T10 standard
Metadata manager knows device keys
Metadata manager signs caps, OSD verifies them (explained next slide)
Metadata server returns capability to the client
Capability encodes: object ID, operation, expire time, data range, cap version, and a signature
Data range allows serialization over file data, as well as different rights to different attributes
Cap version allows revocation by changing it on the object
May want to protect capability with privacy (encryption) to avoid snooping the path between metadata manager and client
Object Security (2)
Capability Request (client to metadata manager)
CapRequest = object ID, operation, data range
Capability signed with device key known by metadata manager
CapSignature = {CapRequest, Expires, CapVersion} KeyDevice
CapSignature used as a signing key (!)
OSD Command (client to OSD)
Request contains {CapRequest, Expires, CapVersion} + other details + nonce to prevent replay attacks
RequestSignature = {Request} CapSignature
OSD can compute CapSignature by signing CapRequest with its own key
OSD can then verify RequestSignature
Caches and other tricks used to make this go fast
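The two-level signing scheme above maps naturally onto keyed MACs. A minimal sketch using HMAC-SHA1 as a stand-in for whatever MAC the T10 spec actually mandates; the field encoding is an illustrative assumption:

```python
import hashlib
import hmac
import os

def sign(key, message):
    # Keyed MAC standing in for the T10 signing primitive.
    return hmac.new(key, message, hashlib.sha1).digest()

# Metadata manager: sign the capability with the shared device key.
device_key = b"shared-secret-known-to-mgr-and-osd"   # illustrative
cap_request = b"obj=0x17|op=READ|range=0-65535"
cap_fields = cap_request + b"|expires=1234|capver=1"
cap_signature = sign(device_key, cap_fields)          # handed to client

# Client: use CapSignature itself as the signing key for each request.
nonce = os.urandom(8)                                 # prevents replay
request = cap_fields + b"|other-details|" + nonce
request_signature = sign(cap_signature, request)

# OSD: recompute CapSignature from its own copy of the device key,
# then verify the request signature with it.
osd_cap_signature = sign(device_key, cap_fields)
assert hmac.compare_digest(sign(osd_cap_signature, request),
                           request_signature)
```

Note the design point from the slide: the OSD never needs to talk to the metadata manager, because it can re-derive the client's signing key from the capability fields and its own device key.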
Scalable Bandwidth
[Chart: Panasas aggregate bandwidth (GB/sec, 0 to 12) vs. number of OSDs (0 to 350); each Panasas OSD has 2 disks, a processor, and a GE NIC]
Per Shelf Bandwidth
[Chart: bandwidth (MB/sec, 0 to 450) vs. number of clients (0 to 25) with 10 OSDs; 64K reads and 64K writes]
Scalable NFS
[Chart: NFS throughput (SPEC SFS): response time (msec, 0 to 20) vs. SPEC SFS ops/sec (0 to 350,000); 60-Director ORT and 10-Director ORT]
Rendering load from 1000 clients
Status
pNFS ad-hoc working group
Dec ’03 Ann Arbor, April ’04 FAST, Aug ’04 IETF, Sept ’04 Pittsburgh
Internet Drafts
draft-gibson-pnfs-problem-statement-01.txt July 2004
draft-gibson-pnfs-reqs-00.txt October 2004
draft-welch-pnfs-ops-00.txt October 2004
draft-welch-pnfs-ops-01.txt March 2005
Next Steps
Add NFSv4 layout and storage protocol to current draft
Add text for additional error and recovery cases
Add separate drafts for object storage and block storage
Backup
Object Storage File System
Out-of-band control path
Direct data path
“Out-of-band” Value Proposition
Out-of-band allows a client to use more than one storage address for a given file, directory or closely linked set of files
Parallel I/O direct from client to multiple storage devices
Scalable capacity: file/dir uses space on all storage: can get big
Capacity balancing: file/dir uses space on all storage: evenly
Load balancing: dynamic access to file/dir over all storage: evenly
Scalable bandwidth: dynamic access to file/dir over all storage: big
Lower latency under load: no bottleneck developing deep queues
Cost-effectiveness at scale: use streamlined storage servers
pNFS standard leads to standard client SW: share client support $$$
Scaling and the Server
Tension between sharing and throughput
File server provides semantics, including sharing
Direct attach I/O provides throughput, no sharing
File server is a bottleneck between clients and storage
Pressure to make server ever faster and more expensive
Clustered NAS solutions, e.g., Spinnaker
SAN filesystems provide sharing and direct access
Asymmetric, out-of-band system with distinct control and data paths
Proprietary solutions, vendor-specific clients
Physical security model, which we’d like to improve
[Figure: many clients funnel through a single file server to reach storage]
Symmetric File Systems
Distribute storage among all the clients
GPFS (AIX), GFS, PVFS (User Level)
Reliability Issues
Compute nodes less reliable because of the disk
Storage less reliable, unless replication schemes employed
Scalability Issues
Stealing cycles from clients, which have other work to do
Coupling of computing and storage
Like early days of engineering workstations, private storage