Garth A. Gibson, David F. Nagle, William Courtright II, Nat Lanza, Paul Mazaitis, Marc Unangst, Jim Zelenka,
"NASD Scalable Storage Systems", USENIX99, Extreme Linux Workshop, Monterey, CA, June 1999.
(http://www.pdl.cs.cmu.edu/Publications/publications.html)
Motivation
• NASD minimizes server-based data movement by separating management and filesystem semantics from store-and-forward data copying
• Figure 1: Standalone server with attached disks
– Illustrates the long path that requests and data take through OS layers and across multiple machines
• Reference implementation of NASD for Linux 2.2, including NASD device code that runs on a workstation or PC masquerading as a storage subsystem or disk drive
• NFS-like distributed file system that uses NASD subsystems or devices
• NASD striping middleware for large striped files
Figure 1 -- NetSCSI and NASD
• NetSCSI: Figure 1 outlines a data path where clients ask a server for data and the server forwards the request to storage; the forwarded request is a DMA command to return data directly to the client
– When the DMA completes, status is returned to the server, collected, and forwarded to the client
• NASD
– On first access, client contacts server for access checks
– Server grants reusable rights or capabilities
– Clients then present requests directly to storage
– Storage verifies capabilities and directly replies
NASD Interface
• Read, write object data
• Read, write object attributes
• Create, resize, remove soft partitions
• Construct copy-on-write version of object
• Logical version number on file can be changed by file manager to revoke capability
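The interface above can be pictured with a minimal in-memory sketch; the class and method names here are invented for illustration, not the paper's actual API, and soft partitions and copy-on-write are omitted for brevity:

```python
# Hypothetical sketch of a NASD object store: variable-length objects with
# data, attributes, and a logical version number the file manager can bump
# to revoke outstanding capabilities.
class NasdObject:
    def __init__(self):
        self.data = bytearray()
        self.attrs = {"version": 0}   # logical version gates capabilities

class NasdDrive:
    def __init__(self):
        self.objects = {}

    def create(self, oid):
        self.objects[oid] = NasdObject()

    def write(self, oid, offset, buf):
        # Grow the object if the write extends past its current length.
        o = self.objects[oid]
        if len(o.data) < offset + len(buf):
            o.data.extend(b"\x00" * (offset + len(buf) - len(o.data)))
        o.data[offset:offset + len(buf)] = buf

    def read(self, oid, offset, length):
        return bytes(self.objects[oid].data[offset:offset + length])

    def set_attr(self, oid, key, value):
        self.objects[oid].attrs[key] = value

    def bump_version(self, oid):
        # File manager revokes old capabilities by changing the version.
        self.objects[oid].attrs["version"] += 1
```

Note that object length comes for free from the drive's own state, which is why file length maps directly onto a NASD-maintained attribute.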
NASD Security
• Security protocol
– Capability has a public portion, CapArg, and a private key, CapKey
– CapArg specifies what rights are being granted for which object
– CapKey is a keyed message digest of CapArg and a secret key shared only with target drive
– Client sends CapArg with each request and generates a CapKey-keyed digest of the request parameters and CapArg
– Each drive knows its secret keys and receives CapArg with each request
– Can compute client’s CapKey and verify request
– If any field of CapArg or request has been changed, digest comparison will fail
– Scheme protects integrity of requests but does not protect privacy of data
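The digest scheme above can be sketched with Python's standard `hmac` module. SHA-256 as the keyed digest, the secret value, and the function names are assumptions for illustration, not the paper's protocol details:

```python
# Sketch of the NASD capability scheme: CapKey is a keyed digest of CapArg
# under a secret shared only by the file manager and the target drive.
import hashlib
import hmac

DRIVE_SECRET = b"drive-secret-key"  # known only to file manager and drive

def make_capability(cap_arg: bytes) -> bytes:
    # File manager: CapKey = MAC(secret, CapArg); handed privately to client.
    return hmac.new(DRIVE_SECRET, cap_arg, hashlib.sha256).digest()

def client_sign_request(cap_key: bytes, cap_arg: bytes, request: bytes) -> bytes:
    # Client: CapKey-keyed digest of the request parameters and CapArg.
    return hmac.new(cap_key, cap_arg + request, hashlib.sha256).digest()

def drive_verify(cap_arg: bytes, request: bytes, client_digest: bytes) -> bool:
    # Drive: recompute CapKey from its secret, then the request digest.
    # Any change to CapArg or the request makes the comparison fail.
    cap_key = hmac.new(DRIVE_SECRET, cap_arg, hashlib.sha256).digest()
    expected = hmac.new(cap_key, cap_arg + request, hashlib.sha256).digest()
    return hmac.compare_digest(expected, client_digest)
```

The drive never needs per-client state: the secret plus the CapArg sent with each request is enough to reconstruct CapKey, which is what makes the capability reusable without synchronous appeals to the file manager.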
Filesystems for NASD
• Constructed distributed file system with NFS-like semantics tailored for NASD
• Each file and directory occupies exactly one NASD object; offsets in files are the same as offsets in objects
• File length, last file modify time correspond directly to NASD-maintained object attributes
• Remainder of file attributes stored in uninterpreted section of object’s attributes
• Data-moving operations (read, write) and attribute reads (getattr) are sent directly to the NASD drive
– file attributes are either computed from NASD object attributes (e.g. modify times and object size) or stored in the uninterpreted filesystem-specific attribute
• Other requests are handled by file manager
• Capabilities are piggybacked on file manager’s response to lookup operations
Access to Striped Files and Continuous Media
• NASD-optimized parallel filesystem
• Filesystem manages objects not directly backed by data
• Backed by storage manager which redirects clients to component NASD objects
• NASD PFS supports SIO low-level parallel filesystem interface on top of NASD-NFS files striped using user-level Cheops middleware
• Figure 6
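The redirection from a striped file to its component NASD objects reduces to simple address math. A hypothetical RAID-0-style sketch follows; the stripe unit and drive count are illustrative parameters, not values from the paper:

```python
# Map a logical offset in a striped file to (component object index,
# offset within that component object), round-robin across n_drives.
def map_stripe(offset: int, stripe_unit: int, n_drives: int):
    stripe_no, within = divmod(offset, stripe_unit)
    drive = stripe_no % n_drives
    comp_offset = (stripe_no // n_drives) * stripe_unit + within
    return drive, comp_offset
```

Because this mapping is deterministic, middleware like Cheops can hand it to clients and let them address the component NASD objects directly, rather than funneling striped I/O through a central server.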
Garth A. Gibson, David F. Nagle, Khalil Amiri, Jeff Butler, Fay W. Chang, Howard Gobioff, Charles Hardin, Erik Riedel, David Rochberg and Jim Zelenka,
"A Cost-Effective, High-Bandwidth Storage Architecture", Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 2-7, 1998, San Jose, CA, USA, pages 92-103.
Evolution of storage architectures
• Local Filesystem -- Simplest; aggregates application, file management, concurrency control, and low-level storage management on one machine. Data makes one trip over a peripheral area network such as SCSI. Disks offer a fixed-size block abstraction.
• Distributed Filesystem -- Intermediate server machine is introduced. Server offers simple file access interface to clients.
• Distributed Filesystem with RAID controller -- Interpose another computer -- RAID controller.
• Distributed Filesystem that employs DMA -- Can arrange to DMA data to clients rather than to copy through server. HPSS is an example (although this is not how it is usually employed).
• NASD- based DFS, NASD-Cheops based DFS
Principles of NASD
• Direct transfer -- data moved between drive and client without indirection or store-and-forward through file server
• Asynchronous oversight -- Ability of client to perform most operations without synchronous appeal to the file manager
• Cryptographic integrity -- Drives ensure that commands and data have not been tampered with by generating and verifying cryptographic keyed digests
• Object based interface -- Drives export variable-length objects instead of fixed-size blocks. Gives disk drives direct knowledge of the relationships between disk blocks and minimizes security overhead.
Prototype Implementation
• NASD prototype drive runs on a 133 MHz, 64 MB DEC Alpha 3000/400 with two Seagate ST52160 disks attached by two 5 MB/s SCSI busses
• Intended to simulate a controller and drive
• NASD system implements own internal object access, cache, disk space management modules
• Figure 6 -- Performance for sequential reads and writes
– Sequential bandwidth as a function of request size
– NASD better tuned for disk access on reads that miss cache
– FFS better tuned for cache accesses
– FFS's write performance advantage is due to immediate acknowledgement of writes up to 64 KB
Scalability
• 13 NASD drives, each linked by OC-3 ATM to 10 client machines
• Each client issues series of sequential 2MB read requests striped across four NASDs.
• Each NASD can deliver 32MB/s from cache to RPC protocol stack
• DCE RPC cannot push more than 80 Mb/s through a 155 Mb/s ATM link before the receiving client saturates
• Figure 7 demonstrates close to linear scaling up to 10 clients
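A back-of-envelope check of these numbers, with units as stated above:

```python
# Each client link saturates at roughly 80 Mb/s of DCE RPC throughput,
# so with 10 clients the aggregate tops out near 100 MB/s -- the ceiling
# against which the "close to linear" scaling in Figure 7 is measured.
LINK_MBITS = 80                      # per-client RPC throughput, Mb/s
CLIENTS = 10
per_client_mbps = LINK_MBITS / 8     # megabytes/s per client
aggregate_mbps = per_client_mbps * CLIENTS
```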
Computational Requirements
• Table 1 -- number of instructions needed to service given request size including all communications (DCE RPC, UDP/IP)
• Overhead mostly due to communications
• Significantly more expensive than Seagate Barracuda
Filesystems for NASD
• NFS covered in last paper
• AFS -- lookup operations carried out by parsing directory files locally
• AFS RPCs added to obtain and relinquish capabilities explicitly
• AFS’s sequential consistency provided by breaking callbacks (notifying holders of potentially stale copies) when a write capability is issued
• File manager doesn't know when a write operation has actually arrived at a drive, so it must tell clients when a write may occur
• No new callbacks on file with outstanding write capability
• AFS enforces per-volume quota on allocated disk space
• File manager allocates space when it issues a capability, and it keeps track of how much space is actually written to
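One way to picture that quota mechanism (names and structure are hypothetical; the paper gives no code): the file manager reserves space when it issues a write capability, then reconciles the reservation against the bytes actually written:

```python
# Hypothetical per-volume quota accounting at the AFS/NASD file manager.
class VolumeQuota:
    def __init__(self, limit: int):
        self.limit = limit        # per-volume quota, in bytes
        self.reserved = 0         # granted via write capabilities
        self.used = 0             # confirmed written to drives

    def grant_write(self, nbytes: int) -> None:
        # Reserve space at capability-issue time, before any write happens.
        if self.used + self.reserved + nbytes > self.limit:
            raise PermissionError("quota exceeded")
        self.reserved += nbytes

    def settle(self, reserved: int, written: int) -> None:
        # Drive (or returning client) reports bytes actually written;
        # release the unused remainder of the reservation.
        self.reserved -= reserved
        self.used += written
```

Reserving pessimistically at grant time is what lets the drive accept writes without consulting the file manager on every operation.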
Active Disks
• Provide full application-level programmability of drives
• Customize functionality for data intensive computations
• NASD’s object based interface provides knowledge of data at devices without having to use external metadata