Distributed File Systems: Design Comparisons


David Eckhardt, Bruce Maggs

Slides used and modified with permission from Pei Cao’s lectures in Stanford Class CS-244B


Other Materials Used

• 15-410 Lecture 34, NFS & AFS, Fall 2003, Eckhardt
• NFS:
  – RFC 1094 for v2 (3/1989)
  – RFC 1813 for v3 (6/1995)
  – RFC 3530 for v4 (4/2003)
• AFS:
  – “The ITC Distributed File System: Principles and Design”, Proceedings of the 10th ACM Symposium on Operating System Principles, Dec. 1985, pp. 35-50.
  – “Scale and Performance in a Distributed File System”, ACM Transactions on Computer Systems, Vol. 6, No. 1, Feb. 1988, pp. 51-81.
  – IBM AFS User Guide, version 3.6


More Related Material

RFCs related to Remote Procedure Calls (RPCs)
– RFC 1831 RPC
– RFC 1832 XDR representation
– RFC 2203 RPC security


Outline

• Why Distributed File Systems?
• Basic mechanisms for building DFSs
  – Using NFS V2 as an example
• Design choices and their implications
  – Naming (this lecture)
  – Authentication and Access Control (this lecture)
  – Batched Operations (this lecture)
  – Caching (next lecture)
  – Concurrency Control (next lecture)
  – Locking implementation (next lecture)


15-410 Gratuitous Quote of the Day

Good judgment comes from experience… Experience comes from bad judgment.

- attributed to many


Why Distributed File Systems?


What Distributed File Systems Provide

• Access to data stored at servers using file system interfaces

• What are the file system interfaces?
  – Open a file, check status of a file, close a file
  – Read data from a file
  – Write data to a file
  – Lock a file or part of a file
  – List files in a directory, create/delete a directory
  – Delete a file, rename a file, add a symlink to a file
  – etc.


Why DFSs are Useful

• Data sharing among multiple users

• User mobility

• Location transparency

• Backups and centralized management


“File System Interfaces” vs. “Block Level Interfaces”

• Data are organized in files, which in turn are organized in directories

• Compare these with disk-level access or “block” access interfaces: [Read/Write, LUN, block#]

• Key differences:
  – Implementation of the directory/file structure and semantics
  – Synchronization (locking)


Digression: “Network Attached Storage” vs. “Storage Area Networks”

                            NAS                 SAN
Access Methods              File access         Disk block access
Access Medium               Ethernet            Fiber Channel and Ethernet
Transport Protocol          Layer over TCP/IP   SCSI/FC and SCSI/IP
Efficiency                  Less                More
Sharing and Access Control  Good                Poor
Integrity demands           Strong              Very strong
Clients                     Workstations        Database servers


Basic DFS Implementation Mechanisms


Components in a DFS Implementation

• Client side:
  – What has to happen to enable applications to access a remote file the same way a local file is accessed?
• Communication layer:
  – Just TCP/IP or a protocol at a higher level of abstraction?
• Server side:
  – How are requests from clients serviced?


Client Side Example: Basic UNIX Implementation

• Accessing remote files in the same way as local files requires kernel support


VFS interception

• VFS provides “pluggable” file systems
• Standard flow of remote access
  – User process calls read()
  – Kernel dispatches to VOP_READ() in some VFS
  – nfs_read()
    • check local cache
    • send RPC to remote NFS server
    • put process to sleep


VFS interception

• Standard flow of remote access (continued)
  – server interaction handled by kernel process
    • retransmit if necessary
    • convert RPC response to file system buffer
    • store in local cache
    • wake up user process
  – nfs_read()
    • copy bytes to user memory
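The VFS flow above can be sketched in user-space Python. This is only an illustration of the dispatch idea, not kernel code: the class names (LocalFS, NFSClientFS, VFS), the vop_read method, and the dict that stands in for the remote server are all assumptions made for this example. Sleeping and wake-up are omitted.

```python
class LocalFS:
    """Stand-in for an on-disk file system."""
    def __init__(self, files):
        self.files = files

    def vop_read(self, path, offset, count):
        return self.files[path][offset:offset + count]


class NFSClientFS:
    """Stand-in for an NFS client; `server` is a dict playing the remote server."""
    def __init__(self, server):
        self.server = server
        self.cache = {}           # (path, offset, count) -> bytes
        self.rpcs_sent = 0

    def vop_read(self, path, offset, count):
        key = (path, offset, count)
        if key in self.cache:                 # check local cache
            return self.cache[key]
        self.rpcs_sent += 1                   # "send RPC" (simulated by a dict lookup)
        data = self.server[path][offset:offset + count]
        self.cache[key] = data                # store in local cache
        return data                           # caller copies bytes to user memory


class VFS:
    """Routes read() to the file system mounted at the longest matching prefix."""
    def __init__(self):
        self.mounts = {}

    def mount(self, prefix, fs):
        self.mounts[prefix] = fs

    def read(self, path, offset, count):
        prefix = max((p for p in self.mounts if path.startswith(p)), key=len)
        return self.mounts[prefix].vop_read(path[len(prefix):], offset, count)


vfs = VFS()
vfs.mount("/", LocalFS({"etc/motd": b"hello local"}))
nfs = NFSClientFS({"notes.txt": b"hello remote"})
vfs.mount("/net/", nfs)
```

The caller uses the same vfs.read() for both paths; only the mount table decides whether the request stays local or turns into a (simulated) RPC.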


Communication Layer Example: Remote Procedure Calls (RPC)

• RPC call: xid, “call”, service, version, procedure, auth-info, arguments
• RPC reply: xid, “reply”, reply_stat, auth-info, results
• Failure handling: timeout and re-issue
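A minimal sketch of “timeout and re-issue”, assuming a hypothetical in-process FlakyServer that drops its first few requests. The field names follow the call and reply layouts on this slide; the "add" procedure, the "demo" service name, and max_tries are made up for the example. The point is that the client re-sends the same call with the same xid and accepts only a reply whose xid matches.

```python
import itertools


class FlakyServer:
    """Hypothetical server that silently drops the first `drop` requests."""
    def __init__(self, drop):
        self.drop = drop
        self.seen = 0

    def handle(self, call):
        self.seen += 1
        if self.seen <= self.drop:
            return None                       # lost packet: the caller sees a timeout
        args = call["arguments"]
        return {"xid": call["xid"], "type": "reply",
                "reply_stat": "accepted", "auth_info": None,
                "results": args["a"] + args["b"]}


_xids = itertools.count(1)


def rpc_call(server, procedure, arguments, max_tries=5):
    """Send one call, re-issuing it with the same xid after each timeout."""
    call = {"xid": next(_xids), "type": "call", "service": "demo", "version": 1,
            "procedure": procedure, "auth_info": None, "arguments": arguments}
    for attempt in range(1, max_tries + 1):
        reply = server.handle(call)           # None models a timeout
        if reply is not None and reply["xid"] == call["xid"]:
            return reply["results"], attempt
    raise TimeoutError(f"no reply after {max_tries} tries")
```

Re-issuing is only safe when the server can execute the same request twice without harm, which is why idempotent operations matter in the NFS V2 design discussed later.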


External Data Representation (XDR)

• Argument data and response data in RPC are packaged in XDR format
  – Integers are encoded in big-endian format
  – Strings: len followed by ASCII bytes, NULL-padded to a four-byte boundary
  – Arrays: 4-byte size followed by array entries
  – Opaque: 4-byte len followed by binary data
• Marshalling and un-marshalling data
• Extra overhead in data conversion to/from XDR
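The encoding rules above can be tried out with Python's struct module. This is a small sketch of the marshalling side only; the helper names (xdr_uint, xdr_opaque, xdr_string, xdr_array) are chosen for this example and are not part of any standard library.

```python
import struct


def xdr_uint(n):
    """Unsigned 32-bit integer, big-endian."""
    return struct.pack(">I", n)


def xdr_opaque(data):
    """4-byte length, the bytes, then zero padding to a 4-byte boundary."""
    pad = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * pad


def xdr_string(s):
    """Strings use the same layout as variable-length opaque data."""
    return xdr_opaque(s.encode("ascii"))


def xdr_array(items, encode_item):
    """4-byte element count followed by each encoded element."""
    return xdr_uint(len(items)) + b"".join(encode_item(i) for i in items)
```

For example, xdr_string("hello") produces 12 bytes: a 4-byte length of 5, the five ASCII bytes, and three zero bytes of padding.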


Some NFS V2 RPC Calls

• NFS RPCs using XDR over, e.g., TCP/IP

• fhandle: 32-byte opaque data (up to 64 bytes in v3)

Proc.   Input args                     Results
LOOKUP  dirfh, name                    status, fhandle, fattr
READ    fhandle, offset, count         status, fattr, data
CREATE  dirfh, name, fattr             status, fhandle, fattr
WRITE   fhandle, offset, count, data   status, fattr
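Since LOOKUP takes one directory handle and one name, a client resolves a multi-component path by issuing one LOOKUP per component. The sketch below illustrates that with a toy in-memory server; ToyNFSServer, its handle values, and resolve() are assumptions for this example, and the fattr results are left out for brevity.

```python
class ToyNFSServer:
    """Toy server: maps opaque file handles to directories (dicts) or file bytes."""
    def __init__(self):
        self.objects = {
            b"fh-root": {"home": b"fh-home"},
            b"fh-home": {"notes.txt": b"fh-notes"},
            b"fh-notes": b"remote file contents",
        }

    def lookup(self, dirfh, name):
        directory = self.objects[dirfh]
        if not isinstance(directory, dict):
            return "NFSERR_NOTDIR", None
        if name not in directory:
            return "NFSERR_NOENT", None
        return "NFS_OK", directory[name]

    def read(self, fhandle, offset, count):
        return "NFS_OK", self.objects[fhandle][offset:offset + count]


def resolve(server, rootfh, path):
    """Walk the path one component at a time, one LOOKUP per component."""
    fh = rootfh
    for name in path.strip("/").split("/"):
        status, fh = server.lookup(fh, name)
        if status != "NFS_OK":
            raise FileNotFoundError(f"{path}: {status}")
    return fh
```

Note that READ carries the file handle, offset, and count in every request, so the server needs no per-client record of open files or file positions.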


Server Side Example: mountd and nfsd

• mountd: provides the initial file handle for the exported directory
  – Client issues nfs_mount request to mountd
  – mountd checks if the pathname is a directory and if the directory should be exported to the client
• nfsd: answers the RPC calls, gets reply from local file system, and sends reply via RPC
  – Usually listening at port 2049

• Both mountd and nfsd use underlying RPC implementation


NFS V2 Design

• “Dumb”, “Stateless” servers
• Smart clients
• Portable across different OSs
• Immediate commitment and idempotency of operations
• Low implementation cost
• Small number of clients
• Single administrative domain
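To see why idempotency matters when a client may re-issue a request after a timeout, compare a write that names an absolute offset with an append. This is a sketch over a plain dict; write_at and append are hypothetical helpers, not NFS procedures, although write_at mirrors the (fhandle, offset, data) arguments of WRITE.

```python
def write_at(store, fhandle, offset, data):
    """Idempotent: the request names the absolute offset, as NFS WRITE does."""
    buf = bytearray(store.get(fhandle, b""))
    if len(buf) < offset:
        buf.extend(b"\x00" * (offset - len(buf)))   # zero-fill any gap
    buf[offset:offset + len(data)] = data
    store[fhandle] = bytes(buf)


def append(store, fhandle, data):
    """Not idempotent: the result depends on how many times the request runs."""
    store[fhandle] = store.get(fhandle, b"") + data
```

Running write_at twice with the same arguments leaves the file in the same state as running it once, so a duplicated request is harmless; a duplicated append is not.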


Stateless File Server?

• Statelessness
  – Files are state, but...
  – Server exports files without creating extra state
    • No list of “who has this file open” (permission check on each operation on open file!)
    • No “pending transactions” across crash
• Results
  – Crash recovery is “fast”
    • Reboot, let clients figure out what happened
  – Protocol is “simple”
• State stashed elsewhere
  – Separate MOUNT protocol
  – Separate NLM locking protocol


NFS V2 Operations

• V2:
  – NULL, GETATTR, SETATTR
  – LOOKUP, READLINK, READ
  – CREATE, WRITE, REMOVE, RENAME
  – LINK, SYMLINK
  – READDIR, MKDIR, RMDIR
  – STATFS (get file system attributes)


NFS V3 and V4 Operations

• V3 added:
  – READDIRPLUS, COMMIT (server cache!)
  – FSSTAT, FSINFO, PATHCONF
• V4 added:
  – COMPOUND (bundle operations)
  – LOCK (server becomes more stateful!)
  – PUTROOTFH, PUTPUBFH (no separate MOUNT)
  – Better security and authentication


NFS File Server Failure Issues

• Semantics of file write in V2
  – Bypass UFS file buffer cache
• Semantics of file write in V3
  – Provide “COMMIT” procedure

• Locking provided by server in V4


Design Choices in DFS


Topic 1: Name-Space Construction and Organization

• NFS: per-client linkage
  – Server: export /root/fs1/
  – Client: mount server:/root/fs1 /fs1 → fhandle
• AFS: global name space
  – Name space is organized into Volumes
    • Global directory /afs
    • /afs/cs.wisc.edu/vol1/…; /afs/cs.stanford.edu/vol1/…
  – Each file is identified as fid = <vol_id, vnode #, uniquifier>
  – All AFS servers keep a copy of the “volume location database”, which is a table of vol_id → server_ip mappings


Implications on Location Transparency

• NFS: no transparency
  – If a directory is moved from one server to another, client must remount
• AFS: transparency
  – If a volume is moved from one server to another, only the volume location database on the servers needs to be updated
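The AFS side of this comparison can be sketched as a table lookup. The Fid fields follow the <vol_id, vnode #, uniquifier> triple from the naming slide; the vldb dict, the example addresses (from the documentation range 192.0.2.0/24), and the helper names are assumptions for illustration only.

```python
from typing import NamedTuple


class Fid(NamedTuple):
    vol_id: int
    vnode: int
    uniquifier: int


# Volume location database: vol_id -> server address (addresses are made up).
vldb = {7: "192.0.2.1", 8: "192.0.2.2"}


def locate(fid, vldb):
    """Find the server for a file: only the volume id matters."""
    return vldb[fid.vol_id]


def move_volume(vldb, vol_id, new_server):
    """Moving a volume changes one database entry; fids held by clients stay valid."""
    vldb[vol_id] = new_server
```

Because the fid carries no server address, a client holding Fid(7, 42, 1) keeps using the same identifier before and after the move. In the NFS case the server name is baked into each client's mount, which is why a move forces a remount.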


Topic 2: User Authentication and Access Control

• User X logs onto workstation A, wants to access files on server B
  – How does A tell B who X is?
  – Should B believe A?
• Choices made in NFS V2
  – All servers and all client workstations share the same <uid, gid> name space → A sends X’s <uid, gid> to B
    • Problem: root access on any client workstation can lead to creation of users of arbitrary <uid, gid>
  – Server believes client workstation unconditionally
    • Problem: if any client workstation is broken into, the protection of data on the server is lost
    • <uid, gid> sent in clear-text over wire → request packets can be faked easily


User Authentication (cont’d)

• How do we fix the problems in NFS v2?
  – Hack 1: root remapping → strange behavior
  – Hack 2: UID remapping → no user mobility
  – Real Solution: use a centralized Authentication/Authorization/Access-control (AAA) system
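A small sketch of Hack 1 shows both the “strange behavior” and why it does not fix the underlying trust problem. The function names, the simplified permission check (owner and world read bits only), and the value 65534 for the unprivileged “nobody” id are assumptions for this example.

```python
NOBODY = 65534   # assumed id of the unprivileged "nobody" account


def squash_root(uid, gid):
    """Server-side remapping: requests claiming uid 0 are treated as nobody."""
    if uid == 0:
        return NOBODY, NOBODY
    return uid, gid


def may_read(owner_uid, mode, uid):
    """Simplified check: root may read anything, else owner bit or world bit."""
    if uid == 0:
        return True
    if uid == owner_uid:
        return bool(mode & 0o400)
    return bool(mode & 0o004)
```

With remapping, root on a client cannot read a mode-0600 file owned by uid 1000 over NFS even though local root could, which is the strange behavior. The server still believes any other uid the client sends, so root on the client can simply claim to be uid 1000.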


Example AAA System: NTLM

• Microsoft Windows Domain Controller
  – Centralized AAA server
  – NTLM v2: per-connection authentication

[Figure: seven numbered message exchanges (1-7) among client, file server, and Domain Controller]


A Better AAA System: Kerberos

• Basic idea: shared secrets
  – User proves to KDC who he is; KDC generates shared secret between client and file server

[Figure: the client tells the KDC’s ticket server “Need to access fs”; the ticket server generates S, encrypts S with the client’s key, and returns Kclient[S]; the file server receives Kfs[S]]

• S: specific to {client, fs} pair; “short-term session-key”; expiration time (e.g. 8 hours)


Kerberos Interactions

[Figure: message flow among client, ticket server (KDC), and file server]
– client → ticket server: “Need to access fs”; ticket server generates S
– ticket server → client: Kclient[S], ticket = Kfs[use S for client]
– client → file server: ticket = Kfs[use S for client], S{client, time}
– file server → client: S{time}

• Why “time”?: guard against replay attack
• mutual authentication
• File server doesn’t store S, which is specific to {client, fs}
• Client doesn’t contact “ticket server” every time it contacts fs
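The exchange above can be walked through with a toy model. Everything here is an illustration under stated assumptions: seal() and unseal() are a stand-in for K[...] built from a hash keystream plus an HMAC tag and are not secure encryption; the key derivation, the 300-second freshness window, and all function names are made up for this sketch; real Kerberos message formats differ.

```python
import hashlib
import hmac
import json
import os


def _keystream(key, n):
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]


def seal(key, obj):
    """Toy stand-in for K[obj]: XOR with a hash keystream, plus an HMAC tag."""
    plain = json.dumps(obj).encode()
    cipher = bytes(a ^ b for a, b in zip(plain, _keystream(key, len(plain))))
    return hmac.new(key, cipher, hashlib.sha256).digest() + cipher


def unseal(key, blob):
    tag, cipher = blob[:32], blob[32:]
    if not hmac.compare_digest(tag, hmac.new(key, cipher, hashlib.sha256).digest()):
        raise ValueError("wrong key or tampered message")
    plain = bytes(a ^ b for a, b in zip(cipher, _keystream(key, len(cipher))))
    return json.loads(plain)


K_client = hashlib.sha256(b"client password").digest()   # assumed long-term key
K_fs = os.urandom(32)                                     # file server's long-term key


def kdc_issue(client, now):
    """Ticket server: generate S, return Kclient[S] and ticket = Kfs[use S for client]."""
    S = os.urandom(32).hex()
    ticket = seal(K_fs, {"S": S, "client": client, "expires": now + 8 * 3600})
    return seal(K_client, {"S": S}), ticket


def fs_accept(ticket, authenticator, now, skew=300):
    """File server: recover S from the ticket, check S{client, time}, reply S{time}."""
    t = unseal(K_fs, ticket)
    S = bytes.fromhex(t["S"])
    auth = unseal(S, authenticator)
    if auth["client"] != t["client"] or now > t["expires"]:
        raise PermissionError("ticket does not match or has expired")
    if abs(now - auth["time"]) > skew:
        raise PermissionError("stale authenticator (possible replay)")
    return seal(S, {"time": auth["time"]})


now = 1_000_000
for_client, ticket = kdc_issue("alice", now)
S = bytes.fromhex(unseal(K_client, for_client)["S"])
authenticator = seal(S, {"client": "alice", "time": now})
reply = fs_accept(ticket, authenticator, now)
```

The file server learns S only by opening the ticket, so it keeps no per-client secret between requests. A correct S{time} reply shows the client that the server could open the ticket, which is the mutual authentication step. Replaying the same authenticator an hour later fails the freshness check.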


Kerberos: User Log-on Process

• How does the user prove to the KDC who the user is?
  – Long-term key: 1-way-hash-func(passwd)
  – Long-term key comparison happens once only, at which point the KDC generates a shared secret for the user and the KDC itself → ticket-granting ticket, or “logon session key”
  – The “ticket-granting ticket” is encrypted in KDC’s long-term key


Operator Batching

• Should each client/server interaction accomplish one file system operation or multiple operations?

• Advantage of batched operations

• How to define batched operations


Examples of Batched Operators

• NFS v3:
  – READDIRPLUS
• NFS v4:
  – COMPOUND RPC calls
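The advantage of batching shows up as a round-trip count. The sketch below compares listing a directory with attributes using one READDIR plus one GETATTR per entry against a single READDIRPLUS-style call. CountingServer and both list functions are hypothetical, and a real READDIRPLUS may still need several calls for a large directory.

```python
class CountingServer:
    """Toy server that counts how many requests it receives."""
    def __init__(self, attrs):
        self.attrs = attrs            # name -> attributes
        self.round_trips = 0

    def readdir(self):
        self.round_trips += 1
        return list(self.attrs)

    def getattr(self, name):
        self.round_trips += 1
        return self.attrs[name]

    def readdirplus(self):
        self.round_trips += 1
        return dict(self.attrs)       # names and attributes in one reply


def list_unbatched(server):
    """One READDIR, then one GETATTR per entry: n + 1 round trips."""
    return {name: server.getattr(name) for name in server.readdir()}


def list_batched(server):
    """One READDIRPLUS: a single round trip in this toy model."""
    return server.readdirplus()
```

For a directory of n entries the unbatched listing costs n + 1 round trips and the batched one costs 1, which matters most when network latency dominates.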


Summary

• Functionalities of DFS
• Implementation of DFS
  – Client side: VFS interception
  – Communication: RPC or TCP/UDP
  – Server side: server daemons
• DFS name space construction
  – Mount vs. Global name space
• DFS access control
  – NTLM
  – Kerberos