Posted on 17-Dec-2015
UNIX Internals – the New Frontiers
Distributed File Systems
Difference between a DOS and a DFS
- A distributed operating system (DOS) looks like a centralized OS but runs simultaneously on multiple machines. It may provide a file system shared by all its host machines.
- A distributed file system (DFS) is a software layer that manages communication between conventional operating systems and file systems.
General Characteristics of a DFS
- Network transparency
- Location transparency and location independence
- User mobility
- Fault tolerance
- Scalability
- File mobility
Design Considerations
- Name space
- Stateful or stateless
- Semantics of sharing
  - UNIX semantics
  - Session semantics
- Remote access method
Network File System (NFS)
- Based on the client-server model
- Client and server communicate via remote procedure calls (RPC)
User Perspective
- An NFS server exports one or more file systems, which clients mount.
- Hard mount: the client keeps retrying until it gets a reply.
- Soft mount: the client gives up after a while and returns an error.
- Spongy mount: hard for mount operations, soft for I/O.
Commands:
    mount -t nfs nfssrv:/usr /usr
    mount -t nfs nfssrv:/usr/u1 /u1
    mount -t nfs nfssrv:/usr /users
    mount -t nfs nfssrv:/usr/local /usr/local
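The three mount types differ only in what the client does when the server does not answer. The Python sketch below models that retry policy under stated assumptions. `send`, `nfs_call`, `flaky_server`, and the limit of three attempts for soft failures are hypothetical. The timeouts and back-off that a real client uses are left out.

```python
import itertools


class NFSTimeoutError(Exception):
    """Soft failure: the caller sees an error instead of hanging."""


def is_hard(mount_type: str, op: str) -> bool:
    """Decide whether this operation must retry until it gets a reply."""
    if mount_type == "hard":
        return True
    if mount_type == "soft":
        return False
    if mount_type == "spongy":
        return op == "mount"          # hard for mount, soft for I/O
    raise ValueError(f"unknown mount type: {mount_type}")


def nfs_call(send, mount_type: str, op: str, soft_retries: int = 3):
    """Retry send(op) according to the mount type; None means a timeout."""
    for attempt in itertools.count(1):
        reply = send(op)
        if reply is not None:
            return reply
        if not is_hard(mount_type, op) and attempt >= soft_retries:
            raise NFSTimeoutError(f"{op}: no reply after {attempt} attempts")
        # Hard behaviour: loop and try again (a real client would back off).


def flaky_server(timeouts: int):
    """Test double: times out `timeouts` times, then answers "ok"."""
    remaining = [timeouts]

    def send(op):
        if remaining[0] > 0:
            remaining[0] -= 1
            return None
        return "ok"

    return send
```

With `flaky_server(10)`, a hard mount returns `"ok"` after ten timeouts. A soft mount, or a spongy mount doing a read, raises `NFSTimeoutError` after three attempts.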
Design goals
- Not restricted to UNIX
- Not dependent on any particular hardware
- Simple recovery mechanisms
- Transparent access to remote files
- UNIX semantics
- Performance comparable to that of a local disk
- Transport independence
NFS components
- NFS protocol
- RPC protocol
- XDR (External Data Representation)
- NFS server code
- NFS client code
- Mount protocol
- Daemon processes (nfsd, mountd, biod)
- NLM (Network Lock Manager) and NSM (Network Status Monitor)
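Statelessness
- Each request is independent and carries all the information needed to process it.
- This makes crash recovery simple:
  - Client crash: the server holds no client state, so it has nothing to recover.
  - Server crash: clients simply retry their requests until the server is back.
- Problem: the server must commit all modifications to stable storage before replying to a request.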
10.4 The protocol suite
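Why XDR? Machines differ in the internal representation of data elements: byte order and the sizes of types.
- Opaque data (a raw byte stream) can be transferred without interpretation.
- Typed data must be converted, e.g. between little-endian and big-endian machines.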
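The following Python example shows the byte-order problem that XDR solves. The same 32-bit integer has different byte layouts on little-endian and big-endian machines, so raw bytes cannot simply be copied between them. It uses only the standard `struct` module, and the value is arbitrary.

```python
import struct
import sys

value = 0x01020304

little = struct.pack("<i", value)   # layout on a little-endian machine (e.g. x86)
big = struct.pack(">i", value)      # layout on a big-endian machine (and in XDR)
print(sys.byteorder, little.hex(), big.hex())

# A little-endian host that reads the big-endian bytes without conversion
# gets a different number:
misread = struct.unpack("<i", big)[0]
print(hex(misread))  # 0x4030201
```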
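XDR
- Integers: 32 bits, byte 0 (leftmost) most significant; signed integers in two's complement.
- Variable-length opaque data: 4-byte length, then the data, padded with NULL bytes to a multiple of 4 bytes.
- Strings: 4-byte length, then the ASCII string, padded with NULL bytes.
- Arrays: 4-byte element count, then the elements, all of the same type.
- Structures: components encoded in their natural order.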
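These rules are easy to express in code. The following minimal Python encoder uses only the standard `struct` module and covers integers, variable-length opaque data, strings, arrays, and structures. The function names are illustrative and are not the C XDR library's API.

```python
import struct


def _pad(data: bytes) -> bytes:
    """Zero-fill so the item occupies a multiple of 4 bytes."""
    return data + bytes(-len(data) % 4)


def xdr_int(n: int) -> bytes:
    """32-bit signed integer, big-endian, two's complement."""
    return struct.pack(">i", n)


def xdr_uint(n: int) -> bytes:
    """32-bit unsigned integer, used here for lengths and counts."""
    return struct.pack(">I", n)


def xdr_opaque(data: bytes) -> bytes:
    """Variable-length opaque data: 4-byte length, the bytes, padding."""
    return xdr_uint(len(data)) + _pad(data)


def xdr_string(s: str) -> bytes:
    """String: same layout as opaque data, with ASCII contents."""
    return xdr_opaque(s.encode("ascii"))


def xdr_array(items, encode_item) -> bytes:
    """Variable-length array: 4-byte element count, then each element."""
    return xdr_uint(len(items)) + b"".join(encode_item(x) for x in items)


def xdr_struct(*fields: bytes) -> bytes:
    """Structure: already-encoded fields concatenated in declaration order."""
    return b"".join(fields)
```

For example, `xdr_string("abcde")` produces 12 bytes: the length 5 as `00 00 00 05`, the five characters, and three zero bytes of padding. `xdr_int(-2)` produces `ff ff ff fe`.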
RPC
- Specifies the format of communications between the client and the server.
- Sun RPC: synchronous requests only; implemented on UDP/IP.
- Authentication to identify callers: AUTH_NULL, AUTH_UNIX, AUTH_SHORT, AUTH_DES, and AUTH_KERB.
- RPC language compiler: rpcgen.
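The Python sketch below makes "the format of communications" concrete. It builds the XDR-encoded header of a Sun RPC call message as laid out in the ONC RPC specification (RFC 5531): transaction ID (xid), message type, RPC version, program, version, procedure, then credential and verifier. The constants are the standard ones: NFS is program 100003, LOOKUP is procedure 4 in NFS version 2, AUTH_NULL is 0, and AUTH_UNIX is 1. The helper names and example values are hypothetical, and the spec's length limits are not enforced.

```python
import struct

CALL = 0                      # msg_type for a call (a reply is 1)
RPC_VERSION = 2
AUTH_NULL, AUTH_UNIX = 0, 1   # authentication flavors
NFS_PROGRAM, NFS_VERSION = 100003, 2
NFSPROC_LOOKUP = 4


def u32(n: int) -> bytes:
    return struct.pack(">I", n)


def opaque(data: bytes) -> bytes:
    return u32(len(data)) + data + bytes(-len(data) % 4)


def auth_null() -> bytes:
    """opaque_auth with flavor AUTH_NULL and an empty body."""
    return u32(AUTH_NULL) + opaque(b"")


def auth_unix(stamp: int, machine: str, uid: int, gid: int, gids=()) -> bytes:
    """AUTH_UNIX credential: the client simply asserts its UID and GIDs."""
    body = (u32(stamp) + opaque(machine.encode("ascii")) + u32(uid) + u32(gid)
            + u32(len(gids)) + b"".join(u32(g) for g in gids))
    return u32(AUTH_UNIX) + opaque(body)


def rpc_call_header(xid: int, prog: int, vers: int, proc: int,
                    cred: bytes, verf: bytes) -> bytes:
    """Fixed part of a call message; the procedure's arguments follow it."""
    return (u32(xid) + u32(CALL) + u32(RPC_VERSION)
            + u32(prog) + u32(vers) + u32(proc) + cred + verf)


header = rpc_call_header(0x1234, NFS_PROGRAM, NFS_VERSION, NFSPROC_LOOKUP,
                         cred=auth_unix(0, "client1", 1001, 100),
                         verf=auth_null())
```

Nothing in the AUTH_UNIX credential is verified by the server. It is just numbers supplied by the client, which is the root of the loophole described in 10.9.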
10.5 NFS Implementation
Control flow: on the client, each remote file is represented by a vnode whose private data is an rnode.
File Handle
- The server assigns a file handle on lookup, create, or mkdir; subsequent I/O operations use it.
- A file handle is an opaque 32-byte object, built by the server from <file system ID, inode number, generation number>.
- The generation number detects obsolete handles: if the inode has been freed and allocated to another file, its generation number no longer matches the one in the handle.
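The following Python sketch shows how a server might build and check such a handle. The 32-byte size matches NFS version 2. The field layout (three big-endian 32-bit integers followed by zero bytes) and the `generations` table are assumptions, since each server is free to choose its own opaque format.

```python
import struct

FHSIZE = 32  # NFS version 2 file handles are 32 opaque bytes


class StaleFileHandle(Exception):
    """Corresponds to the NFSERR_STALE error a server returns."""


def make_fh(fsid: int, ino: int, gen: int) -> bytes:
    # Hypothetical layout: three big-endian 32-bit fields, zero-filled.
    raw = struct.pack(">III", fsid, ino, gen)
    return raw + bytes(FHSIZE - len(raw))


def resolve_fh(fh: bytes, generations: dict) -> tuple:
    """Map a handle back to (fsid, ino), rejecting handles to reused inodes.

    `generations` stands in for the server's inode table:
    (fsid, ino) -> current generation number of that inode.
    """
    fsid, ino, gen = struct.unpack(">III", fh[:12])
    if generations.get((fsid, ino)) != gen:
        raise StaleFileHandle(f"inode {ino} on fs {fsid} was reused")
    return fsid, ino
```

Suppose inode 42 is freed and reallocated, so its generation in `generations` goes from 7 to 8. A client still holding `make_fh(1, 42, 7)` then gets `StaleFileHandle` instead of silently reading the new file.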
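The mount operation
- Client, nfs_mount(): sends an RPC request with the pathname as its argument.
- Server: the mountd daemon translates the pathname to a file handle, checks that the client is allowed to mount it, and replies success with the file handle.
- Client: initializes the vfs, records the server's name and address, and allocates an rnode and vnode for the mounted directory.
- The server must still check access rights on each request, not just at mount time.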
Pathname Lookup
- Client: initiates lookup during open, create, and stat. Starting from the current or root directory, it proceeds one component at a time, sending a LOOKUP request whenever the directory is an NFS directory.
- Server: file handle -> FS ID -> vfs -> VGET -> vnode -> VOP_LOOKUP -> pointer to the component's vnode; then VOP_GETATTR and VOP_FID -> file handle. Reply message = status + file handle + file attributes.
- Client: gets the reply, allocates an rnode and vnode, copies the information, and proceeds to search for the next component.
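The client side of this loop can be sketched in Python. `FakeServer` is a hypothetical stand-in that answers LOOKUP from a dictionary keyed by (directory handle, name). Real clients also handle mount points, `..`, symbolic links, and cached lookups, all omitted here.

```python
from dataclasses import dataclass


@dataclass
class Rnode:
    """Client-side state for one remote file (simplified)."""
    fh: bytes
    attrs: dict


class FakeServer:
    """Stand-in for the server: (directory handle, name) -> (handle, attrs)."""

    def __init__(self, tree: dict):
        self.tree = tree
        self.rpcs = 0

    def lookup(self, dir_fh: bytes, name: str):
        self.rpcs += 1                     # one LOOKUP RPC per component
        entry = self.tree.get((dir_fh, name))
        if entry is None:
            return "NFSERR_NOENT", None, None
        fh, attrs = entry
        return "NFS_OK", fh, attrs         # status + file handle + attributes


def lookup_path(server: FakeServer, start: Rnode, path: str) -> Rnode:
    """Resolve `path` one component at a time, as the NFS client does."""
    node = start
    for name in (c for c in path.split("/") if c):
        status, fh, attrs = server.lookup(node.fh, name)
        if status != "NFS_OK":
            raise FileNotFoundError(path)
        node = Rnode(fh, attrs)            # allocate rnode, copy the reply
    return node
```

Resolving `/usr/u1` therefore costs two LOOKUP round trips, one per component, which is why per-component lookup is a significant cost in NFS.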
10.6 UNIX Semantics
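NFS is incompatible with UNIX semantics in a few places because its server is stateless.
Open file permission
- UNIX checks permissions once, at open time; NFS checks them on each read and write.
- So that a process can keep using a file after the owner changes its permissions (as in UNIX), the NFS server always allows the owner of a file to read or write it.
- But then what stops an owner from writing to a file that is already write-protected? The client saves the file's attributes, including its permissions, at open time and checks them itself.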
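Deletion of open files
- The server has no idea which files are open.
- When a process deletes a file that is open on its own client, the client renames the file instead and deletes it when the file is closed.
- This does not work when the file is deleted from a different machine: the server removes it, and the client that still has it open gets errors on later I/O.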
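The following hypothetical Python sketch shows the rename-on-delete trick from the point of view of one client. The `.nfsNNNN` temporary name and the class and method names are assumptions for illustration. Like the real technique, it cannot protect a process on another client.

```python
import itertools


class NfsClient:
    """Hypothetical client-side handling of 'remove while open'.

    `server_dir` stands in for the remote directory (name -> contents).
    Files are tracked by name for brevity; a real client tracks the vnode.
    """

    _hidden_ids = itertools.count(1)

    def __init__(self, server_dir: dict):
        self.server_dir = server_dir
        self.opens = {}      # name -> number of opens on this client
        self.hidden = {}     # original name -> temporary name on the server

    def open(self, name: str) -> None:
        if name not in self.server_dir:
            raise FileNotFoundError(name)
        self.opens[name] = self.opens.get(name, 0) + 1

    def remove(self, name: str) -> None:
        if self.opens.get(name):
            tmp = f".nfs{next(self._hidden_ids):04d}"
            self.server_dir[tmp] = self.server_dir.pop(name)   # RENAME request
            self.hidden[name] = tmp
        else:
            del self.server_dir[name]                          # REMOVE request

    def read(self, name: str):
        return self.server_dir[self.hidden.get(name, name)]

    def close(self, name: str) -> None:
        self.opens[name] -= 1
        if self.opens[name] == 0:
            del self.opens[name]
            tmp = self.hidden.pop(name, None)
            if tmp is not None:
                del self.server_dir[tmp]                       # deferred REMOVE
```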
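Reads and Writes
- UNIX locks the vnode at the start of each I/O operation, so reads and writes to a file are atomic.
- An NFS client can lock the vnode only on its own machine.
- Hence NFS offers no protection against overlapping I/O requests from different clients.
- Locking with the NLM (Network Lock Manager) protocol is possible, but it is only advisory.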
10.7 NFS Performance
Bottlenecks
- Writes must be committed to stable storage before the server replies.
- Fetching file attributes requires one RPC call per file.
- Processing retransmitted requests adds to the load on the server.
Client-side caching
- Clients cache both file blocks and file attributes.
- To avoid using stale data, the kernel keeps an expiry time for cached attributes; after 60 seconds it fetches them again and rechecks the file's modification time.
- This reduces, but does not eliminate, the problem.
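Here is a Python sketch of the attribute cache check. `getattr_rpc` is a hypothetical callback standing in for a GETATTR request, and a real kernel keeps this state in the rnode. The 60-second timeout is the value on the slide. The `clock` parameter lets a test substitute a fake clock, which makes the window of staleness easy to see.

```python
import time

ATTR_TIMEOUT = 60.0  # seconds, as on the slide


class CachedFile:
    """Cached attributes and data blocks for one remote file (simplified)."""

    def __init__(self, getattr_rpc, clock=time.monotonic):
        self.getattr_rpc = getattr_rpc   # hypothetical: returns the server's attrs
        self.clock = clock
        self.attrs = None
        self.expires = 0.0
        self.blocks = {}                 # block number -> cached data

    def attributes(self) -> dict:
        now = self.clock()
        if self.attrs is None or now >= self.expires:
            fresh = self.getattr_rpc()
            if self.attrs is not None and fresh["mtime"] != self.attrs["mtime"]:
                self.blocks.clear()      # changed on the server: drop cached blocks
            self.attrs = fresh
            self.expires = now + ATTR_TIMEOUT
        return self.attrs                # may be up to 60 s out of date
```

If another client modifies the file 30 seconds after the attributes were fetched, this client keeps seeing the old modification time (and its old cached blocks) until the 60 seconds are up. The problem is reduced, but not eliminated.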
Deferral of writes
- Client: full blocks are written asynchronously; partial blocks become delayed writes, flushed when the file is closed or every 30 seconds by the biod daemon.
- Server: writes data to an NVRAM buffer, which lets it reply before the data reaches disk, and later flushes the buffer to disk.
- Write-gathering: the server waits briefly, gathers several write requests to the same file, processes them together, and then sends a reply for each.
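Write-gathering can be sketched as a per-file queue on the server. In this Python sketch, `storage`, `commit`, and the explicit `flush` call (standing in for the end of the short waiting period) are assumptions. The point is that one stable-storage update covers the whole batch and no write is acknowledged before it.

```python
from collections import defaultdict


class WriteGatherer:
    """Hypothetical server-side write gathering (names are illustrative)."""

    def __init__(self, storage: dict, commit):
        self.storage = storage               # fh -> bytearray (file contents)
        self.commit = commit                 # flushes one file to stable storage
        self.pending = defaultdict(list)     # fh -> [(offset, data, reply)]

    def write(self, fh, offset: int, data: bytes, reply) -> None:
        # Do not reply yet: just queue the request.
        self.pending[fh].append((offset, data, reply))

    def flush(self, fh) -> None:
        # Called when the gathering window for this file expires.
        batch = self.pending.pop(fh, [])
        buf = self.storage.setdefault(fh, bytearray())
        for offset, data, _ in batch:
            end = offset + len(data)
            if len(buf) < end:
                buf.extend(bytes(end - len(buf)))
            buf[offset:end] = data
        if batch:
            self.commit(fh)                  # one disk update for the whole batch
        for _, _, reply in batch:
            reply("NFS_OK")                  # only now is each write acknowledged
```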
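The retransmissions cache
- Idempotent requests (e.g. read) can safely be executed twice; nonidempotent ones (e.g. remove) cannot.
- Problem: the client sends remove; the server removes the file and replies success, but the reply is lost. The client retransmits remove, the server processes it again and fails, and the client receives an error message even though the file was removed.
- Retransmissions (xid) cache on the server: records requests by xid, procedure number, and client ID, and is checked only when a request fails; a failed request found in the cache is treated as a retransmission.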
New implementation
- Caches all requests, recording the xid, procedure number, client ID, a state field, and a timestamp.
- If a matching request is still in progress, discard the retransmission. If it is done, discard the retransmission if the timestamp shows the reply was sent within the throwaway window (3 to 6 seconds).
- Otherwise, process the request again if it is idempotent. For a nonidempotent request, check whether the file has been modified since: if not, send success; otherwise, retry it.
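The decision rules of the new implementation can be sketched in Python. The key fields, state values, and throwaway window follow the slide, using the low end of 3 seconds. `execute` and `modified_since` are hypothetical callbacks for running the request and for checking whether the target file changed. A real server would also expire old entries.

```python
import time

THROWAWAY_WINDOW = 3.0            # seconds; the slide gives 3 to 6 s
IN_PROGRESS, DONE = "in progress", "done"


class DuplicateRequestCache:
    """Sketch of the 'new implementation' retransmission cache."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.entries = {}         # (xid, proc, client) -> (state, timestamp)

    def handle(self, xid, proc, client, idempotent, execute, modified_since):
        """Return the reply to send, or None to drop the request.

        execute():             performs the request and returns its reply.
        modified_since(stamp): True if the target changed after `stamp`.
        """
        key = (xid, proc, client)
        entry = self.entries.get(key)
        if entry is not None:
            state, stamp = entry
            if state == IN_PROGRESS:
                return None                   # original still running: drop
            if self.clock() - stamp < THROWAWAY_WINDOW:
                return None                   # replied moments ago: drop
            if not idempotent and not modified_since(stamp):
                return "NFS_OK"               # assume the original succeeded
            # Idempotent, or modified since: fall through and retry it.
        self.entries[key] = (IN_PROGRESS, self.clock())
        reply = execute()
        self.entries[key] = (DONE, self.clock())
        return reply
```

Replaying the remove scenario: the first REMOVE returns `NFS_OK`, and a retransmission one second later is dropped. A retransmission ten seconds later, when the file has not changed, gets `NFS_OK` instead of the misleading error.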
10.9 NFS Security
NFS Access Control
- Enforced at mount time and on each request, using an exports list.
- Mount: the server checks the list and denies ineligible clients.
- Request: carries authentication information, typically in AUTH_UNIX form (UID, GID).
- Loophole: an impostor can supply another user's <UID, GID> and access that user's files.
UID Remapping
- A translation map for each client: a client UID may map to a different UID on the server.
- UIDs with no match in the map are mapped to nobody.
- Can be implemented at the RPC level or at the NFS level.
- The map can be merged with the /etc/exports file.
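At its core, UID remapping is a per-client lookup applied to the UID in each request's AUTH_UNIX credential before any permission check. Here is a minimal Python sketch. The host names, map contents, and the numeric value used for nobody are assumptions.

```python
NOBODY = 65534   # assumption: a common UID for "nobody"; the value varies by system

# Hypothetical translation maps, one per client host:
# client UID -> server UID.
UID_MAPS = {
    "hostA": {1001: 5001, 1002: 5002},
    "hostB": {1001: 6001},            # same client UID, different server UID
}


def map_uid(client: str, uid: int, maps: dict = UID_MAPS) -> int:
    """Translate a UID from an AUTH_UNIX credential; unknown UIDs become nobody."""
    return maps.get(client, {}).get(uid, NOBODY)
```

Because the map is chosen by client, UID 1001 on hostA and UID 1001 on hostB can be different server users. A client that is not in the table at all gets only nobody's access.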
Root Remapping
- Map the superuser of a client to nobody, limiting its access to files on the server.
- The UNIX security framework was designed for an isolated, multi-user environment in which users trust each other.
10.10 NFS Version 3
Commit request
- Client: when a process writes, the kernel sends asynchronous write requests.
- Server: saves the data in its local cache and replies immediately.
- Client: holds a copy of the data until the process closes the file, then sends a commit request.
- Server: flushes the data to disk.
File length: from 32 bits (2^32 bytes = 4 GB) to 64 bits (2^64 bytes = 2^34 GB).
READDIRPLUS = LOOKUP + GETATTR: returns names, file handles, and file attributes in one call.
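The commit protocol can be sketched with two toy classes in Python. The write verifier used here is not on the slide. It is the value NFS version 3 returns from WRITE and COMMIT so the client can tell whether the server rebooted, and so lost cached data, in between. Everything else (class names, offsets as keys, a single file) is a simplification.

```python
class Server:
    """Toy server: unstable writes sit in a volatile cache until COMMIT."""

    def __init__(self):
        self.disk = {}        # offset -> data on stable storage
        self.cache = {}       # offset -> data not yet flushed
        self.verifier = 1     # changes whenever the server reboots

    def write_unstable(self, offset, data):
        self.cache[offset] = data
        return self.verifier          # reply immediately

    def commit(self):
        self.disk.update(self.cache)  # flush to disk
        self.cache.clear()
        return self.verifier

    def reboot(self):
        self.cache.clear()            # uncommitted data is lost
        self.verifier += 1


class Client:
    def __init__(self, server):
        self.server = server
        self.uncommitted = {}         # offset -> (data, verifier from WRITE)

    def write(self, offset, data):
        verf = self.server.write_unstable(offset, data)
        self.uncommitted[offset] = (data, verf)   # keep a copy until committed

    def close(self):
        while self.uncommitted:
            verf = self.server.commit()
            for offset, (data, wverf) in list(self.uncommitted.items()):
                if wverf == verf:
                    del self.uncommitted[offset]  # safely on stable storage
                else:
                    self.write(offset, data)      # server rebooted: resend
```

This is why the client must hold its copy of the data until the commit succeeds. If the server reboots after acknowledging a write, only the client still has that data to send again.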
Other DFS
The Andrew File System (10.15 – 10.17)
The DCE Distributed File System (10.18 – 10.18.5)