CODA: A HIGHLY AVAILABLE FILE SYSTEM FOR A DISTRIBUTED WORKSTATION ENVIRONMENT
M. Satyanarayanan, J. J. Kistler, P. Kumar, M. E. Okasaki, E. H. Siegel, D. C. Steere
Carnegie-Mellon University
Paper highlights
• A decentralized distributed file system to be accessed from autonomous workstations
  – Most of these features were already present in AFS (Andrew File System)
• An optimistic mechanism to handle inconsistent updates:
  – Coda does not prevent inconsistencies, it detects them
Introduction (I)
• AFS was a very successful DFS for a campus-sized user community
  – Its designers even wanted to extend it nationwide, but the WWW took over instead
• Key ideas include
  – Close-to-open consistency
  – Callbacks
Introduction (II)
• CODA extends AFS by
  – Providing constant availability through replication
  – Introducing a disconnected mode for portable computers
    • Most lasting contribution
Hardware Model (I)
• CODA and AFS assume that client workstations are personal computers owned by their users
  – Fully autonomous
  – Cannot be trusted
• CODA allows owners of laptops to operate them in disconnected mode
  – Opposite of ubiquitous connectivity
Hardware Model (II)
• Coda later added a weak connectivity mode for portable computers linked to the Coda servers through slow links (such as modems)
  – Allows for slow reintegration
Other Models
• Plan 9 and Amoeba
  – All computing is done by a pool of servers
  – Workstations are just display units
• NFS and XFS
  – Clients are trusted and always connected
• Farsite
  – Untrusted clients double as servers
Design Rationale
• CODA has three fundamental objectives
  – Scalability: to build a system that could grow without major problems
  – Fault-Tolerance: system should remain usable in the presence of server failures, communication failures and voluntary disconnections
  – Unix Emulation
Scalability
• AFS was scalable because
  – Clients cache entire files on their local disks
  – Server uses callbacks to maintain client cache coherence
    • Reduces server’s involvement at open time
• Coda keeps same general philosophy
  – Accessibility and scalability are more important than data consistency
Accessibility
• Must handle two types of failures
  – Server failures:
    • Data servers are replicated
  – Communication failures and voluntary disconnections
    • Coda uses optimistic replication and file hoarding
Optimistic replication (I)
• Pessimistic replication control protocols guarantee the consistency of replicated data in the presence of any non-Byzantine failures
  – Typically require a quorum of replicas to allow access to the replicated data
  – Would not support disconnected mode
Example
• Majority consensus voting:
  – Every update must involve a majority of replicas
  – Every majority contains at least one up-to-date replica
[Figure: three replicas with version numbers 2, 3 and 3]
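The voting rule above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Replica` class, its `up` flag and the function names are hypothetical and do not come from the paper. It shows why any majority of three replicas holding versions 2, 3 and 3 must include a current copy, and why a client that cannot reach a majority is denied access.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    version: int
    up: bool = True  # hypothetical flag: is this replica reachable?


def majority(n: int) -> int:
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def read_latest(replicas: list[Replica]) -> int:
    """Return the highest version among reachable replicas.

    Raises RuntimeError when fewer than a majority can be reached,
    which is what rules out disconnected operation.
    """
    reachable = [r for r in replicas if r.up]
    if len(reachable) < majority(len(replicas)):
        raise RuntimeError("no quorum: access denied")
    return max(r.version for r in reachable)


def write(replicas: list[Replica]) -> int:
    """Install a new version on every reachable replica (needs a majority)."""
    new_version = read_latest(replicas) + 1
    for r in replicas:
        if r.up:
            r.version = new_version
    return new_version
```

Because two majorities of the same replica set always overlap, the replica that took part in the last write is visible to every later quorum.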
Optimistic replication (II)
• Optimistic replication control protocols allow access in disconnected mode
  – Tolerate temporary inconsistencies
  – Promise to detect them later
  – Provide much higher data availability
UNIX sharing semantics
• Centralized UNIX file systems (and Sprite) provide one-copy semantics
  – Every modification to every byte of a file is immediately and permanently visible to all clients
• AFS uses a laxer model (close-to-open consistency)
• Coda uses an even laxer model
AFS-1 semantics
• First version of AFS
  – Revalidated cached file on each open
  – Propagated modified files when they were closed
• If two users on two different workstations modify the same file at the same time, the user closing the file last will overwrite the changes made by the other user
Open-to-Close Semantics
• Example:
[Figure: timeline of file F, with version F’ from the first client and version F” from the second client; F” overwrites F’]
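The last-closer-wins behaviour can be reproduced with a toy model. The sketch below is a simplification, not AFS code: `Afs1Server` and `Afs1Client` are hypothetical names, a file is a single string, and the whole file is fetched at open and written back at close.

```python
class Afs1Server:
    """Holds the single server copy of one file."""

    def __init__(self, contents: str):
        self.contents = contents


class Afs1Client:
    """Whole-file caching client: fetch at open, write back at close."""

    def __init__(self, server: Afs1Server):
        self.server = server
        self.local = None

    def open(self) -> None:
        # Revalidate (here: simply refetch) the cached copy on each open.
        self.local = self.server.contents

    def write(self, text: str) -> None:
        # Updates stay local until the file is closed.
        self.local = text

    def close(self) -> None:
        # Propagate the modified file; no merge with concurrent updates.
        self.server.contents = self.local


server = Afs1Server("F")
first, second = Afs1Client(server), Afs1Client(server)
first.open()
second.open()
first.write("F'")
second.write("F''")
first.close()    # server now holds F'
second.close()   # server now holds F'': the first client's update is lost
```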
AFS-2 semantics (I)
• AFS-1 required each client to call the server every time it opened an AFS file
  – Most of these calls were unnecessary as user files are rarely shared
• AFS-2 introduces the callback mechanism
  – Do not call the server, it will call you!
AFS-2 semantics (II)
• When a client opens an AFS file for the first time, server promises to notify it whenever it receives a new version of the file from any other client
  – Promise is called a callback
• Relieves the server from having to answer a call from the client every time the file is opened
  – Significant reduction of server workload
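A rough sketch of the callback idea follows. All names (`CallbackServer`, `CallbackClient`, `break_callback`, the `fetches` counter) are hypothetical, and the model ignores failures, lost callbacks and access control. It only illustrates that repeated opens of an unchanged file cost the server nothing, while a store by another client invalidates the cached copy.

```python
class CallbackServer:
    def __init__(self):
        self.files = {}      # name -> (version, data)
        self.callbacks = {}  # name -> set of clients holding a callback promise
        self.fetches = 0     # number of fetch calls the server had to answer

    def fetch(self, client, name):
        """Return the file and promise to notify the client of new versions."""
        self.fetches += 1
        self.callbacks.setdefault(name, set()).add(client)
        return self.files[name]

    def store(self, writer, name, data):
        """Install a new version and break the callbacks of other clients."""
        version = self.files.get(name, (0, None))[0] + 1
        self.files[name] = (version, data)
        for client in self.callbacks.pop(name, set()):
            if client is not writer:
                client.break_callback(name)
        # The writer's copy is current, so it keeps a callback promise.
        self.callbacks[name] = {writer}
        return self.files[name]


class CallbackClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}  # name -> (version, data)

    def open(self, name):
        # Contact the server only when no valid cached copy exists.
        if name not in self.cache:
            self.cache[name] = self.server.fetch(self, name)
        return self.cache[name][1]

    def close(self, name, data):
        self.cache[name] = self.server.store(self, name, data)

    def break_callback(self, name):
        # Called by the server: the cached copy is no longer valid.
        self.cache.pop(name, None)
```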
AFS-2 semantics (III)
• Callbacks can be lost!
  – Client will call the server every tau minutes to check whether it received all callbacks it should have received
  – Cached copy is only guaranteed to reflect the state of the server copy up to tau minutes before the time the client opened the file for the last time
Coda semantics (I)
• Client keeps track of the subset s of servers it was able to connect to the last time it tried
• Updates s at least every tau seconds
• At open time, client checks it has the most recent copy of file among all servers in s
  – Guarantee weakened by use of callbacks
  – Cached copy can be up to tau minutes behind the server copy
Server Replication
• Uses a read-one, write-all approach
• Each client has a preferred server
  – Holds all callbacks for client
  – Answers all read requests from client
• Client still checks with other servers to find which one has the latest version of a file
• Servers probe each other once every few seconds to detect server failures
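As a rough illustration of read-one, write-all with a preferred server, consider the sketch below. It rests on several assumptions that are not in the slides: each reachable server is modelled as a dict mapping a file name to a `(version, data)` pair, every reachable server holds the file being read, and the client falls back to another server when its preferred server turns out to be stale.

```python
def latest_version(name, servers):
    """Highest version of the file among the reachable servers."""
    return max(table[name][0] for table in servers.values())


def read(name, servers, preferred):
    """Read from the preferred server unless it holds a stale copy."""
    newest = latest_version(name, servers)
    if servers[preferred][name][0] == newest:
        return servers[preferred][name][1]
    # Assumption: fall back to any server holding the latest version.
    for table in servers.values():
        if table[name][0] == newest:
            return table[name][1]


def write(name, data, servers):
    """Write-all: install the new version on every reachable server."""
    newest = max(table.get(name, (0, None))[0] for table in servers.values())
    for table in servers.values():
        table[name] = (newest + 1, data)
    return newest + 1
```

Reads normally touch the data of one server only, while the version check across servers is what lets the client notice a stale preferred server.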
Disconnected mode
• Client caches are managed using an LRU policy
• Coda allows user to specify which files should always remain cached on her workstation and to assign priorities to these files
• When workstation gets reconnected, Coda initiates a reintegration process
  – Found later that it required a fast link between workstation and servers
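One way to picture an LRU cache that honours user-specified hoarded files is sketched below. This is not Coda's actual cache manager: the `HoardingCache` class, its integer priorities and the eviction rule (lowest hoard priority first, then least recently used) are assumptions made for illustration, and capacity is counted in files rather than bytes.

```python
import itertools


class HoardingCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.hoard_priority = {}  # name -> user-assigned priority (> 0)
        self.last_used = {}       # cached name -> logical time of last access
        self._clock = itertools.count()

    def hoard(self, name: str, priority: int) -> None:
        """Mark a file as one the user wants to keep cached."""
        self.hoard_priority[name] = priority

    def access(self, name: str) -> None:
        """Bring a file into the cache (evicting if full) and touch it."""
        if name not in self.last_used and len(self.last_used) >= self.capacity:
            self._evict()
        self.last_used[name] = next(self._clock)

    def _evict(self) -> None:
        # Non-hoarded files have priority 0, so they go first, oldest first.
        victim = min(
            self.last_used,
            key=lambda n: (self.hoard_priority.get(n, 0), self.last_used[n]),
        )
        del self.last_used[victim]

    def cached(self) -> set:
        return set(self.last_used)


cache = HoardingCache(capacity=2)
cache.hoard("thesis.tex", priority=10)
cache.access("thesis.tex")
cache.access("a.txt")
cache.access("b.txt")  # evicts a.txt; the hoarded thesis.tex stays cached
```

In this simplified rule a hoarded file can still be evicted when every cached file is hoarded, in which case the lowest-priority one is dropped.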
Conflict resolution
• Coda provides automatic resolution of simple directory update conflicts
• Other conflicts are to be resolved manually by the user
Coda semantics (II)
• Conflicts between cached copy and most recent copy in s can happen
  – Coda guarantees they will always be detected
• At close time, client propagates the new version of the file to all servers in s
  – If s is empty, client marks the file for later propagation
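The close-time rule can be sketched as follows. The `DisconnectableClient` class and its `reintegrate` method are hypothetical, each server in s is modelled as a plain dict, and conflict detection is deliberately left out, so this shows only the propagate-or-defer decision.

```python
class DisconnectableClient:
    def __init__(self):
        self.pending = []  # (name, data) pairs marked for later propagation

    def close(self, name, data, s):
        """Propagate a closed file to every server in s, or defer it.

        s maps a server name to that server's file table (a dict here).
        """
        if not s:
            # Disconnected: remember the update for later propagation.
            self.pending.append((name, data))
            return "deferred"
        for table in s.values():
            table[name] = data
        return "propagated"

    def reintegrate(self, s):
        """Replay deferred updates, in order, once servers are reachable."""
        if not s:
            return 0
        replayed = len(self.pending)
        for name, data in self.pending:
            for table in s.values():
                table[name] = data
        self.pending.clear()
        return replayed
```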
Replica management (I)
• Emphasis here is on conflict detection
  – Coda does not prevent inconsistent updates but guarantees that inconsistent updates will always be detected
• Each client update is uniquely identified by a storeid
Replica management (II)
• Coda maintains for each replica
  – Its last store ID (LSID)
  – The length v of its update history (in other words, a version number)
Replica management (III)
• Each replication site also maintains its estimates of the version numbers vA, vB, vC of the replicas of the file held by other sites (current version vector or CVV)
• These estimates are always conservative
Example
• Three copies
  – A: LSID = 33345, v = 4, CVV = (4 4 3)
  – B: LSID = 33345, v = 4, CVV = (4 4 3)
  – C: LSID = 2235, v = 3, CVV = (3 3 3)
Replica management (III)
• Coda compares the states of replicas by comparing their LSIDs and CVVs
• Four outcomes are possible
1. Strong equality: same LSIDs and same CVVs
  – Everything is fine
Replica management (IV)
2. Weak equality: same LSIDs and different CVVs
  – Happens when one site was never notified that the other was updated
  – Must fix CVVs
Replica management (V)
3. Dominance/Submission: LSIDs are different and every element of the CVV of a replica is greater than or equal to the corresponding element of the CVV of the other replica
  Example: two replicas A and B
  CVVA = (4 3)   A dominates B
  CVVB = (3 3)   B is dominated by A
  A has the most recent version of the file
Replica management (VI)
4. Inconsistency: LSIDs are different and some elements of the CVV of a replica are greater than the corresponding elements of the CVV of the other replica but others are smaller
  Example: two replicas A and B
  CVVA = (4 2)
  CVVB = (2 3)
  A and B are inconsistent
  Must fix inconsistency before allowing access to the file
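The four outcomes can be condensed into one comparison function. The sketch below follows the definitions given on the slides; the function name, the string results and the tuple representation of a CVV are illustrative choices. One case is not covered by the slides: different LSIDs with identical CVVs. This sketch conservatively reports it as an inconsistency, which is an assumption.

```python
def compare_replicas(lsid_a, cvv_a, lsid_b, cvv_b):
    """Classify the relationship between two replicas A and B.

    cvv_a and cvv_b are equal-length tuples of version numbers.
    """
    if lsid_a == lsid_b:
        # Same last update: only the version estimates may differ.
        return "strong equality" if cvv_a == cvv_b else "weak equality"

    a_ge_b = all(x >= y for x, y in zip(cvv_a, cvv_b))
    b_ge_a = all(y >= x for x, y in zip(cvv_a, cvv_b))
    if a_ge_b and not b_ge_a:
        return "A dominates B"
    if b_ge_a and not a_ge_b:
        return "B dominates A"
    # Neither vector dominates, or (assumption) the vectors are equal
    # although the LSIDs differ.
    return "inconsistency"


# Examples taken from the slides.
print(compare_replicas(33345, (4, 4, 3), 33345, (4, 4, 3)))  # strong equality
print(compare_replicas(33345, (4, 4, 3), 2235, (3, 3, 3)))   # A dominates B
print(compare_replicas(1, (4, 2), 2, (2, 3)))                # inconsistency
```

The LSID values 1 and 2 in the last call are placeholders, since the slides give no LSIDs for the two-replica examples.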
Performance
• Coda is slightly slower than AFS
  – Server replication degrades performance by a few percent
  – Reliance on Camelot transaction facility also impacts performance
  – Coda is not as well tuned as AFS
• Using multicast helps reduce this gap
Andrew Benchmark
CONCLUSION
• Coda project had an unusually long life
  – First work to introduce disconnected mode
• Coda/AFS introduced many new ideas
  – Close-to-open consistency
  – Optimistic concurrency control
  – Callbacks (partially superseded by leases)