FARSITE: Federated, Available, and Reliable Storage for an Incompletely Trusted Environment
Introduction
  Farsite: serverless distributed file system
    Logically functions as a centralized file server
  Designed for desktop environments
    Needs some effort for initial configuration
    Little central administration to maintain
Farsite Characteristics
  Peer-to-peer among untrusted machines
  Need to handle privacy, integrity, durability
    Cryptography
    Randomized replication
    Byzantine fault-tolerance
Farsite Workloads
  High access locality
  Low update rate
  Sequential accesses with rare concurrency
Administration
  Machine certificates bind machines to their public keys
  User certificates bind users to their public keys
  Namespace certificates bind namespace roots to their managing machines
Design Assumptions
  Designed for ~10^5 machines
  All interconnected by a high-bandwidth, low-latency network
  Majority of machines up most of the time
  Uncorrelated permanent machine failures
  Read-mostly sharing
  Few malicious users
Enabling Technology Trends
  Increase in unused disk capacity
    In 2000, 58% of disk capacity unused at Microsoft
    Can replicate data for reliability
  Decrease in computational cost
    Can easily encrypt at 53 MB/sec
    Disk transfers at 32 MB/sec
    Can use strong cryptography for security
Namespace Roots
  Allow multiple roots for multiple machines
Trust and Certification
  Based on public-key-cryptographic certificates
    Encrypt(Key_public, text_plain) → text_cipher
    Decrypt(Key_private, text_cipher) → text_plain
    Encrypt(Key_private, text_plain) → text_cipher
    Decrypt(Key_public, text_cipher) → text_plain
Public Key Encryption Basics
  Idea
    Public key is published
    Private key is the secret
  Encrypt(Key_my_public, "Hi, Andy")
    Anyone can create it, but only I can read it
  Encrypt(Key_my_private, "I'm Andy")
    Everyone can read it, but only I can create it
Public Key Encryption Basics
  Encrypt(Key_your_public, Encrypt(Key_my_private, "I know your secret"))
    Only you can read it, and only I can send it
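The nested Encrypt call can be traced with a toy example. This is a minimal sketch using textbook RSA with tiny primes. It is not secure and not how Farsite implements its cryptography. The helper names make_keypair and apply_key and the numeric message are assumptions made for illustration.

```python
# Toy textbook RSA with tiny primes: illustration only, NOT secure.
# make_keypair and apply_key are hypothetical helper names.

def make_keypair(p, q, e=17):
    """Return ((e, n), (d, n)) for two distinct primes p and q."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse, available since Python 3.8
    return (e, n), (d, n)

def apply_key(key, value):
    """Plays the role of Encrypt/Decrypt on the slides: modular exponentiation."""
    exponent, modulus = key
    return pow(value, exponent, modulus)

my_public, my_private = make_keypair(61, 53)      # modulus 3233
your_public, your_private = make_keypair(89, 97)  # modulus 8633

message = 1234  # a message encoded as an integer below both moduli

# "Only I can send it": apply my private key first.
signed = apply_key(my_private, message)
# "Only you can read it": wrap the result with your public key.
sealed = apply_key(your_public, signed)

# The receiver peels the layers off in reverse order.
unsealed = apply_key(your_private, sealed)
recovered = apply_key(my_public, unsealed)
print(recovered == message)
```

The inner result must be smaller than the outer modulus, which is why the sender's modulus is the smaller one in this sketch.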
Basic System
  Every machine has three roles
    Client
      • A machine that interacts with a user
    Directory group
      • A set of machines that manage files via a Byzantine-fault-tolerant protocol
      • Every group member owns a replica
    File host
More on the Basic System
  + Reliability
  + Data integrity
  - Performance
    Byzantine fault-tolerant protocols tolerate only fewer than 1/3 faulty replicas
    Need lots of replicas
  - Privacy
  - Storage consumption
System Enhancements
  Local caching
    A client can lease a copy of a file
  Encrypt written files with public keys of all authorized clients
    Offload those files to file hosts
    Store only the content hash of those files locally
      Can validate damaged copies
      Can tolerate n - 1 file host failures
Traditional Byzantine Approach [CL99]
  [Figure: a client accesses files and metadata through a Byzantine fault-tolerant protocol run by Byzantine servers]
  3f + 1 file copies to handle f failures
Farsite: BFT only for metadata
  [Figure: a client, a directory group running the Byzantine fault-tolerant protocol, and file hosts]
  f + 1 file copies for f failures
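The replica counts on the two diagrams can be compared directly. This is a small arithmetic sketch, and the function names are hypothetical. The f + 1 figure relies on the content hash kept with the metadata: any single intact copy can be validated against it, so one surviving copy is enough.

```python
def bft_file_copies(f):
    """Copies needed when file data itself goes through the BFT protocol."""
    return 3 * f + 1

def farsite_file_copies(f):
    """Copies needed when only metadata is under BFT and data is hash-checked."""
    return f + 1

for f in (1, 2, 3):
    print(f, bft_file_copies(f), farsite_file_copies(f))
```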
Semantic Differences from NTFS
  Hard limit on concurrent writes
  Soft limit on concurrent reads
  Sometimes supplies stale snapshots
  No name-locking on an open file's path
File System Features
  Reliability
  Availability
  Security
  Durability
  Consistency
  Scalability
  Efficiency
  Manageability
Reliability and Availability
  Replication
    When a machine is unavailable for an extended period
    Its functions migrate to others
  Caching
Privacy
  File content and metadata are encrypted
  Convergent encryption
    Encrypt(Hash_one_way(block_plain), block_plain) → block_cipher
  [Figure: data blocks are hashed, and each hash is the key used to encrypt its block]
More on Convergent Encryption
  Block hashes are used to identify identical block contents
  Block-level encryption allows block-level changes without re-encrypting the entire file
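The idea that identical blocks encrypt to identical ciphertext can be shown in a few lines. This is a minimal sketch, assuming a toy XOR keystream built from SHA-256 as a stand-in for a real block cipher. The function names and the 16-byte block size are illustrative and do not come from Farsite.

```python
import hashlib

BLOCK_SIZE = 16  # tiny block size chosen only to keep the example readable

def keystream(key, length):
    """Toy keystream (SHA-256 in counter mode); a stand-in for a real cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def convergent_encrypt(block):
    """Encrypt(Hash(block), block): the key is derived from the block itself."""
    key = hashlib.sha256(block).digest()
    cipher = bytes(a ^ b for a, b in zip(block, keystream(key, len(block))))
    return key, cipher

def convergent_decrypt(key, cipher):
    """XOR with the same keystream recovers the plaintext block."""
    return bytes(a ^ b for a, b in zip(cipher, keystream(key, len(cipher))))

def split_blocks(data):
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

# Two files from different users that share their first block.
file_a = b"shared header..." + b"alice's content!"
file_b = b"shared header..." + b"bob's own data!!"

cipher_a = [convergent_encrypt(b) for b in split_blocks(file_a)]
cipher_b = [convergent_encrypt(b) for b in split_blocks(file_b)]

# Identical plaintext blocks give identical ciphertext, so a storage host
# can spot the duplicate without being able to read either file.
print(cipher_a[0][1] == cipher_b[0][1])
```

Because each block has its own key, changing one block means re-encrypting only that block.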
More on Convergent Encryption
  Encrypt(Key_file, file_hashes_plain) → file_hashes_cipher
  [Figure: the block hashes are encrypted with the file key]
More on Convergent Encryption
  Encrypt(Key_client1_public, Key_file) → Key_file_cipher1
  Encrypt(Key_client2_public, Key_file) → Key_file_cipher2
  …
  Store both encrypted file and keys
Directories
  Also encrypted
  Use exclusive encryption
    Prevents a malicious client from encrypting a syntactically illegal name
Integrity
  Use hash trees to compare files
    If the root matches, two files are identical
    If not, compare the hashes at the lower level
    Until the discrepancy is identified
  The cost of in-place updates is logarithmic in the file size
  Logarithmic time to verify the integrity of an individual block
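The root-first comparison can be sketched with a small Merkle tree. This is an illustrative sketch, assuming SHA-256, binary fan-out, and files with the same number of blocks. The names build_tree and first_difference are hypothetical, and Farsite's actual tree layout may differ.

```python
import hashlib

def h(data):
    return hashlib.sha256(data).digest()

def build_tree(blocks):
    """Return the tree as a list of levels: leaf hashes first, root level last."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # pair an odd last node with itself
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def first_difference(tree_a, tree_b):
    """Walk down from the root to the index of a differing block, or None."""
    if tree_a[-1] == tree_b[-1]:
        return None  # roots match, so the files are identical
    index = 0
    for depth in range(len(tree_a) - 2, -1, -1):
        left = 2 * index
        if tree_a[depth][left] != tree_b[depth][left]:
            index = left
        else:
            index = left + 1  # parent differs and left matches, so right differs
    return index

original = [b"a", b"b", b"c", b"d", b"e"]
damaged = [b"a", b"b", b"c", b"X", b"e"]
print(first_difference(build_tree(original), build_tree(damaged)))
```

Each step down the tree halves the range that can contain the damage, which is where the logarithmic cost comes from.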
Durability
  Updates are logged and compressed locally
  The log is pushed back to the directory group periodically and when a lease is recalled
  Each log entry is verified
Consistency
  Control can be loaned to clients
    Content leases
    Name leases
    Mode leases
    Access leases
Data Consistency
  Content leases
    Read/write
    Read-only
      • Assures no stale data
  Single-writer, multiple-reader semantics
  A lease is kept until it expires or is recalled
  Can lease a file, a directory, or a tree
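Single-writer, multiple-reader semantics can be modeled with a small lease table. This is a toy sketch for one file. The class name ContentLeases and its methods are hypothetical, lease expiry is not modeled, and a real recall would first collect the holder's logged updates.

```python
class ContentLeases:
    """Toy lease table for a single file: one writer or many readers."""

    def __init__(self):
        self.writer = None
        self.readers = set()

    def recall(self, client):
        # A real system would wait for the holder to push back its
        # logged updates; this sketch simply drops the lease.
        self.readers.discard(client)
        if self.writer == client:
            self.writer = None

    def acquire_read(self, client):
        if self.writer == client:
            return  # a read/write lease already covers reading
        if self.writer is not None:
            self.recall(self.writer)
        self.readers.add(client)

    def acquire_write(self, client):
        for holder in list(self.readers):
            self.recall(holder)
        if self.writer is not None and self.writer != client:
            self.recall(self.writer)
        self.writer = client

leases = ContentLeases()
leases.acquire_read("alice")
leases.acquire_read("bob")     # many readers may coexist
leases.acquire_write("carol")  # all read leases are recalled first
print(leases.writer, leases.readers)
```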
Namespace Consistency
  Name leases
    Can create a file name
    Can create a directory and its files and subdirectories
Windows File-Sharing Semantics
  Mode leases
    Read, write, delete, exclude-read, exclude-write, exclude-delete
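One plausible reading of how the six mode leases interact is sketched below. The conflict rule is an assumption modeled on Windows share modes: a mode clashes with the matching exclude lease held by another client. The slides only list the lease types, and the function name conflicts is hypothetical.

```python
MODES = ("read", "write", "delete")

def conflicts(requested, held):
    """True if the requested mode leases clash with leases another client holds.

    Assumed rule: mode X clashes with exclude-X, in either direction.
    """
    for mode in MODES:
        exclude = "exclude-" + mode
        if mode in requested and exclude in held:
            return True
        if exclude in requested and mode in held:
            return True
    return False

# Another client already has the file open for reading and forbids writers.
held = {"read", "exclude-write"}
print(conflicts({"write"}, held))  # a second writer is refused
print(conflicts({"read"}, held))   # a second reader is allowed
```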
Windows Deletion Semantics
  Open it, mark it for deletion, close it
  A file is not deleted until the last file close
  Access leases
    Public: lease holder has the file open
    Protected
      • No other client will be granted access without first contacting the lease holder
    Private
      • No other client has any access lease on the file
Scalability
  Hint-based pathname translation
  Caching
  Delayed directory-change notification
Space Efficiency
  Reclaim space from duplicate files
    Workgroup-shared documents
    Multiple copies of common applications
  Can save 50% of storage requirement
  Based on hash comparisons
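Duplicate detection by hash comparison can be sketched as grouping files by content digest. This is an illustrative sketch, assuming SHA-256 and an in-memory mapping of names to contents. The function name and the sample file names are made up, and a real system would compare hashes of the stored (encrypted) data instead.

```python
import hashlib
from collections import defaultdict

def find_duplicates(files):
    """Group file names by content hash and return groups with more than one name."""
    groups = defaultdict(list)
    for name, content in files.items():
        groups[hashlib.sha256(content).hexdigest()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]

files = {
    "alice/report.doc": b"quarterly numbers",
    "bob/report.doc": b"quarterly numbers",  # same content, different owner
    "carol/notes.txt": b"meeting notes",
}
print(find_duplicates(files))
```

Only one copy of each group needs to be stored in full, which is where the space savings on the slide come from.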
Time Efficiency
  Insert a delay between a file's creation and its replication
    Expect many files to be deleted shortly after their creation
    Reduced network traffic
Local-Machine Administration
  Machine replacement
    A special case of hardware failure
  Little need for backup
Performance Measurements
  Used only five machines…
  With only 1 hour of file-system trace
    450,164 file operations
  2 to 4 times as long as NTFS for reads/writes/closes
  9 times as long for opens
  20 times as long for metadata accesses
  5.5 times slower I/O latencies