Coda Server Internals
Peter J Braam

Contents
- Data structure overview
- Volumes
- Vnodes
- Inodes

Data Structure Overview

| Object | Purpose | Resides where |
|---|---|---|
| Inodes | File contents | /vicep* partitions |
| Volumes, Vnodes, Directory contents, ACL, Reslogs | Meta data & dir contents | RVM |
| Volinfo records | Volume location | VLDB, VRDB: RW db files |
| VSGDB, Pdb records, Tokens | Security | VSGDB, .pdb, .tk files: dynamic RO db files |
| Servers/SCM, Partitions, Startup flags, Skipvolumes, LOG & DATA & DB locators | Configuration data | Static data |

RVM layout (coda_globals.h)
- Already_initialized (int)
- struct VolHead[MAXVOLS]
- struct VnodeDiskObject *SmallVnodeFreeLists[SM_FREESIZE]
- short SmallVnodeIndex
- ... same for large ...
- MaxVolId (unsigned long)
- Remainder is dynamically allocated
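
The layout above can be pictured as one C struct sitting at the start of the recoverable segment. This is only a sketch: the struct name `recoverable_globals`, the placeholder bodies of `VolHead` and `VnodeDiskObject`, the field name `VolumeList`, the `Large*` field names, and all three array sizes are assumptions made for illustration, not the declarations in coda_globals.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes; the real constants live in the Coda headers. */
#define MAXVOLS     1024
#define SM_FREESIZE 512
#define LG_FREESIZE 128

struct VolHead { int placeholder; };          /* stand-in for the real type */
struct VnodeDiskObject { int placeholder; };  /* stand-in for the real type */

/* Fixed-size part of the recoverable segment, in the order of the slide. */
struct recoverable_globals {
    int already_initialized;
    struct VolHead VolumeList[MAXVOLS];
    struct VnodeDiskObject *SmallVnodeFreeLists[SM_FREESIZE];
    short SmallVnodeIndex;
    struct VnodeDiskObject *LargeVnodeFreeLists[LG_FREESIZE];
    short LargeVnodeIndex;
    unsigned long MaxVolId;
    /* everything else is allocated dynamically from the recoverable heap */
};

/* Returns 1 if the segment still needs its first-time initialization. */
static int needs_init(const struct recoverable_globals *g)
{
    assert(g != NULL);
    return g->already_initialized == 0;
}
```

A freshly zeroed segment reports `needs_init() == 1`; after initialization the flag is set once and the check returns 0 on every later startup.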

Volume zoo (volume.h, camprivate.h)
- RVM structures: VolumeData, VolHead, VolumeHeader, VolumeDiskData
- VM structures: Volume, VolumeInfo, …

A volume in RVM
- VolHead contains a VolumeHeader and a VolumeData
- VolumeHeader: stamp, id, parentid, type
- VolumeData: *volumeDiskData, *smallVnodeLists, nsmallVnodes, nsmallLists (same for big)
- Pointer fields (*) point to rvm malloced data

VolumeDiskData (rvm)
Lots of stuff:
- Identity & location: partition, name
- Runtime info: use, inService, blessed, salvaged
- Vnode related: next uniquefier
- Version vector
- Resolution flags, pointer to recov_vol_log
- Quota
- Resource usage: filecount, diskused, etc.

Volumes in VM
- struct Volume instances sit in VolHash with copies of RVM data structures
- Salvage before "attaching" to VolHash
- Model of operation (FS):
  - GetVolume: copy out from RVM
  - Do your mods in VM
  - PutVolume: does RVM transaction
- Model of operation (Volutil): operate on RVM
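
The FS model of operation (copy out, modify in VM, write back in one transaction) can be sketched as follows. Everything here is hypothetical: `vol_disk_data` is a much reduced stand-in for the recoverable record, a static variable plays the role of RVM, and the transaction boundaries are only comments.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, much reduced stand-in for the recoverable volume data. */
struct vol_disk_data {
    unsigned long id;
    int in_service;
    long disk_used;
};

/* Stand-in for recoverable memory; the real server keeps this in RVM. */
static struct vol_disk_data rvm_copy = { 7, 1, 100 };
static int transactions_committed = 0;

/* GetVolume step: copy the recoverable record out into VM. */
static void get_volume(struct vol_disk_data *vm)
{
    assert(vm != NULL);
    memcpy(vm, &rvm_copy, sizeof *vm);
}

/* PutVolume step: write the VM copy back inside one "transaction". */
static void put_volume(const struct vol_disk_data *vm)
{
    assert(vm != NULL);
    /* begin transaction (stubbed out in this sketch) */
    memcpy(&rvm_copy, vm, sizeof rvm_copy);
    /* end transaction (stubbed out in this sketch) */
    transactions_committed++;
}
```

Changes made between `get_volume` and `put_volume` touch only the VM copy, so an operation that fails halfway leaves the recoverable record untouched.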

Volumes in Venus RPCs
- One RPC: GetVolInfo, used for mount point traversal
- Only relates to:
  - volume location database
  - volume replication database
  - VSGDB
- Could sit in a separate Volume Location Server

Vnodes (cvnode.h)
- Small & large:
  - large for directories
  - difference is ACL at back of large vnodes
- Inode field:
  - small vnodes: the disk file's inode number
  - large vnodes: the RVM address of the dir inode
- Contain important small structure: vv_t
- Pointers to reslog entries
- VM: cvnodes with hash table, freelists etc.

Vnodes in RVM
- RVM: VnodeDiskinfo (rvm_malloced)
- vnodes sit on rec_smolists
  - each link points to a DiskVnode
  - lists link vnodes with identical vnode numbers but different uniquefiers
- new vnodes grabbed from FreeLists (index.cc, recov{a,b,c}.cc)
- volumes have arrays of rec_smolists, which grow when they are full
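
A minimal sketch of the lookup this layout implies: an array of list heads, each list holding the vnodes that share a vnode number and differ in uniquefier. The type and function names are hypothetical, ordinary `realloc` stands in for recoverable allocation, and the vnode number is used directly as the array index, which is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct disk_vnode {
    unsigned long vnode;       /* vnode number */
    unsigned long unique;      /* uniquefier */
    struct disk_vnode *next;   /* next vnode sharing this slot */
};

struct vnode_lists {
    struct disk_vnode **slots; /* array of list heads */
    size_t nslots;
};

/* Grow the array of lists; new slots start out empty. Returns 0 on success. */
static int grow_lists(struct vnode_lists *v, size_t new_n)
{
    assert(v != NULL && new_n > v->nslots);
    struct disk_vnode **p = realloc(v->slots, new_n * sizeof *p);
    if (p == NULL)
        return -1;
    for (size_t i = v->nslots; i < new_n; i++)
        p[i] = NULL;
    v->slots = p;
    v->nslots = new_n;
    return 0;
}

/* Link a vnode onto the list for its vnode number, growing if needed. */
static int insert_vnode(struct vnode_lists *v, struct disk_vnode *dv)
{
    assert(v != NULL && dv != NULL);
    if (dv->vnode >= v->nslots && grow_lists(v, (size_t)dv->vnode + 1) != 0)
        return -1;
    dv->next = v->slots[dv->vnode];
    v->slots[dv->vnode] = dv;
    return 0;
}

/* Index by vnode number, then walk the list looking for the uniquefier. */
static struct disk_vnode *find_vnode(const struct vnode_lists *v,
                                     unsigned long vnode, unsigned long unique)
{
    assert(v != NULL);
    if (vnode >= v->nslots)
        return NULL;
    for (struct disk_vnode *p = v->slots[vnode]; p != NULL; p = p->next)
        if (p->unique == unique)
            return p;
    return NULL;
}
```

The lookup goes from the array, to a list head, to the vnode itself, so each step is a separate pointer dereference.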

Vnodes in action
- Model:
  - GetFSObj calls GetVnode
  - work is done
  - PutFS Objects calls:
    - rvm_begin_transaction
    - ReplaceVnode: copies data from VM to RVM
    - rvm_end_transaction
- Getting a vnode takes 3 pointer derefs, possibly 3 page faults, vs. 1 for local file systems.
- Is this necessary? Probably not. Cure it: yes!

Directories (rvm)
- DirInode:
  - page table and "copy on write" refcount
- DirPages:
  - 2048 bytes each
  - build up the directory
  - divided into 64 32-byte blobs
  - hash table for fast name lookups
  - blob freelist
  - array of free blobs per page
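
The page arithmetic on this slide (2048-byte pages, 64 blobs of 32 bytes) can be checked with a small sketch. The hash function, the bucket count `NHASH`, and the `header_bytes` parameter are hypothetical, and the sketch assumes an entry may span several consecutive blobs when its name is long.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE_BYTES 2048
#define BLOB_SIZE       32
#define BLOBS_PER_PAGE  (PAGE_SIZE_BYTES / BLOB_SIZE)   /* 64 */
#define NHASH           128   /* hypothetical number of hash buckets */

struct dir_blob { unsigned char bytes[BLOB_SIZE]; };
struct dir_page { struct dir_blob blob[BLOBS_PER_PAGE]; };

/* Hypothetical name hash; the real directory code uses its own function. */
static unsigned hash_name(const char *name)
{
    assert(name != NULL);
    unsigned h = 0;
    while (*name != '\0')
        h = h * 31u + (unsigned char)*name++;
    return h % NHASH;
}

/* Number of 32-byte blobs needed for a fixed entry header plus the name. */
static int blobs_needed(const char *name, size_t header_bytes)
{
    assert(name != NULL);
    size_t total = header_bytes + strlen(name) + 1;   /* +1 for the NUL */
    return (int)((total + BLOB_SIZE - 1) / BLOB_SIZE);
}
```

With a 12-byte header, a one-character name fits in a single blob, while a 36-character name needs 49 bytes and therefore two blobs.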

Directories
- More than one vnode can point to a directory (copy on write)
- VM: hash table of DirHandles, which
  - point to a contiguous VM copy of the dir
  - point to the DirInode
  - have a lock etc.
- Model: as for volumes & vnodes
- Critique: too baroque
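
Copy on write with a refcount, as described for DirInodes, can be sketched like this. The struct layout, `MAXPAGES`, and the function names are hypothetical, and plain `malloc` stands in for recoverable allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAXPAGES 4   /* hypothetical; kept tiny for the example */

struct dir_inode {
    int refcount;            /* how many vnodes share this directory */
    char *pages[MAXPAGES];   /* page table; NULL means unused */
};

static void dir_free(struct dir_inode *d)
{
    for (int i = 0; i < MAXPAGES; i++)
        free(d->pages[i]);
    free(d);
}

/* Let one more vnode point at an existing directory. */
static struct dir_inode *dir_share(struct dir_inode *d)
{
    assert(d != NULL && d->refcount > 0);
    d->refcount++;
    return d;
}

/* Return an inode that is safe to write: copies the pages only if shared. */
static struct dir_inode *dir_make_private(struct dir_inode *d, size_t page_size)
{
    assert(d != NULL && d->refcount > 0);
    if (d->refcount == 1)
        return d;                         /* sole owner, write in place */
    struct dir_inode *copy = calloc(1, sizeof *copy);
    if (copy == NULL)
        return NULL;
    copy->refcount = 1;
    for (int i = 0; i < MAXPAGES; i++) {
        if (d->pages[i] == NULL)
            continue;
        copy->pages[i] = malloc(page_size);
        if (copy->pages[i] == NULL) {
            dir_free(copy);
            return NULL;
        }
        memcpy(copy->pages[i], d->pages[i], page_size);
    }
    d->refcount--;                        /* the writer no longer shares d */
    return copy;
}
```

A writer calls `dir_make_private` before its first modification; readers that still hold the shared inode keep seeing the old contents.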

Files
- Vnode references file by InodeNumber
- Files are copy on write
- There are "FileInodes" like dir inodes, but they are held in an external DB or in the inode itself
- Server always reads/writes whole files (could be exploited)

Volinit and salvage
- Set up volume hash table, serverlist, DiskPartitionList
- Cycle through partitions, check each for list of inodes:
  - every inode has a vnode
  - every vnode has a directory name
  - every directory name has a vnode
- Put volume in a VM hash table

Server connection info
- Array of HostEntry (a "venus")
  - contains a linked list of connections
  - contains a callback connection id
- Connection setup:
  - first binding creates a host & callback conn
  - new binding creates a new connection and verifies callback
  - in RPC2_NewBinding & ViceNewConnectFS

Callbacks
- Hashtable of FileEntries; each contains:
  - Fid
  - number of users
  - linked list of callbacks
- Callbacks: point to HostEntry
- Ops:
  - RPC: BreakCallBack
  - Local: placing, delete, deleteVenus
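
A sketch of the callback table described above: a hash table of file entries keyed by fid, each with a user count and a linked list of callbacks. All names and the table size are hypothetical, a host is reduced to an integer index, and the choice to spare the writer's own callback when breaking is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

struct fid { unsigned long vol, vnode, unique; };

struct callback {
    int host;                   /* index of the HostEntry holding the callback */
    struct callback *next;
};

struct file_entry {
    struct fid fid;
    int users;                  /* number of callbacks on this fid */
    struct callback *callbacks;
    struct file_entry *next;    /* hash chain */
};

#define CB_BUCKETS 64           /* hypothetical table size */
static struct file_entry *cb_table[CB_BUCKETS];

static unsigned cb_hash(const struct fid *f)
{
    return (unsigned)((f->vol ^ f->vnode ^ f->unique) % CB_BUCKETS);
}

static struct file_entry *cb_find(const struct fid *f)
{
    for (struct file_entry *e = cb_table[cb_hash(f)]; e != NULL; e = e->next)
        if (e->fid.vol == f->vol && e->fid.vnode == f->vnode &&
            e->fid.unique == f->unique)
            return e;
    return NULL;
}

/* "Placing": record that a host holds a callback on fid. Returns 0 on success. */
static int cb_add(const struct fid *f, int host)
{
    assert(f != NULL);
    struct file_entry *e = cb_find(f);
    if (e == NULL) {
        e = calloc(1, sizeof *e);
        if (e == NULL)
            return -1;
        e->fid = *f;
        unsigned h = cb_hash(f);
        e->next = cb_table[h];
        cb_table[h] = e;
    }
    struct callback *cb = malloc(sizeof *cb);
    if (cb == NULL)
        return -1;
    cb->host = host;
    cb->next = e->callbacks;
    e->callbacks = cb;
    e->users++;
    return 0;
}

/* Break every callback on fid except the writer's; returns how many hosts
   would be notified (the real server would send BreakCallBack to each). */
static int cb_break(const struct fid *f, int writer)
{
    assert(f != NULL);
    struct file_entry *e = cb_find(f);
    int notified = 0;
    if (e == NULL)
        return 0;
    struct callback **pp = &e->callbacks;
    while (*pp != NULL) {
        struct callback *cb = *pp;
        if (cb->host == writer) {
            pp = &cb->next;
            continue;
        }
        *pp = cb->next;         /* unlink the broken callback */
        free(cb);
        e->users--;
        notified++;
    }
    return notified;
}
```

With three hosts holding a callback on one fid, a write from one of them breaks the other two and leaves a single user on the entry.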

Callbacks
- Connection is non-authenticated. Should be fixed.
- Session key for CB connection should not expire.
- Side effect of callback connection is used for BackFetch: bulk transfer of files during reintegration.

RPC processing
- Venus RPCs:
  - srvproc.cc: standard file ops
  - srvproc2.cc: standard volume ops
  - codaproc.cc: repair stuff
  - codaproc2.cc: reintegration stuff
- Volutil RPCs: vol-your-rpc.cc (in coda-src/volutil)
- Resolution: below

RPC processing
- RPC structure:
  - ValidateParms: validate, hand off COP2, cid
  - GetObject: vm copy, lock objects
  - CheckSemantics: Concurrency, Integrity, Permissions
  - Perform operations: BulkTransfer, UpdateObjects, OutParms
  - PutObject: rvm transactions, inode deletions
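
The five-phase structure can be sketched as a small pipeline. The `request` fields and the stub phases are hypothetical. The sketch also assumes that the put phase always runs once objects have been fetched, so that locks are released, and that it commits only when no earlier phase failed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct request {
    int valid_parms;      /* hypothetical inputs that drive the stub phases */
    int has_permission;
    int committed;        /* set by the put phase when changes are kept */
    int phases_run;
};

typedef int (*phase_fn)(struct request *);   /* 0 on success, else an errno */

static int validate_parms(struct request *r)  { return r->valid_parms ? 0 : EINVAL; }
static int get_objects(struct request *r)     { (void)r; return 0; }
static int check_semantics(struct request *r) { return r->has_permission ? 0 : EACCES; }
static int perform(struct request *r)         { (void)r; return 0; }

/* Run the phases in order and stop at the first error. */
static int handle_rpc(struct request *r)
{
    static const phase_fn steps[] = {
        validate_parms, get_objects, check_semantics, perform
    };
    int err = 0;
    assert(r != NULL);
    r->phases_run = 0;
    r->committed = 0;
    for (size_t i = 0; i < sizeof steps / sizeof steps[0] && err == 0; i++) {
        err = steps[i](r);
        r->phases_run++;
    }
    if (r->phases_run >= 2) {         /* objects were fetched and locked */
        r->committed = (err == 0);    /* put phase: commit, or only unlock */
        r->phases_run++;
    }
    return err;
}
```

A request that fails the permission check still passes through the put phase, but nothing is committed.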

vlists
- GetFSObjects: instantiate a vlist
  - RPC needs list of objects copied from RVM
  - modification status is held there (did CopyOnWrite kick in etc.)
- PutObjects:
  - rvm_begin_transaction
  - walk through the list: copy, rvm_set_range, unlock
  - rvm_end_transaction
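
The PutObjects walk can be sketched as below. The `txn_*` functions are hypothetical stubs that only stand in for the RVM calls named on the slide; they do not reproduce the RVM signatures. The `obj` payload and the vlist entry layout are likewise assumptions. The range is declared before the copy, on the assumption that the transaction system must see a range before it is modified.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct obj { long length; int vv; };   /* hypothetical object payload */

struct vlist_entry {
    struct obj *rvm;     /* recoverable copy */
    struct obj vm;       /* working copy used by the RPC */
    int modified;        /* did the RPC change the VM copy? */
    int locked;
    struct vlist_entry *next;
};

/* Stubs standing in for the transaction calls; not the RVM API. */
static int ranges_declared = 0;
static int txn_open = 0;

static void txn_begin(void) { assert(!txn_open); txn_open = 1; }
static void txn_end(void)   { assert(txn_open);  txn_open = 0; }
static void txn_set_range(void *addr, size_t len)
{
    (void)addr;
    (void)len;
    assert(txn_open);
    ranges_declared++;
}

/* Walk the vlist inside one transaction: copy modified objects, unlock all. */
static void put_objects(struct vlist_entry *list, int error)
{
    txn_begin();
    for (struct vlist_entry *e = list; e != NULL; e = e->next) {
        if (error == 0 && e->modified) {
            txn_set_range(e->rvm, sizeof *e->rvm);    /* declare, then write */
            memcpy(e->rvm, &e->vm, sizeof *e->rvm);
        }
        e->locked = 0;
    }
    txn_end();
}
```

Only entries marked as modified cost a range declaration and a copy; unmodified entries are simply unlocked.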

COP2 handling
- In COP2, Venus gives the final VV to the server
  - COP2s are sent out by Venus (with some delay)
  - often piggybacked in bulk
- Server knows about pending COP2 entries in hash table (coppend.cc)
- Manager thread CopPendingManager:
  - runs every minute
  - removes entries more than 900 secs old
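
The expiry rule of the manager thread (drop entries more than 900 seconds old) can be sketched as follows. For brevity a single linked list replaces the hash table, and the entry layout and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

#define COP2_MAX_AGE 900       /* seconds, from the slide */

struct cop_pending {
    unsigned long storeid;     /* hypothetical key for the pending COP2 */
    time_t added;              /* when the entry was queued */
    struct cop_pending *next;
};

/* Queue a pending entry at the head of the list. Returns 0 on success. */
static int add_pending(struct cop_pending **head, unsigned long storeid,
                       time_t now)
{
    assert(head != NULL);
    struct cop_pending *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->storeid = storeid;
    e->added = now;
    e->next = *head;
    *head = e;
    return 0;
}

/* Drop entries older than COP2_MAX_AGE; returns how many were removed. */
static int expire_pending(struct cop_pending **head, time_t now)
{
    int removed = 0;
    assert(head != NULL);
    while (*head != NULL) {
        struct cop_pending *e = *head;
        if (difftime(now, e->added) > COP2_MAX_AGE) {
            *head = e->next;   /* unlink and free the stale entry */
            free(e);
            removed++;
        } else {
            head = &e->next;
        }
    }
    return removed;
}
```

A manager loop would call `expire_pending` once a minute with the current time; an entry that is exactly 900 seconds old survives one more pass.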

Cop2 to RVM
- Data can be:
  - piggybacked on another RPC
  - sent in a ViceCop2 RPC
- Both cases call InternalCop2 (srvproc.cc)
- InternalCop2 (codaproc.cc):
  - notifies the manager to dequeue
  - gets the FS objects listed for the COP2
  - installs final VVs into RVM (rvm transaction!)

COP2 Problems
- Easy cause of conflicts in replicated volumes when clients access objects in rapid succession. (Can be fixed easily during the writeback caching operation.)
- Not optimized for singly replicated volumes.

Resolution
- Initiated by client with RPC to coordinator: ViceResolve (codaproc.cc)
- Coordinator sets up connections in VSG (unauthenticated)
- LockAndFetch (res/reslock, resutil): lock volumes, collect "closure"

Resolution - special cases
- RegResDirRequired (rvmres/rvmrescoord.cc)
  - check for unresolved ancestors
  - already inconsistent
  - runts (missing objects)
  - weak equality (identical storeid)
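
The weak-equality case can be illustrated with a generic version vector comparison. This is a sketch under assumptions: the struct layout, `NREPLICAS`, and all names are hypothetical rather than the server's vv_t, and weak equality is taken to mean that the version slots differ while the store ids are identical.

```c
#include <assert.h>
#include <stddef.h>

#define NREPLICAS 8   /* hypothetical replication width */

struct storeid { unsigned long host, uniquifier; };

struct vv {
    long versions[NREPLICAS];   /* one update counter per replica */
    struct storeid storeid;     /* id of the last store applied */
};

enum vv_order { VV_EQUAL, VV_DOMINATES, VV_SUBMISSIVE, VV_INCONSISTENT };

/* Slot-wise comparison of two version vectors. */
static enum vv_order vv_compare(const struct vv *a, const struct vv *b)
{
    int a_ahead = 0, b_ahead = 0;
    assert(a != NULL && b != NULL);
    for (int i = 0; i < NREPLICAS; i++) {
        if (a->versions[i] > b->versions[i]) a_ahead = 1;
        if (a->versions[i] < b->versions[i]) b_ahead = 1;
    }
    if (a_ahead && b_ahead) return VV_INCONSISTENT;
    if (a_ahead) return VV_DOMINATES;
    if (b_ahead) return VV_SUBMISSIVE;
    return VV_EQUAL;
}

/* Weak equality: the slots differ, but both replicas saw the same last store. */
static int vv_weakly_equal(const struct vv *a, const struct vv *b)
{
    return vv_compare(a, b) != VV_EQUAL &&
           a->storeid.host == b->storeid.host &&
           a->storeid.uniquifier == b->storeid.uniquifier;
}
```

Two replicas whose counters differ only because one missed a final update, but whose store ids match, hold the same data and need only their vectors brought into line.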

RecovDirResolve
- Phase II: (rvmres/{rescoord,subphase?}.cc)
  - coordinator requests logs from other servers
  - subordinates lock affected dirs, marshall logs
  - coordinator merges logs
- Phase III:
  - ship merged log to subordinates
  - perform operations on VM copies
  - return results to coordinator

Resolution
- Phase IV: (is old Phase 3 …)
  - collect results, compute new VVs
  - ship to subordinates
  - commit results

Comments on resolution
- Old versions of resolution:
  - OldDirResolve: resolve only runts and weak equality
  - DirResolve: resolve only in VM
  - Remove these
- The resolve directory has nothing to do with resolution: should be called librepair. Srv uses merely one function in it; repair uses the rest.

Volume Log
- During FS operations, log entries are created for use during resolution
- Different format per operation (rvmres/recov_vollog.cc)
- Added to the vlist by SpoolVMLogRecord
- Put in RVM at commit time

Repair
- Venus makes ViceRepair RPC.
- File and symlink repair: BulkTransfer the object
- Directory repair: BulkTransfer the repair file and replay operations
- Venus follows this with a COP2 multi-RPC
- For directory repair, Venus invokes asynchronous resolve

Future
- Good:
  - Design is simple and efficient
  - There is little C++: should eliminate
  - Easy to multi-thread
- Bad:
  - Scalability: ~8GB in practice, ~40GB in theory
  - Data handling is bad: tricky to fix
  - Volume code was & is worst: rewrite