Chapter 11: File System Implementation
Transcript of Chapter 11: File System Implementation

File System Implementation
- File-System Structure
- File-System Implementation
- Directory Implementation
- Allocation Methods
- Free-Space Management
- Efficiency and Performance
- Recovery
- Log-Structured File Systems
- NFS
- Example: WAFL File System

Objectives
- To describe the details of implementing local file systems and directory structures
- To describe the implementation of remote file systems
- To discuss block allocation and free-block algorithms and trade-offs

File-System Structure
- File structure
  - Logical storage unit
  - Collection of related information
- File system resides on secondary storage (disks)
- File system organized into layers
- File control block: storage structure consisting of information about a file

Layered File System

A Typical File Control Block

In-Memory File System Structures
- The following figure illustrates the necessary file system structures provided by the operating system.
- Figure 12-3(a) refers to opening a file.
- Figure 12-3(b) refers to reading a file.

In-Memory File System Structures

Virtual File Systems
- Virtual File Systems (VFS) provide an object-oriented way of implementing file systems.
- VFS allows the same system call interface (the API) to be used for different types of file systems.
- The API is to the VFS interface, rather than any specific type of file system.
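
The object-oriented idea can be sketched in a few lines. This is only an illustration: the class names, the `read` method, and the prefix-based mount table are assumptions made for the example, not the interface of any real kernel.

```python
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """Hypothetical per-file-system interface that the VFS calls into."""

    @abstractmethod
    def read(self, path):
        ...


class LocalFS(FileSystem):
    def __init__(self, files):
        self.files = files  # path -> bytes, stands in for on-disk data

    def read(self, path):
        return self.files[path]


class RemoteFS(FileSystem):
    """Stands in for a network file system; no real network I/O happens."""

    def __init__(self, server_files):
        self.server_files = server_files

    def read(self, path):
        return self.server_files[path]


class VFS:
    """Callers use one API; the VFS dispatches to the mounted file system."""

    def __init__(self):
        self.mounts = {}  # mount prefix -> FileSystem

    def mount(self, prefix, fs):
        self.mounts[prefix] = fs

    def read(self, path):
        # Pick the longest mount prefix that matches the path.
        prefix = max((p for p in self.mounts if path.startswith(p)), key=len)
        return self.mounts[prefix].read(path[len(prefix):])
```

The caller never names a file-system type: `vfs.read(...)` is the same call whether the path resolves to `LocalFS` or `RemoteFS`.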

Schematic View of Virtual File System

Directory Implementation
- Linear list of file names with pointers to the data blocks
  - simple to program
  - time-consuming to execute
- Hash table: linear list with a hash data structure
  - decreases directory search time
  - collisions: situations where two file names hash to the same location
  - fixed size
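
A minimal sketch of the two directory organizations follows. The byte-sum hash function and the use of chaining to resolve collisions are assumptions chosen to keep the example short; the slides do not say how collisions are handled.

```python
def linear_lookup(entries, name):
    """Scan a list of (file_name, first_block) pairs; O(n) per lookup."""
    for entry_name, block in entries:
        if entry_name == name:
            return block
    return None


class HashedDirectory:
    """Fixed-size hash table; each slot chains the entries that collide."""

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]  # the table size is fixed

    def _slot(self, name):
        # Toy hash function (sum of bytes); an assumption for illustration.
        return sum(name.encode()) % len(self.buckets)

    def add(self, name, block):
        self.buckets[self._slot(name)].append((name, block))

    def lookup(self, name):
        # Only the one bucket is scanned, not the whole directory.
        return linear_lookup(self.buckets[self._slot(name)], name)
```

With this toy hash, the names "ab" and "ba" collide because their bytes sum to the same value, yet both remain retrievable through the chain.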

Allocation Methods
- An allocation method refers to how disk blocks are allocated for files:
  - Contiguous allocation
  - Linked allocation
  - Indexed allocation

Contiguous Allocation
- Each file occupies a set of contiguous blocks on the disk
- Simple: only starting location (block #) and length (number of blocks) are required
- Random access
- Wasteful of space (dynamic storage-allocation problem)
- Files cannot grow
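
Random access under contiguous allocation reduces to one division. The sketch below assumes 512-word blocks (the block size used in the later indexed-allocation slides); the function name is hypothetical.

```python
BLOCK_SIZE = 512  # words per block; assumed, matching the later slides


def contiguous_map(start_block, length, logical_address):
    """Map a logical word address to (physical block, offset in block)."""
    if logical_address < 0:
        raise ValueError("negative address")
    q, r = divmod(logical_address, BLOCK_SIZE)
    if q >= length:
        raise ValueError("address past end of file")
    # The file's blocks are adjacent, so block q is simply start + q.
    return start_block + q, r
```

For a file starting at block 14 with length 3, logical address 1000 falls in block 15 at offset 488.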

Contiguous Allocation of Disk Space

Extent-Based Systems
- Many newer file systems (e.g., Veritas File System, the file system for HP-UX systems) use a modified contiguous allocation scheme
- Extent-based file systems allocate disk blocks in extents
- An extent is a contiguous run of disk blocks
  - Extents are allocated for file allocation
  - A file consists of one or more extents.

Linked Allocation
- Each file is a linked list of disk blocks: blocks may be scattered anywhere on the disk.
- Each block holds a pointer to the next block of the file.

Linked Allocation (Cont.)
- Simple: need only a starting address
- Free-space management system: no waste of space
- No random access
- Mapping: divide the logical address LA by 511 (a 512-word block with one word reserved for the pointer) to get quotient Q and remainder R
  - Block to be accessed is the Qth block in the linked chain of blocks representing the file.
  - Displacement into block = R + 1
- File-allocation table (FAT): disk-space allocation used by MS-DOS and OS/2.
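
The mapping and a FAT-style chain walk can be sketched as follows. It assumes 512-word blocks whose first word holds the next-block pointer, which is why the divisor is 511 and the displacement is R + 1. The end-of-chain marker and the dictionary used for the table are illustrative choices, not the MS-DOS on-disk format.

```python
DATA_WORDS = 511   # 512-word block minus one pointer word (assumed layout)
END_OF_FILE = -1   # hypothetical end-of-chain marker


def linked_map(logical_address):
    """Return (links to follow from the first block, offset in that block)."""
    q, r = divmod(logical_address, DATA_WORDS)
    return q, r + 1  # +1 skips the pointer word at offset 0


def fat_chain(fat, start_block):
    """Follow next-block entries in a file-allocation table."""
    chain = []
    block = start_block
    while block != END_OF_FILE:
        chain.append(block)
        block = fat[block]
    return chain
```

Reaching the Qth block costs Q pointer reads, which is why linked allocation offers no efficient random access. Keeping the pointers together in a table lets the chain be followed without reading the data blocks themselves.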

Linked Allocation

File-Allocation Table

Indexed Allocation
- Brings all pointers together into the index block.
- Logical view: the index table

Example of Indexed Allocation

Indexed Allocation (Cont.)
- Need index table
- Random access
- Dynamic access without external fragmentation, but have overhead of index block.
- Mapping from logical to physical in a file of maximum size of 256K words and block size of 512 words. We need only 1 block for index table, since 256K / 512 = 512 entries.
  - Divide the logical address LA by 512 to get quotient Q and remainder R
  - Q = displacement into index table
  - R = displacement into block
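
A short sketch of the single-level mapping, assuming the index block is represented as a Python list of physical block numbers (the sample block numbers in the usage note are made up):

```python
BLOCK_SIZE = 512  # words per block, as on the slide


def indexed_map(index_block, logical_address):
    """Return (physical block, offset) using a one-block index table."""
    q, r = divmod(logical_address, BLOCK_SIZE)
    # q selects the index-table entry; r is the offset inside the data block.
    return index_block[q], r


# One index block of 512 entries covers 512 * 512 = 256K words.
MAX_FILE_WORDS = BLOCK_SIZE * BLOCK_SIZE
```

With an index block of `[9, 16, 1, 10, 25]`, logical address 1030 maps to entry 2, that is, physical block 1 at offset 6.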

Indexed Allocation: Mapping (Cont.)
- Mapping from logical to physical in a file of unbounded length (block size of 512 words).
- Linked scheme: link blocks of index table (no limit on size).
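
One way to work out the linked-scheme mapping is shown below. It assumes each 512-word index block holds 511 data-block pointers plus one link to the next index block. That layout is an assumption for the sketch, since the transcript does not preserve the slide's formula.

```python
BLOCK_SIZE = 512          # words per data block
POINTERS_PER_INDEX = 511  # one word of each index block links to the next


def linked_index_map(logical_address):
    """Return (index block number in chain, entry in it, offset in block)."""
    words_per_index_block = BLOCK_SIZE * POINTERS_PER_INDEX
    q1, r1 = divmod(logical_address, words_per_index_block)
    q2, r2 = divmod(r1, BLOCK_SIZE)
    # q1: how many index blocks to skip along the chain
    # q2: which pointer inside that index block
    # r2: displacement inside the data block
    return q1, q2, r2
```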

Indexed Allocation: Mapping (Cont.)
- Two-level index (maximum file size is 512^3 words)
- Figure: outer index, index table, file
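
The two-level mapping and the 512^3 bound can be checked with a short sketch, assuming 512-word blocks and 512 pointers in every index block (names are illustrative):

```python
BLOCK_SIZE = 512  # words per data block
ENTRIES = 512     # pointers per index block (assumed equal to the block size)


def two_level_map(logical_address):
    """Return (outer-index entry, inner-index entry, offset in data block)."""
    q1, r1 = divmod(logical_address, ENTRIES * BLOCK_SIZE)
    q2, r2 = divmod(r1, BLOCK_SIZE)
    return q1, q2, r2


# 512 outer entries * 512 inner entries * 512 words per block
MAX_FILE_WORDS = ENTRIES * ENTRIES * BLOCK_SIZE
```

The last addressable word, `MAX_FILE_WORDS - 1`, maps to entry 511 at both levels and offset 511.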

Combined Scheme: UNIX (4K bytes per block)

Free-Space Management
- Bit vector (n blocks), with bits numbered 0, 1, 2, ..., n-1
    bit[i] = 0  =>  block[i] free
    bit[i] = 1  =>  block[i] occupied
- Block number calculation (first free block):
    (number of bits per word) * (number of leading words with no 0 bit) + offset of first 0 bit
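
The first-free-block search can be sketched as below, following the slide's stated convention that a 0 bit marks a free block. Under that convention the scan skips words whose bits are all 1 and stops at the first 0 bit. The 32-bit word size and the least-significant-bit-first numbering are assumptions.

```python
BITS_PER_WORD = 32  # assumed word size


def first_free_block(words):
    """Return the number of the first free block, or None if all are in use.

    Bit i of word w describes block w * BITS_PER_WORD + i (bit 0 is the
    least significant bit); 0 = free, 1 = occupied.
    """
    full = (1 << BITS_PER_WORD) - 1
    for word_index, word in enumerate(words):
        if word != full:
            inverted = ~word & full  # free blocks become 1 bits
            # Isolate the lowest set bit and convert it to a bit position.
            offset = (inverted & -inverted).bit_length() - 1
            return BITS_PER_WORD * word_index + offset
    return None
```

Comparing a whole word against the all-ones pattern lets the search skip 32 occupied blocks per comparison.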

Free-Space Management
- Bit map requires extra space. For example:
    block size = 2^12 bytes
    disk size = 2^30 bytes (1 gigabyte)
    n = 2^30 / 2^12 = 2^18 bits (or 32K bytes)
- Easy to get contiguous files
- Linked list (free list)
  - Cannot get contiguous space easily
  - No waste of space
- Grouping
- Counting

Free-Space Management
- Need to protect:
  - Pointer to free list
  - Bit map
    - Must be kept on disk
    - Copy in memory and disk may differ
    - Cannot allow for block[i] to have a situation where bit[i] = 1 in memory and bit[i] = 0 on disk
- Solution:
  - Set bit[i] = 1 in disk
  - Allocate block[i]
  - Set bit[i] = 1 in memory
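
The ordering in the solution can be illustrated with two in-memory lists standing in for the on-disk and in-memory bitmaps. The class and its `log` of steps are hypothetical; a real implementation would issue a synchronous disk write for the first step.

```python
class FreeSpaceMap:
    """Toy model of a bitmap kept both on disk and in memory."""

    def __init__(self, n_blocks):
        self.disk_bits = [0] * n_blocks    # stands in for the on-disk copy
        self.memory_bits = [0] * n_blocks  # cached copy
        self.log = []                      # order of steps, for inspection

    def consistent(self):
        # The forbidden state: occupied in memory but still free on disk.
        return not any(m == 1 and d == 0
                       for m, d in zip(self.memory_bits, self.disk_bits))

    def allocate(self, i):
        if self.memory_bits[i] or self.disk_bits[i]:
            raise ValueError("block already allocated")
        self.disk_bits[i] = 1              # 1. set bit[i] = 1 on disk
        self.log.append(("disk", i))
        self.log.append(("allocate", i))   # 2. hand out block[i]
        self.memory_bits[i] = 1            # 3. set bit[i] = 1 in memory
        self.log.append(("memory", i))
```

Because the disk copy is updated first, a crash between any two steps can at worst leave a block marked occupied on disk that nobody uses. It cannot leave a block in use that the disk copy still reports as free.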

Linked Free Space List on Disk

Efficiency and Performance
- Efficiency is dependent on:
  - disk allocation and directory algorithms
  - types of data kept in file's directory entry
- Performance
  - disk cache: separate section of main memory for frequently used blocks
  - free-behind and read-ahead: techniques to optimize sequential access
  - improve PC performance by dedicating section of memory as virtual disk, or RAM disk

Page Cache
- A page cache caches pages rather than disk blocks using virtual memory techniques
- Memory-mapped I/O uses a page cache
- Routine I/O through the file system uses the buffer (disk) cache
- This leads to the following figure.

I/O Without a Unified Buffer Cache

Unified Buffer Cache
- A unified buffer cache uses the same page cache to cache both memory-mapped pages and ordinary file system I/O

I/O Using a Unified Buffer Cache

Recovery
- Consistency checking: compares data in directory structure with data blocks on disk, and tries to fix inconsistencies
- Use system programs to back up data from disk to another storage device (floppy disk, magnetic tape, other magnetic disk, optical)
- Recover lost file or disk by restoring data from backup

Log-Structured File Systems
- Log-structured (or journaling) file systems record each update to the file system as a transaction
- All transactions are written to a log
  - A transaction is considered committed once it is written to the log
  - However, the file system may not yet be updated
- The transactions in the log are asynchronously written to the file system
  - When the file system is modified, the transaction is removed from the log
- If the file system crashes, all remaining transactions in the log must still be performed
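
The commit, apply, and recover cycle can be sketched with an in-memory list standing in for the on-disk log. The class and method names are hypothetical, and each transaction is reduced to a single key/value update for brevity.

```python
class JournalingFS:
    """Toy journal: updates are logged first, applied later."""

    def __init__(self):
        self.log = []   # committed transactions not yet applied
        self.data = {}  # stands in for the file-system structures

    def commit(self, key, value):
        # The transaction counts as committed once it is in the log,
        # even though self.data has not changed yet.
        self.log.append((key, value))

    def apply_one(self):
        # Asynchronous write-back: modify the file system, then drop
        # the transaction from the log.
        key, value = self.log[0]
        self.data[key] = value
        self.log.pop(0)

    def recover(self):
        # After a crash, every transaction still in the log is performed.
        while self.log:
            self.apply_one()
```

Since an entry leaves the log only after its update has been applied, replaying whatever remains after a crash brings the file system to the state implied by all committed transactions.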

The Sun Network File System (NFS)
- An implementation and a specification of a software system for accessing remote files across LANs (or WANs)
- The implementation is part of the Solaris and SunOS operating systems running on Sun workstations using an unreliable datagram protocol (UDP/IP protocol and Ethernet)

NFS (Cont.)
- Interconnected workstations viewed as a set of independent machines with independent file systems, which allows sharing among these file systems in a transparent manner

NFS (Cont.)
- A remote directory is mounted over a local file system directory
  - The mounted directory looks like an integral subtree of the local file system, replacing the subtree descending from the local directory
- Specification of the remote directory for the mount operation is nontransparent; the host name of the remote directory has to be provided
  - Files in the remote directory can then be accessed in a transparent manner

NFS (Cont.)
- Subject to access-rights accreditation, potentially any file system (or directory within a file system) can be mounted remotely on top of any local directory

NFS (Cont.)
- NFS is designed to operate in a heterogeneous environment of different machines, operating systems, and network architectures; the NFS specification is independent of these media
- This independence is achieved through the use of RPC primitives built on top of an External Data Representation (XDR) protocol used between two implementation-independent interfaces

NFS (Cont.)
- The NFS specification distinguishes between the services provided by a mount mechanism and the actual remote-file-access services

Three Independent File Systems

Mounting in NFS
- Figure panels: Mounts; Cascading mounts

NFS Mount Protocol
- Establishes initial logical connection between server and client
- Mount operation includes name of remote directory to be mounted and name of server machine storing it
  - Mount request is mapped to corresponding RPC and forwarded to mount server running on server machine
  - Export list: specifies local file systems that server exports for mounting, along with names of machines that are permitted to mount them

NFS Mount Protocol
- Following a mount request that conforms to its export list, the server returns a file handle, a key for further accesses
- File handle: a file-system identifier, and an inode number to identify the mounted directory within the exported file system
- The mount operation changes only the user's view and does not affect the server side
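
The export-list check and the shape of a file handle can be sketched as follows. This is a toy model with hypothetical names (`MountServer`, `FileHandle`); it performs no RPC and does not reproduce the actual NFS wire format.

```python
from collections import namedtuple

# A file handle as described above: file-system identifier + inode number.
FileHandle = namedtuple("FileHandle", ["fs_id", "inode"])


class MountServer:
    """Toy mount server: checks the export list, then returns a handle."""

    def __init__(self, exports, directories):
        self.exports = exports          # exported path -> set of client names
        self.directories = directories  # exported path -> (fs_id, inode)

    def mount(self, client, path):
        allowed = self.exports.get(path)
        if allowed is None or client not in allowed:
            raise PermissionError("mount request does not match export list")
        # Nothing on the server changes; the client only receives a key.
        return FileHandle(*self.directories[path])
```

The client keeps the returned handle and presents it in later requests. The server itself records nothing about the mount.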

NFS Protocol
- Provides a set of remote procedure calls for remote file operations. The procedures support the following operations:
  - searching for a file within a directory
  - reading a set of directory entries
  - manipulating links and directories
  - accessing file attributes
  - reading and writing files

NFS Protocol
- NFS servers are stateless; each request has to provide a full set of arguments
  (NFS V4 is just becoming available; it is very different and stateful)
- Modified data must be committed to the server's disk before results are returned to the client (lose advantages of caching)
- The NFS protocol does not provide concurrency-control mechanisms

Three Major Layers of NFS Architecture
- 1) NFS service layer: bottom layer of the architecture
  - Implements the NFS protocol
- 2) UNIX file-system interface (based on the open, read, write, and close calls, and file descriptors)

Three Major Layers of NFS Architecture
- 3) Virtual File System (VFS) layer: distinguishes local files from remote ones, and local files are further distinguished according to their file-system types
  - The VFS activates file-system-specific operations to handle local requests according to their file-system types
  - Calls the NFS protocol procedures for remote requests

Schematic View of NFS Architecture

NFS Path-Name Translation
- Performed by breaking the path into component names and performing a separate NFS lookup call for every pair of component name and directory vnode
- To make lookup faster, a directory name lookup cache on the client's side holds the vnodes for remote directory names
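
Component-by-component translation with a client-side lookup cache can be sketched as below. The `lookup_rpc` callable stands in for the NFS lookup call, vnodes are represented by plain integers, and the cache is never invalidated. All of these are simplifications assumed for the example.

```python
def make_translator(lookup_rpc):
    """Build a path translator around lookup_rpc(dir_vnode, name) -> vnode."""
    cache = {}  # (directory vnode, component name) -> vnode

    def translate(root_vnode, path):
        vnode = root_vnode
        # One lookup per (directory vnode, component name) pair.
        for name in (c for c in path.split("/") if c):
            key = (vnode, name)
            if key not in cache:
                cache[key] = lookup_rpc(vnode, name)  # remote call on a miss
            vnode = cache[key]
        return vnode

    return translate
```

A path with three components costs three lookup calls the first time. Later translations that share a prefix are served from the cache.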

NFS Remote Operations
- Nearly one-to-one correspondence between regular UNIX system calls and the NFS protocol RPCs (except opening and closing files)
- NFS adheres to the remote-service paradigm, but employs buffering and caching techniques for the sake of performance

NFS Remote Operations
- File-blocks cache: when a file is opened, the kernel checks with the remote server whether to fetch or revalidate the cached attributes
  - Cached file blocks are used only if the corresponding cached attributes are up to date
- File-attribute cache: the attribute cache is updated whenever new attributes arrive from the server
- Clients do not free delayed-write blocks until the server confirms that the data have been written to disk

Example: WAFL File System
- Used on Network Appliance "Filers": distributed file system appliances
- "Write-anywhere file layout"
- Serves up NFS, CIFS, http, ftp
- Random I/O optimized, write optimized
  - NVRAM for write caching
- Similar to Berkeley Fast File System, with extensive modifications

The WAFL File Layout

Snapshots in WAFL

End of Chapter 11