Operating Systems 2010/2011
07/12/2010, TU/e Computer Science, System Architecture and Networking
Johan J. Lukkien, Shudong Chen , [email protected]
File Systems – part 2 (ch11, ch17)
Shudong Chen
Recap
• Tasks, requirements for filesystems
• Two views:
– User view
• File type / attribute / access modes
• Directory structure
– OS designers view
• The API
– File operations
– Directory operations
• Namespaces
– Namespace construction methods / purpose / joining
• File sharing
– File sharing / failure modes / consistency
• File protection
– Access list & group
Objectives
• OS view – implementation
– from designers’ perspective
• To describe the details of implementing local file systems and directory structures
• To discuss block allocation and free-block algorithms and trade-offs
• To describe the implementation of remote file systems
• Distributed file systems
Agenda
• File system structure
• File system implementation
• Directory implementation
• Allocation methods
• Free space management
• Efficiency and performance
• Recovery
• NFS
• Distributed file systems
Design tasks
• To provide efficient and convenient access to secondary-storage devices by allowing data to be stored, located, and retrieved easily.
• Two design problems
– defining how the file system should look to the user
• defining a file and its attributes
• defining the operations allowed on a file
• defining the directory structure for organizing files
– creating algorithms and data structures to map the logical file system
onto physical secondary-storage devices
Layered file system
• FS is organized into layers
– Higher layers use the features of lower layers to create new features.
– Duplication of code is minimized.
– Layering introduces more overhead, which may result in decreased performance.
• How many layers to use and what each layer should do is a major design
challenge.
• Transfer information between the main memory and the disk system
• Consist of device drivers and interrupt handlers
• Issue generic commands to the appropriate device driver to read and write physical blocks on the disk
• Manage memory buffers and caches that hold various file-system, directory, and data blocks
• Translate logical block addresses to physical block addresses
• Manage free space
• Manage metadata information
• directory structure (file-organization module)
• file structure (FCB)
• Responsible for protection and security
File descriptor
• Metadata
– includes all of the file-system structure except the actual data
(contents of the files)
• FCB (file control block)
– storage structure consisting of information about a file
• ownership, permissions, and locations of the file contents
– all information about a file in one place
• eases access, update
• avoids consistency problems
– with a unique identifier number to associate with a directory entry
– Examples
• Unix: inode table
– data structure per file
• NTFS: master file table
– relational database structure: a row per file
Example: inode (Unix)
• (a rather dated version)
On-disk file system structures
• File system resides on secondary storage (disks)
– Most disks are divided into partitions
– Each partition has its own file system
– Sector 0 of a disk is called the MBR (Master Boot Record)
• MBR is used to boot the computer
• The end of the MBR contains the partition table
– Giving starting and ending addresses of each partition
– One of the partitions is marked as active in the table
– At boot time
• BIOS reads in and runs the MBR
• MBR then locates the active partition, reads in its first block (the boot control block), and executes it
• Boot control block in turn loads the OS in the partition
• File system contains information about
– how to boot an operating system stored there ---- (boot control block)
– the total number of blocks ---- (volume control block)
– the number and location of free blocks ---- (volume control block)
– the directory structure
– and individual files
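The partition table described above can be decoded with a few lines of code. The sketch below assumes the classic PC MBR layout (table at byte offset 446, four 16-byte entries, signature 0x55 0xAA in the last two bytes); note that in this layout an entry records a starting sector and a sector count rather than an explicit end address. The helper `make_demo_mbr` and its values are hypothetical and only serve to make the example self-contained.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446            # partition table position in the classic MBR layout
ENTRY_SIZE = 16               # four entries of 16 bytes each
BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes of a valid MBR


def parse_mbr(sector: bytes):
    """Return the used partition entries of a 512-byte MBR sector."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("not a valid MBR sector")
    partitions = []
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        entry = sector[start:start + ENTRY_SIZE]
        # status, CHS start (3 bytes), type, CHS end (3 bytes), first LBA, sector count
        status, _chs_first, ptype, _chs_last, lba_first, count = struct.unpack(
            "<B3sB3sII", entry)
        if ptype == 0:        # type 0 marks an unused slot
            continue
        partitions.append({
            "index": i,
            "active": status == 0x80,   # 0x80 flags the active (bootable) partition
            "type": ptype,
            "first_sector": lba_first,
            "sectors": count,
        })
    return partitions


def make_demo_mbr():
    """Build an in-memory MBR with one active partition (hypothetical values)."""
    sector = bytearray(SECTOR_SIZE)
    entry = struct.pack("<B3sB3sII", 0x80, b"\0\0\0", 0x83, b"\0\0\0", 2048, 204800)
    sector[TABLE_OFFSET:TABLE_OFFSET + ENTRY_SIZE] = entry
    sector[510:512] = BOOT_SIGNATURE
    return bytes(sector)
```

Calling `parse_mbr(make_demo_mbr())` yields a single entry marked active; a boot loader would read the first block of that partition next.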
On-disk file system structure (Cont.)
• Each disk partition contains
– boot control block
• typically the first block of a volume
• contains information needed by the system to boot an OS from that volume
• can be empty, if the disk does not contain an OS
• called the boot block in UFS, the partition boot sector in NTFS
– volume control block
• contains volume details, such as the number of blocks in the partition, the size of the blocks, a free-block count and free-block pointers, and a free-FCB count and FCB pointers
• called the superblock in UFS; in NTFS it is stored in the master file table
– the directory structure
– and individual files
Implementing basic file system
• Implementation
– Set up and maintain tables for files that are currently being accessed (“open”)
• caching of attributes
• maintaining the temporary attributes
• access control
– Exchange reference to descriptor with I/O control
In-memory file system structures
• In-memory directory structure cache
– holds the directory information of recently accessed directories
– to speed directory operations
• System-wide open-file table (OFT)
– contains a copy of the FCB of each open file, as well as other information, e.g.,
– the number of processes that have the file open.
• Per-process open-file table
– contains a pointer to the appropriate entry in the system-wide open-file table, as well as other information, e.g.,
– a pointer to the current location in the file
– the access mode in which the file is open
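The relation between the two tables can be sketched as follows. This is a minimal illustration, not how any particular kernel does it: the class and field names are hypothetical, and the system-wide table is keyed by path for simplicity, whereas a real system would key it by something like an inode number.

```python
from dataclasses import dataclass


@dataclass
class SystemEntry:
    path: str
    fcb: dict             # in-memory copy of the file control block
    open_count: int = 0   # number of processes that have the file open


@dataclass
class ProcessEntry:
    system_entry: SystemEntry   # pointer into the system-wide table
    offset: int = 0             # current location in the file
    mode: str = "r"             # access mode in which the file is open


class OpenFileTables:
    def __init__(self):
        self.system_wide = {}   # path -> SystemEntry
        self.per_process = {}   # pid -> list of ProcessEntry (None when closed)

    def open(self, pid, path, fcb, mode="r"):
        entry = self.system_wide.get(path)
        if entry is None:                       # first opener copies the FCB
            entry = SystemEntry(path, dict(fcb))
            self.system_wide[path] = entry
        entry.open_count += 1
        table = self.per_process.setdefault(pid, [])
        table.append(ProcessEntry(entry, 0, mode))
        return len(table) - 1                   # index plays the role of a descriptor

    def close(self, pid, fd):
        proc_entry = self.per_process[pid][fd]
        self.per_process[pid][fd] = None        # per-process entry is removed
        entry = proc_entry.system_entry
        entry.open_count -= 1                   # system-wide count is decremented
        if entry.open_count == 0:
            del self.system_wide[entry.path]    # last closer releases the entry
```

Two processes opening the same file share one system-wide entry but keep separate offsets and access modes.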
Example: opening a file from an application
Application: open(“/home/johan/Aap”, O_RDWR | O_CREAT, 0600); receives a descriptor (pointer) to the kernel data structure
Kernel: at each step, check permission and create a data structure (descriptor) in the Open File Table (OFT)
• open subdir home of /
– /: owner: root, access: drwxr-xr-x, buffers / blocks containing entries: home, usr, ....
• open subdir johan from /home
– /home: owner: root, access: drwxr-xr-x, entries: johan, piet, ....
• open file Aap of /home/johan
– /home/johan: owner: johan, access: drwxr-xr-x, entries: OS, Aap, ....
– /home/johan/Aap: owner: johan, access: -rw-------, size: 1021, blocks, buffers, ....
Open File Table
• Upon open API call:
– verify descriptor info against user (process) rights
– allocate resources: OFT entry, buffers
– initialize access: read/write pointer
– copy and maintain information from the i-node
– return index (a pointer to the appropriate entry) in OFT for subsequent access
• called a file descriptor in Unix, a file handle in Windows
• in Linux: this index is stored with the process
• Close
– flush buffers to file contents and OFT entry to inode
– release resources (buffer, OFT entry)
• per-process OFT entry is removed; the open count in the system-wide OFT entry is decremented
• Notice: at this level file-sharing may occur
– two processes accessing the same file
• both with an entry in the OFT
• results unpredictable
– can use a file as ‘semaphore’ ---- the first that creates it has it
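The "file as semaphore" idea above relies on file creation being atomic: whoever creates the file first holds the lock. A minimal sketch, assuming a POSIX-like system where `O_CREAT | O_EXCL` makes `open` fail if the file already exists; the function names and the lock-file name are hypothetical.

```python
import os
import tempfile


def try_acquire(lock_path):
    """Return True if this caller created the lock file, False if it already existed."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False        # somebody else created it first
    os.close(fd)
    return True


def release(lock_path):
    """Give up the lock by removing the file."""
    os.remove(lock_path)


def demo():
    """Two attempts compete for the same lock file, then it is released and retaken."""
    with tempfile.TemporaryDirectory() as d:
        lock = os.path.join(d, "demo.lock")
        first = try_acquire(lock)
        second = try_acquire(lock)
        release(lock)
        third = try_acquire(lock)
        release(lock)
        return first, second, third
```

In `demo()` the first attempt succeeds, the second fails while the file exists, and the third succeeds again after the release.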
In-memory file system structures
(a) refers to opening a file. (b) refers to reading a file.
Virtual File Systems
• Virtual File Systems (VFS)
– provide an object-oriented way of implementing multiple types of file systems
– allow the same system call interface (the API) to be used for different types of file systems
• The API is to the VFS interface, rather than any specific type of file system.
• VFSs have three major layers:
– File-system interface layer
• based on the open(), read(), write(), and close() calls and on file descriptors.
– VFS interface layer, which serves two important functions:
• Separates file-system-generic operations from their implementation by defining a clean VFS interface
• Provides a mechanism for uniquely representing a file throughout a network.
– Layer implementing the different FS types and the remote-file-system protocol
• Handles local requests according to their FS types
• Calls the NFS protocol procedures for remote requests.
• VFS distinguishes local files from remote ones, and local files are further distinguished according to their file-system types.
Linux VFS architecture
• four main object types
– inode object --- represents an individual file
– file object --- represents an open file
– superblock object --- represents an entire file system
– dentry object --- represents an individual directory entry
• a set of operations for each object type, e.g., APIs for the file object
– int open (…), ssize_t read(…), ssize_t write(…), int mmap(…)
• a function table
– lists the addresses of the actual functions that implement these operations
• every object contains a pointer to the function table
• complete definition: /usr/include/linux/fs.h
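The function-table idea can be illustrated outside the kernel. In the sketch below every open-file object carries a reference to a table of operations, and the generic entry point only dispatches through that table. All names (`LOCAL_OPS`, `REMOTE_OPS`, `vfs_read`) are hypothetical; the "remote" variant merely stands in for an RPC and is not the Linux or NFS implementation.

```python
def local_read(f, n):
    """Read up to n bytes from a file whose data is held locally."""
    data = f["data"][f["pos"]:f["pos"] + n]
    f["pos"] += len(data)
    return data


def remote_read(f, n):
    """Read up to n bytes via a (simulated) server lookup instead of local data."""
    data = f["server"][f["name"]][f["pos"]:f["pos"] + n]
    f["pos"] += len(data)
    return data


# One function table per file-system type
LOCAL_OPS = {"read": local_read}
REMOTE_OPS = {"read": remote_read}


def vfs_read(f, n):
    """Generic entry point: dispatch through the object's function table."""
    return f["ops"]["read"](f, n)


local_file = {"ops": LOCAL_OPS, "data": b"hello world", "pos": 0}
remote_file = {"ops": REMOTE_OPS, "server": {"/x": b"abc"}, "name": "/x", "pos": 0}
```

The caller uses `vfs_read` for both objects and never needs to know which file-system type sits behind them.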
Directory implementation
• File directory: collection of (entries for) files
– mapping to access point for the file object (or to an internal address)
– just a file with internal structure
• Implementation task
– Resolving remapping info, e.g. mounting
• .... in fact, the implementation of the name-space
• Two issues
– what to store per entry?
• name + description
• scale of possibilities ranging from “all info (attributes) here” to “just a pointer to information”
– tradeoff: efficiency-storage
– policy: “never copy volatile information”, hence a reference here is better
– how to organize the entries?
• efficient insert & delete
Data structures for implementation
• Linear list of file names with pointer to the data blocks
– simple to program
– time-consuming to execute --- requires a linear search
• insertion: first search the directory to be sure that no existing file has the same name, then add the new entry to the end.
• deletion: first search, then release the space allocated to it.
• Hash Table --- linear list with hash data structure
– decreases directory search time
• the hash table takes a value computed from the file name and returns a pointer to the file name in the linear list
– collisions: situations where two file names hash to the same location
– the major difficulties
• fixed size
• the dependence of the hash function on that size
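A small sketch of the combination described above: entries live in a linear list, and a fixed-size hash table maps a value computed from the name to positions in that list. The class name, the byte-sum hash function, and the use of chaining for collisions are assumptions made for illustration only.

```python
class Directory:
    def __init__(self, buckets=8):
        self.entries = []                             # linear list of (name, fcb_id) or None
        self.buckets = [[] for _ in range(buckets)]   # fixed-size table, chaining on collision

    def _bucket(self, name):
        # The hash depends on the table size, one of the difficulties noted above.
        return self.buckets[sum(name.encode()) % len(self.buckets)]

    def lookup(self, name):
        """Return the FCB id stored for name, or None if there is no such entry."""
        for idx in self._bucket(name):
            if self.entries[idx][0] == name:
                return self.entries[idx][1]
        return None

    def insert(self, name, fcb_id):
        """Add an entry at the end of the list after checking the name is unused."""
        if self.lookup(name) is not None:
            raise FileExistsError(name)
        self.entries.append((name, fcb_id))
        self._bucket(name).append(len(self.entries) - 1)

    def delete(self, name):
        """Remove the entry for name and release its slot in the list."""
        bucket = self._bucket(name)
        for idx in bucket:
            if self.entries[idx][0] == name:
                bucket.remove(idx)
                self.entries[idx] = None
                return
        raise FileNotFoundError(name)
```

With this hash function the names "ab" and "ba" collide; chaining keeps both reachable, at the cost of a short scan within the bucket.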
Data structures for implementation
• Variable sized
– linked list
– length/value encoding
– B-trees
• insert and delete logarithmic
• good for searching, bad for sequential access
• tunable to e.g. disk-block size
– B+ trees: entries stored only in the leaves; internal nodes hold only keys
• good in both
• somewhat more overhead, as internal nodes carry no entries
(Search by yourself for details)
Allocation methods
• An allocation method refers to how disk blocks are allocated for files.
• Design tasks
– Disk space is utilized effectively
– Files can be accessed quickly
• Three common approaches are possible
– Contiguous allocation
– Linked list allocation
– Indexed allocation
Contiguous Allocation
• Each file occupies a set of contiguous blocks on the disk
• Simple – only starting location (block #) and length (number of blocks) are required
• Pros
– fast sequential access: fewer seeks between consecutive disk blocks
– easy random access: can easily calculate the disk location of any file block
• Cons
– external fragmentation
– wastes space if user declares a bigger size
– hard to grow files
Extent-Based Systems
• Many newer file systems (e.g., Veritas File System) use a modified contiguous allocation scheme
– a contiguous chunk of space is allocated initially
– if that amount proves not to be large enough, another chunk of contiguous space, known as an extent, is added
– the location of a file’s blocks is recorded as
• location and a block count
• plus a link to the first block of the next extent
– a file consists of one or more extents
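Address translation for extents is a small generalization of the contiguous case. The sketch below assumes the extents of a file are already available as an in-memory list of (start block, block count) pairs in file order, instead of being chained by links on disk; the function name is hypothetical. A contiguously allocated file is simply the special case of a single extent.

```python
def logical_to_physical(extents, logical_block):
    """Map a logical block number of a file to a disk block number.

    extents: list of (start_block, block_count) pairs in file order.
    """
    remaining = logical_block
    for start, count in extents:
        if remaining < count:
            return start + remaining   # falls inside this extent
        remaining -= count             # skip the whole extent
    raise IndexError("logical block beyond end of file")


# A file of 6 blocks: 4 blocks starting at disk block 100, then 2 starting at 300
demo_extents = [(100, 4), (300, 2)]
```

Random access stays cheap as long as a file has few extents, because the loop runs once per extent and not once per block.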
Linked List Allocation
• Keeps each file as a linked list of disk blocks
• The directory contains pointers to the first and last blocks of the file
• First word of each block points to the next one
• Disk blocks need not be adjacent
– May be scattered anywhere on the disk
• Pros
– no space lost to disk fragmentation
• solves the external fragmentation and size-declaration problems
of contiguous allocation
– fast sequential access
• Cons
– extremely slow random access
– amount of data in a block is no longer a power of 2 (the pointer takes up part of the block)
• How to solve the problems?
– use a File Allocation Table (FAT)
• take the pointer word from each disk block
• put it in a table in memory
File-Allocation Table
• Linked allocation cannot support efficient direct access!
– Pointers to the blocks are scattered with the blocks themselves all over the disk.
– Pointers must be retrieved in order.
Figure: Linked list allocation using a file allocation table in RAM
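With the pointers collected in an in-memory table, following a chain no longer touches the disk. A minimal sketch, assuming the table is a list indexed by block number whose entry is the number of the next block of the same file; the end-of-file marker `EOF`, the function names, and the block numbers in the demo table are made up for the example.

```python
EOF = -1   # hypothetical end-of-chain marker


def file_blocks(fat, first_block):
    """Return all disk blocks of a file, in order, by following the chain."""
    blocks = []
    block = first_block
    while block != EOF:
        blocks.append(block)
        block = fat[block]
    return blocks


def nth_block(fat, first_block, n):
    """Return the disk block holding logical block n (still n table lookups)."""
    block = first_block
    for _ in range(n):
        block = fat[block]
        if block == EOF:
            raise IndexError("logical block beyond end of file")
    return block


# Demo table: one file occupying blocks 4 -> 7 -> 2 -> 10 -> 12
demo_fat = [None] * 16
demo_fat[4], demo_fat[7], demo_fat[2], demo_fat[10], demo_fat[12] = 7, 2, 10, 12, EOF
```

Random access still walks the chain, but every step is a memory access; the price is that the whole table has to be kept in RAM.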
Indexed Allocation
• Brings all pointers together into the index block.
– User (or system) declares max number of blocks in file
– System allocates a file header with an array of pointers big enough
• to point to that number of blocks
• the header is also called an inode
– The file header is loaded into memory when the file is opened
Equivalent to a page table for MMU
Indexed Allocation
• Pros
– can easily grow file up to the number of blocks allocated in file header
– easy random access
• Cons
– lots of seeks for sequential access
– hard to grow file bigger than initially allocated in the file header
Multi-level Indexed Allocation
• How to grow file sizes?
– The solution is to use multi-level index files
• direct blocks
– contain addresses of blocks that contain data of the file
• single indirect block
– is an index block
– containing the addresses of the actual data blocks
• double indirect block
• triple indirect block
Figure: Combined scheme of Unix
Multi-level Indexed Allocation
• Pros
– files can easily expand
– small files don’t pay full overhead of deep trees
• Cons
– lots of indirect blocks for big files
– lots of seeks for sequential access
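To see why small files avoid the deep levels, one can compute which index level a given logical block falls into. The parameters below (12 direct pointers, 256 pointers per index block) are assumptions chosen for the example and do not describe a particular Unix version; the function name is hypothetical.

```python
def index_path(logical_block, n_direct=12, ptrs_per_block=256):
    """Return (level, indices) needed to reach a logical block of a file.

    indices lists the slot to follow at each index level, outermost first.
    """
    p = ptrs_per_block
    if logical_block < n_direct:
        return ("direct", [logical_block])
    b = logical_block - n_direct
    if b < p:
        return ("single", [b])
    b -= p
    if b < p ** 2:
        return ("double", [b // p, b % p])
    b -= p ** 2
    if b < p ** 3:
        return ("triple", [b // (p * p), (b // p) % p, b % p])
    raise ValueError("block number exceeds maximum file size")
```

Under these assumptions the first 12 blocks need no extra disk access for index blocks, the next 256 need one, and only very large files reach the double and triple indirect levels.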
Bit vector (bit map)
• Each block is represented by 1 bit.
Blocks 0, 1, 2, …, n-1
bit[ i ] = 0 ⇒ block[ i ] allocated
bit[ i ] = 1 ⇒ block[ i ] free
− Relatively simple
− Efficient in finding the first free block or n consecutive free blocks
• The first free block number calculation:
− (number of bits per word) × (number of 0-value words) + offset of first 1 bit
• Example: 001111001111110001100000011100000…
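The first-free-block formula can be checked with a short function. The sketch assumes the bit vector is stored as a list of fixed-width words and that block 0 corresponds to the most significant bit of word 0, which matches reading the example bit string from left to right; the function name is hypothetical.

```python
def first_free_block(words, bits_per_word=32):
    """Return the number of the first free block (bit value 1), or None if all are allocated."""
    for word_index, word in enumerate(words):
        if word != 0:
            # Offset of the first 1 bit, counted from the most significant bit
            offset = bits_per_word - word.bit_length()
            # (bits per word) x (number of 0-value words skipped) + offset of first 1 bit
            return bits_per_word * word_index + offset
    return None


# First 32 bits of the example bit string above, packed into one word
example_word = int("00111100111111000110000001110000", 2)
```

All-zero words are skipped with one comparison each, which is what makes the search efficient on hardware with a find-first-set instruction.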
Linked Free Space List on Disk
• Standard linked-list approach
− link together all the free disk blocks
− keeping a pointer to the first free block in a special location on the disk
− and caching it in memory
− the first block contains a pointer to the next free disk block, and so on
• Grouping
− stores the addresses of n free blocks in the first free block
− the first n-1 of these blocks are actually free
− the last block contains the addresses of another n free blocks
• Counting
− keep the address of the first free block and the number (n) of free contiguous blocks that follow the first block
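The counting representation can be derived from a bit vector as a quick illustration. The sketch assumes the bitmap is given as a string of '0' and '1' characters with 1 meaning free, and follows the wording above: each pair holds the first free block and the number of free blocks that follow it. The function name is hypothetical.

```python
def free_runs(bitmap):
    """Convert a bitmap string (1 = free) into a list of (first_free_block, n_following) pairs."""
    runs = []
    i = 0
    while i < len(bitmap):
        if bitmap[i] == "1":
            start = i
            while i < len(bitmap) and bitmap[i] == "1":
                i += 1
            runs.append((start, i - start - 1))   # n counts the blocks after the first
        else:
            i += 1
    return runs
```

For the bitmap "0011110011" this gives two entries, one for blocks 2 to 5 and one for blocks 8 to 9, so the list stays short when free space is mostly contiguous.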
Efficiency
• Efficiency in disk use depends on:
– disk allocation and directory algorithms in use, e.g., inodes
• space lost --- preallocated to a volume
• however, improved file system performance --- reduced seek time by keeping a file’s data blocks near the inode block
– types of data kept in file’s directory entry, e.g., last access date
• to determine whether the file needs to be backed up
• whenever a file is read, a field in the directory structure must be written to
– pointer size
– static or dynamic table entry de-allocation (OFT)
– …
Performance
• Ways to improve PC performance
– disk cache
• separate section of main memory for frequently used blocks
– free-behind and read-ahead
• techniques to optimize sequential access
• free-behind
– remove a page from the buffer as soon as the next page is requested
– assume that the previous pages are not likely to be used again and would only waste buffer space
• read-ahead
– a requested page and several subsequent pages are read and cached
– assume that these pages are likely to be requested after the current page is processed
– dedicating a section of memory as a virtual disk, or RAM disk
Remaining issues
• Reliability
– errors
• loss of a block
– create a file of lost blocks
– recover lost file or disk by restoring data from backup
• inconsistencies
– consistency checking – compares data in directory structure with data blocks on disk, and tries to fix inconsistencies
– replicate data
• repeat the basic tables
• store several versions
– avoid errors
• journaling: record a log of updates
– write (to a file / region of the disk) the operation to be performed before actually doing it
– journal: small, circular buffer
Log Structured File Systems
• Log structured (or journaling) file systems record each update to
the file system as a transaction
• All transactions are written to a log
– A transaction is considered committed once it is written to the log
– However, the file system may not yet be updated
• The transactions in the log are asynchronously written to the file
system
– When the file system is modified, the transaction is removed from the
log
• If the file system crashes, all remaining transactions in the log must
still be performed
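The commit, apply, and recover cycle described above can be mimicked in memory. This is a toy model under strong assumptions: the "file system" is a dictionary, a transaction is a set of key/value updates, and the log survives a crash. The class and method names are hypothetical.

```python
class JournaledStore:
    def __init__(self):
        self.log = []    # committed transactions not yet applied
        self.disk = {}   # stands in for the file-system structures proper

    def commit(self, updates):
        """Write the transaction to the log; from now on it counts as committed."""
        self.log.append(dict(updates))

    def apply_one(self):
        """Apply the oldest logged transaction, then remove it from the log."""
        if self.log:
            self.disk.update(self.log[0])   # re-applying after a crash is harmless
            self.log.pop(0)

    def recover(self):
        """After a crash: perform every transaction still present in the log."""
        while self.log:
            self.apply_one()
```

Because a transaction is removed from the log only after it has been applied, a crash between the two steps merely causes the same update to be applied once more during recovery.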
Please read by yourselves
The Sun Network File System (NFS)
• An implementation and a specification of a software system for
accessing remote files across LANs (or WANs)
• The implementation is part of the Solaris and SunOS operating systems running on Sun workstations, using an unreliable datagram protocol (UDP/IP) and Ethernet (V3)
NFS (Cont.)
• Interconnected workstations viewed as a set of independent machines with independent file systems, which allows sharing among these file systems in a transparent manner
– A remote directory is mounted over a local file system directory
• The mounted directory looks like an integral subtree of the local file system, replacing the subtree descending from the local directory
– Specification of the remote directory for the mount operation is nontransparent; the host name of the remote directory has to be provided
• Files in the remote directory can then be accessed in a transparent manner
– Subject to access-rights accreditation, potentially any file system (or directory within a file system), can be mounted remotely on top of any local directory
NFS (Cont.)
• NFS is designed to operate in a heterogeneous environment of different machines, operating systems, and network architectures; the NFS specification is independent of these media
• This independence is achieved through the use of RPC primitives built
on top of an External Data Representation (XDR) protocol used between
two implementation-independent interfaces
• The NFS specification distinguishes between the services provided by a
mount mechanism and the actual remote-file-access services
Three Independent File Systems
Mounting in NFS
Figure: Mounts; Cascading mounts
NFS Mount Protocol
• Establishes initial logical connection between server and client
• Mount operation includes name of remote directory to be mounted and name of
server machine storing it
– Mount request is mapped to corresponding RPC and forwarded to mount server running on server machine
– Export list – specifies local file systems that server exports for mounting, along with names of machines that are permitted to mount them
• Following a mount request that conforms to its export list, the server returns a
file handle—a key for further accesses
• File handle – a file-system identifier, and an inode number to identify the
mounted directory within the exported file system
• The mount operation changes only the user’s view and does not affect the
server side
NFS Protocol
• Provides a set of remote procedure calls for remote file operations. The procedures support the following operations:
– searching for a file within a directory
– reading a set of directory entries
– manipulating links and directories
– accessing file attributes
– reading and writing files
• NFS servers are stateless; each request has to provide a full set of arguments (NFS V4 is just becoming available – very different, stateful)
• Modified data must be committed to the server’s disk before results are returned to the client (lose advantages of caching)
• The NFS protocol does not provide concurrency-control mechanisms
Three Major Layers of NFS Architecture
• UNIX file-system interface (based on the open, read, write, and close calls, and file descriptors)
• Virtual File System (VFS) layer – distinguishes local files from remote ones, and local files are further distinguished according to their file-system types
– The VFS activates file-system-specific operations to handle local
requests according to their file-system types
– Calls the NFS protocol procedures for remote requests
• NFS service layer – bottom layer of the architecture
– Implements the NFS protocol
Schematic View of NFS Architecture
NFS Path-Name Translation
• Performed by breaking the path into component names and performing a separate NFS lookup call for every pair of component name and directory vnode
• To make lookup faster, a directory name lookup cache on the client’s side holds the vnodes for remote directory names
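Component-by-component translation with a client-side name cache can be sketched as follows. The `lookup` callable stands in for the NFS lookup RPC; here it is simulated by a dictionary, and vnodes are plain strings. All names are hypothetical and the cache has no invalidation, which a real client would need.

```python
def make_lookup(tree):
    """Build a fake remote lookup over tree: {(dir_vnode, name): vnode}; records each call."""
    calls = []

    def lookup(dir_vnode, name):
        calls.append((dir_vnode, name))   # one simulated RPC per call
        return tree[(dir_vnode, name)]

    return lookup, calls


def translate(path, root_vnode, lookup, cache):
    """Resolve path one component at a time, consulting the name cache first."""
    vnode = root_vnode
    for name in (c for c in path.split("/") if c):
        key = (vnode, name)               # pair of directory vnode and component name
        if key not in cache:
            cache[key] = lookup(vnode, name)
        vnode = cache[key]
    return vnode


demo_tree = {("root", "usr"): "v1", ("v1", "local"): "v2", ("v2", "bin"): "v3"}
```

Resolving "/usr/local/bin" the first time issues three lookups; a later resolution of "/usr/local" is served entirely from the cache.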
NFS Remote Operations
• Nearly one-to-one correspondence between regular UNIX system calls and the
NFS protocol RPCs (except opening and closing files)
• NFS adheres to the remote-service paradigm, but employs buffering and caching
techniques for the sake of performance
• File-blocks cache – when a file is opened, the kernel checks with the remote
server whether to fetch or revalidate the cached attributes
– Cached file blocks are used only if the corresponding cached attributes are up to date
• File-attribute cache – the attribute cache is updated whenever new attributes
arrive from the server
• Clients do not free delayed-write blocks until the server confirms that the data
have been written to disk
Agenda
• File system structure
• File system implementation
• Directory implementation
• Allocation methods
• Free space management
• Efficiency and performance
• Recovery
• NFS
• Distributed file systems
Background
• Distributed file system (DFS) – a distributed implementation of the classical time-sharing model of a file system, where multiple users share files and storage resources
• A DFS manages a set of dispersed storage devices
• Overall storage space managed by a DFS is composed of different, remotely located, smaller storage spaces
• There is usually a correspondence between constituent storage spaces and sets of files
DFS Structure
• Service – software entity running on one or more machines and
providing a particular type of function to a priori unknown clients
• Server – service software running on a single machine
• Client – process that can invoke a service using a set of
operations that forms its client interface
• A client interface for a file service is formed by a set of primitive file
operations (create, delete, read, write)
• Client interface of a DFS should be transparent, i.e., not distinguish
between local and remote files
Naming
• Naming – mapping between logical and physical objects
• Compared to a conventional file system, in a DFS
– The naming mapping is expanded to include the specific machine on whose disk the file is stored
– For a file replicated at several sites, the mapping returns the set of locations of this file’s replicas
Mounting
• Symbolic link into other namespace
• Example: netMount /dev/hda1 server1:/usr/local
[Figure: the current namespace and the filesystem on server1, each shown with subdirectories bin and lib]
– Store closure (reference to /dev/hda1, filesystem type, ....) with prefix
– File access means network traffic
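Storing a closure with a prefix can be sketched as a mount table that is searched for the longest matching prefix. This is only an illustration: the `Closure` fields, the `resolve` function, and the mount point `/mnt/remote` are assumptions made for the example, not taken from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Closure:
    """What is stored with a mount prefix (fields are illustrative)."""
    reference: str
    fs_type: str


def resolve(mount_table, path):
    """Return (closure, remaining path) for the longest mounted prefix."""
    best = None
    for prefix in mount_table:
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        return None, path  # name stays in the current namespace
    return mount_table[best], path[len(best):].lstrip("/")


# Hypothetical mount point; the closure records where the name continues.
mounts = {"/mnt/remote": Closure(reference="/dev/hda1", fs_type="remote")}
closure, rest = resolve(mounts, "/mnt/remote/bin/ls")
local_closure, local_rest = resolve(mounts, "/mnt/remotex/bin")
```

Any path that resolves to a closure would be forwarded to the other file system, which is where the network traffic comes from.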
Naming Schemes — Three Main Approaches
• Files named by combination of their host name and local name;
guarantees a unique system-wide name
• Attach remote directories to local directories, giving the appearance
of a coherent directory tree; only previously mounted remote
directories can be accessed transparently
• Total integration of the component file systems
– A single global name structure spans all the files in the system
– If a server is unavailable, some arbitrary set of directories on different
machines also becomes unavailable
DFS organization
• Both client and server have a file system
• Server file system task:
– serve requests from the network
• Client file system task:
– serve requests from the application
– transparently include server file system
Choices
• Design choices concern the layer at which the client-server connection is placed
1. Application:
– client accesses the remote file system as just another application
• needs a server application on the other side
• comparable to web-servers
2. Logical file system, file-organization module and basic file system
– client uses the remote file system as a process, via descriptors (references)
3. I/O control
– client uses the remote file system as a block storage mechanism
Stateful & stateless File Service
• Two approaches to store server-side information when a client accesses remote files:
– Stateful: the server tracks each file being accessed by each client
– Stateless: the server only provides required blocks without knowledge of how they are used
Stateful File Service
• Mechanism
– Client opens a file
– Server fetches information about the file from its disk, stores it in its memory, and gives the client a connection identifier unique to the client and the open file
– Identifier is used for subsequent accesses until the session ends
– Server must reclaim the main-memory space used by clients who are no longer active
• Increased performance
– Fewer disk accesses
– Stateful server knows if a file was opened for sequential access and can thus read ahead the next blocks
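The mechanism above can be sketched as a server that keeps a per-connection table in memory. This is a toy model under stated assumptions: `StatefulServer` and its methods are hypothetical names, a dictionary stands in for the disk, and read-ahead is omitted.

```python
import itertools


class StatefulServer:
    def __init__(self, files):
        self.files = files              # name -> bytes, stands in for the disk
        self.sessions = {}              # connection id -> [name, offset]
        self._ids = itertools.count(1)

    def open(self, name):
        if name not in self.files:
            raise FileNotFoundError(name)
        conn = next(self._ids)          # identifier unique to this open
        self.sessions[conn] = [name, 0]
        return conn

    def read(self, conn, size):
        name, offset = self.sessions[conn]
        data = self.files[name][offset:offset + size]
        self.sessions[conn][1] = offset + len(data)  # server tracks position
        return data

    def close(self, conn):
        del self.sessions[conn]         # reclaim main-memory space

    def crash(self):
        self.sessions.clear()           # volatile state is lost


srv = StatefulServer({"a.txt": b"hello world"})
conn = srv.open("a.txt")
part1 = srv.read(conn, 5)
part2 = srv.read(conn, 6)
```

Because the client sends only the identifier, requests are short, but after `crash()` the identifier no longer means anything to the server.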
Stateful server
• Server implements basic file system
Stateless File Server
• Avoids state information by making each request self-contained
• Each request identifies the file and position in the file
• No need to establish and terminate a connection by open and close operations
and close operations
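A self-contained request can be sketched as follows. Again this is a toy model, not a real protocol: `StatelessServer` and `OffsetClient` are hypothetical names, and a shared dictionary stands in for the server's disk. The client, not the server, remembers the position in the file.

```python
class StatelessServer:
    def __init__(self, files):
        self.files = files  # name -> bytes, stands in for the disk

    def read(self, name, offset, size):
        # Every request names the file and the position; nothing is remembered.
        return self.files[name][offset:offset + size]


class OffsetClient:
    def __init__(self, server, name):
        self.server = server
        self.name = name
        self.offset = 0     # position is kept on the client side

    def read(self, size):
        data = self.server.read(self.name, self.offset, size)
        self.offset += len(data)
        return data


disk = {"a.txt": b"hello world"}
reader = OffsetClient(StatelessServer(disk), "a.txt")
chunk1 = reader.read(5)
reader.server = StatelessServer(disk)  # simulate a server crash and restart
chunk2 = reader.read(6)
```

The restarted server answers the second request without any recovery step, at the cost of a longer request message.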
Stateless server
• Server implements low-level data access
Distinctions between stateful & stateless Service
• Failure Recovery
– A stateful server loses all its volatile state in a crash
• Restore state by recovery protocol based on a dialog with
clients, or abort operations that were underway when the
crash occurred
• Server needs to be aware of client failures in order to
reclaim space allocated to record the state of crashed client
processes (orphan detection and elimination)
– With a stateless server, the effects of server failures and recovery are almost unnoticeable
• A newly reincarnated server can respond to a self-
contained request without any difficulty
Distinctions (Cont.)
• Penalties for using the robust stateless service:
– longer request messages
– slower request processing
– additional constraints imposed on DFS design
• Some environments require stateful service
– A server employing server-initiated cache validation cannot
provide stateless service, since it maintains a record of which
files are cached by which clients
– UNIX use of file descriptors and implicit offsets is inherently
stateful; servers must maintain tables to map the file
descriptors to inodes, and store the current offset within a file
Caching & Consistency
• Retaining recently accessed disk blocks in a cache
– to reduce network traffic
– remote accesses to the same information can be handled locally
• Is the locally cached copy of the data consistent with the master copy?
– Client-initiated approach
• Client initiates a validity check
• Server checks whether the local data are consistent with the
master copy
– Server-initiated approach
• Server records, for each client, the (parts of) files it caches
• When server detects a potential inconsistency, it must react
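The two approaches can be contrasted in a small sketch. It rests on assumptions not in the slides: `ConsistencyServer` and `CacheClient` are hypothetical names, a version number stands in for file contents, and the server's reaction to a potential inconsistency is assumed to be an invalidation message.

```python
class ConsistencyServer:
    def __init__(self):
        self.versions = {}  # file name -> version of the master copy
        self.cachers = {}   # file name -> clients recorded as caching it

    def write(self, name):
        self.versions[name] = self.versions.get(name, 0) + 1
        # Server-initiated: react by notifying every recorded client.
        for client in self.cachers.pop(name, set()):
            client.invalidate(name)

    def fetch(self, client, name, register):
        if register:
            self.cachers.setdefault(name, set()).add(client)
        return self.versions[name]

    def is_current(self, name, version):
        # Client-initiated: compare the client's copy with the master copy.
        return self.versions.get(name) == version


class CacheClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}     # file name -> cached version

    def load(self, name, register=True):
        self.cache[name] = self.server.fetch(self, name, register)

    def validate(self, name):
        return name in self.cache and self.server.is_current(name, self.cache[name])

    def invalidate(self, name):
        self.cache.pop(name, None)


master = ConsistencyServer()
master.write("f")
polling = CacheClient(master)
notified = CacheClient(master)
polling.load("f", register=False)  # will check validity itself
notified.load("f")                 # server records this client
valid_before = polling.validate("f")
master.write("f")
valid_after = polling.validate("f")
```

Note that the `cachers` table is exactly the kind of per-client record that prevents a server with server-initiated validation from being stateless.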
Summary
• File system structure
• File system implementation
• Directory implementation
• Allocation methods
• Free space management
• Efficiency and performance
• Recovery
• NFS
• Distributed file systems
Exercises
• Ch11 - 3, 5, 6, 10, 12
• Ch17 - 1, 2, 9
• Check the website