Operating Systems Slides 7 - File Systems
xiaolin-wang
Long-term Information Storage Requirements
▶ Must store large amounts of data
▶ Information stored must survive the termination of the process using it
▶ Multiple processes must be able to access the information concurrently
2 / 78
File-System Structure
File-system design addresses two problems:
1. defining how the FS should look to the user
   ▶ defining a file and its attributes
   ▶ the operations allowed on a file
   ▶ the directory structure
2. creating algorithms and data structures to map the logical FS onto the physical disk
3 / 78
File-System — A Layered Design
APPs⇓
Logical FS⇓
File-org module⇓
Basic FS⇓
I/O ctrl⇓
Devices
▶ logical file system: manages metadata information
  - maintains all of the file-system structure (directory structure, FCB)
  - responsible for protection and security
▶ file-organization module
  - translates logical block addresses → physical block addresses
  - keeps track of free blocks
▶ basic file system: issues generic commands to the device driver, e.g.
  - "read drive 1, cylinder 72, track 2, sector 10"
▶ I/O control: device drivers and interrupt handlers
  - device driver: translates high-level commands → hardware-specific instructions
4 / 78
The Operating Structure
APPs⇓
Logical FS⇓
File-org module⇓
Basic FS⇓
I/O ctrl⇓
Devices
Example: To create a file
1. APP calls creat()
2. Logical FS
   2.1 allocates a new FCB
   2.2 updates the in-memory dir structure
   2.3 writes it back to disk
   2.4 calls the file-org module
3. file-organization module
   3.1 maps the directory I/O into disk-block numbers
   3.2 allocates blocks for storing the file's data
Benefit of layered design:
The I/O control and sometimes the basic file system code can be used by multiple file systems.
5 / 78
File— A Logical View Of Information Storage
User's view:
From the user's point of view, a file is the smallest unit of logical storage.
▶ Data cannot be written to disk unless they are within a file
UNIX view:
Each file is a sequence of 8-bit bytes
▶ It's up to the application program to interpret this byte stream.
6 / 78
File— What Is Stored In A File?
Source code, object files, executable files, shell scripts, PostScript...
Different types of files have different structures:
▶ UNIX looks at the contents to determine the type
  shell scripts start with "#!"
  PDF files start with "%PDF..."
  executables start with a magic number
▶ Windows uses file-naming conventions
  executables end with ".exe" and ".com"
  MS-Word files end with ".doc"
  MS-Excel files end with ".xls"
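Content-based typing can be sketched as a check of a file's first few bytes. This is a minimal sketch that assumes only the two signatures named above plus the ELF magic number (0x7F 'E' 'L' 'F') used by Linux executables. The function name guess_type is hypothetical, and real tools such as file(1) consult a far larger signature database.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: classify a file from its first bytes (its "magic number"). */
const char *guess_type(const unsigned char *buf, size_t len)
{
    static const unsigned char elf_magic[4] = {0x7F, 'E', 'L', 'F'};

    if (len >= 2 && buf[0] == '#' && buf[1] == '!')
        return "script";          /* "#!" followed by the interpreter path */
    if (len >= 4 && memcmp(buf, "%PDF", 4) == 0)
        return "pdf";             /* PDF header, e.g. "%PDF-1.7" */
    if (len >= 4 && memcmp(buf, elf_magic, 4) == 0)
        return "elf-executable";  /* ELF object or executable */
    return "unknown";             /* no known signature matched */
}
```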
7 / 78
File Naming
Varies from system to system:
▶ Name length?
▶ Characters? Digits? Special characters?
▶ Extension?
▶ Case sensitive?
8 / 78
File Types
Regular files: ASCII, binary
Directories: maintain the structure of the FS
In UNIX, everything is a file:
Character special files: I/O related, such as terminals, printers ...
Block special files: devices that can contain file systems, e.g. disks
  disks: logically, linear collections of blocks; the disk driver translates them into physical block addresses
9 / 78
Binary files:
(a) An executable file: a header (magic number, text size, data size, BSS size, symbol table size, entry point, flags), followed by text, data, relocation bits, and the symbol table.
(b) An archive: a sequence of object modules, each preceded by a header (module name, date, owner, protection, size).
Fig. 6-3. (a) An executable file. (b) An archive.
(a) A UNIX executable file (b) A UNIX archive
10 / 78
File Attributes — Metadata
▶ Name: the only information kept in human-readable form
▶ Identifier: a unique tag (number) that identifies the file within the file system
▶ Type: needed for systems that support different types
▶ Location: pointer to the file's location on the device
▶ Size: current file size
▶ Protection: controls who can do reading, writing, executing
▶ Time, date, and user identification: data for protection, security, and usage monitoring
11 / 78
File Operations: POSIX file system calls
1. fd = creat(name, mode)
2. fd = open(name, flags)
3. status = close(fd)
4. byte_count = read(fd, buffer, byte_count)
5. byte_count = write(fd, buffer, byte_count)
6. offset = lseek(fd, offset, whence)
7. status = link(oldname, newname)
8. status = unlink(name)
9. status = truncate(name, size)
10. status = ftruncate(fd, size)
11. status = stat(name, buffer)
12. status = fstat(fd, buffer)
13. status = utimes(name, times)
14. status = chown(name, owner, group)
15. status = fchown(fd, owner, group)
16. status = chmod(name, mode)
17. status = fchmod(fd, mode)
12 / 78
An Example Program Using File System Calls

/* File copy program. Error checking and reporting is minimal. */
#include <sys/types.h>   /* include necessary header files */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[]);   /* ANSI prototype */

#define BUF_SIZE 4096      /* use a buffer size of 4096 bytes */
#define OUTPUT_MODE 0700   /* protection bits for output file */

int main(int argc, char *argv[])
{
    int in_fd, out_fd;
    ssize_t rd_count, wt_count;
    char buffer[BUF_SIZE];

    if (argc != 3) exit(1);               /* syntax error if argc is not 3 */

    /* Open the input file and create the output file */
    in_fd = open(argv[1], O_RDONLY);      /* open the source file */
    if (in_fd < 0) exit(2);               /* if it cannot be opened, exit */
    out_fd = creat(argv[2], OUTPUT_MODE); /* create the destination file */
    if (out_fd < 0) exit(3);              /* if it cannot be created, exit */

    /* Copy loop */
    while (1) {
        rd_count = read(in_fd, buffer, BUF_SIZE);   /* read a block of data */
        if (rd_count <= 0) break;         /* if end of file or error, exit loop */
        wt_count = write(out_fd, buffer, rd_count); /* write data */
        if (wt_count <= 0) exit(4);       /* wt_count <= 0 is an error */
    }

    /* Close the files */
    close(in_fd);
    close(out_fd);
    if (rd_count == 0)                    /* no error on last read */
        exit(0);
    else
        exit(5);                          /* error on last read */
}
Fig. 6-5. A simple program to copy a file.
13 / 78
open()
fd = open(pathname, flags)
A per-process open-file table is kept in the OS
▶ upon a successful open() syscall, a new entry is added into this table
▶ indexed by file descriptor (fd)
To see the files opened by a process, e.g. init: ~$ lsof -p 1
Why is open() needed?
To avoid constant searching
▶ Without open(), every file operation would involve searching the directory for the file.
14 / 78
Directories— Single-Level Directory Systems
All files are contained in the same directory.
[Figure: a root directory holding four files: A, A, B, C]
Fig. 6-7. A single-level directory system containing four files, owned by three different people, A, B, and C.
Limitations:
- name collision
- file searching
Often used on simple embedded devices, such as telephones, digital cameras...
15 / 78
Directories— Two-level Directory Systems
A separate directory for each user:
[Figure: the root directory holds one user directory each for A, B, and C; each user directory holds that user's files]
Fig. 6-8. A two-level directory system. The letters indicate the owners of the directories and files.
Limitation: hard to access other users' files
16 / 78
Directories— Hierarchical Directory Systems
[Figure: root directory → user directories → user subdirectories and user files; the letters A, B, C mark the owners]
Fig. 6-9. A hierarchical directory system.
17 / 78
Directories— Path Names
[Figure: an example directory tree. ROOT holds bin, boot, dev, etc, home, var; the next level includes grub, passwd, staff, stud, mail; below those are wx672 and 20081152001. Legend: dir, file]
18 / 78
Directories— Directory Operations
Create, Delete, Rename, Link, Opendir, Closedir, Readdir, Unlink
19 / 78
File System Implementation
A typical file system layout:
|<---------------------- Entire disk ------------------------>|
+-----+-------------+-------------+-------------+-------------+
| MBR | Partition 1 | Partition 2 | Partition 3 | Partition 4 |
+-----+-------------+-------------+-------------+-------------+
_______________________________/ \____________
/ \
+---------------+-----------------+--------------------+---//--+
| Boot Ctrl Blk | Volume Ctrl Blk | Dir Structure | Files |
| (MBR copy) | (Super Blk) | (inodes, root dir) | dirs |
+---------------+-----------------+--------------------+---//--+
|<-------------Master Boot Record (512 Bytes)------------>|
0 439 443 445 509 511
+----//-----+----------+------+------//---------+---------+
| code area | disk-sig | null | partition table | MBR-sig |
| 440 | 4 | 2 | 16x4=64 | 0xAA55 |
+----//-----+----------+------+------//---------+---------+
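The MBR layout above can be decoded with plain byte offsets: the partition table starts at byte 446 (440 + 4 + 2), and the signature occupies bytes 510-511, stored as 0x55 0xAA (the little-endian 0xAA55 shown). This is a minimal sketch. It assumes the classic MBR partition-entry format, which the diagram does not spell out: a type byte at offset 4, a 32-bit little-endian starting LBA at offset 8, and a sector count at offset 12 within each 16-byte entry. The struct and function names are hypothetical.

```c
#include <stdint.h>

#define MBR_SIZE        512
#define PART_TABLE_OFF  446   /* 440 code + 4 disk-sig + 2 null */
#define PART_ENTRY_SIZE 16
#define SIG_OFF         510

struct part_entry {           /* hypothetical decoded form of one entry */
    uint8_t  type;            /* partition type, e.g. 0x83 = Linux */
    uint32_t lba_start;       /* first sector of the partition */
    uint32_t num_sectors;     /* partition length in sectors */
};

/* Read a 32-bit little-endian value starting at p. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Return 1 if the sector ends with the 0xAA55 signature (bytes 0x55, 0xAA). */
int mbr_valid(const uint8_t mbr[MBR_SIZE])
{
    return mbr[SIG_OFF] == 0x55 && mbr[SIG_OFF + 1] == 0xAA;
}

/* Decode partition entry i (0..3) of the 4-entry table. */
struct part_entry mbr_entry(const uint8_t mbr[MBR_SIZE], int i)
{
    const uint8_t *e = mbr + PART_TABLE_OFF + i * PART_ENTRY_SIZE;
    struct part_entry pe;
    pe.type        = e[4];
    pe.lba_start   = le32(e + 8);
    pe.num_sectors = le32(e + 12);
    return pe;
}
```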
20 / 78
On-Disk Information Structure
Boot control block: an MBR copy
  UFS: boot block
  NTFS: partition boot sector
Volume control block: contains volume details
  number of blocks, size of blocks, free-block count, free-block pointers, free FCB count, free FCB pointers
  UFS: superblock
  NTFS: Master File Table
Directory structure: organizes the files
File control block (FCB): contains file details (metadata)
  UFS: i-node
  NTFS: stored in the MFT using a relational database structure, with one row per file
21 / 78
Each File-System Has a Superblock
Superblock keeps information about the file system:
▶ Type: ext2, ext3, ext4...
▶ Size
▶ Status: how it's mounted, free blocks, free inodes, ...
▶ Information about other metadata structures
~# dumpe2fs /dev/sda1 | grep -i superblock
22 / 78
Implementing Files: Contiguous Allocation
572 CHAPTER 12 / FILE MANAGEMENT
access, degree of multiprogramming, other performance factors in the system, disk caching, disk scheduling, and so on.
File Allocation Methods. Having looked at the issues of preallocation versus dynamic allocation and portion size, we are in a position to consider specific file allocation methods. Three methods are in common use: contiguous, chained, and indexed. Table 12.3 summarizes some of the characteristics of each method.
With contiguous allocation, a single contiguous set of blocks is allocated to a file at the time of file creation (Figure 12.7). Thus, this is a preallocation strategy, using variable-size portions. The file allocation table needs just a single entry for each file, showing the starting block and the length of the file. Contiguous allocation is the best from the point of view of the individual sequential file. Multiple blocks can be read in at a time to improve I/O performance for sequential processing. It is also easy to retrieve a single block. For example, if a file starts at block b, and the ith block of the file is wanted, its location on secondary storage is simply b + i − 1. Contiguous allocation presents some problems. External fragmentation will occur, making it difficult to find contiguous blocks of space of sufficient length. From time to time, it will be necessary to perform a compaction algorithm to free up additional
Table 12.3 File Allocation Methods
                                 | Contiguous | Chained      | Indexed (fixed blocks) | Indexed (variable portions)
Preallocation?                   | Necessary  | Possible     | Possible               | Possible
Fixed or variable size portions? | Variable   | Fixed blocks | Fixed blocks           | Variable
Portion size                     | Large      | Small        | Small                  | Medium
Allocation frequency             | Once       | Low to high  | High                   | Low
Time to allocate                 | Medium     | Long         | Short                  | Medium
File allocation table size       | One entry  | One entry    | Large                  | Medium
[Figure: disk blocks 0-34, each file stored in one contiguous run]
File Allocation Table
File Name | Start Block | Length
File A    | 2           | 3
File B    | 9           | 5
File C    | 18          | 8
File D    | 30          | 2
File E    | 26          | 3
Figure 12.7 Contiguous File Allocation
M12_STAL6329_06_SE_C12.QXD 2/21/08 9:40 PM Page 572
- simple
- good for read-only
- fragmentation
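With contiguous allocation the file allocation table needs only (start, length), and locating a block is a single addition. This is a minimal sketch that uses 0-based block indices, so block i (counting from 0) is at start + i; the textbook's b + i − 1 counts blocks from 1. The struct and function names are hypothetical.

```c
/* One file-allocation-table entry for contiguous allocation. */
struct contig_entry {
    int start;   /* first disk block of the file */
    int length;  /* number of blocks */
};

/* Map logical block i (0-based) of a file to its disk block, or -1 if out of range. */
int contig_block(struct contig_entry f, int i)
{
    if (i < 0 || i >= f.length)
        return -1;
    return f.start + i;   /* direct computation: no table walk needed */
}
```

For File C in Figure 12.7 (start 18, length 8), logical block 7 maps to disk block 25. This constant-time random access is what chained allocation gives up.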
23 / 78
Linked List (Chained) Allocation
A pointer in each disk block
12.6 / SECONDARY STORAGE MANAGEMENT 573
space on the disk (Figure 12.8). Also, with preallocation, it is necessary to declare the size of the file at the time of creation, with the problems mentioned earlier.
At the opposite extreme from contiguous allocation is chained allocation (Figure 12.9). Typically, allocation is on an individual block basis. Each block contains a pointer to the next block in the chain. Again, the file allocation table needs just a single entry for each file, showing the starting block and the length of the file. Although preallocation is possible, it is more common simply to allocate blocks as needed. The selection of blocks is now a simple matter: any free block can be added to a chain. There is no external fragmentation to worry about because only one
[Figure: disk blocks 0-34 after compaction; the files are packed together from block 0]
File Allocation Table
File Name | Start Block | Length
File A    | 0           | 3
File B    | 3           | 5
File C    | 8           | 8
File D    | 19          | 2
File E    | 16          | 3
Figure 12.8 Contiguous File Allocation (After Compaction)
[Figure: disk blocks 0-34; the blocks of File B are linked in a chain]
File Allocation Table
File Name | Start Block | Length
File B    | 1           | 5
Figure 12.9 Chained Allocation
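- no wasted blocks (no external fragmentation)
- slow random access
- the data in a block is no longer 2^n bytes, because the pointer takes up part of the block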
24 / 78
Linked List (Chained) Allocation
Though there is no external fragmentation, consolidation is still preferred.
block at a time is needed. This type of physical organization is best suited to sequential files that are to be processed sequentially. To select an individual block of a file requires tracing through the chain to the desired block.
One consequence of chaining, as described so far, is that there is no accommodation of the principle of locality. Thus, if it is necessary to bring in several blocks of a file at a time, as in sequential processing, then a series of accesses to different parts of the disk are required. This is perhaps a more significant effect on a single-user system but may also be of concern on a shared system. To overcome this problem, some systems periodically consolidate files (Figure 12.10).
Indexed allocation addresses many of the problems of contiguous and chained allocation. In this case, the file allocation table contains a separate one-level index for each file; the index has one entry for each portion allocated to the file. Typically, the file indexes are not physically stored as part of the file allocation table. Rather, the file index for a file is kept in a separate block, and the entry for the file in the file allocation table points to that block. Allocation may be on the basis of either fixed-size blocks (Figure 12.11) or variable-size portions (Figure 12.12). Allocation by blocks eliminates external fragmentation, whereas allocation by variable-size portions improves locality. In either case, file consolidation may be done from time to time. File consolidation reduces the size of the index in the case of variable-size portions, but not in the case of block allocation. Indexed allocation supports both sequential and direct access to the file and thus is the most popular form of file allocation.
Free Space Management
Just as the space that is allocated to files must be managed, so the space that is not currently allocated to any file must be managed. To perform any of the file allocation techniques described previously, it is necessary to know what blocks on the disk are available. Thus we need a disk allocation table in addition to a file allocation table. We discuss here a number of techniques that have been implemented.
[Figure: disk blocks 0-34; after consolidation the blocks of File B are consecutive]
File Allocation Table
File Name | Start Block | Length
File B    | 0           | 5
Figure 12.10 Chained Allocation (After Consolidation)
25 / 78
FAT: Linked list allocation with a table in RAM
▶ Take the pointer out of each disk block and put it into a table in memory
▶ fast random access (the chain is in RAM)
▶ the data in each block is again 2^n bytes
▶ the entire table must be in RAM
   disk size ↑ ⇒ FAT size ↑ ⇒ RAM used ↑
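[Figure: the FAT as a table indexed by physical block number (0-15); each entry holds the number of the next block in the file, -1 ends a chain, and blocks 0, 1, 5, 8, 9, 13, 15 are unused]
Physical block → next block: 2 → 10, 3 → 11, 4 → 7, 6 → 3, 7 → 2, 10 → 12, 11 → 14, 12 → -1, 14 → -1
File A starts here (block 4): 4 → 7 → 2 → 10 → 12
File B starts here (block 6): 6 → 3 → 11 → 14
Fig. 6-14. Linked list allocation using a file allocation table in main memory.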
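Because the whole chain lives in RAM, finding the n-th block of a file means walking an array rather than reading n disk blocks. This is a minimal sketch. It assumes the FAT is an int array indexed by physical block number with -1 as the end-of-chain marker, as in Fig. 6-14. The function names and the FAT_FREE marker for unused blocks are hypothetical.

```c
#define FAT_EOF  (-1)   /* end-of-chain marker, as in Fig. 6-14 */
#define FAT_FREE (-2)   /* hypothetical marker for an unused block */

/* Follow the in-memory FAT to find the disk block holding logical block n
 * (0-based) of a file that starts at block `start`.
 * Returns -1 if the file has fewer than n + 1 blocks. */
int fat_lookup(const int fat[], int start, int n)
{
    int blk = start;
    while (n-- > 0) {
        if (blk == FAT_EOF)
            return -1;
        blk = fat[blk];              /* RAM lookup only: no disk I/O per hop */
    }
    return blk == FAT_EOF ? -1 : blk;
}

/* Count the blocks in a file's chain. */
int fat_length(const int fat[], int start)
{
    int count = 0;
    for (int blk = start; blk != FAT_EOF; blk = fat[blk])
        count++;
    return count;
}
```

With the table from Fig. 6-14, logical block 3 of File A (which starts at block 4) is physical block 10.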
26 / 78
Indexed Allocation
Bit Tables. This method uses a vector containing one bit for each block on the disk. Each entry of a 0 corresponds to a free block, and each 1 corresponds to a block in use. For example, for the disk layout of Figure 12.7, a vector of length 35 is needed and would have the following value:
00111000011111000011111111111011000
A bit table has the advantage that it is relatively easy to find one or a contiguous group of free blocks. Thus, a bit table works well with any of the file allocation methods outlined. Another advantage is that it is as small as possible.
[Figure: the File Allocation Table entry for File B points to index block 24; the index block lists blocks 1, 8, 3, 14, 28]
Figure 12.11 Indexed Allocation with Block Portions
[Figure: the File Allocation Table entry for File B points to index block 24; the index block lists (start block, length) pairs (1, 3), (28, 4), (14, 1)]
Figure 12.12 Indexed Allocation with Variable-Length Portions
▶ i-node: a data structure for each file
▶ an i-node is in memory only if the file is open
   files opened ↑ ⇒ RAM used ↑
27 / 78
I-node — FCB in UNIX
[Figure: a directory inode and a file inode, 128 B each, with fields type, mode, user ID, group ID, file size, # blocks, # links, flags, timestamps (×3), direct blocks (×12), and single, double, and triple indirect pointers. The directory inode's direct blocks point to directory blocks whose entries (., .., passwd, fstab, ...) pair names with inode #s; its single and double indirect pointers give the block # of a block with 512 single (or double) indirect entries. The file inode reaches its file data blocks directly or through an indirect block of 512 direct block #s.]
File type: 0 Unknown, 1 Regular file, 2 Directory, 3 Character device, 4 Block device, 5 Named pipe, 6 Socket, 7 Symbolic link
Mode: 9-bit pattern
28 / 78
Inode Quiz
Given: block size is 1 KB, pointer size is 4 B
Addressing: byte offset 9000; byte offset 350,000
+----------------+
0 | 4096 |
+----------------+ ---->+----------+ Byte 9000 in a file
1 | 228 | / | 367 | |
+----------------+ / | Data blk | v
2 | 45423 | / +----------+ 8th blk, 808th byte
+----------------+ /
3 | 0 | / -->+------+
+----------------+ / / 0| |
4 | 0 | / / +------+
+----------------+ / / : : :
5 | 11111 | / / +------+ Byte 350,000
+----------------+ / ->+-----+/ 75| 3333 | in a file
6 | 0 | / / 0| 331 | +------+\ |
+----------------+ / / +-----+ : : : \ v
7 | 101 | / / | | +------+ \ 816th byte
+----------------+/ / | : | 255| | \-->+----------+
8 | 367 | / | : | +------+ | 3333 |
+----------------+ / | : | 331 | Data blk |
9 | 0 | / | | Single indirect +----------+
+----------------+ / +-----+
S | 428 (10K+256K) | / 255| |
+----------------+/ +-----+
D | 9156 | 9156 /***********************
+----------------+ Double indirect What about the ZEROs?
T | 824 | ***********************/
+----------------+
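The quiz arithmetic can be checked mechanically. With 1 KB blocks and 4-byte pointers, an indirect block holds 256 pointers, so the 10 direct pointers cover blocks 0-9, the single indirect block covers the next 256, and the double indirect block the next 256 × 256. This is a minimal sketch (names hypothetical) that reports which pointers to follow. For byte 350,000 it yields logical block 341, i.e. double-indirect entry 0, then entry 75, byte 816, which matches the diagram.

```c
#include <stdint.h>

#define BLOCK_SIZE 1024u                    /* 1 KB blocks, as in the quiz */
#define PTR_SIZE   4u                       /* 4-byte block pointers */
#define NDIRECT    10u                      /* direct pointers 0..9, as in the figure */
#define NPTRS      (BLOCK_SIZE / PTR_SIZE)  /* 256 pointers per indirect block */

/* Where a byte offset lives. level: 0 = direct, 1 = single, 2 = double,
 * 3 = triple indirect, -1 = beyond the largest file. idx[0..level-1] are the
 * pointer indices to follow, starting in the top-level indirect block
 * (for level 0, idx[0] is the direct-pointer index). */
struct inode_pos {
    int      level;
    uint32_t idx[3];
    uint32_t offset;                        /* byte offset inside the data block */
};

struct inode_pos locate(uint64_t byte_off)
{
    struct inode_pos p = {0, {0, 0, 0}, 0};
    uint64_t blk = byte_off / BLOCK_SIZE;   /* logical block number */
    p.offset = (uint32_t)(byte_off % BLOCK_SIZE);

    if (blk < NDIRECT) {                    /* reachable from a direct pointer */
        p.idx[0] = (uint32_t)blk;
        return p;
    }
    blk -= NDIRECT;
    uint64_t span = NPTRS;                  /* blocks reachable at this level */
    for (p.level = 1; p.level <= 3; p.level++) {
        if (blk < span) {
            /* split blk into base-256 digits; idx[0] is the most significant */
            for (int i = p.level - 1; i >= 0; i--) {
                p.idx[i] = (uint32_t)(blk % NPTRS);
                blk /= NPTRS;
            }
            return p;
        }
        blk -= span;                        /* skip everything this level covers */
        span *= NPTRS;
    }
    p.level = -1;
    return p;
}
```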
29 / 78
UNIX In-Core Data Structure
▶ mount table: info about each mounted FS
▶ directory-structure cache: holds the dir info of recently accessed dirs
▶ inode table: an in-core version of the on-disk inode table
▶ file table
  ▶ global
  ▶ keeps the inode of each open file
  ▶ keeps track of
    ▶ how many processes are associated with each open file
    ▶ where the next read and write will start
    ▶ access rights
▶ user file descriptor table
  ▶ per process
  ▶ identifies all open files for a process
30 / 78
UNIX In-Core Data Structure
open()/creat():
1. adds an entry in each table
2. returns a file descriptor, an index into the user file descriptor table
31 / 78
A File Is Opened By Multiple Processes?
Two levels of internal tables in the OS:
A per-process table tracks all files that a process has open. It stores
▶ the current-file-position pointer (not really)
▶ access rights
▶ more...
a.k.a. the file descriptor table
A system-wide table keeps process-independent information, such as
▶ the location of the file on disk
▶ access dates
▶ file size
▶ file open count: the number of processes that have this file open
32 / 78
Per-process FDT
Process 1
+------------------+ System-wide
| ... | open-file table
+------------------+ +------------------+
| Position pointer | | ... |
| Access rights | +------------------+
| ... |\ | ... |
+------------------+ \ +------------------+
| ... | --------->| Location on disk |
+------------------+ | R/W |
| Access dates |
Process 2 | File size |
+------------------+ | Pointer to inode |
| Position pointer | | File-open count |
| Access rights |----------->| ... |
| ... | +------------------+
+------------------+ | ... |
| ... | +------------------+
+------------------+
33 / 78
A process executes the following code:
fd1 = open("/etc/passwd", O_RDONLY);
fd2 = open("local", O_RDWR);
fd3 = open("/etc/passwd", O_WRONLY);
user FDT file table inode table
+--------+ +-----------+ +---------------+
0| STDIN | | : | | : |
+--------+ +-----------+ | : |
1| STDOUT | | count R | | : |
+--------+ -->| 1 |\ +---------------+
2| STDERR | / +-----------+ ‘---->| (/etc/passwd) |
+--------+/ | : | ,-->| count 2 |
3| | +-----------+ | +---------------+
+--------+ | count RW | | | : |
4| |---->| 1 |\ / +---------------+
+--------+ +-----------+ \/ | (local) |
5| | | : | /\--->| count 1 |
+--------+\ +-----------+/ +---------------+
: : : \ | count W | | : |
+--------+ -->| 1 | +---------------+
+-----------+
34 / 78
One more process, B:
fd1 = open("/etc/passwd", O_RDONLY);
fd2 = open("private", O_RDONLY);
user FDT
proc A file table
+--------+ +-----------+ inode table
0| STDIN | | : | +---------------+
+--------+ +-----------+ | : |
1| STDOUT | | count R | | : |
+--------+ ------>| 1 |\ +---------------+
2| STDERR | / +-----------+ \--------->| (/etc/passwd) |
+--------+/ | : | ----->| count 3 |
3| | +-----------+ / ->| |
+--------+ | count RW | / / +---------------+
4| |-------->| 1 |\ / / | : |
+--------+ +-----------+ \/ / | : |
5| | | : | /\ / +---------------+
+--------+\ +-----------+/ -------->| (local) |
: : : \ ---->| count R | / | count 1 |
+--------+ \/ | 1 | / +---------------+
/\ +-----------+ / | : |
proc B | \ | : | / | : |
+--------+ | \ +-----------+/ +---------------+
0| STDIN | | -->| count W | ------->| (private) |
+--------+ | | 1 | / | count 1 |
1| STDOUT | | +-----------+ / +---------------+
+--------+ | | : | / | : |
2| STDERR | / +-----------+/ | : |
+--------+/ | count R | +---------------+
3| | .------>| 1 |
+--------+/ +-----------+
4| |
+--------+
: : :
+--------+
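The bookkeeping in the two diagrams can be replayed with a toy model of the three tables. This is a minimal sketch, not kernel code: the table sizes, struct layouts, mode values and all function names are hypothetical, paths stand in for inode numbers, and descriptors 0-2 are assumed to be taken by stdin/stdout/stderr. Each open() gets its own file-table entry (count 1), while the in-core inode is shared and its count goes up.

```c
#include <string.h>

#define MAX_FD     16
#define MAX_FILES  32
#define MAX_INODES 8

/* Toy versions of the three in-core tables (all names hypothetical). */
struct inode_entry { const char *path; int count; };              /* inode table */
struct file_entry  { int count; int mode; long pos; int inode; }; /* file table */
struct proc        { int fdt[MAX_FD]; };     /* user FDT: index into file table */

static struct inode_entry inodes[MAX_INODES];
static struct file_entry  files[MAX_FILES];

/* Descriptors 0-2 (stdin/stdout/stderr) are assumed taken and not modelled. */
void proc_init(struct proc *p)
{
    for (int i = 0; i < MAX_FD; i++)
        p->fdt[i] = -1;                      /* -1 = free slot */
}

/* Find the in-core inode for path (loading it if needed) and bump its count. */
static int inode_get(const char *path)
{
    int free_slot = -1;
    for (int i = 0; i < MAX_INODES; i++) {
        if (inodes[i].count > 0 && strcmp(inodes[i].path, path) == 0) {
            inodes[i].count++;               /* already in core: share it */
            return i;
        }
        if (inodes[i].count == 0 && free_slot < 0)
            free_slot = i;
    }
    if (free_slot >= 0)
        inodes[free_slot] = (struct inode_entry){path, 1};
    return free_slot;
}

/* open(): a fresh file-table entry every time, a shared in-core inode,
 * and the lowest free descriptor (from 3 up) in the caller's FDT. */
int sim_open(struct proc *p, const char *path, int mode)
{
    int f = 0, fd = 3;
    while (f < MAX_FILES && files[f].count > 0)
        f++;
    while (fd < MAX_FD && p->fdt[fd] >= 0)
        fd++;
    if (f == MAX_FILES || fd == MAX_FD)
        return -1;
    int ino = inode_get(path);
    if (ino < 0)
        return -1;
    files[f] = (struct file_entry){1, mode, 0, ino};  /* position starts at 0 */
    p->fdt[fd] = f;
    return fd;
}

/* Reference count of the in-core inode for path (0 if not in core). */
int inode_count(const char *path)
{
    for (int i = 0; i < MAX_INODES; i++)
        if (inodes[i].count > 0 && strcmp(inodes[i].path, path) == 0)
            return inodes[i].count;
    return 0;
}
```

Replaying process A's three opens gives descriptors 3, 4, 5 and an /etc/passwd inode count of 2. Process B's opens then raise that count to 3, as in the second diagram.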
35 / 78
Why File Table?
To allow a parent and child to share a file position, but to provide unrelated processes with their own values.
[Figure: the parent's and the child's file descriptor tables point to the same open file description (file position, R/W, pointer to i-node); an unrelated process's file descriptor table points to a separate open file description. Both descriptions point to the same i-node (mode, link count, uid, gid, file size, times, addresses of the first 10 disk blocks, and single, double, and triple indirect pointers leading to indirect blocks and disk blocks).]
Fig. 10-33. The relation between the file descriptor table, the open file description table, and the i-node table.
36 / 78
Why File Table?
Where to put the file-position info?
Inode table? No. Multiple processes can open the same file, and each one has its own file position.
User file descriptor table? No. That causes trouble in file sharing.
Example:
#!/bin/bash
echo hello
echo world
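Where should "world" be written?
~$ ./hello.sh > A
The commands in the script write to A through the same open file description, so the shared file position puts "world" right after "hello".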
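The file table is what makes this work. Descriptors that share an open file description (after dup() or fork()) share one file position, while a separate open() of the same file gets its own. This small POSIX sketch checks both cases with mkstemp(), dup(), and lseek(fd, 0, SEEK_CUR). The /tmp path template and the function name offset_demo are arbitrary choices.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Write 6 bytes through one descriptor, then report the offset seen by
 * (a) a dup()ed descriptor, which shares the open file description, and
 * (b) an independent open() of the same file, which gets its own.
 * Returns 0 on success, -1 on any system-call failure. */
int offset_demo(off_t *shared, off_t *independent)
{
    char path[] = "/tmp/ftXXXXXX";          /* scratch file; template is arbitrary */
    int fd = mkstemp(path);                 /* create and open a unique temp file */
    if (fd < 0)
        return -1;

    int dup_fd = dup(fd);                   /* same file-table entry as fd */
    int other_fd = open(path, O_RDONLY);    /* new file-table entry */
    int ok = dup_fd >= 0 && other_fd >= 0 && write(fd, "hello\n", 6) == 6;

    if (ok) {
        *shared = lseek(dup_fd, 0, SEEK_CUR);        /* moved along with fd */
        *independent = lseek(other_fd, 0, SEEK_CUR); /* still at 0 */
    }
    if (dup_fd >= 0) close(dup_fd);
    if (other_fd >= 0) close(other_fd);
    close(fd);
    unlink(path);
    return ok ? 0 : -1;
}
```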
37 / 78
Implementing Directories
(a) [Figure: a directory with fixed-size entries games, news, work, each followed by its attributes]
(b) [Figure: a directory with entries games, news, work, each referring to a separate data structure containing the attributes]
Fig. 6-16. (a) A simple directory containing fixed-size entries with the disk addresses and attributes in the directory entry. (b) A directory in which each entry just refers to an i-node.
(a) A simple directory (Windows)
  ▶ fixed-size entries
  ▶ disk addresses and attributes in the directory entry
(b) A directory in which each entry just refers to an i-node (UNIX)
38 / 78
How Long A File Name Can Be?
[Figure: two ways of handling long file names in a directory.
(a) In-line: each entry holds the entry length, the file attributes, and the name itself (project-budget, personnel, foo), so entries vary in length.
(b) In a heap: each fixed-size entry holds a pointer to the file's name plus the file attributes; the names are stored in a heap at the end of the directory.]
39 / 78
UNIX Treats a Directory as a File
[Figure: the same directory-inode and file-inode diagram as on the I-node slide; a directory's data blocks simply hold (name, inode #) entries, such as ., .., passwd, fstab]
Example: (name, inode #) pairs in a root directory
. 2
.. 2
bin 11116545
boot 2
cdrom 12
dev 3
...
40 / 78
The steps in looking up /usr/ast/mbox:
1. Looking up usr in the root directory yields i-node 6
2. I-node 6 (mode, size, times, ...) says that /usr is in block 132
3. Block 132, the /usr directory, says /usr/ast is i-node 26
4. I-node 26 says that /usr/ast is in block 406
5. Block 406, the /usr/ast directory, says /usr/ast/mbox is i-node 60
Root directory: . 1, .. 1, bin 4, dev 7, lib 14, etc 9, usr 6, tmp 8
Block 132 (/usr): . 6, .. 1, dick 19, erik 30, jim 51, ast 26, bal 45
Block 406 (/usr/ast): . 26, .. 6, grants 64, books 92, mbox 60, minix 81, src 17
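The lookup is a loop: search the current directory for the next component, then continue from that component's i-node. This is a minimal sketch over the three directories listed above, held in memory. The structures and the name lookup_path are hypothetical, and dir_of() stands in for reading an i-node and then its directory block from disk.

```c
#include <stddef.h>
#include <string.h>

/* One directory entry: a name and an i-node number. */
struct dirent_s { const char *name; int ino; };

/* A directory: its own i-node number and its entries (hypothetical layout). */
struct dir_s { int ino; const struct dirent_s *ents; size_t n; };

/* Directory contents from the example: root (i-node 1), /usr (6), /usr/ast (26). */
static const struct dirent_s root_ents[] = {
    {".", 1}, {"..", 1}, {"bin", 4}, {"dev", 7}, {"lib", 14},
    {"etc", 9}, {"usr", 6}, {"tmp", 8}};
static const struct dirent_s usr_ents[] = {
    {".", 6}, {"..", 1}, {"dick", 19}, {"erik", 30}, {"jim", 51},
    {"ast", 26}, {"bal", 45}};
static const struct dirent_s ast_ents[] = {
    {".", 26}, {"..", 6}, {"grants", 64}, {"books", 92}, {"mbox", 60},
    {"minix", 81}, {"src", 17}};

static const struct dir_s dirs[] = {
    {1, root_ents, sizeof root_ents / sizeof root_ents[0]},
    {6, usr_ents, sizeof usr_ents / sizeof usr_ents[0]},
    {26, ast_ents, sizeof ast_ents / sizeof ast_ents[0]}};

/* Stand-in for "read the i-node, then read its directory block". */
static const struct dir_s *dir_of(int ino)
{
    for (size_t i = 0; i < sizeof dirs / sizeof dirs[0]; i++)
        if (dirs[i].ino == ino)
            return &dirs[i];
    return NULL;                             /* not a directory in this model */
}

/* Resolve an absolute path one component at a time; -1 if not found. */
int lookup_path(const char *path)
{
    char buf[256];
    int ino = 1;                             /* start at the root i-node */
    if (path[0] != '/' || strlen(path) >= sizeof buf)
        return -1;
    strcpy(buf, path);
    for (char *comp = strtok(buf, "/"); comp; comp = strtok(NULL, "/")) {
        const struct dir_s *d = dir_of(ino);
        int next = -1;
        for (size_t i = 0; d && i < d->n; i++)
            if (strcmp(d->ents[i].name, comp) == 0)
                next = d->ents[i].ino;
        if (next < 0)
            return -1;                       /* component missing or not a dir */
        ino = next;
    }
    return ino;
}
```

lookup_path("/usr/ast/mbox") visits i-nodes 1, 6, and 26 and returns 60.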
41 / 78
File Sharing— Multiple Users
User IDs identify users, allowing permissions and protections to be per-user
Group IDs allow users to be in groups, permitting group access rights
Example: 9-bit pattern
owner access  7 ⇒ rwx (1 1 1)
group access  5 ⇒ r-x (1 0 1)
public access 0 ⇒ --- (0 0 0)
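The 9-bit pattern maps directly onto the rwx triplets shown by ls -l. This is a minimal sketch in which the function name mode_string is hypothetical. With mode 0750, the example above, it produces rwxr-x---.

```c
/* Render the low 9 mode bits as an ls-style string such as "rwxr-x---".
 * out must have room for 10 chars (9 + terminating NUL). */
void mode_string(unsigned mode, char out[10])
{
    const char *flags = "rwxrwxrwx";
    for (int i = 0; i < 9; i++)
        /* bit 8 is owner-read, ..., bit 0 is public-execute */
        out[i] = (mode & (1u << (8 - i))) ? flags[i] : '-';
    out[9] = '\0';
}
```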
42 / 78
File Sharing— Remote File Systems
Uses networking to allow file-system access between systems
▶ Manually, via programs like FTP
▶ Automatically, seamlessly, using distributed file systems
▶ Semi-automatically, via the World Wide Web
The client-server model allows clients to mount remote file systems from servers
▶ NFS: standard UNIX client-server file-sharing protocol
▶ CIFS: standard Windows protocol
▶ Standard system calls are translated into remote calls
Distributed information systems (distributed naming services)
▶ such as LDAP, DNS, NIS, and Active Directory, implement unified access to information needed for remote computing
43 / 78
File Sharing— Protection
▶ The file owner/creator should be able to control:
  ▶ what can be done
  ▶ by whom
▶ Types of access
  ▶ Read
  ▶ Write
  ▶ Execute
  ▶ Append
  ▶ Delete
  ▶ List
44 / 78
Shared Files— Hard Links vs. Soft Links
[Figure: a directory tree (root directory; directories of users A, B, and C) in which one shared file appears in both B's and C's directories]
Fig. 6-18. File system containing a shared file.
45 / 78
Hard Links:
Several directory entries refer to the same inode.
46 / 78
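Drawback:
(a) Before linking: C's directory refers to the i-node (Owner = C, Count = 1)
(b) After B links to the file: C's and B's directories both refer to it (Owner = C, Count = 2)
(c) After C, the owner, removes the file: only B's directory refers to it (Owner = C, Count = 1)
The i-node still lists C as the owner, so C remains charged for a file it can no longer see.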
47 / 78
Symbolic Links:
A symbolic link has its own inode and its own directory entry; its content is the path name of the target file.
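Both kinds of link can be observed from user space with link(), symlink(), stat(), and lstat(). This small POSIX sketch works under /tmp; the path template, the file suffixes, and the struct are arbitrary choices. The hard link shares the original i-node and raises its link count to 2, the symbolic link gets an i-node of its own, and removing the original name drops the count back to 1 while the data stays reachable through the hard link.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

struct link_report {
    int nlink_after_link;    /* link count once a hard link exists */
    int same_inode;          /* 1 if the hard link shares the original's i-node */
    int symlink_own_inode;   /* 1 if the symlink is a separate i-node of type link */
    int nlink_after_unlink;  /* link count after the original name is removed */
};

/* Create a temporary file plus a hard link and a symbolic link to it,
 * fill in *r, and clean up. Returns 0 on success, -1 on failure. */
int link_demo(struct link_report *r)
{
    char path[] = "/tmp/lnXXXXXX";          /* scratch name; template is arbitrary */
    char hard[32], soft[32];
    struct stat orig, h, s;
    int rc = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    close(fd);
    snprintf(hard, sizeof hard, "%s.hard", path);
    snprintf(soft, sizeof soft, "%s.soft", path);

    if (link(path, hard) == 0 && symlink(path, soft) == 0 &&
        stat(path, &orig) == 0 && stat(hard, &h) == 0 && lstat(soft, &s) == 0) {
        r->nlink_after_link  = (int)orig.st_nlink;
        r->same_inode        = orig.st_ino == h.st_ino;
        r->symlink_own_inode = S_ISLNK(s.st_mode) && s.st_ino != orig.st_ino;
        /* Remove the original name: the i-node survives via the hard link. */
        if (unlink(path) == 0 && stat(hard, &h) == 0) {
            r->nlink_after_unlink = (int)h.st_nlink;
            rc = 0;
        }
    }
    unlink(soft);                           /* clean up; some names may be gone */
    unlink(hard);
    unlink(path);
    return rc;
}
```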
48 / 78
Disk Space Management— Statistics
49 / 78
▶ The block size is chosen when the FS is created
▶ Disk I/O performance conflicts with space utilization
  ▶ smaller block size ⇒ better space utilization
  ▶ larger block size ⇒ better disk I/O performance
~$ dumpe2fs /dev/sda1 | grep "Block size"
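The space side of the trade-off is simple arithmetic: a file occupies whole blocks, so the slack in its last block is wasted. This is a minimal sketch of that cost for one file, with hypothetical names. A 1000-byte file wastes 24 bytes with 1 KB blocks but 3096 bytes with 4 KB blocks, while a large file needs fewer block transfers with the larger size.

```c
#include <stdint.h>

/* Space cost of storing a file of `size` bytes with blocks of `bsize` bytes:
 * the number of blocks allocated and the bytes wasted in the last block
 * (internal fragmentation). */
struct alloc_cost { uint64_t blocks; uint64_t wasted; };

struct alloc_cost block_cost(uint64_t size, uint64_t bsize)
{
    struct alloc_cost c;
    c.blocks = (size + bsize - 1) / bsize;   /* round up to whole blocks */
    c.wasted = c.blocks * bsize - size;      /* slack in the final block */
    return c;
}
```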
50 / 78
Keeping Track of Free Blocks
1. Linked list
10.5 Free-Space Management 443
[Figure: disk blocks 0-31; each free block holds a pointer to the next free block, starting from the free-space list head]
Figure 10.10 Linked free-space list on disk.
of a large number of free blocks can now be found quickly, unlike the situation when the standard linked-list approach is used.
10.5.4 Counting
Another approach takes advantage of the fact that, generally, several contiguous blocks may be allocated or freed simultaneously, particularly when space is allocated with the contiguous-allocation algorithm or through clustering. Thus, rather than keeping a list of n free disk addresses, we can keep the address of the first free block and the number (n) of free contiguous blocks that follow the first block. Each entry in the free-space list then consists of a disk address and a count. Although each entry requires more space than would a simple disk address, the overall list is shorter, as long as the count is generally greater than 1. Note that this method of tracking free space is similar to the extent method of allocating blocks. These entries can be stored in a B-tree, rather than a linked list, for efficient lookup, insertion, and deletion.
10.5.5 Space Maps
Sun's ZFS file system was designed to encompass huge numbers of files, directories, and even file systems (in ZFS, we can create file-system hierarchies). The resulting data structures could have been large and inefficient if they had not been designed and implemented properly. On these scales, metadata I/O can have a large performance impact. Consider, for example, that if the free-space list is implemented as a bit map, bit maps must be modified both when blocks are allocated and when they are freed. Freeing 1 GB of data on a 1-TB disk could cause thousands of blocks of bit maps to be updated, because those data blocks could be scattered over the entire disk.
2. Bit map (n blocks)
0 1 2 3 4 5 6 7 8 .. n-1
+-+-+-+-+-+-+-+-+-+-//-+-+
|0|0|1|0|1|1|1|0|1| .. |0|
+-+-+-+-+-+-+-+-+-+-//-+-+
$$\text{bit}[i] = \begin{cases} 0 & \Rightarrow \text{block}[i] \text{ is free} \\ 1 & \Rightarrow \text{block}[i] \text{ is occupied} \end{cases}$$
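Both the counting list and a contiguous allocation can be derived from a bitmap. This is a minimal sketch that assumes bit i is stored in byte i/8 at bit position i%8; that layout is an assumption, not something the slide specifies. free_extents() produces the (address, count) pairs of the counting method, and first_fit() finds the first run of free blocks of a given length; both names are hypothetical. For the 35-block vector of Figure 12.7, 00111000011111000011111111111011000, it finds five free runs: (0, 2), (5, 4), (14, 4), (29, 1), (32, 3).

```c
#include <stddef.h>
#include <stdint.h>

/* Bit i of the map: 0 = block i is free, 1 = block i is occupied. */
static int bm_test(const uint8_t *bm, size_t i) { return (bm[i / 8] >> (i % 8)) & 1; }
void bm_set(uint8_t *bm, size_t i) { bm[i / 8] |= (uint8_t)(1u << (i % 8)); }

/* Counting method: turn a bitmap of n blocks into (first free block,
 * number of contiguous free blocks) pairs. Returns the number of pairs
 * stored (at most max). */
size_t free_extents(const uint8_t *bm, size_t n,
                    size_t start[], size_t count[], size_t max)
{
    size_t k = 0, i = 0;
    while (i < n && k < max) {
        if (bm_test(bm, i)) { i++; continue; }   /* skip occupied blocks */
        start[k] = i;
        while (i < n && !bm_test(bm, i))         /* extend the free run */
            i++;
        count[k] = i - start[k];
        k++;
    }
    return k;
}

/* First fit: start of the first run of len free blocks, or -1 if none. */
long first_fit(const uint8_t *bm, size_t n, size_t len)
{
    size_t run = 0;
    for (size_t i = 0; i < n; i++) {
        run = bm_test(bm, i) ? 0 : run + 1;
        if (len > 0 && run == len)
            return (long)(i + 1 - len);
    }
    return -1;
}
```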
51 / 78
Journaling File Systems
Operations required to remove a file in UNIX:
1. Remove the file from its directory
   - set the inode number to 0
2. Release the i-node to the pool of free i-nodes
   - clear the bit in the inode bitmap
3. Return all the disk blocks to the pool of free disk blocks
   - clear the bits in the block bitmap
What if a crash occurs between 1 and 2, or between 2 and 3?
52 / 78
Journaling File Systems
Keep a log of what the file system is going to do before it does it
▶ so that if the system crashes before it can do its planned work, upon rebooting the system can look in the log to see what was going on at the time of the crash and finish the job.
▶ NTFS, ext3, and ReiserFS use journaling, among others
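The redo idea can be sketched on the three removal steps from the previous slide. This is a toy model and does not reflect how ext3 or NTFS lay out their logs. All names are hypothetical, the whole transaction is logged and committed before any in-place update, and a crash is simulated by stopping after steps_done updates. Because each logged operation is idempotent, recovery can simply replay the committed transaction.

```c
#include <stdbool.h>

/* Toy on-disk state touched by removing a file (all hypothetical). */
struct fs_state {
    int  dir_entry_ino;   /* inode number in the directory entry (0 = removed) */
    bool inode_used;      /* bit in the inode bitmap */
    bool block_used;      /* bit in the block bitmap (one data block) */
};

enum op { CLEAR_DIRENT, FREE_INODE, FREE_BLOCK };

struct journal {
    enum op ops[3];
    int     n;
    bool    committed;    /* set only after all records are in the log */
};

/* Apply one logged operation. Each is idempotent, so replay is safe. */
static void apply(struct fs_state *fs, enum op o)
{
    switch (o) {
    case CLEAR_DIRENT: fs->dir_entry_ino = 0;   break;
    case FREE_INODE:   fs->inode_used = false;  break;
    case FREE_BLOCK:   fs->block_used = false;  break;
    }
}

/* Remove a file, simulating a crash after `steps_done` of the 3 in-place
 * updates (3 = no crash). The log is written and committed first. */
void remove_file(struct fs_state *fs, struct journal *j, int steps_done)
{
    j->n = 0;
    j->ops[j->n++] = CLEAR_DIRENT;
    j->ops[j->n++] = FREE_INODE;
    j->ops[j->n++] = FREE_BLOCK;
    j->committed = true;                 /* commit record reaches the log */
    for (int i = 0; i < steps_done && i < j->n; i++)
        apply(fs, j->ops[i]);            /* in-place updates; may be cut short */
}

/* After reboot: redo a committed transaction, then clear the log. */
void recover(struct fs_state *fs, struct journal *j)
{
    if (j->committed)
        for (int i = 0; i < j->n; i++)
            apply(fs, j->ops[i]);
    j->n = 0;
    j->committed = false;
}
```

Without the log, a crash after step 1 would leave the i-node and its block marked in use but unreachable. With it, recover() finishes steps 2 and 3.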
53 / 78
Ext2 File System
Physical Layout:
+------------+---------------+---------------+--//--+---------------+
| Boot Block | Block Group 0 | Block Group 1 | | Block Group n |
+------------+---------------+---------------+--//--+---------------+
__________________________/ \_____________
/ \
+-------+-------------+------------+--------+-------+--------+
| Super | Group | Data Block | inode | inode | Data |
| Block | Descriptors | Bitmap | Bitmap | Table | Blocks |
+-------+-------------+------------+--------+-------+--------+
1 blk n blks 1 blk 1 blk n blks n blks
54 / 78
Ext2 Block groups
The partition is divided into block groups:
▶ Block groups are the same size: easy locating
▶ The kernel tries to keep a file's data blocks in the same block group: reduces fragmentation
▶ Critical info is backed up in each block group
▶ The ext2 inodes for each block group are kept in the inode table
▶ The inode bitmap keeps track of allocated and unallocated inodes
55 / 78
Group descriptor:
▶ Each block group has a group descriptor
▶ All the group descriptors together make the group descriptor table
▶ The table is stored along with the superblock
▶ Block Bitmap: tracks free blocks
▶ Inode Bitmap: tracks free inodes
▶ Inode Table: all inodes in this block group
▶ Free blocks count, Free inodes count, Used directory count: counters
▶ see more: ~# dumpe2fs /dev/sda1
56 / 78
Ext2 Block Allocation Policies626 Chapter 15 The Linux System
allocating scattered free blocks
allocating continuous free blocks
block in use bit boundaryblock selectedby allocator
free block byte boundarybitmap search
Figure 15.9 ext2fs block-allocation policies.
these extra blocks to the file. This preallocation helps to reduce fragmentationduring interleaved writes to separate files and also reduces the CPU cost ofdisk allocation by allocating multiple blocks simultaneously. The preallocatedblocks are returned to the free-space bitmap when the file is closed.
Figure 15.9 illustrates the allocation policies. Each row represents a sequence of set and unset bits in an allocation bitmap, indicating used and free blocks on disk. In the first case, if we can find any free blocks sufficiently near the start of the search, then we allocate them no matter how fragmented they may be. The fragmentation is partially compensated for by the fact that the blocks are close together and can probably all be read without any disk seeks, and allocating them all to one file is better in the long run than allocating isolated blocks to separate files once large free areas become scarce on disk. In the second case, we have not immediately found a free block close by, so we search forward for an entire free byte in the bitmap. If we allocated that byte as a whole, we would end up creating a fragmented area of free space between it and the allocation preceding it, so before allocating we back up to make this allocation flush with the allocation preceding it, and then we allocate forward to satisfy the default allocation of eight blocks.
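The two cases above can be sketched as a bitmap search in C. This follows the textbook's description rather than the actual Linux ext2 allocator: the 64-block "sufficiently near" window (`NEAR_WINDOW`), the function names, the final fallback, and the LSB-first bit order within each bitmap byte are all assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define NEAR_WINDOW 64  /* assumed meaning of "sufficiently near" */

/* Bit i set => block i in use (LSB-first within each byte, as assumed here). */
int block_used(const uint8_t *bm, long i) { return (bm[i / 8] >> (i % 8)) & 1; }
void mark_used(uint8_t *bm, long i) { bm[i / 8] |= (uint8_t)(1u << (i % 8)); }

/* Choose where a new allocation near `goal` should start.
 * `nblocks` is assumed to be a multiple of 8. Returns -1 if nothing is free. */
long choose_start(const uint8_t *bm, long nblocks, long goal) {
    /* Case 1: any free block close to the goal, however fragmented. */
    for (long i = goal; i < nblocks && i < goal + NEAR_WINDOW; i++)
        if (!block_used(bm, i))
            return i;

    /* Case 2: search forward for an entirely free byte (8 free blocks)... */
    for (long byte = (goal + 7) / 8; byte < nblocks / 8; byte++) {
        if (bm[byte] == 0) {
            long start = byte * 8;
            /* ...then back up so the allocation is flush with the one before it. */
            while (start > 0 && !block_used(bm, start - 1))
                start--;
            return start;
        }
    }

    /* Fallback (not covered by the figure): any free block at all. */
    for (long i = 0; i < nblocks; i++)
        if (!block_used(bm, i))
            return i;
    return -1;
}

/* Allocate forward: mark up to `want` consecutive free blocks (default 8)
 * starting at `start`; returns how many were actually marked. */
int allocate_run(uint8_t *bm, long nblocks, long start, int want) {
    int got = 0;
    for (long i = start; i < nblocks && got < want && !block_used(bm, i); i++, got++)
        mark_used(bm, i);
    return got;
}
```

If blocks 0 to 75 are in use and the first entirely free byte starts at block 80, the sketch backs up to block 76, so the new eight-block run sits flush against the blocks already in use.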
15.7.3 Journaling
One popular feature in a file system is journaling, whereby modifications to the file system are sequentially written to a journal. A set of operations that performs a specific task is a transaction. Once a transaction is written to the journal, it is considered to be committed, and the system call modifying the file system (write()) can return to the user process, allowing it to continue execution. Meanwhile, the journal entries relating to the transaction are replayed across the actual file-system structures. As the changes are made, a …
57 / 78
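Maths
Given block size = 4k and block bitmap = 1 blk: the bitmap block holds 8 bits per byte, and each bit tracks one block, so
$$\text{blocks per group} = 8\ \text{bits} \times 4\text{k} = 32\text{k}$$
How large is a group?
$$\text{group size} = 32\text{k} \times 4\text{k} = 128\,\text{MB}$$
How many block groups are there?
$$\text{number of groups} \approx \frac{\text{partition size}}{\text{group size}} = \frac{\text{partition size}}{128\,\text{M}}$$
How many files can I have at most? Assuming each file occupies at least one block:
$$\text{max files} \approx \frac{\text{partition size}}{\text{block size}} = \frac{\text{partition size}}{4\text{k}}$$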
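The same arithmetic, parameterized on block and partition size, as a small C sketch (the function names are hypothetical). As on the slide, the file count is only an upper bound that assumes every file uses at least one block; in ext2 the inode count chosen when the file system is created is the real cap.

```c
#include <assert.h>
#include <stdint.h>

/* One bitmap block: 8 bits per byte, one bit per block in the group. */
uint64_t blocks_per_group(uint64_t block_size) { return 8 * block_size; }

/* Bytes covered by one block group. */
uint64_t group_bytes(uint64_t block_size) {
    return blocks_per_group(block_size) * block_size;
}

/* Number of groups, rounding up because the last group may be partial. */
uint64_t group_count(uint64_t partition_bytes, uint64_t block_size) {
    uint64_t g = group_bytes(block_size);
    return (partition_bytes + g - 1) / g;
}

/* Slide's upper bound: every file is assumed to use at least one block. */
uint64_t max_files_bound(uint64_t partition_bytes, uint64_t block_size) {
    return partition_bytes / block_size;
}
```

For a 1 GiB partition with 4 KiB blocks this gives 8 groups of 128 MiB and at most 262,144 one-block files.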
58 / 78
Ext2 inode
59 / 78
Ext2 inode
Mode: holds two pieces of information
1. Is it a {file | dir | sym-link | blk-dev | char-dev | FIFO}?
2. Permissions
Owner info: the owner's user and group IDs for this file or directory
Size: the size of the file in bytes
Timestamps: accessed, created, and last modified times
Data blocks: 15 pointers to data blocks (12 direct + 1 single-indirect + 1 double-indirect + 1 triple-indirect)
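The two pieces of information in the mode field can be pulled apart with the standard POSIX `S_IS*` macros, since ext2's `i_mode` uses the same bit layout as `st_mode` on Linux. A sketch; `mode_type` and `mode_perms` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

/* Piece 1: the file type, encoded in the high bits (S_IFMT) of the mode. */
const char *mode_type(unsigned mode) {
    if (S_ISREG(mode))  return "file";
    if (S_ISDIR(mode))  return "dir";
    if (S_ISLNK(mode))  return "sym-link";
    if (S_ISBLK(mode))  return "blk-dev";
    if (S_ISCHR(mode))  return "char-dev";
    if (S_ISFIFO(mode)) return "FIFO";
    if (S_ISSOCK(mode)) return "socket";
    return "unknown";
}

/* Piece 2: permission bits (rwx for owner/group/other plus setuid, setgid, sticky). */
unsigned mode_perms(unsigned mode) { return mode & 07777; }
```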
60 / 78
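Max File Size
Given block size = 4k and pointer size = 4 B, one block holds 4k / 4 B = 1k pointers. We get:
$$\begin{aligned}
\text{Max File Size} &= \text{number of pointers} \times \text{block size} \\
&= \Big(\underbrace{12}_{\text{direct}} + \underbrace{1\text{k}}_{\text{1-indirect}} + \underbrace{1\text{k} \times 1\text{k}}_{\text{2-indirect}} + \underbrace{1\text{k} \times 1\text{k} \times 1\text{k}}_{\text{3-indirect}}\Big) \times 4\text{k} \\
&= 48\text{k} + 4\text{M} + 4\text{G} + 4\text{T}
\end{aligned}$$
This is the bound set by the pointer scheme alone; the Linux ext2 documentation lists a lower practical file-size limit of 2 TiB for 4 KiB blocks.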
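The computation above, generalized to any block and pointer size. A sketch: `max_file_size` is a hypothetical name, and the result is only the bound set by the 12 + single + double + triple pointer scheme.

```c
#include <assert.h>
#include <stdint.h>

/* Pointer-scheme bound on file size for an inode with 12 direct pointers
 * plus one single-, one double- and one triple-indirect pointer. */
uint64_t max_file_size(uint64_t block_size, uint64_t ptr_size) {
    uint64_t p = block_size / ptr_size;            /* pointers per indirect block */
    uint64_t blocks = 12 + p + p * p + p * p * p;  /* addressable data blocks */
    return blocks * block_size;
}
```

For 4 KiB blocks and 4-byte pointers this returns 4,402,345,721,856 bytes, which is 48k + 4M + 4G + 4T.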
61 / 78
Ext2 Superblock
▶ Magic Number: 0xEF53
▶ Revision Level: determines which newer features are available
▶ Mount Count and Maximum Mount Count: used to decide whether the file system should be fully checked
▶ Block Group Number: the block group holding this copy of the superblock
▶ Block Size: usually 4k
▶ Blocks per Group: 8 bits × block size
▶ Free Blocks: system-wide free blocks
▶ Free Inodes: system-wide free inodes
▶ First Inode: the first non-reserved inode number in the file system
▶ See more: ~# dumpe2fs /dev/sda1
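A sketch of reading a few of these fields from raw superblock bytes. The byte offsets (magic number at 56, `s_log_block_size` at 24, and so on) follow the on-disk ext2 superblock layout, where block size = 1024 << `s_log_block_size`. The struct and function names are hypothetical, and the full-check rule is only modeled on what e2fsck does.

```c
#include <assert.h>
#include <stdint.h>

#define EXT2_SUPER_MAGIC 0xEF53

/* On-disk ext2 integers are little-endian; decode them byte by byte. */
uint16_t le16(const unsigned char *p) { return (uint16_t)(p[0] | p[1] << 8); }
uint32_t le32(const unsigned char *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical summary of a few superblock fields. */
struct sb_info {
    uint32_t free_blocks, free_inodes, block_size, blocks_per_group;
    uint16_t mnt_count, max_mnt_count;
};

/* `sb` points to the superblock bytes (found at byte offset 1024 of the
 * partition). Returns 0 on success, -1 if the magic number does not match. */
int parse_superblock(const unsigned char *sb, struct sb_info *out) {
    if (le16(sb + 56) != EXT2_SUPER_MAGIC)          /* s_magic */
        return -1;
    out->free_blocks      = le32(sb + 12);          /* s_free_blocks_count */
    out->free_inodes      = le32(sb + 16);          /* s_free_inodes_count */
    out->block_size       = 1024u << le32(sb + 24); /* s_log_block_size */
    out->blocks_per_group = le32(sb + 32);          /* s_blocks_per_group */
    out->mnt_count        = le16(sb + 52);          /* s_mnt_count */
    out->max_mnt_count    = le16(sb + 54);          /* s_max_mnt_count */
    return 0;
}

/* Modeled on e2fsck: a positive maximum mount count that has been reached
 * forces a full check (the on-disk field is signed; -1 disables the check). */
int needs_full_check(const struct sb_info *s) {
    int16_t max = (int16_t)s->max_mnt_count;
    return max > 0 && s->mnt_count >= (uint16_t)max;
}
```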
62 / 78
Ext2 File Types
File type   Description
0           Unknown
1           Regular file
2           Directory
3           Character device
4           Block device
5           Named pipe
6           Socket
7           Symbolic link
Device file, pipe, and socket: no data blocks are required; all info is stored in the inode
Fast symbolic link: a short path name (< 60 chars) needs no data block; it can be stored in the 15 pointer fields (15 × 4 B = 60 bytes)
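A small lookup for the type codes in the table, plus the fast-symlink test, as a C sketch. The names are hypothetical, and `is_fast_symlink` encodes only the slide's simplified length rule.

```c
#include <assert.h>
#include <string.h>

/* Names for the type codes in the table above. */
static const char *const ext2_ftype_names[] = {
    "Unknown", "Regular file", "Directory", "Character device",
    "Block device", "Named pipe", "Socket", "Symbolic link",
};

const char *ftype_name(unsigned t) {
    return t < sizeof ext2_ftype_names / sizeof ext2_ftype_names[0]
               ? ext2_ftype_names[t] : "Unknown";
}

/* 15 block pointers x 4 bytes = 60 bytes of space inside the inode. */
#define INLINE_SYMLINK_BYTES (15 * 4)

/* Slide's rule: a target shorter than 60 characters fits in the pointer
 * area, so no data block is needed ("fast" symbolic link). */
int is_fast_symlink(size_t target_len) { return target_len < INLINE_SYMLINK_BYTES; }
```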
63 / 78
Ext2 Directories
0              11|12            23|24                 39|40
+----+--+-+-+----+----+--+-+-+----+----+--+-+-+----+----+--//--+
| 21 |12|1|2|.   | 22 |12|2|2|..  | 53 |16|5|2|hell|o   |      |
+----+--+-+-+----+----+--+-+-+----+----+--+-+-+----+----+--//--+
     ,--------> inode number
     |   ,---> record length
     |   | ,---> name length
     |   | | ,---> file type
     |   | | |  ,----> name
  +----+--+-+-+----+
 0| 21 |12|1|2|.   |
  +----+--+-+-+----+
12| 22 |12|2|2|..  |
  +----+--+-+-+----+----+
24| 53 |16|5|2|hell|o   |
  +----+--+-+-+----+----+
40| 67 |28|3|2|usr |
  +----+--+-+-+----+----+
52|  0 |16|7|1|oldf|ile |
  +----+--+-+-+----+----+
68| 34 |12|4|2|sbin|
  +----+--+-+-+----+
▶ Directories are special files
▶ "." and ".." come first
▶ Each entry is padded to a multiple of 4 bytes
▶ An inode number of 0 marks a deleted file
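Walking a directory block means following the `rec_len` links from entry to entry and skipping entries whose inode number is 0. The C sketch below rebuilds the example block from the diagram with a hypothetical `put_entry` helper and lists it with a hypothetical `list_dir_block`; multi-byte fields are read as little-endian, as ext2 stores them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decoded view of one on-disk entry: 8-byte header followed by the name
 * (not NUL-terminated; use name_len). */
struct dir_entry {
    uint32_t inode;      /* 0 => deleted entry */
    uint16_t rec_len;    /* distance in bytes to the next entry */
    uint8_t  name_len;
    uint8_t  file_type;  /* codes from the file-type table */
    const char *name;
};

/* Walk one directory block of `size` bytes, storing live entries in `out`
 * (at most `max`); returns the number stored. */
int list_dir_block(const unsigned char *blk, size_t size, struct dir_entry *out, int max) {
    int n = 0;
    size_t off = 0;
    while (off + 8 <= size && n < max) {
        const unsigned char *p = blk + off;
        struct dir_entry e;
        e.inode     = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                      (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
        e.rec_len   = (uint16_t)(p[4] | p[5] << 8);
        e.name_len  = p[6];
        e.file_type = p[7];
        e.name      = (const char *)p + 8;
        if (e.rec_len < 8)
            break;              /* corrupt entry: stop rather than loop forever */
        if (e.inode != 0)
            out[n++] = e;       /* skip deleted entries */
        off += e.rec_len;
    }
    return n;
}

/* Hypothetical helper for building test blocks: write one entry at `off`. */
void put_entry(unsigned char *blk, size_t off, uint32_t ino, uint16_t rec_len,
               uint8_t type, const char *name) {
    size_t len = strlen(name);
    for (int i = 0; i < 4; i++)
        blk[off + i] = (unsigned char)(ino >> (8 * i));
    blk[off + 4] = (unsigned char)(rec_len & 0xFF);
    blk[off + 5] = (unsigned char)(rec_len >> 8);
    blk[off + 6] = (unsigned char)len;
    blk[off + 7] = type;
    memcpy(blk + off + 8, name, len);
}
```

On the example block the walk visits ".", "..", "hello", "usr", and "sbin": "oldfile" is never reached, because the record length of "usr" (28) jumps straight from offset 40 to offset 68.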
64 / 78
Many different file systems are in use
Windows
uses a drive letter (C:, D:, ...) to identify each FS
UNIX
integrates multiple FSs into a single structure
▶ From the user's view, there is only one FS hierarchy
~$ man fs
65 / 78
Virtual File Systems
Put the common parts of all FSs in a separate layer
▶ It's a layer in the kernel
▶ It's a common interface to several kinds of file systems
▶ It calls the underlying concrete FS to actually manage the data
[Figure: a user process issues POSIX calls to the virtual file system; the VFS calls FS 1, FS 2, and FS 3 through the VFS interface; the file systems sit on top of the buffer cache.]
66 / 78
67 / 78
Virtual File System
▶ Manages kernel-level file abstractions in one format for all file systems
▶ Receives system call requests from user level (e.g. write, open, stat, link)
▶ Interacts with a specific file system based on mount-point traversal
▶ Receives requests from other parts of the kernel, mostly from memory management
Real File Systems
▶ Manage file and directory data
▶ Manage metadata: timestamps, owners, protection, etc.
▶ Translate between their own data (disk data, NFS data, ...) and VFS data
68 / 78
File System Mounting
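[Figure: a hard disk holding the root file system (/) and a diskette holding a separate file system with entries x, y, z; after mounting, the diskette's tree appears inside the hard disk's directory tree.]
Fig. 10-26. (a) Separate file systems. (b) After mounting.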
69 / 78
A file system must be mounted before it can be used
Mount: the file system is registered with the VFS
▶ The superblock is read into the VFS superblock
▶ The table of addresses of functions the VFS requires is read into the VFS superblock
▶ The FS's topology info is mapped onto the VFS superblock data structure
The VFS keeps a list of the mounted file systems together with their superblocks
The VFS superblock contains:
▶ Device, blocksize
▶ Pointer to the root inode
▶ Pointer to a set of superblock routines
▶ Pointer to file_system_type data structure
▶ more...
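To route a path to the right mounted file system, the VFS has to work out which mount point the path lies under. The C sketch below models this as a longest-prefix match over a toy mount table. It is a simplification (Linux discovers mount points component by component during path lookup rather than by string matching), and all names here are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Toy mount table entry: where a file system is mounted and what it is
 * (a stand-in for the VFS superblock the kernel would keep). */
struct mount {
    const char *point;
    const char *fs;
};

/* Return the entry whose mount point is the longest prefix of `path`,
 * matching only at component boundaries; "/" matches every absolute path. */
const struct mount *find_mount(const struct mount *tab, int n, const char *path) {
    const struct mount *best = NULL;
    size_t best_len = 0;
    for (int i = 0; i < n; i++) {
        size_t len = strlen(tab[i].point);
        int match = strcmp(tab[i].point, "/") == 0 ||
                    (strncmp(path, tab[i].point, len) == 0 &&
                     (path[len] == '\0' || path[len] == '/'));
        if (match && (best == NULL || len > best_len)) {
            best = &tab[i];
            best_len = len;
        }
    }
    return best;
}
```

The component-boundary check keeps "/mnt/floppyX" from being routed to a file system mounted at "/mnt/floppy".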
70 / 78
V-node
▶ Every file/directory in the VFS has a VFS inode, kept in the VFS inode cache
▶ The real FS builds the VFS inode from its own info
Like the EXT2 inodes, the VFS inodes describe
▶ files and directories within the system
▶ the contents and topology of the Virtual File System
71 / 78
VFS Operation
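[Figure: the process table points to the process's file descriptors; a descriptor points to a v-node, whose function pointers (open, read, write, ...) lead into the VFS; for a read, the VFS follows the pointer and calls the read function of FS 1 (call from VFS into FS 1).]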
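The dispatch in the figure, where the VFS calls through a v-node's function pointers without knowing which concrete file system sits underneath, can be sketched in C as below. The structures and names (`struct vnode_ops`, `vfs_read`, the two toy file systems) are hypothetical and far simpler than Linux's own operation tables.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

struct vnode;

/* Hypothetical operations table: the function pointers stored with each v-node. */
struct vnode_ops {
    ssize_t (*read)(struct vnode *vn, char *buf, size_t len);
};

struct vnode {
    const struct vnode_ops *ops;  /* filled in by the concrete file system */
    const char *data;             /* private data owned by that file system */
};

/* Toy "FS 1": files are in-memory strings. */
static ssize_t fs1_read(struct vnode *vn, char *buf, size_t len) {
    size_t n = strlen(vn->data);
    if (n > len)
        n = len;
    memcpy(buf, vn->data, n);
    return (ssize_t)n;
}

/* Toy "FS 2": every file reads as zero bytes, like /dev/zero. */
static ssize_t fs2_read(struct vnode *vn, char *buf, size_t len) {
    (void)vn;
    memset(buf, 0, len);
    return (ssize_t)len;
}

const struct vnode_ops fs1_ops = { fs1_read };
const struct vnode_ops fs2_ops = { fs2_read };

/* Per-process file descriptor table: small integers index v-node pointers. */
#define MAX_FDS 8
struct vnode *fd_table[MAX_FDS];

/* VFS-level read: look up the v-node, then call through its function pointer
 * without knowing which file system implements it. */
ssize_t vfs_read(int fd, char *buf, size_t len) {
    if (fd < 0 || fd >= MAX_FDS || fd_table[fd] == NULL)
        return -1;
    struct vnode *vn = fd_table[fd];
    return vn->ops->read(vn, buf, len);
}
```

Adding a third file system only means supplying another operations table; `vfs_read` itself does not change.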
72 / 78
Linux VFS
The Common File Model
All other filesystems must map their own concepts into the common file model. For example, FAT filesystems do not have inodes.
▶ The main components of the common file model are
- superblock: information about a mounted filesystem
- inode: information about a specific file
- file: information about an open file
- dentry: information about a directory entry
▶ Geared toward Unix FS
73 / 78
The Superblock Object
▶ Is implemented by each FS and is used to store information describing that specific FS
▶ Usually corresponds to the filesystem superblock or the filesystem control block
▶ Filesystems that are not disk-based (such as sysfs, proc) generate the superblock on-the-fly and store it in memory
▶ struct super_block in <linux/fs.h>
▶ s_op in struct super_block points to struct super_operations, the superblock operations table
▶ Each item in this table is a pointer to a function that operates on a superblock object
74 / 78
The Inode Object
▶ For Unix-style filesystems, this information is simply read from the on-disk inode
▶ For others, the inode object is constructed in memory in whatever manner is applicable to the filesystem
▶ struct inode in <linux/fs.h>
▶ An inode represents each file on a FS, but the inode object is constructed in memory only as files are accessed
▶ Includes special files, such as device files or pipes
▶ i_op in struct inode points to struct inode_operations
75 / 78
The Dentry Object
▶ Represents a single component in a path
▶ Makes path name lookup easier
▶ struct dentry in <linux/dcache.h>
▶ Created on-the-fly from a string representation of a path name
76 / 78
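Dentry State
▶ used: associated with a valid inode and currently in use (reference count > 0)
▶ unused: associated with a valid inode but not currently in use (reference count 0)
▶ negative: not associated with a valid inode (the path does not exist)
Dentry Cache
consists of three parts:
1. Lists of "used" dentries
2. A doubly linked "least recently used" list of unused and negative dentry objects
3. A hash table and hashing function used to quickly resolve a given path into the associated dentry object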
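A toy version of the dentry cache in C: a hash table keyed on (parent, name), reference counts that separate used from unused dentries, an LRU list for unused and negative entries, and `inode == 0` standing in for a negative dentry. Everything here (`struct dcache`, `dc_lookup`, `dc_add`, `dc_put`, integer directory ids) is a hypothetical sketch, not the kernel's implementation; it leaves out part 1 (the lists of used dentries) and never evicts or frees entries.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64

/* Toy dentry for one path component under a parent directory. */
struct dentry {
    int parent;                          /* parent directory id (assumed int) */
    char name[32];
    int inode;                           /* 0 => negative: the name does not exist */
    int refcount;                        /* > 0 => "used" */
    struct dentry *hnext;                /* hash-chain link */
    struct dentry *lru_prev, *lru_next;  /* LRU links; NULL when not on the list */
};

struct dcache {
    struct dentry *bucket[NBUCKETS];     /* part 3: hash table */
    struct dentry lru;                   /* part 2: LRU sentinel; lru.lru_next = most recent */
};

static unsigned dhash(int parent, const char *name) {
    unsigned h = (unsigned)parent * 31u;
    while (*name)
        h = h * 31u + (unsigned char)*name++;
    return h % NBUCKETS;
}

void dcache_init(struct dcache *dc) {
    memset(dc, 0, sizeof *dc);
    dc->lru.lru_prev = dc->lru.lru_next = &dc->lru;
}

static void lru_unlink(struct dentry *d) {
    if (d->lru_next == NULL)
        return;                          /* not on the LRU list */
    d->lru_prev->lru_next = d->lru_next;
    d->lru_next->lru_prev = d->lru_prev;
    d->lru_prev = d->lru_next = NULL;
}

static void lru_push_front(struct dcache *dc, struct dentry *d) {
    d->lru_next = dc->lru.lru_next;
    d->lru_prev = &dc->lru;
    dc->lru.lru_next->lru_prev = d;
    dc->lru.lru_next = d;
}

/* Hash lookup of (parent, name). A hit takes a reference, so the dentry
 * becomes "used" and leaves the LRU list. Returns NULL on a miss. */
struct dentry *dc_lookup(struct dcache *dc, int parent, const char *name) {
    for (struct dentry *d = dc->bucket[dhash(parent, name)]; d != NULL; d = d->hnext)
        if (d->parent == parent && strcmp(d->name, name) == 0) {
            lru_unlink(d);
            d->refcount++;
            return d;
        }
    return NULL;
}

/* Cache a freshly resolved component; pass inode 0 to record "does not exist". */
struct dentry *dc_add(struct dcache *dc, int parent, const char *name, int inode) {
    struct dentry *d = calloc(1, sizeof *d);
    if (d == NULL)
        return NULL;
    d->parent = parent;
    d->inode = inode;
    d->refcount = 1;
    strncpy(d->name, name, sizeof d->name - 1);
    unsigned h = dhash(parent, name);
    d->hnext = dc->bucket[h];
    dc->bucket[h] = d;
    return d;
}

/* Drop a reference; an unreferenced dentry stays cached on the LRU list. */
void dc_put(struct dcache *dc, struct dentry *d) {
    if (--d->refcount == 0)
        lru_push_front(dc, d);
}
```

A hit on a negative dentry answers "no such file" without touching the disk, which is why negative entries are worth caching.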
77 / 78
The File Object
▶ Is the in-memory representation of an open file
▶ open() ⇒ create; close() ⇒ destroy
▶ There can be multiple file objects in existence for the same file, because multiple processes can open and manipulate a file at the same time
▶ struct file in <linux/fs.h>
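Multiple file objects for one file are visible from user space with plain POSIX calls: each `open()` creates a new file object with its own offset, while `dup()` returns another descriptor for the same file object (POSIX calls it an open file description). A sketch assuming a writable /tmp; `file_object_offsets` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Opens one temporary file three ways and reports each descriptor's offset
 * after reading 4 bytes through the first one:
 *   a = open(path)  -> its own file object
 *   b = open(path)  -> a second, independent file object
 *   c = dup(a)      -> another descriptor for a's file object
 * Returns 0 on success, -1 on any error. */
int file_object_offsets(long *off_a, long *off_b, long *off_c) {
    char path[] = "/tmp/fileobjXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    ssize_t w = write(fd, "abcdef", 6);
    close(fd);

    int a = open(path, O_RDONLY);
    int b = open(path, O_RDONLY);
    int c = (a >= 0) ? dup(a) : -1;
    int rc = -1;
    char buf[4];
    if (w == 6 && a >= 0 && b >= 0 && c >= 0 &&
        read(a, buf, sizeof buf) == (ssize_t)sizeof buf) {
        *off_a = (long)lseek(a, 0, SEEK_CUR);  /* moved by the read */
        *off_b = (long)lseek(b, 0, SEEK_CUR);  /* separate file object: untouched */
        *off_c = (long)lseek(c, 0, SEEK_CUR);  /* shares a's file object */
        rc = 0;
    }
    if (a >= 0) close(a);
    if (b >= 0) close(b);
    if (c >= 0) close(c);
    unlink(path);
    return rc;
}
```

After the read, descriptors a and c both report offset 4, while b still reports 0, even though all three refer to the same inode.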
78 / 78