Operating Systems Slides 7 - File Systems

File Systems, Wang Xiaolin, June 19, 2013

handouts version: http://cs2.swfu.edu.cn/~wx672/lecture_notes/os/slides/fs-a.pdf

Page 1: Operating Systems Slides 7 - File Systems

File Systems

Wang Xiaolin

June 19, 2013

Page 2: Operating Systems Slides 7 - File Systems

Long-term Information Storage Requirements

▶ Must store large amounts of data
▶ Information stored must survive the termination of the process using it
▶ Multiple processes must be able to access the information concurrently

Page 3: Operating Systems Slides 7 - File Systems

File-System Structure

File-system design addresses two problems:

1. Defining how the FS should look to the user
   ▶ defining a file and its attributes
   ▶ the operations allowed on a file
   ▶ directory structure
2. Creating algorithms and data structures to map the logical FS onto the physical disk

Page 4: Operating Systems Slides 7 - File Systems

File-System — A Layered Design

Layers, top to bottom: APPs ⇓ Logical FS ⇓ File-org module ⇓ Basic FS ⇓ I/O ctrl ⇓ Devices

▶ Logical file system: manages metadata information
  - maintains all of the file-system structure (directory structure, FCB)
  - responsible for protection and security
▶ File-organization module
  - translates logical block addresses into physical block addresses
  - keeps track of free blocks
▶ Basic file system: issues generic commands to the device driver, e.g.
  - "read drive 1, cylinder 72, track 2, sector 10"
▶ I/O control: device drivers and interrupt (INT) handlers
  - device driver: translates high-level commands into hardware-specific instructions

Page 5: Operating Systems Slides 7 - File Systems

The Operating Structure

Layers, top to bottom: APPs ⇓ Logical FS ⇓ File-org module ⇓ Basic FS ⇓ I/O ctrl ⇓ Devices

Example: to create a file

1. APP calls creat()
2. Logical FS
   2.1 allocates a new FCB
   2.2 updates the in-memory directory structure
   2.3 writes it back to disk
   2.4 calls the file-organization module
3. File-organization module
   3.1 maps the directory I/O into disk-block numbers
   3.2 allocates blocks for storing the file's data

Benefit of layered design:
The I/O control and sometimes the basic file system code can be used by multiple file systems.

Page 6: Operating Systems Slides 7 - File Systems

File— A Logical View Of Information Storage

User's view:
A file is the smallest storage unit on disk.
▶ Data cannot be written to disk unless they are within a file

UNIX view:
Each file is a sequence of 8-bit bytes.
▶ It's up to the application program to interpret this byte stream.

Page 7: Operating Systems Slides 7 - File Systems

File— What Is Stored In A File?

Source code, object files, executable files, shell scripts, PostScript...

Different types of files have different structures:
▶ UNIX looks at contents to determine type
  - Shell scripts start with "#!"
  - PDF files start with "%PDF..."
  - Executables start with a magic number
▶ Windows uses file naming conventions
  - executables end with ".exe" and ".com"
  - MS-Word files end with ".doc"
  - MS-Excel files end with ".xls"
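
The content-based approach can be sketched in a few lines of C. This is only an illustration: the function name classify_by_magic and the returned labels are made up for the example, and the ELF magic number (0x7f 'E' 'L' 'F') is used as one concrete instance of "executables start with a magic number"; other executable formats use other values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Guess a file's type from its first bytes, as the slide describes for UNIX.
 * The returned labels are illustrative names, not a standard. */
const char *classify_by_magic(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len >= 2 && memcmp(buf, "#!", 2) == 0)
        return "script";                 /* shell scripts start with "#!" */
    if (len >= 4 && memcmp(buf, "%PDF", 4) == 0)
        return "pdf";                    /* PDF files start with "%PDF" */
    if (len >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
        return "elf";                    /* ELF executables: 0x7f 'E' 'L' 'F' */
    return "unknown";
}
```

A caller would read the first few bytes of a file into a buffer and pass them in; the file name never enters the decision.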

Page 8: Operating Systems Slides 7 - File Systems

File Naming

Naming rules vary from system to system:
▶ Name length?
▶ Characters? Digits? Special characters?
▶ Extension?
▶ Case sensitive?

Page 9: Operating Systems Slides 7 - File Systems

File Types

Regular files: ASCII, binary
Directories: maintain the structure of the FS

In UNIX, everything is a file:
Character special files: I/O related, such as terminals, printers ...
Block special files: devices that can contain file systems, i.e. disks
  - disks: logically, linear collections of blocks; the disk driver translates them into physical block addresses

Page 10: Operating Systems Slides 7 - File Systems

Binary files

[Figure: Fig. 6-3. (a) An executable file. (b) An archive.
(a) A UNIX executable file: a header (magic number, text size, data size, BSS size, symbol table size, entry point, flags), followed by text, data, relocation bits, and symbol table.
(b) A UNIX archive: a sequence of object modules, each preceded by a header (module name, date, owner, protection, size).]

Page 11: Operating Systems Slides 7 - File Systems

File Attributes — Metadata

▶ Name: only information kept in human-readable form
▶ Identifier: unique tag (number) that identifies the file within the file system
▶ Type: needed for systems that support different types
▶ Location: pointer to file location on device
▶ Size: current file size
▶ Protection: controls who can do reading, writing, executing
▶ Time, date, and user identification: data for protection, security, and usage monitoring

Page 12: Operating Systems Slides 7 - File Systems

File Operations
POSIX file system calls

 1. fd = creat(name, mode)
 2. fd = open(name, flags)
 3. status = close(fd)
 4. byte_count = read(fd, buffer, byte_count)
 5. byte_count = write(fd, buffer, byte_count)
 6. offset = lseek(fd, offset, whence)
 7. status = link(oldname, newname)
 8. status = unlink(name)
 9. status = truncate(name, size)
10. status = ftruncate(fd, size)
11. status = stat(name, buffer)
12. status = fstat(fd, buffer)
13. status = utimes(name, times)
14. status = chown(name, owner, group)
15. status = fchown(fd, owner, group)
16. status = chmod(name, mode)
17. status = fchmod(fd, mode)

Page 13: Operating Systems Slides 7 - File Systems

An Example Program Using File System Calls

/* File copy program. Error checking and reporting is minimal. */

#include <sys/types.h>                      /* include necessary header files */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[]);           /* ANSI prototype */

#define BUF_SIZE 4096                       /* use a buffer size of 4096 bytes */
#define OUTPUT_MODE 0700                    /* protection bits for output file */
#define TRUE 1                              /* needed by the copy loop below */

int main(int argc, char *argv[])
{
    int in_fd, out_fd, rd_count, wt_count;
    char buffer[BUF_SIZE];

    if (argc != 3) exit(1);                 /* syntax error if argc is not 3 */

    /* Open the input file and create the output file */
    in_fd = open(argv[1], O_RDONLY);        /* open the source file */
    if (in_fd < 0) exit(2);                 /* if it cannot be opened, exit */
    out_fd = creat(argv[2], OUTPUT_MODE);   /* create the destination file */
    if (out_fd < 0) exit(3);                /* if it cannot be created, exit */

    /* Copy loop */
    while (TRUE) {
        rd_count = read(in_fd, buffer, BUF_SIZE);   /* read a block of data */
        if (rd_count <= 0) break;           /* if end of file or error, exit loop */
        wt_count = write(out_fd, buffer, rd_count); /* write data */
        if (wt_count <= 0) exit(4);         /* wt_count <= 0 is an error */
    }

    /* Close the files */
    close(in_fd);
    close(out_fd);
    if (rd_count == 0)                      /* no error on last read */
        exit(0);
    else
        exit(5);                            /* error on last read */
}

Fig. 6-5. A simple program to copy a file.

Page 14: Operating Systems Slides 7 - File Systems

open()

fd = open(pathname, flags)

A per-process open-file table is kept in the OS
▶ upon a successful open() syscall, a new entry is added into this table
▶ indexed by file descriptor (fd)

To see files opened by a process, e.g. init:
~$ lsof -p 1

Why is open() needed?

To avoid constant searching
▶ Without open(), every file operation involves searching the directory for the file.

Page 15: Operating Systems Slides 7 - File Systems

Directories— Single-Level Directory Systems

All files are contained in the same directory.

[Figure: Fig. 6-7. A single-level directory system: a root directory containing four files (A, A, B, C), owned by three different people, A, B, and C.]

Limitations:
- name collision
- file searching

Often used on simple embedded devices, such as telephones, digital cameras...

Page 16: Operating Systems Slides 7 - File Systems

Directories— Two-level Directory Systems

A separate directory for each user.

[Figure: Fig. 6-8. A two-level directory system: a root directory, one user directory per user (A, B, C), and each user's files below it. The letters indicate the owners of the directories and files.]

Limitation: hard to access other users' files

Page 17: Operating Systems Slides 7 - File Systems

Directories— Hierarchical Directory Systems

[Figure: Fig. 6-9. A hierarchical directory system: a root directory, user directories (A, B, C), user subdirectories, and user files.]

Page 18: Operating Systems Slides 7 - File Systems

Directories— Path Names

[Figure: an example directory tree. Under ROOT: bin, boot, dev, etc, home, var. Second level: grub, passwd, staff, stud, mail. Third level: wx672, 20081152001. One further node is also labelled 20081152001. Legend: dir, file.]

Page 19: Operating Systems Slides 7 - File Systems

Directories— Directory Operations

Create     Delete     Rename     Link
Opendir    Closedir   Readdir    Unlink

Page 20: Operating Systems Slides 7 - File Systems

File System Implementation

A typical file system layout

|<---------------------- Entire disk ------------------------>|
+-----+-------------+-------------+-------------+-------------+
| MBR | Partition 1 | Partition 2 | Partition 3 | Partition 4 |
+-----+-------------+-------------+-------------+-------------+

One partition, expanded:
+---------------+-----------------+--------------------+---//--+
| Boot Ctrl Blk | Volume Ctrl Blk | Dir Structure      | Files |
| (MBR copy)    | (Super Blk)     | (inodes, root dir) | dirs  |
+---------------+-----------------+--------------------+---//--+

|<-------------Master Boot Record (512 Bytes)------------>|
0         439        443    445               509       511
+----//-----+----------+------+------//---------+---------+
| code area | disk-sig | null | partition table | MBR-sig |
|    440    |    4     |  2   |     16x4=64     | 0xAA55  |
+----//-----+----------+------+------//---------+---------+
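
The byte offsets in the MBR diagram translate directly into constants. The following is a minimal sketch, not a full MBR parser: the names (MBR_PART_TABLE, mbr_has_signature, mbr_partition_offset) are invented for the example, and the only fact added to the slide is that the 0xAA55 signature is stored little-endian, so the last two bytes on disk are 0x55 followed by 0xAA.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offsets inside the 512-byte MBR, taken from the layout above. */
enum {
    MBR_SIZE       = 512,
    MBR_DISK_SIG   = 440,   /* 4-byte disk signature */
    MBR_PART_TABLE = 446,   /* 4 entries x 16 bytes = 64 bytes */
    MBR_PART_ENTRY = 16,
    MBR_SIGNATURE  = 510    /* 2-byte MBR signature */
};

/* Returns 1 if the sector ends with the 0xAA55 signature. The value is
 * stored little-endian, so byte 510 is 0x55 and byte 511 is 0xAA. */
int mbr_has_signature(const uint8_t sector[MBR_SIZE])
{
    assert(sector != 0);
    return sector[MBR_SIGNATURE] == 0x55 && sector[MBR_SIGNATURE + 1] == 0xAA;
}

/* Byte offset of partition-table entry n (0..3) within the sector. */
int mbr_partition_offset(int n)
{
    assert(n >= 0 && n < 4);
    return MBR_PART_TABLE + n * MBR_PART_ENTRY;
}
```

The four entries start at offsets 446, 462, 478, and 494, which leaves exactly the last two bytes for the signature.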

Page 21: Operating Systems Slides 7 - File Systems

On-Disk Information Structure

Boot control block: an MBR copy
    UFS: boot block
    NTFS: partition boot sector
Volume control block: contains volume details
    number of blocks      size of blocks
    free-block count      free-block pointers
    free FCB count        free FCB pointers
    UFS: superblock
    NTFS: Master File Table
Directory structure: organizes the files
FCB: file control block (FCB), contains file details (metadata)
    UFS: i-node
    NTFS: stored in MFT using a relational database structure, with one row per file

Page 22: Operating Systems Slides 7 - File Systems

Each File-System Has a Superblock

Superblock keeps information about the file system:
▶ Type: ext2, ext3, ext4...
▶ Size
▶ Status: how it's mounted, free blocks, free inodes, ...
▶ Information about other metadata structures

~# dumpe2fs /dev/sda1 | grep -i superblock

Page 23: Operating Systems Slides 7 - File Systems

Implementing Files
Contiguous Allocation

(Textbook excerpt, p. 572, Chapter 12 / File Management)

access, degree of multiprogramming, other performance factors in the system, disk caching, disk scheduling, and so on.

File Allocation Methods. Having looked at the issues of preallocation versus dynamic allocation and portion size, we are in a position to consider specific file allocation methods. Three methods are in common use: contiguous, chained, and indexed. Table 12.3 summarizes some of the characteristics of each method.

With contiguous allocation, a single contiguous set of blocks is allocated to a file at the time of file creation (Figure 12.7). Thus, this is a preallocation strategy, using variable-size portions. The file allocation table needs just a single entry for each file, showing the starting block and the length of the file. Contiguous allocation is the best from the point of view of the individual sequential file. Multiple blocks can be read in at a time to improve I/O performance for sequential processing. It is also easy to retrieve a single block. For example, if a file starts at block b, and the ith block of the file is wanted, its location on secondary storage is simply b + i - 1. Contiguous allocation presents some problems. External fragmentation will occur, making it difficult to find contiguous blocks of space of sufficient length. From time to time, it will be necessary to perform a compaction algorithm to free up additional

Table 12.3 File Allocation Methods

                                  Contiguous  Chained       Indexed       Indexed
Preallocation?                    Necessary   Possible      Possible      Possible
Fixed or variable size portions?  Variable    Fixed blocks  Fixed blocks  Variable
Portion size                      Large       Small         Small         Medium
Allocation frequency              Once        Low to high   High          Low
Time to allocate                  Medium      Long          Short         Medium
File allocation table size        One entry   One entry     Large         Medium

[Figure 12.7 Contiguous File Allocation: disk blocks 0-34 holding Files A-E.]

File Allocation Table
File Name   Start Block   Length
File A      2             3
File B      9             5
File C      18            8
File D      30            2
File E      26            3

- simple;
- good for read only;
- fragmentation

Page 24: Operating Systems Slides 7 - File Systems

Linked List (Chained) Allocation
A pointer in each disk block

(Textbook excerpt, p. 573, 12.6 / Secondary Storage Management)

space on the disk (Figure 12.8). Also, with preallocation, it is necessary to declare the size of the file at the time of creation, with the problems mentioned earlier.

At the opposite extreme from contiguous allocation is chained allocation (Figure 12.9). Typically, allocation is on an individual block basis. Each block contains a pointer to the next block in the chain. Again, the file allocation table needs just a single entry for each file, showing the starting block and the length of the file. Although preallocation is possible, it is more common simply to allocate blocks as needed. The selection of blocks is now a simple matter: any free block can be added to a chain. There is no external fragmentation to worry about because only one

[Figure 12.8 Contiguous File Allocation (After Compaction): disk blocks 0-34 holding Files A-E.]

File Allocation Table
File Name   Start Block   Length
File A      0             3
File B      3             5
File C      8             8
File D      19            2
File E      16            3

[Figure 12.9 Chained Allocation: disk blocks 0-34 holding File B as a chain of blocks.]

File Allocation Table
File Name   Start Block   Length
File B      1             5

- no wasted blocks;
- slow random access;
- data per block is no longer a power of two (2^n), since the pointer takes up part of each block

Page 25: Operating Systems Slides 7 - File Systems

Linked List (Chained) Allocation
Though there is no external fragmentation, consolidation is still preferred.

(Textbook excerpt, p. 574, Chapter 12 / File Management)

block at a time is needed. This type of physical organization is best suited to sequential files that are to be processed sequentially. To select an individual block of a file requires tracing through the chain to the desired block.

One consequence of chaining, as described so far, is that there is no accommodation of the principle of locality. Thus, if it is necessary to bring in several blocks of a file at a time, as in sequential processing, then a series of accesses to different parts of the disk are required. This is perhaps a more significant effect on a single-user system but may also be of concern on a shared system. To overcome this problem, some systems periodically consolidate files (Figure 12.10).

Indexed allocation addresses many of the problems of contiguous and chained allocation. In this case, the file allocation table contains a separate one-level index for each file; the index has one entry for each portion allocated to the file. Typically, the file indexes are not physically stored as part of the file allocation table. Rather, the file index for a file is kept in a separate block, and the entry for the file in the file allocation table points to that block. Allocation may be on the basis of either fixed-size blocks (Figure 12.11) or variable-size portions (Figure 12.12). Allocation by blocks eliminates external fragmentation, whereas allocation by variable-size portions improves locality. In either case, file consolidation may be done from time to time. File consolidation reduces the size of the index in the case of variable-size portions, but not in the case of block allocation. Indexed allocation supports both sequential and direct access to the file and thus is the most popular form of file allocation.

Free Space Management

Just as the space that is allocated to files must be managed, so the space that is not currently allocated to any file must be managed. To perform any of the file allocation techniques described previously, it is necessary to know what blocks on the disk are available. Thus we need a disk allocation table in addition to a file allocation table. We discuss here a number of techniques that have been implemented.

[Figure 12.10 Chained Allocation (After Consolidation): disk blocks 0-34 holding File B in consecutive blocks.]

File Allocation Table
File Name   Start Block   Length
File B      0             5

Page 26: Operating Systems Slides 7 - File Systems

FAT: Linked list allocation with a table in RAM.

▶ Taking the pointer out of each disk block, and putting it into a table in memory
▶ fast random access (chain is in RAM)
▶ data per block is a power of two (2^n) again
▶ the entire table must be in RAM

disk ↗ ⇒ FAT ↗ ⇒ RAM used ↗

[Figure: Fig. 6-14. Linked list allocation using a file allocation table in main memory. Blank entries are unused blocks.]

Physical block   Next block
 0
 1
 2               10
 3               11
 4               7      (File A starts here)
 5
 6               3      (File B starts here)
 7               2
 8
 9
10               12
11               14
12               -1
13
14               -1
15
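
Following a chain through the table is a short loop. The sketch below is illustrative only: the function names and the FAT_FREE marker are made up, and the table contents are the values listed in Fig. 6-14, with each value assigned to the block it sits next to in the figure (an assumption about the lost layout, although the two chains it yields are self-consistent).

```c
#include <assert.h>
#include <stddef.h>

#define FAT_EOF  (-1)   /* end-of-chain marker, as in the figure */
#define FAT_FREE (-2)   /* hypothetical marker for entries the figure leaves blank */

/* Table contents as read from Fig. 6-14 (16 physical blocks). */
static const int FIG_FAT[16] = {
    FAT_FREE, FAT_FREE, 10, 11, 7, FAT_FREE, 3, 2,
    FAT_FREE, FAT_FREE, 12, 14, FAT_EOF, FAT_FREE, FAT_EOF, FAT_FREE
};

/* Follow a file's chain through the in-memory table.
 * Writes up to max block numbers into out and returns how many were written. */
size_t fat_chain(const int *fat, size_t nblocks, int start, int *out, size_t max)
{
    size_t n = 0;
    int b = start;
    assert(fat != NULL && out != NULL);
    while (b != FAT_EOF && n < max) {
        assert(b >= 0 && (size_t)b < nblocks);  /* chain must stay inside the table */
        out[n++] = b;
        b = fat[b];
    }
    return n;
}

/* Physical block holding the i-th block (0-based) of a file,
 * or FAT_EOF if the file has fewer blocks. No disk access is needed. */
int fat_nth_block(const int *fat, size_t nblocks, int start, size_t i)
{
    int b = start;
    assert(fat != NULL);
    while (b != FAT_EOF && i > 0) {
        assert(b >= 0 && (size_t)b < nblocks);
        b = fat[b];
        i--;
    }
    return b;
}
```

With this table, File A (starting at block 4) occupies blocks 4, 7, 2, 10, 12 and File B (starting at block 6) occupies 6, 3, 11, 14. Random access still walks the chain, but every step is a memory lookup rather than a disk read.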

Page 27: Operating Systems Slides 7 - File Systems

Indexed Allocation

(Textbook excerpt, p. 575, 12.6 / Secondary Storage Management)

Bit Tables. This method uses a vector containing one bit for each block on the disk. Each entry of a 0 corresponds to a free block, and each 1 corresponds to a block in use. For example, for the disk layout of Figure 12.7, a vector of length 35 is needed and would have the following value:

00111000011111000011111111111011000

A bit table has the advantage that it is relatively easy to find one or a contiguous group of free blocks. Thus, a bit table works well with any of the file allocation methods outlined. Another advantage is that it is as small as possible.

[Figure 12.11 Indexed Allocation with Block Portions: disk blocks 0-34; the file allocation table entry for File B points to index block 24, which lists blocks 1, 8, 3, 14, 28.]

[Figure 12.12 Indexed Allocation with Variable-Length Portions: disk blocks 0-34; the file allocation table entry for File B points to index block 24, which lists these portions:]

Start Block   Length
1             3
28            4
14            1

▶ i-node: a data structure for each file
▶ an i-node is in memory only if the file is open

files opened ↗ ⇒ RAM used ↗

Page 28: Operating Systems Slides 7 - File Systems

I-node — FCB in UNIX

[Figure: a directory inode (128B) points to a directory block whose entries (., .., passwd, fstab, ...) each hold an inode #; one of those inode numbers leads to a file inode (128B), which points to file data blocks.
Each inode holds: type, mode, user ID, group ID, file size, # blocks, # links, flags, timestamps (×3), direct blocks (×12), single indirect, double indirect, triple indirect.
In the directory inode, the direct blocks hold the block #s of more directory blocks.
Single indirect: block # of an indirect block with 512 direct block entries.
Double indirect: block # of a block with 512 single indirect entries.
Triple indirect: block # of a block with 512 double indirect entries.]

File type   Description
0           Unknown
1           Regular file
2           Directory
3           Character device
4           Block device
5           Named pipe
6           Socket
7           Symbolic link

Mode: 9-bit pattern

Page 29: Operating Systems Slides 7 - File Systems

Inode Quiz

Given: block size is 1KB, pointer size is 4B
Addressing: byte offset 9000, byte offset 350,000

Inode block pointers:

Entry   Block #
0       4096
1       228
2       45423
3       0
4       0
5       11111
6       0
7       101
8       367
9       0
S       428     (single indirect; covers the file up to 10K+256K)
D       9156    (double indirect)
T       824     (triple indirect)

Byte 9000 in a file:
  9000 = 8 × 1024 + 808 ⇒ 8th blk, 808th byte
  direct entry 8 ⇒ data blk 367

Byte 350,000 in a file:
  350,000 = 341 × 1024 + 816 ⇒ 341st blk, 816th byte
  341 - 10 direct - 256 single indirect = 75 ⇒ double indirect
  double indirect blk 9156, entry 0 ⇒ single indirect blk 331
  single indirect blk 331, entry 75 ⇒ data blk 3333, 816th byte

What about the ZEROs?
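
The quiz arithmetic can be checked with a small function. This is a sketch under the quiz's own assumptions (1KB blocks, 4-byte pointers, ten direct entries numbered 0 to 9); the type and function names are invented for the example, and it only computes which indices to follow, without reading any blocks.

```c
#include <assert.h>

#define BLOCK_SIZE     1024L                /* 1KB blocks, as in the quiz */
#define PTRS_PER_BLOCK (BLOCK_SIZE / 4)     /* 4-byte pointers: 256 per block */
#define NDIRECT        10L                  /* direct entries 0..9 in the quiz inode */

enum level { DIRECT, SINGLE, DOUBLE, TRIPLE };

struct location {
    enum level level;   /* which inode pointer to start from */
    long idx[3];        /* index at each level of indirection (unused ones are 0) */
    long offset;        /* byte offset inside the data block */
};

/* Translate a byte offset in a file into the indices to follow in the inode. */
struct location resolve(long byte_offset)
{
    struct location loc = { DIRECT, {0, 0, 0}, 0 };
    long blk;
    assert(byte_offset >= 0);
    blk = byte_offset / BLOCK_SIZE;         /* logical block number in the file */
    loc.offset = byte_offset % BLOCK_SIZE;

    if (blk < NDIRECT) {
        loc.idx[0] = blk;
        return loc;
    }
    blk -= NDIRECT;
    if (blk < PTRS_PER_BLOCK) {
        loc.level = SINGLE;
        loc.idx[0] = blk;
        return loc;
    }
    blk -= PTRS_PER_BLOCK;
    if (blk < PTRS_PER_BLOCK * PTRS_PER_BLOCK) {
        loc.level = DOUBLE;
        loc.idx[0] = blk / PTRS_PER_BLOCK;  /* entry in the double indirect block */
        loc.idx[1] = blk % PTRS_PER_BLOCK;  /* entry in the single indirect block */
        return loc;
    }
    blk -= PTRS_PER_BLOCK * PTRS_PER_BLOCK;
    assert(blk < PTRS_PER_BLOCK * PTRS_PER_BLOCK * PTRS_PER_BLOCK);
    loc.level = TRIPLE;
    loc.idx[0] = blk / (PTRS_PER_BLOCK * PTRS_PER_BLOCK);
    loc.idx[1] = (blk / PTRS_PER_BLOCK) % PTRS_PER_BLOCK;
    loc.idx[2] = blk % PTRS_PER_BLOCK;
    return loc;
}
```

For offset 9000 the result is direct entry 8 at byte 808; for offset 350,000 it is the double indirect pointer, entries 0 and 75, at byte 816, matching the path 9156 ⇒ 331 ⇒ 3333 in the quiz.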

Page 30: Operating Systems Slides 7 - File Systems

UNIX In-Core Data Structure

▶ mount table: info about each mounted FS
▶ directory-structure cache: holds the dir-info of recently accessed dirs
▶ inode table: an in-core version of the on-disk inode table
▶ file table
  ▶ global
  ▶ keeps inode of each open file
  ▶ keeps track of
    ▶ how many processes are associated with each open file
    ▶ where the next read and write will start
    ▶ access rights
▶ user file descriptor table
  ▶ per process
  ▶ identifies all open files for a process

Page 31: Operating Systems Slides 7 - File Systems

UNIX In-Core Data Structure

open()/creat()

1. adds an entry in each table
2. returns a file descriptor: an index into the user file descriptor table

Page 32: Operating Systems Slides 7 - File Systems

A File Is Opened By Multiple Processes?

Two levels of internal tables in the OS:

A per-process table tracks all files that a process has open. Stores
  ▶ the current-file-position pointer (not really)
  ▶ access rights
  ▶ more...
  a.k.a. file descriptor table
A system-wide table keeps process-independent information, such as
  ▶ the location of the file on disk
  ▶ access dates
  ▶ file size
  ▶ file open count: the number of processes opening this file

Page 33: Operating Systems Slides 7 - File Systems

 Per-process FDT
    Process 1
+------------------+               System-wide
|       ...        |             open-file table
+------------------+          +------------------+
| Position pointer |          |       ...        |
| Access rights    |          +------------------+
|       ...        |\         |       ...        |
+------------------+ \        +------------------+
|       ...        |  ------->| Location on disk |
+------------------+          | R/W              |
                              | Access dates     |
    Process 2                 | File size        |
+------------------+          | Pointer to inode |
| Position pointer |          | File-open count  |
| Access rights    |--------->|       ...        |
|       ...        |          +------------------+
+------------------+          |       ...        |
|       ...        |          +------------------+
+------------------+

Page 34: Operating Systems Slides 7 - File Systems

A process executes the following code:

fd1 = open("/etc/passwd", O_RDONLY);
fd2 = open("local", O_RDWR);
fd3 = open("/etc/passwd", O_WRONLY);

user FDT            file table             inode table
0  STDIN
1  STDOUT
2  STDERR
3        -------->  count 1, R   -------->  (/etc/passwd), count 2
4        -------->  count 1, RW  -------->  (local), count 1
5        -------->  count 1, W   -------->  (/etc/passwd), count 2
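
The bookkeeping in this diagram can be modelled with three small arrays. The sketch below is a toy model, not kernel code: all names (sim_open, inode_count, the SIM_* flags) are hypothetical, files are identified by their path string rather than by a real inode lookup, and close() is not modelled.

```c
#include <assert.h>
#include <string.h>

#define MAX_INODES 8
#define MAX_FILES  8
#define MAX_FDS    8

enum { SIM_RDONLY, SIM_WRONLY, SIM_RDWR };               /* stand-ins for O_* flags */

struct inode_entry { char path[32]; int count; };        /* in-core inode table */
struct file_entry  { int inode; int flags; int count; }; /* global file table */

static struct inode_entry inodes[MAX_INODES];
static struct file_entry  files[MAX_FILES];
static int n_inodes, n_files;

struct process { int fdt[MAX_FDS]; int n_fds; };         /* per-process user FDT */

void proc_init(struct process *p)
{
    /* slots 0..2 stand for STDIN, STDOUT, STDERR; -1 means "not modelled" */
    p->fdt[0] = p->fdt[1] = p->fdt[2] = -1;
    p->n_fds = 3;
}

/* Model of open(): reuse or add an inode entry, always add a new file-table
 * entry, and return the next free descriptor of the process. */
int sim_open(struct process *p, const char *path, int flags)
{
    int i;
    assert(strlen(path) < sizeof inodes[0].path);
    assert(n_files < MAX_FILES && p->n_fds < MAX_FDS);
    for (i = 0; i < n_inodes; i++)
        if (strcmp(inodes[i].path, path) == 0)
            break;
    if (i == n_inodes) {                   /* first open of this file */
        assert(n_inodes < MAX_INODES);
        strcpy(inodes[i].path, path);
        inodes[i].count = 0;
        n_inodes++;
    }
    inodes[i].count++;                     /* one more file-table entry refers to it */
    files[n_files].inode = i;
    files[n_files].flags = flags;
    files[n_files].count = 1;
    p->fdt[p->n_fds] = n_files++;
    return p->n_fds++;
}

/* Reference count of the in-core inode for path, or 0 if it is not open. */
int inode_count(const char *path)
{
    int i;
    for (i = 0; i < n_inodes; i++)
        if (strcmp(inodes[i].path, path) == 0)
            return inodes[i].count;
    return 0;
}
```

Running the three open() calls from the slide through sim_open yields descriptors 3, 4, and 5, three separate file-table entries, and a count of 2 on the /etc/passwd inode.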

Page 35: Operating Systems Slides 7 - File Systems

One more process B:

fd1 = open("/etc/passwd", O_RDONLY);
fd2 = open("private", O_RDONLY);

user FDT (proc A)   file table             inode table
0  STDIN
1  STDOUT
2  STDERR
3        -------->  count 1, R   -------->  (/etc/passwd), count 3
4        -------->  count 1, RW  -------->  (local), count 1
5        -------->  count 1, W   -------->  (/etc/passwd), count 3

user FDT (proc B)
0  STDIN
1  STDOUT
2  STDERR
3        -------->  count 1, R   -------->  (/etc/passwd), count 3
4        -------->  count 1, R   -------->  (private), count 1

Page 36: Operating Systems Slides 7 - File Systems

Why File Table?

To allow a parent and child to share a file position, but to provide unrelated processes with their own values.

[Figure: Fig. 10-33. The relation between the file descriptor table, the open file description table, and the i-node table. The parent's and the child's file descriptor tables point to the same open file description (file position, R/W, pointer to i-node). An unrelated process's file descriptor table points to a separate open file description. Both open file descriptions point to the same i-node (mode, link count, uid, gid, file size, times, addresses of first 10 disk blocks, single indirect, double indirect, triple indirect), whose indirect blocks hold pointers to disk blocks.]

Page 37: Operating Systems Slides 7 - File Systems

Why File Table?

Where To Put File Position Info?

Inode table? No. Multiple processes can open the same file. Each one has its own file position.
User file descriptor table? No. Trouble in file sharing.

Example:

#!/bin/bash
echo hello
echo world

Where should the "world" be?

~$ ./hello.sh > A
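
The difference between a shared and a private file position can be shown without any real I/O. The sketch below is a toy model with invented names (open_file, mem_file, sim_write): the file is a small in-memory buffer, and the position lives in a separate open_file struct standing in for a file-table entry.

```c
#include <assert.h>
#include <string.h>

/* One entry of the global file table: the file position lives here. */
struct open_file { size_t pos; };

/* A tiny in-memory "file" so the example needs no real I/O. */
struct mem_file { char data[64]; size_t size; };

/* Model of write(): copy at the entry's position, then advance the position. */
void sim_write(struct mem_file *f, struct open_file *of, const char *s)
{
    size_t n = strlen(s);
    assert(of->pos + n < sizeof f->data);  /* keep room for the terminator */
    memcpy(f->data + of->pos, s, n);
    of->pos += n;
    if (of->pos > f->size)
        f->size = of->pos;
    f->data[f->size] = '\0';
}
```

If both writers use the same open_file entry, as the two echo commands do through the file table entry they inherit from the shell, the result is "hello" followed by "world". If each writer had its own position starting at 0, "world" would overwrite "hello".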

Page 38: Operating Systems Slides 7 - File Systems

Implementing Directories

[Figure: Fig. 6-16. (a) A simple directory containing fixed-size entries (games, mail, news, work) with the disk addresses and attributes in the directory entry. (b) A directory in which each entry just refers to an i-node, a data structure containing the attributes.]

(a) A simple directory (Windows)
  ▶ fixed size entries
  ▶ disk addresses and attributes in directory entry
(b) Directory in which each entry just refers to an i-node (UNIX)

Page 39: Operating Systems Slides 7 - File Systems

How Long A File Name Can Be?

[Figure: two ways of storing long file names in a directory, with the example files project-budget, personnel, and foo.
(a) In-line: the entry for one file holds the entry length, the file's attributes, and then the file name itself.
(b) In a heap: the entry for one file holds a pointer to the file's name and the file's attributes; the names are stored together in a heap.]

Page 40: Operating Systems Slides 7 - File Systems

UNIX Treats a Directory as a File

[Figure: the same diagram as on the earlier I-node slide: a directory inode (128B) pointing to a directory block of (name, inode #) entries, and a file inode (128B) pointing to file data blocks.]

Example (directory entries: name, inode number):

.       2
..      2
bin     11116545
boot    2
cdrom   12
dev     3
...

Page 41: Operating Systems Slides 7 - File Systems

The steps in looking up /usr/ast/mbox

1. Root directory:
      1  .
      1  ..
      4  bin
      7  dev
     14  lib
      9  etc
      6  usr
      8  tmp
   Looking up usr yields i-node 6.

2. I-node 6 is for /usr (mode, size, times, 132).
   I-node 6 says that /usr is in block 132.

3. Block 132 is the /usr directory:
      6  .
      1  ..
     19  dick
     30  erik
     51  jim
     26  ast
     45  bal
   /usr/ast is i-node 26.

4. I-node 26 is for /usr/ast (mode, size, times, 406).
   I-node 26 says that /usr/ast is in block 406.

5. Block 406 is the /usr/ast directory:
     26  .
      6  ..
     64  grants
     92  books
     60  mbox
     81  minix
     17  src
   /usr/ast/mbox is i-node 60.
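
The lookup above is a loop over path components. The sketch below hard-codes the three directories from the figure and collapses the "i-node ⇒ block ⇒ directory contents" step into one table lookup; the type and function names are invented, and a return value of 0 is used as a hypothetical "not found" marker.

```c
#include <assert.h>
#include <string.h>

struct dirent_sim { const char *name; int ino; };
struct dir_sim    { int ino; const struct dirent_sim *entries; int n; };

/* Directory contents copied from the figure (i-node numbers as shown there). */
static const struct dirent_sim root_entries[] = {
    {".", 1}, {"..", 1}, {"bin", 4}, {"dev", 7},
    {"lib", 14}, {"etc", 9}, {"usr", 6}, {"tmp", 8}
};
static const struct dirent_sim usr_entries[] = {
    {".", 6}, {"..", 1}, {"dick", 19}, {"erik", 30},
    {"jim", 51}, {"ast", 26}, {"bal", 45}
};
static const struct dirent_sim ast_entries[] = {
    {".", 26}, {"..", 6}, {"grants", 64}, {"books", 92},
    {"mbox", 60}, {"minix", 81}, {"src", 17}
};
static const struct dir_sim dirs[] = {
    {1, root_entries, 8}, {6, usr_entries, 7}, {26, ast_entries, 7}
};

/* Search one directory (identified by its i-node) for a name; 0 = not found. */
static int dir_lookup(int dir_ino, const char *name, size_t len)
{
    size_t d;
    int e;
    for (d = 0; d < sizeof dirs / sizeof dirs[0]; d++) {
        if (dirs[d].ino != dir_ino)
            continue;
        for (e = 0; e < dirs[d].n; e++)
            if (strlen(dirs[d].entries[e].name) == len &&
                memcmp(dirs[d].entries[e].name, name, len) == 0)
                return dirs[d].entries[e].ino;
    }
    return 0;
}

/* Resolve an absolute path one component at a time, starting at the root. */
int path_to_inode(const char *path)
{
    int ino = 1;                         /* the root directory is i-node 1 in the figure */
    assert(path != NULL && path[0] == '/');
    while (*path != '\0' && ino != 0) {
        size_t len;
        while (*path == '/')
            path++;                      /* skip separators */
        len = strcspn(path, "/");
        if (len == 0)
            break;                       /* trailing slash or end of path */
        ino = dir_lookup(ino, path, len);
        path += len;
    }
    return ino;
}
```

Each component costs one directory search, which is the repeated work that open() avoids by doing the lookup once.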

Page 42: Operating Systems Slides 7 - File Systems

File Sharing— Multiple Users

User IDs identify users, allowing permissions and protections to be per-user
Group IDs allow users to be in groups, permitting group access rights

Example: 9-bit pattern

owner access    7 ⇒ rwx (1 1 1)
group access    5 ⇒ r-x (1 0 1)
public access   0 ⇒ --- (0 0 0)
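
Decoding the 9-bit pattern is a matter of testing one bit per position. A minimal sketch follows; the function name mode_to_string is made up for the example.

```c
#include <assert.h>
#include <string.h>

/* Render the 9 permission bits of a mode as text such as "rwxr-x---".
 * out must have room for 10 bytes (9 characters plus the terminator). */
void mode_to_string(unsigned mode, char out[10])
{
    static const char symbols[] = "rwxrwxrwx";
    int i;
    assert(out != NULL);
    memcpy(out, "---------", 10);        /* nine dashes plus the terminator */
    for (i = 0; i < 9; i++)
        if (mode & (1u << (8 - i)))      /* bit 8 is owner-read, bit 0 is public-execute */
            out[i] = symbols[i];
}
```

The slide's example, owner 7, group 5, public 0, is the octal mode 0750 and decodes to "rwxr-x---".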

Page 43: Operating Systems Slides 7 - File Systems

File Sharing— Remote File Systems

Uses networking to allow file system access between systems
  ▶ Manually via programs like FTP
  ▶ Automatically, seamlessly using distributed file systems
  ▶ Semi-automatically, via the World Wide Web
Client-server model allows clients to mount remote file systems from servers
  ▶ NFS: standard UNIX client-server file sharing protocol
  ▶ CIFS: standard Windows protocol
  ▶ Standard system calls are translated into remote calls
Distributed Information Systems (distributed naming services)
  ▶ such as LDAP, DNS, NIS, and Active Directory, implement unified access to information needed for remote computing

Page 44: Operating Systems Slides 7 - File Systems

File Sharing— Protection

▶ File owner/creator should be able to control:
  ▶ what can be done
  ▶ by whom
▶ Types of access
  ▶ Read
  ▶ Write
  ▶ Execute
  ▶ Append
  ▶ Delete
  ▶ List

Page 45: Operating Systems Slides 7 - File Systems

Shared Files— Hard Links vs. Soft Links

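[Figure: Fig. 6-18. File system containing a shared file: a directory tree under the root directory with directories and files owned by A, B, and C, in which one file is reachable from both a B directory and a C directory.]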

Page 46: Operating Systems Slides 7 - File Systems

Hard Links

Hard links: directory entries that refer to the same inode

Page 47: Operating Systems Slides 7 - File Systems

Drawback

[Figure: (a) C's directory points to the file: Owner = C, Count = 1. (b) B's directory and C's directory both point to the file: Owner = C, Count = 2. (c) Only B's directory points to the file: Owner = C, Count = 1.]

Page 48: Operating Systems Slides 7 - File Systems

Symbolic Links

A symbolic link has its own inode + a directory entry.

Page 49: Operating Systems Slides 7 - File Systems

Disk Space Management— Statistics

Page 50: Operating Systems Slides 7 - File Systems

▶ Block size is chosen while creating the FS
▶ Disk I/O performance conflicts with space utilization
  ▶ smaller block size ⇒ better space utilization
  ▶ larger block size ⇒ better disk I/O performance

~$ dumpe2fs /dev/sda1 | grep "Block size"
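
The space side of this trade-off is easy to quantify: a file always occupies a whole number of blocks, so the tail of its last block is wasted. The helper names below are made up for the illustration.

```c
#include <assert.h>

/* Number of blocks needed to hold size bytes (rounded up). */
long blocks_needed(long size, long block_size)
{
    assert(size >= 0 && block_size > 0);
    return (size + block_size - 1) / block_size;
}

/* Bytes allocated but unused in the last block (internal fragmentation). */
long wasted_bytes(long size, long block_size)
{
    return blocks_needed(size, block_size) * block_size - size;
}
```

A 5000-byte file needs five 1KB blocks and wastes 120 bytes, but with 4KB blocks it needs only two blocks (fewer I/O operations) and wastes 3192 bytes.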

Page 51: Operating Systems Slides 7 - File Systems

Keeping Track of Free Blocks

1. Linked List10.5 Free-Space Management 443

0 1 2 3

4 5 7

8 9 10 11

12 13 14

16 17 18 19

20 21 22 23

24 25 26 27

28 29 30 31

15

6

free-space list head

Figure 10.10 Linked free-space list on disk.

of a large number of free blocks can now be found quickly, unlike the situation when the standard linked-list approach is used.

10.5.4 Counting

Another approach takes advantage of the fact that, generally, several contiguous blocks may be allocated or freed simultaneously, particularly when space is allocated with the contiguous-allocation algorithm or through clustering. Thus, rather than keeping a list of n free disk addresses, we can keep the address of the first free block and the number (n) of free contiguous blocks that follow the first block. Each entry in the free-space list then consists of a disk address and a count. Although each entry requires more space than would a simple disk address, the overall list is shorter, as long as the count is generally greater than 1. Note that this method of tracking free space is similar to the extent method of allocating blocks. These entries can be stored in a B-tree, rather than a linked list, for efficient lookup, insertion, and deletion.

10.5.5 Space Maps

Sun’s ZFS file system was designed to encompass huge numbers of files, directories, and even file systems (in ZFS, we can create file-system hierarchies). The resulting data structures could have been large and inefficient if they had not been designed and implemented properly. On these scales, metadata I/O can have a large performance impact. Consider, for example, that if the free-space list is implemented as a bit map, bit maps must be modified both when blocks are allocated and when they are freed. Freeing 1 GB of data on a 1-TB disk could cause thousands of blocks of bit maps to be updated, because those data blocks could be scattered over the entire disk.

2. Bit map (n blocks)

0 1 2 3 4 5 6 7 8 .. n-1

+-+-+-+-+-+-+-+-+-+-//-+-+

|0|0|1|0|1|1|1|0|1| .. |0|

+-+-+-+-+-+-+-+-+-+-//-+-+

\[
\text{bit}[i] =
\begin{cases}
0 \Rightarrow \text{block}[i] \text{ is free} \\
1 \Rightarrow \text{block}[i] \text{ is occupied}
\end{cases}
\]
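
A bit map like the one above can be sketched in a few lines. This is an illustration only: the class name `BlockBitmap` and the first-fit search are assumptions, not how any particular file system does it.

```python
class BlockBitmap:
    """One bit per block: 0 = free, 1 = occupied."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.bits = bytearray((n_blocks + 7) // 8)   # all zero: every block is free

    def is_free(self, i):
        return (self.bits[i // 8] >> (i % 8)) & 1 == 0

    def allocate(self):
        """Mark the first free block as occupied and return its number (None if full)."""
        for i in range(self.n_blocks):
            if self.is_free(i):
                self.bits[i // 8] |= 1 << (i % 8)
                return i
        return None

    def free(self, i):
        """Clear the bit of block i."""
        self.bits[i // 8] &= ~(1 << (i % 8)) & 0xFF

bm = BlockBitmap(10)
first, second = bm.allocate(), bm.allocate()
bm.free(first)
print(first, second, bm.is_free(first), bm.is_free(second))
```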

51 / 78

Page 52: Operating Systems Slides 7 - File Systems

Journaling File Systems

.Operations required to remove a file in UNIX:..

......

1. Remove the file from its directory
   - set inode number to 0

2. Release the i-node to the pool of free i-nodes
   - clear the bit in inode bitmap

3. Return all the disk blocks to the pool of free disk blocks
   - clear the bits in block bitmap

What if a crash occurs between 1 and 2, or between 2 and 3?

52 / 78

Page 53: Operating Systems Slides 7 - File Systems

Journaling File Systems

.Keep a log of what the file system is going to do before it does it..

......

▶ so that if the system crashes before it can do its planned work, upon rebooting the system can look in the log to see what was going on at the time of the crash and finish the job.

▶ NTFS, EXT3, and ReiserFS, among others, use journaling
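
The idea can be sketched with a toy model of the three-step file removal. Everything here is an assumption made for illustration: `TinyFS`, its fields and the `crash_after` switch are hypothetical, and Python sets stand in for the on-disk bitmaps. Each step is written so that repeating it is harmless, which is what lets recovery simply redo the whole logged operation.

```python
class TinyFS:
    """Toy in-memory model of the structures touched when a file is removed."""

    def __init__(self):
        self.directory = {"a.txt": 7}        # name -> inode number
        self.blocks_of = {7: (100, 101)}     # inode number -> data blocks
        self.used_inodes = {7}               # stands in for the inode bitmap
        self.used_blocks = {100, 101}        # stands in for the block bitmap
        self.journal = []                    # stands in for the on-disk log

    def _apply(self, entry, crash_after=None):
        name, inode, blocks = entry
        steps = [
            lambda: self.directory.pop(name, None),              # 1. directory entry
            lambda: self.used_inodes.discard(inode),             # 2. inode bitmap
            lambda: self.used_blocks.difference_update(blocks),  # 3. block bitmap
        ]
        for k, step in enumerate(steps, start=1):
            step()
            if crash_after == k:
                return False     # simulated crash: the later steps never run
        return True

    def remove(self, name, crash_after=None):
        inode = self.directory[name]
        entry = (name, inode, self.blocks_of[inode])
        self.journal.append(entry)             # log the intent before doing the work
        if self._apply(entry, crash_after):
            self.journal.remove(entry)         # finished: the log entry can be dropped

    def recover(self):
        """After a reboot: redo every operation still present in the log."""
        for entry in list(self.journal):
            self._apply(entry)
            self.journal.remove(entry)

fs = TinyFS()
fs.remove("a.txt", crash_after=1)   # crash between steps 1 and 2
print("after crash:   ", fs.used_inodes, fs.used_blocks)
fs.recover()
print("after recovery:", fs.used_inodes, fs.used_blocks)
```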

53 / 78

Page 54: Operating Systems Slides 7 - File Systems

Ext2 File System

.Physical Layout..

......

+------------+---------------+---------------+--//--+---------------+
| Boot Block | Block Group 0 | Block Group 1 |      | Block Group n |
+------------+---------------+---------------+--//--+---------------+
 __________________________/                   \_____________
/                                                            \
+-------+-------------+------------+--------+-------+--------+
| Super | Group       | Data Block | inode  | inode | Data   |
| Block | Descriptors | Bitmap     | Bitmap | Table | Blocks |
+-------+-------------+------------+--------+-------+--------+
  1 blk     n blks        1 blk      1 blk   n blks   n blks

54 / 78

Page 55: Operating Systems Slides 7 - File Systems

Ext2 Block groups

.The partition is divided into Block Groups..

......

▶ Block groups are the same size (easy locating)
▶ Kernel tries to keep a file’s data blocks in the same block group (reduces fragmentation)
▶ Backup critical info in each block group
▶ The Ext2 inodes for each block group are kept in the inode table
▶ The inode-bitmap keeps track of allocated and unallocated inodes

55 / 78

Page 56: Operating Systems Slides 7 - File Systems

.Group descriptor..

......

▶ Each block group has a group descriptor
▶ All the group descriptors together make the group descriptor table
▶ The table is stored along with the superblock
▶ Block Bitmap: tracks free blocks
▶ Inode Bitmap: tracks free inodes
▶ Inode Table: all inodes in this block group
▶ Free blocks count, Free Inodes count, Used directory count: counters
▶ see more: ∼# dumpe2fs /dev/sda1

56 / 78

Page 57: Operating Systems Slides 7 - File Systems

Ext2 Block Allocation Policies

[Figure 15.9: ext2fs block-allocation policies. Two bitmap rows: allocating scattered free blocks, and allocating continuous free blocks. Legend: block in use, free block, block selected by allocator, bit boundary, byte boundary, bitmap search.]

these extra blocks to the file. This preallocation helps to reduce fragmentation during interleaved writes to separate files and also reduces the CPU cost of disk allocation by allocating multiple blocks simultaneously. The preallocated blocks are returned to the free-space bitmap when the file is closed.

Figure 15.9 illustrates the allocation policies. Each row represents a sequence of set and unset bits in an allocation bitmap, indicating used and free blocks on disk. In the first case, if we can find any free blocks sufficiently near the start of the search, then we allocate them no matter how fragmented they may be. The fragmentation is partially compensated for by the fact that the blocks are close together and can probably all be read without any disk seeks, and allocating them all to one file is better in the long run than allocating isolated blocks to separate files once large free areas become scarce on disk. In the second case, we have not immediately found a free block close by, so we search forward for an entire free byte in the bitmap. If we allocated that byte as a whole, we would end up creating a fragmented area of free space between it and the allocation preceding it, so before allocating we back up to make this allocation flush with the allocation preceding it, and then we allocate forward to satisfy the default allocation of eight blocks.
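
The two cases can be sketched as follows. This is a rough illustration of the policy as described in the paragraph above, not the kernel's code: the function name, the list-of-booleans bitmap and especially the `near` window (how close counts as "sufficiently near") are assumptions.

```python
def ext2_like_allocate(bitmap, goal, count=8, near=16):
    """bitmap[i] is True when block i is in use. Returns the blocks chosen.

    `near` is a hypothetical window size; it should be at least 1.
    """
    n = len(bitmap)
    # Case 1: take whatever free blocks lie close to the goal, however scattered.
    chosen = [i for i in range(goal, min(goal + near, n)) if not bitmap[i]][:count]
    if not chosen:
        # Case 2: search forward for an entirely free byte (8 aligned blocks).
        for byte in range(goal // 8 + 1, n // 8):
            lo = byte * 8
            if not any(bitmap[lo:lo + 8]):
                # Back up so the allocation is flush with the preceding one.
                while lo > 0 and not bitmap[lo - 1]:
                    lo -= 1
                # Allocate forward, up to the default of `count` blocks.
                i = lo
                while i < n and len(chosen) < count and not bitmap[i]:
                    chosen.append(i)
                    i += 1
                break
    for i in chosen:
        bitmap[i] = True
    return chosen

scattered = [True] * 48
scattered[3] = scattered[6] = False
print(ext2_like_allocate(scattered, goal=0))     # case 1: nearby scattered blocks

far = [True] * 20 + [False] * 28
print(ext2_like_allocate(far, goal=0))           # case 2: backs up from block 24
```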

15.7.3 Journaling

One popular feature in a file system is journaling, whereby modifications to the file system are sequentially written to a journal. A set of operations that performs a specific task is a transaction. Once a transaction is written to the journal, it is considered to be committed, and the system call modifying the file system (write()) can return to the user process, allowing it to continue execution. Meanwhile, the journal entries relating to the transaction are replayed across the actual file-system structures. As the changes are made, a

57 / 78

Page 58: Operating Systems Slides 7 - File Systems

Maths

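Given block size = 4k and block bitmap = 1 blk, then

\[ \text{blocks per group} = 8\,\text{bits} \times 4\text{k} = 32\text{k} \]

How large is a group?

\[ \text{group size} = 32\text{k} \times 4\text{k} = 128\,\text{MB} \]

How many block groups are there?

\[ \approx \frac{\text{partition size}}{\text{group size}} = \frac{\text{partition size}}{128\text{M}} \]

How many files can I have at most?

\[ \approx \frac{\text{partition size}}{\text{block size}} = \frac{\text{partition size}}{4\text{k}} \]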
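
The same arithmetic as a short script. The 100 GiB partition is a made-up example; rounding the group count up is an assumption of this sketch, since the slide only gives the approximate quotient.

```python
KIB = 1024
GIB = 1024 ** 3

def ext2_group_math(partition_size, block_size=4 * KIB):
    blocks_per_group = 8 * block_size              # one bitmap block, one bit per block
    group_size = blocks_per_group * block_size     # bytes covered by one block group
    n_groups = -(-partition_size // group_size)    # rounded up (assumption)
    max_files = partition_size // block_size       # upper bound: one block per file
    return blocks_per_group, group_size, n_groups, max_files

bpg, gsize, groups, files = ext2_group_math(100 * GIB)   # hypothetical partition
print(bpg, gsize // (KIB * KIB), groups, files)
```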

58 / 78

Page 59: Operating Systems Slides 7 - File Systems

Ext2 inode

59 / 78

Page 60: Operating Systems Slides 7 - File Systems

.Ext2 inode..

......

Mode: holds two pieces of information
  1. Is it a {file|dir|sym-link|blk-dev|char-dev|FIFO}?
  2. Permissions

Owner info: Owners’ ID of this file or directory

Size: The size of the file in bytes

Timestamps: Accessed, created, last modified time

Datablocks: 15 pointers to data blocks (12 + S + D + T)

60 / 78

Page 61: Operating Systems Slides 7 - File Systems

.Max File Size..

......

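Given: block size = 4k, pointer size = 4B

We get:

\[
\begin{aligned}
\text{Max File Size} &= \text{number of pointers} \times \text{block size} \\
&= (\overbrace{\underbrace{12}_{\text{direct}} + \underbrace{1\text{k}}_{\text{1-indirect}} + \underbrace{1\text{k} \times 1\text{k}}_{\text{2-indirect}} + \underbrace{1\text{k} \times 1\text{k} \times 1\text{k}}_{\text{3-indirect}}}^{\text{number of pointers}}) \times 4\text{k} \\
&= 48\text{k} + 4\text{M} + 4\text{G} + 4\text{T}
\end{aligned}
\]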
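
The calculation is easy to replay for other block sizes. A minimal sketch of the pointer-structure bound only; other limits in a real implementation may be lower.

```python
def max_file_size(block_size=4096, pointer_size=4, direct=12):
    """Bytes addressable through the 12 direct and the S, D, T indirect pointers."""
    per_block = block_size // pointer_size     # pointers held by one indirect block
    pointers = direct + per_block + per_block ** 2 + per_block ** 3
    return pointers * block_size

K, M, G, T = 1024, 1024 ** 2, 1024 ** 3, 1024 ** 4
size = max_file_size()
print(size == 48 * K + 4 * M + 4 * G + 4 * T)   # matches the slide's sum
print(max_file_size(block_size=1024))           # same formula with 1k blocks
```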

61 / 78

Page 62: Operating Systems Slides 7 - File Systems

Ext2 Superblock

▶ Magic Number: 0xEF53
▶ Revision Level: determines what new features are available
▶ Mount Count and Maximum Mount Count: determines if the system should be fully checked
▶ Block Group Number: indicates the block group holding this superblock
▶ Block Size: usually 4k
▶ Blocks per Group: 8 bits × block size
▶ Free Blocks: System-wide free blocks
▶ Free Inodes: System-wide free inodes
▶ First Inode: First inode number in the file system
▶ see more: ∼# dumpe2fs /dev/sda1

62 / 78

Page 63: Operating Systems Slides 7 - File Systems

Ext2 File Types

File type   Description
0           Unknown
1           Regular file
2           Directory
3           Character device
4           Block device
5           Named pipe
6           Socket
7           Symbolic link

Device file, pipe, and socket: No data blocks are required. All info is stored in the inode

Fast symbolic link: Short path name (< 60 chars) needs no data block. Can be stored in the 15 pointer fields

63 / 78

Page 64: Operating Systems Slides 7 - File Systems

Ext2 Directories

 0             11|12            23|24                 39|40
+----+--+-+-+----+----+--+-+-+----+----+--+-+-+----+----+--//--+
| 21 |12|1|2|.   | 22 |12|2|2|..  | 53 |16|5|2|hell|o   |      |
+----+--+-+-+----+----+--+-+-+----+----+--+-+-+----+----+--//--+

     ,--------> inode number
     |    ,---> record length
     |    | ,---> name length
     |    | | ,---> file type
     |    | | |  ,----> name
  +----+--+-+-+----+
 0| 21 |12|1|2|.   |
  +----+--+-+-+----+
12| 22 |12|2|2|..  |
  +----+--+-+-+----+----+
24| 53 |16|5|2|hell|o   |
  +----+--+-+-+----+----+
40| 67 |28|3|2|usr |
  +----+--+-+-+----+----+
52|  0 |16|7|1|oldf|ile |
  +----+--+-+-+----+----+
68| 34 |12|4|2|sbin|
  +----+--+-+-+----+

▶ Directories are special files
▶ “.” and “..” come first
▶ Entries are padded to a multiple of 4 bytes
▶ inode number is 0: deleted file
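
The record layout in the diagram (4-byte inode number, 2-byte record length, 1-byte name length, 1-byte file type, then the name) can be walked with `struct`. A sketch assuming little-endian fields; the helper names are made up, and the directory block is rebuilt by hand from the values in the diagram.

```python
import struct

HEADER = struct.Struct("<IHBB")   # inode, rec_len, name_len, file_type

def pack_entry(inode, rec_len, name, file_type, size=None):
    """Build one entry, zero-padded to `size` bytes (defaults to rec_len)."""
    raw = HEADER.pack(inode, rec_len, len(name), file_type) + name
    return raw.ljust(size if size is not None else rec_len, b"\0")

def parse_dir(block):
    """Follow rec_len from entry to entry; skip entries whose inode is 0."""
    entries, off = [], 0
    while off + HEADER.size <= len(block):
        inode, rec_len, name_len, _ftype = HEADER.unpack_from(block, off)
        if rec_len < HEADER.size:
            break                          # corrupt record: stop instead of looping
        if inode != 0:
            start = off + HEADER.size
            entries.append((off, inode, block[start:start + name_len].decode()))
        off += rec_len
    return entries

block = b"".join([
    pack_entry(21, 12, b".", 2),
    pack_entry(22, 12, b"..", 2),
    pack_entry(53, 16, b"hello", 2),
    pack_entry(67, 28, b"usr", 2, size=12),   # rec_len 28 also covers the next entry
    pack_entry(0, 16, b"oldfile", 1),         # deleted entry, never visited
    pack_entry(34, 12, b"sbin", 2),
])
for entry in parse_dir(block):
    print(entry)
```

Because the record length of usr is 28, the walk jumps from offset 40 straight to offset 68 and the deleted oldfile entry at offset 52 is never looked at.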

64 / 78

Page 65: Operating Systems Slides 7 - File Systems

Many different FS are in use

.Windows........

uses drive letter (C:, D:, ...) to identify each FS

.UNIX..

......

integrates multiple FS into a single structure
▶ From user’s view, there is only one FS hierarchy

∼$ man fs

65 / 78

Page 66: Operating Systems Slides 7 - File Systems

Virtual File Systems

.Put common parts of all FS in a separate layer..

......

▶ It’s a layer in the kernel
▶ It’s a common interface to several kinds of file systems
▶ It calls the underlying concrete FS to actually manage the data

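[Figure: a user process issues POSIX calls to the virtual file system; through the VFS interface, the VFS calls the concrete file systems FS 1, FS 2 and FS 3, which sit above the buffer cache.]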

66 / 78

Page 67: Operating Systems Slides 7 - File Systems

67 / 78

Page 68: Operating Systems Slides 7 - File Systems

.Virtual File System..

......

▶ Manages kernel level file abstractions in one format for all file systems

▶ Receives system call requests from user level (e.g. write, open, stat, link)

▶ Interacts with a specific file system based on mount point traversal

▶ Receives requests from other parts of the kernel, mostly from memory management

.Real File Systems..

......

▶ managing file & directory data
▶ managing meta-data: timestamps, owners, protection, etc.
▶ disk data, NFS data... ⟷ (translate) VFS data

68 / 78

Page 69: Operating Systems Slides 7 - File Systems

File System Mounting

[Figure: directory trees of a hard disk and of a diskette (x, y, z), drawn before and after the diskette is mounted into the hard disk's tree.]

Fig. 10-26. (a) Separate file systems. (b) After mounting.

69 / 78

Page 70: Operating Systems Slides 7 - File Systems

A FS must be mounted before it can be used

.Mount: the file system is registered with the VFS..

......

▶ The superblock is read into the VFS superblock
▶ The table of addresses of functions the VFS requires is read into the VFS superblock
▶ The FS’ topology info is mapped onto the VFS superblock data structure

.The VFS keeps a list of the mounted file systems together with their superblocks..

......

The VFS superblock contains:
▶ Device, blocksize
▶ Pointer to the root inode
▶ Pointer to a set of superblock routines
▶ Pointer to file_system_type data structure
▶ more...

70 / 78

Page 71: Operating Systems Slides 7 - File Systems

V-node

▶ Every file/directory in the VFS has a VFS inode, kept in the VFS inode cache

▶ The real FS builds the VFS inode from its own info

.Like the EXT2 inodes, the VFS inodes describe..

......

▶ files and directories within the system
▶ the contents and topology of the Virtual File System

71 / 78

Page 72: Operating Systems Slides 7 - File Systems

VFS Operation

[Figure: a file descriptor in the process table refers to a v-node in the VFS; the v-node's function pointers (open, read, write) lead to the read function of FS 1 (call from VFS into FS 1).]
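
The dispatch through function pointers can be mimicked in a few lines. A sketch under loud assumptions: `VNode`, `vfs_read`, the two toy file systems and their operation tables are all hypothetical names; the kernel does this with C structures of function pointers.

```python
class VNode:
    def __init__(self, ops, data):
        self.ops = ops     # table of function pointers supplied by the concrete FS
        self.data = data   # FS-private state the VFS never looks into

def fs1_read(data, offset, size):
    """Toy FS 1 keeps the file as one byte string."""
    return data[offset:offset + size]

def fs2_read(chunks, offset, size):
    """Toy FS 2 keeps the file as a list of chunks."""
    return b"".join(chunks)[offset:offset + size]

FS1_OPS = {"read": fs1_read}
FS2_OPS = {"read": fs2_read}

def vfs_read(fd_table, fd, offset, size):
    """File descriptor -> v-node -> read function of whichever FS owns the file."""
    vnode = fd_table[fd]
    return vnode.ops["read"](vnode.data, offset, size)

fd_table = {
    2: VNode(FS1_OPS, b"hello world"),
    4: VNode(FS2_OPS, [b"abc", b"def"]),
}
print(vfs_read(fd_table, 2, 0, 5))
print(vfs_read(fd_table, 4, 2, 3))
```

The caller of `vfs_read` never needs to know which file system sits behind a descriptor; only the operation table differs.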

72 / 78

Page 73: Operating Systems Slides 7 - File Systems

Linux VFS

.The Common File Model..

......

All other filesystems must map their own concepts into the common file model

For example, FAT filesystems do not have inodes.

▶ The main components of the common file model are
  - superblock – information about mounted filesystem
  - inode – information about a specific file
  - file – information about an open file
  - dentry – information about directory entry

▶ Geared toward Unix FS

73 / 78

Page 74: Operating Systems Slides 7 - File Systems

.The Superblock Object..

......

▶ is implemented by each FS and is used to store information describing that specific FS

▶ usually corresponds to the filesystem superblock or the filesystem control block

▶ Filesystems that are not disk-based (such as sysfs, proc) generate the superblock on-the-fly and store it in memory

▶ struct super_block in <linux/fs.h>
▶ s_op in struct super_block + struct super_operations: the superblock operations table
▶ Each item in this table is a pointer to a function that operates on a superblock object

74 / 78

Page 75: Operating Systems Slides 7 - File Systems

.The Inode Object..

......

▶ For Unix-style filesystems, this information is simply read from the on-disk inode

▶ For others, the inode object is constructed in memory in whatever manner is applicable to the filesystem

▶ struct inode in <linux/fs.h>
▶ An inode represents each file on a FS, but the inode object is constructed in memory only as files are accessed

▶ includes special files, such as device files or pipes
▶ i_op + struct inode_operations

75 / 78

Page 76: Operating Systems Slides 7 - File Systems

.The Dentry Object..

......

▶ components in a path
▶ makes path name lookup easier
▶ struct dentry in <linux/dcache.h>
▶ created on-the-fly from a string representation of a path name

76 / 78

Page 77: Operating Systems Slides 7 - File Systems

.Dentry State..

......

▶ used
▶ unused
▶ negative

.Dentry Cache..

......

consists of three parts:
1. Lists of “used” dentries
2. A doubly linked “least recently used” list of unused and negative dentry objects
3. A hash table and hashing function used to quickly resolve a given path into the associated dentry object
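
How the hash table and the LRU list work together can be sketched as below. This is a simplification with hypothetical names (`Dentry`, `DentryCache`, `max_unused`, the `resolve` callback): it keys the table by the whole path and uses a use counter in place of the lists of used dentries.

```python
from collections import OrderedDict

class Dentry:
    def __init__(self, path, inode):
        self.path = path
        self.inode = inode     # None marks a negative dentry (name does not exist)
        self.count = 0         # > 0 means "used"

class DentryCache:
    def __init__(self, max_unused=2):
        self.hash = {}               # part 3: path -> dentry
        self.lru = OrderedDict()     # part 2: unused and negative dentries, oldest first
        self.max_unused = max_unused

    def lookup(self, path, resolve):
        d = self.hash.get(path)
        if d is None:                          # miss: ask the real file system
            d = Dentry(path, resolve(path))
            self.hash[path] = d
        self.lru.pop(path, None)               # it is being touched right now
        if d.inode is None:
            self._park(d)                      # negative dentries have no users
        else:
            d.count += 1                       # dentry is now "used"
        return d

    def put(self, d):
        """Drop one reference; an unreferenced dentry becomes "unused"."""
        d.count -= 1
        if d.count == 0:
            self._park(d)

    def _park(self, d):
        self.lru[d.path] = d
        self.lru.move_to_end(d.path)
        while len(self.lru) > self.max_unused:
            old_path, _ = self.lru.popitem(last=False)   # evict least recently used
            del self.hash[old_path]

names = {"/bin/ls": 12}                        # hypothetical name -> inode table
cache = DentryCache(max_unused=1)
d = cache.lookup("/bin/ls", names.get)
cache.lookup("/nope", names.get)               # cached as a negative dentry
cache.put(d)                                   # /bin/ls becomes unused, /nope is evicted
print(sorted(cache.hash))
```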

77 / 78

Page 78: Operating Systems Slides 7 - File Systems

.The File Object..

......

▶ is the in-memory representation of an open file
▶ open() ⇒ create; close() ⇒ destroy
▶ there can be multiple file objects in existence for the same file
  ▶ Because multiple processes can open and manipulate a file at the same time
▶ struct file in <linux/fs.h>
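
That separate opens of one file are separate objects shows up in their independent file offsets. A small user-space sketch, assuming a POSIX-like system; the file name `f` and its contents are made up, and a single process opens the file twice for brevity.

```python
import os
import tempfile

def independent_offsets():
    """Open the same file twice and read through both descriptors."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "f")            # hypothetical file
        with open(path, "wb") as f:
            f.write(b"abcdef")
        fd1 = os.open(path, os.O_RDONLY)       # first open(): one open-file object
        fd2 = os.open(path, os.O_RDONLY)       # second open(): another one
        try:
            first = os.read(fd1, 3)            # advances only fd1's offset
            second = os.read(fd2, 3)           # fd2 still starts at offset 0
            third = os.read(fd1, 3)            # fd1 continues where it stopped
        finally:
            os.close(fd1)
            os.close(fd2)
        return first, second, third

print(independent_offsets())
```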

78 / 78