File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory...

24
File Systems and Disk Layout File Systems and Disk Layout

Transcript of File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory...

Page 1: File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory Disk Controller Disk Graphics Controller Network Interface.

File Systems and Disk LayoutFile Systems and Disk Layout

Page 2: File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory Disk Controller Disk Graphics Controller Network Interface.

I/O: The Big PictureI/O: The Big Picture

I/O Bus

Memory Bus

Processor

Cache

MainMemory

DiskController

Disk Disk

GraphicsController

NetworkInterface

Graphics Network

interrupts

I/O Bridge

Page 3: File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory Disk Controller Disk Graphics Controller Network Interface.

Rotational MediaRotational Media

SectorTrack

Cylinder

HeadPlatter

Arm

Access time = seek time + rotational delay + transfer time

seek time = 5-15 milliseconds to move the disk arm and settle on a cylinderrotational delay = 8 milliseconds for full rotation at 7200 RPM: average delay = 4 mstransfer time = 1 millisecond for an 8KB block at 8 MB/s

Bandwidth utilization is less than 50% for any noncontiguous access at a block grain.

Page 4: File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory Disk Controller Disk Graphics Controller Network Interface.

Disks and DriversDisks and Drivers

Disk hardware and driver software provide basic facilities for nonvolatile secondary storage (block devices).

1. OS views the block devices as a collection of volumes.A logical volume may be a partition of a single disk or a

concatenation of multiple physical disks (e.g., RAID).

2. OS accesses each volume as an array of fixed-size sectors.Identify sector (or block) by unique (volumeID, sector

ID).

Read/write operations DMA data to/from physical memory.

3. Device interrupts OS on I/O completion.ISR wakes up process, updates internal records, etc.

Page 5: File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory Disk Controller Disk Graphics Controller Network Interface.

Using Disk StorageUsing Disk Storage

Typical operating systems use disks in three different ways:

1. System calls allow user programs to access a “raw” disk. Unix: special device file identifies volume directly.

Any process that can open the device file can read or write any specific sector in the disk volume.

2. OS uses disk as backing storage for virtual memory.OS manages volume transparently as an “overflow area”

for VM contents that do not “fit” in physical memory.

3. OS provides syscalls to create/access files residing on disk.OS file system modules virtualize physical disk storage

as a collection of logical files.

Page 6: File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory Disk Controller Disk Graphics Controller Network Interface.

Unix File SyscallsUnix File Syscalls

int fd; /* file descriptor */fd = open(“/bin/sh”, O_RDONLY, 0);fd = creat(“/tmp/zot”, 0777);unlink(“/tmp/zot”);

char data[bufsize];bytes = read(fd, data, count);bytes = write(fd, data, count);lseek(fd, 50, SEEK_SET);

mkdir(“/tmp/dir”, 0777);rmdir(“/tmp/dir”);

process file descriptor table

system openfile table

/

etc tmpbin

Page 7: File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory Disk Controller Disk Graphics Controller Network Interface.

Nachos File Syscalls/OperationsNachos File Syscalls/Operations

Create(“zot”);

OpenFileId fd;fd = Open(“zot”);Close(fd);

char data[bufsize];Write(data, count, fd);Read(data, count, fd);

Limitations:1. small, fixed-size files and directories2. single disk with a single directory3. stream files only: no seek syscall4. file size is specified at creation time5. no access control, etc.

BitMap

FileSystem

Directory

FileSystem class internal methods:Create(name, size)OpenFile = Open(name)Remove(name)List()

A single 10-entry directory storesnames and disk locations for allcurrently existing files.

Bitmap indicates whether each disk block is in-use or free.

FileSystem data structures resideon-disk, but file system code alwaysoperates on a cached copy in memory(read/modify/write).

Page 8: File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory Disk Controller Disk Graphics Controller Network Interface.

Preview of Issues for File SystemsPreview of Issues for File Systems

1. Buffering disk data for access from the processor.block I/O (DMA) must use aligned, physically resident

buffers

block update is a read-modify-write

2. Creating/representing/destroying independent files.disk block allocation, file block map structures

directories and symbolic naming

3. Masking the high seek/rotational latency of disk access.smart block allocation on disk

block caching, read-ahead (prefetching), and write-behind

4. Reliability and the handling of updates.

Page 9: File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory Disk Controller Disk Graphics Controller Network Interface.

Representing a File On-Disk in NachosRepresenting a File On-Disk in Nachos

FileHdr

Allocate(...,filesize)length = FileLength()sector = ByteToSector(offset)

A file header describes an on-diskfile as an ordered sequence ofsectors with a length, mapped bya logical-to-physical block map.

OpenFile(sector)Seek(offset)Read(char* data, bytes)Write(char* data, bytes)

OpenFile An OpenFile represents a file inactive use, with a seek pointer and read/write primitives for arbitrarybyte ranges.

once upon a time/nin a l

and far far away,/nlived t

he wise and sagewizard.

logicalblock 0

logicalblock 1

logicalblock 2

OpenFile* ofd = filesys->Open(“tale”);ofd->Read(data, 10) gives ‘once upon ‘ofd->Read(data, 10) gives ‘a time/nin ‘

bytessectors

Page 10: File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory Disk Controller Disk Graphics Controller Network Interface.

File MetadataFile Metadata

On disk, each file is represented by a FileHdr structure.

The FileHdr object is an in-memory copy of this structure.

bytessectors

etc.

file attributes: may include owner, access control, time of create/modify/access, etc.

logical-physical block map (like a translation table)

physical block pointers in the block map are sector IDs

FileHdr* hdr = new FileHdr();hdr->FetchFrom(sector)hdr->WriteBack(sector)

The FileHdr is a file system “bookeeping” structurethat supplements the file data itself: these kinds ofstructures are called filesystem metadata.

A Nachos FileHdr occupiesexactly one disk sector.

To operate on the file (e.g.,to open it), the FileHdr mustbe read into memory.

Any changes to the attributesor block map must be writtenback to the disk to make thempermanent.

Page 11: File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory Disk Controller Disk Graphics Controller Network Interface.

Representing Large FilesRepresenting Large Files

The Nachos FileHdr occupies exactly onedisk sector, limiting the maximum file size.

inode

directblockmap

(12 entries)

indirectblock

doubleindirectblock

sector size = 128 bytes120 bytes of block map = 30 entrieseach entry maps a 128-byte sectormax file size = 3840 bytes

In Unix, the FileHdr (called an index-node or inode) represents large files using a hierarchical block map.

Each file system block is a clump of sectors (4KB, 8KB, 16KB).Inodes are 128 bytes, packed into blocks.Each inode has 68 bytes of attributes and 15 block map entries.

suppose block size = 8KB12 direct block map entries in the inode can map 96KB of data.One indirect block (referenced by the inode) can map 16MB of data.One double indirect block pointer in inode maps 2K indirect blocks.

maximum file size is 96KB + 16MB + (2K*16MB) + ...

Page 12: File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory Disk Controller Disk Graphics Controller Network Interface.

Representing Small FilesRepresenting Small Files

Internal fragmentation in the file system blocks can waste significant space for small files.

E.g., 1KB files waste 87% of disk space (and bandwidth) in a naive file system with an 8KB block size.

Most files are small: one study [Irlam93] shows a median of 22KB.

FFS solution: optimize small files for space efficiency.

• Subdivide blocks into 2/4/8 fragments (or just frags).

• Free block maps contain one bit for each fragment.

To determine if a block is free, examine bits for all its fragments.

• The last block of a small file is stored on fragment(s).

If multiple fragments they must be contiguous.

CPS 210

Page 13: File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory Disk Controller Disk Graphics Controller Network Interface.

Basics of DirectoriesBasics of Directories

0

rain: 32

hail: 48

0

wind: 18

snow: 62

directoryfileHdr

A directory is a set of file names, supporting lookup by symbolic name.

In Nachos, each directory is a file containinga set of mappings from name->FileHdr.

sector 32

Directory(entries)sector = Find(name)Add(name, sector)Remove(name)

Each directory entry is a fixed-sizeslot with space for a FileNameMaxLen byte name.

Entries or slots are found by a linear scan.

A directory entry may hold a pointer to another directory,forming a hierarchical name space.

Page 14: File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory Disk Controller Disk Graphics Controller Network Interface.

A Nachos Filesystem On DiskA Nachos Filesystem On Disk

111000100010110110111101

100110100011000100010101

001011100001100101000100

sector 0

allocationbitmap file

0

rain: 32

hail: 48

0

wind: 18

snow: 62

once upon a time/n in a l

and far far away, lived th

sector 1

directoryfile

Every box in this diagramrepresents a disk sector.

An allocation bitmap file maintainsfree/allocated state of each physicalblock; its FileHdr is always stored insector 0.

A directory maintains thename->FileHdr mappings forall existing files; its FileHdr isalways stored in sector 1.

Page 15: File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory Disk Controller Disk Graphics Controller Network Interface.

Unix File Naming (Hard Links)Unix File Naming (Hard Links)

0

rain: 32

hail: 48

0

wind: 18

sleet: 48

inode 48

inode link count = 2

directory A directory B

A Unix file may have multiple names.

link system calllink (existing name, new name)create a new name for an existing fileincrement inode link count

unlink system call (“remove”)unlink(name)destroy directory entrydecrement inode link countif count = 0 and file is not in active usefree blocks (recursively) and on-disk inode

Each directory entry naming thefile is called a hard link.

Each inode contains a reference countshowing how many hard links name it.

Page 16: File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory Disk Controller Disk Graphics Controller Network Interface.

Unix Symbolic (Soft) LinksUnix Symbolic (Soft) Links

Unix files may also be named by symbolic (soft) links.

• A soft link is a file containing a pathname of some other file.

0

rain: 32

hail: 48

inode 48

inode link count = 1

directory A

0

wind: 18

sleet: 67

directory B

../A/hail/0

inode 67

symlink system callsymlink (existing name, new name)allocate a new file (inode) with type symlinkinitialize file contents with existing namecreate directory entry for new file with new name

The target of the link may beremoved at any time, leavinga dangling reference.

How should the kernel handle recursive soft links?

Page 17: File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory Disk Controller Disk Graphics Controller Network Interface.

The Problem of Disk LayoutThe Problem of Disk Layout

The level of indirection in the file block maps allows flexibility in file layout.

“File system design is 99% block allocation.” [McVoy]

Competing goals for block allocation:

• allocation cost

• bandwidth for high-volume transfers

• stamina

• efficient directory operations

Goal: reduce disk arm movement and seek overhead.metric of merit: bandwidth utilization

Page 18: File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory Disk Controller Disk Graphics Controller Network Interface.

FFS and LFSFFS and LFS

We will study two different approaches to block allocation:• Cylinder groups in the Fast File System (FFS) [McKusick81]

clustering enhancements [McVoy91], and improved cluster allocation [McKusick: Smith/Seltzer96]

FFS can also be extended with metadata logging [e.g., Episode]

• Log-Structured File System (LFS)

proposed in [Douglis/Ousterhout90]

implemented/studied in [Rosenblum91]

BSD port, sort of maybe: [Seltzer93]

extended with self-tuning methods [Neefe/Anderson97]

• Other approach: extent-based file systems

CPS 210

Page 19: File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory Disk Controller Disk Graphics Controller Network Interface.

FFS Cylinder GroupsFFS Cylinder Groups

FFS defines cylinder groups as the unit of disk locality, and it factors locality into allocation choices.

• typical: thousands of cylinders, dozens of groups

• Strategy: place “related” data blocks in the same cylinder group whenever possible.

seek latency is proportional to seek distance

• Smear large files across groups:

Place a run of contiguous blocks in each group.

• Reserve inode blocks in each cylinder group.

This allows inodes to be allocated close to their directory entries and close to their data blocks (for small files).

CPS 210

Page 20: File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory Disk Controller Disk Graphics Controller Network Interface.

Sequential File WriteSequential File Write

physicaldisk

sector

time in milliseconds

writewrite stallread

sync command(typed to shell)pushes indirectblocks to disk

read nextblock of

free spacebitmap (??)

note sequential block allocation

sync

Page 21: File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory Disk Controller Disk Graphics Controller Network Interface.

Sequential Writes: A Closer LookSequential Writes: A Closer Look

writewrite stall

140 msdelay for

cylinder seeketc. (???)

longer delayfor head movement

to push indirectblocks

16 MB in one second(one indirect block worth)

time in milliseconds

physicaldisk

sector

Page 22: File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory Disk Controller Disk Graphics Controller Network Interface.

Small-File Create StormSmall-File Create Storm

writewrite stall

time in milliseconds

physicaldisk

sector

sync

sync

syncinodes andfile contents

(localized allocation)

delayed-writemetadata

note synchronouswrites for some

metadata

50 MB

Page 23: File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory Disk Controller Disk Graphics Controller Network Interface.

Small-File Create: A Closer LookSmall-File Create: A Closer Look

time in milliseconds

physicaldisk

sector

Page 24: File Systems and Disk Layout. I/O: The Big Picture I/O Bus Memory Bus Processor Cache Main Memory Disk Controller Disk Graphics Controller Network Interface.

Alternative Structure: DOS FATAlternative Structure: DOS FAT

EOF

13

2

9

8

FREE

4

12

3

FREEEOF

EOF

FREE

snow: 6

rain: 5

hail: 10

Disk Blocks

FAT

0

directory

rootdirectory