UNIVERSITY OF MASSACHUSETTS AMHERST • Department of Computer Science
Software Systems: File Systems and Storage
Emery Berger and Mark Corner
University of Massachusetts Amherst
Files
– Associate names with data
– Usually stored on persistent media (disks)
File Names
Hierarchical directory structure
– Names are absolute, or relative to the current directory
Windows names = location + directory
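A minimal sketch of how an absolute versus a relative name is resolved, using Python's posixpath module for Unix-style names. The directory names are made up for illustration, and normpath works purely on the text of the name (the kernel's real lookup also follows symbolic links):

```python
import posixpath

def resolve(cwd: str, name: str) -> str:
    """Return the absolute path that `name` refers to for a process whose current directory is `cwd`."""
    if posixpath.isabs(name):
        # Absolute names ignore the current directory entirely.
        return posixpath.normpath(name)
    # Relative names start from the current directory; normpath folds "." and "..".
    return posixpath.normpath(posixpath.join(cwd, name))

print(resolve("/home/emery", "docs/notes.txt"))   # /home/emery/docs/notes.txt
print(resolve("/home/emery", "../mark/todo"))     # /home/mark/todo
print(resolve("/home/emery", "/etc/passwd"))      # /etc/passwd
```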
File Systems
An organized set of data structures that:
– organize data
– point to where data is stored
– act as a searchable database of files
LOTS of file systems
– AFS, BFS, CFS, DFS, EFS, FFS, GFS, HFS, etc.
– Distributed, local, encrypted, for different OSs
Directories
A directory is just a special file
– Contains metadata and filenames
• with pointers to inodes
Typically a hierarchical tree
– An odd exposure of a data structure to the user
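To make "a directory is just a special file" concrete, here is a toy model (not any real on-disk format): each directory's contents map filenames to inode numbers, and path lookup walks one component at a time from the root. The inode numbers are hypothetical; the root is inode 2, as in ext2.

```python
from dataclasses import dataclass, field

@dataclass
class Inode:
    kind: str                                     # "dir" or "file"
    entries: dict = field(default_factory=dict)   # directories: filename -> inode number
    data: bytes = b""                             # regular files: contents

# Toy inode table: inode number -> Inode. Inode 2 is the root directory.
inodes = {
    2: Inode("dir", {"home": 11}),
    11: Inode("dir", {"emery": 12}),
    12: Inode("dir", {"notes.txt": 13}),
    13: Inode("file", data=b"hello\n"),
}
ROOT = 2

def lookup(path: str) -> int:
    """Resolve an absolute path to an inode number, one directory entry at a time."""
    ino = ROOT
    for name in path.strip("/").split("/"):
        if not name:
            continue                      # tolerate "//" and the bare root "/"
        node = inodes[ino]
        if node.kind != "dir":
            raise NotADirectoryError(name)
        if name not in node.entries:
            raise FileNotFoundError(path)
        ino = node.entries[name]          # follow the entry's pointer to the next inode
    return ino

print(lookup("/home/emery/notes.txt"))               # 13
print(inodes[lookup("/home/emery/notes.txt")].data)  # b'hello\n'
```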
Blocks
Storage is organized as a sequence of blocks
– Unit of reading and writing
– Read, modify, write sequence: changing part of a block means reading and rewriting the whole block
File system tracks free and full blocks
– Typically stored in a bitmap
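A minimal free-block bitmap, one bit per block, with first-fit allocation. This is an illustrative sketch; real allocators also try to keep a file's blocks close together.

```python
class BlockBitmap:
    """Free-space bitmap: bit i is 1 if block i is in use, 0 if it is free."""

    def __init__(self, nblocks: int):
        self.nblocks = nblocks
        self.bits = bytearray((nblocks + 7) // 8)   # one bit per block, all initially free

    def is_used(self, b: int) -> bool:
        return bool(self.bits[b // 8] & (1 << (b % 8)))

    def allocate(self) -> int:
        """Find the first free block, mark it used, and return its number."""
        for b in range(self.nblocks):
            if not self.is_used(b):
                self.bits[b // 8] |= 1 << (b % 8)
                return b
        raise OSError("no free blocks")

    def free(self, b: int) -> None:
        """Mark block b free again by clearing its bit."""
        self.bits[b // 8] &= ~(1 << (b % 8)) & 0xFF

bm = BlockBitmap(10)
a, b = bm.allocate(), bm.allocate()   # blocks 0 and 1
bm.free(a)
print(a, b, bm.allocate())            # 0 1 0  (the freed block is reused first)
```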
Inodes
On-disk data structure
– Describes where all the bits of a file (or directory) are stored
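The core job of an inode is mapping a byte offset in the file to a disk block. A minimal sketch with direct block pointers only; the 4 KB block size and the block numbers are assumptions, and real inodes (e.g., in ext2) add indirect blocks for large files.

```python
BLOCK_SIZE = 4096   # assumed block size in bytes

class ToyInode:
    """An inode reduced to its size and a list of direct block pointers."""

    def __init__(self, size: int, direct: list):
        self.size = size        # file length in bytes
        self.direct = direct    # direct[i] = disk block holding bytes [i*BLOCK_SIZE, (i+1)*BLOCK_SIZE)

    def block_for(self, offset: int) -> int:
        """Return the disk block that stores byte `offset` of the file."""
        if not 0 <= offset < self.size:
            raise ValueError("offset outside the file")
        return self.direct[offset // BLOCK_SIZE]

ino = ToyInode(size=10_000, direct=[120, 121, 507])   # three blocks, not contiguous on disk
print(ino.block_for(0), ino.block_for(5000), ino.block_for(9999))   # 120 121 507
```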
Storage
Many forms of permanent storage
– Disk drives, flash storage, tape, CDs, DVDs
Storage: Disks
– Seek latency & rotational latency
– High bandwidth
– One of two moving parts in a PC
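A back-of-the-envelope model of one disk read: seek time, plus on average half a revolution of rotational latency, plus transfer time. The parameter values in the example (9 ms seek, 7200 RPM, 100 MB/s) are assumptions for illustration, not measurements of any particular drive.

```python
def access_time_ms(seek_ms: float, rpm: float, nbytes: int, mb_per_s: float) -> float:
    """Estimated time to read nbytes: seek + average rotational latency + transfer."""
    ms_per_rev = 60_000 / rpm                          # one full revolution, in milliseconds
    rotational_ms = ms_per_rev / 2                     # on average, wait half a revolution
    transfer_ms = nbytes / (mb_per_s * 1e6) * 1000     # bytes / (bytes per second), in ms
    return seek_ms + rotational_ms + transfer_ms

print(round(access_time_ms(9, 7200, 4096, 100), 2))          # 13.21   (one 4 KB block)
print(round(access_time_ms(9, 7200, 100_000_000, 100), 2))   # 1013.17 (100 MB sequential)
```

For a small read, positioning (seek plus rotation) dominates; for a large sequential read, bandwidth does. This is why the locality techniques on the following slides matter more for disks than for flash.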
Storage: Flash Memory
– Predictable & low latency (including random access)
– Lower bandwidth
– Larger erase blocks, wears out, energy
Prediction: all PC storage will be flash-based in 10 years
Locality
File systems use the directory structure to improve locality
– More important for disks than for flash
– E.g., ext2:
• clusters all files in the same directory in the same region of the disk
• tries to make all blocks of the same file sequential
• moves directories apart to leave room for expansion
Caching
– Disk blocks, inodes, and directories are all cached
– Typically 1/3 to 1/2 of memory serves as disk cache
– The disk drive has its own cache too!
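A sketch of a fixed-size block cache. The least-recently-used eviction policy is an assumption chosen for illustration (kernels use LRU variants), and the `read_block` function stands in for the disk.

```python
from collections import OrderedDict

class BlockCache:
    """Fixed-size cache of disk blocks with least-recently-used (LRU) eviction."""

    def __init__(self, capacity: int, read_block):
        self.capacity = capacity
        self.read_block = read_block      # function: block number -> bytes (the slow "disk")
        self.blocks = OrderedDict()       # block number -> data, least recently used first
        self.hits = self.misses = 0

    def get(self, bno: int) -> bytes:
        if bno in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(bno)  # mark as most recently used
            return self.blocks[bno]
        self.misses += 1
        data = self.read_block(bno)       # miss: go to the disk
        self.blocks[bno] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)   # evict the least recently used block
        return data

cache = BlockCache(2, lambda b: f"block {b}".encode())
for b in [1, 2, 1, 3, 2]:
    cache.get(b)
print(cache.hits, cache.misses, list(cache.blocks))   # 1 4 [3, 2]
```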
Poor Man’s Database
Because files & directories are easy to use, they get used as de facto databases
– E.g., Internet Explorer web cache
• ~1000 files in each hash subdirectory
C:\Documents and Settings\Emery\Local Settings\Temporary Internet Files\Content.IE5>ls -ltra
total 1873
-rwx------+ 1 Emery None 67 Jan 10 17:31 desktop.ini
drwx------+ 2 Emery None 0 Jan 17 22:42 0NDWKTYT
drwx------+ 7 Emery None 0 Feb 19 19:53 .
drwx------+ 7 Emery None 0 Apr 20 14:45 ..
drwx------+ 2 Emery None 0 May 1 21:41 8HZD6WS6
drwx------+ 2 Emery None 0 May 1 21:54 I4F15DOK
drwx------+ 2 Emery None 0 May 1 22:03 XM0N4Q4W
-rwx------+ 1 Emery None 1916928 May 3 12:21 index.dat
drwx------+ 2 Emery None 0 May 3 12:21 S0RKZRFZ
C:\Documents and Settings\Emery\Local Settings\Temporary Internet Files\Content.IE5>
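A sketch of the "files as a database" pattern: hash each key (here a URL) to pick one of a few subdirectories and a filename, so no single directory grows huge and lookups need no index. This is a generic illustration; the bucket count, naming scheme, and SHA-1 choice are assumptions, not how Internet Explorer actually names its cache files.

```python
import hashlib
import os
import tempfile

NBUCKETS = 4   # assumed number of hash subdirectories

def cache_path(root, url):
    """Map a URL to root/bucketN/<name>; the same URL always maps to the same file."""
    digest = hashlib.sha1(url.encode()).hexdigest()
    bucket = int(digest, 16) % NBUCKETS          # spreads entries across subdirectories
    return os.path.join(root, f"bucket{bucket}", digest[:16])

def store(root, url, data):
    path = cache_path(root, url)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)

def load(root, url):
    with open(cache_path(root, url), "rb") as f:   # raises FileNotFoundError on a miss
        return f.read()

with tempfile.TemporaryDirectory() as root:
    store(root, "http://example.com/", b"<html>...</html>")
    print(load(root, "http://example.com/"))       # b'<html>...</html>'
```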
File Systems Abstraction
File system manages files
– Traditionally: file system maps files to disk
But: files are a convenient abstraction, so many other things use the same, easy interface (read, write)
– Block devices (/dev/scsi0)
• Disk drives: transfer in blocks
– Character devices (/dev/tty)
• Console, printer
– Proc filesystem (/proc/mem)
– FIFOs (named pipes)
Device Files
Unix devices live in /dev and act like ordinary files
elnux14> echo "foo" > /dev/tty
foo
/proc Filesystem
Normal file access to kernel internals
elnux14> ls -l /proc/30917/
total 0
dr-xr-xr-x 2 emery fac 0 May 3 13:18 attr
-r-------- 1 emery fac 0 May 3 13:18 auxv
-r--r--r-- 1 emery fac 0 May 3 13:01 cmdline
lrwxrwxrwx 1 emery fac 0 May 3 13:18 cwd -> /nfs/elsrv4/users5/fac/emery
-r-------- 1 emery fac 0 May 3 13:18 environ
lrwxrwxrwx 1 emery fac 0 May 3 13:18 exe -> /bin/tcsh
dr-x------ 2 emery fac 0 May 3 12:06 fd
-rw-r--r-- 1 emery fac 0 May 3 13:18 loginuid
-r-------- 1 emery fac 0 May 3 13:18 maps
-rw------- 1 emery fac 0 May 3 13:18 mem
-r--r--r-- 1 emery fac 0 May 3 13:18 mounts
lrwxrwxrwx 1 emery fac 0 May 3 13:18 root -> /
-r--r--r-- 1 emery fac 0 May 3 13:01 stat
-r--r--r-- 1 emery fac 0 May 3 13:18 statm
-r--r--r-- 1 emery fac 0 May 3 13:01 status
dr-xr-xr-x 3 emery fac 0 May 3 13:18 task
-r--r--r-- 1 emery fac 0 May 3 13:10 wchan
File Metadata
Files have a lot of associated “metadata”; e.g., on Unix (from stat):
– Times of last status change (not creation), last modification, last access
– Size (bytes)
– User & group ID of the file’s owner
– File type (not content type)
• Directory
• Regular file
• Block / character device (disk drive, screen)
• FIFO
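This metadata is what the stat system call returns. A minimal sketch using Python's os.lstat and the stat module's S_IS* predicates classifies a path by its file-type bits rather than its contents. The file names are made up, and /dev/null is assumed to exist (as on Linux and macOS).

```python
import os
import stat
import tempfile

def file_type(path: str) -> str:
    """Classify a path by the file-type bits in its stat() mode, not by its contents."""
    mode = os.lstat(path).st_mode        # lstat: report on a symbolic link itself, don't follow it
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISFIFO(mode):
        return "FIFO"
    if stat.S_ISLNK(mode):
        return "symbolic link"
    return "other"

with tempfile.TemporaryDirectory() as d:
    reg = os.path.join(d, "notes.txt")
    with open(reg, "w") as f:
        f.write("hello\n")
    fifo = os.path.join(d, "thePipe")
    os.mkfifo(fifo)
    for p in (d, reg, fifo, "/dev/null"):
        st = os.lstat(p)
        print(f"{p}: {file_type(p)}, {st.st_size} bytes, uid={st.st_uid}, mtime={st.st_mtime:.0f}")
```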
Untyped Files
Unix, Windows: file contents are untyped
– Stream of bytes
– Type implied by convention (extensions)
• .ppt, .pdf, …
Mac: file types stored in metadata
Access Control
Unix: each file has associated bits that control access (& other stuff)
– Read
– Write
– Execute
Can specify these for three classes of “users”
– User (file owner)
– Group (set of users)
– Other (everyone else)
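The nine permission bits (read, write, execute for user, group, other) can be decoded the way ls -l prints them. A minimal sketch using the stat module's S_I* constants; it ignores the setuid, setgid, and sticky bits.

```python
import stat

def rwx_string(mode: int) -> str:
    """Render the nine permission bits of `mode` as ls -l does (without the file-type character)."""
    out = []
    for who in ("USR", "GRP", "OTH"):                  # user (owner), group, other
        for perm, ch in (("R", "r"), ("W", "w"), ("X", "x")):
            bit = getattr(stat, f"S_I{perm}{who}")     # e.g. stat.S_IRUSR
            out.append(ch if mode & bit else "-")
    return "".join(out)

print(rwx_string(0o640))   # rw-r-----
print(rwx_string(0o200))   # -w-------
```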
Access Control - chmod
Can read bits via ls, set bits via chmod
elnux14> ls -l ack.scm
-rw-r----- 1 emery fac 197 Feb 25 15:19 ack.scm
elnux14> chmod -r ack.scm
elnux14> ls -l ack.scm
--w------- 1 emery fac 197 Feb 25 15:19 ack.scm
elnux14> cat ack.scm
cat: ack.scm: Permission denied
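The same change can be made programmatically with os.chmod. This sketch clears the read bit for user, group, and other unconditionally; the real `chmod -r` (no "who" letter) also respects the umask, so this is only roughly equivalent. The file name mirrors the session above. As in that session, reading the file afterwards fails with a permission error unless you run as root.

```python
import os
import stat
import tempfile

READ_BITS = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH

def remove_read(path: str) -> int:
    """Clear the read bit for user, group, and other; return the new permission bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)    # permission bits only, no file type
    new_mode = mode & ~READ_BITS
    os.chmod(path, new_mode)
    return new_mode

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "ack.scm")
    open(path, "w").close()
    os.chmod(path, 0o640)
    print(stat.filemode(os.stat(path).st_mode))   # -rw-r-----
    remove_read(path)
    print(stat.filemode(os.stat(path).st_mode))   # --w-------
```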
Access Control Lists (ACLs)
ACLs are more expressive
– Specify different rights per user or group
– Opinion: their absence is one of the biggest UNIX problems
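A toy ACL check, to show what "different rights per user or group" buys over the three owner/group/other classes. The evaluation order (named user first, then any matching group, then other) is a deliberately simplified, hypothetical rule, not the exact semantics of POSIX ACLs or NTFS.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AclEntry:
    kind: str    # "user", "group", or "other"
    name: str    # user or group name; "" for "other"
    perms: str   # any subset of "rwx"

def allowed(acl, user, groups, perm):
    """Simplified check: a named-user entry decides first, then matching groups, then 'other'."""
    for e in acl:
        if e.kind == "user" and e.name == user:
            return perm in e.perms
    matching = [e for e in acl if e.kind == "group" and e.name in groups]
    if matching:
        return any(perm in e.perms for e in matching)
    return any(e.kind == "other" and perm in e.perms for e in acl)

# Hypothetical ACL for ack.scm: Mark gets read access as an individual,
# which owner/group/other bits cannot express without creating a new group.
acl = [
    AclEntry("user", "emery", "rw"),
    AclEntry("user", "mark", "r"),
    AclEntry("group", "fac", "r"),
    AclEntry("other", "", ""),
]
print(allowed(acl, "mark", {"students"}, "r"))   # True
print(allowed(acl, "mark", {"students"}, "w"))   # False
```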
What’s Wrong with One Disk?
Distributed File Systems
Local file systems have numerous drawbacks
– Inconvenient
– Administrative overhead
– Single point of failure
Solution: distributed file systems
– FS appears local, but data is remote
– Two major implementations:
• Windows file sharing (CIFS/SMB; Samba implements it on Unix)
• NFS (Sun’s Network File System)
Lots of “manual” DFSs (rsync, svn, USB keys)
Complications
Complexity and design tradeoffs
– Naming: absolute vs. relative (to server)
– Remote access vs. caching
– Stateless or stateful server
– Single image or replication
Naming & Transparency
Issues
– How are files named?
– Do filenames reveal location?
– Do filenames change if the file moves?
– Do filenames change if the user moves?
Location Naming
Location transparency: use indirection!
– Filename does not reveal storage location
– Normal in Unix
– Compare to Windows: C:\foo\bar
Name may still change
– if the storage location changes
– transparent, not independent!
What Parts Are Transparent?
Windows
– Local: //computer/share/…./directory/file
– Remote: .…/directory/file
• Remote files are explicit!
UNIX
– Local: /…./mountpoint/directory/file
– Remote: /…./directory/file
• Remote files look like any other file
Neither reveals all of the storage location
– Windows reveals the machine, UNIX does not
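On UNIX the client-side name hides the machine because the client keeps a mount table and resolves names by longest-prefix match on mount points. A minimal sketch; the mount table, server name `elsrv4`, and export path `/export` are hypothetical (the mount point echoes the `/nfs/elsrv4/...` path in the /proc listing earlier).

```python
# Hypothetical mount table: local mount point -> (server, directory exported by that server).
MOUNTS = {
    "/": ("localdisk", "/"),
    "/nfs/elsrv4": ("elsrv4", "/export"),
}

def locate(path):
    """Map a client-side absolute path to (server, server-side path) via longest-prefix match."""
    candidates = [m for m in MOUNTS
                  if path == m or path.startswith(m.rstrip("/") + "/")]
    mount = max(candidates, key=len)              # the deepest mount point wins
    server, export = MOUNTS[mount]
    rest = path[len(mount):].lstrip("/")
    if not rest:
        return server, export
    return server, export.rstrip("/") + "/" + rest

print(locate("/nfs/elsrv4/users5/fac/emery"))   # ('elsrv4', '/export/users5/fac/emery')
print(locate("/bin/tcsh"))                      # ('localdisk', '/bin/tcsh')
```

Moving the files to another server changes only the mount table, not the client-side names, which is the indirection the "Location Naming" slide describes.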
NFS Example
URLs Viewed as a File System
Uniform Resource Locator names: an increasingly standard way to access data
– protocol://machine/path/to/file
Good? Bad? Looks like Windows… same?
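Python's urllib.parse.urlsplit pulls a URL apart into exactly the pieces on the slide. Like a Windows share name, the machine is an explicit part of the name. The URL below is only an example.

```python
from urllib.parse import urlsplit

def url_parts(url):
    """Return the (protocol, machine, path) parts of protocol://machine/path/to/file."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path

print(url_parts("http://example.com/path/to/file"))
# ('http', 'example.com', '/path/to/file')
```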
File Caching
Cache information from the file server locally
Local disk:
– Reduces access time (compared to remote)
– Safe if node fails
– Requires the client to have a disk (…)
Local memory:
– Quick
– Works without disks
– Smaller cache size
– Not fault-tolerant
Remote File Access & Caching
Caching issues:
– Performance:
• Where & when to cache file blocks?
– Correctness:
• When to propagate updates back to the remote file?
• What happens with multiple clients sharing?
Sharing with Others
When Do Changes Get Written?
User A opens a file and changes it
– When is the change written to the file server?
– If another user opens the file, does that user see the changes?
Unix/one-copy semantics
– Immediate
• keep in mind UI issues
Session semantics
– After close
Transaction semantics
– Defined by program
– Uncommon in file systems
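A toy model of session semantics: each client fetches a private copy at open and publishes it only at close, so a second client does not see the first client's edits until the file is closed and reopened. The `FileServer` and `SessionClient` classes are hypothetical illustrations, not any real protocol.

```python
class FileServer:
    """Holds the authoritative copy of every file."""
    def __init__(self):
        self.files = {}

class SessionClient:
    """Session semantics: edit a private copy fetched at open(); publish it at close()."""
    def __init__(self, server):
        self.server = server
        self.copies = {}                                      # name -> local working copy

    def open(self, name):
        self.copies[name] = self.server.files.get(name, "")   # fetch the whole file

    def read(self, name):
        return self.copies[name]

    def write(self, name, text):
        self.copies[name] = text                              # local only; invisible to others

    def close(self, name):
        self.server.files[name] = self.copies.pop(name)       # write back on close

server = FileServer()
server.files["cat.txt"] = "The cat is red"
a, b = SessionClient(server), SessionClient(server)
a.open("cat.txt")
a.write("cat.txt", "The cat is brown")
b.open("cat.txt")
print(b.read("cat.txt"))          # The cat is red   (A has not closed yet)
a.close("cat.txt")
print(server.files["cat.txt"])    # The cat is brown
```

Under Unix/one-copy semantics, B would have seen "The cat is brown" immediately.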
How Is the Client Informed?
Client-initiated consistency
– Client contacts the server and checks consistency:
• on every access,
• at given intervals, or
• only upon opening a file
Server-initiated consistency
– Server detects potential conflicts and invalidates caches
– Server needs to know:
• which clients have cached which parts of which files, plus
• which clients are readers & which are writers
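A sketch of server-initiated consistency, similar in spirit to AFS callbacks: the server remembers which clients cache each file and invalidates the other copies when one client writes. For brevity it tracks whole files and does not distinguish readers from writers; all class names are hypothetical.

```python
from collections import defaultdict

class CallbackServer:
    """Remembers who caches each file and invalidates stale copies on every write."""
    def __init__(self):
        self.files = {}
        self.cachers = defaultdict(set)        # file name -> clients holding a cached copy

    def fetch(self, client, name):
        self.cachers[name].add(client)         # record that this client now caches the file
        return self.files.get(name, "")

    def store(self, writer, name, text):
        self.files[name] = text
        for c in self.cachers[name] - {writer}:
            c.invalidate(name)                 # "callback": tell the other cachers to drop it
        self.cachers[name] = {writer}

class CachingClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, name):
        if name not in self.cache:             # miss (or invalidated): ask the server
            self.cache[name] = self.server.fetch(self, name)
        return self.cache[name]

    def write(self, name, text):
        self.cache[name] = text
        self.server.store(self, name, text)    # write through to the server

    def invalidate(self, name):
        self.cache.pop(name, None)

s = CallbackServer()
s.files["f"] = "v1"
a, b = CachingClient(s), CachingClient(s)
a.read("f")
b.read("f")
a.write("f", "v2")                             # server invalidates b's cached copy
print(b.read("f"))                             # v2
```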
Conflicts
Simplest kind: read-write conflicts
– Two people read the same thing:
• “The cat is red”
– Both write:
• “The cat is brown”, “The cat is purple”
– Which is right?
Can this happen locally?
– Yes! Try it with an editor
Worse with a DFS: it is not obvious to the user why
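One common way to at least detect this conflict, instead of silently losing one write, is to attach a version number to the file and refuse any write based on a stale version (optimistic concurrency). This is a swapped-in technique for illustration, not something the slides prescribe; the class names are hypothetical.

```python
class WriteConflict(Exception):
    pass

class VersionedFile:
    """Each successful write bumps a version; a write based on a stale version is refused."""
    def __init__(self, text):
        self.text, self.version = text, 0

    def read(self):
        return self.text, self.version

    def write(self, new_text, based_on):
        if based_on != self.version:
            raise WriteConflict(f"based on v{based_on}, but current is v{self.version}")
        self.text, self.version = new_text, self.version + 1

f = VersionedFile("The cat is red")
_, v_a = f.read()                       # user A reads version 0
_, v_b = f.read()                       # user B reads version 0
f.write("The cat is brown", v_a)        # A's write succeeds: now version 1
try:
    f.write("The cat is purple", v_b)   # B's write is based on a stale version
except WriteConflict as e:
    print("conflict:", e)               # conflict: based on v0, but current is v1
```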
RAID, NAS, SAN Storage
Redundant Array of Inexpensive Disks (RAID)
– Multiple disks attached to a controller
– Disks each carry part of the data
• Redundancy, error detection, parallel transfer
Network Attached Storage (NAS)
– Box with a network port and storage (e.g., XRAID)
Storage Area Network (SAN)
– Specialized network dedicated to storage devices (e.g., XSAN)
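The redundancy in parity-based RAID levels (such as RAID 4 and RAID 5) comes from XOR: the parity block is the XOR of the data blocks, so any single lost block equals the XOR of all the surviving blocks. A minimal sketch with three tiny, made-up 8-byte blocks:

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"The cat ", b"is red, ", b"not blue"]   # one block per data disk
parity = xor_blocks(data)                       # stored on the parity disk

# Disk 1 fails: rebuild its block from the surviving data blocks plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)                                  # b'is red, '
```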
The “Near”-Future
Parallel file systems (pNFS, GFS)
– Separate metadata and data
– Store data chunks on different machines
The End
Atomic Updates
– Shadowing
– Logs
Explain!
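A minimal sketch of shadowing at the level of a single file: write a complete new copy next to the old one, force it to disk, then swap it into place with os.replace, which is atomic on POSIX, so readers see either the old or the new contents and never a mix. (Logging, the other approach, would instead record the intended change in a log before applying it; it is not shown here.) Full crash-durability of the rename would also require fsyncing the containing directory, which this sketch omits.

```python
import os
import tempfile

def atomic_write(path: str, data: bytes) -> None:
    """Shadowing: write a full new copy, then atomically swap it in with a rename."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)   # shadow copy on the same file system
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())              # make the new copy durable before the swap
        os.replace(tmp, path)                 # atomic: old version or new, never half of each
    except BaseException:
        os.unlink(tmp)                        # on failure, discard the shadow copy
        raise

with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "config.txt")
    atomic_write(p, b"old\n")
    atomic_write(p, b"new\n")
    with open(p, "rb") as f:
        print(f.read(), os.listdir(d))        # b'new\n' ['config.txt']
```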
Named Pipes (FIFO)
Special file: acts like an unnamed pipe
– E.g., cat file | wc -l
elnux14> mkfifo thePipe
elnux14> ls -ld thePipe
prw-r----- 1 emery fac 0 May 3 14:00 thePipe
elnux14> cat simplesocket.h > thePipe &
[1] 32242
elnux14> wc -l < thePipe
155
[1] Done cat simplesocket.h > thePipe
elnux14>
– Useful when a program cannot use redirection, especially for compression
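The shell session above can be mirrored in Python with os.mkfifo: one thread plays `cat file > thePipe &` and the main thread plays `wc -l < thePipe`. Opening a FIFO blocks until the other end is opened too, which is why the writer runs in its own thread. The function name is hypothetical, and the sketch assumes a POSIX system.

```python
import os
import tempfile
import threading

def count_lines_via_fifo(text: bytes) -> int:
    """Sketch of `cat file > thePipe &` followed by `wc -l < thePipe`."""
    with tempfile.TemporaryDirectory() as d:
        fifo = os.path.join(d, "thePipe")
        os.mkfifo(fifo)                        # special file; ls -l shows type 'p'

        def writer():                          # plays the role of `cat file > thePipe &`
            with open(fifo, "wb") as w:        # open blocks until a reader opens the FIFO
                w.write(text)

        t = threading.Thread(target=writer)
        t.start()
        with open(fifo, "rb") as r:            # plays the role of `wc -l < thePipe`
            count = r.read().count(b"\n")      # wc -l counts newline characters
        t.join()
        return count

print(count_lines_via_fifo(b"one\ntwo\nthree\n"))   # 3
```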
Named Pipes (FIFO)
Exercise:
– A program named “joe” outputs the file “joe.out”, which is huge (~3 GB)
– Compress it automagically to “joe.out.gz” using gzip -c and a named FIFO
elnux14> mkfifo joe.out
elnux14> gzip -c < joe.out > joe.out.gz &
[1]
elnux14> joe
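The same trick in Python, as a sketch: "joe.out" becomes a FIFO, a background thread plays the role of `gzip -c < joe.out > joe.out.gz`, and the producer writes "joe.out" as if it were an ordinary file. The uncompressed data never lands on disk. `joe` here is a hypothetical stand-in that writes a small file. If the producer never opens the FIFO, the compressor thread waits forever in open(), just as the backgrounded gzip would.

```python
import gzip
import os
import shutil
import tempfile
import threading

def run_compressed(producer, out_path: str) -> None:
    """Run producer(out_path) while a FIFO at out_path streams its output into out_path + '.gz'."""
    os.mkfifo(out_path)                        # "joe.out" becomes a named pipe

    def compressor():                          # role of `gzip -c < joe.out > joe.out.gz &`
        with open(out_path, "rb") as src, gzip.open(out_path + ".gz", "wb") as dst:
            shutil.copyfileobj(src, dst)       # streams chunk by chunk; never holds the whole file

    t = threading.Thread(target=compressor)
    t.start()
    producer(out_path)                         # the program just writes "joe.out" as usual
    t.join()                                   # wait until the compressor sees end-of-file
    os.unlink(out_path)                        # remove the FIFO; only joe.out.gz remains

def joe(path: str) -> None:
    """Hypothetical stand-in for the program "joe"."""
    with open(path, "w") as f:
        for i in range(1000):
            f.write(f"line {i}\n")

with tempfile.TemporaryDirectory() as d:
    run_compressed(joe, os.path.join(d, "joe.out"))
    print(sorted(os.listdir(d)))               # ['joe.out.gz']
```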