Tolerating File-System Mistakes with EnvyFS

34
Tolerating File-System Mistakes with EnvyFS Lakshmi N. Bairavasundaram NetApp, Inc. Swaminathan Sundararaman Andrea C. Arpaci- Dusseau Remzi H. Arpaci- Dusseau University of Wisconsin Madison

description

Tolerating File-System Mistakes with EnvyFS. Swaminathan Sundararaman Andrea C. Arpaci-Dusseau Remzi H. Arpaci-Dusseau University of Wisconsin Madison. Lakshmi N. Bairavasundaram NetApp, Inc. File Systems in Today’s World. Modern file systems are complex - PowerPoint PPT Presentation

Transcript of Tolerating File-System Mistakes with EnvyFS

Page 1: Tolerating File-System Mistakes  with EnvyFS

Tolerating File-System Mistakes with EnvyFS

Lakshmi N. BairavasundaramNetApp, Inc.

Swaminathan Sundararaman Andrea C. Arpaci-DusseauRemzi H. Arpaci-Dusseau

University of Wisconsin Madison

Page 2: Tolerating File-System Mistakes  with EnvyFS

File Systems in Today’s World

• Modern file systems are complex– Tens of thousands of lines of code (e.g., XFS 45K LOC)

• Storage stack is also getting deeper– Hypervisor, network, logical volume manager

• Need to handle a gamut of failures– Memory allocation, disk faults, bit flips, system crashes

• Preserve integrity of its meta-data and user data

04/21/23 2Tolerating File-System Mistakes with EnvyFS

Page 3: Tolerating File-System Mistakes  with EnvyFS

File System Bugs

• Bug reports for Linux 2.6 series from Bugzilla– ext3: 64, JFS: 17, ReiserFS: 38– Some are FS corruption causing permanent data loss

• FS bugs broadly classified into two categories– “fail-stop”: System immediately crashes

• Solutions: Nooks [Swift 04], CuriOS [David08]

– “fail-silent”: Accidentally corrupt on-disk state• Many such bugs uncovered [Prabhakaran05, Gunawi08, Yang04, Yang06b]

04/21/23 3Tolerating File-System Mistakes with EnvyFS

Page 4: Tolerating File-System Mistakes  with EnvyFS

04/21/23 4Tolerating File-System Mistakes with EnvyFS

Bugs are inevitable in file systems

Challenge: how to cope with them?

Page 5: Tolerating File-System Mistakes  with EnvyFS

• Based on N-version programming [Avizienis77]– NFS servers [Rodrigues01], databases [Vandiver07], security [Cox06]

N-Version File Systems

• EnvyFS: Simple software layer

– Store data in N child file systems– Operations performed on all children

• Rely on a simple software layer• Challenge: reducing overheads

while retaining reliability – SubSIST: Novel Single Instance

Store04/21/23 Tolerating File-System Mistakes with EnvyFS 5

EnvyFS layer

Child

1

Child

2

Child

N

Disk driver

Disk

SIS layer

Application

Page 6: Tolerating File-System Mistakes  with EnvyFS

Results• Robustness – Traditional file systems handle few corruptions (< 4%)– EnvyFS3 tolerates 98.9% of single file system mistakes

• Performance– Desktop workloads: EnvyFS3 has comparable performance– I/O intensive workloads:

• Normal mode: EnvyFS3 + SubSIST acceptable performance • Under memory pressure: EnvyFS3 + SubSIST large overheads

• Potential as a debugging tool for FS developers– Pinpoint the source of “fail-silent” bug in ext3

04/21/23 6Tolerating File-System Mistakes with EnvyFS

Page 7: Tolerating File-System Mistakes  with EnvyFS

Outline

• Introduction• Building reliable file systems• Reducing overheads with SubSIST • Evaluation• Conclusion

04/21/23 7Tolerating File-System Mistakes with EnvyFS

Page 8: Tolerating File-System Mistakes  with EnvyFS

N-Version Systems

Development process: 1. Producing the specification of software2. Implementing N versions of the software3. Creating N-version layer

— Executes different versions— Determines the consensus result

04/21/23 8Tolerating File-System Mistakes with EnvyFS

Page 9: Tolerating File-System Mistakes  with EnvyFS

1. Producing Specification• Our own specification ?– Impractical: Requires wide scale changes to file systems– Specifications take years to get accepted

• Can we leverage existing specification ?– Yes, can leverage VFS, but there are some issues

• VFS not precise for N-versioning purpose– Needs to handle cases where specification is not precise– e.g., Ordering directory entries, inode number allocation

04/21/23 Tolerating File-System Mistakes with EnvyFS 9

Page 10: Tolerating File-System Mistakes  with EnvyFS

Imprecise VFS Specification

Ordering directory entries

• Issue:– No specified return order– Can’t blindly compare entries

• Solution: – Read all entries from a directory

(dir: test in our case) from all FSes– Match entries from FSes – Return majority results

04/21/23 10Tolerating File-System Mistakes with EnvyFS

FS X

FS Y

FS Z

EnvyFS layer

…File 1File 2File 3

Dir: test

File 2File 3File 1

Dir: test Dir: test

File 1 File 2 File 3

Readdir: test No Entries

File 3File 1File 2

File 1File 2File 3

File 1File 2File 3

Dir: test

Page 11: Tolerating File-System Mistakes  with EnvyFS

Virt # FS 1 FS 3FS 2

??

File 1 | 36

Imprecise VFS Specification (cont)• Inode number allocation

– Inode numbers returned through system calls– Each child file system issues different inode numbers– Possible solution: Force file systems to use same algorithm?– Our solution: Issue inode numbers at EnvyFS layer

04/21/23 11Tolerating File-System Mistakes with EnvyFS

FS X

FS Y

FS Z

EnvyFS layer

Dir: test Dir: test Dir: test

File 1 | 10 File 1 |65

File 1 10File 2 15File 3 16

File 2 04File 3 44File 1 36

File 1 |

15 10 36 65

Inode Mapping Table

15Stat: File 1

File 3 99File 1 65File 2 43

Inode Numbers

Inode Mapping Table not persistently stored

Page 12: Tolerating File-System Mistakes  with EnvyFS

2. Implementing N versions of FS

• Painful process– High cost of development, long time delays

• Lucky! Hard work already done for us– 30 different disk based file systems in Linux 2.6

• Which file systems to use?– ext3, JFS, ReiserFS in a three-version FS – Others should work without modifications

04/21/23 12Tolerating File-System Mistakes with EnvyFS

Page 13: Tolerating File-System Mistakes  with EnvyFS

3. Creating N-Version Layer

04/21/23 13

• N-Version layer (EnvyFS)– Inserted beneath VFS– Simple design to avoid bugs

• Example: Reading a file– Allocate N data buffers– Read data block from the disk– Compare: data, return code, file position– Return: data, return code

• Issues:– Allocate memory for each read operation– Extra copy from allocated buffer to application– Comparison overheads

ComparatorsWrappersInode Mapping Table

Application

VFS layer

…ext3

JFS

Reis

erFS

EnvyFS Layer

Read (file, 1 block)

Read (file, 1 block)

Read (…) Read (…) Read (…)

F F Fpos: x pos: x pos: x

D D D

D D D

err = err = err =

Disk

Derr ,

Derr ,

Tolerating File-System Mistakes with EnvyFS

Page 14: Tolerating File-System Mistakes  with EnvyFS

Reading a File in EnvyFS• Solution:

– Same application buffer for all FS– TCP-like checksums for data comparison– Compare: checksums, return code, file

position– Read data until majority

04/21/23 14Tolerating File-System Mistakes with EnvyFS

ComparatorsWrappersInode Mapping Table

Application

VFS layer

…ext3

JFS

Reis

erFS

EnvyFS Layer

Read (file, 1 block)

Read (file, 1 block)

Read (…) Read (…)

F F Fpos: x pos: x

D D

D D D

err = err = err =

FS 1 # FS 2 # FS N #

435 435 … 436

ChecksumsDisk

Derr ,

Derr ,

Read (…) D

pos: x

Page 15: Tolerating File-System Mistakes  with EnvyFS

Outline

• Introduction• Building reliable file systems• Reducing overheads with SubSIST • Evaluation• Conclusion

04/21/23 15Tolerating File-System Mistakes with EnvyFS

Page 16: Tolerating File-System Mistakes  with EnvyFS

Part 1 Part 2 Part N……Disk 1 Disk 2 Disk N

Disk

Case for Single Instance Storage (SIS)

• Ideal: One disk per FS

• Practical: One disk for all FS

• Overheads– Effective storage space: 1/N– N times more I/O (Read/write)

• Challenge: Maintain diversity while minimizing overheads

04/21/23 16Tolerating File-System Mistakes with EnvyFS

EnvyFS layer

…FS 1

FS 2

FS N

Application

VFS layer

Disk Req. Queue

1

1 2 N

1 2 N

Page 17: Tolerating File-System Mistakes  with EnvyFS

SubSIST: Single Instance Store• Variant of an Single Instance Store

– Selectively merges data blocks

• Block addressable SIS– Exports virtual disks to FSes– Manages mapping, free space info.– Not persistently stored on disk

• EnvyFS writes through N file systems – N data blocks merged to 1 data block– Content hashes not stored persistently – Meta-data blocks not merged– Inter FS blocks and not intra FS

04/21/23 17Tolerating File-System Mistakes with EnvyFS

EnvyFS layer

…FS 1

FS 2

FS N

Application

VFS layer

Vdisk 1

Disk

Vdisk 2 Vdisk N

Read CacheCHash LayerFree Space Management

SubS

IST

DDM MMD

DDD

D

Page 18: Tolerating File-System Mistakes  with EnvyFS

FS 1

D

Disk

Handling Data Block Corruptions? Corruption to data in a single FS

– Due to bugs, bit flips, storage stack– Corrupt data blocks not merged– All other N-1 data blocks merged– Corrupt data block fixed at next read

× Corruption to data block inside disk

• Single copy of data – Different code paths – Different on-disk structures

04/21/23 18Tolerating File-System Mistakes with EnvyFS

EnvyFS layer

…FS 2

FS N

Application

VFS layer

Vdisk 1 Vdisk 2 Vdisk N

Read CacheCHash LayerFree Space Management

SubS

IST

DD

DD

DD

D DDD

D

Page 19: Tolerating File-System Mistakes  with EnvyFS

Outline

• Introduction• Building reliable file systems • Reducing overheads with SubSIST • Evaluation– Reliability– Performance

• Conclusion

04/21/23 19Tolerating File-System Mistakes with EnvyFS

Page 20: Tolerating File-System Mistakes  with EnvyFS

Robustness of EnvyFS in recovering from a child file system’s mistake?

DiskBB

B

EnvyFS layer

Block Driver

B

Reliability Evaluation: Fault Injection

• Corruption: bugs in FS / storage stack• Types of disk blocks

– superblock, inode, block bitmap, file data, …

• Perform different file ops– mount, stat, creat, unlink, read, …

• Report user visible results• All results are applicable with SubSIST

except corruption to data blocks04/21/23 Tolerating File-System Mistakes with

EnvyFS 20

ext 3 JFS

Reis

erFS

Pseudo Device Driver

VFS

B B

B

B

Type-aware fault injection [Prabhakaran05]

Page 21: Tolerating File-System Mistakes  with EnvyFS

ext3

path

trav

ersa

lSE

T-1

(sta

t, …

)SE

T-2

(chm

od)

read

read

link

getd

irent

ries

crea

tlin

km

kdir

rena

me

sym

link

writ

etr

unca

term

dir

unlin

km

ount

SET-

3 (fs

ync)

umou

nt

INODE

DIR

BMAP

IMAP

INDIRECT

DATA

SUPER

JSUPER

GDESC

Result Matrix

Normal

Data loss

N/A

Cannot mount

Ops fail

Data corrupt

Crash

Read-only

eDepends

E

04/21/23 21Tolerating File-System Mistakes with EnvyFS

Page 22: Tolerating File-System Mistakes  with EnvyFS

ext3

path

trav

ersa

lSE

T-1

(sta

t, …

)SE

T-2

(chm

od)

read

read

link

getd

irent

ries

crea

tlin

km

kdir

rena

me

sym

link

writ

etr

unca

term

dir

unlin

km

ount

SET-

3 (fs

ync)

umou

nt

INODE

DIR

BMAP

IMAP

INDIRECT

DATA

SUPER

JSUPER

GDESC

Data loss

N/A

Cannot mount

Ext3 stores many superblock copies;

but, does not handle superblock corruption

04/21/23 Tolerating File-System Mistakes with EnvyFS 22

E

Page 23: Tolerating File-System Mistakes  with EnvyFS

ext3

path

trav

ersa

lSE

T-1

(sta

t, …

)SE

T-2

(chm

od)

read

read

link

getd

irent

ries

crea

tlin

km

kdir

rena

me

sym

link

writ

etr

unca

term

dir

unlin

km

ount

SET-

3 (fs

ync)

umou

nt

INODE

DIR

BMAP

IMAP

INDIRECT

DATA

SUPER

JSUPER

GDESC

Data loss

N/A

Cannot mount

Ops fail

Crash

• In addition to operations failing, inode corruption leads to data loss

• Unlink: system crash during unmount

04/21/23 Tolerating File-System Mistakes with EnvyFS 23

E

Page 24: Tolerating File-System Mistakes  with EnvyFS

ext3

path

trav

ersa

lSE

T-1

(sta

t, …

)SE

T-2

(chm

od)

read

read

link

getd

irent

ries

crea

tlin

km

kdir

rena

me

sym

link

writ

etr

unca

term

dir

unlin

km

ount

SET-

3 (fs

ync)

umou

nt

INODE

DIR

BMAP

IMAP

INDIRECT

DATA

SUPER

JSUPER

GDESC

Normal

Data loss

N/A

Cannot mount

Ops fail

Data corrupt

Crash

Read-only

eDepends

E

04/21/23 24Tolerating File-System Mistakes with EnvyFS

Page 25: Tolerating File-System Mistakes  with EnvyFS

Kernel panic in

ext3pa

th tr

aver

sal

SET-

1 (s

tat,

…)

SET-

2 (c

hmod

)re

adre

adlin

kge

tdire

ntrie

scr

eat

link

mkd

irre

nam

esy

mlin

kw

rite

trun

cate

rmdi

run

link

mou

ntSE

T-3

(fsyn

c)um

ount

Normal

N/A

INODE

DIR

BMAP

IMAP

INDIRECT

DATA

SUPER

JSUPER

GDESC

EnvyFS3 works in every scenario04/21/23 Tolerating File-System Mistakes with

EnvyFS 25

EnvyFS3

E RJ

EnvyFS

Page 26: Tolerating File-System Mistakes  with EnvyFS

Potential for Bug Isolationext3 EnvyFS3

Tim

e

Unlink on corrupt inode:- ext3_lookup (bug)- ext3_unlink

Unmount (panic)

Tim

e

Unlink on corrupt inode:- ext3_lookup (bug)- ext3 inode does not match

others- Further ops not issued

In typical use, a problem is noticed only on panic

In EnvyFS3, a problem is noticed the first time child file system returns wrong results

04/21/23 Tolerating File-System Mistakes with EnvyFS 26

Page 27: Tolerating File-System Mistakes  with EnvyFS

JFS

path

trav

ersa

lSE

T-1

SET-

2re

adre

adlin

kge

tdire

ntrie

scr

eat

link

mkd

irre

nam

esy

mlin

kw

rite

trun

cate

rmdi

run

link

mou

ntSE

T-3

umou

nt

INODE

DIR

BMAP

IMAP

INTERNAL

DATA

SUPER

JSUPER

JDATA

AGGR-INODE

IMAPDESC

IMAPCNTL

Normal

Data loss

N/A

Cannot mount

Ops fail

Data corrupt

Crash

Read-only

aDepends

J

04/21/23 27Tolerating File-System Mistakes with EnvyFS

Page 28: Tolerating File-System Mistakes  with EnvyFS

EnvyFS3

path

trav

ersa

lSE

T-1

SET-

2re

adre

adlin

kge

tdire

ntrie

scr

eat

link

mkd

irre

nam

esy

mlin

kw

rite

trun

cate

rmdi

run

link

mou

ntSE

T-3

umou

nt

INODE

DIR

BMAP

IMAP

INTERNAL

DATA

SUPER

JSUPER

JDATA

AGGR-INODE

IMAPDESC

IMAPCNTL

Normal

N/A

Crash

Kernel panic in EnvyFS3

E RJ

EnvyFS

04/21/23 28Tolerating File-System Mistakes with EnvyFS

Page 29: Tolerating File-System Mistakes  with EnvyFS

• Experimental setup– AMD Opteron 2.2 GHz Processor– 2GB RAM– 80 GB Hitachi Deskstar 7200-rpm SATA disk– Linux 2.6.12– 4GB disk partition for each file system

OpenSSH BenchmarkPerformance Evaluation

04/21/23 29Tolerating File-System Mistakes with EnvyFS

Elap

sed

Tim

e (i

n Se

cond

s)

File Systems

• CPU Intensive• OpenSSH 4.5 -- Copy, untar and

make

Performance of EnvyFS3 is comparable to a single file system

3 % overhead

Page 30: Tolerating File-System Mistakes  with EnvyFS

• I/O Intensive– Mimics busy mail server workload– Transaction: creates, deletes, reads, appends, …

• Postmark Configuration– 2500 files– File size: 4Kb – 40Kb– No. of transactions: 10K and 100K

Postmark Benchmark

04/21/23 30Tolerating File-System Mistakes with EnvyFS

Elap

sed

Tim

e (i

n Se

cond

s)

129.039.0 26.414.7 9.6 29

10734

851

430

128

243

78

406

271EnvyFS3: 3.3x+ SubSIST: -32%

EnvyFS3: 8x+ SubSIST: 4x

EnvyFS3: 1.7x+ SubSIST: 11.5%

Page 31: Tolerating File-System Mistakes  with EnvyFS

Summary of Results

• Robustness– Traditional file systems vulnerable to corruptions – EnvyFS3 tolerates almost all mistakes in one FS

• Performance– Desktop workloads: EnvyFS3 has comparable performance

– I/O intensive workloads: • Regular Operations: EnvyFS3 + SubSIST acceptable performance

• Memory pressure: EnvyFS3 + SubSIST has large overhead

04/21/23 31Tolerating File-System Mistakes with EnvyFS

Page 32: Tolerating File-System Mistakes  with EnvyFS

Outline

• Introduction• Building reliable file systems • Reducing overheads with SubSIST • Evaluation• Conclusion

04/21/23 32Tolerating File-System Mistakes with EnvyFS

Page 33: Tolerating File-System Mistakes  with EnvyFS

Conclusion

• Bugs/mistakes are inevitable in any software– Must cope, not just hope to avoid

• EnvyFS: N-version approach to tolerating FS bugs – Built using existing specification and file systems

• SubSIST: single instance store– Decreases overheads while retaining reliability

04/21/23 33Tolerating File-System Mistakes with EnvyFS

Page 34: Tolerating File-System Mistakes  with EnvyFS

Thank You!

Advanced Systems Lab (ADSL)University of Wisconsin-Madison

http://www.cs.wisc.edu/adsl

04/21/23 34Tolerating File-System Mistakes with EnvyFS