RELIABILITY ANALYSIS OF ZFS CS 736 Project University of Wisconsin - Madison.

Reliability Analysis of ZFS

• To perform reliability analysis of ZFS
  - Test existing reliability claims
  - Layered driver interface: simulating transient block corruptions at various levels in the ZFS on-disk hierarchy
• Results
  - Classes of faults handled by ZFS
  - Measure of the robustness of ZFS
  - Lessons on building a reliable, robust file system

Summary

Coming Up

• ZFS Organization
• ZFS On-Disk Format
• ZFS features and specs regarding reliability
• Experimental Setup and Experiments
• Results and Conclusions
• Future Work

Outline of the talk

ZFS Organization - Pooled Storage Model

• Pooled Storage Model: a disk is a ZFS pool comprising many file systems.

[Figure: a ZFS pool containing several ZFS file systems]

ZFS Organization

• Transaction-based object file system
  - Every structure is an object.
  - An operation on one or more objects is a transaction.
  - Transactions are grouped into transaction groups.
• All data and metadata blocks are checksummed.
  - No silent corruption.
• Modifications are always copy-on-write (COW).
  - Always consistent on disk.
• All metadata is compressed; compression of data is optional.

Object based
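The copy-on-write point above can be illustrated with a small sketch. This is a toy in-memory model, not ZFS code: the `CowStore` class, its `update` method and the single `root` pointer are hypothetical names chosen for the example. It only shows the idea that a modification writes a new block and then switches the root, so the old version is never overwritten.

```python
from typing import Dict, Optional


class CowStore:
    """Toy copy-on-write store (hypothetical; not the ZFS implementation)."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}  # address -> block contents
        self.next_addr = 0                  # next unused address
        self.root: Optional[int] = None     # address of the live version

    def _alloc(self, data: bytes) -> int:
        """Write data to a fresh address and return that address."""
        addr = self.next_addr
        self.blocks[addr] = data
        self.next_addr += 1
        return addr

    def update(self, data: bytes) -> Optional[int]:
        """Write the new version elsewhere, then switch the root to it.

        Returns the address of the previous version, which is left intact.
        """
        new_addr = self._alloc(data)
        old_addr = self.root
        self.root = new_addr  # the only step that changes the live state
        return old_addr
```

Because the old block is still present after `update`, a crash before the root switch leaves the previous consistent state in place, which is the property the slide calls "always consistent on disk".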

ZFS Structures

• The entire file system is represented as:
  - Objects: dnode_phys_t
  - Object sets: dnode_phys_t[ ]
• Programming-language (P/L) analogy: each object is a template; the bonus buffer describes its specific attributes.

ZFS Structures

• Data is transferred to disk in units of blocks.
• Block pointers (blkptr_t) are used to locate, verify and describe blocks.
  - They contain checksum and compression information.
  - The physical size of a block can differ from its logical size.
• Gang blocks

Blocks and block pointers
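As a rough illustration of a pointer that both locates and verifies a block, here is a minimal sketch. The `BlockPointer` class and the `write_block` / `read_block` helpers are hypothetical and far simpler than the real blkptr_t; SHA-256 is used only for convenience, and the "disk" is an in-memory bytearray.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class BlockPointer:
    """Hypothetical, simplified stand-in for blkptr_t (not the real layout)."""
    offset: int      # where the child block lives on the toy disk
    psize: int       # physical (stored) size in bytes
    checksum: bytes  # checksum of the child block, kept in the parent


def write_block(disk: bytearray, offset: int, data: bytes) -> BlockPointer:
    """Store data on the toy disk and return a pointer carrying its checksum."""
    disk[offset:offset + len(data)] = data
    return BlockPointer(offset, len(data), hashlib.sha256(data).digest())


def read_block(disk: bytearray, bp: BlockPointer) -> bytes:
    """Read the block and verify it against the checksum held in the pointer."""
    data = bytes(disk[bp.offset:bp.offset + bp.psize])
    if hashlib.sha256(data).digest() != bp.checksum:
        raise IOError("checksum mismatch: block is corrupt")
    return data
```

Keeping the checksum in the pointer rather than next to the data means a block that is corrupted on disk cannot also carry a matching, corrupted checksum.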

ZFS Structures

• Data Virtual Address (DVA): a combination of fields in blkptr_t used to locate a block on disk.
• Wideness: a blkptr_t can reference up to three copies of the data it points to, each located by its own DVA. These copies are called "ditto blocks".
  - Three for pool-wide metadata
  - Two for file-system-wide metadata
  - One for data (configurable)

Block pointers

[Figure: blkptr_t layout with three DVAs (vdev1/asize/offset1, vdev2/asize/offset2, vdev3/asize/offset3), followed by the lvl, type, cksum, comp, psize and lsize fields]
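The 3 - 2 - 1 wideness policy can be sketched as a pointer that carries one DVA per copy. The field names `vdev`, `offset`, `asize`, `psize` and `lsize` come from the slide; the `COPIES` table, the `allocate` function and the back-to-back placement on a single vdev are assumptions made for this example and do not reflect how ZFS actually places ditto blocks.

```python
from dataclasses import dataclass
from typing import List

# Copies per block class, as listed on the slide (3 - 2 - 1).
COPIES = {"pool_metadata": 3, "fs_metadata": 2, "data": 1}


@dataclass
class DVA:
    vdev: int    # which virtual device holds this copy
    offset: int  # byte offset of the copy on that vdev
    asize: int   # allocated size of the copy


@dataclass
class BlkPtr:
    dvas: List[DVA]  # one to three DVAs, one per ditto copy
    lsize: int       # logical (uncompressed) size
    psize: int       # physical (stored) size


def allocate(kind: str, lsize: int, psize: int, next_free: int) -> BlkPtr:
    """Build a pointer with as many DVAs as the block class calls for.

    Toy allocator: copies are laid out back to back on vdev 0.
    """
    dvas = [DVA(vdev=0, offset=next_free + i * psize, asize=psize)
            for i in range(COPIES[kind])]
    return BlkPtr(dvas=dvas, lsize=lsize, psize=psize)
```

For example, `allocate("pool_metadata", 4096, 1024, 0)` yields a pointer with three DVAs, while `allocate("data", 4096, 4096, 0)` yields one.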

ZFS Structures - Wideness

ZFS Structures

• ZAP (ZFS Attribute Processor)
  - ZAP objects are used to handle arbitrary (name, object) associations within an object set (objset).
  - Most commonly used to implement directories.
  - Also used extensively throughout the DSL (Dataset and Snapshot Layer).

Attributes on disk

Putting it all together

• Everything in ZFS is an object.
• A dnode describes and organizes a collection of blocks making up an object.

Objects

Putting it all together

• Group related objects to form objsets.
• Filesystems, volumes, clones and snapshots are objsets.

[Figure: objects grouped into an object set]

Object Sets

Putting it all together

[Figure: a dataset wrapping an object set (objects), snapshot information and a space map]

• A dataset encapsulates an objset and provides:
  - Space usage
  - Snapshot information

DataSets

Putting it all together

[Figure: a dataset directory containing a dataset (objects, object set, snapshot information, space map), a child map and properties]

• Groups datasets
• Holds properties such as quotas and compression
• Records dataset relationships

Dataset directories

A road less travelled - From vdev label to data

To sum up

• Layers of indirection
• End-to-end checksums, which are stored separately from the data
• Wideness (ditto blocks): 3 - 2 - 1
• Compression
• Copy-on-write
• Scrub facility

Moving forward

Experimental Setup

• Corruption framework
  - Corrupter driver: modifies physical disk blocks
  - Analyzer app: understands on-disk ZFS structures
  - Consumer app: monitors ZFS responses and error codes
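To give a feel for what the corrupter does, here is a user-space sketch that damages one block of an in-memory disk image and can later undo the damage. It is not the layered driver used in the project: `BLOCK_SIZE`, `corrupt_block` and `restore_block` are hypothetical names, and inverting every bit is just one simple, deterministic way to guarantee that the block changes.

```python
BLOCK_SIZE = 512  # assumed block size for this sketch


def corrupt_block(image: bytearray, block_no: int) -> bytes:
    """Invert every bit of one block in the image; return the original bytes."""
    start = block_no * BLOCK_SIZE
    end = start + BLOCK_SIZE
    if block_no < 0 or end > len(image):
        raise ValueError("block is outside the image")
    original = bytes(image[start:end])
    # Bit inversion guarantees the stored contents differ from the original.
    image[start:end] = bytes(b ^ 0xFF for b in original)
    return original


def restore_block(image: bytearray, block_no: int, original: bytes) -> None:
    """Put the saved bytes back, making the corruption transient."""
    start = block_no * BLOCK_SIZE
    image[start:start + BLOCK_SIZE] = original
```

A consumer would read through the file system between `corrupt_block` and `restore_block` and record the error codes it sees.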

Experimental Setup - Simplification

• Setup on a Solaris 10 VM
• Only one physical vdev (disk)
  - No striping, mirror, RAID…
• Initial target: pointer corruption
  - Reduced sample space
  - Interesting cases
• Disable compression as much as possible

Initial Finding

• All metadata is compressed
  - Metadata compression cannot be disabled
• Pointer corruption is not feasible
  - Perform corruptions on compressed objects instead
  - Representative of the effects of disk faults on ZFS

Corruption Experiments

TYPE: Type-aware object corruptions

TARGET (targeted on-disk objects):
• Vdev labels [@Pool]
• Uberblocks [@Pool]
• Object sets
  - Meta Object Set [@Pool]
      objset_phys_t (describing the object set)
      Object array
  - Myfs Object Set [@FS]
      objset_phys_t
      Indirect blkptr objects
      Object array
• ZIL [@FS]
• File data [@FS]
• Directory data [@FS]

Results

Object                 Detection       Recovery           Correction
vdev label             YES/Checksum    YES/Replica        NO/COW
uberblock              YES/Checksum    YES/Replica        NO/COW
MOS Object             YES/Checksum    YES/Replica        NO/COW
MOS Object Set         YES/Checksum    YES/Replica        NO/COW
FS Object              YES/Checksum    YES/Replica        NO/COW
FS Indirect Objects    YES/Checksum    YES/Replica        NO/COW
FS Object Set          YES/Checksum    YES/Replica        NO/COW
ZIL                    YES/Checksum    NO                 NO
Directory Data         YES/Checksum    NO/Configurable    NO/Configurable
File Data              YES/Checksum    NO/Configurable    NO/Configurable

Summary (using IRON Taxonomy)

• Detection: checksums in parent blkptrs
• Recovery: replication in parent blkptrs (ditto blocks)
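Detection and recovery as summarized above can be combined into one read path: try each copy in turn and accept the first one whose checksum matches the value held by the parent. The sketch below is an assumption-laden model, not ZFS code: `read_with_ditto` is a hypothetical name, the disk is a bytearray, copies are identified by plain offsets, and SHA-256 stands in for whichever checksum the pool uses.

```python
import hashlib
from typing import List


def read_with_ditto(disk: bytearray, offsets: List[int], size: int,
                    checksum: bytes) -> bytes:
    """Return the first copy that matches the parent's checksum.

    offsets lists where each ditto copy lives; bad copies are skipped,
    not repaired, mirroring the "Correction: NO" column of the results.
    """
    for off in offsets:
        data = bytes(disk[off:off + size])
        if hashlib.sha256(data).digest() == checksum:
            return data
    raise IOError("all copies failed the checksum")
```

With a single copy (the default for file and directory data in the results table), the loop has nothing to fall back on, so a corrupted block is detected but not recovered.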

Conclusion

• Integration of file system and volume manager
  - Saves an additional translation
• Use of one generic pointer block for checksums and replication
  - Merkle tree provides robustness
• Use of replication/compression in a commodity file system is viable
• COW can be used effectively

Observations/Questions

• No correction of ditto blocks: relies on COW
  - Consecutive (n = wideness) failures without a transaction group commit?
  - Snapshot corruption?
• Explicit scrubbing corrects ditto blocks in place
  - Potential for corruption?
• Space/performance hit due to redundancy/compression
  - 2% hit in terms of space/IO? (Banham & Nash)
• No page cache; uses ARC
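The in-place correction attributed to scrubbing above can be pictured with the following sketch. It is a simplified model under stated assumptions, not the ZFS scrub code: `scrub_block` is a hypothetical name, copies are addressed by plain offsets into a bytearray, and SHA-256 is used for convenience.

```python
import hashlib
from typing import List


def scrub_block(disk: bytearray, offsets: List[int], size: int,
                checksum: bytes) -> int:
    """Overwrite every bad copy with a good one; return the repair count."""
    copies = [bytes(disk[off:off + size]) for off in offsets]
    good = next((c for c in copies
                 if hashlib.sha256(c).digest() == checksum), None)
    if good is None:
        raise IOError("no valid copy left; block cannot be repaired")
    repaired = 0
    for off, copy in zip(offsets, copies):
        if copy != good:
            disk[off:off + size] = good  # in-place overwrite of the bad copy
            repaired += 1
    return repaired
```

The overwrite happens at the original location rather than through copy-on-write, which is why the slide asks whether the repair itself could introduce corruption.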

Future Work

• Snapshot corruptions
• Multiple device configuration
  - Striping
  - Mirror
  - RAID-Z