RELIABILITY ANALYSIS OF ZFS CS 736 Project University of Wisconsin - Madison.
University of Wisconsin - Madison
Reliability Analysis of ZFS
• Goal: perform a reliability analysis of ZFS and test its existing reliability claims.
• Method: a layered driver interface that simulates transient block corruptions at various levels of the ZFS on-disk hierarchy.
• Results:
  - the classes of faults handled by ZFS
  - a measure of the robustness of ZFS
  - lessons on building a reliable, robust file system
Summary
Coming Up
• ZFS organization
• ZFS on-disk format
• ZFS features and specifications regarding reliability
• Experimental setup and experiments
• Results and conclusions
• Future work
Outline of the talk
ZFS Organization: Pooled Storage Model
• Pooled storage model: the disk (or disks) forms a ZFS pool, and many file systems share the pool's storage.
[Figure: a ZFS pool containing several ZFS file systems]
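A minimal sketch of the pooled storage idea, assuming a toy model where capacity is counted in blocks: file systems have no fixed size and draw from the pool's shared free space instead of owning partitions. The names `Pool`, `FileSystem`, and `allocate` are hypothetical and are not ZFS interfaces.

```python
class Pool:
    """Toy storage pool: every file system draws from one shared set of free blocks."""

    def __init__(self, capacity_blocks: int) -> None:
        self.capacity = capacity_blocks
        self.used = 0

    @property
    def free(self) -> int:
        return self.capacity - self.used

    def allocate(self, nblocks: int) -> None:
        # No per-file-system partition: any free block in the pool is usable.
        if nblocks > self.free:
            raise OSError("pool out of space")
        self.used += nblocks


class FileSystem:
    """Toy file system with no fixed size of its own."""

    def __init__(self, name: str, pool: Pool) -> None:
        self.name = name
        self.pool = pool
        self.used = 0

    def write(self, nblocks: int) -> None:
        self.pool.allocate(nblocks)  # space comes from the shared pool
        self.used += nblocks


pool = Pool(capacity_blocks=100)
home = FileSystem("home", pool)
src = FileSystem("src", pool)
home.write(70)
src.write(20)
print(pool.free)  # 10
```

Neither file system was sized in advance; the only limit is the pool's total free space.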
ZFS Organization
• Transaction-based object file system: every structure is an object, each operation on one or more objects is a transaction, and transactions are grouped into transaction groups.
• All data and metadata blocks are checksummed, so corruption is detected rather than silent.
• Modifications are always copy-on-write (COW), so the on-disk state is always consistent.
• All metadata is compressed; data compression is optional.
Object based
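A minimal sketch of copy-on-write with transaction groups, assuming a toy in-memory block store: each commit writes every modified object to a fresh block and then switches a single root pointer (standing in for the uberblock), so the on-disk view is always either the old tree or the new one. The names `CowStore`, `modify`, and `commit_txg` are hypothetical.

```python
from typing import Dict, List, Tuple


class CowStore:
    """Toy copy-on-write store: live blocks are never overwritten in place."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}
        self.next_addr = 0
        # The root plays the role of the uberblock: object id -> block address.
        self.root: Dict[str, int] = {}
        self.pending: List[Tuple[str, bytes]] = []  # the open transaction group

    def _write_new_block(self, data: bytes) -> int:
        addr = self.next_addr  # always a fresh address, never an existing one
        self.next_addr += 1
        self.blocks[addr] = data
        return addr

    def modify(self, obj: str, data: bytes) -> None:
        self.pending.append((obj, data))  # queued, not yet visible

    def commit_txg(self) -> None:
        new_root = dict(self.root)  # build a new root; the live one is untouched
        for obj, data in self.pending:
            new_root[obj] = self._write_new_block(data)
        # One pointer switch publishes the whole group at once.
        # (This toy never frees old blocks; ZFS frees them unless a snapshot
        # still references them.)
        self.root = new_root
        self.pending.clear()

    def read(self, obj: str) -> bytes:
        return self.blocks[self.root[obj]]


store = CowStore()
store.modify("config", b"v1")
store.commit_txg()
old_addr = store.root["config"]
store.modify("config", b"v2")
print(store.read("config"))  # b'v1': the open group is not visible yet
store.commit_txg()
print(store.read("config"))  # b'v2'
```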
ZFS Structures
• The entire file system is represented as:
  - objects: dnode_phys_t
  - object sets: arrays of dnode_phys_t (dnode_phys_t[])
• Programming-language analogy: each object is like a template, and its bonus buffer describes the attributes specific to that object.
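The template analogy can be sketched as a single generic record type whose type-specific attributes live in a bonus dictionary. This is an illustration only, assuming simplified fields: `Dnode`, `obj_type`, `block_pointers`, and the bonus keys are hypothetical names, not the real dnode_phys_t layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dnode:
    """Toy stand-in for dnode_phys_t: every object has this same generic shape."""
    obj_type: str                                            # what kind of object this is
    block_pointers: List[int] = field(default_factory=list)  # where its blocks live
    bonus: Dict[str, object] = field(default_factory=dict)   # type-specific attributes


# An object set is an array of dnodes indexed by object number.
object_set: List[Dnode] = [
    Dnode("plain file", [10, 11], {"size": 8192, "mode": 0o644}),
    Dnode("directory", [12], {"size": 3, "mode": 0o755}),
]

print(object_set[0].bonus["size"])  # 8192
```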
ZFS Structures
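• Data is transferred to disk in units of blocks.
• Block pointers (blkptr_t) are used to locate, verify, and describe blocks; each contains checksum and compression information.
• The physical size of a block can differ from its logical size (for example, when the block is compressed).
• Gang blocks: when no contiguous free region is large enough, a block is stored as several smaller blocks reached through a gang block header.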
Blocks and block pointers
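The difference between physical and logical size can be shown with a short sketch. It uses Python's `zlib` only as a convenient stand-in for ZFS's own compressors (such as lzjb), and the 128 KiB block size is an illustrative assumption.

```python
import zlib

BLOCK_SIZE = 128 * 1024  # illustrative logical block size (128 KiB)

# A highly compressible logical block, e.g. mostly zero-filled metadata.
logical = b"metadata" * 16 + bytes(BLOCK_SIZE - 128)
physical = zlib.compress(logical)

lsize = len(logical)   # size seen by whoever reads the block
psize = len(physical)  # size actually written to disk
print(lsize, psize)

# Reading the block back must reproduce the logical contents exactly.
assert zlib.decompress(physical) == logical
```

A block pointer therefore has to record both sizes: psize to read the right number of bytes from disk, and lsize to size the buffer after decompression.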
ZFS Structures
• Data Virtual Address (DVA): the combination of fields in a blkptr_t (vdev, offset, allocated size) that locates a block on disk.
• Wideness: a blkptr_t can hold up to three DVAs, each pointing to a separate copy of the same data. These copies are called “ditto blocks”:
  - three copies for pool-wide metadata
  - two copies for file-system-wide metadata
  - one copy for data (configurable)
Block pointers
[Figure: blkptr_t layout]
  DVA 1: vdev1 | asize | offset1
  DVA 2: vdev2 | asize | offset2
  DVA 3: vdev3 | asize | offset3
  lvl | typ | cksum | comp | psize | lsize
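A minimal sketch of how a reader could use ditto blocks, assuming a toy disk modeled as a dictionary keyed by (vdev, offset): try each DVA in turn and return the first copy whose checksum matches the one stored in the block pointer. SHA-256 (one of the checksums ZFS offers) is used for simplicity; `BlockPointer` and `read_block` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple

Dva = Tuple[int, int]  # (vdev, offset); asize omitted for brevity


@dataclass
class BlockPointer:
    """Toy blkptr_t: up to three DVAs plus the checksum of the block they point to."""
    dvas: List[Dva]
    checksum: bytes


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def read_block(bp: BlockPointer, disk: Dict[Dva, bytes]) -> bytes:
    """Return the first ditto copy whose checksum matches the block pointer."""
    for dva in bp.dvas:
        data = disk.get(dva)
        if data is not None and checksum(data) == bp.checksum:
            return data  # detection + recovery; the bad copy is NOT repaired here
    raise OSError("all copies failed checksum verification")


good = b"pool-wide metadata"
disk = {(0, 100): good, (0, 200): good, (0, 300): good}
bp = BlockPointer([(0, 100), (0, 200), (0, 300)], checksum(good))
disk[(0, 100)] = b"garbage"  # simulated transient corruption of the first copy
print(read_block(bp, disk))  # b'pool-wide metadata'
```

The read succeeds, but the damaged first copy stays damaged, which matches the detection and recovery without correction pattern in the results.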
ZFS Structures
• ZAP (ZFS Attribute Processor): ZAP objects handle arbitrary (name, object) associations within an object set (objset).
• Most commonly used to implement directories.
• Also used extensively throughout the DSL (Dataset and Snapshot Layer).
Attributes on disk
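Using a ZAP as a directory can be sketched as a name-to-object-number map per directory object, with path lookup walking one ZAP per component. The object numbers, `zap_objects`, and `lookup` are hypothetical, and real ZAP entries carry more than a bare object number.

```python
from typing import Dict

ROOT_DIR = 3  # hypothetical object number of the root directory

# Each directory object is a ZAP: file name -> object number of the entry.
zap_objects: Dict[int, Dict[str, int]] = {
    3: {"home": 7, "etc": 8},  # /
    7: {"alice": 9},           # /home
    9: {"notes.txt": 12},      # /home/alice
}


def lookup(path: str) -> int:
    """Resolve a path to an object number by walking directory ZAPs."""
    obj = ROOT_DIR
    for name in filter(None, path.split("/")):  # skip empty components
        obj = zap_objects[obj][name]  # a KeyError plays the role of ENOENT
    return obj


print(lookup("/home/alice/notes.txt"))  # 12
```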
Putting it all together
• Everything in ZFS is an object.
• A dnode describes and organizes a collection of blocks making up an object.
Objects
Putting it all together
• Group related objects to form objsets.
• File systems, volumes, clones, and snapshots are objsets.
[Figure: objects grouped into an object set]
Object Sets
Putting it all together
[Figure: a dataset wrapping an object set, with snapshot information and a space map]
• A dataset encapsulates an objset and provides:
  - space usage
  - snapshot information
DataSets
Putting it all together
[Figure: a dataset directory grouping datasets, with a child map, properties, and a space map]
• Groups datasets
• Holds properties such as quotas and compression
• Records dataset relationships
Dataset directories
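The layering from objects up to dataset directories can be summarized as nested records. This is a simplified sketch under stated assumptions: snapshots are reduced to names, objects to ids, and `ObjSet`, `Dataset`, `DslDir`, and `head` are hypothetical names rather than the on-disk structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ObjSet:
    """A collection of objects (dnodes), reduced here to object ids."""
    objects: List[int] = field(default_factory=list)


@dataclass
class Dataset:
    """Wraps an objset and adds space usage and snapshot information."""
    objset: ObjSet
    used_bytes: int = 0
    snapshots: List[str] = field(default_factory=list)


@dataclass
class DslDir:
    """Groups datasets, holds properties, and records child directories."""
    head: Dataset
    properties: Dict[str, str] = field(default_factory=dict)
    children: Dict[str, "DslDir"] = field(default_factory=dict)


root = DslDir(Dataset(ObjSet([1, 2, 3]), used_bytes=4096),
              properties={"compression": "on"})
root.children["home"] = DslDir(Dataset(ObjSet([1])), properties={"quota": "10G"})
root.head.snapshots.append("monday")
```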
To sum up
• Layers of indirection
• End-to-end checksums, stored separately from the data they protect
• Wideness (ditto blocks): three copies for pool-wide metadata, two for file-system metadata, one for data
• Compression
• Copy-on-write
• Scrub facility
Moving forward
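Keeping each checksum in the parent rather than next to the data turns the block tree into a Merkle-style tree: one trusted top-level checksum transitively covers every block below it. A minimal sketch, assuming SHA-256 and a toy `Block` type with hypothetical helpers `contents`, `make_parent`, and `verify`:

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


@dataclass
class Block:
    data: bytes
    children: List["Block"] = field(default_factory=list)
    # Checksums of the children live in the PARENT, separate from the
    # child data they protect (as in a blkptr_t).
    child_sums: List[bytes] = field(default_factory=list)


def contents(b: Block) -> bytes:
    # A parent checksums the child's data plus the checksums the child holds,
    # so one top-level checksum transitively covers the whole tree.
    return b.data + b"".join(b.child_sums)


def make_parent(data: bytes, children: List[Block]) -> Block:
    return Block(data, children, [checksum(contents(c)) for c in children])


def verify(b: Block, expected: bytes) -> bool:
    """Check a block against the checksum stored by its parent, then recurse."""
    if checksum(contents(b)) != expected:
        return False
    return all(verify(c, s) for c, s in zip(b.children, b.child_sums))


leaf1 = Block(b"file data 1")
leaf2 = Block(b"file data 2")
indirect = make_parent(b"indirect block", [leaf1, leaf2])
root_sum = checksum(contents(indirect))  # in ZFS, held by the uberblock's root block pointer

print(verify(indirect, root_sum))  # True
leaf2.data = b"file data X"        # simulated silent corruption of a leaf
print(verify(indirect, root_sum))  # False
```

Because the corrupted leaf cannot also rewrite the checksum stored in its parent, the damage is caught on the way down from the root.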
Experimental Setup
Corruption framework
• Corrupter driver: modifies physical disk blocks.
• Analyzer app: understands on-disk ZFS structures.
• Consumer app: monitors ZFS responses and error codes.
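The corrupter's basic operation can be sketched as damaging bytes inside one block of a disk image. This sketch is hypothetical: it edits an in-memory image rather than intercepting I/O as the layered driver does, and `corrupt_block`, the 512-byte block size, and the byte-flipping policy are assumptions. In the study, which block to target came from the analyzer app.

```python
import random

BLOCK_SIZE = 512  # illustrative block size in bytes


def corrupt_block(disk: bytearray, block_no: int, nbytes: int = 8,
                  seed: int = 0) -> None:
    """Flip all bits of `nbytes` distinct bytes inside one block of a disk image."""
    if not 0 < nbytes <= BLOCK_SIZE:
        raise ValueError("nbytes must be between 1 and BLOCK_SIZE")
    start = block_no * BLOCK_SIZE
    if block_no < 0 or start + BLOCK_SIZE > len(disk):
        raise ValueError("block number outside the disk image")
    rng = random.Random(seed)  # seeded so an experiment can be replayed exactly
    for off in rng.sample(range(BLOCK_SIZE), nbytes):
        disk[start + off] ^= 0xFF  # XOR guarantees the byte actually changes


disk = bytearray(4 * BLOCK_SIZE)  # a tiny zero-filled "disk"
before = bytes(disk)
corrupt_block(disk, block_no=2)
changed = [i for i in range(len(disk)) if disk[i] != before[i]]
print(len(changed))  # 8
```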
Experimental Setup - Simplification
• Setup on a Solaris 10 VM.
• Only one physical vdev (disk): no striping, mirroring, RAID…
• Initial target: pointer corruption
  - reduced sample space
  - interesting cases
• Disable compression as much as possible.
Initial Finding
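• All metadata is compressed, and metadata compression cannot be disabled.
• As a result, targeted pointer corruption is not feasible: individual pointer fields cannot be altered in isolation inside a compressed block.
• Instead, perform corruptions on compressed objects; these are representative of the effects of disk faults on ZFS.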
Corruption Experiments
Type: type-aware object corruptions
Targets (on-disk objects):
• Vdev labels [@Pool]
• Uberblocks [@Pool]
• Object sets
  - Meta Object Set (MOS) [@Pool]: objset_phys_t (describes the object set), object array
  - Myfs object set [@FS]: objset_phys_t, indirect blkptr objects, object array
• ZIL (ZFS Intent Log) [@FS]
• File data [@FS]
• Directory data [@FS]
Results
Structure            Detection      Recovery         Correction
vdev label           YES/Checksum   YES/Replica      NO/COW
Uberblock            YES/Checksum   YES/Replica      NO/COW
MOS object           YES/Checksum   YES/Replica      NO/COW
MOS object set       YES/Checksum   YES/Replica      NO/COW
FS object            YES/Checksum   YES/Replica      NO/COW
FS indirect objects  YES/Checksum   YES/Replica      NO/COW
FS object set        YES/Checksum   YES/Replica      NO/COW
ZIL                  YES/Checksum   NO               NO
Directory data       YES/Checksum   NO/Configurable  NO/Configurable
File data            YES/Checksum   NO/Configurable  NO/Configurable

Entries read outcome/mechanism. NO/COW means damaged copies are not repaired in place; ZFS relies on copy-on-write to replace them. NO/Configurable means the outcome depends on how many data copies are configured.
Summary (using IRON Taxonomy)
• Detection: checksums in parent blkptrs
• Recovery: replication in parent blkptrs (ditto blocks)
Conclusion
• Integrating the file system and volume manager saves an additional layer of address translation.
• A single generic block pointer carries both checksums and replication; the resulting Merkle tree provides robustness.
• Replication and compression are viable in a commodity file system.
• COW can be used effectively.
Observations/Questions
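• No correction of ditto blocks on read: ZFS relies on COW to replace damaged copies.
  - What happens with n consecutive failures (n = wideness) before a transaction group commits?
  - What about snapshot corruption, given that COW never rewrites snapshot blocks?
• Explicit scrubbing corrects ditto blocks in place.
  - Is there potential for corruption during the in-place rewrite?
• Space/performance hit due to redundancy and compression
  - A 2% hit in terms of space/IO? (Banham & Nash)
  - No page cache; ZFS uses the ARC (Adaptive Replacement Cache).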
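In contrast to the read path, which only falls back to another copy, a scrub visits every copy and rewrites the bad ones in place. A minimal sketch, assuming a toy disk dictionary keyed by (vdev, offset) and SHA-256 checksums; `scrub` and the `(DVAs, checksum)` pairs are hypothetical simplifications of what blkptr_t holds.

```python
import hashlib
from typing import Dict, List, Tuple

Dva = Tuple[int, int]  # (vdev, offset)


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def scrub(pointers: List[Tuple[List[Dva], bytes]], disk: Dict[Dva, bytes]) -> int:
    """Verify every ditto copy of every block and rewrite bad copies in place.

    `pointers` holds (DVAs, expected checksum) pairs, as a blkptr_t would.
    Returns the number of copies repaired.
    """
    repaired = 0
    for dvas, expected in pointers:
        good = next((disk[d] for d in dvas
                     if d in disk and checksum(disk[d]) == expected), None)
        if good is None:
            continue  # every copy is bad: detected, but nothing to repair from
        for d in dvas:
            data = disk.get(d)
            if data is None or checksum(data) != expected:
                disk[d] = good  # in-place rewrite: the step questioned above
                repaired += 1
    return repaired


meta = b"file-system metadata"
disk = {(0, 10): meta, (0, 20): b"garbage"}
pointers = [([(0, 10), (0, 20)], checksum(meta))]
print(scrub(pointers, disk))  # 1
print(scrub(pointers, disk))  # 0: both copies are now intact
```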