Btrfs Features and Status - Oraclemason/presentation/enduser.pdf · Btrfs Status • V0.16 recently...

Btrfs Features and Status

Chris Mason

Btrfs Goals

• General purpose filesystem that scales to very large storage

• Fast development cycle to attract developers and users early

• Feature focused, providing features other Linux filesystems cannot

• Administration focused, easy to run and very fault tolerant

• Perform well in a variety of workloads

Btrfs Features

• Extent based file storage

• Copy on write metadata and data

• Space efficient packing of small files

• Space efficient indexed directories

• Integrity checksumming

• Writable snapshots

• Efficient incremental backups

• Multiple device support

• Full back references for all types of metadata

• Offline conversion from Ext3

Btrfs Status

• V0.16 recently released

• Most releases happen 1-2 months apart

• Generally usable in many workloads

• Generally stable

• Works against mainline kernels and 2.6.18 enterprise kernels

• Multi-device support functional

Metadata Format

• Metadata is stored in key / value pairs in a btree
  – Flexible format used to consistently describe and index metadata

• Items from multiple files and directories share the same btree blocks

• Special purpose btrees used for different functions

• Metadata format allows efficient searches for blocks that have changed since a given transaction
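
As a rough illustration of the points on this slide, the Python sketch below keeps items from several files in one sorted key space and uses a per-block generation number to skip blocks that have not changed since a given transaction. The key layout `(objectid, item_type, offset)`, the `Leaf` class, the item type constants, and both function names are assumptions made for the example; this is not the Btrfs on-disk format.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field

# Hypothetical item types for this sketch only.
INODE_ITEM, DIR_ITEM, EXTENT_DATA = 1, 2, 3


@dataclass
class Leaf:
    """One btree block: sorted (key, value) items plus the id of the
    transaction (generation) that last wrote the block."""
    generation: int
    items: list = field(default_factory=list)


def lookup(leaves, key):
    """Return the value stored under key, or None if it is absent.
    Leaves must be ordered by their first key."""
    first_keys = [leaf.items[0][0] for leaf in leaves]
    idx = bisect_right(first_keys, key) - 1
    if idx < 0:
        return None
    items = leaves[idx].items
    keys = [k for k, _ in items]
    pos = bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return items[pos][1]
    return None


def changed_since(leaves, transid):
    """Yield items from blocks written after transid; blocks with an
    older generation are skipped without looking at their items."""
    for leaf in leaves:
        if leaf.generation > transid:
            yield from leaf.items


# Items for a file (256), a directory (257) and a second file (258)
# share the same blocks because they share one sorted key space.
leaves = [
    Leaf(generation=5, items=[
        ((256, INODE_ITEM, 0), "inode for file A"),
        ((256, EXTENT_DATA, 0), "extent for file A at offset 0"),
        ((257, INODE_ITEM, 0), "inode for directory B"),
    ]),
    Leaf(generation=9, items=[
        ((257, DIR_ITEM, 1234), "directory entry for notes.txt"),
        ((258, INODE_ITEM, 0), "inode for notes.txt"),
    ]),
]
```

Calling `list(changed_since(leaves, 5))` visits only the second block, which is the property an incremental backup could rely on.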

Snapshots and Subvolumes

• Subvolume is the unit of snapshotting

• As efficient as directories on disk

• Every transaction is a new unnamed snapshot

• Every commit schedules the old snapshot for deletion

• Snapshots use a reference counted COW algorithm to track blocks

• Snapshots can be written and snapshotted again

• COW leaves old data in place and writes the new data to a new location
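
The reference counted copy-on-write idea can be sketched in a few lines of Python. This is a deliberately flattened model: each subvolume maps file names straight to block ids, whereas a real filesystem would reference count tree blocks. `BlockStore`, `Subvolume`, and their methods are hypothetical names introduced for the example.

```python
class BlockStore:
    """Blocks are never overwritten; each one carries a reference count."""

    def __init__(self):
        self.data = {}   # block id -> bytes
        self.refs = {}   # block id -> number of subvolumes referencing it
        self.next_id = 0

    def alloc(self, payload):
        bid = self.next_id
        self.next_id += 1
        self.data[bid] = payload
        self.refs[bid] = 1
        return bid

    def ref(self, bid):
        self.refs[bid] += 1

    def unref(self, bid):
        self.refs[bid] -= 1
        if self.refs[bid] == 0:      # last reference gone: block is free
            del self.data[bid]
            del self.refs[bid]


class Subvolume:
    def __init__(self, store, blocks=None):
        self.store = store
        self.blocks = dict(blocks or {})   # file name -> block id

    def write(self, name, payload):
        new = self.store.alloc(payload)    # new data goes to a new location
        old = self.blocks.get(name)
        self.blocks[name] = new
        if old is not None:
            self.store.unref(old)          # survives if a snapshot still uses it

    def read(self, name):
        return self.store.data[self.blocks[name]]

    def snapshot(self):
        for bid in self.blocks.values():
            self.store.ref(bid)            # share blocks, copy nothing
        return Subvolume(self.store, self.blocks)

    def delete(self):
        for bid in self.blocks.values():
            self.store.unref(bid)
        self.blocks.clear()


store = BlockStore()
vol = Subvolume(store)
vol.write("a.txt", b"v1")
snap = vol.snapshot()                      # writable snapshot
vol.write("a.txt", b"v2")                  # the snapshot still sees v1
snap.write("b.txt", b"only in snapshot")   # snapshots can be written too
```

After these calls `vol.read("a.txt")` returns the new data while `snap.read("a.txt")` still returns the old block, and deleting the snapshot frees only the blocks nothing else references.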

Metadata and Data Integrity

• Checksumming on both metadata and data

• Read errors or checksum errors are corrected by using another mirror of the block (if available)

• Metadata back references
  – Detect many types of corruptions
  – Help limit repair time and scope
  – Connect extents back to the files that reference them
    • Block 112233 is corrupted, which files do I need to fix?

• Disk scrubbing (planned feature)
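
A minimal Python sketch of the read path described above: verify the checksum of each copy, fall back to another mirror on a mismatch, and use back references to answer the "which files do I need to fix?" question. Several things here are assumptions for illustration: `zlib.crc32` stands in for whatever checksum the filesystem uses, mirrors are modelled as plain dictionaries, rewriting the damaged copy from the good one is one possible reading of "corrected", and the `backrefs` table with its file path is invented.

```python
import zlib


def checksum(data):
    # Stand-in checksum for this sketch only.
    return zlib.crc32(data)


def read_block(mirrors, blocknr, expected_csum):
    """Return the first copy whose checksum matches. Copies that were
    missing or failed verification are rewritten from the good copy."""
    bad = []
    for mirror in mirrors:
        data = mirror.get(blocknr)          # None models a read error
        if data is not None and checksum(data) == expected_csum:
            for damaged in bad:
                damaged[blocknr] = data
            return data
        bad.append(mirror)
    raise OSError(f"block {blocknr}: no mirror holds a valid copy")


def files_for_block(backrefs, blocknr):
    """Back references map a block to the files that reference it."""
    return backrefs.get(blocknr, [])


good = b"file contents"
csum = checksum(good)
mirror0 = {112233: b"file c0ntents"}        # silently corrupted copy
mirror1 = {112233: good}
backrefs = {112233: ["/home/user/report.txt"]}   # hypothetical path

recovered = read_block([mirror0, mirror1], 112233, csum)
```

If every mirror fails verification, `files_for_block(backrefs, 112233)` limits the repair to the files that actually reference the block.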

Multi-device Support

• Devices are added into a pool of available storage

• New logical address space is allocated with a specific RAID configuration and data storage flags
  – System (used by the volume management code)
  – Metadata
  – Data
  – Raid0, raid1, raid10, single-spindle-dup

• Space is allocated from the storage pool in large chunks (1GB or more)

• Devices can be mixed in size and speed
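
The sketch below shows one way chunk allocation from a pool of mixed-size devices could work. It is not the Btrfs allocator: the policy of preferring the devices with the most free space, the function name `allocate_chunk`, and the restriction to three profiles are all assumptions made to keep the example short.

```python
import errno

GiB = 1 << 30


def allocate_chunk(free, profile, size=GiB):
    """Carve one chunk out of the pool.

    free maps device name -> free bytes and is updated in place.
    Returns a list of (device, bytes) pieces that back the chunk.
    """
    # Assumed policy: prefer the devices with the most free space.
    by_free = sorted(free, key=free.get, reverse=True)
    if profile == "single":
        picks, per_dev = by_free[:1], size
    elif profile == "raid1":
        picks, per_dev = by_free[:2], size       # two full copies
    elif profile == "raid0":
        picks, per_dev = by_free, size // max(len(by_free), 1)
    else:
        raise ValueError(f"unknown profile: {profile}")

    needed = 2 if profile == "raid1" else 1
    if len(picks) < needed or any(free[d] < per_dev for d in picks):
        raise OSError(errno.ENOSPC, "not enough free space in the pool")

    for dev in picks:
        free[dev] -= per_dev
    return [(dev, per_dev) for dev in picks]


# Devices of different sizes in one pool.
pool = {"sdb": 10 * GiB, "sdc": 4 * GiB, "sdd": 6 * GiB}
metadata_chunk = allocate_chunk(pool, "raid1")            # mirrored
data_chunk = allocate_chunk(pool, "raid0", size=3 * GiB)  # striped
```

Because each chunk carries its own profile, metadata and data can use different RAID configurations on the same set of devices.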

Multi-device Support

• Devices can be grouped for use by specific files or subvolumes (planned feature)

• Single block number translation and IO path for all extents

• Storage can be restriped or dynamically reconfigured

• Device management uses the same transaction subsystem as the rest of the FS
  – Much lower overhead when reconfiguring storage

• Device management knows which parts of each device are in use by the FS

Multiple Device Management

• Create the filesystem
  – mkfs.btrfs /dev/sdb
  – mount /dev/sdb /mnt

• Add some devices
  – btrfs-vol -a /dev/sdc /mnt
  – btrfs-vol -a /dev/sdd /mnt

• Redistribute the data to the new devices
  – btrfs-vol -b /mnt
  – This also restripes as required

• Remove a device
  – btrfs-vol -r /dev/sdc /mnt

Online Resizing

• Shrink online by 1GB
  – btrfsctl -r -1g /mnt

• Shrink a specific device
  – btrfs-show # find device ids
  – btrfsctl -r 2:-1g /mnt
    • The 2: prefix selects devid 2

• Grow a device online
  – btrfsctl -r 2:+1g /mnt

• Interfaces will be expanded to allow raid reconfiguration and advanced device management online
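
To make the resize argument format concrete, here is a small Python sketch that parses strings such as `-1g`, `2:-1g`, and `2:+1g` and computes the resulting size. It is an illustration of the syntax shown on the slide, not the parser used by `btrfsctl`; the function names, the accepted unit suffixes, and the treatment of an unsigned value as an absolute size are assumptions.

```python
import re

_UNITS = {"": 1, "k": 1 << 10, "m": 1 << 20, "g": 1 << 30}
_SPEC = re.compile(r"^(?:(\d+):)?([+-]?)(\d+)([kmg]?)$", re.IGNORECASE)


def parse_resize(spec):
    """Split a resize spec into (devid, sign, nbytes).

    devid is None when no 'N:' prefix is given; sign is '+', '-' or ''
    (the empty sign is assumed to mean an absolute size).
    """
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError(f"cannot parse resize spec: {spec!r}")
    devid, sign, number, unit = match.groups()
    nbytes = int(number) * _UNITS[unit.lower()]
    return (int(devid) if devid else None, sign, nbytes)


def apply_resize(current_bytes, spec):
    """Return (devid, new_size_in_bytes) for a device of current_bytes."""
    devid, sign, nbytes = parse_resize(spec)
    if sign == "+":
        new_size = current_bytes + nbytes
    elif sign == "-":
        new_size = current_bytes - nbytes
    else:
        new_size = nbytes
    if new_size <= 0:
        raise ValueError("resulting size must be positive")
    return devid, new_size
```

For example, `apply_resize(10 << 30, "2:-1g")` describes shrinking devid 2 from 10 GiB to 9 GiB.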

Synchronous Operations

• COW transaction subsystem is slow for frequent commits
  – Forces recow of many blocks
  – Forces significant amounts of IO writing out extent allocation metadata

• Simple write ahead log added for synchronous operations on files or directories

• File or directory items are copied into a dedicated tree
  – File back refs allow us to log file names without the directory
  – One log btree per subvolume

Synchronous Operations

• The log tree uses the same COW btree code as the rest of the FS

• The log tree uses the same writeback code as the rest of the FS, and uses the metadata raid policy.

• Commits of the log tree are separate from commits in the main transaction code
  – fsync(file) only writes metadata for that one file
  – fsync(file) does not trigger writeback of any other data blocks
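
From an application's point of view, the synchronous operation in question is an ordinary `fsync` call on one file. The Python sketch below shows that call pattern using only the standard library; `write_durably` and the file name are hypothetical, and nothing in it is specific to Btrfs. How much work the call triggers underneath is what the log tree is meant to reduce.

```python
import os
import tempfile


def write_durably(path, payload):
    """Write payload to path and return once fsync has completed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(payload)
        while view:                      # os.write may write only part
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                     # flush this one file to storage
    finally:
        os.close(fd)


def demo():
    """Create a file in a temporary directory, fsync it, read it back."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "journal.dat")
        write_durably(path, b"record 1\n")
        with open(path, "rb") as handle:
            return handle.read()
```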

Mixed Fsync Workload

• 8 Processes creating, writing and fsyncing 20k files

• 1 Process creating 30 copies of the linux kernel sources in sequence

Filesystem   Kernel tree I/O Rate   Fsync Creates
Btrfs        83 MB/s                15,680
Ext4         122 MB/s               1,280
Ext3         66.85 MB/s             3,200

Btrfs Problems

• ENOSPC

• Disk format not yet finalized

• http://btrfs.wiki.kernel.org/index.php/Project_ideas