Understanding & Tuning Compaction Algorithms

Nicolas Spiegelberg, Software Engineer, Facebook

HBase Users Group, January 23, 2013

Agenda

1 Background

2 Compactions in HBase

3 Compactions: Other System Algorithms

4 Parting Thoughts

Compactions: Background

Log Structured Merge Tree

[Diagram: a Server hosts Shards (#1, #2); each Shard holds ColumnFamilies (#1, #2); each ColumnFamily has a Memstore that flushes to HFiles.]

Data in an HFile is sorted and carries a block index for efficient retrieval.

About LSMT

Write algorithms are relatively trivial:
▪ Write a new, immutable file
▪ Avoid stalls

Read algorithms are varied:
▪ Compaction
▪ Server-side Filters
▪ Block Index
▪ Bloom Filter

Compactions: Intro

Critical for read performance:
▪ Merge N HFiles into one
▪ Reduces read IO when earlier filters don't help enough
▪ The most complicated part of an LSMT
▪ Key decisions: which files to select, and when

[Diagram: several HFiles merged into a single HFile.]

Compactions: Disclaimers

Assumptions:
▪ Only general algorithms are covered
▪ Coprocessors are available for some common applications
▪ A relatively stable read+write workload is assumed

Compactions in HBase

Sigma Compaction

Default algorithm in HBase 0.90.

#1. File selection based on a summation of sizes:

    size[i] < (size[0] + size[1] + … + size[i-1]) * C

#2. Compact only if at least N eligible files are found.

Pros:
+ trivial implementation
+ minimal overwrites

Cons:
- non-deterministic latency
- files have variable lifetimes
- no incremental benefit
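The sum-based selection rule above can be sketched in Python. This is an illustrative sketch, not HBase's actual code: function name, defaults, and return convention are assumptions.

```python
def sigma_select(sizes, c=1.2, min_files=3):
    """Sketch of sum-based (sigma) selection over files ordered
    oldest-to-newest. File i is eligible when
    size[i] < (size[0] + ... + size[i-1]) * C."""
    prefix = 0          # running sum of all older files
    eligible = []
    for i, s in enumerate(sizes):
        if i > 0 and s < prefix * c:
            eligible.append(i)
        prefix += s
    # Compact only if at least N eligible files were found.
    return eligible if len(eligible) >= min_files else []
```

Note how the oldest (largest) file anchors the prefix sum, so a run of small recent files becomes eligible together; this is also why latency is non-deterministic, since one oversized compaction can be triggered at any time.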

Compactions: Configuration

All compaction algorithms:
▪ hbase.hstore.compaction.ratio
▪ hbase.hstore.compaction.min
▪ hbase.hregion.majorcompaction

Off-peak hours:
▪ hbase.offpeak.start.hour
▪ hbase.offpeak.end.hour
▪ hbase.hstore.compaction.ratio.offpeak
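These settings live in hbase-site.xml. A sketch with illustrative values (tune for your own workload; the off-peak window and ratios here are examples, not recommendations):

```xml
<!-- Illustrative values only -->
<property>
  <name>hbase.hstore.compaction.ratio</name>
  <value>1.2</value>
</property>
<property>
  <name>hbase.hstore.compaction.min</name>
  <value>3</value>
</property>
<property>
  <name>hbase.offpeak.start.hour</name>
  <value>0</value>
</property>
<property>
  <name>hbase.offpeak.end.hour</name>
  <value>6</value>
</property>
<property>
  <name>hbase.hstore.compaction.ratio.offpeak</name>
  <value>5.0</value>
</property>
```

A higher off-peak ratio lets the region server fold more files per compaction during quiet hours, trading burst IO for fewer files at peak.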

Tiered Compaction

Default algorithm in BigTable/HBase.

#1. File selection based on size relative to a pivot file p:

    size[i] * C >= size[p] <= size[k] / C  ::  i < p < k

(groups files into "tiers")

#2. Compact only if at least N eligible files are found.

Pros:
+ trivial implementation
+ more deterministic behavior
+ medium-size files stay warm

Cons:
- more file seeks necessary
- still write-biased
- no incremental benefit
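The "groups files into tiers" idea can be sketched as follows: walk files from largest to smallest and start a new tier whenever the next file is more than a factor C smaller than the current tier. This is a simplified illustration of the grouping, not the exact pivot-based selection above.

```python
def group_into_tiers(sizes, c=2.0):
    """Sketch: bucket file sizes into tiers of comparable magnitude.
    The ratio C and the greedy grouping are illustrative assumptions."""
    tiers = []
    for s in sorted(sizes, reverse=True):
        # Join the current tier if this file is within a factor C of
        # the tier's smallest member; otherwise open a new tier.
        if tiers and tiers[-1][-1] <= s * c:
            tiers[-1].append(s)
        else:
            tiers.append([s])
    return tiers
```

Compacting within a tier keeps merge inputs of similar size, which is what makes the behavior more deterministic than sum-based selection.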

Compactions: Configuration

Tiered compaction:
▪ Enable: "hbase.hstore.compaction.CompactionPolicy"
▪ Default.NumCompactionTiers
▪ Default.Tier.X
  ▪ MaxSize
  ▪ MaxAgeInDisk

Compactions: Work Queues

▪ Problem: starvation — long-running large compactions can block small ones
▪ Solution:
  ▪ Handle large & small compactions differently
  ▪ Allow a configurable "throttle" size to determine which queue a compaction goes to

Compactions: Configuration

Compaction work queues:
▪ hbase.regionserver.thread.compaction.small
▪ hbase.regionserver.thread.compaction.large
▪ hbase.regionserver.thread.compaction.throttle / "ThrottlePoint"
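The two-queue idea can be sketched in a few lines. The throttle value and routing function are illustrative assumptions, not HBase's actual implementation:

```python
from queue import Queue

# Hypothetical throttle point, e.g. 128 MB of total compaction input.
THROTTLE_POINT = 128 * 1024 * 1024

small_q, large_q = Queue(), Queue()

def submit(total_input_size):
    """Route a compaction request to the small or large queue so that
    big, slow merges cannot starve the stream of small ones."""
    if total_input_size > THROTTLE_POINT:
        large_q.put(total_input_size)
        return "large"
    small_q.put(total_input_size)
    return "small"
```

Each queue is then drained by its own thread pool, sized by the `.small` and `.large` settings above.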

Compactions: Other Algorithms

Leveled Compaction

Default algorithm in LevelDB.

#1. Bucket files into levels of ~10x magnitude difference.
#2. Shard the compaction across files (not just the block index).
#3. Compact only the shard that goes over a certain size.

Pros:
+ optimized for read-heavy use
+ faster compaction turnaround
+ easy to cache-on-compact

Cons:
- complicated algorithm
- heavy rewrites on write-dominated use
- time-range filters less effective
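The ~10x-per-level sizing can be illustrated with a one-liner. Base size and growth factor are illustrative, not LevelDB's exact constants:

```python
def max_level_size(level, base=10 * 1024 * 1024, growth=10):
    """Sketch of leveled sizing: each level may hold roughly `growth`
    times as much data as the previous one."""
    return base * growth ** level
```

A compaction triggers only when one level exceeds its budget, which keeps individual compactions small and turnaround fast — the incremental benefit the earlier algorithms lack.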

Time-Series Compaction

▪ Log-structured Merge Tree = time-ordered data storage!
▪ Time-series compaction:
  ▪ Implement with a Coprocessor
  ▪ Time-boundary based: shard HFiles on hour, day, etc.
▪ Optimized for time-series data and write-biased queries

[Diagram: Memstore flushes into HFiles sharded by time boundary (… hour … day …).]
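The time-boundary sharding can be sketched by mapping a cell timestamp to its bucket. The function name and shard-naming scheme are hypothetical, not an HBase API:

```python
import time

def shard_name(ts, boundary_seconds=3600):
    """Sketch: map a timestamp (seconds since epoch) to an hourly
    time-boundary shard label. Hourly granularity is an example;
    daily sharding would use boundary_seconds=86400."""
    bucket = int(ts // boundary_seconds) * boundary_seconds
    return time.strftime("%Y%m%d-%H", time.gmtime(bucket))
```

Because every HFile then covers a disjoint time range, time-range queries touch only the matching shards and old data can be dropped wholesale.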

Parting Thoughts

Compactions: Associated JIRAs

▪ 0.90 Sigma Compactions (HBASE-3209)
▪ 0.92 Multi-Threaded Compactions (HBASE-1476)
▪ 0.96 Tier-based Compaction (HBASE-6371 & HBASE-7055)
▪ Future:
  ▪ Make Compactions Pluggable (HBASE-7516)
  ▪ Leveled Compaction (HBASE-7519)

Compactions: High-Level Thoughts

Variables:
▪ Disk IO on HFile read
▪ Disk & network IO on compaction (R+W)

Compactions: High-Level Thoughts

Related questions:
▪ Is the data mutated or appended?
  ▪ Mutates benefit from lazy seeks but cause disk bloat
  ▪ HFile reduction is less useful as row queries grow larger
▪ Are you missing critical filters?
  ▪ Explicit vs. implicit requests
  ▪ Cache on write/compact (CacheConfig)
  ▪ Time range / column filter
  ▪ Bloom filters: a non-trivial decision — measure first

Thanks! Questions?