Cassandra 2.1 boot camp, Compaction
Uploaded by joshmckenzie
![Page 1: Cassandra 2.1 boot camp, Compaction](https://reader033.fdocuments.us/reader033/viewer/2022052907/5590f0731a28abdd378b4653/html5/thumbnails/1.jpg)
Compaction
![Page 2: Cassandra 2.1 boot camp, Compaction](https://reader033.fdocuments.us/reader033/viewer/2022052907/5590f0731a28abdd378b4653/html5/thumbnails/2.jpg)
Agenda
● Overview
● Compaction strategies
● Tombstones
● Code walkthrough
![Page 3: Cassandra 2.1 boot camp, Compaction](https://reader033.fdocuments.us/reader033/viewer/2022052907/5590f0731a28abdd378b4653/html5/thumbnails/3.jpg)
Why?
● SSTables immutable
● Get rid of duplicate/overwritten data
● Drop deleted data and tombstones
![Page 4: Cassandra 2.1 boot camp, Compaction](https://reader033.fdocuments.us/reader033/viewer/2022052907/5590f0731a28abdd378b4653/html5/thumbnails/4.jpg)
When?
● Manually, nodetool compact / scrub ...
● When we add sstables
  ○ After flush
  ○ Once a compaction is done
  ○ After streaming
● Search for usages of
  ○ o.a.c.db.compaction.CompactionManager#submitBackground
![Page 5: Cassandra 2.1 boot camp, Compaction](https://reader033.fdocuments.us/reader033/viewer/2022052907/5590f0731a28abdd378b4653/html5/thumbnails/5.jpg)
Types of compaction
● Minor - runs automatically in the background
● Major - includes all sstables, only for size tiered compaction
● Single-sstable compactions
  ○ upgradesstables
  ○ scrub
  ○ cleanup
● Anticompaction
  ○ After incremental repair to split out repaired/unrepaired data
![Page 6: Cassandra 2.1 boot camp, Compaction](https://reader033.fdocuments.us/reader033/viewer/2022052907/5590f0731a28abdd378b4653/html5/thumbnails/6.jpg)
Compaction strategies
● Pluggable interface
● Strategies decide
  ○ what sstables to compact
  ○ how big they should be
  ○ what implementation of CompactionTask to use
● Strategies can get notified when adding new sstables
  ○ Makes it possible to make smart decisions when deciding which sstables to compact
  ○ LCS does this to keep track of what sstables are in each level
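
The responsibilities listed above can be sketched as a small interface. This is an illustration in Python, not Cassandra's Java API: every name here (`SSTable`, `CompactionTask`, `CompactionStrategy`, `add_sstable`, `next_background_task`, `CompactEverythingStrategy`) is hypothetical and chosen only to mirror the bullets.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for an on-disk sstable."""
    name: str
    size_bytes: int


@dataclass
class CompactionTask:
    """Hypothetical task: which sstables to merge and how big the output may be."""
    inputs: List[SSTable]
    max_output_bytes: int


class CompactionStrategy(ABC):
    """Illustrative pluggable interface (not Cassandra's actual class)."""

    @abstractmethod
    def add_sstable(self, sstable: SSTable) -> None:
        """Notification hook: called whenever a new sstable is added."""

    @abstractmethod
    def next_background_task(self) -> Optional[CompactionTask]:
        """Decide what to compact next, or return None if there is nothing to do."""


class CompactEverythingStrategy(CompactionStrategy):
    """Toy strategy: once `threshold` sstables exist, compact all of them."""

    def __init__(self, threshold: int = 4) -> None:
        self.threshold = threshold
        self.live: List[SSTable] = []

    def add_sstable(self, sstable: SSTable) -> None:
        # Track sstables as they arrive so the next decision is cheap.
        self.live.append(sstable)

    def next_background_task(self) -> Optional[CompactionTask]:
        if len(self.live) < self.threshold:
            return None
        inputs, self.live = self.live, []
        total = sum(s.size_bytes for s in inputs)
        return CompactionTask(inputs=inputs, max_output_bytes=total)
```

The notification hook is what lets a strategy keep its own bookkeeping up to date, in the same spirit as LCS tracking which sstables are in each level.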
![Page 7: Cassandra 2.1 boot camp, Compaction](https://reader033.fdocuments.us/reader033/viewer/2022052907/5590f0731a28abdd378b4653/html5/thumbnails/7.jpg)
SizeTieredCompactionStrategy
● Combines sstables based on their size
● Skips sstables that are ‘cold’ - not read much
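
A rough sketch of "combines sstables based on their size", written in Python for illustration. It assumes the usual size-tiered options `bucket_low`, `bucket_high`, `min_threshold` and `max_threshold` with defaults of 0.5, 1.5, 4 and 32; the function names are hypothetical, and the 'cold' sstable filtering mentioned above is deliberately left out.

```python
from typing import List


def size_tiered_buckets(sizes: List[int],
                        bucket_low: float = 0.5,
                        bucket_high: float = 1.5) -> List[List[int]]:
    """Group sstable sizes into buckets of roughly similar size."""
    buckets: List[List[int]] = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            # A size joins a bucket if it is within [low, high] x the bucket average.
            if avg * bucket_low <= size <= avg * bucket_high:
                bucket.append(size)
                break
        else:
            # No existing bucket fits, so start a new one.
            buckets.append([size])
    return buckets


def pick_bucket(buckets: List[List[int]],
                min_threshold: int = 4,
                max_threshold: int = 32) -> List[int]:
    """Pick one bucket to compact; this sketch prefers the smallest sstables."""
    candidates = [b for b in buckets if len(b) >= min_threshold]
    if not candidates:
        return []
    best = min(candidates, key=lambda b: sum(b) / len(b))
    return best[:max_threshold]


buckets = size_tiered_buckets([10, 11, 12, 9, 100, 105, 1000])
to_compact = pick_bucket(buckets)
```

With these inputs the four small sstables form the only bucket that reaches `min_threshold`, so they are the ones selected.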
![Page 8: Cassandra 2.1 boot camp, Compaction](https://reader033.fdocuments.us/reader033/viewer/2022052907/5590f0731a28abdd378b4653/html5/thumbnails/8.jpg)
LeveledCompactionStrategy
● Keeps levels of non-overlapping sstables
● Each level is 10x the size of the previous one
● All sstables in levels 1+ are about the same size (160MB)
● L0 is the dumping ground, overlapping, larger sstables
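
The level arithmetic can be worked through with a short sketch. It assumes that L1 holds ten 160MB sstables and that each later level is 10x larger; the function names and the integer "token" ranges are hypothetical and exist only for illustration.

```python
from typing import List, Tuple


def level_capacity_mb(level: int, sstable_size_mb: int = 160, fanout: int = 10) -> int:
    """Approximate capacity of levels 1+ under the assumptions above."""
    if level < 1:
        raise ValueError("L0 has no fixed capacity in this sketch")
    return sstable_size_mb * fanout ** level


def is_non_overlapping(ranges: List[Tuple[int, int]]) -> bool:
    """Check that inclusive (first, last) token ranges never overlap."""
    ordered = sorted(ranges)
    return all(prev_last < next_first
               for (_, prev_last), (next_first, _) in zip(ordered, ordered[1:]))


capacities = [level_capacity_mb(n) for n in (1, 2, 3)]  # 1600, 16000, 160000 MB
```

Because sstables within a level do not overlap, a read for a single key needs to look at no more than one sstable per level above L0.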
![Page 9: Cassandra 2.1 boot camp, Compaction](https://reader033.fdocuments.us/reader033/viewer/2022052907/5590f0731a28abdd378b4653/html5/thumbnails/9.jpg)
Tombstones
● Write a tombstone to delete data
● Covers data, but only data that is older than the tombstone
● Drop covered data during compaction
![Page 10: Cassandra 2.1 boot camp, Compaction](https://reader033.fdocuments.us/reader033/viewer/2022052907/5590f0731a28abdd378b4653/html5/thumbnails/10.jpg)
When can we drop tombstones?
● Once the tombstone has existed for gc_grace_seconds
● When the tombstone is guaranteed to not cover any data on the node
  ○ Either all sstables containing the key are included in the compaction
  ○ Or the other sstables where the key exists only contain newer data
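
The two conditions above can be written as a single predicate. This is a Python sketch under stated assumptions: the `Tombstone` fields and the `can_drop` function are hypothetical names, timestamps are plain integers, and `max_purgeable_timestamp` stands for the oldest timestamp the key could have in sstables that are not part of the compaction (infinity if there are none).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tombstone:
    timestamp: int            # write timestamp of the delete
    local_deletion_time: int  # wall-clock seconds when the delete was issued


def can_drop(tombstone: Tombstone,
             now_seconds: int,
             gc_grace_seconds: int,
             max_purgeable_timestamp: float) -> bool:
    """Return True only if both conditions from the slide hold."""
    gc_before = now_seconds - gc_grace_seconds
    # Condition 1: the tombstone has existed for gc_grace_seconds.
    old_enough = tombstone.local_deletion_time < gc_before
    # Condition 2: anything left in other sstables is newer than the tombstone,
    # so the tombstone cannot be covering it.
    covers_nothing_elsewhere = tombstone.timestamp < max_purgeable_timestamp
    return old_enough and covers_nothing_elsewhere


TEN_DAYS = 864000  # default gc_grace_seconds
t = Tombstone(timestamp=100, local_deletion_time=1000)
droppable = can_drop(t, now_seconds=2_000_000, gc_grace_seconds=TEN_DAYS,
                     max_purgeable_timestamp=200)
```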
![Page 11: Cassandra 2.1 boot camp, Compaction](https://reader033.fdocuments.us/reader033/viewer/2022052907/5590f0731a28abdd378b4653/html5/thumbnails/11.jpg)
Code walkthrough
![Page 12: Cassandra 2.1 boot camp, Compaction](https://reader033.fdocuments.us/reader033/viewer/2022052907/5590f0731a28abdd378b4653/html5/thumbnails/12.jpg)
![Page 13: Cassandra 2.1 boot camp, Compaction](https://reader033.fdocuments.us/reader033/viewer/2022052907/5590f0731a28abdd378b4653/html5/thumbnails/13.jpg)
CompactionManager
● submitBackground
  ○ Trigger minor compaction
  ○ Fill executor with BackgroundCompactionTasks
● BackgroundCompactionTask
● submitMaximal
  ○ Major compaction
  ○ Not blocking, get() the future to block
  ○ runWithCompactionsDisabled
● OneSSTableOperation
  ○ Common way to run the single-sstable compactions in parallel
![Page 14: Cassandra 2.1 boot camp, Compaction](https://reader033.fdocuments.us/reader033/viewer/2022052907/5590f0731a28abdd378b4653/html5/thumbnails/14.jpg)
CompactionTask
● Gets executed in the CompactionExecutor and does the actual compacting
● Eventually calls runWith(..) which is where the magic happens
![Page 15: Cassandra 2.1 boot camp, Compaction](https://reader033.fdocuments.us/reader033/viewer/2022052907/5590f0731a28abdd378b4653/html5/thumbnails/15.jpg)
CompactionTask
![Page 16: Cassandra 2.1 boot camp, Compaction](https://reader033.fdocuments.us/reader033/viewer/2022052907/5590f0731a28abdd378b4653/html5/thumbnails/16.jpg)
CompactionController
● Keep track of overlapping sstables
  ○ Is the currently compacting key in any other sstable?
● maxPurgeableTimestamp(DecoratedKey key)
  ○ How old are the tombstones we need to keep?
  ○ Worst case, currently compacting key is the oldest in that sstable
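
One way to picture `maxPurgeableTimestamp` is as a minimum over the overlapping sstables that might contain the key. The Python sketch below is an assumption about the idea, not the actual implementation: `OverlappingSSTable` is a hypothetical type, and its `keys` set stands in for whatever lookup (for example a bloom filter) answers "is this key in that sstable?".

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class OverlappingSSTable:
    keys: FrozenSet[str]  # stand-in for a membership check on the sstable
    min_timestamp: int    # oldest timestamp of anything in the sstable


def max_purgeable_timestamp(key: str,
                            overlapping: Iterable[OverlappingSSTable]) -> float:
    """Tombstones older than the returned value cannot cover data elsewhere."""
    # Only the per-sstable minimum is known, so assume the worst case:
    # the key being compacted is the oldest entry in each sstable that has it.
    candidates = [s.min_timestamp for s in overlapping if key in s.keys]
    return min(candidates, default=math.inf)


others = [
    OverlappingSSTable(frozenset({"a", "b"}), min_timestamp=50),
    OverlappingSSTable(frozenset({"b"}), min_timestamp=20),
]
limit_for_b = max_purgeable_timestamp("b", others)
```

If no other sstable contains the key, the result is infinity and only the gc_grace_seconds check remains.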
![Page 17: Cassandra 2.1 boot camp, Compaction](https://reader033.fdocuments.us/reader033/viewer/2022052907/5590f0731a28abdd378b4653/html5/thumbnails/17.jpg)
SSTableRewriter
● Open compaction results early
![Page 18: Cassandra 2.1 boot camp, Compaction](https://reader033.fdocuments.us/reader033/viewer/2022052907/5590f0731a28abdd378b4653/html5/thumbnails/18.jpg)
SSTableWriter
● Writes sstables…
● Give it rows, it writes index, data file, sstable metadata files etc
● openEarly(..)
  ○ link index and data files
  ○ in-memory-fake the rest of the files
● Collect SSTable metadata
![Page 19: Cassandra 2.1 boot camp, Compaction](https://reader033.fdocuments.us/reader033/viewer/2022052907/5590f0731a28abdd378b4653/html5/thumbnails/19.jpg)
SSTable metadata
● Collected whenever an sstable is written
● StatsMetadata
  ○ Kept on-heap
  ○ min/maxTimestamp
  ○ min/maxColumnNames
  ○ sstableLevel
● CompactionMetadata
  ○ Deserialized when needed
  ○ ancestors
  ○ cardinalityEstimator - HyperLogLog signature
● ValidationMetadata
  ○ Used to validate sstables when opening
![Page 20: Cassandra 2.1 boot camp, Compaction](https://reader033.fdocuments.us/reader033/viewer/2022052907/5590f0731a28abdd378b4653/html5/thumbnails/20.jpg)
Iterators all the way down
First sstable:
a 1 2 3
b 2 3 5
d .. .. ..
Second sstable:
a 2 5 7
b 2 4 5
e .. .. ..
Merged result:
a 1 2 3 5 7
b 2 3 4 5
d .. .. .. .. ..
e .. .. .. .. ..
● “Partition iterator” for each sstable (SSTableScanner)
● “Cell iterator” for each partition (OnDiskAtomIterator)
● MergeIterator (MI) that takes a number of (sorted) iterators and merges them
● One MI for sstables that merges partitions
● One MI for each partition that merges cells
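
The two layers of merging can be sketched with Python's standard library. This is an illustration only: `merge_sstables` is a hypothetical name, each sstable is modelled as a key-sorted list of `(partition_key, sorted_cells)` pairs, and the cells for partitions d and e are made-up placeholders because the slide elides them.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_sstables(*sstables):
    """Yield (partition_key, merged_cells) in key order."""
    # Outer merge: interleave the partitions of all sstables by key.
    by_key = heapq.merge(*sstables, key=itemgetter(0))
    for key, versions in groupby(by_key, key=itemgetter(0)):
        # Inner merge: combine the sorted cells of every version of the partition.
        cells = heapq.merge(*(cells for _, cells in versions))
        # Collapse cells that appear in more than one sstable.
        yield key, [cell for cell, _ in groupby(cells)]


first = [("a", [1, 2, 3]), ("b", [2, 3, 5]), ("d", [4])]   # d's cell is a placeholder
second = [("a", [2, 5, 7]), ("b", [2, 4, 5]), ("e", [6])]  # e's cell is a placeholder
merged = list(merge_sstables(first, second))
```

Partitions a and b come out as `[1, 2, 3, 5, 7]` and `[2, 3, 4, 5]`, matching the merged rows shown on the slide.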
![Page 21: Cassandra 2.1 boot camp, Compaction](https://reader033.fdocuments.us/reader033/viewer/2022052907/5590f0731a28abdd378b4653/html5/thumbnails/21.jpg)
MergeIterator
● Interesting implementation is ManyToOne
● Merges many sorted iterators into one
● Reducer
  ○ reduce(..) gets called for every version that should be reduced
  ○ getReduced() gets called when all versions with the same name/priority/value have been passed to reduce(..)
![Page 22: Cassandra 2.1 boot camp, Compaction](https://reader033.fdocuments.us/reader033/viewer/2022052907/5590f0731a28abdd378b4653/html5/thumbnails/22.jpg)
MergeIterator
1. call next()
2. poll one item out of the priority queue (PQ)
3. Reducer.reduce(..)
4. goto 2, until we find an item that differs
5. Call next() on the iterators you polled
6. Re-add the iterators to the PQ
7. return Reducer.getReduced()
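
The steps above can be sketched as a Python generator. This is not Cassandra's `MergeIterator`; `many_to_one` and `CountingReducer` are hypothetical names, and the sketch assumes each input is sorted and contains no duplicates of its own.

```python
import heapq
from typing import Any, Iterable, Iterator, List

_DONE = object()  # sentinel for an exhausted input


class CountingReducer:
    """Toy reducer: returns (item, number of inputs that contained it)."""

    def __init__(self) -> None:
        self._seen: List[Any] = []

    def reduce(self, item: Any) -> None:
        self._seen.append(item)

    def get_reduced(self) -> Any:
        result = (self._seen[0], len(self._seen))
        self._seen = []  # reset for the next group of equal items
        return result


def many_to_one(iterables: Iterable[Iterable[Any]], reducer: Any) -> Iterator[Any]:
    pq: list = []
    for index, iterable in enumerate(iterables):
        it = iter(iterable)
        head = next(it, _DONE)
        if head is not _DONE:
            # The index breaks ties, so the iterators are never compared.
            heapq.heappush(pq, (head, index, it))
    while pq:
        smallest = pq[0][0]
        polled = []
        # Steps 2-4: poll and reduce until the head of the queue differs.
        while pq and pq[0][0] == smallest:
            item, index, it = heapq.heappop(pq)
            reducer.reduce(item)
            polled.append((index, it))
        # Steps 5-6: advance the polled iterators and re-add them.
        for index, it in polled:
            head = next(it, _DONE)
            if head is not _DONE:
                heapq.heappush(pq, (head, index, it))
        # Step 7: hand back the reduced value for this group.
        yield reducer.get_reduced()


result = list(many_to_one([[1, 2, 3], [2, 5, 7]], CountingReducer()))
```

Only the iterators that were polled get advanced, so each input is read exactly once and in order.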
![Page 23: Cassandra 2.1 boot camp, Compaction](https://reader033.fdocuments.us/reader033/viewer/2022052907/5590f0731a28abdd378b4653/html5/thumbnails/23.jpg)
CompactionIterable
● Creates LazilyCompactedRow
● Simple Reducer
![Page 24: Cassandra 2.1 boot camp, Compaction](https://reader033.fdocuments.us/reader033/viewer/2022052907/5590f0731a28abdd378b4653/html5/thumbnails/24.jpg)
LazilyCompactedRow
● “Lazy” because we don’t deserialize until we need to
● Uses a MergeIterator to merge the rows
● Drops tombstones if possible
  ○ Uses CompactionController for this