Fractal tree-technology-and-the-art-of-indexing

Post on 11-May-2015

346 views 0 download

Tags:

Transcript of Fractal tree-technology-and-the-art-of-indexing

Fractal Tree® Technology OverviewThe Art of Indexing

Martín Farach-ColtonCo-founder & Chief Technology Officer

The Art of Indexing®

Not all indexing is the same

B-tree is the basis for almost all DB systems• Data structure invented in 1972

• Has not kept up with hardware trends‣ Works poorly on modern rotational disks‣ Works poorly on SSD

Fractal Tree Indexes is the basis of TokuDB• Scales with hardware

• Fast Indexing ➔ More Indexing ➔ Faster Queries

• Great Compression

• No Fragmentation

• Reduced wear on SSDs

How do Fractal Tree Indexes outperform

B-trees?

How do Fractal Tree Indexes outperform

B-trees?

First, some facts about storage systems

The Art of Indexing®

Storage is quirky

Hard disks are slow for random I/O but fast for

sequential I/O

Difference causes problems like fragmentation, ...

The Art of Indexing®

Storage is quirky

Hard disks are slow for random I/O but fast for

sequential I/O

SSDs are fast for random I/O but expensive for

sequential.

Garbage collection causes artefacts: increased

wear, write cliffs...

Difference causes problems like fragmentation, ...

The Art of Indexing®

Storage ♡ Big Reads and Writes

Big I/Os

The Art of Indexing®

Storage ♡ Big Reads and Writes

Big I/Os

Hard Disks: No Fragmentation

The Art of Indexing®

Storage ♡ Big Reads and Writes

Big I/Os

Hard Disks: No Fragmentation SSDs: Less Garbage

Collection & Wear

The Art of Indexing®

Storage ♡ Big Reads and Writes

Big I/Os

Hard Disks: No Fragmentation SSDs: Less Garbage

Collection & Wear

But you only get the goodies if each I/O has lots of new bytes

The Art of Indexing®

Storage ♡ Big Reads and Writes

Big I/Os

Hard Disks: No Fragmentation SSDs: Less Garbage

Collection & Wear

But you only get the goodies if each I/O has lots of new bytes

If you read a big block, change one byte, then write it, you get terrible wear problems

The Art of Indexing®

Storage ♡ Big Reads and Writes

Big I/Os

Hard Disks: No Fragmentation SSDs: Less Garbage

Collection & Wear

Both: Better Compression

But you only get the goodies if each I/O has lots of new bytes

If you read a big block, change one byte, then write it, you get terrible wear problems

The Art of Indexing®

Caching is great• But you can’t cache all your data

• For stuff not in memory, you have to go to disk

Goal: Do the best we can for the stuff on disk

Data is big, RAM is Small

Data

RAM

Now, What’s a B-tree? & a Fractal Tree Index

The Art of Indexing®

What’s a B-tree?

B

B B B... ...

B B……………………………………………….………….

Search Tree

The Art of Indexing®

What’s a B-tree?

B

B B B... ...

B B……………………………………………….………….

Each node has B pivots

Search Tree

The Art of Indexing®

What’s a B-tree?

B

B B B... ...

B B……………………………………………….………….

Each node has B pivots

Search Tree

Fetching pivots as a group saves I/Os

The Art of Indexing®

What’s a B-tree?

B

B B B... ...

B B……………………………………………….………….

Each node has B pivots

Search Tree

Fetching pivots as a group saves I/Os

Most leaves are on disk, not in RAM

The Art of Indexing®

What’s a B-tree?

B

B B B... ...

B B……………………………………………….………….

Each node has B pivots

Search Tree

Fetching pivots as a group saves I/Os

Most leaves are on disk, not in RAM Inserting into a leaf that’s not in memory will require an I/O per insertion

The Art of Indexing®

B-tree Delivery Service

If fast memory is like walking across a room• Each update in a B-tree is a walking trip from ‣ New York

The Art of Indexing®

B-tree Delivery Service

If fast memory is like walking across a room• Each update in a B-tree is a walking trip from ‣ New York to St Louis

The Art of Indexing®

B-tree Delivery Service

If fast memory is like walking across a room• Each update in a B-tree is a walking trip from ‣ New York to St Louis

Each item gets its own round trip!

The Art of Indexing®

Real-world delivery

Keep regional warehouses• Only move stuff when you can move a lot

The Art of Indexing®

Real-world delivery

Keep regional warehouses• Only move stuff when you can move a lot

Each item gets moved several times

The Art of Indexing®

Real-world delivery

Keep regional warehouses• Only move stuff when you can move a lot

Each item gets moved several times

But each trip is vastly cheaper

The Art of Indexing®

Fractal Tree Indexes

B

B B BB

Each node has pivots & Buffers

The Art of Indexing®

Fractal Tree Indexes

B

B B BB

Each node has pivots & Buffers

The Art of Indexing®

Fractal Tree Indexes

B

B B BB

Each node has pivots & Buffers

The Art of Indexing®

Fractal Tree Indexes

B

B B BB

Each node has pivots & Buffers

Buffers fill as updates arrive

The Art of Indexing®

Fractal Tree Indexes

B

B B BB

Each node has pivots & Buffers

Buffers fill as updates arrive

The Art of Indexing®

Fractal Tree Indexes

B

B B BB

Each node has pivots & Buffers

Buffers fill as updates arrive

The Art of Indexing®

Fractal Tree Indexes

B

B B BB

Each node has pivots & Buffers

Buffers fill as updates arrive

The Art of Indexing®

Fractal Tree Indexes

B

B B BB

Each node has pivots & Buffers

Buffers fill as updates arrive

The Art of Indexing®

Fractal Tree Indexes

B

B B BB

Each node has pivots & Buffers

Buffers fill as updates arrive

The Art of Indexing®

Fractal Tree Indexes

B

B B BB

Each node has pivots & Buffers

Buffers fill as updates arrive

The Art of Indexing®

Fractal Tree Indexes

B

B B BB

Each node has pivots & Buffers

Buffers fill as updates arrive

The Art of Indexing®

Fractal Tree Indexes

B

B B BB

Each node has pivots & Buffers

Buffers fill as updates arrive

The Art of Indexing®

Fractal Tree Indexes

B

B B BB

Each node has pivots & Buffers

Buffers fill as updates arrive

The Art of Indexing®

Fractal Tree Indexes

B

B B BB

Each node has pivots & Buffers

Buffers fill as updates arrive

The Art of Indexing®

Fractal Tree Indexes

B

B B BB

Each node has pivots & Buffers

Buffers fill as updates arrive

The Art of Indexing®

Fractal Tree Indexes

B

B B BB

Each node has pivots & Buffers

Buffers fill as updates arrive

The Art of Indexing®

Fractal Tree Indexes

B

B B BB

Each node has pivots & Buffers

Buffers fill as updates arrive

The Art of Indexing®

Fractal Tree Indexes

B

B B BB

Each node has pivots & Buffers

Buffers fill as updates arrive

The Art of Indexing®

Fractal Tree Indexes

B

B B BB

Each node has pivots & Buffers

Buffers fill as updates arrive

The Art of Indexing®

Fractal Tree Indexes

B

B B BB

Each node has pivots & Buffers

Buffers fill as updates arrive

Flush a buffer when it fills

The Art of Indexing®

Fractal Tree Indexes

B

B B BB

Each node has pivots & Buffers

Buffers fill as updates arrive

Flush a buffer when it fills

A flush might take an I/O, but it does lots of useful work

The Art of Indexing®

Fractal Tree Indexes

B

B B BB

Each node has pivots & Buffers

Buffers fill as updates arrive

Flush a buffer when it fills

A flush might take an I/O, but it does lots of useful work

More changes per write ➔ fewer changes for same write

load ➔ less SSD wear

The Art of Indexing®

Fractal Tree Indexes: Queries

B

B B BB

Lots of buffers have messages

The Art of Indexing®

Fractal Tree Indexes: Queries

B

B B BB

Lots of buffers have messages

But query follows root-leaf path

The Art of Indexing®

Fractal Tree Indexes: Queries

B

B B BB

Lots of buffers have messages

But query follows root-leaf path

So every query has the most up-to-date information

The Art of Indexing®

Fractal Tree Indexes: Queries

B

B B BB

Lots of buffers have messages

But query follows root-leaf path

So every query has the most up-to-date information

Messages can be insert, update, delete

The Art of Indexing®

Fractal Tree Indexes: Schema Changes

B

B B BB

Schema Changes are broadcast messages

The Art of Indexing®

Fractal Tree Indexes: Schema Changes

B

B B BB

Schema Changes are broadcast messages

The Art of Indexing®

Fractal Tree Indexes: Schema Changes

B

B B BB

Schema Changes are broadcast messages

The Art of Indexing®

Fractal Tree Indexes: Schema Changes

B

B B BB

Schema Changes are broadcast messages

Insertion into root is fast, changes are immediate

The Art of Indexing®

Fractal Tree Indexes: Schema Changes

B

B B BB

Schema Changes are broadcast messages

Leaves get rewritten when broadcast message

makes it that far

Insertion into root is fast, changes are immediate

Schema change message still on root-

leaf path

The Art of Indexing®

Analysis

Delivery system gives goodies:• Messages get moved, but each I/O pays for a lot of

movement

• You get very fast inserts

• You get Hot Schema Changes

Each flush carries lots of useful information• So it’s worth it to make nodes big

• No fragmentation, Better Compression

• Much better wear on SSDs