MongoDB and Fractal Tree Indexes



Tim Callaghan*, VP/Engineering, Tokutek

[email protected]

MongoDB Boston 2012

* not [yet] a MongoDB expert

Page 2: MongoDB and Fractal Tree Indexes

B-trees

Page 3: MongoDB and Fractal Tree Indexes

B-tree Definition

In computer science, a B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time.

http://en.wikipedia.org/wiki/B-tree

Page 4: MongoDB and Fractal Tree Indexes

B-tree Overview

I will use a simple single-pivot example throughout this presentation

Page 5: MongoDB and Fractal Tree Indexes

Basic B-tree

Internal Nodes - Path to data

Leaf Nodes - Actual Data

Pointers

Pivots

Page 6: MongoDB and Fractal Tree Indexes

B-tree example

Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]

* Pivot Rule is >=

Page 7: MongoDB and Fractal Tree Indexes

B-tree - insert

Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10, 15, 20] [22, 25] [99]

“Insert 15”

Value stored in leaf node

Page 8: MongoDB and Fractal Tree Indexes

B-tree - search

Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]

“Find 25”
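
A lookup such as "Find 25" can be sketched as a walk from the root that goes right whenever the key is greater than or equal to the pivot. This is only a minimal sketch of the single-pivot example tree, not MongoDB's B-tree code; the node layout and the names `leaf`, `node`, and `find` are assumptions made for illustration.

```javascript
// Single-pivot example tree from the slides. Pivot rule is >=:
// keys greater than or equal to the pivot live in the right subtree.
const leaf = (keys) => ({ keys });
const node = (pivot, left, right) => ({ pivot, left, right });

const tree = node(22,
  node(10, leaf([2, 3, 4]), leaf([10, 20])),
  node(99, leaf([22, 25]), leaf([99])));

// Walk from the root down to a leaf, recording the pivots visited.
function find(root, key) {
  const path = [];
  let current = root;
  while (current.keys === undefined) {       // still at an internal node
    path.push(current.pivot);
    current = key >= current.pivot ? current.right : current.left;
  }
  return { found: current.keys.includes(key), path };
}

console.log(find(tree, 25)); // found is true; pivots visited are 22, then 99
```

Only the final step touches a leaf, which is why keeping the internal nodes in memory matters so much for lookup cost.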

Page 9: MongoDB and Fractal Tree Indexes

B-tree - storage

Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]

[Figure labels: RAM, RAM, DISK]

Performance is IO-limited when the tree is bigger than RAM: try to fit all internal nodes and some leaf nodes in RAM

Page 10: MongoDB and Fractal Tree Indexes

B-tree – serial insertions

Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]

[Figure labels: RAM, RAM, DISK]

Serial insertion workloads are in-memory, think MongoDB’s “_id” index
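
One rough way to see why serial insertions stay in memory is to count how many distinct leaves a batch of inserts touches. The model below is a deliberate simplification and an assumption, not how MongoDB lays out its B-tree: every leaf is taken to hold a fixed span of `LEAF_SPAN` consecutive keys, and `leafOf` and `leavesTouched` are hypothetical helpers.

```javascript
// Hypothetical model: each leaf covers LEAF_SPAN consecutive key values.
const LEAF_SPAN = 100;
const leafOf = (key) => Math.floor(key / LEAF_SPAN);

// Count how many distinct leaves a batch of insertions touches.
function leavesTouched(keys) {
  return new Set(keys.map(leafOf)).size;
}

// Serial keys, like an ever-increasing _id: 1000 inserts starting at 50000.
const serialKeys = Array.from({ length: 1000 }, (_, i) => 50000 + i);

// Scattered keys: a fixed stride spreads 1000 inserts across the key space.
const scatteredKeys = Array.from({ length: 1000 }, (_, i) => (i * 7919) % 1000000);

console.log("serial:", leavesTouched(serialKeys));
console.log("scattered:", leavesTouched(scatteredKeys));
```

In this model the serial batch fills a handful of adjacent leaves one after another, while the scattered batch lands on many different leaves, each of which would need to be in RAM or read from disk.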

Page 11: MongoDB and Fractal Tree Indexes

Fractal Tree Indexes

Page 12: MongoDB and Fractal Tree Indexes

Fractal Tree Indexes

Similar to B-trees
- store data in leaf nodes
- use PK for ordering

Different than B-trees
- message buffer in all internal nodes
- doesn’t need to update leaf node immediately
- much larger nodes (4MB vs. 8KB*)

All internal nodes have message buffers

Page 13: MongoDB and Fractal Tree Indexes

Fractal Tree Indexes – “insert 15”

Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]
Buffered message: insert(15)

No IO is required, all internal nodes usually fit in RAM

Page 14: MongoDB and Fractal Tree Indexes

Fractal Tree Indexes – “find 25”

Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10] [22, 25] [99]
Buffered messages: insert(15), insert(20), insert(25), delete(3)

Page 15: MongoDB and Fractal Tree Indexes

Fractal Tree Indexes – “insert 8”

Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10] [22, 25] [99]
Buffered messages: insert(15), insert(20), insert(25), delete(3)

Buffer is full, push messages down to next level.

Page 16: MongoDB and Fractal Tree Indexes

Fractal Tree Indexes – “insert 8”

Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 4, 8] [10, 20, 25] [22, 25] [99]
Buffered message: insert(15)

Inserted 8, 20, 25. Deleted 3.
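
The buffering idea in the last few slides can be sketched with a toy two-level tree: writes are appended to a buffer in the internal node, and leaves are only modified when a full buffer is flushed. This is a simplified illustration, not Tokutek's implementation. The capacity of four messages, the `BufferedNode` class, and the flush-everything policy are all assumptions, and the key values are chosen for the sketch rather than copied from the slides.

```javascript
// Toy two-level tree: one internal node with a message buffer and two leaves.
// BUFFER_CAPACITY and the class shape are hypothetical, for illustration only.
const BUFFER_CAPACITY = 4;

class BufferedNode {
  constructor(pivot, leftKeys, rightKeys) {
    this.pivot = pivot;
    this.buffer = [];                  // pending messages, oldest first
    this.leaves = [new Set(leftKeys), new Set(rightKeys)];
    this.leafWrites = 0;               // number of flushes that touched leaves
  }

  // A write only appends a message; a full buffer is flushed first.
  send(op, key) {
    if (this.buffer.length === BUFFER_CAPACITY) this.flush();
    this.buffer.push({ op, key });
  }

  // Apply every buffered message to the leaf chosen by the >= pivot rule.
  flush() {
    for (const { op, key } of this.buffer) {
      const target = this.leaves[key >= this.pivot ? 1 : 0];
      if (op === "insert") target.add(key); else target.delete(key);
    }
    this.buffer = [];
    this.leafWrites += 1;
  }

  // Reads check the buffer first: the newest message for a key wins.
  has(key) {
    for (let i = this.buffer.length - 1; i >= 0; i--) {
      if (this.buffer[i].key === key) return this.buffer[i].op === "insert";
    }
    return this.leaves[key >= this.pivot ? 1 : 0].has(key);
  }
}

const t = new BufferedNode(10, [2, 3, 4], [10]);
t.send("insert", 15);
t.send("insert", 20);
t.send("insert", 25);
t.send("delete", 3);
console.log(t.leafWrites, t.has(20), t.has(3)); // 0 true false (leaves untouched so far)

t.send("insert", 8);                             // buffer was full, so it is flushed first
console.log(t.leafWrites, [...t.leaves[0]], [...t.leaves[1]]);
```

Four logical writes cost one batch of leaf updates here, which is the effect the slides describe; a real tree would also push messages through several levels rather than straight into the leaves.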

Page 17: MongoDB and Fractal Tree Indexes

Fractal Tree Indexes – compression

•  Large node size (4MB) leads to high compression ratios.

•  Supports zlib, quicklz, and lzma compression algorithms.

•  Compression is generally 5x to 25x, similar to what gzip and 7z can do to your data.

•  Significantly less disk space needed

•  Fewer writes, bigger writes

•  Both of which are great for SSDs

•  Reads are highly compressed, more data per IO

Page 18: MongoDB and Fractal Tree Indexes

So what does this have to do with MongoDB?

Page 19: MongoDB and Fractal Tree Indexes

So what does this have to do with MongoDB?

* Watch Tyler Brock’s presentation “Indexing and Query Optimization”

Page 20: MongoDB and Fractal Tree Indexes

MongoDB Storage

db.test.insert({foo:55})
db.test.ensureIndex({foo:1})

PK index (_id + pointer)
Root: 25
Internal nodes: 10 | 99
Leaf nodes: [(2,ptr2), (4,ptr4)] [(10,ptr10)] [(25,ptr25), (98,ptr98)] [(101,ptr101)]

Secondary Index (foo + pointer)
Root: 85
Internal nodes: 40 | 120
Leaf nodes: [(2,ptr10), (35,ptr101)] [(55,ptr4)] [(90,ptr2)] [(2599,ptr98)]

The “pointer” tells MongoDB where to look in the data files for the actual document data.
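
The two-step lookup implied by the pointer can be sketched as follows. This is a minimal in-memory stand-in, not MongoDB's storage code: `dataFile`, `fooIndex`, `findByFoo`, and the read counter are hypothetical names, a linear scan stands in for the B-tree descent, and the (foo, pointer) pairs are the ones shown in the secondary index on the slide.

```javascript
// Stand-in for the data files: pointer -> document.
const dataFile = new Map([
  ["ptr2",   { _id: 2,   foo: 90 }],
  ["ptr4",   { _id: 4,   foo: 55 }],
  ["ptr10",  { _id: 10,  foo: 2 }],
  ["ptr98",  { _id: 98,  foo: 2599 }],
  ["ptr101", { _id: 101, foo: 35 }],
]);

// Secondary index on foo: sorted (key, pointer) pairs, no document data.
const fooIndex = [...dataFile]
  .map(([ptr, doc]) => ({ key: doc.foo, ptr }))
  .sort((a, b) => a.key - b.key);

let dataFileReads = 0;

// Step 1: the index yields a pointer. Step 2: the pointer is followed
// into the data file to fetch the document itself.
function findByFoo(value) {
  const entry = fooIndex.find((e) => e.key === value); // stands in for a tree search
  if (!entry) return null;
  dataFileReads += 1;                // on disk this is typically a random read
  return dataFile.get(entry.ptr);
}

console.log(findByFoo(55), dataFileReads); // { _id: 4, foo: 55 } 1
```

Every matching index entry costs one extra fetch from the data files, which becomes expensive once the documents no longer fit in RAM.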

Page 21: MongoDB and Fractal Tree Indexes

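MongoDB Storage

PK index (B-tree)
Root: 25
Internal nodes: 10 | 99
Leaf nodes: [(2,ptr2), (4,ptr4)] [(10,ptr10)] [(25,ptr25), (98,ptr98)] [(101,ptr101)]

Secondary index (B-tree)
Root: 85
Internal nodes: 40 | 120
Leaf nodes: [(2,ptr10), (35,ptr101)] [(55,ptr4)] [(90,ptr2)] [(2599,ptr98)]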

Page 22: MongoDB and Fractal Tree Indexes

Who is Tokutek and what have we done?

•  Tokutek’s Fractal Tree Index Implementations
  •  MySQL Storage Engine (TokuDB)
  •  BerkeleyDB API
  •  File System (TokuFS)

•  Recently added Fractal Tree Indexes to MongoDB 2.2
  •  Existing indexes are still supported
  •  Source changes are available via our blog at www.tokutek.com/tokuview
  •  This is a work in progress (see roadmap slides)

Page 23: MongoDB and Fractal Tree Indexes

MongoDB and Fractal Tree Indexes

as simple as

db.test.ensureIndex({foo:1}, {v:2})

Page 24: MongoDB and Fractal Tree Indexes

Indexing Options #1

db.test.ensureIndex({foo:1},{v:2, blocksize:4194304, basementsize:131072, compression:"quicklz", clustering:false})

•  Node size, defaults to 4MB.

Page 25: MongoDB and Fractal Tree Indexes

Indexing Options #2

db.test.ensureIndex({foo:1},{v:2, blocksize:4194304, basementsize:131072, compression:"quicklz", clustering:false})

•  Basement node size, defaults to 128K.

•  Smallest retrievable unit of a leaf node, efficient point queries

Page 26: MongoDB and Fractal Tree Indexes

Indexing Options #3

db.test.ensureIndex({foo:1},{v:2, blocksize:4194304, basementsize:131072, compression:"quicklz", clustering:false})

•  Compression algorithm, defaults to quicklz.

•  Supports quicklz, lzma, zlib, and none.

•  LZMA provides 40% additional compression beyond quicklz, needs more CPU.

•  Decompression performance of quicklz and lzma is similar.

Page 27: MongoDB and Fractal Tree Indexes

Indexing Options #4

db.test.ensureIndex({foo:1},{v:2, blocksize:4194304, basementsize:131072, compression:"quicklz", clustering:false})

•  Clustering indexes store data by key and include the entire document as the payload (rather than a pointer to the document)

•  Always “cover” a query, no need to retrieve the document data
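
The difference between a pointer-based index and a clustering index can be sketched by counting document fetches for a range query. This is an illustrative in-memory model under stated assumptions, not the Fractal Tree implementation: the sample documents, `pointerIndex`, `clusteringIndex`, and both range functions are hypothetical.

```javascript
// Sample collection; array position plays the role of the pointer.
const docs = [
  { _id: 1, foo: 55, name: "a" },
  { _id: 2, foo: 90, name: "b" },
  { _id: 3, foo: 35, name: "c" },
];

// Non-clustering entry: key plus a pointer to the document.
const pointerIndex = docs
  .map((doc, ptr) => ({ key: doc.foo, ptr }))
  .sort((a, b) => a.key - b.key);

// Clustering entry: key plus a copy of the entire document as the payload.
const clusteringIndex = docs
  .map((doc) => ({ key: doc.foo, doc: { ...doc } }))
  .sort((a, b) => a.key - b.key);

// Range query foo >= min: every hit needs a separate document fetch.
function rangeViaPointers(min) {
  const hits = pointerIndex.filter((e) => e.key >= min);
  return { results: hits.map((e) => docs[e.ptr]), fetches: hits.length };
}

// Same query on the clustering index: the payload is already in the entry.
function rangeViaClustering(min) {
  const hits = clusteringIndex.filter((e) => e.key >= min);
  return { results: hits.map((e) => e.doc), fetches: 0 };
}

console.log(rangeViaPointers(50).fetches, rangeViaClustering(50).fetches); // 2 0
```

Because the clustering entry carries the whole document, any field is available without a second lookup, at the cost of storing the document again in the index.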

Page 28: MongoDB and Fractal Tree Indexes

How well does it perform?

Three Benchmarks

•  Benchmark 1 : Raw insertion performance

•  Benchmark 2 : Insertion plus queries

•  Benchmark 3 : Covered indexes vs. clustering indexes

Page 29: MongoDB and Fractal Tree Indexes

Benchmarks…

Race Results

•  First Place = John

•  Second Place = Tim

•  Third Place = Frank

Page 30: MongoDB and Fractal Tree Indexes

Benchmarks…

Race Results

•  First Place = John

•  Second Place = Tim

•  Third Place = Frank

Frank can say the following: “I finished third, but Tim was second to last.”

Page 31: MongoDB and Fractal Tree Indexes

Benchmarks…

Race Results

•  First Place = John

•  Second Place = Tim

•  Third Place = Frank

Frank can say the following: “I finished third, but Tim was second to last.”

Understand benchmark specifics and review all results.

Page 32: MongoDB and Fractal Tree Indexes

Benchmark 1 : Overview

•  Measure single threaded insertion performance

•  Document is URI (character), name (character), origin (character), creation date (timestamp), and expiration date (timestamp)

•  Secondary indexes on URI, name, origin, expiration

•  Machine specifics:
  – Sun x4150, (2) Xeon 5460, 8GB RAM, StorageTek Controller (256MB, write-back), 4x10K SAS/RAID 0
  – Ubuntu 10.04 Server (64-bit), ext4 filesystem
  – MongoDB v2.2.RC0

Page 33: MongoDB and Fractal Tree Indexes

Benchmark 1 : Without Journaling

Page 34: MongoDB and Fractal Tree Indexes

Benchmark 1 : With Journaling

Page 35: MongoDB and Fractal Tree Indexes

Benchmark 1 : Observations

•  Fractal Tree Indexing insertion performance is 8x better than standard MongoDB indexing with journaling, and 11x without journaling

•  Fractal Tree Indexing insertion performance reaches steady state, even at 200 million insertions. MongoDB insertion performance seems to be in continual decline at only 50 million insertions

•  B-tree performance is great until the working data set > RAM

Page 36: MongoDB and Fractal Tree Indexes

Benchmark 2 : Overview

•  Measure single threaded insertion performance while querying for 1000 documents with a URI greater than or equal to a randomly selected value once every 60 seconds

•  Document is same as benchmark 1

•  Secondary indexes on URI, name, origin, expiration

•  Fractal Tree Index on URI is clustering
  – clustering indexes store entire document inline
  – Compression controls disk usage
  – no need to get document data from elsewhere
  – db.tokubench.ensureIndex({URI:1}, {v:2, clustering:true})

•  Same hardware as benchmark 1

Page 37: MongoDB and Fractal Tree Indexes

Benchmark 2 : Insertion Performance

Page 38: MongoDB and Fractal Tree Indexes

Benchmark 2 : Query Latency

Page 39: MongoDB and Fractal Tree Indexes

Benchmark 2 : Observations

•  Fractal Tree Indexing insertion performance is 10x better than standard MongoDB indexing

•  Fractal Tree Indexing query latency is 268x better than standard MongoDB indexing

•  B-tree performance is great until the working data set > RAM

•  Random lookups are bad

...but what about MongoDB’s covered indexes?

Page 40: MongoDB and Fractal Tree Indexes

Benchmark 3 : Overview

•  Same workload and hardware as benchmark 2

•  Create a MongoDB covered index on URI to eliminate lookups in the data files.
  – db.tokubench.ensureIndex({URI:1,creation:1,name:1,origin:1})

Page 41: MongoDB and Fractal Tree Indexes

Benchmark 3 : Insertion Performance

Page 42: MongoDB and Fractal Tree Indexes

Benchmark 3 : Query Latency

Page 43: MongoDB and Fractal Tree Indexes

Benchmark 3 : Observations

•  Fractal Tree Indexing insertion performance is still 3.7x better than standard MongoDB indexing

•  Fractal Tree Indexing query latency is 3.2x better than standard MongoDB indexing (although the MongoDB performance is highly variable)

•  B-tree performance is great until the working data set > RAM

•  MongoDB’s covered indexes can help a lot
  – But what happens when I add new fields to my document?
    o Do I drop and re-create by including my new field?
    o Do I live without it?
  – Clustered Fractal Tree Indexes keep on covering your queries!

Page 44: MongoDB and Fractal Tree Indexes

Roadmap : Continuing the Implementation

•  Optimize Indexing Insert/Update/Delete Operations
  – Each of our secondary indexes is currently creating and committing a transaction for each operation
  – A single transaction envelope will improve performance

Page 45: MongoDB and Fractal Tree Indexes

Roadmap : Continuing the Implementation

•  Add Support for Parallel Array Indexes
  – MongoDB does not support indexing the following two fields:
    o {a: [1, 2], b: [1, 2]}
  – “it could get out of hand”
  – Ticketed on 3/24/2010, jira.mongodb.org/browse/SERVER-826
  – Benchmark coming soon…

Page 46: MongoDB and Fractal Tree Indexes

Roadmap : Continuing the Implementation

•  Add Crash Safety
  – Our implementation is not [yet] crash safe with the MongoDB PK/heap storage mechanism.
  – MongoDB journal is separate from Fractal Tree Index logs.
  – Need to create a transactional envelope around both of them

Page 47: MongoDB and Fractal Tree Indexes

Roadmap : Continuing the Implementation

•  Replace MongoDB data store and PK index
  – A clustering index on _id eliminates the need for two storage systems
  – Compression greatly reduces disk footprint
  – This is a large task

Page 48: MongoDB and Fractal Tree Indexes

We are looking for evaluators!

Email me at [email protected]

See me after the presentation

Page 49: MongoDB and Fractal Tree Indexes

Questions?

Tim Callaghan [email protected]

@tmcallaghan

More detailed benchmark information in my blogs at www.tokutek.com/tokuview