20121024 mongodb-boston (1)

MongoDB and Fractal Tree® Indexes

Tim Callaghan*, VP/Engineering, Tokutek

tim@tokutek.com

MongoDB Boston 2012

* not [yet] a MongoDB expert

B-trees

B-tree Definition

In computer science, a B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time.

http://en.wikipedia.org/wiki/B-tree

B-tree Overview

I will use a simple single-pivot example throughout this presentation

Basic B-tree

Internal Nodes - Path to data

Leaf Nodes - Actual Data

Pointers

Pivots

B-tree example

Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]

* Pivot Rule is >=

B-tree - insert

Leaf nodes: [2, 3, 4] [10, 15, 20] [22, 25] [99]

“Insert 15”

Value stored in leaf node
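The insert on this slide can be sketched in a few lines of JavaScript. This is only an illustration: the leaf contents come from the slide, but the pivot values (22 at the root, 10 and 99 below it) and the helper names are assumptions, and node splitting is left out.

```javascript
// Hypothetical two-level tree for the slide's example. Leaf contents are from
// the slide; the pivot values 22, 10 and 99 are assumed for illustration.
function makeTree() {
  return {
    pivot: 22,
    left: { pivot: 10, left: { keys: [2, 3, 4] }, right: { keys: [10, 20] } },
    right: { pivot: 99, left: { keys: [22, 25] }, right: { keys: [99] } },
  };
}

// Walk down to the leaf responsible for `key`.
// Pivot rule is >=: keys greater than or equal to the pivot go right.
function findLeaf(node, key) {
  while (node.keys === undefined) {
    node = key >= node.pivot ? node.right : node.left;
  }
  return node;
}

// Insert `key` into its leaf, keeping the leaf sorted. Splits are omitted.
function insert(tree, key) {
  const leaf = findLeaf(tree, key);
  let i = 0;
  while (i < leaf.keys.length && leaf.keys[i] < key) i++;
  leaf.keys.splice(i, 0, key);
  return leaf;
}

const tree = makeTree();
const leaf = insert(tree, 15);
console.log(JSON.stringify(leaf.keys)); // [10,15,20]
```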

B-tree - search

Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]

“Find 25”
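A minimal sketch of the search, assuming the same single-pivot layout. The leaf contents are from the slide; the pivot values (22, 10, 99) and the function name are assumptions made for the example.

```javascript
// Hypothetical tree: leaf contents from the slide, pivot values assumed.
const root = {
  pivot: 22,
  left: { pivot: 10, left: { keys: [2, 3, 4] }, right: { keys: [10, 20] } },
  right: { pivot: 99, left: { keys: [22, 25] }, right: { keys: [99] } },
};

// Follow pivots from the root to a leaf (pivot rule is >=), then look in the leaf.
// nodesVisited counts every node on the path, including the leaf.
function find(node, key) {
  let nodesVisited = 1;
  while (node.keys === undefined) {
    node = key >= node.pivot ? node.right : node.left;
    nodesVisited++;
  }
  return { found: node.keys.includes(key), nodesVisited };
}

console.log(find(root, 25)); // { found: true, nodesVisited: 3 }
```

Each visited node is a potential disk read when it is not cached, which is why the number of levels matters.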

B-tree - storage

Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]

Performance is IO-limited when the tree is bigger than RAM: try to fit all internal nodes and some leaf nodes in memory

B-tree – serial insertions

Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]

Serial insertion workloads stay in memory because only the right-most path of the tree is touched; think of MongoDB’s “_id” index

Fractal Tree Indexes

Similar to B-trees:
•  Store data in leaf nodes
•  Use PK for ordering

message buffer

All internal nodes have message buffers

Different from B-trees:
•  Message buffer in all internal nodes
•  Doesn’t need to update leaf node immediately
•  Much larger nodes (4MB vs. 8KB*)

Fractal Tree Indexes – “insert 15”

Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]

insert(15)

No IO is required: the message is placed in an internal node’s buffer, and all internal nodes usually fit in RAM

Fractal Tree Indexes – “find 25”

Leaf nodes: [2, 3, 4] [10] [22, 25] [99]

insert(15)

insert(20) insert(25)

delete(3)

Leaf nodes: [2, 3, 4] [10] [22, 25] [99]

insert(15)

Buffer is full, push messages down to next level.

insert(20) insert(25)

delete(3)

Leaf nodes: [2, 4, 8] [10, 20, 25] [22, 25] [99]

insert(15)

Inserted 8, 20, 25. Deleted 3.
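The buffering idea in these slides can be sketched as follows. This is a deliberately simplified model, not Tokutek's implementation: it uses a single internal node with one buffer, the pivot values and the buffer capacity of 4 are assumptions, and a flush pushes every pending message straight to the leaves. Because of that simplification the final leaf contents differ from the slide.

```javascript
// Simplified model: one internal node with a message buffer above four leaves.
// Class name, pivots and capacity are hypothetical.
class BufferedTree {
  constructor(pivots, leaves, bufferCapacity) {
    this.pivots = pivots;            // sorted; keys >= pivots[i] belong right of it
    this.leaves = leaves;            // arrays of sorted keys
    this.bufferCapacity = bufferCapacity;
    this.buffer = [];                // pending messages, oldest first
  }

  leafIndexFor(key) {
    let i = 0;
    while (i < this.pivots.length && key >= this.pivots[i]) i++;
    return i;
  }

  // Writes only append a message; leaves are not touched until a flush.
  enqueue(op, key) {
    this.buffer.push({ op, key });
    if (this.buffer.length >= this.bufferCapacity) this.flush();
  }
  insert(key) { this.enqueue("insert", key); }
  delete(key) { this.enqueue("delete", key); }

  // Buffer is full: apply all messages to their leaves in arrival order.
  flush() {
    for (const { op, key } of this.buffer) {
      const leaf = this.leaves[this.leafIndexFor(key)];
      const pos = leaf.indexOf(key);
      if (op === "insert" && pos === -1) {
        leaf.push(key);
        leaf.sort((a, b) => a - b);
      } else if (op === "delete" && pos !== -1) {
        leaf.splice(pos, 1);
      }
    }
    this.buffer = [];
  }

  // Reads must consult pending messages (newest first) before the leaf.
  has(key) {
    for (let i = this.buffer.length - 1; i >= 0; i--) {
      if (this.buffer[i].key === key) return this.buffer[i].op === "insert";
    }
    return this.leaves[this.leafIndexFor(key)].includes(key);
  }
}

const ft = new BufferedTree([10, 22, 99], [[2, 3, 4], [10], [22, 25], [99]], 4);
ft.insert(15);
ft.insert(20);
ft.insert(25);
console.log(JSON.stringify(ft.leaves[1]), ft.has(15)); // [10] true
ft.delete(3); // fourth message fills the buffer and triggers the flush
console.log(JSON.stringify(ft.leaves)); // [[2,4],[10,15,20],[22,25],[99]]
```

The point of the design is that many small writes are applied to a leaf in one batch instead of one leaf write per operation.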

Fractal Tree Indexes – compression

•  Large node size (4MB) leads to high compression ratios.

•  Supports zlib, quicklz, and lzma compression algorithms.

•  Compression is generally 5x to 25x, similar to what gzip and 7z can do to your data.

•  Significantly less disk space needed

•  Fewer writes, bigger writes

•  Both of which are great for SSDs

•  Reads fetch compressed data, so each IO returns more data
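One rough way to see the effect of node size on compression is to compress the same data in 8KB blocks and as a single 4MB block. The sketch below uses Node's built-in zlib module; the document shape and helper names are made up for the example, and the ratios it prints depend entirely on the synthetic data, so treat it as an illustration rather than a benchmark.

```javascript
const zlib = require("zlib");

// Build `totalBytes` of synthetic, made-up documents (one JSON object per line).
function makeData(totalBytes) {
  const parts = [];
  let size = 0;
  for (let i = 0; size < totalBytes; i++) {
    const doc = JSON.stringify({
      uri: "http://example.com/page/" + i,
      name: "user" + (i % 1000),
      origin: "boston",
    }) + "\n";
    parts.push(doc);
    size += doc.length;
  }
  return Buffer.from(parts.join("")).subarray(0, totalBytes);
}

// Compress the data block by block and return the total compressed size.
function compressedSize(data, blockSize) {
  let total = 0;
  for (let off = 0; off < data.length; off += blockSize) {
    total += zlib.deflateSync(data.subarray(off, off + blockSize)).length;
  }
  return total;
}

const data = makeData(4 * 1024 * 1024);
const small = compressedSize(data, 8 * 1024);        // 512 blocks of 8KB
const large = compressedSize(data, 4 * 1024 * 1024); // one 4MB block
console.log("8KB blocks:", (data.length / small).toFixed(1) + "x");
console.log("4MB block: ", (data.length / large).toFixed(1) + "x");
```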

So what does this have to do with MongoDB?

* Watch Tyler Brock’s presentation “Indexing and Query Optimization”

MongoDB Storage

(2,ptr2), (4,ptr4)

(10,ptr10) (25,ptr25), (98,ptr98)

(101,ptr101)

40 120

(2,ptr10), (35,ptr101)

(55,ptr4) (90,ptr2) (2599,ptr98)

db.test.insert({foo:55})
db.test.ensureIndex({foo:1})

PK index (_id + pointer)

Secondary Index (foo + pointer)

The “pointer” tells MongoDB where to look in the data files for the actual document data.
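The two-step lookup can be modeled with plain maps. The (key, pointer) pairs below are the ones shown on the slide; the Map-based "indexes", the in-memory "data file" and the function name are stand-ins for illustration only (the real indexes are B-trees on disk).

```javascript
// Simulated data file: pointer -> document. Field values follow the slide's
// (key, pointer) pairs; the document with _id 25 has no foo entry on the slide.
const dataFile = new Map([
  ["ptr2", { _id: 2, foo: 90 }],
  ["ptr4", { _id: 4, foo: 55 }],
  ["ptr10", { _id: 10, foo: 2 }],
  ["ptr25", { _id: 25 }],
  ["ptr98", { _id: 98, foo: 2599 }],
  ["ptr101", { _id: 101, foo: 35 }],
]);

// PK index: _id -> pointer. Secondary index: foo -> pointer.
const pkIndex = new Map([[2, "ptr2"], [4, "ptr4"], [10, "ptr10"], [25, "ptr25"], [98, "ptr98"], [101, "ptr101"]]);
const fooIndex = new Map([[2, "ptr10"], [35, "ptr101"], [55, "ptr4"], [90, "ptr2"], [2599, "ptr98"]]);

// Step 1: look up the key in the index to get a pointer.
// Step 2: follow the pointer into the data file to fetch the document.
function findBy(index, key) {
  const ptr = index.get(key);
  if (ptr === undefined) return null;
  return dataFile.get(ptr);
}

console.log(findBy(fooIndex, 55)); // { _id: 4, foo: 55 }
console.log(findBy(pkIndex, 10));  // { _id: 10, foo: 2 }
```

Step 2 is the part that turns into a random read when the data files do not fit in RAM.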

MongoDB Storage

(2,ptr2), (4,ptr4)

(10,ptr10) (25,ptr25), (98,ptr98)

(101,ptr101)

40 120

(2,ptr10), (35,ptr101)

(55,ptr4) (90,ptr2) (2599,ptr98)

B-trees

Who is Tokutek and what have we done?

•  Tokutek’s Fractal Tree Index Implementations
   •  MySQL Storage Engine (TokuDB)
   •  BerkeleyDB API
   •  File System (TokuFS)

•  Recently added Fractal Tree Indexes to MongoDB 2.2
   •  Existing indexes are still supported
   •  Source changes are available via our blog at www.tokutek.com/tokuview
   •  This is a work in progress (see roadmap slides)

MongoDB and Fractal Tree Indexes

as simple as

db.test.ensureIndex({foo:1}, {v:2})

Indexing Options #1

db.test.ensureIndex({foo:1}, {v:2, blocksize:4194304, basementsize:131072, compression:"quicklz", clustering:false})

•  Node size, defaults to 4MB.

Indexing Options #2

•  Basement node size, defaults to 128K.

•  Smallest retrievable unit of a leaf node, efficient point queries
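Using the default sizes from the ensureIndex example in these slides, the relationship between a leaf node and its basement nodes works out as below. The variable names are only for illustration, and the sizes are taken at face value (the slides do not say whether they are measured before or after compression).

```javascript
const blocksize = 4194304;   // node size in bytes (4MB default)
const basementsize = 131072; // basement node size in bytes (128K default)

// Number of basement nodes that make up one full leaf node.
const basementsPerLeaf = blocksize / basementsize;
console.log(basementsPerLeaf); // 32

// A point query only needs one basement node, not the whole leaf.
console.log((100 / basementsPerLeaf).toFixed(3) + "% of the leaf"); // 3.125% of the leaf
```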

Indexing Options #3

•  Compression algorithm, defaults to quicklz.

•  Supports quicklz, lzma, zlib, and none.

•  LZMA provides 40% additional compression beyond quicklz, needs more CPU.

•  Decompression performance of quicklz and lzma is similar.

Indexing Options #4

•  Clustering indexes store data by key and include the entire document as the payload (rather than a pointer to the document)

•  Always “cover” a query, no need to retrieve the document data

How well does it perform?

Three Benchmarks

•  Benchmark 1 : Raw insertion performance

•  Benchmark 2 : Insertion plus queries

•  Benchmark 3 : Covered indexes vs. clustering indexes

Benchmarks…

Race Results

•  First Place = John

•  Second Place = Tim

•  Third Place = Frank

Frank can say the following: “I finished third, but Tim was second to last.”

Understand benchmark specifics and review all results.

Benchmark 1 : Overview

•  Measure single threaded insertion performance

•  Document is URI (character), name (character), origin (character), creation date (timestamp), and expiration date (timestamp)

•  Secondary indexes on URI, name, origin, expiration

•  Machine specifics:
   – Sun x4150, (2) Xeon 5460, 8GB RAM, StorageTek Controller (256MB, write-back), 4x10K SAS/RAID 0
   – Ubuntu 10.04 Server (64-bit), ext4 filesystem
   – MongoDB v2.2.RC0

Benchmark 1 : Without Journaling

Benchmark 1 : With Journaling

Benchmark 1 : Observations

•  Fractal Tree Indexing insertion performance is 8x better than standard MongoDB indexing with journaling, and 11x without journaling

•  Fractal Tree Indexing insertion performance reaches steady state, even at 200 million insertions. MongoDB insertion performance seems to be in continual decline at only 50 million insertions

•  B-tree performance is great until the working data set > RAM

Benchmark 2 : Overview

•  Measure single threaded insertion performance while querying for 1000 documents with a URI greater than or equal to a randomly selected value once every 60 seconds

•  Document is same as benchmark 1

•  Secondary indexes on URI, name, origin, expiration

•  Fractal Tree Index on URI is clustering
   – clustering indexes store entire document inline
   – Compression controls disk usage
   – no need to get document data from elsewhere
   – db.tokubench.ensureIndex({URI:1}, {v:2, clustering:true})

•  Same hardware as benchmark 1

Benchmark 2 : Insertion Performance

Benchmark 2 : Query Latency

Benchmark 2 : Observations

•  Fractal Tree Indexing insertion performance is 10x better than standard MongoDB indexing

•  Fractal Tree Indexing query latency is 268x better than standard MongoDB indexing

•  Random lookups are bad

...but what about MongoDB’s covered indexes?

Benchmark 3 : Overview

•  Same workload and hardware as benchmark 2

•  Create a MongoDB covered index on URI to eliminate lookups in the data files.
   – db.tokubench.ensureIndex({URI:1,creation:1,name:1,origin:1})

Benchmark 3 : Insertion Performance

Benchmark 3 : Query Latency

Benchmark 3 : Observations

•  Fractal Tree Indexing insertion performance is still 3.7x better than standard MongoDB indexing

•  Fractal Tree Indexing query latency is 3.2x better than standard MongoDB indexing (although the MongoDB performance is highly variable)

•  MongoDB’s covered indexes can help a lot
   – But what happens when I add new fields to my document?
      o Do I drop and re-create by including my new field?
      o Do I live without it?
   – Clustered Fractal Tree Indexes keep on covering your queries!
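The difference can be stated as a small rule: a regular index covers a query only if every field the query needs is part of the index, while a clustering index carries the whole document and so covers any field. The sketch below models that rule; the object shapes, the function name and the new field "tags" are hypothetical.

```javascript
// A query is covered when the index alone can supply every field it needs.
function isCovered(index, neededFields) {
  if (index.clustering) return true; // the entire document is the index payload
  return neededFields.every((field) => index.fields.includes(field));
}

// The covered index from benchmark 3 and a clustering index on URI.
const coveredIndex = { fields: ["URI", "creation", "name", "origin"], clustering: false };
const clusteringIndex = { fields: ["URI"], clustering: true };

const before = ["URI", "name", "origin"];
const after = [...before, "tags"]; // "tags" is a made-up field added later

console.log(isCovered(coveredIndex, before), isCovered(coveredIndex, after));       // true false
console.log(isCovered(clusteringIndex, before), isCovered(clusteringIndex, after)); // true true
```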

Roadmap : Continuing the Implementation

•  Optimize Indexing Insert/Update/Delete Operations
   – Each of our secondary indexes is currently creating and committing a transaction for each operation
   – A single transaction envelope will improve performance

•  Add Support for Parallel Array Indexes
   – MongoDB does not support indexing the following two fields:
      o {a: [1, 2], b: [1, 2]}
   – “it could get out of hand”
   – Ticketed on 3/24/2010, jira.mongodb.org/browse/SERVER-826
   – Benchmark coming soon…
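One plausible reading of “it could get out of hand” is that a compound index over two array fields needs one key per combination of elements, so the key count is the product of the array lengths. The sketch below only illustrates that arithmetic; the function is hypothetical and is not how MongoDB or the Fractal Tree implementation generates keys.

```javascript
// Keys a compound index on {a: 1, b: 1} would need for one document if both
// fields were arrays: one key per (a, b) combination.
function compoundKeys(doc) {
  const as = Array.isArray(doc.a) ? doc.a : [doc.a];
  const bs = Array.isArray(doc.b) ? doc.b : [doc.b];
  const keys = [];
  for (const a of as) {
    for (const b of bs) keys.push([a, b]);
  }
  return keys;
}

console.log(JSON.stringify(compoundKeys({ a: [1, 2], b: [1, 2] })));
// [[1,1],[1,2],[2,1],[2,2]]

// Two 100-element arrays in a single document already need 10000 keys.
const range = Array.from({ length: 100 }, (_, i) => i);
console.log(compoundKeys({ a: range, b: range }).length); // 10000
```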

•  Add Crash Safety
   – Our implementation is not [yet] crash safe with the MongoDB PK/heap storage mechanism.
   – MongoDB journal is separate from Fractal Tree Index logs.
   – Need to create a transactional envelope around both of

•  Replace MongoDB data store and PK index
   – A clustering index on _id eliminates the need for two storage systems
   – Compression greatly reduces disk footprint
   – This is a large task

We are looking for evaluators!

Email me at tim@tokutek.com

See me after the presentation

Questions?

Tim Callaghan tim@tokutek.com

@tmcallaghan

More detailed benchmark information in my blogs at www.tokutek.com/tokuview
