Post on 14-Oct-2020
MongoDB Revs You Up: What Storage Engine is Right for You?
Jon Tobin, Director of Solution Eng.
Jon.Tobin@percona.com @jontobs
Linkedin.com/in/jonathanetobin
www.percona.com
Agenda
• How did we get here?
• What storage engines are available?
• Why does the data structure matter?
• What makes them unique?
• Where should I start my evaluation?
• How can I evaluate the engines?
First: Background
Let’s Level Set
{
  "_id" : ObjectId("507f1f77bcf86cd799439011"),
  "studentID" : 100,
  "firstName" : "Jonathan",
  "middleName" : "Eli",
  "lastName" : "Tobin",
  "classes" : [
    {
      "courseID" : "PHY101",
      "grade" : "B",
      "courseName" : "Physics 101",
      "credits" : 3
      …
MongoDB History
• MMAP rules the universe; concurrency suffers
• Per mongod lock
  • v1.2.0 – December 10th 2009
  • v2.0.7 – August 9th 2012
• Per database lock
  • v2.2.0 – August 29th 2012
  • v2.6.8 – August 25th 2015
• Concurrency!!
• Per document lock
  • MongoDB, Inc acquires wiredTiger – December 16th, 2014
  • v3.0 – March 3rd 2015
  • First implementation of a storage engine API
Storage Engines & Data Structures
MongoDB Storage Engines
MongoDB, Inc. & Percona Server for MongoDB
• MMAP
• wiredTiger
MongoDB Enterprise Advanced Only
• In Memory
• Encrypted (wiredTiger)
Percona Server for MongoDB
• PerconaFT
• RocksDB
B-tree Overview
B-tree Insert
Pivot Rule >=
B-tree Search
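The search slide can be sketched in a few lines of Python. This is a minimal illustration, not any engine's actual code: the `Node` class and `search` function are hypothetical names, and reading "Pivot Rule >=" as "keys greater than or equal to a pivot go to the child on its right" is an assumption. Each node visit is counted as a stand-in for one I/O when the node is not cached.

```python
from bisect import bisect_left, bisect_right


class Node:
    """Hypothetical B-tree node: pivots in internal nodes, keys in leaves."""

    def __init__(self, keys, children=None):
        self.keys = keys          # sorted list
        self.children = children  # None marks a leaf


def search(root, key):
    """Return (found, nodes_visited) for a point lookup."""
    node, visited = root, 0
    while True:
        visited += 1  # one node read, potentially one disk I/O
        if node.children is None:
            i = bisect_left(node.keys, key)
            return (i < len(node.keys) and node.keys[i] == key), visited
        # Assumed pivot rule: key >= pivot descends to the right of that pivot.
        node = node.children[bisect_right(node.keys, key)]


leaves = [Node([1, 5]), Node([10, 15]), Node([20, 30])]
root = Node([10, 20], leaves)
found, visited = search(root, 10)
```

The number of nodes visited equals the height of the tree, which is why lookups stay cheap until the nodes no longer fit in RAM.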
B-tree - Importance of I/O
15 hours vs. 91 hours (6x)
AWS – Insert 200M Rows – Predictable I/O Response vs. Not
What’s the Problem?
Performance is I/O limited when data is > RAM.
Each insert/update requires at least 1 I/O, plus an I/O for every extra index.
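The rule above turns into simple arithmetic. The sketch below is a back-of-the-envelope lower bound only: the function names are made up for illustration, the 200M row count is borrowed from the earlier slide, and the 3 extra indexes and 1,000 IOPS figure are assumed values, not measurements.

```python
def ios_per_write(extra_indexes: int) -> int:
    """At least 1 I/O per insert/update, plus 1 per extra index."""
    return 1 + extra_indexes


def hours_to_insert(rows: int, extra_indexes: int, iops: float) -> float:
    """Lower bound on load time when every write misses RAM."""
    total_ios = rows * ios_per_write(extra_indexes)
    return total_ios / iops / 3600


# Assumed scenario: 200M rows, 3 extra indexes, a device doing 1,000 IOPS.
estimate = hours_to_insert(200_000_000, 3, 1000)
```

Dropping one index in this scenario removes a quarter of the I/O, which is why index count matters so much once the working set exceeds memory.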
What’s Up With: MMAP
Overview
• Very basic “storage engine”
• Collection level lock
• Highly reliant on the OS for caching
• Uses B-tree indexes to point to disk offset
  • At the offset is the “record”
  • In the record is the document
Best Use
• In place updates
  • Record migration should be minimized
  • $inc, $set, etc
• Read only*
What’s Up With: MMAP
Problems
• Record allocation is fixed size
  • Space inefficient (powerof2)
  • What if document grows bigger than record?
Probably not for you. Going the “way of the dodo”
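To see why power-of-2 allocation is space inefficient, and what happens when a document outgrows its record, here is a rough sketch. It only rounds sizes up to the next power of two; the 32-byte minimum is an assumed parameter, the function names are hypothetical, and record headers and the engine's handling of very large documents are ignored.

```python
def power_of_2_record_size(doc_bytes: int, min_record: int = 32) -> int:
    """Smallest power-of-two record that holds the document (assumed minimum of 32 bytes)."""
    size = min_record
    while size < doc_bytes:
        size *= 2
    return size


def wasted_fraction(doc_bytes: int) -> float:
    """Share of the allocated record that is padding."""
    record = power_of_2_record_size(doc_bytes)
    return (record - doc_bytes) / record


def must_move(old_bytes: int, new_bytes: int) -> bool:
    """True when the grown document no longer fits its original record."""
    return new_bytes > power_of_2_record_size(old_bytes)


waste = wasted_fraction(1025)    # a 1025-byte document lands in a 2048-byte record
moved = must_move(1000, 1100)    # growing past 1024 bytes forces a record migration
```

A document just over a power-of-two boundary wastes close to half of its record, and a migration means every index entry pointing at the old offset has to be updated.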
What’s Up With: wiredTiger
Overview
• Concurrency: Document level
• Supports multiple data structures
  • B-tree (v3.0 +)
  • LSM tree (v3.2 +)
• Controls cache
Best Use
• Depends on data structure
  • B-tree: reads (point or small range) / dataset close to cache
  • LSM: random updates
Promising but still a bit of a “black box”
What’s Up With: RocksDB
Overview
• Written & maintained by Facebook
• Cut its teeth @ Parse
• Data Structure = LSM Trees
• Uses Google’s LevelDB API
• Space efficient + compression
• Excellent core scaling
Best Use
• Point queries
• Updates
• Easy incremental backups
Has very advanced functionality. Lots of potential
What’s Up With: LSMs
[Diagram: memtbl on top of Level 0, Level 1, Level 2, Level 3, Level 4]
• Writes go to memtable + journal
• Memtable fills up and overflows (flush) to file(s)
• Files are read only
• Acts like layers of logs
• Files are eventually merged and old files are marked for deletion
• Files are like small structured trees
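The write path described above can be sketched as a toy in-memory model. This is not RocksDB or wiredTiger code: `TinyLSM` and its methods are hypothetical, "files" are just immutable sorted tuples, levels are collapsed into a single newest-first list, and truncating the journal at flush time is an assumption.

```python
import bisect


class TinyLSM:
    """Toy LSM: memtable + journal, flushed to read-only sorted files."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit
        self.journal = []   # append-only record of unflushed writes
        self.memtable = {}
        self.files = []     # newest first; each file is a sorted tuple of (key, value)

    def put(self, key, value):
        self.journal.append((key, value))   # writes go to journal + memtable
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Memtable overflows into a new read-only file."""
        if self.memtable:
            self.files.insert(0, tuple(sorted(self.memtable.items())))
            self.memtable = {}
            self.journal.clear()            # assumed: safe once the file exists

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for f in self.files:                # newest file wins
            i = bisect.bisect_left(f, (key,))
            if i < len(f) and f[i][0] == key:
                return f[i][1]
        return None

    def compact(self):
        """Merge all files into one; the old files are dropped."""
        merged = {}
        for f in reversed(self.files):      # oldest first, newer values overwrite
            merged.update(f)
        self.files = [tuple(sorted(merged.items()))] if merged else []


db = TinyLSM(memtable_limit=2)
for k, v in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    db.put(k, v)
```

Note that an update to `"a"` never touches the older file; it simply shadows it until a merge runs, which is what makes writes cheap and reads more involved.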
What’s Up With: LSMs – Range Ops
[Diagram: memtbl and Level 0 through Level 4, labeled RANGE SCAN]
• Range scans are tough
• Each file is its own tree
• No good way to tell if data lies in any file
• Read amplification is H-I-G-H
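A small sketch shows where the read amplification comes from. It assumes each file is a sorted list of (key, value) pairs and that files are ordered newest first; `range_scan` and `_slice` are illustrative names, not part of any engine. Because no file can be ruled out in advance, every one of them is probed and the results are merged.

```python
import heapq
from bisect import bisect_left


def _slice(file, age, lo, hi):
    """Yield (key, age, value) for keys in [lo, hi] from one sorted file."""
    for key, value in file[bisect_left(file, (lo,)):]:
        if key > hi:
            break
        yield (key, age, value)


def range_scan(files, lo, hi):
    """Return (rows, files_probed); files are ordered newest first."""
    streams = [_slice(f, age, lo, hi) for age, f in enumerate(files)]
    rows, last_key = [], object()
    # Merging sorts by key, then by age, so the newest version of a key comes first.
    for key, _age, value in heapq.merge(*streams):
        if key != last_key:
            rows.append((key, value))
            last_key = key
    return rows, len(files)


files = [
    [("b", 20), ("d", 40)],            # newest
    [("a", 1), ("b", 2), ("c", 3)],    # older
]
rows, probed = range_scan(files, "b", "c")
```

The scan returns two rows but had to open both files; with many files per level the same query touches far more data than it returns.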
What’s Up With: LSMs – Point Ops
[Diagram: memtbl on top of Level 0, Level 1, Level 2, Level 3, Level 4]
• Point operations are tough too
• However, Bloom filters work well
• Filter determines if the required info exists in a set
• Can have false positives
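For readers who have not met Bloom filters, here is a minimal sketch of the idea. The bit count, hash count, and the use of SHA-256 with a seed prefix are arbitrary choices for illustration, not what any particular engine does.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)   # one byte per bit, for readability

    def _positions(self, key: str):
        # Derive num_hashes positions by prefixing the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


file_filter = BloomFilter()
for key in ("PHY101", "CHM201", "BIO110"):
    file_filter.add(key)
```

With one filter per file, a point lookup can skip any file whose filter answers `False` and only pays an I/O for files that answer `True`, accepting that a few of those will turn out to be false positives.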
Fractal Tree Indexes
Fractal - Insert
Fractal – Message Injection
What’s Up With: PerconaFT
Overview
• Developed by MIT, SUNY Stony Brook & Rutgers
• Concurrency: Document level
• Unique data structure
  • Fractal Tree
• Controls cache
• Compresses well (quicklz, zlib, lzma)
Best Uses
• Best compression
• CPU efficient (relatively)
• Sequential workloads
Still developing as a pluggable engine. Needs to learn API
Benchmarks
Disclaimer: They’re just benchmarks. It’s all made up. (like economics & meteorology)
Insert Workload
collections = 8
database name = sbtest
writer threads = 16
documents per collection = 10,000,000
feedback seconds = 20
auto commit = N
run seconds = 1200
oltp range size = 100
oltp point selects = 0
oltp simple ranges = 0
oltp sum ranges = 0
oltp order ranges = 0
oltp distinct ranges = 0
oltp index updates = 0
oltp non index updates = 0
oltp inserts = 20
Applies to all benchmarks in this presentation
What’s Up With: Writes
[Chart: Mongo Engines - Write TPS. Y axis: TPS, 0 to 800. X axis: Elapsed Seconds, 20 to 1180. Series: PerconaFT, wiredTiger, RocksDB.]
Read Workload
run seconds = 1200
oltp range size = 100
oltp point selects = 10
oltp simple ranges = 1
oltp sum ranges = 1
oltp order ranges = 1
oltp distinct ranges = 1
oltp index updates = 0
oltp non index updates = 0
oltp inserts = 0
What’s Up With: Reads
[Chart: Mongo Engines - Read TPS. Y axis: 0 to 1600 (untitled in the source). X axis: 20 to 1180 (untitled in the source). Series: PerconaFT, wiredTiger, RocksDB.]
Update Workload
run seconds = 1200
oltp range size = 100
oltp point selects = 0
oltp simple ranges = 0
oltp sum ranges = 0
oltp order ranges = 0
oltp distinct ranges = 0
oltp index updates = 50
oltp non index updates = 5
oltp inserts = 0
What’s Up With: Updates
[Chart: Mongo Engines - Updates. Y axis: TPS, 0 to 500. X axis: Elapsed Seconds, 20 to 1180. Series: PerconaFT, wiredTiger, RocksDB.]
Mixed Workload
run seconds = 1200
oltp range size = 100
oltp point selects = 10
oltp simple ranges = 1
oltp sum ranges = 1
oltp order ranges = 1
oltp distinct ranges = 1
oltp index updates = 50
oltp non index updates = 5
oltp inserts = 10
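When comparing TPS across the four workloads, it helps to know how much work sits inside one transaction. The sketch below totals the per-transaction operation counts from the settings in this deck. It assumes each "oltp ..." setting is the number of operations of that type per transaction, as in sysbench-style OLTP tests; that reading, and the names `WORKLOADS` and `ops_per_transaction`, are assumptions for illustration.

```python
OP_KEYS = (
    "point_selects", "simple_ranges", "sum_ranges", "order_ranges",
    "distinct_ranges", "index_updates", "non_index_updates", "inserts",
)

# Values copied from the four workload slides, in OP_KEYS order.
WORKLOADS = {
    "insert": dict(zip(OP_KEYS, (0, 0, 0, 0, 0, 0, 0, 20))),
    "read":   dict(zip(OP_KEYS, (10, 1, 1, 1, 1, 0, 0, 0))),
    "update": dict(zip(OP_KEYS, (0, 0, 0, 0, 0, 50, 5, 0))),
    "mixed":  dict(zip(OP_KEYS, (10, 1, 1, 1, 1, 50, 5, 10))),
}


def ops_per_transaction(workload: dict) -> int:
    """Total operations issued per transaction (assumed semantics)."""
    return sum(workload[key] for key in OP_KEYS)


def ops_per_second(workload: dict, tps: float) -> float:
    """Convert a TPS reading from a chart into operations per second."""
    return tps * ops_per_transaction(workload)


totals = {name: ops_per_transaction(w) for name, w in WORKLOADS.items()}
```

Under this reading a mixed-workload transaction carries several times more operations than a read transaction, so the raw TPS numbers on the different charts are not directly comparable.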
What’s Up With: Mixed Workloads
[Chart: Mongo Engines - Mixed. Y axis: TPS, 0 to 250. X axis: Elapsed Seconds, 20 to 1180. Series: PerconaFT, wiredTiger, RocksDB.]
Evaluation Resources
• Flashback – replay Mongo operations in real time or as fast as possible with your workload
• benchRun – JavaScript benchmark harness in MongoDB. Cuts out driver problems
• Sysbench & iiBench for Mongo
• Yahoo! Cloud Serving Benchmark (YCSB)
• mongo-perf
*Whenever possible, run with YOUR workload, or a workload that accurately simulates yours.
Percona Live Data Performance Conference
• April 18-21 in Santa Clara, CA at the Santa Clara Convention Center
• Register with code “WebinarPL” to receive 15% off at registration
• MongoDB, MySQL, NoSQL, Data in the Cloud
www.perconalive.com