Understanding and tuning WiredTiger, the new high performance database engine in MongoDB
Henrik Ingo, Solutions Architect, MongoDB
Agenda:
- MongoDB and NoSQL
- Storage Engine API
- WiredTiger configuration + performance
Most popular NoSQL database
5 NoSQL categories
• Key Value: Redis, Riak
• Wide Column: Cassandra
• Document
• Graph: Neo4j
• Map Reduce: Hadoop
MongoDB is a Document Database
Rich Queries
• Find Paul's cars
• Find everybody in London with a car built between 1970 and 1980
Geospatial
• Find all of the car owners within 5km of Trafalgar Sq.
Text Search
• Find all the cars described as having leather seats
Aggregation
• Calculate the average value of Paul's car collection
Map Reduce
• What is the ownership pattern of colors by geography over time? (Is purple trending up in China?)
MongoDB document:
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123,47.232],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}
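To make three of the example queries concrete, here is a minimal sketch in plain JavaScript that runs without a server. The collection name `owners`, the second owner, and the helper names are assumptions for illustration; the comments show roughly what the equivalent MongoDB query documents would look like.

```javascript
// Sample data modeled on the slide's document; the elided fields ("…") are simply omitted.
const owners = [
  {
    first_name: "Paul", surname: "Miller", city: "London",
    cars: [
      { model: "Bentley", year: 1973, value: 100000 },
      { model: "Rolls Royce", year: 1965, value: 330000 },
    ],
  },
  {
    // Hypothetical second owner, added only so the filters have something to exclude.
    first_name: "Anna", surname: "Smith", city: "Paris",
    cars: [{ model: "Citroen", year: 1975, value: 20000 }],
  },
];

// "Find Paul's cars"
// Roughly: db.owners.find({ first_name: "Paul" }, { cars: 1 })
function carsOf(firstName) {
  return owners
    .filter((owner) => owner.first_name === firstName)
    .flatMap((owner) => owner.cars);
}

// "Find everybody in London with a car built between 1970 and 1980"
// Roughly: db.owners.find({ city: "London",
//   cars: { $elemMatch: { year: { $gte: 1970, $lte: 1980 } } } })
function ownersInCityWithCarBuiltBetween(city, fromYear, toYear) {
  return owners.filter(
    (owner) =>
      owner.city === city &&
      owner.cars.some((car) => car.year >= fromYear && car.year <= toYear)
  );
}

// "Calculate the average value of Paul's car collection"
// (on a server this would be an aggregation pipeline; here it is a plain reduce)
function averageCarValue(firstName) {
  const cars = carsOf(firstName);
  if (cars.length === 0) return null;
  return cars.reduce((sum, car) => sum + car.value, 0) / cars.length;
}

console.log(carsOf("Paul").map((car) => car.model));
console.log(ownersInCityWithCarBuiltBetween("London", 1970, 1980).length);
console.log(averageCarValue("Paul"));
```

The point of the document model is that the cars live inside the owner document, so all three questions are answered from one collection without joins.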
Operational Database Landscape
MongoDB 3.0 & storage engines
MongoDB until 3.0
Read-heavy apps
• Great performance
  – B-tree
  – Low overhead
• Good scale-out perf
  – Secondary reads
  – Sharding
Write-heavy apps
• Good scale-out perf
  – Sharding
• Per-node efficiency wish-list:
  – Doc level locking
  – Write-optimized data structures (LSM)
  – Compression
Other
• Multi statement transactions
• In-memory engine
• SSD optimized engine
• etc...
Current state in MongoDB 2.6
Read-heavy apps
• Great performance
  – B-tree
  – Low overhead
• Good scale-out perf
  – Secondary reads
  – Sharding
Write-heavy apps
• Good scale-out perf
  – Sharding
• Per-node efficiency wish-list:
  – Doc level locking
  – Write-optimized data structures (LSM)
  – Compression
Other
• Complex transactions
• In-memory engine
• SSD optimized engine
• etc...
How to get all of the above?
MongoDB 3.0 Storage Engine API
• MMAP: read-heavy app
• WiredTiger: write-heavy app
• 3rd party: special app
• One at a time:
  – Many engines built into mongod
  – Choose 1 at startup
  – All data stored by the same engine
  – Incompatible on-disk data formats (obviously)
  – Compatible client API
• Compatible Oplog & Replication
  – Same replica set can mix different engines
  – No-downtime migration possible
Some existing engines
• MMAPv1
  – Improved MMAP (collection-level locking)
• WiredTiger
  – Discussed next
• RocksDB
  – LSM style engine developed by Facebook
  – Based on LevelDB
• TokuMXse
  – Fractal Tree indexing engine from Percona
Some rumored engines
• Heap
  – In-memory engine
• Devnull
  – Write all data to /dev/null
  – Based on idea from famous flash animation...
• SSD optimized engine (e.g. Fusion-IO)
• KV simple key-value engine
https://github.com/mongodb/mongo/tree/master/src/mongo/db/storage
WiredTiger
What is WiredTiger
• Modern NoSQL database engine
  – Flexible schema
• Advanced database engine
  – Secondary indexes, MVCC, non-locking algorithms
  – Multi-statement transactions (not in MongoDB)
• Very modular, tunable
  – Btree, LSM and columnar indexes
  – Snappy, Zlib, 3rd-party compression
  – Index prefix compression, etc...
  – Encryption at rest
• Built by creators of BerkeleyDB
• Acquired by MongoDB in 2014
• source.wiredtiger.com, @WiredTigerInc
Choosing WiredTiger at server startup
mongod --storageEngine wiredTiger
http://docs.mongodb.org/master/reference/program/mongod/#cmdoption--storageEngine
Default engine:
• MongoDB 3.0 = MMAP
• MongoDB 3.2 = WiredTiger
Main tunables exposed as MongoDB options
mongod --storageEngine wiredTiger \
  --wiredTigerCacheSizeGB 8 \
  --wiredTigerDirectoryForIndexes \
  --wiredTigerCollectionBlockCompressor zlib \
  --dbpath /data/datafiles
http://docs.mongodb.org/master/reference/program/mongod/#cmdoption--storageEngine
All WiredTiger options via configString (hidden)
mongod --storageEngine wiredTiger \
  --wiredTigerEngineConfigString "cache_size=8GB,eviction=(threads_min=4,threads_max=8),checkpoint=(wait=30)" \
  --wiredTigerCollectionConfigString "block_compressor=zlib" \
  --wiredTigerIndexConfigString "type=lsm,block_compressor=zlib" \
  --wiredTigerDirectoryForIndexes
See docs for wiredtiger_open() & WT_SESSION::create()
http://source.wiredtiger.com/2.5.0/group__wt.html#ga9e6adae3fc6964ef837a62795c7840ed
http://source.wiredtiger.com/2.5.0/struct_w_t___s_e_s_s_i_o_n.html#a358ca4141d59c345f401c58501276bbb
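WiredTiger config strings are comma-separated key=value pairs, with nested groups written as key=(...). Writing them by hand is error-prone, so a small helper can generate them from a nested object. This is a sketch; `toConfigString` is a hypothetical helper, not part of MongoDB or WiredTiger.

```javascript
// Serialize a nested options object into WiredTiger's
// "key=value,group=(key=value,...)" configuration syntax.
// toConfigString is a hypothetical helper written for this example.
function toConfigString(options) {
  return Object.entries(options)
    .map(([key, value]) => {
      if (value !== null && typeof value === "object") {
        // Nested groups are wrapped in parentheses.
        return `${key}=(${toConfigString(value)})`;
      }
      return `${key}=${value}`;
    })
    .join(",");
}

const engineConfig = toConfigString({
  cache_size: "8GB",
  eviction: { threads_min: 4, threads_max: 8 },
  checkpoint: { wait: 30 },
});

const indexConfig = toConfigString({ type: "lsm", block_compressor: "zlib" });

// The results would be passed to --wiredTigerEngineConfigString
// and --wiredTigerIndexConfigString respectively.
console.log(engineConfig);
console.log(indexConfig);
```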
Also via createCollection(), createIndex()
db.createCollection( "users", { storageEngine: { wiredTiger: { configString: "block_compressor=none" } } } )
http://docs.mongodb.org/master/reference/method/db.createCollection/#db.createCollection
http://docs.mongodb.org/master/reference/method/db.collection.createIndex/#db.collection.createIndex
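The same options document shape is used for collections and indexes, so it can be built once. A minimal sketch follows: `wiredTigerOptions` is a hypothetical helper, the index on `surname` is only an illustration, and the shell calls are left as comments because they need a running mongod.

```javascript
// Build the per-collection / per-index options document.
// wiredTigerOptions is a hypothetical helper written for this example.
function wiredTigerOptions(configString) {
  return { storageEngine: { wiredTiger: { configString } } };
}

const collectionOptions = wiredTigerOptions("block_compressor=none");
const indexOptions = wiredTigerOptions("prefix_compression=true");

// In the mongo shell (not run here):
//   db.createCollection("users", collectionOptions)
//   db.users.createIndex({ surname: 1 }, indexOptions)

console.log(JSON.stringify(collectionOptions));
console.log(JSON.stringify(indexOptions));
```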
More...
• db.serverStatus()
• db.collection.stats()
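As an illustration of what these commands are useful for: db.serverStatus() includes a wiredTiger.cache section, from which you can estimate how full the cache is. The sketch below works on a hand-made object standing in for real output; the numbers are invented, and the two field names are the ones I would expect to find there, so verify them against your own server's output.

```javascript
// Hypothetical excerpt of db.serverStatus() output; the numbers are made up.
const status = {
  wiredTiger: {
    cache: {
      "bytes currently in the cache": 6 * 1024 ** 3,
      "maximum bytes configured": 8 * 1024 ** 3,
    },
  },
};

// Fraction of the configured WiredTiger cache that is currently in use.
function cacheFillRatio(serverStatus) {
  const cache = serverStatus.wiredTiger.cache;
  return cache["bytes currently in the cache"] / cache["maximum bytes configured"];
}

// In the mongo shell you would pass the real thing: cacheFillRatio(db.serverStatus())
console.log(cacheFillRatio(status));
```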
Understanding and Optimizing WiredTiger
Understanding WiredTiger architecture
Diagram, layers from top to bottom:
• WiredTiger SE
  – Btree / LSM / Columnar
  – Cache (default: 50%)
  – None / Snappy / Zlib
• OS Disk Cache (default: 50%)
• Physical disk
Covering 90% of your optimization needs
Costs highlighted on the architecture diagram:
• Decompression time
• Disk seek time
Strategy 1: fit working set in Cache
• Cache: cache_size = 80% (default: 50%)
Strategy 2: fit working set in OS Disk Cache
• Cache: cache_size = 10% (default: 50%)
• OS Disk Cache: remaining 90% (default: 50%)
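The first two strategies differ only in how RAM is split between the WiredTiger cache and the OS disk cache. A small sketch of the arithmetic, assuming a hypothetical 64 GB server and assuming the value is rounded down to whole gigabytes (never below 1) before being passed to --wiredTigerCacheSizeGB:

```javascript
// Turn a share of physical RAM into a value for --wiredTigerCacheSizeGB.
// cacheSizeGB is a hypothetical helper; the rounding rule is an assumption.
function cacheSizeGB(totalRamGB, fraction) {
  if (!(fraction > 0 && fraction < 1)) {
    throw new RangeError("fraction must be between 0 and 1");
  }
  // Round down to whole gigabytes, but never go below 1 GB.
  return Math.max(1, Math.floor(totalRamGB * fraction));
}

const ramGB = 64; // hypothetical server

// Strategy 1: working set in the WiredTiger cache (80% of RAM).
const strategy1 = cacheSizeGB(ramGB, 0.8);
// Strategy 2: small WiredTiger cache (10%), rest left to the OS disk cache.
const strategy2 = cacheSizeGB(ramGB, 0.1);

console.log(`mongod --storageEngine wiredTiger --wiredTigerCacheSizeGB ${strategy1}`);
console.log(`mongod --storageEngine wiredTiger --wiredTigerCacheSizeGB ${strategy2}`);
```

Which split wins depends on the data: the WiredTiger cache holds uncompressed pages, while the OS disk cache holds the on-disk (possibly compressed) blocks, so the same RAM covers more data in the OS cache at the price of decompression time.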
Strategy 3: SSD disk + compression to save €
• Physical disk: SSD
Strategy 4: SSD disk (no compression)
• Physical disk: SSD
Compression benchmarks
What problem is solved by LSM indexes?
Performance goal:
• Fast reads
  – Easy: Add indexes
• Fast writes
  – Easy: No indexes
• Both
  – Hard: Smart schema design (hire a consultant)
  – LSM index structures (or columnar)
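To show why an LSM structure keeps writes cheap while still allowing indexed reads, here is a toy sketch of the general idea: writes go to an in-memory buffer, full buffers are flushed as sorted immutable runs, and reads check the buffer and then the runs from newest to oldest. This is not WiredTiger's implementation; all names are hypothetical, and real engines add write-ahead logging, Bloom filters and background merging.

```javascript
// Compare [key, value] entries by key.
const byKey = ([a], [b]) => (a < b ? -1 : a > b ? 1 : 0);

// Binary search in one sorted run; returns the value or undefined.
function searchRun(run, key) {
  let lo = 0;
  let hi = run.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const [k, v] = run[mid];
    if (k === key) return v;
    if (k < key) lo = mid + 1;
    else hi = mid - 1;
  }
  return undefined;
}

class ToyLsm {
  constructor(memtableLimit = 4) {
    this.memtableLimit = memtableLimit;
    this.memtable = new Map(); // newest writes, held in memory
    this.runs = [];            // flushed runs, newest first, each sorted by key
  }

  // A write only touches the memtable; existing runs are never modified in place.
  put(key, value) {
    this.memtable.set(key, value);
    if (this.memtable.size >= this.memtableLimit) this.flush();
  }

  // Flush the memtable as one sorted, immutable run (a sequential write on disk).
  flush() {
    if (this.memtable.size === 0) return;
    this.runs.unshift([...this.memtable.entries()].sort(byKey));
    this.memtable = new Map();
  }

  // A read checks the memtable first, then each run from newest to oldest.
  get(key) {
    if (this.memtable.has(key)) return this.memtable.get(key);
    for (const run of this.runs) {
      const hit = searchRun(run, key);
      if (hit !== undefined) return hit;
    }
    return undefined;
  }

  // Compaction merges all runs into one, keeping the newest value per key.
  compact() {
    const merged = new Map();
    for (const run of [...this.runs].reverse()) { // oldest first, newer values overwrite
      for (const [k, v] of run) merged.set(k, v);
    }
    this.runs = merged.size > 0 ? [[...merged.entries()].sort(byKey)] : [];
  }
}

const lsm = new ToyLsm(2);
lsm.put("b", 1);
lsm.put("a", 2); // memtable is full: flushed as the run [a, b]
lsm.put("b", 3); // newer value for "b" stays in the memtable
lsm.put("c", 4); // second flush: the run [b, c] now shadows the older "b"
lsm.compact();   // merge both runs into one
console.log(lsm.get("a"), lsm.get("b"), lsm.get("c"), lsm.runs.length);
```

The trade-off is visible in `get`: the more runs there are, the more places a read has to look, which is what compaction keeps in check.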
2B inserts (with 3 secondary indexes)
http://smalldatum.blogspot.fi/2014/12/read-modify-write-optimized.html