Managing a Maturing MongoDB Ecosystem
![Page 1: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/1.jpg)
Charity Majors (@mipsytipsy)
Thursday, June 20, 13
![Page 2: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/2.jpg)
Managing a maturing MongoDB ecosystem
![Page 3: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/3.jpg)
automating with chef
performance tuning
disaster recovery
![Page 4: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/4.jpg)
chef.
![Page 5: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/5.jpg)
Basic replica set
![Page 6: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/6.jpg)
How do I chef that?
... grab the AWS and mongodb cookbooks, create a site wrapper cookbook
![Page 7: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/7.jpg)
make a role for your cluster,
launch some nodes,
![Page 8: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/8.jpg)
initiate the replica set,
... and you’re done.
![Page 9: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/9.jpg)
Adding snapshots
![Page 10: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/10.jpg)
adding RAID for EBS volumes
![Page 11: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/11.jpg)
with this role ...
this will bootstrap a new node for the cluster from snapshots
![Page 12: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/12.jpg)
multiple clusters
distinct cluster name, backup host, backup volumes
![Page 13: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/13.jpg)
sharding
![Page 14: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/14.jpg)
assign a shard name per cluster, per role
treat them like ordinary replica sets
![Page 15: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/15.jpg)
Arbiters
• Mongod processes that do nothing but vote
• Highly reliable
• To provision an arbiter, use the LWRP
• Easy to run multiple arbiters on a single host
![Page 16: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/16.jpg)
arbiter LWRP
![Page 17: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/17.jpg)
replica set with arbiters
![Page 18: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/18.jpg)
run multiple arbiters on a single host:
![Page 19: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/19.jpg)
Managing votes with arbiters
![Page 20: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/20.jpg)
tuning and performance.
![Page 21: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/21.jpg)
resources and provisioning
tuning your filesystem
snapshotting and warmups
fragmentation
![Page 22: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/22.jpg)
Provisioning tips
• Memory is your primary scaling constraint
• Your working set must fit into memory
• in 2.4, estimate with:
• Page faults? Your working set may not fit
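The 2.4 estimate mentioned above is most likely the working set estimator that `serverStatus` gained in that release; the slide's own command did not survive in this transcript, so the following is only a sketch. The helper names, the 4 KB page size, and the 80% headroom are assumptions.

```javascript
// Sketch: turn the 2.4 working set estimator output into bytes.
// In a 2.4 mongo shell the input would come from:
//   db.serverStatus({ workingSet: 1 }).workingSet
// The 4096-byte page size is an assumption; check your platform.
function workingSetBytes(workingSet, pageSizeBytes = 4096) {
  return workingSet.pagesInMemory * pageSizeBytes;
}

// Hypothetical helper: does the estimate fit in RAM with some headroom?
function fitsInMemory(workingSet, ramBytes, headroom = 0.8) {
  return workingSetBytes(workingSet) <= ramBytes * headroom;
}
```

The estimator only samples a recent window (reported as `overSeconds`), so a single reading can understate a workload with daily peaks.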
![Page 23: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/23.jpg)
Disk options
• If you’re on Amazon:
• EBS
• Dedicated SSD
• Provisioned IOPS
• Ephemeral
• If not:
• use SSDs!
![Page 24: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/24.jpg)
EBS classic
EBS with PIOPS:
... just say no to EBS
![Page 25: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/25.jpg)
SSD (hi1.4xlarge)
• 8 cores
• 60 gigs RAM
• 2 × 1 TB SSD drives
• 120k random reads/sec
• 85k random writes/sec
• expensive! $2300/mo on demand
![Page 26: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/26.jpg)
PIOPS
• Up to 2000 IOPS/volume
• Up to 1024 GB/volume
• Variability of < 0.1%
• Costs double regular EBS
• Supports snapshots
• RAID together multiple volumes for more storage/performance
![Page 27: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/27.jpg)
Estimating PIOPS
• estimate how many IOPS to provision with the “tps” column of sar -d 1
• multiply that by 2-3x depending on your spikiness
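The estimate above can be worked through as arithmetic. This is a minimal sketch: taking the mean of the sampled "tps" values as the baseline is an assumption (the slide does not say which statistic to use), and the function name is hypothetical. The 2000 IOPS per-volume ceiling comes from the PIOPS slide.

```javascript
// Sketch: size a PIOPS request from sampled "tps" values (sar -d 1).
// Using the mean as the baseline is an assumption, as are all names here.
const MAX_IOPS_PER_VOLUME = 2000; // per-volume ceiling quoted in the PIOPS slide

function estimatePiops(tpsSamples, spikeFactor = 2) {
  const mean = tpsSamples.reduce((sum, t) => sum + t, 0) / tpsSamples.length;
  const iops = Math.ceil(mean * spikeFactor);
  // More than one volume means RAIDing volumes together.
  return { iops, volumes: Math.ceil(iops / MAX_IOPS_PER_VOLUME) };
}
```

For example, samples averaging 1500 tps with a spike factor of 2 give 3000 IOPS, which needs two volumes.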
![Page 28: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/28.jpg)
Ephemeral Storage
• Cheap
• Fast
• No network latency
• No snapshot capability
• Data is lost forever if you stop or resize the instance
![Page 29: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/29.jpg)
Filesystem and limits
• Raise file descriptor limits
• Raise connection limits
• Mount with noatime and nodiratime
• Consider putting the journal on a separate volume
![Page 30: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/30.jpg)
Blockdev (readahead)
• Your default readahead (blockdev --setra) is probably wrong
• Too large? you will underuse memory
• Too small? you will hit the disk too much
• Experiment.
![Page 31: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/31.jpg)
Snapshot best practices
• Set priority = 0
• Set hidden = 1
• Consider setting votes = 0
• Lock mongo or stop mongod before snapshot
• Consider running continuous compaction on snapshot node
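The member settings above can be sketched as a change to the replica set config. This assumes the config has the shape returned by `rs.conf()` and that the result would be passed to `rs.reconfig()`; the function name and the host names in the usage note are hypothetical.

```javascript
// Sketch: return a copy of a replica set config with one member set up
// as a snapshot node. In the mongo shell the input would be rs.conf()
// and the result would go to rs.reconfig(). Function name is hypothetical.
function asSnapshotNode(config, memberId, { nonVoting = false } = {}) {
  const members = config.members.map((m) => {
    if (m._id !== memberId) return m;
    // priority 0: never becomes primary; hidden: invisible to clients.
    const updated = { ...m, priority: 0, hidden: true };
    if (nonVoting) updated.votes = 0;
    return updated;
  });
  return { ...config, members };
}
```

Called as `asSnapshotNode(cfg, 1, { nonVoting: true })`, it leaves every other member untouched and does not modify the input config.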
![Page 32: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/32.jpg)
Restoring from snapshot
• Volumes restored from an EBS snapshot lazily load blocks from S3
• run “dd” on each of the data files to pull blocks down
• Always warm up a secondary before promoting
• warm up both indexes and data
• http://blog.parse.com/2013/03/07/techniques-for-warming-up-mongodb/
• in mongodb 2.2 and above you can use the touch command:
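The command shown on the original slide is not in this transcript. As a sketch of what a `touch` call looks like on the releases the talk covers, the snippet below builds the command document and runs it for a list of collections. `warmUp` is a hypothetical helper, and `db` is assumed to be anything with a `runCommand()` method, such as the shell's `db` object.

```javascript
// Sketch: build the `touch` command for one collection and run it for a
// list of collections. `db` is anything with a runCommand() method, such
// as the shell's db object; warmUp is a hypothetical helper name.
function touchCommand(collection) {
  // data: true loads documents, index: true loads indexes.
  return { touch: collection, data: true, index: true };
}

function warmUp(db, collections) {
  return collections.map((name) => db.runCommand(touchCommand(name)));
}
```

Running it against a secondary before promotion covers both halves of the "warm up both indexes and data" advice.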
![Page 33: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/33.jpg)
Fragmentation
• Your RAM gets fragmented too!
• Leads to underuse of memory
• Deletes are not the only source of fragmentation
• Repair, compact, or resync regularly
![Page 34: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/34.jpg)
3 ways to fix fragmentation:
• Re-sync a secondary from scratch
• hard on your primary; use rs.syncFrom() to sync from a secondary instead
• Repair a secondary
• can cause small discrepancies in your data
• Run continuous compaction on your snapshot node
• won’t reset padding factors
• not appropriate if you do lots of deletes
![Page 35: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/35.jpg)
Fragmentation is terrible
![Page 36: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/36.jpg)
Upgrade!
mongo is getting faster. :)
![Page 37: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/37.jpg)
disasters and recovery.
![Page 38: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/38.jpg)
Finding bad queries
• db.currentOp()
• mongodb.log
• profiling collection
![Page 39: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/39.jpg)
db.currentOp()
• Check the queue size
• Any indexes building?
• Sort by secs_running
• Sort by numYields, lock type
• Consider adding comments to your queries
• Run explain() on queries that are long-running
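The triage steps above can be sketched as two small functions over the `inprog` array that `db.currentOp()` returns. The helper names are hypothetical, and treating `waitingForLock` as the measure of queue size is an assumption.

```javascript
// Sketch: rank in-progress operations the way the bullets above suggest.
// Input is the `inprog` array from db.currentOp(); field names follow the
// 2.x output (secs_running, waitingForLock). Helper names are hypothetical.
function slowestOps(inprog, minSecs = 0) {
  return inprog
    .filter((op) => op.active && (op.secs_running || 0) >= minSecs)
    .sort((a, b) => (b.secs_running || 0) - (a.secs_running || 0));
}

// Assumption: "queue size" is approximated by ops waiting on a lock.
function queueSize(inprog) {
  return inprog.filter((op) => op.waitingForLock).length;
}
```

In the shell this would be called as `slowestOps(db.currentOp().inprog, 10)` to list active operations that have run for at least ten seconds, longest first.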
![Page 40: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/40.jpg)
mongodb.log
• Configure output with --slowms
• Look for high execution time, nscanned, ntoreturn
• See which queries are holding long locks
• Match connection ids to IPs
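Matching connection ids to IPs and pulling out slow operations can be sketched as a small log parser. The regular expressions assume the line layout of 2.4-era mongod logs, which is an assumption worth checking against your own log before use; the function name is hypothetical.

```javascript
// Sketch: pull slow operations out of mongod log lines and map connection
// ids to client IPs. The regexes assume a 2.4-era line layout; verify them
// against real log output before relying on this.
const ACCEPTED = /connection accepted from ([\d.]+):\d+ #(\d+)/;
const SLOW_OP = /\[conn(\d+)\] .*nscanned:(\d+).* (\d+)ms$/;

function parseLog(lines) {
  const ipByConn = {};
  const slowOps = [];
  for (const line of lines) {
    const accepted = ACCEPTED.exec(line);
    if (accepted) {
      ipByConn[accepted[2]] = accepted[1];
      continue;
    }
    const slow = SLOW_OP.exec(line);
    if (slow) {
      slowOps.push({ conn: slow[1], nscanned: Number(slow[2]), ms: Number(slow[3]) });
    }
  }
  // Attach the client IP where the connection was seen being accepted.
  return slowOps.map((op) => ({ ...op, ip: ipByConn[op.conn] }));
}
```

Connections accepted before the start of the log excerpt will come back with `ip` undefined.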
![Page 41: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/41.jpg)
system.profile collection
• Enable profiling with db.setProfilingLevel()
• Does not persist through restarts
• Like mongodb.log, but queryable
• Writes to this collection incur some cost
• Use db.system.profile.find() to get slow queries for a certain collection, time range, execution time, etc
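A filter for that kind of lookup can be sketched as follows. The fields `ns`, `millis`, and `ts` appear in profile documents; the function name and its option names are hypothetical.

```javascript
// Sketch: build a filter for db.system.profile.find(). The fields ns,
// millis and ts are standard in profile documents; the function name and
// option names are hypothetical.
function slowQueryFilter({ ns, minMillis, since, until }) {
  const filter = {};
  if (ns) filter.ns = ns;
  if (minMillis !== undefined) filter.millis = { $gte: minMillis };
  if (since || until) {
    filter.ts = {};
    if (since) filter.ts.$gte = since;
    if (until) filter.ts.$lt = until;
  }
  return filter;
}
```

In the shell the result would be used as `db.system.profile.find(slowQueryFilter({ ns: "app.users", minMillis: 100 })).sort({ millis: -1 })`, where `app.users` is a placeholder namespace.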
![Page 42: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/42.jpg)
... when queries pile up ...
• Know what your tipping point looks like
• Don’t switch your primary or restart
• Do kill queries before the tipping point
• Write your kill script before you need it
• Don’t kill internal mongo operations, only queries.
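A kill script along these lines might look like the sketch below. The rules for what counts as an internal operation (not a `query` op, no client address, or a namespace in the `local` database, which is where oplog tailers read) are assumptions to review against your own `db.currentOp()` output. Function names are hypothetical; `db.killOp()` is the shell call that does the killing.

```javascript
// Sketch of a kill script: pick only long-running client queries and leave
// internal operations alone. The filters (op type, a client address, not in
// the "local" database) are assumptions about what counts as "internal".
function killableOpIds(inprog, maxSecs) {
  return inprog
    .filter((op) => op.active && op.op === "query")
    .filter((op) => op.client && !String(op.ns || "").startsWith("local."))
    .filter((op) => (op.secs_running || 0) > maxSecs)
    .map((op) => op.opid);
}

// `db` is anything exposing currentOp() and killOp(), e.g. the shell's db.
function killLongQueries(db, maxSecs) {
  const ids = killableOpIds(db.currentOp().inprog, maxSecs);
  ids.forEach((id) => db.killOp(id));
  return ids;
}
```

Keeping the selection logic in its own function makes it possible to dry-run the script and inspect the opids before anything is killed.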
![Page 43: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/43.jpg)
can’t elect a master?
• Never run with an even number of votes (max 7)
• You need > 50% of votes to elect a primary
• Set your priority levels explicitly if you need warmup
• Consider delegating voting to arbiters
• Set snapshot nodes to be nonvoting if possible.
• Check your mongo log. Is something vetoing? Do they have an inconsistent view of the cluster state?
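The two voting rules above (an odd vote total, and more than 50% of votes to elect) can be checked with a few lines of arithmetic. This sketch assumes members use the `rs.conf()` shape, with `votes` defaulting to 1 when unset; function names are hypothetical.

```javascript
// Sketch: check the two voting rules above for a replica set config.
// `members` uses the rs.conf() shape; votes defaults to 1 when unset.
// Function names are hypothetical.
function totalVotes(members) {
  return members.reduce((sum, m) => sum + (m.votes === undefined ? 1 : m.votes), 0);
}

function canElectPrimary(members, reachableIds) {
  const reachable = members.filter((m) => reachableIds.includes(m._id));
  return totalVotes(reachable) * 2 > totalVotes(members); // strictly more than 50%
}

function hasEvenVotes(members) {
  return totalVotes(members) % 2 === 0;
}
```

With two data nodes, one nonvoting snapshot node, and one arbiter, the total is three votes, so a data node plus the arbiter can still elect a primary when the other data node is down.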
![Page 44: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/44.jpg)
secondaries crashing?
• Some rare mongo bugs will cause all secondaries to crash unrecoverably
• Never kill oplog tailers or other internal database operations; this can also trash secondaries
• Arbiters are more stable than secondaries; consider using them to form a quorum with your primary
![Page 45: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/45.jpg)
replication stops?
• Other rare bugs will stop replication or cause secondaries to exit without a corrupt op
• The correct way to fix this is to re-snapshot off the primary and rebuild your secondaries.
• However, you can sometimes *dangerously* repair a secondary:
1. stop mongo
2. bring it back up in standalone mode
3. repair the offending collection
4. restart mongo again as part of the replica set
![Page 46: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/46.jpg)
• Everything is getting vaguely slower?
• check your padding factor, try compaction
• You rs.remove() a node and get weird driver errors?
• always shut down mongod after removing from replica set
• Huge background flush spike?
• probably an EBS or disk problem
• You run out of connections?
• possibly a driver bug
• the limit is hard-coded to 80% of the soft ulimit, capped at 20k
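As a quick check of the 80% rule quoted above, here is the arithmetic as a function. It assumes the result rounds down and that the rule applies to the mongod versions the talk covers; the function name is hypothetical.

```javascript
// Worked arithmetic for the rule quoted above: 80% of the soft ulimit,
// capped at 20,000 connections. The function name is hypothetical.
function effectiveMaxConnections(softUlimit) {
  return Math.min(Math.floor(softUlimit * 0.8), 20000);
}
```

A default soft ulimit of 1024 therefore allows only 819 connections, which is why raising file descriptor limits matters well before the 20k cap comes into play.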
![Page 47: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/47.jpg)
• It looks like all I/O stops for a while?
• check your mongodb.log for large newExtent warnings
• also make sure you aren’t reaching PIOPS limits
• You get weird driver errors after adding/removing/re-electing?
• some drivers have problems with this, you may have to restart
![Page 48: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/48.jpg)
Glossary of resources
• Opscode AWS cookbook
• https://github.com/opscode-cookbooks/aws
• edelight MongoDB cookbook
• https://github.com/edelight/chef-mongodb
• Parse MongoDB cookbook fork
• https://github.com/ParsePlatform/Ops/tree/master/chef/cookbooks/mongodb
• Parse compaction scripts and warmup scripts
• http://blog.parse.com/2013/03/07/techniques-for-warming-up-mongodb/
• http://blog.parse.com/2013/03/26/always-be-compacting/
![Page 49: Managing a Maturing MongoDB Ecosystem](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b6ea8d4a7959ff2d8b461b/html5/thumbnails/49.jpg)
Charity Majors (@mipsytipsy)