Production MongoDB in the Cloud
Transcript of Production MongoDB in the Cloud
Production MongoDB in the Cloud: From Essentials to Corner Cases
Who are we?
Mike Hobbs & Bridget Kromhout
Social Commerce & Brand Interest Graph Analytics
Why MongoDB?
● Scalable, high-performance, open source
● Dynamic schemas for unstructured data
● Query language close to SQL in power
● "Eventually consistent" is hard to program right
Our configuration
12-node cluster (4 shards x 3 replica sets)
Several other non-sharded replica sets
Desired webapp response time is < 10ms

Total data size: 110 GB
Total index size: 28 GB
Largest collection: 49 GB
Largest index: 8.1 GB
EC2: EBS, instance size, replication
MongoDB: right for only some data sets
Memory & iowait
Working set needs to fit in memory
● Indexes● Frequently accessed records
Avoid swapping!!!
EBS latency in EC2 is an issue.
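As a back-of-the-envelope check, the deck's sizes can be plugged into a small sketch. This is not an official formula: the hot-data fraction and the RAM figure are assumptions you would tune per workload.

```python
# Rough working-set check: indexes should be fully resident, plus
# whatever slice of the data is "hot". All numbers except the deck's
# 28 GB / 110 GB are assumptions.

def working_set_gb(index_gb, data_gb, hot_fraction):
    """Estimate working set as all indexes plus the hot slice of data."""
    return index_gb + data_gb * hot_fraction

def fits_in_memory(index_gb, data_gb, hot_fraction, ram_gb):
    """True if the estimated working set fits in the instance's RAM."""
    return working_set_gb(index_gb, data_gb, hot_fraction) <= ram_gb

# 28 GB of indexes, 110 GB of data, assuming ~20% of data is hot:
print(working_set_gb(28, 110, 0.2))        # 50.0
print(fits_in_memory(28, 110, 0.2, 68.4))  # True (68.4 GB RAM instance)
```

If the estimate exceeds RAM, mongod starts faulting pages to EBS, which is exactly the iowait problem above.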
Fragmentation
Fragmentation steals from your most precious resource by reserving memory that is not used.
Run a compaction when your storageSize significantly exceeds your data size:
mongos> db.widgets.stats()
...
"size" : 5097988,
"storageSize" : 22507520,
Padding can reduce fragmentation and I/O:
db.widgets.insert({widg_id: "72120", padding: "XXXX...XXX"})
db.widgets.update({widg_id: "72120"},
    { $unset: {padding: ""},
      $set: {desc: "Grout remover", price: "13.39", instock: true} })
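The same trick sketched in pymongo terms, as plain dicts so no server is needed here; PAD_LEN and the helper names are our own, not part of any driver API.

```python
# Sketch of the padding trick: insert with throwaway padding so the
# record is allocated large, then swap the padding for real fields in
# one update, avoiding a document move. PAD_LEN is an assumed size.

PAD_LEN = 1024  # assumed headroom for future document growth

def padded_insert_doc(widg_id):
    """Build the initial document with a throwaway padding field."""
    return {"widg_id": widg_id, "padding": "X" * PAD_LEN}

def fill_in_update(fields):
    """Drop the padding and set the real fields in a single update."""
    return {"$unset": {"padding": ""}, "$set": dict(fields)}

doc = padded_insert_doc("72120")
update = fill_in_update({"desc": "Grout remover", "price": "13.39", "instock": True})
# With a live collection this would be roughly:
#   db.widgets.insert_one(doc)
#   db.widgets.update_one({"widg_id": "72120"}, update)
```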
Replica sets
"optime" : { "t" : 1365165841000, "i" : 1 },
"optimeDate" : { "$date" : "Fri Apr 5 07:44:01 2013" },
(Diagram: replica set "test" with members test-3-1.yourdomain, test-3-2.yourdomain, and test-3-3.yourdomain, with test-3-1.yourdomain as primary)
Elections
08:52:06 [rsMgr] can't see a majority of the set, relinquishing primary
08:52:06 [rsMgr] replSet relinquishing primary state
08:52:06 [rsMgr] replSet SECONDARY
08:52:12 [rsMgr] replSet can't see a majority, will not try to elect self
Primary always determined by an election.
In a 2-member replSet without an arbiter, if the secondary goes offline, the primary will step down, as in the log above.
Priorities can rig elections.
Ensure availability of an odd number of voting members.
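The majority rule behind these elections can be sketched in a few lines (a simplification that ignores priorities and non-default vote counts):

```python
# A member can only become (or remain) primary if it can see a strict
# majority of the set's voting members. This is why a 2-member set with
# no arbiter loses its primary when either member goes offline.

def majority(total_voters):
    """Smallest number of members that constitutes a strict majority."""
    return total_voters // 2 + 1

def can_elect_primary(visible, total_voters):
    """True if the visible members can elect/sustain a primary."""
    return visible >= majority(total_voters)

# 2-member set, secondary down: 1 of 2 visible, no majority -> step down.
print(can_elect_primary(1, 2))  # False
# Add an arbiter (3 voters): data member + arbiter still form a majority.
print(can_elect_primary(2, 3))  # True
```

Hence the advice above: an arbiter makes the voter count odd without adding a data-bearing member.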
Manual primary changes
There is no "become primary now" command. Manual stepdowns with a recusal timeout are the best option.
test-1:PRIMARY> rs.stepDown(300)
Wed Apr 3 11:45:36 DBClientCursor::init call() failed
Wed Apr 3 11:45:36 query failed : admin.$cmd { replSetStepDown: 300.0 } to: 127.0.0.1:27017
Wed Apr 3 11:45:36 Error: error doing query: failed src/mongo/shell/collection.js:155
Wed Apr 3 11:45:36 trying reconnect to 127.0.0.1:27017
Wed Apr 3 11:45:36 reconnect 127.0.0.1:27017 ok
test-1:SECONDARY>
This triggers an election.
(Obviously, make sure your preferred candidate(s) can win.)
States: down (initializing), startup2, secondary, primary
Can you take a replSet back to standalone? No.
Test server: replica set of 1, shard of 1. We removed --replSet, but the shard configuration needed a manual update:
db.shards.update({host:"testreplset/test.domain.net"},
    {$set:{host:"test.domain.net"}})
After that, updatedExisting values were no longer returned by mongos, but were visible when connected directly to mongod:
> db.schedule.update({_id:...}, {$set:{lock:true}}, false, true); db.runCommand("getlasterror")
{ "updatedExisting" : true, "n" : 1, "connectionId" : 73, "err" : null, "ok" : 1 }
Solution: re-adding --replSet to the mongod startup line and reverting shard configs. (Bug open with 10gen.)
Sharding
Can increase parallelization of CPU & I/O
Carefully choose a shard key (nontrivial to change)
Must run config servers & mongos
Doesn't ensure high availability
Doesn't help if you're already out of memory
256GB collection max for initial sharding
Rebalancing data across shards
Queries block while servers negotiate final hand-off.
Updating indexes after hand-off can be slow.
Best run off-peak:
mongos> use config
switched to db config
mongos> db.settings.find()
{ "_id" : "balancer", "activeWindow" : { "start" : "23:00", "stop" : "6:00" } }
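Note that the activeWindow above wraps past midnight (23:00 to 6:00). A small sketch of how such a window can be interpreted; parse_hhmm and in_window are illustrative helpers of ours, not MongoDB APIs:

```python
# Interpreting an off-peak window like {"start": "23:00", "stop": "6:00"}.
# When start > stop, the window crosses midnight and must wrap.

def parse_hhmm(s):
    """Convert an 'HH:MM' string to minutes since midnight."""
    h, m = s.split(":")
    return int(h) * 60 + int(m)

def in_window(now, start, stop):
    """True if now ('HH:MM') falls in [start, stop), wrapping past midnight."""
    n, a, b = parse_hhmm(now), parse_hhmm(start), parse_hhmm(stop)
    if a <= b:
        return a <= n < b
    return n >= a or n < b  # window wraps past midnight

print(in_window("23:30", "23:00", "6:00"))  # True: balancer may run
print(in_window("12:00", "23:00", "6:00"))  # False: peak hours, balancer idle
```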
Mongos & replSet primary changes
Application-level errors talking to mongos after an election:
pymongo.errors.AutoReconnect: could not connect to localhost:27020: [Errno 111] Connection refused
pymongo.errors.OperationFailure: database error: error querying server
Mongos errors talking to mongod on original primary:
Tue Apr 2 09:01:05 [conn3288] Socket say send() errno:110 Connection timed out 10.141.131.214:27017
Tue Apr 2 09:01:05 [conn3288] DBException in process: socket exception [SEND_ERROR] for 10.141.131.214:27017
The connection pool is checked lazily; invalid connections can persist for days, depending on load. They can be cleared manually:
mongos> db.adminCommand({connPoolSync:1});
{ "ok" : 1 }
mongos>
Failure handling
Applications must handle fail-over outages:
AutoReconnect & OperationFailure in pymongo
import time
import pymongo.errors

def auto_reconnect(func, *args, **kwargs):
    """Executes func, retrying on AutoReconnect."""
    for _ in range(100):
        try:
            return func(*args, **kwargs)
        except pymongo.errors.AutoReconnect:
            pass
        except pymongo.errors.OperationFailure:
            pass
        time.sleep(0.1)
    raise TimeoutError()
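A self-contained demonstration of the same retry pattern; AutoReconnect here is a stand-in exception class so the snippet runs without pymongo installed (in real code you would catch pymongo.errors.AutoReconnect, as above), and flaky_count is a made-up example operation.

```python
import time

class AutoReconnect(Exception):
    """Stand-in for pymongo.errors.AutoReconnect."""

def auto_reconnect(func, *args, **kwargs):
    """Execute func, retrying briefly while the replica set re-elects."""
    for _ in range(100):
        try:
            return func(*args, **kwargs)
        except AutoReconnect:
            time.sleep(0.01)  # brief pause before retrying
    raise TimeoutError("gave up after 100 attempts")

attempts = {"n": 0}

def flaky_count():
    """Fails twice (as an operation might during an election), then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise AutoReconnect("not master")
    return 42

print(auto_reconnect(flaky_count))  # prints 42 after two retries
```

The cap on retries matters: without it, a genuinely dead cluster would hang the application forever instead of surfacing an error.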
MMS (MongoDB Monitoring Service)
● free; hosted by 10gen
● need to run agent locally
● 10gen's commercial support relies on MMS
Profiling queries [1]
Finding bad queries that are actively running:
$ mongo | tee mongo.log
> db.currentOp()
...
bye
$ grep numYields mongo.log
"numYields" : 0,
"numYields" : 62247,
"numYields" : 0,
...
# Use your favorite viewer to find the op with 62247 yields
Helpful to get the server back to a responsive state:
$ mongo
> db.killOp(10883898)
Profiling queries [2]
Using nscanned to find queries that likely aren't using indexes:
$ grep -P 'nscanned:\d\d' /var/log/mongodb.log

... or in real time:
$ tail -f /var/log/mongodb.log | grep -P 'nscanned:\d\d'
MongoDB also provides the setProfilingLevel() command, which can log all queries to the system.profile collection:
> db.system.profile.find({nscanned:{$gte:10}})
system.profile does incur some performance overhead, though.
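The same nscanned filter can also be run in Python over a log file pulled off the box, avoiding the profiler's overhead; the sample lines and threshold below are made up for illustration.

```python
# Pull high-nscanned slow-query lines out of a mongod log. The regex
# matches the "nscanned:<n>" token that mongod prints for slow queries.
import re

NSCANNED = re.compile(r"\bnscanned:(\d+)")

def high_nscanned(lines, threshold=10):
    """Yield (nscanned, line) for lines that scanned many documents."""
    for line in lines:
        m = NSCANNED.search(line)
        if m and int(m.group(1)) >= threshold:
            yield int(m.group(1)), line

# Made-up sample lines in the slow-query log shape:
sample = [
    "query widgets.items ... nscanned:4 ... 1ms",
    "query widgets.items ... nscanned:52107 ... 834ms",
]
for n, line in high_nscanned(sample):
    print(n)  # prints 52107: the likely unindexed query
```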
Nagios
● plugin uses pymongo
● set up service groups
Ideas for the future
● Better reconnect handling in applications
● Lose the EBS? Ephemeral disk is faster; rely on replication to keep data persistent.
● Intelligent use of mongo profiling (reduce observer effect of setProfilingLevel)
● Use more MMS alerts
● Going to 2.4.x (fast counts, hashed sharding)