Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn
Uploaded by HUJAK (Hrvatska udruga Java korisnika, Croatian Java User Association)
MongoDB Replication
Motivation
Availability & data safety
Read scalability
Helps with backups
Data migration
Delayed members
Oplog tailing (Meteor.js)
https://meteorhacks.com/mongodb-oplog-and-meteor.html
Basics
Terminology
Primary + secondaries
Master + slaves: problematic terms, renamed
Arbiter: votes in elections but holds no data
Limits
50 replica set members (12 before version 2.7.8)
7 voting members
Example
Single instance
$ mkdir 1
$ mongod --dbpath 1 --port 27001 --logpath log1
$ mongo --port 27001
> db.test.insert({ name: "Philipp", city: "Wien" })
> db.test.find()
Stop the instance
Add replication
$ mkdir 2
$ mkdir 3
$ mongod --replSet javantura --dbpath 1 --port 27001 --logpath log1 --oplogSize 20
$ mongod --replSet javantura --dbpath 2 --port 27002 --logpath log2 --oplogSize 20
$ mongod --replSet javantura --dbpath 3 --port 27003 --logpath log3 --oplogSize 20
Connect
$ hostname
$ mongo --port 27001
> db.test.find()
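Configure replication
Run this on the old instance (the one already holding data), otherwise its data is lost
Replace PK-MBP with the host name printed by $ hostname
> rs.initiate()
> rs.status()
> rs.add("PK-MBP:27002")
> rs.add("PK-MBP:27003")
> rs.status()
> db.isMaster()
> db.test.find()
> db.test.insert({ name: "Peter", city: "Steyr" })
> db.test.find()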
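Instead of calling rs.add() once per member, the whole replica set can also be described in one configuration document (`{ _id, members: [{ _id, host }] }`) and passed to rs.initiate(). The following minimal Node.js sketch only builds that document. The helper name buildReplSetConfig is hypothetical, and the host name PK-MBP and the ports are taken from the example above.

```javascript
// Hypothetical helper: builds a replica set configuration document in the
// { _id, members: [{ _id, host }] } shape that rs.initiate() accepts.
function buildReplSetConfig(setName, hostname, ports) {
  return {
    _id: setName, // must match the --replSet name passed to every mongod
    members: ports.map((port, index) => ({
      _id: index, // member ids must be unique within the set
      host: `${hostname}:${port}`,
    })),
  };
}

const cfg = buildReplSetConfig("javantura", "PK-MBP", [27001, 27002, 27003]);
console.log(JSON.stringify(cfg));
// {"_id":"javantura","members":[{"_id":0,"host":"PK-MBP:27001"},{"_id":1,"host":"PK-MBP:27002"},{"_id":2,"host":"PK-MBP:27003"}]}
```

The printed object could then be pasted into `rs.initiate(...)` in the mongo shell on the old instance. That keeps the member list in one place rather than spread over several rs.add() calls.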
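Read from secondaries
$ mongo --port 27002
> db.test.find()
> rs.slaveOk()
> db.test.find()
> db.test.insert({ name: "Dieter", city: "Graz" })
The first find() fails until rs.slaveOk() is called; the insert fails because only the primary accepts writes
rs.slaveOk() only applies to the current connection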
Failover
Kill the primary with [Ctrl]+[C]
Write to the new primary:
> rs.status()
> db.test.insert({ name: "Dieter", city: "Graz" })
> db.test.find()
Restart the old primary
$ mongod --replSet javantura --dbpath 1 --port 27001 --logpath log1 --oplogSize 20
$ mongo --port 27001
> rs.status()
> rs.slaveOk()
> db.test.find()
Inner details
The oplog is the capped collection oplog.rs in the local database
> use local
> show collections
me                 0.000MB / 0.008MB
oplog.rs           0.000MB / 20.000MB
replset.minvalid   0.000MB / 0.008MB
slaves             0.000MB / 0.008MB
startup_log        0.003MB / 10.000MB
system.indexes     0.001MB / 0.008MB
system.replset     0.000MB / 0.008MB
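A capped collection has a fixed size and keeps documents in insertion order. Once it is full, the oldest documents are overwritten. The sketch below is a simplified in-memory model of that behaviour, assuming abstract size units rather than real BSON sizes. The class name CappedCollection is hypothetical.

```javascript
// Simplified model of a capped collection such as local.oplog.rs: a fixed
// size budget, insertion order preserved, oldest entries evicted first.
// Sizes are abstract units here, not real BSON sizes (an assumption).
class CappedCollection {
  constructor(maxSize) {
    this.maxSize = maxSize;
    this.docs = [];
    this.usedSize = 0;
  }

  insert(doc, size) {
    if (size > this.maxSize) throw new Error("document larger than the collection");
    // Evict the oldest documents until the new one fits.
    while (this.usedSize + size > this.maxSize) {
      const evicted = this.docs.shift();
      this.usedSize -= evicted.size;
    }
    this.docs.push({ doc, size });
    this.usedSize += size;
  }

  find() {
    return this.docs.map((entry) => entry.doc); // natural (insertion) order
  }
}

const oplog = new CappedCollection(3);
for (const ts of [1, 2, 3, 4, 5]) oplog.insert({ ts }, 1);
console.log(oplog.find().map((d) => d.ts)); // [ 3, 4, 5 ]
```

This is why the oplog size matters. A secondary that falls further behind than the oldest entry still in the oplog can no longer catch up from it and needs a full resync.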
Inner details
> db.oplog.rs.find()
{
  "h": NumberLong("-265486071808715859"),
  "ns": "test.test",
  "o": { "_id": ObjectId("541a8ed285ea5f8ae059d530"), "name": "Dieter", "city": "Graz" },
  "op": "i",
  "ts": Timestamp(1411026642, 1),
  "v": 2
}
...
op: operation type (i = insert), ns: namespace (database.collection), o: the document, ts: timestamp
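A secondary replicates by replaying such entries in order. The following is a simplified sketch of that replay against an in-memory collection keyed by `_id`. It is an illustration under assumptions, not MongoDB's implementation: only the op codes "i", "u" and "d" are modelled, updates are assumed to carry the target `_id` in `o2` and either a `$set` or a full replacement in `o`, and the function name applyOplogEntry is hypothetical.

```javascript
// Simplified replay of oplog entries against a Map keyed by _id.
// Only "i" (insert), "u" (update via o2 selector) and "d" (delete) are
// modelled; fields such as ts, h and v are ignored in this sketch.
function applyOplogEntry(collection, entry) {
  switch (entry.op) {
    case "i": // insert: store the full document; replaying it again is harmless
      collection.set(entry.o._id, { ...entry.o });
      break;
    case "u": { // update: apply $set, or replace the whole document
      const id = entry.o2._id;
      const current = collection.get(id);
      if (current === undefined) break; // nothing to update
      if (entry.o.$set) {
        collection.set(id, { ...current, ...entry.o.$set });
      } else {
        collection.set(id, { ...entry.o, _id: id });
      }
      break;
    }
    case "d": // delete by _id
      collection.delete(entry.o._id);
      break;
    default:
      throw new Error(`unsupported op: ${entry.op}`);
  }
}

const collection = new Map();
const insert = { op: "i", ns: "test.test", o: { _id: "541a8ed285ea5f8ae059d530", name: "Dieter", city: "Graz" } };
applyOplogEntry(collection, insert);
applyOplogEntry(collection, insert); // replaying the same entry gives the same result
applyOplogEntry(collection, { op: "u", ns: "test.test", o2: { _id: "541a8ed285ea5f8ae059d530" }, o: { $set: { city: "Wien" } } });
console.log(collection.get("541a8ed285ea5f8ae059d530").city); // Wien
```

Replaying the insert twice leaves one document, which illustrates why oplog operations are designed to be idempotent.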
Election
Heartbeat
Sent every 2 s
Primary unreachable for 10 s: election
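The two numbers on the heartbeat slide define a simple failure detector. Here is a minimal sketch of it, assuming times in milliseconds. The constant and function names are hypothetical and do not correspond to MongoDB's internals.

```javascript
// Simplified failure detector using the numbers above: heartbeats every 2 s,
// and a primary that has not answered for 10 s is considered unreachable.
const HEARTBEAT_INTERVAL_MS = 2000;
const ELECTION_TIMEOUT_MS = 10000;

function isReachable(lastHeartbeatMs, nowMs) {
  return nowMs - lastHeartbeatMs < ELECTION_TIMEOUT_MS;
}

// A secondary only considers calling an election once the primary looks dead.
function shouldCallElection(primaryLastHeartbeatMs, nowMs) {
  return !isReachable(primaryLastHeartbeatMs, nowMs);
}

// Number of heartbeat intervals that fit into the timeout.
const missedHeartbeats = ELECTION_TIMEOUT_MS / HEARTBEAT_INTERVAL_MS; // 5
console.log(shouldCallElection(0, 9000), shouldCallElection(0, 10000)); // false true
```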
Election rules
1. Priority
2. Optime
3. Connections
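The three election rules can be read as a filter plus a sort order. The sketch below is one possible interpretation, not MongoDB's actual election code. It assumes that a priority 0 member is never eligible and that "connections" means a candidate must reach a majority of voting members (itself included). It also assumes that optime is a plain number instead of a Timestamp. The name pickCandidate is hypothetical.

```javascript
// Simplified model of the election rules, applied in order:
// 1. priority: priority 0 can never become primary, higher wins;
// 2. optime: among equal priorities, the most recent optime wins;
// 3. connections: a candidate must reach a majority of the voting members.
function pickCandidate(members, totalVotingMembers) {
  const majority = Math.floor(totalVotingMembers / 2) + 1;
  const eligible = members.filter(
    (m) => m.priority > 0 && m.reachableVoters >= majority // reachableVoters includes itself
  );
  eligible.sort((a, b) => b.priority - a.priority || b.optime - a.optime);
  return eligible.length > 0 ? eligible[0].name : null;
}

const members = [
  { name: "PK-MBP:27001", priority: 0, optime: 105, reachableVoters: 3 },
  { name: "PK-MBP:27002", priority: 1, optime: 105, reachableVoters: 3 },
  { name: "PK-MBP:27003", priority: 1, optime: 100, reachableVoters: 3 },
];
console.log(pickCandidate(members, 3)); // PK-MBP:27002
```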
Priority
> cfg = rs.conf()
> cfg.members[0].priority = 0
> cfg.members[1].priority = 1
> cfg.members[2].priority = 2
> rs.reconfig(cfg)
Run rs.reconfig() on the primary; a member with priority 0 can never become primary
Optime
Connections
Election
A candidate node asks the other members for their votes
Other members can veto the candidate
Election
Each member votes yes for at most one node within 30 s
Yes votes from a majority elect the new primary
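The voting rules above can be sketched as follows. Each voter remembers whom it voted for and refuses a different candidate for 30 s. A candidate wins with yes votes from a strict majority of the voting members (at most 7, so 4 votes in the largest case; arbiters count as voters). The class and function names are hypothetical, and vetoes are left out for brevity.

```javascript
// Simplified vote counting: one yes vote per voter per 30 s window,
// and a strict majority of all voting members is needed to win.
const VOTE_LOCK_MS = 30000;
const MAX_VOTING_MEMBERS = 7;

class Voter {
  constructor(name) {
    this.name = name;
    this.lastVote = null; // { candidate, atMs }
  }

  requestVote(candidate, nowMs) {
    const locked = this.lastVote !== null && nowMs - this.lastVote.atMs < VOTE_LOCK_MS;
    if (locked && this.lastVote.candidate !== candidate) return false; // already voted for someone else
    this.lastVote = { candidate, atMs: nowMs };
    return true;
  }
}

function runElection(candidate, voters, nowMs) {
  if (voters.length > MAX_VOTING_MEMBERS) throw new Error("too many voting members");
  const yes = voters.filter((v) => v.requestVote(candidate, nowMs)).length;
  return yes >= Math.floor(voters.length / 2) + 1;
}

const voters = ["a", "b", "c"].map((n) => new Voter(n));
console.log(runElection("a", voters, 0));     // true: 3 of 3 votes
console.log(runElection("b", voters, 5000));  // false: everyone already voted for "a"
console.log(runElection("b", voters, 40000)); // true: the 30 s lock has expired
```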
Issues
CAP
Choose availability or consistency
Partition tolerance is a prerequisite for distributed systems
"The network is reliable": http://aphyr.com/posts/288-the-network-is-reliable
Rollback
When the old primary rejoins the replica set, it rolls back the changes that were never replicated
Rollback file
rollback/ directory in the data folder
File name: <database>.<collection>.<timestamp>.bson
Election time
Elections can at times take 5 to 7 minutes
http://www.tokutek.com/2014/07/explaining-ark-part-2-how-elections-and-failover-currently-work/
No synchronization during the election
The old primary's last changes may have reached only a single node
If that node does not become the new primary, those changes are rolled back
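The rollback scenario can be modelled with two oplogs. The old primary wrote entries 1 to 4, but entry 4 reached only one secondary. Another node, which had only entries 1 to 3, was elected and then wrote entry 5. The sketch finds the last entry both oplogs share and treats everything after it on the old primary as rolled back. It assumes that entries can be compared by their `ts` and `h` fields alone, and the function name is hypothetical.

```javascript
// Simplified rollback model: oplogs are arrays ordered by ts. Everything the
// old primary has after the last entry it shares with the new primary is
// rolled back. Comparing entries by ts and h only is an assumption.
function findRollbackEntries(oldPrimaryOplog, newPrimaryOplog) {
  const shared = new Set(newPrimaryOplog.map((e) => `${e.ts}:${e.h}`));
  let commonIndex = -1;
  for (let i = oldPrimaryOplog.length - 1; i >= 0; i--) {
    const e = oldPrimaryOplog[i];
    if (shared.has(`${e.ts}:${e.h}`)) {
      commonIndex = i; // last common point of both histories
      break;
    }
  }
  return oldPrimaryOplog.slice(commonIndex + 1);
}

const oldPrimary = [1, 2, 3, 4].map((ts) => ({ ts, h: ts * 10 }));
const newPrimary = [1, 2, 3, 5].map((ts) => ({ ts, h: ts * 10 }));
console.log(findRollbackEntries(oldPrimary, newPrimary).map((e) => e.ts)); // [ 4 ]
```

In MongoDB, the documents affected by such rolled-back entries end up in the rollback/ directory rather than being silently discarded.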
Remember
Replication is asynchronous
Multiple primaries
Unlikely but possible
Bugs: https://jira.mongodb.org/browse/SERVER-9765
Test script with no replies: https://groups.google.com/forum/#!topic/mongodb-dev/-mH6BOYyzeI
Kyle Kingsbury (@aphyr): Call Me Maybe
http://aphyr.com/tags/jepsen
PostgreSQL, Redis, MongoDB, Riak, ZooKeeper, RabbitMQ, etcd + Consul, Elasticsearch
http://aphyr.com/posts/284-call-me-maybe-mongodb
05/2013, version 2.4
Up to 42% of data lost
Data written to the old primary is rolled back
WriteConcern
Configure durability vs. performance
https://github.com/mongodb/mongo-java-driver/blob/master/src/main/com/mongodb/WriteConcern.java
WriteConcern.UNACKNOWLEDGED
w=0, j=0
Fire and forget
Default until 11/2012
WriteConcern.ACKNOWLEDGED
w=1, j=0
Current default
Operation completed successfully in memory
WriteConcern.JOURNALED
w=1, j=1
Operation written to the journal file
Single-server durability since version 1.8
WriteConcern.FSYNCED
w=1, fsync=true
Operation written to disk
WriteConcern.REPLICA_ACKNOWLEDGED
w=2, j=0
Acknowledged by primary and at least one secondary
w is the number of servers (primary included) that must acknowledge the write
WriteConcern.MAJORITY
w=majority, j=0
Acknowledgement by the majority of nodes
Setting wtimeout is recommended; otherwise the write can block indefinitely when no majority is available
WriteConcern.MAJORITY
Nearly no data lost, but high overhead
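The write concern levels above differ only in what the driver waits for before reporting success. The sketch below models that decision with three outcomes: acknowledged, still waiting, or timed out after wtimeout. The state fields (appliedOn, journaled, votingMembers) and the function name are hypothetical. This is not the Java driver's WriteConcern class. As in MongoDB, hitting wtimeout only stops the wait and does not undo the write.

```javascript
// Simplified model of when a write with a given write concern is acknowledged.
// state.appliedOn: members that applied the write (primary included)
// state.journaled: whether the primary has written it to the journal
// state.votingMembers: replica set size, used to compute "majority"
function writeConcernStatus({ w, j = false, wtimeout = 0 }, state, waitedMs) {
  if (w === 0) return "unacknowledged"; // fire and forget
  const required = w === "majority" ? Math.floor(state.votingMembers / 2) + 1 : w;
  const satisfied = state.appliedOn >= required && (!j || state.journaled);
  if (satisfied) return "acknowledged";
  // wtimeout only stops waiting; the write itself is not undone.
  if (wtimeout > 0 && waitedMs >= wtimeout) return "timeout";
  return "waiting";
}

const state = { appliedOn: 2, journaled: false, votingMembers: 5 };
console.log(writeConcernStatus({ w: 1 }, state, 0));                             // acknowledged
console.log(writeConcernStatus({ w: 1, j: true }, state, 0));                    // waiting
console.log(writeConcernStatus({ w: "majority", wtimeout: 5000 }, state, 6000)); // timeout
```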
Write concern performance
https://blog.serverdensity.com/mongodb-on-google-compute-engine-tips-and-benchmarks/
3 x 1,000 inserts on GCE
Local 10GB system disk
Dedicated 200GB disk
Dedicated 200GB disk for data and journal
n1-standard-2
n1-highmem-8
Backup Slides
Oplog
Replication via logs
MongoDB: Operations log (Oplog)
MySQL: Binary log (Binlog)
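Naive approach: transmit the original query
Statement-Based Replication (SBR)
DELETE FROM test.table WHERE quantity > 20 LIMIT 1
db.collection.remove({ quantity: { $gt: 20 } }, true) // justOne: true
Ambiguous: without a defined order, each server may remove a different matching row or document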
Unambiguous representation
Row-Based Replication (RBR): Oplog
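The difference between the two approaches shows up as soon as two servers hold the same documents in a different physical order. Below is a minimal, self-contained illustration with plain arrays standing in for collections; the helper names are hypothetical. Replaying the statement removes a different document on each side. Replaying the concrete effect (delete `_id` 1), which is what an oplog entry records, gives the same result everywhere.

```javascript
// Statement-based: "remove the first document matching the predicate".
function removeFirstMatching(docs, predicate) {
  const index = docs.findIndex(predicate);
  return index === -1 ? docs : docs.filter((_, i) => i !== index);
}

// Row-based: "remove exactly the document with this _id".
function removeById(docs, id) {
  return docs.filter((d) => d._id !== id);
}

const primary = [{ _id: 1, quantity: 30 }, { _id: 2, quantity: 40 }];
const secondary = [{ _id: 2, quantity: 40 }, { _id: 1, quantity: 30 }]; // different order
const overTwenty = (d) => d.quantity > 20;
const ids = (docs) => docs.map((d) => d._id).sort().join(",");

console.log(ids(removeFirstMatching(primary, overTwenty)));   // 2
console.log(ids(removeFirstMatching(secondary, overTwenty))); // 1  (diverged)
console.log(ids(removeById(secondary, 1)));                   // 2  (same as the primary)
```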
MongoDB
Asynchronous replication
Secondaries can get the Oplog from...
their primary
another secondary with more recent data
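Because a secondary may pull the oplog from another secondary, it needs a rule for picking its sync source. The sketch below picks the reachable member with the lowest ping whose data is newer than its own. Preferring the lowest ping is an assumption made for this illustration, not a description of MongoDB's exact algorithm, and the field and function names are hypothetical.

```javascript
// Simplified sync source choice (chained replication): any reachable member,
// primary or secondary, whose optime is ahead of ours; closest one first.
function chooseSyncSource(local, members) {
  const candidates = members.filter(
    (m) => m.name !== local.name && m.reachable && m.optime > local.optime
  );
  candidates.sort((a, b) => a.pingMs - b.pingMs);
  return candidates.length > 0 ? candidates[0].name : null;
}

const memberC = { name: "C", optime: 100 };
const members = [
  { name: "A", reachable: true, optime: 120, pingMs: 80 }, // primary, far away
  { name: "B", reachable: true, optime: 110, pingMs: 5 },  // secondary, nearby
  memberC,
];
console.log(chooseSyncSource(memberC, members)); // B
```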
Oplog size (default)
32bit: 48MB
64bit OS X: 183MB
64bit *nix, Windows: 5% of free disk space, at least 1GB and at most 50GB
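The default oplog size rules on this slide can be written as a small function. The sketch below uses the values from the slide, in MB. It assumes 1 GB = 1024 MB, and the platform labels ("32bit", "osx64", anything else) are hypothetical.

```javascript
// Default oplog size per the slide (MB): 48 MB on 32-bit, 183 MB on 64-bit
// OS X, otherwise 5% of free disk space clamped to [1 GB, 50 GB].
function defaultOplogSizeMB(platform, freeDiskMB) {
  if (platform === "32bit") return 48;
  if (platform === "osx64") return 183;
  const fivePercent = freeDiskMB / 20; // 5% = 1/20
  return Math.min(Math.max(fivePercent, 1024), 50 * 1024);
}

console.log(defaultOplogSizeMB("linux64", 10 * 1024));   // 1024  (5% = 512 MB, raised to the minimum)
console.log(defaultOplogSizeMB("linux64", 200 * 1024));  // 10240 (5% of 200 GB)
console.log(defaultOplogSizeMB("linux64", 2048 * 1024)); // 51200 (capped at 50 GB)
```

The demo above overrides this default with `--oplogSize 20`, which is why oplog.rs shows up as a 20MB collection.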