Confidential
MongoDB 101 (session one)
Art van Scheppingen
Senior Support Engineer - Severalnines AB
Who am I?
☐ Senior Support Engineer at Severalnines AB
☐ Worked with MySQL for over 16 years
☐ Has been in a DBA environment for 6 years
☐ Polyglot persistence proponent
☐ Organizer of Polyglot Persistence meetups (Severalnines)
  ☐ Amsterdam, Berlin, Paris, London, Stockholm and Dublin
Who is Severalnines?
☐ Database automation / orchestration
☐ Software to deploy, monitor, manage and scale
☐ Support for MySQL (all flavours), MongoDB and PostgreSQL
☐ Main product: ClusterControl
Orchestration System: ClusterControl
☐ ClusterControl
  ☐ http://severalnines.com/getting-started
  ☐ Deploy
  ☐ Monitor
  ☐ Manage
  ☐ Scale
☐ Community edition
MongoDB support
Logistics
Morning / Afternoon sessions
☐ Morning session - Art van Scheppingen
  ☐ Basics
  ☐ Cluster/Schema Design patterns
☐ Afternoon session - David Murphy / Kim
  ☐ Sharding
  ☐ Engines
  ☐ Common Operations issues
Agenda for this morning
☐ Part 1: Basics (9:00 - 10:30)
  ☐ MongoDB Primer
  ☐ Running mongod
  ☐ Mongo command basics
  ☐ CRUD: create, read, update, delete
☐ Part 3: Essentials (11:00 - 12:00)
  ☐ Aggregate
  ☐ Import / export data
  ☐ Backup / restore
  ☐ Schema design patterns
Prerequisites
☐ MongoDB community locally installed
  ☐ https://www.mongodb.com/download-center
☐ Download the zip code data set:
  ☐ http://media.mongodb.org/zips.json
☐ You have to know and understand JSON data structures
Prerequisites for this afternoon
☐ Clone the following GitHub repo:
  ☐ https://github.com/dbmurphy/MongoDB32Labs
☐ If you are not running on Linux: run a VM with Linux
☐ Recommended install for VMs:
  ☐ Install VirtualBox
    ☐ https://www.virtualbox.org/wiki/Downloads
  ☐ Download Fedora or Ubuntu
    ☐ https://getfedora.org/en/cloud/download/
    ☐ https://cloud-images.ubuntu.com/xenial/current/
Windows users
☐ Windows installations require Windows PowerShell
☐ Set path: $env:Path += ";c:\Program Files\MongoDB\Server\3.2\bin\"
☐ The --fork parameter does not work in the Windows binary
  ☐ Solution: open multiple command / shell windows and leave MongoDB running in the foreground
Basics: MongoDB
What is MongoDB?
☐ Document data store
  ☐ Not a key/value store!
  ☐ Data stored in JSON
☐ Philosophy
  ☐ Flexibility
  ☐ Scalability
  ☐ Geo distributed
  ☐ Strong consistency
☐ Originally data was only stored as BSON on disk (MMAP)
  ☐ BSON is binary JSON
  ☐ MongoDB v3.0 allows other storage engines
☐ No-SQL?
  ☐ JavaScript based query language
  ☐ Similar feature set in MongoDB queries
What are MongoDB advantages?
☐ Doesn’t require a lot of memory
  ☐ No preallocated buffer pools (except for WiredTiger)
  ☐ Makes use of the filesystem cache to cache data
  ☐ Indexes are loaded in memory
☐ Allows high levels of concurrency
☐ Strong consistency
☐ Easy for scaling reads
  ☐ Scale replicaSets
☐ Easy for scaling writes
  ☐ Scale shards by adding replicaSets
What are MongoDB disadvantages?
☐ It is a different approach to problems
  ☐ Different solutions
☐ Not ACID compliant
  ☐ Atomicity only at the document level (locking is per collection in MMAP, per document in WiredTiger)
  ☐ But there are transaction-like semantics
Terminology
☐ Database
  ☐ Contains collections
☐ Collections
  ☐ Collection of documents (think of a table)
☐ Document
  ☐ BSON document (may contain links to docs in other collections)
  ☐ BSON is a binary representation of JSON
  ☐ Document size is limited to 16MB (megabytes)
☐ Fields
  ☐ The properties of a BSON object (think of columns)
Basics: document example
{
"_id" : ObjectId("57e171765ffbf76ca639bd65"),
"foo" : "bar",
"counter" : NumberLong(10010101),
"array" : [
"one",
"two",
"three"
],
"subbranch" : {
"another" : "json",
"object" : "in here"
}
}
Basics: fields
☐ Field names are strings
☐ Field names may not start with the dollar sign character ($)
  ☐ Reserved for (matching) functions and operators
☐ Field names may not contain dot characters (.)
  ☐ Dot notation is used to access arrays and embedded documents
☐ Field names may not contain a null character
☐ Field name “_id” is reserved
Basics: accessing array and embedded fields
{
"_id" : ObjectId("57e171765ffbf76ca639bd65"),
"foo" : "bar",
"counter" : NumberLong(10010101),
"array" : [
"one",
"two",
"three"
],
"subbranch" : {
"another" : "json",
"object" : "in here"
}
}
"array.2"
"subbranch.object"
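The two paths on this slide ("array.2" and "subbranch.object") use dot notation to reach into the example document. As a rough illustration of how such a path is walked, here is a minimal plain-JavaScript sketch. `resolvePath` is a hypothetical helper written for this example, not a MongoDB API, and `ObjectId` / `NumberLong` are replaced by plain values so the snippet runs without a server.

```javascript
// Plain-JavaScript stand-in for the example document (no MongoDB types).
const doc = {
  _id: "57e171765ffbf76ca639bd65",
  foo: "bar",
  counter: 10010101,
  array: ["one", "two", "three"],
  subbranch: { another: "json", object: "in here" }
};

// Hypothetical helper: walk the document one path segment at a time.
// Array positions are zero-based, so "array.2" is the third element.
function resolvePath(document, path) {
  return path.split(".").reduce(
    (value, key) => (value === undefined || value === null ? undefined : value[key]),
    document
  );
}

console.log(resolvePath(doc, "array.2"));          // three
console.log(resolvePath(doc, "subbranch.object")); // in here
```

In the mongo shell the same paths are used as quoted field names in queries and updates.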
Basics: single server
Basics: Running mongod
☐ Running mongod in the foreground
mongod
mongod --port <port> --bind_ip <hostname> --dbpath ~/data/db
☐ Running mongod in the background (fork)
mongod --fork --logpath <logfile>
Basics: Checkpointing
☐ MongoDB checkpoints every 60 seconds (both MMAP and WiredTiger)
☐ Between checkpoints all modifications are written to the journal (every 100ms)
Exercise 1: run mongod on your laptop
☐ Create the data directory and run mongod (Linux/Mac):
mkdir plam101
mongod --dbpath plam101 --logpath mongo-101.log --fork
☐ Create the data directory and run mongod (Windows):
mkdir plam101
mongod --dbpath plam101 --logpath mongo-101.log
Exercise 1: connect to mongod
☐ Verify that mongod is running (Linux/MacOS):
ps ax | grep mongod
tail mongo-101.log
mongo
☐ Verify that mongod is running (Windows):
Get-Process mongod
Get-Content -Path mongo-101.log
mongo
Exercise 1: directory layout
$ ls -la plam101/
total 163848
drwxr-xr-x 8 youruser staff 272 Sep 30 11:48 .
drwxr-xr-x 5 youruser staff 170 Sep 30 11:48 ..
drwxr-xr-x 2 youruser staff 68 Sep 30 11:48 _tmp
drwxr-xr-x 2 youruser staff 68 Sep 30 11:48 journal
-rw------- 1 youruser staff 67108864 Sep 30 11:48 local.0
-rw------- 1 youruser staff 16777216 Sep 30 11:48 local.ns
-rw-r--r-- 1 youruser staff 0 Sep 30 11:48 mongod.lock
-rw-r--r-- 1 youruser staff 69 Sep 30 11:48 storage.bson
Creating databases
☐ MongoDB databases are created implicitly when you switch to a nonexistent database and insert data
> use new_database
switched to db new_database
> show databases
local 0.000GB
> db.somecollection.insert({"foo": "bar"})
WriteResult({ "nInserted" : 1 })
> show databases
local 0.000GB
new_database 0.000GB
Creating databases: directory layout
$ ls -la plam101/
total 163848
drwxr-xr-x 8 youruser staff 272 Sep 30 11:48 .
drwxr-xr-x 5 youruser staff 170 Sep 30 11:48 ..
drwxr-xr-x 2 youruser staff 68 Sep 30 11:48 _tmp
drwxr-xr-x 2 youruser staff 68 Sep 30 11:48 journal
-rw------- 1 youruser staff 67108864 Sep 30 11:48 local.0
-rw------- 1 youruser staff 16777216 Sep 30 11:48 local.ns
-rw-r--r-- 1 youruser staff 0 Sep 30 11:48 mongod.lock
-rw-r--r-- 1 youruser staff 69 Sep 30 11:48 storage.bson
-rw------- 1 youruser staff 67108864 Sep 30 11:52 new_database.0
-rw------- 1 youruser staff 16777216 Sep 30 11:52 new_database.ns
Dropping databases
☐ To drop a database, first switch to the database you want to drop; only then can you drop it
> use new_database
switched to db new_database
> db.dropDatabase()
{ "dropped" : "new_database", "ok" : 1 }
Basics: replicaSets
Basics: replicaSet
☐ Replication is transported through the “oplog”
☐ The oplog is a special collection in a replicaSet
  ☐ All transactions are stored in the oplog (except for local db)
  ☐ Oplog resides inside the local database
  ☐ Limited in size (on disk)
  ☐ Sliding window of transactions
  ☐ Purging of transactions happens via FIFO
☐ Oplog durability is one of the most important metrics
  ☐ Select first and last transaction from the oplog: durability in sec.
☐ All nodes send/receive heartbeats
Basics: initial sync
☐ Adding a new secondary:
  ☐ Node gets added to the cluster
  ☐ Cluster will check how advanced the new secondary is (last executed transaction)
  ☐ If secondary is too far behind an initial sync is executed
    ☐ Copy document by document
☐ Kickstarting / seeding
  ☐ Make a full (binary) copy of the primary to the secondary
  ☐ Add to the cluster
  ☐ Only last transactions from the oplog will be sent
Exercise 2: run a replicaSet
☐ Kill previous daemon
killall mongod
☐ Create the data directories and run mongod:
mkdir plam101-rs1 plam101-rs2 plam101-rs3
mongod --dbpath plam101-rs1 --logpath mongo-101-rs1.log --port 27001 --replSet myrs --fork
mongod --dbpath plam101-rs2 --logpath mongo-101-rs2.log --port 27002 --replSet myrs --fork
mongod --dbpath plam101-rs3 --logpath mongo-101-rs3.log --port 27003 --replSet myrs --fork
Exercise 2: initiate the replicaSet
☐ Connect to the first node
$ mongo --port 27001
connecting to: 127.0.0.1:27001/test
>
☐ Initiate the replicaSet:
> rs.initiate()
{
 "info2" : "no configuration explicitly specified -- making one",
 "me" : "<yourhost>.local:27001",
 "ok" : 1
}
myrs:SECONDARY>
myrs:PRIMARY> rs.status()
Exercise 2: add new members
☐ Connect to the first node
$ mongo --port 27001
myrs:PRIMARY>
☐ Add the other members:
myrs:PRIMARY> rs.add("<yourhost>.local:27002")
{ "ok" : 1 }
myrs:PRIMARY> rs.add("<yourhost>.local:27003")
{ "ok" : 1 }
Exercise 2: Watch the oplog grow
☐ Run rs.printReplicationInfo()
myrs:PRIMARY> rs.printReplicationInfo()
☐ Insert some big documents: (should take a few minutes)
myrs:PRIMARY> var doc = {"foo": "bar"}
myrs:PRIMARY> for (i = 0; i < 10000; i++) { doc["foo"] += i; db.inserttest.insert(doc); }
WriteResult({ "nInserted" : 1 })
☐ Run rs.printReplicationInfo() and spot the difference
myrs:PRIMARY> rs.printReplicationInfo()
Exercise 2: Maybe adjust the oplog size?
myrs:PRIMARY> rs.printReplicationInfo()
configured oplog size: 192MB
log length start to end: 183secs (0.05hrs)
oplog first event time: Tue Sep 27 2016 16:19:36 GMT+0200 (CEST)
oplog last event time: Tue Sep 27 2016 16:22:39 GMT+0200 (CEST)
now: Tue Sep 27 2016 16:25:22 GMT+0200 (CEST)
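The "log length start to end" value in this output is simply the time between the first and the last oplog event. The plain-JavaScript sketch below redoes that arithmetic with the timestamps shown above. `needsInitialSync` is a hypothetical helper and a deliberate simplification: it assumes a secondary can only catch up from the oplog if its downtime is shorter than the window.

```javascript
// Timestamps copied from the rs.printReplicationInfo() output (CEST = UTC+2).
const firstEvent = new Date("2016-09-27T16:19:36+02:00");
const lastEvent = new Date("2016-09-27T16:22:39+02:00");

// Oplog window: the time span covered by the oplog, in seconds.
const windowSecs = (lastEvent - firstEvent) / 1000;

// Hypothetical, simplified check: a secondary that was away longer than
// the window cannot replay the oplog and needs a full (initial) sync.
function needsInitialSync(downtimeSecs, oplogWindowSecs) {
  return downtimeSecs > oplogWindowSecs;
}

console.log(windowSecs);                        // 183
console.log(needsInitialSync(600, windowSecs)); // true
console.log(needsInitialSync(60, windowSecs));  // false
```

A window of only 183 seconds leaves very little room for maintenance on a secondary, which is the reason to consider a larger oplog.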
Demo: oplog too small, full sync necessary
☐ Stop one secondary
☐ Insert some big documents: (should take a few minutes)
myrs:PRIMARY> var doc = {"foo": "bar"}
myrs:PRIMARY> for (i = 0; i < 10000; i++) { doc["foo"] += i; db.inserttest.insert(doc); }
WriteResult({ "nInserted" : 1 })
☐ Start secondary again and watch log file
Basics: High Availability
Basics: node failure
☐ After a primary is lost (heartbeat timeout) no write operations can happen
☐ Read operations can still happen
☐ Remaining nodes start electing a new primary
Basics: election voting
☐ All remaining nodes vote for a new primary
☐ Priority: higher values make a node more eligible to become a primary
☐ Votes: allows a node to vote for a new primary
☐ Only nodes with priority and voting power can vote
☐ You can set the priority (numeric) per node
☐ You can set the voting (on and off) per node
☐ Up to 7 nodes can vote
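To make the priority and votes settings concrete, here is a small plain-JavaScript sketch over a made-up member list, shaped loosely like the `members` array of a replica set configuration. The host names, the `electionSummary` helper and the `MAX_VOTERS` constant are assumptions for illustration; the sketch only counts voters and computes the majority a candidate needs, it does not simulate an election.

```javascript
// Hypothetical member list; host names are placeholders.
const members = [
  { _id: 0, host: "node1:27001", priority: 2, votes: 1 },
  { _id: 1, host: "node2:27002", priority: 1, votes: 1 },
  { _id: 2, host: "node3:27003", priority: 1, votes: 1 },
  { _id: 3, host: "node4:27004", priority: 0, votes: 0 } // no vote, never primary
];

const MAX_VOTERS = 7; // limit stated on the slide

function electionSummary(memberList) {
  const voters = memberList.filter(m => m.votes > 0);
  if (voters.length > MAX_VOTERS) {
    throw new Error("at most " + MAX_VOTERS + " members can vote");
  }
  return {
    voters: voters.length,
    // A candidate needs more than half of all votes to become primary.
    majority: Math.floor(voters.length / 2) + 1,
    // Only members with a priority above 0 can be elected.
    electable: memberList.filter(m => m.priority > 0).map(m => m.host)
  };
}

console.log(electionSummary(members));
```

With three voters the majority is two, so the set can still elect a primary after losing one voting node.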
Basics: node recovery
Basics: durability
☐ Replication happens, like MySQL replication, asynchronously
  ☐ Eventual consistency
☐ Write concern
  ☐ Wait for confirmation from secondary nodes
    ☐ numeric, majority or <tag>
  ☐ Wait for write to journal
Basics: understanding eventual consistency
Basics: read from secondary
Basics: eventual consistency and secondary
Basics: arbiter node
☐ The arbiter node will not store any data
☐ The arbiter node will confirm writes
☐ The arbiter node will take part in voting for a new primary
Demo: node recovery on a replicaSet
☐ 3 node cluster (node 3 is arbiter)
☐ Data gets inserted on Primary (node 1)
☐ Secondary (node 2) fails
☐ Some time passes
☐ Primary (node 1) fails and node 2 comes back up
☐ Node 2 becomes primary
☐ Inserting data into new primary (node 2)
☐ Node 1 comes up again and realizes it is no longer primary
☐ Fetches oplog from the new primary which is more advanced
☐ Node 1 performs a rollback
Basics: Sharding
Basics: sharded cluster
Demo: sharding
Running commands in the mongo shell
☐ MongoDB built in commands
  ☐ db.help() is your friend
☐ Collection built in commands
  ☐ db.<collection>.help()
☐ For replicaSets and shards the helpers start with rs and sh
  ☐ rs.help()
  ☐ sh.help()
Scripting in the mongo shell
☐ Mongo shell runs JavaScript
☐ Create variables, functions, etc
☐ You can iterate over cursors
cursor = db.collection.find();
while ( cursor.hasNext() ) {
  printjson( cursor.next() );
}
cursor = db.collection.find();
while ( cursor.hasNext() ) {
  doc = cursor.next();
  doc["newfield"] = "something";
  db.collection.save(doc);
}
Recap
☐ MongoDB basics
  ☐ MongoDB terminology
  ☐ replicaSet and replication
  ☐ Durability and eventual consistency
  ☐ Sharding
☐ Running MongoDB
  ☐ Single server
  ☐ ReplicaSet
Basics: CRUD
Create: inserting data
☐ As you have seen, inserting data is very simple:
db.<collection>.insert(document, { [writeConcern], [ordered] })
db.<collection>.insertOne(document, { [writeConcern] }) (3.2)
db.<collection>.insertMany(document, { [writeConcern], [ordered] }) (3.2)
☐ Comparable to SQL
INSERT INTO <collection> VALUES (document);
☐ Ordered defaults to true
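The ordered option matters once a batch contains a document that fails, for example on a duplicate _id. The following plain-JavaScript sketch imitates that behaviour with an in-memory Map standing in for a collection. `simulateInsert` is a hypothetical function for illustration only and its result object is merely modelled on the shell output; it is not how the server is implemented.

```javascript
// In-memory stand-in for a collection: a Map keyed by _id (not a MongoDB API).
function simulateInsert(collection, documents, ordered = true) {
  const result = { nInserted: 0, writeErrors: [] };
  for (let index = 0; index < documents.length; index++) {
    const doc = documents[index];
    if (collection.has(doc._id)) {
      result.writeErrors.push({ index: index, errmsg: "duplicate key: " + doc._id });
      if (ordered) break; // ordered: stop at the first error
      continue;           // unordered: skip the failed document and carry on
    }
    collection.set(doc._id, doc);
    result.nInserted++;
  }
  return result;
}

// The second document repeats the _id of the first one.
const batch = [{ _id: 1 }, { _id: 1 }, { _id: 2 }];

console.log(simulateInsert(new Map(), batch, true).nInserted);  // 1
console.log(simulateInsert(new Map(), batch, false).nInserted); // 2
```

With ordered set to true the document with _id 2 is never attempted; with ordered set to false only the duplicate is rejected.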
☐ Every document must have an identifier (_id)
☐ If no identifier (_id) has been provided, one will be generated
☐ For example:
> db.somecollection.insert({"foo": "bar"})
WriteResult({ "nInserted" : 1 })
> db.somecollection.insert({"_id": "1234","foo": "bar"})
WriteResult({ "nInserted" : 1 })
> db.somecollection.insert({"_id": "foobar","foo": "bar"})
WriteResult({ "nInserted" : 1 })
Create: inserting multiple documents
> db.somecollection.insert( [ {"foo": "bar"}, {"_id": "1234","foo": "bar"}, {"_id": "foobar","foo": "bar"} ] )
BulkWriteResult({
"writeErrors" : [ ],
"writeConcernErrors" : [ ],
"nInserted" : 3,
"nUpserted" : 0,
"nMatched" : 0,
"nModified" : 0,
"nRemoved" : 0,
"upserted" : [ ]
})
Insert: insert durability (writeConcern)
☐ The insert method accepts a writeConcern option:
db.<collection>.insert(document, { writeConcern: {w: <value>, j: <boolean>, wtimeout: <number>}})
☐ W option
  ☐ Wait for confirmation from other nodes
  ☐ Number, “majority” or <tag>
☐ J option
  ☐ Setting to true will wait for the journal write
☐ Wtimeout option
  ☐ Timeout for the writeConcern
Insert: wait for journal write
Insert: wait for other node to write
Insert: insert durability
☐ Examples:
> db.somecollection.insert({"foo": "bar"}, {writeConcern: { w: 1, j: false}} )
WriteResult({ "nInserted" : 1 })
> db.somecollection.insert({"foo": "bar"}, {writeConcern: { w: "majority", j: true}} )
WriteResult({ "nInserted" : 1 })
> db.somecollection.insert({"foo": "bar"}, {writeConcern: { w: 2, j: true, wtimeout: 100}} )
WriteResult({ "nInserted" : 1 })
Create: inserting multiple documents
☐ As you have seen, inserting data is very simple:
db.<collection>.insert(document, [writeConcern], [ordered])
☐ Comparable to SQL
INSERT INTO <collection> VALUES (document);
☐ Ordered defaults to true.
☐ writeConcern will be explained in the afternoon session
Create: exercise
1. Create a new collection named mytest in the test database by inserting the following JSON document:
{ "_id": 1, "name": "mytest1" }
2. Insert a second document in the mytest collection:
{ "_id": 2, "name": "mytest2", "testdata": "test1234" }
3. Insert a couple of documents in the mytest collection:
[ { "name": "mytest3", "testdata": "test1234" },
{ "name": "mytest4", "testdata": "test1234" },
{ "name": "mytest5", "testdata": "test1234" } ]
Create: exercise answer
> use test
> db.mytest.insert({ "_id": 1, "name": "mytest1" })
WriteResult({ "nInserted" : 1 })
> db.mytest.insert({ "_id": 2, "name": "mytest2", "testdata": "test1234" })
WriteResult({ "nInserted" : 1 })
> db.mytest.insert( [ { "name": "mytest3", "testdata": "test1234" }, { "name": "mytest4", "testdata": "test1234" }, { "name": "mytest5", "testdata": "test1234" } ] )
BulkWriteResult({
"nInserted" : 3,
...
})
Read: finding your data
☐ The find command will retrieve your data as a cursor
db.<collection>.find(query, projection)
db.<collection>.findOne(query, projection)
var cursor = db.<collection>.find(query, projection)
☐ Query: selection filter using query operators
☐ Projection: fields to return from the document
☐ SQL equivalent:
SELECT projection FROM collection WHERE query
☐ Example:
> db.somecollection.find({"_id": "1234"}, {"_id": 1, "foo": 1})
{ "_id" : "1234", "foo" : "bar" }
> var cursor = db.somecollection.find({"_id": "1234"}, {"_id": 1, "foo": 1})
> while (cursor.hasNext()) { printjson(cursor.next()); }
{
"_id" : "1234",
"foo" : "bar"
}
Read: query operators
☐ Basic query operators are $eq, $gt, $gte, $lt, $lte, $ne, $in, $nin
☐ Logical query operators are $and, $or, $not, $nor
☐ Element (array) operators are $exists, $type
☐ Other noteworthy operators are $regex, $text, $geoWithin
See also: https://docs.mongodb.com/manual/reference/operator/
☐ Example: all documents with sale value less than 100
> db.somecollection.find({"sale_value": {"$lt": 100} })
{ "_id" : "1002", "sale_value" : 75 }
{ "_id" : "1004", "sale_value" : 52 }
{ "_id" : "1008", "sale_value" : 95 }
Read: sort and limit
☐ You can sort a result by appending the sort function to the query:
db.somecollection.find().sort({"_id": 1})
db.somecollection.find().sort({"_id": -1})
db.somecollection.find().sort({"_id": 1, "foo": -1})
See also: https://docs.mongodb.com/manual/reference/method/cursor.sort/
☐ Limiting a result is done similarly:
db.somecollection.find().limit(10)
See also: https://docs.mongodb.com/manual/reference/method/cursor.limit/
Read: exercise
☐ Prior to this exercise, import the zipcodes data set:
$ mongoimport -d test -c zipcodes zips.json
1. Find the first document
2. Find the last document
3. Find the zipcodes with a population greater than 100,000
Read: exercise answers
1. Find the first document
> db.zipcodes.findOne()
> db.zipcodes.find().limit(1)
{ "_id" : "01001", "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA" }
2. Find the last document
> db.zipcodes.find().sort({"_id": -1}).limit(1)
{ "_id" : "99950", "city" : "KETCHIKAN", "loc" : [ -133.18479, 55.942471 ], "pop" : 422, "state" : "AK" }
3. Find the zipcodes with a population greater than 100,000
> db.zipcodes.find( { "pop": { "$gt": 100000}} )
Update: update method
☐ The update command enables you to update one or many documents
db.<collection>.update(query, update, options)
☐ Query: selection filter using query operators (same as find)
☐ Update: modification to apply
☐ Options: upsert, multi, and writeConcern
☐ SQL equivalent:
UPDATE <collection> SET update WHERE query
Update: update operators
☐ Most important update operators:
  ☐ $set and $unset will update/remove the specified field(s)
  ☐ $inc and $mul will operate on the value of the field
  ☐ $rename will rename a field
See also: https://docs.mongodb.com/manual/reference/operator/update/
Update: the update options
☐ Upsert
  ☐ If the document exists: update it, or else insert a new document
☐ Multi
  ☐ By default only one document gets updated
  ☐ Setting multi to true will update multiple documents at once
New in 3.2:
db.<collection>.updateOne(query, update, options)
db.<collection>.updateMany(query, update, options)
db.<collection>.replaceOne(query, update, options)
Update: update durability
☐ Similar to the insert method:
db.<collection>.update(query, update, { writeConcern: { w: <value>, j: <boolean>, wtimeout: <number>}})
☐ W option
  ☐ Number, “majority” or <tag>
☐ J option
  ☐ Setting to true will wait for the journal write
☐ Wtimeout option
  ☐ Timeout for the writeConcern
Update: example update
☐ Example:
db.somecollection.update(
  {"_id": ObjectId("57e171765ffbf76ca639bd65")},
  { $set: { "foo": "barbar", "array.1": "four" }, $inc: {"counter": 2} }
)
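To see what this update does to the example document, the sketch below applies the same $set and $inc in plain JavaScript. `locate` and `applyUpdate` are hypothetical helpers that only mimic these two operators on an in-memory object; they are not the server's implementation and ignore every other operator.

```javascript
// Plain-JavaScript stand-in for the document being updated.
const target = {
  foo: "bar",
  counter: 10010101,
  array: ["one", "two", "three"]
};

// Hypothetical helper: follow a dot-notation path and return the parent
// object together with the final key, so the caller can assign to it.
function locate(document, path) {
  const keys = path.split(".");
  const last = keys.pop();
  const parent = keys.reduce((value, key) => value[key], document);
  return { parent, last };
}

// Applies only $set and $inc; all other operators are ignored in this sketch.
function applyUpdate(document, update) {
  for (const [path, value] of Object.entries(update.$set || {})) {
    const { parent, last } = locate(document, path);
    parent[last] = value;
  }
  for (const [path, amount] of Object.entries(update.$inc || {})) {
    const { parent, last } = locate(document, path);
    parent[last] = (parent[last] || 0) + amount;
  }
  return document;
}

applyUpdate(target, {
  $set: { foo: "barbar", "array.1": "four" },
  $inc: { counter: 2 }
});

console.log(target);
```

Note that "array.1" addresses the second array element, so "two" is replaced by "four" while the other elements stay in place.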
Update: example document update
☐ Example:
db.somecollection.update(
  {"_id": ObjectId("57e171765ffbf76ca639bd65")},
  { "replace": "all", "contents": ["with","this","new","document"] }
)
Update: save method
☐ The save method performs either an insert or update command
db.<collection>.save(document, writeConcern)
☐ If no _id field has been provided an insert will be performed.
  ☐ The _id field will be filled with an ObjectId
☐ If an _id field has been provided an update will happen
  ☐ Update will be performed with upsert enabled
Update: example save
☐ Example:
db.somecollection.save({"foo": "bar"})
db.somecollection.save({"_id": "1234","foo": "bar"})
Update: example save complex
☐ Example:
> var doc=db.somecollection.findOne()
> doc
{ "_id" : ObjectId("57e16f925ffbf76ca639bd64"), "foo" : "bar" }
> doc["counter"]=0
0
> db.somecollection.save(doc)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.somecollection.findOne()
{
"_id" : ObjectId("57e16f925ffbf76ca639bd64"),
"foo" : "bar",
"counter" : 0
}
Update: exercise
1. Increase the population of zipcode 90210 (Beverly Hills) by 1
2. Iterate over the zipcodes collection and add a new field called “votes” with a value of 0
Update: exercise answers
1. Increase the population of zipcode 90210 (Beverly Hills) by 1
> db.zipcodes.update( { "_id": "90210" }, { "$inc": {"pop": 1}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
2. Iterate over the zipcodes collection and add a new field called “votes” with a value of 0
> var cur = db.zipcodes.find()
> while (cur.hasNext()) { var zip = cur.next(); zip["votes"] = 0; db.zipcodes.save(zip); }
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 0 })
Delete: removing documents
☐ To remove documents you can use the following methods:
db.<collection>.remove(query, options)
db.<collection>.deleteOne(filter, options) // new in 3.2
db.<collection>.deleteMany(filter, options) // new in 3.2
☐ Query and filter are basically the same
  ☐ Pass empty document ( {} ) to match all documents
☐ Options are writeConcern and justOne (only for remove method)
Recap
☐ Creating databases / collections
☐ Dropping databases / collections
☐ CRUD
  ☐ Creating data
  ☐ Reading data
  ☐ Updating data
  ☐ Deleting data
Break: 10:30 - 11:00
Next hour: basic management and design patterns
Aggregations
CRUD bonus: aggregate
☐ The aggregate method fills the gaps for the find method
☐ Creates aggregates of matching documents
☐ Aggregate allows multiple pipeline stages
☐ Most important pipeline stages:
  ☐ $unwind: Unwinds arrays (e.g. like a JOIN)
  ☐ $group: Groups documents together (like GROUP BY)
  ☐ $match: Only match certain documents. When used after a $group stage it acts as a HAVING
  ☐ $sort: Sorts the result set (like ORDER BY)
Aggregate: group accumulators
☐ Most important accumulators:
  ☐ $sum
  ☐ $avg
  ☐ $min
  ☐ $max
  ☐ $push / $addToSet
Aggregate: match / group example
> db.somecollection.find()
{ "_id" : ObjectId("57e16f925ffbf76ca639bd64"), "foo" : "bar", "counter" : 1 }
{ "_id" : "1234", "foo" : "bar" }
{ "_id" : "2345", "foo" : "barbar" }
{ "_id" : "foobar", "foo" : "bar" }
> db.somecollection.aggregate([
  { $match: {"foo": "bar"}},
  { $group: { "_id": "$foo", "count": {$sum: 1}}}
])
{ "_id" : "bar", "count" : 3 }
Aggregate: unwind / group example (1)
> db.someothercol.find()
{ "_id" : 1, "name" : "blogpost 1", "tags" : [ "music", "literature" ] }
{ "_id" : 2, "name" : "blogpost 2", "tags" : [ "dogs", "cats", "kittens" ] }
{ "_id" : 3, "name" : "blogpost 3", "tags" : [ "memes" ] }
{ "_id" : 4, "name" : "blogpost 4", "tags" : [ "memes", "kittens" ] }
Aggregate: unwind / group example (2)
> db.someothercol.aggregate([{$unwind: "$tags"}])
{ "_id" : 1, "name" : "blogpost 1", "tags" : "music" }
{ "_id" : 1, "name" : "blogpost 1", "tags" : "literature" }
{ "_id" : 2, "name" : "blogpost 2", "tags" : "dogs" }
{ "_id" : 2, "name" : "blogpost 2", "tags" : "cats" }
{ "_id" : 2, "name" : "blogpost 2", "tags" : "kittens" }
{ "_id" : 3, "name" : "blogpost 3", "tags" : "memes" }
{ "_id" : 4, "name" : "blogpost 4", "tags" : "memes" }
{ "_id" : 4, "name" : "blogpost 4", "tags" : "kittens" }
Aggregate: unwind / group example (3)
db.someothercol.aggregate([
{$unwind: "$tags"},
{$group: {"_id": "$tags", "count": {$sum: 1}}},
{$sort: {"count": 1}}
])
{ "_id" : "cats", "count" : 1 }
{ "_id" : "dogs", "count" : 1 }
{ "_id" : "literature", "count" : 1 }
{ "_id" : "music", "count" : 1 }
{ "_id" : "memes", "count" : 2 }
{ "_id" : "kittens", "count" : 2 }
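For readers who think in application code rather than pipelines, the same unwind / group / sort steps can be written out by hand. This is only a plain-JavaScript sketch over a copy of the four blog posts; it illustrates what each stage does to the data and makes no claim about how the server executes a pipeline.

```javascript
// Copy of the four example documents from someothercol.
const posts = [
  { _id: 1, name: "blogpost 1", tags: ["music", "literature"] },
  { _id: 2, name: "blogpost 2", tags: ["dogs", "cats", "kittens"] },
  { _id: 3, name: "blogpost 3", tags: ["memes"] },
  { _id: 4, name: "blogpost 4", tags: ["memes", "kittens"] }
];

// $unwind: one output document per array element.
const unwound = posts.flatMap(post => post.tags.map(tag => ({ ...post, tags: tag })));

// $group with {$sum: 1}: count the documents per tag.
const counts = new Map();
for (const row of unwound) {
  counts.set(row.tags, (counts.get(row.tags) || 0) + 1);
}

// $sort on count ascending; the order among equal counts is not
// guaranteed to match the server's output.
const result = [...counts]
  .map(([tag, count]) => ({ _id: tag, count }))
  .sort((a, b) => a.count - b.count);

console.log(unwound.length); // 8
console.log(result);
```

The eight unwound rows match the output of the $unwind example, and the grouped counts match the pipeline result (two each for memes and kittens, one for every other tag).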
Aggregate: exercise
1. From the zipcodes collection, calculate the total population per city and sort by total population descending
2. From the zipcodes collection, calculate the average population per zipcode of New York (hint: the _id field is the zipcode)
☐ From the zipcodes collection, calculate the total population per city and sort by total population descending
db.zipcodes.aggregate([
  { $group: { "_id": "$city", "total_pop": {$sum: "$pop"}} },
  { $sort: { "total_pop": -1} }
])
☐ From the zipcodes collection, calculate the average population per zipcode of New York
db.zipcodes.aggregate([
  { $match: { "city": "NEW YORK"} },
  { $group: { "_id": "$city", "average_pop": {$avg: "$pop"}} }
])
Basic Management
Exporting data
☐ Exporting data can be done via mongoexport
☐ Format limitations
  ☐ JSON or CSV
  ☐ BSON rich documents are not supported
    ☐ binData
    ☐ objectId
    ☐ Date
  ☐ Some tricks are applied when using mongoimport
☐ In general: not a reliable way to make backups!
Exporting data to JSON
☐ Example:
mongoexport -d test -c mytest -o mytest.json
☐ Contents should be similar to this:
{"_id":1.0,"name":"mytest1"}
{"_id":2.0,"name":"mytest2","testdata":"test1234"}
{"_id":{"$oid":"57ea8420506638730683f57c"},"name":"mytest3","testdata":"test1234"}
{"_id":{"$oid":"57ea8420506638730683f57d"},"name":"mytest4","testdata":"test1234"}
{"_id":{"$oid":"57ea8420506638730683f57e"},"name":"mytest5","testdata":"test1234"}
Importing data
☐ Importing data can be done via mongoimport
  ☐ Counterpart of mongoexport
☐ Format limitations
  ☐ JSON or CSV
☐ Example:
mongoimport -d test -c mytest mytest.json
Types of backups
☐ Logical backups
  ☐ Dump of your data
☐ Physical backups
  ☐ File(system) copy of your data
Logical backups
☐ mongodump
☐ MongoDB Backup
☐ Mongob
Logical backups: mongodump
☐ mongodump
  ☐ BSON dump of the data
  ☐ BSON files per database / collection
  ☐ Archive
☐ OEM tool
  ☐ Works great but needs some wrapping
Logical backups: MongoDB Backup
☐ MongoDB Backup
  ☐ https://www.npmjs.com/package/mongodb-backup
  ☐ Nodejs backup solution
  ☐ CLI and API
  ☐ Can stream backups
Logical backups: Mongob
☐ Mongob
  ☐ https://github.com/cmpitg/mongob
  ☐ Python based CLI tool
  ☐ MongoDB instance or bz2 target
  ☐ Can copy data between collections
  ☐ Incremental backups
  ☐ Rate limiting
Physical backups: Filesystem snapshots
☐ Filesystem snapshots
  ☐ LVM
  ☐ ZFS
  ☐ XFS (xfs_freeze)
  ☐ EBS
Physical backups: Strata
☐ MongoRocks Strata
  ☐ https://github.com/facebookgo/rocks-strata
  ☐ Backs up on file level
  ☐ Supports incremental backups
  ☐ Queryable backups
Restore
☐ To restore from a mongodump, use mongorestore
Exercise: backup using mongodump
1. Create a backup using mongodump
2. Log into MongoDB and drop a collection
3. Restore the collection using the dump created earlier
Exercise: backup using mongodump
1. Create a backup using mongodump
$ mongodump --gzip --archive=dump.gz
2. Log into MongoDB and drop a collection
> db.inserttest.drop()
3. Restore the collection using the dump created earlier
$ mongorestore --port 27003 -d test -c inserttest --gzip --archive=dump.gz
2016-09-27T19:33:19.574+0200 creating intents for archive
2016-09-27T19:33:19.704+0200 reading metadata for test.inserttest from archive 'dump.gz'
2016-09-27T19:33:19.731+0200 restoring test.inserttest from archive 'dump.gz'
2016-09-27T19:33:40.624+0200 restoring indexes for collection test.inserttest from metadata
2016-09-27T19:33:40.635+0200 finished restoring test.inserttest (146411 documents)
2016-09-27T19:33:40.635+0200 done
Schema design patterns
Normalized data
{
_id: "@percona",
name: "Percona Twitter account"
}
{
twitter_id: "@percona",
joined: ISODate("2009-04-02"),
location: "Raleigh, NC 27617"
}
Embedded document (One-to-one)
{
_id: "@percona",
name: "Percona Twitter account",
info: {
joined: ISODate("2009-04-02"),
location: "Raleigh, NC 27617"
}
}
Embedded document (one-to-many)
{
_id: "@percona",
name: "Percona Twitter account",
lasttweets: [{
tweet_id: 780892298024456193,
tweettime: ISODate("2016-09-27T15:10:01"),
tweet: "Wed 11am PT @PeterZaitsev will go over highlights from the @Percona open source software roadmap and time for Q/A http://hubs.ly/H04wLsv0"
},{
tweet_id: 780874621386158080,
tweettime: ISODate("2016-09-27T13:59:23"),
tweet: "Problems solved, before they appear! Come to #PerconaLive to get hands on training and more. http://hubs.ly/H04rr-J0"
}]
}
How not to use embedded documents!
{
_id: 780892298024456193,
tweettime: ISODate("2016-09-27T15:10:01"),
tweet: "Wed 11am PT @PeterZaitsev will go over highlights from the @Percona open source software roadmap and time for Q/A http://hubs.ly/H04wLsv0",
twitterhandle: {
name: "Percona Twitter account",
info: {
joined: ISODate("2009-04-02"),
location: "Raleigh, NC 27617",
}
}
}
Document references (one-to-many)
{
_id: "@percona",
name: "Percona Twitter account",
info: { joined: ISODate("2009-04-02"), location: "Raleigh, NC 27617" }
}
{
_id:780892298024456193,
tweettime: ISODate("2016-09-27T15:10:01"),
tweet: "Wed 11am PT @PeterZaitsev will go over highlights from the @Percona open source software roadmap and time for Q/A http://hubs.ly/H04wLsv0",
twitterhandle: "@percona"
}
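With references, reading a tweet together with its account takes a second lookup in application code. The sketch below shows that idea on in-memory arrays standing in for the two collections. `withAccount` and `tweetsOf` are hypothetical helpers, the tweet text is a placeholder, and the tweet id is written as a string because it is larger than the range in which JavaScript numbers are exact.

```javascript
// In-memory stand-ins for the account and tweet collections.
const accounts = [
  {
    _id: "@percona",
    name: "Percona Twitter account",
    info: { joined: new Date("2009-04-02"), location: "Raleigh, NC 27617" }
  }
];
const tweets = [
  {
    _id: "780892298024456193", // kept as a string, see the note above
    tweettime: new Date("2016-09-27T15:10:01Z"),
    tweet: "placeholder tweet text",
    twitterhandle: "@percona" // reference to accounts._id
  }
];

// Hypothetical helper: the extra lookup that a reference requires.
function withAccount(tweet) {
  const account = accounts.find(a => a._id === tweet.twitterhandle);
  return { ...tweet, account };
}

// The other direction of the one-to-many relation: all tweets of one account.
function tweetsOf(handle) {
  return tweets.filter(t => t.twitterhandle === handle);
}

console.log(withAccount(tweets[0]).account.name); // Percona Twitter account
console.log(tweetsOf("@percona").length);         // 1
```

The trade-off against embedding is visible here: the account data is stored once, but every read that needs it costs an additional lookup.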
Impact of various data models
☐ Document growth
  ☐ Reallocation of the same document impacts performance
  ☐ Writing to the same document often creates hotspots
☐ Atomicity
  ☐ No single write operation can atomically change more than one document
  ☐ Writing to multiple documents is not atomic
  ☐ Write all changes to a single document at the same time
{
_id: "@percona",
name: "Percona Twitter account",
lasttweets: [{
tweet_id: 780892298024456193,
tweettime: ISODate("2016-09-27T15:10:01"),
tweet: "Wed 11am PT @PeterZaitsev will go over highlights from the @Percona open source software roadmap and time for Q/A http://hubs.ly/H04wLsv0"
},{
tweet_id: 780874621386158080,
tweettime: ISODate("2016-09-27T13:59:23"),
tweet: "Problems solved, before they appear! Come to #PerconaLive to get hands on training and more. http://hubs.ly/H04rr-J0"
}]
}
☐ Sharding☐ Sharding documents requires a shard key
☐ Choosing the right shard key is the start of your document structure
☐ Choosing the wrong shard key may impact performance
☐ Example: which field to use as a shard key?
{
_id:780892298024456193,
tweettime: ISODate("2016-09-27T15:10:01"),
tweet: "Wed 11am PT @PeterZaitsev will go over highlights from the @Percona open source software roadmap and time for Q/A http://hubs.ly/H04wLsv0",
twitterhandle: "@percona"
}
☐ Indexes
  ☐ Every index consumes disk space and memory
  ☐ Each new index has a negative impact on write performance
  ☐ High read-to-write ratio will benefit from indexes
  ☐ High write-to-read ratio will benefit from having fewer indexes
☐ Number of collections
  ☐ Having many collections has no performance penalty
  ☐ Having many collections will improve performance (concurrency)
  ☐ MMAPv1 limited in number of namespaces
☐ Large number of (small) documents
  ☐ Can give more random disk access
Recap
☐ Aggregations
  ☐ Why you need to know about them
☐ Basic management
  ☐ Import/export data
  ☐ Backup/restore
☐ Schema design patterns
Exercise: setting up the env
☐ Set up the environment for this afternoon
  ☐ Clone git repo
  ☐ Create a cluster by running the following command:
./build_process.sh
☐ This should build a sharded cluster using Percona Server for MongoDB
Lunch: 12:00 - 13:30
See you at the afternoon session!