Richmond MUG – May 2014
MongoDB 2.6
Jason Ford – Principal Engineer, Snagajob
MongoDB World!
- First National MongoDB Conference
- June 23 – 25 in New York City
- Use discount code mug_25 to get 25% off registration
Meetup Calendar
- Today: MongoDB 2.6
- July 8: MongoDB World Post-Mortem
- September 9: TBD
- November 4: TBD (Second Anniversary)
Overview
- In development for a full year (longer than any prior release)
- First major rewrite of the codebase, including a full rewrite of the query engine
- Some significant new features, but the primary goal of the release is to lay the foundation for future development
Read Operations
- Largely transparent
- New framework is highly extensible
- .maxTimeMS() cursor method
  - Allows for timeouts on a per-operation basis
  - Great for ad hoc queries
  - Available in all drivers
- Indexes
Indexes
- Background index builds on the primary now replicate to secondary nodes as background builds
- Interrupted index builds restart automatically when the mongod instance restarts
- Index Intersection
  - Great for ad hoc queries
  - Still want dedicated compound indexes for oft-used queries
- dropDups option deprecated
Indexes
- Consider a collection with these indexes:
    { qty : 1 }
    { item : 1 }
- Index intersection may be used to support the following query:
    db.orders.find({ item: "abc123", qty: { $gt : 15 } })
- Emphasis on MAY
- Single-index queries may be more efficient
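To make the idea concrete, here is a toy model in plain JavaScript of what intersecting two single-field indexes means. It is only an illustration under stated assumptions, not MongoDB's implementation: the sample orders and the buildIndex and intersect helpers are all hypothetical.

```javascript
// Toy model of index intersection. Plain JavaScript, no MongoDB required.
// The sample data and helper names are hypothetical.
const orders = [
  { _id: 1, item: "abc123", qty: 10 },
  { _id: 2, item: "abc123", qty: 20 },
  { _id: 3, item: "xyz789", qty: 30 },
  { _id: 4, item: "abc123", qty: 40 },
];

// A single-field "index": a list of [value, _id] pairs sorted by value.
function buildIndex(docs, field) {
  return docs
    .map((d) => [d[field], d._id])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

const itemIndex = buildIndex(orders, "item"); // models { item : 1 }
const qtyIndex = buildIndex(orders, "qty");   // models { qty : 1 }

// Scan each index on its own, collecting the matching _id values.
const idsByItem = itemIndex.filter(([v]) => v === "abc123").map(([, id]) => id);
const idsByQty = qtyIndex.filter(([v]) => v > 15).map(([, id]) => id);

// Intersect the two result sets: only _ids found by both scans match.
function intersect(a, b) {
  const seen = new Set(b);
  return a.filter((id) => seen.has(id)).sort((x, y) => x - y);
}

const matchingIds = intersect(idsByItem, idsByQty);
console.log(matchingIds); // [ 2, 4 ]
```

Both scans and the intersection step cost work that a single compound index such as { item : 1, qty : 1 } would avoid, which is why dedicated compound indexes remain the better choice for frequent queries.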
Read Operations
- Text Search
  - Beta feature in 2.4, now enabled by default
  - Probably only practical for small collections
  - Indexes are very large
- Query execution framework completely rewritten
  - Query parser, optimizer, cache, etc.
  - Find queries are noticeably faster
Cached Query Plan Interface
- New insight into, and control over, MongoDB's query execution
- The MongoDB query optimizer has long tried to work out the most efficient use of indexes on a per-query basis and to cache the resulting plans
- db.collection.getPlanCache() provides an interface to view and clear cached query plans by query shape
Cached Query Plan Interface
- db.jobseeker.getPlanCache().help()
Aggregation Framework
- Introduced in 2.2
- Finally seems fully baked in 2.6
- Queries return a cursor
  - Used to return a single document (16MB limit)
- Results can be output to a new collection
  - $out operator
Aggregation Framework
    db.jobseeker.aggregate(
      { $project : { _id: 0, alert : '$p.n' } },
      { $unwind : "$alert" },
      { $group : { _id : "$alert", count: { $sum : 1 } } },
      { $out : "alertsummary" }
    )
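For readers new to pipelines, the stages above can be walked through in plain JavaScript. This is a sketch on made-up data: the document shape (an array p of subdocuments, each with a field n) is an assumption inferred from the '$p.n' path, and the values are invented.

```javascript
// Plain JavaScript walk-through of the pipeline stages, on made-up data.
// Assumption: each jobseeker document has an array "p" of subdocuments,
// each with a field "n". This shape is a guess for illustration only.
const jobseekers = [
  { _id: 1, p: [{ n: "daily" }, { n: "weekly" }] },
  { _id: 2, p: [{ n: "daily" }] },
  { _id: 3, p: [{ n: "weekly" }, { n: "daily" }] },
];

// $project: keep only the "p.n" values, exposed as "alert" (an array).
const projected = jobseekers.map((doc) => ({ alert: doc.p.map((sub) => sub.n) }));

// $unwind: emit one document per element of the "alert" array.
const unwound = projected.flatMap((doc) => doc.alert.map((a) => ({ alert: a })));

// $group: count documents per distinct "alert" value.
const counts = {};
for (const doc of unwound) {
  counts[doc.alert] = (counts[doc.alert] || 0) + 1;
}

// $out would write these result documents to the "alertsummary" collection.
const alertsummary = Object.keys(counts).map((k) => ({ _id: k, count: counts[k] }));
console.log(alertsummary);
```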
Write Operations
- Insert, Update, Delete completely rewritten to use commands
- Write operations always return a WriteResult object
- Forget about "fire and forget"
- Even a {w:0} specification sends back a yes/no response
Write Operations
Sample Update Command (db.runCommand):
    {
      update: 'collection name',
      updates: [
        { q: { a : 1 }, u: { $inc : { x : 1 } }, multi: true/false, upsert: true/false },
        ...
      ],
      writeConcern: { w: 1, j: true, wtimeout: 1000 },
      ordered: true/false
    }
WriteResult Structure
    {
      "ok" : 1,
      "n" : 0,
      "nModified" : 1,   // applies only to updates
      "nRemoved" : 1,    // applies only to removes
      "writeErrors" : [
        {
          "index" : 0,
          "code" : 11000,
          "errmsg" : "insertDocument :: caused by :: 11000 E11000 duplicate key error index: t1.t.$a_1 dup key: { : 1.0 }"
        }
      ],
      writeConcernError: {
        code : 22,
        errInfo: { wtimeout : true },
        errmsg: "Could not replicate operation within requested timeout"
      }
    }
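Note that "ok" : 1 only means the command itself ran; individual operations can still fail. A minimal sketch of inspecting a response shaped like the structure above follows. The summarizeWriteResult helper is hypothetical and not part of the shell or any driver, and the sample object is abbreviated from the slide.

```javascript
// Sketch: summarizing a write response shaped like the structure above.
// summarizeWriteResult is a hypothetical helper, not part of any driver.
function summarizeWriteResult(res) {
  const problems = [];
  // Each entry in writeErrors describes one failed operation in the batch.
  for (const err of res.writeErrors || []) {
    problems.push("op " + err.index + " failed with code " + err.code);
  }
  // writeConcernError is reported separately from per-operation errors.
  if (res.writeConcernError) {
    problems.push("write concern error, code " + res.writeConcernError.code);
  }
  return { succeeded: problems.length === 0, problems: problems };
}

const sample = {
  ok: 1,
  n: 0,
  writeErrors: [{ index: 0, code: 11000, errmsg: "E11000 duplicate key error" }],
  writeConcernError: {
    code: 22,
    errInfo: { wtimeout: true },
    errmsg: "Could not replicate operation within requested timeout",
  },
};

const summary = summarizeWriteResult(sample);
console.log(summary.succeeded); // false
console.log(summary.problems);
```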
Write Operations
- WriteConcern can be specified on a per-operation basis
- Field Order
  - _id field will ALWAYS be first
  - Field order will be preserved (unless a field is renamed)
Example of a per-operation write concern:
    db.products.insert(
      { item: "envelopes", qty : 100, type: "Clasp" },
      { writeConcern: { w: "majority", wtimeout: 5000 } }
    )
Bulk Write Operations
- All write operations can now happen in bulk
- Super cool fluent API
- Significant performance increase
Bulk Write Operations
OLD WAY
    // get cursor
    var cursor = db.myCollection.find({}, { _id : 1 }); // returns 100,000 documents

    // iterate through and update each document, one round trip per update
    while (cursor.hasNext()) {
      var doc = cursor.next();
      // set "up" to the document's _id, matching the NEW WAY loop
      db.myCollection.update({ _id : doc._id }, { $set : { up : doc._id } });
    }
TIME: 67.4 seconds
Bulk Write Operations
NEW WAY
    // create bulk object
    var bulk = db.myCollection.initializeUnorderedBulkOp();

    // add update operations to the bulk op
    for (var x = 0; x < 100000; x++) {
      bulk.find({ _id : x }).update({ $set : { up : x } });
    }

    // send update operations to the database
    bulk.execute();
TIME: 5.5 seconds (about 62 seconds faster, roughly a 12x speedup)
Storage
- Power of 2 allocation (introduced in 2.2) now set as the default allocation strategy
  - Each record has a size in bytes that is a power of 2 (e.g. 32, 64, 128, 256, 512 ... 16777216)
  - Smallest allocation size is 32 bytes
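The rounding rule can be sketched in a few lines of plain JavaScript. This is a simplified model of the behavior described on the slide: allocationSize is a hypothetical helper, and it ignores per-record overhead.

```javascript
// Simplified model of power-of-2 allocation as described on the slide.
// allocationSize is a hypothetical helper; real storage accounting also
// includes record overhead that this sketch ignores.
const MIN_ALLOCATION = 32; // smallest allocation size, in bytes

function allocationSize(recordBytes) {
  let size = MIN_ALLOCATION;
  // Double until the allocation is large enough to hold the record.
  while (size < recordBytes) {
    size *= 2;
  }
  return size;
}

console.log(allocationSize(20));   // 32
console.log(allocationSize(100));  // 128
console.log(allocationSize(1000)); // 1024
console.log(allocationSize(1024)); // 1024
```

In this model a 100-byte record is given 128 bytes, which leaves 28 bytes of room for the document to grow in place.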
Storage
- Two advantages/goals:
  1. The limited number of record allocation sizes makes it easier for mongod to reuse existing allocations, reducing fragmentation.
  2. The space allocated for each document is usually larger than the data it holds. This allows documents to grow in place, minimizing the chance that mongod will need to allocate new space as data is added to a document.
Storage
- Power of 2 sizes replaces the previous "Exact Fit" allocation strategy
  - Exact Fit allocated the exact size needed plus a small (configurable) padding factor
  - It was inefficient for write-heavy workloads and for reusing freed space
Sharding & Replication
- Ability to merge chunks
  - Chunks must be contiguous
  - Chunks must be on the same shard
  - At least one of the chunks must be empty
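The three preconditions above can be expressed as a small check in plain JavaScript. This is only a sketch: the chunk shape (shard, min, max, docCount) and the canMerge helper are hypothetical, and a single numeric shard key is assumed to keep the range comparison simple.

```javascript
// Sketch of the three merge preconditions listed above.
// The chunk shape (shard, min, max, docCount) and canMerge are hypothetical;
// a single numeric shard key is assumed to keep the comparison simple.
function canMerge(a, b) {
  const sameShard = a.shard === b.shard;
  // Contiguous: one chunk's upper bound is the other's lower bound.
  const contiguous = a.max === b.min || b.max === a.min;
  const oneEmpty = a.docCount === 0 || b.docCount === 0;
  return sameShard && contiguous && oneEmpty;
}

const c1 = { shard: "shard0000", min: 0, max: 100, docCount: 0 };
const c2 = { shard: "shard0000", min: 100, max: 200, docCount: 5000 };
const c3 = { shard: "shard0001", min: 200, max: 300, docCount: 0 };

console.log(canMerge(c1, c2)); // true
console.log(canMerge(c2, c3)); // false (different shards)
console.log(canMerge(c1, c3)); // false (different shards, not contiguous)
```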
Sharding & Replication
Ability to remove orphaned documents
- Orphaned documents: documents on a shard that also exist in chunks on other shards as a result of failed migrations or incomplete migration cleanup due to abnormal shutdown
- Delete orphaned documents using cleanupOrphaned to reclaim disk space and reduce confusion
Sharding & Replication
Ability to remove orphaned documents
- Must be run against the admin db, directly on the primary member of a shard's replica set (NOT through mongos)
    db.runCommand( {
      "cleanupOrphaned": "test.info",
      "startingFromKey": { x: 10 },
      "secondaryThrottle": true
    } )
Security
- Integration (Enterprise Edition only)
  - Kerberos introduced in 2.4
  - 2.6 adds LDAP and x.509 protocols
- There’s also a Windows Enterprise Edition now
  - Linux Enterprise introduced in 2.4
Security
- User-Defined Roles & Collection-Level Access
  - Before: read-only and full admin were the only options (per database)
- 2.6 adds Role-Based Access Control
  - Separate upgrade
  - Users are granted Roles
  - Roles have Privileges
  - Privileges are an action and a resource
    - ex: Update (action) on product db (resource)
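The user, role, and privilege relationship can be modeled in a few lines of plain JavaScript. This is a toy illustration, not MongoDB's internal data structures: the roles table, the sample user, and the isAllowed helper are all hypothetical.

```javascript
// Toy model of role-based access control: users -> roles -> privileges.
// All names here (roles table, isAllowed) are hypothetical illustrations,
// not MongoDB's internal data structures.
const roles = {
  productEditor: [
    { resource: { db: "product" }, actions: ["find", "update"] },
  ],
  reporter: [
    { resource: { db: "reports" }, actions: ["find"] },
  ],
};

const user = { name: "alice", roles: ["productEditor", "reporter"] };

// A user may perform an action if any granted role holds a privilege
// pairing that action with the target resource.
function isAllowed(u, action, db) {
  return u.roles.some((roleName) =>
    (roles[roleName] || []).some(
      (priv) => priv.resource.db === db && priv.actions.includes(action)
    )
  );
}

console.log(isAllowed(user, "update", "product")); // true
console.log(isAllowed(user, "update", "reports")); // false
console.log(isAllowed(user, "find", "reports"));   // true
```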
Security
Built-in Database Roles:
- read (read-only access)
- readWrite (CRUD; create, rename, and drop collections; create and drop indexes)
- dbAdmin (read access to system.profile collection – weirdly specific, but ok)
- userAdmin (create and modify roles and users)
- dbOwner (readWrite + dbAdmin + userAdmin)
Security
Built-in Cluster Roles (create on admin DB):
- clusterManager (add/remove shards, change replset and cluster config, manage chunks, etc.)
- clusterMonitor (read access to cluster admin info)
- hostManager (misc admin commands: killOp, shutdown, repairDatabase)
- clusterAdmin (all of the above + dropDatabase)
Security
Other Roles (admin DB):
- backup, restore (mongodump/mongorestore)
- readAnyDatabase, readWriteAnyDatabase, userAdminAnyDatabase, dbAdminAnyDatabase
- root (readWriteAnyDatabase, dbAdminAnyDatabase, userAdminAnyDatabase, clusterAdmin)
Security
Custom Roles:
    db.runCommand({
      createRole: "myClusterwideAdmin",
      privileges: [
        { resource: { cluster: true }, actions: [ "addShard" ] },
        { resource: { db: "config", collection: "" }, actions: [ "find", "update", "insert", "remove" ] },
        { resource: { db: "users", collection: "usersCollection" }, actions: [ "update", "insert", "remove" ] },
        { resource: { db: "", collection: "" }, actions: [ "find" ] }
      ],
      roles: [ { role: "read", db: "admin" } ]
    })
LOTS of new stuff here – check out documentation
Security
User Creation Example:
    use products
    db.createUser({
      "user" : "accountAdmin01",
      "pwd" : "cleartext password",
      "customData" : { employeeId: 12345 },
      "roles" : [
        { role: "myClusterwideAdmin", db: "admin" },
        { role: "readAnyDatabase", db: "admin" },
        "readWrite"
      ]
    })
This user has readWrite permissions on the products DB, read permissions on all DBs, and the permissions of the myClusterwideAdmin role created earlier.
Miscellaneous
- $min & $max conditional updates
  - Ex: db.scores.update( { _id: 1 }, { $min: { lowScore: 150 } } )
- Enhancements to 2dsphere indexes
- rs.printReplicationInfo() and rs.printSlaveReplicationInfo(): human-readable helper methods
- mongoexport supports --skip, --limit, --sort
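The $min and $max conditional updates listed above only change a field when the new value is lower (for $min) or higher (for $max) than the stored one. A plain JavaScript model of that rule follows; applyMin and applyMax are hypothetical helpers, the sample document is invented, and only numeric comparison is modeled.

```javascript
// Plain JavaScript model of $min / $max update semantics.
// applyMin and applyMax are hypothetical helpers for illustration;
// only numeric values are considered here.
function applyMin(doc, field, value) {
  // Set the field only if it is missing or the new value is lower.
  if (!(field in doc) || value < doc[field]) {
    doc[field] = value;
  }
  return doc;
}

function applyMax(doc, field, value) {
  // Set the field only if it is missing or the new value is higher.
  if (!(field in doc) || value > doc[field]) {
    doc[field] = value;
  }
  return doc;
}

const score = { _id: 1, highScore: 800, lowScore: 200 };

applyMin(score, "lowScore", 150);  // 150 < 200, so lowScore becomes 150
applyMin(score, "lowScore", 250);  // 250 > 150, so no change
applyMax(score, "highScore", 950); // 950 > 800, so highScore becomes 950

console.log(score);
```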
The Future
“You’ll see the benefits in better performance and new innovations. We re-wrote the entire query execution engine to improve scalability, and took our first step in building a sophisticated query planner by introducing index intersection. We’ve made the codebase easier to maintain, and made it easier to implement new features. Finally, MongoDB 2.6 lays the foundation for massive improvements to concurrency in MongoDB 2.8, including document-level locking.”
- Eliot Horowitz, CTO and Co-Founder, MongoDB