Richmond MUG – May 2014

Post on 24-Feb-2016

MongoDB 2.6
Jason Ford – Principal Engineer, Snagajob

MongoDB World!

First National MongoDB Conference

June 23 – 25 in New York City

Use discount code mug_25 to get 25% off registration

Meetup Calendar

- Today: MongoDB 2.6

- July 8: MongoDB World Post-Mortem

- September 9: TBD

- November 4: TBD (Second Anniversary)

Overview

- In development for a full year (longer than any prior release)
- First major rewrite of the codebase
  - Including a full rewrite of the query engine
- Some significant new features, but the primary goal of the release is a foundation for future development

Read Operations

- Largely transparent
  - New framework is highly extensible
- .maxTimeMS() cursor method
  - Allows for timeouts on a per-operation basis
  - Great for ad hoc queries
  - Available in all drivers
- Indexes

Indexes

- Background index builds on secondary nodes
- Index builds can resume if interrupted
- Index intersection
  - Great for ad hoc queries
  - Still want dedicated compound indexes for oft-used queries
- dropDups option deprecated

Indexes

- Consider a collection with these indexes:

      { qty : 1 }
      { item : 1 }

- Index intersection may be used to support the following query:

      db.orders.find({ item: "abc123", qty: { $gt : 15 } })

  - Emphasis on MAY
  - Single index queries may be more efficient
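
As a rough mental model of what index intersection does for the query above, the sketch below builds two toy single-field indexes over a small in-memory array, scans each one separately, and keeps only the positions found by both scans. The sample orders and the helper names (`buildIndex`, `scanIndex`, `intersect`) are made up for illustration. The real query planner is far more sophisticated and may well decide that a single index is cheaper.

```javascript
// Hypothetical sample data standing in for the orders collection.
const orders = [
  { _id: 1, item: "abc123", qty: 10 },
  { _id: 2, item: "abc123", qty: 20 },
  { _id: 3, item: "xyz789", qty: 30 },
  { _id: 4, item: "abc123", qty: 40 },
];

// Build a toy single-field index: [value, position] pairs sorted by value.
function buildIndex(docs, field) {
  return docs
    .map((doc, pos) => [doc[field], pos])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

// Scan one index and return the set of positions whose value passes the test.
function scanIndex(index, predicate) {
  return new Set(index.filter(([value]) => predicate(value)).map(([, pos]) => pos));
}

// Keep only the positions present in both sets.
function intersect(a, b) {
  return [...a].filter((pos) => b.has(pos));
}

const itemIndex = buildIndex(orders, "item"); // plays the role of { item : 1 }
const qtyIndex = buildIndex(orders, "qty");   // plays the role of { qty : 1 }

const byItem = scanIndex(itemIndex, (value) => value === "abc123");
const byQty = scanIndex(qtyIndex, (value) => value > 15);

const matches = intersect(byItem, byQty).map((pos) => orders[pos]);
console.log(matches.map((doc) => doc._id));
```

A compound index on both fields would answer the same query with a single scan, which is why dedicated compound indexes are still worth having for frequent queries.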

Read Operations

- Text Search
  - Beta feature in 2.4, now enabled by default
  - Probably only practical for small collections
    - Indexes are very large
- Query execution framework completely rewritten
  - Query parser, optimizer, cache, etc.
  - Find queries are noticeably faster

Cached Query Plan Interface

- New insight into, and control over, MongoDB's query execution
- The MongoDB query optimizer has long tried to figure out the most efficient use of indexes on a per-query basis and to cache the resulting plans
- db.collection.getPlanCache() provides an interface to view and clear stored query plans by query shape

Cached Query Plan Interface

- db.jobseeker.getPlanCache().help()

Aggregation Framework

- Introduced in 2.2
- Finally seems fully baked in 2.6
- Queries return a cursor
  - Used to return a single document (16MB limit)
- Results can be output to a new collection
  - $out operator

Aggregation Framework

    db.jobseeker.aggregate(
        { $project : { _id: 0, alert : '$p.n' } },
        { $unwind : "$alert" },
        { $group : { _id : "$alert", count: { $sum : 1 } } },
        { $out : "alertsummary" }
    )
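
To make the pipeline easier to follow, here is a sketch of roughly what its first three stages compute, written as plain JavaScript over an in-memory array. The sample documents are invented, and the assumption that `p` is an array of subdocuments with an `n` field is a guess, since the slides do not show the jobseeker schema. The `$out` stage, which writes the result to the alertsummary collection, is not modeled.

```javascript
// Hypothetical jobseeker documents; the real schema is not shown in the slides.
const jobseekers = [
  { _id: 1, p: [{ n: "email" }, { n: "sms" }] },
  { _id: 2, p: [{ n: "email" }] },
  { _id: 3, p: [{ n: "sms" }, { n: "push" }] },
];

// $project: keep only an "alert" field holding the n values found in p.
const projected = jobseekers.map((doc) => ({ alert: doc.p.map((sub) => sub.n) }));

// $unwind: emit one document per element of the alert array.
const unwound = projected.flatMap((doc) => doc.alert.map((alert) => ({ alert })));

// $group: count the documents for each distinct alert value.
const counts = new Map();
for (const doc of unwound) {
  counts.set(doc.alert, (counts.get(doc.alert) || 0) + 1);
}
const summary = [...counts].map(([alert, count]) => ({ _id: alert, count }));

console.log(summary);
```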

Write Operations

- Insert, Update, Delete completely rewritten to use commands
- Write operations always return a WriteResult object
- Forget about “fire and forget”
  - Even a {w:0} specification sends back a yes/no response

Write Operations

Sample Update Command (db.runCommand):

    {
        update: 'collection name',
        updates: [
            { q: { a : 1 }, u: { $inc : { x : 1 } }, multi: true/false, upsert: true/false },
            ...
        ],
        writeConcern: { w: 1, j: true, wtimeout: 1000 },
        ordered: true/false
    }

WriteResult Structure

    {
        "ok" : 1,
        "n" : 0,
        "nModified" : 1,    (applies only to updates)
        "nRemoved" : 1,     (applies only to removes)
        "writeErrors" : [
            {
                "index" : 0,
                "code" : 11000,
                "errmsg" : "insertDocument :: caused by :: 11000 E11000 duplicate key error index: t1.t.$a_1 dup key: { : 1.0 }"
            }
        ],
        writeConcernError: {
            code : 22,
            errInfo: { wtimeout : true },
            errmsg: "Could not replicate operation within requested timeout"
        }
    }
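
One way an application might inspect a result shaped like the structure above is sketched here. The function name `summarizeWriteResult` and the object it returns are illustrative assumptions, not part of any driver. Only the field names and the sample values come from the slide.

```javascript
// Summarize a write result shaped like the structure on the slide.
// summarizeWriteResult is a hypothetical helper, not a driver API.
function summarizeWriteResult(result) {
  const writeErrors = result.writeErrors || [];
  const concernError = result.writeConcernError;
  return {
    // Treat the write as clean only if nothing was reported at all.
    succeeded: result.ok === 1 && writeErrors.length === 0 && !concernError,
    // Positions of the operations that failed with duplicate key error 11000.
    duplicateKeyIndexes: writeErrors
      .filter((err) => err.code === 11000)
      .map((err) => err.index),
    // True when the write concern could not be satisfied within wtimeout.
    timedOut: Boolean(concernError && concernError.errInfo && concernError.errInfo.wtimeout),
  };
}

// Sample result copied from the slide (errmsg shortened).
const sampleResult = {
  ok: 1,
  n: 0,
  writeErrors: [{ index: 0, code: 11000, errmsg: "E11000 duplicate key error" }],
  writeConcernError: {
    code: 22,
    errInfo: { wtimeout: true },
    errmsg: "Could not replicate operation within requested timeout",
  },
};

console.log(summarizeWriteResult(sampleResult));
```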

Write Operations

- WriteConcern can be specified on a per-operation basis
- Field Order
  - _id field will ALWAYS be first
  - Field order will be preserved (unless a field is renamed)

      db.products.insert(
          { item: "envelopes", qty : 100, type: "Clasp" },
          { writeConcern: { w: "majority", wtimeout: 5000 } }
      )

Bulk Write Operations

- All write operations can now happen in bulk
- Super cool fluent API
- Significant performance increase

Bulk Write Operations

OLD WAY

    // get cursor
    var cursor = db.myCollection.find({}, {_id:1}); // returns 100,000 documents

    // iterate through and update each document
    while (cursor.hasNext()) {
        var doc = cursor.next();
        // set "up" to the document's own _id
        db.myCollection.update({_id : doc._id}, { $set : { up : doc._id }});
    }

TIME: 67.4 Seconds

Bulk Write Operations

NEW WAY

    // create bulk object
    var bulk = db.myCollection.initializeUnorderedBulkOp();

    // add update operations to BulkOp
    for (var x = 0; x < 100000; x++) {
        bulk.find({_id : x }).update({ $set : { up : x }});
    }

    // send update operations to the database
    bulk.execute();

TIME: 5.5 Seconds (62 seconds faster)

Storage

- Power of 2 Allocation (introduced in 2.2) now set as the default allocation strategy
- Each record has a size in bytes that is a power of 2 (e.g. 32, 64, 128, 256, 512 ... 16777216)
- Smallest allocation size is 32 bytes
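
The rounding rule described above can be sketched in a few lines. This is a simplified model only: the function name `powerOf2Allocation` is made up, and the sketch ignores record headers and any special handling the server applies to very large documents.

```javascript
// Smallest allocation is 32 bytes; sizes double from there (32, 64, 128, ...).
const MIN_ALLOCATION = 32;

// Return the power-of-2 allocation for a record that needs `bytes` bytes.
// powerOf2Allocation is a hypothetical helper, not a MongoDB function.
function powerOf2Allocation(bytes) {
  if (!Number.isInteger(bytes) || bytes < 0) {
    throw new RangeError("bytes must be a non-negative integer");
  }
  let size = MIN_ALLOCATION;
  while (size < bytes) {
    size *= 2;
  }
  return size;
}

for (const bytes of [10, 32, 33, 200, 1000]) {
  console.log(bytes + " bytes needed -> " + powerOf2Allocation(bytes) + " bytes allocated");
}
```

Under this model a 200-byte record lands in a 256-byte slot, so it can grow by 56 bytes before it has to move.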

Storage

- Two advantages/goals:
  1. The limited number of record allocation sizes makes it easier for mongo to reuse existing allocations, reducing fragmentation
  2. The space allocated for each document is usually larger than the data it holds. This allows documents to grow while minimizing the chance that mongo will need to allocate new space as data is added to a document.

Storage

- Power of 2 sizes replaces the previous “Exact Fit” allocation strategy
  - Exact Fit allocated the exact size needed plus a small (configurable) padding factor
  - It was inefficient for heavy write operations and for reallocating space

Sharding & Replication

- Ability to merge chunks
  - Chunks must be contiguous
  - Chunks must be on same shard
  - One chunk must be empty
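
The three merge conditions above can be expressed as a simple check. The chunk shape used here (`shard`, `min`, `max`, `docCount`, with numeric shard-key bounds) and the function name `canMergeChunks` are assumptions made for illustration, not the layout of the config database.

```javascript
// Hypothetical chunk shape: { shard, min, max, docCount } with numeric key bounds.
function canMergeChunks(a, b) {
  // Contiguous: one chunk's upper bound is the other's lower bound.
  const contiguous = a.max === b.min || b.max === a.min;
  // Both chunks must live on the same shard.
  const sameShard = a.shard === b.shard;
  // At least one of the two chunks must hold no documents.
  const oneEmpty = a.docCount === 0 || b.docCount === 0;
  return contiguous && sameShard && oneEmpty;
}

const emptyChunk = { shard: "shard0000", min: 0, max: 100, docCount: 0 };
const fullChunk = { shard: "shard0000", min: 100, max: 200, docCount: 50 };
const farChunk = { shard: "shard0000", min: 300, max: 400, docCount: 0 };

console.log(canMergeChunks(emptyChunk, fullChunk)); // adjacent, same shard, one empty
console.log(canMergeChunks(fullChunk, farChunk));   // gap between 200 and 300
```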

Sharding & Replication

Ability to remove orphaned documents

- Orphaned documents: documents on a shard that also exist in chunks on other shards as a result of failed migrations or incomplete migration cleanup due to abnormal shutdown
- Delete orphaned documents using cleanupOrphaned to reclaim disk space and reduce confusion

Sharding & Replication

Ability to remove orphaned documents

- Must be run on admin db of the primary member of a replica set (NOT mongos)

      db.runCommand( {
          "cleanupOrphaned": "test.info",
          "startingAtKey": { x: 10 },
          "secondaryThrottle": true
      } )

Security

- Integration (Enterprise Edition Only)
  - Kerberos introduced in 2.4
  - 2.6 adds LDAP and x.509 protocols
- There’s also a Windows Enterprise Edition now
  - Linux Enterprise introduced in 2.4

Security

- User-Defined Roles & Collection Level Access
  - Before: readonly and full admin were the only options (per database)
- 2.6 adds Role-Based Access Control
  - Separate upgrade
- Users are granted Roles
- Roles have Privileges
- Privileges are an action and a resource
  - ex: Update (action) on product db (resource)
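
The user, role, and privilege relationship above can be modeled with plain objects. Everything in this sketch is hypothetical (the role names, the users, and the `isAllowed` helper), and it only matches on database name, whereas real privileges can also target collections or the cluster as a whole.

```javascript
// Roles map a role name to a list of privileges (an action plus a resource).
const roles = {
  productEditor: [
    { action: "update", resource: { db: "product" } },
    { action: "find", resource: { db: "product" } },
  ],
  reporter: [{ action: "find", resource: { db: "reports" } }],
};

// Users are granted roles by name.
const users = {
  alice: ["productEditor"],
  bob: ["reporter"],
};

// A user may perform an action if any granted role carries a matching privilege.
function isAllowed(user, action, db) {
  const granted = users[user] || [];
  return granted.some((roleName) =>
    (roles[roleName] || []).some(
      (privilege) => privilege.action === action && privilege.resource.db === db
    )
  );
}

console.log(isAllowed("alice", "update", "product")); // alice holds productEditor
console.log(isAllowed("bob", "update", "product"));   // bob only holds reporter
```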

Security

Built in Database Roles:

- read (read only access)
- readWrite (CRUD, create, rename, and drop collections, create and drop indexes)
- dbAdmin (read access to system.profile collection – weirdly specific, but ok)
- userAdmin (create and modify roles and users)
- dbOwner (readWrite + dbAdmin + userAdmin)

Security

Built in Cluster Roles (create on admin DB):

- clusterManager (add/remove shards, change replset and cluster config, manage chunks, etc.)
- clusterMonitor (read access to cluster admin info)
- hostManager (misc admin commands: killop/shutdown/repairDatabase)
- clusterAdmin (all of the above + dropDatabase)

Security

Other Roles (admin DB):

- backup, restore (mongodump/mongorestore)
- readAnyDatabase, readWriteAnyDatabase, userAdminAnyDatabase, dbAdminAnyDatabase
- root (readWriteAnyDatabase, dbAdminAnyDatabase, userAdminAnyDatabase, clusterAdmin)

Security

Custom Roles:

    db.runCommand({
        createRole: "myClusterwideAdmin",
        privileges: [
            { resource: { cluster: true }, actions: [ "addShard" ] },
            { resource: { db: "config", collection: "" }, actions: [ "find", "update", "insert", "remove" ] },
            { resource: { db: "users", collection: "usersCollection" }, actions: [ "update", "insert", "remove" ] },
            { resource: { db: "", collection: "" }, actions: [ "find" ] }
        ],
        roles: [ { role: "read", db: "admin" } ]
    })

LOTS of new stuff here – check out documentation

Security

User Creation Example:

    use products
    db.createUser( {
        "user" : "accountAdmin01",
        "pwd": "cleartext password",
        "customData" : { employeeId: 12345 },
        "roles" : [
            { role: "myClusterwideAdmin", db: "admin" },
            { role: "readAnyDatabase", db: "admin" },
            "readWrite"
        ]
    } )

This user has readWrite permissions on the products DB, read permissions on all DBs, and the permissions of the role we created earlier.

Miscellaneous

- $min & $max conditional updates
  - Ex: db.scores.update( { _id: 1 }, { $min: { lowScore: 150 } } )
- Enhancements to 2dsphere indexes
- rs.printReplicationInfo() and rs.printSlaveReplicationInfo() – human readable helper methods
- mongoexport supports --skip, --limit, --sort
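
The $min and $max conditional updates listed above only change a field when the new value is lower (for $min) or higher (for $max) than the stored one, and they set the field if it is missing. The sketch below mimics that behavior on plain objects. The helpers `applyMin` and `applyMax` and the sample document are made up, and the comparison is plain numeric comparison rather than MongoDB's full BSON ordering.

```javascript
// Mimic $min: store the value only if the field is missing or the value is smaller.
function applyMin(doc, field, value) {
  if (!(field in doc) || value < doc[field]) {
    return { ...doc, [field]: value };
  }
  return doc;
}

// Mimic $max: store the value only if the field is missing or the value is larger.
function applyMax(doc, field, value) {
  if (!(field in doc) || value > doc[field]) {
    return { ...doc, [field]: value };
  }
  return doc;
}

// Hypothetical scores document.
const score = { _id: 1, highScore: 800, lowScore: 200 };

console.log(applyMin(score, "lowScore", 150));  // lowScore drops to 150
console.log(applyMin(score, "lowScore", 250));  // unchanged, 250 is not lower than 200
console.log(applyMax(score, "highScore", 950)); // highScore rises to 950
```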

The Future

“You’ll see the benefits in better performance and new innovations. We re-wrote the entire query execution engine to improve scalability, and took our first step in building a sophisticated query planner by introducing index intersection. We’ve made the codebase easier to maintain, and made it easier to implement new features. Finally, MongoDB 2.6 lays the foundation for massive improvements to concurrency in MongoDB 2.8, including document-level locking.”

- Eliot Horowitz, CTO and Co-Founder, MongoDB
