Mongo db mug_2012-02-07

Post on 26-Jun-2015

229 views 0 download

description

Data extraction for reporting on Mongo using

Transcript of Mongo db mug_2012-02-07

O Where Clause, Where Clause! Wherefore art thou Where Clause?

(a.k.a. Aggregation for Reporting)

Overview

• Discuss and demonstrate aggregating data • Specifically addresses reporting needs• Example study: Aggregating Video Game Stats

Disclaimer!

Kills

Sales

Player

Product

Dataset{

"_id" : ObjectId("50fc77ee364c74eba1afe1e3"),"fragdate" : ISODate("2012-12-24T00:00:19.901Z"),"gameId" : 1221,"gameName" : "Christmas Blitz","kill" : {

"_id" : ObjectId("50acfd45712e8bc7832ea7cb"),"username" : "player002","avatar" : "avatar.com/player002.png","displayname" : "Sniper the Clown","rank" : "Sniper","motto" : "If you run, you'll just die tired."

},"player" : {

"userid" : 1,"username" : "ArmyD00d1221","avatar" : "avatar.com/armyd00d1221.png","displayname" : "Army Grunt"

},"server" : "app01.fragzilla.com"

}

Aggregate the number of times each

player was killed

Report Details{

"_id" : ObjectId("50fc77ee364c74eba1afe1e3"),"fragdate" : ISODate("2012-12-24T00:00:19.901Z"),"gameId" : 1221,"gameName" : "Christmas Blitz","kill" : {

"_id" : ObjectId("50acfd45712e8bc7832ea7cb"),"username" : "player002","avatar" : "avatar.com/player002.png","displayname" : "Sniper the Clown","rank" : "Sniper","motto" : "If you run, you'll just die tired."

},"player" : {

"userid" : 1,"username" : "ArmyD00d1221","avatar" : "avatar.com/armyd00d1221.png","displayname" : "Army Grunt"

},"server" : "app01.fragzilla.com"

}

Only aggregate kills on these three players:• Sniper the Clown• Kurious Killer• My L1ttl3 P0wn13

Only on Dec 23, 2012Between 2pm and 10pm

Relational DB

IdfragDategameIDgameNameserverfkKilledfkPlayer

Kills IdusernameavatardisplayNamerankmotto

Killed

IdusernameavatardisplayNamerankmotto

Player

Could be the same table

Relational DB

IdfragDategameIDgameNameserverfkKilledfkPlayer

Kills IdusernameavatardisplayNamerankmotto

Killed

IdusernameavatardisplayNamerankmotto

Player

Could be the same table

SELECT tk.fragDate, k.id, count(k.id) FROM test.kills tkJOIN players p ON tk.fkPlayer = p.id JOIN killed k ON tk.fkKilled = k.idWHERE k.id IN (1,2,3)GROUP BY fragDate, k.id;

Sidenote: Exploration

• Software Engineering tends to have more clearly defined goals

• Report Engineering tends to have more clearly defined questions

Query From The Shell

Output

Next Step: Delimited Output

Display in Excel, R, Processing, etc

Aggregation: Big Picture

• Somewhere between Mongo Queries and Map/Reduce implementations

• Best suited for totaling and averaging functions• Similar functionality to SQL Group By clause

Complexity

MongoQueries

MongoAggregation Framework

Map/ReduceImplementations

Anatomy of Aggregation Framework

db.collection.aggregate( [ {do something},

{do something else},{do even more stuff}

])

Aggregate command

Pipeline Operators

Pipeline Operators

• Pipelines: transforms documents from the collection as they pass through– grep e server.log | less

• Expressions: produce output documents based on calculations performed on input documents

Pipelines:

$project$match$limit$skip$unwind$group$sort

Expressions$group Operators:$addToSet$first$last$max$min$avg$sum

Boolean Operators:$and$or$not

Comparison Operators:$cmp$eq$gt$lt$ne

Arithmetic Operators:$add$subtract$multiply$divide

String Operators:$strcasecmp$substr$toLower$toUpper

Date Operators:$year$month$hour

See http://docs.mongodb.org/manual/reference/aggregation/#aggregation-expression-operatorsFor an exhaustive list

Our Aggregation Query

Our Aggregation Query

All the magic goes between the []

Our Aggregation Query

$match:Provides a query-like interface to filter

documents out of the aggregation pipeline. The $match drops

documents that do not match the condition from the aggregation

pipeline, and it passes documents that match along the pipeline unaltered.

Our Aggregation Query

$project: Reshapes a document stream by renaming, adding, or

removing fields. Also use $project to create computed values or sub-

objects

Our Aggregation Query

$groupGroups documents together for the

purpose of calculating aggregate values based on a collection of

documents. Practically, group often supports tasks such as average page

views for each page in a website on a daily basis.

Our Aggregation Query

“numKills”: { $sum: “$numKills” }

Our Aggregation Query$sort

The $sort pipeline operator sorts all input documents and returns them to

the pipeline in sorted order.

{ $sort : { <sort-key> } }

Aggregation Output

{"result" : [

{"_id" : {"displayname" : "My L1ttl3 P0wn13","eventhour" : 21},"numKills" : 133},{"_id" : {"displayname" : "Kurious Killer","eventhour" : 21},"numKills" : 130},

// ******* Omitted for brevity *******{"_id" : {"displayname" : "Sniper the Clown","eventhour" : 2},"numKills" : 6}

],"ok" : 1

}

Produces a document with two fields: result and ok

Aggregation Output

{"result" : [

{"_id" : {"displayname" : "My L1ttl3 P0wn13","eventhour" : 21},"numKills" : 133},{"_id" : {"displayname" : "Kurious Killer","eventhour" : 21},"numKills" : 130},

// ******* Omitted for brevity *******{"_id" : {"displayname" : "Sniper the Clown","eventhour" : 2},"numKills" : 6}

],"ok" : 1

}

Recap: Aggregation Framework

db.collection.aggregate( [ {do something},

{do something else},{do even more stuff}

])

Aggregate command

Pipeline Operators

We’re not quite done…

We can’t really give something like this to our customers:

But if we had…

A database config.

But if we had…

To run our aggregation.

But if we had…

Inside a node server.

Q/A/Comments

Will Buttonwillb@mylist.com@wfbutton