Webinar: Exploring the Aggregation Framework
Transcript of Webinar: Exploring the Aggregation Framework
![Page 1: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/1.jpg)
Exploring the Aggregation Framework
Jason Mimick - Senior Consulting [email protected] @jmimick
Original Slide Credits: Jay Runkel [email protected] al
![Page 2: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/2.jpg)
2
Warning or Whew
This is a “101” beginner talk!
Assuming you know some basics about MongoDB
But basically nothing about the Aggregation Framework
![Page 3: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/3.jpg)
3
Agenda
1. Analytics in MongoDB?
2. Aggregation Framework
3. Aggregation Framework in Action
– US Census Data
– Aggregation Framework Options
4. New 3.2 stuff
– Friends of friends $lookup for self-joins
![Page 4: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/4.jpg)
4
Analytics in MongoDB?
Create, Read, Update, Delete
Analytics
?
Group, Count, Derive Values, Filter, Average, Sort
![Page 5: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/5.jpg)
5
For Example: US Census Data
• Census data from 1990, 2000, 2010
• Question: Which US Division has the fastest growing population density?
– We only want to include data from states with more than 1M people
– We only want to include divisions larger than 100K square miles
Division = a group of US States
Population density = # of people / area of division
Data is provided at the state level
![Page 6: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/6.jpg)
6
US Regions and Divisions
![Page 7: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/7.jpg)
7
How would we solve this in SQL?
SELECT GROUP BY HAVING
Of course, we don’t have SQL
we’re a noSQL database
![Page 8: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/8.jpg)
8
The Aggregation Framework
![Page 9: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/9.jpg)
9
Core Concept: Pipeline
ps -ef | grep mongod
![Page 10: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/10.jpg)
10
What is the Aggregation Pipeline?
A Series of Document Transformations
– Executed in stages
– Original input is a collection
– Output as a cursor or a collection
Rich Library of Functions
– Filter, compute, group, and summarize data
– Output of one stage sent to input of next
– Operations executed in sequential order
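The "output of one stage sent to input of next" idea can be sketched in plain JavaScript, outside MongoDB. This is only an illustration under stated assumptions: `runPipeline`, `matchStage`, and `sortStage` are hypothetical helper names, the documents are made up, and nothing here reflects how the server actually executes a pipeline.

```javascript
// Plain JavaScript illustration of a pipeline; this is NOT how MongoDB implements it.
// Each stage is a function from an array of documents to an array of documents.
function runPipeline(docs, stages) {
  // Feed the output of each stage into the next one, in order.
  return stages.reduce((current, stage) => stage(current), docs);
}

// Hypothetical stages, loosely mirroring $match and $sort.
const matchStage = (pred) => (docs) => docs.filter(pred);
const sortStage = (key) => (docs) => [...docs].sort((a, b) => a[key] - b[key]);

// Made-up input documents.
const input = [
  { name: "c", size: 30 },
  { name: "a", size: 10 },
  { name: "b", size: 20 },
];

const pipelineResult = runPipeline(input, [
  matchStage((d) => d.size >= 20),
  sortStage("size"),
]);
console.log(pipelineResult); // documents "b" then "c"
```

Swapping the order of the two stages gives the same result here, but filtering first means the sort has fewer documents to handle.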
![Page 11: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/11.jpg)
11
An Example Aggregation Pipeline
![Page 12: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/12.jpg)
12
Syntax
mongo shell:
> db.foo.aggregate( [ { stage1 }, { stage2 }, { stage3 }, … ] )
1. db - variable pointing to current database
2. foo - collection name
3. aggregate - method on collection
4. array of objects, each a pipeline operator
5. pipeline operators ({ stage1 }, { stage2 }, …)
![Page 13: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/13.jpg)
13
Syntax - Driver - Java
db.hospital.aggregate( [
  { "$group" : { "_id" : "$PatientID", "count" : { "$sum" : 1 } } },
  { "$match" : { "count" : { "$gte" : 5 } } },
  { "$sort" : { "count" : -1 } }
] )
![Page 14: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/14.jpg)
14
Some Popular Pipeline Operators
$match Filter documents
$project Reshape documents
$group Summarize documents
$unwind Expand arrays in documents
$sort Order documents
$limit/$skip Paginate documents
$redact Restrict documents
$geoNear Proximity sort documents
$let,$map Define variables
![Page 15: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/15.jpg)
15
80+ operators available as of MongoDB 3.2
![Page 16: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/16.jpg)
Aggregation Framework in Action
(let’s play with the census data)
![Page 17: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/17.jpg)
17
cData Collection
• Document For Each State
– Name
– Region
– Division
• Census Data For 1990, 2000, 2010
– Population
– Housing Units
– Occupied Housing Units
• Census Data is an array with three subdocuments
![Page 18: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/18.jpg)
18
Count, Distinct
• Check out cData docs
• count()
• distinct()
When you start building your aggregations you need to ‘get to know’ your data!
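As a rough picture of what count() and distinct() report, here is a plain JavaScript sketch over a small in-memory array. The four documents are a hand-picked subset (state and region names as they appear in this talk), and the variable names are assumptions for illustration; the real helpers run on the server against a collection.

```javascript
// Plain JavaScript stand-ins for count() and distinct(); not the shell helpers themselves.
// A hand-picked subset of region documents.
const regionDocs = [
  { state: "Connecticut", region: "Northeast" },
  { state: "New York", region: "Northeast" },
  { state: "Oregon", region: "West" },
  { state: "Texas", region: "South" },
];

// count(): how many documents are there?
const docCount = regionDocs.length;

// distinct("region"): which different values does a field take?
const distinctRegions = [...new Set(regionDocs.map((d) => d.region))];

console.log(docCount);        // 4
console.log(distinctRegions); // [ 'Northeast', 'West', 'South' ]
```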
![Page 19: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/19.jpg)
19
Simple $group
Census data has a collection called regions
> db.regions.findOne()
{
  "_id" : ObjectId("54d0e1ac28099359f5660f9f"),
  "state" : "Connecticut",
  "region" : "Northeast",
  "regNum" : 1,
  "division" : "New England",
  "divNum" : 1
}
How can we find out how many states are in each region?
![Page 20: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/20.jpg)
20
> db.regions.aggregate( [ { "$group" : { "_id" : "$region", "count" : { "$sum" : 1 } } } ] )
{ "_id" : "West", "count" : 13 }
{ "_id" : "South", "count" : 17 }
{ "_id" : "Midwest", "count" : 12 }
{ "_id" : "Northeast", "count" : 9 }
// make more readable - store your pipeline ops in variables
> var group = { "$group" : { "_id" : "$region", "count" : { "$sum" : 1 } } };
> db.regions.aggregate( [ group ] )
![Page 21: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/21.jpg)
21
$group
• Group documents by value
– _id - field reference, object, constant
– Other output fields are computed
• $max, $min, $avg, $sum
• $addToSet, $push
• $first, $last
– Processes all data in memory by default
![Page 22: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/22.jpg)
22
Total US Area
Back to cData…
Can we use $group to find the total area of the US (according to these data)?
![Page 23: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/23.jpg)
23
db.cData.aggregate([{"$group" : {"_id" : null,
"totalArea" : {$sum : "$areaM"}, "avgArea" : {$avg : "$areaM"} }}])
{ "_id" : null, "totalArea" : 3802067.0700000003, "avgArea" : 73116.67442307693 }
![Page 24: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/24.jpg)
24
Area By Region
db.cData.aggregate([
  {"$group" : {"_id" : "$region",
               "totalArea" : {$sum : "$areaM"},
               "avgArea" : {$avg : "$areaM"},
               "numStates" : {$sum : 1},
               "states" : {$push : "$name"}}}
])
{ "_id" : null, "totalArea" : 5393.18, "avgArea" : 2696.59, "numStates" : 2, "states" : [ "District of Columbia", "Puerto Rico" ] }
{ "_id" : "Northeast", "totalArea" : 181319.86, "avgArea" : 20146.65111111111, "numStates" : 9, "states" : [ "New Jersey", "Vermont", "Maine", "New Hampshire", "Rhode Island", "Pennsylvania", "Connecticut", "Massachusetts", "New York" ] }
{ "_id" : "Midwest", "totalArea" : 821724.3700000001, "avgArea" : 68477.03083333334, "numStates" : 12, "states" : [ "Iowa", "Missouri", "Ohio", "Indiana", "North Dakota", "Wisconsin", "Illinois", "Minnesota", "Kansas", "South Dakota", "Michigan", "Nebraska" ] }
{ "_id" : "West", "totalArea" : 1873251.6300000001, "avgArea" : 144096.27923076923, "numStates" : 13, "states" : [ "Colorado", "Wyoming", "California", "Utah", "Nevada", "Alaska", "Hawaii", "Montana", "New Mexico", "Arizona", "Idaho", "Oregon", "Washington" ] }
{ "_id" : "South", "totalArea" : 920378.03, "avgArea" : 57523.626875, "numStates" : 16, "states" : [ "Alabama", "Georgia", "Maryland", "South Carolina", "Florida", "Mississippi", "Arkansas", "Louisiana", "North Carolina", "Texas", "West Virginia", "Oklahoma", "Virginia", "Delaware", "Kentucky", "Tennessee" ] }
![Page 25: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/25.jpg)
25
Calculating Average State Area By Region
Input documents:
{ state: "New York", areaM: 218, region: "North East"}
{ state: "New Jersey", areaM: 90, region: "North East"}
{ state: "California", areaM: 300, region: "West"}
Stage:
{ $group: { _id: "$region", avgAreaM: {$avg: "$areaM" }}}
Output documents:
{ _id: "North East", avgAreaM: 154}
{ _id: "West", avgAreaM: 300}
![Page 26: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/26.jpg)
26
Calculating Total Area and State Count
Input documents:
{ state: "New York", areaM: 218, region: "North East"}
{ state: "New Jersey", areaM: 90, region: "North East"}
{ state: "California", areaM: 300, region: "West"}
Stage:
{ $group: { _id: "$region", totArea: {$sum: "$areaM" }, sCount : {$sum : 1}}}
Output documents:
{ _id: "North East", totArea: 308, sCount: 2}
{ _id: "West", totArea: 300, sCount: 1}
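To check the numbers on the two toy slides above, here is a plain JavaScript re-creation of what the $group stage computes. `groupByRegion` is a hypothetical helper written for this sketch, not a MongoDB API, and it handles only the three accumulators used here.

```javascript
// Plain JavaScript re-creation of the $group arithmetic on the toy documents.
const states = [
  { state: "New York", areaM: 218, region: "North East" },
  { state: "New Jersey", areaM: 90, region: "North East" },
  { state: "California", areaM: 300, region: "West" },
];

// groupByRegion is a hypothetical helper, not a MongoDB API.
function groupByRegion(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // One accumulator object per distinct region value (the group _id).
    const g = groups.get(doc.region) || { _id: doc.region, totArea: 0, sCount: 0 };
    g.totArea += doc.areaM; // like {$sum: "$areaM"}
    g.sCount += 1;          // like {$sum: 1}
    groups.set(doc.region, g);
  }
  // Derive the average once the sums are complete, like {$avg: "$areaM"}.
  return [...groups.values()].map((g) => ({ ...g, avgAreaM: g.totArea / g.sCount }));
}

const grouped = groupByRegion(states);
console.log(grouped);
// North East: totArea 308, sCount 2, avgAreaM 154
// West:       totArea 300, sCount 1, avgAreaM 300
```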
![Page 27: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/27.jpg)
27
Total US Population By Year
db.cData.aggregate( [
  {$unwind : "$data"},
  {$group : {"_id" : "$data.year", "totalPop" : {$sum : "$data.totalPop"}}},
  {$sort : {"totalPop" : 1}}
])
{ "_id" : 1990, "totalPop" : 248709873 }
{ "_id" : 2000, "totalPop" : 281421906 }
{ "_id" : 2010, "totalPop" : 312471327 }
![Page 28: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/28.jpg)
28
$unwind
• Flattens arrays
• Create documents from array elements
• Array replaced by element value
• Missing/empty fields → no output
• Non-array fields → error (before 3.2; from 3.2 on, treated as a single-element array)
• Pipe to $group to aggregate
{ "a" : "foo", "b" : [1, 2, 3] }
→
{ "a" : "foo", "b" : 1 }
{ "a" : "foo", "b" : 2 }
{ "a" : "foo", "b" : 3 }
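The flattening rule can be sketched in plain JavaScript. `unwind` is a hypothetical helper, not MongoDB code; it wraps a non-array value as a single element, which is the behavior documented for 3.2 (earlier releases raised an error instead).

```javascript
// Plain JavaScript sketch of $unwind semantics; unwind is a hypothetical helper.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    // Missing or null fields produce no output documents.
    if (value === undefined || value === null) continue;
    // Treat a non-array value as a single-element array (the 3.2 behavior).
    // An empty array yields no elements, so it also produces no output.
    const elements = Array.isArray(value) ? value : [value];
    for (const element of elements) {
      // Copy the document, replacing the array with one element.
      out.push({ ...doc, [field]: element });
    }
  }
  return out;
}

const unwound = unwind([{ a: "foo", b: [1, 2, 3] }, { a: "bar" }], "b");
console.log(unwound);
// [ { a: 'foo', b: 1 }, { a: 'foo', b: 2 }, { a: 'foo', b: 3 } ]
```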
![Page 29: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/29.jpg)
29
$unwind
{ $unwind: "$census" }
Input documents:
{ state: "New York", census: [1990, 2000, 2010]}
{ state: "New Jersey", census: [1990, 2000]}
{ state: "California", census: [1980, 1990, 2000, 2010]}
{ state: "Delaware", census: [1990, 2000]}
Output documents:
{ state: "New York", census: 1990}
{ state: "New York", census: 2000}
{ state: "New York", census: 2010}
{ state: "New Jersey", census: 1990}
{ state: "New Jersey", census: 2000}
…
![Page 30: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/30.jpg)
30
Southern State Population By Year
db.cData.aggregate( [
  {$match : {"region" : "South"}},
  {$unwind : "$data"},
  {$group : {"_id" : "$data.year", "totalPop" : {"$sum" : "$data.totalPop"}}}
])
{ "_id" : 2010, "totalPop" : 113954021 }
{ "_id" : 2000, "totalPop" : 99664761 }
{ "_id" : 1990, "totalPop" : 84839030 }
![Page 31: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/31.jpg)
31
$match
• Filter documents
– Uses existing query syntax
![Page 32: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/32.jpg)
32
$match
{ $match: { "region" : "West" }}
Input documents:
{ state: "New York", areaM: 218, region: "North East"}
{ state: "Oregon", areaM: 245, region: "West"}
{ state: "California", areaM: 300, region: "West"}
Output documents:
{ state: "Oregon", areaM: 245, region: "West"}
{ state: "California", areaM: 300, region: "West"}
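The filtering on this slide can be mimicked in plain JavaScript. `matchEq` is a hypothetical helper that supports only simple equality conditions; real $match accepts the full query syntax.

```javascript
// Plain JavaScript sketch of an equality-only $match; matchEq is a hypothetical helper.
function matchEq(docs, query) {
  // Keep a document only if every queried field equals the requested value.
  return docs.filter((doc) =>
    Object.entries(query).every(([field, wanted]) => doc[field] === wanted)
  );
}

const matchInput = [
  { state: "New York", areaM: 218, region: "North East" },
  { state: "Oregon", areaM: 245, region: "West" },
  { state: "California", areaM: 300, region: "West" },
];

const western = matchEq(matchInput, { region: "West" });
console.log(western.map((d) => d.state)); // [ 'Oregon', 'California' ]
```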
![Page 33: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/33.jpg)
33
Population Delta By State from 1990 to 2010
db.cData.aggregate([
  {$unwind : "$data"},
  {$sort : {"data.year" : 1}},
  {$group : {"_id" : "$name",
             "pop1990" : {"$first" : "$data.totalPop"},
             "pop2010" : {"$last" : "$data.totalPop"}}},
  {$project : {"_id" : 0,
               "name" : "$_id",
               "delta" : {"$subtract" : ["$pop2010", "$pop1990"]},
               "pop1990" : 1,
               "pop2010" : 1}}
])
![Page 34: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/34.jpg)
34
{ "pop1990" : 3725789, "pop2010" : 3725789, "name" : "Puerto Rico", "delta" : 0 }
{ "pop1990" : 4866692, "pop2010" : 6724540, "name" : "Washington", "delta" : 1857848 }
{ "pop1990" : 4877185, "pop2010" : 6346105, "name" : "Tennessee", "delta" : 1468920 }
{ "pop1990" : 1227928, "pop2010" : 1328361, "name" : "Maine", "delta" : 100433 }
{ "pop1990" : 1006749, "pop2010" : 1567582, "name" : "Idaho", "delta" : 560833 }
{ "pop1990" : 1108229, "pop2010" : 1360301, "name" : "Hawaii", "delta" : 252072 }
{ "pop1990" : 3665228, "pop2010" : 6392017, "name" : "Arizona", "delta" : 2726789 }
{ "pop1990" : 638800, "pop2010" : 672591, "name" : "North Dakota", "delta" : 33791 }
{ "pop1990" : 6187358, "pop2010" : 8001024, "name" : "Virginia", "delta" : 1813666 }
{ "pop1990" : 550043, "pop2010" : 710231, "name" : "Alaska", "delta" : 160188 }
{ "pop1990" : 1109252, "pop2010" : 1316470, "name" : "New Hampshire", "delta" : 207218 }
{ "pop1990" : 10847115, "pop2010" : 11536504, "name" : "Ohio", "delta" : 689389 }
{ "pop1990" : 6016425, "pop2010" : 6547629, "name" : "Massachusetts", "delta" : 531204 }
{ "pop1990" : 6628637, "pop2010" : 9535483, "name" : "North Carolina", "delta" : 2906846 }
{ "pop1990" : 3287116, "pop2010" : 3574097, "name" : "Connecticut", "delta" : 286981 }
{ "pop1990" : 17990455, "pop2010" : 19378102, "name" : "New York", "delta" : 1387647 }
{ "pop1990" : 29760021, "pop2010" : 37253956, "name" : "California", "delta" : 7493935 }
{ "pop1990" : 16986510, "pop2010" : 25145561, "name" : "Texas", "delta" : 8159051 }
{ "pop1990" : 11881643, "pop2010" : 12702379, "name" : "Pennsylvania", "delta" : 820736 }
{ "pop1990" : 2842321, "pop2010" : 3831074, "name" : "Oregon", "delta" : 988753 }
![Page 35: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/35.jpg)
35
$sort, $limit, $skip
• Sort documents by one or more fields
– Same order syntax as cursors
– Waits for earlier pipeline operator to return
– In-memory unless early and indexed
• Limit and skip follow cursor behavior
![Page 36: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/36.jpg)
36
$first, $last
• Collection operations like $push and $addToSet
• Must be used in $group
• $first and $last determined by document order
• Typically used with $sort to ensure ordering is known
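Why the sort matters can be seen in a plain JavaScript sketch of "sort, then take the first and last value per group". The population figures are the 1990 and 2010 values from the output shown earlier; the 2000 rows are left out because the talk does not list them, and the variable names are assumptions for illustration.

```javascript
// Plain JavaScript sketch of $sort followed by $first/$last in a $group; not MongoDB code.
// Rows are deliberately out of order, as unwound documents may be.
const rows = [
  { name: "Washington", year: 2010, totalPop: 6724540 },
  { name: "Maine", year: 1990, totalPop: 1227928 },
  { name: "Washington", year: 1990, totalPop: 4866692 },
  { name: "Maine", year: 2010, totalPop: 1328361 },
];

// Sort by year first so that "first" and "last" have a defined meaning.
const byYear = [...rows].sort((a, b) => a.year - b.year);

const firstLast = new Map();
for (const row of byYear) {
  // The first row seen for a name fixes "first".
  const g = firstLast.get(row.name) || { name: row.name, first: row.totalPop };
  g.last = row.totalPop; // overwritten on every row, so it ends as the last value seen
  firstLast.set(row.name, g);
}

const deltas = [...firstLast.values()].map((g) => ({ name: g.name, delta: g.last - g.first }));
console.log(deltas);
// Maine: 100433, Washington: 1857848
```

Without the sort, Washington's "first" value would be its 2010 figure and the delta would come out negative.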
![Page 37: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/37.jpg)
37
$project
• Reshape/Transform Documents
– Include, exclude or rename fields
– Inject computed fields
– Create sub-document fields
![Page 38: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/38.jpg)
38
Including and Excluding Fields
{ $project: { "_id" : 0, "pop1990" : 1, "pop2010" : 1}}
Input documents:
{ "_id" : "Virginia", "pop1990" : 453588, "pop2010" : 3725789}
{ "_id" : "South Dakota", "pop1990" : 453588, "pop2010" : 3725789}
Output documents:
{ "pop1990" : 453588, "pop2010" : 3725789}
{ "pop1990" : 453588, "pop2010" : 3725789}
![Page 39: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/39.jpg)
39
Renaming and Computing Fields
{ $project: { "_id" : 0, "name" : "$_id", "delta" : {"$subtract" : ["$pop2010", "$pop1990"]}}}
Input documents:
{ "_id" : "Virginia", "pop1990" : 6187358, "pop2010" : 8001024}
{ "_id" : "South Dakota", "pop1990" : 696004, "pop2010" : 814180}
Output documents:
{ "name" : "Virginia", "delta" : 1813666}
{ "name" : "South Dakota", "delta" : 118176}
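The rename-and-compute projection on this slide amounts to building a new document per input. A plain JavaScript sketch using the slide's two documents (the variable names are assumptions for illustration, and this is not MongoDB code):

```javascript
// Plain JavaScript sketch of the rename-and-compute projection.
const grouped1990to2010 = [
  { _id: "Virginia", pop1990: 6187358, pop2010: 8001024 },
  { _id: "South Dakota", pop1990: 696004, pop2010: 814180 },
];

// Build a new document per input: rename _id to name, add a computed delta,
// and leave out every field that is not listed.
const projected = grouped1990to2010.map((doc) => ({
  name: doc._id,
  delta: doc.pop2010 - doc.pop1990,
}));

console.log(projected);
// [ { name: 'Virginia', delta: 1813666 }, { name: 'South Dakota', delta: 118176 } ]
```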
![Page 40: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/40.jpg)
40
Compare number of people living within 500KM of Memphis, TN in 1990, 2000, 2010
![Page 41: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/41.jpg)
41
Compare number of people living within 500KM of Memphis, TN in 1990, 2000, 2010
db.cData.aggregate([
  {$geoNear : {
    "near" : {"type" : "Point", "coordinates" : [90, 35]},
    "distanceField" : "dist.calculated",
    "maxDistance" : 500000,
    "includeLocs" : "dist.location",
    "spherical" : true }},
  {$unwind : "$data"},
  {$group : {"_id" : "$data.year",
             "totalPop" : {"$sum" : "$data.totalPop"},
             "states" : {"$addToSet" : "$name"}}},
  {$sort : {"_id" : 1}}
])
![Page 42: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/42.jpg)
42
{ "_id" : 1990, "totalPop" : 22644082, "states" : [ "Kentucky", "Missouri", "Alabama", "Tennessee", "Mississippi", "Arkansas" ] }
{ "_id" : 2000, "totalPop" : 25291421, "states" : [ "Kentucky", "Missouri", "Alabama", "Tennessee", "Mississippi", "Arkansas" ] }
{ "_id" : 2010, "totalPop" : 27337350, "states" : [ "Kentucky", "Missouri", "Alabama", "Tennessee", "Mississippi", "Arkansas" ] }
![Page 43: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/43.jpg)
43
$geoNear
• Order/Filter Documents by Location
– Requires a geospatial index
– Output includes physical distance
– Must be first aggregation stage
![Page 44: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/44.jpg)
44
$geoNear
{ $geoNear : { "near" : {"type" : "Point", "coordinates" : [90, 35]}, "maxDistance" : 500000, "spherical" : true }}
Input documents:
{ "_id" : "Tennessee", "pop1990" : 4877185, "pop2010" : 6346105, "center" : {"type" : "Point", "coordinates" : [86.6, 37.8]}}
{ "_id" : "Virginia", "pop1990" : 6187358, "pop2010" : 8001024, "center" : {"type" : "Point", "coordinates" : [78.6, 37.5]}}
Output documents:
{ "_id" : "Tennessee", "pop1990" : 4877185, "pop2010" : 6346105, "center" : {"type" : "Point", "coordinates" : [86.6, 37.8]}}
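To see why Tennessee passes the 500000 meter cut-off and Virginia does not, here is a plain JavaScript haversine sketch using the slide's coordinates. It is only an approximation under stated assumptions: `haversineMeters` is a hypothetical helper, `EARTH_RADIUS_M` is an assumed mean Earth radius, and the server's own spherical computation may differ slightly.

```javascript
// Great-circle (haversine) distance in meters between two [longitude, latitude] points.
// EARTH_RADIUS_M is an assumed mean radius, not a value taken from MongoDB.
const EARTH_RADIUS_M = 6371000;

function haversineMeters([lon1, lat1], [lon2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

const near = [90, 35]; // query point used in the talk
const centers = {
  Tennessee: [86.6, 37.8],
  Virginia: [78.6, 37.5],
};

// Keep only the states whose center lies within maxDistance (500000 meters).
const withinRange = Object.keys(centers).filter(
  (name) => haversineMeters(near, centers[name]) <= 500000
);
console.log(withinRange); // [ 'Tennessee' ]
```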
![Page 45: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/45.jpg)
45
What if I want to save the results to a collection?
db.cData.aggregate([
  {$geoNear : {
    "near" : {"type" : "Point", "coordinates" : [90, 35]},
    "distanceField" : "dist.calculated",
    "maxDistance" : 500000,
    "includeLocs" : "dist.location",
    "spherical" : true }},
  {$unwind : "$data"},
  {$group : {"_id" : "$data.year",
             "totalPop" : {"$sum" : "$data.totalPop"},
             "states" : {"$addToSet" : "$name"}}},
  {$sort : {"_id" : 1}},
  {$out : "peopleNearMemphis"}
])
![Page 46: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/46.jpg)
46
$out
db.cData.aggregate([ <pipeline stages>, {"$out" : "resultsCollection"} ])
• Save aggregation results to a new collection
• NOTE: Overwrites any data existing in collection
• Transform documents - ETL
![Page 47: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/47.jpg)
47
Back To The Original Question
• Which US Division has the fastest growing population density?
– We only want to include data from states with more than 1M people
– We only want to include divisions larger than 100K square miles
![Page 48: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/48.jpg)
48
Division with Fastest Growing Pop Density
db.cData.aggregate( [
  {$match : {"data.totalPop" : {"$gt" : 1000000}}},
  {$unwind : "$data"},
  {$sort : {"data.year" : 1}},
  {$group : {"_id" : "$name",
             "pop1990" : {"$first" : "$data.totalPop"},
             "pop2010" : {"$last" : "$data.totalPop"},
             "areaM" : {"$first" : "$areaM"},
             "division" : {"$first" : "$division"}}},
  {$group : {"_id" : "$division",
             "totalPop1990" : {"$sum" : "$pop1990"},
             "totalPop2010" : {"$sum" : "$pop2010"},
             "totalAreaM" : {"$sum" : "$areaM"}}},
  {$match : {"totalAreaM" : {"$gt" : 100000}}},
  {$project : {"_id" : 0,
               "division" : "$_id",
               "density1990" : {"$divide" : ["$totalPop1990", "$totalAreaM"]},
               "density2010" : {"$divide" : ["$totalPop2010", "$totalAreaM"]},
               "denDelta" : {"$subtract" : [{"$divide" : ["$totalPop2010", "$totalAreaM"]},
                                            {"$divide" : ["$totalPop1990", "$totalAreaM"]}]},
               "totalAreaM" : 1,
               "totalPop1990" : 1,
               "totalPop2010" : 1}},
  {$sort : {"denDelta" : -1}}
])
![Page 49: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/49.jpg)
49
{ "totalPop1990" : 42293785, "totalPop2010" : 58277380, "totalAreaM" : 290433.39999999997, "division" : "South Atlantic", "density1990" : 145.62300685802668, "density2010" : 200.6566049221612, "denDelta" : 55.03359806413451 }
{ "totalPop1990" : 38577263, "totalPop2010" : 49169871, "totalAreaM" : 344302.94999999995, "division" : "Pacific", "density1990" : 112.0445322934352, "density2010" : 142.80990331334658, "denDelta" : 30.765371019911385 }
{ "totalPop1990" : 37602286, "totalPop2010" : 40872375, "totalAreaM" : 109331.91, "division" : "Mid-Atlantic", "density1990" : 343.9278249140621, "density2010" : 373.8375648975674, "denDelta" : 29.90973998350529 }
{ "totalPop1990" : 26702793, "totalPop2010" : 36346202, "totalAreaM" : 444052.01, "division" : "West South Central", "density1990" : 60.134381555890265, "density2010" : 81.85122729204626, "denDelta" : 21.716845736155996 }
{ "totalPop1990" : 15176284, "totalPop2010" : 18432505, "totalAreaM" : 183403.9, "division" : "East South Central", "density1990" : 82.74788049763391, "density2010" : 100.50225213313348, "denDelta" : 17.754371635499567 }
{ "totalPop1990" : 42008942, "totalPop2010" : 46421564, "totalAreaM" : 301368.57, "division" : "East North Central", "density1990" : 139.39390560867048, "density2010" : 154.03585052017866, "denDelta" : 14.641944911508176 }
{ "totalPop1990" : 12406123, "totalPop2010" : 20512410, "totalAreaM" : 618711.92, "division" : "Mountain", "density1990" : 20.051533838236054, "density2010" : 33.153410071685705, "denDelta" : 13.101876233449651 }
{ "totalPop1990" : 16324886, "totalPop2010" : 19018666, "totalAreaM" : 372541.8, "division" : "West North Central", "density1990" : 43.820280033005695, "density2010" : 51.05109279012449, "denDelta" : 7.230812757118798 }
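The final $project arithmetic can be checked by hand for the top result. This plain JavaScript sketch re-computes the densities for South Atlantic from the totals printed above; the variable names are chosen for illustration.

```javascript
// Re-compute density and density change for one division from the printed totals.
const southAtlantic = {
  totalPop1990: 42293785,
  totalPop2010: 58277380,
  totalAreaM: 290433.39999999997,
};

// Population density = people per unit of area.
const density1990 = southAtlantic.totalPop1990 / southAtlantic.totalAreaM;
const density2010 = southAtlantic.totalPop2010 / southAtlantic.totalAreaM;
const denDelta = density2010 - density1990;

console.log(density1990.toFixed(2), density2010.toFixed(2), denDelta.toFixed(2));
// 145.62 200.66 55.03
```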
![Page 50: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/50.jpg)
Aggregate Options
![Page 51: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/51.jpg)
51
Aggregate options
db.cData.aggregate([<pipeline stages>], {'explain' : false, 'allowDiskUse' : true, 'cursor' : {'batchSize' : 5}})
explain – similar to find().explain()
allowDiskUse – enable use of disk to store intermediate results
cursor – specify the size of the initial result batch (batchSize)
![Page 52: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/52.jpg)
New things in 3.2
![Page 53: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/53.jpg)
53
$sample
{ $sample: { size: <positive integer> } }
● If WT - pseudo-random cursor to return docs
● If MMAPv1 - uses _id index to randomly select docs
Used by Compass, Useful for unit tests, etc
![Page 54: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/54.jpg)
54
$lookup
• Performs a left outer join to another collection in the same database to filter in documents from the “joined” collection for processing.
• To each input document, the $lookup stage adds a new array field whose elements are the matching documents from the “joined” collection.
{ $lookup: { from: <collection to join>, localField: <field from the input documents>, foreignField: <field from the documents of the "from" collection>, as: <output array field> } }
• The “from” collection CANNOT BE SHARDED
https://docs.mongodb.org/master/reference/operator/aggregation/lookup/
![Page 55: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/55.jpg)
55
• Sample data:
> db.data.find()
{ "_id" : ObjectId("565e759ae6f9919371a53896"), "v" : 14, "k" : 0 }
{ "_id" : ObjectId("565e759ae6f9919371a53897"), "v" : 664, "k" : 1 }
{ "_id" : ObjectId("565e759ae6f9919371a53898"), "v" : 701, "k" : 1 }
{ "_id" : ObjectId("565e759ae6f9919371a53899"), "v" : 312, "k" : 1 }
{ "_id" : ObjectId("565e759ae6f9919371a5389a"), "v" : 10, "k" : 2 }
{ "_id" : ObjectId("565e759ae6f9919371a5389b"), "v" : 686, "k" : 0 }
{ "_id" : ObjectId("565e759ae6f9919371a5389c"), "v" : 669, "k" : 2 }
{ "_id" : ObjectId("565e759ae6f9919371a5389d"), "v" : 273, "k" : 2 }
{ "_id" : ObjectId("565e759ae6f9919371a5389e"), "v" : 473, "k" : 0 }
{ "_id" : ObjectId("565e759ae6f9919371a5389f"), "v" : 158, "k" : 2 }
> db.keys.find()
{ "_id" : 0, "name" : "East Meter" }
{ "_id" : 1, "name" : "Central Meter 12" }
{ "_id" : 2, "name" : "New HIFI Monitor" }
![Page 56: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/56.jpg)
56
• Try to find the average “v” value, but look up the name of “k”
db.data.aggregate( [
  { "$lookup" : { "from" : "keys", "localField" : "k", "foreignField" : "_id", "as" : "name" } },
  { "$unwind" : "$name" },
  { "$project" : { "k" : "$k", "name" : "$name.name", "v" : "$v" } },
  { "$group" : { "_id" : "$name", "aveValue" : { "$avg" : "$v" } } },
  { "$project" : { "_id" : 0, "name" : "$_id", "aveValue" : "$aveValue" } }
]);
{ "aveValue" : 277.5, "name" : "New HIFI Monitor"}
{ "aveValue" : 559, "name" : "Central Meter 12"}
{ "aveValue" : 391, "name" : "East Meter"}
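The averages above can be reproduced in plain JavaScript from the sample data. This is a sketch of the join-then-average logic only (ObjectIds are left out because they play no part in the result); it is not how the server executes $lookup.

```javascript
// Plain JavaScript sketch of the lookup-then-average pipeline; not MongoDB code.
const data = [
  { v: 14, k: 0 }, { v: 664, k: 1 }, { v: 701, k: 1 }, { v: 312, k: 1 },
  { v: 10, k: 2 }, { v: 686, k: 0 }, { v: 669, k: 2 }, { v: 273, k: 2 },
  { v: 473, k: 0 }, { v: 158, k: 2 },
];
const keys = [
  { _id: 0, name: "East Meter" },
  { _id: 1, name: "Central Meter 12" },
  { _id: 2, name: "New HIFI Monitor" },
];

// Like $lookup: attach an array of matching "keys" documents to each input document.
const joined = data.map((d) => ({ ...d, name: keys.filter((key) => key._id === d.k) }));

// Like $unwind + $group: one running sum and count per looked-up name.
const sums = new Map();
for (const doc of joined) {
  for (const key of doc.name) {
    const g = sums.get(key.name) || { sum: 0, n: 0 };
    g.sum += doc.v;
    g.n += 1;
    sums.set(key.name, g);
  }
}

const averages = [...sums].map(([name, g]) => ({ name, aveValue: g.sum / g.n }));
console.log(averages);
// East Meter: 391, Central Meter 12: 559, New HIFI Monitor: 277.5
```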
![Page 57: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/57.jpg)
57
friends of friends
Use $lookup to perform "self-joins" for graph problems.
Simple case: find the friends of someone's friends
Can extend this to find cliques, paths, etc.
Dataset:
{ "_id" : 1, "name" : "FLOYD", "friends" : [ "BILLIE", "MARGENE", "HERMINIA", "LACRESHA", "SHAUN", "INOCENCIA", "DEANA", "MARAGRET", "MICHELE", "KARLENE", "KASSANDRA", "JOAN", "HIRAM" ] }
{ "_id" : 2, "name" : "ELIDA", "friends" : [ "ALI", "KESHIA" ] }
...
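The self-join idea can be sketched in plain JavaScript on a tiny made-up graph. The names and the `friendsOfFriends` helper are hypothetical and stand in for the real dataset; the steps mirror a match, an unwind of the friends array, a lookup by name into the same collection, and a de-duplicating group.

```javascript
// Plain JavaScript sketch of a friends-of-friends self-join; names are made up.
const people = [
  { _id: 1, name: "ANN", friends: ["BOB", "CAT"] },
  { _id: 2, name: "BOB", friends: ["ANN", "DAN"] },
  { _id: 3, name: "CAT", friends: ["DAN", "EVE"] },
  { _id: 4, name: "DAN", friends: [] },
  { _id: 5, name: "EVE", friends: [] },
];

function friendsOfFriends(docs, id) {
  const start = docs.find((d) => d._id === id);
  if (!start) return [];
  const result = new Set();
  for (const friendName of start.friends) {
    // Self-join: look the friend up by name in the same collection.
    for (const friendDoc of docs.filter((d) => d.name === friendName)) {
      // Collect the friend's own friends; the Set removes duplicates (like $group).
      for (const fof of friendDoc.friends) result.add(fof);
    }
  }
  return [...result];
}

console.log(friendsOfFriends(people, 1)); // [ 'ANN', 'DAN', 'EVE' ]
```

Note that the starting person shows up in their own result, because nothing filters them out; an extra filtering step would be needed to drop them.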
![Page 58: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/58.jpg)
58
![Page 59: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/59.jpg)
59
don't forget your indexes…
Running FOF.friendsOfFriends(1)
2016-01-26T10:19:41.201-0500 I COMMAND [conn6] command friendship.friends command: aggregate { aggregate: "friends", pipeline: [ { $match: { _id: 1.0 } }, { $unwind: "$friends" }, { $lookup: { from: "friends", localField: "friends", foreignField: "name", as: "friendsOfFriends" } }, { $unwind: "$friendsOfFriends" }, { $unwind: "$friendsOfFriends.friends" }, { $group: { _id: "$friendsOfFriends.friends" } }, { $project: { friendOfFriend: "$_id", _id: 0.0 } } ], cursor: {} } cursorid:42505581740 keyUpdates:0 writeConflicts:0 numYields:0 reslen:3722 locks:{ Global: { acquireCount: { r: 1124 } }, Database: { acquireCount: { r: 562 } }, Collection: { acquireCount: { r: 562 } } } protocol:op_command 48ms
with indexes { "friends" : 1 } & { "name" : 1 }:
2016-01-26T10:17:45.167-0500 I COMMAND [conn6] command friendship.friends command: aggregate { aggregate: "friends", pipeline: [ { $match: { _id: 1.0 } }, { $unwind: "$friends" }, { $lookup: { from: "friends", localField: "friends", foreignField: "name", as: "friendsOfFriends" } }, { $unwind: "$friendsOfFriends" }, { $unwind: "$friendsOfFriends.friends" }, { $group: { _id: "$friendsOfFriends.friends" } }, { $project: { friendOfFriend: "$_id", _id: 0.0 } } ], cursor: {} } cursorid:39053867824 keyUpdates:0 writeConflicts:0 numYields:0 reslen:3722 locks:{ Global: { acquireCount: { r: 32 } }, Database: { acquireCount: { r: 16 } }, Collection: { acquireCount: { r: 16 } } } protocol:op_command 2ms
![Page 60: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/60.jpg)
60
lots of new mathematical operators
$stdDevSamp Calculates sample standard deviation. { $stdDevSamp: <array> }
$stdDevPop Calculates population standard deviation. { $stdDevPop: <array> }
$sqrt Calculates the square root. { $sqrt: <number> }
$abs Returns the absolute value of a number. { $abs: <number> }
$log Calculates the log of a number in the specified base. { $log: [ <number>, <base> ] }
$log10 Calculates the log base 10 of a number. { $log10: <number> }
$ln Calculates the natural log of a number. { $ln: <number> }
$pow Raises a number to the specified exponent. { $pow: [ <number>, <exponent> ] }
$exp Raises e to the specified exponent. { $exp: <number> }
$trunc Truncates a number to its integer. { $trunc: <number> }
$ceil Returns the smallest integer greater than or equal to the specified number. { $ceil: <number> }
$floor Returns the largest integer less than or equal to the specified number. { $floor: <number> }
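The difference between the two standard deviation operators is just the divisor. A plain JavaScript sketch with made-up readings (`stdDev` is a hypothetical helper, not a MongoDB function):

```javascript
// Population vs. sample standard deviation, the two flavors listed above.
function stdDev(values, sample) {
  const n = values.length;
  const mean = values.reduce((s, v) => s + v, 0) / n;
  const sumSq = values.reduce((s, v) => s + (v - mean) ** 2, 0);
  // Population divides by n; sample divides by n - 1.
  return Math.sqrt(sumSq / (sample ? n - 1 : n));
}

const readings = [2, 4, 4, 4, 5, 5, 7, 9]; // made-up values
console.log(stdDev(readings, false)); // 2
console.log(stdDev(readings, true));  // a little larger than 2
```

Use the population form when the values are the whole data set and the sample form when they are a sample of a larger one.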
![Page 61: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/61.jpg)
61
new array operators
$slice Returns a subset of an array. { $slice: [ <array>, <n> ] } or { $slice: [ <array>, <position>, <n> ] }
$arrayElemAt Returns the element at the specified array index. { $arrayElemAt: [ <array>, <idx> ] }
$concatArrays Concatenates arrays. { $concatArrays: [ <array1>, <array2>, ... ] }
$isArray Determines if the operand is an array. { $isArray: [ <expression> ] }
$filter Selects a subset of the array based on the condition. { $filter: { input: <array>, as: <string>, cond: <expression> } }
![Page 62: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/62.jpg)
Summary
![Page 63: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/63.jpg)
63
Analytics in MongoDB?
Create, Read, Update, Delete
Analytics
?
Group, Count, Derive Values, Filter, Average, Sort
YES!
![Page 64: Webinar: Exploring the Aggregation Framework](https://reader035.fdocuments.us/reader035/viewer/2022062522/5873ecf61a28abb1528b47bb/html5/thumbnails/64.jpg)
64
Framework Use Cases
• Basic aggregation queries
• Ad-hoc reporting
• Real-time analytics
• Visualizing and reshaping data