Introduction to the New Aggregation Framework

53
Aggregation Framework Jeremy Mikola @jmikola

Transcript of Introduction to the New Aggregation Framework

Page 1: Introduction to the New Aggregation Framework

Aggregation Framework

Jeremy Mikola@jmikola

Page 2: Introduction to the New Aggregation Framework
Page 3: Introduction to the New Aggregation Framework

MapReduce

Page 4: Introduction to the New Aggregation Framework

MapReduce

Page 5: Introduction to the New Aggregation Framework

MapReduceExtremely versatile, powerfulIntended for complex data analysisOverkill for simple aggregation tasks

AveragesSummationGrouping

Page 6: Introduction to the New Aggregation Framework

MapReduce in MongoDBImplemented with JavaScript

Single-threadedDifficult to debug

ConcurrencyAppearance of parallelismWrite locks

Page 7: Introduction to the New Aggregation Framework

Aggregation Framework

Page 8: Introduction to the New Aggregation Framework

Aggregation FrameworkDeclarative (BSON, not JavaScript)Implemented in C++Flexible and functional

Operation pipelineComputational expressions

Plays nice with sharding

Page 9: Introduction to the New Aggregation Framework

Pipeline

Page 10: Introduction to the New Aggregation Framework

PipelineProcess a stream of documents

Original input is a collectionFinal output is a result document

Series of operatorsFilter or transform dataInput/output chain

ps ax | grep mongod | head ­n 1

Page 11: Introduction to the New Aggregation Framework

Pipeline Operators$match$project$group$unwind$sort$limit$skip

Page 12: Introduction to the New Aggregation Framework

$matchFilter documentsPlace early in the pipeline if possibleUses existing query syntaxNo geospatial operations or $where

Page 13: Introduction to the New Aggregation Framework

$matchMatching Field Values

{  title: "The Great Gatsby",  pages: 218,  language: "English"}

{  title: "War and Peace",  pages: 1440,  language: "Russian"}

{  title: "Atlas Shrugged",  pages: 1088,  language: "English"}

{ $match: {  language: "Russian"}}

{  title: "War and Peace",  pages: 1440,  language: "Russian"}

Page 14: Introduction to the New Aggregation Framework

$matchMatching with Query Operators

{  title: "The Great Gatsby",  pages: 218,  language: "English"}

{  title: "War and Peace",  pages: 1440,  language: "Russian"}

{  title: "Atlas Shrugged",  pages: 1088,  language: "English"}

{ $match: {  pages: { $gt: 1000 }}}

{  title: "War and Peace",  pages: 1440,  language: "Russian"}

{  title: "Atlas Shrugged",  pages: 1088,  language: "English"}

Page 15: Introduction to the New Aggregation Framework

$projectReshape documentsInclude, exclude or rename fieldsInject computed fieldsCreate sub-document fields

Page 16: Introduction to the New Aggregation Framework

$projectIncluding and Excluding Fields

{  _id: 375,  title: "The Great Gatsby",  ISBN: "9781857150193",  available: true,  pages: 218,  subjects: [    "Long Island",    "New York",    "1920s"  ],  language: "English"}

{ $project: {  _id: 0,  title: 1,  language: 1}}

{  title: "The Great Gatsby",  language: "English"}

Page 17: Introduction to the New Aggregation Framework

$projectRenaming and Computing Fields

{  _id: 375,  title: "The Great Gatsby",  ISBN: "9781857150193",  available: true,  pages: 218,  subjects: [    "Long Island",    "New York",    "1920s"  ],  language: "English"}

{ $project: {  upperTitle: {    $toUpper: "$title"  },  lang: "$language"}}

{  _id: 375,  title: "THE GREAT GATSBY",  lang: "English"}

Page 18: Introduction to the New Aggregation Framework

$projectCreating Sub-document Fields

{  _id: 375,  title: "The Great Gatsby",  ISBN: "9781857150193",  available: true,  pages: 218,  subjects: [    "Long Island",    "New York",    "1920s"  ],  language: "English"}

{ $project: {  title: 1,  stats: {    pages: "$pages",    language: "$language",  }}}

{  _id: 375,  title: "The Great Gatsby",  stats: {    pages: 218,    language: "English"  }}

Page 19: Introduction to the New Aggregation Framework

$groupGroup documents by an ID

Field path referenceObject with multiple referencesConstant value

Other output fields are computed$max, $min, $avg, $sum$addToSet, $push$first, $last

Processes all data in memory

Page 20: Introduction to the New Aggregation Framework

$groupCalculating an Average

{  title: "The Great Gatsby",  pages: 218,  language: "English"}

{  title: "War and Peace",  pages: 1440,  language: "Russian"}

{  title: "Atlas Shrugged",  pages: 1088,  language: "English"}

{ $group: {  _id: "$language",  avgPages: { $avg: "$pages" }}}

{  _id: "Russian",  avgPages: 1440}

{  _id: "English",  avgPages: 653}

Page 21: Introduction to the New Aggregation Framework

$groupSummating Fields and Counting

{  title: "The Great Gatsby",  pages: 218,  language: "English"}

{  title: "War and Peace",  pages: 1440,  language: "Russian"}

{  title: "Atlas Shrugged",  pages: 1088,  language: "English"}

{ $group: {  _id: "$language",  numTitles: { $sum: 1 },  sumPages: { $sum: "$pages" }}}

{  _id: "Russian",  numTitles: 1,  sumPages: 1440}

{  _id: "English",  numTitles: 2,  sumPages: 1306}

Page 22: Introduction to the New Aggregation Framework

$groupCollecting Distinct Values

{  title: "The Great Gatsby",  pages: 218,  language: "English"}

{  title: "War and Peace",  pages: 1440,  language: "Russian"}

{  title: "Atlas Shrugged",  pages: 1088,  language: "English"}

{ $group: {  _id: "$language",  titles: { $addToSet: "$title" }}}

{  _id: "Russian",  titles: [ "War and Peace" ]}

{  _id: "English",  titles: [    "Atlas Shrugged",    "The Great Gatsby"  ]}

Page 23: Introduction to the New Aggregation Framework

$unwindOperate on an array fieldYield documents for each array value

Nothing for absent or empty fieldsError for non-array fields

Complements $group

Page 24: Introduction to the New Aggregation Framework

$unwindYielding Multiple Documents from One

{  title: "The Great Gatsby",  ISBN: "9781857150193",  subjects: [    "Long Island",    "New York",    "1920s"  ]}

{ $unwind: "$subjects" }

{  title: "The Great Gatsby",  ISBN: "9781857150193",  subjects: "Long Island"}

{  title: "The Great Gatsby",  ISBN: "9781857150193",  subjects: "New York"}

{  title: "The Great Gatsby",  ISBN: "9781857150193",  subjects: "1920s"}

Page 25: Introduction to the New Aggregation Framework

$sortSort documents by one or more fieldsUses familiar cursor formatWaits for earlier pipeline operator to returnIn-memory unless early and indexed

Page 26: Introduction to the New Aggregation Framework

$sortSort All Documents in the Pipeline

{ title: "The Great Gatsby" }

{ title: "Brave New World" }

{ title: "The Grapes of Wrath" }

{ title: "Animal Farm" }

{ title: "Lord of the Flies" }

{ title: "Fathers and Sons" }

{ title: "Invisible Man" }

{ title: "Fahrenheit 451" }

{ $sort: { title: 1 }}

{ title: "Animal Farm" }

{ title: "Brave New World" }

{ title: "Fahrenheit 451" }

{ title: "Fathers and Sons" }

{ title: "Invisible Man" }

{ title: "Lord of the Flies" }

{ title: "The Grapes of Wrath" }

{ title: "The Great Gatsby" }

Page 27: Introduction to the New Aggregation Framework

$limitLimit Documents through the Pipeline

{ title: "The Great Gatsby" }

{ title: "Brave New World" }

{ title: "The Grapes of Wrath" }

{ title: "Animal Farm" }

{ title: "Lord of the Flies" }

{ title: "Fathers and Sons" }

{ title: "Invisible Man" }

{ title: "Fahrenheit 451" }

{ $limit: 5 }

{ title: "The Great Gatsby" }

{ title: "Brave New World" }

{ title: "The Grapes of Wrath" }

{ title: "Animal Farm" }

{ title: "Lord of the Flies" }

Page 28: Introduction to the New Aggregation Framework

$skipSkip Over Documents in the Pipeline

{ title: "The Great Gatsby" }

{ title: "Brave New World" }

{ title: "The Grapes of Wrath" }

{ title: "Animal Farm" }

{ title: "Lord of the Flies" }

{ title: "Fathers and Sons" }

{ title: "Invisible Man" }

{ title: "Fahrenheit 451" }

{ $skip: 5 }

{ title: "Fathers and Sons" }

{ title: "Invisible Man" }

{ title: "Fahrenheit 451" }

Page 29: Introduction to the New Aggregation Framework

Extending the FrameworkAdding new pipeline operators, expressions$out and $tee for output control

https://jira.mongodb.org/browse/SERVER-3253

Page 30: Introduction to the New Aggregation Framework

Expressions

Page 31: Introduction to the New Aggregation Framework

ExpressionsReturn computed valuesUsed with $project and $groupMay be nested

Page 32: Introduction to the New Aggregation Framework

Boolean OperatorsInput array of one or more values

$and, $orShort-circuit logic

Inverse values with $notBSON standard for converting non-booleans

null, undefined, zero values → falseNon-zero values, strings, dates, objects → true

Page 33: Introduction to the New Aggregation Framework

Comparison OperatorsCompare numbers, strings, and datesInput array with two operands

$cmp, $eq, $ne$gt, $gte, $lt, $lte

Page 34: Introduction to the New Aggregation Framework

Arithmetic OperatorsInput array of one or more numbers

$add, $multiplyInput array of two operands

$subtract, $divide, $mod

Page 35: Introduction to the New Aggregation Framework

String Operators$strcasecmp case-insensitive comparison

$cmp is case-sensitive$toLower and $toUpper case change$substr for sub-string extractionNot encoding awareAssumes Latin alphabet

Page 36: Introduction to the New Aggregation Framework

Date OperatorsExtract values from date objects

$dayOfYear, $dayOfMonth, $dayOfWeek$year, $month, $week$hour, $minute, $second

Page 37: Introduction to the New Aggregation Framework

Conditional Operators$cond ternary operator$ifNull

Page 38: Introduction to the New Aggregation Framework

Usage

Page 39: Introduction to the New Aggregation Framework

Usagecollection.aggregate() method

Mongo shellMost drivers

aggregate database commandResult limited by BSON document size

Page 40: Introduction to the New Aggregation Framework

Collection Method

db.books.aggregate([  { $project: { language: 1 }},  { $group: { _id: "$language", numTitles: { $sum: 1 }}}])

{  result: [    { _id: "Russian", numTitles: 1 },    { _id: "English", numTitles: 2 }  ],  ok: 1}

Page 41: Introduction to the New Aggregation Framework

Database Command

db.runCommand({  aggregate: "books",  pipeline: [    { $project: { language: 1 }},    { $group: { _id: "$language", numTitles: { $sum: 1 }}}  ]})

{  result: [    { _id: "Russian", numTitles: 1 },    { _id: "English", numTitles: 2 }  ],  ok: 1}

Page 42: Introduction to the New Aggregation Framework

Optimization

Page 43: Introduction to the New Aggregation Framework

Early Filtering[  { $match:   { /* match criteria         */ }},  { $sort:    { /* sort matched documents */ }},  { $limit:   { /* limit document stream  */ }},  { $group:   { /* group by a field       */ }},  { $sort:    { /* sort by computed value */ }},  { $project: { /* reshape result         */ }}]

Page 44: Introduction to the New Aggregation Framework

Early Filtering

db.collection  .find({ /* match criteria         */ })  .sort({ /* sort matched documents */ })  .limit( /* limit document stream  */  )

[  { $group:   { /* group by a field       */ }},  { $sort:    { /* sort by computed value */ }},  { $project: { /* reshape result         */ }}]

Page 45: Introduction to the New Aggregation Framework

Memory Usage$group and $sort entirely in-memoryOperation memory usage limits

Warning at >5%Error at >10%

Page 46: Introduction to the New Aggregation Framework

Sharding

Page 47: Introduction to the New Aggregation Framework

ShardingSplit the pipeline at first $group or $sort

Shards execute pipeline before that pointmongos merges results and continues

Early $match may excuse shardsCPU and memory implications for mongos

Page 48: Introduction to the New Aggregation Framework

Sharding[  { $project: { /* select fields     */ }},  { $sort:    { /* sort by timestamp */ }},  { $group:   { /* group by minute   */ }},  { $sort:    { /* sort by timestamp */ }},  { $project: { /* reshape result    */ }}]

Page 49: Introduction to the New Aggregation Framework

Shardingshard1$project$sort

shard2$project$sort

shard3$project$sort

↙mongos$group$sort$project

↓Result

Page 50: Introduction to the New Aggregation Framework

Demo Time

Page 51: Introduction to the New Aggregation Framework

Thanks!mongodb.org/downloadsdocs.mongodb.org/manual/aggregation

Questions?

Page 52: Introduction to the New Aggregation Framework

Photo Creditshttp://dilbert.com/strips/comic/2012-09-05

Page 53: Introduction to the New Aggregation Framework