Expressivity and Complexity of MongoDB queriesebotoeva/presentations/bccx-ICDT-18... ·...
Transcript of Expressivity and Complexity of MongoDB queriesebotoeva/presentations/bccx-ICDT-18... ·...
Expressivity and Complexity of MongoDB queries
Elena Botoeva
Faculty of Computer Science, Free University of Bozen-Bolzano, Italy
joint work with Diego Calvanese, Benjamin Cogrel, and Guohui Xiao
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 1/22
MongoDB
a document database system• Very popular• Stores JSON-like documents• Offers powerful ad hoc query languages
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 2/22
Example: JSON documentFrom a collection (of documents) about distinguished computer scientists
{ "_id": 4,
"awards": [ {"award": "Rosing Prize", "year": 1999},{"award": "Turing Award", "by": "ACM", "year": 2001},{"award": "IEEE John von Neumann Medal", "year": 2001, "by": "IEEE"} ],
"birth": "1926-08-27",
"contribs": ["OOP", "Simula"],
"death": "2002-08-10",
"name": {"first": "Kristen", "last": "Nygaard"}}
Keys
ValuesLiterals
Arrays
Nested Objects
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 3/22
Example: JSON documentFrom a collection (of documents) about distinguished computer scientists
{ "_id": 4,
"awards": [ {"award": "Rosing Prize", "year": 1999},{"award": "Turing Award", "by": "ACM", "year": 2001},{"award": "IEEE John von Neumann Medal", "year": 2001, "by": "IEEE"} ],
"birth": "1926-08-27",
"contribs": ["OOP", "Simula"],
"death": "2002-08-10",
"name": {"first": "Kristen", "last": "Nygaard"}}
Keys
ValuesLiterals
Arrays
Nested Objects
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 3/22
Example: JSON documentFrom a collection (of documents) about distinguished computer scientists
{ "_id": 4,
"awards": [ {"award": "Rosing Prize", "year": 1999},{"award": "Turing Award", "by": "ACM", "year": 2001},{"award": "IEEE John von Neumann Medal", "year": 2001, "by": "IEEE"} ],
"birth": "1926-08-27",
"contribs": ["OOP", "Simula"],
"death": "2002-08-10",
"name": {"first": "Kristen", "last": "Nygaard"}}
Keys
Values
LiteralsArrays
Nested Objects
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 3/22
Example: JSON documentFrom a collection (of documents) about distinguished computer scientists
{ "_id": 4,
"awards": [ {"award": "Rosing Prize", "year": 1999},{"award": "Turing Award", "by": "ACM", "year": 2001},{"award": "IEEE John von Neumann Medal", "year": 2001, "by": "IEEE"} ],
"birth": "1926-08-27",
"contribs": ["OOP", "Simula"],
"death": "2002-08-10",
"name": {"first": "Kristen", "last": "Nygaard"}}
Keys
Values
Literals
Arrays
Nested Objects
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 3/22
Example: JSON documentFrom a collection (of documents) about distinguished computer scientists
{ "_id": 4,
"awards": [ {"award": "Rosing Prize", "year": 1999},{"award": "Turing Award", "by": "ACM", "year": 2001},{"award": "IEEE John von Neumann Medal", "year": 2001, "by": "IEEE"} ],
"birth": "1926-08-27",
"contribs": ["OOP", "Simula"],
"death": "2002-08-10",
"name": {"first": "Kristen", "last": "Nygaard"}}
Keys
ValuesLiterals
Arrays
Nested Objects
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 3/22
Example: JSON documentFrom a collection (of documents) about distinguished computer scientists
{ "_id": 4,
"awards": [ {"award": "Rosing Prize", "year": 1999},{"award": "Turing Award", "by": "ACM", "year": 2001},{"award": "IEEE John von Neumann Medal", "year": 2001, "by": "IEEE"} ],
"birth": "1926-08-27",
"contribs": ["OOP", "Simula"],
"death": "2002-08-10",
"name": {"first": "Kristen", "last": "Nygaard"}}
Keys
ValuesLiterals
Arrays
Nested Objects
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 3/22
Example: Find query
db.bios.find({$and: [
{"awards.year": {$eq: 1999}} ,{"name.first": {$eq: "Kristen"}}
]},{"name": true , "birth": true}
)
When evaluated over the document about Kristen Nygaard:
{
"_id": 4,
"birth": "1926-08-27",
"name": { "first": "Kristen", "last": "Nygaard" }
}
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 4/22
Example: Find query
db.bios.find({$and: [
{"awards.year": {$eq: 1999}} ,{"name.first": {$eq: "Kristen"}}
]},{"name": true , "birth": true}
)
When evaluated over the document about Kristen Nygaard:
{
"_id": 4,
"birth": "1926-08-27",
"name": { "first": "Kristen", "last": "Nygaard" }
}
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 4/22
Example: Aggregation Framework query
Retrieves scientists who received two awards in the same year.
db.bios.aggregate ([{$project: { "name": true ,
"award1": "$awards", "award2": "$awards" } },{$unwind: "$award1"},{$unwind: "$award2"},{$project: { "name": true , "award1": true , "award2": true ,
"twoInOneYear": { $and: [{$eq: ["$award1.year", "$award2.year"]},{$ne: ["$award1.award", "$award2.award"]}
]} }},{$match: { "twoInOneYear": true } },
])
When evaluated over the document about Kristen Nygaard:
{ "_id": 4,"name": {"first": "Kristen", "last": "Nygaard"}"award1": {"award": "Turing Award", "by": "ACM", "year": 2001},"award2": {"award": "IEEE John von Neumann Medal", "year": 2001, "by": "IEEE"},"twoInOneYear": true }
This query performs a join within a document.
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 5/22
Example: Aggregation Framework query
Retrieves scientists who received two awards in the same year.
db.bios.aggregate ([{$project: { "name": true ,
"award1": "$awards", "award2": "$awards" } },{$unwind: "$award1"},{$unwind: "$award2"},{$project: { "name": true , "award1": true , "award2": true ,
"twoInOneYear": { $and: [{$eq: ["$award1.year", "$award2.year"]},{$ne: ["$award1.award", "$award2.award"]}
]} }},{$match: { "twoInOneYear": true } },
])
When evaluated over the document about Kristen Nygaard:
{ "_id": 4,"name": {"first": "Kristen", "last": "Nygaard"}"award1": {"award": "Turing Award", "by": "ACM", "year": 2001},"award2": {"award": "IEEE John von Neumann Medal", "year": 2001, "by": "IEEE"},"twoInOneYear": true }
This query performs a join within a document.
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 5/22
Example: Aggregation Framework query
Retrieves scientists who received two awards in the same year.
db.bios.aggregate ([{$project: { "name": true ,
"award1": "$awards", "award2": "$awards" } },{$unwind: "$award1"},{$unwind: "$award2"},{$project: { "name": true , "award1": true , "award2": true ,
"twoInOneYear": { $and: [{$eq: ["$award1.year", "$award2.year"]},{$ne: ["$award1.award", "$award2.award"]}
]} }},{$match: { "twoInOneYear": true } },
])
When evaluated over the document about Kristen Nygaard:
{ "_id": 4,"name": {"first": "Kristen", "last": "Nygaard"}"award1": {"award": "Turing Award", "by": "ACM", "year": 2001},"award2": {"award": "IEEE John von Neumann Medal", "year": 2001, "by": "IEEE"},"twoInOneYear": true }
This query performs a join within a document.Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 5/22
Example: Another Aggregation Framework query
Retrieves pairs of scientists who received the same award the same year.
db.bios.aggregate ([{$unwind: "$awards"},{$project: { "awards": 1,
"doc._id": "$_id","doc.name": "$name" }},
{$group: { _id: { "awardYear": "$awards.year","awardName": "$awards.award" },
"docs": {$addToSet: "$doc"} }},{$project: { "doc1": "$docs",
"doc2": "$docs" }},{$unwind: "$doc1"},{$unwind: "$doc2"},{$project: { "name1": "$doc1.name",
"name2": "$doc2.name","awardName": "$_id.awardName","awardYear": "$_id.awardYear","toJoin": {$ne: ["$doc1._id", "$doc2._id"]} }},
{$match: {"toJoin": true}}])
This query performs a join across documents.
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 6/22
Example: Another Aggregation Framework query
Retrieves pairs of scientists who received the same award the same year.
db.bios.aggregate ([{$unwind: "$awards"},{$project: { "awards": 1,
"doc._id": "$_id","doc.name": "$name" }},
{$group: { _id: { "awardYear": "$awards.year","awardName": "$awards.award" },
"docs": {$addToSet: "$doc"} }},{$project: { "doc1": "$docs",
"doc2": "$docs" }},{$unwind: "$doc1"},{$unwind: "$doc2"},{$project: { "name1": "$doc1.name",
"name2": "$doc2.name","awardName": "$_id.awardName","awardYear": "$_id.awardYear","toJoin": {$ne: ["$doc1._id", "$doc2._id"]} }},
{$match: {"toJoin": true}}])
This query performs a join across documents.
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 6/22
Our Contributions
1 formalised the JSON data model
2 formalised a fragment of the aggregation framework query language⇒MQuery
3 analysed the expressivity and complexity of MQuery
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 7/22
Formalisation of the data model
{}
4 [ ] 1926-08-27 [ ] 2002-08-10 {}
Kristen Nygaard{} {} {} OOP Simula
RosingPrize
NorwegianData Asso-
ciation
1999 TuringAward
ACM 2001 IEEE Johnvon Neumann
Medal
IEEE 2001
id nameawards birth contribs death
first last0 1 2 0 1
award by year award by year award by year
Document: finite unordered, unranked, node- and edge-labeled tree
Collection: a forest of unique trees (primary key)
Simplifying assumptions (set semantics)• No order between
I documents in the collectionI key-value pairsI values in an array
• Multiplicity of values in an array is ignored
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 8/22
Formalisation of the data model
{}
4 [ ] 1926-08-27 [ ] 2002-08-10 {}
Kristen Nygaard{} {} {} OOP Simula
RosingPrize
NorwegianData Asso-
ciation
1999 TuringAward
ACM 2001 IEEE Johnvon Neumann
Medal
IEEE 2001
id nameawards birth contribs death
first last0 1 2 0 1
award by year award by year award by year
Document: finite unordered, unranked, node- and edge-labeled tree
Collection: a forest of unique trees (primary key)
Simplifying assumptions (set semantics)• No order between
I documents in the collectionI key-value pairsI values in an array
• Multiplicity of values in an array is ignored
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 8/22
MongoDB aggregation framework: MQuery
match unwind project group lookup
• A query is a multi-stage pipeline applied to collection• A stage is a forest transformation operator
match selects trees according to a Boolean criterion
unwind flattens arrays at a given path
project modifies trees by renaming, introducing, or removing paths
group combines different trees, may create arrays
lookup joins input trees with trees in an external collection
We formalised a fragment of this language as MQuery, orMMUPGL.
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 9/22
MongoDB aggregation framework: MQuery
match unwind project group lookup
• A query is a multi-stage pipeline applied to collection• A stage is a forest transformation operator
match selects trees according to a Boolean criterion
unwind flattens arrays at a given path
project modifies trees by renaming, introducing, or removing paths
group combines different trees, may create arrays
lookup joins input trees with trees in an external collection
We formalised a fragment of this language as MQuery, orMMUPGL.
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 9/22
Match operator: µϕSelects trees according the criterion ϕ
Query 1db.bios.aggregate([
{$match: {"name.first": {$eq: "Kristen"}}}])
bios . µname.first=“Kristen”
input = output{ "_id": 4,
"awards": [ {"award": "Rosing Prize", "year": 1999},
{"award": "Turing Award", "by": "ACM", "year": 2001},
{"award": "IEEE John von Neumann Medal", "year": 2001, "by": "IEEE"} ],
"birth": "1926-08-27",
"contribs": ["OOP", "Simula"],
"death": "2002-08-10",
"name": {"first": "Kristen", "last": "Nygaard"}
}
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 10/22
Match operator: µϕSelects trees according the criterion ϕ
Query 2db.bios.aggregate([
{$match: {"awards.year": {$eq: 1999}}}])
bios . µawards.year=1999
input = output{ "_id": 4,
"awards": [ {"award": "Rosing Prize", "year": 1999},
{"award": "Turing Award", "by": "ACM", "year": 2001},
{"award": "IEEE John von Neumann Medal", "year": 2001, "by": "IEEE"} ],
"birth": "1926-08-27",
"contribs": ["OOP", "Simula"],
"death": "2002-08-10",
"name": {"first": "Kristen", "last": "Nygaard"}
}
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 10/22
Match operator: µϕSelects trees according the criterion ϕ
Query 3db.bios.aggregate([
{$match: {"awards": {$eq: {"award": "Rosing Prize", "year": 1999}}}}])
bios . µawards={”award”: ”Rosing Prize”, ”year”: 1999}
input = output{ "_id": 4,
"awards": [ {"award": "Rosing Prize", "year": 1999},
{"award": "Turing Award", "by": "ACM", "year": 2001},
{"award": "IEEE John von Neumann Medal", "year": 2001, "by": "IEEE"} ],
"birth": "1926-08-27",
"contribs": ["OOP", "Simula"],
"death": "2002-08-10",
"name": {"first": "Kristen", "last": "Nygaard"}
}
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 10/22
Match operator: µϕSelects trees according the criterion ϕ
Query 4db.bios.aggregate([
{$match: {"awards": {$eq: {"year": 1999, "award": "Rosing Prize"}}}}])
bios . µawards={”year”: 1999, ”award”: ”Rosing Prize”}
Filtered out by the implementation but kept with our semantics{ "_id": 4,
"awards": [ {"award": "Rosing Prize", "year": 1999},
{"award": "Turing Award", "by": "ACM", "year": 2001},
{"award": "IEEE John von Neumann Medal", "year": 2001, "by": "IEEE"} ],
"birth": "1926-08-27",
"contribs": ["OOP", "Simula"],
"death": "2002-08-10",
"name": {"first": "Kristen", "last": "Nygaard"}
}
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 10/22
Unwind operator: ωpFlattens arrays at a given path p
Query 1db.bios.aggregate([
{$unwind: "$awards"}])
bios . ωawards
Input{ "_id": 4,
"awards": [ {"award": "Rosing Prize", "year": 1999},
{"award": "Turing Award", "by": "ACM", "year": 2001},
{"award": "IEEE John von Neumann Medal", "year": 2001, "by": "IEEE"} ],
"birth": "1926-08-27",
"contribs": ["OOP", "Simula"],
"death": "2002-08-10",
"name": {"first": "Kristen", "last": "Nygaard"}
}
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 11/22
Unwind operator: ωpFlattens arrays at a given path p
Query 1db.bios.aggregate([
{$unwind: "$awards"}])
bios . ωawards
Output{ "_id": 4,
"awards": {"award": "Rosing Prize", "year": 1999},
"birth": "1926-08-27",
"contribs": ["OOP", "Simula"],
"death": "2002-08-10",
"name": {"first": "Kristen", "last": "Nygaard"}
}
{ "_id": 4,
"awards": {"award": "Turing Award", "by": "ACM", "year": 2001},
"birth": "1926-08-27",
"contribs": ["OOP", "Simula"],
"death": "2002-08-10",
"name": {"first": "Kristen", "last": "Nygaard"}
}
...
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 11/22
Unwind operator: ωpFlattens arrays at a given path p
Query 2db.bios.aggregate([
{$unwind: "$publications"}])
bios . ωpublications
Input{ "_id": 4,
"awards": [ {"award": "Rosing Prize", "year": 1999},
{"award": "Turing Award", "by": "ACM", "year": 2001},
{"award": "IEEE John von Neumann Medal", "year": 2001, "by": "IEEE"} ],
"birth": "1926-08-27",
"contribs": ["OOP", "Simula"],
"death": "2002-08-10",
"name": {"first": "Kristen", "last": "Nygaard"}
}
OutputEmpty
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 11/22
Project operator: ρp/d,...Projects path p according to its definition d
Query 1db.bios.aggregate([
{$project: { "awards": true,"awardNames": "$awards.award","firstName": "$name.first" }}
])
bios . ρawards, awardNames/awards.award, firstName/name.first
Input{ "_id": 4,
"awards": [ {"award": "Rosing Prize", "year": 1999},
{"award": "Turing Award", "by": "ACM", "year": 2001},
{"award": "IEEE John von Neumann Medal", "year": 2001, "by": "IEEE"} ],
"birth": "1926-08-27",
"contribs": ["OOP", "Simula"],
"death": "2002-08-10",
"name": {"first": "Kristen", "last": "Nygaard"}
}
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 12/22
Project operator: ρp/d,...Projects path p according to its definition d
Query 1db.bios.aggregate([
{$project: { "awards": true,"awardNames": "$awards.award","firstName": "$name.first" }}
])
bios . ρawards, awardNames/awards.award, firstName/name.first
Output{ "_id": 4,
"awards": [ {"award": "Rosing Prize", "year": 1999},
{"award": "Turing Award", "by": "ACM", "year": 2001},
{"award": "IEEE John von Neumann Medal", "year": 2001, "by": "IEEE"} ],
"awardNames": [ "Rosing Prize", "Turing Award", "IEEE John von Neumann Medal" ],
"firstName": "Kristen"
}
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 12/22
Project operator: ρp/d,...Projects path p according to its definition d
Query 2db.bios.aggregate([
{$project: { "calledJohn": {$eq: ["$name.first", "John"]},"sameFirstAndLastNames": {$eq: ["$name.first", "$name.last"]},"newArray": ["$name.first", "$name.last"],"condValue": {cond: { if: {$eq: ["$_id", 4]},
then: "$name.first",else: "$awards" }},
"invisible": "$abc" }}])
bios . ρcalledJohn/(name.first=“John”), sameFirstAndLastNames/(name.first=name.last),newArray/[name.first,name.last], condValue/( id=4?name.first:awards)
Input{ "_id": 4,
"awards": [ {"award": "Rosing Prize", "year": 1999},
{"award": "Turing Award", "by": "ACM", "year": 2001},
{"award": "IEEE John von Neumann Medal", "year": 2001, "by": "IEEE"} ],
"birth": "1926-08-27",
"contribs": ["OOP", "Simula"],
"death": "2002-08-10",
"name": {"first": "Kristen", "last": "Nygaard"}
}
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 12/22
Project operator: ρp/d,...Projects path p according to its definition d
Query 2db.bios.aggregate([
{$project: { "calledJohn": {$eq: ["$name.first", "John"]},"sameFirstAndLastNames": {$eq: ["$name.first", "$name.last"]},"newArray": ["$name.first", "$name.last"],"condValue": {cond: { if: {$eq: ["$_id", 4]},
then: "$name.first",else: "$awards" }},
"invisible": "$abc" }}])
bios . ρcalledJohn/(name.first=“John”), sameFirstAndLastNames/(name.first=name.last),newArray/[name.first,name.last], condValue/( id=4?name.first:awards)
Output{ "_id": 4,
"calledJohn": false,
"sameFirstAndLastNames": false,
"newArray": [ "Kristen", "Nygaard" ],
"condValue": "Kristen"
}
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 12/22
Group operator: γG:AGroups trees according to G and collects values according to A
Querydb.bios.aggregate([
{$unwind: "$awards"},{$group: { "_id": {"year": "$awards.year"},
"names": {$addToSet: "$name"} }},])
bios . ωawards . γyear/awards.year:names/name
Input{ "_id": 4,
"awards": [ { "award": "Rosing Prize", "year": 1999 },
{ "award": "Turing Award", "year": 2001 },
{ "award": "IEEE John von Neumann Medal", "year": 2001 } ],
"name": { "first": "Kristen", "last": "Nygaard" }
}
{ "_id": 6,
"awards": [ { "award": "Award for the Advancement of Free Software", "year": 2001 },
{ "award": "NLUUG Award", "year": 2003 } ],
"name": { "first": "Guido", "last": "van Rossum" }
}
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 13/22
Group operator: γG:AGroups trees according to G and collects values according to A
Querydb.bios.aggregate([
{$unwind: "$awards"},{$group: { "_id": {"year": "$awards.year"},
"names": {$addToSet: "$name"} }},])
bios . ωawards . γyear/awards.year:names/name
Output{ "_id": { "year": 2003 },
"names": [ { "first": "Guido", "last": "van Rossum" } ]
},
{ "_id": { "year": 2001 },
"names": [ { "first": "Kristen", "last": "Nygaard" },
{ "first": "Guido", "last": "van Rossum" } ]
},
{ "_id": { "year": 1999 },
"names": [ { "first": "Kristen", "last": "Nygaard" } ]
}
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 13/22
Lookup operator: λp1=C.p2p
Performs left outer join to the collection C and stores joined documents under p
Querydb.bios.aggregate([
{$unwind: "$awards"},{$group: {_id: {"year": "$awards.year"}, "names": {$addToSet: "$name"} }},{$lookup: { from: "Events", localField: "_id.year",
foreignField: "year",as: "joinedDocs" }}
])
bios . ωawards . γyear/awards.year:names/name . λid.year=Events.year
joinedDocs
bios{ "_id": 4,
"awards": [ { "award": "Rosing Prize", "year": 1999 },
{ "award": "Turing Award", "year": 2001 },
{ "award": "IEEE John von Neumann Medal", "year": 2001 } ],
"name": { "first": "Kristen", "last": "Nygaard" }
}
{ "_id": 6,
"awards": [ { "award": "Award for the Advancement of Free Software", "year": 2001 },
{ "award": "NLUUG Award", "year": 2003 } ],
"name": { "first": "Guido", "last": "van Rossum" }
}
Events{ "_id": 1,
"year": 1997,
"event": "Deep Blue defeats Garry Kasparov"
}
{ "_id": 2,
"year": 1999,
"event": "Melissa virus outbreak"
}
{ "_id": 3,
"year": 1999,
"event": "Jeff Bezos is person of the year"
}
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 14/22
Lookup operator: λp1=C.p2p
Performs left outer join to the collection C and stores joined documents under p
Querydb.bios.aggregate([
{$unwind: "$awards"},{$group: {_id: {"year": "$awards.year"}, "names": {$addToSet: "$name"} }},{$lookup: { from: "Events", localField: "_id.year",
foreignField: "year",as: "joinedDocs" }}
])
bios . ωawards . γyear/awards.year:names/name . λid.year=Events.year
joinedDocs
Output{ "_id": { "year": 2003 },
"names": [ { "first": "Guido", "last": "van Rossum" } ],
"joinedDocs": []
},
{ "_id": { "year": 2001 },
"names": [ { "first": "Kristen", "last": "Nygaard" },
{ "first": "Guido", "last": "van Rossum" } ]
"joinedDocs": []
},
{ "_id": { "year": 1999 },
"names": [ { "first": "Kristen", "last": "Nygaard" } ]
"joinedDocs": [ { "_id": 2, "year": 1999, "event": "Melissa virus outbreak" },
{ "_id": 3, "year": 1999, "event": "Jeff Bezos is person of the year" } ]
}
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 14/22
Expressivity of MQuery
Characterized in terms of Nested Relational Algebra (NRA)
1 Nested relational view of JSON documents
2 Translation from NRA to MQuery
3 Translation from MQuery to NRA
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 15/22
Nested Relational View
idawards
award yearbirth
contribslit
death name.first name.last
4
Rosing Prize 1999Turing Award 2001IEEE John von
Neumann Medal2001
1926-08-27OOP
Simula2002-08-10 Kristen Nygaard
Only possible for well-typed forests• Each path is typed• Analogous to complex object types and JSON schema
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 16/22
Nested Relational Algebra (NRA)Recap.
• Relational Algebra operators:I SelectionI Extended projectionI Cross-productI UnionI Minus
• Unnest: flattens a nested sub-relationid
awardsaward year
name.first
4Rosing Prize 1999
Turing Award 2001IEEE John von. . . 2001
Kristen
χawards====⇒id award year name.first4 Rosing Prize 1999 Kristen4 Turing Award 2001 Kristen4 IEEE John von. . . 2001 Kristen
• Nest: creates nested sub-relation
id award year name.first4 Rosing Prize 1999 Kristen4 Turing Award 2001 Kristen4 IEEE John von. . . 2001 Kristen
ν{award}→awards==========⇒
idawardsaward
year name.first
4 Rosing Prize 1999 Kristen
4Turing Award
IEEE John von. . .2001 Kristen
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 17/22
Compact Translation from NRA to MQuery
Expressivity
• MQuery (MMUPGL) captures NRA• MMUPG captures NRA over a single collection
Main technical challenge
“Linearize” a tree-shaped NRA expression into a MongoDB pipeline
For two MQueries q1 and q2, we construct a pipeline that does the following:
{}
12 a
{}
11 b
x y x y
F
⇒ ⇒
{}
1 {}
12 a
{}
2 {}
12 a
{}
1 {}
11 b
{}
2 {}
11 b
actRel rel1
x y
actRel rel2
x y
actRel rel1
x y
actRel rel2
x y
{}
1 t1
{}
2 t2
actRel rel1 actRel rel2
t1 ∈ (F . q1) t2 ∈ (F . q2)
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 18/22
Compact Translation from NRA to MQuery
Expressivity
• MQuery (MMUPGL) captures NRA• MMUPG captures NRA over a single collection
Main technical challenge
“Linearize” a tree-shaped NRA expression into a MongoDB pipeline
For two MQueries q1 and q2, we construct a pipeline that does the following:
{}
12 a
{}
11 b
x y x y
F
⇒ ⇒
{}
1 {}
12 a
{}
2 {}
12 a
{}
1 {}
11 b
{}
2 {}
11 b
actRel rel1
x y
actRel rel2
x y
actRel rel1
x y
actRel rel2
x y
{}
1 t1
{}
2 t2
actRel rel1 actRel rel2
t1 ∈ (F . q1) t2 ∈ (F . q2)
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 18/22
Compact Translation from MQuery to NRA
ExpressivityWell-typed MQuery is captured by NRA• Stages that transform well-typed forests into well-typed forests
• match⇒ selection• unwind⇒ unnest• project⇒ projection• group⇒ nest• lookup⇒ left outer join
Challenges
MQuery stages can “look” inside arrays
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 19/22
Complexity of MQuery
Data complexity: AC0
Fragment Query complexity Combined complexityMM LOGSPACE-complete
MMP,MMPGL PTIME-completeMMU LOGSPACE-complete NP-complete
MMUP,MMUL,MMUPL NP-completeMMUG PSPACE-hard
MMUPG,MMUPGL TA[2nO(1),nO(1)]-complete∗
NRA TA[2nO(1),nO(1)]-complete
∗ The class of problems solvable by an alternating Turing machine running inexponential time with polynomially many alternations.
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 20/22
Concluding remarks
Technical reporthttp://arxiv.org/abs/1603.09291
(Expected) Outcomes• Enable the integration of MongoDB within the OBDA framework• Could influence the evolution of MongoDB
Future work• Relaxed notion of well-typedness• Bag and list semantics• New operators (e.g. graph-lookup)
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 21/22
See youat the poster!
Elena Botoeva(FUB) Expressivity and Complexity of MongoDB queries 22/22