2011 Mongo FR - Indexing in MongoDB
MongoDB
Indexing and Query Optimizer Details
Antoine Girbal, Mongo FR, March 23, 2011
What will we cover?
Many details of how indexing and the query optimizer work
A full understanding of these details is not required to use mongo, but this knowledge can be helpful when making optimizations.
We'll discuss functionality of Mongo 1.8 (for our purposes pretty similar to 1.6 and almost identical to 1.7 edge).
Much of the material will be presented through examples.
Diagrams are to aid understanding; some details will be left out.
Btree (conceptual diagram)
[Diagram: btree over keys 1 through 9; a key references document {_id:4,x:6}]
Find One Document
db.c.find( {x:6} ).limit( 1 )
Index {x:1}
Find One Document
[Diagram: btree keys 1 through 9; the lookup for 6 returns {_id:4,x:6}]
Find One Document
> db.c.find( {x:6} ).limit( 1 ).explain()
{
    "cursor" : "BtreeCursor x_1",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 1,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {"x" : [[6,6]]}
}
Uses a btree cursor to find the object. Index ranges are around a single value.
Find One Document
Now we have duplicate x values.
[Diagram: btree keys 1, 2, 3, 4, 5, 6, 6, 6, 9; the lookup for 6 returns {_id:4,x:6}]
Equality Match
db.c.find( {x:6} )
Index {x:1}
Several documents to be returned
Equality Match
[Diagram: btree keys 1, 2, 3, 4, 5, 6, 6, 6, 9; the lookup for 6 returns {_id:4,x:6}, {_id:5,x:6}, {_id:1,x:6}]
Equality Match
> db.c.find( {x:6} ).explain()
{
    "cursor" : "BtreeCursor x_1",
    "nscanned" : 3,
    "nscannedObjects" : 3,
    "n" : 3,
    "millis" : 1,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {"x" : [[6,6]]}
}
Full Document Matcher
db.c.find( {x:6,y:1} )
Index {x:1}
Object content needs to be checked
Full Document Matcher
[Diagram: btree keys 1, 2, 3, 4, 5, 6, 6, 6, 9; the lookup for 6 reaches {y:4,x:6}, {y:5,x:6}, {y:1,x:6}]
Full Document Matcher
> db.c.find( {x:6,y:1} ).explain()
{
    "cursor" : "BtreeCursor x_1",
    "nscanned" : 3,
    "nscannedObjects" : 3,
    "n" : 1,
    "millis" : 1,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {"x" : [[6,6]]}
}
Documents for all matching index keys are scanned, but only one document matches on the non-indexed key.
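The relationship between nscanned and n can be sketched with a small in-memory model. This is an illustration only, not MongoDB code: buildIndex and findWithIndex are hypothetical helpers, the "index" is a sorted array walked linearly rather than a btree, and the sample documents are made up to mirror the slide.

```javascript
// Hypothetical in-memory model of a single-field index scan (not MongoDB internals).
function buildIndex(docs, field) {
  // One entry per document: the key value and the position of the document.
  return docs
    .map((doc, pos) => ({ key: doc[field], pos }))
    .sort((a, b) => a.key - b.key);
}

function findWithIndex(docs, index, field, query) {
  const stats = { nscanned: 0, nscannedObjects: 0, n: 0 };
  const results = [];
  for (const entry of index) {
    if (entry.key < query[field]) continue; // before the index bounds
    if (entry.key > query[field]) break;    // past the index bounds
    stats.nscanned += 1;
    const doc = docs[entry.pos];            // load the full document
    stats.nscannedObjects += 1;
    // The matcher checks every field of the query against the document.
    const matched = Object.keys(query).every((k) => doc[k] === query[k]);
    if (matched) {
      stats.n += 1;
      results.push(doc);
    }
  }
  return { results, stats };
}

const matcherDocs = [
  { x: 1, y: 9 },
  { x: 6, y: 4 },
  { x: 6, y: 5 },
  { x: 6, y: 1 },
  { x: 9, y: 2 },
];
const matcherIndex = buildIndex(matcherDocs, "x");
const matcherRun = findWithIndex(matcherDocs, matcherIndex, "x", { x: 6, y: 1 });
console.log(matcherRun.stats); // { nscanned: 3, nscannedObjects: 3, n: 1 }
```

Three keys fall inside the bounds [6,6], so three documents are loaded, and the check on y keeps only one of them.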
Range Match
db.c.find( {x:{$gte:4,$lte:7}} )
Index {x:1}
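Range Match
[Diagram: btree keys 1 through 9; the scan covers keys 4 through 7]
Multikeys
[Diagram: document {_id:4,x:[8,9]} is indexed under both 8 and 9]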
> db.c.find( {x:{$gt:7}} ).explain()
{
    "cursor" : "BtreeCursor x_1",
    "nscanned" : 2,
    "nscannedObjects" : 2,
    "n" : 1,
    "millis" : 1,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {"x" : [[7,1.7976931348623157e+308]]}
}
All keys in the valid range are scanned, but the matcher rejects duplicate documents, making n == 1.
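A rough sketch of why nscanned can exceed n with a multikey index. The helpers buildMultikeyIndex and findGreaterThan are hypothetical, and the de-duplication by document position is an assumption about how such a scan could be modelled, not a description of the server code.

```javascript
// Hypothetical model of a multikey index: one index entry per array element.
function buildMultikeyIndex(docs, field) {
  const entries = [];
  docs.forEach((doc, pos) => {
    const values = Array.isArray(doc[field]) ? doc[field] : [doc[field]];
    for (const key of values) entries.push({ key, pos });
  });
  return entries.sort((a, b) => a.key - b.key);
}

function findGreaterThan(docs, index, bound) {
  const seen = new Set(); // positions of documents already returned
  const results = [];
  let nscanned = 0;
  for (const entry of index) {
    if (entry.key <= bound) continue;  // outside the index bounds
    nscanned += 1;
    if (seen.has(entry.pos)) continue; // same document reached through another key
    seen.add(entry.pos);
    results.push(docs[entry.pos]);
  }
  return { results, nscanned, n: results.length };
}

const multikeyDocs = [{ _id: 1, x: 3 }, { _id: 4, x: [8, 9] }, { _id: 2, x: 5 }];
const multikeyIndex = buildMultikeyIndex(multikeyDocs, "x");
const multikeyRun = findGreaterThan(multikeyDocs, multikeyIndex, 7);
console.log(multikeyRun.nscanned, multikeyRun.n); // 2 1
```

Keys 8 and 9 both point at the same document, so two keys are scanned while only one document is returned.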
Range Types
Explicit inequality
db.c.find( {x:{$gt:4,$lt:7}} )
db.c.find( {x:{$gt:4}} )
db.c.find( {x:{$ne:4}} )
Regular expression prefix
db.c.find( {x:/^a/} )
Data type
db.c.find( {x:/a/} )
Range Types
db.c.find( {x:/^a/} )
"indexBounds" : {"x" : [["a","b"],[/^a/,/^a/]]}
Two ranges are scanned, of two different types: string and regex.
Range Types
db.c.find( {x:/a/} )
"indexBounds" : {"x" : [["",{}],[/a/,/a/]]}
Here the index only helps to restrict the type, which is not efficient in practice.
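How a prefix regex turns into a string range can be sketched as follows. stringBoundsForRegex is a hypothetical helper, not a MongoDB function; it only recognizes the simplest case (a flag-free regex made of ^ followed by ASCII letters or digits), and treating the upper end as exclusive is an assumption of this sketch.

```javascript
// Hypothetical helper: derive the string range a simple prefix regex is limited to.
function stringBoundsForRegex(regex) {
  // Accept only "^" followed by plain letters/digits, with no flags.
  const m = /^\^([A-Za-z0-9]+)$/.exec(regex.source);
  if (m === null || regex.flags !== "") {
    return null; // no usable prefix: every string key would have to be scanned
  }
  const prefix = m[1];
  // Upper end: the prefix with its last character bumped by one code unit.
  const lastCode = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(lastCode + 1);
  return [prefix, upper];
}

console.log(stringBoundsForRegex(/^a/));   // [ 'a', 'b' ]
console.log(stringBoundsForRegex(/^abc/)); // [ 'abc', 'abd' ]
console.log(stringBoundsForRegex(/a/));    // null
```

For /^a/ the result matches the string range shown in indexBounds; for /a/ there is no prefix, which corresponds to the full string range ["",{}].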
Set Match
db.c.find( {x:{$in:[3,6]}} )
Index {x:1}
Set Match
[Diagram: btree keys 1 through 9; lookups for 3 and 6]
Set Match
> db.c.find( {x:{$in:[3,6]}} ).explain()
{
    "cursor" : "BtreeCursor x_1 multi",
    "nscanned" : 3,
    "nscannedObjects" : 2,
    "n" : 2,
    "millis" : 8,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {"x" : [[3,3],[6,6]]}
}
Why is nscanned 3? This is an algorithmic detail: when there are disjoint ranges for a key, nscanned may be higher than the number of matching keys.
All Match
db.c.find( {x:{$all:[3,6]}} )
Index {x:1}
All Match
[Diagram: btree keys 1 through 9; the lookup for 3 returns {_id:4,x:[3,6]}]
All Match
> db.c.find( {x:{$all:[3,6]}} ).explain()
{
    "cursor" : "BtreeCursor x_1",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {"x" : [[3,3]]}
}
The first entry in the $all match array is always used for index bounds. Note this may not be the least numerous indexed value in the $all array.
Limit
db.c.find( {x:{$lt:6},y:3} ).limit( 3 )
Index {x:1}
Limit
[Diagram: btree keys 1 through 9; scan of keys below 6]
> db.c.find( {x:{$lt:6},y:3} ).limit( 3 ).explain()
{
    "cursor" : "BtreeCursor x_1",
    "nscanned" : 4,
    "nscannedObjects" : 4,
    "n" : 3,
    "millis" : 1,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {"x" : [[-1.7976931348623157e+308,6]]}
}
Scan until three matches are found, then stop.
Skip
db.c.find( {x:{$lt:6},y:3} ).skip( 3 )
Index {x:1}
Skip
[Diagram: btree keys 1 through 9; scan of keys below 6]
> db.c.find( {x:{$lt:6},y:3} ).skip( 3 ).explain()
{
    "cursor" : "BtreeCursor x_1",
    "nscanned" : 5,
    "nscannedObjects" : 5,
    "n" : 1,
    "millis" : 1,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {"x" : [[-1.7976931348623157e+308,6]]}
}
All skipped documents are scanned.
Sort
db.c.find( {x:{$lt:6}} ).sort( {x:1} )
Index {x:1}
Sorting along index key uses index btree ordering
Sort
[Diagram: btree keys 1 through 9; scan of keys below 6]
> db.c.find( {x:{$lt:6},y:3} ).sort( {x:1} ).explain()
{
    "cursor" : "BtreeCursor x_1",
    "nscanned" : 5,
    "nscannedObjects" : 5,
    "n" : 4,
    "millis" : 1,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {"x" : [[-1.7976931348623157e+308,6]]}
}
Find uses the btree cursor to easily sort data.
Sort
db.c.find( {x:{$lt:6}} ).sort( {y:1} )
Index {x:1}
Sorting on a non-indexed key requires a scan and order step
Sort
[Diagram: btree keys 1 through 9; scan of keys below 6]
> db.c.find( {x:{$lt:6},y:3} ).sort( {y:1} ).explain()
{
    "cursor" : "BtreeCursor x_1",
    "nscanned" : 5,
    "nscannedObjects" : 5,
    "n" : 4,
    "scanAndOrder" : true,
    "millis" : 1,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {"x" : [[-1.7976931348623157e+308,6]]}
}
Sort and scanAndOrder
With scanAndOrder sort, all documents must be touched even if there is a limit spec.
With scanAndOrder, sorting is performed in memory and the memory footprint is constrained by the limit spec if present.
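One way to picture both points is a bounded in-memory buffer: every candidate is examined, but no more than limit documents are retained at any time. The function scanAndOrder below is a hypothetical sketch using a simple insertion into a sorted array; it is not meant to describe how the server actually implements the sort.

```javascript
// Hypothetical sketch of a scan-and-order step with an optional limit.
function scanAndOrder(docs, sortKey, limit) {
  const kept = []; // always sorted ascending by sortKey
  let touched = 0;
  for (const doc of docs) {
    touched += 1; // every candidate is examined, limit or not
    // Find the insertion point that keeps the buffer sorted.
    let i = kept.length;
    while (i > 0 && kept[i - 1][sortKey] > doc[sortKey]) i -= 1;
    kept.splice(i, 0, doc);
    // With a limit, drop the largest element so the buffer never grows past it.
    if (limit !== undefined && kept.length > limit) kept.pop();
  }
  return { results: kept, touched };
}

const unsortedDocs = [{ y: 5 }, { y: 1 }, { y: 4 }, { y: 2 }, { y: 3 }];
const orderedRun = scanAndOrder(unsortedDocs, "y", 2);
console.log(orderedRun.touched);                  // 5
console.log(orderedRun.results.map((d) => d.y));  // [ 1, 2 ]
```

All five documents are touched even though only two are returned, while the buffer itself never holds more than three entries.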
Count
Count uses the same index, but scans only the index and not the document data in storage
With some operators the full document must be checked. Some of these cases:
$all
$size
array match
Negation - $ne, $nin, $not, etc.
With current semantics, all multikey elements must match negation constraints
Multikey de-duplication works without loading the full document
Covered Indexes
db.c.find( {x:6}, {x:1,_id:0} )
Index {x:1}
_id would be returned by default, but it isn't in the index, so we need to exclude it to return only indexed fields.
Covered Indexes
> db.c.find( {x:6}, {x:1,_id:0} ).explain()
{
    "cursor" : "BtreeCursor x_1",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : true,
    "indexBounds" : {"x" : [[6,6]]}
}
indexOnly is true, and isMultiKey must be false. Currently we set isMultiKey to true the first time we save a doc where the field is a multikey array.
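The conditions for a covered query can be summarized in a small predicate. canUseIndexOnly is a hypothetical helper written for illustration; it assumes simple equality queries and inclusion-style projections, and it is not the check the server performs.

```javascript
// Hypothetical predicate: could this query be answered from the index alone?
function canUseIndexOnly(indexSpec, query, projection, isMultiKey) {
  if (isMultiKey) return false; // a multikey index cannot cover the query
  const indexed = new Set(Object.keys(indexSpec));
  // Fields explicitly included by the projection.
  const returned = Object.keys(projection).filter((f) => projection[f]);
  // _id comes back by default unless it is explicitly excluded.
  const idExcluded = projection._id === 0 || projection._id === false;
  if (!idExcluded && !returned.includes("_id")) returned.push("_id");
  const queryCovered = Object.keys(query).every((f) => indexed.has(f));
  const outputCovered = returned.every((f) => indexed.has(f));
  return queryCovered && outputCovered;
}

console.log(canUseIndexOnly({ x: 1 }, { x: 6 }, { x: 1, _id: 0 }, false)); // true
console.log(canUseIndexOnly({ x: 1 }, { x: 6 }, { x: 1 }, false));         // false
console.log(canUseIndexOnly({ x: 1 }, { x: 6 }, { x: 1, _id: 0 }, true));  // false
```

The second call fails because _id is returned by default and is not part of the index; the third fails because the index is multikey.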
Two Equality Bounds
db.c.find( {x:5,y:'c'} )
Index {x:1,y:1}
Two Equality Bounds
[Diagram: compound index entries (1,b) (3,d) (4,g) (5,c) (5,d) (5,f) (6,c) (7,a) (9,b); lookup for (5,c)]
Two Equality Bounds
> db.c.find( {x:5,y:'c'} ).explain()
{
    "cursor" : "BtreeCursor x_1_y_1",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 1,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {"x" : [[5,5]],"y" : [["c","c"]]}
}
Two ranges are applied to narrow down the data to scan.
Two Set Bounds
db.c.find( {x:{$in:[5,9]},y:{$in:['c','f']}} )
Index {x:1,y:1}
Two Set Bounds
[Diagram: compound index entries (1,b) (3,d) (4,g) (5,c) (5,d) (5,f) (6,c) (7,a) (9,f); lookups for (5,c), (5,f), (9,c), (9,f)]
Two Set Bounds
> db.c.find( {x:{$in:[5,9]},y:{$in:['c','f']}} ).explain()
{
    "cursor" : "BtreeCursor x_1_y_1 multi",
    "nscanned" : 5,
    "nscannedObjects" : 3,
    "n" : 3,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    ...
    "indexBounds" : {"x" : [[5,5],[9,9]],"y" : [["c","c"],["f","f"]]}
}
Disjoint $or Criteria
db.c.find( {$or:[{x:5},{y:'d'}]} )
Indexes {x:1}, {y:1}
Runs one find per clause, sequentially
Must not return the same document twice, so for each document it checks whether it already satisfied a previous clause
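The clause-by-clause evaluation can be sketched like this. findOr and matchesClause are hypothetical helpers; each clause is evaluated here with a plain array scan standing in for its index scan, and the sample documents are invented for the example.

```javascript
// Hypothetical sketch of sequential $or evaluation with de-duplication.
function matchesClause(doc, clause) {
  return Object.keys(clause).every((k) => doc[k] === clause[k]);
}

function findOr(docs, clauses) {
  const results = [];
  const perClause = [];
  clauses.forEach((clause, i) => {
    let n = 0;
    for (const doc of docs) {
      if (!matchesClause(doc, clause)) continue;
      // Skip documents that an earlier clause already returned.
      const seenBefore = clauses.slice(0, i).some((prev) => matchesClause(doc, prev));
      if (seenBefore) continue;
      n += 1;
      results.push(doc);
    }
    perClause.push(n);
  });
  return { results, perClause };
}

const orDocs = [
  { x: 1, y: "b" },
  { x: 3, y: "d" },
  { x: 5, y: "c" },
  { x: 5, y: "d" },
  { x: 7, y: "g" },
];
const orRun = findOr(orDocs, [{ x: 5 }, { y: "d" }]);
console.log(orRun.perClause);      // [ 2, 1 ]
console.log(orRun.results.length); // 3
```

The document {x:5,y:"d"} satisfies both clauses but is returned only by the first one, so the second clause contributes a single document.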
Disjoint $or Criteria
[Diagram: the index on x is scanned for 5, then the index on y is scanned for d]
Disjoint $or Criteria
> db.c.find( {$or:[{x:5},{y:'d'}]} ).explain()
{
    "clauses" : [
        {
            "cursor" : "BtreeCursor x_1",
            "nscanned" : 2,
            "nscannedObjects" : 2,
            "n" : 2,
            "millis" : 0,
            "nYields" : 0,
            "nChunkSkips" : 0,
            "isMultiKey" : false,
            "indexOnly" : false,
            "indexBounds" : {"x" : [[5,5]]}
        },
        {
            "cursor" : "BtreeCursor y_1",
            "nscanned" : 2,
            "nscannedObjects" : 2,
            "n" : 1,
            "millis" : 1,
            "nYields" : 0,
            "nChunkSkips" : 0,
            "isMultiKey" : false,
            "indexOnly" : false,
            "indexBounds" : {"y" : [["d","d"]]}
        }
    ],
    "nscanned" : 4,
    "nscannedObjects" : 4,
    "n" : 3,
    "millis" : 1
}
Unindexed $or Clause
db.c.find( {$or:[{x:5},{y:'d'}]} )
Index {x:1} (no index on y)
Unindexed $or Clause
Since y is not indexed, we must do a full collection scan to match y:'d'. Since a full scan is required, we don't use the index on x to match x:5.
> db.c.find( {$or:[{x:5},{y:'d'}]} ).explain()
{
    "cursor" : "BasicCursor",
    "nscanned" : 9,
    "nscannedObjects" : 9,
    "n" : 3,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {}
}
Automatic Index Selection
(Query Optimizer)
Optimal Index
find( {x:5} )
Index {x:1}
Index {x:1,y:1}
find( {x:5} ).sort( {y:1} )
Index {x:1,y:1}
find( {} ).sort( {x:1} )
Index {x:1}
find( {x:{$gt:1,$lt:7}} ).sort( {x:1} )
Index {x:1}
Optimal Index
Rule of Thumb
No scanAndOrder
All fields with index-useful constraints are indexed
If there is a range or sort, it is on the last field of the index used to resolve the query
If multiple optimal indexes exist, one is chosen arbitrarily.
Multiple Candidate Indexes
find( {x:4,y:'a'} )
Index {x:1} or {y:1}?
find( {x:4} ).sort( {y:1} )
Index {x:1} or {y:1}?
Note: {x:1,y:1} is optimal
find( {x:{$gt:2,$lt:7},y:{$gt:'a',$lt:'f'}} )
Index {x:1,y:1} or {y:1,x:1}?
Multiple Candidate Indexes
The only index selection criterion is nscanned
find( {x:4,y:'a'} )
Index {x:1} or {y:1}?
If fewer documents match {y:'a'} than {x:4}, then nscanned for {y:1} will be lower, so we pick {y:1}
find( {x:{$gt:2,$lt:7},y:{$gt:'b',$lt:'f'}} )
Index {x:1,y:1} or {y:1,x:1}?
If there are fewer distinct values in 2 < x < 7 than in 'b' < y < 'f', then {x:1,y:1} is chosen (rule of thumb)
Multiple Candidate Indexes
The only index selection criterion is nscanned
Pretty good, but doesn't cover every case, e.g.
Overhead of using an index versus doing a collection scan
Cost of scanAndOrder vs ordered index
Cost of loading full document vs just index key
Cost of scanning adjacent btree keys vs non adjacent keys/documents
Competing Indexes
At most one query plan per index
Run in interleaved fashion
Plans are kept in a priority queue ordered by nscanned. We always continue progress on the plan with the lowest nscanned.
Competing Indexes
Run until one plan returns all results or enough results to satisfy the initial query request (based on soft limit spec / data size requirement for initial query).
We only allow plans to compete in initial query. In getMore, we continue reading from the index cursor established by the initial query.
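The interleaving can be illustrated with a toy scheduler. Everything here is a hypothetical sketch: makePlan, stepPlan and runCompetingPlans are invented names, each plan's index scan is replaced by a precomputed list of candidate documents, and a linear search for the minimum stands in for the priority queue.

```javascript
// Hypothetical model of competing query plans advanced in interleaved fashion.
function makePlan(name, candidates, predicate) {
  // candidates: the documents this plan's index bounds would visit, in order.
  return { name, candidates, predicate, nscanned: 0, results: [] };
}

function stepPlan(plan) {
  const doc = plan.candidates[plan.nscanned];
  plan.nscanned += 1;
  if (plan.predicate(doc)) plan.results.push(doc);
}

function planDone(plan, wanted) {
  return plan.nscanned >= plan.candidates.length || plan.results.length >= wanted;
}

function runCompetingPlans(plans, wanted) {
  while (true) {
    // Stand-in for the priority queue: pick the plan with the lowest nscanned.
    const next = plans.reduce((best, p) => (p.nscanned < best.nscanned ? p : best));
    if (!planDone(next, wanted)) stepPlan(next);
    // Stop as soon as one plan has all results, or enough of them.
    if (planDone(next, wanted)) return next;
  }
}

const raceDocs = [
  { x: 4, y: "a" }, { x: 4, y: "b" }, { x: 4, y: "c" },
  { x: 4, y: "d" }, { x: 4, y: "e" }, { x: 7, y: "a" },
];
const wantedDoc = (d) => d.x === 4 && d.y === "a";
const planX = makePlan("x_1", raceDocs.filter((d) => d.x === 4), wantedDoc);
const planY = makePlan("y_1", raceDocs.filter((d) => d.y === "a"), wantedDoc);
const winner = runCompetingPlans([planX, planY], Infinity);
console.log(winner.name, winner.nscanned); // y_1 2
```

The plan on y finishes after scanning two keys, while the plan on x has only progressed two of its five keys and is abandoned.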
Learning a Query Plan
When an index is chosen for a query, the query's pattern and nscanned are recorded
find( {x:3,y:'c'} )
{Pattern: {x:equality, y:equality}, Index: {x:1}, nscanned: 50}
find( {x:{$gt:5},y:{$lt:'z'}} )
{Pattern: {x:gt bound, y:lt bound}, Index: {y:1}, nscanned: 500}
Learning a Query Plan
When a new query matches the same pattern, the same query plan is used
find( {x:5,y:'z'} )
Use index {x:1}
find( {x:{$gt:20},y:{$lt:'b'}} )
Use index {y:1}
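A rough sketch of how a query could be reduced to a pattern and looked up. queryPattern, recordPlan and lookupPlan are hypothetical names; the labels "equality", "gt bound" and "lt bound" come from the slides, while "gt and lt bound" and "other" are assumptions added so the function is total.

```javascript
// Hypothetical reduction of a query to its pattern: field names plus constraint kinds.
function queryPattern(query) {
  const pattern = {};
  for (const field of Object.keys(query).sort()) {
    const cond = query[field];
    const isOperatorDoc =
      cond !== null && typeof cond === "object" && !Array.isArray(cond);
    if (!isOperatorDoc) {
      pattern[field] = "equality";
      continue;
    }
    const ops = Object.keys(cond);
    const lower = ops.some((o) => o === "$gt" || o === "$gte");
    const upper = ops.some((o) => o === "$lt" || o === "$lte");
    if (lower && upper) pattern[field] = "gt and lt bound";
    else if (lower) pattern[field] = "gt bound";
    else if (upper) pattern[field] = "lt bound";
    else pattern[field] = "other";
  }
  return JSON.stringify(pattern);
}

const learnedPlans = new Map();
function recordPlan(query, index, nscanned) {
  learnedPlans.set(queryPattern(query), { index, nscanned });
}
function lookupPlan(query) {
  return learnedPlans.get(queryPattern(query));
}

recordPlan({ x: 3, y: "c" }, "x_1", 50);
recordPlan({ x: { $gt: 5 }, y: { $lt: "z" } }, "y_1", 500);
console.log(lookupPlan({ x: 5, y: "z" }).index);                       // x_1
console.log(lookupPlan({ x: { $gt: 20 }, y: { $lt: "b" } }).index);    // y_1
console.log(lookupPlan({ x: 5 }));                                     // undefined
```

The constant values differ between the recorded and the new queries, but the patterns are identical, so the recorded index is reused.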
Un-Learning a Query Plan
100 writes to the collection
Indexes added / removed
Bad Plan Insurance
If nscanned for a new query using a recorded plan is much worse than the recorded nscanned for an earlier query with the same pattern, we start interleaving other plans with the current plan.
Currently "much worse" means 10x
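The un-learning and insurance rules could be modelled along these lines. LearnedPlans is a hypothetical class; the thresholds 100 and 10 come from the slides, but whether the comparisons are strict, and whether the write counter resets when plans are cleared, are assumptions of this sketch.

```javascript
// Hypothetical bookkeeping for learned query plans.
const WRITE_THRESHOLD = 100;  // writes after which learned plans are dropped
const MUCH_WORSE_FACTOR = 10; // "much worse" means 10x the recorded nscanned

class LearnedPlans {
  constructor() {
    this.plans = new Map();
    this.writes = 0;
  }
  learn(pattern, index, nscanned) {
    this.plans.set(pattern, { index, nscanned });
  }
  get(pattern) {
    return this.plans.get(pattern);
  }
  clear() {
    this.plans.clear();
    this.writes = 0;
  }
  noteWrite() {
    this.writes += 1;
    if (this.writes >= WRITE_THRESHOLD) this.clear();
  }
  noteIndexChange() {
    this.clear(); // indexes added or removed invalidate what was learned
  }
  shouldInterleaveOthers(pattern, nscannedNow) {
    const plan = this.plans.get(pattern);
    return plan !== undefined && nscannedNow > MUCH_WORSE_FACTOR * plan.nscanned;
  }
}

const planCache = new LearnedPlans();
planCache.learn("x:equality", "x_1", 50);
console.log(planCache.shouldInterleaveOthers("x:equality", 400)); // false
console.log(planCache.shouldInterleaveOthers("x:equality", 501)); // true
for (let i = 0; i < 100; i += 1) planCache.noteWrite();
console.log(planCache.get("x:equality")); // undefined
```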
Thanks!
Feature Requests: jira.mongodb.org
Support: groups.google.com/group/mongodb-user