mongodb + ex.fm

Post on 09-Jun-2015


_id, padding factor, and bucketing, oh my! Slides from my talk at MongoPGH http://www.10gen.com/events/mongodb-pgh May 15, 2012


mongo @ ex.fm

Lucas Hrabovsky, CTO

#MongoPGH

ex.fm turns websites into CDs

browser extensions

_id and indexes

• Bad Ideas
  – ObjectId("4fb284…")
  – Big Compound Indexes
  – Long, Variable Width Strings Miss Indexes

• Good Ideas
  – Make _id mean something
  – Fixed Width Hashes
  – Use _id as a compound index
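A minimal sketch of the "good ideas" above, assuming the parts of a feed entry are joined with hyphens the way the example _id values in these slides are. The helper names makeFeedId and fixedWidthHash are hypothetical, and MD5 is only an assumption; the slides do not say which hash ex.fm used or what was hashed.

```javascript
const crypto = require("crypto");

// Hypothetical helper: hash any variable-length value down to a fixed width.
// MD5 is an assumption, picked because it always yields 32 hex characters.
function fixedWidthHash(value) {
  return crypto.createHash("md5").update(String(value)).digest("hex");
}

// Hypothetical helper: pack the fields that would otherwise need a
// compound index into the _id itself, leading with the sort key.
function makeFeedId(created, username, actor, objectKey) {
  return [created, username, actor, fixedWidthHash(objectKey)].join("-");
}

// The long URL stands in for any long, variable-width string.
const id = makeFeedId(
  "201109122304",
  "lucas",
  "dan",
  "http://example.com/a/very/long/and/variable/song/url.mp3"
);
console.log(id);
```

Because the hash has a fixed width, every _id built this way has a predictable size no matter how long the hashed value was.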

activity feeds: first attempt

db.user.feed.find({'username': 'lucas', 'verb': 'love'}).sort({'created': -1})

{"_id": "201109122304-lucas-dan-c7dede43…", "username": "lucas", "created": 201109122304, "actor": "dan", "verb": "love"}

Working just fine for 4MM documents, but getting slow…

new version of activity feeds

db.user.feed.find({'vid': /^lucas-/}).sort({'vid': -1})

{"_id": "201109122304-lucas-dan-c7dede43…", "uid": "lucas-201109122304", "vid": "lucas-love-201109122304", "actor": "dan"}

Fast for all 3 use cases!
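One way to see why a prefix query on a composite key can be served from a single index: a regex anchored at the start, like /^lucas-/, covers one contiguous range of sorted keys. The sketch below models that with a plain sorted array rather than a real MongoDB index. prefixRange and scan are hypothetical helpers, and the sample keys (including the "post" verb) are made up for illustration.

```javascript
// Hypothetical helper: turn a key prefix into the half-open range
// [prefix, upper) that an anchored regex such as /^prefix/ covers.
function prefixRange(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { $gte: prefix, $lt: upper };
}

// Plain-array stand-in for an index on "vid", kept in sorted order.
const vids = [
  "dan-love-201109122250",
  "lucas-love-201109122304",
  "lucas-love-201109122310",
  "lucas-post-201109122301",
  "majman-love-201109122259",
].sort();

// Walk only the keys inside the range, newest first (like sort({vid: -1})).
function scan(keys, range) {
  return keys.filter((k) => k >= range.$gte && k < range.$lt).reverse();
}

const loves = scan(vids, prefixRange("lucas-love-"));
const everything = scan(vids, prefixRange("lucas-"));
console.log(loves, everything.length);
```

Newest-first ordering falls out of the string sort only because the timestamps at the end of each key have the same number of digits.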

removing indexes pays off

Don’t need to buy more/bigger machines!

sites! sites! sites!

padding factor

• Variable document size
• Allocate for the latest and fattest
• Document moves
• Can be very inefficient
• More RAM!
• Pre-allocate to prevent moves

unbounded embedded lists

• Useful for followers, favorites
• Good for a few things, bad for lots
• Constantly bumping up padding factor
• Lots of document moves

a metaphor

• You run a coffee shop and can buy only one size of cup. Which size do you buy?

• On average, each customer has only one cup

• Heavy drinkers have hundreds of cups

credit: Macintex macintex.deviantart.com

bucketing!

• Split list across multiple documents
• Median number of items = bucket size
• Pre-allocate
• Easy seeking and traversal
• Much faster
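The bullets above can be sketched in plain JavaScript. Everything here is an assumption for illustration: the "<owner>-<bucket number>" _id layout, the songs field name, and the use of 0 as an empty-slot placeholder (chosen so that filling a slot later replaces a number with a number instead of growing the document).

```javascript
// Median list length across owners; the slides suggest it as the bucket size.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2
    ? sorted[mid]
    : Math.ceil((sorted[mid - 1] + sorted[mid]) / 2);
}

const EMPTY = 0; // assumed placeholder; 0 is taken to never be a real song id

// Hypothetical layout: one document per bucket, _id = "<owner>-<bucket number>".
function toBuckets(ownerId, items, bucketSize) {
  const buckets = [];
  for (let i = 0; i < items.length; i += bucketSize) {
    const chunk = items.slice(i, i + bucketSize);
    // Pre-allocate: pad a partly filled bucket up to the full bucket size.
    while (chunk.length < bucketSize) chunk.push(EMPTY);
    buckets.push({ _id: `${ownerId}-${buckets.length + 1}`, songs: chunk });
  }
  return buckets;
}

// Seeking: which bucket document and slot hold the n-th item (0-based)?
function locate(ownerId, n, bucketSize) {
  return {
    _id: `${ownerId}-${Math.floor(n / bucketSize) + 1}`,
    slot: n % bucketSize,
  };
}

const bucketSize = median([1, 1, 2, 3, 400]);
const buckets = toBuckets("site2", [11, 12, 13, 14, 15], bucketSize);
console.log(bucketSize, buckets, locate("site2", 4, bucketSize));
```

With a fixed bucket size, finding an item is arithmetic on its position rather than a scan through one ever-growing embedded list.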

[Diagram: site.meta 1, site.songs 1, site.songs 2, site.meta 2. Legend: allocated and unused; allocated and full of data]

hey charts!

same charts when using bucketing

[Diagram: site.meta 1; site.songs 1 - 1 and 1 - 2; site.songs 2 - 1 through 2 - 6; site.meta 2. Legend: allocated and unused; allocated and full of data]

doesn’t work for everything…

• Picking right bucket size
• Defragging
• Random insertion
  – Easy for things you don’t much care about the order of
  – More difficult if you’re going to insert and change the order later

micro documents

db.site.songs.find({_id: /^bfc25de08d964a8a41226c6016dd7753-/}).sort({_id:-1})

{ "_id" : "bfc25de08d964a8a41226c6016dd7753-1337029114", "s" : 18436532 }
{ "_id" : "bfc25de08d964a8a41226c6016dd7753-1337029113", "s" : 18804590 }
{ "_id" : "bfc25de08d964a8a41226c6016dd7753-1337029112", "s" : 18804591 }
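A small in-memory sketch of the micro-document shape shown above, assuming the part of _id after the hyphen is a Unix timestamp in seconds. makeMicroDoc and newestFirst are hypothetical helpers; newestFirst only imitates the find-by-prefix and descending sort on _id, it does not talk to MongoDB.

```javascript
// Hypothetical helper: one tiny document per list entry, with the owner key
// and a timestamp packed into _id. "s" is the short field name from the slide.
function makeMicroDoc(siteKey, unixSeconds, songId) {
  return { _id: `${siteKey}-${unixSeconds}`, s: songId };
}

const site = "bfc25de08d964a8a41226c6016dd7753";
const docs = [
  makeMicroDoc(site, 1337029112, 18804591),
  makeMicroDoc(site, 1337029114, 18436532),
  makeMicroDoc(site, 1337029113, 18804590),
];

// In-memory stand-in for find({_id: /^<site>-/}).sort({_id: -1}).
function newestFirst(all, siteKey) {
  return all
    .filter((d) => d._id.startsWith(siteKey + "-"))
    .sort((a, b) => (a._id < b._id ? 1 : a._id > b._id ? -1 : 0));
}

const songs = newestFirst(docs, site).map((d) => d.s);
console.log(songs);
```

Sorting the _id strings gives chronological order here only because both the site key and the timestamp have a fixed width.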

paying it back

• Bent mongoengine to make this easy
• Follow github.com/exfm
• Also added tooling for
  – Trace all queries
  – Aggregate tracing by request middleware
  – Raise exceptions when queries miss an index
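The last tooling idea, raising an exception when a query misses an index, could look roughly like the sketch below. This is not ex.fm's mongoengine code. It is a hypothetical guard that inspects an explain() result, assuming two output shapes: the older one that reports a "BasicCursor" for unindexed scans, and the newer queryPlanner tree where a full scan appears as a "COLLSCAN" stage.

```javascript
// Hypothetical error type for queries that fall back to a full scan.
class IndexMissError extends Error {}

// Walk a newer-style winningPlan tree looking for a collection scan.
function hasCollScan(plan) {
  if (!plan) return false;
  if (plan.stage === "COLLSCAN") return true;
  const children =
    plan.inputStages || (plan.inputStage ? [plan.inputStage] : []);
  return children.some(hasCollScan);
}

// Throw if the explain output shows the query ran without an index.
function assertUsesIndex(explain) {
  const legacyMiss = explain.cursor === "BasicCursor";
  const plannerMiss = hasCollScan(
    explain.queryPlanner && explain.queryPlanner.winningPlan
  );
  if (legacyMiss || plannerMiss) {
    throw new IndexMissError("query did not use an index");
  }
  return true;
}

// Hand-written explain fragments for illustration only.
const indexed = { cursor: "BtreeCursor _id_" };
const unindexed = {
  queryPlanner: {
    winningPlan: { stage: "SORT", inputStage: { stage: "COLLSCAN" } },
  },
};
console.log(assertUsesIndex(indexed));
```

Wiring a check like this into a development or test environment surfaces a missing index as a failure before it shows up as a slow query in production.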

thanks!

github.com/exfm
lucas@ex.fm