Mongo or Die: How MongoDB Powers Doodle or Die

Post on 16-May-2015

5.367 views 3 download

Tags:

description

"Doodle or Die" is a popular online drawing game built on Node.js and MongoDB. It started off as an entry to the 2011 Node Knockout competition and after winning the category for "Most Fun" has continued to grow into a game that has thousands of players play each day who have produced millions of drawings. This talk will use Doodle or Die as a vehicle to showcase the many strengths of using MongoDB as a go-to database for small and midsize applications. It will also cover lessons learned as we used MongoDB to rapidly develop and scale Doodle or Die.

Transcript of Mongo or Die: How MongoDB Powers Doodle or Die

Aaron Silverman

Mongo or Die!How MongoDB powers Doodle or Die

(@Zugwalt)

Doodle or Die

@DoodleOrDie

What is Doodle or Die?

Telephone

I'd like some beer!

I'd like some deer

I see no deer

I’ve no idea!

Phrase

Phrase

Phrase

Phrase

Doodle or Die

Shining Apple

Doodle

Phrase

Phrase Eat your fruit or DIE!

Doodle

What Powers Doodle or Die?

Started very, very small

4 Cores128MB RAM

Node Server

MongoDB Server

Small - 2GB

Got serious about our servers

MongoDB Server8 Cores256MB RAM

Node Server

Large - 5GB

Called in some Reinforcements

MongoDB Server12 Cores1GB RAM

Node Server

In the last 30 Days:

• 2,500,000 page views

• 100,000 uniques

• 35,000 active player accounts

• 2,000,000 new doodles and descriptions

“Small Data”Mongo DB4 GB data size<1 GB index size~10 queries/sec

Amazon170 GB data size8 GB in/month200 GB out/month

MongoHQ: $67• Amazon S3: $70

• Player Info• Chain Info (excluding

doodles)• Group Info• Game State• Logs

• Doodles• Static Content• Compressed

Database Backups

MongoDB - $65 / month

Daily backups to Amazon S3: $16/mo

Amazon S3 - $70 / month

Node - $62/ month

Total Cost To Host: $197/monthMongoDB $65Amazon S3 $70Node $62

Total $197

PAAS Provides Easy Upgrade Path

General Principles

Custom _idPartially random string generated using ShortId node module

ObjectIdObjectId("4fd02d5d78315a502d15cdde")ObjectId("4fd02d5a78315a502d15cddd")ObjectId("4fd02d5878315a502d15cddc")

ShortId"8rOIwh2VD""1qyY61Lu1""5GQnbx-1"

• Shorter, less cumbersome in code and queries

db.players.findOne({_id: ‘58mwYlTKV’});

db.chains.update({_id: ‘58mwYlTKV’}, {$set: activePlayer_id: ‘88ueYaL6V’});

http://doodleordie.com/c/5ONtvvSGH

• Randomness could help with sharding; more importantly makes it harder to cheat

<span class="doodle" data-jsonp="http://doodles.s3.amazonaws.com/d2/Eh8-Po2R5/1Em5kj3LY.js">

Question Oriented Subdocuments

What chain is this player working on right now?

What are this player’s stats?

Which players are not eligible to be assigned this chain?

Goal is for most “Questions” to be able to be answered in one query from one sub document

Related “Questions” will share common ancestors

db.players.findOne({_id: ‘58mwYlTKV’}, {‘game.recentSkips’: 1});

db.players.findOne({_id: ‘58mwYlTKV’}, {game: 1});

Indexes are designed to make answering questions easy!

Which player is working on this chain?

db.players.ensureIndex({‘game.activeChain_id’: 1});

What chains are recently awaiting a new doodle?

db.chains.ensureIndex({inUse: -1, activeState: 1 lastModified" : -1});

Doodle or Die Collections

Disclaimer: Much of the original Doodle or Die code and schema were created during a weekend long hackathon!

players

chains

groups

sessions

log

Primary

Support

Players

players

account

info

login

chainHistory

game

stats QueryFrequency

Often

Rarely

players

account

info

login

chainHistory

stats

game • activeState• activeChain_id• activeStepIndex• recentSkips

Answers question: “What is this player working on right now?”db.players.findOne({_id: ‘58mwYlTKV’}, {game: 1});

players

account

info

login

stats

game

chainHistory

Chain1• datePlayed• dateViewed

ChainN

Answers the question: “What has the player worked on?”db.players.findOne({_id: ‘58mwYlTKV’}, {chainHistory: 1});

players

account

info

login

stats

game

chainHistory

Chain1• datePlayed• dateViewed

ChainN

YUCK!Plan to refactor out along with a refactor of how chains store steps

players

account

info

login

game

chainHistory

Answers the question: “How active/good is this player?”

stats

• totalSteps• drawSteps• phraseSteps• numSkips• numLikes

db.players.findOne({_id: ‘58mwYlTKV’}, {stats: 1});

Chains

chains

• activePlayer_id• activeState• numSteps• lastModified

ineligiblePlayer_idssteps

chains are assigned (not as many questions)db.chains.findOne({inUse: false, activeState : player.game.activeState, ineligiblePlayer_ids: {$ne: player._id}, lastModified: { $gte: timeRange}});

chains

• activePlayer_id• activeState• numSteps• lastModified

*

steps

ineligiblePlayer_ids

[player_id1 player_id2, player_id3 … player_idN]

Note: $addToSet and $pull work great in maintaining this array

chains

• activePlayer_id• activeState• numSteps• lastModified

*ineligiblePlayer_ids

steps

[

]

Step1• player_id• state• date

StepN

content

• phrase

Description Step Content:

Doodle Step Content:• url (points to S3)• time• numStrokes• numDistinctColors

• phrase

Description Step Content:

Doodle Step Content:• url (points to S3)• time• numStrokes• numDistinctColors

doodles

• player_id• date• url (points to S3)• time• numStrokes• numDistinctColors

descriptions

• player_id• date• text

We plan to stop embedding steps in chains, and link to “doodles” and “descriptions” collections

How Does it all Work?

players queried for users who are associated with the authenticated twitter account

chain history used to load thumbnails, previous chain is loaded

db.players.find({‘login.twitter.uid’: ‘XXXXXXX’}, {game: 1, chainHistory: 1});

db.chains.find({_id: {$in: [player.game.activeChain_id, player.game.lastChain_id]}});

chains collection searched and atomically updated for unclaimed eligible chain to be given to player

db.chains.findAndModify( {inUse: false, activeState : player.game.activeState, ineligiblePlayer_ids: {$ne: player._id}, lastModified: { $gte: timeRange}}, {$set: {inUse: true, activePlayer_id: player._id, $addToSet: {ineligiblePlayer_ids: player._id}});

Content saved to chain

db.chains.update({_id: chain._id, activePlayer_id: player._id}}, {$inc: {numSteps: 1}, $set: inUse: false, lastModified: datePlayed} $unset: {activePlayer_id: 1}, $push: {steps: { player_id: player._id, state: chain.activeState, content: content, date: datePlayed}}});

Doodle strokes will be saved to S3(but url to S3 and metadata saved to chain)

{"color":"#000000","size":15,"path":[307,66,308,66,308,68,308,69,308,70,308,71,308,72,308,73,306,76,305,79,305,81,302,83,302,84,302,85,302,86,301,87,300,89,300,90,300,91,300,92,300,95,300,97,300,102,300,103,300,104,300,108,300,109,300,110,301,114,303,116,304,116,305,119,305,121,307,124,309,127,310,130,310,131,313,132,314,133,316,137,317,138,320,140,321,140,323,143,326,143,328,143,333,144,337,144,341,144,343,144,347,143,352,143,354,141,357,140,358,140,359,139,362,138,363,138,365,137,368,135,369,132,370,131,371,130,374,128,375,128,376,125,378,125,379,124,380,123,381,123,382,120,385,119,386,118,388,115,391,112,394,109,394,107,394,106,397,104,397,103,397,102,397,101,397,100,397,98,397,96,397,94,397,93,397,91,397,90,397,88,397,87,397,86,397,83,395,80,395,79,395,77,394,74,393,72,393,70,393,67,392,65,391,63,389,62,388,59,386,57,383,54,383,52,381,49,379,48,378,47,376,46,375,44,374,43,372,43,370,42,369,42,368,42,364,41,362,41,359,41,355,41,351,42,349,42,347,42,346,44,343,44,342,45,339,46,337,47,336,48,333,49,332,51,330,51,328,51,327,52,326,53,323,53,322,53,321,54,320,55,317,55,316,56,314,58,309,59,306,61,306,62,305,62,304,63]}

Player’s chainHistory retrieved

db.chains.find({$in: chain_idArr});

db.players.find({‘login.urlSafe’: urlSafeUid}, {chainHistory: 1});

Retrieved chains ordered (on server) based on previously sorted chainHistory

chainHistory filtered and sorted on server, Applicable chains/steps retrieved

Stats are loaded from the counting log which is essentially a bunch of counters incremented

db.log.find({_id: {$in: [‘2012-06-20’, ‘2012-06-20’, ‘2012-06-20’,

‘2012-06-20’]}});

Extremely simple implementation using nested subdocuments for organization

Some Additional Doodle or Die Tricks

Build assumptions into queries

Eve tries to join despite being banned!

groups: { _id: ‘8rOIwh2VD’, members: [ {name: ‘Alice’, dateJoined: ‘2012-05-24’}, {name: ‘Bob’, dateJoined: ‘2012-05-25’}

] banned: [ {name: ‘Eve’, dateBanned: ‘2012-05-25’}

]}

Alice and Bob are in an awesome group

groups.update( {_id: ‘8rOIwh2VD’}, {$push: {members: {name: ‘Eve’, dateJoined: new Date()}

}, }, function(err) {

if (err) throw err;callback();

});

Bugs in our code fail to detect Eve’s trickery and the update query is run!

groups: { _id: ‘8rOIwh2VD’, members: [ {name: ‘Alice’, dateJoined: ‘2012-05-24’}, {name: ‘Bob’, dateJoined: ‘2012-05-25’}, {name: ‘Eve’, dateJoined: ‘2012-05-26’}

] banned: [ {name: ‘Eve’, dateRequested: ‘2012-05-25’}

]}

Blast! Eve got through! How can we help prevent this?

Bake in our assumptions!

groups.update( {_id: ‘8rOIwh2VD’, ‘members.name’: {$ne: ‘Eve’}, ‘banned.name’: {$ne: ‘Eve’}}, {$push: {members: {name: ‘Eve’, dateJoined: new Date()}

}, }, function(err, updateCount) { if (err) throw err; if (updateCount !== 1) throw new Error(‘bugs!’); callback(updateCount === 1); });

Always Specify Fields

To assign a player a new chain, we only need their _id and game information

players

account

info

login

chainHistory

stats

gamegame

To load a player’s profile content, we just need their history, stats, and profile info

players

account

info

login

chainHistory

stats

gamechainHistory

stats

info

We need to assign Bob a new chain, lets figure out what state he needs next

db.players.find({name: ‘Bob’});

This will fetch and send back the ENTIRE player object!

Lets specify that we only want our “game” subdocument

db.players.find({name: ‘Bob’}, {fields: [‘game’]});

Hooray! Much less to retrieve and send over the series of tubes.

Store Everything!

If you have information available, save it even if you don’t plan on using it!

db.chains.update({_id: ‘2VmVO18hs’}, {$push: {player_id: ‘1qyY61Lu1’, state: ‘draw’, date: new Date(), content: {time: 106896,

count: 27,

width: 520,

height: 390,

step_id: ‘1i7RlFbgU’ } });

Some Pain Points

Less obvious database structure

mysql> DESC players;+----------------+---------+------+-----+---------+-------+| Field | Type | Null | Key | Default | Extra |+----------------+---------+------+-----+---------+-------+| id | int(11) | NO | PRI | NULL | || activeStateId | int(11) | YES | | NULL | || activeStepId | int(11) | YES | | NULL | |+----------------+---------+------+-----+---------+-------+

Data integrity up to application

Mysql> ALTER TABLE players ADD CONSTRAINT fk_players_chains FOREIGN KEY (activeChainId) REFERENCES chains(id);

/* Oh no! This is going to mess things up! */

players.update({_id: ‘c58D4’}, {$set: {‘game.activeChain_id’: ‘bad_id’}});

No mature GUI query development tools

If we could start over would we still use MongoDB?

Absolutely!

• NoSQL in general is great for rapid prototyping

• Fantastic performance

• Intuitive language keeps it easy to run one-off queries / refactor schema

• Excellent (and now officially supported) Node driver

Questions?

@DoodleOrDie@Zugwalt