Using NoSQL with Yo' SQL

Using NoSQL with Yo' SQL: Supplementing your app with a slice of MongoDB. Rich Thornett, Dribbble. Thursday, June 9, 2011

Supplementing a relational database application with MongoDB.

Transcript of Using NoSQL with Yo' SQL

Page 1: Using NoSQL with Yo' SQL

Using NoSQL with Yo' SQL: Supplementing your app with a slice of MongoDB

Rich Thornett, Dribbble

Thursday, June 9, 2011

Page 2: Using NoSQL with Yo' SQL

Dribbble: What are you working on?

Show and tell for creatives via screenshots

Page 3: Using NoSQL with Yo' SQL

Your Father's Webapp

Dribbble is a typical web application:

Ruby on Rails + Relational Database

We <3 PostgreSQL

But for certain tasks ...

Page 4: Using NoSQL with Yo' SQL

Alternative Values

More flexible data structures

Easier horizontal scaling

log | scale | optimize | aggregate | cache

Page 5: Using NoSQL with Yo' SQL

NoSQL

No == Not Only (but sounds a bit stronger, no?)

• No: Fixed table schemas
• No: Joins
• Yes: Scale horizontally

Examples: Memcached, Redis, CouchDB, Cassandra, MongoDB ...

Page 6: Using NoSQL with Yo' SQL

Exploring MongoDB
• Persistent data store
• Powerful query language (closest to an RDBMS)
• Broad feature set
• Great community and documentation

Utility belt that fits us?

Page 7: Using NoSQL with Yo' SQL

What is MongoDB?

A document-oriented NoSQL database

Collections & Documents v. Tables & Rows

Page 8: Using NoSQL with Yo' SQL

What's a document?

Our old friend JavaScript

{
  _id: ObjectId("4ddfe31db6bc16ab615e573d"),
  description: "This is a BSON document",
  embedded_doc: {
    description: "I belong to my parent document"
  },
  tags: ['can', 'haz', 'arrays']
}

Documents are stored as BSON (binary-encoded JSON)

Page 9: Using NoSQL with Yo' SQL

Embedded Documents

Avoid joins for "belongs to" associations

{
  _id: ObjectId("4ddfe31db6bc16ab615e573d"),
  description: "This is a BSON document",
  embedded_doc: {
    description: "I belong to my parent document"
  },
  tags: ['can', 'haz', 'arrays']
}

Page 10: Using NoSQL with Yo' SQL

Arrays

{
  _id: ObjectId("4ddfe31db6bc16ab615e573d"),
  description: "This is a BSON document",
  embedded_doc: {
    description: "I belong to my parent document"
  },
  tags: ['can', 'haz', 'arrays']
}

Avoid joins for "tiny relations"

Relational Cruft: separate thing and tags tables, joined through a thing_taggings table
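A plain JavaScript sketch (no database involved) of the "tiny relations" contrast above. The `tags` and `thingTaggings` arrays are hypothetical stand-ins for two of the relational tables; the document keeps its tags inline, as in the `tags` field of the example document, so reading them needs no join.

```javascript
// Relational shape: tag names reach a "thing" only through a join table.
// All rows below are hypothetical, for illustration only.
const tags = [
  { id: 10, name: 'can' },
  { id: 11, name: 'haz' },
  { id: 12, name: 'arrays' },
];
const thingTaggings = [
  { thing_id: 1, tag_id: 10 },
  { thing_id: 1, tag_id: 11 },
  { thing_id: 1, tag_id: 12 },
];

// Join thing_taggings -> tags to recover a thing's tag names.
function tagNamesRelational(thingId) {
  return thingTaggings
    .filter((tagging) => tagging.thing_id === thingId)
    .map((tagging) => tags.find((tag) => tag.id === tagging.tag_id).name);
}

// Document shape: the tags are simply an array inside the document.
const thingDoc = {
  description: 'This is a BSON document',
  tags: ['can', 'haz', 'arrays'],
};

// Rough analogue of db.things.find({tags: 'haz'}): a query on an array
// field matches documents whose array contains the value.
function docsTagged(docs, tag) {
  return docs.filter((doc) => doc.tags.includes(tag));
}
```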

Page 11: Using NoSQL with Yo' SQL

Googley

“With MongoDB we can ... grow our data set horizontally on a cluster of commodity hardware and do distributed (read parallel execution of) queries/updates/inserts/deletes.”

--Markus Gattol, http://www.markus-gattol.name/ws/mongodb.html

Page 12: Using NoSQL with Yo' SQL

Replica Sets

• Read Scaling

• Data Redundancy

• Automated Failover

• Maintenance

• Disaster Recovery

Automate the storing of multiple copies of data

Page 13: Using NoSQL with Yo' SQL

Dude, who sharded? Relax, not you.

You: Specify a shard key for a collection

Mongo: Partitions the collection across machines

Application: Blissfully unaware (mostly :)

Auto-sharding
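As a rough mental model of range-based partitioning, documents can be routed to shards by comparing their shard-key value against chunk boundaries. This is a toy, not MongoDB's actual chunk splitting or balancing; the shard names, the `user_id` shard key, and the split points are all hypothetical. It also suggests one reading of "mostly": a query that omits the shard key has to go to every shard.

```javascript
// Hypothetical chunk ranges on the shard key `user_id`: [min, max)
const chunks = [
  { min: -Infinity, max: 1000, shard: 'shard0' },
  { min: 1000, max: 5000, shard: 'shard1' },
  { min: 5000, max: Infinity, shard: 'shard2' },
];

// Route a document to the shard owning the chunk that contains its key.
function shardFor(doc, shardKey = 'user_id') {
  const key = doc[shardKey];
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk.shard;
}

// A query that includes the shard key can target a single shard;
// one without it must be broadcast to every shard.
function shardsForQuery(query, shardKey = 'user_id') {
  if (query[shardKey] !== undefined) {
    return [shardFor(query, shardKey)];
  }
  return [...new Set(chunks.map((c) => c.shard))];
}
```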

Page 14: Using NoSQL with Yo' SQL

CoSQL

[Diagram: a WEBAPP backed by an RDBMS, with MongoDB alongside it for Caching, Analytics, Logging, Scaling, and Flexibility. Caption: MIND THE APP]

Page 15: Using NoSQL with Yo' SQL

Ads

Let's Mongo!

• Orthogonal to primary app

• Few joins

• Integrity not critical

Page 16: Using NoSQL with Yo' SQL

From the Console

Create a text ad

db.ads.insert({
  advertiser_id: 1,
  type: 'text',
  url: 'http://dribbbler-on-the-roof.com',
  copy: 'Watch me!',
  runs: [{
    start: new Date(2011, 4, 7),
    end: new Date(2011, 4, 14)
  }],
  created_at: new Date()
})

But there are drivers for all major languages

Page 17: Using NoSQL with Yo' SQL

Querying

Query by match

db.ads.find({advertiser_id: 1})

Paging active ads

// Page 2 of text ads running on a given day
// (here May 10, 2011; JavaScript months are 0-based)
db.ads.find({
  type: 'text',
  runs: {$elemMatch: {
    start: {$lte: new Date(2011, 4, 10)},
    end: {$gte: new Date(2011, 4, 10)}
  }}
}).sort({created_at: -1}).skip(15).limit(15)
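The paging query can be mimicked in plain JavaScript to show what `$elemMatch`, `sort`, `skip`, and `limit` each contribute. This is an in-memory sketch over a hypothetical array of ad documents, not the driver API; `runningOn` and `pageOfTextAds` are illustrative names.

```javascript
// True if at least one run satisfies BOTH bounds, which is what
// $elemMatch guarantees (without it, start and end could be matched
// by different elements of the runs array).
function runningOn(ad, day) {
  return ad.runs.some((run) => run.start <= day && run.end >= day);
}

// In-memory analogue of:
// db.ads.find({type: 'text', runs: {$elemMatch: {...}}})
//   .sort({created_at: -1}).skip(skip).limit(limit)
function pageOfTextAds(ads, day, skip, limit) {
  return ads
    .filter((ad) => ad.type === 'text' && runningOn(ad, day))
    .sort((a, b) => b.created_at - a.created_at) // newest first
    .slice(skip, skip + limit);
}

// Hypothetical data: 40 text ads, all running May 7-14, 2011.
const ads = [];
for (let i = 0; i < 40; i++) {
  ads.push({
    advertiser_id: 1,
    type: 'text',
    runs: [{ start: new Date(2011, 4, 7), end: new Date(2011, 4, 14) }],
    created_at: new Date(2011, 4, 1, 0, i), // minute i: higher i is newer
  });
}

// Page 2 (items 16-30, newest first) of ads running on May 10, 2011.
const page2 = pageOfTextAds(ads, new Date(2011, 4, 10), 15, 15);
```

An ad with one run ending May 3 and another starting May 12 would match the two range conditions separately, but not under `$elemMatch`, which is why the slide uses it.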

Page 18: Using NoSQL with Yo' SQL

Advanced Queries

$gt  $lt  $gte  $lte  $all  $exists
$size  $type  $elemMatch  $not  $where
$mod  $ne  $in  $nin  $nor  $or

http://www.mongodb.org/display/DOCS/Advanced+Queries

count | distinct | group

Group does not work across shards; use map/reduce instead.

Page 19: Using NoSQL with Yo' SQL

Polymorphism

// Banner ad has additional fields
db.ads.insert({
  advertiser_id: 1,
  type: 'banner',
  url: 'http://dribbble-me-this.com',
  copy: 'Buy me!',
  runs: [],
  image_file_name: 'ad.png',
  image_content_type: 'image/png',
  image_file_size: '33333'
})

Easy inheritance: each document has whatever fields it needs.

Single | Multiple | Joined table inheritance all present difficulties

No DB changes to create new subclasses in Mongo
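Because each ad document carries only the fields it needs, application code can branch on the `type` field rather than on a subclass table. A minimal sketch, assuming a hypothetical `describeAd` helper and reusing the field values from the text and banner inserts above:

```javascript
// Documents of different "subclasses" side by side in one collection.
const adsCollection = [
  { advertiser_id: 1, type: 'text', url: 'http://dribbbler-on-the-roof.com',
    copy: 'Watch me!', runs: [] },
  { advertiser_id: 1, type: 'banner', url: 'http://dribbble-me-this.com',
    copy: 'Buy me!', runs: [], image_file_name: 'ad.png',
    image_content_type: 'image/png', image_file_size: '33333' },
];

// Hypothetical helper: dispatch on `type`; banner-only fields are read
// only for banners, so adding a new type needs no schema change.
function describeAd(ad) {
  switch (ad.type) {
    case 'banner':
      return `${ad.copy} [${ad.image_file_name}] -> ${ad.url}`;
    case 'text':
    default:
      return `${ad.copy} -> ${ad.url}`;
  }
}

// Rough analogue of db.ads.find({image_file_name: {$exists: true}})
const bannersOnly = adsCollection.filter((ad) => 'image_file_name' in ad);
```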

Page 20: Using NoSQL with Yo' SQL

Logging

• Scale and query horizontally

• Add fields on the fly

• Writes: Fast, asynchronous, atomic

Page 21: Using NoSQL with Yo' SQL

Volume Logging

• Ad impressions

• Screenshot views

• Profile views

Fast, asynchronous writes and sharding FTW!

Page 22: Using NoSQL with Yo' SQL

Real-time Analytics

db.trends.update(
  {date: "2011-04-10 13:00"},  // search criteria
  {
    $inc: {  // increment
      'user.simplebits.likes_received': 1,
      'country.us.likes_received': 1,
      'city.boston.likes_received': 1
    }
  },
  true  // upsert
)

What people and locations are trending this hour?

upsert: Update document (if present) or insert it
$inc: Increment field by amount (if present) or set to amount
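To show what upsert plus `$inc` does to the document, here is an in-memory JavaScript sketch of those semantics: find by the criteria or create a document from them, then walk each dotted path, creating nested objects as needed and adding the amount (treating a missing field as 0). `upsertInc` is a hypothetical helper; the server applies the update atomically, which a plain function does not model.

```javascript
// Hypothetical in-memory model of update(criteria, {$inc: inc}, true).
function upsertInc(collection, criteria, inc) {
  // Find the first document matching every criteria field (exact match only).
  let doc = collection.find((d) =>
    Object.entries(criteria).every(([k, v]) => d[k] === v));
  if (!doc) {
    // Upsert: start the new document from the criteria fields.
    doc = { ...criteria };
    collection.push(doc);
  }
  for (const [path, amount] of Object.entries(inc)) {
    const parts = path.split('.');
    let target = doc;
    // Create intermediate objects for dotted paths like 'city.boston.likes_received'.
    for (const part of parts.slice(0, -1)) {
      if (target[part] === undefined) target[part] = {};
      target = target[part];
    }
    const leaf = parts[parts.length - 1];
    // $inc sets the field to `amount` if absent, otherwise adds to it.
    target[leaf] = (target[leaf] || 0) + amount;
  }
  return doc;
}

const trends = [];
const like = {
  'user.simplebits.likes_received': 1,
  'country.us.likes_received': 1,
  'city.boston.likes_received': 1,
};
upsertInc(trends, { date: '2011-04-10 13:00' }, like); // inserts the hour's document
upsertInc(trends, { date: '2011-04-10 13:00' }, like); // increments it
```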

Page 23: Using NoSQL with Yo' SQL

Flex Benefits

• Add/nest new fields to measure with ease

• Atomic upsert with $inc: replaces a two-step, transactional find-and-update/create

• Live, cached aggregation

Page 24: Using NoSQL with Yo' SQL

Scouting

Page 25: Using NoSQL with Yo' SQL

Design a Designer

db.users.insert({
  name: 'Dan Cederholm',
  available: true,
  skills: ['html', 'css', 'illustration', 'icon design']
})

Page 26: Using NoSQL with Yo' SQL

Geospatial Indexing

db.users.ensureIndex({location: '2d'})

db.users.insert({
  name: 'Dan Cederholm',
  // Salem longitude/latitude
  location: [-70.8972222, 42.5194444],
  available: true,
  skills: ['html', 'css', 'illustration', 'icon design']
})

Page 27: Using NoSQL with Yo' SQL

Search by Location

boston = [-71.0602778, 42.3583333] // long/lat

Within area

// $maxDistance: Find users in Boston area (w/in 50 miles).
// With a 2d index, the distance is in coordinate units (degrees).
db.users.find({location: {$near: boston, $maxDistance: 0.7234842}})

Within area, matching criteria

// Find users in the Boston area who:
// are available for work
// have expertise in HTML and icon design
db.users.find({
  location: {$near: boston, $maxDistance: 0.7234842},
  available: true,
  skills: {$all: ['html', 'icon design']}
})
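The 0.7234842 figure is 50 miles expressed in degrees. A minimal in-memory sketch of the "within area, matching criteria" query, assuming about 69.11 miles per degree (a value inferred because it reproduces the slide's number) and the flat Euclidean distance that a legacy 2d index uses. `findDesigners` is a hypothetical helper, and treating degrees as flat is only a rough approximation: a degree of longitude near Boston is shorter than a degree of latitude.

```javascript
// Assumption: ~69.11 miles per degree, which gives roughly 0.7234842 for 50 miles.
const MILES_PER_DEGREE = 69.11;

function milesToDegrees(miles) {
  return miles / MILES_PER_DEGREE;
}

// Flat (Euclidean) distance in degrees between [long, lat] pairs.
function flatDistance([x1, y1], [x2, y2]) {
  return Math.hypot(x2 - x1, y2 - y1);
}

// In-memory analogue of $near + $maxDistance + available + $all.
function findDesigners(users, center, maxDistance, requiredSkills) {
  return users
    .filter((u) => flatDistance(u.location, center) <= maxDistance)
    .filter((u) => u.available === true)
    .filter((u) => requiredSkills.every((s) => u.skills.includes(s))) // $all
    .sort((a, b) =>
      flatDistance(a.location, center) - flatDistance(b.location, center)); // $near: nearest first
}

const boston = [-71.0602778, 42.3583333];
const users = [{
  name: 'Dan Cederholm',
  location: [-70.8972222, 42.5194444], // Salem
  available: true,
  skills: ['html', 'css', 'illustration', 'icon design'],
}];

const matches = findDesigners(users, boston, milesToDegrees(50), ['html', 'icon design']);
```

Salem sits about 0.23 degrees from Boston in this flat metric, comfortably inside the radius.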

Page 28: Using NoSQL with Yo' SQL

Search Power

Flexible Documents
+ Rich Query Language
+ Geospatial Indexing

Page 29: Using NoSQL with Yo' SQL

Stats

Page 30: Using NoSQL with Yo' SQL

Unique Views

unique = remote_ip address / DAY

a.k.a. visitors per day

Page 31: Using NoSQL with Yo' SQL

Map/Reduce: http://www.mongodb.org/display/DOCS/MapReduce

Collections: Input and output

Map: Emits 0..N key/value pairs per document

Reduce: Aggregates values per key

Aggregate by key => GROUP BY in SQL

Page 32: Using NoSQL with Yo' SQL

Strategy

Two-pass map/reduce to calculate unique visitors

Pass 1
GROUP BY: profile, visitor
COUNT: visits per visitor per profile

Pass 2
GROUP BY: profile
COUNT: visitors

Page 33: Using NoSQL with Yo' SQL

Profile View Data

Visits on a given day

// Profile 1
{profile_id: 1, remote_ip: '127.0.0.1'}
{profile_id: 1, remote_ip: '127.0.0.1'}
{profile_id: 1, remote_ip: '127.0.0.2'}

// Profile 2
{profile_id: 2, remote_ip: '127.0.0.4'}
{profile_id: 2, remote_ip: '127.0.0.4'}

Page 34: Using NoSQL with Yo' SQL

Pass 1: Map Function

map = function() {
  var key = {
    profile_id: this.profile_id,
    remote_ip: this.remote_ip
  };
  emit(key, {count: 1});
}

Count visits per remote_ip per profile
KEY = profile, remote_ip

Page 35: Using NoSQL with Yo' SQL

Reduce Function

reduce = function(key, values) {
  var count = 0;
  values.forEach(function(v) { count += v.count; });
  return {count: count};
}

Counts (occurrences of key)

Page 36: Using NoSQL with Yo' SQL

Pass 1: Run Map/Reduce

db.profile_views.mapReduce(map, reduce, {out: 'profile_views_by_visitor'})

// Results: visits per visitor per profile
db.profile_views_by_visitor.find()
{ "_id": { "profile_id": 1, "remote_ip": "127.0.0.1" }, "value": { "count": 2 } }
{ "_id": { "profile_id": 1, "remote_ip": "127.0.0.2" }, "value": { "count": 1 } }
{ "_id": { "profile_id": 2, "remote_ip": "127.0.0.4" }, "value": { "count": 2 } }

Count visits per remote_ip per profile

Page 37: Using NoSQL with Yo' SQL

Pass 2: Map/Reduce

map = function() {
  emit(this._id.profile_id, {count: 1});
}

Count visitors per profile
KEY = profile_id

Page 38: Using NoSQL with Yo' SQL

Pass 2: Results

// Same reduce function as before
db.profile_views_by_visitor.mapReduce(map, reduce, {out: 'profile_views_unique'})

// Results
db.profile_views_unique.find()
{ "_id" : 1, "value" : { "count" : 2 } }
{ "_id" : 2, "value" : { "count" : 1 } }

Count visitors per profile
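Both passes can be replayed outside MongoDB with a tiny in-memory `mapReduce` helper (hypothetical, written for this sketch) that calls `map` with `this` bound to each document, collects the `emit`ted pairs, groups them by key, and applies `reduce`. Run on the profile view data above, it reproduces the counts from both passes.

```javascript
// Hypothetical in-memory stand-in for db.collection.mapReduce(map, reduce).
let emit; // assigned by mapReduce, mirroring MongoDB's global emit()

function mapReduce(docs, map, reduce) {
  const groups = new Map(); // JSON-encoded key -> { key, values }
  emit = (key, value) => {
    const id = JSON.stringify(key);
    if (!groups.has(id)) groups.set(id, { key, values: [] });
    groups.get(id).values.push(value);
  };
  docs.forEach((doc) => map.call(doc));
  // Output shaped like MongoDB's { _id, value } result documents.
  // (MongoDB may skip reduce for single-value keys; with this reduce
  // the result is the same either way.)
  return [...groups.values()].map(({ key, values }) => ({
    _id: key,
    value: reduce(key, values),
  }));
}

const profileViews = [
  { profile_id: 1, remote_ip: '127.0.0.1' },
  { profile_id: 1, remote_ip: '127.0.0.1' },
  { profile_id: 1, remote_ip: '127.0.0.2' },
  { profile_id: 2, remote_ip: '127.0.0.4' },
  { profile_id: 2, remote_ip: '127.0.0.4' },
];

const reduce = function (key, values) {
  let count = 0;
  values.forEach((v) => { count += v.count; });
  return { count: count };
};

// Pass 1: visits per visitor per profile
const byVisitor = mapReduce(profileViews, function () {
  emit({ profile_id: this.profile_id, remote_ip: this.remote_ip }, { count: 1 });
}, reduce);

// Pass 2: visitors per profile
const unique = mapReduce(byVisitor, function () {
  emit(this._id.profile_id, { count: 1 });
}, reduce);
```

The reduce returns `{count}` in the same shape that map emits, which matters in MongoDB because reduce may be re-run on its own output.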

Page 39: Using NoSQL with Yo' SQL

Map/Deduce

Large data sets, you get:
• Horizontal scaling
• Parallel processing across cluster

Can be clunkier than GROUP BY in SQL. But ...

JavaScript functions offer flexibility/power

Page 40: Using NoSQL with Yo' SQL

Activity

SELECT * FROM everything;

Too many tables to JOIN or UNION

Page 41: Using NoSQL with Yo' SQL

Relational solution: Denormalized events table as activity log.

         Column         |            Type
------------------------+-----------------------------
 id                     | integer
 event_type             | character varying(255)
 subject_type           | character varying(255)
 actor_type             | character varying(255)
 secondary_subject_type | character varying(255)
 subject_id             | integer
 actor_id               | integer
 secondary_subject_id   | integer
 recipient_id           | integer
 secondary_recipient_id | integer
 created_at             | timestamp without time zone

We use James Golick’s timeline_fu gem for Rails: https://github.com/jamesgolick/timeline_fu

Page 42: Using NoSQL with Yo' SQL

Direction

Incoming Activity (recipients)
Generated Activity (actors)

Page 43: Using NoSQL with Yo' SQL

Complications

Multiple recipients
• Subscribe to comments for a shot
• Twitter-style @ mentions in comments

Confusing names
• Generic names make queries and view logic hard to follow

N+1
• Each event may require several lookups to get actor, subject, etc.

Page 44: Using NoSQL with Yo' SQL

Events in Mongo

Mongo version of our timeline_events table

{
  event_type: "created",
  subject_type: "Comment",
  actor_type: "User",
  subject_id: 999,
  actor_id: 1,
  recipients: [], // Multiple recipients
  secondary_recipient_id: 3,
  created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"
}

Comment on a Screenshot containing an @ mention: the Screenshot owner and the @user should be recipients.

Page 45: Using NoSQL with Yo' SQL

Mongo Event v.2

{
  event_type: "created",
  subject_type: "Comment",
  actor_type: "User",
  subject_id: 999,
  actor_id: 1,
  recipients: [
    {user_id: 2, reason: 'screenshot owner'},
    {user_id: 3, reason: 'mention'}
  ],
  created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"
}

Why is a user a recipient?

Page 46: Using NoSQL with Yo' SQL

Mongo Event v.3

{
  event_type: "created",
  subject_type: "Comment",
  actor_type: "User",
  subject_id: 999,
  actor_id: 1,
  user_id: 1,
  comment_id: 999,
  screenshot_id: 555,
  recipients: [
    {user_id: 2, reason: 'screenshot owner'},
    {user_id: 3, reason: 'mention'}
  ],
  created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"
}

Meaningful names

Page 47: Using NoSQL with Yo' SQL

Mongo Event v.4

{
  event_type: "created",
  subject_type: "Comment",
  user_id: 1,
  comment_id: 999,
  screenshot_id: 555,
  user: {id: 1, login: "simplebits", avatar: "dancederholm-peek.png"},
  comment: {id: 999, text: "Great shot!"},
  screenshot: {id: 555, title: "Shot heard around the world"},
  recipients: [
    {user_id: 2, reason: 'screenshot owner'},
    {user_id: 3, reason: 'mention'}
  ],
  created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"
}

Denormalize to eliminate N+1s in view
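The v.4 document is essentially the result of doing the lookups once, when the event is written, instead of once per event when the activity view renders. A sketch of that idea in JavaScript, where `users`, `comments`, and `screenshots` are hypothetical in-memory stand-ins for the relational tables and `buildCommentEvent`, `renderEvent`, and `incomingFor` are hypothetical helpers:

```javascript
// Hypothetical normalized source data, keyed by id.
const users = new Map([
  [1, { id: 1, login: 'simplebits', avatar: 'dancederholm-peek.png' }],
]);
const comments = new Map([
  [999, { id: 999, text: 'Great shot!', user_id: 1, screenshot_id: 555 }],
]);
const screenshots = new Map([
  [555, { id: 555, title: 'Shot heard around the world', user_id: 2 }],
]);

// Do the lookups once at write time, copying only what the view needs.
function buildCommentEvent(commentId, mentionedUserIds) {
  const comment = comments.get(commentId);
  const user = users.get(comment.user_id);
  const shot = screenshots.get(comment.screenshot_id);
  return {
    event_type: 'created',
    subject_type: 'Comment',
    user_id: user.id,
    comment_id: comment.id,
    screenshot_id: shot.id,
    user: { id: user.id, login: user.login, avatar: user.avatar },
    comment: { id: comment.id, text: comment.text },
    screenshot: { id: shot.id, title: shot.title },
    recipients: [
      { user_id: shot.user_id, reason: 'screenshot owner' },
      ...mentionedUserIds.map((id) => ({ user_id: id, reason: 'mention' })),
    ],
    created_at: new Date(),
  };
}

// Rendering needs no further lookups: everything is in the document.
function renderEvent(e) {
  return `${e.user.login} commented on "${e.screenshot.title}": ${e.comment.text}`;
}

// Rough analogue of db.events.find({'recipients.user_id': userId})
function incomingFor(events, userId) {
  return events.filter((e) => e.recipients.some((r) => r.user_id === userId));
}

const event = buildCommentEvent(999, [3]);
```

The trade-off is the usual one for denormalization: if a user later changes their avatar, copies embedded in old events stay stale unless you update them.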

Page 48: Using NoSQL with Yo' SQL

Denormalizing? You're giving up RDBMS benefits to optimize.

Optimize your optimizations.

Document flexibility: Data structures can mirror the view

Page 49: Using NoSQL with Yo' SQL

Caching

MongoDB uses memory-mapped files

• Grabs free memory as needed; no configured cache size
• Relies on the OS to reclaim memory (LRU)

http://www.mongodb.org/display/DOCS/Caching

Page 50: Using NoSQL with Yo' SQL

Replace Redis/Memcached?

FREQUENTLY accessed items LIKELY in memory

Good enough for you? One less moving part.

Page 51: Using NoSQL with Yo' SQL

Cache Namespaces

Memcached keys are flat: 'ad_1', 'ad_2', 'ad_3'
No simple way to expire all

A collection can serve as an expirable namespace:

// Clear collection to expire
db.ads_cache.remove()
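The namespace idea can be sketched in JavaScript with one map per "collection": each namespace holds its own keys, so expiring every cached ad is a single clear instead of tracking each flat key. `NamespacedCache` is a hypothetical class for illustration; it models the shape of the idea, not MongoDB's memory-mapped storage.

```javascript
// Hypothetical cache where each namespace (like a collection) is its own Map.
class NamespacedCache {
  constructor() {
    this.namespaces = new Map();
  }

  // Get (or lazily create) the Map backing one namespace.
  ns(name) {
    if (!this.namespaces.has(name)) this.namespaces.set(name, new Map());
    return this.namespaces.get(name);
  }

  set(name, key, value) {
    this.ns(name).set(key, value);
  }

  get(name, key) {
    return this.ns(name).get(key);
  }

  // Analogue of db.ads_cache.remove(): drop every key in one namespace.
  expire(name) {
    this.ns(name).clear();
  }
}

const cache = new NamespacedCache();
cache.set('ads_cache', 'ad_1', 'rendered text ad');
cache.set('ads_cache', 'ad_2', 'rendered banner ad');
cache.set('users_cache', 'user_1', 'simplebits');
cache.expire('ads_cache'); // other namespaces are untouched
```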

Page 52: Using NoSQL with Yo' SQL

Time to Mongo?

Versatility?

Data structure flexibility worth more than joins?

Easier horizontal scaling?

http://www.mongodb.org

log | scale | optimize | aggregate | cache

Page 53: Using NoSQL with Yo' SQL

Cheers!

Rich Thornett

Dribbble: http://dribbble.com

@frogandcode
