What Drove Wordnik Non-Relational?

NoSQL Now 2011: Why Wordnik Went Non-Relational. Tony Tam, @fehguy


Wordnik's technical co-founder Tony Tam describes the reasons for going NoSQL. In his talk, Tony discusses the selection criteria, testing and evaluation, and the successful zero-downtime migration to MongoDB. He also covers Wordnik's speed and stability, and how NoSQL technologies have changed the way Wordnik scales.

Transcript of What Drove Wordnik Non-Relational?



What This Talk Is About

• 5 key reasons why Wordnik migrated into a non-relational database
• Process for selection, migration
• Optimizations and tips from living survivors of the battle field


Why Should You Care?

• MongoDB user for almost 2 years
• Lessons learned, analysis, benefits from process
• We migrated from MySQL to MongoDB with no downtime
• We have interesting/challenging data needs, likely relevant to you


More on Wordnik

• World's fastest updating English dictionary
  – Based on input of text up to 8k words/second
  – Word Graph as basis for our analysis
  – Synchronous & asynchronous processing
• Tens of billions of documents in NR storage
• 20M daily REST API calls, billions served
  – Powered by Swagger OSS API framework (swagger.wordnik.com)


Architectural History

• 2008: Wordnik was born as a LAMP AWS EC2 stack
• 2009: Introduced public REST API, powered wordnik.com, partner APIs
• 2009: Drank the NoSQL Kool-Aid
• 2010: Scala
• 2011: Micro SOA


Non-Relational by Necessity

• Moved to NR because of the "4S"
  – Speed
  – Stability
  – Scaling
  – Simplicity
• But…
  – MySQL can go a LONG way
  – Takes the right team, the right reasons (+ patience)
  – NR offerings simply too compelling to focus on scaling MySQL


Wordnik's 5 Whys for NoSQL


Why #1: Speed Bumps with MySQL

• Inserting data fast (50k recs/second) caused MySQL mayhem
  – Maintaining indexes largely to blame
  – Operations for consistency unnecessary but "cannot be turned off"
• Devised twisted schemes to avoid client blocking
  – Aka the "master/slave tango"


Why #2: Retrieval Complexity

• Objects typically mapped to tables
  – Object hierarchy always => inner + outer joins
• Lots of static data, so why join?
  – "Noun" is not getting renamed in my code's lifetime!
  – Logic like this is probably in application logic
• Since storage is cheap
  – I'll choose speed
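To make the join cost concrete, here is a minimal Scala sketch comparing an object assembled from normalized rows with the same object read as one embedded document. The schema (`WordRow`, `DefinitionRow`, `ExampleRow`) and the `Definition` shape are hypothetical simplifications, not Wordnik's actual model.

```scala
// Hypothetical normalized rows, as they might sit in separate MySQL tables.
case class WordRow(id: Int, text: String)
case class DefinitionRow(id: Int, wordId: Int, partOfSpeech: String, text: String)
case class ExampleRow(definitionId: Int, text: String)

// Hypothetical object model: what the application actually wants to work with.
case class Definition(word: String, partOfSpeech: String, text: String, examples: List[String])

object Normalized {
  // Assemble one Definition by "joining" three tables in application code.
  def load(defId: Int,
           words: Map[Int, WordRow],
           defs: Map[Int, DefinitionRow],
           examples: List[ExampleRow]): Option[Definition] =
    for {
      d <- defs.get(defId)
      w <- words.get(d.wordId)
    } yield Definition(w.text, d.partOfSpeech, d.text,
      examples.filter(_.definitionId == d.id).map(_.text))
}

object Embedded {
  // The whole object tree is stored under one key: a single lookup, no joins.
  def load(defId: Int, docs: Map[Int, Definition]): Option[Definition] = docs.get(defId)
}

val words = Map(1 -> WordRow(1, "cat"))
val defs  = Map(10 -> DefinitionRow(10, 1, "noun", "a small domesticated feline"))
val exs   = List(ExampleRow(10, "The cat sat on the mat."))
val docs  = Map(10 -> Definition("cat", "noun", "a small domesticated feline",
  List("The cat sat on the mat.")))

val joined   = Normalized.load(10, words, defs, exs)
val embedded = Embedded.load(10, docs)
```

Both paths return the same object; the difference is how many lookups it takes, which is what adds up at the request rates on the next slide.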


Why #2: Retrieval Complexity

One definition = 10+

50 requests per second!


Why #2: Retrieval Complexity

• Embed objects in rows "sort of works"
  – Filtering gets really nasty
  – Native XML in MySQL?
• If a full table-scan is OK…
• OK, then cache it!
  – Layers of caching introduced layers of complexity
  – Stale data/corruption
  – Object versionitis
  – Cache stampedes


Why #3: Object Modeling

• Object models being compromised for the sake of persistence
  – This is backwards!
  – Extra abstraction for the wrong reason
• OK, then performance suffers
  – In-application joins across objects
  – "Who ran the fetch all query against production?!" – any sysadmin
• "My zillionth ORM layer that only I understand" (and can maintain)


Why #4: Scaling

• Needed "cloud friendly storage"
  – Easy up, easy down!
  – Startup: sync your data, and announce to clients when ready for business
  – Shutdown: announce your departure and leave
• Adding MySQL instances was a dance
  – Snapshot + bin files

    mysql> change master to MASTER_HOST='db1',
        MASTER_USER='xxx', MASTER_PASSWORD='xxx',
        MASTER_LOG_FILE='master-relay.000431',
        MASTER_LOG_POS=1035435402;


Why #4: Scaling

• What about those VMs?
  – So convenient! But… they kind of suck
  – Can the database succeed on a VM?
• VM performance:
  – Memory, CPU or I/O: pick only one
  – Can your database really reduce CPU or disk I/O with lots of RAM?


Why #5: Big Picture

• BI tools use relational constraints for discovery
  – Is this the right reason for them?
  – Can we work around this?
  – Let's have a BI tool revolution, too!
• True service architecture makes relational constraints impractical/impossible
• Distributed sharding makes relational constraints impractical/impossible


Why #5: Big Picture

• Is your app smarter than your database?
  – The logic line is probably blurry!
• What does count(*) really mean when you add 5k records/sec?
  – Maybe eventual consistency is not so bad…
• 2PC? Do some reading and decide!
  http://eaipatterns.com/docs/IEEE_Software_Design_2PC.pd


Ok, I'm In!

• I thought deciding was easy!?
  – Many quickly maturing products
  – Divergent features tackle different needs
• Wordnik spent 8 weeks researching and testing NoSQL solutions
  – This is a long time! (for a startup)
  – Wrote ODM classes and migrated our data
• Surprise! There were surprises
  – Be prepared to compromise


Choice Made, Now What?

• We went with MongoDB ***
  – Fastest to implement
  – Most reliable
  – Best community
• Why?
  – Why #1: Fast loading/retrieval
  – Why #2: Fast ODM (50 tps => 1000 tps!)
  – Why #3: Document models === object models
  – Why #4: MMF => kernel-managed memory + RS
  – Why #5: It's 2011, is there no progress?


More on Why MongoDB

• Testing, testing, testing
  – Used our migration tools to load test
  – Read from MySQL, write to MongoDB
  – We loaded 5+ billion documents, many times over
• In the end, one server could…
  – Insert 100k records/sec sustained
  – Read 250k records/sec sustained
  – Support concurrent loading/reading
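The load test reused the migration path: read from MySQL, write to MongoDB. A minimal sketch of that copy loop, assuming hypothetical `readRows` and `writeBatch` functions in place of real JDBC and MongoDB driver calls, batches the writes and reports throughput:

```scala
// Hypothetical row type standing in for a MySQL result row.
case class Row(id: Long, payload: String)

object CopyLoadTest {
  // Copies rows in fixed-size batches and returns (rows copied, rows per second).
  // `readRows` and `writeBatch` are placeholders for real JDBC / MongoDB calls.
  def copy(readRows: () => Iterator[Row],
           writeBatch: Seq[Row] => Unit,
           batchSize: Int): (Long, Double) = {
    require(batchSize > 0, "batchSize must be positive")
    val start = System.nanoTime()
    var copied = 0L
    readRows().grouped(batchSize).foreach { batch =>
      writeBatch(batch)
      copied += batch.size
    }
    val seconds = math.max((System.nanoTime() - start) / 1e9, 1e-9)
    (copied, copied / seconds)
  }
}

// Example run against in-memory stand-ins for both databases.
val sink = scala.collection.mutable.ArrayBuffer.empty[Row]
val (copied, rate) = CopyLoadTest.copy(
  () => Iterator.range(0, 10000).map(i => Row(i.toLong, s"doc-$i")),
  batch => sink ++= batch,
  batchSize = 500)
```

Running the same loop repeatedly, with concurrent readers pointed at the target, is one way to reproduce the "sustained" and "concurrent loading/reading" numbers above.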


Migration & Testing

• Iterated ODM mapping multiple times
• Some issues
  – Type safety
    cur.next.get("iWasAnIntOnce").asInstanceOf[Long]
  – Dates as strings
    obj.put("a_date", "2011-12-31") != obj.put("a_date", new Date("2011-12-31"))
  – Storage size
    obj.put("very_long_field_name", true) >> obj.put("vsfn", true)
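The three issues above can be shown without a database. This Scala sketch models a document as a `Map[String, Any]`, an assumption standing in for the driver's document type; `getLong` and `fieldNameBytes` are hypothetical helpers, and `java.time.LocalDate` is swapped in for the slide's `Date`.

```scala
import scala.util.Try
import java.time.LocalDate

// Hypothetical helpers over a document modeled as Map[String, Any]
// (a stand-in for the driver's document type, not a real driver API).
object DocIssues {
  // Type safety: a value stored as an Int comes back as a boxed java.lang.Integer,
  // so widen explicitly instead of casting blindly.
  def getLong(doc: Map[String, Any], key: String): Option[Long] = doc.get(key) match {
    case Some(i: Int)  => Some(i.toLong)
    case Some(l: Long) => Some(l)
    case _             => None
  }

  // Storage size: each document stores its own field names,
  // so the name length is paid again in every record.
  def fieldNameBytes(doc: Map[String, Any]): Int =
    doc.keysIterator.map(_.getBytes("UTF-8").length).sum
}

val doc: Map[String, Any] = Map("iWasAnIntOnce" -> 42)

// The blind cast from the slide fails at runtime with a ClassCastException.
val blindCast = Try(doc("iWasAnIntOnce").asInstanceOf[Long])
val widened = DocIssues.getLong(doc, "iWasAnIntOnce")

// Dates as strings: a string and a date never compare equal,
// even though both print as "2011-12-31".
val asString: Any = "2011-12-31"
val asDate: Any = LocalDate.parse("2011-12-31")

val longNameBytes = DocIssues.fieldNameBytes(Map("very_long_field_name" -> true))
val shortNameBytes = DocIssues.fieldNameBytes(Map("vsfn" -> true))
```

The field-name cost is 20 bytes versus 4 here; multiplied across billions of documents, that difference is what the `>>` on the slide is pointing at.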


Migration & Testing

• Expect data model iterations
• Wordnik migrated tables to Mongo collections "as-is"
  – Easier to migrate, test
  – _id field used same MySQL PK
• Auto increment?
  – Used MySQL to "check-out" sequences
    – One row per Mongo collection
    – Run out of sequences => get more
  – Need exclusive locks here!
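Migrating each table "as-is" with the MySQL primary key reused as `_id` amounts to a mechanical row-to-document mapping. A minimal sketch, assuming rows arrive as column-name/value maps and that the key column is named `id` (both hypothetical):

```scala
object AsIsMigration {
  // Convert one MySQL row into a document, reusing the primary key as _id so that
  // IDs, and any references to them, stay identical in both stores.
  def toDocument(row: Map[String, Any], pkColumn: String = "id"): Map[String, Any] = {
    val pk = row.getOrElse(pkColumn,
      throw new IllegalArgumentException(s"row has no primary key column '$pkColumn'"))
    (row - pkColumn) + ("_id" -> pk)
  }
}

val row: Map[String, Any] = Map("id" -> 8727353L, "word" -> "cat", "pos" -> "noun")
val migrated = AsIsMigration.toDocument(row)
```

Keeping the PK means either store can answer a lookup by the same ID, which matters while both are live.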


Migration & Testing

• Sequence generator in-process
    SequenceGenerator.checkout("doc_metadata,100")
• Sequence generator as web service
  – Centralized UID management
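Here is a minimal sketch of the check-out scheme from this and the previous slide. It assumes a hypothetical `SequenceTable` standing in for the one-row-per-collection MySQL table, with `synchronized` playing the role of the exclusive row lock. The slide's `SequenceGenerator.checkout` call is not reproduced; the names and signatures below are illustrative only.

```scala
import scala.collection.mutable

// Stand-in for the MySQL sequence table: one row (next free value) per collection.
// In MySQL this would be SELECT ... FOR UPDATE followed by UPDATE in one transaction.
class SequenceTable {
  private val nextFree = mutable.Map.empty[String, Long]

  // Atomically reserve `count` IDs for `collection`; returns the first reserved ID.
  def reserve(collection: String, count: Int): Long = synchronized {
    val start = nextFree.getOrElse(collection, 1L)
    nextFree(collection) = start + count
    start
  }
}

// In-process generator: hands out IDs from a checked-out block and
// checks out another block when the current one runs out.
class BlockSequenceGenerator(table: SequenceTable, collection: String, blockSize: Int) {
  private var next = 0L
  private var limit = 0L // exclusive upper bound of the current block

  def nextId(): Long = synchronized {
    if (next >= limit) {
      next = table.reserve(collection, blockSize)
      limit = next + blockSize
    }
    val id = next
    next += 1
    id
  }
}

val table = new SequenceTable
val genA = new BlockSequenceGenerator(table, "doc_metadata", 100)
val genB = new BlockSequenceGenerator(table, "doc_metadata", 100)
val idsA = List.fill(3)(genA.nextId()) // 1, 2, 3 from block [1, 101)
val idB = genB.nextId()                // 101, from the next block
```

The lock is only held while a block is reserved, not per ID, which is why checking out ranges keeps the exclusive lock cheap. Moving `SequenceTable` behind a web service gives the centralized variant.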


Migration & Testing

• Expect data access pattern iterations
  – So much more flexibility!
• Reach into objects
    > db.dictionary_entry.find({"hdr.sr":"cmu"})
• Access to a whole object tree at query time
• Overwrite a whole object at once… when desired
  – Not always! This clobbers the whole record
    > db.foo.save({foo:"bar"})
  – Update a single field:
    > db.foo.update({_id:18727353},{$set:{foo:"bar"}})
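The difference between reaching into a nested field, overwriting a whole document, and `$set`-ing one field can be sketched against plain nested maps. This is an in-memory model, not the MongoDB driver API; `getPath`, `replace`, and `setField` are hypothetical helpers.

```scala
object DocOps {
  type Doc = Map[String, Any]

  // Resolve a dotted path such as "hdr.sr" by walking nested documents,
  // the way a query like find({"hdr.sr": "cmu"}) reaches into an object.
  def getPath(doc: Doc, path: String): Option[Any] =
    path.split('.').foldLeft(Option(doc: Any)) {
      case (Some(m: Map[_, _]), key) => m.asInstanceOf[Doc].get(key)
      case _                         => None
    }

  // Whole-document overwrite: only _id survives from the stored document,
  // every other field not in `newDoc` is lost ("clobbers the whole record").
  def replace(stored: Doc, newDoc: Doc): Doc =
    stored.get("_id").fold(newDoc)(id => newDoc + ("_id" -> id))

  // $set-style update: only the named top-level field changes.
  def setField(stored: Doc, field: String, value: Any): Doc = stored + (field -> value)
}

val stored: DocOps.Doc = Map("_id" -> 18727353, "foo" -> "baz", "hdr" -> Map("sr" -> "cmu"))
val source = DocOps.getPath(stored, "hdr.sr")
val clobbered = DocOps.replace(stored, Map("foo" -> "bar"))
val updated = DocOps.setField(stored, "foo", "bar")
```

`clobbered` has lost `hdr`, while `updated` keeps it; that is the trade-off between the `save` and `$set` lines on the slide.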


Flip the Switch

• Migrate production with zero downtime
  – We temporarily halted loading data
  – Added a switch to flip between MySQL/MongoDB
  – Instrument, monitor, flip it, analyze, flip back
• Profiling your code is key
  – What is slow?
  – Build this in your app from day 1
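"Build this in your app from day 1" can start as small as a timing wrapper around each data-access call. A minimal sketch, assuming an in-process registry is acceptable (a real deployment would likely export these numbers to a monitoring system); `Profiler` and its methods are hypothetical:

```scala
import scala.collection.mutable

object Profiler {
  // Total elapsed nanoseconds and call count per operation name.
  private val totals = mutable.Map.empty[String, (Long, Long)]

  // Run `body`, record its wall-clock time under `name`, and return its result.
  def timed[T](name: String)(body: => T): T = {
    val start = System.nanoTime()
    try body
    finally {
      val elapsed = System.nanoTime() - start
      totals.synchronized {
        val (ns, calls) = totals.getOrElse(name, (0L, 0L))
        totals(name) = (ns + elapsed, calls + 1)
      }
    }
  }

  def calls(name: String): Long =
    totals.synchronized(totals.get(name).map(_._2).getOrElse(0L))

  // Mean latency in microseconds, if the operation has been called at least once.
  def meanMicros(name: String): Option[Double] = totals.synchronized {
    totals.get(name).map { case (ns, n) => ns / 1000.0 / n }
  }
}

// Tag each call with the backend, so MySQL and MongoDB timings can be compared after a flip.
val found = Profiler.timed("sentence.find.mongo") { (1 to 1000).sum }
```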


Flip the Switch


Flip the Switch

• Storage selected at runtime

    val h = shouldUseMongoDb match {
      case true => new MongoDbSentenceDAO
      case _ => new MySQLDbSentenceDAO
    }
    h.find(...)

• Hot-swappable storage via configuration
  – It worked!
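The switch can be fleshed out into a self-contained Scala sketch. The `SentenceDAO` trait, the in-memory bodies and constructor arguments of the two DAOs, and the `storage.backend` setting are assumptions for illustration; the real DAOs would wrap the MySQL and MongoDB drivers.

```scala
// Common interface so callers never know which store is behind it.
trait SentenceDAO {
  def find(wordId: Long): List[String]
  def backend: String
}

// Hypothetical stand-ins; real versions would query MySQL and MongoDB.
class MySQLDbSentenceDAO(data: Map[Long, List[String]]) extends SentenceDAO {
  def find(wordId: Long): List[String] = data.getOrElse(wordId, Nil)
  val backend = "mysql"
}

class MongoDbSentenceDAO(data: Map[Long, List[String]]) extends SentenceDAO {
  def find(wordId: Long): List[String] = data.getOrElse(wordId, Nil)
  val backend = "mongodb"
}

// Runtime configuration, e.g. reloaded from a properties file or an admin endpoint.
object StorageConfig {
  @volatile var settings: Map[String, String] = Map("storage.backend" -> "mysql")
  def shouldUseMongoDb: Boolean = settings.get("storage.backend").contains("mongodb")
}

class SentenceService(mysql: SentenceDAO, mongo: SentenceDAO) {
  // Choose the store per call, so a config flip takes effect immediately.
  private def dao: SentenceDAO = StorageConfig.shouldUseMongoDb match {
    case true => mongo
    case _    => mysql
  }

  // Resolve the DAO once per request so one call never mixes backends.
  def find(wordId: Long): (String, List[String]) = {
    val d = dao
    (d.backend, d.find(wordId))
  }
}

val data = Map(42L -> List("The cat sat."))
val service = new SentenceService(new MySQLDbSentenceDAO(data), new MongoDbSentenceDAO(data))
val before = service.find(42L)
StorageConfig.settings = Map("storage.backend" -> "mongodb") // flip
val after = service.find(42L)
```

Selecting per call rather than once at startup is what makes "flip it, analyze, flip back" possible without a restart.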


Then What?

• Watch our deployment, many iterations to mapping layer
  – Settled on in-house, type-safe mapper
    https://github.com/fehguy/mongodb-benchmark-tools
• Some gotchas (of course)
  – Locking issues on long-running updates (more in a minute)
• We want more of this!
  – Migrated shared files to Mongo GridFS
  – Easy-IT


Performance + Optimization

• Loading data is fast!
  – Fixed collection padding, similarly-sized records
  – Tail of collection is always in memory
  – Append faster than MySQL in every case tested
• But... random access started getting slow
  – Indexes in RAM? Yes
  – Data in RAM? No, > 2TB per server
  – Limited by disk I/O / seek performance
  – EC2 + EBS for storage?


Performance + Optimization

• Moved to physical data center
  – DAS & 72GB RAM => great uncached performance
• Good move? Depends on use case
  – If "access anything anytime", not many options
  – You want to support this?


Performance + Optimization

• Inserts are fast, how about updates?
  – Well… update => find object, update it, save
  – Lock acquired at "find", released after "save"
  – If hitting disk, lock time could be large
• Easy answer, pre-fetch on update
  – Oh, and NEVER do "update all records" against a large collection
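Why pre-fetching helps can be sketched with a toy model: a set of "pages in RAM", a single write lock, and a counter of disk reads that happen while that lock is held. This is a simplified, single-threaded model of the find/update/save sequence on the slide, not MongoDB's storage engine; `ToyStore` and its methods are hypothetical.

```scala
import scala.collection.mutable

// Toy model (single-threaded demo): `cached` stands in for pages resident in RAM.
class ToyStore(initial: Map[Long, String]) {
  private val data = mutable.Map.empty[Long, String] ++= initial
  private val cached = mutable.Set.empty[Long]
  private val writeLock = new Object
  var diskReadsUnderLock = 0

  // Read path: touch the document so its page is faulted into RAM, without the write lock.
  def prefetch(id: Long): Unit = {
    data.get(id)
    cached += id
  }

  // find + modify + save, with the lock held for the whole sequence.
  def update(id: Long)(f: String => String): Boolean = writeLock.synchronized {
    if (!cached.contains(id)) { // cold page: disk I/O while everyone else waits
      diskReadsUnderLock += 1
      cached += id
    }
    data.get(id) match {
      case Some(v) => data(id) = f(v); true
      case None    => false
    }
  }

  // Pre-fetch on update: pay for the disk read before taking the lock.
  def prefetchThenUpdate(id: Long)(f: String => String): Boolean = {
    prefetch(id)
    update(id)(f)
  }

  def get(id: Long): Option[String] = data.get(id)
}

val store = new ToyStore(Map(1L -> "a", 2L -> "b"))
store.update(1L)(_ + "!")             // cold: disk read happens under the lock
store.prefetchThenUpdate(2L)(_ + "!") // warm: disk read happened before the lock
```

Only the cold update counts a disk read under the lock. The same reasoning explains the warning about "update all records": every cold document in a large collection would be read from disk while the lock is held.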


Performance + Optimization

• Indexes
  – Can't always keep index in RAM. MMF "does its thing"
  – Right-balanced b-tree keeps necessary index hot
  – Indexes hit disk => mute your pager
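The "right-balanced" effect can be illustrated by counting which index pages a batch of inserts touches. This sketch is a simplified assumption: it models the index as sorted keys cut into fixed-size leaf pages (ignoring page splits) rather than a real b-tree, and compares monotonically increasing keys, like sequence-generated IDs, against random keys.

```scala
import scala.util.Random

object IndexLocality {
  // Leaf page an insert of `key` lands on: its position among the sorted
  // existing keys (found by binary search), divided by keys per page.
  def pageOf(sortedKeys: IndexedSeq[Long], key: Long, pageSize: Int): Int = {
    var lo = 0
    var hi = sortedKeys.length
    while (lo < hi) {
      val mid = (lo + hi) >>> 1
      if (sortedKeys(mid) < key) lo = mid + 1 else hi = mid
    }
    lo / pageSize
  }

  // Distinct leaf pages touched by a batch of inserts (page splits ignored).
  def pagesTouched(sortedKeys: IndexedSeq[Long], newKeys: Seq[Long], pageSize: Int): Int =
    newKeys.map(k => pageOf(sortedKeys, k, pageSize)).distinct.size
}

val existing: IndexedSeq[Long] = Vector.range(0L, 100000L) // 100k keys already indexed
val rng = new Random(42)
val sequentialKeys = Seq.range(100000L, 101000L)           // next 1,000 increasing IDs
val randomKeys = Seq.fill(1000)(rng.nextInt(100000).toLong) // 1,000 scattered keys

val sequentialPages = IndexLocality.pagesTouched(existing, sequentialKeys, pageSize = 100)
val randomPages = IndexLocality.pagesTouched(existing, randomKeys, pageSize = 100)
```

Increasing keys all land on the rightmost page, so only that edge of the index needs to stay in RAM; random keys spread across hundreds of pages, and any that were paged out mean disk reads.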


More Mongo, Please!

• We modeled our word graph in mongo
  – …0M nodes, …0M edges, …0μs edge fetch
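One common way to hold a graph in documents is an adjacency list: one document per word, with its outgoing edges embedded, so an edge fetch is a single lookup by key. The slide does not describe Wordnik's actual schema; the `WordNode`/`Edge` shape and the relation names below are assumptions.

```scala
// Hypothetical document shape: one document per node, edges embedded in it.
case class Edge(target: String, relation: String, weight: Double)
case class WordNode(id: String, edges: List[Edge])

class WordGraph(nodes: Map[String, WordNode]) {
  // Edge fetch: one lookup by id, no join table.
  def edges(word: String, relation: String): List[Edge] =
    nodes.get(word).toList.flatMap(_.edges.filter(_.relation == relation))

  // Two-hop neighbours via repeated single-document fetches (excluding the start word).
  def twoHop(word: String, relation: String): Set[String] =
    edges(word, relation).flatMap(e => edges(e.target, relation)).map(_.target).toSet - word
}

val graph = new WordGraph(Map(
  "cat"    -> WordNode("cat", List(Edge("feline", "synonym", 0.9), Edge("kitty", "synonym", 0.7))),
  "feline" -> WordNode("feline", List(Edge("cat", "synonym", 0.9), Edge("felid", "synonym", 0.5))),
  "kitty"  -> WordNode("kitty", List(Edge("cat", "synonym", 0.7)))
))
```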


More Mongo, Please!

• Analytics rolled-up from aggregation jobs
  – Send to Hadoop, load to mongo for fast access
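Loading pre-aggregated results keeps reads cheap: the expensive grouping runs offline (in Hadoop, per the slide), and MongoDB only serves finished roll-up documents. A minimal in-memory sketch of that shape, assuming a hypothetical `ApiCall` event and a per-day, per-endpoint roll-up key:

```scala
// Hypothetical raw event and rolled-up document shapes.
case class ApiCall(day: String, endpoint: String, latencyMs: Int)
case class DailyRollup(id: String, day: String, endpoint: String, calls: Int, avgLatencyMs: Double)

object Rollup {
  // Offline step (the aggregation job): group raw events and summarize each group
  // into one document keyed by "day:endpoint".
  def aggregate(events: Seq[ApiCall]): Map[String, DailyRollup] =
    events.groupBy(e => (e.day, e.endpoint)).map { case ((day, endpoint), group) =>
      val id = s"$day:$endpoint"
      id -> DailyRollup(id, day, endpoint, group.size,
        group.map(_.latencyMs).sum.toDouble / group.size)
    }
}

val rollups = Rollup.aggregate(Seq(
  ApiCall("2011-08-23", "/word.json", 10),
  ApiCall("2011-08-23", "/word.json", 30),
  ApiCall("2011-08-24", "/word.json", 20)))

// Online step: serving a report is a key lookup instead of a scan over raw events.
val report = rollups.get("2011-08-23:/word.json")
```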


What's Next

• Liberate our models
  – Stop worrying about how to store them (for the most part)
• New features almost always NR
• Some MySQL left
  – Less on each release


Questions?

• See more about Wordnik APIs
  http://developer.wordnik.com
• Migrating from MySQL to MongoDB
  http://www.slideshare.net/fehguy/migrating-from-mysql-to-mongodb-at-wordn
• Maintaining your MongoDB installation
  http://www.slideshare.net/fehguy/mongo-sv-tony-tam
• Swagger API framework
  http://swagger.wordnik.com
• Mapping benchmark
  https://github.com/fehguy/mongodb-benchmark-tools
• Wordnik OSS tools
  https://github.com/wordnik/wordnik-oss