What Drove Wordnik Non-Relational?
Transcript of What Drove Wordnik Non-Relational?

Why Wordnik went Non-Relational
Tony Tam
@fehguy
NoSQL Now 2011

What this Talk is About
• 5 Key reasons why Wordnik migrated into a Non-Relational database
• Process for selection, migration
• Optimizations and tips from living survivors of the battle field

Why Should You Care?
• MongoDB user for almost 2 years
• Lessons learned, analysis, benefits from process
• We migrated from MySQL to MongoDB with no downtime
• We have interesting/challenging data needs, likely relevant to you

More on Wordnik
• World’s fastest updating English dictionary
  • Based on input of text up to 8k words/second
  • Word Graph as basis to our analysis
  • Synchronous & asynchronous processing
• 10’s of Billions of documents in NR storage
• 20M daily REST API calls, billions served
  • Powered by Swagger OSS API framework
Powered API
swagger.wordnik.com

Architectural History
• 2008: Wordnik was born as a LAMP AWS EC2 stack
• 2009: Introduced public REST API, powered wordnik.com, partner APIs
• 2009: drank NoSQL Kool-Aid
• 2010: Scala
• 2011: Micro SOA

Non-relational by Necessity
• Moved to NR because of “4S”
  • Speed
  • Stability
  • Scaling
  • Simplicity
• But…
  • MySQL can go a LONG way
    • Takes right team, right reasons (+ patience)
  • NR offerings simply too compelling to focus on scaling MySQL

Wordnik’s 5 Whys for NoSQL

Why #1: Speed bumps with MySQL
• Inserting data fast (50k recs/second) caused MySQL mayhem
  • Maintaining indexes largely to blame
  • Operations for consistency unnecessary but “cannot be turned off”
• Devised twisted schemes to avoid client blocking
  • Aka the “master/slave tango”

Why #2: Retrieval Complexity
• Objects typically mapped to tables
  • Object Hierarchy always => inner + outer joins
• Lots of static data, so why join?
  • “Noun” is not getting renamed in my code’s lifetime!
  • Logic like this is probably in application logic
• Since storage is cheap
  • I’ll choose speed

Why #2: Retrieval Complexity
One definition = 10+
50 requests per second!

Why #2: Retrieval Complexity
• Embed objects in rows “sort of works”
  • Filtering gets really nasty
  • Native XML in MySQL?
    • If a full table-scan is OK…
• OK, then cache it!
  • Layers of caching introduced layers of complexity
    • Stale data/corruption
    • Object versionitis
    • Cache stampedes

Why #3: Object Modeling
• Object models being compromised for sake of persistence
  • This is backwards!
  • Extra abstraction for the wrong reason
• OK, then performance suffers
  • In-application joins across objects
  • “Who ran the fetch all query against production?!” – any sysadmin
• “My zillionth ORM layer that only I understand” (and can maintain)

Why #4: Scaling
• Needed "cloud friendly storage"
  • Easy up, easy down!
    • Startup: Sync your data, and announce to clients when ready for business
    • Shutdown: Announce your departure and leave
• Adding MySQL instances was a dance
  • Snapshot + bin files
mysql> change master to MASTER_HOST='db1', MASTER_USER='xxx', MASTER_PASSWORD='xxx',
MASTER_LOG_FILE='master-relay.000431', MASTER_LOG_POS=1035435402;

Why #4: Scaling
• What about those VMs?
  • So convenient! But… they kind of suck
  • Can the database succeed on a VM?
• VM Performance:
  • Memory, CPU or I/O: pick only one
  • Can your database really reduce CPU or disk I/O with lots of RAM?

Why #5: Big Picture
• BI tools use relational constraints for discovery
  • Is this the right reason for them?
  • Can we work around this?
  • Let’s have a BI tool revolution, too!
• True service architecture makes relational constraints impractical/impossible
• Distributed sharding makes relational constraints impractical/impossible

Why #5: Big Picture
• Is your app smarter than your database?
  • The logic line is probably blurry!
• What does count(*) really mean when you add 5k records/sec?
  • Maybe eventual consistency is not so bad…
• 2PC? Do some reading and decide!
http://eaipatterns.com/docs/IEEE_Software_Design_2PC.pd

Ok, I’m in!
• I thought deciding was easy!?
  • Many quickly maturing products
  • Divergent features tackle different needs
• Wordnik spent 8 weeks researching and testing NoSQL solutions
  • This is a long time! (for a startup)
  • Wrote ODM classes and migrated our data
• Surprise! There were surprises
  • Be prepared to compromise

Choice Made, Now What?
• We went with MongoDB ***
  • Fastest to implement
  • Most reliable
  • Best community
• Why?
  • Why #1: Fast loading/retrieval
  • Why #2: Fast ODM (50 tps => 1000 tps!)
  • Why #3: Document Models === Object models
  • Why #4: MMF => Kernel-managed memory + RS
  • Why #5: It’s 2011, is there no progress?

More on Why MongoDB
• Testing, testing, testing
  • Used our migration tools to load test
    • Read from MySQL, write to MongoDB
  • We loaded 5+ billion documents, many times over
• In the end, one server could…
  • Insert 100k records/sec sustained
  • Read 250k records/sec sustained
  • Support concurrent loading/reading

Migration & Testing
• Iterated ODM mapping multiple times
• Some issues
  • Type Safety
cur.next.get("iWasAnIntOnce").asInstanceOf[Long]
  • Dates as Strings
obj.put("a_date", "2011-12-31") !=
obj.put("a_date", new Date("2011-12-31"))
  • Storage Size
obj.put("very_long_field_name", true) >>
obj.put("vsfn", true)
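The type-safety and date issues above can be illustrated with a small sketch. It assumes a plain `Map[String, Any]` as a stand-in for a stored document; `FieldReaders`, `readLong` and `parseDay` are hypothetical names that do not appear in the slides or in any driver.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

object FieldReaders {
  // Hypothetical helper: accepts any boxed numeric type and widens it to Long.
  // A direct `.asInstanceOf[Long]` would throw ClassCastException for a value
  // that was originally stored as an Int.
  def readLong(doc: Map[String, Any], field: String): Option[Long] =
    doc.get(field) match {
      case Some(n: java.lang.Number) => Some(n.longValue)
      case _                         => None
    }

  // Hypothetical helper: parses a yyyy-MM-dd string into a real Date so the
  // value is stored and compared as a date rather than as text.
  def parseDay(s: String): Date = {
    val fmt = new SimpleDateFormat("yyyy-MM-dd")
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"))
    fmt.parse(s)
  }
}
```

A field first written as an Int and later as a Long reads back the same way through `readLong`, and a non-numeric value yields `None` instead of an exception.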

Migration & Testing
• Expect data model iterations
  • Wordnik migrated table to Mongo collection "as-is”
    • Easier to migrate, test
    • _id field used same MySQL PK
• Auto Increment?
  • Used MySQL to “check-out” sequences
    • One row per mongo collection
    • Run out of sequences => get more
    • Need exclusive locks here!

Migration & Testing
• Sequence generator in-process
SequenceGenerator.checkout("doc_metadata,100")
• Sequence generator as web service
  • Centralized UID management
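A minimal sketch of the "check-out" idea described in these slides, not Wordnik's implementation. `SequenceStore` is a hypothetical in-memory stand-in for the MySQL sequence table (one counter per collection), and `BlockSequence` is a hypothetical in-process generator that reserves a block of ids and fetches another block only when the current one runs out.

```scala
import scala.collection.mutable

// Stand-in for the central store (assumption: the slides used one MySQL row
// per collection). `synchronized` plays the role of the exclusive lock.
class SequenceStore {
  private val next = mutable.Map.empty[String, Long].withDefaultValue(1L)

  // Reserves `count` consecutive ids and returns the first one.
  def checkout(name: String, count: Int): Long = synchronized {
    val first = next(name)
    next(name) = first + count
    first
  }
}

// In-process generator: hands out ids from the current block and checks out
// a new block only when the current one is exhausted.
class BlockSequence(store: SequenceStore, name: String, blockSize: Int) {
  private var current = 0L
  private var limit = 0L // exclusive upper bound of the current block

  def nextId(): Long = synchronized {
    if (current >= limit) {
      current = store.checkout(name, blockSize)
      limit = current + blockSize
    }
    val id = current
    current += 1
    id
  }
}
```

Two generators sharing one store never hand out the same id, at the cost of gaps when a process stops with part of its block unused.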

Migration & Testing
• Expect data access pattern iterations
  • So much more flexibility!
    • Reach into objects
> db.dictionary_entry.find({"hdr.sr":"cmu"})
    • Access to a whole object tree at query time
  • Overwrite a whole object at once… when desired
    • Not always! This clobbers the whole record
> db.foo.save({foo:"bar"})
    • Update a single field:
> db.foo.update({_id:8727353},{$set:{foo:"bar"}})

Flip the Switch
• Migrate production with zero downtime
  • We temporarily halted loading data
  • Added a switch to flip between MySQL/MongoDB
  • Instrument, monitor, flip it, analyze, flip back
• Profiling your code is key
  • What is slow?
  • Build this in your app from day 1

Flip the Switch

Flip the Switch
• Storage selected at runtime
val h = shouldUseMongoDb match {
  case true => new MongoDbSentenceDAO
  case _ => new MySQLDbSentenceDAO
}
h.find(...)
• Hot-swappable storage via configuration
  • It worked!
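A self-contained sketch of how such a runtime switch might be wired, under stated assumptions: the `SentenceDAO` trait, the `find(word)` signature and the `Storage` object are hypothetical, and both DAO classes are stubs rather than real MySQL or MongoDB clients. Only the class names and the `shouldUseMongoDb` flag come from the slide.

```scala
// Hypothetical interface; the slide only shows that both DAOs expose `find`.
trait SentenceDAO {
  def find(word: String): Seq[String]
}

// Stubs standing in for the real MySQL- and MongoDB-backed implementations.
class MySQLDbSentenceDAO extends SentenceDAO {
  def find(word: String): Seq[String] = Seq(s"mysql: sentence containing $word")
}

class MongoDbSentenceDAO extends SentenceDAO {
  def find(word: String): Seq[String] = Seq(s"mongo: sentence containing $word")
}

object Storage {
  // Flipped at runtime (for example from a config reload); @volatile makes
  // the new value visible to other threads without locking.
  @volatile var shouldUseMongoDb: Boolean = false

  private val mysql = new MySQLDbSentenceDAO
  private val mongo = new MongoDbSentenceDAO

  // Every call re-reads the flag, so flipping back is just another assignment.
  def sentences: SentenceDAO = if (shouldUseMongoDb) mongo else mysql
}
```

Because callers go through `Storage.sentences` on each request, the backend can be flipped, measured and flipped back without a restart.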

Then What?
• Watch our deployment, many iterations to mapping layer
  • Settled on in-house, type-safe mapper
https://github.com/fehguy/mongodb-benchmark-tools
• Some gotchas (of course)
  • Locking issues on long-running updates (more in a minute)
• We want more of this!
  • Migrated shared files to Mongo GridFS
  • Easy-IT

Performance + Optimization
• Loading data is fast!
  • Fixed collection padding, similarly-sized records
  • Tail of collection is always in memory
  • Append faster than MySQL in every case tested
• But... random access started getting slow
  • Indexes in RAM? Yes
  • Data in RAM? No, > 2TB per server
  • Limited by disk I/O /seek performance
  • EC2 + EBS for storage?

Performance + Optimization
• Moved to physical data center
  • DAS & 72GB RAM => great uncached performance
• Good move? Depends on use case
  • If “access anything anytime”, not many options
  • You want to support this?

Performance + Optimization
• Inserts are fast, how about updates?
  • Well… update => find object, update it, save
  • Lock acquired at “find”, released after “save”
    • If hitting disk, lock time could be large
• Easy answer, pre-fetch on update
  • Oh, and NEVER do “update all records” against a large collection
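The "pre-fetch on update" tip can be sketched as a read issued just before the write, so that the document is likely already in memory when the lock is taken. Everything here is an assumption made for illustration: `DocCollection` is a hypothetical minimal interface, not a real driver API, and `InMemoryCollection` only records call order.

```scala
import scala.collection.mutable

// Hypothetical minimal collection interface, not a real driver API.
trait DocCollection {
  def findOne(id: Long): Option[Map[String, Any]]
  def setField(id: Long, field: String, value: Any): Unit
}

// In-memory stand-in that records the order of calls.
class InMemoryCollection extends DocCollection {
  private val docs = mutable.Map.empty[Long, Map[String, Any]]
  val calls = mutable.ArrayBuffer.empty[String]

  def insert(id: Long, doc: Map[String, Any]): Unit = {
    docs(id) = doc
  }

  def findOne(id: Long): Option[Map[String, Any]] = {
    calls += "findOne"
    docs.get(id)
  }

  def setField(id: Long, field: String, value: Any): Unit = {
    calls += "setField"
    docs.get(id).foreach(d => docs(id) = d.updated(field, value))
  }
}

object Updates {
  // Read first so the document is paged in, then issue the single-field update.
  def prefetchThenSet(c: DocCollection, id: Long, field: String, value: Any): Unit = {
    c.findOne(id)
    c.setField(id, field, value)
  }
}
```

Against a real store the benefit would depend on how locking and paging behave in the deployed version, so this is a pattern to measure rather than a guarantee.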

Performance + Optimization
• Indexes
  • Can't always keep index in RAM. MMF "does its thing"
  • Right-balanced b-tree keeps necessary index hot
  • Indexes hit disk => mute your pager

More Mongo, Please!
• We modeled our word graph in mongo
…0M Nodes
…0M Edges
…0μS edge fetch

More Mongo, Please!
• Analytics rolled-up from aggregation jobs
  • Send to Hadoop, load to mongo for fast access

What’s next
• Liberate our models
  • stop worrying about how to store them (for the most part)
• New features almost always NR
• Some MySQL left
  • Less on each release

Questions?
• See more about Wordnik APIs
http://developer.wordnik.com
• Migrating from MySQL to MongoDB
http://www.slideshare.net/fehguy/migrating-from-mysql-to-mongodb-at-wordn
• Maintaining your MongoDB Installation
http://www.slideshare.net/fehguy/mongo-sv-tony-tam
• Swagger API Framework
http://swagger.wordnik.com
• Mapping Benchmark
https://github.com/fehguy/mongodb-benchmark-tools
• Wordnik OSS Tools
https://github.com/wordnik/wordnik-oss