NoSQL Distilled: A Brief Guide to the Emerging World of ... · Chapter 3: More Details on Data...
NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence
Pramod J. Sadalage
Martin Fowler
Upper Saddle River, NJ • Boston • Indianapolis • San Francisco • New York • Toronto • Montreal • London • Munich • Paris • Madrid • Capetown • Sydney • Tokyo • Singapore • Mexico City
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.

The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact:
U.S. Corporate and Government Sales (800) 382–[email protected]
For sales outside the United States please contact:
Visit us on the Web: informit.com/aw

Library of Congress Cataloging-in-Publication Data:
Sadalage, Pramod J. NoSQL distilled : a brief guide to the emerging world of polyglot persistence / Pramod J Sadalage, Martin Fowler. p. cm. Includes bibliographical references and index. ISBN 978-0-321-82662-6 (pbk. : alk. paper) -- ISBN 0-321-82662-0 (pbk. : alk. paper) 1. Databases--Technological innovations. 2. Information storage and retrieval systems. I. Fowler, Martin, 1963- II. Title. QA76.9.D32S228 2013 005.74--dc23

Copyright © 2013 Pearson Education, Inc.

All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. To obtain permission to use material from this work, please submit a written request to Pearson Education, Inc., Permissions Department, One Lake Street, Upper Saddle River, New Jersey 07458, or you may fax your request to (201) 236–3290.

ISBN-13: 978-0-321-82662-6
ISBN-10: 0-321-82662-0
Text printed in the United States on recycled paper at RR Donnelley in Crawfordsville, Indiana.
First printing, August 2012
For my teachers Gajanan Chinchwadkar, Dattatraya Mhaskar, and Arvind Parchure. You inspired me the most, thank you.
- Pramod

For Cindy
- Martin
Contents

Preface

Part I: Understand

Chapter 1: Why NoSQL?
  1.1 The Value of Relational Databases
    1.1.1 Getting at Persistent Data
    1.1.2 Concurrency
    1.1.3 Integration
    1.1.4 A (Mostly) Standard Model
  1.2 Impedance Mismatch
  1.3 Application and Integration Databases
  1.4 Attack of the Clusters
  1.5 The Emergence of NoSQL
  1.6 Key Points

Chapter 2: Aggregate Data Models
  2.1 Aggregates
    2.1.1 Example of Relations and Aggregates
    2.1.2 Consequences of Aggregate Orientation
  2.2 Key-Value and Document Data Models
  2.3 Column-Family Stores
  2.4 Summarizing Aggregate-Oriented Databases
  2.5 Further Reading
  2.6 Key Points

Chapter 3: More Details on Data Models
  3.1 Relationships
  3.2 Graph Databases
  3.3 Schemaless Databases
  3.4 Materialized Views
  3.5 Modeling for Data Access
  3.6 Key Points

Chapter 4: Distribution Models
  4.1 Single Server
  4.2 Sharding
  4.3 Master-Slave Replication
  4.4 Peer-to-Peer Replication
  4.5 Combining Sharding and Replication
  4.6 Key Points

Chapter 5: Consistency
  5.1 Update Consistency
  5.2 Read Consistency
  5.3 Relaxing Consistency
    5.3.1 The CAP Theorem
  5.4 Relaxing Durability
  5.5 Quorums
  5.6 Further Reading
  5.7 Key Points

Chapter 6: Version Stamps
  6.1 Business and System Transactions
  6.2 Version Stamps on Multiple Nodes
  6.3 Key Points

Chapter 7: Map-Reduce
  7.1 Basic Map-Reduce
  7.2 Partitioning and Combining
  7.3 Composing Map-Reduce Calculations
    7.3.1 A Two Stage Map-Reduce Example
    7.3.2 Incremental Map-Reduce
  7.4 Further Reading
  7.5 Key Points

Part II: Implement

Chapter 8: Key-Value Databases
  8.1 What Is a Key-Value Store
  8.2 Key-Value Store Features
    8.2.1 Consistency
    8.2.2 Transactions
    8.2.3 Query Features
    8.2.4 Structure of Data
    8.2.5 Scaling
  8.3 Suitable Use Cases
    8.3.1 Storing Session Information
    8.3.2 User Profiles, Preferences
    8.3.3 Shopping Cart Data
  8.4 When Not to Use
    8.4.1 Relationships among Data
    8.4.2 Multioperation Transactions
    8.4.3 Query by Data
    8.4.4 Operations by Sets

Chapter 9: Document Databases
  9.1 What Is a Document Database?
  9.2 Features
    9.2.1 Consistency
    9.2.2 Transactions
    9.2.3 Availability
    9.2.4 Query Features
    9.2.5 Scaling
  9.3 Suitable Use Cases
    9.3.1 Event Logging
    9.3.2 Content Management Systems, Blogging Platforms
    9.3.3 Web Analytics or Real-Time Analytics
    9.3.4 E-Commerce Applications
  9.4 When Not to Use
    9.4.1 Complex Transactions Spanning Different Operations
    9.4.2 Queries against Varying Aggregate Structure

Chapter 10: Column-Family Stores
  10.1 What Is a Column-Family Data Store?
  10.2 Features
    10.2.1 Consistency
    10.2.2 Transactions
    10.2.3 Availability
    10.2.4 Query Features
    10.2.5 Scaling
  10.3 Suitable Use Cases
    10.3.1 Event Logging
    10.3.2 Content Management Systems, Blogging Platforms
    10.3.3 Counters
    10.3.4 Expiring Usage
  10.4 When Not to Use

Chapter 11: Graph Databases
  11.1 What Is a Graph Database?
  11.2 Features
    11.2.1 Consistency
    11.2.2 Transactions
    11.2.3 Availability
    11.2.4 Query Features
    11.2.5 Scaling
  11.3 Suitable Use Cases
    11.3.1 Connected Data
    11.3.2 Routing, Dispatch, and Location-Based Services
    11.3.3 Recommendation Engines
  11.4 When Not to Use

Chapter 12: Schema Migrations
  12.1 Schema Changes
  12.2 Schema Changes in RDBMS
    12.2.1 Migrations for Green Field Projects
    12.2.2 Migrations in Legacy Projects
  12.3 Schema Changes in a NoSQL Data Store
    12.3.1 Incremental Migration
    12.3.2 Migrations in Graph Databases
    12.3.3 Changing Aggregate Structure
  12.4 Further Reading
  12.5 Key Points

Chapter 13: Polyglot Persistence
  13.1 Disparate Data Storage Needs
  13.2 Polyglot Data Store Usage
  13.3 Service Usage over Direct Data Store Usage
  13.4 Expanding for Better Functionality
  13.5 Choosing the Right Technology
  13.6 Enterprise Concerns with Polyglot Persistence
  13.7 Deployment Complexity
  13.8 Key Points

Chapter 14: Beyond NoSQL
  14.1 File Systems
  14.2 Event Sourcing
  14.3 Memory Image
  14.4 Version Control
  14.5 XML Databases
  14.6 Object Databases
  14.7 Key Points

Chapter 15: Choosing Your Database
  15.1 Programmer Productivity
  15.2 Data-Access Performance
  15.3 Sticking with the Default
  15.4 Hedging Your Bets
  15.5 Key Points
  15.6 Final Thoughts

Bibliography

Index
Preface

We’ve spent some twenty years in the world of enterprise computing. We’ve seen many things change in languages, architectures, platforms, and processes. But through all this time one thing has stayed constant: relational databases store the data. There have been challengers, some of which have had success in some niches, but on the whole the data storage question for architects has been the question of which relational database to use.

There is a lot of value in the stability of this reign. An organization’s data lasts much longer than its programs (at least that’s what people tell us; we’ve seen plenty of very old programs out there). It’s valuable to have a stable data storage that’s well understood and accessible from many application programming platforms.

Now, however, there’s a new challenger on the block under the confrontational tag of NoSQL. It’s born out of a need to handle larger data volumes, which forced a fundamental shift to building large hardware platforms through clusters of commodity servers. This need has also raised long-running concerns about the difficulties of making application code play well with the relational data model.

The term “NoSQL” is very ill-defined. It’s generally applied to a number of recent nonrelational databases such as Cassandra, Mongo, Neo4J, and Riak. They embrace schemaless data, run on clusters, and have the ability to trade off traditional consistency for other useful properties. Advocates of NoSQL databases claim that they can build systems that are more performant, scale much better, and are easier to program with.

Is this the first rattle of the death knell for relational databases, or yet another pretender to the throne? Our answer to that is “neither.” Relational databases are a powerful tool that we expect to be using for many more decades, but we do see a profound change in that relational databases won’t be the only databases in use. Our view is that we are entering a world of Polyglot Persistence where enterprises, and even individual applications, use multiple technologies for data management. As a result, architects will need to be familiar with these technologies and be able to evaluate which ones to use for differing needs. Had we not thought that, we wouldn’t have spent the time and effort writing this book.

This book seeks to give you enough information to answer the question of whether NoSQL databases are worth serious consideration for your future projects. Every project is different, and there’s no way we can write a simple decision tree to choose the right data store. Instead, what we are attempting here is to provide you with enough background on how NoSQL databases work, so that you can make those judgments yourself without having to trawl the whole web. We’ve deliberately made this a small book, so you can get this overview pretty quickly. It won’t answer your questions definitively, but it should narrow down the range of options you have to consider and help you understand what questions you need to ask.
Why Are NoSQL Databases Interesting?

We see two primary reasons why people consider using a NoSQL database.

• Application development productivity. A lot of application development effort is spent on mapping data between in-memory data structures and a relational database. A NoSQL database may provide a data model that better fits the application’s needs, thus simplifying that interaction and resulting in less code to write, debug, and evolve.

• Large-scale data. Organizations are finding it valuable to capture more data and process it more quickly. They are finding it expensive, if even possible, to do so with relational databases. The primary reason is that a relational database is designed to run on a single machine, but it is usually more economic to run large data and computing loads on clusters of many smaller and cheaper machines. Many NoSQL databases are designed explicitly to run on clusters, so they make a better fit for big data scenarios.
What’s in the Book

We’ve broken this book up into two parts. The first part concentrates on core concepts that we think you need to know in order to judge whether NoSQL databases are relevant for you and how they differ. In the second part we concentrate more on implementing systems with NoSQL databases.

Chapter 1 begins by explaining why NoSQL has had such a rapid rise: the need to process larger data volumes led to a shift, in large systems, from scaling vertically to scaling horizontally on clusters. This explains an important feature of the data model of many NoSQL databases, namely the explicit storage of a rich structure of closely related data that is accessed as a unit. In this book we call this kind of structure an aggregate.

Chapter 2 describes how aggregates manifest themselves in three of the main data models in NoSQL land: key-value (“Key-Value and Document Data Models,” p. 20), document (“Key-Value and Document Data Models,” p. 20), and column family (“Column-Family Stores,” p. 21) databases. Aggregates provide a natural unit of interaction for many kinds of applications, which both improves running on a cluster and makes it easier to program the data access. Chapter 3 shifts to the downside of aggregates: the difficulty of handling relationships (“Relationships,” p. 25) between entities in different aggregates. This leads us naturally to graph databases (“Graph Databases,” p. 26), a NoSQL data model that doesn’t fit into the aggregate-oriented camp. We also look at the common characteristic of NoSQL databases that operate without a schema (“Schemaless Databases,” p. 28), a feature that provides some greater flexibility, but not as much as you might first think.

Having covered the data-modeling aspect of NoSQL, we move on to distribution: Chapter 4 describes how databases distribute data to run on clusters. This breaks down into sharding (“Sharding,” p. 38) and replication, the latter being either master-slave (“Master-Slave Replication,” p. 40) or peer-to-peer (“Peer-to-Peer Replication,” p. 42) replication. With the distribution models defined, we can then move on to the issue of consistency. NoSQL databases provide a more varied range of consistency options than relational databases, which is a consequence of being friendly to clusters. So Chapter 5 talks about how consistency changes for updates (“Update Consistency,” p. 47) and reads (“Read Consistency,” p. 49), the role of quorums (“Quorums,” p. 57), and how even some durability (“Relaxing Durability,” p. 56) can be traded off. If you’ve heard anything about NoSQL, you’ll almost certainly have heard of the CAP theorem; the “The CAP Theorem” section on p. 53 explains what it is and how it fits in.

While these chapters concentrate primarily on the principles of how data gets distributed and kept consistent, the next two chapters talk about a couple of important tools that make this work. Chapter 6 describes version stamps, which are for keeping track of changes and detecting inconsistencies. Chapter 7 outlines map-reduce, which is a particular way of organizing parallel computation that fits in well with clusters and thus with NoSQL systems.

Once we’re done with concepts, we move to implementation issues by looking at some example databases under the four key categories: Chapter 8 uses Riak as an example of key-value databases, Chapter 9 takes MongoDB as an example for document databases, Chapter 10 chooses Cassandra to explore column-family databases, and finally Chapter 11 plucks Neo4J as an example of graph databases. We must stress that this is not a comprehensive study; there are too many out there to write about, let alone for us to try. Nor does our choice of examples imply any recommendations. Our aim here is to give you a feel for the variety of stores that exist and for how different database technologies use the concepts we outlined earlier. You’ll see what kind of code you need to write to program against these systems and get a glimpse of the mindset you’ll need to use them.

A common statement about NoSQL databases is that since they have no schema, there is no difficulty in changing the structure of data during the life of an application. We disagree: a schemaless database still has an implicit schema that needs change discipline when you implement it, so Chapter 12 explains how to do data migration both for strong schemas and for schemaless systems.

All of this should make it clear that NoSQL is not a single thing, nor is it something that will replace relational databases. Chapter 13 looks at this future world of Polyglot Persistence, where multiple data-storage worlds coexist, even within the same application. Chapter 14 then expands our horizons beyond this book, considering other technologies that we haven’t covered that may also be a part of this polyglot-persistent world.

With all of this information, you are finally at a point where you can make a choice of what data storage technologies to use, so our final chapter (Chapter 15, “Choosing Your Database,” p. 147) offers some advice on how to think about these choices. In our view, there are two key factors: finding a productive programming model where the data storage model is well aligned to your application, and ensuring that you can get the data access performance and resilience you need. Since this is early days in the NoSQL life story, we’re afraid that we don’t have a well-defined procedure to follow, and you’ll need to test your options in the context of your needs.

This is a brief overview; we’ve been very deliberate in limiting the size of this book. We’ve selected the information we think is the most important, so that you don’t have to. If you are going to seriously investigate these technologies, you’ll need to go further than what we cover here, but we hope this book provides a good context to start you on your way.

We also need to stress that this is a very volatile field of the computer industry. Important aspects of these stores are changing every year: new features, new databases. We’ve made a strong effort to focus on concepts, which we think will be valuable to understand even as the underlying technology changes. We’re pretty confident that most of what we say will have this longevity, but absolutely sure that not all of it will.
Who Should Read This Book

Our target audience for this book is people who are considering using some form of a NoSQL database. This may be for a new project, or because they are hitting barriers that are suggesting a shift on an existing project.

Our aim is to give you enough information to know whether NoSQL technology makes sense for your needs, and if so which tool to explore in more depth. Our primary imagined audience is an architect or technical lead, but we think this book is also valuable for people involved in software management who want to get an overview of this new technology. We also think that if you’re a developer who wants an overview of this technology, this book will be a good starting point.

We don’t go into the details of programming and deploying specific databases here; we leave that for specialist books. We’ve also been very firm on a page limit, to keep this book a brief introduction. This is the kind of book we think you should be able to read on a plane flight: It won’t answer all your questions but should give you a good set of questions to ask.

If you’ve already delved into the world of NoSQL, this book probably won’t commit any new items to your store of knowledge. However, it may still be useful by helping you explain what you’ve learned to others. Making sense of the issues around NoSQL is important, particularly if you’re trying to persuade someone to consider using NoSQL in a project.
What Are the Databases

In this book, we’ve followed a common approach of categorizing NoSQL databases according to their data model. Here is a table of the four data models and some of the databases that fit each model. This is not a comprehensive list; it only mentions the more common databases we’ve come across. At the time of writing, you can find more comprehensive lists at http://nosql-database.org and http://nosql.mypopescu.com/kb/nosql. For each category, we mark with italics the database we use as an example in the relevant chapter.

Our goal is to pick a representative tool from each of the categories of the databases. While we talk about specific examples, most of the discussion should apply to the entire category, even though these products are unique and cannot be generalized as such. We will pick one database for each of the key-value, document, column family, and graph databases; where appropriate, we will mention other products that may fulfill a specific feature need.

This classification by data model is useful, but crude. The lines between the different data models, such as the distinction between key-value and document databases (“Key-Value and Document Data Models,” p. 20), are often blurry. Many databases don’t fit cleanly into categories; for example, OrientDB calls itself both a document database and a graph database.
Acknowledgments

Our first thanks go to our colleagues at ThoughtWorks, many of whom have been applying NoSQL to our delivery projects over the last couple of years. Their experiences have been a primary source both of our motivation in writing this book and of practical information on the value of this technology. The positive experience we’ve had so far with NoSQL data stores is the basis of our view that this is an important technology and a significant shift in data storage.

We’d also like to thank various groups who have given public talks, published articles, and blogs on their use of NoSQL. Much progress in software development gets hidden when people don’t share with their peers what they’ve learned. Particular thanks here go to Google and Amazon whose papers on Bigtable and Dynamo were very influential in getting the NoSQL movement going. We also thank companies that have sponsored and contributed to the open-source development of NoSQL databases. An interesting difference with previous shifts in data storage is the degree to which the NoSQL movement is rooted in open-source work.

Particular thanks go to ThoughtWorks for giving us the time to work on this book. We joined ThoughtWorks at around the same time and have been here for over a decade. ThoughtWorks continues to be a very hospitable home for us, a source of knowledge and practice, and a welcome environment of openly sharing what we learn, so different from the traditional systems delivery organizations.

Bethany Anders-Beck, Ilias Bartolini, Tim Berglund, Duncan Craig, Paul Duvall, Oren Eini, Perryn Fowler, Michael Hunger, Eric Kascic, Joshua Kerievsky, Anand Krishnaswamy, Bobby Norton, Ade Oshineye, Thiyagu Palanisamy, Prasanna Pendse, Dan Pritchett, David Rice, Mike Roberts, Marko Rodriquez, Andrew Slocum, Toby Tripp, Steve Vinoski, Dean Wampler, Jim Webber, and Wee Witthawaskul reviewed early drafts of this book and helped us improve it with their advice.

Additionally, Pramod would like to thank Schaumburg Library for providing great service and quiet space for writing; Arhana and Arula, my beautiful daughters, for their understanding that daddy would go to the library and not take them along; Rupali, my beloved wife, for her immense support and help in keeping me focused.
Part I: Understand

Chapter 1. Why NoSQL?

For almost as long as we’ve been in the software profession, relational databases have been the default choice for serious data storage, especially in the world of enterprise applications. If you’re an architect starting a new project, your only choice is likely to be which relational database to use. (And often not even that, if your company has a dominant vendor.) There have been times when a database technology threatened to take a piece of the action, such as object databases in the 1990s, but these alternatives never got anywhere.

After such a long period of dominance, the current excitement about NoSQL databases comes as a surprise. In this chapter we’ll explore why relational databases became so dominant, and why we think the current rise of NoSQL databases isn’t a flash in the pan.
1.1. The Value of Relational Databases

Relational databases have become such an embedded part of our computing culture that it’s easy to take them for granted. It’s therefore useful to revisit the benefits they provide.

1.1.1. Getting at Persistent Data

Probably the most obvious value of a database is keeping large amounts of persistent data. Most computer architectures have the notion of two areas of memory: a fast volatile “main memory” and a larger but slower “backing store.” Main memory is both limited in space and loses all data when you lose power or something bad happens to the operating system. Therefore, to keep data around, we write it to a backing store, commonly seen as a disk (although these days that disk can be persistent memory).

The backing store can be organized in all sorts of ways. For many productivity applications (such as word processors), it’s a file in the file system of the operating system. For most enterprise applications, however, the backing store is a database. The database allows more flexibility than a file system in storing large amounts of data in a way that allows an application program to get at small bits of that information quickly and easily.
1.1.2. Concurrency

Enterprise applications tend to have many people looking at the same body of data at once, possibly modifying that data. Most of the time they are working on different areas of that data, but occasionally they operate on the same bit of data. As a result, we have to worry about coordinating these interactions to avoid such things as double booking of hotel rooms.

Concurrency is notoriously difficult to get right, with all sorts of errors that can trap even the most careful programmers. Since enterprise applications can have lots of users and other systems all working concurrently, there’s a lot of room for bad things to happen. Relational databases help handle this by controlling all access to their data through transactions. While this isn’t a cure-all (you still have to handle a transactional error when you try to book a room that’s just gone), the transactional mechanism has worked well to contain the complexity of concurrency.

Transactions also play a role in error handling. With transactions, you can make a change, and if an error occurs during the processing of the change you can roll back the transaction to clean things up.
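
As an illustration of the hotel-booking case, here is a minimal sketch using Python's built-in sqlite3 module. The table names, the `book_room` helper, and the use of a primary key to detect the conflict are assumptions made for this example; they are not taken from the book. The sketch shows a transaction being rolled back as a whole when the second booking of the same room fails.

```python
import sqlite3


def book_room(conn, room, guest):
    """Try to book a room; return True on success, False if it is already taken.

    Both inserts run in one transaction: if the booking insert fails,
    the audit entry written just before it is rolled back as well.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO audit_log (message) VALUES (?)",
                (f"{guest} booked room {room}",),
            )
            conn.execute(
                "INSERT INTO bookings (room, guest) VALUES (?, ?)",
                (room, guest),
            )
        return True
    except sqlite3.IntegrityError:
        # The application still has to handle the "room just gone" case.
        return False


# Hypothetical schema: the primary key on room prevents double booking.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (room TEXT PRIMARY KEY, guest TEXT NOT NULL)")
conn.execute("CREATE TABLE audit_log (message TEXT NOT NULL)")
conn.commit()

first = book_room(conn, "101", "Ann")   # succeeds
second = book_room(conn, "101", "Bob")  # conflicts and is rolled back

booking_count = conn.execute("SELECT COUNT(*) FROM bookings").fetchone()[0]
audit_count = conn.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0]
```

After the second call, only the first guest's rows remain in both tables, because the failed transaction left no partial changes behind.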
1.1.3. Integration

Enterprise applications live in a rich ecosystem that requires multiple applications, written by different teams, to collaborate in order to get things done. This kind of inter-application collaboration is awkward because it means pushing the human organizational boundaries. Applications often need to use the same data, and updates made through one application have to be visible to others.

A common way to do this is shared database integration [Hohpe and Woolf] where multiple applications store their data in a single database. Using a single database allows all the applications to use each other’s data easily, while the database’s concurrency control handles multiple applications in the same way as it handles multiple users in a single application.

1.1.4. A (Mostly) Standard Model

Relational databases have succeeded because they provide the core benefits we outlined earlier in a (mostly) standard way. As a result, developers and database professionals can learn the basic relational model and apply it in many projects. Although there are differences between different relational databases, the core mechanisms remain the same: Different vendors’ SQL dialects are similar, and transactions operate in mostly the same way.
1.2. Impedance Mismatch

Relational databases provide many advantages, but they are by no means perfect. Even from their early days, there have been lots of frustrations with them.

For application developers, the biggest frustration has been what’s commonly called the impedance mismatch: the difference between the relational model and the in-memory data structures. The relational data model organizes data into a structure of tables and rows, or more properly, relations and tuples. In the relational model, a tuple is a set of name-value pairs and a relation is a set of tuples. (The relational definition of a tuple is slightly different from that in mathematics and many programming languages with a tuple data type, where a tuple is a sequence of values.) All operations in SQL consume and return relations, which leads to the mathematically elegant relational algebra.

This foundation on relations provides a certain elegance and simplicity, but it also introduces limitations. In particular, the values in a relational tuple have to be simple; they cannot contain any structure, such as a nested record or a list. This limitation isn’t true for in-memory data structures, which can take on much richer structures than relations. As a result, if you want to use a richer in-memory data structure, you have to translate it to a relational representation to store it on disk. Hence the impedance mismatch: two different representations that require translation (see Figure 1.1).

Figure 1.1. An order, which looks like a single aggregate structure in the UI, is split into many rows from many tables in a relational database
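
The translation step can be sketched in a few lines of Python. The order below, its field names, and the three target tables (`orders`, `order_lines`, `payments`) are hypothetical and chosen only for illustration; the point is that one nested in-memory structure has to be flattened into several sets of simple tuples, and reassembled on the way back.

```python
# A nested in-memory order: a record containing a list and a nested record.
order = {
    "id": 99,
    "customer": "Ann",
    "line_items": [
        {"product": "NoSQL Distilled", "quantity": 1, "price_cents": 3245},
        {"product": "Refactoring", "quantity": 2, "price_cents": 4500},
    ],
    "payment": {"card": "Amex", "billing_city": "Chicago"},
}


def to_rows(order):
    """Flatten one nested order into rows for three hypothetical tables.

    Every value in every row is simple (no lists, no nested records),
    which is what the relational model requires.
    """
    order_id = order["id"]
    return {
        "orders": [(order_id, order["customer"])],
        "order_lines": [
            (order_id, position, line["product"], line["quantity"], line["price_cents"])
            for position, line in enumerate(order["line_items"], start=1)
        ],
        "payments": [
            (order_id, order["payment"]["card"], order["payment"]["billing_city"])
        ],
    }


def from_rows(rows):
    """Reassemble the nested order from its flattened rows."""
    order_id, customer = rows["orders"][0]
    _, card, billing_city = rows["payments"][0]
    lines = sorted(rows["order_lines"], key=lambda row: row[1])  # by position
    return {
        "id": order_id,
        "customer": customer,
        "line_items": [
            {"product": product, "quantity": quantity, "price_cents": price}
            for _, _, product, quantity, price in lines
        ],
        "payment": {"card": card, "billing_city": billing_city},
    }


rows = to_rows(order)
round_tripped = from_rows(rows)
```

Even for this small structure, both directions of the mapping have to be written and kept in sync, which is the grunt work that mapping frameworks aim to remove.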
The impedance mismatch is a major source of frustration to application developers, and in the 1990s many people believed that it would lead to relational databases being replaced with databases that replicate the in-memory data structures to disk. That decade was marked with the growth of object-oriented programming languages, and with them came object-oriented databases, both looking to be the dominant environment for software development in the new millennium.

However, while object-oriented languages succeeded in becoming the major force in programming, object-oriented databases faded into obscurity. Relational databases saw off the challenge by stressing their role as an integration mechanism, supported by a mostly standard language of data manipulation (SQL) and a growing professional divide between application developers and database administrators.

Impedance mismatch has been made much easier to deal with by the wide availability of object-relational mapping frameworks, such as Hibernate and iBATIS, that implement well-known mapping patterns [Fowler PoEAA], but the mapping problem is still an issue. Object-relational mapping frameworks remove a lot of grunt work, but can become a problem of their own when people try too hard to ignore the database and query performance suffers.

Relational databases continued to dominate the enterprise computing world in the 2000s, but during that decade cracks began to open in their dominance.
1.3. Application and Integration Databases

The exact reasons why relational databases triumphed over OO databases are still the subject of an occasional pub debate for developers of a certain age. But in our view, the primary factor was the role of SQL as an integration mechanism between applications. In this scenario, the database acts as an integration database, with multiple applications, usually developed by separate teams, storing their data in a common database. This improves communication because all the applications are operating on a consistent set of persistent data.

There are downsides to shared database integration. A structure that’s designed to integrate many applications ends up being more complex (indeed, often dramatically more complex) than any single application needs. Furthermore, should an application want to make changes to its data storage, it needs to coordinate with all the other applications using the database. Different applications have different structural and performance needs, so an index required by one application may cause a problematic hit on inserts for another. The fact that each application is usually a separate team also means that the database usually cannot trust applications to update the data in a way that preserves database integrity and thus needs to take responsibility for that within the database itself.

A different approach is to treat your database as an application database, which is only directly accessed by a single application codebase that’s looked after by a single team. With an application database, only the team using the application needs to know about the database structure, which makes it much easier to maintain and evolve the schema. Since the application team controls both the database and the application code, the responsibility for database integrity can be put in the application code.

Interoperability concerns can now shift to the interfaces of the application, allowing for better interaction protocols and providing support for changing them. During the 2000s we saw a distinct shift to web services [Daigneau], where applications would communicate over HTTP. Web services enabled a new form of a widely used communication mechanism, a challenger to using SQL with shared databases. (Much of this work was done under the banner of “Service-Oriented Architecture,” a term most notable for its lack of a consistent meaning.)

An interesting aspect of this shift to web services as an integration mechanism was that it resulted in more flexibility for the structure of the data that was being exchanged. If you communicate with SQL, the data must be structured as relations. However, with a service, you are able to use richer data structures with nested records and lists. These are usually represented as documents in XML or, more recently, JSON. In general, with remote communication you want to reduce the number of round trips involved in the interaction, so it’s useful to be able to put a rich structure of information into a single request or response.

If you are going to use services for integration, most of the time web services (using text over HTTP) is the way to go. However, if you are dealing with highly performance-sensitive interactions, you may need a binary protocol. Only do this if you are sure you have the need, as text protocols are easier to work with; consider the example of the Internet.

Once you have made the decision to use an application database, you get more freedom of choosing a database. Since there is a decoupling between your internal database and the services with which you talk to the outside world, the outside world doesn’t have to care how you store your data, allowing you to consider nonrelational options. Furthermore, there are many features of relational databases, such as security, that are less useful to an application database because they can be done by the enclosing application instead.

Despite this freedom, however, it wasn’t apparent that application databases led to a big rush to alternative data stores. Most teams that embraced the application database approach stuck with relational databases. After all, using an application database yields many advantages even ignoring the database flexibility (which is why we generally recommend it). Relational databases are familiar and usually work very well or, at least, well enough. Perhaps, given time, we might have seen the shift to application databases open a real crack in the relational hegemony, but such cracks came from another source.
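
To make the point about services exchanging richer structures concrete, here is a small sketch of a service response that carries a whole order, with its nested list of line items, as one JSON document. The handler name `get_order_response`, the `_ORDERS` dictionary standing in for the application's private storage, and the field names are all assumptions for this example; no HTTP framework is involved.

```python
import json

# Private application storage. A dict stands in for whatever database the
# application team chooses; callers of the service never see it directly.
_ORDERS = {
    99: {
        "customer": "Ann",
        "line_items": [
            {"product": "NoSQL Distilled", "quantity": 1, "price_cents": 3245},
            {"product": "Refactoring", "quantity": 2, "price_cents": 4500},
        ],
    }
}


def get_order_response(order_id):
    """Hypothetical service handler: return the whole order as one JSON body.

    The nested record and list travel in a single response, so the client
    needs one round trip rather than one per table.
    """
    order = _ORDERS[order_id]
    return json.dumps({"id": order_id, **order})


# Client side: parse the text response and use the nested structure directly.
body = get_order_response(99)
received = json.loads(body)
total_cents = sum(
    line["quantity"] * line["price_cents"] for line in received["line_items"]
)
```

Because the client depends only on the JSON document, the team that owns the service could replace the storage behind `get_order_response` without the outside world noticing.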
1.4.AttackoftheClustersAtthebeginningofthenewmillenniumthetechnologyworldwashitbythebustingofthe1990sdot-combubble.WhilethissawmanypeoplequestioningtheeconomicfutureoftheInternet,the2000sdidseeseverallargewebpropertiesdramaticallyincreaseinscale.Thisincreaseinscalewashappeningalongmanydimensions.Websitesstartedtrackingactivityand
structureinaverydetailedway.Largesetsofdataappeared:links,socialnetworks,activityinlogs,mappingdata.Withthisgrowthindatacameagrowthinusers—asthebiggestwebsitesgrewtobevastestatesregularlyservinghugenumbersofvisitors.Copingwiththeincreaseindataandtrafficrequiredmorecomputingresources.Tohandlethis
kind of increase, you have two choices: up or out. Scaling up implies bigger machines, more processors, disk storage, and memory. But bigger machines get more and more expensive, not to mention that there are real limits as your size increases. The alternative is to use lots of small machines in a cluster. A cluster of small machines can use commodity hardware and ends up being cheaper at these kinds of scales. It can also be more resilient: while individual machine failures are common, the overall cluster can be built to keep going despite such failures, providing high reliability.
As large properties moved towards clusters, that revealed a new problem: relational databases are not designed to be run on clusters. Clustered relational databases, such as the Oracle RAC or Microsoft SQL Server, work on the concept of a shared disk subsystem. They use a cluster-aware file system that writes to a highly available disk subsystem, but this means the cluster still has the disk subsystem as a single point of failure. Relational databases could also be run as separate servers for different sets of data, effectively sharding (“Sharding,” p. 38) the database. While this separates the load, all the sharding has to be controlled by the application, which has to keep track of which database server to talk to for each bit of data. Also, we lose any querying, referential integrity, transactions, or consistency controls that cross shards. A phrase we often hear in this context from people who’ve done this is “unnatural acts.”
These technical issues are exacerbated by licensing costs. Commercial relational databases are usually priced on a single-server assumption, so running on a cluster raised prices and led to frustrating negotiations with purchasing departments.
This mismatch between relational databases and clusters led some organizations to consider an alternative route to data storage. Two companies in particular, Google and Amazon, have been very influential. Both were on the forefront of running large clusters of this kind; furthermore, they were capturing huge amounts of data. These things gave them the motive. Both were successful and growing companies with strong technical components, which gave them the means and opportunity. It was no wonder they had murder in mind for their relational databases. As the 2000s drew on, both companies produced brief but highly influential papers about their efforts: BigTable from Google and Dynamo from Amazon.
It’s often said that Amazon and Google operate at scales far removed from most organizations, so the solutions they needed may not be relevant to an average organization. While it’s true that most software projects don’t need that level of scale, it’s also true that more and more organizations are beginning to explore what they can do by capturing and processing more data, and to run into the same problems. So, as more information leaked out about what Google and Amazon had done, people began to explore making databases along similar lines, explicitly designed to live in a world of clusters. While the earlier menaces to relational dominance turned out to be phantoms, the threat from clusters was serious.
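The application-managed sharding mentioned earlier in this section can be pictured with a small sketch. It assumes a fixed list of database servers and a hash of the customer ID to pick one; the server names and the `shard_for` function are hypothetical and only illustrate the bookkeeping the application has to carry when the database itself knows nothing about the shards.

```python
import hashlib

# Hypothetical server names; a real deployment would read these from configuration.
SHARDS = ["db-server-0", "db-server-1", "db-server-2"]


def shard_for(customer_id: str) -> str:
    """Return the server that holds this customer's rows, using a stable hash."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]


# Every query for a customer must first be routed by the application.
server = shard_for("customer-1")
```

Anything that needs rows from two customers on different servers, such as a join or a transaction, is outside what any single server can do, which is the loss of cross-shard querying and consistency controls described above.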
1.5. The Emergence of NoSQL
It’s a wonderful irony that the term “NoSQL” first made its appearance in the late 90s as the name of an open-source relational database [Strozzi NoSQL]. Led by Carlo Strozzi, this database stores its tables as ASCII files, each tuple represented by a line with fields separated by tabs. The name comes from the fact that the database doesn’t use SQL as a query language. Instead, the database is manipulated through shell scripts that can be combined into the usual UNIX pipelines. Other than the terminological coincidence, Strozzi’s NoSQL had no influence on the databases we describe in this book.
The usage of “NoSQL” that we recognize today traces back to a meetup on June 11, 2009 in San Francisco organized by Johan Oskarsson, a software developer based in London. The example of BigTable and Dynamo had inspired a bunch of projects experimenting with alternative data storage, and discussions of these had become a feature of the better software conferences around that time. Johan was interested in finding out more about some of these new databases while he was in San Francisco for a Hadoop summit. Since he had little time there, he felt that it wouldn’t be feasible to visit them all, so he decided to host a meetup where they could all come together and present their work to whoever was interested.
Johan wanted a name for the meetup, something that would make a good Twitter hashtag: short, memorable, and without too many Google hits so that a search on the name would quickly find the meetup. He asked for suggestions on the #cassandra IRC channel and got a few, selecting the suggestion of “NoSQL” from Eric Evans (a developer at Rackspace, no connection to the DDD Eric Evans). While it had the disadvantage of being negative and not really describing these systems, it did fit the hashtag criteria. At the time they were thinking of only naming a single meeting and were not expecting it to catch on to name this entire technology trend [Oskarsson].
The term “NoSQL” caught on like wildfire, but it’s never been a term that’s had much in the way of a strong definition. The original call [NoSQL Meetup] for the meetup asked for “open-source, distributed, nonrelational databases.” The talks there [NoSQL Debrief] were from Voldemort, Cassandra, Dynomite, HBase, Hypertable, CouchDB, and MongoDB, but the term has never been confined to that original septet. There’s no generally accepted definition, nor an authority to provide one, so all we can do is discuss some common characteristics of the databases that tend to be called “NoSQL.”
To begin with, there is the obvious point that NoSQL databases don’t use SQL. Some of them do have query languages, and it makes sense for them to be similar to SQL in order to make them easier to learn. Cassandra’s CQL is like this: “exactly like SQL (except where it’s not)” [CQL]. But so far none have implemented anything that would fit even the rather flexible notion of standard SQL. It will be interesting to see what happens if an established NoSQL database decides to implement a reasonably standard SQL; the only predictable outcome for such an eventuality is plenty of argument.
Another important characteristic of these databases is that they are generally open-source projects. Although the term NoSQL is frequently applied to closed-source systems, there’s a notion that NoSQL is an open-source phenomenon.
Most NoSQL databases are driven by the need to run on clusters, and this is certainly true of those
that were talked about during the initial meetup. This has an effect on their data model as well as their approach to consistency. Relational databases use ACID transactions (p. 19) to handle consistency across the whole database. This inherently clashes with a cluster environment, so NoSQL databases offer a range of options for consistency and distribution.
However, not all NoSQL databases are strongly oriented towards running on clusters. Graph databases are one style of NoSQL databases that uses a distribution model similar to relational databases but offers a different data model that makes it better at handling data with complex relationships.
NoSQL databases are generally based on the needs of the early 21st century web estates, so usually only systems developed during that time frame are called NoSQL, thus ruling out hordes of databases created before the new millennium, let alone BC (Before Codd).
NoSQL databases operate without a schema, allowing you to freely add fields to database records without having to define any changes in structure first. This is particularly useful when dealing with nonuniform data and custom fields, which forced relational databases to use names like customField6 or custom field tables that are awkward to process and understand.
All of the above are common characteristics of things that we see described as NoSQL databases. None of these are definitional, and indeed it’s likely that there will never be a coherent definition of “NoSQL” (sigh). However, this crude set of characteristics has been our guide in writing this book. Our chief enthusiasm with this subject is that the rise of NoSQL has opened up the range of options for data storage. Consequently, this opening up shouldn’t be confined to what’s usually classed as a NoSQL store. We hope that other data storage options will become more acceptable, including many that predate the NoSQL movement. There is a limit, however, to what we can usefully discuss in this book, so we’ve decided to concentrate on this noDefinition.
When you first hear “NoSQL,” an immediate question is what does it stand for: a “no” to SQL?
Most people who talk about NoSQL say that it really means “Not Only SQL,” but this interpretation has a couple of problems. Most people write “NoSQL” whereas “Not Only SQL” would be written “NOSQL.” Also, there wouldn’t be much point in calling something a NoSQL database under the “not only” meaning, because then Oracle or Postgres would fit that definition, we would prove that black equals white and would all get run over on crosswalks.
To resolve this, we suggest that you don’t worry about what the term stands for, but rather about what it means (which is recommended with most acronyms). Thus, when “NoSQL” is applied to a database, it refers to an ill-defined set of mostly open-source databases, mostly developed in the early 21st century, and mostly not using SQL.
The “not-only” interpretation does have its value, as it describes the ecosystem that many people think is the future of databases. This is in fact what we consider to be the most important contribution of this way of thinking: it’s better to think of NoSQL as a movement rather than a technology. We don’t think that relational databases are going away; they are still going to be the most common form of database in use. Even though we’ve written this book, we still recommend relational databases. Their familiarity, stability, feature set, and available support are compelling arguments for most projects.
The change is that now we see relational databases as one option for data storage. This point of view is often referred to as polyglot persistence: using different data stores in different circumstances. Instead of just picking a relational database because everyone does, we need to understand the nature of the data we’re storing and how we want to manipulate it. The result is that most organizations will have a mix of data storage technologies for different circumstances.
In order to make this polyglot world work, our view is that organizations also need to shift from integration databases to application databases. Indeed, we assume in this book that you’ll be using a NoSQL database as an application database; we don’t generally consider NoSQL databases a good choice for integration databases. We don’t see this as a disadvantage as we think that even if you don’t use NoSQL, shifting to encapsulating data in services is a good direction to take.
In our account of the history of NoSQL development, we’ve concentrated on big data running on clusters. While we think this is the key thing that drove the opening up of the database world, it isn’t the only reason we see project teams considering NoSQL databases. An equally important reason is the old frustration with the impedance mismatch problem. The big data concerns have created an opportunity for people to think freshly about their data storage needs, and some development teams see that using a NoSQL database can help their productivity by simplifying their database access even if they have no need to scale beyond a single machine.
So, as you read the rest of this book, remember there are two primary reasons for considering NoSQL. One is to handle data access with sizes and performance that demand a cluster; the other is to improve the productivity of application development by using a more convenient data interaction style.
1.6. Key Points
• Relational databases have been a successful technology for twenty years, providing persistence, concurrency control, and an integration mechanism.
• Application developers have been frustrated with the impedance mismatch between the relational model and the in-memory data structures.
• There is a movement away from using databases as integration points towards encapsulating databases within applications and integrating through services.
• The vital factor for a change in data storage was the need to support large volumes of data by running on clusters. Relational databases are not designed to run efficiently on clusters.
• NoSQL is an accidental neologism. There is no prescriptive definition; all you can make is an observation of common characteristics.
• The common characteristics of NoSQL databases are
  • Not using the relational model
  • Running well on clusters
  • Open-source
  • Built for the 21st century web estates
  • Schemaless
• The most important result of the rise of NoSQL is Polyglot Persistence.
Chapter 2. Aggregate Data Models
A data model is the model through which we perceive and manipulate our data. For people using a database, the data model describes how we interact with the data in the database. This is distinct from a storage model, which describes how the database stores and manipulates the data internally. In an ideal world, we should be ignorant of the storage model, but in practice we need at least some inkling of it, primarily to achieve decent performance.
In conversation, the term “data model” often means the model of the specific data in an application. A developer might point to an entity-relationship diagram of their database and refer to that as their data model containing customers, orders, products, and the like. However, in this book we’ll mostly be using “data model” to refer to the model by which the database organizes data, what might be more formally called a metamodel.
The dominant data model of the last couple of decades is the relational data model, which is best visualized as a set of tables, rather like a page of a spreadsheet. Each table has rows, with each row representing some entity of interest. We describe this entity through columns, each having a single value. A column may refer to another row in the same or different table, which constitutes a relationship between those entities. (We’re using informal but common terminology when we speak of tables and rows; the more formal terms would be relations and tuples.)
One of the most obvious shifts with NoSQL is a move away from the relational model. Each NoSQL solution has a different model that it uses, which we put into four categories widely used in the NoSQL ecosystem: key-value, document, column-family, and graph. Of these, the first three share a common characteristic of their data models which we will call aggregate orientation. In this chapter we’ll explain what we mean by aggregate orientation and what it means for data models.
2.1. Aggregates
The relational model takes the information that we want to store and divides it into tuples (rows). A tuple is a limited data structure: It captures a set of values, so you cannot nest one tuple within another to get nested records, nor can you put a list of values or tuples within another. This simplicity underpins the relational model: it allows us to think of all operations as operating on and returning tuples.
Aggregate orientation takes a different approach. It recognizes that often, you want to operate on data in units that have a more complex structure than a set of tuples. It can be handy to think in terms of a complex record that allows lists and other record structures to be nested inside it. As we’ll see, key-value, document, and column-family databases all make use of this more complex record. However, there is no common term for this complex record; in this book we use the term “aggregate.”
Aggregate is a term that comes from Domain-Driven Design [Evans]. In Domain-Driven Design, an aggregate is a collection of related objects that we wish to treat as a unit. In particular, it is a unit for data manipulation and management of consistency. Typically, we like to update aggregates with atomic operations and communicate with our data storage in terms of aggregates. This definition matches really well with how key-value, document, and column-family databases work. Dealing in aggregates makes it much easier for these databases to handle operating on a cluster, since the aggregate makes a natural unit for replication and sharding. Aggregates are also often easier for application programmers to work with, since they often manipulate data through aggregate structures.
2.1.1. Example of Relations and Aggregates
At this point, an example may help explain what we’re talking about. Let’s assume we have to build an e-commerce website; we are going to be selling items directly to customers over the web, and we will have to store information about users, our product catalog, orders, shipping addresses, billing addresses, and payment data. We can use this scenario to model the data using a relational data store as well as NoSQL data stores and talk about their pros and cons. For a relational database, we might start with a data model shown in Figure 2.1.
Figure 2.1. Data model oriented around a relational database (using UML notation [Fowler UML])
Figure 2.2 presents some sample data for this model.
Figure 2.2. Typical data using RDBMS data model
As we’re good relational soldiers, everything is properly normalized, so that no data is repeated in multiple tables. We also have referential integrity. A realistic order system would naturally be more involved than this, but this is the benefit of the rarefied air of a book.
Now let’s see how this model might look when we think in more aggregate-oriented terms (Figure 2.3).
Figure 2.3. An aggregate data model
Again, we have some sample data, which we’ll show in JSON format as that’s a common representation for data in NoSQL land.
// in customers
{
  "id": 1,
  "name": "Martin",
  "billingAddress": [{"city": "Chicago"}]
}
// in orders
{
  "id": 99,
  "customerId": 1,
  "orderItems": [
    {
      "productId": 27,
      "price": 32.45,
      "productName": "NoSQL Distilled"
    }
  ],
  "shippingAddress": [{"city": "Chicago"}],
  "orderPayment": [
    {
      "ccinfo": "1000-1000-1000-1000",
      "txnId": "abelif879rft",
      "billingAddress": {"city": "Chicago"}
    }
  ]
}
In this model, we have two main aggregates: customer and order. We’ve used the black-diamond composition marker in UML to show how data fits into the aggregation structure. The customer contains a list of billing addresses; the order contains a list of order items, a shipping address, and payments. The payment itself contains a billing address for that payment.
A single logical address record appears three times in the example data, but instead of using IDs it’s treated as a value and copied each time. This fits the domain where we would not want the shipping address, nor the payment’s billing address, to change. In a relational database, we would ensure that the address rows aren’t updated for this case, making a new row instead. With aggregates, we can copy the whole address structure into the aggregate as we need to.
The link between the customer and the order isn’t within either aggregate; it’s a relationship between aggregates. Similarly, the link from an order item would cross into a separate aggregate structure for products, which we haven’t gone into. We’ve shown the product name as part of the order item here. This kind of denormalization is similar to the tradeoffs with relational databases, but is more common with aggregates because we want to minimize the number of aggregates we access during a data interaction.
The important thing to notice here isn’t the particular way we’ve drawn the aggregate boundary so much as the fact that you have to think about accessing that data, and make that part of your thinking when developing the application data model. Indeed we could draw our aggregate boundaries differently, putting all the orders for a customer into the customer aggregate (Figure 2.4).
Figure 2.4. Embed all the objects for customer and the customer’s orders
Using the above data model, an example Customer and Order would look like this:
// in customers
{
  "customer": {
    "id": 1,
    "name": "Martin",
    "billingAddress": [{"city": "Chicago"}],
    "orders": [
      {
        "id": 99,
        "customerId": 1,
        "orderItems": [
          {
            "productId": 27,
            "price": 32.45,
            "productName": "NoSQL Distilled"
          }
        ],
        "shippingAddress": [{"city": "Chicago"}],
        "orderPayment": [
          {
            "ccinfo": "1000-1000-1000-1000",
            "txnId": "abelif879rft",
            "billingAddress": {"city": "Chicago"}
          }
        ]
      }
    ]
  }
}
Like most things in modeling, there’s no universal answer for how to draw your aggregate boundaries. It depends entirely on how you tend to manipulate your data. If you tend to access a customer together with all of that customer’s orders at once, then you would prefer a single aggregate. However, if you tend to focus on accessing a single order at a time, then you should prefer having separate aggregates for each order. Naturally, this is very context-specific; some applications will prefer one or the other, even within a single system, which is exactly why many people prefer aggregate ignorance.
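To make the tradeoff concrete, here is a minimal sketch that models the two boundary choices as plain Python dictionaries keyed by aggregate ID. The dictionaries stand in for a database, and the function names are hypothetical; the point is only to show what has to be read to reach one order under each design.

```python
# Choice 1: customer and order are separate aggregates, linked by customerId.
customers = {1: {"id": 1, "name": "Martin"}}
orders = {
    99: {
        "id": 99,
        "customerId": 1,
        "orderItems": [{"productId": 27, "price": 32.45}],
    }
}

# Choice 2: every order is embedded inside its customer aggregate.
customers_with_orders = {
    1: {
        "id": 1,
        "name": "Martin",
        "orders": [{"id": 99, "orderItems": [{"productId": 27, "price": 32.45}]}],
    }
}


def get_order_separate(order_id):
    """One key lookup returns exactly the order aggregate."""
    return orders[order_id]


def get_order_embedded(customer_id, order_id):
    """The whole customer aggregate is read, then the order is picked out."""
    customer = customers_with_orders[customer_id]
    return next(o for o in customer["orders"] if o["id"] == order_id)


def order_total(order):
    return sum(item["price"] for item in order["orderItems"])
```

With separate aggregates a single order is cheap to fetch, while showing a customer with full order history takes several reads; with the embedded design the costs are reversed.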
2.1.2. Consequences of Aggregate Orientation
While the relational mapping captures the various data elements and their relationships reasonably well, it does so without any notion of an aggregate entity. In our domain language, we might say that an order consists of order items, a shipping address, and a payment. This can be expressed in the relational model in terms of foreign key relationships, but there is nothing to distinguish relationships that represent aggregations from those that don’t. As a result, the database can’t use a knowledge of aggregate structure to help it store and distribute the data.
Various data modeling techniques have provided ways of marking aggregate or composite structures. The problem, however, is that modelers rarely provide any semantics for what makes an aggregate relationship different from any other; where there are semantics, they vary. When working with aggregate-oriented databases, we have a clearer semantics to consider by focusing on the unit of interaction with the data storage. It is, however, not a logical data property: It’s all about how the data is being used by applications, a concern that is often outside the bounds of data modeling.
Relational databases have no concept of aggregate within their data model, so we call them aggregate-ignorant. In the NoSQL world, graph databases are also aggregate-ignorant. Being aggregate-ignorant is not a bad thing. It’s often difficult to draw aggregate boundaries well, particularly if the same data is used in many different contexts. An order makes a good aggregate when a customer is making and reviewing orders, and when the retailer is processing orders. However, if a retailer wants to analyze its product sales over the last few months, then an order aggregate becomes a hindrance. To get to product sales history, you’ll have to dig into every aggregate in the database. So an aggregate structure may help with some data interactions but be an obstacle for others. An aggregate-ignorant model allows you to easily look at the data in different ways, so it is a better choice when you don’t have a primary structure for manipulating your data.
The clinching reason for aggregate orientation is that it helps greatly with running on a cluster,
which as you’ll remember is the killer argument for the rise of NoSQL. If we’re running on a cluster, we need to minimize how many nodes we need to query when we are gathering data. By explicitly including aggregates, we give the database important information about which bits of data will be manipulated together, and thus should live on the same node.
Aggregates have an important consequence for transactions. Relational databases allow you to manipulate any combination of rows from any tables in a single transaction. Such transactions are called ACID transactions: Atomic, Consistent, Isolated, and Durable. ACID is a rather contrived acronym; the real point is the atomicity: Many rows spanning many tables are updated as a single operation. This operation either succeeds or fails in its entirety, and concurrent operations are isolated from each other so they cannot see a partial update.
It’s often said that NoSQL databases don’t support ACID transactions and thus sacrifice consistency. This is a rather sweeping simplification. In general, it’s true that aggregate-oriented databases don’t have ACID transactions that span multiple aggregates. Instead, they support atomic manipulation of a single aggregate at a time. This means that if we need to manipulate multiple aggregates in an atomic way, we have to manage that ourselves in the application code. In practice, we find that most of the time we are able to keep our atomicity needs to within a single aggregate; indeed, that’s part of the consideration for deciding how to divide up our data into aggregates. We should also remember that graph and other aggregate-ignorant databases usually do support ACID transactions similar to relational databases. Above all, the topic of consistency is much more involved than whether a database is ACID or not, as we’ll explore in Chapter 5.
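A toy illustration of single-aggregate atomicity may help. The `AggregateStore` class below is a hypothetical in-memory stand-in, not the API of any real database: an update works on a private copy of one aggregate and swaps it in only if every change succeeded, so readers never see a half-applied change to that aggregate.

```python
import copy


class AggregateStore:
    """Toy in-memory store: an update replaces one whole aggregate or nothing."""

    def __init__(self):
        self._data = {}

    def put(self, key, aggregate):
        self._data[key] = copy.deepcopy(aggregate)

    def get(self, key):
        return copy.deepcopy(self._data[key])

    def update(self, key, change):
        working = self.get(key)      # private copy of the aggregate
        change(working)              # may raise; the stored aggregate is untouched
        self._data[key] = working    # one swap makes all changes visible together


store = AggregateStore()
store.put("order:99", {"id": 99, "orderItems": [], "orderPayment": []})


def add_item_and_payment(order):
    order["orderItems"].append({"productId": 27, "price": 32.45})
    order["orderPayment"].append({"txnId": "abelif879rft"})


store.update("order:99", add_item_and_payment)


def failing_change(order):
    order["orderItems"].clear()
    raise ValueError("simulated failure partway through")


try:
    store.update("order:99", failing_change)
except ValueError:
    pass  # the stored order still has its item and payment
```

Nothing in this sketch coordinates two keys. If a change had to touch both an order and a customer aggregate, the application would need its own recovery logic for a failure between the two updates.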
2.2. Key-Value and Document Data Models
We said earlier on that key-value and document databases were strongly aggregate-oriented. What we meant by this was that we think of these databases as primarily constructed through aggregates. Both of these types of databases consist of lots of aggregates with each aggregate having a key or ID that’s used to get at the data.
The two models differ in that in a key-value database, the aggregate is opaque to the database: just some big blob of mostly meaningless bits. In contrast, a document database is able to see a structure in the aggregate. The advantage of opacity is that we can store whatever we like in the aggregate. The database may impose some general size limit, but other than that we have complete freedom. A document database imposes limits on what we can place in it, defining allowable structures and types. In return, however, we get more flexibility in access.
With a key-value store, we can only access an aggregate by lookup based on its key. With a document database, we can submit queries to the database based on the fields in the aggregate, we can retrieve part of the aggregate rather than the whole thing, and the database can create indexes based on the contents of the aggregate.
In practice, the line between key-value and document gets a bit blurry. People often put an ID field in a document database to do a key-value style lookup. Databases classified as key-value databases may allow you to use structures for data beyond just an opaque aggregate. For example, Riak allows you to add metadata to aggregates for indexing and interaggregate links, and Redis allows you to break down the aggregate into lists or sets. You can support querying by integrating search tools such as Solr. As an example, Riak includes a search facility that uses Solr-like searching on any aggregates that are stored as JSON or XML structures.
Despite this blurriness, the general distinction still holds. With key-value databases, we expect to mostly look up aggregates using a key. With document databases, we mostly expect to submit some form of query based on the internal structure of the document; this might be a key, but it’s more likely to be something else.
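The distinction can be sketched in a few lines of Python. This is an illustration under simple assumptions, not the API of any particular product: the key-value side holds opaque bytes that only the application can interpret, while the document side holds structures the store can inspect. The `find` helper and the second order (customer 2, Boston) are made up for the example.

```python
import json

# Key-value style: the store sees only bytes, so the key is the only way in.
kv_store = {
    "order:99": json.dumps(
        {"id": 99, "customerId": 1, "shippingAddress": [{"city": "Chicago"}]}
    ).encode("utf-8")
}


def kv_get(key):
    """Fetch the whole blob by key; decoding is the application's job."""
    return json.loads(kv_store[key])


# Document style: the store can see fields, so it can filter and project.
documents = [
    {"id": 99, "customerId": 1, "shippingAddress": [{"city": "Chicago"}]},
    {"id": 100, "customerId": 2, "shippingAddress": [{"city": "Boston"}]},
]


def find(collection, field, value, projection=None):
    """Return documents whose field equals value, optionally only some fields."""
    matches = [doc for doc in collection if doc.get(field) == value]
    if projection is None:
        return matches
    return [{name: doc[name] for name in projection if name in doc} for doc in matches]


whole_order = kv_get("order:99")
order_ids = find(documents, "customerId", 1, projection=["id"])
```

A real document database would typically answer such a query from an index rather than by scanning every document, but the access pattern is the same: query by content, retrieve a part.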
2.3. Column-Family Stores
One of the early and influential NoSQL databases was Google’s BigTable [Chang etc.]. Its name conjured up a tabular structure which it realized with sparse columns and no schema. As you’ll soon see, it doesn’t help to think of this structure as a table; rather, it is a two-level map. But, however you think about the structure, it has been a model that influenced later databases such as HBase and Cassandra.
These databases with a bigtable-style data model are often referred to as column stores, but that name has been around for a while to describe a different animal. Pre-NoSQL column stores, such as C-Store [C-Store], were happy with SQL and the relational model. The thing that made them different was the way in which they physically stored data. Most databases have a row as a unit of storage which, in particular, helps write performance. However, there are many scenarios where writes are rare, but you often need to read a few columns of many rows at once. In this situation, it’s better to store groups of columns for all rows as the basic storage unit, which is why these databases are called column stores.
Bigtable and its offspring follow this notion of storing groups of columns (column families) together, but part company with C-Store and friends by abandoning the relational model and SQL. In this book, we refer to this class of databases as column-family databases.
Perhaps the best way to think of the column-family model is as a two-level aggregate structure. As with key-value stores, the first key is often described as a row identifier, picking up the aggregate of interest. The difference with column-family structures is that this row aggregate is itself formed of a map of more detailed values. These second-level values are referred to as columns. As well as accessing the row as a whole, operations also allow picking out a particular column, so to get a particular customer’s name from Figure 2.5 you could do something like get('1234', 'name').
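The two-level map can be written down directly as nested dictionaries. The sketch below is only a mental model: the row key '1234' and the column 'name' come from the text, while the stored values are made up for illustration because Figure 2.5 is not reproduced here.

```python
# First level: row key -> row aggregate. Second level: column key -> value.
rows = {
    "1234": {
        "name": "Martin",              # illustrative value
        "billingAddress": "Chicago",   # illustrative value
    }
}


def get(row_key, column_key=None):
    """Return the whole row, or a single column when column_key is given."""
    row = rows[row_key]
    if column_key is None:
        return dict(row)
    return row[column_key]


customer_name = get("1234", "name")
whole_row = get("1234")
```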
Figure 2.5. Representing customer information in a column-family structure
Column-family databases organize their columns into column families. Each column has to be part of a single column family, and the column acts as a unit for access, with the assumption that data for a particular column family will be usually accessed together.
This also gives you a couple of ways to think about how the data is structured.
• Row-oriented: Each row is an aggregate (for example, customer with the ID of 1234) with column families representing useful chunks of data (profile, order history) within that aggregate.
• Column-oriented: Each column family defines a record type (e.g., customer profiles) with rows for each of the records. You then think of a row as the join of records in all column families.
This latter aspect reflects the columnar nature of column-family databases. Since the database knows about these common groupings of data, it can use this information for its storage and access behavior. Even though a document database declares some structure to the database, each document is still seen as a single unit. Column families give a two-dimensional quality to column-family databases.
This terminology is as established by Google Bigtable and HBase, but Cassandra looks at things
slightly differently. A row in Cassandra only occurs in one column family, but that column family may contain supercolumns: columns that contain nested columns. The supercolumns in Cassandra are the best equivalent to the classic Bigtable column families.
It can still be confusing to think of column-families as tables. You can add any column to any row, and rows can have very different column keys. While new columns are added to rows during regular database access, defining new column families is much rarer and may involve stopping the database for it to happen.
The example of Figure 2.5 illustrates another aspect of column-family databases that may be unfamiliar for people used to relational tables: the orders column family. Since columns can be added freely, you can model a list of items by making each item a separate column. This is very odd if you think of a column family as a table, but quite natural if you think of a column-family row as an aggregate. Cassandra uses the terms “wide” and “skinny.” Skinny rows have few columns with the same columns used across the many different rows. In this case, the column family defines a record type, each row is a record, and each column is a field. A wide row has many columns (perhaps thousands), with rows having very different columns. A wide column family models a list, with each column being one element in that list.
A consequence of wide column families is that a column family may define a sort order for its columns. This way we can access orders by their order key and access ranges of orders by their keys. While this might not be useful if we keyed orders by their IDs, it would be if we made the key out of a concatenation of date and ID (e.g., 20111027-1001).
Although it’s useful to distinguish column families by their wide or skinny nature, there’s no technical reason why a column family cannot contain both field-like columns and list-like columns, although doing this would confuse the sort ordering.
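The value of a sorted, date-prefixed column key can be seen in a short sketch. It assumes one wide row whose column keys follow the date-and-ID pattern from the text (20111027-1001); the other keys and all values are made up, and `column_range` is a hypothetical helper, not a call from any real client library.

```python
from bisect import bisect_left

# One customer's orders column family: column key -> order data (illustrative).
order_columns = {
    "20111215-1004": {"total": 12.00},
    "20111027-1001": {"total": 32.45},
    "20111101-1003": {"total": 8.50},
    "20111027-1002": {"total": 5.25},
}


def column_range(columns, start, end):
    """Return (key, value) pairs with start <= key < end, in key order."""
    keys = sorted(columns)           # a real store keeps columns sorted already
    low = bisect_left(keys, start)
    high = bisect_left(keys, end)
    return [(key, columns[key]) for key in keys[low:high]]


# All orders placed in October 2011, found with one contiguous slice.
october = column_range(order_columns, "20111001", "20111101")
```

Because the date leads the key, a date range becomes a contiguous run of columns. Keying by order ID alone would scatter the same orders across the sort order.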
2.4. Summarizing Aggregate-Oriented Databases
At this point, we’ve covered enough material to give you a reasonable overview of the three different styles of aggregate-oriented data models and how they differ.
What they all share is the notion of an aggregate indexed by a key that you can use for lookup. This aggregate is central to running on a cluster, as the database will ensure that all the data for an aggregate is stored together on one node. The aggregate also acts as the atomic unit for updates, providing a useful, if limited, amount of transactional control.
Within that notion of aggregate, we have some differences. The key-value data model treats the aggregate as an opaque whole, which means you can only do key lookup for the whole aggregate: you cannot run a query nor retrieve a part of the aggregate.
The document model makes the aggregate transparent to the database allowing you to do queries and partial retrievals. However, since the document has no schema, the database cannot act much on the structure of the document to optimize the storage and retrieval of parts of the aggregate.
Column-family models divide the aggregate into column families, allowing the database to treat them as units of data within the row aggregate. This imposes some structure on the aggregate but allows the database to take advantage of that structure to improve its accessibility.
2.5. Further Reading
For more on the general concept of aggregates, which are often used with relational databases too, see [Evans]. The Domain-Driven Design community is the best source for further information about aggregates; recent information usually appears at http://domaindrivendesign.org.
2.6. Key Points
• An aggregate is a collection of data that we interact with as a unit. Aggregates form the boundaries for ACID operations with the database.
• Key-value, document, and column-family databases can all be seen as forms of aggregate-oriented database.
• Aggregates make it easier for the database to manage data storage over clusters.
• Aggregate-oriented databases work best when most data interaction is done with the same aggregate; aggregate-ignorant databases are better when interactions use data organized in many different formations.
Chapter 3. More Details on Data Models
So far we’ve covered the key feature in most NoSQL databases: their use of aggregates and how aggregate-oriented databases model aggregates in different ways. While aggregates are a central part of the NoSQL story, there is more to the data modeling side than that, and we’ll explore these further concepts in this chapter.
3.1. Relationships
Aggregates are useful in that they put together data that is commonly accessed together. But there are still lots of cases where data that’s related is accessed differently. Consider the relationship between a customer and all of his orders. Some applications will want to access the order history whenever they access the customer; this fits in well with combining the customer with his order history into a single aggregate. Other applications, however, want to process orders individually and thus model orders as independent aggregates.
In this case, you’ll want separate order and customer aggregates but with some kind of relationship between them so that any work on an order can look up customer data. The simplest way to provide such a link is to embed the ID of the customer within the order’s aggregate data. That way, if you need data from the customer record, you read the order, ferret out the customer ID, and make another call to the database to read the customer data. This will work, and will be just fine in many scenarios, but the database will be ignorant of the relationship in the data. This can be important because there are times when it’s useful for the database to know about these links.
As a result, many databases, even key-value stores, provide ways to make these relationships visible to the database. Document stores make the content of the aggregate available to the database to form indexes and queries. Riak, a key-value store, allows you to put link information in metadata, supporting partial retrieval and link-walking capability.
An important aspect of relationships between aggregates is how they handle updates. Aggregate-oriented databases treat the aggregate as the unit of data-retrieval. Consequently, atomicity is only supported within the contents of a single aggregate. If you update multiple aggregates at once, you have to deal yourself with a failure partway through. Relational databases help you with this by allowing you to modify multiple records in a single transaction, providing ACID guarantees while altering many rows.
All of this means that aggregate-oriented databases become more awkward as you need to operate across multiple aggregates. There are various ways to deal with this, which we’ll explore later in this chapter, but the fundamental awkwardness remains.
This may imply that if you have data based on lots of relationships, you should prefer a relational database over a NoSQL store. While that’s true for aggregate-oriented databases, it’s worth remembering that relational databases aren’t all that stellar with complex relationships either. While you can express queries involving joins in SQL, things quickly get very hairy, both with SQL writing and with the resulting performance, as the number of joins mounts up.
This makes it a good moment to introduce another category of databases that’s often lumped into the NoSQL pile.
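The ID-based link between aggregates discussed in this section can be sketched as follows. The dictionaries stand in for two collections of aggregates and the function names are hypothetical. The first function shows the two reads needed to go from an order to its customer; the second shows the kind of secondary index a database could maintain once the customerId field is visible to it.

```python
from collections import defaultdict

customers = {1: {"id": 1, "name": "Martin"}}
orders = {
    99: {"id": 99, "customerId": 1},
    100: {"id": 100, "customerId": 1},
}


def customer_for_order(order_id):
    """Follow the embedded ID: one read for the order, a second for the customer."""
    order = orders[order_id]
    return customers[order["customerId"]]


def build_customer_index(order_aggregates):
    """Map each customer ID to the IDs of that customer's orders."""
    index = defaultdict(list)
    for order_id, order in order_aggregates.items():
        index[order["customerId"]].append(order_id)
    return index


orders_by_customer = build_customer_index(orders)
```

Without such an index, answering "which orders belong to customer 1" means reading every order aggregate, which is why it matters whether the database can see the link.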
3.2. Graph Databases
Graph databases are an odd fish in the NoSQL pond. Most NoSQL databases were inspired by the need to run on clusters, which led to aggregate-oriented data models of large records with simple connections. Graph databases are motivated by a different frustration with relational databases and thus have an opposite model: small records with complex interconnections, something like Figure 3.1.
Figure 3.1. An example graph structure
In this context, a graph isn’t a bar chart or histogram; instead, we refer to a graph data structure of nodes connected by edges.
In Figure 3.1 we have a web of information whose nodes are very small (nothing more than a name) but there is a rich structure of interconnections between them. With this structure, we can ask questions such as “find the books in the Databases category that are written by someone whom a friend of mine likes.”
Graph databases specialize in capturing this sort of information, but on a much larger scale than a readable diagram could capture. This is ideal for capturing any data consisting of complex relationships such as social networks, product preferences, or eligibility rules.
The fundamental data model of a graph database is very simple: nodes connected by edges (also called arcs). Beyond this essential characteristic there is a lot of variation in data models, in particular, what mechanisms you have to store data in your nodes and edges. A quick sample of some current capabilities illustrates this variety of possibilities: FlockDB is simply nodes and edges with no mechanism for additional attributes; Neo4J allows you to attach Java objects as properties to nodes and edges in a schemaless fashion (“Features,” p. 113); Infinite Graph stores your Java objects, which are subclasses of its built-in types, as nodes and edges.
Onceyouhavebuiltupagraphofnodesandedges,agraphdatabaseallowsyoutoquerythatnetworkwithqueryoperationsdesignedwiththiskindofgraphinmind.Thisiswheretheimportantdifferencesbetweengraphandrelationaldatabasescomein.Althoughrelationaldatabasescanimplementrelationshipsusingforeignkeys,thejoinsrequiredtonavigatearoundcangetquiteexpensive—whichmeansperformanceisoftenpoorforhighlyconnecteddatamodels.Graphdatabasesmaketraversalalongtherelationshipsverycheap.Alargepartofthisisbecausegraphdatabasesshiftmostoftheworkofnavigatingrelationshipsfromquerytimetoinserttime.Thisnaturallypaysoffforsituationswherequeryingperformanceismoreimportantthaninsertspeed.Mostofthetimeyoufinddatabynavigatingthroughthenetworkofedges,withqueriessuchas
“tellmeallthethingsthatbothAnnaandBarbaralike.”Youdoneedastartingplace,however,sousuallysomenodescanbeindexedbyanattributesuchasID.SoyoumightstartwithanIDlookup(i.e.,lookupthepeoplenamed“Anna”and“Barbara”)andthenstartusingtheedges.Still,graphdatabasesexpectmostofyourqueryworktobenavigatingrelationships.Theemphasisonrelationshipsmakesgraphdatabasesverydifferentfromaggregate-oriented
databases.Thisdatamodeldifferencehasconsequencesinotheraspects,too;you’llfindsuchdatabasesaremorelikelytorunonasingleserverratherthandistributedacrossclusters.ACIDtransactionsneedtocovermultiplenodesandedgestomaintainconsistency.Theonlythingtheyhaveincommonwithaggregate-orienteddatabasesistheirrejectionoftherelationalmodelandanupsurgeinattentiontheyreceivedaroundthesametimeastherestoftheNoSQLfield.
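To make the "look up a starting node, then follow edges" style of query concrete, here is a minimal in-memory sketch in Python. It is not the API of any graph database; the TinyGraph class, the "likes" edge type, and the sample titles are all assumptions made for illustration. The point it tries to show is that the adjacency sets are built when an edge is inserted, so the query itself is only lookups and a set intersection.

```python
from collections import defaultdict


class TinyGraph:
    """Hypothetical in-memory graph: nodes are names, edges are typed and directed."""

    def __init__(self):
        # (source node, edge type) -> set of target nodes
        self._out = defaultdict(set)

    def add_edge(self, source, edge_type, target):
        # The adjacency set is maintained at insert time,
        # so traversal later is a plain dictionary lookup.
        self._out[(source, edge_type)].add(target)

    def neighbors(self, source, edge_type):
        return set(self._out.get((source, edge_type), set()))


graph = TinyGraph()
graph.add_edge("Anna", "likes", "NoSQL Distilled")
graph.add_edge("Anna", "likes", "Refactoring")
graph.add_edge("Barbara", "likes", "Refactoring")
graph.add_edge("Barbara", "likes", "Database Refactoring")

# Start from two nodes found by name, then follow their "likes" edges.
liked_by_both = graph.neighbors("Anna", "likes") & graph.neighbors("Barbara", "likes")
print(sorted(liked_by_both))
```

A relational version of the same question would typically join a people table to a likes table twice; here the cost of the relationship was paid when each edge was added.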
3.3. Schemaless Databases

A common theme across all the forms of NoSQL databases is that they are schemaless. When you want to store data in a relational database, you first have to define a schema: a defined structure for the database which says what tables exist, which columns exist, and what data types each column can hold. Before you store some data, you have to have the schema defined for it.

With NoSQL databases, storing data is much more casual. A key-value store allows you to store any data you like under a key. A document database effectively does the same thing, since it makes no restrictions on the structure of the documents you store. Column-family databases allow you to store any data under any column you like. Graph databases allow you to freely add new edges and freely add properties to nodes and edges as you wish.

Advocates of schemalessness rejoice in this freedom and flexibility. With a schema, you have to figure out in advance what you need to store, but that can be hard to do. Without a schema binding you, you can easily store whatever you need. This allows you to easily change your data storage as you learn more about your project. You can easily add new things as you discover them. Furthermore, if you find you don’t need some things anymore, you can just stop storing them, without worrying about losing old data as you would if you delete columns in a relational schema.

As well as handling changes, a schemaless store also makes it easier to deal with nonuniform data: data where each record has a different set of fields. A schema puts all rows of a table into a straightjacket, which becomes awkward if you have different kinds of data in different rows. You either end up with lots of columns that are usually null (a sparse table), or you end up with meaningless columns like custom column 4. Schemalessness avoids this, allowing each record to contain just what it needs, no more, no less.

Schemalessness is appealing, and it certainly avoids many problems that exist with fixed-schema databases, but it brings some problems of its own. If all you are doing is storing some data and displaying it in a report as a simple list of fieldName: value lines then a schema is only going to get in the way. But usually we do with our data more than this, and we do it with programs that need to know that the billing address is called billingAddress and not addressForBilling and that the quantity field is going to be an integer 5 and not five.

The vital, if sometimes inconvenient, fact is that whenever we write a program that accesses data,
that program almost always relies on some form of implicit schema. Unless it just says something like

    // pseudocode
    foreach (Record r in records) {
      foreach (Field f in r.fields) {
        print (f.name, f.value)
      }
    }

it will assume that certain field names are present and carry data with a certain meaning, and assume something about the type of data stored within that field. Programs are not humans; they cannot read “qty” and infer that that must be the same as “quantity”, at least not unless we specifically program them to do so. So, however schemaless our database is, there is usually an implicit schema present. This implicit schema is a set of assumptions about the data’s structure in the code that manipulates the data.

Having the implicit schema in the application code results in some problems. It means that in order
to understand what data is present you have to dig into the application code. If that code is well structured you should be able to find a clear place from which to deduce the schema. But there are no guarantees; it all depends on how clear the application code is. Furthermore, the database remains ignorant of the schema: it can’t use the schema to help it decide how to store and retrieve data efficiently. It can’t apply its own validations upon that data to ensure that different applications don’t manipulate data in an inconsistent way.

These are the reasons why relational databases have a fixed schema, and indeed the reasons why most databases have had fixed schemas in the past. Schemas have value, and the rejection of schemas by NoSQL databases is indeed quite startling.

Essentially, a schemaless database shifts the schema into the application code that accesses it. This becomes problematic if multiple applications, developed by different people, access the same database. These problems can be reduced with a couple of approaches. One is to encapsulate all database interaction within a single application and integrate it with other applications using web services. This fits in well with many people’s current preference for using web services for integration. Another approach is to clearly delineate different areas of an aggregate for access by different applications. These could be different sections in a document database or different column families in a column-family database.

Although NoSQL fans often criticize relational schemas for having to be defined up front and being inflexible, that’s not really true. Relational schemas can be changed at any time with standard SQL commands. If necessary, you can create new columns in an ad-hoc way to store nonuniform data. We have only rarely seen this done, but it worked reasonably well where we have. Most of the time, however, nonuniformity in your data is a good reason to favor a schemaless database.

Schemalessness does have a big impact on changes of a database’s structure over time, particularly for more uniform data. Although it’s not practiced as widely as it ought to be, changing a relational database’s schema can be done in a controlled way. Similarly, you have to exercise control when changing how you store data in a schemaless database so that you can easily access both old and new data. Furthermore, the flexibility that schemalessness gives you only applies within an aggregate; if you need to change your aggregate boundaries, the migration is every bit as complex as it is in the relational case. We’ll talk more about database migration later (“Schema Migrations,” p. 123).
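The implicit schema is easiest to see in code that has to read both old and new data. The Python sketch below is an illustration under assumed circumstances: we pretend that older records used the field names "qty" and "addressForBilling" while newer ones use "quantity" and "billingAddress". The helper names read_quantity and read_billing_address are hypothetical. Every line of these helpers encodes an assumption about field names and types that the database itself knows nothing about.

```python
def read_quantity(record):
    """Return the quantity as an int, accepting both assumed record shapes."""
    # Assumed history: older records used "qty", newer ones use "quantity".
    for field_name in ("quantity", "qty"):
        if field_name in record:
            # int("5") works; int("five") raises ValueError, which surfaces bad data.
            return int(record[field_name])
    raise KeyError("record has no quantity field")


def read_billing_address(record):
    """Return the billing address under either of the two assumed field names."""
    if "billingAddress" in record:
        return record["billingAddress"]
    return record["addressForBilling"]


old_record = {"qty": "5", "addressForBilling": {"city": "Chicago"}}
new_record = {"quantity": 5, "billingAddress": {"city": "Chicago"}}

print(read_quantity(old_record), read_quantity(new_record))
```

If several applications share the store, each one needs its own copy of this knowledge, which is one reason to funnel access through a single application.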
3.4. Materialized Views

When we talked about aggregate-oriented data models, we stressed their advantages. If you want to access orders, it’s useful to have all the data for an order contained in a single aggregate that can be stored and accessed as a unit. But aggregate-orientation has a corresponding disadvantage: What happens if a product manager wants to know how much a particular item has sold over the last couple of weeks? Now the aggregate-orientation works against you, forcing you to potentially read every order in the database to answer the question. You can reduce this burden by building an index on the product, but you’re still working against the aggregate structure.

Relational databases have an advantage here because their lack of aggregate structure allows them to support accessing data in different ways. Furthermore, they provide a convenient mechanism that allows you to look at data differently from the way it’s stored: views. A view is like a relational table (it is a relation) but it’s defined by computation over the base tables. When you access a view, the database computes the data in the view, a handy form of encapsulation.

Views provide a mechanism to hide from the client whether data is derived data or base data, but they can’t avoid the fact that some views are expensive to compute. To cope with this, materialized views were invented, which are views that are computed in advance and cached on disk. Materialized views are effective for data that is read heavily but can stand being somewhat stale.

Although NoSQL databases don’t have views, they may have precomputed and cached queries, and they reuse the term “materialized view” to describe them. It’s also much more of a central aspect for aggregate-oriented databases than it is for relational systems, since most applications will have to deal with some queries that don’t fit well with the aggregate structure. (Often, NoSQL databases create materialized views using a map-reduce computation, which we’ll talk about in Chapter 7.)

There are two rough strategies to building a materialized view. The first is the eager approach where you update the materialized view at the same time you update the base data for it. In this case, adding an order would also update the purchase history aggregates for each product. This approach is good when you have more frequent reads of the materialized view than you have writes and you want the materialized views to be as fresh as possible. The application database (p. 7) approach is valuable here as it makes it easier to ensure that any updates to base data also update materialized views.

If you don’t want to pay that overhead on each update, you can run batch jobs to update the materialized views at regular intervals. You’ll need to understand your business requirements to assess how stale your materialized views can be.

You can build materialized views outside of the database by reading the data, computing the view, and saving it back to the database. More often databases will support building materialized views themselves. In this case, you provide the computation that needs to be done, and the database executes the computation when needed according to some parameters that you configure. This is particularly handy for eager updates of views with incremental map-reduce (“Incremental Map-Reduce,” p. 76).

Materialized views can be used within the same aggregate. An order document might include an order summary element that provides summary information about the order so that a query for an order summary does not have to transfer the entire order document. Using different column families for materialized views is a common feature of column-family databases. An advantage of doing this is that it allows you to update the materialized view within the same atomic operation.
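The two strategies can be contrasted in a few lines of Python. This is a sketch built outside any database: the orders dictionary stands in for the base data, units_sold_view stands in for the materialized view, and the quantity field on each order item is an assumption added for the example. The eager function updates the view as part of adding an order; the batch function recomputes it from scratch, as a periodic job might.

```python
from collections import Counter

orders = {}                   # base data: orderId -> order aggregate
units_sold_view = Counter()   # materialized view: productId -> units sold


def add_order_eager(order):
    """Eager strategy: update the view in the same step as the base data."""
    orders[order["orderId"]] = order
    for item in order["orderItems"]:
        units_sold_view[item["productId"]] += item["quantity"]


def rebuild_view_batch():
    """Batch strategy: recompute the whole view by reading every order."""
    fresh = Counter()
    for order in orders.values():
        for item in order["orderItems"]:
            fresh[item["productId"]] += item["quantity"]
    return fresh


add_order_eager({"orderId": 99,
                 "orderItems": [{"productId": 27, "quantity": 2}]})
add_order_eager({"orderId": 100,
                 "orderItems": [{"productId": 27, "quantity": 1},
                                {"productId": 29, "quantity": 4}]})

print(units_sold_view[27])
```

With the eager version a read of the view is always fresh but every write does extra work; with the batch version writes are cheaper and the view is only as fresh as the last rebuild.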
3.5. Modeling for Data Access

As mentioned earlier, when modeling data aggregates we need to consider how the data is going to be read as well as the side effects on data related to those aggregates.

Let’s start with the model where all the data for the customer is embedded using a key-value store (see Figure 3.2).

Figure 3.2. Embed all the objects for customer and their orders.

In this scenario, the application can read the customer’s information and all the related data by using the key. If the requirements are to read the orders or the products sold in each order, the whole object has to be read and then parsed on the client side to build the results. When references are needed, we could switch to document stores and then query inside the documents, or even change the data for the key-value store to split the value object into Customer and Order objects and then maintain these objects’ references to each other.

With the references (see Figure 3.3), we can now find the orders independently from the Customer, and with the orderId reference in the Customer we can find all Orders for the Customer. Using aggregates this way allows for read optimization, but we have to push the orderId reference into Customer every time a new Order is placed.

    # Customer object
    {
      "customerId": 1,
      "customer": {
        "name": "Martin",
        "billingAddress": [{"city": "Chicago"}],
        "payment": [{"type": "debit", "ccinfo": "1000-1000-1000-1000"}],
        "orders": [{"orderId": 99}]
      }
    }

    # Order object
    {
      "customerId": 1,
      "orderId": 99,
      "order": {
        "orderDate": "Nov-20-2011",
        "orderItems": [{"productId": 27, "price": 32.45}],
        "orderPayment": [{"ccinfo": "1000-1000-1000-1000", "txnId": "abelif879rft"}],
        "shippingAddress": {"city": "Chicago"}
      }
    }
Figure 3.3. Customer is stored separately from Order.

Aggregates can also be used to obtain analytics; for example, an aggregate update may fill in information on which Orders have a given Product in them. This denormalization of the data allows for fast access to the data we are interested in and is the basis for Real Time BI or Real Time Analytics where enterprises don’t have to rely on end-of-the-day batch runs to populate data warehouse tables and generate analytics; now they can fill in this type of data, for multiple types of requirements, when the order is placed by the customer.

    {"itemid": 27, "orders": [99, 545, 897, 678]}
    {"itemid": 29, "orders": [199, 545, 704, 819]}
In document stores, since we can query inside documents, removing references to Orders from the Customer object is possible. This change means we do not have to update the Customer object when new orders are placed by the Customer.

    # Customer object
    {
      "customerId": 1,
      "name": "Martin",
      "billingAddress": [{"city": "Chicago"}],
      "payment": [{"type": "debit", "ccinfo": "1000-1000-1000-1000"}]
    }

    # Order object
    {
      "orderId": 99,
      "customerId": 1,
      "orderDate": "Nov-20-2011",
      "orderItems": [{"productId": 27, "price": 32.45}],
      "orderPayment": [{"ccinfo": "1000-1000-1000-1000", "txnId": "abelif879rft"}],
      "shippingAddress": {"city": "Chicago"}
    }
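Querying inside documents can be imitated with plain Python lists and dictionaries. The sketch below is not the query language of any document database; the helper names orders_for_customer and orders_containing and the second sample order are assumptions for illustration. It shows why the Customer object no longer needs a list of order references: the orders can be found from attributes stored in the Order documents themselves.

```python
orders = [
    {"orderId": 99, "customerId": 1,
     "orderItems": [{"productId": 27, "price": 32.45}]},
    {"orderId": 100, "customerId": 2,
     "orderItems": [{"productId": 29, "price": 12.00}]},
]


def orders_for_customer(customer_id):
    # Query by a top-level attribute; the Customer document holds no order list.
    return [o for o in orders if o["customerId"] == customer_id]


def orders_containing(product_id):
    # Query by an attribute nested inside the orderItems array.
    return [o for o in orders
            if any(item["productId"] == product_id for item in o["orderItems"])]


print([o["orderId"] for o in orders_for_customer(1)])
```

A real document store would normally answer such queries from an index rather than by scanning every document, but the modeling consequence is the same.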
Since document data stores allow you to query by attributes inside the document, searches such as “find all orders that include the Refactoring Databases product” are possible, but the decision to create an aggregate of items and orders they belong to is not based on the database’s query capability but on the read optimization desired by the application.

When modeling for column-family stores, we have the benefit of the columns being ordered, allowing us to name columns that are frequently used so that they are fetched first. When using the column families to model the data, it is important to remember to do it per your query requirements and not for the purpose of writing; the general rule is to make it easy to query and denormalize the data during write.

As you can imagine, there are multiple ways to model the data; one way is to store the Customer and Order in different column families (see Figure 3.4). Here, it is important to note that the references to all the orders placed by the customer are in the Customer column family. Other similar denormalizations are generally done so that query (read) performance is improved.
Figure 3.4. Conceptual view into a column data store

When using graph databases to model the same data, we model all objects as nodes and relations within them as relationships; these relationships have types and directional significance.

Each node has independent relationships with other nodes. These relationships have names like PURCHASED, PAID_WITH, or BELONGS_TO (see Figure 3.5); these relationship names let you traverse the graph. Let’s say you want to find all the Customers who PURCHASED a product with the name Refactoring Databases. All we need to do is query for the product node Refactoring Databases and look for all the Customers with the incoming PURCHASED relationship.

Figure 3.5. Graph model of e-commerce data

This type of relationship traversal is very easy with graph databases. It is especially convenient when you need to use the data to recommend products to users or to find patterns in actions taken by users.
3.6. Key Points

• Aggregate-oriented databases make inter-aggregate relationships more difficult to handle than intra-aggregate relationships.
• Graph databases organize data into node and edge graphs; they work best for data that has complex relationship structures.
• Schemaless databases allow you to freely add fields to records, but there is usually an implicit schema expected by users of the data.
• Aggregate-oriented databases often compute materialized views to provide data organized differently from their primary aggregates. This is often done with map-reduce computations.
Chapter 4. Distribution Models

The primary driver of interest in NoSQL has been its ability to run databases on a large cluster. As data volumes increase, it becomes more difficult and expensive to scale up, that is, to buy a bigger server to run the database on. A more appealing option is to scale out: run the database on a cluster of servers. Aggregate orientation fits well with scaling out because the aggregate is a natural unit to use for distribution.

Depending on your distribution model, you can get a data store that will give you the ability to handle larger quantities of data, the ability to process a greater read or write traffic, or more availability in the face of network slowdowns or breakages. These are often important benefits, but they come at a cost. Running over a cluster introduces complexity, so it’s not something to do unless the benefits are compelling.

Broadly, there are two paths to data distribution: replication and sharding. Replication takes the same data and copies it over multiple nodes. Sharding puts different data on different nodes. Replication and sharding are orthogonal techniques: You can use either or both of them. Replication comes in two forms: master-slave and peer-to-peer. We will now discuss these techniques starting at the simplest and working up to the more complex: first single-server, then master-slave replication, then sharding, and finally peer-to-peer replication.
4.1. Single Server

The first and the simplest distribution option is the one we would most often recommend: no distribution at all. Run the database on a single machine that handles all the reads and writes to the data store. We prefer this option because it eliminates all the complexities that the other options introduce; it’s easy for operations people to manage and easy for application developers to reason about.

Although a lot of NoSQL databases are designed around the idea of running on a cluster, it can make sense to use NoSQL with a single-server distribution model if the data model of the NoSQL store is more suited to the application. Graph databases are the obvious category here; these work best in a single-server configuration. If your data usage is mostly about processing aggregates, then a single-server document or key-value store may well be worthwhile because it’s easier on application developers.

For the rest of this chapter we’ll be wading through the advantages and complications of more sophisticated distribution schemes. Don’t let the volume of words fool you into thinking that we would prefer these options. If we can get away without distributing our data, we will always choose a single-server approach.
4.2. Sharding

Often, a busy data store is busy because different people are accessing different parts of the dataset. In these circumstances we can support horizontal scalability by putting different parts of the data onto different servers, a technique that’s called sharding (see Figure 4.1).

Figure 4.1. Sharding puts different data on separate nodes, each of which does its own reads and writes.

In the ideal case, we have different users all talking to different server nodes. Each user only has to talk to one server, so gets rapid responses from that server. The load is balanced out nicely between servers; for example, if we have ten servers, each one only has to handle 10% of the load.

Of course the ideal case is a pretty rare beast. In order to get close to it we have to ensure that data that’s accessed together is clumped together on the same node and that these clumps are arranged on the nodes to provide the best data access.

The first part of this question is how to clump the data up so that one user mostly gets her data from a single server. This is where aggregate orientation comes in really handy. The whole point of aggregates is that we design them to combine data that’s commonly accessed together, so aggregates leap out as an obvious unit of distribution.

When it comes to arranging the data on the nodes, there are several factors that can help improve performance. If you know that most accesses of certain aggregates are based on a physical location, you can place the data close to where it’s being accessed. If you have orders for someone who lives in Boston, you can place that data in your eastern US data center.

Another factor is trying to keep the load even. This means that you should try to arrange
aggregates so they are evenly distributed across the nodes which all get equal amounts of the load. This may vary over time, for example if some data tends to be accessed on certain days of the week, so there may be domain-specific rules you’d like to use.

In some cases, it’s useful to put aggregates together if you think they may be read in sequence. The Bigtable paper [Chang etc.] described keeping its rows in lexicographic order and sorting web addresses based on reversed domain names (e.g., com.martinfowler). This way data for multiple pages could be accessed together to improve processing efficiency.

Historically most people have done sharding as part of application logic. You might put all customers with surnames starting from A to D on one shard and E to G on another. This complicates the programming model, as application code needs to ensure that queries are distributed across the various shards. Furthermore, rebalancing the sharding means changing the application code and migrating the data. Many NoSQL databases offer auto-sharding, where the database takes on the responsibility of allocating data to shards and ensuring that data access goes to the right shard. This can make it much easier to use sharding in an application.

Sharding is particularly valuable for performance because it can improve both read and write performance. Using replication, particularly with caching, can greatly improve read performance but does little for applications that have a lot of writes. Sharding provides a way to horizontally scale writes.

Sharding does little to improve resilience when used alone. Although the data is on different nodes, a node failure makes that shard’s data unavailable just as surely as it does for a single-server solution. The resilience benefit it does provide is that only the users of the data on that shard will suffer; however, it’s not good to have a database with part of its data missing. With a single server it’s easier to pay the effort and cost to keep that server up and running; clusters usually try to use less reliable machines, and you’re more likely to get a node failure. So in practice, sharding alone is likely to decrease resilience.

Despite the fact that sharding is made much easier with aggregates, it’s still not a step to be taken lightly. Some databases are intended from the beginning to use sharding, in which case it’s wise to run them on a cluster from the very beginning of development, and certainly in production. Other databases use sharding as a deliberate step up from a single-server configuration, in which case it’s best to start single-server and only use sharding once your load projections clearly indicate that you are running out of headroom.

In any case the step from a single node to sharding is going to be tricky. We have heard tales of teams getting into trouble because they left sharding until very late, so when they turned it on in production their database became essentially unavailable because the sharding support consumed all the database resources for moving the data onto new shards. The lesson here is to use sharding well before you need to, when you still have enough headroom to carry out the sharding.
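Sharding done in application logic comes down to a routing function from an aggregate's key to a node. The Python sketch below shows two such functions under stated assumptions: the shard names are hypothetical, the surname ranges follow the A to D and E to G example above with everything else sent to a third shard, and the hash variant is one common alternative rather than what any particular database does.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]   # hypothetical node names


def shard_by_surname(surname):
    """Range sharding in application code, following the surname example."""
    first = surname[0].upper()
    if "A" <= first <= "D":
        return "shard-0"
    if "E" <= first <= "G":
        return "shard-1"
    return "shard-2"          # assumed catch-all for the remaining letters


def shard_by_hash(aggregate_key):
    """Hash sharding: spreads keys evenly but ignores locality and read order."""
    digest = hashlib.sha256(aggregate_key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


print(shard_by_surname("Fowler"), shard_by_hash("customer:1"))
```

Either function has to be called by every piece of code that touches the data, and changing the ranges or the number of shards means migrating data. That is the burden auto-sharding takes over.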
4.3. Master-Slave Replication

With master-slave distribution, you replicate data across multiple nodes. One node is designated as the master, or primary. This master is the authoritative source for the data and is usually responsible for processing any updates to that data. The other nodes are slaves, or secondaries. A replication process synchronizes the slaves with the master (see Figure 4.2).

Figure 4.2. Data is replicated from master to slaves. The master services all writes; reads may come from either master or slaves.

Master-slave replication is most helpful for scaling when you have a read-intensive dataset. You can scale horizontally to handle more read requests by adding more slave nodes and ensuring that all read requests are routed to the slaves. You are still, however, limited by the ability of the master to process updates and its ability to pass those updates on. Consequently it isn’t such a good scheme for datasets with heavy write traffic, although offloading the read traffic will help a bit with handling the write load.

A second advantage of master-slave replication is read resilience: Should the master fail, the slaves can still handle read requests. Again, this is useful if most of your data access is reads. The failure of the master does eliminate the ability to handle writes until either the master is restored or a new master is appointed. However, having slaves as replicas of the master does speed up recovery after a failure of the master since a slave can be appointed a new master very quickly.

The ability to appoint a slave to replace a failed master means that master-slave replication is
useful even if you don’t need to scale out. All read and write traffic can go to the master while the slave acts as a hot backup. In this case it’s easiest to think of the system as a single-server store with a hot backup. You get the convenience of the single-server configuration but with greater resilience, which is particularly handy if you want to be able to handle server failures gracefully.

Masters can be appointed manually or automatically. Manual appointing typically means that when you configure your cluster, you configure one node as the master. With automatic appointment, you create a cluster of nodes and they elect one of themselves to be the master. Apart from simpler configuration, automatic appointment means that the cluster can automatically appoint a new master when a master fails, reducing downtime.

In order to get read resilience, you need to ensure that the read and write paths into your application are different, so that you can handle a failure in the write path and still read. This includes such things as putting the reads and writes through separate database connections, a facility that is not often supported by database interaction libraries. As with any feature, you cannot be sure you have read resilience without good tests that disable the writes and check that reads still occur.

Replication comes with some alluring benefits, but it also comes with an inevitable dark side: inconsistency. You have the danger that different clients, reading different slaves, will see different values because the changes haven’t all propagated to the slaves. In the worst case, that can mean that a client cannot read a write it just made. Even if you use master-slave replication just for hot backup this can be a concern, because if the master fails, any updates not passed on to the backup are lost. We’ll talk about how to deal with these issues later (“Consistency,” p. 47).
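The worst case mentioned above, a client that cannot read its own write, is easy to reproduce in a toy model. The Python sketch below assumes a store in which writes go only to the master, reads go only to a slave, and replication happens when replicate() is called; the class names and the pending list are inventions for illustration, not how any product is structured.

```python
class Node:
    def __init__(self):
        self.data = {}


class MasterSlaveStore:
    """Toy model: writes go to the master, reads are served by a slave."""

    def __init__(self, slave_count=2):
        self.master = Node()
        self.slaves = [Node() for _ in range(slave_count)]
        self.pending = []     # updates not yet passed on to the slaves

    def write(self, key, value):
        self.master.data[key] = value
        self.pending.append((key, value))

    def read(self, key, slave_index=0):
        return self.slaves[slave_index].data.get(key)

    def replicate(self):
        # Stands in for the replication process that synchronizes the slaves.
        for key, value in self.pending:
            for slave in self.slaves:
                slave.data[key] = value
        self.pending.clear()


store = MasterSlaveStore()
store.write("phone", "555-0100")
stale = store.read("phone")   # the slave has not seen the write yet
store.replicate()
fresh = store.read("phone")
print(stale, fresh)
```

Anything still sitting in pending when the master fails is also exactly the data a hot backup would lose.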
4.4. Peer-to-Peer Replication

Master-slave replication helps with read scalability but doesn’t help with scalability of writes. It provides resilience against failure of a slave, but not of a master. Essentially, the master is still a bottleneck and a single point of failure. Peer-to-peer replication (see Figure 4.3) attacks these problems by not having a master. All the replicas have equal weight, they can all accept writes, and the loss of any of them doesn’t prevent access to the data store.

Figure 4.3. Peer-to-peer replication has all nodes applying reads and writes to all the data.

The prospect here looks mighty fine. With a peer-to-peer replication cluster, you can ride over node failures without losing access to data. Furthermore, you can easily add nodes to improve your performance. There’s much to like here, but there are complications.

The biggest complication is, again, consistency. When you can write to two different places, you run the risk that two people will attempt to update the same record at the same time: a write-write conflict. Inconsistencies on read lead to problems but at least they are relatively transient. Inconsistent writes are forever.

We’ll talk more about how to deal with write inconsistencies later on, but for the moment we’ll note a couple of broad options. At one end, we can ensure that whenever we write data, the replicas coordinate to ensure we avoid a conflict. This can give us just as strong a guarantee as a master, albeit at the cost of network traffic to coordinate the writes. We don’t need all the replicas to agree on the write, just a majority, so we can still survive losing a minority of the replica nodes.

At the other extreme, we can decide to cope with an inconsistent write. There are contexts when we can come up with a policy to merge inconsistent writes. In this case we can get the full performance benefit of writing to any replica.

These points are at the ends of a spectrum where we trade off consistency for availability.
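The "just a majority" rule can be written down directly. The Python sketch below is a deliberately simplified illustration: replicas are plain dictionaries, an unreachable replica is represented by None, and the function name quorum_write is hypothetical. Real systems also have to deal with concurrent writers and partial failures during the write, which this sketch ignores.

```python
def quorum_write(replicas, key, value):
    """Apply a write only if a majority of all replicas can acknowledge it.

    replicas is a list whose entries are dicts (reachable replicas)
    or None (replicas that are currently unreachable).
    """
    reachable = [r for r in replicas if r is not None]
    majority = len(replicas) // 2 + 1
    if len(reachable) < majority:
        return False          # not enough replicas to coordinate the write
    for replica in reachable:
        replica[key] = value
    return True


three_nodes = [{}, {}, None]  # one of three replicas is down
one_node = [{}, None, None]   # two of three replicas are down

print(quorum_write(three_nodes, "room", "booked"),
      quorum_write(one_node, "room", "booked"))
```

With three replicas the majority is two, so losing one node still allows writes, while losing two does not.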
4.5. Combining Sharding and Replication

Replication and sharding are strategies that can be combined. If we use both master-slave replication and sharding (see Figure 4.4), this means that we have multiple masters, but each data item only has a single master. Depending on your configuration, you may choose a node to be a master for some data and a slave for others, or you may dedicate nodes for master or slave duties.

Figure 4.4. Using master-slave replication together with sharding

Using peer-to-peer replication and sharding is a common strategy for column-family databases. In a scenario like this you might have tens or hundreds of nodes in a cluster with data sharded over them. A good starting point for peer-to-peer replication is to have a replication factor of 3, so each shard is present on three nodes. Should a node fail, then the shards on that node will be built on the other nodes (see Figure 4.5).

Figure 4.5. Using peer-to-peer replication together with sharding
4.6. Key Points

• There are two styles of distributing data:
  • Sharding distributes different data across multiple servers, so each server acts as the single source for a subset of data.
  • Replication copies data across multiple servers, so each bit of data can be found in multiple places.
  A system may use either or both techniques.
• Replication comes in two forms:
  • Master-slave replication makes one node the authoritative copy that handles writes while slaves synchronize with the master and may handle reads.
  • Peer-to-peer replication allows writes to any node; the nodes coordinate to synchronize their copies of the data.
  Master-slave replication reduces the chance of update conflicts but peer-to-peer replication avoids loading all writes onto a single point of failure.
Chapter 5. Consistency

One of the biggest changes from a centralized relational database to a cluster-oriented NoSQL database is in how you think about consistency. Relational databases try to exhibit strong consistency by avoiding all the various inconsistencies that we’ll shortly be discussing. Once you start looking at the NoSQL world, phrases such as “CAP theorem” and “eventual consistency” appear, and as soon as you start building something you have to think about what sort of consistency you need for your system.

Consistency comes in various forms, and that one word covers a myriad of ways errors can creep into your life. So we’re going to begin by talking about the various shapes consistency can take. After that we’ll discuss why you may want to relax consistency (and its big sister, durability).
5.1. Update Consistency

We’ll begin by considering updating a telephone number. Coincidentally, Martin and Pramod are looking at the company website and notice that the phone number is out of date. Implausibly, they both have update access, so they both go in at the same time to update the number. To make the example interesting, we’ll assume they update it slightly differently, because each uses a slightly different format. This issue is called a write-write conflict: two people updating the same data item at the same time.

When the writes reach the server, the server will serialize them: decide to apply one, then the other. Let’s assume it uses alphabetical order and picks Martin’s update first, then Pramod’s. Without any concurrency control, Martin’s update would be applied and immediately overwritten by Pramod’s. In this case Martin’s is a lost update. Here the lost update is not a big problem, but often it is. We see this as a failure of consistency because Pramod’s update was based on the state before Martin’s update, yet was applied after it.

Approaches for maintaining consistency in the face of concurrency are often described as pessimistic or optimistic. A pessimistic approach works by preventing conflicts from occurring; an optimistic approach lets conflicts occur, but detects them and takes action to sort them out. For update conflicts, the most common pessimistic approach is to have write locks, so that in order to change a value you need to acquire a lock, and the system ensures that only one client can get a lock at a time. So Martin and Pramod would both attempt to acquire the write lock, but only Martin (the first one) would succeed. Pramod would then see the result of Martin’s write before deciding whether to make his own update.

A common optimistic approach is a conditional update where any client that does an update tests the value just before updating it to see if it’s changed since his last read. In this case, Martin’s update would succeed but Pramod’s would fail. The error would let Pramod know that he should look at the value again and decide whether to attempt a further update.

Both the pessimistic and optimistic approaches that we’ve just described rely on a consistent
serialization of the updates. With a single server, this is obvious; it has to choose one, then the other. But if there’s more than one server, such as with peer-to-peer replication, then two nodes might apply the updates in a different order, resulting in a different value for the telephone number on each peer. Often, when people talk about concurrency in distributed systems, they talk about sequential consistency, ensuring that all nodes apply operations in the same order.

There is another optimistic way to handle a write-write conflict: save both updates and record that they are in conflict. This approach is familiar to many programmers from version control systems, particularly distributed version control systems that by their nature will often have conflicting commits. The next step again follows from version control: You have to merge the two updates somehow. Maybe you show both values to the user and ask them to sort it out; this is what happens if you update the same contact on your phone and your computer. Alternatively, the computer may be able to perform the merge itself; if it was a phone formatting issue, it may be able to realize that and apply the new number with the standard format. Any automated merge of write-write conflicts is highly domain-specific and needs to be programmed for each particular case.

Often, when people first encounter these issues, their reaction is to prefer pessimistic concurrency because they are determined to avoid conflicts. While in some cases this is the right answer, there is always a tradeoff. Concurrent programming involves a fundamental tradeoff between safety (avoiding errors such as update conflicts) and liveness (responding quickly to clients). Pessimistic approaches often severely degrade the responsiveness of a system to the degree that it becomes unfit for its purpose. This problem is made worse by the danger of errors: pessimistic concurrency often leads to deadlocks, which are hard to prevent and debug.

Replication makes it much more likely to run into write-write conflicts. If different nodes have different copies of some data which can be independently updated, then you’ll get conflicts unless you take specific measures to avoid them. Using a single node as the target for all writes for some data makes it much easier to maintain update consistency. Of the distribution models we discussed earlier, all but peer-to-peer replication do this.
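A conditional update is often implemented by attaching a version to each value and refusing a write whose expected version is out of date. The Python sketch below illustrates that idea on a single server. The VersionedStore class, the VersionConflict exception, and the sample phone numbers are all assumptions made for the example, and version counters are only one way of testing whether a value has changed.

```python
class VersionConflict(Exception):
    pass


class VersionedStore:
    """Toy single-server store that supports conditional updates."""

    def __init__(self):
        self._values = {}     # key -> (version, value)

    def read(self, key):
        return self._values.get(key, (0, None))

    def conditional_update(self, key, expected_version, new_value):
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise VersionConflict(f"{key} changed since version {expected_version}")
        self._values[key] = (current_version + 1, new_value)


store = VersionedStore()
store.conditional_update("phone", 0, "555 0100")
martin_version, _ = store.read("phone")
pramod_version, _ = store.read("phone")      # both read the same state

store.conditional_update("phone", martin_version, "(555) 0101")   # succeeds
try:
    store.conditional_update("phone", pramod_version, "555-0101")  # stale version
    pramod_failed = False
except VersionConflict:
    pramod_failed = True

print(pramod_failed, store.read("phone"))
```

Pramod's failed update is not lost silently; the error tells him to reread the value and decide whether his change is still needed.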
5.2.ReadConsistencyHavingadatastorethatmaintainsupdateconsistencyisonething,butitdoesn’tguaranteethatreadersofthatdatastorewillalwaysgetconsistentresponsestotheirrequests.Let’simaginewehaveanorderwithlineitemsandashippingcharge.Theshippingchargeiscalculatedbasedonthelineitemsintheorder.Ifweaddalineitem,wethusalsoneedtorecalculateandupdatetheshippingcharge.Inarelationaldatabase,theshippingchargeandlineitemswillbeinseparatetables.ThedangerofinconsistencyisthatMartinaddsalineitemtohisorder,Pramodthenreadsthelineitemsandshippingcharge,andthenMartinupdatestheshippingcharge.Thisisaninconsistentreadorread-writeconflict:InFigure5.1PramodhasdoneareadinthemiddleofMartin’swrite.
Figure5.1.Aread-writeconflictinlogicalconsistencyWerefertothistypeofconsistencyaslogicalconsistency:ensuringthatdifferentdataitemsmake
sensetogether.Toavoidalogicallyinconsistentread-writeconflict,relationaldatabasessupportthenotionoftransactions.ProvidingMartinwrapshistwowritesinatransaction,thesystemguaranteesthatPramodwilleitherreadbothdataitemsbeforetheupdateorbothaftertheupdate.AcommonclaimwehearisthatNoSQLdatabasesdon’tsupporttransactionsandthuscan’tbe
consistent. Such a claim is mostly wrong because it glosses over lots of important details. Our first clarification is that any statement about lack of transactions usually only applies to some NoSQL databases, in particular the aggregate-oriented ones. In contrast, graph databases tend to support ACID transactions just the same as relational databases.

Secondly, aggregate-oriented databases do support atomic updates, but only within a single
aggregate. This means that you will have logical consistency within an aggregate but not between aggregates. So in the example, you could avoid running into that inconsistency if the order, the delivery charge, and the line items are all part of a single order aggregate.

Of course not all data can be put in the same aggregate, so any update that affects multiple
aggregates leaves open a time when clients could perform an inconsistent read. The length of time an inconsistency is present is called the inconsistency window. A NoSQL system may have a quite short inconsistency window: As one data point, Amazon’s documentation says that the inconsistency window for its SimpleDB service is usually less than a second.

This example of a logically inconsistent read is the classic example that you’ll see in any book that
touches database programming. Once you introduce replication, however, you get a whole new kind of inconsistency. Let’s imagine there’s one last hotel room for a desirable event. The hotel reservation system runs on many nodes. Martin and Cindy are a couple considering this room, but they are discussing this on the phone because Martin is in London and Cindy is in Boston. Meanwhile Pramod, who is in Mumbai, goes and books that last room. That updates the replicated room availability, but the update gets to Boston quicker than it gets to London. When Martin and Cindy fire up their browsers to see if the room is available, Cindy sees it booked and Martin sees it free. This is another inconsistent read, but it’s a breach of a different form of consistency we call replication consistency: ensuring that the same data item has the same value when read from different replicas (see Figure 5.2).
Figure 5.2. An example of replication inconsistency

Eventually, of course, the updates will propagate fully, and Martin will see the room is fully
booked. Therefore this situation is generally referred to as eventually consistent, meaning that at any time nodes may have replication inconsistencies but, if there are no further updates, eventually all nodes will be updated to the same value. Data that is out of date is generally referred to as stale, which reminds us that a cache is another form of replication, essentially following the master-slave distribution model.

Although replication consistency is independent from logical consistency, replication can
exacerbate a logical inconsistency by lengthening its inconsistency window. Two different updates on the master may be performed in rapid succession, leaving an inconsistency window of milliseconds. But delays in networking could mean that the same inconsistency window lasts for much longer on a slave.

Consistency guarantees aren’t something that’s global to an application. You can usually specify the
level of consistency you want with individual requests. This allows you to use weak consistency most of the time when it isn’t an issue, but request strong consistency when it is.

The presence of an inconsistency window means that different people will see different things at the
same time. If Martin and Cindy are looking at rooms while on a transatlantic call, it can cause confusion. It’s more common for users to act independently, and then this is not a problem. But inconsistency windows can be particularly problematic when you get inconsistencies with yourself. Consider the example of posting comments on a blog entry. Few people are going to worry about inconsistency windows of even a few minutes while people are typing in their latest thoughts. Often, systems handle the load of such sites by running on a cluster and load-balancing incoming requests to different nodes. Therein lies a danger: You may post a message using one node, then refresh your browser, but the refresh goes to a different node which hasn’t received your post yet, and it looks like your post was lost.

In situations like this, you can tolerate reasonably long inconsistency windows, but you need read-your-writes consistency, which means that, once you’ve made an update, you’re guaranteed to continue seeing that update. One way to get this in an otherwise eventually consistent system is to provide session consistency: Within a user’s session there is read-your-writes consistency. This does mean that the user may lose that consistency should their session end for some reason or should the user access the same system simultaneously from different computers, but these cases are relatively rare.

There are a couple of techniques to provide session consistency. A common way, and often the
easiest way, is to have a sticky session: a session that’s tied to one node (this is also called session affinity). A sticky session allows you to ensure that as long as you keep read-your-writes consistency on a node, you’ll get it for sessions too. The downside is that sticky sessions reduce the ability of the load balancer to do its job.

Another approach for session consistency is to use version stamps (“Version Stamps,” p. 61) and
ensure every interaction with the data store includes the latest version stamp seen by a session. The server node must then ensure that it has the updates that include that version stamp before responding to a request.

Maintaining session consistency with sticky sessions and master-slave replication can be awkward
if you want to read from the slaves to improve read performance but still need to write to the master. One way of handling this is for writes to be sent to the slave, which then takes responsibility for forwarding them to the master while maintaining session consistency for its client. Another approach is to switch the session to the master temporarily when doing a write, just long enough that reads are done from the master until the slaves have caught up with the update.

We’re talking about replication consistency in the context of a data store, but it’s also an important
factor in overall application design. Even a simple database system will have lots of occasions where data is presented to a user, the user cogitates, and then updates that data. It’s usually a bad idea to keep a transaction open during user interaction because there’s a real danger of conflicts when the user tries to make her update, which leads to such approaches as offline locks [Fowler PoEAA].
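The version-stamp approach to session consistency described in this section can be sketched roughly as follows. This is a minimal illustration, not any particular database's API: the names Replica, Session, and StaleReplicaError are hypothetical, the version stamp is assumed to be a simple counter, and the node refuses a stale read instead of waiting for the missing updates.

```python
class StaleReplicaError(Exception):
    """Raised when a node has not yet received updates the session has seen."""


class Replica:
    """A hypothetical node holding data plus a counter-style version stamp."""

    def __init__(self):
        self.version = 0
        self.data = {}

    def apply(self, version, key, value):
        # Accept an update only if it is newer than what this node holds.
        if version > self.version:
            self.data[key] = value
            self.version = version


class Session:
    """Remembers the latest version stamp this session has seen."""

    def __init__(self):
        self.last_seen = 0

    def write(self, node, key, value):
        node.apply(node.version + 1, key, value)
        self.last_seen = node.version

    def read(self, node, key):
        # The node must hold every update the session has already seen.
        if node.version < self.last_seen:
            raise StaleReplicaError("node is behind this session")
        self.last_seen = node.version
        return node.data.get(key)
```

A real server would more likely wait for replication to catch up, or forward the read to an up-to-date node, than raise an error; the point here is only that the session carries its stamp on every interaction.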
5.3. Relaxing Consistency

Consistency is a Good Thing, but, sadly, sometimes we have to sacrifice it. It is always possible to design a system to avoid inconsistencies, but often impossible to do so without making unbearable sacrifices in other characteristics of the system. As a result, we often have to trade off consistency for something else. While some architects see this as a disaster, we see it as part of the inevitable tradeoffs involved in system design. Furthermore, different domains have different tolerances for inconsistency, and we need to take this tolerance into account as we make our decisions.

Trading off consistency is a familiar concept even in single-server relational database systems.
Here, our principal tool to enforce consistency is the transaction, and transactions can provide strong consistency guarantees. However, transaction systems usually come with the ability to relax isolation levels, allowing queries to read data that hasn’t been committed yet, and in practice we see most applications relax consistency down from the highest isolation level (serializable) in order to get effective performance. We most commonly see people using the read-committed transaction level, which eliminates some read-write conflicts but allows others.

Many systems forgo transactions entirely because the performance impact of transactions is too
high. We’ve seen this in a couple of different ways. On a small scale, we saw the popularity of MySQL during the days when it didn’t support transactions. Many websites liked the high speed of MySQL and were prepared to live without transactions. At the other end of the scale, some very large websites, such as eBay [Pritchett], have had to forgo transactions in order to perform acceptably; this is particularly true when you need to introduce sharding. Even without these constraints, many application builders need to interact with remote systems that can’t be properly included within a transaction boundary, so updating outside of transactions is a quite common occurrence for enterprise applications.
5.3.1. The CAP Theorem

In the NoSQL world it’s common to refer to the CAP theorem as the reason why you may need to relax consistency. It was originally proposed by Eric Brewer in 2000 [Brewer] and given a formal proof by Seth Gilbert and Nancy Lynch [Lynch and Gilbert] a couple of years later. (You may also hear this referred to as Brewer’s Conjecture.)

The basic statement of the CAP theorem is that, given the three properties of Consistency,
Availability, and Partition tolerance, you can only get two. Obviously this depends very much on how you define these three properties, and differing opinions have led to several debates on what the real consequences of the CAP theorem are.

Consistency is pretty much as we’ve defined it so far. Availability has a particular meaning in the
context of CAP: it means that if you can talk to a node in the cluster, it can read and write data. That’s subtly different from the usual meaning, which we’ll explore later. Partition tolerance means that the cluster can survive communication breakages in the cluster that separate the cluster into multiple partitions unable to communicate with each other (a situation known as a split brain, see Figure 5.3).
Figure 5.3. With two breaks in the communication lines, the network partitions into two groups.

A single-server system is the obvious example of a CA system: a system that has Consistency and
Availability but not Partition tolerance. A single machine can’t partition, so it does not have to worry about partition tolerance. There’s only one node, so if it’s up, it’s available. Being up and keeping consistency is reasonable. This is the world that most relational database systems live in.

It is theoretically possible to have a CA cluster. However, this would mean that if a partition ever
occurs in the cluster, all the nodes in the cluster would go down so that no client can talk to a node. By the usual definition of “available,” this would mean a lack of availability, but this is where CAP’s special usage of “availability” gets confusing. CAP defines “availability” to mean “every request received by a nonfailing node in the system must result in a response” [Lynch and Gilbert]. So a failed, unresponsive node doesn’t imply a lack of CAP availability.

This does imply that you can build a CA cluster, but you have to ensure it will only partition rarely
and completely. This can be done, at least within a data center, but it’s usually prohibitively expensive. Remember that in order to bring down all the nodes in a cluster on a partition, you also have to detect the partition in a timely manner, which itself is no small feat.

So clusters have to be tolerant of network partitions. And here is the real point of the CAP theorem.
Although the CAP theorem is often stated as “you can only get two out of three,” in practice what it’s saying is that in a system that may suffer partitions, as distributed systems do, you have to trade off consistency versus availability. This isn’t a binary decision; often, you can trade off a little consistency to get some availability. The resulting system would be neither perfectly consistent nor perfectly available, but would have a combination that is reasonable for your particular needs.

An example should help illustrate this. Martin and Pramod are both trying to book the last hotel
room on a system that uses peer-to-peer distribution with two nodes (London for Martin and Mumbai for Pramod). If we want to ensure consistency, then when Martin tries to book his room on the London node, that node must communicate with the Mumbai node before confirming the booking. Essentially, both nodes must agree on the serialization of their requests. This gives us consistency, but should the network link break, then neither system can book any hotel room, sacrificing availability.

One way to improve availability is to designate one node as the master for a particular hotel and
ensure all bookings are processed by that master. Should that master be Mumbai, then Mumbai can still process hotel bookings for that hotel and Pramod will get the last room. If we use master-slave replication, London users can see the inconsistent room information but cannot make a booking and thus cause an update inconsistency. However, users expect that it could happen in this situation, so, again, the compromise works for this particular use case.

This improves the situation, but we still can’t book a room on the London node for the hotel whose
master is in Mumbai if the connection goes down. In CAP terminology, this is a failure of availability in that Martin can talk to the London node but the London node cannot update the data. To gain more availability, we might allow both systems to keep accepting hotel reservations even if the network link breaks down. The danger here is that Martin and Pramod both book the last hotel room. However, depending on how this hotel operates, that may be fine. Often, travel companies tolerate a certain amount of overbooking in order to cope with no-shows. Conversely, some hotels always keep a few rooms clear even when they are fully booked, in order to be able to swap a guest out of a room with problems or to accommodate a high-status late booking. Some might even cancel the booking with an apology once they detected the conflict, reasoning that the cost of that is less than the cost of losing bookings on network failures.

The classic example of allowing inconsistent writes is the shopping cart, as discussed in Dynamo
[Amazon’s Dynamo]. In this case you are always allowed to write to your shopping cart, even if network failures mean you end up with multiple shopping carts. The checkout process can merge the two shopping carts by putting the union of the items from the carts into a single cart and returning that. Almost always that’s the correct answer, but if not, the user gets the opportunity to look at the cart before completing the order.

The lesson here is that although most software developers treat update consistency as The Way
Things Must Be, there are cases where you can deal gracefully with inconsistent answers to requests. These situations are closely tied to the domain and require domain knowledge to know how to resolve. Thus you can’t usually look to solve them purely within the development team; you have to talk to domain experts. If you can find a way to handle inconsistent updates, this gives you more options to increase availability and performance. For a shopping cart, it means that shoppers can always shop, and do so quickly. And as Patriotic Americans, we know how vital it is to support Our Retail Destiny.

A similar logic applies to read consistency. If you are trading financial instruments over a
computerized exchange, you may not be able to tolerate any data that isn’t right up to date. However, if you are posting a news item to a media website, you may be able to tolerate old pages for minutes.

In these cases you need to know how tolerant you are of stale reads, and how long the inconsistency
window can be, often in terms of the average length, worst case, and some measure of the distribution for the lengths. Different data items may have different tolerances for staleness, and thus may need different settings in your replication configuration.

Advocates of NoSQL often say that instead of following the ACID properties of relational
transactions, NoSQL systems follow the BASE properties (Basically Available, Soft state, Eventual consistency) [Brewer]. Although we feel we ought to mention the BASE acronym here, we don’t think it’s very useful. The acronym is even more contrived than ACID, and neither “basically available” nor “soft state” has been well defined. We should also stress that when Brewer introduced the notion of BASE, he saw the tradeoff between ACID and BASE as a spectrum, not a binary choice.

We’ve included this discussion of the CAP theorem because it’s often used (and abused) when
talking about the tradeoffs involving consistency in distributed databases. However, it’s usually better to think not about the tradeoff between consistency and availability but rather between consistency and latency. We can summarize much of the discussion about consistency in distribution by saying that we can improve consistency by getting more nodes involved in the interaction, but each node we add increases the response time of that interaction. We can then think of availability as the limit of latency that we’re prepared to tolerate; once latency gets too high, we give up and treat the data as unavailable, which neatly fits its definition in the context of CAP.
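The shopping-cart merge mentioned earlier in this section, where checkout takes the union of the items from diverged carts, might look something like the sketch below. The cart representation (a mapping from item name to quantity) and the rule of keeping the larger quantity when both carts hold the same item are assumptions made for illustration; a real system would pick a merge policy with its domain experts.

```python
def merge_carts(*carts):
    """Return one cart holding the union of the items in all given carts.

    Each cart is assumed to be a dict of item name -> quantity. When the
    same item appears in several carts, the largest quantity is kept
    (an illustrative policy, not the only reasonable one).
    """
    merged = {}
    for cart in carts:
        for item, quantity in cart.items():
            merged[item] = max(merged.get(item, 0), quantity)
    return merged


# Two carts that diverged during a network failure.
london_cart = {"book": 1, "lamp": 1}
mumbai_cart = {"book": 2, "pen": 3}
checkout_cart = merge_carts(london_cart, mumbai_cart)
```

Note that a pure union cannot tell a deliberately removed item from one that was never added, which is one reason the user still gets to review the cart before completing the order.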
5.4. Relaxing Durability

So far we’ve talked about consistency, which is most of what people mean when they talk about the ACID properties of database transactions. The key to Consistency is serializing requests by forming Atomic, Isolated work units. But most people would scoff at relaxing durability; after all, what is the point of a data store if it can lose updates?

As it turns out, there are cases where you may want to trade off some durability for higher
performance. If a database can run mostly in memory, apply updates to its in-memory representation, and periodically flush changes to disk, then it may be able to provide substantially higher responsiveness to requests. The cost is that, should the server crash, any updates since the last flush will be lost.

One example of where this tradeoff may be worthwhile is storing user-session state. A big website
may have many users and keep temporary information about what each user is doing in some kind of session state. There’s a lot of activity on this state, creating lots of demand, which affects the responsiveness of the website. The vital point is that losing the session data isn’t too much of a tragedy: it will create some annoyance, but maybe less than a slower website would cause. This makes it a good candidate for nondurable writes. Often, you can specify the durability needs on a call-by-call basis, so that more important updates can force a flush to disk.

Another example of relaxing durability is capturing telemetric data from physical devices. It may
be that you’d rather capture data at a faster rate, at the cost of missing the last updates should the server go down.

Another class of durability tradeoffs comes up with replicated data. A failure of replication
durability occurs when a node processes an update but fails before that update is replicated to the other nodes. A simple case of this may happen if you have a master-slave distribution model where the slaves appoint a new master automatically should the existing master fail. If a master does fail, any writes not passed on to the replicas will effectively become lost. Should the master come back online, those updates will conflict with updates that have happened since. We think of this as a durability problem because you think your update has succeeded since the master acknowledged it, but a master node failure caused it to be lost.

If you’re sufficiently confident in bringing the master back online rapidly, this is a reason not to
auto-failover to a slave. Otherwise, you can improve replication durability by ensuring that the master waits for some replicas to acknowledge the update before the master acknowledges it to the client.
Obviously, however, that will slow down updates and make the cluster unavailable if slaves fail, so, again, we have a tradeoff, depending upon how vital durability is. As with basic durability, it’s useful for individual calls to indicate what level of durability they need.
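The call-by-call durability choice described in this section can be illustrated with a small single-node sketch. Everything here is hypothetical: the WriteBehindStore class, its durable flag, and the JSON file format are invented for the example and do not correspond to any real database's API.

```python
import json
import os
import tempfile


class WriteBehindStore:
    """Keeps data in memory and writes it to disk only when asked to."""

    def __init__(self, path):
        self.path = path
        self.memory = {}
        self.dirty = False

    def put(self, key, value, durable=False):
        # Fast path: update memory only. A crash before the next flush
        # loses this write unless the caller asked for durability.
        self.memory[key] = value
        self.dirty = True
        if durable:
            self.flush()

    def flush(self):
        if not self.dirty:
            return
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.memory, f)
            f.flush()
            os.fsync(f.fileno())  # force the bytes to stable storage
        os.replace(tmp_path, self.path)  # swap in the new snapshot
        self.dirty = False

    def on_disk(self):
        # What would survive a crash right now.
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)


store = WriteBehindStore(os.path.join(tempfile.mkdtemp(), "store.json"))
store.put("session:42", "viewing cart")      # nondurable session state
survives_before = store.on_disk()
store.put("order:7", "paid", durable=True)   # important update forces a flush
survives_after = store.on_disk()
```

A production store would also flush periodically in the background; that part is omitted to keep the sketch short.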
5.5. Quorums

When you’re trading off consistency or durability, it’s not an all or nothing proposition. The more nodes you involve in a request, the higher is the chance of avoiding an inconsistency. This naturally leads to the question: How many nodes need to be involved to get strong consistency?

Imagine some data replicated over three nodes. You don’t need all nodes to acknowledge a write to
ensure strong consistency; all you need is two of them, a majority. If you have conflicting writes, only one can get a majority. This is referred to as a write quorum and expressed in a slightly pretentious inequality of W > N/2, meaning the number of nodes participating in the write (W) must be more than half the number of nodes involved in replication (N). The number of replicas is often called the replication factor.

Similarly to the write quorum, there is the notion of read quorum: How many nodes you need to
contact to be sure you have the most up-to-date change. The read quorum is a bit more complicated because it depends on how many nodes need to confirm a write.

Let’s consider a replication factor of 3. If all writes need two nodes to confirm (W = 2) then we
need to contact at least two nodes to be sure we’ll get the latest data. If, however, writes are only confirmed by a single node (W = 1) we need to talk to all three nodes to be sure we have the latest updates. In this case, since we don’t have a write quorum, we may have an update conflict, but by contacting enough readers we can be sure to detect it. Thus we can get strongly consistent reads even if we don’t have strong consistency on our writes.

This relationship between the number of nodes you need to contact for a read (R), those confirming
a write (W), and the replication factor (N) can be captured in an inequality: You can have a strongly consistent read if R + W > N.

These inequalities are written with a peer-to-peer distribution model in mind. If you have a master-slave distribution, you only have to write to the master to avoid write-write conflicts, and similarly only read from the master to avoid read-write conflicts. With this notation, it is common to confuse the number of nodes in the cluster with the replication factor, but these are often different. I may have 100 nodes in my cluster, but only have a replication factor of 3, with most of the distribution occurring due to sharding.

Indeed most authorities suggest that a replication factor of 3 is enough to have good resilience.
This allows a single node to fail while still maintaining quora for reads and writes. If you have automatic rebalancing, it won’t take too long for the cluster to create a third replica, so the chances of losing a second replica before a replacement comes up are slight.

The number of nodes participating in an operation can vary with the operation. When writing, we
might require quorum for some types of updates but not others, depending on how much we value consistency and availability. Similarly, a read that needs speed but can tolerate staleness should contact fewer nodes.

Often you may need to take both into account. If you need fast, strongly consistent reads, you could
require writes to be acknowledged by all the nodes, thus allowing reads to contact only one (N = 3, W = 3, R = 1). That would mean that your writes are slow, since they have to contact all three nodes, and you would not be able to tolerate losing a node. But in some circumstances that may be the tradeoff to make.
The point to all of this is that you have a range of options to work with and can choose which combination of problems and advantages to prefer. Some writers on NoSQL talk about a simple tradeoff between consistency and availability; we hope you now realize that it’s more flexible (and more complicated) than that.
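The two inequalities from this section, W > N/2 for a write quorum and R + W > N for strongly consistent reads, are easy to check mechanically. The helper names below are made up for this sketch; only the inequalities themselves come from the text.

```python
def _check(count, n):
    # A node count must be between 1 and the replication factor.
    if not 1 <= count <= n:
        raise ValueError("node count must be between 1 and N")


def has_write_quorum(w, n):
    """True if W nodes form a majority of N replicas (W > N/2)."""
    _check(w, n)
    return w > n / 2


def has_consistent_reads(r, w, n):
    """True if every read set overlaps every write set (R + W > N)."""
    _check(r, n)
    _check(w, n)
    return r + w > n


def min_read_nodes(w, n):
    """Smallest R that satisfies R + W > N for the given W and N."""
    _check(w, n)
    return n - w + 1


# The cases discussed above, all with a replication factor of 3.
assert has_write_quorum(2, 3)             # two of three is a majority
assert not has_write_quorum(1, 3)         # a single node is not
assert min_read_nodes(2, 3) == 2          # W = 2 means reads contact two nodes
assert min_read_nodes(1, 3) == 3          # W = 1 means reads contact all three
assert has_consistent_reads(1, 3, 3)      # N = 3, W = 3, R = 1
```

In the N = 3, W = 3, R = 1 configuration the read side is as cheap as it can be, but a single failed node blocks every write, which is the tradeoff the text describes.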
5.6. Further Reading

There are all sorts of interesting blog posts and papers on the Internet about consistency in distributed systems, but the most helpful source for us was [Tanenbaum and Van Steen]. It does an excellent job of organizing much of the fundamentals of distributed systems and is the best place to go if you’d like to delve deeper than we have in this chapter.

As we were finishing this book, IEEE Computer had a special issue [IEEE Computer Feb 2012] on
the growing influence of the CAP theorem, which is a helpful source of further clarification for this topic.
5.7. Key Points

• Write-write conflicts occur when two clients try to write the same data at the same time. Read-write conflicts occur when one client reads inconsistent data in the middle of another client’s write.
• Pessimistic approaches lock data records to prevent conflicts. Optimistic approaches detect conflicts and fix them.
• Distributed systems see read-write conflicts due to some nodes having received updates while other nodes have not. Eventual consistency means that at some point the system will become consistent once all the writes have propagated to all the nodes.
• Clients usually want read-your-writes consistency, which means a client can write and then immediately read the new value. This can be difficult if the read and the write happen on different nodes.
• To get good consistency, you need to involve many nodes in data operations, but this increases latency. So you often have to trade off consistency versus latency.
• The CAP theorem states that if you get a network partition, you have to trade off availability of data versus consistency.
• Durability can also be traded off against latency, particularly if you want to survive failures with replicated data.
• You do not need to contact all replicas to preserve strong consistency with replication; you just need a large enough quorum.
Chapter 6. Version Stamps
Many critics of NoSQL databases focus on the lack of support for transactions. Transactions are a useful tool that helps programmers support consistency. One reason why many NoSQL proponents worry less about a lack of transactions is that aggregate-oriented NoSQL databases do support atomic updates within an aggregate, and aggregates are designed so that their data forms a natural unit of update. That said, it’s true that transactional needs are something to take into account when you decide what database to use.

As part of this, it’s important to remember that transactions have limitations. Even within a
transactional system we still have to deal with updates that require human intervention and usually cannot be run within transactions because they would involve holding a transaction open for too long. We can cope with these using version stamps, which turn out to be handy in other situations as well, particularly as we move away from the single-server distribution model.
6.1. Business and System Transactions

The need to support update consistency without transactions is actually a common feature of systems even when they are built on top of transactional databases. When users think about transactions, they usually mean business transactions. A business transaction may be something like browsing a product catalog, choosing a bottle of Talisker at a good price, filling in credit card information, and confirming the order. Yet all of this usually won’t occur within the system transaction provided by the database because this would mean locking the database elements while the user is trying to find their credit card and gets called off to lunch by their colleagues.

Usually applications only begin a system transaction at the end of the interaction with the user, so
that the locks are only held for a short period of time. The problem, however, is that calculations and decisions may have been made based on data that’s changed. The price list may have updated the price of the Talisker, or someone may have updated the customer’s address, changing the shipping charges.

The broad techniques for handling this are offline concurrency [Fowler PoEAA], useful in NoSQL
situations too. A particularly useful approach is the Optimistic Offline Lock [Fowler PoEAA], a form of conditional update where a client operation rereads any information that the business transaction relies on and checks that it hasn’t changed since it was originally read and displayed to the user. A good way of doing this is to ensure that records in the database contain some form of version stamp: a field that changes every time the underlying data in the record changes. When you read the data you keep a note of the version stamp, so that when you write data you can check to see if the version has changed.

You may have come across this technique with updating resources with HTTP [HTTP]. One way of
doing this is to use etags. Whenever you get a resource, the server responds with an etag in the header. This etag is an opaque string that indicates the version of the resource. If you then update that resource, you can use a conditional update by supplying the etag that you got from your last GET. If the resource has changed on the server, the etags won’t match and the server will refuse the update, returning a 412 (Precondition Failed) response.

Some databases provide a similar mechanism of conditional update that allows you to ensure
updates won’t be based on stale data. You can do this check yourself, although you then have to ensure no other thread can run against the resource between your read and your update. (Sometimes this is called a compare-and-set (CAS) operation, whose name comes from the CAS operations done in
processors. The difference is that a processor CAS compares a value before setting it, while a database conditional update compares a version stamp of the value.)

There are various ways you can construct your version stamps. You can use a counter, always
incrementing it when you update the resource. Counters are useful since they make it easy to tell if one version is more recent than another. On the other hand, they require the server to generate the counter value, and also need a single master to ensure the counters aren’t duplicated.

Another approach is to create a GUID, a large random number that’s guaranteed to be unique.
These use some combination of dates, hardware information, and whatever other sources of randomness they can pick up. The nice thing about GUIDs is that they can be generated by anyone and you’ll never get a duplicate; a disadvantage is that they are large and can’t be compared directly for recentness.

A third approach is to make a hash of the contents of the resource. With a big enough hash key size,
a content hash can be globally unique like a GUID and can also be generated by anyone; the advantage is that they are deterministic: any node will generate the same content hash for the same resource data. However, like GUIDs they can’t be directly compared for recentness, and they can be lengthy.

A fourth approach is to use the timestamp of the last update. Like counters, they are reasonably
short and can be directly compared for recentness, yet have the advantage of not needing a single master. Multiple machines can generate timestamps, but to work properly, their clocks have to be kept in sync. One node with a bad clock can cause all sorts of data corruptions. There’s also a danger that if the timestamp is too coarse you can get duplicates: it’s no good using timestamps of a millisecond precision if you get many updates per millisecond.

You can blend the advantages of these different version stamp schemes by using more than one of
them to create a composite stamp. For example, CouchDB uses a combination of counter and content hash. Most of the time this allows version stamps to be compared for recentness, even when you use peer-to-peer replication. Should two peers update at the same time, the combination of the same count and different content hashes makes it easy to spot the conflict.

As well as helping to avoid update conflicts, version stamps are also useful for providing session
consistency (p. 52).
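The conditional update described in this section, where a write succeeds only if the record's version stamp is still the one the client originally read, can be sketched as below. RecordStore and VersionConflict are hypothetical names, the stamp is assumed to be a counter, and a lock stands in for whatever the database does to keep other threads out between the check and the write.

```python
import threading


class VersionConflict(Exception):
    """Raised when the record changed since the caller read it."""


class RecordStore:
    """In-memory store whose records carry a counter version stamp."""

    def __init__(self):
        self._records = {}          # key -> (version, value)
        self._lock = threading.Lock()

    def read(self, key):
        # Unknown keys read as version 0 with no value.
        with self._lock:
            return self._records.get(key, (0, None))

    def conditional_update(self, key, expected_version, new_value):
        # Compare and write under one lock so that no other thread can
        # slip in between the check and the update.
        with self._lock:
            current_version, _ = self._records.get(key, (0, None))
            if current_version != expected_version:
                raise VersionConflict(
                    f"expected version {expected_version}, "
                    f"found {current_version}"
                )
            new_version = current_version + 1
            self._records[key] = (new_version, new_value)
            return new_version


store = RecordStore()
# Two clients read the same record during their business transactions.
version_seen_by_a, _ = store.read("customer:1")
version_seen_by_b, _ = store.read("customer:1")
store.conditional_update("customer:1", version_seen_by_a, "new address")
try:
    store.conditional_update("customer:1", version_seen_by_b, "other address")
    second_update_refused = False
except VersionConflict:
    second_update_refused = True
```

The refused update plays the same role as the 412 response in the HTTP etag case: the second client has to reread the record and decide what to do.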
6.2. Version Stamps on Multiple Nodes

The basic version stamp works well when you have a single authoritative source for data, such as a single server or master-slave replication. In that case the version stamp is controlled by the master. Any slaves follow the master’s stamps. But this system has to be enhanced in a peer-to-peer distribution model because there’s no longer a single place to set the version stamps.

If you’re asking two nodes for some data, you run into the chance that they may give you different
answers. If this happens, your reaction may vary depending on the cause of that difference. It may be that an update has only reached one node but not the other, in which case you can accept the latest (assuming you can tell which one that is). Alternatively, you may have run into an inconsistent update, in which case you need to decide how to deal with that. In this situation, a simple GUID or etag won’t suffice, since these don’t tell you enough about the relationships.

The simplest form of version stamp is a counter. Each time a node updates the data, it increments
the counter and puts the value of the counter into the version stamp. If you have blue and green slave replicas of a single master, and the blue node answers with a version stamp of 4 and the green node with 6, you know that the green’s answer is more recent.

In multiple-master cases, we need something fancier. One approach, used by distributed version
control systems, is to ensure that all nodes contain a history of version stamps. That way you can see if the blue node’s answer is an ancestor of the green’s answer. This would either require the clients to hold onto version stamp histories, or the server nodes to keep version stamp histories and include them when asked for data. This also detects an inconsistency, which we would see if we get two version stamps and neither of them has the other in their histories. Although version control systems keep these kinds of histories, they aren’t found in NoSQL databases.

A simple but problematic approach is to use timestamps. The main problem here is that it’s usually
difficult to ensure that all the nodes have a consistent notion of time, particularly if updates can happen rapidly. Should a node’s clock get out of sync, it can cause all sorts of trouble. In addition, you can’t detect write-write conflicts with timestamps, so it would only work well for the single-master case, and then a counter is usually better.

The most common approach used by peer-to-peer NoSQL systems is a special form of version
stamp which we call a vector stamp. In essence, a vector stamp is a set of counters, one for each node. A vector stamp for three nodes (blue, green, black) would look something like [blue: 43, green: 54, black: 12]. Each time a node has an internal update, it updates its own counter, so an update in the green node would change the vector to [blue: 43, green: 55, black: 12]. Whenever two nodes communicate, they synchronize their vector stamps. There are several variations of exactly how this synchronization is done. We’re coining the term “vector stamp” as a general term in this book; you’ll also come across vector clocks and version vectors, which are specific forms of vector stamps that differ in how they synchronize.

By using this scheme you can tell if one version stamp is newer than another because the newer
stamp will have all its counters greater than or equal to those in the older stamp. So [blue: 1, green: 2, black: 5] is newer than [blue: 1, green: 1, black: 5] since one of its counters is greater. If both stamps have a counter greater than the other, e.g. [blue: 1, green: 2, black: 5] and [blue: 2, green: 1, black: 5], then you have a write-write conflict.

There may be missing values in the vector, in which case we treat the missing value as 0. So
[blue: 6, black: 2] would be treated as [blue: 6, green: 0, black: 2]. This allows you to easily add new nodes without invalidating the existing vector stamps.

Vector stamps are a valuable tool that spots inconsistencies, but doesn’t resolve them. Any conflict
resolution will depend on the domain you are working in. This is part of the consistency/latency tradeoff. You either have to live with the fact that network partitions may make your system unavailable, or you have to detect and deal with inconsistencies.
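The comparison rules for vector stamps given in this section translate directly into code. The sketch below represents a stamp as a dict from node name to counter, which is an assumption of this example, as are the function names; it covers only comparison and local increment, not the synchronization step, since the text notes that synchronization varies between schemes.

```python
def increment(stamp, node):
    """Return a copy of the stamp with the given node's counter bumped."""
    updated = dict(stamp)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a, b):
    """Classify stamp a relative to stamp b.

    Returns "newer", "older", "equal", or "conflict". Counters missing
    from either stamp are treated as 0.
    """
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "conflict"   # each side has an update the other lacks
    if a_ahead:
        return "newer"
    if b_ahead:
        return "older"
    return "equal"


# The examples from the text.
assert compare({"blue": 1, "green": 2, "black": 5},
               {"blue": 1, "green": 1, "black": 5}) == "newer"
assert compare({"blue": 1, "green": 2, "black": 5},
               {"blue": 2, "green": 1, "black": 5}) == "conflict"
assert compare({"blue": 6, "black": 2},
               {"blue": 6, "green": 0, "black": 2}) == "equal"
assert increment({"blue": 43, "green": 54, "black": 12}, "green") == \
    {"blue": 43, "green": 55, "black": 12}
```

A "conflict" result only tells you that the two versions diverged; deciding which value wins, or how to merge them, remains a domain decision.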
6.3. Key Points

• Version stamps help you detect concurrency conflicts. When you read data, then update it, you can check the version stamp to ensure nobody updated the data between your read and write.
• Version stamps can be implemented using counters, GUIDs, content hashes, timestamps, or a combination of these.
• With distributed systems, a vector of version stamps allows you to detect when different nodes have conflicting updates.
Chapter 7. Map-Reduce
The rise of aggregate-oriented databases is in large part due to the growth of clusters. Running on a cluster means you have to make your tradeoffs in data storage differently than when running on a single machine. Clusters don’t just change the rules for data storage; they also change the rules for computation. If you store lots of data on a cluster, processing that data efficiently means you have to think differently about how you organize your processing.

With a centralized database, there are generally two ways you can run the processing logic against
it: either on the database server itself or on a client machine. Running it on a client machine gives you more flexibility in choosing a programming environment, which usually makes for programs that are easier to create or extend. This comes at the cost of having to shlep lots of data from the database server. If you need to hit a lot of data, then it makes sense to do the processing on the server, paying the price in programming convenience and increasing the load on the database server.

When you have a cluster, there is good news immediately: you have lots of machines to spread the
computation over. However, you also still need to try to reduce the amount of data that needs to be transferred across the network by doing as much processing as you can on the same node as the data it needs.

The map-reduce pattern (a form of Scatter-Gather [Hohpe and Woolf]) is a way to organize
processinginsuchawayastotakeadvantageofmultiplemachinesonaclusterwhilekeepingasmuchprocessingandthedataitneedstogetheronthesamemachine.ItfirstgainedprominencewithGoogle’sMapReduceframework[DeanandGhemawat].Awidelyusedopen-sourceimplementationispartoftheHadoopproject,althoughseveraldatabasesincludetheirownimplementations.Aswithmostpatterns,therearedifferencesindetailbetweentheseimplementations,sowe’llconcentrateonthegeneralconcept.Thename“map-reduce”revealsitsinspirationfromthemapandreduceoperationsoncollectionsinfunctionalprogramminglanguages.
7.1. Basic Map-Reduce
To explain the basic idea, we’ll start from an example we’ve already flogged to death: that of customers and orders. Let’s assume we have chosen orders as our aggregate, with each order having line items. Each line item has a product ID, quantity, and the price charged. This aggregate makes a lot of sense as usually people want to see the whole order in one access. We have lots of orders, so we’ve sharded the dataset over many machines.
However, sales analysis people want to see a product and its total revenue for the last seven days. This report doesn’t fit the aggregate structure that we have, which is the downside of using aggregates. In order to get the product revenue report, you’ll have to visit every machine in the cluster and examine many records on each machine.
This is exactly the kind of situation that calls for map-reduce. The first stage in a map-reduce job is the map. A map is a function whose input is a single aggregate and whose output is a bunch of key-value pairs. In this case, the input would be an order. The output would be key-value pairs corresponding to the line items. Each one would have the product ID as the key and an embedded map with the quantity and price as the values (see Figure 7.1).
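The map step described above can be sketched in plain Java. This is only an illustration: the `LineItem` and `Emitted` types and the `map` signature are hypothetical, an order is modeled as a simple list of line items, and `price` is assumed to be the amount charged for the whole line.

```java
import java.util.ArrayList;
import java.util.List;

final class OrderMapSketch {
    // One line item of an order (hypothetical shape).
    static final class LineItem {
        final String productId;
        final int quantity;
        final double price;

        LineItem(String productId, int quantity, double price) {
            this.productId = productId;
            this.quantity = quantity;
            this.price = price;
        }
    }

    // A key-value pair emitted by the map: product ID -> (quantity, price).
    static final class Emitted {
        final String key;
        final int quantity;
        final double price;

        Emitted(String key, int quantity, double price) {
            this.key = key;
            this.quantity = quantity;
            this.price = price;
        }
    }

    // The map function sees exactly one order (one aggregate) and emits
    // one key-value pair per line item. It never looks at other orders.
    static List<Emitted> map(List<LineItem> order) {
        List<Emitted> out = new ArrayList<>();
        for (LineItem item : order) {
            out.add(new Emitted(item.productId, item.quantity, item.price));
        }
        return out;
    }
}
```

Because `map` depends only on its argument, a framework is free to call it on any node, in any order, for any order record.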
Figure 7.1. A map function reads records from the database and emits key-value pairs.
Each application of the map function is independent of all the others. This allows them to be safely parallelizable, so that a map-reduce framework can create efficient map tasks on each node and freely allocate each order to a map task. This yields a great deal of parallelism and locality of data access. For this example, we are just selecting a value out of the record, but there’s no reason why we can’t carry out some arbitrarily complex function as part of the map, providing it only depends on one aggregate’s worth of data.
A map operation only operates on a single record; the reduce function takes multiple map outputs with the same key and combines their values. So, a map function might yield 1000 line items from orders for “Database Refactoring”; the reduce function would reduce down to one, with the totals for the quantity and revenue. While the map function is limited to working only on data from a single aggregate, the reduce function can use all values emitted for a single key (see Figure 7.2).
Figure 7.2. A reduce function takes several key-value pairs with the same key and aggregates them into one.
The map-reduce framework arranges for map tasks to be run on the correct nodes to process all the documents and for data to be moved to the reduce function. To make it easier to write the reduce function, the framework collects all the values for a single key and calls the reduce function once with the key and the collection of all the values for that key. So to run a map-reduce job, you just need to write these two functions.
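To make the division of labor concrete, here is a small single-process sketch. The `run` method is a stand-in for what a framework does (group values by key, then call reduce once per key); the `Sale` type and all names are assumptions for illustration, and a real framework would do the grouping across machines.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RevenueReduceSketch {
    // A value emitted by the map stage for one line item (hypothetical shape).
    static final class Sale {
        final String productId;
        final int quantity;
        final double revenue;

        Sale(String productId, int quantity, double revenue) {
            this.productId = productId;
            this.quantity = quantity;
            this.revenue = revenue;
        }
    }

    // The reduce function: called once per key with every value emitted for it.
    static Sale reduce(String productId, List<Sale> values) {
        int quantity = 0;
        double revenue = 0.0;
        for (Sale s : values) {
            quantity += s.quantity;
            revenue += s.revenue;
        }
        return new Sale(productId, quantity, revenue);
    }

    // Stand-in for the framework: group map output by key, then reduce each group.
    static Map<String, Sale> run(List<Sale> mapOutput) {
        Map<String, List<Sale>> grouped = new LinkedHashMap<>();
        for (Sale s : mapOutput) {
            grouped.computeIfAbsent(s.productId, k -> new ArrayList<>()).add(s);
        }
        Map<String, Sale> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Sale>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }
}
```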
7.2. Partitioning and Combining
In the simplest form, we think of a map-reduce job as having a single reduce function. The outputs from all the map tasks running on the various nodes are concatenated together and sent into the reduce. While this will work, there are things we can do to increase the parallelism and to reduce the data transfer (see Figure 7.3).
Figure 7.3. Partitioning allows reduce functions to run in parallel on different keys.
The first thing we can do is increase parallelism by partitioning the output of the mappers. Each reduce function operates on the results of a single key. This is a limitation (it means you can’t do anything in the reduce that operates across keys), but it’s also a benefit in that it allows you to run multiple reducers in parallel. To take advantage of this, the results of the mapper are divided up based on the key on each processing node. Typically, multiple keys are grouped together into partitions. The framework then takes the data from all the nodes for one partition, combines it into a single group for that partition, and sends it off to a reducer. Multiple reducers can then operate on the partitions in parallel, with the final results merged together. (This step is also called “shuffling,” and the partitions are sometimes referred to as “buckets” or “regions.”)
The next problem we can deal with is the amount of data being moved from node to node between the map and reduce stages. Much of this data is repetitive, consisting of multiple key-value pairs for the same key. A combiner function cuts this data down by combining all the data for the same key into a single value (see Figure 7.4). A combiner function is, in essence, a reducer function; indeed, in many cases the same function can be used for combining as the final reduction. The reduce function needs a special shape for this to work: Its output must match its input. We call such a function a combinable reducer.
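The following sketch illustrates why the "output must match input" shape matters. The reducer below takes product-quantity pairs and returns product-quantity pairs, so it can run once per node as a combiner and again as the final reduce. The class and method names are hypothetical, and the nodes are simulated as in-memory lists.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CombinerSketch {
    // Builds one key-value pair: product ID -> quantity.
    static Map.Entry<String, Integer> pair(String productId, int quantity) {
        return new SimpleEntry<>(productId, quantity);
    }

    // A combinable reducer: input and output are both lists of
    // product -> quantity pairs, so its output is valid input.
    static List<Map.Entry<String, Integer>> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            out.add(pair(e.getKey(), e.getValue()));
        }
        return out;
    }

    // Runs the reducer as a combiner on each node's map output, then once more
    // on the smaller combined outputs that would cross the network.
    static List<Map.Entry<String, Integer>> reduceWithCombiner(
            List<List<Map.Entry<String, Integer>>> mapOutputPerNode) {
        List<Map.Entry<String, Integer>> transferred = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> nodeOutput : mapOutputPerNode) {
            transferred.addAll(reduce(nodeOutput));
        }
        return reduce(transferred);
    }
}
```

For a sum, reducing everything at once and reducing the per-node combined outputs give the same totals, which is what makes the combiner step safe here.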
Figure 7.4. Combining reduces data before sending it across the network.
Not all reduce functions are combinable. Consider a function that counts the number of unique customers for a particular product. The map function for such an operation would need to emit the product and the customer. The reducer can then combine them and count how many times each customer appears for a particular product, emitting the product and the count (see Figure 7.5). But this reducer’s output is different from its input, so it can’t be used as a combiner. You can still run a combining function here: one that just eliminates duplicate product-customer pairs, but it will be different from the final reducer.
Figure 7.5. This reduce function, which counts how many unique customers order a particular tea, is not combinable.
When you have combining reducers, the map-reduce framework can safely run not only in parallel (to reduce different partitions), but also in series to reduce the same partition at different times and places. In addition to allowing combining to occur on a node before data transmission, you can also start combining before mappers have finished. This provides a good bit of extra flexibility to the map-reduce processing. Some map-reduce frameworks require all reducers to be combining reducers, which maximizes this flexibility. If you need to do a noncombining reducer with one of these frameworks, you’ll need to separate the processing into pipelined map-reduce steps.
7.3. Composing Map-Reduce Calculations
The map-reduce approach is a way of thinking about concurrent processing that trades off flexibility in how you structure your computation for a relatively straightforward model for parallelizing the computation over a cluster. Since it’s a tradeoff, there are constraints on what you can do in your calculations. Within a map task, you can only operate on a single aggregate. Within a reduce task, you can only operate on a single key. This means you have to think differently about structuring your programs so they work well within these constraints.
One simple limitation is that you have to structure your calculations around operations that fit in well with the notion of a reduce operation. A good example of this is calculating averages. Let’s consider the kind of orders we’ve been looking at so far; suppose we want to know the average ordered quantity of each product. An important property of averages is that they are not composable: that is, if I take two groups of orders, I can’t combine their averages alone. Instead, I need to take the total amount and the count of orders from each group, combine those, and then calculate the average from the combined sum and count (see Figure 7.6).
Figure 7.6. When calculating averages, the sum and count can be combined in the reduce calculation, but the average must be calculated from the combined sum and count.
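A short sketch of the composable form: each group carries a sum and a count, the reduce step combines those, and the average is derived only at the end. The `SumCount` type is a hypothetical name introduced for this illustration.

```java
final class AverageSketch {
    // A partial result that composes: a running sum and a count.
    static final class SumCount {
        final long sum;
        final long count;

        SumCount(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        // Combining two partial results is just field-wise addition.
        SumCount combine(SumCount other) {
            return new SumCount(sum + other.sum, count + other.count);
        }

        // The average is computed last, from the combined sum and count.
        // Assumes count > 0.
        double average() {
            return (double) sum / count;
        }
    }
}
```

As a worked example, take one group with quantities 10 and 20 (sum 30, count 2, average 15) and another with a single quantity of 60 (sum 60, count 1, average 60). Averaging the two averages gives 37.5, whereas combining to sum 90 and count 3 gives the correct average of 30.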
This notion of looking for calculations that reduce neatly also affects how we do counts. To make a count, the mapping function will emit count fields with a value of 1, which can be summed to get a total count (see Figure 7.7).
Figure 7.7. When making a count, each map emits 1, which can be summed to get a total.
7.3.1. A Two Stage Map-Reduce Example
As map-reduce calculations get more complex, it’s useful to break them down into stages using a pipes-and-filters approach, with the output of one stage serving as input to the next, rather like the pipelines in UNIX.
Consider an example where we want to compare the sales of products for each month in 2011 to the prior year. To do this, we’ll break the calculations down into two stages. The first stage will produce records showing the aggregate figures for a single product in a single month of the year. The second stage then uses these as inputs and produces the result for a single product by comparing one month’s results with the same month in the prior year (see Figure 7.8).
Figure 7.8. A calculation broken down into two map-reduce steps, which will be expanded in the next three figures
A first stage (Figure 7.9) would read the original order records and output a series of key-value pairs for the sales of each product per month.
Figure 7.9. Creating records for monthly sales of a product
This stage is similar to the map-reduce examples we’ve seen so far. The only new feature is using a composite key so that we can reduce records based on the values of multiple fields.
The second-stage mappers (Figure 7.10) process this output depending on the year. A 2011 record populates the current year quantity while a 2010 record populates a prior year quantity. Records for earlier years (such as 2009) don’t result in any mapping output being emitted.
Figure 7.10. The second stage mapper creates base records for year-on-year comparisons.
The reduce in this case (Figure 7.11) is a merge of records, where combining the values by summing allows two different year outputs to be reduced to a single value (with a calculation based on the reduced values thrown in for good measure).
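The two stages can be sketched in a single process as follows. This is a deliberately simplified illustration: line items are given as string rows of product, year, month, and quantity; composite keys are built by joining fields with ":" (so product IDs are assumed not to contain a colon); and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

final class TwoStageSketch {
    // Stage 1: reduce line items to a total quantity per composite key
    // "product:year:month". Each row is {product, year, month, quantity}.
    static Map<String, Integer> stageOne(String[][] lineItems) {
        Map<String, Integer> monthly = new TreeMap<>();
        for (String[] li : lineItems) {
            String key = li[0] + ":" + li[1] + ":" + li[2];
            monthly.merge(key, Integer.parseInt(li[3]), Integer::sum);
        }
        return monthly;
    }

    // Stage 2: map each monthly record to {currentYearQty, priorYearQty}
    // keyed by "product:month", then reduce by summing element-wise.
    static Map<String, int[]> stageTwo(Map<String, Integer> monthly) {
        Map<String, int[]> merged = new TreeMap<>();
        for (Map.Entry<String, Integer> e : monthly.entrySet()) {
            String[] parts = e.getKey().split(":");
            int[] emitted;
            if (parts[1].equals("2011")) {
                emitted = new int[] {e.getValue(), 0};
            } else if (parts[1].equals("2010")) {
                emitted = new int[] {0, e.getValue()};
            } else {
                continue; // earlier years emit nothing
            }
            merged.merge(parts[0] + ":" + parts[2], emitted,
                    (a, b) -> new int[] {a[0] + b[0], a[1] + b[1]});
        }
        return merged;
    }
}
```

The output of `stageOne` is exactly the kind of intermediate result that could be stored and reused by other reports.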
Figure 7.11. The reduction step is a merge of incomplete records.
Decomposing this report into multiple map-reduce steps makes it easier to write. Like many transformation examples, once you’ve found a transformation framework that makes it easy to compose steps, it’s usually easier to compose many small steps together than try to cram heaps of logic into a single step.
Another advantage is that the intermediate output may be useful for different outputs too, so you can get some reuse. This reuse is important as it saves time both in programming and in execution. The intermediate records can be saved in the data store, forming a materialized view (“Materialized Views,” p. 30). Early stages of map-reduce operations are particularly valuable to save since they often represent the heaviest amount of data access, so building them once as a basis for many downstream uses saves a lot of work. As with any reuse activity, however, it’s important to build them out of experience with real queries, as speculative reuse rarely fulfills its promise. So it’s important to look at the forms of various queries as they are built and factor out the common parts of the calculations into materialized views.
Map-reduce is a pattern that can be implemented in any programming language. However, the constraints of the style make it a good fit for languages specifically designed for map-reduce computations. Apache Pig [Pig], an offshoot of the Hadoop [Hadoop] project, is a language specifically built to make it easy to write map-reduce programs. It certainly makes it much easier to work with Hadoop than the underlying Java libraries. In a similar vein, if you want to specify map-reduce programs using an SQL-like syntax, there is Hive [Hive], another Hadoop offshoot.
The map-reduce pattern is important to know about even outside of the context of NoSQL databases. Google’s original map-reduce system operated on files stored on a distributed file system, an approach that’s used by the open-source Hadoop project. While it takes some thought to get used to the constraints of structuring computations in map-reduce steps, the result is a calculation that is inherently well-suited to running on a cluster. When dealing with high volumes of data, you need to take a cluster-oriented approach. Aggregate-oriented databases fit well with this style of calculation. We think that in the next few years many more organizations will be processing the volumes of data that demand a cluster-oriented solution, and the map-reduce pattern will see more and more use.
7.3.2. Incremental Map-Reduce
The examples we’ve discussed so far are complete map-reduce computations, where we start with raw inputs and create a final output. Many map-reduce computations take a while to perform, even with clustered hardware, and new data keeps coming in, which means we need to rerun the computation to keep the output up to date. Starting from scratch each time can take too long, so often it’s useful to structure a map-reduce computation to allow incremental updates, so that only the minimum computation needs to be done.
The map stages of a map-reduce are easy to handle incrementally: only if the input data changes does the mapper need to be rerun. Since maps are isolated from each other, incremental updates are straightforward.
The more complex case is the reduce step, since it pulls together the outputs from many maps and any change in the map outputs could trigger a new reduction. This recomputation can be lessened depending on how parallel the reduce step is. If we are partitioning the data for reduction, then any partition that’s unchanged does not need to be re-reduced. Similarly, if there’s a combiner step, it doesn’t need to be rerun if its source data hasn’t changed.
If our reducer is combinable, there are some more opportunities for computation avoidance. If the changes are additive (that is, if we are only adding new records but are not changing or deleting any old records), then we can just run the reduce with the existing result and the new additions. If there are destructive changes, that is updates and deletes, then we can avoid some recomputation by breaking up the reduce operation into steps and only recalculating those steps whose inputs have changed; essentially, using a Dependency Network [Fowler DSL] to organize the computation.
The map-reduce framework controls much of this, so you have to understand how a specific framework supports incremental operation.
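For the additive case, a minimal sketch might look like the following, assuming a combinable sum reducer and a stored result keyed by product. The names are hypothetical, and a real framework would decide when and where such an update runs.

```java
import java.util.HashMap;
import java.util.Map;

final class IncrementalReduceSketch {
    // Folds the reduced output of newly added records into a stored result
    // without re-reading the old input. This is valid only for additive
    // changes and a combinable reducer (here: a sum per key).
    static Map<String, Integer> update(Map<String, Integer> existing,
                                       Map<String, Integer> additions) {
        Map<String, Integer> result = new HashMap<>(existing);
        additions.forEach((key, value) -> result.merge(key, value, Integer::sum));
        return result;
    }
}
```

Updates and deletes cannot be handled this way, because the stored totals no longer say which inputs contributed to them.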
7.4. Further Reading
If you’re going to use map-reduce calculations, your first port of call will be the documentation for the particular database you are using. Each database has its own approach, vocabulary, and quirks, and that’s what you’ll need to be familiar with. Beyond that, there is a need to capture more general information on how to structure map-reduce jobs to maximize maintainability and performance. We don’t have any specific books to point to yet, but we suspect that a good though easily overlooked source are books on Hadoop. Although Hadoop is not a database, it’s a tool that uses map-reduce heavily, so writing an effective map-reduce task with Hadoop is likely to be useful in other contexts (subject to the changes in detail between Hadoop and whatever systems you’re using).
7.5. Key Points
• Map-reduce is a pattern to allow computations to be parallelized over a cluster.
• The map task reads data from an aggregate and boils it down to relevant key-value pairs. Maps only read a single record at a time and can thus be parallelized and run on the node that stores the record.
• Reduce tasks take many values for a single key output from map tasks and summarize them into a single output. Each reducer operates on the result of a single key, so it can be parallelized by key.
• Reducers that have the same form for input and output can be combined into pipelines. This improves parallelism and reduces the amount of data to be transferred.
• Map-reduce operations can be composed into pipelines where the output of one reduce is the input to another operation’s map.
• If the result of a map-reduce computation is widely used, it can be stored as a materialized view.
• Materialized views can be updated through incremental map-reduce operations that only compute changes to the view instead of recomputing everything from scratch.
Part II: Implement
Chapter 8. Key-Value Databases
A key-value store is a simple hash table, primarily used when all access to the database is via primary key. Think of a table in a traditional RDBMS with two columns, such as ID and NAME, the ID column being the key and NAME column storing the value. In an RDBMS, the NAME column is restricted to storing data of type String. The application can provide an ID and VALUE and persist the pair; if the ID already exists the current value is overwritten, otherwise a new entry is created. Let’s look at how terminology compares in Oracle and Riak.
8.1. What Is a Key-Value Store
Key-value stores are the simplest NoSQL data stores to use from an API perspective. The client can either get the value for the key, put a value for a key, or delete a key from the data store. The value is a blob that the data store just stores, without caring or knowing what’s inside; it’s the responsibility of the application to understand what was stored. Since key-value stores always use primary-key access, they generally have great performance and can be easily scaled.
Some of the popular key-value databases are Riak [Riak], Redis (often referred to as Data Structure server) [Redis], Memcached DB and its flavors [Memcached], Berkeley DB [Berkeley DB], HamsterDB (especially suited for embedded use) [HamsterDB], Amazon DynamoDB [Amazon’s Dynamo] (not open-source), and Project Voldemort [Project Voldemort] (an open-source implementation of Amazon DynamoDB).
In some key-value stores, such as Redis, the aggregate being stored does not have to be a domain object; it could be any data structure. Redis supports storing lists, sets, hashes and can do range, diff, union, and intersection operations. These features allow Redis to be used in more different ways than a standard key-value store.
There are many more key-value databases and many new ones are being worked on at this time.
For the sake of keeping discussions in this book easier we will focus mostly on Riak. Riak lets us store keys into buckets, which are just a way to segment the keys; think of buckets as flat namespaces for the keys.
If we wanted to store user session data, shopping cart information, and user preferences in Riak, we could just store all of them in the same bucket with a single key and single value for all of these objects. In this scenario, we would have a single object that stores all the data and is put into a single bucket (Figure 8.1).
Figure 8.1. Storing all the data in a single bucket
The downside of storing all the different objects (aggregates) in the single bucket would be that one bucket would store different types of aggregates, increasing the chance of key conflicts. An alternate approach would be to append the name of the object to the key, such as 288790b8a421_userProfile, so that we can get to individual objects as they are needed (Figure 8.2).
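The key-naming idea can be tried out against an in-memory stand-in for a bucket. This sketch does not use the Riak client; `BucketSketch`, `keyFor`, and the get/put/delete methods are hypothetical names that only mirror the three operations a key-value store offers.

```java
import java.util.HashMap;
import java.util.Map;

final class BucketSketch {
    // In-memory stand-in for one bucket: opaque byte[] values, looked up by key only.
    private final Map<String, byte[]> entries = new HashMap<>();

    // Appends the object name to the session ID, e.g. "288790b8a421_userProfile".
    static String keyFor(String sessionId, String objectName) {
        return sessionId + "_" + objectName;
    }

    // Stores the value, overwriting any existing value for the key.
    void put(String key, byte[] value) {
        entries.put(key, value);
    }

    // Returns the stored value, or null if the key is absent.
    byte[] get(String key) {
        return entries.get(key);
    }

    void delete(String key) {
        entries.remove(key);
    }
}
```

With this naming scheme the user profile and the shopping cart for the same session live under different keys, so each can be read or replaced on its own.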
Figure 8.2. Change the key design to segment the data in a single bucket.
We could also create buckets which store specific data. In Riak, they are known as domain buckets, allowing the serialization and deserialization to be handled by the client driver.

    Bucket bucket = client.fetchBucket(bucketName).execute();
    DomainBucket<UserProfile> profileBucket =
        DomainBucket.builder(bucket, UserProfile.class).build();

Using domain buckets or different buckets for different objects (such as UserProfile and ShoppingCart) segments the data across different buckets, allowing you to read only the object you need without having to change key design.
Key-value stores such as Redis also support storing random data structures, which can be sets, hashes, strings, and so on. This feature can be used to store lists of things, like states or addressTypes, or an array of user’s visits.
8.2. Key-Value Store Features
While using any NoSQL data stores, there is an inevitable need to understand how the features compare to the standard RDBMS data stores that we are so used to. The primary reason is to understand what features are missing and how the application architecture needs to change to better use the features of a key-value data store. Some of the features we will discuss for all the NoSQL data stores are consistency, transactions, query features, structure of the data, and scaling.
8.2.1. Consistency
Consistency is applicable only for operations on a single key, since these operations are either a get, put, or delete on a single key. Optimistic writes can be performed, but are very expensive to implement, because a change in value cannot be determined by the data store.
In distributed key-value store implementations like Riak, the eventually consistent (p. 50) model of consistency is implemented. Since the value may have already been replicated to other nodes, Riak has two ways of resolving update conflicts: either the newest write wins and older writes lose, or both (all) values are returned allowing the client to resolve the conflict.
In Riak, these options can be set up during the bucket creation. Buckets are just a way to namespace keys so that key collisions can be reduced; for example, all customer keys may reside in the customer bucket. When creating a bucket, default values for consistency can be provided, for example that a write is considered good only when the data is consistent across all the nodes where the data is stored.

    Bucket bucket = connection
        .createBucket(bucketName)
        .withRetrier(attempts(3))
        .allowSiblings(siblingsAllowed)
        .nVal(numberOfReplicasOfTheData)
        .w(numberOfNodesToRespondToWrite)
        .r(numberOfNodesToRespondToRead)
        .execute();

If we need data in every node to be consistent, we can increase the numberOfNodesToRespondToWrite set by w to be the same as nVal. Of course doing that will decrease the write performance of the cluster. To improve on write or read conflicts, we can change the allowSiblings flag during bucket creation: If it is set to false, we let the last write win and do not create siblings.
8.2.2. Transactions
Different products of the key-value store kind have different specifications of transactions. Generally speaking, there are no guarantees on the writes. Many data stores do implement transactions in different ways. Riak uses the concept of quorum (“Quorums,” p. 57), implemented by supplying the W value (the number of replicas that must acknowledge the write) during the write API call.
Assume we have a Riak cluster with a replication factor of 5 and we supply the W value of 3. When writing, the write is reported as successful only when it is written and reported as a success on at least three of the nodes. This allows Riak to have write tolerance; in our example, with N equal to 5 and with a W value of 3, the cluster can tolerate N - W = 2 nodes being down for write operations, though we would still have lost some data on those nodes for read.
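The arithmetic in this example is small enough to write down directly. The sketch below is plain Java rather than Riak client code, and the class and method names are hypothetical.

```java
final class WriteQuorum {
    // Number of replicas that may be down while writes can still succeed: N - W.
    static int toleratedFailures(int n, int w) {
        if (w < 1 || w > n) {
            throw new IllegalArgumentException("need 1 <= W <= N");
        }
        return n - w;
    }

    // A write is reported as successful once at least W replicas acknowledge it.
    static boolean writeSucceeds(int acknowledgements, int w) {
        return acknowledgements >= w;
    }
}
```

With N = 5 and W = 3, `toleratedFailures(5, 3)` is 2: a write still succeeds with two replicas down, but fails once a third replica is unreachable.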
8.2.3. Query Features
All key-value stores can query by the key, and that’s about it. If you have requirements to query by using some attribute of the value column, it’s not possible to use the database: Your application needs to read the value to figure out if the attribute meets the conditions.
Query by key also has an interesting side effect. What if we don’t know the key, especially during ad-hoc querying during debugging? Most of the data stores will not give you a list of all the primary keys; even if they did, retrieving lists of keys and then querying for the value would be very cumbersome. Some key-value databases get around this by providing the ability to search inside the value, such as Riak Search that allows you to query the data just like you would query it using Lucene indexes.
While using key-value stores, lots of thought has to be given to the design of the key. Can the key be generated using some algorithm? Can the key be provided by the user (user ID, email, etc.)? Or derived from timestamps or other data that can be derived outside of the database?
These query characteristics make key-value stores likely candidates for storing session data (with the session ID as the key), shopping cart data, user profiles, and so on. The expiry_secs property can be used to expire keys after a certain time interval, especially for session/shopping cart objects.

    Bucket bucket = getBucket(bucketName);
    IRiakObject riakObject = bucket.store(key, value).execute();

When writing to the Riak bucket using the store API, the object is stored for the key provided. Similarly, we can get the value stored for the key using the fetch API.

    Bucket bucket = getBucket(bucketName);
    IRiakObject riakObject = bucket.fetch(key).execute();
    byte[] bytes = riakObject.getValue();
    String value = new String(bytes);
Riak provides an HTTP-based interface, so that all operations can be performed from the web browser or on the command line using curl. Let’s save this data to Riak:

    {
      "lastVisit": 1324669989288,
      "user": {
        "customerId": "91cfdf5bcb7c",
        "name": "buyer",
        "countryCode": "US",
        "tzOffset": 0
      }
    }

Use the curl command to POST the data, storing the data in the session bucket with the key of a7e618d9db25 (we have to provide this key):

    curl -v -X POST \
      -d '{"lastVisit":1324669989288,"user":{"customerId":"91cfdf5bcb7c","name":"buyer","countryCode":"US","tzOffset":0}}' \
      -H "Content-Type: application/json" \
      http://localhost:8098/buckets/session/keys/a7e618d9db25

The data for the key a7e618d9db25 can be fetched by using the curl command:

    curl -i http://localhost:8098/buckets/session/keys/a7e618d9db25
8.2.4. Structure of Data
Key-value databases don’t care what is stored in the value part of the key-value pair. The value can be a blob, text, JSON, XML, and so on. In Riak, we can use the Content-Type in the POST request to specify the data type.
8.2.5. Scaling
Many key-value stores scale by using sharding (“Sharding,” p. 38). With sharding, the value of the key determines on which node the key is stored. Let’s assume we are sharding by the first character of the key; if the key is f4b19d79587d, which starts with an f, it will be sent to a different node than the key ad9c7a396542. This kind of sharding setup can increase performance as more nodes are added to the cluster.
Sharding also introduces some problems. If the node used to store f goes down, the data stored on that node becomes unavailable, nor can new data be written with keys that start with f.
Data stores such as Riak allow you to control the aspects of the CAP Theorem (“The CAP Theorem,” p. 53): N (number of nodes to store the key-value replicas), R (number of nodes that have to have the data being fetched before the read is considered successful), and W (the number of nodes the write has to be written to before it is considered successful).
Let’s assume we have a 5-node Riak cluster. Setting N to 3 means that all data is replicated to at least three nodes, setting R to 2 means any two nodes must reply to a GET request for it to be considered successful, and setting W to 2 ensures that the PUT request is written to two nodes before the write is considered successful.
These settings allow us to fine-tune node failures for read or write operations. Based on our need, we can change these values for better read availability or write availability. Generally speaking, choose a W value to match your consistency needs; these values can be set as defaults during bucket creation.
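To see what a given choice of N, R, and W buys, it can help to compute the tolerances explicitly. The sketch below is an illustration with hypothetical names, not Riak configuration code; the overlap check uses the usual quorum rule that a read is guaranteed to reach at least one replica holding the latest acknowledged write when R + W > N.

```java
final class QuorumSettings {
    final int n;
    final int r;
    final int w;

    QuorumSettings(int n, int r, int w) {
        if (r < 1 || w < 1 || r > n || w > n) {
            throw new IllegalArgumentException("need 1 <= R <= N and 1 <= W <= N");
        }
        this.n = n;
        this.r = r;
        this.w = w;
    }

    // Replicas that may be unavailable while reads still succeed.
    int readFailuresTolerated() {
        return n - r;
    }

    // Replicas that may be unavailable while writes still succeed.
    int writeFailuresTolerated() {
        return n - w;
    }

    // True when every read quorum overlaps every write quorum (R + W > N).
    boolean readsOverlapWrites() {
        return r + w > n;
    }
}
```

For the settings in the text (N = 3, R = 2, W = 2), one replica can be down for reads and one for writes, and since 2 + 2 > 3 the read and write quorums overlap. Dropping to R = 1 and W = 1 would tolerate two failures each, but the quorums would no longer overlap.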
8.3. Suitable Use Cases
Let’s discuss some of the problems where key-value stores are a good fit.
8.3.1. Storing Session Information
Generally, every web session is unique and is assigned a unique sessionid value. Applications that store the sessionid on disk or in an RDBMS will greatly benefit from moving to a key-value store, since everything about the session can be stored by a single PUT request or retrieved using GET. This single-request operation makes it very fast, as everything about the session is stored in a single object. Solutions such as Memcached are used by many web applications, and Riak can be used when availability is important.
8.3.2. User Profiles, Preferences
Almost every user has a unique userId, username, or some other attribute, as well as preferences such as language, color, timezone, which products the user has access to, and so on. This can all be put into an object, so getting preferences of a user takes a single GET operation. Similarly, product profiles can be stored.
8.3.3. Shopping Cart Data
E-commerce websites have shopping carts tied to the user. As we want the shopping carts to be available all the time, across browsers, machines, and sessions, all the shopping information can be put into the value where the key is the userid. A Riak cluster would be best suited for these kinds of applications.
8.4. When Not to Use
There are problem spaces where key-value stores are not the best solution.
8.4.1. Relationships among Data
If you need to have relationships between different sets of data, or correlate the data between different sets of keys, key-value stores are not the best solution to use, even though some key-value stores provide link-walking features.
8.4.2. Multioperation Transactions
If you’re saving multiple keys and there is a failure to save any one of them, and you want to revert or roll back the rest of the operations, key-value stores are not the best solution to be used.
8.4.3. Query by Data
If you need to search the keys based on something found in the value part of the key-value pairs, then key-value stores are not going to perform well for you. There is no way to inspect the value on the database side, with the exception of some products like Riak Search or indexing engines like Lucene [Lucene] or Solr [Solr].
8.4.4. Operations by Sets
Since operations are limited to one key at a time, there is no way to operate upon multiple keys at the same time. If you need to operate upon multiple keys, you have to handle this from the client side.
Chapter 9. Document Databases
Documents are the main concept in document databases. The database stores and retrieves documents, which can be XML, JSON, BSON, and so on. These documents are self-describing, hierarchical tree data structures which can consist of maps, collections, and scalar values. The documents stored are similar to each other but do not have to be exactly the same. Document databases store documents in the value part of the key-value store; think about document databases as key-value stores where the value is examinable. Let’s look at how terminology compares in Oracle and MongoDB.
The _id is a special field that is found on all documents in Mongo, just like ROWID in Oracle. In MongoDB, _id can be assigned by the user, as long as it is unique.
9.1. What Is a Document Database?

    {
      "firstname": "Martin",
      "likes": ["Biking", "Photography"],
      "lastcity": "Boston",
      "lastVisited":
    }

The above document can be considered a row in a traditional RDBMS. Let’s look at another document:

    {
      "firstname": "Pramod",
      "citiesvisited": ["Chicago", "London", "Pune", "Bangalore"],
      "addresses": [
        {"state": "AK", "city": "DILLINGHAM", "type": "R"},
        {"state": "MH", "city": "PUNE", "type": "R"}
      ],
      "lastcity": "Chicago"
    }

Looking at the documents, we can see that they are similar, but have differences in attribute names. This is allowed in document databases. The schema of the data can differ across documents, but these documents can still belong to the same collection, unlike an RDBMS where every row in a table has to follow the same schema. We represent a list of cities visited as an array, or a list of addresses as a list of documents embedded inside the main document. Embedding child documents as subobjects inside documents provides for easy access and better performance.
If you look at the documents, you will see that some of the attributes are similar, such as firstname or city. At the same time, there are attributes in the second document which do not exist in the first document, such as addresses, while likes is in the first document but not the second.
This different representation of data is not the same as in RDBMS where every column has to be defined, and if it does not have data it is marked as empty or set to null. In documents, there are no empty attributes; if a given attribute is not found, we assume that it was not set or not relevant to the document. Documents allow for new attributes to be created without the need to define them or to change the existing documents.
Some of the popular document databases we have seen are MongoDB [MongoDB], CouchDB [CouchDB], Terrastore [Terrastore], OrientDB [OrientDB], RavenDB [RavenDB], and of course the well-known and often reviled Lotus Notes [Notes Storage Facility] that uses document storage.
9.2. Features
While there are many specialized document databases, we will use MongoDB as a representative of the feature set. Keep in mind that each product has some features that may not be found in other document databases.
Let’s take some time to understand how MongoDB works. Each MongoDB instance has multiple databases, and each database can have multiple collections. When we compare this with RDBMS, an RDBMS instance is the same as a MongoDB instance, the schemas in RDBMS are similar to MongoDB databases, and the RDBMS tables are collections in MongoDB. When we store a document, we have to choose which database and collection this document belongs in; for example, database.collection.insert(document), which is usually represented as db.coll.insert(document).
9.2.1. Consistency
Consistency in a MongoDB database is configured by using the replica sets and choosing to wait for the writes to be replicated to all the slaves or a given number of slaves. Every write can specify the number of servers the write has to be propagated to before it returns as successful.
A command like db.runCommand({ getlasterror : 1 , w : "majority" }) tells the database how strong a consistency you want. For example, if you have one server and specify the w as majority, the write will return immediately since there is only one node. If you have three nodes in the replica set and specify w as majority, the write will have to complete at a minimum of two nodes before it is reported as a success. You can increase the w value for stronger consistency but you will suffer on write performance, since now the writes have to complete at more nodes. Replica sets also allow you to increase the read performance by allowing reading from slaves by setting slaveOk; this parameter can be set on the connection, or database, or collection, or individually for each operation.

    Mongo mongo = new Mongo("localhost:27017");
    mongo.slaveOk();

Here we are setting slaveOk per operation, so that we can decide which operations can work with data from the slave node.

    DBCollection collection = getOrderCollection();
    BasicDBObject query = new BasicDBObject();
    query.put("name", "Martin");
    DBCursor cursor = collection.find(query).slaveOk();

Similar to various options available for read, you can change the settings to achieve strong write consistency, if desired. By default, a write is reported successful once the database receives it; you can change this so as to wait for the writes to be synced to disk or to propagate to two or more slaves. This is known as WriteConcern: You make sure that certain writes are written to the master and some slaves by setting WriteConcern to REPLICAS_SAFE. Shown below is code where we are setting the WriteConcern for all writes to a collection:

    DBCollection shopping = database.getCollection("shopping");
    shopping.setWriteConcern(REPLICAS_SAFE);

WriteConcern can also be set per operation by specifying it on the write command:

    WriteResult result = shopping.insert(order, REPLICAS_SAFE);

There is a tradeoff that you need to carefully think about, based on your application needs and business requirements, to decide what settings make sense for slaveOk during read or what safety level you desire during write with WriteConcern.
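The node counts quoted for w set to majority follow from the ordinary definition of a majority: more than half of the members. The helper below is a hypothetical illustration of that arithmetic only; it is not part of the MongoDB driver.

```java
final class Majority {
    // Smallest number of members that is more than half of the set.
    static int of(int members) {
        if (members < 1) {
            throw new IllegalArgumentException("a replica set needs at least one member");
        }
        return members / 2 + 1;
    }
}
```

This gives 1 for a single server and 2 for a three-node replica set, matching the examples above; a five-node set would need 3 acknowledgements.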
9.2.2. Transactions
Transactions, in the traditional RDBMS sense, mean that you can start modifying the database with insert, update, or delete commands over different tables and then decide if you want to keep the changes or not by using commit or rollback. These constructs are generally not available in NoSQL solutions; a write either succeeds or fails. Transactions at the single-document level are known as atomic transactions. Transactions involving more than one operation are not possible, although there are products such as RavenDB that do support transactions across multiple operations.
By default, all writes are reported as successful. A finer control over the write can be achieved by using the WriteConcern parameter. We ensure that order is written to more than one node before it’s reported successful by using WriteConcern.REPLICAS_SAFE. Different levels of WriteConcern let you choose the safety level during writes; for example, when writing log entries, you can use the lowest level of safety, WriteConcern.NONE.

    final Mongo mongo = new Mongo(mongoURI);
    mongo.setWriteConcern(REPLICAS_SAFE);
    DBCollection shopping = mongo.getDB(orderDatabase)
                                 .getCollection(shoppingCollection);
    try {
        WriteResult result = shopping.insert(order, REPLICAS_SAFE);
        // Writes made it to primary and at least one secondary
    } catch (MongoException writeException) {
        // Writes did not make it to minimum of two nodes including primary
        dealWithWriteFailure(order, writeException);
    }
9.2.3. Availability

The CAP theorem (“The CAP Theorem,” p. 53) dictates that we can have only two of Consistency, Availability, and Partition Tolerance. Document databases try to improve on availability by replicating data using the master-slave setup. The same data is available on multiple nodes and the clients can get to the data even when the primary node is down. Usually, the application code does not have to determine if the primary node is available or not. MongoDB implements replication, providing high availability using replica sets.

In a replica set, there are two or more nodes participating in an asynchronous master-slave replication. The replica-set nodes elect the master, or primary, among themselves. Assuming all the nodes have equal voting rights, some nodes can be favored for being closer to the other servers, for having more RAM, and so on; users can affect this by assigning a priority (a number between 0 and 1000) to a node.

All requests go to the master node, and the data is replicated to the slave nodes. If the master node goes down, the remaining nodes in the replica set vote among themselves to elect a new master; all future requests are routed to the new master, and the slave nodes start getting data from the new master. When the node that failed comes back online, it joins in as a slave and catches up with the rest of the nodes by pulling all the data it needs to get current.

Figure 9.1 is an example configuration of replica sets. We have two nodes, mongoA and mongoB, running the MongoDB database in the primary datacenter, and mongoC in the secondary datacenter. If we want nodes in the primary datacenter to be elected as primary nodes, we can assign them a higher priority than the other nodes. More nodes can be added to the replica sets without having to take them offline.

Figure 9.1. Replica set configuration with higher priority assigned to nodes in the same datacenter

The application writes or reads from the primary (master) node. When a connection is established, the application only needs to connect to one node (primary or not, does not matter) in the replica set, and the rest of the nodes are discovered automatically. When the primary node goes down, the driver talks to the new primary elected by the replica set. The application does not have to manage any of the communication failures or node selection criteria. Using replica sets gives you the ability to have a highly available document data store.

Replica sets are generally used for data redundancy, automated failover, read scaling, server maintenance without downtime, and disaster recovery. Similar availability setups can be achieved with CouchDB, RavenDB, Terrastore, and other products.
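The effect of priorities on failover can be pictured with a small model. The sketch below is not MongoDB’s election protocol (which also involves voting among the members); it only assumes that the reachable member with the highest non-zero priority ends up as primary. The class and method names (ReplicaSetModel, Member, electPrimary) and the priority values are invented for illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Toy model of priority-based primary selection; NOT MongoDB's real election. */
final class ReplicaSetModel {

    /** One replica-set member: a name, a priority, and whether it is reachable. */
    static final class Member {
        final String name;
        final int priority; // higher is preferred; 0 is treated as "never primary"
        final boolean up;

        Member(String name, int priority, boolean up) {
            this.name = name;
            this.priority = priority;
            this.up = up;
        }
    }

    /** Picks the reachable member with the highest non-zero priority, if any. */
    static Optional<String> electPrimary(List<Member> members) {
        return members.stream()
                .filter(m -> m.up && m.priority > 0)
                .max(Comparator.comparingInt((Member m) -> m.priority))
                .map(m -> m.name);
    }
}
```

With mongoA and mongoB given higher priorities than mongoC, the model keeps the primary in the primary datacenter for as long as one of those two nodes is reachable, and falls back to mongoC only when both are down.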
9.2.4. Query Features

Document databases provide different query features. CouchDB allows you to query via views: complex queries on documents which can be either materialized (“Materialized Views,” p. 30) or dynamic (think of them as RDBMS views which are either materialized or not). With CouchDB, if you need to aggregate the number of reviews for a product as well as the average rating, you could add a view implemented via map-reduce (“Basic Map-Reduce,” p. 68) to return the count of reviews and the average of their ratings.

When there are many requests, you don’t want to compute the count and average for every request; instead you can add a materialized view that precomputes the values and stores the results in the database. These materialized views are updated when queried, if any data was changed since the last update.

One of the good features of document databases, as compared to key-value stores, is that we can query the data inside the document without having to retrieve the whole document by its key and then introspect the document. This feature brings these databases closer to the RDBMS query model.

MongoDB has a query language which is expressed via JSON and has constructs such as $query for the where clause, $orderby for sorting the data, or $explain to show the execution plan of the query. There are many more constructs like these that can be combined to create a MongoDB query.

Let’s look at certain queries that we can do against MongoDB. Suppose we want to return all the documents in an order collection (all rows in the order table). The SQL for this would be:

    SELECT * FROM order

The equivalent query in Mongo shell would be:

    db.order.find()
Selecting the orders for a single customerId of 883c2c5b4e5b would be:

    SELECT * FROM order WHERE customerId = "883c2c5b4e5b"

The equivalent query in Mongo to get all orders for a single customerId of 883c2c5b4e5b:

    db.order.find({"customerId":"883c2c5b4e5b"})

Similarly, selecting orderId and orderDate for one customer in SQL would be:

    SELECT orderId, orderDate FROM order WHERE customerId = "883c2c5b4e5b"

and the equivalent in Mongo would be:

    db.order.find({customerId:"883c2c5b4e5b"},{orderId:1,orderDate:1})
Similarly, queries to count, sum, and so on are all available. Since the documents are aggregated objects, it is really easy to query for documents that have to be matched using the fields with child objects. Let’s say we want to query for all the orders where one of the items ordered has a name like Refactoring. The SQL for this requirement would be:

    SELECT * FROM customerOrder, orderItem, product
    WHERE customerOrder.orderId = orderItem.customerOrderId
      AND orderItem.productId = product.productId
      AND product.name LIKE '%Refactoring%'

and the equivalent Mongo query would be:

    db.orders.find({"items.product.name":/Refactoring/})

The query for MongoDB is simpler because the objects are embedded inside a single document and you can query based on the embedded child documents.
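To make the idea of matching on embedded child documents concrete, here is a minimal sketch of how a dotted path such as items.product.name could be resolved against a nested document. It is an illustration only, not MongoDB’s query engine: documents are modeled as plain Java maps and lists, and the names DottedPathQuery, collect, matches, and sampleOrder are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Toy dotted-path matcher over nested maps and lists; NOT the MongoDB query engine. */
final class DottedPathQuery {

    /** Collects every value reachable via the remaining path segments, fanning out through lists. */
    static void collect(Object node, String[] path, int depth, List<Object> out) {
        if (node instanceof List) {
            for (Object element : (List<?>) node) {
                collect(element, path, depth, out);
            }
        } else if (depth == path.length) {
            out.add(node);
        } else if (node instanceof Map) {
            Object child = ((Map<?, ?>) node).get(path[depth]);
            if (child != null) {
                collect(child, path, depth + 1, out);
            }
        }
    }

    /** True if any value found at the dotted path contains a match for the pattern. */
    static boolean matches(Map<String, Object> doc, String dottedPath, Pattern pattern) {
        List<Object> values = new ArrayList<>();
        collect(doc, dottedPath.split("\\."), 0, values);
        for (Object value : values) {
            if (pattern.matcher(String.valueOf(value)).find()) {
                return true;
            }
        }
        return false;
    }

    /** Invented sample order with one embedded item, for demonstration only. */
    static Map<String, Object> sampleOrder() {
        Map<String, Object> product = new HashMap<>();
        product.put("name", "Refactoring");
        Map<String, Object> item = new HashMap<>();
        item.put("product", product);
        Map<String, Object> order = new HashMap<>();
        order.put("orderId", "99");
        order.put("items", Arrays.asList(item));
        return order;
    }
}
```

Because the items, their products, and the product names all live inside the one order document, a single pass over that document answers the question that needed a three-table join in SQL.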
9.2.5. Scaling

The idea of scaling is to add nodes or change data storage without simply migrating the database to a bigger box. We are not talking about making application changes to handle more load; instead, we are interested in what features are in the database so that it can handle more load.

Scaling for heavy-read loads can be achieved by adding more read slaves, so that all the reads can be directed to the slaves. Given a heavy-read application, with our 3-node replica-set cluster, we can add more read capacity to the cluster as the read load increases just by adding more slave nodes to the replica set to execute reads with the slaveOk flag (Figure 9.2). This is horizontal scaling for reads.

Figure 9.2. Adding a new node, mongoD, to an existing replica-set cluster

Once the new node, mongoD, is started, it needs to be added to the replica set.

    rs.add("mongod:27017");
When a new node is added, it will sync up with the existing nodes, join the replica set as a secondary node, and start serving read requests. An advantage of this setup is that we do not have to restart any other nodes, and there is no downtime for the application either.

When we want to scale for write, we can start sharding (“Sharding,” p. 38) the data. Sharding is similar to partitions in RDBMS where we split data by value in a certain column, such as state or year. With RDBMS, partitions are usually on the same node, so the client application does not have to query a specific partition but can keep querying the base table; the RDBMS takes care of finding the right partition for the query and returns the data.

In sharding, the data is also split by a certain field, but then moved to different Mongo nodes. The data is dynamically moved between nodes to ensure that shards are always balanced. We can add more nodes to the cluster and increase the number of writable nodes, enabling horizontal scaling for writes.

    db.runCommand( { shardcollection : "ecommerce.customer",
                     key : {firstname : 1} } )

Splitting the data on the first name of the customer ensures that the data is balanced across the shards for optimal write performance; furthermore, each shard can be a replica set ensuring better read performance within the shard (Figure 9.3). When we add a new shard to this existing sharded cluster, the data will now be balanced across four shards instead of three. As all this data movement and infrastructure refactoring is happening, the application will not experience any downtime, although the cluster may not perform optimally when large amounts of data are being moved to rebalance the shards.
Figure 9.3. MongoDB sharded setup where each shard is a replica set

The shard key plays an important role. You may want to place your MongoDB database shards closer to their users, so sharding based on user location may be a good idea. When sharding by customer location, all user data for the East Coast of the USA is in the shards that are served from the East Coast, and all user data for the West Coast is in the shards that are on the West Coast.
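How a shard key decides where a document lives can be sketched as a lookup from key ranges to shards. The model below is a simplification under stated assumptions: ranges are identified by an inclusive lower bound on a lowercase string key, and nothing is rebalanced automatically. ShardRouter, addRange, and shardFor are hypothetical names, not part of any MongoDB API.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy range-based shard router; real clusters also split and migrate ranges on their own. */
final class ShardRouter {
    // inclusive lower bound of a key range -> name of the shard that owns the range
    private final TreeMap<String, String> lowerBoundToShard = new TreeMap<>();

    /** Declares that keys from this lower bound up to the next bound belong to the shard. */
    void addRange(String inclusiveLowerBound, String shardName) {
        lowerBoundToShard.put(inclusiveLowerBound, shardName);
    }

    /** Returns the shard whose range contains the given shard-key value. */
    String shardFor(String shardKeyValue) {
        Map.Entry<String, String> entry = lowerBoundToShard.floorEntry(shardKeyValue);
        if (entry == null) {
            throw new IllegalArgumentException("no range covers " + shardKeyValue);
        }
        return entry.getValue();
    }
}
```

With three ranges starting at "", "h", and "q", a firstname of martin routes to the second shard; registering a fourth range starting at "m" moves martin to the new shard while keys below "m" stay where they were, which mirrors the rebalancing from three shards to four described above.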
9.3. Suitable Use Cases

9.3.1. Event Logging

Applications have different event logging needs; within the enterprise, there are many different applications that want to log events. Document databases can store all these different types of events and can act as a central data store for event storage. This is especially true when the type of data being captured by the events keeps changing. Events can be sharded by the name of the application where the event originated or by the type of event such as order_processed or customer_logged.

9.3.2. Content Management Systems, Blogging Platforms

Since document databases have no predefined schemas and usually understand JSON documents, they work well in content management systems or applications for publishing websites, managing user comments, user registrations, profiles, and web-facing documents.

9.3.3. Web Analytics or Real-Time Analytics

Document databases can store data for real-time analytics; since parts of the document can be updated, it’s very easy to store page views or unique visitors, and new metrics can be easily added without schema changes.

9.3.4. E-Commerce Applications

E-commerce applications often need to have a flexible schema for products and orders, as well as the ability to evolve their data models without expensive database refactoring or data migration (“Schema Changes in a NoSQL Data Store,” p. 128).
9.4. When Not to Use

There are problem spaces where document databases are not the best solution.

9.4.1. Complex Transactions Spanning Different Operations

If you need to have atomic cross-document operations, then document databases may not be for you. However, there are some document databases that do support these kinds of operations, such as RavenDB.

9.4.2. Queries against Varying Aggregate Structure

Flexible schema means that the database does not enforce any restrictions on the schema. Data is saved in the form of application entities. If you need to query these entities ad hoc, your queries will be changing (in RDBMS terms, this would mean that as you join criteria between tables, the tables to join keep changing). Since the data is saved as an aggregate, if the design of the aggregate is constantly changing, you need to save the aggregates at the lowest level of granularity; basically, you need to normalize the data. In this scenario, document databases may not work.
Chapter 10. Column-Family Stores

Column-family stores, such as Cassandra [Cassandra], HBase [Hbase], Hypertable [Hypertable], and Amazon SimpleDB [Amazon SimpleDB], allow you to store data with keys mapped to values and the values grouped into multiple column families, each column family being a map of data.

10.1. What Is a Column-Family Data Store?

There are many column-family databases. In this chapter, we will talk about Cassandra but also reference other column-family databases to discuss features that may be of interest in particular scenarios.

Column-family databases store data in column families as rows that have many columns associated with a row key (Figure 10.1). Column families are groups of related data that is often accessed together. For a Customer, we would often access their Profile information at the same time, but not their Orders.

Figure 10.1. Cassandra’s data model with column families

Cassandra is one of the popular column-family databases; there are others, such as HBase, Hypertable, and Amazon DynamoDB [Amazon DynamoDB]. Cassandra can be described as fast and easily scalable with write operations spread across the cluster. The cluster does not have a master node, so any read and write can be handled by any node in the cluster.
10.2. Features

Let’s start by looking at how data is structured in Cassandra. The basic unit of storage in Cassandra is a column. A Cassandra column consists of a name-value pair where the name also behaves as the key. Each of these key-value pairs is a single column and is always stored with a timestamp value. The timestamp is used to expire data, resolve write conflicts, deal with stale data, and do other things. Once the column data is no longer used, the space can be reclaimed later during a compaction phase.

    {
      name: "fullName",
      value: "Martin Fowler",
      timestamp: 12345667890
    }

The column has a key of fullName and the value of Martin Fowler and has a timestamp attached to it. A row is a collection of columns attached or linked to a key; a collection of similar rows makes a column family. When the columns in a column family are simple columns, the column family is known as a standard column family.

    // column family
    {
      // row
      "pramod-sadalage" : {
        firstName: "Pramod",
        lastName: "Sadalage",
        lastVisit: "2012/12/12"
      }
      // row
      "martin-fowler" : {
        firstName: "Martin",
        lastName: "Fowler",
        location: "Boston"
      }
    }
Each column family can be compared to a container of rows in an RDBMS table where the key identifies the row and the row consists of multiple columns. The difference is that various rows do not have to have the same columns, and columns can be added to any row at any time without having to add them to other rows. We have the pramod-sadalage row and the martin-fowler row with different columns; both rows are part of the column family.

When a column consists of a map of columns, then we have a super column. A super column consists of a name and a value which is a map of columns. Think of a super column as a container of columns.

    {
      name: "book:978-0767905923",
      value: {
        author: "Mitch Albom",
        title: "Tuesdays with Morrie",
        isbn: "978-0767905923"
      }
    }

When we use super columns to create a column family, we get a super column family.

    // super column family
    {
      // row
      name: "billing:martin-fowler",
      value: {
        address: {
          name: "address:default",
          value: {
            fullName: "Martin Fowler",
            street: "100 N. Main Street",
            zip: "20145"
          }
        },
        billing: {
          name: "billing:default",
          value: {
            creditcard: "8888-8888-8888-8888",
            expDate: "12/2016"
          }
        }
      }
      // row
      name: "billing:pramod-sadalage",
      value: {
        address: {
          name: "address:default",
          value: {
            fullName: "Pramod Sadalage",
            street: "100 E. State Parkway",
            zip: "54130"
          }
        },
        billing: {
          name: "billing:default",
          value: {
            creditcard: "9999-8888-7777-4444",
            expDate: "01/2016"
          }
        }
      }
    }
Super column families are good to keep related data together, but when some of the columns are not needed most of the time, the columns are still fetched and deserialized by Cassandra, which may not be optimal.

Cassandra puts the standard and super column families into keyspaces. A keyspace is similar to a database in RDBMS where all column families related to the application are stored. Keyspaces have to be created so that column families can be assigned to them:

    create keyspace ecommerce
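The structure described in this section (a column family as a map from row keys to rows, each row being its own map of columns) can be sketched in a few lines. This is only an in-memory picture under simplifying assumptions: timestamps and super columns are left out, and ColumnFamilyModel with its methods is an invented name, not Cassandra’s storage engine or client API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy in-memory picture of a standard column family; NOT Cassandra's storage engine. */
final class ColumnFamilyModel {
    // row key -> (column name -> value); each row keeps its columns sorted by name
    private final Map<String, TreeMap<String, String>> rows = new LinkedHashMap<>();

    /** Adds or overwrites one column in one row; other rows are unaffected. */
    void set(String rowKey, String columnName, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(columnName, value);
    }

    /** Returns the column value, or null if the row or the column does not exist. */
    String get(String rowKey, String columnName) {
        TreeMap<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(columnName);
    }

    /** Lists the column names present in one row, in sorted order. */
    List<String> columnNames(String rowKey) {
        List<String> names = new ArrayList<>();
        TreeMap<String, String> row = rows.get(rowKey);
        if (row != null) {
            names.addAll(row.keySet());
        }
        return names;
    }
}
```

Storing lastVisit only for pramod-sadalage and location only for martin-fowler works without declaring either column up front, which is the difference from an RDBMS table that the text points out.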
10.2.1. Consistency

When a write is received by Cassandra, the data is first recorded in a commit log, then written to an in-memory structure known as memtable. A write operation is considered successful once it’s written to the commit log and the memtable. Writes are batched in memory and periodically written out to structures known as SSTable. SSTables are not written to again after they are flushed; if there are changes to the data, a new SSTable is written. Unused SSTables are reclaimed by compaction.

Let’s look at the read operation to see how consistency settings affect it. If we have a consistency setting of ONE as the default for all read operations, then when a read request is made, Cassandra returns the data from the first replica, even if the data is stale. If the data is stale, subsequent reads will get the latest (newest) data; this process is known as read repair. The low consistency level is good to use when you do not care if you get stale data and/or if you have high read performance requirements.

Similarly, if you are doing writes, Cassandra would write to one node’s commit log and return a response to the client. The consistency of ONE is good if you have very high write performance requirements and also do not mind if some writes are lost, which may happen if the node goes down before the write is replicated to other nodes.

    ConfigurableConsistencyLevel quorum = new ConfigurableConsistencyLevel();
    quorum.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
    quorum.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);
Using the QUORUM consistency setting for both read and write operations ensures that a majority of the nodes respond to the read and the column with the newest timestamp is returned back to the client, while the replicas that do not have the newest data are repaired via the read repair operations. During write operations, the QUORUM consistency setting means that the write has to propagate to the majority of the nodes before it is considered successful and the client is notified.

Using ALL as the consistency level means that all nodes will have to respond to reads or writes, which will make the cluster not tolerant to faults: even when one node is down, the write or read is blocked and reported as a failure. It’s therefore up to the system designers to tune the consistency levels as the application requirements change. Within the same application, there may be different requirements of consistency; they can also change based on each operation, for example showing review comments for a product has different consistency requirements compared to reading the status of the last order placed by the customer.

During keyspace creation, we can configure how many replicas of the data we need to store. This number determines the replication factor of the data. If you have a replication factor of 3, the data is copied onto three nodes. When writing and reading data with Cassandra, if you specify the consistency values of 2, you get that R + W is greater than the replication factor (2 + 2 > 3) which gives you better consistency during writes and reads.

We can run the node repair command for the keyspace and force Cassandra to compare every key it’s responsible for with the rest of the replicas. As this operation is expensive, we can also just repair a specific column family or a list of column families:

    repair ecommerce
    repair ecommerce customerInfo

While a node is down, the data that was supposed to be stored by that node is handed off to other nodes. As the node comes back online, the changes made to the data are handed back to the node. This technique is known as hinted handoff. Hinted handoff allows for faster restore of failed nodes.
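The read path described in this section (the column with the newest timestamp wins, and replicas holding older data are repaired) can be sketched as follows. This is a toy model under simplifying assumptions: each replica holds a single value, the repair happens synchronously inside the read, and the names ReadRepairModel, Replica, and readWithRepair are hypothetical rather than anything from Cassandra or Hector.

```java
import java.util.List;

/** Toy model of "newest timestamp wins" plus read repair; NOT Cassandra's implementation. */
final class ReadRepairModel {

    /** One replica's copy of a single column value and its write timestamp. */
    static final class Replica {
        final String name;
        String value;
        long timestamp;

        Replica(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Returns the newest value among the contacted replicas and overwrites the stale copies. */
    static String readWithRepair(List<Replica> contacted) {
        Replica newest = contacted.get(0);
        for (Replica replica : contacted) {
            if (replica.timestamp > newest.timestamp) {
                newest = replica;
            }
        }
        for (Replica replica : contacted) {
            if (replica.timestamp < newest.timestamp) {
                replica.value = newest.value;          // repair the stale copy
                replica.timestamp = newest.timestamp;
            }
        }
        return newest.value;
    }
}
```

Only the replicas that take part in the read are repaired in this model; a replica that was not contacted keeps its stale value until a later read or an explicit repair reaches it.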
10.2.2. Transactions

Cassandra does not have transactions in the traditional sense, where we could start multiple writes and then decide if we want to commit the changes or not. In Cassandra, a write is atomic at the row level, which means inserting or updating columns for a given row key will be treated as a single write and will either succeed or fail. Writes are first written to commit logs and memtables, and are only considered good when the write to commit log and memtable was successful. If a node goes down, the commit log is used to apply changes to the node, just like the redo log in Oracle.

You can use external transaction libraries, such as ZooKeeper [ZooKeeper], to synchronize your writes and reads. There are also libraries such as Cages [Cages] that allow you to wrap your transactions over ZooKeeper.
10.2.3. Availability

Cassandra is by design highly available, since there is no master in the cluster and every node is a peer in the cluster. The availability of a cluster can be increased by reducing the consistency level of the requests. Availability is governed by the (R + W) > N formula (“Quorums,” p. 57) where W is the minimum number of nodes where the write must be successfully written, R is the minimum number of nodes that must respond successfully to a read, and N is the number of nodes participating in the replication of data. You can tune the availability by changing the R and W values for a fixed value of N.

In a 10-node Cassandra cluster with a replication factor for the keyspace set to 3 (N = 3), if we set R = 2 and W = 2, then we have (2 + 2) > 3. In this scenario, when one node goes down, availability is not affected much, as the data can be retrieved from the other two nodes. If W = 2 and R = 1, when two nodes are down the cluster is not available for write but we can still read. Similarly, if R = 2 and W = 1, we can write but the cluster is not available for read. With the R + W > N equation, you are making conscious decisions about consistency tradeoffs.

You should set up your keyspaces and read/write operations based on your needs: higher availability for write or higher availability for read.
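The arithmetic in this section is easy to check mechanically. The sketch below encodes the two questions the text asks: whether R + W > N holds, and whether an operation that needs a given number of replicas can still proceed when some of the N replicas are down. QuorumMath and its method names are invented for illustration, and "down" here counts only failed replicas of the data in question.

```java
/** Worked form of the R + W > N reasoning; a calculation aid, not a Cassandra API. */
final class QuorumMath {

    /** True when every read quorum must share at least one replica with every write quorum. */
    static boolean overlaps(int r, int w, int n) {
        return r + w > n;
    }

    /** True when enough of the n replicas are still up to satisfy the required count. */
    static boolean available(int required, int n, int down) {
        return n - down >= required;
    }

    public static void main(String[] args) {
        int n = 3;
        System.out.println("R=2, W=2 overlaps: " + overlaps(2, 2, n));
        System.out.println("W=2 with two nodes down, can write: " + available(2, n, 2));
        System.out.println("R=1 with two nodes down, can read: " + available(1, n, 2));
    }
}
```

For N = 3 the model reproduces the cases above: with W = 2 and R = 1, losing two replicas blocks writes but not reads, and with R = 2 and W = 1 the situation is reversed.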
10.2.4. Query Features

When designing the data model in Cassandra, it is advised to make the columns and column families optimized for reading the data, as it does not have a rich query language; as data is inserted in the column families, data in each row is sorted by column names. If we have a column that is retrieved much more often than other columns, it’s better performance-wise to use that value for the row key instead.

10.2.4.1. Basic Queries
Basic queries that can be run using a Cassandra client include GET, SET, and DEL. Before starting to query for data, we have to issue the keyspace command use ecommerce;. This ensures that all of our queries are run against the keyspace that we put our data into. Before starting to use the column family in the keyspace, we have to define the column family.

    CREATE COLUMN FAMILY Customer
    WITH comparator = UTF8Type
    AND key_validation_class=UTF8Type
    AND column_metadata = [
      {column_name: city, validation_class: UTF8Type}
      {column_name: name, validation_class: UTF8Type}
      {column_name: web, validation_class: UTF8Type}
    ];

We have a column family named Customer with name, city, and web columns, and we are inserting data in the column family with a Cassandra client.

    SET Customer['mfowler']['city']='Boston';
    SET Customer['mfowler']['name']='Martin Fowler';
    SET Customer['mfowler']['web']='www.martinfowler.com';

Using the Hector [Hector] Java client, we can insert the same data in the column family.

    ColumnFamilyTemplate<String, String> template =
        cassandra.getColumnFamilyTemplate();
    ColumnFamilyUpdater<String, String> updater =
        template.createUpdater(key);
    for (String name : values.keySet()) {
        updater.setString(name, values.get(name));
    }
    try {
        template.update(updater);
    } catch (HectorException e) {
        handleException(e);
    }
We can read the data back using the GET command. There are multiple ways to get the data; we can get the whole row for a key.

    GET Customer['mfowler'];

We can even get just the column we are interested in from the row.

    GET Customer['mfowler']['web'];

Getting the specific column we need is more efficient, as only the data we care about is returned, which saves lots of data movement, especially when the row has a large number of columns. Updating the data is the same as using the SET command for the column that needs to be set to the new value. Using the DEL command, we can delete either a column or the entire row.

    DEL Customer['mfowler']['city'];
    DEL Customer['mfowler'];
10.2.4.2. Advanced Queries and Indexing

Cassandra allows you to index columns other than the keys for the column family. We can define an index on the city column.

    UPDATE COLUMN FAMILY Customer
    WITH comparator = UTF8Type
    AND column_metadata = [{column_name: city,
                            validation_class: UTF8Type,
                            index_type: KEYS}];

We can now query directly against the indexed column.

    GET Customer WHERE city = 'Boston';

These indexes are implemented as bit-mapped indexes and perform well for low-cardinality column values.

10.2.4.3. Cassandra Query Language (CQL)
Cassandra has a query language that supports SQL-like commands, known as Cassandra Query Language (CQL). We can use the CQL commands to create a column family.

    CREATE COLUMNFAMILY Customer (
      KEY varchar PRIMARY KEY,
      name varchar,
      city varchar,
      web varchar);

We insert the same data using CQL.

    INSERT INTO Customer (KEY,name,city,web)
      VALUES ('mfowler',
              'Martin Fowler',
              'Boston',
              'www.martinfowler.com');

We can read data using the SELECT command. Here we read all the columns:

    SELECT * FROM Customer

Or, we could just SELECT the columns we need.

    SELECT name,web FROM Customer

Indexes on columns are created using the CREATE INDEX command, and can then be used to query the data.

    SELECT name,web FROM Customer WHERE city='Boston'

CQL has many more features for querying data, but it does not have all the features that SQL has. CQL does not allow joins or subqueries, and its where clauses are typically simple.
10.2.5. Scaling

Scaling an existing Cassandra cluster is a matter of adding more nodes. As no single node is a master, when we add nodes to the cluster we are improving the capacity of the cluster to support more writes and reads. This type of horizontal scaling allows you to have maximum uptime, as the cluster keeps serving requests from the clients while new nodes are being added to the cluster.

10.3. Suitable Use Cases

Let’s discuss some of the problems where column-family databases are a good fit.

10.3.1. Event Logging

Column-family databases with their ability to store any data structures are a great choice to store event information, such as application state or errors encountered by the application. Within the enterprise, all applications can write their events to Cassandra with their own columns and the row key of the form appname:timestamp. Since we can scale writes, Cassandra would work ideally for an event logging system (Figure 10.2).

Figure 10.2. Event logging with Cassandra

10.3.2. Content Management Systems, Blogging Platforms

Using column families, you can store blog entries with tags, categories, links, and trackbacks in different columns. Comments can be either stored in the same row or moved to a different keyspace; similarly, blog users and the actual blogs can be put into different column families.
10.3.3. Counters

Often, in web applications you need to count and categorize visitors of a page to calculate analytics. You can use the CounterColumnType during creation of a column family.

    CREATE COLUMN FAMILY visit_counter
    WITH default_validation_class=CounterColumnType
    AND key_validation_class=UTF8Type AND comparator=UTF8Type;

Once a column family is created, you can have arbitrary columns for each page visited within the web application for every user.

    INCR visit_counter['mfowler'][home] BY 1;
    INCR visit_counter['mfowler'][products] BY 1;
    INCR visit_counter['mfowler'][contactus] BY 1;

Incrementing counters using CQL:

    UPDATE visit_counter SET home = home + 1 WHERE KEY='mfowler'

10.3.4. Expiring Usage

You may provide demo access to users, or may want to show ad banners on a website for a specific time. You can do this by using expiring columns: Cassandra allows you to have columns which, after a given time, are deleted automatically. This time is known as TTL (Time To Live) and is defined in seconds. The column is deleted after the TTL has elapsed; when the column does not exist, the access can be revoked or the banner can be removed.

    SET Customer['mfowler']['demo_access'] = 'allowed' WITH ttl=2592000;
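The behavior of an expiring column can be sketched with an explicit clock. The model below is an illustration under stated assumptions, not how Cassandra implements TTLs: the current time is passed in as a parameter so the example is deterministic, expired columns are dropped lazily on read, and ExpiringColumns is a hypothetical class name. The TTL of 2592000 seconds used above corresponds to 30 days (30 × 86,400).

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of columns with a time to live; NOT Cassandra's TTL implementation. */
final class ExpiringColumns {

    /** A stored value together with the second at which it stops being visible. */
    private static final class Entry {
        final String value;
        final long expiresAtSeconds;

        Entry(String value, long expiresAtSeconds) {
            this.value = value;
            this.expiresAtSeconds = expiresAtSeconds;
        }
    }

    private final Map<String, Entry> columns = new HashMap<>();

    /** Stores a column that stays readable for ttlSeconds after nowSeconds. */
    void set(String name, String value, long nowSeconds, long ttlSeconds) {
        columns.put(name, new Entry(value, nowSeconds + ttlSeconds));
    }

    /** Returns the value, or null if the column never existed or its TTL has elapsed. */
    String get(String name, long nowSeconds) {
        Entry entry = columns.get(name);
        if (entry == null) {
            return null;
        }
        if (nowSeconds >= entry.expiresAtSeconds) {
            columns.remove(name); // drop the expired column lazily
            return null;
        }
        return entry.value;
    }
}
```

An application checking demo access would treat a null result as "access revoked", which matches the rule above that the absence of the column is what ends the access.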
10.4. When Not to Use

There are problems for which column-family databases are not the best solutions, such as systems that require ACID transactions for writes and reads. If you need the database to aggregate the data using queries (such as SUM or AVG), you have to do this on the client side using data retrieved by the client from all the rows.

Cassandra is not great for early prototypes or initial tech spikes: During the early stages, we are not sure how the query patterns may change, and as the query patterns change, we have to change the column family design. This causes friction for the product innovation team and slows down developer productivity. RDBMS impose a high cost on schema change, which is traded off for a low cost of query change; in Cassandra, the cost may be higher for query change as compared to schema change.
Chapter 11. Graph Databases

Graph databases allow you to store entities and relationships between these entities. Entities are also known as nodes, which have properties. Think of a node as an instance of an object in the application. Relations are known as edges that can have properties. Edges have directional significance; nodes are organized by relationships which allow you to find interesting patterns between the nodes. The organization of the graph lets the data be stored once and then interpreted in different ways based on relationships.
11.1. What Is a Graph Database?

In the example graph in Figure 11.1, we see a bunch of nodes related to each other. Nodes are entities that have properties, such as name. The node of Martin is actually a node that has a property of name set to Martin.

Figure 11.1. An example graph structure

We also see that edges have types, such as likes, author, and so on. These properties let us organize the nodes; for example, the nodes Martin and Pramod have an edge connecting them with a relationship type of friend. Edges can have multiple properties. We can assign a property of since on the friend relationship type between Martin and Pramod. Relationship types have directional significance; the friend relationship type is bidirectional but likes is not. When Dawn likes NoSQL Distilled, it does not automatically mean NoSQL Distilled likes Dawn.

Once we have a graph of these nodes and edges created, we can query the graph in many ways, such as “get all nodes employed by Big Co that like NoSQL Distilled.” A query on the graph is also known as traversing the graph. An advantage of the graph databases is that we can change the traversing requirements without having to change the nodes or edges. If we want to “get all nodes that like NoSQL Distilled,” we can do so without having to change the existing data or the model of the database, because we can traverse the graph any way we like.

Usually, when we store a graph-like structure in RDBMS, it’s for a single type of relationship (“who is my manager” is a common example). Adding another relationship to the mix usually means a lot of schema changes and data movement, which is not the case when we are using graph databases. Similarly, in relational databases we model the graph beforehand based on the traversal we want; if the traversal changes, the data will have to change.

In graph databases, traversing the joins or relationships is very fast. The relationship between nodes is not calculated at query time but is actually persisted as a relationship. Traversing persisted relationships is faster than calculating them for every query.

Nodes can have different types of relationships between them, allowing you to both represent relationships between the domain entities and to have secondary relationships for things like category, path, time-trees, quad-trees for spatial indexing, or linked lists for sorted access. Since there is no limit to the number and kind of relationships a node can have, they can all be represented in the same graph database.
11.2. Features

There are many graph databases available, such as Neo4J [Neo4J], Infinite Graph [Infinite Graph], OrientDB [OrientDB], or FlockDB [FlockDB] (which is a special case: a graph database that only supports single-depth relationships or adjacency lists, where you cannot traverse more than one level deep for relationships). We will take Neo4J as a representative of the graph database solutions to discuss how they work and how they can be used to solve application problems.

In Neo4J, creating a graph is as simple as creating two nodes and then creating a relationship. Let’s create two nodes, Martin and Pramod:

    Node martin = graphDb.createNode();
    martin.setProperty("name", "Martin");
    Node pramod = graphDb.createNode();
    pramod.setProperty("name", "Pramod");

We have assigned the name property of the two nodes the values of Martin and Pramod. Once we have more than one node, we can create a relationship:

    martin.createRelationshipTo(pramod, FRIEND);
    pramod.createRelationshipTo(martin, FRIEND);

We have to create relationships between the nodes in both directions, for the direction of the relationship matters: For example, a product node can be liked by a user but the product cannot like the user. This directionality helps in designing a rich domain model (Figure 11.2). Nodes know about INCOMING and OUTGOING relationships that are traversable both ways.
Figure 11.2. Relationships with properties

Relationships are first-class citizens in graph databases; most of the value of graph databases is derived from the relationships. Relationships don’t only have a type, a start node, and an end node, but can have properties of their own. Using these properties on the relationships, we can add intelligence to the relationship, for example, since when did they become friends, what is the distance between the nodes, or what aspects are shared between the nodes. These properties on the relationships can be used to query the graph.

Since most of the power from the graph databases comes from the relationships and their properties, a lot of thought and design work is needed to model the relationships in the domain that we are trying to work with. Adding new relationship types is easy; changing existing nodes and their relationships is similar to data migration (“Migrations in Graph Databases,” p. 131), because these changes will have to be done on each node and each relationship in the existing data.
11.2.1. Consistency

Since graph databases are operating on connected nodes, most graph database solutions usually do not support distributing the nodes on different servers. There are some solutions, however, that support node distribution across a cluster of servers, such as Infinite Graph. Within a single server, data is always consistent, especially in Neo4J which is fully ACID-compliant. When running Neo4J in a cluster, a write to the master is eventually synchronized to the slaves, while slaves are always available for read. Writes to slaves are allowed and are immediately synchronized to the master; other slaves will not be synchronized immediately, though: they will have to wait for the data to propagate from the master.

Graph databases ensure consistency through transactions. They do not allow dangling relationships: The start node and end node always have to exist, and nodes can only be deleted if they don’t have any relationships attached to them.
11.2.2. Transactions

Neo4J is ACID-compliant. Before changing any nodes or adding any relationships to existing nodes, we have to start a transaction. Without wrapping operations in transactions, we will get a NotInTransactionException. Read operations can be done without initiating a transaction.

    Transaction transaction = database.beginTx();
    try {
        Node node = database.createNode();
        node.setProperty("name", "NoSQL Distilled");
        node.setProperty("published", "2012");
        transaction.success();
    } finally {
        transaction.finish();
    }

In the above code, we started a transaction on the database, then created a node and set properties on it. We marked the transaction as success and finally completed it by finish. A transaction has to be marked as success, otherwise Neo4J assumes that it was a failure and rolls it back when finish is issued. Setting success without issuing finish also does not commit the data to the database. This way of managing transactions has to be remembered when developing, as it differs from the standard way of doing transactions in an RDBMS.
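The two rules stated above (finish without success rolls back, and success without finish commits nothing) can be captured in a small stand-in model. This sketch does not use or reproduce the Neo4J API; TxModel, Tx, write, and committed are hypothetical names, and "changes" are reduced to plain strings so that the commit rule is the only thing being shown.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the success()/finish() protocol described above; NOT the Neo4J API. */
final class TxModel {
    private final List<String> committed = new ArrayList<>();

    /** One unit of work whose changes stay pending until finish() is called. */
    final class Tx {
        private final List<String> pending = new ArrayList<>();
        private boolean markedSuccessful = false;

        void write(String change) {
            pending.add(change);
        }

        /** Only marks the transaction; nothing is committed yet. */
        void success() {
            markedSuccessful = true;
        }

        /** Commits the pending changes if success() was called, otherwise discards them. */
        void finish() {
            if (markedSuccessful) {
                committed.addAll(pending);
            }
            pending.clear();
        }
    }

    Tx beginTx() {
        return new Tx();
    }

    List<String> committed() {
        return committed;
    }
}
```

Placing finish in a finally block, as the listing above does, guarantees that an exception thrown before success is reached leads to a rollback rather than to a transaction left open.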
11.2.3. Availability

Neo4J, as of version 1.8, achieves high availability by providing for replicated slaves. These slaves can also handle writes: When they are written to, they synchronize the write to the current master, and the write is committed first at the master and then at the slave. Other slaves will eventually get the update. Other graph databases, such as Infinite Graph and FlockDB, provide for distributed storage of the nodes.

Neo4J uses the Apache ZooKeeper [ZooKeeper] to keep track of the last transaction IDs persisted on each slave node and the current master node. Once a server starts up, it communicates with ZooKeeper and finds out which server is the master. If the server is the first one to join the cluster, it becomes the master; when a master goes down, the cluster elects a master from the available nodes, thus providing high availability.
11.2.4. Query Features
Graph databases are supported by query languages such as Gremlin [Gremlin]. Gremlin is a domain-specific language for traversing graphs; it can traverse all graph databases that implement the Blueprints [Blueprints] property graph. Neo4J also has the Cypher [Cypher] query language for querying the graph. Outside these query languages, Neo4J allows you to query the graph for properties of the nodes, traverse the graph, or navigate the nodes’ relationships using language bindings.

Properties of a node can be indexed using the indexing service. Similarly, properties of relationships or edges can be indexed, so a node or edge can be found by the value. Indexes should be queried to find the starting node to begin a traversal. Let’s look at searching for the node using node indexing.

If we have the graph shown in Figure 11.1, we can index the nodes as they are added to the database, or we can index all the nodes later by iterating over them. We first need to create an index for the nodes using the IndexManager.

    Index<Node> nodeIndex = graphDb.index().forNodes("nodes");
We are indexing the nodes for the name property. Neo4J uses Lucene [Lucene] as its indexing service. We will see later that we can also use the full-text search capability of Lucene. When new nodes are created, they can be added to the index.

    Transaction transaction = graphDb.beginTx();
    try {
        Index<Node> nodeIndex = graphDb.index().forNodes("nodes");
        nodeIndex.add(martin, "name", martin.getProperty("name"));
        nodeIndex.add(pramod, "name", pramod.getProperty("name"));
        transaction.success();
    } finally {
        transaction.finish();
    }

Adding nodes to the index is done inside the context of a transaction. Once the nodes are indexed, we can search them using the indexed property. If we search for the node with the name of Barbara, we would query the index for the property of name to have a value of Barbara.

    Node node = nodeIndex.get("name", "Barbara").getSingle();

We get the node whose name is Martin; given the node, we can get all its relationships.

    Node martin = nodeIndex.get("name", "Martin").getSingle();
    allRelationships = martin.getRelationships();

We can get either INCOMING or OUTGOING relationships.

    incomingRelations = martin.getRelationships(Direction.INCOMING);

We can also apply directional filters on the queries when querying for a relationship. With the graph in Figure 11.1, if we want to find all people who like NoSQL Distilled, we can find the NoSQL Distilled node and then get its relationships with Direction.INCOMING. At this point we can also add the type of relationship to the query filter, since we are looking only for nodes that LIKE NoSQL Distilled.

    Node nosqlDistilled = nodeIndex.get("name", "NoSQL Distilled").getSingle();
    relationships = nosqlDistilled.getRelationships(INCOMING, LIKES);
    for (Relationship relationship : relationships) {
        likesNoSQLDistilled.add(relationship.getStartNode());
    }
Finding nodes and their immediate relations is easy, but this can also be achieved in RDBMS databases. Graph databases are really powerful when you want to traverse the graphs at any depth and specify a starting node for the traversal. This is especially useful when you are trying to find nodes that are related to the starting node at more than one level down. As the depth of the graph increases, it makes more sense to traverse the relationships by using a Traverser where you can specify that you are looking for INCOMING, OUTGOING, or BOTH types of relationships. You can also make the traverser go top-down or sideways on the graph by using Order values of BREADTH_FIRST or DEPTH_FIRST. The traversal has to start at some node; in this example, we try to find all the nodes at any depth that are related as a FRIEND with Barbara:

    Node barbara = nodeIndex.get("name", "Barbara").getSingle();
    Traverser friendsTraverser = barbara.traverse(Order.BREADTH_FIRST,
        StopEvaluator.END_OF_GRAPH,
        ReturnableEvaluator.ALL_BUT_START_NODE,
        EdgeType.FRIEND,
        Direction.OUTGOING);
The friendsTraverser provides us a way to find all the nodes that are related to Barbara where the relationship type is FRIEND. The nodes can be at any depth (friend of a friend at any level), allowing you to explore tree structures.

One of the good features of graph databases is finding paths between two nodes: determining if there are multiple paths, finding all of the paths or the shortest path. In the graph in Figure 11.1, we know that Barbara is connected to Jill by two distinct paths; to find all these paths and the distance between Barbara and Jill along those different paths, we can use

    Node barbara = nodeIndex.get("name", "Barbara").getSingle();
    Node jill = nodeIndex.get("name", "Jill").getSingle();
    PathFinder<Path> finder = GraphAlgoFactory.allPaths(
        Traversal.expanderForTypes(FRIEND, Direction.OUTGOING), MAX_DEPTH);
    Iterable<Path> paths = finder.findAllPaths(barbara, jill);

This feature is used in social networks to show relationships between any two nodes. To find all the paths and the distance between the nodes for each path, we first get a list of distinct paths between the two nodes. The length of each path is the number of hops on the graph needed to reach the destination node from the start node. Often, you need to get the shortest path between two nodes; of the two paths from Barbara to Jill, the shortest path can be found by using

    PathFinder<Path> finder = GraphAlgoFactory.shortestPath(
        Traversal.expanderForTypes(FRIEND, Direction.OUTGOING), MAX_DEPTH);
    Iterable<Path> paths = finder.findAllPaths(barbara, jill);
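To make the idea of "all paths" and "path length as number of hops" concrete without a running Neo4J instance, here is a minimal plain-Java sketch. It is not how Neo4J implements its path finders; the class name FriendPaths, the adjacency map, and the friends between Barbara and Jill are assumptions made up for illustration and are not taken from Figure 11.1.

```java
import java.util.*;

/** Toy friend graph; names and edges are illustrative assumptions. */
class FriendPaths {
    static final Map<String, List<String>> FRIENDS = new HashMap<>();
    static {
        FRIENDS.put("Barbara", Arrays.asList("Carol", "Elizabeth"));
        FRIENDS.put("Carol", Arrays.asList("Dawn"));
        FRIENDS.put("Dawn", Arrays.asList("Jill"));
        FRIENDS.put("Elizabeth", Arrays.asList("Jill"));
    }

    /** Depth-first enumeration of all simple paths from start to end, up to maxDepth hops. */
    static List<List<String>> allPaths(String start, String end, int maxDepth) {
        List<List<String>> found = new ArrayList<>();
        Deque<String> current = new ArrayDeque<>();
        current.addLast(start);
        walk(start, end, maxDepth, current, found);
        return found;
    }

    private static void walk(String node, String end, int hopsLeft,
                             Deque<String> current, List<List<String>> found) {
        if (node.equals(end)) {
            found.add(new ArrayList<>(current)); // copy the path walked so far
            return;
        }
        if (hopsLeft == 0) {
            return; // depth limit reached without arriving at the end node
        }
        for (String next : FRIENDS.getOrDefault(node, Collections.<String>emptyList())) {
            if (current.contains(next)) {
                continue; // do not revisit a node already on this path
            }
            current.addLast(next);
            walk(next, end, hopsLeft - 1, current, found);
            current.removeLast();
        }
    }

    /** The length of a path is its number of hops, that is, nodes minus one. */
    static int length(List<String> path) {
        return path.size() - 1;
    }

    /** Picks the path with the fewest hops from a non-empty list of paths. */
    static List<String> shortest(List<List<String>> paths) {
        return Collections.min(paths, Comparator.comparingInt(FriendPaths::length));
    }
}
```

Calling FriendPaths.allPaths("Barbara", "Jill", 4) on this toy graph yields two paths, and FriendPaths.shortest picks the one through Elizabeth because it has fewer hops.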
Many other graph algorithms can be applied to the graph at hand, such as Dijkstra’s algorithm [Dijkstra’s] for finding the shortest or cheapest path between nodes.

Neo4J also provides the Cypher query language to query the graph. Cypher needs a node to START the query. The start node can be identified by its node ID, a list of node IDs, or index lookups. Cypher uses the MATCH keyword for matching patterns in relationships; the WHERE keyword filters the properties on a node or relationship. The RETURN keyword specifies what gets returned by the query: nodes, relationships, or fields on the nodes or relationships. Cypher also provides methods to ORDER, AGGREGATE, SKIP, and LIMIT the data.

    START beginningNode = (beginning node specification)
    MATCH (relationship, pattern matches)
    WHERE (filtering condition: on data in nodes and relationships)
    RETURN (What to return: nodes, relationships, properties)
    ORDER BY (properties to order by)
    SKIP (nodes to skip from top)
    LIMIT (limit results)

In Figure 11.2, we find all nodes connected to Barbara, either incoming or outgoing, by using the --.
    START barbara = node:nodeIndex(name = "Barbara")
    MATCH (barbara)--(connected_node)
    RETURN connected_node

When interested in directional significance, we can use

    MATCH (barbara)<--(connected_node)

for incoming relationships or

    MATCH (barbara)-->(connected_node)

for outgoing relationships. Match can also be done on specific relationships using the :RELATIONSHIP_TYPE convention and returning the required fields or nodes.

    START barbara = node:nodeIndex(name = "Barbara")
    MATCH (barbara)-[:FRIEND]->(friend_node)
    RETURN friend_node.name, friend_node.location

We start with Barbara, find all outgoing relationships with the type of FRIEND, and return the friends’ names. The relationship type query only works for the depth of one level; we can make it work for greater depths and find out the depth of each of the result nodes.

    START barbara = node:nodeIndex(name = "Barbara")
    MATCH path = barbara-[:FRIEND*1..3]->end_node
    RETURN barbara.name, end_node.name, length(path)

Similarly, we can query for relationships where a particular relationship property exists. We can also filter on the properties of relationships and query if a property exists or not.

    START barbara = node:nodeIndex(name = "Barbara")
    MATCH (barbara)-[relation]->(related_node)
    WHERE type(relation) = 'FRIEND' AND relation.share
    RETURN related_node.name, relation.since

There are many other query features in the Cypher language that can be used to query database graphs.
11.2.5. Scaling
In NoSQL databases, one of the commonly used scaling techniques is sharding, where data is split and distributed across different servers. With graph databases, sharding is difficult, as graph databases are not aggregate-oriented but relationship-oriented. Since any given node can be related to any other node, storing related nodes on the same server is better for graph traversal. Traversing a graph when the nodes are on different machines is not good for performance. Knowing this limitation of the graph databases, we can still scale them using some common techniques described by Jim Webber [Webber Neo4J Scaling].

Generally speaking, there are three ways to scale graph databases. Since machines now can come with lots of RAM, we can add enough RAM to the server so that the working set of nodes and relationships is held entirely in memory. This technique is only helpful if the dataset that we are working with will fit in a realistic amount of RAM.

We can improve the read scaling of the database by adding more slaves with read-only access to the data, with all the writes going to the master. This pattern of writing once and reading from many servers is a proven technique in MySQL clusters and is really useful when the dataset is large enough to not fit in a single machine’s RAM, but small enough to be replicated across multiple machines. Slaves can also contribute to availability and read-scaling, as they can be configured to never become a master, remaining always read-only.

When the dataset size makes replication impractical, we can shard (see the “Sharding” section on p. 38) the data from the application side using domain-specific knowledge. For example, nodes that relate to North America can be created on one server while the nodes that relate to Asia are created on another. This application-level sharding needs to understand that nodes are stored on physically different databases (Figure 11.3).

Figure 11.3. Application-level sharding of nodes
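Application-level sharding as described above amounts to a routing decision made in application code. The following is a minimal sketch under stated assumptions: the class RegionShardRouter, the region keys, and the server URLs are hypothetical placeholders, not part of any Neo4J API.

```java
import java.util.*;

/** Routes graph nodes to a database server by region; all names and URLs are hypothetical. */
class RegionShardRouter {
    private final Map<String, String> serverByRegion = new HashMap<>();

    RegionShardRouter() {
        serverByRegion.put("NORTH_AMERICA", "http://graph-na.example.com:7474");
        serverByRegion.put("ASIA", "http://graph-asia.example.com:7474");
    }

    /** Returns the server that holds the nodes for the given region. */
    String serverFor(String region) {
        String server = serverByRegion.get(region);
        if (server == null) {
            throw new IllegalArgumentException("No shard configured for region: " + region);
        }
        return server;
    }

    /** A relationship can be traversed on one server only if both endpoints share a shard. */
    boolean crossesShards(String regionA, String regionB) {
        return !serverFor(regionA).equals(serverFor(regionB));
    }
}
```

The crossesShards check makes the cost visible: any relationship between a North America node and an Asia node spans two physical databases, so the application has to stitch such traversals together itself.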
11.3. Suitable Use Cases
Let’s look at some suitable use cases for graph databases.

11.3.1. Connected Data
Social networks are where graph databases can be deployed and used very effectively. These social graphs don’t have to be only of the friend kind; for example, they can represent employees, their knowledge, and where they worked with other employees on different projects. Any link-rich domain is well suited for graph databases.

If you have relationships between domain entities from different domains (such as social, spatial, commerce) in a single database, you can make these relationships more valuable by providing the ability to traverse across domains.

11.3.2. Routing, Dispatch, and Location-Based Services
Every location or address that has a delivery is a node, and all the nodes where the delivery has to be made by the delivery person can be modeled as a graph of nodes. Relationships between nodes can have the property of distance, thus allowing you to deliver the goods in an efficient manner. Distance and location properties can also be used in graphs of places of interest, so that your application can provide recommendations of good restaurants or entertainment options nearby. You can also create nodes for your points of sales, such as bookstores or restaurants, and notify the users when they are close to any of the nodes to provide location-based services.

11.3.3. Recommendation Engines
As nodes and relationships are created in the system, they can be used to make recommendations like “your friends also bought this product” or “when invoicing this item, these other items are usually invoiced.” Or, it can be used to make recommendations to travelers mentioning that when other visitors come to Barcelona they usually visit Antoni Gaudí’s creations.

An interesting side effect of using the graph databases for recommendations is that as the data size grows, the number of nodes and relationships available to make the recommendations quickly increases. The same data can also be used to mine information, for example, which products are always bought together, or which items are always invoiced together; alerts can be raised when these conditions are not met. Like other recommendation engines, graph databases can be used to search for patterns in relationships to detect fraud in transactions.

11.4. When Not to Use
In some situations, graph databases may not be appropriate. When you want to update all or a subset of entities (for example, in an analytics solution where all entities may need to be updated with a changed property), graph databases may not be optimal since changing a property on all the nodes is not a straightforward operation. Even if the data model works for the problem domain, some databases may be unable to handle lots of data, especially in global graph operations (those involving the whole graph).
Chapter 12. Schema Migrations

12.1. Schema Changes
The recent trend in discussing NoSQL databases is to highlight their schemaless nature. It is a popular feature that allows developers to concentrate on the domain design without worrying about schema changes. It’s especially true with the rise of agile methods [Agile Methods] where responding to changing requirements is important.

Discussions, iterations, and feedback loops involving domain experts and product owners are important to derive the right understanding of the data; these discussions must not be hampered by a database’s schema complexity. With NoSQL data stores, changes to the schema can be made with the least amount of friction, improving developer productivity (“The Emergence of NoSQL,” p. 9). We have seen that developing and maintaining an application in the brave new world of schemaless databases requires careful attention to be given to schema migration.

12.2. Schema Changes in RDBMS
While developing with standard RDBMS technologies, we develop objects, their corresponding tables, and their relationships. Consider a simple object model and data model that has Customer, Order, and OrderItems. The ER model would look like Figure 12.1.

Figure 12.1. Data model of an e-commerce system

While this data model supports the current object model, life is good. The first time there is a change in the object model, such as introducing preferredShippingType on the Customer object, we have to change the object and change the database table, because without changing the table the application will be out of sync with the database. When we get errors like ORA-00942: table or view does not exist or ORA-00904: "PREFERRED_SHIPPING_TYPE": invalid identifier, we know we have this problem.

Typically, a database schema migration has been a project in itself. For deployment of the schema changes, database change scripts are developed, using diff techniques, for all the changes in the development database. This approach of creating migration scripts during the deployment/release time is error-prone and does not support agile development methods.
12.2.1. Migrations for Green Field Projects
Scripting the database schema changes during development is better, since we can store these schema changes along with the data migration scripts in the same script file. These script files should be named with incrementing sequential numbers which reflect the database versions; for example, the first change to the database could have a script file named 001_Description_Of_Change.sql. Scripting changes this way allows for the database migrations to be run preserving the order of changes. Shown in Figure 12.2 is a folder of all the changes done to a database so far.

Figure 12.2. Sequence of migrations applied to a database

Now, suppose we need to change the OrderItem table to store the DiscountedPrice and the FullPrice of the item. This will need a change to the OrderItem table and will be change number 007 in our sequence of changes, as shown in Figure 12.3.

Figure 12.3. New change 007_DiscountedPrice.sql applied to the database

We applied a new change to the database. This change’s script has the code for adding a new column, renaming the existing column, and migrating the data needed to make the new feature work. Shown below is the script contained in the change 007_DiscountedPrice.sql:
    ALTER TABLE orderitem ADD discountedprice NUMBER(18,2) NULL;
    UPDATE orderitem SET discountedprice = price;
    ALTER TABLE orderitem MODIFY discountedprice NOT NULL;
    ALTER TABLE orderitem RENAME COLUMN price TO fullprice;
    --//@UNDO
    ALTER TABLE orderitem RENAME COLUMN fullprice TO price;
    ALTER TABLE orderitem DROP COLUMN discountedprice;
The change script shows the schema changes to the database as well as the data migrations needed to be done. In the example shown, we are using DBDeploy [DBDeploy] as the framework to manage the changes to the database. DBDeploy maintains a table in the database, named ChangeLog, where all the changes made to the database are stored. In this table, Change_Number is what tells everyone which changes have been applied to the database. This Change_Number, which is the database version, is then used to find the corresponding numbered script in the folder and apply the changes which have not been applied yet. When we write a script with the change number 007 and apply it to the database using DBDeploy, DBDeploy will check the ChangeLog and pick up all the scripts from the folder that have not yet been applied. Figure 12.4 is the screenshot of DBDeploy applying the change to the database.
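The selection step described above (compare the numbered scripts in the folder with the change numbers already recorded, then apply the rest in order) can be sketched in a few lines of plain Java. This is an illustration of the idea only, not DBDeploy's actual implementation; the class PendingMigrations and its method names are assumptions.

```java
import java.util.*;

/** Sketch of choosing which numbered change scripts still need to be applied. */
class PendingMigrations {
    /** Extracts the leading change number from a name such as 007_DiscountedPrice.sql. */
    static int changeNumber(String fileName) {
        return Integer.parseInt(fileName.substring(0, fileName.indexOf('_')));
    }

    /** Returns the scripts whose change number is not yet recorded, in ascending order. */
    static List<String> pending(Collection<String> scriptFiles, Set<Integer> appliedChangeNumbers) {
        List<String> result = new ArrayList<>();
        for (String file : scriptFiles) {
            if (!appliedChangeNumbers.contains(changeNumber(file))) {
                result.add(file);
            }
        }
        // Order matters: later scripts may depend on columns created by earlier ones.
        result.sort(Comparator.comparingInt(PendingMigrations::changeNumber));
        return result;
    }
}
```

In a real tool the set of applied change numbers would come from the change log table, and each script that runs successfully would add its own number to that table.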
Figure 12.4. DBDeploy upgrading the database with change number 007

The best way to integrate with the rest of the developers is to use your project’s version control repository to store all these change scripts, so that you can keep track of the version of the software and the database in the same place, eliminating possible mismatches between the database and the application. There are many other tools for such upgrades, including Liquibase [Liquibase], MyBatis Migrator [MyBatis Migrator], and DBMaintain [DBMaintain].
12.2.2. Migrations in Legacy Projects
Not every project is a green field. How do we implement migrations when an existing application is in production? We found that taking an existing database and extracting its structure into scripts, along with all the database code and any reference data, works as a baseline for the project. This baseline should not contain transactional data. Once the baseline is ready, further changes can be done using the migrations technique described above (Figure 12.5).

Figure 12.5. Use of baseline scripts with a legacy database

One of the main aspects of migrations should be maintaining backward compatibility of the database schema. In many enterprises there are multiple applications using the database; when we change the database for one application, this change should not break other applications. We can achieve backward compatibility by maintaining a transition phase for the change, as described in detail in Refactoring Databases [Ambler and Sadalage].

During a transition phase, the old schema and the new schema are maintained in parallel and are available for all the applications using the database. For this, we have to introduce scaffolding code, such as triggers, views, and virtual columns, ensuring other applications can access the database schema and the data they require without any code changes.
    ALTER TABLE customer ADD fullname VARCHAR2(60);
    UPDATE customer SET fullname = fname;

    CREATE OR REPLACE TRIGGER SyncCustomerFullName
    BEFORE INSERT OR UPDATE
    ON customer
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    BEGIN
      IF :NEW.fname IS NULL THEN
        :NEW.fname := :NEW.fullname;
      END IF;
      IF :NEW.fullname IS NULL THEN
        :NEW.fullname := :NEW.fname;
      END IF;
    END;
    /

    -- Drop Trigger and fname
    -- when all applications start using customer.fullname
In the example, we are trying to rename the customer.fname column to customer.fullname as we want to avoid any ambiguity of fname meaning either fullname or firstname. A direct rename of the fname column and changing the application code we are responsible for may just work for our application, but it will not work for the other applications in the enterprise that are accessing the same database.

Using the transition phase technique, we introduce the new column fullname, copy the data over to fullname, but leave the old column fname around. We also introduce a BEFORE INSERT OR UPDATE trigger to synchronize data between the columns before they are committed to the database.

Now, when applications read data from the table, they will read either from fname or from fullname but will always get the right data. We can drop the trigger and the fname column once all the applications have moved on to using the new fullname column.

It’s very hard to do schema migrations on large datasets in RDBMS, especially if we have to keep the database available to the applications, as large data movements and structural changes usually create locks on the database tables.
12.3. Schema Changes in a NoSQL Data Store
An RDBMS database has to be changed before the application is changed. This is what the schema-free, or schemaless, approach tries to avoid, aiming at flexibility of schema changes per entity. Frequent changes to the schema are needed to react to frequent market changes and product innovations.

When developing with NoSQL databases, in some cases the schema does not have to be thought about beforehand. We still have to design and think about other aspects, such as the types of relationships (with graph databases), or the names of the column families, rows, columns, order of columns (with column databases), or how the keys are assigned and what the structure of the data inside the value object is (with key-value stores). Even if we didn’t think about these up front, or if we want to change our decisions, it is easy to do so.

The claim that NoSQL databases are entirely schemaless is misleading; while they store the data without regard to the schema the data adheres to, that schema has to be defined by the application, because the data stream has to be parsed by the application when reading the data from the database. Also, the application has to create the data that would be saved in the database. If the application cannot parse the data from the database, we have a schema mismatch even if, instead of the RDBMS database throwing an error, this error is now encountered by the application. Thus, even in schemaless databases, the schema of the data has to be taken into consideration when refactoring the application.

Schema changes especially matter when there is a deployed application and existing production data. For the sake of simplicity, assume we are using a document data store like MongoDB [MongoDB] and we have the same data model as before: customer, order, and orderItems.
    {
      "_id": "4BD8AE97C47016442AF4A580",
      "customerid": 99999,
      "name": "Foo Sushi Inc",
      "since": "12/12/2012",
      "order": {
        "orderid": "4821-UXWE-122012",
        "orderdate": "12/12/2001",
        "orderItems": [{"product": "Fortune Cookies", "price": 19.99}]
      }
    }

Application code to write this document structure to MongoDB:

    BasicDBObject orderItem = new BasicDBObject();
    orderItem.put("product", productName);
    orderItem.put("price", price);
    orderItems.add(orderItem);

Code to read the document back from the database:

    BasicDBObject item = (BasicDBObject) orderItem;
    String productName = item.getString("product");
    Double price = item.getDouble("price");
Changing the objects to add preferredShippingType does not require any change in the database, as the database does not care that different documents do not follow the same schema. This allows for faster development and easy deployments. All that needs to be deployed is the application; no changes on the database side are needed. The code has to make sure that documents that do not have the preferredShippingType attribute can still be parsed, and that’s all.

Of course we are simplifying the schema change situation here. Let’s look at the schema change we made before: introducing discountedPrice and renaming price to fullPrice. To make this change, we rename the price attribute to fullPrice and add the discountedPrice attribute. The changed document is
    {
      "_id": "5BD8AE97C47016442AF4A580",
      "customerid": 66778,
      "name": "India House",
      "since": "12/12/2012",
      "order": {
        "orderid": "4821-UXWE-222012",
        "orderdate": "12/12/2001",
        "orderItems": [{"product": "Chair Covers", "fullPrice": 29.99, "discountedPrice": 26.99}]
      }
    }

Once we deploy this change, new customers and their orders can be saved and read back without problems, but for existing orders the price of their product cannot be read, because now the code is looking for fullPrice but the document has only price.
12.3.1. Incremental Migration
Schema mismatch trips many new converts to the NoSQL world. When schema is changed on the application, we have to make sure to convert all the existing data to the new schema (depending on data size, this might be an expensive operation). Another option would be to make sure that data, before the schema changed, can still be parsed by the new code, and when it’s saved, it is saved back in the new schema. This technique, known as incremental migration, will migrate data over time; some data may never get migrated, because it was never accessed. We are reading both price and fullPrice from the document:

    BasicDBObject item = (BasicDBObject) orderItem;
    String productName = item.getString("product");
    Double fullPrice = item.getDouble("price");
    if (fullPrice == null) {
        fullPrice = item.getDouble("fullPrice");
    }
    Double discountedPrice = item.getDouble("discountedPrice");

When writing the document back, the old attribute price is not saved:

    BasicDBObject orderItem = new BasicDBObject();
    orderItem.put("product", productName);
    orderItem.put("fullPrice", price);
    orderItem.put("discountedPrice", discountedPrice);
    orderItems.add(orderItem);
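The read-old, write-new pattern can be tried out without a MongoDB driver by treating a document as a plain Map. The sketch below is illustrative only: the class OrderItemReader and its methods are assumptions, and defaulting a missing discountedPrice to the full price is a choice borrowed from the relational migration script earlier in the chapter, not something the document store requires.

```java
import java.util.*;

/** Incremental migration on a document modeled as a Map; names are hypothetical. */
class OrderItemReader {
    /** Reads fullPrice, falling back to the pre-migration price attribute. */
    static Double readFullPrice(Map<String, Object> item) {
        Object value = item.containsKey("fullPrice") ? item.get("fullPrice") : item.get("price");
        return value == null ? null : Double.valueOf(((Number) value).doubleValue());
    }

    /** Old documents have no discountedPrice; this sketch assumes no discount in that case. */
    static Double readDiscountedPrice(Map<String, Object> item) {
        Object value = item.get("discountedPrice");
        return value == null ? readFullPrice(item) : Double.valueOf(((Number) value).doubleValue());
    }

    /** Always writes the new schema, so the old price attribute disappears on save. */
    static Map<String, Object> toNewSchema(Map<String, Object> item) {
        Map<String, Object> migrated = new LinkedHashMap<>();
        migrated.put("product", item.get("product"));
        migrated.put("fullPrice", readFullPrice(item));
        migrated.put("discountedPrice", readDiscountedPrice(item));
        return migrated;
    }
}
```

A document that is never read keeps its old shape, which is why some data may stay unmigrated for as long as this fallback code remains in place.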
When using incremental migration, there could be many versions of the object on the application side that can translate the old schema to the new schema; while saving the object back, it is saved using the new object. This gradual migration of the data helps the application evolve faster.

The incremental migration technique will complicate the object design, especially as new changes are being introduced yet old changes are not being taken out. This period between the change deployment and the last object in the database migrating to the new schema is known as the transition period (Figure 12.6). Keep it as short as possible and focus it to the minimum possible scope; this will help you keep your objects clean.

Figure 12.6. Transition period of schema changes

The incremental migration technique can also be implemented with a schema_version field on the data, used by the application to choose the correct code to parse the data into the objects. When saving, the data is migrated to the latest version and the schema_version is updated to reflect that.

Having a proper translation layer between your domain and the database is important so that, as the schema changes, managing multiple versions of the schema is restricted to the translation layer and does not leak into the whole application.

Mobile apps create special requirements. Since we cannot enforce the latest upgrades of the application, the application should be able to handle almost all versions of the schema.
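A schema_version field lets the translation layer upgrade a document one version step at a time. The following is a minimal sketch, again modeling a document as a Map; the class OrderItemTranslator, the version numbers, and the rule that a document without a schema_version counts as version 1 are all assumptions for illustration.

```java
import java.util.*;

/** Translation-layer sketch that upgrades stored order items by schema_version. */
class OrderItemTranslator {
    static final int LATEST_VERSION = 2;

    /** Returns a copy of the stored document upgraded to the latest schema. */
    static Map<String, Object> upgrade(Map<String, Object> stored) {
        Map<String, Object> doc = new LinkedHashMap<>(stored);
        // Assumption: documents written before versioning existed are version 1.
        int version = ((Number) doc.getOrDefault("schema_version", 1)).intValue();
        if (version > LATEST_VERSION) {
            throw new IllegalStateException("Document is newer than this code: " + version);
        }
        while (version < LATEST_VERSION) {
            switch (version) {
                case 1:
                    v1ToV2(doc);
                    break;
                default:
                    throw new IllegalStateException("No upgrade step from version " + version);
            }
            version++;
        }
        doc.put("schema_version", LATEST_VERSION);
        return doc;
    }

    /** Version 1 to 2: rename price to fullPrice and introduce discountedPrice. */
    private static void v1ToV2(Map<String, Object> doc) {
        Object price = doc.remove("price");
        doc.put("fullPrice", price);
        doc.put("discountedPrice", price);
    }
}
```

Keeping every upgrade step inside one class like this is one way to stop knowledge of old schema versions from leaking into the rest of the application.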
12.3.2. Migrations in Graph Databases
Graph databases have edges that have types and properties. If you change the type of these edges in the codebase, you no longer can traverse the database, rendering it unusable. To get around this, you can traverse all the edges and change the type of each edge. This operation can be expensive and requires you to write code to migrate all the edges in the database.

If we need to maintain backward compatibility or do not want to change the whole graph in one go, we can just create new edges between the nodes; later when we are comfortable about the change, the old edges can be dropped. We can use traversals with multiple edge types to traverse the graph using the new and old edge types. This technique may help a great deal with large databases, especially if we want to maintain high availability.

If we have to change properties on all the nodes or edges, we have to fetch all the nodes and change all the properties that need to be changed. An example would be adding NodeCreatedBy and NodeCreatedOn to all existing nodes to track the changes being made to each node.
    for (Node node : database.getAllNodes()) {
        node.setProperty("NodeCreatedBy", getSystemUser());
        node.setProperty("NodeCreatedOn", getSystemTimeStamp());
    }
We may have to change the data in the nodes. New data may be derived from the existing node data, or it could be imported from some other source. The migration can be done by fetching all nodes using an index provided by the source of data and writing relevant data to each node.
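The edge-type migration described in this section (add new edges alongside the old ones, traverse with both types, then drop the old edges) can be sketched on a toy in-memory edge list. This is not Neo4J code; the Edge class and the FRIEND and FRIEND_OF type names are hypothetical stand-ins for real relationships.

```java
import java.util.*;

/** Toy model of migrating an edge type without breaking traversals. */
class EdgeTypeMigration {
    /** A minimal directed edge standing in for a graph database relationship. */
    static final class Edge {
        final String from;
        final String to;
        final String type;

        Edge(String from, String to, String type) {
            this.from = from;
            this.to = to;
            this.type = type;
        }
    }

    /** Neighbors reachable from node over any of the accepted edge types. */
    static Set<String> neighbors(List<Edge> edges, String node, Set<String> acceptedTypes) {
        Set<String> result = new TreeSet<>();
        for (Edge e : edges) {
            if (e.from.equals(node) && acceptedTypes.contains(e.type)) {
                result.add(e.to);
            }
        }
        return result;
    }

    /** Step 1: add a newType edge next to every oldType edge, keeping the old edges. */
    static List<Edge> addParallelEdges(List<Edge> edges, String oldType, String newType) {
        List<Edge> result = new ArrayList<>(edges);
        for (Edge e : edges) {
            if (e.type.equals(oldType)) {
                result.add(new Edge(e.from, e.to, newType));
            }
        }
        return result;
    }

    /** Step 2, run later: drop every edge of the retired type. */
    static List<Edge> dropEdges(List<Edge> edges, String type) {
        List<Edge> result = new ArrayList<>();
        for (Edge e : edges) {
            if (!e.type.equals(type)) {
                result.add(e);
            }
        }
        return result;
    }
}
```

Between the two steps, code that accepts both types and code that only knows the old type return the same neighbors, which is what keeps the graph usable during the change.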
12.3.3. Changing Aggregate Structure
Sometimes you need to change the schema design, for example by splitting large objects into smaller ones that are stored independently. Suppose you have a customer aggregate that contains all the customer’s orders, and you want to separate the customer and each of their orders into different aggregate units.

You then have to ensure that the code can work with both versions of the aggregates. If it does not find the old objects, it will look for the new aggregates.

Code that runs in the background can read one aggregate at a time, make the necessary change, and save the data back into different aggregates. The advantage of operating on one aggregate at a time is that this way, you’re not affecting data availability for the application.
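As a rough illustration of splitting one aggregate at a time, the sketch below models the customer store and the order store as in-memory maps. Everything here is an assumption made for the example: the class AggregateSplitter, the embedded "orders" list, and the "orderIds" reference list are hypothetical and do not come from any particular database API.

```java
import java.util.*;

/** Splits a customer aggregate with embedded orders into separate aggregates. */
class AggregateSplitter {
    /** Copies a loosely typed stored object into a String-keyed map. */
    private static Map<String, Object> copy(Object stored) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<?, ?> e : ((Map<?, ?>) stored).entrySet()) {
            result.put(String.valueOf(e.getKey()), e.getValue());
        }
        return result;
    }

    /** Background step: moves one customer's embedded orders into their own aggregates. */
    static void split(String customerId, Map<String, Map<String, Object>> customers,
                      Map<String, Map<String, Object>> orders) {
        Map<String, Object> customer = copy(customers.get(customerId));
        Object embedded = customer.remove("orders");
        if (!(embedded instanceof List)) {
            return; // nothing embedded, so this customer is already migrated
        }
        List<String> orderIds = new ArrayList<>();
        for (Object o : (List<?>) embedded) {
            Map<String, Object> order = copy(o);
            String orderId = String.valueOf(order.get("orderid"));
            order.put("customerid", customerId);
            orders.put(orderId, order);
            orderIds.add(orderId);
        }
        customer.put("orderIds", orderIds);
        customers.put(customerId, customer);
    }

    /** Reader used during the transition: embedded orders first, then separate ones. */
    static List<Map<String, Object>> ordersOf(String customerId,
                                              Map<String, Map<String, Object>> customers,
                                              Map<String, Map<String, Object>> orders) {
        List<Map<String, Object>> result = new ArrayList<>();
        Map<String, Object> customer = customers.get(customerId);
        Object embedded = customer.get("orders");
        if (embedded instanceof List) {
            for (Object o : (List<?>) embedded) {
                result.add(copy(o));
            }
        } else if (customer.get("orderIds") instanceof List) {
            for (Object id : (List<?>) customer.get("orderIds")) {
                result.add(orders.get(String.valueOf(id)));
            }
        }
        return result;
    }
}
```

Because split touches a single customer per call, a background job can walk the customers one by one while ordersOf keeps answering reads for both migrated and unmigrated customers.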
12.4. Further Reading
For more on migrations with relational databases, see [Ambler and Sadalage]. Although much of this content is specific to relational work, the general principles in migration will also apply to other databases.

12.5. Key Points
• Databases with strong schemas, such as relational databases, can be migrated by saving each schema change, plus its data migration, in a version-controlled sequence.
• Schemaless databases still need careful migration due to the implicit schema in any code that accesses the data.
• Schemaless databases can use the same migration techniques as databases with strong schemas.
• Schemaless databases can also read data in a way that’s tolerant to changes in the data’s implicit schema and use incremental migration to update data.
Chapter 13. Polyglot Persistence

Different databases are designed to solve different problems. Using a single database engine for all of the requirements usually leads to non-performant solutions; storing transactional data, caching session information, and traversing a graph of customers and the products their friends bought are essentially different problems. Even in the RDBMS space, the requirements of an OLAP and OLTP system are very different; nonetheless, they are often forced into the same schema.

Let’s think of data relationships. RDBMS solutions are good at enforcing that relationships exist. If we want to discover relationships, or have to find data from different tables that belong to the same object, then the use of RDBMS starts being difficult.

Database engines are designed to perform certain operations on certain data structures and data amounts very well, such as operating on sets of data, storing and retrieving keys and their values really fast, or storing rich documents or complex graphs of information.

13.1. Disparate Data Storage Needs
Many enterprises tend to use the same database engine to store business transactions, session management data, and for other storage needs such as reporting, BI, data warehousing, or logging information (Figure 13.1).

Figure 13.1. Use of RDBMS for every aspect of storage for the application

The session, shopping cart, or order data do not need the same properties of availability, consistency, or backup requirements. Does session management storage need the same rigorous backup/recovery strategy as the e-commerce orders data? Does the session management storage need more availability of an instance of the database engine to write/read session data?

In 2006, Neal Ford coined the term polyglot programming, to express the idea that applications should be written in a mix of languages to take advantage of the fact that different languages are suitable for tackling different problems. Complex applications combine different types of problems, so picking the right language for each job may be more productive than trying to fit all aspects into a single language.

Similarly, when working on an e-commerce business problem, using a data store for the shopping cart which is highly available and can scale is important, but the same data store cannot help you find products bought by the customers’ friends, which is a totally different question. We use the term polyglot persistence to define this hybrid approach to persistence.
13.2. Polyglot Data Store Usage
Let’s take our e-commerce example and use the polyglot persistence approach to see how some of these data stores can be applied (Figure 13.2). A key-value data store could be used to store the shopping cart data before the order is confirmed by the customer and also store the session data so that the RDBMS is not used for this transient data. Key-value stores make sense here since the shopping cart is usually accessed by user ID and, once confirmed and paid by the customer, can be saved in the RDBMS. Similarly, session data is keyed by the session ID.

Figure 13.2. Use of key-value stores to offload session and shopping cart data storage

If we need to recommend products to customers when they place products into their shopping carts (for example, “your friends also bought these products” or “your friends bought these accessories for this product”), then introducing a graph data store in the mix becomes relevant (Figure 13.3).

Figure 13.3. Example implementation of polyglot persistence

It is not necessary for the application to use a single data store for all of its needs, since different databases are built for different purposes and not all problems can be elegantly solved by a single database.

Even using specialized relational databases for different purposes, such as data warehousing appliances or analytics appliances within the same application, can be viewed as polyglot persistence.
13.3. Service Usage over Direct Data Store Usage
As we move towards multiple data stores in the application, there may be other applications in the enterprise that could benefit from the use of our data stores or the data stored in them. Using our example, the graph data store can serve data to other applications that need to understand, for example, which products are being bought by a certain segment of the customer base.

Instead of each application talking independently to the graph database, we can wrap the graph database into a service so that all relationships between the nodes can be saved in one place and queried by all the applications (Figure 13.4). The data ownership and the APIs provided by the service are more useful than a single application talking to multiple databases.

Figure 13.4. Example implementation of wrapping data stores into services

The philosophy of service wrapping can be taken further: You could wrap all databases into services, letting the application talk only to a bunch of services (Figure 13.5). This allows for the databases inside the services to evolve without you having to change the dependent applications.

Figure 13.5. Using services instead of talking to databases

Many NoSQL data store products, such as Riak [Riak] and Neo4J [Neo4J], actually provide out-of-the-box REST APIs.
13.4. Expanding for Better Functionality
Often, we cannot really change the data storage for a specific usage to something different, because of the existing legacy applications and their dependency on existing data storage. We can, however, add functionality such as caching for better performance, or use indexing engines such as Solr [Solr] so that search can be more efficient (Figure 13.6). When technologies like this are introduced, we have to make sure data is synchronized between the data storage for the application and the cache or indexing engine.

Figure 13.6. Using supplemental storage to enhance legacy storage

While doing this, we need to update the indexed data as the data in the application database changes. The process of updating the data can be real-time or batch, as long as we ensure that the application can deal with stale data in the index/search engine. The event sourcing (“Event Sourcing,” p. 142) pattern can be used to update the index.
13.5. Choosing the Right Technology

There is a rich choice of data storage solutions. Initially, the pendulum had shifted from speciality databases to a single RDBMS database which allows all types of data models to be stored, although with some abstraction. The trend is now shifting back to using the data storage that supports the implementation of solutions natively.

If we want to recommend products to customers based on what’s in their shopping carts and which other products were bought by customers who bought those products, it can be implemented in any of the data stores by persisting the data with the correct attributes to answer our questions. The trick is to use the right technology, so that when the questions change, they can still be asked with the same data store without losing existing data or changing it into new formats.

Let’s go back to our new feature need. We can use RDBMS to solve this using a hierarchical query and modeling the tables accordingly. When we need to change the traversal, we will have to refactor the database, migrate the data, and start persisting new data. Instead, if we had used a data store that tracks relations between nodes, we could have just programmed the new relations and kept using the same data store with minimal changes.
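To make the "relations between nodes" idea concrete, here is a minimal sketch of the recommendation traversal over an in-memory graph. It is not the API of any graph database; `PurchaseGraph`, `add_purchase`, and `also_bought` are hypothetical names, and the scoring rule (count co-purchases, break ties alphabetically) is an assumption made for the example.

```python
from collections import Counter, defaultdict


class PurchaseGraph:
    """Customers and products as nodes, purchases as edges (both directions)."""

    def __init__(self):
        self.bought = defaultdict(set)     # customer -> products
        self.bought_by = defaultdict(set)  # product -> customers

    def add_purchase(self, customer, product):
        self.bought[customer].add(product)
        self.bought_by[product].add(customer)

    def also_bought(self, cart, limit=3):
        """Walk product -> customers -> other products and rank the results."""
        counts = Counter()
        for product in cart:
            for customer in self.bought_by.get(product, ()):
                for other in self.bought[customer]:
                    if other not in cart:
                        counts[other] += 1
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return [product for product, _ in ranked][:limit]
```

Changing the question, for instance to "what did friends of this customer buy", means adding another kind of edge and another walk, while the purchases already stored stay as they are.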
13.6. Enterprise Concerns with Polyglot Persistence

Introduction of NoSQL data storage technologies will force the enterprise DBAs to think about how to use the new storage. The enterprise is used to having uniform RDBMS environments; whatever database an enterprise starts using first, chances are that over the years all its applications will be built around the same database. In this new world of polyglot persistence, the DBA groups will have to become more poly-skilled: to learn how some of these NoSQL technologies work, how to monitor these systems, back them up, and take data out of and put into these systems.

Once the enterprise decides to use any NoSQL technology, issues such as licensing, support, tools, upgrades, drivers, auditing, and security come up. Many NoSQL technologies are open-source and have an active community of supporters; also, there are companies that provide commercial support. There is not a rich ecosystem of tools, but the tool vendors and the open-source community are catching up, releasing tools such as MongoDB Monitoring Service [Monitoring], Datastax OpsCenter [OpsCenter], or Rekon browser for Riak [Rekon].

One other area that enterprises are concerned about is security of the data: the ability to create users and assign privileges to see or not see data at the database level. Most of the NoSQL databases do not have very robust security features, but that’s because they are designed to operate differently. In traditional RDBMS, data was served by the database and we could get to the database using any query tools. With the NoSQL databases, there are query tools as well, but the idea is for the application to own the data and serve it using services. With this approach, the responsibility for the security lies with the application. Having said that, there are NoSQL technologies that introduce security features.

Enterprises often have data warehouse systems, BI, and analytics systems that may need data from the polyglot data sources. Enterprises will have to ensure that the ETL tools or any other mechanism they are using to move data from source systems to the data warehouse can read data from the NoSQL data store. The ETL tool vendors are adding the ability to talk to NoSQL databases; for example, Pentaho [Pentaho] can talk to MongoDB and Cassandra.

Every enterprise runs analytics of some sort. As the sheer volume of data that needs to be captured increases, enterprises are struggling to scale their RDBMS systems to write all this data to the databases. A huge number of writes and the need to scale for writes are a great use case for NoSQL databases that allow you to write large volumes of data.
13.7. Deployment Complexity

Once we start down the path of using polyglot persistence in the application, deployment complexity needs careful consideration. The application now needs all databases in production at the same time. You will need to have these databases in your UAT, QA, and Dev environments. As most of the NoSQL products are open-source, there are few license cost ramifications. They also support automation of installation and configuration. For example, to install a database, all that needs to be done is download and unzip the archive, which can be automated using curl and unzip commands. These products also have sensible defaults and can be started with minimum configuration.
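The download-and-unzip step is easy to script. The text mentions curl and unzip; the sketch below swaps in the Python standard library for the same two steps, so that it can run unchanged in each environment. The function name `install_archive` is hypothetical, and the archive URL and target directory are whatever your environment supplies.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def install_archive(archive_url, target_dir):
    """Download a zip archive and unpack it into target_dir.

    Returns the sorted names of the top-level entries now in target_dir.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "download.zip"
        # Equivalent of the curl step: stream the response to a local file.
        with urllib.request.urlopen(archive_url) as response, open(archive, "wb") as out:
            shutil.copyfileobj(response, out)
        # Equivalent of the unzip step.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
    return sorted(p.name for p in target.iterdir())
```

Starting the unpacked product with its default configuration would be a further step that depends on the product, so it is left out here.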
13.8. Key Points

• Polyglot persistence is about using different data storage technologies to handle varying data storage needs.
• Polyglot persistence can apply across an enterprise or within a single application.
• Encapsulating data access into services reduces the impact of data storage choices on other parts of a system.
• Adding more data storage technologies increases complexity in programming and operations, so the advantages of a good data storage fit need to be weighed against this complexity.
Chapter 14. Beyond NoSQL

The appearance of NoSQL databases has done a great deal to shake up and open up the world of databases, but we think the kind of NoSQL databases we have discussed here is only part of the picture of polyglot persistence. So it makes sense to spend some time discussing solutions that don’t easily fit into the NoSQL bucket.
14.1. File Systems

Databases are very common, but file systems are almost ubiquitous. In the last couple of decades they’ve been widely used for personal productivity documents, but not for enterprise applications. They don’t advertise any internal structure, so they are more like key-value stores with a hierarchic key. They also provide little control over concurrency other than simple file locking, which itself is similar to the way NoSQL only provides locking within a single aggregate.

File systems have the advantage of being simple and widely implemented. They cope well with very large entities, such as video and audio. Often, databases are used to index media assets stored in files. Files also work very well for sequential access, such as streaming, which can be handy for data which is append-only.

Recent attention to clustered environments has seen a rise of distributed file systems. Technologies like the Google File System and Hadoop [Hadoop] provide support for replication of files. Much of the discussion of map-reduce is about manipulating large files on cluster systems, with tools for automatic splitting of large files into segments to be processed on multiple nodes. Indeed, a common entry path into NoSQL is from organizations that have been using Hadoop.

File systems work best for a relatively small number of large files that can be processed in big chunks, preferably in a streaming style. Large numbers of small files generally perform badly; this is where a data store becomes more efficient. Files also provide no support for queries without additional indexing tools such as Solr [Solr].
14.2. Event Sourcing

Event sourcing is an approach to persistence that concentrates on persisting all the changes to a persistent state, rather than persisting the current application state itself. It’s an architectural pattern that works quite well with most persistence technologies, including relational databases. We mention it here because it also underpins some of the more unusual ways of thinking about persistence.

Consider an example of a system that keeps a log of the location of ships (Figure 14.1). It has a simple ship record that keeps the name of the ship and its current location. In the usual way of thinking, when we hear that the ship King Roy has arrived in San Francisco, we change the value of King Roy’s location field to San Francisco. Later on, we hear it’s departed, so we change it to at sea, changing it again once we know it’s arrived in Hong Kong.
Figure 14.1. In a typical system, notice of a change causes an update to the application’s state.

With an event-sourced system, the first step is to construct an event object that captures the information about the change (Figure 14.2). This event object is stored in a durable event log. Finally, we process the event in order to update the application’s state.
Figure 14.2. With event sourcing, the system stores each event, together with the derived application state.
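The three steps (construct an event, store it, process it) can be sketched for the ship example as follows. The class names `Arrival`, `Departure`, and `ShipTracker` are hypothetical, and the "durable" event log is just an in-memory list here; a real system would append to persistent storage before processing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arrival:
    ship: str
    port: str


@dataclass(frozen=True)
class Departure:
    ship: str


class ShipTracker:
    def __init__(self):
        self.log = []        # event log (an in-memory list in this sketch)
        self.locations = {}  # application state derived from the log

    def record(self, event):
        self.log.append(event)  # store the event first
        self._apply(event)      # then process it to update the state

    def _apply(self, event):
        if isinstance(event, Arrival):
            self.locations[event.ship] = event.port
        elif isinstance(event, Departure):
            self.locations[event.ship] = "at sea"

    def rebuild(self):
        """Discard the derived state and recreate it by replaying the log."""
        self.locations = {}
        for event in self.log:
            self._apply(event)
```

Recording an arrival in San Francisco, a departure, and an arrival in Hong Kong for King Roy leaves three events in the log and a single current location in the state; `rebuild` reproduces that state from the log alone.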
As a consequence, in an event-sourced system we store every event that’s caused a state change of the system in the event log, and the application’s state is entirely derivable from this event log. At any time, we can safely throw away the application state and rebuild it from the event log.

In theory, event logs are all you need because you can always recreate the application state whenever you need it by replaying the event log. In practice, this may be too slow. As a result, it’s usually best to provide the ability to store and recreate the application state in a snapshot. A snapshot is designed to persist the memory image optimized for rapid recovery of the state. It is an optimization aid, so it should never take precedence over the event log for authority on the data.

How frequently you take a snapshot depends on your uptime needs. The snapshot doesn’t need to be completely up to date, as you can rebuild memory by loading the latest snapshot and then replaying all events processed since that snapshot was taken. An example approach would be to take a snapshot every night; should the system go down during the day, you’d reload last night’s snapshot followed by today’s events. If you can do that quickly enough, all will be fine.

To get a full record of every change in your application state, you need to keep the event log going back to the beginning of time for your application. But in many cases such a long-lived record isn’t necessary, as you can fold older events into a snapshot and only use the event log after the date of the snapshot.

Using event sourcing has a number of advantages. You can broadcast events to multiple systems, each of which can build a different application state for different purposes (Figure 14.3). For read-intensive systems, you can provide multiple read nodes, with potentially different schemas, while concentrating the writes on a different processing system (an approach broadly known as CQRS [CQRS]).
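The snapshot-plus-replay recovery described earlier in this section can be sketched as below. The representation is an assumption made for illustration: each event is a `(sequence_number, ship, location)` tuple, and a snapshot is a dictionary that remembers the sequence number of the last event it includes. The function names are hypothetical.

```python
import copy


def apply_event(state, event):
    """Process one event: record the latest known location of the ship."""
    _seq, ship, location = event
    state[ship] = location


def take_snapshot(state, last_seq):
    """Copy the state and note the last event folded into it."""
    return {"last_seq": last_seq, "state": copy.deepcopy(state)}


def recover(snapshot, log):
    """Load the snapshot, then replay only the events recorded after it."""
    state = copy.deepcopy(snapshot["state"])
    for event in log:
        if event[0] > snapshot["last_seq"]:
            apply_event(state, event)
    return state
```

Recovering from an empty snapshot with `last_seq` of 0 is the same as a full replay, which gives a simple check that the snapshot is only an optimization: both routes must produce the same state.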
Figure 14.3. Events can be broadcast to multiple display systems.

Event sourcing is also an effective platform for analyzing historic information, since you can replicate any past state from the event log. You can also easily investigate alternative scenarios by introducing hypothetical events into an analysis processor.

Event sourcing does add some complexity; most notably, you have to ensure that all state changes are captured and stored as events. Some architectures and tools can make that inconvenient. Any collaboration with external systems needs to take the event sourcing into account; you’ll need to be careful of external side effects when replaying events to rebuild an application state.
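One common way to handle external side effects during replay is to route them through a gateway that can be switched off while rebuilding. The sketch below assumes that approach; `NotificationGateway` and `ArrivalProcessor` are hypothetical names, and the "external system" is simulated by a list of sent messages.

```python
class NotificationGateway:
    """Wraps an external system; messages are only sent while enabled."""

    def __init__(self):
        self.sent = []
        self.enabled = True

    def notify(self, message):
        if self.enabled:
            self.sent.append(message)


class ArrivalProcessor:
    def __init__(self, gateway):
        self.gateway = gateway
        self.log = []
        self.locations = {}

    def handle(self, ship, port):
        self.log.append((ship, port))
        self._apply(ship, port)

    def _apply(self, ship, port):
        self.locations[ship] = port
        self.gateway.notify(f"{ship} arrived in {port}")

    def rebuild(self):
        """Replay the log with the gateway disabled so nothing is re-sent."""
        self.locations = {}
        self.gateway.enabled = False
        try:
            for ship, port in self.log:
                self._apply(ship, port)
        finally:
            self.gateway.enabled = True
```

The same event-processing code runs in both modes; only the gateway knows whether it is live or replaying.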
14.3. Memory Image

One of the consequences of event sourcing is that the event log becomes the definitive persistent record, but it is not necessary for the application state to be persistent. This opens up the option of keeping the application state in memory using only in-memory data structures. Keeping all your working data in memory provides a performance advantage, since there’s no disk I/O to deal with when an event is processed. It also simplifies programming since there is no need to perform mapping between disk and in-memory data structures.

The obvious limitation here is that you must be able to store all the data you’ll need to access in memory. This is an increasingly viable option: we can remember disk sizes that were considerably less than the current memory sizes. You also need to ensure that you can recover quickly enough from a system crash, either by reloading events from the event log or by running a duplicate system and cutting over.

You’ll need some explicit mechanism to deal with concurrency. One route is a transactional memory system, such as the one that comes with the Clojure language. Another route is to do all input processing on a single thread. Designed carefully, a single-threaded event processor can achieve impressive throughput at low latency [Fowlerlmax].

Breaking the separation between in-memory and persistent data also affects how you handle errors. A common approach is to update a model and roll back any changes should an error occur. With a memory image, you’ll usually not have an automated rollback facility; you either have to write your own (complicated) or ensure that you do thorough validation before you begin to apply any changes.
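The "validate thoroughly, then apply" route can be sketched as follows. The account-transfer domain and the `Accounts` class are hypothetical and chosen only because a partial update would be obviously wrong; the point is that every check runs before the first mutation, so a rejected change leaves the in-memory state untouched without any rollback.

```python
class Accounts:
    """In-memory state with no rollback facility."""

    def __init__(self, balances):
        self.balances = dict(balances)

    def transfer(self, source, target, amount):
        # Phase 1: validate everything; nothing has been modified yet.
        if amount <= 0:
            raise ValueError("amount must be positive")
        if source not in self.balances or target not in self.balances:
            raise KeyError("unknown account")
        if self.balances[source] < amount:
            raise ValueError("insufficient funds")
        # Phase 2: apply; these statements cannot fail after the checks above.
        self.balances[source] -= amount
        self.balances[target] += amount
```

Run on a single thread, as suggested above, the two phases also cannot be interleaved with another change, so the validation results are still true when the update is applied.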
14.4. Version Control

For most software developers, their most common experience of an event-sourced system is a version control system. Version control allows many people on a team to coordinate their modifications of a complex interconnected system, with the ability to explore past states of that system and alternative realities through branching.

When we think of data storage, we tend to think of a single-point-of-time worldview, which is very limiting compared to the complexity supported by a version control system. It’s therefore surprising that data storage tools haven’t borrowed some of the ideas from version control systems. After all, many situations require historic queries and support for multiple views of the world.

Version control systems are built on top of file systems, and thus have many of the same limitations for data storage as a file system. They are not designed for application data storage, so are awkward to use in that context. However, they are worth considering for scenarios where their timeline capabilities are useful.
14.5. XML Databases

Around the turn of the millennium, people seemed to want to use XML for everything, and there was a flurry of interest in databases specifically designed to store and query XML documents. While that flurry had as little impact on the relational dominance as previous blusters, XML databases are still around.

We think of XML databases as document databases where the documents are stored in a data model compatible with XML, and where various XML technologies are used to manipulate the document. You can use various forms of XML schema definitions (DTDs, XML Schema, RelaxNG) to check document formats, run queries with XPath and XQuery, and perform transformations with XSLT.

Relational databases took on XML and blended these XML capabilities with relational ones, usually by embedding XML documents as a column type and allowing some way to blend SQL and XML query languages.

Of course there’s no reason why you can’t use XML as a structuring mechanism within a key-value store. XML is less fashionable these days than JSON, but is equally capable of storing complex aggregates, and XML’s schema and query capabilities are greater than what you can typically get for JSON. Using an XML database means that the database itself is able to take advantage of the XML structure and not just treat the value as a blob, but that advantage needs to be weighed against the other database characteristics.
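The difference between storing XML as a blob and querying its structure can be illustrated with a small sketch. The key-value store is a plain dictionary, the order document and the `skus_over` function are made up for the example, and the query uses the limited XPath subset supported by Python’s `xml.etree.ElementTree` rather than full XPath or XQuery.

```python
import xml.etree.ElementTree as ET

# Stand-in for a key-value store: the value is an opaque XML string.
store = {}
store["order:99"] = (
    '<order id="99">'
    "<customer>Martin</customer>"
    '<item sku="27" price="32.45"/>'
    '<item sku="31" price="9.99"/>'
    "</order>"
)


def skus_over(key, threshold):
    """Fetch the blob, parse it in the application, and query its structure."""
    root = ET.fromstring(store[key])
    return [
        item.get("sku")
        for item in root.findall("./item")
        if float(item.get("price")) > threshold
    ]
```

Here the store contributes nothing beyond lookup by key, and the application does the parsing and filtering. An XML database would evaluate the equivalent query inside the database, which is the advantage the paragraph above asks you to weigh.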
14.6. Object Databases

When object-oriented programming started its rise in popularity, there was a flurry of interest in object-oriented databases. The focus here was the complexity of mapping from in-memory data structures to relational tables. The idea of an object-oriented database is that you avoid this complexity: the database would automatically manage the storage of in-memory structures onto disk. You could think of it as a persistent virtual memory system, allowing you to program with persistence yet without taking any notice of a database at all.

Object databases didn’t take off. One reason was that the benefit of the close integration with the application meant you couldn’t easily access data other than with that application. A shift from integration databases to application databases could well make object databases more viable in the future.

An important issue with object databases is how to deal with migration as the data structures change. Here, the close linkage between the persistent storage and in-memory structures can become a problem. Some object databases include the ability to add migration functions to object definitions.
14.7. Key Points

• NoSQL is just one set of data storage technologies. As we become more comfortable with polyglot persistence, we should consider other data storage technologies whether or not they bear the NoSQL label.
Chapter 15. Choosing Your Database

At this point in the book, we’ve covered a lot of the general issues you need to be aware of to make decisions in the new world of polyglot persistence. It’s now time to talk about choosing your databases for future development work. Naturally, we don’t know your particular circumstances, so we can’t give you your answer, nor can we reduce it to a simple set of rules to follow. Furthermore, it’s still early days in the production use of NoSQL systems, so even what we do know is immature; in a couple of years we may well think differently.

We see two broad reasons to consider a NoSQL database: programmer productivity and data access performance. In different cases these forces may complement or contradict each other. Both of them are difficult to assess early on in a project, which is awkward since your choice of a data storage model is difficult to abstract so as to allow you to change your mind later on.
15.1. Programmer Productivity

Talk to any developer of an enterprise application, and you’ll sense frustration from working with relational databases. Information is usually collected and displayed in terms of aggregates, but it has to be transformed into relations in order to persist it. This chore is easier than it used to be; during the 1990s many projects groaned under the effort of building object-relational mapping layers. By the 2000s, we had seen popular ORM frameworks such as Hibernate, iBATIS, and Rails Active Record that reduce much of that burden. But this has not made the problem go away. ORMs are a leaky abstraction; there are always some cases that need more attention, particularly in order to get decent performance.

In this situation aggregate-oriented databases can offer a tempting deal. We can remove the ORM and persist aggregates naturally as we use them. We’ve come across several projects that claim palpable benefits from moving to an aggregate-oriented solution.

Graph databases offer a different simplification. Relational databases do not do a good job with data that has a lot of relationships. A graph database offers both a more natural storage API for this kind of data and query capabilities designed around these kinds of structures.

All kinds of NoSQL systems are better suited to nonuniform data. If you find yourself struggling with a strong schema in order to support ad-hoc fields, then the schemaless NoSQL databases can offer considerable relief.

These are the major reasons why the programming model of NoSQL databases may improve the productivity of your development team. The first step of assessing this for your circumstances is to look at what your software will need to do. Run through the current features and see if and how the data usage fits. As you do this, you may begin to see that a particular data model seems like a good fit. That closeness of fit suggests that using that model will lead to easier programming.

As you do this, remember that polyglot persistence is about using multiple data storage solutions. It may be that you’ll see different data storage models fit different parts of your data. This would suggest using different databases for different aspects of your data. Using multiple databases is inherently more complex than using a single store, but the advantages of a good fit in each case may be better overall.

As you look at the data model fit, pay particular attention to cases where there is a problem. You may see most of your features will work well with an aggregate, but a few will not. Having a few features that don’t fit the model well isn’t a reason to avoid the model (the difficulties of the bad fit may not overwhelm the advantages of the good fit), but it’s useful to spot and highlight these bad fit cases.

Going through your features and assessing your data needs should lead you to one or more alternatives for how to handle your database needs. This will give you a starting point, but the next step is to try things out by actually building software. Take some initial features and build them, while paying close attention to how straightforward it is to use the technology you’re considering. In this situation, it may be worthwhile to build the same features with a couple of different databases in order to see which works best. People are often reluctant to do this; no one likes to build software that will be discarded. Yet this is an essential way to judge how effective a particular framework is.

Sadly, there is no way to properly measure how productive different designs are. We have no way of properly measuring output. Even if you build exactly the same feature, you can’t truly compare the productivity because knowledge of building it once makes it easier a second time, and you can’t build them simultaneously with identical teams. What you can do is ensure the people who did the work can give an opinion. Most developers can sense when they are more productive in one environment than another. Although this is a subjective judgment, and you may well get disagreements between team members, this is the best judgment you will get. In the end we believe the team doing the work should decide.

When trying out a database to judge productivity, it’s important to also try out some of the bad fit cases we mentioned earlier. That way the team can get a feeling of both the happy path and the difficult one, to gain an overall impression.

This approach has its flaws. Often you can’t get a full appreciation of a technology without spending many months using it, and running an assessment for that long is rarely cost-effective. But like many things in life, we need to make the best assessment we can, knowing its flaws, and go with that. The essential thing here is to base the decision on as much real programming as you can. Even a mere week working with a technology can tell you things you’d never learn from a hundred vendor presentations.
15.2. Data-Access Performance

The concern that led to the growth of NoSQL databases was rapid access to lots of data. As large websites emerged, they wanted to grow horizontally and run on large clusters. They developed the early NoSQL databases to help them run efficiently on such architectures. As other data users follow their lead, again the focus is on accessing data rapidly, often with large volumes involved.

There are many factors that can determine a database’s better performance than the relational default in various circumstances. An aggregate-oriented database may be very fast for reading or retrieving aggregates compared to a relational database where data is spread over many tables. Easier sharding and replication over clusters allows horizontal scaling. A graph database can retrieve highly connected data more quickly than using relational joins.

If you’re investigating NoSQL databases based on performance, the most important thing you must do is to test their performance in the scenarios that matter to you. Reasoning about how a database may perform can help you build a short list, but the only way you can assess performance properly is to build something, run it, and measure it.

When building a performance assessment, the hardest thing is often getting a realistic set of performance tests. You can’t build your actual system, so you need to build a representative subset. It’s important, however, for this subset to be as faithful a representative as possible. It’s no good taking a database that’s intended to serve hundreds of concurrent users and assessing its performance with a single user. You are going to need to build representative loads and data volumes.

Particularly if you are building a public website, it can be difficult to build a high-load test bed. Here, a good argument can be made for using cloud computing resources both to generate load and to build a test cluster. The elastic nature of cloud provisioning is very helpful for short-lived performance assessment work.

You’re not going to be able to test every way in which your application will be used, so you need to build a representative subset. Choose scenarios that are the most common, the most performance-dependent, and those that don’t seem to fit your database model well. The latter may alert you to any risks outside of your main use cases.

Coming up with volumes to test for can be tricky, especially early on in a project when it’s not clear what your production volumes are likely to be. You will have to come up with something to base your thinking on, so be sure to make it explicit and to communicate it with all the stakeholders. Making it explicit reduces the chance that different people have varying ideas on what a “heavy read load” is. It also allows you to spot problems more easily should your later discoveries wander off your original assumptions. Without making your assumptions explicit, it’s easier to drift away from them without realizing you need to redo your test bed as you learn new information.
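One way to keep volume assumptions explicit is to write them down as data that the test bed itself consumes. The sketch below is a toy harness under several assumptions: the `Workload` fields and the function names are hypothetical, the "database" is any object with dictionary-style `get` and item assignment (a plain `dict` in the usage shown), and concurrency is simulated with threads in one process rather than with real clients on a cluster.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """The assumptions behind a trial, stated where everyone can read them."""
    concurrent_users: int
    operations_per_user: int
    reads_per_write: int  # e.g. 9 means nine reads for every write


def simulate_user(store, workload, user_id):
    latencies = []
    for i in range(workload.operations_per_user):
        key = f"user{user_id}:item{i % 10}"
        start = time.perf_counter()
        if i % (workload.reads_per_write + 1) == 0:
            store[key] = i   # write
        else:
            store.get(key)   # read
        latencies.append(time.perf_counter() - start)
    return latencies


def run_trial(store, workload):
    """Run all simulated users concurrently and summarize the latencies."""
    with ThreadPoolExecutor(max_workers=workload.concurrent_users) as pool:
        futures = [
            pool.submit(simulate_user, store, workload, user)
            for user in range(workload.concurrent_users)
        ]
        latencies = [t for future in futures for t in future.result()]
    return {
        "operations": len(latencies),
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
    }
```

A call such as `run_trial({}, Workload(concurrent_users=4, operations_per_user=50, reads_per_write=9))` makes the meaning of a “heavy read load” for that trial visible in one line, and changing an assumption later means changing that line and rerunning.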
15.3. Sticking with the Default

Naturally we think that NoSQL is a viable option in many circumstances; otherwise we wouldn’t have spent several months writing this book. But we also realize that there are many cases, indeed the majority of cases, where you’re better off sticking with the default option of a relational database.

Relational databases are well known; you can easily find people with the experience of using them. They are mature, so you are less likely to run into the rough edges of new technology. There are lots of tools that are built on relational technology that you can take advantage of. You also don’t have to deal with the political issues of making an unusual choice: picking a new technology will always introduce a risk of problems should things run into difficulties.

So, on the whole, we tend to take a view that to choose a NoSQL database you need to show a real advantage over relational databases for your situation. There’s no shame in doing the assessments for programmability and performance, finding no clear advantage, and staying with the relational option. We think there are many cases where it is advantageous to use NoSQL databases, but “many” does not mean “all” or even “most.”
15.4. Hedging Your Bets

One of the greatest difficulties we have in giving advice on choosing a data-storage option is that we don’t have that much data to go on. As we write this, we are only seeing very early adopters discussing their experiences with these technologies, so we don’t have a clear picture of the actual pros and cons.

With the situation this uncertain, there’s more of an argument for encapsulating your database choice: keeping all your database code in a section of your codebase that is relatively easy to replace should you decide to change your database choice later. The classic way to do this is through an explicit data store layer in your application, using patterns such as Data Mapper and Repository [FowlerPoEAA]. Such an encapsulation layer does carry a cost, particularly when you are unsure about using quite different models, such as key-value versus graph data models. Worse still, we don’t have experience yet with encapsulating data layers between these very different kinds of data stores.

On the whole, our advice is to encapsulate as a default strategy, but pay attention to the cost of the insulating layer. If it’s getting too much of a burden, for example by making it harder to use some helpful database features, then it’s a good argument for using the database that has those features. This information may be just what you need to make a database choice and thus eliminate the encapsulation.

This is another argument for decomposing the database layer into services that encapsulate data storage (“Service Usage over Direct Data Store Usage,” p. 136). As well as reducing coupling between various services, this has the additional advantage of making it easier to replace a database should things not work out in the future. This is a plausible approach even if you end up using the same database everywhere; should things go badly, you can gradually swap it out, focusing on the most problematic services first.

This design advice applies just as much if you prefer to stick with a relational option. By encapsulating segments of your database into services, you can replace parts of your data store with a NoSQL technology as it matures and the advantages become clearer.
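As an illustration of an explicit data store layer, the sketch below puts one hypothetical `OrderRepository` interface in front of two interchangeable back ends: a dictionary holding JSON aggregates (standing in for a key-value store) and an in-memory SQLite database (standing in for the relational option). All names are invented for the example, and it assumes every order has at least one item.

```python
import json
import sqlite3
from abc import ABC, abstractmethod


class OrderRepository(ABC):
    """The only persistence interface the rest of the application sees."""

    @abstractmethod
    def save(self, order_id, items): ...

    @abstractmethod
    def find(self, order_id): ...


class KeyValueOrderRepository(OrderRepository):
    """Stores each order as one JSON aggregate under its key."""

    def __init__(self):
        self._store = {}

    def save(self, order_id, items):
        self._store[order_id] = json.dumps(items)

    def find(self, order_id):
        raw = self._store.get(order_id)
        return None if raw is None else json.loads(raw)


class RelationalOrderRepository(OrderRepository):
    """Spreads each order over rows of a table."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE order_items (order_id TEXT, position INTEGER, product TEXT)"
        )

    def save(self, order_id, items):
        with self._db:  # one transaction: replace the order's rows
            self._db.execute("DELETE FROM order_items WHERE order_id = ?", (order_id,))
            self._db.executemany(
                "INSERT INTO order_items VALUES (?, ?, ?)",
                [(order_id, i, product) for i, product in enumerate(items)],
            )

    def find(self, order_id):
        rows = self._db.execute(
            "SELECT product FROM order_items WHERE order_id = ? ORDER BY position",
            (order_id,),
        ).fetchall()
        return [row[0] for row in rows] or None


def order_summary(repo, order_id):
    """Application code: depends on the interface, not on the store behind it."""
    items = repo.find(order_id)
    return "no such order" if items is None else ", ".join(items)
```

`order_summary` works unchanged with either repository, which is the replaceability the encapsulation buys. The cost shows up as soon as the application wants a feature only one back end offers, such as a graph traversal, because the shared interface has no place for it.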
15.5. Key Points

• The two main reasons to use NoSQL technology are:
  • To improve programmer productivity by using a database that better matches an application’s needs.
  • To improve data access performance via some combination of handling larger data volumes, reducing latency, and improving throughput.
• It’s essential to test your expectations about programmer productivity and/or performance before committing to using a NoSQL technology.
• Service encapsulation supports changing data storage technologies as needs and technology evolve. Separating parts of applications into services also allows you to introduce NoSQL into an existing application.
• Most applications, particularly nonstrategic ones, should stick with relational technology, at least until the NoSQL ecosystem becomes more mature.
15.6. Final Thoughts

We hope you’ve found this book enlightening. When we started writing it, we were frustrated by the lack of anything that would give us a broad survey of the NoSQL world. In writing this book we had to make that survey ourselves, and we’ve found it an enjoyable journey. We hope your journey through this material is considerably quicker but no less enjoyable.

At this point you may be considering making use of a NoSQL technology. If so, this book is only an early step in building your understanding. We urge you to download some databases and work with them, for we’re of the firm conviction that you can only understand a technology properly by working with it: finding its strengths and the inevitable gotchas that never make it into the documentation.

We expect that most people, including most readers of this book, will not be using NoSQL for a while. It is a new technology and we are still early in the process of understanding when to use it and how to use it well. But as with anything in the software world, things are changing more rapidly than we dare predict, so do keep an eye on what’s happening in this field.

We hope you’ll also find other books and articles to help you. We think the best material on NoSQL will be written after this book is done, so we can’t point you to anywhere in particular as we write this. We do have an active presence on the Web, so for our more up-to-date thoughts on the NoSQL world take a look at www.sadalage.com and http://martinfowler.com/nosql.html.
Bibliography
[AgileMethods] www.agilealliance.org.
[Amazon’sDynamo] www.allthingsdistributed.com/2007/10/amazons_dynamo.html.
[AmazonDynamoDB] http://aws.amazon.com/dynamodb.
[AmazonSimpleDB] http://aws.amazon.com/simpledb.
[AmblerandSadalage] Ambler, Scott and Pramodkumar Sadalage. Refactoring Databases: Evolutionary Database Design. Addison-Wesley. 2006. ISBN 978-0321293534.
[BerkeleyDB] www.oracle.com/us/products/database/berkeley-db.
[Blueprints] https://github.com/tinkerpop/blueprints/wiki.
[Brewer] Brewer, Eric. Towards Robust Distributed Systems. www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf.
[Cages] http://code.google.com/p/cages.
[Cassandra] http://cassandra.apache.org.
[Changetc.] Chang, Fay, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Bigtable: A Distributed Storage System for Structured Data. http://research.google.com/archive/bigtable-osdi06.pdf.
[CouchDB] http://couchdb.apache.org.
[CQL] www.slideshare.net/jericevans/cql-sql-in-cassandra.
[CQRS] http://martinfowler.com/bliki/CQRS.html.
[C-Store] Stonebraker, Mike, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O’Neil, Pat O’Neil, Alex Rasin, Nga Tran, and Stan Zdonik. C-Store: A Column-oriented DBMS. http://db.csail.mit.edu/projects/cstore/vldb.pdf.
[Cypher] http://docs.neo4j.org/chunked/1.6.1/cypher-query-lang.html.
[Daigneau] Daigneau, Robert. Service Design Patterns. Addison-Wesley. 2012. ISBN 032154420X.
[DBDeploy] http://dbdeploy.com.
[DBMaintain] www.dbmaintain.org.
[DeanandGhemawat] Dean, Jeffrey and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. http://static.usenix.org/event/osdi04/tech/full_papers/dean/dean.pdf.
[Dijkstra’s] http://en.wikipedia.org/wiki/Dijkstra%27s_algorithm.
[Evans] Evans, Eric. Domain-Driven Design. Addison-Wesley. 2004. ISBN 0321125215.
[FlockDB] https://github.com/twitter/flockdb.
[FowlerDSL] Fowler, Martin. Domain-Specific Languages. Addison-Wesley. 2010. ISBN 0321712943.
[Fowlerlmax] Fowler, Martin. The LMAX Architecture. http://martinfowler.com/articles/lmax.html.
[FowlerPoEAA] Fowler, Martin. Patterns of Enterprise Application Architecture. Addison-Wesley. 2003. ISBN 0321127420.
[FowlerUML] Fowler, Martin. UML Distilled. Addison-Wesley. 2003. ISBN 0321193687.
[Gremlin] https://github.com/tinkerpop/gremlin/wiki.
[Hadoop] http://hadoop.apache.org/mapreduce.
[HamsterDB] http://hamsterdb.com.
[Hbase] http://hbase.apache.org.
[Hector] https://github.com/rantav/hector.
[Hive] http://hive.apache.org.
[HohpeandWoolf] Hohpe, Gregor and Bobby Woolf. Enterprise Integration Patterns. Addison-Wesley. 2003. ISBN 0321200683.
[HTTP] Fielding, R., J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee. Hypertext Transfer Protocol -- HTTP/1.1. www.w3.org/Protocols/rfc2616/rfc2616.html.
[Hypertable] http://hypertable.org.
[InfiniteGraph] www.infinitegraph.com.
[JSON] http://json.org.
[LevelDB] http://code.google.com/p/leveldb.
[Liquibase] www.liquibase.org.
[Lucene] http://lucene.apache.org.
[LynchandGilbert] Lynch, Nancy and Seth Gilbert. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf.
[Memcached] http://memcached.org.
[MongoDB] www.mongodb.org.
[Monitoring] www.mongodb.org/display/DOCS/MongoDB+Monitoring+Service.
[MyBatisMigrator] http://mybatis.org.
[Neo4J] http://neo4j.org.
[NoSQLDebrief] http://blog.oskarsson.nu/post/22996140866/nosql-debrief.
[NoSQLMeetup] http://nosql.eventbrite.com.
[NotesStorageFacility] http://en.wikipedia.org/wiki/IBM_Lotus_Domino.
[OpsCenter] www.datastax.com/products/opscenter.
[OrientDB] www.orientdb.org.
[Oskarsson] Private Correspondence.
[Pentaho] www.pentaho.com.
[Pig] http://pig.apache.org.
[Pritchett] www.infoq.com/interviews/dan-pritchett-ebay-architecture.
[ProjectVoldemort] http://project-voldemort.com.
[RavenDB] http://ravendb.net.
[Redis] http://redis.io.
[Rekon] https://github.com/basho/rekon.
[Riak] http://wiki.basho.com/Riak.html.
[Solr] http://lucene.apache.org/solr.
[StrozziNoSQL] www.strozzi.it/cgi-bin/CSA/tw7/I/en_US/NoSQL.
[TanenbaumandVanSteen] Tanenbaum, Andrew and Maarten Van Steen. Distributed Systems. Prentice-Hall. 2007. ISBN 0132392275.
[Terrastore] http://code.google.com/p/terrastore.
[Vogels] Vogels, Werner. Eventually Consistent - Revisited. www.allthingsdistributed.com/2008/12/eventually_consistent.html.
[WebberNeo4JScaling] http://jim.webber.name/2011/03/22/ef4748c3-6459-40b6-bcfa-818960150e0f.aspx.
[ZooKeeper] http://zookeeper.apache.org.
Index
AACID(Atomic,Consistent,Isolated,andDurable)transactions,19
incolumn-familydatabases,109ingraphdatabases,28,50,114–115inrelationaldatabases,10,26vs.BASE,56
adbanners,108–109aggregate-orienteddatabases,14,19–23,147
atomicupdatesin,50,61disadvantagesof,30noACIDtransactionsin,50performanceof,149vs.graphdatabases,28
aggregates,14–23changingstructureof,98,132modeling,31real-timeanalyticswith,33updating,26
agilemethods,123Amazon,9
SeealsoDynamoDB,SimpleDBanalytics
countingwebsitevisitorsfor,108ofhistoricinformation,144real-time,33,98
ApachePiglanguage,76ApacheZooKeeperlibrary,104,115applicationdatabases,7,146
updatingmaterializedviewsin,31arcs(graphdatabases).Seeedgesatomiccross-documentoperations,98atomicrebalancing,58atomictransactions,92,104atomicupdates,50,61automatedfailovers,94automatedmerges,48automatedrollbacks,145auto-sharding,39availability,53
incolumn-familydatabases,104–105indocumentdatabases,93ingraphdatabases,115vs.consistency,54SeealsoCAPtheorem
averages,calculating,72
Bbackwardcompatibility,126,131BASE(BasicallyAvailable,Softstate,Eventualconsistency),56BerkeleyDB,81BigTableDB,9,21–22bit-mappedindexes,106blogging,108Blueprintspropertygraph,115Brewer,Eric,53Brewer ’sConjecture.SeeCAPtheorembuckets(Riak),82
defaultvaluesforconsistencyfor,84domain,83storingalldatatogetherin,82
businesstransactions,61
C
caching
  performance of, 39, 137
  stale data in, 50
Cages library, 104
CAP (Consistency, Availability, and Partition tolerance) theorem, 53–56
  for document databases, 93
  for Riak, 86
CAS (compare-and-set) operations, 62
Cassandra DB, 10, 21–22, 99–109
  availability in, 104–105
  column families in:
    commands for, 105–106
    standard, 101
    super, 101–102
  columns in, 100
    expiring, 108–109
    indexing, 106–107
    reading, 107
    super, 101
  compaction in, 103
  consistency in, 103–104
  ETL tools for, 139
  hinted handoff in, 104
  keyspaces in, 102–104
  memtables in, 103
  queries in, 105–107
  repairs in, 103–104
  replication factor in, 103
  scaling in, 107
  SSTables in, 103
  timestamps in, 100
  transactions in, 104
  wide/skinny rows in, 23
clients, processing on, 67
Clojure language, 145
cloud computing, 149
clumping, 39
clusters, 8–10, 67–72, 76, 149
  in file systems, 8
  in Riak, 87
  resiliency of, 8
column-family databases, 21–23, 99–109
  ACID transactions in, 109
  columns for materialized views in, 31
  combining peer-to-peer replication and sharding in, 43–44
  consistency in, 103–104
  modeling for, 34
  performance in, 103
  schemalessness of, 28
  vs. key-value databases, 21
  wide/skinny rows in, 23
combinable reducers, 70–71
compaction (Cassandra), 103
compatibility, backward, 126, 131
concurrency, 145
  in file systems, 141
  in relational databases, 4
  offline, 62
conditional updates, 48, 62–63
conflicts
  key, 82
  read-write, 49–50
  resolving, 64
  write-write, 47–48, 64
consistency, 47–59
  eventual, 50, 84
  in column-family databases, 103–104
  in graph databases, 114
  in master-slave replication, 52
  in MongoDB, 91
  logical, 50
  optimistic/pessimistic, 48
  read, 49–52, 56
  read-your-writes, 52
  relaxing, 52–56
  replication, 50
  session, 52, 63
  trading off, 57
  update, 47, 56, 61
  vs. availability, 54
  write, 92
  See also CAP theorem
content hashes, 62–63
content management systems, 98, 108
CouchDB, 10, 91
  conditional updates in, 63
  replica sets in, 94
counters, for version stamps, 62–63
CQL (Cassandra Query Language), 10, 106
CQRS (Command Query Responsibility Segregation), 143
cross-document operations, 98
C-Store DB, 21
Cypher language, 115–119
D
Data Mapper and Repository pattern, 151
data models, 13, 25
  aggregate-oriented, 14–23, 30
  document, 20
  key-value, 20
  relational, 13–14
data redundancy, 94
databases
  choosing, 7, 147–152
  deploying, 139
  encapsulating in explicit layer, 151
  NoSQL, definition of, 10–11
  shared integration of, 4, 6
Datastax Ops Center, 139
DBDeploy framework, 125
DBMaintain tool, 126
deadlocks, 48
demo access, 108
Dependency Network pattern, 77
deployment complexity, 139
Dijkstra’s algorithm, 118
disaster recovery, 94
distributed file systems, 76, 141
distributed version control systems, 48
  version stamps in, 64
distribution models, 37–43
  See also replications, sharding, single server approach
document databases, 20, 23, 89–98
  availability in, 93
  embedding child documents into, 90
  indexes in, 25
  master-slave replication in, 93
  performance in, 91
  queries in, 25, 94–95
  replica sets in, 94
  scaling in, 95
  schemalessness of, 28, 98
  XML support in, 146
domain buckets (Riak), 83
Domain-Driven Design, 14
DTDs (Document Type Definitions), 146
durability, 56–57
DynamoDB, 9, 81, 100
  shopping carts in, 55
Dynomite DB, 10
E
early prototypes, 109
e-commerce
  data modeling for, 14
  flexible schemas for, 98
  polyglot persistence of, 133–138
  shopping carts in, 55, 85, 87
edges (graph databases), 26, 111
eligibility rules, 26
enterprises
  commercial support of NoSQL for, 138–139
  concurrency in, 4
  DB as backing store for, 4
  event logging in, 97
  integration in, 4
  polyglot persistence in, 138–139
  security of data in, 139
error handling, 4, 145
etags, 62
ETL tools, 139
Evans, Eric, 10
event logging, 97, 107–108
event sourcing, 138, 142, 144
eventual consistency, 50
  in Riak, 84
expiring usage, 108–109
F
failovers, automated, 94
file systems, 141
  as backing store for RDBMS, 3
  cluster-aware, 8
  concurrency in, 141
  distributed, 76, 141
  performance of, 141
  queries in, 141
FlockDB, 113
  data model of, 27
  node distribution in, 115
G
Gilbert, Seth, 53
Google, 9
Google BigTable. See BigTable
Google File System, 141
graph databases, 26–28, 111–121, 148
  ACID transactions in, 28, 50, 114–115
  aggregate-ignorance of, 19
  availability in, 115
  consistency in, 114
  creating, 113
  edges (arcs) in, 26, 111
  held entirely in memory, 119
  master-slave replication in, 115
  migrations in, 131
  modeling for, 35
  nodes in, 26, 111–117
  performance of, 149
  properties in, 111
  queries in, 115–119
  relationships in, 111–121
  scaling in, 119
  schemalessness of, 28
  single server configuration of, 38
  traversing, 111–117
  vs. aggregate databases, 28
  vs. relational databases, 27, 112
  wrapping into service, 136
Gremlin language, 115
GUID (Globally Unique Identifier), 62
H
Hadoop project, 67, 76, 141
HamsterDB, 81
hash tables, 62–63, 81
HBase DB, 10, 21–22, 99–100
Hector client, 105
Hibernate framework, 5, 147
hinted handoff, 104
Hive DB, 76
hot backup, 40, 42
hotel booking, 4, 55
HTTP (Hypertext Transfer Protocol), 7
  interfaces based on, 85
  updating with, 62
Hypertable DB, 10, 99–100
I
iBATIS, 5, 147
impedance mismatch, 5, 12
inconsistency
  in shopping carts, 55
  of reads, 49
  of updates, 56
  window of, 50–51, 56
indexes
  bit-mapped, 106
  in document databases, 25
  stale data in, 138
  updating, 138
InfiniteGraph DB, 113
  data model of, 27
  node distribution in, 114–115
initial tech spikes, 109
integration databases, 6, 11
interoperability, 7
J
JSON (JavaScript Object Notation), 7, 94–95, 146
K
keys (key-value databases)
  composite, 74
  conflicts of, 82
  designing, 85
  expiring, 85
  grouping into partitions, 70
keyspaces (Cassandra), 102–104
key-value databases, 20, 23, 81–88
  consistency of, 83–84
  modeling for, 31–33
  no multiple key operations in, 88
  schemalessness of, 28
  sharding in, 86
  structure of values in, 86
  transactions in, 84, 88
  vs. column-family databases, 21
  XML support in, 146
L
Liquibase tool, 126
location-based services, 120
locks
  dead, 48
  offline, 52
lost updates, 47
Lotus DB, 91
Lucene library, 85, 88, 116
Lynch, Nancy, 53
M
MapReduce framework, 67
map-reduce pattern, 67–77
  calculations with, 72
  incremental, 31, 76–77
  maps in, 68
  materialized views in, 76
  partitions in, 70
  reusing intermediate outputs in, 76
  stages for, 73–76
master-slave replication, 40–42
  appointing masters in, 41, 57
  combining with sharding, 43
  consistency of, 52
  in document databases, 93
  in graph databases, 115
  version stamps in, 63
materialized views, 30
  in map-reduce, 76
  updating, 31
Memcached DB, 81, 87
memory images, 144–145
memtables (Cassandra), 103
merges, automated, 48
Microsoft SQL Server, 8
migrations, 123–132
  during development, 124, 126
  in graph databases, 131
  in legacy projects, 126–128
  in object-oriented databases, 146
  in schemaless databases, 128–132
  incremental, 130
  transition phase of, 126–128
mobile apps, 131
MongoDB, 10, 91–97
  collections in, 91
  consistency in, 91
  databases in, 91
  ETL tools for, 139
  queries in, 94–95
  replica sets in, 91, 93, 96
  schema migrations in, 128–131
  sharding in, 96
  slaveOk parameter in, 91–92, 96
  terminology in, 89
  WriteConcern parameter in, 92
MongoDB Monitoring Service, 139
MyBatis Migrator tool, 126
MySQL DB, 53, 119
N
Neo4J DB, 113–118
  ACID transactions in, 114–115
  availability in, 115
  creating graphs in, 113
  data model of, 27
  replicated slaves in, 115
  service wrapping in, 136
nodes (graph databases), 26, 111
  distributed storage for, 114
  finding paths between, 117
  indexing properties of, 115–116
nonuniform data, 10, 28, 30
NoSQL databases
  advantages of, 12
  definition of, 10–11
  lack of support for transactions in, 10, 61
  running of clusters, 10
  schemalessness of, 10
O
object-oriented databases, 5, 146
  migrations in, 146
  vs. relational databases, 6
offline concurrency, 62
offline locks, 52
Optimistic Offline Lock, 62
Oracle DB
  redo log in, 104
  terminology in, 81, 89
Oracle RAC DB, 8
OrientDB, 91, 113
ORM (Object-Relational Mapping) frameworks, 5–6, 147
Oskarsson, Johan, 9
P
partition tolerance, 53–54
  See also CAP theorem
partitioning, 69–70
peer-to-peer replication, 42–43
  durability of, 58
  inconsistency of, 43
  version stamps in, 63–64
Pentaho tool, 139
performance
  and sharding, 39
  and transactions, 53
  binary protocols for, 7
  caching for, 39, 137
  data-access, 149–150
  in aggregate-oriented databases, 149
  in column-family databases, 103
  in document databases, 91
  in graph databases, 149
  responsiveness of, 48
  tests for, 149
pipes-and-filters approach, 73
polyglot persistence, 11, 133–139, 148
  and deployment complexity, 139
  in enterprises, 138–139
polyglot programming, 133–134
processing, on clients/servers, 67
programmer productivity, 147–149
purchase orders, 25
Q
queries
  against varying aggregate structure, 98
  by data, 88, 94
  by key, 84–86
  for files, 141
  in column-family databases, 105–107
  in document databases, 25, 94–95
  in graph databases, 115–119
  precomputed and cached, 31
  via views, 94
quorums, 57, 59
  read, 58
  write, 58, 84
R
Rails Active Record framework, 147
RavenDB, 91
  atomic cross-document operations in, 98
  replica sets in, 94
  transactions in, 92
RDBMS. See relational databases
reads
  consistency of, 49–52, 56, 58
  horizontal scaling for, 94, 96
  inconsistent, 49
  multiple nodes for, 143
  performance of, 52
  quorums of, 58
  repairs of, 103
  resilience of, 40–41
  separating from writes, 41
  stale, 56
read-write conflicts, 49–50
read-your-writes consistency, 52
Real Time Analytics, 33
Real Time BI, 33
rebalancing, atomic, 58
recommendation engines, 26, 35, 121, 138
Redis DB, 81–83
redo log, 104
reduce functions, 69
  combinable, 70–71
regions. See map-reduce pattern, partitions in
Rekon browser for Riak, 139
relational databases (RDBMS), 13, 17
  advantages of, 3–5, 7–8, 150
  aggregate-ignorance of, 19
  backing store in, 3
  clustered, 8
  columns in, 13, 90
  concurrency in, 4
  defining schemas for, 28
  impedance mismatch in, 5, 12
  licensing costs of, 8
  main memory in, 3
  modifying multiple records at once in, 26
  partitions in, 96
  persistence in, 3
  relations (tables) in, 5, 13
  schemas for, 29–30, 123–128
  security in, 7
  sharding in, 8
  simplicity of relationships in, 112
  strong consistency of, 47
  terminology in, 81, 89
  transactions in, 4, 26, 92
  tuples (rows) in, 5, 13–14
  views in, 30
  vs. graph databases, 27, 112
  vs. object-oriented databases, 6
  XML support in, 146
relationships, 25, 111–121
  dangling, 114
  direction of, 113, 116, 118
  in RDBMS, 112
  properties of, 113–115
  traversing, 111–117
RelaxNG, 146
replica sets, 91, 93, 96
replication factor, 58
  in column-family databases, 103
  in Riak, 84
replications, 37
  combining with sharding, 43
  consistency of, 42, 50
  durability of, 57
  over clusters, 149
  performance of, 39
  version stamps in, 63–64
  See also master-slave replication, peer-to-peer replication
resilience
  and sharding, 39
  read, 40–41
responsiveness, 48
Riak DB, 81–83
  clusters in, 87
  controlling CAP in, 86
  eventual consistency in, 84
  HTTP-based interface of, 85
  link-walking in, 25
  partial retrieval in, 25
  replication factor in, 84
  service wrapping in, 136
  terminology in, 81
  transactions in, 84
  write tolerance of, 84
Riak Search, 85, 88
rich domain model, 113
rollbacks, automated, 145
routing, 120
rows (RDBMS). See tuples
S
scaffolding code, 126
scaling, 95
  horizontal, 149
    for reads, 94, 96
    for writes, 96
  in column-family databases, 107
  in document databases, 95
  in graph databases, 119
  vertical, 8
Scatter-Gather pattern, 67
schemaless databases, 28–30, 148
  implicit schema of, 29
  schema changes in, 128–132
schemas
  backward compatibility of, 126, 131
  changing, 128–132
  during development, 124, 126
  implicit, 29
  migrations of, 123–132
search engines, 138
security, 139
servers
  maintenance of, 94
  processing on, 67
service-oriented architecture, 7
services, 136
  and security, 139
  decomposing database layer into, 151
  decoupling between databases and, 7
  over HTTP, 7
sessions
  affinity, 52
  consistency of, 52, 63
  expire keys for, 85
  management of, 133
  sticky, 52
  storing, 57, 87
sharding, 37–38, 40, 149
  and performance, 39
  and resilience, 39
  auto, 39
  by customer location, 97
  combining with replication, 43
  in key-value databases, 86
  in MongoDB, 96
  in relational databases, 8
shared database integration, 4, 6
shopping carts
  expire keys for, 85
  inconsistency in, 55
  persistence of, 133
  storing, 87
shuffling, 70
SimpleDB, 99
  inconsistency window of, 50
single server approach, 37–38
  consistency of, 53
  no partition tolerance in, 54
  transactions in, 53
  version stamps in, 63
single-threaded event processors, 145
snapshots, 142–143
social networks, 26, 120
  relationships between nodes in, 117
Solr indexing engine, 88, 137, 141
split brain situation, 53
SQL (Structured Query Language), 5
SSTables (Cassandra), 103
stale data
  in cache, 50
  in indexes/search engines, 138
  reading, 56
standard column families (Cassandra), 101
sticky sessions, 52
storage models, 13
Strozzi, Carlo, 9
super column families (Cassandra), 101–102
super columns (Cassandra), 101
system transactions, 61
T
tables. See relational databases, relations in
telemetric data from physical devices, 57
Terrastore DB, 91, 94
timestamps
  consistent notion of time for, 64
  in column-family databases, 100
  of last update, 63
transactional memory systems, 145
transactions, 50
  ACID, 10, 19, 26, 28, 50, 56, 109, 114–115
  across multiple operations, 92
  and performance, 53
  atomic, 92, 104
  business, 61
  in graph databases, 28, 114–115
  in key-value databases, 84, 88
  in RDBMS, 4, 26, 92
  in single server systems, 53
  lack of support in NoSQL for, 10, 61
  multioperation, 88
  open during user interaction, 52
  rolling back, 4
  system, 61
tree structures, 117
triggers, 126
TTL (Time To Live), 108–109
tuples (RDBMS), 5, 13–14
U
updates
  atomic, 50, 61
  conditional, 48, 62–63
  consistency of, 47, 56, 61
  lost, 47
  merging, 48
  timestamps of, 63–64
user comments, 98
user preferences, 87
user profiles, 87, 98
user registrations, 98
user sessions, 57
V
vector clock, 64
version control systems, 126, 145
  distributed, 48, 64
version stamps, 52, 61–64
version vector, 64
views, 126
virtual columns, 126
Voldemort DB, 10, 82
W
web services, 7
websites
  distributing pages for, 39
  on large clusters, 149
  publishing, 98
  visitor counters for, 108
word processors, 3
write tolerance, 84
writes, 64
  atomic, 104
  conflicts of, 47–48
  consistency of, 92
  horizontal scaling for, 96
  performance of, 91
  quorums of, 58
  separating from reads, 41
  serializing, 47
X
XML (Extensible Markup Language), 7, 146
XML databases, 145–146
XML Schema language, 146
XPath language, 146
XQuery language, 146
XSLT (Extensible Stylesheet Language Transformations), 146
Z
ZooKeeper. See Apache ZooKeeper