CompSci 516DataIntensiveComputingSystems
Lecture18NoSQLand
ColumnStore
Instructor:Sudeepa Roy
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems
1
Announcements
• HW3(lastHW)hasbeenpostedonSakai
• SameproblemsasinHW1butinMongoDB(NOSQL)
• Dueintwoweeksaftertoday’slecture(~11/16)
2DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems
ReadingMaterialNOSQL:• “ScalableSQLandNoSQLDataStores”RickCattell,SIGMODRecord,December2010(Vol.39,No.4)• seewebpagehttp://cattell.net/datastores/ forupdatesandmorepointers
ColumnStore:• D.Abadi,P.Boncz,S.Harizopoulos,S.Idreos andS.Madden.TheDesignandImplementationof
ModernColumn-OrientedDatabaseSystems.FoundationsandTrendsinDatabases,vol.5,no.3,pp.197–280,2012.
• SeeVLDB2009tutorial:http://nms.csail.mit.edu/~stavros/pubs/tutorial2009-column_stores.pdf
Optional:• “Dynamo:Amazon’sHighlyAvailableKey-valueStore”ByGiuseppeDeCandia et.al.SOSP
2007
• “Bigtable:ADistributedStorageSystemforStructuredData”FayChanget.al.OSDI2006
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 3
NoSQL
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 4
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 5
Sofar-- RDBMS
• RelationalDataModel• RelationalDatabaseSystems(RDBMS)• RDBMSshave– acompletepre-definedfixedschema– aSQLinterface– andACIDtransactions
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 6
Today• NoSQL:”new”databasesystems– nottypicallyRDBMS– relaxonsomerequirements,gainefficiencyandscalability
• Newsystemschoosetouse/notuseseveralconceptswelearntsofar– e.g.SystemXdoesnotuselocksbutusemulti-versionCC(MVCC)or,
– SystemYusesasynchronousreplication• therefore,itisimportanttounderstandthebasics(Lectures1-17)eveniftheyarenotusedinsomenewsystems!
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 7
Warnings!
• MaterialfromCattell’spaper(2010-11)–someinfowillbeoutdated– seewebpagehttp://cattell.net/datastores/ forupdatesandmorepointers
• WewillfocusonthebasicideasofNoSQLsystems
• Optional readingslidesattheend– therearealsocomparisontablesintheCattell’spaperifyouareinterested
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 8
OLAPvs.OLTP
• OLTP(OnLine TransactionProcessing)– Recalltransactions!– Multipleconcurrentread-writerequests– Commercialapplications(banking,onlineshopping)– Datachangesfrequently– ACIDproperties,concurrencycontrol,recovery
• OLAP(OnLine AnalyticalProcessing)– Manyaggregate/group-byqueries– multidimensionaldata– Datamostlystatic– WillstudyOLAPCubesoon
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 9
NewSystems• WewillexamineanumberofSQLandso- called“NoSQL”systemsor“datastores”
• DesignedtoscalesimpleOLTP-styleapplicationloads– todoupdatesaswellasreads– incontrasttotraditionalDBMSsanddatawarehouses– toprovidegoodhorizontalscalabilityforsimpleread/writedatabaseoperationsdistributedovermanyservers
• OriginallymotivatedbyWeb2.0applications– thesesystemsaredesignedtoscaletothousandsormillionsofusers
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 10
NewSystemsvs.RDMS• Whenyoustudyanewsystem,compareitwithRDBMS-sonits– datamodel– consistencymechanisms– storagemechanisms– durabilityguarantees– availability– querysupport
• Thesesystemstypicallysacrificesomeofthesedimensions– e.g.database-widetransactionconsistency,inordertoachieveothers,e.g.higheravailabilityandscalability
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 11
NoSQL
• Manyofthenewsystemsarereferredtoas“NoSQL”datastores
• NoSQLstandsfor“NotOnlySQL”or“NotRelational”– notentirelyagreedupon
• Next:sixkeyfeaturesofNoSQLsystems
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 12
NoSQL:SixKeyFeatures
1. theabilitytohorizontallyscale“simpleoperations”throughputovermanyservers
2. theabilitytoreplicateandtodistribute(partition)dataovermanyservers
3. asimplecalllevelinterfaceorprotocol(incontrasttoSQLbinding)
4. aweakerconcurrencymodelthantheACIDtransactionsofmostrelational(SQL)databasesystems
5. efficientuseofdistributedindexesandRAMfordatastorage6. theabilitytodynamicallyaddnewattributestodatarecords
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 13
ImportantExamplesofNewSystems
• Threesystemsprovideda“proofofconcept”andinspiredmanyotherdatastores
1. Memcached2. Amazon’sDynamo3. Google’sBigTable
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 14
1.Memcached:mainfeatures
• popularopensourcecache
• supportsdistributedhashing(later)
• demonstratedthatin-memoryindexes canbehighlyscalable,distributing andreplicatingobjectsovermultiplenodes
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 15
2.Dynamo:mainfeatures
• pioneeredtheideaofeventualconsistencyasawaytoachievehigheravailabilityandscalability
• datafetchedarenotguaranteedtobeup-to-date
• butupdatesareguaranteedtobepropagatedtoallnodeseventually
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 16
3.BigTable :mainfeatures
• demonstratedthatpersistentrecordstoragecouldbescaledtothousandsofnodes
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 17
BASE(notACIDJ)
• RecallACIDforRDBMSdesiredpropertiesoftransactions:– Atomicity,Consistency,Isolation,andDurability
• NOSQLsystemstypicallydonotprovideACID
• BasicallyAvailable• Softstate• Eventuallyconsistent
DukeCS,Fall2016 CompSci 516:DataIntensiveComputingSystems 18
ACIDvs.BASE• TheideaisthatbygivingupACIDconstraints,onecanachievemuchhigherperformanceandscalability
• Thesystemsdifferinhowmuchtheygiveup– e.g.mostofthesystemscallthemselves“eventuallyconsistent”,meaningthatupdatesareeventuallypropagatedtoallnodes
– butmanyofthemprovidemechanismsforsomedegreeofconsistency,suchasmulti-versionconcurrencycontrol(MVCC)
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 19
“CAP”Theorem
• OftenEricBrewer’sCAPtheoremcitedforNoSQL
• A systemcanhaveonlytwooutofthreeofthefollowingproperties:– Consistency,– Availability– Partition-tolerance
• TheNoSQLsystemsgenerallygiveupconsistency– However,thetrade-offsarecomplex
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 20
TwofociforNoSQLsystems
1. “Simple”operations
2. HorizontalScalability
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 21
1.“Simple”Operations
• Readingorwritingasmallnumberofrelatedrecordsineachoperation– e.g.keylookups– readsandwritesofonerecordorasmallnumberofrecords
• Thisisincontrasttocomplexqueries,joins,orread-mostlyaccess
• Inspiredbyweb,wheremillionsofusersmaybothreadandwritedatainsimpleoperations– e.g.searchandupdatemulti-serverdatabasesofelectronic
mail,personalprofiles,webpostings,wikis,customerrecords,onlinedatingrecords,classifiedads,andmanyotherkindsofdata
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 22
2.HorizontalScalability
• Shared-NothingHorizontalScaling
• Theabilitytodistributeboththedataandtheloadofthesesimpleoperationsovermanyservers– withnoRAMordisksharedamongtheservers
• Not“vertical”scaling– whereadatabasesystemutilizesmanycoresand/orCPUsthatshareRAManddisks
• Someofthesystemswedescribeprovidebothverticalandhorizontalscalability
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 23
2.Horizontalvs.VerticalScaling
• Effectiveuseofmultiplecores(verticalscaling)isimportant– butthenumberofcoresthatcansharememoryislimited
• horizontalscalinggenerallyislessexpensive– canusecommodityservers
• Note:horizontalandverticalpartitioningarenotrelatedtohorizontalandverticalscaling– exceptthattheyarebothusefulforhorizontalscaling(Lecture17)
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 24
WhatisdifferentinNOSQLsystems
• WhenyoustudyanewNOSQLsystem,noticehowitdiffersfromRDBMSintermsof
1. ConcurrencyControl2. DataStorageMedium3. Replication4. Transactions
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 25
ChoicesinNOSQLsystems:1.ConcurrencyControl
a) Locks– somesystemsprovideone-user-at-a-timereadorupdatelocks– MongoDBprovideslockingatafieldlevel
b) MVCCc) None– donotprovideatomicity– multipleuserscaneditinparallel– noguaranteewhichversionyouwillread
d) ACID– pre-analyzetransactionstoavoidconflicts– nodeadlocksandnowaitsonlocks
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 26
ChoicesinNOSQLsystems:2.DataStorageMedium
a) StorageinRAM– snapshotsorreplicationtodisk– poorperformancewhenoverflowsRAM
b) Diskstorage– cachinginRAM
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 27
ChoicesinNOSQLsystems:3.Replication
• whethermirrorcopiesarealwaysinsynca) Synchronousb) Asynchronous– faster,butupdatesmaybelostinacrash
c) Both– localcopiessynchronously,remotecopies
asynchronously
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 28
ChoicesinNOSQLsystems:4.TransactionMechanisms
a) supportb) donotsupportc) inbetween– supportlocaltransactionsonlywithinasingle
objector“shard”– shard=ahorizontalpartitionofdataina
database
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 29
ComparisonfromCattell’spaper(2011)
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 30
DataModelTerminologyforNoSQL
• UnlikeSQL/RDBMS,theterminologyforNoSQLisofteninconsistent– wearefollowingnotationsinCattell’spaper
• Allsystemsprovideawaytostorescalarvalues– e.g.numbersandstrings
• Someofthemalsoprovideawaytostoremorecomplexnestedorreferencevalues
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 31
DataModelTerminologyforNoSQL
• Thesystemsallstoresetsofattribute-valuepairs– butusefourdifferentdatastructures
1. Tuple2. Document3. ExtensibleRecord4. Object
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 32
1.Tuple
• Sameasbefore• A“tuple”isarowinarelationaltable– attributenamesarepre-definedinaschema– thevaluesmustbescalar– thevaluesarereferencedbyattributename– incontrasttoanarrayorlist,wheretheyarereferencedbyordinalposition
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 33
2.Document
• Allowsvaluestobenesteddocumentsorlistsaswellasscalarvalues
• Theattributenamesaredynamicallydefinedforeachdocumentatruntime
• Adocumentdiffersfromatupleinthattheattributesarenotdefinedinaglobalschema– anda widerrangeofvaluesarepermitted
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 34
3.ExtensibleRecord
• A hybrid betweenatupleandadocument• familiesofattributesaredefinedinaschema• butnewattributescanbeadded(withinanattributefamily)onaper-recordbasis
• Attributesmaybelist-valued
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 35
4.Object
• Analogoustoanobjectinprogramminglanguages– butwithouttheproceduralmethods
• Valuesmaybereferencesornestedobjects
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 36
DataStoreCategories• Thedatastoresaregroupedaccordingtotheirdatamodel• Key-valueStores:
– storevaluesandanindextofindthem– basedonaprogrammer- definedkey
• DocumentStores:– storedocuments– Thedocumentsareindexedandasimplequerymechanismis
provided• ExtensibleRecordStores:
– storeextensiblerecordsthatcanbepartitionedverticallyandhorizontallyacrossnodes
– Somepaperscallthese“widecolumnstores”• RelationalDatabases:
– store(andindexandquery)tuples– e.g.thenewRDBMSsthatprovidehorizontalscaling
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 37
ExampleNOSQLsystems
• Key-valueStores:– ProjectVoldemort,Riak,Redis,Scalaris,TokyoCabinet,Memcached/Membrain/Membase
• DocumentStores:– AmazonSimpleDB,CouchDB,MongoDB,Terrastore
• ExtensibleRecordStores:– Hbase,HyperTable,Cassandra,Yahoo’sPNUTS
• RelationalDatabases:– MySQLCluster,VoltDB,Clustrix,ScaleDB,ScaleBase,NimbusDB,GoogleMegastore(alayeronBigTable)
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 38
Key-valuestore:1/2• Thesimplestdatastores• datamodelsimilartothememcached distributedin-memorycache– withasinglekey-valueindexforallthedata– doesnotprovidesecondaryindicesorkeys
• butunlikememcached,generallyprovide– apersistencemechanism– additionalfunctionalitylike replication,versioning,locking,transactions,sorting,etc
• Theclientinterfaceprovidesinserts,deletes,andindexlookups
DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems 39
Key-valuestore:2/2• Allkey-valuestoresprovidescalabilitythroughkey
distributionovernodes• Voldemort,Riak,TokyoCabinet,andenhancedmemcached
systemscanstoredatainRAMorondisk– TheothersstoredatainRAM,andprovidediskasbackup,orrely
onreplicationandrecoverysothatabackupisnotneeded• Scalaris andenhancedmemcached systemsusesynchronous
replication– therestuseasynchronous
• Scalaris andTokyoCabinetimplementtransactions– theothersdonot.
• VoldemortandRiak usemulti-versionconcurrencycontrol– theothersuselocks
DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems 40
UseCase:Key-valuestore
• ifyouhaveasimpleapplicationwithonlyonekindofobject,andyou onlyneedtolookupobjectsupbasedononeattribute
• Supposeyouhaveawebapplication– thatdoesmanyRDBMSqueriestocreateatailoredpagewhenauserlogsin
– Supposeittakesseveralsecondstoexecutethosequeries,andtheuser’sdataisrarelychanged
– youmightwanttostoretheuser’stailoredpageasasingleobjectinakey-valuestore
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 41
Documentstore:1/3• Documentstoressupportmorecomplexdatathanthekey-valuestores
• “documentstore”maybeconfusing– thesesystemscouldstore“documents”inthetraditionalsense(articles,MicrosoftWordfiles,etc.)
– butadocumentinthesesystemscanbeanykindof“pointerless object”
• Unlikethekey-valuestores,thesesystemsgenerallysupport– secondaryindexes– multipletypesofdocuments(objects)perdatabase,and– nesteddocumentsorlists
• LikeotherNoSQLsystems,thedocumentstoresdonotprovideACIDtransactionalproperties
DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems 42
Documentstore:2/3• Thedocumentstoresareschema-less,exceptfor
– attributes(whicharesimplyaname,andarenotpre- specified)– collections(whicharesimplyagroupingofdocuments),and– indexesdefinedoncollections(explicitlydefined,exceptinSimpleDB)– Therearesomedifferencesintheirdatamodels,e.g.SimpleDB does
notallownesteddocuments
• Thedocumentstoresareverysimilarbutusedifferentterminology– e.g.aSimpleDB Domain=CouchDB Database=MongoDBCollection
(=Terrastore Bucket)– SimpleDB callsdocuments“items”– anattributeisafieldinCouchDB,orakeyinMongoDB(orTerrastore)
DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems 43
Documentstore:3/3• Unlikethekey-valuestores,thedocumentstores“typically”provideamechanismtoquerycollectionsbasedonmultipleattributevalueconstraints
• donotprovideexplicitlocks– haveweakerconcurrencyandatomicitypropertiesthantraditionalACID-compliantdatabases
• Documentscanbedistributedovernodesinallofthesystems– Allofthesystemscanachievescalabilitybyreading(potentially)out-of-datereplicas
DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems 44
Usecase:DocumentStore
• applicationwithmultipledifferentkindsofobjects– e.g.inaDepartmentofMotorVehiclesapplication,withvehiclesanddrivers
• whereyouneedtolookupobjectsbasedonmultiplefields– e.g.,adriver’sname,licensenumber,ownedvehicle,orbirthdate
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 45
ExtensibleRecordStores:1/1
• MotivatedbyGoogle’ssuccesswithBigTable– stilltherecentextensiblerecordstorescannotcomeclosetoBigTable’s
scalability• Basicdatamodelisrowsandcolumns• Basicscalabilitymodelissplittingbothrowsandcolumnsover
multiplenodes• Rowsaresplitacrossnodesthroughsharding ontheprimarykey
– Theytypicallysplitbyrangeratherthanahashfunction• Columnsofatablearedistributedovermultiplenodesbyusing
“columngroups”– awayforthecustomertoindicatewhichcolumnsarebeststored
together• Bothhorizontalandverticalpartitioningcanbeusedsimultaneously
onthesametable
DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems 46
Usecase:ExtensibleRecordStore• usescasessimilartothosefordocumentstores:
– multiplekindsofobjects,withlookupsbasedonanyfield.
• However,aimedathigherthroughput,andmayprovidestrongerconcurrencyguarantees,– atthecostofslightlymorecomplexity thanthedocumentstores
• SupposestoringcustomerinformationforaneBay-styleapplication,andyouwanttopartitionyourdatabothhorizontallyandvertically:– clustercustomersbycountry,sothatyoucanefficientlysearchallof
thecustomersinonecountry– separatetherarely-changed“core”customerinformationsuchas
customeraddressesandemailaddressesinoneplace,and– putcertainfrequently-updatedcustomerinformation(suchascurrent
bidsinprogress)inadifferentplace,toimproveperformanceDukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 47
ScalableRDBMS:1/1
• SomeRDBMSsareexpectedtoprovidescalabilitycomparablewithNoSQLdatastores
• But,withtwoprovisos:– Usesmall-scopeoperations: Operationsthatspanmanynodes,
e.g.joinsovermanytables,willnotscalewellwithsharding– Usesmall-scopetransactions:Likewise,transactionsthatspan
manynodesaregoingtobeveryinefficient,withthecommunicationand2PCoverhead
• TypicalNOSQLsystemsmakethesetwoimpossible• ScalableRDBMSallowsthem,butpenalizesacustomerfor
theseoperations• Havehigher-levelSQLlanguageandACIDproperties
– butpayapricewhentheyspannodes
DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems 48
Usecase:ScalableRDBMS
• Ifyourapplicationrequiresmanytableswithdifferenttypesofdata– arelationalschemacentralizesandsimplifiesdatadefinitionandSQLsimplifiesoperations
– orforprojectswithmanyprogrammers• However,moreusefuliftheapplicationdoesnotrequire– updatesorjoinsthatspanmanynodes– transactioncoordination– or,datamovement
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 49
ConsistentHashing
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 50
inDynamoDB
ConsistentHashing(CH)• Recalldynamichashingschemes• Ifthe#ofslots(directorysize)changes,thenalmostallkeyshadtoberemapped
• Inconsistenthashing(CH),with#keys=Kand#slots=N,onlyK/Nkeysneedtoberemappedonaverage
• AppliestothedesignofDistributedHashTable(DHTs)forUniformLoadDistribution– partitionakeyspace amongasetofsites/nodes– additionallyprovideanoverlaynetworkthatconnectsnodessuchthatthenodesresponsibleforanykeycanbeefficientlylocated
DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems 51
DynamoDB :CH1/2• [ref.theDynamoDB paper,sec4.3]• Mustscaleincrementally• Consistenthashingisusedtodynamicallydistribute
dataarounda“ring”ofnodes(=sites)• Theoutputofahashfunctionistreatedasacircular
ring• Eachnodeisassignedarandomvalueinthisspace
– representsthe“position”onthering
DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems 52
• Dataitemidentifiedbyakey• Assigntoanodebyhashing
thekeyto
DynamoDB :CH2/2• Dataitemidentifiedbyakey• Assigntoanodebyhashingthekeytoyielditsposition
onthering• Walktheringclockwisetofindthefirstnodewitha
positionlargerthantheitem’sposition• Eachnodeisresponsiblefortheregioninthering
betweenitanditspredecessornodeonthering
DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems 53
• Note:• departureorarrivalofanodeonly
affectsitsimmediateneighbor• Theothernodesremainunaffected• K/Nonaverage!
DynamoDB :ChallengesinCH• However,thisbasicCHalgorithmposessomechallenges1. Randompositionassignmentofeachnodeonthering
leadstonon-uniformdataandloaddistribution2. Thebasicalgorithmisoblivioustotheheterogeneityinthe
performanceofthenodes
• Solution:DynamousesavariantofCH
DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems 54
• Eachnodegetsassignedtomultiplepointsinthering
• called“virtualnode”• onenodetakescareofmultiple
virtualnodes
DynamoDB:VirtualNodes• Usingvirtualnodeshasadvantages
1. Ifanodebecomesunavailable(duetofailuresorroutinemaintenance),theloadhandledbythisnodeisevenlydispersedacrosstheremainingavailablenodes
2. Whenanodebecomesavailableagain,oranewnodeisaddedtothesystem,thenewlyavailablenodeacceptsaroughlyequivalentamountofloadfromeachoftheotheravailablenodes
DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems 55
3. Thenumberofvirtualnodesthatanodeisresponsiblecandecidedbasedonitscapacity,accountingforheterogeneityinthephysicalinfrastructure
DynamoDB:Replication• Dynamoreplicatesitsdataonmultiple(N)hostsforhigh
availabilityanddurability• Eachkeykisassignedtoacoordinatorwhichisinchargeof
replication– coordinatorhandlesallkeysinitsrange
• Coordinatorreplicateseachkeyitisinchargeof– bystoringitlocally– replicatingitattheN-1clockwisesuccesor nodesinthering
• EachnodeisinchargeofregionoftheringbetweenitanditsN-th predecessor
DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems 56
NodeBreplicateskeyKatnodesCandDNodeDwillstorekeysintherange(A,B],(B,C],(C,D]Note:theremaybe<N“physical”nodes
CHHistory• ProposedbyCStheoreticiansfromMIT:
– Karger-Lehman-Leighton-Panigrahy-Levine-Lewin– “ConsistentHashingandRandomTrees:DistributedCachingProtocols
forRelievingHotSpotsontheWorldWideWeb”– STOC1997
• ConsistenthashinggavebirthtoAkamaiTechnologies– FoundedbyDannyLewinandTomLeightonin1998– Akamai’scontentdeliverynetworkisoneofthelargestdistributed
computingplatforms– Nowmarketcap$12Band6200employees– Managingweb-presenceofmanymajorcompanies
• 2001:TheconceptofDistributedHashTable(DHT)isproposed(howtolookforafile)andCHwasre-purposed
• NowusedinDynamo,Couchbase,Cassandra,Voldemort,Riak,..
DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems 57
SQLvs.NOSQL
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 58
Argumentsforbothsidesstillacontroversialtopic
WhychooseRDBMSoverNoSQL:1/31. Ifnewrelationalsystemscandoeverything
thataNoSQLsystemcan,withanalogousperformanceandscalability(?),andwiththeconvenienceoftransactionsandSQL,NoSQLisnotneeded
2. RelationalDBMSshavetakenandretainedmajoritymarketshareoverothercompetitorsinthepast30years– (network,object,andXMLDBMSs)
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 59
WhychooseRDBMSoverNoSQL:2/33. SuccessfulrelationalDBMSshavebeenbuilt
tohandleotherspecificapplicationloads inthepast:– read-onlyorread-mostlydatawarehousing– OLTPonmulti-coremulti-diskCPUs– in-memorydatabases– distributeddatabases,and– nowhorizontallyscaleddatabases
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 60
WhychooseRDBMSoverNoSQL:3/3
4. Whileno“onesizefitsall”intheSQLproductsthemselves,thereisacommoninterfacewithSQL,transactions,andrelationalschemathatgiveadvantagesintraining,continuity,anddatainterchange
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 61
WhychooseNoSQLoverRDBMS:1/31. Wehaven’tyetseengoodbenchmarksshowing
thatRDBMSscanachievescaling comparablewithNoSQLsystemslikeGoogle’sBigTable
2. Ifyouonlyrequirealookupofobjectsbasedonasinglekey– thenakey-valuestoreisadequateandprobablyeasiertounderstand
thanarelationalDBMS– Likewiseforadocumentstoreonasimpleapplication:youonlypay
thelearningcurveforthelevelofcomplexityyourequire
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 62
WhychooseNoSQLoverRDBMS:2/3
3. Someapplicationsrequireaflexibleschema– allowingeachobjectinacollectiontohavedifferentattributes
– WhilesomeRDBMSsallowefficient“packing”oftupleswithmissingattributes,andsomeallowaddingnewattributesatruntime,thisisuncommon
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 63
WhychooseNoSQLoverRDBMS:3/3
4. ArelationalDBMSmakes“expensive”(multi- nodemulti-table)operations“tooeasy”– NoSQLsystemsmakethemimpossibleorobviouslyexpensiveforprogrammers
5. WhileRDBMSshavemaintainedmajoritymarketshareovertheyears,otherproductshaveestablishedsmallerbutnon-trivialmarketsinareaswherethereisaneedforparticularcapabilities– e.g.indexedobjectswithproductslikeBerkeleyDB,orgraph-following
operationswithobject-orientedDBMSs
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 64
ColumnStore
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 65
Rowvs.ColumnStore
• Rowstore– storeallattributesofatupletogether– storagelike“row-majororder”inamatrix
• Columnstore– storeallrowsforanattribute(column)together– storagelike“column-majororder”inamatrix
• e.g.– MonetDB,Vertica(earlier,C-store),SAP/SybaseIQ,GoogleBigtable (withcolumngroups)
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 66
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 67
Ack:SlidefromVLDB2009tutorialonColumnstore
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 68
Ack:SlidefromVLDB2009tutorialonColumnstore
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 69
Ack:SlidefromVLDB2009tutorialonColumnstore
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 70
Ack:SlidefromVLDB2009tutorialonColumnstore
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 71
Ack:SlidefromVLDB2009tutorialonColumnstore
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 72
AdditionalandOptionalSlidesonMongoDB
(MaybeusefulforHW3)https://docs.mongodb.com
MongoDB
• MongoDBisanopensourcedocumentstorewritteninC++• providesindexesoncollections• lockless• providesadocumentquerymechanism• supportsautomaticsharding• Replicationismostlyusedforfailover• doesnotprovidetheglobalconsistencyofatraditionalDBMS
– butyoucangetlocalconsistencyontheup-to-dateprimarycopyofadocument
• supportsdynamicquerieswithautomaticuseofindices,likeRDBMSs
• alsosupportsmap-reduce– helpscomplexaggregationsacrossdocs
• providesatomicoperationsonfields
DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems 73
Optionalslide:Readyourself
MongoDB:AtomicOpsonFields• Theupdatecommandsupports“modifiers”thatfacilitateatomic
changestoindividualvalues– $setsetsavalue– $inc incrementsavalue– $pushappendsavaluetoanarray– $pushAll appendsseveralvaluestoanarray– $pullremovesavaluefromanarray,and$pullAll removesseveral
valuesfromanarray• Sincetheseupdatesnormallyoccur“inplace”,theyavoidthe
overheadofareturntriptotheserver• Thereisan“updateifcurrent”conventionforchangingadocument
onlyiffieldvaluesmatchagivenpreviousvalue• MongoDBsupportsafindAndModify commandtoperforman
atomicupdateandimmediatelyreturntheupdateddocument– usefulforimplementingqueuesandotherdatastructuresrequiring
atomicity
DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems 74
Optionalslide:Readyourself
MongoDB:Index• MongoDBindicesareexplicitlydefinedusinganensureIndex call– anyexistingindicesareautomaticallyusedforqueryprocessing
• Tofindallproductsreleasedlastyear(2015)orlatercostingunder$100youcouldwrite:
• db.products.find({released:{$gte:newDate(2015,1,1,)},price{‘$lte’:100},})
DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems 75
Optionalslide:Readyourself
MongoDB:Data
• MongoDBstoresdatainabinaryJSON-likeformatcalledBSON– BSONsupportsboolean,integer,float,date,stringandbinarytypes
–MongoDBcanalsosupportlargebinaryobjects,eg.imagesandvideos
– Thesearestoredinchunksthatcanbestreamedbacktotheclientforefficientdelivery
DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems 76
Optionalslide:Readyourself
MongoDB:Replication
• MongoDBsupportsmaster-slavereplicationwithautomaticfailoverandrecovery– Replication(andrecovery)isdoneatthelevelofshards
– Replicationisasynchronousforhigherperformance,sosomeupdatesmaybelostonacrash
DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems 77
Optionalslide:Readyourself
Top Related