6/2/17
Introduction to Data Management
Lecture #17: SQL NoSQL (☺)
Instructor: [email protected]
Announcements
• Homework info:
  – HW#7: Due tomorrow (6 PM).
  – HW#8 is the end ("NoSQL")!
    • Due a week from Friday (6 PM).
    • Late penalty: 10 pts/day (BUT JUST ONE DAY).
• NoSQL lecture plans:
  – Today/Tuesday: NoSQL & Big Data (à la AsterixDB)
    • Not in book: See paper linked to wiki syllabus!
    • Also see docs on the Apache AsterixDB site.
  – Watch the Stanford online lecture material!
    • Watch both of the JSON video lectures.
    • Be sure to take the quiz at the end!
Our Plan for NoSQL + AsterixDB
• The pre-relational era
• The relational DB era
• Beyond rows and columns?
  1. The object-oriented DB era
  2. The object-relational DB era
  3. The XML DB era
  4. The NoSQL DB era* (*watch the Stanford material too...!)
• Reflections, and then... AsterixDB!
The Birth of Today's DBMS Field
• In the beginning was the Word, and the Word was with Codd, and the Word was Codd...
  – 1970 CACM paper: "A relational model of data for large shared data banks"
• Many refer to this as the first generation of (real?) database management systems
This is a SQL/NoSQL History Talk
• The pre-relational era
• The relational DB era
• Beyond rows and columns?
  1. The object-oriented DB era
  2. The object-relational DB era
  3. The XML DB era
  4. The NoSQL DB era
• Reflections & challenges
The First Decade B.C.
• The need for a data management library, or a database management system, had actually been well recognized
  – Hierarchical DB systems (e.g., IMS from IBM)
  – Network DB systems (most notably CODASYL)
• These systems provided navigational APIs
  – Systems provided files, records, pointers, indexes
  – Programmers had to (carefully!) scan or search for records, follow parent/child structures or pointers, and maintain code when anything physical changed
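The First Decade B.C. (cont.)
Record types:
  Order(id, custName, custCity, total)
  Item(ino, qty, price)
  Product(sku, name, listPrice, size, power)
Links: Item-Order (an Order to its Items), Item-Product (a Product to the Items that refer to it)
Sample records:
  Order    123 | Fred | LA | 25.97
    Item   1 | 2 | 9.99   (linked to Product 401)
    Item   2 | 1 | 3.99   (linked to Product 544)
  Product  401 | Garfield T-Shirt | 9.99 | XL | -
  Product  544 | USB Charger | 5.99 | - | 115V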
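To make "navigational" concrete, here is a toy Python model of the order data above. The classes, fields, and the `product_names` helper are hypothetical illustrations, not IMS or CODASYL APIs: each record holds explicit pointers, and the program itself must walk them in the right order.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


# Hypothetical record types: every link between records is an explicit pointer.
@dataclass
class Product:
    sku: int
    name: str


@dataclass
class Item:
    ino: int
    qty: int
    price: float
    product: Product                  # pointer to this item's Product record
    next_item: Optional[Item] = None  # next Item in the same Order's chain


@dataclass
class Order:
    id: int
    cust_name: str
    first_item: Optional[Item] = None  # head of this Order's Item chain


def product_names(order: Order) -> list[str]:
    """Navigate from one Order to the names of the Products it contains."""
    names = []
    item = order.first_item
    while item is not None:  # the program, not the system, follows each pointer
        names.append(item.product.name)
        item = item.next_item
    return names


shirt = Product(401, "Garfield T-Shirt")
charger = Product(544, "USB Charger")
order = Order(123, "Fred",
              first_item=Item(1, 2, 9.99, shirt, next_item=Item(2, 1, 3.99, charger)))

print(product_names(order))  # ['Garfield T-Shirt', 'USB Charger']
```

If the physical linkage changed (say, items moved to a different chain structure), `product_names` would have to be rewritten, which is the maintenance burden the slide describes.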
Enter the Relational DB Era
• Be sure to notice that
  – Everything's now (logical) rows and columns
  – The world is flat; columns are atomic (1NF)
  – Data is now connected via keys (foreign/primary)

Order(id, custName, custCity, total)
  123 | Fred | LA | 25.97
Item(order-id, ino, product-sku, qty, price)
  123 | 1 | 401 | 2 | 9.99
  123 | 2 | 544 | 1 | 3.99
Product(sku, name, listPrice, size, power)
  401 | Garfield T-Shirt | 9.99 | XL | null
  544 | USB Charger | 5.99 | null | 115V
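The same three tables can be loaded into any SQL engine; this is a minimal sketch using Python's built-in `sqlite3` module. Two adjustments are assumptions of the sketch rather than part of the slide: `order-id` and `product-sku` become `order_id` and `product_sku`, and the table name `Order` is quoted because ORDER is an SQL keyword. The join connects rows purely through key values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Order" (id INTEGER PRIMARY KEY, custName TEXT, custCity TEXT, total REAL);
CREATE TABLE Product (sku INTEGER PRIMARY KEY, name TEXT, listPrice REAL,
                      size TEXT, power TEXT);
CREATE TABLE Item (
    order_id    INTEGER REFERENCES "Order"(id),   -- foreign key to Order
    ino         INTEGER,
    product_sku INTEGER REFERENCES Product(sku),  -- foreign key to Product
    qty         INTEGER,
    price       REAL,
    PRIMARY KEY (order_id, ino)
);
INSERT INTO "Order" VALUES (123, 'Fred', 'LA', 25.97);
INSERT INTO Product VALUES (401, 'Garfield T-Shirt', 9.99, 'XL', NULL);
INSERT INTO Product VALUES (544, 'USB Charger', 5.99, NULL, '115V');
INSERT INTO Item VALUES (123, 1, 401, 2, 9.99);
INSERT INTO Item VALUES (123, 2, 544, 1, 3.99);
""")

# A declarative join: the query states which key values must match,
# not how to locate the records.
rows = conn.execute("""
SELECT o.custName, i.ino, p.name, i.qty
FROM "Order" o
JOIN Item i    ON i.order_id = o.id
JOIN Product p ON p.sku = i.product_sku
ORDER BY i.ino
""").fetchall()

print(rows)  # [('Fred', 1, 'Garfield T-Shirt', 2), ('Fred', 2, 'USB Charger', 1)]
```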
As the Relational Era Unfolded
• The Spartan simplicity of the relational data model made it possible to start tackling the opportunities and challenges of a logical data model
  – Declarative queries (Rel Alg/Calc, Quel, QBE, SQL, ...)
  – Transparent indexing (physical data independence)
  – Query optimization and execution
  – Views, constraints, referential integrity, security, ...
  – Scalable (shared-nothing) parallel processing
• Today's multi-$B industry was slowly born
  – Commercial adoption took ~10-15 years
  – Parallel DB systems took ~5 more years
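"Transparent indexing" means an index can be added without touching query text. A small `sqlite3` sketch of that idea: the same `SELECT` runs before and after a `CREATE INDEX` (the index name `productPriceIdx` is made up) and returns the same rows; only the engine's plan may change. The `EXPLAIN QUERY PLAN` output is SQLite-specific and its wording varies by version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (sku INTEGER PRIMARY KEY, name TEXT, listPrice REAL)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [(401, "Garfield T-Shirt", 9.99), (544, "USB Charger", 5.99)],
)

QUERY = "SELECT name FROM Product WHERE listPrice < 7.00"

before = conn.execute(QUERY).fetchall()

# Change only the physical design: add a secondary index on listPrice.
conn.execute("CREATE INDEX productPriceIdx ON Product(listPrice)")
after = conn.execute(QUERY).fetchall()

# The optimizer is free to use the index; the last column of each plan row
# is a human-readable description of the chosen access path.
for plan_row in conn.execute("EXPLAIN QUERY PLAN " + QUERY):
    print(plan_row[-1])

print(before, after)  # [('USB Charger',)] [('USB Charger',)]
```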
Enter the Object-Oriented DB Era
• Notice that:
  – The data model contains objects and pointers (OIDs)
  – The world is no longer flat: the Order and Item schemas now have set(Item) and Product in them, respectively

  Order    123 | Fred | LA | 25.97 | {→Item 1, →Item 2}
  Item     1 | 2 | 9.99 | →Product 401
  Item     2 | 1 | 3.99 | →Product 544
  Product  401 | Garfield T-Shirt | 9.99 | XL | -
  Product  544 | USB Charger | 5.99 | - | 115V
What OODBs Sought to Offer
• Motivated largely by late-1980s CAx applications (e.g., mechanical CAD, VLSI CAD, software CAD, ...)
  – Rich schemas with inheritance, complex objects, object identity, references, ...
  – Methods ("behavior") as well as data in the DBMS
  – Tight bindings with (OO) programming languages
  – Fast navigation, some declarative querying
• Ex: Gemstone, Ontos, Objectivity, Versant, Object Design, O2, also DASDBS (sort of)
Why OODBs "Fell Flat"
• Too soon for another (radical) DB technology
  – Also technically immature relative to RDBMSs
• Tight PL bindings were a double-edged sword
  – Data is shared, outlives programming languages
  – Bindings led to significant system heterogeneity
  – Also made schema evolution a major challenge
• Systems "overfitted" in some dimensions
  – Inheritance, version management, ...
  – Focused on thick clients (e.g., CAD workstations)
Enter the Object-Relational DB Era
• Be sure to notice:
  – "One size fits all!" (☺)
  – UDTs/UDFs, table hierarchies, references, ...
  – But the world got flatter again... (timing lagged OODBs by just a few years)

Product(sku, name, listPrice)
  ClothingProduct(size) under Product:   401 | Garfield T-Shirt | 9.99 | XL
  ElectricProduct(power) under Product:  544 | USB Charger | 5.99 | 115V
Order(id, customer, total)
  123 | (Fred, LA) | 25.97
Item(order-id, ino, product-sku, qty, price), with references shown in parentheses
  (123) | 1 | (401) | 2 | 9.99
  (123) | 2 | (544) | 1 | 3.99
What O-RDBs Sought to Offer
• Motivated by newly emerging application opportunities (multimedia, spatial, text, ...)
  – User-defined types and functions (UDTs/UDFs) & aggregates
  – Data blades (UDTs/UDFs + indexing support)
  – OO goodies for tables: row types, references, ...
  – Nested tables (well, at least Oracle added these)
• Back to a model where applications were loosely bound to the DBMS (e.g., ODBC/JDBC)
• Ex: ADT-Ingres, Postgres, Starburst, UniSQL, Illustra, DB2, Oracle
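SQLite is not an object-relational DBMS, but Python's standard `sqlite3` module lets an application register scalar functions and aggregates that SQL can then call, which gives a small taste of the UDF/UDA idea. The names `with_tax` and `price_range` are hypothetical examples, not features of any system listed on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (sku INTEGER PRIMARY KEY, name TEXT, listPrice REAL)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [(401, "Garfield T-Shirt", 9.99), (544, "USB Charger", 5.99)],
)


def with_tax(price: float, rate: float) -> float:
    """Hypothetical scalar UDF: price plus tax, rounded to cents."""
    return round(price * (1 + rate), 2)


class PriceRange:
    """Hypothetical user-defined aggregate: max(value) - min(value)."""

    def __init__(self):
        self.lo = None
        self.hi = None

    def step(self, value):          # called once per input row
        if value is None:
            return
        self.lo = value if self.lo is None else min(self.lo, value)
        self.hi = value if self.hi is None else max(self.hi, value)

    def finalize(self):             # called once per group
        return None if self.lo is None else round(self.hi - self.lo, 2)


conn.create_function("with_tax", 2, with_tax)
conn.create_aggregate("price_range", 1, PriceRange)

taxed = conn.execute(
    "SELECT name, with_tax(listPrice, 0.10) FROM Product ORDER BY sku").fetchall()
spread = conn.execute("SELECT price_range(listPrice) FROM Product").fetchone()
print(taxed, spread)  # [('Garfield T-Shirt', 10.99), ('USB Charger', 6.59)] (4.0,)
```

Here the function bodies live in the application process; a data blade, by contrast, pushes such code (plus index support) into the DBMS itself.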
Why O-RDBs "Fell Flat"
• Significant differences across DB vendors
  – SQL standardization lagged somewhat
  – Didn't include details of UDT/UDF extensions
  – Tough to extend the innards (for indexing)
• Application issues (and multiple platforms)
  – Least common denominator vs. coolest features
  – Tools (e.g., DB design tools, ORM layers, ...)
• Also still probably a bit too much too soon
  – IT departments still rolling in RDBMSs and creating relational data warehouses
Then Came the XML DB Era
  <Order id="123">
    <Customer>
      <custName>Fred</custName>
      <custCity>LA</custCity>
    </Customer>
    <total>25.97</total>
    <Items>
      <Item ino="1"><product-sku>401</product-sku><qty>2</qty><price>9.99</price></Item>
      <Item ino="2"><product-sku>544</product-sku><qty>1</qty><price>3.99</price></Item>
    </Items>
  </Order>

  <Product sku="401">
    <name>Garfield T-Shirt</name>
    <listPrice>9.99</listPrice>
    <size>XL</size>
  </Product>
  <Product sku="544">
    <name>USB Charger</name>
    <listPrice>5.99</listPrice>
    <power>115V</power>
  </Product>

Note that
  – The world's less flat again
  – We're now in the 2000s
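A short sketch with Python's standard `xml.etree.ElementTree` shows what querying this nested markup looks like from a program. Note the attribute/element split: `ino` is an attribute, while `product-sku` and `qty` are child elements whose text arrives as strings and must be converted.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """
<Order id="123">
  <Customer><custName>Fred</custName><custCity>LA</custCity></Customer>
  <total>25.97</total>
  <Items>
    <Item ino="1"><product-sku>401</product-sku><qty>2</qty><price>9.99</price></Item>
    <Item ino="2"><product-sku>544</product-sku><qty>1</qty><price>3.99</price></Item>
  </Items>
</Order>
"""

order = ET.fromstring(ORDER_XML.strip())

# Attributes are read with .get(); child element text with .findtext().
items = [
    (int(item.get("ino")),
     int(item.findtext("product-sku")),
     int(item.findtext("qty")))
    for item in order.findall("./Items/Item")
]
customer = order.findtext("Customer/custName")

print(customer, items)  # Fred [(1, 401, 2), (2, 544, 1)]
```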
What XML DBs Sought to Offer
• One <flexible/> data model fits all (XML)
  – Origins in document markup (SGML)
  – Nested data
  – Schema variety/optionality
• New declarative query language (XQuery)
  – Designed both for querying and transformation
  – Early standardization effort (W3C)
• Two different DB-related use cases, in reality
  – Data storage: Lore (pre-XML), Natix, Timber, Ipedo, MarkLogic, BaseX; also DB2, Oracle, SQL Server
  – Data integration: Nimble Technology, BEA Liquid Data (from Enosys), BEA AquaLogic Data Services Platform
Why XML DBs "Fell Flat" Too
• The document-centric origins (vs. data use cases) of XML Schema and XQuery made a mess of things
  – W3C XPath legacy (☹)
  – Document identity, document order, ...
  – Attributes vs. elements, nulls, ...
  – Mixed content (overkill for non-document data)
• Two other external trends also played a role
  – SOA and Web services came but then went
  – JSON (and RESTful services) appeared on the scene
• Note: Likely still an important niche market...
Now the NoSQL DB Era?
• Not from the DB world
  – Distributed systems folks
  – Also various startups
• From caches → K/V use cases
  – Needed massive scale-out
  – OLTP (vs. parallel DB) apps
  – Simple, low-latency API
  – Need a key K, but want no schema for V
  – Record-level atomicity, replica consistency varies
• In the context of this talk, NoSQL does not mean
  – Hadoop (or SQL on Hadoop)
  – Graph databases or graph analytics platforms
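The K/V bullets can be illustrated with a toy, single-process Python sketch (the class and method names are hypothetical): keys are hash-partitioned across a fixed set of "nodes", the API is just `put`/`get`, and values are opaque, so records of different shapes coexist. Replication, replica consistency, rebalancing, and persistence are deliberately left out.

```python
import zlib
from typing import Any


class PartitionedKVStore:
    """Toy key/value store hash-partitioned over num_nodes dicts (one per 'node')."""

    def __init__(self, num_nodes: int):
        self.nodes = [dict() for _ in range(num_nodes)]

    def _node_for(self, key: str) -> dict:
        # crc32 is deterministic across runs, unlike Python's salted hash().
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def put(self, key: str, value: Any) -> None:
        # A key is required; the value is stored as-is, with no schema checks.
        self._node_for(key)[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        return self._node_for(key).get(key, default)


store = PartitionedKVStore(num_nodes=4)
store.put("order:123", {"custName": "Fred", "total": 25.97})
store.put("product:544", {"name": "USB Charger", "power": "115V"})
print(store.get("order:123"))  # {'custName': 'Fred', 'total': 25.97}
```

Adding nodes here would remap most keys; real systems typically use consistent hashing or range partitioning to limit that movement.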
NoSQL Data (JSON-based)
Collection(Order):
  { "id": "123",
    "Customer": { "custName": "Fred", "custCity": "LA" },
    "total": 25.97,
    "Items": [ { "product-sku": 401, "qty": 2, "price": 9.99 },
               { "product-sku": 544, "qty": 1, "price": 3.99 } ] }
Collection(Product):
  { "sku": 401, "name": "Garfield T-Shirt", "listPrice": 9.99, "size": "XL" }
  { "sku": 544, "name": "USB Charger", "listPrice": 5.99, "power": "115V" }
Note that
  – The world's not flat, but it's less <messy/>
  – We're now in the 2010s, timing-wise
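Without a server-side join, relating the two collections falls to the application. A minimal Python sketch (standard `json` module only) that parses the documents above, builds a lookup on `sku`, and joins in client code; fields present on only some products (`size`, `power`) are read with `.get()`.

```python
import json

order = json.loads("""
{ "id": "123",
  "Customer": { "custName": "Fred", "custCity": "LA" },
  "total": 25.97,
  "Items": [ { "product-sku": 401, "qty": 2, "price": 9.99 },
             { "product-sku": 544, "qty": 1, "price": 3.99 } ] }
""")

product_docs = [
    '{ "sku": 401, "name": "Garfield T-Shirt", "listPrice": 9.99, "size": "XL" }',
    '{ "sku": 544, "name": "USB Charger", "listPrice": 5.99, "power": "115V" }',
]
products = [json.loads(doc) for doc in product_docs]

# Lookup table on the Product key.
by_sku = {p["sku"]: p for p in products}

# Walk the nested items and join each one to its product in application code.
lines = []
for item in order["Items"]:
    product = by_sku[item["product-sku"]]
    lines.append((item["qty"], product["name"], product.get("size"), product.get("power")))

print(order["Customer"]["custName"], lines)
# Fred [(2, 'Garfield T-Shirt', 'XL', None), (1, 'USB Charger', None, '115V')]
```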
Current NoSQL Trends
• Popular examples: MongoDB, Couchbase
• Coveting the benefits of many DB goodies
  – Secondary indexing and non-key access
  – Declarative queries
  – Aggregates and now (initially small) joins
• Seem to be heading towards...
  – BDMS (think scalable, OLTP-aimed, parallel DBMS)
  – Declarative queries and query optimization, but applied to schema-less data
  – Return of (some, optional!) schema information
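"Secondary indexing and non-key access" over schema-less data can be sketched in a few lines of Python. The collection layout and the `build_secondary_index` helper are hypothetical: the index maps each value of a non-key field to the primary keys holding it, and documents lacking the field are simply not indexed.

```python
from collections import defaultdict

# A schema-less collection: primary key -> document; document shapes differ.
products = {
    401: {"sku": 401, "name": "Garfield T-Shirt", "listPrice": 9.99, "size": "XL"},
    544: {"sku": 544, "name": "USB Charger", "listPrice": 5.99, "power": "115V"},
}


def build_secondary_index(collection: dict, field: str) -> dict:
    """Map each value of a non-key field to the list of primary keys that have it."""
    index = defaultdict(list)
    for pk, doc in collection.items():
        if field in doc:               # documents without the field are skipped
            index[doc[field]].append(pk)
    return dict(index)


size_idx = build_secondary_index(products, "size")

# Non-key access: find documents by size via the index instead of a full scan.
matches = [products[pk]["name"] for pk in size_idx.get("XL", [])]
print(matches)  # ['Garfield T-Shirt']
```

A real store must also keep such an index consistent as documents are inserted, updated, and deleted; this sketch only builds it once.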
Our Example: Apache AsterixDB
http://asterixdb.apache.org/
Big Data / Web Warehousing
(Figure with the captions "So what went on – and why?", "What's going on right now?", and "What's going on…?")
Also: Today's Big Data Tangle
(Figure; surviving labels include Pig and SQL)
AsterixDB: "One Size Fits a Bunch"
(Figure combining three areas: Semistructured Data Management, Parallel Database Systems, and Data-Intensive Computing)
BDMS Desiderata:
• Flexible data model
• Efficient runtime
• Full query capability
• Cost proportional to task at hand (!)
• Designed for continuous data ingestion
• Support today's "Big Data data types"
• ...

Project Goals
• Build a new Big Data Management System (BDMS)
  – Run on large commodity clusters
  – Handle mass quantities of semistructured data
  – Openly layered, for selective reuse by others
  – Share with the community via open source (ASF)
• Conduct scalable information systems research, e.g.,
  – Large-scale query processing and workload management
  – Highly scalable storage and index management
  – Fuzzy matching, spatial data, date/time data (all in parallel)
  – Novel support for "fast data" (both in and out)
• Train next generation of "Big Data" graduates
ASTERIX Data Model (ADM)
  create dataverse TinySocial;
  use dataverse TinySocial;

  create type EmploymentType as open {
    organizationName: string,
    startDate: date,
    endDate: date?
  };

  create type MugshotUserType as {
    id: int32,
    alias: string,
    name: string,
    userSince: datetime,
    address: {
      street: string,
      city: string,
      state: string,
      zip: string,
      country: string
    },
    friendIds: {{ int32 }},
    employment: [EmploymentType]
  };

  create type MugshotMessageType as closed {
    messageId: int32,
    authorId: int32,
    timestamp: datetime,
    inResponseTo: int32?,
    senderLocation: point?,
    tags: {{ string }},
    message: string
  };

  create dataset MugshotUsers(MugshotUserType) primary key id;
  create dataset MugshotMessages(MugshotMessageType) primary key messageId;

A minimal alternative shown on the slides declares only the key and relies on the type being open:
  create type MugshotUserType as { id: int32 };

Highlights include:
• JSON++ based data model
• Rich type support (spatial, temporal, …)
• Records, lists, bags
• Open vs. closed types
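The open/closed distinction can be illustrated with a simplified Python checker. This is a hypothetical sketch of the rules, not AsterixDB's type system: a type is a map from field name to (Python type, optional?); a required field must be present with the right type, an optional field (like `endDate: date?`) may be absent, and only a closed type rejects undeclared fields. Dates are kept as ISO strings to keep it short, and the extra `title` field is made up.

```python
from typing import Any

# Hypothetical rendering of EmploymentType: field -> (Python type, optional?).
EmploymentType = {
    "organizationName": (str, False),
    "startDate": (str, False),   # ISO date string in this sketch
    "endDate": (str, True),      # like `endDate: date?`
}


def conforms(record: dict[str, Any], fields: dict, is_open: bool) -> bool:
    for name, (py_type, optional) in fields.items():
        if name not in record:
            if not optional:
                return False              # a required field is missing
        elif not isinstance(record[name], py_type):
            return False                  # a declared field has the wrong type
    undeclared = set(record) - set(fields)
    return is_open or not undeclared      # closed types reject extra fields


rec = {"organizationName": "Codetechno", "startDate": "2006-08-06", "title": "Engineer"}
print(conforms(rec, EmploymentType, is_open=True))   # True
print(conforms(rec, EmploymentType, is_open=False))  # False
```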
Ex: MugshotUsers Data
  { "id": 1, "alias": "Margarita", "name": "Margarita Stoddard",
    "address": { "street": "234 Thomas Ave", "city": "San Hugo", "zip": "98765", "state": "CA", "country": "USA" },
    "userSince": datetime("2012-08-20T10:10:00"),
    "friendIds": {{ 2, 3, 6, 10 }},
    "employment": [ { "organizationName": "Codetechno", "startDate": date("2006-08-06") } ] },
  { "id": 2, "alias": "Isbel", "name": "Isbel Dull",
    "address": { "street": "345 James Ave", "city": "San Hugo", "zip": "98765", "state": "CA", "country": "USA" },
    "userSince": datetime("2011-01-22T10:10:00"),
    "friendIds": {{ 1, 4 }},
    "employment": [ { "organizationName": "Hexviafind", "startDate": date("2010-04-27") } ] },
  { "id": 3, "alias": "Emory", "name": "Emory Unk",
    "address": { "street": "456 Jose Ave", "city": "San Hugo", "zip": "98765", "state": "CA", "country": "USA" },
    "userSince": datetime("2012-07-10T10:10:00"),
    "friendIds": {{ 1, 5, 8, 9 }},
    "employment": [ { "organizationName": "geomedia", "startDate": date("2010-06-17"), "endDate": date("2010-01-26") } ] }
  ...
Other DDL Features
  create index msUserSinceIdx on MugshotUsers(userSince);
  create index msTimestampIdx on MugshotMessages(timestamp);
  create index msAuthorIdx on MugshotMessages(authorId) type btree;
  create index msSenderLocIndex on MugshotMessages(senderLocation) type rtree;
  create index msMessageIdx on MugshotMessages(message) type keyword;

  //---------------------- and also ----------------------

  create type AccessLogType as closed {
    ip: string,
    time: string,
    user: string,
    verb: string,
    `path`: string,
    stat: int32,
    size: int32
  };
  create external dataset AccessLog(AccessLogType) using localfs
    (("path"="{hostname}://{path}"),
     ("format"="delimited-text"),
     ("delimiter"="|"));

  create feed mySocketFeed using socket_adaptor
    (("sockets"="{address}:{port}"),
     ("addressType"="IP"),
     ("type-name"="MugshotMessageType"),
     ("format"="adm"));
  connect feed mySocketFeed to dataset MugshotMessages;

External data highlights:
• Equal opportunity access
• "Keep everything!"
• Data ingestion, not streams
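A `type keyword` index on `message` is, conceptually, an inverted index from words to records. The Python sketch below shows only that concept (AsterixDB's actual keyword indexes are LSM-based and far more involved). The message texts are borrowed from the Q1 result in this lecture; the message ids and the simple tokenizer are assumptions.

```python
import re
from collections import defaultdict

# Hypothetical messages: messageId -> message text.
messages = {
    1: "like samsung the plan is amazing",
    2: "like t-mobile its platform is mind-blowing",
    3: "love sprint its shortcut-menu is awesome:)",
}


def tokenize(text: str) -> set[str]:
    # Simplistic tokenizer: lowercase runs of letters, digits, and hyphens.
    return set(re.findall(r"[a-z0-9-]+", text.lower()))


# Inverted index: token -> set of messageIds whose text contains it.
inverted = defaultdict(set)
for mid, text in messages.items():
    for token in tokenize(text):
        inverted[token].add(mid)


def search_all(*words: str) -> list[int]:
    """Ids of messages containing every word (intersection of posting sets)."""
    postings = [inverted.get(w.lower(), set()) for w in words]
    return sorted(set.intersection(*postings)) if postings else []


print(search_all("its", "is"))  # [2, 3]
```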
ASTERIX Queries (SQL++ or AQL)
• Q1: List the user name and messages sent by those users who joined the Mugshot social network in a certain time window:
  select user.name as uname,
         (select value msg.message
          from MugshotMessages msg
          where msg.authorId = user.id) as messages
  from MugshotUsers user
  where user.userSince >= datetime('2010-07-22T00:00:00')
    and user.userSince <= datetime('2012-07-29T23:59:59');

  { "uname": "Isbel Dull", "messages": ["like samsung the plan is amazing", "like t-mobile its platform is mind-blowing"] }
  { "uname": "Emory Unk", "messages": ["love sprint its shortcut-menu is awesome:)", ...] }
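A plain-Python rendering of Q1's meaning helps show what the correlated subquery produces: one result per qualifying user, carrying an array of that user's messages. The user fields come from the sample MugshotUsers data; the message records are hypothetical (texts borrowed from the result above, with messageId/authorId values assumed). This illustrates the query's semantics, not how AsterixDB executes it.

```python
from datetime import datetime

users = [
    {"id": 1, "name": "Margarita Stoddard", "userSince": datetime(2012, 8, 20, 10, 10)},
    {"id": 2, "name": "Isbel Dull", "userSince": datetime(2011, 1, 22, 10, 10)},
    {"id": 3, "name": "Emory Unk", "userSince": datetime(2012, 7, 10, 10, 10)},
]
# Hypothetical message records (ids and authorIds assumed).
messages = [
    {"messageId": 1, "authorId": 2, "message": "like samsung the plan is amazing"},
    {"messageId": 2, "authorId": 2, "message": "like t-mobile its platform is mind-blowing"},
    {"messageId": 3, "authorId": 3, "message": "love sprint its shortcut-menu is awesome:)"},
]

start = datetime(2010, 7, 22, 0, 0, 0)
end = datetime(2012, 7, 29, 23, 59, 59)

# Outer query: users whose userSince falls in the window.
# Inner query (select value msg.message ... where msg.authorId = user.id):
# the list of that user's message texts, possibly empty.
result = [
    {"uname": user["name"],
     "messages": [m["message"] for m in messages if m["authorId"] == user["id"]]}
    for user in users
    if start <= user["userSince"] <= end
]

for row in result:
    print(row)
```

Unlike a flat SQL join, this nesting keeps each user as a single result; a user in the window with no messages would appear with an empty `messages` array rather than disappearing.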
SQL++ (cont.)
• Q2: Identify active users and group/count them by country:
  with endTime as current_datetime(),
       startTime as endTime - duration("P30D")
  select user.address.country as country, count(user) as activeUsers
  from MugshotUsers user
  where some logrec in AccessLog satisfies
          user.alias = logrec.user
          and datetime(logrec.time) >= startTime
          and datetime(logrec.time) <= endTime
  group by user.address.country;

SQL++ highlights:
• Lots of other features (see website!)
• Spatial predicates and aggregation
• Set-similarity ("fuzzy") matching
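Q2 combines an existential test (`some ... satisfies`) with grouping. A Python sketch of the same logic, under stated assumptions: the user aliases and countries come from the sample data, the access-log entries are made up (the slides do not show AccessLog contents), and "now" is fixed instead of calling `current_datetime()` so the result is reproducible.

```python
from collections import Counter
from datetime import datetime, timedelta

users = [
    {"alias": "Margarita", "address": {"country": "USA"}},
    {"alias": "Isbel", "address": {"country": "USA"}},
    {"alias": "Emory", "address": {"country": "USA"}},
]
# Hypothetical access-log entries (time stored as a string, as in AccessLogType).
access_log = [
    {"user": "Margarita", "time": "2017-05-30T09:00:00"},
    {"user": "Isbel", "time": "2017-01-02T12:00:00"},
]

end_time = datetime(2017, 6, 2)               # fixed stand-in for current_datetime()
start_time = end_time - timedelta(days=30)    # duration("P30D")


def is_active(user) -> bool:
    # `some logrec in AccessLog satisfies ...`: true if at least one entry matches.
    return any(
        log["user"] == user["alias"]
        and start_time <= datetime.fromisoformat(log["time"]) <= end_time
        for log in access_log
    )


# group by country, counting the active users in each group.
active_by_country = Counter(u["address"]["country"] for u in users if is_active(u))
print(dict(active_by_country))  # {'USA': 1}
```

As with `group by`, a country whose users are all inactive produces no output row at all, not a row with a zero count.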
Updates and Transactions
• Key-value store-like transactions (w/ record-level atomicity)
• Insert, delete, and upsert ops; index-consistent
• 2PL concurrency
• WAL no-steal, no-force with LSM shadowing
• Q3: Add a new user to Mugshot.com:
  insert into MugshotUsers (
    { "id": 11, "alias": "John", "name": "John Doe",
      "userSince": datetime("2012-08-20T10:10:00.000Z"),
      "address": { "street": "789 Jane St", "city": "San Harry", "state": "CA", "zip": "98767", "country": "USA" },
      "friendIds": {{ 5, 9, 11 }},
      "employment": [ { "organizationName": "Kongreen", "startDate": date("2009-08-11") } ]
    }
  );
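The LSM idea behind "LSM shadowing" can be shown with a toy Python store. The `TinyLSM` class and its methods are hypothetical, not AsterixDB code: writes (including upserts and deletes, which write a tombstone) go to an in-memory component; a full component is flushed as an immutable sorted run; reads search newest to oldest, so newer versions shadow older ones. The write-ahead log, 2PL locking, secondary indexes, and merging of runs are all omitted.

```python
from typing import Any, Optional

_TOMBSTONE = object()  # marks a deleted key in a newer component


class TinyLSM:
    """Toy LSM store: one mutable memtable plus immutable sorted runs (newest first)."""

    def __init__(self, memtable_limit: int = 2):
        self.memtable: dict = {}
        self.runs: list = []            # each run is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def upsert(self, key, value) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def delete(self, key) -> None:
        self.upsert(key, _TOMBSTONE)    # a delete is just a write of a tombstone

    def _flush(self) -> None:
        # Freeze the memtable into a new run; older runs are never modified.
        self.runs.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key) -> Optional[Any]:
        # Newest component first; the first hit wins and shadows older versions.
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is _TOMBSTONE else value
        for run in self.runs:
            for k, v in run:            # linear scan for brevity
                if k == key:
                    return None if v is _TOMBSTONE else v
        return None


db = TinyLSM(memtable_limit=2)
db.upsert(11, {"alias": "John"})
db.upsert(12, {"alias": "Jane"})    # memtable full: flushed as run 1
db.upsert(11, {"alias": "Johnny"})  # newer version of key 11
db.delete(12)                       # tombstone; memtable full again: flushed as run 2
print(db.get(11), db.get(12), len(db.runs))  # {'alias': 'Johnny'} None 2
```

The hypothetical record with alias "Jane" exists only to exercise deletes. Because old runs are never overwritten, the old version of key 11 still sits in the first run; only the read path's newest-first order hides it.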
AsterixDB Cluster Overview
(Figure: an AsterixDB cluster with a Cluster Controller, a metadata (MD) Node Controller, and several Node Controllers, handling data loads and feeds, AQL queries and results, and data publishing)
ASTERIX Software Stack
(Figure, top to bottom:
  – Languages: AQL (AsterixDB), HiveQL (Hivesterix), XQuery (Apache VXQuery)
  – Algebricks Algebra Layer, alongside an M/R Layer (Hadoop M/R jobs) and Pregelix (Pregel jobs)
  – Hyracks Data-Parallel Platform, which runs Hyracks jobs)
A Peek at Performance
(performance charts)
A Peek at Performance (cont.)
(performance charts)
Example AsterixDB Use Cases
• Potential use case areas include
  – Social data analytics
  – Cell phone event analytics
  – Behavioral science
  – Education
  – Public health
  – Power usage monitoring
  – Cluster management log analytics
  – ...
Current Status
• 4-year initial NSF project (250+ KLOC), started 2009
• Now officially Apache AsterixDB!
  – Semistructured "NoSQL" style data model
  – Declarative parallel queries, inserts, deletes, …
  – Data storage/indexing (primary & secondary, LSM-based)
  – Internal and external datasets both supported
  – Rich set of data types (including text, time, location)
  – Fuzzy and spatial query processing
  – NoSQL-like transactions (for inserts/deletes)
  – Data feeds and indexes for external datasets
  – ...
For More Information
• Asterix project UCI/UCR research home
  – http://asterix.ics.uci.edu/
• Apache AsterixDB home
  – http://asterixdb.apache.org/
• SQL++ Primer
  – http://asterixdb.apache.org/docs/0.9.1/index.html
• Navigate from the CS122a wiki (HW) to get and install it!
  – A few other resources and hints are in the HW materials.
QUESTIONS...?