CompSci516 Data Intensive Computing Systems

59
CompSci 516 Data Intensive Computing Systems Lecture 4 Relational Algebra and Relational Calculus Instructor: Sudeepa Roy 1 Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems

Transcript of CompSci516 Data Intensive Computing Systems

Page 1: CompSci516 Data Intensive Computing Systems

CompSci 516DataIntensiveComputingSystems

Lecture4RelationalAlgebra

andRelationalCalculus

Instructor:Sudeepa Roy

1DukeCS,Fall2016 CompSci 516:DataIntensiveComputingSystems

Page 2: CompSci516 Data Intensive Computing Systems

Announcement

• Deadlinesextendedbyaweekforprojectproposalandmidtermreport

• Proposal(writeup):09/28(Wed)– Butsendmeanemail(s)mentioningyourgroupmembersanddescribingyourprojectby 09/21

– Projectideastobepostedbythisweekend• Midtermreport:10/28(Fri)• Finalreport:11/28(Mon)

DukeCS,Fall2016 CompSci 516:DataIntensiveComputingSystems 2

Page 3: CompSci516 Data Intensive Computing Systems

Today’stopics

• FinishSQL(fromLecture2)• RelationalAlgebra(RA)andRelationalCalculus(RC)

• Readingmaterial– [RG]Chapter4(RA,RC)– [GUW]Chapters2.4,5.1,5.2

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 3

Acknowledgement:Thefollowingslideshavebeencreatedadaptingtheinstructormaterialofthe[RG]bookprovidedbytheauthorsDr.Ramakrishnan andDr.Gehrke.

Page 4: CompSci516 Data Intensive Computing Systems

RelationalQueryLanguages

• Querylanguages:Allowmanipulationandretrievalofdatafromadatabase

• Relationalmodelsupportssimple,powerfulQLs:– Strongformalfoundationbasedonlogic– Allowsformuchoptimization

• QueryLanguages!= programminglanguages– QLsnotintendedtobeusedforcomplexcalculations– QLssupporteasy,efficientaccesstolargedatasets

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 4

Page 5: CompSci516 Data Intensive Computing Systems

FormalRelationalQueryLanguages

• Two“mathematical”QueryLanguagesformthebasisfor“real”languages(e.g.SQL),andforimplementation:– RelationalAlgebra:Moreoperational,veryusefulforrepresentingexecutionplans

– RelationalCalculus:Letsusersdescribewhattheywant,ratherthanhowtocomputeit(Non-operational,declarative)

• Note:Declarative(RC,SQL)vs.Operational(RA)

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 5

Page 6: CompSci516 Data Intensive Computing Systems

Preliminaries• Aqueryisappliedtorelationinstances,andtheresultofaqueryisalsoarelationinstance.– Schemasofinputrelationsforaqueryarefixed

• querywillrunregardlessofinstance

– Theschemafortheresultofagivenqueryisalsofixed• Determinedbydefinitionofquerylanguageconstructs

• Positionalvs.named-fieldnotation:– Positionalnotationeasierforformaldefinitions,named-fieldnotationmorereadable

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 6

Page 7: CompSci516 Data Intensive Computing Systems

ExampleSchemaandInstances

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0

sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

sid bid day22 101 10/10/9658 103 11/12/96

R1

S1 S2

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

Page 8: CompSci516 Data Intensive Computing Systems

LogicNotations

• $ Thereexists• " Forall• ∧ LogicalAND• ∨ LogicalOR• ¬NOT

Page 9: CompSci516 Data Intensive Computing Systems

RelationalAlgebra(RA)

Page 10: CompSci516 Data Intensive Computing Systems

RelationalAlgebra

• Takesoneormorerelationsasinput,andproducesarelationasoutput– operator– operand– semantic– soanalgebra!

• Sinceeachoperationreturnsarelation,operationscanbecomposed– Algebrais“closed”

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 10

Page 11: CompSci516 Data Intensive Computing Systems

RelationalAlgebra• Basicoperations:

– Selection(σ)Selectsasubsetofrowsfromrelation– Projection(π)Deletesunwantedcolumnsfromrelation.– Cross-product(x)Allowsustocombinetworelations.– Set-difference(-)Tuplesinreln.1,butnotinreln.2.– Union(∪)Tuplesinreln.1orinreln.2.

• Additionaloperations:– Intersection(∩)– join⨝– division(/)– renaming(ρ)– Notessential,but(very)useful.

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 11

Page 12: CompSci516 Data Intensive Computing Systems

Projection

sname ratingyuppy 9lubber 8guppy 5rusty 10p sname rating S, ( )2

age35.055.5

page S( )2

• Deletesattributesthatarenotinprojectionlist.

• Schema ofresultcontainsexactlythefieldsintheprojectionlist,withthesamenamesthattheyhadinthe(only)inputrelation.

• Projectionoperatorhastoeliminateduplicates(Why)

– Note:realsystemstypicallydon’tdoduplicateeliminationunlesstheuserexplicitlyasksforit(performance)

sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

S2

10DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

Page 13: CompSci516 Data Intensive Computing Systems

Selection

s rating S>8 2( )

sid sname rating age28 yuppy 9 35.058 rusty 10 35.0

sname ratingyuppy 9rusty 10

p ssname rating rating S, ( ( ))>8 2

• Selectsrowsthatsatisfyselectioncondition

• Noduplicatesinresult.Why?

• Schemaofresultidenticaltoschemaof(only)inputrelation

sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

S2

11DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

Page 14: CompSci516 Data Intensive Computing Systems

CompositionofOperators

• Resultrelationcanbetheinputforanotherrelationalalgebraoperation– Operatorcomposition s rating S>8 2( )

sid sname rating age28 yuppy 9 35.058 rusty 10 35.0

sname ratingyuppy 9rusty 10

p ssname rating rating S, ( ( ))>8 2

Page 15: CompSci516 Data Intensive Computing Systems

Union,Intersection,Set-Difference

• Alloftheseoperationstaketwoinputrelations,whichmustbeunion-compatible:– Samenumberoffields.– `Corresponding’fieldshavethesametype

– sameschemaastheinputs

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.044 guppy 5 35.028 yuppy 9 35.0

S S1 2È

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0

sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

S1 S2

Duke%CS,%Spring%2016 CompSci 516:%Data%Intensive%Computing%Systems 12

Page 16: CompSci516 Data Intensive Computing Systems

Union,Intersection,Set-Difference

• Note:noduplicate– “Setsemantic”– SQL:UNION– SQLallows“bagsemantic”aswell:UNIONALL

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.044 guppy 5 35.028 yuppy 9 35.0

S S1 2È

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0

sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

S1 S2

Duke%CS,%Spring%2016 CompSci 516:%Data%Intensive%Computing%Systems 12

Page 17: CompSci516 Data Intensive Computing Systems

Union,Intersection,Set-Difference

sid sname rating age31 lubber 8 55.558 rusty 10 35.0

S S1 2Ç

sid sname rating age22 dustin 7 45.0

S S1 2-

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0

sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

S1 S2

Duke%CS,%Spring%2016 CompSci 516:%Data%Intensive%Computing%Systems 13

Page 18: CompSci516 Data Intensive Computing Systems

Cross-Product• EachrowofS1ispairedwitheachrowofR1.• ResultschemahasonefieldperfieldofS1andR1,withfield

names`inherited’ifpossible.– Conflict:BothS1andR1haveafieldcalledsid.

(sid) sname rating age (sid) bid day22 dustin 7 45.0 22 101 10/10/9622 dustin 7 45.0 58 103 11/12/9631 lubber 8 55.5 22 101 10/10/9631 lubber 8 55.5 58 103 11/12/9658 rusty 10 35.0 22 101 10/10/9658 rusty 10 35.0 58 103 11/12/96

DukeCS,Fall2016 CompSci 516:DataIntensiveComputingSystems 18

Page 19: CompSci516 Data Intensive Computing Systems

RenamingOperator⍴

(sid) sname rating age (sid) bid day22 dustin 7 45.0 22 101 10/10/9622 dustin 7 45.0 58 103 11/12/9631 lubber 8 55.5 22 101 10/10/9631 lubber 8 55.5 58 103 11/12/9658 rusty 10 35.0 22 101 10/10/9658 rusty 10 35.0 58 103 11/12/96

§Ingeneral,canuse⍴(<Temp>,<RA-expression>)

DukeCS,Fall2016 CompSci 516:DataIntensiveComputingSystems 19

(⍴sid →sid1S1)⨉ (⍴sid →sid1R1)or

⍴(C(1→sid1,5→sid2),S1⨉ R1)Cisthenewrelationname

Page 20: CompSci516 Data Intensive Computing Systems

Joins

• Resultschemasameasthatofcross-product.• Fewertuplesthancross-product,mightbeableto

computemoreefficiently

R c S c R S!" = ´s ( )

(sid) sname rating age (sid) bid day22 dustin 7 45.0 58 103 11/12/9631 lubber 8 55.5 58 103 11/12/96

S RS sid R sid1 11 1!" . .<

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 20

Page 21: CompSci516 Data Intensive Computing Systems

Findnamesofsailorswho’vereservedboat#103

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 21

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

Page 22: CompSci516 Data Intensive Computing Systems

Findnamesofsailorswho’vereservedboat#103

• Solution1:

• Solution2:

p ssname bid serves Sailors(( Re ) )=103 !"

p ssname bid serves Sailors( (Re ))=103 !"

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 22

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

Page 23: CompSci516 Data Intensive Computing Systems

ExpressinganRAexpressionasaTree

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 23

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

p ssname bid serves Sailors(( Re ) )=103 !"Sailors Reserves

σbid=103

⨝sid =sid

πsname

Alsocalledalogicalqueryplan

Page 24: CompSci516 Data Intensive Computing Systems

Findsailorswho’vereservedaredoragreenboat

• Canidentifyallredorgreenboats,thenfindsailorswho’vereservedoneoftheseboats:

r s( , ( ' ' ' ' ))Tempboats color red color green Boats= Ú =

p sname Tempboats serves Sailors( Re )!" !"

Can also define Tempboats using unionTry the “AND” version yourself

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 24

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

Useofrenameoperation

Page 25: CompSci516 Data Intensive Computing Systems

Whataboutaggregates?

• Extendedrelationalalgebra• 𝝲age,avg(rating)→ avgr Sailors• Alsoextendedto“bagsemantic”:allowduplicates

– Takeintoaccountcardinality– RandShavetupletresp.mandntimes– R∪ Shastm+n times– R∩Shastmin(m,n)times– R– Shastmax(0,m-n)times– sorting(τ),duplicateremoval(ẟ)operators

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 25

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

Page 26: CompSci516 Data Intensive Computing Systems

RelationalCalculus(RC)

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 26

Page 27: CompSci516 Data Intensive Computing Systems

RelationalCalculus• RAisprocedural

– πA(σA=a R)andσA=a (πAR)areequivalentbutdifferentexpressions

• RC– non-proceduralanddeclarative– describesasetofanswerswithoutbeingexplicitabouthowthey

shouldbecomputed

• TRC(tuplerelationalcalculus)– variablestaketuplesasvalues– wewillprimarilydoTRC

• DRC(domainrelationalcalculus)– variablesrangeoverfieldvalues

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 27

Page 28: CompSci516 Data Intensive Computing Systems

TRC:example

• Findthenameandageofallsailorswitharatingabove7

{P|∃ SϵSailors(S.rating >7⋀ P.name =S.name ⋀ P.age =S.age)}

• Pisatuplevariable– withexactlytwofieldsnameandage(schemaoftheoutputrelation)– P.name =S.name ⋀ P.age =S.age givesvaluestothefieldsofananswer

tuple

• Useparentheses,∀ ∃ ⋁ ⋀ ><= ≠ ¬etc asnecessary• A⇒ Bisveryusefultoo

– nextslideDukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 28

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

$ Thereexists

Page 29: CompSci516 Data Intensive Computing Systems

A⇒ B

• A“implies”B• Equivalently,ifAistrue,Bmustbetrue• Equivalently,¬A⋁ B,i.e.

– eitherAisfalse(thenBcanbeanything)– otherwise(i.e.Aistrue)Bmustbetrue

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 29

Page 30: CompSci516 Data Intensive Computing Systems

UsefulLogicalEquivalences

• "xP(x) =¬$x [¬P(x)]

• ¬(P∨Q) =¬P∧ ¬Q• ¬(P∧ Q) =¬P∨ ¬Q

– Similarly,¬(¬P∨Q)=P∧ ¬Qetc.

• AÞ B=¬A∨ B

DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems 30

$ Thereexists" Forall∧ LogicalAND∨ LogicalOR¬ NOT

deMorgan’slaws

Page 31: CompSci516 Data Intensive Computing Systems

TRC:example

• Findthenamesofsailorswhohavereservedatleasttwoboats

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 31

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

Page 32: CompSci516 Data Intensive Computing Systems

TRC:example

• Findthenamesofsailorswhohavereservedatleasttwoboats

{P|∃ SϵSailors(∃ R1ϵ Reserves∃ R2ϵReserves⋀ S.sid =R1.sid⋀ S.sid =R2.sid⋀ R1.bid≠R2.bid⋀ P.name =S.name)}

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 32

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

Page 33: CompSci516 Data Intensive Computing Systems

TRC:example

• Findthenamesofsailorswhohavereservedallboats• Divisionoperation

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 33

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

Page 34: CompSci516 Data Intensive Computing Systems

TRC:example

• Findthenamesofsailorswhohavereservedallboats• DivisionoperationinRA!

{P|∃ SϵSailors[∀BϵBoats(∃ Rϵ Reserves(S.sid =R.sid ⋀R.bid =B.bid))]⋀ P.name =S.name}

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 34

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

Page 35: CompSci516 Data Intensive Computing Systems

TRC:example

• Findthenamesofsailorswhohavereservedallred boats

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 35

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

HowwillyouchangethepreviousTRCexpression?

Page 36: CompSci516 Data Intensive Computing Systems

TRC:example

• Findthenamesofsailorswhohavereservedallred boats{P|∃ SϵSailors∀BϵBoats(B.color =‘red’⇒ (∃ Rϵ Reserves(S.sid =R.sid ⋀ R.bid =B.bid ⋀ P.name =S.name)))}

RecallthatA⇒ Bislogicallyequivalentto¬A⋁ Bso⇒ canbeavoided,butitiscleanerandmoreintuitive

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 36

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

Page 37: CompSci516 Data Intensive Computing Systems

DRC:example

• Findthenameandageofallsailorswitharatingabove7

TRC:{P|∃ SϵSailors(S.rating >7⋀ P.name =S.name ⋀ P.age =S.age)}

DRC:{<N,A>|∃ <I,N,T,A>ϵSailors⋀ T>7}

• Variablesarenowdomainvariables• WewilluseuseTRC

– bothareequivalent

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 37

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

Page 38: CompSci516 Data Intensive Computing Systems

MoreExamples:RC

• Thefamous“Drinker-Beer-Bar”example!

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 38

Acknowledgement:examplesandslidesbyProfs.BalazinskaandSuciu,andthe[GUW]book

UNDERSTANDTHEDIFFERENCEINANSWERSFORALLFOURDRINKERS

Page 39: CompSci516 Data Intensive Computing Systems

DrinkerCategory1

Find drinkers that frequent some bar that serves some beer they like.

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

39CompSci516:DataIntensiveComputingSystems

DukeCS,Spring2016

Page 40: CompSci516 Data Intensive Computing Systems

DrinkerCategory1

Find drinkers that frequent some bar that serves some beer they like.

Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)

a shortcut for{x | $Y ϵ Frequents Z ϵ Serves W ϵ Likes (T.drinker = x.drinker∧ T.bar = Z.bar∧ W.beer = ……}

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

40CompSci516:DataIntensiveComputingSystems

DukeCS,Spring2016

Thedifferenceisthatinthefirstone,onevariable=oneattributeinthesecondone,onevariable=onetuple(TupleRC)Bothareequivalentandfeelfreetousetheonethatisconvenienttoyou

Page 41: CompSci516 Data Intensive Computing Systems

DrinkerCategory2

Find drinkers that frequent some bar that serves some beer they like.

Find drinkers that frequent only bars that serves some beer they like.

Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

41CompSci516:DataIntensiveComputingSystems

Q(x) = …

DukeCS,Spring2016

Page 42: CompSci516 Data Intensive Computing Systems

DrinkerCategory2

Find drinkers that frequent some bar that serves some beer they like.

Find drinkers that frequent only bars that serves some beer they like.

Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)

Q(x) = "y. Frequents(x, y)Þ ($z. Serves(y,z)∧Likes(x,z))

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

42CompSci516:DataIntensiveComputingSystems

DukeCS,Spring2016

Page 43: CompSci516 Data Intensive Computing Systems

DrinkerCategory3

Find drinkers that frequent some bar that serves some beer they like.

Find drinkers that frequent only bars that serves some beer they like.

Find drinkers that frequent some bar that serves only beers they like.

Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)

Q(x) = "y. Frequents(x, y)Þ ($z. Serves(y,z)∧Likes(x,z))

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

43CompSci516:DataIntensiveComputingSystems

Q(x) = …

DukeCS,Spring2016

Page 44: CompSci516 Data Intensive Computing Systems

DrinkerCategory3

Find drinkers that frequent some bar that serves some beer they like.

Find drinkers that frequent only bars that serves some beer they like.

Find drinkers that frequent some bar that serves only beers they like.

Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)

Q(x) = "y. Frequents(x, y)Þ ($z. Serves(y,z)∧Likes(x,z))

Q(x) = $y. Frequents(x, y)∧"z.(Serves(y,z) Þ Likes(x,z))

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

44CompSci516:DataIntensiveComputingSystems

DukeCS,Spring2016

Page 45: CompSci516 Data Intensive Computing Systems

DrinkerCategory4

Find drinkers that frequent some bar that serves some beer they like.

Find drinkers that frequent only bars that serves some beer they like.

Find drinkers that frequent only bars that serves only beer they like.

Find drinkers that frequent some bar that serves only beers they like.

Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)

Q(x) = "y. Frequents(x, y)Þ ($z. Serves(y,z)∧Likes(x,z))

Q(x) = $y. Frequents(x, y)∧"z.(Serves(y,z) Þ Likes(x,z))

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

45CompSci516:DataIntensiveComputingSystems

Q(x) = …

DukeCS,Spring2016

Page 46: CompSci516 Data Intensive Computing Systems

DrinkerCategory4

Find drinkers that frequent some bar that serves some beer they like.

Find drinkers that frequent only bars that serves some beer they like.

Find drinkers that frequent only bars that serves only beer they like.

Find drinkers that frequent some bar that serves only beers they like.

Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)

Q(x) = "y. Frequents(x, y)Þ ($z. Serves(y,z)∧Likes(x,z))

Q(x) = "y. Frequents(x, y)Þ "z.(Serves(y,z) Þ Likes(x,z))

Q(x) = $y. Frequents(x, y)∧"z.(Serves(y,z) Þ Likes(x,z))

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

46DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems

Page 47: CompSci516 Data Intensive Computing Systems

WhyshouldwecareaboutRC• RCisdeclarative,likeSQL,andunlikeRA(whichisoperational)

• Givesfoundationofdatabasequeriesinfirst-orderlogic– youcannotexpressallaggregatesinRC,e.g.cardinalityofarelationorsum(possibleinextendedRAandSQL)

– stillcanexpressconditionslike“atleasttwotuples”(oranyconstant)

• RCexpressionmaybemuchsimplerthanSQLqueries– andeasiertocheckforcorrectnessthanSQL– powertouse" and Þ– then you can systematically go to a “correct” SQL

query

DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems 47

Page 48: CompSci516 Data Intensive Computing Systems

FromRCtoSQL

Q(x) = $y. Likes(x, y)∧"z.(Serves(z,y) Þ Frequents(x,z))

Query: Find drinkers that like some beer (so much) that they frequent all bars that serve it

CompSci516:DataIntensiveComputingSystems

48

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

DukeCS,Spring2016

Page 49: CompSci516 Data Intensive Computing Systems

FromRCtoSQL

Q(x) = $y. Likes(x, y)∧"z.(Serves(z,y) Þ Frequents(x,z))

Query: Find drinkers that like some beer so much that they frequent all bars that serve it

Step 1: Replace " with $ using de Morgan’s Laws

Q(x) = $y. Likes(x, y)∧ ¬$z.(Serves(z,y) ∧ ¬Frequents(x,z))

CompSci516:DataIntensiveComputingSystems

49

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

"x P(x) same as¬$x ¬P(x)

¬(¬P∨Q) same asP∧ ¬ Q

º Q(x) = $y. Likes(x, y)∧"z.(¬ Serves(z,y) ∨ Frequents(x,z))

DukeCS,Spring2016

Page 50: CompSci516 Data Intensive Computing Systems

FromRCtoSQL

SELECT DISTINCT L.drinkerFROM Likes LWHERE not exists

(SELECT S.barFROM Serves SWHERE L.beer=S.beer

AND not exists (SELECT * FROM Frequents FWHERE F.drinker=L.drinker

AND F.bar=S.bar))CompSci516:DataIntensiveComputing

Systems50

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

DukeCS,Spring2016

Q(x) = $y. Likes(x, y) ∧¬ $z.(Serves(z,y)∧¬Frequents(x,z))

Step 2: Translate into SQL

Page 51: CompSci516 Data Intensive Computing Systems

The“correct”intermediatesteps• WritethequeryinRC• Ifyouhaveavariableunder“negation”,alsoaddthe

“domain”,i.e.wherethevariableappearswithoutanegation– e.g.ifyouhave¬H(x,y)forasubquery,– where xandycanonlycomefromarelationR(x,y)– makeitR(x,y)∧ ¬H(x,y)

– Thisistomakethequery“safe”and“domainindependent”–wewilldiscussthiswhenwedoDatalog

– Intuitively,ifyouaretryingtofind“sailorsthatdonotsatisfysomecriteria”youhavetospecifythedomainofsailors,sayfromthesailortable,otherwiseyouarelookingataninfinitespace

CompSci516:DataIntensiveComputingSystems 51DukeCS,Spring2016

thisslidecanbeskippeduntilwedoDatalogwillrevisit”Safequeries”then

Page 52: CompSci516 Data Intensive Computing Systems

The“correct”intermediatestep

DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems

52

Make all subqueries with negation domain independenti.e. say where x is coming from

Q(x) = $y. Likes(x, y) ∧ ¬$z.(Likes(x,y)∧Serves(z,y)∧¬Frequents(x,z))

SELECT DISTINCT L.drinker FROM Likes LWHERE not exists

(SELECT * FROM Likes L2, Serves SWHERE L2.drinker=L.drinker and L2.beer=L.beer

and L2.beer=S.beerand not exists (SELECT * FROM Frequents F

WHERE F.drinker=L2.drinkerand F.bar=S.bar))

which can be simplifiedto the previous query

thisslidecanbeskippeduntilwedoDatalogwillrevisit”Safequeries”then

Page 53: CompSci516 Data Intensive Computing Systems

Summary

• YoulearntthreequerylanguagesfortheRelationalDBmodel– SQL– RA– RC

• Allhavetheirownpurposes

• Youshouldbeabletowriteaqueryinallthreelanguagesandconvertfromonetoanother– However,youhavetobecareful,notall“valid”expressionsinonemaybe

expressedinanother– {S|¬(SϵSailors)}– infinitelymanytuples– an“unsafe”query– Morewhenwedo“Datalog”,alsoseeCh.4.4in[RG]

• Nexttopic:DBMSinternals– storage/indexing,queryexecution,algorithms,optimization

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 53

Page 54: CompSci516 Data Intensive Computing Systems

AdditionalSlidesDivisioninRA

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 54

Page 55: CompSci516 Data Intensive Computing Systems

Division• Notsupportedasaprimitiveoperator,butusefulforexpressing

querieslike:

Findsailorswhohavereservedall boats.

• LetAhave2fields,xandy;Bhaveonlyfieldy:

– A/B= {<x>:∀<y>ϵB∃ <x,y>ϵA}

– i.e.,A/Bcontainsallxtuples(sailors)suchthatforevery ytuple(boat)inB,thereisanxy tupleinA.

– Equivalently,– Ifthesetofyvalues(boats)associatedwithanxvalue(sailor)inAcontains

allyvaluesinB,thexvalueisinA/B.

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 55

Page 56: CompSci516 Data Intensive Computing Systems

ExamplesofDivisionA/B

sno pnos1 p1s1 p2s1 p3s1 p4s2 p1s2 p2s3 p2s4 p2s4 p4

pnop2

pnop2p4

pnop1p2p4

snos1s2s3s4

snos1s4

snos1

A

B1B2

B3

A/B1 A/B2 A/B3DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 56

Page 57: CompSci516 Data Intensive Computing Systems

ExpressingA/BUsingBasicOperators

• Divisionisnotessentialop;justausefulshorthand.– Alsotrueofjoins=selectiononcrossproducts– butjoinsaresocommonthatsystemsimplementjoinsspecially

• Idea:ForA/B,computeallxvaluesthatare“notdisqualified”bysomeyvalueinB

– xvalueisdisqualifiedifbyattachingyvaluefromB,weobtainanxy tuplethatisnotinA

• ForA(x,y)andB(y):– seeexamplefirstonthenextslide

A/B: p px x A B A(( ( ) ) )´ -p x A( ) -

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 57

Page 58: CompSci516 Data Intensive Computing Systems

Example:A/Billustration• Whatis

– Firstcrossproduct:• {s1,s2,s3,s4}⨉ {p2,p4}

– ThenmissingcombinationsinA:• {<s2,p4>,<s3,p4>}

– Then“disqualified”Atuples• {s2,s3}

• Whatis

– The“qualified”Atuples• {s1,s2,s3,s4}– {s2,s3}={s1,s4}

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 58

sno pnos1 p1s1 p2s1 p3s1 p4s2 p1s2 p2s3 p2s4 p2s4 p4

pnop2p4

A

B2

p px x A B A(( ( ) ) )´ -

p x A( ) - p px x A B A(( ( ) ) )´ -snos1s4

A/B2

Page 59: CompSci516 Data Intensive Computing Systems

Findthenamesofsailorswho’vereservedallboats

• Usesdivision;schemasoftheinputrelationsto/mustbecarefullychosen:

r p p( , ( , Re ) / ( ))Tempsids sid bid serves bid Boats

p sname Tempsids Sailors( )!"

• To find sailors who’ve reserved all ‘Interlake’ boats:

/ ( ' ' )p sbid bname Interlake Boats=.....

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 59

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)