CompSci516 Data Intensive Computing Systems

Post on 29-Dec-2021

5 views 0 download

Transcript of CompSci516 Data Intensive Computing Systems

CompSci 516DataIntensiveComputingSystems

Lecture4RelationalAlgebra

andRelationalCalculus

Instructor:Sudeepa Roy

1DukeCS,Fall2016 CompSci 516:DataIntensiveComputingSystems

Announcement

• Deadlinesextendedbyaweekforprojectproposalandmidtermreport

• Proposal(writeup):09/28(Wed)– Butsendmeanemail(s)mentioningyourgroupmembersanddescribingyourprojectby 09/21

– Projectideastobepostedbythisweekend• Midtermreport:10/28(Fri)• Finalreport:11/28(Mon)

DukeCS,Fall2016 CompSci 516:DataIntensiveComputingSystems 2

Today’stopics

• FinishSQL(fromLecture2)• RelationalAlgebra(RA)andRelationalCalculus(RC)

• Readingmaterial– [RG]Chapter4(RA,RC)– [GUW]Chapters2.4,5.1,5.2

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 3

Acknowledgement:Thefollowingslideshavebeencreatedadaptingtheinstructormaterialofthe[RG]bookprovidedbytheauthorsDr.Ramakrishnan andDr.Gehrke.

RelationalQueryLanguages

• Querylanguages:Allowmanipulationandretrievalofdatafromadatabase

• Relationalmodelsupportssimple,powerfulQLs:– Strongformalfoundationbasedonlogic– Allowsformuchoptimization

• QueryLanguages!= programminglanguages– QLsnotintendedtobeusedforcomplexcalculations– QLssupporteasy,efficientaccesstolargedatasets

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 4

FormalRelationalQueryLanguages

• Two“mathematical”QueryLanguagesformthebasisfor“real”languages(e.g.SQL),andforimplementation:– RelationalAlgebra:Moreoperational,veryusefulforrepresentingexecutionplans

– RelationalCalculus:Letsusersdescribewhattheywant,ratherthanhowtocomputeit(Non-operational,declarative)

• Note:Declarative(RC,SQL)vs.Operational(RA)

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 5

Preliminaries• Aqueryisappliedtorelationinstances,andtheresultofaqueryisalsoarelationinstance.– Schemasofinputrelationsforaqueryarefixed

• querywillrunregardlessofinstance

– Theschemafortheresultofagivenqueryisalsofixed• Determinedbydefinitionofquerylanguageconstructs

• Positionalvs.named-fieldnotation:– Positionalnotationeasierforformaldefinitions,named-fieldnotationmorereadable

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 6

ExampleSchemaandInstances

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0

sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

sid bid day22 101 10/10/9658 103 11/12/96

R1

S1 S2

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

LogicNotations

• $ Thereexists• " Forall• ∧ LogicalAND• ∨ LogicalOR• ¬NOT

RelationalAlgebra(RA)

RelationalAlgebra

• Takesoneormorerelationsasinput,andproducesarelationasoutput– operator– operand– semantic– soanalgebra!

• Sinceeachoperationreturnsarelation,operationscanbecomposed– Algebrais“closed”

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 10

RelationalAlgebra• Basicoperations:

– Selection(σ)Selectsasubsetofrowsfromrelation– Projection(π)Deletesunwantedcolumnsfromrelation.– Cross-product(x)Allowsustocombinetworelations.– Set-difference(-)Tuplesinreln.1,butnotinreln.2.– Union(∪)Tuplesinreln.1orinreln.2.

• Additionaloperations:– Intersection(∩)– join⨝– division(/)– renaming(ρ)– Notessential,but(very)useful.

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 11

Projection

sname ratingyuppy 9lubber 8guppy 5rusty 10p sname rating S, ( )2

age35.055.5

page S( )2

• Deletesattributesthatarenotinprojectionlist.

• Schema ofresultcontainsexactlythefieldsintheprojectionlist,withthesamenamesthattheyhadinthe(only)inputrelation.

• Projectionoperatorhastoeliminateduplicates(Why)

– Note:realsystemstypicallydon’tdoduplicateeliminationunlesstheuserexplicitlyasksforit(performance)

sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

S2

10DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

Selection

s rating S>8 2( )

sid sname rating age28 yuppy 9 35.058 rusty 10 35.0

sname ratingyuppy 9rusty 10

p ssname rating rating S, ( ( ))>8 2

• Selectsrowsthatsatisfyselectioncondition

• Noduplicatesinresult.Why?

• Schemaofresultidenticaltoschemaof(only)inputrelation

sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

S2

11DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

CompositionofOperators

• Resultrelationcanbetheinputforanotherrelationalalgebraoperation– Operatorcomposition s rating S>8 2( )

sid sname rating age28 yuppy 9 35.058 rusty 10 35.0

sname ratingyuppy 9rusty 10

p ssname rating rating S, ( ( ))>8 2

Union,Intersection,Set-Difference

• Alloftheseoperationstaketwoinputrelations,whichmustbeunion-compatible:– Samenumberoffields.– `Corresponding’fieldshavethesametype

– sameschemaastheinputs

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.044 guppy 5 35.028 yuppy 9 35.0

S S1 2È

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0

sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

S1 S2

Duke%CS,%Spring%2016 CompSci 516:%Data%Intensive%Computing%Systems 12

Union,Intersection,Set-Difference

• Note:noduplicate– “Setsemantic”– SQL:UNION– SQLallows“bagsemantic”aswell:UNIONALL

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.044 guppy 5 35.028 yuppy 9 35.0

S S1 2È

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0

sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

S1 S2

Duke%CS,%Spring%2016 CompSci 516:%Data%Intensive%Computing%Systems 12

Union,Intersection,Set-Difference

sid sname rating age31 lubber 8 55.558 rusty 10 35.0

S S1 2Ç

sid sname rating age22 dustin 7 45.0

S S1 2-

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0

sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

S1 S2

Duke%CS,%Spring%2016 CompSci 516:%Data%Intensive%Computing%Systems 13

Cross-Product• EachrowofS1ispairedwitheachrowofR1.• ResultschemahasonefieldperfieldofS1andR1,withfield

names`inherited’ifpossible.– Conflict:BothS1andR1haveafieldcalledsid.

(sid) sname rating age (sid) bid day22 dustin 7 45.0 22 101 10/10/9622 dustin 7 45.0 58 103 11/12/9631 lubber 8 55.5 22 101 10/10/9631 lubber 8 55.5 58 103 11/12/9658 rusty 10 35.0 22 101 10/10/9658 rusty 10 35.0 58 103 11/12/96

DukeCS,Fall2016 CompSci 516:DataIntensiveComputingSystems 18

RenamingOperator⍴

(sid) sname rating age (sid) bid day22 dustin 7 45.0 22 101 10/10/9622 dustin 7 45.0 58 103 11/12/9631 lubber 8 55.5 22 101 10/10/9631 lubber 8 55.5 58 103 11/12/9658 rusty 10 35.0 22 101 10/10/9658 rusty 10 35.0 58 103 11/12/96

§Ingeneral,canuse⍴(<Temp>,<RA-expression>)

DukeCS,Fall2016 CompSci 516:DataIntensiveComputingSystems 19

(⍴sid →sid1S1)⨉ (⍴sid →sid1R1)or

⍴(C(1→sid1,5→sid2),S1⨉ R1)Cisthenewrelationname

Joins

• Resultschemasameasthatofcross-product.• Fewertuplesthancross-product,mightbeableto

computemoreefficiently

R c S c R S!" = ´s ( )

(sid) sname rating age (sid) bid day22 dustin 7 45.0 58 103 11/12/9631 lubber 8 55.5 58 103 11/12/96

S RS sid R sid1 11 1!" . .<

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 20

Findnamesofsailorswho’vereservedboat#103

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 21

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

Findnamesofsailorswho’vereservedboat#103

• Solution1:

• Solution2:

p ssname bid serves Sailors(( Re ) )=103 !"

p ssname bid serves Sailors( (Re ))=103 !"

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 22

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

ExpressinganRAexpressionasaTree

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 23

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

p ssname bid serves Sailors(( Re ) )=103 !"Sailors Reserves

σbid=103

⨝sid =sid

πsname

Alsocalledalogicalqueryplan

Findsailorswho’vereservedaredoragreenboat

• Canidentifyallredorgreenboats,thenfindsailorswho’vereservedoneoftheseboats:

r s( , ( ' ' ' ' ))Tempboats color red color green Boats= Ú =

p sname Tempboats serves Sailors( Re )!" !"

Can also define Tempboats using unionTry the “AND” version yourself

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 24

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

Useofrenameoperation

Whataboutaggregates?

• Extendedrelationalalgebra• 𝝲age,avg(rating)→ avgr Sailors• Alsoextendedto“bagsemantic”:allowduplicates

– Takeintoaccountcardinality– RandShavetupletresp.mandntimes– R∪ Shastm+n times– R∩Shastmin(m,n)times– R– Shastmax(0,m-n)times– sorting(τ),duplicateremoval(ẟ)operators

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 25

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

RelationalCalculus(RC)

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 26

RelationalCalculus• RAisprocedural

– πA(σA=a R)andσA=a (πAR)areequivalentbutdifferentexpressions

• RC– non-proceduralanddeclarative– describesasetofanswerswithoutbeingexplicitabouthowthey

shouldbecomputed

• TRC(tuplerelationalcalculus)– variablestaketuplesasvalues– wewillprimarilydoTRC

• DRC(domainrelationalcalculus)– variablesrangeoverfieldvalues

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 27

TRC:example

• Findthenameandageofallsailorswitharatingabove7

{P|∃ SϵSailors(S.rating >7⋀ P.name =S.name ⋀ P.age =S.age)}

• Pisatuplevariable– withexactlytwofieldsnameandage(schemaoftheoutputrelation)– P.name =S.name ⋀ P.age =S.age givesvaluestothefieldsofananswer

tuple

• Useparentheses,∀ ∃ ⋁ ⋀ ><= ≠ ¬etc asnecessary• A⇒ Bisveryusefultoo

– nextslideDukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 28

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

$ Thereexists

A⇒ B

• A“implies”B• Equivalently,ifAistrue,Bmustbetrue• Equivalently,¬A⋁ B,i.e.

– eitherAisfalse(thenBcanbeanything)– otherwise(i.e.Aistrue)Bmustbetrue

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 29

UsefulLogicalEquivalences

• "xP(x) =¬$x [¬P(x)]

• ¬(P∨Q) =¬P∧ ¬Q• ¬(P∧ Q) =¬P∨ ¬Q

– Similarly,¬(¬P∨Q)=P∧ ¬Qetc.

• AÞ B=¬A∨ B

DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems 30

$ Thereexists" Forall∧ LogicalAND∨ LogicalOR¬ NOT

deMorgan’slaws

TRC:example

• Findthenamesofsailorswhohavereservedatleasttwoboats

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 31

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

TRC:example

• Findthenamesofsailorswhohavereservedatleasttwoboats

{P|∃ SϵSailors(∃ R1ϵ Reserves∃ R2ϵReserves⋀ S.sid =R1.sid⋀ S.sid =R2.sid⋀ R1.bid≠R2.bid⋀ P.name =S.name)}

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 32

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

TRC:example

• Findthenamesofsailorswhohavereservedallboats• Divisionoperation

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 33

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

TRC:example

• Findthenamesofsailorswhohavereservedallboats• DivisionoperationinRA!

{P|∃ SϵSailors[∀BϵBoats(∃ Rϵ Reserves(S.sid =R.sid ⋀R.bid =B.bid))]⋀ P.name =S.name}

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 34

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

TRC:example

• Findthenamesofsailorswhohavereservedallred boats

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 35

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

HowwillyouchangethepreviousTRCexpression?

TRC:example

• Findthenamesofsailorswhohavereservedallred boats{P|∃ SϵSailors∀BϵBoats(B.color =‘red’⇒ (∃ Rϵ Reserves(S.sid =R.sid ⋀ R.bid =B.bid ⋀ P.name =S.name)))}

RecallthatA⇒ Bislogicallyequivalentto¬A⋁ Bso⇒ canbeavoided,butitiscleanerandmoreintuitive

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 36

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

DRC:example

• Findthenameandageofallsailorswitharatingabove7

TRC:{P|∃ SϵSailors(S.rating >7⋀ P.name =S.name ⋀ P.age =S.age)}

DRC:{<N,A>|∃ <I,N,T,A>ϵSailors⋀ T>7}

• Variablesarenowdomainvariables• WewilluseuseTRC

– bothareequivalent

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 37

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

MoreExamples:RC

• Thefamous“Drinker-Beer-Bar”example!

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 38

Acknowledgement:examplesandslidesbyProfs.BalazinskaandSuciu,andthe[GUW]book

UNDERSTANDTHEDIFFERENCEINANSWERSFORALLFOURDRINKERS

DrinkerCategory1

Find drinkers that frequent some bar that serves some beer they like.

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

39CompSci516:DataIntensiveComputingSystems

DukeCS,Spring2016

DrinkerCategory1

Find drinkers that frequent some bar that serves some beer they like.

Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)

a shortcut for{x | $Y ϵ Frequents Z ϵ Serves W ϵ Likes (T.drinker = x.drinker∧ T.bar = Z.bar∧ W.beer = ……}

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

40CompSci516:DataIntensiveComputingSystems

DukeCS,Spring2016

Thedifferenceisthatinthefirstone,onevariable=oneattributeinthesecondone,onevariable=onetuple(TupleRC)Bothareequivalentandfeelfreetousetheonethatisconvenienttoyou

DrinkerCategory2

Find drinkers that frequent some bar that serves some beer they like.

Find drinkers that frequent only bars that serves some beer they like.

Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

41CompSci516:DataIntensiveComputingSystems

Q(x) = …

DukeCS,Spring2016

DrinkerCategory2

Find drinkers that frequent some bar that serves some beer they like.

Find drinkers that frequent only bars that serves some beer they like.

Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)

Q(x) = "y. Frequents(x, y)Þ ($z. Serves(y,z)∧Likes(x,z))

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

42CompSci516:DataIntensiveComputingSystems

DukeCS,Spring2016

DrinkerCategory3

Find drinkers that frequent some bar that serves some beer they like.

Find drinkers that frequent only bars that serves some beer they like.

Find drinkers that frequent some bar that serves only beers they like.

Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)

Q(x) = "y. Frequents(x, y)Þ ($z. Serves(y,z)∧Likes(x,z))

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

43CompSci516:DataIntensiveComputingSystems

Q(x) = …

DukeCS,Spring2016

DrinkerCategory3

Find drinkers that frequent some bar that serves some beer they like.

Find drinkers that frequent only bars that serves some beer they like.

Find drinkers that frequent some bar that serves only beers they like.

Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)

Q(x) = "y. Frequents(x, y)Þ ($z. Serves(y,z)∧Likes(x,z))

Q(x) = $y. Frequents(x, y)∧"z.(Serves(y,z) Þ Likes(x,z))

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

44CompSci516:DataIntensiveComputingSystems

DukeCS,Spring2016

DrinkerCategory4

Find drinkers that frequent some bar that serves some beer they like.

Find drinkers that frequent only bars that serves some beer they like.

Find drinkers that frequent only bars that serves only beer they like.

Find drinkers that frequent some bar that serves only beers they like.

Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)

Q(x) = "y. Frequents(x, y)Þ ($z. Serves(y,z)∧Likes(x,z))

Q(x) = $y. Frequents(x, y)∧"z.(Serves(y,z) Þ Likes(x,z))

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

45CompSci516:DataIntensiveComputingSystems

Q(x) = …

DukeCS,Spring2016

DrinkerCategory4

Find drinkers that frequent some bar that serves some beer they like.

Find drinkers that frequent only bars that serves some beer they like.

Find drinkers that frequent only bars that serves only beer they like.

Find drinkers that frequent some bar that serves only beers they like.

Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)

Q(x) = "y. Frequents(x, y)Þ ($z. Serves(y,z)∧Likes(x,z))

Q(x) = "y. Frequents(x, y)Þ "z.(Serves(y,z) Þ Likes(x,z))

Q(x) = $y. Frequents(x, y)∧"z.(Serves(y,z) Þ Likes(x,z))

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

46DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems

WhyshouldwecareaboutRC• RCisdeclarative,likeSQL,andunlikeRA(whichisoperational)

• Givesfoundationofdatabasequeriesinfirst-orderlogic– youcannotexpressallaggregatesinRC,e.g.cardinalityofarelationorsum(possibleinextendedRAandSQL)

– stillcanexpressconditionslike“atleasttwotuples”(oranyconstant)

• RCexpressionmaybemuchsimplerthanSQLqueries– andeasiertocheckforcorrectnessthanSQL– powertouse" and Þ– then you can systematically go to a “correct” SQL

query

DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems 47

FromRCtoSQL

Q(x) = $y. Likes(x, y)∧"z.(Serves(z,y) Þ Frequents(x,z))

Query: Find drinkers that like some beer (so much) that they frequent all bars that serve it

CompSci516:DataIntensiveComputingSystems

48

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

DukeCS,Spring2016

FromRCtoSQL

Q(x) = $y. Likes(x, y)∧"z.(Serves(z,y) Þ Frequents(x,z))

Query: Find drinkers that like some beer so much that they frequent all bars that serve it

Step 1: Replace " with $ using de Morgan’s Laws

Q(x) = $y. Likes(x, y)∧ ¬$z.(Serves(z,y) ∧ ¬Frequents(x,z))

CompSci516:DataIntensiveComputingSystems

49

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

"x P(x) same as¬$x ¬P(x)

¬(¬P∨Q) same asP∧ ¬ Q

º Q(x) = $y. Likes(x, y)∧"z.(¬ Serves(z,y) ∨ Frequents(x,z))

DukeCS,Spring2016

FromRCtoSQL

SELECT DISTINCT L.drinkerFROM Likes LWHERE not exists

(SELECT S.barFROM Serves SWHERE L.beer=S.beer

AND not exists (SELECT * FROM Frequents FWHERE F.drinker=L.drinker

AND F.bar=S.bar))CompSci516:DataIntensiveComputing

Systems50

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

DukeCS,Spring2016

Q(x) = $y. Likes(x, y) ∧¬ $z.(Serves(z,y)∧¬Frequents(x,z))

Step 2: Translate into SQL

The“correct”intermediatesteps• WritethequeryinRC• Ifyouhaveavariableunder“negation”,alsoaddthe

“domain”,i.e.wherethevariableappearswithoutanegation– e.g.ifyouhave¬H(x,y)forasubquery,– where xandycanonlycomefromarelationR(x,y)– makeitR(x,y)∧ ¬H(x,y)

– Thisistomakethequery“safe”and“domainindependent”–wewilldiscussthiswhenwedoDatalog

– Intuitively,ifyouaretryingtofind“sailorsthatdonotsatisfysomecriteria”youhavetospecifythedomainofsailors,sayfromthesailortable,otherwiseyouarelookingataninfinitespace

CompSci516:DataIntensiveComputingSystems 51DukeCS,Spring2016

thisslidecanbeskippeduntilwedoDatalogwillrevisit”Safequeries”then

The“correct”intermediatestep

DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems

52

Make all subqueries with negation domain independenti.e. say where x is coming from

Q(x) = $y. Likes(x, y) ∧ ¬$z.(Likes(x,y)∧Serves(z,y)∧¬Frequents(x,z))

SELECT DISTINCT L.drinker FROM Likes LWHERE not exists

(SELECT * FROM Likes L2, Serves SWHERE L2.drinker=L.drinker and L2.beer=L.beer

and L2.beer=S.beerand not exists (SELECT * FROM Frequents F

WHERE F.drinker=L2.drinkerand F.bar=S.bar))

which can be simplifiedto the previous query

thisslidecanbeskippeduntilwedoDatalogwillrevisit”Safequeries”then

Summary

• YoulearntthreequerylanguagesfortheRelationalDBmodel– SQL– RA– RC

• Allhavetheirownpurposes

• Youshouldbeabletowriteaqueryinallthreelanguagesandconvertfromonetoanother– However,youhavetobecareful,notall“valid”expressionsinonemaybe

expressedinanother– {S|¬(SϵSailors)}– infinitelymanytuples– an“unsafe”query– Morewhenwedo“Datalog”,alsoseeCh.4.4in[RG]

• Nexttopic:DBMSinternals– storage/indexing,queryexecution,algorithms,optimization

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 53

AdditionalSlidesDivisioninRA

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 54

Division• Notsupportedasaprimitiveoperator,butusefulforexpressing

querieslike:

Findsailorswhohavereservedall boats.

• LetAhave2fields,xandy;Bhaveonlyfieldy:

– A/B= {<x>:∀<y>ϵB∃ <x,y>ϵA}

– i.e.,A/Bcontainsallxtuples(sailors)suchthatforevery ytuple(boat)inB,thereisanxy tupleinA.

– Equivalently,– Ifthesetofyvalues(boats)associatedwithanxvalue(sailor)inAcontains

allyvaluesinB,thexvalueisinA/B.

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 55

ExamplesofDivisionA/B

sno pnos1 p1s1 p2s1 p3s1 p4s2 p1s2 p2s3 p2s4 p2s4 p4

pnop2

pnop2p4

pnop1p2p4

snos1s2s3s4

snos1s4

snos1

A

B1B2

B3

A/B1 A/B2 A/B3DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 56

ExpressingA/BUsingBasicOperators

• Divisionisnotessentialop;justausefulshorthand.– Alsotrueofjoins=selectiononcrossproducts– butjoinsaresocommonthatsystemsimplementjoinsspecially

• Idea:ForA/B,computeallxvaluesthatare“notdisqualified”bysomeyvalueinB

– xvalueisdisqualifiedifbyattachingyvaluefromB,weobtainanxy tuplethatisnotinA

• ForA(x,y)andB(y):– seeexamplefirstonthenextslide

A/B: p px x A B A(( ( ) ) )´ -p x A( ) -

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 57

Example:A/Billustration• Whatis

– Firstcrossproduct:• {s1,s2,s3,s4}⨉ {p2,p4}

– ThenmissingcombinationsinA:• {<s2,p4>,<s3,p4>}

– Then“disqualified”Atuples• {s2,s3}

• Whatis

– The“qualified”Atuples• {s1,s2,s3,s4}– {s2,s3}={s1,s4}

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 58

sno pnos1 p1s1 p2s1 p3s1 p4s2 p1s2 p2s3 p2s4 p2s4 p4

pnop2p4

A

B2

p px x A B A(( ( ) ) )´ -

p x A( ) - p px x A B A(( ( ) ) )´ -snos1s4

A/B2

Findthenamesofsailorswho’vereservedallboats

• Usesdivision;schemasoftheinputrelationsto/mustbecarefullychosen:

r p p( , ( , Re ) / ( ))Tempsids sid bid serves bid Boats

p sname Tempsids Sailors( )!"

• To find sailors who’ve reserved all ‘Interlake’ boats:

/ ( ' ' )p sbid bname Interlake Boats=.....

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 59

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)