Today DRC: example - Duke Universityx86.cs.duke.edu/.../Lecture-5-Normalization-notes.pdfDuke CS,...

Post on 21-May-2020

2 views 0 download

Transcript of Today DRC: example - Duke Universityx86.cs.duke.edu/.../Lecture-5-Normalization-notes.pdfDuke CS,...

9/10/17

1

CompSci 516DataIntensiveComputingSystems

Lecture5DesignTheoryandNormalization

Instructor:Sudeepa Roy

1DukeCS,Fall2017 CompSci516:DatabaseSystems

Announcements• HW1deadline:

– Dueon09/21(Thurs),11:55pm,nolatedays

• Projectproposaldeadline:– Preliminaryideaandteammembersdueby09/18(Mon)byemailtotheinstructor

– Proposaldueonsakai by09/25(Mon),11:55pm

DukeCS,Fall2017 CompSci516:DatabaseSystems 2

Today

• FinishRCfromLecture4– DRC– Moreexample

• Normalization

DukeCS,Fall2017 CompSci516:DatabaseSystems 3

DRC:example

• Findthenameandageofallsailorswitharatingabove7

TRC:{P|∃ SϵSailors(S.rating >7⋀ P.name =S.name ⋀ P.age =S.age)}

DRC:{<N,A>|∃ <I,N,T,A>ϵSailors⋀ T>7}

• Variablesarenowdomainvariables• WewilluseuseTRC

– bothareequivalent

DukeCS,Fall2017 CompSci516:DatabaseSystems 4

Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)

MoreExamples:RC

• Thefamous“Drinker-Beer-Bar”example!

DukeCS,Fall2017 CompSci516:DatabaseSystems 5

Acknowledgement:examplesandslidesbyProfs.BalazinskaandSuciu,andthe[GUW]book

UNDERSTANDTHEDIFFERENCEINANSWERSFORALLFOURDRINKERS

DrinkerCategory1

Find drinkers that frequent some bar that serves some beer they like.

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

6CompSci516:DatabaseSystemsDukeCS,Fall2017

9/10/17

2

DrinkerCategory1

Find drinkers that frequent some bar that serves some beer they like.

Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)

a shortcut for{x | $Y ϵ Frequents Z ϵ Serves W ϵ Likes (T.drinker = x.drinker∧ T.bar = Z.bar∧ W.beer = ……}

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

7CompSci516:DatabaseSystemsDukeCS,Fall2017

Thedifferenceisthatinthefirstone,onevariable=oneattributeinthesecondone,onevariable=onetuple(TupleRC)Bothareequivalentandfeelfreetousetheonethatisconvenienttoyou

DrinkerCategory2/3/4

Find drinkers that frequent some bar that serves some beer they like.

Find drinkers that frequent only bars that serves some beer they like.

Find drinkers that frequent only bars that serves only beer they like.

Find drinkers that frequent some bar that serves only beers they like.

Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)

Q(x) =

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

8DukeCS,Fall2017 CompSci 516:DatabaseSystems

Q(x) =

Q(x) =

WhyshouldwecareaboutRC• RCisdeclarative,likeSQL,andunlikeRA(whichis

operational)• Givesfoundationofdatabasequeriesinfirst-order

logic– youcannotexpressallaggregatesinRC,e.g.cardinalityofarelationorsum(possibleinextendedRAandSQL)

– stillcanexpressconditionslike“atleasttwotuples”(oranyconstant)

• RCexpressionmaybemuchsimplerthanSQLqueries– andeasiertocheckforcorrectnessthanSQL– powertouse" and Þ– then you can systematically go to a “correct” SQL

query

DukeCS,Fall2017 CompSci516:DatabaseSystems 9

FromRCtoSQL

Q(x) = $y. Likes(x, y)∧"z.(Serves(z,y) Þ Frequents(x,z))

Query: Find drinkers that like some beer (so much) that they frequent all bars that serve it

CompSci516:DatabaseSystems 10

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

DukeCS,Fall2017

FromRCtoSQL

Q(x) = $y. Likes(x, y)∧"z.(Serves(z,y) Þ Frequents(x,z))

Query: Find drinkers that like some beer so much that they frequent all bars that serve it

Step 1: Replace " with $ using de Morgan’s Laws

Q(x) = $y. Likes(x, y)∧ ¬$z.(Serves(z,y) ∧ ¬Frequents(x,z))

CompSci516:DatabaseSystems 11

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

"x P(x) same as¬$x ¬P(x)

¬(¬P∨Q) same asP∧ ¬ Q

º Q(x) = $y. Likes(x, y)∧"z.(¬ Serves(z,y) ∨ Frequents(x,z))

DukeCS,Fall2017

FromRCtoSQL

SELECT DISTINCT L.drinkerFROM Likes LWHERE not exists

(SELECT S.barFROM Serves SWHERE L.beer=S.beer

AND not exists (SELECT * FROM Frequents FWHERE F.drinker=L.drinker

AND F.bar=S.bar))

CompSci516:DatabaseSystems 12

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

DukeCS,Fall2017

Q(x) = $y. Likes(x, y) ∧¬ $z.(Serves(z,y)∧¬Frequents(x,z))

Step 2: Translate into SQL

Query: Find drinkers that like some beer so much that they frequent all bars that serve it

Wewillseea“methodicalandcorrect”translationtrough“safequeries”inDatalog

9/10/17

3

Summary

• YoulearntthreequerylanguagesfortheRelationalDBmodel– SQL– RA– RC

• Allhavetheirownpurposes

• Youshouldbeabletowriteaqueryinallthreelanguagesandconvertfromonetoanother– However,youhavetobecareful,notall“valid”expressionsinonemay

beexpressedinanother– {S|¬(SϵSailors)}– infinitelymanytuples– an“unsafe”query– Morewhenwedo“Datalog”,alsoseeCh.4.4in[RG]

DukeCS,Fall2017 CompSci516:DatabaseSystems 13

Wherearewenow?

WelearntüRelationalModelandQueryLanguagesüSQL,RA,RCüPostgres(DBMS)üXML(overview)§ HW1

DukeCS,Fall2017 CompSci516:DatabaseSystems 14

Next

• DatabaseNormalization– (forgoodschemadesign)

DesignTheoryandNormalization

DukeCS,Fall2017 CompSci516:DatabaseSystems 15

ReadingMaterial

• Databasenormalization– [RG]Chapter19.1to19.5,19.6.1,19.8(overview)– [GUW]Chapter3

DukeCS,Fall2017 CompSci516:DatabaseSystems 16

Acknowledgement:• Thefollowingslideshavebeencreatedadaptingtheinstructormaterialofthe[RG]bookprovidedbytheauthorsDr.Ramakrishnan andDr.Gehrke.• SomeslideshavebeenadaptedfromslidesbyProf.JunYang

Whatwillwelearn?

• Whatgoeswrongifwehaveredundantinfoinadatabase?

• Whyandhowshouldyourefineaschema?• FunctionalDependencies– anewkindofintegrityconstraints(IC)

• NormalForms• Howtoobtainthosenormalforms

DukeCS,Fall2017 CompSci516:DatabaseSystems 17

Example

ssn (S) name(N) lot(L)

rating(R)

hourly-wage(W)

hours-worked(H)

111-11-1111 Attishoo 48 8 10 40222-22-2222 Smiley 22 8 10 30333-33-3333 Smethurst 35 5 7 30444-44-4444 Guldu 35 5 7 32555-55-5555 Madayan 35 8 10 40

DukeCS,Fall2017 CompSci516:DatabaseSystems 18

Thelistofhourlyemployeesinanorganization

• key=SSN

9/10/17

4

Example

ssn (S) name(N) lot(L)

rating(R)

hourly-wage(W)

hours-worked(H)

111-11-1111 Attishoo 48 8 10 40222-22-2222 Smiley 22 8 10 30333-33-3333 Smethurst 35 5 7 30444-44-4444 Guldu 35 5 7 32555-55-5555 Madayan 35 8 10 40

DukeCS,Fall2017 CompSci516:DatabaseSystems 19

Thelistofhourlyemployeesinanorganization

• key=SSN• Supposeforagivenrating,thereisonlyonehourly_wage value• Redundancyinthetable• Whyisredundancybad?

Nullsmayormaynothelp

• Doesnothelpredundantstorageorupdateanomalies• Mayhelpinsertionanddeletionanomalies

– caninsertatuplewithnullvalueinthehourly_wage field– butcannotrecordhourly_wage foraratingunlessthereissuchan

employee(SSNcannotbenull)– samefordeletionDukeCS,Fall2017 CompSci516:DatabaseSystems 20

ssn (S) name(N) lot(L)

rating(R)

hourly-wage(W)

hours-worked(H)

111-11-1111 Attishoo 48 8 10 40222-22-2222 Smiley 22 8 10 30333-33-3333 Smethurst 35 5 7 30444-44-4444 Guldu 35 5 7 32555-55-5555 Madayan 35 8 10 40

Summary:Redundancy

Therefore,• Redundancyariseswhentheschemaforcesanassociation

betweenattributesthatis“notnatural”• Wewantschemasthatdonotpermitredundancy

– atleastidentifyschemasthatallowredundancytomakeaninformeddecision(e.g.forperformancereasons)

• Nullvaluemayormaynothelp

• Solution?– decompositionofschema

DukeCS,Fall2017 CompSci516:DatabaseSystems 21

ssn (S) name(N) lot(L)

rating(R)

hourly-wage(W)

hours-worked(H)

111-11-1111 Attishoo 48 8 10 40222-22-2222 Smiley 22 8 10 30333-33-3333 Smethurst 35 5 7 30444-44-4444 Guldu 35 5 7 32555-55-5555 Madayan 35 8 10 40

DukeCS,Fall2017 CompSci516:DatabaseSystems 22

Decomposition

ssn (S) name(N) lot(L)

rating(R)

hours-worked(H)

111-11-1111 Attishoo 48 8 40222-22-2222 Smiley 22 8 30333-33-3333 Smethurst 35 5 30444-44-4444 Guldu 35 5 32555-55-5555 Madayan 35 8 40

rating hourly_wage

8 10

5 7

Decompositionsshouldbeusedjudiciously

1. Doweneedtodecomposearelation?– Severalnormalforms– Ifarelationisnotinoneofthem,mayneedto

decomposefurther

2. Whataretheproblemswithdecomposition?

DukeCS,Fall2017 CompSci516:DatabaseSystems 23

FunctionalDependencies(FDs)• Afunctionaldependency (FD)X→ YholdsoverrelationRif,foreveryallowableinstancer ofR:– i.e.,giventwotuplesinr,iftheXvaluesagree,thentheYvaluesmustalsoagree

– XandYaresets ofattributes– t1ϵr,t2 ϵr,ΠX (t1)=ΠX (t2)impliesΠY (t1)=ΠY (t2)

DukeCS,Fall2017 CompSci516:DatabaseSystems 24

A B C Da1 b1 c1 d1

a1 b1 c1 d2

a1 b2 c2 d1

a2 b1 c3 d1

WhatisapossibleFDhere?

9/10/17

5

FunctionalDependencies(FDs)

• AnFDisastatementaboutall allowablerelations– Mustbeidentifiedbasedonsemanticsofapplication– Givensomeallowableinstancer1 ofR,wecancheckifitviolates someFDf,butwecannottelliff holdsoverR

• KisacandidatekeyforRmeansthatK→R– denotingR=allattributesofRtoo– However,S →RdoesnotrequireS tobeminimal– e.g.Scanbeasuperkey

DukeCS,Fall2017 CompSci516:DatabaseSystems 25

Example

• ConsiderrelationobtainedfromHourly_Emps:– Hourly_Emps (ssn,name,lot,rating,hourly_wage,hours_worked)

• Notation:Wewilldenotea relationschemabylistingtheattributes:SNLRWH

– Basicallytheset ofattributes{S,N,L,R,W,H}– herefirstletterofeachattribute

• FDsonHourly_Emps:– ssn isthekey:S →SNLRWH– ratingdetermineshourly_wages:R→W

DukeCS,Fall2017 CompSci516:DatabaseSystems 26

Armstrong’sAxioms

• X,Y,Zaresetsofattributes

• Reflexivity:IfX⊇ Y,thenX→ Y• Augmentation:IfX→ Y,thenXZ→ YZforanyZ• Transitivity:IfX→ YandY→ Z,thenX→ Z

DukeCS,Fall2017 CompSci516:DatabaseSystems 27

A B C Da1 b1 c1 d1

a1 b1 c1 d2

a1 b2 c2 d1

a2 b1 c3 d1

ApplytheserulesonAB→Candcheck

Armstrong’sAxioms

• Thesearesound andcomplete inferencerulesforFDs– sound:thenonlygenerateFDsinF+ forF– complete:byrepeatedapplicationoftheserules,allFDsinF+willbegenerated

DukeCS,Fall2017 CompSci516:DatabaseSystems 28

• X,Y,Zaresetsofattributes

• Reflexivity:IfX⊇ Y,thenX→ Y• Augmentation:IfX→ Y,thenXZ→ YZforanyZ• Transitivity:IfX→ YandY→ Z,thenX→ Z

AdditionalRules

• FollowfromArmstrong’sAxioms

• Union:IfX→YandX→ Z,thenX→ YZ• Decomposition:IfX→ YZ,thenX→ YandX→ Z

DukeCS,Fall2017 CompSci516:DatabaseSystems 29

A B C Da1 b1 c1 d1

a1 b1 c1 d2

a2 b2 c2 d1

a2 b2 c2 d2

A→B,A→CA→BC

A→BCA→B,A→C

ClosureofasetofFDs

• GivensomeFDs,wecanusuallyinferadditionalFDs:– SSN→DEPT,andDEPT→ LOTimpliesSSN→LOT

• AnFDf isimpliedbyasetofFDsF iff holdswheneverallFDsinF hold.

• F+

=closureofFisthesetofallFDsthatareimpliedbyF

DukeCS,Fall2017 CompSci516:DatabaseSystems 30

9/10/17

6

TocheckifanFDbelongstoaclosure

• ComputingtheclosureofasetofFDscanbeexpensive– Sizeofclosurecanbeexponentialin#attributes

• Typically,wejustwanttocheckifagivenFDX→ YisintheclosureofasetofFDsF

• NoneedtocomputeF+

1. ComputeattributeclosureofX(denotedX+)wrt F:– SetofallattributesAsuchthatX→AisinF+

2. CheckifYisinX+

DukeCS,Fall2017 CompSci 516:DatabaseSystems 31

ComputingAttributeClosure

Algorithm:• closure=X• Repeatuntilnochange

– ifthereisanFDU→VinFsuchthatU⊆closure,thenclosure=closure∪ V

• DoesF={A→B,B→C,CD→ E}implyA→E?– i.e,isA→EintheclosureF+?Equivalently,isEinA+?

DukeCS,Fall2017 CompSci516:DatabaseSystems 32

NormalForms

• Question:givenaschema,howtodecidewhetheranyschemarefinementisneededatall?

• Ifarelationisinacertainnormalforms,itisknownthatcertainkindsofproblemsareavoided/minimized

• Helpsusdecidewhetherdecomposingtherelationissomethingwewanttodo

DukeCS,Fall2017 CompSci516:DatabaseSystems 33

FDsplayaroleindetectingredundancy

Example• ConsiderarelationRwith3attributes,ABC

– NoFDshold:Thereisnoredundancyhere– nodecompositionneeded

– GivenA→ B:SeveraltuplescouldhavethesameAvalue,andifso,they’llallhavethesameBvalue– redundancy–decompositionmaybeneededifAisnotakey

• Intuitiveidea:– ifthereisanynon-keydependency,e.g.A→B,decompose!

DukeCS,Fall2017 CompSci516:DatabaseSystems 34

NormalForms

RisinBCNF⇒ Risin3NF⇒ Risin2NF(ahistoricalone,notcovered)⇒ Risin1NF(everyfieldhasatomicvalues)

DukeCS,Fall2017 CompSci516:DatabaseSystems 35

BCNF

3NF

2NF

1NF

Definitionsnext

Boyce-CoddNormalForm(BCNF)

• RelationRwithFDsF isinBCNF if,forallX→AinF– AϵX(calledatrivial FD),or– XcontainsakeyforR

• i.e.Xisasuperkey

DukeCS,Fall2017 CompSci516:DatabaseSystems 36

9/10/17

7

ThirdNormalForm(3NF)

• RelationRwithFDsF isin3NF if,forallX→ AinF+– AϵX(calledatrivialFD),or– Xcontains akeyforR,or– Aispartof somekey forR.

• Minimality ofakeyiscrucialinthirdconditionin3NF– everyattributeispartofsomesuperkey (=setofallattributes)

• IfRisinBCNF,obviouslyin3NF• IfRisin3NF,someredundancyispossible

– whenX→ AandAispartofakey(notallowedinBCNF)

DukeCS,Fall2017 CompSci516:DatabaseSystems 37

twoconditionsforBCNF

DecompositionofaRelationSchema• ConsiderrelationRcontainsattributesA1...An

• Adecomposition ofRconsistsofreplacingRbytwoormorerelationssuchthat“noattributeislost”and“nonewattributeappears”,i.e.

– EachnewrelationschemacontainsasubsetoftheattributesofR– EveryattributeofRappearsasanattributeofoneofthenewrelations– E.g.,CandecomposeSNLRWH intoSNLRH andRW

• Whatarethepotentialproblemswithanarbitrarydecomposition?

DukeCS,Fall2017 CompSci516:DatabaseSystems 38

LosslessJoinDecompositions• DecompositionofRintoXandYislossless-join w.r.t.asetof

FDsFif,foreveryinstancer thatsatisfiesF:πX(r)⨝ πY(r)=r

DukeCS,Fall2017 CompSci516:DatabaseSystems 39

S P Ds1 p1 d1

s2 p2 d2

s3 p1 d3

• DecomposeintoSPandPD-- isthedecompositionlossless?

• HowaboutSPandSD?

LosslessdecompositionofRintoR1,R2happensif• eitherR1∩R2→R1• orR1∩ R2→R2

Algorithm:DecompositionintoBCNF

• Input:relationRwithFDsFIfX→ YviolatesBCNF,decomposeRintoR- Y andXY.RepeatuntilallnewrelationsareinBCNFw.r.t.thegivenF

• NOTE:NeedtoconsiderallpossibleFDsthatcanbeinferredfromthecurrentsetofFDs(closure),notonlythegivenones!

• Givesacollectionofrelationsthatare– inBCNF– losslessjoindecomposition– andguaranteedtoterminate

DukeCS,Fall2017 CompSci516:DatabaseSystems 40

DecompositionintoBCNF(example)

• CSJDPQV,keyC,F={JP→ C,SD→ P,J→ S}– TodealwithSD→P,decomposeintoSDP,CSJDQV.– TodealwithJ→ S,decomposeCSJDQVintoJSandCJDQV

• Note:– severaldependenciesmaycauseviolationofBCNF– Theorderinwhichwepickthemmayleadtoverydifferentsetsofrelations

– theremaybemultiplecorrectdecompositionsDukeCS,Fall2017 CompSci516:DatabaseSystems 41

BCNFdecompositionexample

42

UserJoinsGroup (uid,uname,twitterid,gid,fromDate)

uid→ uname,twitteridtwitterid→ uiduid,gid→ fromDate

Ack:SlidefromProf.JunYang

9/10/17

8

Anotherexample

43

UserJoinsGroup (uid,uname,twitterid,gid,fromDate)

uid→ uname,twitteridtwitterid→ uiduid,gid→ fromDate

Ack:SlidefromProf.JunYang

Recap

• Functionaldependencies:ageneralizationofthekeyconcept

• Non-keyfunctionaldependencies:asourceofredundancy

• BCNFdecomposition:amethodforremovingredundancies– BNCFdecompositionisalosslessjoindecomposition

• BCNF:schemainthisnormalformhasnoredundancyduetoFD’s

44

BCNF=noredundancy?

• User (uid,gid,place)– Ausercanbelongtomultiplegroups– Ausercanregisterplacesshe’svisited– Groupsandplaceshavenothingtodowithother– FD’s?– BCNF?– Redundancies?

45

uid gid place

142 dps Springfield

142 dps Australia

456 abc Springfield

456 abc Morocco

456 gov Springfield

456 gov Morocco

… … …

Multivalueddependencies

• Amultivalueddependency(MVD)hastheform𝑋 ↠ 𝑌,where𝑋 and𝑌 aresetsofattributesinarelation𝑅

• 𝑋 ↠ 𝑌 meansthatwhenevertworowsin𝑅 agreeonalltheattributesof𝑋,thenwecanswaptheir𝑌 componentsandgettworowsthatarealsoin𝑅

46

𝑿 𝒀 𝒁𝑎 𝑏+ 𝑐+𝑎 𝑏- 𝑐-… … …

𝑿 𝒀 𝒁𝑎 𝑏+ 𝑐+𝑎 𝑏- 𝑐-𝑎 𝑏- 𝑐+𝑎 𝑏+ 𝑐-… … …

MVDexamples

User(uid,gid,place)• uid↠ gid• uid↠ place

– Intuition:givenuid,attributesgid andplaceare“independent”

• uid,gid↠ place– Trivial:LHS∪ RHS=allattributesof𝑅

• uid,gid↠ uid– Trivial:LHS⊇ RHS

47

CompleteMVD+FDrules

• FDreflexivity,augmentation,andtransitivity• MVDcomplementation:

If𝑋 ↠ 𝑌,then𝑋 ↠ 𝑎𝑡𝑡𝑟𝑠 𝑅 − 𝑋 − 𝑌• MVDaugmentation:

If𝑋 ↠ 𝑌 and𝑉 ⊆ 𝑊,then𝑋𝑊 ↠ 𝑌𝑉• MVDtransitivity:

If𝑋 ↠ 𝑌 and𝑌 ↠ 𝑍,then𝑋 ↠ 𝑍 − 𝑌• Replication(FDisMVD):

If𝑋 → 𝑌,then𝑋 ↠ 𝑌• Coalescence:

If𝑋 ↠ 𝑌 and𝑍 ⊆ 𝑌 andthereissome𝑊 disjointfrom𝑌 suchthat𝑊 → 𝑍,then𝑋 → 𝑍

48

Tryprovingthingsusingthese!?

Verifytheseyourself!

9/10/17

9

Anelegantsolution:“chase”

• GivenasetofFD’sandMVD’s𝒟,doesanotherdependency𝑑 (FDorMVD)followfrom𝒟?

• Procedure– Startwiththepremiseof𝑑,andtreatthemas“seed”tuplesinarelation

– Applythegivendependenciesin𝒟 repeatedly• IfweapplyanFD,weinferequalityoftwosymbols• IfweapplyanMVD,weinfermoretuples

– Ifweinfertheconclusionof𝑑,wehaveaproof– Otherwise,ifnothingmorecanbeinferred,wehaveacounterexample

49

Proofbychase• In𝑅 𝐴, 𝐵, 𝐶, 𝐷 ,does𝐴 ↠ 𝐵 and𝐵 ↠ 𝐶implythat𝐴 ↠ 𝐶?

50

𝑨 𝑩 𝑪 𝑫𝑎 𝑏+ 𝑐+ 𝑑+𝑎 𝑏- 𝑐- 𝑑-

𝑨 𝑩 𝑪 𝑫𝑎 𝑏+ 𝑐- 𝑑+𝑎 𝑏- 𝑐+ 𝑑-

Have: Need:

𝑎 𝑏- 𝑐+ 𝑑+𝑎 𝑏+ 𝑐- 𝑑-

𝐴 ↠ 𝐵

𝑎 𝑏- 𝑐+ 𝑑-𝑎 𝑏- 𝑐- 𝑑+

𝐵 ↠ 𝐶

𝑎 𝑏+ 𝑐- 𝑑+𝑎 𝑏+ 𝑐+ 𝑑-

𝐵 ↠ 𝐶

AA

Anotherproofbychase• In𝑅 𝐴, 𝐵, 𝐶, 𝐷 ,does𝐴 → 𝐵 and𝐵 → 𝐶 implythat𝐴 → 𝐶?

51

𝑨 𝑩 𝑪 𝑫𝑎 𝑏+ 𝑐+ 𝑑+𝑎 𝑏- 𝑐- 𝑑-

Have: Need:𝑐+ = 𝑐-

𝐴 → 𝐵 𝑏+ = 𝑏-𝐵 → 𝐶 𝑐+ = 𝑐-

A

Ingeneral,withbothMVD’sandFD’s,chasecangeneratebothnewtuplesandnewequalities

Counterexamplebychase• In𝑅 𝐴, 𝐵, 𝐶, 𝐷 ,does𝐴 ↠ 𝐵𝐶 and𝐶𝐷 → 𝐵implythat𝐴 → 𝐵?

52

𝑨 𝑩 𝑪 𝑫𝑎 𝑏+ 𝑐+ 𝑑+𝑎 𝑏- 𝑐- 𝑑-

Have: Need:𝑏+ = 𝑏-

𝑎 𝑏- 𝑐- 𝑑+𝑎 𝑏+ 𝑐+ 𝑑-

𝐴 ↠ 𝐵𝐶

D

Counterexample!

4NF

• Arelation𝑅 isinFourthNormalForm(4NF)if– Foreverynon-trivialMVD𝑋 ↠ 𝑌 in𝑅,𝑋 isasuperkey

– Thatis,allFD’sandMVD’sfollowfrom“key→otherattributes”(i.e.,noMVD’sandnoFD’sbesideskeyfunctionaldependencies)

• 4NFisstrongerthanBCNF– BecauseeveryFDisalsoaMVD

53

4NFdecompositionalgorithm

• Finda4NFviolation– Anon-trivialMVD𝑋 ↠ 𝑌 in𝑅 where𝑋 isnot asuperkey

• Decompose𝑅 into𝑅+ and𝑅-,where– 𝑅+ hasattributes𝑋 ∪ 𝑌– 𝑅- hasattributes𝑋 ∪ 𝑍 (where𝑍 contains𝑅 attributesnotin𝑋 or𝑌)

• Repeatuntilallrelationsarein4NF

• AlmostidenticaltoBCNFdecompositionalgorithm• Anydecompositionona4NFviolationislossless

54

9/10/17

10

4NFdecompositionexample

55

uid gid place

142 dps Springfield

142 dps Australia

456 abc Springfield

456 abc Morocco

456 gov Springfield

456 gov Morocco

… … …

User (uid,gid,place)4NFviolation:uid↠gid

Member(uid,gid) Visited(uid,place)4NF 4NFuid gid

142 dps

456 abc

456 gov

… …

uid place

142 Springfield

142 Australia

456 Springfield

456 Morocco

… …

Otherkindsofdependenciesandnormalforms

• Dependencypreservingdecompositions• Joindependencies• Inclusiondependencies• 5NF• Seebookifinterested(notcoveredinclass)

DukeCS,Fall2017 CompSci516:DatabaseSystems 56

Summary

• PhilosophybehindBCNF,4NF:Datashoulddependonthekey, thewholekey,andnothingbutthekey!– Youcouldhavemultiplekeysthough

• Redundancyisnotdesiredtypically– notalways,mainlyduetoperformancereasons

• Functional/multivalueddependencies– captureredundancy• Decompositions– eliminatedependencies• Normalforms

– Guaranteescertainnon-redundancy– 3NF,BCNF,and4NF

• Losslessjoin• HowtodecomposeintoBCNF,4NF• Chase

57