CompSci 516 Data Intensive Computing Systems
Lecture 6a Design Theory and Normalization - 2/2
Instructor: Sudeepa Roy
Duke CS, Fall 2017 CompSci 516: Database Systems
Announcements
• HW1 deadline:
  – Due on 09/21 (Thurs), 11:55 pm, no late days
• Project proposal deadline:
  – Preliminary idea and team members due by 09/18 (Mon) by email to the instructor
  – Proposal due on Sakai by 09/25 (Mon), 11:55 pm
• Everyone should be in a group now
  – otherwise let the instructor know ASAP
Today
• Finish Normalization from Lecture 5
• Start Database Internals
• Recap
  – Why redundancy is bad
  – Functional dependencies
  – Closure of attributes and functional dependencies
Acknowledgement:
• The following slides have been created adapting the instructor material of the [RG] book provided by the authors Dr. Ramakrishnan and Dr. Gehrke.
• Some slides have been adapted from slides by Prof. Jun Yang
Normal Forms
R is in 4NF ⇒ R is in BCNF ⇒ R is in 3NF ⇒ R is in 2NF (a historical one) ⇒ R is in 1NF (every field has atomic values)
[Figure: nested normal forms, from innermost to outermost: 4NF, BCNF, 3NF, 2NF, 1NF]
Only BCNF and 4NF are covered in the class
Boyce-Codd Normal Form (BCNF)
• Relation R with FDs F is in BCNF if, for all X → A in F
  – A ∈ X (called a trivial FD), or
  – X contains a key for R
    • i.e. X is a superkey
Next lecture: BCNF decomposition algorithm
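The definition above can be checked mechanically. The following is a minimal sketch, assuming FDs are given as (lhs, rhs) pairs of Python sets; the helper names `closure` and `bcnf_violations` are hypothetical, and only the FDs listed in F are tested, as in the definition.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the listed FDs that are non-trivial and whose left side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        superkey = closure(lhs, fds) >= schema
        if not trivial and not superkey:
            violations.append((lhs, rhs))
    return violations


# Schema and FDs of the UserJoinsGroup relation used later in these slides.
schema = {"uid", "uname", "twitterid", "gid", "fromDate"}
fds = [
    ({"uid"}, {"uname", "twitterid"}),
    ({"twitterid"}, {"uid"}),
    ({"uid", "gid"}, {"fromDate"}),
]
for lhs, rhs in bcnf_violations(schema, fds):
    print(sorted(lhs), "->", sorted(rhs))
```

Here `uid → uname, twitterid` and `twitterid → uid` are reported, while `uid, gid → fromDate` is not, because {uid, gid} is a superkey.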
Decomposition
• Eliminates redundancy
• To get back to the original relation: ⋈

Attributes:
• User id
• User name
• Twitter id (on Twitter)
• Group id
• Joining date (to a group)

uid  uname     twitterid      gid  fromDate
142  Bart      @BartJSimpson  dps  1987-04-19
123  Milhouse  @MilhouseVan_  gov  1989-12-17
857  Lisa      @lisasimpson   abc  1987-04-19
857  Lisa      @lisasimpson   gov  1988-09-01
456  Ralph     @ralphwiggum   abc  1991-04-25
456  Ralph     @ralphwiggum   gov  1992-09-01
…    …         …              …    …

is decomposed into

uid  uname     twitterid
142  Bart      @BartJSimpson
123  Milhouse  @MilhouseVan_
857  Lisa      @lisasimpson
456  Ralph     @ralphwiggum
…    …         …

uid  gid  fromDate
142  dps  1987-04-19
123  gov  1989-12-17
857  abc  1987-04-19
857  gov  1988-09-01
456  abc  1991-04-25
456  gov  1992-09-01
…    …    …
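To see that the join really gets back the original relation, here is a small sketch using only the six rows visible on the slide (the elided rows are not invented). The row representation and the helpers `make_relation`, `project`, and `natural_join` are assumptions made for illustration.

```python
def make_relation(header, rows):
    """Build a relation as a set of rows; each row is a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(header, row)) for row in rows}


def project(relation, attrs):
    """Project onto `attrs`; the set removes duplicate rows."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in relation}


def natural_join(left, right):
    """Combine every pair of rows that agree on all attributes they share."""
    joined = set()
    for l_row in left:
        for r_row in right:
            merged = dict(l_row)
            if all(merged.get(a, v) == v for a, v in r_row):
                merged.update(r_row)
                joined.add(frozenset(merged.items()))
    return joined


original = make_relation(
    ("uid", "uname", "twitterid", "gid", "fromDate"),
    [
        (142, "Bart", "@BartJSimpson", "dps", "1987-04-19"),
        (123, "Milhouse", "@MilhouseVan_", "gov", "1989-12-17"),
        (857, "Lisa", "@lisasimpson", "abc", "1987-04-19"),
        (857, "Lisa", "@lisasimpson", "gov", "1988-09-01"),
        (456, "Ralph", "@ralphwiggum", "abc", "1991-04-25"),
        (456, "Ralph", "@ralphwiggum", "gov", "1992-09-01"),
    ],
)
user = project(original, {"uid", "uname", "twitterid"})
member = project(original, {"uid", "gid", "fromDate"})
print(len(user), len(member))                  # 4 6
print(natural_join(user, member) == original)  # True
```

The user table shrinks from six rows to four because each user's name and Twitter id are now stored once.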
Unnecessary decomposition

uid  uname     twitterid
142  Bart      @BartJSimpson
123  Milhouse  @MilhouseVan_
857  Lisa      @lisasimpson
456  Ralph     @ralphwiggum
…    …         …

is decomposed into

uid  uname
142  Bart
123  Milhouse
857  Lisa
456  Ralph
…    …

uid  twitterid
142  @BartJSimpson
123  @MilhouseVan_
857  @lisasimpson
456  @ralphwiggum
…    …

• Fine: join returns the original relation
• Unnecessary: no redundancy is removed; schema is more complicated (and uid is stored twice!)
Bad decomposition

uid  gid  fromDate
142  dps  1987-04-19
123  gov  1989-12-17
857  abc  1987-04-19
857  gov  1988-09-01
456  abc  1991-04-25
456  gov  1992-09-01
…    …    …

is decomposed into

uid  gid
142  dps
123  gov
857  abc
857  gov
456  abc
456  gov
…    …

uid  fromDate
142  1987-04-19
123  1989-12-17
857  1987-04-19
857  1988-09-01
456  1991-04-25
456  1992-09-01
…    …

• Association between gid and fromDate is lost
• Join returns more rows than the original relation
Lossless join decomposition
• Decompose relation R into relations S and T
  – attrs(R) = attrs(S) ∪ attrs(T)
  – S = π_{attrs(S)}(R)
  – T = π_{attrs(T)}(R)
• The decomposition is a lossless join decomposition if, given known constraints such as FD's, we can guarantee that R = S ⋈ T
• R ⊆ S ⋈ T or R ⊇ S ⋈ T?
• Any decomposition gives R ⊆ S ⋈ T (why?)
  – A lossy decomposition is one with R ⊂ S ⋈ T
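One way to answer the "why?" is the short argument below, written for an arbitrary tuple t of R. The notation t[·] for the restriction of t to a set of attributes is introduced here for illustration and does not appear on the slide.

```latex
\begin{align*}
t \in R
  &\implies t[\mathit{attrs}(S)] \in \pi_{\mathit{attrs}(S)}(R) = S
    \quad\text{and}\quad
    t[\mathit{attrs}(T)] \in \pi_{\mathit{attrs}(T)}(R) = T \\
  &\implies t \in S \bowtie T
\end{align*}
```

The second step holds because the two restrictions come from the same tuple, so they agree on every shared attribute, and together they cover attrs(R) = attrs(S) ∪ attrs(T). The converse can fail: the join may also pair restrictions that came from two different tuples of R.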
Loss? But I got more rows!
• “Loss” refers not to the loss of tuples, but to the loss of information
  – Or, the ability to distinguish different original relations

Two candidate original relations:

uid  gid  fromDate
142  dps  1987-04-19
123  gov  1989-12-17
857  abc  1987-04-19
857  gov  1988-09-01
456  abc  1991-04-25
456  gov  1992-09-01
…    …    …

uid  gid  fromDate
142  dps  1987-04-19
123  gov  1989-12-17
857  abc  1988-09-01
857  gov  1987-04-19
456  abc  1991-04-25
456  gov  1992-09-01
…    …    …

Both decompose into the same two relations:

uid  fromDate
142  1987-04-19
123  1989-12-17
857  1987-04-19
857  1988-09-01
456  1991-04-25
456  1992-09-01
…    …

uid  gid
142  dps
123  gov
857  abc
857  gov
456  abc
456  gov
…    …

No way to tell which is the original relation
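The point can be replayed on the six visible rows of each candidate relation. This is a sketch under the assumption that rows are plain tuples; `project` and `decompose_and_join` are hypothetical helper names.

```python
HEADER = ("uid", "gid", "fromDate")


def project(header, rows, attrs):
    """Project tuple rows onto `attrs` (set semantics removes duplicates)."""
    positions = [header.index(a) for a in attrs]
    return {tuple(row[i] for i in positions) for row in rows}


def decompose_and_join(rows):
    """Split into (uid, gid) and (uid, fromDate), then join the two back on uid."""
    groups = project(HEADER, rows, ("uid", "gid"))
    dates = project(HEADER, rows, ("uid", "fromDate"))
    return {(u, g, d) for (u, g) in groups for (u2, d) in dates if u == u2}


common = {
    (142, "dps", "1987-04-19"),
    (123, "gov", "1989-12-17"),
    (456, "abc", "1991-04-25"),
    (456, "gov", "1992-09-01"),
}
# The two candidates differ only in which date goes with which group for uid 857.
first = common | {(857, "abc", "1987-04-19"), (857, "gov", "1988-09-01")}
second = common | {(857, "abc", "1988-09-01"), (857, "gov", "1987-04-19")}

joined = decompose_and_join(first)
print(joined == decompose_and_join(second))  # True: same result from both originals
print(len(first), len(joined))               # 6 10
```

The join has more rows than either original, yet it contains both of them, so it cannot say which one was stored.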
BCNF decomposition algorithm
• Find a BCNF violation
  – That is, a non-trivial FD X → Y in R where X is not a superkey of R
• Decompose R into R1 and R2, where
  – R1 has attributes X ∪ Y
  – R2 has attributes X ∪ Z, where Z contains all attributes of R that are in neither X nor Y
• Repeat until all relations are in BCNF
• Also gives a lossless decomposition!
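The steps above can be sketched as follows. This is one possible reading, not the instructor's code: violations are found by brute force over attribute subsets (exponential, fine for slide-sized schemas), and the FDs that hold in a sub-relation are obtained by intersecting closures under the original FDs with that sub-relation's attributes. All function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(rel, fds):
    """Return (X, Y) for a non-trivial X -> Y inside `rel` with X not a superkey, else None."""
    attrs = sorted(rel)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = set(combo)
            determined = closure(x, fds) & rel  # attributes of rel fixed by X
            if determined != x and determined != rel:
                return x, determined - x
    return None


def bcnf_decompose(rel, fds):
    """Split on a violation X -> Y into X ∪ Y and X ∪ Z until no violation remains."""
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    x, y = violation
    return bcnf_decompose(x | y, fds) + bcnf_decompose(rel - y, fds)


schema = {"uid", "uname", "twitterid", "gid", "fromDate"}
fds = [
    ({"uid"}, {"uname", "twitterid"}),
    ({"twitterid"}, {"uid"}),
    ({"uid", "gid"}, {"fromDate"}),
]
result = bcnf_decompose(schema, fds)
for rel in result:
    print(sorted(rel))
```

With this search order the first violation found has X = {twitterid}, so the result is (twitterid, uid, uname) and (twitterid, gid, fromDate). Picking a different violation first can give a different, equally valid decomposition.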
BCNF decomposition example - 1
• CSJDPQV, key C, F = {JP → C, SD → P, J → S}
  – To deal with SD → P, decompose into SDP, CSJDQV.
  – To deal with J → S, decompose CSJDQV into JS and CJDQV
• Is JP → C a violation of BCNF?
• Note:
  – several dependencies may cause violation of BCNF
  – The order in which we pick them may lead to very different sets of relations
  – there may be multiple correct decompositions (can pick J → S first)
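The question about JP → C can be explored with attribute closures. In this sketch "key C" is encoded as the FD C → SJDPQV, which is an assumption made to express the key as an FD; `closure` is a hypothetical helper.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


schema = set("CSJDPQV")
fds = [
    (set("C"), set("SJDPQV")),  # "key C", written as an FD (assumption)
    (set("JP"), set("C")),
    (set("SD"), set("P")),
    (set("J"), set("S")),
]

for lhs in ("JP", "SD", "J"):
    reached = closure(set(lhs), fds)
    print(lhs, "is a superkey" if reached == schema else "is not a superkey", sorted(reached))

# After the decomposition, no single relation holds J, P and C together.
decomposition = [set("SDP"), set("JS"), set("CJDQV")]
print(any(set("JPC") <= rel for rel in decomposition))  # False
```

JP reaches every attribute, so JP → C is not a BCNF violation in CSJDPQV, whereas SD and J are not superkeys. The last line shows that JP → C cannot be checked inside any one of SDP, JS, CJDQV.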
BCNF decomposition example - 2
UserJoinsGroup (uid, uname, twitterid, gid, fromDate)
  uid → uname, twitterid
  twitterid → uid
  uid, gid → fromDate
BCNF violation: uid → uname, twitterid
Decompose into:
• User (uid, uname, twitterid): BCNF
    uid → uname, twitterid
    twitterid → uid
• Member (uid, gid, fromDate): BCNF
    uid, gid → fromDate
BCNF decomposition example - 3
UserJoinsGroup (uid, uname, twitterid, gid, fromDate)
  uid → uname, twitterid
  twitterid → uid
  uid, gid → fromDate
BCNF violation: twitterid → uid
Decompose into:
• UserId (twitterid, uid): BCNF
• UserJoinsGroup' (twitterid, uname, gid, fromDate)
    twitterid → uname
    twitterid, gid → fromDate
    (apply Armstrong's axioms and rules!)
  BCNF violation: twitterid → uname
  Decompose into:
  • UserName (twitterid, uname): BCNF
  • Member (twitterid, gid, fromDate): BCNF
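The FDs listed for UserJoinsGroup' are not in the original set; they have to be derived. One way to derive them, sketched here with hypothetical helpers `closure` and `project_fds`, is to compute the closure of every attribute subset of the sub-relation under the original FDs and keep what falls inside the sub-relation. The output is not minimized.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def project_fds(sub, fds):
    """Map each X ⊆ sub to the attributes of sub - X that X determines (empty ones dropped)."""
    projected = {}
    attrs = sorted(sub)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            x = frozenset(combo)
            rhs = (closure(x, fds) & sub) - x
            if rhs:
                projected[x] = rhs
    return projected


fds = [
    ({"uid"}, {"uname", "twitterid"}),
    ({"twitterid"}, {"uid"}),
    ({"uid", "gid"}, {"fromDate"}),
]
sub = {"twitterid", "uname", "gid", "fromDate"}  # UserJoinsGroup'
projected = project_fds(sub, fds)
print(sorted(projected[frozenset({"twitterid"})]))         # ['uname']
print(sorted(projected[frozenset({"twitterid", "gid"})]))  # ['fromDate', 'uname']
```

Both FDs shown on the slide for UserJoinsGroup' appear: twitterid → uname directly, and twitterid, gid → fromDate as part of the second entry.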
Recap
• Functional dependencies: a generalization of the key concept
• Non-key functional dependencies: a source of redundancy
• BCNF decomposition: a method for removing redundancies
  – BCNF decomposition is a lossless join decomposition
• BCNF: schema in this normal form has no redundancy due to FD's
BCNF = no redundancy?
• User (uid, gid, place)
  – A user can belong to multiple groups
  – A user can register places she's visited
  – Groups and places have nothing to do with each other
  – FD's?
    • None
  – BCNF?
    • Yes
  – Redundancies?
    • Tons!

uid  gid  place
142  dps  Springfield
142  dps  Australia
456  abc  Springfield
456  abc  Morocco
456  gov  Springfield
456  gov  Morocco
…    …    …
Multivalued dependencies
• A multivalued dependency (MVD) has the form X ↠ Y, where X and Y are sets of attributes in a relation R
• X ↠ Y means that whenever two rows in R agree on all the attributes of X, then we can swap their Y components and get two rows that are also in R

If R contains

X  Y   Z
a  b1  c1
a  b2  c2
…  …   …

then R must contain

X  Y   Z
a  b1  c1
a  b2  c2
a  b2  c1
a  b1  c2
…  …   …
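The swap condition can be tested directly on a relation instance. A minimal sketch, assuming rows are tuples in header order; `satisfies_mvd` is a hypothetical name.

```python
def satisfies_mvd(header, rows, x, y):
    """Check X ->> Y on an instance: swapping Y between rows that agree on X stays inside it."""
    relation = {tuple(r) for r in rows}
    idx = {a: i for i, a in enumerate(header)}
    for r1 in relation:
        for r2 in relation:
            if all(r1[idx[a]] == r2[idx[a]] for a in x):
                # r1 with its Y component replaced by that of r2
                swapped = tuple(r2[idx[a]] if a in y else r1[idx[a]] for a in header)
                if swapped not in relation:
                    return False
    return True


header = ("X", "Y", "Z")
two_rows = [("a", "b1", "c1"), ("a", "b2", "c2")]
four_rows = two_rows + [("a", "b2", "c1"), ("a", "b1", "c2")]

print(satisfies_mvd(header, two_rows, {"X"}, {"Y"}))   # False
print(satisfies_mvd(header, four_rows, {"X"}, {"Y"}))  # True
```

Because the loop visits every ordered pair of rows, both swapped rows are checked for each pair.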
MVD examples
User (uid, gid, place)
• uid ↠ gid
• uid ↠ place
  – Intuition: given uid, attributes gid and place are “independent”
• uid, gid ↠ place
  – Trivial: LHS ∪ RHS = all attributes of R
• uid, gid ↠ uid
  – Trivial: LHS ⊇ RHS
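The two triviality conditions fit in one line of code. A small sketch with a hypothetical helper `is_trivial_mvd`, taking attribute sets as Python sets:

```python
def is_trivial_mvd(schema, lhs, rhs):
    """An MVD lhs ->> rhs is trivial if rhs ⊆ lhs or lhs ∪ rhs covers the whole schema."""
    return rhs <= lhs or (lhs | rhs) == schema


user = {"uid", "gid", "place"}
print(is_trivial_mvd(user, {"uid", "gid"}, {"place"}))  # True: LHS ∪ RHS = all attributes
print(is_trivial_mvd(user, {"uid", "gid"}, {"uid"}))    # True: LHS ⊇ RHS
print(is_trivial_mvd(user, {"uid"}, {"gid"}))           # False
```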
Complete MVD + FD rules
• FD reflexivity, augmentation, and transitivity
• MVD complementation: If X ↠ Y, then X ↠ attrs(R) − X − Y
• MVD augmentation: If X ↠ Y and V ⊆ W, then XW ↠ YV
• MVD transitivity: If X ↠ Y and Y ↠ Z, then X ↠ Z − Y
• Replication (FD is MVD): If X → Y, then X ↠ Y
• Coalescence: If X ↠ Y and Z ⊆ Y and there is some W disjoint from Y such that W → Z, then X → Z

Verify these yourself!
Try proving things using these!?
An elegant solution: “chase”
(Read this slide after looking at the examples)
• Given a set of FD's and MVD's 𝒟, does another dependency d (FD or MVD) follow from 𝒟?
• Procedure
  – Start with the premise of d, and treat them as “seed” tuples in a relation
  – Apply the given dependencies in 𝒟 repeatedly
    • If we apply an FD, we infer equality of two symbols
    • If we apply an MVD, we infer more tuples
  – If we infer the conclusion of d, we have a proof
  – Otherwise, if nothing more can be inferred, we have a counterexample
Proof by chase
• In R(A, B, C, D), does A ↠ B and B ↠ C imply that A ↠ C?

Have:
A  B   C   D
a  b1  c1  d1
a  b2  c2  d2

Need:
A  B   C   D
a  b1  c2  d1
a  b2  c1  d2

A ↠ B gives:
a  b2  c1  d1
a  b1  c2  d2

B ↠ C gives:
a  b2  c1  d2   (needed)
a  b2  c2  d1

B ↠ C gives:
a  b1  c2  d1   (needed)
a  b1  c1  d2
Another proof by chase
• In R(A, B, C, D), does A → B and B → C imply that A → C?

Have:
A  B   C   D
a  b1  c1  d1
a  b2  c2  d2

Need: c1 = c2

A → B gives: b1 = b2
B → C gives: c1 = c2   (needed)

In general, with both MVD's and FD's, chase can generate both new tuples and new equalities
Counterexample by chase
• In R(A, B, C, D), does A ↠ BC and CD → B imply that A → B?

Have:
A  B   C   D
a  b1  c1  d1
a  b2  c2  d2

Need: b1 = b2

A ↠ BC gives:
a  b2  c2  d1
a  b1  c1  d2

No two rows agree on both C and D, so CD → B never applies and b1 = b2 is never inferred.
Counterexample!
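The three chase examples can be replayed with a short program. This is a sketch under stated assumptions, not a reference implementation: attributes are single letters, dependencies are (lhs, rhs) pairs of attribute strings, seed symbols are named like `b1` and `b2`, and all function names are hypothetical.

```python
def chase(schema, fds, mvds, seed):
    """Chase the seed rows; return (rows, rename), where rename maps merged symbols."""
    idx = {a: i for i, a in enumerate(schema)}

    def agree(r1, r2, attrs):
        return all(r1[idx[a]] == r2[idx[a]] for a in attrs)

    def find_merge(rows):
        """FD step: return (keep, drop) for two symbols some FD forces to be equal."""
        for x, y in fds:
            for r1 in rows:
                for r2 in rows:
                    if agree(r1, r2, x):
                        for a in y:
                            if r1[idx[a]] != r2[idx[a]]:
                                return tuple(sorted((r1[idx[a]], r2[idx[a]])))
        return None

    def find_new_row(rows):
        """MVD step: return a row some MVD forces to exist but that is missing."""
        for x, y in mvds:
            for r1 in rows:
                for r2 in rows:
                    if agree(r1, r2, x):
                        row = tuple(r2[idx[a]] if a in y else r1[idx[a]] for a in schema)
                        if row not in rows:
                            return row
        return None

    rows, rename = {tuple(r) for r in seed}, {}
    while True:
        merge = find_merge(rows)
        if merge is not None:
            keep, drop = merge
            rename[drop] = keep
            rows = {tuple(keep if v == drop else v for v in r) for r in rows}
            continue
        row = find_new_row(rows)
        if row is None:
            return rows, rename  # nothing more can be inferred
        rows.add(row)


def resolve(symbol, rename):
    """Follow merges until reaching the symbol that survived."""
    while symbol in rename:
        symbol = rename[symbol]
    return symbol


def implies(schema, fds, mvds, kind, x, y):
    """Does X -> Y (kind "fd") or X ->> Y (kind "mvd") follow from the given dependencies?"""
    idx = {a: i for i, a in enumerate(schema)}
    # Seed: two rows that agree exactly on X.
    r1 = tuple(a.lower() if a in x else a.lower() + "1" for a in schema)
    r2 = tuple(a.lower() if a in x else a.lower() + "2" for a in schema)
    rows, rename = chase(schema, fds, mvds, [r1, r2])
    if kind == "fd":
        return all(resolve(r1[idx[a]], rename) == resolve(r2[idx[a]], rename) for a in y)

    def mix(base, other):
        return tuple(resolve(other[idx[a]] if a in y else base[idx[a]], rename) for a in schema)

    return mix(r1, r2) in rows and mix(r2, r1) in rows


print(implies("ABCD", [], [("A", "B"), ("B", "C")], "mvd", "A", "C"))  # True
print(implies("ABCD", [("A", "B"), ("B", "C")], [], "fd", "A", "C"))   # True
print(implies("ABCD", [("CD", "B")], [("A", "BC")], "fd", "A", "B"))   # False
```

The loop always tries FD steps before MVD steps. It terminates because each FD step removes a symbol and each MVD step adds a row over a fixed, finite set of symbols.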
4NF
• A relation R is in Fourth Normal Form (4NF) if
  – For every non-trivial MVD X ↠ Y in R, X is a superkey
  – That is, all FD's and MVD's follow from “key → other attributes” (i.e., no MVD's and no FD's besides key functional dependencies)
• 4NF is stronger than BCNF
  – Because every FD is also an MVD
4NF decomposition algorithm
• Find a 4NF violation
  – A non-trivial MVD X ↠ Y in R where X is not a superkey
• Decompose R into R1 and R2, where
  – R1 has attributes X ∪ Y
  – R2 has attributes X ∪ Z (where Z contains R attributes not in X or Y)
• Repeat until all relations are in 4NF
• Almost identical to BCNF decomposition algorithm
• Any decomposition on a 4NF violation is lossless
4NF decomposition example

User (uid, gid, place)
4NF violation: uid ↠ gid

uid  gid  place
142  dps  Springfield
142  dps  Australia
456  abc  Springfield
456  abc  Morocco
456  gov  Springfield
456  gov  Morocco
…    …    …

Decompose into:

Member (uid, gid): 4NF

uid  gid
142  dps
456  abc
456  gov
…    …

Visited (uid, place): 4NF

uid  place
142  Springfield
142  Australia
456  Springfield
456  Morocco
…    …
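One decomposition step on uid ↠ gid can be replayed on the six visible rows. This is a sketch with hypothetical helpers `split_on_mvd` and `project`; it performs a single split and does not search for violations.

```python
def split_on_mvd(schema, x, y):
    """Split `schema` on X ->> Y into (X ∪ Y, X ∪ Z), keeping the original attribute order."""
    r1 = [a for a in schema if a in x or a in y]
    r2 = [a for a in schema if a in x or a not in y]
    return r1, r2


def project(schema, rows, attrs):
    """Project tuple rows onto `attrs` (set semantics removes duplicates)."""
    positions = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in positions) for row in rows}


SCHEMA = ["uid", "gid", "place"]
user = {
    (142, "dps", "Springfield"),
    (142, "dps", "Australia"),
    (456, "abc", "Springfield"),
    (456, "abc", "Morocco"),
    (456, "gov", "Springfield"),
    (456, "gov", "Morocco"),
}

member_attrs, visited_attrs = split_on_mvd(SCHEMA, {"uid"}, {"gid"})
member = project(SCHEMA, user, member_attrs)
visited = project(SCHEMA, user, visited_attrs)
rejoined = {(u, g, p) for (u, g) in member for (u2, p) in visited if u == u2}

print(member_attrs, visited_attrs)  # ['uid', 'gid'] ['uid', 'place']
print(len(member), len(visited))    # 3 4
print(rejoined == user)             # True
```

The join gives back exactly the original six rows, which is what the lossless property predicts for a split on an MVD that holds.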
Other kinds of dependencies and normal forms
• Dependency preserving decompositions
• Join dependencies
• Inclusion dependencies
• 5NF, 3NF, 2NF
• See book if interested (not covered in class)
Summary
• Philosophy behind BCNF, 4NF: Data should depend on the key, the whole key, and nothing but the key!
  – You could have multiple keys though
• Redundancy is not desired typically
  – not always, mainly due to performance reasons
• Functional/multivalued dependencies: capture redundancy
• Decompositions: eliminate dependencies
• Normal forms
  – Guarantee certain non-redundancy
  – BCNF and 4NF
• Lossless join
• How to decompose into BCNF, 4NF
• Chase