9/12/17
CompSci 516: Data Intensive Computing Systems
Lecture 6a: Design Theory and Normalization (2/2)
Instructor: Sudeepa Roy
Duke CS, Fall 2017, CompSci 516: Database Systems
Announcements
• HW1 deadline:
  – Due on 09/21 (Thurs), 11:55pm, no late days
• Project proposal deadline:
  – Preliminary idea and team members due by 09/18 (Mon) by email to the instructor
  – Proposal due on Sakai by 09/25 (Mon), 11:55pm
Today
• Finish Normalization from Lecture 5
• Start Database Internals
• Recap
  – Why redundancy is bad
  – Functional dependencies
  – Closure of attributes and functional dependencies
Acknowledgement:
• The following slides have been created adapting the instructor material of the [RG] book provided by the authors Dr. Ramakrishnan and Dr. Gehrke.
• Some slides have been adapted from slides by Prof. Jun Yang
Normal Forms
R is in 4NF ⇒ R is in BCNF ⇒ R is in 3NF ⇒ R is in 2NF (a historical one) ⇒ R is in 1NF (every field has atomic values)
[Figure: nested normal forms, 4NF inside BCNF inside 3NF inside 2NF inside 1NF]
Only BCNF and 4NF are covered in the class
Boyce-Codd Normal Form (BCNF)
• Relation R with FDs F is in BCNF if, for all X → A in F
  – A ∈ X (called a trivial FD), or
  – X contains a key for R
    • i.e. X is a superkey
Next lecture: BCNF decomposition algorithm
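The BCNF condition can be checked mechanically with the attribute closure from the recap. The Python sketch below is illustrative only: the helper names `closure` and `bcnf_violations` are hypothetical, FDs are assumed to be pairs of attribute-name tuples, and, as in the definition, only the FDs listed in F are tested, which is enough for the relation F is defined on (decomposed relations also need the FDs implied on them). Run on the UserJoinsGroup FDs used in the examples later in this lecture, it reports the two violations those examples decompose on.

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under `fds`.

    Each FD is a pair (lhs, rhs) of tuples of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once every LHS attribute is in the closure, add the RHS.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Listed FDs that are non-trivial and whose LHS is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)                # non-trivial
        and not closure(lhs, fds) >= set(schema)   # LHS is not a superkey
    ]


schema = ("uid", "uname", "twitterid", "gid", "fromDate")
fds = [
    (("uid",), ("uname", "twitterid")),
    (("twitterid",), ("uid",)),
    (("uid", "gid"), ("fromDate",)),
]
for lhs, rhs in bcnf_violations(schema, fds):
    print(", ".join(lhs), "->", ", ".join(rhs))
# uid -> uname, twitterid
# twitterid -> uid
```

The third FD is not reported because the closure of {uid, gid} is the whole schema, so uid, gid is a superkey.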
Decomposition
• Eliminates redundancy
• To get back to the original relation: ⋈ (natural join of the decomposed relations)

Original relation:
uid uname twitterid gid fromDate
142 Bart @BartJSimpson dps 1987-04-19
123 Milhouse @MilhouseVan_ gov 1989-12-17
857 Lisa @lisasimpson abc 1987-04-19
857 Lisa @lisasimpson gov 1988-09-01
456 Ralph @ralphwiggum abc 1991-04-25
456 Ralph @ralphwiggum gov 1992-09-01
… … … … …

Decomposed into two relations (their ⋈ gives back the original):
uid uname twitterid
142 Bart @BartJSimpson
123 Milhouse @MilhouseVan_
857 Lisa @lisasimpson
456 Ralph @ralphwiggum
… … …
uid gid fromDate
142 dps 1987-04-19
123 gov 1989-12-17
857 abc 1987-04-19
857 gov 1988-09-01
456 abc 1991-04-25
456 gov 1992-09-01
… … …
Attributes: user id, user name, Twitter id (on Twitter), group id, joining date (to a group)
Unnecessary decomposition
• Fine: the join returns the original relation
• Unnecessary: no redundancy is removed; the schema is more complicated (and uid is stored twice!)

uid uname twitterid
142 Bart @BartJSimpson
123 Milhouse @MilhouseVan_
857 Lisa @lisasimpson
456 Ralph @ralphwiggum
… … …

Decomposed into:
uid uname
142 Bart
123 Milhouse
857 Lisa
456 Ralph
… …

uid twitterid
142 @BartJSimpson
123 @MilhouseVan_
857 @lisasimpson
456 @ralphwiggum
… …
Bad decomposition
• The association between gid and fromDate is lost
• The join returns more rows than the original relation

uid gid fromDate
142 dps 1987-04-19
123 gov 1989-12-17
857 abc 1987-04-19
857 gov 1988-09-01
456 abc 1991-04-25
456 gov 1992-09-01
… … …

Decomposed into:
uid gid
142 dps
123 gov
857 abc
857 gov
456 abc
456 gov
… …

uid fromDate
142 1987-04-19
123 1989-12-17
857 1987-04-19
857 1988-09-01
456 1991-04-25
456 1992-09-01
… …
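A small Python sketch can replay this bad decomposition. It assumes relations are Python sets of tuples (so projection removes duplicates) and uses only the six rows shown, not the elided "…" rows; `project` and `natural_join` are hypothetical helpers, not part of any library.

```python
def project(rows, cols, attrs):
    """π_cols(rows): keep only the named columns; the set drops duplicates."""
    idx = [attrs.index(c) for c in cols]
    return {tuple(r[i] for i in idx) for r in rows}


def natural_join(left, lcols, right, rcols):
    """Natural join of two sets of tuples on their shared column names."""
    shared = [c for c in lcols if c in rcols]
    extra = tuple(c for c in rcols if c not in lcols)
    out = set()
    for lt in left:
        for rt in right:
            if all(lt[lcols.index(c)] == rt[rcols.index(c)] for c in shared):
                out.add(lt + tuple(rt[rcols.index(c)] for c in extra))
    return out, tuple(lcols) + extra


attrs = ("uid", "gid", "fromDate")
R = {
    (142, "dps", "1987-04-19"), (123, "gov", "1989-12-17"),
    (857, "abc", "1987-04-19"), (857, "gov", "1988-09-01"),
    (456, "abc", "1991-04-25"), (456, "gov", "1992-09-01"),
}
S = project(R, ("uid", "gid"), attrs)
T = project(R, ("uid", "fromDate"), attrs)
joined, cols = natural_join(S, ("uid", "gid"), T, ("uid", "fromDate"))
print(cols)                  # ('uid', 'gid', 'fromDate')
print(len(R), len(joined))   # 6 10
print(R <= joined)           # True
```

Every original row survives the join, but uids 857 and 456 each pair two groups with two dates, producing 4 spurious rows.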
Lossless join decomposition
• Decompose relation $R$ into relations $S$ and $T$
  – $attrs(R) = attrs(S) \cup attrs(T)$
  – $S = \pi_{attrs(S)}(R)$
  – $T = \pi_{attrs(T)}(R)$
• The decomposition is a lossless join decomposition if, given known constraints such as FDs, we can guarantee that $R = S \bowtie T$
• $R \subseteq S \bowtie T$ or $R \supseteq S \bowtie T$?
• Any decomposition gives $R \subseteq S \bowtie T$ (why?)
  – A lossy decomposition is one with $R \subset S \bowtie T$
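When the known constraints are FDs, the standard textbook test for a binary decomposition is: it is lossless exactly when the shared attributes $attrs(S) \cap attrs(T)$ functionally determine all of $attrs(S)$ or all of $attrs(T)$. A minimal sketch of that test follows, using the UserJoinsGroup FDs; `closure` and `is_lossless` are hypothetical helper names, and FDs are assumed to be pairs of attribute-name tuples.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds` (pairs of attribute-name tuples)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(s_attrs, t_attrs, fds):
    """Binary decomposition test: the shared attributes must determine S or T."""
    shared = set(s_attrs) & set(t_attrs)
    determined = closure(shared, fds)
    return set(s_attrs) <= determined or set(t_attrs) <= determined


fds = [
    (("uid",), ("uname", "twitterid")),
    (("twitterid",), ("uid",)),
    (("uid", "gid"), ("fromDate",)),
]
# User(uid, uname, twitterid) + Member(uid, gid, fromDate): uid determines User.
print(is_lossless(("uid", "uname", "twitterid"), ("uid", "gid", "fromDate"), fds))  # True
# The bad decomposition: uid determines neither (uid, gid) nor (uid, fromDate).
print(is_lossless(("uid", "gid"), ("uid", "fromDate"), fds))                        # False
```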
Loss? But I got more rows!
• “Loss” refers not to the loss of tuples, but to the loss of information
  – Or, the ability to distinguish different original relations

Two different relations:
uid gid fromDate
142 dps 1987-04-19
123 gov 1989-12-17
857 abc 1987-04-19
857 gov 1988-09-01
456 abc 1991-04-25
456 gov 1992-09-01
… … …

uid gid fromDate
142 dps 1987-04-19
123 gov 1989-12-17
857 abc 1988-09-01
857 gov 1987-04-19
456 abc 1991-04-25
456 gov 1992-09-01
… … …

have the same projections:
uid gid
142 dps
123 gov
857 abc
857 gov
456 abc
456 gov
… …

uid fromDate
142 1987-04-19
123 1989-12-17
857 1987-04-19
857 1988-09-01
456 1991-04-25
456 1992-09-01
… …

No way to tell which is the original relation
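The point can be checked directly. The sketch below (plain Python sets, only the rows shown on the slide; `project` is a hypothetical helper) confirms that the two relations differ, yet their projections onto (uid, gid) and onto (uid, fromDate) are identical, so nothing stored in the decomposed tables can tell them apart.

```python
def project(rows, idx):
    """Project tuples onto the column positions in `idx` (duplicates removed)."""
    return {tuple(r[i] for i in idx) for r in rows}


# (uid, gid, fromDate) rows; R2 swaps the two dates of uid 857.
R1 = {(142, "dps", "1987-04-19"), (123, "gov", "1989-12-17"),
      (857, "abc", "1987-04-19"), (857, "gov", "1988-09-01"),
      (456, "abc", "1991-04-25"), (456, "gov", "1992-09-01")}
R2 = {(142, "dps", "1987-04-19"), (123, "gov", "1989-12-17"),
      (857, "abc", "1988-09-01"), (857, "gov", "1987-04-19"),
      (456, "abc", "1991-04-25"), (456, "gov", "1992-09-01")}

same_S = project(R1, (0, 1)) == project(R2, (0, 1))   # π_{uid,gid}
same_T = project(R1, (0, 2)) == project(R2, (0, 2))   # π_{uid,fromDate}
print(R1 != R2, same_S, same_T)   # True True True
```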
BCNF decomposition algorithm
• Find a BCNF violation
  – That is, a non-trivial FD $X \rightarrow Y$ in $R$ where $X$ is not a superkey of $R$
• Decompose $R$ into $R_1$ and $R_2$, where
  – $R_1$ has attributes $X \cup Y$
  – $R_2$ has attributes $X \cup Z$, where $Z$ contains all attributes of $R$ that are in neither $X$ nor $Y$
• Repeat until all relations are in BCNF
• Also gives a lossless decomposition!
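The algorithm translates almost line for line into Python. This sketch makes assumptions beyond the slide: attributes are single letters, and a violation is searched for by trying every proper subset X of the current relation (smallest first, alphabetically) with Y = (X⁺ ∩ R) − X, so that FDs implied on a decomposed relation are caught, not only the listed ones. That search is exponential in the number of attributes and only meant for small examples; all function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(rel, fds):
    """Return (X, Y) such that X -> Y violates BCNF in `rel`, or None."""
    for k in range(1, len(rel)):
        for xs in combinations(sorted(rel), k):
            x = set(xs)
            y = (closure(x, fds) & rel) - x
            if y and (x | y) != rel:   # non-trivial, and X is not a superkey
                return x, y
    return None


def bcnf_decompose(rel, fds):
    """Split `rel` (a set of attributes) until every piece is in BCNF."""
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    x, y = violation
    r1 = x | y        # X ∪ Y
    r2 = rel - y      # X ∪ Z, where Z = attributes in neither X nor Y
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)


# CSJDPQV with key C and F = {JP -> C, SD -> P, J -> S}
F = [("C", "SJDPQV"), ("JP", "C"), ("SD", "P"), ("J", "S")]
parts = bcnf_decompose(set("CSJDPQV"), F)
print(sorted("".join(sorted(p)) for p in parts))   # ['CDJPQV', 'JS']
```

Because it tries smaller left-hand sides first, on the CSJDPQV example it happens to pick J → S first, giving JS and CJDPQV; no split on SD → P follows because S and D no longer share a relation. This is a different but also correct result, in line with the note that J → S can be picked first.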
BCNF decomposition example - 1
• CSJDPQV, key C, F = {JP → C, SD → P, J → S}
  – To deal with SD → P, decompose into SDP, CSJDQV.
  – To deal with J → S, decompose CSJDQV into JS and CJDQV
• Is JP → C a violation of BCNF?
• Note:
  – several dependencies may cause violation of BCNF
  – The order in which we pick them may lead to very different sets of relations
  – there may be multiple correct decompositions (can pick J → S first)
BCNF decomposition example - 2
UserJoinsGroup (uid, uname, twitterid, gid, fromDate)
  uid → uname, twitterid
  twitterid → uid
  uid, gid → fromDate
BCNF violation: uid → uname, twitterid
Decompose into:
User (uid, uname, twitterid): BCNF
  uid → uname, twitterid
  twitterid → uid
Member (uid, gid, fromDate): BCNF
  uid, gid → fromDate
BCNF decomposition example - 3
UserJoinsGroup (uid, uname, twitterid, gid, fromDate)
  uid → uname, twitterid
  twitterid → uid
  uid, gid → fromDate
BCNF violation: twitterid → uid
Decompose into:
UserId (twitterid, uid): BCNF
UserJoinsGroup' (twitterid, uname, gid, fromDate)
  twitterid → uname
  twitterid, gid → fromDate
  (apply Armstrong's axioms and rules!)
BCNF violation: twitterid → uname
Decompose UserJoinsGroup' into:
UserName (twitterid, uname): BCNF
Member (twitterid, gid, fromDate): BCNF
Recap
• Functional dependencies: a generalization of the key concept
• Non-key functional dependencies: a source of redundancy
• BCNF decomposition: a method for removing redundancies
  – BCNF decomposition is a lossless join decomposition
• BCNF: a schema in this normal form has no redundancy due to FDs
BCNF = no redundancy?
• User (uid, gid, place)
  – A user can belong to multiple groups
  – A user can register places she's visited
  – Groups and places have nothing to do with each other
  – FDs?
    • None
  – BCNF?
    • Yes
  – Redundancies?
    • Tons!
uid gid place
142 dps Springfield
142 dps Australia
456 abc Springfield
456 abc Morocco
456 gov Springfield
456 gov Morocco
… … …
Multivalued dependencies
• A multivalued dependency (MVD) has the form $X \twoheadrightarrow Y$, where $X$ and $Y$ are sets of attributes in a relation $R$
• $X \twoheadrightarrow Y$ means that whenever two rows in $R$ agree on all the attributes of $X$, then we can swap their $Y$ components and get two rows that are also in $R$

If R contains:
X Y Z
a b1 c1
a b2 c2
… … …

then R must also contain:
X Y Z
a b1 c1
a b2 c2
a b2 c1
a b1 c2
… … …
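The swap test can be run on a concrete instance. In the sketch below (hypothetical helper `satisfies_mvd`; rows are tuples ordered like a header tuple), the User rows shown earlier satisfy uid ↠ gid, and deleting one row breaks it. Note that an instance can only refute an MVD, never prove it, since an MVD constrains every legal instance.

```python
from itertools import product


def satisfies_mvd(rows, attrs, X, Y):
    """Swap test for X ->> Y on one instance (rows are tuples ordered like attrs)."""
    xi = [attrs.index(a) for a in X]
    yi = {attrs.index(a) for a in Y}
    for t, u in product(rows, repeat=2):
        if all(t[i] == u[i] for i in xi):
            # Take the Y-values from t and every other value from u.
            swapped = tuple(t[i] if i in yi else u[i] for i in range(len(attrs)))
            if swapped not in rows:
                return False
    return True


attrs = ("uid", "gid", "place")
user = {
    (142, "dps", "Springfield"), (142, "dps", "Australia"),
    (456, "abc", "Springfield"), (456, "abc", "Morocco"),
    (456, "gov", "Springfield"), (456, "gov", "Morocco"),
}
print(satisfies_mvd(user, attrs, ["uid"], ["gid"]))                             # True
print(satisfies_mvd(user - {(456, "gov", "Morocco")}, attrs, ["uid"], ["gid"]))  # False
```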
MVD examples
User (uid, gid, place)
• uid ↠ gid
• uid ↠ place
  – Intuition: given uid, attributes gid and place are “independent”
• uid, gid ↠ place
  – Trivial: LHS ∪ RHS = all attributes of R
• uid, gid ↠ uid
  – Trivial: LHS ⊇ RHS
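Complete MVD + FD rules
• FD reflexivity, augmentation, and transitivity
• MVD complementation:
  If $X \twoheadrightarrow Y$, then $X \twoheadrightarrow attrs(R) - X - Y$
• MVD augmentation:
  If $X \twoheadrightarrow Y$ and $V \subseteq W$, then $XW \twoheadrightarrow YV$
• MVD transitivity:
  If $X \twoheadrightarrow Y$ and $Y \twoheadrightarrow Z$, then $X \twoheadrightarrow Z - Y$
• Replication (FD is MVD):
  If $X \rightarrow Y$, then $X \twoheadrightarrow Y$
• Coalescence:
  If $X \twoheadrightarrow Y$ and $Z \subseteq Y$ and there is some $W$ disjoint from $Y$ such that $W \rightarrow Z$, then $X \rightarrow Z$
Try proving things using these!?
Verify these yourself!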
An elegant solution: “chase”
• Given a set of FDs and MVDs $\mathcal{D}$, does another dependency $d$ (FD or MVD) follow from $\mathcal{D}$?
• Procedure
  – Start with the premise of $d$, and treat them as “seed” tuples in a relation
  – Apply the given dependencies in $\mathcal{D}$ repeatedly
    • If we apply an FD, we infer equality of two symbols
    • If we apply an MVD, we infer more tuples
  – If we infer the conclusion of $d$, we have a proof
  – Otherwise, if nothing more can be inferred, we have a counterexample
Read this slide after looking at the examples
Proof by chase
• In $R(A, B, C, D)$, does $A \twoheadrightarrow B$ and $B \twoheadrightarrow C$ imply that $A \twoheadrightarrow C$?

Have:
A B C D
a b1 c1 d1
a b2 c2 d2

Need:
A B C D
a b1 c2 d1
a b2 c1 d2
Another proof by chase
• In $R(A, B, C, D)$, does $A \rightarrow B$ and $B \rightarrow C$ imply that $A \rightarrow C$?

Have:
A B C D
a b1 c1 d1
a b2 c2 d2

Need: c1 = c2
Counterexample by chase
• In $R(A, B, C, D)$, does $A \twoheadrightarrow BC$ and $CD \rightarrow B$ imply that $A \rightarrow B$?

Have:
A B C D
a b1 c1 d1
a b2 c2 d2

Need: b1 = b2
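The chase procedure is short enough to sketch in Python. This is a minimal, unoptimized version under stated assumptions: attributes are single uppercase letters, dependencies are (lhs, rhs) strings, symbols are strings such as `b1`, and an FD step renames the larger symbol to the smaller one. `implies`, `fd_step` and `mvd_step` are hypothetical names. It reproduces the three examples above: two proofs and one counterexample.

```python
def agree(r, s, cols):
    """True if tuples r and s hold the same symbols in every column of `cols`."""
    return all(r[i] == s[i] for i in cols)


def fd_step(rows, X, Y):
    """Apply an FD X -> Y once by equating two symbols; None if nothing changes."""
    for r in rows:
        for s in rows:
            if agree(r, s, X):
                for i in Y:
                    if r[i] != s[i]:
                        keep, drop = sorted((r[i], s[i]))
                        # Replace `drop` by `keep` in column i of every tuple.
                        new_rows = {t[:i] + (keep,) + t[i + 1:] if t[i] == drop else t
                                    for t in rows}
                        return new_rows, drop, keep
    return None


def mvd_step(rows, X, Y, n):
    """Apply an MVD X ->> Y once by adding a swapped tuple; None if nothing is new."""
    for r in rows:
        for s in rows:
            if agree(r, s, X):
                new = tuple(r[i] if i in Y else s[i] for i in range(n))
                if new not in rows:
                    return rows | {new}
    return None


def implies(attrs, fds, mvds, goal):
    """Chase: do `fds` and `mvds` imply `goal` = (kind, lhs, rhs), kind 'fd' or 'mvd'?"""
    kind, lhs, rhs = goal
    n = len(attrs)

    def cols(names):
        return {attrs.index(a) for a in names}

    # Seed tuples: agree on the goal's LHS, distinct symbols everywhere else.
    t1 = tuple(a.lower() if a in lhs else a.lower() + "1" for a in attrs)
    t2 = tuple(a.lower() if a in lhs else a.lower() + "2" for a in attrs)
    rows, renamed = {t1, t2}, {}
    deps = [("fd", cols(x), cols(y)) for x, y in fds]
    deps += [("mvd", cols(x), cols(y)) for x, y in mvds]

    while True:  # apply dependencies until nothing more can be inferred
        for k, X, Y in deps:
            if k == "fd":
                step = fd_step(rows, X, Y)
                if step is not None:
                    rows, drop, keep = step
                    renamed[drop] = keep
                    break
            else:
                step = mvd_step(rows, X, Y, n)
                if step is not None:
                    rows = step
                    break
        else:
            break

    def resolve(sym):
        while sym in renamed:
            sym = renamed[sym]
        return sym

    s1 = tuple(resolve(v) for v in t1)
    s2 = tuple(resolve(v) for v in t2)
    Y = cols(rhs)
    if kind == "fd":
        return all(s1[i] == s2[i] for i in Y)   # were the RHS symbols equated?
    # MVD goal: both tuples obtained by swapping the Y components must exist.
    return all(tuple(p[i] if i in Y else q[i] for i in range(n)) in rows
               for p, q in ((s1, s2), (s2, s1)))


print(implies("ABCD", [], [("A", "B"), ("B", "C")], ("mvd", "A", "C")))   # True
print(implies("ABCD", [("A", "B"), ("B", "C")], [], ("fd", "A", "C")))    # True
print(implies("ABCD", [("CD", "B")], [("A", "BC")], ("fd", "A", "B")))    # False
```

When the answer is False, the final `rows` set is the counterexample relation: here four tuples combining (b1, c1) and (b2, c2) with d1 and d2, in which CD → B never fires.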
4NF
• A relation $R$ is in Fourth Normal Form (4NF) if
  – For every non-trivial MVD $X \twoheadrightarrow Y$ in $R$, $X$ is a superkey
  – That is, all FDs and MVDs follow from “key → other attributes” (i.e., no MVDs and no FDs besides key functional dependencies)
• 4NF is stronger than BCNF
  – Because every FD is also an MVD
4NF decomposition algorithm
• Find a 4NF violation
  – A non-trivial MVD $X \twoheadrightarrow Y$ in $R$ where $X$ is not a superkey
• Decompose $R$ into $R_1$ and $R_2$, where
  – $R_1$ has attributes $X \cup Y$
  – $R_2$ has attributes $X \cup Z$ (where $Z$ contains $R$ attributes not in $X$ or $Y$)
• Repeat until all relations are in 4NF
• Almost identical to the BCNF decomposition algorithm
• Any decomposition on a 4NF violation is lossless
4NF decomposition example
uid gid place
142 dps Springfield
142 dps Australia
456 abc Springfield
456 abc Morocco
456 gov Springfield
456 gov Morocco
… … …
User (uid, gid, place)
4NF violation: uid ↠ gid
Decompose into Member (uid, gid): 4NF and Visited (uid, place): 4NF

Member:
uid gid
142 dps
456 abc
456 gov
… …

Visited:
uid place
142 Springfield
142 Australia
456 Springfield
456 Morocco
… …
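A quick check that this decomposition is lossless on the rows shown (Python sets of tuples; the elided rows are ignored, and `project` is a hypothetical helper): rejoining Member and Visited on uid reproduces User exactly.

```python
def project(rows, idx):
    """Project tuples onto the column positions in `idx` (duplicates removed)."""
    return {tuple(r[i] for i in idx) for r in rows}


# User(uid, gid, place): the rows shown on the slide.
user = {
    (142, "dps", "Springfield"), (142, "dps", "Australia"),
    (456, "abc", "Springfield"), (456, "abc", "Morocco"),
    (456, "gov", "Springfield"), (456, "gov", "Morocco"),
}
member = project(user, (0, 1))    # Member(uid, gid)
visited = project(user, (0, 2))   # Visited(uid, place)

# Natural join on uid rebuilds (uid, gid, place) tuples.
rejoined = {(u, g, p) for (u, g) in member for (v, p) in visited if u == v}
print(len(member), len(visited), rejoined == user)   # 3 4 True
```

The 6 × 3 attribute values of User shrink to 3 × 2 + 4 × 2 = 14 stored values, and the saving grows as users join more groups and visit more places.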
Other kinds of dependencies and normal forms
• Dependency preserving decompositions
• Join dependencies
• Inclusion dependencies
• 5NF, 3NF, 2NF
• See book if interested (not covered in class)
Summary
• Philosophy behind BCNF, 4NF: Data should depend on the key, the whole key, and nothing but the key!
  – You could have multiple keys though
• Redundancy is not desired typically
  – not always, mainly due to performance reasons
• Functional/multivalued dependencies
  – capture redundancy
• Decompositions
  – eliminate dependencies
• Normal forms
  – Guarantee certain non-redundancy
  – BCNF, and 4NF
• Lossless join
• How to decompose into BCNF, 4NF
• Chase
CompSci 516: Data Intensive Computing Systems
Lecture 6b: Storage and Indexing
Instructor: Sudeepa Roy
Where are we now?
We learnt
✓ Relational Model and Query Languages
  ✓ SQL, RA, RC
  ✓ Postgres (DBMS)
  ✓ XML (overview)
  • HW1
✓ Database Normalization
Next
• DBMS Internals
  – Storage
  – Indexing
  – Query Evaluation
  – Operator Algorithms
  – External sort
  – Query Optimization
Reading Material
• [RG]
  – Storage: Chapters 8.1, 8.2, 8.4, 9.4-9.7
  – Index: 8.3, 8.5
  – Tree-based index: Chapter 10.1-10.7
  – Hash-based index: Chapter 11
Additional reading
• [GUW]
  – Chapters 8.3, 14.1-14.4
Acknowledgement: The following slides have been created adapting the instructor material of the [RG] book provided by the authors Dr. Ramakrishnan and Dr. Gehrke.
What will we learn?
• How does a DBMS organize files?
  – Record format, Page format
• What is an index?
• What are different types of indexes?
  – Tree-based indexing:
    • B+ tree
    • insert, delete
  – Hash-based indexing
    • Static and dynamic (extendible hashing, linear hashing)
• How do we use an index to optimize performance?
Storage
DBMS Architecture
• A typical DBMS has a layered architecture
• The figure does not show the concurrency control and recovery components
  – to be done in “transactions”
• This is one of several possible architectures
  – each system has its own variations
[Figure: layered DBMS architecture, from top to bottom: Query Parsing, Optimization, and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. These layers must consider concurrency control and recovery.]
Data on External Storage
• Data must persist on disk across program executions in a DBMS
  – Data is huge
  – Must persist across executions
  – But has to be fetched into main memory when the DBMS processes the data
• The unit of information for reading data from disk, or writing data to disk, is a page
• Disks: Can retrieve a random page at fixed cost
  – But reading several consecutive pages is much cheaper than reading them in random order
Disk Space Management
• The lowest layer of DBMS software manages space on disk
• Higher levels call upon this layer to:
  – allocate/de-allocate a page
  – read/write a page
• Size of a page = size of a disk block = data unit
• A request for a sequence of pages is often satisfied by allocating contiguous blocks on disk
• Space on disk is managed by the Disk-space Manager
  – Higher levels don't need to know how this is done, or how free space is managed
Buffer Management
Suppose
• 1 million pages in the db, but only space for 1000 in memory
• A query needs to scan the entire file
• The DBMS has to
  – bring pages into main memory
  – decide which existing pages to replace to make room for a new page
  – called the Replacement Policy
• Managed by the Buffer manager
  – Files and access methods ask the buffer manager to access a page mentioning the “record id” (soon)
  – The buffer manager loads the page if it is not already there
Buffer Management
• Data must be in RAM for the DBMS to operate on it
• A table of <frame#, pageid> pairs is maintained
• Buffer pool = main memory partitioned into frames; each frame either contains a page from disk or is a free frame
[Figure: page requests from higher levels are served from the BUFFER POOL in MAIN MEMORY; disk pages are read from the DB on DISK into free frames; the choice of frame is dictated by the replacement policy]
When a Page is Requested...
For every frame, store
• a dirty bit:
  – whether the page has been modified since it was brought into memory
  – initially 0 or off
• a pin-count:
  – the number of times the page has been requested but not released (the number of current users)
  – initially 0
  – when a page is requested, the count is incremented
  – when the requestor releases the page, the count is decremented
  – the buffer manager only reads a new page into a frame whose pin-count is 0
  – if no frame has pin-count 0, the buffer manager has to wait (or a transaction is aborted; discussed later)
When a Page is Requested...
• Check if the page is already in the buffer pool
• If yes, pin it (increment the pin-count of that frame) and return its address to the requestor
• If no,
  – Choose a frame for replacement using the replacement policy
  – If the chosen frame is dirty (has been modified), write it to disk
  – Read the requested page into the chosen frame
  – Pin (increase the pin-count of) the page and return its address to the requestor
• If requests can be predicted (e.g., sequential scans), pages can be pre-fetched several pages at a time
• Concurrency control & recovery may entail additional I/O when a frame is chosen for replacement
  – e.g., the Write-Ahead Log protocol: when we do Transactions
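A toy Python model of these steps, for illustration only. The class name `BufferPool`, the dict standing in for the disk, and the choice of LRU among frames with pin-count 0 are all assumptions (LRU is only one possible replacement policy), and there is no concurrency control or logging.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer manager with pin counts, dirty bits and LRU replacement.

    `disk` is a dict page_id -> bytes standing in for the disk-space manager.
    """

    def __init__(self, num_frames, disk):
        self.num_frames = num_frames
        self.disk = disk
        # page_id -> [data, pin_count, dirty]; order = LRU queue (head = oldest).
        self.frames = OrderedDict()
        self.reads = self.writes = 0

    def request(self, page_id):
        if page_id not in self.frames:               # miss: need a frame
            if len(self.frames) == self.num_frames:
                self._evict()
            self.frames[page_id] = [self.disk[page_id], 0, False]
            self.reads += 1
        self.frames[page_id][1] += 1                 # pin the page
        return self.frames[page_id][0]

    def release(self, page_id, new_data=None):
        frame = self.frames[page_id]
        if new_data is not None:                     # requestor modified the page
            frame[0], frame[2] = new_data, True
        frame[1] -= 1                                # unpin
        if frame[1] == 0:
            self.frames.move_to_end(page_id)         # join the LRU queue at the tail

    def _evict(self):
        # Choose from the head: the least recently released frame with pin-count 0.
        victim = next((p for p, f in self.frames.items() if f[1] == 0), None)
        if victim is None:
            raise RuntimeError("all frames are pinned; the requestor must wait")
        data, _, dirty = self.frames.pop(victim)
        if dirty:                                    # write back before reuse
            self.disk[victim] = data
            self.writes += 1


disk = {pid: f"page {pid}".encode() for pid in range(5)}
pool = BufferPool(2, disk)
pool.request(0)
pool.release(0, b"edited")     # page 0 is now dirty
pool.request(1)
pool.release(1)
pool.request(2)                # pool is full: evicts page 0 and writes it back
print(disk[0], pool.reads, pool.writes)   # b'edited' 3 1
```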
Buffer Replacement Policy
• A frame is chosen for replacement by a replacement policy
• Least-recently-used (LRU)
  – add frames with pin-count 0 to the end of a queue
  – choose from the head
• Clock (an efficient approximation of LRU)
• First In First Out (FIFO)
• Most-Recently-Used (MRU), etc.
Buffer Replacement Policy
• The policy can have a big impact on the number of I/Os
• Depends on the access pattern
• Sequential flooding: a nasty situation caused by LRU + repeated sequential scans
  – What happens with 10 frames and 9 pages?
  – What happens with 10 frames and 11 pages?
  – # buffer frames < # pages in file means each page request in each scan causes an I/O
  – MRU is much better in this situation (but not in all situations, of course)
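The flooding effect is easy to reproduce with a small simulation. The sketch assumes every page is unpinned right after it is read, so any resident page may be replaced; `count_misses` is a hypothetical helper that counts one I/O per miss.

```python
from collections import OrderedDict


def count_misses(policy, num_frames, num_pages, num_scans):
    """Count page misses (I/Os) for repeated sequential scans of a file."""
    frames = OrderedDict()  # resident pages, least recently used first
    misses = 0
    for _ in range(num_scans):
        for page in range(1, num_pages + 1):
            if page in frames:
                frames.move_to_end(page)          # hit: now most recently used
                continue
            misses += 1
            if len(frames) == num_frames:
                # LRU evicts the head (oldest); MRU evicts the tail (newest).
                frames.popitem(last=(policy == "MRU"))
            frames[page] = None
    return misses


for pages in (9, 11):
    print(pages, count_misses("LRU", 10, pages, 3), count_misses("MRU", 10, pages, 3))
# 9 9 9
# 11 33 13
```

With 9 pages everything fits, so both policies miss only during the first scan. With 11 pages LRU always evicts exactly the page the scan needs next and misses on all 33 requests, while MRU misses only once per scan after the first.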
DBMS vs. OS File System
• Operating systems do disk space and buffer management too:
• Why not let the OS manage these tasks?
• The DBMS can predict the page reference patterns much more accurately
  – can optimize
  – adjust the replacement policy
  – pre-fetch pages: already in buffer + contiguous allocation
  – pin a page in the buffer pool, force a page to disk (important for implementing Transactions concurrency control & recovery)
• Differences in OS support: portability issues
• Some limitations, e.g., files can't span disks
Files of Records
• A page or block is OK when doing I/O, but higher levels of the DBMS operate on records, and files of records
• FILE: A collection of pages, each containing a collection of records
• Must support:
  – insert/delete/modify record
  – read a particular record (specified using record id)
  – scan all records (possibly with some conditions on the records to be retrieved)
File Organization
• File organization: Method of arranging a file of records on external storage
  – One file can have multiple pages
  – Record id (rid) is sufficient to physically locate the page containing the record on disk
  – Indexes are data structures that allow us to find the record ids of records with given values in index search key fields
• NOTE: Several uses of “keys” in a database
  – Primary/foreign/candidate/super keys
  – Index search keys
Alternative File Organizations
Many alternatives exist, each ideal for some situations, and not so good in others:
• Heap (random order) files: Suitable when typical access is a file scan retrieving all records
• Sorted Files: Best if records must be retrieved in some order, or only a “range” of records is needed
• Indexes: Data structures to organize records via trees or hashing
  – Like sorted files, they speed up searches for a subset of records, based on values in certain (“search key”) fields
  – Updates are much faster than in sorted files
Unordered (Heap) Files
• Simplest file structure; contains records in no particular order
• As the file grows and shrinks, disk pages are allocated and de-allocated
• To support record level operations, we must:
  – keep track of the pages in a file
  – keep track of free space on pages
  – keep track of the records on a page
• There are many alternatives for keeping track of this
Heap File Implemented as a List
• The header page id and Heap file name must be stored someplace
• Each page contains 2 ‘pointers’ plus data
• Problem:
  – to insert a new record, we may need to scan several pages on the free list to find one with sufficient space
[Figure: the header page starts two linked lists of data pages: one of full pages and one of pages with free space]
Heap File Using a Page Directory
• The entry for a page can include the number of free bytes on the page.
• The directory is a collection of pages
  – a linked list implementation of the directory is just one alternative
  – Much smaller than a linked list of all heap file pages!
[Figure: a header page starts the DIRECTORY, whose entries point to Data Page 1, Data Page 2, …, Data Page N]
How do we arrange a collection of records on a page?
• Each page contains several slots
  – one for each record
• A record is identified by <page-id, slot-number>
• Fixed-Length Records
• Variable-Length Records
• For both, there are options for
  – Record formats (how to organize the fields within a record)
  – Page formats (how to organize the records within a page)
Page Formats: Fixed Length Records
• Record id = <page id, slot#>
• Packed: moving records for free space management changes the rid; may not be acceptable
• Unpacked: use a bitmap
  – scan the bit array to find an empty slot
• Each page also may contain additional info like the id of the next page (not shown)
[Figure: PACKED page: slots 1 … N filled contiguously, then free space, with the number of records N stored at the end of the page. UNPACKED, BITMAP page: slots 1 … M, a bitmap with one bit per slot, and the number of slots M stored at the end of the page]
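A sketch of the unpacked, bitmap layout. Slots are numbered from 0 here, and Python lists stand in for the bytes of the page; both are simplifying assumptions, and `FixedLengthPage` is a hypothetical name. Deleting a record only clears its bit, so the rids of the other records never change.

```python
class FixedLengthPage:
    """Unpacked page of fixed-length slots with a free-slot bitmap."""

    def __init__(self, page_id, num_slots):
        self.page_id = page_id
        self.bitmap = [0] * num_slots      # 1 = slot in use
        self.slots = [None] * num_slots

    def insert(self, record):
        """Scan the bitmap for an empty slot; return the new rid."""
        for slot, used in enumerate(self.bitmap):
            if not used:
                self.bitmap[slot] = 1
                self.slots[slot] = record
                return (self.page_id, slot)
        raise ValueError("page is full")

    def delete(self, rid):
        """Clear the bit; other records keep their slots, so their rids stay valid."""
        _, slot = rid
        self.bitmap[slot] = 0
        self.slots[slot] = None

    def get(self, rid):
        _, slot = rid
        if not self.bitmap[slot]:
            raise KeyError(rid)
        return self.slots[slot]


page = FixedLengthPage(page_id=7, num_slots=3)
a = page.insert("rec A")
b = page.insert("rec B")
page.delete(a)                 # frees slot 0; rid b is unaffected
c = page.insert("rec C")       # the bitmap scan finds slot 0 again
print(a, b, c, page.get(b))    # (7, 0) (7, 1) (7, 0) rec B
```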
Page Formats: Variable Length Records
• Need to find a page with the right amount of space
  – Too small: cannot insert
  – Too large: waste of space
• If a record is deleted, need to move the records so that all free space is contiguous
  – need the ability to move records within a page
• Can maintain a directory of slots (next slide)
  – <record-offset, record-length>
  – deletion = set record-offset to -1
• Record-id rid = <page, slot-in-directory> remains unchanged
Page Formats: Variable Length Records
• Can move records on a page without changing the rid
  – so, attractive for fixed-length records too
• Store (record-offset, record-length) in each slot
• rids are unaffected by rearranging records in a page
[Figure: page i holds records with Rid = (i,1), (i,2), …, (i,N); the SLOT DIRECTORY at the end of the page stores one entry per slot, the number of slots N, and a pointer to the start of free space]
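A minimal sketch of a slotted page, with simplifications that are assumptions, not the layout on the slide: records grow from the start of a bytearray, the directory is a separate Python list instead of living at the end of the page, and deleted slots are never reused. The class name `SlottedPage` is hypothetical. Compaction moves records and rewrites only their directory offsets, so rids stay valid.

```python
class SlottedPage:
    """Variable-length records addressed through a slot directory.

    Each directory entry is [record_offset, record_length]; offset -1 marks a
    deleted slot. Records live in a bytearray; free space starts at `free`.
    """

    def __init__(self, page_id, size=4096):
        self.page_id = page_id
        self.data = bytearray(size)
        self.directory = []
        self.free = 0

    def insert(self, record: bytes):
        if self.free + len(record) > len(self.data):
            raise ValueError("not enough contiguous free space")
        self.data[self.free:self.free + len(record)] = record
        self.directory.append([self.free, len(record)])
        self.free += len(record)
        return (self.page_id, len(self.directory) - 1)   # rid = (page, slot)

    def delete(self, rid):
        self.directory[rid[1]][0] = -1                    # deletion: offset := -1

    def get(self, rid):
        offset, length = self.directory[rid[1]]
        if offset == -1:
            raise KeyError(rid)
        return bytes(self.data[offset:offset + length])

    def compact(self):
        """Slide live records together; only offsets change, so rids stay valid."""
        pos = 0
        for entry in self.directory:                      # directory order = offset order
            offset, length = entry
            if offset != -1:
                self.data[pos:pos + length] = self.data[offset:offset + length]
                entry[0] = pos
                pos += length
        self.free = pos


p = SlottedPage(3, size=64)
r1 = p.insert(b"alice")
r2 = p.insert(b"bob")
r3 = p.insert(b"carol")
p.delete(r2)
p.compact()                     # "carol" moves from offset 8 to offset 5
print(r3, p.get(r3), p.free)    # (3, 2) b'carol' 10
```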
Record Formats: Fixed Length
• Each field has a fixed length
  – for all records
  – the number of fields is also fixed
  – fields can be stored consecutively
• Information about field types is the same for all records in a file
  – stored in system catalogs
• Finding the i-th field does not require a scan of the record
  – given the address of the record, the address of a field can be obtained easily
[Figure: fields F1 F2 F3 F4 with lengths L1 L2 L3 L4 stored consecutively from base address B; the address of F3 is B + L1 + L2]
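Python's standard `struct` module can mimic this layout. The four-field schema below is a made-up example; with the `<` prefix `struct` adds no alignment padding, so each field's offset is just the sum of the preceding lengths. Fields are indexed from 0 here, so F3 is index 2 and its address is B + L1 + L2.

```python
import struct

# Hypothetical schema: F1 int32, F2 10-byte string, F3 float64, F4 int16.
FIELD_FORMATS = ["i", "10s", "d", "h"]
# "<" = little-endian with no alignment padding, so fields are stored back to back.
FIELD_LENGTHS = [struct.calcsize("<" + f) for f in FIELD_FORMATS]   # L1..L4
RECORD_FORMAT = "<" + "".join(FIELD_FORMATS)


def field_offset(i):
    """Offset of field i (0-based) from the record's base address B."""
    return sum(FIELD_LENGTHS[:i])


def read_field(buf, base, i):
    """Read field i directly at B + offset, without scanning earlier fields."""
    return struct.unpack_from("<" + FIELD_FORMATS[i], buf, base + field_offset(i))[0]


page = bytearray(64)
base = 16                                        # base address B of the record
struct.pack_into(RECORD_FORMAT, page, base, 42, b"Bart", 3.5, 7)
print(FIELD_LENGTHS, field_offset(2), read_field(page, base, 2))   # [4, 10, 8, 2] 14 3.5
```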
Record Formats: Variable Length
• Cannot use fixed-length slots for records
• Two alternative formats (# fields is fixed):
  1. use delimiters: a field count (e.g., 4) followed by fields F1 F2 F3 F4, delimited by special symbols ($)
  2. use offsets at the start of each record: an array of field offsets followed by fields F1 F2 F3 F4
• The second offers direct access to the i-th field and efficient storage of nulls (special don't-know value); small directory overhead
• Modification may be costly (may grow the field and not fit in the page)
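A sketch of the second format (array of field offsets), assuming 2-byte little-endian offsets and representing a null as a zero-length field whose start offset equals its end offset. As a consequence, this toy version cannot tell a null from an empty string. `encode` and `field` are hypothetical names.

```python
import struct


def encode(fields):
    """Offset-array record: n+1 uint16 offsets, then the field bytes.

    Offset k is where field k starts and offset n is where the record ends.
    A null (None) field is stored as zero bytes: its start equals its end.
    """
    n = len(fields)
    pos = 2 * (n + 1)                 # field data starts right after the header
    offsets, body = [], b""
    for f in fields:
        offsets.append(pos)
        data = f if f is not None else b""
        body += data
        pos += len(data)
    offsets.append(pos)
    return struct.pack(f"<{n + 1}H", *offsets) + body


def field(record, i):
    """Direct access to field i: read two offsets and slice; None means null."""
    start, end = struct.unpack_from("<2H", record, 2 * i)
    return None if start == end else record[start:end]


rec = encode([b"142", b"Bart", None, b"@BartJSimpson"])
print(field(rec, 1), field(rec, 2), len(rec))   # b'Bart' None 30
```

The null in field 2 costs only its offset entry, and reaching field 3 needs two offset reads rather than a scan for delimiters.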