
Page 1

CS317 File and Database Systems

Lecture 8 – Normalization, Bottom-Up from UNF to BCNF

October 17, 2017 Sam Siewert

http://www.google.com/about/datacenters/gallery/index.html#/locations/the-dalles/1

http://nsa.gov1.info/utah-data-center/

Page 2

Inside a Datacenter – E.g. Green House

1 Rack is 42 Rack Units [U = 1.75”], 6.125 Feet High, Stacked
HDD Storage 2U to 4U, 3.5” HDD
SSD Storage 1U to 2U, 2.5” SSD
Computing 1U to 4U Servers
Standard Rack Depth = 36”, Width = 19” or 23”
Hot Rows [Fan Exhaust], Cold Rows [Front Panels]
Power, Chillers for Air Handling, Optional Liquid Cooling
– E.g. Emerson Liebert
– 120/240VAC Power Conditioning and Distribution – E.g. Eaton and Pulizzi PDU
– DC Telco Rack Alternatives [Higher Efficiency, Less Convenient]

[Figure: Sam Siewert – Typical Datacenter Rack, E.g. DreamWorks, Microsoft, Green House, NCAR, NOAA, DoE, Xerox, Amazon, …]

Page 3

Basic Datacenter Figures of Merit [2014]

Power Density – 10 kW Per Rack
– 120/240VAC, 20Amp Circuits, 1-Phase Loads
– E.g. 10 kW Per Rack, 2 x 20Amp 240VAC PDU
– Dual Circuit for Dual Power Supply Computing and Storage
– Hot Swap Power Units in Servers and Storage Enclosures

Storage Density – TB/U, 1PB Rack is 24 to 48 TB/U
– 60+ 3.5” 3TB HDDs, 180TB in 4U, 45TB/U
– RAID10 is Striped and Mirrored Storage (Typical for DBMS)

Compute Density – 2 or 4 Socket 1U/2U Servers
– E.g. 8 Cores/CPU, 4 CPU Sockets, HT3 or QPI, 32 Cores per U

Network Port Density and Bandwidth
– GigE, 10GE, 40G Infiniband or Bonded 10GE, 100G CEE
– Ports Per Server, Copper [Twinax, TP, IB] or Multi-mode Optical [LC/SC SFP/SFP+ connectors], Typically 2 to 4 or More
– SFP Transceivers – Copper or Optical
– LC/SC Connectors for Optical
– Switch Port Density

[Figure: E.g. LC SFP, http://en.wikipedia.org/wiki/Small_form-factor_pluggable_transceiver]
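The power, storage, and compute densities above follow from simple arithmetic. The sketch below recomputes them from the slide's own numbers; the variable names are illustrative, and the "full rack" figure assumes every rack unit holds storage, which a real rack would not.

```python
# Rack-level figures of merit recomputed from the numbers on the slide.
RACK_UNITS = 42                          # rack height in U, from the rack slide

# Power: two 20 A circuits at 240 VAC, single phase.
power_kw = 2 * 20 * 240 / 1000           # 9.6 kW, roughly the 10 kW quoted

# Storage: 60 x 3 TB HDDs in one 4U enclosure.
enclosure_tb = 60 * 3                    # 180 TB
tb_per_u = enclosure_tb / 4              # 45 TB/U

# Assumption: all 42U filled with such enclosures (no servers, no switches).
full_rack_raw_tb = tb_per_u * RACK_UNITS     # 1890 TB raw
full_rack_raid10_tb = full_rack_raw_tb / 2   # mirroring halves usable capacity

# Compute: 4 sockets x 8 cores in a 1U server.
cores_per_u = 4 * 8                      # 32

print(power_kw, tb_per_u, full_rack_raid10_tb, cores_per_u)
```

Under these assumptions the mirrored capacity comes out near 1 PB per rack, which is consistent with the "1PB Rack" figure quoted above.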

Page 4

Significance of Normalization

Structured (Normalized), Indexed, Searchable, High Veracity Data
– E.g. 2TB HDD Could Hold 6.8 Kilobytes of Data on Every Person in US
– 1 Petabyte [RAID 10 42U Rack] Stores 3 Megabytes for Every US Citizen
– E.g. All Documented Life Events [Legal, Travel, Residency]
– 1000 Racks, 1 Exabyte Structured + Unstructured Data [E-mail and Audio]
– NSA Utah Estimated by Forbes to Store 3 to 12 Exabytes
– 180 Petabytes for 24 Hours of Audio on 300 Million People
– 300 Petabytes for 1 Year of Phone Conversations on all US Citizens [Forbes]

Unstructured BLOB Files [Documents, Images, Audio, Video]
– Audio with Compression [10 to 200 hours of MP3 per 640MB CD]
– Easily 24 hours on an MP3 CD of Intelligible Conversation
– Images - JPEG Lossy [10:1 to 20:1], PNG Lossless [4:1 to 10:1]
– MPEG Compression is 20:1 to 100:1 [Lossy] for I-frame and MVQ B/P-frames
– 24 Hours of SD Video is about 50 Gigabytes of Data [10 SD DVDs]
– 14,305 Racks of RAID-10 Disk for a Day of 30Hz SD Video of All US Citizens

Structured Data – Financial & Legal Transactions, Records

Unstructured Documents, E-mail, Audio, Snapshots, [Some Video]
– Not Only Capture and Store, but Search!
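The per-person storage figures can be checked with a few divisions. This is a rough sketch: the population of 300 million is the round number used for the audio estimate on the slide, and the use of binary units (GiB, PiB) for the video estimate is an assumption that happens to reproduce the rack count.

```python
US_POPULATION = 300_000_000   # assumption: the slide's round figure

def per_person(total_bytes, people=US_POPULATION):
    """Bytes available per person when total_bytes is shared evenly."""
    return total_bytes / people

hdd_2tb = per_person(2 * 10**12)    # one 2 TB drive
rack_1pb = per_person(10**15)       # one 1 PB rack

# Racks for one day of SD video per person, assuming 50 GiB per
# person-day and 1 PiB of usable capacity per rack.
racks_video = US_POPULATION * 50 * 2**30 / 2**50

print(round(hdd_2tb), round(rack_1pb / 10**6, 2), round(racks_video))
# prints: 6667 3.33 14305
```

The 2 TB result (about 6.7 kB per person) is close to, but not exactly, the 6.8 Kilobytes quoted; the exact population behind the slide's figure is not stated.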

Page 5

Reminders

Ex #4 Posted

Assignment #3 Returned Next Week

Assignment #5, Physical DB Design and Project!

Assignment #6, Complete DBMS Project – FINAL
– Design Schema for DBMS project in a small team
    Logical design focus
    Normalization
    Physical is MySQL on PRClab
– Combine Network Applications with DBMS in C/C++, JDBC, or Python - http://www.mysql.com/products/connector/
– Add Stored Programs and Triggers
– Add Views
– Create Transactions where needed

Page 6

Normalization

Concern is Duplication of Data in DBMS and Hazards
– Wastes Space – Duplicate Data
– Insert Hazard – New Staff Row Also Assigned to B007, Second Insert of bAddress, must match that Already Existing for SA9
– Delete Hazard – SA9 Quits, Row Delete, Lose B007 bAddress
– Modification Hazard – bAddress Change for B005 or B003
– Foreign Keys are Exception (Expected Redundancy for Relational Model)

[Figure 14.3 – UNF [1NF]; label: Redundant Attribute Data]
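The three hazards can be reproduced on a small redundant table. The sketch below uses Python's built-in sqlite3 module rather than the course's MySQL server. The table name StaffBranch, the rows, and the placeholder addresses are made up for illustration; only the identifiers SA9, B003, B005, and B007 come from the slide.

```python
import sqlite3

def make_db():
    """In-memory DB with a redundant staff/branch table (rows are made up)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE StaffBranch "
        "(staffNo TEXT PRIMARY KEY, branchNo TEXT, bAddress TEXT)"
    )
    conn.executemany(
        "INSERT INTO StaffBranch VALUES (?, ?, ?)",
        [
            ("SG37", "B003", "address of B003"),
            ("SG14", "B003", "address of B003"),
            ("SL21", "B005", "address of B005"),
            ("SA9", "B007", "address of B007"),
        ],
    )
    return conn

def addresses(conn, branch_no):
    """All distinct addresses currently recorded for one branch."""
    rows = conn.execute(
        "SELECT DISTINCT bAddress FROM StaffBranch "
        "WHERE branchNo = ? ORDER BY bAddress",
        (branch_no,),
    )
    return [r[0] for r in rows]

# Insert hazard: nothing forces the second copy of the address to match.
db = make_db()
db.execute("INSERT INTO StaffBranch VALUES ('SA10', 'B007', 'adress of B007')")
after_insert = addresses(db, "B007")      # two spellings for one branch

# Delete hazard: removing the only B007 staff row loses the branch address.
db = make_db()
db.execute("DELETE FROM StaffBranch WHERE staffNo = 'SA9'")
after_delete = addresses(db, "B007")      # empty list

# Modification hazard: updating one copy leaves B003 inconsistent.
db = make_db()
db.execute(
    "UPDATE StaffBranch SET bAddress = 'new address of B003' "
    "WHERE staffNo = 'SG37'"
)
after_update = addresses(db, "B003")      # old and new address both present

print(after_insert, after_delete, after_update)
```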

Page 7

The Process of Normalization
[Follow Rules for Relational Table Design and Hints coming from ER/EER Information Model]

UNF – Paper, Spreadsheet

Page 8

The Process of Normalization
[Bottom-Up Tables]

Page 9

UNF -> 3NF

Minimizes Update Anomalies [Insert, Update, Delete], Page 420 to 426
One Client Renting Multiple Properties – Typical of Spreadsheet, Paper

14.10 – UNF

cNo   cName          pNo   pAddr              start    finish   Rent  oNo   oName
CR76  John Kay       PG4   6 Lawrence Street  7/1/12   8/31/13  350   CO40  Tina Murphy
                     PG16  5 Novar Drive      9/1/13   9/1/14   450   CO93  Tony Shaw
CR56  Aline Stewart  PG4   6 Lawrence Street  9/1/11   6/10/12  350   CO40  Tina Murphy
                     PG36  2 Manor Road       10/1/12  12/1/13  375   CO93  Tony Shaw
                     PG16  5 Novar Drive      11/1/14  8/10/15  450   CO93  Tony Shaw

[Diagram labels: UNF; 2NF: Client, Rental, PropertyOwner; 3NF: Client, Rental, PropertyForRent, Owner]

Page 10

UNF -> 1NF

UNF – Table with ONE or MORE Repeating Groups [Tuple Sub-set]
1NF – Relation where Intersection of Each Row and Column has ONE Value

14.10 – UNF

cNo   cName          pNo   pAddr              start    finish   Rent  oNo   oName
CR76  John Kay       PG4   6 Lawrence Street  7/1/12   8/31/13  350   CO40  Tina Murphy
                     PG16  5 Novar Drive      9/1/13   9/1/14   450   CO93  Tony Shaw
CR56  Aline Stewart  PG4   6 Lawrence Street  9/1/11   6/10/12  350   CO40  Tina Murphy
                     PG36  2 Manor Road       10/1/12  12/1/13  375   CO93  Tony Shaw
                     PG16  5 Novar Drive      11/1/14  8/10/15  450   CO93  Tony Shaw

14.11 – 1NF (Still Suffers all 3 Hazards)

cNo   cName          pNo   pAddr              start    finish   Rent  oNo   oName
CR76  John Kay       PG4   6 Lawrence Street  7/1/12   8/31/13  350   CO40  Tina Murphy
CR76  John Kay       PG16  5 Novar Drive      9/1/13   9/1/14   450   CO93  Tony Shaw
CR56  Aline Stewart  PG4   6 Lawrence Street  9/1/11   6/10/12  350   CO40  Tina Murphy
CR56  Aline Stewart  PG36  2 Manor Road       10/1/12  12/1/13  375   CO93  Tony Shaw
CR56  Aline Stewart  PG16  5 Novar Drive      11/1/14  8/10/15  450   CO93  Tony Shaw
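Removing the repeating group amounts to repeating the client columns once per rental. A minimal sketch of that flattening step follows; the nested-dictionary layout for the UNF record and the function name to_1nf are illustrative choices, and PG16's rent is taken as 450 throughout, an assumption based on the PropertyOwner rows later in the lecture.

```python
# One UNF record per client; the rental attributes form the repeating group.
unf = [
    {"cNo": "CR76", "cName": "John Kay", "rentals": [
        ("PG4", "6 Lawrence Street", "7/1/12", "8/31/13", 350, "CO40", "Tina Murphy"),
        ("PG16", "5 Novar Drive", "9/1/13", "9/1/14", 450, "CO93", "Tony Shaw"),
    ]},
    {"cNo": "CR56", "cName": "Aline Stewart", "rentals": [
        ("PG4", "6 Lawrence Street", "9/1/11", "6/10/12", 350, "CO40", "Tina Murphy"),
        ("PG36", "2 Manor Road", "10/1/12", "12/1/13", 375, "CO93", "Tony Shaw"),
        ("PG16", "5 Novar Drive", "11/1/14", "8/10/15", 450, "CO93", "Tony Shaw"),
    ]},
]

def to_1nf(records):
    """Repeat the client columns per rental so every cell holds one value."""
    return [
        (rec["cNo"], rec["cName"], *rental)
        for rec in records
        for rental in rec["rentals"]
    ]

rows_1nf = to_1nf(unf)
for row in rows_1nf:
    print(row)
```

The result has five rows of nine single-valued columns, with cNo and cName duplicated on every row, which is exactly the redundancy the later normal forms remove.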

Page 11

1NF -> 2NF

1NF – Relation where Intersection of Row and Column Has ONE Value
2NF – 1NF Relation where Every Non-Primary Key Attribute is Fully Functionally Dependent on the PK [ER 1..1 to 1..1 Relations]

14.11 – 1NF (Suffers all 3 Hazards)

cNo   cName          pNo   pAddr              start    finish   Rent  oNo   oName
CR76  John Kay       PG4   6 Lawrence Street  7/1/12   8/31/13  350   CO40  Tina Murphy
CR76  John Kay       PG16  5 Novar Drive      9/1/13   9/1/14   450   CO93  Tony Shaw
CR56  Aline Stewart  PG4   6 Lawrence Street  9/1/11   6/10/12  350   CO40  Tina Murphy
CR56  Aline Stewart  PG36  2 Manor Road       10/1/12  12/1/13  375   CO93  Tony Shaw
CR56  Aline Stewart  PG16  5 Novar Drive      11/1/14  8/10/15  450   CO93  Tony Shaw

14.14 – 2NF (Still Suffers Update Hazard Due to Transitive Dependency pNo -> oNo -> oName)

Client

cNo [PK]  cName
CR76      John Kay
CR56      Aline Stewart

Rental

cNo   pNo   start    finish
CR76  PG4   7/1/12   8/31/13
CR76  PG16  9/1/13   9/1/14
CR56  PG4   9/1/11   6/10/12
CR56  PG36  10/1/12  12/1/13
CR56  PG16  11/1/14  8/10/15

PropertyOwner

pNo   pAddress           rent  oNo   oName
PG4   6 Lawrence Street  350   CO40  Tina Murphy
PG16  5 Novar Drive      450   CO93  Tony Shaw
PG36  2 Manor Road       375   CO93  Tony Shaw
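The partial dependencies that drive this step can be tested against the 1NF rows, and the 2NF relations obtained by projection. The helpers holds and project below are hypothetical names for this sketch, and PG16's rent is taken as 450 in every row (an assumption consistent with the PropertyOwner table). A dependency that holds in five sample rows is only evidence, not proof; real dependencies come from the meaning of the data.

```python
COLS = ("cNo", "cName", "pNo", "pAddr", "start", "finish", "rent", "oNo", "oName")
ROWS = [
    ("CR76", "John Kay", "PG4", "6 Lawrence Street", "7/1/12", "8/31/13", 350, "CO40", "Tina Murphy"),
    ("CR76", "John Kay", "PG16", "5 Novar Drive", "9/1/13", "9/1/14", 450, "CO93", "Tony Shaw"),
    ("CR56", "Aline Stewart", "PG4", "6 Lawrence Street", "9/1/11", "6/10/12", 350, "CO40", "Tina Murphy"),
    ("CR56", "Aline Stewart", "PG36", "2 Manor Road", "10/1/12", "12/1/13", 375, "CO93", "Tony Shaw"),
    ("CR56", "Aline Stewart", "PG16", "5 Novar Drive", "11/1/14", "8/10/15", 450, "CO93", "Tony Shaw"),
]

def holds(lhs, rhs, rows=ROWS, cols=COLS):
    """True if the functional dependency lhs -> rhs holds in these rows."""
    seen = {}
    for row in rows:
        rec = dict(zip(cols, row))
        key = tuple(rec[a] for a in lhs)
        val = tuple(rec[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def project(attrs, rows=ROWS, cols=COLS):
    """Relational projection: keep attrs, drop duplicate tuples."""
    idx = [cols.index(a) for a in attrs]
    return list(dict.fromkeys(tuple(row[i] for i in idx) for row in rows))

# Partial dependencies on the composite PK (cNo, pNo):
print(holds(["cNo"], ["cName"]))                            # True
print(holds(["pNo"], ["pAddr", "rent", "oNo", "oName"]))    # True
print(holds(["cNo"], ["pNo"]))                              # False

client = project(["cNo", "cName"])
rental = project(["cNo", "pNo", "start", "finish"])
property_owner = project(["pNo", "pAddr", "rent", "oNo", "oName"])
print(len(client), len(rental), len(property_owner))        # 2 5 3
```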

Page 12

Figure 14.13 Alternate 1NF

cNo   pNo   pAddr              start    finish   Rent  oNo   oName
CR76  PG4   6 Lawrence Street  7/1/12   8/31/13  350   CO40  Tina Murphy
CR76  PG16  5 Novar Drive      9/1/13   9/1/14   450   CO93  Tony Shaw
CR56  PG4   6 Lawrence Street  9/1/11   6/10/12  350   CO40  Tina Murphy
CR56  PG36  2 Manor Road       10/1/12  12/1/13  375   CO93  Tony Shaw
CR56  PG16  5 Novar Drive      11/1/14  8/10/15  450   CO93  Tony Shaw

cNo [PK]  cName
CR76      John Kay
CR56      Aline Stewart

Page 13

2NF -> 3NF [Also BCNF]

2NF – 1NF Relation where Every Non-Primary Key Attribute is Fully Functionally Dependent on the PK [ER 1..1 to 1..1 Relations]
3NF – 2NF Relation where no Non-PK Attribute is Transitively Dependent on the PK

14.14 – 2NF (Still Suffers Update Hazard Due to Transitive Dependency pNo -> oNo -> oName)

Client

cNo [PK]  cName
CR76      John Kay
CR56      Aline Stewart

Rental

cNo   pNo   start    finish
CR76  PG4   7/1/12   8/31/13
CR76  PG16  9/1/13   9/1/14
CR56  PG4   9/1/11   6/10/12
CR56  PG36  10/1/12  12/1/13
CR56  PG16  11/1/14  8/10/15

PropertyOwner

pNo   pAddress           rent  oNo   oName
PG4   6 Lawrence Street  350   CO40  Tina Murphy
PG16  5 Novar Drive      450   CO93  Tony Shaw
PG36  2 Manor Road       375   CO93  Tony Shaw

3NF [PropertyOwner Split into Owner and PropertyForRent]

Owner

oNo [PK]  oName
CO40      Tina Murphy
CO93      Tony Shaw

PropertyForRent

pNo [PK]  pAddress           rent  oNo [FK]
PG4       6 Lawrence Street  350   CO40
PG16      5 Novar Drive      450   CO93
PG36      2 Manor Road       375   CO93
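The 3NF split can be written as table definitions with a primary key on each relation and a foreign key from the property to its owner. This is a sketch using Python's sqlite3 module instead of MySQL; the column types, the replacement name 'Anthony Shaw', and the property 'PG99' are assumptions made only for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Owner (
    oNo   TEXT PRIMARY KEY,
    oName TEXT NOT NULL
);
CREATE TABLE PropertyForRent (
    pNo      TEXT PRIMARY KEY,
    pAddress TEXT NOT NULL,
    rent     INTEGER NOT NULL,
    oNo      TEXT NOT NULL REFERENCES Owner(oNo)
);
""")
conn.executemany("INSERT INTO Owner VALUES (?, ?)",
                 [("CO40", "Tina Murphy"), ("CO93", "Tony Shaw")])
conn.executemany("INSERT INTO PropertyForRent VALUES (?, ?, ?, ?)", [
    ("PG4", "6 Lawrence Street", 350, "CO40"),
    ("PG16", "5 Novar Drive", 450, "CO93"),
    ("PG36", "2 Manor Road", 375, "CO93"),
])

# The owner's name is stored once, so a rename touches exactly one row.
cur = conn.execute("UPDATE Owner SET oName = 'Anthony Shaw' WHERE oNo = 'CO93'")
rows_changed = cur.rowcount

# Both of CO93's properties see the new name through the FK join.
names = [r[0] for r in conn.execute(
    "SELECT o.oName FROM PropertyForRent p JOIN Owner o ON o.oNo = p.oNo "
    "WHERE p.oNo = 'CO93'")]

# The FK rejects a property whose owner does not exist.
try:
    conn.execute("INSERT INTO PropertyForRent VALUES ('PG99', 'nowhere', 1, 'CO00')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

print(rows_changed, names)
```

In the 2NF PropertyOwner table the same rename would have had to change two rows, which is the update hazard named in the caption.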

Page 14

Lossless-Join Property of 3NF

Fundamental Point – P. 425 – Lossless-Join Reversibility

Normalization to 3NF is a Process of Applying Relational Algebra Projections
– Creates a Lossless-Join Decomposition (Reducing or Eliminating Insert, Delete, Update Hazards)
– Using Natural Join (in a View) we Can Reverse the Decomposition
– View [Stored Query] Can Easily Regenerate 1NF Version
– UNF Could Be Re-created Via Application Report Generation

1. Elimination of Repeating Groups -> 1NF

2. 1NF -> Every Non-PK [CK?] Attribute Fully Functionally Dependent on PK [any CK?] -> 2NF (No Partial Dependencies Allowed)

3. 2NF -> No Non-PK [CK?] Attribute is Transitively Dependent on the PK [any CK?] -> 3NF
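The reversibility claim can be checked directly: project the 1NF rows into the four 3NF relations, define a view that natural-joins them, and compare the view with the original rows. The sketch uses sqlite3; the view name ClientRental1NF is hypothetical, and PG16's rent is taken as 450 in every row (an assumption consistent with the PropertyOwner table).

```python
import sqlite3

# The 1NF rows: cNo, cName, pNo, pAddr, start, finish, rent, oNo, oName.
ONE_NF = [
    ("CR76", "John Kay", "PG4", "6 Lawrence Street", "7/1/12", "8/31/13", 350, "CO40", "Tina Murphy"),
    ("CR76", "John Kay", "PG16", "5 Novar Drive", "9/1/13", "9/1/14", 450, "CO93", "Tony Shaw"),
    ("CR56", "Aline Stewart", "PG4", "6 Lawrence Street", "9/1/11", "6/10/12", 350, "CO40", "Tina Murphy"),
    ("CR56", "Aline Stewart", "PG36", "2 Manor Road", "10/1/12", "12/1/13", 375, "CO93", "Tony Shaw"),
    ("CR56", "Aline Stewart", "PG16", "5 Novar Drive", "11/1/14", "8/10/15", 450, "CO93", "Tony Shaw"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Client (cNo TEXT PRIMARY KEY, cName TEXT);
CREATE TABLE Owner (oNo TEXT PRIMARY KEY, oName TEXT);
CREATE TABLE PropertyForRent (pNo TEXT PRIMARY KEY, pAddr TEXT, rent INTEGER, oNo TEXT);
CREATE TABLE Rental (cNo TEXT, pNo TEXT, start TEXT, finish TEXT,
                     PRIMARY KEY (cNo, pNo));
CREATE VIEW ClientRental1NF AS
    SELECT cNo, cName, pNo, pAddr, start, finish, rent, oNo, oName
    FROM Client NATURAL JOIN Rental
         NATURAL JOIN PropertyForRent
         NATURAL JOIN Owner;
""")

# Projections of the 1NF table; INSERT OR IGNORE drops the duplicate tuples.
for r in ONE_NF:
    conn.execute("INSERT OR IGNORE INTO Client VALUES (?, ?)", (r[0], r[1]))
    conn.execute("INSERT OR IGNORE INTO Owner VALUES (?, ?)", (r[7], r[8]))
    conn.execute("INSERT OR IGNORE INTO PropertyForRent VALUES (?, ?, ?, ?)",
                 (r[2], r[3], r[6], r[7]))
    conn.execute("INSERT OR IGNORE INTO Rental VALUES (?, ?, ?, ?)",
                 (r[0], r[2], r[4], r[5]))

rejoined = conn.execute("SELECT * FROM ClientRental1NF").fetchall()
lossless = sorted(rejoined) == sorted(ONE_NF)
print(lossless)   # True
```

The join returns exactly the five original rows, with no lost and no spurious tuples, which is what lossless-join means for this decomposition.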

Page 15

Reminder – SK, CK, PK, AK, FK [Super, Candidate, Primary, Alternate, Foreign Keys]

SK – Attribute or Set of Attributes that UNIQUELY Identifies a Tuple in a Relation

CK – An SK, s.t. no Proper Subset is also an SK [Minimal]
– UNIQUENESS – CK Uniquely Identifies all Tuples in Relation
– IRREDUCIBILITY – No Proper Subset of CK has UNIQUENESS

PK – CK Selected to ID Tuples UNIQUELY in Relation

AK – CK Not Selected to be PK

FK – An Attribute or Set of Attributes in R1 that Matches CK in R1 or R2..N
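The uniqueness and irreducibility tests can be sketched against a few rows of the client rental example. The functions is_superkey and is_candidate_key are hypothetical helpers, and the four columns are a cut-down version of the 1NF table. A sample of rows can only refute a key, never prove one, as the last line shows.

```python
from itertools import combinations

COLS = ("cNo", "cName", "pNo", "start")
ROWS = [
    ("CR76", "John Kay", "PG4", "7/1/12"),
    ("CR76", "John Kay", "PG16", "9/1/13"),
    ("CR56", "Aline Stewart", "PG4", "9/1/11"),
    ("CR56", "Aline Stewart", "PG36", "10/1/12"),
    ("CR56", "Aline Stewart", "PG16", "11/1/14"),
]

def is_superkey(attrs):
    """UNIQUENESS: no two rows agree on all of attrs."""
    idx = [COLS.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in ROWS]
    return len(set(projected)) == len(ROWS)

def is_candidate_key(attrs):
    """UNIQUENESS plus IRREDUCIBILITY: no proper subset is a superkey."""
    if not is_superkey(attrs):
        return False
    return not any(
        is_superkey(sub)
        for n in range(len(attrs))
        for sub in combinations(attrs, n)
    )

print(is_superkey(("cNo",)))                       # False: cNo repeats
print(is_candidate_key(("cNo", "pNo")))            # True
print(is_superkey(("cNo", "pNo", "cName")))        # True
print(is_candidate_key(("cNo", "pNo", "cName")))   # False: reducible
print(is_candidate_key(("start",)))                # True, but only in this sample
```

The final result is an artifact of the sample: the five start dates happen to differ, yet nothing in the meaning of the data prevents two rentals from starting on the same day. Keys are declared from the semantics of the attributes, not inferred from example rows.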

Page 16

Is 3NF Good Enough?

Recall that PK Selection is From Set of CKs in Relation

Dependencies on Remaining CKs not Used as PK?

Strengthen 2NF and 3NF Definitions to Include ANY CK
1. 1NF -> Every Non-CK Attribute Fully Functionally Dependent on ANY CK -> 2NF (No Partial Dependencies Allowed)
2. 2NF -> No Non-CK Attribute is Transitively Dependent on ANY CK -> 3NF

Even With STRONG 2NF & 3NF, Dependencies Can Still Cause Redundancy

BCNF Considers Common Cases

Page 17

BCNF

BCNF – Relation is BCNF If-and-only-if Every Determinant is a CK

Determinant – Attribute or Group of Attributes on Which some OTHER Attribute is Fully Functionally Dependent

3NF allows A -> B if B is a PK Attribute and A is not a CK

BCNF REQUIRES A to be a CK [Further Constrains]
– Issues Arise when Relation contains 2+ Composite CKs
– CKs Overlap [Common Attribute]

Stopping at 3NF Preferred [Sometimes] to Avoid Loss of Dependencies

Page 18

BCNF Example

ClientInterview 3NF Relation

cNo   intDate  intTime  staffNo  roomNo
CR76  5/13/14  10:30    SG5      G101
CR56  5/13/14  12:00    SG5      G101
CR74  5/13/14  12:00    SG37     G102
CR56  7/1/14   10:30    SG5      G102

ClientInterview has 3 Candidate Keys
1. (cNo, intDate) - PK
2. (staffNo, intDate, intTime) - CK
3. (roomNo, intDate, intTime) - CK

intDate is Overlap between 3 CKs (creates Hazard)

ClientInterview has following functional dependencies
1. (cNo, intDate) -> intTime, staffNo, roomNo
2. (staffNo, intDate, intTime) -> cNo
3. (roomNo, intDate, intTime) -> staffNo, cNo
4. (staffNo, intDate) -> roomNo

(staffNo, intDate) determinant is not a CK for ClientInterview
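The candidate keys and the BCNF violation can be derived mechanically from the four functional dependencies by computing attribute closures. This is a sketch of the standard closure technique, not code from the lecture; the function names are illustrative. It tests whether each determinant is a superkey, which for these fully functional dependencies is the same as asking whether it is a CK.

```python
from itertools import combinations

ATTRS = frozenset({"cNo", "intDate", "intTime", "staffNo", "roomNo"})
FDS = [
    ({"cNo", "intDate"}, {"intTime", "staffNo", "roomNo"}),     # fd1
    ({"staffNo", "intDate", "intTime"}, {"cNo"}),               # fd2
    ({"roomNo", "intDate", "intTime"}, {"staffNo", "cNo"}),     # fd3
    ({"staffNo", "intDate"}, {"roomNo"}),                       # fd4
]

def closure(attrs, fds=FDS):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attrs=ATTRS, fds=FDS):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for n in range(1, len(attrs) + 1):          # smallest sets first
        for combo in combinations(sorted(attrs), n):
            s = set(combo)
            if closure(s, fds) == attrs and not any(k <= s for k in keys):
                keys.append(s)
    return keys

def bcnf_violations(attrs=ATTRS, fds=FDS):
    """FDs whose determinant does not determine every attribute."""
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != attrs]

for key in candidate_keys():
    print(sorted(key))
print(bcnf_violations())
```

The search returns the three keys listed on the slide, and the only violating dependency is fd4, whose determinant (staffNo, intDate) fixes roomNo but nothing else.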

Page 19

BCNF Example

ClientInterview 3NF Relation

cNo   intDate  intTime  staffNo  roomNo
CR76  5/13/14  10:30    SG5      G101
CR56  5/13/14  12:00    SG5      G101
CR74  5/13/14  12:00    SG37     G102
CR56  7/1/14   10:30    SG5      G102

Interview Relation [cNo, intDate, intTime, staffNo]

cNo   intDate  intTime  staffNo
CR76  5/13/14  10:30    SG5
CR56  5/13/14  12:00    SG5
CR74  5/13/14  12:00    SG37
CR56  7/1/14   10:30    SG5

Interview Relation [staffNo, intDate, roomNo]

staffNo  intDate  roomNo
SG5      5/13/14  G101
SG37     5/13/14  G102
SG5      7/1/14   G102
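The decomposition can be checked by projecting ClientInterview onto the two attribute sets and joining the projections back on (staffNo, intDate). In this sketch the variable name staff_room for the second relation is a hypothetical label; the slide does not give that relation a distinct name.

```python
# ClientInterview rows: cNo, intDate, intTime, staffNo, roomNo.
CLIENT_INTERVIEW = [
    ("CR76", "5/13/14", "10:30", "SG5", "G101"),
    ("CR56", "5/13/14", "12:00", "SG5", "G101"),
    ("CR74", "5/13/14", "12:00", "SG37", "G102"),
    ("CR56", "7/1/14", "10:30", "SG5", "G102"),
]

# Projection 1: (cNo, intDate, intTime, staffNo).
interview = sorted({(c, d, t, s) for c, d, t, s, r in CLIENT_INTERVIEW})

# Projection 2: (staffNo, intDate, roomNo); duplicate tuples collapse.
staff_room = sorted({(s, d, r) for c, d, t, s, r in CLIENT_INTERVIEW})

# Natural join on (staffNo, intDate). A dictionary lookup is valid here
# because fd4 makes (staffNo, intDate) the key of the second relation.
room_of = {(s, d): r for s, d, r in staff_room}
rejoined = [(c, d, t, s, room_of[(s, d)]) for c, d, t, s in interview]

print(len(staff_room), sorted(rejoined) == sorted(CLIENT_INTERVIEW))
# prints: 3 True
```

The join recovers the original four rows, so the decomposition is lossless. The cost is that the dependency (roomNo, intDate, intTime) -> staffNo, cNo now spans both relations and can no longer be enforced by a key on a single table, which is the loss of dependencies that sometimes makes stopping at 3NF preferable.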