cs3431 Normalization Part II

Date posted: 22-Dec-2015

Attribute Closure : Example

Consider R (A, B, C, D, E) with FDs A -> B, B -> C, CD -> E. Does A -> E hold? (Is A -> E in F+?)

Rephrase as: Is E in {A}+? Let us compute {A}+.

{A}+ = {A, B, C}. Therefore, A -> E does not hold.
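The closure computation above can be sketched in a few lines of Python. This is a minimal illustration, not course-provided code: the function name `closure` and the representation of an FD as a `(lhs, rhs)` pair of sets are assumptions made for the example.

```python
def closure(attrs, fds):
    """Return the attribute closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once every attribute of the left side is known,
            # the right side is determined as well.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# FDs from the example: A -> B, B -> C, CD -> E
fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C", "D"}, {"E"})]

a_plus = closure({"A"}, fds)
print(sorted(a_plus))   # ['A', 'B', 'C']
print("E" in a_plus)    # False, so A -> E does not hold
```

Starting from {A, D} instead would reach E, because CD -> E can fire once both C and D are in the result.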


Decomposition

A decomposition must be lossless: joining the decomposed relations must not produce spurious tuples!


Decomposing Relations

StudentProf

sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       MM

FDs: pNumber -> pName

Student

sNumber  sName  pNumber
s1       Dave   p1
s2       Greg   p2

Professor

pNumber  pName
p1       MM
p2       MM


Decomposition: Lossless Join Property

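Student

sNumber  sName  pName
s1       Dave   MM
s2       Greg   MM

Professor

pNumber  pName
p1       MM
p2       MM

StudentProf (join of Student and Professor on pName)

sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s1       Dave   p2       MM
s2       Greg   p1       MM
s2       Greg   p2       MM

Spurious tuples: (s1, Dave, p2, MM) and (s2, Greg, p1, MM) were not in the original StudentProf.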
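To see the spurious tuples appear, the two decompositions of StudentProf can be replayed in Python. This is only a sketch: relations are modelled as lists of dicts, and the helpers `project` and `natural_join` are hypothetical names written for this illustration.

```python
def project(rel, attrs):
    """Project a relation (list of dicts) onto `attrs`, dropping duplicate rows."""
    rows = []
    for t in rel:
        row = {a: t[a] for a in attrs}
        if row not in rows:
            rows.append(row)
    return rows


def natural_join(r, s):
    """Natural join: combine rows that agree on all shared attributes."""
    joined = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                joined.append({**t, **u})
    return joined


student_prof = [
    {"sNumber": "s1", "sName": "Dave", "pNumber": "p1", "pName": "MM"},
    {"sNumber": "s2", "sName": "Greg", "pNumber": "p2", "pName": "MM"},
]
professor = project(student_prof, ["pNumber", "pName"])

# Lossy split: Student keeps pName, so the join is on pName.
student_lossy = project(student_prof, ["sNumber", "sName", "pName"])
lossy = natural_join(student_lossy, professor)
print(len(lossy))      # 4 rows: two of them are spurious

# Lossless split: Student keeps pNumber, so the join is on pNumber.
student_ok = project(student_prof, ["sNumber", "sName", "pNumber"])
lossless = natural_join(student_ok, professor)
print(len(lossless))   # 2 rows: exactly the original relation
```

The only difference between the two splits is the shared attribute: pNumber determines pName, while pName determines nothing.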


Normalization

What is the algorithm for correct (lossless) decomposition ?


Normalization Step: Decompose

Consider relation R with set of attributes AR.

Consider an FD A -> B (such that no other attribute in (AR – A – B) is functionally determined by A).

If A is not a superkey for R, we may decompose R as:
  • Create R' (AR – B)
  • Create R'' with attributes A ∪ B
  • Key for R'' = A
  • Foreign key: R' (A) references R'' (A)
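A single decomposition step can be sketched as follows. The function name `decompose_step` and the set-based FD representation are assumptions for this example; B is taken to be everything that A determines, as the side condition on the FD requires.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def decompose_step(attrs, lhs, fds):
    """Split relation `attrs` on the FD lhs -> B, where B is all that lhs determines.

    Returns (R', R'') or None when lhs is a superkey or determines nothing new.
    """
    attrs, lhs = set(attrs), set(lhs)
    determined = closure(lhs, fds) & attrs
    if determined == attrs:
        return None                 # lhs is a superkey: nothing to do
    b = determined - lhs            # B: attributes determined by A
    if not b:
        return None                 # only trivial FDs on lhs
    r1 = attrs - b                  # R'  = AR - B (still contains A)
    r2 = lhs | b                    # R'' = A ∪ B, with key A
    return r1, r2


attrs = {"sNumber", "sName", "pNumber", "pName"}
fds = [({"pNumber"}, {"pName"})]

student, professor = decompose_step(attrs, {"pNumber"}, fds)
print(sorted(student))     # ['pNumber', 'sName', 'sNumber']
print(sorted(professor))   # ['pName', 'pNumber']
```

R' keeps the attributes of A, which is what makes the foreign key from R' (A) to R'' (A) possible.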


Example Decomposition Revisited

StudentProf

sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       MM

FDs: pNumber -> pName

Student

sNumber  sName  pNumber
s1       Dave   p1
s2       Greg   p2

Professor

pNumber  pName
p1       MM
p2       MM

FOREIGN KEY: Student (pNumber) references Professor (pNumber)


Schema Refinement : Normal Forms

Question: How do we decide whether any refinement of the schema is needed?

If a relation is in a certain normal form, then it is known that certain kinds of problems are avoided or minimized.


Normal Form: BCNF

Boyce-Codd Normal Form (BCNF): For every non-trivial FD X -> B in R, X is a superkey of R.


BCNF Example

Relation: SCI (student, course, instructor)

FDs:
  • student, course -> instructor
  • instructor -> course

Decomposition:
  • SI (student, instructor)
  • Instructor (instructor, course)
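The BCNF test can be checked mechanically on this example. The sketch below is a brute-force illustration (exponential in the number of attributes, so only suitable for small relations); `bcnf_violations` is a hypothetical helper name. It tries every attribute subset X of the relation and reports X when it determines something new inside the relation without being a superkey.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(rel, fds):
    """Return (X, B) pairs where X -> B is non-trivial on `rel` and X is not a superkey."""
    rel = set(rel)
    bad = []
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            determined = closure(x, fds) & rel
            # X is fine if it determines nothing new, or if it determines everything.
            if determined != x and determined != rel:
                bad.append((x, determined - x))
    return bad


fds = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]

print(bcnf_violations({"student", "course", "instructor"}, fds))
# one violation: instructor -> course
print(bcnf_violations({"student", "instructor"}, fds))   # []
print(bcnf_violations({"instructor", "course"}, fds))    # []
```

Because closures are computed against the full FD set and then intersected with the relation, the same function works for the decomposed relations without projecting the FDs by hand.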


Decomposition Algorithm into BCNF

Repeated application of the decomposition step:
  • results in relations that are in BCNF;
  • gives a lossless-join decomposition;
  • is guaranteed to terminate.
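Repeating the decomposition step until no violation remains can be sketched recursively. This is a brute-force illustration under the same assumptions as before (FDs as pairs of sets, hypothetical function names); the order in which violations are picked is arbitrary, so other valid BCNF decompositions may exist.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_decompose(rel, fds):
    """Split `rel` on the first BCNF violation found, then recurse on both parts."""
    rel = frozenset(rel)
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = frozenset(combo)
            determined = frozenset(closure(x, fds)) & rel
            if determined != x and determined != rel:
                r2 = determined                  # R'' = X plus all it determines
                r1 = rel - (determined - x)      # R'  = rel minus the determined part
                # Both parts are strictly smaller than rel, so recursion terminates.
                return bcnf_decompose(r2, fds) + bcnf_decompose(r1, fds)
    return [rel]                                 # no violation: rel is in BCNF


# SCI example: student, course -> instructor; instructor -> course
sci_fds = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
sci_parts = bcnf_decompose({"student", "course", "instructor"}, sci_fds)
for part in sci_parts:
    print(sorted(part))
# ['course', 'instructor']
# ['instructor', 'student']
```

Each split shares X between the two parts, and X is a key of one of them, which is why every step is lossless.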


Decomposition : Dependency Preserving ?

Intuition: Can we check the functional dependencies locally in each decomposed relation?

Consider relation CSJDPQV, where C is the key, JP -> C and SD -> P. Decomposition: CSJDQV and SDP.

Is it lossless? Yes! Is it in BCNF? Yes! Problem: Checking JP -> C requires a join!


Dependency Preserving Decomposition

Intuition: If R is decomposed into X, Y and Z, and we enforce the FDs that hold on X, on Y and on Z, then all FDs that were given to hold on R must also hold.

Formal definition: Decomposition of R into X and Y is dependency preserving if (FX union FY)+ = F+.

Projection of a set of FDs F: If R is decomposed into X, Y, ..., then the projection of F onto X (denoted FX) is the set of FDs U -> V in F+ (closure of F) such that U, V are in X.
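The projection FX can be computed by brute force using attribute closures. The sketch below is illustrative only: `project_fds` is a hypothetical name, and it returns one non-trivial FD per left side (with the largest possible right side) rather than every FD in FX, which is enough to generate FX.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def project_fds(rel, fds):
    """Non-trivial FDs U -> V in F+ with U and V inside `rel`."""
    rel = set(rel)
    projected = []
    for size in range(1, len(rel) + 1):
        for combo in combinations(sorted(rel), size):
            u = set(combo)
            v = (closure(u, fds) & rel) - u   # what U determines inside rel
            if v:
                projected.append((u, v))
    return projected


# CSJDPQV with C -> SJDPQV (C is the key), JP -> C, SD -> P
fds = [({"C"}, set("SJDPQV")), ({"J", "P"}, {"C"}), ({"S", "D"}, {"P"})]

for u, v in project_fds("SDP", fds):
    print("".join(sorted(u)), "->", "".join(sorted(v)))
# DS -> P
```

On SDP only SD -> P survives; JP -> C cannot appear in the projection onto either SDP or CSJDQV because neither relation contains J, P and C together.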


Dependency Preserving Decompositions

Decomposition of R into X and Y is dependency preserving if (FX union FY)+ = F+.

It is important to consider F+, not F, in this definition: take ABC with A -> B, B -> C, C -> A, decomposed into AB and BC. Is this dependency preserving? Is C -> A preserved?


BCNF and Dependency Preservation

In general, a dependency-preserving decomposition into BCNF may not exist!

Example: CSZ, with CS -> Z and Z -> C.

Not in BCNF. We can't decompose while preserving the first FD (CS -> Z).


Dependency Preservation

A decomposition into BCNF does not necessarily preserve FDs. But a decomposition into 3NF can always be chosen so that it preserves FDs.


Normal Form : 3NF

Third Normal Form (3NF): For every non-trivial FD X -> B in R, either X is a superkey of R, or B is a prime attribute (B is part of a key).
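The 3NF condition can be tested by first enumerating candidate keys to find the prime attributes. The sketch below is a brute-force illustration for small relations; `candidate_keys` and `is_3nf` are hypothetical names, and the test inspects the given FDs one right-hand attribute at a time.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(rel, fds):
    """All minimal attribute sets whose closure covers `rel`."""
    rel = set(rel)
    keys = []
    for size in range(1, len(rel) + 1):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            if any(k <= x for k in keys):
                continue                  # contains a smaller key: not minimal
            if closure(x, fds) >= rel:
                keys.append(x)
    return keys


def is_3nf(rel, fds):
    """Check each given FD X -> B: X is a superkey, or B is a prime attribute."""
    rel = set(rel)
    prime = set().union(*candidate_keys(rel, fds))
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= rel
        for b in rhs - lhs:               # skip trivial parts of the FD
            if not is_superkey and b not in prime:
                return False
    return True


# CSZ with CS -> Z and Z -> C: keys are CS and SZ, so every attribute is prime.
csz_fds = [({"C", "S"}, {"Z"}), ({"Z"}, {"C"})]
print(is_3nf({"C", "S", "Z"}, csz_fds))   # True (although Z -> C violates BCNF)

# StudentProf with pNumber -> pName: pName is not part of any key.
sp_fds = [({"pNumber"}, {"pName"})]
print(is_3nf({"sNumber", "sName", "pNumber", "pName"}, sp_fds))   # False
```

CSZ shows the gap between the two normal forms: Z is not a superkey, but C is prime, so the relation passes the 3NF test while failing BCNF.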


3NF vs BCNF?

If R is in BCNF, then R is also in 3NF. If R is in 3NF, R may not be in BCNF.

If R is in 3NF, some redundancy is possible. 3NF is a compromise used when BCNF is not achievable, i.e., when no "good" decomposition exists.

Important: A lossless-join, dependency-preserving decomposition of R into a collection of 3NF relations is always possible!


Algorithm : Decomposition into 3NF

The decomposition algorithm is used again (but it can typically stop earlier).

But how do we ensure dependency preservation?

Idea 1: If X -> Y is not preserved, add relation XY. The problem is that XY may violate 3NF!

Idea 2: Instead of the given set of FDs F, use a minimal cover for F.


Minimal Cover for a Set of FDs

Minimal cover G for a set of FDs F:
  • Closure of F = closure of G.
  • The right-hand side of each FD in G is a single attribute.
  • If we modify G by deleting an FD or by deleting attributes from an FD in G, the closure changes.

Intuition: every FD in G is needed, and is "as small as possible" in order to get the same closure as F.

Example: If both J -> C and JP -> C hold, then only keep the first one.


Minimal Cover for a Set of FDs

Theorem: Using a minimal cover of F in the decomposition guarantees that the decomposition is lossless-join and dependency-preserving.

Example: Given

  A -> B, ABCD -> E, EF -> GH, ACDF -> EG

the minimal cover is:

  A -> B, ACD -> E, EF -> G and EF -> H


Algorithm for Minimal Cover

Step 1: Decompose each FD so that it has one attribute on the RHS.

Step 2: Minimize the left side of each FD. Check each attribute on the LHS to see if it can be deleted while still preserving the equivalence to F+.

Step 3: Delete redundant FDs.

Note: Several minimal covers may exist.
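The three steps can be sketched in Python and tried on the example A -> B, ABCD -> E, EF -> GH, ACDF -> EG. The function name `minimal_cover` and the frozenset representation are assumptions for this illustration; since several minimal covers may exist, a different processing order could return a different (equally valid) cover.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: one attribute on each right-hand side.
    g = [(frozenset(lhs), frozenset({a})) for lhs, rhs in fds for a in sorted(rhs)]

    # Step 2: drop left-side attributes that are not needed.
    reduced = []
    for lhs, rhs in g:
        lhs = set(lhs)
        for attr in sorted(lhs):
            smaller = lhs - {attr}
            # Safe to drop `attr` if the smaller left side still determines rhs.
            if smaller and rhs <= closure(smaller, g):
                lhs = smaller
        reduced.append((frozenset(lhs), rhs))
    result = list(dict.fromkeys(reduced))      # remove exact duplicates, keep order

    # Step 3: delete FDs that follow from the remaining ones.
    for fd in list(result):
        others = [f for f in result if f != fd]
        if fd[1] <= closure(fd[0], others):
            result = others
    return result


fds = [
    ({"A"}, {"B"}),
    ({"A", "B", "C", "D"}, {"E"}),
    ({"E", "F"}, {"G", "H"}),
    ({"A", "C", "D", "F"}, {"E", "G"}),
]

cover = minimal_cover(fds)
for lhs, rhs in cover:
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# A -> B
# ACD -> E
# EF -> G
# EF -> H
```

In this run ABCD -> E and ACDF -> E both shrink to ACD -> E (B follows from A, and F is not needed), and ACDF -> G is dropped because ACD -> E and EF -> G already imply it.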


3NF Decomposition Algorithm

  • Compute a minimal cover G of F.
  • Decompose R, using the minimal cover G, into a lossless decomposition of R.
      • Each Ri is in 3NF.
      • Fi is the projection of F onto Ri.
  • Identify the dependencies in F that are not preserved now, say X -> A.
  • Create relation XA:
      • The new relation XA preserves X -> A.
      • X is a key of XA, because G is a minimal cover. Hence no Y that is a proper subset of X exists with Y -> A.
      • If another dependency exists in XA, it can only imply an attribute of X.
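One possible reading of these steps, put together as a sketch: compute a minimal cover, split losslessly (here all the way to BCNF, which implies 3NF), then add a relation XA for every FD X -> A of the cover that is not preserved. All function names are hypothetical, the search is brute force, and the lossless split may go further than a 3NF-only algorithm needs to.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    """Single-attribute right sides, minimal left sides, no redundant FDs."""
    g = [(frozenset(l), frozenset({a})) for l, r in fds for a in sorted(r)]
    reduced = []
    for lhs, rhs in g:
        lhs = set(lhs)
        for attr in sorted(lhs):
            if lhs - {attr} and rhs <= closure(lhs - {attr}, g):
                lhs = lhs - {attr}
        reduced.append((frozenset(lhs), rhs))
    result = list(dict.fromkeys(reduced))
    for fd in list(result):
        others = [f for f in result if f != fd]
        if fd[1] <= closure(fd[0], others):
            result = others
    return result


def lossless_split(rel, fds):
    """Repeatedly split on a violating X (X not a superkey, X determines more)."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = frozenset(combo)
            det = frozenset(closure(x, fds)) & rel
            if det != x and det != rel:
                return lossless_split(det, fds) + lossless_split(rel - (det - x), fds)
    return [rel]


def is_preserved(fd, parts, fds):
    """True if `fd` can be derived using only FDs local to each part."""
    z, changed = set(fd[0]), True
    while changed:
        changed = False
        for rel in parts:
            gained = closure(z & rel, fds) & rel
            if not gained <= z:
                z |= gained
                changed = True
    return fd[1] <= z


def decompose_3nf(rel, fds):
    g = minimal_cover(fds)
    parts = lossless_split(frozenset(rel), g)
    for lhs, rhs in g:
        if not is_preserved((lhs, rhs), parts, g):
            parts.append(lhs | rhs)       # add relation XA for the lost FD X -> A
    return parts


# CSJDPQV with C -> SJDPQV (C is the key), JP -> C, SD -> P
fds = [({"C"}, set("SJDPQV")), ({"J", "P"}, {"C"}), ({"S", "D"}, {"P"})]
parts = decompose_3nf(set("CSJDPQV"), fds)
for part in parts:
    print("".join(sorted(part)))
# DPS
# CDJQSV
# CJP
```

On this example the lossless split gives SDP and CSJDQV, and the lost dependency JP -> C is restored by adding the relation CJP.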


Summary

Step 1: BCNF is a good form for a relation.

If a relation is in BCNF, it is free of redundancies that can be detected using FDs.

Step 2 : If a relation is not in BCNF, we can try to decompose it into a collection of BCNF relations.

Step 3: If a lossless-join dependency-preserving decomposition into BCNF is not possible (or unsuitable given typical queries), consider decomposition into 3NF.

Note: Decompositions should be carried out while keeping performance requirements in mind.