Normalization 02 - York University Fall 2009/LectureNotes/Normalizat… · 4 Closure of a set of...

Normalization 02

CSE3421 notes

Closure of a set of attributes

Given a set of FDs F, find the set of attributes functionally determined by a given set of attributes X. This is called the closure of X, and denoted as X+.

F, X → ? (or, X+ = ?)

Example: F = { A → B, B → C }

Closure of A: A+ = ABC
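The definition above can be turned into a small fixed-point computation. The following is a minimal sketch, not code from the notes: the function name `closure` and the encoding of an FD as a `(lhs, rhs)` pair of attribute strings are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return X+ for the attribute set `attrs` under the FDs in `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is an iterable of
    single-character attribute names (an illustrative encoding only).
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # An FD fires when its whole LHS is already in the result
            # and it still has something new to contribute.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# The slide's example: F = { A -> B, B -> C }
F = [("A", "B"), ("B", "C")]
a_plus = closure("A", F)
```

The loop stops as soon as a full pass over F adds no attribute, so it terminates after at most one pass per attribute of the schema.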

Example

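F = { C → A, BC → D, ACD → B, D → EG, AB → C }

AB → ?

AB → ABC → ABC(A)D → ABCD(B)EG

• C comes from AB → C.

• (A), from C → A, is redundant: A is already present.

• D comes from BC → D.

• (B), from ACD → B, is redundant: B is already present.

• E and G come from D → EG.

Therefore, (AB)+ = ABCDEG.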
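To see which FD contributes which attribute in the example above, the closure loop can record each step. This is a sketch under assumptions: `closure_with_trace` and the `(lhs, rhs)` string encoding are hypothetical names chosen here, and the FDs are scanned in the order they are listed in F.

```python
def closure_with_trace(attrs, fds):
    """Compute X+ and record (lhs, rhs, newly_added) for every FD that adds attributes."""
    result = set(attrs)
    steps = []
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result:
                new = set(rhs) - result
                if new:  # FDs that add nothing are the "redundant" ones
                    result |= new
                    steps.append((lhs, rhs, "".join(sorted(new))))
                    changed = True
    return result, steps


# F from the example, in the order listed on the slide.
F = [("C", "A"), ("BC", "D"), ("ACD", "B"), ("D", "EG"), ("AB", "C")]
ab_plus, steps = closure_with_trace("AB", F)
```

FDs whose right-hand side is already covered never appear in `steps`; those are exactly the contributions marked as redundant in the derivation.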

Closure of a set of FDs

• Given a set of FDs F, find all FDs that can be produced by F. This is called the closure of F, and denoted as F+.

• F+ can be found by applying the Armstrong Axioms.

Armstrong Axioms

• (A1) Reflexivity (this produces all trivial FDs)

X ⊇ Y ⇒ X → Y

• (A2) Augmentation

X → Y ⇒ XZ → YZ, ∀ Z

• (A3) Transitivity

X → Y, Y → Z ⇒ X → Z

Additional rules

• Union

X → Y, X → Z ⇒ X → YZ

• Decomposition

X → YZ ⇒ X → Y and X → Z

By applying (A1), (A2), (A3) repeatedly, we get F+.
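The two additional rules add no new power: each follows from (A1) to (A3). A short derivation, written here as a sketch (the step layout is ours, not from the slides):

```latex
\begin{align*}
\textbf{Union:}\quad
& X \to Y && \text{(given)} \\
& X \to XY && \text{(A2), augment with } X \\
& X \to Z && \text{(given)} \\
& XY \to YZ && \text{(A2), augment with } Y \\
& X \to YZ && \text{(A3)} \\[6pt]
\textbf{Decomposition:}\quad
& X \to YZ && \text{(given)} \\
& YZ \to Y,\; YZ \to Z && \text{(A1)} \\
& X \to Y,\; X \to Z && \text{(A3)}
\end{align*}
```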

Example

• Let R: (A, B, C) and F = { A → B, B → C }. F+ = ?

• Apply Armstrong Axioms

• (A1) (reflexivity) produces all trivial FDs, i.e., those whose left-hand side (LHS) is a superset of the right-hand side (RHS), for example AB → A, AB → B, etc.

• Non-trivial FDs:

– Apply (A3), transitivity:

• A → B with B → C generate A → C.

• (A3) cannot be further applied.

– Apply (A2), augmentation, in all possible permutations.

• For A → B: remaining attribute is C. Therefore, AC → BC is in F+.

• For B → C: add A and get BA → CA. Therefore, AB → AC is in F+.

• For A → C: add B and get AB → BC. Therefore, AB → BC is in F+.

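So, F+ = { A → B, B → C, all trivial FDs, A → C, AC → BC, AB → AC, AB → BC }

• A → C comes from (A3).

• AC → BC, AB → AC, and AB → BC come from (A2).

Strictly, F+ also contains every FD that follows from these by union and decomposition, for example A → BC and AB → C.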
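Applying the axioms by hand gets tedious even for three attributes. One way to enumerate F+ mechanically is to use the fact that X → Y is in F+ exactly when Y ⊆ X+. The sketch below takes that route rather than applying the axioms one by one; the helper names and the `(lhs, rhs)` encoding are assumptions for illustration, and the enumeration is exponential in the number of attributes, so it only suits toy schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def nonempty_subsets(attrs):
    """Yield every non-empty subset of attrs as a frozenset."""
    items = sorted(attrs)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def nontrivial_fd_closure(schema, fds):
    """All X -> Y in F+ with Y not a subset of X (the non-trivial FDs)."""
    result = set()
    for lhs in nonempty_subsets(schema):
        reachable = closure(lhs, fds)
        for rhs in nonempty_subsets(reachable):
            if not rhs <= lhs:
                result.add((lhs, rhs))
    return result


# R: (A, B, C), F = { A -> B, B -> C }
f_plus = nontrivial_fd_closure("ABC", [("A", "B"), ("B", "C")])
```

The result contains the FDs derived by hand (A → C, AC → BC, AB → AC, AB → BC) as well as others such as A → BC.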

Where are we …

X+: closure of set of attributes

F+: closure of set of FDs

MC: minimal cover of F

Lossless join property

Algorithm to compute 3NF

Preservation of dependencies

Lossless join

• If a relation R is decomposed into R1, R2, …, and the natural join R1 join R2 join R3 … is exactly equal to R, then the decomposition is said to have the lossless join property.

Example

R

A   B   C
a1  b1  c1
a2  b2  c2

R1:(A,B)

A   B
a1  b1
a2  b2

R2:(B,C)

B   C
b1  c1
b2  c2

R1 join R2

A   B   C
a1  b1  c1
a2  b2  c2

R1 join R2 is exactly equal to R, so the decomposition has the lossless join property.

Example

R

A   B   C
a1  b1  c1
a2  b1  c2

R1:(A,B)

A   B
a1  b1
a2  b1

R2:(B,C)

B   C
b1  c1
b1  c2

R1 join R2

A   B   C
a1  b1  c1
a1  b1  c2
a2  b1  c1
a2  b1  c2

R1 join R2 is not the same as R, so the decomposition does not have the lossless join property (it has a lossy join).
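Both examples can be replayed in a few lines: project R onto the two schemas, join the projections, and compare with R. This is only a sketch; relations are encoded here as lists of dicts, and all function names are hypothetical. It checks a single instance of R, whereas the lossless join property is a statement about every legal instance.

```python
def project(relation, attrs):
    """Project a relation (list of dicts) onto attrs; duplicate rows collapse."""
    distinct = {tuple((a, row[a]) for a in attrs) for row in relation}
    return [dict(t) for t in distinct]


def natural_join(r1, r2):
    """Natural join: combine rows that agree on all shared attributes."""
    joined = []
    for t1 in r1:
        for t2 in r2:
            common = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in common):
                joined.append({**t1, **t2})
    return joined


def as_set(relation):
    """Order-independent view of a relation, for equality checks."""
    return {frozenset(row.items()) for row in relation}


def survives_decomposition(relation, attrs1, attrs2):
    """True if projecting onto attrs1 and attrs2 and re-joining gives back the relation."""
    rejoined = natural_join(project(relation, attrs1), project(relation, attrs2))
    return as_set(rejoined) == as_set(relation)


# The two instances of R from the examples above.
lossless_example = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a2", "B": "b2", "C": "c2"},
]
lossy_example = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a2", "B": "b1", "C": "c2"},
]
```

In the second instance both rows share b1, so the join pairs every A value with every C value and produces spurious tuples.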

Proposition

A decomposition ρ = (R1, R2) has a lossless join with respect to F if and only if either

[ R1 ∩ R2 → R1 − R2 ] ∈ F+

or

[ R1 ∩ R2 → R2 − R1 ] ∈ F+

Example

Assume R1 = {A, B}, R2 = {B, C} is a decomposition of R. Then,

R1 ∩ R2 = {B}

R1 − R2 = {A}

R2 − R1 = {C}

For the decomposition to be lossless, either B → A or B → C must be in F+. To check whether this is the case, we need F first (which we do not have in this example) … stay tuned.
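Once F is known, the proposition reduces to one attribute closure: compute (R1 ∩ R2)+ and check whether it covers R1 − R2 or R2 − R1. A minimal sketch follows; the function names are hypothetical, and the two sets of FDs used at the end are assumptions picked for illustration, since the example above deliberately leaves F open.

```python
def closure(attrs, fds):
    """Attribute closure X+ under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Test the two-way lossless join condition for schemas r1 and r2."""
    r1, r2 = set(r1), set(r2)
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus


# Assumed F (not given in the example): B -> C holds, so the test passes.
with_b_to_c = is_lossless_binary("AB", "BC", [("A", "B"), ("B", "C")])
# Assumed F where B determines neither A nor C: the test fails.
without = is_lossless_binary("AB", "BC", [("A", "B"), ("C", "B")])
```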

Proposition

If X → Y is in F of R and X ∩ Y = ∅, then the decomposition of R into R1: R − Y; R2: XY is lossless.

Proof

R1 ∩ R2 = (R − Y) ∩ XY

= (XYW − Y) ∩ XY, for some W

= XW ∩ XY

= X

⇒ R1 ∩ R2 = X

Now need to show that

X → R1 (i.e., X → R − Y), or

X → R2 (i.e., X → XY)

Proof …/

Since

X → Y (assumption), and

X → X (trivial),

we have, by union, that X → XY, which is R2.

Therefore, X → R2. (done)

Preservation of dependencies

Definition: A decomposition ρ = (R1, R2, …, RN) preserves a set of FDs F if

F+ = ( π_R1(F+) ∪ π_R2(F+) ∪ … ∪ π_RN(F+) )+

where π_Ri(F+) is the set of all dependencies from F+ that are comprised of attributes in Ri.

Note: for simplicity, we may denote π_Ri(F+) as Fi+.

Example

R:(A, B, C)

F = { A → C, B → C, A → B }

Assume R1 = (A, B) and R2 = (B, C).

Does the decomposition of R into R1, R2 preserve dependencies?

Example …/

Check F1 ∪ F2 first (and if that does not succeed, then we have to try F1+ ∪ F2+).

F1 ∪ F2 = { A → B, B → C }

We need only show that the remaining dependency of F, A → C, can be generated by F1 ∪ F2.

Since A → B and B → C, we have that A → C. (done)

Therefore the decomposition preserves dependencies.

Example 2

R:(A, B, C)

F = { A → B, B → C, C → A }

ρ = (R1, R2), where R1 = (A, B), R2 = (B, C)

Therefore, F1 = { A → B } and F2 = { B → C }

Still need C → A. (Note: A → B and B → C imply A → C but not C → A.)

Example 2 …

Have to find F1+ and F2+.

F1+: Have to calculate F+ and then project on the appropriate attributes.

To calculate F+, apply Armstrong axioms on F.

Example 2 …

F:

A → B (1)

B → C (2)

C → A (3)

(1), (2) ⇒ A → C, in F+.

(2), (3) ⇒ B → A, in F+ (and also in F1+)

(3), (1) ⇒ C → B, in F+ (and also in F2+)

Example 2 …/

So far …

F1+: A → B (i), B → A (ii)

F2+: B → C (iii), C → B (iv)

Still need C → A. Notice that (iv), (ii) ⇒ C → A, i.e., C → A is in (F1+ ∪ F2+)+ ⇒ done. Therefore, the decomposition preserves dependencies.
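Computing F+ and projecting it, as done by hand above, grows exponentially with the number of attributes. A common alternative, sketched below, tests each FD X → Y of F directly: start from X and repeatedly add whatever each Ri can contribute, namely (Z ∩ Ri)+ ∩ Ri, until nothing changes; the FD is preserved if Y ends up covered. This is a swapped-in technique rather than the procedure on the slides, and the function names and the third test case are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure X+ under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(schemas, fds):
    """True if every FD in fds follows from the FDs projected onto the schemas."""
    schemas = [set(s) for s in schemas]
    for lhs, rhs in fds:
        reachable = set(lhs)
        changed = True
        while changed:
            changed = False
            for schema in schemas:
                # What this one schema can derive from what we have so far.
                gained = closure(reachable & schema, fds) & schema
                if not gained <= reachable:
                    reachable |= gained
                    changed = True
        if not set(rhs) <= reachable:
            return False
    return True


example_1 = preserves_dependencies(["AB", "BC"], [("A", "C"), ("B", "C"), ("A", "B")])
example_2 = preserves_dependencies(["AB", "BC"], [("A", "B"), ("B", "C"), ("C", "A")])
# Hypothetical counter-example (not from the notes): AB -> C spans both schemas.
counter = preserves_dependencies(["AC", "BC"], [("AB", "C"), ("C", "B")])
```

For Example 2 the loop finds C → A in two rounds: R2 contributes B from C, then R1 contributes A from B, mirroring steps (iv) and (ii).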

Where are we …

X+: closure of set of attributes

F+: closure of set of FDs

MC: minimal cover of F

Lossless join property

Algorithm to compute 3NF

Preservation of dependencies

Minimal cover for a set of FDs

• A minimal cover G for a set of FDs F, is an equivalent set of FDs that is minimal in the following sense:

1. Every dependency in G is as small as possible (i.e., each LHS of each FD has as few attributes as possible, and the RHS has only one attribute).

2. Every FD in G is required for the closure G+ to be equal to the closure F+.

How to calculate G?

• Given F, how do we obtain the minimal cover G?

1. Put F in standard form (i.e., all FDs in F have RHS with one attribute).

2. Eliminate extraneous attributes from LHS.

3. Eliminate redundant FDs.

Important !!!

1. The above steps should be performed in order (1)-(2)-(3), or else the result may not be a minimal cover.

2. The order in which we process the FDs may result in different minimal covers (… which is ok). i.e., the minimal cover is not unique for a set F.
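The three steps, applied in the required order, can be sketched as follows. This is an illustrative implementation under assumptions, not the course's reference code: FDs are `(lhs, rhs)` pairs of attribute strings, all function names are hypothetical, and attributes and FDs are processed in a fixed order, so a different order may yield a different (equally valid) minimal cover.

```python
def closure(attrs, fds):
    """Attribute closure X+ under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def standard_form(fds):
    """Step 1: split every FD so that each RHS is a single attribute."""
    return [(frozenset(lhs), frozenset(a)) for lhs, rhs in fds for a in sorted(rhs)]


def minimize_lhs(fds):
    """Step 2: drop LHS attributes whose removal still lets the RHS be derived."""
    fds = list(fds)
    for i in range(len(fds)):
        lhs, rhs = fds[i]
        for attr in sorted(lhs):
            reduced = lhs - {attr}
            if reduced and rhs <= closure(reduced, fds):
                lhs = reduced
                fds[i] = (lhs, rhs)
    return fds


def drop_redundant(fds):
    """Step 3: remove duplicates, then FDs implied by the remaining ones."""
    result = []
    for fd in fds:
        if fd not in result:
            result.append(fd)
    for fd in list(result):
        rest = [g for g in result if g != fd]
        if fd[1] <= closure(fd[0], rest):
            result = rest
    return result


def minimal_cover(fds):
    """Apply steps 1, 2, 3 in order."""
    return drop_redundant(minimize_lhs(standard_form(fds)))


# F = { A -> C, B -> C, A -> B } from the dependency preservation example.
cover = minimal_cover([("A", "C"), ("B", "C"), ("A", "B")])
```

For this F, A → C is implied by A → B and B → C, so step 3 removes it and the cover is { B → C, A → B }.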

Example

F:

A → B (1)

ABCD → E (2)

EF → G (3)

EF → H (4)

ACDF → EG (5)

Calculate the minimal cover of F.

Step 1: Put F in standard form

• FDs (1)…(4) are already in standard form.

• For FD (5):

– ACDF → E (5.1)

– ACDF → G (5.2)

Step 2: eliminate extraneous attributes from LHS (minimize LHSs)

(1) A → B: nothing to eliminate.

(2) ABCD → E.

• If we delete A, we will have BCD → E.

• Is this LHS good enough?

• It is, if either

– BCD → E, or

– BCD → W, such that W contains ABCD (the original LHS).

Note, (BCD)+ = BCD. Therefore, cannot delete A.

Can we delete B? If so, then we will have ACD → E.

Is this LHS good enough? It is, if either

• ACD → E, or

• ACD → W that contains ABCD.

Note, ACD → ACD → ABCD. Therefore, ACD → W = ABCD, and thus B can be eliminated!

So, ABCD → E becomes ACD → E.

Can I delete any more? i.e., delete C or D?

If we delete C, then ACD → E becomes AD → E. Test:

AD → AD → ABD, which does not contain E or ACD. Therefore, cannot delete C.

Similarly, cannot delete D.

Since we finished scanning the entire LHS of this FD, step 2 is finished for this FD, and the resulting FD is

ACD → E (2.1), which replaces (2) of original F.
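The "is this LHS good enough?" test can be written as a single closure check. If the closure of the reduced LHS contains the original LHS, the FD itself then fires and adds the RHS, so the slide's two conditions collapse into "RHS ⊆ (reduced LHS)+". The sketch below replays the checks for FD (2); `is_extraneous` and the list names are hypothetical, and the FD list is F after step 1.

```python
def closure(attrs, fds):
    """Attribute closure X+ under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_extraneous(attr, lhs, rhs, fds):
    """True if attr can be dropped from lhs in the FD lhs -> rhs."""
    reduced = set(lhs) - {attr}
    return set(rhs) <= closure(reduced, fds)


# F after step 1 (standard form).
F_std = [("A", "B"), ("ABCD", "E"), ("EF", "G"), ("EF", "H"),
         ("ACDF", "E"), ("ACDF", "G")]
drop_a = is_extraneous("A", "ABCD", "E", F_std)  # (BCD)+ = BCD
drop_b = is_extraneous("B", "ABCD", "E", F_std)  # (ACD)+ reaches ABCD, then E

# After (2) has been replaced by ACD -> E, test C and D.
F_after = [("A", "B"), ("ACD", "E"), ("EF", "G"), ("EF", "H"),
           ("ACDF", "E"), ("ACDF", "G")]
drop_c = is_extraneous("C", "ACD", "E", F_after)
drop_d = is_extraneous("D", "ACD", "E", F_after)
```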

Repeat the above process for FDs (3), (4), (5.1), (5.2).

For (3): EF → G.

1. Can I delete E? If so, we will have F → G. Not possible (check).

2. Can I delete F? If so, we will have E → G. Not possible (check).

Therefore, there is no change in (3).

For (4): EF → H. Again, cannot delete anything from LHS.

For (5.1) [ ACDF → E ].

Delete A?

• CDF → E? Or,

• CDF → W that contains ACDF?

(CDF)+ = CDF, which contains neither E nor ACDF.

⇒ Cannot delete A.

For (5.1) [ ACDF → E ] …

Delete C?

• ADF → E, or

• ADF → W that contains ACDF.

ADF → ADF → ABDF (from (1)), which does not contain E or ACDF.

⇒ Cannot delete C.

For (5.1) [ ACDF → E ] …

Delete D?

• ACF → E, or

• ACF → W that contains ACDF.

ACF → ACF → ABCF (from (1)), which does not contain E or ACDF.

⇒ Cannot delete D.

For (5.1) [ ACDF → E ] …

Delete F?

• ACD → E, or

• ACD → W that contains ACDF.

ACD → ACD → ABCD (from (1)) → ABCDE (from (2)), which contains E !!

Therefore, F can be deleted from the LHS of (5.1), and (5.1) becomes

ACD → E (5.1.1)

Finished step 2 of (5.1)!! On to (5.2) …

ACDF → G (5.2)

• Repeat the above process and find that nothing can be eliminated from the LHS of (5.2).

• So step 2 of the minimal cover computation is finished (we minimized all LHSs of all FDs).

The resulting FDs from step 2 are listed below, each with its old label and its new label:

• A → B     (1)      becomes (1)

• ACD → E   (2.1)    becomes (2)

• EF → G    (3)      becomes (3)

• EF → H    (4)      becomes (4)

• ACD → E   (5.1.1)  same as (2) !!

• ACDF → G  (5.2)    becomes (5)

End of Normalization 02