BCNF vs 3NF


Page 1: BCNF vs 3NF

Relational Database Design

BCNF, 3NF

Learning Objectives

Understand the rationale (anomalies) and definition of the main normal forms based on functional dependencies (2NF, 3NF and BCNF)

Be able to decompose a schema into BCNF, or synthesize a dependency-preserving 3NF schema, in both cases with a lossless join.

Anomalies: Example

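T1 (key: first_name, last_name)

first_name | last_name | address | department | position | salary
Dewi | Srijaya | 12a Jln Lempeng | Toys | clerk | 2000
Izabel | Leong | 10 Outram Park | Sports | trainee | 1200
John | Smith | 107 Clementi Rd | Toys | clerk | 2000
Axel | Bayer | 55 Cuscaden Rd | Sports | trainee | 1200
Winny | Lee | 10 West Coast Rd | Sports | manager | 2500
Sylvia | Tok | 22 East Coast Lane | Toys | manager | 2600
Eric | Wei | 100 Jurong drive | Toys | assistant manager | 2200
? | ? | ? | ? | security guard | 1500

Assume the position determines the salary:

position → salary

Redundant storage: the salary of a position is repeated for every employee who holds that position.

Update anomaly: changing the salary of a position requires updating every row with that position. If one row is missed, the data becomes inconsistent (the two manager rows already disagree: 2500 vs. 2600).

Potential deletion anomaly: deleting the last employee with a given position also deletes the salary of that position.

Insertion anomaly: the salary of a new position (security guard) cannot be recorded until some employee holds it, because the key attributes would be unknown.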
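An FD can be checked mechanically on a table instance: any two rows that agree on the LHS must also agree on the RHS. The Python sketch below uses a helper named `fd_holds`, which is our own illustrative name and not from the slides. It loads T1 without the security-guard row, whose key values are unknown, and shows that the instance as printed does not actually satisfy position → salary, because the two manager rows disagree.

```python
from typing import Dict, Iterable, Mapping, Sequence, Tuple


def fd_holds(rows: Iterable[Mapping[str, object]],
             lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if the FD lhs -> rhs holds in this table instance."""
    seen: Dict[Tuple, Tuple] = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Rows that agree on the LHS must also agree on the RHS.
        if seen.setdefault(x, y) != y:
            return False
    return True


COLS = ("first_name", "last_name", "address", "department", "position", "salary")
# T1 from the slide, without the security-guard row (its key values are unknown).
T1 = [dict(zip(COLS, r)) for r in [
    ("Dewi", "Srijaya", "12a Jln Lempeng", "Toys", "clerk", 2000),
    ("Izabel", "Leong", "10 Outram Park", "Sports", "trainee", 1200),
    ("John", "Smith", "107 Clementi Rd", "Toys", "clerk", 2000),
    ("Axel", "Bayer", "55 Cuscaden Rd", "Sports", "trainee", 1200),
    ("Winny", "Lee", "10 West Coast Rd", "Sports", "manager", 2500),
    ("Sylvia", "Tok", "22 East Coast Lane", "Toys", "manager", 2600),
    ("Eric", "Wei", "100 Jurong drive", "Toys", "assistant manager", 2200),
]]

print(fd_holds(T1, ["position"], ["salary"]))  # False: managers earn 2500 and 2600
```

A DBMS enforcing position → salary as a constraint would reject the second manager row. In the decomposed design that follows, the salary is stored only once per position.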

Decomposition Example

T2

first_name | last_name | address | department | position
Dewi | Srijaya | 12a Jln Lempeng | Toys | clerk
Izabel | Leong | 10 Outram Park | Sports | trainee
John | Smith | 107 Clementi Rd | Toys | clerk
Axel | Bayer | 55 Cuscaden Rd | Sports | trainee
Winny | Lee | 10 West Coast Rd | Sports | manager
Sylvia | Tok | 22 East Coast Lane | Toys | manager
Eric | Wei | 100 Jurong drive | Toys | assistant manager

T3

position | salary
clerk | 2000
trainee | 1200
manager | 2500
assistant manager | 2200
security guard | 1500

No redundant storage

No update anomaly

No deletion anomaly

No insertion anomaly

Normalization

Normalization is the process of decomposing a relation schema R into fragments (i.e., smaller tables) R1, R2, ..., Rn. Our goals are:

Lossless decomposition: the fragments should contain the same information as the original table. Otherwise, the decomposition results in information loss.

Dependency preservation: dependencies should be preserved within the fragments Ri, i.e., it should be possible to check them on the Ri separately. Otherwise, checking updates for violations of functional dependencies may require computing joins, which is expensive.

Good form: the fragments Ri should not involve redundancy. Roughly speaking, a table has redundancy if there is an FD whose LHS is not a key (more on this later).


A relation is in a particular normal form if it satisfies certain normalization properties.

There are several normal forms defined:

1NF - First Normal Form

2NF - Second Normal Form

3NF - Third Normal Form

BCNF - Boyce-Codd Normal Form

4NF - Fourth Normal Form

5NF - Fifth Normal Form

Each of these normal forms is stricter than the previous one.

For example, 3NF is better than 2NF because it removes more redundancy/anomalies from the schema than 2NF.

Types of NF

Lossless Join Decomposition

The decomposition is lossless (a lossless join) if we can recover the initial table by performing a natural join of the fragments.

In general, a decomposition of R into R1 and R2 is lossless if and only if at least one of the following dependencies is in F+:

R1 ∩ R2 → R1

R1 ∩ R2 → R2

In other words, the common attributes of R1 and R2 must form a superkey for R1 or for R2. In our example, the decomposition is lossless because position is a key for T3.
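The lossless-join condition can be tested with attribute closure. Compute (R1 ∩ R2)+ and check whether it covers R1 or R2. The following is a minimal Python sketch, assuming FDs are given as pairs of attribute sets. The helpers `fd`, `closure` and `is_lossless_binary` are illustrative names.

```python
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from space-separated attribute names, e.g. fd("A B", "C")."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure X+: repeatedly fire FDs whose LHS is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_lossless_binary(r1: Iterable[str], r2: Iterable[str], fds: List[FD]) -> bool:
    """R1, R2 is lossless iff (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 is in F+."""
    s1, s2 = set(r1), set(r2)
    common_plus = closure(s1 & s2, fds)
    return s1 <= common_plus or s2 <= common_plus


T2 = "first_name last_name address department position".split()
T3 = "position salary".split()
print(is_lossless_binary(T2, T3, [fd("position", "salary")]))  # True
```

Testing R1 ⊆ (R1 ∩ R2)+ is equivalent to asking whether R1 ∩ R2 → R1 is in F+. This avoids enumerating F+ itself.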

Example of a Lossy Decomposition

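Decompose R = (A,B,C) into R1 = (A,B) and R2 = (B,C).

r

A | B | C
α | 1 | m
α | 2 | n
β | 1 | p

ΠA,B(r)

A | B
α | 1
α | 2
β | 1

ΠB,C(r)

B | C
1 | m
2 | n
1 | p

ΠA,B(r) ⋈ ΠB,C(r)

A | B | C
α | 1 | m
α | 2 | n
β | 1 | p
α | 1 | p
β | 1 | m

It is a lossy decomposition: the join contains two extraneous tuples, (α,1,p) and (β,1,m). You get more tuples, not fewer, but you can no longer tell which of them are genuine, so information is lost.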
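The extraneous tuples can be reproduced by projecting r and joining the projections back together. The sketch below uses a plain nested-loop natural join over Python sets, which matches relational-algebra set semantics. The helpers `project` and `natural_join` are illustrative, not library functions.

```python
from typing import Dict, Set, Tuple

Schema = Tuple[str, ...]


def project(rel: Set[tuple], schema: Schema, attrs: Schema) -> Set[tuple]:
    """Π_attrs(rel) with set semantics (duplicates disappear)."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in rel}


def natural_join(r1: Set[tuple], s1: Schema,
                 r2: Set[tuple], s2: Schema) -> Tuple[Schema, Set[tuple]]:
    """Nested-loop natural join on the attributes common to s1 and s2."""
    common = [a for a in s1 if a in s2]
    out_schema = s1 + tuple(a for a in s2 if a not in s1)
    out = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[s1.index(a)] == t2[s2.index(a)] for a in common):
                row: Dict[str, object] = {**dict(zip(s1, t1)), **dict(zip(s2, t2))}
                out.add(tuple(row[a] for a in out_schema))
    return out_schema, out


R: Schema = ("A", "B", "C")
r = {("α", 1, "m"), ("α", 2, "n"), ("β", 1, "p")}

schema, joined = natural_join(project(r, R, ("A", "B")), ("A", "B"),
                              project(r, R, ("B", "C")), ("B", "C"))
print(sorted(joined - r))  # [('α', 1, 'p'), ('β', 1, 'm')]
```

The common attribute B is not a superkey of either fragment. As a result, every A value with B = 1 is paired with every C value with B = 1.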

Dependency Preserving Decomposition

The decomposition of a relation scheme R with FDs F is a set of tables (fragments) Ri with FDs Fi.

Fi is the subset of dependencies in F+ (the closure of F) that include only attributes in Ri.

The decomposition is dependency preserving if and only if

(∪i Fi)+ = F+
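Computing F+ explicitly is exponential. A common polynomial alternative, described for example in Silberschatz, Korth and Sudarshan's textbook, checks each FD X → Y in F by growing X using only closures restricted to the fragments. If Y is reached, the FD is implied by ∪i Fi. Below is a Python sketch under the assumption of single-letter attribute names. The function names are illustrative.

```python
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Single-letter attributes for brevity: fd("AB", "C") means {A,B} -> {C}."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure X+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_dependency_preserving(fds: List[FD], fragments: List[FrozenSet[str]]) -> bool:
    """Test (∪ Fi)+ = F+ without computing F+ (restricted-closure test)."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in fragments:
                # What fragment Ri alone can derive from the attributes gathered so far.
                gained = closure(result & ri, fds) & ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not rhs <= result:
            return False  # lhs -> rhs cannot be checked without a join
    return True


F = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
print(is_dependency_preserving(F, [frozenset("AB"), frozenset("AC")]))  # False
print(is_dependency_preserving(F, [frozenset("AB"), frozenset("BC")]))  # True
```

It suffices to test the FDs in F. If each of them follows from ∪i Fi, then so does everything in F+.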

Non-Dependency Preserving Decomposition Example

R = (A, B, C), F = {{A}→{B}, {B}→{C}, {A}→{C}}. Key: A

There is a dependency {B}→{C} whose LHS is not a key. This means that there can be considerable redundancy in R.

Solution: break R into two tables R1(A,B) and R2(A,C) (normalization).

R

A | B | C
1 | 2 | 3
2 | 2 | 3
3 | 2 | 3
4 | 2 | 4

R1

A | B
1 | 2
2 | 2
3 | 2
4 | 2

R2

A | C
1 | 3
2 | 3
3 | 3
4 | 4

The decomposition is lossless because the common attribute A is a key for R1 (and R2).

The decomposition is not dependency preserving because F1 = {{A}→{B}}, F2 = {{A}→{C}} and (F1 ∪ F2)+ ≠ F+.

We lost the FD {B}→{C}.

In practical terms, each FD is implemented as a constraint or assertion, which is checked when there are updates. In the above example, to find violations we have to join R1 and R2, which can be very expensive. For instance, the last tuple (4, 2, 4) violates {B}→{C}, since B = 2 appears with both C = 3 and C = 4. Neither R1 nor R2 shows this violation on its own.

Dependency Preserving Decomposition Example

R = (A, B, C), F = {{A}→{B}, {B}→{C}, {A}→{C}}. Key: A

Break R into two tables R1(A,B) and R2(B,C).

R

A | B | C
1 | 2 | 3
2 | 2 | 3
3 | 2 | 3
4 | 2 | 4

R1

A | B
1 | 2
2 | 2
3 | 2
4 | 2

R2

B | C
2 | 3
2 | 4

The decomposition is lossless because the common attribute B is a key for R2.

The decomposition is dependency preserving because F1 = {{A}→{B}}, F2 = {{B}→{C}} and (F1 ∪ F2)+ = F+.

Violations can be found by inspecting the individual tables, without performing a join. Here, R2 alone shows that B = 2 is paired with both C = 3 and C = 4.
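The practical difference between the two decompositions can be demonstrated on R with the tuple (4, 2, 4). In this sketch, with illustrative helper names, the first decomposition R1(A,B), R2(A,C) passes all per-fragment checks. Only the joined table reveals that {B}→{C} is violated. In the second decomposition, R2(B,C) exposes the violation on its own.

```python
from typing import Sequence, Set, Tuple

Schema = Tuple[str, ...]


def fd_holds(rel: Set[tuple], schema: Schema,
             lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """True if lhs -> rhs holds in this instance of a table with the given schema."""
    seen = {}
    for t in rel:
        x = tuple(t[schema.index(a)] for a in lhs)
        y = tuple(t[schema.index(a)] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def join_on_a(r1: Set[tuple], r2: Set[tuple]) -> Set[tuple]:
    """Natural join of R1(A,B) and R2(A,C) on A, producing (A,B,C) tuples."""
    return {(a, b, c) for (a, b) in r1 for (a2, c) in r2 if a == a2}


R1 = {(1, 2), (2, 2), (3, 2), (4, 2)}      # R1(A,B), used by both decompositions
R2_ac = {(1, 3), (2, 3), (3, 3), (4, 4)}   # R2(A,C), first decomposition
R2_bc = {(2, 3), (2, 4)}                   # R2(B,C), second decomposition

# First decomposition: each fragment looks fine on its own ...
print(fd_holds(R1, ("A", "B"), ["A"], ["B"]),
      fd_holds(R2_ac, ("A", "C"), ["A"], ["C"]))                        # True True
# ... but {B} -> {C} can only be checked after the join.
print(fd_holds(join_on_a(R1, R2_ac), ("A", "B", "C"), ["B"], ["C"]))    # False
# Second decomposition: R2(B,C) alone exposes the violation.
print(fd_holds(R2_bc, ("B", "C"), ["B"], ["C"]))                        # False
```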

Looking for a “Good” Form

Recall that our goals are

Lossless decomposition - necessary in order to ensure correctness of the data

Dependency preservation – not necessary, but desirable in order to achieve efficiency of updates

Good form – in order to avoid redundancy.

But what does it mean for a table to be in good form?

First Normal Form (1NF).

If the domains of all attributes in a table contain only atomic values, then the table is in 1NF.

In other words, there are no nested tables, multi-valued attributes, or complex structures such as lists.

Relational tables are always in 1NF, according to the definition of the relational model.

1NF (contd.)

A relation is in first normal form (1NF) if all its attribute values are atomic.

That is, a 1NF relation cannot have an attribute value that is:

a set of values (multi-valued attribute)

a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs.

However, object-oriented DBMSs and nested relational DBMSs relax this constraint.

A relation that is not in 1NF is an unnormalized relation.

1NF (contd.)

Two ways to convert a non-1NF relation to a 1NF relation:

1) Splitting method: divide the existing relation into two relations, one with the non-repeating attributes and one with the repeating attributes.

• Make a relation consisting of the primary key of the original relation and the repeating attributes. Determine a primary key for this new relation.

• Remove the repeating attributes from the original relation.

2) Flattening method: create new tuples for the repeating data combined with the data that does not repeat.

• This introduces redundancy that will later be removed by normalization.

• Determine a primary key for this flattened relation.

1NF (contd.)

Converting a non-1NF Relation to 1NF Using Splitting

1NF (contd.)

Converting a non-1NF Relation to 1NF Using Flattening

Second Normal Form (2NF): no partial FDs

R is a relation schema with a set F of FDs.

R is in 2NF if and only if, for each FD X → {A} in F+, at least one of the following holds:

• A ∈ X (the FD is trivial), or

• X is not a proper subset of a candidate key for R, or

• A is a prime attribute.

Equivalently, R is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on every candidate key of R. 2NF does not permit partial dependencies of non-prime attributes on keys.

A prime attribute is an attribute that is part of a candidate key.

In 2NF, a proper subset of a candidate key cannot determine a non-prime attribute.

HINT: whenever you try to determine the normal form (2NF, 3NF, BCNF) of a table, you always have to find all candidate keys.
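Finding all candidate keys is mechanical for small schemas. A set of attributes is a superkey if its closure contains every attribute, and it is a candidate key if it contains no smaller key. The brute-force Python sketch below is exponential in the number of attributes, so it is only suitable for small examples. All names in it are illustrative. It also lists 2NF violations, i.e., proper subsets of a candidate key that determine a non-prime attribute. It is applied to the scheme {A,B,C,D} with {A,B} → {C,D} and {A} → {D} from 2NF Example 1.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Single-letter attributes: fd("AB", "CD") means {A,B} -> {C,D}."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(schema: FrozenSet[str], fds: List[FD]) -> List[FrozenSet[str]]:
    """Minimal superkeys, found by trying attribute sets in order of size."""
    keys: List[FrozenSet[str]] = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = frozenset(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) >= schema:
                keys.append(s)
    return keys


def partial_dependencies(schema: FrozenSet[str],
                         fds: List[FD]) -> Dict[FrozenSet[str], FrozenSet[str]]:
    """Map each proper subset X of a key to the non-prime attributes X determines."""
    keys = candidate_keys(schema, fds)
    prime = frozenset().union(*keys)
    found: Dict[FrozenSet[str], FrozenSet[str]] = {}
    for key in keys:
        for size in range(1, len(key)):
            for sub in combinations(sorted(key), size):
                x = frozenset(sub)
                nonprime = (closure(x, fds) & schema) - x - prime
                if nonprime:
                    found[x] = nonprime
    return found


R = frozenset("ABCD")
F = [fd("AB", "CD"), fd("A", "D")]
print(candidate_keys(R, F))        # one key: {A, B}
print(partial_dependencies(R, F))  # {A} -> {D} is a partial dependency
```

Because the check uses X+ rather than only the FDs listed in F, it also catches partial dependencies that are implied through F+.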

2NF (contd.)

A relation is in second normal form (2NF) if it is in 1NF and every non-primary-key (non-prime) attribute is fully functionally dependent on the primary key.

Alternative definition from your text: every nonkey column depends on all candidate keys, not on a subset of any candidate key.

Violations:

Part of key → nonkey

Violations occur only for composite (combined) keys.

Note: under this definition, any relation whose primary key consists of a single attribute is always in 2NF.

If a relation is not in 2NF, we divide it into separate relations, each in 2NF, by ensuring that the primary key of each new relation functionally determines all the attributes in the relation.

2NF Example 1

Consider the relation scheme {A,B,C,D} with the FDs:

{A,B} → {C,D} and

{A} → {D}

{A,B} is a candidate key (it is not a proper subset)

{A} is a proper subset of a candidate key

{D} is not a prime attribute

This scheme is not in 2NF because of {A} → {D}

2NF is of limited practical importance, because we can always achieve a better form (3NF) that is lossless, preserves dependencies and contains less redundancy.

2NF Example 2

fd1 and fd4 are partial functional dependencies. Normalize to:

Emp (eno, ename, title, bdate, salary, supereno, dno)

WorksOn (eno, pno, resp, hours)

Proj (pno, pname, budget)

2NF Example 2 (contd.)

Third Normal Form (3NF)

Note: 3NF permits neither partial nor transitive dependencies of non-prime attributes on keys.

R is a relation schema with a set F of FDs.

R is in 3NF if and only if, for each FD X → {A} in F+, at least one of the following holds:

• A ∈ X (trivial FD), or

• X is a superkey for R, or

• A is a prime attribute for R.

In words: for every nontrivial FD that does not contain extraneous (useless) attributes, either the LHS is a candidate key, or the RHS is a prime attribute, i.e., an attribute that is part of a candidate key.

Third Normal Form (3NF) (contd.)

Third normal form (3NF) is based on the notion of transitive dependency. A transitive dependency A → C is an FD that can be inferred from existing FDs A → B and B → C.

Note that a transitive dependency may involve more than 2 FDs.

A relation is in third normal form (3NF) if it is in 2NF and there is no non-primary-key (non-prime) attribute that is transitively dependent on the primary key.

Alternate definition from your text: a table is in 3NF if it is in 2NF and each nonkey column depends only on candidate keys, not on other nonkey columns.

Violations: Nonkey → Nonkey

Converting a relation from 2NF to 3NF involves the removal of transitive dependencies. If a transitive dependency exists, we remove the transitively dependent attributes from the relation and put them in a new relation along with a copy of the determinant (the LHS of the FD).

3NF Example

R = (B, C, E)

F = {{E}→{B}, {B,C}→{E}}

Remember that you always have to find all candidate keys in order to determine the normal form of a table.

Two candidate keys: BC and EC

{E}→{B}: allowed. E is not a superkey (it is a proper subset of the key EC), but B is a prime attribute.

{B,C}→{E}: allowed, because BC is a candidate key.

None of the FDs violates the rules of the previous slide. Therefore, R is in 3NF.
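The 3NF conditions translate directly into a check. This sketch uses illustrative helpers and single-letter attributes, and it reuses a brute-force candidate-key search. It tests only the FDs in F. This suffices for the whole relation R: if some FD X → A in F+ violated 3NF, then the FD in F that first adds A while computing X+ would also violate 3NF.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(schema: FrozenSet[str], fds: List[FD]) -> List[FrozenSet[str]]:
    keys: List[FrozenSet[str]] = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = frozenset(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= schema:
                keys.append(s)
    return keys


def violations_3nf(schema: FrozenSet[str], fds: List[FD]) -> List[Tuple[FrozenSet[str], str]]:
    """FDs X -> A from F where X is not a superkey and A is not prime."""
    prime = frozenset().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= schema:
            continue  # LHS is a superkey: allowed
        for a in sorted(rhs - lhs):
            if a not in prime:
                bad.append((lhs, a))
    return bad


R = frozenset("BCE")
F = [fd("E", "B"), fd("BC", "E")]
print(violations_3nf(R, F))  # [] : R is in 3NF
```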

Redundancy in 3NF

Bank-schema = (Branch B, Customer C, Employee E). Two candidate keys: BC and EC.

F = {{E}→{B}, {B,C}→{E}}

{E}→{B}: e.g., an employee works in a single branch.

{B,C}→{E}: e.g., when a customer goes to a certain branch, s/he is always served by the same employee.

Branch | Customer | Employee
HKUST | Wong | Au
HKUST | Chin | Au
Central | Wong | Jones
Central | null | Cheng

A 3NF table still has problems:

• redundancy (e.g., we repeat that Au works at the HKUST branch)

• the need to use null values (e.g., to represent that Cheng works at Central even though he is not assigned any customers).

3NF Example 2

fd2 results in a transitive dependency eno → salary. Remove it.

3NF (contd.)

A relation schema R is in 3NF if, for every functional dependency X → Y that holds on R, at least one of the following holds:

X → Y is trivial (Y ⊆ X)

Y is a prime attribute of R

X is a superkey of R

The last condition deals with transitive dependencies. If Y is non-prime, X must be a superkey of R. Hence a chain key → X → Y through a non-superkey X (a transitive dependency) cannot occur.

General Definitions of 2NF and 3NF

We have defined 2NF and 3NF in terms of primary keys. However, a more general definition considers all candidate keys (not just the primary key we have chosen).

General definition of 2NF:

A relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on every candidate key.

General definition of 3NF:

A relation is in 3NF if it is in 2NF and there is no non-prime attribute that is transitively dependent on any candidate key.

Note that a prime attribute is an attribute that is in some candidate key (the primary key is one of the candidate keys).

General Definition of 3NF Example

The relation is not in 3NF according to the basic definition, because SSN is not a primary key attribute.

However, there is nothing wrong with this schema (no anomalies). SSN is a candidate key, and any attributes fully functionally dependent on the primary key will also be fully functionally dependent on the candidate key.

Thus, the general definitions of 2NF and 3NF include all candidate keys instead of just the primary key.

Normalization Question

Consider the universal relation R(A,B,C,D,E,F,G,H,I,J) and the set of functional dependencies:

F= { A,B → C ; A → D,E ; B → F ; F → G,H ; D → I,J }

List the keys for R.

Decompose R into 2NF and then 3NF relations.

Boyce-Codd Normal Form (BCNF)

R is a relation schema with a set F of FDs.

R is in BCNF if and only if, for each FD X → {A} in F+, at least one of the following holds:

• A ∈ X (trivial FD), or

• X is a superkey for R.

In words: for every nontrivial FD that does not contain extraneous (useless) attributes, the LHS is a candidate key.

BCNF tables have no redundancy that can be detected using FDs. Redundancy caused by multi-valued dependencies is handled by 4NF.

If a table is in BCNF, it is also in 3NF (and 2NF and 1NF).

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key.

To test if a relation is in BCNF, we take the determinant of each FD in the relation and check whether it is a candidate key (more precisely, a superkey).

Special cases not covered by 3NF:

1. Part of key → part of key

2. Nonkey → part of key

Special cases are not common.

The difference between 3NF and BCNF is that 3NF allows an FD X → Y to remain in the relation if X is a superkey or Y is a prime attribute. BCNF only allows this FD if X is a superkey.

Thus, BCNF is more restrictive than 3NF. However, in practice most relations in 3NF are also in BCNF.
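For a whole relation R, the determinant test needs only the FDs in F. As the slides show later, this is not true for fragments of a decomposition. The sketch below uses single-letter attributes and illustrative names, and it is applied to R = (B, C, E) from the BCNF example.

```python
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(schema: FrozenSet[str], fds: List[FD]) -> List[FD]:
    """Nontrivial FDs of F whose LHS is not a superkey of the whole schema."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= schema]


R = frozenset("BCE")
F = [fd("E", "B"), fd("BC", "E")]
print(bcnf_violations(R, F) == [fd("E", "B")])  # True: only {E} -> {B} violates BCNF
```

The same R is in 3NF, because B is prime. This is exactly the "nonkey → part of key" case that BCNF forbids.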

BCNF Example

R = (B, C, E), F = {{E}→{B}, {B,C}→{E}}

Two candidate keys: BC and EC

{B,C}→{E} does not violate BCNF because BC is a key.

{E}→{B} violates BCNF because E is not a key. In BCNF, the LHS of every nontrivial FD must be a superkey.

In order to achieve BCNF we have to decompose the table, but how?

Since the decomposition must be lossless, we have only one option: R1(B,E) and R2(C,E). The common attribute E is the key of one fragment, here R1.

BCNF Example (cont)

Bank-schema = (Branch B, Customer C, Employee E)

F = {{E}→{B}, {B,C}→{E}}. Decompose into R1(B,E) and R2(C,E).

Branch | Customer | Employee
HKUST | Wong | Au
HKUST | Chin | Au
Central | Wong | Jones
Central | null | Cheng

R1

Branch | Employee
HKUST | Au
Central | Cheng
Central | Jones

R2

Customer | Employee
Wong | Au
Chin | Au
Wong | Jones

We have avoided the problems of redundancy and null values of 3NF.

BCNF Example (cont)

We can generate the original table by joining the two fragments. However, we must use an outer join, which fills in null values for tuples that do not have join partners.

R1

Branch | Employee
HKUST | Au
Central | Cheng
Central | Jones

R2

Customer | Employee
Wong | Au
Chin | Au
Wong | Jones

Outer join of R1 and R2

Branch | Customer | Employee
HKUST | Wong | Au
HKUST | Chin | Au
Central | Wong | Jones
Central | null | Cheng

Is the decomposition dependency preserving?

No. We lose {B,C}→{E}.

Can we have a dependency preserving decomposition?

No. No matter how we decompose, we lose {B,C}→{E}, since it involves all the attributes.

BCNF Example 2

Consider the WorksOn relation with the added constraint that, given the hours worked, we know exactly the employee who performed the work (i.e., the employee is functionally determined by the hours: hours → eno). Then:

Note that we lose the FD eno,pno → resp, hours.

Observations about BCNF

1. Best normal form among those based on FDs

2. Avoids the problems of redundancy and all anomalies caused by FDs

3. There is always a lossless decomposition that generates BCNF tables

4. However, we may not be able to preserve all dependencies

Next step: an algorithm for automatically generating BCNF tables.

BCNF versus 3NF

1. We can decompose to BCNF, but sometimes we do not want to if we lose an FD.

2. The decision to use 3NF or BCNF depends on the amount of redundancy we are willing to accept and on our willingness to lose a functional dependency.

3. Note that we can always preserve the lossless-join property (recovery) with a BCNF decomposition, but we do not always get dependency preservation.

4. In contrast, we get both recovery and dependency preservation with a 3NF decomposition.

Algorithm for BCNF Decomposition

Let R be the initial table with FDs F

S = {R}
until all relation schemes in S are in BCNF
    for each R in S
        for each FD X → Y that violates BCNF for R
            S = (S – {R}) ∪ (R – Y) ∪ (X,Y)
end until

This is a simplified version. In words, when we find a table R with a BCNF violation X → Y, we:

1] Remove R from S

2] Add a table that has the same attributes as R except for Y

3] Add a second table that contains the attributes in X and Y
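The algorithm can be sketched in Python. This version is an illustrative assumption rather than the slides' exact procedure. It finds violations with the closure-based test for fragments described a few slides later, X → (X+ – X) ∩ Ri, so that fragments are tested against F+ and not only F. It also examines candidate sets X in a fixed order, which determines which of the possible BCNF decompositions is produced. The subset enumeration is exponential, so the sketch is meant for small schemas.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Optional, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def find_violation(rel: FrozenSet[str], fds: List[FD]) -> Optional[FD]:
    """Return X -> (X+ - X) ∩ rel for the first X that adds something but not everything."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = frozenset(combo)
            x_plus = closure(x, fds) & rel
            if x_plus != x and x_plus != rel:
                return x, x_plus - x
    return None


def bcnf_decompose(schema: Iterable[str], fds: List[FD]) -> List[FrozenSet[str]]:
    done: List[FrozenSet[str]] = []
    todo = [frozenset(schema)]
    while todo:
        rel = todo.pop()
        violation = find_violation(rel, fds)
        if violation is None:
            done.append(rel)
        else:
            x, y = violation
            # Replace rel by (rel - Y) and (X ∪ Y), as in the pseudocode.
            todo.extend([rel - y, x | y])
    return done


F = [fd("A", "BE"), fd("C", "D")]
print(sorted("".join(sorted(t)) for t in bcnf_decompose("ABCDE", F)))  # ['ABE', 'AC', 'CD']
```

On the ABCDE example, the sketch yields the same three tables as the following slides. On R(A,B,C,D) with {A}→{B} and {B}→{C}, it first picks X = {A} with Y = {B, C} (all of A+ – A). This gives (A,D), (A,B) and (B,C), which happens to be dependency preserving.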

BCNF Decomposition Example

Let us consider the relation scheme R=(A,B,C,D,E) and the FDs: {A} → {B,E}, {C} → {D}

Candidate key: AC

Both functional dependencies violate BCNF because the LHS is not a candidate key

Pick {A} → {B,E}

We can also choose {C} → {D} – different choices lead to different decompositions.

(A,B,C,D,E) generates R1=(A,C,D) and R2=(A,B,E)

Do we need to decompose further?

BCNF Decomposition Example (cont)

(A,C,D) and (A,B,E)

{A}→{B,E}, {C}→{D}

We need to decompose R1=(A,C,D) because of the FD {C}→{D}

Thus (A,C,D) is replaced with R3=(A,C) and R4=(C,D).

Final decomposition: R2=(A,B,E), R3=(A,C), R4=(C,D)

Is the decomposition lossless?

Yes, the algorithm always creates lossless decompositions. In the step S = (S – {R}) ∪ (R – Y) ∪ (X,Y), we replace R with tables (R – Y) and (X,Y). These have X as their common attributes, and X → Y holds, i.e., X is the key of (X,Y).

Is the decomposition dependency preserving?

Yes because F2={{A}→{B,E}}, F3=∅, F4={{C}→{D}} and (F2∪F3∪F4)+=F+

But remember: sometimes we may not be able to preserve dependencies

Testing if an FD violates BCNF

Important question: which dependencies to check for BCNF violations? F or F+?

Answer-Part 1: To check if a table R with a given set of FDs F is in BCNF, it suffices to check only the dependencies in F

Consider R (A, B, C, D), with F = {{A}→{B}, {B}→{C}}

The key is {A,D}.

R violates BCNF because neither A (the LHS of {A}→{B}) nor B (the LHS of {B}→{C}) is a key.

We can see that by simply using F - we do not need F+ (e.g., we do not need to check the implicit FD {A}→{C})

We can show that if none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either.

Testing if an FD violates BCNF (cont)

Answer-Part 2: However, using only F is insufficient when testing a fragment in the decomposition of R

Consider again R(A,B,C,D), with F = {{A}→{B}, {B}→{C}} that violates BCNF

Decompose R into R1(A,C,D) and R2(A,B).

There is no FD in F that contains only attributes from R1(A,C,D), so we might be misled into thinking that R1 is in BCNF.

In fact, dependency {A}→{C} in F+ shows that R1 is not in BCNF.

Therefore, for the decomposed relations we also need to consider dependencies in F+ (see next slide).

Testing if an FD violates BCNF (cont)

To check if a relation Ri in a decomposition of R is in BCNF,

either test Ri for BCNF with respect to the restriction of F+ to Ri (that is, all FDs in F+ that contain only attributes from Ri),

or use the following test:

1. For every set of attributes X ⊆ Ri, check that X+ (computed using F) either includes no attribute of Ri – X (X determines nothing else in Ri), or includes all attributes of Ri (X is a superkey of Ri).

2. If the condition is violated for some X, the dependency X → (X+ – X) ∩ Ri holds on Ri, and Ri violates BCNF.

We use the above dependency in the BCNF algorithm to decompose Ri.

Note: we have seen how to compute X+ in the previous class about FDs.
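The closure-based test can be sketched as follows. The single-letter attributes and helper names are illustrative. The sketch enumerates every proper subset X of Ri and reports each violating dependency X → (X+ – X) ∩ Ri. On the fragments R1(A,C,D) and R2(A,B), with F = {{A}→{B}, {B}→{C}}, it reproduces the result of the next slide.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def fragment_violations(ri: FrozenSet[str], fds: List[FD]) -> List[FD]:
    """All dependencies X -> (X+ - X) ∩ Ri that violate BCNF on fragment Ri."""
    found: List[FD] = []
    for size in range(1, len(ri)):
        for combo in combinations(sorted(ri), size):
            x = frozenset(combo)
            x_plus = closure(x, fds) & ri  # X+ computed with F, restricted to Ri
            adds_something = x_plus != x
            is_superkey = x_plus == ri
            if adds_something and not is_superkey:
                found.append((x, x_plus - x))
    return found


F = [fd("A", "B"), fd("B", "C")]
print(fragment_violations(frozenset("ACD"), F))  # one violation: {A} -> {C}
print(fragment_violations(frozenset("AB"), F))   # []
```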

Testing if an FD violates BCNF: Example

Consider again: R(A,B,C,D), F = {{A}→{B}, {B}→{C}} and the decomposition R1(A,C,D) and R2(A,B)

A+={A,B,C}, B+={B,C}, C+={C}

R2(A,B) is in BCNF because

A+∩R2 ={A,B,C}∩{A,B}={A,B} includes all attributes of R2

B+∩R2 ={B,C}∩{A,B}={B} includes no attributes of R2-{B}

In other words, each attribute (e.g., A) determines everything (it is a key) or nothing (e.g., B).

R1(A,C,D) is not in BCNF because

A+∩R1 = {A,B,C}∩{A,C,D} = {A,C} includes C, an attribute of R1 – {A}, but does not include all attributes of R1

Therefore, the dependency {A}→{C} causes a BCNF violation and will be used for further decomposing R1

Final decomposition: R2(A,B), R3(A,D), R4(A,C)

Conversion to BCNF

There is a direct algorithm for converting to BCNF without going through 2NF and 3NF. Given a relation R with FDs F:

1. Eliminate extraneous columns from the LHSs

2. Remove derived FDs

3. Arrange the FDs into groups with each group having the same determinant.

4. For each FD group, make a table with the determinant as the primary key.

5. Merge tables in which one table contains all columns of the other table.

Different BCNF Decompositions

The different possible orders in which we consider FDs violating BCNF in the algorithm may lead to different decompositions.

Previous example: R(A,B,C,D), F = {{A}→{B}, {B}→{C}}

Previous BCNF decomposition: R2(A,B), R3(A,D), R4(A,C)

Question: is the decomposition dependency preserving?

Answer: No – we lost the dependency {B}→{C}

Question: Can you obtain a dependency preserving decomposition?

Answer: Yes – in the first decomposition we first applied violation {A}→{B}. If, instead, we apply {B}→{C}

we obtain:

R1=(A,B,D) and R2=(B,C)

We decompose R1=(A,B,D) further using {A} → {B} to obtain:

R3=(A,D) and R4=(A,B)

The final decomposition R2=(B,C), R3=(A,D), R4=(A,B) is dependency preserving.

Third Normal Form: Motivation

We can always obtain a lossless join decomposition in BCNF using the previous algorithm.

However, there are some situations where

there does not exist a dependency preserving BCNF decomposition, and

efficient checking for FD violations on updates is important.

Solution: use the weaker Third Normal Form (3NF).

Allows some redundancy (with related problems)

But FDs can be checked on individual relations without computing a join.

There is always a lossless-join, dependency-preserving decomposition into 3NF (see the next algorithm).

Algorithm for 3NF Synthesis

Let R be the initial table with FDs F

Compute the canonical cover Fc of F

S = ∅
for each FD X → Y in the canonical cover Fc
    S = S ∪ (X,Y)
if no scheme in S contains a candidate key for R
    choose any candidate key CN
    S = S ∪ table with the attributes of CN

Note: unlike the BCNF algorithm, where we break the original relation, in 3NF we synthesize the tables using the FDs in the canonical cover.
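The following is a Python sketch of the synthesis step, assuming the canonical cover has already been computed. Computing Fc itself is not shown, and all names are illustrative. A table contains a candidate key of R exactly when its attributes form a superkey of R, and that is what the closure test checks. If no table qualifies, a candidate key is found by dropping attributes from R one at a time while the remainder is still a superkey. Like the pseudocode, the sketch does not merge tables that are contained in other tables.

```python
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Space-separated attribute names, e.g. fd("customer-name branch-name", "banker-name")."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def one_candidate_key(schema: FrozenSet[str], fds: List[FD]) -> FrozenSet[str]:
    key = set(schema)
    for a in sorted(schema):
        if closure(key - {a}, fds) >= schema:
            key.discard(a)  # still a superkey without a, so a is not needed
    return frozenset(key)


def synthesize_3nf(schema: FrozenSet[str], fc: List[FD]) -> List[FrozenSet[str]]:
    tables: List[FrozenSet[str]] = []
    for lhs, rhs in fc:  # one table (X, Y) per FD X -> Y of the canonical cover
        t = lhs | rhs
        if t not in tables:
            tables.append(t)
    # A table contains a candidate key of R iff its attributes form a superkey of R.
    if not any(closure(t, fc) >= schema for t in tables):
        tables.append(one_candidate_key(schema, fc))
    return tables


BANK = frozenset({"branch-name", "customer-name", "banker-name", "office-number"})
FC = [fd("banker-name", "branch-name office-number"),
      fd("customer-name branch-name", "banker-name")]
for table in synthesize_3nf(BANK, FC):
    print(sorted(table))
# ['banker-name', 'branch-name', 'office-number']
# ['banker-name', 'branch-name', 'customer-name']
```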

3NF Example

Bank=(branch-name, customer-name, banker-name, office-number)

Functional dependencies (also canonical cover):

{banker-name}→{branch-name, office-number}

{customer-name, branch-name}→{banker-name}

Candidate Keys: {customer-name, branch-name} or {customer-name, banker-name}

{banker-name}→{office-number} violates 3NF

3NF tables – for each FD in the canonical cover create a table

Banker = (banker-name, branch-name, office-number)

Customer-Branch = (customer-name, branch-name, banker-name)

Since Customer-Branch contains a candidate key for Bank, we are done.

Question: is the decomposition lossless and dependency preserving?

Answer: Yes – all decompositions generated by this algorithm have these properties

BCNF versus 3NF Example

An example of not having dependency preservation with BCNF:

street,city → zipcode and zipcode → city

Two keys: {street,city} and {street, zipcode}

BCNF versus 3NF Example

Consider an example instance:

Join tuples with equal zipcodes:

Note that the decomposition did not allow us to enforce the constraint street,city → zipcode, even though no FDs were violated in the decomposed relations.

Normalization to BCNF Question

Given this schema normalize into BCNF directly.

Normalization Question 2

Given this database schema normalize into BCNF.

New FD5 says that the size of the parcel of land determines what county it is in.

Normalization to BCNF Question

Given this schema normalize into BCNF:

R (courseNum, secNum, offeringDept, creditHours, courseLevel, instructorSSN, semester, year, daysHours, roomNum, numStudents)

courseNum → offeringDept, creditHours, courseLevel

courseNum, secNum, semester, year → daysHours, roomNum, numStudents, instructorSSN

roomNum, daysHours, semester, year → instructorSSN, courseNum, secNum

Multi-Valued Dependencies

A multi-valued dependency (MVD) occurs when two independent, multi-valued attributes are present in the schema.

An MVD occurs when two independent 1:N relationships are in the relational schema.

When these multi-valued attributes are flattened into a 1NF relation, we must have a tuple for every combination of the values in the two attributes.

It may seem strange why we would want to do this as it obviously increases the number of tuples and redundancy.

The reason is that since the two attributes are independent it does not make sense to store some combinations and not the others because all combinations are equally valid. By leaving out some combination, we are unintentionally favoring one combination over the other which should not be the case.

Multi-Valued Dependencies Example

Employee may:

- work on many projects

- be in many departments

Multi-Valued Dependencies (MVDs)

A multi-valued dependency (MVD) is a dependency between attributes A, B, C in a relation such that, for each value of A, there is a set of values B and a set of values C, where the sets of values B and C are independent of each other.

An MVD is denoted as A →→ B and A →→ C, or abbreviated as A →→ B | C.

A trivial MVD A →→ B occurs when either:

B is a subset of A, or

A ∪ B = R
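An MVD can be checked on an instance using its tuple-swapping formulation. Whenever tuples t1 and t2 agree on X, the tuple that takes X and Y from t1 and all remaining attributes from t2 must also be present. The sketch below assumes X and Y are disjoint. The function name and the employee/project/department tuples are hypothetical, chosen in the spirit of the employee example above.

```python
from typing import Sequence, Set, Tuple


def mvd_holds(rel: Set[tuple], schema: Tuple[str, ...],
              x: Sequence[str], y: Sequence[str]) -> bool:
    """Check X ->-> Y on an instance (X and Y are assumed to be disjoint)."""
    z = [a for a in schema if a not in x and a not in y]  # all other attributes
    pos = {a: i for i, a in enumerate(schema)}

    def part(t: tuple, attrs: Sequence[str]) -> tuple:
        return tuple(t[pos[a]] for a in attrs)

    present = {(part(t, x), part(t, y), part(t, z)) for t in rel}
    for t1 in rel:
        for t2 in rel:
            # t1 and t2 agree on X: swapping in t2's Z-values must give a tuple in rel.
            if part(t1, x) == part(t2, x) and \
                    (part(t1, x), part(t1, y), part(t2, z)) not in present:
                return False
    return True


SCHEMA = ("emp", "proj", "dept")
full = {("e1", "p1", "d1"), ("e1", "p1", "d2"), ("e1", "p2", "d1"), ("e1", "p2", "d2")}
print(mvd_holds(full, SCHEMA, ["emp"], ["proj"]))                         # True
print(mvd_holds(full - {("e1", "p2", "d2")}, SCHEMA, ["emp"], ["proj"]))  # False
```

Removing a single combination breaks emp →→ proj. This is why, as noted above, every combination of the two independent attributes must be stored.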

Multi-Valued Dependencies Rules

1) Every FD is an MVD.

If X → Y, then swapping Y’s between two tuples that agree on X doesn’t change the tuples.

Therefore, the “new” tuples are surely in the relation, and we know X →→ Y.

2) Complementation: if X →→ Y, and Z is all the other attributes, then X →→ Z.

Note that the splitting rule for FDs does not apply to MVDs.

Fourth Normal Form (4NF)

Fourth normal form (4NF) is based on the idea of multi-valued dependencies.

A relation is in fourth normal form (4NF) if it is in BCNF and contains no nontrivial multi-valued dependencies, except those whose LHS is a superkey.

Formal definition: a relation schema R is in 4NF with respect to a set of dependencies F if, for every nontrivial multi-valued dependency X →→ Y, X is a superkey of R.

If X →→ Y is a 4NF violation for relation R, we can decompose R using the same technique as for BCNF:

XY is one of the decomposed relations.

All attributes except those in Y – X form the other.

Fourth Normal Form (4NF) Example

Lossless-join Dependency

The lossless-join property refers to the fact that whenever we decompose relations using normalization we can rejoin the relations to produce the original relation.

A lossless-join dependency is a property of decomposition which ensures that no spurious tuples are generated when relations are natural joined.

There are cases where it is necessary to decompose a relation into more than two relations to guarantee a lossless-join.

Fifth Normal Form (5NF)

Fifth normal form (5NF) is based on join dependencies.

A relation is in fifth normal form (5NF) if and only if every nontrivial join dependency is implied by the superkeys of R.

A join dependency (JD) denoted by JD(R1, R2, …, Rn) on relational schema R specifies a constraint on the states r of R. The constraint states that every legal state r of R is equal to the join of its projections on R1, R2, …, Rn. That is for every such r we have:

ΠR1(r) ∗ ΠR2(r) ∗… ∗ ΠRn(r) = r
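This definition can be tested on an instance: project r onto each Ri, join the projections back, and compare the result with r. In the sketch below, tuples are stored as sorted (attribute, value) pairs so that a generic natural join can match shared attributes by name. The helper names are illustrative, and the four Supply tuples are hypothetical data, not the instance from the slides' figure. For this data, JD(sname partName, partName projName, projName sname) holds, while joining only two of the projections yields a spurious tuple.

```python
from functools import reduce
from typing import Iterable, List, Sequence, Set, Tuple

Row = Tuple[Tuple[str, str], ...]  # a tuple as sorted (attribute, value) pairs


def make(schema: Sequence[str], rows: Iterable[Sequence[str]]) -> Set[Row]:
    return {tuple(sorted(zip(schema, r))) for r in rows}


def project(rel: Set[Row], attrs: Set[str]) -> Set[Row]:
    return {tuple(p for p in row if p[0] in attrs) for row in rel}


def join(r: Set[Row], s: Set[Row]) -> Set[Row]:
    """Natural join: merge every pair of rows that agree on their shared attributes."""
    out: Set[Row] = set()
    for t in r:
        dt = dict(t)
        for u in s:
            du = dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(tuple(sorted({**dt, **du}.items())))
    return out


def satisfies_jd(rel: Set[Row], components: List[Sequence[str]]) -> bool:
    """JD(R1, ..., Rn) holds iff the join of all projections equals rel."""
    return reduce(join, [project(rel, set(c)) for c in components]) == rel


SUPPLY = ("sname", "partName", "projName")
supply = make(SUPPLY, [("s1", "p1", "j2"), ("s1", "p2", "j1"),
                       ("s2", "p1", "j1"), ("s1", "p1", "j1")])
three = [("sname", "partName"), ("partName", "projName"), ("projName", "sname")]
print(satisfies_jd(supply, three))      # True
print(satisfies_jd(supply, three[:2]))  # False: (s2, p1, j2) is spurious
```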

Fifth Normal Form (5NF) Example

Consider a relation Supply (sname, partName, projName). Add the additional constraint that:

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j

Then

supplier s also supplies part p to project j

Fifth Normal Form (5NF) Example

Note that only joining all three relations together will get you back to the original relation. Joining any two will create spurious tuples!

Let R be in BCNF and let R have no composite keys. Then R is in 5NF.

4NF and 5NF in Practice

In practice, 4NF and especially 5NF violations are rare.

4NF violations are easy to detect because of the many redundant tuples.

5NF violations are so rare that no one really cares about them in practice.

Further, it is hard to detect join dependencies in large-scale designs, so even if they do exist, they often go unnoticed.

The redundancy caused by 5NF violations is often tolerable.

The redundancy caused by 4NF violations is not acceptable, but good designs starting from conceptual models (such as ER modeling) will rarely produce a non-4NF schema.

Page 68: BCNF vs 3NF

Normalization Goals

The goals for a relational database design are:

BCNF.

Lossless join.

Dependency preservation.

If we cannot achieve all three, we accept one of:

Lack of dependency preservation in BCNF

Redundancy due to use of 3NF

Interestingly, SQL does not provide a direct way of specifying functional dependencies other than superkeys.

FDs can be specified using assertions or triggers, but these are expensive to test.
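The test that such an assertion or trigger has to perform for an FD X → Y can be sketched outside SQL: the FD holds on an instance exactly when no two tuples agree on X but differ on Y. The Python below is only an illustration of that logic (fd_violations is a hypothetical helper, not a DBMS feature); a real trigger would run inside the database on each insert or update. The rows are a subset of the employee table at the start of these slides. Note that the two managers there, Winny and Sylvia, have different salaries, so the assumed FD position → salary does not actually hold on that sample, presumably the inconsistency the "update anomaly" label on that slide points at.

```python
from collections import defaultdict


def fd_violations(rows, lhs, rhs):
    """Check the FD lhs -> rhs on a list of row dicts.

    Returns {lhs-values: sorted list of distinct rhs-values} for every lhs-value
    that maps to more than one rhs-value; an empty dict means the FD holds.
    """
    seen = defaultdict(set)
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        seen[x].add(y)
    return {x: sorted(ys) for x, ys in seen.items() if len(ys) > 1}


employees = [
    {"first_name": "Dewi", "position": "clerk", "salary": 2000},
    {"first_name": "John", "position": "clerk", "salary": 2000},
    {"first_name": "Winny", "position": "manager", "salary": 2500},
    {"first_name": "Sylvia", "position": "manager", "salary": 2600},
    {"first_name": "Eric", "position": "assistant manager", "salary": 2200},
]

print(fd_violations(employees, ["position"], ["salary"]))
# {('manager',): [(2500,), (2600,)]}
print(fd_violations(employees, ["first_name"], ["position"]))
# {}
```

Done naively, every modification means regrouping the whole table by X, which hints at why enforcing arbitrary FDs this way is considered expensive compared with a key constraint.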

Normal forms are used to prevent anomalies and redundancy. However, the fact that successive normal forms are better at reducing redundancy does not mean they must always be used.

For example, normalization may increase query execution time, because more joins become necessary to answer queries.

Page 69: BCNF vs 3NF

ER Model and Normalization

When an E-R diagram is carefully designed, the tables generated from it should not need further normalization.

However, in a real (imperfect) design there can be FDs from non-key attributes of an entity to other attributes of the entity.

For example, an employee entity might have attributes department-number and department-address, with an FD department-number → department-address.

Good design would have made department an entity.

Normalization and ER modeling are two independent concepts.

• You can use ER modeling to produce an initial relational schema and then use normalization to remove any remaining redundancies.

• If you are a good ER modeler, little normalization will usually be required.

• In theory, you can use normalization by itself. This would involve identifying all attributes, giving them unique names, discovering all FDs and MVDs, then applying the normalization algorithms.

Since this is a lot harder than ER modeling, most people do not do it.

Page 70: BCNF vs 3NF

Universal Relation Approach

We start with a single universal relation and we decompose it using the FDs (no ER diagrams).

Assume Loans(branch-name, loan-number, amount, customer-id, customer-name)

and FDs:

{loan-number} → {branch-name, amount, customer-id}

{customer-id} → {customer-name}

We apply existing decomposition algorithms to generate tables:

Loan(loan-number, branch-name, amount, customer-id)

Customer(customer-id, customer-name)
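The slides do not say which decomposition algorithm is applied, so as an assumption, here is a sketch of the common textbook BCNF decomposition: repeatedly find a set of attributes X in a fragment whose closure, restricted to that fragment, adds attributes Y but does not cover the whole fragment (so X is not a superkey there), then replace the fragment by X ∪ Y and the fragment minus Y. The helper names closure, bcnf_violation and bcnf_decompose are hypothetical. Checking every subset X is exponential in the number of attributes, which is fine for a five-attribute schema like Loans but not for large designs.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violation(rel, fds):
    """Return (X, Y) with X -> Y holding on rel, Y nonempty, and X not a superkey of rel.

    Smaller candidate sets X are tried first; returns None if rel is in BCNF.
    """
    for size in range(1, len(rel)):
        for xs in combinations(sorted(rel), size):
            x = frozenset(xs)
            implied = closure(x, fds) & rel  # what X determines inside rel
            if implied != x and implied != rel:
                return x, implied - x
    return None


def bcnf_decompose(rel, fds):
    """Split fragments on BCNF violations until every fragment is in BCNF."""
    result, todo = [], [frozenset(rel)]
    while todo:
        r = todo.pop()
        violation = bcnf_violation(r, fds)
        if violation is None:
            result.append(r)
        else:
            x, y = violation
            todo.append(x | y)  # fragment holding X -> Y
            todo.append(r - y)  # the rest, keeping X as the join attributes
    return result


loans = {"branch-name", "loan-number", "amount", "customer-id", "customer-name"}
F = [
    (frozenset({"loan-number"}), frozenset({"branch-name", "amount", "customer-id"})),
    (frozenset({"customer-id"}), frozenset({"customer-name"})),
]

for table in bcnf_decompose(loans, F):
    print(sorted(table))
# ['amount', 'branch-name', 'customer-id', 'loan-number']
# ['customer-id', 'customer-name']
```

Because the sketch tries smaller sets X first, it splits on customer-id → customer-name and reproduces the Loan and Customer tables above. Choosing a different violating X first (for example {amount, customer-id}) could yield a different, less natural BCNF decomposition.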

Page 71: BCNF vs 3NF

Denormalization for Performance

We may want to use a non-normalized schema for performance.

For example, displaying customer-name along with loan-number and amount requires a join of loan with customer.

Alternative 1: use a denormalized relation containing all of the above attributes of both loan and customer.

Faster lookup

Extra space and extra execution time for updates

Extra coding work for the programmer, and the possibility of errors in the extra code

Alternative 2: use a materialized view defined as

loan JOIN customer

Benefits and drawbacks are the same as above, except that there is no extra coding work for the programmer, which avoids possible errors.

Page 72: BCNF vs 3NF

Other Design Issues

Some aspects of database design are not caught by normalization.

Examples of bad database design, to be avoided:

Instead of earnings(company-id, year, amount), use earnings-2000, earnings-2001, earnings-2002, etc., all on the schema (company-id, earnings).

The above are in BCNF, but they make querying across years difficult and require a new table each year.

company-year(company-id, earnings-2000, earnings-2001, earnings-2002)

Also in BCNF, but it also makes querying across years difficult and requires a new attribute each year.