CSE 480: Database Systems Lecture 18: Normal Forms and Normalization

Post on 17-Jan-2016


1

CSE 480: Database Systems

Lecture 18: Normal Forms and Normalization

2

Functional Dependencies

A functional dependency (FD) takes the form X → Y, where X and Y are subsets of the attributes of a relation

What does X → Y mean?

The values of attributes X determine the values of attributes Y;

The values of attributes Y depend on the values of attributes X;

Suppose t1 and t2 are two tuples in the relation. If t1 and t2 have the same values for attribute set X, then they must also have the same values for attribute set Y

3

Functional Dependencies

EMP_PRJ(Ssn, Pnumber, Hours, Ename, Pname, Plocation)

{Ssn} → {Ename} is a FD

Ename depends on Ssn

{Pnumber} → {Pname, Plocation} is a FD

Pname and Plocation depend on Pnumber

Two rows with the same Pnumber must have the same values of Pname and Plocation

{Plocation} → {Pnumber} is not a FD

{Ename, Plocation} → {Pnumber} is not a FD
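The tuple-based definition can be checked mechanically against a relation instance. The sketch below is only an illustration: the helper name `fd_holds` and the sample rows are made up for this example and are not part of the lecture. Note that an instance can refute an FD but cannot prove it, since an FD is a constraint on every legal state of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical EMP_PRJ rows, invented for illustration only.
emp_prj = [
    {"Ssn": "111", "Pnumber": 1, "Hours": 10, "Ename": "Smith",
     "Pname": "ProductX", "Plocation": "Houston"},
    {"Ssn": "111", "Pnumber": 2, "Hours": 20, "Ename": "Smith",
     "Pname": "ProductY", "Plocation": "Houston"},
    {"Ssn": "222", "Pnumber": 1, "Hours": 5, "Ename": "Wong",
     "Pname": "ProductX", "Plocation": "Houston"},
]

print(fd_holds(emp_prj, ["Ssn"], ["Ename"]))                   # True
print(fd_holds(emp_prj, ["Pnumber"], ["Pname", "Plocation"]))  # True
print(fd_holds(emp_prj, ["Plocation"], ["Pnumber"]))           # False
```

In these sample rows two projects share a location, which is exactly the situation that rules out {Plocation} → {Pnumber}.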

4

Functional Dependencies

Graphical Representation of FDs:

FD1: {SSN, Pnumber} → {Hours}

FD2: {SSN} → {Ename}

FD3: {Pnumber} → {Pname, Plocation}

5

Functional Dependencies

A relation may contain many functional dependencies

– How to derive all of them?

Given a set of functional dependencies F of a relation R:

F = {AC → B, A → C, D → A}

– Does F entail AD → BC (i.e., is AD → BC also a FD of R)?

6

Inference Rules (Example)

Given F = {AC → B, A → C, D → A}

Does F entail AD → BC?

1. D → A (given in F)

2. AD → A (augmenting (1) with A)

3. A → C (given in F)

4. A → AC (augmenting (3) with A)

5. AC → B (given in F)

6. AC → BC (augmenting (5) with C)

7. A → BC (transitivity on (4) and (6))

8. AD → BC (transitivity on (2) and (7))
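The same question can also be answered without a hand derivation by computing the attribute closure of the left-hand side, which is a standard alternative to applying the inference rules step by step (the slide itself uses the rules). The sketch below assumes FDs are represented as pairs of Python sets; the names `closure` and `entails` are illustrative.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have all of lhs, we also get all of rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def entails(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)


F = [({"A", "C"}, {"B"}), ({"A"}, {"C"}), ({"D"}, {"A"})]

print(sorted(closure({"A", "D"}, F)))      # ['A', 'B', 'C', 'D']
print(entails(F, {"A", "D"}, {"B", "C"}))  # True
```

Since B and C both appear in the closure of AD, the FD AD → BC follows from the given set, which agrees with the eight-step derivation.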

7

Normal Forms and Normalization

Functional dependencies can help us analyze whether a relational schema is “good” or “bad”

In the relational model, we don’t say that a schema is good or bad. We say it is in 1NF, 2NF, 3NF, etc.

Properties:

– The higher the NF, the stricter the conditions placed on the schema

– A relation in a higher NF is also in every lower NF, but not vice versa

– A 3NF relation is in 2NF and 1NF (but not necessarily in 4NF or 5NF)

Normalization:

– The process of decomposing "bad" (lower normal form) relations by breaking up their attributes into smaller relations

8

First Normal Form

A schema is in 1NF if it permits only atomic (indivisible) attribute values

1NF disallows

– composite attributes

– multivalued attributes

The relational model itself prohibits relations that contain composite and multivalued attributes

– Therefore, all schemas in the relational model are at least in 1NF

9

Example

Relation is not in 1NF because it has a multivalued attribute (Dlocations)

10

Normalization into 1NF

3 strategies for normalization:

– Place the “offending” attributes in a separate relation: DEPARTMENT(Dname, Dnumber, Dmgr_ssn) and DEPTLOCATIONS(Dnumber, Dlocation)

– Change Dlocations into Dlocation and modify the primary key: DEPARTMENT(Dname, Dnumber, Dmgr_ssn, Dlocation)

– If the maximum number of locations per department is 3: DEPARTMENT(Dname, Dnumber, Dmgr_ssn, Dloc1, Dloc2, Dloc3)
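The first strategy (moving the multivalued attribute into its own relation) can be sketched in a few lines. The function name `split_multivalued` and the sample department rows are hypothetical and only serve to illustrate the idea.

```python
def split_multivalued(rows, key, multi_attr, new_attr):
    """Split a multivalued attribute into a separate (key, value) relation."""
    base, detail = [], []
    for row in rows:
        # The base relation keeps every attribute except the multivalued one.
        base.append({k: v for k, v in row.items() if k != multi_attr})
        # The new relation gets one tuple per value, tagged with the key.
        for value in row[multi_attr]:
            detail.append({key: row[key], new_attr: value})
    return base, detail


# Hypothetical rows, invented for illustration only.
department = [
    {"Dname": "Research", "Dnumber": 5, "Dmgr_ssn": "333",
     "Dlocations": ["Bellaire", "Houston"]},
    {"Dname": "Headquarters", "Dnumber": 1, "Dmgr_ssn": "888",
     "Dlocations": ["Houston"]},
]

dept, dept_locations = split_multivalued(
    department, "Dnumber", "Dlocations", "Dlocation")

for row in dept_locations:
    print(row)
```

Each department row now appears once in `dept`, and `dept_locations` holds one (Dnumber, Dlocation) pair per location, so no attribute value is a list any more.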

11

Is 1NF Sufficient?

Key of the relation is the combination of (Dnumber, Dlocation)

Relation is in 1NF, but there are redundancies:

– Two rows with the same Dnumber must have the same Dname and Dmgr_ssn (even though their Dlocation values are different)

12

2NF (Motivating Example)

Functional dependencies

– {Dnumber, Dlocation} → {Dname, Dmgr_ssn} (from primary key)

– {Dnumber} → {Dname, Dmgr_ssn}

Consequence: two tuples with the same Dnumber but different Dlocation will have the same Dname and Dmgr_ssn, which leads to redundancy!

If {Dnumber} → {Dname, Dmgr_ssn} were not a FD, there would be no redundancy problem

13

2NF (Motivating Example)

This example suggests that if X → Y is a FD, where X is the key, you can’t also have X’ → Y as a FD of the same table (where X’ is a proper subset of X); otherwise, there will be redundancies in the table

– We say that X → Y must be a full FD

{Dnumber, Dlocation} → {Dname, Dmgr_ssn} (from primary key)

{Dnumber} → {Dname, Dmgr_ssn}

14

Full versus Partial Dependencies

X → Y is a full FD if removing any attribute from X means the FD no longer holds

X → Y is a partial FD if there is a FD X’ → Y where X’ is a proper subset of X

Example:

– {Dnumber, Dlocation} → {Dname, Dmgr_ssn} is a partial FD because {Dnumber} → {Dname, Dmgr_ssn} is also a FD of the schema
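A full-versus-partial test can be sketched with attribute closures: X → Y is partial exactly when dropping some single attribute from X still determines Y. The helper names below are illustrative, and the code assumes that X → Y itself already holds.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_fd(lhs, rhs, fds):
    """True if no attribute can be removed from lhs while still determining rhs."""
    lhs, rhs = set(lhs), set(rhs)
    for attr in lhs:
        # Checking single removals is enough: any smaller determining subset
        # is contained in some lhs minus one attribute.
        if rhs <= closure(lhs - {attr}, fds):
            return False
    return True


fds = [({"Dnumber"}, {"Dname", "Dmgr_ssn"})]

print(is_full_fd({"Dnumber", "Dlocation"}, {"Dname", "Dmgr_ssn"}, fds))  # False
print(is_full_fd({"Dnumber"}, {"Dname", "Dmgr_ssn"}, fds))               # True
```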

15

Prime versus NonPrime Attributes

Prime attribute:

– an attribute that is a member of some candidate key K

– Example (from previous slide): Dnumber, Dlocation

Nonprime attribute:

– an attribute that is not a member of any candidate key

– Example (from previous slide): Dname, Dmgr_ssn

16

2NF Definition

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the key of R

Since {Dnumber, Dlocation} is the key

– {Dnumber, Dlocation} → {Dname, Dmgr_ssn} is a FD of the schema

– But {Dnumber} → {Dname, Dmgr_ssn} is also a FD of the schema

The non-prime attributes are not fully functionally dependent on the key

So the schema is not in 2NF

17

Example

FDs:

– {SSN, Pnumber} → {Hours, Ename, Pname, Plocation},

– {SSN} → {Ename},

– {Pnumber} → {Pname, Plocation}

18

Example

– {SSN, PNUMBER} → HOURS is a full FD since neither SSN → HOURS nor PNUMBER → HOURS holds

– But {SSN, PNUMBER} → ENAME is a partial dependency since SSN → ENAME also holds

19

2NF

– Is {SSN, PNUMBER} → {Hours} a full FD? Yes

– Is {SSN, PNUMBER} → {Ename} a full FD? No

– Is {SSN, PNUMBER} → {Pname} a full FD? No

– Is {SSN, PNUMBER} → {Plocation} a full FD? No

Conclusion: The EMP_PROJ relation is not in 2NF

2NF normalization: take the “offending” FDs and create separate relations
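The 2NF check for EMP_PROJ can be reproduced programmatically. The sketch below finds candidate keys by brute force (fine for a handful of attributes, not for large schemas) and then reports every non-prime attribute that is determined by a proper subset of a key. All function names are illustrative, not from the lecture.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under fds (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Brute-force search for minimal attribute sets whose closure is attrs."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = set(combo)
            # Smaller keys are found first, so supersets of a key are skipped.
            if closure(s, fds) == attrs and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def violations_2nf(attrs, fds):
    """List (part_of_key, attribute) pairs that break 2NF."""
    keys = candidate_keys(attrs, fds)
    prime = set().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):
            for combo in combinations(sorted(key), size):
                part = set(combo)
                for a in sorted(closure(part, fds) - prime - part):
                    found.append((combo, a))
    return found


emp_proj = {"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
fds = [
    ({"Ssn", "Pnumber"}, {"Hours"}),
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
]

for part, attr in violations_2nf(emp_proj, fds):
    print(part, "->", attr)
```

Ename, Pname, and Plocation are reported and Hours is not, matching the four yes/no answers above. Running the same check on each relation obtained by splitting along the offending FDs should return an empty list.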

20

Normalizing into 2NF

{SSN, Pnumber} → {Hours},

{SSN} → {Ename},

{Pnumber} → {Pname, Plocation}

21

Is 2NF sufficient?

Key is SSN

FDs:

– {SSN} → {Ename, Bdate, Address, Dnumber, Dname, Dmgr_ssn}

– {Dnumber} → {Dname, Dmgr_ssn}

Is the table in 2NF?

– Yes, because every non-prime attribute is fully functionally dependent on the key

22

Is 2NF sufficient?

Are there still redundancies in the relation? Yes

– Two tuples with the same Dnumber have the same Dname and Dmgr_ssn

What is the “offending” FD that causes redundancy?

23

Is 2NF sufficient?

Functional dependencies:

– {SSN} → {Ename, Bdate, Address, Dnumber, Dname, Dmgr_ssn}

– {Dnumber} → {Dname, Dmgr_ssn}

Since Dnumber is not a key, you can have two rows with the same Dnumber. Hence their Dname and Dmgr_ssn must be the same => redundancy!

24

3NF

A relation schema R is in third normal form (3NF) if

– It is in 2NF and

– There is no non-prime attribute in R that is transitively dependent on the primary key

If X → Y and Y → Z are FDs, with X as the primary key, we consider Z to be transitively dependent on X only if Y is not a candidate key. If Y is a candidate key, then we do not consider this a transitive dependency problem

25

Example of 3NF

FDs:

– SSN → Ename, Bdate, Address, Dnumber

– SSN → Dnumber

– Dnumber → Dname, Dmgr_ssn

Dname is transitively dependent on the primary key SSN because SSN → Dnumber and Dnumber → Dname are FDs of the relation

– Therefore the relation is not in 3NF

26

Third Normal Form

Another way to check whether a relation is in 3NF (without checking for partial and transitive dependencies):

– A relation schema R is in 3NF if whenever a nontrivial FD X → A holds, either X is a superkey of R or A is a prime attribute of R
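This alternative definition translates directly into a check over the listed FDs. The sketch below applies it to the employee/department relation from the earlier slides. The function names are illustrative, candidate keys are found by brute force, and only the given FDs are examined (testing the given set rather than every implied FD is the usual shortcut for a whole-relation 3NF test).

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under fds (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Brute-force search for minimal attribute sets whose closure is attrs."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = set(combo)
            if closure(s, fds) == attrs and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def violations_3nf(attrs, fds):
    """List (X, A) where X -> A is nontrivial, X is no superkey, A is non-prime."""
    prime = set().union(*candidate_keys(attrs, fds))
    found = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == attrs:
            continue  # lhs is a superkey, so this FD is fine
        for a in sorted(rhs - lhs - prime):
            found.append((tuple(sorted(lhs)), a))
    return found


emp_dept = {"Ssn", "Ename", "Bdate", "Address", "Dnumber", "Dname", "Dmgr_ssn"}
fds = [
    ({"Ssn"}, {"Ename", "Bdate", "Address", "Dnumber"}),
    ({"Dnumber"}, {"Dname", "Dmgr_ssn"}),
]

print(violations_3nf(emp_dept, fds))
# [(('Dnumber',), 'Dmgr_ssn'), (('Dnumber',), 'Dname')]
```

Both reported violations come from Dnumber, which is not a superkey and whose dependents are non-prime.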

27

3NF

FDs:

– SSN → Ename, Bdate, Address

– SSN → Dnumber

– Dnumber → Dname, Dmgr_ssn

But Dnumber is not a superkey and Dname, Dmgr_ssn are not prime attributes

Therefore the relation is not in 3NF

Transitive dependency

28

Normalizing into 3NF

Take the “offending” FDs and create separate relations

29

Is 3NF enough to remove redundancy?

FDs:

– {Student, Course} → Instructor

– Instructor → Course

Relation is in 3NF (but there is still redundancy)

Assume every instructor teaches only 1 course

Key is (Student, Course)

Instructor → Course does not violate 3NF because Course is a prime attribute

30

BCNF (Boyce-Codd Normal Form)

A relation schema R is in BCNF if whenever a nontrivial FD X → A holds in R, then X must be a superkey of R

FDs:

– {Student, Course} → Instructor

– Instructor → Course

Relation is not in BCNF because Instructor is not a superkey
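The BCNF condition is simpler to test than 3NF because prime attributes get no exemption: every nontrivial FD must have a superkey on its left-hand side. A minimal sketch over the listed FDs, with illustrative function names:

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_bcnf(attrs, fds):
    """Return the nontrivial FDs whose left-hand side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != attrs
    ]


attrs = {"Student", "Course", "Instructor"}
fds = [
    ({"Student", "Course"}, {"Instructor"}),
    ({"Instructor"}, {"Course"}),
]

print(violations_bcnf(attrs, fds))  # [({'Instructor'}, {'Course'})]
```

Only Instructor → Course is reported, since the closure of {Instructor} is {Instructor, Course} and does not cover Student.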

31

Achieving BCNF by Decomposition

STUD_COURSE

– Key is {Student, Course}

COURSE_INSTRUCT

– Key is {Instructor}

– FD: Instructor → Course

Loses the FD: {Student, Course} → Instructor

– But no redundancy

STUD_COURSE COURSE_INSTRUCT

32

Decomposition 1

Problem: decomposition does not result in lossless join (i.e., does not have nonadditive join property)

– i.e., spurious tuples may be generated

33

Decomposition 2

Dependency preserving? No

– loses the FD: {Student, Course} → Instructor

Lossless join? Yes

34

Decomposition 3

Dependency preserving? No

– loses the FD: {Student, Course} → Instructor

Lossless join? No
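For a decomposition into two relations R1 and R2, the lossless (nonadditive) join property can be tested with a standard criterion: the common attributes R1 ∩ R2 must functionally determine all of R1 or all of R2. The sketch below applies that test to the three possible two-attribute splits of the (Student, Course, Instructor) relation. The figures for Decompositions 1 to 3 did not survive in this transcript, so the code does not claim which split corresponds to which slide.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Binary lossless-join test: (r1 & r2) must determine r1 or r2."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure


fds = [
    ({"Student", "Course"}, {"Instructor"}),
    ({"Instructor"}, {"Course"}),
]

splits = [
    ({"Student", "Instructor"}, {"Student", "Course"}),
    ({"Course", "Instructor"}, {"Course", "Student"}),
    ({"Instructor", "Course"}, {"Instructor", "Student"}),
]

for r1, r2 in splits:
    print(sorted(r1), sorted(r2), is_lossless_binary(r1, r2, fds))
```

Only the split that shares Instructor passes, because Instructor → Course makes the shared attribute a key of one of the two pieces. The other two splits share Student or Course, neither of which determines anything else, so joining them back can produce spurious tuples.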

35

Summary

1st normal form

– no composite/multivalued attributes in relations

2nd, 3rd, and Boyce-Codd normal forms

– Eliminate redundancies based on FDs

More normal forms (see textbook)

– 4th: deals with multivalued dependencies

– 5th: deals with join dependencies