Database Normalization

CHAPTER 4: NORMALIZATION

Prof. Erwin M. Globio, MSIT

Chapter Objectives

At the end of the chapter, you should be able to:

understand the purpose of normalization;

normalize a relation into first, second and third normal form;

merge relations (view integration);

transform E-R diagrams to relations.

Essential Reading

Modern Database Management (4th Edition), Fred R. McFadden & Jeffrey A. Hoffer (1994), Benjamin/Cummings. [Chapter 6, pages 199-237]

Useful Websites to learn Database and Programming:

http://erwinglobio.wix.com/ittraining

http://ittrainingsolutions.webs.com/

http://erwinglobio.sulit.com.ph/

http://erwinglobio.multiply.com/

DB212 CHAPTER 4: NORMALISATION

4.1 Basic Concepts

Normalization is a process for converting complex data structures into simple, stable data structures.

Why is normalisation necessary?

The database design must be efficient (performance-wise).

The amount of stored data should be reduced where possible by removing redundancy.

The design should be free of update, insertion and deletion anomalies.

The design must comply with the rules of relational databases.

The design has to show the pertinent relationships between entities.

The design should permit simple retrieval, simplify data maintenance and reduce the need to restructure data.

Figure 4-1: Steps in normalisation

Table with repeating groups
    (remove repeating groups)
First normal form
    (remove partial dependencies)
Second normal form
    (remove transitive dependencies)
Third normal form
    (remove remaining anomalies resulting from functional dependencies)
Boyce-Codd normal form
    (remove multivalued dependencies)
Fourth normal form
    (remove remaining anomalies)
Fifth normal form

4.1.1 Functional Dependency

Normalisation is based on the analysis of functional dependence. A functional dependency is a particular relationship between two attributes. For any relation R, attribute B is functionally dependent on attribute A if, for every valid instance of A, that value of A uniquely determines the value of B. This is usually represented by an arrow, as follows:

A --> B

An attribute may be functionally dependent on two (or more) attributes, rather than on a single attribute. For example, in the following relation:

ORDER (ORDER-NO, PART-NO, NO-ORDERED, PART-DESC, QUOTED-PRICE)

ORDER-NO, PART-NO --> NO-ORDERED, PART-DESC, QUOTED-PRICE

The attribute (or set of attributes) on the left-hand side of the arrow is called a determinant. For example:

CUST-NO --> CUST-NAME, ADDRESS, COMPANY

INVOICE-NO --> INVOICE-DATE, CUST-NO, ORDER-NO

Here CUST-NO and INVOICE-NO are determinants.
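The definition of a functional dependency can be tried out on sample data. The following is a minimal Python sketch, not part of the original notes: the function name holds and the sample rows are hypothetical. Note that sample data can only show that a dependency is violated; it cannot prove that the dependency is a rule of the business.

```python
def holds(rows, lhs, rhs):
    """Return True if the dependency lhs --> rhs is not violated by rows.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # The same determinant value must always give the same dependent value.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


# Hypothetical sample rows for the ORDER relation used in the text.
rows = [
    {"ORDER-NO": 1, "PART-NO": "P1", "NO-ORDERED": 5, "PART-DESC": "Bolt", "QUOTED-PRICE": 0.10},
    {"ORDER-NO": 1, "PART-NO": "P2", "NO-ORDERED": 2, "PART-DESC": "Nut", "QUOTED-PRICE": 0.05},
    {"ORDER-NO": 2, "PART-NO": "P1", "NO-ORDERED": 9, "PART-DESC": "Bolt", "QUOTED-PRICE": 0.12},
]

# The composite determinant (ORDER-NO, PART-NO) is consistent with the sample.
composite_ok = holds(rows, ["ORDER-NO", "PART-NO"],
                     ["NO-ORDERED", "PART-DESC", "QUOTED-PRICE"])

# ORDER-NO alone does not determine NO-ORDERED (order 1 has two quantities).
order_only = holds(rows, ["ORDER-NO"], ["NO-ORDERED"])
```

In this sample, composite_ok is True and order_only is False, which matches the statement that NO-ORDERED depends on both attributes together.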

4.1.2 Keys

An attribute (or field), K, is the primary key of a table if:

All columns (all the fields in the table) are functionally dependent on K.

Each value of K is unique.

If K is a composite (concatenated) key, then it must also comply with the following conditions:

No portion of the key is sufficient on its own to identify a row (the key is minimal).

None of the attributes that make up the key is null.
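The key conditions above can be checked against sample rows. This is a rough Python sketch under stated assumptions: the helper names is_unique and is_minimal_key and the sample rows are illustrative only, and a check on sample data can reject a proposed key but cannot confirm that it will stay unique for all future data.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """True if no two rows share the same values for attrs."""
    values = [tuple(row[a] for a in attrs) for row in rows]
    return len(values) == len(set(values))


def is_minimal_key(rows, attrs):
    """True if attrs is non-null, unique, and no proper subset is unique."""
    if any(row[a] is None for row in rows for a in attrs):
        return False
    if not is_unique(rows, attrs):
        return False
    for size in range(1, len(attrs)):
        for subset in combinations(attrs, size):
            if is_unique(rows, list(subset)):
                return False  # a portion of the key is already a key
    return True


# Hypothetical order-line rows.
rows = [
    {"ORDER-NO": 1, "PART-NO": "P1", "QTY-ORDERED": 5},
    {"ORDER-NO": 1, "PART-NO": "P2", "QTY-ORDERED": 2},
    {"ORDER-NO": 2, "PART-NO": "P1", "QTY-ORDERED": 5},
]

composite_is_key = is_minimal_key(rows, ["ORDER-NO", "PART-NO"])
order_no_is_key = is_minimal_key(rows, ["ORDER-NO"])
```

Here the composite (ORDER-NO, PART-NO) passes, while ORDER-NO alone fails because order 1 appears in two rows.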

4.2 Steps in Normalisation

First normal form (1NF). Any repeating groups have been removed, so that there is a single value at the intersection of each row and column of the table.

Second normal form (2NF). Any partial functional dependencies have been removed.

Third normal form (3NF). Any transitive dependencies have been removed.

Note: If a relation meets the criteria for 3NF, it also meets the criteria for 2NF and 1NF. Most design problems can be avoided if the relations are in 3NF.

4.2.1 First Normal Form

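Example:

UNF

ORDER (Order-no, Date, Cust-no, Cust-name, Cust-address, {Part-no, Qty-ordered, Part-description, Quoted-price})

The attributes in braces form the repeating group: they occur once for every part on the order.

1NF

ORDER (Order-no, Date, Cust-no, Cust-name, Cust-address)

ITEM (Order-no, Part-no, Qty-ordered, Part-description, Quoted-price)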
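Removing a repeating group can be sketched in a few lines of Python. The data below is hypothetical, the function name to_1nf is illustrative, and the lowercase field names stand in for the attributes of the example above. Each order carries a list of items (the repeating group); the function moves every item into its own row and copies the order number into it.

```python
# Hypothetical unnormalised data: "items" is the repeating group.
unf_orders = [
    {"order_no": 1, "date": "2024-01-05", "cust_no": "C1",
     "cust_name": "Ann", "cust_address": "Manila",
     "items": [
         {"part_no": "P1", "qty_ordered": 5, "part_description": "Bolt", "quoted_price": 0.10},
         {"part_no": "P2", "qty_ordered": 2, "part_description": "Nut", "quoted_price": 0.05},
     ]},
    {"order_no": 2, "date": "2024-01-09", "cust_no": "C2",
     "cust_name": "Ben", "cust_address": "Cebu",
     "items": [
         {"part_no": "P1", "qty_ordered": 9, "part_description": "Bolt", "quoted_price": 0.12},
     ]},
]


def to_1nf(orders):
    """Split each order into one ORDER row and one ITEM row per part."""
    order_rows, item_rows = [], []
    for order in orders:
        # Everything except the repeating group stays in the ORDER row.
        order_rows.append({k: v for k, v in order.items() if k != "items"})
        for item in order["items"]:
            # The order number is carried into each ITEM row as part of its key.
            item_rows.append({"order_no": order["order_no"], **item})
    return order_rows, item_rows


order_rows, item_rows = to_1nf(unf_orders)
```

After the call, order_rows holds two rows and item_rows holds three, each with a single value per attribute.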

4.2.2 Second Normal Form

A relation is in 2NF if:

It is in 1NF, and

all non-key attributes are fully functionally dependent on the primary key and not on only a portion of the primary key.

Steps to transform into 2NF:

Identify all functional dependencies in 1NF.

Make each determinant the primary key of a new relation.

Place all attributes that depend on a given determinant in the relation with that determinant as non-key attributes.

All the functional dependencies in this case are:

ORDER-NO --> DATE, CUST-NO, CUST-NAME, CUST-ADDRESS

PART-NO --> PART-DESC

(ORDER-NO, PART-NO) --> QTY-ORDERED, QUOTED-PRICE

Note: In this case, we say that PART-DESC is only partially functionally dependent on the key (ORDER-NO, PART-NO), because it depends on PART-NO alone.
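The 2NF steps can be illustrated with a small Python sketch. It assumes the functional dependencies have already been identified by the analyst and are supplied as (determinant, dependents) pairs; the function names partial_dependencies and split_by_determinant are hypothetical and only cover the simple case described in the steps above.

```python
def partial_dependencies(primary_key, fds):
    """Return the dependencies whose determinant is only part of the key."""
    key = frozenset(primary_key)
    # "<" on frozensets tests for a proper subset.
    return [(lhs, rhs) for lhs, rhs in fds if frozenset(lhs) < key]


def split_by_determinant(fds):
    """Build one relation per determinant: determinant -> non-key attributes."""
    relations = {}
    for lhs, rhs in fds:
        relations.setdefault(tuple(lhs), []).extend(rhs)
    return relations


# Dependencies of the 1NF relation that holds the order lines.
key = ("ORDER-NO", "PART-NO")
fds = [
    (("PART-NO",), ("PART-DESC",)),
    (("ORDER-NO", "PART-NO"), ("QTY-ORDERED", "QUOTED-PRICE")),
]

partial = partial_dependencies(key, fds)
relations = split_by_determinant(fds)
```

With these inputs, partial contains only PART-NO --> PART-DESC, and relations has one entry keyed by PART-NO and one keyed by (ORDER-NO, PART-NO).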

The partial functional dependency in

ITEM (ORDER-NO, PART-NO, QTY-ORDERED, PART-DESC, QUOTED-PRICE)

creates redundancy in that relation, which results in anomalies when the table is updated.

Insertion anomaly. To insert a row into the ITEM table, we must provide the part description too.

Deletion anomaly. If we delete a row from the ITEM table, we may lose some PART information.

Modification anomaly. If a part's description changes, we must record the change in multiple rows of the ITEM table.

Example:

1NF

ORDER (Order-no, Date, Cust-no, Cust-name, Cust-address)

ITEM (Order-no, Part-no, Qty-ordered, Part-description, Quoted-price)

2NF

ORDER (Order-no, Date, Cust-no, Cust-name, Cust-address)

ITEM (Order-no, Part-no, Qty-ordered, Quoted-price)

PART (Part-no, Part-description)

Note: A relation that is in first normal form will be in second normal form if any one of the following conditions applies:

The primary key consists of only one attribute (such as the attribute ORDER-NO in ORDER).

No nonkey attributes exist in the relation.

Every nonkey attribute is functionally dependent on the full set of primary key attributes.

4.2.3 Third Normal Form

A relation is in 3NF if:

It is in 2NF, and

it has no transitive dependencies.

A transitive dependency exists when A --> B and B --> C, so that C depends on A only through B. Such a dependency can be split into the two dependencies A --> B and B --> C.
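One common way to reason about chains such as A --> B --> C is the attribute closure: the set of all attributes that a given set of attributes determines, directly or through other dependencies. The notes do not describe this technique; the sketch below is offered only as an illustration, and the function name closure is an assumption.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the dependencies fds.

    fds is a list of (determinant, dependents) pairs of attribute tuples.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know the whole determinant, we also know its dependents.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [(("A",), ("B",)), (("B",), ("C",))]

from_a = closure({"A"}, fds)
from_b = closure({"B"}, fds)
```

Here from_a contains A, B and C, which shows that C is reachable from A only by way of B, while from_b contains just B and C.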

Steps to transform into 3NF:

Create one relation for each determinant in the transitive dependency.

Make the determinants the primary keys in their respective relations.

Include as non-key attributes those attributes that depend on the determinant.

In the relation:

ORDER (ORDER-NO, DATE, CUST-NO, CUST-NAME, CUST-ADDRESS)

there is a transitive dependency. That is, one of the non-key attributes can be used to determine other attributes:

CUST-NO --> CUST-NAME, CUST-ADDRESS

Therefore, there are update anomalies in this table.

Insertion anomaly. A new customer cannot be entered until that customer has placed an order.

Deletion anomaly. If an order is deleted from the ORDER table, we may lose some CUSTOMER information.

Modification anomaly. If the address of a customer changes, we have to update all of that customer's past order records.

To remove such anomalies, we can decompose the ORDER relation into two relations.

Example:

2NF

ORDER (Order-no, Date, Cust-no, Cust-name, Cust-address)

ITEM (Order-no, Part-no, Qty-ordered, Quoted-price)

PART (Part-no, Part-description)

3NF

ORDER (Order-no, Date, Cust-no)

CUSTOMER (Cust-no, Cust-name, Cust-address)

ITEM (Order-no, Part-no, Qty-ordered, Quoted-price)

PART (Part-no, Part-description)
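The decomposition of ORDER can be sketched in Python as two projections. The rows and the helper name project are hypothetical. The sketch also rejoins the two projections on cust_no to show that, for this sample, no information is lost by the split.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicates but keeping first-seen order."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out


# Hypothetical 2NF ORDER rows: customer details repeat on every order.
orders_2nf = [
    {"order_no": 1, "date": "2024-01-05", "cust_no": "C1", "cust_name": "Ann", "cust_address": "Manila"},
    {"order_no": 2, "date": "2024-01-09", "cust_no": "C1", "cust_name": "Ann", "cust_address": "Manila"},
    {"order_no": 3, "date": "2024-01-11", "cust_no": "C2", "cust_name": "Ben", "cust_address": "Cebu"},
]

order_3nf = project(orders_2nf, ["order_no", "date", "cust_no"])
customer_3nf = project(orders_2nf, ["cust_no", "cust_name", "cust_address"])

# Rejoin on the foreign key cust_no to rebuild the original rows.
by_cust = {c["cust_no"]: c for c in customer_3nf}
rejoined = [{**o, **by_cust[o["cust_no"]]} for o in order_3nf]
```

The customer projection holds each customer once (two rows instead of three), and rejoined equals the original list of rows.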

Notice that CUST-NO is the primary key of a new relation and is a foreign key in the ORDER relation. A foreign key is an attribute that appears as a nonkey attribute in one relation and as a primary key attribute in another relation.

Therefore the final result is

ORDER (ORDER-NO, DATE, CUST-NO)

ITEM (ORDER-NO, PART-NO, QTY-ORDERED, QUOTED-PRICE)

CUSTOMER (CUST-NO, CUST-NAME, CUST-ADDRESS)

PART (PART-NO, PART-DESC)
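The four relations above can be tried out with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: the table and column names are lowercase adaptations chosen for illustration (the table is called orders because ORDER is a reserved word in SQL), the column types and the sample data are hypothetical, and SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.executescript("""
CREATE TABLE customer (
    cust_no      TEXT PRIMARY KEY,
    cust_name    TEXT NOT NULL,
    cust_address TEXT
);
CREATE TABLE part (
    part_no   TEXT PRIMARY KEY,
    part_desc TEXT
);
CREATE TABLE orders (
    order_no   INTEGER PRIMARY KEY,
    order_date TEXT,
    cust_no    TEXT NOT NULL REFERENCES customer(cust_no)
);
CREATE TABLE item (
    order_no     INTEGER NOT NULL REFERENCES orders(order_no),
    part_no      TEXT NOT NULL REFERENCES part(part_no),
    qty_ordered  INTEGER,
    quoted_price REAL,
    PRIMARY KEY (order_no, part_no)
);
""")

conn.execute("INSERT INTO customer VALUES ('C1', 'Ann', 'Manila')")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "2024-01-05", "C1"), (2, "2024-01-09", "C1")])

# A change of address now touches a single CUSTOMER row, not every order.
rows_changed = conn.execute(
    "UPDATE customer SET cust_address = 'Cebu' WHERE cust_no = 'C1'").rowcount

# The foreign key rejects an order for a customer that does not exist.
try:
    conn.execute("INSERT INTO orders VALUES (3, '2024-01-11', 'C9')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

addresses = [row[0] for row in conn.execute(
    "SELECT c.cust_address FROM orders o "
    "JOIN customer c ON c.cust_no = o.cust_no ORDER BY o.order_no")]
```

The single UPDATE is visible on both orders through the join, which is the modification anomaly of the 2NF design being avoided.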

4.3 Transforming E-R Diagram to Relations

4.3.1 Represent Entities

Each entity type in an E-R diagram is transformed into a relation. The primary key of the entity type becomes the primary key of the corresponding relation.

Taking the following E-R diagram as an example,

[E-R diagram: CUSTOMER (Cust-no, Cust-name, Address), ORDER (Order-no, Date, Cust-no) and PART (Part-no, Part-description); the relationship between ORDER and PART carries the attributes Qty-ordered and Quoted-price.]

the ORDER entity is transformed into the following relation:

ORDER (ORDER-NO, DATE, CUST-NO)

4.3.2 Represent Relationships

Binary 1:N Relationship

A binary one-to-many (1:N) relationship in an E-R diagram is represented by adding the primary key attribute of the entity on the one side of the relationship as a foreign key in the relation on the many side.

Thus the CUSTOMER and ORDER entities in the E-R diagram are transformed into

ORDER (ORDER-NO, DATE, CUST-NO)

CUSTOMER (CUST-NO, CUST-NAME, CUST-ADDRESS)

CUST-NO is a foreign key in the ORDER relation but a primary key in the CUSTOMER relation.

[E-R diagram labels: CUSTOMER places ORDER; ORDER consists of PART.]

Binary M:N Relationship

For a binary many-to-many (M:N) relationship between two entity types A and B, create a separate relation C. The primary key of C is the composite key consisting of the primary keys of entities A and B.

Thus, for the entity types PART and ORDER, a relation called ORDER-LINE is created, which consists of the primary keys of PART and ORDER as well as the attributes QTY-ORDERED and QUOTED-PRICE.

That is,

ORDER-LINE (ORDER-NO, PART-NO, QTY-ORDERED, QUOTED-PRICE)
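The ORDER-LINE relation can be sketched with Python's built-in sqlite3 module. The table name order_line, the column types and the sample rows are assumptions made for illustration. The composite primary key lets one order list many parts and one part appear on many orders, while rejecting the same part twice on the same order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_line (
        order_no     INTEGER NOT NULL,
        part_no      TEXT NOT NULL,
        qty_ordered  INTEGER,
        quoted_price REAL,
        PRIMARY KEY (order_no, part_no)   -- composite key of both entities
    )
""")
conn.executemany("INSERT INTO order_line VALUES (?, ?, ?, ?)",
                 [(1, "P1", 5, 0.10), (1, "P2", 2, 0.05), (2, "P1", 9, 0.12)])

# One order, many parts.
parts_in_order_1 = [row[0] for row in conn.execute(
    "SELECT part_no FROM order_line WHERE order_no = 1 ORDER BY part_no")]

# One part, many orders.
orders_with_p1 = [row[0] for row in conn.execute(
    "SELECT order_no FROM order_line WHERE part_no = 'P1' ORDER BY order_no")]

# The same (order, part) pair cannot be entered twice.
try:
    conn.execute("INSERT INTO order_line VALUES (1, 'P1', 1, 0.10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```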

Unary Relationships

In a unary relationship (recursive relationship), the primary key of the relation is the same as that of the entity type. A foreign key that references the primary key values of the same relation is added. This is known as the recursive foreign key.

Example:

EMPLOYEE (EMP-ID, NAME, BIRTHDATE, MANAGER-ID)

Here MANAGER-ID is the recursive foreign key; it references EMP-ID.
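A recursive foreign key can be followed repeatedly to walk up a hierarchy. The Python sketch below uses hypothetical employee data (BIRTHDATE is left out for brevity) and an illustrative function name, management_chain; it assumes that the top-level employee has no manager, represented here by None.

```python
# Hypothetical EMPLOYEE rows keyed by EMP-ID; manager_id refers to another EMP-ID.
employees = {
    "E1": {"name": "Ana", "manager_id": None},
    "E2": {"name": "Ben", "manager_id": "E1"},
    "E3": {"name": "Cai", "manager_id": "E2"},
}


def management_chain(emp_id, table):
    """Return the managers above emp_id, nearest first."""
    chain = []
    current = table[emp_id]["manager_id"]
    while current is not None:
        if current in chain or current == emp_id:
            raise ValueError("cycle in manager references")
        chain.append(current)
        current = table[current]["manager_id"]
    return chain


chain_for_e3 = management_chain("E3", employees)
chain_for_e1 = management_chain("E1", employees)
```

For this data, chain_for_e3 lists E2 and then E1, and chain_for_e1 is empty because E1 has no manager.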

4.4 Merging Relations

As part of the logical design process, normalised relations may have been created from a number of separate E-R diagrams and other user views. Some of these relations may be redundant and can be integrated with other relations (view integration).

Example:

Suppose that modelling a user view results in the following 3NF relation:

STUDENT1 (STUDENTID, NAME, ADDRESS, PHONE, GUARDIAN)

Modelling a second user view might result in the following relation:

STUDENT2 (STUDENTID, NAME, ADDRESS, DEPT)

Since these two relations have the same primary key (STUDENTID), they describe the same entity and may be merged into one relation. Therefore the result of the merging is:

STUDENT (STUDENTID, NAME, ADDRESS, PHONE, GUARDIAN, DEPT)

This reduces duplication of NAME and ADDRESS.
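The merge of STUDENT1 and STUDENT2 can be expressed as a small Python sketch. The function name merge_relations and the dictionary layout are hypothetical. The sketch matches attributes by name only, so it assumes the analyst has already confirmed that equally named attributes mean the same thing in both views.

```python
def merge_relations(name, *relations):
    """Merge relations that share a primary key into one attribute list."""
    keys = {rel["key"] for rel in relations}
    if len(keys) != 1:
        raise ValueError("relations must share the same primary key")
    merged = []
    for rel in relations:
        for attr in rel["attrs"]:
            if attr not in merged:  # keep each attribute once, in first-seen order
                merged.append(attr)
    return {"name": name, "key": relations[0]["key"], "attrs": merged}


student1 = {"key": ("STUDENTID",),
            "attrs": ["STUDENTID", "NAME", "ADDRESS", "PHONE", "GUARDIAN"]}
student2 = {"key": ("STUDENTID",),
            "attrs": ["STUDENTID", "NAME", "ADDRESS", "DEPT"]}

student = merge_relations("STUDENT", student1, student2)
```

The resulting attribute list is STUDENTID, NAME, ADDRESS, PHONE, GUARDIAN, DEPT, the same as the merged STUDENT relation above.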

4.5 Review Questions

1. For each of the following relations, indicate the normal form for that relation. If the relation is not in 3NF, normalise it. (Note: Functional dependencies are shown where appropriate.)

a. CLASS (COURSE NO, SECTION NO)

b. CLASS (COURSE NO, SECTION NO, ROOM)

c. CLASS (COURSE NO, SECTION NO, ROOM, CAPACITY)
   ROOM --> CAPACITY

d. CLASS (COURSE NO, SECTION NO, COURSE NAME, ROOM, CAPACITY)
   ROOM --> CAPACITY
   COURSE NO --> COURSE NAME

2. The table below contains sample data for parts and for vendors.

Part No.   Description    Vendor Name     Address      Unit Cost
123        Logic Chip     Fast Chips      Cupertino    10.00
                          Smart Chips     Phoenix       8.00
5678       Memory chip    Fast Chips      Cupertino     3.00
                          Quality Chips   Austin        2.00
                          Smart Chips     Phoenix       5.00

a. Convert this table to a relation (named PART SUPPLIER) in first normal form.

b. List the functional dependencies in PART SUPPLIER and identify a candidate key.

c. Identify each of the following: an insert anomaly, a delete anomaly, and a modification anomaly in the above 1NF relation.

d. Convert the relation to 3NF.

3. When integrating relations, the database analyst must understand the meaning of the data and try to resolve problems arising from synonyms and homonyms in the relations. Illustrate with examples (quoting from your project) how such problems can be resolved.

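[Figure: E-R diagram with the entities ROOM, PATIENT, PHYSICIAN and ITEM. Attribute labels recoverable from the diagram: Location, Accom, Extension, Patient no, Patient name, Patient address, (other patient attributes), Physician ID, Physician phone, Item code, Description, Charge, Procedure. Relationship labels recoverable from the diagram: "May be assigned to", "Is billed for" and "Attenda" (truncated in the source).]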