Normalization

Introduction

Badly structured tables that contain redundant data may suffer from update anomalies:

• Insertions

• Deletions

• Modifications

Bad structure may occur due to:

• Errors in the original ER diagram.

• Errors in the process of translating ER models into tables.

Database Tables and Normalization

• Normalization is a technique that supports the design of databases based on the relational model.

• Normalization helps reduce data redundancies and helps eliminate the data anomalies.

• Normalization works through a series of stages called normal forms:

– First normal form (1NF)

– Second normal form (2NF)

– Third normal form (3NF)

• The highest level of normalization is not always desirable.

© Pearson Education Limited, 2004 4

Data redundancy and update anomalies

A major aim of relational database design is to group columns into tables so as to minimize data redundancy and reduce the file storage space required by the base tables.

Staff(staffNo, name, position, salary, branchNo)

Primary key staffNo

Foreign key branchNo references Branch(branchNo)

Branch(branchNo, branchAddress, telNo)

Primary key branchNo

StaffBranch(staffNo, name, position, salary, branchNo, branchAddress, telNo)

Primary key staffNo
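As a rough sketch, the two designs could be set up in SQLite through Python's sqlite3 module. The column types and every sample row (staff names, salaries, addresses, telephone values) are assumptions made up for illustration; only the table and column names come from the slides.

```python
import sqlite3

# In-memory database; column types and all sample rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Branch (
    branchNo      TEXT PRIMARY KEY NOT NULL,
    branchAddress TEXT,
    telNo         TEXT
);
CREATE TABLE Staff (
    staffNo  TEXT PRIMARY KEY NOT NULL,
    name     TEXT,
    position TEXT,
    salary   REAL,
    branchNo TEXT REFERENCES Branch(branchNo)
);
CREATE TABLE StaffBranch (
    staffNo       TEXT PRIMARY KEY NOT NULL,
    name          TEXT,
    position      TEXT,
    salary        REAL,
    branchNo      TEXT,
    branchAddress TEXT,
    telNo         TEXT
);
""")

conn.executemany("INSERT INTO Branch VALUES (?, ?, ?)", [
    ("B001", "address of B001", "tel of B001"),
    ("B002", "address of B002", "tel of B002"),
])
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?, ?, ?)", [
    ("S001", "Staff A", "Manager", 30000, "B001"),
    ("S002", "Staff B", "Assistant", 15000, "B002"),
    ("S003", "Staff C", "Assistant", 15000, "B002"),
])
# StaffBranch holds the same facts as the join of Staff and Branch.
conn.execute("""
INSERT INTO StaffBranch
SELECT s.staffNo, s.name, s.position, s.salary,
       b.branchNo, b.branchAddress, b.telNo
FROM Staff s JOIN Branch b ON s.branchNo = b.branchNo
""")

# Count how many rows store the address of branch B002 in each design.
copies_combined = conn.execute(
    "SELECT COUNT(*) FROM StaffBranch WHERE branchNo = 'B002'").fetchone()[0]
copies_separate = conn.execute(
    "SELECT COUNT(*) FROM Branch WHERE branchNo = 'B002'").fetchone()[0]
print(copies_combined, copies_separate)
```

With these sample rows the combined table stores the details of B002 once per member of staff, while the Branch table stores them once.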


What is the problem?

• The StaffBranch table has redundant data; the details of a branch (branchAddress and telNo) are repeated for every member of staff located at that branch.

• In contrast, in the Branch table the branch information appears only once for each branch, and only the branch number (branchNo) is repeated in the Staff table, to represent where each member of staff is located.

• Tables that have redundant data may suffer from update anomalies (insertion, deletion or modification anomalies).


Insertion anomalies

• How do we insert the details of a new member of staff at branch B002 into the StaffBranch table?

• Is there any problem when the tables are separate?


Insertion anomalies (1)

To insert the details of a new member of staff at branch B002 into the StaffBranch table, we must enter the correct details of branch B002, so that the branch details are consistent with the values for branch B002 in the other records of the StaffBranch table.

There is no problem when the tables are separate, because there is no need to enter the branch details; the foreign key (branchNo) is enough.
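A minimal sketch of this first insertion anomaly, again assuming SQLite via Python's sqlite3 module. The tables are trimmed to a few columns, and all row values (including the deliberately mistyped address) are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# Trimmed-down versions of the tables; types and sample values are assumptions.
conn.executescript("""
CREATE TABLE StaffBranch (
    staffNo TEXT PRIMARY KEY NOT NULL, name TEXT,
    branchNo TEXT, branchAddress TEXT, telNo TEXT
);
CREATE TABLE Branch (
    branchNo TEXT PRIMARY KEY NOT NULL, branchAddress TEXT, telNo TEXT
);
CREATE TABLE Staff (
    staffNo TEXT PRIMARY KEY NOT NULL, name TEXT,
    branchNo TEXT REFERENCES Branch(branchNo)
);
""")

# Combined design: the branch details must be retyped for every new member of staff.
conn.execute("INSERT INTO StaffBranch VALUES "
             "('S001', 'Staff A', 'B002', 'address of B002', 'tel of B002')")
# A mistyped address is accepted, so B002 now has two different addresses.
conn.execute("INSERT INTO StaffBranch VALUES "
             "('S002', 'Staff B', 'B002', 'adress of B002', 'tel of B002')")
addresses_for_b002 = conn.execute(
    "SELECT COUNT(DISTINCT branchAddress) FROM StaffBranch "
    "WHERE branchNo = 'B002'").fetchone()[0]

# Separate design: the new staff row carries only the foreign key branchNo.
conn.execute("INSERT INTO Branch VALUES ('B002', 'address of B002', 'tel of B002')")
conn.execute("INSERT INTO Staff VALUES ('S002', 'Staff B', 'B002')")
print(addresses_for_b002)
```

Nothing in the combined table stops the two rows from disagreeing about the address of B002, whereas the Staff row has no branch details to get wrong.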


Insertion anomalies (2)

How do we insert the details of a new branch that currently has no members of staff into the StaffBranch table?

Is there any problem when the tables are separate?


Insertion anomalies (2)

To insert the details of a new branch that currently has no members of staff into the StaffBranch table, it is necessary to enter nulls into the staff-related columns, such as staffNo (the primary key). This violates entity integrity and is not allowed.

There is no problem when the tables are separate. We just enter the new branch in the Branch table.
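The second insertion anomaly can be sketched the same way. This assumes SQLite via Python's sqlite3 module; the primary key is declared NOT NULL explicitly because SQLite does not otherwise reject nulls in a TEXT primary key. The branch number B999 and its details are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down tables; types and sample values are assumptions.
conn.executescript("""
CREATE TABLE StaffBranch (
    staffNo TEXT PRIMARY KEY NOT NULL, name TEXT,
    branchNo TEXT, branchAddress TEXT, telNo TEXT
);
CREATE TABLE Branch (
    branchNo TEXT PRIMARY KEY NOT NULL, branchAddress TEXT, telNo TEXT
);
""")

# Combined design: a branch without staff would need a null staffNo.
try:
    conn.execute("INSERT INTO StaffBranch VALUES "
                 "(NULL, NULL, 'B999', 'address of B999', 'tel of B999')")
    combined_insert_ok = True
except sqlite3.IntegrityError:
    # The null primary key is rejected (entity integrity).
    combined_insert_ok = False

# Separate design: the branch is simply added to Branch.
conn.execute("INSERT INTO Branch VALUES ('B999', 'address of B999', 'tel of B999')")
branches_stored = conn.execute("SELECT COUNT(*) FROM Branch").fetchone()[0]
print(combined_insert_ok, branches_stored)
```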


Deletion anomalies

What happens if we delete a record from the StaffBranch table that represents the last member of staff located at a branch?

Is there any problem when the tables are separate? Why?


Deletion anomalies

If we delete a record from the StaffBranch table that represents the last member of staff located at a branch, the details about the branch are also lost from the database.

There is no problem when the tables are separate, because branch records are stored separately from staff records.
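A small sketch of the deletion anomaly, assuming SQLite via Python's sqlite3 module. The branch B007, its only member of staff S009, and all other values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# Trimmed-down tables; types and sample values are assumptions.
conn.executescript("""
CREATE TABLE StaffBranch (
    staffNo TEXT PRIMARY KEY NOT NULL, name TEXT,
    branchNo TEXT, branchAddress TEXT, telNo TEXT
);
CREATE TABLE Branch (
    branchNo TEXT PRIMARY KEY NOT NULL, branchAddress TEXT, telNo TEXT
);
CREATE TABLE Staff (
    staffNo TEXT PRIMARY KEY NOT NULL, name TEXT,
    branchNo TEXT REFERENCES Branch(branchNo)
);
""")

# S009 is the only member of staff at branch B007 in both designs.
conn.execute("INSERT INTO StaffBranch VALUES "
             "('S009', 'Staff Z', 'B007', 'address of B007', 'tel of B007')")
conn.execute("INSERT INTO Branch VALUES ('B007', 'address of B007', 'tel of B007')")
conn.execute("INSERT INTO Staff VALUES ('S009', 'Staff Z', 'B007')")

# Remove that member of staff from both designs.
conn.execute("DELETE FROM StaffBranch WHERE staffNo = 'S009'")
conn.execute("DELETE FROM Staff WHERE staffNo = 'S009'")

# How much do we still know about branch B007?
b007_rows_combined = conn.execute(
    "SELECT COUNT(*) FROM StaffBranch WHERE branchNo = 'B007'").fetchone()[0]
b007_rows_separate = conn.execute(
    "SELECT COUNT(*) FROM Branch WHERE branchNo = 'B007'").fetchone()[0]
print(b007_rows_combined, b007_rows_separate)
```

After the delete, the combined table has no record of B007 at all, while the Branch table still holds its address and telephone number.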


Modification anomalies

What if we change the value of one of the columns of a particular branch in the StaffBranch table (e.g. the telephone number)?


Modification anomalies

We must update the records of all staff located at that branch. If any of those records is missed, the database becomes inconsistent.
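To make the cost of such a change concrete, here is a sketch that counts the rows touched by the same telephone update in each design. It assumes SQLite via Python's sqlite3 module; the three members of staff at B002 and all values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down tables; types and sample values are assumptions.
conn.executescript("""
CREATE TABLE StaffBranch (
    staffNo TEXT PRIMARY KEY NOT NULL, name TEXT,
    branchNo TEXT, branchAddress TEXT, telNo TEXT
);
CREATE TABLE Branch (
    branchNo TEXT PRIMARY KEY NOT NULL, branchAddress TEXT, telNo TEXT
);
""")
conn.executemany("INSERT INTO StaffBranch VALUES (?, ?, 'B002', ?, ?)", [
    ("S001", "Staff A", "address of B002", "old tel"),
    ("S002", "Staff B", "address of B002", "old tel"),
    ("S003", "Staff C", "address of B002", "old tel"),
])
conn.execute("INSERT INTO Branch VALUES ('B002', 'address of B002', 'old tel')")

# Combined design: one row per member of staff at the branch has to change.
cur = conn.execute("UPDATE StaffBranch SET telNo = 'new tel' WHERE branchNo = 'B002'")
rows_changed_combined = cur.rowcount

# Separate design: exactly one Branch row has to change.
cur = conn.execute("UPDATE Branch SET telNo = 'new tel' WHERE branchNo = 'B002'")
rows_changed_separate = cur.rowcount
print(rows_changed_combined, rows_changed_separate)
```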


First normal form (1NF)

Definition

A table in which the intersection of every column and record contains only one value.

Only 1NF is critical in creating appropriate tables for relational databases. All subsequent normal forms are optional.

However, to avoid update anomalies, it is normally recommended to proceed to 3NF.


Problem: the telNos column does not comply with 1NF, because there are multiple values at the intersection of the telNos column with every record.

How do we solve the problem?


Solution: remove the telNos column from the Branch table and create a separate table, BranchTelephone, to hold the telephone numbers of branches, one number per record, together with a copy of branchNo to identify the branch.

NOTE: The primary key of the new BranchTelephone table is the new telNo column.
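One way this split might look, sketched with SQLite via Python's sqlite3 module. The slide's original Branch rows are not in the transcript, so the comma-separated telNos strings, the addresses, and the column types below are all assumptions.

```python
import sqlite3

# Assumed un-normalized rows: several numbers packed into one telNos value.
branch_unnormalized = [
    ("B001", "address of B001", "tel-1a, tel-1b"),
    ("B002", "address of B002", "tel-2a"),
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Branch (
    branchNo TEXT PRIMARY KEY NOT NULL,
    branchAddress TEXT
);
CREATE TABLE BranchTelephone (
    telNo    TEXT PRIMARY KEY NOT NULL,
    branchNo TEXT NOT NULL REFERENCES Branch(branchNo)
);
""")

for branch_no, address, tel_nos in branch_unnormalized:
    conn.execute("INSERT INTO Branch VALUES (?, ?)", (branch_no, address))
    # One BranchTelephone record per telephone number, so every cell holds one value.
    for tel_no in tel_nos.split(","):
        conn.execute("INSERT INTO BranchTelephone VALUES (?, ?)",
                     (tel_no.strip(), branch_no))

phones_b001 = [row[0] for row in conn.execute(
    "SELECT telNo FROM BranchTelephone WHERE branchNo = 'B001' ORDER BY telNo")]
print(phones_b001)
```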


Functional dependency

• The particular relationships that we show between the columns of a table are more formally referred to as functional dependencies.

• A functional dependency describes the relationship between columns in a table.

Functional dependency

• Functional dependencies in a table indicate how columns relate to one another.

• Column B is functionally dependent on column A (A → B) if, knowing the value of A, we find only one value of B in all the records that have this value of A.

• We say that B is worked out from A.

• However, for a given value of B there may be several values of A.
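The definition can be checked mechanically against sample data. The helper below, `holds`, is a hypothetical name introduced for this sketch, and the rows are invented. Note that sample data can only refute a dependency; whether A → B really holds depends on the meaning of the columns, not on the rows that happen to be stored.

```python
from collections import defaultdict

def holds(rows, a, b):
    """Return True if, in these rows, each value of column a
    appears with exactly one value of column b (a -> b)."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[a]].add(row[b])
    return all(len(values) == 1 for values in seen.values())

# Invented sample rows: two members of staff work at branch B002.
rows = [
    {"staffNo": "S001", "branchNo": "B001"},
    {"staffNo": "S002", "branchNo": "B002"},
    {"staffNo": "S003", "branchNo": "B002"},
]

# Knowing staffNo gives one branchNo, so staffNo -> branchNo is consistent with the data.
staff_determines_branch = holds(rows, "staffNo", "branchNo")
# Knowing branchNo does not give one staffNo: B002 appears with S002 and S003.
branch_determines_staff = holds(rows, "branchNo", "staffNo")
print(staff_determines_branch, branch_determines_staff)
```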


Problem: the TempStaffAllocation table is not in 2NF. Why?

[Figure: the TempStaffAllocation table, with its primary-key columns, its non-primary-key columns and the functional dependencies marked]


Second normal form (2NF)

A table in 2NF is one that is:

• In 1NF.

• Such that each non-primary-key column can be worked out from the values in all the columns that make up the primary key (the primary-key columns).

This means that every non-primary-key column is fully functionally dependent on the primary key.

"Fully" means that the column is dependent on A (here, the primary key) but not on any proper subset of A.


Second normal form (2NF)

NB: 2NF applies only to tables with composite primary keys (a primary key composed of two or more columns).

NB: A 1NF table with a single-column primary key is automatically in at least 2NF.


Functional dependency

branchAddress can be worked out from branchNo (part of the primary key). Every time B002 appears in the branchNo column, the same address “City center ……..” appears in the branchAddress column. The reverse is true. (Partial dependency)

name and position can be worked out from staffNo (part of the primary key).

Every time S455 appears in the staffNo column, the name “Ellen Layman” and the position “assistant” appear in the name and position columns. (Partial dependency)


Functional dependency

hoursPerWeek can only be worked out from both staffNo and branchNo (the whole primary key).

• As partial dependencies exist on the primary key, the table is not in 2NF.

• 2NF is achieved by removing the partial dependencies. How?


Converting TempStaffAllocation table to 2NF
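The slide's figure is not reproduced in this transcript. As one possible sketch of the conversion, each partially dependent column is moved into a table keyed by the part of the primary key it depends on. The resulting table names (`temp_staff`, `branch`, `allocation`), the `project` helper, the second staff member, the hours and the placeholder addresses are assumptions; only S455, "Ellen Layman", "assistant", B002 and the column names come from the slides.

```python
# TempStaffAllocation rows; the primary key is (staffNo, branchNo).
COLUMNS = ("staffNo", "branchNo", "branchAddress", "name", "position", "hoursPerWeek")
temp_staff_allocation = [
    ("S455", "B002", "address of B002", "Ellen Layman", "assistant", 16),
    ("S455", "B004", "address of B004", "Ellen Layman", "assistant", 9),
    ("S100", "B002", "address of B002", "Staff X", "assistant", 14),
]

def project(rows, keep):
    """Keep only the named columns and drop duplicate rows."""
    idx = [COLUMNS.index(c) for c in keep]
    return sorted({tuple(r[i] for i in idx) for r in rows})

# name and position depend on staffNo alone.
temp_staff = project(temp_staff_allocation, ("staffNo", "name", "position"))
# branchAddress depends on branchNo alone.
branch = project(temp_staff_allocation, ("branchNo", "branchAddress"))
# hoursPerWeek depends on the whole key, so it stays with (staffNo, branchNo).
allocation = project(temp_staff_allocation, ("staffNo", "branchNo", "hoursPerWeek"))

# Joining the three tables back together should reproduce the original rows.
staff_by_no = {s[0]: s[1:] for s in temp_staff}
address_by_no = {b[0]: b[1] for b in branch}
rejoined = sorted(
    (s, b, address_by_no[b], *staff_by_no[s], h) for s, b, h in allocation
)
print(rejoined == sorted(temp_staff_allocation))
```

In the three smaller tables each name, position and address is stored once, and no information is lost, since the join gives back the original table.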


Third normal form (3NF)

Definition

A table that is in 1NF and 2NF, and in which every non-primary-key column can be worked out from only the primary-key column(s) and no other columns.

Is the StaffBranch table in 3NF? Draw the dependency arrows.


StaffBranch table is not in 3NF


Converting the StaffBranch table to 3NF
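The figure for this slide is not in the transcript. A plausible sketch, consistent with the Staff and Branch tables shown earlier: branchAddress and telNo can be worked out from branchNo, which is not the primary key, so they move to a table keyed by branchNo. All row values below are invented for illustration.

```python
# StaffBranch rows; the primary key is staffNo. Sample values are assumptions.
staff_branch = [
    {"staffNo": "S001", "name": "Staff A", "position": "Manager", "salary": 30000,
     "branchNo": "B001", "branchAddress": "address of B001", "telNo": "tel of B001"},
    {"staffNo": "S002", "name": "Staff B", "position": "Assistant", "salary": 15000,
     "branchNo": "B002", "branchAddress": "address of B002", "telNo": "tel of B002"},
    {"staffNo": "S003", "name": "Staff C", "position": "Assistant", "salary": 15000,
     "branchNo": "B002", "branchAddress": "address of B002", "telNo": "tel of B002"},
]

STAFF_COLS = ("staffNo", "name", "position", "salary", "branchNo")
BRANCH_COLS = ("branchNo", "branchAddress", "telNo")

# Staff keeps the columns that depend only on staffNo; branchNo stays as a foreign key.
staff = [{c: row[c] for c in STAFF_COLS} for row in staff_branch]
# Branch keeps the columns that depend on branchNo, one entry per branch.
branch = {row["branchNo"]: {c: row[c] for c in BRANCH_COLS} for row in staff_branch}

# Joining Staff to Branch on branchNo should reproduce the original rows.
rejoined = [{**s, **branch[s["branchNo"]]} for s in staff]
print(rejoined == staff_branch, len(branch))
```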