IT 20303
• The Relational DBMS
• Section 05
Relational Database Theory
• Normalization for Logical Database Design
Relational Database Theory
• Normalization
– Process of analyzing a grouping of data items
• Based on inherent characteristics
• Often applied to existing files or databases
Relational Database Theory
• Normalization
– Principles
• Data items belong together in a logical group
• Group of items can be identified by own unique identifier
Relational Database Theory
• Normalization
– Data in the group describes one, and only one, thing
– A Bottom-Up approach
Relational Database Theory
• Why Normalize
– Avoid update anomalies
• Nasty side effects
– Minimize storage of redundant data
– Support simpler logic for manipulating data
Relational Database Theory
• Why Not Normalize
– Data is never (very rarely) updated
– Data warehouse system is seldom normalized
Relational Database Theory
• Sample Data Not Normalized
WARD NAME
WARD TYPE NO. OF BEDS
SENIOR NURSE
PATINET NO
PATIENT NAME
DATE OF BIRTH
Liston Orthopedic 6 J Bryan 45812 D Carter 21/02/65
71384 R Willis 08/10/46
69355 G Barnes 17/06/41
Godlee General 10 V Fox 52217 M Brown 21/02/35
10823 R Willis 12/03/54
Relational Database Theory
• How to Normalize Data using Functional Dependencies– Definition of Functional Dependency
• Given a relation R, attribute Y of R is functionally dependent on attribute X of R, if and only if each X value in R has associated with it precisely one Y-value in R (at any one time)
Relational Database Theory
• Y of R is Dependent on X of R• X (-->)functionally determines Y
WARD NAME
WARD TYPE NO. OF BEDS
SENIOR NURSE
PATINET NO
PATIENT NAME
DATE OF BIRTH
Liston Orthopedic 6 J Bryan 45812 D Carter 21/02/65
Liston Orthopedic 6 J Bryan 71384 R Willis 08/10/46
Liston Orthopedic 6 J Bryan 69355 G Barnes 17/06/41
Godlee General 10 V Fox 52217 M Brown 21/02/35
Godlee General 10 V Fox 10823 R Willis 12/03/54
X Y
Relational Database Theory
• Functional Dependency Diagram of Hospital Ward Example
Patient Name
Date of Birth
Patient No
Ward NameWard Type
No of Beds Senior Nurse
Relational Database Theory
• Table structure based on FD Diagram
WARD NAME
WARD TYPE
NO OF BEDS
SENIOR NURSE
Liston Orthopedic 6 J Bryan
Godlee General 10 V Fox
WARD
PATIENT NO
PATIENT NAME
DATE OF BIRTH
WARD NAME
45812 D Carter 21/2/65 Liston
71384 R Willis 8/10/46 Liston
52217 M Brown 21/2/85 Godlee
PATIENT
Relational Database Theory
• Normalization using Codd’s Rules– Origin
• Early enthusiasts wanted to use relational theory
• Sought rules for structuring data in relational model
Relational Database Theory
• Normalization using Codd’s Rules– Codd and contemporaries developed rules for
“Normal Forms”• 1NF• 2NF• 3NF
– Normal levels to do in database design• Boyce/Codd NF – 3.5NF• 4NF• 5NF
Relational Database Theory
• Customer-Order-Line Item Example– Assume an existing order-entry program
and data file:
ORD-NO CUST-NO CUST-NAME
ADDR PROD-NO PROD-NAME
UNIT-PRC QTY
TOT-ITEM-PRC … PROD-NO PROD-NAME
UNIT-PRC QTY TOT-ITEM-PRC
Relational Database Theory
• 1NF – Break out repeating groups
ORD-NO CUST-NO CUST-NAME
ADDR PROD-NO PROD-NAME
UNIT-PRC QTY
TOT-ITEM-PRC … PROD-NO PROD-NAME
UNIT-PRC QTY TOT-ITEM-PRC
ORD-NO CUST-NO CUST-NAME
ADDR
ORD-NO PROD-NO PROD-NAME UNIT-PRC QTY TOT-ITEM-PRC
ORD-NO PROD-NO PROD-NAME UNIT-PRC QTY TOT-ITEM-PRC
ORDER
ORDER
LINEITEM
Relational Database Theory
• 2NF- Break out attributes dependent on part of the primary key
ORD-NO CUST-NO CUST-NAME
ADDR
ORD-NO PROD-NO PROD-NAME UNIT-PRC QTY TOT-ITEM-PRC
ORD-NO PROD-NO QTY TOT-ITEM-PRC
ORDER
LINEITEM
LINEITEM
PROD-NO PROD-NAME UNIT-PRC
PRODUCT
Relational Database Theory
• 3NF- Break out attributes wholly dependent on another key
ORD-NO CUST-NO CUST-NAME
ADDR
ORD-NO PROD-NO QTY TOT-ITEM-PRC
ORDER
LINEITEM
PROD-NO PROD-NAME UNIT-PRC
PRODUCT
ORD-NO CUST-NO
ORDER CUST-NO CUST-NAME
ADDR
CUSTOMER
Relational Database Theory
• Rules for 1NF, 2NF, & 3NF– 1NF
• Break out repeating groups into a separate entity– 2NF
• Break out attributes that are dependent on part of the primary key into a separate entity
• Called Partial Dependency– 3NF
• Break out attributes that are wholly dependent on another key (not PK) into a separate entity
• Called Transitive Dependency
Relational Database Theory
• Normalization– A relation R is in 3rd Normal Form (3NF) if
and only if the non-key attributes of R (if any) are:• Mutually independent, and• Fully dependent on the primary key of R
Relational Database Theory
• Normalization Cont’d– A relation is in 3NF if all the attributes are
functionally dependent• On the Key• On the Whole Key, and• On Nothing but the Key
–(So Help Me Codd)
Relational Database Theory
• Reconcile differences between the Data Model and Normalized Data Structures– Data model and normalized data structures must be
reconciled– Discard data items from old files that are no longer
needed• Calculation fields• Redundant fields
– Resolve discrepancies in data item names– Ensure that new fields are really necessary
• Use standard naming conventions
Relational Database Theory
• Example 01:
– What happens when a part has more than four suppliers?
– What happens when a supplier is dropped?– How do you query the parts with two or more
suppliers?– Normalized Table:
PART-NO SUPP-1 SUPP-2 SUPP-3 SUPP-4
WDGT01 XYZZY FOOBAR NULL NULL
PART-NO SUPP
WDGT01 XYZZY
WDGT01 FOOBAR
Relational Database Theory
• Example 02: Normalize this table
PART-NO SUPP PART_DESC SUPP-ADDRESS
WDGT01 XYZZY Blue Widget 123 Bluejay Way
WDGT01 FOOBAR Blue Widget 544 Old Orchard
PART-NO PART_DESC
WDGT01 Blue Widget
PART-NO SUPP
WDGT01 XYZZY
WDGT01 FOOBAR
SUPP SUPP-ADDRESS
XYZZY 123 Bluejay Way
FOOBAR 544 Old Orchard
End Section
Relational Database Theory
• Multiply ways to Normalize Data– Normalization can be accomplished in different
ways• Well-formed E-R model is normalized• Functional dependencies• Codd’s Rules for 1NF, 2NF, & 3NF
– Discrepancies indicate something is missing or changed
– One approach validates or checks another approach
Relational Database Theory
• Impact of Normalization– Improve the integrity of data
• Purpose is to eliminate update anomalies– Minimize storage of redundant data– Reduce the complexity of programming logic
• Emphasis now is on maintainability, simplicity of program• Normalized data can minimize complexity of code that
manipulates the data– Enhance the stability, “goodness” of database design
• Normalized data tends to be easier to understand• Normalized data can be used by many different
applications more easily
Relational Database Theory
• Impact of Normalization on Performance– Concern that a large number of tables-and
table joins-will result in poor performance• Join can be a very expensive operation• Test to determine frequency of joins,
number of tables joined–After database is created and
available
Relational Database Theory
• Impact of Normalization on Performance Cont’d– Requirements for application performance,
response time dictate corrective actions– Performance addressed in section on
physical database design• There are alternatives to de-normalizing
data to improve performance
Relational Database Theory
• Recommendations for Data that is Updated– First Normalize– Don’t be dismayed by too many tables
• Normalization increases number of tables but improves logic
– Normalization is a helpful logical database design technique…for any DBMS
Relational Database Theory
• Objective of the design process is a “Good” design– The logical database design process
• Is well understood• Uses complementary techniques• Can be automated with CASE tools
Relational Database Theory
• Objective of the design process is a “Good” design cont’d– A “Good” database design
• Contains all the important entities and data items• Has stable primary keys• Identifies clearly all relationships• Has table structures in 3NF• Is understood by designers and users• Accurately models the real world, as described
in the requirements
Relational Database Theory
• Questions?
Top Related