Post on 02-Jan-2016
Chapter 12
Further Normalization I
1NF, 2NF, 3NF, BCNF
Topics in this Chapter
• Nonloss Decomposition and Functional Dependencies
• First, Second, and Third Normal Forms
• Dependency Preservation
• Boyce/Codd Normal Form
• A Note on Relation-Valued Attributes
Normalization and Database Design
• The “normal forms” represent stages in achieving a more desirable design.
(“More desirable” means being more robust, having greater integrity.)
• First normal form (1NF) is what we achieved by specifying that relations contain single-valued attributes only (each tuple has exactly one value for each attribute).
• So, relations are always in (at least) 1NF.
Normalization and Database Design
• Additional constraints that produce “further normalization” lead to one of the other designations (2NF, 3NF, etc.)
• Each “higher” normal form (2nd, 3rd, etc.) includes the previous ones; i.e., to be in “third normal form” means that the data is also in 2nd and in 1st.
Normalization
• “Normalized” and 1NF are the same thing; frequently “normalized” is used (incorrectly) to refer to 3NF
• Normalization helps control redundancy
• Normalization is reversible; i.e., nonloss, or information preserving
• Six normal forms are discussed: 1NF through 5NF, and Boyce/Codd Normal Form (BCNF), which is an improvement on 3NF
First Normal Form
• A relvar is in 1NF if and only if, in every legal value of that relvar, every tuple contains exactly one value for each attribute
• In this way, relvars are always in 1NF
• A relvar in 1NF may display functional dependencies other than those emanating from the primary key
• Such non-primary-key dependencies promote a miasma of update anomalies
S
+------+-------+--------+--------+
| snum | sname | status | city   |
+------+-------+--------+--------+
| S1   | Smith | 20     | London |
| S2   | Jones | 10     | Paris  |
| S3   | Blake | 30     | Paris  |
| S4   | Clark | 20     | London |
| S5   | Adams | 30     | Athens |
+------+-------+--------+--------+
SP
+------+------+------+
| snum | pnum | qty  |
+------+------+------+
| S1   | P1   | 300  |
| S1   | P2   | 200  |
| S1   | P3   | 400  |
| S1   | P4   | 200  |
| S1   | P5   | 100  |
| S1   | P6   | 100  |
| S2   | P1   | 300  |
| S2   | P2   | 400  |
| S3   | P2   | 200  |
| S4   | P2   | 200  |
| S4   | P4   | 300  |
| S4   | P5   | 400  |
+------+------+------+
P
+------+-------+-------+--------+--------+
| pnum | pname | color | weight | city   |
+------+-------+-------+--------+--------+
| P1   | Nut   | Red   | 12.0   | London |
| P2   | Bolt  | Green | 17.0   | Paris  |
| P3   | Screw | Blue  | 17.0   | Rome   |
| P4   | Screw | Red   | 14.0   | London |
| P5   | Cam   | Blue  | 12.0   | Paris  |
| P6   | Cog   | Red   | 19.0   | London |
+------+-------+-------+--------+--------+
The suppliers and parts database
Just “looks” right
--because it is.
Satisfies all normal forms.
+------+--------+------+------+
| snum | scity  | pnum | qty  |
+------+--------+------+------+
| S1   | London | P1   | 300  |
| S1   | London | P2   | 200  |
| S2   | Paris  | P1   | 300  |
| S2   | Paris  | P2   | 400  |
| S3   | Paris  | P2   | 200  |
| S4   | London | P2   | 200  |
| S4   | London | P4   | 300  |
| S4   | London | P5   | 400  |
+------+--------+------+------+
The table “SCP”
recording supplier city in SCP rather than in S
redundancy!
update problems:
how to change S4’s city (in three places)
how to record the city of a new supplier for whom there are no shipments? (pnum is part of the primary key, so such a row cannot be inserted)
Second Normal Form
• A relation violates 2NF if a non-key field is a fact about a subset of a key.
• A relation satisfies 2NF (is in 2NF) if it is in 1NF and every non-key attribute is irreducibly dependent on the primary key (i.e., dependent on the entire primary key).
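The 2NF condition can be tried out on the SCP sample rows. The following is a minimal sketch, not a definitive test: the helper name `fd_holds` is hypothetical, and checking sample data can only show that a functional dependency is not contradicted by those rows (a real FD is a statement about every legal value of the relvar). It assumes the primary key of SCP is {snum, pnum}.

```python
# SCP sample rows from the slide; the primary key is assumed to be {snum, pnum}.
SCP = [
    {"snum": "S1", "scity": "London", "pnum": "P1", "qty": 300},
    {"snum": "S1", "scity": "London", "pnum": "P2", "qty": 200},
    {"snum": "S2", "scity": "Paris", "pnum": "P1", "qty": 300},
    {"snum": "S2", "scity": "Paris", "pnum": "P2", "qty": 400},
    {"snum": "S3", "scity": "Paris", "pnum": "P2", "qty": 200},
    {"snum": "S4", "scity": "London", "pnum": "P2", "qty": 200},
    {"snum": "S4", "scity": "London", "pnum": "P4", "qty": 300},
    {"snum": "S4", "scity": "London", "pnum": "P5", "qty": 400},
]


def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# qty depends on the whole key.
print(fd_holds(SCP, ["snum", "pnum"], ["qty"]))  # True
# scity depends on snum alone, a proper subset of the key: a 2NF violation.
print(fd_holds(SCP, ["snum"], ["scity"]))        # True
# qty does not depend on pnum alone.
print(fd_holds(SCP, ["pnum"], ["qty"]))          # False
```

Because scity is determined by part of the key, SCP is in 1NF but not in 2NF.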
Second Normal Form
• A relvar is in 2NF if and only if it is in 1NF and every nonkey attribute is irreducibly dependent on the primary key
• (Assumes only one candidate key)
• A relvar in 2NF is less susceptible to update anomalies, but may still exhibit transitive dependencies
• In a transitive dependency, a nonkey attribute depends on the primary key only by way of another nonkey attribute (key → A and A → B, hence key → B)
+--------+-----------+--------+------------+
| Emp_Id | Emp_Name  | Dept#  | DeptName   |
+--------+-----------+--------+------------+
| A001   | Johnson   | 10     | Accounting |
| A023   | Chung     | 10     | Accounting |
| C085   | Allen     | 10     | Accounting |
| B120   | Gomez     | 20     | Sales      |
| B211   | Davis     | 20     | Sales      |
| A227   | Greenberg | 40     | Production |
| C340   | Brown     | 40     | Production |
| C389   | Lopez     | 40     | Production |
| C395   | Clark     | 40     | Production |
| A502   | Edwards   | 20     | Sales      |
| A616   | Scott     | 40     | Production |
| A700   | Sanyo     | 60     | Delivery   |
| A722   | Adams     | 20     | Sales      |
+--------+-----------+--------+------------+
The table “Employees”
In 1NF, but not good
REDUNDANCY!
And
update problems:
change name of a department? (multiple updates required)
eliminate employee Sanyo? (what is the name of Dept 60?)
Update Anomalies
• “Update anomalies” include three operations:
• An INSERT anomaly occurs when the user wishes to record a subordinate fact that is not dependent on the primary key (e.g., recording a supplier location before the supplier supplies a part)
• A DELETE anomaly, conversely, may delete the location inadvertently
• An UPDATE anomaly occurs when many updates are required to record a simple fact
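The UPDATE and DELETE anomalies can be seen directly in the “Employees” sample data. This is an illustrative sketch only: the rows are copied from the slide, and plain Python tuples stand in for a real table.

```python
# (Emp_Id, Emp_Name, Dept#, DeptName) rows from the "Employees" slide.
employees = [
    ("A001", "Johnson", 10, "Accounting"),
    ("A023", "Chung", 10, "Accounting"),
    ("C085", "Allen", 10, "Accounting"),
    ("B120", "Gomez", 20, "Sales"),
    ("B211", "Davis", 20, "Sales"),
    ("A227", "Greenberg", 40, "Production"),
    ("C340", "Brown", 40, "Production"),
    ("C389", "Lopez", 40, "Production"),
    ("C395", "Clark", 40, "Production"),
    ("A502", "Edwards", 20, "Sales"),
    ("A616", "Scott", 40, "Production"),
    ("A700", "Sanyo", 60, "Delivery"),
    ("A722", "Adams", 20, "Sales"),
]

# UPDATE anomaly: renaming department 10 means touching every row for it.
rows_to_touch = [e for e in employees if e[2] == 10]
print(len(rows_to_touch))  # 3

# DELETE anomaly: removing Sanyo also removes the only record of Dept 60's name.
remaining = [e for e in employees if e[0] != "A700"]
dept_names = {dept: name for _, _, dept, name in remaining}
print(60 in dept_names)  # False
```

The INSERT anomaly is the mirror image: a department with no employees yet has no row in which its name could be stored.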
+--------+-----------+--------+------------+
| Emp_Id | Emp_Name  | Dept#  | DeptName   |
+--------+-----------+--------+------------+
| A001   | Johnson   | 10     | Accounting |
| A023   | Chung     | 10     | Accounting |
| C085   | Allen     | 10     | Accounting |
| B120   | Gomez     | 20     | Sales      |
| B211   | Davis     | 20     | Sales      |
| A227   | Greenberg | 40     | Production |
| C340   | Brown     | 40     | Production |
| C389   | Lopez     | 40     | Production |
| C395   | Clark     | 40     | Production |
| A502   | Edwards   | 20     | Sales      |
| A616   | Scott     | 40     | Production |
| A700   | Sanyo     | 60     | Delivery   |
| A722   | Adams     | 20     | Sales      |
+--------+-----------+--------+------------+
The table “Employees”
a transitive dependency
Emp_Id → Dept#
Dept# → DeptName
Emp_Id transitively determines DeptName
Emp_Id → DeptName
Mutually independent attributes
Non-key attributes are “mutually independent” if no such attribute is functionally dependent on any combination of the others (assuming only one candidate key).
Mutually independent => no transitive dependencies, such as
Emp_Id → Dept#, Dept# → DeptName
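A transitive dependency can be derived mechanically by computing an attribute closure. The sketch below uses the standard closure algorithm; the function name `closure` and the FD Emp_Id → Emp_Name (implied by Emp_Id being the key) are assumptions added for illustration.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already covered, the right side follows.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# FDs for "Employees": the key determines the name and the department,
# and the department number determines the department name.
fds = [
    (("Emp_Id",), ("Emp_Name",)),
    (("Emp_Id",), ("Dept#",)),
    (("Dept#",), ("DeptName",)),
]

print(sorted(closure({"Emp_Id"}, fds)))
# ['Dept#', 'DeptName', 'Emp_Id', 'Emp_Name']
print(sorted(closure({"Dept#"}, fds)))
# ['Dept#', 'DeptName']
```

DeptName enters the closure of Emp_Id only through Dept#, which is exactly the transitive dependency that 3NF rules out.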
Third Normal Form
• A relation violates 3NF if some non-key attribute is a fact about another non-key attribute.
• A relation is in 3NF if it is in 2NF and the non-key attributes are mutually independent.
• A relation satisfies 3NF if it is in 2NF (and therefore also in 1NF) and every attribute is either part of the key or provides a fact about the key (all of it) and nothing else.
Third Normal Form
• A relvar is in 3NF if and only if it is in 2NF and every nonkey attribute is nontransitively dependent on the primary key
• (Assumes only one candidate key)
• The process of normalization is a series of projections that eliminate complex functional dependencies
• Such projections must be able to be recombined via JOIN to form the original relvar
Third Normal Form
A table is in 3NF if every column is either the key, or part of the key, or a fact about
the key, the whole key, and nothing but the key.
Third Normal Form
• A relvar is in 3NF if and only if the nonkey attributes are both mutually independent and irreducibly dependent on the primary key
• A relvar is in 3NF if and only if, for all time, each tuple consists of a primary key value that identifies some entity, together with a set of zero or more mutually independent attribute values that describe that entity in some way
Nonloss Decomposition and Functional Dependencies
• Normalization uses a process of projection to decompose relvars
• Recomposition is a process of joins
• The decomposition of relvar R into projections R1…Rn is nonloss if R = the join of R1…Rn
• The normalization procedure can be seen as a method for eliminating functional dependencies that do not emanate from a candidate key
+------+--------+------+------+
| snum | scity  | pnum | qty  |
+------+--------+------+------+
| S1   | London | P1   | 300  |
| S1   | London | P2   | 200  |
| S2   | Paris  | P1   | 300  |
| S2   | Paris  | P2   | 400  |
| S3   | Paris  | P2   | 200  |
| S4   | London | P2   | 200  |
| S4   | London | P4   | 300  |
| S4   | London | P5   | 400  |
+------+--------+------+------+
The table “SCP”
decompose by projection
+------+--------+
| snum | scity  |
+------+--------+
“S”
+------+------+------+
| snum | pnum | qty  |
+------+------+------+
“SP”
the decomposition is lossless since a join of the two tables reproduces the original
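The nonloss claim for SCP can be checked by projecting and re-joining the sample rows. This is a minimal sketch under stated assumptions: `project` and `natural_join` are hypothetical helpers written for this example, each tuple is modeled as a frozenset of (attribute, value) pairs, and the second decomposition (splitting on scity) is added only as a contrasting lossy case.

```python
ATTRS = ("snum", "scity", "pnum", "qty")
SCP = {
    frozenset(zip(ATTRS, values))
    for values in [
        ("S1", "London", "P1", 300), ("S1", "London", "P2", 200),
        ("S2", "Paris", "P1", 300), ("S2", "Paris", "P2", 400),
        ("S3", "Paris", "P2", 200), ("S4", "London", "P2", 200),
        ("S4", "London", "P4", 300), ("S4", "London", "P5", 400),
    ]
}


def project(rows, attrs):
    """Keep only the named attributes; the set drops duplicate tuples."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rows}


def natural_join(left, right):
    """Combine tuples that agree on every attribute they share."""
    joined = set()
    for l_row in left:
        l = dict(l_row)
        for r_row in right:
            r = dict(r_row)
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                joined.add(frozenset({**l, **r}.items()))
    return joined


# The decomposition from the slide: S {snum, scity} and SP {snum, pnum, qty}.
S = project(SCP, {"snum", "scity"})
SP = project(SCP, {"snum", "pnum", "qty"})
print(natural_join(S, SP) == SCP)  # True: nonloss

# Contrast (not from the slide): splitting on scity creates spurious tuples.
BY_CITY = project(SCP, {"scity", "pnum", "qty"})
print(natural_join(S, BY_CITY) == SCP)  # False: lossy
```

The first decomposition is nonloss because the shared attribute snum determines scity; the second shares only scity, which determines nothing else.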
+--------+-----------+--------+------------+
| Emp_Id | Emp_Name  | Dept#  | DeptName   |
+--------+-----------+--------+------------+
| A001   | Johnson   | 10     | Accounting |
| A023   | Chung     | 10     | Accounting |
| C085   | Allen     | 10     | Accounting |
| B120   | Gomez     | 20     | Sales      |
| B211   | Davis     | 20     | Sales      |
| A227   | Greenberg | 40     | Production |
| C340   | Brown     | 40     | Production |
| C389   | Lopez     | 40     | Production |
| C395   | Clark     | 40     | Production |
| A502   | Edwards   | 20     | Sales      |
| A616   | Scott     | 40     | Production |
| A700   | Sanyo     | 60     | Delivery   |
+--------+-----------+--------+------------+
“Employees”
decompose by projection
+--------+-----------+--------+
| Emp_Id | Emp_Name  | Dept#  |
+--------+-----------+--------+
“Employees”
+--------+------------+
| Dept#  | DeptName   |
+--------+------------+
“Departments”
the decomposition is lossless since a join of the two tables reproduces the original
Dependency Preservation
• Dependency preservation refers to a specific case of nonloss decomposition, such that the normalized relvars are independent of each other
• Some nonloss decompositions do not exhibit dependency preservation
• Example: decompose supplier, city, status where supplier implies city and status, and city and status imply each other
Dependency Preservation
• Dependency is preserved in this projection:
• SC {S#, CITY}
• CS {CITY, STATUS}
• Dependency is not preserved in this one:
• SC {S#, CITY}
• CS {S#, STATUS}
• Although the second is nonloss, you still cannot update them independently
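The difference between the two decompositions can be checked with the standard dependency-preservation test, which asks whether each FD can still be derived using only FDs that are enforceable inside a single projection. The sketch below is illustrative: the helper names are hypothetical, and the FD set is taken from the description above (supplier implies city and status; city and status imply each other).

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserved(fd, projections, fds):
    """True if fd follows from the FDs that hold within single projections."""
    lhs, rhs = fd
    reachable = set(lhs)
    changed = True
    while changed:
        changed = False
        for attrs in projections:
            # What this projection alone lets us infer from what we have.
            gained = closure(reachable & attrs, fds) & attrs
            if not gained <= reachable:
                reachable |= gained
                changed = True
    return rhs <= reachable


FDS = [
    ({"S#"}, {"CITY"}),
    ({"S#"}, {"STATUS"}),
    ({"CITY"}, {"STATUS"}),
    ({"STATUS"}, {"CITY"}),
]

first = [{"S#", "CITY"}, {"CITY", "STATUS"}]
second = [{"S#", "CITY"}, {"S#", "STATUS"}]

print(all(preserved(fd, first, FDS) for fd in FDS))   # True
print([fd for fd in FDS if not preserved(fd, second, FDS)])
# [({'CITY'}, {'STATUS'}), ({'STATUS'}, {'CITY'})]
```

In the second decomposition, the dependency between CITY and STATUS spans both projections, so an update to one projection has to be checked against the other.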
+------+--------+------+------+
| snum | sname  | pnum | qty  |
+------+--------+------+------+
| S1   | Smith  | P1   | 300  |
| S1   | Smith  | P2   | 200  |
| S2   | Jones  | P1   | 300  |
| S2   | Jones  | P2   | 400  |
| S3   | Blake  | P2   | 200  |
| S4   | Clark  | P2   | 200  |
| S4   | Clark  | P4   | 300  |
| S4   | Clark  | P5   | 400  |
+------+--------+------+------+
The table “SSP”
again, assume unique supplier names
candidate key: {snum, pnum} → qty
candidate key: {sname, pnum} → qty
but: snum → sname
obviously bad (redundancy, etc.) but satisfies 3NF:
every attribute is key, part of the key, or about key, whole key, nothing but key
violates BCNF
Boyce/Codd Normal Form
• BCNF matters for relvars with more than one candidate key, where the candidate keys are composite and overlapping
• A relvar is in BCNF if and only if every nontrivial, left-irreducible FD has a candidate key as its determinant
• That is, a relvar is in BCNF if and only if every determinant is a candidate key
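The “every determinant is a candidate key” rule can be tested by checking, for each nontrivial FD, whether its left side determines every attribute of the relvar. The following is a sketch rather than a complete tool: `bcnf_violations` is a hypothetical helper, it treats any superkey as acceptable (which matches the rule when the FDs are left-irreducible), and the FD list for SSP assumes unique supplier names.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """List the nontrivial FDs whose determinant is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FD, always allowed
        if closure(lhs, fds) != set(attrs):
            violations.append((lhs, rhs))
    return violations


# SSP {snum, sname, pnum, qty}, assuming supplier names are unique.
SSP_FDS = [
    (("snum",), ("sname",)),
    (("sname",), ("snum",)),
    (("snum", "pnum"), ("qty",)),
    (("sname", "pnum"), ("qty",)),
]
print(bcnf_violations(("snum", "sname", "pnum", "qty"), SSP_FDS))
# [(('snum',), ('sname',)), (('sname',), ('snum',))]

# After projecting into S {snum, sname} and SP {snum, pnum, qty}:
S_FDS = [(("snum",), ("sname",)), (("sname",), ("snum",))]
SP_FDS = [(("snum", "pnum"), ("qty",))]
print(bcnf_violations(("snum", "sname"), S_FDS))        # []
print(bcnf_violations(("snum", "pnum", "qty"), SP_FDS))  # []
```

In SSP the determinants snum and sname are not keys of the whole relvar, which is the BCNF violation; in the projections every determinant is a key.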
+------+--------+------+------+
| snum | sname  | pnum | qty  |
+------+--------+------+------+
| S1   | Smith  | P1   | 300  |
| S1   | Smith  | P2   | 200  |
| S2   | Jones  | P1   | 300  |
| S2   | Jones  | P2   | 400  |
| S3   | Blake  | P2   | 200  |
| S4   | Clark  | P2   | 200  |
| S4   | Clark  | P4   | 300  |
| S4   | Clark  | P5   | 400  |
+------+--------+------+------+
The table “SSP”
again, assume unique supplier names
decompose by projection
+------+--------+
| snum | sname  |
+------+--------+
“S”
+------+------+------+
| snum | pnum | qty  |
+------+------+------+
“SP”
the decomposition is lossless since a join of the two tables reproduces the original
+--------+-----------+--------+------------+
| Emp_Id | Emp_Name  | Skill  | Language   |
+--------+-----------+--------+------------+
| A001   | Johnson   | Cook   | English    |
| A001   | Johnson   | Cook   | French     |
| A001   | Johnson   | Cook   | Spanish    |
| A001   | Johnson   | Type   | English    |
| A001   | Johnson   | Type   | French     |
| A001   | Johnson   | Type   | Spanish    |
| B211   | Davis     | Weld   | English    |
| B211   | Davis     | Weld   | German     |
| B211   | Davis     | Type   | English    |
| B211   | Davis     | Type   | German     |
etc.
+--------+-----------+--------+------------+
“Employees”
In BCNF, but lots of redundancy (violates 4NF)
(multi-valued dependencies)
again, the solution is projection--a skills table and a language table
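The projection into a skills table and a language table can be sketched on the ten rows shown. This is illustrative only: it uses just the visible rows (the slide's “etc.” rows are not reconstructed), and keeping Emp_Name in the skills projection is an assumption made for the example.

```python
# The ten (Emp_Id, Emp_Name, Skill, Language) rows shown on the slide.
rows = {
    ("A001", "Johnson", "Cook", "English"),
    ("A001", "Johnson", "Cook", "French"),
    ("A001", "Johnson", "Cook", "Spanish"),
    ("A001", "Johnson", "Type", "English"),
    ("A001", "Johnson", "Type", "French"),
    ("A001", "Johnson", "Type", "Spanish"),
    ("B211", "Davis", "Weld", "English"),
    ("B211", "Davis", "Weld", "German"),
    ("B211", "Davis", "Type", "English"),
    ("B211", "Davis", "Type", "German"),
}

# Project: skills and languages are independent facts about an employee.
skills = {(emp, name, skill) for emp, name, skill, _ in rows}
languages = {(emp, lang) for emp, _, _, lang in rows}

# Re-join on Emp_Id to confirm that nothing was lost.
rejoined = {
    (emp, name, skill, lang)
    for emp, name, skill in skills
    for emp2, lang in languages
    if emp == emp2
}

print(rejoined == rows)                        # True
print(len(rows), len(skills), len(languages))  # 10 4 5
```

Each skill and each language is now recorded once per employee, instead of once per skill and language combination.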
The Normalization Process
• The “normal forms” are simply formalisms for describing design flaws that are usually apparent and that cause real problems.
• They are usually apparent in the form of redundancies, and common sense says to remove them.
• Removal is a process of projecting the offending (proposed) table into two or more tables (in a lossless way).
Relation-Valued Attributes
• A relation may include attributes whose values are relations
• Traditionally this would be seen to violate 1NF, which was held to prohibit repeating groups
• Now they are theoretically sound, but in practice you should avoid them because they have complicated predicates