Schema Refinement and Normal Formskrunapon/courses/178370/slides/lecture4.pdf · to BCNF, 4NF, and...
-
Upload
vuongthuan -
Category
Documents
-
view
219 -
download
2
Transcript of Schema Refinement and Normal Formskrunapon/courses/178370/slides/lecture4.pdf · to BCNF, 4NF, and...
Electricite Du Laos (EDL)
Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 1
1
Asst. Prof. Dr. Kanda
Runapongsa Saikaew
Department of Computer Engineering
Khon Kaen University
Database Normalization
2
Overview
What and why normalization
Background to normalization
Functional dependency
Superkey
Candidate key
Primary key
Normal forms and their purposes
1NF
2NF
3NF
3
What and why normalization
Normalization is a systematic way of ensuring that a
database structure is suitable for general-purpose
querying and free of certain undesirable
characteristics —insertion, update, and deletion
anomalies—that could lead to a loss of data integrity
Integrity constraints, in particular functional dependencies, can be used to identify schemas with
such problems and to suggest refinements.
Electricite Du Laos (EDL)
Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 2
Normalized tables
Informally, a relational database table
is often described as "normalized" if it
is in the third normal form (3NF)
Most 3NF tables are free of insertion,
update, and deletion anomalies, i.e. in
most cases 3NF tables adhere
to BCNF, 4NF, and 5NF (but typically
not 6NF)
4
When we apply normalization
A standard piece of database design
guidance is that the designer should
begin by fully normalizing the design
Selectively denormalize only in places
where doing so is absolutely necessary
to address performance issues
Some modeling disciplines, such as the
dimensional modeling approach to data
warehouse design, explicitly
recommend non-normalized designs 5
6
Functional dependency
Functional dependency: Attribute B has a
functional dependency on attribute B (i.e., A ->
B)
If, for each value of attribute A, there is exactly one
value of attribute B
If value of A is repeating in tuples then value of B
will also repeat
In our example, branch address and
branch phone number has a functional
dependency on branch no.
Electricite Du Laos (EDL)
Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 3
7
All instance that satisfies AB C
A B C D
a1 b1 c1 d1
a1 b1 c1 d2
a1 b2 c2 d1
a2 b1 c3 d1
8
Trivial functional dependency
Trivial functional dependency
A trivial functional dependency is a
functional dependency of attribute of a
superset of itself
Examples
{Branch No, Branch Address} -> {Branch
Address}
{Branch Address} -> {Branch Address}
Full functional dependency
An attribute is fully functional dependency
on a set of attributes X if it is
Functionally dependent on X, and
Not functionally dependent on any
proper subset of X
Example:
{Employee Address} has a functional
dependency on {Employee ID, Skill}, but
not a full functional dependency
because it is also dependent on
{Employee ID} 9
Electricite Du Laos (EDL)
Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 4
Transitive dependency
Transitive dependency is an indirect
functional dependency, one in
which X→Z only by virtue
of X→Y and Y→Z
Example:
Employee ID -> Employee office
Employee office -> Employee phone
Employee ID -> Employee phone
10
Superkey
Superkey
A superkey is an attribute or set of
attributes that uniquely identifies rows
within a table
Two distinct rows are always guaranteed to
have distinct superkeys
Example
{Employee ID, Salary, Year} would be a
superkey for the {Employee ID, Salary,
Bonus, Year} table
{Employee ID, Year} would also be a
superkey11
Candidate key
Candidate key is a minimal superkey,
that is, a superkey for which we can
say that no proper subset of it is also
a superkey
Example:
{Employee ID, Year} would be a
candidate key for the table {Employee
ID, Salary, Bonus, Year} table.
12
Electricite Du Laos (EDL)
Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 5
Non-prime attribute
A non-prime attribute is an attribute
that does not occur in any candidate
key
Bonus would be a non-prime attribute
in the "Employees' Bonus" table.
13
Primary key
Most DBMSs require a table to be
defined as having a single unique
key, rather than a number of possible
unique keys
A primary key is a candidate key
which the database designer has
designated for this purpose
14
15
Example: Constraints on Entity Set
Consider relation obtained from Hourly_Emps:
Hourly_Emps (ssn, name, lot, rating, hrly_wages,
hrs_worked)
Notation: We will denote this relation schema by
listing the attributes: SNLRWH
This is really the set of attributes {S,N,L,R,W,H}.
Sometimes, we will refer to all attributes of a relation by
using the relation name. (e.g., Hourly_Emps for SNLRWH)
Some FDs on Hourly_Emps:
ssn is the key: S SNLRWH
rating determines hrly_wages: R W
Electricite Du Laos (EDL)
Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 6
16
Data redundancy example1 (1/3)
ssn name lot rating Hr_Wages Hr_Worked
1 A 48 8 10 40
2 B 22 8 10 30
3 C 35 5 7 30
4 D 35 5 7 32
5 E 35 8 10 40
17
Data redundancy example1 (2/3)
Redundant Storage: The rating value
8 corresponds to the hourly wage 10,
and this association is repeated three
times
Update Anomalies: The
hourly_wages in the first tuple could
be updated without making a similar
change in the second tuple
18
Data redundancy example1 (3/3)
Insertion Anomalies: We cannot insert a
tuple for an employee unless we know the
hourly wage for the employee‟s rating
value
Deletion Anomalies: If we delete all tuples
with a given rating value (e.g., we delete
the tuples for C and D) we lose the
association between the rating value and
its hourly_wage value
Electricite Du Laos (EDL)
Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 7
19
Decompositions (1/2)
Redundancy arises when a relational
schema forces an association
between attributes that is not natural.
Functional dependencies can be used
to identify such situations and
suggest refinement to the schema
20
Decompositions (2/2)
The essential idea is that many problems
arising from redundancy can be addressed
by replacing a relation with a collection of
„smaller‟ relations
R consists of replacing the relation
schema by two (or more) relation schemas
that each contain a subset of attributes of
R and together include all attributes in R
21
Examples
We can decompose Hourly_Emps into two relations:
Hourly_Emps2(ssn, name, lot, rating, hours_worked)
Wages(rating, hourly_wages)
This is more efficient than updating several tuples (as in the original design), and it eliminates the potential for inconsistency
Electricite Du Laos (EDL)
Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 8
Data redundancy example2
22
Redundant Storage: the branchNo B001 and B002 corresponds to their branchAddress and telNo are repeated
Example2 Problem
Update Anomalies: If we change address of
Tom Daniels, we also need to change
address of Sally Adams
Insertion Anomalies: We cannot insert a
tuple for a staff unless we know the address
and phone number of the branch of that stuff
Deletion Anomalies: If we delete all tuples
with a given branch value (e.g., we delete the
tuples for branch B002) we lose the
association between the branch and its
address and phone number23
Example2 Solution
24
Electricite Du Laos (EDL)
Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 9
25
Normal Forms (1/2)
Given a relation schema, we need to
decide whether it is a good design or
we need to decompose it into smaller
relations
To provide such guidance, several
normal forms have been proposed
26
Normal Forms (2/2)
The normal forms based on FDs are
first normal form (1NF), second
normal form (2NF), and third normal
form (3NF)
Every relation in 3NF is also in 2NF
Every relation in 2NF is also in 1NF
First Normal Form (1NF)
A relation is in first normal form if
every field contains only atomic
values, that is, no lists or sets
This requirement is implicit in our
definition of the relation model
Table faithfully represents
a relation and has no "repeating
groups"
27
Electricite Du Laos (EDL)
Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 10
Branch Table is not in 1NF
28
Converting Branch Table into 1NF
29
Second normal form (2NF)
2NF only applies to tables with
composite primary keys.
No non-prime attribute in the table
is functionally dependent on a part
(proper subset) of a candidate key
30
Electricite Du Laos (EDL)
Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 11
TempStaffAllocation Table is not in 2NF
31
Formal Definition of 2NF
Formal definition of 2NF is a table
that is in 1NF and every non-prime
attribute is fully functional dependent on the primary key
Full functional dependency indicates
that if A and B are columns of a table,
B is fully dependent on A if B is
functionally dependent on A but not
on any proper subset of A
32
Converting TempStaffAllocation table to 2NF
33
Electricite Du Laos (EDL)
Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 12
Third Normal Form (3NF)
A table that is in 1NF and 2NF and in
which all non-prime attributes can be
derived from only the primary key column(s) and no other columns
The formal definition of 3NF is a table
that is in 1NF and 2NF and in which
no non-prime attributes is transitively
dependent on the primary key
34
StaffBranch table is not in 3NF
35
Converting StaffBranch table to 3NF
36
Electricite Du Laos (EDL)
Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU 13
Summary (1/2)
Normalization is a systematic way of
ensuring that a database structure is
suitable for general-purpose querying
Free of certain undesirable
characteristics—insertion, update, and
deletion anomalies—that could lead to a
loss of data integrity
Informally, a relational database table
is often described as "normalized" if it
is in the 3NF37
Summary (2/2)
1NF: Table faithfully represents
a relation and has no "repeating
groups"
2NF: No non-prime attribute in the
table is functionally dependent on a
part (proper subset) of a candidate
key
3NF: Every non-prime attribute is
non-transitively dependent on every
key of the table 38
References
Connolly and Begg, “Database
Systems: A Practical Approach to
Design, Implementation and
Management”, Pearson, 2004
http://en.wikipedia.org/wiki/Database_
normalization
Ramakrishnan and Gehrke,
“Database Management Systems”,
McGraw-Hill
Science/Engineering/Math, 2003 39