Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee.
-
date post
22-Dec-2015 -
Category
Documents
-
view
237 -
download
5
Transcript of Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee.
Boyce-Codd NF & Lossless Decomposition
Professor Sin-Min Lee
Armstrong’s Axioms
For computing the set of FDs that follow a given FD, the following rules called Armstrong’s axioms are useful:
1. Reflexivity: If B A, then A B
2. Augmentation: If A B, then A C B C Note also that if A B, then A C B for any set of attributes C.
3. Transitivity: If A B and B C then A C
Example of 3NF but not BCNF
R(A B C D) 1 2 1 2 1 3 2 1 2 1 1 1 1 3 2 2 2 1 3 2 Is this table BCNF?
FD graph of R is A B C D B A AC B *BD A, C *CD A, B Prime Attribute = key attributes are
{B,C,D} B A and B is not a key, So R is not 3NF R is not BCNF
Projecting FDs
Given a relation R (A,B,C,D) and F(R) = {AB, BC, CD}.
Suppose S is projected from R as S(A,C,D). What is F(S). To compute F(S), start by computing the closures of all attributes in S.
In R, A+ = {AB, AC, AD}In S, A+ = {AC, AD} C+ = {CD} and D+ = {D}
Since A+ contains all attributes of S, it is not required to compute (AC)+, (AD)+ or (ACD)+.
Inference Rules for FD’s
Is equivalent to
Splitting rule and Combining rule
A1 ... Am B1 ... Bm
A1, A2, …, An B1, B2, …, BmA1, A2, …, An B1, B2, …, Bm
A1, A2, …, An B1
A1, A2, …, An B2
. . . . .A1, A2, …, An Bm
A1, A2, …, An B1
A1, A2, …, An B2
. . . . .A1, A2, …, An Bm
Inference Rules for FD’s(continued)
Trivial Rule
Why ?
A1 … Am
where i = 1, 2, ..., n
A1, A2, …, An AiA1, A2, …, An Ai
Inference Rules for FD’s(continued)
Transitive Closure Rule
If
and
then
Why ?
A1, A2, …, An B1, B2, …, BmA1, A2, …, An B1, B2, …, Bm
B1, B2, …, Bm C1, C2, …, CpB1, B2, …, Bm C1, C2, …, Cp
A1, A2, …, An C1, C2, …, CpA1, A2, …, An C1, C2, …, Cp
A1 … Am B1 … Bm C1 ... Cp
Example (continued)
Start from the following FDs:
Infer the following FDs:
1. name color2. category department3. color, category price
1. name color2. category department3. color, category price
Inferred FDWhich Ruledid we apply ?
4. name, category name
5. name, category color
6. name, category category
7. name, category color, category
8. name, category price
Another Rule
If
then
Augmentation follows from trivial rules and transitivityHow ?
A1, A2, …, An BA1, A2, …, An B
A1, A2, …, An , C1, C2, …, Cp BA1, A2, …, An , C1, C2, …, Cp B
Augmentation
Problem: infer ALL FDs
Given a set of FDs, infer all possible FDs
How to proceed ? Try all possible FDs, apply all 3 rules
E.g. R(A, B, C, D): how many FDs are possible ?
Drop trivial FDs, drop augmented FDs Still way too many
Better: use the Closure Algorithm (next)
Closure of a set of Attributes
Given a set of attributes A1, …, An
The closure, {A1, …, An}+ , is the set of attributes Bs.t. A1, …, An B
Given a set of attributes A1, …, An
The closure, {A1, …, An}+ , is the set of attributes Bs.t. A1, …, An B
name colorcategory departmentcolor, category price
name colorcategory departmentcolor, category price
Example:
Closures: name+ = {name, color} {name, category}+ = {name, category, color, department, price} color+ = {color}
Closure Algorithm
Start with X={A1, …, An}.
Repeat until X doesn’t change do:
if B1, …, Bn C is a FD and B1, …, Bn are all in X then add C to X.
Start with X={A1, …, An}.
Repeat until X doesn’t change do:
if B1, …, Bn C is a FD and B1, …, Bn are all in X then add C to X.
{name, category}+ = {name, category, color, department, price}
name colorcategory departmentcolor, category price
name colorcategory departmentcolor, category price
Example:
Example
Compute {A,B}+ X = {A, B, }
Compute {A, F}+ X = {A, F, }
R(A,B,C,D,E,F) A, B CA, D EB DA, F B
A, B CA, D EB DA, F B
Using Closure to Infer ALL FDs
A, B CA, D BB D
A, B CA, D BB D
Example:
Step 1: Compute X+, for every X:
A+ = A, B+ = BD, C+ = C, D+ = DAB+ = ABCD, AC+ = AC, AD+ = ABCDABC+ = ABD+ = ACD+ = ABCD (no need to compute– why ?)BCD+ = BCD, ABCD+ = ABCD
A+ = A, B+ = BD, C+ = C, D+ = DAB+ = ABCD, AC+ = AC, AD+ = ABCDABC+ = ABD+ = ACD+ = ABCD (no need to compute– why ?)BCD+ = BCD, ABCD+ = ABCD
Step 2: Enumerate all FD’s X Y, s.t. Y X+ and XY = :
AB CD, ADBC, ABC D, ABD C, ACD BAB CD, ADBC, ABC D, ABD C, ACD B
Problem: Finding FDs
Approach 1: During Database Design Designer derives them from real-world
knowledge of users Problem: knowledge might not be available
Approach 2: From a Database Instance Analyze given database instance and find all
FD’s satisfied by that instance Useful if designers don’t get enough
information from users Problem: FDs might be artifical for the given
instance
Find All FDs
Student Dept Course Room
Alice CSE C++ 020
Bob CSE C++ 020
Alice EE HW 040
Carol CSE DB 045
Dan CSE Java 050
Elsa CSE DB 045
Frank EE Circuits 020
Do all FDsmake sensein practice ?
Answer
Course Dept, RoomDept, Room CourseStudent, Dept Course, RoomStudent, Course Dept, RoomStudent, Room Dept, Course
Course Dept, RoomDept, Room CourseStudent, Dept Course, RoomStudent, Course Dept, RoomStudent, Room Dept, Course
Do all FDsmake sensein practice ?
Keys
A key is a set of attributes A1, ..., An s.t. for any other attribute B, we have A1, ..., An B
A minimal key is a set of attributes which is a key and for which no subset is a key
Note: book calls them superkey and key
Computing Keys
Compute X+ for all sets X If X+ = all attributes, then X is a key List only the minimal keys
Note: there can be many minimal keys !
Example: R(A,B,C), ABC, BCAMinimal keys: AB and BC
Examples of Keys
Product(name, price, category, color)name, category pricecategory color
Keys are: {name, category} and all supersets
Enrollment(student, address, course, room, time)student addressroom, time coursestudent, course room, time
Keys are:
Relational Schema Design(or Logical Schema Design)
Main idea: Start with some relational schema Find out its FD’s Use them to design a better
relational schema
Data Anomalies
When a database is poorly designed we get anomalies:
Redundancy: data is repeated
Update anomalies: need to change in several places
Delete anomalies: may lose data when we don’t want
Relational Schema Design
Anomalies:• Redundancy = repeat data• Update anomalies = Fred moves to “Bellevue”• Deletion anomalies = Joe deletes his phone number:
what is his city ?
Example: Persons with several phones
SSN Name, CitySSN Name, City
Name SSN PhoneNumber
City
Fred 123-45-6789 206-555-1234
Seattle
Fred 123-45-6789 206-555-6543
Seattle
Joe 987-65-4321 908-555-2121
Westfieldbut not SSN PhoneNumber
Relation DecompositionBreak the relation into two:
Name SSN City
Fred 123-45-6789 Seattle
Joe 987-65-4321 Westfield
SSN PhoneNumber
123-45-6789 206-555-1234
123-45-6789 206-555-6543
987-65-4321 908-555-2121
Anomalies have gone:• No more repeated data• Easy to move Fred to “Bellevue” (how ?)• Easy to delete all Joe’s phone number (how ?)
Name SSN PhoneNumber City
Fred 123-45-6789 206-555-1234 Seattle
Fred 123-45-6789 206-555-6543 Seattle
Joe 987-65-4321 908-555-2121 Westfield
Relational Schema Design
PersonbuysProduct
name
price name ssn
Conceptual Model:
Relational Model:plus FD’s
Normalization:Eliminates anomalies
Decompositions in General
R1 = projection of R on A1, ..., An, B1, ..., Bm
R2 = projection of R on A1, ..., An, C1, ..., Cp
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm)R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)R2(A1, ..., An, C1, ..., Cp)
Decomposition
Sometimes it is correct:
Name PriceCategory
Gizmo 19.99 Gadget
OneClick 24.99 Camera
Gizmo 19.99 Camera
Name Price
Gizmo 19.99
OneClick 24.99
Gizmo 19.99
NameCategory
Gizmo Gadget
OneClick Camera
Gizmo Camera
Lossless decomposition
Incorrect Decomposition
Sometimes it is not:
Name PriceCategory
Gizmo 19.99 Gadget
OneClick 24.99 Camera
Gizmo 19.99 Camera
NameCategory
Gizmo Gadget
OneClick Camera
Gizmo Camera
PriceCategory
19.99 Gadget
24.99 Camera
19.99 Camera
What’sincorrect ??
Lossy decomposition
Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
If A1, ..., An B1, ..., Bm
Then the decomposition is lossless
R1(A1, ..., An, B1, ..., Bm)R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)R2(A1, ..., An, C1, ..., Cp)
Example: name price, hence the first decomposition is lossless
Note: don’t need necessarily A1, ..., An C1, ..., Cp
Normal Forms
First Normal Form = all attributes are atomic
Second Normal Form (2NF) = old and obsolete
Third Normal Form (3NF) = this lecture
Boyce Codd Normal Form (BCNF) = this lecture
Others...
R (J, K, L)
F = (JK L, L K)
Two candidate keys: JK and JL
R is in 3NFJK L JK is a superkey
L K K is prime
BCNF decomposition yields:R1 (L,K), R2 (L,J)
testing for JK L requires a join
There is some redundancy in R
Boyce-Codd Normal Form
A simple condition for removing anomalies from relations:
In English (though a bit vague):
Whenever a set of attributes of R is determining another attribute, it should determine all the attributes of R.
A relation R is in BCNF if:
If A1, ..., An B is a non-trivial dependency
in R , then {A1, ..., An} is a key for R
A relation R is in BCNF if:
If A1, ..., An B is a non-trivial dependency
in R , then {A1, ..., An} is a key for R
BCNF Decomposition Algorithm
A’s OthersB’s
R1
Is there a 2-attribute relation that isnot in BCNF ?
Repeat choose A1, …, Am B1, …, Bn that violates the BNCF condition split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others]) continue with both R1 and R2
Until no more violations
Repeat choose A1, …, Am B1, …, Bn that violates the BNCF condition split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others]) continue with both R1 and R2
Until no more violations
R2
Example
What are the dependencies?SSN Name, City
What are the keys?{SSN, PhoneNumber}
Is it in BCNF?
Name SSN PhoneNumber
City
Fred 123-45-6789 206-555-1234
Seattle
Fred 123-45-6789 206-555-6543
Seattle
Joe 987-65-4321 908-555-2121
Westfield
Joe 987-65-4321 908-555-1234
Westfield
Decompose it into BCNF
Name SSN City
Fred 123-45-6789 Seattle
Joe 987-65-4321 Westfield
SSN PhoneNumber
123-45-6789 206-555-1234
123-45-6789 206-555-6543
987-65-4321 908-555-2121
987-65-4321 908-555-1234
SSN Name, City
Let’s check anomalies:• Redundancy ?• Update ?• Delete ?
Summary of BCNF Decomposition
Find a dependency that violates the BCNF condition:
A’sOthers B’s
R1 R2
Heuristics: choose B , B , … B “as large as possible”1 2 m
Decompose:
2-attribute relations are BCNF
Continue untilthere are noBCNF violationsleft.
A1, A2, …, An B1, B2, …, Bm
Example Decomposition Person(name, SSN, age, hairColor, phoneNumber)
SSN name, ageage hairColor
Decompose in BCNF (in class):
Step 1: find all keys (How ? Compute S+, for various sets S)
Step 2: now decompose
Other Example
R(A,B,C,D) A B, B C
Key: AD Violations of BCNF: A B, A C, ABC Pick A BC: split into R1(A,BC)
R2(A,D) What happens if we pick A B first ?
Lossless Decompositions A decomposition is lossless if we can recover: R(A,B,C)
R1(A,B) R2(A,C)
R’(A,B,C) should be the same as R(A,B,C)
R’ is in general larger than R. Must ensure R’ = R
Decompose
Recover
Lossless Decompositions
Given R(A,B,C) s.t. AB, the decomposition into R1(A,B), R2(A,C) is lossless
3NF: A Problem with BCNF
Unit Company Product
Unit Company
Unit Product
FD’s: Unit Company; Company, Product UnitSo, there is a BCNF violation, and we decompose.
Unit Company
No FDs
Notice: we loose the FD: Company, Product Unit
So What’s the Problem?
Unit Company Product
Unit Company Unit Product
Galaga99 UW Galaga99 databasesBingo UW Bingo databases
No problem so far. All local FD’s are satisfied.Let’s put all the data back into a single table again (anomalies?):
Galaga99 UW databasesBingo UW databases
Violates the dependency: company, product -> unit!
Solution: 3rd Normal Form (3NF)
A simple condition for removing anomalies from relations:
A relation R is in 3rd normal form if :
Whenever there is a nontrivial dependency A1, A2, ..., An Bfor R , then {A1, A2, ..., An } is a key for R, or B is part of a key.
A relation R is in 3rd normal form if :
Whenever there is a nontrivial dependency A1, A2, ..., An Bfor R , then {A1, A2, ..., An } is a key for R, or B is part of a key.
Tradeoff:BCNF = no anomalies, but may lose some FDs3NF = keeps all FDs, but may have some anomalies
Purpose of Normalization
To reduce the chances for anomalies to occur in a database.
normalization prevents the possible corruption of databases stemming from what are called “insertion anomalies," "deletion anomalies," and "update anomalies."
Insertion Anomaly
A failure to place a new database entry into all the places in the database where that new entry needs to be stored.
In a properly normalized database, a new entry needs to be inserted into only one place in the database
Deletion Anomaly
A failure to remove an existing database entry when it is time to remove that entry.
In a properly normalized database, an old, to-be-gotten-rid-of entry needs to be deleted from only one place in the database
Update anomaly
An update of a database involves modifications that may be additions, deletions, or both. Thus "update anomalies" can be either of the kinds of anomalies discussed above.