Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee.

Boyce-Codd NF & Lossless Decomposition

Professor Sin-Min Lee

Armstrong’s Axioms

For computing the set of FDs that follow a given FD, the following rules called Armstrong’s axioms are useful:

1. Reflexivity: If B A, then A B

2. Augmentation: If A B, then A C B C Note also that if A B, then A C B for any set of attributes C.

3. Transitivity: If A B and B C then A C

Example of 3NF but not BCNF

R(A B C D) 1 2 1 2 1 3 2 1 2 1 1 1 1 3 2 2 2 1 3 2 Is this table BCNF?

FD graph of R is A B C D B A AC B *BD A, C *CD A, B Prime Attribute = key attributes are

{B,C,D} B A and B is not a key, So R is not 3NF R is not BCNF

Projecting FDs

Given a relation R (A,B,C,D) and F(R) = {AB, BC, CD}.

Suppose S is projected from R as S(A,C,D). What is F(S). To compute F(S), start by computing the closures of all attributes in S.

In R, A+ = {AB, AC, AD}In S, A+ = {AC, AD} C+ = {CD} and D+ = {D}

Since A+ contains all attributes of S, it is not required to compute (AC)+, (AD)+ or (ACD)+.

Inference Rules for FD’s

Is equivalent to

Splitting rule and Combining rule

A1 ... Am B1 ... Bm

A1, A2, …, An B1, B2, …, BmA1, A2, …, An B1, B2, …, Bm

A1, A2, …, An B1

A1, A2, …, An B2

. . . . .A1, A2, …, An Bm

A1, A2, …, An B1

A1, A2, …, An B2

. . . . .A1, A2, …, An Bm

Inference Rules for FD’s(continued)

Trivial Rule

Why ?

A1 … Am

where i = 1, 2, ..., n

A1, A2, …, An AiA1, A2, …, An Ai

Inference Rules for FD’s(continued)

Transitive Closure Rule

If

and

then

Why ?

A1, A2, …, An B1, B2, …, BmA1, A2, …, An B1, B2, …, Bm

B1, B2, …, Bm C1, C2, …, CpB1, B2, …, Bm C1, C2, …, Cp

A1, A2, …, An C1, C2, …, CpA1, A2, …, An C1, C2, …, Cp

A1 … Am B1 … Bm C1 ... Cp

Example (continued)

Start from the following FDs:

Infer the following FDs:

1. name color2. category department3. color, category price

1. name color2. category department3. color, category price

Inferred FDWhich Ruledid we apply ?

4. name, category name

5. name, category color

6. name, category category

7. name, category color, category

8. name, category price

Another Rule

If

then

Augmentation follows from trivial rules and transitivityHow ?

A1, A2, …, An BA1, A2, …, An B

A1, A2, …, An , C1, C2, …, Cp BA1, A2, …, An , C1, C2, …, Cp B

Augmentation

Problem: infer ALL FDs

Given a set of FDs, infer all possible FDs

How to proceed ? Try all possible FDs, apply all 3 rules

E.g. R(A, B, C, D): how many FDs are possible ?

Drop trivial FDs, drop augmented FDs Still way too many

Better: use the Closure Algorithm (next)

Closure of a set of Attributes

Given a set of attributes A1, …, An

The closure, {A1, …, An}+ , is the set of attributes Bs.t. A1, …, An B

Given a set of attributes A1, …, An

The closure, {A1, …, An}+ , is the set of attributes Bs.t. A1, …, An B

name colorcategory departmentcolor, category price


Example:

Closures: name+ = {name, color} {name, category}+ = {name, category, color, department, price} color+ = {color}

Closure Algorithm

Start with X={A1, …, An}.

Repeat until X doesn’t change do:

if B1, …, Bn C is a FD and B1, …, Bn are all in X then add C to X.

Start with X={A1, …, An}.

Repeat until X doesn’t change do:

if B1, …, Bn C is a FD and B1, …, Bn are all in X then add C to X.

{name, category}+ = {name, category, color, department, price}



Example:

Example

Compute {A,B}+ X = {A, B, }

Compute {A, F}+ X = {A, F, }

R(A,B,C,D,E,F) A, B CA, D EB DA, F B

A, B CA, D EB DA, F B

Using Closure to Infer ALL FDs

A, B CA, D BB D

A, B CA, D BB D

Example:

Step 1: Compute X+, for every X:

A+ = A, B+ = BD, C+ = C, D+ = DAB+ = ABCD, AC+ = AC, AD+ = ABCDABC+ = ABD+ = ACD+ = ABCD (no need to compute– why ?)BCD+ = BCD, ABCD+ = ABCD

A+ = A, B+ = BD, C+ = C, D+ = DAB+ = ABCD, AC+ = AC, AD+ = ABCDABC+ = ABD+ = ACD+ = ABCD (no need to compute– why ?)BCD+ = BCD, ABCD+ = ABCD

Step 2: Enumerate all FD’s X Y, s.t. Y X+ and XY = :

AB CD, ADBC, ABC D, ABD C, ACD BAB CD, ADBC, ABC D, ABD C, ACD B

Problem: Finding FDs

Approach 1: During Database Design Designer derives them from real-world

knowledge of users Problem: knowledge might not be available

Approach 2: From a Database Instance Analyze given database instance and find all

FD’s satisfied by that instance Useful if designers don’t get enough

information from users Problem: FDs might be artifical for the given

instance

Find All FDs

Student Dept Course Room

Alice CSE C++ 020

Bob CSE C++ 020

Alice EE HW 040

Carol CSE DB 045

Dan CSE Java 050

Elsa CSE DB 045

Frank EE Circuits 020

Do all FDsmake sensein practice ?

Answer

Course Dept, RoomDept, Room CourseStudent, Dept Course, RoomStudent, Course Dept, RoomStudent, Room Dept, Course

Course Dept, RoomDept, Room CourseStudent, Dept Course, RoomStudent, Course Dept, RoomStudent, Room Dept, Course

Do all FDsmake sensein practice ?

Keys

A key is a set of attributes A1, ..., An s.t. for any other attribute B, we have A1, ..., An B

A minimal key is a set of attributes which is a key and for which no subset is a key

Note: book calls them superkey and key

Computing Keys

Compute X+ for all sets X If X+ = all attributes, then X is a key List only the minimal keys

Note: there can be many minimal keys !

Example: R(A,B,C), ABC, BCAMinimal keys: AB and BC

Examples of Keys

Product(name, price, category, color)name, category pricecategory color

Keys are: {name, category} and all supersets

Enrollment(student, address, course, room, time)student addressroom, time coursestudent, course room, time

Keys are:

Relational Schema Design(or Logical Schema Design)

Main idea: Start with some relational schema Find out its FD’s Use them to design a better

relational schema

Data Anomalies

When a database is poorly designed we get anomalies:

Redundancy: data is repeated

Update anomalies: need to change in several places

Delete anomalies: may lose data when we don’t want

Relational Schema Design

Anomalies:• Redundancy = repeat data• Update anomalies = Fred moves to “Bellevue”• Deletion anomalies = Joe deletes his phone number:

what is his city ?

Example: Persons with several phones

SSN Name, CitySSN Name, City

Name SSN PhoneNumber

City

Fred 123-45-6789 206-555-1234

Seattle

Fred 123-45-6789 206-555-6543

Seattle

Joe 987-65-4321 908-555-2121

Westfieldbut not SSN PhoneNumber

Relation DecompositionBreak the relation into two:

Name SSN City

Fred 123-45-6789 Seattle

Joe 987-65-4321 Westfield

SSN PhoneNumber

123-45-6789 206-555-1234

123-45-6789 206-555-6543

987-65-4321 908-555-2121

Anomalies have gone:• No more repeated data• Easy to move Fred to “Bellevue” (how ?)• Easy to delete all Joe’s phone number (how ?)

Name SSN PhoneNumber City

Fred 123-45-6789 206-555-1234 Seattle

Fred 123-45-6789 206-555-6543 Seattle

Joe 987-65-4321 908-555-2121 Westfield

Relational Schema Design

PersonbuysProduct

name

price name ssn

Conceptual Model:

Relational Model:plus FD’s

Normalization:Eliminates anomalies

Decompositions in General

R1 = projection of R on A1, ..., An, B1, ..., Bm

R2 = projection of R on A1, ..., An, C1, ..., Cp

R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)

R1(A1, ..., An, B1, ..., Bm)R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)R2(A1, ..., An, C1, ..., Cp)

Decomposition

Sometimes it is correct:

Name PriceCategory

Gizmo 19.99 Gadget

OneClick 24.99 Camera

Gizmo 19.99 Camera

Name Price

Gizmo 19.99

OneClick 24.99

Gizmo 19.99

NameCategory

Gizmo Gadget

OneClick Camera

Gizmo Camera

Lossless decomposition

Incorrect Decomposition

Sometimes it is not:

Name PriceCategory

Gizmo 19.99 Gadget

OneClick 24.99 Camera

Gizmo 19.99 Camera

NameCategory

Gizmo Gadget

OneClick Camera

Gizmo Camera

PriceCategory

19.99 Gadget

24.99 Camera

19.99 Camera

What’sincorrect ??

Lossy decomposition

Decompositions in General

R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)

If A1, ..., An B1, ..., Bm

Then the decomposition is lossless

R1(A1, ..., An, B1, ..., Bm)R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)R2(A1, ..., An, C1, ..., Cp)

Example: name price, hence the first decomposition is lossless

Note: don’t need necessarily A1, ..., An C1, ..., Cp

Normal Forms

First Normal Form = all attributes are atomic

Second Normal Form (2NF) = old and obsolete

Third Normal Form (3NF) = this lecture

Boyce Codd Normal Form (BCNF) = this lecture

Others...

R (J, K, L)

F = (JK L, L K)

Two candidate keys: JK and JL

R is in 3NFJK L JK is a superkey

L K K is prime

BCNF decomposition yields:R1 (L,K), R2 (L,J)

testing for JK L requires a join

There is some redundancy in R

Boyce-Codd Normal Form

A simple condition for removing anomalies from relations:

In English (though a bit vague):

Whenever a set of attributes of R is determining another attribute, it should determine all the attributes of R.

A relation R is in BCNF if:

If A1, ..., An B is a non-trivial dependency

in R , then {A1, ..., An} is a key for R

A relation R is in BCNF if:

If A1, ..., An B is a non-trivial dependency

in R , then {A1, ..., An} is a key for R

BCNF Decomposition Algorithm

A’s OthersB’s

R1

Is there a 2-attribute relation that isnot in BCNF ?

Repeat choose A1, …, Am B1, …, Bn that violates the BNCF condition split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others]) continue with both R1 and R2

Until no more violations

Repeat choose A1, …, Am B1, …, Bn that violates the BNCF condition split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others]) continue with both R1 and R2

Until no more violations

R2

Example

What are the dependencies?SSN Name, City

What are the keys?{SSN, PhoneNumber}

Is it in BCNF?

Name SSN PhoneNumber

City

Fred 123-45-6789 206-555-1234

Seattle

Fred 123-45-6789 206-555-6543

Seattle

Joe 987-65-4321 908-555-2121

Westfield

Joe 987-65-4321 908-555-1234

Westfield

Decompose it into BCNF

Name SSN City

Fred 123-45-6789 Seattle

Joe 987-65-4321 Westfield

SSN PhoneNumber

123-45-6789 206-555-1234

123-45-6789 206-555-6543

987-65-4321 908-555-2121

987-65-4321 908-555-1234

SSN Name, City

Let’s check anomalies:• Redundancy ?• Update ?• Delete ?

Summary of BCNF Decomposition

Find a dependency that violates the BCNF condition:

A’sOthers B’s

R1 R2

Heuristics: choose B , B , … B “as large as possible”1 2 m

Decompose:

2-attribute relations are BCNF

Continue untilthere are noBCNF violationsleft.

A1, A2, …, An B1, B2, …, Bm

Example Decomposition Person(name, SSN, age, hairColor, phoneNumber)

SSN name, ageage hairColor

Decompose in BCNF (in class):

Step 1: find all keys (How ? Compute S+, for various sets S)

Step 2: now decompose

Other Example

R(A,B,C,D) A B, B C

Key: AD Violations of BCNF: A B, A C, ABC Pick A BC: split into R1(A,BC)

R2(A,D) What happens if we pick A B first ?

Lossless Decompositions A decomposition is lossless if we can recover: R(A,B,C)

R1(A,B) R2(A,C)

R’(A,B,C) should be the same as R(A,B,C)

R’ is in general larger than R. Must ensure R’ = R

Decompose

Recover

Lossless Decompositions

Given R(A,B,C) s.t. AB, the decomposition into R1(A,B), R2(A,C) is lossless

3NF: A Problem with BCNF

Unit Company Product

Unit Company

Unit Product

FD’s: Unit Company; Company, Product UnitSo, there is a BCNF violation, and we decompose.

Unit Company

No FDs

Notice: we loose the FD: Company, Product Unit

So What’s the Problem?

Unit Company Product

Unit Company Unit Product

Galaga99 UW Galaga99 databasesBingo UW Bingo databases

No problem so far. All local FD’s are satisfied.Let’s put all the data back into a single table again (anomalies?):

Galaga99 UW databasesBingo UW databases

Violates the dependency: company, product -> unit!

Solution: 3rd Normal Form (3NF)

A simple condition for removing anomalies from relations:

A relation R is in 3rd normal form if :

Whenever there is a nontrivial dependency A1, A2, ..., An Bfor R , then {A1, A2, ..., An } is a key for R, or B is part of a key.

A relation R is in 3rd normal form if :

Whenever there is a nontrivial dependency A1, A2, ..., An Bfor R , then {A1, A2, ..., An } is a key for R, or B is part of a key.

Tradeoff:BCNF = no anomalies, but may lose some FDs3NF = keeps all FDs, but may have some anomalies

Purpose of Normalization

To reduce the chances for anomalies to occur in a database.

normalization prevents the possible corruption of databases stemming from what are called “insertion 　 anomalies," "deletion anomalies," and "update anomalies."

Insertion Anomaly

A failure to place a new database entry into all the places in the database where that new entry needs to be stored.

In a properly normalized database, a new entry needs to be inserted into only one place in the database

Deletion Anomaly

A failure to remove an existing database entry when it is time to remove that entry.

In a properly normalized database, an old, to-be-gotten-rid-of entry needs to be deleted from only one place in the database

Update anomaly

An update of a database involves modifications that may be additions, deletions, or both. Thus "update anomalies" can be either of the kinds of anomalies discussed above.

Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee.

Documents

Transcript of Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee.