7.1 Chapter 7: Relational Database Design. 7.2 Chapter 7: Relational Database Design Features of...

7.1

Chapter 7: Relational Database DesignChapter 7: Relational Database Design

7.2

Chapter 7: Relational Database DesignChapter 7: Relational Database Design

Features of Good Relational Design

Atomic Domains and First Normal Form

Decomposition Using Functional Dependencies

Functional Dependency Theory

Algorithms for Functional Dependencies

Decomposition Using Multivalued Dependencies

More Normal Form

Database-Design Process

Modeling Temporal Data

7.3

Pitfalls in Relational Database DesignPitfalls in Relational Database Design

Relational database design requires that we find a “good” collection of relation schemas. A bad design may lead to

Repetition of Information.

Inability to represent certain information.

Design Goals:

Avoid redundant data

Ensure that relationships among attributes are represented

Facilitate the checking of updates for violation of database integrity constraints.

7.4

ExampleExample Consider the relation schema:

Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)

Redundancy: Data for branch-name, branch-city, assets are repeated for each

loan that a branch makes (functional dependency: branch-name branch-city assets)

Wastes space Complicates updating, introducing possibility of inconsistency of

assets value Null values

Cannot store information about a branch if no loans exist Can use null values, but they are difficult to handle.

7.5

DecompositionDecomposition

Decompose the relation schema Lending-schema into:

Branch-schema = (branch-name, branch-city,assets)

Loan-info-schema = (customer-name, loan-number, branch-name, amount)

All attributes of an original schema (R) must appear in the decomposition (R1, R2):

R = R1 R2

Lossless-join decomposition.For all possible relations r on schema R

r = R1 (r) R2 (r)

7.6

A Lossy-join DecompositionA Lossy-join Decomposition

7.7

Goal — Devise a Theory for the FollowingGoal — Devise a Theory for the Following

Decide whether a particular relation R is in “good” form.

In the case that a relation R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that

each relation is in good form

the decomposition is a lossless-join decomposition

Our theory is based on:

functional dependencies

multivalued dependencies

7.8

Functional DependenciesFunctional Dependencies

Constraints on the set of legal relations.

Require that the value for a certain set of attributes determines uniquely the value for another set of attributes.

A functional dependency is a generalization of the notion of a key.

7.9

Functional Dependencies (Cont.)Functional Dependencies (Cont.)

Let R be a relation schema

R and R The functional dependency

holds on R if and only if for any legal relations r(R), whenever any two tuples t1 and t2 of r agree on the attributes , they also agree on the attributes . That is,

t1[] = t2 [] t1[ ] = t2 [ ] Example: Consider r(A,B ) with the following instance of r.

On this instance, A B does NOT hold, but B A does hold.

1 41 53 7

7.10


K is a superkey for relation schema R if and only if K R

K is a candidate key for R if and only if

K R, and

for no K, R

Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema:

bor_loan = (customer_id, loan_number, amount ).

We expect this functional dependency to hold:

loan_number amount

but would not expect the following to hold:

amount customer_name

7.11


A functional dependency is trivial if it is satisfied by all instances of a relation

Example:

customer_name, loan_number customer_name

customer_name customer_name

In general, is trivial if

When we design a relational database, we first list those functional dependencies that must always hold

The list of functional dependencies in banking database (next page)

7.12

Functional dependencies in Functional dependencies in banking databasebanking database

• On Branch-schema branch-name branch-city branch-name assets

• On Customer-schema customer-name branch-city customer-name customer-street

• On Loan-schema loan-number amount loan-number branch-name

• On Account-schema account-number branch-name account-number balance

7.13

Closure of a Set of Functional Closure of a Set of Functional DependenciesDependencies

Given a set F of functional dependencies, there are certain other functional dependencies that are logically implied by F.

E.g. If A B and B C, then we can infer that A C

The set of all functional dependencies logically implied by F is the closure of F.

We denote the closure of F by F+.

We can find all of F+ by applying Armstrong’s Axioms:

if , then (reflexivity)

if , then (augmentation)

if , and , then (transitivity)

These rules are

sound (generate only functional dependencies that actually hold) and

complete (generate all functional dependencies that hold).

7.14

ExampleExample

R = (A, B, C, G, H, I)F = { A B

A CCG HCG I B H}

some members of F+

A H by transitivity from A B and B H

AG I by augmenting A C with G, to get AG CG

and then transitivity with CG I CG HI

from CG H and CG I : “union rule” can be inferred from

– Augmentation of CG I to infer CG CGI, augmentation of CG H to infer CGI HI, and then transitivity

7.15

Procedure for Computing FProcedure for Computing F++

To compute the closure of a set of functional dependencies F:

F+ = Frepeat

for each functional dependency f in F+

apply reflexivity and augmentation rules on f add the resulting functional dependencies to F+

for each pair of functional dependencies f1and f2 in F+

if f1 and f2 can be combined using transitivity then add the resulting functional dependency to

F+

until F+ does not change any further

NOTE: We will see an alternative procedure for this task later

7.16

Closure of Functional DependenciesClosure of Functional Dependencies

We can further simplify manual computation of F+ by using the following additional rules.

If holds and holds, then holds (union)

If holds, then holds and holds (decomposition)

If holds and holds, then holds (pseudotransitivity)

The above rules can be inferred from Armstrong’s axioms.

7.17

Closure of Attribute SetsClosure of Attribute Sets

Given a set of attributes define the closure of under F (denoted by +) as the set of attributes that are functionally determined by under F:

is in F+ +

Algorithm to compute +, the closure of under F

result := ;while (changes to result) do

for each in F dobegin

if result then result := result

end

7.18

Example of Attribute Set ClosureExample of Attribute Set Closure

R = (A, B, C, G, H, I) F = {A B

A C CG HCG IB H}

(AG)+

1. result = AG

2. result = ABCG (A C and A B)

3. result = ABCGH (CG H and CG AGBC)

4. result = ABCGHI (CG I and CG AGBCH) Is AG a candidate key?

1. Is AG a super key?

1. Does AG R?

2. Is any subset of AG a superkey?

1. Does A+ R?

2. Does G+ R?

7.19

Uses of Attribute ClosureUses of Attribute Closure

There are several uses of the attribute closure algorithm:

Testing for superkey: To test if is a superkey, we compute +, and check if + contains

all attributes of R.

Testing functional dependencies To check if a functional dependency holds (or, in other

words, is in F+), just check if +.

That is, we compute + by using attribute closure, and then check if it contains .

Is a simple and cheap test, and very useful

Computing closure of F For each R, we find the closure +, and for each S +, we

output a functional dependency S.

7.20


Figure 7.1: (a bad database design) Lending -schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)

Repetition of information (add a new loan)

Inability to represent certain information (a branch no loan)

Observation: branch-name branch-city branch-name assets but not branch-name loan-number

From above example, we should decompose a relation schema into several schema with fewer attributes

7.21


Lending-schema is decomposed into two schemas: Branch-customer-schema = (branch-name, branch-city, assets, customer-name)Customer-loan-schema = (customer-name, loan-number, amount)branch-customer = branch-name, branch-city, assets, customer-name(Lending)customer-loan = customer-name, loan-number, amount(Lending)

Figure7.9 and Figure 7.10

We wish to find all branches that have loans with amount less than $1000: Branch-customer Customer-loan (Figure 7.11)

We are no longer able to represent in the database information about which customers are borrowers from which branch

7.22

databasedatabasedatabasedatabasedatabasedatabaseoperating systemsoperating systemsoperating systemsoperating systems

AviAviHankHankSudarshanSudarshanAviAvi PetePete

DB ConceptsUllmanDB ConceptsUllmanDB ConceptsUllmanOS ConceptsStallingsOS ConceptsStallings

classes

Lending

7.23

Branch-customer Customer-loan

7.24

Branch-customer Customer-loan

7.25

Because of this loss of information, we call the decomposition of Lending-schema into Branch-customer-schema and Customer-loan-schema a lossy decomposition, or a lossy-join decomposition

A decomposition that is not a lossy-join decomposition is a lossless-join decomposition

Consider another alternative design, in which Lending-schema is decomposed into two schemas:

Branch-schema = (branch-name, branch-city, assets)

Loan-info-schema = (branch-name, customer-name, loan-number, amount)

This decomposition is a lossless-join decomposition, because branch-name branch-city assets


7.26


All attributes of an original schema (R) must appear in the decomposition (R1, R2):

R = R1 R2

Lossless-join decomposition.For all possible relations r on schema R

r = R1 (r) R2 (r)

A decomposition of R into R1 and R2 is lossless join if and only if at least one of the following dependencies is in F+:

R1 R2 R1

R1 R2 R2

7.27


Example: Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount). The set F of functional dependencies that we require to hold on Lending-schema are branch-name branch-city assets loan-number amount branch-name

We decompose Lending-schema into two schemas Branch-schema = (branch-name, branch-city, assets) Loan-info-schema = (branch-name, customer-name, loan-number, amount)

Since branch-name branch-name branch-city assetsThis decomposition is a lossless-join decomposition

Next, we decompose Loan-info-schema into Loan-schema = (branch-name, loan-number, amount) borrower-schema = (customer-name, loan-number)

Since loan-number amount branch-name This decomposition is a lossless-join decomposition

7.28

Dependency PreservationDependency Preservation

When an update is made to the database, the system should be able to check that the update will satisfy all the given functional dependencies

To check updates efficiently, we should design relational database schemas that allow update validation without the computation of joins

F : a set of functional dependencies on a schema RR1 , R2, …, Rn : a decomposition of RFi : a subset of all functional dependencies in F+

that include only attributes of Ri

The set of restrictions F1, F2, …, Fn is the set of dependencies that can be checked efficiently

Let F’ = F1 F2 … Fn

Dependency preserving decomposition: A decomposition has the property F’+ = F+

7.29

Normalization Using Functional DependenciesNormalization Using Functional Dependencies

When we decompose a relation schema R with a set of functional dependencies F into R1, R2,.., Rn we want

Lossless-join decomposition: Otherwise decomposition would result in information loss.

No redundancy: The relations Ri preferably should be in either Boyce-Codd Normal Form or Third Normal Form.

Dependency preservation: Let Fi be the set of dependencies F+ that include only attributes in Ri.

Preferably the decomposition should be dependency preserving, that is, (F1 F2 … Fn)+ = F+

Otherwise, checking updates for violation of functional dependencies may require computing joins, which is expensive.

7.30

ExampleExample R = (A, B, C)

F = {A B, B C)

R1 = (A, B), R2 = (B, C)

Lossless-join decomposition:

R1 R2 = {B} and B BC

Dependency preserving

R1 = (A, B), R2 = (A, C)

Lossless-join decomposition:

R1 R2 = {A} and A AB

Not dependency preserving (cannot check B C without computing R1 R2)

7.31

Testing for Dependency PreservationTesting for Dependency Preservation

Algorithm for testing if a decomposition D is dependency preservation:

compute F+

for each schema Ri in D do begin Fi := the restriction of F+ to Ri; endF’ :=for each restriction Fi do begin F’ = F’ Fi endcompute F’+;if (F’+ = F+) then return (true) else return (false);

7.32

Boyce-Codd Normal FormBoyce-Codd Normal Form

A relation schema R is in BCNF with respect to a set F of functional dependencies if for all functional dependencies in F+ of the form , where R and R, at least one of the following holds: is trivial (i.e., ) is a superkey for R

A database design is in BCNF if each relation schema in the database is in BCNF

7.33

ExampleExample

Customer-schema = (customer-name, customer-street, customer-city) customer-name customer-street customer-city Customer-schema is in BCNF

Branch-schema = (branch-name, assets, branch-city) branch-name assets branch-city Branch-schema is in BCNF

Loan -info-schema = (branch-name, customer-name, loan-number, amount) loan-number branch-name amount Loan-info-schema is not in BCNFLoan-info-schema suffers from the problem of information repetition:(Downtown, Mr. Bell, L-44, 1000)(Downtown, Ms. Bell, L-44, 1000)

Decompose Loan-info-schema into two schemas:Loan-schema = (branch-name, loan-number, amount)Borrower-schema = (customer-name, loan-number)

This decomposition is a lossless-join decomposition

7.34

Testing for BCNFTesting for BCNF

To check if a non-trivial dependency causes a violation of BCNF

1. compute + (the attribute closure of ), and

2. verify that it includes all attributes of R, that is, it is a superkey of R.

Simplified test: To check if a relation schema R with a given set of functional dependencies F is in BCNF, it suffices to check only the dependencies in the given set F for violation of BCNF, rather than checking all dependencies in F+.

We can show that if none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either.

However, using only F is incorrect when testing a relation in a decomposition of R

E.g. Consider R (A, B, C, D), with F = { A B, B C}

Decompose R into R1(A,B) and R2(A,C,D)

Neither of the dependencies in F contain only attributes from (A,C,D) so we might be mislead into thinking R2 satisfies BCNF.

In fact, dependency A C in F+ shows R2 is not in BCNF.

7.35

Testing for BCNFTesting for BCNF

An alternative BCNF test is sometimes easier than computing every dependency in F+

To check if a relation Ri in a decomposition of R is in BCNF, we apply this test

For every subset of attributes in Ri , check that + (the attribute closure of under F) either includes no attribute of Ri - , or includes all attributes of Ri

If the condition is violated by some set of attributes in Ri , the following functional dependency must be in F+

(+ - ) Ri

7.36

BCNF Decomposition AlgorithmBCNF Decomposition Algorithm

result := {R};done := false;compute F+;while (not done) do

if (there is a schema Ri in result that is not in BCNF)then begin

let be a nontrivial functionaldependency that holds on Risuch that Ri is not in F+, and = ;

result := (result – Ri) (Ri – ) (, ); end

else done := true; Note: each Ri is in BCNF, and decomposition is lossless-

join.

Since the dependency holds on Ri, schema Ri is decomposed into (Ri - ) and ( , ), and (Ri - ) ( , ) =

7.37

Example of BCNF DecompositionExample of BCNF Decomposition

Lending-schema = (branch-name, branch-city, assets,customer-name, loan-number, amount)

F = {branch-name assets branch-cityloan-number amount branch-name}

Key = {loan-number, customer-name} For the dependency branch-name assets branch-city, Lending-

schema is not in BCNF:Branch = (branch-name, branch-city, assets)Loan-info = (branch-name, customer-name, loan-number, amount)

For the dependency loan-number branch-name amount, Loan-info is not in BCNF:Loan = (branch-name, loan-number, amount) is in BCNFBorrower = (customer-name, loan-number) is in BCNF

Not every BCNF decomposition is dependency preserving

7.38


Banker-schema = (branch-name, customer-name, banker-name) banker-name branch-name branch-name customer-name banker-name

Since banker-name is not a superkey,Banker-schema is not in BCNFThe BCNF decomposition: Banker-branch = (banker-name, branch-name) Customer-banker = (customer-name, banker-name)

This decomposition is not dependency preserving:F1 = {banker-name branch-name}F2 = F’ = {banker-name branch-name}F’+ F+

Hence, this decomposition is not dependency preserving

7.39


R = (A, B, C )F = {A B

B C}Key = {A}

R is not in BCNF (B C but B is not superkey)

Decomposition

R1 = (B, C)

R2 = (A,B)

7.40


Original relation R and functional dependency F

R = (branch_name, branch_city, assets,

customer_name, loan_number, amount )

F = {branch_name assets branch_city

loan_number amount branch_name }

Key = {loan_number, customer_name}

Decomposition

R1 = (branch_name, branch_city, assets )

R2 = (branch_name, customer_name, loan_number, amount )

R3 = (branch_name, loan_number, amount )

R4 = (customer_name, loan_number )

Final decomposition R1, R3, R4

7.41

BCNF and Dependency PreservationBCNF and Dependency Preservation

R = (J, K, L )F = {JK L

L K }Two candidate keys = JK and JL

R is not in BCNF

Any decomposition of R will fail to preserve

JK L

This implies that testing for JK L requires a join

It is not always possible to get a BCNF decomposition that is dependency preserving

7.42

Third Normal Form: MotivationThird Normal Form: Motivation

There are some situations where

BCNF is not dependency preserving, and

efficient checking for FD violation on updates is important

Solution: define a weaker normal form, called Third Normal Form (3NF)

Allows some redundancy (with resultant problems; we will see examples later)

But functional dependencies can be checked on individual relations without computing a join.

There is always a lossless-join, dependency-preserving decomposition into 3NF.

7.43

Third Normal FormThird Normal Form

A relation schema R is in third normal form (3NF) if for all:

in F+

at least one of the following holds:

is trivial (i.e., )

is a superkey for R

Each attribute A in – is contained in a candidate key for R.

(NOTE: each attribute may be in a different candidate key)

If a relation is in BCNF it is in 3NF (since in BCNF one of the first two conditions above must hold).

Third condition is a minimal relaxation of BCNF to ensure dependency preservation.

7.44

3NF Example3NF Example

Relation R:

R = (J, K, L )F = {JK L, L K }

Two candidate keys: JK and JL

R is in 3NF

JK L JK is a superkeyL K K is contained in a candidate key

7.45

Redundancy in 3NFRedundancy in 3NF

Jj1

j2

j3

null

Ll1

l1

l1

l2

Kk1

k1

k1

k2

A schema that is in 3NF but not in BCNF has the problems of

repetition of information (e.g., the relationship l1, k1)

need to use null values (e.g., to represent the relationship l2, k2 where there is no corresponding value for J).

There is some redundancy in this schema

Example of problems due to redundancy in 3NF

R = (J, K, L)F = {JK L, L K }

7.46

Testing for 3NFTesting for 3NF

Optimization: Need to check only FDs in F, need not check all FDs in F+.

Use attribute closure to check for each dependency , if is a superkey.

If is not a superkey, we have to verify if each attribute in is contained in a candidate key of R

this test is rather more expensive, since it involve finding candidate keys

testing for 3NF has been shown to be NP-hard

Interestingly, decomposition into third normal form (described shortly) can be done in polynomial time

7.47

3NF Decomposition Algorithm3NF Decomposition Algorithm

Let Fc be a canonical cover for F;

i := 0;for each functional dependency in Fc do

if none of the schemas Rj, 1 j i contains then begin

i := i + 1;Ri :=

endif none of the schemas Rj, 1 j i contains a candidate key for R

then begini := i + 1;Ri := any candidate key for R;

end return (R1, R2, ..., Ri)

7.48

3NF Decomposition Algorithm (Cont.)3NF Decomposition Algorithm (Cont.)

Above algorithm ensures:

each relation schema Ri is in 3NF

decomposition is dependency preserving and lossless-join

7.49

ExampleExample

Relation schema:

Banker-info-schema = (branch-name, customer-name,banker-name, office-number)

The functional dependencies for this relation schema are:banker-name branch-name office-numbercustomer-name branch-name banker-name

The key is:

{customer-name, branch-name}

7.50

Applying 3NF to Applying 3NF to Banker-info-schemaBanker-info-schema

The for loop in the algorithm causes us to include the following schemas in our decomposition:

Banker-office-schema = (banker-name, branch-name,

office-number)Banker-schema = (customer-name, branch-name,

banker-name)

Since Banker-schema contains a candidate key for Banker-info-schema, we are done with the decomposition process.

7.51

ExampleExample

Relation schema:

cust_banker_branch = (customer_id, employee_id, branch_name, type )

The functional dependencies for this relation schema are:customer_id, employee_id branch_name, typeemployee_id branch_name

The for loop generates:

(customer_id, employee_id, branch_name, type )

It then generates

(employee_id, branch_name)

but does not include it in the decomposition because it is a subset of the first schema.

7.52

Comparison of BCNF and 3NFComparison of BCNF and 3NF

It is always possible to decompose a relation into a set of relations that are in 3NF such that:

the decomposition is lossless

the dependencies are preserved

It is always possible to decompose a relation into a set of relations that are in BCNF such that:

the decomposition is lossless

it may not be possible to preserve dependencies.

7.53

Design GoalsDesign Goals

Goal for a relational database design is:

BCNF.

Lossless join.

Dependency preservation.

If we cannot achieve this, we accept one of

Lack of dependency preservation

Redundancy due to use of 3NF

7.54

Canonical CoverCanonical Cover

Sets of functional dependencies may have redundant dependencies that can be inferred from the others Eg: A C is redundant in: {A B, B C, A C}

Parts of a functional dependency may be redundant

E.g. on RHS: {A B, B C, A CD} can be simplified to

{A B, B C, A D}

E.g. on LHS: {A B, B C, AC D} can be simplified to

{A B, B C, A D}

Intuitively, a canonical cover of F is a “minimal” set of functional dependencies equivalent to F, with no redundant dependencies or having redundant parts of dependencies

7.55

Extraneous AttributesExtraneous Attributes

Consider a set F of functional dependencies and the functional dependency in F. Attribute A is extraneous in if A

and F logically implies (F – { }) {( – A) }.

Attribute A is extraneous in if A and the set of functional dependencies (F – { }) { ( – A)} logically implies F.

Example: Given F = {A C, AB C } B is extraneous in AB C because A C logically implies

AB C.

Example: Given F = {A C, AB CD} C is extraneous in AB CD since A C can be inferred even

after deleting C

7.56

Testing if an Attribute is ExtraneousTesting if an Attribute is Extraneous

Consider a set F of functional dependencies and the functional dependency in F. To test if attribute A is extraneous in

(check if ( – {A}) )

1. compute ( – {A})+ using the dependencies in F

2. check that ( – {A})+ contains ; if it does, A is extraneous To test if attribute A is extraneous in

1. compute + using only the dependencies in F’ = (F – { }) { ( – A)}, (check if A can be inferred from F’)

2. check that + contains A; if it does, A is extraneous

Example: R = (A, B, C, D, E)

F = { AB CD A E E C }

To check if C is extraneous in AB CD

7.57

Canonical CoverCanonical Cover

A canonical cover for F is a set of dependencies Fc such that

F logically implies all dependencies in Fc, and

Fc logically implies all dependencies in F, and

No functional dependency in Fc contains an extraneous attribute, and

Each left side of functional dependency in Fc is unique.

To compute a canonical cover for F: Fc = F repeat

Use the union rule to replace any dependencies in Fc of the form 1 1 and 1 2 with 1 1 2

Find a functional dependency in Fc with an extraneous attribute either in or in

If an extraneous attribute is found, delete it from until Fc does not change

Note: Union rule may become applicable after some extraneous attributes have been deleted, so it has to be re-applied

7.58

Example of Computing a Canonical CoverExample of Computing a Canonical Cover

R = (A, B, C)F = {A BC

B C A BAB C}

Combine A BC and A B into A BC

Set is now {A BC, B C, AB C}

A is extraneous in AB C because B C logically implies AB C.

Set is now {A BC, B C}

C is extraneous in A BC since A BC is logically implied by A B and B C.

The canonical cover is:

A BB C

7.1 Chapter 7: Relational Database Design. 7.2 Chapter 7: Relational Database Design Features of...

Documents

Transcript of 7.1 Chapter 7: Relational Database Design. 7.2 Chapter 7: Relational Database Design Features of...