Design of databases
• What is Good Design
• Normalization
Pitfalls in Relational-Database Design
• Repetition of information
• Inability to represent certain information
Repetition of Information
• Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)
• t[assets] is the asset figure for the branch named t[branch-name].
• t[branch-city] is the city in which the branch named t[branch-name] is located.
• t[loan-number] is the number assigned to a loan given by the branch named t[branch-name] to the customer named t[customer-name].
• t[amount] is the amount of the loan whose number is t[loan-number].
Example of Repetition
• Lending-schema, extended here with customer-city: (branch-name, branch-city, assets, customer-name, customer-city, loan-number, amount)
• (Perryridge, Horseneck, 1700000, Adams, Brooklyn, L-31, 1500)
• Suppose that we wish to add a new loan to our database. Say that the loan is made by the Perryridge branch to Adams in the amount of $1500. Let the loan-number be L-31. In our design, we need a tuple with values on all the attributes of Lending-schema.
• Thus, we must repeat the asset and city data for the Perryridge branch, as well as the customer's city, in every such tuple.
Branch-name  Branch-city  Assets   Customer-name  Customer-city  Loan-number  Amount
Perryridge   Horseneck    1700000  Adams          Brooklyn       L-31         1500
Perryridge   Horseneck    1700000  Adams          Brooklyn       L-32         30000
Perryridge   Horseneck    1700000  Adams          Brooklyn       L-33         2500
Perryridge   Horseneck    1700000  Bob            Horseneck      L-39         4500
Redwood      Palo Alto    2100000  Smith          Rye            L-23         2000
Redwood      Palo Alto    2100000  Smith          Rye            L-52         3000
• Repeating information wastes space.
• Furthermore, it complicates updating the database.
• Suppose, for example, that the assets of the Perryridge branch change from 1700000 to 1900000.
• Each tuple with Branch-Name Perryridge must be updated.
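The update problem can be made concrete with a minimal Python sketch. The names `lending` and `update_assets` are illustrative assumptions, not part of the original design; the rows are the six tuples from the table above.

```python
# Rows over (branch-name, branch-city, assets, customer-name,
#            customer-city, loan-number, amount), copied from the table above.
lending = [
    ("Perryridge", "Horseneck", 1700000, "Adams", "Brooklyn", "L-31", 1500),
    ("Perryridge", "Horseneck", 1700000, "Adams", "Brooklyn", "L-32", 30000),
    ("Perryridge", "Horseneck", 1700000, "Adams", "Brooklyn", "L-33", 2500),
    ("Perryridge", "Horseneck", 1700000, "Bob", "Horseneck", "L-39", 4500),
    ("Redwood", "Palo Alto", 2100000, "Smith", "Rye", "L-23", 2000),
    ("Redwood", "Palo Alto", 2100000, "Smith", "Rye", "L-52", 3000),
]


def update_assets(rows, branch_name, new_assets):
    """Return (updated rows, number of tuples that had to be rewritten)."""
    updated, touched = [], 0
    for row in rows:
        if row[0] == branch_name:
            # Replace only the assets field (index 2) of this tuple.
            row = row[:2] + (new_assets,) + row[3:]
            touched += 1
        updated.append(row)
    return updated, touched


new_rows, touched = update_assets(lending, "Perryridge", 1900000)
print(touched)  # 4: one fact changed, four tuples rewritten
```

If any one of those four tuples were missed, the relation would report two different asset values for the same branch.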
Inability to Represent Information
• Another problem with the Lending-schema design is that we cannot represent directly the information concerning a branch (branch-name, branch-city, assets) unless there exists at least one loan at the branch.
• One solution to this problem is to introduce null values.
Functional Dependency
• We know that a bank branch has a unique value of assets, so given a branch name we can uniquely identify the assets value.
• In other words, we say that the functional dependency branch-name → assets holds.
• The fact that a branch has a particular value of assets, and the fact that a branch makes a loan are independent; these facts are best represented in separate relations (Tables).
Super Key
• Let R be a relation schema. A subset K of R is a superkey of R if, in any legal relation r(R), for all pairs t1 and t2 of tuples in r, if t1[K] = t2[K], then t1 = t2.
• That is, no two distinct tuples in any legal relation r(R) may have the same value on attribute set K.
Back to Functional Dependencies
• The notion of functional dependency generalizes the notion of superkey.
• Consider a relation schema R, and let α ⊆ R and β ⊆ R. The functional dependency α → β holds on schema R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], it is also the case that t1[β] = t2[β].
• Consider our original Lending-schema. Functional dependencies on it are:
– branch-name → branch-city (Branch schema)
– branch-name → assets (Branch schema)
– loan-number → amount (Loan schema)
– loan-number → branch-name (Loan schema)
– loan-number → customer-name (Loan schema)
– customer-name → customer-city (Customer schema)
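The definition of a functional dependency translates almost directly into a check over one relation instance. The sketch below is an illustration under stated assumptions: `fd_holds`, `ATTRS`, `ROWS`, and `lending` are hypothetical names, and the rows are the sample Lending tuples. A single instance can refute a dependency but cannot prove that it holds on every legal relation.

```python
ATTRS = ("branch-name", "branch-city", "assets", "customer-name",
         "customer-city", "loan-number", "amount")
ROWS = [
    ("Perryridge", "Horseneck", 1700000, "Adams", "Brooklyn", "L-31", 1500),
    ("Perryridge", "Horseneck", 1700000, "Adams", "Brooklyn", "L-32", 30000),
    ("Perryridge", "Horseneck", 1700000, "Adams", "Brooklyn", "L-33", 2500),
    ("Perryridge", "Horseneck", 1700000, "Bob", "Horseneck", "L-39", 4500),
    ("Redwood", "Palo Alto", 2100000, "Smith", "Rye", "L-23", 2000),
    ("Redwood", "Palo Alto", 2100000, "Smith", "Rye", "L-52", 3000),
]
lending = [dict(zip(ATTRS, row)) for row in ROWS]


def fd_holds(rows, alpha, beta):
    """Check alpha -> beta on one instance.

    rows is a list of dicts; alpha and beta are sequences of attribute names.
    """
    seen = {}  # maps each alpha-value to the first beta-value seen with it
    for t in rows:
        key = tuple(t[a] for a in alpha)
        val = tuple(t[b] for b in beta)
        # Two tuples that agree on alpha must also agree on beta.
        if seen.setdefault(key, val) != val:
            return False
    return True


print(fd_holds(lending, ["branch-name"], ["assets"]))         # True
print(fd_holds(lending, ["branch-name"], ["customer-name"]))  # False
print(fd_holds(lending, ["loan-number"], ATTRS))              # True
```

The last call shows the sense in which functional dependencies generalize superkeys: K is a superkey on this instance exactly when K → R holds on it.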
Branch Schema
Branch-name  Branch-city  Assets
Perryridge   Horseneck    1700000
Redwood      Palo Alto    2100000
Loan Schema
Loan-number  Customer-name  Branch-name  Amount
L-31         Adams          Perryridge   1500
L-32         Adams          Perryridge   30000
L-33         Adams          Perryridge   2500
L-39         Bob            Perryridge   4500
L-23         Smith          Redwood      2000
L-52         Smith          Redwood      3000
Customer Schema
Customer-name  Customer-city
Adams          Brooklyn
Bob            Horseneck
Smith          Rye
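One way to see that nothing was lost by splitting the data into three tables is to join them back together. The following is a minimal sketch, assuming the customer is spelled Adams in every table; `natural_join` is a hypothetical helper written for this illustration, not a library function.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    out = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            # Keep the pair only if it agrees on every shared attribute.
            if all(t[a] == u[a] for a in common):
                out.append({**t, **u})
    return out


branch = [
    {"branch-name": "Perryridge", "branch-city": "Horseneck", "assets": 1700000},
    {"branch-name": "Redwood", "branch-city": "Palo Alto", "assets": 2100000},
]
loan = [
    {"loan-number": "L-31", "customer-name": "Adams", "branch-name": "Perryridge", "amount": 1500},
    {"loan-number": "L-32", "customer-name": "Adams", "branch-name": "Perryridge", "amount": 30000},
    {"loan-number": "L-33", "customer-name": "Adams", "branch-name": "Perryridge", "amount": 2500},
    {"loan-number": "L-39", "customer-name": "Bob", "branch-name": "Perryridge", "amount": 4500},
    {"loan-number": "L-23", "customer-name": "Smith", "branch-name": "Redwood", "amount": 2000},
    {"loan-number": "L-52", "customer-name": "Smith", "branch-name": "Redwood", "amount": 3000},
]
customer = [
    {"customer-name": "Adams", "customer-city": "Brooklyn"},
    {"customer-name": "Bob", "customer-city": "Horseneck"},
    {"customer-name": "Smith", "customer-city": "Rye"},
]

lending = natural_join(natural_join(loan, branch), customer)
print(len(lending))  # 6, the same six tuples as the original Lending relation
```

Each branch and each customer is now stored once, yet the join recovers every original tuple and introduces no spurious ones.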
Closure of a Set of Functional Dependencies
• Armstrong's rules:
• Reflexivity rule - If α is a set of attributes and β ⊆ α, then α → β holds.
• Augmentation rule - If α → β holds and γ is a set of attributes, then γα → γβ holds.
• Transitivity rule - If α → β holds and β → γ holds, then α → γ holds.
Rules derived from Armstrong Rules
• Union rule. If α → β holds and α → γ holds, then α → βγ holds.
• Decomposition rule. If α → βγ holds, then α → β holds and α → γ holds.
• Pseudotransitivity rule. If α → β holds and γβ → δ holds, then αγ → δ holds.
Algorithm to compute F+ (F closure)
F+ = F
repeat
    for each functional dependency f in F+
        apply reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            add the resulting functional dependency to F+
until F+ does not change any further
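Enumerating F+ in full can produce exponentially many dependencies, so in practice membership in F+ is usually tested through the closure of an attribute set instead. The sketch below uses that swapped-in technique rather than the literal procedure above; the names `closure`, `implies`, and `F` are assumptions made for illustration, and `F` holds the six dependencies of the Lending example.

```python
def closure(attrs, fds):
    """Closure of attrs under fds.

    fds is a list of (lhs, rhs) pairs of frozensets of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already derivable, so is the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is in F+, that is, rhs lies in the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)


F = [(frozenset([left]), frozenset([right])) for left, right in [
    ("branch-name", "branch-city"),
    ("branch-name", "assets"),
    ("loan-number", "amount"),
    ("loan-number", "branch-name"),
    ("loan-number", "customer-name"),
    ("customer-name", "customer-city"),
]]

print(sorted(closure({"branch-name"}, F)))
# ['assets', 'branch-city', 'branch-name']
print(implies(F, {"loan-number"}, {"assets"}))  # True, by transitivity
print(implies(F, {"branch-name"}, {"amount"}))  # False
```

Because the closure of {loan-number} contains all seven attributes, loan-number is a superkey of the Lending schema under these dependencies.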
Properties of Decomposition
• Lossless join decomposition
• Dependency Preservation
• Decrease in Repetition of Information
Boyce–Codd Normal Form
A relation schema R is in BCNF with respect to a set F of functional dependencies if, for all functional dependencies in F+ of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following holds:
• α → β is a trivial functional dependency (that is, β ⊆ α).
• α is a superkey for schema R.
• A database design is in BCNF if each member of the set of relation schemas that constitutes the design is in BCNF.
• Branch schema, Loan schema and Customer schema together form a BCNF decomposition of the Lending-schema.
BCNF Decomposition Algorithm
result := {R};
done := false;
compute F+;
while (not done) do
    if (there is a schema Ri in result that is not in BCNF)
    then begin
        let α → β be a nontrivial functional dependency that holds on Ri
            such that α → Ri is not in F+, and α ∩ β = ∅;
        result := (result − Ri) ∪ (Ri − β) ∪ (α, β);
    end
    else done := true;
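The algorithm above can be sketched in Python for small schemas. This is an illustration under stated assumptions, not a definitive implementation: instead of materializing F+, it finds a violating dependency by computing attribute closures over every proper subset of a schema, which is exponential and only practical for a handful of attributes. The names `closure`, `bcnf_violation`, `bcnf_decompose`, `F`, and `R` are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violation(schema, fds):
    """Return (alpha, beta) violating BCNF on schema, or None if there is none."""
    schema = frozenset(schema)
    for size in range(1, len(schema)):
        for alpha in map(frozenset, combinations(sorted(schema), size)):
            plus = closure(alpha, fds)
            # beta: attributes of this schema determined by alpha, disjoint from alpha.
            beta = (plus & schema) - alpha
            # Violation: alpha -> beta is nontrivial and alpha is not a superkey.
            if beta and not schema <= plus:
                return alpha, beta
    return None


def bcnf_decompose(schema, fds):
    """Split schema until every member of the result is in BCNF."""
    result = [frozenset(schema)]
    done = False
    while not done:
        for ri in result:
            violation = bcnf_violation(ri, fds)
            if violation:
                alpha, beta = violation
                result.remove(ri)
                result += [ri - beta, alpha | beta]
                break  # result changed, so rescan from the start
        else:
            done = True
    return result


F = [(frozenset([left]), frozenset([right])) for left, right in [
    ("branch-name", "branch-city"),
    ("branch-name", "assets"),
    ("loan-number", "amount"),
    ("loan-number", "branch-name"),
    ("loan-number", "customer-name"),
    ("customer-name", "customer-city"),
]]
R = {"branch-name", "branch-city", "assets", "customer-name",
     "customer-city", "loan-number", "amount"}

for schema in bcnf_decompose(R, F):
    print(sorted(schema))
```

On the Lending example this yields the three schemas described earlier: the branch attributes, the loan attributes, and the customer attributes.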