1 Lecture 10: Database Design and Relational Algebra Monday, October 20, 2003.
-
Upload
joshua-smith -
Category
Documents
-
view
215 -
download
1
description
Transcript of 1 Lecture 10: Database Design and Relational Algebra Monday, October 20, 2003.
1
Lecture 10:Database Design andRelational Algebra
Monday, October 20, 2003
2
Outline
• Design of a Relational schema (3.6)• Relational Algebra (5.2)• Operations on bags (5.3, 5.4)
– Reading assignment 5.3 and 5.4 (won’t have time to cover in class)
3
Relational Schema Design(or Logical Design)
Main idea:• Start with some relational schema• Find out its FD’s• Use them to design a better relational
schema
4
Data Anomalies
When a database is poorly designed we get anomalies:
Redundancy: data is repeated
Updated anomalies: need to change in several places
Delete anomalies: may lose data when we don’t want
5
Relational Schema Design
Anomalies:• Redundancy = repeat data• Update anomalies = Fred moves to “Bellevue”• Deletion anomalies = Joe deletes his phone number:
what is his city ?
Recall set attributes (persons with several phones):
SSN Name, City
Name SSN PhoneNumber City
Fred 123-45-6789 206-555-1234 Seattle
Fred 123-45-6789 206-555-6543 Seattle
Joe 987-65-4321 908-555-2121 Westfield
but not SSN PhoneNumber
6
Relation DecompositionBreak the relation into two:
Name SSN City
Fred 123-45-6789 Seattle
Joe 987-65-4321 Westfield
SSN PhoneNumber
123-45-6789 206-555-1234
123-45-6789 206-555-6543
987-65-4321 908-555-2121Anomalies have gone:• No more repeated data• Easy to move Fred to “Bellevue” (how ?)• Easy to delete all Joe’s phone number (how ?)
Name SSN PhoneNumber City
Fred 123-45-6789 206-555-1234 Seattle
Fred 123-45-6789 206-555-6543 Seattle
Joe 987-65-4321 908-555-2121 Westfield
7
Relational Schema Design
PersonbuysProduct
name
price name ssn
Conceptual Model:
Relational Model:plus FD’s
Normalization:Eliminates anomalies
8
Decompositions in General
R1 = projection of R on A1, ..., An, B1, ..., Bm
R2 = projection of R on A1, ..., An, C1, ..., Cp
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)
9
Decomposition
• Sometimes it is correct:Name Price Category
Gizmo 19.99 Gadget
OneClick 24.99 Camera
Gizmo 19.99 Camera
Name Price
Gizmo 19.99
OneClick 24.99
Gizmo 19.99
Name Category
Gizmo Gadget
OneClick Camera
Gizmo Camera
Lossless decomposition
10
Incorrect Decomposition
• Sometimes it is not:
Name Price Category
Gizmo 19.99 Gadget
OneClick 24.99 Camera
Gizmo 19.99 Camera
Name Category
Gizmo Gadget
OneClick Camera
Gizmo Camera
Price Category
19.99 Gadget
24.99 Camera
19.99 Camera
What’sincorrect ??
Lossy decomposition
11
Decompositions in GeneralR(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
If A1, ..., An B1, ..., Bm
Then the decomposition is lossless
R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)
Example: name price, hence the first decomposition is lossless
Note: don’t need necessarily A1, ..., An C1, ..., Cp
12
Normal Forms
First Normal Form = all attributes are atomic
Second Normal Form (2NF) = old and obsolete
Third Normal Form (3NF) = this lecture
Boyce Codd Normal Form (BCNF) = this lecture
Others...
13
Boyce-Codd Normal FormA simple condition for removing anomalies from relations:
In English (though a bit vague):
Whenever a set of attributes of R is determining another attribute, should determine all the attributes of R.
A relation R is in BCNF if:
If A1, ..., An B is a non-trivial dependency
in R , then {A1, ..., An} is a key for R
14
BCNF Decomposition Algorithm
A’s OthersB’s
R1
Is there a 2-attribute relation that isnot in BCNF ?
Repeat choose A1, …, Am B1, …, Bn that violates the BNCF condition split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others]) continue with both R1 and R2
Until no more violations
R2
15
Example
What are the dependencies?SSN Name, City
What are the keys?{SSN, PhoneNumber}
Is it in BCNF?
Name SSN PhoneNumber City
Fred 123-45-6789 206-555-1234 Seattle
Fred 123-45-6789 206-555-6543 Seattle
Joe 987-65-4321 908-555-2121 Westfield
Joe 987-65-4321 908-555-1234 Westfield
16
Decompose it into BCNF
Name SSN City
Fred 123-45-6789 Seattle
Joe 987-65-4321 Westfield
SSN PhoneNumber
123-45-6789 206-555-1234
123-45-6789 206-555-6543
987-65-4321 908-555-2121
987-65-4321 908-555-1234
SSN Name, City
Let’s check anomalies:• Redundancy ?• Update ?• Delete ?
17
Summary of BCNF Decomposition
Find a dependency that violates the BCNF condition:
A’sOthers B’s
R1 R2
Heuristics: choose B , B , … B “as large as possible”1 2 m
Decompose:
Is there a 2-attribute relation that isnot in BCNF ?
Continue untilthere are noBCNF violationsleft.
A1, A2, …, An B1, B2, …, Bm
18
Example Decomposition Person(name, SSN, age, hairColor, phoneNumber)
SSN name, ageage hairColor
Decompose in BCNF (in class):
Step 1: find all keys (How ? Compute S+, for various sets S)
Step 2: now decompose
19
Other Example
• R(A,B,C,D) A B, B C
• Key: AD• Violations of BCNF: A B, A C,
ABC• Pick A BC: split into R1(A,BC) R2(A,D)• What happens if we pick A B first ?
20
Lossless Decompositions A decomposition is lossless if we can recover: R(A,B,C)
R1(A,B) R2(A,C)
R’(A,B,C) should be the same as R(A,B,C)
R’ is in general larger than R. Must ensure R’ = R
Decompose
Recover
21
Lossless Decompositions
• Given R(A,B,C) s.t. AB, the decomposition into R1(A,B), R2(A,C) is lossless
22
3NF: A Problem with BCNFUnit Company Product
Unit Company
Unit Product
FD’s: Unit Company; Company, Product UnitSo, there is a BCNF violation, and we decompose.
Unit Company
No FDs
Notice: we loose the FD: Company, Product Unit
23
So What’s the Problem?
Unit Company Product
Unit Company Unit Product
Galaga99 UW Galaga99 databasesBingo UW Bingo databases
No problem so far. All local FD’s are satisfied.Let’s put all the data back into a single table again:
Galaga99 UW databasesBingo UW databases
Violates the dependency: company, product -> unit!
24
Solution: 3rd Normal Form (3NF)
A simple condition for removing anomalies from relations:
A relation R is in 3rd normal form if :
Whenever there is a nontrivial dependency A1, A2, ..., An Bfor R , then {A1, A2, ..., An } a super-key for R, or B is part of a key.
Tradeoff:BCNF = no anomalies, but may lose some FDs3NF = keeps all FDs, but may have some anomalies
25
Relational Algebra
• Formalism for creating new relations from existing ones
• Its place in the big picture:
Declartivequery
languageAlgebra Implementation
SQL,relational calculus
Relational algebraRelational bag algebra
26
Relational Algebra• Five operators:
– Union: – Difference: -– Selection:– Projection: – Cartesian Product:
• Derived or auxiliary operators:– Intersection, complement– Joins (natural,equi-join, theta join, semi-join)– Renaming:
27
1. Union and 2. Difference
• R1 R2• Example:
– ActiveEmployees RetiredEmployees
• R1 – R2• Example:
– AllEmployees -- RetiredEmployees
28
What about Intersection ?
• It is a derived operator• R1 R2 = R1 – (R1 – R2)• Also expressed as a join (will see later)• Example
– UnionizedEmployees RetiredEmployees
29
3. Selection• Returns all tuples which satisfy a condition• Notation: c(R)• Examples
– Salary > 40000 (Employee)– name = “Smithh” (Employee)
• The condition c can be =, <, , >, , <>
30
Selection Example
EmployeeSSN Name DepartmentID Salary999999999 John 1 30,000777777777 Tony 1 32,000888888888 Alice 2 45,000
SSN Name DepartmentID Salary888888888 Alice 2 45,000
Find all employees with salary more than $40,000.Salary > 40000 (Employee)
31
4. Projection• Eliminates columns, then removes
duplicates• Notation: A1,…,An (R)• Example: project social-security number
and names:– SSN, Name (Employee)– Output schema: Answer(SSN, Name)
32
Projection Example
EmployeeSSN Name DepartmentID Salary999999999 John 1 30,000777777777 Tony 1 32,000888888888 Alice 2 45,000
SSN Name999999999 John777777777 Tony888888888 Alice
SSN, Name (Employee)
33
5. Cartesian Product
• Each tuple in R1 with each tuple in R2• Notation: R1 R2• Example:
– Employee Dependents• Very rare in practice; mainly used to
express joins
34
Cartesian Product Example Employee Name SSN John 999999999 Tony 777777777 Dependents EmployeeSSN Dname 999999999 Emily 777777777 Joe Employee x Dependents Name SSN EmployeeSSN Dname John 999999999 999999999 Emily John 999999999 777777777 Joe Tony 777777777 999999999 Emily Tony 777777777 777777777 Joe
35
Relational Algebra• Five operators:
– Union: – Difference: -– Selection:– Projection: – Cartesian Product:
• Derived or auxiliary operators:– Intersection, complement– Joins (natural,equi-join, theta join, semi-join)– Renaming:
36
Renaming
• Changes the schema, not the instance• Notation: B1,…,Bn (R)• Example:
– LastName, SocSocNo (Employee)– Output schema:
Answer(LastName, SocSocNo)
37
Renaming Example
EmployeeName SSNJohn 999999999Tony 777777777
LastName SocSocNoJohn 999999999Tony 777777777
LastName, SocSocNo (Employee)
38
Natural Join• Notation: R1 R2⋈• Meaning: R1 R2 = ⋈ A(C(R1 R2))
• Where:– The selection C checks equality of all common
attributes– The projection eliminates the duplicate common
attributes
39
Natural Join ExampleEmployeeName SSNJohn 999999999Tony 777777777
DependentsSSN Dname999999999 Emily777777777 Joe
Name SSN DnameJohn 999999999 EmilyTony 777777777 Joe
Employee Dependents = Name, SSN, Dname( SSN=SSN2(Employee x SSN2, Dname(Dependents))
40
Natural Join
• R= S=
• R ⋈ S=
A B
X Y
X Z
Y Z
Z V
B C
Z U
V W
Z V
A B CX Z UX Z VY Z UY Z VZ V W
41
Natural Join
• Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S ?
• Given R(A, B, C), S(D, E), what is R ⋈ S ?
• Given R(A, B), S(A, B), what is R ⋈ S ?
42
Theta Join
• A join that involves a predicate• R1 ⋈ R2 = (R1 R2)• Here can be any condition
43
Eq-join
• A theta join where is an equality• R1 ⋈A=B R2 = A=B (R1 R2)• Example:
– Employee ⋈SSN=SSN Dependents
• Most useful join in practice
44
Semijoin
• R ⋉ S = A1,…,An (R ⋈ S)
• Where A1, …, An are the attributes in R• Example:
– Employee ⋉ Dependents
45
Semijoins in Distributed Databases
• Semijoins are used in distributed databases
SSN Name
. . . . . .
SSN Dname Age
. . . . . .
EmployeeDependents
network
Employee ⋈ssn=ssn (age>71 (Dependents))
T = SSN age>71 (Dependents)R = Employee T⋉
Answer = R ⋈ Dependents
46
Complex RA Expressions
Person Purchase Person Product
name=fred name=gizmo
pid ssn
seller-ssn=ssn
pid=pid
buyer-ssn=ssn
name
47
Operations on Bags
A bag = a set with repeated elementsAll operations need to be defined carefully on bags• {a,b,b,c}{a,b,b,b,e,f,f}={a,a,b,b,b,b,b,c,e,f,f}• {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b,d}• C(R): preserve the number of occurrences
• A(R): no duplicate elimination• Cartesian product, join: no duplicate eliminationImportant ! Relational Engines work on bags, not sets !
Reading assignment: 5.3 – 5.4
48
Finally: RA has Limitations !
• Cannot compute “transitive closure”
• Find all direct and indirect relatives of Fred• Cannot express in RA !!! Need to write C program
Name1 Name2 Relationship
Fred Mary Father
Mary Joe Cousin
Mary Bill Spouse
Nancy Lou Sister