Database Management (Lecture 4: Decompositions of Relations)

© Bércesné Novák Ágnes, modified: 2003-03-17

Decompositions

The ideal properties of a database are standardized in the so-called normal forms. Even when the relational schema is obtained from an E/R diagram, these properties are not necessarily fulfilled, so the model needs further refinement. The process through which the database is rewritten into this standardized form is called normalization. The aim of normalization is to reduce redundancy in order to minimize anomalies, the source of the pitfalls we have already discussed. Method: decomposition.

Anomalies

Modification anomalies:
- update
- insertion
- deletion

Example: What anomalies can you discover in the following schema?

Teachers(SSN, Name, Address, Phone, CourseName, No-of-lectures, Description)

Redundancy: How many times are the data about a teacher stored? About a course? Uncontrolled redundancy leads to anomalies.
Update: What happens if the address of a teacher changes?
Delete: What information is lost if a teacher is fired?
Insertion: A new curriculum is being introduced, but no teachers have been assigned to its courses yet. What happens when we try to store the information about these courses?
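The update and deletion anomalies can be reproduced on a few rows. This is a minimal Python sketch; the sample teachers and courses and the helper names `addresses_of` and `courses_known` are illustrative assumptions, not part of the lecture.

```python
# Hypothetical rows of Teachers(SSN, Name, Address, Phone, CourseName, No-of-lectures, Description);
# the teacher's data is repeated once for every course the teacher teaches.
teachers = [
    {"SSN": "111", "Name": "Kovacs", "Address": "Budapest", "Phone": "123",
     "CourseName": "Databases", "No-of-lectures": 14, "Description": "Relational model"},
    {"SSN": "111", "Name": "Kovacs", "Address": "Budapest", "Phone": "123",
     "CourseName": "Logic", "No-of-lectures": 12, "Description": "Propositional logic"},
    {"SSN": "222", "Name": "Szabo", "Address": "Szeged", "Phone": "456",
     "CourseName": "Databases", "No-of-lectures": 14, "Description": "Relational model"},
]


def addresses_of(rows, ssn):
    """Distinct addresses stored for one teacher (should always be exactly one)."""
    return {r["Address"] for r in rows if r["SSN"] == ssn}


def courses_known(rows):
    """Course names about which the relation still stores anything."""
    return {r["CourseName"] for r in rows}


# Update anomaly: the new address is written into only one of the redundant copies.
teachers[0]["Address"] = "Debrecen"
print(addresses_of(teachers, "111"))  # two different addresses for one teacher

# Deletion anomaly: firing teacher 111 also deletes everything known about "Logic".
remaining = [r for r in teachers if r["SSN"] != "111"]
print(courses_known(remaining))  # {'Databases'}
```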

The figure below shows the functional dependencies that hold on the relation Teachers.

From the figure we conclude that the problem can be eliminated by decomposing the original relation into the following schemas (please check!):
Teachers(SSN, Name, Address, Phone)

Courses(CourseName, No-of-lectures, Description)

[Figure: functional dependencies of Teachers over the attributes SSN, C-name, Name, Address, Phone, No-of-lectures, Descriptions]

Decompositions

Def.: The set of relations

$$R_1(A_{11}, A_{12}, \dots, A_{1i}),\; R_2(A_{21}, A_{22}, \dots, A_{2j}),\; \dots,\; R_k(A_{k1}, A_{k2}, \dots, A_{kn})$$

gives a decomposition of R(A1, A2, ..., An) if

$$\bigcup_{m,l} A_{ml} = \{A_1, A_2, \dots, A_n\}$$

and the instance of each R_m is $r_m := \Pi_{A_{m1}, A_{m2}, \dots}(r)$, where r is the given instance of R.
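The definition can be read operationally: each r_m is a projection of r, and the schemas R_m together must cover every attribute of R. A minimal Python sketch, assuming a relation instance is represented as a set of tuples together with a tuple of attribute names; `project` and `covers` are hypothetical helper names.

```python
from typing import Sequence, Set, Tuple

Row = Tuple[str, ...]


def project(schema: Sequence[str], rows: Set[Row], attrs: Sequence[str]) -> Set[Row]:
    """Π_attrs(r): keep the listed columns; since r is a set, duplicates disappear."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}


def covers(schema: Sequence[str], parts: Sequence[Sequence[str]]) -> bool:
    """Union condition of the definition: every attribute of R occurs in some R_m."""
    return set().union(*(set(p) for p in parts)) == set(schema)


R = ("A", "B", "C")
r = {("a", "b", "c"), ("d", "b", "e")}
parts = [("A", "B"), ("B", "C")]
assert covers(R, parts)
instances = [project(R, r, p) for p in parts]  # r_1 = Π_AB(r), r_2 = Π_BC(r)
```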

Decomposition is needed to avoid anomalies; normalization is carried out through decomposition.

Some redundancy remains even after normalization, but only in a minimized and controlled form.

Getting back the original relation

Outline: R(A, B, C) is given, together with a decomposition of R into R1(A, B) and R2(B, C).

Restoring R: R1 >< R2 (>< denotes natural join)

Examples:
1. The tuple (a, b, c) is decomposed into (a, b) and (b, c); R1 >< R2 gives (a, b, c). OK.
2. The tuple t = (a, b, c) is decomposed into (a, b) and (b, c), and the tuple v = (d, b, e) into (d, b) and (b, e). The natural join gives (a, b, c), (a, b, e), (d, b, e), (d, b, c).

We get 4 rows instead of the original 2 tuples. So this decomposition is LOSSY: information is lost through the appearance of new rows.
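Example 2 can be replayed mechanically. A minimal Python sketch; the `project` and `natural_join` helpers and the set-of-tuples representation are assumptions made for illustration. It decomposes the two tuples and joins the projections back:

```python
def project(schema, rows, attrs):
    """Π_attrs(r) for a relation given as a set of tuples over `schema`."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(s1, r1, s2, r2):
    """r1 >< r2: combine every pair of tuples that agree on all common attributes."""
    common = [a for a in s1 if a in s2]
    extra = [a for a in s2 if a not in s1]
    schema = tuple(s1) + tuple(extra)
    rows = {
        t1 + tuple(t2[s2.index(a)] for a in extra)
        for t1 in r1
        for t2 in r2
        if all(t1[s1.index(a)] == t2[s2.index(a)] for a in common)
    }
    return schema, rows


R = ("A", "B", "C")
r = {("a", "b", "c"), ("d", "b", "e")}  # tuples t and v of example 2
s1, s2 = ("A", "B"), ("B", "C")
schema, joined = natural_join(s1, project(R, r, s1), s2, project(R, r, s2))
print(len(joined))  # 4: the spurious tuples (a, b, e) and (d, b, c) appear
```

The join always returns every original tuple; the problem is the extra ones, which cannot be told apart from the real data.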

Getting back the original relation (cont.)

3. R(A, B, C, D, E) is decomposed into R1(A, B, C) and R2(C, D, E). This decomposition is also lossy, as the following instance shows.

An instance r of R:
A   B   C   D   E
a1  b1  c1  d1  e1
a2  b2  c1  d2  e2

ΠABC(r):        ΠCDE(r):
A   B   C       C   D   E
a1  b1  c1      c1  d1  e1
a2  b2  c1      c1  d2  e2

ΠABC(r) >< ΠCDE(r) (>< stands for natural join):
A   B   C   D   E
a1  b1  c1  d1  e1
a1  b1  c1  d2  e2
a2  b2  c1  d1  e1
a2  b2  c1  d2  e2

We are not able to decide which were the original rows: information is lost.

Lossless decompositions

Example 3 (continued): R(A, B, C, D, E) is decomposed into R1(A, B, C) and R2(C, D, E), and now C→DE holds.

r:
A   B   C   D   E
a1  b1  c1  d1  e1
a2  b2  c1  ?   ??

ΠABC(r):        ΠCDE(r):
A   B   C       C   D   E
a1  b1  c1      c1  d1  e1
a2  b2  c1      c1  ?   ??

ΠABC(r) >< ΠCDE(r):
A   B   C   D   E
a1  b1  c1  d1  e1
a1  b1  c1  ?   ??
a2  b2  c1  d1  e1
a2  b2  c1  ?   ??

r is legal under F only if d1 stands at the position of ? and e1 at the position of ??. Then (c1, d1, e1) is the only row of ΠCDE(r), so no new tuples can be generated in this situation: this is enforced by C→DE. Hence this decomposition is lossless.

Conclusion: the decomposition must be worked out with respect to the given dependency set.

Lemma: r ⊆ ΠR1(r) >< ΠR2(r) >< … >< ΠRk(r)

Def.: a decomposition is lossless if r = ΠR1(r) >< ΠR2(r) >< … >< ΠRk(r) holds for every legal instance r of R.
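The lemma and the definition can be checked on the instances of example 3. A minimal Python sketch, assuming relations are sets of tuples; `join_of_projections` and the other helper names are hypothetical, and `r_legal` is an instance of our own that satisfies C→DE.

```python
from functools import reduce


def project(schema, rows, attrs):
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(left, right):
    """Join two (schema, rows) pairs on their common attributes."""
    (s1, r1), (s2, r2) = left, right
    common = [a for a in s1 if a in s2]
    extra = [a for a in s2 if a not in s1]
    rows = {
        t1 + tuple(t2[s2.index(a)] for a in extra)
        for t1 in r1
        for t2 in r2
        if all(t1[s1.index(a)] == t2[s2.index(a)] for a in common)
    }
    return tuple(s1) + tuple(extra), rows


def join_of_projections(schema, r, parts):
    """Π_R1(r) >< ... >< Π_Rk(r), with the columns put back in the order of R."""
    pieces = [(tuple(p), project(schema, r, p)) for p in parts]
    joined_schema, rows = reduce(natural_join, pieces)
    return project(joined_schema, rows, schema)


R = ("A", "B", "C", "D", "E")
parts = [("A", "B", "C"), ("C", "D", "E")]
r = {("a1", "b1", "c1", "d1", "e1"), ("a2", "b2", "c1", "d2", "e2")}       # example 3
r_legal = {("a1", "b1", "c1", "d1", "e1"), ("a2", "b2", "c1", "d1", "e1")}  # satisfies C→DE

print(r <= join_of_projections(R, r, parts))              # True: the lemma
print(join_of_projections(R, r, parts) == r)              # False: lossy on this instance
print(join_of_projections(R, r_legal, parts) == r_legal)  # True
```

A single instance can only refute losslessness; proving it requires reasoning about all legal instances, which is what the dependency-based tests do.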

Lossless decomposition

Theorem: If R is decomposed into relations R1, R2, then this decomposition is lossless if and only if at least one of the following functional dependencies holds: R1 ∩ R2 → R1 or R1 ∩ R2 → R2 (here R1 and R2 denote the schemas!).
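The theorem turns into a test once we can compute attribute closures: R1 ∩ R2 → R1 holds exactly when R1 ⊆ (R1 ∩ R2)+. A minimal Python sketch, assuming FDs are given as (lhs, rhs) pairs of attribute collections; `closure` and `lossless_binary` are hypothetical names.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """Theorem: {R1, R2} is lossless iff (R1 ∩ R2)+ contains R1 or contains R2."""
    common_plus = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_plus or set(r2) <= common_plus


# Single-letter attributes are written as strings, e.g. "ABC" = {A, B, C}.
print(lossless_binary("ABC", "CDE", []))             # False: example 3 without C→DE
print(lossless_binary("ABC", "CDE", [("C", "DE")]))  # True: example 3 with C→DE
```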

Normalization

Normal forms:
- 1NF-3NF, BCNF: based on functional dependencies
- 4NF: based on functional and multivalued dependencies
- 5NF: based on join dependencies

1st normal form

Def.: A relation is in 1NF if each component of each tuple has an atomic value. So each row has exactly one, atomic value for each attribute. Non-atomic values can be eliminated by introducing new relation(s) for set-valued attributes, or new attributes for composite attributes. By the mathematical definition, a relation is a set of tuples, so it cannot contain identical rows. This also means that a relation in 1NF always has a candidate key (in the worst case, the set of all its attributes).

Example: Student(SSN, Name, Address, Course-name, Mark) is NOT in 1NF: Address(Street, Number, City) is a composite attribute, and Course-name probably has more than one value (it is a set).
Decomposition: Student(SSN, Name, City, Street, Number)

Studies(SSN, Course-name, Mark)
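The two repairs (new attributes for a composite attribute, a new relation for a set-valued one) can be sketched in Python. This is an illustration under our own assumptions: the nested sample record, the dict-based input format and the name `to_1nf` are hypothetical, and the sketch keeps Name in Student.

```python
# Hypothetical non-1NF record: Address is composite, the courses form a set.
students_nf2 = [
    {"SSN": "111", "Name": "Nagy",
     "Address": {"Street": "Fo utca", "Number": "1", "City": "Eger"},
     "Courses": {("Databases", 5), ("Logic", 4)}},
]


def to_1nf(records):
    """Split nested records into Student(SSN, Name, City, Street, Number)
    and Studies(SSN, Course-name, Mark), both holding atomic values only."""
    student, studies = set(), set()
    for rec in records:
        addr = rec["Address"]  # composite attribute -> separate attributes
        student.add((rec["SSN"], rec["Name"], addr["City"], addr["Street"], addr["Number"]))
        for course, mark in rec["Courses"]:  # set value -> one row per element
            studies.add((rec["SSN"], course, mark))
    return student, studies


student, studies = to_1nf(students_nf2)
```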

Remark: most vendors (e.g. Oracle, Informix) offer so-called object-relational extensions, which allow sets, even multisets, and nested relations.

Normalization

Method: a lossless decomposition can be obtained by choosing one (functional) dependency α→β (we may assume α ∩ β = ∅) and decomposing R with respect to it into R1 and R2 as follows:
1. R1 = αβ
2. R2 = R − β

Remarks:
1. Here R1 ∩ R2 = α, and since α→β holds, α→αβ = R1. So R1 ∩ R2 → R1 is satisfied by the decomposition, and by the theorem it is lossless.
2. We wrote (functional) dependency because other types of dependencies can also be used in this widely used simple decomposition method. For 1NF, 2NF, 3NF and BCNF we use only functional dependencies, but e.g. for 4NF multivalued dependencies are used.
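One step of the method, together with the losslessness argument of Remark 1, can be sketched in Python. The helper names `closure`, `decompose` and `is_lossless` are hypothetical; the example uses R(A, B, C, D, E) with F = {A→D, B→E, DE→C}, which also appears later in this lecture.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def decompose(schema, alpha, beta):
    """One step of the method: R1 = αβ, R2 = R − β (β is taken without α)."""
    alpha = set(alpha)
    beta = set(beta) - alpha
    return alpha | beta, set(schema) - beta


def is_lossless(r1, r2, fds):
    """R1 ∩ R2 → R1 or R1 ∩ R2 → R2, tested with attribute closure."""
    plus = closure(r1 & r2, fds)
    return r1 <= plus or r2 <= plus


F = [("A", "D"), ("B", "E"), ("DE", "C")]
R1, R2 = decompose("ABCDE", "A", "D")  # decompose with respect to A→D
print(sorted(R1), sorted(R2))  # ['A', 'D'] ['A', 'B', 'C', 'E']
print(is_lossless(R1, R2, F))  # True
```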

2nd Normal Form

In the relation Teachers(SSN, Name, Address, Phone, CourseName, No-of-lectures, Description), as we have already seen, all three types of anomalies can be found.

What is the cause of these anomalies?
- Two different entity sets are stored in one relation, so the corresponding keys determine the other attributes.
- In other words: some attributes are already functionally determined by a part of the candidate key.

[Figure: functional dependencies of Teachers over the attributes SSN, C-name, Name, Address, Phone, No-of-lectures, Descriptions]

2NF

Def.: In the functional dependency α→β, β is fully (totally) functionally dependent on α if there is no proper subset δ ⊂ α such that δ→β.

Def.: A relation is in 2nd Normal Form if every non-prime attribute is fully functionally dependent on every candidate key.

A dependency α→β violates a given normal form if it does not satisfy the conditions prescribed in the definition of that normal form.

In the case of 2NF, δ→β violates 2NF if δ is a proper subset of a candidate key and β is non-prime (not contained in any candidate key).
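The violation test above can be automated for small schemas: find the candidate keys, then look for proper subsets δ of a key whose closure contains non-prime attributes. A brute-force Python sketch (exponential in the number of attributes, so only for classroom-sized examples); `candidate_keys` and `partial_dependencies` are hypothetical names, and the FDs of Teachers are the ones named in the 2NF example of this lecture.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    schema = list(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(schema, size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) >= set(schema):
                keys.append(s)
    return keys


def partial_dependencies(schema, fds):
    """Pairs (δ, β): δ a proper subset of a key, β the non-prime attributes δ determines."""
    keys = candidate_keys(schema, fds)
    prime = set().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):
            for delta in combinations(sorted(key), size):
                dependent = closure(delta, fds) - set(delta) - prime
                if dependent:
                    found.append((set(delta), dependent))
    return found


R = ["SSN", "Name", "Address", "Phone", "CourseName", "No-of-lectures", "Description"]
F = [(["SSN"], ["Name", "Address", "Phone"]),
     (["CourseName"], ["No-of-lectures", "Description"])]
for delta, beta in partial_dependencies(R, F):
    print(sorted(delta), "->", sorted(beta))
```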

Second Normal Form

Teachers(SSN, Name, Address, Phone, CourseName, No-of-lectures, Description)

The candidate key is {SSN, CourseName}. The relation is not in 2NF, since No-of-lectures and Description depend on CourseName, which is only a part of the candidate key. Similarly, Name, Address and Phone depend on SSN.

Decomposition:
α→β violating 2NF: SSN → (Name, Address, Phone)
R1 = αβ: Teachers(SSN, Name, Address, Phone)
R2 = R − β: College(SSN, CourseName, No-of-lectures, Description)
R2 is not in 2NF, since CourseName → No-of-lectures, Description. Decompose it:
R3 = αβ: Courses(CourseName, No-of-lectures, Description)
R4 = R2 − β: Teaches(SSN, CourseName)

The resulting decomposition is R1, R3, R4. Each of these relations is in 2NF.

Third Normal Form

Def.: The attribute set γ depends transitively on the attribute set α if there is an attribute set β such that β functionally depends on α and β functionally determines γ: α→β and β→γ (where β does not determine α, and γ is not a subset of β).

Example: Consider the relation schema below (suppose that each employee works for exactly one department):
Employees(SSN, E-name, E-address, Dept-name, Salary, Dept-Address, Dept-Phone)
- Is this relation in 1st or 2nd normal form?
- What are the functional dependencies in this schema?
- What anomalies can you find?
- Is there a transitive dependency?
- What do you think is the source of the anomalies?

3NF

Def1: A relation is in 3NF if it is in 2NF and no non-prime attribute depends transitively on any candidate key.

Def2: A relation is in 3NF if for each nontrivial functional dependency α→β in the closure of the given dependency set, either α is a superkey or β is a prime attribute.

Theorem: Def1 is equivalent to Def2.

Proof:
I. Def1 implies Def2: we show that every violation of Def2 is also a violation of Def1. If α→β violates 3NF by Def2, then α is not a superkey AND β is non-prime. Since α is not a superkey, either
- α is a proper subset of a candidate key Key: then α→β is a partial dependency on that key, so R is not in 2NF; OR
- α is not a subset of any candidate key: for a key Key, Key→α holds, and α does not determine Key (α is not a superkey), so because of α→β, Key determines β transitively, and the relation is not in 3NF by Def1.
II. Def2 implies Def1: the proof is omitted.
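Def2 gives a direct test. When R itself is tested with its own dependency set F, it is enough to look at the dependencies of F (with each right-hand-side attribute considered separately) instead of all of F+. A brute-force Python sketch; the function names are hypothetical, and the FDs of Employees are our reading of the example (SSN determines the employee data and the department, the department name determines the rest).

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(list(schema), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= set(schema):
                keys.append(s)
    return keys


def violations_3nf(schema, fds):
    """FDs α→β of F with α not a superkey and a non-prime attribute in β − α (Def2)."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(schema):
            continue  # α is a superkey
        nonprime = set(rhs) - set(lhs) - prime
        if nonprime:
            bad.append((set(lhs), nonprime))
    return bad


EMP = ["SSN", "E-name", "E-address", "Dept-name", "Salary", "Dept-Address", "Dept-Phone"]
F = [(["SSN"], ["E-name", "E-address", "Dept-name"]),
     (["Dept-name"], ["Salary", "Dept-Address", "Dept-Phone"])]
print(violations_3nf(EMP, F))  # the dependency with left side Dept-name
```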

3NF

The relation Employees is in 2NF, since its candidate key, SSN, consists of a single attribute. As we have already discussed, it is not in 3NF, since SSN → Dept-name → Salary, Dept-Address, Dept-Phone, so the latter attributes depend transitively on the candidate key.

Employees(SSN, E-name, E-address, Dept-name, Salary, Dept-Address, Dept-Phone)

3NF: Dept-name → Salary, Dept-Address, Dept-Phone violates 3NF, so
R1 := Departments(Dept-name, Salary, Dept-Address, Dept-Phone)
R2 := Employees(SSN, E-name, E-address, Dept-name)

Boyce-Codd Normal Form (BCNF)

Def.: A relation R is in BCNF if for each nontrivial functional dependency α→β (in the closure of the given dependency set), α is a superkey.

Theorem: Every relation with two attributes is in BCNF (homework).
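For a relation R tested with its own dependency set F, checking the dependencies of F is enough: R is in BCNF if and only if every nontrivial FD of F has a superkey on its left-hand side. (For a relation obtained by decomposition this shortcut does not apply; there the projected dependencies are needed.) A minimal Python sketch; `bcnf_violations` is a hypothetical name.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Nontrivial FDs of F whose left-hand side is not a superkey of the schema."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs) and not closure(lhs, fds) >= set(schema)]


F = [("A", "D"), ("B", "E"), ("DE", "C")]
print(bcnf_violations("ABCDE", F))          # [('A', 'D'), ('B', 'E'), ('DE', 'C')]
print(bcnf_violations("BE", [("B", "E")]))  # []: a two-attribute relation
```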

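Conclusion: BCNF ⊂ 3NF ⊂ 2NF ⊂ 1NF, i.e. every relation in BCNF is in 3NF, every relation in 3NF is in 2NF, and every relation in 2NF is in 1NF.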

Lossless decomposition into BCNF

Theorem: If R is decomposed into relations R1, R2, then this decomposition is lossless if and only if at least one of the following functional dependencies holds: R1 ∩ R2 → R1 or R1 ∩ R2 → R2 (here R1 and R2 denote the schemas!).

REMARK

Since BCNF is one of the most important normal forms, it is good to know that there is ALWAYS a LOSSLESS decomposition into BCNF.

Boyce-Codd Normal Form (BCNF)

Example: R(A, B, C, D, E), F = {A→D, B→E, DE→C}. What is the best normal form for R? If it is not in BCNF, decompose it.

Solution:
1. Find the candidate keys of R.
2. Check whether each FD in F+ satisfies the BCNF condition.
3. Decompose the relation.

Example: R(A, B, C, D, E), F = {A→D, B→E, DE→C}.
The candidate key is AB, so R is not in 2NF, since both A→D and B→E violate the 2NF definition.
Decomposition by A→D: R1 := AD, R2 := ABCE.
R1 is in BCNF, but R2 is not: the dependencies holding on R2 include B→E and AB→C (AB→DE and DE→C give AB→C), so the candidate key of R2 is AB, and B→E violates 2NF.
R2 is decomposed by B→E into R21 := BE and R22 := ABC.
Decomposition: R1, R21, R22; all three relations are in BCNF.
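The keys used in this example can be recomputed: a subset X of a fragment S is a superkey of S exactly when S ⊆ X+, with the closure taken under the F of the original relation. A brute-force Python sketch; `keys_of` is a hypothetical name.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def keys_of(sub, fds):
    """Candidate keys of a (sub)schema: minimal X ⊆ sub with X+ ⊇ sub, where the
    closure is computed with the dependencies F of the original relation."""
    sub = sorted(set(sub))
    keys = []
    for size in range(1, len(sub) + 1):
        for combo in combinations(sub, size):
            x = set(combo)
            if not any(k <= x for k in keys) and closure(x, fds) >= set(sub):
                keys.append(x)
    return keys


F = [("A", "D"), ("B", "E"), ("DE", "C")]
for part in ["ABCDE", "AD", "ABCE", "BE", "ABC"]:
    print(part, ["".join(sorted(k)) for k in keys_of(part, F)])
```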

BCNF

Example: R(A, B, C, D, E) is decomposed, and S(A, B, C) is one of the resulting relations. The given functional dependency set is F = {A→D, B→E, DE→C}. Is S in BCNF?

Solution: For that we need to know which dependencies hold on S. We compute the closure of each subset of S and keep only the attributes that lie in S:
- {A}+ = {A, D}: A→A and A→D follow, but only A→A lies within S, and it does not count because it is trivial.
- {B}+ = {B, E}: B→B, B→E, same as above.
- {C}+ = {C}: C→C, same as above.
- {AB}+ = {A, B, C, D, E}: trivial ones (e.g. AB→A, AB→B), and AB→C, AB→D, AB→E; of these only AB→C lies within S.
- {AC}+ = {A, C, D}: trivial ones, and AC→D, which is not in S.
- {BC}+ = {B, C, E}: trivial ones, and BC→E, which is not in S.

The only nontrivial dependency in S is AB→C. But then AB is the (only) candidate key of S, so AB→C does NOT violate BCNF, AB being a candidate key. Hence S is in BCNF.
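The table of closures above is exactly the usual way to project dependencies onto a fragment. A brute-force Python sketch (exponential in the size of S, so only for small fragments); `project_fds` is a hypothetical name, and its output does not minimize left-hand sides.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(sub, fds):
    """X → (X+ ∩ S) − X for every nonempty X ⊆ S; left sides are not minimized,
    so implied FDs with larger left sides also appear."""
    sub = sorted(set(sub))
    result = []
    for size in range(1, len(sub) + 1):
        for x in combinations(sub, size):
            implied = (closure(x, fds) & set(sub)) - set(x)
            if implied:
                result.append(("".join(x), "".join(sorted(implied))))
    return result


F = [("A", "D"), ("B", "E"), ("DE", "C")]
print(project_fds("ABC", F))  # [('AB', 'C')]
```

Applied to the fragment ABCE of the previous example, the same function lists B→E and AB→CE among the dependencies, which is why AB is the key there.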

Summary exercise for decomposing into 3NF (homework, to be revisited after BCNF: what is the difference if we want the relations to be in BCNF?):

Orders(ord-num, date, customer-name, customer-code, customer-address, bill-no, product(product-code, product-name, price/unit, measurement, number-of-product), deadline)

What is the "best" normal form for this relation? If it is not in 3NF, decompose it into 3NF. Check whether the relations of the decomposition are also in BCNF.

BCNF

Example: R(CTHRSG), where C = course, T = teacher, H = hour, R = room, S = student, G = grade.
F := {C→T, HR→C, HT→R, CS→G, HS→R}
Is the relation in BCNF? If not, find a decomposition consisting only of relations in BCNF.
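To check a hand-made solution, the textbook BCNF decomposition algorithm can be sketched in Python: find a subset X of the current fragment S whose closure adds attributes of S without covering S, split S into (X+ ∩ S) and S − ((X+ ∩ S) − X), and recurse. Each split is lossless by the theorem, since the two parts intersect in X and X → X+ ∩ S. This is not necessarily the solution intended in the lecture; the helper names are hypothetical, subsets are tried in alphabetical order, and another order can give a different, equally valid BCNF decomposition.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(sub, fds):
    """Some X ⊂ S with X → (X+ ∩ S) − X nontrivial and X not a superkey of S, or None."""
    sub = set(sub)
    for size in range(1, len(sub)):
        for combo in combinations(sorted(sub), size):
            x = set(combo)
            plus = closure(x, fds) & sub
            if plus != x and plus != sub:
                return x, plus
    return None


def bcnf_decompose(sub, fds):
    """Split on a violating X into X+ ∩ S and S − ((X+ ∩ S) − X), then recurse."""
    violation = find_violation(sub, fds)
    if violation is None:
        return [set(sub)]
    x, plus = violation
    return bcnf_decompose(plus, fds) + bcnf_decompose(set(sub) - (plus - x), fds)


F = [("C", "T"), ("HR", "C"), ("HT", "R"), ("CS", "G"), ("HS", "R")]
for part in bcnf_decompose("CTHRSG", F):
    print("".join(sorted(part)))
```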

Method for deciding losslessness of a GIVEN decomposition

Table:
- Number of rows = number of relations in the decomposition; one row is constructed for each relation.
- Number of columns = number of attributes of the original relation.
- The element in row i, column k is a_k if attribute k is in the schema of the corresponding relation of the decomposition.
- The element in row i, column k is b_ik if attribute k is NOT in the schema of the corresponding relation of the decomposition.

Method: "apply" the dependencies in F+. If two rows agree on the columns of the left-hand side of a functional dependency in F+, equate the values of these rows in the columns of the right-hand side of the FD: put a_k if at least one of the values is a_k; otherwise, if all of them are b symbols, put one of these b symbols. A replaced symbol is replaced everywhere it occurs in that column.

Decision: the decomposition is lossless if in this way we can obtain a row full of "a"s. Otherwise, when no more dependencies can be applied, the decomposition is lossy.
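The tableau method above can be sketched in Python. This is a minimal illustration under assumptions of our own: the symbols a_k and b_ik are encoded as tuples ("a", k) and ("b", i, k) with 0-based indices, the dependencies listed in F are applied repeatedly until nothing changes (the usual practical form of the test), and `chase_lossless` is a hypothetical name.

```python
def chase_lossless(schema, parts, fds):
    """Tableau test: True if some row becomes all a-symbols.
    Cells are ("a", k) or ("b", i, k); parts and FD sides are attribute collections."""
    schema = list(schema)
    col = {a: k for k, a in enumerate(schema)}
    table = [[("a", k) if a in part else ("b", i, k) for k, a in enumerate(schema)]
             for i, part in enumerate(parts)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            lcols = [col[a] for a in lhs]
            for i in range(len(table)):
                for j in range(i + 1, len(table)):
                    if not all(table[i][c] == table[j][c] for c in lcols):
                        continue  # the rows do not agree on the left-hand side
                    for a in rhs:
                        c = col[a]
                        x, y = table[i][c], table[j][c]
                        if x == y:
                            continue
                        keep, drop = (x, y) if x[0] == "a" else (y, x)  # an a-symbol wins
                        for row in table:  # rename the dropped symbol in the whole column
                            if row[c] == drop:
                                row[c] = keep
                        changed = True
    return any(all(cell[0] == "a" for cell in row) for row in table)


F5 = [("A", "C"), ("B", "C"), ("C", "D"), ("DE", "C"), ("CE", "A")]
print(chase_lossless("ABCDE", ["AD", "AB", "BE", "CDE", "AE"], F5))  # True
F2 = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
print(chase_lossless("ABCDE", ["ABC", "CDE"], F2))                   # False
```

The two calls reproduce the five-relation example and the R1(A, B, C), R2(C, D, E) example worked through in this section. Each renaming removes one distinct symbol, so the loop always terminates.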

Deciding losslessness

Examples: Suppliers(NAME, ADDRESS, PRODUCT, PRICE), abbreviated as Suppliers(N, A, PRO, PRI).
Functional dependencies: N→A, {N, PRO}→PRI.
Decide whether the following decomposition is lossy.
Decomposition: R1(N, A) and R2(N, PRO, PRI). (The answer also follows immediately from the theorem, since R1 ∩ R2 = N and N→A, i.e. R1 ∩ R2 → R1.)

Initial table:
     N    A    PRO  PRI
R1   a1   a2   b13  b14
R2   a1   b22  a3   a4

N→A can be applied:
     N    A    PRO  PRI
R1   a1   a2   b13  b14
R2   a1   a2   a3   a4

The second row is full of "a"s, so this decomposition is lossless.

Deciding losslessness

Example: R(A, B, C, D, E) and F = {A→BC, CD→E, B→D, E→A} are given. A decomposition of R is: R1(A, B, C) and R2(C, D, E). Is this decomposition lossless?

Initial table:
     A    B    C    D    E
R1   a1   a2   a3   b14  b15
R2   b21  b22  a3   a4   a5

Since the rows agree only on C, an applicable dependency would need a left-hand side contained in {C}. But there is no such dependency (C+ = {C}), so the decomposition is lossy.

Deciding losslessness

Example: R(A, B, C, D, E) and F = {A→C, B→C, C→D, DE→C, CE→A} are given. A decomposition of R is: R1(A, D), R2(A, B), R3(B, E), R4(C, D, E), R5(A, E). Is this decomposition lossless?

Solution: construct the initial table.

Initial table:
     A    B    C    D    E
R1   a1   b12  b13  a4   b15
R2   a1   a2   b23  b24  b25
R3   b31  a2   b33  b34  a5
R4   b41  b42  a3   a4   a5
R5   a1   b52  b53  b54  a5

A→C: b13, b23, b53 must be equated, say to b13 (any of b13, b23, b53 could be chosen).
B→C: b13 and b33 must be equated to b13 (no choice here).
C→D: a4, b24, b34, b54 become a4 (no choice here):

     A    B    C    D    E
R1   a1   b12  b13  a4   b15
R2   a1   a2   b13  a4   b25
R3   b31  a2   b13  a4   a5
R4   b41  b42  a3   a4   a5
R5   a1   b52  b13  a4   a5

Cont.

Apply DE→C: b13 and a3 become a3 (in every row where b13 occurs). Then CE→A: b31, b41 and a1 become a1:

     A    B    C    D    E
R1   a1   b12  a3   a4   b15
R2   a1   a2   a3   a4   b25
R3   a1   a2   a3   a4   a5
R4   a1   b42  a3   a4   a5
R5   a1   b52  a3   a4   a5

See the 3rd row! Is the decomposition lossy or lossless?