4TH NORMAL FORM & Lossless Decomposition
By: Karen McVay
CS 157B
REVIEW OF NFs
1NF: All values of the columns are atomic. That is, they contain no repeating values.
2NF: It is in 1NF and every non-key column is fully dependent upon the primary key (avoid partial dependencies).
REVIEW OF NFs Cont…
3NF: It is in 2NF and every non-key column is non-transitively dependent upon its primary key. In other words, all non-key attributes are functionally dependent only upon the primary key.
BCNF: A relation is in BCNF if every determinant is a candidate key. This is an improved form of third normal form.
Determinant: an attribute on which some other attribute is fully functionally dependent.
4NF and Multivalued Dependencies
Some relations are in BCNF yet still contain redundant data and suffer from update anomalies
The next higher normal form is 4NF
4NF is based on multivalued dependencies
Multivalued Dependencies
Consider a relation R with attributes X, Y, Z where X, Y, Z are sets of attributes
The multivalued dependency X --->> Y exists if,
whenever two tuples exist having the same X values:
T1(x, y1, z1) and T2(x, y2, z2),
the two tuples T3(x, y1, z2) and T4(x, y2, z1) also exist
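The definition above can be checked mechanically on a small relation. The following is a minimal sketch, assuming rows are stored as Python tuples and that X, Y, Z are given as column positions; the helper name `mvd_holds` is hypothetical and not part of the slides.

```python
from itertools import product

def mvd_holds(rows, x, y, z):
    """Check X --->> Y on a set of tuples.

    x, y, z are tuples of column positions (illustrative helper).
    """
    rows = set(rows)
    width = len(x) + len(y) + len(z)
    for t1, t2 in product(rows, repeat=2):
        if any(t1[i] != t2[i] for i in x):
            continue  # different X values: nothing is required
        # Required tuple: X and Y taken from t1, Z taken from t2.
        required = [None] * width
        for i in x + y:
            required[i] = t1[i]
        for i in z:
            required[i] = t2[i]
        if tuple(required) not in rows:
            return False
    return True

two_rows = {("x", "y1", "z1"), ("x", "y2", "z2")}
four_rows = two_rows | {("x", "y1", "z2"), ("x", "y2", "z1")}
print(mvd_holds(two_rows, (0,), (1,), (2,)))   # False
print(mvd_holds(four_rows, (0,), (1,), (2,)))  # True
```

Looping over ordered pairs means both swapped tuples (T3 and T4) are checked without writing the symmetric case separately.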
Example
Suppose we have two one-to-many relationships:
– Each employee may have many dependents
– Each employee may work on many projects
For any employee, the dependents are completely independent of the projects
– For a given value of ename, the values of pname are only determined by ename and not dname
– For a given value of ename, the values of dname are only determined by ename and not pname
– So, each dname is repeated for each pname, and vice versa
Consider the relation EMP(ename, pname, dname)
If (Smith, X, John) and (Smith, Y, Anna) exist, then
(Smith, Y, John) and (Smith, X, Anna) exist
The MVD ename --->> pname | dname exists in EMP
Note that EMP is in BCNF, yet there is a lot of redundancy in EMP
We might have liked to have:
EMP
ename   pname   dname
Smith   X, Y    John, Anna
But 1NF does not permit multivalued attributes
So, instead of the table above, we have:
EMP
ename   pname   dname
Smith   X       John
Smith   Y       Anna
Smith   Y       John
Smith   X       Anna
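To see where the four rows come from, here is a small sketch that flattens the two multivalued attributes into 1NF rows by pairing every project with every dependent. The function name `flatten_to_1nf` is an assumption made for illustration.

```python
from itertools import product

def flatten_to_1nf(ename, pnames, dnames):
    """Pair every project with every dependent for one employee."""
    return sorted((ename, p, d) for p, d in product(pnames, dnames))

rows = flatten_to_1nf("Smith", ["X", "Y"], ["John", "Anna"])
for row in rows:
    print(row)
print(len(rows))  # 4
```

With m projects and n dependents the employee occupies m * n rows, which is the redundancy the slides point out.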
Note that if X --->> Y | Z exists, then R can be decomposed into (X, Y) and (R - Y):
R(X, Y, Z)
Ra(X, Y)
Rb(X, Z)
And this is a lossless decomposition
Decomposing a MVD without loss of information
As ename --->> pname | dname exists, EMP can be decomposed into EMPa and EMPb:
EMP(ename, pname, dname)
EMPa(ename, pname)
EMPb(ename, dname)
This is a lossless decomposition
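A quick way to convince yourself that this decomposition is lossless is to project EMP onto the two smaller relations and join them back. The sketch below assumes the four Smith rows from the example; `project` and `join_on_first` are illustrative helper names, not from the slides.

```python
def project(rows, idxs):
    """Projection onto the given column positions (duplicates collapse)."""
    return {tuple(r[i] for i in idxs) for r in rows}

def join_on_first(ra, rb):
    """Natural join of two binary relations that share their first column."""
    return {(x, y, z) for (x, y) in ra for (x2, z) in rb if x == x2}

EMP = {
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "Y", "John"),
    ("Smith", "X", "Anna"),
}
EMPa = project(EMP, (0, 1))  # (ename, pname)
EMPb = project(EMP, (0, 2))  # (ename, dname)
rejoined = join_on_first(EMPa, EMPb)
is_lossless = rejoined == EMP
print(len(EMPa), len(EMPb), is_lossless)  # 2 2 True
```

The two projections hold two rows each, and their join reproduces exactly the original four rows.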
4th Normal Form
A Boyce-Codd normal form relation is in fourth normal form if
(a) there is no multi-value dependency in the relation, or
(b) there are multi-value dependencies, but the attributes which are multi-value dependent on a specific attribute are dependent between themselves.
4th Normal Form Cont…
This is best discussed through mathematical notation.
Assume the following relation:
R(a:pk1, b:pk2, c:pk3)
Recall that a relation is in BCNF if all its determinants are candidate keys; in other words, each determinant can be used as a primary key.
Because relation R has only one determinant (a, b, c), which is the composite primary key, and since the primary key is a candidate key, R is in BCNF.
Now R may or may not be in fourth normal form.
1. If R contains no multi-value dependency, then R will be in fourth normal form.
2. Assume R has the following two multi-value dependencies: a --->> b and a --->> c. In this case R will be in fourth normal form if b and c are dependent on each other. However, if b and c are independent of each other, then R is not in fourth normal form and the relation has to be projected to two non-loss projections.
Example
Consider a case of class enrollment. Each student can be enrolled in one or more classes and each class can contain one or more students.
Clearly, there is a many-to-many relationship between classes and students. This relationship can be represented by a Student/Class cross-reference table:
{StudentID, ClassID}
Example Cont…
The key for this table is the combination of StudentID and ClassID. To avoid violation of 2NF, all other information about each student and each class is stored in separate Student and Class tables, respectively.
Note that each StudentID determines not a unique ClassID, but a well-defined, finite set of values. This kind of behavior is referred to as multi-valued dependency of ClassID on StudentID.
Example 2
Consider another example with two many-to-many relationships, between students and classes and between classes and teachers.
[Diagram: Students (many) to Classes (many); Classes (many) to Teachers (many)]
Also, a many-to-many relationship between students and teachers is implied.
The combination of StudentID and TeacherID does not contain any additional information beyond the information implied by the student/class and class/teacher relationships.
Consequently, the student/class and class/teacher relationships are independent of each other; these relationships have no additional constraints. The following table is, then, in violation of 4NF:
{StudentID, ClassID, TeacherID}
Example 2 Cont…
As an example of the anomalies that can occur, realize that it is not possible to add a new class taught by some teacher without adding at least one student who is enrolled in this class.
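The insertion problem can be illustrated with a few made-up rows. The sketch below assumes the combined {StudentID, ClassID, TeacherID} table is the join of a student/class table and a class/teacher table; all IDs are hypothetical.

```python
# Hypothetical data: two students enrolled in class c1, taught by t1.
enrollment = {("s1", "c1"), ("s2", "c1")}   # (StudentID, ClassID)
teaching = {("c1", "t1")}                   # (ClassID, TeacherID)

# A new class c2 taught by t2, with no students enrolled yet.
teaching.add(("c2", "t2"))

# The combined table only has a row where a student, class and teacher meet.
combined = {
    (s, c, t)
    for (s, c) in enrollment
    for (c2, t) in teaching
    if c == c2
}
classes_in_combined = {c for (_, c, _) in combined}

print(("c2", "t2") in teaching)      # True: the separate table records it
print("c2" in classes_in_combined)   # False: the combined table cannot
```

Keeping the two relationships in separate tables lets the new class be recorded before anyone enrolls.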
4th NF and Anomalies
Case 1:
Assume the following relation:
Employee (Eid:pk1, Language:pk2, Skill:pk3)
No multi-value dependency, therefore R is in fourth normal form.
Case 2:
Assume the following relation with multi-value dependency:
Employee (Eid:pk1, Languages:pk2, Skills:pk3)
Eid --->> Languages
Eid --->> Skills
Languages and Skills are dependent. This says an employee speaks several languages and has several skills. However, for each skill a specific language is used when that skill is practiced.
Thus employee 100 speaks English when he/she teaches but speaks French when he/she cooks. This relation is in fourth normal form and does not suffer from any anomalies.
Eid   Language   Skill
100   English    Teaching
100   Kurdish    Politics
100   French     Cooking
200   English    Cooking
200   Arabic     Singing
Case 3:
Assume the following relation with multi-value dependency:
Employee (Eid:pk1, Languages:pk2, Skills:pk3)
Eid --->> Languages
Eid --->> Skills
Languages and Skills are independent.
Eid   Language   Skill
100   English    Teaching
100   Kurdish    Politics
100   English    Politics
100   Kurdish    Teaching
200   Arabic     Singing
This relation is not in fourth normal form and suffers from all three types of anomalies.
Insertion anomaly: To insert row (200 English Cooking) we have to insert two extra rows, (200 Arabic Cooking) and (200 English Singing), otherwise the database will be inconsistent. Note the table will be as follows:
Eid   Language   Skill
100   English    Teaching
100   Kurdish    Politics
100   English    Politics
100   Kurdish    Teaching
200   Arabic     Singing
200   English    Cooking
200   Arabic     Cooking
200   English    Singing
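The extra rows forced by the insertion can be computed rather than worked out by hand. This is a minimal sketch, assuming languages and skills are independent for each employee so that every language must be paired with every skill; `insert_with_mvd` is a hypothetical helper name.

```python
def insert_with_mvd(rows, new_row):
    """Insert new_row and add whatever rows keep languages x skills complete.

    Returns (updated table, extra rows that had to be added).
    """
    eid = new_row[0]
    rows = set(rows) | {new_row}
    langs = {l for (e, l, _) in rows if e == eid}
    skills = {s for (e, _, s) in rows if e == eid}
    required = {(eid, l, s) for l in langs for s in skills}
    return rows | required, required - rows

before = {
    (100, "English", "Teaching"),
    (100, "Kurdish", "Politics"),
    (100, "English", "Politics"),
    (100, "Kurdish", "Teaching"),
    (200, "Arabic", "Singing"),
}
after, extra = insert_with_mvd(before, (200, "English", "Cooking"))
print(sorted(extra))
print(len(after))  # 8
```

For employee 200 the extra rows are (200, Arabic, Cooking) and (200, English, Singing), matching the table above.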
Deletion anomaly: If employee 100 discontinues the politics skill we have to delete two rows, (100 Kurdish Politics) and (100 English Politics), otherwise the database will be inconsistent.
Eid   Language   Skill
100   English    Teaching
100   Kurdish    Politics
100   English    Politics
100   Kurdish    Teaching
200   Arabic     Singing
200   English    Cooking
200   Arabic     Cooking
200   English    Singing
More anomalies
Update anomaly: If employee 200 changes his skill from singing to dancing we have to make changes in more than one place.
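To make "more than one place" concrete, the sketch below counts the rows that mention employee 200's singing skill in the eight-row table, and compares that with a separate (Eid, Skill) table obtained by projection. The data is taken from the slides; the variable names are illustrative.

```python
table = {
    (100, "English", "Teaching"),
    (100, "Kurdish", "Politics"),
    (100, "English", "Politics"),
    (100, "Kurdish", "Teaching"),
    (200, "Arabic", "Singing"),
    (200, "English", "Cooking"),
    (200, "Arabic", "Cooking"),
    (200, "English", "Singing"),
}

# Rows that must change in the single table.
rows_to_change = [r for r in table if r[0] == 200 and r[2] == "Singing"]

# Rows that must change if skills are kept in their own (Eid, Skill) table.
emp_skill = {(e, s) for (e, _, s) in table}
skill_rows_to_change = [r for r in emp_skill if r == (200, "Singing")]

print(len(rows_to_change), len(skill_rows_to_change))  # 2 1
```

In the single table the skill is repeated once per language the employee speaks, so the number of rows to update grows with the number of languages.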
The relation is projected to the following two non-loss projections, which are in fourth normal form:
Employee_Language(Eid:pk1, Languages:pk2)
Eid   Language
100   English
100   Kurdish
200   Arabic
Employee_Skill(Eid:pk1, Skills:pk2)
Eid   Skill
100   Teaching
100   Politics
200   Singing
References
Functional Dependency (Normalization): http://www.emunix.emich.edu/~khailany/files/Normalization.htm
Multivalued Dependencies (Osmar Zaiane): http://www.cs.sfu.ca/CC/354/zaiane/material/notes/Chapter7/node13.html