Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design...

100
Chapter 6 - Relational Database Design • First Normal Form • Pitfalls in relational database design • Functional Dependencies • Armstrong Axioms •2 nd , 3 rd , BCNF, and 4 th normal form • Decomposition, Desirable properties of decomposition • Overall database design process

Transcript of Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design...

Page 1: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Chapter 6 - Relational Database Design

• First Normal Form• Pitfalls in relational database design• Functional Dependencies• Armstrong Axioms• 2nd, 3rd, BCNF, and 4th normal form• Decomposition, Desirable properties

of decomposition• Overall database design process

Page 2: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

RecapRecap

• ER modeling is a diagrammatic representation of the conceptual design of a database

• ER diagrams consist of entity types, relationship types and attributes

• RDBMS handles data in the form of relations, tuples and attributes

• Keys identify tuples uniquely

Page 3: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Pitfalls in relational Pitfalls in relational database designdatabase design

• Redundancy of data• Unable to express certain

information• Wastage of storage

Page 4: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Pitfalls in relational Pitfalls in relational database designdatabase design

• Let us consider Company schema • Emp_dept (ename, eno, dob,

address, dnumber, dname, dmgreno)• Emp_proj(eno, pnumber, hours,

ename, pname, plocation)

Page 5: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Anomalies• Redundant information• Insert anomalies• Delete anomalies• Update anomalies• Null values in tuple• Generation of Spurious tuples

Page 6: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Redundant Information• In Emp_dept, the attribute values

pertaining to a particular department are repeated for every employee who works for that department

• Similar comment applies to Emp_proj

Page 7: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Insertion AnomaliesInsertion Anomalies• Insert anomalies can be differentiated into

two types on Emp_dept relation– To insert a new employee into Emp_dept , we

must include either the attribute values for department that the employee works for or null

– It is difficult to insert a new department that has no employees as yet. The only way is to place null values for employee attributes. This causes a problem because eno is primary key

Page 8: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Deletion AnomaliesDeletion Anomalies• If we delete an employee tuple that

happens to represent the last employee working for a particular department, the information regarding that department is lost from the database

Page 9: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Modification AnomaliesModification Anomalies• If we change the value of one of the

attributes of a particular department- say manager of department 5-we must update the tuples of all employees who work in that department otherwise the database will become inconsistent. If we fail to update some tuples the same department will show two different values for manager in different tuples

Page 10: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Null Values in Tuples Null Values in Tuples • If many of the attributes do not apply to all

tuples in the relation, we end up with many nulls in those tuples. This can waste storage space and may also lead to problems with understanding the meaning of the attributes and with specifying JOIN. Another problem is how to account them when aggregate functions are applied. Nulls can have multiple interpretations– Attribute does not apply to this tuple– Attribute value for this tuple is unknown– Attribute value known but not yet recorded

Page 11: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Generation of Spurious Tuples

• Let us consider two relation schema Emp_Locs(ename, plocation) Emp_proj1(eno, pnumber, hours, pname, plocation)

If we attempt a natural join operation on above relation schema, the result produces many more tuples than the original set of tuples. Additional tuples that were not there in Emp_proj are called spurious tuples because they represent wrong information which is not valid

Page 12: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Guidelines For Relation Schema

• Guideline 1 – Design a relation schema so that it is easier to explain its meaning. Do not combine attributes from multiple entity types and relationship types into a single relation

• Guideline 2 – Design the base schema so that no insertion, deletion, or modification anomalies are present in the relations

• Guideline 3 – As far as possible, avoid placing attributes in a base relation whose values may frequently be null. If nulls are unavoidable make sure that they apply in exceptional cases only and do not apply to majority of the tuples in the relation

• Guideline 4 – Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign key that guarantees that no spurious tuples are generated

Page 13: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Functional Dependencies• It is a constraint between two sets of attributes

from the database. Let us consider that our relation schema has n attributes A1, A2, . . ., An. The whole database is described by a single universal relation schema R = {A1, A2, . . ., An}

A functional dependency denoted by X→Y between two sets of attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R. The constraint is for any two tuples t1 and t2 in r that have t1 [X] = t2[X] must also have t1 [Y] = t2[Y]

Page 14: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

• Values of Y component of a tuple in r depend on, or are determined by, the values of the X component OR The values of X component of a tuple uniquely or functionally determine the values of Y component• There is a functional dependency from X to Y or Y is functionally dependent on X • The abbreviation of functional dependency is FD or f.d.

Functional Dependencies

Page 15: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Functional Dependencies• If the values of an attribute “Marks”

is known then the values of an attribute “Grade” are determined since Marks→ Grade

• X is called the left-hand side of FD, and Y is called the right-hand side

Page 16: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Inference Rules for Functional Dependencies

• Let F is the set of functional dependencies that are specified on relation schema R. However, numerous other dependencies can be deduced or inferred from the FDs in R. In real life it is impossible to specify all them for given situation

e. g. if each department has one manager, so dept_no uniquely determines manager_no, and a manager has a unique phone number then manager_no uniquely determines mgr_phone, these two dependencies together imply that dept_no determines mgr_phone. This is an inferred FD and need not be explicitly stated.

Page 17: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Closure of Functional Dependencies

• The set of all dependencies that include F as well as all dependencies that can be inferred from F is called closure of F. It is denoted by F+

e. g.

F = {eno {ename, dob, address, dnumber}, dnumber {dname, dmgrno}

Inferred functional dependencies areeno {dname, dmgrno}eno enodnumber dname

Page 18: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Inference Rules for Functional Dependencies

• A set of rules that can be used to infer new dependencies from a given set of dependencies

• We use the notation F = X Y to denote that the functional dependency X Y is inferred from the set of functional dependencies F

Page 19: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Inference Rules for Functional Dependencies

• IR1 (Reflexive rule) If X Y then X Y• IR2 (Augmentation rule) {X Y} =XZ YZ• IR3 (Transitive rule) {X Y, Y Z} = X Z• IR4 (Decomposition or Projective rule)

{X YZ} = X Y• IR5 (Union or Additive rule)

{X Y, X Z} = {X YZ}• IR6 (Pseudo transitive rule)

{X Y, WY Z} = {WX Z}

Page 20: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Types of Functional Dependencies

• Full Functional dependency• Partial Functional dependency• Transitive Functional dependency

Page 21: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Definitions of Keys and Attributes Participating in Keys• Super Key – A super key of a relation

schema R = {A1, A2,… , An} is a set of attributes S R with property that has no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S]

• Key – A key K is a super key with the additional property that removal of any attribute from K will cause K not to be a super key any more

Page 22: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Definitions of Keys and Definitions of Keys and Attributes Participating in KeysAttributes Participating in Keys• The difference between Key and Super key

is that a key has to be minimal.e. g. If we have a key K = {A1, A2,… , Ak} of R, then

K – {Ai} is not a key of R for any Ai, 1≤i ≤k

• candidate key - If a relation schema has more than one key, each is called a candidate key

• Primary And Secondary Key – One of the candidate key is designated to be a primary key and others are called secondary keys

Page 23: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Definitions of Keys and Attributes Participating in Keys• Prime Attribute – If an attribute of a

relation R is member of some candidate key of R

• Nonprime - If an attribute of a relation R is not member of any candidate key of R

Page 24: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Normalization• The normalization process was first proposed by

Codd in 1972• It takes relation schema through a series of tests

to certify whether it satisfies a certain normal form

• The process proceeds in a top-down fashion by evaluating each relation against the criteria for normal forms and decomposing relations as necessary

• Codd proposed three normal forms which he called First, Second, Third normal form

• A strong definition of 3NF is called Boyce-Codd normal (BCNF) form was proposed by Boyce and Codd

Page 25: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Normalization• All these normal forms are based on the

functional dependencies among the attributes of the relation

• Later, the 4NF and 5NF were proposed, based on the concept of multi-valued and join-dependencies, respectively

• Normalization is a process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties– Minimizing redundancy– Minimizing insertion, deletion, modification anomalies

• A relation that do not meet normal form tests are decomposed into smaller relation schemas that meet the tests and posses the desirable properties

Page 26: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

First Normal Form• It was designed to disallow multi-valued

attributes, composite attributes and their combinations

• It states that the domain of an attribute must include only atomic i.e. simple or indivisible values and that value of any attribute in a tuple must be a single value from the domain of that attribute

• Relations within relations or relations as attribute values within tuples

Page 27: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

First Normal Form (1NF)• Consider Department relation schema whose

primary key is dnumber. If extended by adding dlocations attribute. Each department can have a number of locations

Departmentdno dname dmgrno dlocations

• The Department schema is not in 1NF– The domain of dlocations contains atomic values,

but some tuples can have set of these values. Dlocations is not functionally depend on the primary key

Page 28: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

First Normal Form (1NF)• There are 3 techniques to achieve 1NF for such a

relation– Remove the attribute that violates 1NF and place it

in a separate relation along with the primary key of original relation. The primary key of new relation is the combination.

– Expand the key so that there will be a separate tuple in the original relation for each location of a department. The primary key becomes the combination. Disadvantage of introducing redundancy

– If maximum number of values is known for the attribute. Replace that attribute by those many atomic attributes . Disadvantages of introducing null

Page 29: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Second Normal Form (2NF)• Second Normal form is based on the concept of

full functional dependencies• A functional dependency X Y is a full functional

dependency if removal of any attribute A from X means that the dependency does not hold for any more{eno, pnumber} hoursis a full functional dependencyNeither eno hours nor

pnumber hours

Page 30: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Second Normal Form (2NF)• A relation schema R is in 2NF if it is in 1NF and

every non prime attribute A in R is fully functionally dependent on the primary key of R

• If a relation schema is not in 2NF it can be 2NF normalized into a number of 2NF relations in which non prime attributes are associated only with the part of the primary key on which they are fully functionally dependent

Page 31: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Third Normal Form (3NF)• Third Normal form is based on the concept of

transitive dependency• A functional dependency X Y in a relation

schema is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R and both X Z and Z Y holdseno dnumber

dnumber dmgrno

eno dmgrno is a transitive

Page 32: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Third Normal Form (3NF)• A relation schema R is in 3NF if it satisfies 2NF

and no nonprime attribute of R is transitively dependent on the primary key

• If a relation schema is not in 3NF it can be 3NF normalized by decomposing and setting up a relation that includes the non-key attributes that functionally determines other non-key attributes

Page 33: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Report(S#, C#, StudentName, DateOfBirth, CourseName, PreRequisite, DurationInDays, DateOfExam, Marks, Grade)

Normalize relation schema

Page 34: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,
Page 35: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Second Normal Form• STUDENT# is key attribute for Student,• COURSE# is key attribute for Course• STUDENT# COURSE# together form the composite key attributes for Results relationship.• Other attributes like StudentName (Student Name), DateofBirth, CourseName, PreRequisite, DurationInDays, DateofExam, Marks and Grade are non-key attributes.

To make this table 2NF compliant, we have to remove all the partial dependencies.

Student #, Course# -> Marks, GradeStudent# -> StudentName, DOB,Course# -> CourseName, Prerequisite, DurationInDaysCourse# -> Date of Exam

Page 36: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,
Page 37: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,
Page 38: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,
Page 39: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,
Page 40: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Third normal form (3NF)

A relation R is said to be in the Third Normal Form (3NF) if and only if− It is in 2NF and− No transitive dependency exists between non-key attributes and key attributes.

• STUDENT# and COURSE# are the key attributes.

• All other attributes, except grade are non-partially,non-transitively dependent on key attributes.• Student#, Course# - > Marks• Marks -> Grade

Page 41: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Note : - All transitive dependencies are eliminated

Page 42: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,
Page 43: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Boyce-Codd Normal form (BCNF)

A relation is said to be in Boyce Codd Normal Form (BCNF)

- if and only if all the determinants are candidate keys.BCNF relation is a strong 3NF, but not every 3NF relation is BCNF.

Page 44: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,
Page 45: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,
Page 46: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,
Page 47: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,
Page 48: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Now both the tables are not only in 3NF, but also in BCNF because all the determinants are Candidate keys. In the first table, STUDENT# decides EMailID and EMailID decides STUDENT# and both are candidate keys. In second table, STUDENT# COURSE# decides all other non-key attributesand they are composite candidate key as well as determinants

Note: If the table has a single attribute as candidate key or no overlapping candidate keys and if it is in 3NF, then definitely the table will also be in BCNF

Basically BCNF takes away the redundancy, anomalies which exist among the key attributes

Page 49: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Multi-valued Dependency And Fourth Normal Form (4NF)

• Multi-valued dependencies are a consequence of First Normal Form (1NF), which disallows an attribute in a tuple to have a set of values

• If we have two or more multi-valued independent attributes in the same relation schema, we get into a problem of having to repeat every value of one of the attributes with every value of the other attribute to keep the relation state consistent and to maintain the dependency among the attributes involved. This constraint is specified by a multi-valued dependency

Page 50: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Multi-valued Dependency And Fourth Normal Form (4NF)

• Consider the relation EMP (ename, pname, dname)• A tuple in EMP represents

– An Employee whose name is ename works on project whose name is pname and has a dependent whose name is dname

• An employee may work on several projects and may have several dependents and the employee’s projects and dependents are independent of one another

• To keep the relation consistent, we must have a separate tuple to represent every combination of an employee’s dependent and an employee’s project

• This constraint is called as MVD on EMP• Whenever two independent 1:N relationships A:B and

A:C are mixed in the same relation, a MVD may arise

Page 51: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Multi-valued Dependency (MVD) And Fourth Normal Form (4NF)

• A multi-valued dependency X Y (X multidetermines Y) specified on relation schema R where X and Y are subset of R, specifies the following constraint on any relation state r of R

– If two tuples t1 and t2 exist in r such that t1[X] = t2[X] then two tuples t3 and t4 should also exist in r with following properties

– Where Z denote (R – (X U Y))Z is the attributes remaining in R after the attributes

in X U Y are removed from Rt3[X] = t4[X] = t1[X] = t2[X]

t3[Y] = t1[Y] and t4[Y] = t2[Y]

t3[Z] = t2[Z] and t4[Z] = t1[Z]

Page 52: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Fourth Normal Form (4NF)• A relation R is in 4NF with respect to a set of

dependencies F that includes functional and multi-valued dependencies if, for every nontrivial multi-valued dependency X Y in F+, X is a super key for R

Page 53: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Boyce-Codd Normal Form (BCNF)

• Boyce-Codd is a simpler form of 3NF, but it is more stricter than 3NF

• Every relation in BCNF is aloso in 3NF; however a relation in 3NF is not necessarily in 3NF

A relation is said to be in Boyce Codd Normal Form (BCNF)

- if and only if all the determinants are candidate keys.BCNF relation is a strong 3NF, but not every 3NF

relation is BCNF.

Page 54: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Boyce-Codd Normal Form (BCNF)

• Consider the relation schema LOTS which describes parcels of land for sale in various countries of a state. Suppose that there are two candidate keys {property_id} and {country_name, lots#}. Lot numbers are unique only within each country, but property_id are unique across countries for the entire state

LOTSproperty_id country_name LOTS# Area price tax_rate

FD1

FD2FD3

FD4

Page 55: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Boyce-Codd Normal Form (BCNF)

• Boyce-Codd is a simpler form of 3NF, but it is more stricter than 3NF

• Every relation in BCNF is aloso in 3NF; however a relation in 3NF is not necessarily in 3NF

Page 56: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

In this Relation

• STUDENT# - Student Number• COURSE# - Course Number• CourseName - Course Name• IName - Name of the Instructor who

delivered the course• Room# - Room number which is assigned

to respective Instructor• Marks - Scored in Course COURSE# by

Student STUDENT#• Grade - obtained by Student STUDENT# in

Course COURSE#

Page 57: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Domain is atomic if its elements are considered to be indivisible units

Examples of non-atomic domains

• Set of names, composite attributes• Identification numbers like CS101 that can be broken up into parts

Page 58: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Atomicity is actually a property of how the elements of the domain are used.

Example: Strings would normally be considered indivisible

Suppose that students are given roll numbers which are strings of the form CS0012 or EE1127

If the first two characters are extracted to find the department, the domain of roll numbers is not atomic.

Doing so is a bad idea: leads to encoding of information in application program rather than in the database.

Page 59: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Functional Dependencies for the Functional Dependencies for the exampleexample

• • STUDENT# COURSE#STUDENT# COURSE# MarksMarks

• • COURSE# COURSE# CourseName, CourseName,• • COURSE# COURSE# IName IName (Assuming (Assuming one course is one course is taught by one taught by one

and only one Instructor)and only one Instructor)

• • IName IName Room# Room# (Assuming each (Assuming each Instructor has Instructor has his/her own and his/her own and

non-shared room)non-shared room)

• • Marks Marks Grade Grade

Page 60: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Dependency diagramDependency diagram

Report( S#, C#, SName, CTitle, LName, Room#, Marks, Grade)

Page 61: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Report( S#, C#, SName, CTitle, LName, Room#, Marks, Grade)

• S# SName• C# CTitle,• C# LName• LName Room#• C# Room#• S# C# Marks• Marks Grade• S# C# Grade

Assumptions:• Each course has only one lecturer and each lecturer has a room.• Grade is determined from Marks.

Page 62: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Full dependencies

--

Page 63: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Marks is fully functionally dependent on STUDENT# COURSE# and not on sub set of STUDENT# COURSE#. This means Marks can not be determined either by STUDENT# OR COURSE# alone. It can be determined only using STUDENT# AND COURSE# together. HenceMarks is fully functionally dependent on STUDENT# COURSE#.

CourseName is not fully functionally dependent on STUDENT# COURSE# because subset of STUDENT# COURSE# i.e only COURSE# determines the CourseName and STUDENT# does not have any role in deciding CourseName. Hence CourseName is not fully functionally dependent on STUDENT# COURSE#.

Page 64: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Partial dependencies

X and Y are attributes.Attribute Y is partially dependent on the attribute X only if it is dependent on a sub-set of attribute X.

Page 65: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

In the above relationship CourseName, IName, Room# are partially dependent on composite attributes STUDENT# COURSE# because COURSE# alone defines the CourseName, IName, Room#.

Page 66: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Transitive dependencies

X, Y and Z are three attributes.X -> YY-> Z=> X -> Z

Page 67: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

In above example, Room# depends on IName and in turn IName depends on COURSE#. Hence Room# transitively depends on COURSE#.

Similarly Grade depends on Marks, in turn Marks depends on STUDENT# COURSE# hence Grade depends Fully transitively on STUDENT# COURSE#

Transitive: Indirect

Page 68: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,
Page 69: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

First normal form (1NF)

• A relation schema is in 1NF

– if and only if all the attributes of the relation R are atomic in nature.

– Atomic: the smallest level to which data may be broken down and remain meaningful

In relational database design it is not practically possible to have a table which is not in 1NF.

Page 70: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Domain is atomic if its elements are considered to be indivisible units

Examples of non-atomic domains

• Set of names, composite attributes• Identification numbers like CS101 that can be broken up into parts

Page 71: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Atomicity is actually a property of how the elements of the domain are used.

Example: Strings would normally be considered indivisible

Suppose that students are given roll numbers which are strings of the form CS0012 or EE1127

If the first two characters are extracted to find the department, the domain of roll numbers is not atomic.

Doing so is a bad idea: leads to encoding of information in application program rather than in the database.

Page 72: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,
Page 73: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Second normal form (2NF)

• A Relation is said to be in Second Normal Form if and only if :– It is in the First normal form, and– No partial dependency exists between non-key attributes and key attributes.

• An attribute of a relation R that belongs to any key of R is said to be a prime attribute and that which doesn’t is a non-prime attribute

To make a table 2NF compliant, we have to remove all the partial dependencies

Note : - All partial dependencies are eliminated

Page 74: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Prime Vs Non-Prime Attributes• An attribute of a relation R that belongs to any key of R is said to be a prime attribute and that which doesn’t is a non-prime attribute

Report(S#, C#, StudentName, DateOfBirth, CourseName, PreRequisite, DurationInDays, DateOfExam, Marks, Grade)

Page 75: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,
Page 76: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Second Normal Form• STUDENT# is key attribute for Student,• COURSE# is key attribute for Course• STUDENT# COURSE# together form the composite key attributes for Results relationship.• Other attributes like StudentName (Student Name), DateofBirth, CourseName, PreRequisite, DurationInDays, DateofExam, Marks and Grade are non-key attributes.

To make this table 2NF compliant, we have to remove all the partial dependencies.

Student #, Course# -> Marks, GradeStudent# -> StudentName, DOB,Course# -> CourseName, Prerequisite, DurationInDaysCourse# -> Date of Exam

Page 77: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,
Page 78: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

To remove this partial dependency we can create four separate tables, Student, Course and Result, Exam_Date tables as shown below.In the first table (STUDENT), the key attribute is STUDENT# and all other non-key attributes are fully functionally dependant on the key attributes.In the second table (COURSE), COURSE# is the key attribute and all the non-key attributes are fully functionally dependant on the key attributes.In third table (Result) STUDENT# COURSE# together are key attributes and all other non key attributes Marks and Grade fully functionally dependant on the key attributes.In the fourth table (Exam_Date), DateOfExam depends only on Course#.These four tables also are compliant with First Normal Form definition. Hence these four tables are in Second Normal Form (2NF).

Page 79: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,
Page 80: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,
Page 81: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,
Page 82: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Third normal form (3NF)

A relation R is said to be in the Third Normal Form (3NF) if and only if− It is in 2NF and− No transitive dependency exists between non-key attributes and key attributes.

• STUDENT# and COURSE# are the key attributes.

• All other attributes, except grade are non-partially,non-transitively dependent on key attributes.• Student#, Course# - > Marks• Marks -> Grade

Page 83: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Note : - All transitive dependencies are eliminated

Page 84: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,
Page 85: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Boyce-Codd Normal form (BCNF)

A relation is said to be in Boyce Codd Normal Form (BCNF)

- if and only if all the determinants are candidate keys.BCNF relation is a strong 3NF, but not every 3NF relation is BCNF.

Page 86: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Let us understand this using slightly different RESULT table structure. In the table, we have two candidate keys namelySTUDENT# COURSE# and COURSE# EmaiIId.COURSE# is overlapping among those candidate keys.Hence these candidate keys are called as “Overlapping Candidate Keys”.The non-key attributes Marks is non-transitively and fully functionally dependant on key attributes.Hence this is in 3NF. But this is not in BCNF because there are four determinants in this relation namely:STUDENT# (STUDENT# decides EmailiD)EMailID (EmailID decides STUDENT#)STUDENT# COURSE# (decides Marks)COURSE# EMailID (decides Marks).All of them are not candidate keys. Only combination of STUDENT# COURSE# and COURSE# EMailID are candidate keys.

Page 87: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,
Page 88: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,
Page 89: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,
Page 90: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,
Page 91: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Now both the tables are not only in 3NF, but also in BCNF because all the determinants are Candidate keys. In the first table, STUDENT# decides EMailID and EMailID decides STUDENT# and both are candidate keys. In second table, STUDENT# COURSE# decides all other non-key attributesand they are composite candidate key as well as determinants.

Note: If the table has a single attribute as candidate key or no overlapping candidate keys and if it is in 3NF, then definitely the table will also be in BCNF.

Basically BCNF takes away the redundancy, anomalies which exist among the key attributes.

Page 92: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Merits of Normalization

• Normalization is based on a mathematical foundation.

• Removes the redundancy to a greater extent. After 3NF, data redundancy is minimized to the extent of foreign keys.

• Removes the anomalies present in INSERTs, UPDATEs and DELETEs.

Page 93: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Demerits of Normalization

• Data retrieval or SELECT operation performance will be severely affected.

• Normalization might not always represent real world scenarios.

Page 94: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,
Page 95: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,
Page 96: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Summary

• Normalization is a refinement process. It helps in removing anomalies present in INSERTs/UPDATEs/DELETEs

• Normalization is also called “Bottom-up approach”, because this technique requires very minute details like every participating attribute and how it is dependant on the key attributes, is crucial. If you add new attributes after normalization, it may change the normal form itself.• There are four normal forms that were defined being commonly used.

• 1NF makes sure that all the attributes are atomic in nature.

• 2NF removes the partial dependency.

Page 97: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Summary – contd.

• 3NF removes the transitive dependency.

• BCNF removes dependency among key attributes.

• Too much of normalization adversely affects SELECT or RETRIEVAL operations.

• It is always better to normalize to 3NF for INSERT, UPDATE and DELETE intensive (On-line transaction) systems.

• It is always better to restrict to 2NF for SELECT intensive (Reporting) systems.

• While normalizing, use common sense and don’t use the normal forms as absolute measures.

Page 98: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

• Database designed based on the E-R model may Database designed based on the E-R model may have some amount ofhave some amount of

– Inconsistency– Uncertainty– Redundancy

To eliminate these draw backs some refinement has to To eliminate these draw backs some refinement has to be done on the database.be done on the database.• Refinement process is called NormalizationRefinement process is called Normalization• Defined as a step-by-step process of decomposing Defined as a step-by-step process of decomposing a complex relation into a simple and stable data a complex relation into a simple and stable data structurestructure• The formal process that can be followed to achieve The formal process that can be followed to achieve a good database designa good database design• Also used to check that an existing design is of Also used to check that an existing design is of good qualitygood quality• The different stages of normalization are known as The different stages of normalization are known as “normal forms”“normal forms”• To accomplish normalization we need to To accomplish normalization we need to understand the concept of Functional Dependencies.understand the concept of Functional Dependencies.

Page 99: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Need for NormalizationNeed for NormalizationStudent_Course_Result Student_Course_Result

TableTableInsert AnomalyInsert Anomaly

Delete Anomaly Delete Anomaly Update Anomaly Update Anomaly Data DuplicationData Duplication

Page 100: Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,