Post on 16-Jan-2016
BIS 360 – Lecture Eight
Ch. 12: Database Design and Normalization
Objectives
Where are we?
ER Diagram Transformation
Why DB Normalization?
Database Anomalies
Normalization Theory
1NF - 3NF
Where we are now
Project ID and Selection
Project Initiation & Planning
Analysis
Logical Design
Physical Design
Implementation
Maintenance
1. Database Design (Ch. 12)
2. Form and Report Design (Ch. 13)
3. Interface Design (Ch. 14)
Why we design a database
E-R Model DFD Model Information
Database
Applications
Reports, Bills, Charts, ...
DB Design – Where we start
From the Conceptual Data Model (ERD): examine the diagram for accuracy (entities, attributes, relationships, and cardinalities)
Transform this Conceptual Data Model into a Logical Data Model (i.e., Relational Data Model): a set of flat tables (relations)
ER Diagram Transformation
Step 1: For each regular entity type E, create a relation R that includes all the simple attributes of E. The unique identifier of E becomes the primary key of R.
Entity E: EMPLOYEE
Relation R: EMPLOYEE ( EmpID , EmpLName , EmpFName , Salary )
PK: EmpID
ER Diagram Transformation
Step 2: For each weak entity type W with identifying entity type E, create a relation Re for E as explained in Step 1 and a relation Rw with all the attributes of W. Then include the primary key of Re as a foreign key of Rw. The combination of the partial identifier of W and the unique identifier of E becomes the primary key of Rw.
Entity E: EMPLOYEE; weak entity W: DEPENDENT
Re: EMPLOYEE ( EmpID , EmpLName , EmpFName , Salary )
PK: EmpID
Rw: DEPENDENT ( DpdSSN , DpdLName , DpdFName , EmpID )
PK: EmpID + DpdSSN FK: EmpID
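A minimal sketch of Step 2 in Python with SQLite, assuming the EMPLOYEE and DEPENDENT relations shown on this slide. The sample rows and the helper name try_insert are made up for illustration; they are not part of the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled

conn.executescript("""
CREATE TABLE EMPLOYEE (
    EmpID    INTEGER PRIMARY KEY,
    EmpLName TEXT,
    EmpFName TEXT,
    Salary   REAL
);
CREATE TABLE DEPENDENT (
    DpdSSN   TEXT    NOT NULL,              -- partial identifier of W
    DpdLName TEXT,
    DpdFName TEXT,
    EmpID    INTEGER NOT NULL REFERENCES EMPLOYEE(EmpID),
    PRIMARY KEY (EmpID, DpdSSN)             -- identifier of E + partial identifier of W
);
""")

def try_insert(sql):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# Illustrative rows only.
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Doe', 'John', 50000)")
first_ok = try_insert("INSERT INTO DEPENDENT VALUES ('111-11-1111', 'Doe', 'Amy', 1)")
# A dependent cannot exist without its identifying employee.
orphan_ok = try_insert("INSERT INTO DEPENDENT VALUES ('222-22-2222', 'Roe', 'Max', 99)")
# The same (EmpID, DpdSSN) pair cannot be stored twice.
duplicate_ok = try_insert("INSERT INTO DEPENDENT VALUES ('111-11-1111', 'Doe', 'Amy', 1)")
```

The foreign key rejects the orphan row and the composite primary key rejects the duplicate, which is the behavior Step 2 is designed to give.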
ER Diagram Transformation
Step 3: When there is a 1:1 relationship between entity types S and T, create relations Rs and Rt as explained in Step 1. Include the primary key of Rs as a foreign key of Rt.
Entity S: EMPLOYEE; entity T: DEPARTMENT
Relationship: EMPLOYEE manages DEPARTMENT; DEPARTMENT is managed by EMPLOYEE
Rs: EMPLOYEE ( EmpID , EmpLName , EmpFName , Salary )
PK: EmpID
Rt: DEPARTMENT ( DeptID , DeptName , DeptPhone , EmpID )
PK: DeptID FK: EmpID
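A hedged sketch of the 1:1 "manages" example in Python with SQLite. The UNIQUE constraint on the foreign key is an assumption added here (the slide only asks for the foreign key); it is one common way to stop the same employee from managing two departments. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE EMPLOYEE (
    EmpID    INTEGER PRIMARY KEY,
    EmpLName TEXT,
    EmpFName TEXT,
    Salary   REAL
);
CREATE TABLE DEPARTMENT (
    DeptID    INTEGER PRIMARY KEY,
    DeptName  TEXT,
    DeptPhone TEXT,
    -- FK to the manager; UNIQUE (an assumption, not on the slide) keeps it 1:1
    EmpID     INTEGER UNIQUE REFERENCES EMPLOYEE(EmpID)
);
""")

# Illustrative rows only.
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Doe', 'John', 50000)")
conn.execute("INSERT INTO DEPARTMENT VALUES (10, 'Marketing', '555-0100', 1)")

try:
    # Employee 1 already manages department 10.
    conn.execute("INSERT INTO DEPARTMENT VALUES (20, 'Finance', '555-0200', 1)")
    second_dept_allowed = True
except sqlite3.IntegrityError:
    second_dept_allowed = False
```

Without UNIQUE, the same table definition would silently allow a 1:M relationship.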
ER Diagram Transformation
Step 4: When there is a 1:M relationship between entity types S and T, create relations Rs and Rt as explained in Step 1. If T is the entity type on the MANY side, include the primary key of Rs as a foreign key of Rt.
Entity S: DEPARTMENT; entity T: EMPLOYEE
Relationship: EMPLOYEE works for DEPARTMENT; DEPARTMENT hires EMPLOYEE
Rs: DEPARTMENT ( DeptID , DeptName , DeptPhone )
PK: DeptID
Rt: EMPLOYEE ( EmpID , EmpLName , EmpFName , Salary , DeptID )
PK: EmpID FK: DeptID
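The 1:M case of Step 4 can be sketched the same way. This is an illustration in Python with SQLite using the DEPARTMENT and EMPLOYEE relations from the slide; the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE DEPARTMENT (
    DeptID    INTEGER PRIMARY KEY,
    DeptName  TEXT,
    DeptPhone TEXT
);
CREATE TABLE EMPLOYEE (
    EmpID    INTEGER PRIMARY KEY,
    EmpLName TEXT,
    EmpFName TEXT,
    Salary   REAL,
    DeptID   INTEGER REFERENCES DEPARTMENT(DeptID)  -- FK lives on the MANY side
);
""")

# Illustrative rows only.
conn.executemany("INSERT INTO DEPARTMENT VALUES (?, ?, ?)",
                 [(10, "Marketing", "555-0100"), (20, "Finance", "555-0200")])
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?)",
                 [(1, "Doe", "John", 50000, 10), (2, "Roe", "Jim", 52000, 10)])

# One department row can be referenced by many employee rows.
headcount = conn.execute("""
    SELECT d.DeptName, COUNT(e.EmpID)
    FROM DEPARTMENT d LEFT JOIN EMPLOYEE e ON e.DeptID = d.DeptID
    GROUP BY d.DeptID
    ORDER BY d.DeptID
""").fetchall()
```

Placing the foreign key on the MANY side keeps every row flat: each employee stores exactly one DeptID, however many employees a department has.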
ER Diagram Transformation
Step 5: When there is an M:N relationship P between entity types S and T and there is no property associated with this relationship P, create relations Rs and Rt as explained in Step 1. Also create a relation Rp to represent the relationship and include the primary keys of Rs and Rt as foreign keys of Rp. The combination of the primary keys of Rs and Rt becomes the primary key of Rp.
Entity S: WAREHOUSE; entity T: PRODUCT
Rs: WAREHOUSE ( WhID , WhName )
PK: WhID
Rp: ( WhID , ProdID )
PK: WhID + ProdID FK: WhID, ProdID
Rt: PRODUCT ( ProdID , ProdName , ProdPrice )
PK: ProdID
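A minimal sketch of Step 5 in Python with SQLite. The slide does not name the relation Rp, so the table name WAREHOUSE_PRODUCT is an assumption, and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE WAREHOUSE (WhID INTEGER PRIMARY KEY, WhName TEXT);
CREATE TABLE PRODUCT   (ProdID INTEGER PRIMARY KEY, ProdName TEXT, ProdPrice REAL);
-- Rp: one row per (warehouse, product) pair
CREATE TABLE WAREHOUSE_PRODUCT (
    WhID   INTEGER NOT NULL REFERENCES WAREHOUSE(WhID),
    ProdID INTEGER NOT NULL REFERENCES PRODUCT(ProdID),
    PRIMARY KEY (WhID, ProdID)
);
""")

# Illustrative rows only.
conn.executemany("INSERT INTO WAREHOUSE VALUES (?, ?)", [(1, "North"), (2, "South")])
conn.executemany("INSERT INTO PRODUCT VALUES (?, ?, ?)",
                 [(100, "Bolt", 0.25), (200, "Nut", 0.10)])
conn.executemany("INSERT INTO WAREHOUSE_PRODUCT VALUES (?, ?)",
                 [(1, 100), (1, 200), (2, 100)])

# M side: one warehouse stocks many products ...
products_in_wh1 = [name for (name,) in conn.execute("""
    SELECT p.ProdName
    FROM WAREHOUSE_PRODUCT wp JOIN PRODUCT p ON p.ProdID = wp.ProdID
    WHERE wp.WhID = 1 ORDER BY p.ProdID""")]

# ... N side: one product is stocked in many warehouses.
warehouses_with_bolt = [name for (name,) in conn.execute("""
    SELECT w.WhName
    FROM WAREHOUSE_PRODUCT wp JOIN WAREHOUSE w ON w.WhID = wp.WhID
    WHERE wp.ProdID = 100 ORDER BY w.WhID""")]
```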
ER Diagram Transformation
Step 6: When there is an M:N relationship P between entity types S and T and there are some properties associated with this relationship P, create relations Rs, Rt, and Rp as explained in Step 5. All properties associated with the relationship P become the non-key attributes of Rp.
Entity S: STUDENT (SSN, Name, Phone, ZIP); entity T: SCHOOL (SchID, Name, Type, ZIP)
Relationship P: Attends, with properties Date and Degree
STUDENT ( SSN , Name , Phone , Zip )
SCHOOL ( SchID , Name , Type , Zip )
STU_SCH ( SSN , SchID , Date , Degree )
Complex ERD Transformation
Example of (M:N) Relationship
Entities: CUSTOMER (CID, Name, Phone, ZIP, ...) and PRODUCT (ProdID, Desc., U_Price, Qty, ...)
Relationship: Order, with properties Date and Units
ORDER ( CID , ProdID , Date , Units )
The same customer may order the same product many times
Q: Are you happy with the above design?
Complex ERD Transformation
Example of (M:N) Relationship
The same customer may order the same product many times, so the combination of CID and ProdID cannot uniquely identify an order
Create a unique primary key: OrderID
ORDER ( OrderID , CID , ProdID , Date , Units )
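A small sketch of why OrderID is needed, written in Python with SQLite. The table names ORDER_V1 and ORDER_V2 are hypothetical (ORDER is a reserved word in SQL), and the two sample orders are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- First design: PK is the combination of the two entity keys
CREATE TABLE ORDER_V1 (
    CID INTEGER, ProdID INTEGER, Date TEXT, Units INTEGER,
    PRIMARY KEY (CID, ProdID)
);
-- Second design: OrderID is the PK
CREATE TABLE ORDER_V2 (
    OrderID INTEGER PRIMARY KEY,   -- assigned automatically by SQLite when omitted
    CID INTEGER, ProdID INTEGER, Date TEXT, Units INTEGER
);
""")

# The same customer orders the same product twice (illustrative data).
repeat_orders = [(1, 100, "2016-01-10", 5), (1, 100, "2016-01-15", 3)]

def accepted(sql):
    """Count how many of the repeat orders the table accepts."""
    count = 0
    for row in repeat_orders:
        try:
            conn.execute(sql, row)
            count += 1
        except sqlite3.IntegrityError:
            pass  # rejected by the primary key
    return count

v1_accepted = accepted("INSERT INTO ORDER_V1 VALUES (?, ?, ?, ?)")
v2_accepted = accepted(
    "INSERT INTO ORDER_V2 (CID, ProdID, Date, Units) VALUES (?, ?, ?, ?)")
```

The first design loses the second order; the second design stores both.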
Why DB Normalization?
To avoid database processing errors (anomalies)
To verify the relations derived from the ER diagram: each derived relation should be in at least 3rd normal form (3NF)
Database Anomalies
Anomalies: data errors that occur during or after the processing of data
Three types of anomalies
Insertion anomaly: difficulty in adding new data due to the poor design of a relation
Deletion anomaly: unintentional data loss caused by the deletion of other data
Update anomaly: data become inconsistent after some data are updated
Insertion Anomaly
EMPLOYEE-PROJECT
EmpID  Dept  Name  Project  Budget  Location
1      Mkt   John  A        100000  MI
1      Mkt   John  B        200000  IN
2      Eng   Jim   A        100000  MI
3      Acct  Joe   C        350000  IL
Insert new employees:
EmpID  Dept  Name  Project  Budget  Location
1      Mkt   John  A        100000  MI
1      Mkt   John  B        200000  IN
2      Eng   Jim   A        100000  MI
3      Acct  Joe   C        350000  IL
4      Fin   Jack  A        100000  MI
4      Fin   Jack  C        350000  IL
5      HR    June  null !!  null    null
Insertion Anomaly
EMPLOYEE-PROJECT
EmpID    Dept  Name  Project  Budget  Location
1        Mkt   John  A        100000  MI
1        Mkt   John  B        200000  IN
2        Eng   Jim   A        100000  MI
3        Acct  Joe   C        350000  IL
Insert new projects:
EmpID    Dept  Name  Project  Budget  Location
1        Mkt   John  A        100000  MI
1        Mkt   John  B        200000  IN
2        Eng   Jim   A        100000  MI
3        Acct  Joe   C        350000  IL
1        Mkt   John  D        250000  MN
2        Eng   Jim   D        250000  MN
null !!  null  null  E        400000  WI
Deletion Anomaly
EMPLOYEE-PROJECT
EmpID  Dept  Name  Project  Budget  Location
1      Mkt   John  A        100000  MI
1      Mkt   John  B        200000  IN
2      Eng   Jim   A        100000  MI
3      Acct  Joe   C        350000  IL
Delete employee #1 (all information about project B is lost with it):
EmpID  Dept  Name  Project  Budget  Location
2      Eng   Jim   A        100000  MI
3      Acct  Joe   C        350000  IL
Delete project A (all information about employee #2 is lost with it):
EmpID  Dept  Name  Project  Budget  Location
1      Mkt   John  B        200000  IN
3      Acct  Joe   C        350000  IL
Update Anomaly
EMPLOYEE-PROJECT
EmpID  Dept  Name  Project  Budget  Location
1      Mkt   John  A        100000  MI
1      Mkt   John  B        200000  IN
2      Eng   Jim   A        100000  MI
3      Acct  Joe   C        350000  IL
Update employee #1 (the Dept value must be changed in every row for that employee):
EmpID  Dept  Name  Project  Budget  Location
1      Fin   John  A        100000  MI
1      Fin   John  B        200000  IN
2      Eng   Jim   A        100000  MI
3      Acct  Joe   C        350000  IL
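The deletion and update anomalies can be reproduced with the four EMPLOYEE-PROJECT rows from these slides. This is an illustrative Python sketch; the helper names depts_of and projects_in are made up.

```python
# EMPLOYEE-PROJECT rows: (EmpID, Dept, Name, Project, Budget, Location)
rows = [
    (1, "Mkt",  "John", "A", 100000, "MI"),
    (1, "Mkt",  "John", "B", 200000, "IN"),
    (2, "Eng",  "Jim",  "A", 100000, "MI"),
    (3, "Acct", "Joe",  "C", 350000, "IL"),
]

def depts_of(emp_id, table):
    """All Dept values recorded for one employee."""
    return {dept for (eid, dept, *_rest) in table if eid == emp_id}

def projects_in(table):
    """All projects the table still knows about."""
    return {row[3] for row in table}

# Deletion anomaly: removing employee #1 also removes the only row for project B.
after_delete = [row for row in rows if row[0] != 1]
lost_projects = projects_in(rows) - projects_in(after_delete)

# Update anomaly: changing Dept in only one of employee #1's rows
# leaves two different departments on record for the same employee.
partial_update = [(1, "Fin") + rows[0][2:]] + rows[1:]
depts_after_partial_update = depts_of(1, partial_update)
```

Both problems come from storing employee facts and project facts in the same relation.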
Normalization Theory
Basic Concept - Functional Dependency
Functional Dependency (FD): a relationship between attributes of an entity. FD is the foundation of normalization.
Notation: a → b (the value of a uniquely determines the value of b)
a is a “determinant”; a functionally determines b; b is functionally dependent on a
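A functional dependency can be tested against sample data: it holds if no two rows agree on the determinant but disagree on the dependent attributes. The sketch below is an illustration in Python using the EMPLOYEE-PROJECT rows from the anomaly slides; the function name fd_holds is made up. Note that sample data can only disprove an FD, since whether it holds in general is a business rule.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key is remembered; a later mismatch breaks the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True

columns = ["EmpID", "Dept", "Name", "Project", "Budget", "Location"]
data = [
    (1, "Mkt",  "John", "A", 100000, "MI"),
    (1, "Mkt",  "John", "B", 200000, "IN"),
    (2, "Eng",  "Jim",  "A", 100000, "MI"),
    (3, "Acct", "Joe",  "C", 350000, "IL"),
]
rows = [dict(zip(columns, r)) for r in data]

emp_determines_name = fd_holds(rows, ["EmpID"], ["Name"])
emp_determines_project = fd_holds(rows, ["EmpID"], ["Project"])
project_determines_budget = fd_holds(rows, ["Project"], ["Budget", "Location"])
```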
Normalization Theory
Functional Dependency - Examples
SSN → Name , Phone , DOB
EmpID → Name , Phone , DOB
Interpretation: either SSN or EmpID can uniquely determine an employee's Name, Phone, and DOB, but not the reverse!
Both SSN and EmpID are candidate keys. You can choose one of them as the PK:
EMPLOYEE ( SSN , EmpID , Name , Phone , DOB ) with SSN as the PK, or
EMPLOYEE ( SSN , EmpID , Name , Phone , DOB ) with EmpID as the PK
Normalization Theory
Functional Dependency - Examples
VIN → # of doors , Color , Type
Interpretation: VIN can uniquely determine a vehicle’s # of doors, Color, and Type, but not the reverse!
VEHICLE ( VIN , # of doors , Color , Type )
VIN is the only candidate key and it is used as the PK
Database Normalization
Where is the beef?
Reality is not that simple!
All candidate keys, including the PK, are determinants
But a determinant may not be a candidate key
Q: What is the difference between a candidate key and a determinant?
A: They are similar, but not the same. The difference is scope: a candidate key determines every attribute of the relation, while a determinant may determine only some of them
Database Normalization
Where is the beef?
Call # → CourseID , Title , Classroom
CourseID → Title
• Call # is a candidate key and is used as the PK
• Call # is also a determinant of CourseID, Title, and Classroom
• CourseID is a determinant of Title
Q: Should we put these four attributes in the same table (relation)?
A: No !! We need database normalization
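One way to see the difference between a determinant and a candidate key is to compute the attribute closure: everything a set of attributes determines, directly or indirectly. The Python sketch below is an illustration only; the attribute name CallNo stands in for "Call #", and the function name closure is made up.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know the whole left side, we also know the right side.
            if set(lhs).issubset(result) and not set(rhs).issubset(result):
                result |= set(rhs)
                changed = True
    return result

ALL_ATTRS = {"CallNo", "CourseID", "Title", "Classroom"}
FDS = [
    (("CallNo",), ("CourseID", "Title", "Classroom")),
    (("CourseID",), ("Title",)),
]

call_no_closure = closure({"CallNo"}, FDS)
course_id_closure = closure({"CourseID"}, FDS)

# A candidate key reaches every attribute; a plain determinant does not.
call_no_is_key = call_no_closure == ALL_ATTRS
course_id_is_key = course_id_closure == ALL_ATTRS
```

CourseID is a determinant (it determines Title) but its closure does not cover the whole relation, so it is not a candidate key.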
Database Normalization
Basic Ideas
Unnormalized
1st NF (1NF)
2nd NF (2NF)
3rd NF (3NF)
Database Normalization
Normal Form Definitions
• A relation is in its first normal form (1NF) if it does not contain repeating groups.
• A relation is in its second normal form (2NF) if it is in 1NF and every non-primary-key attribute is fully functionally dependent on the (whole) primary key.
• A relation is in its third normal form (3NF) if it is in 2NF and it has no transitive dependency between non-key attributes.
First Normal Form (1NF)
1NF: A relation is in its first normal form (1NF) if it does not contain repeating groups
Unnormalized: SID , Name , Sex , DOB , Phone , Phone , Major , Major , Major
Repeating groups: Phone , Major
Normalization
STUDENT ( SID , Name , Sex , DOB )
STU_PHONE ( SID , Phone )
STU_MAJOR ( SID , Major )
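The 1NF step can be sketched in Python: each repeating group moves to its own relation, keyed by SID plus the repeated value. The student records below are invented for illustration.

```python
# Unnormalized records: Phone and Major are repeating groups (illustrative data).
unnormalized = [
    {"SID": 1, "Name": "Ann", "Sex": "F", "DOB": "1995-03-02",
     "Phone": ["555-0101", "555-0102"], "Major": ["BIS", "ACC", "MKT"]},
    {"SID": 2, "Name": "Bob", "Sex": "M", "DOB": "1996-07-15",
     "Phone": ["555-0103"], "Major": ["BIS"]},
]

# STUDENT ( SID , Name , Sex , DOB ): one row per student, single-valued attributes only
student = [(s["SID"], s["Name"], s["Sex"], s["DOB"]) for s in unnormalized]

# STU_PHONE ( SID , Phone ): one row per phone number
stu_phone = [(s["SID"], phone) for s in unnormalized for phone in s["Phone"]]

# STU_MAJOR ( SID , Major ): one row per major
stu_major = [(s["SID"], major) for s in unnormalized for major in s["Major"]]
```

Every cell in the three resulting relations now holds a single value, and a student can have any number of phones or majors without changing the table layout.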
Second Normal Form (2NF)
2NF: A relation is in its second normal form (2NF) if it is in 1NF and every non-primary-key attribute is fully functionally dependent on the (whole) primary key
JOB_HIST ( EmpID , Name , Position , SDate , Dept )
EmpID  Name  Position       SDate     Dept
123    Jim   Line crew      01/01/96  Factory
123    Jim   Supervisor     01/01/99  Factory
211    John  Sales Rep      09/01/94  MKT
211    John  Sales Manager  01/01/98  MKT
235    Joe   Accountant     07/01/96  Acct
A combination of EmpID and SDate is the only candidate key and is used as the PK
(EmpID, SDate) → Name , Position , Dept
EmpID → Name , Dept
Q: Do we see any partial functional dependency?
Second Normal Form (2NF)
JOB_HIST ( EmpID , Name , Position , SDate , Dept )
Normalization
JOB_HIST ( EmpID , SDate , Position )
EMPLOYEE ( EmpID , Name , Dept )
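The decomposition can be checked on the five JOB_HIST rows from the previous slide. This Python sketch projects the original relation onto the two new ones and then joins them back on EmpID; the variable names are made up for illustration.

```python
# JOB_HIST rows from the slide: (EmpID, Name, Position, SDate, Dept)
job_hist = [
    (123, "Jim",  "Line crew",     "01/01/96", "Factory"),
    (123, "Jim",  "Supervisor",    "01/01/99", "Factory"),
    (211, "John", "Sales Rep",     "09/01/94", "MKT"),
    (211, "John", "Sales Manager", "01/01/98", "MKT"),
    (235, "Joe",  "Accountant",    "07/01/96", "Acct"),
]

# JOB_HIST ( EmpID , SDate , Position ): attributes that need the whole key
job_hist_2nf = sorted({(emp, sdate, pos) for (emp, name, pos, sdate, dept) in job_hist})

# EMPLOYEE ( EmpID , Name , Dept ): attributes that depend on EmpID alone
employee = sorted({(emp, name, dept) for (emp, name, pos, sdate, dept) in job_hist})

# Join the two relations back together on EmpID.
by_emp = {emp: (name, dept) for (emp, name, dept) in employee}
rejoined = {
    (emp, by_emp[emp][0], pos, sdate, by_emp[emp][1])
    for (emp, sdate, pos) in job_hist_2nf
}
```

Name and Dept are now stored once per employee instead of once per job, and the join reproduces the original rows, so no information is lost.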
Comments on 2NF Verification
You don’t need to worry about whether a relation is in its 2NF if its PK includes only one attribute (Why?)
Because partial functional dependency only occurs when the PK is a composite (compound) key
Third Normal Form (3NF)
3NF: A relation is in its third normal form (3NF) if it is in 2NF and there is no transitive dependency between non-key attributes in the relation
Transitive dependency: If a → b and b → c, then there is a transitive dependency between a and c
EMPLOYEE ( EmpID , Name , Phone , Office , Street , City , State , Zip )
EmpID → Name , Phone , Office , Street , City , State , Zip
Phone → Office
Zip → City , State
Q: Do you see the transitive dependency?
Third Normal Form (3NF)
EMPLOYEE ( EmpID , Name , Phone , Office , Street , City , State , Zip )
Normalization
EMPLOYEE ( EmpID , Name , Phone , Street , Zip )
PHONE ( Phone , Office )
ZIP ( Zip , City , State )
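A hedged sketch of the 3NF result in Python with SQLite. The three tables follow the relations on this slide; the foreign keys from EMPLOYEE to PHONE and ZIP are an assumption added here, and all rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE PHONE (Phone TEXT PRIMARY KEY, Office TEXT);
CREATE TABLE ZIP   (Zip TEXT PRIMARY KEY, City TEXT, State TEXT);
CREATE TABLE EMPLOYEE (
    EmpID  INTEGER PRIMARY KEY,
    Name   TEXT,
    Phone  TEXT REFERENCES PHONE(Phone),   -- assumed FK
    Street TEXT,
    Zip    TEXT REFERENCES ZIP(Zip)        -- assumed FK
);
""")

# Illustrative rows only: two employees share one ZIP code.
conn.execute("INSERT INTO ZIP VALUES ('48859', 'Mt. Pleasant', 'MI')")
conn.executemany("INSERT INTO PHONE VALUES (?, ?)",
                 [("555-0101", "Room 101"), ("555-0102", "Room 102")])
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?)",
                 [(1, "John", "555-0101", "1 Main St", "48859"),
                  (2, "Jim",  "555-0102", "2 Oak St",  "48859")])

def employee_view():
    """Rebuild the original wide EMPLOYEE rows by joining the three relations."""
    return conn.execute("""
        SELECT e.EmpID, e.Name, e.Phone, p.Office, e.Street, z.City, z.State, e.Zip
        FROM EMPLOYEE e
        JOIN PHONE p ON p.Phone = e.Phone
        JOIN ZIP   z ON z.Zip   = e.Zip
        ORDER BY e.EmpID""").fetchall()

# The city for a ZIP code is stored once, so one UPDATE fixes it for everyone.
conn.execute("UPDATE ZIP SET City = 'Mount Pleasant' WHERE Zip = '48859'")
cities = {row[5] for row in employee_view()}
```

Because City and State depend on Zip rather than on EmpID, keeping them in ZIP removes the update anomaly that the wide EMPLOYEE relation would have.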