Data Analysis
Improving Database Design
Normalization
The process of transforming a data model into a flexible, stable structure.
Reduces anomalies.
Anomaly: an unintended negative consequence of changing the contents of the data.
Anomalies
SID  Activity  Fee
100  Swimming  50
150  Skiing    200
200  Dance     50
250  Swimming  50
300  Skiing    200
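One anomaly this table is open to can be sketched in a few lines of Python. This is only an illustration under assumptions: the variable and function names are invented for the example, and the scenario (student 200 leaving) is hypothetical. Because student 200 is the only one enrolled in Dance, deleting that row also deletes the fact that Dance costs 50.

```python
# Rows of the single-table design from the slide: (SID, Activity, Fee).
rows = [
    (100, "Swimming", 50),
    (150, "Skiing", 200),
    (200, "Dance", 50),
    (250, "Swimming", 50),
    (300, "Skiing", 200),
]

def known_fees(table):
    """Return the activity -> fee facts that the table currently records."""
    return {activity: fee for _sid, activity, fee in table}

fees_before = known_fees(rows)

# Deletion anomaly: student 200 leaves, and the Dance fee disappears too.
after_delete = [r for r in rows if r[0] != 200]
fees_after = known_fees(after_delete)

lost = set(fees_before) - set(fees_after)
print("Fee facts lost by deleting student 200:", lost)
```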
Functional Dependency
A relationship between attributes such that if the value of one attribute is known, the value of another attribute can be determined.
In a database including social security numbers and names, given the value of SSN, the value of Name can be determined.
SSN → Name
SSN functionally determines Name.
Name is functionally dependent on SSN.
SSN is the determinant of Name.
Functional Dependency (cont.)
May exist among groups of attributes.
SID  Class           Grade
100  Accounting 101  A
150  Finance 101     B
100  Marketing 101   B
200  Accounting 101  B
150  Marketing 101   A
Functional Dependency: (SID, Class) → Grade
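A functional dependency can be tested against sample rows. The sketch below assumes the rows are held as Python dictionaries; the helper name fd_holds is made up for this example. It checks that each determinant value always maps to the same dependent value.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the attributes in lhs functionally determine rhs in rows.

    rows is a list of dicts; lhs and rhs are tuples of column names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # The same determinant value must always map to the same dependent value.
        if seen.setdefault(key, value) != value:
            return False
    return True

grades = [
    {"SID": 100, "Class": "Accounting 101", "Grade": "A"},
    {"SID": 150, "Class": "Finance 101", "Grade": "B"},
    {"SID": 100, "Class": "Marketing 101", "Grade": "B"},
    {"SID": 200, "Class": "Accounting 101", "Grade": "B"},
    {"SID": 150, "Class": "Marketing 101", "Grade": "A"},
]

print(fd_holds(grades, ("SID", "Class"), ("Grade",)))  # True
print(fd_holds(grades, ("SID",), ("Grade",)))          # False
print(fd_holds(grades, ("Class",), ("Grade",)))        # False
```

Neither SID nor Class alone determines Grade in this data; only the pair does. A check like this can refute a dependency on sample data, but it cannot prove that the dependency holds for all future data.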
Keys
A key is an attribute or group of attributes that uniquely identifies a row in a table.
If a key is a group of attributes, it is called a composite key.
A key functionally determines the entire row.
Often called the primary key.
Uniqueness
Keys must be unique in a table.
Determinants may or may not be unique in a table.
SID  Dorm         Rent
100  Jones Hall   200
150  Trevor Hall  300
200  Smith Hall   500
250  Jones Hall   200
300  Smith Hall   500
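The difference between a key and a determinant can be checked on the table above. This is a rough sketch with invented variable names: SID is unique, while Dorm repeats and still determines Rent.

```python
from collections import Counter

# (SID, Dorm, Rent) rows from the slide.
housing = [
    (100, "Jones Hall", 200),
    (150, "Trevor Hall", 300),
    (200, "Smith Hall", 500),
    (250, "Jones Hall", 200),
    (300, "Smith Hall", 500),
]

sid_counts = Counter(sid for sid, _dorm, _rent in housing)
dorm_counts = Counter(dorm for _sid, dorm, _rent in housing)

sid_is_unique = all(n == 1 for n in sid_counts.values())
dorm_is_unique = all(n == 1 for n in dorm_counts.values())

# Dorm still determines Rent: each dorm maps to exactly one rent value.
rents_per_dorm = {}
for _sid, dorm, rent in housing:
    rents_per_dorm.setdefault(dorm, set()).add(rent)
dorm_determines_rent = all(len(r) == 1 for r in rents_per_dorm.values())

print(sid_is_unique, dorm_is_unique, dorm_determines_rent)  # True False True
```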
Normal Forms
Classifications of tables based on the types of anomalies to which they are vulnerable.
There are currently 7 normal forms (1NF, 2NF, 3NF, BCNF, 4NF, 5NF, DKNF).
Each normal form eliminates a particular type of anomaly.
Normal forms are cumulative: a table in a given normal form also satisfies the forms below it.
First Normal Form (1NF)
A table is in 1NF if:
Every cell contains a single value (no repeating groups or arrays)
Each column has a unique name
All values in a column are of the same kind
The order of the columns is insignificant
Every row is unique
The order of the rows is insignificant
1NF Example
EMPLOYEE(Emp_ID, Emp_FName, Emp_LName, Emp_Phone, Emp_DepName)
Employee has multiple phone numbers.
Employee has multiple dependents.
1NF Example Solution
EMPLOYEE(Emp_ID, Emp_FName, Emp_LName, Emp_OfficePhone, Emp_HomePhone, Emp_CellPhone)
DEPENDENT(Emp_ID, Emp_DepName)
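The split into EMPLOYEE and DEPENDENT might be sketched as follows. The slides give no sample rows, so the record below (names, phone numbers, dependents) is entirely made up for illustration; only the column names come from the slide.

```python
# Hypothetical, made-up record that violates 1NF: two cells hold multiple values.
unnormalized = [
    {
        "Emp_ID": 1,
        "Emp_FName": "Ann",
        "Emp_LName": "Lee",
        "Emp_Phone": {"office": "555-0100", "home": "555-0101", "cell": "555-0102"},
        "Emp_DepName": ["Sam", "Kim"],
    },
]

employee = []   # EMPLOYEE(Emp_ID, Emp_FName, Emp_LName, Emp_OfficePhone, Emp_HomePhone, Emp_CellPhone)
dependent = []  # DEPENDENT(Emp_ID, Emp_DepName)

for rec in unnormalized:
    phones = rec["Emp_Phone"]
    employee.append((
        rec["Emp_ID"], rec["Emp_FName"], rec["Emp_LName"],
        phones.get("office"), phones.get("home"), phones.get("cell"),
    ))
    # One row per dependent, so every cell holds a single value.
    for name in rec["Emp_DepName"]:
        dependent.append((rec["Emp_ID"], name))

print(employee)
print(dependent)
```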
Second Normal Form (2NF)
A table is in 2NF if it is in 1NF and it has no partial dependencies.
2NF is only a concern if a table has a composite key.
A partial dependency is when a non-key attribute is functionally dependent on only part of a composite key.
2NF Example
SID  Activity  Fee  Expertise
100  Swimming  50   Beginner
150  Skiing    200  Advanced
200  Dance     50   Beginner
150  Swimming  50   Intermediate
200  Skiing    200  Intermediate
Recreation(SID, Activity, Fee, Expertise)
Key: (SID, Activity)
Functional Dependency: Activity → Fee
2NF Example Solution
SID  Activity  Expertise
100  Swimming  Beginner
150  Skiing    Advanced
200  Dance     Beginner
150  Swimming  Intermediate
200  Skiing    Intermediate

Activity  Fee
Dance     50
Skiing    200
Swimming  50
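The decomposition above can be reproduced and checked with a short Python sketch. The variable names are assumptions made for the example. It projects the partial dependency Activity → Fee into its own table and confirms that joining the two tables on Activity gives back the original rows.

```python
# Recreation(SID, Activity, Fee, Expertise) rows from the slide.
recreation = [
    (100, "Swimming", 50, "Beginner"),
    (150, "Skiing", 200, "Advanced"),
    (200, "Dance", 50, "Beginner"),
    (150, "Swimming", 50, "Intermediate"),
    (200, "Skiing", 200, "Intermediate"),
]

# Project out the partial dependency Activity -> Fee into its own table.
activity_fee = sorted({(activity, fee) for _sid, activity, fee, _exp in recreation})
# What remains depends on the whole key (SID, Activity).
participation = [(sid, activity, exp) for sid, activity, _fee, exp in recreation]

# Joining the two tables on Activity reproduces the original rows.
fee_of = dict(activity_fee)
rejoined = [(sid, activity, fee_of[activity], exp) for sid, activity, exp in participation]

print(activity_fee)
print(rejoined == recreation)
```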
Third Normal Form (3NF)
A table is in 3NF if it is in 2NF and has no transitive dependencies.
A transitive dependency is when one non-key attribute determines another non-key attribute.
3NF Example
SID  Dorm         Rent
100  Jones Hall   200
150  Trevor Hall  300
200  Smith Hall   500
250  Jones Hall   200
300  Smith Hall   500
Housing(SID, Dorm, Rent)
Key: SID
Functional Dependency: Dorm → Rent
3NF Example Solution
SID  Dorm
100  Jones Hall
150  Trevor Hall
200  Smith Hall
250  Jones Hall
300  Smith Hall

Dorm         Rent
Jones Hall   200
Smith Hall   500
Trevor Hall  300
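One way to see the benefit of the split is to load both tables into an in-memory SQLite database through Python's standard sqlite3 module. This is a sketch under assumptions: the table names student and dorm and the new rent value 550 are invented for the example. A rent change touches one row, and the original three-column view is still available through a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dorm (dorm TEXT PRIMARY KEY, rent INTEGER NOT NULL);
    CREATE TABLE student (sid INTEGER PRIMARY KEY,
                          dorm TEXT NOT NULL REFERENCES dorm(dorm));
""")
conn.executemany("INSERT INTO dorm VALUES (?, ?)",
                 [("Jones Hall", 200), ("Smith Hall", 500), ("Trevor Hall", 300)])
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(100, "Jones Hall"), (150, "Trevor Hall"), (200, "Smith Hall"),
                  (250, "Jones Hall"), (300, "Smith Hall")])

# A rent change now touches exactly one row instead of one row per resident.
cur = conn.execute("UPDATE dorm SET rent = 550 WHERE dorm = 'Smith Hall'")
print("rows updated:", cur.rowcount)

# The original three-column view is still available through a join.
housing = conn.execute("""
    SELECT s.sid, s.dorm, d.rent
    FROM student AS s JOIN dorm AS d ON d.dorm = s.dorm
    ORDER BY s.sid
""").fetchall()
print(housing)
```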
Boyce-Codd Normal Form (BCNF)
Special form of 3NF.
A table is in BCNF if it is in 3NF and every determinant is a candidate key.
Arises when a non-key attribute determines part of a composite key.
BCNF Example
SID Major Advisor
101 Accounting Smith
101 CIS Jones
102 Management Johnson
103 CIS Lewis
103 Marketing Thomas
104 CIS Jones
A student can have many majors.
A student has a different advisor for each major.
Each advisor advises for only one major.
Advising(SID, Major, Advisor)
Candidate Keys: (SID, Major) or (SID, Advisor)
Functional Dependency: Advisor → Major
BCNF Solution
SID Advisor
101 Smith
101 Jones
102 Johnson
103 Lewis
103 Thomas
104 Jones
Advisor Major
Smith Accounting
Jones CIS
Johnson Management
Lewis CIS
Thomas Marketing
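The BCNF violation and its repair can be checked against the sample rows. This is a sketch under assumptions: the helper names determines and is_unique are invented here, and uniqueness on sample data is used only as a rough stand-in for a candidate-key test. In the original table Advisor determines Major but is not unique; after the split, Advisor is unique in the (Advisor, Major) table.

```python
# Advising(SID, Major, Advisor) rows from the slide.
advising = [
    (101, "Accounting", "Smith"),
    (101, "CIS", "Jones"),
    (102, "Management", "Johnson"),
    (103, "CIS", "Lewis"),
    (103, "Marketing", "Thomas"),
    (104, "CIS", "Jones"),
]

def determines(rows, lhs, rhs):
    """True if column indexes lhs functionally determine column indexes rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        val = tuple(row[i] for i in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def is_unique(rows, cols):
    """True if no two rows share the same values in cols."""
    keys = [tuple(row[i] for i in cols) for row in rows]
    return len(keys) == len(set(keys))

SID, MAJOR, ADVISOR = 0, 1, 2

advisor_determines_major = determines(advising, [ADVISOR], [MAJOR])
advisor_is_key = is_unique(advising, [ADVISOR])
print(advisor_determines_major, advisor_is_key)  # True False: a determinant that is not a key

# After the split, Advisor is unique in the (Advisor, Major) table.
advisor_major = sorted({(a, m) for _s, m, a in advising})
student_advisor = [(s, a) for s, _m, a in advising]
print(is_unique(advisor_major, [0]))  # True
```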
Fourth Normal Form (4NF)
A table is in 4NF if it is in BCNF and has no multivalued dependencies (MVDs).
A multivalued dependency exists when one attribute determines multiple values for two or more other attributes that are independent of each other.
4NF Example
SID Major Activity
101 Accounting Swimming
101 CIS Swimming
102 Management Skiing
102 Management Dance
103 CIS Hiking
103 Marketing Hiking
104 CIS Skiing
4NF Solution
SID Activity
101 Swimming
102 Dance
102 Skiing
103 Hiking
104 Skiing
SID Major
101 Accounting
101 CIS
102 Management
103 CIS
103 Marketing
104 CIS
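The independence of majors and activities can be checked by joining the two projections back together. The variable names in this sketch are assumptions for the example. If every student's majors pair with every one of that student's activities, the join on SID returns exactly the original rows.

```python
from itertools import product

# (SID, Major, Activity) rows from the 4NF example.
rows = [
    (101, "Accounting", "Swimming"),
    (101, "CIS", "Swimming"),
    (102, "Management", "Skiing"),
    (102, "Management", "Dance"),
    (103, "CIS", "Hiking"),
    (103, "Marketing", "Hiking"),
    (104, "CIS", "Skiing"),
]

# The two projections from the 4NF solution.
sid_major = {(sid, major) for sid, major, _act in rows}
sid_activity = {(sid, act) for sid, _major, act in rows}

# Natural join of the projections on SID.
joined = {
    (sid, major, act)
    for (sid, major), (sid2, act) in product(sid_major, sid_activity)
    if sid == sid2
}

# If majors and activities are independent for each student, the join
# gives back exactly the original rows.
print(joined == set(rows))
```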
Fifth Normal Form (5NF)
A table is in 5NF if it is already in 4NF and cannot have any lossless decompositions.
The table cannot be represented by a set of smaller tables that can reconstruct the original table.
Also called Projection-Join Normal Form (PJNF).
Defines a point where a table cannot be decomposed further.
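5NF Example
EmpID  ProjectID  Skill
101    A          Program
101    B          Design
102    B          Program

Projection on (EmpID, ProjectID):
EmpID  ProjectID
101    A
101    B
102    B

Projection on (EmpID, Skill):
EmpID  Skill
101    Program
101    Design
102    Program

Joining the two projections on EmpID does not reproduce the original table. The rows 101 A Design and 101 B Program are spurious:
EmpID  ProjectID  Skill
101    A          Program
101    A          Design
101    B          Program
101    B          Design
102    B          Program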
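The lossy join of the two projections can be reproduced with Python sets. This is a minimal sketch with invented variable names; it joins the (EmpID, ProjectID) and (EmpID, Skill) projections on EmpID and lists the rows that were not in the original table.

```python
original = {
    (101, "A", "Program"),
    (101, "B", "Design"),
    (102, "B", "Program"),
}

emp_project = {(e, p) for e, p, _s in original}
emp_skill = {(e, s) for e, _p, s in original}

# Natural join of the two projections on EmpID.
joined = {
    (e, p, s)
    for e, p in emp_project
    for e2, s in emp_skill
    if e == e2
}

# Rows produced by the join that were never in the original table.
spurious = sorted(joined - original)
print(len(joined))
print(spurious)
```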
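5NF Example (cont.)
Adding a third projection on (ProjectID, Skill):
ProjectID  Skill
A          Program
B          Design
B          Program

Joining all three projections removes the spurious row 101 A Design, because (A, Design) is not in the third projection. With this sample data the row 101 B Program still survives, since (101, B), (101, Program), and (B, Program) each appear in a projection:
EmpID  ProjectID  Skill
101    A          Program
101    B          Program
101    B          Design
102    B          Program
The original table therefore cannot be rebuilt from these smaller tables.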
Domain Key Normal Form (DKNF)
Theoretical structure that is free of all anomalies.
“Every constraint on the database is a logical consequence of the definition of keys and domains.”
Denormalization
Tables may be denormalized to improve performance.
Normalization increases the number of tables and relationships.
Accessing multiple tables across relationships requires more processing than accessing a single table.
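The trade-off can be sketched with plain Python dictionaries standing in for tables. This is an illustration only, reusing the dorm data from the 3NF example; the function names are invented and dictionary lookups are a loose analogy for table access. The normalized form needs two lookups per read, while the denormalized form needs one but stores each rent redundantly.

```python
# Normalized: two tables linked by Dorm (data from the 3NF example).
student_dorm = {100: "Jones Hall", 150: "Trevor Hall", 200: "Smith Hall",
                250: "Jones Hall", 300: "Smith Hall"}
dorm_rent = {"Jones Hall": 200, "Smith Hall": 500, "Trevor Hall": 300}

def rent_normalized(sid):
    """Two lookups: student -> dorm, then dorm -> rent."""
    return dorm_rent[student_dorm[sid]]

# Denormalized: rent is copied onto every student row, so a read is one lookup.
housing = {sid: (dorm, dorm_rent[dorm]) for sid, dorm in student_dorm.items()}

def rent_denormalized(sid):
    """One lookup against the combined table."""
    return housing[sid][1]

# The price of the faster read: a rent change must rewrite every matching row.
rows_to_rewrite = sum(1 for dorm, _rent in housing.values() if dorm == "Smith Hall")
print(rent_normalized(300), rent_denormalized(300), rows_to_rewrite)
```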
Normalized Model
Evaluate the attributes of the tables to ensure compliance with normalization rules.
Create new tables as needed.
Place foreign keys for new tables.
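Placing a foreign key for a new table might look like the following sketch, which uses Python's standard sqlite3 module and the two tables from the 2NF example. The table names activity and participation and the 'Chess' row are assumptions made for the illustration. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE activity (
        activity TEXT PRIMARY KEY,
        fee      INTEGER NOT NULL
    );
    CREATE TABLE participation (
        sid       INTEGER NOT NULL,
        activity  TEXT NOT NULL REFERENCES activity(activity),
        expertise TEXT NOT NULL,
        PRIMARY KEY (sid, activity)
    );
""")
conn.execute("INSERT INTO activity VALUES ('Dance', 50)")
conn.execute("INSERT INTO participation VALUES (200, 'Dance', 'Beginner')")

try:
    # 'Chess' has no row in activity, so the foreign key rejects it.
    conn.execute("INSERT INTO participation VALUES (200, 'Chess', 'Beginner')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("orphan row rejected:", rejected)
```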