Data Analysis Improving Database Design. Normalization The process of transforming a data model into...

Data Analysis

Improving Database Design

Normalization

The process of transforming a data model into a flexible, stable structure.

Reduces anomalies.

Anomaly: An unintended negative consequence of changing the contents of the data.

Anomalies

SID  Activity  Fee
100  Swimming  50
150  Skiing    200
200  Dance     50
250  Swimming  50
300  Skiing    200
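A minimal Python sketch of the anomalies this table invites, using the rows above. The helper names (`known_fees`, `delete_student`) and the new fee of 250 are invented for this illustration and are not part of the original slides.

```python
# Rows of the single-table design: (SID, Activity, Fee).
rows = [
    (100, "Swimming", 50),
    (150, "Skiing", 200),
    (200, "Dance", 50),
    (250, "Swimming", 50),
    (300, "Skiing", 200),
]

def known_fees(table):
    """Return the activity -> fee facts that the table still records."""
    return {activity: fee for _, activity, fee in table}

def delete_student(table, sid):
    """Remove every row for one student (hypothetical helper)."""
    return [row for row in table if row[0] != sid]

# Deletion anomaly: student 200 is the only one taking Dance, so removing
# that student also erases the fact that Dance costs 50.
after_delete = delete_student(rows, 200)
print("Dance" in known_fees(rows))          # True
print("Dance" in known_fees(after_delete))  # False

# Update anomaly: the Skiing fee is stored in two rows, so changing only one
# of them (to an assumed new fee of 250) leaves two conflicting fees.
partial_update = [
    (sid, act, 250) if (sid, act) == (150, "Skiing") else (sid, act, fee)
    for sid, act, fee in rows
]
skiing_fees = {fee for _, act, fee in partial_update if act == "Skiing"}
print(sorted(skiing_fees))  # [200, 250]
```

An insertion anomaly follows the same pattern: a new activity and its fee cannot be recorded until some student signs up for it.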

Functional Dependency

A relationship between attributes such that if the value of one attribute is known, the value of another attribute can be determined.

In a database including social security numbers and names, given the value of SSN, the value of Name can be determined.

SSN → Name

SSN functionally determines Name.

Name is functionally dependent on SSN.

SSN is the determinant of Name.

Functional Dependency (cont.)

May exist among groups of attributes.

SID  Class           Grade
100  Accounting 101  A
150  Finance 101     B
100  Marketing 101   B
200  Accounting 101  B
150  Marketing 101   A

Functional Dependency: (SID, Class) → Grade
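One way to test a functional dependency against sample rows is sketched below. The function name `holds` is an assumption made for this example. Note that sample data can only refute a dependency: a dependency is a rule about the data, so passing the check on five rows does not prove it in general.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent is not violated by rows.

    rows is a list of dicts; determinant and dependent are tuples of
    column names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        value = tuple(row[c] for c in dependent)
        # The first value seen for a key is remembered; any later row with
        # the same key but a different value violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True

grades = [
    {"SID": 100, "Class": "Accounting 101", "Grade": "A"},
    {"SID": 150, "Class": "Finance 101", "Grade": "B"},
    {"SID": 100, "Class": "Marketing 101", "Grade": "B"},
    {"SID": 200, "Class": "Accounting 101", "Grade": "B"},
    {"SID": 150, "Class": "Marketing 101", "Grade": "A"},
]

print(holds(grades, ("SID", "Class"), ("Grade",)))  # True
print(holds(grades, ("SID",), ("Grade",)))          # False
print(holds(grades, ("Class",), ("Grade",)))        # False
```

Neither SID nor Class determines Grade on its own; only the pair does.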

Keys

A key is an attribute or group of attributes that uniquely identifies a row in a table.

If a key is a group of attributes, it is called a composite key.

A key functionally determines the entire row.

Often called the primary key.

Uniqueness

Keys must be unique in a table.

Determinants may or may not be unique in a table.

SID  Dorm         Rent
100  Jones Hall   200
150  Trevor Hall  300
200  Smith Hall   500
250  Jones Hall   200
300  Smith Hall   500
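The difference between a key and a determinant can be checked directly on the rows above. This is a sketch only; `is_unique` and `determines` are names chosen for the example.

```python
from collections import Counter

# (SID, Dorm, Rent) rows from the table above.
housing = [
    (100, "Jones Hall", 200),
    (150, "Trevor Hall", 300),
    (200, "Smith Hall", 500),
    (250, "Jones Hall", 200),
    (300, "Smith Hall", 500),
]

def is_unique(rows, index):
    """True if no value in column `index` appears more than once."""
    counts = Counter(row[index] for row in rows)
    return all(n == 1 for n in counts.values())

def determines(rows, src, dst):
    """True if each value in column `src` maps to one value in column `dst`."""
    mapping = {}
    for row in rows:
        if mapping.setdefault(row[src], row[dst]) != row[dst]:
            return False
    return True

print(is_unique(housing, 0))      # True: SID is unique, so it can be a key
print(determines(housing, 1, 2))  # True: Dorm determines Rent
print(is_unique(housing, 1))      # False: Dorm repeats, so it is not a key
```

Dorm is a determinant of Rent without being unique in the table, which is exactly the case the slide describes.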

Normal Forms

Classifications of tables based on the types of anomalies to which they are vulnerable.

There are currently 7 normal forms (1NF, 2NF, 3NF, etc.).

Each normal form eliminates a particular type of anomaly.

Normal forms are cumulative.

First Normal Form (1NF)

A table is in 1NF if:

Every cell contains a single value (no repeating groups or arrays).

Each column has a unique name.

All values in a column are of the same kind.

The order of the columns is insignificant.

Every row is unique.

The order of the rows is insignificant.

1NF Example

EMPLOYEE(Emp_ID, Emp_FName, Emp_LName, Emp_Phone, Emp_DepName)

Employee has multiple phone numbers.

Employee has multiple dependents.

1NF Example Solution

EMPLOYEE(Emp_ID, Emp_FName, Emp_LName, Emp_OfficePhone, Emp_HomePhone, Emp_CellPhone)

DEPENDENT(Emp_ID, Emp_DepName)
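The split into EMPLOYEE and DEPENDENT can be sketched in Python as follows. All sample values (names, phone numbers) and the function name `to_1nf` are hypothetical; the slides give only the column names.

```python
# Hypothetical un-normalized records: one cell holds several phone numbers
# and another holds a list of dependents, which violates 1NF.
raw_employees = [
    {
        "Emp_ID": 1,
        "Emp_FName": "Ann",
        "Emp_LName": "Lee",
        "Emp_Phone": {"office": "555-0100", "home": "555-0101", "cell": "555-0102"},
        "Emp_DepName": ["Sam", "Kim"],
    },
    {
        "Emp_ID": 2,
        "Emp_FName": "Bo",
        "Emp_LName": "Diaz",
        "Emp_Phone": {"office": "555-0110"},
        "Emp_DepName": [],
    },
]

def to_1nf(records):
    """Split records into EMPLOYEE rows and DEPENDENT rows."""
    employee, dependent = [], []
    for rec in records:
        phones = rec["Emp_Phone"]
        # One single-valued column per phone type; missing phones become None.
        employee.append((
            rec["Emp_ID"], rec["Emp_FName"], rec["Emp_LName"],
            phones.get("office"), phones.get("home"), phones.get("cell"),
        ))
        # One DEPENDENT row per dependent, linked back by Emp_ID.
        for name in rec["Emp_DepName"]:
            dependent.append((rec["Emp_ID"], name))
    return employee, dependent

employee, dependent = to_1nf(raw_employees)
print(employee[1])  # (2, 'Bo', 'Diaz', '555-0110', None, None)
print(dependent)    # [(1, 'Sam'), (1, 'Kim')]
```

The fixed phone columns work because the number of phone types is small and known; dependents get their own table because their count is open-ended.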

Second Normal Form (2NF)

A table is in 2NF if it is in 1NF and it has no partial dependencies.

2NF is only a concern if a table has a composite key.

A partial dependency is when a non-key attribute is functionally dependent on only part of a composite key.

2NF Example

SID  Activity  Fee  Expertise
100  Swimming  50   Beginner
150  Skiing    200  Advanced
200  Dance     50   Beginner
150  Swimming  50   Intermediate
200  Skiing    200  Intermediate

Recreation(SID, Activity, Fee, Expertise)

Key: (SID, Activity)

Functional Dependency: Activity → Fee

2NF Example solution

SID  Activity  Expertise
100  Swimming  Beginner
150  Skiing    Advanced
200  Dance     Beginner
150  Swimming  Intermediate
200  Skiing    Intermediate

Activity  Fee
Dance     50
Skiing    200
Swimming  50
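The decomposition can be sketched in Python by projecting the Recreation rows onto the two new tables and joining them back on Activity. The variable names are assumptions for this example.

```python
# Recreation(SID, Activity, Fee, Expertise) with key (SID, Activity).
recreation = [
    (100, "Swimming", 50, "Beginner"),
    (150, "Skiing", 200, "Advanced"),
    (200, "Dance", 50, "Beginner"),
    (150, "Swimming", 50, "Intermediate"),
    (200, "Skiing", 200, "Intermediate"),
]

# Project onto the two target tables. Using sets drops the repeated
# (Activity, Fee) pairs, so each fee is stored exactly once.
participation = {(sid, act, exp) for sid, act, _, exp in recreation}
activity_fee = {(act, fee) for _, act, fee, _ in recreation}

# Rejoin on Activity to confirm that no information was lost.
fee_of = dict(activity_fee)
rejoined = {(sid, act, fee_of[act], exp) for sid, act, exp in participation}

print(sorted(activity_fee))         # [('Dance', 50), ('Skiing', 200), ('Swimming', 50)]
print(rejoined == set(recreation))  # True
```

Because Fee depends only on Activity, moving it out removes the partial dependency while the join still reproduces every original row.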

Third Normal Form (3NF)

A table is in 3NF if it is in 2NF and has no transitive dependencies.

A transitive dependency is when one non-key attribute determines another non-key attribute.

3NF Example

SID  Dorm         Rent
100  Jones Hall   200
150  Trevor Hall  300
200  Smith Hall   500
250  Jones Hall   200
300  Smith Hall   500

Housing(SID, Dorm, Rent)

Key: SID

Functional Dependency: Dorm → Rent

3NF Example solution

SID  Dorm
100  Jones Hall
150  Trevor Hall
200  Smith Hall
250  Jones Hall
300  Smith Hall

Dorm         Rent
Jones Hall   200
Smith Hall   500
Trevor Hall  300
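A short Python sketch of what the split buys: once Rent lives in its own table, a rent change is a single update. The function name `housing_view` and the new rent of 225 are assumptions for this illustration.

```python
# The two 3NF tables, held as dictionaries keyed by their keys.
residence = {
    100: "Jones Hall",
    150: "Trevor Hall",
    200: "Smith Hall",
    250: "Jones Hall",
    300: "Smith Hall",
}
dorm_rent = {"Jones Hall": 200, "Smith Hall": 500, "Trevor Hall": 300}

def housing_view(residence, dorm_rent):
    """Rebuild Housing(SID, Dorm, Rent) rows by looking up each dorm's rent."""
    return [(sid, dorm, dorm_rent[dorm]) for sid, dorm in sorted(residence.items())]

print(housing_view(residence, dorm_rent)[0])  # (100, 'Jones Hall', 200)

# One update changes the rent seen by every Jones Hall resident, so the
# rows cannot drift apart as they could in the single-table design.
dorm_rent["Jones Hall"] = 225  # hypothetical new rent
updated = housing_view(residence, dorm_rent)
print([row for row in updated if row[1] == "Jones Hall"])
# [(100, 'Jones Hall', 225), (250, 'Jones Hall', 225)]
```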

Boyce-Codd Normal Form (BCNF)

Special form of 3NF.

A table is in BCNF if it is in 3NF and every determinant is a candidate key.

A BCNF violation arises when a non-key attribute determines part of a composite key.

BCNF Example

SID  Major       Advisor
101  Accounting  Smith
101  CIS         Jones
102  Management  Johnson
103  CIS         Lewis
103  Marketing   Thomas
104  CIS         Jones

A student can have many majors.

A student has a different advisor for each major.

Each advisor advises for only one major.

Advising(SID, Major, Advisor)

Candidate Keys: (SID, Major) or (SID, Advisor)

Functional Dependency: Advisor → Major

BCNF Solution

SID  Advisor
101  Smith
101  Jones
102  Johnson
103  Lewis
103  Thomas
104  Jones

Advisor  Major
Smith    Accounting
Jones    CIS
Johnson  Management
Lewis    CIS
Thomas   Marketing
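The BCNF violation in the Advising table can be checked on the sample rows: Advisor determines Major, yet Advisor alone does not identify a row. This is a sketch; `project`, `determines`, and `is_superkey` are names chosen for the example, and the checks only reflect these six rows.

```python
COLS = ("SID", "Major", "Advisor")
advising = [
    (101, "Accounting", "Smith"),
    (101, "CIS", "Jones"),
    (102, "Management", "Johnson"),
    (103, "CIS", "Lewis"),
    (103, "Marketing", "Thomas"),
    (104, "CIS", "Jones"),
]

def project(rows, names):
    """Keep only the named columns of each row, preserving row order."""
    idx = [COLS.index(n) for n in names]
    return [tuple(row[i] for i in idx) for row in rows]

def determines(rows, lhs, rhs):
    """True if equal lhs values always come with equal rhs values."""
    mapping = {}
    for left, right in zip(project(rows, lhs), project(rows, rhs)):
        if mapping.setdefault(left, right) != right:
            return False
    return True

def is_superkey(rows, names):
    """True if the named columns identify each row uniquely in this sample."""
    values = project(rows, names)
    return len(set(values)) == len(values)

print(determines(advising, ["Advisor"], ["Major"]))  # True
print(is_superkey(advising, ["Advisor"]))            # False: determinant is not a key
print(is_superkey(advising, ["SID", "Major"]))       # True
print(is_superkey(advising, ["SID", "Advisor"]))     # True
```

A determinant that is not a candidate key is what BCNF rules out, which is why Advisor and Major are moved to their own table.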

Fourth Normal Form (4NF)

A table is in 4NF if it is in BCNF and has no multivalued dependencies (MVDs).

A multivalued dependency exists when one attribute determines multiple values for two or more other attributes that are independent of each other.

4NF Example

SID  Major       Activity
101  Accounting  Swimming
101  CIS         Swimming
102  Management  Skiing
102  Management  Dance
103  CIS         Hiking
103  Marketing   Hiking
104  CIS         Skiing

4NF Solution

SID  Activity
101  Swimming
102  Dance
102  Skiing
103  Hiking
104  Skiing

SID  Major
101  Accounting
101  CIS
102  Management
103  CIS
103  Marketing
104  CIS
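The 4NF decomposition can be verified in Python by projecting the three-column table onto the two smaller tables and joining them back on SID. The variable names are assumptions for this example.

```python
# (SID, Major, Activity): majors and activities are independent facts
# about a student, so every combination has to be stored.
original = {
    (101, "Accounting", "Swimming"),
    (101, "CIS", "Swimming"),
    (102, "Management", "Skiing"),
    (102, "Management", "Dance"),
    (103, "CIS", "Hiking"),
    (103, "Marketing", "Hiking"),
    (104, "CIS", "Skiing"),
}

sid_major = {(sid, major) for sid, major, _ in original}
sid_activity = {(sid, activity) for sid, _, activity in original}

# Natural join of the two projections on SID.
rejoined = {
    (sid, major, activity)
    for sid, major in sid_major
    for sid2, activity in sid_activity
    if sid == sid2
}

print(len(original), len(sid_major), len(sid_activity))  # 7 6 5
print(rejoined == original)                              # True
```

The join reproduces the original table exactly, so the two smaller tables carry the same information without repeating each major for every activity.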

Fifth Normal Form (5NF)

A table is in 5NF if it is already in 4NF and cannot have any lossless decompositions.

The table cannot be represented by a set of smaller tables that can reconstruct the original table.

Also called Projection-Join Normal Form (PJNF)

Defines a point where a table cannot be decomposed further.

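5NF Example

EmpID  ProjectID  Skill
101    A          Program
101    B          Design
102    B          Program

Projection on (EmpID, ProjectID):

EmpID  ProjectID
101    A
101    B
102    B

Projection on (EmpID, Skill):

EmpID  Skill
101    Program
101    Design
102    Program

Joining the two projections on EmpID produces two rows that are not in the original table, (101, A, Design) and (101, B, Program):

EmpID  ProjectID  Skill
101    A          Program
101    A          Design
101    B          Program
101    B          Design
102    B          Program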

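Projection on (ProjectID, Skill):

ProjectID  Skill
A          Program
B          Design
B          Program

Joining all three projections removes the spurious row (101, A, Design). For these particular rows, however, (101, B, Program) still survives, because (101, B), (101, Program), and (B, Program) each appear in a projection:

EmpID  ProjectID  Skill
101    A          Program
101    B          Program
101    B          Design
102    B          Program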
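The joins in this example can be recomputed with a few lines of Python. The variable names are assumptions made for this sketch. For the three rows shown, the computation indicates that the two-way join adds two spurious rows and that the three-way join still keeps one of them.

```python
# (EmpID, ProjectID, Skill) rows of the example table.
assignments = {
    (101, "A", "Program"),
    (101, "B", "Design"),
    (102, "B", "Program"),
}

# The three two-column projections.
emp_project = {(e, p) for e, p, _ in assignments}
emp_skill = {(e, s) for e, _, s in assignments}
project_skill = {(p, s) for _, p, s in assignments}

# Join (EmpID, ProjectID) with (EmpID, Skill) on EmpID.
two_way = {
    (e, p, s)
    for e, p in emp_project
    for e2, s in emp_skill
    if e == e2
}

# Joining in the third projection keeps only rows whose
# (ProjectID, Skill) pair also appears in project_skill.
three_way = {row for row in two_way if (row[1], row[2]) in project_skill}

print(sorted(two_way - assignments))    # [(101, 'A', 'Design'), (101, 'B', 'Program')]
print(sorted(three_way - assignments))  # [(101, 'B', 'Program')]
```

A decomposition is lossless only when the join returns exactly the original rows, so a check like this is worth running before splitting a table.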

Domain Key Normal Form (DKNF)

Theoretical structure that is free of all anomalies.

“Every constraint on the database is a logical consequence of the definition of keys and domains.”

Denormalization

Tables may be denormalized to improve performance.

Normalization increases the number of tables and relationships.

Accessing multiple tables across relationships requires more processing than accessing a single table.
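The trade-off can be seen in the shape of the queries. The sketch below uses Python's built-in `sqlite3` module with the housing data from the 3NF example; the table names (`dorm`, `residence`, `housing_flat`) are assumptions for this illustration, and no timing is measured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized design: two tables linked by dorm.
    CREATE TABLE dorm (dorm TEXT PRIMARY KEY, rent INTEGER);
    CREATE TABLE residence (sid INTEGER PRIMARY KEY, dorm TEXT REFERENCES dorm(dorm));
    -- Denormalized design: one table that repeats the rent per student.
    CREATE TABLE housing_flat (sid INTEGER PRIMARY KEY, dorm TEXT, rent INTEGER);

    INSERT INTO dorm VALUES ('Jones Hall', 200), ('Smith Hall', 500), ('Trevor Hall', 300);
    INSERT INTO residence VALUES
        (100, 'Jones Hall'), (150, 'Trevor Hall'), (200, 'Smith Hall'),
        (250, 'Jones Hall'), (300, 'Smith Hall');
    INSERT INTO housing_flat
        SELECT r.sid, r.dorm, d.rent FROM residence r JOIN dorm d ON d.dorm = r.dorm;
""")

# Normalized: the rent for a student requires a join across two tables.
normalized = conn.execute(
    "SELECT r.sid, d.rent FROM residence r JOIN dorm d ON d.dorm = r.dorm WHERE r.sid = ?",
    (250,),
).fetchone()

# Denormalized: the same answer comes from a single table.
denormalized = conn.execute(
    "SELECT sid, rent FROM housing_flat WHERE sid = ?", (250,)
).fetchone()

print(normalized, denormalized)  # (250, 200) (250, 200)
conn.close()
```

The single-table read avoids the join, at the cost of bringing back the update anomalies that normalization removed.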

Normalized Model

Evaluate the attributes of the tables to ensure compliance with normalization rules.

Create new tables as needed.

Place foreign keys for new tables.
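Placing a foreign key in a new table can be sketched with Python's built-in `sqlite3` module, using the tables from the 2NF solution. The table names and the rejected activity ("Chess") are hypothetical. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE activity (
        activity TEXT PRIMARY KEY,
        fee      INTEGER NOT NULL
    );
    -- The new table carries a foreign key back to activity.
    CREATE TABLE participation (
        sid       INTEGER NOT NULL,
        activity  TEXT NOT NULL REFERENCES activity(activity),
        expertise TEXT,
        PRIMARY KEY (sid, activity)
    );
    INSERT INTO activity VALUES ('Dance', 50), ('Skiing', 200), ('Swimming', 50);
    INSERT INTO participation VALUES (100, 'Swimming', 'Beginner');
""")

# A row that points at an activity with no fee record is refused.
try:
    conn.execute(
        "INSERT INTO participation VALUES (?, ?, ?)", (100, "Chess", "Beginner")
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
conn.close()
```

The foreign key is what keeps the decomposed tables consistent: every participation row must refer to an activity that exists.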