Normalization Dr. Mario Guimaraes. Data Normalization Primarily a tool to validate and improve a...

18
Normalization Dr. Mario Guimaraes

Transcript of Normalization Dr. Mario Guimaraes. Data Normalization Primarily a tool to validate and improve a...

Normalization

Dr. Mario Guimaraes

Data Normalization

• Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data.

• The process of decomposing relations with anomalies to produce smaller, well-structured relations.

• Primary Objective: Reduce Redundancy,Reduce nulls,

• Improve “modify” activities:– insert, – update, – delete, – but not read

• Price: degraded query, display, reporting

Functional Dependency and Keys

• Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute.

• Candidate Key: Each non-key field is functionally dependent on every candidate key.

Steps in Normalization

Normalization – most used

• Four most commonly used normal forms are first (1NF), second (2NF) and third (3NF) normal forms, and Boyce–Codd normal form (BCNF).

• Based on functional dependencies among the attributes of a relation.

• A relation can be normalized to a specific form to prevent possible occurrence of update anomalies.

First Normal Form

• No multi-valued attributes.• Every attribute value is atomic.

• Why are the following tables not in 1NFEmployee (ssn, Name, Salary, Address, ListOfSkills)

Department (Did, Dname, ssn)

Second Normal Form

• 1NF and every non-key attribute is fully functionally dependent on the primary key.

• Every non-key attribute must be defined by the entire key, not by only part of the key.

• No partial functional dependencies.

Assuming that we have a composite PK (LicensePlate, OwnerSSN) for the Vechicle Table below, why is the table not in 2NF ?

Vehicle (LicensePlate, Brand, Model, PurchasePrice, Year, OwnerSSN, OwnerName)

Third Normal Form & BCNF

• 2NF and no transitive dependencies (or no functional dependency between non-key attributes = BCNF)

Why are the following tables not in 3NF or BCNF ?

• Why is Employee [ssn, name, salary, did, dname]

• Customer

3NF & BCNF

• It is very rare for a Table to be in 3NF and not be in BCNF (violation of BCNF).

• Given a Relation R with attributes A, B and C where A and B are together the composite PK,

IF A, B -> C and C -> BTHEN R is in 3NF and is not in BCNF

Example: Student, course -> InstructorInstructor -> Course

Steps in Normalization

• 1NF: a table, without multivalued attributes– if not, then decompose

• 2NF: 1NF and every non-key attribute is fully functionally dependent on the primary key– if not, then decompose

• 3NF: 2NF and no transitive dependencies– if not, then decompose

• GENERAL: – Each table should describe a single theme– Modification anomalies are minimized

Hint: THE KEY, THE WHOLE KEY AND NOTHING BUT THE KEY

Normalized Tables Must:

• Two important properties of decomposition:- Lossless-join property enables us to find any instance of original relation from corresponding instances in the smaller relations. - Dependency preservation property enables us to enforce a constraint on original relation by enforcing some constraint on each of the smaller relations.

4NF

• Dependency between attributes (for example, A, B, and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, set of values for B and C are independent of each other.

Matching

• 1) A Weak Entity _____ a) skills• 2) A Derived Attribute _____ b) ssn• 3) A Composite Attribute _____ c) Monthly Sales• 4) A Multi-Valued Attribute_____ d) full name• 5) Primary Key _______ e) Section of a Course

Which tables are in 3NF ?

Employee (ssn, Name, Salary, pid) (obs.: employee can only work in one project)

Employee (ssn, Name, Salary, pid) (obs.: employee can work in many projects)

Vehicle (OwnerSSN, OwnerName, OwnerAddress, LicensePlate, Brand, Model, PurchasePrice, Year)(obs.: an owner can have several cars)

Project (pid, pname, did) (pid determines did)

Project (pid, pname, ssn) (ssn determines pid)

Normalize to 3NF

Normalize to 3NF

• ClientName -> Address, EmpNo -> Name, Service -> AmountDue• ClientName and Date -> Service, ClientName and Date -> EmpNo

E-R to Table 1:N binary

End of Lecture

End Of

Today’s

Lecture.