CS317 File and Database Systems (mercury.pr.erau.edu › ... › Lecture-Week-8-2-grayscale.pdf)



Page 1

October 17, 2017 Sam Siewert

CS317 File and Database Systems

Lecture 8 – Introduction to Normalization

http://dilbert.com/strips/comic/2010-08-24/

Page 2

Reminders

– Exam #1 Questions?
– Working on Grading Ex #3; Return Next Week
– Grading Breakdown here: http://mercury.pr.erau.edu/~siewerts/cs317/policies/Grading-Breakdown.pdf
– Assignment #4, Wednesday, Normalization
– Assignment #5, Logical and Physical DB Design
– Assignment #6, DBMS Project of Your Interest

Sam Siewert 2

Page 3

Normalization

Concern is Duplication of Data in DBMS
– Wastes Space
– Insert Hazard (Update Multiple Tables?)
– Delete Hazard (Delete from Multiple Tables?)
– Modification Hazard (Modify in Multiple Tables?)
– Foreign Keys are Exception (Expected Redundancy for Relational Model)

Minimal Attributes (Columns in Relations [Tables])

Attributes in Table with Close Logical Relationship
– Functionally Dependent Attributes in Same Relation
– Models of Functional Dependency

Minimal Redundancy [Foreign Keys Only]


Page 4

How Normalization Supports Database Design (Ref. Connolly-Begg)

Page 5

Data Redundancy and Update Anomalies

(Figure annotations: FK Duplication [ok]; Redundant Attribute Data)

Page 6

Example Functional Dependency that holds for all Time

Consider the values shown in the staffNo and sName attributes of the Staff relation (previous slide).

Based on sample data, the following functional dependencies appear to hold.

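staffNo → sName
sName → staffNo

Only staffNo → sName holds for all time; sName → staffNo holds only for this sample, since two members of staff could share the same name.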

Page 7

Data Redundancy and Update Anomalies

StaffBranch relation has redundant data; the details of a branch are repeated for every member of staff.

In contrast, the branch information appears only once for each branch in the Branch relation and only the branch number (branchNo) is repeated in the Staff relation, to represent where each member of staff is located.
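A minimal sketch of the decomposed design, assuming Python's built-in sqlite3 module: branch details live once in Branch, and Staff repeats only branchNo as a foreign key. The helper build_db, the column types, and the sample rows are illustrative assumptions, not part of the slides. The StaffBranch view shows that the combined table can still be produced on demand by a join.

```python
import sqlite3

def build_db() -> sqlite3.Connection:
    """Create a hypothetical Branch/Staff schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
    conn.executescript("""
        CREATE TABLE Branch (
            branchNo TEXT PRIMARY KEY,            -- each branch stored once
            bAddress TEXT NOT NULL
        );
        CREATE TABLE Staff (
            staffNo  TEXT PRIMARY KEY,            -- staffNo must be unique
            sName    TEXT NOT NULL,
            branchNo TEXT NOT NULL REFERENCES Branch(branchNo)  -- the only repeated value
        );
        -- The combined StaffBranch table can be re-created on demand by a join.
        CREATE VIEW StaffBranch AS
            SELECT s.staffNo, s.sName, s.branchNo, b.bAddress
            FROM Staff AS s JOIN Branch AS b ON s.branchNo = b.branchNo;
    """)
    # Illustrative sample rows only.
    conn.executemany("INSERT INTO Branch VALUES (?, ?)", [
        ("B005", "22 Deer Rd, London"),
        ("B003", "163 Main St, Glasgow"),
    ])
    conn.executemany("INSERT INTO Staff VALUES (?, ?, ?)", [
        ("SL21", "John White", "B005"),
        ("SG37", "Ann Beech", "B003"),
        ("SG14", "David Ford", "B003"),
    ])
    return conn

conn = build_db()
for row in conn.execute("SELECT * FROM StaffBranch ORDER BY staffNo"):
    print(row)

# A staff row that points at a branch that does not exist is rejected.
try:
    conn.execute("INSERT INTO Staff VALUES ('SA9', 'Mary Howe', 'B999')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

Each branch address is stored exactly once, so there is only one place to change it; the foreign key is the expected, controlled redundancy.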

Page 8

Duplicate Data and Update Anomalies

Relations that contain redundant information may suffer from update anomalies.

Three update anomalies:
– Insertion: Enter SL99 assigned to B003 with a “fat finger” (mistyped) bAddress. SG37, SG14, and SG5 share B003 with SL99; which address is right?
– Deletion: Delete SA9 (fired). What is the bAddress of B007? Do we still have B007?
– Modification: Correct a bad street number for Deer Rd. Which row, SL21's or SL41's? (Both must be updated.)
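The three anomalies can be reproduced on a flat StaffBranch table. This is a sketch under stated assumptions: the rows are plain Python dicts with illustrative values (including the hypothetical "New Hire" and the corrected street number), and addresses_by_branch and inconsistent_branches are hypothetical helpers that flag branches whose redundant bAddress copies disagree.

```python
# Hypothetical flat StaffBranch rows (illustrative values only).
staff_branch = [
    {"staffNo": "SL21", "sName": "John White", "branchNo": "B005", "bAddress": "22 Deer Rd, London"},
    {"staffNo": "SL41", "sName": "Julie Lee", "branchNo": "B005", "bAddress": "22 Deer Rd, London"},
    {"staffNo": "SG37", "sName": "Ann Beech", "branchNo": "B003", "bAddress": "163 Main St, Glasgow"},
    {"staffNo": "SA9", "sName": "Mary Howe", "branchNo": "B007", "bAddress": "16 Argyll St, Aberdeen"},
]

def addresses_by_branch(rows):
    """Map each branchNo to the set of distinct bAddress values stored for it."""
    seen = {}
    for r in rows:
        seen.setdefault(r["branchNo"], set()).add(r["bAddress"])
    return seen

def inconsistent_branches(rows):
    """Branches whose redundant copies of bAddress no longer agree."""
    return sorted(b for b, addrs in addresses_by_branch(rows).items() if len(addrs) > 1)

# Modification anomaly: the street number is corrected in SL21's row only.
staff_branch[0]["bAddress"] = "23 Deer Rd, London"
print(inconsistent_branches(staff_branch))            # ['B005']

# Deletion anomaly: removing SA9 also removes the only record of branch B007.
staff_branch = [r for r in staff_branch if r["staffNo"] != "SA9"]
print("B007" in addresses_by_branch(staff_branch))    # False

# Insertion anomaly: a new B003 hire is entered with a mistyped address.
staff_branch.append({"staffNo": "SL99", "sName": "New Hire", "branchNo": "B003",
                     "bAddress": "163 Mian St, Glasgow"})
print(inconsistent_branches(staff_branch))            # ['B003', 'B005']
```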

Page 9

Lossless-join and Dependency Preservation Properties

Two important properties of decomposition.

– Lossless-join property enables us to find any instance of the original relation from corresponding instances in the smaller relations. I can re-create the original, denormalized table as a view if I want to!

– Dependency preservation property enables us to enforce a constraint on the original relation by enforcing some constraint on each of the smaller relations. E.g. Domain, Referential Integrity (all staff must have one branch assignment), StaffNo must be unique, etc.
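The lossless-join property can be checked on a sample instance by projecting the relation onto the two attribute sets and natural-joining the pieces back together. A minimal sketch, assuming rows are Python dicts; project, natural_join, and is_lossless are hypothetical helpers, and the sample rows are illustrative.

```python
def project(rows, attrs):
    """Relational projection onto attrs; duplicates collapse because the result is a set."""
    return {tuple((a, r[a]) for a in attrs) for r in rows}

def natural_join(left, right):
    """Natural join of two projections, matching on their shared attribute names."""
    result = set()
    for lt in left:
        ld = dict(lt)
        for rt in right:
            rd = dict(rt)
            shared = ld.keys() & rd.keys()
            if all(ld[a] == rd[a] for a in shared):
                result.add(tuple(sorted({**ld, **rd}.items())))
    return result

def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original rows."""
    original = {tuple(sorted(r.items())) for r in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original

staff_branch = [
    {"staffNo": "SL21", "sName": "John White", "branchNo": "B005", "bAddress": "22 Deer Rd, London"},
    {"staffNo": "SL41", "sName": "Julie Lee", "branchNo": "B005", "bAddress": "22 Deer Rd, London"},
    {"staffNo": "SG37", "sName": "Ann Beech", "branchNo": "B003", "bAddress": "163 Main St, Glasgow"},
]

# Staff(staffNo, sName, branchNo) + Branch(branchNo, bAddress): lossless.
print(is_lossless(staff_branch, ["staffNo", "sName", "branchNo"], ["branchNo", "bAddress"]))  # True
# Splitting names away from staff numbers: the join invents spurious tuples.
print(is_lossless(staff_branch, ["staffNo", "branchNo", "bAddress"], ["sName", "branchNo"]))  # False
```

A check on one instance is only evidence. The first split is lossless for every instance because the shared attribute, branchNo, is a key of the Branch side; the second fails because the shared attribute identifies neither side.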

Page 10

Functional Dependencies

Important concept associated with normalization.

Functional dependency describes relationship between attributes.

For example, if A and B are attributes of relation R, B is functionally dependent on A (denoted A → B), if each value of A in R is associated with exactly one value of B in R.
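The definition translates directly into a test on sample data: every distinct value of A must map to one value of B. A minimal sketch, assuming rows are Python dicts; fd_holds is a hypothetical helper and the rows are illustrative (SL100 is the second John White from a later slide).

```python
def fd_holds(rows, lhs, rhs):
    """Check lhs -> rhs on sample rows: equal lhs values must come with equal rhs values.

    Passing only shows the FD is consistent with this data; a real FD is a
    property of the attributes' meaning and must hold for all time.
    """
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

staff = [
    {"staffNo": "SL21", "sName": "John White", "branchNo": "B005"},
    {"staffNo": "SG37", "sName": "Ann Beech", "branchNo": "B003"},
]
print(fd_holds(staff, ["staffNo"], ["sName"]))   # True
print(fd_holds(staff, ["sName"], ["staffNo"]))   # True, but only for this sample

staff.append({"staffNo": "SL100", "sName": "John White", "branchNo": "B099"})
print(fd_holds(staff, ["sName"], ["staffNo"]))   # False once a second John White is hired
print(fd_holds(staff, ["staffNo"], ["sName"]))   # True
```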

Page 11

Characteristics of Functional Dependencies

Property of the meaning or semantics of the attributes in a relation.

Diagrammatic representation.

The determinant of a functional dependency refers to the attribute or group of attributes on the left-hand side of the arrow.

Page 12

An Example Functional Dependency

Page 13

Characteristics of Functional Dependencies

Full functional dependency indicates that if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A.

E.g. branch assignment does not depend on your salary or position, just on who you are.

Page 14

Functional Dependencies

Determinants should have the minimal number of attributes necessary to maintain the functional dependency with the attribute(s) on the right-hand side. E.g. a staff member is assigned to one and only one branch (staffNo → branchNo), while a branch has many staff members assigned to it.

This requirement is called full functional dependency.

Page 15

Full vs. Partial Functional Dependency

Staff relation: staffNo, sName → branchNo

Each value of (staffNo, sName) is associated with a single value of branchNo. However, branchNo is also functionally dependent on a subset of (staffNo, sName), namely staffNo. The example above is therefore a partial dependency (the name is irrelevant).
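The full-versus-partial test can be sketched by checking every proper subset of the determinant. This assumes dict rows and repeats a hypothetical fd_holds helper so the block stands alone; is_full_fd is also hypothetical, and the sample rows are illustrative.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """lhs -> rhs holds in rows if equal lhs values always come with equal rhs values."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def is_full_fd(rows, lhs, rhs):
    """Full FD: lhs -> rhs holds and no proper subset of lhs (including the empty set) determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(len(lhs)):
        for subset in combinations(lhs, size):
            if fd_holds(rows, list(subset), rhs):
                return False  # a smaller determinant exists, so the FD is partial
    return True

staff = [
    {"staffNo": "SL21", "sName": "John White", "branchNo": "B005"},
    {"staffNo": "SG37", "sName": "Ann Beech", "branchNo": "B003"},
    {"staffNo": "SL100", "sName": "John White", "branchNo": "B099"},
]
print(is_full_fd(staff, ["staffNo", "sName"], ["branchNo"]))  # False: partial, staffNo alone suffices
print(is_full_fd(staff, ["staffNo"], ["branchNo"]))           # True
```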

Page 16

Better Staff Branch Relations

(Figure annotations: “assigned to only one” branch; full functional dependency; partial functional dependency)

E.g. two employees named John White: SL21 and a new John White, SL100, with SL21 → B005 and SL100 → B099. The name cannot determine the branch; staffNo alone does.

Page 17

Transitive Dependencies

Important to recognize a transitive dependency because its existence in a relation can potentially cause update anomalies.

Transitive dependency describes a condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).

Page 18

Example Transitive Dependency

Consider the functional dependencies in the StaffBranch relation (shown earlier).

staffNo → sName, position, salary, branchNo, bAddress

branchNo → bAddress

• Transitive dependency: bAddress is transitively dependent on staffNo via branchNo (staffNo → branchNo and branchNo → bAddress).
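Working from declared FDs rather than sample rows, the transitive-dependency condition can be tested with attribute closure. A sketch, assuming FDs are (set, set) pairs taken from the slide; closure and is_transitive are hypothetical helpers, and closure is the standard attribute-closure algorithm.

```python
def closure(attrs, fds):
    """Attribute closure of attrs: everything attrs determines under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_transitive(a, b, c, fds):
    """C is transitively dependent on A via B: A -> B and B -> C, with A not determined by B or C."""
    return (b <= closure(a, fds)
            and c <= closure(b, fds)
            and not a <= closure(b, fds)
            and not a <= closure(c, fds))

# FDs of StaffBranch as listed on this slide.
fds = [
    ({"staffNo"}, {"sName", "position", "salary", "branchNo", "bAddress"}),
    ({"branchNo"}, {"bAddress"}),
]
print(is_transitive({"staffNo"}, {"branchNo"}, {"bAddress"}, fds))  # True
print(is_transitive({"staffNo"}, {"sName"}, {"bAddress"}, fds))     # False
```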

Page 19

Better Staff Branch Relations

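(Figure annotations: each staff member is assigned to only one branch; each branch has only one address)

The branch address of a staffNo is a transitive dependency (staffNo → branchNo → bAddress).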

Page 20

The Process of Normalization

Page 21

The Process of Normalization

Page 22

Unnormalized Form (UNF)

A table that contains one or more repeating groups.

Worst case

May also have Full/Partial Functional Dependencies

May also have Transitive Functional Dependencies

E.g. Most Excel Spreadsheets!!!
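A spreadsheet-style UNF table with a repeating group can be flattened to 1NF by emitting one row per group member. A minimal sketch, assuming a hypothetical layout of one row per branch with its staff as a nested list; to_1nf is a hypothetical helper and the values are illustrative.

```python
# Hypothetical UNF rows: one row per branch, staff held as a repeating group.
unf = [
    {"branchNo": "B005", "bAddress": "22 Deer Rd, London",
     "staff": [("SL21", "John White"), ("SL41", "Julie Lee")]},
    {"branchNo": "B003", "bAddress": "163 Main St, Glasgow",
     "staff": [("SG37", "Ann Beech")]},
]

def to_1nf(rows):
    """Remove the repeating group: one row per (branch, staff member), atomic values only."""
    flat = []
    for r in rows:
        for staff_no, s_name in r["staff"]:
            flat.append({"staffNo": staff_no, "sName": s_name,
                         "branchNo": r["branchNo"], "bAddress": r["bAddress"]})
    return flat

for row in to_1nf(unf):
    print(row)
```

The result is in 1NF but is exactly the StaffBranch shape: bAddress is now repeated for every staff member at a branch, so the transitive dependency branchNo → bAddress still has to be removed by decomposition.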

Page 23

Case in Point

Omission of data

Would an RDBMS have caught?

Perhaps if data for plot was queried from well formed schema?

Spreadsheets tend to use “ranges” rather than predicates


Reinhart, Rogoff... and Herndon: The student who caught out the profs (BBC News story)
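The point about ranges versus predicates can be sketched with Python's built-in sqlite3 module. The data below is entirely hypothetical (placeholder country labels and made-up numbers, not the figures from the study): a hand-picked cell range silently omits qualifying rows, while a WHERE predicate selects every row that meets the condition.

```python
import sqlite3

# Hypothetical observations: (country, debtPct, growth). Not real data.
obs = [
    ("A", 95.0, 2.0),
    ("B", 120.0, -1.0),
    ("C", 30.0, 3.0),
    ("D", 110.0, 1.5),
    ("E", 100.0, 2.5),
]

# Spreadsheet style: average a hand-picked range meant to cover the high-debt
# rows; D and E fall outside the range and are silently omitted.
range_cells = [growth for _, _, growth in obs[0:2]]
range_avg = sum(range_cells) / len(range_cells)

# RDBMS style: state the selection as a predicate, so every qualifying row counts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Obs (country TEXT PRIMARY KEY, debtPct REAL, growth REAL)")
conn.executemany("INSERT INTO Obs VALUES (?, ?, ?)", obs)
predicate_avg = conn.execute("SELECT AVG(growth) FROM Obs WHERE debtPct > 90").fetchone()[0]

print(range_avg, predicate_avg)  # 0.5 1.25
```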