DeborahCostaCS157A Presentation on 8182


  • 7/27/2019 DeborahCostaCS157A Presentation on 8182


    Chapter 8

    Normal Forms Based on Functional Dependencies

    Deborah Costa

    Oct 18, 2007


    8.1 Normalization

    Data redundancy and the consequent modification (insertion, deletion, and update) anomalies can be traced to undesirable functional dependencies (FDs) in a relation schema.

    Desirable FD: any FD in a relation schema R where the determinant is a candidate key of R; this will not cause data redundancy.

    Undesirable FD: any FD in R where the determinant is not a candidate key of R; this will cause data redundancy.
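
The desirable/undesirable distinction can be checked mechanically with an attribute closure. The following is a minimal sketch, not the textbook's procedure; the schema (Employee, Skill, Location) and the function names are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attributes determined by `attrs` under `fds`, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def is_superkey(attrs, schema, fds):
    return closure(attrs, fds) >= set(schema)

def is_candidate_key(attrs, schema, fds):
    """A candidate key is a superkey that stops being one if any attribute is dropped."""
    attrs = frozenset(attrs)
    if not is_superkey(attrs, schema, fds):
        return False
    return all(not is_superkey(attrs - {a}, schema, fds) for a in attrs)

def classify(fd, schema, fds):
    """Label an FD using the slide's definition: desirable iff its determinant is a candidate key."""
    lhs, _ = fd
    return "desirable" if is_candidate_key(lhs, schema, fds) else "undesirable"

# Hypothetical schema and FDs, for illustration only.
SCHEMA = {"Employee", "Skill", "Location"}
FD_PARTIAL = (frozenset({"Employee"}), frozenset({"Location"}))
FD_KEY = (frozenset({"Employee", "Skill"}), frozenset({"Location"}))
FDS = [FD_PARTIAL, FD_KEY]

print(classify(FD_PARTIAL, SCHEMA, FDS))
print(classify(FD_KEY, SCHEMA, FDS))
```

Here Employee alone does not determine Skill, so it is not a candidate key and the FD Employee → Location is flagged as undesirable.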


    A little bit of the History

    Database normalization was first proposed by Edgar F. Codd.

    Codd defined the first three Normal Forms, which we'll look into, of the 7 known Normal Forms.

    In order to do normalization we must know what the requirements are for each of the three Normal Forms that we'll go over.

    One of the key requirements to remember is that Normal Forms are progressive. That is, in order to have 3rd NF we must have 2nd NF, and in order to have 2nd NF we must have 1st NF.


    Normalization: Update Anomaly

    The same information can be expressed on multiple records; therefore updates to the table may result in logical inconsistencies.

    Example: each record in an "Employees' Skills" table might contain an Employee ID, Employee Address, and Skill; thus a change of address for a particular employee will potentially need to be applied to multiple records (one for each of his skills). If the update is not carried through successfully (if, that is, the employee's address is updated on some records but not others), then the table is left in an inconsistent state. Specifically, the table provides conflicting answers to the question of what this particular employee's address is. This phenomenon is known as an update anomaly.

    An update anomaly. Employee 519 is shown as having different addresses on different records.
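
A small sketch of how such an inconsistency arises and can be detected. Only the employee ID 519 comes from the slide; the addresses, skills, and function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical "Employees' Skills" rows: (employee_id, address, skill).
ROWS = [
    (519, "94 Chestnut Street", "Typing"),
    (519, "94 Chestnut Street", "Public Speaking"),
    (426, "87 Sycamore Grove", "Typing"),
]

def update_address(rows, employee_id, new_address, limit=None):
    """Return a copy of rows with the address changed; `limit` simulates an update that stops early."""
    updated, done = [], 0
    for emp, addr, skill in rows:
        if emp == employee_id and (limit is None or done < limit):
            addr = new_address
            done += 1
        updated.append((emp, addr, skill))
    return updated

def conflicting_addresses(rows):
    """Map each employee to their addresses, keeping only employees with more than one."""
    seen = defaultdict(set)
    for emp, addr, _ in rows:
        seen[emp].add(addr)
    return {emp: addrs for emp, addrs in seen.items() if len(addrs) > 1}

partial = update_address(ROWS, 519, "96 Walnut Avenue", limit=1)
print(conflicting_addresses(partial))  # employee 519 now has two addresses
```

If the address were stored once per employee in its own table, a partial update of this kind could not occur.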


    Normalization: Insertion Anomaly

    There are circumstances in which certain facts cannot be recorded at all. For example, each record in a "Faculty and Their Courses" table might contain a Faculty ID, Faculty Name, Faculty Hire Date, and Course Code; thus we can record the details of any faculty member who teaches at least one course, but we cannot record the details of a newly hired faculty member who has not yet been assigned to teach any courses. This phenomenon is known as an insertion anomaly.

    An insertion anomaly. Until the new faculty member is assigned to teach at least one course, his details cannot be recorded.
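
The insertion anomaly can be reproduced with Python's built-in sqlite3 module. This is a sketch under assumptions: the table layout, the key (faculty_id, course_code), and the sample faculty member are invented for illustration and are not taken from the slide's figure.

```python
import sqlite3

def try_insert_without_course():
    """Try to record a faculty member who has no course yet; report what the database does."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE faculty_courses (
               faculty_id   INTEGER NOT NULL,
               faculty_name TEXT    NOT NULL,
               hire_date    TEXT    NOT NULL,
               course_code  TEXT    NOT NULL,  -- part of the key, so it may not be missing
               PRIMARY KEY (faculty_id, course_code)
           )"""
    )
    try:
        # Hypothetical new hire with no course assigned (course_code is NULL).
        conn.execute(
            "INSERT INTO faculty_courses VALUES (?, ?, ?, ?)",
            (424, "Dr. Newsome", "2007-03-29", None),
        )
        return "inserted"
    except sqlite3.IntegrityError:
        return "rejected"
    finally:
        conn.close()

print(try_insert_without_course())
```

Because the course code is part of the key, the row for the new hire is refused, which is the anomaly the slide describes.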


    Normalization: Deletion Anomaly

    There are circumstances in which the deletion of data representing certain facts necessitates the deletion of data representing completely different facts. The "Faculty and Their Courses" table described in the previous example suffers from this type of anomaly, for if a faculty member temporarily ceases to be assigned to any courses, we must delete the last of the records on which that faculty member appears. This phenomenon is known as a deletion anomaly.

    A deletion anomaly. All information about Dr. Giddens is lost when he temporarily ceases to be assigned to any courses.
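
A minimal sketch of the same effect on in-memory rows. Dr. Giddens is named on the slide; the IDs, course codes, and the second faculty member are assumptions for illustration.

```python
# Hypothetical "Faculty and Their Courses" rows: (faculty_id, faculty_name, course_code).
ROWS = [
    (389, "Dr. Giddens", "ENG-206"),
    (407, "Dr. Saperstein", "CMP-101"),
    (407, "Dr. Saperstein", "CMP-201"),
]

def unassign(rows, faculty_id, course_code):
    """Remove one course assignment; in this design that means deleting the whole row."""
    return [r for r in rows if not (r[0] == faculty_id and r[2] == course_code)]

def known_faculty(rows):
    """Names of every faculty member the table still knows about."""
    return {name for _, name, _ in rows}

after = unassign(ROWS, 389, "ENG-206")
print("Dr. Giddens" in known_faculty(after))
```

Removing a single course assignment also removes the only record of the faculty member, because both facts live in the same row.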


    Normalization (cont)

    The way to eliminate the problems caused by undesirable FDs is to somehow render the undesirable FDs desirable; the process of doing this is called normalization.

    Normal Forms (NFs) provide a stepwise progression towards the goal of a fully normalized relation schema that is guaranteed to be free of data redundancies that cause modification anomalies from a functional dependency perspective.


    Normalization (cont)

    A relation schema is said to be in a particular normal form if it satisfies certain prescribed criteria; otherwise the relation is said to violate the normal form. The violation of each of these normal forms signals the presence of a specific type of undesirable FD.

    It is important to note that the normalization process is anchored to the candidate key of a relation schema, R. We will use the primary key as the basis for evaluating and normalizing a relation schema.


    First Normal Form (1NF)

    First Normal Form imposes conditions so that a base relation which is physically stored as a file does not contain records with a variable number of fields. This is accomplished by prohibiting multi-valued attributes, composite attributes, and combinations thereof in a relation schema. As a consequence, the value of an attribute in a tuple of a relation can be neither a set of values nor another tuple. Such a constraint in effect prevents relations from containing other relations.


    1NF Violation and Resolution Figure 8.1 pg 348

    As you can see, this schema violates 1NF because there are multiple Artist_nm values associated with an Album_no; that is, the domain of Artist_nm does not have atomic values. In fact, by definition, ALBUM is not even a relation.


    1NF Violation and Resolution Figure 8.1 pg 348

    In order to fix ALBUM we must expand the relation so that there is a tuple for

    each (atomic) Artist_nm for a given Album_no. The primary key for this is

    {Album_no, Artist_nm} as we all should hopefully know by now.
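
The expansion step can be sketched as follows. The attribute names Album_no and Artist_nm come from the slide; the album numbers and artist names are placeholders, since the data in Figure 8.1 is not reproduced here.

```python
# Hypothetical non-1NF data: Artist_nm holds a list of names for each album.
ALBUM = [
    {"Album_no": 1, "Artist_nm": ["Artist A", "Artist B"]},
    {"Album_no": 2, "Artist_nm": ["Artist C"]},
]

def to_1nf(records):
    """Emit one (Album_no, Artist_nm) tuple per atomic artist name."""
    flat = []
    for rec in records:
        for artist in rec["Artist_nm"]:
            flat.append((rec["Album_no"], artist))
    return flat

def is_key(rows):
    """True if no two rows are identical, i.e. the full tuple identifies a row."""
    return len(set(rows)) == len(rows)

flat_album = to_1nf(ALBUM)
print(flat_album)
print(is_key(flat_album))
```

Album_no alone repeats across the expanded rows, which is why the key has to be the pair {Album_no, Artist_nm}.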


    Second Normal Form (2NF)

    The requirements to satisfy the 2nd NF:

    All requirements for 1st NF must be met.

    Redundant data across multiple rows of a table must be moved to a separate table; that is, no non-key attribute may depend on only part of a candidate key.

    The resulting tables must be related to each other by use of a foreign key.


    2nd NF

    Example

    The only candidate key is (Employee, Skill).

    Not in 2NF: Current Work Location is dependent on Employee alone.

    This can cause an anomaly: updating Jones' work location for Typing and Shorthand but not Whittling, and then asking "What is Jones' current work location?", gives a contradictory answer, because there are 2 different locations.


    2nd NF

    Example

    Both tables are in 2NF:

    They meet the 1NF requirements.

    No non-primary-key attribute is dependent on part of a key.
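
A sketch of the decomposition in plain Python. Jones and the skills Typing, Shorthand, and Whittling come from the slides; the addresses, the second employee, and the function names are assumptions for illustration.

```python
# Hypothetical rows of the original table: (Employee, Skill, Current Work Location).
EMPLOYEES_SKILLS = [
    ("Jones", "Typing", "114 Main Street"),
    ("Jones", "Shorthand", "114 Main Street"),
    ("Jones", "Whittling", "114 Main Street"),
    ("Bravo", "Light Cleaning", "73 Industrial Way"),
]

def decompose(rows):
    """Split into Employees (Employee -> Location) and Skills (Employee, Skill).

    Assumes the FD Employee -> Location holds in the input rows.
    """
    employees = {emp: loc for emp, _, loc in rows}
    skills = sorted({(emp, skill) for emp, skill, _ in rows})
    return employees, skills

def rejoin(employees, skills):
    """Natural join of the two tables on Employee."""
    return [(emp, skill, employees[emp]) for emp, skill in skills]

employees, skills = decompose(EMPLOYEES_SKILLS)
employees["Jones"] = "73 Industrial Way"  # one update, stored in one place
print({loc for emp, _, loc in rejoin(employees, skills) if emp == "Jones"})
```

Since each employee's location is now stored once, Jones can no longer end up with two different locations, and joining the two tables on Employee recovers the original rows.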


    Third Normal Form (3NF)

    The requirements to satisfy the 3rd NF:

    All requirements for 2nd NF must be met.

    Eliminate fields that do not depend directly on the primary key. That is, any non-key field that depends on the primary key only through another non-key field (a transitive dependency) must be moved to another table.
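
One way to test this condition is the usual definition of 3NF: for every FD X → A, either X is a superkey or A belongs to some candidate key. The sketch below assumes that definition; the schema (Emp_id, Dept, Dept_location) and the function names are hypothetical, and the candidate keys are supplied by the caller rather than computed.

```python
def closure(attrs, fds):
    """Attributes determined by `attrs` under `fds`, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def violations_3nf(schema, fds, candidate_keys):
    """List (determinant, attribute) pairs that break 3NF for the given FDs."""
    prime = set().union(*candidate_keys)  # attributes that appear in some candidate key
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(schema):
            continue  # determinant is a superkey, so this FD is fine
        for attr in sorted(rhs - lhs):
            if attr not in prime:
                bad.append((sorted(lhs), attr))
    return bad

# Hypothetical schema with a transitive dependency Emp_id -> Dept -> Dept_location.
SCHEMA = {"Emp_id", "Dept", "Dept_location"}
FDS = [
    (frozenset({"Emp_id"}), frozenset({"Dept"})),
    (frozenset({"Dept"}), frozenset({"Dept_location"})),
]
KEYS = [frozenset({"Emp_id"})]

print(violations_3nf(SCHEMA, FDS, KEYS))
```

The reported FD Dept → Dept_location is the one to move into its own table, keyed on Dept.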


    Third Normal Form

    Example

    Eliminate columns not dependent on the key; i.e., if a column is in a relation, then it must be dependent on the key.


    Third Normal Form

    Example

    Move non-key-dependent attributes to a

    new table.


    8.2 The Motivating Exemplar Revisited

    Normalization concepts have been presented by analyzing 1NF, 2NF, 3NF, and BCNF in isolation.

    However, in practice normal form violations rarely occur in isolation.

    We can see from Figure 8.8a that STOCK satisfies 1NF because there are no composite or multi-valued attributes in it.


    Motivating Exemplar Revisited (cont)

    Using Armstrong's axioms we get {Store, Product} and {Manager, Location, Product} as candidate keys; we choose {Store, Product} as the primary key.

    Now that we have the primary key for STOCK we can see that:

    fd1, fd2, fd3, and fd4 violate 2NF in STOCK.

    fd6 violates 3NF in STOCK.

    fd7 violates BCNF in STOCK.

    To fix all of the violations above we must decompose the relation schema into D: {R1, R2, R3, R4, R5}.


    Motivating Exemplar Revisited (cont)

    This section is very confusing in my opinion, so for better understanding please read it more than once.

    After reading it a couple of times we should be able to know how to decompose the base relation schema under investigation and know if our decomposition is complete and correct without looking at the sample data.

    A decomposition is complete when it is a dependency-preserving, lossless-join decomposition. Preservation of FDs is a verification process: inspect the decomposition to see whether the union of the FDs that hold on the individual relation schemas of D is a cover for F. This is demonstrated in Section 8.1.5.1. You should also test for the lossless-join property; the method for testing is presented in Section 8.1.5.2.
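
Both checks can be sketched with attribute closures. This is an illustrative version of the standard tests and may differ from the presentation in Sections 8.1.5.1 and 8.1.5.2. The lossless-join check shown covers only a decomposition into two schemas, and the schema R(A, B, C) with its FDs is made up for the example.

```python
def closure(attrs, fds):
    """Attributes determined by `attrs` under `fds`, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """Two-schema test: the shared attributes must determine all of r1 or all of r2."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure

def preserves_dependencies(decomposition, fds):
    """Check each FD using only closures restricted to the individual schemas of D."""
    for lhs, rhs in fds:
        reachable = set(lhs)
        changed = True
        while changed:
            changed = False
            for schema in decomposition:
                gained = closure(reachable & schema, fds) & schema
                if not gained <= reachable:
                    reachable |= gained
                    changed = True
        if not rhs <= reachable:
            return False
    return True

# Hypothetical schema R(A, B, C) with F = {A -> B, B -> C}.
F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
AB, BC, AC = frozenset("AB"), frozenset("BC"), frozenset("AC")

print(is_lossless_binary(AB, BC, F), preserves_dependencies([AB, BC], F))
print(is_lossless_binary(AB, AC, F), preserves_dependencies([AB, AC], F))
```

In this example {AB, BC} passes both checks, while {AB, AC} is lossless but loses B → C, so it would not count as complete in the sense used above.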