44271: Database Design & Implementation Logical Data Modelling (Avoiding Database Anomalies) Ian Perry Room: C49 Tel Ext.: 7287 E-mail: I.P.Perry@hull.ac.uk http://itsy.co.uk/ac/0405/sem3/44271_DDI/

44271: Database Design & Implementation

Logical Data Modelling (Avoiding Database Anomalies)

Ian Perry
Room: C49 Tel Ext.: 7287

E-mail: I.P.Perry@hull.ac.uk

http://itsy.co.uk/ac/0405/sem3/44271_DDI/


What is a Logical Data Model?

A ‘robust’ representation of the initial decisions made when building our Conceptual Data Model, which was composed of:
• Entities
• Attributes
• Relationships

When I say ‘robust’ I mean that this model MUST ‘perform’ well with respect to a specific style/type of software.


Database Theories & Software

A Logical Data Model is hardware independent; the match to the ‘type’ of software is the only concern, e.g.:
• Hierarchical DBMS
• Relational DBMS
• Object-based DBMS

Each Database Theory addresses:
• Data Structure
• Data Integrity
• Data Manipulation


Database Theory = Relational Model

First proposed by Dr. E. F. Codd in June 1970:
Codd E F, (1970), A Relational Model of Data for Large Shared Data Banks, Communications of the ACM, Vol. 13, No. 6, pp. 377-387.

Codd's model is now accepted as the definitive model for relational database management systems (RDBMS).

Structured English QUEry Language ("SEQUEL") was developed by IBM Corporation, Inc., to use Codd's model.

SEQUEL later became SQL.

In 1979, Relational Software, Inc. (now Oracle Corporation) introduced the first commercial implementation of SQL.

SQL is the most widely used RDBMS manipulation language.


Relations look like Entities, but …

Entity Staff(SCode, Name, Address, DoB, …)

May discover requirement for ‘extra’ Attributes, and also need to ‘complete’ our list of Attributes for each Relation.

Relation Staff(SCode, Name, Address, DoB, DoE)

Entity Contract(CCode, Site, Begin, End, …)

Can’t draw relationship lines, so need to ‘add’ extra attributes to Relations at the ‘M’ end of any ‘1:M’ relationships; e.g. 1 Staff “take part in” M Contract.

Relation Contract(CCode, Site, Begin, End, SCode)


Use Tables to ‘flesh-out’ your Logical Model

Staff(SCode, Name, Address, DoB, DoE)

SCode  Name   Address    DoB       DoE
9491   Smith  6 Shaw St  13/02/65  03/10/98
7416   Day    2 Sale St  14/01/57  22/11/02
8912   Jones  15 Ayr Av  28/12/76  01/03/04

Contract(CCode, Site, Begin, End, SCode)

CCode  Site  Begin     End       SCode
279    Hull  27/02/05  03/03/05  9491
665    York  14/09/04  02/12/04  7416
183    York  04/03/05  16/06/05  9491

NB. Tables ARE NOT Relations!


Primary & Foreign Keys

The most important Attributes in a Relation are known as ‘Keys’, of which there are two types.

Primary Key:
• One, or more, Attribute(s) that identify a unique occurrence of the ‘Entity’ that this ‘Relation’ represents.

Foreign Key:
• Attributes used (i.e. instead of the lines of an ER Diagram) to represent the presence of relationships.

Often referred to as: The Primary/Foreign Key Mechanism.
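A minimal sketch of the Primary/Foreign Key Mechanism, using Python's built-in sqlite3 module and the Staff and Contract rows from the earlier tables. The column names BeginDate and EndDate are my own renaming (BEGIN and END are SQL keywords), and storing the dates as text is an assumption made to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when asked to.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Staff (
    SCode   INTEGER PRIMARY KEY,
    Name    TEXT,
    Address TEXT,
    DoB     TEXT,
    DoE     TEXT
);
CREATE TABLE Contract (
    CCode     INTEGER PRIMARY KEY,
    Site      TEXT,
    BeginDate TEXT,   -- 'Begin' on the slide; renamed (assumption)
    EndDate   TEXT,   -- 'End' on the slide; renamed (assumption)
    SCode     INTEGER REFERENCES Staff(SCode)  -- the Foreign Key
);
""")

conn.executemany("INSERT INTO Staff VALUES (?, ?, ?, ?, ?)", [
    (9491, "Smith", "6 Shaw St", "13/02/65", "03/10/98"),
    (7416, "Day", "2 Sale St", "14/01/57", "22/11/02"),
    (8912, "Jones", "15 Ayr Av", "28/12/76", "01/03/04"),
])
conn.executemany("INSERT INTO Contract VALUES (?, ?, ?, ?, ?)", [
    (279, "Hull", "27/02/05", "03/03/05", 9491),
    (665, "York", "14/09/04", "02/12/04", 7416),
    (183, "York", "04/03/05", "16/06/05", 9491),
])

# The relationship line of the ER Diagram is recovered by matching
# Contract.SCode (Foreign Key) against Staff.SCode (Primary Key).
contracts_with_names = conn.execute("""
    SELECT Contract.CCode, Contract.Site, Staff.Name
    FROM Contract JOIN Staff ON Contract.SCode = Staff.SCode
    ORDER BY Contract.CCode
""").fetchall()
print(contracts_with_names)
```

Note that Jones (8912) does not appear in the result, as no Contract tuple refers to that SCode.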


Attributes, Domains & Relationships

Attribute Values should be atomic (i.e. simple/single values only); e.g.:
• ‘address’ should be separated into ‘street’ & ‘town’ & ‘postcode’.

Set of eligible Attribute Values is known as an Attribute’s Domain; e.g.:
• if we only have 100 members of staff, then the Domain of the ‘SCode’ Attribute could be “whole numbers between 1 & 100”.

The Relational Model is weak at explicitly modelling relationships:
• Attributes in different Relations MUST HAVE the same Attribute Domain for a relationship to be possible.
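As a small illustration of a Domain, the check below tests whether a value belongs to “whole numbers between 1 & 100”. The function name in_scode_domain is hypothetical, and treating both ends of the range as included is my reading of the wording.

```python
def in_scode_domain(value):
    """Return True if value is a whole number between 1 and 100 (inclusive)."""
    # type(...) is int rejects text, decimals and booleans.
    return type(value) is int and 1 <= value <= 100


for candidate in (1, 100, 0, 101, "42", 3.5):
    print(candidate, in_scode_domain(candidate))
```

A Foreign Key Attribute that refers to SCode would have to pass the same check, which is what “same Attribute Domain” means in practice.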


Codd’s Rules

Each Tuple (i.e. row) MUST BE unique, i.e.:
• we need a way to discriminate between Tuples.

Therefore:
• each Relation MUST HAVE a Primary Key.

There may be many Candidates for the job of Primary Key, so select on the basis of:
• uniqueness AND/OR minimality.

Keys with more than one Attribute:
• are known as composite keys.
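The uniqueness and minimality tests can be tried out on sample data. The sketch below (function names are my own) lists every minimal set of Attributes whose values are unique across the three Contract rows shown earlier. Treat the output with care: sample data can only rule a key out, it cannot prove that a key will stay unique for future Tuples.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """True if no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def candidate_keys(rows, attributes):
    """Minimal attribute sets that are unique in this sample of rows."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # Skip any set that contains a smaller key: it is not minimal.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_unique(rows, attrs):
                keys.append(attrs)
    return keys


attributes = ["CCode", "Site", "Begin", "End", "SCode"]
contract = [
    dict(zip(attributes, values)) for values in [
        (279, "Hull", "27/02/05", "03/03/05", 9491),
        (665, "York", "14/09/04", "02/12/04", 7416),
        (183, "York", "04/03/05", "16/06/05", 9491),
    ]
]

found = candidate_keys(contract, attributes)
print(found)
```

With only three rows, ‘Begin’, ‘End’ and the composite (Site, SCode) also look unique, but only by accident; choosing CCode as the Primary Key is a decision about the business, not about the sample.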


Rules for Integrity

No Attribute that is part of the Primary Key can assume a ‘null’ value, else:
• how could we discriminate between Tuples?

Foreign Key Attributes must take values that are either ‘null’, or from the same Domain as the Primary Key Attribute to which they are logically linked, else:
• we will lose the possibility of making relationships.
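The two integrity rules can be sketched as a simple check over rows held as Python dictionaries. The function name is hypothetical, and the Foreign Key test used here is the stricter form commonly enforced by RDBMS software (the value must match an existing Primary Key value, not merely fall in the same Domain); that is an assumption beyond the wording above.

```python
def integrity_violations(staff, contracts):
    """Return a list of messages describing broken integrity rules."""
    problems = []
    staff_codes = set()
    for row in staff:
        if row["SCode"] is None:
            problems.append("Staff: null Primary Key")
        else:
            staff_codes.add(row["SCode"])
    for row in contracts:
        if row["CCode"] is None:
            problems.append("Contract: null Primary Key")
        fk = row["SCode"]
        # A null Foreign Key is allowed; a non-null one must match a Staff tuple.
        if fk is not None and fk not in staff_codes:
            problems.append(
                f"Contract {row['CCode']}: SCode {fk} matches no Staff tuple"
            )
    return problems


staff = [{"SCode": 9491}, {"SCode": 7416}]
contracts = [
    {"CCode": 279, "SCode": 9491},   # valid link
    {"CCode": 665, "SCode": None},   # null Foreign Key: allowed
    {"CCode": 183, "SCode": 1234},   # hypothetical bad value
]
problems = integrity_violations(staff, contracts)
print(problems)
```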


Avoiding Database Anomalies

Most Database books have a section describing a mathematically-based technique called Normalisation:
• I will show you a much easier way of achieving the same result.

What we want to achieve is a ‘robust’ Logical Data Model; i.e. by:
• Transforming a Conceptual Data Model into a set of Relations.
• Checking these Relations for any Anomalies.
• Documenting them as a Database Schema.


What is an Anomaly?

Anything we try to do with a database that may lead to unexpected and/or unpredictable results.

Three types of Anomaly; i.e.:
• insert
• delete
• update

Need to check your database design carefully:
• the only good database is an anomaly-free database.


Insert Anomaly

When we want to enter a value into a data cell but the attempt is prevented, as the primary key value is not known.

e.g. We have built a new Room (e.g. B123), but it has not yet been timetabled for any courses (so we don’t have a CoNo value).

CoNo  Tutor   Room  RSize  EnLimit
353   Smith   A532  45     40
351   Smith   C320  100    60
355   Clark   H940  400    300
456   Turner  H940  400    45


Delete Anomaly

When a value we want to delete also means we will delete values we wish to keep.

CoNo  Tutor   Room  RSize  EnLimit
353   Smith   A532  45     40
351   Smith   C320  100    60
355   Clark   H940  400    300
456   Turner  H940  400    45

e.g. CoNo 351 has ended, but Room C320 will be used elsewhere.


Update Anomaly

When we want to change a single data item value, but must update multiple entries.

CoNo  Tutor   Room  RSize  EnLimit
353   Smith   A532  45     40
351   Smith   C320  100    60
355   Clark   H940  400    300
456   Turner  H940  400    45

e.g. Room H940 has been improved, it is now of RSize = 500.
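The three anomalies can be reproduced with the timetable rows held as a Python list. The split into separate ‘course’ and ‘room’ structures at the end is one possible fix, offered as an assumption; it is not shown on the slides. The size of Room B123 is not stated, so it is left as None.

```python
timetable = [
    {"CoNo": 353, "Tutor": "Smith", "Room": "A532", "RSize": 45, "EnLimit": 40},
    {"CoNo": 351, "Tutor": "Smith", "Room": "C320", "RSize": 100, "EnLimit": 60},
    {"CoNo": 355, "Tutor": "Clark", "Room": "H940", "RSize": 400, "EnLimit": 300},
    {"CoNo": 456, "Tutor": "Turner", "Room": "H940", "RSize": 400, "EnLimit": 45},
]

# Delete anomaly: removing CoNo 351 also removes all knowledge of Room C320.
after_delete = [r for r in timetable if r["CoNo"] != 351]
rooms_left = {r["Room"] for r in after_delete}

# Update anomaly: changing the size of H940 means touching every matching row.
rows_to_change = [r for r in timetable if r["Room"] == "H940"]

# One possible fix (assumption): RSize depends on Room, not on CoNo,
# so rooms are stored once, keyed by Room.
room = {r["Room"]: r["RSize"] for r in timetable}
course = [
    {k: r[k] for k in ("CoNo", "Tutor", "Room", "EnLimit")} for r in timetable
]

room["B123"] = None                                  # insert: no CoNo needed
course = [c for c in course if c["CoNo"] != 351]     # delete: C320 survives
room["H940"] = 500                                   # update: one change only

print(sorted(rooms_left), len(rows_to_change), room)
```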


Conceptual Model & Translation Process

Conceptual Model:

Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB, ...)

Student(Enrol-No, Name, Address, OLevelPoints, ...)

Course(CourseCode, Name, Duration, ...)

[ER diagram: Staff M:M Course; Staff 1:M Student]

Translation Process:
• Entities become Relations
• Attributes become Attributes(?)
• Key Attribute(s) become Primary Key(s)
• Relationships are represented by additional Foreign Key Attributes; for those Relations that are at the ‘M’ end of each 1:M Relationship.


The ‘Staff’ & ‘Student’ Relations

Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB, ...)

becomes:

Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB)

Student(Enrol-No, Name, Address, OLevelPoints, ...)

becomes:

Student(Enrol-No, Name, Address, OLevelPoints, Tutor)

NB. Foreign Key Tutor references Staff.Staff-ID


The ‘Staff’ & ‘Course’ Relations

Course(CourseCode, Name, Duration, ...)

becomes:

Course(CourseCode, Name, Duration)

NB. Can’t ‘simply’ add extra attributes to act as Foreign Keys; as BOTH Relations have a ‘M’ end:
• I warned you about leaving M:M relationships in your Conceptual Data Model.

MUST create an ‘artificial’ linking Relation.

Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB)


‘Staff’, ‘Course’ & ‘Team’ Relations

NB. In the ‘artificial’ Team Relation:
• Primary Key is a ‘composite’ of CourseCode & Staff-ID
• Foreign Key CourseCode references Course.CourseCode
• Foreign Key Staff-ID references Staff.Staff-ID

Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB)

Course(CourseCode, Name, Duration)

Team(Staff-ID, CourseCode)
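A rough sketch of how the ‘artificial’ Team Relation resolves the M:M relationship, using plain Python structures. The staff names, course names and the assign function are hypothetical; only the shape of Team(Staff-ID, CourseCode) comes from the slide.

```python
staff = {1001: "Smith", 1002: "Clark"}            # Staff-ID -> Name (made up)
course = {101: "Databases", 102: "Networks"}      # CourseCode -> Name (made up)
team = set()                                      # (Staff-ID, CourseCode) pairs


def assign(staff_id, course_code):
    """Add a Team tuple, checking both Foreign Keys first."""
    if staff_id not in staff or course_code not in course:
        raise ValueError("Foreign Key does not match an existing tuple")
    # A set silently ignores repeats, so the composite key stays unique.
    team.add((staff_id, course_code))


assign(1001, 101)
assign(1001, 102)   # one member of Staff on many Courses
assign(1002, 101)   # one Course with many Staff
assign(1001, 101)   # repeat: no second tuple is stored

courses_for_1001 = sorted(c for s, c in team if s == 1001)
staff_on_101 = sorted(s for s, c in team if c == 101)
print(courses_for_1001, staff_on_101, len(team))
```

Each side of the M:M relationship becomes an ordinary 1:M relationship with Team, which is why neither Staff nor Course needs an extra Attribute.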


4 Relations from 3 Entities?

OK, BUT are they anomaly free?

• Is every Tuple unique?
  • i.e. is there a Primary Key.
• Are the Attributes Atomic?
  • i.e. do they store only ONE item of data.
• Does every Attribute within each Relation ‘depend’ upon the Primary Key?

Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB)

Course(CourseCode, Name, Duration)

Team(Staff-ID, CourseCode)

Student(Enrol-No, Name, Address, OLevelPoints, Tutor)


What if the checks fail?

If any Relation fails the ‘checks’ (especially those checking dependency):
• we MUST split that Relation into multiple Relations, until they pass the tests.
• but we MUST remember to leave behind a Foreign Key, to ‘point’ forwards to the Primary Key of the ‘new’ split-off Relation.


Are they Anomaly Free?

Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB)

Course(CourseCode, Name, Duration)

Team(Staff-ID, CourseCode)

Student(Enrol-No, Name, Address, OLevelPoints, Tutor)

RateOfPay: NOT Dependent upon Staff-ID; Requires a slightly more complex ‘solution’.

Address: NOT very Atomic; Could easily be split into ‘Street’, ‘Town’ & ‘PostCode’.


‘Fixing’ the Dependency ‘Problem’

The Attribute ‘RateOfPay’ depends upon ‘ScalePoint’ NOT ‘Staff-ID’. So, we MUST remove ‘RateOfPay’ from the ‘Staff’ Relation, like this:

Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB)

becomes:

Staff(Staff-ID, Name, ScalePoint, DOB)

Pay(ScalePoint, RateOfPay)

NB. In the ‘Staff’ Relation: Foreign Key ScalePoint references Pay.ScalePoint
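The split can be checked on a few sample rows. In the sketch below the Staff-ID, ScalePoint and RateOfPay values are invented for illustration, and the determines function is my own helper. It confirms that RateOfPay follows ScalePoint in this data, performs the split, and shows that nothing is lost when the two Relations are joined back together through the Foreign Key.

```python
def determines(rows, lhs, rhs):
    """True if equal lhs values always come with equal rhs values in rows."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


# Hypothetical sample data for the unsplit Staff Relation.
staff_flat = [
    {"Staff-ID": 1001, "Name": "Smith", "ScalePoint": 5, "RateOfPay": 20.0, "DOB": "13/02/65"},
    {"Staff-ID": 1002, "Name": "Day", "ScalePoint": 7, "RateOfPay": 25.0, "DOB": "14/01/57"},
    {"Staff-ID": 1003, "Name": "Jones", "ScalePoint": 5, "RateOfPay": 20.0, "DOB": "28/12/76"},
]

pay_follows_scale = determines(staff_flat, "ScalePoint", "RateOfPay")

# Split: Pay keeps one entry per ScalePoint; Staff keeps ScalePoint as a Foreign Key.
pay = {r["ScalePoint"]: r["RateOfPay"] for r in staff_flat}
staff = [{k: v for k, v in r.items() if k != "RateOfPay"} for r in staff_flat]

# Following the Foreign Key rebuilds the original rows exactly.
rejoined = [dict(r, RateOfPay=pay[r["ScalePoint"]]) for r in staff]

print(pay_follows_scale, pay, rejoined == staff_flat)
```

After the split, a change to the rate for Scale Point 5 is made once in Pay, rather than once per member of Staff on that Scale Point.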


5 Relations from 3 Entities

Now all we need to do:
• Is to document our ‘Anomaly Free’ Relations as a Database Schema.

Staff(Staff-ID, Name, ScalePoint, DOB)

Course(CourseCode, Name, Duration)

Team(Staff-ID, CourseCode)

Student(Enrol-No, Name, Street, Town, PostCode, OLevelPoints, Tutor)

Pay(ScalePoint, RateOfPay)


Document Relations as a Database Schema

A Database Schema:
• defines all Relations,
• lists all Attributes (with their Domains),
• and identifies all Primary & Foreign Keys.

We may/should have ‘discovered’ a number of constraints during our analysis of the Business situation, e.g.:
• the College only delivers 10 Courses.
• there are only 12 Points on the Pay Scale.
• Staff MUST be at least 21 Years Old.

These constraints can/should be expressed as the ‘Domains’ of the Database Schema.


Logical Schema 1 - Domains

Schema College
Domains
  StudentIdentifiers = 1 - 9999;
  StaffIdentifiers = 1001 - 1199;
  GeneralNames = TextString (15 Characters);
  Addresses = TextString (20 Characters);
  PostCodes = TextString (7 or 8 Characters);
  CourseIdentifiers = 101 - 110;
  OLevelPoints = 0 - 100;
  ScalePoints = 1 - 12;
  StaffBirthDates = Date (dd/mm/yyyy), >21 Years before Today;
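To see what these Domains mean in practice, each one can be written as a small membership test. This is only a sketch: the helper names are my own, ‘15 Characters’ and ‘20 Characters’ are read as maximum lengths, and the birth-date rule is read as “at least 21 years old”, following the constraint stated earlier.

```python
from datetime import date, datetime


def whole_number(low, high):
    """Domain of whole numbers from low to high, inclusive."""
    return lambda v: type(v) is int and low <= v <= high


def text_up_to(limit):
    """Domain of text strings no longer than limit characters (assumption)."""
    return lambda v: isinstance(v, str) and len(v) <= limit


def is_staff_birth_date(text, today=None):
    """dd/mm/yyyy date that makes the person at least 21 on 'today'."""
    if not isinstance(text, str):
        return False
    today = today or date.today()
    try:
        born = datetime.strptime(text, "%d/%m/%Y").date()
    except ValueError:
        return False  # not a real date in dd/mm/yyyy form
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    age = today.year - born.year - (0 if had_birthday else 1)
    return age >= 21


DOMAINS = {
    "StudentIdentifiers": whole_number(1, 9999),
    "StaffIdentifiers": whole_number(1001, 1199),
    "GeneralNames": text_up_to(15),
    "Addresses": text_up_to(20),
    "PostCodes": lambda v: isinstance(v, str) and len(v) in (7, 8),
    "CourseIdentifiers": whole_number(101, 110),
    "OLevelPoints": whole_number(0, 100),
    "ScalePoints": whole_number(1, 12),
    "StaffBirthDates": is_staff_birth_date,
}

print(DOMAINS["ScalePoints"](12), DOMAINS["ScalePoints"](13))
print(is_staff_birth_date("13/02/1965", today=date(2005, 3, 1)))
```

The ‘only 10 Courses’ and ‘only 12 Points’ constraints appear here simply as the upper and lower bounds of CourseIdentifiers and ScalePoints.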


Logical Schema 2 - Relations

Relation Student
  Enrol-No: StudentIdentifiers;
  Name: GeneralNames;
  Street: Addresses;
  Town: Addresses;
  PostCode: PostCodes;
  OLevelPoints: OLevelPoints;
  Tutor: StaffIdentifiers;

  Primary Key: Enrol-No
  Foreign Key Tutor references Staff.Staff-ID


Logical Schema 3 - Relations

Relation Staff
  Staff-ID: StaffIdentifiers;
  Name: GeneralNames;
  ScalePoint: ScalePoints;
  DOB: StaffBirthDates;

  Primary Key: Staff-ID
  Foreign Key ScalePoint references Pay.ScalePoint


Logical Schema ...

Relation Course
  CourseCode: CourseIdentifiers;
  Name: GeneralNames;
  … etc.

Continue to define each of the Relations in a similar manner.

NB. Make sure that you define ALL of the Relations, including:
• ‘artificial’ ones (e.g. Team)
• ‘split-off’ ones (e.g. Pay)
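For comparison only, here is how the five Relations might look once handed to relational software, sketched with Python's sqlite3 module. This is not the schema format asked for above. The hyphenated names are written as StaffID and EnrolNo, the types chosen for RateOfPay, Duration and DOB are assumptions (their Domains are not all given), and the ‘21 years’ rule is left out because it depends on today's date.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Pay (
    ScalePoint INTEGER PRIMARY KEY CHECK (ScalePoint BETWEEN 1 AND 12),
    RateOfPay  REAL                       -- Domain not given (assumption)
);
CREATE TABLE Staff (
    StaffID    INTEGER PRIMARY KEY CHECK (StaffID BETWEEN 1001 AND 1199),
    Name       TEXT CHECK (length(Name) <= 15),
    ScalePoint INTEGER REFERENCES Pay(ScalePoint),
    DOB        TEXT                       -- dd/mm/yyyy held as text (assumption)
);
CREATE TABLE Course (
    CourseCode INTEGER PRIMARY KEY CHECK (CourseCode BETWEEN 101 AND 110),
    Name       TEXT CHECK (length(Name) <= 15),
    Duration   INTEGER                    -- Domain not given (assumption)
);
CREATE TABLE Team (
    StaffID    INTEGER NOT NULL REFERENCES Staff(StaffID),
    CourseCode INTEGER NOT NULL REFERENCES Course(CourseCode),
    PRIMARY KEY (StaffID, CourseCode)
);
CREATE TABLE Student (
    EnrolNo      INTEGER PRIMARY KEY CHECK (EnrolNo BETWEEN 1 AND 9999),
    Name         TEXT CHECK (length(Name) <= 15),
    Street       TEXT CHECK (length(Street) <= 20),
    Town         TEXT CHECK (length(Town) <= 20),
    PostCode     TEXT CHECK (length(PostCode) IN (7, 8)),
    OLevelPoints INTEGER CHECK (OLevelPoints BETWEEN 0 AND 100),
    Tutor        INTEGER REFERENCES Staff(StaffID)
);
"""


def rejected(conn, sql, params):
    """True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript(SCHEMA)

# Hypothetical rows, used only to exercise the constraints.
conn.execute("INSERT INTO Pay VALUES (?, ?)", (5, 20.0))
conn.execute("INSERT INTO Staff VALUES (?, ?, ?, ?)", (1001, "Smith", 5, "13/02/1965"))

print(rejected(conn, "INSERT INTO Pay VALUES (?, ?)", (13, 30.0)))
print(rejected(conn, "INSERT INTO Staff VALUES (?, ?, ?, ?)", (1002, "Day", 9, "14/01/1957")))
```

The first insert is refused because 13 lies outside the ScalePoints Domain; the second because no Pay tuple has ScalePoint 9.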


This Week’s Workshop

The purpose of this week’s Workshop is to practice developing ‘robust’ logical data models that conform to the ‘rules’ of Codd’s Relational Model.
• Exploring the ‘definition’ of Relations.
• Identifying potential anomalies in a Table of data, and ‘solving’ these ‘problems’.
• Documenting a Database Schema (i.e. a Logical Model), in the format required by Part 2 of the Assignment.