Fundamentals/ICY: Databases 2010/11 WEEK 11 John Barnden Professor of Artificial Intelligence School...

Post on 26-Dec-2015



Fundamentals/ICY: Databases 2010/11

WEEK 11

John Barnden, Professor of Artificial Intelligence

School of Computer Science, University of Birmingham, UK

Today

Maths

Structure of Exam

Lecture by Funmi on his industrial experience

Remember: ((Items in double round brackets are optional material))

Reminder of Week 10 on Normalization and Relational Operators

Summary: Normalization and Database Design

Normalization helps eliminate data redundancies and some other aspects of poor structure.

Normalization focusses on problems in individual entity types.

Difficult to separate normalization from overall ER modelling process.

Normalization cannot, by itself, guarantee good designs.

1NF, 2NF, and 3NF are the most commonly encountered, and 3NF is often enough, but BCNF, 4NF etc. may also need to be considered.

Non-normalized tables may be desirable in some cases, to increase processing speed and/or reduce conceptual complexity of operations.

Natural Join (continued)

SQL:

SELECT … all the attributes, but including only one version of each shared one … FROM T1, T2 WHERE … explicit condition of equalities for ALL the shared attributes …

SELECT * FROM T1 NATURAL JOIN T2;

Relational algebra notation: the result table is T1 ⋈ T2, where T1 and T2 are the given tables.

⋈ is the “bow tie” symbol.
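As a minimal sketch of the two SQL forms above, the following uses Python's standard sqlite3 module. The tables T1(a, b) and T2(b, c) and their contents are made up for illustration; b is the only shared attribute.

```python
import sqlite3

# Hypothetical tables T1(a, b) and T2(b, c); b is the only shared attribute.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (a INTEGER, b TEXT);
    CREATE TABLE T2 (b TEXT, c INTEGER);
    INSERT INTO T1 VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO T2 VALUES ('x', 10), ('y', 20), ('w', 30);
""")

# Explicit form: list each attribute once and equate every shared attribute.
explicit = conn.execute(
    "SELECT T1.a, T1.b, T2.c FROM T1, T2 WHERE T1.b = T2.b ORDER BY T1.a"
).fetchall()

# Shorthand form: NATURAL JOIN works out the shared attribute names itself.
natural = conn.execute(
    "SELECT a, b, c FROM T1 NATURAL JOIN T2 ORDER BY a"
).fetchall()

print(explicit)
print(natural)
```

Both queries should return the same rows: only the T1 rows whose b value also occurs in T2 survive.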

Outer Join of CUSTOMER and AGENT, using equal AGENT_CODE

Left outer: uses all the rows in the CUSTOMER table, by doing an equijoin on AGENT_CODE but also including non-matching CUSTOMER rows.

Right outer: uses all the rows in the AGENT table, by doing an equijoin on AGENT_CODE but also including non-matching AGENT rows.

Full outer: uses all the rows in the AGENT and CUSTOMER tables, by doing an equijoin on AGENT_CODE but also including non-matching rows from each table.

The full outer join is the union of the left outer join result and the right outer join result.

Outer Joins (continued)

SQL:

SELECT * FROM T1, T2 WHERE … explicit join condition … UNION … a SELECT expression that gets the extra LEFT rows … UNION … a SELECT expression that gets the extra RIGHT rows …

SELECT * FROM T1 LEFT/RIGHT/FULL JOIN T2 USING (… some shared attribs …) (or, instead of USING: ON … explicit join cond …)

Relational algebra notation: variants of the bow tie symbol ⋈. See R, C & Crockett sec. 4.2.3 (though their symbols need a subscript stating the join condition unless the join is natural).
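The outer joins of CUSTOMER and AGENT can be tried out with Python's standard sqlite3 module. This is only a sketch: the column names other than AGENT_CODE and all the row values are made up. The right outer join is written as a left join with the tables swapped, and the full outer join as a UNION of the two, so the code does not depend on a particular SQLite version supporting RIGHT or FULL JOIN.

```python
import sqlite3

# Hypothetical data: only the table names and AGENT_CODE come from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AGENT (AGENT_CODE INTEGER PRIMARY KEY, AGENT_NAME TEXT);
    CREATE TABLE CUSTOMER (CUS_NAME TEXT, AGENT_CODE INTEGER);
    INSERT INTO AGENT VALUES (501, 'Alby'), (502, 'Hahn'), (503, 'Okon');
    INSERT INTO CUSTOMER VALUES ('Ramas', 501), ('Dunne', 502), ('Smith', NULL);
""")

# Left outer join: every CUSTOMER row, with NULL agent columns where unmatched.
LEFT_SQL = """
    SELECT C.CUS_NAME, A.AGENT_NAME
    FROM CUSTOMER C LEFT JOIN AGENT A ON C.AGENT_CODE = A.AGENT_CODE
"""

# Right outer join, expressed as a left join with the two tables swapped.
RIGHT_SQL = """
    SELECT C.CUS_NAME, A.AGENT_NAME
    FROM AGENT A LEFT JOIN CUSTOMER C ON C.AGENT_CODE = A.AGENT_CODE
"""

left_outer = conn.execute(LEFT_SQL).fetchall()
right_outer = conn.execute(RIGHT_SQL).fetchall()

# Full outer join as the union of the two results (UNION drops duplicates).
full_outer = conn.execute(LEFT_SQL + " UNION " + RIGHT_SQL).fetchall()

print(left_outer)
print(right_outer)
print(full_outer)
```

Note that UNION removes repeated rows, so this matches a true full outer join only when the joined result has no genuinely repeated rows.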

DIVIDE operation: optional

Reminder of Week 9 on Mathematical Background

Relation from a Table

People

PERS-ID    NAME      AGE
9568876A   Chopples  37
2544799Z   Blurp
1698674F   Rumpel    88

The relation at the moment is

{ <‘9568876A’, ‘Chopples’, 37>, <‘2544799Z’, ‘Blurp’, NULL>, <‘1698674F’, ‘Rumpel’, 88> }

A Table as a Relation? People loosely talk about tables being relations.

This is mathematically inaccurate for several reasons:

1) The table, properly speaking, includes not just the rows but also the attribute names themselves, their domains, the specification of primary and foreign keys, etc.

2) It’s only the rows at any given moment that form a relation. When a value in the table changes or a row is added or deleted, the mathematical relation is replaced by a different one.

3) Relations do not cater for tables with repeated rows.

• ((But there is a more advanced notion of relation, based on “bags” rather than sets, that does cater for repeated rows.))

But OK if you know what you (and those people) mean.
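Points 2) and 3) can be seen in a few lines of Python, treating a relation as a set of tuples. This is an illustrative sketch only: None stands in for NULL, and the repeated row and the changed age are made up.

```python
from collections import Counter

# Rows of the People table as (PERS-ID, NAME, AGE) tuples; None stands in for NULL.
rows = [
    ("9568876A", "Chopples", 37),
    ("2544799Z", "Blurp", None),
    ("1698674F", "Rumpel", 88),
    ("1698674F", "Rumpel", 88),  # hypothetical repeated row, added for illustration
]

# Point 3: a set cannot hold the repeated row twice, so the relation has 3 tuples.
relation = set(rows)

# A bag (multiset) does remember how many times each row occurs.
bag = Counter(rows)

# Point 2: changing one value yields a different mathematical relation.
old = ("9568876A", "Chopples", 37)
new = ("9568876A", "Chopples", 38)
relation_later = (relation - {old}) | {new}

print(len(relation), bag[("1698674F", "Rumpel", 88)], relation_later == relation)
```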

New for Week 10 on Mathematical Background

Some “Relational Operations”: Set Operations Applied to Relations

Union of relations R and S:

R ∪ S = the set of tuples that are in R or S (or both).

NB: no repetitions created!

Intersection of relations R and S:

R ∩ S = the set of tuples that are in both R and S.

Difference of relations R and S:

R − S = the set of tuples that are in R but not S.

Relational Operations: contrast to SQL

Those operations do NOT themselves require R and S to have similar tuples in order to be well-defined.

E.g., R could be binary and on integer sets, S could be ternary and on character-string sets.

But the corresponding DB table operations (which are usually called “relational operators”) do require the tables to have the same shape (same number of columns, same domains for corresponding columns).
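A small sketch with Python sets of tuples shows both points: the three set operations, and the fact that the mathematical operations do not care whether the tuples have the same shape. All values are made up.

```python
# Two binary relations with one tuple in common.
R = {(1, "a"), (2, "b"), (3, "c")}
S = {(2, "b"), (4, "d")}

union = R | S          # tuples in R or S (or both); (2, "b") appears only once
intersection = R & S   # tuples in both R and S
difference = R - S     # tuples in R but not in S

# A ternary relation on character strings: the set operations are still
# well-defined, even though a DB UNION of such tables would be rejected.
T = {("p", "q", "r")}
mixed = R | T

print(union, intersection, difference, mixed, sep="\n")
```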

((“Relations don’t remember where they came from”))

Consider a relation R on A, B, C, D, E, …, i.e., R ⊆ A × B × C × D × E × ….

Suppose A ⊆ AA, B ⊆ BB, C ⊆ CC, etc.

Then: a tuple formed from sets A, B, … is also automatically a tuple formed from AA, BB, …

That is, R ⊆ AA × BB × CC × DD × EE × ….

So R is also a relation on AA, BB, CC, DD, EE, ….

So a relation has no very close connection to the original sets it might have been defined from, unlike the case of tables, where the attribute domains are part of the nature of the table.
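The argument can be checked on a tiny example. In this sketch the sets and the helper name is_relation_on are made up; the helper simply tests whether a set of tuples is a subset of a Cartesian product.

```python
from itertools import product

def is_relation_on(rel, *sets):
    """True if every tuple of rel is drawn from the Cartesian product of sets."""
    return rel <= set(product(*sets))

A, B = {1, 2}, {"x", "y"}
AA, BB = {1, 2, 3}, {"x", "y", "z"}   # A is a subset of AA, B of BB

R = {(1, "x"), (2, "y")}

print(is_relation_on(R, A, B))     # R is a relation on A, B
print(is_relation_on(R, AA, BB))   # and, automatically, on AA, BB
print(is_relation_on(R, {1}, B))   # but not on a set that is too small
```

The same Python set R passes both of the first two checks; nothing in R records which pair of sets it was "originally" defined on.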

“Arity” of Relations

A relation on two sets is binary, on three sets is ternary, …, even when not all the sets are different.

So a relation on A and A is still binary and NOT “unary.” The members of the relation are two-element tuples.

A relation on, say, A, B and A is ternary and not binary. The members of the relation are three-element tuples.

“Arity” of Relations, contd.

A “unary relation” on A is a set of singleton tuples formed from A elements.

Unusual (though not inconceivable) to want a single-attribute table in a finalized ER model.

But one-attribute tables often arise dynamically from table operations, as you know.

Relations from Somewhere to Somewhere

A relation R “from” set A “to” set B is the same thing as a relation “on” A “and” B — just different terminology.

Similarly, a relation from A, B, C to D, E is the same thing as a relation on A, B, C, D, E.

Changing the Sets in a Relation Around

A relation R on A, B, C, D, E, say, obviously “induces” (i.e., gives rise to, in a natural way) a relation on any reordering of the sets, such as D, A, B, E, C, just by reordering each tuple in the same way.

Thus, R induces a relation from, say, D, A to B, E, C.

((When there are just two sets A and B, the (only possible) reordering of the sets gives the inverse of R.))

Removing Some of the Sets in a Relation (Projection)

And we can remove some of the sets and the corresponding items from each tuple.

Given the relation on D,A,B,E,C, we can get a relation on, say, D,B,C, just by removing the second and fourth item from each tuple.

This is the mathematical operation underlying the PROJECT relational operator on tables (what I would prefer to call Select-Columns or Select-Attributes).
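Reordering and projection are the same mechanical step: build each new tuple from chosen positions of the old one. The helper name pick and the sample values below are made up for this sketch.

```python
def pick(relation, positions):
    """Build a new relation by taking the given positions from each tuple."""
    return {tuple(t[i] for i in positions) for t in relation}

# A relation on A, B, C, D, E (positions 0..4).
R = {
    ("a1", "b1", "c1", "d1", "e1"),
    ("a2", "b1", "c2", "d1", "e2"),
}

# Induced relation on D, A, B, E, C: the same tuples, reordered.
reordered = pick(R, (3, 0, 1, 4, 2))

# Projection onto D, B, C: drop the second and fourth items of each reordered tuple.
projected = pick(reordered, (0, 2, 4))

# Projection can merge tuples: both tuples of R agree on B and D.
merged = pick(R, (1, 3))

print(reordered, projected, merged, sep="\n")
```

Because the result is a set, tuples that become identical after projection collapse into one, which is why the last projection has a single tuple.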

Functional Relations (Partial Functions)

A relation R from A to B is functional if, for any a in A, there is AT MOST one (but perhaps no) b in B such that <a, b> is in R.

So several things in A can be related to the same thing in B.

But you can’t have several things in B related to the same thing in A.

A functional relation from A to B is also called a partial function from A to B.

Functional Relations, contd.

Can generalize:

a relation R from A1, A2, A3, … to B1, B2, B3, … is functional if, for each combination of things a1, a2, a3, … in A1, A2, A3, … respectively, there is at most one combination b1, b2, b3, … in B1, B2, B3, … respectively such that <a1, a2, a3, …, b1, b2, b3, …> is in R.
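The definition, including the generalized form, can be turned into a short check. In this sketch the function name is_functional and the parameter n_inputs (how many leading positions of each tuple count as the "from" side) are assumptions made for illustration.

```python
def is_functional(relation, n_inputs=1):
    """True if each combination of the first n_inputs items is related
    to at most one combination of the remaining items."""
    seen = {}
    for t in relation:
        key, value = t[:n_inputs], t[n_inputs:]
        # setdefault stores the first value seen for key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

many_to_same = {(1, "x"), (2, "x")}   # several a's related to the same b: fine
one_to_several = {(1, "x"), (1, "y")} # one a related to several b's: not functional

# A ternary relation, read as "from A1, A2 to B1" or as "from A1 to A2, B1".
R3 = {(1, 2, "p"), (1, 3, "q")}

print(is_functional(many_to_same))
print(is_functional(one_to_several))
print(is_functional(R3, n_inputs=2), is_functional(R3, n_inputs=1))
```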

Functional Relations arising from Functional Dependencies

Suppose attribute X is functionally dependent on (= determined by) attributes A, B, … in a table.

Then, at any moment, the induced relation from A, B, … to X is a partial function from the A, B, … value domains to the X value domain.

Special case:

Consider any superkey (e.g., the primary key) of a table.

Then the relation in the table at a given moment is a partial function from the superkey’s domains to the remaining attribute domains.

Caution: The word “partial” in the phrase “partial function” has nothing to do with the word “partial” in “partial dependency” as discussed under Normalization.

Any dependency relationship in a table gives us a partial function, irrespective of whether the dependency is also “partial” in the special sense of involving only a part of the PK.
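For the rows of a table at one given moment, the "induced relation is a partial function" claim can be tested directly. The sketch below uses the People rows from earlier plus one made-up row, with None standing in for NULL; the function name induces_partial_function is hypothetical. It only inspects the current rows, whereas a functional dependency is a constraint on every moment, so passing the check does not establish a dependency.

```python
def induces_partial_function(rows, determinants, dependents):
    """True if, in the given rows, each combination of determinant values
    occurs with at most one combination of dependent values."""
    mapping = {}
    for row in rows:
        key = tuple(row[a] for a in determinants)
        value = tuple(row[a] for a in dependents)
        if mapping.setdefault(key, value) != value:
            return False
    return True

rows = [
    {"PERS_ID": "9568876A", "NAME": "Chopples", "AGE": 37},
    {"PERS_ID": "2544799Z", "NAME": "Blurp", "AGE": None},
    {"PERS_ID": "1698674F", "NAME": "Rumpel", "AGE": 88},
    {"PERS_ID": "3000000X", "NAME": "Madeup", "AGE": 37},  # hypothetical extra row
]

# The primary key (a superkey) determines all the remaining attributes.
print(induces_partial_function(rows, ["PERS_ID"], ["NAME", "AGE"]))

# AGE does not determine NAME: two different names share the age 37.
print(induces_partial_function(rows, ["AGE"], ["NAME"]))
```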

Remaining material on relations is optional

((Restriction of a Relation))

Consider a relation R from A to B,

and a subset AA of A.

Then the restriction of R to AA is the relation derived from R by restricting attention to AA,

i.e., including only tuples whose first element is in AA.

The new relation is notated R|AA.

((Restriction More Generally))

Consider a relation R from A, B, …C to D, E, …, F

and subsets AA of A, BB of B, …, CC of C.

Then the restriction of R to AA, BB, …, CC is the relation derived from R by restricting attention to AA, BB, …, CC

i.e., including only tuples whose first few elements are in AA, BB, …, CC respectively.

The new relation is notated R|AA, BB, …, CC.
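Restriction amounts to filtering the tuples. A minimal sketch, with a made-up function name restrict and made-up values; the i-th subset passed in constrains the i-th item of each tuple.

```python
def restrict(relation, *subsets):
    """Keep only the tuples whose first few items lie in the given subsets."""
    return {
        t for t in relation
        if all(t[i] in s for i, s in enumerate(subsets))
    }

# Binary case: restriction of R to AA = {1, 2}.
R = {(1, "x"), (2, "y"), (3, "z")}
R_AA = restrict(R, {1, 2})

# General case: restriction of a ternary relation to AA = {1}, BB = {"a"}.
R3 = {(1, "a", "p"), (1, "b", "q"), (2, "a", "r")}
R3_AA_BB = restrict(R3, {1}, {"a"})

print(R_AA, R3_AA_BB, sep="\n")
```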

((Totality of Relations))

A relation R from A to B is total (on A) if it relates everything in A to at least one thing in B.

I.e., for every member a of A, there is at least one b in B such that <a, b> is in R.

A relation may be merely partial (on A above) in not being total. However, technically all relations are “partial”, with total being a special case.

((Totality, contd.))

Can generalize:

A relation R from A, B, C, … to D, E, … is total (on A, B, C, …) if, for every member a of A, b of B, c of C, etc., there is at least one d in D, e in E, etc. such that <a, b, c, …, d, e, …> is in R.
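Totality can be checked by comparing the combinations that actually occur on the "from" side with all possible combinations. The function name is_total and the data are made up for this sketch.

```python
from itertools import product

def is_total(relation, *input_sets):
    """True if every combination from input_sets starts at least one tuple."""
    n = len(input_sets)
    covered = {t[:n] for t in relation}
    return set(product(*input_sets)) <= covered

R = {(1, "x"), (2, "y")}
print(is_total(R, {1, 2}))      # every member of {1, 2} is related to something
print(is_total(R, {1, 2, 3}))   # 3 is related to nothing: merely partial

# Generalized form: a relation from A, B to C.
R3 = {(1, "a", "p"), (1, "b", "q")}
print(is_total(R3, {1}, {"a", "b"}))
print(is_total(R3, {1, 2}, {"a", "b"}))
```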

((Partiality of Table Relations))

The relation in a table (at a given moment), considered as a relation from any of its attribute value domains to the remaining value domains, will almost always be merely partial.

This is simply because it’s highly unlikely that all possible combinations of values from the former collection of value domains will appear in the table!!

((Functions))

A total functional relation from A to B is called a function from A to B.

Each thing in A is related to exactly one thing in B. (But two different things in A can be related to the same thing in B, and not everything in B needs to be related to anything in A. So the inverse relation is not necessarily either functional or total.)

Caution: every function is also a partial function.

((From Partiality to Totality by Restriction))

We can always turn a merely-partial R from A to B into a total one by slimming A down enough! Just remove the members of A that aren’t related to anything by R, to get a new set AA. We don’t remove any tuples from R.

R (as a relation from AA to B) is total on AA.

And note that R|AA = R.

AA is called the domain of R, notated dom(R). Not to be confused with “value domains” of DB entity attributes.

Can generalize the above to non-binary relations.
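A short sketch of the binary case, with made-up sets: compute dom(R), confirm that restricting R to it removes no tuples, and confirm that R is total on it.

```python
def dom(relation):
    """The members of the 'from' set that are related to at least one thing."""
    return {t[0] for t in relation}

A = {1, 2, 3, 4}
R = {(1, "x"), (2, "y"), (2, "z")}   # 3 and 4 are related to nothing

AA = dom(R)

# Restricting R to AA keeps every tuple, so R|AA = R.
R_restricted = {t for t in R if t[0] in AA}

# R is total on AA but only partial on A.
total_on_AA = all(any(t[0] == a for t in R) for a in AA)
total_on_A = all(any(t[0] == a for t in R) for a in A)

print(AA, R_restricted == R, total_on_AA, total_on_A)
```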

((Totality contd. and “Onto”))

A relation R from A to B is onto if for everything in B there is at least one thing in A that is related by R to it. I.e.:

For every member b of B, there is at least one a in A such that <a, b> is in R.

Onto-ness is just totality in the other direction.

You can also say that R is total on B, or that the inverse of R is total.
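The equivalence between "R is onto" and "the inverse of R is total" can be checked on a small made-up example; the function names here are assumptions for the sketch.

```python
def inverse(pairs):
    """Swap the two items of every pair."""
    return {(b, a) for a, b in pairs}

def is_total(pairs, A):
    """Every member of A is the first item of at least one pair."""
    return set(A) <= {a for a, _ in pairs}

def is_onto(pairs, B):
    """Every member of B is the second item of at least one pair."""
    return set(B) <= {b for _, b in pairs}

A = {1, 2, 3}
B = {"x", "y"}
R = {(1, "x"), (2, "y")}

print(is_onto(R, B))             # everything in B is reached
print(is_total(inverse(R), B))   # same fact, stated about the inverse
print(is_total(R, A))            # but R is not total on A (3 is unrelated)
```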

((Other Categories of Relation))

A relation R from A to B is one-to-one (1-1) if, for any a in A, there is at most one b in B such that <a, b> is in R, AND for any b in B, there is at most one a in A such that <a, b> is in R.

That is, both the relation and its inverse from B to A are functional. (But they don’t need to be total.)

To put it another way: it is functional and different members of A map to (= are related to) different members of B.

Or again: Different members of A map to different members of B and different members of B map to different members of A.

((Other Categories of Relation, contd.))

A relation R from A to B is many-to-one if it is functional but not one-to-one: i.e., there are different members of A that map to the same member of B, in at least one case.

A relation R from A to B is one-to-many if it is not functional but its inverse from B to A is functional. That is, there’s a member of A that maps to more than one member of B; but each member of B maps to at most one member of A.

A relation R from A to B is many-to-many if neither it nor its inverse is functional: i.e., there’s a case of a member of A mapping to more than one member of B, and a case of a member of B mapping to more than one member of A.
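The four categories depend only on whether the relation and its inverse are functional, so they can be told apart mechanically. The function names and sample relations in this sketch are made up.

```python
def is_functional(pairs):
    """In a set of distinct pairs, a repeated first item means two different
    second items, so the relation is functional iff no first item repeats."""
    firsts = [a for a, _ in pairs]
    return len(firsts) == len(set(firsts))

def inverse(pairs):
    return {(b, a) for a, b in pairs}

def classify(pairs):
    forward = is_functional(pairs)
    backward = is_functional(inverse(pairs))
    if forward and backward:
        return "one-to-one"
    if forward:
        return "many-to-one"
    if backward:
        return "one-to-many"
    return "many-to-many"

print(classify({(1, "x"), (2, "y")}))
print(classify({(1, "x"), (2, "x")}))
print(classify({(1, "x"), (1, "y")}))
print(classify({(1, "x"), (1, "y"), (2, "x")}))
```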

((Relations from Entity Relationships))

Surprisingly … the concentration on mathematical relations in introductory accounts of “relational” DBs is on a relation as arising from each single table (entity type), despite … the importance of “relationships” between entity types in Entity-Relationship modelling!

However, between-entity-type relationships also correspond to mathematical relations, distinct from the ones within individual tables.

((Intuitively ...))

Recall that for each entity type there is the set of possible entities of that type (the entity set).

A “relationship” between two (or more) entity types/sets is a description of the fact that at any given moment the database stores a particular mathematical relation on the entity sets.

E.g., the EMPLOYED-BY relationship from the People entity type to the Organizations entity type says that the database (at any moment) stores a relation on the People entity set and Organizations entity set.

((Example Continued))

So at any given moment the relation might be

{ <Person1, Org1>, <Person2, Org1>, <Person3, Org1>, <Person4, Org2>, <Person3, Org2> }

Each Person… and Org… is an entity represented as a row of the corresponding table, and is therefore itself mathematically represented as a tuple of attribute values.

So <Person1, Org1> could be, in more detail,

< <E156, ‘Sam’, ‘Finks’, I678>, <I678, ‘IBM’, ‘USA’> >

Note the nested tuples.

((Bridging Entity Types))

Recall that bridging entity types are brought in to represent M:N relationships (and similarly M:N:P relationships, etc.)

People/Organizations again: the relation within the bridging table would look like

{ <E156, I678>, <E257, I996>, <E714, I678>, … }.

This relation can also be said to correspond to the original People-Organization relationship, but is abstracted from the above relation by replacing tuples representing entities, such as <E156, ‘Sam’, ‘Finks’, I678>, by the PK values in them, such as E156.
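The abstraction step from nested entity tuples to pairs of PK values can be sketched in Python. The first person and the organization are taken from the example; the second person's name is made up (only the ID E714 appears in the slides), and the PK is assumed to be the first item of each entity tuple.

```python
# Entities as tuples of attribute values; the PK is assumed to be item 0.
sam = ("E156", "Sam", "Finks", "I678")
ibm = ("I678", "IBM", "USA")

# Hypothetical second person; only the ID E714 appears in the slides.
other = ("E714", "Ada", "Madeup", "I678")

# The relationship as a relation on entities: note the nested tuples.
employed_by = {(sam, ibm), (other, ibm)}

# The bridging-table relation: replace each entity tuple by its PK value.
bridging = {(person[0], org[0]) for person, org in employed_by}

print(bridging)
```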

((Bridging Entity Types (contd.) ))

But now … what about the relationship between the People and Organization entity types and the bridging entity type!! (Exercise)

And note: We could have chosen to use the bridging-entity-style relation to begin with as our mathematical formulation of the People/Organization relationship.

A mathematical formulation is not objectively given by the world … it is chosen by us, on the basis of convenience for whatever purposes we have.

((Connectivities))

If a relationship from an entity type to another is “1:1” then at any moment the actual relation is one-to-one (1-1).

If the relationship is “1:M” then the relation at any moment may be one-to-many (but may by chance be one-to-one).

If the relationship is “M:N” then the relation at any moment may be many-to-many (but may by chance be one-to-many, many-to-one or one-to-one).

((Optionality/Mandatoriness))

If a relationship from an entity type E to another type F is mandatory then the relation at any moment, after restriction to the set of entities currently in E, is total.

If a relationship from an entity type E to another type F is optional then the relation at any moment is not required to be total in the above restricted sense (but may happen to be).

((Another Caution))

A one-to-one correspondence between sets A and B is a SPECIAL one-to-one relation from A to B (or B to A): it is not only one-to-one but also TOTAL (on A) and ONTO (B). (Or we can say: total on both A and B.)

But any 1-1 relation from A to B is a 1-1 correspondence between the subsets of A, B consisting of those members that do happen to feature in the relation!

A 1-1 relation induced by a single table will almost certainly NOT be a one-to-one correspondence between whole attribute domains!
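The difference between a 1-1 relation and a 1-1 correspondence can be seen on a small made-up example; the function names are assumptions for this sketch.

```python
def is_one_to_one(pairs):
    """No first item repeats and no second item repeats."""
    return (len({a for a, _ in pairs}) == len(pairs)
            and len({b for _, b in pairs}) == len(pairs))

def is_correspondence(pairs, A, B):
    """One-to-one, total on A, and onto B."""
    return (is_one_to_one(pairs)
            and {a for a, _ in pairs} == set(A)
            and {b for _, b in pairs} == set(B))

A = {1, 2, 3}
B = {"x", "y", "z"}
R = {(1, "x"), (2, "y")}

# The members of A and B that actually feature in R.
featured_A = {a for a, _ in R}
featured_B = {b for _, b in R}

print(is_one_to_one(R))                             # R is a 1-1 relation
print(is_correspondence(R, A, B))                   # but not a correspondence on all of A, B
print(is_correspondence(R, featured_A, featured_B)) # it is one on the featured subsets
```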

EXAM (May/June; fine detail for Resit may differ)

See notes about past exams on the module website.

Structure of Exam

One and a half hours.

Four questions.

DO THREE.

Material on mathematical relations and relational algebra is only used in one of the four Questions. (That Question may also involve other material.)

The Remaining Three Questions

They will range from precise technical things to more general considerations.

SQL query expressions are required in several parts of the three questions. They amount to about 28% of the marks for those three questions.

Extra credit of 8% is available in one of the questions for providing SQL create expressions.

Extra credit of 8% is available in one of the questions for providing creative ERD notation suggestions.

Material Needed for Exam

Anything in the required textbook reading may be useful in the exam, except of course that a detailed memory of specific, data-full examples is not expected, and except for some SQL detail (see next slide).

You need to study all Additional Notes, except that:

The exam will not rely on the treatment of functional dependencies and normalization there (in the 1st of the three parts in the Week 9 batch).

The exam will not rely on material on physical design (in the 3rd of the three parts in the Week 9 batch).

You need to study all Exercise Answer Notes.

The content of the demonstrators’ lectures on experiences in industry will not be relied upon (but of course may help you in overall understanding of some issues).

Textbook Parts (R&C 2009)

See my module website (top page).

On SQL: the exam doesn’t rely on fine detail beyond what’s in the handouts (and occasional lectures).

Talk by demonstrator on experiences in industry

Funmi Faniyi

(about 30 mins incl. questions)