Fundamentals/ICY: Databases 2010/11 WEEK 11 John Barnden Professor of Artificial Intelligence School of Computer Science University of Birmingham, UK




Today

Maths

Structure of Exam

Lecture by Funmi on his industrial experience


Remember: ((Items in double round brackets are optional material))


Reminder of Week 10 on Normalization and Relational Operators


Summary: Normalization and Database Design

Normalization helps eliminate data redundancies and some other aspects of poor structure.

Normalization focusses on problems in individual entity types.

Difficult to separate normalization from overall ER modelling process.

Normalization cannot, by itself, guarantee good designs.

1NF, 2NF, and 3NF are the most commonly encountered, and 3NF is often enough, but BCNF, 4NF etc. may also need to be considered.

Non-normalized tables may be desirable in some cases, to increase processing speed and/or reduce conceptual complexity of operations.


Natural Join (continued)

SQL:

SELECT … all the attributes, but including only one copy of each shared attribute … FROM T1, T2 WHERE … explicit equality conditions for ALL the shared attributes …

SELECT * FROM T1 NATURAL JOIN T2;

Relational algebra notation: the result table is $T1 \bowtie T2$, where T1 and T2 are the given tables.

$\bowtie$ is the “bow tie” symbol.
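A minimal sketch using Python's built-in sqlite3 module, with hypothetical tables T1(A, B) and T2(B, C) invented for illustration. It checks that the NATURAL JOIN form and the long-hand form give the same rows, with the shared attribute B appearing only once:

```python
import sqlite3

# Hypothetical tables T1(A, B) and T2(B, C); B is the only shared attribute.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (A INTEGER, B TEXT);
    CREATE TABLE T2 (B TEXT, C INTEGER);
    INSERT INTO T1 VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO T2 VALUES ('x', 10), ('y', 20), ('w', 30);
""")

# Short form: equality on every shared attribute, shared column kept once.
natural = conn.execute(
    "SELECT * FROM T1 NATURAL JOIN T2 ORDER BY A").fetchall()

# Long-hand form: list each attribute once and state the equality explicitly.
explicit = conn.execute(
    "SELECT T1.A, T1.B, T2.C FROM T1, T2 WHERE T1.B = T2.B ORDER BY T1.A"
).fetchall()

# natural == [(1, 'x', 10), (2, 'y', 20)]; rows 'z' and 'w' have no partner.
assert natural == explicit
```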


Outer Join of CUSTOMER and AGENT, using equal AGENT_CODE

Left outer: uses all the rows in the CUSTOMER table, doing an equijoin on AGENT_CODE but also including non-matching CUSTOMER rows.

Right outer: uses all the rows in the AGENT table, doing an equijoin on AGENT_CODE but also including non-matching AGENT rows.

Full outer: uses all the rows in the AGENT and CUSTOMER tables, doing an equijoin on AGENT_CODE but also including non-matching rows from each table. This is the union of the Left Outer Join result and the Right Outer Join result.


Outer Joins (continued)

SQL: SELECT * FROM T1, T2 WHERE … explicit join condition … UNION … a SELECT expression that gets the extra LEFT rows … UNION … a SELECT expression that gets the extra RIGHT rows … (for a left or right outer join, keep only the corresponding extra-rows branch)

SELECT * FROM T1 LEFT/RIGHT/FULL JOIN T2 USING (… some shared attributes …), or, instead of USING, ON … explicit join condition …

Relational algebra notation: variants of the bow tie symbol $\bowtie$. See R, C & Crockett sec. 4.2.3 (though their symbols need a subscript stating the join condition unless the join is natural).
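A sketch of the outer joins in SQLite via Python's sqlite3 module. The CUSTOMER and AGENT columns other than AGENT_CODE, and all the data, are hypothetical. The full outer join is built as the union of the left and right outer join results; the right outer join is written as a LEFT JOIN with the tables swapped, because SQLite only gained RIGHT and FULL JOIN in version 3.39.

```python
import sqlite3

# Hypothetical data: only the AGENT_CODE column comes from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AGENT (AGENT_CODE INTEGER, AGENT_NAME TEXT);
    CREATE TABLE CUSTOMER (CUS_CODE INTEGER, AGENT_CODE INTEGER);
    INSERT INTO AGENT VALUES (501, 'Ann'), (502, 'Bo'), (503, 'Cy');
    INSERT INTO CUSTOMER VALUES (10010, 501), (10011, 501), (10012, 999);
""")

# Left outer join: every CUSTOMER row; AGENT columns are NULL (None) when unmatched.
left = conn.execute("""
    SELECT CUS_CODE, AGENT_CODE, AGENT_NAME
    FROM CUSTOMER LEFT JOIN AGENT USING (AGENT_CODE)
    ORDER BY CUS_CODE
""").fetchall()

# Full outer join = (left outer join) UNION (right outer join, tables swapped).
full = conn.execute("""
    SELECT C.CUS_CODE, C.AGENT_CODE, A.AGENT_CODE, A.AGENT_NAME
    FROM CUSTOMER C LEFT JOIN AGENT A ON C.AGENT_CODE = A.AGENT_CODE
    UNION
    SELECT C.CUS_CODE, C.AGENT_CODE, A.AGENT_CODE, A.AGENT_NAME
    FROM AGENT A LEFT JOIN CUSTOMER C ON C.AGENT_CODE = A.AGENT_CODE
""").fetchall()
```

Customer 10012 (whose agent 999 does not exist) survives the left join with NULL agent details; agents 502 and 503 appear only in the full join, with NULL customer details.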


DIVIDE operation: optional


Reminder of Week 9 on Mathematical Background


Relation from a Table

People

PERS-ID    NAME       AGE
9568876A   Chopples   37
2544799Z   Blurp
1698674F   Rumpel     88

The relation at the moment is

{<‘9568876A’, ‘Chopples’, 37>, <‘2544799Z’, ‘Blurp’, NULL>, <‘1698674F’, ‘Rumpel’, 88>}


A Table as a Relation?

People loosely talk about tables being relations.

This is mathematically inaccurate for several reasons:

1) The table, properly speaking, includes not just the rows but also the attribute names themselves, their domains, the specification of primary and foreign keys, etc.

2) It’s only the rows at any given moment that form a relation. When a value in the table changes or a row is added or deleted, the mathematical relation is replaced by a different one.

3) Relations do not cater for tables with repeated rows.

• ((But there is a more advanced notion of relation, based on “bags” rather than sets, that does cater for repeated rows.))

But OK if you know what you (and those people) mean.
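A small Python sketch of points 2) and 3), using the People rows from the earlier slide (Python's None stands in for NULL). The changed age and the repeated row are hypothetical, and a repeat like this could only occur if PERS-ID were not enforced as a primary key:

```python
from collections import Counter

people_rows = [
    ("9568876A", "Chopples", 37),
    ("2544799Z", "Blurp", None),    # None plays the role of NULL
    ("1698674F", "Rumpel", 88),
]
relation = set(people_rows)          # the relation "at the moment"

# 2) Changing a value yields a different relation, not a modified one.
after_update = (relation - {("9568876A", "Chopples", 37)}) | {("9568876A", "Chopples", 38)}
assert after_update != relation

# 3) A repeated row vanishes in a set but is counted in a bag (multiset).
rows_with_repeat = people_rows + [("1698674F", "Rumpel", 88)]
as_set = set(rows_with_repeat)
as_bag = Counter(rows_with_repeat)
assert as_set == relation
assert as_bag[("1698674F", "Rumpel", 88)] == 2
```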


New for Week 10 on Mathematical Background


Some “Relational Operations”: Set Operations Applied to Relations

Union of relations R and S:

$R \cup S$ = the set of tuples that are in R or S (or both).

NB: no repetitions created!

Intersection of relations R and S:

$R \cap S$ = the set of tuples that are in both R and S.

Difference of relations R and S:

$R - S$ = the set of tuples that are in R but not S.
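The three set operations can be tried directly on Python sets of tuples; R and S below are small hypothetical relations:

```python
# Two small hypothetical relations, each a set of tuples.
R = {(1, "a"), (2, "b"), (3, "c")}
S = {(2, "b"), (4, "d")}

union = R | S         # tuples in R or S (or both)
intersection = R & S  # tuples in both R and S
difference = R - S    # tuples in R but not in S

# (2, "b") is in both R and S, yet appears only once in the union.
assert union == {(1, "a"), (2, "b"), (3, "c"), (4, "d")}
assert intersection == {(2, "b")}
assert difference == {(1, "a"), (3, "c")}
```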


Relational Operations: contrast to SQL

Those operations do NOT themselves require R and S to have similar tuples in order to be well-defined.

E.g., R could be binary and on integer sets, S could be ternary and on character-string sets.

But the corresponding DB table operations (which are usually called “relational operators”) do require the tables to have the same shape (same number of columns, same domains for corresponding columns).
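A sketch of the contrast, assuming Python's sqlite3 module as the DBMS. The mathematical union of a binary and a ternary relation is an ordinary set, while SQL's UNION on tables of different shapes is rejected. SQLite, being dynamically typed, only checks the number of columns; stricter systems also check that corresponding column types are compatible.

```python
import sqlite3

# Mathematically, union is defined for any two sets of tuples.
R = {(1, 2), (3, 4)}        # binary, on integer sets
S = {("x", "y", "z")}       # ternary, on character-string sets
mixed = R | S               # a well-defined set of three tuples
assert len(mixed) == 3

# The SQL UNION operator rejects tables of different shapes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (P INTEGER, Q INTEGER);
    CREATE TABLE S (X TEXT, Y TEXT, Z TEXT);
""")
try:
    conn.execute("SELECT * FROM R UNION SELECT * FROM S")
    sql_union_ok = True
except sqlite3.OperationalError:
    # SQLite complains that the two SELECTs differ in number of result columns.
    sql_union_ok = False
assert not sql_union_ok
```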


((“Relations don’t remember where they came from”))

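Consider a relation R on A, B, C, D, E, …, i.e., $R \subseteq A \times B \times C \times D \times E \times \dots$

Suppose $A \subseteq AA$, $B \subseteq BB$, $C \subseteq CC$, etc.

Then: a tuple formed from sets A, B, … is also automatically a tuple formed from AA, BB, …

That is, $R \subseteq AA \times BB \times CC \times DD \times EE \times \dots$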

So R is also a relation on AA, BB, CC, DD, EE, ….

So a relation has no very close connection to the original sets it might have been defined from, unlike the case of tables, where the attribute domains are part of the nature of the table.
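A quick check of the argument with hypothetical small sets (itertools.product builds the Cartesian product):

```python
from itertools import product

# Hypothetical small sets with A a subset of AA and B a subset of BB.
A, AA = {1, 2}, {1, 2, 3}
B, BB = {"x"}, {"x", "y"}

R = {(1, "x"), (2, "x")}

# R is a relation on A, B ...
assert R <= set(product(A, B))
# ... and, without any change, also a relation on AA, BB.
assert R <= set(product(AA, BB))
```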


“Arity” of Relations

A relation on two sets is binary, on three sets is ternary, and so on, even when not all the sets are different.

So a relation on A and A is still binary and NOT “unary.” The members of the relation are two-element tuples.

A relation on, say, A, B and A is ternary and not binary. The members of the relation are three-element tuples.


“Arity” of Relations, contd.

A “unary relation” on A is a set of singleton tuples formed from A elements.

Unusual (though not inconceivable) to want a single-attribute table in a finalized ER model.

But one-attribute tables often arise dynamically from table operations, as you know.
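A sketch of arity as tuple length, with hypothetical sets and relations; the helper arity is invented for illustration:

```python
# A relation on A and A (the same set twice): 2-tuples, so binary.
A = {1, 2, 3}
less_than = {(a, b) for a in A for b in A if a < b}

# A relation on A, B and A: 3-tuples, so ternary with only two distinct sets.
B = {"x", "y"}
ternary = {(1, "x", 2), (3, "y", 3)}
assert all(t[0] in A and t[1] in B and t[2] in A for t in ternary)

# A unary relation on A: a set of singleton tuples (note the trailing comma).
unary = {(a,) for a in A}

def arity(relation):
    """Common tuple length of a non-empty relation (hypothetical helper)."""
    lengths = {len(t) for t in relation}
    assert len(lengths) == 1, "tuples of mixed length"
    return lengths.pop()

assert arity(less_than) == 2
assert arity(ternary) == 3
assert arity(unary) == 1
```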


Relations from Somewhere to Somewhere

A relation R “from” set A “to” set B is the same thing as a relation “on” A “and” B — just different terminology.

Similarly, a relation from A, B, C to D, E is the same thing as a relation on A, B, C, D, E.


Changing the Sets in a Relation Around

A relation R on A, B, C, D, E, say, obviously “induces” (i.e., gives rise to, in a natural way) a relation on any reordering of the sets, such as D, A, B, E, C, just by reordering each tuple in the same way.

Thus, R induces a relation from, say, D, A to B, E, C.

((When there are just two sets A and B, the (only possible) reordering of the sets gives the inverse of R.))
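The reordering is just a fixed permutation applied to every tuple. A sketch with hypothetical values (positions counted from 0):

```python
# A relation on A, B, C, D, E, given as 5-tuples (hypothetical values).
R = {("a1", "b1", "c1", "d1", "e1"), ("a2", "b2", "c2", "d2", "e2")}

# Reorder every tuple to D, A, B, E, C, i.e. positions 3, 0, 1, 4, 2.
order = (3, 0, 1, 4, 2)
induced = {tuple(t[i] for i in order) for t in R}
assert ("d1", "a1", "b1", "e1", "c1") in induced

# With just two sets, the only reordering swaps each pair: the inverse of R.
R2 = {(1, "x"), (2, "x"), (2, "y")}
inverse = {(b, a) for (a, b) in R2}
assert inverse == {("x", 1), ("x", 2), ("y", 2)}
```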


Removing Some of the Sets in a Relation (Projection)

And we can remove some of the sets and the corresponding items from each tuple.

Given the relation on D, A, B, E, C, we can get a relation on, say, D, B, C, just by removing the second and fourth items from each tuple.

This is the mathematical operation underlying the PROJECT relational operator on tables (what I would prefer to call Select-Columns or Select-Attributes).
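A sketch of projection on sets of tuples; project is a hypothetical helper and positions are counted from 0:

```python
# A relation on D, A, B, E, C (hypothetical values).
R = {("d1", "a1", "b1", "e1", "c1"),
     ("d1", "a2", "b1", "e2", "c1"),
     ("d2", "a3", "b2", "e3", "c2")}

def project(relation, keep):
    """Keep only the tuple positions listed in `keep`, in that order."""
    return {tuple(t[i] for i in keep) for t in relation}

# Relation on D, B, C: drop the 2nd and 4th items (positions 1 and 3).
on_DBC = project(R, (0, 2, 4))
assert on_DBC == {("d1", "b1", "c1"), ("d2", "b2", "c2")}
```

The first two tuples collapse into one because the result is a set. An SQL SELECT of those columns keeps both rows unless DISTINCT is used, another instance of tables not being exactly relations.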


Functional Relations (Partial Functions)

A relation R from A to B is functional if, for any a in A, there is AT MOST one (but perhaps no) b in B such that <a, b> is in R.

So several things in A can be related to the same thing in B.

But you can’t have several things in B related to the same thing in A.

A functional relation from A to B is also called a partial function from A to B.


Functional Relations, contd.

Can generalize:

a relation R from A1, A2, A3, … to B1, B2, B3, … is functional if,

for each combination of things a1, a2, a3, … in A1, A2, A3, … respectively,

there is at most one combination of things b1, b2, b3, … in B1, B2, B3, … respectively such that <a1, a2, a3, …, b1, b2, b3, …> is in R.
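A sketch of the functionality test; is_functional is a hypothetical helper that treats the first k positions of each tuple as the "from" part (k = 1 is the binary case from A to B):

```python
def is_functional(relation, k):
    """True if the first k positions of each tuple determine the rest."""
    seen = {}
    for t in relation:
        inputs, outputs = t[:k], t[k:]
        if seen.setdefault(inputs, outputs) != outputs:
            return False  # same inputs paired with two different outputs
    return True

# Binary: several things in A may map to the same thing in B ...
assert is_functional({(1, "x"), (2, "x")}, 1)
# ... but one thing in A may not map to several things in B.
assert not is_functional({(1, "x"), (1, "y")}, 1)
# From A1, A2 to B1: each pair (a1, a2) determines the rest.
assert is_functional({(1, 2, "p"), (1, 3, "p"), (2, 2, "q")}, 2)
```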


Functional Relations arising from Functional Dependencies

Suppose attribute X is functionally dependent on (= determined by) attributes A, B, … in a table.

Then, at any moment, the induced relation from A, B, … to X is a partial function from the A, B, … value domains to the X value domain.

Special case:

Consider any superkey (e.g., the primary key) of a table.

Then the relation in the table at a given moment is a partial function from the superkey’s domains to the remaining attribute domains.
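A sketch linking functional dependencies to partial functions, using a hypothetical table with attributes EMP_ID (primary key), DEPT and DEPT_NAME, where DEPT determines DEPT_NAME. The helpers are invented for illustration:

```python
# Hypothetical table state; DEPT -> DEPT_NAME holds, EMP_ID is the primary key.
rows = [
    ("E1", "D1", "Sales"),
    ("E2", "D1", "Sales"),
    ("E3", "D2", "Research"),
]
columns = ("EMP_ID", "DEPT", "DEPT_NAME")

def induced_relation(rows, columns, source, target):
    """Relation from the source attributes to the target attributes, as pairs."""
    s = [columns.index(c) for c in source]
    t = [columns.index(c) for c in target]
    return {(tuple(r[i] for i in s), tuple(r[i] for i in t)) for r in rows}

def is_partial_function(pairs):
    """No source combination is paired with two different targets."""
    targets = {}
    return all(targets.setdefault(src, tgt) == tgt for src, tgt in pairs)

# The dependency DEPT -> DEPT_NAME gives a partial function.
assert is_partial_function(induced_relation(rows, columns, ["DEPT"], ["DEPT_NAME"]))
# The primary key gives a partial function to all the remaining attributes.
assert is_partial_function(induced_relation(rows, columns, ["EMP_ID"], ["DEPT", "DEPT_NAME"]))
# DEPT does not determine EMP_ID, so that induced relation is not functional.
assert not is_partial_function(induced_relation(rows, columns, ["DEPT"], ["EMP_ID"]))
```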


Caution

The word “partial” in the phrase “partial function” has nothing to do with the word “partial” in “partial dependency” as discussed under Normalization.

Any dependency relationship in a table gives us a partial function, irrespective of whether the dependency is also “partial” in the special sense of involving only a part of the PK.


Remaining material on relations is optional


((Restriction of a Relation))

Consider a relation R from A to B,

and a subset AA of A.

Then the restriction of R to AA is the relation derived from R by restricting attention to AA,

i.e., including only tuples whose first element is in AA.

The new relation is notated $R|_{AA}$.


((Restriction More Generally))

Consider a relation R from A, B, …, C to D, E, …, F

and subsets AA of A, BB of B, …, CC of C.

Then the restriction of R to AA, BB, …, CC is the relation derived from R by restricting attention to AA, BB, …, CC

i.e., including only tuples whose first few elements are in AA, BB, …, CC respectively.

The new relation is notated $R|_{AA, BB, \dots, CC}$.
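A sketch of restriction; restrict is a hypothetical helper that restricts the first len(subsets) positions, so a one-element list gives the binary case:

```python
def restrict(relation, subsets):
    """Keep tuples whose first len(subsets) elements lie in the given subsets."""
    k = len(subsets)
    return {t for t in relation
            if all(t[i] in subsets[i] for i in range(k))}

# Binary case: R from A to B restricted to AA = {1, 3}.
R = {(1, "x"), (2, "y"), (3, "x")}
assert restrict(R, [{1, 3}]) == {(1, "x"), (3, "x")}

# From A, B to C: restrict both the first and the second positions.
R3 = {(1, "p", True), (1, "q", False), (2, "p", True)}
assert restrict(R3, [{1}, {"p"}]) == {(1, "p", True)}
```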


((Totality of Relations))

A relation R from A to B is total (on A) if it relates everything in A to at least one thing in B.

I.e., for every member a of A, there is at least one b in B such that <a, b> is in R.

A relation that is not total (on A, above) is merely partial. Technically, however, all relations are “partial”, with total ones being a special case.


((Totality, contd.))

Can generalize:

A relation R from A, B, C, … to D, E, … is total (on A, B, C, …)

if for every member a of A, b of B, c of C, etc.

there is at least one d in D, e in E, etc. such that

<a, b, c, …, d, e, …> is in R.
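A sketch of the totality test; is_total is a hypothetical helper, and the "from" sets must be finite so that itertools.product can enumerate every combination:

```python
from itertools import product

def is_total(relation, domains):
    """True if every combination from `domains` (the 'from' sets) starts some tuple."""
    k = len(domains)
    covered = {t[:k] for t in relation}
    return all(combo in covered for combo in product(*domains))

A, B = {1, 2, 3}, {"x", "y"}
R = {(1, "x"), (2, "x"), (3, "y")}
assert is_total(R, [A])                    # every a in A is related to something
assert not is_total(R - {(3, "y")}, [A])   # 3 is now unrelated: merely partial

# From A, B to a third set: every pair (a, b) must begin some tuple.
R3 = {(a, b, 0) for a in A for b in B}
assert is_total(R3, [A, B])
```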


((Partiality of Table Relations))

The relation in a table (at a given moment), considered as a relation from any of its attribute value domains to the remaining value domains, will almost always be merely partial.

This is simply because it’s highly unlikely that all possible combinations of values from the former collection of value domains will appear in the table!!


((Functions))

A total functional relation from A to B is called a function from A to B.

Each thing in A is related to exactly one thing in B. (But two different things in A can be related to the same thing in B, and not everything in B needs to be related to anything in A. So the inverse relation is not necessarily either functional or total.)

Caution: every function is also a partial function.
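A sketch with hypothetical sets, checking that f is a function from A to B while its inverse is neither functional nor total on B; is_function is an invented helper for binary relations:

```python
def is_function(relation, A):
    """Binary relation from A that is functional and total on A."""
    image = {}
    for a, b in relation:
        if image.setdefault(a, b) != b:
            return False          # not functional
    return set(image) == set(A)   # total: every a in A has its single partner

A, B = {1, 2, 3}, {"x", "y", "z"}
f = {(1, "x"), (2, "x"), (3, "y")}
assert is_function(f, A)

# The inverse is not functional ("x" has two partners) nor total ("z" has none).
f_inverse = {(b, a) for a, b in f}
assert not is_function(f_inverse, B)
```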


((From Partiality to Totality by Restriction))

We can always turn a merely-partial R from A to B into a total one by slimming A down enough! Just remove the members of A that aren’t related to anything by R, to get a new set AA. We don’t remove any tuples from R.

R (as a relation from AA to B) is total on AA.

And note that $R|_{AA} = R$.

AA is called the domain of R, notated dom(R). Not to be confused with “value domains” of DB entity attributes.

Can generalize the above to non-binary relations.
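A sketch with a hypothetical R that is merely partial on A, computing dom(R) and checking the two claims above:

```python
A, B = {1, 2, 3, 4}, {"x", "y"}
R = {(1, "x"), (1, "y"), (3, "y")}   # merely partial on A: 2 and 4 are unrelated

dom_R = {a for a, _ in R}            # dom(R): members of A that R actually uses
restricted = {(a, b) for a, b in R if a in dom_R}

assert dom_R == {1, 3}
assert restricted == R               # R restricted to dom(R) is R itself
# As a relation from dom(R) to B, R is total.
assert all(any(a == x for x, _ in R) for a in dom_R)
```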


((Totality contd. and “Onto”))

A relation R from A to B is onto if for everything in B there is at least one thing in A that is related by R to it. I.e.:

For every member b of B,

there is at least one a in A such that <a, b> is in R.

Onto-ness is just totality in the other direction.

You can also say that R is total on B, or that the inverse of R is total.
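A sketch of onto-ness with hypothetical sets; is_onto is an invented helper:

```python
def is_onto(relation, B):
    """Every b in B is reached from at least one a (binary relation into B)."""
    return B <= {b for _, b in relation}

A, B = {1, 2, 3}, {"x", "y"}
R = {(1, "x"), (2, "x"), (3, "y")}
assert is_onto(R, B)

# Onto-ness is totality in the other direction: the inverse covers all of B.
inverse = {(b, a) for a, b in R}
assert {b for b, _ in inverse} == B
assert not is_onto({(1, "x")}, B)
```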


((Other Categories of Relation))

A relation R from A to B is one-to-one (1-1) if, for any a in A, there is at most one b in B such that <a, b> is in R, AND, for any b in B, there is at most one a in A such that <a, b> is in R.

That is, both the relation and its inverse from B to A are functional. (But they don’t need to be total.)

To put it another way: it is functional and different members of A map to (= are related to) different members of B.

Or again: Different members of A map to different members of B and different members of B map to different members of A.


((Other Categories of Relation, contd.))

A relation R from A to B is many-to-one if it is functional but not one-to-one: i.e., there are different members of A that map to the same member of B, in at least one case.

A relation R from A to B is one-to-many if it is not functional but its inverse from B to A is functional. That is, there’s a member of A that maps to more than one member of B; but each member of B maps to at most one member of A.

A relation R from A to B is many-to-many if neither it nor its inverse is functional: i.e., there’s a case of a member of A mapping to more than one member of B, and a case of a member of B mapping to more than one member of A.
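The four categories can be decided from two functionality tests, one on the relation and one on its inverse. A sketch with hypothetical helpers functional and category (under these definitions the empty relation counts as one-to-one):

```python
def functional(pairs):
    """No first element is paired with two different second elements."""
    image = {}
    return all(image.setdefault(a, b) == b for a, b in pairs)

def category(R):
    """Classify a binary relation using the definitions on the slides."""
    forward = functional(R)
    backward = functional({(b, a) for a, b in R})   # the inverse, from B to A
    if forward and backward:
        return "one-to-one"
    if forward:
        return "many-to-one"
    if backward:
        return "one-to-many"
    return "many-to-many"

assert category({(1, "x"), (2, "y")}) == "one-to-one"
assert category({(1, "x"), (2, "x")}) == "many-to-one"
assert category({(1, "x"), (1, "y")}) == "one-to-many"
assert category({(1, "x"), (1, "y"), (2, "x")}) == "many-to-many"
```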


((Relations from Entity Relationships))

Surprisingly … the concentration on mathematical relations in introductory accounts of “relational” DBs is on

a relation as arising from each single table (entity type),

despite …

the importance of “relationships” between entity types in Entity-Relationship modelling!

However, between-entity-type relationships also correspond to mathematical relations, distinct from the ones within individual tables.


((Intuitively ...))

Recall that for each entity type there is the set of possible entities of that type (the entity set).

A “relationship” between two (or more) entity types/sets is a description of the fact that at any given moment the database stores a particular mathematical relation on the entity sets.

E.g., the EMPLOYED-BY relationship from the People entity type to the Organizations entity type says that the database (at any moment) stores a relation on the People entity set and Organizations entity set.


((Example Continued))

So at any given moment the relation might be

{<Person1, Org1>, <Person2, Org1>, <Person3, Org1>, <Person4, Org2>, <Person3, Org2>}

Each Person… and Org… is an entity represented as a row of the corresponding table … and is therefore itself mathematically represented as a tuple of attribute values.

So <Person1, Org1> could be, in more detail,

<<E156, ‘Sam’, ‘Finks’, I678>, <I678, ‘IBM’, ‘USA’>>

Note the nested tuples.


((Bridging Entity Types))

Recall that bridging entity types are brought in to represent M:N relationships (and similarly M:N:P relationships, etc.).

People/Organizations again: the relation within the bridging table would look like

{<E156, I678>, <E257, I996>, <E714, I678>, …}.

This relation can also be said to correspond to the original People-Organization relationship, but it is abstracted from the above relation by replacing tuples representing entities, such as <E156, ‘Sam’, ‘Finks’, I678>, by the PK values in them, such as E156.
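A sketch of the abstraction step. The Sam/IBM tuples are from the slides; the second person and organization (Pat Quist, Acme) are hypothetical, chosen to match the <E257, I996> pair:

```python
# Entity rows as tuples with the PK first.
sam = ("E156", "Sam", "Finks", "I678")
ibm = ("I678", "IBM", "USA")
pat = ("E257", "Pat", "Quist", "I996")   # hypothetical
acme = ("I996", "Acme", "UK")            # hypothetical

# The relationship as a relation on the entity sets: pairs of nested tuples.
employed_by = {(sam, ibm), (pat, acme)}

# The bridging-table relation: replace each entity tuple by its PK value.
bridging = {(person[0], org[0]) for person, org in employed_by}
assert bridging == {("E156", "I678"), ("E257", "I996")}
```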


((Bridging Entity Types (contd.) ))

But now … what about the relationship between the People and Organization entity types and the bridging entity type!! (Exercise)

And note: We could have chosen to use the bridging-entity-style relation to begin with as our mathematical formulation of the People/Organization relationship.

A mathematical formulation is not objectively given by the world … it is chosen by us, on the basis of convenience for whatever purposes we have.


((Connectivities))

If a relationship from an entity type to another is “1:1”

then at any moment the actual relation is one-to-one (1-1).

If the relationship is “1:M”

then the relation at any moment may be one-to-many (but may by chance be one-to-one).

If the relationship is “M:N”

then the relation at any moment may be many-to-many (but may by chance be one-to-many, many-to-one or one-to-one).


((Optionality/Mandatoriness))

If a relationship from an entity type E to another type F is mandatory then

the relation at any moment after restriction to the set of entities currently in E is total.

If a relationship from an entity type E to another type F is optional then

the relation at any moment is not required to be total in the above restricted sense (but may happen to be).


((Another Caution))

A one-to-one correspondence between sets A and B is a SPECIAL one-to-one relation from A to B (or B to A): it is not only one-to-one but also TOTAL (on A) and ONTO (B). (Or we can say: total on both A and B.)

But any 1-1 relation from A to B is a 1-1 correspondence between the subsets of A, B consisting of those members that do happen to feature in the relation!

A 1-1 relation induced by a single table will almost certainly NOT be a one-to-one correspondence between whole attribute domains!
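A sketch of the distinction with hypothetical sets; both helpers are invented for illustration:

```python
def is_one_to_one(R):
    """No a and no b occurs in two different pairs of R."""
    firsts = [a for a, _ in R]
    seconds = [b for _, b in R]
    return len(set(firsts)) == len(R) == len(set(seconds))

def is_correspondence(R, A, B):
    """1-1, total on A and onto B."""
    return (is_one_to_one(R)
            and {a for a, _ in R} == set(A)
            and {b for _, b in R} == set(B))

A, B = {1, 2, 3}, {"x", "y", "z", "w"}
R = {(1, "x"), (2, "y")}

assert is_one_to_one(R)
assert not is_correspondence(R, A, B)   # 3, "z" and "w" are left out
# But R is a 1-1 correspondence between the members that do feature in it.
assert is_correspondence(R, {a for a, _ in R}, {b for _, b in R})
```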


EXAM (May/June; fine detail for Resit may differ)

See notes about past exams on the module website.


Structure of Exam

One and a half hours.

Four questions.

DO THREE.

Material on mathematical relations and relational algebra is only used in one of the four Questions. (That Question may also involve other material.)


The Remaining Three Questions

They will range from precise technical things to more general considerations.

SQL query expressions are required in several parts of the three questions, amounting to about 28% of the marks for those three questions.

Extra credit of 8% is available in one of the questions for providing SQL create expressions.

Extra credit of 8% is available in one of the questions for providing creative ERD notation suggestions.


Material Needed for Exam

Anything in the required textbook reading may be useful in the exam, except of course that a detailed memory of specific, data-full examples is not expected, and except for some SQL detail (see next slide).

You need to study all Additional Notes, except that:

The exam will not rely on the treatment of functional dependencies and normalization there (in the 1st of the three parts in the Week 9 batch).

The exam will not rely on material on physical design (in the 3rd of the three parts in the Week 9 batch).

You need to study all Exercise Answer Notes.

The content of the demonstrators’ lectures on experiences in industry will not be relied upon (but of course may help you in overall understanding of some issues).


Textbook Parts (R&C 2009)

See my module website (top page).

On SQL: the exam doesn’t rely on fine detail beyond what’s in the handouts (and occasional lectures).


Talk by demonstrator on experiences in industry

Funmi Faniyi

(about 30 mins incl. questions)