Copyright © C. J. Date 2005 page 1

SAMPLE QUERIES :

Query A: Get S#-FROM-TO triples for suppliers who have been able to supply at least one part during at least one interval of time, where FROM and TO together designate a maximal interval during which supplier S# was in fact able to supply at least one part

Query B: Get S#-FROM-TO triples for suppliers who have been unable to supply any parts at all during at least one interval of time, where FROM and TO together designate a maximal interval during which supplier S# was in fact unable to supply any part at all

You’ve got to be joking!

Copyright © C. J. Date 2005 page 2

TO SUM UP :

"Temporal" constraints and queries can be expressed, but they quickly get very complicated indeed

We need some carefully thought out and well-designed shorthands …

… which typically don’t exist yet in today’s commercial DBMSs, of course

So let’s investigate!

Copyright © C. J. Date 2005 page 3

2. LAYING THE FOUNDATIONS :

Time and the DB

What’s the problem?

Intervals

Interval operators

The EXPAND and COLLAPSE operators

The PACK and UNPACK operators

Relational operators

Copyright © C. J. Date 2005 page 4

INTERVALS :

Crucial observation: Need to deal with intervals as such (i.e., as values in their own right), instead of as pairs of FROM-TO values

But what’s an interval?

Consider proposition: "Supplier S1 was able to supply part P1 from day 4 to day 10"

Does interval "from day 4 to day 10" include day 4? day 10?

Given some interval, we sometimes want to regard specified begin and end points as included and sometimes not

Copyright © C. J. Date 2005 page 5

[d04:d10] -- closed-closed = d04 d05 d06 d07 d08 d09 d10
[d04:d11) -- closed-open   = d04 d05 d06 d07 d08 d09 d10
(d03:d10] -- open-closed   = d04 d05 d06 d07 d08 d09 d10
(d03:d11) -- open-open     = d04 d05 d06 d07 d08 d09 d10

Closed-open is convenient and most often used in practice:

E.g., split [d04:d11) immediately before, say, day 7 … result is [d04:d07) and [d07:d11)

But closed-closed is most intuitive and we’ll favor it throughout this presentation
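The equivalence of the four notations can be checked mechanically. The following Python sketch is not part of the original slides: it assumes days are modeled as plain integers (d04 becomes 4), and the helper name `points` is hypothetical.

```python
def points(begin, end, closed_begin=True, closed_end=True):
    """Return the integer points denoted by an interval specification.

    An open bound excludes the stated value, so it is shifted inward by one.
    """
    first = begin if closed_begin else begin + 1
    last = end if closed_end else end - 1
    return list(range(first, last + 1))


closed_closed = points(4, 10, True, True)    # [d04:d10]
closed_open = points(4, 11, True, False)     # [d04:d11)
open_closed = points(3, 10, False, True)     # (d03:d10]
open_open = points(3, 11, False, False)      # (d03:d11)

# Splitting the closed-open form immediately before day 7 needs no
# adjustment of the split point: [d04:d07) followed by [d07:d11).
left = points(4, 7, True, False)
right = points(7, 11, True, False)
```

All four calls yield the same seven days, and `left + right` reproduces them, which is the convenience of the closed-open style mentioned above.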

Copyright © C. J. Date 2005 page 6

FULLY TEMPORALIZING SUPPLIERS AND SHIPMENTS USING INTERVALS :

S_DURING

S#   DURING
S1   [d04:d10]
S2   [d02:d04]
S2   [d07:d10]
S3   [d03:d10]
S4   [d04:d10]
S5   [d02:d10]

SP_DURING

S#   P#   DURING
S1   P1   [d04:d10]
S1   P2   [d05:d10]
S1   P3   [d09:d10]
S1   P4   [d05:d10]
S1   P5   [d04:d10]
S1   P6   [d06:d10]
S2   P1   [d02:d04]
S2   P1   [d08:d10]
S2   P2   [d03:d03]
S2   P2   [d09:d10]
S3   P2   [d08:d10]
S4   P2   [d06:d09]
S4   P4   [d04:d08]
S4   P5   [d05:d10]

Copyright © C. J. Date 2005 page 7

PREDICATES :

S_DURING:

From the begin point of DURING to the end point of DURING inclusive (and not immediately before the begin point of DURING or immediately after the end point of DURING), supplier S# was under contract

SP_DURING:

From the begin point of DURING to the end point of DURING inclusive (and not immediately before the begin point of DURING or immediately after the end point of DURING), supplier S# was able to supply part P#

Copyright © C. J. Date 2005 page 8

SOME IMMEDIATE ADVANTAGES :

Constraints to prohibit FROM-TO pairs in which TO < FROM are now unnecessary ("FROM ≤ TO" is implicit)

Primary keys {S#,DURING} (for S_DURING), {S#,P#,DURING} (for SP_DURING); choice no longer arbitrary

Don’t need to worry about whether FROM-TO intervals in previous version of DB are open or closed wrt FROM and TO

[d04:d10], [d04:d11), (d03:d10], (d03:d11) are distinct "possreps" for the very same interval* —don’t need to know which, if any, is actual physical representation

* See The Third Manifesto

Copyright © C. J. Date 2005 page 9

INTERVALS AREN’T NECESSARILY TEMPORAL :

Tax brackets are represented by taxable income ranges (intervals whose contained points are money values)

Machines operate within certain temperature and voltage ranges (intervals whose contained points are temperatures and voltages, respectively)

Animals vary in the range of frequencies of light and sound waves to which their eyes and ears are receptive

Various natural phenomena occur in ranges in depth of soil or sea or height above sea level

Etc., etc.

Copyright © C. J. Date 2005 page 10

POINT TYPES AND INTERVAL TYPES :

Granularity of interval [d04:d10]

= one day

= granularity of type DATE

Assume DATE is a builtin type representing Gregorian dates: i.e., points on timeline accurate to one day (granularity thus one day by definition)

Exact type of interval value [d04:d10] = INTERVAL_DATE

Copyright © C. J. Date 2005 page 11

POINT TYPES AND INTERVAL TYPES (cont.) :

INTERVAL is a type generator

With associated generic interval operators (see later) and constraints

DATE is the point type of this particular interval type

Determines specific set of interval values that make up this particular interval type

Namely, the set of all possible intervals of the form [di:dj], where di and dj are DATE values and di ≤ dj

Copyright © C. J. Date 2005 page 12

FURTHER EXAMPLES :

INTERVAL_INTEGER

Values are intervals of the form [i:j], where i and j are INTEGER values and i ≤ j

Granularity = one (unity)

INTERVAL_TIMESTAMP

Values are intervals of the form [ti:tj], where ti and tj are TIMESTAMP values and ti ≤ tj

Granularity = one microsecond (assume TIMESTAMP values are accurate to the microsecond)

Copyright © C. J. Date 2005 page 13

POINT TYPES :

Type T can be used as a point type if all of the following are defined for it:

A total ordering (">" etc. available for any pair of values of type T)

Niladic FIRST and LAST operators

Monadic NEXT and PRIOR operators (can fail)

NEXT = successor function

Successor function assumed unique

Thus, e.g., DATE is a valid point type

Copyright © C. J. Date 2005 page 14

Informal notation:

Successor of d = d+1

Predecessor of d = d-1

Successor function is what enables us to determine what points are contained in any given interval

E.g., if i = [d04:d10], contained points are exactly d04, d04+1, d04+2, …, d10

Successor function for type DATE = "next day"
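As an illustration of a point type, the sketch below uses Python's standard `datetime.date` as a stand-in for DATE. It is not from the slides; the names `FIRST`, `LAST`, `next_day`, `prior_day`, and `contained_points` are hypothetical and simply mirror the operators listed above.

```python
from datetime import date, timedelta

FIRST = date.min   # niladic FIRST: smallest representable date
LAST = date.max    # niladic LAST: largest representable date


def next_day(d):
    """Successor function; fails with OverflowError when applied to LAST."""
    return d + timedelta(days=1)


def prior_day(d):
    """Predecessor function; fails with OverflowError when applied to FIRST."""
    return d - timedelta(days=1)


def contained_points(begin, end):
    """List the points of the closed interval [begin:end] by repeated succession."""
    if begin > end:
        raise ValueError("begin must not come after end")
    pts = [begin]
    while pts[-1] < end:
        pts.append(next_day(pts[-1]))
    return pts
```

For example, `contained_points(date(2005, 1, 4), date(2005, 1, 10))` walks from the begin point to the end point one day at a time, which is exactly the role the successor function plays in determining the contents of an interval.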

Copyright © C. J. Date 2005 page 15

INTERVALS :

Let T be a point type. Then an interval (value) i of type INTERVAL_T is a scalar value for which two monadic ops, BEGIN and END, and one dyadic op, ∈, are defined, such that:

BEGIN(i) and END(i) each return a value of type T

BEGIN(i) ≤ END(i)

If p is a value of type T, then p ∈ i is true if and only if BEGIN(i) ≤ p and p ≤ END(i) are both true

Note that intervals are always nonempty
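A minimal sketch of this definition in Python, assuming an integer point type and inclusive end points (the closed-closed style favored in this presentation). The class name `Interval` and its methods are illustrative only and are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    begin: int   # plays the role of BEGIN(i)
    end: int     # plays the role of END(i)

    def __post_init__(self):
        # Intervals are always nonempty, so begin may equal end but not exceed it.
        if self.begin > self.end:
            raise ValueError("begin must not exceed end")

    def __contains__(self, p):
        # Membership test: p is in i exactly when begin <= p <= end.
        return self.begin <= p <= self.end

    def points(self):
        # All contained points, obtained by repeated succession from begin.
        return list(range(self.begin, self.end + 1))


i = Interval(4, 10)       # [d04:d10]
unit = Interval(3, 3)     # [d03:d03], a unit interval
```

Because the class is frozen, an interval behaves as a value in its own right: two intervals with the same begin and end points compare equal.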

Copyright © C. J. Date 2005 page 16

A MORE SEARCHING EXAMPLE : A RELATION WITH TWO INTERVAL ATTRIBUTES

S_PARTS_DURING

S#   PARTS     DURING
S1   [P1:P3]   [d01:d04]
S1   [P2:P4]   [d07:d08]
S1   [P5:P6]   [d09:d09]
S2   [P1:P1]   [d08:d09]
S2   [P1:P2]   [d08:d08]
S2   [P3:P4]   [d07:d08]
S3   [P2:P4]   [d01:d04]
S3   [P3:P5]   [d01:d04]
S3   [P2:P4]   [d05:d06]
S3   [P2:P4]   [d06:d09]
S4   [P3:P4]   [d05:d08]

Not meant to correspond in any particular way to sample SP_DURING value

Note problems: e.g., "S3 was able to supply P4 on days 1-4" appears twice

Will revisit this example later

Copyright © C. J. Date 2005 page 17

2. LAYING THE FOUNDATIONS :

Time and the DB

What’s the problem?

Intervals

Interval operators

The EXPAND and COLLAPSE operators

The PACK and UNPACK operators

Relational operators

Copyright © C. J. Date 2005 page 18

INFORMAL NOTATION :

Point type T, typical value p—use p+1, p+2, p-1, p-2, etc., as shorthands with obvious meanings (a real language would provide NEXT_T / PRIOR_T ops, also FIRST_T / LAST_T)

Interval type INTERVAL_T—use [p1:pn] to denote typical interval selector invocation (a real language would use more explicit syntax— e.g., INTERVAL_T ( [p1:pn] ) )

Let i be the interval [b:e]. Then:

BEGIN(i) and END(i) return b and e, resp.

p ∈ i ≡ b ≤ p AND p ≤ e

PRE(i) returns b-1 / POST(i) returns e+1 (can fail)

Let i = unit interval [p:p]; POINT FROM i returns p

Copyright © C. J. Date 2005 page 19

ALLEN’S OPERATORS :

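i1 = i2

[Diagram: i1 = [b1:e1] and i2 = [b2:e2] coincide, i.e., b1 = b2 and e1 = e2]

i1 ⊇ i2 / i2 ⊆ i1 (also i1 ⊃ i2 / i2 ⊂ i1)

[Diagram: i2 = [b2:e2] lies within i1 = [b1:e1], i.e., b1 ≤ b2 and e2 ≤ e1]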

Copyright © C. J. Date 2005 page 20

ALLEN’S OPERATORS (cont.) :

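i1 BEFORE i2 / i2 AFTER i1

[Diagram: i1 = [b1:e1] lies entirely to the left of i2 = [b2:e2], i.e., e1 < b2]

i1 MEETS i2 / i2 MEETS i1

[Diagram: i2 begins immediately after i1 ends, i.e., b2 = e1+1]

i1 OVERLAPS i2 / i2 OVERLAPS i1

[Diagram: i1 and i2 share at least one point, i.e., b1 ≤ e2 and b2 ≤ e1]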

Copyright © C. J. Date 2005 page 21

ALLEN’S OPERATORS (cont.) :

i1 MERGES i2 ≡ i1 OVERLAPS i2 OR i1 MEETS i2

[Diagram: i1 = [b1:e1] and i2 = [b2:e2] overlap]

Or:

[Diagram: i1 = [b1:e1] and i2 = [b2:e2] meet, with b2 immediately after e1]

Copyright © C. J. Date 2005 page 22

ALLEN’S OPERATORS (cont.) :

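i1 BEGINS i2

[Diagram: i1 = [b1:e1] and i2 = [b2:e2] begin at the same point and i1 ends no later than i2, i.e., b1 = b2 and e1 ≤ e2]

i1 ENDS i2

[Diagram: i1 = [b1:e1] and i2 = [b2:e2] end at the same point and i1 begins no earlier than i2, i.e., e1 = e2 and b1 ≥ b2]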
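The operators above can be sketched as Python predicates. This is an illustration rather than material from the slides: intervals are assumed to be closed-closed `(begin, end)` pairs of integers, the function names are hypothetical, and the conditions are one reading of the slide diagrams, following the usual closed-closed formulation of these operators.

```python
def includes(i1, i2):
    """i1 includes i2: every point of i2 is also a point of i1."""
    return i1[0] <= i2[0] and i2[1] <= i1[1]


def before(i1, i2):
    """i1 BEFORE i2: i1 ends before i2 begins."""
    return i1[1] < i2[0]


def meets(i1, i2):
    """i1 MEETS i2: the intervals abut, in either order, sharing no point."""
    return i2[0] == i1[1] + 1 or i1[0] == i2[1] + 1


def overlaps(i1, i2):
    """i1 OVERLAPS i2: the intervals have at least one point in common."""
    return i1[0] <= i2[1] and i2[0] <= i1[1]


def merges(i1, i2):
    """i1 MERGES i2: the intervals overlap or abut, leaving no gap between them."""
    return overlaps(i1, i2) or meets(i1, i2)


def begins(i1, i2):
    """i1 BEGINS i2: same begin point, and i1 ends no later than i2."""
    return i1[0] == i2[0] and i1[1] <= i2[1]


def ends(i1, i2):
    """i1 ENDS i2: same end point, and i1 begins no earlier than i2."""
    return i1[1] == i2[1] and i1[0] >= i2[0]
```

For instance, `(4, 6)` and `(7, 10)` meet but do not overlap, so `merges` holds for them, whereas `(4, 6)` and `(8, 10)` leave day 7 uncovered and do not merge.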

Copyright © C. J. Date 2005 page 23

OTHER OPERATORS :

COUNT(i) /* aka DURATION(i) */ returns no. of points in i (i.e., cardinality)

i1 UNION i2 returns [MIN(b1,b2):MAX(e1,e2)] if i1 MERGES i2 and is otherwise undefined /* result is an interval */

[Diagram: i1 = [b1:e1] and i2 = [b2:e2] overlap, with i1 beginning first; i1 UNION i2 = [b1:e2]]

Copyright © C. J. Date 2005 page 24

OTHER OPERATORS (cont.) :

i1 INTERSECT i2 returns [MAX(b1,b2):MIN(e1,e2)] if i1 OVERLAPS i2 and is otherwise undefined /* result is an interval */

[Diagram: i1 = [b1:e1] and i2 = [b2:e2] overlap, with i1 beginning first; i1 INTERSECT i2 = [b2:e1]]

Copyright © C. J. Date 2005 page 25

OTHER OPERATORS (cont.) :

i1 MINUS i2 returns [b1:MIN(b2-1,e1)] if b1 < b2 and e1 < e2, [MAX(e2+1,b1):e1] if b1 > b2 and e1 > e2, and is otherwise undefined (i.e., undefined if i1 BEGINS i2 or i1 ENDS i2 or if either of i1 and i2 properly includes the other) /* result is an interval */

[Diagram: i1 = [b1:e1] and i2 = [b2:e2] overlap, with i1 beginning first; i1 MINUS i2 = [b1:b2-1]]
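The three operators can be sketched in Python as follows. This is a hedged illustration, not code from the slides: intervals are assumed to be closed-closed `(begin, end)` pairs of integers, `None` stands in for "undefined", and the MINUS conditions are transcribed literally from the wording of the slide.

```python
def overlaps(i1, i2):
    return i1[0] <= i2[1] and i2[0] <= i1[1]


def merges(i1, i2):
    # Overlapping or abutting in either order.
    return overlaps(i1, i2) or i2[0] == i1[1] + 1 or i1[0] == i2[1] + 1


def union(i1, i2):
    """[MIN(b1,b2):MAX(e1,e2)] if i1 MERGES i2, otherwise undefined (None)."""
    if not merges(i1, i2):
        return None
    return (min(i1[0], i2[0]), max(i1[1], i2[1]))


def intersect(i1, i2):
    """[MAX(b1,b2):MIN(e1,e2)] if i1 OVERLAPS i2, otherwise undefined (None)."""
    if not overlaps(i1, i2):
        return None
    return (max(i1[0], i2[0]), min(i1[1], i2[1]))


def minus(i1, i2):
    """Interval difference, defined only when the result is a single interval."""
    (b1, e1), (b2, e2) = i1, i2
    if b1 < b2 and e1 < e2:
        return (b1, min(b2 - 1, e1))
    if b1 > b2 and e1 > e2:
        return (max(e2 + 1, b1), e1)
    return None  # e.g. i2 lies strictly inside i1, which would split i1 in two
```

Returning `None` keeps every defined result an interval, matching the "result is an interval" remarks: a union across a gap or a difference that punches a hole would need a set of intervals instead.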

Copyright © C. J. Date 2005 page 26

SAMPLE QUERIES :

Get supplier numbers for suppliers who were able to supply part P2 on day 8

( SP_DURING WHERE P# = P# (‘P2’) AND d08 ∈ DURING ) { S# }

Get pairs of suppliers who were able to supply the same part at the same time

WITH SP_DURING RENAME ( S# AS X#, DURING AS XD ) AS T1 ,
     SP_DURING RENAME ( S# AS Y#, DURING AS YD ) AS T2 ,
     T1 JOIN T2 AS T3 ,
     ( T3 WHERE XD OVERLAPS YD ) AS T4 ,
     ( T4 WHERE X# < Y# ) AS T5 :
T5 { X#, Y# }

Note the use of WITH to introduce names for expressions

Copyright © C. J. Date 2005 page 27

Get pairs of suppliers who were able to supply the same part at the same time, together with the parts and times in question

WITH SP_DURING RENAME ( S# AS X#, DURING AS XD ) AS T1 ,
     SP_DURING RENAME ( S# AS Y#, DURING AS YD ) AS T2 ,
     T1 JOIN T2 AS T3 ,
     ( T3 WHERE XD OVERLAPS YD ) AS T4 ,
     ( T4 WHERE X# < Y# ) AS T5 ,
     ( EXTEND T5 ADD ( XD INTERSECT YD AS DURING ) ) AS T6 :
T6 { X#, Y#, P#, DURING }
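To see what this query produces on the sample SP_DURING value, here is a rough Python equivalent. It is a sketch under stated assumptions, not the Tutorial D semantics in full: relations are modeled as collections of tuples, days as integers, and the function name `supplier_pairs` is hypothetical.

```python
from itertools import product

# Sample SP_DURING value: (S#, P#, DURING), with DURING as a closed (begin, end) pair.
SP_DURING = [
    ("S1", "P1", (4, 10)), ("S1", "P2", (5, 10)), ("S1", "P3", (9, 10)),
    ("S1", "P4", (5, 10)), ("S1", "P5", (4, 10)), ("S1", "P6", (6, 10)),
    ("S2", "P1", (2, 4)), ("S2", "P1", (8, 10)), ("S2", "P2", (3, 3)),
    ("S2", "P2", (9, 10)), ("S3", "P2", (8, 10)), ("S4", "P2", (6, 9)),
    ("S4", "P4", (4, 8)), ("S4", "P5", (5, 10)),
]


def overlaps(i1, i2):
    return i1[0] <= i2[1] and i2[0] <= i1[1]


def intersect(i1, i2):
    return (max(i1[0], i2[0]), min(i1[1], i2[1]))


def supplier_pairs(sp):
    """Join SP_DURING with itself on P#, keep overlapping rows with X# < Y#."""
    rows = set()
    for (x, px, xd), (y, py, yd) in product(sp, repeat=2):
        if px == py and x < y and overlaps(xd, yd):
            rows.add((x, y, px, intersect(xd, yd)))
    return rows


detailed = supplier_pairs(SP_DURING)               # corresponds to T6 { X#, Y#, P#, DURING }
pairs = {(x, y) for (x, y, _p, _d) in detailed}    # corresponds to T5 { X#, Y# }
```

The restriction `x < y` plays the same role as `X# < Y#` in the Tutorial D version: it drops pairs of a supplier with itself and keeps only one ordering of each pair.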

Copyright © C. J. Date 2005 page 28

2. LAYING THE FOUNDATIONS :

Time and the DB

What’s the problem?

Intervals

Interval operators

The EXPAND and COLLAPSE operators

The PACK and UNPACK operators

Relational operators

Copyright © C. J. Date 2005 page 29

THE EXPAND AND COLLAPSE OPERATORS :

Work on sets of intervals, not individual intervals (or pairs of intervals) per se

Each takes a set of intervals all of the same type as its single operand and returns another such set as its result

{ […], […], …, […] }

{ […], […], …, […] }

Result in each case is a particular canonical form for the original set

Copyright © C. J. Date 2005 page 30

ASIDE : "CANONICAL FORM"

Given (a) a set S of objects and

(b) a notion of equivalence among such objects

subset C of S is said to be a set of canonical forms for S (under the stated definition of equivalence) if and only if every object s in S is equivalent to just one object c in C

Object c is the canonical form for object s

All "interesting" properties that apply to s also apply to c; thus, we can study just the small set C, not the large set S, in order to obtain or prove a variety of "interesting" results

Copyright © C. J. Date 2005 page 31

EXPANDED FORM :

The objects we wish to study are sets of intervals, where the intervals are all of the same type

Let X1 and X2 be two such sets. Define equivalence:

X1 and X2 are equivalent if and only if set of all points in intervals in X1 = set of all points in intervals in X2

E.g.:

X1 = { [d01:d01], [d03:d05], [d04:d06] }

X2 = { [d01:d01], [d03:d04], [d05:d05], [d05:d06] }

Copyright © C. J. Date 2005 page 32

EXPANDED FORM (cont.) :

Corresp set of points = { d01, d03, d04, d05, d06 }

But we’re more interested in corresp set of unit intervals:

X3 = { [d01:d01], [d03:d03], [d04:d04], [d05:d05], [d06:d06] }

X3 is equivalent to both X1 and X2 (it’s the expanded form of both)

If X is a set of intervals all of the same type, then the expanded form of X is the set of all intervals of the form [p:p] where p is a point in some interval in X

Copyright © C. J. Date 2005 page 33

EXPANDED FORM (cont.) :

Given any such set X, a corresponding expanded form always exists; expanded form is equivalent to X and is unique

Expanded form of X is one possible canonical form for X

Unique set equivalent to X such that every contained interval is of minimum possible duration (viz., one)

X1 ≡ X2 if and only if they have the same expanded form

Intuitively, the expanded form of X allows us to focus on the information content of X at an atomic level, without worrying about the many different ways that information might be bundled together into clumps

Copyright © C. J. Date 2005 page 34

COLLAPSED FORM :

X1 = { [d01:d01], [d03:d05], [d04:d06] }

X2 = { [d01:d01], [d03:d04], [d05:d05], [d05:d06] }

X3 = { [d01:d01], [d03:d03], [d04:d04], [d05:d05], [d06:d06] }

Expanded form here (X3) has greatest cardinality: fluke!

X4 = { [d01:d01], [d03:d03], [d03:d04], [d03:d05], [d03:d06], [d04:d04], [d04:d05], [d04:d06] }

X4 has same expanded form but greater cardinality than X3

Copyright © C. J. Date 2005 page 35

COLLAPSED FORM (cont.) :

X5 = { [d01:d01], [d03:d06] }

X5 has same expanded form but minimum possible cardinality (it’s the collapsed form of X1, X2, X3, X4)

If X is a set of intervals all of the same type, then the collapsed form of X is the set Y of intervals of the same type such that:

X and Y have the same expanded form, and

No two distinct intervals i1 and i2 in Y are such that i1 MERGES i2 is true

Copyright © C. J. Date 2005 page 36

COLLAPSED FORM (cont.) :

Given any such set X, a corresponding collapsed form always exists; collapsed form is equivalent to X and is unique

Collapsed form of X is another possible canonical form for X

Unique set equivalent to X that has the minimum possible cardinality

X1 ≡ X2 if and only if they have the same collapsed form

Intuitively, the collapsed form of X allows us to focus on the information content of X in a compressed (clumped) form, without worrying about the possibility that clumps might overlap or abut

Copyright © C. J. Date 2005 page 37

LET X BE A SET OF INTERVALS ALL OF THE SAME TYPE :

EXPAND ( X ) : returns expanded form of X

COLLAPSE ( X ) : returns collapsed form of X

(What happens if X has cardinality zero? Or one?)

Ops are not inverses of each other! In fact:

EXPAND ( COLLAPSE ( X ) ) ≡ EXPAND ( X )

COLLAPSE ( EXPAND ( X ) ) ≡ COLLAPSE ( X )
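A small Python sketch makes the two canonical forms concrete. It is an illustration only: sets of intervals are modeled as Python sets of closed `(begin, end)` integer pairs, and the function names `expand` and `collapse` merely echo the operators described here.

```python
def expand(x):
    """Expanded form: one unit interval [p:p] per point covered by x."""
    return {(p, p) for (b, e) in x for p in range(b, e + 1)}


def collapse(x):
    """Collapsed form: merge intervals until no two remaining ones overlap or abut."""
    result = []
    for b, e in sorted(x):
        if result and b <= result[-1][1] + 1:
            # This interval overlaps or abuts the previous one, so extend it.
            last_b, last_e = result[-1]
            result[-1] = (last_b, max(last_e, e))
        else:
            result.append((b, e))
    return set(result)


X1 = {(1, 1), (3, 5), (4, 6)}
X2 = {(1, 1), (3, 4), (5, 5), (5, 6)}
X3 = {(1, 1), (3, 3), (4, 4), (5, 5), (6, 6)}
X4 = {(1, 1), (3, 3), (3, 4), (3, 5), (3, 6), (4, 4), (4, 5), (4, 6)}
X5 = {(1, 1), (3, 6)}
```

With these definitions, X1, X2, and X4 all expand to X3 and collapse to X5. An empty set maps to an empty set under both functions, and a single interval is its own collapsed form, which is one way to answer the cardinality question raised above.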

Copyright © C. J. Date 2005 page 38

NOW I NEED TO CLEAN UP MY ACT !

Relational model doesn’t support general sets, it supports relations!

However, a set of values v1, v2, …, vn all of the same type can easily be converted into a unary relation:

RELATION { TUPLE { A v1 } , TUPLE { A v2 } , … , TUPLE { A vn } }     /* relation selector invocation */

Returns:

A
v1
v2
…
vn

Copyright © C. J. Date 2005 page 39

So let us replace EXPAND and COLLAPSE as previously described by versions in which argument is specified as a unary relation

r

DURING
[d06:d09]
[d04:d08]
[d05:d10]
[d01:d01]

EXPAND(r)

DURING
[d01:d01]
[d04:d04]
[d05:d05]
[d06:d06]
[d07:d07]
[d08:d08]
[d09:d09]
[d10:d10]

COLLAPSE(r)

DURING
[d01:d01]
[d04:d10]

Extend definition of equivalence accordingly

Copyright © C. J. Date 2005 page 40

EXPANDING / COLLAPSING NULLARY RELATIONS :

Nullary relation has no attributes at all …

There are exactly two such relations!

TABLE_DEE has just one tuple (the "0-tuple")

TABLE_DUM has no tuples at all

Highly desirable to define versions of EXPAND and COLLAPSE for nullary relations (see later)

Definition: Result = input in every case

Copyright © C. J. Date 2005 page 41

2. LAYING THE FOUNDATIONS :

Time and the DB

What’s the problem?

Intervals

Interval operators

The EXPAND and COLLAPSE operators

The PACK and UNPACK operators

Relational operators

Copyright © C. J. Date 2005 page 42

THE PACK AND UNPACK OPERATORS :

Preliminaries

The PACK operator

The UNPACK operator

Sample queries

Packing and unpacking on no attributes

Packing and unpacking on several attributes

Copyright © C. J. Date 2005 page 43

PRELIMINARY EXAMPLE :

r

S#   DURING
S2   [d02:d04]
S2   [d03:d05]
S4   [d02:d05]
S4   [d04:d06]
S4   [d09:d10]

UNPACK r ON DURING

S#   DURING
S2   [d02:d02]
S2   [d03:d03]
S2   [d04:d04]
S2   [d05:d05]
S4   [d02:d02]
S4   [d03:d03]
S4   [d04:d04]
S4   [d05:d05]
S4   [d06:d06]
S4   [d09:d09]
S4   [d10:d10]

PACK r ON DURING

S#   DURING
S2   [d02:d05]
S4   [d02:d06]
S4   [d09:d10]
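The preliminary example can be reproduced with a short Python sketch. This is not the formal definition of PACK and UNPACK, only an approximation for a relation with attributes S# and DURING: tuples are modeled as `(supplier, (begin, end))` pairs with integer days, and the function names are hypothetical.

```python
from collections import defaultdict


def unpack(r):
    """UNPACK r ON DURING: one tuple per supplier per contained day."""
    return {(s, (p, p)) for (s, (b, e)) in r for p in range(b, e + 1)}


def pack(r):
    """PACK r ON DURING: per supplier, merge intervals that overlap or abut."""
    by_supplier = defaultdict(list)
    for s, during in r:
        by_supplier[s].append(during)
    packed = set()
    for s, intervals in by_supplier.items():
        merged = []
        for b, e in sorted(intervals):
            if merged and b <= merged[-1][1] + 1:
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((b, e))
        packed.update((s, i) for i in merged)
    return packed


r = {
    ("S2", (2, 4)), ("S2", (3, 5)),
    ("S4", (2, 5)), ("S4", (4, 6)), ("S4", (9, 10)),
}
```

In effect, `unpack` applies the expanded form and `pack` the collapsed form separately to each supplier's set of DURING values, which is why S4 keeps two packed tuples: days 7 and 8 separate [d02:d06] from [d09:d10].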

Copyright © C. J. Date 2005 page 44

RECALL QUERY A :

Get S#-FROM-TO triples for suppliers who have been able to supply at least one part during at least one interval of time, where FROM and TO together designate a maximal interval during which supplier S# was in fact able to supply at least one part

Revised version:

Get S#-DURING pairs for suppliers who have been able to supply at least one part during at least one interval of time, where DURING designates a maximal interval during which supplier S# was in fact able to supply at least one part

We will build up our formulation one step at a time ...

Copyright © C. J. Date 2005 page 45

SP_DURING (SAMPLE VALUE) :

S#   P#   DURING
S1   P1   [d04:d10]
S1   P2   [d05:d10]
S1   P3   [d09:d10]
S1   P4   [d05:d10]
S1   P5   [d04:d10]
S1   P6   [d06:d10]
S2   P1   [d02:d04]
S2   P1   [d08:d10]
S2   P2   [d03:d03]
S2   P2   [d09:d10]
S3   P2   [d08:d10]
S4   P2   [d06:d09]
S4   P4   [d04:d08]
S4   P5   [d05:d10]

Copyright © C. J. Date 2005 page 46

PROJECT AWAY PART NUMBERS :

WITH SP_DURING { S#, DURING } AS T1 :     /* part numbers irrelevant */

T1

S#   DURING
S1   [d04:d10]
S1   [d05:d10]
S1   [d09:d10]
S1   [d06:d10]
S2   [d02:d04]
S2   [d08:d10]
S2   [d03:d03]
S2   [d09:d10]
S3   [d08:d10]
S4   [d06:d09]
S4   [d04:d08]
S4   [d05:d10]

Note the redundancy: e.g., "Supplier S1 was able to supply something on day 6" appears three times!

Copyright © C. J. Date 2005 page 47

DESIRED RESULT (ELIMINATING REDUNDANCY) :

Packed form of T1 on DURING: "PACK T1 ON DURING"

S#   DURING
S1   [d04:d10]
S2   [d02:d04]
S2   [d08:d10]
S3   [d08:d10]
S4   [d04:d10]

Note: Given DURING value for given supplier in RESULT does not necessarily exist as an explicit DURING value for that supplier in T1 (see, e.g., S4)
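The desired result for Query A can be checked with a small Python sketch that works from first principles: collect the days on which each supplier could supply something, then report the maximal runs of consecutive days. This is an illustration under the assumption of integer days, and `query_a` is a hypothetical name, not an operator from the slides.

```python
SP_DURING = [
    ("S1", "P1", (4, 10)), ("S1", "P2", (5, 10)), ("S1", "P3", (9, 10)),
    ("S1", "P4", (5, 10)), ("S1", "P5", (4, 10)), ("S1", "P6", (6, 10)),
    ("S2", "P1", (2, 4)), ("S2", "P1", (8, 10)), ("S2", "P2", (3, 3)),
    ("S2", "P2", (9, 10)), ("S3", "P2", (8, 10)), ("S4", "P2", (6, 9)),
    ("S4", "P4", (4, 8)), ("S4", "P5", (5, 10)),
]


def query_a(sp_during):
    """Return (S#, DURING) pairs where DURING is a maximal run of supply days."""
    days = {}
    for s, _p, (b, e) in sp_during:          # part numbers are irrelevant here
        days.setdefault(s, set()).update(range(b, e + 1))
    result = []
    for s in sorted(days):
        run_start = prev = None
        for d in sorted(days[s]):
            if prev is None or d != prev + 1:
                if prev is not None:
                    result.append((s, (run_start, prev)))  # close the previous run
                run_start = d
            prev = d
        result.append((s, (run_start, prev)))              # close the final run
    return result


result = query_a(SP_DURING)
```

The S4 row illustrates the note above: the interval from day 4 to day 10 appears in the result even though no single S4 tuple in T1 carries that DURING value.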

Copyright © C. J. Date 2005 page 48

WITH ( T1 GROUP { DURING } AS X ) AS T2 :

T2 (attribute X is relation-valued!)

S#   X
S1   DURING: [d04:d10], [d05:d10], [d09:d10], [d06:d10]
S2   DURING: [d02:d04], [d08:d10], [d03:d03], [d09:d10]
S3   DURING: [d08:d10]
S4   DURING: [d06:d09], [d04:d08], [d05:d10]
