SQL, RA, Sets, Bags Fundamental difference between theoretical RA and practical application of it in...


SQL, RA, Sets, Bags
• Fundamental difference between theoretical RA and its practical application in DBMSs and SQL
  – RA uses sets
  – SQL uses bags (multisets)
• There are good performance reasons for using bags:
  – Queries often involve two or more joins, unions, etc.; eliminating duplicates would require an extra pass through the relation being built
  – There are times we WANT every instance, particularly for aggregate functions (e.g. taking an average)
• Downside:
  – Extra memory


• Section 5.1 topics include:
  – Union, difference, intersection and how they are affected by operating over bags
  – Projection operator over bags
  – Selection operator over bags
  – Product and join over bags
• All of the above behave as you would expect
• Other topics in 5.1:
  – Algebraic laws of set operators applied to bags


Examples: set operators over bags
• {1,2,1} ∪ {1,1,2,3,1} =
  – {1,1,1,1,1,2,2,3}
• {1,2,1,1} ∩ {1,2,1,3} =
  – {1,1,2}
• {1,2,1,1,1} – {1,1,2,3} =
  – {1,1}
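
The three bag operations above can be checked with a small sketch. It assumes Python's `collections.Counter` as a stand-in for a bag (each element maps to its multiplicity); the helper name `bag` is hypothetical and not part of the slides.

```python
from collections import Counter

def bag(*items):
    """Build a bag (multiset): each element maps to its multiplicity."""
    return Counter(items)

# Bag union adds the multiplicities of each element.
union = bag(1, 2, 1) + bag(1, 1, 2, 3, 1)
# Bag intersection keeps the minimum multiplicity of each element.
intersection = bag(1, 2, 1, 1) & bag(1, 2, 1, 3)
# Bag difference subtracts multiplicities, never going below zero.
difference = bag(1, 2, 1, 1, 1) - bag(1, 1, 2, 3)

print(sorted(union.elements()))         # [1, 1, 1, 1, 1, 2, 2, 3]
print(sorted(intersection.elements()))  # [1, 1, 2]
print(sorted(difference.elements()))    # [1, 1]
```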


Exercise 5.1.3a


Exercise 5.1.3b

• π_bore(Ships |><| Classes)


More relational algebra


δ – Duplicate elimination
• δ(R) – Eliminate duplicates from relation R
  – (i.e. converts a relation from a bag to a set representation)
• R2 := δ(R1)
  – R2 consists of one copy of each tuple that appears in R1 one or more times
• Corresponds to the DISTINCT modifier in a SELECT statement


δ - Example

R = ( A  B )
      1  2
      3  4
      1  2

δ(R) = A  B
       1  2
       3  4
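
As a rough illustration, the δ example can be reproduced with SELECT DISTINCT. The sketch assumes an in-memory SQLite database driven through Python's `sqlite3` module, which is a convenience for running the query and not the DBMS used in the course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A INTEGER, B INTEGER)")
conn.executemany("INSERT INTO R VALUES (?, ?)", [(1, 2), (3, 4), (1, 2)])

# Plain SELECT returns the bag; DISTINCT plays the role of δ.
bag_rows = conn.execute("SELECT A, B FROM R ORDER BY A, B").fetchall()
set_rows = conn.execute("SELECT DISTINCT A, B FROM R ORDER BY A, B").fetchall()

print(bag_rows)  # [(1, 2), (1, 2), (3, 4)]
print(set_rows)  # [(1, 2), (3, 4)]
```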


τ – Sorting
• R2 := τ_L(R1)
  – L – list of some attributes of R1
  – L specifies the order of sorting
    • Increasing order
  – Tuples with identical components in L are left in an unspecified order
• Benefit:
  – Obvious – ordered output
  – Not so obvious – stored sorted relations can have substantial query benefit
    • Recall running time for binary search
    • O(log n) is far superior to O(n)
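
The binary-search benefit can be sketched in a few lines. This is only an illustration under assumed names (`linear_find`, `sorted_find`, and the sample relation are hypothetical); a real DBMS would use an index structure instead of an in-memory list.

```python
from bisect import bisect_left

def linear_find(rows, key):
    """O(n): scan every tuple until the first attribute matches key."""
    for i, row in enumerate(rows):
        if row[0] == key:
            return i
    return -1

def sorted_find(rows, key):
    """O(log n): binary search; rows must be sorted on the first attribute."""
    i = bisect_left(rows, (key,))
    if i < len(rows) and rows[i][0] == key:
        return i
    return -1

relation = [(7, "g"), (1, "a"), (5, "e"), (3, "c")]
sorted_relation = sorted(relation)  # plays the role of sorting R on its first attribute

print(linear_find(relation, 5))         # 2
print(sorted_find(sorted_relation, 5))  # 2
print(sorted_find(sorted_relation, 4))  # -1
```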


Aggregation Operators
• Used to summarize something about the values in an attribute of a relation
  – Produces a single value as a result
• SUM(attr)
• AVG(attr)
• MIN(attr)
• MAX(attr)
• COUNT(attr)


Example: Aggregation

R = ( A  B )
      1  3
      3  4
      3  2

SUM(A), COUNT(A), MAX(B), AVG(B) = ?

SUM(A) = 7
COUNT(A) = 3
MAX(B) = 4
AVG(B) = 3
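
The same four aggregates can be computed in one query. The sketch below assumes an in-memory SQLite database via Python's `sqlite3`; note that SQLite returns AVG as a floating-point value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A INTEGER, B INTEGER)")
conn.executemany("INSERT INTO R VALUES (?, ?)", [(1, 3), (3, 4), (3, 2)])

# Each aggregate collapses the whole column to a single value.
result = conn.execute(
    "SELECT SUM(A), COUNT(A), MAX(B), AVG(B) FROM R"
).fetchone()

print(result)  # (7, 3, 4, 3.0)
```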


Grouping Operator
• R2 := γ_L(R1)
• L is a list of elements that are:
  – Individual attributes of R1
    • Called grouping attributes
  – Aggregated attributes of R1
    • Use an arrow and a new name to rename the component
  – R2 projects only what is in L


How does γ_L(R) work?
1. Form one group for each distinct list of values for the grouping attributes in R
2. Within each group, compute AGG(A) for each aggregation in L
3. Result has one tuple for each group:
  – The grouping attributes' values for the group
  – The aggregations over all tuples of the group (for the aggregated attributes)


Example: Grouping / Aggregation

R = ( A  B  C )
      1  2  3
      4  5  6
      1  2  5
      1  3  5

γ_{A,B,AVG(C)->X}(R) = ??

First, partition R by A and B:

A  B  C
1  2  3
1  2  5
4  5  6
1  3  5

Then, average C within groups:

A  B  X
1  2  4
4  5  6
1  3  5
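
The partition-then-aggregate procedure can be sketched directly in Python. The function name `group_avg` is hypothetical, and the sketch handles only this one grouping list (group on A and B, average C); it is not a general γ implementation.

```python
from collections import defaultdict

def group_avg(rows):
    """Group rows (A, B, C) on (A, B) and average C within each group."""
    # Step 1: form one group per distinct (A, B) pair.
    groups = defaultdict(list)
    for a, b, c in rows:
        groups[(a, b)].append(c)
    # Steps 2 and 3: aggregate within each group, emit one tuple per group.
    return [(a, b, sum(cs) / len(cs)) for (a, b), cs in groups.items()]

R = [(1, 2, 3), (4, 5, 6), (1, 2, 5), (1, 3, 5)]
grouped = group_avg(R)

print(grouped)  # [(1, 2, 4.0), (4, 5, 6.0), (1, 3, 5.0)]
```

In SQL the same result would come from SELECT A, B, AVG(C) AS X FROM R GROUP BY A, B.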


Note about aggregation
• If R is a relation, and R has attributes A1…An, then
  – δ(R) == γ_{A1,A2,…,An}(R)
  – Grouping on ALL attributes in R eliminates duplicates
  – i.e. δ is not really necessary
• Also, if relation R is also a set, then
  – π_{A1,A2,…,An}(R) = γ_{A1,A2,…,An}(R)
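
A quick way to convince yourself that grouping on every attribute eliminates duplicates is to compare the two queries on a small bag. The table contents below are made up for illustration, and the sketch assumes SQLite through Python's `sqlite3`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A INTEGER, B INTEGER)")
# Hypothetical bag with repeated tuples.
conn.executemany(
    "INSERT INTO R VALUES (?, ?)",
    [(1, 2), (3, 4), (1, 2), (3, 4), (3, 5)],
)

by_distinct = conn.execute(
    "SELECT DISTINCT A, B FROM R ORDER BY A, B"
).fetchall()
by_grouping = conn.execute(
    "SELECT A, B FROM R GROUP BY A, B ORDER BY A, B"
).fetchall()

print(by_distinct)                 # [(1, 2), (3, 4), (3, 5)]
print(by_distinct == by_grouping)  # True
```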


Extended Projection
• Recall R2 := π_L(R1)
  – R2 contains only L attributes from R1
• L can be extended to allow arbitrary expressions:
  – Renaming (e.g., A -> B)
  – Arithmetic expressions (e.g., A + B -> SUM)
  – Duplicate attributes (i.e., include in L multiple times)


Example: Extended Projection

R = ( A  B )
      1  2
      3  4

π_{A+B->C,A,A}(R) = C  A1  A2
                    3  1   1
                    7  3   3
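
In SQL, extended projection is simply a SELECT list with expressions and AS aliases. The sketch below assumes SQLite through Python's `sqlite3`; the aliases A1 and A2 mirror the column names in the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A INTEGER, B INTEGER)")
conn.executemany("INSERT INTO R VALUES (?, ?)", [(1, 2), (3, 4)])

# Arithmetic expression renamed to C, plus attribute A listed twice.
cursor = conn.execute("SELECT A + B AS C, A AS A1, A AS A2 FROM R ORDER BY C")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()

print(columns)  # ['C', 'A1', 'A2']
print(rows)     # [(3, 1, 1), (7, 3, 3)]
```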


Outer joins
• Recall that the standard natural join produces a tuple only if there is a match from both relations
• A tuple of R that has NO tuple of S with which it can join is said to be dangling
  – Vice versa applies
• Outer join: preserves dangling tuples in join
  – Missing components set to NULL
• R |>◦<|_C S
  – This is a bad approximation of the symbol – see text
  – No C? Then it is a natural outer join


Example: Outer Join

R = ( A  B )    S = ( B  C )
      1  2            2  3
      4  5            6  7

(1,2) joins with (2,3), but the other two tuples are dangling.

R |>◦<| S = A     B  C
            1     2  3
            4     5  NULL
            NULL  6  7
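
To make the padding of dangling tuples concrete, here is a minimal pure-Python sketch of a natural full outer join for this specific schema, R(A, B) and S(B, C). The function name is hypothetical and `None` stands in for NULL.

```python
def natural_full_outer_join(r, s):
    """Join R(A, B) with S(B, C) on B, padding dangling tuples with None."""
    result = []
    matched_s = set()
    for a, b in r:
        partners = [i for i, (sb, _) in enumerate(s) if sb == b]
        if partners:
            for i in partners:
                result.append((a, b, s[i][1]))
                matched_s.add(i)
        else:
            # Dangling tuple of R: pad the S-only attribute C.
            result.append((a, b, None))
    for i, (b, c) in enumerate(s):
        if i not in matched_s:
            # Dangling tuple of S: pad the R-only attribute A.
            result.append((None, b, c))
    return result

R = [(1, 2), (4, 5)]
S = [(2, 3), (6, 7)]
joined = natural_full_outer_join(R, S)

print(joined)  # [(1, 2, 3), (4, 5, None), (None, 6, 7)]
```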


Types of outer joins
• R |>◦<| S
  – No condition, requires matching attributes
  – Pads dangling tuples from both sides
• R |>◦<|_L S
  – Pad dangling tuples of R only
• R |>◦<|_R S
  – Pad dangling tuples of S only
• SQL:
  – R NATURAL {LEFT | RIGHT} JOIN S
  – R {LEFT | RIGHT} JOIN S
  – NOTE: MySQL does not allow a FULL OUTER JOIN! Only LEFT or RIGHT
  – Just UNION a left outer join and a right outer join… mostly


A+B  A²  B²
1    0   1
5    4   9
1    0   1
6    4   16
7    9   16

B+1  C-1
1    0
3    3
3    4
4    3
1    1
4    3


A  B
0  1
2  3
2  4
3  4

A  SUM(B)
0  2
2  7
3  4

SELECT A, SUM(B) FROM R GROUP BY A;


A
0
2
3

SELECT A FROM R GROUP BY A;

SELECT DISTINCT A FROM R;


SELECT A, MAX(C) FROM R NATURAL JOIN S GROUP BY A;

A  MAX(C)
2  4

What if MAX(C) was SUM(C)?


SELECT * FROM R NATURAL LEFT JOIN S;

A  B  C
2  3  4
2  3  4
0  1  ⊥
0  1  ⊥
2  4  ⊥
3  4  ⊥


SELECT * FROM R NATURAL RIGHT JOIN S;

A  B  C
2  3  4
2  3  4
⊥  0  1
⊥  2  4
⊥  2  5
⊥  0  2


SELECT * FROM R NATURAL LEFT JOIN S
UNION
SELECT * FROM R NATURAL RIGHT JOIN S;

A  B  C
2  3  4
2  3  4
0  1  ⊥
0  1  ⊥
2  4  ⊥
3  4  ⊥
⊥  0  1
⊥  2  4
⊥  2  5
⊥  0  2

Right?


• SELECT * FROM R NATURAL LEFT JOIN S
  UNION ALL
  SELECT * FROM R NATURAL RIGHT JOIN S
  WHERE A IS NULL;
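
The difference between UNION and UNION ALL for emulating a full outer join can be checked with a short sketch. It makes two assumptions: the contents of R and S are inferred from the join results shown on these slides (they are not stated explicitly), and the query runs on SQLite via Python's `sqlite3`. To stay portable across SQLite versions, the right outer join is written as a LEFT JOIN with the operands swapped and an explicit ON condition.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A INTEGER, B INTEGER)")
conn.execute("CREATE TABLE S (B INTEGER, C INTEGER)")
# Assumed contents, inferred from the join results shown on the slides.
conn.executemany("INSERT INTO R VALUES (?, ?)",
                 [(0, 1), (2, 3), (0, 1), (2, 4), (3, 4)])
conn.executemany("INSERT INTO S VALUES (?, ?)",
                 [(0, 1), (2, 4), (2, 5), (3, 4), (0, 2), (3, 4)])

LEFT = "SELECT R.A, R.B, S.C FROM R LEFT JOIN S ON R.B = S.B"
# Right outer join written as a left join with the operands swapped.
RIGHT = "SELECT R.A, S.B, S.C FROM S LEFT JOIN R ON R.B = S.B"

# UNION removes duplicates, so repeated tuples of the bag are lost.
union_rows = conn.execute(f"{LEFT} UNION {RIGHT}").fetchall()
# UNION ALL keeps duplicates; the WHERE keeps only S's dangling tuples
# from the right join, so matched tuples are not counted twice.
full_rows = conn.execute(
    f"{LEFT} UNION ALL {RIGHT} WHERE R.A IS NULL"
).fetchall()

print(len(union_rows), len(full_rows))  # 8 10
print(Counter(full_rows)[(2, 3, 4)])    # 2
```

The plain UNION collapses the duplicated (2, 3, 4) and (0, 1, NULL) tuples, which is why it only "mostly" works as a full outer join over bags.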


A  R.B  S.B  C
0  1    2    4
0  1    2    5
0  1    3    4
0  1    3    4
0  1    2    4
0  1    2    5
0  1    3    4
0  1    3    4
2  3    ⊥    ⊥
2  4    ⊥    ⊥
3  4    ⊥    ⊥
⊥  ⊥    0    1
⊥  ⊥    0    2


Back to SQL


Aggregations
• SUM, AVG, COUNT, MIN, and MAX can be applied to a column in a SELECT clause
  – Produces an aggregation on the attribute
• COUNT(*) counts the number of tuples
• Use DISTINCT inside of an aggregation to eliminate duplicates in the function


Example:
• Sells(bar, beer, price)
• Find the average price of Guinness
    SELECT AVG(price)
    FROM Sells
    WHERE beer = 'Guinness';
• Find the number of different prices charged for Guinness
    SELECT COUNT(DISTINCT price) AS "# Prices"
    FROM Sells
    WHERE beer = 'Guinness';
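
Both queries can be tried against a toy Sells table. The bar names and prices below are invented for illustration, the alias `num_prices` replaces the quoted alias for simplicity, and the sketch assumes SQLite through Python's `sqlite3`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
# Hypothetical sample data.
conn.executemany("INSERT INTO Sells VALUES (?, ?, ?)", [
    ("Joe's Bar", "Guinness", 5.0),
    ("Sue's Bar", "Guinness", 5.0),
    ("Moe's Bar", "Guinness", 6.5),
    ("Joe's Bar", "Bud", 3.0),
])

avg_price = conn.execute(
    "SELECT AVG(price) FROM Sells WHERE beer = 'Guinness'"
).fetchone()[0]
# DISTINCT inside the aggregate counts each price only once.
num_prices = conn.execute(
    "SELECT COUNT(DISTINCT price) AS num_prices "
    "FROM Sells WHERE beer = 'Guinness'"
).fetchone()[0]

print(avg_price)   # 5.5
print(num_prices)  # 2
```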


Grouping
• SELECT attr(s)
  FROM tbls
  WHERE cond_expr
  GROUP BY attr(s)
• The relation resulting from SELECT-FROM-WHERE is determined FIRST, then grouped according to the GROUP BY clause
  – MySQL 5.7 and earlier will also sort the relation according to the attributes listed in the GROUP BY clause
    • Therefore, those versions allow an optional ASC or DESC (just like ORDER BY); MySQL 8.0 removed both the implicit sort and this option
• Aggregations are applied only within each group


Grouping and NULLS


Note on NULL and Aggregation
• NULL values in a tuple:
  – never contribute to a sum, average or count
  – can never be a min or max of an attribute
• If all values for an attribute are NULL, then the result of an aggregation is NULL
  – Exception: COUNT of an empty set is 0
• NULL values are treated as ordinary values when forming groups
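
These NULL rules are easy to observe on a toy table. The table T(g, v) and its contents are hypothetical, and the sketch assumes SQLite through Python's `sqlite3`, where Python's `None` maps to SQL NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (g INTEGER, v INTEGER)")
# Hypothetical data: None is stored as NULL.
conn.executemany("INSERT INTO T VALUES (?, ?)", [
    (1, 10), (1, None), (1, 20),
    (2, None),
    (None, 5), (None, 7),
])

# NULLs are skipped by SUM, AVG, COUNT(v), MIN and MAX;
# COUNT(*) still counts every tuple.
g1 = conn.execute(
    "SELECT SUM(v), AVG(v), COUNT(v), COUNT(*), MIN(v), MAX(v) "
    "FROM T WHERE g = 1"
).fetchone()
# Every v with g = 2 is NULL: SUM is NULL, but COUNT(v) is 0.
g2 = conn.execute("SELECT SUM(v), COUNT(v) FROM T WHERE g = 2").fetchone()
# When grouping, the NULL values of g form a group of their own.
groups = dict(conn.execute("SELECT g, SUM(v) FROM T GROUP BY g").fetchall())

print(g1)  # (30, 15.0, 2, 3, 10, 20)
print(g2)  # (None, 0)
print(groups == {None: 12, 1: 30, 2: None})  # True
```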


Example: Grouping
• Sells(bar, beer, price)
  Frequents(drinker, bar)
• Find the average price for each beer
    SELECT beer, AVG(price)
    FROM Sells
    GROUP BY beer;
• Find for each drinker the average price of Guinness at the bars they frequent
    SELECT drinker, AVG(price)
    FROM Frequents NATURAL JOIN Sells
    WHERE beer = 'Guinness'
    GROUP BY drinker;


Restrictions
• Example:
  – Find the bar that sells Guinness the cheapest
      SELECT bar, MIN(price)
      FROM Sells
      WHERE beer = 'Guinness';
  – Is this correct?
• Book states that this is illegal SQL
  – If an aggregation is used, then each SELECT element should be aggregated or be an attribute in GROUP BY
  – MySQL allows the above when ONLY_FULL_GROUP_BY is disabled (it is enabled by default since MySQL 5.7.5), but such queries will give meaningless results
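
One legal way to answer the question is to compute the minimum in a subquery and then select the bars that charge it. This is a sketch of one possible rewrite, not the book's or the lecture's solution; the sample data is invented, and the query runs on SQLite through Python's `sqlite3`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
# Hypothetical sample data; two bars tie for the cheapest Guinness.
conn.executemany("INSERT INTO Sells VALUES (?, ?, ?)", [
    ("Joe's Bar", "Guinness", 5.0),
    ("Sue's Bar", "Guinness", 4.5),
    ("Moe's Bar", "Guinness", 4.5),
    ("Joe's Bar", "Bud", 3.0),
])

# The aggregate lives in the subquery, so the outer SELECT
# mixes no aggregated and unaggregated attributes.
cheapest = conn.execute("""
    SELECT bar, price
    FROM Sells
    WHERE beer = 'Guinness'
      AND price = (SELECT MIN(price) FROM Sells WHERE beer = 'Guinness')
    ORDER BY bar
""").fetchall()

print(cheapest)  # [("Moe's Bar", 4.5), ("Sue's Bar", 4.5)]
```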


Example of confusing aggregation

• Find the country of the ship with bore of 15 with the smallest displacement

• SELECT country, MIN(displacement)
  FROM Classes
  WHERE bore = 15;


Not quite the correct answer!

Be sure to follow the rules for aggregation.


HAVING Clause
• HAVING cond
  – Follows a GROUP BY clause
  – Condition applies to each possible group
  – Groups not satisfying condition are eliminated
• Rules for conditions in HAVING clause:
  – Aggregated attributes:
    • Any attribute in a relation in the FROM clause can be aggregated
    • Only applies to the group being tested
  – Unaggregated attributes:
    • Only attributes in the GROUP BY list
    • MySQL is more lenient with this, though such conditions produce meaningless information


Example: HAVING
• Sells(bar, beer, price)
• Find the average price of those beers that are served in at least three bars
• SELECT beer, AVG(price)
  FROM Sells
  GROUP BY beer
  HAVING COUNT(*) >= 3;
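
The HAVING query can be run on a small made-up Sells table to see a group being eliminated. The data is hypothetical and the sketch assumes SQLite through Python's `sqlite3`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
# Hypothetical data: Bud is sold in three bars, Guinness in only two.
conn.executemany("INSERT INTO Sells VALUES (?, ?, ?)", [
    ("Joe's Bar", "Bud", 3.0),
    ("Sue's Bar", "Bud", 3.5),
    ("Moe's Bar", "Bud", 4.0),
    ("Joe's Bar", "Guinness", 5.0),
    ("Sue's Bar", "Guinness", 6.0),
])

# Groups are formed first; HAVING then drops groups with fewer than 3 tuples.
popular = conn.execute("""
    SELECT beer, AVG(price)
    FROM Sells
    GROUP BY beer
    HAVING COUNT(*) >= 3
""").fetchall()

print(popular)  # [('Bud', 3.5)]
```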


Example: HAVING
• Sells(bar, beer, price)
  Beers(name, manf)
• Find the average price of beers that are either served in at least three bars or are manufactured by Sam Adams
• SELECT beer, AVG(price)
  FROM Sells
  GROUP BY beer
  HAVING COUNT(*) >= 3 OR
         beer IN (SELECT name FROM Beers WHERE manf = 'Sam Adams');


• Find the average displacement of ships from each country having at least two classes

• SELECT country, AVG(displacement)
  FROM Classes
  GROUP BY country
  HAVING COUNT(*) >= 2;


Summary so far
• SELECT S
  FROM R1,…,Rn
  WHERE C1
  GROUP BY a1,…,ak
  HAVING C2
  ORDER BY b1,…,bk;
  – S are attributes from R1,…,Rn or aggregates
  – C1 are conditions on R1,…,Rn
  – a1,…,ak are attributes from R1,…,Rn
  – C2 are conditions based on any aggregation, or on any attribute in the GROUP BY clause
  – b1,…,bk are attributes from R1,…,Rn


Exercises


Exercise 6.2.3f

SELECT battle
FROM Outcomes INNER JOIN Ships ON Outcomes.ship = Ships.name
     NATURAL JOIN Classes
GROUP BY country, battle
HAVING COUNT(ship) >= 3;


Exercise 6.4.7a

• SELECT COUNT(type)
  FROM Classes
  WHERE type = 'bb';


Exercise 6.4.7b

• SELECT AVG(numGuns) AS 'Avg Guns'
  FROM Classes
  WHERE type = 'bb';


Exercise 6.4.7c

• SELECT AVG(numGuns) AS 'Avg Guns'
  FROM Classes NATURAL JOIN Ships
  WHERE type = 'bb';


Exercise 6.4.7d

• SELECT class, MIN(launched) AS First_Launched
  FROM Classes NATURAL JOIN Ships
  GROUP BY class;


Exercise 6.4.7e

• SELECT C.class, COUNT(O.ship) AS '# sunk'
  FROM Classes AS C
       NATURAL JOIN Ships AS S
       INNER JOIN Outcomes AS O ON S.name = O.ship
  WHERE O.result = 'sunk'
  GROUP BY C.class;