Principles of Database Systems
CSE 544p

Lecture #1
September 28, 2011

Dan Suciu -- p544 Fall 2011

Source: courses.cs.washington.edu/courses/csep544/11au/lectures/lecture01-sql.pdf

Staff

• Instructor: Dan Suciu
  – CSE 662, [email protected]
  – Office hours: Wednesdays, 5:30-6:20

• TAs:
  – Sandra Fan, [email protected]

Communications

• Web page: http://www.cs.washington.edu/p544
  – Lectures will be available here
  – Homework will be posted here
  – Announcements may be posted here

• Mailing list:
  – Announcements, group discussions
  – If you registered, you are automatically subscribed

Textbook(s)

Main textbook:
• Database Management Systems, Ramakrishnan and Gehrke

Second textbook:
• Database Systems: The Complete Book, Garcia-Molina, Ullman, Widom

Course Format

• Lectures Wednesdays, 6:30-9:20
• 7 Homework Assignments
• Take-home Final

Grading

• Homework: 70%
• Take-home Final: 30%

Homework Assignments

1. SQL
2. Conceptual design
3. JAVA/SQL
4. Transactions
5. Database tuning
6. XML/XPath/XQuery
7. Pig Latin, on AWS

Due: Mondays, by 11:59pm. Three late days per person.

Take-home Final

• Posted on December 8, at 11:59pm
• Due on December 10, by 10:00pm
• No late days/hours/minutes/seconds

Software Tools

• Postgres:
  – Preferred usage: download from http://www.postgresql.org/download/
  – Other option: use postgres on lab machines

• SQL Server 2008
  – Download client from http://msdnaa.cs.washington.edu
  – Username is your full @cs.washington.edu email address
  – Doesn’t work? Email ms-sw-[email protected]
  – Connect to IPROJSRV (may need tunneling)
  – OK to use your own server, just import IMDB

• XQuery: download one interpreter from
  – Preferred: Saxon: http://saxon.sourceforge.net/ (from apache; very popular)
  – Others:
    • Zorba: http://www.zorba-xquery.com/ (I used this one: ½ day installation)
    • Galax: http://galax.sourceforge.net/ (great in the past, seems less well maintained)

• Pig Latin:
  – We will run it on Amazon Web Services
  – You may download from http://hadoop.apache.org/pig/, but you won’t need it

Accessing SQL Server

• SQL Server Management Studio
• Server Type = Database Engine
• Server Name = IPROJSRV
• Authentication = SQL Server Authentication
  – Login = your UW email address (not the CSE email)
  – Password = [in class]
• Must connect from within CSE, or must use tunneling
• Alternatively: install your own, get it from MSDNAA (see earlier slide)
• Then play with IMDB, start working on HW 1

Rest of Today’s Lecture

• Overview of DBMS
• Overview of the course content
• SQL

Database

What is a database?
• A collection of files storing related data

Give examples of databases
• Accounts database; payroll database; UW’s students database; Amazon’s products database; airline reservation database

Database Management System

What is a DBMS?
• A big C program written by someone else that allows us to manage efficiently a large database and allows it to persist over long periods of time

Give examples of DBMS
• DB2 (IBM), SQL Server (MS), Oracle, Sybase
• MySQL, Postgres, …

SQL for Nerds, Greenspun, http://philip.greenspun.com/sql/ (Chap 1)

Market Shares

From 2006 Gartner report:
• IBM: 21% market with $3.2BN in sales
• Oracle: 47% market with $7.1BN in sales
• Microsoft: 17% market with $2.6BN in sales

An Example

The Internet Movie Database http://www.imdb.com

• Entities: Actors (800k), Movies (400k), Directors, …
• Relationships: who played where, who directed what, …

Key concept 1: Relational Data Model

Actor:
id      fName  lName  gender
195428  Tom    Hanks  M
645947  Amy    Hanks  F
. . .

Cast:
pid     mid
195428  337166
. . .

Movie:
id      Name       year
337166  Toy Story  1995
. . .

Key concept 2: Declarative Query Language

SQL:

SELECT * FROM Actor

SELECT count(*) FROM Actor

SELECT * FROM Actor WHERE lName = 'Hanks'

We write what we want, not how we want it.
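
To try the three queries without installing a server, a minimal sketch using Python's built-in sqlite3 module is shown here. SQLite stands in for Postgres or SQL Server, and the third row (Jane Doe, id 1) is a made-up record added only so that the WHERE clause filters something out.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Actor (id INTEGER PRIMARY KEY, fName TEXT, lName TEXT, gender TEXT)"
)
conn.executemany(
    "INSERT INTO Actor VALUES (?, ?, ?, ?)",
    [
        (195428, "Tom", "Hanks", "M"),
        (645947, "Amy", "Hanks", "F"),
        (1, "Jane", "Doe", "F"),  # hypothetical row, not from the slides
    ],
)

# Each query states what we want; the engine decides how to compute it.
all_actors = conn.execute("SELECT * FROM Actor").fetchall()
actor_count = conn.execute("SELECT count(*) FROM Actor").fetchone()[0]
hanks = conn.execute(
    "SELECT * FROM Actor WHERE lName = 'Hanks' ORDER BY id"
).fetchall()

print(actor_count)
print(hanks)
```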

Key concept 3: Data Independence

SELECT *
FROM Actor, Casts, Movie
WHERE lname='Hanks' and Actor.id = Casts.pid
  and Casts.mid=Movie.id and Movie.year=1995

817k actors, 3.5M casts, 380k movies; How can it be so fast?

Physical data independence: query is independent of physical storage

How Can We Evaluate the Query?

Actor:
id  fName  lName  gender
. . .      Hanks
. . .

Cast:
pid  mid
. . .
. . .

Movie:
id  Name  year
. . .     1995
. . .

Plan 1: . . . . [ in class ]
Plan 2: . . . . [ in class ]

Indexes: on Actor.lName, on Movie.year

Alternative query plans:

[Figure: two query plan trees over Actor, Cast, and Movie, each applying the selections σ lName='Hanks' and σ year=1995]

Query optimization
Database statistics: histograms, synopses, etc.
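
Physical data independence can be illustrated with a small sketch: the same query text is run before and after the two indexes are created, and the answer does not change. This assumes SQLite via Python's sqlite3 module rather than the course's Postgres or SQL Server; the index names actor_lname and movie_year are hypothetical, and the three tiny tables hold only the sample rows from the earlier slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Actor (id INTEGER, fName TEXT, lName TEXT, gender TEXT);
CREATE TABLE Movie (id INTEGER, name TEXT, year INTEGER);
CREATE TABLE Casts (pid INTEGER, mid INTEGER);
INSERT INTO Actor VALUES (195428, 'Tom', 'Hanks', 'M');
INSERT INTO Actor VALUES (645947, 'Amy', 'Hanks', 'F');
INSERT INTO Movie VALUES (337166, 'Toy Story', 1995);
INSERT INTO Casts VALUES (195428, 337166);
""")

query = """
SELECT Actor.fName, Movie.name
FROM Actor, Casts, Movie
WHERE lName = 'Hanks' AND Actor.id = Casts.pid
  AND Casts.mid = Movie.id AND Movie.year = 1995
"""

# Run the query with no indexes: the engine has to scan.
before = conn.execute(query).fetchall()

# Change the physical design only; the query text stays untouched.
conn.executescript("""
CREATE INDEX actor_lname ON Actor(lName);
CREATE INDEX movie_year ON Movie(year);
""")
after = conn.execute(query).fetchall()

print(before, after)
```

The optimizer is free to pick a different plan once the indexes exist, but the result is the same set of rows.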

Key concept 4: Transactions

Recovery from systems failures:
Transfer $100 from account 1 to account 2:

X = Read(Account_1);
X.amount = X.amount - 100;
Write(Account_1, X);
Y = Read(Account_2);
Y.amount = Y.amount + 100;
Write(Account_2, Y);

CRASH !

What is the problem?
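
A minimal sketch of the recovery idea, assuming SQLite through Python's sqlite3 module: the two updates run inside one transaction, and a simulated crash between them triggers a rollback so the debit is undone. The Account table, the starting balances, and the transfer and balances helpers are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [(1, 500), (2, 200)])
conn.commit()


def transfer(conn, src, dst, amount, crash=False):
    """Move `amount` from src to dst as one atomic unit."""
    try:
        conn.execute(
            "UPDATE Account SET amount = amount - ? WHERE id = ?", (amount, src)
        )
        if crash:
            # Stand-in for a failure between the two writes.
            raise RuntimeError("simulated crash")
        conn.execute(
            "UPDATE Account SET amount = amount + ? WHERE id = ?", (amount, dst)
        )
        conn.commit()
    except RuntimeError:
        # Undo the partial work so no money disappears.
        conn.rollback()


def balances(conn):
    return dict(conn.execute("SELECT id, amount FROM Account"))


transfer(conn, 1, 2, 100, crash=True)
after_crash = balances(conn)   # unchanged: the debit was rolled back

transfer(conn, 1, 2, 100)
after_ok = balances(conn)      # both updates applied together

print(after_crash, after_ok)
```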

Concurrency Control

Overdrafting an account:

User 1:
X = Read(Account);
if (X.amount >= 100) {
    dispense_money( );
    X.amount = X.amount - 100;
}
else error("Insufficient funds");

User 2:
X = Read(Account);
if (X.amount >= 100) {
    dispense_money( );
    X.amount = X.amount - 100;
}
else error("Insufficient funds");

What can go wrong?
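
One possible bad schedule can be simulated in plain Python. This is only a sketch: it assumes each user writes the decremented amount back to the account (the slide's code leaves that step implicit), and the function names run_interleaved and run_serial are made up for the illustration.

```python
def run_interleaved(balance):
    """Both users read the balance before either one writes."""
    dispensed = 0
    x1 = balance  # User 1: X = Read(Account)
    x2 = balance  # User 2: X = Read(Account), sees the same old value
    if x1 >= 100:
        dispensed += 100
        balance = x1 - 100  # User 1 writes back
    if x2 >= 100:
        dispensed += 100
        balance = x2 - 100  # User 2 overwrites with a value based on a stale read
    return balance, dispensed


def run_serial(balance):
    """User 2 starts only after User 1 has finished."""
    dispensed = 0
    for _ in range(2):
        x = balance
        if x >= 100:
            dispensed += 100
            balance = x - 100
    return balance, dispensed


interleaved = run_interleaved(100)
serial = run_serial(100)
print(interleaved)
print(serial)
```

With a starting balance of 100, the interleaved schedule dispenses money twice while the account ends at 0; the serial schedule dispenses once and rejects the second request.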

Transactions

ACID =
• Atomicity ( = recovery)
• Consistency
• Isolation ( = concurrency control)
• Durability

Client/Server Database Architecture

• Single server that stores the database
• Many clients running apps and connecting to DBMS
• Performance bottlenecks:
  – Client/server communication
  – Transactional semantics
• Other architectures:
  – main memory database
  – replicated databases

Two Types of Database Usage

• OLTP (online-transaction-processing)
  – Many updates
  – Many simple “point queries”
  – Few (or no) complex aggregate queries

• Decision-Support
  – Many aggregate/group-by queries
  – Few (or no) updates

Trends in Data Management

• Large scale data analytics: Map/Reduce, Pig, …
• Cloud based database service: AWS, Azure, …
• NoSQL: sacrifice ACID for performance
• Data privacy
• Data provenance
• Complex data analytics: probabilistic databases

Outline of Course Content

1. SQL
2. Relational Calculus, Database Design
3. Constraints, Views
4. Transactions: recovery
5. Transactions: concurrency control
6. XML, XPath, XQuery
7. Data storage, indexes, physical tuning
8. Query execution
9. Query optimization
10. Big Data: Parallel databases, Map/Reduce, Pig Latin
11. Advanced topics: privacy, provenance, probabilistic dbs

Announcement: Homework 1

• Homework 1 is posted
• Due on Monday, Oct. 10
• Tools:
  – Postgres: install on your computer (PREFERRED) or use the installation in the lab
  – SQL Server, for testing only; connect to IPROJSRV: login: your UW email address; password: ……..
• Tasks: create db, import data, create indices, write 11 SQL queries

Outline for rest of today

• Basic SQL (Chapters 5.2, 5.3)
• Aggregates (Chapter 5.5)
• Nulls, Outer joins (Chapter 5.6)
• Subqueries (Chapter 5.4)
  – This is tough! Next lecture we will discuss Relational Calculus (a.k.a. Tuple Calculus, Chapter 4.3). See supplementary text Three Query Language Formalisms

SQL

• Data Definition Language (DDL)
  – Create/alter/delete tables and their attributes
  – Read from the book

• Data Manipulation Language (DML)
  – Query tables, Insert/delete/modify
  – Discussed in class

Tables in SQL

Product

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Annotations on the slide: table name (Product), attribute names (the header row), tuples or rows (the data rows), key.

The Relational Data Model

Data is stored in tables, a.k.a. relations

Each relation has:
1. A schema = name + attributes
   – Product(PName, Price, Category, Manufacturer)
   – Each relation has a key, which we underline
2. An instance = set of rows

SQL departs from the pure relational model in that it allows duplicate tuples
• Set semantics → bag semantics
  {1, 2, 3} → {1, 1, 2, 3, 3, 3}

Data Types in SQL

• Atomic types:
  – Characters: CHAR(20), VARCHAR(50)
  – Numbers: INT, BIGINT, SMALLINT, FLOAT
  – Others: MONEY, DATETIME, …

• Record (aka tuple)
  – Has atomic attributes

• Table (relation)
  – A set of tuples

Simple SQL Query

Product
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

SELECT *
FROM Product
WHERE category='Gadgets'

“selection”

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks

Simple SQL Query

Product
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

SELECT PName, Price, Manufacturer
FROM Product
WHERE Price > '$100'

“selection” and “projection”

PName        Price    Manufacturer
SingleTouch  $149.99  Canon
MultiTouch   $203.99  Hitachi

Details

• Case insensitive:
  SELECT = Select = select
  Product = product
  BUT: 'Seattle' ≠ 'seattle'

• Constants:
  'abc' - yes
  "abc" - no

Eliminating Duplicates

SELECT DISTINCT category
FROM Product

Category
Gadgets
Photography
Household

Compare to:

SELECT category
FROM Product

Category
Gadgets
Gadgets
Photography
Household
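
The difference between the two queries can be checked with a short sketch using Python's sqlite3 module. It assumes SQLite in place of the course servers and stores prices as plain numbers, because SQLite has no MONEY type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT)"
)
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?, ?)",
    [
        ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
        ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
        ("SingleTouch", 149.99, "Photography", "Canon"),
        ("MultiTouch", 203.99, "Household", "Hitachi"),
    ],
)

# Bag semantics: one output row per input row, duplicates kept.
with_duplicates = [c for (c,) in conn.execute("SELECT category FROM Product")]

# DISTINCT collapses the bag back into a set.
without_duplicates = [
    c for (c,) in conn.execute("SELECT DISTINCT category FROM Product")
]

# Sorted only for display; neither query guarantees an order.
print(sorted(with_duplicates))
print(sorted(without_duplicates))
```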

Ordering the Results

SELECT pname, price, manufacturer
FROM Product
WHERE category='Gadgets' AND price > '$10'
ORDER BY price, pname

Ties are broken by the second attribute on the ORDER BY list.
Ordering is ascending, unless you specify the DESC keyword.
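
A small sketch of tie-breaking and DESC, again assuming SQLite through Python's sqlite3 module with prices stored as numbers. The product Agizmo is a hypothetical row added so that two products share the same price.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT)"
)
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?, ?)",
    [
        ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
        ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
        ("Agizmo", 29.99, "Gadgets", "GizmoWorks"),  # hypothetical, creates a price tie
        ("SingleTouch", 149.99, "Photography", "Canon"),
        ("MultiTouch", 203.99, "Household", "Hitachi"),
    ],
)

# Ascending by price; the tie at 29.99 is broken by pname.
ascending = conn.execute(
    "SELECT pname, price FROM Product "
    "WHERE category = 'Gadgets' AND price > 10 "
    "ORDER BY price, pname"
).fetchall()

# DESC applies only to the attribute it follows; pname stays ascending.
descending = conn.execute(
    "SELECT pname, price FROM Product "
    "WHERE category = 'Gadgets' AND price > 10 "
    "ORDER BY price DESC, pname"
).fetchall()

print(ascending)
print(descending)
```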

Product
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

SELECT Category
FROM Product
ORDER BY PName

?

SELECT DISTINCT category
FROM Product
ORDER BY category

?

SELECT DISTINCT category
FROM Product
ORDER BY PName

?

Keys and Foreign Keys

Product
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Company
CName       Country
GizmoWorks  USA
Canon       Japan
Hitachi     Japan

Key: CName (in Company)
Foreign key: Manufacturer (in Product)

Joins

Product (PName, Price, Category, Manufacturer)
Company (CName, Country)

Find all products under $200 manufactured in Japan; return their names and prices.

SELECT PName, Price
FROM Product, Company
WHERE Manufacturer=CName AND Country='Japan' AND Price <= '$200'

The condition Manufacturer=CName is the join between Product and Company.

Joins

Product
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Company
Cname       Country
GizmoWorks  USA
Canon       Japan
Hitachi     Japan

SELECT PName, Price
FROM Product, Company
WHERE Manufacturer=CName AND Country='Japan' AND Price <= '$200'

PName        Price
SingleTouch  $149.99
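
The join can be reproduced with a short sketch in Python's sqlite3 module. It assumes SQLite instead of the course servers and stores prices as numbers, so the comparison is written price <= 200 rather than against a '$200' string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT)"
)
conn.execute("CREATE TABLE Company (cname TEXT PRIMARY KEY, country TEXT)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?, ?)",
    [
        ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
        ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
        ("SingleTouch", 149.99, "Photography", "Canon"),
        ("MultiTouch", 203.99, "Household", "Hitachi"),
    ],
)
conn.executemany(
    "INSERT INTO Company VALUES (?, ?)",
    [("GizmoWorks", "USA"), ("Canon", "Japan"), ("Hitachi", "Japan")],
)

# manufacturer = cname pairs each product with its company;
# the other two conditions filter the paired rows.
japanese_under_200 = conn.execute(
    "SELECT pname, price FROM Product, Company "
    "WHERE manufacturer = cname AND country = 'Japan' AND price <= 200"
).fetchall()

print(japanese_under_200)
```

MultiTouch is made in Japan but costs more than 200, and the GizmoWorks products are cheap but made in the USA, so only SingleTouch survives.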

Tuple Variables

Product (PName, Price, Category, Manufacturer)
Company (CName, Country)
Person(name, Country, Worksfor)

SELECT DISTINCT name, country
FROM Person, Company
WHERE worksfor = cname

Which country?

SELECT DISTINCT Person.name, Company.country
FROM Person, Company
WHERE Person.worksfor = Company.cname

SELECT DISTINCT x.name, y.country
FROM Person AS x, Company AS y
WHERE x.worksfor = y.cname

In Class

Product (pname, price, category, manufacturer)
Company (cname, country)

Find all Chinese companies that manufacture products in the 'toy' category

SELECT cname
FROM
WHERE

In Class

Product (pname, price, category, manufacturer)
Company (cname, country)

Find all Chinese companies that manufacture products both in the 'electronic' and 'toy' categories

SELECT cname
FROM
WHERE

The Nested Loop Semantics of SQL Queries

SELECT a1, a2, …, ak
FROM R1 AS x1, R2 AS x2, …, Rn AS xn
WHERE Conditions

Answer = {}
for x1 in R1 do
  for x2 in R2 do
    …..
      for xn in Rn do
        if Conditions
          then Answer = Answer ∪ {(a1,…,ak)}
return Answer
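
The pseudocode above can be sketched as a small Python function. This is an illustration of the semantics, not of how a real engine evaluates queries; the function name nested_loop_query and its parameters are hypothetical. The answer is collected in a list rather than a set, on the assumption that we want SQL's bag semantics without DISTINCT.

```python
from itertools import product


def nested_loop_query(relations, condition, select):
    """Evaluate SELECT/FROM/WHERE by trying every combination of tuples.

    relations: list of lists of tuples (R1, ..., Rn)
    condition: function of the n tuple variables, returns bool (WHERE)
    select:    function of the n tuple variables, returns the output row
    """
    answer = []
    # product(*relations) plays the role of the n nested for-loops.
    for xs in product(*relations):
        if condition(*xs):
            answer.append(select(*xs))
    return answer


products = [
    ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
    ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
    ("SingleTouch", 149.99, "Photography", "Canon"),
    ("MultiTouch", 203.99, "Household", "Hitachi"),
]
companies = [("GizmoWorks", "USA"), ("Canon", "Japan"), ("Hitachi", "Japan")]


def is_cheap_japanese(p, c):
    return p[3] == c[0] and c[1] == "Japan" and p[1] <= 200


def name_and_price(p, c):
    return (p[0], p[1])


result = nested_loop_query([products, companies], is_cheap_japanese, name_and_price)

# If any relation in FROM is empty, the loops never run and the answer is empty.
empty_result = nested_loop_query([products, []], is_cheap_japanese, name_and_price)

print(result)
print(empty_result)
```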

Using the Formal Semantics

What do these queries compute?

SELECT DISTINCT R.A
FROM R, S
WHERE R.A=S.A

Returns R ∩ S

SELECT DISTINCT R.A
FROM R, S, T
WHERE R.A=S.A OR R.A=T.A

If S ≠ ∅ and T ≠ ∅ then returns R ∩ (S ∪ T), else returns ∅
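
The empty-relation case is easy to check by running the three-table query. A minimal sketch, assuming SQLite via Python's sqlite3 module and single-column integer tables; the helper name query_r_s_t is hypothetical.

```python
import sqlite3


def query_r_s_t(r, s, t):
    """Load R, S, T (lists of integers) and run the three-table query."""
    conn = sqlite3.connect(":memory:")
    for name, values in (("R", r), ("S", s), ("T", t)):
        conn.execute(f"CREATE TABLE {name} (A INTEGER)")
        conn.executemany(
            f"INSERT INTO {name} VALUES (?)", [(v,) for v in values]
        )
    rows = conn.execute(
        "SELECT DISTINCT R.A FROM R, S, T WHERE R.A = S.A OR R.A = T.A"
    ).fetchall()
    conn.close()
    return sorted(a for (a,) in rows)


both_nonempty = query_r_s_t([1, 2, 3], [2], [3])   # R ∩ (S ∪ T)
t_empty = query_r_s_t([1, 2, 3], [2], [])          # no T tuple to loop over

print(both_nonempty)
print(t_empty)
```

When T is empty, the innermost loop of the nested loop semantics never executes, so even the value 2, which is in both R and S, is not returned.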

Aggregation

Product (pname, price, category, manufacturer)
Company (cname, country)

SELECT count(*)
FROM Product

SELECT sum(price)
FROM Product
WHERE manufacturer='GizmoWorks'

SQL supports several aggregation operations: sum, count, min, max, avg

Except count, all aggregations apply to a single attribute

Aggregation: Count

Product (pname, price, category, manufacturer)
Company (cname, country)

COUNT applies to duplicates, unless otherwise stated:

SELECT count(category)
FROM Product
WHERE price > '$20'

If category has no nulls, then count(category)=count(*)

We probably want:

SELECT count(DISTINCT category)
FROM Product
WHERE price > '$20'
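
The three ways of counting can be compared side by side in a short sketch, assuming SQLite via Python's sqlite3 module with prices stored as numbers. SuperGizmo and Mystery are hypothetical rows, added so that the filtered rows contain a duplicate category and a NULL category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT)"
)
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?, ?)",
    [
        ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
        ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
        ("SingleTouch", 149.99, "Photography", "Canon"),
        ("MultiTouch", 203.99, "Household", "Hitachi"),
        ("SuperGizmo", 49.99, "Gadgets", "GizmoWorks"),  # hypothetical duplicate category
        ("Mystery", 99.99, None, "Canon"),               # hypothetical NULL category
    ],
)


def count_with(expr):
    sql = f"SELECT count({expr}) FROM Product WHERE price > 20"
    return conn.execute(sql).fetchone()[0]


count_rows = count_with("*")                      # every row that passes WHERE
count_category = count_with("category")           # skips the NULL category
count_distinct = count_with("DISTINCT category")  # skips NULL and duplicates

print(count_rows, count_category, count_distinct)
```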

Grouping and Aggregation

Product (pname, price, category, manufacturer)
Company (cname, country)

For each manufacturer, find total number of its products under $200.

SELECT manufacturer, count(*) AS total
FROM Product
WHERE price < '$200'
GROUP BY manufacturer

Let’s see what this means…

Grouping and Aggregation

1. Compute the FROM and WHERE clauses.
2. Group by the attributes in the GROUP BY clause.
3. Compute the SELECT clause, including aggregates.

1&2. FROM-WHERE-GROUPBY

SELECT manufacturer, count(*) AS total
FROM Product
WHERE price < '$200'
GROUP BY manufacturer

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

3. SELECT

SELECT manufacturer, count(*) AS total
FROM Product
WHERE price < '$200'
GROUP BY manufacturer

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

count(*)  Manufacturer
2         GizmoWorks
1         Canon
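
The grouping query can be run on the sample table with a short sketch, assuming SQLite via Python's sqlite3 module and numeric prices. An ORDER BY is added here only so the printed result is deterministic; it is not part of the slide's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT)"
)
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?, ?)",
    [
        ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
        ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
        ("SingleTouch", 149.99, "Photography", "Canon"),
        ("MultiTouch", 203.99, "Household", "Hitachi"),
    ],
)

# WHERE runs first, so MultiTouch is dropped before any group is formed
# and Hitachi never appears in the output.
totals = conn.execute(
    "SELECT manufacturer, count(*) AS total "
    "FROM Product "
    "WHERE price < 200 "
    "GROUP BY manufacturer "
    "ORDER BY manufacturer"
).fetchall()

print(totals)
```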

HAVING Clause

Product (pname, price, category, manufacturer)
Company (cname, country)

SELECT manufacturer, count(*) AS total
FROM Product
WHERE price < '$200'
GROUP BY manufacturer
HAVING min(price) > '$20'

Same query, except that we return only those manufacturers that make only products with price > $20

HAVING clause contains conditions on aggregates.
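
A minimal sketch of the HAVING query on the sample data, assuming SQLite via Python's sqlite3 module and numeric prices. The version without HAVING is run as well so the effect of the group filter is visible; the ORDER BY is added only for deterministic output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT)"
)
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?, ?)",
    [
        ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
        ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
        ("SingleTouch", 149.99, "Photography", "Canon"),
        ("MultiTouch", 203.99, "Household", "Hitachi"),
    ],
)

base = (
    "SELECT manufacturer, count(*) AS total "
    "FROM Product WHERE price < 200 "
    "GROUP BY manufacturer "
)

without_having = conn.execute(base + "ORDER BY manufacturer").fetchall()

# HAVING is checked once per group, after grouping: GizmoWorks is dropped
# because its cheapest product (19.99) is not above 20.
with_having = conn.execute(
    base + "HAVING min(price) > 20 ORDER BY manufacturer"
).fetchall()

print(without_having)
print(with_having)
```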

General form of Grouping and Aggregation

SELECT S
FROM R1,…,Rn
WHERE C1
GROUP BY a1,…,ak
HAVING C2

S = may contain attributes a1,…,ak and/or any aggregates but NO OTHER ATTRIBUTES
C1 = is any condition on the attributes in R1,…,Rn
C2 = is any condition on aggregate expressions

Why?

General form of Grouping and Aggregation

Evaluation steps:
1. Evaluate FROM-WHERE, apply condition C1
2. Group by the attributes a1,…,ak
3. Apply condition C2 to each group (may have aggregates)
4. Compute aggregates in S and return the result

NULLS in SQL

• Whenever we don’t have a value, we can put a NULL
• Can mean many things:
  – Value does not exist
  – Value exists but is unknown
  – Value not applicable
  – Etc.
• The schema specifies for each attribute whether it can be null (nullable attribute) or not
• How does SQL cope with tables that have NULLs?

Null Values

• If x = NULL then 4*(3-x)/7 is still NULL
• If x = NULL then x='Joe' is UNKNOWN
• In SQL there are three boolean values:
  FALSE   = 0
  UNKNOWN = 0.5
  TRUE    = 1

Null Values

• C1 AND C2 = min(C1, C2)
• C1 OR C2 = max(C1, C2)
• NOT C1 = 1 – C1

SELECT *
FROM Person
WHERE (age < 25) AND (height > 6 OR weight > 190)

E.g. age=20, height=NULL, weight=200

Rule in SQL: include only tuples that yield TRUE
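
The min/max rules can be worked through in a few lines of Python. This is a sketch of the numeric encoding from the slides, not of any engine's internals; None stands in for NULL, and the helper names (compare, sql_and, sql_or, sql_not, keep_person) are hypothetical.

```python
import operator

FALSE, UNKNOWN, TRUE = 0.0, 0.5, 1.0


def compare(op, x, y):
    """A comparison involving NULL (None) is UNKNOWN."""
    if x is None or y is None:
        return UNKNOWN
    return TRUE if op(x, y) else FALSE


def sql_and(c1, c2):
    return min(c1, c2)


def sql_or(c1, c2):
    return max(c1, c2)


def sql_not(c1):
    return 1 - c1


def keep_person(age, height, weight):
    """WHERE (age < 25) AND (height > 6 OR weight > 190)."""
    cond = sql_and(
        compare(operator.lt, age, 25),
        sql_or(
            compare(operator.gt, height, 6),
            compare(operator.gt, weight, 190),
        ),
    )
    # Only tuples whose condition is TRUE are included.
    return cond == TRUE


# The slide's example: min(1, max(0.5, 1)) = 1, so the tuple is kept.
slide_example = keep_person(age=20, height=None, weight=200)

# With weight=150: min(1, max(0.5, 0)) = 0.5, UNKNOWN, so it is dropped.
lighter_person = keep_person(age=20, height=None, weight=150)

print(slide_example, lighter_person)
```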

Null Values

Unexpected behavior:

SELECT *
FROM Person
WHERE age < 25 OR age >= 25

Some Persons are not included!

Null Values

Can test for NULL explicitly:
– x IS NULL
– x IS NOT NULL

SELECT *
FROM Person
WHERE age < 25 OR age >= 25 OR age IS NULL

Now it includes all Persons
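
Both queries can be tried on a tiny table with a short sketch, assuming SQLite via Python's sqlite3 module. The Person table here has only name and age columns, and the three people are made-up rows; Cy has a NULL age.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?)",
    [("Ann", 20), ("Bob", 30), ("Cy", None)],  # hypothetical rows; None becomes NULL
)

# For Cy both comparisons are UNKNOWN, so the OR is UNKNOWN and the row is dropped.
without_null_test = [
    n for (n,) in conn.execute(
        "SELECT name FROM Person WHERE age < 25 OR age >= 25 ORDER BY name"
    )
]

# IS NULL yields TRUE for Cy, so the whole condition becomes TRUE.
with_null_test = [
    n for (n,) in conn.execute(
        "SELECT name FROM Person "
        "WHERE age < 25 OR age >= 25 OR age IS NULL ORDER BY name"
    )
]

print(without_null_test)
print(with_null_test)
```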

Outerjoins

Product (pname, price, category, manufacturer)
Company (cname, country)

Normally, joins are “inner joins”:

SELECT x.country, y.pname
FROM Company x JOIN Product y
  ON x.cname = y.manufacturer

Same as:

SELECT x.country, y.pname
FROM Company x, Product y
WHERE x.cname = y.manufacturer

But countries that don’t manufacture will not be listed!

Page 65

Outerjoins

Product (pname, price, category, manufacturer)
Company (cname, country)

If we want to see the companies that don’t produce anything, then we use an outer join:

SELECT x.country, y.pname
FROM Company x LEFT OUTER JOIN Product y ON x.cname = y.manufacturer

Page 66

Product

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Company

Cname       Country
GizmoWorks  USA
Canon       Japan
Hitachi     Japan
MuseumPass  Vatican

Result of the left outer join

Country  PName
USA      Gizmo
USA      Powergizmo
Japan    SingleTouch
Japan    MultiTouch
Vatican  NULL
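The difference between the inner join and the left outer join can be reproduced on the sample tables. This is a sketch using Python's sqlite3 module; storing price as a number (REAL) rather than as a '$19.99' string is an assumption made here for convenience.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (cname TEXT, country TEXT);
CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT);
INSERT INTO Company VALUES
  ('GizmoWorks', 'USA'), ('Canon', 'Japan'),
  ('Hitachi', 'Japan'), ('MuseumPass', 'Vatican');
INSERT INTO Product VALUES
  ('Gizmo', 19.99, 'Gadgets', 'GizmoWorks'),
  ('Powergizmo', 29.99, 'Gadgets', 'GizmoWorks'),
  ('SingleTouch', 149.99, 'Photography', 'Canon'),
  ('MultiTouch', 203.99, 'Household', 'Hitachi');
""")

inner = conn.execute("""
    SELECT x.country, y.pname
    FROM Company x JOIN Product y ON x.cname = y.manufacturer
""").fetchall()

outer = conn.execute("""
    SELECT x.country, y.pname
    FROM Company x LEFT OUTER JOIN Product y ON x.cname = y.manufacturer
""").fetchall()

# MuseumPass has no products: it is missing from the inner join and
# padded with NULL (None in Python) in the left outer join.
only_in_outer = set(outer) - set(inner)
print(only_in_outer)
```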

Page 67

Application

Product (pname, price, category, manufacturer)
Company (cname, country)

Compute the total number of products made by each country

SELECT x.country, count(*)
FROM Company x, Product y
WHERE x.cname = y.manufacturer
GROUP BY x.country

What’s wrong?

Page 68

Application

Product (pname, price, category, manufacturer)
Company (cname, country)

Compute the total number of products made by each country

SELECT x.country, count(y.pname)
FROM Company x LEFT OUTER JOIN Product y ON x.cname = y.manufacturer
GROUP BY x.country

Now we also get the countries that make 0 products

Note: we don’t use count(*). WHY?
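One way to explore the count(*) question is to run both variants on the sample data. The sketch below uses Python's sqlite3 module; numeric prices and the helper name counts are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (cname TEXT, country TEXT);
CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT);
INSERT INTO Company VALUES
  ('GizmoWorks', 'USA'), ('Canon', 'Japan'),
  ('Hitachi', 'Japan'), ('MuseumPass', 'Vatican');
INSERT INTO Product VALUES
  ('Gizmo', 19.99, 'Gadgets', 'GizmoWorks'),
  ('Powergizmo', 29.99, 'Gadgets', 'GizmoWorks'),
  ('SingleTouch', 149.99, 'Photography', 'Canon'),
  ('MultiTouch', 203.99, 'Household', 'Hitachi');
""")

def counts(aggregate):
    # Hypothetical helper: same outer join, different aggregate expression.
    sql = """
        SELECT x.country, {agg}
        FROM Company x LEFT OUTER JOIN Product y ON x.cname = y.manufacturer
        GROUP BY x.country
    """.format(agg=aggregate)
    return dict(conn.execute(sql).fetchall())

# count(*) counts rows, including the NULL-padded row for MuseumPass.
with_star = counts("count(*)")
# count(y.pname) skips NULLs, so a country without products gets 0.
with_column = counts("count(y.pname)")

print(with_star)
print(with_column)
```

On this data the only difference is the Vatican row: count(*) reports 1 because the padded row is still a row, while count(y.pname) reports 0.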

Page 69

Outer Joins

• Left outer join:
  – Include the left tuple even if there’s no match
• Right outer join:
  – Include the right tuple even if there’s no match
• Full outer join:
  – Include both the left and right tuples even if there’s no match

Page 70

Subqueries

• A subquery is another SQL query nested inside a larger query
• Such inner-outer queries are called nested queries
• A subquery may occur in:
  1. A SELECT clause
  2. A FROM clause
  3. A WHERE clause

Rule of thumb: avoid writing nested queries when possible; sometimes it is impossible to avoid them

Page 71

1. Subqueries in SELECT

Product (pname, price, category, manufacturer)
Company (cname, country)

For each product return the country that manufactures it

SELECT X.pname, (SELECT Y.country
                 FROM Company Y
                 WHERE Y.cname = X.manufacturer)
FROM Product X

What happens if a subquery returns more than one country?

Page 72

1. Subqueries in SELECT

Product (pname, price, category, manufacturer)
Company (cname, country)

Whenever possible, don’t use nested queries:

SELECT X.pname, (SELECT Y.country
                 FROM Company Y
                 WHERE Y.cname = X.manufacturer)
FROM Product X

=

SELECT pname, country
FROM Product, Company
WHERE cname = manufacturer

We have “unnested” the query
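The nested and unnested forms can be compared on the sample data. The sketch below uses Python's sqlite3 module with numeric prices (an assumption). The extra product 'Widget' made by 'Acme' is hypothetical and is added only to show one case where the two forms differ: a product whose manufacturer is missing from Company.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (cname TEXT, country TEXT);
CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT);
INSERT INTO Company VALUES
  ('GizmoWorks', 'USA'), ('Canon', 'Japan'),
  ('Hitachi', 'Japan'), ('MuseumPass', 'Vatican');
INSERT INTO Product VALUES
  ('Gizmo', 19.99, 'Gadgets', 'GizmoWorks'),
  ('Powergizmo', 29.99, 'Gadgets', 'GizmoWorks'),
  ('SingleTouch', 149.99, 'Photography', 'Canon'),
  ('MultiTouch', 203.99, 'Household', 'Hitachi');
""")

NESTED = """
    SELECT X.pname, (SELECT Y.country FROM Company Y
                     WHERE Y.cname = X.manufacturer)
    FROM Product X
"""
UNNESTED = """
    SELECT pname, country FROM Product, Company
    WHERE cname = manufacturer
"""

def rows(sql):
    # pname is unique here, so sorting never compares the country values.
    return sorted(conn.execute(sql).fetchall())

same_on_sample = rows(NESTED) == rows(UNNESTED)

# Hypothetical product whose manufacturer has no row in Company.
conn.execute("INSERT INTO Product VALUES ('Widget', 9.99, 'Gadgets', 'Acme')")
extra_in_nested = set(rows(NESTED)) - set(rows(UNNESTED))
print(same_on_sample, extra_in_nested)
```

The nested form keeps every product and reports NULL when no company matches, while the join drops such products, so the two agree only when every manufacturer appears in Company.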

Page 73

1. Subqueries in SELECT

Product (pname, price, category, manufacturer)
Company (cname, country)

Compute the number of products made by each country

SELECT DISTINCT x.country, (SELECT count(*)
                            FROM Company y, Product
                            WHERE y.cname = manufacturer
                              AND y.country = x.country)
FROM Company x

Better: we can unnest by using a GROUP BY

SELECT x.country, count(*)
FROM Company x, Product z
WHERE x.cname = z.manufacturer
GROUP BY x.country

Page 74

GROUP BY vs. Nested Queries

SELECT manufacturer, count(*) AS total
FROM Product
WHERE price < '$200'
GROUP BY manufacturer

SELECT DISTINCT x.manufacturer, (SELECT count(*)
                                 FROM Product y
                                 WHERE x.manufacturer = y.manufacturer
                                   AND y.price < '$200') AS total
FROM Product x
WHERE x.price < '$200'

Why does the price condition appear twice?

Page 75

2. Subqueries in FROM

Product (pname, price, category, manufacturer)
Company (cname, country)

Find all products whose price is > 20 and < 30

SELECT *
FROM (SELECT *
      FROM Product AS Y
      WHERE Y.price > '$20') AS x
WHERE x.price < '$30'

Unnest this query!
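One possible unnesting of the FROM subquery merges the two price conditions into a single WHERE clause. The sketch below checks this on the sample Product table with Python's sqlite3 module; prices are stored as numbers here (an assumption), so the comparisons are written against 20 and 30 rather than '$20' and '$30'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT);
INSERT INTO Product VALUES
  ('Gizmo', 19.99, 'Gadgets', 'GizmoWorks'),
  ('Powergizmo', 29.99, 'Gadgets', 'GizmoWorks'),
  ('SingleTouch', 149.99, 'Photography', 'Canon'),
  ('MultiTouch', 203.99, 'Household', 'Hitachi');
""")

nested = conn.execute("""
    SELECT *
    FROM (SELECT * FROM Product AS Y WHERE Y.price > 20) AS x
    WHERE x.price < 30
""").fetchall()

# Unnested: both filters apply to the same Product tuple.
unnested = conn.execute("""
    SELECT * FROM Product WHERE price > 20 AND price < 30
""").fetchall()

print(nested)
print(unnested)
```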

Page 76

3. Subqueries in WHERE

Existential quantifiers

Product (pname, price, category, manufacturer)
Company (cname, country)

Find all countries that make some products with price < 100

Using EXISTS:

SELECT DISTINCT x.country
FROM Company x
WHERE EXISTS (SELECT *
              FROM Product y
              WHERE y.manufacturer = x.cname AND y.price < '$100')

Correlated subquery: uses x from the outer query

Page 77

3. Subqueries in WHERE

Existential quantifiers

Product (pname, price, category, manufacturer)
Company (cname, country)

Find all countries that make some products with price < 100

Predicate Calculus (a.k.a. First Order Logic)

{ y | ∃x. Company(x,y) ∧ (∃z.∃p.∃c. Product(z,p,c,x) ∧ p < 100) }

Page 78

3. Subqueries in WHERE

Existential quantifiers

Product (pname, price, category, manufacturer)
Company (cname, country)

Find all countries that make some products with price < 100

Using IN:

SELECT DISTINCT country
FROM Company
WHERE cname IN (SELECT Product.manufacturer
                FROM Product
                WHERE Product.price < '$100')

De-correlated subquery

Page 79

3. Subqueries in WHERE

Existential quantifiers

Product (pname, price, category, manufacturer)
Company (cname, country)

Find all countries that make some products with price < 100

Using ANY:

SELECT DISTINCT Company.country
FROM Company
WHERE '$100' > ANY (SELECT price
                    FROM Product
                    WHERE manufacturer = cname)

Page 80

3. Subqueries in WHERE

Existential quantifiers

Product (pname, price, category, manufacturer)
Company (cname, country)

Find all countries that make some products with price < 100

Now let’s unnest it:

SELECT DISTINCT x.country
FROM Company x, Product y
WHERE x.cname = y.manufacturer AND y.price < '$100'

Existential quantifiers are easy! :-)
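The existential query has several equivalent formulations; the sketch below runs three of them (EXISTS, IN, and the unnested join) on the sample data with Python's sqlite3 module. Numeric prices and the helper name countries are assumptions. The ANY formulation is left out because SQLite does not implement the quantified comparisons ANY and ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (cname TEXT, country TEXT);
CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT);
INSERT INTO Company VALUES
  ('GizmoWorks', 'USA'), ('Canon', 'Japan'),
  ('Hitachi', 'Japan'), ('MuseumPass', 'Vatican');
INSERT INTO Product VALUES
  ('Gizmo', 19.99, 'Gadgets', 'GizmoWorks'),
  ('Powergizmo', 29.99, 'Gadgets', 'GizmoWorks'),
  ('SingleTouch', 149.99, 'Photography', 'Canon'),
  ('MultiTouch', 203.99, 'Household', 'Hitachi');
""")

def countries(sql):
    return sorted(row[0] for row in conn.execute(sql))

with_exists = countries("""
    SELECT DISTINCT x.country FROM Company x
    WHERE EXISTS (SELECT * FROM Product y
                  WHERE y.manufacturer = x.cname AND y.price < 100)
""")
with_in = countries("""
    SELECT DISTINCT country FROM Company
    WHERE cname IN (SELECT manufacturer FROM Product WHERE price < 100)
""")
with_join = countries("""
    SELECT DISTINCT x.country FROM Company x, Product y
    WHERE x.cname = y.manufacturer AND y.price < 100
""")

print(with_exists, with_in, with_join)
```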

Page 81

3. Subqueries in WHERE

Universal quantifiers

Product (pname, price, category, manufacturer)
Company (cname, country)

Find the countries of all companies that make only products with price < 100

Universal quantifiers are hard! :-(

Page 82

3. Subqueries in WHERE

Universal quantifiers

Product (pname, price, category, manufacturer)
Company (cname, country)

Find the countries of all companies that make only products with price < 100

Predicate Calculus (a.k.a. First Order Logic)

{ y | ∃x. Company(x,y) ∧ (∀z.∀p.∀c. Product(z,p,c,x) ⇒ p < 100) }

Page 83

3. Subqueries in WHERE

De Morgan’s Laws:
¬(A ∧ B) = ¬A ∨ ¬B
¬(A ∨ B) = ¬A ∧ ¬B
¬∀x. P(x) = ∃x. ¬P(x)
¬∃x. P(x) = ∀x. ¬P(x)
¬(A ⇒ B) = A ∧ ¬B

{ y | ∃x. Company(x,y) ∧ (∀z.∀p.∀c. Product(z,p,c,x) ⇒ p < 100) }

=

{ y | ∃x. Company(x,y) ∧ ¬(∃z.∃p.∃c. Product(z,p,c,x) ∧ p ≥ 100) }

=

{ y | ∃x. Company(x,y) } − { y | ∃x. Company(x,y) ∧ (∃z.∃p.∃c. Product(z,p,c,x) ∧ p ≥ 100) }

Page 84

3. Subqueries in WHERE

1. Find the other companies, i.e. such that some product has price ≥ 100

SELECT DISTINCT country
FROM Company
WHERE cname IN (SELECT manufacturer
                FROM Product
                WHERE price >= '$100')

2. Find all companies such that all their products have price < 100

SELECT DISTINCT country
FROM Company
WHERE cname NOT IN (SELECT manufacturer
                    FROM Product
                    WHERE price >= '$100')

Page 85

3. Subqueries in WHERE

Universal quantifiers

Product (pname, price, category, manufacturer)
Company (cname, country)

Find the countries of all companies that make only products with price < 100

Using EXISTS:

SELECT DISTINCT x.country
FROM Company x
WHERE NOT EXISTS (SELECT *
                  FROM Product y
                  WHERE y.manufacturer = x.cname AND y.price >= '$100')

Page 86

3. Subqueries in WHERE

Universal quantifiers

Product (pname, price, category, manufacturer)
Company (cname, country)

Find the countries of all companies that make only products with price < 100

Using ALL:

SELECT DISTINCT Company.country
FROM Company
WHERE '$100' > ALL (SELECT price
                    FROM Product
                    WHERE manufacturer = cname)
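The universal query can be tried on the sample data with the NOT IN and NOT EXISTS formulations. This is a sketch using Python's sqlite3 module with numeric prices (an assumption); the ALL formulation is omitted because SQLite does not implement quantified comparisons.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (cname TEXT, country TEXT);
CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT);
INSERT INTO Company VALUES
  ('GizmoWorks', 'USA'), ('Canon', 'Japan'),
  ('Hitachi', 'Japan'), ('MuseumPass', 'Vatican');
INSERT INTO Product VALUES
  ('Gizmo', 19.99, 'Gadgets', 'GizmoWorks'),
  ('Powergizmo', 29.99, 'Gadgets', 'GizmoWorks'),
  ('SingleTouch', 149.99, 'Photography', 'Canon'),
  ('MultiTouch', 203.99, 'Household', 'Hitachi');
""")

def countries(sql):
    return sorted(row[0] for row in conn.execute(sql))

# "All products cost < 100" is rewritten as "no product costs >= 100".
with_not_exists = countries("""
    SELECT DISTINCT x.country FROM Company x
    WHERE NOT EXISTS (SELECT * FROM Product y
                      WHERE y.manufacturer = x.cname AND y.price >= 100)
""")
with_not_in = countries("""
    SELECT DISTINCT country FROM Company
    WHERE cname NOT IN (SELECT manufacturer FROM Product WHERE price >= 100)
""")

print(with_not_exists, with_not_in)
```

Note that Vatican appears in the answer: MuseumPass makes no products at all, so the condition "all its products cost < 100" holds vacuously.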

Page 87

Question for Database Fans and their Friends

Find the countries of all companies that make only products with price < 100

• Can we unnest this query?

Page 88

Monotone Queries

• A query Q is monotone if:
  – Whenever we add tuples to one or more of the tables…
  – … the answer to the query cannot contain fewer tuples
• Fact: all unnested queries are monotone
  – Proof: using the “nested for loops” semantics
• Fact: A query with a universal quantifier is not monotone
• Consequence: we cannot unnest a query with a universal quantifier
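The monotonicity argument can be illustrated by adding one tuple and comparing answers before and after. The sketch below uses Python's sqlite3 module; numeric prices, the helper name answer, and the added product 'Relic' are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (cname TEXT, country TEXT);
CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT);
INSERT INTO Company VALUES
  ('GizmoWorks', 'USA'), ('Canon', 'Japan'),
  ('Hitachi', 'Japan'), ('MuseumPass', 'Vatican');
INSERT INTO Product VALUES
  ('Gizmo', 19.99, 'Gadgets', 'GizmoWorks'),
  ('Powergizmo', 29.99, 'Gadgets', 'GizmoWorks'),
  ('SingleTouch', 149.99, 'Photography', 'Canon'),
  ('MultiTouch', 203.99, 'Household', 'Hitachi');
""")

EXISTENTIAL = """
    SELECT DISTINCT x.country FROM Company x, Product y
    WHERE x.cname = y.manufacturer AND y.price < 100
"""
UNIVERSAL = """
    SELECT DISTINCT x.country FROM Company x
    WHERE NOT EXISTS (SELECT * FROM Product y
                      WHERE y.manufacturer = x.cname AND y.price >= 100)
"""

def answer(sql):
    return {row[0] for row in conn.execute(sql)}

exist_before, univ_before = answer(EXISTENTIAL), answer(UNIVERSAL)
# Add one tuple: a hypothetical expensive product made by MuseumPass.
conn.execute("INSERT INTO Product VALUES ('Relic', 500.0, 'Art', 'MuseumPass')")
exist_after, univ_after = answer(EXISTENTIAL), answer(UNIVERSAL)

# The unnested (existential) answer can only grow or stay the same ...
existential_kept_all = exist_before <= exist_after
# ... but the universal answer lost Vatican after the insertion.
universal_lost = univ_before - univ_after
print(existential_kept_all, universal_lost)
```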

Page 89

Queries that must be nested

Rule of Thumb: Non-monotone queries cannot be unnested. In particular, queries with a universal quantifier cannot be unnested

Page 90

More SQL

Read the following commands in the book

• CREATE TABLE
• INSERT
• DELETE
• UPDATE

They are easy; but we need/use them all the time in class, and in the homework assignments

Page 91

Advanced SQLizing

1. Unnesting Aggregates
2. Finding witnesses

Page 92

Unnesting Aggregates

Product (pname, price, category, manufacturer)
Company (cname, country)

For each category, find the maximum price

SELECT DISTINCT X.category, (SELECT max(Y.price)
                             FROM Product Y
                             WHERE X.category = Y.category)
FROM Product X

SELECT category, max(price)
FROM Product
GROUP BY category

Equivalent queries

Note: no need for DISTINCT (DISTINCT is the same as GROUP BY)

Page 93

Unnesting Aggregates

Product (pname, price, category, manufacturer)
Company (cname, country)

Find the number of products made in each country

SELECT DISTINCT X.country, (SELECT count(*)
                            FROM Company Y, Product Z
                            WHERE Y.cname = Z.manufacturer
                              AND Y.country = X.country)
FROM Company X

SELECT X.country, count(*)
FROM Company X, Product Y
WHERE X.cname = Y.manufacturer
GROUP BY X.country

They are NOT equivalent! (WHY?)
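Running both queries on the sample data shows where they diverge. This is a sketch with Python's sqlite3 module, assuming numeric prices; the variable names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (cname TEXT, country TEXT);
CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT);
INSERT INTO Company VALUES
  ('GizmoWorks', 'USA'), ('Canon', 'Japan'),
  ('Hitachi', 'Japan'), ('MuseumPass', 'Vatican');
INSERT INTO Product VALUES
  ('Gizmo', 19.99, 'Gadgets', 'GizmoWorks'),
  ('Powergizmo', 29.99, 'Gadgets', 'GizmoWorks'),
  ('SingleTouch', 149.99, 'Photography', 'Canon'),
  ('MultiTouch', 203.99, 'Household', 'Hitachi');
""")

# Nested: one output row per country in Company, even with no products.
nested = dict(conn.execute("""
    SELECT DISTINCT X.country, (SELECT count(*)
                                FROM Company Y, Product Z
                                WHERE Y.cname = Z.manufacturer
                                  AND Y.country = X.country)
    FROM Company X
""").fetchall())

# GROUP BY over the join: groups exist only for countries with a joined row.
grouped = dict(conn.execute("""
    SELECT X.country, count(*)
    FROM Company X, Product Y
    WHERE X.cname = Y.manufacturer
    GROUP BY X.country
""").fetchall())

print(nested)
print(grouped)
```

On this data the nested query reports Vatican with a count of 0, while the GROUP BY query has no Vatican group at all.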

Page 94

More Unnesting

Author(login,name)
Wrote(login,url)

• Find authors who wrote ≥ 10 documents:
• Attempt 1: with nested queries

SELECT DISTINCT Author.name
FROM Author
WHERE (SELECT count(Wrote.url)
       FROM Wrote
       WHERE Author.login = Wrote.login) >= 10

This is SQL by a novice

Page 95

More Unnesting

• Find all authors who wrote at least 10 documents:
• Attempt 2: SQL style (with GROUP BY)

SELECT DISTINCT Author.name
FROM Author, Wrote
WHERE Author.login = Wrote.login
GROUP BY Author.name
HAVING count(Wrote.url) >= 10

This is SQL by an expert
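The GROUP BY / HAVING form can be tried on a tiny data set. In this sketch (Python's sqlite3 module) the authors, logins and URLs are invented: Alice wrote exactly 10 documents and Bob wrote 3. The threshold is written as >= 10 to match "at least 10"; with > 10 an author with exactly 10 documents would be missed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Author (login TEXT, name TEXT);
CREATE TABLE Wrote (login TEXT, url TEXT);
""")
conn.executemany("INSERT INTO Author VALUES (?, ?)",
                 [("alice", "Alice"), ("bob", "Bob")])
docs = ([("alice", "a%d.html" % i) for i in range(10)] +
        [("bob", "b%d.html" % i) for i in range(3)])
conn.executemany("INSERT INTO Wrote VALUES (?, ?)", docs)

GROUPED = """
    SELECT Author.name
    FROM Author, Wrote
    WHERE Author.login = Wrote.login
    GROUP BY Author.name
    HAVING count(Wrote.url) {op} 10
"""

def names(sql):
    return [row[0] for row in conn.execute(sql)]

at_least_10 = names(GROUPED.format(op=">="))
more_than_10 = names(GROUPED.format(op=">"))

# The nested formulation, written as a scalar subquery in WHERE.
nested = names("""
    SELECT DISTINCT Author.name
    FROM Author
    WHERE (SELECT count(Wrote.url) FROM Wrote
           WHERE Author.login = Wrote.login) >= 10
""")
print(at_least_10, more_than_10, nested)
```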

Page 96

Finding Witnesses

Product (pname, price, category, manufacturer)
Company (cname, country)

For each country, find its most expensive products

Page 97

Finding Witnesses

Product (pname, price, category, manufacturer)
Company (cname, country)

For each country, find its most expensive products

Finding the maximum price is easy…

SELECT x.country, max(y.price)
FROM Company x, Product y
WHERE x.cname = y.manufacturer
GROUP BY x.country

But we need the witnesses, i.e. the products with max price

Page 98

Finding Witnesses

To find the witnesses, compute the maximum price in a subquery

SELECT u.country, v.pname, v.price
FROM Company u, Product v,
     (SELECT x.country, max(y.price) AS mprice
      FROM Company x, Product y
      WHERE x.cname = y.manufacturer
      GROUP BY x.country) AS p
WHERE u.cname = v.manufacturer
  AND u.country = p.country
  AND v.price = p.mprice
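The witness query can be checked on the sample data. The sketch below (Python's sqlite3 module, numeric prices assumed) runs the subquery-in-FROM approach with the join condition u.cname = v.manufacturer in the outer WHERE clause, and also runs a variant without that condition to show why it matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (cname TEXT, country TEXT);
CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT);
INSERT INTO Company VALUES
  ('GizmoWorks', 'USA'), ('Canon', 'Japan'),
  ('Hitachi', 'Japan'), ('MuseumPass', 'Vatican');
INSERT INTO Product VALUES
  ('Gizmo', 19.99, 'Gadgets', 'GizmoWorks'),
  ('Powergizmo', 29.99, 'Gadgets', 'GizmoWorks'),
  ('SingleTouch', 149.99, 'Photography', 'Canon'),
  ('MultiTouch', 203.99, 'Household', 'Hitachi');
""")

WITNESSES = """
    SELECT u.country, v.pname, v.price
    FROM Company u, Product v,
         (SELECT x.country, max(y.price) AS mprice
          FROM Company x, Product y
          WHERE x.cname = y.manufacturer
          GROUP BY x.country) AS p
    WHERE {join_condition}
          u.country = p.country AND v.price = p.mprice
    ORDER BY u.country, v.pname
"""

# With the join condition: one row per (country, most expensive product).
witnesses = conn.execute(
    WITNESSES.format(join_condition="u.cname = v.manufacturer AND")).fetchall()

# Without it, every company of a country is paired with the max-price product.
without_join = conn.execute(WITNESSES.format(join_condition="")).fetchall()

print(witnesses)
print(without_join)
```

On this data the version without the join condition lists MultiTouch twice for Japan, once for Canon and once for Hitachi.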

Page 99

Finding Witnesses

Product (pname, price, category, manufacturer)
Company (cname, country)

There is a more concise solution here:

SELECT x.country, y.pname, y.price
FROM Company x, Product y
WHERE x.cname = y.manufacturer
  AND y.price >= ALL (SELECT z.price
                      FROM Company w, Product z
                      WHERE w.cname = z.manufacturer
                        AND w.country = x.country)