class 14 joins - Harvard Universitydaslab.seas.harvard.edu/.../cs165/doc/class_slides/...v3, null,...

Post on 04-Jul-2020

3 views 0 download

Transcript of class 14 joins - Harvard Universitydaslab.seas.harvard.edu/.../cs165/doc/class_slides/...v3, null,...

joinsprof. Stratos Idreos

HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/

class 14

/34CS165, Fall 2017 Stratos Idreos 2

Milestone1: quite involved but easy on the algorithmic side Milestone2: easy once you understand we just batching data Milestone3: not easy, not easy (all concepts/tools needed) Milestone4: better than M3 but still heavy on concepts Milestone5: should be quick

Testing server: will run twice a day as of this week

Remember: limited chances of success if you try to do this alone

Lab marathon: once more in a couple of weeks

/34CS165, Fall 2017 Stratos Idreos 3

FINAL REPORT CONTAINS EXPERIMENTAL ANALYSIS HOW TO DO EXPERIMENTS?

find out what matters, test by changing one thing at a time

say we want to test the select operator, to compare scan vs secondary index

/34CS165, Fall 2017 Stratos Idreos 4

do not listen to youtube while you run experiments!close all apps, recreate the same environment every time

create scripts for everything

ISOLATE PERFORMANCE AS BEST AS POSSIBLE

/34CS165, Fall 2017 Stratos Idreos 5

resp

onse

tim

e (s

ecs)

data size (GB)re

spon

se ti

me

(sec

s)data size (GB)

e.g., to test the select operator

OK not OK

examples for final evaluation

/34CS165, Fall 2017 Stratos Idreos 6

thro

ughp

ut (q

/s)

# of queriesth

roug

hput

(q/s

)# of queries

e.g., to test shared scans

OK not OK

examples for final evaluation

/34CS165, Fall 2017 Stratos Idreos 7

Midterm2: Nov 15 -> Nov 20?

Midterm1: overall great performance! If you did not score 90+

please consider joining more for OH! If you did score 90+

please consider joining more for OH!

all quizzes, all discussions, all “Read” readings extra weekend OH will be announced

/34CS165, Fall 2017 Stratos Idreos 8

so far

database kernel

data data data

algo

rithm

s/op

erat

ors

disk

memory

cpu

/34CS165, Fall 2017 Stratos Idreos 8

so far

database kernel

data data data

algo

rithm

s/op

erat

ors

disk

memory

cpu

columns, rows, hybrids, trees

/34CS165, Fall 2017 Stratos Idreos 8

so far

database kernel

data data data

algo

rithm

s/op

erat

ors

disk

memory

cpu

scan, binary search, tuple reconstruction, min,

max, search b-tree, etc.

columns, rows, hybrids, trees

/34CS165, Fall 2017 Stratos Idreos 8

so far

database kernel

data data data

algo

rithm

s/op

erat

ors

disk

memory

cpu

scan, binary search, tuple reconstruction, min,

max, search b-tree, etc.

columns, rows, hybrids, trees

early/late tuple reconstruction, tuple-at-a-time, vectorized or bulk processing, intermediates format, pushing

selects down, etc

/34CS165, Fall 2017 Stratos Idreos 9

joins(project=m4)

/34CS165, Fall 2017 Stratos Idreos 10

fact table(id1,id2,…)

dimension table 1(id1,…)

dimension table 2(id2,…)

star schema

/34CS165, Fall 2017 Stratos Idreos 10

fact table(id1,id2,…)

dimension table 1(id1,…)

dimension table 2(id2,…)

star schema

avoid duplicates - minimize update cost - but we have to do joins

/34CS165, Fall 2017 Stratos Idreos 11

snowflake schema

/34CS165, Fall 2017 Stratos Idreos 12

professor(id,name,…)

course(id,name, profId,…)

student(id,name,…)

database

give me all students enrolled in cs165select student.name from student, enrolled, course where course.name=“cs165” and enrolled.courseId=course.id and student.id=enrolled.studentId

enrolled(studentId,

courseId,…) foreign key

/34CS165, Fall 2017 Stratos Idreos 13

table 1 table 2

joinforeign key

referencing table 2 primary key

find all tuples where FK=PKjoin: glue the data back together

/34CS165, Fall 2017 Stratos Idreos 13

table 1 table 2

joinforeign key

referencing table 2 primary key

find all tuples where FK=PKjoin: glue the data back together

equi join

/34CS165, Fall 2017 Stratos Idreos 14

1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5

1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6

1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3

joinjoin resultkey,payload key,payload

/34CS165, Fall 2017 Stratos Idreos 14

1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5

1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6

1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3

joinjoin resultkey,payload key,payload

/34CS165, Fall 2017 Stratos Idreos 14

1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5

1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6

1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3

joinjoin resultkey,payload key,payload

/34CS165, Fall 2017 Stratos Idreos 14

1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5

1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6

1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3

joinjoin resultkey,payload key,payload

/34CS165, Fall 2017 Stratos Idreos 14

1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5

1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6

1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3

joinjoin resultkey,payload key,payload

/34CS165, Fall 2017 Stratos Idreos 14

1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5

1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6

1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3

joinjoin resultkey,payload key,payload

/34CS165, Fall 2017 Stratos Idreos 14

1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5

1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6

1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3

joinjoin resultkey,payload key,payload

/34CS165, Fall 2017 Stratos Idreos 14

1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5

1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6

1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3

joinjoin resultkey,payload key,payload

/34CS165, Fall 2017 Stratos Idreos 14

1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5

1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6

1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3

joinjoin resultkey,payload key,payload

/34CS165, Fall 2017 Stratos Idreos 14

1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5

1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6

1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3

joinjoin resultkey,payload key,payload

/34CS165, Fall 2017 Stratos Idreos 15

inner join

left outer

right outer

v1, a1 v2, a2

v3, b1 v1, b2

v1, a1, b2

v1, a1, b2 v2, a2, null

v1, a1, b2 v3, null, b1

join

/34CS165, Fall 2017 Stratos Idreos 16

select courses.name=“cs165”

join enrolled.courseid=course.id

students enrolled courses

join student.id=enrolled.studentid

project student.name

good plan

select student.name from students, enrolled, courses where courses.name=“cs165” and enrolled.courseId=course.id and student.id=enrolled.studentId

/34CS165, Fall 2017 Stratos Idreos 17

select courses.name=“cs165”

join enrolled.courseid=course.id

students enrolled courses

join student.id=enrolled.studentid

project student.name

select student.name from students, enrolled, courses where courses.name=“cs165” and enrolled.courseId=course.id and student.id=enrolled.studentId

pushing selects down

/34CS165, Fall 2017 Stratos Idreos 18

selectR.C

selectS.F

join

fetchR.A

fetchS.A

maxR.D

minS.G

R(A,B,C,D) - S(A,E,F,G)

select max(R.D),min(S.G) from R,S where R.A=S.A and R.C<10 and S.F>30

block operator

access patterns

/34CS165, Fall 2017 Stratos Idreos 19

12347545495897754255

11356244297819812623

Relation R

Ra

RbRelation SSa Sb

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

1234532378653321290

Rc

31656911278

411935

Ra 24579

inter1

select(Ra,5,20)

31656911278

411935

24579

inter1 12347545495897754255

3445499742

3445499742

Rb inter2 reconstruct(Rb,inter1)

inter2 inter3 24579

24579

459

select(inter2,30,40)

inter3

459

1234532378653321290

Rc join_input_R

237829

459

join_input_R

237829

459

reconstruct(Rc,inter3)

1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)

17495899643753613250

Sa 3578

10

inter4 select(Sa,55,65)

17495899643753613250

6229198123

inter5reconstruct(Sb,inter4)

3578

10

3578

10

inter4 11356244297819812623

Sb6229198123

inter53578

10

6229198123

join_input_S 3578

10

reverse(inter5)

6229198123

join_input_S 3578

10

49

105

join_res_ R_S

49

105

join(join_input_R,join_input_S)join_res_ R_S

49

inter6voidTail(join_res_R_S)

49

inter6

Ra

31656911278

411935

919

inter7

919

inter7 reconstruct(Ra,inter6)

28

resultsum(inter7)

Query and Query Plan (MAL Algebra)

(1) (2) (4)(3)

(5) (6) (7)

(8) (9)

(10) (11)

select(inter2,40,50)

select(Sa,50,65)

/34CS165, Fall 2017 Stratos Idreos 19

12347545495897754255

11356244297819812623

Relation R

Ra

RbRelation SSa Sb

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

1234532378653321290

Rc

31656911278

411935

Ra 24579

inter1

select(Ra,5,20)

31656911278

411935

24579

inter1 12347545495897754255

3445499742

3445499742

Rb inter2 reconstruct(Rb,inter1)

inter2 inter3 24579

24579

459

select(inter2,30,40)

inter3

459

1234532378653321290

Rc join_input_R

237829

459

join_input_R

237829

459

reconstruct(Rc,inter3)

1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)

17495899643753613250

Sa 3578

10

inter4 select(Sa,55,65)

17495899643753613250

6229198123

inter5reconstruct(Sb,inter4)

3578

10

3578

10

inter4 11356244297819812623

Sb6229198123

inter53578

10

6229198123

join_input_S 3578

10

reverse(inter5)

6229198123

join_input_S 3578

10

49

105

join_res_ R_S

49

105

join(join_input_R,join_input_S)join_res_ R_S

49

inter6voidTail(join_res_R_S)

49

inter6

Ra

31656911278

411935

919

inter7

919

inter7 reconstruct(Ra,inter6)

28

resultsum(inter7)

Query and Query Plan (MAL Algebra)

(1) (2) (4)(3)

(5) (6) (7)

(8) (9)

(10) (11)

12347545495897754255

11356244297819812623

Relation R

Ra

RbRelation SSa Sb

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

1234532378653321290

Rc

31656911278

411935

Ra 24579

inter1

select(Ra,5,20)

31656911278

411935

24579

inter1 12347545495897754255

3445499742

3445499742

Rb inter2 reconstruct(Rb,inter1)

inter2 inter3 24579

24579

459

select(inter2,30,40)

inter3

459

1234532378653321290

Rc join_input_R

237829

459

join_input_R

237829

459

reconstruct(Rc,inter3)

1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)

17495899643753613250

Sa 3578

10

inter4 select(Sa,55,65)

17495899643753613250

6229198123

inter5reconstruct(Sb,inter4)

3578

10

3578

10

inter4 11356244297819812623

Sb6229198123

inter53578

10

6229198123

join_input_S 3578

10

reverse(inter5)

6229198123

join_input_S 3578

10

49

105

join_res_ R_S

49

105

join(join_input_R,join_input_S)join_res_ R_S

49

inter6voidTail(join_res_R_S)

49

inter6

Ra

31656911278

411935

919

inter7

919

inter7 reconstruct(Ra,inter6)

28

resultsum(inter7)

Query and Query Plan (MAL Algebra)

(1) (2) (4)(3)

(5) (6) (7)

(8) (9)

(10) (11)

select(inter2,40,50)

select(inter2,40,50)

select(Sa,50,65)

/34CS165, Fall 2017 Stratos Idreos 20

12347545495897754255

11356244297819812623

Relation R

Ra

RbRelation SSa Sb

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

1234532378653321290

Rc

31656911278

411935

Ra 24579

inter1

select(Ra,5,20)

31656911278

411935

24579

inter1 12347545495897754255

3445499742

3445499742

Rb inter2 reconstruct(Rb,inter1)

inter2 inter3 24579

24579

459

select(inter2,30,40)

inter3

459

1234532378653321290

Rc join_input_R

237829

459

join_input_R

237829

459

reconstruct(Rc,inter3)

1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)

17495899643753613250

Sa 3578

10

inter4 select(Sa,55,65)

17495899643753613250

6229198123

inter5reconstruct(Sb,inter4)

3578

10

3578

10

inter4 11356244297819812623

Sb6229198123

inter53578

10

6229198123

join_input_S 3578

10

reverse(inter5)

6229198123

join_input_S 3578

10

49

105

join_res_ R_S

49

105

join(join_input_R,join_input_S)join_res_ R_S

49

inter6voidTail(join_res_R_S)

49

inter6

Ra

31656911278

411935

919

inter7

919

inter7 reconstruct(Ra,inter6)

28

resultsum(inter7)

Query and Query Plan (MAL Algebra)

(1) (2) (4)(3)

(5) (6) (7)

(8) (9)

(10) (11)

select(inter2,40,50)

select(Sa,50,65)

/34CS165, Fall 2017 Stratos Idreos 20

12347545495897754255

11356244297819812623

Relation R

Ra

RbRelation SSa Sb

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

1234532378653321290

Rc

31656911278

411935

Ra 24579

inter1

select(Ra,5,20)

31656911278

411935

24579

inter1 12347545495897754255

3445499742

3445499742

Rb inter2 reconstruct(Rb,inter1)

inter2 inter3 24579

24579

459

select(inter2,30,40)

inter3

459

1234532378653321290

Rc join_input_R

237829

459

join_input_R

237829

459

reconstruct(Rc,inter3)

1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)

17495899643753613250

Sa 3578

10

inter4 select(Sa,55,65)

17495899643753613250

6229198123

inter5reconstruct(Sb,inter4)

3578

10

3578

10

inter4 11356244297819812623

Sb6229198123

inter53578

10

6229198123

join_input_S 3578

10

reverse(inter5)

6229198123

join_input_S 3578

10

49

105

join_res_ R_S

49

105

join(join_input_R,join_input_S)join_res_ R_S

49

inter6voidTail(join_res_R_S)

49

inter6

Ra

31656911278

411935

919

inter7

919

inter7 reconstruct(Ra,inter6)

28

resultsum(inter7)

Query and Query Plan (MAL Algebra)

(1) (2) (4)(3)

(5) (6) (7)

(8) (9)

(10) (11)

12347545495897754255

11356244297819812623

Relation R

Ra

RbRelation SSa Sb

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

1234532378653321290

Rc

31656911278

411935

Ra 24579

inter1

select(Ra,5,20)

31656911278

411935

24579

inter1 12347545495897754255

3445499742

3445499742

Rb inter2 reconstruct(Rb,inter1)

inter2 inter3 24579

24579

459

select(inter2,30,40)

inter3

459

1234532378653321290

Rc join_input_R

237829

459

join_input_R

237829

459

reconstruct(Rc,inter3)

1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)

17495899643753613250

Sa 3578

10

inter4 select(Sa,55,65)

17495899643753613250

6229198123

inter5reconstruct(Sb,inter4)

3578

10

3578

10

inter4 11356244297819812623

Sb6229198123

inter53578

10

6229198123

join_input_S 3578

10

reverse(inter5)

6229198123

join_input_S 3578

10

49

105

join_res_ R_S

49

105

join(join_input_R,join_input_S)join_res_ R_S

49

inter6voidTail(join_res_R_S)

49

inter6

Ra

31656911278

411935

919

inter7

919

inter7 reconstruct(Ra,inter6)

28

resultsum(inter7)

Query and Query Plan (MAL Algebra)

(1) (2) (4)(3)

(5) (6) (7)

(8) (9)

(10) (11)

select(inter2,40,50)

select(Sa,50,65)

select(Sa,50,65)

/34CS165, Fall 2017 Stratos Idreos 21

12347545495897754255

11356244297819812623

Relation R

Ra

RbRelation SSa Sb

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

1234532378653321290

Rc

31656911278

411935

Ra 24579

inter1

select(Ra,5,20)

31656911278

411935

24579

inter1 12347545495897754255

3445499742

3445499742

Rb inter2 reconstruct(Rb,inter1)

inter2 inter3 24579

24579

459

select(inter2,30,40)

inter3

459

1234532378653321290

Rc join_input_R

237829

459

join_input_R

237829

459

reconstruct(Rc,inter3)

1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)

17495899643753613250

Sa 3578

10

inter4 select(Sa,55,65)

17495899643753613250

6229198123

inter5reconstruct(Sb,inter4)

3578

10

3578

10

inter4 11356244297819812623

Sb6229198123

inter53578

10

6229198123

join_input_S 3578

10

reverse(inter5)

6229198123

join_input_S 3578

10

49

105

join_res_ R_S

49

105

join(join_input_R,join_input_S)join_res_ R_S

49

inter6voidTail(join_res_R_S)

49

inter6

Ra

31656911278

411935

919

inter7

919

inter7 reconstruct(Ra,inter6)

28

resultsum(inter7)

Query and Query Plan (MAL Algebra)

(1) (2) (4)(3)

(5) (6) (7)

(8) (9)

(10) (11)

select(inter2,40,50)

select(Sa,50,65)

/34CS165, Fall 2017 Stratos Idreos 21

12347545495897754255

11356244297819812623

Relation R

Ra

RbRelation SSa Sb

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

1234532378653321290

Rc

31656911278

411935

Ra 24579

inter1

select(Ra,5,20)

31656911278

411935

24579

inter1 12347545495897754255

3445499742

3445499742

Rb inter2 reconstruct(Rb,inter1)

inter2 inter3 24579

24579

459

select(inter2,30,40)

inter3

459

1234532378653321290

Rc join_input_R

237829

459

join_input_R

237829

459

reconstruct(Rc,inter3)

1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)

17495899643753613250

Sa 3578

10

inter4 select(Sa,55,65)

17495899643753613250

6229198123

inter5reconstruct(Sb,inter4)

3578

10

3578

10

inter4 11356244297819812623

Sb6229198123

inter53578

10

6229198123

join_input_S 3578

10

reverse(inter5)

6229198123

join_input_S 3578

10

49

105

join_res_ R_S

49

105

join(join_input_R,join_input_S)join_res_ R_S

49

inter6voidTail(join_res_R_S)

49

inter6

Ra

31656911278

411935

919

inter7

919

inter7 reconstruct(Ra,inter6)

28

resultsum(inter7)

Query and Query Plan (MAL Algebra)

(1) (2) (4)(3)

(5) (6) (7)

(8) (9)

(10) (11)

12347545495897754255

11356244297819812623

Relation R

Ra

RbRelation SSa Sb

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

1234532378653321290

Rc

31656911278

411935

Ra 24579

inter1

select(Ra,5,20)

31656911278

411935

24579

inter1 12347545495897754255

3445499742

3445499742

Rb inter2 reconstruct(Rb,inter1)

inter2 inter3 24579

24579

459

select(inter2,30,40)

inter3

459

1234532378653321290

Rc join_input_R

237829

459

join_input_R

237829

459

reconstruct(Rc,inter3)

1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)

17495899643753613250

Sa 3578

10

inter4 select(Sa,55,65)

17495899643753613250

6229198123

inter5reconstruct(Sb,inter4)

3578

10

3578

10

inter4 11356244297819812623

Sb6229198123

inter53578

10

6229198123

join_input_S 3578

10

reverse(inter5)

6229198123

join_input_S 3578

10

49

105

join_res_ R_S

49

105

join(join_input_R,join_input_S)join_res_ R_S

49

inter6voidTail(join_res_R_S)

49

inter6

Ra

31656911278

411935

919

inter7

919

inter7 reconstruct(Ra,inter6)

28

resultsum(inter7)

Query and Query Plan (MAL Algebra)

(1) (2) (4)(3)

(5) (6) (7)

(8) (9)

(10) (11)

select(inter2,40,50)

select(Sa,50,65)

project m4

/34CS165, Fall 2017 Stratos Idreos 22

12347545495897754255

11356244297819812623

Relation R

Ra

RbRelation SSa Sb

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

1234532378653321290

Rc

31656911278

411935

Ra 24579

inter1

select(Ra,5,20)

31656911278

411935

24579

inter1 12347545495897754255

3445499742

3445499742

Rb inter2 reconstruct(Rb,inter1)

inter2 inter3 24579

24579

459

select(inter2,30,40)

inter3

459

1234532378653321290

Rc join_input_R

237829

459

join_input_R

237829

459

reconstruct(Rc,inter3)

1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)

17495899643753613250

Sa 3578

10

inter4 select(Sa,55,65)

17495899643753613250

6229198123

inter5reconstruct(Sb,inter4)

3578

10

3578

10

inter4 11356244297819812623

Sb6229198123

inter53578

10

6229198123

join_input_S 3578

10

reverse(inter5)

6229198123

join_input_S 3578

10

49

105

join_res_ R_S

49

105

join(join_input_R,join_input_S)join_res_ R_S

49

inter6voidTail(join_res_R_S)

49

inter6

Ra

31656911278

411935

919

inter7

919

inter7 reconstruct(Ra,inter6)

28

resultsum(inter7)

Query and Query Plan (MAL Algebra)

(1) (2) (4)(3)

(5) (6) (7)

(8) (9)

(10) (11)

select(inter2,40,50)

select(Sa,50,65)

/34CS165, Fall 2017 Stratos Idreos 22

12347545495897754255

11356244297819812623

Relation R

Ra

RbRelation SSa Sb

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

1234532378653321290

Rc

31656911278

411935

Ra 24579

inter1

select(Ra,5,20)

31656911278

411935

24579

inter1 12347545495897754255

3445499742

3445499742

Rb inter2 reconstruct(Rb,inter1)

inter2 inter3 24579

24579

459

select(inter2,30,40)

inter3

459

1234532378653321290

Rc join_input_R

237829

459

join_input_R

237829

459

reconstruct(Rc,inter3)

1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)

17495899643753613250

Sa 3578

10

inter4 select(Sa,55,65)

17495899643753613250

6229198123

inter5reconstruct(Sb,inter4)

3578

10

3578

10

inter4 11356244297819812623

Sb6229198123

inter53578

10

6229198123

join_input_S 3578

10

reverse(inter5)

6229198123

join_input_S 3578

10

49

105

join_res_ R_S

49

105

join(join_input_R,join_input_S)join_res_ R_S

49

inter6voidTail(join_res_R_S)

49

inter6

Ra

31656911278

411935

919

inter7

919

inter7 reconstruct(Ra,inter6)

28

resultsum(inter7)

Query and Query Plan (MAL Algebra)

(1) (2) (4)(3)

(5) (6) (7)

(8) (9)

(10) (11)

12347545495897754255

11356244297819812623

Relation R

Ra

RbRelation SSa Sb

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

1234532378653321290

Rc

31656911278

411935

Ra 24579

inter1

select(Ra,5,20)

31656911278

411935

24579

inter1 12347545495897754255

3445499742

3445499742

Rb inter2 reconstruct(Rb,inter1)

inter2 inter3 24579

24579

459

select(inter2,30,40)

inter3

459

1234532378653321290

Rc join_input_R

237829

459

join_input_R

237829

459

reconstruct(Rc,inter3)

1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)

17495899643753613250

Sa 3578

10

inter4 select(Sa,55,65)

17495899643753613250

6229198123

inter5reconstruct(Sb,inter4)

3578

10

3578

10

inter4 11356244297819812623

Sb6229198123

inter53578

10

6229198123

join_input_S 3578

10

reverse(inter5)

6229198123

join_input_S 3578

10

49

105

join_res_ R_S

49

105

join(join_input_R,join_input_S)join_res_ R_S

49

inter6voidTail(join_res_R_S)

49

inter6

Ra

31656911278

411935

919

inter7

919

inter7 reconstruct(Ra,inter6)

28

resultsum(inter7)

Query and Query Plan (MAL Algebra)

(1) (2) (4)(3)

(5) (6) (7)

(8) (9)

(10) (11)

select(inter2,40,50)

select(Sa,50,65)

/34CS165, Fall 2017 Stratos Idreos 23

new resL[]; new resR[]; k=0 for (i=0;i<L;i=i++) for (j=0;j<R;j++) if L[i]==R[j] resL[k]=i resR[k++]=j

nested loops

L R

for all tuples of one side check all tuples of the other side

/34CS165, Fall 2017 Stratos Idreos 24

outer(L) inner(R)join probe the red side

for better data locality

stream outer pages hold inner pages

Level1

Level2

res

Total footprint=L+R+res (bytes), R.pages<=Level1.pages-2

say red fits in Level1

/34CS165, Fall 2017 Stratos Idreos 25

what if not all data fits in main-memory?

what if not all data fits in L3 cache?

what if not all data fits in L2 cache?

what if not all data fits in L1 cache?

can we utilize >1 cores?

/34CS165, Fall 2017 Stratos Idreos 26

for every L.key = R.key pair

return [L.pos,R.pos]

1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5

1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6

joinL: key,payload R: key,payload

1) design a nested loops join algorithm and give its I/O cost2) which column should be the inner and why?3) describe optimizations to minimize Level1 misses

Quickly if there is time think of the following: 4) can we use sorting? 5) how would you use a b-tree to do a join?

level 1

level 2

R.size> Level1.size, L.size>Level1.size, R.size+L.size << L2, Level1 block size = Level 2 block size

CPU

data/results stored one column-at-a-time

/34CS165, Fall 2017 Stratos Idreos 27

comp O(LxR) I/O O(L/lp+Lx(R/rp))

new resL[]; new resR[]; k=0 for (i=0;i<L;i=i++) for (j=0;j<R;j++) if L[i]==R[j] resL[k]=i resR[k++]=j

lp=LeftEntriesThatFitInOnePage rp=RightEntriesThatFitInOnePage L= number of values in L column R= number of values in R column

/34CS165, Fall 2017 Stratos Idreos 28

zig zagouter inner

phase 1

123

outer innerphase 2

123

A number of pages will still be

in LLC!

/34CS165, Fall 2017 Stratos Idreos 29

comp O(LxR) I/O O(L/lp+Lx(R/rp))

new resL[]; new resR[]; k=0 for (i=0;i<L;i=i++) for (j=0;j<R;j++) if L[i]==R[j] resL[k]=i resR[k++]=j

lp=LeftEntriesThatFitInOnePage rp=RightEntriesThatFitInOnePage L= number of values in L column R= number of values in R column

/34CS165, Fall 2017 Stratos Idreos 29

comp O(LxR) I/O O(L/lp+Lx(R/rp))

new resL[]; new resR[]; k=0 for (i=0;i<L;i=i++) for (j=0;j<R;j++) if L[i]==R[j] resL[k]=i resR[k++]=j

I/O with zig zag: L/lp + R/rp, if R/rp<=LLC-2

L/lp +Lx(R/rp -(LLC-2)), if R/rp<=2x(LLC-2)

lp=LeftEntriesThatFitInOnePage rp=RightEntriesThatFitInOnePage L= number of values in L column R= number of values in R column

/34CS165, Fall 2017 Stratos Idreos 30

new resL[]; new resR[]; k=0 for (i=0;i<L;i=i+lp) for (j=0;j<R;j=j+rp) for (r=i;r<i+lp;r++) for (m=j;m<j+rp;m++) if L[r]==R[m]

resL[k]=r resR[k++]=m

comp O(LxR) I/O O(L/lp+Lx(R/rp))

comp O(LxR) I/O O(L/lp+(L/lp)x(R/rp))

blocked nested loopsnew resL[]; new resR[]; k=0 for (i=0;i<L;i=i++) for (j=0;j<R;j++) if L[i]==R[j] resL[k]=i resR[k++]=j

/34CS165, Fall 2017 Stratos Idreos 30

new resL[]; new resR[]; k=0 for (i=0;i<L;i=i+lp) for (j=0;j<R;j=j+rp) for (r=i;r<i+lp;r++) for (m=j;m<j+rp;m++) if L[r]==R[m]

resL[k]=r resR[k++]=m

comp O(LxR) I/O O(L/lp+Lx(R/rp))

comp O(LxR) I/O O(L/lp+(L/lp)x(R/rp))

blocked nested loopsnew resL[]; new resR[]; k=0 for (i=0;i<L;i=i++) for (j=0;j<R;j++) if L[i]==R[j] resL[k]=i resR[k++]=j

I/O with zig zag: L/lp + R/rp, if R/rp<=LLC-2

L/lp +(L/lp)x(R/rp -(LLC-2)), if R/rp<=2x(LLC-2)

But if R/rp>2x(LLC-2) zig zag does not work anymore… and so we can flip inner/outer decision

/34CS165, Fall 2017 Stratos Idreos 31

sort

on k

ey

sort

on k

ey

merge

sort merge join

while left and right still have tuples if left.val < right.val left++ else if left.val > right.val right++ else add to result, left++, right++

perfect if data is sorted or needs to be sorted anyway

for the rest of the plan

+ handle duplicates on both sides!

/34CS165, Fall 2017 Stratos Idreos 32

utilize index

new resL[]; new resR[]; k=0 for (i=0;i<L;i=i++) jk=R.btree.probe(L[i]) if (jk!=null) resL[k]=i

resR[k++]=jk.pos

L R

/34CS165, Fall 2017 Stratos Idreos 33

Read: textbook: chapters 4, 14

DATA SYSTEMSprof. Stratos Idreos

class 14

joins