complex plans and hybrid layouts - Harvard...

38
complex plans and hybrid layouts prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ class 7

Transcript of complex plans and hybrid layouts - Harvard...

Page 1: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

complex plans and hybrid layoutsprof. Stratos Idreos

HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/

class 7

Page 2: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /282

essential column-stores featuresvirtual ids late tuple reconstruction (if ever) vectorized execution compression fixed-width columns

0"

5"

10"

15"

20"

25"

30"

35"

40"

45"

Column"Store" Row"Store"

Run$

me'(sec)'

Performance'of'Column3Oriented'Op$miza$ons'

–Late"Materializa:on"

–Compression"

–Join"Op:miza:on"

–Tuple@at@a@:me"

Baseline"

Column-stores vs. row-stores: how different are they really? D. Abadi, S. Madden, and N. Hachem

ACM SIGMOD Conference on Management of Data, 2008

Page 3: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /283

but why now…weren’t all those design options obvious in the past as well?

moving data from disk

moving data from memory

computation 1) big memories 2) cpu vs memory speed

Page 4: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /28

main-memory systems

4

optimized for the memory wallwith or without persistent data

Page 5: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /28

other system categories noSQL (cs265), newSQL, matlab, etc..

column-stores = bad name modern systems

Page 6: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /286

column-storage row-storage

A B C DA B C D

two extremes

Page 7: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /287

PAX: store all data about a row in a single page but

organize data in a column major way inside each page

Anastasia Ailamaki, EPFL

Page 8: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /288

…A B C D A B C D A B C D

column-groups: store data about a row in a single page but use any kind of column-group combination inside each page

Page 9: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /289

A C D

B C D

column-groups on separate files: group columns in separate pages each row is spread in >1 pages

B

A

Page 10: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /2810

A B C D A C DB

how do we decide?

Page 11: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /2810

A B C D A C DB

first think what do we want to do with the data: access patterns

query A,B, query A,B,C,D

insert delete

update A

metric: data movement

# pages

how do we decide?

Page 12: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /28

offline?

11

Page 13: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /28

offline?

11

online?

Page 14: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /28

offline?

11

online?

code generation?

Page 15: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /28

offline?

11

online?

Browse: H2O: A Hands-free Adaptive StoreIoannis Alagiannis, Stratos Idreos, and Anastassia Ailamaki In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2014

code generation?

Page 16: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /28

can we do any better?

12

fixed-width dense

unordered columns

Page 17: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /2813

zone1: min value =m1 max value = k1

zone2: min value =m2 max value = k2

Page 18: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /2814

zone1: min value =m1 max value = k1

zone2: min value =m2 max value = k2

what if data is uniformly distributed

over the column

Page 19: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /2814

zone1: min value =m1 max value = k1

zone2: min value =m2 max value = k2

what if data is uniformly distributed

over the column

adaptively reorganize columnto minimize # of zones a query has to scan

cs165/265 project: finalist ACM SIGMOD undergrad research competition 2016

Page 20: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /2815

Jignesh Patel, U of Wisconsin

it is all about the bits

1 0 1 0 1 0 0 1

0 0 1 1 0 0 0 00 0 0 0 0 1 1 0

columnvalue1=

value2=value3=

Page 21: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /2816

1 0 1 0 1 0 0 1

0 0 1 1 0 0 0 00 0 0 0 0 1 1 0

1 0 1 0 1 0 0 1

0 0 1 1 0 0 0 00 0 0 0 0 1 1 0

Extra: ByteSlice: Processing with a New Storage LayoutZiqiang Feng, Eric Lo, Ben Kao, Wenjian Xu In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2015

Browse: BitWeaving: fast scans for main memory data processingYinan Li, Jignesh M. Patel In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2013

Page 22: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /2817

professor(id,name,…)

course(id,name, profId,…)

student(id,name,…)

database

give me all students enrolled in cs165select student.name from student, enrolled, course where course.name=“cs165” and enrolled.courseId=course.id and student.id=enrolled.studentId

enrolled(studentId,

courseId,…) foreign key

Page 23: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /2818

12347545495897754255

11356244297819812623

Relation R

Ra

RbRelation SSa Sb

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

1234532378653321290

Rc

31656911278

411935

Ra 24579

inter1

select(Ra,5,20)

31656911278

411935

24579

inter1 12347545495897754255

3445499742

3445499742

Rb inter2 reconstruct(Rb,inter1)

inter2 inter3 24579

24579

459

select(inter2,30,40)

inter3

459

1234532378653321290

Rc join_input_R

237829

459

join_input_R

237829

459

reconstruct(Rc,inter3)

1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)

17495899643753613250

Sa 3578

10

inter4 select(Sa,55,65)

17495899643753613250

6229198123

inter5reconstruct(Sb,inter4)

3578

10

3578

10

inter4 11356244297819812623

Sb6229198123

inter53578

10

6229198123

join_input_S 3578

10

reverse(inter5)

6229198123

join_input_S 3578

10

49

105

join_res_ R_S

49

105

join(join_input_R,join_input_S)join_res_ R_S

49

inter6voidTail(join_res_R_S)

49

inter6

Ra

31656911278

411935

919

inter7

919

inter7 reconstruct(Ra,inter6)

28

resultsum(inter7)

Query and Query Plan (MAL Algebra)

(1) (2) (4)(3)

(5) (6) (7)

(8) (9)

(10) (11)

select(inter2,40,50)

select(Sa,50,65)select(Sa,49,65)

49<S.a<65

Page 24: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /2818

12347545495897754255

11356244297819812623

Relation R

Ra

RbRelation SSa Sb

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

1234532378653321290

Rc

31656911278

411935

Ra 24579

inter1

select(Ra,5,20)

31656911278

411935

24579

inter1 12347545495897754255

3445499742

3445499742

Rb inter2 reconstruct(Rb,inter1)

inter2 inter3 24579

24579

459

select(inter2,30,40)

inter3

459

1234532378653321290

Rc join_input_R

237829

459

join_input_R

237829

459

reconstruct(Rc,inter3)

1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)

17495899643753613250

Sa 3578

10

inter4 select(Sa,55,65)

17495899643753613250

6229198123

inter5reconstruct(Sb,inter4)

3578

10

3578

10

inter4 11356244297819812623

Sb6229198123

inter53578

10

6229198123

join_input_S 3578

10

reverse(inter5)

6229198123

join_input_S 3578

10

49

105

join_res_ R_S

49

105

join(join_input_R,join_input_S)join_res_ R_S

49

inter6voidTail(join_res_R_S)

49

inter6

Ra

31656911278

411935

919

inter7

919

inter7 reconstruct(Ra,inter6)

28

resultsum(inter7)

Query and Query Plan (MAL Algebra)

(1) (2) (4)(3)

(5) (6) (7)

(8) (9)

(10) (11)

12347545495897754255

11356244297819812623

Relation R

Ra

RbRelation SSa Sb

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

1234532378653321290

Rc

31656911278

411935

Ra 24579

inter1

select(Ra,5,20)

31656911278

411935

24579

inter1 12347545495897754255

3445499742

3445499742

Rb inter2 reconstruct(Rb,inter1)

inter2 inter3 24579

24579

459

select(inter2,30,40)

inter3

459

1234532378653321290

Rc join_input_R

237829

459

join_input_R

237829

459

reconstruct(Rc,inter3)

1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)

17495899643753613250

Sa 3578

10

inter4 select(Sa,55,65)

17495899643753613250

6229198123

inter5reconstruct(Sb,inter4)

3578

10

3578

10

inter4 11356244297819812623

Sb6229198123

inter53578

10

6229198123

join_input_S 3578

10

reverse(inter5)

6229198123

join_input_S 3578

10

49

105

join_res_ R_S

49

105

join(join_input_R,join_input_S)join_res_ R_S

49

inter6voidTail(join_res_R_S)

49

inter6

Ra

31656911278

411935

919

inter7

919

inter7 reconstruct(Ra,inter6)

28

resultsum(inter7)

Query and Query Plan (MAL Algebra)

(1) (2) (4)(3)

(5) (6) (7)

(8) (9)

(10) (11)

select(inter2,40,50)

select(inter2,40,50)

select(Sa,50,65)select(Sa,49,65)

49<S.a<65

Page 25: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /2819

12347545495897754255

11356244297819812623

Relation R

Ra

RbRelation SSa Sb

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

1234532378653321290

Rc

31656911278

411935

Ra 24579

inter1

select(Ra,5,20)

31656911278

411935

24579

inter1 12347545495897754255

3445499742

3445499742

Rb inter2 reconstruct(Rb,inter1)

inter2 inter3 24579

24579

459

select(inter2,30,40)

inter3

459

1234532378653321290

Rc join_input_R

237829

459

join_input_R

237829

459

reconstruct(Rc,inter3)

1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)

17495899643753613250

Sa 3578

10

inter4 select(Sa,55,65)

17495899643753613250

6229198123

inter5reconstruct(Sb,inter4)

3578

10

3578

10

inter4 11356244297819812623

Sb6229198123

inter53578

10

6229198123

join_input_S 3578

10

reverse(inter5)

6229198123

join_input_S 3578

10

49

105

join_res_ R_S

49

105

join(join_input_R,join_input_S)join_res_ R_S

49

inter6voidTail(join_res_R_S)

49

inter6

Ra

31656911278

411935

919

inter7

919

inter7 reconstruct(Ra,inter6)

28

resultsum(inter7)

Query and Query Plan (MAL Algebra)

(1) (2) (4)(3)

(5) (6) (7)

(8) (9)

(10) (11)

select(inter2,40,50)

select(Sa,49,65)

49<S.a<65

Page 26: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /2819

12347545495897754255

11356244297819812623

Relation R

Ra

RbRelation SSa Sb

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

1234532378653321290

Rc

31656911278

411935

Ra 24579

inter1

select(Ra,5,20)

31656911278

411935

24579

inter1 12347545495897754255

3445499742

3445499742

Rb inter2 reconstruct(Rb,inter1)

inter2 inter3 24579

24579

459

select(inter2,30,40)

inter3

459

1234532378653321290

Rc join_input_R

237829

459

join_input_R

237829

459

reconstruct(Rc,inter3)

1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)

17495899643753613250

Sa 3578

10

inter4 select(Sa,55,65)

17495899643753613250

6229198123

inter5reconstruct(Sb,inter4)

3578

10

3578

10

inter4 11356244297819812623

Sb6229198123

inter53578

10

6229198123

join_input_S 3578

10

reverse(inter5)

6229198123

join_input_S 3578

10

49

105

join_res_ R_S

49

105

join(join_input_R,join_input_S)join_res_ R_S

49

inter6voidTail(join_res_R_S)

49

inter6

Ra

31656911278

411935

919

inter7

919

inter7 reconstruct(Ra,inter6)

28

resultsum(inter7)

Query and Query Plan (MAL Algebra)

(1) (2) (4)(3)

(5) (6) (7)

(8) (9)

(10) (11)

12347545495897754255

11356244297819812623

Relation R

Ra

RbRelation SSa Sb

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

1234532378653321290

Rc

31656911278

411935

Ra 24579

inter1

select(Ra,5,20)

31656911278

411935

24579

inter1 12347545495897754255

3445499742

3445499742

Rb inter2 reconstruct(Rb,inter1)

inter2 inter3 24579

24579

459

select(inter2,30,40)

inter3

459

1234532378653321290

Rc join_input_R

237829

459

join_input_R

237829

459

reconstruct(Rc,inter3)

1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)

17495899643753613250

Sa 3578

10

inter4 select(Sa,55,65)

17495899643753613250

6229198123

inter5reconstruct(Sb,inter4)

3578

10

3578

10

inter4 11356244297819812623

Sb6229198123

inter53578

10

6229198123

join_input_S 3578

10

reverse(inter5)

6229198123

join_input_S 3578

10

49

105

join_res_ R_S

49

105

join(join_input_R,join_input_S)join_res_ R_S

49

inter6voidTail(join_res_R_S)

49

inter6

Ra

31656911278

411935

919

inter7

919

inter7 reconstruct(Ra,inter6)

28

resultsum(inter7)

Query and Query Plan (MAL Algebra)

(1) (2) (4)(3)

(5) (6) (7)

(8) (9)

(10) (11)

select(inter2,40,50)

select(Sa,49,65)

select(Sa,49,65)

49<S.a<65

Page 27: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /2820

12347545495897754255

11356244297819812623

Relation R

Ra

RbRelation SSa Sb

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

1234532378653321290

Rc

31656911278

411935

Ra 24579

inter1

select(Ra,5,20)

31656911278

411935

24579

inter1 12347545495897754255

3445499742

3445499742

Rb inter2 reconstruct(Rb,inter1)

inter2 inter3 24579

24579

459

select(inter2,30,40)

inter3

459

1234532378653321290

Rc join_input_R

237829

459

join_input_R

237829

459

reconstruct(Rc,inter3)

1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)

17495899643753613250

Sa 3578

10

inter4 select(Sa,55,65)

17495899643753613250

6229198123

inter5reconstruct(Sb,inter4)

3578

10

3578

10

inter4 11356244297819812623

Sb6229198123

inter53578

10

6229198123

join_input_S 3578

10

reverse(inter5)

6229198123

join_input_S 3578

10

49

105

join_res_ R_S

49

105

join(join_input_R,join_input_S)join_res_ R_S

49

inter6voidTail(join_res_R_S)

49

inter6

Ra

31656911278

411935

919

inter7

919

inter7 reconstruct(Ra,inter6)

28

resultsum(inter7)

Query and Query Plan (MAL Algebra)

(1) (2) (4)(3)

(5) (6) (7)

(8) (9)

(10) (11)

select(inter2,40,50)

select(Sa,49,65)

49<S.a<65

Page 28: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /2820

12347545495897754255

11356244297819812623

Relation R

Ra

RbRelation SSa Sb

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

1234532378653321290

Rc

31656911278

411935

Ra 24579

inter1

select(Ra,5,20)

31656911278

411935

24579

inter1 12347545495897754255

3445499742

3445499742

Rb inter2 reconstruct(Rb,inter1)

inter2 inter3 24579

24579

459

select(inter2,30,40)

inter3

459

1234532378653321290

Rc join_input_R

237829

459

join_input_R

237829

459

reconstruct(Rc,inter3)

1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)

17495899643753613250

Sa 3578

10

inter4 select(Sa,55,65)

17495899643753613250

6229198123

inter5reconstruct(Sb,inter4)

3578

10

3578

10

inter4 11356244297819812623

Sb6229198123

inter53578

10

6229198123

join_input_S 3578

10

reverse(inter5)

6229198123

join_input_S 3578

10

49

105

join_res_ R_S

49

105

join(join_input_R,join_input_S)join_res_ R_S

49

inter6voidTail(join_res_R_S)

49

inter6

Ra

31656911278

411935

919

inter7

919

inter7 reconstruct(Ra,inter6)

28

resultsum(inter7)

Query and Query Plan (MAL Algebra)

(1) (2) (4)(3)

(5) (6) (7)

(8) (9)

(10) (11)

12347545495897754255

11356244297819812623

Relation R

Ra

RbRelation SSa Sb

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

1234532378653321290

Rc

31656911278

411935

Ra 24579

inter1

select(Ra,5,20)

31656911278

411935

24579

inter1 12347545495897754255

3445499742

3445499742

Rb inter2 reconstruct(Rb,inter1)

inter2 inter3 24579

24579

459

select(inter2,30,40)

inter3

459

1234532378653321290

Rc join_input_R

237829

459

join_input_R

237829

459

reconstruct(Rc,inter3)

1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)

17495899643753613250

Sa 3578

10

inter4 select(Sa,55,65)

17495899643753613250

6229198123

inter5reconstruct(Sb,inter4)

3578

10

3578

10

inter4 11356244297819812623

Sb6229198123

inter53578

10

6229198123

join_input_S 3578

10

reverse(inter5)

6229198123

join_input_S 3578

10

49

105

join_res_ R_S

49

105

join(join_input_R,join_input_S)join_res_ R_S

49

inter6voidTail(join_res_R_S)

49

inter6

Ra

31656911278

411935

919

inter7

919

inter7 reconstruct(Ra,inter6)

28

resultsum(inter7)

Query and Query Plan (MAL Algebra)

(1) (2) (4)(3)

(5) (6) (7)

(8) (9)

(10) (11)

select(inter2,40,50)

select(Sa,49,65)

49<S.a<65

Page 29: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /2821

12347545495897754255

11356244297819812623

Relation R

Ra

RbRelation SSa Sb

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

1234532378653321290

Rc

31656911278

411935

Ra 24579

inter1

select(Ra,5,20)

31656911278

411935

24579

inter1 12347545495897754255

3445499742

3445499742

Rb inter2 reconstruct(Rb,inter1)

inter2 inter3 24579

24579

459

select(inter2,30,40)

inter3

459

1234532378653321290

Rc join_input_R

237829

459

join_input_R

237829

459

reconstruct(Rc,inter3)

1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)

17495899643753613250

Sa 3578

10

inter4 select(Sa,55,65)

17495899643753613250

6229198123

inter5reconstruct(Sb,inter4)

3578

10

3578

10

inter4 11356244297819812623

Sb6229198123

inter53578

10

6229198123

join_input_S 3578

10

reverse(inter5)

6229198123

join_input_S 3578

10

49

105

join_res_ R_S

49

105

join(join_input_R,join_input_S)join_res_ R_S

49

inter6voidTail(join_res_R_S)

49

inter6

Ra

31656911278

411935

919

inter7

919

inter7 reconstruct(Ra,inter6)

28

resultsum(inter7)

Query and Query Plan (MAL Algebra)

(1) (2) (4)(3)

(5) (6) (7)

(8) (9)

(10) (11)

select(inter2,40,50)

select(Sa,49,65)

49<S.a<65

Page 30: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /2821

12347545495897754255

11356244297819812623

Relation R

Ra

RbRelation SSa Sb

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

1234532378653321290

Rc

31656911278

411935

Ra 24579

inter1

select(Ra,5,20)

31656911278

411935

24579

inter1 12347545495897754255

3445499742

3445499742

Rb inter2 reconstruct(Rb,inter1)

inter2 inter3 24579

24579

459

select(inter2,30,40)

inter3

459

1234532378653321290

Rc join_input_R

237829

459

join_input_R

237829

459

reconstruct(Rc,inter3)

1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)

17495899643753613250

Sa 3578

10

inter4 select(Sa,55,65)

17495899643753613250

6229198123

inter5reconstruct(Sb,inter4)

3578

10

3578

10

inter4 11356244297819812623

Sb6229198123

inter53578

10

6229198123

join_input_S 3578

10

reverse(inter5)

6229198123

join_input_S 3578

10

49

105

join_res_ R_S

49

105

join(join_input_R,join_input_S)join_res_ R_S

49

inter6voidTail(join_res_R_S)

49

inter6

Ra

31656911278

411935

919

inter7

919

inter7 reconstruct(Ra,inter6)

28

resultsum(inter7)

Query and Query Plan (MAL Algebra)

(1) (2) (4)(3)

(5) (6) (7)

(8) (9)

(10) (11)

12347545495897754255

11356244297819812623

Relation R

Ra

RbRelation SSa Sb

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

1234532378653321290

Rc

31656911278

411935

Ra 24579

inter1

select(Ra,5,20)

31656911278

411935

24579

inter1 12347545495897754255

3445499742

3445499742

Rb inter2 reconstruct(Rb,inter1)

inter2 inter3 24579

24579

459

select(inter2,30,40)

inter3

459

1234532378653321290

Rc join_input_R

237829

459

join_input_R

237829

459

reconstruct(Rc,inter3)

1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)

17495899643753613250

Sa 3578

10

inter4 select(Sa,55,65)

17495899643753613250

6229198123

inter5reconstruct(Sb,inter4)

3578

10

3578

10

inter4 11356244297819812623

Sb6229198123

inter53578

10

6229198123

join_input_S 3578

10

reverse(inter5)

6229198123

join_input_S 3578

10

49

105

join_res_ R_S

49

105

join(join_input_R,join_input_S)join_res_ R_S

49

inter6voidTail(join_res_R_S)

49

inter6

Ra

31656911278

411935

919

inter7

919

inter7 reconstruct(Ra,inter6)

28

resultsum(inter7)

Query and Query Plan (MAL Algebra)

(1) (2) (4)(3)

(5) (6) (7)

(8) (9)

(10) (11)

select(inter2,40,50)

select(Sa,49,65)

49<S.a<65

Page 31: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /2822

what happens after the join?

selectR.C

selectS.F

join

fetchR.A

fetchS.A

fetchR.D

fetchS.G

select R.D+S.G from R,S where R.A=S.A and R.C<10 and S.F>30

block operator

access patterns

R.D+ S.G

Use a simple join algorithm: Build hash table on S.A and then we probe

for each R.A value

Goal: Describe each operator: input/output and access patterns

watch out for random access!

Page 32: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /2823

select R.A, R.B, R.C, S.A, S.B, S.C from R, S where R.J=S.J and …

R.* scan

pos R.Jfetch

posR.J

ordered sparse

ordered sparse

preparing the R join input

same for the S join input

we need the original positions so we can fetch other R columns after the join

Page 33: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /2824

join input R.J R.J + posR

join input S.J S.J + posS partition (=reorder)

both join inputs to join

join result posR + posS

one or both sides are unordered

join

ordered sparse

ordered sparse

select R.A, R.B, R.C, S.A, S.B, S.C from R, S where R.J=S.J and …

1) We can sequentially scan R.J and probe S.J. Then result is in the order of R.J but S.J part is out of order

Page 34: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /2824

join input R.J R.J + posR

join input S.J S.J + posS partition (=reorder)

both join inputs to join

join result posR + posS

one or both sides are unordered

join

ordered sparse

ordered sparse

select R.A, R.B, R.C, S.A, S.B, S.C from R, S where R.J=S.J and …

1) We can sequentially scan R.J and probe S.J. Then result is in the order of R.J but S.J part is out of order

2) More advanced algorithms will partitions both inputs (we will see why and how in future classes)

This means output is unordered with respect to both inputs…

Page 35: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /2825

join result posR + posS

cluster on R

join result clustered on R posR + posS

fetch R payload with sequential pattern

1 2 3 4

ID + posS

cluster on S4 2 3 1

ID + posS

clus

tere

d

clus

tere

d fetch S payload with sequential pattern

4 2 3 1

e.g., ID + S.A

sort on ID decluster

all result columns aligned

radix declustering

both sides are unordered

cluster= partition just enough so that any random access is within a small range (e.g., a page)

Page 36: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /2826

first part done: basic concepts in modern systems

coming up: indexing and fast scans

Page 37: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

CS165, Fall 2017 Stratos Idreos /2827

Browse: Cache-Conscious Radix Decluster ProjectionsBy S. Manegold, P. Boncz, N. Nes, and M. Kersten Very Large Databases Conference, 2004

Browse: H2O: A Hands-free Adaptive StoreIoannis Alagiannis, Stratos Idreos, and Anastassia Ailamaki In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2014

Browse: BitWeaving: fast scans for main memory data processingYinan Li, Jignesh M. Patel In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2013

Page 38: complex plans and hybrid layouts - Harvard SEASdaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · CS165, Fall 2017 Stratos Idreos 18/28 12 34 75 45 49 58 97 75 42 55 11 35

DATA SYSTEMSprof. Stratos Idreos

class 7

complex plans & hybrid layouts