Adaptive Adaptive Adaptive Indexing

Adaptive Adaptive Indexing Felix Martin Schuhknecht Jens Dittrich Laurent Linden Big Data Analytics Group bigdata.uni-saarland.de Saarland University ICDE 2018

Transcript of Adaptive Adaptive Adaptive Indexing

Page 1: Adaptive Adaptive Adaptive Indexing

Adaptive Adaptive Indexing

Felix Martin Schuhknecht

Jens Dittrich

Laurent Linden

Big Data Analytics Group

bigdata.uni-saarland.de

Saarland University

ICDE 2018

Page 2: Adaptive Adaptive Adaptive Indexing

2007

2/20

Page 3: Adaptive Adaptive Adaptive Indexing

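Database Cracking / Standard Cracking

[Figure: Table Column and Index Column. Q0 = [13,42) splits the index column into the pieces < 13, >= 13 and < 42, >= 42. Q1 = [6,27) refines it into the pieces < 6, >= 6 and < 13, >= 13 and < 27, >= 27 and < 42, >= 42. Q2 ... Qn continue the refinement until the index column is sorted.]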

3/20
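The refinement shown on this slide can be illustrated with a minimal Python sketch of standard cracking, assuming crack-in-two on each query bound. The names CrackedColumn, crack_in_two, and range_query are hypothetical, and the sorted list of crack positions only stands in for the cracker index of a real system.

```python
from bisect import bisect_left


def crack_in_two(col, lo, hi, pivot):
    """Reorder col[lo:hi] in place so keys < pivot precede keys >= pivot.

    Returns the index of the first key >= pivot.
    """
    i, j = lo, hi - 1
    while i <= j:
        if col[i] < pivot:
            i += 1
        else:
            col[i], col[j] = col[j], col[i]
            j -= 1
    return i


class CrackedColumn:
    def __init__(self, table_column):
        self.col = list(table_column)  # index column: a copy of the table column
        self.cracks = []               # sorted list of (pivot, position)

    def _crack(self, pivot):
        # Find the piece that contains the pivot and crack only that piece.
        k = bisect_left(self.cracks, (pivot, -1))
        if k < len(self.cracks) and self.cracks[k][0] == pivot:
            return self.cracks[k][1]   # this bound was cracked before
        lo = self.cracks[k - 1][1] if k > 0 else 0
        hi = self.cracks[k][1] if k < len(self.cracks) else len(self.col)
        pos = crack_in_two(self.col, lo, hi, pivot)
        self.cracks.insert(k, (pivot, pos))
        return pos

    def range_query(self, low, high):
        """Return all keys in [low, high), refining the index as a side effect."""
        start = self._crack(low)
        end = self._crack(high)
        return self.col[start:end]


column = CrackedColumn([36, 13, 67, 42, 99, 78, 18, 85, 28, 55, 5, 47])
q0 = column.range_query(13, 42)   # Q0 = [13,42)
q1 = column.range_query(6, 27)    # Q1 = [6,27)
```

After Q0 and Q1 the index column is split at the keys 6, 13, 27, and 42, which mirrors the pieces drawn on the slide. Each query only reorders the pieces that contain its two bounds.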

Page 4: Adaptive Adaptive Adaptive Indexing

Problems?

Low Convergence Speed

[Figure: response time higher than full index [%] (log scale, 1 to 100000) over the query sequence (log scale, 1 to 1000). Series: individual points, Bezier smoothed.]

High Variance

[Figure: query response time [ms] (log scale, 1 to 1000) over the query sequence (log scale, 1 to 1000). Series: individual points, convex hull.]

[Felix Schuhknecht, Alekh Jindal, Jens Dittrich: The Uncracked Pieces in Database Cracking, PVLDB Vol. 7, No. 2, Best Paper Award]

Low Robustness

[Figure: accumulated query response time [s] (0 to 20) for the Random, Sequential, and Skewed workloads.]

4/20

Page 5: Adaptive Adaptive Adaptive Indexing

5/20

Page 6: Adaptive Adaptive Adaptive Indexing

An Adaptive Adaptive Index?

All-in-one?

6/20

Page 7: Adaptive Adaptive Adaptive Indexing

Design rules:

1. Generalize way of refinement

2. Adapt refinement effort

3. Awareness of key distributions

7/20

Page 8: Adaptive Adaptive Adaptive Indexing

1. Generalize way of refinement: partition-in-k

Base Table: 36 13 67 42 99 78 18 85 28 55 5 47

Q0: out-of-place partition-in-k from the base table into the index column

Index Column: 13 18 5 36 42 28 47 67 55 99 78 85

Qi, i>0: in-place partition-in-k within the index column

Index Column before: 13 18 5 36 42 28 47 67 55 99 78 85

Index Column after: 13 18 5 28 36 42 47 67 55 99 78 85

8/20
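The out-of-place partition-in-k step for Q0 can be sketched as a histogram pass followed by a scatter pass. This is a minimal Python sketch, not the authors' implementation: the function name partition_out_of_place and the bucket function are hypothetical, and the choice of four equal-width key ranges of width 25 is an assumption that happens to reproduce the index column shown on the slide.

```python
def partition_out_of_place(keys, k, bucket_of):
    """Stable out-of-place partitioning of keys into k partitions.

    bucket_of maps a key to its partition number in range(k).
    Returns the partitioned copy and the start offset of each partition.
    """
    # Pass 1: histogram of partition sizes.
    hist = [0] * k
    for key in keys:
        hist[bucket_of(key)] += 1

    # Exclusive prefix sum gives the start offset of every partition.
    starts = [0] * k
    for p in range(1, k):
        starts[p] = starts[p - 1] + hist[p - 1]

    # Pass 2: scatter every key to the write cursor of its partition.
    cursor = list(starts)
    out = [None] * len(keys)
    for key in keys:
        p = bucket_of(key)
        out[cursor[p]] = key
        cursor[p] += 1
    return out, starts


base_table = [36, 13, 67, 42, 99, 78, 18, 85, 28, 55, 5, 47]
# Assumption: k = 4 partitions over the key ranges [0,25), [25,50), [50,75), [75,100).
index_column, starts = partition_out_of_place(base_table, 4, lambda key: key // 25)
```

The base table stays untouched, which is why this variant fits the very first query, when no index column exists yet.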

Page 9: Adaptive Adaptive Adaptive Indexing

1. Generalize way of refinement:

Base Table: 36 13 67 42 99 78 18 85 28 55 5 47

out-of-place partition-in-k

Index Column: 13 18 5 36 42 28 47 67 55 99 78 85

9/20

Page 10: Adaptive Adaptive Adaptive Indexing

1. Generalize way of refinement:

Base Table: 36 13 67 42 99 78 18 85 28 55 5 47

Index Column: 13 18 5 36 42 28 47 67 55 99 78 85

Software-managed buffers (shown holding 13, 36 42, and 67): a full buffer (36 42) is flushed with _mm256_stream_si256 through the hardware write-combine buffer into the index column.

9/20
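The idea of software-managed buffers can be sketched in Python, assuming one small buffer per partition that is written out as a block once it is full. Python has no streaming stores, so the flush below is only a stand-in for a non-temporal store such as _mm256_stream_si256. The function name partition_buffered, the buffer capacity of two keys, and the key ranges of width 25 are assumptions chosen to mirror the snapshot on the slide.

```python
def partition_buffered(keys, k, bucket_of, buffer_capacity=2):
    """Out-of-place partitioning that writes keys in buffer-sized blocks."""
    # Histogram and prefix sum, as in plain out-of-place partitioning.
    hist = [0] * k
    for key in keys:
        hist[bucket_of(key)] += 1
    cursor = [0] * k
    for p in range(1, k):
        cursor[p] = cursor[p - 1] + hist[p - 1]

    out = [None] * len(keys)
    buffers = [[] for _ in range(k)]

    def flush(p):
        # Stand-in for streaming a full buffer to the index column.
        buf = buffers[p]
        out[cursor[p]:cursor[p] + len(buf)] = buf
        cursor[p] += len(buf)
        buf.clear()

    for key in keys:
        p = bucket_of(key)
        buffers[p].append(key)
        if len(buffers[p]) == buffer_capacity:
            flush(p)

    # Write out whatever is left in partially filled buffers.
    for p in range(k):
        flush(p)
    return out


base_table = [36, 13, 67, 42, 99, 78, 18, 85, 28, 55, 5, 47]
buffered_index_column = partition_buffered(base_table, 4, lambda key: key // 25)
```

With these assumed parameters, the buffers hold 13, 36 42, and 67 after the first four keys, and the buffer with 36 42 is the first one to be flushed. The result is identical to unbuffered partitioning; only the write pattern changes.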

Page 11: Adaptive Adaptive Adaptive Indexing

2. Adapt refinement effort

[Figure, Qi, i > 0: runtime in [ms] (0 to 35) over partitioning fanout (4, 32, 512) for the input data sizes 32KB (L1), 256KB (L2), 2MB (Page), and 10MB (L3). Series: 2 x in-place crack-in-two, 2 x in-place radix partitioning.]

[Figure, Q0: runtime in [s] (0 to 4.5) over partitioning fanout (4 to 32768). Series: out-of-place radix partitioning, out-of-place crack-in-two + in-place crack-in-two.]

partition size, query sequence number

11/20

Page 12: Adaptive Adaptive Adaptive Indexing

2. Adapt refinement effort

[Figure: fanout bits f(s,q) (log scale, 1 to 100) over partition size s in MB (0 to 80). Marked: tadapt = 64MB, tsort = 2MB, f(s,q), bmin, bmax.]

12/20
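One plausible shape of f(s,q) that is consistent with the labels on this plot can be sketched as follows. The slide only names the thresholds and bit bounds, so the linear interpolation between bmin and bmax and the function name fanout_bits are assumptions. The default values are taken from numbers printed on the slides (bfirst=10, bmin=3, bmax=6, tadapt=64MB, tsort=2MB, and 64 bits for sorting).

```python
import math

MB = 1 << 20


def fanout_bits(s, q, b_first=10, b_min=3, b_max=6,
                t_adapt=64 * MB, t_sort=2 * MB, b_sort=64):
    """Number of fanout bits for a partition of s bytes at query number q."""
    if q == 0:
        # The very first query partitions out-of-place with a fixed fanout.
        return b_first
    if s > t_adapt:
        # Large partitions: refine with the minimal fanout.
        return b_min
    if s > t_sort:
        # Assumed: the fanout grows linearly as the partition shrinks.
        return b_min + math.ceil((b_max - b_min) * (1 - s / t_adapt))
    # Small partitions: use all bits, which amounts to sorting.
    return b_sort


examples = [fanout_bits(size * MB, q=5) for size in (128, 64, 32, 1)]
```

Under these assumptions, a 128MB and a 64MB partition are refined with 3 bits, a 32MB partition with 5 bits, and a 1MB partition is sorted.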

Page 13: Adaptive Adaptive Adaptive Indexing

3. Awareness of key distributions: skew?

[Figure: 1. A histogram over the first bfirst = 2 bits of the input keys (00, 01, 10, 11). The input is partitioned out-of-place on bfirst=2 bits into the index column, together with a histogram on partition 00. Partition 00 is then partitioned in-place on bmin=4 bits into 0000, 0001, 0010, 0011, while 01, 10, and 11 stay as they are.]

13/20
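The histogram-based skew handling can be sketched in Python under stated assumptions. The slide does not say when a partition counts as skewed, so the rule below (a partition holding more than twice the average number of keys) is purely illustrative, and the names prefix_histogram and skewed_partitions are hypothetical.

```python
def prefix_histogram(keys, key_bits, used_bits, bits):
    """Count keys per value of the next `bits` bits after `used_bits` top bits."""
    shift = key_bits - used_bits - bits
    mask = (1 << bits) - 1
    hist = [0] * (1 << bits)
    for key in keys:
        hist[(key >> shift) & mask] += 1
    return hist


def skewed_partitions(hist, factor=2.0):
    """Assumed rule: a partition is skewed if it exceeds factor x the average size."""
    average = sum(hist) / len(hist)
    return [p for p, count in enumerate(hist) if count > factor * average]


# Toy input of 4-bit keys in which most keys start with the bits 00.
keys = [0b0000, 0b0001, 0b0010, 0b0011, 0b0001, 0b0010, 0b0100, 0b1000, 0b1100]

first_level = prefix_histogram(keys, key_bits=4, used_bits=0, bits=2)
heavy = skewed_partitions(first_level)

# Build a second histogram only for the heavy partition 00.
partition_00 = [key for key in keys if key >> 2 == 0b00]
second_level = prefix_histogram(partition_00, key_bits=4, used_bits=2, bits=2)
```

Only partition 00 is flagged in this toy input, so only that partition receives the extra refinement, while 01, 10, and 11 are left alone.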

Page 14: Adaptive Adaptive Adaptive Indexing

Putting it all together

Queries: Q0 = [35,185], Q1 = [38,149], Q2 = [48,109], Q3 = [36,49]. The labels FILTER and SCAN mark the partitions read to answer each query.

Base table:
254 192 65 25 1 90 127 120 36 200 28 35 113 41 66 50 164 145 74 180 56 6 128 99 80 105 187 34 38 49 205

Q0, step 1: out-of-place radix partitioning (bfirst=1) into [0,127] and [128,255]:
65 25 1 90 127 120 36 28 35 113 41 66 50 74 56 6 99 80 105 34 38 49 | 254 192 200 164 145 180 128 187 205

Q0, step 2: skew diffusing (bmin=1) splits [0,127] into [0,63] and [64,127]; step 3: FILTER:
25 1 36 28 35 41 50 56 6 34 38 49 | 65 90 127 120 113 66 74 99 80 105 | 254 192 200 164 145 180 128 187 205

Q1, steps 4 and 5: in-place radix partitioning with f(s,q)=1 on [0,63] and on [128,255]:
25 1 28 6 | 36 35 41 50 56 34 38 49 | 65 90 127 120 113 66 74 99 80 105 | 164 145 180 128 187 | 254 192 200 205

Q2, step 6: in-place radix partitioning with f(s,q)=2 on [32,63]; step 7: in-place radix partitioning with f(s,q)=1 on [64,127]:
25 1 28 6 | 36 35 34 38 | 41 | 50 49 | 56 | 65 90 66 74 80 | 127 120 113 99 105 | 164 145 180 128 187 | 254 192 200 205

Q3, steps 8 and 9: sort with f(s,q)=64 on [32,39] and on [48,55]; step 10: SCAN:
25 1 28 6 | 34 35 36 38 | 41 | 49 50 | 56 | 65 90 66 74 80 | 127 120 113 99 105 | 164 145 180 128 187 | 254 192 200 205

Partition boundaries after step 10: 0, 31|32, 39|40, 47|48, 55|56, 63|64, 95|96, 127|128, 191|192, 255


14/20
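The column states of this walkthrough can be replayed with a short Python sketch. It assumes 8-bit keys and uses a stable radix partitioning step on a slice of the column. The helper radix_refine is hypothetical and works through a temporary copy of the slice, so it illustrates the resulting layout rather than a true in-place algorithm. The copy from base table to index column in step 1 is likewise simplified to a list copy.

```python
def radix_refine(col, lo, hi, shift, bits):
    """Stably partition col[lo:hi] on `bits` key bits starting at bit `shift`.

    Returns the start offset of each of the 2**bits sub-partitions.
    """
    fanout = 1 << bits
    mask = fanout - 1
    piece = col[lo:hi]                 # temporary copy of the slice
    hist = [0] * fanout
    for key in piece:
        hist[(key >> shift) & mask] += 1
    starts = [lo] * fanout
    for p in range(1, fanout):
        starts[p] = starts[p - 1] + hist[p - 1]
    cursor = list(starts)
    for key in piece:
        p = (key >> shift) & mask
        col[cursor[p]] = key
        cursor[p] += 1
    return starts


base_table = [254, 192, 65, 25, 1, 90, 127, 120, 36, 200, 28, 35, 113, 41, 66, 50,
              164, 145, 74, 180, 56, 6, 128, 99, 80, 105, 187, 34, 38, 49, 205]
col = list(base_table)

# Q0, step 1: split on the top bit into [0,127] and [128,255].
s1 = radix_refine(col, 0, len(col), shift=7, bits=1)
# Q0, step 2: skew diffusing splits [0,127] into [0,63] and [64,127].
s2 = radix_refine(col, 0, s1[1], shift=6, bits=1)
after_q0 = list(col)

# Q1, steps 4 and 5: one more bit on [0,63] and on [128,255].
s4 = radix_refine(col, 0, s2[1], shift=5, bits=1)
s5 = radix_refine(col, s1[1], len(col), shift=6, bits=1)
after_q1 = list(col)

# Q2, step 6: two bits on [32,63]; step 7: one bit on [64,127].
s6 = radix_refine(col, s4[1], s2[1], shift=3, bits=2)
s7 = radix_refine(col, s2[1], s1[1], shift=5, bits=1)
after_q2 = list(col)

# Q3, steps 8 and 9: sort the partitions [32,39] and [48,55].
col[s6[0]:s6[1]] = sorted(col[s6[0]:s6[1]])
col[s6[2]:s6[3]] = sorted(col[s6[2]:s6[3]])
after_q3 = list(col)
```

The lists after_q0 to after_q3 match the column states shown on the slide after steps 2, 5, 7, and 9. The slide states preserve the relative order of keys within each partition, which is why a stable partitioning step is enough to reproduce them.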

Page 15: Adaptive Adaptive Adaptive Indexing

Emulation

[Felix Martin Schuhknecht, Alekh Jindal, Jens Dittrich: The Uncracked Pieces in Database Cracking, PVLDB Vol. 7, No. 2]

15/20

Page 16: Adaptive Adaptive Adaptive Indexing

Test Setup

Key distributions (frequency over key range): UNIFORM [0,2^64), NORMAL (µ=2^63, σ=2^61), ZIPF [0,2^64) with α=0.6

16/20

Query workloads (key range over query sequence): RANDOM, SKEW, PERIODIC, SEQUENTIAL, ZOOMOUTALT, ZOOMINALT

[Felix Halim, Stratos Idreos, Panagiotis Karras, Roland H. C. Yap: Stochastic Database Cracking: Towards Robust Adaptive Indexing in Main-Memory Column-Stores, PVLDB Vol. 5, No. 6]
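Two of the three key distributions can be generated with the Python standard library, as in the minimal sketch below. The function names are hypothetical, clamping the normal distribution to the 64-bit key range is an assumption, and ZIPF is left out because the slides do not say how ranks are mapped to keys.

```python
import random

KEY_BITS = 64
KEY_MAX = (1 << KEY_BITS) - 1


def uniform_keys(n, rng):
    """UNIFORM over [0, 2^64): every 64-bit key is equally likely."""
    return [rng.getrandbits(KEY_BITS) for _ in range(n)]


def normal_keys(n, rng, mu=2.0 ** 63, sigma=2.0 ** 61):
    """NORMAL with mean 2^63 and standard deviation 2^61.

    Assumption: samples outside the 64-bit key range are clamped to it.
    """
    keys = []
    for _ in range(n):
        key = int(rng.gauss(mu, sigma))
        keys.append(min(max(key, 0), KEY_MAX))
    return keys


rng = random.Random(42)          # fixed seed for repeatable test data
uniform_sample = uniform_keys(1000, rng)
normal_sample = normal_keys(1000, rng)
```

Keys drawn this way are floating-point rounded in the normal case, so the low bits carry little information; that is acceptable for a sketch but would matter for a radix-based index that partitions on low bits.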

Page 17: Adaptive Adaptive Adaptive Indexing

Individual Query Response Times

[Figure: single query response time [ms] (log scale, 1 to 10000) over the query sequence (log scale, 1 to 1000). Series: Meta-adaptive Index (Manually configured), DC, DD1R, HCS, Sort + Binary Search. Shown per key distribution (UNIFORM [0,2^64), NORMAL (µ=2^63, σ=2^61), ZIPF [0,2^64) with α=0.6) and query workload (RANDOM, SKEW, PERIODIC, SEQUENTIAL, ZOOMOUTALT, ZOOMINALT).]

17/20

bfirst=10, bmin=3, bmax=6, tadapt=64MB, tsort=256KB

Adaptive Adaptive Index

Page 18: Adaptive Adaptive Adaptive Indexing

Individual Query Response Times

[Figure: single query response time [ms] (log scale, 1 to 10000) over the query sequence (log scale, 1 to 1000). Series: Meta-adaptive Index (Manually configured), DC, DD1R, HCS, Sort + Binary Search. Shown per key distribution (UNIFORM [0,2^64), NORMAL (µ=2^63, σ=2^61), ZIPF [0,2^64) with α=0.6) and query workload (RANDOM, SKEW, PERIODIC, SEQUENTIAL, ZOOMOUTALT, ZOOMINALT).]

18/20

bfirst=10, bmin=3, bmax=6, tadapt=64MB, tsort=256KB

Adaptive Adaptive Index

Page 19: Adaptive Adaptive Adaptive Indexing

Accumulated Query Response Times

[Figure: accumulated query response time [s] (5 to 25) per query workload (RANDOM, SKEWED, PERIODIC, SEQUENTIAL, ZOOMOUTALT, ZOOMINALT). Series: DC, DD1R, HCS, Meta-adaptive Index (Manually configured), Meta-adaptive Index (Simulated annealing configured). Shown per key distribution (UNIFORM [0,2^64), NORMAL (µ=2^63, σ=2^61), ZIPF [0,2^64) with α=0.6).]

19/20

bfirst=10, bmin=3, bmax=6, tadapt=64MB, tsort=256KB

Adaptive Adaptive Index

Page 20: Adaptive Adaptive Adaptive Indexing

20/20