Adaptive Adaptive Indexing
Felix Martin Schuhknecht, Jens Dittrich, Laurent Linden
Big Data Analytics Group, bigdata.uni-saarland.de
Saarland University
ICDE 2018
2007
2/20
?
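Database Cracking / Standard Cracking
Table Column -> Index Column
Q0=[13,42) cracks the Index Column into: < 13 | >= 13 and < 42 | >= 42
Q1=[6,27) cracks it further into: < 6 | >= 6 and < 13 | >= 13 and < 27 | >= 27 and < 42 | >= 42
Q2 ... Qn: the Index Column approaches the sorted state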
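The cracking steps on this slide can be sketched as follows. This is a minimal illustration of standard cracking with crack-in-two, not the authors' implementation: the class name CrackerIndex, the helper crack_in_two, and the list of (pivot, position) pairs are assumptions made for the example, and the input keys are the base table values that appear on a later slide.

```python
from bisect import bisect_left


def crack_in_two(column, lo, hi, pivot):
    """Reorder column[lo:hi] in place so keys < pivot precede keys >= pivot.

    Returns the position of the first key >= pivot.
    """
    i, j = lo, hi - 1
    while i <= j:
        if column[i] < pivot:
            i += 1
        else:
            column[i], column[j] = column[j], column[i]
            j -= 1
    return i


class CrackerIndex:
    """Hypothetical cracker index: a copied column plus the known split points."""

    def __init__(self, table_column):
        self.column = list(table_column)  # the index column is a copy
        self.bounds = []                  # sorted (pivot, position) pairs

    def _crack(self, pivot):
        k = bisect_left(self.bounds, (pivot,))
        if k < len(self.bounds) and self.bounds[k][0] == pivot:
            return self.bounds[k][1]      # this pivot was cracked before
        # Only the piece that encloses the pivot is reorganized.
        lo = self.bounds[k - 1][1] if k > 0 else 0
        hi = self.bounds[k][1] if k < len(self.bounds) else len(self.column)
        pos = crack_in_two(self.column, lo, hi, pivot)
        self.bounds.insert(k, (pivot, pos))
        return pos

    def range_query(self, low, high):
        """Return all keys in [low, high), cracking on both bounds."""
        start = self._crack(low)
        end = self._crack(high)
        return self.column[start:end]


idx = CrackerIndex([36, 13, 67, 42, 99, 78, 18, 85, 28, 55, 5, 47])
q0 = idx.range_query(13, 42)  # Q0=[13,42)
q1 = idx.range_query(6, 27)   # Q1=[6,27)
```

After both queries the index column is split at the pivots 6, 13, 27, and 42, which corresponds to the five pieces listed for Q1 above. Each query touches only the pieces that contain its bounds.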
3/20
Problems?
[Figure: Low Convergence Speed. Response time higher than full index [%] over the query sequence (individual points, Bezier smoothed).]
[Figure: High Variance. Query response time [ms] over the query sequence (individual points, convex hull).]
[Figure: Low Robustness. Accumulated query response time [s] for the Random, Sequential, and Skewed workloads.]
[Felix Schuhknecht, Alekh Jindal, Jens Dittrich: The Uncracked Pieces in Database Cracking, PVLDB Vol. 7, No. 2, Best Paper Award]
4/20
5/20
An Adaptive Adaptive Index?
All-in-one?
6/20
Design rules:
1. Generalize way of refinement
2. Adapt refinement effort
3. Awareness of key distributions
7/20
1. Generalize way of refinement: partition-in-k
Q0: out-of-place partition-in-k
Base Table: 36 13 67 42 99 78 18 85 28 55 5 47
Index Column: 13 18 5 36 42 28 47 67 55 99 78 85
Qi, i>0: in-place partition-in-k
Index Column (before): 13 18 5 36 42 28 47 67 55 99 78 85
Index Column (after): 13 18 5 28 36 42 47 67 55 99 78 85
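The out-of-place partition-in-k step for Q0 can be sketched with a histogram pass, a prefix sum, and a scatter pass. The function name partition_out_of_place and the partition function key // 25 are assumptions made for this example; the width of 25 was chosen only because it reproduces the index column shown on the slide, whereas later slides name radix partitioning as the technique.

```python
def partition_out_of_place(base, num_partitions, part_of):
    """Copy base into a new column grouped by partition, keeping input order.

    Returns the new column and the start offset of every partition
    (with the total length as a final sentinel).
    """
    # Pass 1: histogram of partition sizes.
    hist = [0] * num_partitions
    for key in base:
        hist[part_of(key)] += 1

    # Prefix sum turns sizes into start offsets.
    offsets = [0] * (num_partitions + 1)
    for p in range(num_partitions):
        offsets[p + 1] = offsets[p] + hist[p]

    # Pass 2: scatter every key to the next free slot of its partition.
    cursor = offsets[:-1]  # slicing copies, so offsets stays intact
    out = [None] * len(base)
    for key in base:
        p = part_of(key)
        out[cursor[p]] = key
        cursor[p] += 1
    return out, offsets


base_table = [36, 13, 67, 42, 99, 78, 18, 85, 28, 55, 5, 47]
# Assumption: four range partitions of width 25 over the key range [0, 100).
index_column, offsets = partition_out_of_place(base_table, 4, lambda key: key // 25)
```

With these assumptions index_column equals the slide's index column after Q0, and offsets marks where each of the four partitions starts.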
8/20
1. Generalize way of refinement:
out-of-place partition-in-k
Base Table: 36 13 67 42 99 78 18 85 28 55 5 47
Index Column: 13 18 5 36 42 28 47 67 55 99 78 85
9/20
1. Generalize way of refinement:
Base Table: 36 13 67 42 99 78 18 85 28 55 5 47
Index Column: 13 18 5 36 42 28 47 67 55 99 78 85
Software-managed buffers: 13 | 36 42 | 67
flush of a full buffer (36 42) with _mm256_stream_si256 via the hardware write-combine buffer into the Index Column
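The buffering idea on this slide can be sketched as follows. This is an illustration only: Python has no non-temporal stores, so the bulk slice assignment merely stands in for the flush that the slide performs with _mm256_stream_si256. The function name partition_buffered, the capacity of two keys per buffer, and the partition function key // 25 are assumptions chosen to mirror the picture (buffers 13 | 36 42 | 67, with 36 42 being flushed).

```python
def partition_buffered(base, num_partitions, part_of, buffer_capacity=2):
    """Out-of-place partitioning that collects keys in small per-partition
    buffers and writes each full buffer to the output in one bulk step."""
    hist = [0] * num_partitions
    for key in base:
        hist[part_of(key)] += 1

    # cursor[p] is the next free output slot of partition p.
    cursor = [0] * num_partitions
    for p in range(1, num_partitions):
        cursor[p] = cursor[p - 1] + hist[p - 1]

    out = [None] * len(base)
    buffers = [[] for _ in range(num_partitions)]
    flushes = 0
    for key in base:
        p = part_of(key)
        buffers[p].append(key)
        if len(buffers[p]) == buffer_capacity:
            # Stand-in for the streaming store of a full buffer.
            out[cursor[p]:cursor[p] + buffer_capacity] = buffers[p]
            cursor[p] += buffer_capacity
            buffers[p] = []
            flushes += 1

    # Write out whatever is left in partially filled buffers.
    for p, buf in enumerate(buffers):
        if buf:
            out[cursor[p]:cursor[p] + len(buf)] = buf
            flushes += 1
    return out, flushes


base_table = [36, 13, 67, 42, 99, 78, 18, 85, 28, 55, 5, 47]
buffered_column, flushes = partition_buffered(base_table, 4, lambda key: key // 25)
```

After the first four keys (36, 13, 67, 42) the buffers hold 13, 36 42, and 67, and the full buffer 36 42 is the first one flushed, as on the slide. The result is identical to unbuffered partitioning; only the number and size of the writes to the index column change.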
9/20
2. Adapt refinement effort
[Figure: Qi, i > 0. Runtime [ms] over partitioning fanout (4, 32, 512) for input data sizes 32KB (L1), 256KB (L2), 2MB (Page), and 10MB (L3); series: 2 x in-place crack-in-two, 2 x in-place radix partitioning.]
[Figure: Q0. Runtime [s] over partitioning fanout (4 to 32768); series: out-of-place radix partitioning, out-of-place crack-in-two + in-place crack-in-two.]
partition size, query sequence number
11/20
2. Adapt refinement effort
[Figure: Fanout bits f(s,q) (logarithmic axis, 1 to 100) over partition size s in MB (0 to 80), with the markers tadapt = 64MB, tsort = 2MB, bmin, and bmax.]
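One possible shape of f(s,q) that is consistent with the labels on this plot is sketched below. The piecewise form and the linear interpolation between bmin and bmax are assumptions, not a formula stated on the slide. The thresholds tadapt = 64MB and tsort = 2MB come from this slide, the values bfirst=10, bmin=3, and bmax=6 come from the configuration listed on the evaluation slides, and bsort=64 follows the "sort f(s,q)=64" labels in the walkthrough.

```python
import math

MB = 1 << 20


def fanout_bits(s, q, bfirst=10, bmin=3, bmax=6,
                tadapt=64 * MB, tsort=2 * MB, bsort=64):
    """Number of fanout bits for a partition of s bytes at query number q.

    Assumed piecewise shape:
      q == 0           -> bfirst (very first query, out-of-place)
      s >  tadapt      -> bmin   (large partitions get the minimum fanout)
      tsort < s <= tadapt -> interpolate from bmin towards bmax as s shrinks
      s <= tsort       -> bsort  (small partitions are sorted completely)
    """
    if q == 0:
        return bfirst
    if s > tadapt:
        return bmin
    if s > tsort:
        return bmin + math.ceil((bmax - bmin) * (1 - s / tadapt))
    return bsort


bits_large = fanout_bits(80 * MB, q=5)   # above tadapt
bits_medium = fanout_bits(32 * MB, q=5)  # between tsort and tadapt
bits_small = fanout_bits(1 * MB, q=5)    # below tsort
bits_first = fanout_bits(80 * MB, q=0)   # first query
```

Under these assumptions, smaller partitions receive more fanout bits, so the refinement effort per query grows as the index becomes finer, until a partition is small enough to be sorted outright.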
12/20
3. Awareness of key distributions: skew?
Input
1. Histogram on bfirst bits: 00, 01, 10, 11
out-of-place on bfirst=2 bits + Histogram on 00
Index Column: 00 ✘, 01 ✓, 10 ✓, 11 ✓
in-place on bmin=4 bits
Index Column: 0000 ✓, 0001 ✓, 0010 ✓, 0011 ✓, 01 ✓, 10 ✓, 11 ✓
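The skew check on this slide can be sketched as a histogram on the leading key bits followed by a size test per partition. The criterion used here (a partition counts as skewed when it holds more than skew_factor times the average number of keys) and the names histogram, find_skewed, and skew_factor are assumptions for illustration; the slide only shows that partition 00 is flagged and refined further. The 4-bit example keys are made up.

```python
def histogram(keys, key_bits, prefix_bits):
    """Count keys per value of their prefix_bits most significant bits."""
    shift = key_bits - prefix_bits
    hist = [0] * (1 << prefix_bits)
    for key in keys:
        hist[key >> shift] += 1
    return hist


def find_skewed(hist, skew_factor=2.0):
    """Return the partitions holding more than skew_factor times the average."""
    average = sum(hist) / len(hist)
    return [p for p, count in enumerate(hist) if count > skew_factor * average]


# Made-up 4-bit keys; most of them start with the bits 00.
keys = [0b0000, 0b0001, 0b0010, 0b0011, 0b0001, 0b0010,
        0b0100, 0b1000, 0b1100, 0b0000, 0b0011, 0b0001]

first_hist = histogram(keys, key_bits=4, prefix_bits=2)  # buckets 00, 01, 10, 11
skewed = find_skewed(first_hist)

# Refine only the skewed partition 00 by looking at two more bits.
keys_00 = [key for key in keys if key >> 2 == 0b00]
refined_hist = histogram(keys_00, key_bits=4, prefix_bits=4)[:4]  # 0000 .. 0011
```

Here partition 00 holds nine of the twelve keys and is the only one flagged. Its four sub-partitions 0000 to 0011 are then of a size comparable to the partitions 01, 10, and 11.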
13/20
Putting it all together
Q0 = [35,185], Q1 = [38,149], Q2 = [48,109], Q3 = [36,49]
Input: 254 192 65 25 1 90 127 120 36 200 28 35 113 41 66 50 164 145 74 180 56 6 128 99 80 105 187 34 38 49 205
Step 1, out-of-place radix partitioning (bfirst=1):
[0,127]: 65 25 1 90 127 120 36 28 35 113 41 66 50 74 56 6 99 80 105 34 38 49
[128,255]: 254 192 200 164 145 180 128 187 205
Step 2, skew diffusing (bmin=1):
[0,63]: 25 1 36 28 35 41 50 56 6 34 38 49
[64,127]: 65 90 127 120 113 66 74 99 80 105
[128,255]: 254 192 200 164 145 180 128 187 205
Step 3: FILTER
Steps 4 and 5, in-place radix partitioning (f(s,q)=1 each):
[0,31]: 25 1 28 6
[32,63]: 36 35 41 50 56 34 38 49
[64,127]: 65 90 127 120 113 66 74 99 80 105
[128,191]: 164 145 180 128 187
[192,255]: 254 192 200 205
Step 6, in-place radix partitioning (f(s,q)=2), and step 7, in-place radix partitioning (f(s,q)=1); the touched partitions are labelled SCAN or FILTER:
[0,31]: 25 1 28 6
[32,39]: 36 35 34 38
[40,47]: 41
[48,55]: 50 49
[56,63]: 56
[64,95]: 65 90 66 74 80
[96,127]: 127 120 113 99 105
[128,191]: 164 145 180 128 187
[192,255]: 254 192 200 205
Steps 8 and 9, sort (f(s,q)=64 each), and step 10, SCAN, for Q3 = [36,49]:
[0,31]: 25 1 28 6
[32,39]: 34 35 36 38
[40,47]: 41
[48,55]: 49 50
[56,63]: 56
[64,95]: 65 90 66 74 80
[96,127]: 127 120 113 99 105
[128,191]: 164 145 180 128 187
[192,255]: 254 192 200 205
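The walkthrough above can be replayed with a small simulation. It is a sketch under several assumptions: partitions are modelled as (lo, hi, keys) tuples over 8-bit keys, splitting is done stably on a copy (which happens to reproduce the key order on the slide, although a real in-place partitioning need not be stable), and the assignment of steps to queries (Q0: steps 1 to 3, Q1: steps 4 and 5, Q2: steps 6 and 7, Q3: steps 8 to 10) is one reading of the slide. All function names are hypothetical.

```python
def split(partition, bits):
    """Stably split (lo, hi, keys) into 2**bits equally wide key ranges."""
    lo, hi, keys = partition
    width = (hi - lo + 1) >> bits
    parts = [(lo + i * width, lo + (i + 1) * width - 1, [])
             for i in range(1 << bits)]
    for key in keys:
        parts[(key - lo) // width][2].append(key)
    return parts


def locate(index, key):
    """Position of the partition whose key range contains key."""
    return next(n for n, (lo, hi, _) in enumerate(index) if lo <= key <= hi)


def refine(index, key, bits):
    """Replace the partition containing key by its split on `bits` more bits."""
    i = locate(index, key)
    index[i:i + 1] = split(index[i], bits)


def sort_partition(index, key):
    i = locate(index, key)
    lo, hi, keys = index[i]
    index[i] = (lo, hi, sorted(keys))


def query(index, low, high):
    """Answer the closed range [low, high]."""
    result = []
    for lo, hi, keys in index:
        if hi < low or lo > high:
            continue                      # partition is irrelevant
        if low <= lo and hi <= high:
            result.extend(keys)           # SCAN: fully covered partition
        else:
            # FILTER: partially covered partition (a sorted partition could
            # use binary search instead; this sketch always filters).
            result.extend(k for k in keys if low <= k <= high)
    return result


base = [254, 192, 65, 25, 1, 90, 127, 120, 36, 200, 28, 35, 113, 41, 66, 50,
        164, 145, 74, 180, 56, 6, 128, 99, 80, 105, 187, 34, 38, 49, 205]

index = split((0, 255, base), 1)          # step 1: bfirst=1
refine(index, 0, 1)                       # step 2: skew diffusing, bmin=1
r0 = query(index, 35, 185)                # step 3: answer Q0
refine(index, 38, 1)                      # step 4: f(s,q)=1
refine(index, 149, 1)                     # step 5: f(s,q)=1
r1 = query(index, 38, 149)                # answer Q1
refine(index, 48, 2)                      # step 6: f(s,q)=2
refine(index, 109, 1)                     # step 7: f(s,q)=1
r2 = query(index, 48, 109)                # answer Q2
sort_partition(index, 36)                 # step 8: sort
sort_partition(index, 49)                 # step 9: sort
r3 = query(index, 36, 49)                 # step 10: answer Q3
final_column = [key for _, _, keys in index for key in keys]
```

Under these assumptions final_column matches the last index column of the walkthrough, and each query result contains exactly the keys of the base table that fall into its range.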
14/20
Emulation
[Felix Martin Schuhknecht, Alekh Jindal, Jens Dittrich: The Uncracked Pieces in Database Cracking, PVLDB Vol. 7, No. 2]
15/20
Test Setup
Key distributions (frequency over key range): UNIFORM [0, 2^64), NORMAL (µ=2^63, σ=2^61), ZIPF [0, 2^64), α=0.6
16/20
Query workloads (key range over query sequence): RANDOM, SKEW, PERIODIC, SEQUENTIAL, ZOOMOUTALT, ZOOMINALT
[Felix Halim, Stratos Idreos, Panagiotis Karras, Roland H. C. Yap: Stochastic Database Cracking: Towards Robust Adaptive Indexing in Main-Memory Column-Stores, PVLDB Vol. 5, No. 6]
Individual Query Response Times
[Figure: Single Query Response Time [ms] over the Query Sequence (both axes logarithmic); series: Meta-adaptive Index (Manually configured), DC, DD1R, HCS, Sort + Binary Search.]
17/20
Individual Query Response Times
bfirst=10, bmin=3, bmax=6, tadapt=64MB, tsort=256KB
Adaptive Adaptive Index
[Figure: Single Query Response Time [ms] over the Query Sequence (both axes logarithmic); series: Meta-adaptive Index (Manually configured), DC, DD1R, HCS, Sort + Binary Search. Thumbnails: key distributions UNIFORM [0, 2^64), NORMAL (µ=2^63, σ=2^61), ZIPF [0, 2^64), α=0.6; query workloads RANDOM, SKEW, PERIODIC, SEQUENTIAL, ZOOMOUTALT, ZOOMINALT.]
18/20
Accumulated Query Response Times
bfirst=10, bmin=3, bmax=6, tadapt=64MB, tsort=256KB
Adaptive Adaptive Index
[Figure: Accumulated Query Response Time [s] per query workload (RANDOM, SKEWED, PERIODIC, SEQUENTIAL, ZOOMOUTALT, ZOOMINALT); series: DC, DD1R, HCS, Meta-adaptive Index (Manually configured), Meta-adaptive Index (Simulated annealing configured). Thumbnails: key distributions UNIFORM [0, 2^64), NORMAL (µ=2^63, σ=2^61), ZIPF [0, 2^64), α=0.6.]
19/20
bfirst=10, bmin=3, bmax=6, tadapt=64MB, tsort=256KB
Adaptive Adaptive Index
20/20