![Page 1: fast multi-column sorting in main-memory column-stores](https://reader036.fdocuments.us/reader036/viewer/2022062401/58f28fa71a28ab50798b45a1/html5/thumbnails/1.jpg)
Fast Multi-Column Sorting in Main-Memory Column-Stores
Wenjian Xu†, Ziqiang Feng†, Eric Lo‡
†The Hong Kong Polytechnic University
‡The Chinese University of Hong Kong
![Page 2: fast multi-column sorting in main-memory column-stores](https://reader036.fdocuments.us/reader036/viewer/2022062401/58f28fa71a28ab50798b45a1/html5/thumbnails/2.jpg)
Background
Analytic database
Read-most queries
Main memory
Column store
Column compression
De-normalization
![Page 3: fast multi-column sorting in main-memory column-stores](https://reader036.fdocuments.us/reader036/viewer/2022062401/58f28fa71a28ab50798b45a1/html5/thumbnails/3.jpg)
Sort
• Implementing SQL operators like
  • GROUP BY
  • ORDER BY
  • PARTITION BY
![Page 4: fast multi-column sorting in main-memory column-stores](https://reader036.fdocuments.us/reader036/viewer/2022062401/58f28fa71a28ab50798b45a1/html5/thumbnails/4.jpg)
SIMD-Sort
256-bit SIMD register
44-bit column: 0xBBB0000000F 0x22200000001 0x333000F0009 0x8000000000E 0x11000300001 0x1020FF00000 0x10800000090 0x1000200000E …
64-bit bank: 0xBBB0000000F 0x22200000001 0x333000F0009 0x8000000000E
4x data parallelism
Bank size could be 8-bit, 16-bit, 32-bit, or 64-bit
![Page 5: fast multi-column sorting in main-memory column-stores](https://reader036.fdocuments.us/reader036/viewer/2022062401/58f28fa71a28ab50798b45a1/html5/thumbnails/5.jpg)
SIMD-Sort
256-bit SIMD register
16-bit column: 0xA000 0x1000 0x2000 0x7200 0x0000 0x020F 0x0800 0x0002 …
16-bit bank: 0xA000 0x1000 0x2000 0x7200 0x0000 0x020F 0x0800 0x0002 0xBB00 0x1C00 0x0022 0x7200 0x00F0 0xFFFF 0xBBCF 0x1000
16x data parallelism
Parallelism degree depends on the code width of the column
Bank size could be 8-bit, 16-bit, 32-bit, or 64-bit
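The relationship between code width, bank size, and parallelism on the two SIMD-Sort slides can be sketched as follows. The helper names are hypothetical, and the rule "pick the smallest bank that fits the code" is an assumption consistent with the slides (44-bit code in a 64-bit bank, 16-bit code in a 16-bit bank), not a statement of the authors' implementation.

```python
# Illustrative sketch only; function names are hypothetical.
REGISTER_BITS = 256            # AVX2 register width, as on the slides
BANK_SIZES = (8, 16, 32, 64)   # bank sizes listed on the slides


def bank_size(code_width: int) -> int:
    """Return the smallest bank (in bits) that can hold a code of this width.

    Assumption: the smallest fitting bank is always chosen.
    """
    for bank in BANK_SIZES:
        if code_width <= bank:
            return bank
    raise ValueError(f"code width {code_width} exceeds the largest bank")


def parallelism(code_width: int) -> int:
    """Number of codes processed per 256-bit register."""
    return REGISTER_BITS // bank_size(code_width)


for width in (44, 16):
    print(f"{width}-bit column: {bank_size(width)}-bit bank, "
          f"{parallelism(width)}x data parallelism")
```

Under these assumptions a 44-bit column gets 4x parallelism and a 16-bit column gets 16x, matching the two slides.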
![Page 6: fast multi-column sorting in main-memory column-stores](https://reader036.fdocuments.us/reader036/viewer/2022062401/58f28fa71a28ab50798b45a1/html5/thumbnails/6.jpg)
Multi-Column Sorting
Example Query:
SELECT * FROM orders
ORDER BY order_date, retail_price
Multiple attributes
[Chart: TPC-H Queries Q1, Q2, Q3, Q7, Q9, Q10, Q16, Q18; series: Multi-Column Sorting, Scan+Lookup+Aggregation]
Multi-Column Sorting becomes the bottleneck
Widespread in workloads: 45% of TPC-H queries, 72% of TPC-DS queries
Our work: Optimizing Multi-Column Sorting
![Page 7: fast multi-column sorting in main-memory column-stores](https://reader036.fdocuments.us/reader036/viewer/2022062401/58f28fa71a28ab50798b45a1/html5/thumbnails/7.jpg)
State-of-the-Art Implementation: Column-at-a-Time
Order by X, Y
X (20-bit): 0xEEEEE 0x00000 0xEEEEE 0x00000 0xEEEEE 0x00000 0xEEEEE
oid:        1 2 3 4 5 6 7
Y (12-bit): 0xAAA 0xCCC 0xBBB 0xAAA 0xAAA 0xFFF 0xCCC
32-bit bank SIMD-sort (8x parallelism) over (X, oid) pairs:
(0xEEEEE, 1) (0x00000, 2) (0xEEEEE, 3) (0x00000, 4) (0xEEEEE, 5) (0x00000, 6) (0xEEEEE, 7)
![Page 8: fast multi-column sorting in main-memory column-stores](https://reader036.fdocuments.us/reader036/viewer/2022062401/58f28fa71a28ab50798b45a1/html5/thumbnails/8.jpg)
State-of-the-Art Implementation: Column-at-a-Time
Order by X, Y
X (20-bit): 0xEEEEE 0x00000 0xEEEEE 0x00000 0xEEEEE 0x00000 0xEEEEE
oid:        1 2 3 4 5 6 7
Y (12-bit): 0xAAA 0xCCC 0xBBB 0xAAA 0xAAA 0xFFF 0xCCC
32-bit bank SIMD-sort (8x parallelism)
X sorted:   0x00000 0x00000 0x00000 0xEEEEE 0xEEEEE 0xEEEEE 0xEEEEE
oid:        2 4 6 1 3 5 7
Y in sorted-oid order (first four shown): 0xCCC 0xAAA 0xFFF 0xAAA
![Page 9: fast multi-column sorting in main-memory column-stores](https://reader036.fdocuments.us/reader036/viewer/2022062401/58f28fa71a28ab50798b45a1/html5/thumbnails/9.jpg)
State-of-the-Art Implementation: Column-at-a-Time
Order by X, Y
X (20-bit): 0xEEEEE 0x00000 0xEEEEE 0x00000 0xEEEEE 0x00000 0xEEEEE
oid:        1 2 3 4 5 6 7
Y (12-bit): 0xAAA 0xCCC 0xBBB 0xAAA 0xAAA 0xFFF 0xCCC
32-bit bank SIMD-sort (8x parallelism)
X sorted:   0x00000 0x00000 0x00000 0xEEEEE 0xEEEEE 0xEEEEE 0xEEEEE
oid:        2 4 6 1 3 5 7
LOOKUP
Y in sorted-oid order: 0xCCC 0xAAA 0xFFF 0xAAA 0xBBB 0xAAA 0xCCC
16-bit bank SIMD-sort (16x parallelism), once per group of equal X values
Y sorted:   0xAAA 0xCCC 0xFFF 0xAAA 0xAAA 0xBBB 0xCCC
oid:        4 2 6 1 5 3 7
Can we do better?
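The column-at-a-time walk-through above can be traced with a minimal sketch. It uses the X, Y, and oid values from the slide, but substitutes Python's built-in stable sort for the SIMD-sort, and the function name is hypothetical.

```python
from itertools import groupby

# Values from the slide; oids are 1-based.
X = [0xEEEEE, 0x00000, 0xEEEEE, 0x00000, 0xEEEEE, 0x00000, 0xEEEEE]
Y = [0xAAA, 0xCCC, 0xBBB, 0xAAA, 0xAAA, 0xFFF, 0xCCC]


def column_at_a_time(x, y):
    """Sort by X, then sort each run of equal X values by Y (hypothetical helper)."""
    # Round 1: sort (X, oid) pairs by X. A plain stable sort stands in for SIMD-sort.
    round1 = sorted(((xv, oid) for oid, xv in enumerate(x, start=1)),
                    key=lambda pair: pair[0])
    result = []
    # Round 2: for each run of equal X values, LOOKUP Y by oid and sort the run by Y.
    for _, run in groupby(round1, key=lambda pair: pair[0]):
        looked_up = [(y[oid - 1], oid) for _, oid in run]
        looked_up.sort(key=lambda pair: pair[0])
        result.extend(oid for _, oid in looked_up)
    return result


print(column_at_a_time(X, Y))  # [4, 2, 6, 1, 5, 3, 7]
```

The printed oid order matches the final oid row on the slide.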
![Page 10: fast multi-column sorting in main-memory column-stores](https://reader036.fdocuments.us/reader036/viewer/2022062401/58f28fa71a28ab50798b45a1/html5/thumbnails/10.jpg)
Option 1: Stitch Together
X (20-bit): 0xEEEEE 0x00000 0xEEEEE 0x00000 0xEEEEE 0x00000 0xEEEEE
oid:        1 2 3 4 5 6 7
Y (12-bit): 0xAAA 0xCCC 0xBBB 0xAAA 0xAAA 0xFFF 0xCCC
Column-at-a-Time:
  32-bit bank SIMD-sort (8x parallelism)
  X sorted: 0x00000 0x00000 0x00000 0xEEEEE 0xEEEEE 0xEEEEE 0xEEEEE
  oid:      2 4 6 1 3 5 7
  LOOKUP
  Y in sorted-oid order: 0xCCC 0xAAA 0xFFF 0xAAA 0xBBB 0xAAA 0xCCC
  16-bit bank SIMD-sort (16x parallelism), once per group of equal X values
  Y sorted: 0xAAA 0xCCC 0xFFF 0xAAA 0xAAA 0xBBB 0xCCC
  oid:      4 2 6 1 5 3 7
Stitch X and Y:
  LOOKUP
  Stitch: 0xEEEEE AAA, 0x00000 CCC, 0xEEEEE BBB, …
![Page 11: fast multi-column sorting in main-memory column-stores](https://reader036.fdocuments.us/reader036/viewer/2022062401/58f28fa71a28ab50798b45a1/html5/thumbnails/11.jpg)
Option 1: Stitch Together
X (20-bit): 0xEEEEE 0x00000 0xEEEEE 0x00000 0xEEEEE 0x00000 0xEEEEE
oid:        1 2 3 4 5 6 7
Y (12-bit): 0xAAA 0xCCC 0xBBB 0xAAA 0xAAA 0xFFF 0xCCC
Column-at-a-Time:
  32-bit bank SIMD-sort (8x parallelism)
  X sorted: 0x00000 0x00000 0x00000 0xEEEEE 0xEEEEE 0xEEEEE 0xEEEEE
  oid:      2 4 6 1 3 5 7
  LOOKUP
  Y in sorted-oid order: 0xCCC 0xAAA 0xFFF 0xAAA 0xBBB 0xAAA 0xCCC
  16-bit bank SIMD-sort (16x parallelism), once per group of equal X values
  Y sorted: 0xAAA 0xCCC 0xFFF 0xAAA 0xAAA 0xBBB 0xCCC
  oid:      4 2 6 1 5 3 7
Stitch X and Y:
  LOOKUP
  Stitch
  Supercolumn (32-bit): 0xEEEEEAAA 0x00000CCC 0xEEEEEBBB 0x00000AAA 0xEEEEEAAA 0x00000FFF 0xEEEEECCC
  oid:                  1 2 3 4 5 6 7
  32-bit bank SIMD-sort (8x parallelism)
  Supercolumn sorted:   0x00000AAA 0x00000CCC 0x00000FFF 0xEEEEEAAA 0xEEEEEAAA 0xEEEEEBBB 0xEEEEECCC
  oid:                  4 2 6 1 5 3 7
Correctness proved!
Save one LOOKUP operation
Save one round of sorting
Stitch overhead
WIN
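The stitching step can be illustrated with a short sketch: each X value is shifted left by the width of Y and combined with Y into one 32-bit supercolumn code, which is then sorted once. The values come from the slide; the function name is hypothetical, and Python's built-in sort again stands in for the SIMD-sort.

```python
# Values from the slide; oids are 1-based.
X = [0xEEEEE, 0x00000, 0xEEEEE, 0x00000, 0xEEEEE, 0x00000, 0xEEEEE]
Y = [0xAAA, 0xCCC, 0xBBB, 0xAAA, 0xAAA, 0xFFF, 0xCCC]
Y_BITS = 12  # code width of Y


def stitch_sort(x, y, y_bits):
    """Stitch X and Y into one supercolumn and sort it once (hypothetical helper)."""
    stitched = [((xv << y_bits) | yv, oid)
                for oid, (xv, yv) in enumerate(zip(x, y), start=1)]
    # One sort over the 32-bit supercolumn replaces the two column-at-a-time rounds.
    stitched.sort(key=lambda pair: pair[0])
    return stitched


for code, oid in stitch_sort(X, Y, Y_BITS):
    print(f"0x{code:08X}", oid)
```

Sorting the stitched codes yields the same oid order (4 2 6 1 5 3 7) as the column-at-a-time plan, because comparing the concatenated bits is equivalent to comparing X first and Y second.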
![Page 12: fast multi-column sorting in main-memory column-stores](https://reader036.fdocuments.us/reader036/viewer/2022062401/58f28fa71a28ab50798b45a1/html5/thumbnails/12.jpg)
Is stitching together always good? Let’s consider another example.
![Page 13: fast multi-column sorting in main-memory column-stores](https://reader036.fdocuments.us/reader036/viewer/2022062401/58f28fa71a28ab50798b45a1/html5/thumbnails/13.jpg)
Option 1: Stitch Together
X (24-bit): 0xEEEEEE 0x000000 0xEEEEEE 0x000000 0xEEEEEE 0x000000 0xEEEEEE
Y (20-bit): 0xAAAAA 0xCCCCC 0xAAAAA 0xCCCCC 0xCCCCC 0xAAAAA 0xCCCCC
Column-at-a-Time:
  X: 32-bit bank SIMD-sort (8x parallelism)
  LOOKUP
  Y: 32-bit bank SIMD-sort (8x parallelism)
Stitch X and Y:
  LOOKUP
  Stitch
  Supercolumn (44-bit): 0xEEEEEEAAAAA 0x000000CCCCC 0xEEEEEEAAAAA 0x000000CCCCC 0xEEEEEECCCCC 0x000000AAAAA 0xEEEEEECCCCC
  64-bit bank SIMD-sort (4x parallelism)
Lower Data Parallelism
LOSE
Any alternatives other than Stitching X and Y in this example?
![Page 14: fast multi-column sorting in main-memory column-stores](https://reader036.fdocuments.us/reader036/viewer/2022062401/58f28fa71a28ab50798b45a1/html5/thumbnails/14.jpg)
Option 2: Bit Borrowing
X (24-bit): 0xEEEEEE 0x000000 0xEEEEEE 0x000000 0xEEEEEE 0x000000 0xEEEEEE
Y (20-bit): 0xAAAAA 0xCCCCC 0xAAAAA 0xCCCCC 0xCCCCC 0xAAAAA 0xCCCCC
Column-at-a-Time:
  X: 32-bit bank SIMD-sort (8x parallelism)
  LOOKUP
  Y: 32-bit bank SIMD-sort (8x parallelism)
Borrowing bits from Y to X:
  X << 4 bits, filled with the leading 4 bits of Y: A C A C C A C
  X becomes 28-bit, Y becomes 16-bit
  X (28-bit): 32-bit bank SIMD-sort (8x parallelism)
  X sorted: 0x000000A 0x000000C 0x000000C 0xEEEEEEA 0xEEEEEEA 0xEEEEEEC 0xEEEEEEC
  LOOKUP
  Y (16-bit): 16-bit bank SIMD-sort (16x parallelism), once per group of equal X values
Improved parallelism
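A minimal sketch of bit borrowing, using the values from the slide: the leading 4 bits of each 20-bit Y code are moved to the low end of X, giving a 28-bit first-round key and a 16-bit second-round key. The function name is hypothetical, and Python's built-in stable sort stands in for the SIMD-sort.

```python
from itertools import groupby

# Values from the slide; oids are 1-based.
X = [0xEEEEEE, 0x000000, 0xEEEEEE, 0x000000, 0xEEEEEE, 0x000000, 0xEEEEEE]
Y = [0xAAAAA, 0xCCCCC, 0xAAAAA, 0xCCCCC, 0xCCCCC, 0xAAAAA, 0xCCCCC]
Y_BITS = 20  # code width of Y
BORROW = 4   # number of leading Y bits moved into X


def bit_borrow_sort(x, y, y_bits, borrow):
    """Two-round sort after moving `borrow` leading bits of Y into X (hypothetical helper)."""
    rest = y_bits - borrow
    # Widened X: original X shifted left, low bits filled with the leading bits of Y.
    x_wide = [(xv << borrow) | (yv >> rest) for xv, yv in zip(x, y)]
    # Narrowed Y: the remaining low bits of Y.
    y_narrow = [yv & ((1 << rest) - 1) for yv in y]
    # Round 1: sort by the widened X (28 bits, fits a 32-bit bank).
    round1 = sorted(((v, oid) for oid, v in enumerate(x_wide, start=1)),
                    key=lambda pair: pair[0])
    result = []
    # Round 2: LOOKUP the narrowed Y (16 bits) and sort each run of equal keys.
    for _, run in groupby(round1, key=lambda pair: pair[0]):
        looked_up = sorted(((y_narrow[oid - 1], oid) for _, oid in run),
                           key=lambda pair: pair[0])
        result.extend(oid for _, oid in looked_up)
    return x_wide, y_narrow, result


x_wide, y_narrow, order = bit_borrow_sort(X, Y, Y_BITS, BORROW)
print([f"0x{v:07X}" for v in sorted(x_wide)])
print(order)
```

The sorted first-round keys match the slide (0x000000A, 0x000000C, ...), and the final oid order equals a direct sort on (X, Y). The second round now works on 16-bit codes, which is where the 16x parallelism on the slide comes from.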
![Page 15: fast multi-column sorting in main-memory column-stores](https://reader036.fdocuments.us/reader036/viewer/2022062401/58f28fa71a28ab50798b45a1/html5/thumbnails/15.jpg)
Optimal Plan
• Given 3 columns with 11-bit, 14-bit, and 21-bit to be sorted:
  Stitch together?
  Bit borrowing?
  Split into more rounds?
Num. of possible Plans: 2^(11+14+21)
In the paper:
• Cost model
• Plan enumeration and search
![Page 16: fast multi-column sorting in main-memory column-stores](https://reader036.fdocuments.us/reader036/viewer/2022062401/58f28fa71a28ab50798b45a1/html5/thumbnails/16.jpg)
Experiments
• Setup
  Intel Xeon E5 10-core & Intel i7 quad-core
  AVX2 instruction set (256 bits)
• Data sets
  TPC-H
  TPC-H Skew
  TPC-DS
  Real data (Airline Origin and Destination Survey)
![Page 17: fast multi-column sorting in main-memory column-stores](https://reader036.fdocuments.us/reader036/viewer/2022062401/58f28fa71a28ab50798b45a1/html5/thumbnails/17.jpg)
Speedup over Column-at-a-Time
1.8X ~ 5.5X speedup
TPC-H TPC-H Skew TPC-DS Real Data
![Page 18: fast multi-column sorting in main-memory column-stores](https://reader036.fdocuments.us/reader036/viewer/2022062401/58f28fa71a28ab50798b45a1/html5/thumbnails/18.jpg)
Data Size Scalability
Linear data size scalability
Our solution for Multi-Column Sorting
![Page 19: fast multi-column sorting in main-memory column-stores](https://reader036.fdocuments.us/reader036/viewer/2022062401/58f28fa71a28ab50798b45a1/html5/thumbnails/19.jpg)
Core/thread Scalability
Linear core/thread scalability
Our solution for Multi-Column Sorting
![Page 20: fast multi-column sorting in main-memory column-stores](https://reader036.fdocuments.us/reader036/viewer/2022062401/58f28fa71a28ab50798b45a1/html5/thumbnails/20.jpg)
Summary
• First work to pinpoint and tackle the issue of multi-column sorting
• Our technique: manipulate the bits across input columns
• Up to 5.5X speedup in query execution.
Thank you