Query Task Model (QTM): Modeling Query Execution with Tasks
Steffen Zeuch and Johann-Christoph Freytag

Motivation
✤ Different DBMSs execute the same query execution plan (QEP) using different schedules
✤ This concerns run-time execution, not query optimization
✤ There is no uniform scheduling format
✤ Query execution in different DBMSs is not comparable
✤ Major differences between DBMSs:
  ✤ Chunk size: size of an operator’s input
  ✤ Scheduling strategy: execution model vs. run-time scheduler

How can different schedules be made comparable, to explain why one schedule performs better than another?

Chunk Size
[Figure: a Selection operator consuming its input in three chunk sizes. Tuple-at-a-time: t1. Buffer-at-a-time: t1, t2, t3, then t4, t5, t6. Column-at-a-time: t1 through t6.]
Chunk Size               DBMS
1 tuple                  System R, MySQL, (PostgreSQL)
“Fit into cache”         MonetDB/X100, DB2 with BLU
Fixed number of tuples   HyPer
Fixed block size         C-Store
Column                   MonetDB MIL
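
The three chunk sizes can be illustrated with a small sketch. This is only an illustration of the idea, not how any of the listed systems implement it; the helper names `chunks` and `select` and the six-value column are hypothetical.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def chunks(tuples: Iterable[int], size: int) -> Iterator[List[int]]:
    """Split an input stream into chunks of at most `size` tuples."""
    it = iter(tuples)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def select(chunk: List[int], predicate: Callable[[int], bool]) -> List[int]:
    """Apply a selection to one chunk of input."""
    return [t for t in chunk if predicate(t)]


column = [1, 2, 3, 4, 5, 6]  # stands in for tuples t1..t6

tuple_at_a_time = list(chunks(column, 1))             # one tuple per call
buffer_at_a_time = list(chunks(column, 3))            # fixed-size buffers
column_at_a_time = list(chunks(column, len(column)))  # the whole column at once


def run(chunked: List[List[int]]) -> List[int]:
    """Run the same selection over every chunk and concatenate the results."""
    out: List[int] = []
    for chunk in chunked:
        out.extend(select(chunk, lambda t: t % 2 == 0))
    return out
```

The chunk size changes how often the operator is invoked, not what it returns: all three variants produce the same selection result.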

Volcano Execution Model (Open-Next-Close Iterator)
[Figure: query plan over relations R, S, and T with two Hash Build operators, a Selection, HashProbe (S), and HashProbe (R). Parent operators call Next on their child; each call returns one Tuple.]
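
A minimal sketch of the open-next-close iterator interface may help here. It is a simplification under stated assumptions: tuples are plain integers, the probe keeps a tuple only if its value occurs in the build relation, and the class names `Scan`, `Selection`, `HashProbe` and the helper `drain` are hypothetical rather than taken from any particular system.

```python
from typing import Callable, Iterable, List, Optional


class Scan:
    """Leaf operator: returns the rows of an in-memory relation one by one."""

    def __init__(self, rows: Iterable[int]) -> None:
        self.rows = list(rows)
        self.pos = 0

    def open(self) -> None:
        self.pos = 0

    def next(self) -> Optional[int]:
        if self.pos >= len(self.rows):
            return None  # end of input
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self) -> None:
        self.pos = 0


class Selection:
    """Pulls tuples from its child until one satisfies the predicate."""

    def __init__(self, child, predicate: Callable[[int], bool]) -> None:
        self.child = child
        self.predicate = predicate

    def open(self) -> None:
        self.child.open()

    def next(self) -> Optional[int]:
        while True:
            row = self.child.next()
            if row is None or self.predicate(row):
                return row

    def close(self) -> None:
        self.child.close()


class HashProbe:
    """Builds a hash table in open(), then probes it with the child's tuples."""

    def __init__(self, child, build_rows: Iterable[int]) -> None:
        self.child = child
        self.build_rows = list(build_rows)
        self.table: set = set()

    def open(self) -> None:
        self.table = set(self.build_rows)  # the hash build step
        self.child.open()

    def next(self) -> Optional[int]:
        while True:
            row = self.child.next()
            if row is None or row in self.table:
                return row

    def close(self) -> None:
        self.table = set()
        self.child.close()


def drain(root) -> List[int]:
    """Drive the plan from the root: open, call next until exhausted, close."""
    root.open()
    out: List[int] = []
    while True:
        row = root.next()
        if row is None:
            break
        out.append(row)
    root.close()
    return out


R = [0, 2, 4, 6]
S = [0, 3, 6, 9]
T = range(10)
plan = HashProbe(HashProbe(Selection(Scan(T), lambda t: t < 7), S), R)
result = drain(plan)
```

Each `next` call on the root pulls exactly one tuple through the whole pipeline, which is the tuple-at-a-time behavior of this execution model.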

(Run-time) Scheduler
[Figure: the pipeline T, Selection, HashProbe (S), HashProbe (R), with two schedules drawn along a time axis, one labeled Spatial Locality and one labeled Temporal Locality. Both schedules consist of the tasks Sel(t1), Sel(t2), Prob_S(t1), Prob_S(t2), Prob_R(t1), Prob_R(t2).]
Further optimization criteria: I/O, NUMA, or memory usage
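
To make the notion of a schedule concrete, the sketch below enumerates two orderings of the same six tasks that a run-time scheduler could choose. The function names `operator_major` and `tuple_major` are hypothetical, and the sketch deliberately does not claim which ordering the slide labels as spatial or temporal locality, since the figure does not survive in this transcript.

```python
from typing import List


def operator_major(operators: List[str], tuples: List[str]) -> List[str]:
    """Run one operator over all tuples before moving to the next operator."""
    return [f"{op}({t})" for op in operators for t in tuples]


def tuple_major(operators: List[str], tuples: List[str]) -> List[str]:
    """Push one tuple through all operators before starting the next tuple."""
    return [f"{op}({t})" for t in tuples for op in operators]


operators = ["Sel", "Prob_S", "Prob_R"]
tuples = ["t1", "t2"]

schedule_a = operator_major(operators, tuples)
schedule_b = tuple_major(operators, tuples)
```

Both schedules contain the same tasks and respect the per-tuple operator order; they differ only in how the tasks are interleaved over time.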

DBMS Landscape
[Figure: DBMS landscape grid. Vertical axis, Chunk Size: tuple-at-a-time, buffer-at-a-time, column-at-a-time. Horizontal axis, Scheduling Strategy: Volcano execution model, (run-time) scheduler, dynamic load balancing. Systems placed in the grid: System R, MySQL, PostgreSQL (appears in two cells), DB2, MonetDB/X100, DB2 BLU, StagedDB, HyPer, MonetDB MIL, SAP HANA.]

QTM: Query Task Model
Idea: A model that describes parallel query execution with tasks
QEP: Queue of tasks
Task: Encapsulates a piece of work on some data
Goals:
✤ Open a design space for DBMS schedules
✤ Make the main aspects of query scheduling comparable: execution order, degree of parallelism and thread coordination, and partitioning

Query Task Model
[Figure: labels Work, Data, and Processing Strategies; a Task Queue holding tasks T1, T2, T3; a Data Queue holding chunks t1, t2, t3; a Table holding t1, t2, t3.]
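
The idea of a task as work paired with data, and of a plan as a queue of such tasks, can be sketched as follows. This is a single-threaded illustration under stated assumptions, not the authors' implementation: the names `Task`, `partition`, `keep_even`, and `run_queue` are hypothetical, and parallel execution and thread coordination are left out.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Tuple


@dataclass
class Task:
    """A piece of work (a function) bound to the chunk of data it processes."""
    name: str
    work: Callable[[List[int]], List[int]]
    data: List[int]


def partition(table: List[int], size: int) -> List[List[int]]:
    """Partition a table into chunks of at most `size` tuples."""
    return [table[i:i + size] for i in range(0, len(table), size)]


def keep_even(chunk: List[int]) -> List[int]:
    """Example work: a selection that keeps even values."""
    return [t for t in chunk if t % 2 == 0]


def run_queue(queue: Deque[Task]) -> Tuple[List[int], List[str]]:
    """Execute tasks in queue order; return the results and the execution log."""
    results: List[int] = []
    log: List[str] = []
    while queue:
        task = queue.popleft()
        log.append(task.name)
        results.extend(task.work(task.data))
    return results, log


table = list(range(10))
queue: Deque[Task] = deque(
    Task(f"T{i + 1}", keep_even, chunk)
    for i, chunk in enumerate(partition(table, 4))
)
results, log = run_queue(queue)
```

In this sketch the partition size determines how many tasks exist, and the queue order is the execution order, which are two of the aspects the model aims to make comparable.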

Evaluation: Scenario
Schedule Workload
Tuples per relation   30M
Selection             < 25M
S1                    Values 0, 1, 2, …
S2                    Values 0, 2, 4, …
S3                    Values 0, 4, 8, …

Evaluation: Configuration
Schedule        Buffer Size   Tasks per Op   Total Tasks
1) Tup - Pipe   1             30M            90M
2) Tup - Mat    1             30M            150M
3) Tup - Seq    1             30M            150M
4) Buf - CL     4             7.5M           22.5M
5) Buf - L1     2,048         14,649         43,947
6) Buf - L2     16,384        1,832          5,496
7) Buf - L3     491,520       62             186
8) Op - Mat     7.5M          4              20
9) Op - Seq     7.5M          4              20
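
The task counts in this table follow from simple arithmetic, which the sketch below reproduces. It assumes that each operator processes the 30M tuples of a relation in buffers of the given size, so the number of tasks per operator is the rounded-up quotient. The multipliers 3 and 5 are read off the table rows (total tasks divided by tasks per operator); the slides do not explain them, so they are treated here as given inputs.

```python
from typing import Dict, List, Tuple

TUPLES_PER_RELATION = 30_000_000  # from the scenario slide


def tasks_per_operator(buffer_size: int) -> int:
    """Number of buffers needed to cover one relation (integer ceiling)."""
    return -(-TUPLES_PER_RELATION // buffer_size)


def total_tasks(buffer_size: int, multiplier: int) -> int:
    """Total tasks = tasks per operator times the row's multiplier."""
    return multiplier * tasks_per_operator(buffer_size)


# (schedule, buffer size in tuples, multiplier read off the table)
CONFIGS: List[Tuple[str, int, int]] = [
    ("Tup - Pipe", 1, 3),
    ("Tup - Mat", 1, 5),
    ("Tup - Seq", 1, 5),
    ("Buf - CL", 4, 3),
    ("Buf - L1", 2_048, 3),
    ("Buf - L2", 16_384, 3),
    ("Buf - L3", 491_520, 3),
    ("Op - Mat", 7_500_000, 5),
    ("Op - Seq", 7_500_000, 5),
]

counts: Dict[str, Tuple[int, int]] = {
    name: (tasks_per_operator(size), total_tasks(size, mult))
    for name, size, mult in CONFIGS
}

for name, (per_op, total) in counts.items():
    print(f"{name:<12} {per_op:>12,} {total:>12,}")
```

Growing the buffer from one tuple to an L3-sized buffer reduces the total task count from 90M to 186, which is the range the evaluation explores.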

Evaluation: Insights
✤ Tradeoff between data cache and instruction cache performance
✤ Sweet spot: buffer matching the largest private cache size vs. a slightly larger buffer
✤ Medium-sized tasks are data-efficient:
  ✤ Pros: Buffer fits entirely into cache, high data locality
  ✤ Cons: High number of tasks and instructions
✤ Large tasks are instruction-efficient:
  ✤ Pros: Fewer instructions and tasks, high instruction locality
  ✤ Cons: More data cache misses if the cache size is exceeded
✤ QTM: Cache performance can be adjusted via the buffer size
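
As a rough illustration of choosing a buffer size so that the buffer fits entirely into a cache level, the sketch below converts a capacity in bytes into a buffer size in tuples. Both the 16-byte tuple width and the capacities (64 B cache line, 32 KiB, 256 KiB, 7.5 MiB) are assumptions that are not stated in the slides; they are chosen only because they happen to reproduce the buffer sizes 4, 2,048, 16,384, and 491,520 from the configuration table.

```python
from typing import Dict

TUPLE_BYTES = 16  # assumed tuple width, not given in the slides

# Assumed capacities in bytes, not given in the slides.
CAPACITIES: Dict[str, int] = {
    "CL": 64,           # one cache line
    "L1": 32 * 1024,    # 32 KiB
    "L2": 256 * 1024,   # 256 KiB
    "L3": 7680 * 1024,  # 7.5 MiB
}


def tuples_that_fit(capacity_bytes: int, tuple_bytes: int = TUPLE_BYTES) -> int:
    """Largest number of whole tuples whose buffer fits into the capacity."""
    return capacity_bytes // tuple_bytes


buffer_sizes: Dict[str, int] = {
    level: tuples_that_fit(capacity) for level, capacity in CAPACITIES.items()
}
```

Under these assumptions, a buffer larger than `buffer_sizes["L2"]` tuples no longer fits into the 256 KiB level, which is where the data-cache side of the tradeoff starts to cost misses.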