Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in Apache Tajo
Query optimization in Apache Tajo
![Page 1: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/1.jpg)
Query Optimization in Apache Tajo
Jihoon Son / Gruter Inc.
![Page 2: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/2.jpg)
About Me
● Jihoon Son (@jihoonson)
  ○ Tajo project co-founder
  ○ Committer and PMC member of Apache Tajo
  ○ Research engineer at Gruter
![Page 3: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/3.jpg)
Outline
● Introduction to Tajo
● Query processing in Tajo
  ○ Query plans in Tajo
  ○ Query processing example
● Query optimization in Tajo
  ○ Introduction to query optimization
  ○ Query optimization techniques in Tajo
![Page 4: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/4.jpg)
What is Tajo?
● Apache Top-level Project
  ○ Data warehouse system
    ■ Efficient processing of analytic queries
    ■ ANSI-SQL compliant
  ○ Scalable and rapid query execution with its own engine
    ■ Distributed query processing
    ■ Fault tolerance
  ○ Beyond SQL-on-Hadoop
    ■ Supports various types of storage
      ● HDFS, S3, HBase, RDBMS, ...
![Page 5: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/5.jpg)
Highlighted Features
● Supports long-running batch queries as well as interactive ad-hoc queries
  ○ Fast query processing
    ■ Optimized scan performance
      ● 120 MB/sec per physical disk (SATA)
  ○ Reliability
    ■ Fault tolerance
    ■ No single point of failure with HA support
![Page 6: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/6.jpg)
Highlighted Features
● Support for various kinds of data sources
  ○ HDFS, Amazon S3, Google Cloud Storage, HBase, RDBMS, ...
● Mature SQL support
  ○ Various kinds of joins
  ○ Window functions
  ○ Cost-based query optimization
● Integration with other systems
  ○ Notebooks like Zeppelin
  ○ BI tools
![Page 7: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/7.jpg)
Recent Release: 0.11
● Feature highlights
  ○ Query federation
  ○ JDBC-based storage support
  ○ Self-describing data formats support
  ○ Multi-query support
  ○ More stable and efficient join execution
  ○ Index support
  ○ Python UDF/UDAF support
![Page 8: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/8.jpg)
Architecture Overview
[Figure: Tajo architecture overview]
● Clients (JDBC client, TSQL, WebUI, REST API) submit a query to the Tajo Master
● The Tajo Master runs the Catalog Server, which manages metadata stored in a DBMS or HCatalog
● The Tajo Master allocates the query to a Query Master on one of the Tajo Workers
● Each Tajo Worker runs a Query Master, a Query Executor, and a Storage Service on top of the storage
● The Query Master sends tasks to the Query Executors and monitors them
![Page 9: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/9.jpg)
Query Execution Steps
● Initializing a query execution
  ① Submit a query: the Tajo Client submits the query to the Tajo Master
  ② Assign a query: the Tajo Master assigns the query to a Query Master on one of the Tajo Workers
  ③ Build a query execution plan: the Query Master builds the plan
![Page 10: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/10.jpg)
Query Execution Steps
● Executing a query
  ④ Send tasks & monitor: the Query Master sends tasks to the Query Executors of the Tajo Workers and monitors them
  ⑤ Read and process data: each Query Executor reads data from storage through its Storage Service and processes it
  ⑥ Send status and progress: the Query Executors report status and progress to the Query Master
![Page 11: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/11.jpg)
Query Execution Steps
● Finalizing the query execution
  ⑦ Store the result on storage
  ⑧ Notify that query execution is completed
  ⑨ Send the result location to the Tajo Client
  ⑩ Read the result: the Tajo Client reads the result from storage
![Page 12: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/12.jpg)
Query Processing in Tajo
![Page 13: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/13.jpg)
Query Execution Plan
● Given a user query, a query execution plan is an ordered set of steps to execute the query
  ○ Example
    ■ Read data from storage, then join on some join keys, and finally aggregate on some grouping keys
● In Tajo, there are three kinds of query plans
  ○ The query master generates a logical query plan and a distributed query plan
  ○ The query executor of each Tajo worker generates a local query plan
![Page 14: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/14.jpg)
Query Planning Steps in Tajo
● In the Query Master: SQL → SQL Analyzer → Algebraic Expression → Logical Planner → Logical Query Plan → Global Planner → Distributed Query Plan
● The distributed query plan is distributed to Tajo workers
● In the Query Executor: Physical Planner → Local Query Plan
![Page 15: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/15.jpg)
Logical Query Plan
● A tree of relational algebra operators
● Example
< SQL >
SELECT item.brand, sum(price) FROM sales, item WHERE sales.item_key = item.item_key GROUP BY item.brand;
< Logical query plan >
Group by (key: brand, func: sum(price))
└─ Join (key: item_key)
   ├─ Scan on item
   └─ Scan on sales
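To make the tree concrete, the plan above can be evaluated bottom-up over a few in-memory rows. This is a minimal Python sketch, not Tajo's planner (Tajo itself is written in Java). The node classes `Scan`, `Join`, and `GroupBy` and the sample rows are hypothetical. The join uses nested loops only for readability.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List

Row = Dict[str, Any]

@dataclass
class Scan:
    """Leaf operator: returns every row of an in-memory table."""
    table: List[Row]

    def execute(self) -> List[Row]:
        return list(self.table)

@dataclass
class Join:
    """Inner equi-join on one key column (nested loops, for clarity only)."""
    left: Any
    right: Any
    key: str

    def execute(self) -> List[Row]:
        right_rows = self.right.execute()
        return [{**lrow, **rrow}
                for lrow in self.left.execute()
                for rrow in right_rows
                if lrow[self.key] == rrow[self.key]]

@dataclass
class GroupBy:
    """Groups on one column and computes sum() of another."""
    child: Any
    key: str
    sum_col: str

    def execute(self) -> List[Row]:
        sums: Dict[Any, int] = defaultdict(int)
        for row in self.child.execute():
            sums[row[self.key]] += row[self.sum_col]
        return [{self.key: k, f"sum({self.sum_col})": v} for k, v in sums.items()]

# Hypothetical sample data for the example query.
sales = [{"item_key": 1, "price": 10}, {"item_key": 2, "price": 5}, {"item_key": 1, "price": 7}]
item = [{"item_key": 1, "brand": "A"}, {"item_key": 2, "brand": "B"}]

# Group by (brand, sum(price)) over Join (item_key) over Scan on item / Scan on sales.
plan = GroupBy(Join(Scan(item), Scan(sales), key="item_key"), key="brand", sum_col="price")
print(sorted(plan.execute(), key=lambda r: r["brand"]))
# [{'brand': 'A', 'sum(price)': 17}, {'brand': 'B', 'sum(price)': 5}]
```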
![Page 16: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/16.jpg)
Distributed Query Plan
● A plan with additional annotations for distributed execution
  ○ Data exchange (shuffle) keys, methods, ...
< Distributed query plan >
Group by (key: brand, func: sum(price))
└─ Range shuffle with brand
   └─ Join (key: item_key)
      ├─ Hash shuffle with item_key
      │  └─ Scan on item
      └─ Hash shuffle with item_key
         └─ Scan on sales
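The two shuffle methods in this plan differ only in how a row's key chooses its destination partition. Below is a minimal Python sketch. It assumes a fixed partition count for the hash shuffle and a sorted list of split points for the range shuffle. The `boundaries` argument is hypothetical, since how Tajo derives range boundaries is not covered here.

```python
import bisect
import zlib
from typing import Callable, Dict, List, Sequence

def hash_partition(key: str, num_partitions: int) -> int:
    """Hash shuffle: rows with equal keys always go to the same partition."""
    # zlib.crc32 is stable across processes, unlike Python's salted hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def range_partition(key: str, boundaries: Sequence[str]) -> int:
    """Range shuffle: with sorted boundaries b0 < b1 < ..., partition i holds keys
    in (b[i-1], b[i]], so k boundaries give k + 1 partitions ordered by key."""
    return bisect.bisect_left(boundaries, key)

def shuffle(keys: List[str], assign: Callable[[str], int]) -> Dict[int, List[str]]:
    """Route each key to the partition chosen by `assign`."""
    parts: Dict[int, List[str]] = {}
    for k in keys:
        parts.setdefault(assign(k), []).append(k)
    return parts

brands = ["apple", "kiwi", "zebra", "melon", "apple"]
by_hash = shuffle(brands, lambda b: hash_partition(b, 4))
by_range = shuffle(brands, lambda b: range_partition(b, ["f", "p"]))
print(by_range)  # {0: ['apple', 'apple'], 1: ['kiwi', 'melon'], 2: ['zebra']}
```

Range partitions are ordered by key, so concatenating their outputs in partition order yields globally key-ordered output. A hash shuffle gives no such ordering, but it needs no boundary information.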
![Page 17: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/17.jpg)
Local Query Plan
● A plan with additional annotations for local execution
  ○ In-memory algorithm, disk-based algorithm, …
< Local query plan >
Group by (key: brand, func: sum(price)), executed as hash aggregation
└─ Range shuffle with brand
   └─ Join (key: item_key), executed as sort-merge join
      ├─ Hash shuffle with item_key
      │  └─ Scan on item
      └─ Hash shuffle with item_key
         └─ Scan on sales
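Both local algorithms named in this plan are textbook operators. The following Python sketch shows them over in-memory lists of `(key, value)` tuples. It is an illustration, not Tajo's physical operators, and it omits the disk-based variants a real executor needs when data does not fit in memory.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

def sort_merge_join(left: List[Tuple], right: List[Tuple]) -> List[Tuple]:
    """Inner join of (key, value) tuples: sort both inputs on the key, then merge."""
    left = sorted(left, key=lambda t: t[0])
    right = sorted(right, key=lambda t: t[0])
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left tuple with this key against that whole run.
            while i < len(left) and left[i][0] == lk:
                out.extend((lk, left[i][1], r[1]) for r in right[j:j_end])
                i += 1
            j = j_end
    return out

def hash_aggregate(rows: Iterable[Tuple[Any, int]]) -> Dict[Any, int]:
    """SUM(value) GROUP BY key using one in-memory hash table."""
    sums: Dict[Any, int] = defaultdict(int)
    for key, value in rows:
        sums[key] += value
    return dict(sums)

sales = [(1, 10), (2, 5), (1, 7)]  # (item_key, price), made-up rows
item = [(1, "A"), (2, "B")]        # (item_key, brand), made-up rows
joined = sort_merge_join(sales, item)  # [(1, 10, 'A'), (1, 7, 'A'), (2, 5, 'B')]
print(hash_aggregate((brand, price) for _, price, brand in joined))  # {'A': 17, 'B': 5}
```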
![Page 18: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/18.jpg)
Query Processing in Tajo
● A query is executed as multiple stages, run one after another
  ○ A stage is the minimum unit of execution and contains at least one operator
● Each stage is processed in parallel by multiple query executors on Tajo workers
[Figure: Stage 1 scans item and sales; Stage 2 joins them on item_key]
![Page 19: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/19.jpg)
Query Processing Example
● SQL
  SELECT item.brand, sum(price) FROM sales, item WHERE sales.item_key = item.item_key GROUP BY item.brand;
● Logical query plan
  Group by (key: brand, func: sum(price))
  └─ Join (key: item_key)
     ├─ Scan on item
     └─ Scan on sales
![Page 20: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/20.jpg)
Query Processing Example
● Distributed query plan, derived from the logical query plan
  Stage 3: Group by (key: brand, func: sum(price))
     ↑ Range shuffle with brand
  Stage 2: Join (key: item_key)
     ↑ Hash shuffle with item_key (from both scans)
  Stage 1: Scan on item, Scan on sales
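The stage boundaries fall exactly at the shuffle edges of the distributed plan. Below is a minimal Python sketch of that cut. It assumes a hypothetical `Node` type whose `shuffle` field names the data exchange on the edge to its parent; `None` means the child is pipelined into the same stage. Stages are numbered by level, so a stage always runs after the stages it reads from. Stages on the same level, like the two scans here, do not depend on each other.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    shuffle: Optional[str] = None  # data exchange on the edge to the parent, if any

def split_into_stages(root: Node) -> Dict[int, List[str]]:
    """Cut the plan at shuffle edges; level 1 holds the leaf stages."""
    stages: Dict[int, List[str]] = defaultdict(list)

    def build_stage(top: Node) -> int:
        ops: List[str] = []
        level = collect(top, ops)
        stages[level].extend(ops)
        return level

    def collect(node: Node, ops: List[str]) -> int:
        ops.append(node.name)
        level = 1
        for child in node.children:
            if child.shuffle is not None:
                # A shuffle ends the child's pipeline, so it runs in an earlier stage.
                level = max(level, build_stage(child) + 1)
            else:
                # No shuffle: the child is pipelined into the same stage.
                level = max(level, collect(child, ops))
        return level

    build_stage(root)
    return dict(stages)

plan = Node("Group by", [
    Node("Join", [
        Node("Scan on item", shuffle="hash(item_key)"),
        Node("Scan on sales", shuffle="hash(item_key)"),
    ], shuffle="range(brand)"),
])
print(split_into_stages(plan))
# {1: ['Scan on item', 'Scan on sales'], 2: ['Join'], 3: ['Group by']}
```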
![Page 21: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/21.jpg)
Query Processing Example
● Distributed processing, Stage 1: five workers scan the input in parallel, two reading blocks of item and three reading blocks of sales
![Page 22: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/22.jpg)
Query Processing Example
● Distributed processing, Stage 2: the scan output is hash-shuffled on item_key to five workers, which run the join
![Page 23: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/23.jpg)
Query Processing Example
● Distributed processing, Stage 3: the join output is range-shuffled on brand to five workers, which run the group by
![Page 24: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/24.jpg)
Query Optimization in Tajo
![Page 25: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/25.jpg)
Query Optimization
● Most user queries are not written with performance in mind
● The query optimizer attempts to determine the most efficient way to execute a user query
  ○ It considers the possible query plans and chooses the best one
![Page 26: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/26.jpg)
Extreme Example
● Query
  ○ select * from t where name like 'tajo%' order by id;
● Possible plans
  ○ Naive plan: Scan → Sort → Filter
    ■ Tuples are filtered out after the sort
    ■ Large cost for sort
  ○ Better plan: Scan with Filter → Sort
    ■ Tuples are filtered out immediately during the scan
    ■ Small cost for sort
    ■ Fewer operations
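The rewrite from the naive plan to the better one can be sketched as a single rule. This minimal Python sketch assumes a hypothetical tuple-based plan representation and made-up data. The rule swaps a filter with the sort beneath it, which is safe because filtering does not change the relative order of the rows it keeps. A tiny executor counts how many rows reach the sort. Fusing the filter into the scan, as in the better plan, is left out.

```python
from typing import Any, Dict, List, Tuple

Plan = Tuple  # ("scan", rows) | ("filter", predicate, child) | ("sort", key_fn, child)

def push_filter_below_sort(plan: Plan) -> Plan:
    """Rewrite Filter(Sort(x)) into Sort(Filter(x)), recursively."""
    if plan[0] == "scan":
        return plan
    op, arg, child = plan
    if op == "filter" and child[0] == "sort":
        _, sort_key, grandchild = child
        return ("sort", sort_key, push_filter_below_sort(("filter", arg, grandchild)))
    return (op, arg, push_filter_below_sort(child))

def execute(plan: Plan, stats: Dict[str, int]) -> List[Dict[str, Any]]:
    """Evaluate the plan and count how many rows the sort operator had to order."""
    if plan[0] == "scan":
        return list(plan[1])
    op, arg, child = plan
    rows = execute(child, stats)
    if op == "filter":
        return [r for r in rows if arg(r)]
    stats["rows_sorted"] = stats.get("rows_sorted", 0) + len(rows)
    return sorted(rows, key=arg)

def like_tajo(row: Dict[str, Any]) -> bool:  # name LIKE 'tajo%'
    return row["name"].startswith("tajo")

def by_id(row: Dict[str, Any]) -> int:  # ORDER BY id
    return row["id"]

# Hypothetical table t: 100 rows in descending id order, 10 of which match.
t = [{"id": i, "name": ("tajo-%d" if i % 10 == 0 else "other-%d") % i} for i in reversed(range(100))]

naive = ("filter", like_tajo, ("sort", by_id, ("scan", t)))
better = push_filter_below_sort(naive)
naive_stats: Dict[str, int] = {}
better_stats: Dict[str, int] = {}
assert execute(naive, naive_stats) == execute(better, better_stats)
print(naive_stats["rows_sorted"], better_stats["rows_sorted"])  # 100 10
```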
![Page 27: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/27.jpg)
Two Kinds of Query Optimization
● Rule-based optimization
  ○ A set of predefined rules is used to choose a good plan
  ○ Usually, heuristic approaches are used
    ■ Ex) filters should be pushed down as far as possible toward the bottom of the query plan
● Cost-based optimization
  ○ Enumerates possible query plans and chooses the one with the lowest cost
  ○ The cost function plays an important role
● Tajo utilizes both types of optimization
![Page 28: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/28.jpg)
Query Optimization in Tajo
● Difference from traditional query optimization
  ○ Unlike in traditional database systems, pre-collected statistics are not so important
    ■ Data may be added or updated by several systems, including Flume, Kafka, Tajo, …
    ■ Pre-collected statistics can be useful, but are not fully trustworthy
  ○ It is important to optimize query plans with minimal statistics
    ■ Volume of input relations
![Page 29: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/29.jpg)
Query Optimization in Tajo
● Tajo has two different approaches to query optimization
  ○ Static optimization
    ■ Traditional approach
    ■ Optimizes the plan during the query planning phase
  ○ Progressive optimization
    ■ Optimizes the plan based on intermediate statistics while executing the query
      ● A query plan can be optimized without pre-collected statistics
      ● Especially effective for queries that require multi-stage execution
![Page 30: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/30.jpg)
Logical Query Plan Optimization
● Rule-based optimization
  ○ Access path rewrite rule
    ■ Chooses the access path to data
    ■ Index scan has the highest priority if available
  ○ Distributivity rule
    ■ Reduces filters based on distributivity
  ○ Filter pushdown rule
    ■ Pushes filters down as far as possible toward the bottom of the plan
  ○ In-subquery rewrite rule
    ■ Transforms subqueries in 'IN' filters into semi (anti) joins
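As an illustration of the distributivity rule, the sketch below factors out predicates that appear in every branch of an OR, turning `(A AND B) OR (A AND C)` into `A AND (B OR C)`. The shared conjunct then becomes a plain filter that other rules, such as filter pushdown, can move on its own. This minimal Python sketch assumes predicates given as strings in OR-of-ANDs form. Tajo's actual rule operates on its own expression trees.

```python
from typing import List

def factor_common_conjuncts(disjuncts: List[List[str]]) -> str:
    """Apply distributivity to an OR of AND-lists of atomic predicates:
    (A AND B) OR (A AND C)  ->  A AND (B OR C).
    Assumes at least one disjunct and no empty conjunctions."""
    common = set(disjuncts[0]).intersection(*(set(d) for d in disjuncts[1:]))
    shared = [p for p in disjuncts[0] if p in common]  # keep first-seen order
    rests = [[p for p in conj if p not in common] for conj in disjuncts]
    if any(not rest for rest in rests):
        # Absorption: A OR (A AND C) is just A.
        return " AND ".join(shared)
    ored = " OR ".join(
        "(" + " AND ".join(rest) + ")" if len(rest) > 1 else rest[0] for rest in rests
    )
    if not shared:
        return ored  # nothing in common; the filter is unchanged
    return " AND ".join(shared + ["(" + ored + ")"])

print(factor_common_conjuncts([["a = 1", "b = 2"], ["a = 1", "c = 3"]]))
# a = 1 AND (b = 2 OR c = 3)
```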
![Page 31: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/31.jpg)
Logical Query Plan Optimization
● Rule-based optimization (cont'd)
  ○ Projection pushdown rule
    ■ Pushes projections down as far as possible toward the bottom of the plan
● Cost-based optimization
  ○ Join order optimization
    ■ Finds the join order with the lowest cost
    ■ Greedy heuristic: orders relations from small to large
      ● Very effective in a single-machine environment
      ● Needs improvement for parallel computing environments
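The greedy join-order heuristic can be sketched in a few lines. Sort the relations by volume, the statistic this section says Tajo can rely on, and build a left-deep join tree from the smallest to the largest. This minimal Python sketch uses hypothetical row counts. It ignores join predicates, whereas a real planner would also avoid introducing cross products.

```python
from typing import Dict, List

def greedy_join_order(volumes: Dict[str, int]) -> List[str]:
    """Greedy heuristic: join relations from the smallest volume to the largest."""
    return sorted(volumes, key=volumes.get)

def left_deep_plan(order: List[str]) -> str:
    """Render an order as a left-deep join tree: ((r1 JOIN r2) JOIN r3) ..."""
    plan = order[0]
    for rel in order[1:]:
        plan = f"({plan} JOIN {rel})"
    return plan

volumes = {"sales": 1_000_000, "item": 10_000, "store": 100}  # hypothetical row counts
print(left_deep_plan(greedy_join_order(volumes)))
# ((store JOIN item) JOIN sales)
```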
![Page 32: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/32.jpg)
Distributed Query Plan Optimization
● Rule-based optimization
  ○ Two-phase execution of operators
    ■ Operators that require data shuffling, such as aggregation, join, or sort, are executed in two phases
    ■ The first phase computes locally to reduce the amount of shuffled data
    ■ The second phase computes the final result of the operation
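Two-phase aggregation for `SUM ... GROUP BY` can be sketched as follows, assuming each worker holds a list of `(key, value)` rows. In the first phase every worker pre-aggregates locally, so it ships at most one row per distinct key. The partial sums are then hash-shuffled, and the second phase merges them. This is a minimal Python sketch with made-up data and an arbitrary partition count.

```python
import zlib
from collections import Counter
from typing import Dict, List, Tuple

Rows = List[Tuple[str, int]]

def local_sum(rows: Rows) -> Dict[str, int]:
    """SUM(value) GROUP BY key over the rows held by one worker."""
    partial: Counter = Counter()
    for key, value in rows:
        partial[key] += value
    return dict(partial)

def two_phase_sum(workers: List[Rows], num_partitions: int) -> Tuple[Dict[str, int], int]:
    # Phase 1: every worker pre-aggregates locally, emitting one row per distinct key.
    partials = [local_sum(rows) for rows in workers]
    shuffled = sum(len(p) for p in partials)
    # Shuffle: send each partial sum to a partition chosen by hashing its key.
    partitions: List[Rows] = [[] for _ in range(num_partitions)]
    for partial in partials:
        for key, value in partial.items():
            partitions[zlib.crc32(key.encode("utf-8")) % num_partitions].append((key, value))
    # Phase 2: each partition merges the partial sums it received.
    # Partitions hold disjoint keys, so their results can simply be combined.
    result: Dict[str, int] = {}
    for part in partitions:
        result.update(local_sum(part))
    return result, shuffled

workers = [[("A", 10), ("A", 7), ("B", 5)], [("A", 1), ("B", 2), ("B", 3)]]
totals, shuffled = two_phase_sum(workers, num_partitions=2)
print(sorted(totals.items()), shuffled)  # [('A', 18), ('B', 10)] 4
```

Here 4 partial rows are shuffled instead of the 6 input rows, and the saving grows with the number of rows per distinct key. This works for SUM because partial sums can be added. An aggregate like AVG would need to ship (sum, count) pairs instead.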
![Page 33: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/33.jpg)
Two-phase Execution Example
● Logical query plan
  Sort
  └─ Group by
     └─ Scan
● Distributed query plan
  Stage 3: Sort
     ↑ shuffle
  Stage 2: Group by, then Local sort
     ↑ shuffle
  Stage 1: Scan, then Local group by
![Page 34: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/34.jpg)
Distributed Query Plan Optimization
● Distributed join algorithm selection
  ○ Two representative distributed join algorithms
    ■ In general, a join cannot be performed within a single stage in distributed systems
      ● Tuples with the same join key may be distributed over cluster nodes
    ■ Repartition join
      ● Both input relations are shuffled on the join key columns
    ■ Broadcast join
      ● Small relations are broadcast to every node before the join
![Page 35: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/35.jpg)
Example of Repartition Join
● select … from employee e, department d where e.DeptName = d.DeptName
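Below is a minimal Python sketch of a repartition join for the query above. It assumes in-memory rows, a hypothetical partition count `n`, and made-up `employee` and `department` rows. Both inputs are hash-shuffled on `DeptName`, and each partition pair is then joined independently, here with a local hash join.

```python
import zlib
from typing import Any, Dict, List

Row = Dict[str, Any]

def hash_shuffle(rows: List[Row], col: str, n: int) -> List[List[Row]]:
    """Split rows into n partitions by hashing the join column."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        parts[zlib.crc32(str(row[col]).encode("utf-8")) % n].append(row)
    return parts

def repartition_join(left: List[Row], right: List[Row], col: str, n: int) -> List[Row]:
    out: List[Row] = []
    # Both sides use the same partition function, so matching tuples meet in the same partition.
    for left_part, right_part in zip(hash_shuffle(left, col, n), hash_shuffle(right, col, n)):
        # Each partition pair would be joined on a different node; here a local hash join.
        table: Dict[Any, List[Row]] = {}
        for rrow in right_part:
            table.setdefault(rrow[col], []).append(rrow)
        for lrow in left_part:
            out.extend({**lrow, **rrow} for rrow in table.get(lrow[col], []))
    return out

employee = [{"name": "Kim", "DeptName": "Eng"}, {"name": "Lee", "DeptName": "Sales"},
            {"name": "Park", "DeptName": "Eng"}]
department = [{"DeptName": "Eng", "floor": 3}, {"DeptName": "Sales", "floor": 1}]
result = repartition_join(employee, department, "DeptName", n=4)
print(sorted((r["name"], r["floor"]) for r in result))  # [('Kim', 3), ('Lee', 1), ('Park', 3)]
```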
![Page 36: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/36.jpg)
Example of Broadcast Join
● select … from employee e, department d where e.DeptName = d.DeptName
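For the same query, a broadcast join leaves the large relation where it is stored and ships the small one to every node. Below is a minimal Python sketch. It assumes `employee` is already split into storage blocks and `department` is small enough to copy. The data is made up.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

def broadcast_join(large_partitions: List[List[Row]], small: List[Row], col: str) -> List[Row]:
    """Join a partitioned large relation with a small one without shuffling the large side."""
    # The small relation becomes a hash table once and is conceptually copied to every node.
    table: Dict[Any, List[Row]] = {}
    for srow in small:
        table.setdefault(srow[col], []).append(srow)
    out: List[Row] = []
    # Each partition of the large relation is processed where it already resides.
    for partition in large_partitions:
        for lrow in partition:
            out.extend({**lrow, **srow} for srow in table.get(lrow[col], []))
    return out

employee_blocks = [
    [{"name": "Kim", "DeptName": "Eng"}, {"name": "Lee", "DeptName": "Sales"}],
    [{"name": "Park", "DeptName": "Eng"}],
]
department = [{"DeptName": "Eng", "floor": 3}, {"DeptName": "Sales", "floor": 1}]
result = broadcast_join(employee_blocks, department, "DeptName")
print([(r["name"], r["floor"]) for r in result])  # [('Kim', 3), ('Lee', 1), ('Park', 3)]
```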
![Page 37: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/37.jpg)
Distributed Join Algorithm Selection
● Repartition join vs. broadcast join
  ○ Given a set of joins, some can be executed with broadcast join while the rest are executed with repartition join
● Which joins will be executed with broadcast join?
  ○ Greedy heuristic: broadcast join is used as much as possible
    ■ The size of each broadcast input relation should be smaller than a predefined threshold
    ■ The total volume of broadcast relations should not exceed a predefined threshold
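The greedy selection can be sketched like this. The sketch assumes the volume of every join input is known. Two thresholds, `per_relation_limit` and `total_limit`, stand in for the predefined limits; their real Tajo configuration names and defaults are not given here. Taking relations from smallest to largest broadcasts as many as possible under the total limit, and the remaining relations go through repartition joins. Volumes in the example are made up.

```python
from typing import Dict, List, Tuple

def choose_broadcast(volumes: Dict[str, int], per_relation_limit: int,
                     total_limit: int) -> Tuple[List[str], List[str]]:
    """Greedily mark relations for broadcast join, smallest first.
    Returns (broadcast, not_broadcast)."""
    ordered = sorted(volumes, key=volumes.get)
    # Assumption of this sketch: the largest relation is always the non-broadcast side.
    candidates, largest = ordered[:-1], ordered[-1]
    broadcast: List[str] = []
    rest: List[str] = []
    total = 0
    for name in candidates:
        size = volumes[name]
        if size < per_relation_limit and total + size <= total_limit:
            broadcast.append(name)
            total += size
        else:
            rest.append(name)
    return broadcast, rest + [largest]

volumes = {"lineitem": 76_000, "orders": 17_000, "nation": 2, "region": 1}  # made-up MB
print(choose_broadcast(volumes, per_relation_limit=10, total_limit=20))
# (['region', 'nation'], ['orders', 'lineitem'])
```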
![Page 38: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/38.jpg)
Distributed Join Algorithm Selection Example
● select … from lineitem, nation, region …
![Page 39: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/39.jpg)
Local Query Plan Optimization
● Selecting the best algorithm based on the current resource status
  ○ Aggregation
    ■ Hash aggregation, sort aggregation
  ○ Join
    ■ Hash join, sort-merge join
● For sort, hash sort is basically used, spilling data to disk when it doesn't fit into memory
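Below is a minimal Python sketch of choosing an aggregation algorithm from the resource status. It assumes the planner can estimate the number of groups and knows how many groups fit in the task's memory. Both parameters, and the rule itself, are illustrative rather than Tajo's actual logic. Hash aggregation needs the whole hash table in memory. Sort aggregation needs only one group at a time once the input is sorted, which is why it is the fallback when memory is short.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, List, Tuple

Rows = List[Tuple[str, int]]

def hash_aggregation(rows: Rows) -> Rows:
    """Needs memory proportional to the number of distinct groups."""
    sums: Dict[str, int] = {}
    for key, value in rows:
        sums[key] = sums.get(key, 0) + value
    return sorted(sums.items())

def sort_aggregation(rows: Rows) -> Rows:
    """Sorts on the key, then sums each run of equal keys, one group at a time.
    (This sketch sorts in memory; a real engine would use an external, disk-spilling sort.)"""
    ordered = sorted(rows, key=itemgetter(0))
    return [(key, sum(v for _, v in run)) for key, run in groupby(ordered, key=itemgetter(0))]

def aggregate(rows: Rows, estimated_groups: int, max_groups_in_memory: int) -> Tuple[str, Rows]:
    """Pick the algorithm from the current memory budget (illustrative rule)."""
    if estimated_groups <= max_groups_in_memory:
        return "hash", hash_aggregation(rows)
    return "sort", sort_aggregation(rows)

rows = [("b", 2), ("a", 1), ("b", 3)]
print(aggregate(rows, estimated_groups=2, max_groups_in_memory=1000))  # ('hash', [('a', 1), ('b', 5)])
print(aggregate(rows, estimated_groups=2, max_groups_in_memory=1))     # ('sort', [('a', 1), ('b', 5)])
```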
![Page 40: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/40.jpg)
Progressive Optimization
● Data repartition
  ○ Some operators, like join or aggregation, require shuffling data on keys
  ○ The number of result partitions of a shuffle should be decided carefully
    ■ The number of partitions is related to the number of tasks in the next stage
      ● At the beginning of each stage, the number of partitions is decided based on the input size
![Page 41: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/41.jpg)
Progressive Optimization Example
● Plan: Stage 1 (Scan on item, local group by) → Stage 2 (Group by, local sort) → Stage 3 (Sort)
● If the default task size is 1GB,
  ○ Stage 1 reads 100GB from item, so its output is shuffled into 100 partitions and Stage 2 runs 100 tasks
  ○ The group by in Stage 2 produces 50GB, so its output is shuffled into 50 partitions and Stage 3 runs 50 tasks
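The arithmetic behind the example can be written down directly. This minimal Python sketch assumes the partition count is the input volume divided by the task size, rounded up. That formula is an assumption that matches the numbers in the example (100GB → 100, 50GB → 50); it is not taken from Tajo's code.

```python
GB = 1024 ** 3

def num_partitions(input_bytes: int, task_size_bytes: int = 1 * GB) -> int:
    """Number of shuffle partitions (and so tasks of the next stage): one per
    task-sized chunk of input, rounded up, and at least one."""
    return max(1, -(-input_bytes // task_size_bytes))  # integer ceiling division

# Progressive optimization: each value is the volume observed at runtime,
# just before the next stage starts, not a pre-collected statistic.
observed = {"Stage 1 -> Stage 2": 100 * GB, "Stage 2 -> Stage 3": 50 * GB}
for edge, volume in observed.items():
    print(edge, num_partitions(volume))
# Stage 1 -> Stage 2 100
# Stage 2 -> Stage 3 50
```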
![Page 42: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/42.jpg)
Future Work
● Adding more optimization methods
● Improving cost functions for more effective cost-based optimization
● Adding new approaches to progressive optimization
  ○ Runtime query rewriting
  ○ Integration with genetic algorithms
  ○ …
![Page 43: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/43.jpg)
Get Involved!
● General
  ○ http://tajo.apache.org
● Getting Started
  ○ http://tajo.apache.org/docs/current/getting_started.html
● Downloads
  ○ http://tajo.apache.org/downloads.html
● Jira – Issue Tracker
  ○ https://issues.apache.org/jira/browse/TAJO
● Join the mailing list
  ○ [email protected]
  ○ [email protected]
![Page 44: Query optimization in Apache Tajo](https://reader034.fdocuments.us/reader034/viewer/2022042520/58865c451a28ab26598b60f3/html5/thumbnails/44.jpg)
Thanks!