Rewriting Procedures for Batched Bindings
description
Transcript of Rewriting Procedures for Batched Bindings
Rewriting Procedures for Batched Bindings
Ravindra Guravannavar and S. SudarshanIndian Institute of Technology, Bombay
Appeared in VLDB 2008
2
Motivation
Nested subqueries (with correlated evaluation) Queries embedded in user-defined functions
(UDFs) that are invoked from other queries Queries/updates in stored procedures that are
invoked repeatedly (e.g. batch jobs) Queries/updates invoked inside loops in
application programs
Queries/updates are often invoked repeatedly. E.g.:
3
Example: A Nested Query
List the orders of high worth (> 10K)
SELECT O.orderid, O.custid FROM orders OWHERE 10000 < (SELECT sum(lineprice) FROM lineitem L WHERE L.orderid=O.orderid);
Iterative Execution:For each record in the result of the outer query block
- bind the parameters for the nested sub-query - evaluate the nested sub-query - process the results (check predicate, output)
4
Optimizing Repeated Invocations Iterative execution of queries often performs
poorly Redundant I/O Random I/O and poor buffer effects Network round-trip delays
Decorrelation is widely used for optimizing nested queries
5
Query Decorrelation
Original QuerySELECT O.orderid, O.custid FROM orders OWHERE 10000 < (SELECT sum(lineprice) FROM lineitem L WHERE L.orderid=O.orderid);
After Decorrelation (Most systems do this automatically)SELECT O.orderid, O.custid FROM orders O, lineitem L WHERE O.orderid=L.orderid GROUP BY O.orderid, O.custid HAVING sum(L.lineprice) > 10000;
6
Limitations of Decorrelation
Tricky at times… COUNT aggregate Non-equality correlation predicates
The solutions may not produce the best plan
Decorrelation techniques are not applicable for: Nested invocation of procedures with complex
logic and embedded query invocations Iterative invocation of queries/updates from
application code
7
Example: UDF Invoked from a QuerySELECT * FROM category WHERE count_items(category-id) > 50;
// Count the items in a given category and its sub-categories
int count_items(int categoryId) {…while(…) {
……SELECT count(item-id) INTO icount FROM item WHERE category-id = :curcat; …
}}
Procedural logic with embedded queries
8
Key Idea: Parameter Batching Repeated invocation of an operation is replaced
by a single invocation of its batched form Batched form: Works on a set of parameters
Benefits Choice of efficient set-oriented plans
Repeated selection → Join Efficient integrity checks Efficient disk access (sort RIDs before fetch)
Reduced network round-trip delay
9
Batched Forms of Basic Operations
Insert insert into <table1> select … from <table2> … Bulk load (SQLServer: bcp, Oracle: sqlldr, DB2: load)
Update update <table1> from <table2> where … (equivalent to SQL:2003 merge statement)
Queries Make use of join or outer-join (seen in decorrelation)
10
SQL Merge
empid name grants
S101 Ramesh 8000
S204 Gopal 4000
S305 Veena 3500
S602 Mayur 2000
GRANTMASTER
empid grantsS204 5000
S602 2600
GRANTLOAD
merge into GRANTMASTER GM using GRANTLOAD GLon GM.empid=GL.empidwhen matched then update set GM.grants=GL.grants;
Notation: Mc1=c1’,c2=c2’,… cn=cn’(r, s)
11
Effect of Batch Size on Inserts
Bulk Load: 1.3 min
12
Iterative and Set-Oriented Updates on the Server Side
TPC-H PARTSUPP (800,000 records), Clustering index on (partkey, suppkey)
Iterative update of all the records using T-SQL script (each update has an index lookup plan)
Single commit at the end of all updates
Takes 1 minute
Same update processed as a merge (update … from …)
Takes 15 seconds
13
The Challenge
Given a procedure, how to obtain its
batched form?
Possible to manually rewrite, but time
consuming and error-prone.
14
Our Work Automatic generation of batched forms
of UDFs/stored procedures using rewrite rules
Automatic rewrite of programs to replace looping with batched invocation
15
Batched Forms Batched form qb of a pure function q
Returns results for a set of parameters Result in the form {(parameter, result)} For queries: standard techniques for creating
batched forms (from work on decorrelation)
Example:Original query:SELECT item-id FROM item WHERE category-id=?
Batched form:SELECT pb.category-id, item-id FROM param-batch pb LEFT OUTER JOIN item ON pb.category-id = item.category-id;
16
Batch Safe Operations Batched forms – no guaranteed order of
parameter processing Can be a problem for operations having
side-effects
Batch-Safe operations All operations without side effects Also a few operations with side effects
E.g.: INSERT on a table with no constraints Operations inside unordered loops (e.g., cursor loops with
no order-by)
17
Generating Batched Forms of Procedures
Step 1: Create trivial batched form. Transform: procedure p(r) { body of p} To procedure p_batched(pb) { for each record r in pb {
< body of p> < collect the return value >
} return the collected results paired with corrsp.
params; }Step 2: Optimize query invocations in the trivial
batched form
18
Rule 1A: Rewriting a Simple Set Iteration Loop
where q is any batch-safe operation with qb as its batched form
Rule 1B Handles return values
19
Rule 1C: Batching Conditional Statements
Let s = // Batched invocation
where s
// Merge the results
Operation to Batch
Return values
Condition for Invocation
20
Rule 2: Splitting a Loop
while (p) {
ss1;
sq;
ss2;
}
Table(T) t;
while(p) {
ss1 modified to save local variables as a tuple in t
}
Collect theparameters
for each r in t {
sq modified to use attributes of r;
}
Can apply Rule 1A-1C and batch.
for each r in t {
ss2 modified to use attributes of r;
}
Process the results
* Conditions Apply
21
Rule 2: Pre-conditions The conditions make use of the data dependence graph Data Dependence Graph
Nodes: program statements Edges: dependencies between statements that read/write same
location Types of Dependencies
Flow (Write Read), Anti (Read Write) and Output (Write Write) Loop-carried flow/anti/output
Dependencies across iterations
Pre-conditions for Rule-2 No loop-carried flow/output dependencies cross the points at
which the loop is split No loop-carried dependencies through external data (e.g., DB)
22
Need for Reordering Statements
(s1) while (category != null) {(s2) item-count = q1(category);(s3) sum = sum + item-count;(s4) category = getParent(category);}
Flow DependenceAnti DependenceOutput Dependence
Loop-Carried
Control Dependence
Data Dependencies
23
while (category != null) { int item-count = q1(category); // Query to batch sum = sum + item-count; category = getParent(category);}
Splitting made possible after reorderingwhile (category != null) { int temp = category; category = getParent(category); int item-count = q1(temp); sum = sum + item-count;}
Reordering Statements to Enable Rule 2
24
Cycles of Flow Dependencies
25
Rule 4: Control Dependencieswhile (…) {
item = …; qty = …; brcode = …;
if (brcode == 58) { brcode = 1; q(item, qty, brcode); }
}Remember the branching decision in a boolean variable
while (…) { item = …; qty = …; brcode = …;
boolean cv = (brcode == 58);cv? brcode = 1;cv? q(item, qty, brcode);
}
26
Cascading of Rules
Table(…) t;while (…) {
r.item = …; r.qty = …; r.brcode = …; r.cv = (r.brcode == 58); r.cv? r.brcode = 1;
t.addRecord(r);}for each r in t {
r.cv? q(r.item, r.qty, r.brcode);}
qb(item,qty,brcode(cv=true(t))
Rule 1C
After applying Rule 2
27
Batching Across Multiple Levels
while(…) { ….
while(…) { ...
q(v1, v2, … vn); … }
…}
while(…) { …. Table t (…);
while(…) { ... }
qb(t); …}
Batch q w.r.t inner loop
28
Parameter Batches as Nested Tables
cust-id cust-class orders
C101 Gold
C180 Regular
ord-id date
1011 10-12-08
1012 12-01-09
ord-id date
1801 10-12-08
1802 20-12-08
1803 08-01-09
29
Nest and Unnest Operations
μc(T) : Unnest T w.r.t. table-valued column c
vS s(T) : Group T on columns other than S and nest the columns in S under the name s
30
Rule 6: Unnesting Nested Batches
31
Implementation and Evaluation Conceptually the techniques can be used with
any language (PL/SQL, Java, C#-LINQ) We implemented for Java using the SOOT framework for
program analysis
Evaluation No benchmarks for procedural SQL Scenarios from three real-world applications, which faced
performance problems Data Sets: TPC-H and synthetic
32
Application 1: ESOP Management AppProcess records from a file in custom format. Repeatedly called a stored procedure with mix of queries, updates and business logic.
- Validate inputs - Lookup existing record - Update or Insert
Rewritten program used outer-join and merge.
33
Application 2: Category TraversalFind the maximum size of any part in a given category and its sub-categories.
Clustered IndexCATEGORY (category-id)
Secondary IndexPART (category-id)
Original ProgramRepeatedly executed a query that performed selection followed by grouping.Rewritten ProgramGroup-By followed by Join
34
Application 3: Value Range Expansion
Log scale
Expand records of
the form:
(start-num, end-num,
issued-to, …)
Performed repeated
inserts.
Rewritten program
Pulled the insert stmt out of
the loop and replaced it
with batched insert.
~75% improvement
~10% overhead
35
Related Work Query unnesting
E.g. Kim [TODS82], Dayal [VLDB87], Seshadri et al. [ICDE96], Galindo Legaria et al. [SIGMOD01]
We extend the benefits of unnesting to procedural nested blocks
Graefe [BTW03] highlights the importance of batching in nested iteration plans (a motivation for our work)
Optimizing set iteration loops in database programming languages - Lieuwen and DeWitt [SIGMOD 92] Also perform program rewriting, but Do not address batching of queries/procedure calls within the loop Limited language constructs - No WHILE loops, IF-THEN-ELSE
Parallelizing compilers Kennedy[90], Padua[95] We borrow and extend the techniques
36
Conclusion Automatic rewrite of programs for set-orientation
is possible Combining query rewrite with program analysis is the key
Our experiments on real-world scenarios show significant benefits due to batching
Future Work Cost-based selection of operations to batch Handling exceptions Automatically deciding whether an operation is batch-
safe Implementing rewriting for PL/SQL
37
Questions?