Rewriting Procedures for Batched Bindings

37
Rewriting Procedures for Batched Bindings Ravindra Guravannavar and S. Sudarshan Indian Institute of Technology, Bombay Appeared in VLDB 2008

description

Rewriting Procedures for Batched Bindings. Ravindra Guravannavar and S. Sudarshan Indian Institute of Technology, Bombay. Appeared in VLDB 2008. Motivation. Queries/updates are often invoked repeatedly. E.g.:. Nested subqueries (with correlated evaluation) - PowerPoint PPT Presentation

Transcript of Rewriting Procedures for Batched Bindings

Page 1: Rewriting Procedures for Batched Bindings

Rewriting Procedures for Batched Bindings

Ravindra Guravannavar and S. SudarshanIndian Institute of Technology, Bombay

Appeared in VLDB 2008

Page 2: Rewriting Procedures for Batched Bindings

2

Motivation

Nested subqueries (with correlated evaluation) Queries embedded in user-defined functions

(UDFs) that are invoked from other queries Queries/updates in stored procedures that are

invoked repeatedly (e.g. batch jobs) Queries/updates invoked inside loops in

application programs

Queries/updates are often invoked repeatedly. E.g.:

Page 3: Rewriting Procedures for Batched Bindings

3

Example: A Nested Query

List the orders of high worth (> 10K)

SELECT O.orderid, O.custid FROM orders OWHERE 10000 < (SELECT sum(lineprice) FROM lineitem L WHERE L.orderid=O.orderid);

Iterative Execution:For each record in the result of the outer query block

- bind the parameters for the nested sub-query - evaluate the nested sub-query - process the results (check predicate, output)

Page 4: Rewriting Procedures for Batched Bindings

4

Optimizing Repeated Invocations Iterative execution of queries often performs

poorly Redundant I/O Random I/O and poor buffer effects Network round-trip delays

Decorrelation is widely used for optimizing nested queries

Page 5: Rewriting Procedures for Batched Bindings

5

Query Decorrelation

Original QuerySELECT O.orderid, O.custid FROM orders OWHERE 10000 < (SELECT sum(lineprice) FROM lineitem L WHERE L.orderid=O.orderid);

After Decorrelation (Most systems do this automatically)SELECT O.orderid, O.custid FROM orders O, lineitem L WHERE O.orderid=L.orderid GROUP BY O.orderid, O.custid HAVING sum(L.lineprice) > 10000;

Page 6: Rewriting Procedures for Batched Bindings

6

Limitations of Decorrelation

Tricky at times… COUNT aggregate Non-equality correlation predicates

The solutions may not produce the best plan

Decorrelation techniques are not applicable for: Nested invocation of procedures with complex

logic and embedded query invocations Iterative invocation of queries/updates from

application code

Page 7: Rewriting Procedures for Batched Bindings

7

Example: UDF Invoked from a QuerySELECT * FROM category WHERE count_items(category-id) > 50;

// Count the items in a given category and its sub-categories

int count_items(int categoryId) {…while(…) {

……SELECT count(item-id) INTO icount FROM item WHERE category-id = :curcat; …

}}

Procedural logic with embedded queries

Page 8: Rewriting Procedures for Batched Bindings

8

Key Idea: Parameter Batching Repeated invocation of an operation is replaced

by a single invocation of its batched form Batched form: Works on a set of parameters

Benefits Choice of efficient set-oriented plans

Repeated selection → Join Efficient integrity checks Efficient disk access (sort RIDs before fetch)

Reduced network round-trip delay

Page 9: Rewriting Procedures for Batched Bindings

9

Batched Forms of Basic Operations

Insert insert into <table1> select … from <table2> … Bulk load (SQLServer: bcp, Oracle: sqlldr, DB2: load)

Update update <table1> from <table2> where … (equivalent to SQL:2003 merge statement)

Queries Make use of join or outer-join (seen in decorrelation)

Page 10: Rewriting Procedures for Batched Bindings

10

SQL Merge

empid name grants

S101 Ramesh 8000

S204 Gopal 4000

S305 Veena 3500

S602 Mayur 2000

GRANTMASTER

empid grantsS204 5000

S602 2600

GRANTLOAD

merge into GRANTMASTER GM using GRANTLOAD GLon GM.empid=GL.empidwhen matched then update set GM.grants=GL.grants;

Notation: Mc1=c1’,c2=c2’,… cn=cn’(r, s)

Page 11: Rewriting Procedures for Batched Bindings

11

Effect of Batch Size on Inserts

Bulk Load: 1.3 min

Page 12: Rewriting Procedures for Batched Bindings

12

Iterative and Set-Oriented Updates on the Server Side

TPC-H PARTSUPP (800,000 records), Clustering index on (partkey, suppkey)

Iterative update of all the records using T-SQL script (each update has an index lookup plan)

Single commit at the end of all updates

Takes 1 minute

Same update processed as a merge (update … from …)

Takes 15 seconds

Page 13: Rewriting Procedures for Batched Bindings

13

The Challenge

Given a procedure, how to obtain its

batched form?

Possible to manually rewrite, but time

consuming and error-prone.

Page 14: Rewriting Procedures for Batched Bindings

14

Our Work Automatic generation of batched forms

of UDFs/stored procedures using rewrite rules

Automatic rewrite of programs to replace looping with batched invocation

Page 15: Rewriting Procedures for Batched Bindings

15

Batched Forms Batched form qb of a pure function q

Returns results for a set of parameters Result in the form {(parameter, result)} For queries: standard techniques for creating

batched forms (from work on decorrelation)

Example:Original query:SELECT item-id FROM item WHERE category-id=?

Batched form:SELECT pb.category-id, item-id FROM param-batch pb LEFT OUTER JOIN item ON pb.category-id = item.category-id;

Page 16: Rewriting Procedures for Batched Bindings

16

Batch Safe Operations Batched forms – no guaranteed order of

parameter processing Can be a problem for operations having

side-effects

Batch-Safe operations All operations without side effects Also a few operations with side effects

E.g.: INSERT on a table with no constraints Operations inside unordered loops (e.g., cursor loops with

no order-by)

Page 17: Rewriting Procedures for Batched Bindings

17

Generating Batched Forms of Procedures

Step 1: Create trivial batched form. Transform: procedure p(r) { body of p} To procedure p_batched(pb) { for each record r in pb {

< body of p> < collect the return value >

} return the collected results paired with corrsp.

params; }Step 2: Optimize query invocations in the trivial

batched form

Page 18: Rewriting Procedures for Batched Bindings

18

Rule 1A: Rewriting a Simple Set Iteration Loop

where q is any batch-safe operation with qb as its batched form

Rule 1B Handles return values

Page 19: Rewriting Procedures for Batched Bindings

19

Rule 1C: Batching Conditional Statements

Let s = // Batched invocation

where s

// Merge the results

Operation to Batch

Return values

Condition for Invocation

Page 20: Rewriting Procedures for Batched Bindings

20

Rule 2: Splitting a Loop

while (p) {

ss1;

sq;

ss2;

}

Table(T) t;

while(p) {

ss1 modified to save local variables as a tuple in t

}

Collect theparameters

for each r in t {

sq modified to use attributes of r;

}

Can apply Rule 1A-1C and batch.

for each r in t {

ss2 modified to use attributes of r;

}

Process the results

* Conditions Apply

Page 21: Rewriting Procedures for Batched Bindings

21

Rule 2: Pre-conditions The conditions make use of the data dependence graph Data Dependence Graph

Nodes: program statements Edges: dependencies between statements that read/write same

location Types of Dependencies

Flow (Write Read), Anti (Read Write) and Output (Write Write) Loop-carried flow/anti/output

Dependencies across iterations

Pre-conditions for Rule-2 No loop-carried flow/output dependencies cross the points at

which the loop is split No loop-carried dependencies through external data (e.g., DB)

Page 22: Rewriting Procedures for Batched Bindings

22

Need for Reordering Statements

(s1) while (category != null) {(s2) item-count = q1(category);(s3) sum = sum + item-count;(s4) category = getParent(category);}

Flow DependenceAnti DependenceOutput Dependence

Loop-Carried

Control Dependence

Data Dependencies

Page 23: Rewriting Procedures for Batched Bindings

23

while (category != null) { int item-count = q1(category); // Query to batch sum = sum + item-count; category = getParent(category);}

Splitting made possible after reorderingwhile (category != null) { int temp = category; category = getParent(category); int item-count = q1(temp); sum = sum + item-count;}

Reordering Statements to Enable Rule 2

Page 24: Rewriting Procedures for Batched Bindings

24

Cycles of Flow Dependencies

Page 25: Rewriting Procedures for Batched Bindings

25

Rule 4: Control Dependencieswhile (…) {

item = …; qty = …; brcode = …;

if (brcode == 58) { brcode = 1; q(item, qty, brcode); }

}Remember the branching decision in a boolean variable

while (…) { item = …; qty = …; brcode = …;

boolean cv = (brcode == 58);cv? brcode = 1;cv? q(item, qty, brcode);

}

Page 26: Rewriting Procedures for Batched Bindings

26

Cascading of Rules

Table(…) t;while (…) {

r.item = …; r.qty = …; r.brcode = …; r.cv = (r.brcode == 58); r.cv? r.brcode = 1;

t.addRecord(r);}for each r in t {

r.cv? q(r.item, r.qty, r.brcode);}

qb(item,qty,brcode(cv=true(t))

Rule 1C

After applying Rule 2

Page 27: Rewriting Procedures for Batched Bindings

27

Batching Across Multiple Levels

while(…) { ….

while(…) { ...

q(v1, v2, … vn); … }

…}

while(…) { …. Table t (…);

while(…) { ... }

qb(t); …}

Batch q w.r.t inner loop

Page 28: Rewriting Procedures for Batched Bindings

28

Parameter Batches as Nested Tables

cust-id cust-class orders

C101 Gold

C180 Regular

ord-id date

1011 10-12-08

1012 12-01-09

ord-id date

1801 10-12-08

1802 20-12-08

1803 08-01-09

Page 29: Rewriting Procedures for Batched Bindings

29

Nest and Unnest Operations

μc(T) : Unnest T w.r.t. table-valued column c

vS s(T) : Group T on columns other than S and nest the columns in S under the name s

Page 30: Rewriting Procedures for Batched Bindings

30

Rule 6: Unnesting Nested Batches

Page 31: Rewriting Procedures for Batched Bindings

31

Implementation and Evaluation Conceptually the techniques can be used with

any language (PL/SQL, Java, C#-LINQ) We implemented for Java using the SOOT framework for

program analysis

Evaluation No benchmarks for procedural SQL Scenarios from three real-world applications, which faced

performance problems Data Sets: TPC-H and synthetic

Page 32: Rewriting Procedures for Batched Bindings

32

Application 1: ESOP Management AppProcess records from a file in custom format. Repeatedly called a stored procedure with mix of queries, updates and business logic.

- Validate inputs - Lookup existing record - Update or Insert

Rewritten program used outer-join and merge.

Page 33: Rewriting Procedures for Batched Bindings

33

Application 2: Category TraversalFind the maximum size of any part in a given category and its sub-categories.

Clustered IndexCATEGORY (category-id)

Secondary IndexPART (category-id)

Original ProgramRepeatedly executed a query that performed selection followed by grouping.Rewritten ProgramGroup-By followed by Join

Page 34: Rewriting Procedures for Batched Bindings

34

Application 3: Value Range Expansion

Log scale

Expand records of

the form:

(start-num, end-num,

issued-to, …)

Performed repeated

inserts.

Rewritten program

Pulled the insert stmt out of

the loop and replaced it

with batched insert.

~75% improvement

~10% overhead

Page 35: Rewriting Procedures for Batched Bindings

35

Related Work Query unnesting

E.g. Kim [TODS82], Dayal [VLDB87], Seshadri et al. [ICDE96], Galindo Legaria et al. [SIGMOD01]

We extend the benefits of unnesting to procedural nested blocks

Graefe [BTW03] highlights the importance of batching in nested iteration plans (a motivation for our work)

Optimizing set iteration loops in database programming languages - Lieuwen and DeWitt [SIGMOD 92] Also perform program rewriting, but Do not address batching of queries/procedure calls within the loop Limited language constructs - No WHILE loops, IF-THEN-ELSE

Parallelizing compilers Kennedy[90], Padua[95] We borrow and extend the techniques

Page 36: Rewriting Procedures for Batched Bindings

36

Conclusion Automatic rewrite of programs for set-orientation

is possible Combining query rewrite with program analysis is the key

Our experiments on real-world scenarios show significant benefits due to batching

Future Work Cost-based selection of operations to batch Handling exceptions Automatically deciding whether an operation is batch-

safe Implementing rewriting for PL/SQL

Page 37: Rewriting Procedures for Batched Bindings

37

Questions?