1 Adaptive Execution of Variable-Accuracy Functions VLDB Conference Seoul September 2006 Matt Denny...


Adaptive Execution of Variable-Accuracy Functions

VLDB Conference, Seoul
September 2006

Matt Denny - UC Berkeley/Fred Alger, Inc.
Michael Franklin - UC Berkeley

Matt Denny, Mike Franklin, UC Berkeley EECS

Introduction

• Many applications apply expensive functions to streams of data
  • Finance: real-time market monitoring with securities models
  • Power Management: overload prediction using current weather conditions
  • Supply Chain Management: inventory models using RFID data to find shortages in real time

Continuous Queries w/ UDFs

Example: Bond Pricing
• BondData: table of bond data (maturity, coupon, etc.)
• IntRate: stream of interest rate data
• model(): C++/Java routine that takes bond data and an interest rate and returns a price

Filtering:

    SELECT BD.BondID
    FROM BondData BD, IntRate IR [Rows 1]
    WHERE BD.numHeld > 0 AND model(BD, IR.rate) > $100

Aggregation:

    SELECT MAX(model(BD, IR.rate))
    FROM BondData BD, IntRate IR [Rows 1]
    WHERE BD.numHeld > 0

The Problem

• Analytical functions can be expensive!
  • minutes or hours per data point.
• Query processor has no control over execution of individual function calls.
  • UDF API is a Black Box
• Earlier work aims to avoid UDF calls:
  • predicate reordering ([HS93], [KMPS94], [CS96])
  • memoization and caching ([HN96], [DF05])
• Remaining calls can still be a showstopper.

The Intuition

1. Many functions have accuracy/cost tradeoffs, e.g., iterative solvers.
2. UDFs often appear in predicates and aggregates where exact answers are not required.

    SELECT BD.*
    FROM BondData BD, IntRate IR [Rows 1]
    WHERE BD.numHeld > 0 AND model(BD, IR.rate) > $100

Our Solution

VAOs (Variable Accuracy Operators)

New query operators that:
• Expose function cost/accuracy tradeoffs using a new UDF API.
• Exploit this tradeoff to avoid excess work while correctly answering the query.

VAOs - Basic Idea

• Initially run function to obtain a coarse answer.
  • This needs to be cheaper than running to a more accurate answer.
• If more accuracy needed - iterate!

Traditional Execution - Select

    SELECT BD.bondID
    FROM BondData BD, IntRate IR [Rows 1]
    WHERE model(BD, IR.rate) > $100;

[Figure: the InterestRate stream (10.1%, ...) and the BondData table feed "execute model(IR.Rate, BD)". For bond BD 1 the model returns a single price, $105.01; the Select operator tests "> 100?" and emits BD 1 in the result.]

VAO Execution: Select

    SELECT BD.bondID
    FROM BondData BD, IntRate IR [Rows 1]
    WHERE model(BD, IR.rate) > $100;

[Figure: the InterestRate stream (10.1%, ...) and the BondData table feed "execute model(IR.Rate, BD)", now followed by a Select-VAO operator testing "> 100?". For bond BD 1 the model returns a result object with bounds L = $98 and H = $110, which straddle $100.]

VAO Execution: Select

(same query as on the previous slide)

[Figure: the Select-VAO operator calls Iterate() on the result object for BD 1. The bounds tighten to L = $101 and H = $108, both above $100, so BD 1 is emitted in the result.]
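
The selection walkthrough above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' C++ implementation: `ReplayResult` and `select_greater_than` are hypothetical names, and the result object simply replays the bounds shown on the slides ($98 to $110, then $101 to $108) instead of running a bond model.

```python
class ReplayResult:
    """Hypothetical result object that replays a fixed list of (lo, hi) bounds.

    A real VAO function would compute each refinement; replaying the slide's
    numbers keeps the sketch self-contained.
    """

    def __init__(self, bounds):
        self._bounds = list(bounds)
        self.iterations = 0
        self.lo, self.hi = self._bounds[0]

    def iterate(self):
        """Move to the next, tighter pair of bounds."""
        if self.iterations + 1 >= len(self._bounds):
            raise RuntimeError("no further refinement available")
        self.iterations += 1
        self.lo, self.hi = self._bounds[self.iterations]


def select_greater_than(result, threshold):
    """Refine until [lo, hi] lies entirely on one side of the threshold."""
    while result.lo <= threshold <= result.hi:
        result.iterate()
    return result.lo > threshold


bd1 = ReplayResult([(98.0, 110.0), (101.0, 108.0)])
passes = select_greater_than(bd1, 100.0)
print(passes, bd1.iterations)  # True 1
```

A value exactly equal to the threshold would never be decided by this loop, so a real operator would also need a stopping rule such as a minimum bound width; the slides do not say which rule is used.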

VAO API

• Use iterative interface
  • Traditional: <number> = f(<args>)
  • VAO: <result object> = f(<args>)
    1. fields for (conservative) error bounds
    2. iterate() method: refines bounds with more work
    3. for some VAOs: also need estimates for CPU cost and error reduction of next iteration
• Useful for:
  • Any sort of iterative function (e.g. root finders, numerical integration)
  • Any technique with iterative step refinement (e.g. PDEs)
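
As a rough illustration of the three parts of the interface listed above, the following Python sketch wraps a bisection root finder for sqrt(x) in a result object. All names (`SqrtResult`, `vao_sqrt`, `est_error_reduction`, `est_cost`) are assumptions made for this example; the slides do not show the actual API signatures.

```python
from dataclasses import dataclass


@dataclass
class SqrtResult:
    """Hypothetical result object: conservative bounds [lo, hi] on sqrt(x)."""
    x: float
    lo: float
    hi: float

    def iterate(self) -> None:
        """One bisection step: halves the width of [lo, hi]."""
        mid = (self.lo + self.hi) / 2.0
        if mid * mid <= self.x:
            self.lo = mid
        else:
            self.hi = mid

    def est_error_reduction(self) -> float:
        """Bisection removes half of the current width on the next step."""
        return (self.hi - self.lo) / 2.0

    def est_cost(self) -> float:
        """Every bisection step costs the same: one abstract work unit."""
        return 1.0


def vao_sqrt(x: float) -> SqrtResult:
    """VAO-style call: returns a coarse result object instead of a number."""
    return SqrtResult(x=x, lo=0.0, hi=max(1.0, x))


r = vao_sqrt(2.0)
for _ in range(20):
    r.iterate()
print(r.lo <= 2.0 ** 0.5 <= r.hi)  # True
```

The caller, not the function, decides how many times to call `iterate()`, which is what lets an operator stop as soon as the bounds are tight enough for the query.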

Iteration Strategy

• Selection iterates over an object until predicate value is known.
• Aggregate operators are more difficult
  • Answer depends on sets of result objects
  • Need to decide how to iterate over multiple result objects

Example: MAX(f(x1), f(x2))

[Figure: four panels plotting f(x) bounds at x1 and x2: the initial bounds, and the bounds after iterating over f(x1), over f(x2), and over both.]

Need an iteration strategy that attempts to minimize cost

Solution: Greedy Strategy

• Iterate over the object that has the best ratio of benefit to CPU cost among the current choices.
• Good strategy if functions converge
  • Later iterations likely to have less benefit/unit cost
• Operator-dependent

Example Revisited

MAX(f(x1), f(x2))

Greedy Strategy: choose best overlap reduction per CPU cost
• Use error reduction estimates to estimate overlap reduction.
• Cost estimation depends on function.

Goal State: no overlap between f(x1) and f(x2)
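
The greedy MAX strategy described above might look like the following Python sketch. It rests on several assumptions that the slides do not spell out: the toy `Shrinking` object (bounds centered on a hidden value, width halving and cost doubling per iteration), the choice of the highest upper bound as the "guess" for the maximum, and the rule that an error reduction moves each end of the bounds by half its size.

```python
class Shrinking:
    """Toy result object (an assumption, not the bond model): bounds are
    centered on a hidden value; each iteration halves the width and doubles
    the cost of the next iteration."""

    def __init__(self, value, width, cost=1.0):
        self._value, self._width, self._cost = value, width, cost
        self.spent = 0.0

    @property
    def lo(self):
        return self._value - self._width / 2

    @property
    def hi(self):
        return self._value + self._width / 2

    def est_error_reduction(self):
        return self._width / 2  # the next iteration halves the width

    def est_cost(self):
        return self._cost

    def iterate(self):
        self.spent += self._cost
        self._width /= 2
        self._cost *= 2


def greedy_max(objs):
    """Iterate greedily until one object's bounds clear all the others."""
    while True:
        guess = max(objs, key=lambda o: o.hi)  # assumed guess: highest upper bound
        overlapping = [o for o in objs if o is not guess and o.hi > guess.lo]
        if not overlapping:
            return guess  # goal state: nothing overlaps the guess

        def benefit_per_cost(o):
            shift = o.est_error_reduction() / 2  # each end moves by half
            if o is guess:
                gain = sum(min(shift, p.hi - guess.lo) for p in overlapping)
            else:
                gain = min(shift, o.hi - guess.lo)
            return gain / o.est_cost()

        max([guess] + overlapping, key=benefit_per_cost).iterate()


f1, f2 = Shrinking(105.0, 8.0), Shrinking(100.0, 8.0)
winner = greedy_max([f1, f2])
print(winner is f1, f1.spent + f2.spent)  # True 2.0
```

In this run each object is iterated once: after f1 is refined its next step becomes more expensive, so the cheaper f2 step wins the second round. Two objects with identical true values would never reach the goal state, so a real operator needs an additional stopping rule.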

Example Revisited

• Determine if f(x1) > f(x2)

Function   Overlap Red. Est.   CPU Cost Est.
f(x1)      $.04                4 sec.
f(x2)      $.04                4 sec.

[Figure: f(x) bounds at x1 and x2.]

Example Revisited

• Determine if f(x1) > f(x2)

Function   Overlap Red. Est.   CPU Cost Est.   (previous step)
f(x1)      $.01                8 sec.          $.04, 4 sec.
f(x2)      $.02                4 sec.          $.04, 4 sec.

[Figure: f(x) bounds at x1 and x2.]

Example Revisited

• Determine if f(x1) > f(x2)

Function   Overlap Red. Est.   CPU Cost Est.   (previous step)
f(x1)      $0                  8 sec.          $.01, 8 sec.
f(x2)      $0                  8 sec.          $.02, 4 sec.

[Figure: f(x) bounds at x1 and x2.]

Aggregates

Operator: min/max (general)
  Goal State: No overlap between minimum (maximum) value and other function error bounds
  Greedy Heuristic: Make educated guess for max. Choose iteration that reduces most overlap between guess and other error bounds per cycle

Operator: avg/sum
  Goal State: avg/sum of error bounds have width less than user-defined tolerance
  Greedy Heuristic: Choose iteration which reduces avg/sum of bounds the most per cycle
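
For the avg/sum row, one possible reading of the goal state and heuristic is sketched below for a weighted average. The `Item` class, its halving/doubling behavior, and the scoring formula (weight times estimated error reduction, divided by cost) are assumptions for illustration; the slides only state the goal and the heuristic in words.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Toy result object (hypothetical): bounds plus a weight in the average."""
    lo: float
    hi: float
    weight: float
    cost: float = 1.0

    def est_error_reduction(self):
        return (self.hi - self.lo) / 2  # the next iteration halves the width

    def iterate(self):
        quarter = (self.hi - self.lo) / 4
        self.lo += quarter
        self.hi -= quarter
        self.cost *= 2  # assume each refinement costs twice the previous one


def greedy_weighted_avg(items, tolerance):
    """Refine items until the bounds on the weighted average are narrow enough."""
    total_w = sum(i.weight for i in items)

    def avg_bounds():
        lo = sum(i.weight * i.lo for i in items) / total_w
        hi = sum(i.weight * i.hi for i in items) / total_w
        return lo, hi

    lo, hi = avg_bounds()
    while hi - lo >= tolerance:
        # Pick the iteration that narrows the average most per unit of cost.
        best = max(items, key=lambda i: i.weight * i.est_error_reduction() / i.cost)
        best.iterate()
        lo, hi = avg_bounds()
    return lo, hi


heavy = Item(90.0, 110.0, weight=8.0)
light1 = Item(40.0, 60.0, weight=1.0)
light2 = Item(40.0, 60.0, weight=1.0)
lo, hi = greedy_weighted_avg([heavy, light1, light2], tolerance=4.0)
print(hi - lo < 4.0)  # True
```

In this run the heavily weighted item ends with the tightest bounds while the lightly weighted items stay coarse, since refining them contributes little to the width of the average.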

Performance Setup

• Standalone implementation of VAO framework in C++

• Used numeric bond model and bond data from [DF05]

• Real Bond Data - 500 Mortgage-backed Securities.

• Synthetic Bond Data - to stress test VAOs

• Single Interest Rate.

VAO Implementation

• Numeric bond model [S95] implemented with traditional and VAO interfaces
  • Based on PDE solver
  • VAO iterate(): double size of PDE grid
  • Bounds and error reduction estimates derived by using current and previous iteration results and Richardson’s Extrapolation [BF01]
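
The idea of estimating error from two successive grid sizes can be shown on a much simpler problem than the bond PDE. The Python sketch below uses the composite trapezoid rule as a stand-in second-order method; the function names are hypothetical, and how the authors turn such estimates into conservative bounds for the bond model is not shown on the slides.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoid rule with n equal panels (error is O(h^2))."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h


def refine_with_richardson(f, a, b, n):
    """Double the grid and estimate the fine result's error from two runs."""
    coarse = trapezoid(f, a, b, n)
    fine = trapezoid(f, a, b, 2 * n)
    # For a second-order method, halving h cuts the error by about 4, so
    # (fine - coarse) / (2**2 - 1) approximates the error left in `fine`.
    err_est = (fine - coarse) / 3.0
    return fine, err_est


# Integral of sin over [0, pi] is exactly 2.
fine, err_est = refine_with_richardson(math.sin, 0.0, math.pi, 8)
extrapolated = fine + err_est
print(abs(extrapolated - 2.0) < abs(fine - 2.0))  # True
```

The same two results that give the refined answer also give an estimate of how far off it still is, which is the kind of information a result object needs for its bounds and for predicting the benefit of the next iteration.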

Selection Performance

500 bonds, 1 interest rate

[Figure: "Selection Performance": Runtime (sec., log scale from 1 to 10000) vs. Selectivity (0 to 1) for Trad and VAO.]

Runtime depends on number of bonds close to predicate.

Stress Test

• Generate bonds with accurate values near the predicate
  • Gaussian, mean = predicate value, vary std. dev.

Std. dev. of real bonds: $7.78
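
A small Python sketch of how such a synthetic workload could be generated is given below. The predicate value of $100, the count of 500, the $1 band, and the function names are assumptions for illustration; only the Gaussian shape, the mean at the predicate value, and the $7.78 real-data standard deviation come from the slides.

```python
import random


def synthetic_values(predicate_value, std_dev, count, seed=0):
    """Draw `count` accurate values from N(predicate_value, std_dev)."""
    rng = random.Random(seed)
    return [rng.gauss(predicate_value, std_dev) for _ in range(count)]


def fraction_near(values, predicate_value, band):
    """Fraction of values within +/- band of the predicate value."""
    near = sum(1 for v in values if abs(v - predicate_value) <= band)
    return near / len(values)


tight = synthetic_values(100.0, 0.05, 500)  # nearly every bond sits on the predicate
wide = synthetic_values(100.0, 7.78, 500)   # spread similar to the real bonds
print(fraction_near(tight, 100.0, 1.0) > fraction_near(wide, 100.0, 1.0))  # True
```

Shrinking the standard deviation pushes more values close to the predicate, which is the hard case for a VAO because those objects need many iterations before their bounds clear the threshold.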

In the Paper

• Other Results
  • Max
    • Real bonds: 111 sec. vs. 6953 sec.
    • Synthetic bonds: VAOs better than traditional above $.05 std. dev.
  • Average
    • Up to 5x improvement if a small number of bonds are weighted heavily in average.
• Details on Error and Cost estimates for PDE-based bond model.
• Other types of models covered in Matt’s thesis.

Conclusion

• Many emerging CQ applications require the repeated execution of expensive functions.
• VAOs are new operators that change how these functions execute
  • Use new iterative API that exposes work-accuracy tradeoff in functions
  • Do only enough work to answer the query using greedy strategy to choose iterations
• With real bond data and models, VAOs show 1-2 orders of magnitude improvement.
• For more detailed information: [email protected]

The Advisor’s Dodge

[Figure: "Relative Contribution to Research": Percent Contribution (0 to 100) vs. Time in Program (years, 0 to 5) for Student and Advisor, with "This Work" marked. Courtesy of Jennifer Widom.]

Bibliography

• [HS93] J. M. Hellerstein and M. Stonebraker, “Predicate Migration: Optimizing Queries with Expensive Predicates”, SIGMOD 1993.

• [HN96] J. M. Hellerstein and J. Naughton, “Query Execution Techniques for Caching Expensive Predicates”, SIGMOD 1996.

• [DF05] M. Denny and M. J. Franklin, “Predicate Result Range Caching for Continuous Queries”, SIGMOD 2005.


• [S95] R. Stanton, “Rational Prepayment and the Valuation of Mortgage-Backed Securities,” The Review of Financial Studies, Vol. 8, No. 3, 677-708.

• [BF01] R.L. Burden, J.D. Faires, Numerical Analysis. Brooks/Cole, 2001.