Loading a Cache with Query Results


Laura Haas, IBM Almaden

Donald Kossmann, Univ. Passau

Ioana Ursu, IBM Almaden

Page 2: Background & Motivation

• Applications invoke queries and methods
• Queries select relevant objects
• Methods work with relevant objects
• Example: find hotels and reserve rooms
• Other examples: CAX, SAP R/3, Web

foreach h in (select oid from hotels h where city = 'Edinburgh')
    h.requestRoom(3, Sep-6, Sep-12);

Page 3: Background and Motivation

• Traditional client-server systems:
  – methods are executed by clients with caching
  – queries are executed by clients and servers
  – query processing is independent of caching

• Problems:
  – data must be fetched twice
  – objects are faulted in individually

• Terrible performance in many environments

Page 4: Traditional System

[Figure: client-server diagram with the server and, on the client, a cache and a query processor. The application runs foreach h in (select oid from ...) h.reserveRoom(); the query result shows <apex, ***, ...><carlton, **, ...>, and the cache holds <apex, ***, ...>.]

Page 5: Goal & Solution

• Load the cache as a by-product of queries
  – copy relevant objects while executing the query

• Cache operators do the copying
• Extend the query optimizer
  – which collections should be cached?
  – when to copy?

• Assumption: caching at the granularity of objects
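A minimal sketch of what a Cache operator does, assuming an iterator-style query engine in which each row carries an object's oid and its attributes. The names `ClientCache` and `cache_operator` and the row layout are hypothetical simplifications, not Garlic's interface. The operator passes every row through unchanged and, as a side effect, copies objects that are not yet cached, so the cache is loaded as a by-product of the query.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

# Hypothetical row layout: (oid, attributes of the object).
Row = Tuple[str, Dict[str, Any]]


class ClientCache:
    """Hypothetical client-side object cache keyed by oid."""

    def __init__(self) -> None:
        self.objects: Dict[str, Dict[str, Any]] = {}
        self.copies = 0  # number of objects actually copied

    def contains(self, oid: str) -> bool:
        return oid in self.objects

    def insert(self, oid: str, obj: Dict[str, Any]) -> None:
        self.objects[oid] = dict(obj)  # copy the object into the cache
        self.copies += 1


def cache_operator(rows: Iterable[Row], cache: ClientCache) -> Iterator[Row]:
    """Pass rows through unchanged, copying each object not yet cached."""
    for oid, obj in rows:
        if not cache.contains(oid):  # probe the cache's hash table
            cache.insert(oid, obj)   # copy only objects not yet cached
        yield oid, obj


if __name__ == "__main__":
    hotels = [
        ("apex", {"stars": 3, "city": "Edinburgh"}),
        ("carlton", {"stars": 2, "city": "Edinburgh"}),
    ]
    cache = ClientCache()
    # The list comprehension stands in for the application's foreach loop;
    # the cache is filled while the query result is consumed.
    oids = [oid for oid, _ in cache_operator(hotels, cache)]
    print(oids, sorted(cache.objects))
```

Because the operator only observes the stream, the operators above it receive exactly the rows they would have received without it.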

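Page 6: Loading a Cache with Query Results

[Figure: the same client-server setup, now with a Cache operator in the query plan that joins Hotels and Cities. The application runs foreach h in (select oid from ...) h.reserveRooms(); the server ships <apex, ***, ...><carlton, **, ...>, and <apex, ***, ...> is copied into the cache while the query executes.]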

Page 7: Tradeoffs

• What to cache?
  – the cost of a Cache operator must be smaller than the savings obtained by this kind of pre-caching

• When to cache?
  – late, so that only relevant objects are cached
  – early, so that other operators are not affected

• N.B. Cache operators affect the cost of other (lower) operators in the plan

Page 8: Early vs. Late Cache Operators: Copying Irrelevant Objects

[Figure: plan joining Hotels and Cities with an early Cache operator. The server delivers <apex, ...><ritz, ...><carlton, ...><plaza, ...>, all of which are copied, while the Join result is only <apex, ***, Edinburgh><ritz, *****, Paris>.]

Page 9: Early vs. Late Cache Operators: Late Projections

[Figure: two plans joining Hotels and Cities.
Early Cache - Cheap Join: a Cache operator directly above Hotels copies <apex, ***, Edinburgh><ritz, *****, Paris>, so the Join only processes the projected tuples <apex, Edin.><ritz, Paris>.
Late Cache - Expensive Join: the Join must carry the full objects <apex, ***, Edinburgh><ritz, *****, Paris> up to the Cache operator, whose output is projected to <apex><ritz>.]

Page 10: Alternative Approaches

• Determine candidate collections for caching, i.e., what to cache:
  – carry out data flow analysis
  – analyze the select clause of the query; cache a collection if its oid is returned

• Determine when to cache candidate objects:
  – heuristics
  – cost-based approach
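The select-clause option can be sketched as follows, assuming a parsed query that maps aliases to collections and exposes its select list as (alias, column) pairs. The `Query` structure and `candidate_collections` function are hypothetical simplifications; a collection becomes a caching candidate when its oid is returned.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Query:
    # Hypothetical parsed query: alias -> collection, plus the select list.
    from_clause: Dict[str, str]
    select_list: List[Tuple[str, str]] = field(default_factory=list)


def candidate_collections(query: Query) -> Set[str]:
    """Collections whose oid appears in the select clause are candidates."""
    return {
        query.from_clause[alias]
        for alias, column in query.select_list
        if column == "oid"
    }


if __name__ == "__main__":
    q = Query(
        from_clause={"h": "hotels", "c": "cities"},
        select_list=[("h", "oid"), ("c", "name")],
    )
    print(candidate_collections(q))  # only hotels: its oid is returned
```

This check is cheap but coarse; the data flow analysis option would instead look at what the application actually does with the returned objects.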

Page 11: Caching at the Top Heuristics

• Policy
  – cache all candidate collections
  – cache no irrelevant objects (i.e., late caching)

• Algorithm
  – generate the query plan for a select * query
  – place the Cache operator at the top of the plan
  – push the Cache operator down through non-reductive operations

• N.B.: Simulates the "external" approach

Page 12: Cache Operator Push Down

The Cache operator may be pushed down through non-reductive operations.

Initial Plan:
Cache(h,c)
  Sort
    Join
      Hotels
      Cities

1. Push Down:
Sort
  Cache(h,c)
    Join
      Hotels
      Cities

2. Push Down:
Sort
  Cache(h)
    Join
      Hotels
      Cache(c)
        Cities

Push-down reduces the cost of non-reductive operations without causing irrelevant objects to be copied.
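A sketch of the push-down rule under stated assumptions: plans are trees of a hypothetical `Node` class; Sort is treated as non-reductive; and each Join carries a hypothetical per-input flag saying whether every row of that input survives the join. With the Cities input marked as preserved, the code reproduces the two push-down steps on this slide. It illustrates the rule only and is not the optimizer code from the paper.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical plan node: "Scan", "Sort", "Join", or "Cache"."""
    op: str
    children: List["Node"] = field(default_factory=list)
    collection: Optional[str] = None  # set for Scan nodes
    cached: Tuple[str, ...] = ()      # set for Cache nodes
    # Join only (assumed known): preserves[i] is True if every row of
    # child i survives the join, i.e. the join is non-reductive for it.
    preserves: List[bool] = field(default_factory=list)

    def collections(self) -> FrozenSet[str]:
        if self.op == "Scan":
            return frozenset({self.collection})
        result: FrozenSet[str] = frozenset()
        for child in self.children:
            result |= child.collections()
        return result


def push_down(node: Node) -> Node:
    """Push Cache operators below Sort and below preserved Join inputs."""
    node.children = [push_down(child) for child in node.children]
    if node.op != "Cache":
        return node
    child = node.children[0]
    if child.op == "Sort":
        # Sort passes every row through: swap, then keep pushing the Cache.
        node.children = child.children
        child.children = [push_down(node)]
        return child
    if child.op == "Join":
        flags = child.preserves or [False] * len(child.children)
        pushed = set()
        new_inputs = []
        for inp, keeps_all in zip(child.children, flags):
            here = tuple(c for c in node.cached if c in inp.collections())
            if keeps_all and here:
                # Non-reductive for this input: cache its objects below the join.
                new_inputs.append(push_down(Node("Cache", [inp], cached=here)))
                pushed.update(here)
            else:
                new_inputs.append(inp)
        child.children = new_inputs
        remaining = tuple(c for c in node.cached if c not in pushed)
        if not remaining:
            return child  # every cached collection went below the join
        node.cached = remaining
    return node


def show(node: Node) -> str:
    if node.op == "Scan":
        return f"Scan({node.collection})"
    label = f"Cache({','.join(node.cached)})" if node.op == "Cache" else node.op
    return f"{label}[{', '.join(show(c) for c in node.children)}]"


if __name__ == "__main__":
    plan = Node("Cache", [Node("Sort", [Node(
        "Join",
        [Node("Scan", collection="h"), Node("Scan", collection="c")],
        preserves=[False, True],
    )])], cached=("h", "c"))
    print(show(plan))
    print(show(push_down(plan)))
```

Note that `push_down` rewrites the tree in place, so the initial plan is printed before it is transformed.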

Page 13: Caching at the Bottom Heuristics

• Policy
  – cache all candidate collections
  – increase the cost of other operations as little as possible (i.e., early caching)

• Algorithm
  – extend the optimizer to produce a plan with Cache operators as low as possible (details in paper)
  – pull Cache operators up through the pipeline

Pull-up reduces the number of irrelevant objects that are cached without increasing the cost of pipelined operators.
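A simplified sketch of the pull-up step, assuming the operators above a scan are listed bottom-up, each with a hypothetical `pipelined` flag and selectivity estimate. The Cache operator starts at the bottom (early caching) and is pulled past consecutive pipelined operators, stopping below the first blocking one; reductive operators it passes (selectivity below 1) shrink the number of objects it copies. This illustrates the idea only; the optimizer extension itself is described in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    """Hypothetical operator above the scan, listed bottom-up."""
    name: str
    pipelined: bool     # streams rows without materializing them
    selectivity: float  # estimated fraction of input rows passed upward


def pull_up(ops: List[Op]) -> int:
    """Number of operators the Cache operator is pulled above.

    The Cache operator starts directly above the scan and moves past each
    consecutive pipelined operator, stopping below the first blocking one
    (e.g., a Sort), where carrying whole objects would raise its cost.
    """
    position = 0
    for op in ops:
        if not op.pipelined:
            break
        position += 1
    return position


def objects_cached(scan_rows: int, ops: List[Op], position: int) -> float:
    """Estimated number of objects copied when Cache sits above ops[:position]."""
    rows = float(scan_rows)
    for op in ops[:position]:
        rows *= op.selectivity
    return rows


if __name__ == "__main__":
    pipeline = [
        Op("Filter(city = 'Edinburgh')", True, 0.25),
        Op("Sort", False, 1.0),
        Op("Filter(stars >= 3)", True, 0.5),
    ]
    pos = pull_up(pipeline)
    print(pos, objects_cached(1000, pipeline, 0), objects_cached(1000, pipeline, pos))
```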

Page 14: Cost-based Cache Operator Placement

• Try to find the best possible plan
  – use Cache operators only if they are beneficial
  – find the best place for Cache operators in the plan
  – join order and site selection depend on caching

• Extend the query optimizer
  – enumerate all possible caching plans
  – estimate the cost and benefit of Cache operators
  – extended pruning condition for dynamic programming
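One plausible form of an extended pruning condition, sketched under the assumption that each partial plan records the collections it has joined and the collections its Cache operators have already copied, with cost = overhead - benefit. Ordinary dynamic programming keeps only the cheapest plan per set of joined collections; this sketch keeps the cheapest plan per (joined, cached) pair instead. The `Plan` class and `prune` function are hypothetical, and the paper's exact condition may differ.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Plan:
    """Hypothetical partial plan produced during dynamic programming."""
    joined: FrozenSet[str]  # collections joined so far
    cached: FrozenSet[str]  # collections already copied by Cache operators
    cost: float             # estimated cost, Cache benefits already subtracted
    label: str


def prune(plans: Iterable[Plan]) -> List[Plan]:
    """Keep the cheapest plan per (joined, cached) pair.

    Plans that cached different collections are treated as incomparable,
    because their remaining options differ: a plan that has not cached a
    collection may still add a Cache operator for it later (or may no longer
    be able to do so cheaply), so comparing costs so far could prune the
    plan that ends up best.
    """
    best: Dict[Tuple[FrozenSet[str], FrozenSet[str]], Plan] = {}
    for plan in plans:
        key = (plan.joined, plan.cached)
        if key not in best or plan.cost < best[key].cost:
            best[key] = plan
    return list(best.values())


if __name__ == "__main__":
    hc = frozenset({"h", "c"})
    candidates = [
        Plan(hc, frozenset(), 10.0, "join at server, no Cache"),
        Plan(hc, frozenset(), 12.0, "join at client, no Cache"),
        Plan(hc, frozenset({"h"}), 15.0, "join at client, Cache(h)"),
    ]
    for plan in prune(candidates):
        print(plan.label, plan.cost)
```

Keeping more plans per subproblem is exactly what enlarges the search space, which matches the optimization-time growth reported for the cost-based approach.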

Page 15: Enumerating all Caching Plans

[Figure: alternative plans for the Join of Hotels and Cities, grouped into plans with the Join at the server and plans with the Join at the client. The plans differ in their Cache operators: none, Cache(h,c) above the Join, separate Cache(h) and Cache(c), Cache(h) only, and Cache(c) only.]
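A brute-force sketch of the enumeration space for this slide's example, assuming each candidate collection can be left uncached, cached below the join, or cached above it, and that the join runs either at the server or at the client. The functions and the placement labels are hypothetical, and this simple model does not filter out combinations a real optimizer might rule out.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical placement of a Cache operator for one candidate collection.
PLACEMENTS = ("none", "below", "above")


def enumerate_caching_plans(
    candidates: List[str], sites: Tuple[str, ...] = ("server", "client")
) -> List[Tuple[str, Dict[str, str]]]:
    """Every combination of join site and per-collection Cache placement."""
    plans = []
    for site in sites:
        for choice in product(PLACEMENTS, repeat=len(candidates)):
            plans.append((site, dict(zip(candidates, choice))))
    return plans


def describe(site: str, placement: Dict[str, str]) -> str:
    """Render one plan, merging all 'above' placements into one Cache operator."""
    below = [f"Cache({c})" for c, p in placement.items() if p == "below"]
    above = [c for c, p in placement.items() if p == "above"]
    text = f"Join@{site}"
    if below:
        text += " over " + ", ".join(below)
    if above:
        text = f"Cache({','.join(above)}) over " + text
    return text


if __name__ == "__main__":
    plans = enumerate_caching_plans(["h", "c"])
    print(len(plans))
    for site, placement in plans[:4]:
        print(describe(site, placement))
```

In this model the count is 2 · 3^n for n candidate collections (18 for Hotels and Cities) before join orders are even considered, one way to see why the cost-based search space grows so quickly.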

Page 16: Costing of Cache Operators

• Overhead of Cache operators
  – cost to probe a hash table for every object
  – cost to copy objects that are not yet cached

• Benefit of Cache operators
  – savings: relevant objects are not refetched
  – savings depend on the cost to fault in objects and on the current state of the cache

• Cost = Overhead - Benefit
  – only Cache operators with Cost < 0 are useful
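The cost formula on this slide can be written out as a small function. This is a sketch under simplifying assumptions: the estimate fields are hypothetical, costs are linear in object counts, and relevant objects are assumed to be already cached at the same rate as all objects flowing through the operator.

```python
from dataclasses import dataclass


@dataclass
class CacheEstimate:
    """Hypothetical optimizer estimates for one Cache operator."""
    objects_seen: int      # objects flowing through the Cache operator
    already_cached: float  # fraction of those objects already in the cache
    relevant: int          # objects the application will actually use
    probe_cost: float      # cost of one hash-table probe
    copy_cost: float       # cost of copying one object into the cache
    fault_cost: float      # cost of faulting in one object individually


def cache_operator_cost(e: CacheEstimate) -> float:
    """Cost = Overhead - Benefit (negative means the operator pays off)."""
    # Overhead: probe for every object, copy those not yet cached.
    overhead = (e.objects_seen * e.probe_cost
                + e.objects_seen * (1 - e.already_cached) * e.copy_cost)
    # Benefit: relevant objects not already cached need not be faulted in later.
    # Simplifying assumption: relevant objects are cached at the same rate.
    benefit = e.relevant * (1 - e.already_cached) * e.fault_cost
    return overhead - benefit


def is_useful(e: CacheEstimate) -> bool:
    return cache_operator_cost(e) < 0


if __name__ == "__main__":
    cheap_faults = CacheEstimate(100, 0.5, 10, 0.25, 1.0, 5.0)
    costly_faults = CacheEstimate(100, 0.5, 10, 0.25, 1.0, 20.0)
    print(cache_operator_cost(cheap_faults), is_useful(cheap_faults))    # 50.0 False
    print(cache_operator_cost(costly_faults), is_useful(costly_faults))  # -25.0 True
```

With identical overhead, only the fault-in cost changes between the two estimates, which is why the same Cache operator can be useless for a cheap source and worthwhile for an expensive one.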

Page 17: Summary of Approaches

• Heuristics
  – simple to implement
  – not much additional optimization overhead
  – poor plans in certain situations

• Cost-based
  – very good plans
  – huge search space, slows down the query optimizer

Page 18: Performance Experiments

• Test environment
  – Garlic heterogeneous database system
  – UDB, Lotus Notes, WWW servers

• Benchmark
  – relational BUCKY benchmark database
  – from simple queries to multi-way cross-source joins
  – simple accessor methods

Page 19: Application Run Time (secs): single-table query + accessor method

                      UDB     Notes   WWW
no caching            47.8    22.9    3538.5
traditional caching   22.9    18.2    1762.3
caching at top         2.2    12.7      11.9
caching at bottom      2.2    12.7      11.9
cost-based             2.2     2.7      11.9
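To make the size of these wins concrete, a short calculation of speedup ratios (baseline run time divided by cost-based run time) from this slide's table; the dictionary simply transcribes three of its rows.

```python
from typing import Dict

# Three rows transcribed from the single-table run-time table (seconds).
RUN_TIMES: Dict[str, Dict[str, float]] = {
    "no caching": {"UDB": 47.8, "Notes": 22.9, "WWW": 3538.5},
    "traditional caching": {"UDB": 22.9, "Notes": 18.2, "WWW": 1762.3},
    "cost-based": {"UDB": 2.2, "Notes": 2.7, "WWW": 11.9},
}


def speedup(baseline: str, approach: str) -> Dict[str, float]:
    """Baseline run time divided by the approach's run time, per data source."""
    return {
        source: RUN_TIMES[baseline][source] / RUN_TIMES[approach][source]
        for source in RUN_TIMES[baseline]
    }


if __name__ == "__main__":
    for baseline in ("no caching", "traditional caching"):
        ratios = speedup(baseline, "cost-based")
        print(baseline, {s: round(r, 1) for s, r in ratios.items()})
```

Against traditional caching the ratio is about 148 for the WWW source but about 6.7 for Notes, in line with the observation that the wins are largest when client-server interaction is expensive.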

Page 20: Application Run Time (secs): three-way joins + accessor method

                      Q1 (large cache)  Q1 (small cache)  Q2      Q3
no caching            405.5             405.5             842.5   129.2
traditional caching   405.5             405.5             842.7   129.9
caching at top         71.3              71.3              49.8   177.5
caching at bottom      76.0             415.8              34.9   141.9
cost-based             71.4              71.4              35.1   130.7

Page 21: Query Optimization Times (secs), varying the number n of candidate collections

n                     2     3     4     5     6
no caching            < 1   < 1   ~ 1   ~ 2   ~ 4
traditional caching   < 1   < 1   ~ 1   ~ 2   ~ 4
caching at top        < 1   < 1   ~ 1   ~ 2   ~ 4
caching at bottom     < 1   < 1   ~ 1   ~ 2   ~ 4
cost-based            < 1   < 1   ~ 3   ~ 12  ~ 80

Page 22: Conclusions

• Loading the cache with query results can result in huge wins
  – for search & work applications
  – if client-server interaction is expensive

• Use the cost-based approach for simple queries
  – four or fewer candidate collections

• Use heuristics for complex queries

• The caching-at-the-bottom heuristic is always at least as good as the traditional, do-nothing approach

Page 23: Future Work

• Explore the full range of possible approaches
  – e.g., cost-based Cache operator pull-up and push-down

• Consider the tradeoff between optimization time and application run time (meta optimization)
  – invest in optimization time only if high gains in application run time can be expected
  – consider the state of the cache, dynamic optimization