
Karl Schnaitter and Neoklis Polyzotis (UC Santa Cruz) Serge Abiteboul (INRIA and University of Paris 11) Tova Milo (University of Tel Aviv) Automatic Index Selection for Shifting Workloads
Posted: 19-Dec-2015



Karl Schnaitter and Neoklis Polyzotis (UC Santa Cruz)

Serge Abiteboul (INRIA and University of Paris 11)

Tova Milo (University of Tel Aviv)

Automatic Index Selection for Shifting Workloads


On-line Index Selection

A heavily used database needs indexes, but an evolving workload makes this tough:
• Popularity shifts over time
• Applications come and go

The problem: on-line index selection. Maintain a dynamic set of indexes that improves the performance of the current queries.


Attempted On-line Solution

Can off-line techniques be adapted? Off-line algorithms need a fixed query load, which is not given in the on-line problem. We could try:

Loop:
  1. Gather query workload W
  2. Optimize indexes for W

Drawbacks: gathering W has no obvious implementation, optimizing for W can be very slow, and index creation cost is not considered.


Our Solution

COLT (Continuous On-Line Tuning): a system for on-line index selection.

Features:
• Continuously tracks the query load
• Evaluates indexes with a what-if optimizer
• Adaptively controls overhead
• Selects indexes based on recent trends

Prototype implementation in PostgreSQL.


Problem Assumptions

We make the following assumptions:
• Data is in a relational database
• The workload consists of SELECT statements
• Indexes are restricted to single columns
• A budget for index storage is given

This problem still has major challenges. Lifting these assumptions is future work.


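System Architecture

[Diagram] Incoming queries pass through the Parser, the Optimizer, and the Executor to the Database. For each query, COLT receives the query and its plan, sends indexes to profile to the Optimizer's What-if Interface, and gets index benefits back. COLT works in epochs; between epochs it issues CREATE INDEX ... / DROP INDEX ... statements to the Database.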
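The architecture keeps per-query observation separate from index changes, which happen only between epochs. The following is a minimal sketch of that control loop under stated assumptions. The `observe` and `reorganize` callbacks are hypothetical stand-ins: the first for COLT receiving each query and plan, the second for its end-of-epoch index selection. DDL statements are only logged here and are never executed against a database.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class EpochDriver:
    """Hypothetical control loop: observe every query, change indexes only between epochs."""

    epoch_length: int                                      # queries per epoch (assumed unit)
    observe: Callable[[str], None]                         # stand-in for COLT seeing query & plan
    reorganize: Callable[[], tuple[set[str], set[str]]]    # returns (to_create, to_drop)
    materialized: set[str] = field(default_factory=set)
    ddl_log: list[str] = field(default_factory=list)       # DDL that would be issued

    def run(self, queries: Iterable[str]) -> None:
        for count, query in enumerate(queries, start=1):
            self.observe(query)
            if count % self.epoch_length == 0:             # epoch boundary
                to_create, to_drop = self.reorganize()
                for name in sorted(to_drop & self.materialized):
                    self.ddl_log.append(f"DROP INDEX {name}")
                    self.materialized.discard(name)
                for name in sorted(to_create - self.materialized):
                    self.ddl_log.append(f"CREATE INDEX {name} ...")
                    self.materialized.add(name)
```

Because DDL waits for an epoch boundary, index creation stays off the per-query path. The epoch length then trades responsiveness against index churn.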


COLT Internal Organization

[Diagram] COLT maintains recent queries and candidate indexes. It takes each query & plan as input, sends indexes to profile to the What-if Interface, and receives index benefits in return.


Organizing the Queries

Similar queries are grouped in clusters:
• Each query is placed in a cluster on arrival
• Aggregate benefit statistics are kept for each cluster
• Confidence intervals summarize past measurements

[Diagram: Queries 1 to 5 grouped into Cluster 1 and Cluster 2]
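Per-cluster benefit statistics with confidence intervals can be maintained incrementally. The following is a sketch under two labeled assumptions. First, as a hypothetical similarity rule, queries that reference the same set of columns share a cluster. Second, the statistics use Welford's running mean and variance with a normal-approximation interval. COLT's actual clustering and interval construction may differ.

```python
import math
from collections import defaultdict


def cluster_key(columns):
    """Hypothetical similarity rule: queries over the same column set share a cluster."""
    return frozenset(columns)


class ClusterStats:
    """Running benefit statistics per (cluster, index) pair."""

    def __init__(self):
        self._n = defaultdict(int)
        self._mean = defaultdict(float)
        self._m2 = defaultdict(float)  # sum of squared deviations (Welford)

    def record(self, cluster, index, benefit):
        key = (cluster, index)
        self._n[key] += 1
        delta = benefit - self._mean[key]
        self._mean[key] += delta / self._n[key]
        self._m2[key] += delta * (benefit - self._mean[key])

    def interval(self, cluster, index, z=1.96):
        """Approximate 95% interval for the mean benefit; unbounded with < 2 samples."""
        key = (cluster, index)
        n = self._n.get(key, 0)
        if n < 2:
            return (-math.inf, math.inf)
        mean = self._mean[key]
        half = z * math.sqrt(self._m2[key] / (n - 1) / n)
        return (mean - half, mean + half)
```

A wide interval marks an index whose benefit in that cluster is still uncertain. The profiling step can use this signal to decide where what-if calls are worth spending.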


Organizing the Indexes

• Cold Set: relevant, but not promising candidates. Benefit is measured with a crude metric.
• Hot Set: promising candidates for materialization. Profiled accurately with what-if calls.
• Materialized Set: indexes materialized by COLT. Their disk storage must fit in the budget. Profiled with reverse what-if calls.

[Diagram: a new candidate enters the Cold Set; candidates can also be discarded]
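The three sets and their transitions can be tracked with plain set bookkeeping. The following is a hypothetical sketch, not COLT's data structure. The method names and the per-index size estimates are assumptions. The budget check mirrors the rule that materialized indexes must fit in the storage budget.

```python
from dataclasses import dataclass, field


@dataclass
class IndexSets:
    """Hypothetical bookkeeping for the cold, hot, and materialized sets."""

    storage_budget: int                        # e.g. MB available for indexes
    sizes: dict = field(default_factory=dict)  # estimated size of each tracked index
    cold: set = field(default_factory=set)
    hot: set = field(default_factory=set)
    materialized: set = field(default_factory=set)

    def add_candidate(self, name, size):
        """A new candidate enters the cold set (unless already tracked)."""
        if name not in self.cold | self.hot | self.materialized:
            self.sizes[name] = size
            self.cold.add(name)

    def discard(self, name):
        """Drop a cold candidate entirely."""
        if name in self.cold:
            self.cold.remove(name)
            del self.sizes[name]

    def promote(self, name):
        """Cold -> hot: the index will now be profiled with what-if calls."""
        if name in self.cold:
            self.cold.remove(name)
            self.hot.add(name)

    def storage_used(self):
        return sum(self.sizes[n] for n in self.materialized)

    def materialize(self, name):
        """Hot -> materialized, only if the storage budget still holds."""
        if name not in self.hot:
            return False
        if self.storage_used() + self.sizes[name] > self.storage_budget:
            return False
        self.hot.remove(name)
        self.materialized.add(name)
        return True
```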


Key Challenges

• Selective profiling: which hot and materialized indexes are profiled with respect to the current query?
• Index selection: how are index candidates selected for the cold, hot, and materialized sets?


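Selective Profiling (1/2)

[Diagram] Each incoming query is mapped to a cluster (C1, C2, C3, C4). Of the Cold, Hot, and Mat sets, the hot and materialized indexes relevant to the query are eligible for profiling. The indexes to profile are a random sample that focuses on indexes with uncertain benefit in the query's cluster.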
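One way to draw a random sample that focuses on uncertain indexes is to weight each eligible index by the width of its benefit interval in the query's cluster. The following is a sketch under that assumption. Indexes never measured in the cluster (infinite width) are taken first. This weighting scheme is hypothetical and is not COLT's published rule.

```python
import math
import random


def choose_indexes_to_profile(widths, k, rng):
    """Pick up to k indexes; widths maps index -> interval width (math.inf if unmeasured)."""
    unmeasured = sorted(name for name, w in widths.items() if math.isinf(w))
    chosen = unmeasured[:k]
    pool = {name: w for name, w in widths.items() if not math.isinf(w)}
    while len(chosen) < k and pool:
        names = sorted(pool)
        weights = [pool[n] for n in names]
        if sum(weights) > 0:
            # Wider interval -> more uncertainty -> higher chance of a what-if call.
            pick = rng.choices(names, weights=weights, k=1)[0]
        else:
            pick = rng.choice(names)  # all intervals collapsed: fall back to uniform
        chosen.append(pick)
        del pool[pick]
    return chosen
```

Passing an explicit `random.Random` instance keeps runs reproducible. An index whose interval has shrunk to zero width is then only profiled if the budget has room left over.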


Selective Profiling (2/2)

Profiling budget: the maximum number of what-if calls per epoch.
• The budget is adjusted after each epoch
• It is set proportional to the potential of the hot indexes
• Potential is based on optimistic assumptions

Result:
• Stable workload: profiling is suspended
• Shifting workload: profiling is intensified
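The budget rule can be sketched as "proportional to optimistic potential, then capped". The following sketch rests on hypothetical assumptions. Each hot index contributes an optimistic estimate (for example, an upper confidence bound) of how much it could improve on the current materialized configuration. `calls_per_unit` is an invented scaling factor. The cap plays the role of the what-if limit from the parameter settings.

```python
def optimistic_potential(hot_upper_bounds):
    """Hypothetical potential: sum of positive optimistic improvements of hot indexes."""
    return sum(max(0.0, ub) for ub in hot_upper_bounds)


def next_profiling_budget(hot_upper_bounds, calls_per_unit, whatif_limit):
    """Max what-if calls for the next epoch: proportional to potential, then capped."""
    budget = round(calls_per_unit * optimistic_potential(hot_upper_bounds))
    return max(0, min(whatif_limit, budget))
```

Under this reading, a stable and well-tuned workload drives every optimistic improvement to zero or below, so the budget drops to 0 and profiling is suspended. After a shift, new hot indexes show large upper bounds, and the budget rises toward the cap.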


Index Selection (1/3)

How are new candidates chosen? Use a crude benefit metric:
• Cost of index access vs. sequential scan
• Approximate and cheap to compute

When each query arrives:
• Compute the crude benefit of relevant indexes
• Indexes with benefit > 0 become candidates

[Diagram: a new candidate enters the Cold set]
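A crude metric in this spirit compares a sequential scan against an index access for each single-column predicate. The following sketch uses a hypothetical, deliberately simplified formula that is neither COLT's metric nor PostgreSQL's planner cost model. Only the two page-cost constants mirror PostgreSQL's default `seq_page_cost` (1.0) and `random_page_cost` (4.0) settings. `TableStats` and the selectivity inputs are assumptions.

```python
from dataclasses import dataclass

SEQ_PAGE_COST = 1.0     # mirrors PostgreSQL's default seq_page_cost
RANDOM_PAGE_COST = 4.0  # mirrors PostgreSQL's default random_page_cost


@dataclass
class TableStats:
    pages: int  # heap pages in the table
    rows: int   # rows in the table


def crude_benefit(table, selectivity):
    """Hypothetical crude metric: sequential-scan cost minus index-access cost."""
    seq_cost = table.pages * SEQ_PAGE_COST
    matching_rows = selectivity * table.rows
    # Pessimistic: one random page fetch per matching row, never more than the whole table.
    index_cost = min(matching_rows, table.pages) * RANDOM_PAGE_COST
    return seq_cost - index_cost


def new_candidates(predicates, stats):
    """predicates: (table, column, selectivity) triples from one query's WHERE clause."""
    return {
        f"{table}.{column}"
        for table, column, selectivity in predicates
        if crude_benefit(stats[table], selectivity) > 0
    }
```

Under this model, a selective predicate yields a positive crude benefit and its column becomes a cold candidate. A predicate matching half the table does not.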


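Index Selection (2/3)

How are hot indexes chosen? At the end of each epoch:
• Get the crude benefit of hot and cold indexes
• Find the cut-off point for membership in the hot set

The cut-off point is derived from a two-cluster model.

[Diagram: the crude benefit axis split by the cut-off into HOT and COLD regions; indexes move between the Cold and Hot sets]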
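One plausible reading of the "two-cluster model" splits the crude benefits into two groups that minimize the within-group sum of squared deviations. In one dimension, the optimal groups are contiguous in sorted order, so trying every split point gives the exact 2-means answer. The following sketch assumes this interpretation; COLT's actual model may differ.

```python
def two_cluster_cutoff(benefits):
    """Exact 1-D 2-means split: returns the smallest value of the upper group."""
    xs = sorted(benefits)
    if len(xs) < 2:
        raise ValueError("need at least two indexes to split")

    def sse(group):
        mean = sum(group) / len(group)
        return sum((x - mean) ** 2 for x in group)

    # Try every split point of the sorted list; keep the one with least within-group error.
    split = min(range(1, len(xs)), key=lambda i: sse(xs[:i]) + sse(xs[i:]))
    return xs[split]


def split_hot_cold(crude):
    """crude: index name -> crude benefit. Returns (hot, cold) name sets."""
    cutoff = two_cluster_cutoff(crude.values())
    hot = {name for name, b in crude.items() if b >= cutoff}
    return hot, set(crude) - hot
```

The quadratic scan is fine for the small candidate sets involved. Prefix sums would make it linear after sorting.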


Index Selection (3/3)

How is the materialized set chosen? At the end of each epoch:
• Predict the benefit of hot and materialized indexes
• The cost of materialization is discounted
• Select indexes to maximize total benefit

[Diagram: observed benefit over time, extended into a predicted benefit; indexes move between the Hot and Materialized sets]
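The selection step can be sketched as a 0/1 knapsack over the storage budget. The sketch makes three assumptions that go beyond the slide. First, the predicted benefit is the mean observed benefit over the last `window` epochs. Second, "discounted" is read as subtracting the creation cost for indexes that are not yet materialized. Third, index sizes are integers in the same unit as the budget.

```python
def predict_benefit(history, window):
    """Hypothetical predictor: mean of the last `window` per-epoch observed benefits."""
    recent = history[-window:]
    return sum(recent) / len(recent) if recent else 0.0


def choose_materialized(candidates, budget):
    """0/1 knapsack over integer sizes; returns (total value, chosen names)."""
    best = [(0.0, frozenset())] * (budget + 1)  # best[c]: best value within capacity c
    for name, size, value in candidates:
        if value <= 0 or size > budget:
            continue
        for cap in range(budget, size - 1, -1):  # descending: each index used at most once
            prev_value, prev_set = best[cap - size]
            if prev_value + value > best[cap][0]:
                best[cap] = (prev_value + value, prev_set | {name})
    return best[budget]


def plan_materialization(indexes, materialized, window, budget):
    """indexes: name -> (size, creation_cost, benefit_history)."""
    candidates = []
    for name, (size, creation_cost, history) in indexes.items():
        value = predict_benefit(history, window)
        if name not in materialized:
            value -= creation_cost  # only new indexes pay to be built
        candidates.append((name, size, value))
    return choose_materialized(sorted(candidates), budget)
```

Any currently materialized index that is missing from the returned set would be dropped at the epoch boundary. An expensive-to-build index is chosen only once its predicted benefit outweighs its creation cost.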


Performance of COLT

Experimented with a 4-phase workload:
• COLT adapts to each phase
• Off-line chooses the best static index set

[Chart: per-phase results for Phases 1 to 4, showing Minimum Time, COLT Extra Time, and Off-line Extra Time]


Overhead of Profiling

• Overhead peaks at the start of each phase
• It decreases when the system is well tuned
• Average < 1 what-if call per query

[Chart: profiling overhead over Phases 1 to 4]


Closing Remarks

On-line index selection is harder than off-line:
• Efficiency is a higher priority
• We essentially need to guess the future
• Index creation is an issue

COLT is our solution:
• Solves a constrained problem
• Potentially extensible to other domains; only some components would change


Thank You


Setting System Parameters

• Epoch length = 10
• What-if limit = 20, or less if very worried about overhead
• Averaging window: used when predicting future benefit. It indicates the number of epochs that give a good picture of the query distribution. Not easy to set.
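These knobs can be gathered into one validated configuration object. The following is a hypothetical container in which only the two defaults come from the slide. Because the slide gives no value for the averaging window, the sketch makes it a required argument instead of guessing one. Units are assumptions: epoch length presumably in queries, and the what-if limit presumably per epoch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColtParams:
    """Hypothetical container for the tuning knobs listed on the slide."""

    averaging_window: int   # epochs used to predict future benefit; no value given, so required
    epoch_length: int = 10  # slide's value (unit assumed: queries)
    whatif_limit: int = 20  # slide's value; lower it if overhead is a concern

    def __post_init__(self):
        if self.averaging_window < 1 or self.epoch_length < 1:
            raise ValueError("averaging_window and epoch_length must be positive")
        if self.whatif_limit < 0:
            raise ValueError("whatif_limit must be non-negative")
```

Freezing the dataclass keeps the parameters fixed for the lifetime of a tuning run. Changing them means constructing a new object.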


Other On-line Systems

Bruno and Chaudhuri, ICDE 2007:
• Avoids extra optimizer calls completely
• Heuristics to capture index interaction
• Very different mechanisms for index selection

Sattler, Schallehn, and Geist, IDEAS 2004:
• More similarities to COLT
• Different prediction of future benefit
• No control over profiling overhead


Performance with Noise

• The worst-case scenario for COLT: concentrated bursts of noise queries
• Performance loss in some cases

[Chart: results on a percentage scale from 20% to 100%]