Query Optimization

Allison Griffin

Importance of Optimization

Time is money
Queries are faster
Helps everyone who uses the server
Solution to speed lies in the algorithm
Different performance improvements with different database engines and schemas

Brief History

Before 1970’s: Dark days, manual optimization
Late 70’s to mid 80’s:
– Birth of relational data model and declarative SQL
– Optimization is job of system
– System R: beginning work on join order optimization
– Dynamic Programming: Heuristic Optimizers
Mid 80’s to early 90’s:
– Extensible query optimization (Exodus)
Mid 90’s to late 90’s:
– Materialized Views

Volcano Extensible Query Optimizer Generator

General-purpose, cost-based query optimizer, based on equivalence rules in algebra
– Equivalences: join associativity, select push down, aggregate push down
– Extensible: new operations and equivalences can be easily added
– Developed by Graefe and McKenna, 1993
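As a rough illustration of what an algebraic equivalence rule looks like, the Python sketch below applies select push down to a tiny plan tree. The classes `Scan`, `Join`, `Select` and the function `push_down_select` are hypothetical names invented for this example and are not Volcano's actual interface; the sketch also assumes each predicate reads columns of a single table.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Join:
    left: "Plan"
    right: "Plan"


@dataclass(frozen=True)
class Select:
    table: str       # table the predicate reads (assumed: exactly one table)
    predicate: str   # kept as text; this sketch never evaluates it
    child: "Plan"


Plan = Union[Scan, Join, Select]


def tables(plan):
    """Return the set of base tables a plan reads."""
    if isinstance(plan, Scan):
        return {plan.table}
    if isinstance(plan, Join):
        return tables(plan.left) | tables(plan.right)
    return tables(plan.child)


def push_down_select(plan):
    """Rewrite Select(Join(L, R)) into an equivalent plan that filters before joining."""
    if isinstance(plan, Select) and isinstance(plan.child, Join):
        join = plan.child
        if plan.table in tables(join.left):
            pushed = Select(plan.table, plan.predicate, join.left)
            return Join(push_down_select(pushed), join.right)
        if plan.table in tables(join.right):
            pushed = Select(plan.table, plan.predicate, join.right)
            return Join(join.left, push_down_select(pushed))
    return plan  # rule does not apply; plan is returned unchanged
```

A cost-based optimizer would typically keep both the original and the rewritten plan as alternatives and compare their estimated costs, rather than always applying the rewrite as this sketch does.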

Materialized Views

Can materialize (pre-compute and store) views to speed up queries
– Incremental maintenance: when database is updated, propagate updates to materialized view without complete re-computation
– Deciding when to use materialized views: even if query does not refer to materialized view, optimizer can figure out it can be used
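Incremental maintenance can be sketched for one simple case: a view that stores a per-key sum. The class `SumByKeyView` and its methods are hypothetical names for this example; a real system derives such update logic from the view definition, which this sketch does not attempt.

```python
from collections import defaultdict


class SumByKeyView:
    """Stands in for a materialized 'SELECT key, SUM(amount) ... GROUP BY key' view."""

    def __init__(self, rows):
        self.total = defaultdict(int)
        self.count = defaultdict(int)   # row count per key, needed to handle deletes
        for key, amount in rows:        # one-time full computation
            self.apply_insert(key, amount)

    def apply_insert(self, key, amount):
        """Propagate an inserted base row to the view without recomputing it."""
        self.total[key] += amount
        self.count[key] += 1

    def apply_delete(self, key, amount):
        """Propagate a deleted base row; drop the group when its last row is gone."""
        self.total[key] -= amount
        self.count[key] -= 1
        if self.count[key] == 0:
            del self.total[key]
            del self.count[key]

    def as_dict(self):
        return dict(self.total)
```

Each update touches only the affected group, so the cost of keeping the view current is proportional to the size of the change rather than the size of the base table.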

Deciding What to Materialize

Maintenance cost and query cost depend on what is materialized
– Workload: queries and update transactions, with weights for each component of workload

Goal: find set of views that gives minimum cost if materialized, subject to space constraints
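The goal above can be illustrated with a brute-force search over candidate views. This is a minimal sketch under strong assumptions: the function `choose_views` is a hypothetical name, every number is made up, each view's query saving is assumed to be already weighted by the workload, and savings are assumed to add up independently (in practice views interact, and exhaustive search is only feasible for a handful of candidates).

```python
from itertools import combinations


def choose_views(candidates, space_budget):
    """Pick the subset of views with the lowest net cost within the space budget.

    candidates maps a view name to (space, maintenance_cost, query_saving).
    Net cost is measured relative to materializing nothing, which costs 0.
    """
    best, best_cost = frozenset(), 0.0
    names = sorted(candidates)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            space = sum(candidates[v][0] for v in subset)
            if space > space_budget:
                continue  # violates the space constraint
            cost = sum(candidates[v][1] - candidates[v][2] for v in subset)
            if cost < best_cost:
                best, best_cost = frozenset(subset), cost
    return best, best_cost


# Hypothetical workload numbers, for illustration only.
example = {
    "v_sales": (40, 5.0, 30.0),
    "v_region": (30, 2.0, 12.0),
    "v_daily": (50, 20.0, 22.0),
}
chosen, net_cost = choose_views(example, space_budget=80)
```

With these example numbers, `v_daily` is left out: its high maintenance cost nearly cancels its query saving, and the other two views fit the budget together.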

What we already know…

Query optimizer analyzes a set of query execution plans (QEPs) and picks the optimal (least cost) one
– Heavily dependent on optimizer’s estimate for number of rows that will result at each step of QEP
– Estimates rely on statistics typically stored in histograms
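To make the role of histograms concrete, here is a minimal sketch of row-count estimation from an equi-width histogram. The function names are hypothetical, and the sketch assumes values are spread uniformly inside each bucket, which is a common simplification rather than a description of any particular engine.

```python
def build_histogram(values, lo, hi, buckets):
    """Count values per equal-width bucket over the range [lo, hi)."""
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        idx = min(int((v - lo) / width), buckets - 1)
        counts[idx] += 1
    return counts, lo, width


def estimate_rows_less_than(hist, x):
    """Estimate how many rows satisfy 'value < x' from the histogram alone."""
    counts, lo, width = hist
    estimate = 0.0
    for i, count in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width
        if x >= bucket_hi:
            estimate += count                              # whole bucket qualifies
        elif x > bucket_lo:
            estimate += count * (x - bucket_lo) / width    # partial bucket, assumed uniform
    return estimate


uniform = build_histogram(list(range(100)), 0, 100, 10)
skewed = build_histogram([0] * 90 + list(range(90, 100)), 0, 100, 10)
```

For the uniform data the estimate for `value < 25` matches the true count of 25. For the skewed data the estimate for `value < 5` is 45 while the true count is 90, which shows how a wrong row estimate can arise at a step of the QEP.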

Recent Approaches to Improve Statistics

Paper “Distinct-Value Synopses for Multiset Operations” by Kevin Beyer, Rainer Gemulla, Peter J. Haas, Berthold Reinwald, and Yannis Sismanis, 2007

IBM’s LEO (Learning Empirical Results in Query Optimization), 2001

Summary of Paper Results

Addresses the problem of efficiently estimating the number of distinct values of an attribute

Builds on randomized algorithms

Claims an unbiased estimator for distinct values with lower mean squared error
– Past estimates tend to be higher than the actual number, so the authors devised a way to bring the estimate down to a more reasonable value

Distinct-Value Estimation

Propose summary structure (synopsis) for a relation

– Synopsis can be used to estimate number of DVs in the partition

– Synopses can be combined to create synopses for compound partitions created from base partitions using multiset union, intersection or difference operations

– Updates can be performed on compound partitions by using synopses from base relations
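The idea of a combinable synopsis can be sketched with a k-minimum-values structure: hash every value to a point in (0, 1], keep only the k smallest hashes, and estimate the distinct-value count from the k-th smallest. The slides do not give the formula, so the estimator `(k - 1) / U_(k)` used here, the SHA-256 hash, and all function names are assumptions of this sketch. Only multiset union is shown; intersection and difference are not attempted.

```python
import hashlib


def _hash01(value):
    """Map a value to a pseudo-random point in (0, 1] (hash choice is an assumption)."""
    digest = hashlib.sha256(repr(value).encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64


def synopsis(values, k):
    """Synopsis of a partition: its k smallest distinct hash values, sorted."""
    return sorted({_hash01(v) for v in values})[:k]


def estimate_dv(syn, k):
    """Estimate the number of distinct values (DVs) from a synopsis built with size k."""
    if len(syn) < k:
        return float(len(syn))      # fewer than k DVs seen: the count is exact
    return (k - 1) / syn[k - 1]     # (k - 1) / U_(k), with U_(k) the k-th smallest hash


def union_synopsis(syn_a, syn_b, k):
    """Synopsis of the multiset union, built from the two base synopses only."""
    return sorted(set(syn_a) | set(syn_b))[:k]


k = 256
part_a = synopsis(range(0, 6000), k)
part_b = synopsis(range(4000, 10000), k)
combined = union_synopsis(part_a, part_b, k)
estimated_dvs = estimate_dv(combined, k)    # true number of DVs in the union is 10000
```

The combined synopsis is identical to the one that would be built by scanning both partitions together, so the compound partition never has to be re-read. Duplicates do not affect the synopsis, because equal values hash to the same point.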

LEO - Learning Empirical Results in Query Optimization

Autonomic feedback loops that create a self-tuning database query optimizer

Self-validates and adjusts to improve query optimization and execution without requiring user interaction to repair incorrect statistics or cardinality estimates

Reduces the total cost of owning database management systems by simplifying database administration

How it works

Monitors queries as they execute
Compares the optimizer’s estimates with actuals at each step in a QEP
Then computes adjustments to its estimates that may be used during future optimizations of similar queries
Moreover, estimation errors can also trigger re-optimization of a query in mid-execution.
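The feedback loop described above can be sketched as a small store of correction factors. The class `FeedbackStats`, its method names, and the re-optimization threshold are hypothetical choices for this example and are not taken from LEO itself.

```python
class FeedbackStats:
    """Keeps per-predicate correction factors learned from executed queries."""

    def __init__(self):
        self.adjustment = {}    # predicate text -> actual rows / estimated rows

    def record(self, predicate, estimated_rows, actual_rows):
        """Compare the estimate with the actual row count after a step has run."""
        if estimated_rows > 0:
            self.adjustment[predicate] = actual_rows / estimated_rows

    def adjusted_estimate(self, predicate, raw_estimate):
        """Apply the learned factor when a similar query is optimized later."""
        return raw_estimate * self.adjustment.get(predicate, 1.0)

    def needs_reoptimization(self, estimated_rows, actual_rows, threshold=10.0):
        """Flag a step whose estimate is off by more than `threshold` times either way."""
        low, high = sorted((max(estimated_rows, 1), max(actual_rows, 1)))
        return high / low > threshold


feedback = FeedbackStats()
# Hypothetical observation: the optimizer guessed 100 rows, execution produced 2500.
feedback.record("orders.status = 'open'", estimated_rows=100, actual_rows=2500)
corrected = feedback.adjusted_estimate("orders.status = 'open'", 200)
```

Predicates that have never been observed keep a factor of 1.0, so the optimizer's raw estimate is used unchanged until evidence arrives.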

Challenges in Research of LEO

(1) ensuring stability and convergence of the autonomic system

(2) guaranteeing consistency of the overall optimizer's model upon refinements

Results

Reduction of query execution time by orders of magnitude at negligible additional run-time cost

Reduced administration time
Fewer problem queries
Overall improved query performance with increased robustness and predictability of query response times

Bibliography

“LEO-Learning Empirical Results in Query Optimization.” IBM. <http://domino.watson.ibm.com/comm/research.nsf/pages/r.datamgmt.innovation.html>.

“Optimizing for Query Speed”. SQL. <http://www.devshed.com/c/a/MySQL/Optimizing-for-Query-Speed/1/>.

“Optimizing Database Queries”. IBM. <http://www.stevengould.org/portfolio/developerWorks/efficientPHP/wa-effphp/wa-effphp-4-1.html>.

“Optimize Queries Theory in Practice”. <http://www.serverwatch.com/tutorials/article.php/2175621/How-to-Optimize-Queries-Theory-an-Practice.htm>.

Beyer, Kevin, Gemulla, Rainer, Haas, Peter J., Reinwald, Berthold, Sismanis, Yannis. “Distinct-Value Synopses for Multiset Operations”. Communications of the ACM. Vol. 52. October 2009.

Chaudhuri, Surajit. “Technical Perspective: Relational Query Optimization-Data Management Meets Statistical Estimation”. Communications of the ACM. Vol. 52. October 2009.
