Query Optimization
description
Transcript of Query Optimization
Query Optimization
Allison Griffin
Importance of Optimization
Time is money Queries are faster Helps everyone who uses the server Solution to speed lies in the algorithm Different performance improvements with
different database engines and schemas
Brief History
Before 1970’s: Dark days, manual optimization Late 70’s to mid 80’s:
– Birth of relational data model and declarative SQL– Optimization is job of system– System R-beginning work on join order optimization– Dynamic Programming: Heuristic Optimizers
Mid 80’s to early 90’s:– Extensible query optimization (Exodus)
Mid 90’s to late 90’s:– Materialized Views
Volcano Extensible Query Optimizer Generator
General purpose cost based query optimizer, based on equivalence rules in algebra– Equivalences: join associativity, select push
down, aggregate push down– Extensible: new operations and equivalences can
be easily added– Developed by Graefe and McKenna 1993
Materialized Views
Can materialize (pre-compute and store) views to speed up queries– Incremental maintenance
when database is updated, propagate updates to materialized view without complete re-computation
– Deciding when to use materialized views even if query does not refer to materialized view, optimizer
can figure out it can be used
Deciding What to Materialize
Maintenance cost and query cost– Workload depends on what is materialized:
queries and update transactions weights for each component of workload
Goal: find set of views that gives minimum cost if materialized, subject to space constraints
What we already know…
Query optimizer analyzes set of query execution plans and gives optimal (least cost)– Heavily dependent on optimizer’s estimate for
number of rows that will result at each step of QEP
– Estimates rely on statistics typically stored in histograms
Recent Approaches to Improve Statistics
Paper “Distinct-Value Synopses for Multiset Operations” by Kevin Beyer, Rainer Gemulla, Peter J. Haas, Berthold Reinwalk, and Yannis Sismanis, 2007
IBM’s LEO (Learning Empirical Results in Query Optimization), 2001
Summary of Paper Results
Addresses the problem of efficient estimate of number of distinct values of an attribute
Builds on leveraging of randomized algorithms
Claim to have unbiased estimator for distinct values with lower mean squared error– Past attempts tend to by higher than the actual
number so they have come up with way to cut that number down to be more reasonable
Distinct-Value Estimation
Propose summary structure (synopsis) for a relation
– Synopsis can be used to estimate number of DVs in the partition
– Synopses can be combined to create synopses for compound partitions created from base partitions using multiset union, intersection or difference operations
– Updates can be performed on compound partitions by using synopses from base relations
LEO - Learning Emperical Results in Query Optimization
Autonomic feedback loops that create a self-tuning database query optimizer
Self-validates and adjusts to improve query optimization and execution without requiring user interaction to repair incorrect statistics or cardinality estimates
Reduces the total cost of owning database management systems by simplifying database administration
How it works
Monitors queries as they execute Compares the optimizer’s estimates with
actuals at each step in a QEP Then computes adjustments to its estimates
that may be used during future optimizations of similar queries
Moreover, estimation errors can also trigger re-optimization of a query in mid-execution.
Challenges in Research of LEO
(1) ensuring stability and convergence of the autonomic system
(2) guaranteeing consistency of the overall optimizer's model upon refinements
Results
Reduction of query execution time by orders of magnitude at negligible additional run-time cost
Reduced administration time Fewer problem queries Overall improved query performance with
increased robustness and predictability of query response times
Bibliography
“LEO-Learning Empirical Results in Query Optimization.” IBM. <http://domino.watson.ibm.com/comm/research.nsf/pages/r.datamgmt.innovation.html>.
“Optimizing for Query Speed”. SQL. <http://www.devshed.com/c/a/MySQL/Optimizing-for-Query-Speed/1/
“Optimizing Database Queries”. IBM. <http://www.stevengould.org/portfolio/developerWorks/efficientPHP/wa-effphp/wa-effphp-4-1.html>.
“Optimize Queries Theory in Practice”. <http://www.serverwatch.com/tutorials/article.php/2175621/How-to-Optimize-Queries-Theory-an-Practice.htm>.
Beyer, Kevin, Gemulla, Rainer, Haas, Peter J., Reinwald, Berthold, Sismani, Yannis. “Distinct-Value Synopses for Multiset Operations”. Communications of the ACM. Vol. 52. October 2009.
Chaudhuri, Surajit. “Technical Perspective: Relational Query Optimization-Data Management Meets Statistical Estimation”. Communications of the ACM. Vol. 52. October 2009.