A Variability Model for Query Optimizers

Michael Soffner, 09.07.2012

Page 1: A Variability Model for Query Optimizers

A Variability Model for Query Optimizers

Michael Soffner (1), Norbert Siegmund (1), Marko Rosenmüller (1), Janet Siegmund (1), Thomas Leich (2), Gunter Saake (1)

(1) University of Magdeburg, Germany
(2) METOP GmbH, Germany

Page 2: A Variability Model for Query Optimizers

Outline

• Motivation
• Variability Approach
• System Analysis
• Unified Variability Model

Page 3: A Variability Model for Query Optimizers

Motivation

• Database vendors continuously extend functionality to fit new application domains
• This leads to bloated systems with decreased performance and manageability
• Specialized systems outperform RDBMSs, e.g., in sensor networks and data warehouses (Stonebraker2005)

Driving factors for query optimizer extensions

• Conformity to the SQL standard
• New indexes, operations, statistics

Result: increased search space and reduced performance

Page 4: A Variability Model for Query Optimizers

Our Approach

• Goal: specialized query processors by introducing variability
• Select only the needed functionality and omit the rest
• Variability through Software Product Lines (SPLs)

Fig. 1: Benefits of tailored query optimizers

Page 5: A Variability Model for Query Optimizers


Software Product Lines (SPLs)

Use Features to describe a concept in a domain model

Page 6: A Variability Model for Query Optimizers

Product Derivation

[Figure: product derivation in an SPL. Domain Engineering: Feature Model, Reusable Implementation Artifacts. Application Engineering: Configuration, Program Generator, Final Product.]
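The product derivation step can be illustrated with a minimal sketch: a configuration (a set of selected features) is checked against a feature model and, if valid, mapped to the implementation artifacts that make up the final product. Everything in the block is an assumption for illustration: the `FeatureModel` class, the constraint choices (for example that the two evaluation algorithms are alternatives and that `HistogramBased` requires `Histogram`), and the artifact names are hypothetical and are not taken from the slides. Alternative groups are simplified to "exactly one member must be selected".

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FeatureModel:
    """A tiny feature model: mandatory features, alternative groups, requires-edges."""

    features: set[str]
    mandatory: set[str] = field(default_factory=set)
    # Simplification: exactly one member of each alternative group must be selected.
    alternatives: list[set[str]] = field(default_factory=list)
    # requires[a] lists the features that must be selected whenever a is selected.
    requires: dict[str, set[str]] = field(default_factory=dict)

    def violations(self, config: set[str]) -> list[str]:
        """Return readable descriptions of every constraint the configuration breaks."""
        problems = []
        for unknown in sorted(config - self.features):
            problems.append(f"unknown feature: {unknown}")
        for missing in sorted(self.mandatory - config):
            problems.append(f"missing mandatory feature: {missing}")
        for group in self.alternatives:
            chosen = sorted(group & config)
            if len(chosen) != 1:
                problems.append(f"alternative group {sorted(group)} needs exactly one, got {chosen}")
        for feature, needed in self.requires.items():
            if feature in config:
                for dep in sorted(needed - config):
                    problems.append(f"{feature} requires {dep}")
        return problems


def derive_product(model: FeatureModel, config: set[str], artifacts: dict[str, str]) -> list[str]:
    """Validate the configuration, then collect the artifacts of the selected features."""
    problems = model.violations(config)
    if problems:
        raise ValueError("; ".join(problems))
    return sorted(artifacts[f] for f in config if f in artifacts)


# Hypothetical model and artifact names, loosely inspired by feature names on the slides.
MODEL = FeatureModel(
    features={"QueryOptimizer", "RecursiveAlgorithm", "GeneticAlgorithm", "Histogram", "HistogramBased"},
    mandatory={"QueryOptimizer"},
    alternatives=[{"RecursiveAlgorithm", "GeneticAlgorithm"}],
    requires={"HistogramBased": {"Histogram"}},
)
ARTIFACTS = {
    "QueryOptimizer": "optimizer_core",
    "RecursiveAlgorithm": "recursive_search",
    "GeneticAlgorithm": "genetic_search",
    "Histogram": "histogram_stats",
    "HistogramBased": "histogram_selectivity",
}

if __name__ == "__main__":
    print(derive_product(MODEL, {"QueryOptimizer", "RecursiveAlgorithm"}, ARTIFACTS))
```

Representing a configuration as a plain set keeps the validation rules easy to read; a real product line tool would typically hand the same constraints to a solver instead.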

Page 7: A Variability Model for Query Optimizers

Overall Process

• 3 steps to a unified model: Coarse Model, System Analysis, Unification

[Figure: overall process in three panels.
Coarse Model (classification by Jarke (1984)): logical (Standardization, Simplification, Amelioration) and physical (Evaluation Algorithm, Operations, Strategy).
System Analysis (Oracle, PostgreSQL, SQLite), shown for SQLite/Optimizer. Labels: WhereClause Optimization, OrderBy Optimization, Or Optimization, Or-In-Rewriting, Between Optimization, Truncate Optimization, Like Optimization, MIN/MAX Optimization, Subquery Flattening (From-Clause); Evaluation Algorithm (Greedy, Left Deep Tree); Join (Nested Loop); AccessPaths (Table, RowID, Index, Full Table Scan, Index Range, Index Equal, Multi-OR); Cost-Based Selection, Costs, Cardinality, Selectivity (Estimated Selectivity, Histogram Based, Default Value, Analyzed Value); Statistics (Analyze, No. of Entries (Index), Histogram, Frequency).
Unification (Unified Model). Labels: Query Optimizer, Logical Optimizer, Physical Optimizer; Access Path (Table Scan, Index Scan, Sequential Scan, Full Table Scan, Sample Table Scan, TID/RowID-Scan, Full Index Scan, Fast Full Index Scan, Index Unique Scan, Index Range Scan, Index Skip Scan, Index Joins, Bitmap Index, BitmapHeap Scan, Hash Scan, Cluster Scan, Multi-OR).
Legend: mandatory, optional, optional (static, marked "s"), alternative, requires.]

Page 8: A Variability Model for Query Optimizers

Optimizer Functionality Classification (Jarke)

• Generally distinguishes logical and physical optimization

Logical

• Standardization: transformation into a standardized representation (e.g., predicate normalization)
• Simplification: elimination of redundancies (e.g., idempotency rules)
• Amelioration: generating semantically equal queries with better performance (e.g., heuristics)

Physical

• Evaluation Algorithm: general algorithm that generates the program for a given query (e.g., recursive search)
• Operations: physical implementations of logical operations (e.g., nested loop join)
• Strategy: concepts to find the best query plan (e.g., cost-based approach)
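To make the first two logical classes concrete, here is a small sketch of predicate normalization (Standardization) followed by an idempotency rule, A AND A = A (Simplification). The tuple representation of predicates and the convention that column names are strings while literals are numbers are assumptions made for this example only; they are not taken from the slides or from any of the analyzed systems.

```python
# Comparison operators and their mirror images, used when operands are swapped.
FLIP = {"<": ">", ">": "<", "<=": ">=", ">=": "<=", "=": "=", "!=": "!="}


def standardize(predicate):
    """Standardization: put the column on the left, e.g. (5, '<', 'a') -> ('a', '>', 5).

    Assumption: column names are strings and literals are numbers.
    """
    left, op, right = predicate
    if not isinstance(left, str) and isinstance(right, str):
        return (right, FLIP[op], left)
    return predicate


def simplify(conjuncts):
    """Simplification: apply A AND A = A by dropping repeated conjuncts.

    The conjuncts are standardized first so that equivalent spellings collide;
    dict.fromkeys keeps the first occurrence of each predicate in order.
    """
    return list(dict.fromkeys(standardize(p) for p in conjuncts))


if __name__ == "__main__":
    where = [("a", ">", 5), (5, "<", "a"), ("b", "=", 1)]
    print(simplify(where))
```

The example also shows why standardization usually runs before simplification: the redundancy between `a > 5` and `5 < a` only becomes visible once both are in the same form.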

Page 9: A Variability Model for Query Optimizers

SQLite

• Customizable through #ifdef compiler flags (static configuration)
• All logical optimization features are optional
• Only B-tree indexes
• Allows statistics to be omitted statically

[Figure: SQLite/Optimizer feature diagram; "s" marks statically optional features. Logical (Standardization, Simplification, Amelioration): Truncate Optimization, Between Optimization, Like Optimization, Or-In-Rewriting, Subquery Flattening (From-Clause). Physical: Operations; Strategy with Cost-Based Selection; Selectivity (Estimated Selectivity, Histogram Based, Default Value); Statistics (Analyze, No. of Entries (Index), Histogram, Frequency).]
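The static #ifdef configuration of SQLite can be observed from the outside: SQLite reports the compile-time options of the running library through `PRAGMA compile_options`, with the `SQLITE_` prefix stripped. The sketch below uses Python's standard `sqlite3` module to list them and to pick out `OMIT_*` entries, which correspond to features removed at build time (for example, a build with `SQLITE_OMIT_LIKE_OPTIMIZATION` would list `OMIT_LIKE_OPTIMIZATION`). The helper names are hypothetical, and the output depends entirely on how the local SQLite library was built; some builds report no options at all.

```python
import sqlite3


def sqlite_compile_options():
    """Return the compile-time options reported by the linked SQLite library."""
    conn = sqlite3.connect(":memory:")
    try:
        return [row[0] for row in conn.execute("PRAGMA compile_options")]
    finally:
        conn.close()


def omitted_features(options):
    """Extract the feature names of all OMIT_* options, i.e. statically removed features."""
    prefix = "OMIT_"
    return sorted(o[len(prefix):] for o in options if o.startswith(prefix))


if __name__ == "__main__":
    options = sqlite_compile_options()
    print("compile options:", options)
    print("statically omitted:", omitted_features(options))
```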

Page 10: A Variability Model for Query Optimizers

PostgreSQL

• Most logical optimization features aim to standardize the input query
• No features for special heuristics
• Includes inlining of set-returning functions
• Two evaluation algorithms: exhaustive search, genetic algorithm
• Four index types: B-tree, hash-based, and multi-dimensional indexes (GIS support)

[Figure: PostgreSQL optimizer feature diagram. Logical (Standardization, Simplification): Rewrite Rule System, Pull Up Sublinks, Inline Set-Returning Functions, Expression Preprocessing, Pull Up Subqueries, Reduce Outer Joins. Evaluation Algorithm: Recursive Near-Exhaustive Search, Genetic Query Optimizer. Cost-based: Cost (Seq Page Cost, Random Page Cost, CPU Tuple Cost, CPU Operator Cost); Statistics (No. of Table Entries, No. of Index Entries, No. of Blocks per Table, No. of Blocks per Index); Histogram (Frequency, Most Common Values).]
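The cost parameters named on this slide (Seq Page Cost, Random Page Cost, CPU Tuple Cost, CPU Operator Cost) correspond to PostgreSQL planner settings. As a rough illustration of how such parameters combine with the statistics (number of blocks and entries per table), the sketch below estimates the cost of a sequential scan in the simplified form used in PostgreSQL's EXPLAIN documentation: pages read times `seq_page_cost` plus rows scanned times `cpu_tuple_cost`, with an extra `cpu_operator_cost` per row for each filter condition. The default values are PostgreSQL's documented defaults; the class and function names are hypothetical, and the real planner accounts for considerably more than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostParams:
    """Planner cost constants; the defaults mirror PostgreSQL's documented defaults."""

    seq_page_cost: float = 1.0
    random_page_cost: float = 4.0  # not used by the sequential-scan estimate below
    cpu_tuple_cost: float = 0.01
    cpu_operator_cost: float = 0.0025


def seq_scan_cost(pages, rows, params=CostParams(), filter_ops=0):
    """Simplified sequential-scan estimate.

    pages and rows play the role of the 'No. of Blocks' and 'No. of Table Entries'
    statistics; filter_ops is the number of operators evaluated per row.
    """
    io_cost = pages * params.seq_page_cost
    cpu_cost = rows * (params.cpu_tuple_cost + filter_ops * params.cpu_operator_cost)
    return io_cost + cpu_cost


if __name__ == "__main__":
    # A table with 358 pages and 10000 rows, first unfiltered, then with one filter operator.
    print(seq_scan_cost(358, 10000))
    print(seq_scan_cost(358, 10000, filter_ops=1))
```

Raising `filter_ops` only changes the CPU term, which is why a filtered scan of the same table is estimated as slightly more expensive even though it reads the same pages.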

Page 11: A Variability Model for Query Optimizers

Oracle

• Special features: predicate pushing, rewriting with materialized views
• Most access paths of the three systems
• Configuration through hints

[Figure: Oracle optimizer feature diagram. Logical Optimization (Standardization, Amelioration): Subquery Unnesting (From-Clause, Where-Clause), View Merging, Predicate Pushing, Rewrite with Materialized Views. Access Paths: Table Scan, Index Scan, Sequential Scan, Full Table Scan, Sample Table Scan, RowID-Scan, Hash Scan, Cluster Scan, Full Index Scan, Fast Full Index Scan, Index Unique Scan, Index Range Scan, Index Skip Scan, Index Joins, Bitmap Index. Estimation: Selectivity, Cardinality, Cost (No. of Disk I/O, Amount of CPU Usage, Amount of Memory Usage), Internal Default Values; Statistics (No. of Rows, No. of Distinct Values, Dynamic Sampling, Histogram: Height Balanced, Frequency).]

Page 12: A Variability Model for Query Optimizers

Variability Model: Unification Process

• Goal: system-independent variability model
• Identification of features that implement the same functionality

1. Integration

• A1: Same functionality but different names
• A2: Same names but different functionality
• Only semantic descriptions allow a decision
• Basis: documentation and source code
• Example: Nested Loop

          SQLite                PostgreSQL
Name      Nested Loop           Nestpath
Source    Source-code comment   typedef definition and cost calculation algorithm

Page 13: A Variability Model for Query Optimizers

Variability Model: Unification Process 2

2. Unification

• 1:1 mapping (maps one system-dependent feature to one unified feature)
• 1:n mapping (composes multiple system-dependent features into one unified feature)

Feature               SQLite               PostgreSQL            Oracle
No. of Table Entries  N/A                  No. of Table Entries  No. of Rows
Pull Up Subqueries    Subquery Flattening  Pull Up Subqueries    Subquery Unnesting
                      From Clause          N/A                   From Clause
                      N/A                  N/A                   Where Clause
                      N/A                  Pull Up Sublinks      N/A
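The mapping table above can be captured as a small lookup structure. The sketch below encodes the two unified features from the table and classifies each system's contribution as a 1:1 mapping, a 1:n mapping, or N/A. It assumes that the rows without a feature name belong to Pull Up Subqueries, which is one reading of the flattened table; the dictionary layout and the function names are hypothetical.

```python
# unified feature -> system -> system-dependent features composed into it
UNIFIED = {
    "No. of Table Entries": {
        "PostgreSQL": ["No. of Table Entries"],
        "Oracle": ["No. of Rows"],
    },
    "Pull Up Subqueries": {
        "SQLite": ["Subquery Flattening", "From Clause"],
        "PostgreSQL": ["Pull Up Subqueries", "Pull Up Sublinks"],
        "Oracle": ["Subquery Unnesting", "From Clause", "Where Clause"],
    },
}


def mapping_kind(unified_feature, system):
    """Classify how a system maps onto a unified feature: '1:1', '1:n', or 'N/A'."""
    sources = UNIFIED.get(unified_feature, {}).get(system, [])
    if not sources:
        return "N/A"
    return "1:1" if len(sources) == 1 else "1:n"


def unified_name(system, feature):
    """Resolve a system-dependent feature name to its unified feature, or None."""
    for unified_feature, per_system in UNIFIED.items():
        if feature in per_system.get(system, []):
            return unified_feature
    return None


if __name__ == "__main__":
    print(unified_name("Oracle", "No. of Rows"))
    print(mapping_kind("Pull Up Subqueries", "Oracle"))
    print(mapping_kind("No. of Table Entries", "SQLite"))
```

Keeping the lookup keyed by system handles conflict A1 (same functionality, different names) directly; conflict A2 (same name, different functionality) still needs the semantic check described in the integration step before an entry is added.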

Page 14: A Variability Model for Query Optimizers

Variability Model

[Figure: unified feature diagram of the query optimizer. Labels by group:
Logical Optimizer: Standardization, Simplification, Amelioration; View Merging, Inline Set-Returning Functions, Expression Preprocessing, Pull Up Subqueries, Reduce Outer Joins, Predicate Pushing, Rewrite with Materialized Views, Between Optimization, Truncate Optimization, Like Optimization.
Evaluation Algorithms: Recursive Algorithm (Left Deep, Bushy, Right Deep), Genetic Algorithm (Left Deep).
Operations: Join (Nested Loop, Hash, Merge); Access Path (Table Scan, Index Scan, Sequential Scan, Full Table Scan, Sample Table Scan, TID/RowID-Scan, Full Index Scan, Index Unique Scan).
Strategy: Cost-based Selection; Selectivity (Estimated Selectivity, Statistic Based, Histogram Based, Default Values); Cardinality (Default Values); Cost (CPU Usage, Memory Usage, Disk I/O).
Statistics: Analyze, Dynamic Sampling, No. of Entries, No. of Blocks, No. of Distinct Values, Histogram (Frequency, Height Balanced), Most Common Values, Most Common Frequencies.]

Page 15: A Variability Model for Query Optimizers

Conclusion

• Provides a basis for implementing configurable query optimizers
• Unified semantic description of query optimizer functionality (taxonomy/ontology)
• Provides a foundation for (semi-)automatic configuration of query optimizers based on application requirements
• Provides a basis for modeling dependencies between query optimizers and deeper layers of DBMSs, e.g., the storage engine

Page 16: A Variability Model for Query Optimizers

References

[Stonebraker2005] M. Stonebraker and U. Cetintemel. One Size Fits All: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering (ICDE), pages 2-11, 2005.

[Jarke84] M. Jarke and J. Koch. Query Optimization in Database Systems. ACM Computing Surveys (CSUR), 16:111-152, June 1984. ACM ID: 356928.

Page 17: A Variability Model for Query Optimizers


Thanks for your attention!