Ch08 Query Processing 1 Overview QP

8/3/2019 Ch08 Query Processing 1 Overview QP

1/26

Query Processing 1

Fakultas IF ITTelkom

2010


2/26

Steps in Query Processing

1. Parsing and translation

.

3. Evaluation


3/26



4/26


Parsing and translation

Translate the query into its internal form.

Translation is similar to the work performed by the

Parser checks syntax, verifies relations

Parse tree representation

This is then translated into RA expression


5/26


Query Execution Plan In SQL, a query can be expressed is several ways

Each SQL query can itself be translated into RA expression in many ways

An RA expression only partially tells you how to evaluate a query

Several ways to evaluate RA expression Annotate RA expression with instructions specifying how to evaluate each

operation

Annotation may state the algorithm to be used for a specific operation or the

particular index to use Annotated RAE is called an evaluation primitive

Sequence of primitive operations is a QEP


6/26


Evaluation

The query-execution engine takes a query-evaluation plan,

executes that plan, and returns the answers to the query.


7/26


Example

select balance

from account

where balance < 2500

balance


8/26


Query Execution Plan


9/26

Query Optimization

Different QEPs for a given query can have

different costs

Users not expected to write their queries in a way

a sugges s e mos e c en It is the systems responsibility to construct a QEP

that minimizes the cost

This is Query Optimization


10/26

Query Optimization Query Optimization: Amongst all equivalent evaluation plans choose

the one with lowest cost. Cost is estimated using statistical information from the

database catalog

e.g. number of tuples in each relation, size of tuples, etc.

First we study How to measure query costs

Algorithms for evaluating relational algebra operations

How to combine algorithms for individual operations in order to evaluate a

complete expression Next

We study how to optimize queries, that is, how to find an evaluation plan

with lowest estimated cost


11/26


12/26

Query Optimization

Ideally: Want to find best plan.Practically: Avoid worst plans!

We will study the System R approach


13/26

What and Why Query Processor ??

Query Optimization ??

Query Optimization strongly relies on File

Organization techniques


14/26

Highlights of System R Optimizer

Impact:

Most widely used currently; works well for < 10 joins.

Cost estimation: Approximate art at best.

Statistics, maintaine in system cata ogs, use toestimate cost of operations and result sizes.

Considers combination of CPU and I/O costs.

Plan Space: Too large, must be pruned.


15/26

Query Execution

Query compiler

Execution engine

Index/record mgr.

User/Application

Query

update

Query execution

plan

Record, index

requests

Buffer manager

Storage manager

storage

Pagecommands

Read/write

pages


16/26

Query Execution Plans

SELECT P.buyerFROM Purchase P, Person QWHERE P.buyer=Q.name AND

Q.cit =seattle AND

City=seattle phone>5430000

buyer

Q.phone > 5430000

Query Plan: logical tree

implementation choice at every node scheduling of operations.

Purchase Person

Buyer=name (Simple Nested Loops)

(Table scan) (Index scan)


17/26

The Leaves of the Plan: Scans Table scan: iterate through the records of the

relation. Index scan: go to the index, from there get the

better?)

Sorted scan: produce the relation in order.

Implementation depends on relation size.


18/26

Cost-based Query Sub-System

Query Parser

Usually there is aheuristics-basedrewriting step beforethe cost-based steps.

Select *

From Blah B

Where B.blah = blah

Queries

Query Optimizer

Plan

Generator

Plan Cost

Estimator

Query Executor

Catalog Manager

Schema Statistics


19/26

Evaluating Expressions

Materialization

Pipelining


20/26

Materialization Materialized evaluation: evaluate one operation at a time, starting at the

lowest-level. Use intermediate results materialized into temporary

relations to evaluate next-level operations.

E.g., in figure below, compute and store

then compute the store its join with customer, and finally compute the

projections on customer-name.

)(2500 accountbalance


21/26

Materialization (Cont.)

Materialized evaluation is always applicable

Cost of writing results to disk and reading them back can bequite high

Our cost formulas for operations ignore cost of writing results to disk,

so

Overall cost=Sum of costs of individual operations +cost of writing intermediate results to disk

Double buffering: use two output buffers for each operation,

when one is full write it to disk while the other is getting filled

Allows overlap of disk writes with computation and reduces execution

time


22/26

Pipelining

Several operations are grouped together into a pipeline,

in which each of the operations start working on its

tuples even as they are being generated by anotheroperation

Results of one operation are passed along to the next

operation in the pipeline Pipelined evaluation

Reduces the number of temporary relations that are

produced

Eliminates the cost of reading and writing temporary

relations


23/26

Pipelining

Pipelined evaluation : evaluate several operations simultaneously, passing the

results of one operation on to the next.

E.g., in previous expression tree, dont store result of

instead, pass tuples directly to the join.. Similarly, dont store result of join, pass tuples

directly to projection.

)(2500 accountbalance


24/26

Pipelining (Cont.)

In demand driven or lazy evaluation

system repeatedly requests next tuple from top level operation

Each operation requests next tuple from children operations as required,

in order to output its next tuple In between calls, operation has to maintain state so it knows what to

return next

In producer-driven or eager pipelining

Operators produce tuples eagerly and pass them up to their parents

Buffer maintained between operators, child puts tuples in buffer,

parent removes tuples from buffer

if buffer is full, child waits till there is space in the buffer, and then

generates more tuples

System schedules operations that have space in output buffer and canprocess more input tuples

Alternative name: pull and push models of pipelining


25/26

Pipelining (Cont.) Implementation of demand-driven pipelining

Each operation is implemented as an iteratorimplementing the following operations

open()

E.g. file scan: initialize file scan

state: pointer to beginning of file

E.g.merge join: sort relations;

state: pointers to beginning of sorted relations

next()

E.g. for file scan: Output next tuple, and advance and store filepointer

E.g. for merge join: continue with merge from earlier state tillnext output tuple is found. Save pointers as iterator state.

close()


26/26

Evaluation Algorithms for Pipelining Some algorithms are not able to output results even as they get input tuples

E.g. merge join, or hash join

intermediate results written to disk and then read back Algorithm variants to generate (at least some) results on the fly, as input tuples

are read in

E.g. hybrid hash join generates output tuples even as probe relation tuples

in the in-memory partition (partition 0) are read in

Pipelined join technique: Hybrid hash join, modified to buffer partition 0

tuples of both relations in-memory, reading them as they become available,

and output results of any matches between partition 0 tuples

When a new r0 tuple is found, match it with existing s0 tuples, output

matches, and save it in r0

Symmetrically for s0 tuples

Ch08 Query Processing 1 Overview QP

Documents

Transcript of Ch08 Query Processing 1 Overview QP