Ch08 Query Processing 1 Overview QP
-
Upload
prima-winangun -
Category
Documents
-
view
220 -
download
0
Transcript of Ch08 Query Processing 1 Overview QP
-
8/3/2019 Ch08 Query Processing 1 Overview QP
1/26
Query Processing 1
Fakultas IF ITTelkom
2010
-
8/3/2019 Ch08 Query Processing 1 Overview QP
2/26
Steps in Query Processing
1. Parsing and translation
.
3. Evaluation
-
8/3/2019 Ch08 Query Processing 1 Overview QP
3/26
Steps in Query Processing
-
8/3/2019 Ch08 Query Processing 1 Overview QP
4/26
Steps in Query Processing
Parsing and translation
Translate the query into its internal form.
Translation is similar to the work performed by the
Parser checks syntax, verifies relations
Parse tree representation
This is then translated into RA expression
-
8/3/2019 Ch08 Query Processing 1 Overview QP
5/26
Steps in Query Processing
Query Execution Plan In SQL, a query can be expressed is several ways
Each SQL query can itself be translated into RA expression in many ways
An RA expression only partially tells you how to evaluate a query
Several ways to evaluate RA expression Annotate RA expression with instructions specifying how to evaluate each
operation
Annotation may state the algorithm to be used for a specific operation or the
particular index to use Annotated RAE is called an evaluation primitive
Sequence of primitive operations is a QEP
-
8/3/2019 Ch08 Query Processing 1 Overview QP
6/26
Steps in Query Processing
Evaluation
The query-execution engine takes a query-evaluation plan,
executes that plan, and returns the answers to the query.
-
8/3/2019 Ch08 Query Processing 1 Overview QP
7/26
Steps in Query Processing
Example
select balance
from account
where balance < 2500
balance
-
8/3/2019 Ch08 Query Processing 1 Overview QP
8/26
Steps in Query Processing
Query Execution Plan
-
8/3/2019 Ch08 Query Processing 1 Overview QP
9/26
Query Optimization
Different QEPs for a given query can have
different costs
Users not expected to write their queries in a way
a sugges s e mos e c en It is the systems responsibility to construct a QEP
that minimizes the cost
This is Query Optimization
-
8/3/2019 Ch08 Query Processing 1 Overview QP
10/26
Query Optimization Query Optimization: Amongst all equivalent evaluation plans choose
the one with lowest cost. Cost is estimated using statistical information from the
database catalog
e.g. number of tuples in each relation, size of tuples, etc.
First we study How to measure query costs
Algorithms for evaluating relational algebra operations
How to combine algorithms for individual operations in order to evaluate a
complete expression Next
We study how to optimize queries, that is, how to find an evaluation plan
with lowest estimated cost
-
8/3/2019 Ch08 Query Processing 1 Overview QP
11/26
-
8/3/2019 Ch08 Query Processing 1 Overview QP
12/26
Query Optimization
Ideally: Want to find best plan.Practically: Avoid worst plans!
We will study the System R approach
-
8/3/2019 Ch08 Query Processing 1 Overview QP
13/26
What and Why Query Processor ??
Query Optimization ??
Query Optimization strongly relies on File
Organization techniques
-
8/3/2019 Ch08 Query Processing 1 Overview QP
14/26
Highlights of System R Optimizer
Impact:
Most widely used currently; works well for < 10 joins.
Cost estimation: Approximate art at best.
Statistics, maintaine in system cata ogs, use toestimate cost of operations and result sizes.
Considers combination of CPU and I/O costs.
Plan Space: Too large, must be pruned.
-
8/3/2019 Ch08 Query Processing 1 Overview QP
15/26
Query Execution
Query compiler
Execution engine
Index/record mgr.
User/Application
Query
update
Query execution
plan
Record, index
requests
Buffer manager
Storage manager
storage
Pagecommands
Read/write
pages
-
8/3/2019 Ch08 Query Processing 1 Overview QP
16/26
Query Execution Plans
SELECT P.buyerFROM Purchase P, Person QWHERE P.buyer=Q.name AND
Q.cit =seattle AND
City=seattle phone>5430000
buyer
Q.phone > 5430000
Query Plan: logical tree
implementation choice at every node scheduling of operations.
Purchase Person
Buyer=name (Simple Nested Loops)
(Table scan) (Index scan)
-
8/3/2019 Ch08 Query Processing 1 Overview QP
17/26
The Leaves of the Plan: Scans Table scan: iterate through the records of the
relation. Index scan: go to the index, from there get the
better?)
Sorted scan: produce the relation in order.
Implementation depends on relation size.
-
8/3/2019 Ch08 Query Processing 1 Overview QP
18/26
Cost-based Query Sub-System
Query Parser
Usually there is aheuristics-basedrewriting step beforethe cost-based steps.
Select *
From Blah B
Where B.blah = blah
Queries
Query Optimizer
Plan
Generator
Plan Cost
Estimator
Query Executor
Catalog Manager
Schema Statistics
-
8/3/2019 Ch08 Query Processing 1 Overview QP
19/26
Evaluating Expressions
Materialization
Pipelining
-
8/3/2019 Ch08 Query Processing 1 Overview QP
20/26
Materialization Materialized evaluation: evaluate one operation at a time, starting at the
lowest-level. Use intermediate results materialized into temporary
relations to evaluate next-level operations.
E.g., in figure below, compute and store
then compute the store its join with customer, and finally compute the
projections on customer-name.
)(2500 accountbalance
-
8/3/2019 Ch08 Query Processing 1 Overview QP
21/26
Materialization (Cont.)
Materialized evaluation is always applicable
Cost of writing results to disk and reading them back can bequite high
Our cost formulas for operations ignore cost of writing results to disk,
so
Overall cost=Sum of costs of individual operations +cost of writing intermediate results to disk
Double buffering: use two output buffers for each operation,
when one is full write it to disk while the other is getting filled
Allows overlap of disk writes with computation and reduces execution
time
-
8/3/2019 Ch08 Query Processing 1 Overview QP
22/26
Pipelining
Several operations are grouped together into a pipeline,
in which each of the operations start working on its
tuples even as they are being generated by anotheroperation
Results of one operation are passed along to the next
operation in the pipeline Pipelined evaluation
Reduces the number of temporary relations that are
produced
Eliminates the cost of reading and writing temporary
relations
-
8/3/2019 Ch08 Query Processing 1 Overview QP
23/26
Pipelining
Pipelined evaluation : evaluate several operations simultaneously, passing the
results of one operation on to the next.
E.g., in previous expression tree, dont store result of
instead, pass tuples directly to the join.. Similarly, dont store result of join, pass tuples
directly to projection.
)(2500 accountbalance
-
8/3/2019 Ch08 Query Processing 1 Overview QP
24/26
Pipelining (Cont.)
In demand driven or lazy evaluation
system repeatedly requests next tuple from top level operation
Each operation requests next tuple from children operations as required,
in order to output its next tuple In between calls, operation has to maintain state so it knows what to
return next
In producer-driven or eager pipelining
Operators produce tuples eagerly and pass them up to their parents
Buffer maintained between operators, child puts tuples in buffer,
parent removes tuples from buffer
if buffer is full, child waits till there is space in the buffer, and then
generates more tuples
System schedules operations that have space in output buffer and canprocess more input tuples
Alternative name: pull and push models of pipelining
-
8/3/2019 Ch08 Query Processing 1 Overview QP
25/26
Pipelining (Cont.) Implementation of demand-driven pipelining
Each operation is implemented as an iteratorimplementing the following operations
open()
E.g. file scan: initialize file scan
state: pointer to beginning of file
E.g.merge join: sort relations;
state: pointers to beginning of sorted relations
next()
E.g. for file scan: Output next tuple, and advance and store filepointer
E.g. for merge join: continue with merge from earlier state tillnext output tuple is found. Save pointers as iterator state.
close()
-
8/3/2019 Ch08 Query Processing 1 Overview QP
26/26
Evaluation Algorithms for Pipelining Some algorithms are not able to output results even as they get input tuples
E.g. merge join, or hash join
intermediate results written to disk and then read back Algorithm variants to generate (at least some) results on the fly, as input tuples
are read in
E.g. hybrid hash join generates output tuples even as probe relation tuples
in the in-memory partition (partition 0) are read in
Pipelined join technique: Hybrid hash join, modified to buffer partition 0
tuples of both relations in-memory, reading them as they become available,
and output results of any matches between partition 0 tuples
When a new r0 tuple is found, match it with existing s0 tuples, output
matches, and save it in r0
Symmetrically for s0 tuples