Ch08 Query Processing 1 Overview QP

download Ch08 Query Processing 1 Overview QP

of 26

Transcript of Ch08 Query Processing 1 Overview QP

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    1/26

    Query Processing 1

    Fakultas IF ITTelkom

    2010

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    2/26

    Steps in Query Processing

    1. Parsing and translation

    .

    3. Evaluation

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    3/26

    Steps in Query Processing

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    4/26

    Steps in Query Processing

    Parsing and translation

    Translate the query into its internal form.

    Translation is similar to the work performed by the

    Parser checks syntax, verifies relations

    Parse tree representation

    This is then translated into RA expression

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    5/26

    Steps in Query Processing

    Query Execution Plan In SQL, a query can be expressed is several ways

    Each SQL query can itself be translated into RA expression in many ways

    An RA expression only partially tells you how to evaluate a query

    Several ways to evaluate RA expression Annotate RA expression with instructions specifying how to evaluate each

    operation

    Annotation may state the algorithm to be used for a specific operation or the

    particular index to use Annotated RAE is called an evaluation primitive

    Sequence of primitive operations is a QEP

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    6/26

    Steps in Query Processing

    Evaluation

    The query-execution engine takes a query-evaluation plan,

    executes that plan, and returns the answers to the query.

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    7/26

    Steps in Query Processing

    Example

    select balance

    from account

    where balance < 2500

    balance

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    8/26

    Steps in Query Processing

    Query Execution Plan

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    9/26

    Query Optimization

    Different QEPs for a given query can have

    different costs

    Users not expected to write their queries in a way

    a sugges s e mos e c en It is the systems responsibility to construct a QEP

    that minimizes the cost

    This is Query Optimization

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    10/26

    Query Optimization Query Optimization: Amongst all equivalent evaluation plans choose

    the one with lowest cost. Cost is estimated using statistical information from the

    database catalog

    e.g. number of tuples in each relation, size of tuples, etc.

    First we study How to measure query costs

    Algorithms for evaluating relational algebra operations

    How to combine algorithms for individual operations in order to evaluate a

    complete expression Next

    We study how to optimize queries, that is, how to find an evaluation plan

    with lowest estimated cost

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    11/26

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    12/26

    Query Optimization

    Ideally: Want to find best plan.Practically: Avoid worst plans!

    We will study the System R approach

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    13/26

    What and Why Query Processor ??

    Query Optimization ??

    Query Optimization strongly relies on File

    Organization techniques

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    14/26

    Highlights of System R Optimizer

    Impact:

    Most widely used currently; works well for < 10 joins.

    Cost estimation: Approximate art at best.

    Statistics, maintaine in system cata ogs, use toestimate cost of operations and result sizes.

    Considers combination of CPU and I/O costs.

    Plan Space: Too large, must be pruned.

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    15/26

    Query Execution

    Query compiler

    Execution engine

    Index/record mgr.

    User/Application

    Query

    update

    Query execution

    plan

    Record, index

    requests

    Buffer manager

    Storage manager

    storage

    Pagecommands

    Read/write

    pages

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    16/26

    Query Execution Plans

    SELECT P.buyerFROM Purchase P, Person QWHERE P.buyer=Q.name AND

    Q.cit =seattle AND

    City=seattle phone>5430000

    buyer

    Q.phone > 5430000

    Query Plan: logical tree

    implementation choice at every node scheduling of operations.

    Purchase Person

    Buyer=name (Simple Nested Loops)

    (Table scan) (Index scan)

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    17/26

    The Leaves of the Plan: Scans Table scan: iterate through the records of the

    relation. Index scan: go to the index, from there get the

    better?)

    Sorted scan: produce the relation in order.

    Implementation depends on relation size.

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    18/26

    Cost-based Query Sub-System

    Query Parser

    Usually there is aheuristics-basedrewriting step beforethe cost-based steps.

    Select *

    From Blah B

    Where B.blah = blah

    Queries

    Query Optimizer

    Plan

    Generator

    Plan Cost

    Estimator

    Query Executor

    Catalog Manager

    Schema Statistics

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    19/26

    Evaluating Expressions

    Materialization

    Pipelining

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    20/26

    Materialization Materialized evaluation: evaluate one operation at a time, starting at the

    lowest-level. Use intermediate results materialized into temporary

    relations to evaluate next-level operations.

    E.g., in figure below, compute and store

    then compute the store its join with customer, and finally compute the

    projections on customer-name.

    )(2500 accountbalance

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    21/26

    Materialization (Cont.)

    Materialized evaluation is always applicable

    Cost of writing results to disk and reading them back can bequite high

    Our cost formulas for operations ignore cost of writing results to disk,

    so

    Overall cost=Sum of costs of individual operations +cost of writing intermediate results to disk

    Double buffering: use two output buffers for each operation,

    when one is full write it to disk while the other is getting filled

    Allows overlap of disk writes with computation and reduces execution

    time

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    22/26

    Pipelining

    Several operations are grouped together into a pipeline,

    in which each of the operations start working on its

    tuples even as they are being generated by anotheroperation

    Results of one operation are passed along to the next

    operation in the pipeline Pipelined evaluation

    Reduces the number of temporary relations that are

    produced

    Eliminates the cost of reading and writing temporary

    relations

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    23/26

    Pipelining

    Pipelined evaluation : evaluate several operations simultaneously, passing the

    results of one operation on to the next.

    E.g., in previous expression tree, dont store result of

    instead, pass tuples directly to the join.. Similarly, dont store result of join, pass tuples

    directly to projection.

    )(2500 accountbalance

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    24/26

    Pipelining (Cont.)

    In demand driven or lazy evaluation

    system repeatedly requests next tuple from top level operation

    Each operation requests next tuple from children operations as required,

    in order to output its next tuple In between calls, operation has to maintain state so it knows what to

    return next

    In producer-driven or eager pipelining

    Operators produce tuples eagerly and pass them up to their parents

    Buffer maintained between operators, child puts tuples in buffer,

    parent removes tuples from buffer

    if buffer is full, child waits till there is space in the buffer, and then

    generates more tuples

    System schedules operations that have space in output buffer and canprocess more input tuples

    Alternative name: pull and push models of pipelining

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    25/26

    Pipelining (Cont.) Implementation of demand-driven pipelining

    Each operation is implemented as an iteratorimplementing the following operations

    open()

    E.g. file scan: initialize file scan

    state: pointer to beginning of file

    E.g.merge join: sort relations;

    state: pointers to beginning of sorted relations

    next()

    E.g. for file scan: Output next tuple, and advance and store filepointer

    E.g. for merge join: continue with merge from earlier state tillnext output tuple is found. Save pointers as iterator state.

    close()

  • 8/3/2019 Ch08 Query Processing 1 Overview QP

    26/26

    Evaluation Algorithms for Pipelining Some algorithms are not able to output results even as they get input tuples

    E.g. merge join, or hash join

    intermediate results written to disk and then read back Algorithm variants to generate (at least some) results on the fly, as input tuples

    are read in

    E.g. hybrid hash join generates output tuples even as probe relation tuples

    in the in-memory partition (partition 0) are read in

    Pipelined join technique: Hybrid hash join, modified to buffer partition 0

    tuples of both relations in-memory, reading them as they become available,

    and output results of any matches between partition 0 tuples

    When a new r0 tuple is found, match it with existing s0 tuples, output

    matches, and save it in r0

    Symmetrically for s0 tuples