17 Stream Intro

download 17 Stream Intro

of 71

Transcript of 17 Stream Intro

  • 8/2/2019 17 Stream Intro

    1/71

    An Introduction To Data Stream Query Processing

    Neil Conway

    Amalgamated Insight, Inc.

    May 24, 2007

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 1 / 45

    http://find/
  • 8/2/2019 17 Stream Intro

    2/71

    Outline

    1 The Need For Data Stream Processing

    2 Stream Query Languages

    3 Query Processing Techniques For StreamsSystem ArchitectureShared EvaluationAdaptive Tuple RoutingOverload Handling

    4 Current Choices For A DSMSOpen Source

    Proprietary

    5 Demo

    6 Q & A

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 2 / 45

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    3/71

    Outline

    1 The Need For Data Stream Processing

    2 Stream Query Languages

    3 Query Processing Techniques For StreamsSystem ArchitectureShared EvaluationAdaptive Tuple RoutingOverload Handling

    4 Current Choices For A DSMSOpen Source

    Proprietary

    5 Demo

    6 Q & A

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 3 / 45

    http://find/
  • 8/2/2019 17 Stream Intro

    4/71

    The Need For Data Stream Processing

    Whats wrong with database systems?

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 4 / 45

    http://find/
  • 8/2/2019 17 Stream Intro

    5/71

    The Need For Data Stream Processing

    Whats wrong with database systems?

    Nothing, but they arent the right solutionto every problem

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 4 / 45

    http://find/
  • 8/2/2019 17 Stream Intro

    6/71

    The Need For Data Stream Processing

    Whats wrong with database systems?

    Nothing, but they arent the right solutionto every problem

    What are some problems for which

    a traditional DBMS is an awkward fit?

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 4 / 45

    http://find/
  • 8/2/2019 17 Stream Intro

    7/71

    Financial Analysis

    Electronic trading is now commonplaceTrading volume continues to increase rapidly

    Algorithmic trading: detect advantageous market conditions,automatically execute trades

    Latency is key

    VisualizationA hard problem in itself

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 5 / 45

    http://find/
  • 8/2/2019 17 Stream Intro

    8/71

    Financial Analysis

    Electronic trading is now commonplaceTrading volume continues to increase rapidly

    Algorithmic trading: detect advantageous market conditions,automatically execute trades

    Latency is key

    VisualizationA hard problem in itself

    Typical Queries

    5-minute rolling average, volume-waited average price (VWAP)Comparison between sector averages and portfolio averages over time

    Implement models provided by quantitive analysis

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 5 / 45

    http://find/
  • 8/2/2019 17 Stream Intro

    9/71

    Network Monitoring

    Network volume continues to increase rapidly

    Custom solutions are possible, but roll-your-own is expensive

    Ad-hoc queries would be nice

    Can we build generic infrastructure for these kinds of monitoringapplications?

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 6 / 45

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    10/71

    Sensor Networks

    Pervasive Sensors

    As the cost of micro sensors continues to decline over the next decade,we could see a world in which everything of material significance getssensor-tagged. Mike Stonebraker

    Military applications: real-time command and control

    Healthcare

    Habitat monitoring

    Manufacturing

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 7 / 45

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    11/71

    Other Examples

    Real-Time Decision SupportTurnaround-time for traditional data warehouses is often too slow

    Business Activity Monitoring (BAM)

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 8 / 45

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    12/71

    Other Examples

    Real-Time Decision SupportTurnaround-time for traditional data warehouses is often too slow

    Business Activity Monitoring (BAM)

    Fraud DetectionSophisticated, cross-channel fraud

    Real-time

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 8 / 45

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    13/71

    Other Examples

    Real-Time Decision SupportTurnaround-time for traditional data warehouses is often too slow

    Business Activity Monitoring (BAM)

    Fraud DetectionSophisticated, cross-channel fraud

    Real-time

    Online Gaming

    Detect malicious behavior

    Monitor quality of service

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 8 / 45

    http://find/
  • 8/2/2019 17 Stream Intro

    14/71

    Data Stream Management Systems

    Database SystemsMostly static data, ad-hoc one-time queries

    Fire the queries at the data, return result sets

    Store and query

    Focus: concurrent reads & writes, efficient use of I/O, maximizetransaction throughput, transactional consistency, historical analysis

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 9 / 45

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    15/71

    Data Stream Management Systems

    Database SystemsMostly static data, ad-hoc one-time queries

    Fire the queries at the data, return result sets

    Store and query

    Focus: concurrent reads & writes, efficient use of I/O, maximizetransaction throughput, transactional consistency, historical analysis

    Data Stream Systems

    Mostly transient data, continuous queriesFire the data at the queries, incrementally update result streams

    Data rates often exceed disk throughput

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 9 / 45

    ( )

    http://find/
  • 8/2/2019 17 Stream Intro

    16/71

    Complex Event Processing (CEP)

    Data stream processing emerged from the database community

    Early 90s: active databases with triggers

    Complex Event Processing is another approach to the same problemsDifferent nomenclature and backgroundOften similar in practice

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 10 / 45

    O li

    http://find/
  • 8/2/2019 17 Stream Intro

    17/71

    Outline

    1 The Need For Data Stream Processing

    2 Stream Query Languages

    3 Query Processing Techniques For StreamsSystem ArchitectureShared Evaluation

    Adaptive Tuple RoutingOverload Handling

    4 Current Choices For A DSMSOpen Source

    Proprietary

    5 Demo

    6 Q & A

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 11 / 45

    D S

    http://goforward/http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    18/71

    Data Streams

    A stream is an infinite sequence of tuple, timestamp pairs

    Append-onlyNew type of database object

    The timestamp defines a total order over the tuples in a streamIn practice: require that stream tuples have a special CQTIME column

    Different approaches to building stream processing systems

    This talk: relation-oriented DSMS. Specifically, TelegraphCQ,AmInsight, StreamBase, . . .

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 12 / 45

    CREATE STREAM

    http://find/
  • 8/2/2019 17 Stream Intro

    19/71

    CREATE STREAM

    Exactly 1 column must have a CQTIME constraint

    CQTIME can be system-generated or user-provided

    With user-provided timestamps, system must cope with out-of-ordertuples

    Slack specifies maximum out-of-orderness

    Example Query

    CREATE STREAM trades (

    symbol varchar(5),

    price real,

    volume integer,

    tstamp timestamp CQTIME USER GENERATED SLACK 1 minute

    ) TYPE UNARCHIVED;

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 13 / 45

    T f St

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    20/71

    Types of Streams

    Raw StreamsStream tuples are injected into the system by an external data source

    E.g. stock tickers, sensor data, network interface, . . .

    Both push and pull models have been explored

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 14 / 45

    T f St

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    21/71

    Types of Streams

    Raw StreamsStream tuples are injected into the system by an external data source

    E.g. stock tickers, sensor data, network interface, . . .

    Both push and pull models have been explored

    Derived Streams

    Defined by a query expression that yields a stream

    Archived Streams

    Allows historical and real-time stream content to be combinedin a single database object

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 14 / 45

    Language Design Philosophy

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    22/71

    Language Design Philosophy

    Pragmatism: relational query languages are well-establishedRelational query evaluation techniques are well-understoodEveryone knows SQL

    Therefore, add stream-oriented extensions to SQL

    Pioneering work: CQL from Stanford STREAM project

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 15 / 45

    Language Design Philosophy

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    23/71

    Language Design Philosophy

    Pragmatism: relational query languages are well-establishedRelational query evaluation techniques are well-understoodEveryone knows SQL

    Therefore, add stream-oriented extensions to SQL

    Pioneering work: CQL from Stanford STREAM project

    Kinds Of Operators

    Relation Relation: Plain Old SQL

    Stream Relation: Periodically produce a relation from a stream

    Relation Stream: Produce stream from changes to a relation

    Note that S S operators are not provided.

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 15 / 45

    Continuous Queries

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    24/71

    Continuous Queries

    Fundamental Difference

    The result of a continuous query is an unbounded stream, not a finiterelation

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 16 / 45

    Continuous Queries

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    25/71

    Continuous Queries

    Fundamental Difference

    The result of a continuous query is an unbounded stream, not a finiterelation

    Typical Query

    1 Split infinite stream into pieces via windowsS R

    2 Compute analysis for the current window, comparison with priorwindows or historical data

    R R

    3 Convert result of analysis into result stream

    R SOften implicit (use defaults)

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 16 / 45

    Stream Relation Operators: Windows

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    26/71

    Stream Relation Operators: Windows

    Streams are infinite: at any given time, examine a finite sub-setApply window operator to stream to periodically produce

    visible sets of tuples

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 17 / 45

    Stream Relation Operators: Windows

    http://find/
  • 8/2/2019 17 Stream Intro

    27/71

    Stream Relation Operators: Windows

    Streams are infinite: at any given time, examine a finite sub-setApply window operator to stream to periodically produce

    visible sets of tuples

    Properties of Sliding Windows

    Range: Width of the window. Units: rows or time.

    Slide: How often to emit new visible sets. Units: rows or time.

    Start: When to start emitting results.

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 17 / 45

    Example Query

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    28/71

    Example Query

    Description

    Every second, return the total volume of trades in the previous second.

    Query

    SELECT sum(volume) AS volume,

    advance_agg(qtime) AS windowtime

    FROM trades < VISIBLE 1 second ADVANCE 1 second >

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 18 / 45

    Another Example

    http://find/
  • 8/2/2019 17 Stream Intro

    29/71

    Another Example

    Description

    Every 5 seconds, return the volume-adjusted price of MSFT for the last 1minute of trades.

    QuerySELECT sum(price * volume) / sum(volume) AS vwap,

    sum(volume) AS volume,

    advance_agg(qtime) AS windowtime

    FROM trades < VISIBLE 1 minute ADVANCE 5 seconds >

    WHERE symbol = MSFT

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 19 / 45

    More About Windows

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    30/71

    More About Windows

    AggregationUseful aggregate: advance agg(CQTIME)

    Timestamp that marks the end of the current window

    Similar aggregates for beginning of window, middle of window

    might also be useful

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 20 / 45

    More About Windows

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    31/71

    More About Windows

    AggregationUseful aggregate: advance agg(CQTIME)

    Timestamp that marks the end of the current window

    Similar aggregates for beginning of window, middle of window

    might also be useful

    Other Window Types

    Landmark: Fixed left edge, elastic right edge. Periodically reset.(All stock trades after 9AM today.)

    Partitioned: Divide stream into sub-streams based on partitioning key(s),then apply another S R operator to the sub-streams.

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 20 / 45

    Relation Stream Operators

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    32/71

    p

    Types of Operators

    ISTREAM: the tuples added to a relation

    RSTREAM: all the tuples in a relation

    DSTREAM: the tuples removed from relation

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 21 / 45

    Relation Stream Operators

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    33/71

    p

    Types of Operators

    ISTREAM: the tuples added to a relation

    RSTREAM: all the tuples in a relation

    DSTREAM: the tuples removed from relation

    Defaults

    ISTREAM for queries without aggregation/grouping

    RSTREAM for queries with aggregation/grouping

    DSTREAM is rarely useful

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 21 / 45

    Mixed Joins

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    34/71

    Common Requirement

    Compare stream tuples with historical data

    System must provide both tables and streams!

    Elegantly modeled as a join between a table and a stream

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 22 / 45

    Mixed Joins

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    35/71

    Common Requirement

    Compare stream tuples with historical data

    System must provide both tables and streams!

    Elegantly modeled as a join between a table and a stream

    Implementation

    Stream is the right (outer) join operand; left (inner) operand isarbitrary Postgres subplan

    For each stream tuple, join against non-continuous subplan

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 22 / 45

    Mixed Join Example

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    36/71

    DescriptionEvery 3 seconds, compute the total value of high-volume trades made onstocks in the S & P 500 in the past 5 seconds.

    Example QuerySELECT T.symbol, sum(T.price * T.volume)

    FROM s_and_p_500 S,

    trades T < VISIBLE 5 sec ADVANCE 3 sec >

    WHERE T.symbol = S.symbol

    AND T.volume > 5000GROUP BY T.symbol

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 23 / 45

    Composing Streams

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    37/71

    The tuples in a stream can be viewed as a series of eventsE.g. The temperature in the room is 20, 25, 30, . . .

    The output of a continuous query is another series of events, typicallyhigher-level or more complex

    E.g. The room is on fire.

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 24 / 45

    Composing Streams

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    38/71

    The tuples in a stream can be viewed as a series of eventsE.g. The temperature in the room is 20, 25, 30, . . .

    The output of a continuous query is another series of events, typicallyhigher-level or more complex

    E.g. The room is on fire.Therefore, streams can be composed in various ways:

    Stream views

    Macro semantics

    Derived streams

    SubqueriesActive tables

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 24 / 45

    Derived Streams

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    39/71

    A derived stream is a database object definedby a persistent continuous query

    Unlike a stream view, always active

    Similar to a materialized view

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 25 / 45

    Example Query

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    40/71

    Description

    Every 3 seconds, compute the volume-weighted average price (VWAP)for all stocks traded in the past 5 seconds.

    Query

    CREATE STREAM vwap (symbol varchar(5),

    vwap float,

    vtime timestamp cqtime) AS

    (SELECT symbol,

    sum(price * volume) / sum(volume),

    advance_agg(qtime)FROM trades < VISIBLE 5 seconds ADVANCE 3 seconds >

    GROUP BY symbol);

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 26 / 45

    Subqueries

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    41/71

    One-time subqueries can be used in continuous queries, of course

    Continuous subqueries are planned and executed asindependent queries

    Essentially inline derived streams

    Require that subqueries yielding streams specify CQTIME

    Planned: WITH-clause subqueries

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 27 / 45

    Active Tables

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    42/71

    An active table is a table with an associated continuous query

    Two modes of operation:

    Append: New stream tuples appended to table at each windowReplace: At each new window, truncate previous table contents

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 28 / 45

    Event Language

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    43/71

    Example Query

    SELECT Shoplifting!, D.loc, D.id

    FROM Store S C D PARTITION BY id

    WHERE S.loc = shelf and C.loc = checkoutAND D.loc = door

    EVENT AND (FOLLOWS(S, D, 1 hour),

    NOT PRECEDES(C, D, 1 hour));

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 29 / 45

    Outline

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    44/71

    1 The Need For Data Stream Processing

    2 Stream Query Languages

    3 Query Processing Techniques For StreamsSystem ArchitectureShared Evaluation

    Adaptive Tuple RoutingOverload Handling

    4 Current Choices For A DSMSOpen Source

    Proprietary5 Demo

    6 Q & A

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 30 / 45

    Basic Requirements

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    45/71

    Adaptivity

    Static query planning is undesirable for long-running queries

    Either replan or use adaptive planning

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 31 / 45

    Basic Requirements

    http://find/
  • 8/2/2019 17 Stream Intro

    46/71

    Adaptivity

    Static query planning is undesirable for long-running queries

    Either replan or use adaptive planning

    Shared Processing

    Essential for good performance: 100s of queries not uncommonLong-lived queries make this more feasible

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 31 / 45

    Basic Requirements

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    47/71

    Adaptivity

    Static query planning is undesirable for long-running queries

    Either replan or use adaptive planning

    Shared Processing

    Essential for good performance: 100s of queries not uncommonLong-lived queries make this more feasible

    Graceful Overload Handling

    Stream data rates are often highly variableOften too expensive to provision for maximal data rate

    Therefore, must handle overload gracefully

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 31 / 45

    System Architecture

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    48/71

    Modified version of PostgreSQL

    One-time queries executed normally

    Continuous queries planned and executed by the CqRuntime process

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 32 / 45

    System Architecture

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    49/71

    Modified version of PostgreSQL

    One-time queries executed normally

    Continuous queries planned and executed by the CqRuntime process

    Stream input: COPY, or submitted via TCP to CqIngress process

    libevent-based, simple COPY-like protocol

    Stream output: cursors, active tables, CqEgress process

    Communication between processes done via shared memory queueinfrastructure

    Message passing done via SysV shmem and locks

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 32 / 45

    Shared Runtime

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    50/71

    New continuous query is defined shared runtime via shared memory

    Runtime plans the query, folds query into single shared query plan

    Not a traditional tree; graph of operators

    Shared Runtime Main Loop1 Check for control messages: add new CQ, remove CQ, . . .2 Check for new stream tuples

    Route each stream tuple through the operator graph (CPS)

    Push output tuples to result consumers

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 33 / 45

    Shared Evaluation

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    51/71

    Continuous query evaluation done by a network of operators in theshared runtime

    If multiple queries reference the same operator, we can evaluate itonly once

    Better than linear scalability!

    Each operator keeps track of the queries it helps to implement

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 34 / 45

    Implementing Shared Evaluation

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    52/71

    Sharing PredicatesSimple cases: , , =Construct a tree that divides domain of type into disjoint regionsFor each tuple: walk the tree to find the region the tuple belongs in

    Region implies which queries the tuple is still visible to

    Immutable functions can also be shared relatively easily

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 35 / 45

    Implementing Shared Evaluation

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    53/71

    Sharing PredicatesSimple cases: , , =Construct a tree that divides domain of type into disjoint regionsFor each tuple: walk the tree to find the region the tuple belongs in

    Region implies which queries the tuple is still visible to

    Immutable functions can also be shared relatively easily

    Sharing Joins, Aggregates

    Can also be done

    Even between queries with varying windows and predicatesRequires some thought (say, a PhD thesis or two)

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 35 / 45

    Adaptive Tuple Routing

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    54/71

    Given a new tuple, how do we route it through the graph ofoperators?

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 36 / 45

    Adaptive Tuple Routing

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    55/71

    Given a new tuple, how do we route it through the graph ofoperators?

    Traditional approach: statically choose an optimal route for eachstream

    Hard optimization problemNeed to re-optimize when new queries defined or system conditionschange (e.g. operator selectivity)

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 36 / 45

    Adaptive Tuple Routing

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    56/71

    Given a new tuple, how do we route it through the graph ofoperators?

    Traditional approach: statically choose an optimal route for eachstream

    Hard optimization problemNeed to re-optimize when new queries defined or system conditionschange (e.g. operator selectivity)

    TelegraphCQ approach: adaptive per-tuple routing

    Push tuples one at a time through the operator graph; choose order of

    operators at runtime

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 36 / 45

    Implementing Adaptive Routing

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    57/71

    For each tuple, maintain lineage

    What operators has this tuple visited?Which queries can still see this tuple?

    Implication: cant push down projections

    Make routing decisions on the basis of simple run-time statistics

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 37 / 45

    Handling Overload

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    58/71

    Common scenario: peak stream rate >> average stream rate(bursty)

    The system should cope gracefully

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 38 / 45

    Handling Overload

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    59/71

    Common scenario: peak stream rate >> average stream rate(bursty)

    The system should cope gracefully

    Three alternatives:1 Spool tuples to disk, process later

    But stream rates often exceed disk throughput

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 38 / 45

    Handling Overload

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    60/71

    Common scenario: peak stream rate >> average stream rate(bursty)

    The system should cope gracefully

    Three alternatives:1 Spool tuples to disk, process later

    But stream rates often exceed disk throughput

    2 Drop excess tuples

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 38 / 45

    Handling Overload

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    61/71

    Common scenario: peak stream rate >> average stream rate(bursty)

    The system should cope gracefully

    Three alternatives:1 Spool tuples to disk, process later

    But stream rates often exceed disk throughput

    2 Drop excess tuples3 Substitute statistical summaries for dropped stream tuples

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 38 / 45

    Handling Overload

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    62/71

    Common scenario: peak stream rate >> average stream rate(bursty)

    The system should cope gracefully

    Three alternatives:1 Spool tuples to disk, process later

    But stream rates often exceed disk throughput

    2 Drop excess tuples3 Substitute statistical summaries for dropped stream tuples

    Quality of Service (QoS)

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 38 / 45

    Outline

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    63/71

    1 The Need For Data Stream Processing

    2 Stream Query Languages3 Query Processing Techniques For Streams

    System ArchitectureShared Evaluation

    Adaptive Tuple RoutingOverload Handling

    4 Current Choices For A DSMSOpen SourceProprietary

    5 Demo

    6 Q & A

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 39 / 45

    Open Source DSMS

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    64/71

    Esper

    DSMS engine written in Java (GPL). SQL-like stream query language.http://esper.codehaus.org

    TelegraphCQ

    Academic prototype from UC Berkeley, based on PostgreSQL 7.3PostgreSQLs SQL dialect, plus stream-oriented extensions

    BSD licensed; http://telegraph.cs.berkeley.edu

    StreamCruncherDSMS engine written in Java. Free for commercial use (not open source).

    http://www.streamcruncher.com

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 40 / 45

    Proprietary DSMS

    http://esper.codehaus.org/http://telegraph.cs.berkeley.edu/http://www.streamcruncher.com/http://www.streamcruncher.com/http://telegraph.cs.berkeley.edu/http://esper.codehaus.org/http://find/
  • 8/2/2019 17 Stream Intro

    65/71

    StreamBaseA Stonebraker company. Founded in 2003.

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 41 / 45

    Proprietary DSMS

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    66/71

    StreamBaseA Stonebraker company. Founded in 2003.

    Other Startups

    Coral8Apama (purchased by Progress Software in 2005)

    and more . . .

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 41 / 45

    Proprietary DSMS

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    67/71

    StreamBaseA Stonebraker company. Founded in 2003.

    Other Startups

    Coral8Apama (purchased by Progress Software in 2005)

    and more . . .

    Established Companies

    TIBCO BusinessEvents, Oracle BAM

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 41 / 45

    Amalgamated Insight

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    68/71

    Based on the experience gained from TelegraphCQ

    New codebase

    Application components:1 Continuous Query Engine

    Modified version of PostgreSQL (currently 8.1.9+)

    2 Integration Framework

    Connectors, input/output converters, query management

    3 Visualization

    Closed Series A funding in June 2006

    1.0 release will be available Real Soon Now (currently RC3)

    Lesson: PostgreSQL is a huge competitive advantage

    Were hiring :-)

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 42 / 45

    Outline

    Th N d F D S P i

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    69/71

    1 The Need For Data Stream Processing

    2 Stream Query Languages3 Query Processing Techniques For Streams

    System ArchitectureShared Evaluation

    Adaptive Tuple RoutingOverload Handling

    4 Current Choices For A DSMSOpen SourceProprietary

    5 Demo

    6 Q & A

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 43 / 45

    Outline

    Th N d F D S P i

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    70/71

    1 The Need For Data Stream Processing

    2 Stream Query Languages3 Query Processing Techniques For Streams

    System ArchitectureShared Evaluation

    Adaptive Tuple RoutingOverload Handling

    4 Current Choices For A DSMSOpen SourceProprietary

    5 Demo

    6 Q & A

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 44 / 45

    Q & A

    http://find/http://goback/
  • 8/2/2019 17 Stream Intro

    71/71

    Thank You.

    Any Questions?

    Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 45 / 45

    http://find/http://goback/