Stream Query Semantics: “Update-Pattern-Awareness”

Based on paper “Update-Pattern-Aware Modeling and Processing of Continuous Queries” by Lukasz Golab and M. Tamer Özsu, in SIGMOD’2005. Slides in part based on SIGMOD’05 talk.

2 of 57 CS525 (Golab+Özsu'05)

Introduction

• SQL queries
• Relational algebra
• Each operator consumes one or more relation instances and outputs a relation instance
• Computation models: blocking computations, pipelined variants
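A minimal Python sketch of the two computation models (all function names are illustrative, not from the slides): a pipelined selection emits each qualifying tuple as soon as it reads it, so it can consume an unbounded input, whereas a blocking SUM can only answer once its input has ended.

```python
from itertools import islice
from typing import Iterable, Iterator


def pipelined_select(tuples: Iterable[int], threshold: int) -> Iterator[int]:
    """Non-blocking: each qualifying tuple is emitted as soon as it is read."""
    for t in tuples:
        if t > threshold:
            yield t


def blocking_sum(tuples: Iterable[int]) -> int:
    """Blocking: no output is possible until the whole input has been consumed."""
    total = 0
    for t in tuples:
        total += t
    return total


def unbounded_input() -> Iterator[int]:
    """An unbounded input 0, 1, 2, ...: a stand-in for a stream."""
    n = 0
    while True:
        yield n
        n += 1


# The pipelined operator produces answers from an unbounded input...
first_three = list(islice(pipelined_select(unbounded_input(), 5), 3))
# ...while blocking_sum(unbounded_input()) would never return; it needs finite input.
finite_total = blocking_sum([1, 2, 3])
```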

What is a continuous query?

Query expression composed of non-blocking “relational” operators that operate on streams

Computation model: is join blocking or not?

Issues: streams may be bounded by sliding windows
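On the join question: one standard non-blocking join is the symmetric hash join, which keeps a hash table per input and, for each arriving tuple, first inserts it into its own table and then probes the other. The sketch below assumes an equi-join and a single interleaved arrival sequence; it illustrates the general technique, not the operator used in the paper.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, Iterator, Tuple

Row = Tuple[str, int]


def symmetric_hash_join(
    arrivals: Iterable[Tuple[str, Row]], key: Callable[[Row], Hashable]
) -> Iterator[Tuple[Row, Row]]:
    """Non-blocking equi-join. `arrivals` yields (side, row) with side "L" or "R"
    in arrival order; each join result is emitted as soon as it exists."""
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    for side, row in arrivals:
        other = "R" if side == "L" else "L"
        k = key(row)
        tables[side][k].append(row)      # build: keep the row for future probes
        for match in tables[other][k]:   # probe: join with everything seen so far
            yield (row, match) if side == "L" else (match, row)


arrivals = [("L", ("a", 1)), ("R", ("a", 10)), ("R", ("b", 20)),
            ("L", ("b", 2)), ("L", ("a", 3))]
results = list(symmetric_hash_join(arrivals, key=lambda r: r[0]))
```

Every result appears as soon as both matching tuples have arrived, so the join does not block; over infinite streams, however, both tables grow without bound, which is one reason streams are bounded by sliding windows.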

Semantics of a continuous query?

$Q(t)$ = answer of a continuous query $Q$ at time $t$

= output of the corresponding one-time relational query $Q'$ whose inputs are the current states of the streams, windows, and tables referenced in $Q$
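To make the definition concrete, here is a small sketch in which Q′ is a one-time SUM and the only input is a time-based window. It assumes a tuple with timestamp ts belongs to the window at time t while ts ≤ t < ts + window_size; the helper names are hypothetical.

```python
from typing import List, Tuple

Event = Tuple[int, int]  # (timestamp, value)


def window_state(stream: List[Event], t: int, window_size: int) -> List[Event]:
    """Current state of a time-based window at time t.
    Assumption: a tuple is in the window while ts <= t < ts + window_size."""
    return [(ts, v) for ts, v in stream if ts <= t < ts + window_size]


def q_prime(relation: List[Event]) -> int:
    """The one-time relational query Q': here, SUM over a relation instance."""
    return sum(v for _, v in relation)


def q_at(stream: List[Event], t: int, window_size: int) -> int:
    """Q(t): evaluate Q' on the current state of the window at time t."""
    return q_prime(window_state(stream, t, window_size))


stream = [(1, 5), (2, 3), (4, 7), (6, 1)]
answers = [q_at(stream, t, 3) for t in (2, 4, 6)]
```

Re-evaluating Q′ on each snapshot gives 8, 10, and 8 at t = 2, 4, and 6: the answer changes as tuples enter and leave the window.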

Example of a continuous query

[Figure: a continuous query computing a SUM]

Questions:
• Difference between a relation and a stream?
• A relation could be static, insert-only, or subject to arbitrary modifications!

Idea of update pattern?

Update pattern: refers to changes in the answer of a continuous query (insertions/deletions)

Changes in the answer:
• Results are inserted into the output (append)
• Results are to be removed from the output (really?) Why would a ‘delete’ happen? Based on operator type? Something else?
• Is there some predictable timing (pattern) for changes?

Append-only Semantics (informal)

Stream == append-only database!?

• Input streams: typically assumed to be “append-only”
• Output streams: also just append all answers to the latest output stream (?)

Question: do queries over an append-only database (stream) necessarily produce append-only output?

JOIN continuous query

Given: (append-only) input streams

Is the JOIN output append-only?

Monotonic queries

Query $Q$ is monotonic (over an append-only database) if $Q(t) \subseteq Q(t')$ for all $t \le t'$.

Observe: no result is ever deleted from the output stream.

Informally: monotonic queries (on append-only input) lead to append-only output.
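The definition can be checked mechanically on a sequence of snapshots Q(t1), Q(t2), … with t1 ≤ t2 ≤ …. The sketch below (hypothetical helpers, illustrative data) evaluates a query on growing prefixes of an append-only stream and tests containment between consecutive answers; a plain selection passes the test.

```python
from typing import Callable, FrozenSet, List, Sequence, Set, Tuple

Quote = Tuple[int, str, float]  # (timestamp, symbol, price)


def answers_over_time(query: Callable[[List[Quote]], Set], stream: List[Quote],
                      times: Sequence[int]) -> List[FrozenSet]:
    """Q(t) for each t in times: the query only sees the prefix with ts <= t."""
    return [frozenset(query([r for r in stream if r[0] <= t])) for t in times]


def is_monotonic(snapshots: List[FrozenSet]) -> bool:
    """Q(t) must be a subset of Q(t') for all t <= t'; checking consecutive snapshots
    suffices because the subset relation is transitive."""
    return all(earlier <= later for earlier, later in zip(snapshots, snapshots[1:]))


def expensive(rows: List[Quote]) -> Set[Tuple[int, str]]:
    """A plain selection: quotes priced above 100."""
    return {(ts, sym) for ts, sym, price in rows if price > 100}


stream = [(1, "A", 120.0), (2, "B", 90.0), (3, "C", 150.0)]
snapshots = answers_over_time(expensive, stream, [1, 2, 3])
```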

Monotonic queries (append-only output) would be nice, but…

Problem: queries over an append-only database don’t necessarily produce append-only output!

Example: select stocks whose price this hour is greater than their price in the previous hour.
Company X 8am $1.00
Company X 9am $1.50
Company X 10am $1.25

Output? Update pattern?

Conclusion: some queries are non-monotonic over an append-only database (input stream)!
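One way to answer the slide’s question is to evaluate the query at each hour over the data seen so far. The sketch assumes “this hour” means the current hour `now` and compares it with hour `now - 1`; the function name is hypothetical, and the prices are the slide’s.

```python
from typing import List, Set, Tuple

# The slide's data: (hour, company, price); 8 means 8am.
prices: List[Tuple[int, str, float]] = [(8, "X", 1.00), (9, "X", 1.50), (10, "X", 1.25)]


def rising_stocks(rows: List[Tuple[int, str, float]], now: int) -> Set[str]:
    """Q(now): companies whose price at hour `now` exceeds their price at hour now - 1.
    Only rows that have arrived by `now` are visible (append-only input)."""
    seen = {(hour, co): p for hour, co, p in rows if hour <= now}
    return {co for (hour, co), p in seen.items()
            if hour == now and (hour - 1, co) in seen and p > seen[(hour - 1, co)]}


answers = {hour: rising_stocks(prices, hour) for hour in (8, 9, 10)}
```

The answer is empty at 8am, contains X at 9am, and is empty again at 10am: X must be deleted from the output even though the input only grew.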

Windows + Non-monotonic?

Some monotonic queries become non-monotonic when windowing is added. Example:
• Select all stock quotes: monotonic
• Select all stock prices reported in the last 5 minutes: non-monotonic

[Figure: FIFO update pattern over time steps 1 to 12]
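A sketch of why a windowed selection has a FIFO update pattern: under a time-based window each result expires at ts + window_size, and a selection never reorders its input, so results leave the answer in the order they entered. The quotes and the price > 10 predicate are hypothetical; the 5-minute window follows the slide.

```python
from typing import Callable, List, Tuple


def windowed_select(stream: List[Tuple[int, float]], window_size: int,
                    pred: Callable[[float], bool]) -> List[Tuple[int, float, int]]:
    """Selection over a time-based window: each result (ts, price, exp) leaves the
    answer at exp = ts + window_size."""
    return [(ts, price, ts + window_size) for ts, price in stream if pred(price)]


# Hypothetical quotes (minute, price); window of 5 minutes as on the slide.
quotes = [(1, 12.0), (2, 9.5), (4, 15.0), (7, 11.0)]
results = windowed_select(quotes, 5, lambda price: price > 10)
insert_order = [ts for ts, _, _ in results]
expire_order = [ts for ts, _, _ in sorted(results, key=lambda r: r[2])]
```

Both orders are 1, 4, 7: results expire first-in, first-out.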

Windows + Non-monotonic?

Observe: queries become non-monotonic due to windowing.

Reason: all of their results eventually expire as the windows slide forward.

Goal: let’s understand different types of “non-monotonicity”.

Focus of this SIGMOD’05 Paper

Motivation:
• Two possible reasons for non-monotonic behaviour of continuous queries: the query itself is non-monotonic (as in the stock-price example), or windowing makes results expire
• Understand “non-monotonicity”

Steps:
• Divide non-monotonic queries into classes
• Analyze the update patterns of each class
• Use update-pattern knowledge in query processing

Outline

• Classification of update patterns of sliding window queries
• Query semantics (using update-pattern awareness)
• Query processing (using update-pattern awareness)

Assumptions

• Stream = append-only sequence
• Window = time-based
• Timestamps = arriving in increasing order
• Linear processing = each new tuple is fully processed by all operators before the next tuple is touched

Review: Sliding window operators

When a tuple falls out of its window, it also expires from the operator state and from the output.

[Figure: DISTINCT over a sliding window of values x, y, z, and a join of sliding windows over streams S1 and S2; the oldest tuple is about to expire]

How to undo?

Approach 1: Negative Tuples

Expiration from the operator state and then possibly from the output stream

[Figure: join of windows over S1 and S2; the effect of an expired tuple is undone]

Idea: a negative tuple signals that a tuple has expired.

Issues:
• All operators must be able to handle negative tuples
• Twice as many tuples must be processed
• The input window of each operator must be stored to know when to generate a negative tuple
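A minimal sketch of the negative-tuple approach, assuming a time-based window that expires a tuple at ts + window_size and a downstream selection that simply forwards signs (all names hypothetical). It makes two of the listed issues visible: the window operator must keep its input window, and every input tuple eventually travels through the plan twice.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, Tuple

Signed = Tuple[str, int, str]  # (sign, ts, value); "+" = insertion, "-" = negative tuple


def window_with_negative_tuples(stream: Iterable[Tuple[int, str]], window_size: int,
                                end_time: int) -> Iterator[Signed]:
    """Time-based window that emits +t on arrival and -t when t expires
    (at ts + window_size). It must store its input window to know when to emit -t."""
    window: Deque[Tuple[int, str]] = deque()

    def expire_until(now: int) -> Iterator[Signed]:
        while window and window[0][0] + window_size <= now:
            old_ts, old_v = window.popleft()
            yield ("-", old_ts, old_v)

    for ts, v in stream:
        yield from expire_until(ts)   # expirations driven by the new tuple's timestamp
        window.append((ts, v))
        yield ("+", ts, v)
    yield from expire_until(end_time)  # let the clock run to end_time with no arrivals


def select(signed: Iterable[Signed], pred: Callable[[str], bool]) -> Iterator[Signed]:
    """Every downstream operator must understand signs; selection just forwards them."""
    for sign, ts, v in signed:
        if pred(v):
            yield (sign, ts, v)


stream = [(1, "x"), (2, "z"), (5, "x")]
signed = list(window_with_negative_tuples(stream, window_size=3, end_time=10))
only_x = [(s, v) for s, _, v in select(signed, lambda v: v == "x")]
```

Three arrivals produce six signed tuples, and the selection has to process the negative ones as well.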

Approach 2: Direct Approach

For negation-free queries, an operator can access its state to calculate expiration times via timestamps:
• Assign a timestamp, ts, upon arrival
• Expiration time = ts + window_size
• For joins: min(expiration times of the joined tuples)

Note: some operators produce new results due to expiration, such as duplicate elimination, group-by, and negation.

Issues:
• Expiration (purging) must be combined with query processing
• Expiration may be required even if no tuple arrives
• The operator clock and the expiration of state must be synchronized
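The direct approach can be sketched with expiration times attached to results (hypothetical names; a list stands in for real operator state). Base tuples expire at ts + window_size, join results at the minimum of their inputs’ expiration times, and purging is a separate step that has to run whenever the clock advances.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Result:
    payload: Tuple[str, ...]
    exp: int  # expiration time, computed directly from timestamps


def on_arrival(ts: int, value: str, window_size: int) -> Result:
    """Base tuple: expiration time = ts + window_size."""
    return Result((value,), ts + window_size)


def join(left: Result, right: Result) -> Result:
    """A join result is only valid while both inputs are: min of their expiration times."""
    return Result(left.payload + right.payload, min(left.exp, right.exp))


def purge(state: List[Result], now: int) -> List[Result]:
    """Direct expiration: drop everything with exp <= now. It has to be invoked whenever
    the clock advances, even if no new tuple has arrived."""
    return [r for r in state if r.exp > now]


a = on_arrival(1, "a", 10)   # exp 11
b = on_arrival(4, "a", 10)   # exp 14
ab = join(a, b)              # exp min(11, 14) = 11
state_after = purge([a, b, ab], now=12)
```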

Discussion of Direct vs. Negative Tuple Approach

Advantages of the Direct over the Negative Tuple Approach:
• No overhead for processing negative tuples
• No need to store base windows

But the Direct Approach has its own processing overhead:
• If state is sorted on tuple arrival, then delete requires a scan
• If state is sorted on tuple expiration time, then insert requires a scan

UNLESS: expiration order is the same as insertion order (aaaaah!!)
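When expiration order equals insertion order, a double-ended queue gives constant-time insert and delete, which is the escape hatch noted above. A sketch assuming integer expiration times (class name hypothetical); it rejects out-of-order expirations rather than silently mis-expiring them.

```python
from collections import deque
from typing import Any, Deque, List, Tuple


class FifoState:
    """State whose expiration order equals insertion order (e.g. exp = ts + window_size
    with timestamps arriving in increasing order). Insert at the tail, expire from the
    head: both are O(1) amortized and neither needs a scan."""

    def __init__(self) -> None:
        self._q: Deque[Tuple[int, Any]] = deque()  # (exp, payload), exp non-decreasing

    def insert(self, exp: int, payload: Any) -> None:
        if self._q and exp < self._q[-1][0]:
            raise ValueError("expiration order is not FIFO; use another structure")
        self._q.append((exp, payload))

    def expire(self, now: int) -> List[Any]:
        """Pop expired entries from the head; stop at the first live one."""
        expired = []
        while self._q and self._q[0][0] <= now:
            expired.append(self._q.popleft()[1])
        return expired

    def __len__(self) -> int:
        return len(self._q)


state = FifoState()
for ts, value in [(1, "x"), (2, "z"), (5, "y")]:
    state.insert(ts + 3, value)   # window_size = 3
expired_at_5 = state.expire(5)
```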

Calculating expiration times

Time-based windows: predictable expiration times
• Assign a timestamp, ts, upon arrival
• Expiration time = ts + window_size: FIFO
• For joins: min(expiration times of the joined tuples). Predictable, but is it still FIFO?

Count-based windows, non-monotonic queries over infinite streams: unpredictable
• Expiration time depends on stream arrival rates or on the data arriving on the stream, so negative tuples are needed
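A small illustration of why count-based windows are unpredictable: in a window over the last n tuples, a tuple leaves the window when the n-th later tuple arrives, so its expiration time is not known when it arrives. The function name and arrival times below are hypothetical.

```python
from typing import List, Optional


def count_window_expirations(arrival_times: List[int], n: int) -> List[Optional[int]]:
    """Count-based window over the last n tuples: tuple i is evicted when tuple i + n
    arrives, so its expiration time is arrival_times[i + n] (None: not yet expired)."""
    return [arrival_times[i + n] if i + n < len(arrival_times) else None
            for i in range(len(arrival_times))]


# Hypothetical arrival times: the first tuple arrives at time 0 in both runs.
fast = [0, 1, 2, 3, 4]
slow = [0, 1, 7, 20, 21]
fast_exp = count_window_expirations(fast, n=2)
slow_exp = count_window_expirations(slow, n=2)
```

The first tuple arrives at time 0 in both runs but expires at time 2 in one and time 7 in the other; an operator cannot stamp it with an expiration time on arrival, so the expiration has to be signalled explicitly, e.g. with a negative tuple.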

Classification of update patterns

• Monotonic: answers never expire.
  Queries over infinite streams: selection, union, join, duplicate elimination
• Weakest non-monotonic: answers expire in FIFO order; negative tuples are not necessary.
  Operators over time-based windows that don’t reorder incoming tuples during processing: selection over a window and merge-union over windows
• Weak non-monotonic: expiration order is not FIFO, but negative tuples are not needed, since the expiration time of a result can be determined without them.
  Time-based window join, duplicate elimination, and group-by
• Strict non-monotonic: unpredictable expiration order, so explicit negative tuples are required.
  Negation, queries over count-based windows

Example: Update pattern classes

Weak non-monotonic: order is not FIFO, but the expiration time of a result can be determined without negative tuples.

Example: time-based window join
[Figure: windows S1 = f a d a and S2 = g a d g produce join results a d a; when f arrives on S2 it joins with the oldest tuple f in S1, giving results a d a f]

Update pattern not FIFO!

Next, f falls out of the window. What happens?
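The join example can be replayed with hypothetical timestamps chosen to match the figure’s sequences (S1 = f a d a, S2 = g a d g, then f arrives on S2), a window of 10 time units, and result expiration = min of the inputs’ expiration times. The function name is hypothetical; the sketch keeps all state for brevity and checks window validity at probe time instead of purging.

```python
from typing import Dict, List, Tuple

Arrival = Tuple[str, int, str]  # (stream, ts, join key)


def window_join(arrivals: List[Arrival], window_size: int) -> List[Tuple[str, int, int]]:
    """Time-based window join of S1 and S2 (arrivals in timestamp order). Returns results
    (key, produced_at, exp) in production order, with exp = min of the inputs' expirations."""
    seen: Dict[str, List[Tuple[int, str]]] = {"S1": [], "S2": []}
    out = []
    for stream, ts, key in arrivals:
        other = "S2" if stream == "S1" else "S1"
        for o_ts, o_key in seen[other]:
            # only partners still inside their window at time ts can join
            if o_key == key and ts < o_ts + window_size:
                out.append((key, ts, min(ts, o_ts) + window_size))
        seen[stream].append((ts, key))  # state is never purged here, for brevity
    return out


# Hypothetical timestamps: S1 = f a d a, S2 = g a d g, then f arrives on S2.
arrivals = [("S1", 1, "f"), ("S1", 2, "a"), ("S2", 3, "g"), ("S2", 4, "a"),
            ("S1", 5, "d"), ("S2", 6, "d"), ("S1", 7, "a"), ("S2", 8, "g"),
            ("S2", 9, "f")]
results = window_join(arrivals, window_size=10)
produced = [k for k, _, _ in results]
expiring = [k for k, _, _ in sorted(results, key=lambda r: r[2])]
```

Results are produced as a, d, a, f but expire as f, a, a, d: the newest result f expires first because it joined with the oldest tuple in S1. When that f falls out of the window, the f result expires ahead of all the older results, which is what makes the pattern weak rather than weakest non-monotonic.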

Discussion: Classification of update patterns

• All queries except those in the strict non-monotonic class produce results that can be materialized and maintained without negative tuples. Hence, difference queries require the negative tuple approach, and only queries without difference can use the direct approach to state management.
• Queries on sliding windows with strict non-monotonic patterns are also non-monotonic over infinite streams; e.g., this holds for difference.
• Queries that are monotonic over infinite streams are then at most weak non-monotonic (weakest or weak) over time-based windows; e.g., this holds for join.

Next

• Update patterns of sliding window queries: classification
• Advantages of update-pattern awareness: modeling (query semantics) and processing (query execution)

Update-pattern-aware semantics of continuous queries

How are updates of relational tables different from insertions and deletions caused by the movement of the windows?
• Infinite streams
• Windows
• Relation: static
• Relation: special meaning of updates
• Relation: allow arbitrary updates

Special semantics for table updates: Non-retroactive relation (NRR)

Definition of NRR:
• Allow arbitrary updates on the table
• But simpler semantics of table updates: previously arrived stream tuples are not affected

Join on NRR + Stream:
• Update on NRR: do nothing (don’t probe the stream state)
• Tuple arrival on stream: probe the NRR content

Note: “arrival” relies on the assumption of a total order.

Example of an NRR Relation

Stream: stock quotes
Table: mapping between stock symbols and company names
Query: select T.company-name and S.price over a (time-based) window

Cases:
• Company no longer trading: delete its previously returned stock quotes (relation), or leave previous quotes alone (NRR)
• Company changes name: update the name in previous quotes (relation), or only use the new name for new quotes (NRR)
• New company added to the table: attempt to generate quotes retroactively (relation), or assume no prior stock quotes had been generated for the unknown company, so don’t scan the join state (NRR)
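The NRR cases above can be sketched as a stream-table join in which only stream arrivals probe the table and table updates touch nothing else. The class, its methods, the symbol XYZ, and the use of None to mean “no longer trading” are all hypothetical, and the time-based window is left out.

```python
from typing import Dict, List, Optional, Tuple


class NrrJoin:
    """Join of a stream of quotes (symbol, price) with a non-retroactive table
    symbol -> company name. Hypothetical sketch; the time-based window is omitted."""

    def __init__(self) -> None:
        self.table: Dict[str, str] = {}            # current NRR contents
        self.output: List[Tuple[str, float]] = []  # (company name, price) results so far

    def on_table_update(self, symbol: str, name: Optional[str]) -> None:
        """Insert, rename, or (name=None) delete a company. NRR semantics: neither the
        stream state nor earlier results are touched."""
        if name is None:
            self.table.pop(symbol, None)
        else:
            self.table[symbol] = name

    def on_quote(self, symbol: str, price: float) -> None:
        """Only stream arrivals probe the table, using its current contents."""
        name = self.table.get(symbol)
        if name is not None:
            self.output.append((name, price))


j = NrrJoin()
j.on_quote("XYZ", 10.0)                 # unknown company: no result, now or later
j.on_table_update("XYZ", "Acme")        # new company: no retroactive results
j.on_quote("XYZ", 10.5)
j.on_table_update("XYZ", "Acme Corp")   # rename: the earlier result keeps the old name
j.on_quote("XYZ", 11.0)
j.on_table_update("XYZ", None)          # no longer trading: earlier results stay
```

The output is [("Acme", 10.5), ("Acme Corp", 11.0)]: each table change affects only quotes that arrive after it.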

Updates of relational tables vs. window movements

• Join of two infinite streams is monotonic
• Join of two windows is weak non-monotonic
• Join of a window and a table:
  easier: weakest non-monotonic?
  same: weak non-monotonic?
  harder: strict non-monotonic?

Update-pattern-aware modeling of continuous queries, cont.

• Relation with arbitrary table updates: harder. Strict non-monotonic, because we can’t predict when and how the table will be changed.
• Non-retroactive relation (NRR), which doesn’t allow retroactive updates: easier (or the same). Weakest non-monotonic, because table updates never affect previously produced results, so results expire in the same order as the window tuples they came from.

Next

• Update patterns of sliding window queries: classification
• Advantages of update-pattern awareness: modeling (query semantics) and processing (query execution)

Update-pattern-aware query processing

Current techniques:
• Negative tuple approach: BAD: CPU processing time; GOOD: state maintenance overhead
• Direct approach: GOOD: CPU processing time; BAD: state maintenance overhead

A query processor aware of update patterns aims to:
• Decrease CPU processing time
• Reduce state maintenance overhead

Update-pattern-aware query processing

Idea: exploit the update patterns of each sub-query

Main techniques:
• Identify the update patterns of subqueries
• Use appropriate data structures for storing the respective state
• Develop physical operator implementations

Update-pattern-aware query processing

• Algorithm to identify the update pattern of a query based on the update patterns of its operators
• Annotate the query plan with the update pattern of each sub-query

Update-pattern annotations

Goal: label all edges with WKS, WK, or STR to indicate the update pattern generated by the sub-query.

First, label all edges originating at leaf nodes (sliding windows) with WKS. Then repeatedly construct labels using 5 rules:
• The output of a unary weakest non-monotonic operator and of JOIN(NRR) is the same as its input.
• The output of a binary weakest non-monotonic operator is STR if at least one of its inputs is STR, WK if the inputs are either WKS or WK, and WKS if both inputs are WKS.
• The output of group-by is always WK.
• The output of strict non-monotonic operators and of JOIN(R), i.e., a join with a relation that allows arbitrary updates, is always STR.
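A sketch of the bottom-up labelling, encoding the listed rules with the ordering WKS < WK < STR, so the binary-weakest rule becomes “take the worst input label”. The operator kinds are hypothetical names. The listed rules do not cover weak non-monotonic operators such as the time-based window join, so the sketch adds an explicitly marked assumption for them (STR if any input is STR, otherwise WK), consistent with the statement that a join of two windows is weak non-monotonic.

```python
from dataclasses import dataclass, field
from typing import List

RANK = {"WKS": 0, "WK": 1, "STR": 2}  # increasing update-pattern complexity


@dataclass
class Op:
    kind: str
    inputs: List["Op"] = field(default_factory=list)


def label(op: Op) -> str:
    """Label of the edge leaving `op`, computed bottom-up from its inputs' labels."""
    ins = [label(child) for child in op.inputs]
    worst = max(ins, key=RANK.__getitem__) if ins else None
    if op.kind == "window":                   # leaf sliding window
        return "WKS"
    if op.kind in ("select", "join_nrr"):     # unary weakest op / JOIN(NRR): same as input
        return ins[0]                         # (the NRR table side is not modelled)
    if op.kind == "merge_union":              # binary weakest op: STR > WK > WKS
        return worst
    if op.kind == "groupby":                  # always WK
        return "WK"
    if op.kind in ("negation", "join_r"):     # strict op / JOIN(R): always STR
        return "STR"
    if op.kind in ("window_join", "distinct"):
        # ASSUMPTION, not one of the rules listed on the slide: a weak non-monotonic
        # operator outputs STR if any input is STR, and WK otherwise.
        return "STR" if worst == "STR" else "WK"
    raise ValueError(f"unknown operator kind: {op.kind}")


join = Op("window_join", [Op("window"), Op("window")])
plan = Op("negation", [join, Op("window")])
union = Op("merge_union", [Op("select", [Op("window")]), Op("window")])
```

With this encoding, a window join over two windows is labelled WK, a negation on top of it STR, and a merge-union of a selected window with another window WKS.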

Update-pattern label propagation

[Figure: update-pattern labels (WKS, WK, STR) propagated up a query plan over Stream 1, Stream 2, and Stream 3]

Update-pattern-aware query optimization, cont.

[Figure: two alternative query plans over Stream 1, Stream 2, and Stream 3 with update-pattern labels; in the first, STR appears only on the output edge, while in the second, STR propagates to more edges]

Physical operator design

Use appropriate physical operators:
• DISTINCT over strict non-monotonic input
• DISTINCT over weakest or weak non-monotonic input. Idea: there are no negative tuples causing premature expiration. Instead, store the youngest tuple with each distinct value.
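A sketch of the DISTINCT idea for weakest or weak non-monotonic input (hypothetical class and method names; integer expiration times assumed). Because no negative tuples arrive, a stored input tuple can only leave by expiring, so remembering the longest-lived duplicate of each value is enough to re-emit the value when its current output tuple expires. With weak input, the youngest arrival is not necessarily the last to expire, so the sketch keeps the largest expiration time per value, which coincides with the youngest tuple for FIFO input.

```python
from typing import Dict, List, Tuple


class YoungestDistinct:
    """DISTINCT over a time-based window whose input has no premature expirations
    (weakest or weak non-monotonic). For each distinct value we keep the expiration
    time of its current output tuple and of its longest-lived input tuple."""

    def __init__(self) -> None:
        self.out_exp: Dict[str, int] = {}       # value -> exp of current output tuple
        self.youngest_exp: Dict[str, int] = {}  # value -> largest input exp seen

    def insert(self, value: str, exp: int) -> List[Tuple[str, int]]:
        """Process one input tuple; return the new output tuples (value, exp)."""
        self.youngest_exp[value] = max(exp, self.youngest_exp.get(value, exp))
        if value in self.out_exp:
            return []                           # duplicate: only remember it
        self.out_exp[value] = exp
        return [(value, exp)]

    def advance(self, now: int) -> List[Tuple[str, int]]:
        """Expire output tuples with exp <= now. If a younger duplicate is still alive,
        emit the value again with that tuple's expiration (a new result caused by
        expiration), without scanning the window."""
        new_results = []
        for value in [v for v, e in self.out_exp.items() if e <= now]:
            youngest = self.youngest_exp[value]
            if youngest > now:
                self.out_exp[value] = youngest
                new_results.append((value, youngest))
            else:
                del self.out_exp[value]
                del self.youngest_exp[value]
        return new_results


d = YoungestDistinct()
for ts, value in [(1, "x"), (2, "z"), (4, "x")]:
    d.insert(value, ts + 5)   # window_size = 5
at_6 = d.advance(6)
at_7 = d.advance(7)
```

Here x first expires at time 6 but is immediately re-emitted with expiration 9, thanks to the later x; z expires at 7 with no replacement.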

State and Result Data Structures

Use appropriate data structures for maintaining state buffers and storing final results:
• Weakest non-monotonic: [Figure: buffer with an Insert end and a Delete end]
• Weak non-monotonic: partition by expiration time
• Strict non-monotonic: if premature expirations are rare, use the partitioned state above; else use the negative-tuple approach for expiration and sort the state by the negative-tuple attribute
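A sketch of partitioning state by expiration time for weak non-monotonic input (class name and bucket width are hypothetical). Insertion is an append to one bucket whatever the expiration order, and expiration discards whole buckets, filtering entries individually only in the bucket that straddles the current time. This calendar-queue-like layout is chosen for illustration; the paper’s exact structure may differ.

```python
from collections import defaultdict
from typing import Any, DefaultDict, List, Tuple


class ExpirationPartitionedState:
    """State for input whose expiration order differs from insertion order (weak
    non-monotonic). Entries are bucketed by expiration time into partitions of width
    `width` (a hypothetical tuning knob): an insert appends to one bucket, and
    expiration drops whole buckets, filtering tuple by tuple only the boundary bucket."""

    def __init__(self, width: int) -> None:
        self.width = width
        self.buckets: DefaultDict[int, List[Tuple[int, Any]]] = defaultdict(list)

    def insert(self, exp: int, payload: Any) -> None:
        self.buckets[exp // self.width].append((exp, payload))

    def expire(self, now: int) -> List[Any]:
        """Remove and return the payloads of all entries with exp <= now."""
        expired: List[Any] = []
        boundary = now // self.width
        for idx in sorted(i for i in self.buckets if i <= boundary):
            bucket = self.buckets.pop(idx)
            if idx < boundary:                 # every entry in this bucket has exp < now
                expired.extend(p for _, p in bucket)
            else:                              # boundary bucket: keep the live entries
                expired.extend(p for e, p in bucket if e <= now)
                live = [(e, p) for e, p in bucket if e > now]
                if live:
                    self.buckets[idx] = live
        return expired


# Hypothetical entries (exp, payload) inserted out of expiration order.
state = ExpirationPartitionedState(width=5)
for exp, key in [(12, "a"), (15, "d"), (14, "a"), (11, "f")]:
    state.insert(exp, key)
first = state.expire(12)
second = state.expire(15)
```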

Update-pattern-aware query optimization

• Generate the query plan
• For each operator, choose the best structure based on its input pattern
• Cost model: per-unit-time cost of executing operators, maintaining state, and processing negative tuples

Update-pattern-aware query rewriting

Reminder on query rewriting:
• Selection push-down
• Join ordering

Update-pattern-aware heuristics:
• Weakest-NM push-down: analogous to pushing selections below joins
• Strict-NM pull-up: operators like NEGATION are complex, so pull them up to reduce the number of operators faced with negative tuples
• Push dup-elim below joins to make it a simpler dup-elim*

Intuition: reduce update-pattern complexity in subtrees

Update-pattern-aware query processing

If the query is negation-free, use the direct approach (expiration is cheap thanks to update-pattern-driven data structures).

If the query contains a negation operator, trade off between the two choices based on the frequency of premature expirations.
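To make the trade-off concrete, here is a minimal sketch of the negative-tuple alternative. The window operator emits an explicit negative tuple for every expiration, so every downstream operator processes roughly twice as many tuples as arrive. The direct approach avoids this by letting each operator expire its own state by timestamp. The function names, the `(ts, value)` input format and the COUNT consumer are hypothetical choices for illustration.

```python
from collections import deque


def window_with_negative_tuples(arrivals, length):
    """Time-based sliding window that signals expirations explicitly.

    arrivals: (ts, value) pairs in timestamp order. A tuple that arrived at
    ts_old leaves the window once ts_old + length <= current ts, and a
    negative tuple ('-', ts_old, value) is emitted downstream at that point.
    """
    buffer = deque()
    for ts, value in arrivals:
        while buffer and buffer[0][0] + length <= ts:
            old_ts, old_value = buffer.popleft()
            yield ("-", old_ts, old_value)
        buffer.append((ts, value))
        yield ("+", ts, value)


def count_selected(events, predicate):
    """COUNT(*) over a selection; the predicate runs on + and - tuples alike."""
    count = processed = 0
    for sign, _, value in events:
        processed += 1
        if predicate(value):
            count += 1 if sign == "+" else -1
    return count, processed
```

For arrivals `[(1, 5), (2, 8), (3, 2), (6, 9)]`, a window of length 3 and the predicate `v > 4`, `count_selected` returns `(1, 7)`. Four arrivals produce seven tuples for the selection to evaluate, because three of the arrivals also generate negative tuples.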


Update-pattern-aware query plans

[Figure: two alternative plans over Stream 1, Stream 2, and Stream 3, with edges annotated by update pattern (WKS, WK, STR) and the callout "Join must process many neg. tuples?"]


Experimental Evaluation

NT: negative-tuple plan.
NT(join-up): negative tuples, with the join pulled up.
UTA: join pushed down; negative tuples are not generated.


Evaluation


Experimental Evaluation

NT(join-up) outperforms NF because negation is more selective than the join.

UTA is better for large windows (150Kb) because it avoids generating most negative tuples.

NT(join-up) is good for small windows: the number of negative tuples is small and thus affordable compared with the cost of ordering negation and join non-optimally.


Summary

The monotonic vs. non-monotonic classification is not precise enough: it fails to distinguish between predictable (due to windowing) and unpredictable update patterns.

Update-pattern classification:
Clarifies the semantics of continuous queries that reference tables alongside streams and windows.
Forms the basis of the proposed update-pattern-aware query processor.


Future work

Extend update-pattern-aware query optimization.

Investigate the update patterns of periodically re-executed queries.

Sub-divide queries over count-based windows; for now, they are treated as strict non-monotonic.