Stream Window Aggregation Semantics and...

11
Stream Window Aggregation Semantics and Optimisation Paris Carbone, Asterios Katsifodimos and Seif Haridi Definition Sliding windows are bounded sets which evolve together with an infinite data stream of records. Each new sliding window evicts records from the previous one while introducing newly arrived records as well. Aggregations on windows typically derive some metric such as an average or a sum of a value in each window. The main challenge of applying aggregations to sliding windows is that a naive execution can lead to a high degree of redundant computation due to a large number of common records across different win- dows. Special optimization techniques have been developed throughout the years to tackle redundancy and make sliding window aggregation feasible and more efficient in large data streams. Overview Data stream processing has evolved significantly throughout the years, both in terms of system support and in pro- gramming model primitives. Alongside adopting common data-centric operators from relational algebra and functional programming such as select, join, flatmap, reduce etc., stream processors introduced a new set of primitives that are exclusive to the evolving nature of unbounded data. Stream windows are perhaps the most common and widely studied primitive in stream processing which is used to express computation on continuously evolving subsets out of a possibly never-ending stream. In essence, stream windows grant control on the granularity and the scope of stream aggregations. Several early stream processing systems (e.g., TelegraphCQ (Chan- drasekaran et al 2003), STREAM 1

Transcript of Stream Window Aggregation Semantics and...

Page 1: Stream Window Aggregation Semantics and Optimisationasterios.katsifodimos.com/assets/publications/... · Stream Window Aggregation Semantics and Optimisation Paris Carbone, Asterios

Stream Window Aggregation Semantics andOptimisation

Paris Carbone, Asterios Katsifodimos and Seif Haridi

Definition

Sliding windows are bounded setswhich evolve together with an infinitedata stream of records. Each newsliding window evicts records from theprevious one while introducing newlyarrived records as well. Aggregations onwindows typically derive some metricsuch as an average or a sum of a valuein each window. The main challengeof applying aggregations to slidingwindows is that a naive execution canlead to a high degree of redundantcomputation due to a large number ofcommon records across different win-dows. Special optimization techniqueshave been developed throughout theyears to tackle redundancy and makesliding window aggregation feasible andmore efficient in large data streams.

Overview

Data stream processing has evolvedsignificantly throughout the years, bothin terms of system support and in pro-gramming model primitives. Alongsideadopting common data-centric operatorsfrom relational algebra and functionalprogramming such as select, join,flatmap, reduce etc., stream processorsintroduced a new set of primitives thatare exclusive to the evolving nature ofunbounded data. Stream windows areperhaps the most common and widelystudied primitive in stream processingwhich is used to express computationon continuously evolving subsets outof a possibly never-ending stream. Inessence, stream windows grant controlon the granularity and the scope ofstream aggregations.

Several early stream processingsystems (e.g., TelegraphCQ (Chan-drasekaran et al 2003), STREAM

1

Page 2: Stream Window Aggregation Semantics and Optimisationasterios.katsifodimos.com/assets/publications/... · Stream Window Aggregation Semantics and Optimisation Paris Carbone, Asterios

2 Carbone, Katsifodimos, Haridi

(Arasu et al 2004)) provided supportfor windowing through a predefinedset of primitives to construct time-and count-based sliding windows. Forexample, periodic tumbling and slidingwindows were already standardized asearly as the SQL-99 standard and stud-ied thoroughly in the Continuous QueryLanguage (CQL) (Arasu et al 2006) aswell as the Stream Processing Language(SPL) (Hirzel et al 2009) among others.A tumbling window is a simple case of astream window type, which is defined asa sequence of periodic consecutive setsof records in a stream with a fixed lengththat is termed range. For example, if weassume a stream of car speed events thefollowing simple query in CQL woulddiscretize that stream into windows ofevery 30sec-interval and compute themaximum speed per window:

SELECT max ( speed )from CarEven t s[RANGE 30 Seconds ]

In principle, in tumbling windowseach record can only belong to a singlewindow. As a result, the evaluation ofeach window can be performed triviallyby grouping records by the window theybelong to, and executing each windowaggregation independently. However,sliding windows add a challenging twistto the formula, namely the ’slide’. As anexample consider the following slidingwindow query in SQL-99:

SELECT AVERAGE( speed )FROM CarEven t s[WATTR timestampRANGE 7 minuteSLIDE 3 minute ]

The slide represents ‘when’ or ‘howoften’ a window has to be evaluatedwhile including all records defined in its

range in the computation of the averagespeed. In this sliding window example,there is an overlap of 5 minutes betweeneach consecutive window and thus,a naive execution would result intoredundant operations in its great major-ity. Beyond periodic windows, today’sApache open source systems such asFlink (Carbone et al 2015, 2017), Beamand Apex, provide support for moreadvanced, often user-defined, slidingwindow definitions such as session(Akidau et al 2015) or content-drivenwindows (Bifet and Gavalda 2007),among others.

Sliding windows have their own setof optimization techniques that aimto reduce the redundant computationscaused by the intersection of eventsbetween neighboring windows. In thispaper we categorize optimization tech-niques into slicing, pre-aggregation andhybrid and analyze them throughout therest of this chapter.

Basic Concepts

There are many interpretations of win-dow semantics, from simple range andcount event-time windows (Arasu et al2006) to policy-based (Hirzel et al 2009)and composite event-time windows withretraction for out-of-order processing(Li et al 2008b). The SECRET model(Botan et al 2010) aimed to subsumemost of the windowing semanticsproposed in academia and commercialsystems. However, for the sake ofbrevity, a more simplified description isused here, with a heavy focus on win-dow aggregation, based on the recentwork on the Cutty aggregator (Carbone

Page 3: Stream Window Aggregation Semantics and Optimisationasterios.katsifodimos.com/assets/publications/... · Stream Window Aggregation Semantics and Optimisation Paris Carbone, Asterios

Stream Window Aggregation 3

1 2 3 4 5 6 7 8 9 10 11

1 2 3 4 5

3 4 5 6 7

5 6 7 8 9

fd

Fig. 1: Discretization of a count windowof range 5 and slide 2.

et al 2016) and FlatFat (Tangwongsanet al 2015).

Stream Discretization

Data streams are unbounded sequencesof records which are described by agiven schema T . More formally, astream s ∈ Seq(T ) is a sequence outof all possible sequences Seq(T ) overT . Windows are finite subsequencesreflecting intervals of a stream s. Aninterval s[a,b] is simply a set of recordsfrom index a to b over a stream sand the set of all possible intervalsStr(T )⊂ Seq(T ).

In their most general form, windowscan be derived by discretizing an un-bounded stream. A Discretizertransforms a stream s ∈ Seq(T ) into asequence w ∈ Seq(Str(T )) of (possiblyoverlapping) windows. In the mostsystem-agnostic manner, every possi-ble discretization of a stream can beaided through a discretization functionprovided to a special Discretize orWindowing stream operator.

Definition 1. Discretize : fdisc ×Seq(T )→ Seq(Str(T ))

Figure 1 depicts a simple example ofa discretization function fd applied on astream of elements which forms countwindows with a ‘range’ of 5 records anda ‘slide’ of 2 records.

Aggregating Sliding Windows

Conceptually, in data stream program-ming models an Aggregate operatorcomputes an aggregation on each win-dow derived after a stream discretiza-tion, given an aggregation function fa.

Definition 2. Aggregate : ( fa :Str(T )→ T ′)×Seq(Str(T ))→ Seq(T ′)

Examples of fa is a SUM or AVG butalso more complex aggregations can beexecuted in a window, such as building amachine learning model. Figure 2 showshow aggregates are formed on eachconsecutive window of a discretizedstream. The main reason for optimizingthe window aggregation process stemsfrom two issues that can be observed inthis example. First, a consecutive exe-cution of the aggregation operation afterdiscretization can be inefficient both interms of space needed to log all elementsof each window as windows can be verylarge in size. At the same time, theresponse time has to be minimized wheniterating through all window contents inorder to calculate an aggregate. Finally,and most importantly, as highlighted inFigure 2 sliding windows might involvea large amount of overlapping acrossconsecutive windows. In this examplethere is an overlapping of three recordsbetween every two windows.

The first problem is solved by sim-ply pipelining discretization with aggre-gation and thus, effectively providing a

Page 4: Stream Window Aggregation Semantics and Optimisationasterios.katsifodimos.com/assets/publications/... · Stream Window Aggregation Semantics and Optimisation Paris Carbone, Asterios

4 Carbone, Katsifodimos, Haridi

1 2 3 4 5 6 7 8 9 10 11 …

1 2 3 4 5

3 4 5 6 7

5 6 7 8 9

A1

A2

A3

fa

Fig. 2: Aggregation of a count windowof range 5 and slide 2.

partial evaluation of window aggregates.Partial aggregation is described in sec-tion . The overlapping problem is morecomplex and its optimization techniquesare further classified by window typesinto slicing, general pre-aggregation andhybrid techniques, covered thoroughlyin section , and respectively.

Partial Window Aggregation

A complete partial window aggregationscheme has been proposed in both Flat-Fat (Tangwongsan et al 2015) and Cutty(Carbone et al 2016). According to thatscheme, an aggregation fa can be de-composed into partial aggregation oper-ations in order to pipeline the processduring discretization. A window aggre-gation function is therefore decomposedinto, lift and lower, and a combinefunctions as such:lift : T → A maps an element of awindow to a partial aggregate of type A.combine ⊕ : A×A→ A combines twopartial aggregates into a new partial ag-gregate (equivalent to a reduce func-tion).lower : A→ T ′ maps a partial aggre-gate into an element in the type T ′ ofoutput values.

The main and only requirement forpartial aggregation is to have an associa-tive combine function so that aggrega-tion can be used to evaluate a full win-dow aggregate in discrete steps in dis-crete steps (Arasu and Widom 2004;Krishnamurthy et al 2006; Tangwongsanet al 2015).

An example of partial aggregationof a window is depicted in Figure 3.The goal in this example is to partiallycompute the average value out of a set ofrecords with values 1 to 5. The invariantis that only one partial aggregate is keptin memory (initially an empty aggre-gate). That aggregate is incrementallyupdated by each record that arrives ina window. To compute an average, twovalues have to be maintained in theaggregate type: a sum and a count. Byusing the lift function each record isfirst mapped into an aggregate type of itsvalue and a count of 1. The combinefunction updates the partial aggregatewith the new sum and count until all el-ements of a window have arrived. Thenfinally, the lower function transformsthe aggregate into the window average,in this case this is 3.

Window Slicing

The amount of overlapping across slid-ing windows introduces additional spaceand computational complexity that par-tial aggregation itself cannot solve.

In the case of windows with prede-fined periodic characteristics such as atime or count slide, a family of optimiza-tion techniques are used to further de-compose windows into non-overlappingpartial aggregates which can be sharedand combined to calculate full window

Page 5: Stream Window Aggregation Semantics and Optimisationasterios.katsifodimos.com/assets/publications/... · Stream Window Aggregation Semantics and Optimisation Paris Carbone, Asterios

Stream Window Aggregation 5

1

liftrecord —> (val,count) combine(val1+val2,count1+count2)

lower(val,count) —> val/count

1 2 3 4 5

1. lift

3. lower

A

M (2,1)(1,1) 2. combine M

M

M

M

M

3

record typepartial typeaggregate type

?

Partial WindowAverage Example (3,2)

(1,1)

(3,1)

(6,3) (4,1)

(10,4) (5,1)

(15,5)

stream window window aggregate

15 / 5

empty aggregate

Fig. 3: Partial Aggregation Example for Window Average.

aggregates. This technique is typicallyname ’slicing’ or ’bucketing’. The twomost popular slicing techniques that arehave also been deployed in productionstream processing systems in the past arepanes (Li et al 2005a) and pairs (Krish-namurthy et al 2006).

Panes

The main idea behind Panes is that ifwe have a periodic window query witha fixed slide and a range it is trivial tobreak down the aggregation process intopartials with a constant size, equal tothe greatest common denominator of therespective range and slide. For exam-ple if we have a sliding window with arange of 9 minutes and a slide of 6 min-utes the stream would be sliced and pre-aggregated into buckets, each of whichcorresponds to 3 minutes of the ingestedstream (greatest common denominatorof 6 and 9).

Panes have been criticized (Krish-namurthy et al 2006) for their lack ofgeneral applicability, yielding unbal-anced performance that depends highlyon the combination of range and slide.For example, a window of range 10seconds and a slide of 3 seconds wouldbreak down to slices of a single second,no longer exploiting the amount ofnon-overlapping segments in a stream(as depicted in Figure 4).

Pairs

The pairs technique (Krishnamurthyet al 2006) splits a stream into two al-ternating slices: p2 = range mod slideand p1 = slide − s2. This techniqueutilizes better non-overlapping segmentsin a stream and seems to work wellwith most combinations of range andslide. Intuitively, pairs break slicingonly when a stream window starts orends. This is visualized in Figure 4

Page 6: Stream Window Aggregation Semantics and Optimisationasterios.katsifodimos.com/assets/publications/... · Stream Window Aggregation Semantics and Optimisation Paris Carbone, Asterios

6 Carbone, Katsifodimos, Haridi

Panesgcd(range,slide)

Pairsp2:range%slidep1:slide-p2

Cutty

1 2 3 4 5 6 7 8 9 10 11

12

13 14 15 16 17 18 19

1 2 3 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

12

periodic windows

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

slices

Fig. 4: Example of different slicing techniques

where it results into a lower number ofslices for the same window comparedto using panes. Contrary to panes, pairshas also been proposed for multi-queryaggregation sharing where a largesequence of shared slices is decided atcompilation time across shared slidingwindow aggregation queries in the sameoperator.

Slicing Limitations

While slicing offers the best knownspace and computational performancein sliding window aggregation its appli-cations are limited to periodic windowqueries since this is the only type ofwindows for which their beginning andend are predefined. The Cutty technique(Carbone et al 2016) which is furtheranalyzed in section avoids this strongassumption by letting user-definedfunctions signify during runtime whenwindows start or end. In the same

1 2 3 4 5 6 7 8

3 7 11 15

10 26

36

9 10

19

30

21

arbitrary lookups

(#logn ops)}} n

pre-computed partials

n leaves ~ records}

}2n

Fig. 5: Example of a general pre-aggregation tree

work slicing is also combined withgeneral pre-aggregation techniques (an-alyzed in section ) in order to combinethe strongest characteristics of bothdomains of window optimization.

General Pre-aggregation

The main incentive of general pre-aggregation techniques is to be able toallow for arbitrary segment lookups in a

Page 7: Stream Window Aggregation Semantics and Optimisationasterios.katsifodimos.com/assets/publications/... · Stream Window Aggregation Semantics and Optimisation Paris Carbone, Asterios

Stream Window Aggregation 7

tumbling

single-typeperiodic Punctuation

SnapshotFCF/CF

Lower-Bound

Session

multi-type

ADWIN

Delta-based

FCA

efficientslicing

generic, high-costpre-aggregation

Non-Periodic

Periodic

Fig. 6: Applicability of slicing and gen-eral pre-aggregation

stream (e.g., from external user queries)as depicted in Figure 5.

The earliest work on generalpre-aggregation was presented in B-int (Arasu and Widom 2004) whichpre-computes eagerly higher orderpartials on different segments of astream. The application of B-int wasmeant to be fast aggregate retrievalfor ad-hoc stream queries (i.e., usingCQL), however, the applicability ofgeneral pre-aggregation makes suchtechniques convenient for aggregatingcontinuous sliding windows without aknown range or slide or other periodicityassumptions. The Reactive Aggregatorby IBM Research (Tangwongsan et al2015) exploits the properties of B-Intand introduces FlatFat: a fixed sizecircular heap binary tree of higher orderpartials that “slides” together with therecords of the stream.

General Pre-AggregationLimitations

General pre-aggregation offers fast re-trievals of arbitrary windows on a streamat the cost of additional space and in-cremental update computation require-

ments. That is due to the fact that everytime a new aggregate is added to the bi-nary tree a sum of log(N) additional par-tial aggregations need to be employed inorder to update all higher order aggre-gates of the tree (given N: the numberof active records/leaves in the tree). Insummary, the runtime costs of employ-ing eager-aggregation, which are also vi-sualized in Figure 5 are the following:space: 2N partials need to always bekept on heap to hold the full aggregationbinary tree.update/lookup: both update and fullwindow lookup have O(logN) computa-tional complexity. In the case of updatesthe complexity is fixed (logN) againstslicing which typically involves a singleaggregation per record.

All these costs pose an interestingtrade-off when eager aggregation isemployed per-record in a data stream,which often results in more operationsthan a naive execution of each redun-dant window aggregation separately.As a result, it is likely that if generalpre-aggregation is de-facto applied,its runtime cost would never be amor-tized across a full run of a continuousapplication.

Nevertheless, the power of generalpre-aggregation lies at the observationthat it can be generally applied to anytype of windows, thus, covering a largespace of non-periodic window typesused within research and industrialapplications, as depicted in Figure 6.

Cutty: A Hybrid Approach

Slicing and general pre-aggregationare orthogonal techniques that can bepotentially combined to support a wider

Page 8: Stream Window Aggregation Semantics and Optimisationasterios.katsifodimos.com/assets/publications/... · Stream Window Aggregation Semantics and Optimisation Paris Carbone, Asterios

8 Carbone, Katsifodimos, Haridi

tumbling

single-typeperiodic Punctuation

SnapshotFCF/CF

Lower-Bound

Session

multi-type

ADWIN

Delta-based

FCA

efficientslicing

generic, high-costpre-aggregation

Non-DeterministicDeterministic

Fig. 7: Visualizing the expressive powerof Deterministic Windows for EfficientAggregation

variety of stream windows for aggre-gation. The Cutty aggregator (Carboneet al 2016) employs such a hybridapproach that can lead to efficientaggregation of a broader number ofwindow types than simply periodic. Themain observation behind its design isthat there is an implicit class of windows(a superclass of periodic ones), termeddeterministic which can be obtainedby using the right core primitives inthe programming model. Deterministicwindows can be used to enable efficientshared aggregation without limitingwindow expressivity. Figure 7 showsthe expressive power of deterministicwindows, being able to provide optimalpre-aggregation to more than the limitedperiodic windows.

Deterministic Windows

The concept of deterministic windowsstems from the observation that all ittakes to achieve optimal slicing is notthe apriori knowledge of the periodicityof windows (if any) but the runtimeknowledge of where a new windowstarts. While partial aggregation isemployed, as described in section a

Fig. 8: An Overview Example of Cutty

single partial result can be kept in activememory until we receive a record thatmarks the beginning of a new window.Conceptually, a new window startmeans that we will later need a partialaggregate (or slice) starting at that pointto evaluate the window that startedthere.

Cutty proposes user-defined window-ing through the use of a discretizationfunction fdisk, defined as follows:

fdisc : T → 〈Wbegin : N,Wend : N〉

where for each record r ∈ T : i) Wbegin isthe number of windows beginning with rand ii) Wend the number of windows end-ing with r.

Overview of Cutty

Optimal slicing with deterministic win-dows minimizes but does not eliminateredundancy. As an example, considerthe slices produced by Cutty during theexample execution of Figure 4. A de-

Page 9: Stream Window Aggregation Semantics and Optimisationasterios.katsifodimos.com/assets/publications/... · Stream Window Aggregation Semantics and Optimisation Paris Carbone, Asterios

Stream Window Aggregation 9

tailed observation of the slices used perwindow reveals a level of redundancythat cannot be handled by slicing. Forexample, slices s[4,6] and s[7,9] wouldhave to be combined together twice:once for computing fa(s[1,10]) itself,and once for computing fa(s[4,13]).Instead, if somehow the evaluation offa(s[4,9]) was stored, it would not benecessary to repeat that aggregation.

Cutty utilizes general pre-aggregation(FlatFat) in order to further reduce thecost of full window aggregate evaluationand compute higher order combina-tions of aggregates only once. Thisidea gives a new purpose to generalpre-aggregation techniques and grants alow memory footprint since the spacecomplexity of the aggregation tree isbounded by the number of active slices(which is equivalent to the minimumnumber of non-overlapping segments ina stream). A full example run of Cuttyis visualized in Figure 8, showing bothslicing of deterministic windows andgeneral pre-aggregation and evaluationon stored partials. In the same examplenotice that the partial P(s[6,7]) iscomputed once and reused for bothfa(s[1,5]) and fa(s[1,6]).

For non-deterministic windows,Cutty falls back to general pre-aggregation (simply using FlatFat)since slicing cannot be applied. Nev-ertheless, the generality and flexible(runtime specific) nature of this aggrega-tion technique also enables the prospectof using it for applying operator sharing(Hirzel et al 2014) on data streams.A full complexity and performanceanalysis and comparison are provided inthe original Cutty paper (Carbone et al2016).

Further Works

Window aggregation is an interesting re-search topic and there are many relevantproposed ideas to the ones presentedhere. For example, Li et al. (Li et al2005b, 2008a) classified window typesby their evaluation context require-ments, leaving the characterization ofeach class as an open research question.Performing certain types of aggregatesin constant-time was recently proposed(Tangwongsan et al 2017). Determin-istic functions in Cutty subsume allforward-context-free windows (no fu-ture records are required to know when awindow starts), while non-deterministicdiscretization functions are forward-context-aware. Heuristic-based planoptimizers have also been proposed(e.g., TriWeave(Guirguis et al 2012)) tofine-tune the execution of periodic timequeries dynamically using runtime met-rics (i.e. input rate and shared aggregaterate).

Future Directions

Windowing semantics are becomingincreasingly more complex and so-phisticated as data stream processingis widely adopted. Aggregation tech-niques will have to follow the trendsin windowing semantics and adapt tomore dynamic, data-centric windowtypes. One of the most prominent futuredirections in stream windowing is itsstandardization and encapsulation instream SQL standards that are undergo-ing in open-source communities (e.g.,the Calcite project and Google Dataflow(Akidau et al 2015)). However, nosignificant efforts have been made to

Page 10: Stream Window Aggregation Semantics and Optimisationasterios.katsifodimos.com/assets/publications/... · Stream Window Aggregation Semantics and Optimisation Paris Carbone, Asterios

10 Carbone, Katsifodimos, Haridi

apply relational optimizations in streamwindowing. Another future directionis to extend slicing capabilities beyonddeterministic windows (if possible)and cover cases of fully data-drivenwindows without FIFO guarantees suchas ADWIN (Bifet and Gavalda 2007).Finally, general pre-aggregation datastructures have to employ the notionof out-of-orderness (Traub et al 2018).Currently, with existing out-of-the-boxsolution such as FlatFat it is not possibleto retract already evaluated windowaggregates, thus, making it impossibleto use for systems like Beam and Flinkwith out-of-order logic.

Conclusions

Windows over streaming data continueto be the most central abstraction indata stream processing. Aggregationtechniques aim to reduce the compu-tational redundancy to the maximumextent possible for sliding windows.Most often, approaches to efficientaggregation are entangled with actualwindowing semantics, such as assumingperiodic queries to provide efficientpre-aggregation. Slicing techniquesprovide low memory footprint andgenerally good performance at the costof limited applicability while generalpre-aggregation techniques can beemployed for any window lookup at thecost of high computational and memoryfootpring. Recent approaches aim for ahybrid solution by generalizing slicingfurther while combining data structuresfrom general pre-aggregation. Slidingwindow aggregation remains a challeng-ing topic today, and new challenges willarise with the adoption of richer and

more complex windowing semanticsand out-of-order streams.

References

Akidau T, Bradshaw R, Chambers C, ChernyakS, Fernandez-Moctezuma RJ, Lax R,McVeety S, Mills D, Perry F, Schmidt E,et al (2015) The dataflow model: a practicalapproach to balancing correctness, latency,and cost in massive-scale, unbounded,out-of-order data processing. In: VLDB

Arasu A, Widom J (2004) Resource sharing incontinuous sliding-window aggregates. In:VLDB

Arasu A, Babcock B, Babu S, Cieslewicz J,Datar M, Ito K, Motwani R, Srivastava U,Widom J (2004) Stream: The stanford datastream management system. Book chapter

Arasu A, Babu S, Widom J (2006) The CQLcontinuous query language: semantic foun-dations and query execution. VLDBJ

Bifet A, Gavalda R (2007) Learning from time-changing data with adaptive windowing. In:SDM, SIAM

Botan I, Derakhshan R, Dindar N, Haas L,Miller RJ, Tatbul N (2010) Secret: A modelfor analysis of the execution semantics ofstream processing systems. In: VLDB

Carbone P, Katsifodimos A, Ewen S, Markl V,Haridi S, Tzoumas K (2015) Apache flink:Stream and batch processing in a single en-gine. Bulletin of the IEEE Computer SocietyTechnical Committee on Data Engineering36(4)

Carbone P, Traub J, Katsifodimos A, Haridi S,Markl V (2016) Cutty: Aggregate sharingfor user-defined windows. In: Proceedingsof the 25th ACM International on Confer-ence on Information and Knowledge Man-agement, ACM

Carbone P, Ewen S, Fora G, Haridi S, RichterS, Tzoumas K (2017) State managementin apache flink R©: consistent stateful dis-tributed stream processing. Proceedings ofthe VLDB Endowment 10(12):1718–1729

Chandrasekaran S, Cooper O, Deshpande A,Franklin MJ, Hellerstein JM, Hong W,Krishnamurthy S, Madden SR, Reiss F,Shah MA (2003) TelegraphCQ: continuousdataflow processing. In: Proceedings of the

Page 11: Stream Window Aggregation Semantics and Optimisationasterios.katsifodimos.com/assets/publications/... · Stream Window Aggregation Semantics and Optimisation Paris Carbone, Asterios

Stream Window Aggregation 11

2003 ACM SIGMOD international confer-ence on Management of data, ACM, pp668–668

Guirguis S, Sharaf MA, Chrysanthis PK,Labrinidis A (2012) Three-level processingof multiple aggregate continuous queries.In: IEEE ICDE

Hirzel M, Andrade H, Gedik B, Kumar V, LosaG, Nasgaard M, Soule R, Wu K (2009)SPL stream processing language specifi-cation. NewYork: IBMResearchDivisionTJWatsonResearchCenter, IBM ResearchRe-port: RC24897 (W0911 044)

Hirzel M, Soule R, Schneider S, Gedik B,Grimm R (2014) A catalog of streamprocessing optimizations. ACM ComputingSurveys (CSUR) 46(4):46

Krishnamurthy S, Wu C, Franklin M (2006)On-the-fly sharing for streamed aggrega-tion. In: AMC SIGMOD

Li J, Maier D, Tufte K, Papadimos V, Tucker PA(2005a) No pane, no gain: efficient evalua-tion of sliding-window aggregates over datastreams. ACM SIGMOD Record

Li J, Maier D, Tufte K, Papadimos V, TuckerPA (2005b) Semantics and evaluation tech-niques for window aggregates in datastreams. In: ACM SIGMOD

Li J, Tufte K, Maier D, Papadimos V (2008a)Adaptwid: An adaptive, memory-efficientwindow aggregation implementation. IEEEInternet Computing

Li J, Tufte K, Shkapenyuk V, Papadimos V,Johnson T, Maier D (2008b) Out-of-orderprocessing: a new architecture for high-performance stream systems. Proceedingsof the VLDB Endowment 1(1):274–288

Tangwongsan K, Hirzel M, Schneider S, Wu KL(2015) General incremental sliding-windowaggregation. In: VLDB

Tangwongsan K, Hirzel M, Schneider S (2017)Low-latency sliding-window aggregation inworst-case constant time. In: Proceedingsof the 11th ACM International Conferenceon Distributed and Event-based Systems,ACM, pp 66–77

Traub J, Grulich P, Rodriguez Cuellar A, BressS, Katsifodimos A, Rable T, Markl V (2018)Scotty: Efficient window aggregation forout-of-order stream processing. In: Data En-gineering (ICDE), 2012 IEEE 34th Interna-tional Conference on, IEEE