CLOUD-BASED
DATA STREAM PROCESSING
Thomas Heinze, Leonardo Aniello, Leonardo Querzoni, Zbigniew Jerzak
DEBS 2014
Mumbai 26/5/2014
TUTORIAL GOALS
Data stream processing (DSP) was once considered a solution only for very specific problems.
Financial trading
Logistics tracking
Factory monitoring
Today the potential of DSPs is being exploited in more general settings.
DSP : online processing = MapReduce : batch processing
Will DSPs be offered as-a-service by cloud-based providers?
25/05/14 Cloud-Based Data Stream Processing
TUTORIAL GOALS
Here we present an overview of current research trends in data streaming systems
How they address the requirements imposed by recent use cases
How they are moving toward cloud platforms
Two focus areas:
Scalability
Fault Tolerance
THE AUTHORS
Thomas Heinze, PhD student @ SAP
PhD topic: Elastic Data Stream Processing
Leonardo Aniello, research fellow @ Sapienza Univ. of Rome
Leonardo Querzoni, assistant professor @ Sapienza Univ. of Rome
Zbigniew Jerzak, senior researcher @ SAP
Co-Organizer of DEBS Grand Challenge
OUTLINE
Historical overview of data stream processing
Use cases for cloud-based data stream
processing
How DSPs work: Storm as an example
Scalability methods
Fault tolerance integrated in modern DSPs
Future research directions and conclusions
CLOUD-BASED
DATA STREAM PROCESSING
Historical Overview
DATA STREAM PROCESSING ENGINE
A piece of software that
continuously computes results for long-standing queries
over potentially infinite data streams
using operators
algebraic (filter, join, aggregation)
user-defined
that can be stateless or stateful
Source → Aggr → Sink
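The stateless/stateful distinction can be sketched in plain Java. This is an illustrative sketch, not any engine's API; all class and method names are made up:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Illustrative sketch of the two operator kinds named on this slide:
// a stateless filter and a stateful aggregation over a (finite) stream.
public class OperatorSketch {
    // Stateless: each event is handled independently, no state is kept.
    static <T> List<T> filter(List<T> stream, Predicate<T> p) {
        List<T> out = new ArrayList<>();
        for (T e : stream) if (p.test(e)) out.add(e);
        return out;
    }

    // Stateful: a running sum is kept across events (the operator's state).
    static int aggregate(List<Integer> stream) {
        int sum = 0;               // operator state, survives across events
        for (int e : stream) sum += e;
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(1, 2, 3, 4);        // "Source"
        List<Integer> filtered = filter(source, x -> x % 2 == 0); // filter
        System.out.println(aggregate(filtered));                  // "Aggr" -> "Sink"
    }
}
```

A real engine applies the same two operator kinds continuously, event by event, instead of over a finite list.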
REQUIREMENTS[1]
Rule 1 - Keep the data moving
Rule 2 - Query using SQL on stream (StreamSQL)
Rule 3 - Handle stream imperfections (delayed, missing and out-of-order data)
Rule 4 - Generate predictable outcomes
Rule 5 - Integrate stored and streaming data
Rule 6 - Guarantee data safety and availability
Rule 7 - Partition and scale applications automatically
Rule 8 - Process and respond instantaneously
[1] M. Stonebraker, U. Cetintemel and S. Zdonik. "The 8 Requirements of real-time Stream Processing", ACM
SIGMOD Record, 2005.
THREE GENERATIONS
First Generation
Extensions to existing database engines or simplistic engines
Dedicated to specific use cases
Second Generation
Enhanced methods regarding language expressiveness, load
balancing, fault tolerance
Third Generation
Dedicated to the trend of cloud computing; designed for massive parallelization
GENEALOGY
FIRST GENERATION THIRD GENERATION SECOND GENERATION
Aurora (2002) Medusa (2004)
Borealis (2005)
TelegraphCQ (2003)
NiagaraCQ (2001)
STREAM (2003)
StreamMapReduce(2011)
StreamMine3G (2013)
Yahoo! S4(2010)
Apache S4(2012)
Twitter Storm (2011) Trident (2013)
StreamCloud(2010)
Elastic StreamCloud (2012)
SPADE(2008)
Elastic Operator(2009)
InfoSphere Streams(2010)
FLUX (2002)
GigaScope (2002)
Spark Streaming (2010) D-Streams(2013)
Cayuga (2007)
SASE (2008)
ZStream (2009)
CAPE (2004)
D-CAPE (2005)
SGuard (2008)
PIPES (2004) HybMig (2006)
StreamInsight (2010)
TimeStream (2013) CEDR (2005)
SEEP (2013)
Esper (2006)
Truviso (2006)
StreamBase (2003)
Coral8 (2005)
Aleri (2006) Sybase CEP (2010)
Oracle CQL (2006) Oracle CEP (2009)
SAP ESP (2012)
DejaVu (2009)
System S(2005) LAAR(2014)
StreamIt (2002)
Flextream(2007)
Elastic System S(2013)
Cisco Prime (2012)
RTM Analyzer (2008)
webMethods Business Events(2011)
TIBCO StreamBase (2013)
TIBCO BusinessEvents(2008)
NILE (2004)
NILE-PDT (2005)
RLD (2013)
MillWheel(2013)
Johka (2007)
FIRST GENERATION - TELEGRAPH CQ[2]
Data stream processing engine built on top of
Postgres DB
[2] S. Chandrasekaran et al. "TelegraphCQ: Continuous Dataflow Processing", SIGMOD, 2003.
Source: S. Chandrasekaran et al. "TelegraphCQ: Continuous Dataflow Processing", SIGMOD, 2003.
SECOND GENERATION - BOREALIS[3]
Joint research project of Brandeis University,
Brown University and MIT
Duration: ~3 years
Enabled experimentation with several techniques
[3] D. J. Abadi et al. "The Design of the Borealis Stream Processing Engine", CIDR 2005.
BOREALIS MAIN NOVELTIES
Load-aware Distribution
Fine-grained High-availability
Load Shedding mechanisms
Source: D. J. Abadi et al. "The Design of the Borealis Stream Processing Engine", CIDR 2005.
THIRD GENERATION
Novel use cases are driving toward
Unprecedented levels of parallelism
Efficient fault tolerance
Dynamic scalability
Etc.
CLOUD-BASED
DATA STREAM PROCESSING
Use Cases for Cloud-Based Data
Stream Processing
SCENARIOS FOR THIRD GENERATION DSPS
Tracking of query trend evolution in Google
Analysis of popular queries submitted to Twitter
User profiling at Yahoo! based on submitted queries
Bus routing monitoring and management in Dublin
Sentiment analysis on multiple tweet streams
Fraud monitoring in cellular telephony
QUERY MONITORING AT GOOGLE
Analyze queries submitted to Google search engine to
create a query historical model
Runs as Zeitgeist on top of MillWheel[4]
Incoming searches are organized in 1-second buckets
Buckets are compared to historical data
Useful to promptly detect anomalies (spikes/dips)
[4] Akidau, Balikov, Bekiroğlu, Chernyak, Haberman, Lax, McVeety, Mills, Nordstrom, Whittle. MillWheel: Fault-tolerant
Stream Processing at Internet Scale.VLDB, 2013.
POPULAR QUERY ANALYSIS AT TWITTER
When a relevant event happens, an increase occurs in the
number of queries submitted to Twitter[5]
These queries have to be correlated in real-time with
tweets (2.1 billion tweets per day)
Runs on Storm
These spikes are likely to fade away in a limited amount of
time
Useful to improve the accuracy of popular queries
[5] Improving Twitter Search with Real-time Human Computation, 2013 http://blog.echen.me/2013/01/08/improving-
twitter-search-with-real-time-human-computation
POPULAR QUERY ANALYSIS AT TWITTER
Problem: sudden peak of queries about a new (never seen before) event
How to properly assign the right semantics (categorization) to the queries?
Example
Solution
Employ human evaluation (Amazon's Mechanical Turk service) to categorize queries unseen so far
Incorporate such categorizations into backend models
how can we understand that #bindersfullofwomen refers to politics?
POPULAR QUERY ANALYSIS AT TWITTER
Figure: the search engine's query log feeds a Kafka queue, read by a Storm spout; a bolt emits (query, ts) tuples and detects popular queries, which go to human evaluation; the resulting categorizations of queries drive engine tuning
USER PROFILING AT YAHOO!
Queries submitted by users are evaluated (thousands of queries per second from millions of users)
Runs on S4[6]
Useful to generate highly personalized advertising
[6] L. Neumeyer, B. Robbins, A. Nair, and A. Kesari. S4: Distributed Stream Computing Platform.ICDMW, 2010.
BUS ROUTING MANAGEMENT IN DUBLIN
Tracking of bus locations (1000 buses) to improve
public transportation for 1.2 million citizens[7]
Runs on System S
Position tracking by GPS signals to provide real-time
traffic information monitoring
Useful to predict arrival times and suggest better routes
[7] Dublin City Council - Traffic Flow Improved by Big Data Analytics Used to Predict Bus Arrival and Transit Times.
IBM, 2013. http://www-03.ibm.com/software/businesscasestudies/en/us/corp?docid=RNAE-9C9PN5
SENTIMENT ANALYSIS ON TWEET STREAMS
Tweet streams flowing at high rates (10K tweet/s)
Sentiment analysis in real-time
Limited computation time (latency up to 2 seconds)
Runs on Timestream[8]
Useful to continuously estimate the mood about specific
topics
[8] Z. Qian, Y. He, C. Su, Z. Wu, H. Zhu, T. Zhang, L. Zhou, Y. Yu, Z. Zhang. Timestream: Reliable Stream
Computation in the Cloud. Eurosys, 2013.
FRAUD MONITORING FOR MOBILE CALLS
Fraud detection by real-time processing of Call Detail
Records (10k-50k CDR/s)
Requires self-joins over large time windows (queries on
millions of CDRs)
Runs on StreamCloud[9]
Useful to reactively spot dishonest behaviors
[9] V. Gulisano, R. Jimenez-Peris, M. Patino-Martinez, C. Soriente, and P. Valduriez. Streamcloud: An Elastic and Scalable
Data Streaming System. IEEE TPDS, 2012.
REQUIREMENTS
Real-time and continuous complex analysis of heavy data streams More than 10k event/s
Limited computation latency Up to few seconds
Correlation of new and historical data Computations have a state to be kept
Input load varies considerably over time Computations have to adapt dynamically (Scalability)
Hardware and Software failures can occur Computations have to transparently tolerate faults with limited
performance degradation (Fault Tolerance)
CLOUD-BASED
DATA STREAM PROCESSING
How DSPs work: Storm as an example
STORM
Storm is an open source distributed realtime
computation system
Provides abstractions for implementing event-based
computations over a cluster of physical nodes
Manages high throughput data streams
Performs parallel computations on them
It can be effectively used to design complex
event-driven applications on intense streams of
data
STORM
Originally developed by Nathan Marz and team
at BackType, then acquired by Twitter, now an
Apache Incubator project.
Currently used by Twitter, Groupon, The
Weather Channel, Taobao, etc.
Design goals:
Guaranteed data processing
Horizontal scalability
Fault Tolerance
STORM
An application is represented by a topology:
Figure: a topology, a directed graph connecting spouts (stream sources) and bolts (processing operators)
OPERATOR EXPRESSIVENESS
Storm is designed for custom operator definition
Bolts can be designed as POJOs adhering to a specific
interface
Implementations in other languages are feasible
Trident[10] offers a higher-level programming interface
[10] N. Marz. Trident tutorial. https://github.com/nathanmarz/storm/wiki/Trident-tutorial, 2013.
OPERATOR EXPRESSIVENESS
Operators are connected through groupings:
Shuffle grouping
Fields grouping
All grouping
Global grouping
None grouping
Direct grouping
Local or shuffle grouping
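The routing behavior of two common groupings can be sketched in plain Java. This is an illustrative simplification, not Storm's grouping API; the helper names are made up:

```java
// Illustrative sketch of how two groupings pick a downstream task.
// Fields grouping hashes the grouping field, so equal keys always reach
// the same task (needed by stateful bolts); shuffle grouping spreads load.
public class GroupingSketch {
    // Fields grouping: deterministic, key-based routing.
    static int fieldsGrouping(Object key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    // Shuffle grouping, here modeled as round robin (random in practice).
    static int shuffleGrouping(int tupleCounter, int numTasks) {
        return tupleCounter % numTasks;
    }

    public static void main(String[] args) {
        // The same word is always routed to the same of 4 counter tasks.
        System.out.println(
            fieldsGrouping("stay", 4) == fieldsGrouping("stay", 4)); // true
    }
}
```

All grouping, global grouping, direct grouping, etc. follow the same idea with different task-selection rules.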
PARALLELIZATION
Storm asks the developer to provide “parallelism
hints” in the topology
Figure: the topology annotated with parallelism hints (x3, x8, x5)
STORM INTERNALS
A Storm cluster consists of a Nimbus node and n worker nodes
TOPOLOGY EXECUTION
A topology is run by submitting it to Nimbus
Nimbus allocates the execution of components (spouts and bolts) to the worker nodes using a scheduler
Each component has multiple instances (parallelism)
Each instance is mapped to an executor
A worker is instantiated whenever the hosting node must run executors for the submitted topology
Each worker node locally manages incoming/outgoing streams and local computation
The local supervisor takes care that everything runs as prescribed
Nimbus monitors worker nodes during execution to manage potential failures and the current resource usage
TOPOLOGY EXECUTION
Storm’s default scheduler (EvenScheduler)
applies a simple round robin strategy
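The round-robin strategy can be sketched as follows. This is an illustrative sketch, not the actual EvenScheduler code; executor and slot names are made up:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative round-robin assignment of executors to worker slots,
// in the spirit of the default scheduler described on this slide.
public class RoundRobinScheduler {
    static Map<String, List<String>> schedule(List<String> executors,
                                              List<String> slots) {
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String slot : slots) assignment.put(slot, new ArrayList<>());
        for (int i = 0; i < executors.size(); i++) {
            String slot = slots.get(i % slots.size());  // deal round robin
            assignment.get(slot).add(executors.get(i));
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(schedule(
            Arrays.asList("spout-1", "bolt-1", "bolt-2", "bolt-3"),
            Arrays.asList("worker-A", "worker-B")));
    }
}
```

Round robin balances executor counts but ignores actual load, which motivates the adaptive schedulers discussed later.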
A PRACTICAL EXAMPLE
Word count: the HelloWorld for DSPs
Input: stream of text (e.g. from documents)
Output: number of appearances of each word
Source: http://wpcertification.blogspot.it/2014/02/helloworld-apache-storm-word-counter.html
Figure: HelloStorm topology: TXT docs → LineReaderSpout → (text lines) → WordSplitterBolt → (words) → WordCounterBolt
A PRACTICAL EXAMPLE
LineReaderSpout: reads docs and creates tuples
public class LineReaderSpout implements IRichSpout {
    private TopologyContext context;
    private FileReader fileReader;
    private SpoutOutputCollector collector;
    private boolean completed = false;

    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.context = context;
        try {
            this.fileReader = new FileReader(conf.get("inputFile").toString());
        } catch (FileNotFoundException e) {
            throw new RuntimeException("Error opening input file", e);
        }
        this.collector = collector;
    }

    public void nextTuple() {
        if (completed) {
            // file fully read: sleep instead of busy-waiting
            try { Thread.sleep(1000); } catch (InterruptedException e) { }
            return;
        }
        try {
            String str;
            BufferedReader reader = new BufferedReader(fileReader);
            while ((str = reader.readLine()) != null) {
                // one tuple per line; the line itself is used as message id
                this.collector.emit(new Values(str), str);
            }
        } catch (IOException e) {
            throw new RuntimeException("Error reading line", e);
        } finally {
            completed = true;
        }
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("line"));
    }

    public void close() {
        try { fileReader.close(); } catch (IOException e) { }
    }
}
A PRACTICAL EXAMPLE
WordSplitterBolt: cuts lines in words
public class WordSplitterBolt implements IRichBolt {
    private OutputCollector collector;

    public void prepare(Map stormConf, TopologyContext context, OutputCollector c) {
        this.collector = c;
    }
public void execute(Tuple input) {
String sentence = input.getString(0);
String[] words = sentence.split(" ");
for(String word: words){
word = word.trim();
if(!word.isEmpty()){
word = word.toLowerCase();
collector.emit(new Values(word));
}
}
collector.ack(input);
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word"));
}
}
A PRACTICAL EXAMPLE
WordCounterBolt: counts word occurrences

public class WordCounterBolt implements IRichBolt {
    private Map<String, Integer> counters;
    private OutputCollector collector;

    public void prepare(Map stormConf, TopologyContext context, OutputCollector c) {
        this.counters = new HashMap<String, Integer>();
        this.collector = c;
    }
public void execute(Tuple input) {
String str = input.getString(0);
if(!counters.containsKey(str)){
counters.put(str, 1);
} else {
Integer c = counters.get(str) +1;
counters.put(str, c);
}
collector.ack(input);
}
public void cleanup() {
for (Map.Entry<String, Integer> entry : counters.entrySet()){
System.out.println (entry.getKey()+" : " + entry.getValue());
}
}
}
A PRACTICAL EXAMPLE
HelloStorm: contains the topology definition

public class HelloStorm {
    public static void main(String[] args) throws Exception {
Config config = new Config();
config.put("inputFile", args[0]);
config.setDebug(true);
config.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("line-reader-spout", new LineReaderSpout());
builder.setBolt("word-splitter", new WordSplitterBolt()).shuffleGrouping(
    "line-reader-spout");
builder.setBolt("word-counter", new WordCounterBolt()).shuffleGrouping(
    "word-splitter");
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("HelloStorm", config, builder.createTopology());
Thread.sleep(10000);
cluster.shutdown();
}
}
CLOUD-BASED
DATA STREAM PROCESSING
Scalability
PARTITIONING SCHEMES
Data Parallelism:
How to parallelize the execution of an operator?
How to detect the optimal level of parallelization?
Operator Distribution:
How to distribute the load across available hosts?
How to achieve a load balance between these
machines?
REQUIREMENTS FOR DATA PARALLELISM
Transparent to the user
Correct results in correct order (identical to sequential execution)
Figure: a Filter operator parallelized via round-robin partitioning and merged with a Union; an Aggr operator partitioned by hash on groupID and merged with a Merge & Sort step
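The Merge & Sort step can be sketched as a timestamp-ordered merge of the partitioned outputs. This is illustrative Java, assuming each partition's output is already ordered by timestamp:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Illustrative merge of parallel partition outputs: each event is
// {timestamp, value}; picking the globally smallest timestamp at every
// step reproduces the order a sequential execution would have produced.
public class OrderedMerge {
    static List<long[]> merge(List<Deque<long[]>> partitions) {
        List<long[]> out = new ArrayList<>();
        while (true) {
            Deque<long[]> best = null;
            for (Deque<long[]> p : partitions)
                if (!p.isEmpty() && (best == null || p.peek()[0] < best.peek()[0]))
                    best = p;
            if (best == null) break;       // all partitions drained
            out.add(best.poll());          // emit smallest timestamp next
        }
        return out;
    }

    public static void main(String[] args) {
        Deque<long[]> p1 = new ArrayDeque<>(Arrays.asList(new long[]{1, 10}, new long[]{4, 40}));
        Deque<long[]> p2 = new ArrayDeque<>(Arrays.asList(new long[]{2, 20}, new long[]{3, 30}));
        for (long[] e : merge(Arrays.asList(p1, p2)))
            System.out.print(e[0] + " ");  // timestamps come out in order
    }
}
```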
DATA PARALLELISM
First presented by FLUX[11] and Borealis[12]
Explicitly done using partitioning operators
User needs to decide:
partitioning scheme
merging scheme
level of parallelism
[11] M. Shah, et al. "Flux: An adaptive partitioning operator for continuous query systems." In ICDE, 2003.
[12] D. Abadi, et al. "The Design of the Borealis Stream Processing Engine." In CIDR, 2005..
PARALLELISM FOR CLOUD-BASED DSP
Massive parallelization (>100 partitions)
Support custom operators
Adapt parallelization level to the workload
without user interaction
DATA PARALLELISM IN STORM
User defines the number of parallel tasks
Storm supports different partitioning schemes
(aka groupings):
Shuffle grouping, Fields grouping, All grouping, Global
grouping, None grouping, Direct grouping, Local or
shuffle grouping, Custom
MAPREDUCE FOR STREAMING [13,14]
Extend MapReduce model for streaming data:
Break strict phases
Introduce Stateful Reducer
Figure: Input → Map (three instances) → Reduce (two stateful instances) → Output
[13] T. Condie, et al. "MapReduce Online." In NSDI,2010.
[14] A. Brito, et al. "Scalable and low-latency data processing with StreamMapReduce." CloudCom, 2011.
STATEFUL REDUCER[14]
reduceInit(k1) {
    // custom user class
    S = new State(); S.sum = 0; S.count = 0;
    // object S is now associated with key k1
    return S;
}
reduce(Key k, <List new, List expired, List window, UserState S>) {
    For v in expired do:
        // Remove contribution of expired events
        S.sum -= v; S.count--;
    For v in new do:
        // Add contribution of new events
        S.sum += v; S.count++;
    send(k, S.sum/S.count);
}
[14] A. Brito, et al. "Scalable and low-latency data processing with StreamMapReduce." CloudCom, 2011.
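The pseudocode above can be rendered as runnable Java. This is an illustrative sketch of the windowed sum/count state, not the StreamMapReduce API:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Per-key window state with a running sum and count: expired events leave
// the window and are subtracted, new events are added, and the running
// average is emitted (here, returned).
public class StatefulReducer {
    static class State { long sum = 0; long count = 0; }   // reduceInit

    static double reduce(State s, List<Integer> added, List<Integer> expired) {
        for (int v : expired) { s.sum -= v; s.count--; }   // leave window
        for (int v : added)   { s.sum += v; s.count++; }   // enter window
        return (double) s.sum / s.count;                   // emitted result
    }

    public static void main(String[] args) {
        State s = new State();
        // Window {2, 4} -> average 3.0
        System.out.println(reduce(s, Arrays.asList(2, 4), Collections.emptyList()));
        // 2 expires, 6 arrives: window {4, 6} -> average 5.0
        System.out.println(reduce(s, Arrays.asList(6), Arrays.asList(2)));
    }
}
```

Keeping only sum and count (rather than the full window contents) is what makes the reducer's state small and cheap to migrate.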
AUTO PARALLELIZATION[15,16]
Goal:
detect parallelizable regions in operator graphs with user-defined operators
Runtime support for enforcing safety conditions
Safety Conditions:
For an operator: no state or partitioned state, selectivity < 1, at most one predecessor/successor
For a parallel region: compatible keys, forwarded keys, region-local fusion dependencies
[15] S Schneider, et al. "Auto-parallelizing stateful distributed streaming applications." In PACT, 2012.
[16] Wu et al. “Parallelizing Stateful Operators in a Distributed Stream Processing System: How, Should You and How
Much?” In DEBS, 2012.
AUTO PARALLELIZATION
Compiler-based Approach:
Characterize each operator based on a set of criteria (state type, selectivity, forwarding)
Merge different operators into parallel regions (based on a left-first approach)
Best parallelization strategy is applied
automatically
Level of parallelism is decided on job admission
ELASTICITY[17]
Goal: react to unpredicted load peaks & reduce the amount of idling resources.
Figure: with a time-varying workload, static provisioning alternates between overprovisioning and underprovisioning; elastic provisioning tracks the load
[17] M. Armbrust, et al. "A view of cloud computing." Communications of the ACM, 2010.
DIFFERENT TYPES OF ELASTICITY
Vertical Elasticity (Scale up)
Adapt the number of threads per operator based
on the workload.
Horizontal Elasticity (Scale out)
Adapt the number of hosts based on the workload.
ELASTIC OPERATOR EXECUTION[18]
Dynamically adapt number of threads for a
stateless operator based on the workload
Limitations: works only on thread-level and for
stateless operators
[18] S. Schneider, et al. "Elastic scaling of data parallel operators in stream processing." In IPDPS, 2009.
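A minimal sketch of such a thread-level scaling policy, with illustrative thresholds (this is not the algorithm of [18], just the general idea):

```java
// Illustrative elasticity policy for a single stateless operator:
// grow the thread pool while the input queue keeps building up,
// shrink it when the threads could handle the load with one fewer.
public class ElasticPolicy {
    static int adapt(int threads, int queueLength, int perThreadCapacity,
                     int maxThreads) {
        if (queueLength > threads * perThreadCapacity && threads < maxThreads)
            return threads + 1;           // overloaded: scale up
        if (queueLength < (threads - 1) * perThreadCapacity && threads > 1)
            return threads - 1;           // underutilized: scale down
        return threads;                   // keep the current level
    }

    public static void main(String[] args) {
        System.out.println(adapt(2, 500, 100, 8));  // queue too long -> 3
        System.out.println(adapt(4, 50, 100, 8));   // mostly idle    -> 3
    }
}
```

A real controller would call such a policy periodically and resize the operator's thread pool accordingly.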
ELASTIC AUTO-PARALLELIZATION[19]
Combines the ideas of elasticity and auto-parallelization
Adapt parallelization level using controller-based
approach
Dynamic adaptation of the parallelization level based on an operator migration protocol
[19] B. Gedik, et al. “Elastic Scaling for Data Stream Processing." In TPDS, 2013.
ELASTIC AUTO-PARALLELIZATION
Dynamic adaptation of parallelization: lend & borrow approach
Figure: a splitter and a merger around parallel Op2 instances; adaptation proceeds in 1. a lend phase and 2. a borrow phase
OPERATOR DISTRIBUTION
Well-studied problem since the first DSP prototypes
Typically referred to as "Operator Placement"
Examples: Borealis, SODA, Pietzuch et al.
Design decisions[20]:
Optimization goal: what is optimized?
Execution mode: centralized or decentralized?
Algorithm runtime: offline, online or both?
[20] G.T. Lakshmanan, et al. "Placement strategies for internet-scale data stream systems." In IEEE Internet Computing, 2008.
OPTIMIZATION GOAL
Different objectives:
Balance CPU load (handle load variations)
Minimize Network latency (minimize latency for
sensor networks)
Latency optimization (predictable quality of service)
EXECUTION MODE
Centralized Execution:
All decisions are made by a centralized management component
Major disadvantage: the manager becomes a scalability bottleneck
Decentralized Execution:
Different hosts try to agree on an operator placement
Two examples: operator routing and commutative execution
DECENTRALIZED EXECUTION
Based on pairwise agreement[21]
Figure: filter and aggregation operators distributed across four hosts, which exchange load pairwise
[21] M. Balazinska, et al. "Contract-Based Load Management in Federated Distributed Systems." In NSDI, 2004.
DECENTRALIZED EXECUTION
Based on decentralized routing[22]
Figure: two sources (Src1, Src2) and several operators routed across four hosts
[22] Y. Ahmad, et al. "Network-aware query processing for stream-based applications." In VLDB, 2004.
ALGORITHM RUNTIME
Offline
Based on estimation of input rates, selectivities, etc.
Online
Mostly simplistic initial estimation (e.g. round-robin or
random placement)
Continuous measurement and adaptation during runtime
MOVEMENT PROTOCOLS
How to ensure loss-free movement of
operators?
No input event shall be lost
State needs to be restored on the new host
Movement strategies:
Pause & Resume vs. Parallel track
Large overlap with research on adaptive query processing[23]
[23] Y. Zhu, et al. "Dynamic plan migration for continuous queries over data streams." In SIGMOD, 2004.
PAUSE & RESUME
Approach:
1. Stop execution
2. Move state to new host
3. Restart execution on the new host
Properties:
Simple & generic
A latency peak can be observed due to pausing
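The three steps can be sketched as a small simulation (illustrative Java, not any real migration protocol; the Operator class and its fields are made up):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Illustrative pause & resume: processing stops, operator state and the
// input buffered during the pause move to the new instance, and execution
// resumes on the new host without losing any event.
public class PauseResume {
    static class Operator {
        Map<String, Integer> state = new HashMap<>();    // e.g. per-key counts
        Deque<String> inputBuffer = new ArrayDeque<>();  // events queued while paused
        void process(String event) { state.merge(event, 1, Integer::sum); }
    }

    static Operator move(Operator old) {
        // 1. Stop execution: no more calls to old.process().
        Operator fresh = new Operator();
        fresh.state.putAll(old.state);               // 2. move the state...
        fresh.inputBuffer.addAll(old.inputBuffer);   //    ...and buffered input
        while (!fresh.inputBuffer.isEmpty())         // 3. restart: drain buffer
            fresh.process(fresh.inputBuffer.poll());
        return fresh;
    }

    public static void main(String[] args) {
        Operator op = new Operator();
        op.process("a");                 // processed before the pause
        op.inputBuffer.add("a");         // arrived while the operator was paused
        System.out.println(move(op).state.get("a"));  // both events counted
    }
}
```

The buffered events accumulating during step 1 are exactly what causes the latency peak mentioned above.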
PARALLEL TRACK
Approach:
1. Start new instance
2. Move or create up-to-date state
3. Stop old instance as soon as the instances are in sync
Properties:
No latency peak
Requires duplicate detection and detection of "sync" status
Some events are processed twice
OPERATOR PLACEMENT WITHIN CLOUD-BASED DSPS
Different setup (geographically distributed vs. single cluster)
Larger scale (100 to 1000 hosts)
New objectives: elasticity, monetary cost, energy efficiency
Mostly centralized, adaptive solutions optimizing CPU utilization or monetary cost, with pause & resume operator movement
RECENT OPERATOR PLACEMENT APPROACHES
SQPR[24]
Query planner for data centers with heterogeneous resources
MACE[25]
Presents an approach for precise latency estimation
[24] V. Kalyvianaki et al. "SQPR: Stream query planning with reuse“. In ICDE, 2011.
[25] B. Chandramouli et al. "Accurate latency estimation in a distributed event processing system“. In ICDE, 2011.
STORM LOAD MODEL
Worker: a physical JVM executing a subset of all the tasks of the topology
Task: a parallel instance of an operator
Executor: a thread of a worker, executing one or more tasks of the same operator
EXAMPLE FOR STORM LOAD MODEL
Config conf = new Config();
conf.setNumWorkers(3); // use three worker processes
topologyBuilder.setSpout("src", new Source(), 3);
topologyBuilder.setBolt("aggr", new Aggregation(), 6)
    .shuffleGrouping("src");
topologyBuilder.setBolt("sink", new Sink(), 6)
    .shuffleGrouping("aggr");
StormSubmitter.submitTopology("mytopology", conf,
    topologyBuilder.createTopology());
http://storm.incubator.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html.
STORM: DISTRIBUTION MODEL[26]
Figure: Source (spout, parallelism hint = 3) → Aggr (bolt, hint = 6) → Sink (bolt, hint = 3); with three workers, each worker runs one Source task, two Aggr tasks and one Sink task
[26] Storm. http://storm.incubator.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-
topology.html.
ADAPTIVE PLACEMENT IN STORM
Manual reconfiguration (pause & resume
complete topology)
Enhanced Load scheduler for Twitter Storm[27]
Load balancing adapted to Storm architecture
$ storm rebalance mytopo -n 4 -e src=8
[27] L. Aniello, et al. "Adaptive online scheduling in storm.“ In DEBS, 2013.
DIFFERENT LEVELS OF ELASTICITY
Elastic action / action taken / system:
Virtual Machine: move virtual machines transparently to the user [28]
Engine: a new engine instance is started and new queries are deployed on it [29]
Query: a new query instance is started and the input data is split [30]
Operator: a new operator instance is created and the input data is split [31]
[28] T. Knauth, et al. "Scaling non-elastic applications using virtual machines." In Cloud, 2011.
[29] W. Kleiminger, et al."Balancing load in stream processing with the cloud." In ICDEW, 2011.
[30] M. Migliavacca, et al. "SEEP: scalable and elastic event processing.“ In Middleware, 2010.
[31] R. Fernandez, et al. „Integrating scale out and fault tolerance in stream processing using operator state
management”, SIGMOD 2013.
ELASTIC DSP SYSTEM ARCHITECTURE
Figure: an Elasticity Manager and a Resource Manager coordinating multiple Data Stream Processing Engine instances, each running on its own Virtual Machine
STREAMCLOUD[9]
Realized different levels of elasticity on a highly scalable DSP
Studied major design decisions:
Optimal level of elasticity
Migration strategy
Load balancing based on user-defined thresholds
[9] V. Gulisano, R. Jimenez-Peris, M. Patino-Martinez, C. Soriente, and P. Valduriez. Streamcloud: An Elastic and
Scalable Data Streaming System. IEEE TPDS, 2012.
SEEP[31]
Presents state-of-the-art mechanisms for state management
Combines fault tolerance & scale out using the same mechanism
Highly scalable, with fast recovery
[31] R. Fernandez, et al. "Integrating scale out and fault tolerance in stream processing using operator state
management", SIGMOD 2013.
MILLWHEEL[4]
Similar mechanisms to SEEP or StreamCloud
Supports massive scale out
Centralized manager
Optimization of CPU load
[4] T. Akidau, et al. „MillWheel: Fault-Tolerant Stream Processing at Internet Scale”, VLDB 2013.
CLOUD-BASED
DATA STREAM PROCESSING
Fault tolerance
FAULT TOLERANCE IN DSPS
Small scale stream processing
Faults are an exception
Optimize for the lucky case
Catch faults at execution time and start recovery
procedures (possibly expensive)
Large scale DSPs (e.g. cloud based)
Faults are likely to affect every execution
Consider them in the design process
FAULT TOLERANCE IN DSPS
Two main fault causes
Message losses
Computational element failures
Event tracking
Makes sure each injected event is correctly processed
with well-defined semantics
State management
Makes sure that the failure of a computational element will not affect the system's correct execution
EVENT TRACKING
When a fault occurs, events may need to be
replayed
Typical approach:
Request acks from downstream operators
Detect losses by setting timeouts on acks
Replay lost events from upstream operators
Tracking and managing information on processed
events may be needed to guarantee specific
processing semantics
STORM – ET
Event tracking in Storm is provided by two different techniques:
Acker processes: at-least-once message processing
Transactional topologies: exactly-once message processing
STORM – ET
A tuple injected in the system can cause the
production of multiple tuples in the topology
This production partially maps the underlying
application topology
Storm keeps track of the processing DAG
stemming from each input tuple
STORM – ET
Example: word-count topology
Spout → Splitter bolt → Counter bolt (text lines → words)
Input tuple: "stay hungry, stay foolish"
Splitter emits: "stay", "hungry", "stay", "foolish"
Counter emits: ["stay", 1], ["hungry", 1], ["foolish", 1], ["stay", 2]
STORM – ET
Acker tasks are responsible for monitoring the
flow of tuples in the DAG
Each bolt “acks” the correct processing of a tuple
The processing of a tuple can be committed when it
has fully traversed the DAG
In this case the acker notifies the original spout
The spout must implement an ack(Object msgId) method
It can be used to garbage collect tuple state
STORM – ET
If a tuple does not reach the end of the DAG
The acker times out and invokes fail(Object msgId) on the spout
The spout method implementation is in charge of
replaying failed tuples
Notice: the original data source must be able to reliably replay events (e.g. a reliable MQ)
STORM – ET
Developer perspective:
Correctly connect bolts with tuple sources
(anchoring)
Anchoring is how you specify the tuple tree
public void execute(Tuple tuple) {
String sentence = tuple.getString(0);
for(String word: sentence.split(" ")) {
_collector.emit(tuple, new Values(word));
}
_collector.ack(tuple);
}
Anchor
Source: https://github.com/nathanmarz/storm/wiki/Guaranteeing-message-processing
STORM – ET
Developer perspective:
Explicitly ack processed tuples
Or extend BaseBasicBolt
public void execute(Tuple tuple) {
String sentence = tuple.getString(0);
for(String word: sentence.split(" ")) {
_collector.emit(tuple, new Values(word));
}
_collector.ack(tuple);
} Explicit ack
Source: https://github.com/nathanmarz/storm/wiki/Guaranteeing-message-processing
STORM – ET
Implement ack() and fail() on the spout(s)
public void nextTuple() {
if(!toSend.isEmpty()){
for(Map.Entry<Integer, String> transactionEntry: toSend.entrySet()){
Integer transactionId = transactionEntry.getKey();
String transactionMessage = transactionEntry.getValue();
collector.emit(new Values(transactionMessage),transactionId);
}
toSend.clear();
}
}
public void ack(Object msgId) {
    // completed processing: forget the message
    messages.remove(msgId);
}
public void fail(Object msgId) {
    Integer transactionId = (Integer) msgId;
    Integer failures = transactionFailureCount.get(transactionId) + 1;
    if (failures >= MAX_FAILS) {
        throw new RuntimeException("Too many failures on Tx [" + transactionId + "]");
    }
    transactionFailureCount.put(transactionId, failures);
    // re-schedule the failed message
    toSend.put(transactionId, messages.get(transactionId));
}
Source: J. Leibiusky, G. Eisbruch and D. Simonassi. Getting Started with Storm. O’Reilly, 2012
re-schedule
Failed topology
Completed processing
STORM – ET
How does the acker task work?
It is a standard bolt
The acker tracks for each tuple emitted by a spout
the corresponding DAG
It acks the spout whenever the DAG is complete
You can instantiate parallel ackers to improve
performance
Tuples are randomly assigned to ackers to improve
load balancing (uses mod hashing)
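Tuple-to-acker assignment by mod hashing can be sketched as follows (an illustrative simulation, not Storm's actual code; `ackerFor` is a hypothetical helper):

```java
// Simplified sketch: pick an acker task for a tuple tree by mod hashing
// the root tuple's 64-bit ID across the available acker instances.
public class AckerAssignment {

    static int ackerFor(long rootTupleId, int numAckers) {
        // Math.floorMod keeps the index non-negative even for negative IDs.
        return Math.floorMod(Long.hashCode(rootTupleId), numAckers);
    }

    public static void main(String[] args) {
        int numAckers = 4;
        long[] roots = {42L, -7L, 1234567890123L};
        for (long id : roots) {
            System.out.println("tuple " + id + " -> acker " + ackerFor(id, numAckers));
        }
    }
}
```

`Math.floorMod` is used instead of `%` so that negative hash values still map into [0, numAckers).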
STORM – ET
How can ackers keep track of DAGs?
Each tuple is identified by a unique 64bit ID
The acker stores, in a map keyed by tuple ID
The ID of the emitter task
An "ack val"
The ack val is a 64-bit value that encodes
IDs of tuples stemming from the initial one
IDs of acked tuples
STORM – ET
Ack val management
When a tuple is emitted by a spout
The ack val is initialized by XORing in the tuple ID
When a tuple is acked
XOR its ID in the ack val
When an anchored tuple is emitted
XOR its ID in the ack val
When the ack val becomes zero, all tuples in the DAG
have been acked (with high probability)
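The XOR bookkeeping can be simulated in a few lines (an illustrative sketch, not Storm's acker implementation; IDs are random 64-bit values as in Storm):

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative simulation of Storm's XOR-based ack tracking: every tuple ID
// is XORed into the ack val once on emit and once on ack, so a completed
// tuple tree drives the ack val back to zero.
public class AckValDemo {

    static long ackVal = 0L;

    // Spout emits the root tuple: XOR its ID into the ack val.
    static long emitRoot() {
        long id = ThreadLocalRandom.current().nextLong();
        ackVal ^= id;
        return id;
    }

    // A bolt emits a tuple anchored to the tree: XOR the new ID in.
    static long emitAnchored() {
        long id = ThreadLocalRandom.current().nextLong();
        ackVal ^= id;
        return id;
    }

    // Acking a tuple XORs its ID in again, cancelling the emit.
    static void ack(long id) {
        ackVal ^= id;
    }

    public static void main(String[] args) {
        long root = emitRoot();
        long w1 = emitAnchored();   // e.g. word "hello"
        long w2 = emitAnchored();   // e.g. word "world"
        ack(root);                  // sentence fully split
        ack(w1);
        ack(w2);
        // Each ID was XORed in exactly twice, so the ack val is zero again:
        // the whole tuple tree is complete (with high probability).
        System.out.println(ackVal == 0L);
    }
}
```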
STORM – ET
If exactly-once semantics is required, use a transactional topology
Transaction = processing + committing
Processing is heavily parallelized
Committing is strictly sequential
Storm takes care of
State management (through Zookeeper)
Transaction coordination
Fault detection
Provides a batch processing API
Note: requires a source able to replay data batches
EVENT TRACKING
In other systems the event tracking functionality
is strongly coupled with state management
events cannot be garbage collected as soon as they are
acknowledged
they must wait for the checkpointing of a state
updated with those events
TimeStream does not store all the events: it re-computes
those to be replayed by tracking their
dependencies on input events, similarly to
Storm
EVENT TRACKING
SEEP stores non-checkpointed events on the
upstream operators
MillWheel persists all intermediate results to an
underlying distributed file system
It also provides exactly-once semantics
D-Streams store the data to be processed in
immutable partitioned datasets
These are implemented as resilient distributed
datasets (RDDs)
STATE MANAGEMENT
Stateful operators require their state to be
persisted in case of failures.
Two classic approaches
Active replication
Passive replication
STATE MANAGEMENT
[Figure: Active replication — several replicas of the same operator process the same data in the same order between the previous and next stages, so their state stays implicitly synchronized. Passive replication — one active operator periodically checkpoints its state to stable storage while dormant replicas stand by; on failure the state is recovered on demand from the latest checkpoint.]
STORM - SM
What happens when tasks fail?
If a worker dies its supervisor restarts it
If it keeps failing on startup, Nimbus reassigns it to a different
machine
If a machine fails its assigned tasks will time out and
Nimbus will reassign them
If Nimbus/supervisors die they are simply restarted
They behave like fail-fast processes
They are stateless in practice
Their state is safely maintained in a ZooKeeper cluster
STORM - SM
There is no explicit state management for
operators in Storm
Trident builds automatic SM on top of it
Batches of tuples have a unique transaction (Tx) id
If a batch is retried it will have the exact same Tx id
State updates are ordered among batches
A new Tx is not committed if an older one is still pending
Transactional state transparently guarantees
exactly-once tuple processing
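The skip-on-replay update rule can be sketched as follows (a minimal illustration of the idea behind Trident's transactional state; all names are hypothetical, not Trident's actual API):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of transactional state: the Tx id is stored together with
// the state, and a replayed batch (same Tx id, guaranteed same tuples) is
// skipped, so every batch updates the state exactly once.
public class TransactionalCounter {

    long lastTxId = -1;   // highest transaction id applied so far
    final Map<String, Long> counts = new HashMap<>();

    // Apply a batch of word counts exactly once per Tx id.
    void applyBatch(long txId, Map<String, Long> batchCounts) {
        if (txId <= lastTxId) {
            return;           // replay: already applied, safe to skip
        }
        batchCounts.forEach((word, c) -> counts.merge(word, c, Long::sum));
        lastTxId = txId;      // commit marker stored with the state
    }

    public static void main(String[] args) {
        TransactionalCounter state = new TransactionalCounter();
        state.applyBatch(1, Map.of("hello", 2L));
        state.applyBatch(1, Map.of("hello", 2L)); // replayed batch: ignored
        state.applyBatch(2, Map.of("hello", 1L));
        System.out.println(state.counts.get("hello")); // 3, not 5
    }
}
```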
STORM - SM
Transactional state is possible only if supported
by the data source
As an alternative
Opaque transactional state
Each tuple is guaranteed to be processed in exactly one
successful transaction
But the set of tuples in the batch for a given Tx id can change in
case of failures
Non-transactional state
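A minimal sketch of the opaque variant for a single counter (hypothetical names, not Trident's actual API): the state additionally keeps the value that preceded the last transaction, so a replayed batch with different contents can be re-applied on top of it:

```java
// Minimal sketch of opaque transactional state for a single counter:
// store (lastTxId, currVal, prevVal). If a batch with the same Tx id is
// replayed with different contents, recompute from prevVal instead of
// skipping, which keeps the update exactly-once despite changing batches.
public class OpaqueCounter {

    long lastTxId = -1;
    long currVal = 0;
    long prevVal = 0;

    void applyBatch(long txId, long delta) {
        if (txId == lastTxId) {
            // Replay of the same Tx id: batch contents may differ, so
            // re-apply on top of the value that preceded this transaction.
            currVal = prevVal + delta;
        } else {
            prevVal = currVal;
            currVal = currVal + delta;
            lastTxId = txId;
        }
    }

    public static void main(String[] args) {
        OpaqueCounter c = new OpaqueCounter();
        c.applyBatch(1, 5);  // currVal = 5
        c.applyBatch(2, 3);  // currVal = 8, prevVal = 5
        c.applyBatch(2, 4);  // replay of Tx 2 with different contents: 5 + 4 = 9
        System.out.println(c.currVal);
    }
}
```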
STORM - SM
Few combinations guarantee exactly-once sem.
Spout \ State         | Non-transactional | Transactional | Opaque transactional
Non-transactional     | No                | No            | No
Transactional         | No                | Yes           | Yes
Opaque transactional  | No                | No            | Yes
APACHE S4 - SM
Lightweight approach
Assumes that lossy failovers are acceptable.
PEs hosted on failed PNs are automatically moved to a standby server
Running PEs periodically perform uncoordinated and asynchronous checkpointing of their internal state
Distinct PE instances can checkpoint their state autonomously without synchronization
No global consistency
Checkpointing is executed asynchronously by first serializing the operator state and then saving it to stable storage through a pluggable adapter
Can be overridden by a user implementation
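The serialize-then-persist pattern can be sketched as follows (an illustrative sketch; `StorageAdapter` and `checkpoint` are hypothetical names, not S4's actual API):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of asynchronous, uncoordinated checkpointing: the PE's state is
// serialized synchronously (a consistent local snapshot), then handed to a
// pluggable storage adapter off the processing path. No coordination with
// other PE instances, hence no global consistency.
public class CheckpointSketch {

    // Pluggable storage backend (hypothetical interface, cf. S4's adapters).
    interface StorageAdapter {
        void save(String peId, byte[] snapshot);
    }

    // Daemon thread so the JVM can exit without an explicit shutdown.
    static final ExecutorService checkpointer = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    static Future<?> checkpoint(String peId, Serializable state, StorageAdapter adapter)
            throws IOException {
        // 1) Serialize synchronously to snapshot the current local state.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(state);
        }
        byte[] snapshot = bos.toByteArray();
        // 2) Persist asynchronously through the adapter.
        return checkpointer.submit(() -> adapter.save(peId, snapshot));
    }

    public static void main(String[] args) throws Exception {
        ConcurrentHashMap<String, byte[]> store = new ConcurrentHashMap<>();
        checkpoint("pe-42", "some-operator-state", store::put).get();
        System.out.println(store.get("pe-42").length > 0);
    }
}
```

A real adapter would write to a remote store; the in-memory map here only stands in for it.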
MILLWHEEL - SM
Guarantees strongly consistent processing
Checkpoints to persistent storage every state change produced by a computation
Can be executed
before emitting results downstream
operator implementations are automatically rendered idempotent with respect to the execution
after emitting results downstream
it’s up to the developer to implement idempotent operators if needed
Produced results are checkpointed with state (strong productions)
TIMESTREAM - SM
Takes a similar approach
State information is checkpointed to stable storage
For each operator the state includes
state dependency: the list of input events that made the
operator reach a specific state
output dependency: the list of input events that made an operator produce a specific output starting from a specific state
This allows correct recovery from a fault without storing all the intermediate events produced by the operators and their states
It is possible to periodically checkpoint a full operator state in order to avoid re-emitting the whole history of input events to recompute it
SEEP - SM
Takes a different route allowing state to be
stored on upstream operators
allows SEEP to treat operator recovery as a special
case of a standard operator scale-out procedure
state in SEEP is characterized by three elements
internal state
output buffers
routing state
treated differently to reduce the state management
impact on system performance.
D-STREAMS - SM
SM depends strictly on the computation model
A computation is structured as a sequence of
deterministic batch computations on small time intervals
The input and output of each batch, as well as the state of each computation, are stored as resilient distributed datasets (RDDs)
For each RDD, the graph of operations used to compute it (its lineage) is tracked and reliably stored
The recovery of an RDD can be performed in parallel on separate nodes in order to speed up the recovery operation
Operator state can optionally be checkpointed on stable storage to limit the number of operations required to restore it
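Lineage-based recomputation can be illustrated with a toy example (a sketch of the RDD idea, not Spark's actual API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntUnaryOperator;
import java.util.function.UnaryOperator;
import java.util.stream.IntStream;

// Toy sketch of lineage-based recovery: a partition keeps its immutable
// input plus the chain of deterministic operations applied to it (its
// lineage), so a lost result can be recomputed instead of being restored
// from a replica or a checkpoint.
public class LineageSketch {

    final int[] input;                              // immutable input partition
    final List<UnaryOperator<int[]>> lineage = new ArrayList<>();

    LineageSketch(int[] input) {
        this.input = input.clone();
    }

    // Record a deterministic element-wise transformation in the lineage.
    LineageSketch map(IntUnaryOperator f) {
        lineage.add(arr -> IntStream.of(arr).map(f).toArray());
        return this;
    }

    // Recompute the partition from scratch by replaying the lineage.
    int[] recompute() {
        int[] result = input.clone();
        for (UnaryOperator<int[]> op : lineage) {
            result = op.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        LineageSketch rdd = new LineageSketch(new int[]{1, 2, 3})
                .map(x -> x * 10)
                .map(x -> x + 1);
        // "Failure": the computed result is lost; replaying the lineage
        // deterministically rebuilds it.
        System.out.println(Arrays.toString(rdd.recompute()));
    }
}
```

In the real system different partitions replay their lineage in parallel on separate nodes, which is what speeds up recovery.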
CLOUD-BASED
DATA STREAM PROCESSING
Open research directions
CONCLUSIONS
A third generation of DSPs is emerging that
promises
Unprecedented computational power through
horizontal scalability
On-demand dynamic load adaptation
Simplified programming models through powerful
event management semantics
Graceful performance degradation in case of faults
A few issues remain to be solved to make DSP
ready for the cloud-era
INFRASTRUCTURE AWARENESS
Most existing DSP systems are infrastructure-oblivious
Deployment strategies do not take into account the
peculiar characteristics of the available hardware
The physical connections and relationships among infrastructural elements are known to be a key factor in improving both system performance and fault tolerance, e.g. Hadoop's "rack awareness"
We think infrastructure awareness is an open research field for data stream processing systems that could bring important improvements
COST-EFFICIENCY
Users today are interested not only in the
performance of the system, but also in its monetary cost
Cloud-based DSP systems should consider running cost as a variable for performance optimization
efficient scaling behavior maximizing system utilization
efficient fault tolerance mechanisms
Bellavista et al.[32] proposed a first prototype which allows the user to trade off monetary cost and fault tolerance
Their prototype replicates only a subset of the operators, selected based on a user-defined value for the expected information completeness
[32] P. Bellavista, A. Corradi, S. Kotoulas, and A. Reale. Adaptive fault-tolerance for dynamic resource provisioning in
distributed stream processing systems. In EDBT, 2014.
ENERGY-EFFICIENCY
Most large-scale datacenters are striving to "go
green" by making energy consumption more
effective.
DSP systems should consider energy
consumption yet another variable in their
performance optimization process
Note: this aspect is likely closely linked to the
infrastructure awareness theme
ADVANCED ELASTICITY
Most of the existing elastic scaling solutions for DSPs apply simplistic schemes for load balancing.
Operator placement algorithms only optimize system utilization
Other metrics (end-to-end latency, network bandwidth, etc.) are only partially considered
However, these metrics are often used to define contractually binding SLAs
Would it be possible to design DSP systems able to probabilistically guarantee performance?
MULTI-DSP INTEGRATION
Most DSPs are particularly well suited for specific use cases
No one-size-fits-all solution
Components automatically selecting the best engine for each given use case would significantly improve the applicability of these systems
No need to know in advance which is the best solution for a given use case
No need to deploy and maintain different solutions
First promising results[33,34]
[33] M. Duller, J. S. Rellermeyer, G. Alonso, and N. Tatbul. Virtualizing stream processing. Middleware, 2011.
[34] H. Lim and S. Babu. Execution and optimization of continuous queries with cyclops. SIGMOD, 2013.
MULTI-DSP INTEGRATION
[Figure: Cyclops[34] accepts a new continuous query with window size w and, based on the relative performance of the available engines (Esper, Storm, MapReduce), chooses the DSP on which to execute it.]
[34] H. Lim and S. Babu. Execution and optimization of continuous queries with cyclops. SIGMOD, 2013.
REFERENCES
[1] M. Stonebraker, U. Cetintemel and S. Zdonik. "The 8 Requirements of Real-time Stream Processing". ACM
SIGMOD Record, 2005.
[2] S. Chandrasekaran et al. "TelegraphCQ: Continuous Dataflow Processing", SIGMOD, 2003.
[3] D. J. Abadi et al. "The Design of the Borealis Stream Processing Engine", CIDR 2005.
[4] T. Akidau, A. Balikov, K. Bekiroğlu, S. Chernyak, J. Haberman, R. Lax, S. McVeety, D. Mills, P. Nordstrom,
and S. Whittle. "MillWheel: Fault-tolerant Stream Processing at Internet Scale". VLDB, 2013.
[5] Improving Twitter Search with Real-time Human Computation, 2013 http://blog.echen.me/2013/01/08/improving-
twitter-search-with-real-time-human-computation
[6] L. Neumeyer, B. Robbins, A. Nair, and A. Kesari. "S4: Distributed Stream Computing Platform". ICDMW,
2010.
[7] "Dublin City Council - Traffic Flow Improved by Big Data Analytics Used to Predict Bus Arrival and
Transit Times". IBM, 2013. http://www-03.ibm.com/software/businesscasestudies/en/us/corp?docid=RNAE-9C9PN5
[8] Z. Qian, Y. He, C. Su, Z. Wu, H. Zhu, T. Zhang, L. Zhou, Y. Yu, Z. Zhang. "Timestream: Reliable Stream
Computation in the Cloud". Eurosys, 2013.
[9] V. Gulisano, R. Jimenez-Peris, M. Patino-Martinez, C. Soriente, and P. Valduriez. "Streamcloud: An Elastic
and Scalable Data Streaming System". IEEE TPDS, 2012.
[10] N. Marz. "Trident tutorial". https://github.com/nathanmarz/storm/wiki/Trident-tutorial.
[11] M. Shah, et al. "Flux: An adaptive partitioning operator for continuous query systems." In ICDE, 2003.
[12] D. Abadi, et al. "The Design of the Borealis Stream Processing Engine". In CIDR, 2005.
[13] T. Condie, et al. "MapReduce Online." In NSDI,2010.
[14] A. Brito, et al. "Scalable and low-latency data processing with stream mapreduce." CloudCom, 2011.
[15] S. Schneider, et al. "Auto-parallelizing stateful distributed streaming applications." In PACT, 2012.
[16] Wu et al. "Parallelizing Stateful Operators in a Distributed Stream Processing System: How, Should You
and How Much?” In DEBS, 2012.
[17] M. Armbrust, et al. "A view of cloud computing." Communications of the ACM, 2010.
[18] S. Schneider, et al. "Elastic scaling of data parallel operators in stream processing." In IPDPS, 2009.
[19] B. Gedik, et al. “Elastic Scaling for Data Stream Processing." In TPDS, 2013.
[20] G.T. Lakshmanan, et al. "Placement strategies for internet-scale data stream systems." In IEEE Internet
Computing, 2008.
[21] M. Balazinska, et al. "Contract-Based Load Management in Federated Distributed Systems." NSDI. Vol. 4.
2004.
[22] Y.Ahmad, et al. "Network-aware query processing for stream-based applications. In VLDB, 2004.
[23] Y. Zhu, et al. "Dynamic plan migration for continuous queries over data streams." In SIGMOD, 2004.
[24] V. Kalyvianaki et al. "SQPR: Stream query planning with reuse". In ICDE, 2011.
[25] B. Chandramouli et al. "Accurate latency estimation in a distributed event processing system". In ICDE,
2011.
[26] Storm. http://storm.incubator.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-
topology.html.
[27] L. Aniello, et al. "Adaptive online scheduling in storm". In DEBS, 2013.
[28] T. Knauth, et al. "Scaling non-elastic applications using virtual machines." In Cloud, 2011.
[29] W. Kleiminger, et al."Balancing load in stream processing with the cloud." In ICDEW, 2011.
[30] M. Migliavacca, et al. "SEEP: scalable and elastic event processing". In Middleware, 2010.
[31] R. Fernandez, et al. "Integrating scale out and fault tolerance in stream processing using operator state
management", SIGMOD 2013.
[32] P. Bellavista, A. Corradi, S. Kotoulas, and A. Reale. "Adaptive fault-tolerance for dynamic resource
provisioning in distributed stream processing systems". In EDBT, 2014.
[33] M. Duller, J. S. Rellermeyer, G. Alonso, and N. Tatbul. "Virtualizing stream processing". Middleware, 2011.
[34] H. Lim and S. Babu. "Execution and optimization of continuous queries with cyclops". SIGMOD, 2013.