Discretized Streams: Fault-Tolerant Streaming Computation at Scale Wenting Wang 1.
Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP
-
Upload
tathagata-das -
Category
Data & Analytics
-
view
41 -
download
0
Transcript of Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP
![Page 1: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/1.jpg)
Discretized StreamsFault-Tolerant Streaming Computation at Scale
Matei Zaharia, Tathagata Das (TD), Haoyuan (HY) Li, Timothy Hunter, Scott Shenker, Ion Stoica
![Page 2: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/2.jpg)
Motivation
Many big-data applications need to process large data streams in near-real timeWebsite
monitoring Fraud detection Ad
monetizationRequire tens to hundreds of nodes
Require second-scale latencies
![Page 3: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/3.jpg)
![Page 4: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/4.jpg)
Challenge
• Stream processing systems must recover from failures and stragglers quickly and efficiently– More important for streaming systems than batch
systems
• Traditional streaming systems don’t achieve these properties simultaneously
![Page 5: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/5.jpg)
Outline
• Limitations of Traditional Streaming
Systems
• Discretized Stream Processing
• Unification with Batch and Interactive
Processing
![Page 6: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/6.jpg)
Traditional Streaming Systems
• Continuous operator modelmutable state
node 1
node 3
input records
node 2
input records
– Each node runs an operator with in-memory mutable state
– For each input record, state is updated and new records are sent out
• Mutable state is lost if node fails
• Various techniques exist to make state fault-tolerant
![Page 7: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/7.jpg)
Fault-tolerance in Traditional Systems
• Separate set of “hot failover” nodes process the same data streams
• Synchronization protocols ensures exact ordering of records in both sets
• On failure, the system switches over to the failover nodes
sync protocol
input
input
hot failover nodes
Fast recovery, but 2x hardware cost
Node Replication [e.g. Borealis, Flux ]
![Page 8: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/8.jpg)
input
input
Fault-tolerance in Traditional Systems
Upstream Backup [e.g. TimeStream, Storm ]
cold failover
node
• Each node maintains backup of the forwarded records since last checkpoint
• A “cold failover” node is maintained
• On failure, upstream nodes replay the backup records serially to the failover node to recreate the state
Only need 1 standby, but slow recovery
backup
replay
![Page 9: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/9.jpg)
Slow Nodes in Traditional Systems
Node Replication
Upstream Backup
input
input
input
input
Neither approach handles stragglers
![Page 10: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/10.jpg)
Our Goal
• Scales to hundreds of nodes
• Achieves second-scale latency
• Tolerate node failures and stragglers
• Sub-second fault and straggler recovery
• Minimal overhead beyond base
processing
![Page 11: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/11.jpg)
Our Goal
• Scales to hundreds of nodes
• Achieves second-scale latency
• Tolerate node failures and stragglers
• Sub-second fault and straggler recovery
• Minimal overhead beyond base
processing
![Page 12: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/12.jpg)
Why is it hard?
Stateful continuous operators tightly integrate “computation” with “mutable
state”
Makes it harder to define clear boundaries when computation and state can be moved around
statefulcontinuous operator
mutable state
inputrecords
outputrecords
![Page 13: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/13.jpg)
Dissociate computation from state
Make state immutable and break computation into small, deterministic,
stateless tasks
Defines clear boundaries where state and computation
can be moved around independently
stateless task
state 1
input 1
state 2
stateless task
state 2
input 2
stateless task
input 3
![Page 14: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/14.jpg)
Batch Processing Systems!
![Page 15: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/15.jpg)
Batch Processing Systems
Batch processing systems like MapReduce divide
– Data into small partitions– Jobs into small, deterministic, stateless map /
reduce tasks
M
M
M
M
M
M
M
M
M
M
M
M
stateless map tasks
stateless reduce tasks
R
R
R
R
R
R
immutable input
dataset
immutable output dataset
immutablemap outputs
![Page 16: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/16.jpg)
Parallel Recovery
Failed tasks are re-executed on the other nodes in parallel
M
M
M
R
R
M
M
M
M
M
M
stateless map tasks
stateless reduce tasks
RR
R
R
MMM
M
M
MR
R
immutable input
dataset
immutable output dataset
![Page 17: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/17.jpg)
Discretized Stream Processing
![Page 18: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/18.jpg)
Discretized Stream Processing
Run a streaming computation as a series of small, deterministic
batch jobs
Store intermediate state data in cluster memory
Try to make batch sizes as small as possible to get second-scale latencies
![Page 19: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/19.jpg)
Discretized Stream Processing
time = 0 - 1: batch operations
Input: replicated dataset stored in
memory
Output or State: non-replicated
datasetstored in memory
input stream
output / state stream
… …
input
time = 1 - 2:input
![Page 20: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/20.jpg)
Example: Counting page views
Discretized Stream (DStream) is a sequence of immutable, partitioned datasets– Can be created from live data streams or by
applying bulk, parallel transformations on other DStreams
views = readStream("http:...", "1 sec")
ones = views.map(ev => (ev.url, 1))
counts = ones.runningReduce((x,y) => x+y)
creating a DStream
transformation
views ones counts
t: 0 - 1
t: 1 - 2
map reduce
![Page 21: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/21.jpg)
Fine-grained Lineage
• Datasets track fine-grained operation lineage
• Datasets are periodically checkpointed asynchronously to prevent long lineages
views ones counts
t: 0 - 1
t: 1 - 2
map reduce
t: 2 - 3
![Page 22: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/22.jpg)
• Lineage is used to recompute partitions lost due to failures
• Datasets on different time steps recomputed in parallel
• Partitions within a dataset also recomputed in parallel
views ones counts
t: 0 - 1
t: 1 - 2
map reduce
t: 2 - 3
Parallel Fault Recovery
![Page 23: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/23.jpg)
Upstream Backup
stream replayed serially
state
views ones counts
t: 0 - 1
t: 1 - 2
t: 2 - 3
Discretized Stream
Processing
parallelism across time intervals
parallelism within a batch
Faster recovery than upstream backup,
without the 2x cost of node replication
Comparison to Upstream Backup
![Page 24: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/24.jpg)
How much faster than Upstream Backup?
Recover time = time taken to recompute and catch up
– Depends on available resources in the cluster– Lower system load before failure allows faster
recovery
Parallel recovery
with 5 nodes
faster than upstream backup
Parallel recovery with 10 nodes
faster than 5 nodes
![Page 25: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/25.jpg)
Parallel Straggler Recovery
• Straggler mitigation techniques– Detect slow tasks (e.g. 2X slower than other
tasks)– Speculatively launch more copies of the tasks
in parallel on other machines
• Masks the impact of slow nodes on the progress of the system
![Page 26: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/26.jpg)
Evaluation
![Page 27: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/27.jpg)
Spark Streaming
• Implemented using Spark processing engine*– Spark allows datasets to be stored in memory,
and automatically recovers them using lineage
• Modifications required to reduce jobs launching overheads from seconds to milliseconds
[ *Resilient Distributed Datasets - NSDI, 2012 ]
![Page 28: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/28.jpg)
How fast is Spark Streaming?
Can process 60M records/second on 100 nodes at 1 second latency
Tested with 100 4-core EC2 instances and 100 streams of text
28
0 50 1000
1
2
3
4WordCount
1 sec2 sec
# Nodes in Cluster
Clus
ter T
hrou
ghpu
t (G
B/s)
0 20 40 60 80100
0
2
4
6
8Grep
1 sec2 sec
# Nodes in Cluster
Clus
ter T
hhro
ughp
ut (G
B/s)
Count the sentences having
a keyword
WordCount over 30 sec sliding
window
![Page 29: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/29.jpg)
How does it compare to others?
Throughput comparable to other commercial stream processing systems
29
SystemThroughput per core [ records / sec ]
Spark Streaming 160k
Oracle CEP 125k
Esper 100k
StreamBase 30k
Storm 30k
[ Refer to the paper for citations ]
![Page 30: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/30.jpg)
Before
fai...
On fa
ilure
Next 3
s
Second 3
s
Third 3
s
Fourth 3
s
Fifth 3
s
Sixth 3
s0.0
1.0
2.0
3.0
4.0
5.0
30s ckpts, 20 nodes
30s ckpts, 40 nodes
10s ckpts, 20 nodes
10s ckpts, 40 nodes
Ba
tch
Pro
ce
ss
ing
Tim
e (
s)
How fast can it recover from faults?
Recovery time improves with more frequent checkpointing and more
nodesFailure
Word Count over 30 sec
window
![Page 31: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/31.jpg)
How fast can it recover from stragglers?
WordCount Grep0.0
1.0
2.0
3.0
4.0
0.55 0.54
3.02 2.40
1.000.64
No straggler Straggler, no speculation
Straggler, with speculation
Ba
tch
Pro
ce
ss
ing
Tim
e (
s)
Speculative execution of slow tasks mask the effect of stragglers
![Page 32: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/32.jpg)
Unification with Batch and Interactive Processing
![Page 33: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/33.jpg)
Unification with Batch and Interactive Processing
• Discretized Streams creates a single programming and execution model for running streaming, batch and interactive jobs
• Combine live data streams with historic dataliveCounts.join(historicCounts).map(...)
• Interactively query live streamsliveCounts.slice(“21:00”, “21:05”).count()
![Page 34: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/34.jpg)
App combining live + historic data
Mobile Millennium Project: Real-time estimation of traffic transit times using live and past GPS observations
34
0 20 40 60 800
400
800
1200
1600
2000
# Nodes in Cluster
GP
S o
bs
erv
ati
on
s p
er
se
co
nd
• Markov chain Monte Carlo simulations on GPS observations
• Very CPU intensive
• Scales linearly with cluster size
![Page 35: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/35.jpg)
Recent Related Work
• Naiad – Full cluster rollback on recovery
• SEEP – Extends continuous operators to enable parallel recovery, but does not handle stragglers
• TimeStream – Recovery similar to upstream backup
• MillWheel – State stored in BigTable, transactions per state update can be expensive
![Page 36: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP](https://reader035.fdocuments.us/reader035/viewer/2022062216/55cb3461bb61eb8d098b4693/html5/thumbnails/36.jpg)
Takeaways
• Discretized Streams model streaming computation as series of batch jobs– Uses simple techniques to exploit parallelism in
streams
– Scales to 100 nodes with 1 second latency
– Recovers from failures and stragglers very fast
• Spark Streaming is open source - spark-project.org– Used in production by ~ 10 organizations!
• Large scale streaming systems must handle faults and stragglers