What to Expect
• High Availability of the Master Node (JobManager)
• Live Monitoring
• Event-time, watermarks and windowing improvements
• Demo: Fault Tolerance
These are only the highlights; more is being worked on!
Some Details
Flink uses ZooKeeper™ for two things:
• Leader election (when there are multiple JobManagers)
• Reliable storage of the dataflow graph and checkpoint metadata (more on that later)
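Leader election can be illustrated with the common ZooKeeper recipe: every JobManager creates an ephemeral, sequential node; the one holding the lowest sequence number leads, and when its session ends its node disappears and the next-lowest candidate takes over. The sketch below simulates this in memory. `FakeZooKeeper` and its methods are hypothetical stand-ins, not ZooKeeper's or Flink's API, and a real client would also watch its predecessor node to be notified of changes.

```python
import itertools


class FakeZooKeeper:
    """Hypothetical in-memory stand-in for a ZooKeeper election path."""

    def __init__(self):
        self._seq = itertools.count()
        self._nodes = {}  # sequence number -> session id of the creator

    def create_ephemeral_sequential(self, session_id):
        # Sequential nodes get a monotonically increasing suffix.
        seq = next(self._seq)
        self._nodes[seq] = session_id
        return seq

    def close_session(self, session_id):
        # Ephemeral nodes vanish when the session that created them ends.
        self._nodes = {s: o for s, o in self._nodes.items() if o != session_id}

    def leader(self):
        # The candidate with the lowest sequence number is the leader.
        if not self._nodes:
            return None
        return self._nodes[min(self._nodes)]


zk = FakeZooKeeper()
for jm in ["jobmanager-1", "jobmanager-2", "jobmanager-3"]:
    zk.create_ephemeral_sequential(jm)

first_leader = zk.leader()          # "jobmanager-1"
zk.close_session("jobmanager-1")    # the leading JobManager crashes
second_leader = zk.leader()         # "jobmanager-2" takes over
```

Because leadership is derived from node order rather than negotiated between JobManagers, a standby only needs to observe ZooKeeper to know when it must take over.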
Live Monitoring
Before:
• Accumulators only available after the job finishes
Now:
• Accumulators updated while the job is running
• System accumulators (number of bytes/records processed…)
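The difference between "after the job finishes" and "while the job is running" can be sketched as parallel tasks that each update a local counter, plus a monitor that periodically merges the partial values. In Flink's Java API a user function can register an accumulator such as `IntCounter` via `getRuntimeContext().addAccumulator(name, accumulator)`; the `Counter`, `merge`, and `run_job` names below are hypothetical and only model the idea, assuming one record per task per step.

```python
class Counter:
    """Per-task accumulator: each task adds to its own local value."""

    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n


def merge(counters):
    # The monitor combines the partial values of all tasks into one job-wide value.
    return sum(c.value for c in counters)


def run_job(partitions, report_every):
    """Simulate one task per partition, each processing one record per step
    (round-robin), and report the merged counter every `report_every` steps."""
    counters = [Counter() for _ in partitions]
    iterators = [iter(p) for p in partitions]
    active = set(range(len(partitions)))
    reports, step = [], 0
    while active:
        for i in sorted(active):
            if next(iterators[i], None) is None:
                active.discard(i)   # this task has finished its partition
            else:
                counters[i].add(1)  # "records processed", like a system accumulator
        step += 1
        if step % report_every == 0:
            reports.append(merge(counters))  # live value, taken while the job runs
    return reports, merge(counters)


reports, total = run_job([["a", "b", "c"], ["d", "e"]], report_every=1)
```

Here `reports` holds intermediate values (2, then 4, then 5) observed before the job completes, which is what live monitoring adds over reading the accumulator only at the end.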
Why all the Fuss?
[Figure: Flow of Data. Elements, each with a payload (e.g., 0x45FD) and a timestamp (e.g., 13), stream into a WindowOperator; which window each element belongs to is marked "?".]
Elements do not arrive ordered by timestamp.
Processing Time Windows
[Figure: Flow of Data. The WindowOperator groups elements (e.g., payload 0x45FD, timestamp 13) into windows in the order they arrive, regardless of their timestamps.]
Elements do not arrive ordered by timestamp.
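A processing-time window is chosen by the operator's own clock when an element arrives; the element's timestamp plays no role. The sketch below assumes tumbling windows of a fixed `size` and passes arrival times in explicitly (instead of reading a wall clock) so the result is deterministic; `processing_time_windows` is a hypothetical helper, not a Flink API.

```python
from collections import defaultdict


def processing_time_windows(arrivals, size):
    """Group elements into tumbling windows by *arrival* time.

    `arrivals` is a list of (arrival_time, event_timestamp) pairs; the event
    timestamp is carried along but ignored when assigning the window.
    """
    windows = defaultdict(list)
    for arrival_time, event_ts in arrivals:
        start = arrival_time - arrival_time % size  # start of the window by arrival
        windows[start].append(event_ts)
    return dict(windows)


result = processing_time_windows(
    [(0, 11), (1, 2), (2, 13), (5, 1), (6, 14), (7, 3)], size=5
)
```

The first window ends up holding timestamps 11, 2 and 13 together simply because they arrived close to each other, which is why processing-time results depend on arrival order.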
Event Time Windows
[Figure: Flow of Data. The WindowOperator assigns each element (e.g., payload 0x45FD, timestamp 13) to a window based on its own timestamp, even though elements arrive out of order.]
Elements do not arrive ordered by timestamp.
Problem: How do you know when to process windows?
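Event-time assignment uses each element's own timestamp, so out-of-order arrival does not change which window an element lands in. The sketch assumes tumbling windows of a fixed `size` starting at multiples of `size`; `event_time_windows` is a hypothetical helper.

```python
from collections import defaultdict


def event_time_windows(timestamps, size):
    """Group elements into tumbling windows by their own timestamps,
    independent of the order in which they arrive."""
    windows = defaultdict(list)
    for ts in timestamps:
        start = ts - ts % size  # window [start, start + size) contains ts
        windows[start].append(ts)
    return dict(windows)


# Arrival order is 11, 2, 13, 1, 14, 3; each element still lands in its own window.
result = event_time_windows([11, 2, 13, 1, 14, 3], size=5)
```

This only works because the whole input is known up front. On an unbounded stream the operator cannot tell from the elements alone whether another timestamp in [0, 5) is still on its way, which is exactly the problem stated on this slide.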
Some Details
• The window operator waits for watermarks.
• When a watermark arrives, elements with timestamps lower than the watermark can be processed.
• Operators forward a watermark once they know they can no longer emit elements with a lower timestamp.
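These three rules can be sketched as a small operator that buffers elements per tumbling window, fires a window once the watermark has passed its end, and only then forwards the watermark. The sketch follows the slide's convention that a watermark W means no further elements with a timestamp lower than W will arrive; the class name is hypothetical, and late elements are not handled.

```python
from collections import defaultdict


class EventTimeWindowOperator:
    """Buffers elements per tumbling window and fires a window once a
    watermark shows that no element belonging to it can still arrive."""

    def __init__(self, size):
        self.size = size
        self.buffers = defaultdict(list)  # window start -> buffered timestamps
        self.output = []                  # emitted windows and forwarded watermarks

    def process_element(self, ts):
        start = ts - ts % self.size
        self.buffers[start].append(ts)

    def process_watermark(self, watermark):
        # Window [start, end) only holds timestamps < end, so it is complete
        # once the watermark has reached end.
        for start in sorted(self.buffers):
            end = start + self.size
            if end <= watermark:
                self.output.append(("window", start, end, sorted(self.buffers.pop(start))))
        # Every element with a lower timestamp has now been emitted,
        # so forwarding the watermark downstream is safe.
        self.output.append(("watermark", watermark))


op = EventTimeWindowOperator(size=5)
for ts in [11, 2, 13, 1, 14]:  # out-of-order arrivals
    op.process_element(ts)
op.process_watermark(5)        # fires window [0, 5)
op.process_watermark(15)       # fires window [10, 15)
```

Note the ordering inside `process_watermark`: results are emitted before the watermark, so downstream operators never see a watermark followed by an element it should have covered.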
Streaming Fault Tolerance
• Ensure that operators see all events
  • "At least once"
  • Solved by replaying a stream from a checkpoint, e.g., from a past Kafka offset
• Ensure that operators do not perform duplicate updates to their state
  • "Exactly once"
  • Several solutions
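The gap between the two guarantees shows up in a counting example: replaying the source from a checkpointed offset guarantees no event is missed, but state updates made after that checkpoint are applied again unless the state is restored together with the offset. The sketch assumes a single counting operator over an in-memory log standing in for a Kafka partition; `run` and its parameters are hypothetical.

```python
def run(log, offset, count, checkpoint_at=None, crash_at=None):
    """Count events from `log`, starting at `offset` with running state `count`.

    Records a checkpoint (offset, count) after `checkpoint_at` events and
    stops ("crashes") after `crash_at` events. Returns (count, checkpoint).
    """
    checkpoint = None
    processed = 0
    for pos in range(offset, len(log)):
        if crash_at is not None and processed == crash_at:
            break
        count += 1  # the operator's state update
        processed += 1
        if processed == checkpoint_at:
            checkpoint = (pos + 1, count)  # source offset and state, taken together
    return count, checkpoint


log = [f"event-{i}" for i in range(10)]
crashed_count, (ckpt_offset, ckpt_count) = run(log, 0, 0, checkpoint_at=4, crash_at=7)

# At least once: replay from the checkpointed offset, but keep state that was
# updated after the checkpoint (e.g., in an external store); events 4-6 count twice.
at_least_once, _ = run(log, ckpt_offset, crashed_count)

# Exactly once: restore the state that belongs to the checkpointed offset.
exactly_once, _ = run(log, ckpt_offset, ckpt_count)
```

With ten events, the at-least-once recovery ends at 13 while the exactly-once recovery ends at the correct 10.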
Exactly-Once Approaches
• Discretized streams (Spark Streaming)
  • Treat streaming as a series of small atomic computations
  • "Fast track" to fault tolerance, but restricts the computational and programming model (e.g., cannot mutate state across "mini-batches"; window functions are correlated with mini-batch size)
• MillWheel (Google Cloud Dataflow)
  • State updates and derived events are committed as an atomic transaction to a transactional store
  • Requires a very high-throughput transactional store
• Chandy-Lamport distributed snapshots (Flink)
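Flink's variant of Chandy-Lamport (asynchronous barrier snapshotting) injects checkpoint barriers into the streams at the sources. An operator with several inputs "aligns" them: once barrier n has arrived on one input, records from that input are held back until barrier n has arrived on all inputs; the operator then snapshots its state and forwards the barrier. The sketch below models one summing operator with that alignment step; the class and the `BARRIER` marker are hypothetical, and the "snapshot" is just the current total rather than state written to a state backend.

```python
BARRIER = object()  # hypothetical marker standing in for a checkpoint barrier


class AligningSumOperator:
    """Sums values from several input channels and snapshots its state
    once a checkpoint barrier has arrived on every channel."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.total = 0
        self.blocked = set()  # channels whose barrier has already arrived
        self.buffered = []    # (channel, item) held back during alignment
        self.snapshots = []
        self.output = []

    def receive(self, channel, item):
        if channel in self.blocked:
            # Anything after the barrier on this channel belongs to the next checkpoint.
            self.buffered.append((channel, item))
        elif item is BARRIER:
            self.blocked.add(channel)
            if len(self.blocked) == self.num_inputs:
                # Aligned: the state reflects exactly the records before the barrier.
                self.snapshots.append(self.total)
                self.output.append(BARRIER)  # forward the barrier downstream
                self.blocked.clear()
                pending, self.buffered = self.buffered, []
                for ch, held in pending:     # release held-back items in order
                    self.receive(ch, held)
        else:
            self.total += item
            self.output.append(self.total)


op = AligningSumOperator(num_inputs=2)
for channel, item in [(0, 1), (1, 10), (0, BARRIER), (0, 2), (1, 20), (1, BARRIER)]:
    op.receive(channel, item)
```

The value 2 arrives on channel 0 after that channel's barrier, so it is held back and excluded from the snapshot (31); it is applied right after the barrier is forwarded, giving a final total of 33. Without alignment the snapshot would have mixed records from both sides of the barrier.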
Best of all Worlds for Streaming
• Low latency
  • Thanks to the pipelined engine
• Exactly-once guarantees
  • Variation of Chandy-Lamport
• High throughput
  • Controllable checkpointing overhead
• Separates app logic from recovery
  • Checkpointing interval is just a config parameter
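In Flink's Java API the interval is set with `StreamExecutionEnvironment.enableCheckpointing(interval)` (in milliseconds), and the user functions stay unchanged. The sketch below models that separation framework-free: the application logic is a plain function, and only the runner knows about snapshots and recovery. It assumes an interval counted in events rather than time, a single simulated crash, and hypothetical names throughout.

```python
def run_with_checkpoints(process, events, interval, initial_state=0, fail_at=None):
    """Apply `process(state, event)` to each event, snapshotting the state
    every `interval` events. Returns (final_state, checkpoints, events_replayed)."""
    state = initial_state
    checkpoint = (0, initial_state)  # (position in the input, state at that position)
    num_checkpoints = 0
    replayed = 0
    failed = False
    i = 0
    while i < len(events):
        if i == fail_at and not failed:
            # Simulated crash: restore the last snapshot and replay from its position.
            failed = True
            replayed = i - checkpoint[0]
            i, state = checkpoint
            continue
        state = process(state, events[i])
        i += 1
        if i % interval == 0:
            checkpoint = (i, state)
            num_checkpoints += 1
    return state, num_checkpoints, replayed


def add(total, value):
    # Application logic: knows nothing about checkpoints or recovery.
    return total + value


events = list(range(1, 11))
results = {n: run_with_checkpoints(add, events, n, fail_at=8) for n in (1, 3, 5)}
```

Every run ends with the same state (55), so recovery stays correct whatever the interval; what changes is the overhead trade-off: a shorter interval takes more checkpoints but replays fewer events after the crash at position 8.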
I ♥ Flink, do you?
If you find this exciting, get involved and start a discussion on Flink's mailing list, or stay tuned by subscribing to [email protected], following flink.apache.org/blog, and following @ApacheFlink on Twitter.