Apache Flink 1.0.0 overview

Page 1: Apache flink 1.0.0 overview

What’s new in Apache Flink™ 1.0

Kostas Tzoumas, @kostas_tzoumas

Page 2: Apache flink 1.0.0 overview

Flink 1.0

• March 8, 2016

• First release in 1.x.y series

• Initiates backwards compatibility for selected APIs

• More than 64 contributors

• More than 450 JIRAs resolved

Page 3: Apache flink 1.0.0 overview

Flink 1.0: major features

• Out of core state

• Savepoints

• CEP library

• Improved monitoring & Kafka 0.9 support

Page 4: Apache flink 1.0.0 overview

Interface stability

Page 5: Apache flink 1.0.0 overview

Out of core state

Page 6: Apache flink 1.0.0 overview

Out of core state

• Alternative to in-memory state

• Powered by RocksDB instances in Flink TaskManagers (TMs)

• Enabled by using the RocksDBStateBackend

• State limited by disk space only

• State checkpoints save RocksDB databases in reliable store
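The idea on this slide can be illustrated with a toy model. The sketch below is not Flink code and does not use RocksDB: it assumes a SQLite file as a stand-in for the local on-disk store and a plain directory as a stand-in for the reliable store. The class name DiskKeyedState and the chk-<id>.db file naming are hypothetical.

```python
import os
import shutil
import sqlite3
import tempfile


class DiskKeyedState:
    """Toy keyed counter state kept on local disk instead of in memory."""

    def __init__(self, db_path):
        self.db_path = db_path
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS state (k TEXT PRIMARY KEY, v INTEGER)"
        )
        self.conn.commit()

    def get(self, key):
        row = self.conn.execute(
            "SELECT v FROM state WHERE k = ?", (key,)
        ).fetchone()
        return row[0] if row else 0

    def add(self, key, amount):
        # Read-modify-write against the on-disk table.
        self.conn.execute(
            "INSERT OR REPLACE INTO state (k, v) VALUES (?, ?)",
            (key, self.get(key) + amount),
        )
        self.conn.commit()

    def checkpoint(self, reliable_dir, checkpoint_id):
        # Copy the whole database file to the (stand-in) reliable store.
        self.conn.commit()
        target = os.path.join(reliable_dir, f"chk-{checkpoint_id}.db")
        shutil.copyfile(self.db_path, target)
        return target

    def close(self):
        self.conn.close()


def demo_out_of_core_state():
    with tempfile.TemporaryDirectory() as workdir:
        reliable_dir = os.path.join(workdir, "reliable")
        os.mkdir(reliable_dir)
        state = DiskKeyedState(os.path.join(workdir, "local.db"))
        for word in ["a", "b", "a"]:
            state.add(word, 1)
        snapshot = state.checkpoint(reliable_dir, 1)
        state.add("a", 10)  # update made after the checkpoint
        restored = DiskKeyedState(snapshot)  # state as of the checkpoint
        result = (state.get("a"), restored.get("a"), restored.get("b"))
        state.close()
        restored.close()
        return result
```

Here the live state keeps changing after the checkpoint, while the copy in the reliable directory still reflects the moment the checkpoint was taken.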

Page 7: Apache flink 1.0.0 overview

Savepoints

Page 8: Apache flink 1.0.0 overview

Production deployments

• Maintaining stateful applications in production settings comes with its own challenges

• Failures, code upgrades, cluster maintenance, …

• Streaming jobs cannot simply be stopped and restarted

Page 9: Apache flink 1.0.0 overview

Reminder: fault tolerance

• At least once, at most once, exactly once

• Flink guarantees exactly-once processing

• Flink guarantees end to end exactly-once with selected sources and sinks

• e.g., Kafka -> Flink -> HDFS

Page 10: Apache flink 1.0.0 overview

How? Checkpoints

• Flink guarantees fault tolerance by regularly taking checkpoints of the application state without ever stopping the execution

• On failure, the input stream is rewound to the logical time of the last checkpoint
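The checkpoint-and-rewind idea can be sketched with a toy simulation. This is not how Flink implements checkpoints (the sketch is single-threaded and pauses to record a checkpoint, whereas the slide stresses that Flink never stops execution). It only shows why restoring the state and rewinding a replayable input to the same point gives an exactly-once result. The function name and the fail_at knob are hypothetical.

```python
def run_with_checkpoints(events, checkpoint_every, fail_at=None):
    """Sum a replayable stream, recording (offset, state) checkpoints.

    fail_at is a hypothetical knob: the input offset at which a single
    simulated failure happens. Returns (final_state, events_replayed).
    """
    state = 0
    offset = 0
    checkpoint = (0, 0)  # (offset, state) of the last completed checkpoint
    failed = False
    replayed = 0
    while offset < len(events):
        if fail_at is not None and offset == fail_at and not failed:
            failed = True
            # Everything processed since the last checkpoint is lost,
            # so restore the state and rewind the input to match it.
            replayed += offset - checkpoint[0]
            offset, state = checkpoint
            continue
        state += events[offset]
        offset += 1
        if offset % checkpoint_every == 0:
            checkpoint = (offset, state)
    return state, replayed


events = list(range(1, 11))
no_failure = run_with_checkpoints(events, checkpoint_every=3)
with_failure = run_with_checkpoints(events, checkpoint_every=3, fail_at=8)
```

Both runs end with the same sum. The run with a failure reprocesses the two events seen after the last checkpoint, but because the state was rolled back together with the input position, no event is counted twice.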

Page 11: Apache flink 1.0.0 overview

Introducing savepoints

• A savepoint is a Flink checkpoint that (1) is taken by the user, (2) is accessible externally, and (3) never expires

• Command line save & resume interface

• Save: flink savepoint <JobID>

• Resume: flink run -s <path/to/savepoint> <jobJar>
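For scripting the save and resume steps, a small helper that builds the two command lines shown above may be handy. This is a sketch only: the function names are hypothetical, the job ID and paths in the usage lines are placeholders, and the helpers only build argument lists without running anything.

```python
import shlex


def savepoint_command(job_id):
    """Argument list for: flink savepoint <JobID>"""
    return ["flink", "savepoint", job_id]


def resume_command(savepoint_path, job_jar):
    """Argument list for: flink run -s <path/to/savepoint> <jobJar>"""
    return ["flink", "run", "-s", savepoint_path, job_jar]


def as_shell(args):
    # Quote each argument so the printed command is safe to paste.
    return " ".join(shlex.quote(arg) for arg in args)


# Placeholder values, for illustration only.
save_cmd = savepoint_command("my-job-id")
resume_cmd = resume_command("/tmp/my savepoint", "job.jar")
```

The argument lists can be passed to subprocess.run on a machine where the flink CLI is installed; as_shell is only for logging or display.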

Page 12: Apache flink 1.0.0 overview

Savepoints and versions

• A savepoint saves a version of a stateful application at a well-defined time

• E.g.: take snapshots of one application at well-defined times

Page 13: Apache flink 1.0.0 overview

“Like git for state”

• Branch off from savepoints, creating a tree of running application versions
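The branching idea can be made concrete with a toy model. The sketch below is not Flink's savepoint format or API: it assumes state is a plain dictionary, and the names SavepointTree and count_events are hypothetical. Two versions of a job resume from the same savepoint and then diverge.

```python
class SavepointTree:
    """Toy store of immutable state snapshots with parent links."""

    def __init__(self):
        self._nodes = {}  # savepoint id -> (parent id, state snapshot)
        self._next_id = 0

    def save(self, state, parent=None):
        sp_id = self._next_id
        self._next_id += 1
        self._nodes[sp_id] = (parent, dict(state))  # copy: never mutated later
        return sp_id

    def restore(self, sp_id):
        return dict(self._nodes[sp_id][1])

    def lineage(self, sp_id):
        # Walk parent links back to the root, like following git history.
        path = []
        while sp_id is not None:
            path.append(sp_id)
            sp_id = self._nodes[sp_id][0]
        return list(reversed(path))


def count_events(state, events, weight=1):
    """Stand-in for job logic; weight models a changed code version."""
    for event in events:
        state[event] = state.get(event, 0) + weight
    return state


def demo_branching():
    tree = SavepointTree()
    root = tree.save(count_events({}, ["a", "b"]))
    # Two application versions resume from the same savepoint.
    v1 = count_events(tree.restore(root), ["a"])
    v2 = count_events(tree.restore(root), ["a"], weight=2)
    sp_v1 = tree.save(v1, parent=root)
    sp_v2 = tree.save(v2, parent=root)
    return tree.restore(sp_v1), tree.restore(sp_v2), tree.lineage(sp_v2)
```

Because each restore hands out a copy, the two branches cannot interfere with each other or with the savepoint they started from.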

Page 14: Apache flink 1.0.0 overview

Essential for production deployments

• Application code upgrades

• Flink version upgrades

• Maintenance, migration, debugging

• What-if simulations

• A/B testing

• Time travel

Page 15: Apache flink 1.0.0 overview

Complex Event Processing

Page 16: Apache flink 1.0.0 overview

FlinkCEP

• What is Complex Event Processing?

• A catch-all term

• In our context: easily detect patterns in streams
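To give a feel for what detecting a pattern in a stream means, here is a toy matcher. It is not the FlinkCEP Pattern API: the function detect_sequence, the event shape, and the example pattern (two failed logins followed by a success) are assumptions made for illustration. Matching is non-overlapping, and unrelated events between the steps are skipped.

```python
def detect_sequence(events, predicates):
    """Find non-overlapping matches of predicates, in order, in a stream.

    Events that do not satisfy the next expected predicate are skipped,
    so the steps of a match need not be adjacent.
    """
    matches = []
    partial = []
    for event in events:
        if predicates[len(partial)](event):
            partial.append(event)
            if len(partial) == len(predicates):
                matches.append(partial)
                partial = []
    return matches


def is_type(expected):
    return lambda event: event["type"] == expected


# Hypothetical pattern: two failed logins followed by a successful one.
pattern = [is_type("fail"), is_type("fail"), is_type("ok")]

stream = [
    {"type": "fail", "t": 1},
    {"type": "other", "t": 2},
    {"type": "fail", "t": 3},
    {"type": "ok", "t": 4},
    {"type": "fail", "t": 5},
    {"type": "ok", "t": 6},
]

matches = detect_sequence(stream, pattern)
```

In this stream only the events at t = 1, 3, and 4 form a match; the trailing fail and ok do not, because the pattern needs two failures before the success.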

Page 17: Apache flink 1.0.0 overview
Page 18: Apache flink 1.0.0 overview

Pattern API

Page 19: Apache flink 1.0.0 overview
Page 20: Apache flink 1.0.0 overview
Page 21: Apache flink 1.0.0 overview

Other features in 1.0

• Support for Kafka 0.9 API (and hence MapR Streams)

• Monitoring console: job submission, checkpoint statistics, detecting bottlenecks

• See http://flink.apache.org/news/2016/03/08/release-1.0.0.html

Page 22: Apache flink 1.0.0 overview

Closing

Page 23: Apache flink 1.0.0 overview

Summary

• Flink 1.0: Initiating backwards compatibility and pushing the envelope even further for production streaming deployments

Page 24: Apache flink 1.0.0 overview

What’s next

• SQL

• Dynamic scaling (+ savepoints)

• Hybrid in-memory/out-of-core state backend

• Queryable state

• Support for Apache Mesos

• More connectors and sinks (Kinesis, Cassandra, …)

Page 25: Apache flink 1.0.0 overview

Join the community

• Follow: @ApacheFlink, @dataArtisans

• Read: flink.apache.org/blog, data-artisans.com/blog

• Subscribe: (news | dev | user)@flink.apache.org