Large scale stream processing with Apache Flink

Large scale stream processing with Apache Flink Nikolay Stoitsev Sr. Software Engineer at Uber Tech Sofia

Page 6:

Stream Processing?

User Interaction Logs

Application Logs

Sensor Data

Database Commit Logs

Page 7:

Infinite Dataset

Page 11:

Producer → Stream → HDFS → Hive

Big Latency

Page 12:

Producer Stream

HDFS

Real-time service

Page 13:

Apache Storm

storm.apache.org

Page 14:

High-latency & accurate

vs.

Low-latency & approximate

Page 15:

Lambda architecture

Page 16:

https://www.oreilly.com/ideas/questioning-the-lambda-architecture

Page 17:

Kappa Architecture

Page 18:

Use Apache Kafka

Durable, scalable, fault-tolerant

Page 19:

Producer → Kafka → Stream Processor
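
The Kappa idea on these slides (keep every event in a durable log and let a single stream processor read from it) can be illustrated with a small sketch. This is not Kafka's API: the `Log` class, `run_processor`, and the 10% fee are hypothetical stand-ins, and the whole log lives in memory. The point it tries to show is that a changed job can be re-run by replaying the log from offset 0, with no separate batch layer.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Tuple


@dataclass
class Log:
    """Append-only, in-memory stand-in for one partition of a durable log."""
    records: List[dict] = field(default_factory=list)

    def append(self, record: dict) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the record just written

    def read_from(self, offset: int) -> Iterator[Tuple[int, dict]]:
        # Yield (offset, record) pairs starting at the requested offset.
        for i in range(offset, len(self.records)):
            yield i, self.records[i]


def run_processor(log: Log, start_offset: int,
                  fold: Callable[[int, dict], int]) -> int:
    """Fold every record from start_offset onward into one running result."""
    state = 0
    for _, record in log.read_from(start_offset):
        state = fold(state, record)
    return state


log = Log()
for amount in (10, 25, 5):
    log.append({"amount": amount})

# Version 1 of the job sums the raw amounts.
total_v1 = run_processor(log, 0, lambda s, r: s + r["amount"])

# Version 2 changes the logic (a made-up 10% fee, integer math) and
# reprocesses history by replaying the same log from offset 0.
total_v2 = run_processor(log, 0, lambda s, r: s + r["amount"] * 90 // 100)
```

Here `total_v1` and `total_v2` come from the same stored events; only the processing function differs between the two runs.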

Page 22:

Metrics we want to track

Net payout

Daily items sold

Weekly items sold

Order acceptance rate

Order preparation speed

Item rating
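
One of the metrics above, daily items sold, can be read as a sum over fixed one-day windows. The sketch below assumes events arrive as (epoch seconds, quantity) pairs and that days are cut at UTC midnight; the event shape and the function name are illustrative assumptions, not something stated in the talk.

```python
from collections import defaultdict

SECONDS_PER_DAY = 24 * 60 * 60


def daily_items_sold(events):
    """Sum item quantities per one-day window, keyed by window start time."""
    totals = defaultdict(int)
    for timestamp, quantity in events:
        # Align the timestamp down to the start of its day (UTC).
        window_start = timestamp - timestamp % SECONDS_PER_DAY
        totals[window_start] += quantity
    return dict(totals)


# Two sales on day 0 and two sales on day 1.
events = [(0, 2), (3600, 1), (86400, 4), (90000, 1)]
per_day = daily_items_sold(events)
```

A weekly count would follow the same pattern with a seven-day window size.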

Page 23:

Real time

Page 24:

Scalable

Page 25:

Granular

Page 26:

Highly available

Page 28:

Order Stream

Payment Stream

User Rating Stream

Stream Processor

OLAP

Page 29:

samza.apache.org

Page 30:

Apache Flink

flink.apache.org

Page 31:

Everything is a batch

vs.

Everything is a stream

Page 32:

Single JVM, Cluster, Cloud

Runtime

DataSet API, DataStream API

Page 33:

Dataflow graph

Page 34:

Source

Source

Operator

Operator

Operator

Sink

OLAP
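
The dataflow graph on this slide (sources feeding a chain of operators that ends in a sink) can be imitated with plain Python generators. This is only a single-process sketch and not Flink code: the helper names `source`, `keep_if`, `transform`, and `sink` are made up for illustration, and the sink just collects into a list where a real job would write to an OLAP store.

```python
from itertools import chain


def source(records):
    """A source emits records one at a time."""
    yield from records


def keep_if(predicate, stream):
    """Filter operator: forwards only the records that match the predicate."""
    return (r for r in stream if predicate(r))


def transform(fn, stream):
    """Map operator: applies fn to every record."""
    return (fn(r) for r in stream)


def sink(stream):
    """Sink: drains the stream and collects the results."""
    return list(stream)


orders_a = source([{"item": "pizza", "qty": 2}, {"item": "soup", "qty": 0}])
orders_b = source([{"item": "salad", "qty": 1}])

# Two sources, three operators (merge, filter, map), one sink.
merged = chain(orders_a, orders_b)
non_empty = keep_if(lambda r: r["qty"] > 0, merged)
labelled = transform(lambda r: f"{r['item']} x{r['qty']}", non_empty)
result = sink(labelled)
```

Nothing runs until the sink pulls records, which loosely mirrors how a streaming job only describes the graph up front and executes it later.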

Page 35:

https://ci.apache.org/projects/flink/flink-docs-release-1.6/concepts/programming-model.html

Page 40:

Flink Program

Optimizer Graph Builder

Client Job Manager

Task Manager Task Manager

Snapshot Store

Page 41:

Fault tolerant

Page 43:

Lightweight Asynchronous Snapshots for Distributed Dataflows

Paris Carbone, Gyula Fóra, Stephan Ewen, Seif Haridi, Kostas Tzoumas

Page 44:

Barrier Msg Msg Barrier Msg Msg Barrier

Operator

Page 45:

Barrier Msg Msg Barrier Msg Msg

Operator

Msg

Snapshot Store
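
The two barrier slides can be read as: barriers flow through the stream along with the messages, and when an operator sees a barrier it records its current state in the snapshot store. Below is a deliberately simplified sketch of that idea for one operator with one input. It is not Flink's implementation: the class and the `("barrier", id)` tuple encoding are assumptions made for illustration, the snapshot is written synchronously, and the alignment of barriers across several inputs is left out.

```python
BARRIER = "barrier"


class CountingOperator:
    """Counts messages and snapshots the count whenever a barrier arrives."""

    def __init__(self, snapshot_store):
        self.count = 0
        self.snapshot_store = snapshot_store  # maps checkpoint id -> state

    def process(self, element):
        if isinstance(element, tuple) and element[0] == BARRIER:
            checkpoint_id = element[1]
            # State at the barrier covers exactly the messages before it.
            self.snapshot_store[checkpoint_id] = self.count
        else:
            self.count += 1
        return element  # messages and barriers are both forwarded downstream

    def restore(self, checkpoint_id):
        """Roll the state back to a completed snapshot after a failure."""
        self.count = self.snapshot_store[checkpoint_id]


store = {}
op = CountingOperator(store)
stream = ["msg", "msg", (BARRIER, 1), "msg", "msg", (BARRIER, 2), "msg"]
for element in stream:
    op.process(element)

count_before_failure = op.count
op.restore(2)  # pretend the operator crashed and recovers from checkpoint 2
count_after_restore = op.count
```

After the restore, the one message that followed barrier 2 would have to be replayed from the source, which is the part that lets the counted state reflect each message once.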

Page 46:

Exactly Once Processing

Page 47:

Can handle very large state

Page 49:

Flink Program

Optimizer Graph Builder

Client Job Manager

Task Manager Task Manager

Snapshot Store

Job Manager

Job Manager

ZooKeeper

Page 51:

Flink Program

Optimizer Graph Builder

Client

Task Manager Task Manager

Snapshot Store

Job Manager

Job Manager

ZooKeeper

Page 52:

Joining Streams

Page 56:

Order Stream

User Rating Stream

Local Join

Local Join
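
One way to read the diagram with two "Local Join" boxes is that both streams are partitioned by the same key, so matching records land in the same partition and each partition can join on its own. The sketch below works under that assumption. It uses bounded lists in place of streams, and the field names (`order_id`, `item`, `stars`) and function names are hypothetical; a real streaming join would also need windows or stored state to decide how long to wait for a match.

```python
def partition(records, key, num_partitions):
    """Route each record to a partition chosen by hashing its key."""
    parts = [[] for _ in range(num_partitions)]
    for record in records:
        parts[hash(record[key]) % num_partitions].append(record)
    return parts


def local_join(orders, ratings):
    """Join the orders and ratings that ended up in the same partition."""
    orders_by_id = {o["order_id"]: o for o in orders}
    return [
        {**orders_by_id[r["order_id"]], "stars": r["stars"]}
        for r in ratings
        if r["order_id"] in orders_by_id
    ]


def partitioned_join(orders, ratings, num_partitions=2):
    """Partition both inputs the same way, then join partition by partition."""
    order_parts = partition(orders, "order_id", num_partitions)
    rating_parts = partition(ratings, "order_id", num_partitions)
    joined = []
    for order_part, rating_part in zip(order_parts, rating_parts):
        joined.extend(local_join(order_part, rating_part))
    return joined


orders = [
    {"order_id": 1, "item": "pizza"},
    {"order_id": 2, "item": "soup"},
    {"order_id": 3, "item": "salad"},
]
ratings = [{"order_id": 2, "stars": 5}, {"order_id": 3, "stars": 4}]
joined = partitioned_join(orders, ratings)
```

Because both inputs use the same hash and partition count, no partition ever needs to look at another partition's records.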

Page 57:

Apache Flink

● Can join streams

● Fault tolerant

● Exactly Once Processing

● Combines stream and batch processing

Page 58:

… but it requires Java/Scala code

Page 59:

Scalable, efficient and robust

Page 60:

github.com/uber/AthenaX

Page 61:

SQL → what data to analyze

Flink → how to analyze it

Page 69:

Resource estimation and auto scaling

Page 70:

Monitoring and automatic failure recovery

Page 71:

eng.uber.com/athenax

Page 72:

Thanks!

Nikolay Stoitsev @ Uber
