Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded...

45
Architecture of Flink's Streaming Runtime Robert Metzger @rmetzger_ [email protected]

Transcript of Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded...

Page 1: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

Architecture of Flink's

Streaming Runtime

Robert Metzger

@rmetzger_

[email protected]

Page 2: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

What is stream processing

Real-world data is unbounded and is

pushed to systems

Right now: people are using the batch

paradigm for stream analysis (there was

no good stream processor available)

New systems (Flink, Kafka) embrace

streaming nature of data

2

Web server Kafka topic

Stream processing

Page 3: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

3

Flink is a stream processor with many faces

Streaming dataflow runtime

Page 4: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

Flink's streaming runtime

4

Page 5: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

Requirements for a stream processor

Low latency

• Fast results (milliseconds)

High throughput

• handle large data amounts (millions of events

per second)

Exactly-once guarantees

• Correct results, also in failure cases

Programmability

• Intuitive APIs

5

Page 6: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

Pipelining

6

Basic building block to “keep the data moving”

• Low latency• Operators push

data forward• Data shipping as

buffers, not tuple-wise

• Natural handling of back-pressure

Page 7: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

Fault Tolerance in streaming

at least once: ensure all operators see all events

• Storm: Replay stream in failure case

Exactly once: Ensure that operators do not perform duplicate updates to their state

• Flink: Distributed Snapshots

• Spark: Micro-batches on batch runtime

7

Page 8: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

Flink’s Distributed Snapshots

Lightweight approach of storing the state

of all operators without pausing the

execution

high throughput, low latency

Implemented using barriers flowing

through the topology

8

Kafka

Consumer

offset = 162

Element

Counter

value = 152

Operator stateData Stream

barrier

Before barrier =part of the snapshot

After barrier =Not in snapshot

(backup till next snapshot)

Page 9: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

9

Page 10: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

10

Page 11: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

11

Page 12: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

12

Page 13: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

Best of all worlds for streaming

Low latency• Thanks to pipelined engine

Exactly-once guarantees• Distributed Snapshots

High throughput• Controllable checkpointing overhead

Separates app logic from recovery• Checkpointing interval is just a config parameter

13

Page 14: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

Throughput of distributed grep

14

Data Generator

“grep” operator

30 machines, 120 cores

0

20.000.000

40.000.000

60.000.000

80.000.000

100.000.000

120.000.000

140.000.000

160.000.000

180.000.000

200.000.000

Flink, no faulttolerance

Flink, exactlyonce (5s)

Storm, nofault tolerance

Storm, micro-batches

aggregate throughput of 175 million elements per second

aggregate throughput of 9 million elementsper second

• Flink achieves 20x higher throughput

• Flink throughput almost the same with and without exactly-once

Page 15: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

Aggregate throughput for stream record

grouping

15

0

10.000.000

20.000.000

30.000.000

40.000.000

50.000.000

60.000.000

70.000.000

80.000.000

90.000.000

100.000.000

Flink, nofault

tolerance

Flink,exactly

once

Storm, nofault

tolerance

Storm, atleast once

aggregate throughput of 83 million elements per second

8,6 million elements/s

309k elements/s Flink achieves 260x higher throughput with fault tolerance

30 machines, 120 cores Network

transfer

Page 16: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

Latency in stream record grouping

16

Data Generator

Receiver:Throughput /

Latency measure

• Measure time for a record to travel from source to sink

0,00

5,00

10,00

15,00

20,00

25,00

30,00

Flink, nofault

tolerance

Flink, exactlyonce

Storm, atleast once

Median latency

25 ms

1 ms

0,00

10,00

20,00

30,00

40,00

50,00

60,00

Flink, nofault

tolerance

Flink,exactly

once

Storm, atleastonce

99th percentile latency

50 ms

Page 17: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

17

Page 18: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

Exactly-Once with YARN Chaos Monkey

Validate exactly-once guarantees with

state-machine

18

Page 19: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

“Faces” of Flink

19

Page 20: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

Faces of a stream processor

20

Stream

processing

Batch

processingMachine Learning at scale

Graph Analysis

Streaming dataflow runtime

Page 21: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

The Flink Stack

21

Streaming dataflow runtime

Specialized Abstractions / APIs

Core APIs

Flink Core Runtime

Deployment

Page 22: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

APIs for stream and batch

22

case class Word (word: String, frequency: Int)

val lines: DataStream[String] = env.fromSocketStream(...)

lines.flatMap {line => line.split(" ").map(word => Word(word,1))}

.window(Time.of(5,SECONDS)).every(Time.of(1,SECONDS))

.groupBy("word").sum("frequency").print()

val lines: DataSet[String] = env.readTextFile(...)

lines.flatMap {line => line.split(" ").map(word => Word(word,1))}

.groupBy("word").sum("frequency")

.print()

DataSet API (batch):

DataStream API (streaming):

Page 23: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

The Flink Stack

23

Streaming dataflow runtime

DataSet (Java/Scala) DataStream (Java/Scala)

Experimental Python API also

available

Data Sourceorders.tbl

FilterMap DataSource

lineitem.tbl

JoinHybrid Hash

buildHT probe

hash-part [0] hash-part [0]

GroupRedsort

forward

API independent Dataflow

Graph representation

Batch Optimizer Graph Builder

Page 24: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

Batch is a special case of streaming

Batch: run a bounded stream (data set) on

a stream processor

Form a global window over the entire data

set for join or grouping operations

24

Page 25: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

Batch-specific optimizations

Managed memory on- and off-heap

• Operators (join, sort, …) with out-of-core

support

• Optimized serialization stack for user-types

Cost-based Optimizer

• Job execution depends on data size

25

Page 26: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

The Flink Stack

26

Streaming dataflow runtime

Specialized Abstractions / APIs

Core APIs

Flink Core Runtime

Deployment

DataSet (Java/Scala) DataStream

Page 27: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

FlinkML: Machine Learning

API for ML pipelines inspired by scikit-learn

Collection of packaged algorithms • SVM, Multiple Linear Regression, Optimization, ALS, ...

27

val trainingData: DataSet[LabeledVector] = ...val testingData: DataSet[Vector] = ...

val scaler = StandardScaler()val polyFeatures = PolynomialFeatures().setDegree(3)val mlr = MultipleLinearRegression()

val pipeline = scaler.chainTransformer(polyFeatures).chainPredictor(mlr)

pipeline.fit(trainingData)

val predictions: DataSet[LabeledVector] = pipeline.predict(testingData)

Page 28: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

Gelly: Graph Processing

Graph API and library

Packaged algorithms• PageRank, SSSP, Label Propagation, Community

Detection, Connected Components

28

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

Graph<Long, Long, NullValue> graph = ...

DataSet<Vertex<Long, Long>> verticesWithCommunity = graph.run(new LabelPropagation<Long>(30)).getVertices();

verticesWithCommunity.print();

env.execute();

Page 29: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

Flink Stack += Gelly, ML

29

Gelly

ML

DataSet (Java/Scala) DataStream

Streaming dataflow runtime

Page 30: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

Integration with other systems

30

SA

MO

A

DataSet DataStream

Ha

do

op

M/R

Google

Data

flow

Cascadin

g

Sto

rm

Zeppelin

• Use Hadoop Input/Output Formats• Mapper / Reducer implementations• Hadoop’s FileSystem implementations

• Run applications implemented against Google’s Data Flow APIon premise with Flink

• Run Cascading jobs on Flink, with almost no code change• Benefit from Flink’s vastly better performance than

MapReduce

• Interactive, web-based data exploration

• Machine learning on data streams

• Compatibility layer for running Storm code• FlinkTopologyBuilder: one line replacement for

existing jobs• Wrappers for Storm Spouts and Bolts• Coming soon: Exactly-once with Storm

Page 31: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

Deployment options

Gelly

Table

ML

SAMOA

Da

taS

et(J

ava

/Sca

la)

Da

taS

tre

am

Hadoop

Lo

ca

lC

luste

rY

AR

NTe

zE

mb

ed

de

d

Dataflow

Dataflow

MRQL

Table

Cascading

Str

eam

ing d

ata

flow

runtim

e

Storm

Zeppelin

• Start Flink in your IDE / on your machine• Local debugging / development using the

same code as on the cluster

• “bare metal” standalone installation of Flinkon a cluster

• Flink on Hadoop YARN (Hadoop 2.2.0+)• Restarts failed containers• Support for Kerberos-secured YARN/HDFS

setups

Page 32: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

The full stack

32

Gelly

Table

ML

SA

MO

A

DataSet (Java/Scala) DataStream

Ha

do

op

M/R

Local Cluster Yarn Tez Embedded

Data

flow

Data

flow

(W

iP)

MR

QL

Table

Ca

sca

din

gStreaming dataflow runtime

Sto

rm (

WiP

)

Zeppelin

Page 33: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

Closing

33

Page 34: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

tl;dr Summary

Flink is a software stack of

Streaming runtime

• low latency

• high throughput

• fault tolerant, exactly-once data processing

Rich APIs for batch and stream processing

• library ecosystem

• integration with many systems

A great community of devs and users

Used in production34

Page 35: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

What is currently happening?

Features in progress:

• Master High Availability

• Vastly improved monitoring GUI

• Watermarks / Event time processing /

Windowing rework

• Graduate Streaming API out of Beta

0.10.0-milestone-1 is currently voted

35

Page 36: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

How do I get started?

36

Mailing Lists: (news | user | dev)@flink.apache.org

Twitter: @ApacheFlink

Blogs: flink.apache.org/blog, data-artisans.com/blog/

IRC channel: irc.freenode.net#flink

Start Flink on YARN in 4 commands:# get the hadoop2 package from the Flink download page at# http://flink.apache.org/downloads.htmlwget <download url>tar xvzf flink-0.9.1-bin-hadoop2.tgzcd flink-0.9.1/./bin/flink run -m yarn-cluster -yn 4 ./examples/flink-java-examples-0.9.1-WordCount.jar

Page 37: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

flink.apache.org 37

Flink Forward: 2 days conference with

free training in Berlin, Germany

• Schedule: http://flink-forward.org/?post_type=day

Page 38: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

Appendix

38

Page 39: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

Managed (off-heap) memory and out-of-

core support

39Memory runs out

Page 40: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

Cost-based Optimizer

40

DataSourceorders.tbl

Filter

Map DataSourcelineitem.tbl

JoinHybrid Hash

buildHT probe

broadcast forward

Combine

GroupRed

sort

DataSourceorders.tbl

Filter

Map DataSourcelineitem.tbl

JoinHybrid Hash

buildHT probe

hash-part [0] hash-part [0]

hash-part [0,1]

GroupRed

sort

forwardBest plan

depends on

relative sizes

of input files

Page 41: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

41

case class Path (from: Long, to:Long)val tc = edges.iterate(10) {

paths: DataSet[Path] =>val next = paths

.join(edges)

.where("to")

.equalTo("from") {(path, edge) =>

Path(path.from, edge.to)}.union(paths).distinct()

next}

Optimizer

Type extraction

stack

Task

scheduling

Dataflow

metadata

Pre-flight (Client)

JobManagerTaskManagers

Data

Sourceorders.tbl

Filter

MapDataSourc

elineitem.tbl

JoinHybrid Hash

build

HTprobe

hash-part [0] hash-part [0]

GroupRed

sort

forward

Program

Dataflow

Graph

deploy

operators

track

intermediate

results

Loca

lC

lust

er:

YAR

N, S

tan

dal

on

e

Page 42: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

Iterative processing in Flink

Flink offers built-in iterations and delta

iterations to execute ML and graph

algorithms efficiently

42

map

join sum

ID1

ID2

ID3

Page 43: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

Example: Matrix Factorization

43

Factorizing a matrix with28 billion ratings forrecommendations

More at: http://data-artisans.com/computing-recommendations-with-flink.html

Page 44: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

44

Batch

aggregationExecutionGraph

JobManager

TaskManager 1

TaskManager 2

M1

M2

RP

1R

P2

R1

R2

1

2 3a3b

4a

4b

5a

5b

"Blocked" result partition

Page 45: Architecture of Flink's Streaming Runtime · What is stream processing Real-world data is unbounded and is pushed to systems Right now: people are using the batch paradigm for stream

45

Streaming

window

aggregationExecutionGraph

JobManager

TaskManager 1

TaskManager 2

M1

M2

RP

1R

P2

R1

R2

1

2 3a3b

4a

4b

5a

5b

"Pipelined" result partition