Apache Flink Overview at SF Spark and Friends


Transcript of Apache Flink Overview at SF Spark and Friends

Page 1: Apache Flink Overview at SF Spark and Friends

Introducing Apache Flink™

@StephanEwen

Page 2

Flink’s Recent History

Timeline: April 2014 → Dec 2014 → April 2015
Dec 2014: Top-Level Project graduation
Releases along the way: 0.5, 0.6, 0.7, 0.9-m1, 0.9

Page 3

What is Apache Flink?

The Flink stack:
- Libraries: Gelly, Table, ML, SAMOA (WiP)
- APIs: DataSet (Java/Scala), DataStream (Java/Scala), Hadoop M/R compatibility
- Runtime: streaming dataflow runtime
- Deployment: Local, Remote, YARN, Tez, Embedded
- Integrations: Dataflow (WiP), MRQL, Table, Cascading (WiP), Zeppelin

A Top-Level project of the Apache Software Foundation

Page 4

Program compilation

case class Path(from: Long, to: Long)

val tc = edges.iterate(10) { paths: DataSet[Path] =>
  val next = paths
    .join(edges)
    .where("to")
    .equalTo("from") { (path, edge) => Path(path.from, edge.to) }
    .union(paths)
    .distinct()
  next
}

Pre-flight (Client): type extraction stack, optimizer, dataflow metadata
Master: task scheduling, deploy operators, track intermediate results
Workers: execute the dataflow

Program → Dataflow Graph, independent of batch or streaming job

[Example plan: DataSource (orders.tbl) → Filter → Map and DataSource (lineitem.tbl), hash-partitioned on [0], feeding a Hybrid Hash Join (build HT / probe), then forward → GroupRed (sort)]

Page 5

Native workload support

Workloads around Flink: streaming topologies (low latency, mutable state), long batch pipelines (resource utilization), machine learning at scale and graph analysis (iterative algorithms).

How can an engine natively support all these workloads? And what does "native" mean?

Page 6

E.g.: Non-native iterations

The client drives the loop, launching one job per step (Step → Step → Step → Step → Step):

for (int i = 0; i < maxIterations; i++) {
    // Execute MapReduce job
}

Page 7

E.g.: Non-native streaming

A stream discretizer cuts the data stream into a series of small jobs (Job, Job, Job, …):

while (true) {
    // get next few records
    // issue batch job
}

Page 8

Native workload support (same slide as page 5, revisited)

Page 9

Ingredients for “native” support

1. Execute everything as streams: pipelined execution, backpressure or buffered, push/pull model
2. Special code paths for batch: automatic job optimization, fault tolerance
3. Allow some iterative (cyclic) dataflows
4. Allow some mutable state
5. Operate on managed memory: make data processing on the JVM robust

Page 10

Stream processing in Flink

Page 11

Stream platform architecture

Gathering side:
- Gather and backup streams
- Offer streams for consumption
- Provide stream recovery

Analysis side:
- Analyze and correlate streams
- Create derived streams and state
- Provide these to downstream systems

Sources: server logs, trxn logs, sensor logs → downstream systems

Page 12

What is a stream processor?

Basics:
1. Pipelining
2. Stream replay

State:
3. Operator state
4. Backup and restore

App development:
5. High-level APIs
6. Integration with batch

Large deployments:
7. High availability
8. Scale-in and scale-out

See http://data-artisans.com/stream-processing-with-flink.html

Page 13

Pipelining

Basic building block to “keep the data moving”

Note: pipelined systems do not usually transfer individual tuples, but buffers that batch several tuples!
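The buffering idea above can be sketched in plain Java. This is an illustrative simulation, not Flink's actual network stack; the class name and capacity threshold are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Illustrative sketch: records are shipped downstream in buffers, not one by one. */
class BufferingChannel<T> {
    private final int capacity;                 // records per buffer
    private final Consumer<List<T>> downstream;
    private List<T> buffer = new ArrayList<>();

    BufferingChannel(int capacity, Consumer<List<T>> downstream) {
        this.capacity = capacity;
        this.downstream = downstream;
    }

    void emit(T record) {
        buffer.add(record);
        if (buffer.size() == capacity) {
            flush();                            // ship a full buffer downstream
        }
    }

    void flush() {                              // also called on timeout / end of stream
        if (!buffer.isEmpty()) {
            downstream.accept(buffer);
            buffer = new ArrayList<>();
        }
    }
}
```

The timeout-based flush (latency bound for half-full buffers) is left out here for brevity.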

Page 14

Operator state

User-defined state:
• Flink transformations (map/reduce/etc) are long-running operators, so feel free to keep objects around
• Hooks to include them in the system's checkpoint

Windowed streams:
• Time, count, data-driven windows
• Managed by the system (currently WiP)
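As a sketch of the user-defined state idea (hypothetical names, not Flink's actual interfaces): a long-running operator can hold state in plain fields and expose hooks that the checkpoint mechanism calls to snapshot and restore it.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of user-defined operator state with checkpoint hooks (names are illustrative). */
class CountingOperator {
    private final Map<String, Long> counts = new HashMap<>();

    /** Called once per record over the operator's whole lifetime. */
    void processRecord(String key) {
        counts.merge(key, 1L, Long::sum);
    }

    /** Hook the checkpointing mechanism would call to include this state in a snapshot. */
    Map<String, Long> snapshotState() {
        return new HashMap<>(counts);          // copy, so processing can continue
    }

    /** Hook called on recovery to roll state back to the last snapshot. */
    void restoreState(Map<String, Long> snapshot) {
        counts.clear();
        counts.putAll(snapshot);
    }
}
```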

Page 15

Streaming fault tolerance

Ensure that operators see all events:
• “At least once”
• Solved by replaying a stream from a checkpoint, e.g., from a past Kafka offset

Ensure that operators do not perform duplicate updates to their state:
• “Exactly once”
• Several solutions
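The at-least-once replay idea can be sketched as a toy simulation, with an in-memory list standing in for a durable log such as a Kafka partition; all names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

/** Toy simulation of at-least-once replay: restart reading from the last checkpointed offset. */
class ReplayableSource {
    private final List<String> log;          // stands in for a durable, replayable log
    private long offset = 0;                 // next position to read
    private long checkpointedOffset = 0;     // last offset made durable by a checkpoint

    ReplayableSource(List<String> log) { this.log = log; }

    String poll() { return log.get((int) offset++); }

    void checkpoint() { checkpointedOffset = offset; }

    /** After a failure, rewind to the checkpoint: events since then are seen again. */
    void recover() { offset = checkpointedOffset; }
}
```

Events between the checkpoint and the failure are replayed, so each event is seen at least once; deduplicating the resulting state updates is what the exactly-once approaches on the next slide address.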

Page 16

Exactly once approaches

Discretized streams (Spark Streaming):
• Treat streaming as a series of small atomic computations
• “Fast track” to fault tolerance, but does not separate application logic (semantics) from recovery

MillWheel (Google Cloud Dataflow):
• State update and derived events committed as an atomic transaction to a high-throughput transactional store
• Needs a very high-throughput transactional store

Chandy-Lamport distributed snapshots (Flink)

Page 17

Distributed snapshots in Flink

Super-impose the checkpointing mechanism on the execution, instead of using the execution as the checkpointing mechanism.

Page 18

Register the checkpoint barrier on the master (JobManager). Replay will start from here.

Page 19

Barriers “push” prior events ahead of them (assumes in-order delivery in individual channels). Operators are shown in three stages: checkpointing starting, checkpointing in progress, checkpointing finished.

Page 20

Operator checkpointing takes a snapshot of state after the data prior to the barrier have updated the state. Checkpoints are currently synchronous; incremental and asynchronous checkpoints are WiP.

State backup: pluggable mechanism, currently either the JobManager (for small state) or a file system (HDFS/Tachyon). In-memory grids are WiP.

Page 21

Operators with many inputs need to wait for all barriers to pass before they checkpoint their state.
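A sketch of that alignment step, as an illustrative simulation rather than Flink's implementation (class and method names are hypothetical):

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch of barrier alignment: snapshot only once a barrier arrived on every input channel. */
class BarrierAligner {
    private final int numChannels;
    private final Set<Integer> channelsSeen = new HashSet<>();
    private boolean snapshotTaken = false;

    BarrierAligner(int numChannels) { this.numChannels = numChannels; }

    /** Called when the checkpoint barrier arrives on one input channel.
     *  Returns true exactly when the last channel's barrier triggers the snapshot. */
    boolean onBarrier(int channel) {
        channelsSeen.add(channel);
        if (channelsSeen.size() == numChannels && !snapshotTaken) {
            snapshotTaken = true;             // checkpoint the operator state here
            return true;
        }
        return false;
    }
}
```

While waiting, a real implementation also buffers records from channels whose barrier has already arrived, so post-barrier data does not contaminate the pre-barrier state.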

Page 22

State snapshots at the sinks signal the successful end of this checkpoint.

At failure: recover the last checkpointed state and restart the sources from the last barrier. This guarantees at least once.

Page 23

Benefits of Flink’s approach

Data processing does not block:
• Can checkpoint at any interval you like, to balance overhead against recovery time

Separates business logic from recovery:
• The checkpointing interval is a config parameter, not a variable in the program (as in discretization)

Can support richer windows:
• Session windows, event time, etc.

Best of all worlds: true streaming latency, exactly-once semantics, and low overhead for recovery.

Page 24

DataStream API

case class Word(word: String, frequency: Int)

DataStream API (streaming):

val lines: DataStream[String] = env.fromSocketStream(...)

lines.flatMap { line => line.split(" ")
                            .map(word => Word(word, 1)) }
     .window(Time.of(5, SECONDS)).every(Time.of(1, SECONDS))
     .groupBy("word").sum("frequency")
     .print()

DataSet API (batch):

val lines: DataSet[String] = env.readTextFile(...)

lines.flatMap { line => line.split(" ")
                            .map(word => Word(word, 1)) }
     .groupBy("word").sum("frequency")
     .print()

Page 25

Roadmap

Short-term (3-6 months):
• Graduate DataStream API from beta
• Fully managed window and user-defined state with pluggable backends
• Table API for streams (towards StreamSQL)

Long-term (6+ months):
• Highly available master
• Dynamic scale in/out
• FlinkML and Gelly for streams
• Full batch + stream unification

Page 26

Batch processing: Batch on Streaming

Page 27

Batch Pipelines

Page 28

Batch on Streaming

Batch programs are a special kind of streaming program:

Streaming Programs         | Batch Programs
Infinite Streams           | Finite Streams
Stream Windows             | Global View
Pipelined Data Exchange    | Pipelined or Blocking Exchange

Page 29

Batch Pipelines

Data exchange (shuffle / broadcast) is mostly streamed.

Some operators block (e.g. sorts / hash tables).

Page 30

Operator Execution Overlaps

Page 31

Memory Management

Page 32

Memory Management
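The core idea of managed memory (keeping data in preallocated, fixed-size binary segments rather than as heap objects, so a full segment triggers spilling instead of an OutOfMemoryError) can be sketched in a toy form. This is illustrative only and much simpler than Flink's actual memory segment machinery:

```java
import java.nio.ByteBuffer;

/** Toy sketch of managed memory: records serialized into a preallocated, fixed-size segment. */
class Segment {
    private final ByteBuffer buffer;

    Segment(int sizeInBytes) {
        this.buffer = ByteBuffer.allocate(sizeInBytes);   // reserved up front, never grows
    }

    /** Append a (key, value) record in binary form; report false when the segment is full,
     *  which is the signal to spill to disk instead of failing with an OOM. */
    boolean tryAppend(long key, long value) {
        if (buffer.remaining() < 2 * Long.BYTES) {
            return false;                                 // full: caller spills this segment
        }
        buffer.putLong(key).putLong(value);
        return true;
    }

    int recordCount() { return buffer.position() / (2 * Long.BYTES); }
}
```

Because the data lives in a bounded binary buffer, memory use is predictable and the garbage collector never sees the individual records.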

Page 33

Smooth out-of-core performance

Blue bars are in-memory, orange bars (partially) out-of-core.

More at: http://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html

Page 34

Other features of Flink: there is more…

Page 35

More Engine Features

Automatic Optimization / Static Code Analysis
Closed Loop Iterations
Stateful Iterations

[Two example execution plans: DataSource (orders.tbl) → Filter → Map and DataSource (lineitem.tbl) feeding a Hybrid Hash Join (build HT / probe), then GroupRed (sort). One plan uses broadcast/forward shipping with a Combine; the other uses hash-partitioning on [0] and [0,1].]

Page 36

Closing

Page 37

Apache Flink: community

Page 38

I Flink, do you?

If you find this exciting, get involved and start a discussion on Flink’s mailing list, or stay tuned by:
• subscribing to [email protected],
• following flink.apache.org/blog, and
• @ApacheFlink on Twitter

Page 39

flink-forward.org

Bay Area Flink meetup: tomorrow