Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

41
Modern Stream Processing with Apache Flink® Till Rohrmann @stsffap GOTO Berlin 2017 1

Transcript of Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

Page 1: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

Modern Stream Processing with Apache Flink®

Till Rohrmann@stsffap

GOTO Berlin 20171

Page 2: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

2

Original creators ofApache Flink®

dA Platform 2Open Source Apache Flink + dA Application Manager

Page 3: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

What changes faster? Data or Query?

3

Data changes slowlycompared to fastchanging queries

ad-hoc queries, data exploration, ML training and

(hyper) parameter tuning

Batch ProcessingUse Case

Data changes fastapplication logic

is long-lived

continuous applications,data pipelines, standing queries,

anomaly detection, ML evaluation, …

Stream ProcessingUse Case

Page 4: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

Batch Processing

4

Page 5: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

Stream Processing

5

Page 6: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

6

Apache Flink in a Nutshell

Page 7: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

Apache Flink in a Nutshell

7

Queries

Applications

Devices

etc.

Database

Stream

File/ObjectStorage

Stateful computations over streamsreal-time and historic

fast, scalable, fault tolerant, in-memory,event time, large state, exactly-once

HistoricData

Streams

Application

Page 8: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

8

Event Streams State (Event) Time Snapshots

The Core Building Blocks

real-time andhindsight

complexbusiness logic

consistency without-of-order data

and late data

forking /versioning /time-travel

Page 9: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

Powerful Abstractions

9

Process Function (events, state, time)

DataStream API (streams, windows)

Stream SQL / Tables (dynamic tables)

Stream- & Batch Data Processing

High-levelAnalytics API

Stateful Event-Driven Applications

val stats = stream.keyBy("sensor").timeWindow(Time.seconds(5)).sum((a, b) -> a.add(b))

def processElement(event: MyEvent, ctx: Context, out: Collector[Result]) = {// work with event and state(event, state.value) match { … }

out.collect(…) // emit eventsstate.update(…) // modify state

// schedule a timer callbackctx.timerService.registerEventTimeTimer(event.timestamp + 500)

}

Layered abstractions tonavigate simple to complex use cases

Page 10: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

Hardened at scale

10

Athena X Streaming SQLPlatform Service

Streaming Platform as a Service

Fraud detectionStreaming Analytics Platform

100s jobs, 1000s nodes, TBs statemetrics, analytics, real time MLStreaming SQL as a platform

Page 11: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

11

Distributed applicationinfrastructure

Page 12: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

Good old centralized architecture

12

The big meancentral database

$$$

The grumpyDBA

Application Application Application Application

Page 13: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

Write log and tables

13FromtheApacheCassandradocs

Page 14: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

Modern distributed app. architecture

14

EventSourcing

TurningtheDBinsideout

CQRS

Page 15: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

Modern distributed app. architecture

15

Application

Sensor

APIsApplication

Application

Application

The limit to what you can do is how sophisticatedcan you compute over the stream!

Boils down to: How well do you handle state and time

Page 16: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

16

A Flink-favored approach

Page 17: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

Event Sourcing + Memory Image

17

event logpersists events(temporarily)

event /command

Process

main memory

update localvariables/structures

periodically snapshot the

memory

Page 18: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

Event Sourcing + Memory Image

18

Recovery: Restore snapshot and replay events since snapshot

event logpersists events(temporarily)

Process

Page 19: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

Distributed Memory Image

19

Distributed application, many memory images.Snapshots are all consistent together.

Page 20: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

Stateful Event & Stream Processing

20

Scalable embedded state

Access at memory speed &scales with parallel operators

Page 21: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

Compute, State, and Storage

21

Classic tiered architecture Streaming architecture

databaselayer

computelayer

application state+ backup

compute+

stream storageand

snapshot storage(backup)

application state

Page 22: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

Performance

22

synchronous reads/writesacross tier boundary

asynchronous writesof large blobs

all modificationsare local

Classic tiered architecture Streaming architecture

Page 23: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

Consistency

23

distributed transactions

at scale typicallyat-most / at-least once

exactly onceper state =1 =1

Classic tiered architecture Streaming architecture

Page 24: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

Scaling a Service

24

separately provision additionaldatabase capacity

provision computeand state together

Classic tiered architecture Streaming architecture

provisioncompute

Page 25: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

Rolling out a new Service

25

provision a new database(or add capacity to an existing one)

simply occupies someadditional backup space

Classic tiered architecture

Streaming architecture

provision computeand state together

Page 26: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

What users built on checkpoints…§ Upgrades and Rollbacks§ Cross Datacenter Failover§ State Archiving§ Application Migration§ Spot Instance Region Chasing§ A/B testing§ …

26

Page 27: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

27

What is the next wave ofstream processing

applications?

Page 28: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

What changes faster? Data or Query?

28

Data changes slowlycompared to fastchanging queries

ad-hoc queries, data exploration, ML training and

tuning

Batch ProcessingUse Case

Data changes fastapplication logic

is long-lived

continuous applications,data pipelines, standing queries,

anomaly detection, ML evaluation, …

Stream ProcessingUse Case

Page 29: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

Analytics & Business Logic

29

Stream

Source analytics metricsBusiness

Logic

Stream Processor

events

ApplicationDatabase

Page 30: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

Blending Analytics & Business Logic

30

Stream

Source StreamingApplication

Stream Processor

events

RPCs

(Actor) Messages

Page 31: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

AthenaX by Uber

31

Page 32: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

32

Can one build an entire sophisticatedweb application (say a social network)

on a stream processor?

(Yes, we can!™)

Page 33: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

33

Social network implemented using event sourcing and CQRS (Command Query Responsibility Segregation) on Kafka/Flink/Elasticsearch/Redis

More:https://data-artisans.com/blog/drivetribe-cqrs-apache-flink

@

Page 34: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

34

The next wave of stream processing applications…

… is all types of statefulapplications that react to

data and time!

Page 35: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

35

Stateful stream applications

versioning, upgrading,rollback, duplicating,

migrating, …

Continuous applications

Page 36: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

The dA Platform Architecture

36

dA Platform 2

Apache FlinkStateful stream processing

KubernetesContainer platform

Logging

Streams from Kafka, Kinesis,

S3, HDFS, Databases, ...

dA Application

ManagerApplication lifecycle

management

Metrics

CI/CD

Real-timeAnalytics

Anomaly- &Fraud Detection

Real-timeData Integration

ReactiveMicroservices (and more)

Page 37: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

Versioned Applications, not Jobs/Jars

37

Stream ProcessingApplication

Version 3

Version 2

Version 1

Code and ApplicationSnapshot

upgrade

upgrade

New Application

Version 3a

Version 2afork /

duplicate

Page 38: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

Deployments, not Flink Clusters

38

Testing / QA Kubernetes Cluster Production Kubernetes Cluster

Threat MetricsApp. Testing

Activity MonitorApplication

Fraud DetectionApplication

Fraud DetectionApp. Testing

Page 39: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

Hooks for CI/CD pipelines

39

Kubernetes Cluster

ApplicationVersion 1

CI Service

dA Application

Manager

pushupdate

triggerCI

upgradeAPI

statefulupgrade

ApplicationVersion 2

Page 40: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

40

Thank you!@stsffap@ApacheFlink @dataArtisans

Page 41: Modern Stream Processing With Apache Flink @ GOTO Berlin 2017

We are hiring!data-artisans.com/careers

41