Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
-
Upload
till-rohrmann -
Category
Technology
-
view
355 -
download
0
Transcript of Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Modern Stream Processing with Apache Flink®
Till Rohrmann@stsffap
GOTO Berlin 20171
2
Original creators ofApache Flink®
dA Platform 2Open Source Apache Flink + dA Application Manager
What changes faster? Data or Query?
3
Data changes slowlycompared to fastchanging queries
ad-hoc queries, data exploration, ML training and
(hyper) parameter tuning
Batch ProcessingUse Case
Data changes fastapplication logic
is long-lived
continuous applications,data pipelines, standing queries,
anomaly detection, ML evaluation, …
Stream ProcessingUse Case
Batch Processing
4
Stream Processing
5
6
Apache Flink in a Nutshell
Apache Flink in a Nutshell
7
Queries
Applications
Devices
etc.
Database
Stream
File/ObjectStorage
Stateful computations over streamsreal-time and historic
fast, scalable, fault tolerant, in-memory,event time, large state, exactly-once
HistoricData
Streams
Application
8
Event Streams State (Event) Time Snapshots
The Core Building Blocks
real-time andhindsight
complexbusiness logic
consistency without-of-order data
and late data
forking /versioning /time-travel
Powerful Abstractions
9
Process Function (events, state, time)
DataStream API (streams, windows)
Stream SQL / Tables (dynamic tables)
Stream- & Batch Data Processing
High-levelAnalytics API
Stateful Event-Driven Applications
val stats = stream.keyBy("sensor").timeWindow(Time.seconds(5)).sum((a, b) -> a.add(b))
def processElement(event: MyEvent, ctx: Context, out: Collector[Result]) = {// work with event and state(event, state.value) match { … }
out.collect(…) // emit eventsstate.update(…) // modify state
// schedule a timer callbackctx.timerService.registerEventTimeTimer(event.timestamp + 500)
}
Layered abstractions tonavigate simple to complex use cases
Hardened at scale
10
Athena X Streaming SQLPlatform Service
Streaming Platform as a Service
Fraud detectionStreaming Analytics Platform
100s jobs, 1000s nodes, TBs statemetrics, analytics, real time MLStreaming SQL as a platform
11
Distributed applicationinfrastructure
Good old centralized architecture
12
The big meancentral database
$$$
The grumpyDBA
Application Application Application Application
Write log and tables
13FromtheApacheCassandradocs
Modern distributed app. architecture
14
EventSourcing
TurningtheDBinsideout
CQRS
Modern distributed app. architecture
15
Application
Sensor
APIsApplication
Application
Application
The limit to what you can do is how sophisticatedcan you compute over the stream!
Boils down to: How well do you handle state and time
16
A Flink-favored approach
Event Sourcing + Memory Image
17
event logpersists events(temporarily)
event /command
Process
main memory
update localvariables/structures
periodically snapshot the
memory
Event Sourcing + Memory Image
18
Recovery: Restore snapshot and replay events since snapshot
event logpersists events(temporarily)
Process
Distributed Memory Image
19
Distributed application, many memory images.Snapshots are all consistent together.
Stateful Event & Stream Processing
20
Scalable embedded state
Access at memory speed &scales with parallel operators
Compute, State, and Storage
21
Classic tiered architecture Streaming architecture
databaselayer
computelayer
application state+ backup
compute+
stream storageand
snapshot storage(backup)
application state
Performance
22
synchronous reads/writesacross tier boundary
asynchronous writesof large blobs
all modificationsare local
Classic tiered architecture Streaming architecture
Consistency
23
distributed transactions
at scale typicallyat-most / at-least once
exactly onceper state =1 =1
Classic tiered architecture Streaming architecture
Scaling a Service
24
separately provision additionaldatabase capacity
provision computeand state together
Classic tiered architecture Streaming architecture
provisioncompute
Rolling out a new Service
25
provision a new database(or add capacity to an existing one)
simply occupies someadditional backup space
Classic tiered architecture
Streaming architecture
provision computeand state together
What users built on checkpoints…§ Upgrades and Rollbacks§ Cross Datacenter Failover§ State Archiving§ Application Migration§ Spot Instance Region Chasing§ A/B testing§ …
26
27
What is the next wave ofstream processing
applications?
What changes faster? Data or Query?
28
Data changes slowlycompared to fastchanging queries
ad-hoc queries, data exploration, ML training and
tuning
Batch ProcessingUse Case
Data changes fastapplication logic
is long-lived
continuous applications,data pipelines, standing queries,
anomaly detection, ML evaluation, …
Stream ProcessingUse Case
Analytics & Business Logic
29
Stream
Source analytics metricsBusiness
Logic
Stream Processor
events
ApplicationDatabase
Blending Analytics & Business Logic
30
Stream
Source StreamingApplication
Stream Processor
events
RPCs
(Actor) Messages
…
AthenaX by Uber
31
32
Can one build an entire sophisticatedweb application (say a social network)
on a stream processor?
(Yes, we can!™)
33
Social network implemented using event sourcing and CQRS (Command Query Responsibility Segregation) on Kafka/Flink/Elasticsearch/Redis
More:https://data-artisans.com/blog/drivetribe-cqrs-apache-flink
@
34
The next wave of stream processing applications…
… is all types of statefulapplications that react to
data and time!
35
Stateful stream applications
versioning, upgrading,rollback, duplicating,
migrating, …
Continuous applications
The dA Platform Architecture
36
dA Platform 2
Apache FlinkStateful stream processing
KubernetesContainer platform
Logging
Streams from Kafka, Kinesis,
S3, HDFS, Databases, ...
dA Application
ManagerApplication lifecycle
management
Metrics
CI/CD
Real-timeAnalytics
Anomaly- &Fraud Detection
Real-timeData Integration
ReactiveMicroservices (and more)
Versioned Applications, not Jobs/Jars
37
Stream ProcessingApplication
Version 3
Version 2
Version 1
Code and ApplicationSnapshot
upgrade
upgrade
New Application
Version 3a
Version 2afork /
duplicate
Deployments, not Flink Clusters
38
Testing / QA Kubernetes Cluster Production Kubernetes Cluster
Threat MetricsApp. Testing
Activity MonitorApplication
Fraud DetectionApplication
Fraud DetectionApp. Testing
Hooks for CI/CD pipelines
39
Kubernetes Cluster
ApplicationVersion 1
CI Service
dA Application
Manager
pushupdate
triggerCI
upgradeAPI
statefulupgrade
ApplicationVersion 2
40
Thank you!@stsffap@ApacheFlink @dataArtisans
We are hiring!data-artisans.com/careers
41