2
What is Apache Flink?• Project undergoing incubation in the Apache Software
Foundation
• Originating from the Stratosphere research project started at TU Berlin in 2009
• http://flink.incubator.apache.org
• 59 contributors (doubled in ~ 4 months)
• Has a cool squirrel for a logo
3
What is Apache Flink?
Master
Worker
Worker
Flink Cluster
Analytical Program
Flink Client &Optimizer
4
This talk
• Introduction to Flink
• Flink from a user perspective
• Tour of Flink internals
• Flink roadmap and closing
5
Open Source Data Processing Landscape
5
MapReduce
Hive
Flink
Spark Storm
Yarn Mesos
HDFS
Mahout
Cascading
Tez
Pig
Data processing engines
App and resource management
Applications
Storage, streams KafkaHBase
Crunch
…
6
Common API
StorageStreams
Hybrid Batch/Streaming Runtime
HDFS Files S3
ClusterManager
YARN EC2 Native
Flink Optimizer
Scala API(batch)
Graph API („Spargel“)
JDBC Redis RabbitMQKafkaAzure …
JavaCollections
Streams Builder
Apache Tez
Python API
Java API(streaming)
Apache MRQL
Batc
h
Stre
amin
g
Java API(batch)
Local Execution
7
Flink APIs
8
Parallel Collections / Distributed Data Sets
DataSetA
DataSetB
DataSetC
A (1)
A (2)
B (1)
B (2)
C (1)
C (2)
X
X
Y
Y
Program
Parallel Execution
X Y
Operator X Operator Y
9
Flexible Pipelines
Reduce
Join
Map
Reduce
Map
Iterate
Source
Sink
Source
Map, FlatMap, MapPartition, Filter, Project, Reduce, ReduceGroup, Aggregate, Distinct, Join, CoGoup, Cross, Iterate, Iterate Delta, Iterate-Vertex-Centric
10
DataSet<String> text = env.readTextFile(input);
DataSet<Tuple2<String, Integer>> result = text .flatMap((str, out) -> { for (String token : value.split("\\W")) { out.collect(new Tuple2<>(token, 1)); }) .groupBy(0) .aggregate(SUM, 1);
Word Count, Java API
11
Word Count, Scala API
val input = env.readTextFile(input);val words = input flatMap { line => line.split("\\W+") }val counts = words groupBy { word => word } count()
12
13
Beyond Key/Value Pairs
DataSet<Page> pages = ...;DataSet<Impression> impressions = ...;
DataSet<Impression> aggregated = impressions .groupBy("url") .sum("count");
pages.join(impressions).where("url").equalTo("url") .print()// outputs pairs of matching pages and impressions
class Impression { public String url; public long count;}
class Page { public String url; public String topic;}
// outputs pairs of pages and impressions
14
Distributed architecture
val paths = edges.iterate (maxIterations) { prevPaths: DataSet[(Long, Long)] => val nextPaths = prevPaths .join(edges) .where(1).equalTo(0) { (left, right) => (left._1,right._2) } .union(prevPaths) .groupBy(0, 1) .reduce((l, r) => l) nextPaths}
Client
Optimization andtranslation to data flow
Job Manager
Scheduling, resource negotiation, …
Task Manager
Data node
mem
ory
heap
Task Manager
Data node
mem
ory
heap
Task Manager
Data node
mem
ory
heap
15
What’s new in Flink
16
DependabilityJV
M H
eap
Flink ManagedHeap
Network Buffers
UnmanagedHeap
(next version unifies network buffersand managed heap)
User Code
Hashing/Sorting/Caching
• Flink manages its own memory
• Caching and data processing happens in a dedicated memory fraction
• System never breaks theJVM heap, gracefully spills
Shuffles/Broadcasts
17
Operating onSerialized Data
• serializes data every time Highly robust, never gives up on you
• works on objects, RDDs may be stored serialized Serialization considered slow, only when needed
• makes serialization really cheap: partial deserialization, operates on serialized form Efficient and robust!
18
Operating onSerialized Data
Microbenchmark• Sorting 1GB worth of (long, double) tuples• 67,108,864 elements• Simple quicksort
19
Memory Managementpublic class WC { public String word; public int count;}
emptypage
Pool of Memory Pages
• Works on pages of bytes, maps objects transparently• Full control over memory, out-of-core enabled• Algorithms work on binary representation• Address individual fields (not deserialize whole object)• Move memory between operations
20
Beyond Key/Value Pairs
DataSet<Page> pages = ...;DataSet<Impression> impressions = ...;
DataSet<Impression> aggregated = impressions .groupBy("url") .sum("count");
pages.join(impressions).where("url").equalTo("url") .print()// outputs pairs of pages and impressions
class Impression { public String url; public long count;}
class Page { public String url; public String topic;}
// outputs pairs of pages and impressions
21
Beyond Key/Value Pairs
Why not key/value pairs
• Programs are much more readable ;-)
• Functions are self-contained, do not need to set key for successor)
• Much higher reusability of data types and functionso Within Flink programs, or from other programs
22
Flink programsrun everywhere
Cluster (Batch)Cluster (Streaming)
LocalDebugging
Fink Runtime or Apache Tez
As Java CollectionPrograms
Embedded(e.g., Web Container)
23
Upcoming Streaming API
24
Streaming Throughput
25
Migrate EasilyFlink supports out-of-the-box supports• Hadoop data types (writables)• Hadoop Input/Output Formats• Hadoop functions and object model
Input Map Reduce Output
DataSet DataSet DataSetRed Join
DataSet Map DataSet
OutputS
Input
26
Little tuning or configuration required• Requires no memory thresholds to configure
o Flink manages its own memory
• Requires no complicated network configso Pipelining engine requires much less memory for data exchange
• Requires no serializers to be configuredo Flink handles its own type extraction and data representation
• Programs can be adjusted to data automaticallyo Flink’s optimizer can choose execution strategies automatically
27
Understanding Programs
Visualizes the operations and the data movement of programs
Analyze after execution
Screenshot from Flink’s plan visualizer
28
Understanding Programs
Analyze after execution (times, stragglers, …)
29
Understanding Programs
Analyze after execution (times, stragglers, …)
30
Iterations in other systems
Step Step Step Step Step
ClientLoop outside the system
Step Step Step Step Step
ClientLoop outside the system
Iterations in Flink
31
Streaming dataflowwith feedback
map
join
red.
join
System is iteration-aware, performs automatic optimization
Flink
32
Automatic Optimization for Iterative Programs
Caching Loop-invariant DataPushing work„out of the loop“
Maintain state as index
33
Flink RoadmapWhat is the community currently working on?
• Flink has a major release every 3 months,with >=1 big-fixing releases in-between
• Finer-grained fault tolerance• Logical (SQL-like) field addressing• Python API• Flink Streaming , Lambda architecture support• Flink on Tez• ML on Flink (e.g., Mahout DSL)• Graph DSL on Flink• … and more
34
http://flink.incubator.apache.org
github.com/apache/incubator-flink
@ApacheFlink
Engine comparison
35
Paradigm
Optimization
Execution
API
Optimizationin all APIs
Optimizationof SQL queriesnone none
DAG
Transformations on k/v pair collections
Iterative transformations on collections
RDD Cyclicdataflows
MapReduce onk/v pairs
k/v pairReaders/Writers
Batchsorting
Batchsorting andpartitioning
Batch withmemorypinning
Stream without-of-corealgorithms
MapReduce
36
DataSet<Order> large = ...DataSet<Lineitem> medium = ...DataSet<Customer> small = ...
DataSet<Tuple...> joined1 = large.join(medium).where(3).equals(1) .with(new JoinFunction() { ... });
DataSet<Tuple...> joined2 = small.join(joined1).where(0).equals(2) .with(new JoinFunction() { ... });
DataSet<Tuple...> result = joined2.groupBy(3).aggregate(MAX, 2);
Example: Joins in Flink
Built-in strategies include partitioned join and replicated join with local sort-merge or hybrid-hash algorithms.
⋈⋈
γ
large medium
small
37
DataSet<Tuple...> large = env.readCsv(...);DataSet<Tuple...> medium = env.readCsv(...);DataSet<Tuple...> small = env.readCsv(...);
DataSet<Tuple...> joined1 = large.join(medium).where(3).equals(1) .with(new JoinFunction() { ... });
DataSet<Tuple...> joined2 = small.join(joined1).where(0).equals(2) .with(new JoinFunction() { ... });
DataSet<Tuple...> result = joined2.groupBy(3).aggregate(MAX, 2);
Automatic Optimization
Possible execution 1) Partitioned hash-join
3) Grouping /Aggregation reuses the partitioningfrom step (1) No shuffle!!!
2) Broadcast hash-join
Partitioned ≈ Reduce-sideBroadcast ≈ Map-side
38
Running Programs
> bin/flink run prg.jar
Packaged ProgramsRemote EnvironmentLocal Environment
Program JAR file
JVM
master master
RPC &Serialization
RemoteEnvironment.execute()LocalEnvironment.execute()
Spawn embeddedmulti-threaded environment
39
Unifies various kinds of Computations
ExecutionEnvironment env = getExecutionEnvironment();
DataSet<Long> vertexIds = ...DataSet<Tuple2<Long, Long>> edges = ...
DataSet<Tuple2<Long, Long>> vertices = vertexIds.map(new IdAssigner());
DataSet<Tuple2<Long, Long>> result = vertices .runOperation( VertexCentricIteration.withPlainEdges( edges, new CCUpdater(), new CCMessager(), 100));
result.print();env.execute("Connected Components");
Pregel/Giraph-style Graph Computation
40
Delta Iterations speed up certain problems
by a lot
0
200000
400000
600000
800000
1000000
1200000
1400000
Iteration
Bulk
Delta
Twitter Webbase (20)0
1000
2000
3000
4000
5000
6000
Computations performed in each iteration for connected communities of a social graph
Runtime
Cover typical use cases of Pregel-like systems with comparable performance in a generic platform and developer API.
41
What is Automatic Optimization
Run on a sampleon the laptop
Run a month laterafter the data evolved
Hash vs. SortPartition vs. BroadcastCachingReusing partition/sortExecution
Plan A
ExecutionPlan B
Run on large fileson the cluster
ExecutionPlan C
Top Related