[Tokyo Scala User Group] Akka Streams & Reactive Streams (0.7)

115
Konrad 'ktoso' Malawski GeeCON 2014 @ Kraków, PL Konrad `@ktosopl` Malawski Akka Streams

description

This is a work in progress of a talk for the Scala User Group in Tokyo. It touches on basics and some ideas behind Reactive Streams as well as the implementation shipped by Akka.

Transcript of [Tokyo Scala User Group] Akka Streams & Reactive Streams (0.7)

Konrad 'ktoso' Malawski GeeCON 2014 @ Kraków, PL

Konrad `@ktosopl` Malawski

Akka Streams

Konrad `@ktosopl` Malawski

hAkker @

Konrad `@ktosopl` Malawski

typesafe.com geecon.org

Java.pl / KrakowScala.pl sckrk.com / meetup.com/Paper-Cup @ London

GDGKrakow.pl meetup.com/Lambda-Lounge-Krakow

hAkker @

You?

?

You?

?

z ?

You?

?

z ?

?

You?

?

z ?

?

?

Streams

Streams

Streams“You cannot enter the same river twice” ~ Heraclitus

http://en.wikiquote.org/wiki/Heraclitus

StreamsReal Time Stream Processing !When you attach “late” to a Publisher, you may miss initial elements – it’s a river of data.

http://en.wikiquote.org/wiki/Heraclitus

Reactive Streams

Reactive Streams

!

!

Stream processing

Reactive Streams

Back-pressured !

Stream processing

Reactive Streams

Back-pressured Asynchronous

Stream processing

Reactive Streams

Back-pressured Asynchronous

Stream processing Standardised (!)

Reactive Streams: Goals

1. Back-pressured Asynchronous Stream processing !

2. Standard implemented by many libraries

Reactive Streams - Specification & TCK

http://reactive-streams.org

Reactive Streams - Who?

http://reactive-streams.org

Kaazing Corp. rxJava @ Netflix,

reactor @ Pivotal (SpringSource), vert.x @ Red Hat,

Twitter, akka-streams @ Typesafe,

spray @ Spray.io, Oracle,

java (?) – Doug Lea - SUNY Oswego …

Reactive Streams - Inter-op

http://reactive-streams.org

We want to make different implementations co-operate with each other.

Reactive Streams - Inter-op

http://reactive-streams.org

The different implementations “talk to each other” using the Reactive Streams protocol.

Reactive Streams - Inter-op

http://reactive-streams.org

The Reactive Streams SPI is NOT meant to be user-api. You should use one of the implementing libraries.

Back-pressure, なにですか?

Back-pressure? Example Without

Publisher[T] Subscriber[T]

Back-pressure? Example Without

Fast Publisher Slow Subscriber

Back-pressure? Push + NACK model

Back-pressure? Push + NACK model

Back-pressure? Push + NACK model

Subscriber usually has some kind of buffer

Back-pressure? Push + NACK model

Back-pressure? Push + NACK model

Back-pressure? Push + NACK model

What if the buffer overflows?

Back-pressure? Push + NACK model (a)

Use bounded buffer, drop messages + require re-sending

Back-pressure? Push + NACK model (a)

Use bounded buffer, drop messages + require re-sending

Kernel does this! Routers do this!

(TCP)

Back-pressure? Push + NACK model (b)Increase buffer size… Well, while you have memory available!

Back-pressure? Push + NACK model (b)

Back-pressure? Why NACKing is NOT enough

Back-pressure? Example NACKingたいへんですよ! Buffer overflow is imminent!

Back-pressure? Example NACKingTelling the Publisher to slow down / stop sending…

Back-pressure? Example NACKingNACK did not make it in time,

because M was in-flight!

Back-pressure? speed(publisher) < speed(subscriber)

Back-pressure? Fast Subscriber, No Problem

No problem!

Back-pressure? Reactive-Streams

= “Dynamic Push/Pull”

Just push – not safe when Slow Subscriber Just pull – too slow when Fast Subscriber

Back-pressure? RS: Dynamic Push/Pull

Just push – not safe when Slow Subscriber Just pull – too slow when Fast Subscriber

!Solution:

Dynamic adjustment (Reactive Streams)

Back-pressure? RS: Dynamic Push/Pull

Back-pressure? RS: Dynamic Push/Pull

Slow Subscriber sees it’s buffer can take 3 elements. Publisher will never blow up it’s buffer.

Back-pressure? RS: Dynamic Push/Pull

Fast Publisher will send at-most 3 elements. This is pull-based-backpressure.

Back-pressure? RS: Dynamic Push/Pull

Fast Subscriber can issue more Request(n), before more data arrives!

Back-pressure? RS: Dynamic Push/Pull

Fast Subscriber can issue more Request(n), before more data arrives!

Back-pressure? RS: Accumulate demand

Publisher accumulates total demand per subscriber.

Back-pressure? RS: Accumulate demandTotal demand of elements is safe to publish. Subscriber’s buffer will not overflow.

Back-pressure? RS: Requesting “a lot”

Fast Subscriber, can request “a lot” from Publisher. This is effectively “publisher push”, and is really fast. Buffer size is known and this is safe.

Back-pressure? RS: Dynamic Push/Pull

Back-pressure? RS: Dynamic Push/Pull

Safe! Will never overflow!

わなにですか?

AkkaAkka is a high-performance concurrency library for Scala and Java. !

At it’s core it focuses on the Actor Model:

AkkaAkka is a high-performance concurrency library for Scala and Java. !

At it’s core it focuses on the Actor Model:

An Actor can only: • Send / receive messages • Create Actors • Change it’s behaviour

AkkaAkka has multiple modules: !

Akka-camel: integration Akka-remote: remote actors Akka-cluster: clustering Akka-persistence: CQRS / Event Sourcing Akka-streams: stream processing …

Akka Streams 0.7 early preview

Akka Streams – Linear Flow

Akka Streams – Linear Flow

Akka Streams – Linear Flow

Akka Streams – Linear Flow

Akka Streams – Linear Flow

FlowFrom[Double].map(_.toInt). [...]

No Source attached yet. “Pipe ready to work with Doubles”.

Akka Streams – Linear Flow

implicit val sys = ActorSystem("tokyo-sys")!!

It’s the world in which Actors live in. AkkaStreams uses Actors, so it needs ActorSystem.

Akka Streams – Linear Flow

implicit val sys = ActorSystem("tokyo-sys")!implicit val mat = FlowMaterializer()!

Contains logic on HOW to materialise the stream. Can be pure Actors, or (future) Apache Spark (in the future).

Akka Streams – Linear Flow

implicit val sys = ActorSystem("tokyo-sys")!implicit val mat = FlowMaterializer()!

You can configure it’s buffer sizes etc. (Or implement your own materialiser (“run on spark”))

Akka Streams – Linear Flow

implicit val sys = ActorSystem("tokyo-sys")!implicit val mat = FlowMaterializer()!

val foreachSink = ForeachSink[Int](println)!val mf = FlowFrom(1 to 3).withSink(foreachSink).run()

Uses the implicit FlowMaterializer

Akka Streams – Linear Flow

implicit val sys = ActorSystem("tokyo-sys")!implicit val mat = FlowMaterializer()!

val foreachSink = ForeachSink[Int](println)!val mf = FlowFrom(1 to 3).withSink(foreachSink).run()(mat)

Akka Streams – Linear Flow

val mf = FlowFrom[Int].! map(_ * 2).! withSink(ForeachSink(println)) // needs source,! // can NOT run

Akka Streams – Linear Flow

val f = FlowFrom[Int].! map(_ * 2).!! ! ! withSink(ForeachSink(i => println(s"i = $i”))).! ! ! // needs Source to run!

Akka Streams – Linear Flow

val f = FlowFrom[Int].! map(_ * 2).!! ! ! withSink(ForeachSink(i => println(s"i = $i”))).! ! ! // needs Source to run!

Akka Streams – Linear Flow

val f = FlowFrom[Int].! map(_ * 2).!! ! ! withSink(ForeachSink(i => println(s"i = $i”))).! ! ! // needs Source to run!

Akka Streams – Linear Flow

val f = FlowFrom[Int].! map(_ * 2).!! ! ! withSink(ForeachSink(i => println(s"i = $i”))).! ! ! // needs Source to run!

Akka Streams – Linear Flow

val f = FlowFrom[Int].! map(_ * 2).!! ! ! withSink(ForeachSink(i => println(s"i = $i”))).! ! ! // needs Source to run!

!! ! ! f.withSource(IterableSource(1 to 10)).run()

Akka Streams – Linear Flow

val f = FlowFrom[Int].! map(_ * 2).!! ! ! withSink(ForeachSink(i => println(s"i = $i”))).! ! ! // needs Source to run!

!! ! ! f.withSource(IterableSource(1 to 10)).run()

Akka Streams – Linear Flow

val f = FlowFrom[Int].! map(_ * 2).!! ! ! withSink(ForeachSink(i => println(s"i = $i”))).! ! ! // needs Source to run!

!! ! ! f.withSource(IterableSource(1 to 10)).run()

Akka Streams – Linear Flow

val f = FlowFrom[Int].! map(_ * 2).!! ! ! withSink(ForeachSink(i => println(s"i = $i”))).! ! ! // needs Source to run!

!! ! ! f.withSource(IterableSource(1 to 10)).run()

Akka Streams – Linear Flow

val f = FlowFrom[Int].! map(_ * 2).!! ! ! withSink(ForeachSink(i => println(s"i = $i”))).! ! ! // needs Source to run!

!! ! ! f.withSource(IterableSource(1 to 10)).run()

Akka Streams – Flows are reusable

!! ! ! f.withSource(IterableSource(1 to 10)).run()! ! ! ! f.withSource(IterableSource(1 to 100)).run()! ! ! ! f.withSource(IterableSource(1 to 1000)).run()

Akka Streams <-> Actors – Advanced

val subscriber = system.actorOf(Props[SubStreamParent], ”parent")!!FlowFrom(1 to 100).! map(_.toString).! filter(_.length == 2).! drop(2).! groupBy(_.last).! publishTo(ActorSubscriber(subscriber))!

Akka Streams <-> Actors – Advanced

val subscriber = system.actorOf(Props[SubStreamParent], ”parent")!!FlowFrom(1 to 100).! map(_.toString).! filter(_.length == 2).! drop(2).! groupBy(_.last).! publishTo(ActorSubscriber(subscriber))!

Each “group” is a stream too! “Stream of Streams”.

Akka Streams <-> Actors – Advanced! groupBy(_.last).

GroupBy groups “11” to group “1”, “12” to group “2” etc.

Akka Streams <-> Actors – Advanced! groupBy(_.last).

It offers (groupKey, subStreamFlow) to Subscriber

Akka Streams <-> Actors – Advanced! groupBy(_.last).

It can then start children, to handle the sub-flows!

Akka Streams <-> Actors – Advanced! groupBy(_.last).

For example, one child for each group.

Akka Streams <-> Actors – Advanced

val subscriber = system.actorOf(Props[SubStreamParent], ”parent")!!FlowFrom(1 to 100).! map(_.toString).! filter(_.length == 2).! drop(2).! groupBy(_.last).! publishTo(ActorSubscriber(subscriber))!

普通 Akka Actor, will consume SubStream offers.

Akka Streams <-> Actors – Advanced

class SubStreamParent extends ActorSubscriber ! with ImplicitFlowMaterializer ! with ActorLogging {!! override def requestStrategy = OneByOneRequestStrategy!! override def receive = {! case OnNext((groupId: String, subStream: FlowWithSource[_, _])) =>!! val subSub = context.actorOf(Props[SubStreamSubscriber], ! s"sub-$groupId")! subStream.publishTo(ActorSubscriber(subSub))! }!}!

Akka Streams <-> Actors – Advanced

class SubStreamParent extends ActorSubscriber ! with ImplicitFlowMaterializer ! with ActorLogging {!! override def requestStrategy = OneByOneRequestStrategy!! override def receive = {! case OnNext((groupId: String, subStream: FlowWithSource[_, _])) =>!! val subSub = context.actorOf(Props[SubStreamSubscriber], ! s"sub-$groupId")! subStream.publishTo(ActorSubscriber(subSub))! }!}!

Akka Streams <-> Actors – Advanced

class SubStreamParent extends ActorSubscriber ! with ImplicitFlowMaterializer ! with ActorLogging {!! override def requestStrategy = OneByOneRequestStrategy!! override def receive = {! case OnNext((groupId: String, subStream: FlowWithSource[_, _])) =>!! val subSub = context.actorOf(Props[SubStreamSubscriber], ! s"sub-$groupId")! subStream.publishTo(ActorSubscriber(subSub))! }!}!

Akka Streams <-> Actors – Advanced

class SubStreamParent extends ActorSubscriber ! with ImplicitFlowMaterializer ! with ActorLogging {!! override def requestStrategy = OneByOneRequestStrategy!! override def receive = {! case OnNext((groupId: String, subStream: FlowWithSource[_, _])) =>!! val subSub = context.actorOf(Props[SubStreamSubscriber], ! s"sub-$groupId")! subStream.publishTo(ActorSubscriber(subSub))! }!}!

Akka Streams <-> Actors – Advanced

class SubStreamParent extends ActorSubscriber ! with ImplicitFlowMaterializer ! with ActorLogging {!! override def requestStrategy = OneByOneRequestStrategy!! override def receive = {! case OnNext((groupId: String, subStream: FlowWithSource[_, _])) =>!! val subSub = context.actorOf(Props[SubStreamSubscriber], ! s"sub-$groupId")! subStream.publishTo(ActorSubscriber(subSub))! }!}!

Akka Streams <-> Actors – Advanced

class SubStreamParent extends ActorSubscriber {!! override def requestStrategy = OneByOneRequestStrategy!! override def receive = {! case OnNext(n: String) => println(s”n = $n”) ! }!}!

Akka Streams – GraphFlow

GraphFlow

Akka Streams – GraphFlow

Linear Flows or

non-akka pipelines

Could be another RS implementation!

Akka Streams – GraphFlow

Fan-out elements and

Fan-in elements

Akka Streams – GraphFlow

Fan-out elements and

Fan-in elements

Now you need a FlowGraph

Akka Streams – GraphFlow

// first define some pipeline pieces!val f1 = FlowFrom[Input].map(_.toIntermediate)!val f2 = FlowFrom[Intermediate].map(_.enrich)!val f3 = FlowFrom[Enriched].filter(_.isImportant)!val f4 = FlowFrom[Intermediate].mapFuture(_.enrichAsync)!!// then add input and output placeholders!val in = SubscriberSource[Input]!val out = PublisherSink[Enriched]!

Akka Streams – GraphFlow

Akka Streams – GraphFlowval b3 = Broadcast[Int]("b3")!val b7 = Broadcast[Int]("b7")!val b11 = Broadcast[Int]("b11")!val m8 = Merge[Int]("m8")!val m9 = Merge[Int]("m9")!val m10 = Merge[Int]("m10")!val m11 = Merge[Int]("m11")!val in3 = IterableSource(List(3))!val in5 = IterableSource(List(5))!val in7 = IterableSource(List(7))!

Akka Streams – GraphFlow

Akka Streams – GraphFlow

// First layer!in7 ~> b7!b7 ~> m11!b7 ~> m8!!in5 ~> m11!!in3 ~> b3!b3 ~> m8!b3 ~> m10!

Akka Streams – GraphFlow

!// Second layer!m11 ~> b11!b11 ~> FlowFrom[Int].grouped(1000) ~> resultFuture2 !b11 ~> m9!b11 ~> m10!!m8 ~> m9!

Akka Streams – GraphFlow

!// Third layer!m9 ~> FlowFrom[Int].grouped(1000) ~> resultFuture9!m10 ~> FlowFrom[Int].grouped(1000) ~> resultFuture10!

Akka Streams – GraphFlow

!// Third layer!m9 ~> FlowFrom[Int].grouped(1000) ~> resultFuture9!m10 ~> FlowFrom[Int].grouped(1000) ~> resultFuture10!

Akka Streams – GraphFlow

!// Third layer!m9 ~> FlowFrom[Int].grouped(1000) ~> resultFuture9!m10 ~> FlowFrom[Int].grouped(1000) ~> resultFuture10!

Akka Streams – GraphFlow

val resultFuture2 = FutureSink[Seq[Int]]!val resultFuture9 = FutureSink[Seq[Int]]!val resultFuture10 = FutureSink[Seq[Int]]!!val g = FlowGraph { implicit b =>! // ...! m10 ~> FlowFrom[Int].grouped(1000) ~> resultFuture10! // ...!}.run()!!Await.result(g.getSinkFor(resultFuture2), 3.seconds).sorted! should be(List(5, 7))

Sinks and Sources are “keys” which can be addressed within the graph

Akka Streams – GraphFlow

val resultFuture2 = FutureSink[Seq[Int]]!val resultFuture9 = FutureSink[Seq[Int]]!val resultFuture10 = FutureSink[Seq[Int]]!!val g = FlowGraph { implicit b =>! // ...! m10 ~> FlowFrom[Int].grouped(1000) ~> resultFuture10! // ...!}.run()!!Await.result(g.getSinkFor(resultFuture2), 3.seconds).sorted! should be(List(5, 7))

Sinks and Sources are “keys” which can be addressed within the graph

Akka Streams – GraphFlow

!val g = FlowGraph {}!

FlowGraph is immutable and safe to share and re-use! Think of it as “the description” which then gets “run”.

Available Elements 0.7 early preview

Available Sources• FutureSource • IterableSource • IteratorSource • PublisherSource • SubscriberSource • ThunkSource • TickSource (timer based) • … easy to add your own!

0.7 early preview

Available operations• buffer • collect • concat • conflate • drop / dropWithin • take / takeWithin • filter • fold • foreach • groupBy • grouped • map • onComplete • prefixAndTail

• broadcast • merge / “generalised merge” • zip • … possible to add your own!

0.7 early preview

Available Sinks• BlackHoleSink • FoldSink • ForeachSink • FutureSink • OnCompleteSink • PublisherSink / FanoutPublisherSink • SubscriberSink • … easy to add your own!

0.7 early preview

Links1. http://akka.io 2. http://reactive-streams.org 3. https://groups.google.com/group/akka-user

ありがとう ございました! Questions? http://akka.io

ktoso @ typesafe.com twitter: ktosopl github: ktoso team blog: letitcrash.com

©Typesafe 2014 – All Rights Reserved