Intro to Akka Streams

Post on 16-Apr-2017


Agenda

• Reactive Streams

• Why Akka Streams?

• API Overview

Reactive Streams

public interface Publisher<T> {
    public void subscribe(Subscriber<? super T> s);
}

public interface Subscriber<T> {
    public void onSubscribe(Subscription s);
    public void onNext(T t);
    public void onError(Throwable t);
    public void onComplete();
}

public interface Processor<T, R> extends Subscriber<T>, Publisher<R> {}

public interface Subscription {
    public void request(long n);
    public void cancel();
}
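To see how the four interfaces fit together, here is a minimal sketch using the copies of these interfaces that ship with the JDK (Java 9+) as java.util.concurrent.Flow. RangePublisher and CollectingSubscriber are hypothetical names made up for illustration. The sketch is single-threaded and does not try to satisfy every rule of the spec; it only shows that the publisher emits nothing until the subscriber calls request(n).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;

// Hypothetical publisher: emits 0 until count, but only as many items as were requested.
final class RangePublisher implements Flow.Publisher<Integer> {
    private final int count;

    RangePublisher(int count) { this.count = count; }

    @Override
    public void subscribe(Flow.Subscriber<? super Integer> s) {
        s.onSubscribe(new Flow.Subscription() {
            private int next = 0;
            private long demand = 0;
            private boolean emitting = false;
            private boolean done = false;

            @Override
            public void request(long n) {
                if (done) return;
                if (n <= 0) {
                    done = true;
                    s.onError(new IllegalArgumentException("n must be > 0"));
                    return;
                }
                demand += n;
                if (emitting) return; // re-entrant call from onNext: the loop below picks it up
                emitting = true;
                while (demand > 0 && next < count && !done) {
                    demand--;
                    s.onNext(Integer.valueOf(next++));
                }
                emitting = false;
                if (next == count && !done) {
                    done = true;
                    s.onComplete();
                }
            }

            @Override
            public void cancel() { done = true; }
        });
    }
}

// Hypothetical subscriber: asks for one item at a time and records what it gets.
final class CollectingSubscriber implements Flow.Subscriber<Integer> {
    final List<Integer> received = new ArrayList<>();
    boolean completed = false;
    Throwable error = null;
    private Flow.Subscription subscription;

    @Override
    public void onSubscribe(Flow.Subscription s) {
        subscription = s;
        s.request(1); // nothing is emitted before this first request
    }

    @Override
    public void onNext(Integer item) {
        received.add(item);
        subscription.request(1); // signal demand for exactly one more item
    }

    @Override
    public void onError(Throwable t) { error = t; }

    @Override
    public void onComplete() { completed = true; }
}
```

Subscribing a CollectingSubscriber to new RangePublisher(3) delivers 0, 1, 2 and then onComplete. A real implementation also has to handle concurrency and demand overflow, which is why library authors, not end users, normally write against this API.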

A standardised spec/contract to achieve asynchronous, back-pressured stream processing.

Standardised?

Gives us consistent interop between libraries and platforms that implement this spec.

Everything is async & back-pressured.

[Diagram: several Stream API implementations layered on top of Reactive Streams]

• Users use the Stream API

• Library authors use the Reactive Streams API

Async?

• We know async IO from last week

• But there are other types of async operations, that cross over different async boundaries

• between applications

• between threads

• and over the network as we saw

Back-Pressured?

[Diagram: Publisher[T] and Subscriber[T] connected by lines that cross an “async boundary”]

Think abstractly about the lines connecting them. The “async boundary” can be the network, or threads on the same CPU.

What problem are we trying to solve?

Discrepancy in the rate of processing

• Fast Publisher / Slow Subscriber

• Slow Publisher / Fast Subscriber

Push Model

[Diagram: a fast Publisher[T] (100 messages / 1 second) pushes to a slow Subscriber[T] (1 message / 1 second)]

• Overflowed messages are dropped and require resending

• The publisher has to keep track of messages to resend: not safe & complicated
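The overflow problem can be made concrete with a small simulation. This is only a sketch: PushOverflowDemo, droppedAfter and the buffer capacity of 10 are assumptions for illustration, and the "ticks" stand in for seconds. The publisher pushes regardless of the subscriber's pace, and anything that does not fit into the subscriber's bounded buffer is dropped.

```java
import java.util.concurrent.ArrayBlockingQueue;

final class PushOverflowDemo {
    // Counts how many messages are dropped when the publisher pushes
    // pushedPerTick messages per tick and the subscriber consumes one per tick.
    static int droppedAfter(int ticks, int pushedPerTick, int capacity) {
        ArrayBlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(capacity);
        int dropped = 0;
        int id = 0;
        for (int t = 0; t < ticks; t++) {
            for (int i = 0; i < pushedPerTick; i++) {
                // offer() returns false instead of blocking when the buffer is full
                if (!buffer.offer(id++)) {
                    dropped++;
                }
            }
            buffer.poll(); // the slow subscriber handles a single message
        }
        return dropped;
    }

    public static void main(String[] args) {
        // 100 messages / tick against 1 message / tick, as on the slide
        System.out.println(droppedAfter(1, 100, 10));
    }
}
```

With these numbers, 90 of the first 100 messages never reach the subscriber, and each of them would have to be tracked and resent by the publisher.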

NACK?

[Diagram sequence: the overwhelmed Subscriber[T] sends “stop!” to Publisher[T], but messages are still in flight]

• The publisher didn’t receive the NACK in time, so we lost that last message

• Not safe

Pull?

[Diagram sequence: Publisher[T] and Subscriber[T], one side fast (100 messages / 1 second) and the other slow (1 message / 1 second); the Subscriber[T] sends a “gimme!” request to the Publisher[T] for every message]

• Spam!

• Redundant messaging -> flooding the connection

• No buffer/batch support

A different approach

We have to take into account the following scenarios:

• Fast Pub / Slow Sub

• Slow Pub / Fast Sub

Which can happen dynamically

[Diagram: Publisher[T] sends Data to Subscriber[T]; Subscriber[T] sends Demand(n) back to Publisher[T]]

Dynamic Push/Pull

• Bounded buffers with no overflow

• Demand can be accumulated

• Batch processing -> performance
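The Demand(n) idea can be sketched as a small single-threaded simulation. Everything here (DemandSimulation, run, the round-based loop) is a hypothetical model for illustration, not how any library implements it. A fast publisher may only push while there is outstanding demand, and a slow subscriber hands demand back in batches once it has freed up room.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class DemandSimulation {
    // Simulates a fast publisher and a slow subscriber.
    // Returns the largest number of items that ever sat in the subscriber's buffer.
    static int run(int totalItems, int bufferSize, int batch) {
        if (bufferSize < 1 || batch < 1 || batch > bufferSize) {
            throw new IllegalArgumentException("need 1 <= batch <= bufferSize");
        }
        Deque<Integer> buffer = new ArrayDeque<>();
        long demand = bufferSize; // initial Demand(n): room for bufferSize items
        int nextItem = 0;
        int processed = 0;
        int sinceLastRequest = 0;
        int maxBuffered = 0;

        while (processed < totalItems) {
            // Fast publisher: pushes only while demand is outstanding.
            while (demand > 0 && nextItem < totalItems) {
                buffer.addLast(nextItem++);
                demand--;
            }
            maxBuffered = Math.max(maxBuffered, buffer.size());

            // Slow subscriber: processes a single item per round.
            buffer.removeFirst();
            processed++;
            sinceLastRequest++;

            // Demand accumulates and is signalled in batches, not per item.
            if (sinceLastRequest == batch) {
                demand += batch;
                sinceLastRequest = 0;
            }
        }
        return maxBuffered;
    }

    public static void main(String[] args) {
        System.out.println(run(10, 4, 2));
    }
}
```

Because the publisher never sends more than was requested, the buffer never grows past bufferSize and nothing has to be dropped or resent. With a slow publisher and a fast subscriber the same protocol behaves like push: demand is always outstanding, so items flow as soon as they exist.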

• Cool, let’s implement this using Actors!

• We can, it’s possible … but should it be done?

The problem(s) with Akka Actors

Type Safety

Any => Unit

Composition

In FP this makes us warm and fuzzy:

val f: A => B
val g: B => C
val h: A => C = f andThen g
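The same composition can be written with the JDK's java.util.function.Function, whose andThen method plays the role of Scala's andThen. ComposeDemo and the concrete functions are made up for illustration, with String, Integer and Boolean standing in for A, B and C.

```java
import java.util.function.Function;

final class ComposeDemo {
    // f: A => B (here String => Integer)
    static final Function<String, Integer> f = String::length;

    // g: B => C (here Integer => Boolean)
    static final Function<Integer, Boolean> g = n -> n % 2 == 0;

    // h: A => C, obtained purely by composition; f and g know nothing about each other
    static final Function<String, Boolean> h = f.andThen(g);

    public static void main(String[] args) {
        System.out.println(h.apply("akka")); // length 4 is even
    }
}
```

The types line up at compile time: composing g with a function that does not return an Integer would be rejected by the compiler, which is exactly the guarantee that Any => Unit gives up.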

• Using Actors?

• An Actor is aware of who sent it messages and where it must forward them or send its replies.

• No compositionality without thinking about it explicitly.

Data Flow

• What are streams ? Flows of data.

• Imagine a 10 stage data pipeline you want to model

• Now imagine writing that in Actors.

• Following the flow of data in Actors requires jumping around all over the code base

• Low level, error prone and hard to reason about

Akka Streams API: building blocks

Design Philosophy

• Everything we will cover now is a blueprint that describes the actions/effects it performs; nothing runs until the blueprint is executed.

• Reusability

• Compositionality

• “Design your program with a pure functional core, push side-effects to the end of the world and detonate to execute.”

- some guy on stackoverflow
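The blueprint idea can be illustrated without any streaming library. In this sketch (BlueprintDemo, blueprint and log are hypothetical names, and a plain Supplier stands in for a stream description) building the description has no effect; the side effect only happens each time the blueprint is run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class BlueprintDemo {
    // Records every execution so the side effect is observable.
    static final List<String> log = new ArrayList<>();

    // Building the blueprint performs no side effect; it only describes one.
    static Supplier<Integer> blueprint(int from, int to) {
        return () -> {
            log.add("running " + from + " to " + to);
            int sum = 0;
            for (int i = from; i <= to; i++) {
                sum += i;
            }
            return sum;
        };
    }

    public static void main(String[] args) {
        Supplier<Integer> sum = blueprint(1, 10); // nothing has run yet
        System.out.println(sum.get());            // "detonate": the effect happens here
    }
}
```

Because the description is just a value, it can be stored, passed around and run more than once, which is where the reusability and compositionality in the list above come from.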

Source

• Publisher of data

• Exactly one output

Image from boldradius.com

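import akka.stream.scaladsl._
import scala.concurrent.Future
import scala.concurrent.duration._

val singleSrc = Source.single(1)
val iteratorSrc = Source.fromIterator(() => Iterator from 0)
val futureSrc = Source.fromFuture(Future("abc")) // needs an implicit ExecutionContext
val collectionSrc = Source(List(1, 2, 3))
val tickSrc = Source.tick(initialDelay = 1.second, interval = 1.second, tick = "tick-tock")
val requestSource = req.entity.dataBytes // req: an Akka HTTP request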

Sink

• Subscriber (consumer) of data

• Describes where the data in our stream will go.

• Exactly one input

Image from boldradius.com

Sink.head

Sink.reduce[Int]((a, b) => a + b)

Sink.fold[Int, Int](0)(_ + _)

Sink.foreach[String](println)

FileIO.toPath(Paths.get("file.txt"))

val fold: Sink[Int, Future[Int]] = Sink.fold[Int, Int](0)(_ + _)

• Input type: Int

• Materialized type: Future[Int], available when the stream ‘completes’

val futureRes: Future[Int] = Source(1 to 10).runWith(fold)
futureRes.foreach(println)
// 55

So I can get data from somewhere

and I can put data somewhere else.

But I want to do something with it.

Flow

• A processor of data

• Has one input and one output

Image from boldradius.com

val src = Source(1 to 10)
val double: Flow[Int, Int, NotUsed] = Flow[Int].map(_ * 2)
val negate = Flow[Int].map(_ * -1)
val print = Sink.foreach[Int](println)

val graph = src via double via negate to print
graph.run() // needs an implicit Materializer in scope

// prints -2, -4, -6, -8, -10, -12, -14, -16, -18, -20 (one per line)

• Flow is immutable, thread-safe, and thus freely shareable

• Are linear flows enough?

• No, we want to be able to describe arbitrarily complex steps in our pipelines

Graphs

Flow

Graph

• We define multiple linear flows and then use the Graph DSL to connect them.

• We can combine multiple streams - fan in

• Split a stream into substreams - fan out

Fan-Out

Fan-In

A little example

Some sort of video uploading service

- Stream in video

- Process it

- Store it

[Diagram: ByteString input -> bcast -> “Convert to Array[Byte]” flow -> bcast -> three flows (Process HighRes, Process MedRes, Process LowRes), each ending in its own sink]

Sink.fromGraph(GraphDSL.create(highRes, mediumRes, lowRes)((_, _, _)) { implicit b =>
  (highSink, mediumSink, lowSink) =>
    import GraphDSL.Implicits._

    val bcastInput = b.add(Broadcast[ByteString](1))
    val bcastRawBytes = b.add(Broadcast[Array[Byte]](3))

    // Bodies elided on the slide; byteAcc (ByteString => Array[Byte]) is also defined elsewhere.
    val processHigh: Flow[Array[Byte], ByteString, NotUsed] = ???
    val processMedium: Flow[Array[Byte], ByteString, NotUsed] = ???
    val processLow: Flow[Array[Byte], ByteString, NotUsed] = ???

    bcastInput.out(0) ~> byteAcc ~> bcastRawBytes ~> processHigh ~> highSink
    bcastRawBytes ~> processMedium ~> mediumSink
    bcastRawBytes ~> processLow ~> lowSink

    SinkShape(bcastInput.in)
})

Our custom Sink

• Has one input of type ByteString

• Takes 3 Sinks, which can be Files, DBs, etc.

• Describes 3 processing stages that are Flows of Array[Byte] => ByteString

• Emits result to the 3 Sinks

• Has a type of: Sink[ByteString, (Future[IOResult], Future[IOResult], Future[IOResult])]

• Materialized values: (Future[IOResult], Future[IOResult], Future[IOResult])

Things we didn’t have time for

• Integrating with Actors

• Buffering and throttling streams

• Defining custom Graph shapes and stages

Thanks for listening!