Reactive On-line Machine Learning with Akka...

Post on 22-May-2020


1

1 March 2017

Reactive On-line Machine Learning with Akka Streams

Jan Pustelnik & Kamil Owczarek

2

We all love retro, don’t we?

3

Reactive Streams Made Easy

[Diagram: SOURCE → FLOW → SINK; DATA travels downstream, BACKPRESSURE travels upstream]
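The source/flow/sink idea can be sketched without any library. The following is a minimal illustration in plain Python, not Akka Streams code: generators are pull-based, so the sink's demand determines how much the source produces, which loosely mirrors what backpressure achieves. The names `source`, `flow`, `sink`, `demand` and `produced` are assumptions made up for this sketch.

```python
from itertools import count, islice

produced = []  # records every element the source was actually asked to emit


def source():
    """Unbounded source: emits 0, 1, 2, ... but only when pulled."""
    for n in count():
        produced.append(n)
        yield n


def flow(upstream):
    """Flow: transforms each element passing through (here: doubles it)."""
    for x in upstream:
        yield 2 * x


def sink(upstream, demand):
    """Sink: pulls exactly `demand` elements, then stops asking."""
    return list(islice(upstream, demand))


result = sink(flow(source()), demand=3)
```

Although the source is unbounded, it only ever produces as many elements as the sink requested, so nothing piles up between the stages.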

4

Reactive Fast Data with Akka!

Akka lets you put your on-line (streaming) data structure or algorithm in context: you don't need to think about how to handle data flow and backpressure.

Akka's well-thought-out architecture lets you concentrate on the parts relevant to your problem domain.

Akka is geared towards high performance and can be used in IoT setups, where e.g. Spark Streaming would not fit.

It is easy to port algorithms, e.g. machine learning algorithms, from other streaming setups (such as Spark Streaming) to Akka.

5

Backpressure so retro! (1981)

6

Retro is future-proof! (2001)

SEDA is fast because it is asynchronous, it has buffers, and its stages are single-threaded.

7

On-line algorithms / data structures (1992)

8

Example: Kadane algorithm (on-line, streaming, 1984)

Source: https://en.wikipedia.org/wiki/Maximum_subarray_problem

(…) the maximum subarray problem is the task of finding the contiguous subarray within a one-dimensional array of numbers which has the largest sum. For example, for the sequence of values −2, 1, −3, 4, −1, 2, 1, −5, 4; the contiguous subarray with the largest sum is 4, −1, 2, 1, with sum 6.
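A minimal sketch of Kadane's algorithm in its on-line form, written in plain Python rather than as an Akka stage. It keeps two numbers of state and emits the best sum seen so far after every incoming element, which is what makes it suitable for a stream. The function name `kadane_online` and the choice to emit a running maximum are assumptions of this sketch.

```python
def kadane_online(values):
    """Yield, after each element, the largest contiguous-subarray sum so far."""
    best_ending_here = None  # best sum of a subarray ending at the current element
    best_so_far = None       # best sum of any subarray seen so far
    for x in values:
        if best_ending_here is None:
            best_ending_here = x
            best_so_far = x
        else:
            # Either extend the previous subarray or start a new one at x.
            best_ending_here = max(x, best_ending_here + x)
            best_so_far = max(best_so_far, best_ending_here)
        yield best_so_far


running = list(kadane_online([-2, 1, -3, 4, -1, 2, 1, -5, 4]))
# running == [-2, 1, 1, 4, 4, 5, 6, 6, 6]
```

The last emitted value, 6, matches the example sequence quoted above.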

9

KadaneFlowStage

[Diagram: IN → Akka plumbing wrapping the stateful Kadane logic → OUT]

10

Flow Shape (plumbing)

11

Flow Shape – output handler (plumbing)

12

Flow Shape – output handler (plumbing)

13

Flow Shape – output handler (plumbing)

Better not fail silently.

14

Flow stage – “Business logic”

Proper Kadane algorithm logic goes here

15

Flow stage – “Business logic”

Proper Kadane algorithm logic goes here

16

Let the flow flow…

17

Bloom filter (on-line, streaming, 1970)

[Diagram: a BLOOM filter in front of a DICT; elements 1, 5 are stored; queries "7 ?" and "X ?"; possible answers: √ / X? / X]
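A minimal Bloom filter sketch in plain Python, to make the data structure concrete. It answers "possibly present" or "definitely absent", so it can sit in front of a slower dictionary lookup. The class name, the bit-array size, the number of hash functions and the use of seeded SHA-256 as the hash family are all assumptions of this sketch, not taken from the slides.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # the bit array, stored in a single Python int

    def _positions(self, item):
        """Derive `num_hashes` bit positions for an item from seeded SHA-256."""
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        """False means definitely absent; True means possibly present."""
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


bloom = BloomFilter()
bloom.add(1)
bloom.add(5)
```

Elements that were added are always reported as possibly present. An element that was never added is usually reported as absent, but false positives are possible, so a positive answer still needs to be confirmed against the dictionary.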

18

It is easy to create new shapes, but you can also (re)use existing ones

19

Tripod? Just like in the old days

20

Remember your Topology class? A shape is just a shape…

21

Bloom filter – CrossShape!

[Diagram: QUERIES → BLOOM → ANSWERS, crossed by DATA → BLOOM → DATA]

22

Remember, a shape is just a shape

             In2
              |
              v
         +---------+
 In1 ~>  |  cross  |  ~> Out1
         +---------+
              |
              v
             Out2

=

23

BloomFilterCrossStage, ftw!

24

Two crossing flows… Common shared state… Single thread!
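The idea of two crossing flows sharing state on a single thread can be sketched in plain Python. This is not an Akka stage: the two input ports are modelled as tagged events in one interleaved sequence, and one loop owns the Bloom filter bits, so no locking is needed. The port names `"data"` and `"query"`, the output tags, and the helper `bit_positions` are assumptions of this sketch.

```python
import hashlib


def bit_positions(item, num_bits=64, num_hashes=3):
    """Bit positions for an item, derived from seeded SHA-256 (an assumption)."""
    for seed in range(num_hashes):
        digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % num_bits


def bloom_cross(events):
    """Handle ("data", x) and ("query", x) events one at a time."""
    bits = 0  # shared state; only this loop ever touches it
    for port, item in events:
        if port == "data":
            for pos in bit_positions(item):
                bits |= 1 << pos
            yield ("data_out", item)  # data passes through unchanged
        elif port == "query":
            maybe = all((bits >> pos) & 1 for pos in bit_positions(item))
            yield ("answer", (item, maybe))
        else:
            # Better not fail silently.
            raise ValueError(f"unknown port: {port!r}")


events = [("query", 1), ("data", 1), ("data", 5), ("query", 1), ("query", 5)]
out = list(bloom_cross(events))
```

The first query arrives before any data, so it is answered with `False`; the later queries for 1 and 5 are answered with `True` because both elements have passed through the data path by then.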

25

Machine Learning with Akka streams!

26

Machine Learning with Akka streams!

27

Online ML models

[Diagram: ON-LINE MACHINE LEARNING splits into ADVERSARIAL MODELS and STATISTICAL MODELS]

28

Statistical Models

Idea: the input variable (X) and the predicted variable (Y) come from a joint probability distribution p(X, Y)

Aim: predict Y as well as possible; the prediction is written Pr(Y)

Cost function: the cost of an error, V(Y, Pr(Y))

Generalized solution: minimize $E[V(Y, \Pr(Y))] = \int V(Y, \Pr(Y)) \, dp(X, Y)$

Plugging in different V functions yields familiar ML algorithms: linear regression, SVM, etc.

29

Adversarial Models

• Not frequently mentioned outside the scientific community and conferences!

• The problem is framed as a game between the learner and nature:

1. Learner sees input X(i)

2. Learner “makes his move”: predicts output Pr[Y(i)]

3. Nature sees X(i) and Pr[Y(i)] and “makes a move”, emitting the actual output Y(i)

4. Learner “suffers a loss”: V[Y(i), Pr[Y(i)]]

• Important element: nature's reaction can depend on the prediction

• Applications: actual games, asset trading, varying cost evaluation

30

Recursive Least Squares

• We all know the “least squares” metric from school, right?

• O(dn^3) memory complexity

• The formula is recursive:

• Recursive = on-line = O(dn^2) memory complexity
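A minimal sketch of the textbook recursive least squares update in plain Python, to show what "recursive = on-line" means in practice: each sample updates a weight vector and one d × d matrix, and the sample can then be discarded. This is the standard RLS recursion, not necessarily the exact formula shown on the slide. The function name `rls_fit`, the zero initial weights and the `init_scale` parameter (the initial matrix is `init_scale` times the identity, which acts like a very weak ridge penalty) are assumptions of this sketch.

```python
def rls_fit(samples, dim, init_scale=1e6):
    """Fit linear weights from (x, y) pairs, one sample at a time."""
    w = [0.0] * dim
    # gamma approximates the inverse of the accumulated X^T X matrix.
    gamma = [[init_scale if i == j else 0.0 for j in range(dim)]
             for i in range(dim)]
    for x, y in samples:
        # gx = gamma * x, computed with the matrix from before this sample
        gx = [sum(gamma[i][j] * x[j] for j in range(dim)) for i in range(dim)]
        denom = 1.0 + sum(x[i] * gx[i] for i in range(dim))
        # Prediction error of the current weights on the new sample
        error = sum(w[i] * x[i] for i in range(dim)) - y
        # Weight update; gx / denom equals the updated gamma times x
        for i in range(dim):
            w[i] -= gx[i] / denom * error
        # Rank-one update of gamma (Sherman-Morrison), no matrix inversion
        for i in range(dim):
            for j in range(dim):
                gamma[i][j] -= gx[i] * gx[j] / denom
    return w


# Hypothetical data: y = 3 + 2 * t, with a constant 1.0 feature for the bias
samples = [([1.0, float(t)], 3.0 + 2.0 * t) for t in range(5)]
weights = rls_fit(samples, dim=2)
```

On this noise-free data the weights come out very close to 3 and 2. The state kept between samples is only `w` and `gamma`, regardless of how many samples have been seen.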

31

Recursive Least Squares

32

Follow The Leader

Adversarial on-line ML algorithm

Not very complex

Pick the hypothesis that has performed best so far

Paradoxically: good for bounded loss

Careful investment, medical cost evaluation, etc.

Can be regularized to broaden the set of applications
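A minimal sketch of Follow The Leader over a finite set of hypotheses, in plain Python. In every round it plays the hypothesis with the lowest cumulative loss so far, then updates the losses of all hypotheses once the outcome is known. The function name, the fixed list of rounds (in a truly adversarial setting nature could choose each outcome after seeing the prediction), the tie-breaking by lowest index and the absolute-error loss in the usage lines are assumptions of this sketch. It shows the unregularized variant only.

```python
def follow_the_leader(hypotheses, rounds, loss):
    """Return the index played in each round and the learner's total loss."""
    cumulative = [0.0] * len(hypotheses)  # loss of each hypothesis so far
    choices = []
    total_loss = 0.0
    for x, y in rounds:
        # The leader is the hypothesis with the smallest cumulative loss;
        # ties are broken in favour of the lowest index.
        leader = min(range(len(hypotheses)), key=lambda i: cumulative[i])
        choices.append(leader)
        total_loss += loss(y, hypotheses[leader](x))
        # Once y is revealed, score every hypothesis on this round.
        for i, h in enumerate(hypotheses):
            cumulative[i] += loss(y, h(x))
    return choices, total_loss


# Hypothetical setup: two constant predictors and absolute-error loss
hypotheses = [lambda x: 0, lambda x: 1]
rounds = [(None, 1), (None, 1), (None, 0), (None, 1)]
choices, total = follow_the_leader(hypotheses, rounds, lambda y, p: abs(y - p))
# choices == [0, 1, 1, 1], total == 2.0
```

The learner starts with hypothesis 0, loses the first round, and then follows hypothesis 1 for the rest of the game.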

33

Follow The Leader

34

That’s it…

https://en.wikipedia.org/wiki/Banner_Mania