1
1 March 2017
Reactive On-line Machine Learning with Akka Streams
Jan Pustelnik & Kamil Owczarek
2
We all love retro, don’t we?
3
Reactive Streams Made Easy
[Diagram: SOURCE → FLOW → SINK. DATA flows downstream; BACKPRESSURE flows upstream.]
4
Reactive Fast Data with Akka!
Akka lets you put your on-line / streaming data structure or algorithm in
context: you don’t need to think about how to handle data flow and
backpressure.
The well-thought-out architecture of Akka lets you concentrate on what is
relevant to your problem domain.
Akka is geared towards high performance and can be used in IoT setups
where, e.g., Spark Streaming would not fit.
It is easy to port, e.g., machine learning algorithms from other streaming
setups (like Spark Streaming) to Akka.
5
Backpressure so retro! (1981)
6
Retro is future-proof! (2001)
SEDA is fast because it is asynchronous, has buffers, and its stages are single-threaded
7
On-line algorithms / data structures (1992)
8
Example: Kadane algorithm (on-line, streaming, 1984)
Source: https://en.wikipedia.org/wiki/Maximum_subarray_problem
(…) the maximum subarray problem is the task of finding the contiguous subarray within a
one-dimensional array of numbers which has the largest sum. For example, for the sequence of values
−2, 1, −3, 4, −1, 2, 1, −5, 4; the contiguous subarray with the largest sum is 4, −1, 2, 1, with sum 6.
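The on-line character of the algorithm can be sketched outside Akka in a few lines of plain Python. This is only an illustration of the stateful logic, not the KadaneFlowStage from the talk; the function name `kadane_stream` is a hypothetical choice. It consumes one element at a time, keeps two numbers of state, and emits the best sum seen so far after every element.

```python
from typing import Iterable, Iterator


def kadane_stream(values: Iterable[int]) -> Iterator[int]:
    """Yield the largest contiguous-subarray sum seen so far, once per input element."""
    best_ending_here = 0   # best sum of a subarray that ends at the current element
    best_so_far = None     # best sum over all elements seen so far
    for x in values:
        # Either extend the previous run with x, or start a new run at x.
        best_ending_here = max(x, best_ending_here + x)
        if best_so_far is None or best_ending_here > best_so_far:
            best_so_far = best_ending_here
        yield best_so_far


# The sequence quoted above; the final running value is the answer, 6.
running = list(kadane_stream([-2, 1, -3, 4, -1, 2, 1, -5, 4]))
print(running[-1])  # prints 6
```

Because the state is just two numbers, the same logic fits naturally inside a single stream stage that handles one element per push.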
9
KadaneFlowStage
IN OUT
Akka Plumbing
Stateful Kadane Logic
10
Flow Shape (plumbing)
11
Flow Shape – output handler (plumbing)
12
Flow Shape – output handler (plumbing)
13
Flow Shape – output handler (plumbing)
Better not fail silently.
14
Flow stage – “Business logic”
Proper Kadane algo logic here
15
Flow stage – “Business logic”
Proper Kadane algo logic here
16
Let the flow flow…
17
Bloom filter (on-line, streaming, 1970)
[Diagram: a BLOOM filter next to a DICT, both holding 1, 5; queries “7 ?” and “X ?”; answers √ / X? / X]
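A Bloom filter can be sketched in plain Python as follows. This is a minimal illustration, not the implementation shown in the talk: the class name, the default sizes, and the use of salted SHA-256 for hashing are all assumptions made for the example.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Hypothetical minimal Bloom filter: a fixed bit array plus several hash positions per item."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, chosen for readability

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True only means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


bloom = BloomFilter()
for element in ("1", "5"):          # data side: elements are added as they arrive
    bloom.add(element)
print(bloom.might_contain("1"))     # query side: an added element is always reported as possible
print(bloom.might_contain("7"))     # usually False, but a false positive is possible
```

The filter never returns a false negative, which is why it can sit in front of a slower dictionary lookup: only "possibly present" answers need to be checked against the real data.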
18
It is easy to create new shapes, but you can also (re)use existing ones
19
Tripod? Just like in the old days
20
Remember your Topology class? A shape is just a shape…
21
Bloom filter – CrossShape!
[Diagram: CrossShape around BLOOM. QUERIES flow in and ANSWERS flow out on one axis; DATA flows in and DATA flows out on the other.]
22
Remember, a shape is just a shape
           In2
            |
            v
       +---------+
In1 ~> |  cross  | ~> Out1
       +---------+
            |
            v
           Out2
=
23
BloomFilterCrossStage, ftw!
24
Two crossing flows… Common shared state… Single thread!
25
Machine Learning with Akka streams!
26
Machine Learning with Akka streams!
27
Online ML models
ON-LINE MACHINE LEARNING
ADVERSARIAL MODELS STATISTICAL MODELS
28
Statistical Models
Idea: the input variable (X) and the predicted variable (Y) come from a
probability distribution p(X, Y)
Aim: predict Y as well as possible: Pr(Y)
Cost function: the cost of an error: V(Y, Pr(Y))
Generalized solution: minimize $E[V(Y, \Pr(Y))] = \int V(Y, \Pr(Y)) \, dp(X, Y)$
Plugging in different V functions gives familiar ML algorithms: Linear
Regression, SVM, etc.
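As a rough illustration of how the choice of V changes the algorithm, the sketch below approximates the expected cost by an average over a small sample and plugs in two cost functions: squared loss (the cost minimized by least-squares linear regression) and hinge loss (the cost used by SVMs). The function names and the sample values are made up for the example.

```python
from typing import Callable, Sequence


def squared_loss(y: float, y_pred: float) -> float:
    # Cost behind least-squares linear regression.
    return (y - y_pred) ** 2


def hinge_loss(y: float, y_pred: float) -> float:
    # Cost behind SVMs; expects labels y in {-1, +1} and a real-valued score y_pred.
    return max(0.0, 1.0 - y * y_pred)


def empirical_cost(v: Callable[[float, float], float],
                   ys: Sequence[float],
                   y_preds: Sequence[float]) -> float:
    """Sample average of V(Y, Pr(Y)), standing in for the expectation over p(X, Y)."""
    return sum(v(y, p) for y, p in zip(ys, y_preds)) / len(ys)


ys = [1.0, -1.0, 1.0]          # made-up actual outputs
scores = [0.5, -2.0, 1.0]      # made-up predictions
print(empirical_cost(squared_loss, ys, scores))
print(empirical_cost(hinge_loss, ys, scores))
```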
29
Adversarial Models
• Not frequently mentioned outside the scientific community and conferences!
• Problem as a game between the learner and nature:
1. Learner sees input X(i)
2. Learner “makes its move”: predicts output Pr[Y(i)]
3. Nature sees X(i) and Pr[Y(i)] and “makes a move”, emitting the actual output Y(i)
4. Learner “suffers a loss”: V[Y(i), Pr[Y(i)]]
• Important element: nature’s reaction can depend on the prediction
• Actual games, asset trading, varying cost evaluation
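The four steps of the game can be sketched as a simple loop. Everything named here is hypothetical and chosen only to make the protocol concrete: a learner that repeats the last output it saw, a "contrarian" nature that always emits the opposite of the prediction, and a 0/1 loss. The learner's update after each round is an assumption; the steps above imply it but do not spell it out.

```python
class LastValueLearner:
    """Hypothetical learner: predicts whatever output nature emitted last."""

    def __init__(self) -> None:
        self.last = 0

    def predict(self, x: int) -> int:
        return self.last

    def update(self, x: int, y: int) -> None:
        self.last = y


def contrarian_nature(x: int, y_pred: int) -> int:
    # Nature reacts to the prediction: it emits the opposite binary value.
    return 1 - y_pred


def zero_one_loss(y: int, y_pred: int) -> int:
    return 0 if y == y_pred else 1


def play(learner, nature, loss, inputs) -> int:
    total = 0
    for x in inputs:                  # 1. learner sees input X(i)
        y_pred = learner.predict(x)   # 2. learner predicts Pr[Y(i)]
        y = nature(x, y_pred)         # 3. nature sees both and emits Y(i)
        total += loss(y, y_pred)      # 4. learner suffers the loss
        learner.update(x, y)          # assumed: learner adapts before the next round
    return total


print(play(LastValueLearner(), contrarian_nature, zero_one_loss, range(5)))  # prints 5
```

Against this nature the learner loses every round, which shows why the dependence of nature's reaction on the prediction matters.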
30
Recursive Least Squares
• We all know the “least squares” metric from school, right?
• O(dn^3) memory complexity
• The formula is recursive:
• Recursive = on-line = O(dn^2) memory complexity
31
Recursive Least Squares
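A textbook form of the recursive update can be sketched in plain Python. This is not the code from the talk: the class name, the initial value `delta`, and the absence of a forgetting factor are assumptions. The state is only the weight vector and one square matrix, so its size does not grow with the number of samples seen.

```python
from typing import List, Sequence


class RecursiveLeastSquares:
    """Hypothetical sketch of standard RLS without a forgetting factor."""

    def __init__(self, dim: int, delta: float = 1e4) -> None:
        self.w: List[float] = [0.0] * dim
        # P starts as delta * I; a large delta means weak initial regularization.
        self.p = [[delta if i == j else 0.0 for j in range(dim)] for i in range(dim)]

    def predict(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x: Sequence[float], y: float) -> None:
        dim = len(self.w)
        # px = P x (P stays symmetric, so x^T P is the same vector).
        px = [sum(self.p[i][j] * x[j] for j in range(dim)) for i in range(dim)]
        denom = 1.0 + sum(xi * pxi for xi, pxi in zip(x, px))
        gain = [pxi / denom for pxi in px]
        error = y - self.predict(x)  # error of the prediction made before the update
        self.w = [wi + gi * error for wi, gi in zip(self.w, gain)]
        self.p = [[self.p[i][j] - gain[i] * px[j] for j in range(dim)]
                  for i in range(dim)]


# Made-up data from y = 2x + 1, with a constant feature for the intercept.
model = RecursiveLeastSquares(dim=2)
for x in range(5):
    model.update([1.0, float(x)], 2.0 * x + 1.0)
print(model.w)  # close to [1.0, 2.0]
```

Each `update` call handles one sample and then discards it, which is what makes the method a fit for a stream stage.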
32
Follow The Leader
Adversarial online ML algorithm
Not very complex
Pick the hypothesis that has performed best so far
Paradoxically: good for bounded loss
Careful investment, medical cost evaluation, etc.
Can be regularized for a broader set of applications
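For a finite set of hypotheses, the rule "pick the one that performed best so far" can be sketched as below. The class name, the three candidate hypotheses, and the squared loss are assumptions made for illustration, and the regularized variant is not shown.

```python
from typing import Callable, List, Sequence


class FollowTheLeader:
    """Hypothetical sketch: predict with the hypothesis of lowest cumulative loss."""

    def __init__(self,
                 hypotheses: Sequence[Callable[[float], float]],
                 loss: Callable[[float, float], float]) -> None:
        self.hypotheses = list(hypotheses)
        self.loss = loss
        self.cumulative: List[float] = [0.0] * len(self.hypotheses)

    def leader(self) -> int:
        # Index of the hypothesis with the smallest loss so far (ties: lowest index).
        return min(range(len(self.cumulative)), key=self.cumulative.__getitem__)

    def predict(self, x: float) -> float:
        return self.hypotheses[self.leader()](x)

    def update(self, x: float, y: float) -> None:
        # Once the actual output is known, charge every hypothesis its loss.
        for i, h in enumerate(self.hypotheses):
            self.cumulative[i] += self.loss(y, h(x))


ftl = FollowTheLeader(
    hypotheses=[lambda x: 0.0, lambda x: x, lambda x: 2.0 * x],
    loss=lambda y, y_pred: (y - y_pred) ** 2,
)
predictions = []
for x in [1.0, 2.0, 3.0]:          # made-up stream where the actual output is 2x
    predictions.append(ftl.predict(x))
    ftl.update(x, 2.0 * x)
print(predictions)  # prints [0.0, 4.0, 6.0]
```

After the first round the third hypothesis takes the lead and keeps it, so all later predictions match the actual output.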
33
Follow The Leader