Scalable Event Processing – Pushing the limits with Push Streams! - Tim Ward


Transcript of Scalable Event Processing – Pushing the limits with Push Streams! - Tim Ward

Copyright © 2005 - 2016 Paremus Ltd. May not be reproduced by any means without express permission. All rights reserved.

OSGi Community Event Nov 2016

OSGi Community Event 2016, co-located at EclipseCon Europe 2016

Scalable Event Processing
Pushing the limits with Push Streams!

Tim Ward http://www.paremus.com


Who is Tim Ward? @TimothyWard

• Chief Technology Officer at Paremus
• 8 years developing OSGi specifications
• Chair of the OSGi IoT Expert Group
• Interested in Asynchronous Distributed Systems
• Author of Manning’s Enterprise OSGi in Action
• http://www.manning.com/cummins


Working with Data


Iteration in Java

We’ve probably all written this code:

    for (int i = 0; i < list.size(); i++) {
        MyData data = list.get(i);
        …
    }

and this code:

    Iterator<MyData> it = list.iterator();
    while (it.hasNext()) {
        MyData data = it.next();
        …
    }

and this code:

    for (MyData data : list) {
        …
    }

Data is everywhere, and Java’s collections make processing data easy


Problems with iteration

Whilst Java iteration is easy, it’s still easy to make mistakes:

This is bad for Linked Lists, because every list.get(i) call has to walk the list to reach index i!

    for (int i = 0; i < list.size(); i++) {
        MyData data = list.get(i);
        …
    }

“External” iteration also pushes control logic into your code!
Parallelising the processing is a huge task

Java 8 updated the Collections API to support “internal” iteration
Powerful functional concepts were added via the Stream API


Iteration in Java 8

Streaming collections is very simple:

    list.stream().forEach(data -> …);

Internal iteration separates the “what” from the “how” in your code
Parallel processing can occur implicitly if the list supports it
Functional pipelines allow for easy processing

    list.stream()
        .map(MyData::getAge)     // intermediate operation
        .filter(i -> i < 15)     // intermediate operation
        .count();                // terminal operation
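The pipeline on this slide can be made runnable with very little extra code. This is only a sketch: the slide names MyData and getAge() but never shows them, so the MyData class and the sample ages below are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the MyData type named on the slide
class MyData {
    private final int age;
    MyData(int age) { this.age = age; }
    int getAge() { return age; }
}

class StreamPipelineDemo {
    // Counts the entries whose age is below 15 using internal iteration
    static long countUnder15(List<MyData> list) {
        return list.stream()
                .map(MyData::getAge)       // intermediate operation
                .filter(age -> age < 15)   // intermediate operation
                .count();                  // terminal operation: triggers the work
    }

    public static void main(String[] args) {
        List<MyData> list = Arrays.asList(
                new MyData(3), new MyData(14), new MyData(15), new MyData(42));
        System.out.println(countUnder15(list)); // prints 2
    }
}
```

Note that nothing in countUnder15 says how the list is walked; that is the “what” without the “how”.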


Properties of Java 8 Streams

Java 8 Streams have a number of useful properties:

They are lazy
Streams only process data on-demand
This is triggered by a “terminal operation”

They can “short circuit”
Some operations don’t need to see the whole stream
findFirst() and findAny() can return if an element is found

Importantly the Stream is “pulling” the data from the data structure
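Laziness and short-circuiting are easy to observe on a small sequential stream. The sketch below is an illustration only; the class name, the visited list and the sample numbers are made up for the demo. It records each element the pipeline actually pulls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class LazyStreamDemo {
    // Records which elements the pipeline actually pulled from the list
    static final List<Integer> visited = new ArrayList<>();

    static Optional<Integer> firstEven(List<Integer> input) {
        visited.clear();
        return input.stream()
                .peek(visited::add)        // runs only when an element is pulled
                .filter(i -> i % 2 == 0)
                .findFirst();              // terminal operation, stops at the first match
    }

    public static void main(String[] args) {
        Optional<Integer> result = firstEven(Arrays.asList(1, 3, 4, 5, 6));
        System.out.println(result.get()); // prints 4
        System.out.println(visited);      // prints [1, 3, 4]
    }
}
```

The elements 5 and 6 are never touched: the stream pulls just enough data to answer the question.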


Streams of data


Streams - the overloaded concept

Before Java 8, a “stream” of data was a java.io.InputStream
Whilst you probably didn’t think of it that way, you still iterated

    int read;
    while ((read = is.read()) != -1) {
        byte data = (byte) read;
        …
    }

The big difference with an InputStream is that it may block
A thread may get “stuck” waiting for user input, or a slow network

Java NIO has non-blocking input, but is much harder to use


Streams - the overloaded concept (2)

An InputStream behaves like an ordered Collection of bytes
Using a Java 8 Stream over these bytes could make sense

    is.forEach(data -> …)    // hypothetical: InputStream has no forEach method

But the InputStream may block the thread indefinitely

If the input is asynchronous then resources are wasted by waiting
A slow function may not process data as fast as it arrives
Rapid bursts of data may overload the consumer

All of this is independent of the data, be it bytes or Objects
Java 8 Streams aren’t able to cope with asynchronous data


Push-based asynchronous streams


Push-based Streams

Push-based streams are fundamentally different from pull-based streams
The processing function is called when data arrives, not when the previous entry has been processed
Asynchronous operation allows for high throughput and parallelisation

Terminal operations must be asynchronous and non-blocking

The Promise is the primitive of Asynchronous Programming
Rather than returning values a push-based stream returns a Promise
A Promise represents a delayed result that will be “resolved” later
OSGi Promises are a good option here
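To make the idea of a delayed result concrete, here is a small sketch. OSGi Promises are not part of the JDK, so java.util.concurrent.CompletableFuture is used as a stand-in; that substitution, and the names countLater and DelayedResultDemo, are assumptions made purely for illustration.

```java
import java.util.concurrent.CompletableFuture;

class DelayedResultDemo {
    // Returns immediately; the value is supplied later from another thread.
    // CompletableFuture is a stand-in here, not the OSGi Promise type.
    static CompletableFuture<Long> countLater(long value) {
        CompletableFuture<Long> promise = new CompletableFuture<>();
        new Thread(() -> promise.complete(value)).start(); // "resolved" later
        return promise;
    }

    public static void main(String[] args) {
        CompletableFuture<Long> promise = countLater(42L);
        // Register a callback instead of blocking for the result
        CompletableFuture<Void> done =
                promise.thenAccept(v -> System.out.println("resolved with " + v));
        done.join(); // only so that this demo waits before the JVM exits
    }
}
```

The caller gets the promise back straight away and decides later whether to chain a callback or wait.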


Mapping Java 8 Streams to a push model

The Java 8 Stream has a rich and powerful API
Making it push-based is easier than you might think!
Change the return type of the terminal operations

    Pull                    Push
    long count()            Promise<Long> count()
    boolean anyMatch()      Promise<Boolean> anyMatch()
    boolean allMatch()      Promise<Boolean> allMatch()
    Optional<T> min()       Promise<Optional<T>> min()
    Optional<T> max()       Promise<Optional<T>> max()
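As a rough sketch of what the push form of count() might look like: the PushCounter class, its onData/onClose methods, and the use of CompletableFuture in place of an OSGi Promise are all hypothetical, chosen only to show the changed return type in action.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical push-style counter; CompletableFuture stands in for a Promise
class PushCounter<T> {
    private final AtomicLong seen = new AtomicLong();
    private final CompletableFuture<Long> result = new CompletableFuture<>();

    // Called by the producer whenever data arrives
    void onData(T data) { seen.incrementAndGet(); }

    // Called by the producer when there is no more data
    void onClose() { result.complete(seen.get()); }

    // Push equivalent of "long count()": returns at once, resolves on close
    CompletableFuture<Long> count() { return result; }

    public static void main(String[] args) {
        PushCounter<String> counter = new PushCounter<>();
        counter.count().thenAccept(n -> System.out.println("count = " + n));
        counter.onData("a");
        counter.onData("b");
        counter.onClose(); // prints count = 2
    }
}
```

The count cannot resolve until something tells the counter that the stream has ended, which is why onClose() has to exist at all.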


Problems with this approach

This model actually works very well, but there are some problems

How do we know when an asynchronous stream has finished?
A pull-based model can simply indicate that there is no more data

Pull-based iteration offers a natural “brake” by processing elements in turn
Push-based systems can be overwhelmed by “Event Storms”
Even a single client thread can be problematic if it is too eager!

How do we cope with this?


Using Events to communicate

Pushing the raw data into the consumer is simple, but insufficient
Consumers need a pushed event to indicate the end of a stream
Events should also be able to propagate failures

An Event is therefore a simple wrapper for data or metadata

    public static enum EventType { DATA, ERROR, CLOSE };

    public final class PushEvent<T> {
        public EventType getType() { … }
        public T getData() { … }
        public Exception getFailure() { … }
    }
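A wrapper along these lines could be filled in as follows. This is a sketch under assumptions, not the OSGi PushEvent class: the name SketchEvent, the static factory methods and isTerminal() are illustrative choices that the slide does not show.

```java
// Sketch only: a hypothetical event wrapper, not the OSGi PushEvent class
final class SketchEvent<T> {
    enum EventType { DATA, ERROR, CLOSE }

    private final EventType type;
    private final T data;
    private final Exception failure;

    private SketchEvent(EventType type, T data, Exception failure) {
        this.type = type;
        this.data = data;
        this.failure = failure;
    }

    // One factory per kind of event (assumed helper methods)
    static <T> SketchEvent<T> data(T payload) {
        return new SketchEvent<>(EventType.DATA, payload, null);
    }
    static <T> SketchEvent<T> error(Exception e) {
        return new SketchEvent<>(EventType.ERROR, null, e);
    }
    static <T> SketchEvent<T> close() {
        return new SketchEvent<>(EventType.CLOSE, null, null);
    }

    EventType getType() { return type; }
    T getData() { return data; }
    Exception getFailure() { return failure; }

    // ERROR and CLOSE both mean that no further events will follow
    boolean isTerminal() { return type != EventType.DATA; }

    public static void main(String[] args) {
        SketchEvent<String> event = SketchEvent.data("hello");
        System.out.println(event.getType() + " " + event.getData()); // prints DATA hello
    }
}
```

Because end-of-stream and failure travel down the same channel as the data, a consumer needs only one callback.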


Learning lessons from Network Engineers

Push-based streams share a lot of concepts with computer networks
Asynchronous delivery
Producer and Consumer may run at different rates

TCP solves the “event storm” problem with back-pressure
The producer gets faster, but backs off if the consumer’s ACK rate drops

Our push-streams need to feed back to the data producer
The simplest way to do this is simply to say “don’t call me for a while”


The AsyncConsumer

A simple functional interface for consuming data:

    public interface PushEventConsumer<T> {
        long accept(PushEvent<T> event);
    }

End of Stream events can be detected and back-pressure returned
If positive then the producer should wait at least that long
If zero then continue as soon as possible
If negative then close the stream
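The three back-pressure cases can be sketched from the producer’s side. Everything here is illustrative: the BackPressureConsumer interface and deliver method are hypothetical simplifications (raw data rather than events), and treating the returned value as milliseconds is an assumption, since the slide gives no unit.

```java
import java.util.Arrays;
import java.util.List;

class BackPressureDemo {
    // Hypothetical consumer shape: the return value is the back-pressure
    interface BackPressureConsumer<T> {
        long accept(T data);
    }

    // Pushes items one by one, honouring the consumer's response.
    // Returns the number of items actually delivered.
    static <T> int deliver(List<T> items, BackPressureConsumer<? super T> consumer) {
        int delivered = 0;
        for (T item : items) {
            long backPressure = consumer.accept(item);
            delivered++;
            if (backPressure < 0) {
                break; // negative: the consumer wants the stream closed
            } else if (backPressure > 0) {
                try {
                    Thread.sleep(backPressure); // positive: wait at least this long (assumed ms)
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            } // zero: continue as soon as possible
        }
        return delivered;
    }

    public static void main(String[] args) {
        // This consumer closes the stream once it has seen the value 3
        int n = deliver(Arrays.asList(1, 2, 3, 4), i -> i == 3 ? -1L : 0L);
        System.out.println(n); // prints 3
    }
}
```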


Buffers, Windows and Circuit Breakers

An event consumer may receive events on many threads
The event consumer may also be slow to process data

Buffering allows a thread switch, freeing up the producer’s thread
It also allows the consumer to scale up or down the number of workers
Incoming data is queued until it can be processed

The buffer can return back-pressure based on how full it is
Buffering is a built-in feature of the push-based stream


Buffering events

[Figure: a queue of numbered events held in a buffer]


Buffering events (2)

What happens when the buffer gets full?

We could use an infinite buffer, but memory isn’t infinite…
Blocking is a possibility, but not very asynchronous!

A good option is simply to close the stream
This is called a “circuit breaker” - it trips if the consumer falls too far behind
This model protects against event storms
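A bounded buffer with a circuit breaker might be sketched like this. The CircuitBreakerBuffer class is hypothetical, and the back-pressure formula (10 units per queued item) is an arbitrary assumption chosen to show pressure rising as the buffer fills.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical bounded buffer that trips rather than blocking or growing forever
class CircuitBreakerBuffer<T> {
    private final Queue<T> queue = new ArrayDeque<>();
    private final int capacity;
    private boolean tripped = false;

    CircuitBreakerBuffer(int capacity) { this.capacity = capacity; }

    // Called on the producer's thread. Returns back-pressure:
    // larger values as the buffer fills, and -1 once the breaker has tripped.
    synchronized long offer(T item) {
        if (tripped) {
            return -1;
        }
        if (queue.size() >= capacity) {
            tripped = true; // the consumer has fallen too far behind
            return -1;      // negative back-pressure: close the stream
        }
        queue.add(item);
        return queue.size() * 10L; // arbitrary scale, for illustration only
    }

    // Called on a worker thread to take the next queued item, or null if empty
    synchronized T poll() { return queue.poll(); }

    synchronized boolean isTripped() { return tripped; }

    public static void main(String[] args) {
        CircuitBreakerBuffer<Integer> buffer = new CircuitBreakerBuffer<>(2);
        System.out.println(buffer.offer(17)); // prints 10
        System.out.println(buffer.offer(23)); // prints 20
        System.out.println(buffer.offer(11)); // prints -1, the breaker trips
    }
}
```

The producer never blocks: it either gets a hint to slow down or is told that the stream is closed.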


Windowing events

An event consumer may wish to receive batches of events to process
The consumer can then forward an aggregate event

Batches can be defined using an absolute number, or a time window
The underlying behaviour is similar to buffering

[Figure: a sequence of events grouped into windows by Number and by Time]
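Windowing by an absolute number can be sketched in a few lines. The CountWindow class below is a hypothetical illustration that covers only the count-based case; a time window would additionally need a timer to call flush().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical count-based window: collects events and forwards them in batches
class CountWindow<T> {
    private final int size;
    private final Consumer<List<T>> downstream;
    private List<T> batch = new ArrayList<>();

    CountWindow(int size, Consumer<List<T>> downstream) {
        this.size = size;
        this.downstream = downstream;
    }

    // Queues the event and forwards the batch once it is full
    synchronized void accept(T event) {
        batch.add(event);
        if (batch.size() == size) {
            flush();
        }
    }

    // Forwards any partial batch, for example when the stream closes
    synchronized void flush() {
        if (!batch.isEmpty()) {
            downstream.accept(batch);
            batch = new ArrayList<>();
        }
    }

    public static void main(String[] args) {
        CountWindow<Integer> window = new CountWindow<>(2, System.out::println);
        for (int value : new int[] {17, 23, 11, 31, 19}) {
            window.accept(value);
        }
        window.flush(); // output: [17, 23] then [11, 31] then [19]
    }
}
```

A downstream consumer could reduce each batch to a single aggregate event, such as a sum or an average.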


Producing data events


Producing events

So far we’ve focussed on consuming and processing events
This is not very useful unless we can produce events!

Event producers are connected to consumers using the open() method

    public interface PushEventSource<T> {
        Closeable open(PushEventConsumer<? super T> event);
    }

The returned Closeable can be used to “end” the stream
This is useful when the data stream is infinite!
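The open/close handshake can be sketched with plain JDK types. This is a simplification under stated assumptions: SketchSource and publish are hypothetical names, and java.util.function.Consumer replaces PushEventConsumer, so back-pressure and end-of-stream events are left out.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical event source that supports several registrations
class SketchSource<T> {
    private final List<Consumer<? super T>> consumers = new CopyOnWriteArrayList<>();

    // Connects a consumer; closing the returned handle disconnects it again
    Closeable open(Consumer<? super T> consumer) {
        consumers.add(consumer);
        return () -> consumers.remove(consumer);
    }

    // Pushes one item to every consumer that is currently connected
    void publish(T data) {
        for (Consumer<? super T> consumer : consumers) {
            consumer.accept(data);
        }
    }

    public static void main(String[] args) throws IOException {
        SketchSource<String> source = new SketchSource<>();
        Closeable handle = source.open(s -> System.out.println("got " + s));
        source.publish("first");  // prints got first
        handle.close();           // "end" the stream for this consumer
        source.publish("second"); // nobody is listening any more
    }
}
```

Each call to open() is an independent registration, so closing one handle leaves the other consumers connected.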


Helpful behaviours

Event Producers have to cope with multiple registrations
They also have to handle back-pressure from the consumer
Sometimes the producer has no choice about waiting!

Writing a producer should not be hard, so we provide help
The SimplePushEventSource provides buffered publishing
The Buffer/Circuit breaker allows the producer to “ignore” back pressure
A connect Promise allows event delivery to be lazily started


Playing with streams


Playing with push streams

We need an asynchronous source of events
The UK rail network has an open data API!

Events are batched (for performance) and delivered using STOMP
Events are JSON objects and so can easily be processed!

Let’s have a play!


• For more about OSGi...
• Specifications at http://www.osgi.org
• Enterprise OSGi in Action
• http://www.manning.com/cummins

• For more about the Push Streams
• http://github.com/osgi/design

• See the IoT Competition at 1745 today

Questions?

Thanks!

http://www.paremus.com


Addendum


Demo Gremlins

This morning I was testing my demo
I couldn’t connect to the Network Rail Feed
They had Deactivated my account!

No demo = Unhappy Audience :(

But OSGi is amazing technology from the future!
I built a new module using MQTT to consume events from the OSGi Train
1 hour later - Working Demo. No other code had to change!

Network Rail reactivated my account 10 minutes later…


www.paremus.com @Paremus