Intro stream processing.be meetup #1

StreamProcessing.be Brussels, May 27, 2015 Theme: hosted solutions for Stream Processing and ML #StreamProcessingBe

Transcript of Intro stream processing.be meetup #1

Page 1: Intro stream processing.be meetup #1

StreamProcessing.be
Brussels, May 27, 2015

Theme: hosted solutions for Stream Processing and ML

#StreamProcessingBe

Page 2: Intro stream processing.be meetup #1

Agenda

15’ Intro (Peter)
35’ Azure Stream Analytics and ML (Jan)
5’ short break
35’ Google Cloud DataFlow (Alex)
35’ Amazon AWS ML (Nils)

Page 3: Intro stream processing.be meetup #1

Many thanks to

Microsoft Belux
Jan, Alex, Nils
@maasg, @svendfx
BigData.be, DataScience.be, AWS Belgium
you !

Page 4: Intro stream processing.be meetup #1

Next StreamProcessing.be Meetup

Thu, June 25, 2015, near Mechelen station
(looking for a location +/- 50 ppl)

● Introduction to Apache Kafka (Svend)
● Akka Streams and Kinesis (Peter)
● Understanding Spark Streaming (Gerard)

Page 5: Intro stream processing.be meetup #1

whoami : Peter Vandenabeele @peter_v

All Things Data (my consultancy)

current clients:

Real Impact Analytics
Telecom Analytics (emerging markets)

“Green” start-up (stealth mode)
IoT project (see next Meetup)

Page 6: Intro stream processing.be meetup #1

Why ?

(before anything else)

Page 7: Intro stream processing.be meetup #1

Why Stream Processing ?

(a personal view)

Page 8: Intro stream processing.be meetup #1

E.g. collaborative research (2013)

UniProt (180 GB), monthly update

consumer update cost ≅ freq (1/month) * size (180 GB) * # consumers (5)

fetch + load + index FULL data set

Page 9: Intro stream processing.be meetup #1

solution: Stream of updates (CDC)

Users table, continuous updates

consumer update cost ≅ Rate of Change (10% / month) * size * # consumers

fetch + load ONLY updates stream

3M entries, 300k updates/month
(independent of consumer update frequency)
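The two cost estimates on this slide and the previous one can be compared with a few lines of arithmetic. This is only a rough sketch using the slide figures (180 GB, 5 consumers, 10% change per month); the function name `monthly_transfer_gb` is made up for illustration, and it assumes that an updated entry is about as large as an average entry and that protocol overhead can be ignored.

```python
def monthly_transfer_gb(size_gb, consumers, fraction_per_month):
    """GB moved per month when every consumer fetches the given fraction of the data set."""
    return size_gb * consumers * fraction_per_month


SIZE_GB = 180    # data set size from the slide
CONSUMERS = 5    # number of consumers from the slide

# Full reload: each consumer fetches the whole data set once per month.
full_reload = monthly_transfer_gb(SIZE_GB, CONSUMERS, 1.0)

# Stream of updates: each consumer fetches only the ~10% that changed.
# Assumption: changed entries have the same average size as all entries.
cdc_stream = monthly_transfer_gb(SIZE_GB, CONSUMERS, 0.10)

print(f"full reload: {full_reload:.0f} GB/month")
print(f"update stream: {cdc_stream:.0f} GB/month")
```

Under these assumptions the update stream moves roughly a tenth of the data, and that amount stays the same whether a consumer applies updates hourly or monthly.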

Page 10: Intro stream processing.be meetup #1

Why Stream Processing ?

Real-time*

Big Data*

Distributed processing (“many collaborators”)

Page 11: Intro stream processing.be meetup #1

Stream becomes the “master data”

● see stream as the master data (not the DB)
● allows real-time, distributed processing
● allows unification between:
  ○ operational teams
  ○ analytics teams
  ○ security, ...

● e.g. Kafka at LinkedIn (Kappa architecture)
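One way to picture "the stream is the master data" is that any table becomes a view that can be rebuilt by replaying the stream. The sketch below is a toy illustration, not how Kafka or LinkedIn implement it: the log is a plain Python list, and the event shape `(key, value)` with `None` meaning "deleted" is an assumption chosen for the example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical event shape: (key, new value); a value of None marks a delete.
Event = Tuple[str, Optional[dict]]


def replay(log: List[Event], upto: Optional[int] = None) -> Dict[str, dict]:
    """Rebuild a key -> value view by applying the first `upto` events in order."""
    state: Dict[str, dict] = {}
    for key, value in log[:upto]:
        if value is None:
            state.pop(key, None)   # delete event
        else:
            state[key] = value     # insert or update event
    return state


log: List[Event] = [
    ("u1", {"name": "Ann"}),
    ("u2", {"name": "Bob"}),
    ("u1", {"name": "Ann B."}),
    ("u2", None),
]

users_now = replay(log)              # view after all events
users_earlier = replay(log, upto=2)  # view as it was after the first two events
```

Because every consumer derives its own view from the same ordered log, operational and analytics teams can each build what they need without one database being the single source of truth.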

Page 12: Intro stream processing.be meetup #1

Kafka (LinkedIn) : Martin Kleppmann

source : Martin Kleppmann at Strata Hadoop London

Page 13: Intro stream processing.be meetup #1

Kafka (LinkedIn) : Jay Kreps

source: Jay Kreps on SlideShare

“I ♥ Log”
Real-time Data and Apache Kafka

Page 14: Intro stream processing.be meetup #1

Why Stream Processing ?

Peter : real-time * (big data * distributed proc.)
Nathan Marz : recovery from human error + ...
Jay Kreps : organizational scalability + ...
Martin Kleppmann : data agility + ...
YOU : ??? let’s discuss over a beer ...

Page 15: Intro stream processing.be meetup #1

Speakers for today

● Jan Tielens (Microsoft) @jantielens
● Alex Van Boxel (Vente-Exclusive.com) @alexvb
● Nils De Moor (Woorank) @ndemoor