Modularizing Flink Programs to Enable Stream Analytics in IoT Mashup Tools

Federico Fernández

Master's Thesis Defense

19th July 2018

Supervisor: Prof. (Chang'an Univ.) PD Dr. habil. Christian Prehofer

Advisors: Tanmaya Mahapatra, M.Sc., Dr. Ilias Gerostathopoulos


Source: IDC, Intel, United Nations


Technical University of Munich

Outline

1. Introduction & Demo

2. Objectives & Methodology

3. Related Work

4. Conceptual Approach

5. Implementation

6. Evaluation
   1. The SmartSantander Project
   2. Evaluation Scenario
   3. Results

7. Conclusions

Objectives & Methodology


• Research questions:

  1. Which abstractions are necessary to modularize Flink programs so that they can be created from flow-based, graphical mashup tools?
  2. How can end users get support during the process of creating Flink programs graphically?

Objectives & Methodology


• Research questions revisited:

  1. How can end users get support during the process of creating Flink programs graphically so that they place visual components in the right order?
  2. Which abstractions are necessary to modularize Flink streaming programs so that they can be created from flow-based, graphical mashup tools?


• Methodology

1. Literature review

2. Design

• Analyze mashup tools (aFlux) and Flink

• Outcome: mashup components that allow the creation of Flink jobs

3. Implementation

• Java code generation + packaging of final job

• Continuous validation to support users

4. Evaluation → SmartSantander


Related Work

• Nussknacker
  • Open-source solution
  • Architecture
    • Engine
    • User Interface
    • Integrations
• Other tools
  • IBM SPSS Modeler
  • Microsoft Azure Stream Analytics

Conceptual Approach


Model 1: Translator

Enable the creation of programs for Stream Analytics graphically.

Model 2: End-User Continuous Support

Continuously assess the end-user flow composition for semantic validity and provide feedback about it.


Conceptual Approach

Model 1: Translator

Enable the creation of programs for Stream Analytics graphically.


• Graphical Parser
  • Create internal model from GUI
  • Instantiate actors
• Actor System & Actors
  • Specific Flink functionality
  • Parameterized, generic structure of Flink statements
  • Exchange messages → Specific Tree-Like Data Structure (STDS)
• Code Generator
  • Mapping of the actual Flink API
  • User-defined properties
  • Generates, compiles, packages
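The translator idea can be sketched in plain Java. This is only an illustration under stated assumptions: the class FlowTranslatorSketch, its Node type, and the generated variable names are hypothetical and are not the STDS or the code generator used in aFlux. Each node stands for one parameterized Flink statement, and the generator walks the tree and emits one line of source text per node.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a tree of statements rendered as Java source text. */
final class FlowTranslatorSketch {

    /** One node = one call on the stream produced by its parent. */
    static final class Node {
        final String call;                       // e.g. "filter(f)", already parameterized
        final List<Node> children = new ArrayList<>();

        Node(String call) {
            this.call = call;
        }

        /** Adds a child and returns this node, so siblings can be chained. */
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Emits one statement per node, visiting the tree depth-first. */
    static List<String> generate(Node root) {
        List<String> lines = new ArrayList<>();
        emit(root, null, lines);
        return lines;
    }

    private static void emit(Node node, String parentVar, List<String> out) {
        // The number of lines emitted so far yields a unique variable name.
        String var = "s" + out.size();
        // The root is called on the execution environment (assumed to be named "env").
        String receiver = parentVar == null ? "env" : parentVar;
        out.add("var " + var + " = " + receiver + "." + node.call + ";");
        for (Node child : node.children) {
            emit(child, var, out);
        }
    }

    public static void main(String[] args) {
        Node filter = new Node("filter(f)").add(new Node("print()"));
        Node root = new Node("addSource(source)").add(filter).add(new Node("map(m)"));
        generate(root).forEach(System.out::println);
    }
}
```

The generated lines are plain strings here; compiling and packaging them is outside the scope of the sketch.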


Conceptual Approach


Model 2: End-User Continuous Support

Continuously assess the end-user flow composition for semantic validity and provide feedback about it.

[Figure: a condition of the form "Visual Component A should … after Visual Component B", defined between two visual components through the attributes isPrecedent, isConsecutive, and isMandatory]

• Semantics between nodes

• Checked every time two mashup components are wired together
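A minimal sketch of such a check follows. The meaning given to the three flags is an assumption made for illustration (isPrecedent: the other component must come before rather than after; isConsecutive: it must be directly adjacent; isMandatory: it must be present), and the class SemanticsSketch is hypothetical, not the aFlux implementation. For brevity it validates a whole linear flow at once instead of reacting to each wiring event.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of ordering conditions between visual components. */
final class SemanticsSketch {

    static final class Condition {
        final String component;      // component that owns the condition
        final String other;          // component the condition refers to
        final boolean isPrecedent;   // assumed: true = other comes before, false = after
        final boolean isConsecutive; // assumed: other must be directly adjacent
        final boolean isMandatory;   // assumed: other must be present in the flow

        Condition(String component, String other,
                  boolean isPrecedent, boolean isConsecutive, boolean isMandatory) {
            this.component = component;
            this.other = other;
            this.isPrecedent = isPrecedent;
            this.isConsecutive = isConsecutive;
            this.isMandatory = isMandatory;
        }
    }

    /** Returns one message per violated condition; an empty list means the flow is valid. */
    static List<String> validate(List<String> flow, List<Condition> conditions) {
        List<String> errors = new ArrayList<>();
        for (Condition c : conditions) {
            int self = flow.indexOf(c.component);
            if (self < 0) {
                continue; // the condition only applies when its owner is in the flow
            }
            int other = flow.indexOf(c.other);
            String side = c.isPrecedent ? "before" : "after";
            if (other < 0) {
                if (c.isMandatory) {
                    errors.add(c.other + " is required " + side + " " + c.component);
                }
                continue;
            }
            boolean ordered = c.isPrecedent ? other < self : other > self;
            boolean adjacent = Math.abs(other - self) == 1;
            if (!ordered || (c.isConsecutive && !adjacent)) {
                errors.add(c.other + " should come "
                        + (c.isConsecutive ? "immediately " : "") + side + " " + c.component);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Condition c = new Condition("Aggregate", "Window", true, true, true);
        System.out.println(validate(Arrays.asList("Source", "Filter", "Aggregate"), Arrays.asList(c)));
        System.out.println(validate(Arrays.asList("Source", "Window", "Aggregate"), Arrays.asList(c)));
    }
}
```

Returning messages rather than throwing lets a user interface attach the feedback to the offending component.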

Implementation: Translator

• Graphical parser → embedded into aFlux
• Actors
  • Exchange FlinkFlowMessage → Contains a STDS
  • 12 actors that map Flink’s DataStream API and CEP Library
• Code generator
  • Java source code generation → JavaPoet library
  • FlinkAPIMapper → based on the JavaParser library
    • Generates Abstract Syntax Tree (AST) from Flink sources
    • Singleton design pattern to boost performance
  • Package final job → MavenInvoker
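The performance argument for the singleton can be illustrated as follows. This is a sketch under assumptions: ApiMapperSketch is a hypothetical stand-in, not the FlinkAPIMapper itself, and the hard-coded map replaces the expensive step of parsing the Flink sources. The point is that the costly construction runs once and every later lookup reuses the same instance.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: an API index that is built once and then shared. */
final class ApiMapperSketch {

    /** Counts constructions, only to make the single build observable. */
    static final AtomicInteger BUILDS = new AtomicInteger();

    private final Map<String, String> signatures = new TreeMap<>();

    private ApiMapperSketch() {
        BUILDS.incrementAndGet();
        // Stand-in for parsing source files into an AST (assumption: hard-coded here).
        signatures.put("filter", "filter(FilterFunction<T>)");
        signatures.put("map", "map(MapFunction<T, R>)");
    }

    /** Initialization-on-demand holder: lazy and thread-safe without explicit locking. */
    private static final class Holder {
        static final ApiMapperSketch INSTANCE = new ApiMapperSketch();
    }

    static ApiMapperSketch getInstance() {
        return Holder.INSTANCE;
    }

    /** Returns the recorded signature, or null if the method is unknown. */
    String signatureOf(String method) {
        return signatures.get(method);
    }

    public static void main(String[] args) {
        System.out.println(getInstance().signatureOf("map"));
        System.out.println(getInstance() == getInstance());
    }
}
```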


Implementation: End-User Support

• Visual Components → aFlux Mashup


• Conditions implemented in ToolSemanticsCondition
  • An array of conditions can be defined when developing a new mashup component

• Errors are shown to the user while creating the flow
  • Component becomes red
  • Component name gets an asterisk (“*”)
  • Details are shown in the right-hand panel

• Available to all mashup components in aFlux


Evaluation: The SmartSantander Project

• City-scale experimental research facility
  • 3000 IEEE 802.15.4 devices
  • 200 GPRS modules
  • Static locations + on board mobile vehicles
• Here focusing on:
  • Traffic Intensity Monitoring
  • Environmental Monitoring
• Flink extension to retrieve live data
  • Independent of aFlux! → Can be contributed to the community

Evaluation: Scenario


Goal → prove how easy it is to create Flink jobs from aFlux

UC1: Real-Time Data Processing

Flink abstractions: AggregateFunction, AllWindowedStream, DataStream, FilterFunction, MapFunction, RichSourceFunction, SlidingWindow, StreamExecutionEnvironment, TumblingWindow

Code    Description
UC1E1   Temperature vs. air quality in a certain area relative to the city average
UC1E2   Air quality vs. traffic charge in the city center
UC1E3   Noise vs. traffic charge in the city center
UC1E4   Max/min monitor

UC2: Pattern Detection

Flink abstractions: DataStream, Pattern, PatternSelectFunction, PatternStream

Code    Description
UC2E1   Traffic increasing in a certain area
UC2E2   Heatwave in the city

Evaluation: Results

• Use Case 1, Experiment 1 (Temperature vs. Air Quality)

Tumbling Windows: size=5min
Sliding Windows: size=5min, slide=1min

Live data from SmartSantander API @ 9th July 2018.
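The difference between the two window configurations can be sketched in plain Java. This is not Flink code: WindowSketch is a hypothetical helper that assumes non-negative timestamps in milliseconds and windows aligned to zero. With size=5min, a reading falls into exactly one tumbling window, whereas with size=5min and slide=1min it falls into five overlapping sliding windows.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of window assignment (timestamps >= 0, windows aligned to 0). */
final class WindowSketch {

    /** Start of the single tumbling window [start, start + size) containing ts. */
    static long tumblingStart(long ts, long size) {
        return ts - (ts % size);
    }

    /** Starts of all sliding windows [start, start + size) containing ts, newest first. */
    static List<Long> slidingStarts(long ts, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = ts - (ts % slide);
        // Walk back one slide at a time while the window still covers ts.
        for (long start = lastStart; start > ts - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        long ts = 7 * minute + 30_000L; // a reading taken 7 min 30 s after time zero
        System.out.println(tumblingStart(ts, 5 * minute));
        System.out.println(slidingStarts(ts, 5 * minute, minute));
    }
}
```

Because each reading contributes to several sliding windows, the sliding configuration produces a smoother, more frequently updated aggregate at the cost of more computation.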

Evaluation: Results

• Use Case 2, Experiment 1 (Traffic Jams Detection)

Live data from SmartSantander API @ 9th July 2018.

// Fragment of the job: filteredTraffic is a DataStream<TrafficObservation> defined earlier.
AfterMatchSkipStrategy strat = AfterMatchSkipStrategy.noSkip();

Pattern<TrafficObservation, TrafficObservation> myPattern =
    Pattern.<TrafficObservation>begin("start", strat)
        // Consecutive where() calls on the same state are combined with AND.
        .where(new SimpleCondition<TrafficObservation>() {
            @Override
            public boolean filter(TrafficObservation trafficObservation) throws Exception {
                return trafficObservation.getCharge() >= 50;
            }
        })
        .where(new SimpleCondition<TrafficObservation>() {
            @Override
            public boolean filter(TrafficObservation trafficObservation) throws Exception {
                return trafficObservation.getCharge() >= 60;
            }
        })
        // "end" may follow "start" with non-matching events in between (relaxed contiguity).
        .followedBy("end").where(new SimpleCondition<TrafficObservation>() {
            @Override
            public boolean filter(TrafficObservation trafficObservation) throws Exception {
                return trafficObservation.getCharge() >= 75;
            }
        });

PatternStream<TrafficObservation> patternStream = CEP.pattern(filteredTraffic, myPattern);

DataStream<SmartSantanderAlert> alerts = patternStream.select(
    new PatternSelectFunction<TrafficObservation, SmartSantanderAlert>() {
        @Override
        public SmartSantanderAlert select(Map<String, List<TrafficObservation>> map) throws Exception {
            // Report the observation that completed the pattern.
            TrafficObservation event = map.get("end").get(0);
            return new SmartSantanderAlert("Charge went too high in " + event.toString());
        }
    });

Conclusions

• Stream Analytics suits the IoT use-case

• IoT mashup tools as enabling technology

• Research Questions
  1. Abstractions to modularize Flink streaming programs so that they can be created graphically
  2. End-user support while creating programs graphically

• Main contributions

1. A new extension for aFlux that allows the creation of Flink jobs

2. Support for semantics validation in aFlux

3. A new extension for Flink that allows the integration of live data from SmartSantander

• Future work

• Flink APIs

• User Experience

• Unattended mechanism to deploy the jobs


Apache Flink

Streaming Architecture

Windows

Tumbling Windows, Sliding Windows, Session Windows, Global Window

Programming Flink

Flink Connector for SmartSantander

aFlux

