Selbst --Adaptive Big Data Architekturen als Grundlage ... · Lessons learned • Developing...

13
Software Systems Engineering Selbst Selbst-Adaptive Big Data Adaptive Big Data Architekturen Architekturen als als Klaus Schmid, Holger Eichelberger {schmid, eichelberger}@sse.uni-hildesheim.de Selbst Selbst-Adaptive Big Data Adaptive Big Data Architekturen Architekturen als als Grundlage Grundlage für für Ressourcen Ressourcen-Optimale Optimale Verarbeitung Verarbeitung {schmid, eichelberger}@sse.uni-hildesheim.de

Transcript of Selbst --Adaptive Big Data Architekturen als Grundlage ... · Lessons learned • Developing...

Page 1: Selbst --Adaptive Big Data Architekturen als Grundlage ... · Lessons learned • Developing adaptive code is complex • Storm: Good foundation for distributed stream processing

Software

Systems

Engineering

SelbstSelbst--Adaptive Big Data Adaptive Big Data ArchitekturenArchitekturen alsals

Klaus Schmid, Holger Eichelberger

{schmid, eichelberger}@sse.uni-hildesheim.de

SelbstSelbst--Adaptive Big Data Adaptive Big Data ArchitekturenArchitekturen alsals

GrundlageGrundlage fürfür RessourcenRessourcen--OptimaleOptimale

VerarbeitungVerarbeitung

{schmid, eichelberger}@sse.uni-hildesheim.de

Page 2: Selbst --Adaptive Big Data Architekturen als Grundlage ... · Lessons learned • Developing adaptive code is complex • Storm: Good foundation for distributed stream processing

Software

Systems

Engineering

SelbstSelbst--Adaptive Big Data Adaptive Big Data ArchitekturenArchitekturen

fürfür ResourcenResourcen--OptimaleOptimale VerarbeitungVerarbeitung

Self-adaptive Big Data Architectures

• Big Data

– Processing of large and complex data sets

– Too difficult for traditional data processing applications

– 3V: Volume, Velocity, Volatility

• Problem:

– Volatile stream characteristics (several orders of magnitude)

– Soft real-time processing

– Limited resources / Scale-out not possible

GI FG Architekturen, 30.06.2016 © SSE, University of Hildesheim 1

– Limited resources / Scale-out not possible

• Goal: Sustain quality of data analysis

– Adaptive processing

– Lightweight

Page 3: Selbst --Adaptive Big Data Architekturen als Grundlage ... · Lessons learned • Developing adaptive code is complex • Storm: Good foundation for distributed stream processing

Software

Systems

Engineering

SelbstSelbst--Adaptive Big Data Adaptive Big Data ArchitekturenArchitekturen

fürfür ResourcenResourcen--OptimaleOptimale VerarbeitungVerarbeitung

Application Instance: FP7 QualiMaster

Risk identification in financial markets

Motivation

Risk identification in financial markets

• Interconnected markets

• Regular risk analysis requested

by EU / US law

• Licensed data

• Bursty data streams

– Financial data

– Social web

GI FG Architekturen, 30.06.2016 © SSE, University of Hildesheim 2

Always optimal processing→ too much HW $$$

Page 4: Selbst --Adaptive Big Data Architekturen als Grundlage ... · Lessons learned • Developing adaptive code is complex • Storm: Good foundation for distributed stream processing

Software

Systems

Engineering

SelbstSelbst--Adaptive Big Data Adaptive Big Data ArchitekturenArchitekturen

fürfür ResourcenResourcen--OptimaleOptimale VerarbeitungVerarbeitung

Adaptive Systems (MAPE-K)

EASy-Producer

Know-

ledge

EASy-Producer→ Tool for Product Lines and Adaptive Systems

Supports• Variability / Adaptation space modeling

• Constraint analysis

• Derivation of consequence

• Complex instantiation process

GI FG Architekturen, 30.06.2016 © SSE, University of Hildesheim 3

• Complex instantiation process

Page 5: Selbst --Adaptive Big Data Architekturen als Grundlage ... · Lessons learned • Developing adaptive code is complex • Storm: Good foundation for distributed stream processing

Software

Systems

Engineering

SelbstSelbst--Adaptive Big Data Adaptive Big Data ArchitekturenArchitekturen

fürfür ResourcenResourcen--OptimaleOptimale VerarbeitungVerarbeitung

Data Analysis Pipeline

Stream Processing

Financial source

Twittersource

Financial preprocessing

Spam filteringEvent

detection

Correlationcomputation

Resultsink

Dynamic Graph

Compilation

Focus recommender

GI FG Architekturen, 30.06.2016 © SSE, University of Hildesheim 4

Page 6: Selbst --Adaptive Big Data Architekturen als Grundlage ... · Lessons learned • Developing adaptive code is complex • Storm: Good foundation for distributed stream processing

Software

Systems

Engineering

SelbstSelbst--Adaptive Big Data Adaptive Big Data ArchitekturenArchitekturen

fürfür ResourcenResourcen--OptimaleOptimale VerarbeitungVerarbeitung

Algorithm Family

• Idea: Exchange algorithms

Stream Processing

• Idea: Exchange algorithms

– Same functionality

– Different runtime characteristics

Financial preprocessing

Correlationcomputation

Dynamic Graph

Compilation

Hayashi-Yoshida Transfer Entropy

GI FG Architekturen, 30.06.2016 © SSE, University of Hildesheim 5

Software Hardware(FPGA)

Software Hardware(FPGA)

Page 7: Selbst --Adaptive Big Data Architekturen als Grundlage ... · Lessons learned • Developing adaptive code is complex • Storm: Good foundation for distributed stream processing

Software

Systems

Engineering

SelbstSelbst--Adaptive Big Data Adaptive Big Data ArchitekturenArchitekturen

fürfür ResourcenResourcen--OptimaleOptimale VerarbeitungVerarbeitung

QM-IConfCommands,

User triggersEnd-user

QualiMaster Architecture

Adaptive System Architecture

CoordinationMonitoring / Analysis

Data Management

Adaptation

StateUser triggers

Data

End-userapplication

CloudScale-

GI FG Architekturen, 30.06.2016 © SSE, University of Hildesheim 6

Execution Systems

Pipelines

Storm Hardware Hadoop

DataScale-Out

Page 8: Selbst --Adaptive Big Data Architekturen als Grundlage ... · Lessons learned • Developing adaptive code is complex • Storm: Good foundation for distributed stream processing

Software

Systems

Engineering

SelbstSelbst--Adaptive Big Data Adaptive Big Data ArchitekturenArchitekturen

fürfür ResourcenResourcen--OptimaleOptimale VerarbeitungVerarbeitung

Adaptation Mechanisms

Scoped by model of adaptation space / adaptation script

Runtime Adaptation Mechanisms

Scoped by model of adaptation space / adaptation script

Mechanism QualiMaster / Stream Processing

Exchange of components

Change of parameters

• Aim: Stream transparency

• Triggered by constraints

• Upcoming: Decision by performance profile

• Triggered by algorithms

• Triggered by user

• Upcoming: Decision by performance profile

20 s →

110 ms

10 ms

GI FG Architekturen, 30.06.2016 © SSE, University of Hildesheim 7

Re-parallelization / migration

• Upcoming: Decision by performance profile

• Upcoming: Source volume prediction

• Last resort: Load shedding

• Storm: Rebalance

• Storm extension

8 s →

50 ms

Page 9: Selbst --Adaptive Big Data Architekturen als Grundlage ... · Lessons learned • Developing adaptive code is complex • Storm: Good foundation for distributed stream processing

Software

Systems

Engineering

SelbstSelbst--Adaptive Big Data Adaptive Big Data ArchitekturenArchitekturen

fürfür ResourcenResourcen--OptimaleOptimale VerarbeitungVerarbeitung

Lessons learned

• Developing adaptive code is complex

• Storm: Good foundation for distributed stream processing

• Stable installation not trivial

• Testing is tricky and time consuming

• Monitoring aggregates too much (but extensible)

• Small bugs lead to large effects

• Does not support adaptation

• Technology is developing fast

Documentation!

GI FG Architekturen, 30.06.2016 © SSE, University of Hildesheim 8

• Technology is developing fast

• Twitter Heron

• Apache Spark

• Supporting frameworks

Model-baseddevelopment!

Page 10: Selbst --Adaptive Big Data Architekturen als Grundlage ... · Lessons learned • Developing adaptive code is complex • Storm: Good foundation for distributed stream processing

Software

Systems

Engineering

SelbstSelbst--Adaptive Big Data Adaptive Big Data ArchitekturenArchitekturen

fürfür ResourcenResourcen--OptimaleOptimale VerarbeitungVerarbeitung

Product-Line based Approach

Domain-Specific Modeler

Approach

Domain-Specific Modeler

Domain / Variability Model

Code Generation

-Producer

GI FG Architekturen, 30.06.2016 © SSE, University of Hildesheim 9

Code Generation

Domain-Specific Infrastructure

Page 11: Selbst --Adaptive Big Data Architekturen als Grundlage ... · Lessons learned • Developing adaptive code is complex • Storm: Good foundation for distributed stream processing

Software

Systems

Engineering

SelbstSelbst--Adaptive Big Data Adaptive Big Data ArchitekturenArchitekturen

fürfür ResourcenResourcen--OptimaleOptimale VerarbeitungVerarbeitung

Domain-specific configuration

GI FG Architekturen, 30.06.2016 © SSE, University of Hildesheim 10

Page 12: Selbst --Adaptive Big Data Architekturen als Grundlage ... · Lessons learned • Developing adaptive code is complex • Storm: Good foundation for distributed stream processing

Software

Systems

Engineering

SelbstSelbst--Adaptive Big Data Adaptive Big Data ArchitekturenArchitekturen

fürfür ResourcenResourcen--OptimaleOptimale VerarbeitungVerarbeitung

Results

• Topological configuration

Results

• Several pipelines generated: 5 demo + 4 test pipelines

• Validation: <250 ms

• Instantiation

• 4 minutes

• 30 KLOC in 195 artifacts

• Deployable artifact: 40 - 150MBytes

• Integration of algorithms

GI FG Architekturen, 30.06.2016 © SSE, University of Hildesheim 11

• Integration of algorithms

• Integration of adaptation mechanisms / monitoring

• Voice of the “user”

• Clear separation of algorithm/pipelines

• Generate more

Page 13: Selbst --Adaptive Big Data Architekturen als Grundlage ... · Lessons learned • Developing adaptive code is complex • Storm: Good foundation for distributed stream processing

Software

Systems

Engineering

SelbstSelbst--Adaptive Big Data Adaptive Big Data ArchitekturenArchitekturen

fürfür ResourcenResourcen--OptimaleOptimale VerarbeitungVerarbeitung

Summary / Results

• Resource optimization requires processing alternatives

Summary

1684 ticks/s →

1.4M correlations

• Volatile Big Data requires adaptive processing

• Generative approaches can successfully

• Create major parts of technical code (30KLOC, 195 artifacts)

• Integrate complex runtime mechanisms (<110 ms)

• Create deployable artifacts (40-150MBytes)

• Relieve Data Analysts from technical work

1.4M correlations

Output becomesbottleneck!

GI FG Architekturen, 30.06.2016 © SSE, University of Hildesheim 11

The research leading to these results has received funding from the European Union Seventh

Framework Programme [FP7/2007-2013] under grant agreement n° 619525 (QualiMaster).

Project homepage: http://www.qualimaster.eu

Open Source: https://github.com/QualiMaster, http://ssehub.github.io/

Twitter: @QualiMasterEU