Distributed FoodBroker - Uni Stuttgart€¦ · GRADOOP Distributed FoodBroker Evaluation System...

12
Distributed FoodBroker STEPHAN KEMPER , ANDRÉ PETERMANN, MARTIN JUNGHANNS

Transcript of Distributed FoodBroker - Uni Stuttgart€¦ · GRADOOP Distributed FoodBroker Evaluation System...

Page 1: Distributed FoodBroker - Uni Stuttgart€¦ · GRADOOP Distributed FoodBroker Evaluation System Results DISTRIBUTED FOODBROKER : STEPHAN KEMPER, ANDRÉ PETERMANN, MARTIN JUNGHANNS

Distributed FoodBrokerSTEPHAN KEMPER, ANDRÉ PETERMANN, MARTIN JUNGHANNS

Page 2: Distributed FoodBroker - Uni Stuttgart€¦ · GRADOOP Distributed FoodBroker Evaluation System Results DISTRIBUTED FOODBROKER : STEPHAN KEMPER, ANDRÉ PETERMANN, MARTIN JUNGHANNS

AgendaIntroduction

BasicsFoodBroker

Flink

GRADOOP

Distributed FoodBroker

EvaluationSystem

Results

DISTRIBUTED FOODBROKER : STEPHAN KEMPER, ANDRÉ PETERMANN, MARTIN JUNGHANNS

Page 3: Distributed FoodBroker - Uni Stuttgart€¦ · GRADOOP Distributed FoodBroker Evaluation System Results DISTRIBUTED FOODBROKER : STEPHAN KEMPER, ANDRÉ PETERMANN, MARTIN JUNGHANNS

IntroductionGraphs as data structure

Tools for graph based analysis

Required data data generators

FoodBroker

DISTRIBUTED FOODBROKER : STEPHAN KEMPER, ANDRÉ PETERMANN, MARTIN JUNGHANNS

Page 4: Distributed FoodBroker - Uni Stuttgart€¦ · GRADOOP Distributed FoodBroker Evaluation System Results DISTRIBUTED FOODBROKER : STEPHAN KEMPER, ANDRÉ PETERMANN, MARTIN JUNGHANNS

Basics – FoodBrokerSimulates predefined business process[1]

Two phases ‘Brokerage’ and ‘Complaint Handling’

Two kinds of dataMaster data

Transactional data

Executions represented by

graphs

DISTRIBUTED FOODBROKER : STEPHAN KEMPER, ANDRÉ PETERMANN, MARTIN JUNGHANNS

Page 5: Distributed FoodBroker - Uni Stuttgart€¦ · GRADOOP Distributed FoodBroker Evaluation System Results DISTRIBUTED FOODBROKER : STEPHAN KEMPER, ANDRÉ PETERMANN, MARTIN JUNGHANNS

Basics – Apache FlinkOpen Source framework[2]

Provides runtime environment

Allows transformations among distributed datasets

Datasources and –sinks: HDFS

NoSQL Databases

https://flink.apache.org

DISTRIBUTED FOODBROKER : STEPHAN KEMPER, ANDRÉ PETERMANN, MARTIN JUNGHANNS

Page 6: Distributed FoodBroker - Uni Stuttgart€¦ · GRADOOP Distributed FoodBroker Evaluation System Results DISTRIBUTED FOODBROKER : STEPHAN KEMPER, ANDRÉ PETERMANN, MARTIN JUNGHANNS

Basics – GRADOOPOpen Source framework [3]

Extended Property Graph Model (EPGM)[4]

Provides operations for graphs and graph collections

Implemented on top of Flink

http://www.gradoop.org

DISTRIBUTED FOODBROKER : STEPHAN KEMPER, ANDRÉ PETERMANN, MARTIN JUNGHANNS

Page 7: Distributed FoodBroker - Uni Stuttgart€¦ · GRADOOP Distributed FoodBroker Evaluation System Results DISTRIBUTED FOODBROKER : STEPHAN KEMPER, ANDRÉ PETERMANN, MARTIN JUNGHANNS

Distributed FoodBroker

Flink + GRADOOP + FoodBroker = Distributed FoodBroker

Scalable for generating huge amounts of data

Problems

Brokerage and Complaint Handling mostly linear

Master data used in each transaction

Brokerage Process

DISTRIBUTED FOODBROKER : STEPHAN KEMPER, ANDRÉ PETERMANN, MARTIN JUNGHANNS

SalesQuotation

Page 8: Distributed FoodBroker - Uni Stuttgart€¦ · GRADOOP Distributed FoodBroker Evaluation System Results DISTRIBUTED FOODBROKER : STEPHAN KEMPER, ANDRÉ PETERMANN, MARTIN JUNGHANNS

FoodBroker config

Starts n transactions

Scalefactor defines generated data amount

Brokerage and Complaint Handling

similar to original FoodBroker

Distributed FoodBroker - Process

Distributed FoodBroker Process

DISTRIBUTED FOODBROKER : STEPHAN KEMPER, ANDRÉ PETERMANN, MARTIN JUNGHANNS

Page 9: Distributed FoodBroker - Uni Stuttgart€¦ · GRADOOP Distributed FoodBroker Evaluation System Results DISTRIBUTED FOODBROKER : STEPHAN KEMPER, ANDRÉ PETERMANN, MARTIN JUNGHANNS

Evaluation - SystemCluster of 5 computer

Each consists ofIntel XEON W3520 (4 x 2.6 Ghz)

6 GB RAM

500 GB Samsung HDD

SofwareApache Flink 1.1.2

Hadoop 2.5.2

GRADOOP 0.3.0-SNAPSHOT

DISTRIBUTED FOODBROKER : STEPHAN KEMPER, ANDRÉ PETERMANN, MARTIN JUNGHANNS

Page 10: Distributed FoodBroker - Uni Stuttgart€¦ · GRADOOP Distributed FoodBroker Evaluation System Results DISTRIBUTED FOODBROKER : STEPHAN KEMPER, ANDRÉ PETERMANN, MARTIN JUNGHANNS

Evaluation - Results

1

10

100

1000

10000

1 2 4 8 16

tim

e in

s

parallelism #cores

Execution of Distributed FoodBroker

Scalefactor 100 Scalefactor 1000

1

2

4

8

16

1 2 4 8 16

spee

du

p

parallelism #cores

Speedup of Distributed FoodBroker

Scalefactor 100 Scalefactor 1000 linear

DISTRIBUTED FOODBROKER : STEPHAN KEMPER, ANDRÉ PETERMANN, MARTIN JUNGHANNS

Page 11: Distributed FoodBroker - Uni Stuttgart€¦ · GRADOOP Distributed FoodBroker Evaluation System Results DISTRIBUTED FOODBROKER : STEPHAN KEMPER, ANDRÉ PETERMANN, MARTIN JUNGHANNS

References[1] Petermann, André et al.: FoodBroker - Generating Synthetic Datasets for Graph-Based

Business Analytics. 5th Works. on Big Data Benchmarking (WBDB), LNCS 8991, 2014.

[2] Carbone, Paris et al.: Apache FlinkTM: Stream and Batch Processing in a Single Engine. IEEE Data Eng. Bull., 38(4), 2015.

[3] Junghanns, Martin et al.: Gradoop: Scalable Graph Data Management and Analytics with Hadoop. Bericht, University of Leipzig, 2015.

[4] Junghanns, Martin et al.: Analyzing Extended Property Graphs with Apache Flink. Proc. SIGMOD, 2016.

DISTRIBUTED FOODBROKER : STEPHAN KEMPER, ANDRÉ PETERMANN, MARTIN JUNGHANNS

Page 12: Distributed FoodBroker - Uni Stuttgart€¦ · GRADOOP Distributed FoodBroker Evaluation System Results DISTRIBUTED FOODBROKER : STEPHAN KEMPER, ANDRÉ PETERMANN, MARTIN JUNGHANNS

Thank you for your attention!