Model driven engineering for big data management systems


Transcript of Model driven engineering for big data management systems

Page 1: Model driven engineering for big data management systems

www.modeliosoft.com

Model driven engineering for big data management systems

Marcos ALMEIDA [email protected]

Sarah DAHAB [email protected]

Andrey SADOVYKH [email protected]


Page 2: Model driven engineering for big data management systems

Outline

• Introduction
• Model-driven development
• Big Data
• Juniper
• Sample application
• Conclusions

Page 3: Model driven engineering for big data management systems

SOFTEAM – a French IT services / Software vendor

• SOFTEAM, a growing company: 25 years’ experience, 900 experts, regular growth

• Specialist in OO technologies, new architectures, methodologies

• Banking, Defense, Telecom, …

[Figure: revenue growth: 17.5 M€ (2005), 20 M€ (2006), 23 M€ (2008), 70 M€ (2013)]

[Figure: office locations: Paris, Rennes, Nantes, Sophia]

Page 4: Model driven engineering for big data management systems

Modelio for Software and System Engineering

• UML editor with 20 years’ history
  o CloudML
  o SysML
  o MARTE
  o Code generation
  o Documentation
  o Teamwork

• Available under open source at Modelio.org

Page 5: Model driven engineering for big data management systems

MODEL-DRIVEN DEVELOPMENT


Page 6: Model driven engineering for big data management systems

It is all about models … Starting with UML

• Requirements: UML Use Cases
• Architecture: UML Components and Classes
• Design: Refined Classes or Domain Specific Language
• Implementation: Code generation (Java, C++, Frameworks)

Page 7: Model driven engineering for big data management systems

Model = Code


Page 8: Model driven engineering for big data management systems

Typical example: Control system for a frigate

• 800+ components
• Developed by 100+ engineers
• 1M+ LOC

• MDD fosters productivity and quality with
  o Code generation
  o Component reuse
  o Tracing
  o Automation

Page 9: Model driven engineering for big data management systems

Curious DSL example: Ruby on Rails

Cucumber and Capybara

Feature: User can manually add movie
  Scenario: Add a movie
    Given I am on the RottenPotatoes home page
    When I follow "Add new movie"
    Then I should be on the Create New Movie page
    When I fill in "Title" with "Men In Black"
    And I should see "Men In Black"

HAML

Haml                      HTML
%br{:clear => 'left'}     <br clear="left"/>
%p.foo Hello              <p class="foo">Hello</p>
%p#foo Hello              <p id="foo">Hello</p>
.foo                      <div class="foo">...</div>
#foo.bar                  <div id="foo" class="bar">...</div>
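
The Haml-to-HTML pairs on this slide are a small model-to-text transformation. The sketch below is a hypothetical illustration in Java, not the real Haml implementation: the class `TinyHamlSketch` and its method `toHtml` are invented names, and only a tiny subset of the syntax is handled (one optional `%tag`, one optional `#id`, one optional `.class`, then inline text).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: translates a tiny subset of Haml lines to HTML. */
final class TinyHamlSketch {

    // Optional %tag, then optional #id, then optional .class, then optional inline text.
    private static final Pattern LINE =
        Pattern.compile("^(?:%(\\w+))?(?:#([\\w-]+))?(?:\\.([\\w-]+))?(?:\\s+(.*))?$");

    static String toHtml(String hamlLine) {
        Matcher m = LINE.matcher(hamlLine);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported line: " + hamlLine);
        }
        // As on the slide, a line without %tag produces a <div>.
        String tag = m.group(1) != null ? m.group(1) : "div";
        StringBuilder html = new StringBuilder("<").append(tag);
        if (m.group(2) != null) {
            html.append(" id=\"").append(m.group(2)).append('"');
        }
        if (m.group(3) != null) {
            html.append(" class=\"").append(m.group(3)).append('"');
        }
        String text = m.group(4) != null ? m.group(4) : "";
        return html.append('>').append(text).append("</").append(tag).append('>').toString();
    }

    public static void main(String[] args) {
        System.out.println(toHtml("%p.foo Hello"));
        System.out.println(toHtml("%p#foo Hello"));
        System.out.println(toHtml("#foo.bar"));
    }
}
```

The point of the sketch is the one made by the slide: the DSL line is the model, and the HTML is generated from it by a fixed rule.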

Page 10: Model driven engineering for big data management systems

What do we get from MDD?

Pros
• Design once, deploy everywhere!
• Write your transformation once, transform anything!

Cons
• Transformations are hard to write…
• How to make sure they are CORRECT? i.e.
  – Is there any data/semantic loss?

Page 11: Model driven engineering for big data management systems

BIG DATA


Page 12: Model driven engineering for big data management systems

Volume, variety, velocity

1. E-mails sent every second: 2.9 million

2. Video uploaded to YouTube every minute: 25 hours

3. Data processed by Google every day: 24 petabytes

4. Tweets per day: 50 million

5. Products ordered on Amazon per second: 73 items


Page 13: Model driven engineering for big data management systems

Only 0.5% of data is analyzed

• In 2012, 2,837 EB of data were generated; just 0.5% was actually analyzed.

That still amounts to about 14 EB (14.185 million terabytes)

Source: IDC & EMC
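
As a quick check of the figures quoted on this slide, assuming decimal units (1 EB = 10^6 TB):

```latex
\[
0.005 \times 2837\ \text{EB} = 14.185\ \text{EB}
= 14.185 \times 10^{6}\ \text{TB}
\qquad (1\ \text{EB} = 10^{6}\ \text{TB})
\]
```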


Page 14: Model driven engineering for big data management systems

The main problem is Heterogeneity!

• Many different database management systems
  o Ex:
    • MySQL (www.mysql.com/)
    • Big Table (http://research.google.com/archive/bigtable.html)
    • SimpleDB (http://aws.amazon.com/simpledb/)
    • Memcached (http://memcached.org/)
    • …

• Many underlying data representation paradigms
  o Ex:
    • Relational Databases
    • Key-value Stores
    • Object-oriented Databases
    • Big Tables
    • …

Page 15: Model driven engineering for big data management systems

The basis of our solution is MDE… Why?

• Separating the problem from the solution
  o In JUNIPER we model the solution

• Fostering automation
  o Analysis
  o Code generation

[Figure: abstract models (business objects) are turned by transformations into specific models / code for HDFS, MySQL and MongoDB]
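
The idea of one abstract model feeding several target-specific transformations can be sketched as follows. This is a hypothetical illustration, not Juniper or Modelio code: `AbstractModelSketch`, `Entity`, `toSql` and `toMongo` are invented names, and the type mapping is deliberately minimal.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch: one abstract entity, two target-specific transformations. */
final class AbstractModelSketch {

    /** Abstract (platform-independent) business object: a name plus typed attributes. */
    static final class Entity {
        final String name;
        final Map<String, String> attributes; // attribute name -> abstract type

        Entity(String name, Map<String, String> attributes) {
            this.name = name;
            this.attributes = attributes;
        }
    }

    /** Transformation 1: abstract entity -> SQL DDL text. */
    static String toSql(Entity e) {
        StringJoiner cols = new StringJoiner(", ", "(", ")");
        e.attributes.forEach((attr, type) -> cols.add(attr + " " + sqlType(type)));
        return "CREATE TABLE " + e.name + " " + cols + ";";
    }

    /** Transformation 2: abstract entity -> MongoDB shell command (schemaless target). */
    static String toMongo(Entity e) {
        return "db.createCollection(\"" + e.name + "\")";
    }

    /** Assumed mapping from abstract types to SQL column types. */
    private static String sqlType(String abstractType) {
        if ("string".equals(abstractType)) {
            return "VARCHAR(255)";
        }
        if ("integer".equals(abstractType)) {
            return "INTEGER";
        }
        throw new IllegalArgumentException("Unmapped type: " + abstractType);
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("id", "integer");
        attrs.put("title", "string");
        Entity movie = new Entity("Movie", attrs);
        System.out.println(toSql(movie));
        System.out.println(toMongo(movie));
    }
}
```

Note how the MongoDB output drops the attribute types entirely: this is the kind of data/semantic loss a transformation author has to reason about.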

Page 16: Model driven engineering for big data management systems

Understanding the problem… Why is it so HARD? (1/2)

• Target technologies are based on different paradigms
• Example:

[Figure: class A associated with class B]

JPA
    @Entity
    public class A {
        @Basic
        public B getB() { … }
        …
    }

SQL
    create table A (…)
    create table B (…)
    create table A_B (…)

Page 17: Model driven engineering for big data management systems

Understanding the problem… Why is it so HARD? (2/2)

• Target structure is variable
• Example:

[Figure: the association between A and B mapped to two target structures]

ER (A, B, AB): here A and B are independent entities

NoSQL: here, for performance reasons, B is embedded in A
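
The two target structures for the same A-to-B association can be sketched as JSON-like documents. This is a hypothetical illustration: `TargetStructureSketch` and the field names `_id`, `b_id`, `b` and `value` are assumptions chosen for the example, not taken from the slides.

```java
/** Hypothetical sketch: the same A-to-B association rendered with two structures. */
final class TargetStructureSketch {

    /** Independent entities: A and B are stored separately and A only keeps B's key. */
    static String[] asIndependentEntities(int aId, int bId, String bValue) {
        String a = "{ \"_id\": " + aId + ", \"b_id\": " + bId + " }";
        String b = "{ \"_id\": " + bId + ", \"value\": \"" + bValue + "\" }";
        return new String[] { a, b };
    }

    /** Embedded structure: B is nested inside A, so a single read returns both. */
    static String asEmbeddedDocument(int aId, String bValue) {
        return "{ \"_id\": " + aId + ", \"b\": { \"value\": \"" + bValue + "\" } }";
    }

    public static void main(String[] args) {
        for (String doc : asIndependentEntities(1, 2, "x")) {
            System.out.println(doc);
        }
        System.out.println(asEmbeddedDocument(1, "x"));
    }
}
```

A generator therefore cannot derive the target structure from the source model alone; it needs an extra design decision (reference or embed) for each association.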

Page 18: Model driven engineering for big data management systems

Illustration: comparative features of MongoDB and PostgreSQL


Page 19: Model driven engineering for big data management systems

Our solution: a component based approach to NoSQL heterogeneity

• Generic model transformation chain
  o Integrated with other Juniper tools
    • Audit rules
    • Model-to-model transformations
    • Code generators

• Database-specific instantiations
  o Application architecture modelling
  o Data modelling
  o Hardware architecture (deployment) modelling

Page 20: Model driven engineering for big data management systems


Page 21: Model driven engineering for big data management systems

The Juniper FP7 EU project

Website: http://www.juniper-project.org/
Start date: 2012-12-01
Duration: 36 months
Total cost: 4 M€

Page 22: Model driven engineering for big data management systems

JUNIPER integrates Big Data technologies over MPI

[Figure: Big Data pipeline. Sources (DOCs, Streams, DBs) feed Data Processing (Stage 1 to Stage N), followed by Analytical DBs, Business Intelligence and Visualization.]

[Figure: Data Processing in JUNIPER. Stages S1, S2 and S3 are connected via MPI and feed the Analytical DBs; platforms shown: Hadoop, HPC, FPGA-enabled nodes.]

Page 23: Model driven engineering for big data management systems

Modelling in Juniper

[Figure: modelling workflow in Juniper]

• Models: high-level architecture (nodes, programs, streams…), real-time constraints
• Java code: code generation (+ MPI initialization, communication, etc.) and reverse engineering
• Configuration model export to the Schedulability Analysis Tool and the Scheduling Advisor, which return measurements and advice
• Code generation of deployment scripts

Page 24: Model driven engineering for big data management systems

Mapping Programming Model, UML and MARTE

[Figure: mapping of the programming model concepts (JUNIPER Program, Channel, Cloud Node) to UML and MARTE]

Page 25: Model driven engineering for big data management systems

Modelling the application and real-time constraints

[Figure: application model]

• Big Data flow
• JUNIPER programs
• Real-time constraints
  - response time
  - bandwidth

Page 26: Model driven engineering for big data management systems

Modelling the hardware infrastructure at a high level

[Figure: hardware model of a Cloud Node with a 4-core CPU and a hard drive]

Page 27: Model driven engineering for big data management systems

MPI code generation


Page 28: Model driven engineering for big data management systems

Overview of the Juniper programming model concepts

Next step: integrating data modelling to the programming model


Page 29: Model driven engineering for big data management systems

Business data modelling in Juniper

• Example

[Figure: example business data model, annotated with]

• Uniquely identifying pieces of data
• Partitioning data in different nodes
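
The slides do not say how Juniper assigns data to nodes. As one common possibility, the sketch below uses hash partitioning on the unique identifier; `PartitioningSketch` and `nodeFor` are hypothetical names, and the strategy itself is an assumption made for illustration.

```java
/** Hypothetical sketch: assigning a uniquely identified piece of data to a node. */
final class PartitioningSketch {

    /** Maps a unique key to one of nodeCount nodes (hash partitioning, assumed strategy). */
    static int nodeFor(String uniqueKey, int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        // floorMod keeps the result in [0, nodeCount) even for negative hash codes.
        return Math.floorMod(uniqueKey.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        System.out.println(nodeFor("movie-42", 4));
    }
}
```

The same key always lands on the same node, which is why a model needs both annotations shown on the slide: a unique identifier and a partitioning rule.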

Page 30: Model driven engineering for big data management systems

Business data modelling in Juniper

• Concepts


Page 31: Model driven engineering for big data management systems

Approach taken for dealing with heterogeneity in JUNIPER

1. Define a generic template for Modelio modules to provide support for big data management systems

2. Instantiate the template for MongoDB and PostgreSQL
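
The two steps above (a generic template, then one instantiation per database) resemble the template method pattern. The sketch below is a hypothetical illustration in plain Java and does not use the Modelio module API: every class and method name is invented, and the audit rule and naming convention are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical generic template: audit, then model transformation, then code generation. */
abstract class DatabaseModuleTemplate {

    /** The generic chain, identical for every database-specific module. */
    final List<String> run(String entityName) {
        List<String> output = new ArrayList<>();
        if (!audit(entityName)) {
            output.add("audit failed: " + entityName);
            return output;
        }
        String specificName = transform(entityName);
        output.add(generate(specificName));
        return output;
    }

    /** Default audit rule (an assumption): entity names must be non-empty. */
    boolean audit(String entityName) {
        return entityName != null && !entityName.isEmpty();
    }

    /** Model-to-model step: abstract name -> database-specific name. */
    abstract String transform(String entityName);

    /** Code generation step: database-specific name -> script text. */
    abstract String generate(String specificName);
}

/** Instantiation for MongoDB (lower-case collection names are an assumed convention). */
final class MongoDbModule extends DatabaseModuleTemplate {
    @Override
    String transform(String entityName) {
        return entityName.toLowerCase(Locale.ROOT);
    }

    @Override
    String generate(String specificName) {
        return "db.createCollection(\"" + specificName + "\")";
    }
}

/** Instantiation for PostgreSQL (columns omitted to keep the sketch short). */
final class PostgreSqlModule extends DatabaseModuleTemplate {
    @Override
    String transform(String entityName) {
        return entityName.toLowerCase(Locale.ROOT);
    }

    @Override
    String generate(String specificName) {
        return "CREATE TABLE " + specificName + " ();";
    }
}

final class ModuleTemplateDemo {
    public static void main(String[] args) {
        System.out.println(new MongoDbModule().run("Movie"));
        System.out.println(new PostgreSqlModule().run("Movie"));
    }
}
```

Supporting a further database would then mean adding one more subclass, while the chain in `run` stays untouched.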


Page 32: Model driven engineering for big data management systems

MongoDB modelling module


Page 33: Model driven engineering for big data management systems

MongoDB Example (1/2)


Page 34: Model driven engineering for big data management systems

MongoDB Example (2/2) + DEMO Video

Database schema configuration scripts

Deployment scripts

Configuration scripts


Page 35: Model driven engineering for big data management systems

PostgreSQL modeller module


Page 36: Model driven engineering for big data management systems

PostgreSQL Example

Master installation script

Standby installation script

Configuration files

Page 37: Model driven engineering for big data management systems

DATABASE MIGRATION SAMPLE APPLICATION


Page 38: Model driven engineering for big data management systems

[VIDEO]


Page 39: Model driven engineering for big data management systems

CONCLUSIONS


Page 40: Model driven engineering for big data management systems

In short…

• Challenge:
  o Big data applications: how should we handle heterogeneous data?

• Juniper response:
  o Model-driven solution for designing real-time big data systems
  o Component-based solution to heterogeneity
    • General business objects + big data concepts modelling
    • Database-specific concepts
      – Modelling
      – Model transformations
      – Code generation

Page 41: Model driven engineering for big data management systems

… and Perspectives / Exploitation

• Source code and documentation available on our website
  o http://forge.modelio.org/projects/juniper
  o http://forge.modelio.org/projects/mongodb-modeler
  o http://forge.modelio.org/projects/postgresql-modeler

• Tutorial + Dissemination on our forum

Page 42: Model driven engineering for big data management systems

Questions?

Marcos Almeida
SOFTEAM | ModelioSoft
{name.surname}@softeam.fr (for your questions)

SOFTEAM R&D Web Site: http://rd.softeam.com

Modelio Web Site: http://www.modelio.org
http://forge.modelio.org/projects/juniper

JUNIPER Web Site: http://www.juniper-project.org