Model driven engineering for big data management systems

Post on 14-Apr-2017



www.modeliosoft.com


Marcos ALMEIDA marcos.almeida@softeam.fr

Sarah DAHAB sarah.dahab@telecom-sudparis.eu

Andrey SADOVYKH andrey.sadovykh@softeam.fr


Outline

• Introduction
• Model-driven development
• Big Data
• Juniper
• Sample application
• Conclusions


SOFTEAM – a French IT services / Software vendor

• SOFTEAM, a growing company
  o 25 years’ experience
  o 900 experts
  o Regular growth
• Specialist in OO technologies, new architectures, methodologies
• Banking, Defense, Telecom, …

[Figure: growth chart showing 17.5 M€ (2005), 20 M€ (2006), 23 M€ (2008), 70 M€ (2013); office map: Paris, Rennes, Nantes, Sophia]

Modelio for Software and System Engineering

• UML editor with 20 years’ history
  o CloudML
  o SysML
  o MARTE
  o Code generation
  o Documentation
  o Teamwork
• Available under open source at Modelio.org

MODEL-DRIVEN DEVELOPMENT


It is all about models … Starting with UML


• Requirements: UML Use Cases
• Architecture: UML Components and Classes
• Design: Refined Classes or Domain Specific Language
• Implementation: Code generation (Java, C++, Frameworks)

Model = Code
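The "Model = Code" idea can be illustrated with a very small generator. This is a sketch under stated assumptions, not Modelio's generator: the `ClassModel` and `Attribute` types, the `generate_java` function and the `Movie` example are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    java_type: str


@dataclass
class ClassModel:
    # Hypothetical, minimal stand-in for a UML class
    name: str
    attributes: list = field(default_factory=list)


def generate_java(model: ClassModel) -> str:
    """Emit a plain Java class: one private field and one getter per attribute."""
    lines = [f"public class {model.name} {{"]
    for attr in model.attributes:
        lines.append(f"    private {attr.java_type} {attr.name};")
    for attr in model.attributes:
        getter = "get" + attr.name[0].upper() + attr.name[1:]
        lines.append(
            f"    public {attr.java_type} {getter}() {{ return {attr.name}; }}"
        )
    lines.append("}")
    return "\n".join(lines)


movie = ClassModel("Movie", [Attribute("title", "String"), Attribute("year", "int")])
java_source = generate_java(movie)
print(java_source)
```

Changing the model and regenerating keeps the Java source in step with the design, which is the point of treating the model as the code.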


Typical example: Control system for a frigate

• 800+ components
• Developed by 100+ engineers
• 1M+ LOC
• MDD fosters Productivity and Quality with
  o Code generation
  o Components reuse
  o Tracing
  o Automation


Curious DSL example: Ruby on Rails

HAML

Haml                      HTML
%br{:clear => 'left'}     <br clear="left"/>
%p.foo Hello              <p class="foo">Hello</p>
%p#foo Hello              <p id="foo">Hello</p>
.foo                      <div class="foo">...</div>
#foo.bar                  <div id="foo" class="bar">...</div>

Cucumber and Capybara

Feature: User can manually add movie
  Scenario: Add a movie
    Given I am on the RottenPotatoes home page
    When I follow "Add new movie"
    Then I should be on the Create New Movie page
    When I fill in "Title" with "Men In Black"
    And I should see "Men In Black"

What do we get from MDD?

Pros
• Design once, deploy everywhere!
• Write your transformation once, transform anything!

Cons
• Transformations are hard to write…
• How to make sure they are CORRECT? i.e.
  – Is there any data/semantic loss?


BIG DATA


Volume, variety, velocity

1. E-mails sent every second: 2.9 million

2. Video uploaded to YouTube every minute: 25 hours

3. Data processed by Google every day: 24 petabytes

4. Tweets per day: 50 million

5. Products ordered on Amazon per second: 73 items


Only 0.5% of data is analyzed

• In 2012, 2,837 EB of data were generated, and just 0.5% was actually analyzed.

That still amounts to about 14 EB (or 14.185 million terabytes)

Source: IDC & EMC
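The 14 EB figure follows directly from the two numbers on the slide. A minimal check, assuming decimal (SI) units where 1 EB = 1,000,000 TB; the variable names are illustrative only:

```python
EB_GENERATED_2012 = 2837   # exabytes generated in 2012, from the slide
TB_PER_EB = 10**6          # assumption: SI units, 1 EB = 10^18 B and 1 TB = 10^12 B

# 0.5% written as the exact fraction 5/1000 to stay in integer arithmetic
analyzed_tb = EB_GENERATED_2012 * TB_PER_EB * 5 // 1000
analyzed_eb = analyzed_tb / TB_PER_EB

print(f"{analyzed_tb:,} TB analyzed, i.e. {analyzed_eb} EB")
```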


The main problem is Heterogeneity!

• Many different database management systems
  o Ex:
    • MySQL (www.mysql.com/)
    • Big Table (http://research.google.com/archive/bigtable.html)
    • SimpleDB (http://aws.amazon.com/simpledb/)
    • Memcached (http://memcached.org/)
    • …
• Many underlying data representation paradigms
  o Ex:
    • Relational Databases
    • Key-value Stores
    • Object-oriented Databases
    • Big Tables
    • …


The basis of our solution is MDE… Why?

• Separating the problem from the solution
  o In JUNIPER we model the solution
• Fostering automation
  o Analysis
  o Code generation

[Figure: abstract models (Business Objects) go through transformations to specific models / code for HDFS, MySQL and MongoDB]

Understanding the problem… Why is it so HARD? (1/2)

• Target technologies are based on different paradigms
• Example:

[Figure: classes A and B]

JPA
@Entity
public class A {
  @Basic
  public B getB() { … }
  …
}

SQL
create table A (…)
create table B (…)
create table A_B (…)
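To make the paradigm gap concrete, the sketch below maps one association between two classes to the three-table layout shown for SQL. It is an illustration under assumptions, not the JUNIPER transformation: the function name `association_to_sql` and the `id` / `a_id` / `b_id` columns are hypothetical, and SQLite is used only to check that the generated statements are valid.

```python
import sqlite3


def association_to_sql(source: str, target: str) -> list[str]:
    """Map two classes and one association to three CREATE TABLE statements."""
    src_col = f"{source.lower()}_id"
    tgt_col = f"{target.lower()}_id"
    return [
        f"CREATE TABLE {source} (id INTEGER PRIMARY KEY);",
        f"CREATE TABLE {target} (id INTEGER PRIMARY KEY);",
        # The association has no counterpart in SQL, so it becomes a join table
        f"CREATE TABLE {source}_{target} ("
        f"{src_col} INTEGER REFERENCES {source}(id), "
        f"{tgt_col} INTEGER REFERENCES {target}(id), "
        f"PRIMARY KEY ({src_col}, {tgt_col}));",
    ]


statements = association_to_sql("A", "B")

# Run the statements against an in-memory database to confirm they parse
conn = sqlite3.connect(":memory:")
for statement in statements:
    conn.execute(statement)
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
)
print(tables)
```

In the object model the link is a single getter on A; in the relational target the same link needs a table of its own, which is what the transformation has to know about.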

Understanding the problem… Why is it so HARD? (2/2)

• Target structure is variable
• Example:

[Figure: the same entities A and B in two target structures]
  o ER: here A and B are independent entities
  o NoSQL: here, for performance reasons, B is embedded in A
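The two target structures can be written out side by side. The following is only a sketch with plain dictionaries standing in for rows and documents; the field names (`id`, `name`, `label`, `b_id`) and both function names are hypothetical.

```python
def to_relational(a: dict, b: dict) -> dict:
    """ER-style target: A and B stay independent; A only stores B's key."""
    return {
        "A": [{"id": a["id"], "name": a["name"], "b_id": b["id"]}],
        "B": [dict(b)],
    }


def to_document(a: dict, b: dict) -> dict:
    """Document-style target: B is embedded in A, so one read returns both."""
    embedded_b = {key: value for key, value in b.items() if key != "id"}
    return {"_id": a["id"], "name": a["name"], "b": embedded_b}


a = {"id": 1, "name": "order"}
b = {"id": 7, "label": "customer"}

relational = to_relational(a, b)
document = to_document(a, b)
print(relational)
print(document)
```

The source model is identical in both cases; only the design decision (reference or embed) changes, and a transformation has to be told which one applies.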

Illustration: comparative features of MongoDB and PostgreSQL


Our solution: a component based approach to NoSQL heterogeneity

• Generic model transformation chain
  o Integrated with other Juniper tools
    • Audit rules
    • Model to model transformations
    • Code generators
• Database specific instantiations
  o Application architecture modelling
  o Data modelling
  o Hardware architecture (deployment) modelling
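A chain of audit rule, model-to-model transformation and code generator could look roughly like the sketch below. Everything here is an assumption made for illustration: the dictionary-based model, the rule that every entity needs a key, and the function names are not taken from the Juniper tools. The generated lines use the MongoDB shell calls `db.createCollection` and `createIndex`.

```python
def audit(model: dict) -> list[str]:
    """Audit rule (hypothetical): every entity must declare a key attribute."""
    return [
        f"entity '{entity['name']}' has no key"
        for entity in model["entities"]
        if not entity.get("key")
    ]


def to_collection_model(model: dict) -> list[dict]:
    """Model-to-model step: business entities become document collections."""
    return [
        {"collection": entity["name"].lower(), "index": entity["key"]}
        for entity in model["entities"]
    ]


def generate_script(collections: list[dict]) -> str:
    """Code generation step: emit MongoDB shell commands for each collection."""
    lines = []
    for item in collections:
        name, index = item["collection"], item["index"]
        lines.append(f'db.createCollection("{name}")')
        lines.append(f'db.{name}.createIndex({{"{index}": 1}}, {{"unique": true}})')
    return "\n".join(lines)


def run_chain(model: dict) -> str:
    """Run the chain in order and stop at the first failed audit."""
    problems = audit(model)
    if problems:
        raise ValueError("; ".join(problems))
    return generate_script(to_collection_model(model))


model = {"entities": [{"name": "Customer", "key": "customer_id"}]}
script = run_chain(model)
print(script)
```

Running the audit first means the later steps can rely on the model being well formed, which keeps each transformation small.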


The Juniper FP7 EU project

Website: http://www.juniper-project.org/
Start Date: 2012-12-01
Duration: 36 months
Total cost: 4 M€


JUNIPER integrates Big Data technologies over MPI

[Figure: "Data Processing in JUNIPER". Inputs: DOCs, Streams, DBs. Data Processing: Stage 1 … Stage N (S1, S2, S3, linked by mpi). Outputs: Analytical DBs, Business Intelligence, Visualization. Platforms: FPGA-enabled nodes, Hadoop, HPC]

Modelling in Juniper

[Figure: modelling workflow. Models: High level Architecture (Nodes, Programs, Streams…), Real-time constraints. Code Generation: Java Code (+ MPI initialization, communication, etc), Deployment Scripts, Configuration Model Export. Reverse Engineering from Java Code. Schedulability Analysis Tool and Scheduling Advisor: Measurements & Advice]

Mapping Programming Model, UML and MARTE

[Figure: mapping between the Programming Model (JUNIPER Program, Channel, Cloud Node), UML and MARTE]

Modelling the application and real-time constraints

[Figure: application model showing JUNIPER Programs, the Big Data flow, and real-time constraints (response time, bandwidth)]

Modelling the hardware infrastructure at a high level

[Figure: Cloud Node with a CPU with 4 cores and a hard drive]

MPI code generation


Overview of the Juniper programming model concepts

Next step: integrating data modelling into the programming model


Business data modelling in Juniper

• Example


Uniquely identifying pieces of data

Partitioning data in different nodes
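The two annotations above, a unique identifier and a partitioning of data across nodes, fit together naturally. The slides do not say which partitioning scheme Juniper uses; the sketch below assumes simple hash partitioning on the identifier, and all names (`node_for`, `partition`, `movie_id`) are hypothetical.

```python
import hashlib


def node_for(key: str, node_count: int) -> int:
    """Pick a node from a stable hash of the record's unique identifier."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def partition(records: list, key_field: str, node_count: int) -> dict:
    """Group records by the node their identifier hashes to."""
    nodes = {node: [] for node in range(node_count)}
    for record in records:
        nodes[node_for(str(record[key_field]), node_count)].append(record)
    return nodes


records = [{"movie_id": f"m-{i}", "title": f"Movie {i}"} for i in range(10)]
shards = partition(records, "movie_id", node_count=3)
for node, items in shards.items():
    print(node, [item["movie_id"] for item in items])
```

Because the hash depends only on the identifier, any node can work out where a given piece of data lives without a lookup table.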

Business data modelling in Juniper

• Concepts


Approach taken for dealing with heterogeneity in JUNIPER

1. Define a generic template for Modelio modules to provide support for big data management systems

2. Instantiate the template for MongoDB and PostgreSQL
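One way to picture the two steps above is a base class that fixes the workflow and leaves the database-specific parts abstract. This is a sketch of the general template idea only; it does not reproduce the Modelio module API, and the class names, method names and placeholder deployment strings are all hypothetical.

```python
from abc import ABC, abstractmethod


class DatabaseModule(ABC):
    """Generic template: fixed workflow, database-specific steps left abstract."""

    def build(self, entity: str, fields: dict) -> dict:
        # The same outputs are produced for every database
        return {
            "schema": self.schema_script(entity, fields),
            "deployment": self.deployment_script(),
        }

    @abstractmethod
    def schema_script(self, entity: str, fields: dict) -> str: ...

    @abstractmethod
    def deployment_script(self) -> str: ...


class PostgreSQLModule(DatabaseModule):
    def schema_script(self, entity, fields):
        columns = ", ".join(f"{name} {sql_type}" for name, sql_type in fields.items())
        return f"CREATE TABLE {entity} ({columns});"

    def deployment_script(self):
        return "# placeholder: install and start PostgreSQL"


class MongoDBModule(DatabaseModule):
    def schema_script(self, entity, fields):
        # Collections need no column list by default, so fields are not emitted
        return f'db.createCollection("{entity}")'

    def deployment_script(self):
        return "# placeholder: install and start MongoDB"


fields = {"title": "TEXT", "year": "INTEGER"}
postgres_output = PostgreSQLModule().build("movie", fields)
mongo_output = MongoDBModule().build("movie", fields)
print(postgres_output["schema"])
print(mongo_output["schema"])
```

Supporting another database then means adding one more subclass, while the shared workflow in `build` stays untouched.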


MongoDB modelling module


MongoDB Example (1/2)


MongoDB Example (2/2) + DEMO Video

Database schema configuration scripts

Deployment scripts

Configuration scripts


PostgreSQL modeller module


PostgreSQL Example

master installation script

standby installation script

configuration files


DATABASE MIGRATION SAMPLE APPLICATION


[VIDEO]


CONCLUSIONS


In short…

• Challenge:
  o Big data applications: how should we handle heterogeneous data?
• Juniper response:
  o Model driven solution for designing real-time big data systems
  o Component based solution to heterogeneity
    • General business objects + big data concepts modelling
    • Database specific concepts
      – Modelling
      – Model transformations
      – Code generation


… and Perspectives / Exploitation

• Source code and documentation available on our website
  o http://forge.modelio.org/projects/juniper
  o http://forge.modelio.org/projects/mongodb-modeler
  o http://forge.modelio.org/projects/postgresql-modeler

• Tutorial + Dissemination on our forum


Questions?

Marcos Almeida
SOFTEAM | ModelioSoft
{name.surname}@softeam.fr

SOFTEAM R&D Web Site: http://rd.softeam.com

Modelio Web Site: http://www.modelio.org
http://forge.modelio.org/projects/juniper

JUNIPER Web Site : http://www.juniper-project.org


*

*for your questions