IncQuery-D: Distributed Incremental Graph Queries

Budapest University of Technology and Economics, Department of Measurement and Information Systems. Distributed Incremental Graph Queries. Gábor Szárnyas, Dániel Varró. 2 February, 2015. 22nd Minisymposium of the Department of Measurement and Information Systems.

Transcript of IncQuery-D: Distributed Incremental Graph Queries

Page 1: IncQuery-D: Distributed Incremental Graph Queries

Budapest University of Technology and Economics, Department of Measurement and Information Systems

DISTRIBUTED INCREMENTAL GRAPH QUERIES

Gábor Szárnyas, Dániel Varró

2 February, 2015

22nd Minisymposium of the Department of Measurement and Information Systems

Page 2: IncQuery-D: Distributed Incremental Graph Queries

MOTIVATION

Page 3: IncQuery-D: Distributed Incremental Graph Queries

Performance issues

Agile Model-Driven Development

Modeling

Code generation

Testing

Early validations

Transformations

Scalability challenges

Page 4: IncQuery-D: Distributed Incremental Graph Queries

Model Sizes

Models = graphs with 100M–1B elements

o Car industry

o Avionics

o Software analysis

o Cyber-physical systems

Source: Markus Scheidgen, Automated and Transparent Model Fragmentation for Persisting Large Models, 2012

application          model size
software models      10^8
sensor data          10^9
geo-spatial models   10^12

Validation may take hours

Page 5: IncQuery-D: Distributed Incremental Graph Queries

MDE

Scalability

Incrementality

Incremental queries

Incremental transformation

Storing partial results

Tracking changes
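
The two ingredients on this slide, storing partial results and tracking changes, can be illustrated with a minimal Python sketch. The class `CachedQuery`, the function `batch_query`, and the tuple-based model are hypothetical names chosen for illustration; they are not part of EMF-INCQUERY.

```python
class CachedQuery:
    """Keeps the result set of a filter query up to date from change notifications."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.results = set()  # stored result set, reused between reads

    def on_insert(self, element):
        # Tracking changes: only the inserted element is examined.
        if self.predicate(element):
            self.results.add(element)

    def on_delete(self, element):
        # Removing an element can only shrink the result set.
        self.results.discard(element)


def batch_query(model, predicate):
    """Non-incremental baseline: rescans the whole model on every call."""
    return {element for element in model if predicate(element)}


def is_signal(element):
    return element[0] == "Signal"


model = set()
query = CachedQuery(is_signal)
for element in [("Signal", 1), ("Channel", 2), ("Signal", 3)]:
    model.add(element)
    query.on_insert(element)

# Edit the model and notify the query about the change.
model.discard(("Signal", 1))
query.on_delete(("Signal", 1))

# The cached result agrees with a full re-evaluation.
assert query.results == batch_query(model, is_signal)
```

The incremental variant does work proportional to the size of the change, while the batch variant does work proportional to the size of the model.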

Page 6: IncQuery-D: Distributed Incremental Graph Queries

Motivating Example

Pattern for an AUTOSAR validation constraint

Communication channel

Logical signal Mapping Physical signal

Invalid submodel

Validation

Valid submodel

Page 7: IncQuery-D: Distributed Incremental Graph Queries

Antijoin

Join

Join

Fill indexer nodes

Store interim results

Read result set

Edit model

Propagating changes

Read result set

Rete Algorithm

Communication channel

Logical signal Mapping Physical signal

Result set
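
The steps on this slide (fill the indexers, store interim results, propagate changes, read the result set) can be sketched as a small Rete-style network in Python. This is a reduced illustration with one join and one antijoin rather than the three nodes shown. The constraint it checks (a signal on a channel whose mapping has no physical signal) and all identifiers such as `sig1` and `map1` are assumptions made for the example, not the actual AUTOSAR pattern.

```python
from collections import defaultdict


class Node:
    """Base class: forwards (sign, key, value) updates to subscribed callbacks."""

    def __init__(self):
        self.subscribers = []

    def emit(self, sign, key, value):
        for callback in self.subscribers:
            callback(sign, key, value)


class Join(Node):
    """Stores both inputs indexed by key and emits matching (left, right) pairs."""

    def __init__(self):
        super().__init__()
        self.memory = {"left": defaultdict(set), "right": defaultdict(set)}

    def update(self, side, sign, key, value):
        # Assumes well-formed deltas: no duplicate inserts, no deletes of absent tuples.
        own = self.memory[side]
        other = self.memory["right" if side == "left" else "left"]
        if sign > 0:
            own[key].add(value)
        else:
            own[key].discard(value)
        for match in list(other[key]):
            pair = (value, match) if side == "left" else (match, value)
            self.emit(sign, key, pair)


class Antijoin(Node):
    """Emits left tuples that have no right tuple with the same key."""

    def __init__(self):
        super().__init__()
        self.left = defaultdict(set)
        self.right = defaultdict(set)

    def update_left(self, sign, key, value):
        if sign > 0:
            self.left[key].add(value)
        else:
            self.left[key].discard(value)
        if not self.right[key]:
            self.emit(sign, key, value)

    def update_right(self, sign, key, value):
        blocked_before = bool(self.right[key])
        if sign > 0:
            self.right[key].add(value)
        else:
            self.right[key].discard(value)
        blocked_after = bool(self.right[key])
        if blocked_before != blocked_after:
            # The key became blocked (retract matches) or unblocked (restore them).
            for left_value in list(self.left[key]):
                self.emit(-1 if blocked_after else +1, key, left_value)


join = Join()
antijoin = Antijoin()
results = set()  # production node: the query's result set

# Re-key the join output (signal -> (channel, mapping)) by mapping.
join.subscribers.append(
    lambda sign, signal, pair: antijoin.update_left(sign, pair[1], (pair[0], signal))
)


def production(sign, mapping, value):
    match = value + (mapping,)  # (channel, signal, mapping)
    if sign > 0:
        results.add(match)
    else:
        results.discard(match)


antijoin.subscribers.append(production)

# Fill the indexers: sig1 is on channel ch1 and has mapping map1.
join.update("left", +1, "sig1", "ch1")
join.update("right", +1, "sig1", "map1")
after_load = set(results)   # one match: map1 has no physical signal

# Edit the model: map1 now points to a physical signal.
antijoin.update_right(+1, "map1", "phys1")
after_fix = set(results)    # the match disappears

# Edit again: the physical signal is removed.
antijoin.update_right(-1, "map1", "phys1")
after_edit = set(results)   # the match reappears
```

Each edit touches only the tuples that share a key with the changed element; the interim results stored in the node memories are never recomputed from scratch.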

Page 8: IncQuery-D: Distributed Incremental Graph Queries

CURRENT STATE OF RESEARCH

Page 9: IncQuery-D: Distributed Incremental Graph Queries

EMF-INCQUERY

Rete-based incremental graph query engine

Open source Eclipse project

Typical use cases

o Validation

o Incremental model transformation

o Model synchronization

Page 10: IncQuery-D: Distributed Incremental Graph Queries

Single Workstation Limitations

Most tools only work for models with <1M elements due to resource exhaustion

Best tools: <10M model elements

JVM limitations: cannot handle 15+ GB heap memory efficiently

Proposed solution

o Horizontal scaling: distributed system
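
A back-of-the-envelope sketch of why a heap limit leads to horizontal scaling. The function `hosts_needed` and the figure of 500 bytes per model element are assumptions made for illustration only; the slides do not state a per-element memory footprint.

```python
def hosts_needed(element_count, bytes_per_element, heap_limit_bytes):
    """Smallest number of hosts whose combined heap holds the whole model."""
    total_bytes = element_count * bytes_per_element
    return -(-total_bytes // heap_limit_bytes)  # ceiling division on integers


GB = 10**9

# Hypothetical footprint: 500 bytes per element, 15 GB usable heap per JVM.
small_model = hosts_needed(1_000_000, 500, 15 * GB)
large_model = hosts_needed(100_000_000, 500, 15 * GB)
```

Under these assumed numbers a model with 1M elements fits on a single host, while a model with 100M elements needs 50 GB in total and therefore at least four hosts.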

Page 11: IncQuery-D: Distributed Incremental Graph Queries

Problem Statement

Scalability

Scalable storage

Scalable query engine

Distributed NoSQL databases

Distributed INCQUERY:

INCQUERY-D

Complex queries

Big models

Page 12: IncQuery-D: Distributed Incremental Graph Queries

Goals of INCQUERY-D

Objectives

o Distributed incremental pattern matching

o Adapting EMF-INCQUERY’s tooling to distributed DBs

o Executed over a cloud infrastructure (COTS hardware)

Achieve scalability by avoiding memory bottleneck

o Sharding separately

• Data

• Indexers

• Query network

o In memory

• Index + query
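
A minimal sketch of sharding one layer, here an in-memory indexer, across several hosts. Hash partitioning by key is an assumption made for this example (the slides do not name a partitioning strategy), and `shard_of` and `ShardedIndexer` are hypothetical names. The data layer and the query network would each get their own, independent partitioning.

```python
import zlib


def shard_of(key, shard_count):
    """Stable shard number for a key (crc32 is deterministic across processes)."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


class ShardedIndexer:
    """In-memory index whose entries are spread over several shards by key."""

    def __init__(self, shard_count):
        # One dict per shard; in a real deployment each would live on its own host.
        self.shards = [dict() for _ in range(shard_count)]

    def add(self, key, value):
        shard = self.shards[shard_of(key, len(self.shards))]
        shard.setdefault(key, set()).add(value)

    def lookup(self, key):
        shard = self.shards[shard_of(key, len(self.shards))]
        return shard.get(key, set())


indexer = ShardedIndexer(shard_count=3)
for signal, channel in [("sig1", "ch1"), ("sig2", "ch1"), ("sig3", "ch2")]:
    indexer.add(signal, channel)

# Every key is stored in exactly one shard, so no shard holds the full index.
keys_per_shard = [len(shard) for shard in indexer.shards]
```

Because a lookup only contacts the shard that owns the key, the memory needed per host shrinks as shards are added.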

Page 13: IncQuery-D: Distributed Incremental Graph Queries

RESEARCH QUESTIONS AND RESULTS

Page 14: IncQuery-D: Distributed Incremental Graph Queries

Architecture and Data Representation

Is it possible to build a query engine which works on various backends using different data representation formats?

Is it possible to serve multiple users concurrently?

Page 15: IncQuery-D: Distributed Incremental Graph Queries

INCQUERY-D Architecture

Server 1

Database shard 1

Server 2

Database shard 2

Server 3

Database shard 3

Transaction

In-memory EMF model

Database shard 0

Server 0

Rete net

Indexer layer

EMF-INCQUERY INCQUERY-D

Distributed query evaluation network

Distributed indexer

Model access adapter

Indexing

Indexer Indexer Indexer Indexer

Join

Join

Antijoin

In-memory storage

Distributed indexing, notification

Production network

• Stores intermediate query results

• Propagates changes

Distributed persistent storage

Distributed production network

• Each intermediate node can be allocated to a different host

• Remote internode communication
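
A minimal sketch of the last two points: each node of the network is allocated to a host, and nodes talk to each other only through messages. Hosts are simulated in-process with queues; the `Cluster` class, its methods, and the node names are hypothetical and do not reflect the messaging framework used by INCQUERY-D.

```python
from collections import deque


class Cluster:
    """Routes messages between named nodes according to a node-to-host allocation."""

    def __init__(self, allocation):
        self.allocation = dict(allocation)  # node name -> host name
        self.inboxes = {host: deque() for host in set(self.allocation.values())}
        self.handlers = {}                  # node name -> callable(payload)
        self.remote_messages = 0

    def register(self, node, handler):
        self.handlers[node] = handler

    def send(self, sender, target, payload):
        # A message is remote when sender and target live on different hosts.
        if self.allocation[sender] != self.allocation[target]:
            self.remote_messages += 1
        self.inboxes[self.allocation[target]].append((target, payload))

    def run(self):
        """Delivers messages until every inbox is empty."""
        progress = True
        while progress:
            progress = False
            for inbox in self.inboxes.values():
                while inbox:
                    target, payload = inbox.popleft()
                    self.handlers[target](payload)
                    progress = True


cluster = Cluster({"indexer": "server1", "join": "server2", "production": "server0"})
received = []

# The join simply forwards here; a real node would also update its memory.
cluster.register("join", lambda payload: cluster.send("join", "production", payload))
cluster.register("production", received.append)

cluster.send("indexer", "join", ("+", ("sig1", "ch1")))
cluster.run()
```

Since nodes address each other by name and the allocation table decides the host, moving a node to another host only requires changing its entry in the table.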

Page 16: IncQuery-D: Distributed Incremental Graph Queries

Scalable Incremental Query Evaluation

Is it possible to utilise an incremental query evaluation algorithm in a distributed system for high performance query evaluation?

How can we benchmark a distributed system in a reproducible manner?

Page 17: IncQuery-D: Distributed Incremental Graph Queries

Benchmark Results for Revalidation

Quick response time for models with 88M elements

Different characteristics

Page 18: IncQuery-D: Distributed Incremental Graph Queries

Dimensions of Scalability

Infrastructure

o Number of machines

o Available memory / CPU

o Network performance

o Number of concurrent users

Model

o Model size

o Model characteristics

Queries

o Number of queries

o Query complexity

Page 19: IncQuery-D: Distributed Incremental Graph Queries

Optimisation and Dynamic Reconfiguration

How can we scale and optimise such a system?

How can the system adapt to changes

o in the system?

o in the cloud environment?

How can we estimate the resources required by a certain setup?

Page 20: IncQuery-D: Distributed Incremental Graph Queries

Dynamic Resource Allocation

Server 1 Server 2 Server 3 Server 0

Indexer Indexer Indexer Indexer

Join

Join

Antijoin

10% 70% 60%

Δ

80% 90%

Join

25% 75%

Δ

Δ

Memory usage
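
One possible reading of the figure is that nodes are moved from heavily loaded servers to lightly loaded ones based on memory usage. The sketch below shows a simple greedy policy of that kind. The policy itself, the 80% threshold, the node footprints, and the 10 GB capacities are all assumptions for illustration; the slides do not specify the reallocation algorithm.

```python
def usage(allocation, node_memory, capacity):
    """Fraction of each host's memory used by the nodes allocated to it."""
    load = {host: 0.0 for host in capacity}
    for node, host in allocation.items():
        load[host] += node_memory[node]
    return {host: load[host] / capacity[host] for host in capacity}


def rebalance(allocation, node_memory, capacity, threshold=0.8):
    """Greedy policy: move nodes off the fullest host while it exceeds the threshold."""
    allocation = dict(allocation)
    moves = []
    for _ in range(len(allocation)):  # bound the number of moves
        current = usage(allocation, node_memory, capacity)
        hot = max(current, key=current.get)
        if current[hot] <= threshold:
            break
        cold = min(current, key=current.get)
        # Nodes on the hot host that fit on the cold host without overloading it.
        fitting = [
            node for node, host in allocation.items()
            if host == hot
            and current[cold] + node_memory[node] / capacity[cold] <= threshold
        ]
        if not fitting:
            break
        node = max(fitting, key=node_memory.get)
        allocation[node] = cold
        moves.append((node, hot, cold))
    return allocation, moves


# Hypothetical setup: four hosts with 10 GB each, node footprints in GB.
capacity = {"server0": 10, "server1": 10, "server2": 10, "server3": 10}
node_memory = {"indexer_a": 3, "join_1": 6, "join_2": 7, "antijoin": 6, "indexer_b": 1}
allocation = {
    "indexer_b": "server0",
    "indexer_a": "server1",
    "join_1": "server1",
    "join_2": "server2",
    "antijoin": "server3",
}

new_allocation, moves = rebalance(allocation, node_memory, capacity)
```

In this setup server1 starts at 90% usage, so the policy moves `join_1` to server0, after which no host exceeds the threshold. Moving the largest fitting node is only one choice; a policy that minimises the amount of state transferred would pick differently.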

Page 21: IncQuery-D: Distributed Incremental Graph Queries

Conclusion

MDE poses Big Data research questions

Horizontal scaling is one way to query large models

Theoretical challenges

o Distributed pattern matching algorithm

o Data representation

o Dynamic resource allocation

Practical challenges

o Integrating technologies: database, messaging framework, monitoring, user interface, etc.

o High performance query evaluation

Page 22: IncQuery-D: Distributed Incremental Graph Queries

Ω