IncQuery-D: Distributed Incremental Graph Queries

Budapest University of Technology and Economics, Department of Measurement and Information Systems

DISTRIBUTED INCREMENTAL GRAPH QUERIES

Gábor Szárnyas, Dániel Varró

2 February, 2015

22nd Minisymposium of the Department of Measurement and Information Systems

MOTIVATION

Performance issues

Agile Model-Driven Development

Figure: the agile cycle of modeling, code generation and testing, where early validations and transformations face scalability challenges

Model Sizes

Models = graphs with 100M–1B elements

o Car industry

o Avionics

o Software analysis

o Cyber-physical systems

Source: Markus Scheidgen, Automated and Transparent Model Fragmentation for Persisting Large Models, 2012

Application          Model size (elements)
Software models      10^8
Sensor data          10^9
Geo-spatial models   10^12

Validation may take hours

Scalability

Incrementality

Incremental queries

Incremental transformation

Storing partial results

Tracking changes

Motivating Example

Pattern for an AUTOSAR validation constraint

Figure: the pattern relates a communication channel, a logical signal, a mapping and a physical signal; validation separates an invalid submodel from a valid submodel, and the pattern contains an antijoin

Rete Algorithm

o Fill indexer nodes

o Store interim results

o Read result set

o Edit model

o Propagate changes

o Read result set

Figure: Rete network with input nodes for communication channel, logical signal, mapping and physical signal, feeding the result set
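The Rete steps above (fill indexer nodes, store interim results, propagate only the changes after a model edit, read the result set) can be sketched as a tiny in-process network. This is a minimal Python sketch, not EMF-INCQUERY's implementation. The three relations and the constraint (every mapped physical signal must be carried by some communication channel) are a hypothetical simplification of the AUTOSAR pattern, whose exact definition the slides do not give; all class and variable names are illustrative.

```python
from collections import defaultdict


def apply(bucket, sign, tup):
    """Add (sign=+1) or remove (sign=-1) a tuple in a node memory."""
    (bucket.add if sign > 0 else bucket.discard)(tup)


class Node:
    """Rete node that forwards (sign, tuple) deltas to its successors."""
    def __init__(self):
        self.successors = []  # list of (node, "left" or "right")

    def emit(self, sign, tup):
        for node, side in self.successors:
            node.receive(side, sign, tup)


class InputNode(Node):
    """Indexer node: caches the tuples of one model relation."""
    def __init__(self):
        super().__init__()
        self.tuples = set()

    def update(self, sign, tup):
        if (sign > 0) != (tup in self.tuples):  # ignore no-op edits
            apply(self.tuples, sign, tup)
            self.emit(sign, tup)


class JoinNode(Node):
    """Joins on a key; both memories store interim results indexed by key."""
    def __init__(self, left_key, right_key, combine):
        super().__init__()
        self.keys = {"left": left_key, "right": right_key}
        self.memory = {"left": defaultdict(set), "right": defaultdict(set)}
        self.combine = combine

    def receive(self, side, sign, tup):
        other = "right" if side == "left" else "left"
        key = self.keys[side](tup)
        apply(self.memory[side][key], sign, tup)
        for match in self.memory[other][key]:
            left, right = (tup, match) if side == "left" else (match, tup)
            self.emit(sign, self.combine(left, right))


class AntiJoinNode(Node):
    """Passes left tuples that have no matching right tuple (negation)."""
    def __init__(self, left_key, right_key):
        super().__init__()
        self.left_key, self.right_key = left_key, right_key
        self.left = defaultdict(set)
        self.right_count = defaultdict(int)

    def receive(self, side, sign, tup):
        if side == "left":
            key = self.left_key(tup)
            apply(self.left[key], sign, tup)
            if self.right_count[key] == 0:
                self.emit(sign, tup)
            return
        key = self.right_key(tup)
        before = self.right_count[key]
        self.right_count[key] = after = before + sign
        if (before == 0) != (after == 0):  # first match appeared or last vanished
            for t in self.left[key]:
                self.emit(-1 if after > 0 else +1, t)


class ProductionNode:
    """Terminal node holding the result set, readable at any time."""
    def __init__(self):
        self.results = set()

    def receive(self, side, sign, tup):
        apply(self.results, sign, tup)


# Hypothetical relations: source(mapping, logical), target(mapping, physical), carries(channel, physical).
source, target, carries = InputNode(), InputNode(), InputNode()
join = JoinNode(lambda t: t[0], lambda t: t[0], lambda l, r: (l[0], l[1], r[1]))
anti = AntiJoinNode(lambda t: t[2], lambda t: t[1])
prod = ProductionNode()
source.successors.append((join, "left"))
target.successors.append((join, "right"))
join.successors.append((anti, "left"))
carries.successors.append((anti, "right"))
anti.successors.append((prod, "left"))

# Fill indexer nodes; the join and antijoin memories keep interim results.
source.update(+1, ("m1", "ls1"))
target.update(+1, ("m1", "ps1"))
print(prod.results)  # {('m1', 'ls1', 'ps1')}: invalid submodel
# Edit the model: only this delta is propagated, then the result set is read again.
carries.update(+1, ("ch1", "ps1"))
print(prod.results)  # set(): the submodel is now valid
```

Keeping the join and antijoin memories is what makes the second read cheap: the edit touches only the antijoin bucket for "ps1" instead of re-evaluating the whole pattern.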

CURRENT STATE OF RESEARCH

EMF-INCQUERY

Rete-based incremental graph query engine

Open source Eclipse project

Typical use cases

o Validation

o Incremental model transformation

o Model synchronization

Single Workstation Limitations

Most tools work only for models with fewer than 1M elements due to resource exhaustion

The best tools handle fewer than 10M model elements

JVM limitations: heaps of 15+ GB cannot be handled efficiently

Proposed solution

o Horizontal scaling: distributed system

Problem Statement

Scalability

o Big models: scalable storage using distributed NoSQL databases

o Complex queries: scalable query engine using distributed INCQUERY, i.e. INCQUERY-D

Goals of INCQUERY-D

Objectives

o Distributed incremental pattern matching

o Adapting EMF-INCQUERY’s tooling to distributed DBs

o Executed over a cloud infrastructure (COTS hardware)

Achieve scalability by avoiding the memory bottleneck

o Sharding separately

• Data

• Indexers

• Query network

o In memory

• Index + query
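The goal of sharding the data, the indexers and the query network separately can be illustrated with a minimal sketch. The slides do not say how INCQUERY-D partitions the model; this sketch assumes hash partitioning of element IDs across database shards, plus an independent round-robin placement of indexers and Rete nodes on hosts. All function, host and node names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def shard_of(element_id: str, num_shards: int) -> int:
    """Stable hash partitioning of a model element ID onto a database shard.

    CRC32 is used because Python's built-in hash() of str is salted per
    process, so it would not give the same shard across runs or hosts.
    """
    return zlib.crc32(element_id.encode("utf-8")) % num_shards


def partition_data(element_ids: Iterable[str], num_shards: int) -> Dict[int, List[str]]:
    """Group element IDs by shard index."""
    shards: Dict[int, List[str]] = defaultdict(list)
    for eid in element_ids:
        shards[shard_of(eid, num_shards)].append(eid)
    return dict(shards)


def place_round_robin(names: List[str], hosts: List[str]) -> Dict[str, str]:
    """Place indexers or Rete nodes on hosts, independently of the data shards."""
    return {name: hosts[i % len(hosts)] for i, name in enumerate(names)}


servers = ["server0", "server1", "server2", "server3"]
data = partition_data(["ch1", "ls1", "m1", "ps1", "ps2"], len(servers))
indexers = place_round_robin(
    ["CommunicationChannel", "LogicalSignal", "Mapping", "PhysicalSignal"], servers)
rete = place_round_robin(["join", "antijoin", "production"], servers[1:3])
print(rete)  # {'join': 'server1', 'antijoin': 'server2', 'production': 'server1'}
```

Because the three placements are computed independently, each layer can be scaled out on its own, which is the point of sharding them separately.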

RESEARCH QUESTIONS AND RESULTS

Architecture and Data Representation

Is it possible to build a query engine which works on various backends using different data representation formats?

Is it possible to serve multiple users concurrently?

INCQUERY-D Architecture

Figure: INCQUERY-D architecture with Servers 0 to 3, each hosting one database shard (shards 0 to 3); transactions on the sharded database; an indexer layer of four indexers; a Rete net (e.g. an antijoin node) forming the distributed query evaluation network

EMF-INCQUERY layers and their INCQUERY-D counterparts:

o Production network: stores intermediate query results and propagates changes. In INCQUERY-D, a distributed production network: each intermediate node can be allocated to a different host, with remote internode communication

o Indexing: in INCQUERY-D, a distributed indexer with a model access adapter, providing distributed indexing and notification

o In-memory storage (the in-memory EMF model): in INCQUERY-D, distributed persistent storage
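A model access adapter decouples the indexers from the backend's data representation, which is what the question about various backends and formats asks about. A minimal sketch, assuming two hypothetical backends (a triple store and a property graph) and a hypothetical push-style notification interface; it is not the INCQUERY-D API, and all names are illustrative.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]                       # (source id, target id)
Listener = Callable[[int, str, Edge], None]  # (sign, label, edge)


class ModelAccessAdapter(ABC):
    """Hypothetical adapter hiding the backend's data representation."""

    def __init__(self) -> None:
        self.listeners: List[Listener] = []

    @abstractmethod
    def edges(self, label: str) -> Iterable[Edge]:
        """Return all edges with the given label (used for the initial fill)."""

    def subscribe(self, listener: Listener) -> None:
        self.listeners.append(listener)

    def notify(self, sign: int, label: str, edge: Edge) -> None:
        for listener in self.listeners:
            listener(sign, label, edge)


class TripleStoreAdapter(ModelAccessAdapter):
    """Backend storing (subject, predicate, object) triples."""

    def __init__(self) -> None:
        super().__init__()
        self.triples: Set[Tuple[str, str, str]] = set()

    def edges(self, label: str) -> Iterable[Edge]:
        return [(s, o) for (s, p, o) in self.triples if p == label]

    def add(self, s: str, p: str, o: str) -> None:
        if (s, p, o) not in self.triples:
            self.triples.add((s, p, o))
            self.notify(+1, p, (s, o))


class PropertyGraphAdapter(ModelAccessAdapter):
    """Backend storing labelled relationships keyed by relationship ID."""

    def __init__(self) -> None:
        super().__init__()
        self.rels: Dict[int, Tuple[str, str, str]] = {}  # id -> (label, src, dst)

    def edges(self, label: str) -> Iterable[Edge]:
        return [(src, dst) for (lbl, src, dst) in self.rels.values() if lbl == label]

    def add_relationship(self, rel_id: int, label: str, src: str, dst: str) -> None:
        self.rels[rel_id] = (label, src, dst)
        self.notify(+1, label, (src, dst))


class Indexer:
    """Caches the edges of one label and stays current through notifications."""

    def __init__(self, adapter: ModelAccessAdapter, label: str) -> None:
        self.label = label
        self.edges: Set[Edge] = set(adapter.edges(label))  # initial fill
        adapter.subscribe(self.on_change)

    def on_change(self, sign: int, label: str, edge: Edge) -> None:
        if label != self.label:
            return
        if sign > 0:
            self.edges.add(edge)
        else:
            self.edges.discard(edge)
```

The indexer only sees labelled edges and change notifications, so the same Rete network could sit on top of either backend.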

Scalable Incremental Query Evaluation

Is it possible to utilise an incremental query evaluation algorithm in a distributed system for high performance query evaluation?

How can we benchmark a distributed system in a reproducible manner?

Benchmark Results for Revalidation

Quick response time for models with 88M elements

Different characteristics
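The slides do not describe the benchmark setup, so the following is only a generic sketch of common reproducibility practices: seeded model generation, a seeded edit sequence, identical repeated runs and median timings per phase (validation, edit, revalidation). The model shape, the toy validation query and the 10% edit ratio are hypothetical, and single-process wall-clock timing does not capture network or cluster effects.

```python
import random
import statistics
import time
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (mapping, physical_signal), hypothetical model shape


def generate_model(seed: int, size: int) -> List[Edge]:
    """Synthetic model; the same seed always yields the same model."""
    rng = random.Random(seed)
    return [(f"m{i}", f"ps{rng.randrange(size)}") for i in range(size)]


def unused_signals(model: List[Edge], size: int) -> Set[str]:
    """Toy validation query: physical signals that no mapping targets."""
    return {f"ps{i}" for i in range(size)} - {p for _, p in model}


def apply_edits(model: List[Edge], rng: random.Random, count: int) -> None:
    """Retarget `count` randomly chosen mappings (hypothetical edit workload)."""
    for _ in range(count):
        i = rng.randrange(len(model))
        model[i] = (model[i][0], f"ps{rng.randrange(len(model))}")


def timed(fn: Callable[[], object]) -> float:
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def benchmark(seed: int, size: int, runs: int) -> Dict[str, float]:
    """Repeat an identical, seeded workload and report the median of each phase."""
    times: Dict[str, List[float]] = {"validation": [], "edit": [], "revalidation": []}
    for _ in range(runs):
        model = generate_model(seed, size)  # same model in every run
        rng = random.Random(seed + 1)       # same edit sequence in every run
        times["validation"].append(timed(lambda: unused_signals(model, size)))
        times["edit"].append(timed(lambda: apply_edits(model, rng, size // 10)))
        times["revalidation"].append(timed(lambda: unused_signals(model, size)))
    return {phase: statistics.median(ts) for phase, ts in times.items()}


print(benchmark(seed=42, size=10_000, runs=5))
```

Medians over identical runs damp out warm-up and scheduling noise; an incremental engine would replace the full re-run in the revalidation phase.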

Dimensions of Scalability

Infrastructure

o Number of machines

o Available memory / CPU

o Network performance

o Number of concurrent users

o Model size

o Model characteristics

Queries

o Number of queries

o Query complexity

Optimisation and Dynamic Reconfiguration

How can we scale and optimise such a system?

How can the system adapt to changes

o in the system?

o in the cloud environment?

How can we estimate the resources required by a certain setup?

Dynamic Resource Allocation

Figure: memory usage of Server 0 to Server 3, which host the four indexers and the antijoin node (per-server values between 10% and 90%)
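Dynamic resource allocation can be framed as placing Rete nodes on servers so that no server runs short of memory. A minimal sketch using a greedy worst-fit decreasing heuristic (largest node first, onto the server with the most free memory); this is a swapped-in textbook heuristic, not necessarily INCQUERY-D's method. The memory estimates, the 8 GB capacities, the 80% threshold and the current placement are all hypothetical.

```python
from typing import Dict, List


def allocate(node_mem: Dict[str, float], capacity: Dict[str, float]) -> Dict[str, str]:
    """Greedy worst-fit decreasing: place the largest node first, on the server
    with the most free memory; raise if some node fits nowhere."""
    free = dict(capacity)
    placement: Dict[str, str] = {}
    for node, mem in sorted(node_mem.items(), key=lambda kv: kv[1], reverse=True):
        server = max(free, key=lambda s: free[s])  # first such server on ties
        if free[server] < mem:
            raise ValueError(f"no server has {mem} GB free for {node}")
        placement[node] = server
        free[server] -= mem
    return placement


def usage(placement: Dict[str, str], node_mem: Dict[str, float],
          capacity: Dict[str, float]) -> Dict[str, float]:
    """Fraction of each server's memory used by the nodes placed on it."""
    used = {s: 0.0 for s in capacity}
    for node, server in placement.items():
        used[server] += node_mem[node]
    return {s: used[s] / capacity[s] for s in capacity}


def overloaded(use: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Servers above the (hypothetical) reallocation threshold."""
    return [s for s, u in use.items() if u > threshold]


# Hypothetical memory estimates (GB) for the nodes and 8 GB per server.
node_mem = {"indexer0": 3.0, "indexer1": 2.0, "indexer2": 2.0,
            "indexer3": 1.0, "antijoin": 4.0}
capacity = {f"server{i}": 8.0 for i in range(4)}

current = {"antijoin": "server0", "indexer0": "server0", "indexer1": "server1",
           "indexer2": "server2", "indexer3": "server3"}
print(overloaded(usage(current, node_mem, capacity)))  # ['server0']

new = allocate(node_mem, capacity)
print(max(usage(new, node_mem, capacity).values()))  # 0.5
```

Estimating each node's memory before deployment is the hard part in practice; the heuristic itself only turns such estimates into a placement.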

Conclusion

MDE poses Big Data research questions

Horizontal scaling is a way to query large models

Theoretical challenges

o Distributed pattern matching algorithm

o Data representation

o Dynamic resource allocation

Practical challenges

o Integrating technologies: database, messaging framework, monitoring, user interface, etc.

o High performance query evaluation
