Efficient and scalable graph view maintenance for deductive graph


  • Technische Berichte Nr. 99

    des Hasso-Plattner-Instituts für Softwaresystemtechnik an der Universität Potsdam

    Efficient and Scalable Graph View Maintenance for Deductive Graph Databases based on Generalized Discrimination Networks

    Thomas Beyhl, Holger Giese

    ISBN 978-3-86956-339-8

    ISSN 1613-5652


  • Technische Berichte des Hasso-Plattner-Instituts für Softwaresystemtechnik an der Universität Potsdam | 99

    Thomas Beyhl | Holger Giese

    Efficient and Scalable Graph View Maintenance for Deductive Graph Databases based on Generalized Discrimination Networks

    Universitätsverlag Potsdam

  • Bibliographic information of the German National Library (Deutsche Nationalbibliothek): The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de/.

    Universitätsverlag Potsdam 2015, http://verlag.ub.uni-potsdam.de/

    Am Neuen Palais 10, 14469 Potsdam. Tel.: +49 (0)331 977 2533 / Fax: 2292. E-mail: [email protected]

    The series Technische Berichte des Hasso-Plattner-Instituts für Softwaresystemtechnik an der Universität Potsdam is edited by the professors of the Hasso-Plattner-Institut für Softwaresystemtechnik an der Universität Potsdam.

    ISSN (print) 1613-5652

    ISSN (online) 2191-1665

    The manuscript is protected by copyright.

    Print: docupoint GmbH Magdeburg

    ISBN 978-3-86956-339-8

    Simultaneously published online on the publication server of the Universität Potsdam: URN urn:nbn:de:kobv:517-opus4-79535, http://nbn-resolving.de/urn:nbn:de:kobv:517-opus4-79535


  • Graph databases provide a natural way of storing and querying graph data. In contrast to relational databases, queries over graph databases can refer directly to the graph structure of such graph data. For example, graph pattern matching can be employed to formulate queries over graph data.

    However, as with relational databases, running complex queries can be very time-consuming and ruin the interactivity with the database. One possible approach to deal with this performance issue is to employ database views that consist of pre-computed answers to common and frequently stated queries. But to ensure that database views yield query results that are consistent with the data from which they are derived, these database views must be updated before queries make use of them. Such maintenance of database views must be performed efficiently; otherwise, the effort to create and maintain views may not pay off in comparison to processing the queries directly on the data from which the database views are derived.

    At the time of writing, graph databases do not support database views and are limited to graph indexes that index nodes and edges of the graph data for fast query evaluation, but do not make it possible to maintain pre-computed answers of complex queries over graph data. Moreover, the maintenance of database views in graph databases becomes even more challenging when negation and recursion have to be supported, as in deductive relational databases.

    In this technical report, we present an approach for efficient and scalable incremental graph view maintenance for deductive graph databases. The main concept of our approach is a generalized discrimination network that makes it possible to model nested graph conditions, including negative application conditions and recursion, which specify the content of graph views derived from graph data stored by graph databases. The discrimination network makes it possible to automatically derive generic maintenance rules using graph transformations for maintaining graph views when the graph data from which the graph views are derived changes. We evaluate our approach in a case study using multiple data sets derived from open source projects.


  • Contents

    1. Introduction
    1.1. State of the Art
    1.2. Prerequisites
    1.3. Running Example
    1.4. Outline

    2. Needs and Requirements
    2.1. Needs
    2.2. Requirements

    3. Overview
    3.1. Graph Database
    3.2. View Definition
    3.3. View Maintenance Engine
    3.4. Query Engine

    4. View Definition Approach
    4.1. View Reference Graph
    4.2. View Modules
    4.3. View Models
    4.4. View Query Language
    4.5. View Graphs
    4.6. Mapping Nested Conditions to View Models
    4.7. Discussion

    5. Efficient and Scalable View Graph Maintenance
    5.1. Traversing View Models
    5.2. Naive Batch Maintenance
    5.3. Batch Maintenance with Preservation
    5.4. Incremental Black Box Maintenance
    5.5. Incremental White Box Maintenance
    5.6. Discussion


    6. Evaluation
    6.1. Evaluation Setup
    6.2. Evaluation Results
    6.3. Evaluation Discussion
    6.4. Threats to Validity

    7. Related Work
    7.1. Discrimination Networks
    7.2. Database View Maintenance
    7.3. Graph Indexing
    7.4. Graph Querying
    7.5. Incremental Model-Driven Engineering
    7.6. Summary

    8. Conclusion and Future Work

    References

    A. Metamodel

    B. View Graph Maintenance Algorithms
    B.1. Naive Batch Maintenance
    B.2. Batch Maintenance with Preservation
    B.3. Incremental Maintenance

    C. View Modules for Design Pattern Recovery


  • 1. Introduction

    Nowadays, graph data is ubiquitous, and browsing graph data is an elementary task when working with it. For example, users in social networks and their relationships constitute a graph, and a query that answers what caused the friendship between two befriended users is an interesting and also complex query. Another example is the domain of software engineering, where abstract syntax graphs of source code and models are queried for, e.g., employed software design patterns as defined by Gamma et al. [23] to investigate software architectures, or to recommend refactorings as proposed by Fowler [22] to improve the source code. Queries between graph data with different schemas are also stated in practice, for example, searching for chains of traceability links between graphs that represent requirement documents (e.g., SysML requirement models [65]), abstract syntax graphs of models (e.g., UML class models [64]), and source code (e.g., Java source code).

    In practice, graph databases provide a natural way of storing and querying graph data. One advantage is that queries that process graph data can refer directly to the graph structure. For example, graph pattern matching can be employed to formulate queries over graph data. However, graph pattern matching can be very time-consuming when the graph data grows to a large number of nodes and edges. For example, subgraph isomorphism testing used for graph pattern matching is known to be NP-complete [18].
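
    To make the cost of graph pattern matching more concrete, the following is a minimal sketch of a naive backtracking matcher. It is an illustration only and not the matching technique used in this report; the function name `match_pattern`, the dictionary-based graph encoding, and the example data are assumptions introduced here. The sketch enumerates injective, label-preserving assignments of pattern nodes to data nodes and keeps those that map every pattern edge onto a data edge, so in the worst case it tries a number of assignments that grows exponentially with the number of pattern nodes.

```python
from typing import Dict, Iterator, List, Set, Tuple

Edge = Tuple[str, str]  # directed edge (source node id, target node id)


def match_pattern(
    pattern_nodes: Dict[str, str],
    pattern_edges: Set[Edge],
    data_nodes: Dict[str, str],
    data_edges: Set[Edge],
) -> Iterator[Dict[str, str]]:
    """Yield every injective mapping from pattern nodes to data nodes that
    preserves node labels and maps each pattern edge onto a data edge.

    Hypothetical encoding: nodes are dicts from node id to label,
    edges are sets of (source, target) pairs.
    """
    order: List[str] = sorted(pattern_nodes)  # fixed binding order

    def consistent(assignment: Dict[str, str]) -> bool:
        # Check only pattern edges whose endpoints are both already bound.
        return all(
            (assignment[s], assignment[t]) in data_edges
            for s, t in pattern_edges
            if s in assignment and t in assignment
        )

    def extend(index: int, assignment: Dict[str, str]) -> Iterator[Dict[str, str]]:
        if index == len(order):
            yield dict(assignment)  # complete match, hand out a copy
            return
        p = order[index]
        used = set(assignment.values())  # enforce injectivity
        for d, label in data_nodes.items():
            if label != pattern_nodes[p] or d in used:
                continue
            assignment[p] = d
            if consistent(assignment):
                yield from extend(index + 1, assignment)
            del assignment[p]  # backtrack

    yield from extend(0, {})


# Example data (assumed): who has a friend in common with whom?
data_nodes = {"alice": "Person", "bob": "Person", "carol": "Person", "hpi": "Org"}
data_edges = {("alice", "carol"), ("bob", "carol"), ("alice", "hpi")}

# Pattern: two distinct persons a and b that both point to the same person c.
pattern_nodes = {"a": "Person", "b": "Person", "c": "Person"}
pattern_edges = {("a", "c"), ("b", "c")}

matches = list(match_pattern(pattern_nodes, pattern_edges, data_nodes, data_edges))
```

    In this example, `matches` contains the two symmetric assignments in which `c` is bound to `carol`. Every query evaluation repeats the whole search, which is the kind of from-scratch effort that pre-computed and maintained answers aim to avoid.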

    Furthermore, as with relational databases, always running complex queries from scratch, even though only a few nodes and edges of the data graphs in the graph database have changed, can be very inefficient. One possible approach to deal with this performance issue is to employ database views that consist of pre-computed answers to common and frequently stated queries. According to Gupta et al. [32], "a view [...] defines a function from a set of base tables to a derived table" for relational databases. But to ensure that database views yield query results that are consistent with the data from which they are derived, database views must be updated before queries make use of them. Gupta et al. [32] refer to "the process of updating [...] views in response to changes to the underlying data [as] view maintenance". Such maintenance of views must be performed efficiently, otherwise the effort to create and maintain such database views may