ECMFA 2016 slides

Stress-Testing Centralised Model Stores. Antonio García-Domínguez, Dimitris Kolovos, Konstantinos Barmpis, Ran Wei and Richard Paige (University of York, Aston University). ECMFA’16, July 6th, 2016.

Transcript of ECMFA 2016 slides

Page 1: ECMFA 2016 slides

Intro Experiment setup Results

Stress-Testing Centralised Model Stores

Antonio García-Domínguez, Dimitris Kolovos, Konstantinos Barmpis, Ran Wei and Richard Paige

University of York, Aston University

ECMFA’16
July 6th, 2016

A. García-Domínguez et al. Stress-Testing Centralised Model Stores 1 / 21

Page 2: ECMFA 2016 slides


Approaches for collaborative modelling

Use file-based models over standard VCS
- Simple to use, reuses mature VCS (SVN/Git)
- Large models can be broken up into fragments
- Loss of big picture (no simple way to do model-wide queries)

Use specialized model repositories (e.g. CDO)
- Harder to use, proprietary versioning, less widely adopted
- Models are directly stored in a database
- Queries are answered from the database

Hawk: solving limitations with file-based VCS
- Mirrors and reconnects fragments into a graph DB
- Queries are fast, versioning and storage are orthogonal

Page 3: ECMFA 2016 slides

Simplified workflow of Hawk

Workflow implemented by Hawk
- Hawk uses a monitor to watch over collections of model files: local folders, SVN/Git repos, Eclipse workspaces...
- If files are changed, the graph is updated to mirror their contents
- The graph DB can then be queried through local/remote APIs
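The watch-then-mirror loop described above can be sketched in a few lines. This is only a toy illustration: `ToyMonitor`, its `sync` method and the dictionary used as a "graph" are hypothetical names invented for the example, not part of Hawk's API, and change detection by content hash is an assumption.

```python
import hashlib


class ToyMonitor:
    """Hypothetical stand-in for a Hawk-style monitor (not Hawk's real API)."""

    def __init__(self):
        self.seen = {}   # path -> content digest at the last sync
        self.graph = {}  # path -> mirrored contents (stand-in for graph nodes)

    def sync(self, files):
        """Mirror `files` (path -> text) and return the sorted paths that changed."""
        changed = []
        for path, text in files.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if self.seen.get(path) != digest:
                # New or modified file: refresh its mirrored contents.
                self.seen[path] = digest
                self.graph[path] = text.split()
                changed.append(path)
        for path in set(self.seen) - set(files):
            # File disappeared from the watched location: drop its mirror.
            del self.seen[path]
            del self.graph[path]
            changed.append(path)
        return sorted(changed)
```

Calling `sync` twice with the same contents reports no changes the second time, so queries against `graph` only pay for re-indexing when a file actually differs.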

Page 4: ECMFA 2016 slides

Structure of a Hawk index
Metamodel and model on the left side produce graph on the right side

- Node types: metamodels, types, instances and files
- Two lookup tables for metamodels and files
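A rough data-structure sketch of the four node kinds and the two lookup tables follows. The class names, edge labels (`types`, `instances`, `ofType`, `file`, `contents`) and method names are assumptions made for illustration; the slide does not describe Hawk's actual graph schema at this level.

```python
class Node:
    """A graph node of kind 'metamodel', 'type', 'instance' or 'file'."""

    def __init__(self, kind, name):
        self.kind = kind
        self.name = name
        self.edges = {}  # edge label -> list of target nodes


class ToyIndex:
    """Hypothetical index with the two lookup tables named on the slide."""

    def __init__(self):
        self.metamodels = {}  # lookup table: metamodel URI -> metamodel node
        self.files = {}       # lookup table: file path -> file node

    def add_type(self, uri, type_name):
        # Create the metamodel node on first use, then hang the type off it.
        mm = self.metamodels.setdefault(uri, Node("metamodel", uri))
        type_node = Node("type", type_name)
        mm.edges.setdefault("types", []).append(type_node)
        return type_node

    def add_instance(self, path, type_node, name):
        # Each instance links to its type and to the file it was read from.
        file_node = self.files.setdefault(path, Node("file", path))
        inst = Node("instance", name)
        inst.edges["ofType"] = [type_node]
        inst.edges["file"] = [file_node]
        type_node.edges.setdefault("instances", []).append(inst)
        file_node.edges.setdefault("contents", []).append(inst)
        return inst
```

With this layout, "all instances of a type" is one hop from the type node, and "everything in a file" is one hop from the file node.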

Page 5: ECMFA 2016 slides

Additional features in Hawk

Indexed attributes
- Common scenario: find an Author by name
- Users can tell Hawk to index a type by an attribute
- EOL queries will reuse index transparently, e.g. “Author.all.select(x | x.name = 'Value')”

Derived features
- Another scenario: find Authors with 10+ books
- Hawk can be told to precompute this and prepare a lookup
- EOL queries written with the new feature will be sped up, e.g. “Author.all.select(x | x.nBooks >= 10)”
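To make the difference between scanning and looking up concrete, here is a small sketch of both ideas over an in-memory list. The sample authors, the `by_name` and `by_n_books` structures and the function names are all hypothetical; Hawk keeps its indices in the graph database, not in Python dictionaries.

```python
import bisect
from collections import defaultdict

# Hypothetical sample data standing in for Author model elements.
authors = [
    {"name": "Ann", "books": ["b1", "b2"]},
    {"name": "Bob", "books": ["b%d" % i for i in range(12)]},
    {"name": "Cy", "books": ["c%d" % i for i in range(10)]},
]

# Indexed attribute: name -> positions of the authors with that name.
by_name = defaultdict(list)
for pos, author in enumerate(authors):
    by_name[author["name"]].append(pos)

# Derived feature: nBooks is computed once at indexing time and kept sorted.
by_n_books = sorted((len(a["books"]), pos) for pos, a in enumerate(authors))


def authors_named(name):
    """Index lookup instead of scanning every author."""
    return [authors[pos] for pos in by_name.get(name, [])]


def authors_with_at_least(n):
    """Binary search over the precomputed nBooks values."""
    start = bisect.bisect_left(by_n_books, (n, -1))
    return sorted(authors[pos]["name"] for _, pos in by_n_books[start:])
```

The cost of computing `nBooks` moves from query time to indexing time, which pays off when the same query is run repeatedly.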

Page 6: ECMFA 2016 slides

Model repositories: Eclipse CDO

Pluggable storage
- CDO can support multiple storage solutions
- DB store is the most mature (embedded H2 by default)
- Other stores include MongoDB, db4o or Objectivity

Caching and querying
- CDO provides an EMF Resource implementation
- Resource provides comprehensive generic caching
- Remote queries are supported (OCL)

Page 7: ECMFA 2016 slides

Comparing remote query APIs in CDO and Hawk

Hawk
- Based on Apache Thrift (JSON / binary formats) + gzip
- Stateless service-oriented API (e.g. “query”, “addRepository”)
- Client → server: request-response
- Server → client: subscribe-publish
- Supports HTTP(S) and TCP

CDO
- Based on Eclipse Net4j (binary)
- Stateful buffer-oriented API (opaque sequences of bytes)
- Bidirectional communication between client and server:
  - TCP: persistent connection
  - HTTP(S): client polls server

Page 8: ECMFA 2016 slides

Research questions

Observations about CDO and Hawk
- Both represent a model as a database
- Both have remote model querying APIs
- Each system has made different API design choices
- How do those choices impact query throughput?

Questions
- RQ1: impact of HTTP vs TCP?
- RQ2: impact of API design?
- RQ3: impact of caching and indexed/derived attributes?

Page 9: ECMFA 2016 slides

Experiment setup: systems used

Observations
- CDO and Hawk used same hardware, same version of Eclipse (Mars), same HTTP server (Jetty) and memory (4GiB)
- Only one of CDO or Hawk ran at a time
- Controller manages clients and collects results through SSH

Page 10: ECMFA 2016 slides

Experiment setup: workload

Model used: set4 from GraBaTs 2009
- Reverse engineered from Eclipse JDT source code
- Contains 4.9M elements:
  - 677MB XMI file
  - 1.4GB in CDO (H2 database)
  - 1.9GB in Hawk (Neo4j graph)

Workload configurations
- Servers are “warmed up” to a steady state first
- Lightest workload: 1 machine runs 1000 queries over 1 thread
- Rest: 2 machines, each runs 500 queries over 2–32 threads

Measurements
- Time to connect + query + retrieve element IDs
- Refer to paper for notched box plots and statistical tests
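A minimal sketch of such a multi-threaded load generator is given below, under stated assumptions: `run_workload` and `fake_query` are hypothetical names, the "query" is just a short sleep, and the timing covers only the callable rather than the full connect + query + retrieve cycle that the experiment measured.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_workload(query, total_queries, threads):
    """Run `total_queries` calls to `query` over `threads` workers.

    Returns the per-call wall-clock times in milliseconds.
    """
    def timed_call(_):
        start = time.perf_counter()
        query()
        return (time.perf_counter() - start) * 1000.0

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(timed_call, range(total_queries)))


def fake_query():
    # Placeholder for "connect, run the query, fetch element IDs".
    time.sleep(0.001)


samples = run_workload(fake_query, total_queries=20, threads=4)
median_ms = statistics.median(samples)
```

Reporting the median rather than the mean keeps a few slow outliers from dominating the summary, which matches the "median response time" axis used in the result plots.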

Page 11: ECMFA 2016 slides

Queries: OCL

Listing 1: OQ: GraBaTs query in OCL for evaluating CDO

    DOM::TypeDeclaration.allInstances()->select(td |
      td.bodyDeclarations->selectByKind(DOM::MethodDeclaration)
        ->exists(md : DOM::MethodDeclaration |
          md.modifiers
            ->selectByKind(DOM::Modifier)
            ->exists(mod : DOM::Modifier | mod.public)
          and md.modifiers
            ->selectByKind(DOM::Modifier)
            ->exists(mod : DOM::Modifier | mod.static)
          and md.returnType.oclIsTypeOf(DOM::SimpleType)
          and md.returnType.oclAsType(DOM::SimpleType).name.fullyQualifiedName
            = td.name.fullyQualifiedName))

Summary

Finds all possible singletons (returned from a static and public method within the same type).

Page 12: ECMFA 2016 slides

Queries: basic EOL

Listing 2: HQ1: translation of OQ to EOL for evaluating Hawk

    return TypeDeclaration.all.select(td|
      td.bodyDeclarations.exists(md:MethodDeclaration|
        md.returnType.isTypeOf(SimpleType)
        and md.returnType.name.fullyQualifiedName == td.name.fullyQualifiedName
        and md.modifiers.exists(mod:Modifier|mod.public==true)
        and md.modifiers.exists(mod:Modifier|mod.static==true)));

Summary
Direct translation of the OCL query.

Page 13: ECMFA 2016 slides

Queries: EOL + extended MethodDeclarations

Listing 3: HQ2: HQ1 using derived attributes on MethodDeclaration

    return MethodDeclaration.all.select(md |
      md.isPublic and md.isStatic and md.isSameReturnType
    ).collect( td | td.eContainer ).asSet;

Better approach
- Tell Hawk to extend MethodDeclaration with “isPublic”, “isStatic” and “isSameReturnType”
- Perform lookup for the relevant MethodDeclarations
- Retrieve the set of TypeDeclarations that contain them
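The lookup-then-collect shape of HQ2 can be mimicked with plain sets. The element IDs, the three precomputed sets and the `container_of` map below are made-up illustration data, and representing each derived attribute as a set of matching IDs is an assumption about how such lookups might be organised, not a description of Hawk's internals.

```python
# Hypothetical lookups: IDs of MethodDeclarations where each derived
# attribute is true (in Hawk these would come from the index).
is_public = {1, 2, 3, 5}
is_static = {2, 3, 4}
is_same_return_type = {2, 3, 5}

# Hypothetical containment: MethodDeclaration ID -> owning TypeDeclaration.
container_of = {1: "A", 2: "B", 3: "B", 4: "C", 5: "D"}

# select(...): intersect the three lookups instead of iterating over all
# methods and evaluating the conditions one by one.
matching_methods = is_public & is_static & is_same_return_type

# collect(... eContainer).asSet: map to containers and drop duplicates.
singleton_types = {container_of[m] for m in matching_methods}
```

Here methods 2 and 3 satisfy all three conditions and both live in type "B", so the final set has a single entry.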

Page 14: ECMFA 2016 slides

Queries: EOL + extended TypeDeclarations

Listing 4: HQ3: HQ1 using derived attributes on TypeDeclaration

    return TypeDeclaration.all.select(td|td.isSingleton);

Even better approach
- Tell Hawk to extend TypeDeclaration with “isSingleton”
- Perform lookup for the relevant TypeDeclarations directly

Page 15: ECMFA 2016 slides

RQ1: protocol impact (CDO)
HTTP degrades CDO noticeably

[Figure, two plots against client threads (1 to 64): median response time in ms (axis scaled by 10^4) for TCP vs HTTP, and number of failed queries for CDO+HTTP.]

HTTP woes
- 635.66% hit for 1 client, still noticeable for 2 and 4
- Slight chance of errors or incorrect results for 4+ threads
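The slide does not define "hit". One plausible reading, assumed here, is the relative increase in median response time of HTTP over TCP. Under that assumption the arithmetic is as follows; the function names are hypothetical and the 100 ms / 120 ms inputs in the usage note are illustrative values, not measurements from the experiment.

```python
def relative_hit_percent(tcp_ms, http_ms):
    """Relative slowdown of HTTP over TCP, in percent (assumed definition)."""
    return (http_ms - tcp_ms) * 100.0 / tcp_ms


def slowdown_factor(hit_percent):
    """How many times longer HTTP takes than TCP for a given hit."""
    return 1.0 + hit_percent / 100.0
```

For instance, `relative_hit_percent(100, 120)` gives a 20% hit. Read this way, a 635.66% hit corresponds to HTTP taking roughly 7.4 times as long as TCP, while a 20% hit corresponds to a factor of 1.2.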

Page 16: ECMFA 2016 slides

RQ1: protocol impact (Hawk)
HTTP hit is consistent for Hawk

[Figure: median response time in ms (axis scaled by 10^4) against client threads (1 to 64), TCP vs HTTP.]

- Hawk+HTTP has a roughly consistent 20% performance hit
- No failed queries and no incorrect query results

Page 17: ECMFA 2016 slides

RQ2: API design impact
Packet traces with Wireshark explain HTTP results

CDO trace: 58 packets (10.2kB)
- Session setup → query setup → 6s of silence → results
- Conclusion: CDO+HTTP uses regular polling for server-client communication, and CDO reports results asynchronously
- Introduces delay, breaks down for many clients
- Suggestion: long polling / WebSockets instead?

Hawk trace: 14 packets (2.8kB)
- Single request/response pair (no session/query setup)
- Simple and reliable for small result sets
- May have problems transmitting large result sets
- Suggestion: optional async query API (pub-sub)
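Why fixed-interval polling adds delay can be shown with a toy timing model. It assumes the client polls at `interval_ms`, `2 * interval_ms`, and so on, and that a pushed result (long polling, WebSockets) arrives as soon as it is ready. The function names and the 5000 ms interval in the usage note are hypothetical; the slides do not state CDO's actual polling interval.

```python
def polled_delivery_ms(ready_ms, interval_ms):
    """Time at which a fixed-interval poller first sees a result.

    The result becomes available on the server at `ready_ms`; the client
    only notices it at the next poll tick.
    """
    polls_needed = max(1, -(-ready_ms // interval_ms))  # ceiling division
    return polls_needed * interval_ms


def pushed_delivery_ms(ready_ms):
    """With server push, delivery time equals readiness time (idealised)."""
    return ready_ms
```

With a 5000 ms interval, a result that is ready after 100 ms is seen at 5000 ms, and one that is ready at 5001 ms is seen only at 10000 ms. The model ignores network latency and server load, so it shows the added delay but not the breakdown under many clients.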

Page 18: ECMFA 2016 slides

RQ3: impact of internals

[Figure: median response time in ms (log scale, 10^2 to 10^4) against client threads (1 to 64) for four configurations: CDO + OCL; Hawk + EOL, basic; Hawk + EOL, isPublic; Hawk + EOL, isSingleton.]

- CDO has more extensive generic caching than Hawk: e.g. SQL log shows it caches “X.all” in memory (Hawk uses DB cache)
- Hawk outperforms CDO by 10x–100x with derived attributes (replaces iteration with lookups + set intersections)

Page 19: ECMFA 2016 slides

What would be my ideal API?

Service-oriented, sync+async sides
- Service orientation makes third party integration easier
- Synchronous req/resp: simple operations, small queries
- Asynchronous pub/sub: complex operations, large queries
- Sync API can set up async operations
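One way to picture "sync API can set up async operations" is a service whose synchronous call hands back a channel on which the result is later published. The sketch below is purely illustrative: `ToyQueryService`, `query` and `query_async` are hypothetical names, and a thread plus a queue stands in for a real pub/sub transport.

```python
import queue
import threading


class ToyQueryService:
    """Hypothetical service with a sync side and an async side."""

    def __init__(self, evaluate):
        self.evaluate = evaluate  # callable that actually runs a query

    def query(self, text):
        """Synchronous request/response: fine for small, quick queries."""
        return self.evaluate(text)

    def query_async(self, text):
        """Synchronous call that only sets up the asynchronous operation.

        Returns immediately with a queue on which the result is published.
        """
        results = queue.Queue()

        def work():
            results.put(self.evaluate(text))

        threading.Thread(target=work).start()
        return results
```

A client would call `query` for short lookups and `query_async` for long-running ones, then block on (or periodically check) the returned queue.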

Flexible encoding with transparent compression
- Provide multiple encodings through code generation
- Transparent gzip compression is easy to integrate
- Note: HTTP fields didn’t add that much overhead (20%)
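As a small illustration of how little code transparent compression needs, the following sketch gzips a JSON-encoded list of element IDs and restores it. The ID format and the JSON encoding are assumptions chosen for the example; Hawk's Thrift encodings are not reproduced here.

```python
import gzip
import json

# Hypothetical query result: a list of element IDs.
ids = ["element-%06d" % i for i in range(1000)]

raw = json.dumps(ids).encode("utf-8")  # encoded payload
packed = gzip.compress(raw)            # what would travel over the wire

# The receiving side reverses both steps.
restored = json.loads(gzip.decompress(packed).decode("utf-8"))
```

Because the compression step wraps the encoded bytes, it can be switched on or off without touching the encoding or the service interface.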

Internals for faster queries
- Uncommon queries: extensive caching (as in CDO)
- Common queries: query-specific indices (as in Hawk)

Page 20: ECMFA 2016 slides

Conclusions and future work

Summary
- In collaborative modelling, many users will query the same models repeatedly to arrive at shared answers
- CDO and Hawk implement remote querying very differently
- From our results, we have suggested what an ideal remote query API would be like

Future work
- Wider assortment of queries (e.g. ones that exercise larger portions of the models or produce large result sets)
- Extend the range of configurations (tools, stores)
- Analysing remote queries to offload tasks to client

Page 21: ECMFA 2016 slides

End of the presentation

Questions?

@antoniogado
