ECMFA 2016 slides


Stress-Testing Centralised Model Stores

Antonio García-Domínguez, Dimitris Kolovos, Konstantinos Barmpis, Ran Wei and Richard Paige

University of York, Aston University

ECMFA’16, July 6th, 2016

A. García-Domínguez et al. Stress-Testing Centralised Model Stores 1 / 21


Approaches for collaborative modelling

Use file-based models over standard VCS
Simple to use, reuses mature VCS (SVN/Git)
Large models can be broken up into fragments
Loss of big picture (no simple way to do model-wide queries)

Use specialized model repositories (e.g. CDO)

Harder to use, proprietary versioning, less widely adopted
Models are directly stored in a database
Queries are answered from the database

Hawk: solving limitations with file-based VCS
Mirrors and reconnects fragments into a graph DB
Queries are fast, versioning and storage are orthogonal



Simplified workflow of Hawk

Workflow implemented by Hawk
Hawk uses a monitor to watch over collections of model files: local folders, SVN/Git repos, Eclipse workspaces...
If files are changed, the graph is updated to mirror their contents
The graph DB can then be queried through local/remote APIs
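To make the monitor-and-mirror loop concrete, here is a minimal Python sketch under loud assumptions: it watches a single local folder, detects changes by content hash, and "mirrors" a file by storing its text in a dict that stands in for the graph. FolderMirror and sync are hypothetical names, not Hawk's API.

```python
import hashlib
import os


class FolderMirror:
    """Hypothetical, simplified stand-in for Hawk's monitor + updater.

    The "graph" here is just a dict from file path to file contents; a real
    indexer would parse each model file and update the matching subgraph.
    """

    def __init__(self):
        self.graph = {}     # file path -> mirrored contents
        self._digests = {}  # file path -> SHA-256 of the last mirrored version

    def sync(self, folder):
        """One monitoring pass: mirror new or changed files, forget deleted
        ones, and return the sorted list of paths that were (re)mirrored."""
        current = {}
        for name in os.listdir(folder):
            path = os.path.join(folder, name)
            if os.path.isfile(path):
                with open(path, "rb") as f:
                    current[path] = f.read()
        changed = []
        for path, data in current.items():
            digest = hashlib.sha256(data).hexdigest()
            if self._digests.get(path) != digest:
                self._digests[path] = digest
                self.graph[path] = data.decode("utf-8")
                changed.append(path)
        # Files that disappeared since the last pass are dropped from the mirror.
        for path in set(self._digests) - set(current):
            del self._digests[path]
            del self.graph[path]
        return sorted(changed)
```

Calling sync repeatedly (on a timer, or after a VCS update) re-mirrors only the files whose contents changed, which is what keeps the mirror in step with the fragments without rebuilding it from scratch.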



Structure of a Hawk index
Metamodel and model on the left side produce graph on the right side

Node types: metamodels, types, instances and files
Two lookup tables for metamodels and files
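The index layout can be sketched with plain Python objects. This is an illustrative assumption, not Hawk's Neo4j schema: the edge labels (types, instances, ofType, file, contents) and the classes Node and IndexSketch are made up for the example; only the four node kinds and the two lookup tables come from the slide.

```python
class Node:
    """Graph node; kind is 'metamodel', 'type', 'instance' or 'file'."""

    def __init__(self, kind, **props):
        self.kind = kind
        self.props = props
        self.edges = {}  # edge label -> list of target nodes

    def link(self, label, target):
        self.edges.setdefault(label, []).append(target)


class IndexSketch:
    def __init__(self):
        # The two lookup tables from the slide (keys chosen for this sketch).
        self.metamodels = {}  # metamodel namespace URI -> metamodel node
        self.files = {}       # model file path -> file node

    def add_type(self, mm_uri, name):
        if mm_uri not in self.metamodels:
            self.metamodels[mm_uri] = Node("metamodel", uri=mm_uri)
        t = Node("type", name=name)
        self.metamodels[mm_uri].link("types", t)
        return t

    def add_instance(self, path, type_node, **attrs):
        if path not in self.files:
            self.files[path] = Node("file", path=path)
        obj = Node("instance", **attrs)
        obj.link("ofType", type_node)
        obj.link("file", self.files[path])
        type_node.link("instances", obj)
        self.files[path].link("contents", obj)
        return obj

    def all_instances(self, mm_uri, name):
        """'Type.all' as a walk: metamodel table -> type node -> instances."""
        for t in self.metamodels[mm_uri].edges.get("types", []):
            if t.props["name"] == name:
                return list(t.edges.get("instances", []))
        return []
```

The metamodel table gives an entry point for type-based queries such as "X.all", while the file table lets an updater find every node that came from a changed file.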



Additional features in Hawk

Indexed attributes
Common scenario: find an Author by name
Users can tell Hawk to index a type by an attribute
EOL queries will reuse the index transparently, e.g. “Author.all.select(x | x.name = 'Value')”
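A sketch of how an attribute index can answer such a query transparently, assuming objects are plain dicts. AttributeIndex and select_equal are hypothetical names; in Hawk the index lives inside the backend, not in Python dicts.

```python
from collections import defaultdict


class AttributeIndex:
    """Hypothetical index for one attribute of one type: value -> objects."""

    def __init__(self, attribute):
        self.attribute = attribute
        self._by_value = defaultdict(list)

    def add(self, obj):
        self._by_value[obj[self.attribute]].append(obj)

    def lookup(self, value):
        # .get() avoids inserting empty lists for values never seen
        return list(self._by_value.get(value, []))


def select_equal(objects, attribute, value, index=None):
    """Same result as Author.all.select(x | x.name = value): answered from the
    index when one covers the attribute, otherwise by scanning every object."""
    if index is not None and index.attribute == attribute:
        return index.lookup(value)
    return [o for o in objects if o[attribute] == value]
```

The caller writes the same query either way; only the execution strategy changes, which is what "transparently" means here.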

Derived features
Another scenario: find Authors with 10+ books
Hawk can be told to precompute this and prepare a lookup
EOL queries written with the new feature will be sped up, e.g. “Author.all.select(x | x.nBooks >= 10)”
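Derived features can be sketched as a cached value per object plus a sorted lookup, so that "nBooks >= 10" becomes a binary search instead of a count over every Author. DerivedFeatureIndex is a hypothetical name, and a sorted list is one possible structure, not necessarily what Hawk uses.

```python
import bisect


class DerivedFeatureIndex:
    """Sketch: caches a derived value per object and keeps a sorted
    (value, id) list so that range conditions become binary searches."""

    def __init__(self, derive):
        self.derive = derive  # function: object -> derived value
        self._values = {}     # object id -> cached derived value
        self._sorted = []     # sorted list of (value, object id)

    def update(self, obj_id, obj):
        """Recompute the derived value for one object (e.g. after a change)."""
        if obj_id in self._values:
            self._sorted.remove((self._values[obj_id], obj_id))
        value = self.derive(obj)
        self._values[obj_id] = value
        bisect.insort(self._sorted, (value, obj_id))

    def at_least(self, threshold):
        """Ids of objects whose derived value is >= threshold."""
        # (threshold,) sorts before every (threshold, id) tuple
        start = bisect.bisect_left(self._sorted, (threshold,))
        return [obj_id for _, obj_id in self._sorted[start:]]
```

When an Author's books change, only that Author's entry is recomputed through update; queries never recount.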



Model repositories: Eclipse CDO

Pluggable storage
CDO can support multiple storage solutions
DB store is the most mature (embedded H2 by default)
Other stores include MongoDB, db4o or Objectivity

Caching and querying
CDO provides an EMF Resource implementation
Resource provides comprehensive generic caching
Remote queries are supported (OCL)



Comparing remote query APIs in CDO and Hawk

Hawk
Based on Apache Thrift (JSON / binary formats) + gzip
Stateless service-oriented API (e.g. “query”, “addRepository”)
Client → server: request-response
Server → client: subscribe-publish
Supports HTTP(S) and TCP
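A minimal sketch of what "stateless, request-response, plus gzip" means on the wire. Plain JSON stands in for Thrift's JSON protocol, and the message shapes (op, url, type, ids) and the handle function are hypothetical; Hawk's actual Thrift operations have their own signatures.

```python
import gzip
import json


def encode(message):
    """JSON text + gzip (a stand-in for Thrift's JSON protocol plus gzip)."""
    return gzip.compress(json.dumps(message).encode("utf-8"))


def decode(payload):
    return json.loads(gzip.decompress(payload).decode("utf-8"))


def handle(payload, repositories):
    """Stateless request handler: each request carries everything needed,
    so there is no per-client session to set up or keep alive.
    `repositories` is server-side index state (URL -> list of elements)."""
    request = decode(payload)
    if request["op"] == "addRepository":
        repositories.setdefault(request["url"], [])
        return encode({"ok": True})
    if request["op"] == "query":
        elements = repositories.get(request["url"], [])
        ids = [e["id"] for e in elements if e["type"] == request["type"]]
        return encode({"ids": ids})
    return encode({"error": "unknown operation"})
```

Because no session exists, any request can be served by itself over HTTP or TCP, and the compression step wraps the payload without the operations needing to know about it.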

CDO
Based on Eclipse Net4j (binary)
Stateful buffer-oriented API (opaque sequences of bytes)
Bidirectional communication between client and server:
TCP: persistent connection
HTTP(S): client polls server



Research questions

Observations about CDO and Hawk
Both represent a model as a database
Both have remote model querying APIs
Each system has made different API design choices
How do those choices impact query throughput?

Questions
RQ1: impact of HTTP vs TCP?
RQ2: impact of API design?
RQ3: impact of caching and indexed/derived attributes?



Experiment setup: systems used

Observations
CDO and Hawk used the same hardware, the same version of Eclipse (Mars), the same HTTP server (Jetty) and the same amount of memory (4GiB)
Only one of CDO or Hawk ran at a time
A controller manages the clients and collects results through SSH



Experiment setup: workload

Model used: set4 from GraBaTs 2009
Reverse engineered from Eclipse JDT source code
Contains 4.9M elements: 677MB XMI file
1.4GB in CDO (H2 database)
1.9GB in Hawk (Neo4j graph)

Workload configurations
Servers are “warmed up” to a steady state first
Lightest workload: 1 machine runs 1000 queries over 1 thread
Rest: 2 machines, each runs 500 queries over 2–32 threads

Measurements
Time to connect + query + retrieve element IDs
Refer to the paper for notched box plots and statistical tests
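The measurement loop can be sketched as follows, assuming run_query is a hypothetical client call that connects, runs the query and retrieves element IDs. The thread pool, the warm-up count and reporting the median are illustrative choices for this sketch; the actual harness and statistics are described in the paper.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_workload(run_query, n_queries, n_threads, warmup=10):
    """Warm up, then issue n_queries across n_threads. Each timing covers the
    whole call (connect + query + retrieve IDs, all inside run_query)."""
    for _ in range(warmup):
        run_query()  # bring the server to a steady state; not measured

    def timed(_):
        start = time.perf_counter()
        run_query()
        return (time.perf_counter() - start) * 1000.0  # milliseconds

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        times = list(pool.map(timed, range(n_queries)))
    return statistics.median(times), times
```

Under these assumptions, each client machine in the heavier configurations would call run_workload(run_query, 500, n_threads) with n_threads between 2 and 32, and the lightest configuration corresponds to run_workload(run_query, 1000, 1).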



Queries: OCL

Listing 1: OQ: GraBaTs query in OCL for evaluating CDO

DOM::TypeDeclaration.allInstances()->select(td |
  td.bodyDeclarations->selectByKind(DOM::MethodDeclaration)
    ->exists(md : DOM::MethodDeclaration |
      md.modifiers
        ->selectByKind(DOM::Modifier)
        ->exists(mod : DOM::Modifier | mod.public)
      and md.modifiers
        ->selectByKind(DOM::Modifier)
        ->exists(mod : DOM::Modifier | mod.static)
      and md.returnType.oclIsTypeOf(DOM::SimpleType)
      and md.returnType.oclAsType(DOM::SimpleType).name.fullyQualifiedName
        = td.name.fullyQualifiedName))

Summary

Finds all possible singletons: types that declare a public static method whose return type is the type itself.
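To pin down what OQ computes, here is the same predicate over a simplified in-memory model in Python. The model is an assumption: modifiers are strings rather than DOM::Modifier objects, only method declarations are kept in each type's body, and a return type that is not a SimpleType is represented as None.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MethodDeclaration:
    modifiers: List[str]        # simplified: modifier keywords as strings
    return_type: Optional[str]  # qualified name if a SimpleType, else None


@dataclass
class TypeDeclaration:
    name: str                   # fully qualified name
    methods: List[MethodDeclaration] = field(default_factory=list)


def possible_singletons(types):
    """Types declaring a public static method whose return type is the
    type itself; every type and every method is visited (no index)."""
    return [
        td for td in types
        if any("public" in md.modifiers and "static" in md.modifiers
               and md.return_type == td.name
               for md in td.methods)
    ]
```

Like OQ, this visits every type and every method on each run, which is the cost that the derived-attribute variants of the query try to avoid.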



Queries: basic EOL

Listing 2: HQ1: translation of OQ to EOL for evaluating Hawk

return TypeDeclaration.all.select(td|
  td.bodyDeclarations.exists(md:MethodDeclaration|
    md.returnType.isTypeOf(SimpleType)
    and md.returnType.name.fullyQualifiedName == td.name.fullyQualifiedName
    and md.modifiers.exists(mod:Modifier|mod.public==true)
    and md.modifiers.exists(mod:Modifier|mod.static==true)));

Summary
Direct translation of the OCL query.



Queries: EOL + extended MethodDeclarations

Listing 3: HQ2: HQ1 using derived attributes on MethodDeclaration

return MethodDeclaration.all.select(md |
  md.isPublic and md.isStatic and md.isSameReturnType
).collect(md | md.eContainer).asSet;

Better approach
Tell Hawk to extend MethodDeclaration with “isPublic”, “isStatic” and “isSameReturnType”
Perform a lookup for the relevant MethodDeclarations
Retrieve the set of TypeDeclarations that contain them
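The idea behind HQ2 can be sketched in Python: compute the three derived flags once per method, keep one set per flag, and answer the query by intersecting the sets and collecting the containers. Method, derive_flags, build_flag_index and singleton_containers are hypothetical names; in Hawk the derived attributes are computed and stored by the indexer.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # identity-based hashing, so methods can live in sets
class Method:
    container: str     # fully qualified name of the enclosing TypeDeclaration
    modifiers: tuple
    return_type: object  # fully qualified SimpleType name, or None


def derive_flags(m):
    """Hypothetical derived attributes, computed once when the index is updated."""
    return {
        "isPublic": "public" in m.modifiers,
        "isStatic": "static" in m.modifiers,
        "isSameReturnType": m.return_type == m.container,
    }


def build_flag_index(methods):
    """Derived attribute name -> set of methods where it is True."""
    index = {"isPublic": set(), "isStatic": set(), "isSameReturnType": set()}
    for m in methods:
        for name, value in derive_flags(m).items():
            if value:
                index[name].add(m)
    return index


def singleton_containers(index):
    """HQ2 idea: intersect the three lookups, then collect containers as a set."""
    hits = index["isPublic"] & index["isStatic"] & index["isSameReturnType"]
    return {m.container for m in hits}
```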



Queries: EOL + extended TypeDeclarations

Listing 4: HQ3: HQ1 using derived attributes on TypeDeclaration

return TypeDeclaration.all.select(td|td.isSingleton);

Even better approach
Tell Hawk to extend TypeDeclaration with “isSingleton”
Perform a lookup for the relevant TypeDeclarations directly
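Moving the derived attribute up to TypeDeclaration leaves almost nothing to do at query time. In this sketch (SingletonIndex is a hypothetical name, and methods are reduced to (modifiers, return type) pairs), isSingleton is recomputed only for a type whose contents change, and the query just reads the stored set.

```python
class SingletonIndex:
    """Sketch of a type-level derived attribute: isSingleton is recomputed
    only for the TypeDeclaration whose contents changed."""

    def __init__(self):
        self._singletons = set()  # names of types where isSingleton is true

    @staticmethod
    def is_singleton(type_name, methods):
        # methods: iterable of (modifiers, return type name) pairs
        return any({"public", "static"} <= set(mods) and ret == type_name
                   for mods, ret in methods)

    def on_type_changed(self, type_name, methods):
        if self.is_singleton(type_name, methods):
            self._singletons.add(type_name)
        else:
            self._singletons.discard(type_name)

    def query(self):
        """TypeDeclaration.all.select(td | td.isSingleton) as a direct lookup."""
        return sorted(self._singletons)
```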



RQ1: protocol impact (CDO)
HTTP degrades CDO noticeably

[Figure: median response time (ms) vs. client threads (1 to 64), CDO over TCP and HTTP]
[Figure: failed queries (CDO+HTTP) vs. client threads (1 to 64)]

HTTP woes
635.66% hit for 1 client, still noticeable for 2 and 4
Slight chance of errors or incorrect results for 4+ threads



RQ1: protocol impact (Hawk)
HTTP hit is consistent for Hawk

[Figure: median response time (ms) vs. client threads (1 to 64), Hawk over TCP and HTTP]

Hawk+HTTP has a roughly consistent 20% performance hit
No failed queries and no incorrect query results
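The slides report HTTP overhead as a percentage "hit". Reading that as the relative increase in median response time over TCP (an interpretation; the slides do not define it), the percentages convert to response-time ratios as follows, with $\tilde{t}$ denoting a median response time:

```latex
\[
  \mathrm{hit} = \frac{\tilde{t}_{\mathrm{HTTP}} - \tilde{t}_{\mathrm{TCP}}}{\tilde{t}_{\mathrm{TCP}}} \times 100\%
  \quad\Longleftrightarrow\quad
  \tilde{t}_{\mathrm{HTTP}} = \left(1 + \frac{\mathrm{hit}}{100\%}\right) \tilde{t}_{\mathrm{TCP}}
\]
```

Under that reading, Hawk's roughly 20% hit means HTTP responses take about 1.2 times as long as TCP ones, while CDO's 635.66% hit for 1 client means about 7.36 times as long.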



RQ2: API design impact
Packet traces with Wireshark explain HTTP results

CDO trace: 58 packets (10.2kB)

Session setup → query setup → 6s of silence → results
Conclusion: CDO+HTTP uses regular polling for server-to-client communication, and CDO reports results asynchronously
This introduces delay and breaks down for many clients
Suggestion: long polling / WebSockets instead?
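Why polling adds delay can be shown with a small Python model: a client polling every P seconds only sees a result at the first poll after it becomes ready, while long polling or WebSockets let the server deliver it as soon as it exists. Both functions are hypothetical, network latency is ignored, and any interval values used with them are illustrative; the trace does not reveal CDO's actual polling interval.

```python
import math


def seen_with_polling(ready_at, poll_interval, first_poll=0.0):
    """Time at which a client that polls every poll_interval seconds,
    starting at first_poll, first observes a result ready at ready_at."""
    if ready_at <= first_poll:
        return first_poll
    polls_needed = math.ceil((ready_at - first_poll) / poll_interval)
    return first_poll + polls_needed * poll_interval


def seen_with_push(ready_at):
    """Long polling / WebSockets: the server answers as soon as the result
    exists (network latency ignored in this sketch)."""
    return ready_at
```

For example, with a hypothetical 6-second interval, a result ready 0.2 s after the first poll is only observed at t = 6 s. Every waiting client also keeps sending poll requests whether or not results are ready, which adds load as the number of clients grows.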

Hawk trace: 14 packets (2.8kB)

Single request/response pair (no session/query setup)
Simple and reliable for small result sets
May have problems transmitting large result sets
Suggestion: optional async query API (pub-sub)



RQ3: impact of internals

[Figure: median response time (ms, log scale from 10^2 to 10^4) vs. client threads (1 to 64); series: CDO + OCL; Hawk + EOL, basic; Hawk + EOL, isPublic; Hawk + EOL, isSingleton]

CDO has more extensive generic caching than Hawk: e.g. the SQL log shows that it caches “X.all” in memory (Hawk relies on the DB cache)
Hawk outperforms CDO by 10x–100x with derived attributes (these replace iteration with lookups + set intersections)
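A generic cache for "X.all" results can be sketched as follows: results are kept in memory per type and invalidated by a version counter whenever the store changes. CachingStore is a hypothetical class; it illustrates the kind of caching visible in CDO's SQL log, not CDO's actual implementation.

```python
class CachingStore:
    """Sketch of generic caching: the result of 'X.all' is kept in memory
    and reused until the store reports a change."""

    def __init__(self, elements):
        self._elements = list(elements)  # (type name, element id) pairs
        self._version = 0
        self._cache = {}                 # type name -> (version, ids)
        self.backend_scans = 0           # how often the "database" was hit

    def all_of(self, type_name):
        cached = self._cache.get(type_name)
        if cached is not None and cached[0] == self._version:
            return cached[1]
        self.backend_scans += 1
        ids = tuple(eid for t, eid in self._elements if t == type_name)
        self._cache[type_name] = (self._version, ids)
        return ids

    def add(self, type_name, element_id):
        self._elements.append((type_name, element_id))
        self._version += 1  # invalidates every cached result
```

Repeated identical queries skip the backend scan entirely, which pays off when many clients ask the same thing; any write discards that benefit until the cache is refilled.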



What would be my ideal API?

Service-oriented, sync+async sides
Service orientation makes third-party integration easier
Synchronous req/resp: simple operations, small queries
Asynchronous pub/sub: complex operations, large queries
The sync API can set up async operations
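One way the sync+async split could look, sketched in Python with a thread pool: query answers within the same call, while start_async_query is itself a synchronous call that sets up an asynchronous job and later publishes the result to a subscriber callback. QueryService and its methods are hypothetical; a networked version would publish over a message channel rather than a local callback.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


class QueryService:
    """Hypothetical service with a synchronous side (request/response) and an
    asynchronous side (results published to a subscriber when ready)."""

    def __init__(self, evaluate):
        self._evaluate = evaluate  # function: query text -> result
        self._pool = ThreadPoolExecutor(max_workers=4)
        self._ids = itertools.count(1)

    def query(self, text):
        """Sync side: simple operations and small queries, answered directly."""
        return self._evaluate(text)

    def start_async_query(self, text, subscriber):
        """Sync call that sets up an async operation: returns a job id right
        away and later publishes (job id, result) to the subscriber."""
        job_id = next(self._ids)
        self._pool.submit(lambda: subscriber(job_id, self._evaluate(text)))
        return job_id

    def shutdown(self):
        self._pool.shutdown(wait=True)
```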

Flexible encoding with transparent compression
Provide multiple encodings through code generation
Transparent gzip compression is easy to integrate
Note: HTTP fields didn’t add that much overhead (20%)

Internals for faster queries

Uncommon queries: extensive caching (as in CDO)
Common queries: query-specific indices (as in Hawk)



Conclusions and future work

Summary
In collaborative modelling, many users will query the same models repeatedly to arrive at shared answers
CDO and Hawk implement remote querying very differently
From our results, we have suggested what an ideal remote query API would be like

Future work
Wider assortment of queries (e.g. ones that exercise larger portions of the models or produce large result sets)
Extend the range of configurations (tools, stores)
Analyse remote queries to offload tasks to the client



End of the presentation

Questions?

@antoniogado

