ECMFA 2016 slides
Intro Experiment setup Results
Stress-Testing Centralised Model Stores
Antonio García-Domínguez, Dimitris Kolovos, Konstantinos Barmpis, Ran Wei and Richard Paige
University of York, Aston University
ECMFA’16, July 6th, 2016
A. García-Domínguez et al. Stress-Testing Centralised Model Stores 1 / 21
Approaches for collaborative modelling
Use file-based models over standard VCS
  Simple to use, reuses mature VCS (SVN/Git)
  Large models can be broken up into fragments
  Loss of big picture (no simple way to do model-wide queries)
Use specialized model repositories (e.g. CDO)
  Harder to use, proprietary versioning, less widely adopted
  Models are directly stored in a database
  Queries are answered from the database
Hawk: solving limitations with file-based VCS
  Mirrors and reconnects fragments into a graph DB
  Queries are fast, versioning and storage are orthogonal
Simplified workflow of Hawk
Workflow implemented by Hawk
  Hawk uses a monitor to watch over collections of model files: local folders, SVN/Git repos, Eclipse workspaces...
  If files are changed, the graph is updated to mirror their contents
  The graph DB can then be queried through local/remote APIs
Structure of a Hawk index
  Metamodel and model on the left side produce the graph on the right side
  Node types: metamodels, types, instances and files
  Two lookup tables for metamodels and files
Additional features in Hawk
Indexed attributes
  Common scenario: find an Author by name
  Users can tell Hawk to index a type by an attribute
  EOL queries will reuse the index transparently, e.g. “Author.all.select(x | x.name = 'Value')”
Derived features
  Another scenario: find Authors with 10+ books
  Hawk can be told to precompute this and prepare a lookup
  EOL queries written with the new feature will be sped up, e.g. “Author.all.select(x | x.nBooks >= 10)”
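The idea behind both features can be sketched in a few lines of Python. This is only an illustration over a toy in-memory model: `Author`, `AttributeIndex` and the sample data are hypothetical names invented here, not part of Hawk's API, and Hawk itself stores these lookups in its graph database.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Author:
    name: str
    books: list = field(default_factory=list)


class AttributeIndex:
    """Maps a key computed from each element to the elements with that key."""

    def __init__(self, elements, key):
        self._buckets = defaultdict(list)
        for element in elements:
            self._buckets[key(element)].append(element)

    def lookup(self, value):
        # Dictionary access instead of scanning every element.
        return list(self._buckets.get(value, []))


authors = [
    Author("Ada", books=["b%d" % i for i in range(12)]),
    Author("Bob", books=["b1"]),
    Author("Cy", books=["b%d" % i for i in range(10)]),
]

# Indexed attribute: index Author by name.
by_name = AttributeIndex(authors, key=lambda a: a.name)

# Derived feature: precompute "has 10 or more books" once, then index by it.
has_many_books = AttributeIndex(authors, key=lambda a: len(a.books) >= 10)

found = by_name.lookup("Bob")
prolific = sorted(a.name for a in has_many_books.lookup(True))
```

Both queries become a single dictionary lookup, at the cost of keeping the index up to date whenever the model changes.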
Model repositories: Eclipse CDO
Pluggable storage
  CDO can support multiple storage solutions
  DB store is the most mature (embedded H2 by default)
  Other stores include MongoDB, db4o or Objectivity
Caching and querying
  CDO provides an EMF Resource implementation
  Resource provides comprehensive generic caching
  Remote queries are supported (OCL)
Comparing remote query APIs in CDO and Hawk
Hawk
  Based on Apache Thrift (JSON / binary formats) + gzip
  Stateless service-oriented API (e.g. “query”, “addRepository”)
  Client → server: request-response
  Server → client: subscribe-publish
  Supports HTTP(S) and TCP
CDO
  Based on Eclipse Net4j (binary)
  Stateful buffer-oriented API (opaque sequences of bytes)
  Bidirectional communication between client and server:
    TCP: persistent connection
    HTTP(S): client polls server
Research questions
Observations about CDO and Hawk
  Both represent a model as a database
  Both have remote model querying APIs
  Each system has made different API design choices
  How do those choices impact query throughput?
Questions
  RQ1: impact of HTTP vs TCP?
  RQ2: impact of API design?
  RQ3: impact of caching and indexed/derived attributes?
Experiment setup: systems used
Observations
  CDO and Hawk used the same hardware, same version of Eclipse (Mars), same HTTP server (Jetty) and memory (4GiB)
  Only one of CDO or Hawk ran at a time
  Controller manages clients and collects results through SSH
Experiment setup: workload
Model used: set4 from GraBaTs 2009
  Reverse engineered from Eclipse JDT source code
  Contains 4.9M elements:
    677MB XMI file
    1.4GB in CDO (H2 database)
    1.9GB in Hawk (Neo4j graph)
Workload configurations
  Servers are “warmed up” to a steady state first
  Lightest workload: 1 machine runs 1000 queries over 1 thread
  Rest: 2 machines, each runs 500 queries over 2–32 threads
Measurements
  Time to connect + query + retrieve element IDs
  Refer to paper for notched box plots and statistical tests
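The shape of such a workload can be sketched with a thread pool in Python. This is not the harness used in the experiments: `run_workload` and `fake_query` are hypothetical stand-ins, and the query counts are scaled down so the sketch runs quickly.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_workload(query, total_queries, threads):
    """Run `query` total_queries times over `threads` workers; return times in ms."""

    def timed_call(_):
        start = time.perf_counter()
        query()
        return (time.perf_counter() - start) * 1000.0

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(timed_call, range(total_queries)))


def fake_query():
    # Stand-in for "connect + query + retrieve element IDs".
    time.sleep(0.001)


# Warm-up runs are discarded, then the measured runs are summarised.
run_workload(fake_query, total_queries=5, threads=1)
times_ms = run_workload(fake_query, total_queries=20, threads=4)
median_ms = statistics.median(times_ms)
```

A real driver would replace `fake_query` with a remote call to the server under test and run one such pool on each client machine.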
Queries: OCL
Listing 1: OQ: GraBaTs query in OCL for evaluating CDO
    DOM::TypeDeclaration.allInstances()->select(td |
      td.bodyDeclarations->selectByKind(DOM::MethodDeclaration)
        ->exists(md : DOM::MethodDeclaration |
          md.modifiers
            ->selectByKind(DOM::Modifier)
            ->exists(mod : DOM::Modifier | mod.public)
          and md.modifiers
            ->selectByKind(DOM::Modifier)
            ->exists(mod : DOM::Modifier | mod.static)
          and md.returnType.oclIsTypeOf(DOM::SimpleType)
          and md.returnType.oclAsType(DOM::SimpleType).name.fullyQualifiedName
            = td.name.fullyQualifiedName))
Summary
Finds all possible singletons (returned from a static and public method within the same type).
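The same condition can be written over a toy in-memory model in Python to make the logic easier to follow. The classes below (`Method`, `TypeDecl`) and the sample types are hypothetical simplifications made up for this sketch; the real JDT metamodel nests modifiers and names more deeply.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Method:
    name: str
    is_public: bool
    is_static: bool
    return_type: Optional[str]  # fully qualified name, None for void/primitive


@dataclass
class TypeDecl:
    qualified_name: str
    methods: List[Method] = field(default_factory=list)


def possible_singletons(types):
    """Types with a public static method returning the type itself."""
    return [
        t for t in types
        if any(
            m.is_public and m.is_static and m.return_type == t.qualified_name
            for m in t.methods
        )
    ]


types = [
    TypeDecl("app.Registry", [Method("getInstance", True, True, "app.Registry")]),
    TypeDecl("app.Util", [Method("max", True, True, None)]),
    TypeDecl("app.Node", [Method("copy", True, False, "app.Node")]),
]
result = [t.qualified_name for t in possible_singletons(types)]
```

Only `app.Registry` qualifies: `app.Util` returns a different type and `app.Node.copy` is not static.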
Queries: basic EOL
Listing 2: HQ1: translation of OQ to EOL for evaluating Hawk
    return TypeDeclaration.all.select(td|
      td.bodyDeclarations.exists(md:MethodDeclaration|
        md.returnType.isTypeOf(SimpleType)
        and md.returnType.name.fullyQualifiedName == td.name.fullyQualifiedName
        and md.modifiers.exists(mod:Modifier|mod.public==true)
        and md.modifiers.exists(mod:Modifier|mod.static==true)));
Summary
  Direct translation of the OCL query.
Queries: EOL + extended MethodDeclarations
Listing 3: HQ2: HQ1 using derived attributes on MethodDeclaration
    return MethodDeclaration.all.select(md |
      md.isPublic and md.isStatic and md.isSameReturnType
    ).collect( td | td.eContainer ).asSet;
Better approach
  Tell Hawk to extend MethodDeclaration with “isPublic”, “isStatic” and “isSameReturnType”
  Perform lookup for the relevant MethodDeclarations
  Retrieve the set of TypeDeclarations that contain them
Queries: EOL + extended TypeDeclarations
Listing 4: HQ3: HQ1 using derived attributes on TypeDeclaration
    return TypeDeclaration.all.select(td|td.isSingleton);
Even better approach
  Tell Hawk to extend TypeDeclaration with “isSingleton”
  Perform lookup for the relevant TypeDeclarations directly
RQ1: protocol impact (CDO)
HTTP degrades CDO noticeably
[Figure: median response time (ms) against client threads (1 to 64) for CDO, TCP vs HTTP]
[Figure: failed queries (CDO+HTTP) against client threads (1 to 64)]
HTTP woes
  635.66% hit for 1 client, still noticeable for 2 and 4
  Slight chance of errors or incorrect results for 4+ threads
RQ1: protocol impact (Hawk)
HTTP hit is consistent for Hawk
[Figure: median response time (ms) against client threads (1 to 64) for Hawk, TCP vs HTTP]
Hawk+HTTP has a roughly consistent 20% performance hit
No failed queries and no incorrect query results
RQ2: API design impact
Packet traces with Wireshark explain HTTP results
CDO trace: 58 packets (10.2kB)
  Session setup → query setup → 6s of silence → results
  Conclusion: CDO+HTTP uses regular polling for server-client communication, and CDO reports results asynchronously
  Introduces delay, breaks down for many clients
  Suggestion: long polling / WebSockets instead?
Hawk trace: 14 packets (2.8kB)
  Single request/response pair (no session/query setup)
  Simple and reliable for small result sets
  May have problems transmitting large result sets
  Suggestion: optional async query API (pub-sub)
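Why regular polling adds delay can be shown with a small calculation in Python. The numbers below are hypothetical and chosen only for illustration: the slides do not state CDO's polling interval, and network transfer time is ignored.

```python
import math


def observed_latency(ready_at, poll_interval):
    """Time at which a polling client first sees a result ready at `ready_at`.

    The client polls at poll_interval, 2 * poll_interval, ... and only
    notices the result on the first poll at or after `ready_at`.
    """
    polls_needed = max(1, math.ceil(ready_at / poll_interval))
    return polls_needed * poll_interval


def push_latency(ready_at):
    """With long polling or WebSockets the result is delivered once it is ready."""
    return ready_at


# Hypothetical numbers: the query finishes after 0.5 s, the client polls every 5 s.
polling = observed_latency(ready_at=0.5, poll_interval=5.0)
pushed = push_latency(ready_at=0.5)
added_delay = polling - pushed
```

Under these assumptions the client waits 4.5 s longer than necessary, and every extra client adds its own stream of empty poll requests to the server.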
RQ3: impact of internals
[Figure: median response time (ms, log scale) against client threads (1 to 64) for CDO + OCL; Hawk + EOL, basic; Hawk + EOL, isPublic; Hawk + EOL, isSingleton]
CDO has more extensive generic caching than Hawk: e.g. the SQL log shows it caches “X.all” in memory (Hawk uses the DB cache)
Hawk outperforms CDO by 10x–100x with derived attributes (replaces iteration with lookups + set intersections)
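Replacing iteration with lookups and set intersections can be sketched as follows in Python. The index contents and IDs are made up for the example, and the dictionaries stand in for the lookups Hawk keeps in its graph database; this is not Hawk's implementation.

```python
# Toy "index": for each derived attribute, the set of MethodDeclaration IDs
# for which the precomputed value is true.
index = {
    "isPublic": {1, 2, 3, 5},
    "isStatic": {2, 3, 4},
    "isSameReturnType": {3, 5, 6},
}

# Container (TypeDeclaration) of each method, by ID.
container_of = {1: "A", 2: "A", 3: "B", 4: "C", 5: "C", 6: "D"}


def matching_methods(index, *attributes):
    """Intersect the precomputed ID sets instead of iterating over all methods."""
    sets = sorted((index[a] for a in attributes), key=len)  # smallest first
    return set.intersection(*sets)


methods = matching_methods(index, "isPublic", "isStatic", "isSameReturnType")
singleton_types = {container_of[m] for m in methods}
```

The work is proportional to the size of the precomputed sets rather than to the number of elements in the model, which is where the speed-up comes from.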
What would be my ideal API?
Service-oriented, sync+async sides
  Service orientation makes third party integration easier
  Synchronous req/resp: simple operations, small queries
  Asynchronous pub/sub: complex operations, large queries
  Sync API can set up async operations
Flexible encoding with transparent compression
  Provide multiple encodings through code generation
  Transparent gzip compression is easy to integrate
  Note: HTTP fields didn’t add that much overhead (20%)
Internals for faster queries
  Uncommon queries: extensive caching (as in CDO)
  Common queries: query-specific indices (as in Hawk)
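One possible reading of the "sync API can set up async operations" point is sketched below in Python. `QueryService`, `query` and `query_async` are hypothetical names invented for this illustration; neither Hawk nor CDO is claimed to expose this interface, and a real service would publish over the network rather than through an in-process queue.

```python
import queue
import threading


class QueryService:
    """Toy service: sync calls for small queries, pub/sub for large ones."""

    def query(self, run_query):
        # Synchronous request/response: the caller blocks for the result.
        return run_query()

    def query_async(self, run_query, batch_size=2):
        # Synchronous call that sets up an asynchronous operation: it returns
        # a subscription queue at once and publishes result batches to it.
        subscription = queue.Queue()

        def worker():
            results = run_query()
            for i in range(0, len(results), batch_size):
                subscription.put(results[i:i + batch_size])
            subscription.put(None)  # end-of-results marker

        threading.Thread(target=worker, daemon=True).start()
        return subscription


service = QueryService()
small = service.query(lambda: ["id1"])

received = []
sub = service.query_async(lambda: ["id%d" % i for i in range(5)])
while (batch := sub.get(timeout=5)) is not None:
    received.extend(batch)
```

Delivering results in batches keeps each message small, which addresses the concern about transmitting large result sets in a single response.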
Conclusions and future work
Summary
  In collaborative modelling, many users will query the same models repeatedly to arrive at shared answers
  CDO and Hawk implement remote querying very differently
  From our results, we have suggested what an ideal remote query API would be like
Future work
  Wider assortment of queries (e.g. ones that exercise larger portions of the models or produce large result sets)
  Extend the range of configurations (tools, stores)
  Analysing remote queries to offload tasks to client
End of the presentation
Questions?
@antoniogado