IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 1 Automated Synthesis and Dynamic Analysis...

20
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 1 Automated Synthesis and Dynamic Analysis of Tradeoff Spaces for Object-Relational Mapping Hamid Bagheri, Chong Tang, and Kevin Sullivan Abstract—Producing software systems that achieve acceptable tradeoffs among multiple non-functional properties remains a significant engineering problem. We propose an approach to solving this problem that combines synthesis of spaces of design alternatives from logical specifications and dynamic analysis of each point in the resulting spaces. We hypothesize that this approach has potential to help engineers understand important tradeoffs among dynamically measurable properties of system components at meaningful scales within reach of existing synthesis tools. To test this hypothesis, we developed tools to enable, and we conducted, a set of experiments in the domain of relational databases for object-oriented data models. For each of several data models, we used our approach to empirically test the accuracy of a published suite of metrics to predict tradeoffs based on the static schema structure alone. The results show that exhaustive synthesis and analysis provides a superior view of the tradeoff spaces for such designs. This work creates a path forward toward systems that achieve significantly better tradeoffs among important system properties. Index Terms—Specification-driven Synthesis, Tradespace Analysis, ORM, Static Analysis, Dynamic Analysis, Relational logic. 1 I NTRODUCTION When the consequences of variations in design decisions leading to system implementations are unclear, it is im- portant to conduct tradeoff studies. Design space explo- ration and tradeoff analysis can help decision-makers by revealing designs that people might miss [4], [24], [37], illuminating sensible and non-sensical tradeoffs, and helping decision-makers to balance tradeoffs that design decisions impose on diverse stakeholders. Ultimately it can provide evidence in support of principled decisions about which path or paths to pursue toward a realized system. Such studies can be done at different modeling and measurement granularities, for whole systems or individual components, and at many points in system development and evolution, and even runtime. Yet, today, systematic tradeoff analysis remains rare. Instead, developers are usually trusted to use design heuristics, tacit knowledge, and other such methods in developing point solutions that, it is hoped, will be good enough for stakeholders in all key dimensions. Similarly, when tools automatically produce imple- mentations, they often use single-point strategies. Con- sider object-relational mapping (ORM) tools, now pro- vided in many programming environments. They map object-oriented data models to relational schemas and code for managing application data. They often use a single mapping strategy, and do not help engineers to understand available solutions or the tradeoffs in time and space performance, evolvability, etc. that they entail. In practice, systematic tradespace analysis is hard. Generating large numbers of complex variants by hand is often impractical. Spaces of variants can be huge, even for modest specifications. Property estimation functions can be hard to specify, validate, implement and compute, and can vary greatly in accuracy. In this paper, we present an approach to solving this problem that combines synthesis of spaces of design alternatives and dynamic analysis of each point in the resulting spaces. The approach relies on specification- driven synthesis of both design spaces and test loads for comparative, dynamic analysis of non-functional prop- erties of variant designs across such spaces. We use formal notations, namely domain-specific modeling languages embedded in a relational logic, to specify design spaces. We then equip each language with a formal semantics in the form of a set of “non- deterministic mapping relations,” which we also express in the underlying relational logic. The key idea is that these rules map a given model to a broad space of possible solutions, which vary in dimensions built into the semantics, and which are rooted in the given domain of discourse. We then subject synthesized designs (in the semantic domain) to dynamic measurement in multiple dimensions of performance under the synthesized loads. The result is a multi-dimensional, empirical characteri- zation of the tradespace for the given model. We hypothesize that this approach has potential to help engineers understand important tradeoffs among dynamically measurable properties for important classes of software designs, at meaningful scales, within reach of existing synthesis tools. To test this hypothesis, we developed tools to enable a set of experiments in the domain of relational databases for object-oriented data models. For each of several data models, we used our approach to empirically charac- terize tradeoffs among designs predicted by previously published metrics based on model (namely SQL schema) structure as opposed to dynamic profiling. The designs we assessed were predicted to be high performing in the sense of being Pareto optimal in the dimensions measured. We then used our dynamic analysis results

Transcript of IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 1 Automated Synthesis and Dynamic Analysis...

  • IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 1

    Automated Synthesis and Dynamic Analysis ofTradeoff Spaces for Object-Relational Mapping

    Hamid Bagheri, Chong Tang, and Kevin Sullivan

    Abstract—Producing software systems that achieve acceptable tradeoffs among multiple non-functional properties remains asignificant engineering problem. We propose an approach to solving this problem that combines synthesis of spaces of designalternatives from logical specifications and dynamic analysis of each point in the resulting spaces. We hypothesize that this approachhas potential to help engineers understand important tradeoffs among dynamically measurable properties of system components atmeaningful scales within reach of existing synthesis tools. To test this hypothesis, we developed tools to enable, and we conducted, aset of experiments in the domain of relational databases for object-oriented data models. For each of several data models, we usedour approach to empirically test the accuracy of a published suite of metrics to predict tradeoffs based on the static schema structurealone. The results show that exhaustive synthesis and analysis provides a superior view of the tradeoff spaces for such designs. Thiswork creates a path forward toward systems that achieve significantly better tradeoffs among important system properties.

    Index Terms—Specification-driven Synthesis, Tradespace Analysis, ORM, Static Analysis, Dynamic Analysis, Relational logic.

    F

    1 INTRODUCTION

    When the consequences of variations in design decisionsleading to system implementations are unclear, it is im-portant to conduct tradeoff studies. Design space explo-ration and tradeoff analysis can help decision-makers byrevealing designs that people might miss [4], [24], [37],illuminating sensible and non-sensical tradeoffs, andhelping decision-makers to balance tradeoffs that designdecisions impose on diverse stakeholders. Ultimately itcan provide evidence in support of principled decisionsabout which path or paths to pursue toward a realizedsystem. Such studies can be done at different modelingand measurement granularities, for whole systems orindividual components, and at many points in systemdevelopment and evolution, and even runtime.

    Yet, today, systematic tradeoff analysis remains rare.Instead, developers are usually trusted to use designheuristics, tacit knowledge, and other such methods indeveloping point solutions that, it is hoped, will be goodenough for stakeholders in all key dimensions.

    Similarly, when tools automatically produce imple-mentations, they often use single-point strategies. Con-sider object-relational mapping (ORM) tools, now pro-vided in many programming environments. They mapobject-oriented data models to relational schemas andcode for managing application data. They often use asingle mapping strategy, and do not help engineers tounderstand available solutions or the tradeoffs in timeand space performance, evolvability, etc. that they entail.

    In practice, systematic tradespace analysis is hard.Generating large numbers of complex variants by handis often impractical. Spaces of variants can be huge, evenfor modest specifications. Property estimation functionscan be hard to specify, validate, implement and compute,and can vary greatly in accuracy.

    In this paper, we present an approach to solving thisproblem that combines synthesis of spaces of designalternatives and dynamic analysis of each point in theresulting spaces. The approach relies on specification-driven synthesis of both design spaces and test loads forcomparative, dynamic analysis of non-functional prop-erties of variant designs across such spaces.

    We use formal notations, namely domain-specificmodeling languages embedded in a relational logic, tospecify design spaces. We then equip each languagewith a formal semantics in the form of a set of “non-deterministic mapping relations,” which we also expressin the underlying relational logic. The key idea is thatthese rules map a given model to a broad space ofpossible solutions, which vary in dimensions built intothe semantics, and which are rooted in the given domainof discourse. We then subject synthesized designs (in thesemantic domain) to dynamic measurement in multipledimensions of performance under the synthesized loads.The result is a multi-dimensional, empirical characteri-zation of the tradespace for the given model.

    We hypothesize that this approach has potential tohelp engineers understand important tradeoffs amongdynamically measurable properties for important classesof software designs, at meaningful scales, within reachof existing synthesis tools.

    To test this hypothesis, we developed tools to enable aset of experiments in the domain of relational databasesfor object-oriented data models. For each of several datamodels, we used our approach to empirically charac-terize tradeoffs among designs predicted by previouslypublished metrics based on model (namely SQL schema)structure as opposed to dynamic profiling. The designswe assessed were predicted to be high performing inthe sense of being Pareto optimal in the dimensionsmeasured. We then used our dynamic analysis results

  • IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 2

    to statistically assess the predictive accuracy of the pre-viously published “static” metrics.

    These static analysis functions estimate various prop-erties from the static structures of relational schemas.Such functions can be computed nearly instantaneously,but their accuracy had not previously been assessed.Their suitability as a basis for making tradeoff decisionswas thus unclear.

    To summarize, the main contribution of this paperis an experimental demonstration that our approachto specification-driven synthesis and tradeoff analysishas the potential to better inform tradeoff decisionsfor important components of modern software systems.At a finer grain, this paper makes several componentcontributions:• Formal model of object-relational mapping strategies: We

    develop what is to our knowledge the first formal-ization of fine-grained and mixed ORM strategiesby means of mapping functions. We construct thisformal specification in Alloy’s relational logic [30].

    • Tradespace synthesis: We contribute a fully automatedapproach showing how to exploit the power of ourformal specification for derivation of formally pre-cise and application-specific ORM tradeoff spaces.

    • Dynamic analysis of synthesized tradespaces: We mea-sure properties of synthesized database designs bysynthesizing, from the same object models, schema-specific test loads and by then profiling databaseperformance under these synthesized loads. Wehave not yet explored systematically the impact ofvariation in load characteristics as an independentvariable, but our approach would in principle allowfor such analysis.

    • Trademaker tool implementation: We develop Trade-maker as a web-accessible tool, backed with ourformal analysis engine, for specification-driven syn-thesis of ORM tradespaces. We make Trademakeravailable to the research and education community1.

    • Experimental validation of published ORM metrics: Wepresent results from experiments run on severalcase studies adopted from different domains, cor-roborating inaccuracy of published static metrics forpredicting tradeoffs based on the schema structuresalone. Data collected from the experiments furthershow that exhaustive synthesis and analysis pro-vides a superior view of the tradeoff spaces for suchdesigns.

    The rest of this paper is organized as follows. Sec-tion 2 presents object-relational mapping as a concretedriving problem. Section 3 presents the overview ofour approach. Sections 4–7 describe the details of ourapproach and its implementation for expression andsynthesis of design spaces and loads. Section 8 and 9report and discuss data from the experimental testingof the approach. Finally, the paper concludes with anoutline of the related research and the future work.

    1. http://www.jazz.cs.virginia.edu:8080/Trademaker

    Fig. 1: A simple object model with three classes, Order,Customer, and PreferredCustomer, a one-to-many as-sociation between Customer and Order, and with Pre-ferredCustomer inheriting from Customer.

    2 DRIVING PROBLEM

    While our long-term aim is to improve engineeringproductivity and quality through advances in designspace science and technology, our short-term researchstrategy is to use the analysis of spaces of object-relational database mappings, in particular, as a tractableand useful driving problem.

    It is common, nowadays, for software applicationswritten in an object-oriented language, such as Java, touse relational databases for persistent storage. Trans-formations between instance models in these twoparadigms yet encounter the so-called object-relationimpedance mismatch problem [29], due to a significant dis-tance between the object-oriented and relational theories.Dedicated object-relational middleware is now widelyused to bridge the gap between object-oriented appli-cations and relational database management systems(RDBMS). The bridge is realized on the basis of a datamapping specification.

    Even if this approach relieves the developer of respon-sibility for the majority of runtime related aspects, e.g.,storing and retrieving persistent objects, the develop-ment of application-specific data mapping specificationsyet remains an inherently difficult and error-prone task.

    In practice, one starts with an object-oriented datamodel (OODM) as in Figure 1 and eventually selects oneof many possible strategies for mapping such a modelto a relational database schema. Figure 2 illustrates threesuch strategies; and Listing 1, a database set-up scriptderived from one of these solutions. The developer isfaced with the challenge of selecting from a wide varietyof mapping strategies available for each class associationand inheritance relationship. These mapping strategieshave varying impacts on the non-functional propertiesof applications, such as time and space performance andevolvability [28], [31], [35].

    Simple ORM solutions, many in everyday use, lockone into point solutions by using a single mapping strat-egy, failing to consider many possible available solutionsor the tradeoffs that they entail [42]. The problem for thedeveloper, then, becomes selecting the mapping strate-gies that best suit the desired non-functional requirementpriorities for a particular application. It requires, among

  • IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 3

    Fig. 2: Three mapping strategies. White boxes representclasses; gray titles, corresponding tables, and black andwhite arrows, mapping and inheritance relationships.Foreign keys are marked as fKeys.

    1 CREATE TABLE ‘ Order ‘ (2 ‘ orderID ‘ i n t ( 1 1 ) NOT NULL,3 ‘ orderValue ‘ i n t ( 1 1 ) ,4 ‘ customerID ‘ i n t ( 1 1 ) ,5 KEY ‘ FK customerID idx ‘ ( ‘ customerID ‘ ) ,6 PRIMARY KEY ( ‘ orderID ‘ )7 ) ;8

    9 CREATE TABLE ‘ Customer ‘ (10 ‘DType ‘ varchar ( 3 1 ) ,11 ‘ discount ‘ i n t ( 1 1 ) ,12 ‘ customerID ‘ i n t ( 1 1 ) NOT NULL,13 ‘ customerName ‘ varchar ( 3 1 ) ,14 PRIMARY KEY ( ‘ customerID ‘ )15 ) ;

    Listing 1: Synthesized MySQL database creation scriptderived from the mapping alternative shown in Fig.2b (elided for space and readability).

    other things, a thorough understanding of both objectand relational paradigms, of large spaces of possiblemappings, and of the tradeoffs involved in makingchoices in these spaces.

    3 APPROACH OVERVIEWThis section provides an overview of our technical ap-proach for tradespace synthesis and dynamic analysis.

    Figure 3 gives a high-level overview of Trademaker.Boxes represent processing modules, and ovals representmodule inputs and outputs. Trademaker takes as inputan application object model. Object models are givenas expressions in AlloyOM, a domain-specific languageembedded in Alloy’s relational logic language [30]. Toan input object model, it applies a design space synthesisfunction that, in essence, implements a non-deterministic

    Fig. 3: Overview of Trademaker.

    semantics to compute the set (or space) of concretedesign variants from which we want to choose a designto achieve desirable tradeoffs. The design space synthesisfunction is realized as a semantic mapping predicatein relational logic, taking expressions in the object datamodeling language to corresponding concrete designspaces in the semantic domain, here relational databasesschemas. A relational logic solver computes the results,which are then transformed into useful forms as SQLdatabase creation scripts (cf. Listing 1).

    To dynamically analyze diverse database designs, weface the challenge that variant designs (such as differentSQL schemas for the same object model) present differ-ent interfaces to the surrounding ORM tool. There are ofcourse many commercial tools for generating databasetesting loads from schemas. In our case, however, suchvariant schemas implement a common object model. Toprovide a fair comparison of designs notwithstandingtheir particular exposed interfaces, a common test loadneeds to be specialized for each particular design. Fullyautomating the specialization of a common test load forfair comparative profiling of varying design solutionsposed an interesting technical challenge.

    Trademaker thus first automatically transforms theobject model into a corresponding load model that drivessynthesis of common test loads, which are essentiallyobject model data instances (OM-instances in Fig. 3). Wecall them common loads because all concrete databaseimplementations would have to handle them. Theyare object-model-level, rather than concrete-database-schema-level. To specialize such synthesized loads foreach particular design, the load concretization moduleexploits the abstraction function linking concrete designsto abstract models, but in reverse, and applied as aconcretization function to a common test load for eachconcrete design. In the following sections, we describethe details of our approach.

    4 DESIGN SPACE SYNTHESISThis section presents our approach to synthesize designtradespaces for ORM using static metrics published inthe database literature [28] [15]. The next two sections

  • IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 4

    then detail our approach to dynamic analysis of syn-thesized tradespaces, specifically synthesizing commonloads and specializing abstract loads to the variantschemas in the design space, respectively.

    We first introduce domain-specific languages (DSLs)that we developed within the Alloy language to letdevelopers specify object and relational schemas in Alloy(§ 4.1). Next, we present formalizations of object-relationalmapping strategies (§ 4.2) as well as verification of suchsemantic mapping rules (§ 4.3). We then use these for-malizations to automate synthesis of design spaces ofOR mappings (§ 4.4).

    As an enabling technology, we chose Alloy [30] asa specification language and satisfaction engine. Alloyis a formal modeling language with a comprehensiblesyntax that stems from notations ubiquitous in objectorientation, and semantics based on the first-order rela-tional logic [30], making it an appropriate language fordeclarative specification of both application data objectmodel and semantic mapping rules. More specifically,we chose it for three reasons. First, its logical and rela-tional operators make Alloy an appropriate language forspecifying object-relational mapping strategies. Second,its ability to compute solutions that satisfy complexconstraints is useful as an automation mechanism. Third,Alloy has a rigorously defined semantics closely relatedto those of relational databases, thereby providing asound formal basis for our approach.

    The Alloy Analyzer is a constraint solver that sup-ports automatic analysis of models written in Alloy.The analysis process is based on a translation of Al-loy specifications into a Boolean formula in conjunctivenormal form (CNF), which is then analyzed using off-the-shelf SAT solvers. With respect to the constraintsin a given model, the Alloy Analyzer can be usedeither to find solutions satisfying them, or to generatecounterexamples violating them. The Alloy Analyzer isa bounded checker, so a user-specified scope on the sizeof the domains needs to be specified. In the matter of theobject-relational mappings, the scope states the numberof elements of each top-level signature, such as Class, At-tribute and Association. The analysis is thus performedthrough exhaustive search for satisfying instances withinthe specified scopes.

    4.1 AlloyOM Domain-specific LanguageTo carry out the synthesis, we begin by developingdomain-specific languages in Alloy that specify thesource and target languages, for describing object andrelational schemas, respectively. These specifications de-fine element types, and how they are related and con-strained to constitute valid expressions in these respec-tive domains.

    Listing 2 outlines the specification of the element typesin the AlloyOM DSL. The DSL built-ins are definedas top-level Alloy signatures. A signature paragraphintroduces a basic element type and a set of its relations,called fields, accompanied by the type of each field.

    1 a b s t r a c t s ig Class{2 a t t r S e t : s e t Attr ibute ,3 id : s e t Attr ibute ,4 parent : lone Class ,5 i s A b s t r a c t : one Bool6 }78 a b s t r a c t s ig A t t r i b u t e{}9

    10 a b s t r a c t s ig Assoc ia t ion{11 s r c : one Class ,12 dst : one Class ,13 s r c m u l t i p l i c i t y : one M u l t i p l i c i t y S t a t e ,14 d s t m u l t i p l i c i t y : one M u l t i p l i c i t y S t a t e15 }

    Listing 2: Partial specification of the AlloyOM DSL inAlloy.

    The specification defines three main constructs as top-level signatures to model the basic AlloyOM constructs:Class, Attribute and Association. Note that thesesignatures are defined as abstract, meaning that theycannot have instance objects without explicitly extendingthem.

    According to lines 2–5, the Class signature containsfour fields: id, parent, attrSet and isAbstract. The id fieldwithin the Class signature (line 3) represents the identi-fier of the corresponding class. The inheritance relation-ship in the object model is represented by the parentrelation in the AlloyOM DSL. Specifically, the inheritancerelationship between two classes of c and p, wherec inherits from p, for example, will represent by theexpression of parent = p specified within the c classsignature definition. The keyword lone specifies that anelement is optional. In this specific case it indicates thatthe parent element is optional, and a Class may haveone or no declared parent. The attrSet specifies theset of attributes as defined for the corresponding class inthe object model. Finally, the isAbstract field denoteswhether a given class is abstract or not; the keywordone, used in its definition, states that every Class objectis mapped to exactly one Bool object, representing itsabstract state.

    To describe each attribute within the AlloyOM DSL,one can define a signature that extends the corre-sponding data type signature. The AlloyOM DSL pro-vides a set of predefined data types, such as In-teger, Real, String and Bool, that extends the basicAttribute signature (line 8). Each Association sig-nature contains four fields (as stated in lines 11–14): src,dst, src_multiplicity and dst_multiplicity.The first two fields specify the source and destinationof the association, respectively. Association multiplicitydefines the number of object instances that can be at eachend of the association. The src_multiplicity anddst_multiplicity fields, thus, represent associationmultiplicities of source and destination classes respec-tively, and can have values of either ONE or MANY.

    The code snippet in Listing 3 represents the diagramof Figure 1 delineating an object model for a simple

  • IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 5

    1 one sig Customer extends Class{}{2 a t t r S e t = customerID +customerName3 id=customerID4 i s A b s t r a c t = No5 no parent6 }78 one sig Order extends Class{}{9 a t t r S e t = orderID + orderValue

    10 id = orderID11 i s A b s t r a c t = No12 no parent13 }1415 one sig CustomerOrderAssociation extends Assoc ia t ion{}{16 s r c = Customer17 dst = Order18 s r c m u l t i p l i c i t y = ONE19 d s t m u l t i p l i c i t y = MANY20 }2122 one sig PreferredCustomer extends Class{}{23 a t t r S e t = discount24 id = customerID25 parent = Customer26 i s A b s t r a c t = No27 }

    Listing 3: Object model of Figure 1 formally modeledin our AlloyOM DSL.

    customer-order example. The order class has two at-tributes of orderID and orderValue, which are as-signed to the attrSet field of the Order class. Theid field specifies the orderID as the identifier of thisclass. The last two lines of the Order signature speci-fication denote that Order is not an abstract class andhas no parent. Similarly, the code snippet of lines 8–13represents PreferredCustomer signature definition.According to the diagram, the PreferredCustomerclass inherits from the Customer class. The expressionon line 11, thus, specifies Customer as the parent of thePreferredCustomer class.

    To assist the users of Trademaker and to further makeour tool chain more accessible, we have developed afront-end transformer that translates UML-based objectmodels—simply can be modeled by UML modelingtools, such as ArgoUML2—to a formal representation inthe AlloyOM language. Figure 4 shows a snapshot of theAlloyOM model transformer. Such a model transformerrelieves the Trademaker’s user of responsibility for de-veloping formal specifications, while facilitating the useof the entire tradeoff analysis tool chain. The AlloyOMmodel transformer, along with more details on the Al-loyOM language, is available from the project’s websiteat http://jazz.cs.virginia.edu:8080/Trademaker/help.jsf.

    A similarly defined Alloy-embedded relationalschema specifies the co-domain of an object-relationalmap. A relation schema, in the relational model [22],is a set of attributes; a relational schema is thus a setof relation schemas; a relation, at the instance level,is a set of tuples over the attributes defined in therelation schema. Each relation schema has a primarykey that may consist of one or more attributes of a

    2. http://argouml.tigris.org/

    Fig. 4: A front-end transformer that translates UML-based object models to a formal representation in theAlloyOM language.

    relation schema. A foreign key is a set of attributes of aschema used to reference relation instances (i.e., tuples)of another schema; foreign keys define referentialconstraints between relations.

    4.2 Formalizing ORM Strategies as Semantic Map-ping RulesAfter specifying the abstractions involved in the sourceand destination domains, the next step is to express map-ping rules as additional predicates that relate elementsof the source domain to the constructs in the destination.These mapping rules, which basically provide a non-deterministic operational semantics to expressions inAlloyOM, are specified once as a semantic mappingpredicate in relational logic. Combining these rules witha particular AlloyOM source model and finding allmodels of the resulting set of constraints reveals the setof corresponding design solutions.

    To define such mapping rules, we rely on object-relational mapping strategies, informally defined in theliterature [18], [28], [31], [35], and formalize them usingrelational logic. Specifically, to provide a basis for precisemodeling of the space of mapping alternatives, we haveformalized ORM strategies in an appropriate level ofgranularity.

    To manage association relationships, we have formallyspecified three ORM strategies of own association table,foreign key embedding and merging into single table [35]. Wehave also specified three more ORM strategies for inheri-tance relationships: class relation inheritance (CR), concreteclass relation inheritance (CCR) and single relation inheri-tance (SR) [18]. Furthermore, as the aforementioned ORMstrategies for inheritance relationships are just applicableto the whole inheritance hierarchies, we have specifiedextra predicates for more fine-grained strategies [35]:Union Superclass, Joined Subclass and Union Subclass, suit-able to be applied to part of inheritance hierarchies,letting the developer design a detailed mapping specifi-cation using the combination of various ORM strategies.In fact, different fine-grained mapping strategies can beapplied into different parts of a multi-level hierarchy.There are, however, some constraints that prevent allpossible combinations. To make the idea concrete, in therest of this section, we illustrate the semantics of thesestrategies that were first briefly introduced in our priorwork [13], [14].

  • IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 6

    1 module ORMStrategies23 pred unionSubclass [ c : Class ]{4 / / Ass ign ing a t a b l e t o e a c h c o n c r e t e c l a s s5 ( c . i s A b s t r a c t = No) implies { one t : Table |6 t . t A s s o c i a t e = c and one t . t A s s o c i a t e }7 one c . ˜ t A s s o c i a t e8 / / r e l a t i o n a l f i e l d s c o r r e s p o n d i n g t o a t t r i b u t e s o f

    t h e a s s o c i a t e d c l a s s9 ( c . i s A b s t r a c t = No) implies { a l l a : A t t r i b u t e |

    10 a in c . a t t r S e t => oneAssocFieldClass [ a , c ] }11 / / r e l a t i o n a l f i e l d s c o r r e s p o n d i n g t o i n h e r i t e d

    a t t r i b u t e s o f t h e c l a s s12 ( c . i s A b s t r a c t = No and some c . ˆ parent ) implies { a l l a

    : A t t r i b u t e |13 a in c . ˆ parent . a t t r S e t => oneAssocFieldClass [ a , c ] }14 # c . ˜ t A s s o c i a t e . f i e l d s = # ( c . * parent ) . a t t r S e t15 / / Ass ign ing t h e pr imary Key16 ( c . ˜ t A s s o c i a t e ) . primaryKey = ( c . id . ˜ f A s s o c i a t e )17 c . i s A b s t r a c t = No implies oneAssocFieldClass [ c . id , c ]18 / / Ass ign ing t h e f o r e i g n Key19 no c . ˜ t A s s o c i a t e . foreignKey20 }

    Listing 4: A snippet of the Alloy model of theunion subclass strategies for mapping classes withininheritance hierarchies.

    4.2.1 Mapping of inheritance relationshipsListings 4–5 show the semantics of fine-grained strate-gies for mapping individual classes within inheritancehierarchies. Each ORM strategy is represented by aseparate Alloy predicate.

    Union Subclass ORM strategy. The first Alloy predi-cate (Listings 4), parameterized by Class c, outlines theunionSubclass strategy (lines 3–20). This pattern impliesmapping c into a separate relational table, should c be aconcrete class. The tAssociate is a relation from a tableto its corresponding class(es). Using the join operator,t.tAssociate thus states a set of all classes handled bythe t table. The ∼ operator represents the transposeoperation over a binary relation, which reverses theorder of atoms within the relation. The statement ofline 7, thus, using the multiplicity keyword one statesthat c is mapped to exactly one relational table. Thestrategy predicate then states, in lines 9–14, that the ta-ble encompasses relational fields corresponding to bothattributes defined by c and all attributes inherited tothis class. The in keyword represents the set inclusionoperator. The operators ‘ˆ’ and ‘∗’ represent transitiveclosure and reflexive transitive closure, respectively. The‘#’ operator is the Alloy set cardinality operator. Theexpression c.ˆparent.attrSet (line 13) represents the setof all attributes inherited to c from its parents followingone or more traversals along parent edges. As such,to retrieve object instances of this class, i.e., all objectswhose most specific type is c, only one table needs to beaccessed. This strategy thus implies no referential con-straint over the mapped relations. As a concrete example,in the example shown in Figure 2c, a separate tableis associated to the PreferredCustomer class, which isbeing mapped by the unionSubclass strategy. Note thatthe predicate specification relies on a separate predicate,oneAssocFieldClass (lines 65–69), that assigns a table

    22 pred j o i n e d S u b c l a s s [ c : Class ] {23 / / Ass ign ing a t a b l e t o e a c h c o n c r e t e c l a s s24 ( c . i s A b s t r a c t = No) implies { one t : Table |25 t . t A s s o c i a t e = c and one t . t A s s o c i a t e }26 one c . ˜ t A s s o c i a t e27 / / r e l a t i o n a l f i e l d s c o r r e s p o n d i n g t o non− inhe r i t ed

    a t t r i b u t e s o f t h e a s s o c i a t e d c l a s s28 some c . parent implies {29 # c . ˜ t A s s o c i a t e . f i e l d s = # c . a t t r S e t + 1 / / f o r t h e

    f o r e i g n Key30 } e lse {31 # c . ˜ t A s s o c i a t e . f i e l d s = # c . a t t r S e t32 }33 ( c . i s A b s t r a c t =No) implies { a l l a : A t t r i b u t e |34 a in c . a t t r S e t => oneAssocFieldClass [ a , c ] }35 / / Ass ign ing t h e pr imary Key36 c . ˜ t A s s o c i a t e . primaryKey = c . id . ˜ f A s s o c i a t e37 c . i s A b s t r a c t = No implies oneAssocFieldClass [ c . id , c ]38 / / Ass ign ing t h e f o r e i g n Key39 some c . parent implies {40 c . ˜ t A s s o c i a t e . foreignKey = c . parent . id . ˜ f A s s o c i a t e }41 e lse { no c . ˜ t A s s o c i a t e . foreignKey }42 }

    Listing 5: A snippet of the Alloy model of thejoined subclass strategies for mapping classes withininheritance hierarchies.

    field to a class attribute given as input.Joined Subclass ORM strategy. The joinedSubclass

    strategy is then presented in Listing 5, lines 22–42. Usingthis strategy, the attributes defined by the class and theprimary key attributes inherited from its super class aremapped to a separate table (lines 24–34). The table struc-ture mapped using this pattern simply resembles theapplication object model, and the impact of changes ina class is limited only to the scope of the correspondingtable, thus improving its maintainability. An example forapplication of the joinedSubclass strategy is shown inFigure 2a, that leads to a separate table for Preferred-Customer with a foreign key to its superclass (Customer)corresponding table. Different from the unionSubclassstrategy, the relational table mapped using this patternonly includes attributes defined by the class c, ratherthan all its inherited attributes. Object instances arethus scattered over several rows in different relationaltables. So to retrieve an object instance, it may requireseveral joins that in turn may negatively affect the queryperformance [35].

    Union Superclass ORM strategy. The statementsshown in Listing 6 represent the Alloy predicate for theunionSuperclass mapping strategy, which implies map-ping c into the same relational table as its super class(es)are mapped to. Such a table then stores object instancesof c and all its transitive super classes, i.e., those classesreachable from c via the inheritance relationships (lines47–48). Thus, when a class in an inheritance hierarchyis being mapped by the unionSuperclass strategy, all itssuper classes should also be mapped—using the samemapping pattern—into the same table the given class ismapped to. Moreover, a type indicator field is needed todistinguish the class type of each row in the table (lines52–55). In the example of Figure 2b, a table is associatedto both Customer and PreferredCustomer classes, which

  • IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 7

    44 pred unionSuperclass [ c : Class ] {45 / / Ass ign ing a t a b l e t o e a c h c o n c r e t e c l a s s46 ( c . i s A b s t r a c t = No) => one Table oneAssocFieldClass [ a , c ] }52 / / The t a b l e has an a d d i t i o n a l f i e l d d i s t i n g u i s h i n g

    t h e t y p e o f r e c o r d s t o r e d53 one f : F i e l d | f . f A s s o c i a t e in DType and54 f in c . ˜ t A s s o c i a t e . f i e l d s and55 c . ˜ t A s s o c i a t e . f i e l d s = c . ˜ t A s s o c i a t e . foreignKey + c

    . ˜ t A s s o c i a t e . t A s s o c i a t e . a t t r S e t . ˜ f A s s o c i a t e + f56 / / Ass ign ing t h e pr imary Key57 c . ˜ t A s s o c i a t e . primaryKey = c . id . ˜ f A s s o c i a t e58 / / Ass ign ing t h e f o r e i g n Key59 ( no a : Assoc ia t ion | a . dst in c . * parent ) =>{60 # ( c . * parent . ˜ t A s s o c i a t e ) . foreignKey = #{ a :

    Assoc ia t ion |61 a . dst in c . * parent and a . d s t m u l t i p l i c i t y = MANY

    and no a . ˜ t A s s o c i a t e }62 }63 }64 / / Ass ign ing a r e l a t i o n a l f i e l d t o a c l a s s a t t r i b u t e65 pred oneAssocFieldClass [ a : At t r ibute , c : Class ] {66 one f : F i e l d |67 f . f A s s o c i a t e = a and68 f in c . ˜ t A s s o c i a t e . f i e l d s69 }

    Listing 6: A snippet of the Alloy model of theunion superclass strategies for mapping classes withininheritance hierarchies.

    are being mapped by the unionSuperclass strategy. Thismapping pattern significantly reduces the number oftables in the relational schema, and no joins are neededto retrieve object instances of a single class. However, itmay introduce NULL values into the database. Specifi-cally, storing object instances of super classes into sucha table mapped using the unionSuperclass strategy in-troduces NULL values for attributes defined just for achild class [28].

    4.2.2 Mapping of AssociationsThere are several alternatives for the mapping of anassociation in the object model. We illustrate the se-mantics of the corresponding mapping strategies in thefollowing. Note that since n-ary associations are typicallytransformed into binary ones in design models [35], ourdiscussion in this section focuses on different possiblemapping alternatives for binary relationships.

    Own Association Table ORM strategy. The Alloypredicate of Listing 7, parameterized by Association asc,outlines the ownAssociationTable strategy. This patternimplies mapping each of the classes and the asc toseparate relations. An example of this mapping strategyis shown in Figure 2a. The representation of the asso-ciation in a separate table improves comprehensibilityof the relational schema mapped using this pattern,and supports changes in cardinality of classes involvedat either end of the association. Note that there arethree types of association relationships based on the

    70 pred ownAssociationTable [ asc : Assoc ia t ion ] {71 / / Ass ign ing a t a b l e t o t h e a s s o c i a t i o n and e a c h o f

    i t s ends72 one t : Table | asc . s r c in t . t A s s o c i a t e73 one t : Table | asc . dst in t . t A s s o c i a t e74 one t : Table | t . t A s s o c i a t e = asc75 one asc . ˜ t A s s o c i a t e76 one asc . s r c . ˜ t A s s o c i a t e77 one asc . dst . ˜ t A s s o c i a t e78 / / r e l a t i o n a l f i e l d s c o r r e s p o n d i n g t o a t t r i b u t e s o f

    t h e a s s o c i a t e d c l a s s e s79 assocEnd [ asc . s r c ]80 assocEnd [ asc . dst ]81 asc . ˜ t A s s o c i a t e . f i e l d s = asc . s r c . ˜ t A s s o c i a t e .

    primaryKey + asc . dst . ˜ t A s s o c i a t e . primaryKey82 # asc . ˜ t A s s o c i a t e . f i e l d s = 283 / / Ass ign ing t h e pr imary Key84 asc . s r c . ˜ t A s s o c i a t e . primaryKey =85 asc . s r c . id . ˜ f A s s o c i a t e86 asc . dst . ˜ t A s s o c i a t e . primaryKey =87 asc . dst . id . ˜ f A s s o c i a t e88 asc . ˜ t A s s o c i a t e . primaryKey = asc . s r c . ˜ t A s s o c i a t e .

    primaryKey + asc . dst . ˜ t A s s o c i a t e . primaryKey89 / / Ass ign ing t h e f o r e i g n Key90 asc . ˜ t A s s o c i a t e . foreignKey = asc . s r c . id . ˜ f A s s o c i a t e +

    asc . dst . id . ˜ f A s s o c i a t e91 fKeysForMany [ asc ]92 }9394 pred assocEnd [ aEnd : Class ] {95 a l l a : aEnd . a t t r S e t {96 a ! in Class => oneAssocField [ a ]97 }98 aEnd . ˜ t A s s o c i a t e . f i e l d s in99 aEnd . ˜ t A s s o c i a t e . foreignKey + aEnd . ˜ t A s s o c i a t e .

    t A s s o c i a t e . a t t r S e t . ˜ f A s s o c i a t e + DType . ˜ f A s s o c i a t e100 }101102 pred oneAssocField [ a : A t t r i b u t e ] {103 one f : F i e l d {104 f . f A s s o c i a t e = a105 one f . f A s s o c i a t e106 one a . ˜ f A s s o c i a t e107 f in a . ˜ a t t r S e t . ˜ t A s s o c i a t e . f i e l d s108 }109 }

    Listing 7: A snippet of the Alloy predicate for theown association table strategy for mapping associationrelationships.

    multiplicities of source and destination classes involvedin such relationships. One-to-one relationships are thosein which association multiplicity at each end is limitedto one. In one-to-many relationships, participation of oneend in the association is greater than one. Finally, inmany-to-many relationships, association multiplicities atboth ends are greater than one. The ownAssociationTablestrategy is applicable to each of these three types. At thesame time, should query performance be an importantissue, it may not be a best mapping alternative.

    Foreign Key Embedding ORM strategy. Listing 8presents the Alloy predicate for the foreignKeyEmbed-ding ORM strategy. Using this mapping, a foreign keyto the table, that corresponds to the class involved inthe relationship with the multiplicity of one, would beembedded into the table corresponding to the other endof the association. A foreign key is a non-empty setof attributes of a relation which is used to point totuples of another relation. Essentially, foreign keys spec-ify referential constraints between relations. Figure 2b

  • IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 8

    110 pred foreignKeyEmbedding [ asc : Assoc ia t ion ] {111 / / Ass ign ing a t a b l e t o e a c h end o f t h e a s s o c i a t i o n112 one t : Table | asc . dst in t . t A s s o c i a t e113 some asc . dst . ˜ t A s s o c i a t e114 ! ( ( asc . s r c . ˜ t A s s o c i a t e = asc . s r c . ˜ parent . ˜ t A s s o c i a t e )115 or ( asc . s r c . ˜ t A s s o c i a t e = asc . s r c . parent . ˜ t A s s o c i a t e ) )116 implies one t : Table | t . t A s s o c i a t e =asc . s r c117 some asc . s r c . ˜ t A s s o c i a t e118 / / No t a b l e i s a s s i g n e d t o t h e a s s o c i a t i o n119 no t : Table | t . t A s s o c i a t e = asc120 no asc . ˜ t A s s o c i a t e121 / / r e l a t i o n a l f i e l d s c o r r e s p o n d i n g t o a t t r i b u t e s o f

    t h e a s s o c i a t e d c l a s s e s122 assocEnd [ asc . s r c ]123 assocEnd [ asc . dst ]124 / / Ass ign ing t h e pr imary Key125 asc . s r c . ˜ t A s s o c i a t e . primaryKey =126 asc . s r c . id . ˜ f A s s o c i a t e127 asc . dst . ˜ t A s s o c i a t e . primaryKey =128 asc . dst . id . ˜ f A s s o c i a t e129 asc . s r c . ˜ t A s s o c i a t e . primaryKey in130 asc . dst . ˜ t A s s o c i a t e . f i e l d s131 asc . s r c . ˜ t A s s o c i a t e . primaryKey in132 asc . dst . ˜ t A s s o c i a t e . foreignKey133 / / Ass ign ing t h e f o r e i g n Key134 fKeysForMany [ asc ]135 }

    Listing 8: A snippet of the Alloy predicate for theforeignKey embedding strategy for mapping associationrelationships.

    shows an example application of this mapping strategy.This mapping pattern is applicable to the one-to-oneand one-to-many relationships. Because association rela-tionships in relational schema are maintained throughthe use of foreign keys, this mapping reduces the re-dundancies observed in relational tables mapped usingmergingOneTable, discussed next. However, relational ta-bles mapped using this pattern are not as maintainableas those mapped using the previous pattern, because,among other things, any cardinality changes to many-to-many are not easily integrated [35].

    Merging into Single Table ORM strategy. The Alloypredicate of Listing 9 presents the mergingOneTablemapping strategy, in which both classes and the asso-ciation are merged into a single relation. Applying thispattern would result in achieving better query perfor-mance, as no further joins are needed to retrieve objectinstances linked by one-to-one associations. It, however,may introduce NULL values into the table, especiallywhen the mapping pattern applies to the other types ofassociations, such as one-to-many relationships.

    4.3 Verifying Semantic Mapping RulesFormalizing DSLs and mapping rules in an analyzablespecification language not only enables automatic syn-thesis of transformed models, but also provides the basisto formally validate their correctness. By expressing es-sential properties of object-relational mappings requiredto be checked, we then use an automated relational logicanalyzer to verify them. We specify such implicationsrequired to be checked as assertions. Assertions statea set of constraints intended to follow from specifica-tions [30]. Correctness of mappings is then checked using

    136 pred mergingOneTable [ asc : Assoc ia t ion ] {137 / / Both C l a s s e s and A s s o c i a t i o n s would be merged i n t o a

    s i g l e T a b l e138 one t : Table {139 t . t A s s o c i a t e = asc + asc . s r c + asc . dst140 # t . t A s s o c i a t e = 3141 }142 one asc . ˜ t A s s o c i a t e143 / / r e l a t i o n a l f i e l d s c o r r e s p o n d i n g t o a t t r i b u t e s o f

    t h e a s s o c i a t e d c l a s s e s144 a l l a : asc . ( s r c + dst ) . a t t r S e t {145 a ! in Class => ( one f : F i e l d {146 f . f A s s o c i a t e = a147 f in a . ˜ a t t r S e t . ˜ t A s s o c i a t e . f i e l d s148 } )149 }150 one f : F i e l d {151 f . f A s s o c i a t e = asc . s r c152 f in asc . ˜ t A s s o c i a t e . f i e l d s153 }154 one f : F i e l d {155 f . f A s s o c i a t e = asc . dst156 f in asc . ˜ t A s s o c i a t e . f i e l d s157 }158 # F i e l d = # F i e l d . f A s s o c i a t e159 / / In one−to−one a s s o c i a t i o n r e l a t i o n s h i p s , t h e

    pr imary key o f t h e t a b l e a s s o c i a t e d t o t h ec o m b i n a t i o n o f C l a s s e s and t h e i r A s s o c i a t i o n canbe t h e pr imary key o f e i t h e r o f c l a s s e s , h e r e i ti s a s s i g n e d t o t h e pr imary key o f t h e s r c o f t h ea s s o c i a t i o n

    160 asc . ˜ t A s s o c i a t e . primaryKey = F i e l d

  • IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 9

    1 a s s e r t noTableForAbstractClasses{2 a l l c : Class | c . i s A b s t r a c t =Yes => no t : Table | t .

    t A s s o c i a t e = c3 }45 a s s e r t t a b l e F i e l d s{6 a l l c : Class | c . ˜ t A s s o c i a t e . f i e l d s in7 c . ˜ t A s s o c i a t e . t A s s o c i a t e . a t t r S e t . ˜ f A s s o c i a t e +8 c . ˜ t A s s o c i a t e . foreignKey + DType . ˜ f A s s o c i a t e9 }

    Listing 10: Two examples of assertions.

    where each assertion is exhaustively checked againsta huge set of model instances up to a certain bound.In other words, the analyzer is a bounded checker,guaranteeing the validity of assertions within a boundedinstance space. We bound execution of assertion check-ing with the ultimate scope used for synthesis of trans-formed models, and thus expect the validity of assertionsfor all generated instances.

    4.4 Design Space ExplorationIn the previous section, we showed how analyzablespecifications can be used to formalize ORM strategies.In this section, we tackle the other aspect that needs to beclarified: how one can apply a design space explorationapproach to generate design spaces of OR mappingsbased on those specifications.

    A design space is a set of possible design alterna-tives, and design space exploration (DSE) is the processof traversing the design space to determine particulardesign alternatives that not only satisfy various designconstraints, but are also optimized in the presence ofa set of additional objectives [38]. The process can bebroken down into three key steps: (1) Modeling the spaceof mapping alternatives; (2) Evaluating each alternativeby means of a set of metrics; (3) Traversing the space ofalternatives, so characterized, to select one as the chosedesign.

    4.4.1 Modeling the Space of Mapping AlternativesFor each application object model, due to a variety ofdifferent mapping options (cf. Sect. 4.2) available for eachclass, its attributes and associations, and its position inthe inheritance hierarchy, there are several valid variants.To model the space of all mapping alternatives, wedevelop a generic mixed mapping specification basedon fine-grained strategies formalized in previous sec-tion. This generic mixed mapping specification lets theautomatic model finder choose for each element of theobject model any of the relevant strategies, e.g. any ofthe fine-grained generalization mapping strategies for agiven class within an inheritance hierarchy.

    Applying such a loosely constrained mixed mappingstrategy into the object model leads to a set of ORMspecifications constituting the design space. While theyall represent the same object model and are consistentwith the rules implied by a given mixed mapping

    strategy, they exhibit different quality attributes. Forexample, how inheritance hierarchies are being mappedto relational models affects the required space for datastorage and the required time for query execution.

    We called this mapping strategy loosely constrainedbecause it does not concretely specify the details of themapping, such as applying, for example the UnionSub-class strategy to a specific class. An expert user, though, isable to define a more tightly constrained mixed mappingby means of the parameterized predicates Trademakerprovides. The more detailed the mapping specifications,the narrower the outcome design space, and the less therequired postprocessing search.

    4.4.2 Static Analysis of Synthesized Designs

    The choice of mapping strategy impacts key non-functional system properties. Response time perfor-mance, storage space and maintainability are amongthe set of quality attributes defined by the ISO/IEC9126-1 standard that are influenced by the choice ofOR mappings. To statically measure these attributes inan ORM design space, we use a set of metrics sug-gested by Holder et al. [28] and Baroni et al. [15]. Themetrics are called Table Access for Type Identification(TATI ), Number of Corresponding Table (NCT), Numberof Corresponding Relational Fields (NCRF), AdditionalNull Value (ANV), Number of Involved Classes (NIC)and Referential Integrity Metric (RIM). In the following,we present three of these metrics for time and spaceperformance.

    Table Accesses for Type Identification (TATI)

    Table Accesses for Type Identification (TATI) is a perfor-mance metric for polymorphic queries [28]. Intuitively,a high value for TATI(C) implies a longer executiontime for a polymorphic query on C. According to thedefinition, given a class C, TATI(C) defines the numberof different tables that correspond to C and all its sub-classes. Our tools total up TATI values for each class asthe overall TATI measure for each solution alternative.

    Number of Corresponding Tables (NCT)

    Number of Corresponding Tables (NCT) is a perfor-mance metric for insert and update queries. Intuitively,the performance of such queries mainly depends on thenumber of tables that hold data of the requested object.This metric, thus, specifies the number of tables thatcontain data necessary to assemble objects of a givenclass [28]. According to the definition, given a class C,NCT(C) equals to NCT of its direct super class, if C ismapped to the same table as its super class. Otherwise,if C is mapped to its own table, NCT(C) equals to NCTof its direct super class plus one. Finally, if C is a rootclass, NCT(C) equals to 1. Our tool computes totaledNCT values over classes as the NCT measure for eachsolution alternative.

  • IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 10

    Additional Null Value (ANV)The Additional Null Value (ANV) metric specifies thestorage space for null values when different classes arestored in a common table [28]. According to the defini-tion, given a class C, ANV(C) equals to the number ofnon-inherited attributes of C multiplied by the numberof other classes that are being mapped to the particulartable to which C is mapped. Our tools present totals forANV values over all classes as the ANV measures foreach solution alternative.

    To apply these metrics to synthesized solutions, wedesigned specific Alloy queries. Here we describe onefor measuring the TATI metric. The others are evaluatedsimilarly.

    TATI(C) = #(C.*(∼parent).∼tAssociate)

    Here the dot operator denotes a relational join. The Al-loy ∼ operator represents the transpose operation over abinary relation, which reverses the order of atoms withinthe relation. Given the tAssociate (abstraction) relationthat maps tables to their associated elements (i.e. Classor Association) within the object model, its transposeis the relation that maps each element to its associatedtable within the relational structure. The Alloy * operatorrepresents the reflexive-transitive closure operation of arelation. Accordingly, the expression of “C.*(∼parent)”states a set of classes that have the class “C” as theirancestor in their inheritance hierarchy. The query expres-sions then, by using the Alloy set cardinality operator #,computes the TATI metric.

    Our static metrics suite comprises six such staticmeasures. The vector of these functions defines a 6-dimensional static analysis function applicable to Alloy-synthesized concrete designs (e.g., Figure 6). Our toolsmap this function over all elements of a synthesizeddesign space to produce a tradeoff surface. The spider di-agram, shown in Figure 5, illustrates one Pareto-optimalpoint on that surface for our example customer-ordersystem. To display quality measures in one diagram, wenormalized the values. Such diagrams can assist in con-ducting tradeoff analyses by making it easier to visualizeand compare alternatives. According to the diagram, ifthe designer opts for performance, she may decide to useSol. 5 instead of Sol. 4, as the latter has worse values forthe TATI and NCT performance metrics.

    4.4.3 Exploring, Evaluating and ChoosingThe next step is to explore and prune the space ofmapping alternatives according to quality measures.Trademaker partitions the space of satisfactory mixedmapping specifications into equivalence classes and se-lects at most a single candidate from each equivalenceclass for presenting to the end-user.

    To partition the space, Trademaker evaluates eachalternative with respect to previously described relevantmetrics. So each equivalence class consists of all alter-natives that exhibit the same characteristics. Specifically,two alternatives a1 and a2 are equivalent if value(a1, mi)

    Fig. 5: Multi-dimensional quality measures for pareto-optimal solutions.

    = value(a2, mi) for all metrics (mi). Because equivalentalternatives all satisfy the mapping constraints, we selectone alternative in each equivalence class to find a choicealternative. Given that quality characteristics are usuallyconflicting, there is generally no single optimum solutionbut there are several pareto-optimal choices representingbest trade-offs.

    Having computed satisfying solutions, we then un-parse these solutions from intermediate representationin relational logic into a desirable form, here SQL coun-terparts.

    Fig. 6: OR mapping for customer-order example

    Figure 6 presents a graphical depiction of an Alloy ob-ject encoding a synthesized OR mapping solution, whereovals represent tables, solid rectangles represent classes,and dotted boxes represent attributes. This solution isone of five Pareto-optimal solutions in the design spacefor our customer-order object model. The diagram isaccurate but edited to omit some details for readability.In this diagram, Table1 is associated to Customer andPreferredCustomer classes, and Table0 is associated toboth Order and CustomerOrderAssociation.

    From this Alloy solution, our tools generate the SQLscript of Listing 1. The script sets up a database with

  • IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 11

    the two tables: Order, with attributes orderID, customerIDand orderValue; and Customer, with attributes, customerID,customerName, discount, and DType. Both Customer andPreferredCustomer objects are stored in this table underthis particular mapping strategy, with the DType fielddistinguishing the type of record stored.

    5 ABSTRACT LOAD SYNTHESISOur approach to synthesizing abstract loads starts withthe automated transformation of a given Alloy-OMmodel into a related Alloy specification that we call aload model. We then use the Alloy Analyzer to synthe-size abstract loads from this load model. Alloy solutionsto the load model encode abstract object model datainstances (OM-instances), which are what we take as syn-thesized loads with which to test synthesized designs.This section describes this functionality in more detail.

    For each instance of Class and Association in theAlloy-OM model, our model transformer synthesizes asignature definition. When the class under considerationinherits from another class, the synthesized signaturedefinition extends its parent signature definition. Giventhe specification of Order represented in Listing 3, thefollowing code snippet represents its counterpart in asynthesized load model.s ig Order{

    orderValue : one Int ,orderID : one I n t

    }

    The one multiplicity constraints used in the declara-tion of elements’ signatures within the Alloy-OM model(Listing 3) specify them as singleton signatures. Whilethese constraints are required by the tradeoff spacegenerator (e.g. to not generate multiple tables for a classin the model), they are unneeded for load generation,and thus omitted in the load model. The element at-tributes in the object model are also declared as fields ofthe corresponding load signature definition representingrelations from the signature to the attribute type.

    Finally, two sets of constraints are synthesized asfact paragraphs in the load model to guarantee bothreferential integrity of generated data as well as unique-ness of element identifiers with reference to the set ofelement instances to be generated. Referential constraintsrequire every value of a particular attribute of an elementinstance to exist as a value of another attribute in adifferent element.

    Consider the association relationship between Cus-tomer and Order classes from our running example(Figure 2). The code snippet of Listing 11 represents syn-thesized constraints in the load model for the customer-order association.

    The expression of lines 1–3 states that if any twoelements of type CustomerOrderAssociation have the sameorderID and customerID, the elements are identical. Thisconstraint rules out duplicate elements. The fact con-straint of lines 5–8 states that for any orderID andcustomerID fields of a CustomerOrderAssociation, there

    are Order and Customer instances with the same orderIDand customerID.

    Applying the Alloy Analyzer to the derived loadmodel yields the desired load in the form of object modeldata instances (OM-instances). Figure 7 depicts a gener-ated OM-instance, which is essentially an Alloy solutionobject, where ovals represent associations, solid rectan-gles represent classes, and dotted boxes represent values.This solution represents two customers with customerIDof 64 and 225, the latter a preferred customer with 10percent discount, along with their orders. From manysuch solutions we derive an abstract (application-object-model-level, rather than concrete-database-schema-level)load with which to test the performance of manydatabase instances.

    Improving the efficiency of the load generator. One ofthe challenges we faced involved the scalability of thisapproach to load synthesis. A large number of solutionsgenerated by the Alloy Analyzer were symmetric to pre-viously generated instances, and thus did not contributeusefully to the load being generated. We explored anumber of ways to improve efficiency of the load gener-ator. The one that we found worked best is the iterativerefinement of the load model by adding constraints thateliminate permutations of the already generated OM-instances. Without this improvement, it took 21 hoursfor Trademaker to generate test loads for one of ourexperiments. Given this approach, the time was reducedto about 2 hours—an order of magnitude speed up inthe synthesis of test loads.

    6 ABSTRACT LOAD CONCRETIZATIONThe next challenge we discuss is to convert abstractload OM-instance objects into concrete SQL queries ona per-database basis. This is the task of specializingabstract load elements to the variant schemas presentedby different solutions in the design space. Our Alloy-to-SQL transformer handles this task. To create SQL state-ments for a given database, the Alloy-to-SQL transformerrequires an inverse OR mapping, namely the abstrac-tion function that describes how a particular, concretedatabase schema implements the abstract object model.

    Algorithm 1 outlines this transformation for insertqueries, as it suffices to make our point. The approachsupports the generation of select and update queries aswell, which are important, of course, for comprehensivedynamic analysis.

    The logic of the algorithm is as follows. Iterate over allelements in a given OM-instance (e.g., classes and asso-ciations) whose values can be populated into databasesthrough insert statements. Look up the mapping todetermine the table in which the element values shouldbe stored. For each relational field in the associated table,if the OM-instance contains a value corresponding tothat field, insert the value into the field. Otherwise, inthe case that the field is a DType, insert the name of theelement into the field. Finally, if the field is a foreignKey,

  • IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 12

    Fig. 7: An example of OM-instance.

    1 f a c t {2 a l l o1 , o2 : CustomerOrderAssociation | o1 . orderID = o2 .

    orderID and o1 . customerID = o2 . customerID => o1=o23 }4

    5 f a c t {6 a l l o : CustomerOrderAssociation | one c : Order | o . orderID =

    c . orderID7 a l l o : CustomerOrderAssociation | one c : Customer | o .

    customerID = c . customerID8 }

    Listing 11: Part of the load model specification generatedfor customer-order association.

    Algorithm 1: Generate SQL Insert StatementsInput: omi: OM-instance, map: OR mappingOutput: A set of SQL insert statements

    1 for element in omi do2 T = map.TableAssociat(element);3 F = T.fields;4 for field in F do5 value = getValueFromOMI(field);6 if value != null then7 add “field = value” into statements8 end9 else

    10 if field == “DType” then11 value = element.name;12 end13 if isForeignKey(field) then14 attr =

    findAttributeFromAssociation(field);15 value = getValueFromOMI(attr);16 end17 add “field = value” into statements18 end19 end20 end

    find the associated attribute from a relevant associationin the given OM-instance, and insert its value into thefield.

    Consider the database alternative for our running ex-ample, in which we store the customer-order associationdata into the order table (Figure 2b). In that case, the fieldof customerID in the Order table is a foreignKey, and its

    1 INSERT INTO ‘ Customer ‘ ( ‘ customerID ‘ , ‘ DTYPE‘ ) VALUES ( 6 4 , ’Customer ’ ) ;

    2 INSERT INTO ‘ Customer ‘ ( ‘ customerID ‘ , ‘ DTYPE‘ ) VALUES ( 2 2 5 , ’PreferredCustomer ’ ) ;

    3 INSERT INTO ‘ Order ‘ ( ‘ orderID ‘ , ‘ orderValue ‘ , ‘ customerID ‘ )VALUES ( 1 8 4 , 5 1 1 , 6 4 ) ;

    4 INSERT INTO ‘ Order ‘ ( ‘ orderID ‘ , ‘ orderValue ‘ , ‘ customerID ‘ )VALUES ( 3 6 6 , 5 1 0 , 2 2 5 ) ;

    Listing 12: Generated SQL insert statements from OM-instance of Figure 7 for implementation mapping ofFigure 6.

    values come from the associated customerOrderAssoci-ation element.

    Listing 12 represents the set of SQL insert statementsgenerated from the OM-instance of Figure 7 accordingto the mapping of Figure 6. The first two generatedstatements define insert queries to store instances ofCustomer and PreferredCustomer into the Customertable along with appropriate DType values for each one.The next two statements then store instances of Orderand CustomerOrderAssociation into the Order table.

    7 TOOL IMPLEMENTATIONWe have implemented our approach in a tool calledTrademaker, which is freely available4. Trademaker isa web-accessible tool that implements our ORM syn-thesis and analysis approach. It supports automated,specification-driven synthesis of ORM design spacesand static analysis using the aforementioned metrics. Itprovides a web interface, user account and job manage-ment (job submission, asynchronous execution, statusreporting, persistence), computation and presentationof Pareto-optimal subsets of synthesized designs underthe given metrics, and synthesis of SQL databases forselected designs.

    Figure 8 presents a screenshot of a Trademaker run.Rows present Pareto-optimal designs, and columns,analysis results. It takes an object model as input, ex-pressed in the Alloy-OM DSL (cf., section 4.1), usesthe Alloy Analyzer to exhaustively enumerate satisfy-ing solutions, applies static analysis functions to eachdesign, filters them for Pareto optimality, presents the

    4. Research artifacts and experimental data are available athttp://www.jazz.cs.virginia.edu:8080/Trademaker

  • IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 13

    Fig. 8: A view of our tool to provide decision-makerswith Pareto-optimal OR mapping solutions based onstatic analysis results; columns and rows represent met-rics and solution alternatives, respectively.

    results—design solutions and property estimates in sixdimensions—and delivers usable MySQL scripts to in-stantiate selected designs.

    8 EXPERIMENTAL VALIDITY TEST OF STATICPREDICTION RULES

    As an experimental test of our approach to specification-driven, automated dynamic analysis of non-functionalproperty tradeoffs across design spaces, we apply theapproach to test the validity of the static predictors ofdatabase performance. We formulate, test, and provideexperimental data in support of three driving hypothe-ses:• H1: The ordering of alternatives predicted by the

    static metrics predicts that of the dynamic analysisresults

    • H2: The relative magnitudes of static measures ofalternatives predict those of the dynamic analysisresults

    • H3: Dynamic analysis using scale-limited synthe-sized loads predicts performance under much largerloads

    This section summarizes the design and execution ofour experiment, the data we collected, its interpretation,and our results.

    8.1 Subject SystemsWe synthesized design spaces and compared static pre-dictions with dynamic results for six subject systems, se-lected from different sources and of a variety of differentdomains, ranging from our research lab projects to appli-cations adopted from the database literature and open-source software communities. The selected experimentalsubjects are representative of a large class of useful appli-cations at a scale matched to the state-of-the-art synthesistechniques. Their object models vary in terms of sizeand complexity, and contain multiple association andinheritance relationships, which induce many possiblechoices of object-model to relational schema mappings.Table 1 shows characteristics of these systems in termsof the number of classes, associations, and inheritancerelationships.

    The first is the object model of an E-commerce systemadopted from Lau and Czarnecki [34]. It represents acommon architecture for open-source and commercial E-commerce systems. It has 15 classes connected by 9 asso-ciations with 7 inheritance relationships. The second andthird object models are for systems we are developingin our lab. Decider is another system to support designspace exploration. Its object model has 10 Classes, 11Associations, and 5 inheritance relationships. The thirdobject model is for a system, CSOS, a kind of cyber-social operating system meant to help coordinate peopleand tasks. In scale, it has 14 Classes, 4 Associations, and6 inheritance relationships. The fourth and fifth objectmodels are from two open source applications. We ob-tained their object models by reverse engineering of theirdatabase schemas. WordPress is an open source blogsystem [3]. Its object model has 13 classes connected by10 associations with 8 inheritance relationships. Moodleis a learning management system [1], which is widelyused in colleges and universities. It has 12 classes con-nected by 8 associations and consists of 4 inheritancerelationships. We also analyzed an extended version ofour customer-order example.

    TABLE 1: Subject systems.

    Subject # Classes # Associations # InheritanceSystem relationshipsDecider 10 11 5

    E-commerce 15 9 7CSOS 14 4 6

    WordPress 13 10 8Moodle 12 8 4

    Cust-order (ext) 4 2 1

    8.2 Planning and ExecutionOur experimental procedure involved the synthesis ofboth design spaces of database alternatives and severalabstract loads in a variety of sizes for each subjectsystem. Given the synthesized schemas, we created adatabase for each alternative. We then populated gener-ated data into databases, and ran concrete queries overthose databases. We measured and collected the numberof concrete queries generated from abstract loads foreach database alternative, query execution time, as wellas the size of each database.

    We used an ordinary PC with an Intel Core i73.40 Ghz processor and 6 GB of main memory, withSAT4J as our SAT solver. Database queries were per-formed on a MySQL 5.5.30 database management sys-tem (DBMS), installed on a machine equipped withan AMD Opteron 6134 800 Mhz processor and 64GBmemory. Data and statistical information are availableat http://jazz.cs.virginia.edu:8080/Trademaker/data.

    Table 2 summarizes the generated solution space foreach subject system. There is one row for each system.The columns indicate the total number of solutions, thenumber of static equivalence classes where equivalenceis determined by equality of static analysis results, and

  • IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 14

    TABLE 2: Design space sizes for subject systems.

    Subj. Sys. Solutions Eq.Classes Pareto Sols.Decider 386 154 12

    E-commerce 846 360 16CSOS 278 121 21

    WordPress 924 276 13Moodle 586 144 12

    Cust-order (ext) 28 14 10

    TABLE 3: Part of the generated data sets for the E-commerce experiment; the second row shows abstractloads generated for the E-commerce system within eachdata set; each cell in the other rows corresponds to thesize of generated concrete load (in terms of the numberof select/insert statements) for the database alternativeand data set given on the axes.

    E-commerce Dataset 1 Dataset 2 Dataset 3abstract load 862 2,576 164,813

    Sol.19 456/678 13,698/20,172 2,471,700/3,100,350Sol.121 320/540 9,770/16,244 1,647,800/2,276,450Sol.264 397/618 12,073/18,547 2,142,140/2,770,790Sol.348 379/600 11,395/17,869 1,977,360/2,606,010

    the number of Pareto-optimal solutions under the givenstatic metrics.

    We investigated and compared two different methodsfor generation of data sets. The first method generateddata using our formal synthesis methods. For the sec-ond, we hand-developed a load generator for generatinglarge loads that nevertheless respect the constraints inour object models (e.g., referential constraints betweenelements).

    Three data sets were developed for each subject sys-tem to support the task of evaluating the static metrics.

    Dataset 1. This data set is generated using the Alloy-based data generator, where the maximum bit-width forintegers is restricted to 5. This leads to the generation ofa small data set for our experiments.

    Dataset 2. This data set is generated using our Alloy-based data generator. The maximum bit-width for inte-gers is restricted to 10, which leads to the generation ofa larger data set compared with the former data set.

    Dataset 3. As with many formal techniques, the com-plexity of constraint satisfaction restricts the size of mod-els that can practically be analyzed and synthesized [23],[32]. For experimental purposes, we hand-implementeda more scalable data generator. It does not generatequeries directly, but rather replaces the constraint solverfor synthesis of abstract loads. Having synthesized largerabstract loads in the form of OM-instances, using themechanisms already used in the Alloy-based data gener-ator (cf. Section 5), the generator then transforms abstractloads into sets of concrete queries targeting diverseimplementation alternatives.

    Table 3 presents the size of generated data sets forsome of the solution alternatives for the E-commercesystem. Observe that the sizes of concrete queries—in theform of select/insert statements—refined from commonabstract loads are different in various solution alterna-

    tives, depending on the way that each implementationmapping alternative refines an abstract object model intothe concrete representation in relational structure (cf.Section 6).

    Table 4 presents the time that it takes to executethe generated concrete loads for the same solution al-ternatives, as shown in Table 3, for the E-commercesystem. We have handled uncontrollable factors in ourexperiments by repeating each experiment 10 times andcomputing the average execution time. According to thetable, the way that each solution alternative refines theabstract data loads into the concrete SQL statementsdirectly affects the the time that it takes to execute theircorresponding concrete loads, and thus influences theperformance of the design solution.

    In the rest of this section, we present the experimentaldata to address the three hypotheses driving our re-search.

    8.3 Results for Hypothesis H1 (Order)To test the predictive accuracy of our static metrics,we compared its predictions against the results of ourdynamic analysis. To evaluate our first hypothesis—whether the relative order of implementation alterna-tives is predicted by static metrics—we compute Spear-man correlation coefficients, an appropriate correlationstatistic for order-based consistency analysis. It measuresthe degree of consistency between two ordinal vari-ables [41]. A correlation of 1 indicates perfect correlation,while 0 indicates no meaningful correlation. Negativenumbers indicate negative correlations.

    Table 5 summarizes correlation coefficients betweenstatic metrics and dynamic measures. The data showreasonably strong but somewhat inconsistent positivecorrelations between statically predicted and actual run-time performance for TATI (average correlation of 0.90)and NCT (average correlation of 0.88). These metricsappear moderately to strongly predictive of the relativeordering databases run-time performance, at least for thekinds of loads employed in our experiments.

    The performance of the ANV predictor varies acrossthe subject systems. ANV predicts well in the E-commerce, Decider, and Moodle experiments and mod-erately in the CSOS data, but weakly in the WordPressand customer-order-extended sets. Moreover, the results

    TABLE 4: The running time (in seconds) to execute thegenerated concrete loads (select/insert SQL statementqueries) for the solution alternatives shown in Table 3,for the E-commerce system. To account for the effects ofuncontrollable factors in our experiments, we ran eachset of data 10 times, and computed the average.

    E-commerce Dataset 1 Dataset 2 Dataset 3Sol.19 0.090/0.313 2.584/5.802 763.920/401.420Sol.121 0.069/0.255 1.921/4.744 555.079/305.316Sol.264 0.084/0.276 2.251/5.318 694.825/346.232Sol.348 0.083/0.272 2.174/5.110 623.970/329.959

  • IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 15

    show negative correlation between the ANV metric anddatabase size. As the number of null values increases,size decreases. This observation for the ANV metric is indirect contrast to what is predicted. One possible reasonis that when the ANV metric increases, the numberof tables for the database solution under considerationdecreases. Assuming that the database system efficientlystores null values, the database size would reduce.

    To further evaluate predictive accuracy of static met-rics, we consider the case in which designers use eachstatic metric as a two-class classifier. We, thus, measureprecision, recall and F-measure as follows:Precision is the percentage of those alternatives pre-dicted by a given metric as more preferable in termsof a given quality attribute that were also classified asmore preferable by the actual analysis: TPTP+FPRecall is the percentage of alternatives classified morepreferable by the actual analysis that were also predictedas more preferable by the a given metric: TPTP+FNF-measure is the harmonic mean of precision and recall:2∗Precision∗RecallPrecision+Recall

    where TP (true positive), FP (false positive), and FN(false negative) represent the number of solution alter-natives that are truly predicted as preferable, falselypredicted as preferable, and missed, respectively.

    While the static metrics output predictions of qualitycharacteristics as natural numbers, our actual dynamicanalysis of query execution performance and requiredstorage space are in terms of seconds and bytes, respec-tively. To classify an alternative as preferable, we thus usemedian for each set of result values as a threshold. Wemeasure evaluation metrics for each subject system withrespect to three data sets.

    Table 6 summarizes the results of our experimentsto evaluate the accuracy of static metric predictors astwo-class classifiers. The average precision, recall andF-measure are depicted in Figure 9. The results showthe accuracy of the TATI and NCT metrics in classifying

    TABLE 5: Correlation coefficients between the relativeorder of solution alternatives predicted by static metricsand those observed from actual runtime measures.

    TATI NCT ANV

    DeciderDataset1 0.93 0.93 -0.98Dataset2 0.96 0.96 -0.92Dataset3 0.95 0.92 -0.91

    E-commerceDataset1 0.97 0.96 -0.95Dataset2 0.98 0.95 -0.95Dataset3 0.97 0.94 -0.95

    CSOSDataset1 0.83 0.76 -0.97Dataset2 0.57 0.62 -0.79Dataset3 0.58 0.74 -0.55

    WordPressDataset1 0.95 0.97 -0.25Dataset2 0.96 0.97 -0.31Dataset3 0.96 0.97 -0.26

    MoodleDataset1 0.95 0.90 -0.98Dataset2 0.99 0.96 -0.95Dataset3 0.99 0.92 -0.98

    Cust-order(ext)Dataset1 0.89 0.92 -0.51Dataset2 0.74 0.53 -0.44Dataset3 0.66 0.82 -0.56

    TABLE 6: Experimental results of evaluating OR metricsas two-class classifiers.

    TATI NCT ANV

    Prec

    isio

    n

    Rec

    all

    F-m

    easu

    re

    Prec

    isio

    n

    Rec

    all

    F-m

    easu

    re

    Prec

    isio

    n

    Rec

    all

    F-m

    easu

    re

    DeciderDS1 1 0.75 0.86 0.83 0.83 0.83 0 0 0DS2 0.83 0.83 0.83 1 1 1 0.17 0.17 0.17DS3 1 1 1 0.83 0.83 0.83 0.17 0.17 0.17

    E-commerceDS1 1 1 1 1 1 1 0 0 0DS2 1 1 1 1 1 1 0 0 0DS3 0.9 0.9 0.9 1 1 1 0 0 0

    CSOSDS1 0.75 0.82 0.78 0.75 0.82 0.78 0.36 0.33 0.35DS2 0.83 1 0.91 0.67 0.80 0.73 0.36 0.33 0.35DS3 0.83 1 0.91 0.83 1 0.91 0.36 0.33 0.35

    WordPressDS1 1 1 1 1 1 1 0.14 0.14 0.14DS2 1 1 1 1 1 1 0.14 0.14 0.14DS3 1 1 1 1 1 1 0.25 0.29 0.27

    MoodleDS1 1 0.75 0.86 0.88 1 0.93 0.17 0.17 0.17DS2 1 1 1 0.75 1 0.86 0.17 0.17 0.17DS3 1 1 1 0.75 1 0.86 0 0 0

    Cust-order DS1 1 1 1 1 1 1 0.43 0.60 0.50

    (ext) DS2 0.83 1 0.91 0.67 0.80 0.73 0.57 0.67 0.62DS3 0.83 1 0.91 0.83 1 0.91 0.43 0.60 0.50

    Fig. 9: Bar plot of the average precision, recall andF-measure for considering static metrics as two-classclassifiers.

    implementation alternatives in terms of their run-timeperformance. The average precision and recall for allfour experiments are about 90%, showing a low rateof both false positives and false negatives. The ANVmetric, however, achieves an average under 30% in allevaluation metrics.

    The experimental data thus suggests that, under thegenerated abstract loads, the relative order of implemen-tation alternatives predicted by static metrics of TATIand NCT is indicative of their comparative preferencein actual runtime performance, but this is not the casefor ANV as a static predictor of storage space.

    8.4 Results for Hypothesis H2 (Magnitudes)To address the second hypothesis—relative magnitude ofstatic predictions matter—we employ a coefficient of de-termination denoted R2, as a metric for how well actualoutcomes are predicted by the static metrics. Figure 10plots the results. For brevity, only results from Dataset 3are presented; other data sets give similar results.

    The performance of the predictors varies widely acrosssystems and predictors. TATI and NCT are predictive

  • IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 16

    of performance for four of the systems, i.e., the E-commerce, Decider, WordPress, and Moodle systems,but relatively poor predictors for CSOS. TATI performspoorly in the customer-order-extended data. ANV pre-dicts size in the E-commerce and Moodle experiments,and moderately in the Decider and WordPress data, butinconsistently and weakly in the CSOS data and not atall in the customer-order-extended set.

    We interpret this data as suggesting that the rela-tive magnitudes of static metrics for various solutionalternatives are not reliably indicative of the relativemagnitudes of actual performance, and that ANV is apoor indicator of the storage space. One is advised to usesuch static metrics with caution. While TATI and NCTmetrics predict the relative order of solution alternativeswith high confidence, the difference in predicted valuesof two alternatives is not a good indicator of their actualrun-time difference.

    8.5 Results for H3 (Small vs. Large Loads)To address the third hypothesis—that small, formallysynthesized loads predict the outcomes of much largerloads—we employ the Pearson product-moment corre-lation statistic. Pearson measures the degree of lineardependence between two variables, not necessarily or-dinal, as opposed to the Spearman test. A correlationof 1 represents perfect correlation, and 0, no meaningfulcorrelation.

    We summarize correlation coefficients between experi-mental results obtained from smaller data sets of 1 and 2and that of the large Dataset 3 in Figure 11. The averagePearson correlation coefficient between Dataset3 and thefirst and second data sets are 88% and 94%, respectively.These data lend support to the proposition that smaller-scale test sets produced by specification-driven synthesiscan provide valid predictions of performance underlarger, more realistic loads.

    9 DISCUSSION AND LIMITATIONSThe overarching problem this work addresses is inter-esting and important: the need for improved scienceand technologies to support decision making in complexand poorly understood tradeoff spaces, particularly in-volving tradeoffs among non-functional properties, alsosometimes called ilities. We need languages in which tospecify design spaces, techniques for synthesizing andanalyzing design spaces, mechanisms for mapping staticand dynamic analysis functions across design spaces,techniques for validating such metrics, and tools thatenable engineers to use the science to improve realengineering practice. We also need measures for ilitiesthat are important but hard to measure today: for evolv-ability, some dependability properties, affordability ofconstruction, and more.

    Trademaker takes an important step towards thisoverarching objective in the context of object-relationaldatabase mappings, but we envision the ideas set forth

    in this research to find a broader application in othercomputing domains as well. It suggests the possibilityof useful formal languages for specifying design spacesin support of formal synthesis of both designs and com-parative analysis loads. We showed that specializationof common loads is enabled by access to abstractionfunctions from concrete to abstract designs, which can beembedded in the results of design synthesis. Concretiza-tion functions proved useful not only for scale-limited,formally synthesized loads, but for concretizing abstractloads produced by other means.

    Our experimental analysis using Trademaker to testthe validity of static predictors of database performancebased on published but not validated metrics indicatesthat two of the metrics appear to produce meaningfulsignals, while the third appears not useful. The dataalso indicate a need for caution in relying on the staticmetrics. Their predictive accuracy, even in the “good”cases, varied across application models. That said, wecan now provide automated dynamic analysis as a fall-back. We are integrating support for invoking such au-tomated analysis into our Trademaker tool. Trademakeritself has real potential utility for object-relational map-ping and partial application synthesis; but its greatersignificance is as a demonstration of our research resultsand a testbed for further research on formal tradespacemodeling and analysis.

    There are of course limitations in our approach and inthis work. We mention those most relevant to a properevaluation of this effort. First, the static metrics weevaluated sum the values of published metrics over theelements of each design alternative. We thus extendedthe original metrics and our statistical results shouldtechnically be read as pertaining to these extensions ofthe original measures.

    Second, while our synthesis mechanisms are imple-mented and working, our infrastructure for runningsynthesized concrete loads against synthesized designsstill relies on some manual processing. Our statisticaldata were thus derived by dynamic analyses of cer-tain subsets of our synthesized designs. We selected thesubsets deemed Pareto-optimal by the static metrics. Asour infrastructure matures, we will conduct whole-spacedynamic analyses, which we expect to produce resultsconsistent with the basic result presented here. We areon a path to support automated whole-space dynamicanalysis through Trademaker. The work reported inthis paper did nevertheless involve the synthesis anddynamic analysis of over 300 database alternatives.

    Third, our experiments to date tested our hypothesesfor “random” loads of varying sizes. Real applicationswill generally produce non-random loads. Whether thestatic metrics we tested are predictive for large, realapplications remains unclear. On the other hand, weoffer dynamic analysis at scale as an alternative to staticmetrics. The proposed Trademaker tool suite furthersupports reasonable extension for new types of test loadgenerators. In fact, separating abstract load synthesis

  • IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 17

    Fig. 10: Correlation between static metrics and actual run-time measure; columns represent scatter plots of observedvalues across systems versus predicted values by TATI, NCT and ANV metrics from left to right, respectively; R2

    correlation coefficient is shown at the bottom of each plot.

  • IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 18

    Fig. 11: Summary of Pearson correlation coefficientsbetween experimental results obtained from smaller datasets and that of the large Dataset3.

    from load concretization into variant designs, as pro-posed in the Trademaker architecture (cf. Section 3),enables extension/revision of each, independent of theother, thus supports the necessary extensions to enrichthe tradeoff analysis environment. We envision a futurein which some systems run many design variants inparallel, perhaps with small but representative loadsabstracted from real loads on live systems, to detectconditions in which dynamic switching to new imple-mentation strategies should be considered.

    Finally, there is the issue of scalability. Using Alloyas a constraint solver entails scalability constraints. Wecan handle object models with tens of classes. Industrialdatabases often involve thousands of classes. It is un-likely that our current implementation technology willwork at that scale. For now, it does have real potentialas an aid to smaller-scale system development. That wecan present an object model for a realistic web service,synthesize a broad space of ORM strategies, select onebased on tradeoff analysis, automatically obtain an SQL-database setup script, provide it Java EE, and have muchof an enterprise-type application up and running withlittle effort is exciting, even if it does not address (yet)the most demanding needs of industry.

    10 RELATED WORKWe can identify in the literature a large body of researchrelated to ours. Here, we provide an overview of themost notable research and examine them in the light ofour research.

    Formal derivation of database implementations.A number of researchers have proposed formal ap-proaches for deriving database-centric implementationsfrom high-level specifications [19], [33]. Alchemy refinesAlloy specifications into PLT Scheme implementationswith a special focus on persistent databases [33]. Alongthe same line, Cunha and Pacheco proposed an approachthat translates a subset of Alloy into the correspondingrelational database operations [19]. Both Alchemy andCunha and Pacheco’s approach refine the specification

    into a single implementation, whereas Trademaker gen-erates spaces of possible database design alternatives.While these research efforts share with ours the emphasison using formal methods, our work differs fundamen-tally in its emphasis on the generation of spaces ofimplementation alternatives, not just point solutions.

    Object-relational mapping. A large body of workhas focused on object-relational mapping approaches tothe object-relational impedance mismatch problem [18],[29], [31], [35]. Philippi [35] categorized the mappingstrategies in a set of pre-defined quality trade-off levels,which are used to develop a model driven approachfor the generation of OR mappings. This work similarto many other work we studied derives a single de-sign solution from input specifications. Moreover, theydid not apply static metrics, nor dynamic analysis tomeasure the effectiveness of design alternatives. Ourtechnique is inspired in part by the work of Cabibbo andCarosi [18], discussed more complex mapping strategiesfor inheritance hierarchies, in which various strategiescan be applied independently to parts of a multi-levelhierarchy. Our approach is novel in having formalizedORM strategies previously informally described in someof these research efforts, thereby enabling automaticgeneration of OR mappings for each application objectmodel.

    Drago et al. [21] considered OR mapping strategiesas a variation points in their work on feedback provi-sioning. They extended the QVT-relations language withannotations for describing design variation points, andprovided a feedback-driven backtracking capability toenable engineers to explore the design space. While thiswork is concerned with the performance implications ofchoices of per-inheritance-hierarchy OR mapping strate-gies, it does not attack the problem that we address,the automation of dynamic analysis through synthesisof design spaces and fair loads for comparative dynamicanalysis.

    The other relevant thrust of research has focusedon mapping UML models enriched with OCL invari-ants into relational structures and constraints. Amongothers, Heidenreich et al. [27] developed a model-driven framework to map object models representedin UML/OCL into declarative query languages, suchas SQL and XQuery. While Heidenreich et al.’s ap-proach concentrates on mapping OCL invariants intoan implementation-level language to enforce semanticaldata integrity at the implementation level, Trademakerautomatically generates database schemas mainly basedon structural constraints. Badawy and Richta [6] pro-vided some rules guiding derivation of declarative con-straints and triggers from OCL specifications. These tworesearch work are complementary. Extending the sameline, Al-Jumaily et al. [26] developed a model-driventool transforming the OCL constraints into SQL triggers.Demuth et al. [20] also discussed a number of differentapproaches to implement OCL-to-SQL mapping, anddeveloped a tool that transforms each OCL invariant

  • IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 19

    into a separate SQL view definition. Different fromthese research efforts transforming an object model toa single counterpart in relational structures, Trademakergenerates tradeoff spaces of object-relational mappingswith focus on structural mapping alternatives, ratherthan transformation of integrity constraints.