Manufacturing & Service Operations Management · ISSN 1523-4614 (print) Š ISSN 1526-5498 (online) ...

This article was downloaded by: [175.176.173.30] On: 28 April 2015, At: 20:14Publisher: Institute for Operations Research and the Management Sciences (INFORMS)INFORMS is located in Maryland, USA

Manufacturing & Service Operations Management

Publication details, including instructions for authors and subscription information:http://pubsonline.informs.org

Linking Cyclicality and Product QualityManuel E. Sosa, Jürgen Mihm, Tyson R. Browning,

To cite this article:Manuel E. Sosa, Jürgen Mihm, Tyson R. Browning, (2013) Linking Cyclicality and Product Quality. Manufacturing & ServiceOperations Management 15(3):473-491. http://dx.doi.org/10.1287/msom.2013.0432

Full terms and conditions of use: http://pubsonline.informs.org/page/terms-and-conditions

This article may be used only for the purposes of research, teaching, and/or private study. Commercial useor systematic downloading (by robots or other automatic processes) is prohibited without explicit Publisherapproval, unless otherwise noted. For more information, contact [email protected].

The Publisher does not warrant or guarantee the article’s accuracy, completeness, merchantability, fitnessfor a particular purpose, or non-infringement. Descriptions of, or references to, products or publications, orinclusion of an advertisement in this article, neither constitutes nor implies a guarantee, endorsement, orsupport of claims made of that product, publication, or service.

Copyright © 2013, INFORMS

Please scroll down for article—it is on subsequent pages

INFORMS is the largest professional society in the world for professionals in the fields of operations research, managementscience, and analytics.For more information on INFORMS, its publications, membership, or meetings visit http://www.informs.org

http://pubsonline.informs.org

http://dx.doi.org/10.1287/msom.2013.0432

http://pubsonline.informs.org/page/terms-and-conditions

http://www.informs.org

MANUFACTURING & SERVICEOPERATIONS MANAGEMENT

Vol. 15, No. 3, Summer 2013, pp. 473–491ISSN 1523-4614 (print) � ISSN 1526-5498 (online) http://dx.doi.org/10.1287/msom.2013.0432

© 2013 INFORMS

Linking Cyclicality and Product Quality

Manuel E. SosaTechnology and Operations Management, INSEAD, Singapore 138676,

[email protected]

Jürgen MihmTechnology and Operations Management, INSEAD, 77300 Fontainebleau, France,

[email protected]

Tyson R. BrowningDepartment of Information Systems and Supply Chain Management, Neeley School of Business,

Texas Christian University, Fort Worth, Texas 76129, [email protected]

This paper examines the impact of architectural decisions on the level of defects in a product. We viewproducts as collections of components linked together to work as an integrated whole. Previous work has

established modularity (how decoupled a component is from other product components) as a critical determinantof defects, and we confirm its importance. Yet our study also provides empirical evidence for a relationshipbetween product quality and cyclicality (the extent to which a component depends on itself via other productcomponents). We find cyclicality to be a determinant of quality that is distinct from, and no less importantthan, modularity. Extending this main result, we show how the cyclicality–quality relationship is affected bythe centrality of a component in a cycle and the distribution of a cycle across product modules. These findings,which are based on an analysis of open source software development projects, have implications for the studyand design of complex systems.

Key words : product architecture; cycles; modularity; iterative problem solving; defectsHistory : Received: February 27, 2012; accepted: January 1, 2013. Published online in Articles in Advance

April 11, 2013.

1. IntroductionThis paper studies the relationship between the deci-sions that establish a product’s architecture andtheir consequences for product quality. Studyingthis architecture–quality relationship is vital because,despite the wealth of research on product architec-ture, we still do not understand many aspects of howarchitectural decisions actually affect product qual-ity (Henderson and Clark 1990, Ulrich 1995, Baldwinand Clark 2000, Sosa et al. 2004, Ramachandran andKrishnan 2008). This paper focuses on one particu-lar architectural property, cyclicality, whereby compo-nents depend on themselves via other components.Our key theoretical and empirical objective is to studywhether and how cyclicality is related to the levels ofdefects in a product.

In addressing our research goal, we first con-firm previous results suggesting that modularity, thearchitectural property most salient in the literature,prevents defects in a product. Then, as our key con-tribution, we establish empirically that cyclicality is adistinct architectural determinant of the level of prod-uct defects and that its effect on product quality is assizeable as the effect of modularity. Finally, we deepenour understanding of the cyclicality–quality relation-

ship by examining how product defects are related todifferent facets of cyclicality, such as the centrality ofeach component in a cycle and the distribution of acycle across product modules.

In examining this architecture–quality relationship,we consider a product (hardware or software) as aweb of interconnected components (as in Clarksonet al. 2004, MacCormack et al. 2006, Braha andBar-Yam 2007, Sosa et al. 2007b, Gokpinar et al. 2010).Previous research on network-based architecture hasfocused on (component) modularity, the extent towhich a component is decoupled (or independent)from other components in the product, as the mainarchitectural feature of interest (Card and Agresti1988, Clarkson et al. 2004, MacCormack et al. 2006,Sosa et al. 2007b). This research has empirically estab-lished that modularity is associated with the design ofless defective products (Card and Agresti 1988, Briandet al. 1999, Aggarwal et al. 2007, Burrows et al. 2010).

However, there are good reasons to believe thatfocusing on modularity, as the main architecturaldeterminant of quality, is too simplistic. Recognizingcyclicality as another fundamental architectural prop-erty is important for both conceptual and empiricalreasons. Conceptually, component cycles inhibit the

473

Dow

nloa

ded

from

info

rms.

org

by [

175.

176.

173.

30]

on 2

8 A

pril

2015

, at 2

0:14

. Fo

r pe

rson

al u

se o

nly,

all

righ

ts r

eser

ved.

Sosa, Mihm, and Browning: Linking Cyclicality and Product Quality474 Manufacturing & Service Operations Management 15(3), pp. 473–491, © 2013 INFORMS

proper decomposition of design problems (becausethere is no self-evident sequence in which to design,build, and test components involved in cycles) andtherefore require iterative problem solving, whichresults in cognitive and organizational challenges(Smith and Eppinger 1997a, b; Mihm et al. 2003).In contrast, if there are no cycles, then problems canbe properly decomposed and be solved in a serialmanner, one subproblem at a time (Eppinger et al.1994). Empirically, component modularity and cycli-cality can co-occur and be correlated, making it diffi-cult to determine which factor is the principal driverof the observed effects.

To develop a thorough understanding of thecyclicality–quality relationship, we investigate howvarious aspects of cyclicality relate to product qual-ity. Is the extent to which a component is involvedin cyclical dependency patterns a significant determi-nant of defects? Is the effect of cyclicality on qualityas substantial as the effect of modularity? Are all com-ponents in a cycle equally prone to defects? In anysystem of even moderate complexity, components aretypically organized into modules (Simon 1996). Doesthe organization of components into modules affectthe relationship between cyclicality and quality? Byaddressing these questions, we aim to close an impor-tant gap between the information systems literatureand the operations management literature. The for-mer has not considered cyclicality to be an importantdeterminant of defects. And even though the latterhas developed methods to uncover cyclicality in com-plex development settings, its research has not yetlinked cyclicality to the levels of product defects.

Previous research in information systems and com-puter science has investigated the determinants ofdefects in software products (e.g., Card and Agresti1988, Chidamber and Kemerer 1994, Briand et al.1999, Aggarwal et al. 2007). It has classified suchdeterminants into two broad categories: intracompo-nent and intercomponent. (Intuitively, a component ina software product is a cohesive collection of sourcecode.) Intracomponent determinants are concernedwith aspects characterizing the individual component,whereas intercomponent determinants are concernedwith components’ interactions. With respect to themost salient intracomponent determinants of qual-ity, previous findings suggest that larger and moreinternally complex components are likely to have ahigher number of defects (McCabe 1976, Henry andSelig 1990, Kan 1995). As for intercomponent deter-minants, previous research on software architecturehas focused on the modularity of a component as themain feature of interest (e.g., MacCormack et al. 2006,2008). If a component is more modular—that is, if acomponent depends on few other components—thenit is likely to exhibit a lower level of defects (Card and

Agresti 1988, Card and Glass 1990, Chidamber andKemerer 1994, Kan 1995, Briand et al. 1999, Aggarwalet al. 2007). Despite the multitude of determinantsthat research in information systems has explored,it has yet to recognize the importance of architecturalcyclicality as a determinant of defect proneness.

The operations management literature has alsoexplored different aspects of modularity (Ulrich 1995,Baldwin and Clark 2000). Yet, beyond modularity,researchers in this area have developed representa-tions and methods for identifying cyclical structureswhen modeling the new product development pro-cess as a collection of networked tasks (Steward 1981,Eppinger et al. 1994, Mihm et al. 2003). This streamof research has led to a critical insight: tasks that areinterrelated in a cyclical manner tend to require moremanagerial attention (Smith and Eppinger 1997a, b)because cyclical structures entail design iterations thatcan affect the time required to complete a develop-ment effort (Mihm et al. 2003, Braha and Bar-Yam2007). However, this literature has taken a modelingapproach to formulate its predictions; therefore, it hasestablished no empirical links between cyclicality andoutcome measures (such as product quality) and hasnot been able to explore the cyclicality construct inways that would build a more nuanced understand-ing of how its different facets affect product quality.

Given how difficult it is to capture a comprehen-sive longitudinal data set that contains product archi-tecture, quality, and resource attributes, it is hardlysurprising that the cyclicality–quality relationship hasescaped rigorous empirical examination. We over-come this challenge by taking advantage of theemergence of open source software to build a sub-stantial data set that includes 28,394 observations of7,103 product components in 111 releases of 17 Java-based applications developed by the Apache SoftwareFoundation. We focus on open source software appli-cations for several reasons: they are complex systemsin which cyclicality patterns are present (yet difficultto identify); they exhibit relatively fast rates of change(much like fruit flies in studies of biological evolu-tion); and their source code constitutes an accessible,efficient, reliable, and standardized means of captur-ing all the architectural features relevant to our studyof product architecture. Moreover, open source devel-opment settings typically feature centralized systemsused for tracking and managing of quality issues.

2. Theory and HypothesesWe begin this section by taking a network view ofproduct architecture to (a) establish, in a calibra-tion hypothesis, component modularity as the moststudied characteristic of product architecture, and(b) define cyclicality and hypothesize about its con-sequences for product quality. Yet, the network view,

Dow

nloa

ded

from

info

rms.

org

by [

175.

176.

173.

30]

on 2

8 A

pril

2015

, at 2

0:14

. Fo

r pe

rson

al u

se o

nly,

all

righ

ts r

eser

ved.

Sosa, Mihm, and Browning: Linking Cyclicality and Product QualityManufacturing & Service Operations Management 15(3), pp. 473–491, © 2013 INFORMS 475

with its focus on how product components connectwith each other, ignores the hierarchical organizationof components into modules. Hence, to gain a morecomprehensive view of the effects of cyclicality onquality, we close the section by examining how aproduct’s hierarchical module structure influences thecyclicality–quality relationship.

A network view of a product’s architecture con-siders products as interlinked components or sub-systems. This view emphasizes the role played bydependencies among product components (Eppingerand Browning 2012). A dependency is established byany direct relationship between two product compo-nents (Sosa et al. 2007b): spatial dependencies are theresult of two components requiring a specific spa-tial configuration, structural dependencies arise whenthere is a required transmission of mechanical loadsbetween components, energy dependencies emergedue to energy flow requirements between compo-nents, material dependencies capture the flow of mate-rial (e.g., water, oil, steam, air) between components,and, finally, information dependencies map how dif-ferent components interact to process information.Mapping the totality of dependency patterns for acomplex system such as an aircraft engine requirescapturing multiple types (Sosa et al. 2007b). However,software products are particular in that their com-ponents are linked exclusively by information flows.For instance, MacCormack et al. (2006) captured thearchitecture of large, open source software systems bymapping how their components connected with eachother via function calls.

Different systems vary significantly in their depen-dency patterns, and these patterns affect the level ofdifficulty experienced by development organizationswhen designing, building, and testing products. It isonly natural that such difficulties should affect prod-uct quality. Figure 1 depicts the three basic patternsof dependencies that can be observed within a sys-tem (Thompson 1967, Eppinger et al. 1994). In Fig-ure 1(a), the three components are independent andthus have no effect on each other (barring resourceconstraints); hence, each component can be designedindependently. With respect to quality, this is a triv-ial case. Because there is no dependency among thecomponents, neither are there any network-based dif-ferences among them. Therefore, any variation in theirnumber of defects must be due to their inherent char-acteristics (e.g., their internal complexity) and not tonetwork-related aspects. We argue next that connec-tivity per se has a significant effect on product quality,with specific mechanisms depending on the type orpattern of connectivity.

Figure 1 Three Dependency Structure Patterns forComponents A, B, and C

A

C

B

A

C

B

A

C

B

(a) Independence(b) Serial

dependence(c) Cyclical

dependence

2.1. Effects of Component Modularity:A Calibration Hypothesis

In Figure 1(b), components are connected in aserial manner (i.e., component C provides input tocomponents A and B, and B provides input to A).This pattern of dependency is consistent with a fun-damental characteristic of directed graphs, reachability:the property of being able to “walk” from one nodeto another via a set of directed “edges” (Harary 1969,Gould 1988). From a managerial viewpoint, reacha-bility implies that components should be designed ina serial order (i.e., C, B, and then A in Figure 1(b)).Components A and C have opposite reachability char-acteristics: A is reached by all other components,whereas C reaches all other components. More gen-erally, a component in any directed design chain islikely to be positioned in either the so-called upstreamor downstream end of the chain. Previous work inboth the product architecture and software devel-opment literatures has shown that such positioningaffects the component’s defect proneness.

First we consider downstream components. Anargument well grounded on the notion of designchange propagation has been established and empir-ically verified for why downstream components areparticularly error prone. During a typical develop-ment process, the design of each component changesrepeatedly. Such design changes may propagate fromupstream components (here, component C) to down-stream components (component A) via their depen-dencies (Clarkson et al. 2004, Giffin et al. 2009).Hence, downstream components of a design chainhave been shown, for both hardware and software, tobe at increased risk of induced design changes (Cardand Agresti 1988, Krishnan et al. 1997, Terwieschet al. 2002). Design changes in downstream compo-nents, triggered to adapt to design changes made inupstream components, are likely to destabilize thedownstream components’ designs and hence increasethe risk of quality issues (Card and Agresti 1988,Terwiesch and Loch 1999, Burrows et al. 2010).

The argument for downstream components beingmore defect prone is fully in line with the expected

Dow

nloa

ded

from

info

rms.

org

by [

175.

176.

173.

30]

on 2

8 A

pril

2015

, at 2

0:14

. Fo

r pe

rson

al u

se o

nly,

all

righ

ts r

eser

ved.


benefits of component modularity. A downstream com-ponent with low reachability is largely decoupledfrom other components in the system and thus enjoysa high level of component modularity, which by thepreceding arguments should have a positive effect onits quality (Ulrich 1995, Baldwin and Clark 2000, Sosaet al. 2007b, MacCormack et al. 2008).

In contrast to their downstream counterparts, a con-clusive theory for the error proneness of upstreamcomponents is lacking. Upstream components do notpose the challenge of dealing with design changesinduced by the fate of other components (Braha andBar-Yam 2007). Not surprisingly, then, several empiri-cal studies in the information systems literature havefound no significant relationship between upstreamcomponents and defect proneness (Kan 1995, Briandet al. 1999, Aggarwal et al. 2007).

Thus, consistent with past findings in the litera-ture, we posit the following calibration hypothesisthat examines the relationship between downstreamcomponent modularity and defect levels; we leavethe effect to upstream component modularity as anempirical question.

Hypothesis 1 (H1) (Downstream ComponentModularity Helps). The number of defects exhibited bya focal component is positively associated with the extentto which such a component depends upon other productcomponents.

2.2. The Effect of Component CyclicalityFigure 1(c) illustrates a second fundamental character-istic of directed graphs and the network-based viewof architectures: cyclicality (Harary 1969, Gould 1988).1

In this panel, all of the components are involvedin a cyclical dependence. Component A depends oninput from B, which depends on input from C, whichdepends on input from A. We use the term in-cyclecomponent for any component that is part of a cycle,where “cycle” is the set of components for whicha dependency path exists from each component to

1 There are graph-theoretic reasons for the prominence of reachabil-ity and cyclicality within a network view of product architectures.A product architecture can be represented as a graph—that is, aset of nodes and edges. All graph properties can be defined withrespect to walks (where a “walk 0 0 0 is a finite alternating sequenceof [nodes] and edges that begins with the [node] x and ends withthe [node] y and in which each edge in the sequence joins the[node] that precedes it in the sequence to the [node] that follows itin the sequence” (Gould 1988, p. 8)). In practice, most graph prop-erties are defined in terms of walks. There are two fundamentaltypes of walks, closed and open (e.g., Gould 1988, p. 9). Open walksdetermine whether (and how) the starting node reaches the endnode, so they are well suited to characterizing different reachabilityaspects. Closed walks define cycles. Graphs may thus be categorizedas either cyclical or acyclical (Wasserman and Faust 1994) depend-ing on whether they do or do not, respectively, contain at least oneclosed walk.

every other. (A component that is not part of sucha cycle is referred to as a noncycle component.) Thus,an in-cycle component depends on itself via othercomponents in the cycle, and component cyclicality isthe extent to which a component depends on itself viaother components—i.e., the number of components inthe cycle.

Developing in-cycle components in complex sys-tems is especially challenging because (in contrast tononcycle components) they form problem structuresthat require (i) iterative problem solving and (ii) addi-tional coordination efforts.

Because cyclical dependency patterns imply no nat-ural sequence in which components can be concep-tualized and designed, developers must address thedevelopment of in-cycle components in an iterativefashion. Iterative problem solving can occur in eithersequential or parallel fashion (or any combination ofthem). Developers may iterate in a sequential fashionby myopically considering and redesigning in-cyclecomponents one at a time until the entire cycle designconverges to a commonly accepted solution (Smithand Eppinger 1997b, Mihm et al. 2003). Alterna-tively, developers may iterate in parallel by consid-ering and redesigning all in-cycle components at onceuntil an overarching solution for the entire systemis achieved (Smith and Eppinger 1997a, Mihm et al.2003). Either approach, however, is likely to increasethe error-proneness of in-cycle components relative tononcycle components. Sequential iteration is likely tosuffer from numerous repeated component redesignsdue to feedback signals coming from design revi-sions of other components in the cycle (Mihm et al.2003). Such repeated redesigns (which are absent orless likely in noncycle components) increase the riskof errors during development. In parallel iteration,all in-cycle components are designed and redesignedat once, which may limit the amount of revisionsnecessary. However, compared to designing eachcomponent one at a time, it entails considerable cog-nitive effort because the amount of information thatneeds to be concurrently dealt with increases drasti-cally, which again makes in-cycle components suscep-tible to errors (Miller 1956). In sum, because in-cyclecomponents are part of difficult-to-decompose designproblems that are hard to solve, they are more likelyto exhibit more defects than noncycle components,which form part of linear or independent prob-lem structures that do not require iterative problemsolving.

In addition to iterative problem solving, in-cyclecomponents are likely to require more coordinationneeds than noncycle components. During the devel-opment of complex hardware or software systems,different components are typically developed by dif-ferent designers or different design teams (Mockus

Dow

nloa

ded

from

info

rms.

org

by [

175.

176.

173.

30]

on 2

8 A

pril

2015

, at 2

0:14

. Fo

r pe

rson

al u

se o

nly,

all

righ

ts r

eser

ved.


et al. 2000, Sosa et al. 2004, Cataldo et al. 2006,MacCormack et al. 2006). Hence, actors designingcomponents involved in cyclical problem structuresare likely to face higher coordination needs thanthose designing components involved in noncycli-cal problem structures because of either the repeateddesign–build–test iterations to which such in-cyclecomponents are likely to be subjected or the concur-rency of design efforts. Hence, such components aremore likely to suffer from coordination pitfalls andthus have more defects (Gokpinar et al. 2010).

Our arguments so far have focused on thedichotomy between in-cycle and noncycle compo-nents. However, larger cycles are expected to be moreerror prone because both iterative problem solvingand coordination needs increase with the number ofcomponents involved in a cycle. Iterative problem-solving approaches that must address a larger numberof components increase either the number of designrevisions before reaching convergence or the numberof components that need to be considered concur-rently (Mihm et al. 2003). Either way, larger cyclesmake iterative problem solving more problematic.Arguments about coordination costs associated withcycles lead to the same conclusion. The greater thenumber of components involved in a cycle, the moreelements there are that at risk of a coordination break-down. Hence, we formulate the following hypothesis.

Hypothesis 2 (H2) (Component CyclicalityHurts). The number of defects exhibited by a componentincreases with component cyclicality.

2.3. The Effect of Cyclicality CentralityAlthough component cyclicality is constant for allcomponents involved in a specific cycle, such in-cyclecomponents may differ in their network positionswithin that cycle, which may have further conse-quences for the number of defects affecting each ofthose in-cycle components. Even though all compo-nents in a cycle are interconnected, some componentsmay occupy more central positions in the cycle’s net-work structure than do others.

Centrality, a concept developed and extensivelystudied in the context of social networks (Freeman1979, Wasserman and Faust 1994), refers to the identi-fication of the most important or prominent nodes inthe network. Central nodes exhibit the properties ofa center node in a star-shaped graph (Freeman 1979).They exhibit a maximum number of direct connec-tions to, a minimum distance to, and a maximumlikelihood of being in between all other nodes in thegraph. Because we study the notion of cyclicality, welimit the boundaries of the graph to the componentsforming a cycle and the dependencies among them.

Given the above properties of central componentsin a cycle, they are more likely to be involved in a

sequence of sequential iterative problem solving thanare peripheral cycle components. Hence, central in-cycle components are at higher risk of experiencingthe unanticipated nonlinear effects that characterizesequential iterative problem solving. To see this, con-sider all paths that link back to a focal compo-nent in a cycle (touching another component in thecycle at most once). For any in-cycle component,at least one such feedback path must exist; if there ismore than one path, then the feedback along all ofthose paths needs to be incorporated into the compo-nent design before the design iteration can converge.In other words, more paths means more feedback andthus a greater chance that the iteration either doesnot converge or produces unwanted results (Mihmet al. 2003). Components that are more central in thecycle are touched by many such paths (Wassermanand Faust 1994), which leads to our next hypothesis.

Hypothesis 3 (H3) (Increasing Cyclical Cen-trality Hurts). The number of defects exhibited by acomponent is increasing in its centrality within the cycle.

2.4. The Effect of GroupingComponents Into Modules

Complex engineered systems are typically conceivedin terms of a hierarchy of modules and submodules—that is, “in a boxes-within-boxes form” (Simon 1996,p. 128). It is, therefore, reasonable to examine how thedecisions to group components into modules influ-ence the relationship between cyclicality and quality.An example of this situation is illustrated by Figure 2,which gives the hierarchical view of an architectureby grouping the membership of the components inFigure 1(c) as well as three additional components(A1, B1, C1) into three distinct, two-component mod-ules: MA, MB, and MC . (The modules are shaded forvisual distinction.)

Although modules can be formed in many differ-ent ways, the dominant modularization strategy is to

Figure 2 Six-Component System Grouped Into Three Modules

MB

MC

MA

Dow

nloa

ded

from

info

rms.

org

by [

175.

176.

173.

30]

on 2

8 A

pril

2015

, at 2

0:14

. Fo

r pe

rson

al u

se o

nly,

all

righ

ts r

eser

ved.


(i) group functionally similar and highly interdepen-dent components into modules and (ii) minimize thedependence of modules on each other (Parnas 1972,Simon 1996, Baldwin and Clark 2000). Thus, design-ers typically concentrate connections among com-ponents within modules while limiting connectionsacross module boundaries (Baldwin and Clark 2000);this is known as the principle of “near decomposabil-ity” (Simon 1996). In software development terms, fol-lowing this strategy maximizes module cohesion andminimizes coupling across modules (Stevens et al.1974, Chidamber and Kemerer 1994, Briand et al.1999). This grouping principle facilitates problemsolving by decoupling modules and thereby allowingthe design–build–test process for each module to “becarried out with some degree of independence of thedesign of others” (Simon 1996, p. 128).

We expect that a cycle (such as the one in Figure 2)whose components are distributed across multiplemodules will be especially prone to defects. Suchdefect proneness is a consequence of increased coor-dination requirements and decreased visibility, twofactors that are more predominant across than withinmodule boundaries. Modules in the product domainusually mirror developer groups in the organizationdomain (Henderson and Clark 1990, Sosa et al. 2004,MacCormack et al. 2012). Such mirroring is even truefor open source software development projects, whichoften do not exhibit formal organizational group-ings. Developers working on different modules arelikely to be cognitively more “distant” than develop-ers developing components that are functionally sim-ilar because “developers tend to work on problemsthat are identified with areas of the code they aremost familiar. Some work on the product’s core ser-vices, while others work on particular features thatthey developed” (Mockus et al. 2000, p. 266). Special-ization largely dictates how developers self-assign tomodules. The resulting organizational distance thatis fostered by the module structure implies a lackof familiarity of the interdependent actors involved;coordination breakdowns result and the risk of com-ponent defects increases (Staats 2012).

In addition to requiring coordinated communica-tion among developers, dependencies across moduleboundaries run the risk of remaining unidentified(Allen 1977, Sosa et al. 2004) because modules are nor-mally considered to be self-contained entities. Hence,cycles whose components are distributed across mul-tiple modules are less likely to be identified as in-cyclecomponents. In that case, efforts to plan and managethe iterative problem solving required by such cycleswould be compromised (Pich et al. 2002), which inturn would lead to higher levels of defects. All theseconsiderations lead to our final hypothesis.

Hypothesis 4 (H4) (Modules Failing to Encap-sulate Cycles Hurt). The number of defects exhibitedby an in-cycle component is increasing in the number ofmodules involved in its cycle.

3. Empirical Study: Open SourceSoftware Development

To test our hypotheses, we studied open source, Java-based software applications from the Apache Soft-ware Foundation (http://www.apache.org/), whichis one of the largest, most established, best studiedopen source communities of developers and userswho share values and a standard development pro-cess (Roberts et al. 2006). We chose to focus on Javabecause it is one of the most widely used object-oriented programming languages and because itssource code captures component dependencies andmodule constituents in an explicit and structured way.

3.1. DataAs our initial database, we identified 69 Java-baseddevelopment projects at the Apache Software Foun-dation in mid-2008. An effective examination of thecausal relationship between architectural characteris-tics and quality requires a longitudinal data set, sowe focused on the 37 applications for which succes-sive major releases were available. From these 37, weselected those for which we could access all neces-sary data sources: bug-tracking systems (to determinethe number of bugs), precompiled (“prebuilt”) code inso-called Java ARchive (JAR) files (to codify productarchitecture features), original source code (to mea-sure such product-related attributes as source lines ofcode (SLOC)), release notes (to determine product-related innovative features), and version managementtools (to associate bugs with the components theyaffected). These filters left us with a set of 28,394observations of 7,103 product components across 111releases (versions) of 17 applications—an average of256 components per version.

We compiled an integrated data set from those fivesources. First, we examined Apache’s Bugzilla andJira bug-tracking systems to obtain the bugs associ-ated with each version. Each of these systems allowsusers and developers to enter bug reports, whichare classified in terms of their potential severity andare processed by the development team in a struc-tured way. This process applies to all bugs that arenot fixed by a developer during initial program-ming, and the databases of these bug-tracking sys-tems record the status and resolution of each bugassociated with any version. We developed a Webcrawler to automate the gathering of these data. Sec-ond, we downloaded the precompiled version ofeach major release of each application (available as

Dow

nloa

ded

from

info

rms.

org

by [

175.

176.

173.

30]

on 2

8 A

pril

2015

, at 2

0:14

. Fo

r pe

rson

al u

se o

nly,

all

righ

ts r

eser

ved.


a JAR file) from the Apache archives or the applica-tion’s website; we did not use minor releases becausethey typically involve relatively small changes. Weused LDM (Lattix Dependency Manager), a com-mercially available software application developedby Lattix Inc. (http://www.lattix.com/), to build adesign structure matrix (DSM) representation fromthe source code (as captured in the JAR file) andto extract the module membership of components.Third, we downloaded the original source code foreach version. This step involved locating and down-loading more than 110 source packages and—becausethe correspondence between Java classes and filesis nearly one to one—examining more than 28,000source code files. Accessing these files was necessaryto measure various dimensions of the complexity ofeach Java class. Fourth, we consulted each version’sofficial release notes to gather information on new-ness, age, and other important application-level con-trols. Finally, we developed a second Web crawlerto retrieve and extract data from the “change log”files of the version control tools, the so-called subver-sion (SVN) repositories, to link each bug to the Javaclass(es) that it affected; in other words, we countedall of the bugs that affected each Java class, notingthat some bugs affected more than one class. Fromthe SVN repositories we were also able to mine dataabout timing and authorship. Our analysis was basedon all the reported bugs that had been fixed or werein the process of being fixed (they are all “patched”bugs); we did not include “unpatched” bugs owingto the lack of information on which components theyaffected. (No selection bias was thereby introducedinto our analysis because we were able to considerall components in all of the product versions in oursample.) However, we did control for the number ofunpatched bugs associated with the version to whicha component belonged. Finally, we also checked thatpatched and unpatched bug groups were not signif-icantly different with respect to their ratio of severeand nonsevere bugs.

3.2. Dependent and Predictor Variables

3.2.1. Dependent Variable: Number of Bugs perComponent. The main dependent variable in ouranalysis is ycis , the number of bugs explicitly associ-ated with component c of application i (in version s).We assign a bug to a component based on the infor-mation reported in the bug-tracking and version con-trol systems.

3.2.2. Independent Variables. We define threesets of independent variables. First, we use measuresof fan-out and fan-in to test for the effects of down-stream and upstream component modularity, respec-tively. Second, we discuss how to identify in-cycle

components to measure component cyclicality andcyclicality centrality (to test H2 and H3). Third, wedefine a variable that captures precisely how the pres-ence of hierarchical modules encapsulates cycles (totest H4).

Software applications are systems of connectedcomponents grouped into modules (Shaw and Garlan1996, Martin 2002, Sangal et al. 2005). Modeling a sys-tem requires defining what constitutes a component.Although we could perform our analysis at thelevel of methods or even lines of code, we use the“Java class” as our unit of analysis. (A method isa self-contained collection of programming instruc-tions that typically includes variable instantiationand control flow statements, such as “if 0 0 0 then” and“while 0 0 0do” statements; a Java class is a collec-tion of methods. A class in our data set contains,on average, 10 methods. Our data set contains onlyJava applications, wherein files and classes are typi-cally coextensive.) There are three reasons for choos-ing the Java class as our unit of analysis. First,a class tends to provide a set of common function-ality (e.g., a set of low-level mathematical functions)maintained as one cohesive piece of software, oftenin one source file supplied by an individual devel-oper. Second, significant attributes of the architec-ture are apparent at the class level, rendering furtherdecomposition unnecessary for our purposes. Third,this level of decomposition is consistent with pastwork on software architecture (e.g., Sangal et al. 2005,MacCormack et al. 2006).

Our measures are based upon a design structurematrix representation of the dependency structure ofthe components in each version of each application inour sample. A DSM is a square matrix whose rowsand columns are both labeled with N componentsand whose off-diagonal cells indicate the components’directed dependencies (Browning 2001). A depen-dency results from a “call” made by one componentto another (Sangal et al. 2005, Cataldo et al. 2006,MacCormack et al. 2012).2 An off-diagonal mark incell (i1 j) of the DSM indicates that the Java class incolumn j calls the class in row i, and hence that theclass in column j depends on the class in row i. Forexample, consider the left panel of Figure 3, whichshows a “flat” DSM representation of the entire Ant(version 1.4) application with 160 components and

2 Dependencies include invocations (static, virtual, and interface),which allow for various types of method calls; inheritances (exten-sions and implementations), which allow a class to extend ordefine new behaviors; data member references, which refer to thefield of a class; and constructs, a method call for creating an object.We include these dependencies because they are typically integralto the system’s design and because developers create them deliber-ately. They comprise the various ways in which classes “connect”with other classes in a Java-based software application.

Dow

nloa

ded

from

info

rms.

org

by [

175.

176.

173.

30]

on 2

8 A

pril

2015

, at 2

0:14

. Fo

r pe

rson

al u

se o

nly,

all

righ

ts r

eser

ved.


Figure 3 Representations of Ant 1.4: (Left) Flat DSM; (Center) Visibility Matrix; (Right) Sequenced DSM

676 directed dependencies. (The term “flat” signifiesthat this DSM does not capture the arrangement ofcomponents into modules.)

As for metrics, we follow MacCormack et al. (2008)in using fan-out and fan-in to measure the lack of com-ponent modularity. This is consistent with previouswork that studies modularity at the component leveland measures component modularity as the lack ofdependency among product components (Sosa et al.2007b, Cabigiosu and Camuffo 2012). First, to mea-sure the lack of downstream component modular-ity needed to test H1, we calculate component fan-out(C_FAN_OUTcis) as the fraction of product compo-nents on which component c depends, either directlyor indirectly. With reference to Figure 1(b), for exam-ple, the fan-out of component A is 100% because com-ponent A depends (directly or indirectly) on all othercomponents in that system; in contrast, the fan-out ofcomponents B and C are, respectively, 50% and 0%. Ingeneral, component fan-out is derived from the visi-bility matrix V , which is a square binary matrix of sizeN (where N still denotes the number of components)whose nonzero cells 4vi1 j 5 indicate that componentj depends on component i, either directly or indi-rectly, via any number of intermediary components(Sharman and Yassine 2004, MacCormack et al. 2006).This matrix V is the binary sum of the first N powersof the flat DSM (applying Boolean matrix multiplica-tion). The center panel of Figure 3 shows the visibilitymatrix for Ant 1.4. We calculate fan-out componentvisibility as follows (see MacCormack et al. 2008):

C_FAN_OUTcis =

∑

k vkc

N − 11

where the numerator is the sum of all nonzero cellsin column c of V . This measure captures the fractionof components that might affect c as their changespropagate to c.

Although we do not explicitly hypothesize for theeffects of upstream component modularity, we stillcontrol for its potential existence. To measure the lack

of upstream component modularity, we calculate com-ponent fan-in (C_FAN_INcis) as the fraction of compo-nents that depend (either directly or indirectly) on c.Referring again to Figure 1(b), the fan-in of com-ponent C is 100% because all other components inthat system depend (directly or indirectly) on com-ponent C; in contrast, the fan-in of components Band A are, respectively, 50% and 0%. We calculate fan-in component visibility as follows:

C_FAN_INcis =

∑

k vck

N − 13

here the numerator is the sum of all nonzero cellsin row c of V . This measure captures the fraction ofcomponents that might be affected by a change incomponent c.

To determine whether a component is involvedin a cycle, we apply the procedure described byWarfield (1973) to the DSM, although we substi-tute Tarjan’s (1972) more efficient (linear order ofgrowth) algorithm to identify the unique sets ofin-cycle components. The right panel of Figure 3shows the result of applying this algorithm to theflat DSM of Ant 1.4; it reveals that Ant 1.4 con-tains three component cycles—the three highlightedblocks along the diagonal—which contain 6, 11, and18 components, respectively. We can now define thecomponent-level indicator variable: IN_CYCLEcis = 1if component c belongs to a component cycle ofversion s of application i (and IN_CYCLEcis = 0otherwise). Then, to capture component cyclical-ity, we measure CYCLE_SIZEcis as the number ofcomponents in the cycle to which component cbelongs. (CYCLE_SIZEcis = 0 for noncycle compo-nents.) Finally, we measure the centrality of a com-ponent within a cycle, IN_CYCLE_DEGREEcis , as thenumber of other components in the cycle to whichcomponent c belongs that are directly connectedwith component c (Freeman 1979, Wasserman andFaust 1994).

To calculate a measure that captures how modulesencapsulate cycles, we identify the module structure

Dow

nloa

ded

from

info

rms.

org

by [

175.

176.

173.

30]

on 2

8 A

pril

2015

, at 2

0:14

. Fo

r pe

rson

al u

se o

nly,

all

righ

ts r

eser

ved.


Figure 4 Hierarchical DSM of Ant 1.4, Including Module Boundaries

r

t

l

*

cc

u *

*

o

of the code for version s of application i. Becausethe nested subdirectory structure (and thus hierar-chical organization of the source code into nestedmodules) is captured in the name of each Java class,we are able to associate each component uniquelywith a set of hierarchically structured modules. Fig-ure 4 shows a hierarchical DSM representation ofAnt 1.4, which overlays the nested module struc-ture onto the dependency structure of the product.Although each component is uniquely assigned to acomponent module (i.e., a module that contains onlycomponents), the existence of “modules of modules”results in the three-level hierarchical module structureshown. Because we study the implications of group-ing components into (component) modules as a first-order effect of the architecture’s hierarchical aspects,we count the number of cross-module boundariesspanned by the cycle. That number is denoted by thevariable MODULES_CROSS_CYCLEcis .

3.3. Control VariablesOther factors may be related to the number ofbugs affecting a component. We include two sets ofcontrol variables: system-level and component-levelfactors (see Table 1, for which a more detailed ver-sion is provided in Online Appendix A, available assupplemental material at http://dx.doi.org/10.1287/msom.2013.0432). Table 2 gives descriptive statisticsand pairwise correlations for the variables includedin the analysis. Among the system-level controls, thebreadth and depth of the hierarchical module struc-ture are (as expected) positively correlated; this sug-gests that as applications grow in the number of their

component modules, they also grow in the numberof their modules of modules. However, these twovariables are negatively correlated with the applica-tion’s propagation cost, which indicates that applica-tions using more hierarchical module structures havea dependency structure that is less interconnected(MacCormack et al. 2006). The positive correlationbetween an application’s age and its average cyclo-matic complexity is consistent with the conjecturethat older applications are made up of componentsthat are internally complex. Among the component-level controls, the strong positive correlations amongcumulative changes and cumulative committers andauthors indicate that, as expected, workload and useof resources are positively associated with each other.Not surprisingly, the two measures of intracomponentcomplexity (SLOC and the average cyclomatic com-plexity of the methods constituting a component) arepositively correlated. Finally, the cyclicality variablesare, as expected, highly correlated among themselvesand also with component fan-in and fan-out. The pos-itive correlations between IN_CYCLE and C_FAN_INand C_FAN_OUT are consistent with the fact thatthese three measures depend (in different ways) onthe connectivity patterns of the focal component withother components and how these other componentsinteract.

4. Analysis and ResultsThe dependent variable in our analysis counts thenumber of bugs affecting component c. Because ourdata have both a hierarchical structure (component c

Dow

nloa

ded

from

info

rms.

org

by [

175.

176.

173.

30]

on 2

8 A

pril

2015

, at 2

0:14

. Fo

r pe

rson

al u

se o

nly,

all

righ

ts r

eser

ved.


Table 1 Control Variables

System-level factors (version s of application i)UNPATCHED_BUGS is Number of unpatched bugs (not yet associated with specific components) in this versionAGE is Number of days between the first version available and this versionDAYS_BEFORE is Number of days between this and the previous versionDAYS_AFTER is Number of days between this and the next versionNEWNESS is Number of new features (added functionality) and improvements (modifications to existing functionality) in this versionAPP_SLOC is Number of kilolines of source code in this versionAPP_AVG_CC is Average cyclomatic complexity of all methods in this version (cyclomatic complexity is the minimum number of linearly

independent paths in the control flow graph of a method in a software program; McCabe 1976)NUM_NOM_MODULES is Number of modules that contain actual components (not simply nested modules) in this version; this variable measures the

overall breadth of the hierarchical module structure in this versionHIERARCHY _DEPTH is Maximum number of levels between the leaf (component) levels and the root level in this version; this variable measures

the overall depth of the hierarchical module structure of this versionAVG_INTERFACE_USAGE is Average of Martin’s (2002, p. 267) distance metric across all modules in this version; this metric gauges the developers’

deviation from the “recommended” source code structure that best handles cross-module dependenciesPROPAGATION_COST is Average fan-in and fan-out visibility of all components in this version (MacCormack et al. 2006) as computed from the

visibility matrix V

Component-level factors (component c of application i in version s)C_AGE cis Number of days since the component was first included in this applicationC_EXPL_CHANGES cis Number of nonbug changes (improvements, new features, etc.) explicitly associated with this component in this versionC_IMPL_CHANGES cis Number of total changes (bugs, improvements, new features, etc.) implicitly associated with this component in this version

because of their introduction time (explicit assignment is not available)C_CUM_CHANGES cis Cumulative number of changes associated with the component prior to this versionC_CUM_COMMITTERS cis Cumulative number of committers associated with the component prior to this versionC_CUM_AUTHORS cis Cumulative number of authors associated with the component prior to this versionC_INTERFACCE_USAGE cis Distance metric (Martin 2002, p. 267) for the component’s module in this versionC_AVG_CC cis Average cyclomatic complexity (McCabe 1976) of the component’s methods in this versionC_SLOC cis Number of kilolines of source code in this version of the component

belongs to application i) and a panel structure (com-ponent c can be observed in any of several versions, s,of application i), we must use a hierarchical mod-eling framework with panel data (Raudenbush andBryk 2002). This type of analysis allows us to testfor component-level effects in the presence of system-level covariates while controlling for the lack of inde-pendence in observations from the same componentand also from the same application. In addition,because our dependent variable is a bug count forcomponent c, we estimate a hierarchical Poissonregression model (Cameron and Trivedi 1998). (Notethat our “count” dependent variable does not exhibitsigns of overdispersion: its variance is not signifi-cantly larger than its mean.) For estimation purposes,we use the xtmepoisson procedure recently imple-mented in Stata 12 with a component-specific ran-dom intercept that is nested in its correspondingapplication-specific random intercept (Raudenbushand Bryk 2002, Rabe-Hesketh and Skrondal 2008).Hence, our baseline model is a random-interceptmodel of the following form:

E6ycis � xcis1zis7= exp4�zis +�xcis + �ci + �i + �cis50

Consistent with the hierarchical linear modelingapproach, this model fits a multilevel, mixed-effects

Poisson regression that contains not only “fixed”effects (the �, �-coefficients), which are analogous tostandard regression coefficients, but also “random”intercepts (the �-parameters), which are assumed tovary (following a Gaussian distribution) across com-ponents and applications. Our regression models pre-dict that the expected number of bugs affectingcomponent c in application i depends exponentiallyon two sets of linearly independent regressors: a firstset 8zis9 defined at the system level, and a secondset 8xcis9 defined at the component level, both instanti-ated in version s. We use raw data in our estimations,but our results are robust to both grand-mean andapplication-level centering (Kreft et al. 1995).

When testing for the hypothesized effects of cycli-cality, we enhance our baseline model by includ-ing random coefficients for the cyclicality variablesof interest with an unstructured covariance structureof the random parameters (Singer and Willett 2003,Rabe-Hesketh and Skrondal 2008). These models pro-vide a significantly better fit to our data—than do thecorresponding random-intercept models—by relaxingthe assumption that cyclicality effects are the sameacross all the applications in the sample. Intuitively,a random-coefficient model is analogous to a modelthat includes interaction effects between the variable

Dow

nloa

ded

from

info

rms.

org

by [

175.

176.

173.

30]

on 2

8 A

pril

2015

, at 2

0:14

. Fo

r pe

rson

al u

se o

nly,

all

righ

ts r

eser

ved.


T able2

Descrip

tiveStatisticsan

dCo

rrelations

ofVa

riables

4N=

2813

945

Varia

bles

Mea

n/SD

12

34

56

78

910

1112

1314

1516

1718

1920

2122

2324

2526

1Nu

mbe

rofb

ugs,y c

is0.

2/0.

610

002UN

FIXE

D_BU

GSis

45.6

/55.

200

1810

003AG

E is

786.

5/65

2.6

0006

0023

1000

4DA

YS_B

EFOR

E is

230.

9/25

9.6

0007

0027

0046

1000

5DA

YS_A

FTER

is29

9/32

4.6

0018

0044

0043

0032

1000

6NE

WNE

SSis

31.6

/43.

400

1700

4000

1400

3200

1610

007AP

P_SL

OCis

22.3

/20.

6−

0006

−00

0200

0900

1100

11−

0012

1000

8AP

P_AV

G_CC

is2.

2/0.

900

0300

0200

3800

0900

0500

01−

0014

1000

9NU

M_N

OM_M

ODUL

ESis

43/4

2.6

−00

0700

1700

05−

0012

0000

−00

06−

0024

−00

3310

0010

HIER

ARCH

Y_DE

PTH i

s4.

3/1.

1−

0007

0009

0003

−00

0200

09−

0010

−00

06−

0041

0081

1000

11AV

G_INTERF

ACE_US

AGE i

s0.

2/0.

0−

0010

0008

0015

0024

0017

−00

1900

31−

0007

0026

0033

1000

12PR

OPAG

ATION_

COST

is15

.4/8

.700

00−

0003

−00

07−

0009

−00

0700

19−

0006

−00

01−

0028

−00

45−

0022

1000

13C_

AGE c

is40

0.3/

494.

100

0900

1100

6400

5300

3500

0500

1800

25−

0013

−00

0900

1500

0410

0014

C_EX

PL_C

HANG

EScis

0.6/

3.2

0035

0017

0001

0007

0019

0031

−00

0500

05−

0011

−00

13−

0009

−00

0100

0410

0015

C_IM

PL_C

HANG

EScis

0.2/

0.6

0026

0002

0001

0002

−00

0700

05−

0013

−00

0100

03−

0002

−00

0900

0600

0700

0610

0016

C_CU

M_C

HANG

EScis

2.9/

9.7

0045

0026

0009

0023

0014

0044

−00

0600

00−

0005

−00

06−

0012

0002

0024

0040

0031

1000

17C_

CUM_C

OMMITTERS

cis

1.0/

2.0

0036

0033

0016

0038

0019

0050

−00

1200

0100

02−

0001

−00

03−

0005

0031

0037

0030

0081

1000

18C_

CUM-AUT

HORS

cis

0.9/

2.2

0041

0007

−00

0100

01−

0007

0020

−00

14−

0005

0001

−00

01−

0026

0000

0011

0023

0045

0070

0057

1000

19C_

INTERF

ACE_US

AGE c

is0.

3/0.

2−

0002

0021

0004

0004

0010

0006

−00

10−

0007

0027

0016

0025

0017

0004

0000

0000

−00

0400

02−

0012

1000

20C_

AVG_

CCcis

2.1/

2.0

0010

0006

0005

0000

0002

0003

0000

0002

0003

−00

02−

0001

0007

0003

0004

0011

0008

0010

0011

0007

1000

21C_

SLOC

cis

0.1/

0.1

0023

0001

0003

0000

0001

−00

0200

06−

0001

0002

0001

0004

0003

0006

0006

0018

0017

0014

0022

0002

0039

1000

22C_

FAN_

OUT c

is15

.1/2

1.6

0011

−00

01−

0002

−00

03−

0003

0008

−00

0400

03−

0011

−00

19−

0009

0040

0004

0003

0013

0010

0007

0015

0008

0024

0019

1000

23C_

FAN_

INcis

15.9

/19.

400

04−

0003

−00

05−

0005

−00

0400

07−

0001

−00

04−

0013

−00

19−

0009

0046

0005

0002

0008

0005

0000

0004

0009

−00

0400

0500

1010

0024

CYCL

E_SIZE

cis

12.6

/30.

900

0600

1300

00−

0002

0004

0002

0006

−00

1200

0900

0500

0600

2300

05−

0002

0008

0004

0003

0005

0017

0011

0017

0047

0043

1000

25IN_C

YCLE_D

EGRE

E cis

1.4/

4.5

0019

0010

0002

−00

0100

0200

0300

02−

0001

0002

−00

0200

0100

1300

0600

0400

2000

1600

1200

1900

0800

1600

3900

3500

3100

5710

0026

MOD

ULES

_CRO

SS_C

YCLE

cis

1.3/

3.9

0008

0012

0002

−00

0300

0100

05−

0009

−00

0900

2501

500

0400

1200

00−

0001

0016

0010

0013

0012

0015

0013

0015

0032

0034

0072

0048

1000

27IN_C

YCLE

cis

0.2/

0.4

0014

0008

0001

−00

0100

0200

0200

02−

0005

0001

−00

0400

0100

1900

0600

0400

1200

1100

0600

1400

0800

1500

2500

5300

4100

7700

5900

62

Note

.Cor

rela

tions

grea

tert

han

|0.0

2|ar

esi

gnifi

cant

atp<

0001

.

Dow

nloa

ded

from

info

rms.

org

by [

175.

176.

173.

30]

on 2

8 A

pril

2015

, at 2

0:14

. Fo

r pe

rson

al u

se o

nly,

all

righ

ts r

eser

ved.


of interest (here, a cyclicality variable) and a group-level indicator variable (here, an application-leveldummy variable). Note that, given the hierarchicalnature of our data, we cannot include application-level indicator variables.

Estimates for the �- and �-coefficients in our finalset of regression models are reported in Table 3.Models 1–3 are random intercept models, whereasModels 4–7 are random coefficient models. As recom-mended by Singer and Willett (2003), we included anapplication-specific random coefficient of the cyclical-ity variable of interest only if doing so significantlyreduced the model’s deviance statistic with respectto the nested model that excludes such a coefficient.(In our models, a model’s deviance statistic is −2 ·

(log-likelihood statistic).) In Online Appendix B, weinclude Table B, providing details of how we eval-uated the reduction in deviance statistic associatedwith the inclusion of each application-specific ran-dom coefficient of the cyclicality variables of inter-est. Table B also shows the standard deviations ofboth component- and application-level main randomparameters of all the models shown in Table 3.

4.1. Testing the HypothesesModel 1 includes system-level control variables. Mostof the coefficients for these control variables are sig-nificant, which confirms their relevance. This modelshows that components in large applications (as mea-sured by the number of kilolines of the application’ssource code in version s) and with higher averagecyclomatic complexity are likely to exhibit a greaternumber of defects. This is consistent with the infor-mation systems literature, which suggests that bothSLOC and cyclomatic complexity are good predic-tors of the effort required to build, test, and maintainsoftware applications (McCabe 1976, Henry and Selig1990). The hierarchical grouping of components bythe modules in an application seems to influence com-ponent quality: having more breadth and more depthin the hierarchical module structure is associated withfewer defects. (Given the high correlation betweenNUM_NOM_MODULES and HIERARCHY_DEPTH,we test the robustness of our results to the exclu-sion of HIERARCHY_DEPTH and confirm that theyare not sensitive to that change.) Finally, componentsin applications whose modules deviate from the codestructure recommended for handling interfaces acrossmodules seem to be more defect prone (Martin 2002),and components in applications with higher propa-gation cost seem to have, on average, fewer defects(MacCormack et al. 2006).

Model 2 adds component-level controls. This modelcontrols for the number of changes (e.g., incrementalimprovements and the addition of new features) asso-ciated with the focal component c. The positive and

significant coefficients for C_EXPL_CHANGES andC_IMPL_CHANGES suggest that, for a given com-ponent, the number of bugs is positively correlatedwith the number of non-bug-fixing changes that affectit. In addition, we control for the amount of organi-zational attention and resources associated with thefocal component, since its inception, in application i.The negative and significant coefficient for cumulativechanges to the source code of component c priorto the current version s, C_CUM_CHANGES, sug-gests that components dealt with in previous ver-sions of the application are less likely to be affectedby bugs in the current version. However, the greaterthe number of authors and committers dealing witha component in the past, the more likely it is thatsuch a component will be associated with a highernumber of bugs in the current version; in otherwords, the coefficients for C_CUM_AUTHORS andC_CUM_COMMITTERS are both positive and sig-nificant. Given the high correlation between thesetwo variables and C_CUM_CHANGES, we testthe robustness of our results to the exclusion ofC_CUM_CHANGES and find that our results arenot sensitive to such exclusion. Finally, Model 2confirms that a component’s cyclomatic complexity(C_AVG_CC) and source lines of code (C_SLOC) areimportant determinants of how many bugs it has(Card and Glass 1990, Henry and Selig 1990).

Model 3 tests for the effects of downstream com-ponent modularity (H1). Here we include two typesof architectural variables: C_FAN_OUT (how muchcomponent c depends on other components) andC_FAN_IN (how much other components dependon c). The coefficient for C_FAN_OUT is positiveand significant (consistent with H1), and significantlylarger than C_FAN_IN (p < 00001). (To test the dif-ference of these two coefficients, we estimated analternative model that includes C_FAN_OUT and anew variable defined as the sum of C_FAN_OUTand C_FAN_IN. In that model, the coefficient esti-mate of C_FAN_OUT tests the significance of the dif-ference of the parameters of interest.) This modelconfirms, in line with the bulk of previous research,that the directionality of dependencies influences therelationship between a component’s modularity andits number of defects (e.g., Card and Agresti 1988,Kan 1995, Briand et al. 1999, Aggarwal et al. 2007,Burrows et al. 2010). Components that depend onmany other components are likely to be associatedwith a higher number of bugs than are componentsthat depend on fewer other components. Observethat, despite the positive and significant coefficient forC_FAN_IN in this partial model, the effect becomesinsignificant when, in subsequent models, the effectof cyclicality is also included. Such instability of the

Dow

nloa

ded

from

info

rms.

org

by [

175.

176.

173.

30]

on 2

8 A

pril

2015

, at 2

0:14

. Fo

r pe

rson

al u

se o

nly,

all

righ

ts r

eser

ved.


Table 3 Hierarchical Poisson Regressions Predicting the Number of Bugs per Component 4N = 2813945

Model 1 Model 2 Model 3 Model 4 Model 5 Model 6 Model 7

UNFIXED_BUGSis 00004∗∗∗ 00002∗∗∗ 00002∗∗∗ 00002∗∗∗ 00003∗∗∗ 00002∗∗∗ 00002∗∗∗

4000015 4000015 4000015 4000015 4000015 4000015 4000015AGEis −00002∗∗∗ −00003∗∗∗ −00002∗∗∗ −00003∗∗∗ −00003∗∗∗ −00002∗∗∗ −00002∗∗∗

4000005 4000005 4000005 4000005 4000005 4000005 4000005DAYS_BEFOREis −00001∗∗∗ −00001∗∗∗ −00001∗∗∗ −00001∗∗∗ −00001∗∗∗ −00001∗∗∗ −00001∗∗∗

4000005 4000005 4000005 4000005 4000005 4000005 4000005DAYS_AFTERis 00002∗∗∗ 00002∗∗∗ 00002∗∗∗ 00002∗∗∗ 00002∗∗∗ 00002∗∗∗ 00002∗∗∗

4000005 4000005 4000005 4000005 4000005 4000005 4000005NEWNESSis 00001 00001 00001 00001 00001 00001 00001

4000015 4000015 4000015 4000015 4000015 4000015 4000015APP_SLOCis 00011∗∗∗ 00012∗∗∗ 00011∗∗∗ 00015∗∗∗ 00015∗∗∗ 00015∗∗∗ 00011∗∗∗

4000045 4000045 4000045 4000045 4000045 4000045 4000045APP_AVG_CCis 00151∗∗∗ 00137∗∗∗ 00137∗∗∗ 00125∗∗∗ 00127∗∗∗ 00128∗∗∗ 00135∗∗∗

4000145 4000135 4000135 4000135 4000135 4000135 4000135NUM_NOM_MODULESis −00022∗∗∗ −00023∗∗∗ −00022∗∗∗ −00025∗∗∗ −00025∗∗∗ −00025∗∗∗ −00023∗∗∗

4000035 4000035 4000035 4000035 4000035 4000035 4000035HIERARCHY_DEPTHis −00210∗∗ −00237∗∗ −00225∗∗ −00261∗∗ −00249∗∗ −00256∗∗ −00219∗∗

4001055 4001085 4001075 4001115 4001105 4001105 4001085AVG_INTERFACE_USAGEis 30673∗∗ 40234∗∗∗ 40044∗∗ 50176∗∗∗ 40376∗∗∗ 40885∗∗∗ 40229∗∗∗

4105305 4105755 4105685 4106555 4106655 4107045 4105815PROPAGATION_COSTis −00062∗∗∗ −00064∗∗∗ −00087∗∗∗ −00115∗∗∗ −00110∗∗∗ −00108∗∗∗ −00081∗∗∗

4000085 4000085 4000085 4000105 4000105 4000105 4000085C_AGEcis 00000 00000 00000∗ 00000∗ 00000∗ 00000

4000005 4000005 4000005 4000005 4000005 4000005C_EXPL_CHANGEScis 00030∗∗∗ 00027∗∗∗ 00027∗∗∗ 00027∗∗∗ 00028∗∗∗ 00029∗∗∗

4000035 4000035 4000035 4000035 4000035 4000035C_IMPL_CHANGEScis 00070∗∗∗ 00063∗∗∗ 00047∗∗∗ 00045∗∗∗ 00045∗∗∗ 00062∗∗∗

4000135 4000135 4000135 4000135 4000135 4000135C_CUM_CHANGEScis −00015∗∗∗ −00013∗∗∗ −00012∗∗∗ −00012∗∗∗ −00012∗∗∗ −00012∗∗∗

4000025 4000025 4000025 4000025 4000025 4000025C_CUM-COMMITTERScis 00162∗∗∗ 00140∗∗∗ 00148∗∗∗ 00151∗∗∗ 00150∗∗∗ 00142∗∗∗

4000135 4000135 4000135 4000135 4000135 4000135C_CUM-AUTHORScis 00054∗∗∗ 00052∗∗∗ 00047∗∗∗ 00045∗∗∗ 00046∗∗∗ 00047∗∗∗

4000095 4000095 4000095 4000095 4000095 4000095C_INTERFACE_USAGEcis 00021 00020 00013 00026 00000 −00022

4001015 4000995 4001005 4001005 4001015 4000995C_AVG_CCcis 00095∗∗∗ 00073∗∗∗ 00072∗∗∗ 00073∗∗∗ 00074∗∗∗ 00072∗∗∗

4000095 4000095 4000095 4000095 4000095 4000095C_SLOCcis 00002∗∗∗ 00002∗∗∗ 00002∗∗∗ 00002∗∗∗ 00002∗∗∗ 00002∗∗∗

4000005 4000005 4000005 4000005 4000005 4000005C_FAN_OUTcis 00019∗∗∗ 00015∗∗∗ 00015∗∗∗ 00015∗∗∗ 00014∗∗∗

4000015 4000015 4000015 4000015 4000015C_FAN_INcis 00004∗∗∗ 00000 00000 00000 00001

4000015 4000025 4000025 4000025 4000015CYCLE_SIZEcis 00013∗∗∗ 00012∗∗∗ 00009∗∗

4000055 4000045 4000045IN_CYCLE_DEGREEcis 00016∗∗ 00017∗∗

4000085 4000085MODULES_CROSS_CYCLEcis 00027∗∗

4000115IN_CYCLEcis 00459∗∗∗

4001405Log-likelihood −9,329.459 −8,745.888 −8,623.929 −8,564.721 −8,557.613 −8,554.501 −8,590.827AIC 18,702.92 17,553.78 17,313.86 17,201.44 17,195.23 17,191.00 17,253.65

Notes. All models include component- and application-specific nested random effects as well as year fixed effects. In addition, Models 4–7 include application-specific random coefficients of the cyclicality variables. Standard errors are given in parentheses.

∗p < 001; ∗∗p < 0005; ∗∗∗p < 0001.

Dow

nloa

ded

from

info

rms.

org

by [

175.

176.

173.

30]

on 2

8 A

pril

2015

, at 2

0:14

. Fo

r pe

rson

al u

se o

nly,

all

righ

ts r

eser

ved.


effect of C_FAN_IN in our models is fully consis-tent with the inconclusive findings reported in theliterature addressing the relationship between fan-inand quality (Briand et al. 1999). It also highlights theimportance of accounting for the effect of componentcyclicality when evaluating the effects of other com-ponent architectural features.

Model 4 tests H2, which predicts a detrimen-tal effect on quality for components with highercomponent cyclicality. This model includes the pre-dictor variable CYCLE_SIZE, the number of compo-nents in the cycle to which component c belongs.This model shows a positive and significant coeffi-cient for CYCLE_SIZE, indicating that the larger thecycle involving component c, the greater the expectednumber of bugs associated with that component.Note that Model 4 is a random-coefficient model thatincludes an application-specific random slope associ-ated with CYCLE_SIZE, relaxing the assumption thatthe effect of CYCLE_SIZE is constant across all appli-cations in the sample. This random-coefficient modelfits our data better than does the nested random-intercept model (Singer and Willett 2003): the differ-ence in the deviance statistic of the random interceptmodel (17,209.29) and the random coefficient model(17,129.44) is 79.85, which easily exceeds the 0.05 crit-ical value of a �2 distribution with two degrees offreedom (5.99). This suggests that the nested random-intercept model should be rejected in favor of therandom-coefficient model, even though both modelsstrongly support H2.

Model 5 tests H3, which predicts that, if we controlfor the size of the cycle, then a component‘s degreeof in-cycle centrality is positively associated with itsnumber of defects. The positive and significant coef-ficient for IN_CYCLE_DEGREE lends empirical sup-port to this hypothesis and suggests that componentsin same-size cycles can differ in their number ofdefects: those components occupying more central(respectively, more peripheral) positions in their cycleare likely to have a larger (respectively, smaller)number of defects. Model 5 includes application-specific random slopes for both CYCLE_SIZE andIN_CYCLE_DEGREE because their inclusions make asignificant deviance reduction when compared to thenested random-coefficient model that includes a ran-dom slope of CYCLE_SIZE only (deviance reduction =

8055 > critical �2 (0.05 right tailed, 3) = 7081).Finally, Model 6 tests H4. (Model 6 is a ran-

dom coefficient model that includes application-specific random slopes for both CYCLE_SIZE andIN_CYCLE_DEGREE, but not for MODULES_CROSS_CYCLE. We do not include an application-specificrandom slope for MODULES_CROSS_CYCLE becausedoing so does not yield a significant reduction ofdeviance statistic with respect to the nested model

that includes application-specific random slopes forboth CYCLE_SIZE and IN_CYCLE_DEGREE. Deviancereduction = 8091 < critical �2 (0.05 right tailed, 4) =

9049.) H4 predicts that components involved in cyclesthat cross a greater number of module boundaries aremore likely to exhibit a greater number of defects.The model yields a positive and significant coef-ficient for MODULES_CROSS_CYCLE, which indi-cates that—beyond the effect of CYCLE_SIZE andIN_CYCLE_DEGREE—in-cycle components are likely(in line with H4) to have even more defects whenthey are not encapsulated by a single module. In oursample, 87% of the cycles cross at least one moduleboundary, which suggests that multimodule cycles arecommon. This suggests that, in an open source soft-ware development context, developers seem to neglectthe negative consequences of dealing with cyclicaldependencies across modules or are not aware of theexistence of such cycles spanning multiple modules.

We tested the robustness of all findings reportedin this section with respect to alternative modelspecifications. First, all the results are robust to theinclusion of quadratic terms for fan-out and fan-in, which were both negative and significant. Thisalternative specification suggests that the relation-ship between fan-out (and fan-in) and a component’sdefects may be captured by an inverted U-shape.However, the shapes of the quadratic functions areappreciably different for fan-out and fan-in. Afterincluding the effect of CYCLE_SIZE in our models,the quadratic function of fan-out does not peak withinthe range of fan-out values in our sample, suggest-ing a decreasing marginal return effect of fan-out(instead of an inverted U-shape form). For fan-in, thequadratic function is a “shallow” inverted U-shapethat peaks about its mean. Second, we estimate hier-archical Poisson regression models that include—instead of a component-specific random effect nestedin its corresponding application—a version-specificrandom effect nested in its corresponding applicationrandom effect. Our results are robust to this hierar-chical model specification. Third, we estimate a zero-inflated Poisson (nonhierarchical) regression modelwith version-level fixed effects to ensure that our datado not contain too many zeros. The Vuong (1989) test,which compares a zero-inflated Poisson regressionwith a standard Poisson regression (featuring version-level fixed effects), does not significantly favor thezero-inflated model.

Although the analysis provides strong support forour hypotheses, a discussion of causality is in order.First, because our dependent variable is measuredwithin a time span that does not commence until afterall the independent variables have been measured(i.e., bugs are not discovered until after a version ofthe product has been released), it is unlikely that the

Dow

nloa

ded

from

info

rms.

org

by [

175.

176.

173.

30]

on 2

8 A

pril

2015

, at 2

0:14

. Fo

r pe

rson

al u

se o

nly,

all

righ

ts r

eser

ved.


existence of unidentified defects leads to the establish-ment of specific dependency patterns, such as cycli-cal dependencies, among product components. Thisreduces the risk of reverse causality. Second, the lagbetween the independent variables and our depen-dent variable mitigates the risk of unobserved factors(e.g., contemporaneous measurement errors) affectingboth the dependent variable and our predictor vari-ables in a similar manner. Of course, these consider-ations are not sufficient to guarantee causality in thestrictest sense.

4.2. Effect Size of CyclicalityIn this section we not only estimate the magnitude ofthe effect of component cyclicality but also compareit with the effect size of component modularity (mea-sured by the lack of component fan-out). To estimatean overall effect of component cyclicality, we estimatea random coefficient regression model that includesIN_CYCLE as the only cyclicality variable of interest.Such a model (Model 7 in Table 3) fits our data betterthan does the corresponding nested random-interceptmodel (deviance reduction = 31076 > critical �2 (0.05right tailed, 2) = 5099). This model yields a positiveand significant coefficient of IN_CYCLE (0.459, p <00001). According to that model, an in-cycle compo-nent has, on average, 58.3% (e00459 − 1) more defectsthan a noncycle component. In comparison, such amodel also shows a positive and significant coeffi-cient of C_FAN_OUT (0.014, p < 00001). Hence, anincrease of a single standard deviation in a compo-nent’s fan-out is correlated with 35.3% more defects(e4000145421065 − 1). Thus, our regression results suggestthat, on average, the overall effect of a componentbeing in a cycle is of the same order of magnitude asthe effect of component fan-out.

To test the robustness of our effect size estimatesagainst potential confounding effects due to the highcorrelations between IN_CYCLE and C_FAN_OUT(and C_FAN_IN), we reestimate the effect size ofIN_CYCLE using a matching approach. Toward thisend, we implement a propensity score matchingapproach commonly used in medical trials and eco-nomics when seeking to evaluate a treatment effectin nonrandomized observational studies (Rosenbaumand Rubin 1983). The rationale behind this approachis to define a propensity score as the conditional prob-ability of a component being in a cycle (i.e., of beinga “treated” component) in terms of the component’sother characteristics (e.g., its fan-out and fan-in). Onemust then identify components that have both a sim-ilar propensity score and a similar, “balanced” set ofcovariates—in this case, fan-in and fan-out—for thesame range of propensity scores. Matched groups ofcomponents with similar propensity scores and a bal-anced set of covariates are used to estimate the average

effect of treatment on the treated (ATT) as the differencebetween the expected number of bugs for the treatedunits (the in-cycle components) versus the untreatedunits (the noncycle components) of the matched sam-ple. We are ultimately interested in whether such adifference (percentage wise) in the number of bugs isthe same as (or greater than) the difference obtainedfrom our regression results.

For estimation purposes we use the pscore and attsmethods implemented in Stata by Becker and Ichino(2002). The pscore method determines the propen-sity score by estimating a logistic regression thatincludes component-specific attributes likely to beassociated with the inclusion of a component in acycle. These component-level covariates include age,average cyclomatic complexity, and number of sourcecode lines as well as fan-in and fan-out. We alsoinclude version-level fixed effects. As expected, themost salient predictors of being in a cycle are thefan-in and fan-out variables (their coefficients havez-scores that exceed 60).

Because it is virtually impossible to find two unitsin a sample that have the exact same propensityscore, one must also devise an algorithm to iden-tify matching groups that have both similar propen-sity scores and balanced covariates. For this, we usethe stratification method executed by atts in Statabecause, by definition, it guarantees a balanced setof matched samples if the outcome of the pscore isalso balanced (Becker and Ichino 2002). In a sampleof 6,064 components that is matched and balancedwith respect to fan-in and fan-out, we find an ATT of0.141 (p < 00001)—in other words, the average in-cyclecomponent has 0.141 more bugs than the averagenoncycle component in our matched sample. Moreimportantly, in that sample the in-cycle componentshave, on average, 79.7% more bugs than the noncy-cle components (0.318 versus 0.177). (Unfortunately,a propensity score matching approach is not suitableto test for the effect of a continuous variable such asfan-out.) Overall, the results from using this match-ing approach to estimate the effect size of cyclicalityindicate that the effect of cyclicality is (i) comparableto the one estimated from our regression results and(ii) not confounded with the effect of either fan-out orfan-in.

5. DiscussionProduct quality matters. The competitiveness of mostcompanies depends on it. Both the popular pressand academic research have documented the nega-tive consequences of poor quality. For example, thedecline of market share among the Big Three U.S.automakers in recent decades has been attributed tomediocre product quality (Klier 2009); Firestone even

Dow

nloa

ded

from

info

rms.

org

by [

175.

176.

173.

30]

on 2

8 A

pril

2015

, at 2

0:14

. Fo

r pe

rson

al u

se o

nly,

all

righ

ts r

eser

ved.


faced demise when a product design fault led to sev-eral fatal accidents (Pinedo et al. 2000). A wealthof studies in different contexts has documented theconsequences of poor quality on firm survivability(Li and Hamblin 2003), market share (Mohrman et al.1995), and profits (Fuentes-Fuentes et al. 2004). Con-versely, the long-term survival and widespread adop-tion of systems based on open source software hasbeen (at least partially) attributed to code quality(Ajila and Wu 2007). Many drivers of product qualityhave been recognized, and strategies for improving ithave received widespread attention (Cua et al. 2001).Hence, improving our understanding of the factorsthat drive product is of paramount importance.

The literature has already identified product archi-tecture as a major factor (Ulrich 1995, Ulrich andEppinger 2012). However, previous research hasfocused on modularity as the most salient architec-tural characteristic in the architecture–quality rela-tionship (e.g., Briand et al. 1999, Aggarwal et al.2007, Burrows et al. 2010). Our study demonstratesthat a second architectural feature, cyclicality, is sim-ilarly important. We empirically link cyclicality toquality, and we identify particular aspects of cycli-cality that significantly affect quality. First, we findthat component cyclicality is a significant predictorof component defectiveness whose effect is of thesame order of magnitude as is modularity. Second,in untangling the cyclicality construct, we learn thata component’s centrality in a cycle plays a significantrole: components that occupy a more central posi-tion in a cycle are more prone to defects than arecomponents that occupy peripheral positions. Finally,we show that architecture is determined not only bythe dependency structure but also by the hierarchi-cal grouping of components into modules: the defectproneness of product components increases with thenumber of module boundaries crossed by their cycli-cal dependencies.

Establishing an empirical link between componentcyclicality and the level of defects of product compo-nents highlights the importance of studying the rela-tionship between architectural properties of productcomponents and other dimensions of performance.Given the iterative nature and higher coordinationneeds associated with in-cycle components, we wouldexpect these components not only to be more defec-tive, but also to be at higher risk of missing sched-ule and budget targets, and thus to negatively impactmultiple dimensions of product development. Yet,empirical evidence for this assertion is currently lack-ing. In addition, looking at the dynamic evolution ofproducts, one could argue that cycles play a signifi-cant role in how products evolve (MacCormack et al.2006, 2008; Sosa et al. 2007a): considering again the

iterative approaches and coordination needs associ-ated with in-cycle components, one could expect thesecomponents to exhibit different rates of redesign,upgrade, and removal than noncycle components.

Our work also has implications for understandingproduct architecture on a conceptual level. Our resultsshow that the interplay of the module and depen-dency structures relates to product quality. This inter-play (and its consequences for quality) is intriguing,because the forces that shape a product’s modulestructure are substantially different from those thatshape its dependency structure. According to the clas-sical trope of the architecture literature, establishinga hierarchical module structure entails breaking theproduct into several major building blocks or subsys-tems and then mapping the product’s functionalityto each (Ulrich 1995). This process is repeated, top-down, in a nested manner, for each subsystem untilall functions of a system have been assigned to com-ponents (Ulrich and Eppinger 2012), resulting in thesystem’s intended architecture (i.e., the modules thatsystem architects and managers deliberately set up asthe system’s building blocks). In contrast, the prod-uct’s dependency structure, which determines theexistence of cycles in the product, is the result of myr-iad local decisions that are typically made by tech-nical personnel who optimize performance in termsof the local criteria associated with their components.Such decisions are made in a bottom-up mannerand thus, from the viewpoint of management, simply“emerge,” resulting in the system’s actual architecture.Our results highlight the importance of aligning bothaspects of architecture by showing how misalignmentbetween module structure and dependency structure(e.g., modules failing to fully encapsulate cycles) hasa negative effect on quality. This means that systemarchitects should—as early in the design process aspossible—look beyond the hierarchy of a system orproduct’s modules to examine its actual dependencystructure where cycles reside. Defect proneness can bemitigated by properly aligning the product’s hierar-chical modules with its dependency structure.

Given our results, the information systems litera-ture should explicitly take into account the role ofarchitectural cyclicality when studying the factors thatdrive software performance. For instance, an impor-tant software architecture decision in the informationsystems literature is the refactoring of computer code.Software code quality tends to “decay” over timebecause additions and changes to the code often failto follow the prescribed design rules, such as whereto add allowable dependencies. Refactoring improvesthe internal structure of the code by “redistribute[ing]classes, variables and methods across the class hierar-chy in order to facilitate future adaptations and exten-sions” (Mens and Tourwé 2004, p. 126) and is vital

Dow

nloa

ded

from

info

rms.

org

by [

175.

176.

173.

30]

on 2

8 A

pril

2015

, at 2

0:14

. Fo

r pe

rson

al u

se o

nly,

all

righ

ts r

eser

ved.


for the long-term survival of a software application.However, source code decay may be driven by thedivergence of the actual and intended architectures(even if they were originally aligned) as dependencydecisions continue to be made in a distributed wayby a large number of decision makers (whereas archi-tecture decisions are seldom revisited, and even thenby only a central architect or small group of archi-tects). This divergence may well call for realignment,which is the purpose of refactoring. Several methodshave been proposed to help identify code segmentsin need of refactoring (e.g., Kataoka et al. 2001, Simonet al. 2001), yet these methods have overlooked theneed to monitor the existence and characteristics ofcycles as important determinants in the refactoring.The analysis presented in this paper suggests that thedependency structure—and especially the presence ofcycles—is an important additional clue in the searchfor code elements to be refactored.

Our findings suggest two architectural action fieldsthat managers should consider to improve productquality in a typical development process.

Visualize the architecture. It is crucial for managersto understand the key components of product archi-tectures: dependencies among components (espe-cially when they form cycles) and nested modulesthat group components. Visualizing the architectureis fundamental for improving one’s understandingof both its technical and organizational aspects,because product architecture decisions influence for-mal and informal organizational structures (Sosaet al. 2004, MacCormack et al. 2006, Eppinger andBrowning 2012).

Identify and manage component cycles. Managersshould routinely identify component cycles (stem-ming from the product’s dependency structure) aswell as the actors responsible for their design, becausethe cycles and actors both will require disproportion-ate attention. To identify component cycles, managersmust actually disregard the constraints imposed bythe particular arrangement of components into nestedmodules. Our empirical results suggest several stepsthat can be taken to mitigate the negative effects ofidentified cycles. First, try to break the cycle by rerout-ing the critical dependencies that form them, espe-cially where they stem from components central to thecycle. If a cycle cannot be broken, then its size shouldbe reduced, because larger cycles are more detrimen-tal in placing a larger fraction of components at riskof defects. Third, reduce cycle complexity, especiallyfor components that are central to the cycle. Reducingtheir centrality in the cycle can improve code qual-ity. Fourth, ensure that modules encapsulate cycles.Finally, although managers must identify and mon-itor component cycles in the short run, they shouldpreempt cycles in the long run by establishing and

enforcing design rules (Baldwin and Clark 2000) thatspecify types of allowable component relationships(Sangal et al. 2005).

Our study has some limitations. In particular, theanalysis was performed on a sample of Java-basedapplications developed by the open source ApacheSoftware Foundation. To be able to understand thelimits of any attempt to generalize our findings, twoimportant attributes of our empirical context meritdiscussion. First, because the organizational structuresin open source development settings are typicallygeographically distributed, the product architecturesthat emerge from such settings are likely to beless interconnected than the architectures of prod-ucts developed in closed source development set-tings (MacCormack et al. 2012). We could, therefore,expect the occurrence of cycles to be less salient inopen source than in closed source projects. Yet, with-out further studies on closed settings, it is unclearwhether the effect of cyclicality would be any differ-ent in equivalent open and closed settings. Second,although open source development does not neces-sarily rely on formal organizational structures definedby any particular firm, nor on face-to-face communi-cations for informal interpersonal coordination, coor-dination efforts between interdependent actors takeplace through different mechanisms such as commit-ters who act as project leaders and online discus-sion lists that enable direct communication amongauthors (Mockus et al. 2000). Hence, studies in othersettings are needed before our findings can be fullygeneralized.

We have been able to apply methods—developedin the context of exploring the task structure of devel-opment processes—to the wealth of data availablein the open source space, thereby linking the con-cept of cyclicality to quality outcomes and explicatingcyclicality’s different facets. Our findings raise impor-tant questions. This paper has focused on the con-sequences of a product’s cyclical dependencies, butwhat are their antecedents? Where in the product arecycles likely to form? Under what circumstances dothey arise? How do they grow or shrink? Can wedefine an architecture that is optimal in terms of min-imizing defects? (For a first attempt in this direction,see Sosa et al. 2011.) How do architectural and orga-nizational patterns interact and coevolve over time?(See Colfer and Baldwin 2010, MacCormack et al.2012.) How would such coevolution influence defectproneness and other performance metrics? Furtherexploration of the consequences of cycles might askhow the presence of cycles affects the time requiredto fix defects. Addressing these questions poses inter-esting challenges for future research. This paper pro-vides an important step toward the development ofan empirically validated theory of product architec-ture design.

Dow

nloa

ded

from

info

rms.

org

by [

175.

176.

173.

30]

on 2

8 A

pril

2015

, at 2

0:14

. Fo

r pe

rson

al u

se o

nly,

all

righ

ts r

eser

ved.


Supplemental MaterialSupplemental material to this paper is available at http://dx.doi.org/10.1287/msom.2013.0432.

AcknowledgmentsThe authors appreciate support from Lattix Inc., whichprovided them with the software used to document thearchitecture of the products and sequence their matrices tocalculate some of the variables. They thank Neeraj Sangaland Frank Waldman for insightful feedback throughout thisresearch and Christoph Loch for his comments on an ear-lier version of this paper. They also thank Bhavani Shanker,Aman Neelappa, and Gaurav Shahlot for their researchassistance. They appreciate inputs received at research sem-inars at Rady School of Business (University of California,San Diego), London Business School, IESE Business School,Judge School of Business (University of Cambridge), Geor-gia Tech College of Management, Singapore ManagementUniversity, and the University of Southern Denmark. Theauthors also thank the entire editorial team for the insight-ful comments and clear guidance throughout the reviewprocess. The first two authors are grateful for the finan-cial support from the INSEAD R&D Committee [Grants2520-360, 2520-519]. The third author is grateful for supportfrom the Neeley Summer Research Award Program fromthe Neeley School of Business at Texas Christian Universityand a grant from the U.S. Navy, Office of Naval Research[Grant N00014-11-1-0739].

ReferencesAggarwal KK, Singh Y, Kaur A, Malhotra R (2007) Investigating

effect of design metrics on fault proneness in object-orientedsystems. J. Object Tech. 6(10):127–141.

Ajila SA, Wu D (2007) Empirical study of the effects of open sourceadoption on software development economics. J. Systems Soft-ware 80(9):1517–1529.

Allen TJ (1977) Managing the Flow of Technology (MIT Press,Cambridge, MA).

Baldwin CY, Clark KB (2000) Design Rules: The Power of Modularity(MIT Press, Cambridge, MA).

Becker SO, Ichino A (2002) Estimation of average treatment effectsbased on propensity scores. Stata J. 2(4):358–377.

Braha D, Bar-Yam Y (2007) The statistical mechanics of complexproduct development: Empirical and analytical results. Man-agement Sci. 53(7):1127–1145.

Briand LC, Daly J, Wüst J (1999) A unified framework for couplingmeasurement in object-oriented systems. IEEE Trans. SoftwareEngrg. 25(1):91–121.

Browning TR (2001) Applying the design structure matrix to systemdecomposition and integration problems: A review and newdirections. IEEE Trans. Engrg. Management 48(3):292–306.

Burrows R, Ferrari FC, Lemos OAL, Garcia A, Taïani F (2010) Theimpact of coupling on the fault-proneness of aspect-orientedprograms: An empirical study. Proc. IEEE 21st Internat. Sympos.Software Reliability Engrg. (ISSRE), San Jose, CA, 329–338.

Cabigiosu A, Camuffo A (2012) Beyond the “mirroring” hypothesis:Product modularity and interorganizational relations in the airconditioning industry. Organ. Sci. 23(3):686–703.

Cameron AC, Trivedi PK (1998) Regression Analysis of Count Data(Cambridge University Press, Cambridge, UK).

Card DN, Agresti WW (1988) Measuring software design complex-ity. J. Systems Software 8(3):185–197.

Card DN, Glass RL (1990) Measuring Software Design Quality(Prentice-Hall, Englewood Cliffs, NJ).

Cataldo M, Wagstrom P, Herbsleb JD, Carley KM (2006) Identifica-tion of coordination requirements: Implications for the designof collaboration and awareness tools. Proc. ACM Conf. Comput.-Supported Cooperative Work (ACM, New York), 353–362.

Chidamber S, Kemerer C (1994) A metrics suite for object-orienteddesign. IEEE Trans. Software Engrg. 20(6):476–493.

Clarkson PJ, Simons C, Eckert C (2004) Predicting change propaga-tion in complex design. J. Mech. Design 126(5):788–797.

Colfer L, Baldwin CY (2010) The mirroring hypothesis: Theory, evi-dence and exceptions. HBS Working Paper 10-058, HarvardBusiness School, Boston.

Cua KO, McKone KE, Schroeder RG (2001) Relationships betweenimplementation of TQM, JIT, and TPM and manufacturing per-formance. J. Oper. Management 19(6):675–694.

Eppinger SD, Browning TR (2012) Design Structure Matrix Methodsand Applications (MIT Press, Cambridge, MA).

Eppinger SD, Whitney DE, Smith RP, Gebala DA (1994) A model-based method for organizing tasks in product development.Res. Engrg. Design 6(1):1–13.

Freeman L (1979) Centrality in social networks. Conceptual clarifi-cation. Soc. Networks 1(3):215–239.

Fuentes-Fuentes MM, Albacete-Sáez CA, Lloréns-Montes FJ (2004)The impact of environmental characteristics on TQM principlesand organizational performance. Omega 32(6):425–442.

Giffin M, Weck OD, Bounova G, Keller R, Eckert C, Clarkson PJ(2009) Change propagation analysis in complex technical sys-tems. J. Mech. Design 131(8):081001-1–081001-14.

Gokpinar B, Hopp WJ, Iravani SMR (2010) The impact of mis-alignment of organizational structure and product architectureon quality in complex product development. Management Sci.56(3):468–484.

Gould RJ (1988) Graph Theory (Benjamin-Cummings, MenloPark, CA).

Harary F (1969) Graph Theory (Addison-Wesley, Reading, MA).Henderson RM, Clark KB (1990) Architectural innovation: The

reconfiguration of existing product technologies and the failureof established firms. Admin. Sci. Quart. 35(1):9–30.

Henry SM, Selig C (1990) Predicting source-code complexity at thedesign stage. IEEE Software 7(2):36–44.

Kan SH (1995) Metrics and Models in Software Quality Engineering(Addison-Wesley, Reading, MA).

Kataoka Y, Ernst MD, Griswold WG, Notkin D (2001) Automatedsupport for program refactoring using invariants. Proc. IEEEInternat. Conf. Software Maintenance, Florence, Italy, 736–743.

Klier T (2009) From tail fins to hybrids: How Detroit lost its domi-nance of the U.S. auto market. Econom. Perspect. 33(2):2–17.

Kreft I, Leeuw Jd, Aiken L (1995) The effect of different forms ofcentering in hierarchical linear models. Multivariate BehavioralRes. 30(1):1–21.

Krishnan V, Eppinger SD, Whitney DE (1997) A model-basedframework to overlap product development activities. Manage-ment Sci. 43(4):437–451.

Li X, Hamblin DJ (2003) The impact of performance and practicefactors on UK manufacturing companies’ survival. Internat. J.Production Res. 41(5):963–979.

MacCormack A, Baldwin CY, Rusnak J (2008) The impact of compo-nent modularity on design evolution: Evidence from the soft-ware industry. HBS Working Paper 08-038, Harvard BusinessSchool, Boston.

MacCormack A, Baldwin CY, Rusnak J (2012) Exploring the dualitybetween product and organizational architectures: A test of the“mirroring” hypothesis. Res. Policy 41(8):1309–1324.

Dow

nloa

ded

from

info

rms.

org

by [

175.

176.

173.

30]

on 2

8 A

pril

2015

, at 2

0:14

. Fo

r pe

rson

al u

se o

nly,

all

righ

ts r

eser

ved.


MacCormack A, Rusnak J, Baldwin CY (2006) Exploring the struc-ture of complex software designs: An empirical study of opensource and proprietary code. Management Sci. 52(7):1015–1030.

Martin RC (2002) Agile Software Development (Prentice Hall,Englewood Cliffs, NJ).

McCabe T (1976) A complexity measure. IEEE Trans. Software Engrg.2(4):308–320.

Mens T, Tourwé T (2004) A survey of software refactoring. IEEETrans. Software Engrg. 30(2):126–139.

Mihm J, Loch C, Huchzermeier A (2003) Problem-solving oscil-lations in complex engineering projects. Management Sci.49(6):733–750.

Miller GA (1956) The magical number seven, plus or minus two:Some limits on our capacity for processing information. Psych.Rev. 63(2):81–97.

Mockus A, Fielding RT, Herbsleb J (2000) A case study of opensource software development: The Apache server. Proc. 22ndInternat. Conf. Software Engrg. (ICSE), Limerick, Ireland, 263–272.

Mohrman SA, Tenkasi RV, Edward EL, Ledford GE (1995) Totalquality management: Practice and outcomes in the largest USfirms. Employee Relations 17(3):26–41.

Parnas DL (1972) On the criteria to be used in decomposing sys-tems into modules. Commun. ACM 15(12):1053–1058.

Pich MT, Loch CH, Meyer AD (2002) On uncertainty, ambigu-ity and complexity in project management. Management Sci.48(8):1008–1023.

Pinedo M, Sehasdri S, Zemel E (2000) The Ford-Firestone case.Teaching case, Stern School of Business, New York University,New York.

Rabe-Hesketh S, Skrondal A (2008) Multilevel and Longitudinal Mod-eling Using Stata, 2nd ed. (Stata Press, College Station, TX).

Ramachandran K, Krishnan V (2008) Design architecture and intro-duction timing for rapidly improving industrial products.Manufacturing Service Oper. Management 10(1):149–171.

Raudenbush S, Bryk A (2002) Hierarchical Linear Models, Applica-tions and Data Analysis Methods, 2nd ed. (Sage Publications,Thousand Oaks, CA).

Roberts JA, Hann I-H, Slaughter SA (2006) Understanding the moti-vations, participation, and performance of open source soft-ware developers: A longitudinal study of the Apache projects.Management Sci. 52(7):984–999.

Rosenbaum P, Rubin D (1983) The central role of the propen-sity score in observational studies for causal effects. Biometrika70(1):41–55.

Sangal N, Jordan E, Sinha V, Jackson D (2005) Using dependencymodels to manage complex software architecture. Proc. 20thAnnual ACM SIGPLAN Conf. Object-Oriented Programming, Sys-tems, Languages, Appl. (OOPSLA) (ACM, New York), 167–176.

Sharman DM, Yassine AA (2004) Characterizing complex productarchitectures. Systems Engrg. 7(1):35–60.

Shaw M, Garlan D (1996) Software Architecture (Prentice Hall, UpperSaddle River, NJ).

Simon F, Steinbrückner F, Lewerentz C (2001) Metrics based refac-toring. Proc. 5th Eur. Conf. Software Maintenance and Reengineer-ing, Lisbon, Portugal, 30–38.

Simon HA (1996) The Sciences of the Artificial, 3rd ed. (MIT Press,Cambridge, MA).

Singer JD, Willett JB (2003) Applied Longitudinal Data Analysis(Oxford University Press, New York).

Smith RP, Eppinger SD (1997a) Identifying controlling features ofengineering design iteration. Management Sci. 43(3):276–293.

Smith RP, Eppinger SD (1997b) A predictive model of sequen-tial iteration in engineering design. Management Sci. 43(8):1104–1120.

Sosa ME, Browning T, Mihm J (2007a) Studying the dynamics of thearchitecture of software products. Proc. ASME Internat. DesignEngrg. Tech. Conf. Comput. Inform. Engrg. Conf. (IDETC/CIE2007) (ASME, New York), 329–342.

Sosa ME, Eppinger SD, Rowles CM (2004) The misalignment ofproduct architecture and organizational structure in complexproduct development. Management Sci. 50(12):1674–1689.

Sosa ME, Eppinger SD, Rowles CM (2007b) A network approach todefine modularity of components in product design. J. Mech.Design. 129(11):1118–1129.

Sosa ME, Mihm J, Browning TR (2011) Degree distributionand quality in complex engineered systems. J. Mech. Design133(10):101008-1–101008-15.

Staats B (2012) Unpacking team familiarity: The effects of geo-graphic location and hierarchical role. Production Oper. Manage-ment 21(3):619–635.

Stevens WP, Myers GJ, Constantine LL (1974) Structured design.IBM Systems J. 13(2):115–139.

Steward DV (1981) The design structure system: A method for man-aging the design of complex systems. IEEE Trans. Engrg. Man-agement 28(3):71–74.

Tarjan R (1972) Depth-first search and linear graph algorithms.SIAM J. Comput. 1(2):146–160.

Terwiesch C, Loch CH (1999) Managing the process of engi-neering change orders: The case of the climate control sys-tem in automobile development. J. Product Innov. Management16(2):160–172.

Terwiesch C, Loch CH, Meyer AD (2002) Exchanging preliminaryinformation in concurrent engineering: Alternative coordina-tion strategies. Organ. Sci. 13(4):402–419.

Thompson JD (1967) Organizations in Action (McGraw-Hill,New York).

Ulrich KT (1995) The role of product architecture in the manufac-turing firm. Res. Policy 24(3):419–440.

Ulrich KT, Eppinger SD (2012) Product Design and Development,5th ed. (McGraw-Hill, New York).

Vuong QH (1989) Likelihood ratio tests for model selection andnon-nested hypotheses. Econometrica 57(2):307–333.

Warfield JN (1973) Binary matrices in system modeling. IEEE Trans.Systems, Man, Cybernetics 3(5):441–449.

Wasserman S, Faust K (1994) Social Network Analysis (CambridgeUniversity Press, Cambridge, UK).

Dow

nloa

ded

from

info

rms.

org

by [

175.

176.

173.

30]

on 2

8 A

pril

2015

, at 2

0:14

. Fo

r pe

rson

al u

se o

nly,

all

righ

ts r

eser

ved.

Manufacturing & Service Operations Management · ISSN 1523-4614 (print) Š ISSN 1526-5498 (online) ...

Documents

Transcript of Manufacturing & Service Operations Management · ISSN 1523-4614 (print) Š ISSN 1526-5498 (online) ...