
    SOFTWARE PROCESS IMPROVEMENT AND PRACTICE
    Softw. Process Improve. Pract. 2006; 11: 123–148
    Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/spip.259

    Information Systems Success in Free and Open Source Software Development: Theory and Measures‡ (Research Section)
    Kevin Crowston1*,†, James Howison1 and Hala Annabi2
    1 Syracuse University, School of Information Studies, 348 Hinds Hall, Syracuse, NY 13244-4100, USA
    2 University of Washington, The Information School, Box 352840, Suite 370 Mary Gates Hall, Seattle, WA 98195-2840, USA

    Information systems success is one of the most widely used dependent variables in information systems (IS) research, but research on free/libre and open source software (FLOSS) often fails to appropriately conceptualize this important concept. In this article, we reconsider what success means within a FLOSS context. We first review existing models of IS success and success variables used in FLOSS research and assess them for their usefulness, practicality and fit to the FLOSS context. Then, drawing on a theoretical model of group effectiveness in the FLOSS development process, as well as an on-line discussion with developers, we present additional concepts that are central to an appropriate understanding of success for FLOSS.

    In order to examine the practicality and validity of this conceptual scheme, the second half of our article presents an empirical study that demonstrates operationalizations of the chosen measures and assesses their internal validity. We use data from SourceForge to measure the project’s effectiveness in team building, the speed of the project at responding to bug reports and the project’s popularity. We conclude by discussing the implications of this study for our proposed extension of IS success in the context of FLOSS development and highlight future directions for research. Copyright © 2006 John Wiley & Sons, Ltd.

    KEY WORDS: free/libre open source software; information systems success; concept development; survival analysis

    * Correspondence to: Kevin Crowston, Syracuse University School of Information Studies, 348 Hinds Hall, Syracuse, NY 13244-4100, USA
    † E-mail: [email protected]
    ‡ Earlier versions of this article appeared as: Crowston, K., Annabi, H., & Howison, J. (2003): Defining open source software project success, in Proceedings of the 24th International Conference on Information Systems (ICIS 2003), Seattle, WA; Crowston, K., Annabi, H., Howison, J., & Masango, C. (2004): Towards a portfolio of FLOSS project success measures, presented at the Workshop on Open Source Software Engineering, 26th International Conference on Software Engineering, Edinburgh.


    1. INTRODUCTION

    The long-term goal of our research is to identify processes that enable distributed software team performance, specifically, the performance of free/libre open source software (FLOSS) development teams. In this article, we take a needed step in this direction by developing measures for the success of FLOSS projects. This step is needed because we will not be able to improve software processes if we cannot identify what constitutes an improvement. Information systems (IS) success is one of the most widely used dependent variables in IS research. Not surprisingly, much attention has been given to how best to measure it (e.g. DeLone and McLean 1992, 2002, 2003, Seddon et al. 1999, Rai et al. 2002, Seddon 1997). However, the unique nature of FLOSS development makes some measures more appropriate than others and requires the addition of hitherto unconsidered measures.

    FLOSS is a broad term used to embrace software that is developed and released under either a ‘free software’ or an ‘open source’ license.1 Both open source and free software are free in two senses: ‘free as in speech’, meaning that the code may be redistributed and reused in other FLOSS projects, and ‘free as in beer’, meaning that the software is available for download without charge. As well, many (though by no means all) FLOSS developers contribute to projects as volunteers without working for a common organization or being paid. As we will discuss, these two characteristics have implications for the applicability of certain measures of success.

    It is important to develop measures of success for FLOSS projects for at least two reasons.

    1 The free software movement and the open source movement are distinct but share important characteristics. The licenses they use allow users to obtain and distribute the software’s original source without charge (software is ‘free as in beer’) and to inspect, modify and redistribute modifications to the source code. While the open source movement views these freedoms pragmatically (as a ‘development methodology’), the free software movement emphasizes the meaning of ‘free as in speech,’ which is captured by the French/Spanish ‘libre’, and one of their methods of supporting those freedoms is ‘copyleft,’ famously embodied in the General Public License, meaning that derivative works must be made available under the same license terms as the original. See http://www.gnu.org/philosophy/ and http://opensource.org. While the differences and similarities of the movements are interesting, this article focuses on development practices in distributed work and those are largely shared across the movements. We therefore choose the acronym FLOSS, standing for free/libre and open source software.

    First, having such measures will be useful for FLOSS project leaders in assessing their projects. In some cases, FLOSS projects are sponsored by third parties, so measures are useful for sponsors to understand the return on their investment. Second, FLOSS is an increasingly visible and copied mode of systems development. Millions of users, including major corporations, depend on FLOSS systems such as Linux (and, of course, the Internet, which is heavily dependent on FLOSS tools), but as Scacchi (2002) notes, ‘little is known about how people in these communities coordinate software development across different settings, or about what software processes, work practices, and organizational contexts are necessary to their success’. An EU/NSF workshop on priorities for FLOSS research identified the need both for learning ‘from open source modes of organization and production that could perhaps be applied to other areas’ and for ‘a concerted effort on open source in itself, for itself’ (Ghosh 2002). But to be able to learn from teams that are working well, we need to have a definition of ‘working well’.

    1.1. Outline of Article

    The search for appropriate FLOSS success measures proceeds according to the following outline. In the first half of the article, we develop a richer conceptualization of success measures for FLOSS, drawing on a number of sources. We first review the literature on IS success to see what measures might be adopted and to identify problems in applying others to the context of FLOSS development. We then step back and discuss the process model underlying the existing IS models and extend these models to fit the FLOSS context. We do so with reference to a model of group effectiveness derived from Hackman (1987). We next assess the face validity of our conceptual scheme using the opinions of FLOSS developers elicited through SlashDot, a popular Web-based discussion board (http://slashdot.org/). The comparison suggests additional measures that might be incorporated to develop a fuller understanding of FLOSS project success, which we integrate into our conceptual scheme of success in FLOSS development. Finally, we examine recent research articles on FLOSS to see what measures of success have been used in practice, and comment on their appropriateness and utility. The result of this conceptual development work is a set of possible measures of FLOSS development effectiveness and related operationalizations.

    In order to examine the practicality and validity of this conceptual scheme, in the second half of our article we present an empirical study that demonstrates its operationalization and assesses the internal validity of the measures. For this purpose, we use data from SourceForge, the largest hub for FLOSS development projects. Finally, we conclude by discussing the implications of this study for our proposed extension of IS success in the context of FLOSS development and highlight future directions for research.

    2. THEORY DEVELOPMENT: MEASURING THE SUCCESS OF FLOSS DEVELOPMENT

    In this section, we describe the process through which we developed a conceptual model of success measures for FLOSS development. We discuss in turn our review of models of success in the IS literature, our extensions to these conceptual models, feedback from FLOSS developers and our review of the measures of success applied in the empirical FLOSS literature.

    2.1. Literature Review: Conceptual Models of Information Systems Success

    FLOSS is a form of system development, so we begin our hunt for success measures in the IS literature. Note though that we are not attempting an exhaustive review of this extensive literature, but rather are using the conceptual models presented in the literature to identify success measures relevant to FLOSS. The most commonly cited model for IS success is that of DeLone and McLean (1992, 2002, 2003), shown in Figure 1. This model suggests six interrelated measures of success: system quality, information quality, use, user satisfaction, individual impact and organizational impact. Seddon (1997) proposed a related model that includes system quality, information quality, perceived usefulness, user satisfaction and IS use. Taken together, these models suggest a number of possible measures that could be applied to FLOSS.

    2.1.1. System and Information Quality

    Code quality has been studied extensively in software engineering. This literature provides many possible measures of the quality of software, including understandability, completeness, conciseness, portability, consistency, maintainability, testability, usability, reliability, structuredness and efficiency (Boehm et al. 1976, Gorton and Liu 2002). ISO standard 9126 defines software quality as including functionality, reliability, usability, efficiency, maintainability and portability, each with subdimensions. A commonly used measure of quality is the number of defects per thousand lines of code (Diaz and Sligo 1997, Goranson 1997) or the probability of a fault in a module (Basili et al. 1994). To this list should be added the quality of the system documentation. Code quality measures would seem to be particularly practical for studies of FLOSS, since the code is publicly available. Indeed, a few studies have already examined this dimension. For example, Stamelos et al. (2002) suggested that FLOSS code is generally of good quality. Mishra et al. (2002) offer an analytic model that suggests factors contributing to FLOSS code quality, such as number of developers, mix of talent level, etc. On the other hand, not many FLOSS systems include information (i.e. data) per se, so the dimension of information quality seems to be less applicable.

    Figure 1. DeLone and McLean’s Model of IS Success (DeLone and McLean (1992), Figure 2, p. 87). Reprinted by permission. Copyright 1992 INFORMS. DeLone WH, McLean ER. 1992. Information systems success: the quest for the dependent variable. Information Systems Research 3(1): 60–95. The Institute for Operations Research and the Management Sciences, 7240 Parkway Drive, Suite 310, Hanover, Maryland 21076, USA.


    2.1.2. User Satisfaction

    User satisfaction is an often-used measure of system success. For example, it is common to ask stakeholders if they felt a project was a success (e.g. Guinan et al. 1998). There is some data available regarding user satisfaction with FLOSS projects. For example, Freshmeat, a Web-based system that tracks releases of FLOSS (http://freshmeat.net/), collects user ratings of projects. Unfortunately, these ratings are based on a nonrandom sample (i.e. users who take the time to volunteer a rating), making their representativeness suspect. Furthermore, we have observed that the scores seem to have low variance: in a recent sample of 59 projects, we found that scores ranged only from 7.47 to 9.07 out of 10. It seems likely that users who do not like a piece of software simply do not bother to enter ratings. There do not seem to be any easily obtainable data on the related measures of perceived ease of use and usefulness (Davis 1989). Opinions expressed on project mailing lists are a potential source of qualitative data on these facets, though again there would be questions about the representativeness of the data.

    In principle, it should be possible to survey users to collect their satisfaction with or perceptions of the software. However, to do so properly poses a serious methodological problem. Because most FLOSS projects are freely distributed through multiple channels, the population of users is unknown, making it impossible to create a true random sample of users. In this respect, FLOSS differs greatly from IS developed in an organizational setting or for the commercial market, which have a clearly defined user population. The situation is also different from that for the Web, another nontraditional systems environment, because with a Web site users are by definition the ones who visit the site, making the population effectively self-identifying. To achieve the same effect for FLOSS, the best solution might be to build the survey into the software, though doing so might annoy some users. For example, recent versions of the Mozilla Web browser include a program that offers to report crashes and collect other feedback.

    2.1.3. Use

    Although there is some debate about its appropriateness (DeLone and McLean 2003, Seddon 1997), many studies employ system use as an indication of IS success. For software for which use is voluntary, as is the case for most FLOSS, use seems like a potentially relevant indicator of the project’s success. Some interesting data are available. For rare projects, these numbers can be directly measured. For example, Netcraft conducts a survey of Web server deployment,2 which estimates the market share of different Web servers. Other projects that require some kind of network connection could potentially be measured in the same way (e.g. instant messaging or peer-to-peer file sharing clients), but this approach does not seem to be widely applicable. Avery Pennarun’s Debian Popularity Contest3 collects statistics on the usage of software on Linux machines running the Debian distribution. Users install a program that collects and reports usage information daily, and the resulting statistics show which packages have been installed and which of these have been recently used. Unfortunately, these data are collected from a nonrandom sample of machines, running a particular Linux distribution, so the results are likely not representative of use in the broader population.

    Rather than measuring actual use, it may be sufficient to count the actual or potential number of users of the software, which we label ‘popularity’ (Stewart and Ammeter 2002). A simple measure of popularity, and a popular one in the FLOSS literature reviewed below, is the number of downloads made of a project. Download numbers are readily available from various sites. Of course, not all downloads result in use, so variance in the conversion ratio will make downloads an unreliable indicator of use. Furthermore, because FLOSS can be distributed through multiple outlets, on-line as well as offline (e.g. on CDs), the count from any single source is likely to be quite unreliable as a measure of total users. A particularly important channel is ‘distributions’ such as RedHat, SuSE or Debian. Distributions provide purchasers with pre-selected bundles of software packaged for easy installation and are often sold on a CD-ROM to obviate the need to download everything. Indeed, the most popular software might be downloaded only rarely because it is already installed on most users’ machines and stable enough to not require the download of regular updates. Therefore, an important measure of popularity to consider is the package’s inclusion in distributions.

    2 http://news.netcraft.com/archives/webserver_survey.html
    3 http://people.debian.org/∼apenwarr/popcon/


    Other sources of data reflecting on users are available. Freshmeat provides a popularity measure for packages it tracks, though a better name might be ‘interest’, as it is one step further removed from actual use. The measure is calculated as the geometric mean of subscriptions and two counts of page viewings of project information.4 Similarly, SourceForge provides information on the number of page views of the information pages for projects it supports.
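    As an illustration, an ‘interest’ score of this kind can be computed as a simple geometric mean of the available counts. The Python sketch below is a minimal example under that assumption; the inputs (subscriptions and two page-view counts) are hypothetical placeholders, not Freshmeat’s actual schema or formula.

        from math import prod

        def interest_score(counts):
            """Geometric mean of non-negative counts (e.g. subscriptions and
            two page-view counts), as a rough 'interest' indicator."""
            if not counts or any(c <= 0 for c in counts):
                return 0.0
            return prod(counts) ** (1.0 / len(counts))

        # Hypothetical example: subscriptions, project-page hits, homepage hits
        print(interest_score([120, 4500, 3100]))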

    Finally, it may be informative to measure use from perspectives other than that of an end user. In particular, the openness of FLOSS means that other projects can build on top of it. Therefore, one measure of a project’s success may be that many other projects use it. Package dependency information between projects can be obtained from the package descriptions available through the various distributions’ package-management systems. Analysis of source code could reveal the reuse of code from project to project (though identifying the true origin of the code could be difficult).

    2.1.4. Individual or Organizational Impacts

    The final measures in DeLone and McLean’s (1992) model are individual and organizational impacts for the users.

    4 http://freshmeat.net/faq/view/30/

    Though there is considerable interest in the economic implications of FLOSS, these measures are hard to define for regular IS projects and doubly hard for FLOSS projects because of the problems of defining the intended user base and expected outcomes. Therefore, these measures are likely to be unusable for most studies of individual FLOSS projects.

    2.1.5. Summary

    To summarize, existing models of IS success suggest a range of potential success measures for FLOSS projects, as shown in Table 1. However, a number of the measures are inapplicable, while others are difficult to apply in the FLOSS environment. We note that many of these measures are based on a vision of system development in an organization and do not take into account the unique characteristics of the FLOSS development environment. A deeper understanding of the differences between the process model underlying the IS success literature and the process of FLOSS development is needed.

    2.2. Reconsidering Process: The Process of FLOSS Development

    The previous section considered how success has been measured in the IS literature and its applicability to FLOSS development. In this section, we extend these models by re-examining the vision of systems development underlying DeLone and McLean’s success model to identify additional measures that might be used for FLOSS project success.

    Table 1. Success measures suggested by the IS literature

    Measure of success | Indicators | Audience
    System and information quality | Code quality (e.g. understandability, completeness, conciseness, portability, consistency, maintainability, testability, usability, reliability, structuredness, efficiency); documentation quality | Users, developers
    User satisfaction | User ratings; opinions on mailing lists; user surveys | Users, developers
    Use | Use (e.g. Debian popularity contest); number of users; downloads; inclusion in distributions; popularity or views of information page; package dependencies; reuse of code | Developers
    Individual and organizational impacts | Economic and other implications | Users, developers


    DeLone and McLean explicitly state that their model of project success was built by considering ‘a process model [that] has just three components: the creation of a system, the use of the system, and the consequences of this system use’ (DeLone and McLean 2002), which we have shown graphically in Figure 2. We note that the measures included in the model focus on the use and consequences of the system (the right side of the figure), and do not open up either box in the process. While this focus may be appropriate given the traditional concern of IS research with the organizational implications of IS, it seems to unduly restrict the range of measures considered.

    The choice of measures also seems to be influenced by the relative ease of access to the use environment compared to the development environment for packaged or commercial software. In the context of FLOSS though, researchers are frequently faced with the opposite situation, in that many aspects of the development process are publicly visible while the use environment is difficult to study or even identify. For both reasons, we believe that it will be useful to complement previously identified IS success measures with ones that take advantage of the availability of data on the development process. As a structure for analyzing the systems development process, we draw on Hackman’s (1987) model of effectiveness of work teams. An attractive feature of this model is that effectiveness is conceptualized along multiple dimensions. In addition to task output, Hackman includes the team’s continued capability to work together and satisfaction of individual team members’ personal needs as relevant outputs. The following discussion examines such measures of success.

    2.2.1. Measures of the Output of Systems Development

    Two of the measures in the DeLone and McLean model concern the product of the systems development process, namely, system quality and information quality. We first consider possible additional measures of this process step in the FLOSS context.

    Project Completion. First, given the large number of abandoned software projects (Ewusi-Mensah 1997), simply completing a project may be a sign of success. However, it is common for FLOSS projects to be continually in development, making it difficult to say when they are completed. Faced with this problem, Crowston and Scozzi (2002) instead measured success as the progress of a project from alpha to beta to stable status, as self-reported by the team. For example, for many teams the 1.0 release is a significant milestone.

    Second, another commonly used measure of success is whether the project achieved its goals. This assessment is typically made by a comparison of the project outcomes with the formal requirements specifications. However, FLOSS projects often do not have such specifications. Scacchi (2002) examined the process of ‘requirements engineering’ in open source projects and provided a comparison with the traditional processes (e.g. Davis 1990, Jackson 1995). He argues that rather than a formal process, FLOSS requirements are developed through what he terms ‘software informalisms’, which do not result in agreed ‘requirements documentation’ that could later be analyzed to see whether the project has met its goals. Scacchi’s ethnography further suggests that for FLOSS, goals will likely come from within the project through a discursive process centered on the developers. Therefore, a key measure for FLOSS may be simply developer satisfaction with the project, which corresponds to Hackman’s (1987) individual satisfaction dimension for group performance.

    Figure 2. Process model underlying the DeLone and McLean (1992) model of success


    Developer satisfaction could be measured by surveying developers: the developer community is much more clearly delineated than users, making such a survey feasible. Indeed, there have already been several FLOSS developer surveys (e.g. Ghosh 2002, Hertel et al. 2004), though not on this topic specifically. Since in many projects there is a great disparity in the contribution of developers – a few developers contribute the bulk of the code (Mockus et al. 2000) – it may be desirable to weight developers’ opinions in forming an overall assessment of a project.

    2.2.2. Measures of the Process of Systems Development

    In DeLone and McLean’s (1992) process model, systems development is implicitly treated as a one-off event. However, for FLOSS projects (and indeed many other types of projects) development is instead an ongoing activity, as the project continues to release ‘often and early’ (Raymond 1998). In other words, a FLOSS project is often characterized by a continuing process of developers fixing bugs, adding features and releasing new versions of the software. This characteristic of the FLOSS development process suggests a number of possible indicators of success.

    Number of Developers. First, since many FLOSS projects are dependent on volunteer developers, the ability of a project to attract and retain developers on an ongoing basis is important for its success. Thus the number of developers involved in a project could be an indicator of success. The number of developers can be measured in at least two ways. First, FLOSS development systems such as SourceForge list developers who are formally associated with each project, a measure that could also be discovered through examination of Concurrent Versions System (CVS) logs for projects where developers contribute code directly (i.e. not via a mediated patch submission process). Second, examination of the mailing lists and other fora associated with projects can reveal the number of individuals who actively participate in development activities without being formally associated with the project.

    Level of Activity. More important than the sheer number of developers is their contribution to a project. Thus the level of activity of developers in submitting code and bug reports may be useful as an indicator of project success. For example, SourceForge computes and reports a measure of project activity on the basis of the activities of developers. Researchers could also examine development logs for evidence of software being written and released.

    Cycle Time. Another measure related to group activity is the time between releases. In FLOSS development, there is a strong community norm to ‘release early and release often’, which implies that an active release cycle is a sign of a healthy development process and project. For example, Freshmeat provides a vitality score (Stewart and Ammeter 2002) that assesses how recently a project has made an announcement of progress on the Freshmeat site.5 In addition, detailed examination of bug-fixing and feature-request fulfillment activities might yield useful process data indicative of the project’s status. These processes involve interaction with the user community and can involve applying patches of contributed code supplied by noncore developers. Bug reports and feature requests are often managed through a task-management system that records the developer and community discussions, permits labeling of priority items and sometimes includes informal ‘voting mechanisms’ to allow the community to express its level of interest in a bug or new feature. The time to close bugs (or implement requested features) and the proportion of bugs fixed therefore might be helpful measures of the strength of a project’s processes and thus indirectly of its success.
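    To make the last two indicators concrete, the sketch below shows one way they could be computed from a list of tracker items, assuming each item carries an opened timestamp and, if resolved, a closed timestamp. This is an illustrative Python fragment, not the measurement code used in the study; the record layout is hypothetical.

        from datetime import datetime

        # Hypothetical tracker records: (opened, closed); closed is None if still open
        bugs = [
            (datetime(2003, 1, 5), datetime(2003, 1, 7)),
            (datetime(2003, 2, 1), None),
            (datetime(2003, 2, 10), datetime(2003, 3, 2)),
        ]

        closed = [(c - o).days for o, c in bugs if c is not None]
        proportion_fixed = len(closed) / len(bugs)               # share of reports that were closed
        median_days_to_close = sorted(closed)[len(closed) // 2]  # median lifespan of closed bugs

        print(proportion_fixed, median_days_to_close)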

    2.2.3. Effects on Project Teams

    Finally, because the projects are ongoing, it seems important to consider the impact of a project on the abilities of the project team itself and its ability to continue or improve the development process. This dimension corresponds to Hackman’s (1987) dimension of the team’s continued capability to work together as a measure of team performance. As Shenhar et al. put it, ‘how does the current project help prepare the organization for future challenges?’ (Shenhar et al. 2001).

    Employment Opportunities. Some literature on the motivation of FLOSS developers suggests that developers participate to improve their employment opportunities (e.g. Lerner and Tirole 2002).

    5 http://freshmeat.net/faq/view/27/


    Thus, one can consider salary or jobs acquired through involvement in a particular project as possible measures of success (Hann et al. 2004). For example, Hann et al. (2002) found that higher status within the Apache Project was associated with significantly higher wages. Again, one might measure these indicators by surveying developers. While for a single developer these measures are confounded with innate talent, training, luck, etc., aggregating across many developers and across time may provide a useful project-level measure of success.

    Individual Reputation. Similarly, the literature also suggests that developers participating in FLOSS projects are rewarded with reputation in the community, and that this reputation is a sufficient reward for their efforts. Kelty (2001) suggests that reputation might be measured through an analysis of credits located in source code (which he terms ‘greputation’). Alternative measures of FLOSS reputation might draw on the FLOSS communities’ implementation of a ‘Web of Trust’ at the community site Advogato,6 where developer status is conferred through peer review. Analyses of this kind of measure face the difficulty of tying the earning of reputation to the success of a particular project, and of relying on systems that have incomplete participation.

    Knowledge Creation. Projects can also lead to the creation of new knowledge at the individual as well as the group level (Arent and Nørbjerg 2000). Through their participation in a project, individual developers may acquire new procedural and programming skills that would benefit them on future projects. This effect could be measured by surveying the developers about their perceived learning. In addition, following Grant’s (1996) knowledge-based view of the firm, one can consider the project as a structure to integrate members’ knowledge into products. In this view, the project’s rules, procedures, norms and existing products are a reflection of the knowledge being created by the project activities. This knowledge creation can be measured by observing and qualitatively analyzing changes in the written rules and procedures over time, and it may be reflected in and transferred through the development of systems for FLOSS project support, such as SourceForge and Savannah. Analysis of the development of interactions and support systems closely linked to a project might give some insight into this aspect of project success.

    6 http://www.advogato.org/trust-metric.html

    Table 2. Measures suggested by a re-examination of the FLOSS process

    Measure of success | Indicators | Audience
    Project output | Movement from alpha to beta to stable; achieved identified goals; developer satisfaction | Developers
    Process | Number of developers; level of activity (developer and user contributions, number of releases); time between releases; time to close bugs or implement features | Developers, users
    Outcomes for project members | Individual job opportunities and salary; individual reputation; knowledge creation | Developers


    2.2.4. Summary of Measures from Process Reconsideration

    In summary, consideration of the process of developing FLOSS suggests a number of additional measures indicative of success for these projects. These measures are summarized in Table 2. We note that as the measures move further back in the process model, they become increasingly removed from the user. As such, there may be a concern about their validity as measures of success: is it a success if a project attracts developers but not users? Or if it develops high quality processes but not high quality code? We have two replies to this concern. First, the apparent disconnect may be an accurate representation of the reality of FLOSS projects, in which the developers frequently are the users. Second, the measures developed in this section should be viewed as complements to, rather than replacements for, the more conventional measures of success. Using a variety of measures will provide a richer picture of the status of a project. As well, because many of the individual measures seem likely to have measurement problems, or measure success from different perspectives, adopting a portfolio of measures seems prudent.


    2.3. Seeking Input from FLOSS Developers

    Having reviewed the IS literature on IS success and extended this conceptual model by considering the unique features of the FLOSS development and use environments, we continued our theory building by turning to the FLOSS community for input on their definitions of success. Our goal was to extend our conceptual model by generating a range of ideas to compare against and extend our list of factors, rather than to establish the relative importance of each factor or the relations among them.

    To solicit input, we posted a question soliciting feedback on SlashDot,7 a Web-based discussion group that attracts considerable interest and participation from FLOSS developers and users. This data elicitation technique was more like an on-line focus group (or perhaps the initial stage of a Delphi study) than a survey, as respondents were a nonrandom sample and could see and respond to earlier postings. The rationale for this approach to data elicitation was to match our goal of generating ideas about success measures, rather than testing a theory or making inferences from generalizable data. To elicit comments, the following question was posted on SlashDot on 22 April 2003 (http://slashdot.org/article.pl?sid=03/04/21/239212):

    ‘There have been a number of discussions on SlashDot and elsewhere about how good projects work (e.g. Talk To a Successful Free Software Project Leader), but less about how to tell if things are going well in the first place. While this may seem obvious, most traditional definitions of software project success seem inapplicable (e.g. profit) or nearly impossible to measure for most projects (e.g. market share, user satisfaction, organizational impact). In an organizational setting, developers can get feedback from their customers, the marketplace, managers, etc.; if you’re Apache, you can look at Netcraft’s survey of server usage; but what can the rest do? Is it enough that you’re happy with the code? I suspect that the release-early-and-often philosophy plays an important role here. I’m asking not to pick winners and losers (i.e. NOT a ranking of projects), but to understand what developers look at to know when things are going well and when they’re not.’

    7 http://slashdot.org/


    The question received 201 responses within a few days. A transcript of responses was downloaded on 26 April 2003. Many of the individuals posting answers to our question identified themselves as developers or contributors to FLOSS projects. As a check on their qualifications, we searched SourceForge for information about the posters. Although SourceForge and SlashDot are separate sites, many developers have strong attachments to their user IDs and use the same one whenever possible, providing a possible link between the two systems. For example, it seems reasonable to expect that the user ID Abcd1234 identifies the same individual on both systems. We identified the SlashDot IDs of 72 posters who provided useful responses (some responses were anonymous). Of these seventy-two, 34 IDs matched a SourceForge ID exactly, and six could be matched with a bit of research (e.g. by matching the real name of the individual; real names are available for a few SlashDot posters and many SourceForge developers). Of the matched IDs, 16 and 3 respectively were listed as members of SourceForge projects (i.e. about half). A few other posters had pointers to non-SourceForge FLOSS projects on their SlashDot information page or information about their employment, generally as software developers. These data are not conclusive, but do suggest that a number of the contributors to the study had sufficient background as FLOSS developers to be able to comment knowledgeably.

    The transcript was content-analyzed by two coders. The content analysis process was carried out using Atlas-ti, a qualitative data analysis software package. Messages were coded using the thematic unit as the unit of analysis. Once a measure was identified within a message, the coder selected the text containing the measure and coded that text using categories from our coding scheme. A total of 170 thematic units were identified and coded in 91 responses (i.e. some postings contained multiple units; the remaining responses did not contain text addressing the question, e.g. a posting containing an advertisement). The content analysis process employed a mixture of deductive and inductive procedures. The initial content analytic scheme was based on the literature review described above. During the process of content analysis, additional themes emerged from the data.


    Table 3. Results of the content analysis of SlashDot responses

    Level 1 | Level 2 | Frequency | Percentage (%)
    User | Satisfaction | 14 | 8
    User | Involvement | 25 | 15
    Product | Meeting requirements | 9 | 5
    Product | Code quality | 11 | 6
    Product | Portability | 1 | 1
    Product | Availability | 2 | 1
    Process | Activity | 5 | 3
    Process | Adherence to process | 10 | 6
    Process | Bug fixing | 4 | 2
    Process | Time | 2 | 1
    Process | Age | 1 | 1
    Developers | Involvement | 16 | 9
    Developers | Varied developers | 2 | 1
    Developers | Satisfaction | 29 | 17
    Developers | Enjoyment | 8 | 5
    Use | Competition | 4 | 2
    Use | Number of users | 2 | 1
    Use | Downloads | 3 | 2
    Recognition | Referral | 3 | 2
    Recognition | Attention and recognition | 9 | 5
    Recognition | Spin-offs | 6 | 4
    Recognition | Influence | 4 | 2
    Total | | 170 |

    Data analysis continued until saturation was reached. The two raters agreed on the codes for 78% of the units. We felt that this level of agreement was sufficient for the purposes of the analysis (identification of measures to compare to the literature review), so we did not go on to refine the definitions of codes or retrain the coders to increase agreement. The results of the content analysis are summarized in Table 3.
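    For readers who want to reproduce this kind of reliability check, a minimal sketch of per-unit agreement between two coders is shown below, together with Cohen’s kappa as a chance-corrected alternative. The code labels are hypothetical; this illustrates the calculation, not the authors’ analysis script.

        from collections import Counter

        coder_a = ["DEV_SAT", "USER_INV", "PROCESS", "DEV_SAT", "USE"]
        coder_b = ["DEV_SAT", "USER_INV", "PRODUCT", "DEV_SAT", "USE"]

        # Simple percent agreement, as reported in the text (78% in the study)
        agreement = sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)

        # Cohen's kappa corrects agreement for chance given each coder's label distribution
        n = len(coder_a)
        pa, pb = Counter(coder_a), Counter(coder_b)
        p_chance = sum(pa[c] * pb[c] for c in set(pa) | set(pb)) / (n * n)
        kappa = (agreement - p_chance) / (1 - p_chance)

        print(agreement, kappa)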

    The codes were organized into a two-level hierarchy for presentation, with detailed codes (Level 2 in the table) clustered into meta-categories (Level 1). 32% of the units included elements from the Developers meta-category, indicating that the respondents felt that a project is successful if its developers are involved, satisfied and enjoy the process, and if there is a variety of them. The Users meta-category also had a large number of responses: 23% of units indicated that the poster felt a project was successful if it satisfies users (other than developers) and if users are involved in discussions and bug reports. Involvement of both users and developers was frequently mentioned, accounting for 31% of the units. Project recognition codes were found in 11% of the units, exceeding the number of responses indicating Use as a measure of success, which accounted for 5% of instances. Finally, the product’s Quality (13%) and Process (13%) were also suggested as measures of success by developers. Note that percentages are reported only to more completely describe the data. Given the nonrandom sample of contributors and the open data elicitation technique, the frequency of a response should not be interpreted as a measure of its importance.

    Overall, the responses of the developers posting on SlashDot were in general agreement with the list of success measures we developed from the literature and our re-examination of the process. The analysis indicates that developers found their personal involvement, satisfaction and enjoyment to be measures of the success of a project, consistent with the view of FLOSS as ‘software that scratches an itch’. However, some new themes did emerge from the coding.

    • First, a number of respondents suggested recognition (e.g. mention on other sites) as a measure of project success. This measure could be operationalized by searching the Web for the project name (for projects with unique names) or for the URL of the project’s home page. A related suggested measure was the influence of the product or the project’s process on other FLOSS groups and other commercial settings. These responses are consistent with the literature on FLOSS developers’ motivations that suggests recognition as a primary motivation for involvement.

    • A second category that emerged was the level of involvement of the users, as indicated by users submitting bug reports and participating in the project mailing lists. We had considered contributions from developers, but these responses emphasize the fact that FLOSS projects are also dependent on help from users to identify problems and post suggestions.

    • A final category that emerged from the data was the issue of porting, that is, moving a system from one operating system platform to another. Developers consider porting of a product to different systems (especially to Windows) and requests for such ports as a measure of the success of the product. This theme might be considered a special case of popularity.


    Also surprising was what respondents did not say: none of the respondents mentioned some of the measures of success we had identified. For example, though several authors have suggested that developers are motivated by the chance to learn and perhaps get a better job, none of the respondents mentioned these factors. A possible explanation is a strong community norm on SlashDot that endorses altruism over expressions of self-interest, which may have restricted discussion in that non-anonymous and community-moderated forum.

    2.4. Success Measures in Recent FLOSS Research

    In the previous sections, we developed a list of possible success measures for FLOSS projects on the basis of a review of the IS literature, with extensions based on a consideration of the FLOSS development process and feedback from FLOSS developers. In this section, we examine the current state of academic research on FLOSS. To understand how success has been operationalized in recent FLOSS research, we carried out a content analysis of the working articles posted on the http://opensource.mit.edu/ pre-press Web site. As of 1 March 2005, the site included 182 recent working articles and abstracts, thus presenting a convenient cross section of the literature. The rationale for using this sample was threefold: the abstracts for the articles were all on one page, making them much easier to collect than through manual searches across multiple journal Web sites; the articles were recent, not having been delayed by the publication cycle; and the articles presented a broad cross section of FLOSS research, whereas journal publications likely reflect some selection based on the interests of the journals.

    Four people coded the articles on the site for type of article (empirical vs conceptual) and for use of project success as a dependent variable (if any). The articles were divided into three groups for initial coding by three of the coders, and the empirical articles were then recoded by a fourth coder to ensure comparability in the coding. Of the 182 articles, 84 were deemed to be empirical studies. Of these, only 14 were identified as studying success, performance or effectiveness of project teams in some manner. Table 4 shows the specific concepts and measures we found, with citations to the associated articles.

    There are two striking features of this table. First, there is a wide range of alternative measures of project success; the field has not settled on any one measure, or on a systematic group of measures. Second, there are a great many measures related to the process of system creation itself, especially if one includes ‘learning by developers’ as an outcome measure internal to the team. This result is pleasingly consistent with the reconsideration of IS success models in the FLOSS context presented above, which pointed to the early stages of systems development as particularly important to an understanding of IS success in the FLOSS context.

    3. SUMMARY OF CONCEPTUAL SCHEME FOR FLOSS SUCCESS

    Table 5 presents a summary of the success measures and possible operationalizations that we have discussed above. Using our reconsideration of the FLOSS process as its framework, the table draws together the insights from the review of the IS literature, our extension of these models in light of the FLOSS process, input from FLOSS developers via SlashDot and our analysis of the existing literature studying FLOSS.

    4. EMPIRICAL STUDY OF SUCCESS MEASURES USING SOURCEFORGE DATA

    The previous section of this article developed a conceptual model of success factors on the basis of the literature and theoretical considerations, and compared these to developer opinions and the current state of the empirical FLOSS research. In this section, we continue our examination of success measures empirically, using data from SourceForge. This study demonstrates the operationalization of three of the measures we suggest above and allows us to assess their convergent validity. Following our report of this study, we consider the implications of our findings for our theory building and suggest future avenues of research.

    From the measures developed above, we chose the number of developers (assessed from the records of the project and from bug-fixing logs), bug-fixing time and popularity (assessed from the number of downloads and viewings of project Web pages, and from inclusion in distributions).


    Table 4. Success measures in recent FLOSS research

    Type | Measure | Operationalization | Example citations
    System creation | Activity/effort | SourceForge activity level | Crowston et al. 2004, Crowston and Scozzi 2002
    System creation | Activity/effort | Time invested per week by administrators | Stewart and Gosain in press
    System creation | Activity/effort | Time spent in discussion | Butler et al. 2002
    System creation | Activity/effort | Number of message posts on SourceForge | Krishnamurthy 2002
    System creation | Size of development team | Number registered as developers on SourceForge | Krishnamurthy 2002, Stewart and Gosain in press
    System creation | Size of development team | Number checking code into CVS | Mockus et al. 2000, Reis and Fortes 2002
    System creation | Size of development team | Number of posters on mailing lists | Mockus et al. 2000
    System creation | Size of development team | Number of posters in bug tracker | Crowston et al. 2004
    System creation | Programmer productivity | Lines of code per programmer per year | Mockus et al. 2000
    System creation | Project development status | Active or not on SourceForge | Giuri et al. 2004
    System creation | Project development status | Development stage on SourceForge | Crowston and Scozzi 2002, Krishnamurthy 2002, Stewart and Gosain in press
    System creation | Task completion | Speed in closing bugs or tracker items | Mockus et al. 2000, Stewart and Gosain in press, Crowston et al. 2004
    System creation | Development of stable processes | Description of development processes | Reis and Fortes 2002
    System quality | Modularity | Modularity of source code | Shaikh and Cornford 2003
    System quality | Correctness | Defects per lines of code/deltas | Mockus et al. 2000
    System quality | Manageability | Lines of code under package management | González-Barahona and Robles 2003
    System quality | Maintainability | Common coupling | Schach et al. 2005, Schach et al. 2002
    System use | Interest | SourceForge page views | Krishnamurthy 2002, Stewart and Gosain in press, Crowston et al. 2004
    System use | Copies in circulation | SourceForge downloads | Krishnamurthy 2002, Stewart and Gosain in press, Crowston et al. 2004
    System use | Market share | Netcraft survey | Mockus et al. 2000
    System use | Support effectiveness | Number of questions effectively answered | Lakhani and Wolf 2003
    System use | Support effectiveness | Time spent seeking help/providing help | Lakhani and Wolf 2003
    System consequences | Learning by developers | Motivations of developers for reading and answering questions | Lakhani and Wolf 2003
    System consequences | Tool development | Description of tools | Reis and Fortes 2002

    These measures were chosen because they span the reconsidered FLOSS development process discussed above, including inputs (number of developers), process (speed of bug fixing) and output (popularity), and because the data were representative of the kinds of data used in prior research.

    Our analysis aims at assessing the utility of these measures for future research. Each has good face validity, in the sense that a project that attracts developers, fixes bugs quickly and is popular does seem to deserve being described as a success.


    Table 5. Summary of concepts for information systems success in the FLOSS context

    Process phase | Measure | Potential indicators
    System creation and maintenance | Activity/effort | File releases, CVS check-ins, mailing list discussions, tracker discussions, surveys of time invested
    System creation and maintenance | Attraction and retention of developers (developer satisfaction) | Size, growth and tenure of development team through examination of registration and CVS logs; posts to developer mailing lists and trackers; skill coverage of development team; surveys of satisfaction and enjoyment
    System creation and maintenance | Advancement of project status | Release numbers or alpha, beta, mature self-assessment; requests for enhancements implemented
    System creation and maintenance | Task completion | Time to fix bugs, implement requests, meet requirements (e.g. J2EE specification); time between releases
    System creation and maintenance | Programmer productivity | Lines of code per programmer, surveys of programmer effort
    System creation and maintenance | Development of stable processes and their adoption | Documentation and discussion of processes, rendering of processes into collaborative tools, naming of processes, adoption by other projects/endeavors
    System quality | Code quality | Code analysis metrics from software engineering (modularity, correctness, coupling, complexity)
    System quality | Manageability | Time to productivity of new developers, amount of code abandonment
    System quality | Documentation quality | Use of documentation, user studies and surveys
    System use | User satisfaction | User ratings, opinions on mailing lists, user surveys
    System use | Number of users | Surveys (e.g. Debian popularity contest), downloads, inclusion in distributions, package dependencies, reuse of code
    System use | Interest | Site page views, porting of code to other platforms, development of competing products or spin-offs
    System use | Support effectiveness | Number of questions effectively answered, time required to assist newbies
    System consequences | Economic implications | Implementation studies, e.g. total cost of ownership, case studies of enablement
    System consequences | Knowledge creation | Documentation of processes, creation of tools
    System consequences | Learning by developers | Surveys and learning episode studies
    System consequences | Future income and opportunities for participants | Longitudinal surveys
    System consequences | Removal of competitors | Open sourcing (or substantial feature improvement) of competing proprietary applications

    We are also interested in assessing how these measures relate to one another: do they measure the same construct or are they measuring different aspects of a multidimensional success construct? And most importantly, what insight do they provide into the nature of the development processes in the different projects?

    4.1. Method

    In order to assess the utility of these measures, we first developed operationalizations for each measure on the basis of data available on the SourceForge Web site. We then collected data from the SourceForge system, using Web spiders to download the HTML pages and parsing out the relevant fields. With the data in hand, we extracted each individual measure; the results are presented below. We then examined the measures’ relationships using a correlation matrix (with Bonferroni corrections for multiple correlations) to see whether these measures capture the same or different things.
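    The correlation step can be sketched as follows. This is an illustrative Python fragment, not the analysis code used in the study: it computes pairwise Pearson correlations among a few per-project measures and applies a Bonferroni correction by multiplying each p-value by the number of comparisons. The column names and values are hypothetical.

        from itertools import combinations
        import pandas as pd
        from scipy.stats import pearsonr

        # Hypothetical per-project measures (one row per project)
        df = pd.DataFrame({
            "developers": [3, 12, 1, 7, 25],
            "bug_close_days": [40, 5, 90, 12, 3],
            "downloads_per_day": [2.0, 150.0, 0.5, 30.0, 900.0],
        })

        pairs = list(combinations(df.columns, 2))
        for a, b in pairs:
            r, p = pearsonr(df[a], df[b])
            p_bonf = min(p * len(pairs), 1.0)  # Bonferroni: inflate p by number of tests
            print(f"{a} vs {b}: r={r:.2f}, corrected p={p_bonf:.3f}")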

    4.1.1. Operationalization

    Inputs: Developer Counts. We operationalized the number of developers involved in a project in two ways. First, we extracted the developer count from the SourceForge project summary pages. Alternative operationalizations considered included analyzing the CVS logs to count the developers contributing, but this method assumes that all projects allow all developers direct access to the CVS (rather than using a patch submission system) and, more problematically, that only those contributing code to CVS should be counted as developers.


    Therefore, we used the projects’ own definitions of developers. The counts on the SourceForge summary page are self-reported data, but since being listed as a developer there involves both an application and approval by a project administrator, they provide a useful measure of developer involvement. In order to assess how well projects were doing in attracting and retaining developers, we analyzed the change in developer counts over time. A project that has a growing number of developers is more successful in attracting developers than one that has a shrinking number. These changes were analyzed both as categorized time series and using a weighted delta more appropriate for our intended correlation study.

Second, since the FLOSS development process relies on contributions from active users as well as core developers, we developed a measure that reflects the size of this extended team, rather than just the core developers. As a proxy for the size of the extended development community, we counted the number of individuals who posted a bug report or message to the SourceForge bug tracker. Alternative operationalizations of this construct would include counting posters on the various mailing lists, including development and user-support lists. Analyzing the bug tracker instead was practically convenient (we were already collecting those data), and participation there demonstrates closer involvement in the project than simply posting user questions to a mailing list, as well as being a venue where direct interaction between users and developers is found.

Process: Speed of Bug Fixing. We operationalized team performance in the speed of bug fixing by turning to the bug tracker provided to SourceForge projects. We examined how long it took each project to fix bugs by calculating the lifespan of each bug from report to close, using the opened and closed timestamps recorded by the bug tracker. The most straightforward analysis would be to calculate each project's average bug-fixing time, but this approach has several problems. First, the time taken to fix bugs is highly skewed (most bugs are closed quickly, but a small number take much longer), making an average unrepresentative. Second, and more problematically, because not all bugs were closed at the time of our study, we do not always know the actual lifespan, but only a lower bound; such data are described as 'censored' data.

Simply leaving out these unclosed bugs would bias the estimated time to fix bugs. Finally, analyzing only the average does not take into account the available bug-level data: if there are differences between projects in the types of bugs reported (e.g. in their severity), then these differences could affect the average lifespan for a project. In Section 4.3 we describe how we approached these difficulties using the statistical approach known as survival or event history analysis.
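A minimal sketch of the lifespan computation, assuming hypothetical 'opened' and 'closed' timestamp columns and treating still-open bugs as right-censored at the data collection date:

    import pandas as pd

    def bug_lifespans(bugs: pd.DataFrame, collected_at: pd.Timestamp) -> pd.DataFrame:
        """Compute bug lifespans in days and a right-censoring indicator.

        `bugs` is assumed to have 'opened' and 'closed' timestamp columns,
        with 'closed' missing (NaT) for bugs still open at collection time.
        """
        closed = bugs["closed"].notna()
        end = bugs["closed"].fillna(collected_at)      # censor open bugs at the collection date
        out = bugs.copy()
        out["lifespan_days"] = (end - bugs["opened"]).dt.total_seconds() / 86400.0
        out["event_observed"] = closed                 # True = actually closed, False = censored
        return out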

Outputs: Popularity. Our final measure, popularity, was assessed in three ways. First, we extracted the number of downloads and project page views reported on the SourceForge project pages.8 Because some projects were older than others, we operationalized this aspect of popularity as downloads and page views per day. In keeping with a portfolio approach to success measurement, we also measured popularity by examining whether the project produced programs that were included in the Debian Linux distribution, the largest distribution of FLOSS. Debian is distributed as a base installation and a set of additional packages for different programs. Not all the programs produced by FLOSS projects are candidates for inclusion in a Linux distribution (for example, some projects are written only for the Windows or Mac OS X platforms), so this measurement was taken only for projects producing programs eligible for inclusion in the distribution.

    4.2. Data Collection

To gather data about the projects, we developed a spider to download and parse SourceForge project pages. Spidering was necessary because the SourceForge databases were not publicly available.9

8 The SourceForge Web site at the time of data collection noted that 'Download statistics shown on this page for recent dates may be inaccurate', but our examination of the data suggests a systematic under-reporting rather than a bias in favor of one project or another. As a result, while the absolute numbers may be wrong, the data are sufficient to indicate the relative performance of projects and the relationships of these data to other variables.

9 In the period between our research and this publication, a limited amount of SourceForge data became available directly from database dumps provided to researchers via a group based at Notre Dame (http://www.nd.edu/∼oss/Data/data.html). It is our understanding that this data source does not include tracker or mailing list information, but if the conditions are acceptable to researchers and the data adequate, this appears to be an excellent source of clean data.


Data were collected at multiple points to allow for a longitudinal analysis. We collected data in February 2001 and April 2002. We also obtained data from Chawla et al. for October 2003 (Chawla et al. 2003) and from Megan Conklin for October 2004 and February 2005 (Conklin 2004). These data have been collated and made available via the FLOSSmole project (http://ossmole.sourceforge.net/, Howison et al. In press). The use of data from multiple points in time provides a dynamic view of the projects that is lacking in most analyses of FLOSS projects.

At the time we started our study, SourceForge supported more than 50,000 FLOSS projects on a wide diversity of topics (the number was 78,003 as of 21 March 2004, and 96,397 projects and more than 1 million registered users by 28 February 2005). Clearly not all of these projects would be suitable for our study: many are inactive, many are in fact individual projects rather than the distributed team efforts we are studying, as previous studies have suggested (Krishnamurthy 2002), and some do not make bug reports available. While we were able to assess these difficulties at the time of our original project selection, the FLOSSmole data span five years and cover all the projects on the site over that period, so we illustrate our next point with that more complete data.

With respect to our first measure, developers listed on SourceForge homepages, our data confirmed the impression gained from previous research that few SourceForge projects represent team efforts. Of the 98,502 projects that we examined in the SourceForge system up to February 2005, 64,881 (67%) never had more than one developer registered to the project at any time in the five years, as shown by the distribution of developer counts in Figure 3.

It was also clear that not all projects used the bug tracking system sufficiently to allow the calculation of community size or of the speed of bug fixing. In order to collect useful data for these measures, we restricted our study to projects that listed more than seven developers and had more than 100 bugs in the project bug tracker at the time of selection in April 2002. This restriction was justified theoretically as well as practically: having multiple developers suggests that the project is in fact a team effort, and having bug reports was a necessary prerequisite for the planned analysis, as well as indicative of a certain level of development effort.
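Expressed as a filter over a table of candidate projects, the selection rule is simply the following (the DataFrame and column names are illustrative assumptions, not the original selection script):

    import pandas as pd

    def select_team_projects(projects: pd.DataFrame) -> pd.DataFrame:
        """Keep projects that look like genuine team efforts with usable trackers.

        `projects` is assumed to have one row per project, with columns
        'developers_listed' and 'tracker_bugs' taken at selection time.
        """
        mask = (projects["developers_listed"] > 7) & (projects["tracker_bugs"] > 100)
        return projects[mask]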

Figure 3. Maximum number of listed developers per project on SourceForge projects over the 5-year period from 2001 to 2005


Figure 4. Example bug report and follow-up messages (see footnote 10)

Quite surprisingly, we identified only 140 projects that met both criteria. The sample includes the projects curl, fink, gaim, gimp-print, htdig, jedit, lesstif, netatalk, phpmyadmin, openrpg, squirrelmail and tcl. Those familiar with FLOSS will recognize some of these projects, which span a wide range of topics and programming languages.

To study the performance of bug fixing, we collected data from the SourceForge bug tracking system, which enables users to report bugs and to discuss them with developers.

As shown in Figure 4, a bug report includes a description of a bug that can be followed up with a trail of correspondence. Basic data for each bug include the date and time it was reported, the reporter, the priority and, for closed bugs, the date and time it was closed.

10 Adapted from http://SourceForge.net/tracker/index.php?func=detail&aid=206585&group_id=332&atid=100332


To collect these data, we developed a spider program that downloaded and parsed all bug report pages for the selected projects. The spider was run in March 2003. Unfortunately, between the selection of projects and data collection, some projects restricted access to their bug reports, so we were able to collect data for only 122 projects.

Once the data were obtained and parsed, we conducted a basic exploration for the purposes of data cleaning, which revealed problems with the quality of the data for some of the projects. For example, one team had apparently consolidated bug reports from another bug tracking system into the SourceForge tracker; these copied-over bugs all appeared in SourceForge to have been opened and closed within minutes, so this project was eliminated from further analysis. Another project was eliminated because all of its bug reports were in Russian, making the data impossible for us to interpret (and apparently for others as well: only three posters had participated, despite nine developers being registered to the project). As a result, the sample for the remainder of the analysis was reduced to 120 projects. We also deleted a few bug reports where the computed lifespan was 0 seconds or less due to errors in the data or in the parsing. We obtained data on a total of 56,641 bug reports, an average of 472 per project. The median number of reports was 274, indicating a skewed distribution of the number of bug reports per project.
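The cleaning rule and the per-project report counts can be sketched as follows, assuming hypothetical 'project' and 'lifespan_seconds' columns in the parsed bug table:

    import pandas as pd

    def clean_and_summarize(bugs: pd.DataFrame) -> pd.DataFrame:
        """Drop bug reports with non-positive lifespans and summarize per project.

        Column names are assumptions; this mirrors the cleaning step and the
        per-project report counts described in the text.
        """
        cleaned = bugs[bugs["lifespan_seconds"] > 0]
        per_project = cleaned.groupby("project").size()
        print("mean reports per project:", per_project.mean())
        print("median reports per project:", per_project.median())
        return cleaned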

    4.3. Analysis

In this section we present the analyses we performed for each of our measures, deferring discussion of the results to the next section.

Developer Counts. Of our two team-size measures, the number of developers and the community size, community size required little detailed analysis: we simply counted the number of unique posters to the bug trackers on SourceForge, dropping only the system's indicator for anonymous posting.

The analysis of developer numbers was more involved because, as mentioned above, we had data over time, which allowed us to assess how the number of developers in a project changed, and we wanted to take advantage of that. However, since we had only five data points, measured at inconsistent time intervals, it was impossible to apply standard time series analysis techniques.

We therefore simply grouped projects into six categories on the basis of how the number of developers had changed over time: whether it had risen, fallen or stayed the same, and how consistently it had done so.11 The precise definitions of the six categories are presented in Table 6. Figure 5 shows the series, allowing a visual inspection to assess the reasonableness of the categorizations.

A visual inspection of the diagrams suggests that the grouping of the patterns is reasonable. Figure 5(a) shows the trend for the large number of projects in our sample that have continually attracted developers. Figure 5(b) shows the trend for projects that tend to grow but which have, at times, fallen. Figure 5(c) shows unchanged projects, or projects that move up and down without a trend; not all the projects in this category were started at the time of our first data collection, so further data collection may reveal a trend and allow sorting into one of the other categories. Figure 5(d) shows projects that have fallen for at least half of their life spans and risen in only one period. Interestingly, the rise is almost always early in the life of the project, followed by a sustained decline; these projects appear to have attracted some initial interest but were unsuccessful in retaining developers, which may indicate troubles in the teams despite an interesting task.

Table 6. Definitions of categories of pattern of change in developer counts

1. Consistent risers: rising with no falls; a pattern of consecutive rises at least half as long as the whole series was found (a)
2. Risers: mostly rising, but with at least one fall
3. Steady or not trending: unchanged, or neither rising nor falling
4. Fallers: at most one rise but mostly falling; a pattern of consecutive falls at least half as long as the whole series was found (a)
5. Consistent fallers: always falling, no rises at all
6. Dead projects: project removed from SourceForge

(a) A rise (or fall) followed by no change was counted as two consecutive rises (or falls).

11 We have made the code for this categorization available via the FLOSSmole project.
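The published categorization code available via FLOSSmole is authoritative; purely to illustrate the rules in Table 6, a simplified sketch (which ignores the consecutive-run and no-change conventions of note (a)) might look like this:

    def categorize_developer_series(counts, removed=False):
        """Rough reading of the Table 6 rules for one project's developer-count series.

        `counts` is a list of developer counts in time order (one per data collection);
        `removed` is True if the project disappeared from SourceForge.
        Illustrative only; this is not the FLOSSmole categorization code.
        """
        if removed:
            return "6. Dead project"
        deltas = [b - a for a, b in zip(counts, counts[1:])]
        rises = sum(d > 0 for d in deltas)
        falls = sum(d < 0 for d in deltas)
        if falls == 0 and rises >= max(1, len(deltas) / 2):
            return "1. Consistent riser"
        if rises == 0 and falls > 0 and falls == len(deltas):
            return "5. Consistent faller"
        if rises <= 1 and falls >= max(1, len(deltas) / 2):
            return "4. Faller"
        if falls >= 1 and rises > falls:
            return "2. Riser"
        return "3. Steady or not trending"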


Figure 5 (panels (a)-(f)). Changes in developer counts per project over time, categorized by trend. (Log scale; note that the vertical axis is inconsistent, with lower total numbers in (e) and (f))

Figure 5(e) shows projects that have never grown and have lost members for at least two consecutive periods. Finally, Figure 5(f) shows projects for which data became unavailable; these are dissolved projects that have been removed from SourceForge. The results of comparing these categorizations with the whole SourceForge population are presented in Section 4.5, which allows us to interpret our results in the full context.

While the developer series categorization was useful, it was not suitable for the correlation study of the proposed success measures. For that purpose, we computed a weighted average of the changes from period to period, using as weights the inverse of the age of each change (1/1 for the change from 2004 to 2005, 1/2 for the change from 2003 to 2004, etc.). This procedure was adopted on the theory that the ability to attract and retain developers is an indicator of success; it focuses attention on the growth (or decline) in developers and gives more weight to recent experience. This weighted average figure is reported below.
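A sketch of this computation, under the assumption (not stated explicitly in the text) that the weighted average is normalized by the sum of the weights:

    def weighted_delta(counts):
        """Weighted average of period-to-period changes in developer counts.

        `counts` is a list of developer counts in chronological order (at least two
        observations). The most recent change gets weight 1, the one before it 1/2,
        and so on, as described in the text.
        """
        deltas = [b - a for a, b in zip(counts, counts[1:])]
        # age 1 = most recent change, age len(deltas) = oldest change
        weights = [1.0 / age for age in range(len(deltas), 0, -1)]
        return sum(w * d for w, d in zip(weights, deltas)) / sum(weights)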


    Figure 6. Plot of bug survival versus time for high (9), default (5) and low (1) priority bugs


Community Size. The posters of bug reports and related messages are identified by a SourceForge ID (though postings can be anonymous), making it possible to count the number of distinct IDs appearing for each project. We counted a total of 14,922 unique IDs, of whom 1,280 were involved in more than one project (one was involved in eight of the projects in our sample). The total counts per project were log-transformed to correct skew.
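A minimal sketch of this count, assuming a table of tracker posts with hypothetical 'project' and 'poster_id' columns and assuming anonymous posts carry the tracker's shared 'nobody' ID:

    import numpy as np
    import pandas as pd

    def community_size(posts: pd.DataFrame) -> pd.Series:
        """Log-transformed count of distinct poster IDs per project.

        Anonymous postings (assumed here to appear under the shared 'nobody' ID)
        are dropped before counting.
        """
        named = posts[posts["poster_id"] != "nobody"]
        counts = named.groupby("project")["poster_id"].nunique()
        return np.log(counts)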

Bug-fixing Time. As discussed above, the analysis of bug-fixing speed is complicated by the right-censored data available. Analysis of censored lifespan data involves a statistical approach known as survival or event history analysis. The basic idea is to calculate from the life spans a hazard function, which is the instantaneous probability of a bug being fixed at any point during its life, or, equivalently, the survival function, which is the percentage of bugs remaining open.

A plot of the survival over time for all bugs in our sample is shown in Figure 6. The plot shows that bugs with higher priorities are generally fixed more quickly, as expected, but some bugs remain unfixed even after years.

The hazard function (or, more usually, the log of the hazard function) can be used as a dependent variable in a regression. For our initial purpose of developing a project-level measure of bug-fixing effectiveness, we simply entered project as a factor in the hazard regression along with the bug priority, allowing us to compute a hazard ratio for each project (a ratio because of the use of the log of the hazard). The hazard ratios are the regression weights for the dummy variable for each project in the hazard function, using the first project as the baseline. The analysis was performed using the R-Project statistical system (http://www.r-project.org/), specifically the psm function from the Survival and Design packages. We experimented with different functional forms for fitting the bug hazard rates. Somewhat surprisingly, the form that fitted best was an exponential (the R2 for the fit was 0.51), that is, one in which the hazard rate does not vary over time. The hazard ratio for each project is therefore reported below.
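The original analysis was run in R with the psm function; the sketch below is not that fit, but substitutes a semi-parametric Cox model from the Python lifelines package to show the shape of the computation (one dummy variable per project, priority as a covariate), under assumed column names:

    import pandas as pd
    from lifelines import CoxPHFitter

    def project_hazard_ratios(bugs: pd.DataFrame) -> pd.Series:
        """Per-project effect on the bug-fixing hazard, controlling for priority.

        `bugs` is assumed to carry 'lifespan_days', 'event_observed' (False for
        still-open, i.e. right-censored, bugs), 'priority' and 'project' columns.
        A Cox model stands in here for the exponential parametric model used in
        the article; both yield one coefficient per project dummy relative to a
        baseline project.
        """
        dummies = pd.get_dummies(bugs["project"], prefix="proj", drop_first=True).astype(float)
        data = pd.concat([bugs[["lifespan_days", "event_observed", "priority"]], dummies], axis=1)
        cph = CoxPHFitter()
        cph.fit(data, duration_col="lifespan_days", event_col="event_observed")
        # keep only the per-project hazard ratios, dropping the priority covariate
        return cph.hazard_ratios_.filter(like="proj_")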


Popularity. The counts of project downloads and page views extracted from the downloaded html pages were divided by project age to compute downloads and page views per day, and then log-transformed to correct skew.
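A sketch of this normalization, assuming a project age in days is available; the +1 inside the logarithm is an assumption to guard against zero counts, not something stated in the article:

    import numpy as np
    import pandas as pd

    def popularity_measures(projects: pd.DataFrame) -> pd.DataFrame:
        """Log downloads and page views per day of project age (assumed columns)."""
        out = pd.DataFrame(index=projects.index)
        out["log_downloads_per_day"] = np.log(projects["downloads"] / projects["age_days"] + 1)
        out["log_page_views_per_day"] = np.log(projects["page_views"] / projects["age_days"] + 1)
        return out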

Our additional measure of popularity, inclusion in a distribution, required additional analysis. We first examined the list of Debian packages manually to match packages to the projects in our SourceForge sample. To do this, for each project we examined the output of the Debian apt-cache program, which searches package names and descriptions, and the http://packages.debian.org/ site, which allows searching filenames within package contents. We then examined the SourceForge homepages of the projects to be sure that we had an accurate match. For those that were included, we observed that a single project could be linked to one or more (sometimes many more) packages, but we did not observe many projects that were packaged together. As discussed above, some of the projects in our sample are not candidates for inclusion in the Debian distribution because they do not run on Linux, and these were excluded from this measure. For example, fink is a package-management system for Mac OS X and Gnucleus is a Windows client for the Gnutella network. We found that 110 of the 120 projects (92%) produced programs that could be installed and run on a Debian Linux system; 63 (57%) of these were packaged and distributed by Debian, while 47 (43%) were not. One package was undergoing the Debian quality assurance process but was not yet available through the standard installation system and was therefore coded as not included. Projects with eligible programs that were included in Debian were given a score of 1 (or YES), projects without included programs were given a score of 0 (or NO), and ineligible projects were tagged with NA.

    4.4. Results

The results of applying these success measures across our sample of 120 projects are presented below. First we present descriptive statistics for the individual measures, and then the results of examining the correlations between the measures. We conclude by reporting on the practicality of our operationalizations. Table 7 shows the descriptive statistics for the individual measures.

    Table 7. Descriptive statistics for sample success measures

Variable                         Mean     Median   SD
Lifespan (days) (a)              1673     1699     198
Developers in 2001               8.63     7        7.20
Developers in 2002               15.56    12       11.22
Developers in 2003               18.19    12       16.29
Developers in 2004               20.06    14       19.95
Developers in 2005               20.22    14       20.57
Weighted delta (see text) (a)    1.60     0.72     3.26
Posters to bug tracker (a)       140      86       207
Log bugs (a)                     5.78     5.61     0.84
Closed bugs (a)                  405      234      489
Log median bug lifetime (a)      14.44    14.39    1.29
Hazard ratio (a)                 1.13     1.10     1.04
Log downloads (all time) (b)     11.29    11.87    3.38
Log downloads (per day) (b)      4.32     4.44     2.24
Log page views (all time) (b)    13.85    14.15    2.14
Log page views (per day) (b)     6.45     6.74     2.12
Debian package?                  63 Yes   47 No    10 NA

(a) N = 120. (b) N = 118.

To examine the relationships among the variables we measured, we examined the correlations, given in Tables 8 and 9. Forty-two of the 136 correlations are statistically significant, indicating genuine relationships. (With 120 cases and applying the Bonferroni correction for the 136 comparisons possible among 17 variables, the critical value for significance at p = 0.05 is r = 0.32.) None of the proposed success measures is correlated with project lifespan, suggesting that they provide some indication of the performance of the project rather than just an accumulation of events.
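The quoted critical value can be reproduced from the figures given in the text (120 cases, 136 comparisons); for example:

    from scipy import stats

    # Critical correlation for N = 120 projects and 136 pairwise tests,
    # using a Bonferroni-adjusted two-tailed alpha of 0.05.
    n, n_tests, alpha = 120, 136, 0.05
    t_crit = stats.t.ppf(1 - (alpha / n_tests) / 2, df=n - 2)
    r_crit = t_crit / (n - 2 + t_crit ** 2) ** 0.5
    print(round(r_crit, 2))   # roughly 0.32, matching the value quoted above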

Bold and underlined correlations in Tables 8 and 9 are significant at p < 0.05, using a Bonferroni correction for the number of correlations.

The counts of the number of developers at different points in time are correlated (the upper box in Table 8), as would be expected given that they constitute a time series. The counts are also correlated with the computed weighted average of changes in developers. Interestingly, the developer counts are also correlated with the number of bugs reported (the lower box in Table 8). It may be that developers themselves post bug reports, so that more developers means more activity; alternatively, it may be that activity attracts developers. As well, the count of participants in the bug tracker is correlated with the number of bugs, but not with the number of listed developers (the upper left box in Table 9). These relationships suggest that individuals post only a few bug reports, so more bug reports implies a greater number of participants.


Table 8. Correlations among sample success measures
(Dev = Developers in; Wt. delta = weighted delta)

                            Lifespan   Dev 2001   Dev 2002   Dev 2003   Dev 2004   Dev 2005   Wt. delta
Lifespan                     1.000
Developers in 2001           0.265      1.000
Developers in 2002           0.061      0.627      1.000
Developers in 2003          -0.016      0.648      0.761      1.000
Developers in 2004          -0.047      0.656      0.643      0.949      1.000
Developers in 2005          -0.042      0.658      0.643      0.943      0.998      1.000
Weighted delta              -0.023      0.374      0.352      0.789      0.913      0.922      1.000
Posters to bug tracker       0.052      0.144      0.266      0.200      0.226      0.230      0.186
Log bugs                     0.008      0.158      0.396      0.392      0.398      0.394      0.316
Closed bugs                  0.034      0.191      0.381      0.322      0.344      0.348      0.272
Log median bug lifetime      0.087      0.166      0.103      0.208      0.190      0.184      0.172
Hazard ratio                 0.200      0.219      0.084      0.177      0.158      0.147      0.134
Log downloads (all time)     0.136      0.056      0.198      0.212      0.241      0.238      0.240
Log downloads (per day)      0.010      0.059      0.228      0.249      0.295      0.287      0.283
Log page views (all time)    0.001      0.043      0.228      0.245      0.277      0.279      0.269
Log page views (per day)    -0.060      0.035      0.224      0.245      0.280      0.282      0.271
Debian package?              0.285      0.153      0.116      0.101      0.064      0.054      0.054

Table 9. Correlations among sample success measures, continued
(DL = downloads; PV = page views; med. = median)

                            Posters   Log bugs  Closed    Log med.   Hazard   Log DL    Log DL     Log PV    Log PV
                                                bugs      lifetime   ratio    (all)     (per day)  (all)     (per day)
Posters to bug tracker       1.000
Log bugs                     0.592     1.000
Closed bugs                  0.801     0.718     1.000
Log median bug lifetime      0.179     0.087     0.232     1.000
Hazard ratio                 0.158     0.114     0.239     0.868      1.000
Log downloads (all time)     0.359     0.268     0.252     0.095      0.068    1.000
Log downloads (per day)      0.450     0.343     0.312     0.088      0.017    0.909     1.000
Log page views (all time)    0.435     0.462     0.355     0.031      0.146    0.660     0.724      1.000
Log page views (per day)     0.433     0.463     0.354     0.038      0.160    0.647     0.722      0.998     1.000
Debian package?              0.125     0.073     0.088     0.063      0.115    0.201     0.240      0.171     0.150

The correlation between posters and the number of closed bugs is particularly strong (r = 0.718).

The counts of downloads and page views (all-time and daily) are all strongly correlated (the lower right box in Table 9), suggesting that they offer similar measures of popularity. They are also correlated with the number of bugs (the lower left box in Table 9) and with the count of participants in the bug tracker. Taken together, these correlations suggest that the count of participants and the number of bugs function more like indications of the popularity of a FLOSS project than of the success of its development processes.

On the other hand, the hazard ratio for bug lifetimes and the median bug lifetime are not significantly correlated with any of the other variables, suggesting that they do provide an independent view of a project's performance.

    4.5. Discussion

The study provided useful data for reflecting on the conceptual model of success in FLOSS development developed in the first half of this article. Because many of these findings reveal themselves as limitations of our study, we first discuss these limitations and attempt to distill general advice for FLOSS research using success as a variable.


The measures applied in this article have good face validity as indicators. The analysis presented above allows us to extend our examination of the validity and utility of the measures. The high correlation among many of these measures indicates a degree of convergent validity, since the different measures do correlate, particularly the number of developers and popularity. Examining these correlations, and correlations with other variables, in more detail suggests room for improvement in the measures. Clearly, additional data could be collected to increase the accuracy of the developer counts. Furthermore, our counts are merely aggregate measures that may mask many developers leaving and joining a project. More specific counts would enable us to determine whether some projects are subject to significant 'churn' of individual developers, or conversely to measure the 'tenure' of individuals as developers on projects. Such a measure might be theoretically more valuable, as it would have implications for the development and retention of knowledge. As with our analysis of bug lifetime, such analyses would need to employ event history statistics to account for right-censored data.

Further possibilities for measuring developer participation exist, which may provide more accurate and valuable measures.

Such opportunities include measuring participation in mailing lists and developers' involvement in code writing directly by examining logs from the CVS system. Each of these measures, however, suffers from the difficulty of inferring departure and 'tenure', because it is possible to 'lurk' in each of these fora. For example, if a developer is observed to post once in 2003 and again in 2005, was the developer always a part of the project, or did the developer leave and rejoin? It might be worth developing a measure of consistent engagement, but it would need to account for different patterns for different individuals.

Another instructive limitation is that, in building a sample of projects to study, we appear to have selected projects that are, by and large, successful. This can be seen in a comparison of the categorization of developer count series from our sample and from the whole SourceForge population. We divided the population of SourceForge projects into the same six categories; Figure 7 compares the distribution of time series across our categories between our sample and the total SourceForge population. We required three periods of data collection to calculate the category, so the full population includes only 39,414 projects, excluding 59,154 projects that were either too new or already dead before our fourth measurement point.

Figure 7. Distribution of projects across the developer-count trend categories: study sample compared with the full SourceForge population


The comparison shows that our sample of 120 projects has a higher proportion of consistent risers and a lower proportion of fallers and dead projects. The difference in the distributions is highly significant (χ2 = 805, df = 5, p < 0.001). This result suggests that our sample, projects with more than seven developers and more than 100 bugs in April 2002, is comparatively successful, at least in terms of attracting and retaining developers over time.

As a result of this unintentional bias in our sample construction, the sample may not have sufficient variance on success, affecting the observed correlations reported above. To address this concern, FLOSS researchers should be aware that selecting on the basis of team size and process features, such as use of the SourceForge trackers, risks selecting only successful projects; they should therefore make a special effort to collect data on a broader range of projects, including some that are clearly unsuccessful. This advice applies to quantitative research but is also relevant to case study research: there is a real need for detailed research on failed FLOSS projects.

A second instructive limitation is that we followed the standard but inadequate practice of using popularity measures unadjusted for potential 'market size'. A project's downloads are capped by its total potential downloads, its 'potential market'. A consumer-oriented application, such as an instant messaging client, is of potential usefulness to almost all internet users, whereas a program to model subatomic particle collisions (for example) has a much smaller potential market. While the instant messaging program might achieve only 40% of its potential market, its absolute number of downloads will be far higher than the number achieved by the particle collision modeling tool, even if the latter is used and adored by 95% of high energy physicists. Thus download figures without market share data are useful only if one considers all programs to be in competition with each other, that is, if one ignores the projects' aims entirely. Some value can be salvaged by using relative growth figures instead of absolute numbers. Within limited domains, it might be possible to create categories of truly competing products for which the absolute and relative download numbers ought to be a usable proxy for software use. Unfortunately, the self-classification of products on sites like SourceForge (where 'End-user software' is an entire category) is of little use, and the identification of competitors must be carried out by hand.

Furthermore, our analysis revealed that measures such as community size (in numbers of posters) are more similar to these popularity measures than to the process measures. On reflection, community size should also be expected to be heavily influenced by the potential size of the user population, much more so than the smaller developer numbers. A brief inspection of our data on bug posters affirms this: the projects with the largest number of posters are consumer desktop applications, such as gaim. Community size measures, therefore, should be adjusted in the same way as downloads and page views: either by using within-project changes or by manually creating categories of competing projects.

This discussion draws attention to an element missing from our theory development above and from FLOSS research in general. The phenomenon under research, FLOSS and its development, is generally, often implicitly, defined as any project that uses an 'open source license' (usually defined as an OSI-approved license). Our selection of projects from SourceForge also implicitly uses this definition of the phenomenon, as we studied SourceForge projects, and SourceForge allows only projects using these licenses to register with the site. The risk here is that a wide range of system types, developer goals, software development processes, management styles and group structures are all compatible with using an OSI-approved license. It is not yet clear upon what characteristics the phenomenon should be divided, but it is clear that the meaning of success for different types of projects, programs, motivations and processes will be quite different.

This observation is similar to that made by Seddon et al. (1999), who introduce the 'IS Effectiveness Matrix'. They suggest that two key dimensions on which to base success measures are 'the type of system studied' and 'the stakeholder in whose interests the system is being evaluated'. FLOSS development typically occurs outside the corporate environments in which stakeholders are explicit and interests are often driven by the financial implications of IS development. However, this does not mean that there are no stakeholders: core developers, peripheral developers, corporate users, individual users, and other projects may all depend on the project whose success one is trying to measure.


The conceptual model of success presented in this article intentionally places much greater emphasis on the inputs and the process of IS development, and thus on the developers, than traditional IS models of success. Studies of success in the FLOSS context should closely consider both the phenomena they are interested in (and thus the relevant projects) and the perspective from which they need to measure success, given their research interests. The development of a taxonomy of research interests, and the identification of the portions of the FLOSS universe appropriate for each, would be a useful task for future research.

    5. CONCLUSIONS

This article contributes to the developing body of empirical research on FLOSS by identifying and operationalizing success measures that might be applied to FLOSS projects. We developed and presented a theoretically informed range of measures that we think are appropriate for measuring the success of FLOSS projects, and we hope that these will be useful to other researchers in this area. We complemented this theory development with an empirical study that demonstrated methods and challenges in operationalizing success measures using data obtained from SourceForge and made available to the community. This study demonstrated the portfolio approach to success measurement by taking measures from throughout the FLOSS development process and by using longitudinal data. The study allowed us to identify and communicate the limitations of our theory development and to elaborate are