ECOS: Ecological Studies of Open Source Software Ecosystems (@ CSMR-WCRE 2014 Projects Track)
ECOS: Ecological Studies of Open Source Software Ecosystems
• Tom Mens, Maelick Claes • Software Engineering Lab
• Philippe Grosjean • Numerical Ecology Lab
informatique.umons.ac.be/genlog/projects/ecos
5 February 2014, CSMR-WCRE Software Evolution Week, Antwerp, Belgium
About ECOS
• “Action de Recherche Concertée” of the University of Mons
– COMPLEXYS Research Institute
– Oct 2012 to Sep 2017
– 500K EUR funding
• Interdisciplinary project
– Combines research in biology (ecology) and computing science (empirical software engineering)
• Related EU project:
High-level project goal
• Improve understanding of, and support for, open source software ecosystems
– Draw inspiration from biological evolution, ecology and natural ecosystems
• Determine main factors of success and failure of OSS projects within their ecosystem
– Provide better techniques and mechanisms to predict and improve survivability of OSS projects and resilience of their ecosystems
– Provide guidelines and evolution dashboards to support software communities
Software ecosystem: Definition
Business-oriented view
• “a set of actors functioning as a unit and interacting with a shared market for software and services, together with the relationships among them.” (Jansen et al. 2009)
Examples
• Eclipse
• Android and iOS app store
Software ecosystem: Definition
Development-centric view
• “a collection of software products that have some given degree of symbiotic relationships.”
  Messerschmitt & Szyperski: Software ecosystem: Understanding an indispensable technology and industry. MIT Press, 2003.
• “a collection of software projects that are developed and evolve together in the same environment.”
  M. Lungu: Towards reverse engineering software ecosystems. Int’l Conf. Software Maintenance, 2008, pp. 428–431.
Examples
• Gnome, KDE
• Debian, Ubuntu
• R’s CRAN
• Apache
Main Research Questions
• Which control mechanisms driving natural ecosystems can be used to explain dynamics of software ecosystems?
• Which mechanisms and measures can we borrow from ecology to explain and predict how software projects evolve?
Terminology: Biological ecosystem
Definitions
• Ecology: the scientific study of the interactions that determine the distribution and abundance of organisms
• Ecosystem: the physical and biological components of an environment considered in relation to each other as a unit
– combines all living organisms (plants, animals, micro-organisms) and physical components (light, water, soil, rocks, minerals)
Example: coral reefs
• High biodiversity: polyps, sea anemones, fish, mollusks, sponges, algae
Comparison
Ecological theories of evolution of species
• Jean-Baptiste Lamarck (1744–1829)
  • animal organs and behaviour can change according to the way they are used
  • those characteristics can transmit from one generation to the next to reach a greater level of perfection
  • Example: giraffes’ necks have become longer while trying to reach the upper leaves of a tree
• Charles Darwin (1809–1882)
  • all species of life have descended over time from common ancestors
  • this branching pattern resulted from natural selection
  • evolution history is represented by a phylogenetic tree
  • Example: 13 types of Galapagos finches, same habits and characteristics, but different beaks
Ecological theories of evolution of species
Hologenome theory
• The unit of natural selection is the holobiont: the organism together with its associated microbial communities, which live together in symbiosis.
• The holobiont can adapt to changing environmental conditions far more rapidly than by genetic mutation and selection alone.
• Darwinism emphasises competition (survival of the fittest); hologenome theory also includes cooperation (through symbiosis).
In software evolution: Hologenome theory may be closer to what one observes in open source projects, where cooperation plays a more important role.
Ecological theories of evolution of species
Reticulate evolution
• Evolution history is represented as a graph structure. Two or more evolutionary lineages can be recombined at some level
  • hybrid speciation (2 lineages recombine to create a new one)
  • horizontal gene transfer (genes are transferred across species)
In software evolution: Distributed VCS like Git promote reticulate evolution through fork and merge (but few projects actually merge)
See Robles et al. A Comprehensive Study of Software Forks: Dates, Reasons and Outcomes. OSS Conference 2012, Best Paper Award.
Evolution History: Software
Trophic web (food chain) in natural ecosystems
Trophic web in software ecosystems
• Producer-consumer relation
• Onion model: Users, Peripheral developers, Core developers
• TOP-DOWN: change requests & bug reports
• BOTTOM-UP: changes in core projects and architecture
Core Architecture - or Why developers are polyps
Coral reef ecosystem
• Scleractinian coral polyps are responsible for creating the coral reef structure
• This coral reef is required for the other species of the ecosystem to thrive.
Software ecosystem
• Core developers are responsible for creating the core software architecture
• Based on this core architecture, other developers and third parties can create other projects, services, and so on.
Software Ecosystem Dynamics
Predator-prey relationship (Lotka-Volterra 1925/1926)
• Predators (hunting animals) feed upon their prey (attacked animals)
• Can be described by a dynamic model with mutually dependent parametric differential equations
Analogies in software maintenance
• Debuggers are predators, software defects are prey
  Calzolari et al. “Maintenance and testing effort modeled by linear and nonlinear dynamic systems,” Information and Software Technology, 2001
• Developers are predators, the information they seek is prey
  Lawrance et al. Scents in programs: Does information foraging theory apply to program maintenance? VL/HCC 2007
• Dual (socio-technical) view:
  • Developers are predators, the projects they work on are prey
  • Projects are predators that feed upon the cognitive resources of their developers
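To make the "mutually dependent parametric differential equations" concrete, here is a minimal sketch of the classic Lotka-Volterra system, integrated with a fourth-order Runge-Kutta step. The parameter names (alpha, beta, delta, gamma), their values, and the initial populations are illustrative assumptions, not figures from the ECOS project; in the software analogy "prey" could stand for open defects and "predators" for debugging effort.

```python
import math
from typing import List, Tuple

State = Tuple[float, float]  # (prey, predators)


def derivatives(state: State, alpha: float, beta: float,
                delta: float, gamma: float) -> State:
    """Right-hand side of the classic Lotka-Volterra equations.

    dx/dt = alpha*x - beta*x*y    (prey grow, and are eaten)
    dy/dt = delta*x*y - gamma*y   (predators grow by eating, and die off)
    """
    x, y = state
    return (alpha * x - beta * x * y, delta * x * y - gamma * y)


def rk4_step(state: State, dt: float, **params: float) -> State:
    """Advance the system by one time step dt using classical RK4."""
    def shifted(k: State, h: float) -> State:
        return (state[0] + h * k[0], state[1] + h * k[1])

    k1 = derivatives(state, **params)
    k2 = derivatives(shifted(k1, dt / 2), **params)
    k3 = derivatives(shifted(k2, dt / 2), **params)
    k4 = derivatives(shifted(k3, dt), **params)
    return (
        state[0] + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
        state[1] + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]),
    )


def simulate(initial: State, steps: int, dt: float,
             **params: float) -> List[State]:
    """Return the trajectory [initial, state after 1 step, ...]."""
    trajectory = [initial]
    for _ in range(steps):
        trajectory.append(rk4_step(trajectory[-1], dt, **params))
    return trajectory


def conserved_quantity(state: State, alpha: float, beta: float,
                       delta: float, gamma: float) -> float:
    """Quantity that stays constant along exact solutions (sanity check)."""
    x, y = state
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to produce visible oscillations.
    params = dict(alpha=1.0, beta=0.1, delta=0.075, gamma=1.5)
    for x, y in simulate((10.0, 5.0), 2000, 0.01, **params)[::200]:
        print(f"prey={x:7.2f}  predators={y:7.2f}")
```

The two populations oscillate around the equilibrium (gamma/delta, alpha/beta) instead of settling; fitting such a model to real project data would require estimating the four parameters, which this sketch does not attempt.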
Desirable ecosystem characteristics
Biodiversity measures the degree of variation of species within a given ecosystem
• Maximum diversity if all species have the same number of individuals
• Low diversity if a particular species dominates the others
• Many different metrics: Shannon entropy, Simpson index, evenness, …
• Posnett et al. used a similar notion to measure developer activity focus and module activity focus
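The diversity metrics named above can be sketched in a few lines. This is an illustrative implementation of the textbook definitions (Shannon entropy with natural logarithm, the Gini-Simpson form of the Simpson index, and Pielou's evenness); the choice of these particular variants and the sample counts are assumptions, not taken from the slides. In a software setting, the "counts" could be commits per contributor or per module.

```python
import math
from typing import List, Sequence


def proportions(counts: Sequence[float]) -> List[float]:
    """Relative abundance of each species that is actually present."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    return [c / total for c in counts if c > 0]


def shannon_entropy(counts: Sequence[float]) -> float:
    """H = -sum(p_i * ln p_i); maximal when all species are equally common."""
    return -sum(p * math.log(p) for p in proportions(counts))


def simpson_index(counts: Sequence[float]) -> float:
    """Gini-Simpson index 1 - sum(p_i^2): probability that two random
    individuals belong to different species."""
    return 1.0 - sum(p * p for p in proportions(counts))


def pielou_evenness(counts: Sequence[float]) -> float:
    """H / ln(S), where S is the number of species present.

    Returns 0.0 for a single species; that convention is an assumption
    made here, since the ratio is undefined in that case.
    """
    species = len(proportions(counts))
    if species < 2:
        return 0.0
    return shannon_entropy(counts) / math.log(species)


if __name__ == "__main__":
    even = [10, 10, 10, 10]   # hypothetical: four equally active groups
    skewed = [97, 1, 1, 1]    # hypothetical: one dominant group
    for name, sample in (("even", even), ("skewed", skewed)):
        print(name, shannon_entropy(sample), simpson_index(sample),
              pielou_evenness(sample))
```

The evenly distributed sample reaches the maximum of each measure for four species, while the sample dominated by one species scores much lower, matching the two bullet points on maximum and low diversity.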
Dual Ecological Measures of Focus in Software Development
Daryl Posnett†, Raissa D’Souza∗, Premkumar Devanbu† and Vladimir Filkov†
†∗University of California Davis, USA
†{dpposnett,ptdevanbu,vfilkov}@ucdavis.edu,∗[email protected]
Abstract: Work practices vary among software developers. Some are highly focused on a few artifacts; others make wide-ranging contributions. Similarly, some artifacts are mostly authored, or “owned”, by one or few developers; others have very wide ownership. Focus and ownership are related but different phenomena, both with strong effect on software quality. Prior studies have mostly targeted ownership; the measures of ownership used have generally been based on either simple counts, information-theoretic views of ownership, or social-network views of contribution patterns. We argue for a more general conceptual view that unifies developer focus and artifact ownership. We analogize the developer-artifact contribution network to a predator-prey food web, and draw upon ideas from ecology to produce a novel, and conceptually unified view of measuring focus and ownership. These measures relate to both cross-entropy and Kullback-Leibler divergence, and simultaneously provide two normalized measures of focus from both the developer and artifact perspectives. We argue that these measures are theoretically well-founded, and yield novel predictive, conceptual, and actionable value in software projects. We find that more focused developers introduce fewer defects than defocused developers. In contrast, files that receive narrowly focused activity are more likely to contain defects than other files.
I. INTRODUCTION
Developers are the lifeblood of open source software, OSS, and their contributions are vital for OSS to thrive. Rather than being assigned tasks by management, OSS developers are generally free to choose the style, focus, and breadth of their contributions. Some might be quite focused, working on one specific subsystem; others may contribute to many different subsystems. A device driver expert, for example, may contribute very specialized knowledge to an open source project, focusing on only a few files or packages. His contributions to a small subset of modules¹ may be his only contribution during his tenure with the project. In contrast, a project leader may work on a variety of different tasks touching many modules within a project. While OSS developers are free to choose their contribution styles, such choices are not inconsequential, especially to the central issue of software quality.
A dominant theme emerging from previous work in this area is module ownership [1], [2], [3]. Low ownership of a module, i.e., too many contributors, can adversely impact code quality. There is, however, an entirely different perspective, developer’s attention focus, which is relatively unexplored. Human attention and cognition are finite resources [4]. When different tasks are simultaneously engaged, they can compete for mental resources and task performance can suffer [5]. A developer engaged in many different tasks carries a greater cognitive burden than a more focused developer. Interestingly, the developer and module perspectives are conceptually symmetric, dualistic views of focus. From a module’s perspective, strong ownership indicates a strong focused contribution. We refer to this as module activity focus, or MAF, a measure of how focused the activities are on a module. Symmetrically, we refer to the developer’s attention focus, or DAF, a measure of how focused the activities are of a particular developer.
¹ We use modules to mean either packages or files, depending on the context.
A surprising, but natural analogy for MAF and DAF are predator-prey food webs from ecology. In a sense, modules are predators that “feed upon” the cognitive resources of developers. As the number of developers contributing to a module increases, the diversity of cognitive resources upon which the module “feeds” also increases; likewise, a developer is a “prey” whose limited cognitive resources are spread over the modules that “prey” upon her.
Ecosystem diversity is of great interest to ecologists. Williams and Martinez call the roles complexity and diversity play “[o]ne of the most important and least settled questions in ecology.” [6] This diversity has two symmetric perspectives, both from a prey’s perspective, and a predator’s perspective. Ecologists have developed sophisticated symmetric measures of predator-prey relationships, drawing upon ideas such as entropy and Kullback-Leibler divergence, that simultaneously capture both perspectives. We adapt these measures for software engineering projects into the metrics MAF and DAF.
In this work, we employ the methodology presented by El Emam to validate our measures [7]. In particular, we show that the DAF and MAF measures succeed in distinguishing important cases that extant measures don’t capture. We make the following contributions:
• We adapt terminology and motivation from ecology, based on bipartite graphs;
• We incorporate and generalize previous results on developer and artifact diversity;
• We provide easy to compute measures of focus, MAF and DAF, normalized to facilitate comparison within and across projects;
• We show these measures more precisely capture outcomes relevant to software researchers and practitioners.
This novel analysis simultaneously considers focus both from the artifact perspective and the author perspective. Researchers can use our MAF and DAF metrics to more
ICSE 2013, San Francisco, CA, USA. © 2013 IEEE.
Desirable ecosystem characteristics
• Stability
  • the capacity to maintain an equilibrium over longer periods of time
• Resistance
  • the ability to withstand environmental changes without too much disturbance of its biological communities
• Resilience
  • the ability to return to an equilibrium after a disturbance
Goal: Use these and related measures to study maintainability and survivability of software projects within their ecosystem
Ongoing Research: 2 case studies
• CRAN (Comprehensive R Archive Network)
– Characteristics
  15 years; > 5000 packages; > 2500 contributors; different OS flavours (Linux, Windows, MacOS, Solaris); superlinear package growth
– Goal
  • Study package dependencies and maintainability (number of errors and time to fix) and their effect on package survivability
  • See our CSMR-WCRE 2014 ERA paper “On the maintainability of CRAN packages”
Ongoing Research: 2 case studies
• GNOME
– Characteristics
  16 years; > 1400 projects; > 5800 contributors; > 1.3M commits; > 12M file touches
– Goals
  1. Combine different ecosystem measures into a predictive model of project survivability
  2. Study migration patterns of contributors and their effect on project survivability
Ongoing Research: GNOME case study 1
Combine different ecosystem measures into a predictive model of project survivability
– Replicate and generalise the empirical study by Uzma Raja
Defining and Evaluating a Measure of Open Source Project Survivability
Uzma Raja, Member, IEEE Computer Society, and Marietta J. Tretter
Abstract: In this paper, we define and validate a new multidimensional measure of Open Source Software (OSS) project survivability, called Project Viability. Project viability has three dimensions: vigor, resilience, and organization. We define each of these dimensions and formulate an index called the Viability Index (VI) to combine all three dimensions. Archival data of projects hosted at SourceForge.net are used for the empirical validation of the measure. An Analysis Sample (n = 136) is used to assign weights to each dimension of project viability and to determine a suitable cut-off point for VI. Cross-validation of the measure is performed on a hold-out Validation Sample (n = 96). We demonstrate that project viability is a robust and valid measure of OSS project survivability that can be used to predict the failure or survival of an OSS project accurately. It is a tangible measure that can be used by organizations to compare various OSS projects and to make informed decisions regarding investment in the OSS domain.
Index Terms: Evaluation framework, external validity, open source software, project evaluation, software measurement, software survivability.
1 INTRODUCTION
Open Source Software (OSS) projects are developed and distributed for free, with full access to the project source code. Recently there has been a significant increase in the use of these projects. Some OSS projects have earned themselves a high reputation and corporate sponsorships. Large corporations (e.g., IBM, SUN microsystems) are becoming involved with the OSS movement in various capacities. Projections indicate that the corporate interest in OSS projects will grow stronger in the future [1] and these projects will see integration in enterprise architecture [2]. This increased use of OSS projects creates the need for better project evaluation measures.
Traditionally, software projects are evaluated by conformance to budget, schedule, and user requirements [3], [4], [5], [6], [7], [8]. These measures, however, are difficult to map to OSS projects, which are developed through a network of volunteer participants, with no defined budget, schedule, or customer. Although there is a surge in the investment in OSS projects [1], research indicates that a large number of OSS projects fail [9], [10]. Some have questioned the operational reliability and quality of OSS projects [11]. Since there are no contractual or legal bindings for providing OSS updates or maintenance services, businesses investing human or financial capital on adoption of OSS projects need the ability to evaluate whether the project will continue to exist or not [12]. Development teams need to measure project survivability to control and improve performance. Individual and corporate users need a measure of project survivability to compare the available OSS projects before making decisions regarding project adoption.
In this paper, we define and validate a new multidimensional measure of OSS project survivability, called Project Viability. OSS projects provide access to their development archives, thereby providing a unique opportunity to conduct empirical research [13] and develop reliable measures [14], [15]. In the following sections, we define, formulate, and validate project viability. Section 2 provides a brief overview of the existing empirical research in OSS and the background of project survivability. Section 3 defines the dimensions of project viability and formulates an index to measure it. Section 4 discusses the empirical evaluation framework and validates the new measure using OSS project data. Discussion of the results is presented in Section 5 and conclusions are given in Section 6.
2 BACKGROUND
A large number of OSS projects are available for use. However, the failure rate of these projects is high [9]. The evaluation of OSS projects is different than Commercial Software Systems (CSS) [16]. The adopters of OSS projects need a mechanism to compare the chances of failure or survival of the available projects. This would allow better decisions regarding corporate resource investment.
A range of measures has been used in prior research to evaluate OSS projects. Godfrey and Tu [17] examined the evolution of the Linux kernel and its growth pattern in one of the first empirical studies in the OSS domain. They used the Source Lines of Code (SLOC) to compare the growth pattern of Linux to CSS projects and found evidence that OSS growth rates are significantly high compared to CSS projects. Paulson et al. [18] compared OSS and CSS projects using a diverse sample of OSS projects and found no
IEEE Transactions on Software Engineering, vol. 38, no. 1, January/February 2012, p. 163
• U. Raja is with the Department of Information Systems, Statistics and Management Science, The University of Alabama, Box #870226, 300 Campus Drive, Tuscaloosa, AL 35487. E-mail: [email protected].
• M.J. Tretter is with the Department of Information and Operations Management, Texas A&M University, Mail Stop #310D, Wehner Building, College Station, TX 77840. E-mail: [email protected].
Manuscript received 30 Oct. 2009; revised 14 June 2010; accepted 21 Aug. 2010; published online 1 Apr. 2011. Recommended for acceptance by R. Jeffery. Digital Object Identifier no. 10.1109/TSE.2011.39.
Ongoing Research: GNOME case study 2
Study migration patterns of contributors and their effect on project survivability
Ongoing Research: GNOME case study 2
[Figure: Joiners (y-axis) over Time, 1997 to 2013 (x-axis); panels: Evolution, Gimp, GTK+]
Tom Mens, Maelick Claes, Philippe Grosjean and Alexander Serebrenik (p. 28):
project that were not active in this project during the preceding 6-month period, but that were involved in some activity in other GNOME projects instead. Global joiners are incoming coders in the considered project that were not active in any of the GNOME projects during the preceding period. A similar definition holds for the local and global leavers. Formally, the metrics are defined as follows. Let p be a GNOME project, t a 6-month activity period (and t − 1 the previous period), c a coder, Gnome the set of GNOME’s code projects, and isDev(c, t, p) is a predicate which is true if and only if c made a code commit in p during t:
\[
\begin{aligned}
\mathit{localLeavers}(p,t) &= \{\, c \mid \mathit{isDev}(c,t-1,p) \land \lnot\mathit{isDev}(c,t,p) \land \exists p_2\, (p_2 \in \mathit{Gnome} \land \mathit{isDev}(c,t,p_2)) \,\} \\
\mathit{globalLeavers}(p,t) &= \{\, c \mid \mathit{isDev}(c,t-1,p) \land \forall p_2\, (p_2 \in \mathit{Gnome} \Rightarrow \lnot\mathit{isDev}(c,t,p_2)) \,\} \\
\mathit{localJoiners}(p,t) &= \{\, c \mid \mathit{isDev}(c,t,p) \land \lnot\mathit{isDev}(c,t-1,p) \land \exists p_2\, (p_2 \in \mathit{Gnome} \land \mathit{isDev}(c,t-1,p_2)) \,\} \\
\mathit{globalJoiners}(p,t) &= \{\, c \mid \mathit{isDev}(c,t,p) \land \forall p_2\, (p_2 \in \mathit{Gnome} \Rightarrow \lnot\mathit{isDev}(c,t-1,p_2)) \,\}
\end{aligned}
\]
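The four set definitions translate almost directly into code. The sketch below assumes the commit history has already been reduced to a set of (coder, period, project) triples, with periods numbered as consecutive integers; that data layout, the function names, and the toy data are illustrative assumptions, not the tooling used in the study.

```python
from typing import Set, Tuple

# One entry per (coder, period index, project) with at least one code commit.
Activity = Set[Tuple[str, int, str]]


def is_dev(activity: Activity, c: str, t: int, p: str) -> bool:
    """isDev(c, t, p): coder c made a code commit in project p during t."""
    return (c, t, p) in activity


def coders(activity: Activity) -> Set[str]:
    return {c for (c, _, _) in activity}


def local_leavers(activity: Activity, gnome: Set[str], p: str, t: int) -> Set[str]:
    """Left p after t-1 but are active in some GNOME project during t."""
    return {c for c in coders(activity)
            if is_dev(activity, c, t - 1, p)
            and not is_dev(activity, c, t, p)
            and any(is_dev(activity, c, t, p2) for p2 in gnome)}


def global_leavers(activity: Activity, gnome: Set[str], p: str, t: int) -> Set[str]:
    """Active in p during t-1 and in no GNOME project during t."""
    return {c for c in coders(activity)
            if is_dev(activity, c, t - 1, p)
            and all(not is_dev(activity, c, t, p2) for p2 in gnome)}


def local_joiners(activity: Activity, gnome: Set[str], p: str, t: int) -> Set[str]:
    """New in p during t, coming from some GNOME project active in t-1."""
    return {c for c in coders(activity)
            if is_dev(activity, c, t, p)
            and not is_dev(activity, c, t - 1, p)
            and any(is_dev(activity, c, t - 1, p2) for p2 in gnome)}


def global_joiners(activity: Activity, gnome: Set[str], p: str, t: int) -> Set[str]:
    """Active in p during t and in no GNOME project during t-1."""
    return {c for c in coders(activity)
            if is_dev(activity, c, t, p)
            and all(not is_dev(activity, c, t - 1, p2) for p2 in gnome)}


# Hypothetical toy history over two periods (0 and 1).
GNOME = {"gtk+", "gimp"}
ACTIVITY: Activity = {
    ("ann", 0, "gtk+"), ("ann", 1, "gimp"),   # moves from gtk+ to gimp
    ("bob", 0, "gtk+"),                       # disappears after period 0
    ("cat", 1, "gtk+"),                       # arrives from outside
    ("dan", 0, "gimp"), ("dan", 1, "gtk+"),   # moves from gimp to gtk+
}

if __name__ == "__main__":
    print(local_leavers(ACTIVITY, GNOME, "gtk+", 1))
    print(global_leavers(ACTIVITY, GNOME, "gtk+", 1))
    print(local_joiners(ACTIVITY, GNOME, "gtk+", 1))
    print(global_joiners(ACTIVITY, GNOME, "gtk+", 1))
```

Counting the size of each set per period for a given project would yield timelines of the kind plotted in the figure.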
[Figure: three panels (evolution, gtk+, gimp); x-axis: Time, 1997 to 2013; y-axis: Joiners]
Fig. 1.11 Historical evolution (timeline) of the number of local (black solid) and global (red dashed) joiners (y-axis) for three GNOME projects.
We did not find any general trend, the patterns of intake and loss of coders are highly project-specific. Figure 1.11 illustrates the evolution of the number of local and global joiners for some of the more important GNOME projects (the figures for leavers are very similar). For some projects (e.g., evolution) we do not observe a big difference between the number of local and global joiners, respectively. These projects seem to attract new developers both from within and outside of GNOME. Other projects, like gimp, attract most of its incoming developers from outside GNOME. A third category of projects attracts most of its incoming developers from other GNOME projects. This is the case for gtk+, glib and libgnome, which can be considered as belonging to the core of GNOME. This observation seems to suggest that libraries, toolkits and auxiliary projects attract more inside developers, while projects that are well-known to the outside world (such as GIMP, a popular
Timeline (6-month intervals) of joiners to Gnome projects
- Black = local joiners from other Gnome projects
- Red = global joiners from outside of Gnome
- Blue = stayers
Migration in software ecosystems: Gnome case study
[Figure: Leavers (y-axis) over Time, 1997 to 2013 (x-axis); panels: Evolution, Gimp, GTK+]
- Black = local joiners from other Gnome projects
- Red = global joiners from outside of Gnome
- Blue = stayers
Timeline (6-month intervals) of leavers from Gnome projects
Some references
To appear in 2013 in Springer’s Empirical Software Engineering journal
On the variation and specialisation of workload – A case study of the Gnome ecosystem community
Bogdan Vasilescu · Alexander Serebrenik · Mathieu Goeminne · Tom Mens
DOI: 10.1007/s10664-013-9244-1
Abstract Most empirical studies of open source software repositories focus on the analysis of isolated projects, or restrict themselves to the study of the relationships between technical artifacts. In contrast, we have carried out a case study that focuses on the actual contributors to software ecosystems, being collections of software projects that are maintained by the same community. To this aim, we defined a new series of workload and involvement metrics, as well as a novel approach (T̃-graphs) for reporting the results of comparing multiple distributions. We used these techniques to statistically study how workload and involvement of ecosystem contributors varies across projects and across activity types, and we explored to which extent projects and contributors specialise in particular activity types. Using Gnome as a case study we observed that, next to coding, the activities of localization, development documentation and building are prevalent throughout the ecosystem. We also observed notable differences between frequent and occasional contributors in terms of the activity types they are involved in and the number of projects they contribute to. Occasional contributors and contributors that are involved in many different projects tend to be more involved in the localization activity, while frequent contributors tend to be more involved in the coding activity in a limited number of projects.
Keywords open source · software ecosystem · metrics · developer community · case study
B. Vasilescu and A. Serebrenik: MDSE, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, The Netherlands. Tel.: +31-40-2473595, Fax: +31-40-2475404, E-mail: {b.n.vasilescu | a.serebrenik}@tue.nl
M. Goeminne and T. Mens: COMPLEXYS Research Institute, Université de Mons, Place du Parc 20, 7000 Mons, Belgium. Tel.: +32-65-373453, Fax: +32-65-373459, E-mail: {mathieu.goeminne | tom.mens}@umons.ac.be
UMONS, Faculté des Sciences, Département d’Informatique
Understanding the Evolution of Socio-technical Aspects in Open Source Ecosystems: An Empirical Analysis of GNOME
Mathieu Goeminne
A dissertation submitted in fulfillment of the requirements of the degree of Docteur en Sciences
Advisor
• Dr. TOM MENS, Université de Mons, Belgium
Jury
• Dr. XAVIER BLANC, Université de Bordeaux 1, France
• Dr. VÉRONIQUE BRUYÈRE, Université de Mons, Belgium
• Dr. JESUS M. GONZALEZ-BARAHONA, Universidad Rey Juan Carlos, Spain
• Dr. TOM MENS, Université de Mons, Belgium
• Dr. ALEXANDER SEREBRENIK, Technische Universiteit Eindhoven, The Netherlands
• Dr. JEF WIJSEN, Université de Mons, Belgium
June 2013
A historical dataset for GNOME contributors
Mathieu Goeminne, Maelick Claes and Tom Mens
Software Engineering Lab, COMPLEXYS research institute, UMONS, Belgium
Abstract: We present a dataset of the open source software ecosystem GNOME from a social point of view. We have collected historical data about the contributors to all GNOME projects stored on git.gnome.org, taking into account the problem of identity matching, and associating different activity types to the contributors. This type of information is very useful to complement the traditional, source-code related information one can obtain by mining and analyzing the actual source code. The dataset can be obtained at https://bitbucket.org/mgoeminne/sgl-flossmetric-dbmerge.
I. INTRODUCTION
The historical and empirical study of open sourcesoftware (OSS) ecosystems is a relatively recent but fast-growing research domain. An important characteristic ofsuch ecosystems, at least according to our definition [15],is the fact that they are made up of a set of softwareprojects sharing a community of users and contributors.A well-known example is GNOME. Its constituent soft-ware projects are designed to work together in order toconstitute a complete software desktop environment. TheGNOME projects are developed by a developer commu-nity that is spread across the world. We have observedthat it is not uncommon for a contributor to be activelyinvolved in many projects at a time [16]. In additionto this, the type of activity a contributor is involved inmay change from one person to another. For example,a very important activity involves internationalization(localization and translation), which is globally managedvia the web application Damned Lies1 for all GNOMEtranslation teams.
Many tools and datasets have been proposed to anal-yse a software project’s history, but few are availableat the level of the ecosystem because of the additionallevel of difficulty involved. It does not suffice to simplyconsider the union of all project histories belonging tothe same ecosystem. Because some projects may havecontributors in common, and some contributors may beinvolved in different projects over time, this informationneeds to be explicitly represented at the ecosystemlevel. The same is true for the types of activity of anecosystem’s contributor, and how this varies over time,and over the different projects he is involved in.
1http://l10n.gnome.org
In this paper, we present the process we have used to create a dataset containing the historical information related to contributors to the GNOME ecosystem. Our database and the tools and scripts used to create it can be found on a dedicated Bitbucket repository².
In contrast to many other datasets, we do not focus on source code, since a significant number of files committed to GNOME’s project repositories do not even contain code (e.g., image files, web pages, documentation, localization and many more). This type of information is often ignored in MSR research, although it is very relevant to understanding which types of activities contributors are involved in. For GNOME we observed, for example, that a significant fraction of the community is working on internationalization instead of code [16].
II. MOTIVATION
An important motivation for creating a historical dataset for analysing contributors to the GNOME ecosystem was the large number of OSS repository mining studies that have used GNOME as a case study [2], [13]. In 2009 and 2010, GNOME was part of the MSR Mining Challenge, which led to many contributions [1], [5], [8], [9], [11], [12], [14].
Of specific interest, in the context of software ecosystem research, are the social interactions in the community of contributors. Following a holistic approach, [7] estimated effort and studied developer co-operation and co-ordination in GNOME, based on the version control repositories and mailing lists. Similarly, [4] developed an advanced measure of individual developer contribution based on the source code repository, mailing lists and bug tracking systems, and applied the measure to a number of GNOME projects. [6] studied six GNOME projects in order to understand how contributors join, socialize and develop within GNOME. [10] studied relations between the GNOME contributors by means of social network analysis.
In our own previous work [15], [16] we used the dataset presented in this article to statistically analyse the specialization of workload and involvement of GNOME contributors across projects and activity types, and we
² https://bitbucket.org/mgoeminne/sgl-flossmetric-dbmerge
@ MSR 2013
References
Mens, Tom; Serebrenik, Alexander; Cleve, Anthony (Eds.) 2014, XXIII, 404 p. Springer, ISBN 978-3-642-45398-4
Chapter 10: Studying Evolving Software Ecosystems based on Ecological Models
Tom Mens, Maelick Claes, Philippe Grosjean and Alexander Serebrenik
Research on software evolution is very active, but evolutionary principles, models and theories that properly explain why and how software systems evolve over time are still lacking. Similarly, more empirical research is needed to understand how different software projects co-exist and co-evolve, and how contributors collaborate within their encompassing software ecosystem.
In this chapter, we explore the differences and analogies between natural ecosystems and biological evolution on the one hand, and software ecosystems and software evolution on the other hand. The aim is to learn from research in ecology to advance the understanding of evolving software ecosystems. Ultimately, we wish to use such knowledge to derive diagnostic tools aiming to analyse and optimise the fitness of software projects in their environment, and to help software project communities in managing their projects better.
Tom Mens and Maelick Claes and Philippe Grosjean
COMPLEXYS Research Institute, University of Mons, Belgium
e-mail: tom.mens,maelick.claes,[email protected]
Alexander Serebrenik
Eindhoven University of Technology, The Netherlands
e-mail: [email protected]
This work has been partially supported by F.R.S-F.N.R.S. research grant BSS-2012/V 6/5/015 (an author’s stay at the Université de Mons) and ARC research project AUWB-12/17-UMONS-3, “Ecological Studies of Open Source Software Ecosystems”, financed by the Ministère de la Communauté française - Direction générale de l’Enseignement non obligatoire et de la Recherche scientifique, Belgium.
Interested in joining?
• Open PhD position available
• 6 to 12 month postdoc visits welcomed