Geographic origin of libre software developers ☆

Jesus M. Gonzalez-Barahona a,*, Gregorio Robles a, Roberto Andradas-Izquierdo a, Rishab Aiyer Ghosh b

a GSyC/LibreSoft, Departamento de Sistemas Telemáticos y Computación, Universidad Rey Juan Carlos, C/Tulipan s/n, 28903 Mostoles, Spain
b Collaborative Creativity Group, United Nations University (UNU-MERIT), Keizer Karelplein 19, 6211TC Maastricht, The Netherlands

Information Economics and Policy 20 (2008) 356–363
Journal homepage: www.elsevier.com/locate/iep

Article info

Article history: Available online 29 July 2008
JEL classification: C81, L17
Keywords: Geographical location; Data mining; Libre software; Free software; Open source software

Abstract

This paper examines the claim that libre (free, open source) software involves global development. The anecdotal evidence is that developers usually work in teams including individuals residing in many different geographical areas, time zones and even continents and that, as a whole, the libre software community is also diverse in terms of national origin. However, its exact composition is difficult to capture, since there are few records of the geographical location of developers. Past studies have been based on surveying a limited (and sometimes biased) sample and extrapolating that sample to the global distribution of developers. In this paper we present an alternative approach in which databases are analyzed to create traces of information from which the geographical origin of developers can be inferred. Applying this technique to the SourceForge users database and the mailing list archives of several large projects, we have estimated the geographical origin of more than one million individuals who are closely related to the libre software development process. The paper concludes that the result is a good proxy for the actual distribution of libre software developers working on global projects.

© 2008 Elsevier B.V. All rights reserved.
0167-6245/$ - see front matter. doi:10.1016/j.infoecopol.2008.07.001

☆ This work has been funded in part by the European Commission, under the FLOSSMETRICS and FLOSSWorld projects (IST program, Contract Numbers 015722 and 033982). This work is based in part on the SourceForge database provided by the University of Notre Dame; see details at http://www.nd.edu/~oss/Data/data.html. This work is based in part on the contribution “Geographic Location of Developers at SourceForge”, by Gregorio Robles and Jesus M. Gonzalez-Barahona, presented at the Mining Software Repositories Workshop, Shanghai, May 2006.

* Corresponding author. E-mail addresses: [email protected] (J.M. Gonzalez-Barahona), [email protected] (G. Robles), [email protected] (R. Andradas-Izquierdo), [email protected] (R.A. Ghosh).

¹ Throughout this paper we will use the term libre software to refer both to free software (according to the Free Software Foundation) and to open source software (according to the Open Source Initiative).

1. Introduction

One of the most well known claims about libre (free, open source) software¹ is that it is based on an internationally distributed pool of developers. Most libre software projects are open to participation by any interested individual with Internet access who has sufficient knowledge and skills to make a contribution. There are apparently few barriers to participation that are due to the geographical location of a developer. However, the distribution of those developers in the different regions of the globe is extremely uneven, showing that barriers do exist, even when they are not built by the projects themselves. Obtaining a clear picture of the actual geographical distribution of libre software developers is a precondition to analyzing the nature and source of barriers and their effect on the participation of specific populations in this global phenomenon. The implications of this knowledge are not only academic; they are also important from both strategic and economic points of view.

Getting an accurate picture of the geographical distribution of developers is, however, not a simple task. Considerable information about developers exists in public repositories maintained by projects (contributions to source code, messages in mailing lists, bug reports submitted, etc.), but these repositories do not usually include specific information about geographic origin. Even non-public user databases of large sites hosting libre software projects (such as SourceForge) do not include data fields containing direct declarations of national or geographical origin. Other sources of data, such as surveys (a classical approach to collecting this type of data), have shortcomings. Some surveys have included questions about the geographical origin of respondents and have produced interesting (and partially consistent) results, but they have also highlighted the difficulties of constructing a geographically representative sample.

In this paper, we use a novel approach, based not on the direct collection of geographical information, but on mining and analyzing traces of this information which, when aggregated, can lead to reasonably accurate data about the geographic distribution of individuals involved in global libre software development. To do this, we use as source data the private users database of SourceForge, the largest site hosting libre software projects (containing information about more than one million registered users). We also use the mailing lists stored in the public repositories of several global projects (such as GNOME or FreeBSD), which include information about the messages sent by several hundreds of thousands of different mail addresses. Analysis of the combination of these sources can produce a large and unbiased sample that is representative of the global libre software developer community. An accurate picture of the distribution of developers can then be constructed.

2. Methodology

Neither the SourceForge database nor the mailing list archives contain specific information about the geographical location of developers. That has to be inferred from other indicators. For these databases, we use top-level domains in e-mail addresses and time zone information (when available). In the case of SourceForge, users have to provide a valid e-mail address, which is used for verification during the registration process, and can select the time zone they prefer. In the case of mailing lists, each message includes the sender's e-mail address in the “From” field, and usually a header with information about the time zone from which the mail was sent. Therefore, the analysis in both cases is similar. However, there are several differences worth mentioning.

Users in SourceForge can be assumed to be different (and the presence of duplicated e-mail addresses can be checked). However, in many cases an individual uses different e-mail addresses for different postings in mailing lists. Therefore, before performing any study of the distribution of origins, addresses corresponding to the same individual have to be linked. This is done using the Seal system,² which uses several heuristics. Once that step is completed, the geographical origin of an individual can be established using the corresponding collection of linked addresses.

The incidence of different addresses for the same person is expected to be higher in the case of mailing lists (as opposed to SourceForge accounts), mainly because of two factors: people using several e-mail addresses at the same time (e.g., mail accounts at home and at work), and people using several e-mail addresses over time (e.g., due to moving to new jobs or to new personal e-mail service providers). Neither factor is relevant in SourceForge, because the same account can easily be used from any location and maintained over time. To detect related addresses, the Seal system uses several heuristics; the most powerful is finding the same ‘real’ name accompanying different e-mail addresses in different messages to the same mailing list. Although this heuristic could raise false positives for common ‘real’ names (e.g., John Smith), we have determined by manual verification that this is uncommon.
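The name-based heuristic described above can be illustrated with a minimal sketch. It is not the Seal implementation, whose internals are not described here; the function name `link_addresses`, the input layout (tuples of mailing list, real name and address) and the name normalization are assumptions made for illustration only. Addresses that appear with the same ‘real’ name on the same mailing list are merged with a small union-find structure.

```python
from collections import defaultdict


def link_addresses(messages):
    """Group e-mail addresses that probably belong to the same person.

    `messages` is an iterable of (mailing_list, real_name, address) tuples
    (a hypothetical input layout). Two addresses are linked when they appear
    with the same real name on the same mailing list.
    """
    parent = {}

    def find(address):
        # Union-find lookup with path halving.
        parent.setdefault(address, address)
        while parent[address] != address:
            parent[address] = parent[parent[address]]
            address = parent[address]
        return address

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (mailing list, normalized name) -> first address seen
    for mailing_list, real_name, address in messages:
        address = address.lower()
        find(address)  # register the address even if it is never linked
        name = " ".join(real_name.lower().split())
        if not name:
            continue  # messages without a real name cannot be linked this way
        key = (mailing_list, name)
        if key in first_seen:
            union(first_seen[key], address)
        else:
            first_seen[key] = address

    groups = defaultdict(set)
    for address in list(parent):
        groups[find(address)].add(address)
    return list(groups.values())
```

In this sketch a common name such as John Smith would still merge unrelated people on the same list, which is the false-positive risk mentioned in the text; a real system would combine this rule with further checks.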

Time zone information also comes in different formats in SourceForge and in mailing lists. In the case of SourceForge, this information usually comes in a format such as “Europe/Madrid”, which includes rich geographical information (although in some cases it can also be “GMT+1”, which is less specific). In the case of e-mail messages, it is usually in a format like “GMT+1”, although in some cases the time zone identifier (such as “CET” for Central European Time) also appears.
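A minimal sketch of how the three formats mentioned above could be normalized before analysis follows. The function name `parse_time_zone`, the returned pair and the small abbreviation table are assumptions for illustration; they are not taken from the tools used in this study.

```python
import re

# Illustrative abbreviation table (an assumption, far from complete).
ABBREVIATION_OFFSETS = {"GMT": 0, "UTC": 0, "CET": 1}


def parse_time_zone(value):
    """Return (region_name, utc_offset_hours); either element may be None.

    "Europe/Madrid" keeps its geographical name, while "GMT+1" and "CET"
    only yield an offset, which is less specific.
    """
    value = value.strip()
    match = re.fullmatch(r"(?:GMT|UTC)([+-]\d{1,2})", value)
    if match:
        return None, int(match.group(1))
    if value in ABBREVIATION_OFFSETS:
        return None, ABBREVIATION_OFFSETS[value]
    if "/" in value:
        return value, None
    return None, None
```

Keeping the region name and the numeric offset apart makes the difference in precision explicit: a region name can point to a single country, whereas an offset is shared by every country in that time zone.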

Some other aspects are common to both cases. Top-level domains in e-mail addresses can be country-specific (such as “es” for Spain), but can also be generic (such as “com”). To make things worse, there are some countries (notably the United States) for which the use of the country-specific domain is very rare. For instance, in the case of SourceForge, more than 750,000 of a total population of over 1,180,000 registered developers do not use a country-code top-level domain. All of these effects have to be considered when inferring the distribution of developers across countries.

The approach for identifying an individual's origin in both cases is, in summary, as follows:

• Individuals using addresses that include country-code domains are assigned to that country.

• The remaining individuals have non-country-code top-level domains. For them, the distribution of time zones for each second-level domain (such as “debian.org”) is calculated and used to infer country distributions.

• In cases of discrepancy (e.g., an individual with time zone “Europe/Madrid” but a “.de” address), further heuristics are used.

The algorithm for the SourceForge case is described in detail in Robles and Gonzalez-Barahona (2006). For the mailing lists case, the exact algorithm is slightly different due to the different formats for time zone information, but the general idea is the same.

² Seal is one of the LibreSoft tools; see http://tools.libresoft.es for more information.

The exact data sources analyzed in this article are:

• The SourceForge database, as provided to research teams by the University of Notre Dame under a special agreement.³ We used the monthly dump of November 2005, which included more than 1,180,000 registered users.

• Mailing list archives of the Debian Project (97 lists, archives of May 2006, with first messages from 1995) and of the GNOME and FreeBSD projects (archives of August 2006, with first messages from 2001 and 2003, respectively), with a combined total of more than one million different e-mail addresses.
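The first two steps of the approach summarized in this section can be sketched as follows. This is an illustration under stated assumptions, not the published algorithm (which is given in Robles and Gonzalez-Barahona, 2006): the function name `infer_origins`, the `zone_to_country` mapping, and the rule that users without a usable time zone are spread proportionally over the countries observed for their second-level domain are all hypothetical choices. The discrepancy heuristics of the third step are not specified in the text and are therefore left out.

```python
from collections import Counter, defaultdict


def infer_origins(users, zone_to_country):
    """Estimate the number of users per country.

    `users` is a list of (email, time_zone) pairs; `zone_to_country` maps a
    time zone string to a country code (hypothetical mapping supplied by the
    caller). Counts may be fractional because they are estimates.
    """
    counts = Counter()
    known = defaultdict(Counter)  # second-level domain -> countries seen
    unknown = Counter()           # second-level domain -> users with no zone

    for email, time_zone in users:
        labels = email.rsplit("@", 1)[-1].lower().split(".")
        tld = labels[-1]
        if len(tld) == 2 and tld.isalpha():
            # Step 1: two-letter top-level domains are country codes.
            counts[tld] += 1
            continue
        second_level = ".".join(labels[-2:])
        country = zone_to_country.get(time_zone)
        if country:
            known[second_level][country] += 1
        else:
            unknown[second_level] += 1

    # Step 2: use the per-domain distribution to place the remaining users.
    for domain in set(known) | set(unknown):
        total = sum(known[domain].values())
        if total == 0:
            counts["unknown"] += unknown[domain]
            continue
        for country, n in known[domain].items():
            counts[country] += n + unknown[domain] * n / total
    return counts
```

Because the result is an aggregate estimate, the sketch never needs to decide the country of a single ambiguous user, which matches the stated goal of estimating distributions rather than identifying individuals.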

3. Related research

There are several contributions in the literature that deal with the estimation of geographical origins of libre software developers. They can be classified into two main categories: those based on specific data provided by certain libre software projects, and those which obtain data from surveys.

To our knowledge, the first study in this area was Dempsey et al. (2002), which analyzed the top-level domain of the e-mail addresses of developers found in the Linux Software Map entries.⁴ Unfortunately, this study could not compensate for the bias resulting from the rare use of the country-code domain in the United States, and it was also limited by the number of individuals considered.

Later, the Debian project was studied (Robles et al., 2001) using the country information that the Debian developers have the option of entering in the Debian Developer Database. Since it also contains information about the admission date, an evolutionary analysis was performed, showing a shift in time from a dominance of US-based developers to a larger European participation, and a very small representation of developers from developing countries.

The CREDITS file of the Linux kernel and the contact information of the GNOME Project were studied by Lancashire (2001), who also showed a shift towards greater European-based developer participation in both cases. He explained that shift with an economic model in which the geographic distribution of developers depended on their opportunity costs. Some years later, a new study of the Linux CREDITS file (Tuomi, 2004) provided a more in-depth analysis of the geographical distribution of the kernel developers.

The first study based on surveys was probably WIDI (Robles et al., 2001), which featured over 5500 respondents and showed a majority of developers to be EU-based, although a bias was acknowledged due to the self-selected nature of the participants. A later survey, FLOSS (Ghosh et al., 2002), was answered by about 2500 self-selected developers over the Internet, also showing a surprisingly large quantity of European developers (in comparison with their American and Asian counterparts). Similar surveys have since been performed, such as FLOSS-US (David et al., 2003) (in which Europeans were also predominant), and some others have been done in Asia. Another contemporary study (Hertel et al., 2003) focused on the study of motivation. Those authors obtained a sample with a larger number of North American (48%) than European (37%) developers. However, the total size of that sample was relatively small (only 141).

Specifically for SourceForge, the only reference to our knowledge that addresses the origin of its population of users is a post in SourceForge News.⁵ It shows the results obtained by the SourceForge team after running an IP geolocation tool on some logs of the site (apparently, the web server logs). The results, which seem to refer to visitors by country in some unspecified period of time during 2002, are partially consistent with those presented in this paper, although they are difficult to interpret given the lack of details about how they were obtained.

From this review of related research, it is clear that the problem of estimating the geographical origin of libre software developers has been addressed by several authors, but is still an open field. In particular, no study has considered very large samples of developers (in the range of tens of thousands or more). This is exactly the focus of our analysis, which tries to provide, from a different point of view, a new answer to this problem.

4. Results and observations

Prior to presenting our findings, we should note that data from SourceForge and other sources, such as the mailing lists of global projects, might underrepresent some regions. In East Asia, where China and Japan have sizeable developer communities, participants apparently do not directly interact with the global community very extensively due to cultural and linguistic differences. Indeed, the participation of developers from China and Japan in global projects may often be channeled through a few key “connectors”, individuals who for reasons of language and culture are more comfortable contributing to global projects, and who contribute on behalf of others.

Nevertheless, from an economic perspective it is useful to examine the distribution of participation in global projects and portals such as SourceForge, as they are good indicators of the population of globally active developers. While they underrepresent China and Japan in absolute numbers of developers, they probably provide an accurate representation of the global influence on libre software development from China and Japan, an influence which is arguably low (relative to their developer numbers), given their lower degree of global participation. This lack of influence may reflect the software development that is taking place in these countries. For instance, it may be that communities which are less globally active focus their development more on local needs, such as language localization.

³ More information about this agreement can be obtained from http://www.nd.edu/~oss/Data/data.html.

⁴ The Linux Software Map (LSM) is a database of software written for or ported to Linux, http://lsm.execpc.com/lsm/.

⁵ http://sourceforge.net/forum/forum.php?forum_id=238394, posted on December 26th 2002.

Considering SourceForge as a collection of globally active developers, Table 1 lists the top 30 countries by number of developers registered at SourceForge, according to our estimation of country origin. The top 50 countries account for 96.5% of the total identified registered users, while the top 20 countries include up to 83.9% of the total SourceForge population.

Table 2 groups countries by large geographical regions. These figures are generally consistent with previous studies, although they show somewhat higher numbers for North America. In any case, it is clear that most of the developers come from Europe and North America (with almost a one to one ratio), followed by Asia with roughly 10% of the developer population. On the other hand, since the population is larger in Europe than in North America, the penetration of libre software development, measured in SourceForge registered developers per capita, is higher in North America than in Europe, as shown in the demographic analysis below. Fig. 1 also shows growth in populations of developers over the past few years.
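The coverage figures quoted above can be cross-checked against the published tables with a short calculation. The sketch below copies the counts from Tables 1 and 2 and assumes (this is our reading, not stated in the text) that the total population is the sum of the regions in Table 2 with the EU row left out, since the EU is a subset of Europe. The names `coverage` and `europe_to_north_america` are illustrative.

```python
# Developers per country, Table 1, in rank order (ranks 1 to 30).
TOP_30 = [
    425620, 95800, 60768, 49109, 44587, 36517, 31812, 30763, 29335, 23867,
    22113, 21291, 19012, 18905, 15081, 14697, 13983, 12133, 10024, 9952,
    9155, 9027, 8498, 8185, 7727, 6948, 6695, 6573, 6345, 6336,
]

# Developers per region, Table 2. The EU row (401,845) is omitted on the
# assumption that it is already counted inside Europe.
REGIONS = {
    "Africa": 12560,
    "Asia": 127275,
    "Europe": 466792,
    "North America": 485679,
    "Oceania": 46422,
    "South America": 36330,
}

TOTAL = sum(REGIONS.values())


def coverage(top_n):
    """Percentage of all identified users that live in the top_n countries."""
    return 100.0 * sum(TOP_30[:top_n]) / TOTAL


def europe_to_north_america():
    """Ratio of European to North American registered users."""
    return REGIONS["Europe"] / REGIONS["North America"]


if __name__ == "__main__":
    print(f"top 20 coverage: {coverage(20):.1f}%")
    print(f"Europe / North America: {europe_to_north_america():.2f}")
```

Under this assumption the top 20 countries come out at 83.9% of the total, which agrees with the figure given in the text, and the ratio of European to North American users is close to one.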

Even among registered SourceForge users, only a small proportion actually make commits to the source code of the hosted libre software projects. These are people who make changes to the code in the publicly available central repositories, and thus also play a greater leadership role. Out of some 1.1 million registered participants on SourceForge, just under 50,000 commit code to the development repositories. As Fig. 2 shows, despite the possible US bias of the SourceForge portal, European developers have in recent years overtaken North American developers in terms of such active participation.

Table 1
Top 30 countries by number of users in SourceForge (estimation)

Rank  Country              Developers
 1    United States           425,620
 2    Germany                  95,800
 3    United Kingdom           60,768
 4    Canada                   49,109
 5    France                   44,587
 6    China                    36,517
 7    Australia                31,812
 8    Italy                    30,763
 9    Netherlands              29,335
10    Sweden                   23,867
11    India                    22,113
12    Brazil                   21,291
13    Russian Federation       19,012
14    Spain                    18,905
15    Japan                    15,081
16    Poland                   14,697
17    Belgium                  13,983
18    Switzerland              12,133
19    Austria                  10,024
20    Denmark                   9,952
21    Singapore                 9,155
22    Finland                   9,027
23    Norway                    8,498
24    Mexico                    8,185
25    South Korea               7,727
26    Israel                    6,948
27    Argentina                 6,695
28    Hungary                   6,573
29    Romania                   6,345
30    Taiwan                    6,336

Table 2
Number of users in SourceForge by region

Region           Developers
Africa               12,560
Asia                127,275
EU                  401,845
Europe              466,792
North America       485,679
Oceania              46,422
South America        36,330

Fig. 1. Evolution over time of the number of developers in SourceForge, by region (estimation).

Fig. 2. Evolution over time of the number of core developers in SourceForge, by region (estimation).

Fig. 3 shows the location of active participants in the analyzed mailing lists (see Section 2 for details). It provides a picture of coordination and development roles in projects, and may be considerably less biased towards US registrations than the SourceForge user registration data. Indeed, if one looks at the mailing lists as representing only the specific projects examined, it is not a biased sample at all but a census of active contributors to discussions related to project development. It is notable that this picture also indicates that most of the contributors come from Europe. The high share of contributors from “other” regions includes Latin America, Eastern Europe and other parts of Asia as well as other parts of the world, indicating at least in part the extensive global diversity in development discussion (if not actual coding) accompanying a strong European localization. The inclusion of GNOME (a popular graphical user interface) in our selection of projects probably increases the apparent global diversity of participation.

Fig. 3. Participants in development mailing lists, by region (estimation).

Fig. 4. World map of committers by population.

Looking beyond Western Europe and North America, it is clear that libre software development is closely correlated to general levels of ICT participation, population and wealth.⁶ To highlight this finding, we plot globally active core developers (committers) on SourceForge, by country.

As Fig. 4 shows, the number of libre software developers is far from evenly distributed in terms of population. Indeed, when weighted by population, there are more developers in the US and Canada than in most European countries or regions, with the exception of Scandinavia. Australia also has a particularly high share of developers. This contrast is easily explained given Europe's relatively low indicators for ICT use (compared to North America); a good proxy is Internet penetration, which is lower in Europe. As seen in Fig. 5, the US has fewer libre software developers per million Internet users than most European countries.

The low developer numbers from Asia and Latin America are perhaps most directly influenced by wealth. As Fig. 6 shows, China, India, Russia, Brazil and even South Africa are among the higher contributors when numbers are adjusted for wealth. In fact, India, with 606 committers per USD 1000 GDP/capita, has by far the highest wealth-adjusted contribution to global FLOSS development, almost half as much again as China, the next largest contributor from this group.

Fig. 5. World map of committers by Internet penetration.

Fig. 6. World map of committers by average wealth.

⁶ Data from the World Development Indicators database of the World Bank, 2001.
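The three normalizations used in this demographic analysis (by population, by Internet users and by wealth) are simple ratios, sketched below. The function names and the “per million” scale for population are assumptions made for illustration; only the unit “committers per USD 1000 GDP/capita” and “per million Internet users” are taken from the text. No real country data is included, so any inputs have to be supplied by the reader.

```python
def per_million(committers, people):
    """Committers per million people (inhabitants or Internet users)."""
    if people <= 0:
        raise ValueError("people must be positive")
    return committers * 1_000_000 / people


def per_1000_usd_gdp_per_capita(committers, gdp_per_capita_usd):
    """Committers per USD 1000 of GDP per capita (wealth-adjusted count)."""
    if gdp_per_capita_usd <= 0:
        raise ValueError("GDP per capita must be positive")
    return committers / (gdp_per_capita_usd / 1000.0)
```

As a purely hypothetical example, a country with 1,200 committers and a GDP per capita of USD 2,000 would score 600 on the wealth-adjusted measure, while the same 1,200 committers in a country with a GDP per capita of USD 30,000 would score 40. This is why countries with modest absolute numbers can rank highly once wealth is taken into account.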

The low numbers of developers in most African countries have to be considered in the light of a methodological problem. For many of those countries, the use of foreign e-mail addresses is common, and many of them fall into the same time zones as European countries, which could lead to an underestimation of developers in these countries.

Finally, we note a possible bias in the case of SourceForge. As SourceForge is clearly a company-owned facility, joining it could have different connotations for developers in different cultures. For example, it is well known that some developers see the ‘control’ by SourceForge of a large fraction of all the facilities available for libre software development as a problem. (This is for instance the position of the Free Software Foundation, which maintains its own facility, Savannah, to avoid using SourceForge.) If developers from a certain culture were more sensitive to this kind of argument, the corresponding countries would be under-represented in the sample of SourceForge users.

5. Conclusions and further research

In this paper we have shown a more complete and detailed landscape of the geographical distribution of libre software developers around the world than previous studies. The combined analysis of SourceForge user data and mailing list archives also offers a more complete perspective on participation.

It is important to emphasize that this research is focused on global development (that is, projects not targeted in particular to a specific geographical or cultural community), and for that reason, developers participating only in regional or local projects are not considered. These developer groups can form sizable communities in some areas of the world.⁷ In addition, and despite the large size and representativeness of our samples, some bias might remain. For instance, projects that are not present in SourceForge may have a different geographical distribution. However, since SourceForge-hosted projects and the other projects we studied account for a sizable fraction of all libre software available, we believe this bias is inconsequential.

All of the results presented are the product of heuristics and (educated) assumptions, and are therefore inexact. We have worked with sources having rather different error margins, and we have used heuristics that are sound but subject to a certain error rate in identifying locations. To assess the validity of the methodology for estimating national origin, it would be desirable to check (probably by contacting developers themselves) a large fraction of SourceForge users. The results could then be compared with those of our study. However, the validations we have performed seem to indicate that the results are statistically sound, and that the figures shown are good estimators of the reality.

Our methodology is not focused on identifying the geographical location of single developers (although in many cases that is done), but on finding the aggregate numbers of developers of a certain national origin. In many cases, therefore, we use algorithmically driven estimations to infer the proportion of nationals of a certain country in a population of users having certain characteristics. This is certainly a limitation of the proposed approach, especially if we were interested in (individual) developer identification methods as proposed in other works (e.g., Robles and Gonzalez-Barahona, 2005).

Acknowledgements

We thank the SourceForge team, and Greg Madey from the University of Notre Dame, for providing access to the SourceForge data. Thanks to Martin Michlmayr for his help with the Debian mail archives. Also, a big thank you goes to our colleagues from GSyC/LibreSoft for their help in verifying the validity of the data. Last, but not least, thanks to the anonymous reviewers; their comments have helped to improve this paper.

References

David, P.A., Waterman, A., Arora, S., 2003. FLOSS-US. The free/libre/open source software survey for 2003. Technical Report, Stanford Institute for Economic and Policy Research, Stanford, CA, USA.

Dempsey, B.J., Weiss, D., Jones, P., Greenberg, J., 2002. Who is an open source software developer? Communications of the ACM 45 (2), 67–72.

Ghosh, R.A., Glott, R., Krieger, B., Robles, G., 2002. Survey of developers (free/libre and open source software: survey and study). Technical Report, International Institute of Infonomics, University of Maastricht, The Netherlands.

Hertel, G., Niedner, S., Herrmann, S., 2003. Motivation of software developers in open source projects: an internet-based survey of contributors to the Linux kernel. Research Policy 32, 1159–1177.

Lancashire, D., 2001. Code, culture and cash: the fading altruism of open source development. First Monday 6 (12).

Robles, G., Gonzalez-Barahona, J.M., 2005. Developer identification methods for integrated data from various sources. In: Proceedings of the International Workshop on Mining Software Repositories, St. Louis, Missouri, USA, pp. 106–110.

Robles, G., Gonzalez-Barahona, J.M., 2006. Geographic location of developers at SourceForge. In: Proceedings of the Mining Software Repositories Workshop, Shanghai, China.

Robles, G., Scheider, H., Tretkowski, I., Weber, N., 2001. Who is doing it? A research on libre software developers. Technical Report, Technische Universität Berlin, Berlin, Germany.

Tuomi, I., 2004. Evolution of the Linux credits file: methodological challenges and reference data for open source research. First Monday 9 (6).

⁷ Separate research on which the authors are also working shows that important regional communities, with little relationship to the global community, exist in regions such as East Asia and Brazil.