GoOpen 2010: Roger Bivand

Transcript of GoOpen 2010: Roger Bivand

Page 1: GoOpen 2010: Roger Bivand

Experience / R project / R spatial

Managing research in collaborative networks

Open Source software, research and higher education: a practitioner’s view

GoOpen 2010 (Fou thread), Aker Brygge, Oslo, 19–20 April.

Roger Bivand

Department of Economics, Norwegian School of Economics and Business Administration

Bergen, Norway

20 April 2010

Roger Bivand A practitioner’s view

Page 2: GoOpen 2010: Roger Bivand

Outline

This talk will examine how open source software development and use may interact with their institutional contexts in research and higher education

The talk will be based on experience of open source development in applied statistics and geospatial applications

Reasons for the mismatch between an institutional context preferring secrecy when applying for funding, restricted deliverables, and races to publication, and the ways in which open source development occurs will be discussed

In particular, the roles of mutual trust and community-building in open source development will be stressed; these factors appear to express externalities between developers and users of software that are neglected in the exclusive management models prevalent in research and higher education

Page 3: GoOpen 2010: Roger Bivand

Contextual background

In order to provide some justification for presenting a “practitioner’s view”, some background information beyond my affiliation may be useful

Although employed in the Department of Economics at Norges Handelshøyskole, I am an academic geographer, educated at Cambridge and the London School of Economics

My specialities within geography are quantitative methods and geographical information systems, and I have used and developed software since 1973, for research and teaching

During the EU 5th Framework, I was involved in the evaluation of three open source Information Society Technologies (IST) calls; I also founded the MBA programmes at Warsaw University of Technology in 1991/92

Page 4: GoOpen 2010: Roger Bivand

Little languages

My first “open source” publication was an extra module for the proprietary program Systat, with both source code and DOS binaries available for FTP download, and an accompanying paper in Computers & Geosciences in 1992

While much early software (Fortran, later C) was compiled (I only had limited exposure to BASIC), by the 1980s little languages, generally interpreted, began to appear as glue for compiled programs

The languages covered in two of my papers published in 1996 and 1997 were the Unix shell scripting language and AWK, used as glue for the GRASS GIS, and for GMT for map production; I have been using Unix/Linux since 1985

In these papers and other work in the mid 1990s, I pointed up the benefits of scripting in permitting work to be reproduced and audited, contrasted with non-journalling GUIs that were becoming prevalent in academic practice

Page 5: GoOpen 2010: Roger Bivand

Glimpse from 1997

Here is a slide from a talk given in Italy about software for handling geographical information (GI) in early 1997:

MAPPING GI USERS:

PRODUCTION: high training costs, application-specific macro languages, few linking requirements (cf. COTS)

CASUAL: generic likeness to familiar GUI, looks & behaves like Excel or Netscape (cf. plug-ins)

PROFESSIONALS: as consultants customising GI handling technologies for clients in long/medium term relationships; as researchers in GI handling technologies

CURIOUS: as researchers analysing geographic information; as citizens challenging the use of GI by private companies and public administration

Axes: STANDARDISED TASKS (MORE to LESS); NEED OPEN SOFTWARE (MORE to LESS)

Page 6: GoOpen 2010: Roger Bivand

Using the R project

My first message to the R project was in mid January 1997, as I had begun using early alpha releases to re-implement a number of spatial analysis functions

The initial motivation to systematise code for functions for spatial data analysis was for a course given in the University of Bergen Department of Geography; we were a joint department until administrative changes split us

By 1998, Albrecht Gebhardt (Klagenfurt, Austria) and I had provided code for most simple spatial data analysis for R, either porting existing code, or writing fresh contributions (presentation at a congress in Vienna)

But what is the R project?

Page 7: GoOpen 2010: Roger Bivand

www.r-project.org

While its website offers little eye-candy, R is becoming a central resource for statistical and computational data analysis across the sciences and in business.

Page 8: GoOpen 2010: Roger Bivand

The R project

R is a language and environment for statistical computing and graphics; it is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Alcatel–Lucent) by John Chambers and colleagues

R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R

The term “environment” is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software

Many users think of R as a statistics system. We prefer to think of it as an environment within which statistical techniques are implemented; R can be extended (easily) via packages

Page 9: GoOpen 2010: Roger Bivand

The R foundation

The R project began as an academic initiative with no funding in Auckland, New Zealand, and was licensed under GPL as more collaborators joined. This group was strengthened by academic contributors to S, who began to work with R in the late 1990s. By 2002, a more formal structure was needed, and a foundation was formed. I was invited to join as an ordinary member in March 2003, so have seen things “from the kitchen” since then.

Page 10: GoOpen 2010: Roger Bivand

The R community

While the software system was intended to be “fully planned and coherent”, the community that has grown up around R is neither planned nor coherent. Since 1997, there have been two main mailing lists, one for users, the other for developers. John Fox (another non-core ordinary foundation member) has described the social structure of the project in a recent paper in the R Journal, from which this graph is taken.

Page 11: GoOpen 2010: Roger Bivand

CRAN and contributed packages

The community has also grown thanks to the ease with which packages may be contributed. Neither writing packages nor checking them formally against R is hard; the check process executes all the examples on the help pages and other documentation. The Comprehensive R Archive Network (CRAN) thus distributes R itself (source and binaries for multiple platforms) and packages (source and binaries), and packages may also be installed and updated from within R.

Page 12: GoOpen 2010: Roger Bivand

CRAN

Page 13: GoOpen 2010: Roger Bivand

CRAN task views

Since so many packages have been contributed to R, and distributed through CRAN, it became necessary to provide a mechanism for guiding users towards solutions to their problems. It is helpful to see the complexity of CRAN as an advantage, with “ecologically” more fit packages establishing themselves in “niches”, possibly even in competition with other packages providing similar facilities. Task views have been added as a light-weight non-authoritative way of offering suggestions.

Page 14: GoOpen 2010: Roger Bivand

R Forge

In addition to CRAN running the released, patched, and development versions of R on the CRAN packages’ examples nightly, packages may also be hosted on the R Forge repository. This provides the usual *forge services, such as SVN, but also builds Windows and OS X binary packages, and checks package source on multiple platforms nightly. So even alpha or beta packages may be made available, and may begin to harvest user input, before being released to CRAN.

Page 15: GoOpen 2010: Roger Bivand

R spatial

In 1999 I had interfaced R and the open source GIS GRASS, and presented a paper on this at a Scandinavian GIS meeting; the paper was rejected by Norsk Geografisk Tidsskrift, but published in extended form in Computers & Geosciences in 2000

This, and the publication of a paper based on my 1998 presentation with Albrecht Gebhardt in Journal of Geographical Systems, and a presentation with Markus Neteler, the lead GRASS developer, at the 2000 GeoComputation conference, led to closer personal contacts with R core

Kurt Hornik, who runs CRAN, encouraged me to talk about R and GIS at the March 2001 Distributed Statistical Computing meeting in Vienna, at which I got to know active developers personally

By the next DSC meeting in March 2003, I was organising a thematic session on spatial statistics, and a crucial fringe developers’ workshop to discuss how to advance spatial data analysis in R

Page 16: GoOpen 2010: Roger Bivand

CRAN Spatial task view

Since 2003, a number of community-building steps have been made over and above developing contributed packages. From the CRAN side, the Spatial task view is the hub, from which traffic is channelled to package pages and to ancillary websites, as well as to the special interest group mailing list. Some package authors contact me to ask to be included, others are asked whether they want to be added to the web of information.

Page 17: GoOpen 2010: Roger Bivand

R-sig-geo mailing list

Following the 2003 workshop, we started a project on Sourceforge to permit joint development, and a mailing list served within the family of R lists from Zurich. Traffic on the list has grown steadily, with a subscribed membership in April 2010 of over 1600. Naturally, many of these “lurk” without posting, while others post without helping, and many fewer help by answering posted questions. This final group is however growing, and since the list archives are also kept on Nabble, they are easy to search for information.

[Figure: monthly number of emails on r-sig-geo, 2004–2010; vertical axis: # of emails, 0 to 300]

Page 18: GoOpen 2010: Roger Bivand

The sp package

In 2003, we agreed that a shared system of new-style classes to contain spatial data would permit many-to-one and one-to-many conversion of representations, avoiding the then prevalent many-to-many conversion problem. The idea was to make it easier for GIS people and stats people to work together by creating objects that “looked” familiar to both groups, although the groups differ a lot in how they “see” data objects. Package dependencies have grown: the upper diagram shows packages depending on sp in April 2008, the lower diagram those in April 2010.
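The difference between many-to-many conversion and conversion through a shared class can be made concrete with a small sketch. It is written in Python rather than R purely for illustration, and everything in it is an assumption: the two format names, the shared representation (a list of coordinate pairs), and the helper functions are hypothetical and are not taken from sp. With n representations, direct pairwise conversion needs n(n-1) converters, whereas a shared class needs only 2n.

```python
def pairwise_converters(n_formats: int) -> int:
    """Converters needed if every format converts directly to every other."""
    return n_formats * (n_formats - 1)


def hub_converters(n_formats: int) -> int:
    """Converters needed if every format converts to and from one shared class."""
    return 2 * n_formats


# Hypothetical formats; the shared representation is a list of (x, y) tuples.
TO_SHARED = {
    # separate coordinate columns -> list of points
    "xy_columns": lambda d: list(zip(d["x"], d["y"])),
    # flat vector x1, y1, x2, y2, ... -> list of points
    "flat": lambda v: list(zip(v[0::2], v[1::2])),
}
FROM_SHARED = {
    "xy_columns": lambda pts: {"x": [p[0] for p in pts], "y": [p[1] for p in pts]},
    "flat": lambda pts: [coord for p in pts for coord in p],
}


def convert(obj, source: str, target: str):
    """Convert via the shared representation: many-to-one, then one-to-many."""
    return FROM_SHARED[target](TO_SHARED[source](obj))


if __name__ == "__main__":
    print(pairwise_converters(10), hub_converters(10))
    print(convert({"x": [1, 2], "y": [3, 4]}, "xy_columns", "flat"))
```

Adding a new representation under this scheme means writing two functions, one in each direction, rather than one pair for every representation already supported.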

Page 19: GoOpen 2010: Roger Bivand

R Wiki

In addition to the “coordinated” information sources, a community Wiki does exist. While it seems to suit some users, the general impression (among older people?) is that there is little feeling of responsibility for following up tips given there. On the mailing list and its archive, experienced developers or users will usually clarify misunderstandings, while on the Wiki, posters do not feel obliged to update their contributions, for instance when examples stop working (they are never run, unlike CRAN package examples).

Page 20: GoOpen 2010: Roger Bivand

Spatial on R Forge

R Forge is used actively by individuals and groups in developing packages for spatial data analysis, with 52 projects registered in April 2010. Some projects are registered in more than one topical area, some may never mature, but some are already in active use; the raster package is already frequently discussed on R-sig-geo, and it was released to CRAN in late March 2010 after a gestation of 16 months.

Page 21: GoOpen 2010: Roger Bivand

Book website

Finally, I’ll mention a book that I wrote with Edzer Pebesma and Virgilio Gómez-Rubio, published in the Springer Use R! series in 2008. Not only does the book seem to be doing OK, but the website with dataset and code downloads is visited frequently (450–600 unique visitors per month). The code is run nightly against current R and the various required contributed packages. It may be of interest to note that the text was written using the literate programming tool Sweave in R, which is designed to support reproducible research (as indeed was this talk).

Page 22: GoOpen 2010: Roger Bivand

Managing research in higher education

While the links between the knowledge economy and Open Source software are evident, there are very real challenges to the management of research and higher education in policy terms that need to be addressed

Most research and higher education organisations have been rationalised and subjected to the styles of management practices introduced in commercial corporations years and even decades ago

In particular, budget discipline is a favoured tool in attempting to point organisational units in directions seen as being appropriate

Given that these organisations clearly face a “missing market”, in that neither potential students nor grant-giving bodies are analogues of customers in a fast-food restaurant, those responsible for management have a measurement problem

Page 23: GoOpen 2010: Roger Bivand

Grant processes

Universities and research institutions appear to “compete” in grant processes, and thereby seem to have an interest in locking potential competitors out, by securing privileged access to knowledge

While such advantage may be quite real in the case of laboratory skills and quality (the institution does deliver services of higher quality), or when the institution has secured the services of high-flying academics, this model is not directly transferable to software

Given the steadily increasing importance of software in teaching and research, it seems clear that care is needed in constructing management tools for activities which may produce or modify software (see the UEA “climategate” scandal)

Page 24: GoOpen 2010: Roger Bivand

Software deliverables

It does make sense for institutions to develop expertise in customising software, in training, and in publishing materials of benefit to software users on a for-profit basis

It does not in general, however, make sense to mandate source closure in research programs or projects, in the same way that mandating openness might be mistaken

The question as to whether software deliverables, or software developed in the process of creating deliverables, should be opened is one that is relevant in all grant processes

It is also highly relevant in evaluation routines associated with program and project execution

Page 25: GoOpen 2010: Roger Bivand

Handling software in research projects

In grant awarding and evaluation processes, the grant-making body should consider at least two factors: the importance of Open Source for enhanced efficiency in providing the software needed in a project, and the importance of reproducibility and peer-review in the scientific process generally

It can thus be argued that the management of the boundary between what the institution “owns”, what can sensibly be commercialised on a for-profit basis, and research productivity and efficiency deserves attention

Otherwise, naive and rather outdated management practices can endanger research quality and productivity with regard to software innovation and incremental improvement by seeing products where one should see services

Page 26: GoOpen 2010: Roger Bivand

Software, research and higher education

There are clearly cases in which source code should not be opened, although when the public purse has funded the research involved, the number of real cases will in practice be very few, even for projects with very small communities of interest

It is of importance for enabling and empowering actors in the knowledge economy that unnecessary barriers to the diffusion of knowledge be removed, and that new ones not be permitted to emerge

As a corollary, researchers should perhaps be given incentives in career terms to contribute to the pool of knowledge by opening source code, and by contributing to the improvement of software in their domain of science, in the same way that publications are rewarded

Page 27: GoOpen 2010: Roger Bivand

Round-up

As far as I am aware, no research council has played any relevant role in the progress of the R project directly

Indirectly, research council funded projects have included software deliverables defined as contributed packages, including spatial packages (but none that I have handled)

Even more indirectly, people in research council funded doctoral and post-doctoral positions have not only used R and R spatial, but have contributed to software development, even though this was not required or mentioned in their projects

Finally, the diffuseness and unpredictability of collaborative networks of “amateur” developers make it very hard to reply to calls; if a research council wanted to be pro-active, it might fund travel for active developers to enable them to meet, or similar enabling measures
