
Raising the Bar with Open Source - R as an Exemplar

Kathy Gerber
University of Virginia

ITC – Research Computing Support Group
ACCS Conference - March 2008


What is R?

• A statistical software language licensed under the GPL.

• A software environment for manipulating, analyzing, and graphing data.

• An integrated programming environment, allowing users to write their own functions to do customized tasks.

• Structured around the base installation, allowing individual selection from hundreds of downloadable packages for addressing specialized tasks.

• R is supported by an active community of developers.

http://www.r-project.org
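
As a small illustration of the point about writing your own functions, here is a minimal sketch in base R. The function name `standardize` and the sample vector are made up for this example and do not come from the talk.

```r
# Hypothetical helper (not from the talk): rescale a numeric vector
# to mean 0 and standard deviation 1, ignoring missing values.
standardize <- function(x) {
  stopifnot(is.numeric(x))
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

scores <- c(12, 15, 9, 20, 14)   # made-up data
z <- standardize(scores)
print(round(z, 2))
```

Once defined, `standardize` can be called like any built-in function, which is what makes this kind of customization cheap.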


Topics for Today

• Motivation

• Open Source Software Communities

• Statistical and Mathematical Computing Communities

• The R Community

• R Intro

• How is Value Defined and Evaluated?

• Extend to Evaluating Other Projects


Motivation

How does quality become part of an open source project?

• What do we mean by quality?

• What factors are behind the success of R? Or how did the developmental trajectory of R differ from roughly comparable projects?

• Where do we place R in the larger open source world?

• Can other projects make use of lessons learned?


A Bit of R History

• Personal and technical talents of the original developers. Leadership.

• John Chambers - creator of the S language

• 1991: Ross Ihaka and Robert Gentleman

• 1993: R first announced

• 1995: R made available under the GPL

• John Fox is interviewing members of the R Core team and others for an upcoming piece.

• In 1998, the UCLA Department of Statistics, which had been one of the major users of Lisp-Stat, decided to switch to S/R.


John Chambers’ Timeline From His Use-R Conference Talk


ACM Software System Award (a partial listing)

• 1983 - UNIX

• 1986 - TeX

• 1991 - TCP/IP

• 1995 - World-Wide Web

• 1997 - Tcl/Tk

• 1998 - The S System

• 1999 - The Apache Group

• 2002 - Java

• 2005 - The Boyer-Moore Theorem Prover


Timeline with GPL

• 1983 - GNU Project

• 1989 - First GNU GPL

• 1991 - Linux kernel first announced/released

• 1993 - R binary made available on statlib

• 1995 - R source code available by ftp


Email to R-help list: Feb 14, 2008

Next month I'll be giving a talk on R as an example of high quality open source software. I think there is much to learn from R as a high quality extensible product that (at least as far as I can tell) has never been "spun" or "hyped" like so many open source fads.

The question that intrigues me the most is why R as an open source project is so incredibly successful and other projects, say for example, Octave, don't enjoy that level of success?

I have some ideas of course, but I would really like to know your thoughts when you look at R from such a vantage point.


Community: Authors & Developers

Necessity of top experts involved in implementation throughout the entire process, starting with the planning stage

Community attributes and priorities

• Mature but evolving discipline and participants – impact on matters such as determining appropriate content

• Disciplinary stewardship and pedagogical responsibility

DLMF Talk: 2005


Credits

Many thanks to members of the R Core team and the many helpful participants on the R-help list for sharing thoughtful and informative insights.

Robert Baer
Jonathan Baron
Douglas Bates
Patrick Burns
Andy Bush
Patrick Connolly
John W. Eaton *
Ben Fairbank
John Fox
John C. Frain
Paul Gilbert
Earl F. Glynn
Spencer Graves
Frank E. Harrell, Jr.
Søren Højsgaard
Roger Koenker
Jim Lemon
Michael A. Miller
Duncan Murdoch
Barry Rowlingson
John Sorkin
Phil Spector
Kevin Wright
Achim Zeileis


Quality Characteristics

• Functionality

• Reliability

• Usability

• Efficiency

• Maintainability

• Portability


Buchberger’s Mathematical Creativity Spiral
And Where, When and How do Mathematicians Compute?

Phases: Experimentation -- Exactification -- Application

http://b.kutzler.com/bk/a-pt/ped-tool.html


Quality: Developer Talent

Near one extreme: text editors

• 434 text editors on http://sourceforge.net (Mar 2, 2008).

• 147 have been downloaded 0 times.

• Total downloads for top 10: 14.7 million

[Figure: Top 10 SourceForge Text Editors. Bar chart of downloads (axis from 0 to 5,000,000) for the editors ranked 1 through 10.]


Market Share for Top Servers Across All Domains, August 1995 - February 2008
Netcraft Web Server Survey

Apache: 80 million


The ‘Lottery Syndrome’ and Recent Open Source Statistics
Michael K. Bergman
http://www.mkbergman.com/?p=148


Silos of Developer Talent


Testing (Robert Gentleman)

How distributed development is possible:

• The developer is responsible for writing examples for his/her code.

• Others are responsible for making sure they don’t break the code by running those examples.
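
The division of labor described here can be sketched in a few lines of base R. This only illustrates the idea; it is not R's actual checking machinery, which runs the examples in each package's help pages as part of `R CMD check`. The function `clamp`, the list `clamp_examples`, and the helper `run_examples` are hypothetical names invented for this sketch.

```r
# The developer's side: a function plus examples that exercise it.
clamp <- function(x, lower, upper) pmin(pmax(x, lower), upper)

clamp_examples <- list(
  function() clamp(5, 0, 10),
  function() clamp(-3, 0, 10),
  function() clamp(c(1, 50), 0, 10)
)

# Everyone else's side: after changing anything, rerun the examples
# and record which ones still run without raising an error.
run_examples <- function(examples) {
  vapply(examples, function(ex) {
    tryCatch({ ex(); TRUE }, error = function(e) FALSE)
  }, logical(1))
}

ok <- run_examples(clamp_examples)
print(ok)
```

Any `FALSE` in `ok` would signal that a change elsewhere broke one of the author's examples.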


Packaging System

Suppose you want a pie chart.

> install.packages("plotrix")
> install.packages("plotrix", dependencies = TRUE)
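
Once the package is installed, drawing the chart might look like the sketch below. The data values are made up, and the fallback to base R's `pie()` is an addition for this sketch so that it also runs where plotrix is not installed.

```r
slices <- c(A = 10, B = 15, C = 75)   # made-up numbers for illustration

# Draw into a temporary PDF so the sketch also runs without a display.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
if (requireNamespace("plotrix", quietly = TRUE)) {
  # pie3D() is provided by the plotrix package.
  plotrix::pie3D(slices, labels = names(slices), explode = 0.1)
} else {
  # Fallback: base R's pie() needs no extra package.
  pie(slices, labels = names(slices))
}
invisible(dev.off())
```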


Value: Ease of Use, Maintainability

OS neutral (Windows, Linux, Mac)

All risk averse users should like the idea that programs and acquired skills are not tied to the operating system and hardware flavor of the month. (R has excelled in this respect.) -- Paul Gilbert

Functional language

Extensible – what does it mean?


Views


Value: Inherent Interest

R was not just a clone of S. Its implementation had certain advantages over how S was implemented. This gave it some inherent interest, as opposed to just being an anti-commercial exercise.

This interestingness is probably one of the features that attracted the set of people to work on it that it did. While most potential users wouldn't have known the difference, those who knew something about statistical computing knew that the R people were not just any ol' collection of country bumpkins. --- Patrick Burns


Value: Portability

The windows port of R has been very good for a long time. I know some people who even think that the current windows port is better than the Linux version. Thanks to those who have made the windows port available and who continue to maintain it.

-- John C. Frain


Values and Value in Open Source

• Monetary worth

• Social worth

• Changes over time


Value: Price

Purchase cost is typically not so important for corporate and institutional users, since it is usually dominated by support costs. However, young users may often feel they would prefer to have their personal investment in something they can easily take with them if they move. Some of us at the other end like the idea that we don't need a corporate account to continue research we might be interested in doing when we retire.

-- Paul Gilbert on R


Value: Coolness and Awesomeness

Sometimes cool software is next to useless in getting work done. Not always.

Is R cool?

Advocacy - a help or a hindrance?


Value: Speed - Efficiency

• Raw computational speed

• Perform tasks

> sample(30,15)
 [1] 10 27  6 23  2  4 17 28 21 22  3 24  1 11 26
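
The two senses of speed on this slide can be tried out with a short sketch. The seed and the timed computation are arbitrary choices made for this illustration, and timings will vary by machine.

```r
# Speed in getting a task done: one call draws 15 distinct integers
# from 1..30, as on the slide. The seed is arbitrary and only makes
# the draw repeatable.
set.seed(2008)
draw <- sample(30, 15)
print(draw)

# Raw computational speed: rough timing of a vectorized computation
# on one million random numbers.
x <- runif(1e6)
timing <- system.time(total <- sum(sqrt(x)))
print(timing["elapsed"])
```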


Value: Flexibility

Flexibility in the present includes having the capability to make use of diverse data resources, program customizations or add-ons, and play well with other programs and tools.

Flexibility for the future means adaptability to new domain developments and new ways of doing things.


Value: Software Collaboration

• Access to databases

• Talk with similar programs and data sets

• Embed in other programs

Example 1: web services

Example 2: Sage ( http://www.sagemath.org )


Value: People Collaboration

• Able to work together in development

• Able to work together to resolve management problems

• R-help list, e.g., response to query


References

Cribari-Neto, Francisco and Zarkos, Spyros G.: R: Yet Another Econometric Programming Environment, Journal of Applied Econometrics, 14: 319-329 (1999)

Ihaka, Ross: R: Past and Future History, A Draft of a Paper for Interface ‘98

R Project: http://www.r-project.org

de Leeuw, Jan: On Abandoning XLISP-STAT, Journal of Statistical Software, Feb 2005, Vol. 13, Issue 7, http://www.jstatsoft.org/

Gentleman, Robert: R and Modern Statistical Computing, www.ssc.ca/sso/msc.ppt