Raising the Bar with Open Source - R as an Exemplar Kathy Gerber University of Virginia ITC –...
Raising the Bar with Open Source - R as an Exemplar
Kathy Gerber, University of Virginia
ITC – Research Computing Support Group
ACCS Conference - March 2008
What is R?
• A statistical software language licensed under the GPL.
• A software environment for manipulating, analyzing, and graphing data.
• An integrated programming environment, allowing users to write their own functions to do customized tasks.
• Structured around the base installation, allowing individual selection from hundreds of downloadable packages for addressing specialized tasks.
• R is supported by an active community of developers.
http://www.r-project.org
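As a small illustration of the "write their own functions" point above, here is a minimal sketch of a user-defined function applied to made-up data. The name `trimmed_range` and the sample values are hypothetical, chosen only for this example.

```r
# Hypothetical helper: range of a vector after dropping the k smallest
# and k largest values (a simple customized summary).
trimmed_range <- function(x, k = 1) {
  stopifnot(is.numeric(x), length(x) > 2 * k)
  x_sorted <- sort(x)
  kept <- x_sorted[(k + 1):(length(x_sorted) - k)]
  max(kept) - min(kept)
}

scores <- c(2, 51, 48, 55, 47, 99)   # made-up data with two outliers
trimmed_range(scores)                # range of the middle values 47, 48, 51, 55
```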
Topics for Today
• Motivation
• Open Source Software Communities
• Statistical and Mathematical Computing Communities
• The R Community
• R Intro
• How is Value Defined and Evaluated?
• Extend to Evaluating Other Projects
Motivation
• How does quality become part of an open source project?
• What do we mean by quality?
• What factors are behind the success of R? Or how did the developmental trajectory of R differ from roughly comparable projects?
• Where do we place R in the larger open source world?
• Can other projects make use of lessons learned?
A Bit of R History
• Personal and technical talents of the original developers. Leadership.
• John Chambers - creator of the S language
• 1991: Ross Ihaka and Robert Gentleman
• 1993: R first announced
• 1995: R made available under the GPL
• John Fox is interviewing members of the R Core team and others for an upcoming piece.
• In 1998, the UCLA Department of Statistics, which had been one of the major users of Lisp-Stat, decided to switch to S/R.
John Chambers’ Timeline From His Use-R Conference Talk
ACM Software System Award (a partial listing)
• 1983 - UNIX
• 1986 - TeX
• 1991 - TCP/IP
• 1995 - World-Wide Web
• 1997 - Tcl/Tk
• 1998 - The S System
• 1999 - The Apache Group
• 2002 - Java
• 2005 - The Boyer-Moore Theorem Prover
Timeline with GPL
• 1983 - GNU Project
• 1989 - First GNU GPL
• 1991 - Linux kernel first announced/released
• 1993 - R binary made available on statlib
• 1995 - R source code available by ftp
Email to R-help list: Feb 14, 2008
Next month I'll be giving a talk on R as an example of high quality open source software. I think there is much to learn from R as a high quality extensible product that (at least as far as I can tell) has never been "spun" or "hyped" like so many open source fads.
The question that intrigues me the most is why R as an open source project is so incredibly successful while other projects, say for example Octave, don't enjoy that level of success.
I have some ideas of course, but I would really like to know your thoughts when you look at R from such a vantage point.
Community: Authors & Developers
Necessity of top experts involved in implementation throughout the entire process, starting with the planning stage
Community attributes and priorities
• Mature but evolving discipline and participants – impact on matters such as determining appropriate content
• Disciplinary stewardship and pedagogical responsibility
DLMF Talk: 2005
Credits
Many thanks to members of the R Core team and the many helpful participants on the R-help list for sharing thoughtful and informative insights.
Robert Baer; Jonathan Baron; Douglas Bates; Patrick Burns; Andy Bush; Patrick Connolly; John W. Eaton *; Ben Fairbank; John Fox; John C. Frain; Paul Gilbert; Earl F. Glynn; Spencer Graves; Frank E. Harrell, Jr.; Søren Højsgaard; Roger Koenker; Jim Lemon; Michael A. Miller; Duncan Murdoch; Barry Rowlingson; John Sorkin; Phil Spector; Kevin Wright; Achim Zeileis
Quality Characteristics
• Functionality
• Reliability
• Usability
• Efficiency
• Maintainability
• Portability
Buchberger’s Mathematical Creativity Spiral
And Where, When and How do Mathematicians Compute?
Phases
• Experimentation
• Exactification
• Application
http://b.kutzler.com/bk/a-pt/ped-tool.html
Quality: Developer Talent
Near one extreme: text editors
• 434 text editors on http://sourceforge.net (Mar 2, 2008).
• 147 have been downloaded 0 times.
• Total downloads for top 10: 14.7 million
Top 10 SourceForge Text Editors
[Bar chart: downloads for each of the top 10 SourceForge text editors, ranked 1 to 10; vertical axis "Downloads", from 0 to 5,000,000]
Market Share for Top Servers Across All Domains, August 1995 - February 2008
Netcraft Web Server Survey
Apache: 80 million
The ‘Lottery Syndrome’ and Recent Open Source Statistics
Michael K. Bergman
http://www.mkbergman.com/?p=148
Silos of Developer Talent
Testing (Robert Gentleman)
• How distributed development is possible.
• The developer is responsible for writing examples for his/her code.
• Others are responsible for making sure they don’t break the code by running those examples.
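The division of labor described above can be sketched in a few lines. This is only an illustration of the idea, not how R's own tooling is implemented (in practice `R CMD check` runs the examples in a package's help pages). The names `center`, `center_examples`, and `run_examples` are hypothetical.

```r
# Hypothetical package function plus the examples its author wrote for it.
center <- function(x) x - mean(x)

center_examples <- expression(
  center(c(1, 2, 3)),
  center(10)
)

# Anyone changing the code can rerun the author's examples; an error in
# any of them signals that the change broke documented behaviour.
run_examples <- function(examples, env = parent.frame()) {
  vapply(seq_along(examples), function(i) {
    tryCatch({
      eval(examples[[i]], envir = env)
      TRUE
    }, error = function(e) FALSE)
  }, logical(1))
}

run_examples(center_examples)   # one TRUE/FALSE per example
```

Note that this sketch only checks that each example still runs without error; it does not compare results against stored output.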
Packaging System
Suppose you want a pie chart.
> install.packages("plotrix")
> install.packages("plotrix", dependencies = TRUE)
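Once the package is installed, drawing the chart might look like the sketch below. The data are made up, and the fallback to base R's `pie()` is an addition for illustration so that the code also runs where plotrix is not available.

```r
# Made-up shares, for illustration only.
slices <- c(Linux = 40, Windows = 35, Mac = 25)

pdf(NULL)  # null graphics device, so the sketch also runs non-interactively
if (requireNamespace("plotrix", quietly = TRUE)) {
  # pie3D() comes from the plotrix package installed above.
  plotrix::pie3D(slices, labels = names(slices), explode = 0.1)
} else {
  # Fallback: base R's pie() needs no extra package.
  pie(slices, labels = names(slices))
}
invisible(dev.off())
```

In an interactive session, drop the `pdf(NULL)` and `dev.off()` lines to see the chart on screen.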
Value: Ease of Use, Maintainability
OS neutral (Windows, Linux, Mac)
All risk averse users should like the idea that programs and acquired skills are not tied to the operating system and hardware flavor of the month. (R has excelled in this respect.) -- Paul Gilbert
Functional language
Extensible – what does it mean?
Views
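One way to read "functional language" and "extensible" together: in R, functions are ordinary values, so users can build new tools out of existing ones. A minimal sketch, with `make_scaler` and `to_percent` as hypothetical names:

```r
# Functions can be passed to other functions...
squares <- sapply(1:5, function(n) n^2)

# ...and returned from them, which lets users build new tools on the fly.
make_scaler <- function(factor) {
  function(x) x * factor
}
to_percent <- make_scaler(100)

squares            # 1 4 9 16 25
to_percent(0.25)   # 25
```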
Value: Inherent Interest
R was not just a clone of S. Its implementation had certain advantages over how S was implemented. This gave it some inherent interest, as opposed to just being an anti-commercial exercise.
This interestingness is probably one of the features that attracted the set of people to work on it that it did. While most potential users wouldn't have known the difference, those who knew something about statistical computing knew that the R people were not just any ol' collection of country bumpkins. --- Patrick Burns
Value: Portability
The windows port of R has been very good for a long time. I know some people who even think that the current windows port is better than the Linux version. Thanks to those who have made the windows port available and who continue to maintain it.
-- John C. Frain
Values and Value in Open Source
• Monetary worth
• Social worth
• Changes over time
Value: Price
Purchase cost is typically not so important for corporate and institutional users, since it is usually dominated by support costs. However, young users may often feel they would prefer to have their personal investment in something they can easily take with them if they move. Some of us at the other end like the idea that we don't need a corporate account to continue research we might be interested in doing when we retire.
-- Paul Gilbert on R
Value: Coolness and Awesomeness
Sometimes cool software is next to useless in getting work done. Not always.
Is R cool?
Advocacy - a help or a hindrance?
Value: Speed - Efficiency
• Raw computational speed
• Perform tasks
> sample(30,15)
 [1] 10 27  6 23  2  4 17 28 21 22  3 24  1 11 26
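The two senses of speed above can be sketched as follows: raw computational speed is measured with `system.time()`, while "perform tasks" is about how little code a job takes. The data, the seed, and the name `loop_sum` are assumptions made for this example, and timings will vary by machine.

```r
x <- runif(1e5)   # made-up data

# Explicit loop, written out by hand.
loop_sum <- function(v) {
  total <- 0
  for (value in v) total <- total + value
  total
}

# Raw speed: elapsed time of the loop versus the built-in vectorized sum().
loop_time <- system.time(loop_result <- loop_sum(x))[["elapsed"]]
vec_time  <- system.time(vec_result  <- sum(x))[["elapsed"]]

all.equal(loop_result, vec_result)        # same answer either way
c(loop = loop_time, vectorized = vec_time)

# Task speed: drawing 15 of the numbers 1..30 without replacement is one call.
set.seed(2008)                            # arbitrary seed, for reproducibility
draws <- sample(30, 15)
```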
Value: Flexibility
Flexibility in the present includes having the capability to make use of diverse data resources, program customizations or add-ons, and play well with other programs and tools.
Flexibility for the future means adaptability to new domain developments and new ways of doing things.
Value: Software Collaboration
• Access to databases
• Talk with similar programs and data sets
• Embed in other programs
Example 1: web services
Example 2: Sage ( http://www.sagemath.org )
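As a minimal sketch of exchanging data with other programs, the code below round-trips a small table through a CSV file, a format most statistical packages and spreadsheets can read. The data frame and file are made up for illustration; database access and web services need additional packages and are not shown.

```r
# Made-up measurements to share with another program.
measurements <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))

# Write to a temporary CSV file, then read it back as another tool would.
csv_path <- tempfile(fileext = ".csv")
write.csv(measurements, csv_path, row.names = FALSE)
restored <- read.csv(csv_path)

all.equal(measurements, restored)   # the round trip preserves the table
unlink(csv_path)                    # clean up the temporary file
```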
Value: People Collaboration
• Able to work together in development
• Able to work together to resolve management problems
• R-help list, e.g., response to query
References
Cribari-Neto, Francisco and Zarkos, Spyros G.: R: Yet Another Econometric Programming Environment, Journal of Applied Econometrics, 14: 319-329 (1999)
Ihaka, Ross: R: Past and Future History, A Draft of a Paper for Interface ‘98
R Project: http://www.r-project.org
de Leeuw, Jan: On Abandoning XLISP-STAT, Journal of Statistical Software, Feb 2005, Vol. 13, Issue 7, http://www.jstatsoft.org/
Gentleman, Robert: R and Modern Statistical Computing, www.ssc.ca/sso/msc.ppt