MSR Conf.odp
Mining Software Repositories
What to do? And where to get data?
Israel Herraiz, Universidad Alfonso X el Sabio
June 18th 2010
Outline
What is Mining Software Repositories? What are repositories?
Conferences and journals of interest
And some words about trending topics
Tools for Mining Software Repositories
Datasets for Mining Software Repositories
For replicable and verifiable empirical studies
1. What is Mining Software Repositories?
What is Mining Software Repositories?
MSR analyzes the rich data available in software repositories to uncover interesting and actionable information about software systems and projects.
Popular topic since 2004
MSR workshop, co-located with ICSE
Working Conference since 2008
What are repositories?
Anything that leaves a trail about any software development or maintenance activities
Also includes any software artifact
Typically:
Version control systems
Bug tracking systems
Public communication tools (mailing lists)
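To make the idea of a "trail" concrete for mailing lists, here is a minimal sketch that counts messages per sender in an mbox archive using Python's standard `mailbox` module. The addresses, subjects and file name are made up for illustration, and the archive is built in a temporary directory so the sketch runs on its own; a real study would point `count_senders` at an archive downloaded from the project.

```python
import mailbox
import os
import tempfile
from collections import Counter


def count_senders(path):
    """Return a Counter mapping each From header to its number of messages."""
    box = mailbox.mbox(path)
    try:
        return Counter(msg.get("From", "unknown") for msg in box)
    finally:
        box.close()


# Hypothetical messages, for illustration only.
SAMPLE_MESSAGES = [
    ("alice@example.org", "Patch for hello.c"),
    ("bob@example.org", "Re: Patch for hello.c"),
    ("alice@example.org", "Re: Re: Patch for hello.c"),
]

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "list.mbox")

    # Build a small mbox file so that the example is self-contained.
    box = mailbox.mbox(archive)
    for sender, subject in SAMPLE_MESSAGES:
        msg = mailbox.mboxMessage()
        msg["From"] = sender
        msg["Subject"] = subject
        msg.set_payload("body text")
        box.add(msg)
    box.flush()
    box.close()

    counts = count_senders(archive)

for sender, n in counts.most_common():
    print(sender, n)
```

The same loop could collect dates or thread references instead of senders; which headers matter depends on the research question.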
Differences between artifact and repository
Artifact: source code file (hello.c)
    #include <stdio.h>
    int main() { printf("Hello world"); return 0; }
Repository (hello.c.diff)
Change to an artifact:
    - printf("Hello world");
    + printf("Hello world\n");
Meta-information:
    Author: rms
    Date: 20100618 04:34 UTC
    Change: +1 -1
    Log: Forgot to add new line
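The distinction can be sketched in a few lines of Python: the artifact is just the file content, while a repository entry pairs a change with its meta-information. The `Change` class and `summarize_change` function below are hypothetical names introduced for this illustration. The diff is the simplified two-line form shown on the slide, not a full unified diff.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """One repository entry: who changed what, when, and by how much."""
    author: str
    date: str
    log: str
    added: int
    removed: int


def summarize_change(diff_lines, author, date, log):
    """Count added ('+') and removed ('-') lines in a simplified diff body."""
    added = sum(
        1 for line in diff_lines
        if line.startswith("+") and not line.startswith("+++")
    )
    removed = sum(
        1 for line in diff_lines
        if line.startswith("-") and not line.startswith("---")
    )
    return Change(author, date, log, added, removed)


# The change from the slide: a newline is added to the printf call.
diff = [
    '- printf("Hello world");',
    '+ printf("Hello world\\n");',
]

change = summarize_change(
    diff, "rms", "20100618 04:34 UTC", "Forgot to add new line"
)
print(f"Change: +{change.added} -{change.removed}")  # prints "Change: +1 -1"
```

Mining studies typically work on many such records at once, for example by grouping them per author or per file.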
2. Conferences and journals of interest
Working conferences of interest
IEEE Int. Working Conf. Source Code Analysis & Manipulation (SCAM)
http://www.ieee-scam.org
IEEE Int. Working Conf. Mining Software Repositories (MSR)
http://msr.uwaterloo.ca
Deadlines: January (February for the challenge); April
Accept rate: 26% (2007), 38% (2008), 45% (2009); 19% (2008), 31% (2010)
Journal possibilities: JSS, SCP; EMSE, IEEE TSE
Conferences of interest
IEEE Int. Conf. Software Engineering (ICSE)
http://www.sbs.co.za/ICSE2010/
IEEE Int. Conf. Software Maintenance (ICSM)
http://icsm2010.upt.ro/
Deadlines: April; August/September
Accept rate: 15% (2008), 12% (2009), 14% (2010); 21% (2007), 26% (2008), 22% (2009)
Journal possibilities: no special issues; no special issues
Empirical Software Eng. & Measurement (ESEM)
http://www.esem-conferences.org/
Deadlines: March
Accept rate: ?
Journal possibilities: EMSE
Other interesting conferences
Working Conference on Reverse Engineering (WCRE)
http://web.soccerlab.polymtl.ca/wcre2010/
International Conference on Predictive Models and Software Engineering (PROMISE)
http://promisedata.org/
European Conference on Software Maintenance and Re-engineering (CSMR)
http://www.sait.escet.urjc.es/csmr2010/
Journals of interest
IEEE Transactions on Software Engineering (TSE)
http://www.computer.org/tse/
ACM Transactions on Software Engineering and Methodology (TOSEM)
http://tosem.acm.org/
Empirical Software Engineering (EMSE)
http://www.springerlink.com/content/1382-3256
Journal of Systems and Software (JSS)
http://www.elsevier.com/locate/jss
Journal of Software Maintenance and Evolution (JSME)
http://eu.wiley.com/WileyCDA/WileyTitle/productCd-SMR.html
Handy links
Software Engineering Conferences
Verification, Formal Methods, Programming Lang. and Compilers, Web, Security
http://people.engr.ncsu.edu/txie/seconferences.htm
Upcoming Software Engineering Conferences Map
http://research.csc.ncsu.edu/ase/semap/
Trending topics
Replication of empirical studies
The replication package
Recommendation systems
Automated Software Engineering
3. Tools for Mining Software Repositories
Tools for Mining Software Repositories
Mining tools
Libresoft Tools
http://tools.libresoft.es/
CVSAnaly: CVS/SVN/Git repository log parser
MLStats: Mailman and mbox parser
Bicho: Bugzilla and SF.net tracker parser
Software Architecture Group (SWAG), University of Waterloo
http://www.swag.uwaterloo.ca/tools.html
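As a rough idea of what a version control log parser does, the sketch below turns text shaped like the output of `git log --numstat --pretty=format:"commit|%H|%an|%aI"` into per-commit records and sums the churn (lines added plus removed) per file. It is not how CVSAnaly or any other tool listed here is implemented. The commits in `SAMPLE_LOG` are invented, the function names are hypothetical, and the `|` separator assumes author names contain no `|` character.

```python
from collections import defaultdict

# Invented log text, roughly in the shape printed by:
#   git log --numstat --pretty=format:"commit|%H|%an|%aI"
SAMPLE_LOG = """\
commit|a1b2c3|alice|2010-06-17T10:00:00+00:00
3\t1\tsrc/main.c
10\t0\tREADME

commit|d4e5f6|bob|2010-06-18T04:34:00+00:00
1\t1\tsrc/main.c
"""


def parse_numstat_log(text):
    """Parse the log text into a list of commit dictionaries."""
    commits = []
    current = None
    for line in text.splitlines():
        if line.startswith("commit|"):
            _, sha, author, date = line.split("|", 3)
            current = {"sha": sha, "author": author, "date": date, "files": []}
            commits.append(current)
        elif line.strip() and current is not None:
            added, removed, path = line.split("\t", 2)
            # --numstat prints "-" for binary files; count those as 0 lines.
            current["files"].append((
                path,
                int(added) if added.isdigit() else 0,
                int(removed) if removed.isdigit() else 0,
            ))
    return commits


def churn_per_file(commits):
    """Sum added plus removed lines for every file across all commits."""
    churn = defaultdict(int)
    for commit in commits:
        for path, added, removed in commit["files"]:
            churn[path] += added + removed
    return dict(churn)


commits = parse_numstat_log(SAMPLE_LOG)
churn = churn_per_file(commits)
print(len(commits), churn)
```

In practice the log text would come from running the `git log` command on a cloned repository; the parsed records are then usually loaded into a database for querying.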
4. Datasets for Mining Software Repositories
MSR Mining Challenge
Mirrors of the version archives and bug databases for Mozilla Firefox and Eclipse
http://msr.uwaterloo.ca/msr2008/challenge/
Repository logs of more than 500 GNOME projects, XML dump of the bug databases, and the complete SVN repositories of 69 GNOME projects
http://msr.uwaterloo.ca/msr2009/challenge/
Ultimate Debian Database
Database with information about packages and bug reports of Debian and Ubuntu
http://udd.debian.org/
Eclipse bug database
Saarland University
Datasheets, databases, scripts, with information about Eclipse bug reports for several releases
http://www.st.cs.uni-saarland.de/softevo/bug-data/eclipse/
FLOSSMetrics
Databases about ~5000 open source projects
Version control repositories, mailing list archives, bug tracking databases
MySQL dumps
Not very user friendly
Obtained using the Libresoft Tools
http://www.flossmetrics.org/
FLOSSMole
Database with information about all the SourceForge.net projects
~150,000 projects
Mainly metainformation, obtained through parsing the web pages of the projects
No low level or fine grained information
http://flossmole.org
PROMISE repository
All PROMISE papers must also submit a package with the data used in the paper
http://promisedata.org/
101 datasets:
Defect prediction (58)
Effort prediction (18)
General (9)
Model-based SE (7)
Text mining (9)
http://www.uax.es