Post on 31-Jul-2015
Better Software, Better Research
Dr. Boris Adryan @BorisAdryan
http://www.software.ac.uk @SoftwareSaved
Software Sustainability Institute
brief bio & experience
since 2015 Fellow of the SSI
since 2013 IoT entrepreneur
2008-2016 Royal Society research group leader at University of Cambridge
2011-2015 Scientific advisor to FlyBase
2012-2015 MPhil Director for Computational Biology
‣ a UK government-funded “virtual institute” for building better, sustainable software
‣ primarily focussed on academic software but very inclusive to industry partners
‣ distributed team with a few members at universities in Southampton, Oxford, Manchester and Edinburgh plus a vast network of independent fellows “in the field”
software ‣ good, reusable code ‣ well documented
people ‣ recognition and reward ‣ career paths
values ‣ reproducibility ‣ openness
policy ‣ raise awareness ‣ establish facts
Survey results: http://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers
do you use research software? yes 92%, no 8%
do you develop research software? yes 56%, no 44%
have you received training in software development? yes 79%, no 21%
impact of not having research software: no difference 10%, not be practical 21%, more effort 69%
‣ Software reaches boundaries that prevent improvement, growth and adoption
‣ Providing the expertise and services needed to negotiate to the next stage:
✓ software reviews and refactoring
✓ collaborations between stakeholders (Hi, Eclipse!)
✓ guidance and best practice on software development
✓ training (e.g. Software Carpentry)
✓ project management
✓ community building
✓ publicity etc…
Software Sustainability Institute
Work better. Together.
Issues with research software
Exemplified by the honest account and anecdotes of
‣ bad coding, ‣ bad design decisions, and ‣ bad practice
of a humble biologist.
coding skills
school: Turbo Pascal, Turbo Prolog
independent developer: Borland Delphi
undergraduate and PhD student
postdoc: Perl, R, SQL
hobbyist + entrepreneur: Python, C, Node.js, Clojure, NoSQL
CTRL+F1
1992
1995
2005-present
2010-present
‣ unsupervised undergraduate project
‣ inspired by the need of a PhD student
‣ no software manual or help
‣ requests for code: 0
‣ URL is long dead, no idea about the whereabouts of the code
very generous for the time!
‣ addressed my own needs as a biologist (“got the job done”)
‣ horrible mix of object-oriented and spaghetti code
‣ required complex manipulations in the source to update information that quickly became outdated
‣ requests for code: many, but I was too embarrassed to put it on SourceForge
“If you would like to adapt GO-Cluster to your personal needs and want the source code (only fairly commented), please contact my group leader Dr. Reinhard Schuh.”
‣ there’s virtually no Objective-C adoption in the scientific community
BAD SCIENCE
“All other data analyses were performed using custom-written Perl scripts or publicly available websites.”
“All downstream analyses were performed with custom-made Perl scripts.”
“All data analysis was performed using custom-written Perl scripts and statistical tests were performed with R.”
Embarrassingly unscientific quotes from a few of my data analysis papers between 2005 and 2008,
i.e.: “f$@k you, I can’t be asked to tell you what I did!” in combination with mostly uncommented, write-only, execute-once scripts
OPEN DATA, OPEN SOURCE, OPEN ACCESS, OPEN SCIENCE
since early 2010s: increased pressure in the community not only to release data, but also tools
‣ sometimes requested by journals
‣ often required to appease reviewers
‣ frequent naming and shaming on Twitter
simple Perl CGI script with MySQL backend
‣ easy to update content :-) ‣ no analytical capability :-(
using the InterMine framework, based on Java, ASP, Ajax and PostgreSQL
‣ fancy features and looks :-) ‣ requires a specialist to do any update :-(
FlyTF is a gold standard, but has never been funded!
The technical upgrade (feature-rrhea) was motivated by the fact that content-only updates are hard to publish.
‣ Java
‣ hardware- and OS-independent
‣ GUI and config files
‣ extensive documentation for end-users and programmers
‣ code refactored regularly to ease readability for novices
‣ all source on GitHub
Issues with (academic) software development
‣ typically little or no dedicated budget for software development on scientific grants
‣ even if funded, resources are often too limited to adhere to best practices (e.g. lack of a planning phase)
‣ often very ad hoc, with a focus on getting ‘one job done’ rather than with reuse and sustainability in mind
‣ there’s no credit for writing good software
‣ code generated by ‘amateurs’, with a high turnover of people with skills
‣ academic salaries are poor compared to industry salaries, so it’s hard to get professional software developers
This presentation is on Slideshare: http://www.slideshare.net/BorisAdryan
For the community. Driven by individuals. Us.