OntoSoft: A Distributed Semantic Registry for Scientific Software - 2016...
Yolanda Gil, USC Information Sciences Institute
OntoSoft: A Distributed Semantic Registry for
Scientific Software
Yolanda Gil, Daniel Garijo, Saurabh Mishra, Varun Ratnakar
Information Sciences Institute and Department of Computer Science
University of Southern California
@yolandagil, @dgarijov
{gil,dgarijo,saurabhm,varunr}@isi.edu
http://www.ontosoft.org
Building Block
Yolanda Gil, Daniel Garijo, Saurabh Mishra, Varun Ratnakar eScience 2016
We have all been here…
The Value of Software: Reproducibility
Financial
Human lives
Reliability
Scientific integrity
Financial
Trust
Retracted Scientific Studies: A Growing List - NYTimes.com, 5/29/15
http://www.nytimes.com/interactive/2015/05/28/science/retractions-scientific-studies.html?smid=tw-nytimesscience&_r=1
The retraction by Science of a study of changing attitudes about gay marriage is
the latest prominent withdrawal of research results from scientific literature.
And it very likely won't be the last. A 2011 study in Nature found a 10-fold
increase in retraction notices during the preceding decade.
Many retractions barely register outside of the scientific field. But in some
instances, the studies that were clawed back made major waves in societal
discussions of the issues they dealt with. This list recounts some prominent
retractions that have occurred since 1980.
Vaccines and Autism
In 1998, The Lancet, a British medical journal,
published a study by Dr. Andrew Wakefield
that suggested that autism in children was
caused by the combined vaccine for measles,
mumps and rubella. In 2010, The Lancet
retracted the study following a review of Dr.
Wakefield's scientific methods and financial
conflicts.
Despite challenges to the study, Dr.
Wakefield's research had a strong effect on
many parents. Vaccination rates tumbled in
Britain, and measles cases grew. American
antivaccine groups also seized on the research.
The United States had more cases of measles in
the first month of 2015 than the number that is
typically diagnosed in a full year.
Stem Cell Production
Papers published by Japanese researchers in Nature in 2014 claimed to provide
an easy method to create multipurpose stem cells, with eventual implications
for the treatment of diseases and injuries. Months later, the authors, including
Haruko Obokata, issued a retraction. An investigation by one of Japan's most
prestigious scientific institutes, where much of the research occurred, found
that the author had manipulated some of the images published in the study.
Approximately one month after the retraction, one of Ms. Obokata's co-authors,
Yoshiki Sasai, was found hanging in a stairwell of his office. He had taken his
own life.
Quantifying the Value of Software through
“Reproducibility Maps” [Bourne & Gil et al 12]
2 months of effort to reproduce a published method (in PLoS’10)
Authors’ expertise was required
Comparison of ligand binding sites
Comparison of dissimilar protein structures
Graph network generation
Molecular Docking
Work with P. Bourne of UCSD
Software Today
There are repositories of domain specific software (e.g., geosciences)
There are general software repositories with no standard metadata
Most scientists are not aware of the value of their software
“Dark Software”
Models that are not published
• E.g., from a PhD thesis
Data preparation software
• Data pre-processing and QC can take up to 80% of a project’s effort
Visualization software
“Dark Software” is the counterpart of “Dark Data” [Heidorn 2008]
Why Is Software Not Shared?
“No one would use my code if I shared it”
“My code is really bad”
“My code is not ready to be shared”
“Sharing my software will take a lot of time”
“I won’t get anything out of sharing my software”
“I’ve shared software before, bad things happened”
“I work for the government”
“I want to commercialize my software”
“I don’t want anyone to sell my software”
“I don’t know where to start!”
Contributions: OntoSoft
Registry for software
• Complements code repositories
• Scientist-centered software metadata
• Community curated software metadata
• Training scientists on best practices
OntoSoft Architecture
[Figure: OntoSoft architecture diagram. Labels recoverable from the figure:]
• OntoSoft Software Metadata Repository (Ontologies; Geosciences OntoSoft software metadata; import, publish, query)
• OntoSoft User Interface (Publish, Browse/Search; query)
• Domain-Specific UI
• External Repository, Push (GitHub, Apache SVN, CSDMS, …)
• External Repository, Pull (NOAA, …)
• Adapters (e.g., BMI): CSDMS, CF, ESMF, …
• Standard Names
• Domain Ontologies
• OntoSoft Training (Lessons, Videos)
• VM Environment Generator (Docker, Vagrant, …)
• Solr Search Index
• Recommend
• Metadata Access Control (PROV, Web Access Control)
• Other OntoSoft Installations
• Legend: OntoSoft components, External components
The OntoSoft Ontology for Describing
Scientific Software Metadata [Gil et al 2015]
An ontology for scientific software metadata
• Intended to describe scientific software
• Designed with scientists in mind to guide them to deposit and describe their software in a software registry
Major categories of metadata: what does a scientist need?
1. identify software,
2. understand what it does and its utility for research,
3. execute the software,
4. get support if questions arise,
5. do research with it, and
6. contribute to its development
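The six categories above can be sketched as a small data structure. This is only an illustrative sketch, not the OntoSoft ontology itself: the class names, the entry name "ExampleModel", and the property names (`name`, `operating_system`) are hypothetical and chosen for readability.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    """The six things a scientist needs to do, as listed on the slide."""
    IDENTIFY = "identify software"
    UNDERSTAND = "understand what it does and its utility for research"
    EXECUTE = "execute the software"
    GET_SUPPORT = "get support if questions arise"
    DO_RESEARCH = "do research with it"
    CONTRIBUTE = "contribute to its development"


@dataclass
class SoftwareEntry:
    """One registry entry; property names used below are hypothetical."""
    name: str
    # Maps a Category to a dict of {property name: value}.
    metadata: dict = field(default_factory=dict)

    def describe(self, category, prop, value):
        """Record one metadata property under the given category."""
        self.metadata.setdefault(category, {})[prop] = value

    def categories_described(self):
        """Return the categories that have at least one property filled in."""
        return [c for c in Category if self.metadata.get(c)]


entry = SoftwareEntry("ExampleModel")
entry.describe(Category.IDENTIFY, "name", "ExampleModel")
entry.describe(Category.EXECUTE, "operating_system", "Linux")
described = [c.name for c in entry.categories_described()]
print(described)
```

Grouping properties by category, rather than storing a flat list, mirrors the idea that metadata should be organized around what a scientist is trying to do.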
OntoSoft Metadata Categories
http://www.ontosoft.org/software
Describing Scientific Software in OntoSoft
http://www.ontosoft.org/portal
Metadata can be exported in several formats (HTML, RDF, JSON)
Metadata for 3DDY Software
Metadata properties collected through simple questions
Set permissions for 3DDY
Metadata properties organized into categories that make sense to scientists
Automatic import of metadata from other repositories
Indicators of metadata completeness
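One way to picture a completeness indicator together with JSON export is the sketch below. It assumes completeness is simply the fraction of expected properties filled in per category; the slides do not say how the portal computes it. The category keys, property names, and the "ExampleModel" entry are all hypothetical.

```python
import json

# Hypothetical expected properties per category (assumption for illustration;
# the real ontology defines its own property lists).
EXPECTED = {
    "identify": ["name", "creator"],
    "understand": ["description", "domain"],
    "execute": ["operating_system", "dependencies"],
}


def completeness(metadata):
    """Return, per category, the fraction of expected properties filled in."""
    result = {}
    for category, props in EXPECTED.items():
        filled = sum(1 for p in props if metadata.get(category, {}).get(p))
        result[category] = filled / len(props)
    return result


def export_json(name, metadata):
    """Serialize an entry and its completeness scores as a JSON string."""
    record = {
        "name": name,
        "metadata": metadata,
        "completeness": completeness(metadata),
    }
    return json.dumps(record, indent=2, sort_keys=True)


metadata = {
    "identify": {"name": "ExampleModel", "creator": "A. Scientist"},
    "understand": {"description": "A toy hydrology model"},
}
scores = completeness(metadata)
exported = export_json("ExampleModel", metadata)
print(exported)
```

A per-category score, instead of one overall number, shows the author which kind of information is still missing.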
Access control
http://www.ontosoft.org/portal
Users and permissions for the 3DDY software component
Setting permissions for editing 3DDY metadata
W3C Web Access Control ontology
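A permission check in the style of the W3C Web Access Control vocabulary might look like the sketch below. The term names (`agent`, `accessTo`, `mode`, `Read`, `Write`) come from the ACL namespace `http://www.w3.org/ns/auth/acl#`; everything else, including the example.org URIs, the users, and the `can` helper, is hypothetical. The slides do not show how OntoSoft stores or evaluates these authorizations.

```python
ACL = "http://www.w3.org/ns/auth/acl#"

# Hypothetical resource and users, for illustration only.
ENTRY = "https://example.org/software/ExampleModel"
ALICE = "https://example.org/users/alice#me"
BOB = "https://example.org/users/bob#me"

# Each authorization grants one agent a set of access modes on one resource.
authorizations = [
    {"agent": ALICE, "accessTo": ENTRY, "mode": {ACL + "Read", ACL + "Write"}},
    {"agent": BOB, "accessTo": ENTRY, "mode": {ACL + "Read"}},
]


def can(agent, resource, mode):
    """Return True if any authorization grants `mode` on `resource` to `agent`."""
    return any(
        auth["agent"] == agent
        and auth["accessTo"] == resource
        and ACL + mode in auth["mode"]
        for auth in authorizations
    )


print(can(ALICE, ENTRY, "Write"))
print(can(BOB, ENTRY, "Write"))
```

In this sketch access is denied unless an authorization explicitly grants it, so a user with no matching record cannot edit the metadata.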
Software entries from distributed repositories are readily accessible
Semantic search
Comparison matrix of software entries
PIHM PIHMgis DrEICH TauDEM WBMsed
Metadata completion highlighted
Software is contrasted by property
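A comparison matrix that contrasts entries by property can be sketched as follows. The tool names and property values here are made up (they are not the metadata of the tools named on the slide), and the two helper functions are hypothetical.

```python
def comparison_matrix(entries, properties):
    """Build rows of [property, value per entry]; None marks missing metadata."""
    names = list(entries)
    rows = [["property"] + names]
    for prop in properties:
        rows.append([prop] + [entries[name].get(prop) for name in names])
    return rows


def differing_properties(entries, properties):
    """Return properties whose values are not identical across all entries.

    A missing value counts as a difference, so gaps are surfaced too.
    """
    return [
        prop
        for prop in properties
        if len({entries[name].get(prop) for name in entries}) > 1
    ]


# Hypothetical entries and values, for illustration only.
entries = {
    "ToolA": {"language": "C", "license": "MIT"},
    "ToolB": {"language": "Python", "license": "MIT"},
    "ToolC": {"language": "C"},
}
properties = ["language", "license"]

matrix = comparison_matrix(entries, properties)
for row in matrix:
    print(row)
print(differing_properties(entries, properties))
```

Keeping missing values as `None` in the matrix lets a user interface highlight incomplete metadata in the same view that contrasts the entries.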
Community
Learning
UK Software Institute
Software Carpentry
CIG
ESMF
Critical Zone Observatory
Early Career Advisory Board
FES/ESIP
CSDMS
EarthCube Building Blocks
Recommender system • Interoperability
Publication
Community
Learning
Structured metadata • Interactive advice
• Best practices • Multimedia lessons
Collaborating with SEN C4P EC3
EarthCube RCNs
Publication
Omics
Code meta initiative
Conclusions
Software is a valuable research product
• Must embed best practices of software sharing into research activities
Improve productivity, quality, reproducibility
OntoSoft contributions
• Ontology of scientific software metadata
• Portal for software registry
Do you want to use OntoSoft? Let us know!
http://www.ontosoft.org
http://www.ontosoft.org/software
http://www.ontosoft.org/portal
More Information
http://www.ontosoft.org
http://www.ontosoft.org/software
http://www.ontosoft.org/portal
http://www.ontosoft.org/gpf
OntoSoft: Capturing Scientific Software Metadata. Yolanda Gil, Varun Ratnakar, and Daniel Garijo. Proceedings of the Eighth ACM International Conference on Knowledge Capture (K-CAP), 2015.
OntoSoft: A Distributed Semantic Registry for Scientific Software. Yolanda Gil, Daniel Garijo, Saurabh Mishra, and Varun Ratnakar. Under review, 2016.
DRAT: An Unobtrusive, Scalable Approach to Large Scale Software License Analysis. Chris A. Mattmann, Ji-Hyun Oh, Tyler Palsulich, Lewis John McGibbney, Yolanda Gil, and Varun Ratnakar. Proceedings of the Fourth International Workshop on Software Mining, held in conjunction with the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2015.
Cyber-Innovated Watershed Research at the Shale Hills Critical Zone Observatory. Xuan Yu, Chris Duffy, Yolanda Gil, Lorne Leonard, Gopal Bhatt, and Evan Thomas. IEEE Systems Journal, to appear.
Collaborative Software Development Needs in Geosciences. Yolanda Gil, Eunyoung Moon and James Howison. Proceedings of the Second Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE2), held in conjunction with the IEEE ACM International Conference on High Performance Computing (SC), New Orleans, LA, November 2014.
Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users. Daniel Garijo, Oscar Corcho, Yolanda Gil, Meredith N. Braskie, Derrek Hibar, Xue Hua, Neda Jahanshad, Paul Thompson, and Arthur W. Toga. Proceedings of the IEEE Conference on e-Science, 2014.
FragFlow: Automated Fragment Detection in Scientific Workflows. Daniel Garijo, Oscar Corcho, Yolanda Gil, Boris A. Gutman, Ivo D. Dinov, Paul Thompson, and Arthur W. Toga. Proceedings of the IEEE Conference on e-Science, Guarujá, Brazil, October 2014.
An Overview of Mobile Applications for Field Science. Anna Zeng, Kevin Zeng, Yolanda Gil, and Matty Mookerjee. GeoSoft Project Report, September 2014.
The CSDMS Standard Names: Cross-Domain Naming Conventions for Describing Process Models, Data Sets and Their Associated Variables. Scott D. Peckham. Proceedings of the Seventh International Congress on Environmental Modeling and Software, San Diego, CA, June 2014.
Web Applications that Share Level-12 HUC Data and Models of the CONUS. Lorne Leonard and Chris Duffy. Proceedings of the Seventh International Congress on Environmental Modeling and Software, San Diego, CA, June 2014.
Intelligent Workflow Systems and Provenance-Aware Software. Yolanda Gil. Proceedings of the Seventh International Congress on Environmental Modeling and Software, San Diego, CA, June 2014.
Acknowledgements
The OntoSoft project team includes Chris Duffy (PSU), Chris Mattmann (JPL),
Scott Peckham (CU), Ji-Hyun Oh (USC), Varun Ratnakar (USC), and Erin
Robinson (ESIP)
Thank you to James Howison (UT), Lisa Kempler (MathWorks), and Greg Wilson
(Software Carpentry) for their feedback on best practices for software sharing
Thank you to the scientists and other colleagues that have contributed ideas
and asked hard questions about software stewardship
Thank you to the National Science Foundation and the EarthCube program for
supporting this work
EarthCube: ICER-1440323, ICER-1343800
http://www.ontosoft.org
http://www.ontosoft.org/software
http://www.ontosoft.org/portal
http://www.ontosoft.org/gpf