The PDB An Exemplar for Data Science To Date, But What About the Future?
![Page 1: The PDB An Exemplar for Data Science To Date, But What About the Future?](https://reader035.fdocuments.us/reader035/viewer/2022070313/5549975db4c90507608b494f/html5/thumbnails/1.jpg)
The PDB An Exemplar for Data Science To Date, But What About the Future?
Philip E. Bourne Ph.D.
Associate Director for Data Science
National Institutes of Health
![Page 2: The PDB An Exemplar for Data Science To Date, But What About the Future?](https://reader035.fdocuments.us/reader035/viewer/2022070313/5549975db4c90507608b494f/html5/thumbnails/2.jpg)
Background
6/12 2/14 3/14
Findings:
• Sharing data & software through catalogs
• Support methods and applications development
• Need more training
• Hire CSIO
• Continued support throughout the lifecycle
http://acd.od.nih.gov/diwg.htm
![Page 3: The PDB An Exemplar for Data Science To Date, But What About the Future?](https://reader035.fdocuments.us/reader035/viewer/2022070313/5549975db4c90507608b494f/html5/thumbnails/3.jpg)
Motivation for This Talk
Source Michael Bell http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=830
![Page 4: The PDB An Exemplar for Data Science To Date, But What About the Future?](https://reader035.fdocuments.us/reader035/viewer/2022070313/5549975db4c90507608b494f/html5/thumbnails/4.jpg)
More Motivation
![Page 5: The PDB An Exemplar for Data Science To Date, But What About the Future?](https://reader035.fdocuments.us/reader035/viewer/2022070313/5549975db4c90507608b494f/html5/thumbnails/5.jpg)
The way we fund and operate biomedical databases will not scale.
How do we keep the best features of today's resources but also respond to shrinking budgets and changes in the way we do science?
Let's address this question using the PDB as an example.
![Page 6: The PDB An Exemplar for Data Science To Date, But What About the Future?](https://reader035.fdocuments.us/reader035/viewer/2022070313/5549975db4c90507608b494f/html5/thumbnails/6.jpg)
Disclaimer: This is NOT a talk about the PDB per se, but a talk about data resources in general. It uses the PDB as an example since we are all familiar with it and it is considered an exemplar by most stakeholders.
![Page 7: The PDB An Exemplar for Data Science To Date, But What About the Future?](https://reader035.fdocuments.us/reader035/viewer/2022070313/5549975db4c90507608b494f/html5/thumbnails/7.jpg)
Good News: We Trust the PDB
PDB
Trust in the data is perhaps the PDB’s biggest achievement
![Page 8: The PDB An Exemplar for Data Science To Date, But What About the Future?](https://reader035.fdocuments.us/reader035/viewer/2022070313/5549975db4c90507608b494f/html5/thumbnails/8.jpg)
Good News: Trust
Trust is like compound interest
Comes from listening
Comes from engaging the community in every aspect of the process
Comes from data consistency and level of annotation
Comes from responsiveness
Comes from the quality of the delivery service
![Page 9: The PDB An Exemplar for Data Science To Date, But What About the Future?](https://reader035.fdocuments.us/reader035/viewer/2022070313/5549975db4c90507608b494f/html5/thumbnails/9.jpg)
Good News/Bad News Re Data Quality
Good News:
– If done right in the beginning 25% of the PDB’s budget could have been saved
– Ontologies can work
– Automation has reduced cost even as the amount of data has increased
– Reproducibility is improved
Bad News:
– Complex ontologies slow adoption
– All data are created equal
– Annotation is limited
![Page 10: The PDB An Exemplar for Data Science To Date, But What About the Future?](https://reader035.fdocuments.us/reader035/viewer/2022070313/5549975db4c90507608b494f/html5/thumbnails/10.jpg)
Good News/Bad News Re Community
Good News:
– The community is engaged
– The community has driven data sharing
Bad News:
– The community does not reduce costs through active participation
– There is insufficient reward for being part of the community, e.g. as an annotator
![Page 11: The PDB An Exemplar for Data Science To Date, But What About the Future?](https://reader035.fdocuments.us/reader035/viewer/2022070313/5549975db4c90507608b494f/html5/thumbnails/11.jpg)
How we do science is changing. Do data resources, including the PDB, best serve the needs of the user at this point?
![Page 12: The PDB An Exemplar for Data Science To Date, But What About the Future?](https://reader035.fdocuments.us/reader035/viewer/2022070313/5549975db4c90507608b494f/html5/thumbnails/12.jpg)
How is Science Changing?
More interdisciplinary
More translational
More access to diverse data types
More computational
More collaborative
![Page 13: The PDB An Exemplar for Data Science To Date, But What About the Future?](https://reader035.fdocuments.us/reader035/viewer/2022070313/5549975db4c90507608b494f/html5/thumbnails/13.jpg)
Good News/Bad News for the PDB in this Changing Landscape
Bad News:
– Interface complex and uni-data oriented
– Data accessible; methods accessible (sort of); but not together
– Significant redundancy in services offered
Good News:
– Annotation!
– Demand is increasing
– Integrated with other data types
– RESTful services
![Page 14: The PDB An Exemplar for Data Science To Date, But What About the Future?](https://reader035.fdocuments.us/reader035/viewer/2022070313/5549975db4c90507608b494f/html5/thumbnails/14.jpg)
General Problem Statement:
How to ensure a high-quality annotated data source that provides the optimal environment for accessibility and analysis by a broad community of diverse users?
![Page 15: The PDB An Exemplar for Data Science To Date, But What About the Future?](https://reader035.fdocuments.us/reader035/viewer/2022070313/5549975db4c90507608b494f/html5/thumbnails/15.jpg)
Okay, so what can the funders do to address a situation where the PDB is currently a best-case scenario?
![Page 16: The PDB An Exemplar for Data Science To Date, But What About the Future?](https://reader035.fdocuments.us/reader035/viewer/2022070313/5549975db4c90507608b494f/html5/thumbnails/16.jpg)
1. Encourage more understanding of how existing data are used
* http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm
[Figure: Structure Summary page activity for H1N1 influenza-related structures, Jan. 2008 to Jul. 2010, showing 1RUZ (1918 H1 Hemagglutinin) and 3B7E (Neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir). Credit: Andreas Prlic]
![Page 17: The PDB An Exemplar for Data Science To Date, But What About the Future?](https://reader035.fdocuments.us/reader035/viewer/2022070313/5549975db4c90507608b494f/html5/thumbnails/17.jpg)
We Need to Learn from Industries Whose Livelihood Addresses the Question of Use
![Page 18: The PDB An Exemplar for Data Science To Date, But What About the Future?](https://reader035.fdocuments.us/reader035/viewer/2022070313/5549975db4c90507608b494f/html5/thumbnails/18.jpg)
2. Address the Issue that Scholarship is Broken
I have a paper with 17,500 citations that no one has ever read
I have papers in PLOS ONE that have more citations than ones in PNAS
I have data sets I am proud of but few places to put them
I edited a journal but it did not count for much
![Page 19: The PDB An Exemplar for Data Science To Date, But What About the Future?](https://reader035.fdocuments.us/reader035/viewer/2022070313/5549975db4c90507608b494f/html5/thumbnails/19.jpg)
3. Address the Reward System
![Page 20: The PDB An Exemplar for Data Science To Date, But What About the Future?](https://reader035.fdocuments.us/reader035/viewer/2022070313/5549975db4c90507608b494f/html5/thumbnails/20.jpg)
4. Enable Reproducibility
Much of the research life cycle is now digital - encourage the reliability, accessibility, findability, usability of data, methods, narrative, publications etc.
How?
Data sharing plans
Standards frameworks
Data and software catalogs
PubMedCentral
? The Commons – PMC for the complete lifecycle
? Machine readable data sharing plans
? Small funding to communities
? Support for training and best practices in eScholarship
![Page 21: The PDB An Exemplar for Data Science To Date, But What About the Future?](https://reader035.fdocuments.us/reader035/viewer/2022070313/5549975db4c90507608b494f/html5/thumbnails/21.jpg)
5. Establish The Commons
Public/private partnership
Work with ICs, NCBI and CIT to identify and run pilots – cloud, HPC centers
Port dbGaP to the cloud
? Experiment with new funding strategies
Evaluate
![Page 22: The PDB An Exemplar for Data Science To Date, But What About the Future?](https://reader035.fdocuments.us/reader035/viewer/2022070313/5549975db4c90507608b494f/html5/thumbnails/22.jpg)
Sustainability and Sharing: The Commons
[Diagram labels, in slide order: Data; The Long Tail; Core Facilities/HS Centers; Clinical/Patient; The Why: Data Sharing Plans; The Commons; Government; The How: Data Discovery Index; Sustainable Storage; Quality; Scientific Discovery; Usability; Security/Privacy; The End Game: Knowledge; NIH Awardees; Private Sector; Metrics/Standards; Rest of Academia; Software Standards Index; BD2K Centers; Cloud, Research Objects, Business Models]
Commons == Extramural NCBI == Research Object Sandbox == Collaborative Environment
![Page 23: The PDB An Exemplar for Data Science To Date, But What About the Future?](https://reader035.fdocuments.us/reader035/viewer/2022070313/5549975db4c90507608b494f/html5/thumbnails/23.jpg)
What Does the Commons Enable?
Dropbox-like storage
The opportunity to apply quality metrics
Bring compute to the data
A place to collaborate
A place to discover
http://100plus.com/wp-content/uploads/Data-Commons-3-1024x825.png
![Page 24: The PDB An Exemplar for Data Science To Date, But What About the Future?](https://reader035.fdocuments.us/reader035/viewer/2022070313/5549975db4c90507608b494f/html5/thumbnails/24.jpg)
The PDB in the Commons
Components:
– Annotated collection of data files
– APIs to access these data files
– Example methods using these APIs
Potential outcomes:
– Nothing happens?
– A new breed of developer starts to use PDB data in new ways?
– The casual user has a broader set of services than previously?
– Quality declines?
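To make the three components concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `AnnotatedEntry` record, the `CommonsCollection` class, its `get` and `search` calls, and the annotation tags are hypothetical and do not describe any actual PDB or Commons interface. Only the two entry IDs and titles come from the H1N1 example earlier in the talk.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AnnotatedEntry:
    """One data file plus its annotations (hypothetical record layout)."""
    entry_id: str
    title: str
    annotations: frozenset = field(default_factory=frozenset)


class CommonsCollection:
    """Hypothetical in-memory stand-in for an annotated collection of data files."""

    def __init__(self, entries):
        self._entries = {e.entry_id: e for e in entries}

    def get(self, entry_id):
        """API call: fetch a single entry by identifier (case-insensitive)."""
        return self._entries[entry_id.upper()]

    def search(self, annotation):
        """API call: list entries carrying a given annotation, sorted by ID."""
        return sorted(
            (e for e in self._entries.values() if annotation in e.annotations),
            key=lambda e: e.entry_id,
        )


def titles_for(collection, annotation):
    """Example method built only on the API above, not on the stored files."""
    return [f"{e.entry_id}: {e.title}" for e in collection.search(annotation)]


# Illustrative tags only; real annotation would be far richer.
collection = CommonsCollection([
    AnnotatedEntry("1RUZ", "1918 H1 Hemagglutinin",
                   frozenset({"H1N1", "hemagglutinin"})),
    AnnotatedEntry("3B7E",
                   "Neuraminidase of A/Brevig Mission/1/1918 H1N1 strain "
                   "in complex with zanamivir",
                   frozenset({"H1N1", "neuraminidase"})),
])

for line in titles_for(collection, "H1N1"):
    print(line)
```

The point of the separation is that `titles_for` touches the collection only through the API, so a new developer could write further methods without knowing how the files are stored.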
![Page 25: The PDB An Exemplar for Data Science To Date, But What About the Future?](https://reader035.fdocuments.us/reader035/viewer/2022070313/5549975db4c90507608b494f/html5/thumbnails/25.jpg)
Some Acknowledgements
Eric Green & Mark Guyer (NHGRI)
Jennie Larkin (NHLBI)
Leigh Finnegan (NHGRI)
Vivien Bonazzi (NHGRI)
Michelle Dunn (NCI)
Mike Huerta (NLM)
David Lipman (NLM)
Jim Ostell (NLM)
Andrea Norris (CIT)
Peter Lyster (NIGMS)
All of the more than 100 folks on the BD2K team