The PDB An Exemplar for Data Science To Date, But What About the Future?

Philip E. Bourne, Ph.D., Associate Director for Data Science, National Institutes of Health


Keynote presented at 3DSIG, Boston, MA, USA, July 12, 2014.


Page 2:

Background

[Timeline: 6/12, 2/14, 3/14]

Findings:
• Sharing data & software through catalogs
• Support methods and applications development
• Need more training
• Hire CSIO
• Continued support throughout the lifecycle

http://acd.od.nih.gov/diwg.htm

Page 3:

Motivation for This Talk

Source: Michael Bell, http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=830

Page 4:

More Motivation

Page 5:

The way we fund and operate biomedical databases will not scale.

How do we keep the best features of today's resources but also respond to shrinking budgets and changes in the way we do science?

Let's address this question using the PDB as an example.

Page 6:

Disclaimer: This is NOT a talk about the PDB per se, but a talk about data resources in general. It uses the PDB as an example since we are all familiar with it and most stakeholders consider it an exemplar.

Page 7:

Good News: We Trust the PDB

Trust in the data is perhaps the PDB’s biggest achievement

Page 8:

Good News: Trust

Trust is like compound interest

Comes from listening

Comes from engaging the community in every aspect of the process

Comes from data consistency and level of annotation

Comes from responsiveness

Comes from the quality of the delivery service

Page 9:

Good News/Bad News Re Data Quality

Good News:
– If done right in the beginning, 25% of the PDB’s budget could have been saved
– Ontologies can work
– Automation has reduced cost even as the amount of data has increased
– Reproducibility is improved

Bad News:
– Complex ontologies slow adoption
– All data are created equal
– Annotation is limited

Page 10:

Good News/Bad News Re Community

Good News:
– The community is engaged
– The community has driven data sharing

Bad News:
– The community does not reduce costs through active participation
– There is insufficient reward for being part of the community, e.g., as an annotator

Page 11:

How we do science is changing. Do data resources, including the PDB, best serve the needs of the user at this point?

Page 12:

How is Science Changing?

More interdisciplinary

More translational

More access to diverse data types

More computational

More collaborative

Page 13:

Good News/Bad News for the PDB in this Changing Landscape

Bad News:

– Interface complex and uni-data oriented

– Data accessible; methods accessible (sort of); but not together

– Significant redundancy in services offered

Good News:

– Annotation!

– Demand is increasing

– Integrated with other data types

– RESTful services

Page 14:

General Problem Statement:

How do we ensure a high-quality annotated data source that provides the optimal environment for accessibility and analysis by a broad community of diverse users?

Page 15:

Okay, so what can the funders do to address a situation where the PDB is really a best-case scenario?

Page 16:

1. Encourage more understanding of how existing data are used

[Figure: Structure Summary page activity for H1N1 influenza-related structures. Time axis ticks: Jan. 2008, Jul. 2008, Jan. 2009, Jul. 2009, Jan. 2010, Jul. 2010. Structures shown: 1RUZ (1918 H1 Hemagglutinin) and 3B7E (Neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir). Credit: Andreas Prlic]

* http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm

Page 17:

We Need to Learn from Industries Whose Livelihood Addresses the Question of Use

Page 18:

2. Address the Issue that Scholarship is Broken

I have a paper with 17,500 citations that no one has ever read

I have papers in PLOS ONE that have more citations than ones in PNAS

I have data sets I am proud of but few places to put them

I edited a journal but it did not count for much

Page 19:

3. Address the Reward System

Page 20:

4. Enable Reproducibility

Much of the research life cycle is now digital: encourage the reliability, accessibility, findability, and usability of data, methods, narrative, publications, etc.

How?

Data sharing plans

Standards frameworks

Data and software catalogs

PubMed Central

? The Commons – PMC for the complete lifecycle

? Machine readable data sharing plans

? Small funding to communities

? Support for training and best practices in eScholarship

Page 21:

5. Establish The Commons

Public/private partnership

Work with ICs, NCBI, and CIT to identify and run pilots – cloud, HPC centers

Port dbGaP to the cloud

? Experiment with new funding strategies

Evaluate

Page 22:

Sustainability and Sharing: The Commons

[Diagram of the Commons. Recoverable labels: Data; The Long Tail; Core Facilities/HS Centers; Clinical/Patient; The Why: Data Sharing Plans; The Commons; Government; The How; Data Discovery Index; Sustainable Storage; Quality; Scientific Discovery; Usability; Security/Privacy; The End Game; Knowledge; NIH Awardees; Private Sector; Metrics/Standards; Rest of Academia; Software Standards Index; BD2K Centers; Cloud, Research Objects, Business Models]

Commons == Extramural NCBI == Research Object Sandbox == Collaborative Environment

Page 23:

What Does the Commons Enable?

Dropbox-like storage

The opportunity to apply quality metrics

Bring compute to the data

A place to collaborate

A place to discover

http://100plus.com/wp-content/uploads/Data-Commons-3-1024x825.png

Page 24:

The PDB in the Commons

Components:
– Annotated collection of data files
– APIs to access these data files
– Example methods using these APIs

Potential outcomes:
– Nothing happens?
– A new breed of developer starts to use PDB data in new ways?
– The casual user has a broader set of services than previously?
– Quality declines?
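As a rough illustration of what an "example method" layered on file access might look like, here is a minimal Python sketch. It is not from the talk. The function names `parse_atoms` and `ca_centroids` are hypothetical, the sample coordinates are made up, and retrieving a file from the archive is left out. The sketch assumes the standard fixed-column layout of ATOM records in PDB-format files.

```python
from collections import defaultdict

# Toy records in PDB fixed-column format. The coordinates are made up
# for illustration and do not come from any real entry.
SAMPLE_PDB = "\n".join([
    "ATOM      1  N   ALA A   1       0.000   0.000   0.000",
    "ATOM      2  CA  ALA A   1       1.000   2.000   3.000",
    "ATOM      3  CA  GLY A   2       3.000   4.000   5.000",
    "ATOM      4  CA  SER B   1      10.000   0.000  -2.000",
    "HETATM    5  O   HOH A 101       9.000   9.000   9.000",
])


def parse_atoms(pdb_text):
    """Return one dict per ATOM record (hypothetical helper)."""
    atoms = []
    for line in pdb_text.splitlines():
        # Keep polymer atoms only; HETATM and other records are skipped.
        if line[0:6] != "ATOM  ":
            continue
        atoms.append({
            "name": line[12:16].strip(),      # atom name, columns 13-16
            "res_name": line[17:20].strip(),  # residue name, columns 18-20
            "chain": line[21],                # chain ID, column 22
            "res_seq": int(line[22:26]),      # residue number, columns 23-26
            "xyz": (
                float(line[30:38]),           # x, columns 31-38
                float(line[38:46]),           # y, columns 39-46
                float(line[46:54]),           # z, columns 47-54
            ),
        })
    return atoms


def ca_centroids(atoms):
    """Average the C-alpha coordinates of each chain (hypothetical helper)."""
    by_chain = defaultdict(list)
    for atom in atoms:
        if atom["name"] == "CA":
            by_chain[atom["chain"]].append(atom["xyz"])
    return {
        chain: tuple(sum(axis) / len(coords) for axis in zip(*coords))
        for chain, coords in by_chain.items()
    }


if __name__ == "__main__":
    print(ca_centroids(parse_atoms(SAMPLE_PDB)))
```

In a Commons setting, the inline sample would presumably be replaced by a file obtained through the archive's access APIs, with the method itself shared alongside the data.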

Page 25:

Some Acknowledgements

Eric Green & Mark Guyer (NHGRI)

Jennie Larkin (NHLBI)

Leigh Finnegan (NHGRI)

Vivien Bonazzi (NHGRI)

Michelle Dunn (NCI)

Mike Huerta (NLM)

David Lipman (NLM)

Jim Ostell (NLM)

Andrea Norris (CIT)

Peter Lyster (NIGMS)

All of the more than 100 folks on the BD2K team

Page 26:

NIH… Turning Discovery Into Health
