Yale Day of Data

NIH’s Day, Months, Four Years of Data. Philip E. Bourne, Ph.D., Associate Director for Data Science, National Institutes of Health


Presented at Yale Day of Data, Yale University, Sept 26 2014.

Transcript of Yale Day of Data

Page 1: Yale Day of Data

NIH’s Day, Months, Four Years of Data

Philip E. Bourne, Ph.D., Associate Director for Data Science

National Institutes of Health

Page 2: Yale Day of Data

What is NIH’s Overall Approach to Data and What Does It Mean to You?

Page 3: Yale Day of Data

Some Context: NIH Data Science History

6/12 2/14 3/14

Findings:

• Sharing data & software through catalogs

• Support methods and applications development

• Need more training

• Need campus-wide IT strategy

• Hire CSIO

• Continued support throughout the lifecycle

Page 4: Yale Day of Data

My Bias

Still a scientist

A funder who still thinks like a PI

Not yet attuned to the federal system

Big supporter of open science through prior work with the Public Library of Science, FORCE11 etc.

Page 5: Yale Day of Data

Data – A Few Observations …

We talk about the promise of big data, but we don’t even know the value of little data (in other words, could “Big Data” be the new “AI”?)

Good data is expensive in terms of time and money

Looking at data retroactively is really expensive

Good data begets trust; trust begets community; community is God

The way we support scientific data currently is not sustainable

That is, there is currently no business model for scientific data

Page 6: Yale Day of Data

Data – A Few NIH Observations …

1. We have little idea how much we spend on data – estimated over $1bn per year

2. We have even less idea how much we should be spending

Point 2 is part of a culture clash between the more observational history of biomedicine and the new analytical approach to discovery

Page 7: Yale Day of Data

Data – An Academic Medical Center Observation

A digital enterprise exists when data connections are made across different areas of the organization such that productivity and competitiveness are improved

For example, between education and research

I am not aware that any such academic institutions exist

Many are starting to wake up to the idea of getting there

JAMIA 2014, 21(2), 194

Page 8: Yale Day of Data

ADDS Mission Statement

To foster an ecosystem that enables biomedical research to be conducted as a digital enterprise that enhances health, lengthens life and reduces illness and disability

Page 9: Yale Day of Data

What Problems Are We Trying to Solve?

One Possible Solution

Sustainability – 50% business model

Efficiency – sharing best practices in longitudinal clinical studies; “trusted investigator”

Collaboration – identification of collaborators at the point of data collection, not publication

Reproducibility – data accessible with publication

Integration – phenotype homogenization

Accessibility – clinical trials registration

Quality – sharing CDEs across institutes

Training – keeping trainees in the ecosystem

Page 10: Yale Day of Data

The Data Ecosystem

Community

Policy

Infrastructure

• Sustainable business model

• Collaboration

• Training

Page 11: Yale Day of Data

The Data Ecosystem

Community

Policy

Infrastructure

• Sustainable business model

• Collaboration

• Training

Virtuous Research Cycle

Page 12: Yale Day of Data

The Virtuous Cycle

http://goo.gl/fkWjhS

Page 13: Yale Day of Data

Raw Materials to Seed the Ecosystem

NIH mandate & support

ADDS team of 8 people

Intramural participation of over 100 team members across ICs

Funding through BD2K:

– ~$30M in FY14

– ~$80M in FY15

– ....

Page 14: Yale Day of Data

Organization to Seed the Ecosystem…

Page 15: Yale Day of Data

Associate Director for Data Science

Scientific Data Council; External Advisory Board

The Biomedical Research Digital Enterprise

Programmatic Theme: Commons | Training | BD2K | Efficiency | Partnerships

Deliverable: Sustainability | Education | Innovation | Process | Collaboration

Example Features:

• Commons: Cloud – Data & Compute; Search; Security; Reproducibility Standards; App Store

• Training: Coordinate; Hands-on; Syllabus; MOOCs

• BD2K: Community; Centers; Training Grants; Catalogs; Standards; Analysis

• Efficiency: Data Resource Support; Metrics; Best Practices; Evaluation; Portfolio Analysis

• Partnerships: ICs; Researchers; Federal Agencies; International Partners; Computer Scientists

Page 17: Yale Day of Data

Example Communities

– NIH

• 27 ICs

– Agencies

• NSF

• DOE

• DARPA

• NIST

– Government

• OSTP

• HHS HDI

• ONC

• CDC

• FDA

– Private sector

• PhRMA

• Google

• Amazon

– Organizations

• PCORI, GA4GH

• RDA, ELIXIR

• CCC

• CATS

• FASEB, ISCB

• Biophysical Society

• Sloan Foundation

• Moore Foundation

Page 18: Yale Day of Data

Example Policies

– Clinical data harmonization

– dbGaP in the cloud

– Data citation

– Machine-readable data sharing plans on all grants

– New review models, audiences etc.

• Open review

• Micro funding

• Standing data committees to explore best practices

• Crowd sourcing
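
As a rough illustration of the "machine-readable data sharing plans" item above, the sketch below shows one way such a plan might be represented and checked in Python. It is not an NIH format: the `DataSharingPlan` class, every field name, the grant identifier, and the repository name are hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class DataSharingPlan:
    """Hypothetical machine-readable data sharing plan (not an official schema)."""
    grant_id: str                 # placeholder identifier, not a real grant number
    dataset_description: str
    repository: str               # where the data would be deposited
    release_after_months: int     # delay between collection and release
    license: str
    identifiers: List[str] = field(default_factory=list)  # e.g. dataset DOIs


def validate(plan: DataSharingPlan) -> List[str]:
    """Return a list of human-readable problems; an empty list means the plan passes."""
    problems = []
    if not plan.grant_id:
        problems.append("grant_id is empty")
    if not plan.repository:
        problems.append("repository is empty")
    if plan.release_after_months < 0:
        problems.append("release_after_months must be non-negative")
    return problems


# Illustrative values only.
plan = DataSharingPlan(
    grant_id="EXAMPLE-0001",
    dataset_description="De-identified longitudinal clinical measurements",
    repository="example-repository",
    release_after_months=12,
    license="CC0-1.0",
)

# Serialize to JSON so that software, not only reviewers, can read the plan.
plan_json = json.dumps(asdict(plan), indent=2, sort_keys=True)

# Parse the JSON back to confirm nothing is lost in the round trip.
round_trip = DataSharingPlan(**json.loads(plan_json))

print(plan_json)
print("problems:", validate(round_trip))
```

A plan in this form could be checked automatically when a grant is submitted and indexed later for discovery, which is the main reason to prefer a structured plan over free text.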

Page 19: Yale Day of Data

Example Infrastructure: The Commons

Data

The Long Tail

Core Facilities/HS Centers

Clinical/Patient

The Why: Data Sharing Plans

The Commons

Government

The How:

Data Discovery Index

Sustainable Storage

Quality

Scientific Discovery

Usability

Security/Privacy

The End Game:

Knowledge

NIH Awardees

Private Sector

Metrics/Standards

Rest of Academia

Software Standards Index

BD2K Centers

Cloud, Research Objects, Business Models

Page 20: Yale Day of Data

What Does the Commons Enable?

Dropbox like storage

The opportunity to apply quality metrics

Bring compute to the data

A place to collaborate

A place to discover

http://100plus.com/wp-content/uploads/Data-Commons-3-1024x825.png

Page 21: Yale Day of Data

[Adapted from George Komatsoulis]

One Possible Commons Business Model

HPC, Institution …

Page 22: Yale Day of Data

Pilots Around A Virtuous Cycle

Expect a FY15 Funding Call to Work in the Commons

Page 23: Yale Day of Data

Training & Diversity

Training & Diversity Goals:

– Develop a sufficient cadre of diverse researchers skilled in the science of Big Data

– Elevate general competencies in data usage and analysis across the biomedical research workforce

– Combat the Google bus

How:

– Traditional training grants

– Work with IC’s on a needs assessment

– Standards for course descriptions with EU

– Work with institutions on raising awareness

– Partner with minority institutions

– Virtual/physical training center(s)?

Page 24: Yale Day of Data

Closing Question

Calls for increased NIH funding have so far gone unheeded. What can the ADDS do (that you have not heard about) to improve data science activities?

Page 25: Yale Day of Data

NIH… Turning Discovery Into Health

[email protected]