Yale Day of Data

Post on 02-Jun-2015

Presented at Yale Day of Data, Yale University, Sept 26 2014.

Transcript of Yale Day of Data

NIH’s Day, Months, Four Years of Data

Philip E. Bourne, Ph.D., Associate Director for Data Science

National Institutes of Health

What is NIH’s Overall Approach to Data and What Does It Mean to You?

Some Context: NIH Data Science History

6/12 2/14 3/14

Findings:

• Sharing data & software through catalogs

• Support methods and applications development

• Need more training

• Need campus-wide IT strategy

• Hire CSIO

• Continued support throughout the lifecycle

My Bias

Still a scientist

A funder who still thinks like a PI

Not yet attuned to the federal system

Big supporter of open science through prior work with the Public Library of Science, FORCE11 etc.

Data – A Few Observations …

We talk about the promise of big data, but we don’t even know the value of little data (that is, could “Big Data” be the new “AI”?)

Good data is expensive in terms of time and money

Looking at data retroactively is really expensive

Good data begets trust; trust begets community; community is God

The way we support scientific data currently is not sustainable

That is, there is currently no business model for scientific data

Data – A Few NIH Observations …

1. We have little idea how much we spend on data – estimated over $1bn per year

2. We have even less idea how much we should be spending

Point 2 is part of a culture clash between the more observational history of biomedicine and the new analytical approach to discovery

Data – An Academic Medical Center Observation

A digital enterprise exists when data connections are made across different areas of the organization such that productivity and competitiveness are improved

For example, between education and research

I am not aware that any such academic institutions exist

Many are starting to wake up to the idea of getting there

JAMIA 2014, 21(2), 194

ADDS Mission Statement

To foster an ecosystem that enables biomedical research to be conducted as a digital enterprise that enhances health, lengthens life and reduces illness and disability

What Problems Are We Trying to Solve?

One Possible Solution

Sustainability – 50% business model

Efficiency – sharing best practices in longitudinal clinical studies; “trusted investigator”

Collaboration – identification of collaborators at the point of data collection, not publication

Reproducibility – data accessible with publication

Integration – phenotype homogenization

Accessibility – clinical trials registration

Quality – sharing CDEs across institutes

Training – keeping trainees in the ecosystem

The Data Ecosystem

Community

Policy

Infrastructure

• Sustainable business model

• Collaboration

• Training

Virtuous Research Cycle

The Virtuous Cycle

http://goo.gl/fkWjhS

Raw Materials to Seed the Ecosystem

NIH mandate & support

ADDS team of 8 people

Intramural participation of over 100 team members across ICs

Funding through BD2K:

– ~$30M in FY14

– ~$80M in FY15

– ....

Organization to Seed the Ecosystem…

[Organization chart: The Biomedical Research Digital Enterprise]

Associate Director for Data Science

Scientific Data Council; External Advisory Board

Programmatic Theme: Sustainability, Education, Innovation, Process, Collaboration

Deliverable: Commons, Training, BD2K, Efficiency, Partnerships

Example Features:

• Cloud – Data & Compute; Search; Security; Reproducibility Standards; App Store

• Coordinate; Hands-on; Syllabus; MOOCs

• Community; Centers; Training Grants; Catalogs; Standards; Analysis

• Data Resource Support

• Metrics; Best Practices; Evaluation; Portfolio Analysis

• ICs; Researchers; Federal Agencies; International Partners; Computer Scientists

Example Communities

– NIH

• 27 ICs

– Agencies

• NSF

• DOE

• DARPA

• NIST

– Government

• OSTP

• HHS HDI

• ONC

• CDC

• FDA

– Private sector

• PhRMA

• Google

• Amazon

– Organizations

• PCORI, GA4GH

• RDA, ELIXIR

• CCC

• CATS

• FASEB, ISCB

• Biophysical Society

• Sloan Foundation

• Moore Foundation

Example Policies

– Clinical data harmonization

– dbGaP in the cloud

– Data citation

– Machine-readable data sharing plans on all grants

– New review models, audiences etc.

• Open review

• Micro funding

• Standing data committees to explore best practices

• Crowd sourcing

Example Infrastructure: The Commons

Data

The Long Tail

Core Facilities/HS Centers

Clinical/Patient

The Why: Data Sharing Plans

The Commons

Government

The How:

Data Discovery Index

Sustainable Storage

Quality

Scientific Discovery

Usability

Security/Privacy

The End Game:

Knowledge

NIH Awardees

Private Sector

Metrics/Standards

Rest of Academia

Software Standards Index

BD2K Centers

Cloud, Research Objects, Business Models

What Does the Commons Enable?

Dropbox-like storage

The opportunity to apply quality metrics

Bring compute to the data

A place to collaborate

A place to discover

http://100plus.com/wp-content/uploads/Data-Commons-3-1024x825.png

[Adapted from George Komatsoulis]

One Possible Commons Business Model

HPC, Institution …

Pilots Around A Virtuous Cycle

Expect a FY15 Funding Call to Work in the Commons

Training & Diversity

Training & Diversity Goals:

– Develop a sufficient cadre of diverse researchers skilled in the science of Big Data

– Elevate general competencies in data usage and analysis across the biomedical research workforce

– Combat the Google bus

How:

– Traditional training grants

– Work with ICs on a needs assessment

– Standards for course descriptions with EU

– Work with institutions on raising awareness

– Partner with minority institutions

– Virtual/physical training center(s)?

Closing Question

Calls for increased NIH funding have so far gone unheeded. What can the ADDS do (that you have not heard about) to improve data science activities?

NIH… Turning Discovery Into Health

philip.bourne@nih.gov