Open Data in a Global EcosystemPhilip E. Bourne Ph.D., FACMIAssociate Director for Data Science
National Institutes of [email protected]
BioMedBridges, EBI, November 17, 2015
http://www.slideshare.net/pebourne
Not a talking head….An on-going conversation
Some context to start that conversation …
Perspective
Structural bioinformatics researcher
Former custodian of the RCSB PDB
Obsessive about open science e.g., PLOS
NIH-wide responsibility for developments in data science
Consider this change from my own career experience ….
The History of Computational Biomedicine According to Bourne
1980s 1990s 2000s 2010s 2020
Discipline:
Unknown Expt. Driven Emergent Over-sold A Service A Partner A Driver
The Raw Material:
Non-existent Limited /Poor More/Ontologies Big Data/Siloed Open/Integrated
The People:
No name Technicians Industry recognition data scientists Academics
Searls (ed) The Roots in Bioinformatics Series PLOS Comp Biol
It Follows …
We are entering a period of disruption in biomedical research and we should all be thinking about what this means
to bioinformatics & biomedicine
http://i1.wp.com/chisconsult.com/wp-content/uploads/2013/05/disruption-is-a-process.jpg http://cdn2.hubspot.net/hubfs/418817/disruption1.jpg
Big Data in Biomedicine…
This speaks to something more fundamental that more data …
It speaks to new methodologies, new skills, new emphasis, new cultures,
new modes of discovery …
We are at a Point of Deception …
Evidence:– Google car– 3D printers– Waze– Robotics– Sensors
From: The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson & Andrew McAfee
Disruption: Example - Photography
DigitizationDeception
Disruption
Demonetization
Dematerialization
Democratization
Time
Vol
ume,
Vel
ocity
, Var
iety
Digital camera invented byKodak but shelved
Megapixels & quality improve slowly; Kodak slow to react
Film market collapses;Kodak goes bankrupt
Phones replacecameras
Instagram,Flickr become thevalue proposition
Digital media becomes bona fide form of communication
Disruption: Biomedical Research
Digitization of Basic & Clinical Research & EHR’s
Deception
We Are Here
Disruption
Demonetization
Dematerialization
Democratization
Open science
Patient centered health care
Disruptive Features: Sustainability
Source Michael Bell http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=830
Disruptive Features:Reproducibility
Changing Value of Scholarship (?)
“And that’s why we’re here today. Because something called precision medicine … gives us one of the greatest opportunities for new medical breakthroughs that we have ever seen.”
President Barack ObamaJanuary 30, 2015
Disruptive Features – New Science
Precision Medicine Initiative
National Research Cohort – >1 million U.S. volunteers– Numerous existing cohorts (many funded by NIH)– New volunteers
Participants will be centrally involved in design and implementation of the cohort
They will be able to share genomic data, lifestyle information, biological samples – all linked to their electronic health records
What Are Some General Implications of Such a Future?
Open collaborative science becomes of increasing importance nationally and internationally
The value of data and associated analytics becomes of increasing value to scholarship
Opportunities exist to improve the efficiency of the research enterprise and hence fund more research
Global cooperation between funders will be needed to sustain the emergent digital enterprise
Current training content and modalities will not match supply to demand
Balancing accessibility vs security becomes more important yet more complex
What Are Some General Implications of Such a Future?
Open collaborative science becomes of increasing importance nationally and internationally
The value of data and associated analytics becomes of increasing value to scholarship
Opportunities exist to improve the efficiency of the research enterprise and hence fund more research
Global cooperation between funders will be needed to sustain the emergent digital enterprise
Current training content and modalities will not match supply to demand
Balancing accessibility vs security becomes more important yet more complex
How Should We Respond as Funders?
Community: – Encourage wherever possible a global cultural shift towards
open science– Encourage global exchanges – Encourage global projects
Policies:– Understand and map data sharing policies, standards etc.– Understand ethical, legal and societal differences
Infrastructure:– Share the burden and the reward
How Should We Respond as Funders?
Community: – Encourage wherever possible a global cultural shift towards
open science– Encourage global exchanges
Policies:– Understand and map data sharing policies, standards etc.– Understand ethical, legal and societal differences
Infrastructure:– Share the burden and the reward
https://www.openscienceprize.org/
A Culture of Sharing
1999 20042003 2007 20142008
Research Tools Policy
NIH Data Sharing Policy
Model Organism Policy
Genome-wide Association (GWAS) Policy
2012
NIH Public Access Policy (Publications)
Big Data to Knowledge (BD2K) Initiative
Genomic Data Sharing (GDS) Policy
Modernization of NIH Clinical Trials
White House Initiative
(2013 “Holdren Memo”)
The BD2K Program
BD2K Budget
BD2K FY14 Awardssupported by all NIH Institutes
MD2K Applications – CHF and Smoking
How Should We Respond as Funders?
Community: – Encourage wherever possible a global cultural shift towards
open science– Encourage global exchanges – Encourage global projects
Policies:– Understand and map data sharing policies, standards etc.– Understand ethical, legal and societal differences
Infrastructure:– Share the burden and the reward
The Commons is a shared virtual space which is FAIR:
– Find
– Access (use effectively)
– Interoperate
– Reuse
An environment to find and catalyze the use of shared digital research objects
The CommonsConcept
The Developer or User Defines the Environment from the Appropriate
Building Blocks
The CommonsComponents
BD2KCenter
BD2KCenter
BD2KCenter
BD2KCenter
BD2KCenter
BD2KCenter
DDICC
Software
Standards
Infrastructure - The Commons
Labs
Labs
Labs
Labs
Public Beacons
Host Content
AMPLab 1000 Genomes Project
Broad Institute ExAC
Curoverse PGP, GA4GH Example Data
EBI 1000 Genomes Project, UK10K, GoNL, EVS, GEUVADIS, UMCG Cardio GenePanel
Google 1000 Genomes Project, Phase III, Illumina Platinum Genomes
ISB Known VARiants
NCBI NHLBI Exome Sequence Project
OICR 55 cancer datasets
SolveBio 56 public datasets
UCSC ClinVar, LOVD, UniProt
University of Leicester Cafe CardioKit, Cafe Variome Central
WTSI IBD, Native American, Egyptian, UK10K
Over 120 public datasets beaconized across 21 institutions
10s thousands of individuals
Commons - Pilots
The Cloud Credits - business model
BD2K Centers
MODs (Model Organism Databases)
HMP Data and tools available in the cloud
NCI Cloud Pilots & Genomic Data Commons
I not only use all the brains I have, but all I can borrow.
– Woodrow Wilson
What Can We Do Now?
Extend the research pilots concept
Have TCC & TeSS work together
Global hackathons, competitions
Closer ties between NLM and EBI / Elixir
Student exchanges Engage foundations, charities
in more global initiatives
http://wwwdev.ebi.ac.uk/Tools/ddi/
ADDS Team
BD2K Representatives
NIHNIH……Turning Discovery Into HealthTurning Discovery Into Health
[email protected]://datascience.nih.gov/
http://www.ncbi.nlm.nih.gov/research/staff/bourne/
Top Related