There is No Intelligent Life Down Here
The Levinthal Lecture
Philip E. Bourne Ph.D., FACMI
Associate Director for Data Science
National Institutes of Health
[email protected]
http://www.slideshare.net/pebourne
Open Eye Meeting, Santa Fe, March 8, 2016
What follows are my personal views and not necessarily those of my
employer, the US federal government.
There is No Intelligent Life Down Here
With Apologies to Cy
Phil Bourne
My Interactions with Cy
……And pray that there's intelligent life somewhere up in space
'Cause there's bugger all down here on Earth
Evidence #1
http://www.iucr.org/resources/commissions/crystallographic-computing/schools/school96/banquet-humour
Evidence #2
We throttle some but not all scholarly communication
Consider Cy’s own words from around 1970 concerning data sharing:
“At that time, it was difficult to obtain crystallographic coordinates although the results of the structural analysis had been published”
Local: Cooperative Community Action
Individual letters to editors of journals
Committees: IUCr Commission on Biological Macromolecules; ACA/USNCCr Richards committee
Funding agencies
Articles in journals
Marvin Cassman, Fred Richards, Richard Dickerson
Courtesy of Helen Berman
PDB Growth
http://www.rcsb.org/pdb/statistics/contentGrowthChart.do?content=total&seqid=100
A Broad Culture of Sharing
1999: Research Tools Policy
2003: NIH Data Sharing Policy
2004: Model Organism Policy
2007: Genome-wide Association (GWAS) Policy
2008: NIH Public Access Policy (Publications)
2012: Big Data to Knowledge (BD2K) Initiative
2014: Genomic Data Sharing (GDS) Policy
Modernization of NIH Clinical Trials
White House Initiative (2013 “Holdren Memo”)
Data Sharing: An Essential Component
Modernizing NIH Clinical Trials Activities
NIH-Funded trials published within 100 months of completion
Less than 50% published within 30 months of completion
BMJ 2012;344:d7292
Modernizing NIH Clinical Trials Activities:
Call to Action
Increasing Clinical Trial Transparency
Proposed November 2014; Final Spring 2016 (est.)
Notice of Proposed Rulemaking: Clinical Trials Registration and Results Submission (FDAAA, Section 801)
– Further implements statutory requirements on private and public sponsors to register; report results on phase 2, 3, and 4 trials
– Includes drugs, biologics, and devices (except small feasibility)
Draft NIH Policy on Clinical Trial Information Dissemination
– Extends Section 801 requirements to all NIH-funded clinical trials
– Includes phase 1 trials and trials of non-FDA regulated interventions such as behavioral trials
Evidence #3
Research does not follow a free market economy – you can get rewarded regardless of what you produce
True Free Market - Photography
[Figure axes: Time; Volume, Velocity, Variety]
Digitization: Digital camera invented by Kodak but shelved
Deception: Megapixels & quality improve slowly; Kodak slow to react
Disruption: Film market collapses; Kodak goes bankrupt
Demonetization: Phones replace cameras
Dematerialization: Instagram, Flickr become the value proposition
Democratization: Digital media becomes bona fide form of communication
False Market - Biomedical Research?
Digitization of Basic & Clinical Research & EHRs
Deception
We Are Here
Disruption
Demonetization
Dematerialization
Democratization
Open science
Patient centered health care
Sustaining the System is a Problem
Source Michael Bell http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=830
Reproducibility
Changing Value of Scholarship
“And that’s why we’re here today. Because something called precision medicine … gives us one of the greatest opportunities for new medical breakthroughs that we have ever seen.”
President Barack Obama, January 30, 2015
New Science
Let’s get a bit closer to home for this audience ….
Evidence #4
Molecular graphics has not advanced as it should
http://upload.wikimedia.org/wikipedia/commons/2/2e/Molecular-Graphics-GRIP-75-Console.jpg
What Did Cy Say?
1990 – “..although we may not have "chemical insight" there are more and more 3-D structures determined experimentally to aid in understanding which conformational results are reasonable and which are not; as long as we can look at them.”
Good News/Bad News of Molecular Graphics Today
Good News:
– It is hard to think of a more powerful way to comprehend complex data
– It has excited generations to the promise of science
– It has adapted to changing technologies
Bad News:
– It is not an adaptive/extensible environment
– It is not a collaborative environment
– It is not an integrative environment
– State is not transferable
BMC Bioinformatics 2005, 6:21
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
Is a database really different than a biological journal?
PLoS Comp Biol 2005 1(3) e34
The Knowledge and Data Cycle
Evidence #5
Image credit: By Pbroks13 (talk), File:Views on Evolution.jpg. Sources: New Scientist Magazine, 19 April 2008, Vol. 198, No. 2652, page 31: "Evolution myths: It doesn't matter if people don't grasp evolution"; New Scientist Magazine, 19 August 2006, Vol. 191, No. 2565, page 11: "Why doesn't America believe in evolution?". Public Domain, https://commons.wikimedia.org/w/index.php?curid=4403503
Nature’s Reductionism
There are ~20^300 possible proteins, far more than all the atoms in the Universe
~58M protein sequences from 58K organisms (source RefSeq)
116,539 protein structures yield 1393 domain folds (SCOP)
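To make the scale of the sequence-space figure concrete, here is a minimal arithmetic sketch. It assumes the figure refers to chains of 300 residues drawn from 20 amino acids, and it uses a commonly quoted order-of-magnitude estimate of 10^80 atoms in the observable Universe; that estimate is an assumption brought in for illustration, not a number from the slides.

```python
import math

N_AMINO_ACIDS = 20   # standard amino acid alphabet
CHAIN_LENGTH = 300   # assumed chain length behind the 20^300 figure

# Exact count of possible sequences (Python integers are arbitrary precision).
n_sequences = N_AMINO_ACIDS ** CHAIN_LENGTH

# Order of magnitude: log10(20^300) = 300 * log10(20).
log10_sequences = CHAIN_LENGTH * math.log10(N_AMINO_ACIDS)

# Assumption: roughly 10^80 atoms in the observable Universe.
LOG10_ATOMS_IN_UNIVERSE = 80

orders_of_magnitude_gap = log10_sequences - LOG10_ATOMS_IN_UNIVERSE

print(f"20^300 is about 10^{log10_sequences:.1f}")
print(f"That exceeds the assumed atom count by about {orders_of_magnitude_gap:.0f} orders of magnitude")
```

Set against this, the ~58M known sequences and 1393 domain folds listed above are a vanishingly small sample of what is combinatorially possible.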
Is structure a useful discriminator of species?
Yang, Doolittle & Bourne (2005) PNAS 102(2) 373-8
Method – Distance Determination
Inputs: SCOP fold superfamilies (FSF); SUPERFAMILY; organisms

Presence/Absence Data Matrix
FSF       C. intestinalis  C. briggsae  F. rubripes
a.1.1     1                1            1
a.1.2     1                1            1
a.10.1    0                0            1
a.100.1   1                1            1
a.101.1   0                0            0
a.102.1   0                1            1
a.102.2   1                1            1

Distance Matrix
                 C. intestinalis  C. briggsae  F. rubripes
C. intestinalis  0                101          109
C. briggsae                       0            144
F. rubripes                                    0
The Answer Would Appear to be Yes
It is possible to generate a reasonable tree of life from merely the presence or absence of superfamilies (FSFs) within a given proteome
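As a rough illustration of how a presence/absence matrix can be turned into pairwise distances, the sketch below counts the superfamilies on which two proteomes disagree (a Hamming distance). The slides do not state which distance measure was used, so Hamming distance is an assumption swapped in here, and the function name is hypothetical. Only the seven rows shown on the slide are included, so the results will not match the slide's values of 101, 109 and 144, which come from the full set of superfamilies.

```python
from itertools import combinations

organisms = ["C. intestinalis", "C. briggsae", "F. rubripes"]

# Presence (1) or absence (0) of each fold superfamily, one column per organism.
# These are only the seven rows visible on the slide.
rows = {
    "a.1.1":   (1, 1, 1),
    "a.1.2":   (1, 1, 1),
    "a.10.1":  (0, 0, 1),
    "a.100.1": (1, 1, 1),
    "a.101.1": (0, 0, 0),
    "a.102.1": (0, 1, 1),
    "a.102.2": (1, 1, 1),
}

def hamming_distances(organisms, rows):
    """Count, for each pair of organisms, the superfamilies present in one but not the other."""
    distances = {}
    for (i, first), (j, second) in combinations(enumerate(organisms), 2):
        distances[(first, second)] = sum(
            1 for flags in rows.values() if flags[i] != flags[j]
        )
    return distances

distances = hamming_distances(organisms, rows)
for (first, second), d in distances.items():
    print(f"{first} vs {second}: {d}")
```

A distance matrix of this kind is the usual input to a distance-based tree-building method, which is how presence/absence data alone can yield a tree.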
Environmental Influence
Chris Dupont, Scripps Institution of Oceanography, UCSD
Dupont, Yang, Palenik, Bourne. 2006 PNAS 103(47) 17822-17827
Evolution of the Earth
4.5 billion years of change
300 ± 50 K
1-5 atmospheres
Constant photoenergy
Chemical and geological changes
Life has evolved in this time
The ocean was the “cradle” for 90% of evolution
Theoretical Levels of Trace Metals and Oxygen in the Deep Ocean Through Earth’s History
[Figure: x-axis, billions of years before present; y-axis, concentration (O2 in arbitrary units, Zn and Fe in moles L^-1). Curves: Oxygen, Zinc, Iron, Cobalt, Manganese. Tree symbols: Bacteria, Archaea, Eukarya.]
Whether the deep ocean became oxic or euxinic following the rise in atmospheric oxygen (~2.3 Gya) is debated, therefore both are shown (oxic ocean: solid lines; euxinic ocean: dashed lines).
The phylogenetic tree symbols at the top of the figure show one idea as to the theoretical periods of diversification for each Superkingdom.
Replotted from Saito et al., 2003, Inorganica Chimica Acta 356: 308-318
Evidence #6
Data resources, including the PDB, don’t fully serve the needs of the user at this point?
Good News/Bad News for the PDB in this Changing Landscape
Bad News:
– Interface complex and uni-data oriented
– Data accessible; methods accessible (sort of); but not together
– Significant redundancy in services offered
– Sustainability
Good News:
– Annotation!
– Demand is increasing
– Integrated with other data types
– RESTful services
General Problem Statement:
How to ensure a high-quality annotated data source that provides the optimal environment for accessibility, integration and analysis by a broad community of diverse users?
Enter the Commons
The Commons: Components
Computing environment
– cloud or HPC (High Performance Computing)
– supports access, utilization, sharing and storage of digital objects.
Methods for Interoperability
– enables connectivity, shareability and interoperability between digital objects.
Digital object compliance model
– describes the properties of digital objects that enable them to be discoverable and shareable.
The Commons: Components
[Diagram: BD2K Centers (six shown), DDICC, and Labs (four shown); Software; Standards; Infrastructure - The Commons]
Commons - Pilots
The Cloud Credits - business model
BD2K Centers
MODs (Model Organism Databases)
HMP Data and tools available in the cloud
NCI Cloud Pilots & Genomic Data Commons
The PDB in the Commons
Components:
– Annotated collection of data files
– APIs to access these data files
– Example methods using these APIs
Potential outcomes:
– Nothing happens?
– A new breed of developer starts to use PDB data in new ways?
– The casual user has a broader set of services than previously?
– Quality declines/increases?
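One way to picture an "example method" built on file-access APIs is the minimal sketch below, which builds a download URL for a PDB entry and extracts atom coordinates from the fixed-column ATOM records of the legacy PDB file format. The URL pattern is an assumption about the RCSB file download service and does not come from the slides; the function names are hypothetical, and nothing here represents how the Commons itself exposes PDB data.

```python
from urllib.request import urlopen

def pdb_file_url(pdb_id):
    """Build a download URL for an entry (assumed RCSB URL pattern, not from the slides)."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"

def parse_atom_coordinates(pdb_text):
    """Extract (atom name, residue name, (x, y, z)) from ATOM records.

    Uses the fixed columns of the legacy PDB format: atom name in columns 13-16,
    residue name in 18-20, and x, y, z in 31-38, 39-46 and 47-54.
    """
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM  "):
            continue
        atom_name = line[12:16].strip()
        residue_name = line[17:20].strip()
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((atom_name, residue_name, xyz))
    return atoms

def fetch_atom_coordinates(pdb_id):
    """Download an entry and parse it. Requires network access; not called here."""
    with urlopen(pdb_file_url(pdb_id)) as response:
        return parse_atom_coordinates(response.read().decode("ascii"))
```

Keeping the parser separate from the download step means the same method can run against files already staged in a cloud environment, which is closer to the scenario the slide describes.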
Delineation of polypharmacology across the human structural kinome
using a functional site interaction fingerprint approach
Zhao et al. J. Med. Chem., 2016, DOI: 10.1021/acs.jmedchem.5b02041
Evidence #7
The difficulty of translating academic ideas into products
Functional Site interaction Fingerprint (Fs-IFP) Approach
Step 1. Extract the structural kinome: 208 kinases, 2383 ligand-bound structures
Step 2. All-against-all binding-site comparison
Step 3. Encoding Fs-IFP
Step 4. Statistical analysis and machine learning
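The encoding step can be pictured with a minimal sketch of a generic interaction fingerprint. It assumes binding sites have already been aligned to a common set of positions, uses a hypothetical list of interaction types, and compares fingerprints with the Tanimoto coefficient. The slides do not give the actual bit layout or similarity measure used by Zhao et al., so every name and choice below is illustrative only.

```python
# Hypothetical interaction types; the published Fs-IFP encoding may differ.
INTERACTION_TYPES = ("hydrophobic", "aromatic", "hbond_donor", "hbond_acceptor", "ionic")

def encode_fingerprint(contacts, n_positions):
    """Encode ligand contacts as a flat bit tuple.

    contacts maps an aligned binding-site position to the set of interaction
    types observed there. Each position contributes one bit per interaction type.
    """
    bits = []
    for position in range(n_positions):
        observed = contacts.get(position, set())
        bits.extend(1 if kind in observed else 0 for kind in INTERACTION_TYPES)
    return tuple(bits)

def tanimoto(fp_a, fp_b):
    """Shared on-bits divided by the on-bits present in either fingerprint."""
    both = sum(a & b for a, b in zip(fp_a, fp_b))
    either = sum(a | b for a, b in zip(fp_a, fp_b))
    return both / either if either else 1.0

# Two made-up kinase-ligand complexes over three aligned positions.
complex_a = {0: {"hydrophobic"}, 1: {"hbond_donor", "hbond_acceptor"}}
complex_b = {0: {"hydrophobic"}, 1: {"hbond_acceptor"}, 2: {"ionic"}}

fp_a = encode_fingerprint(complex_a, n_positions=3)
fp_b = encode_fingerprint(complex_b, n_positions=3)
similarity = tanimoto(fp_a, fp_b)
print(similarity)
```

Because every complex maps to a vector of the same length, fingerprints of this kind can be clustered or passed to a classifier such as a support vector machine, which is the role of step 4.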
Binding Mode Characterization of Kinase Inhibitors
Clustering of Fs-IFP across the structural kinome
Spatial locations for the binding regions for the eight clusters
Kinase Binding Profile Prediction Using Fs-IFP
ROC curves of the trained support vector machine model
Performance of the predicted binding profiles of 51 type-I inhibitors against 344 kinases
Summary
There is more intelligence than we think.
While we study complex systems, they are also why we do not make faster progress.
Acknowledgements
The 133 Folks who have passed through my lab over the years
Cy Levinthal for giving me this opportunity
https://docs.google.com/spreadsheets/d/1QZ48UaKcwDl_iFCvBmJsT03FK-bMchdfuIHe9Oxc-rw/edit#gid=0