Computational Grand Challenges for 21st Century Biomedical Science

Daniel Masys, MD
Affiliate Professor, Biomedical and Health Informatics
University of Washington, Seattle, WA
November 8, 2012, NCBC showcase meeting

Transcript of Computational Grand Challenges for 21st Century Biomedical Science



Topics

- Big data in perspective
- Turning the promise of advanced computation into reality
- The road ahead


Characteristics of “Big Data”

- Exceeds the capacity of unaided human cognition for its comprehension
- Strains current technology capacity in one or more ways:
  - CPU-bound: computational and algorithmic complexity
  - Bandwidth-limited: network communication capacity
  - Storage-limited: voluminous bits & bytes


NIH and successive eras of “Big Data”

1960s: Electronic Medical Records
- Sparse-matrix data needing compact storage and rapid retrieval
- NIH support of the MGH Laboratory of Computer Science (Octo Barnett) leads to the MUMPS programming language

The Hospital Computer Project time-shared a DEC PDP-1d. It featured a 50-Mbyte specially built Fastrand drum for storing medical data files and 64 simultaneously usable telecommunication ports, many of which were connected to Teletype terminals operating at the Massachusetts General Hospital in 1966. (Photo courtesy of BBN Technologies.)
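The sparse-matrix storage problem described above can be sketched with a dictionary-of-keys structure, the same basic idea behind MUMPS's sparse hierarchical global arrays. A minimal illustration only; the class name and all patient data below are invented:

```python
# Patient records as a sparse matrix: most (patient, observation) cells
# are empty, so a dictionary-of-keys stores only the filled ones.

class SparseRecords:
    def __init__(self):
        self._cells = {}  # (patient_id, field) -> value

    def set(self, patient_id, field, value):
        self._cells[(patient_id, field)] = value

    def get(self, patient_id, field, default=None):
        # Absent cells cost no storage and return a default on lookup.
        return self._cells.get((patient_id, field), default)

    def density(self, n_patients, n_fields):
        """Fraction of the conceptual matrix actually stored."""
        return len(self._cells) / (n_patients * n_fields)

records = SparseRecords()
records.set("pt001", "glucose", 5.4)
records.set("pt002", "hemoglobin", 13.1)

# Only 2 cells of a conceptual 1000 x 500 matrix consume storage.
print(records.get("pt001", "glucose"))  # -> 5.4
print(records.density(1000, 500))       # -> 4e-06
```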

NIH and successive eras of “Big Data”

1970s: Artificial Intelligence
- Pattern detection in high-volume complex datasets
- Rule-based expert systems emerge
- NIH support of the SUMEX-AIM resource at Stanford leads to Dendral, Mycin, Oncocin, Protege

Joshua Lederberg

Ed Feigenbaum

Ted Shortliffe
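A toy forward-chaining engine can illustrate what "rule-based expert system" means here. The rules and facts below are invented for illustration and are not Mycin's actual knowledge base, which also attached certainty factors and explanations to its inferences:

```python
# Each rule: (set of premises, conclusion). Fire rules repeatedly until
# no new conclusions appear, in the spirit of 1970s expert systems.

rules = [
    ({"gram_negative", "rod_shaped"}, "enterobacteriaceae_suspected"),
    ({"enterobacteriaceae_suspected", "urinary_source"}, "consider_e_coli"),
]

def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

derived = forward_chain({"gram_negative", "rod_shaped", "urinary_source"}, rules)
print("consider_e_coli" in derived)  # -> True
```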

NIH and successive eras of “Big Data”

1980s: Data mining and molecular complexity
- “Tower of Babel” proliferation of molecular resources leads to creation of NCBI at NLM
- Bill Raub champions the PROPHET workstation

NIH and successive eras of “Big Data”

1990s: Scalar-vector and massively parallel computing technologies come of age; ubiquitous bandwidth arrives
- NIH joins the Federal High Performance Computing, Communications, and Information Technology (HPCCIT) program
- The Human Genome Project becomes the poster child for molecular volume and complexity; high-throughput “omics” technologies arrive


Source: HPCC Blue Book 1993


NIH and successive eras of “Big Data”

Biomedical research in transition: the Biomedical Information Science and Technology Initiative (BISTI) report of 1999
- The bottleneck is no longer data production; it is now data analysis
- Biology as an information science
- A new breed of 21st-century scientists
- Emerging science careers at the intersection of biology, health, informatics, computer science, and quantitative methods
- A vision for interdisciplinary Centers: the NCBCs

Realizing the vision: NCBC highlights

Source: Russ Altman


The road ahead: Computational Grand Challenges in Biomedical Science

- Computationally tractable
- Hard but not impossible
- Evidence from current technologies or applications that similar problems have been at least partially solved


The road ahead: Grand challenge examples

- Molecular structure-function prediction
- Biomedical imaging
- Simulation
- A systems infrastructure for evidence-driven ‘individualized healthcare’


Molecular structure-function prediction

- Perennial holy grail (and an informatics Nobel Prize): solving the protein folding problem
- Deciphering the noncoding but biologically active genome
- Epigenomics: how do 25K genes make 400K proteins?


Spectrum of “NIH-relevant” imaging

Source: Jim Clark, UW Dept. Biol. Structure


Image Analysis & Interpretation

- Image segmentation: automated detection of boundaries of objects of interest within single images and related sets of images
- Imaging semantics: automated linkage of volumes of interest to knowledge about those biological objects and processes
- Image quantitation: volumetric change of objects of interest over time
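A minimal sketch of the segmentation and quantitation tasks above, assuming a toy threshold model on a hand-made 2D intensity grid; real pipelines use far richer models than a fixed threshold:

```python
# Toy "image": a list of rows of pixel intensities.

def segment(image, threshold):
    """Segmentation: mark pixels belonging to the object of interest."""
    return [[1 if v >= threshold else 0 for v in row] for row in image]

def area(mask):
    """Quantitation: size of the segmented object, in pixels."""
    return sum(sum(row) for row in mask)

# Two scans of the same (invented) object at different times.
scan_t0 = [[0, 3, 9],
           [0, 8, 9],
           [0, 0, 2]]
scan_t1 = [[0, 7, 9],
           [6, 8, 9],
           [0, 7, 2]]

a0, a1 = area(segment(scan_t0, 5)), area(segment(scan_t1, 5))
print(a0, a1, a1 - a0)  # -> 3 6 3  (object grew by 3 pixels between scans)
```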


Simulation

- In silico cells, tissues, organisms: a true computable systems biology
- Assembling a complete structural and functional model of the human body, at many levels between molecular and whole organism (e.g., organelle & cell assembly, tissues, organs), linking structure to functions and processes when known


Simulation, cont’d

- Change over time: modeling the structural and biochemical processes of aging
- Simulating, from “molecular first principles”, common diseases in a continuum from molecular changes to visible clinical manifestations

Meningococcal rash


Computational infrastructure for 21st century healthcare and research

- Development of a robust, ubiquitous, interoperable electronic infrastructure for the appropriate linking of person-specific health data:
  - that protects confidentiality
  - that is responsive to the needs of patients, providers, payers, and other healthcare-related organizations
  - that supports discovery science based on common and rare variant molecular patterns and corresponding health states


Computational infrastructure for 21st century healthcare

- Universal standards for the content of Electronic Health Records that support human interpretation, computer-based reasoning and course guidance, and translational science
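Why standardized content matters can be shown with a toy coded observation that drives both human display and machine reasoning. The record structure below is invented (it is not any real EHR standard); the LOINC code is assumed to denote serum glucose, and the decision threshold is illustrative only:

```python
# A single coded observation: code system, code, value, and unit are
# all machine-readable rather than free text.
observation = {
    "code_system": "LOINC",
    "code": "2345-7",   # assumed: glucose, serum/plasma
    "value": 11.2,
    "unit": "mmol/L",
}

def flag_hyperglycemia(obs, upper_limit=7.8):
    """Computer-based reasoning is possible only because the code,
    value, and unit are standardized; the threshold is illustrative."""
    return (obs["code_system"] == "LOINC"
            and obs["code"] == "2345-7"
            and obs["unit"] == "mmol/L"
            and obs["value"] > upper_limit)

print(flag_hyperglycemia(observation))  # -> True
```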

[Figure: “Facts per Decision” plotted on a log scale (10 to 1000) against years 1990–2020, with a flat line for human cognitive capacity. Stacked above decisions by clinical phenotype (i.e., traditional health care) are structural genetics (e.g., SNPs, haplotypes), functional genetics (gene expression profiles), and proteomics and other effector molecules. Caption: The need for systems-level approaches to clinical decision support for “personalized medicine”.]

Be diligent. Read two journal articles every night. At the end of a year you will be 1275 years behind the literature.

Assume only 1% of the new literature is relevant to what any individual care provider does. At the end of a year you are 12 years behind.
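The arithmetic behind these figures can be made explicit. The annual publication volume below is an assumption chosen to be consistent with the slide's numbers, not a measured value:

```python
# Back-of-envelope arithmetic for the reading-backlog figures.
articles_per_year = 931_700  # assumed annual biomedical literature volume
reading_rate = 2 * 365       # two articles every night

def years_behind(volume, rate):
    """Backlog, in 'reading-years', accumulated after one year.
    The -1 credits the one year of reading you actually did."""
    return volume / rate - 1

print(round(years_behind(articles_per_year, reading_rate)))         # -> 1275
print(round(years_behind(0.01 * articles_per_year, reading_rate)))  # -> 12
```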


Getting There

The functional and structural appeal of Centers as an engine of innovation and problem solving. Communities of scholars, users, builders, evaluators:

- Who advance the state of the art in computational methods
- Who build and/or ‘harden’ and distribute tools that bridge the gap between advanced computational techniques and the needs and capabilities of ‘rank and file scientists’


Getting There, cont’d

Centers as communities of scholars, users, builders, evaluators:

- Have interdisciplinary critical mass
- Can take advantage of economies of scale
- Develop and maintain software frameworks that make tools interoperable
- Know how to avoid ‘reinventing wheels’
- Are focal points for training and education for this and future generations

The elevated creative bandwidth of face-to-face brainstorming over a cup of coffee…


Getting There

Given the scientific trends of the current era of “Big Data” and complexity in biomedical science, if Centers for advanced biomedical computation did not exist, would they need to be created now?