Download - Stephen Friend Molecular Imaging Program at Stanford (MIPS) 2011-08-15

Use of Bionetworks to Build Maps of Diseases

Stephen Friend MD PhD

Sage Bionetworks (Non-Profit Organization) Seattle/ Beijing/ San Francisco

MIPS Seminar Series August 15th, 2011

why consider the fourth paradigm- data intensive science

thinking beyond the narrative, beyond pathways

advantages of an open innovation compute space

it is more about how than what

Alzheimer’s Diabetes

Cancer Obesity Treating Symptoms v.s. Modifying Diseases

Will it work for me?

Familiar but Incomplete

Reality: Overlapping Pathways

WHY NOT USE “DATA INTENSIVE” SCIENCE

TO BUILD BETTER DISEASE MAPS?

Equipment capable of generating massive amounts of data

“Data Intensive Science”- “Fourth Scientific Paradigm” For building: “Better Maps of Human Disease”

Open Information System

IT Interoperability

Evolving Models hosted in a Compute Space- Knowledge Expert

It is now possible to carry out comprehensive monitoring of many traits at the population level

Monitor disease and molecular traits in populations

Putative causal gene

Disease trait

what will it take to understand disease?

DNA RNA PROTEIN (dark matter)

MOVING BEYOND ALTERED COMPONENT LISTS

2002 Can one build a “causal” model?

trait trait trait trait trait trait trait trait trait trait trait trait trait

How is genomic data used to understand biology?

!Standard" GWAS Approaches Profiling Approaches

!Integrated" Genetics Approaches

Genome scale profiling provide correlates of disease   Many examples BUT what is cause and effect?

Identifies Causative DNA Variation but provides NO mechanism

  Provide unbiased view of molecular physiology as it

relates to disease phenotypes

  Insights on mechanism

  Provide causal relationships and allows predictions

RNA amplification Microarray hybirdization

Gene Index

Tum

ors

Tum

ors

Integration of Genotypic, Gene Expression & Trait Data

Causal Inference

Schadt et al. Nature Genetics 37: 710 (2005) Millstein et al. BMC Genetics 10: 23 (2009)

Chen et al. Nature 452:429 (2008) Zhang & Horvath. Stat.Appl.Genet.Mol.Biol. 4: article 17 (2005)

Zhu et al. Cytogenet Genome Res. 105:363 (2004) Zhu et al. PLoS Comput. Biol. 3: e69 (2007)

“Global Coherent Datasets” •  population based

•  100s-1000s individuals

Constructing Co-expression Networks

Start with expression measures for genes most variant genes across 100s ++ samples

Note: NOT a gene expression heatmap

1 -0.1 -0.6 -0.8

-0.1 1 0.1 0.2

-0.6 0.1 1 0.8

-0.8 0.2 0.8 1 1

2

3

4

1 2 3 4

Correlation Matrix Brain sample

expr

essi

on

1 0 1 1 0 1 0 0 1 0 1 1 1 0 1 1 1

2

3

4

1 2 3 4

Connection Matrix

1 0 0 0 0 1 1 1 0 1 1 1 0 1 1 1 1

2

4

3

1 2 4 3

4 1

3 2

4 1

2

Establish a 2D correlation matrix for all gene pairs

Define Threshold eg >0.6 for edge

Clustered Connection Matrix

Hierarchically cluster

sets of genes for which many pairs interact (relative to the total number of pairs in that

set)

Network Module

Identify modules

Preliminary Probabalistic Models- Rosetta /Schadt

Gene symbol Gene name Variance of OFPM explained by gene expression*

Mouse model

Source

Zfp90 Zinc finger protein 90 68% tg Constructed using BAC transgenics Gas7 Growth arrest specific 7 68% tg Constructed using BAC transgenics Gpx3 Glutathione peroxidase 3 61% tg Provided by Prof. Oleg

Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12]

Lactb Lactamase beta 52% tg Constructed using BAC transgenics Me1 Malic enzyme 1 52% ko Naturally occurring KO Gyk Glycerol kinase 46% ko Provided by Dr. Katrina Dipple

(UCLA) [13] Lpl Lipoprotein lipase 46% ko Provided by Dr. Ira Goldberg

(Columbia University, NY) [11] C3ar1 Complement component

3a receptor 1 46% ko Purchased from Deltagen, CA

Tgfbr2 Transforming growth factor beta receptor 2

39% ko Purchased from Deltagen, CA

Networks facilitate direct identification of genes that are causal for disease

Evolutionarily tolerated weak spots

Nat Genet (2005) 205:370

  50 network papers   http://sagebase.org/research/resources.php

List of Influential Papers in Network Modeling

(Eric Schadt)

Recognition that the benefits of bionetwork based molecular models of diseases are powerful but that they require significant resources

Appreciation that it will require decades of evolving representations as real complexity emerges and needs to be integrated with therapeutic interventions

Sage Mission

Sage Bionetworks is a non-profit organization with a vision to create a “commons” where integrative bionetworks are evolved by

contributor scientists with a shared vision to accelerate the elimination of human disease

Sagebase.org

Data Repository

Discovery Platform

Building Disease Maps

Commons Pilots

Sage Bionetworks Collaborators

  Pharma Partners   Merck, Pfizer, Takeda, Astra Zeneca,

Amgen, Johnson &Johnson

22

  Foundations   Kauffman CHDI, Gates Foundation

  Government   NIH, LSDF

  Academic   Levy (Framingham)

  Rosengren (Lund)

  Krauss (CHORI)

  Federation   Ideker, Califarno, Butte, Schadt

RULES GOVERN

Engaging Communities of Interest

PLAT

FORM

NEW

MAP

S NEW MAPS

Disease Map and Tool Users- ( Scientists, Industry, Foundations, Regulators...)

PLATFORM Sage Platform and Infrastructure Builders-

( Academic Biotech and Industry IT Partners...)

RULES AND GOVERNANCE Data Sharing Barrier Breakers-

(Patients Advocates, Governance and Policy Makers, Funders...)

NEW TOOLS Data Tool and Disease Map Generators- (Global coherent data sets, Cytoscape,

Clinical Trialists, Industrial Trialists, CROs…)

PILOTS= PROJECTS FOR COMMONS Data Sharing Commons Pilots-

(Federation, CCSB, Inspire2Live....)

Research Platform Research Platform Commons

Data Repository

Discovery Platform

Building Disease

Maps

Tools & Methods

Repository

Discovery

Maps

Tools &

Repository

Discovery Platform

Repository Repository

Discovery

Repository

Discovery

Commons Pilots

Outposts Federation

CCSB

LSDF-WPP Inspire2Live

POC

Cancer Neurological Disease

Metabolic Disease

Pfizer Merck Takeda

Astra Zeneca CHDI Gates NIH

Curation/Annotation

CTCAP Public Data Merck Data TCGA/ICGC

Hosting Data Hosting Tools

Hosting Models

LSDF

Bayesian Models Co-expression Models

KDA/GSVA

25

Example 1: Breast Cancer

Zhang B et al., manuscript

Bayesian Network

Survival Analysis

Coexpression Networks Module combination

Partition BN

4 Public Breast Cancer Datasets

NKI: van de Vijver et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002 Dec 19;347(25):1999-2009.

Wang Y et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005 Feb 19-25;365(9460):671-9.

Miller: Pawitan Y et al. Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts. Breast Cancer Res. 2005;7(6):R953-64.

Christos: Sotiriou C et al.. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst. 2006 Feb 15;98(4):262-72.

295 samples

286 samples

159 samples

189 samples

Generation of Co-expression & Bayesian Networks from published Breast Cancer Studies

Recovery of EGFR and Her2 oncoproteins downstream pathways by super modules downstream pathways by super modules

Comparison of Super-modules with EGFR and Her2 signaling and resistance pathways

29

Key Driver Analysis •  Identify key regulators for a list of genes h and a network N •  Check the enrichment of h in the downstream of each node in N •  The nodes significantly enriched for h are the candidate drivers The nodes significantly enriched for

30

A) Cell Cycle (blue)

C) Pre-mRNA Processing (brown)

B) Chromatin modification (black)

D) mRNA Processing (red)

Global driver

Global driver & RNAi validation

Signaling between Super Modules

(View Poster presented by Bin Zhang)

Example 2. The Sage Non-Responder Project in Cancer

Sage Bionetworks • Non-Responder Project

•  To identify Non-Responders to approved drug regimens so we can improve outcomes, spare patients unnecessary toxicities from treatments that have no benefit to them, and reduce healthcare costs

•  Co-Chairs Stephen Friend, Todd Golub, Charles Sawyers & Rich Schilsky

•  AML (at first relapse) •  Non-Small Cell Lung Cancer

•  Ovarian Cancer (at first relapse)

•  Breast Cancer •  Renal Cell

•  Multiple Myeloma

Purpose:

Leadership:

Initial Studies:

Model of Alzheimer’s Disease Bin Zhang Jun Zhu

AD

normal

AD

normal

AD

normal

Cell cycle

http://sage.fhcrc.org/downloads/downloads.php

Blue module: 3000 genes Associated with Type 2 diabetes Elevated HbA1c Reduced insulin secretion

Global expression data from 64 human islet donors

340 genes in islet-specific open chromatin regions

168 overlapping genes, which have

•  Higher connectivity •  Markedly stronger association with

•  Type 2 diabetes •  Elevated HbA1c •  Reduced insulin secretion

•  Enrichment for beta-cell transcription factors and exocytotic proteins

New Type II Diabetes Disease Models Anders Rosengren

•  Search across 1300 datasets in MetaGEO at Sage for similar expression profiles Top hit: Islet dedifferentiation study where the 168 genes were upregulated in mature islets and downregulated in dedifferentiated islets (Kutlu et al., Phys Gen 2009)

•  Analyses of expression-SNPs and clinical SNPs as well as Causal Inference Test

•  Identification of candidate key genes affecting beta-cell differentiation and chromatin

Working hypothesis:

Normal beta-cell: open chromatin in islet-specific regions, high expression of beta-cell transcription factors, differentiated beta-cells and normal insulin secretion

Diabetic beta-cell: lower expression of beta-cell transcription factors affecting the identified module, dedifferentiation, reduced insulin secretion and hyperglycemia

Next steps: Validation of hypothesis and suggested key genes in human islets

Anders Rosengren

New Type II Diabetes Disease Models

Clinical Trial Comparator Arm Partnership (CTCAP)

  Description: Collate, Annotate, Curate and Host Clinical Trial Data with Genomic Information from the Comparator Arms of Industry and Foundation Sponsored Clinical Trials: Building a Site for Sharing Data and Models to evolve better Disease Maps.

  Public-Private Partnership of leading pharmaceutical companies, clinical trial groups and researchers.

  Neutral Conveners: Sage Bionetworks and Genetic Alliance [nonprofits].

  Initiative to share existing trial data (molecular and clinical) from non-proprietary comparator and placebo arms to create powerful new tool for drug development.

Examples: The Sage Federation

•  Founding Lab Groups

–  Seattle- Sage Bionetworks –  New York- Columbia: Andrea Califano –  Palo Alto- Stanford: Atul Butte –  San Diego- UCSD: Trey Ideker –  San Francisco: UCSF/Sage: Eric Schadt

•  Initial Projects –  Aging –  Diabetes –  Warburg

•  Goals: Share all datasets, tools, models Develop interoperability for human data

THE FEDERATION Butte Califano Friend Ideker Schadt

vs

Federation s Genome-wide Network and Modeling Approach

Califano group at Columbia Sage Bionetworks Butte group at Stanford

Human Aging Project

Brain A (n=363)

Brain B (n=145)

Blood A (n=~1000)

Blood B (n=~1000)

Brain C (n=400)

Adipose (n=~700)

Data Transformations

TF Activity Profile

Gene Set / Pathway Variation Analysis

Interactome

Machine Learning

Elastic Net

Network Prior Models

Tree Classifiers

Age Model

Deriving Master Regulators from Transcription Factors Regulatory Networks Glycolysis & Glycogenesis Metabolism Pathway

Inferring Prostate Cancer Regulatory Modules for Glycolysis &Glycogenesis Metabolism Pathway

Duarte N. et al (2006) PNAS 107(6):1777-1782

Glycolysis and Glycogenesis Metablism

Gene Set (GGMSE)

Prostate cancer global coherent data set (GSE21032) Taylor BS. et al (2010) Cancer Cell 18(1):11-22

Inferred Transcriptional Regulatory Network in Prostate

Cancer

Zhu J. et al (2008) Nature Genetics 40(7):854-61

Integrated Bayesian Approach

Prostate Cancer Regulatory Modules for GGMSE and Other

Metabolism Pathways

Cox Proportional-Hazards Regression model based on

individual gene for recurrence free survival

Metabolism pathways with regulatory modules enriched by poor prognosis genes

for prostate cancer

Sage bionetworks’ approach

Genes Associated with Poor Prognosis are disproportionally found among the networks regulating the !glycolysis" Genes

Size of the node proportional to -log10 P value for recurrence free survival.

>5 fold enrichment of recurrence free prognostic genes with the Glycolysis BN module than random selection (p<1e-100)

P-Value<0.005

Inferred regulatory module for GGMSE Inferred regulatory module for Oxidative Phosphorylation and Sphingolipid

Metabolism genes

Federated Aging Project : Combining analysis + narrative

=Sweave Vignette Sage Lab

Califano Lab Ideker Lab Califano Lab Ideker Lab

Shared Data Repository

JIRA: Source code repository & wiki

R code + narrative

PDF(plots + text + code snippets) R code + PDF(plots + text + code snippets)

Data objects

PDF(plots + text + code snippets) PDF(plots + text + code snippets) PDF(plots + text + code snippets) PDF(plots + text + code snippets)

HTML

Submitted Paper

Why not share clinical /genomic data and model building in the ways currently used by the software industry (power of tracking workflows and versioning

Synapse as a Github for building models of disease

Evolution of a Software Project

Biology Tools Support Collaboration

Potential Supporting Technologies

Taverna

Addama

tranSMART

Platform for Modeling

SYNAPSE

� � � � � � � � � � � � � �

INTEROPERABILITY (tranSMART)

TENURE FEUDAL STATES

IMPACT ON PATIENTS IMPACT ON PATIENTS

Eight Projects Initiated in last year

!

Group D LEGAL STACK-ENABLING PAIENTS: John Wilbanks

Arch2POCM

Restructuring Drug Discovery

“Absurdity” of Current R&D Ecosystem

•  $200B per year in biomedical and drug discovery R&D •  Handful of new medicines approved each year •  Productivity in steady decline since 1950 •  90% of novel drugs entering clinical trials fail •  NIH and EU just started spending billions to duplicate process

•  Significant pharma revenues going off patent in next 5 years •  >30,000 pharma employees fired in each of last four years •  Number of R&D sites in Europe down from 29 to 16 since 2009

What is the problem? •  Regulatory hurdles too high? •  Low hanging fruit picked? •  Payers unwilling to pay? •  Genome has not delivered? •  Valley of death? •  Companies not large enough to execute on strategy? •  Internal research costs too high? •  Clinical trials in developed countries too expensive?

In fact, all are true but none is the real problem

What is the problem? •  The current system is designed as if every new program is destined to

deliver an approved drug

•  Past 20 years prove this assumption wrong (again and again)

•  Why do promising early results rarely translate into approved drugs?

•  Bottom line: we have poor understanding of biology

•  Lack of early-data sharing within closed information systems dooms drug discovery for frequent avoidable failure

What is the problem?

We need to rebuild the drug discovery process so that we better understand disease biology before testing proprietary compounds on sick patients

The solution – Arch2POCM 1.  Create an Archipelago of clinicians and scientists from public

and private sectors to take projects from ideas to Proof of Clinical Mechanism (POCM)

2.  Arch2POCM is a collaborative, data-sharing network of scientists, whose drug discovery objective is to use robust compounds against new targets to disentangle the complexity of human biology, not to create a medicine

3.  Success? •  A compound that provides proof of concept for a novel target-

allowing companies to use this common information to compete, with dramatic increased chances of success

•  Culling targets with doomed mechanisms before multiple companies waste money exploring them - at $50M a pop

Why data sharing through to Phase IIb?

•  Most rapidly reveals limitations and opportunities associated with the target

•  Increases probability of success for internal proprietary programs

•  Scientific decisions are not influenced by market considerations or biased internal thinking

•  Target mechanism is only properly tested at Phase IIb

Why no IP on “Common Stream” compounds?

•  Allows multiple groups to test diverse indications without funds from Arch2POCM- crowdsourcing drug discovery

•  Broader and faster data dissemination

•  Far fewer legal agreements to negotiate

•  Generates “freedom to operate” on target because there are no patent thickets to wade through

•  Efficient way to access world’s top scientists and doctors without hassle

Existing Team Ready to Execute

First major milestones

2013- First Compound in clinical trials

2014- Go and No-Go Decisions from common stream of targets driving Proprietary Programs

2014- Full complement of target programs activated

2014- Core Clinical Programs joined by crowdsourced clinical trials

why consider the fourth paradigm- data intensive science

thinking beyond the narrative, beyond pathways

advantages of an open innovation compute space

it is more about how than what

OPPORTUNITIES FOR MIPS COMMUNITY

Data sets, Tools and Models

Joining Synapse Communities

Joining Federation Projects

Joinig Arch2POCM

Change reward structures for sharing data (patients and academics)