Identifying Candidate Pathways to Explain Phenotypes in Genome-Wide Mutant Screens Mark Craven Department of Biostatistics & Medical Informatics University of Wisconsin-Madison U.S.A. joint work with: Deborah Chasman, Paul Ahlquist, Brandi Gancarz, Linhui Hao

Page 1:

Identifying Candidate Pathways to Explain Phenotypes in Genome-Wide Mutant Screens

Mark Craven

Department of Biostatistics & Medical Informatics

University of Wisconsin-Madison

U.S.A.

joint work with: Deborah Chasman, Paul Ahlquist, Brandi Gancarz, Linhui Hao

Page 2:

Viruses take advantage of host cell genes

Figure from: C. E. Samuel, Journal of Biological Chemistry 285, 2010.

Page 3:

Genome-wide mutant screens

[Figure: repeated panels of gene labels (HOS1, SPE3, MED1, YPR071W, NOT5, LTP1), with the annotation "mutant phenotype"]

Example: determining which host genes affect viral replication [Kushner et al., PNAS 2003; Gancarz et al., PLoS One 2011]

Page 4:

Genome-wide mutant screens

The outputs of such screens are sets of genes that either inhibit or stimulate viral processes

Page 5:

Characterizing virus-host interactions

given such interaction data, we want to

• identify pathways that provide consistent explanations for the genome-wide measurements

• predict the interfaces to the virus

[Figure: the interaction network before inference and after inference.] After inference:

• some interactions are deemed not consistent with the measurements

• directions and signs of interactions are specified

• interfaces to the virus are hypothesized

Page 6:

Integer programming approach

1. Collect candidate pathways for each “hit”

2. Use IP to identify a globally consistent subnetwork

Page 7:

Collecting candidate pathways for a hit

generate candidate pathways, up to a specified length, that link a hit to the virus
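
The enumeration step described above can be sketched as a depth-limited search. This is only a minimal illustration, not the authors' implementation: the adjacency-dictionary format, the function name `candidate_pathways`, the explicit set of candidate interface nodes, and the choice to stop a path at the first interface node it reaches are all assumptions made for the example.

```python
from typing import Dict, List, Set


def candidate_pathways(graph: Dict[str, Set[str]], hit: str,
                       candidate_interfaces: Set[str],
                       max_edges: int) -> List[List[str]]:
    """Enumerate simple paths from `hit` to a candidate interface node.

    Each returned path has at most `max_edges` edges and ends at the first
    candidate interface node it reaches (an assumption of this sketch).
    """
    paths: List[List[str]] = []

    def extend(path: List[str]) -> None:
        node = path[-1]
        if node in candidate_interfaces and len(path) > 1:
            paths.append(list(path))
            return
        if len(path) - 1 == max_edges:
            return  # length limit reached without finding an interface
        for nxt in sorted(graph.get(node, ())):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                path.append(nxt)
                extend(path)
                path.pop()

    extend([hit])
    return paths


# Hypothetical toy network; node names are placeholders, not real genes.
toy_graph = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}
toy_paths = candidate_pathways(toy_graph, "A", {"D"}, max_edges=2)
```

The number of paths grows quickly with the length limit, which is presumably why the slide bounds pathways "up to a specified length".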

Page 8:

Variables in integer programming approach

[Figure: example pathway p labeled with σ_p, edge variables x1, (x2, s2, d2), (x3, s3), and node phenotype t_g]

Variable   Description

x_e        is edge e active?

s_e        sign of edge e (up- or down-regulating)

d_e        direction of edge e

t_g        phenotype (effect) of knocking out gene g

σ_p        is pathway p active?

Page 9:

Constraints in integer programming approach

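$$\forall n \in \text{hits}: \quad \left( \sum_{p \,:\, n \in \text{nodes}(p)} \sigma_p \right) > 0$$

[Figure: example pathway labeled with σ_p, x1, (x2, s2, d2), (x3, s3), and t_g]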

all significant measurements (hits) are explained by at least one pathway

Page 10:

Constraints in integer programming approach

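$$\forall p: \quad \left( \sigma_p = 0 \;\lor\; \text{consistent}(p) \right)$$

$$\text{consistent}(p) = \bigwedge_{\substack{e \in \text{edges}(p) \\ \{i,j\} = \text{nodes}(e)}} \left( x_e = 1 \;\land\; d_e = \text{dir}(p,e) \;\land\; s_e = t_i t_j \right)$$

[Figure: example pathway labeled with σ_p, x1, (x2, s2, d2), (x3, s3), and t_g]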

all active pathways are consistent, with edges directed toward the interface
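
To make the consistency constraint concrete, the check can be written out for a single pathway under a fixed assignment of the variables. This is a hedged sketch rather than the authors' encoding: it assumes signs and phenotypes are coded as +1/-1 (so that s_e = t_i t_j is a plain product), that a pathway is a node list ordered from the hit toward the interface, and that d_e is stored as an ordered (source, target) pair. The names `consistent` and `pathway_constraint_holds` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Edge = FrozenSet[str]  # undirected edge identified by its two endpoints


def consistent(path: List[str],
               x: Dict[Edge, bool],
               d: Dict[Edge, Tuple[str, str]],
               s: Dict[Edge, int],
               t: Dict[str, int]) -> bool:
    """Check the three per-edge conditions along `path`.

    For every edge {i, j} on the path: the edge is active (x_e = 1), its
    direction follows the path toward the interface (d_e = dir(p, e)), and
    its sign equals the product of the endpoint phenotypes (s_e = t_i t_j).
    """
    for i, j in zip(path, path[1:]):
        e = frozenset((i, j))
        if not x.get(e, False):
            return False
        if d.get(e) != (i, j):
            return False
        if s.get(e) != t[i] * t[j]:
            return False
    return True


def pathway_constraint_holds(sigma_p: bool, path: List[str], x, d, s, t) -> bool:
    """The constraint itself: sigma_p = 0 OR consistent(p)."""
    return (not sigma_p) or consistent(path, x, d, s, t)


# Hypothetical assignment for a two-edge pathway A -> B -> V.
ab, bv = frozenset(("A", "B")), frozenset(("B", "V"))
path = ["A", "B", "V"]
x = {ab: True, bv: True}
d = {ab: ("A", "B"), bv: ("B", "V")}
t = {"A": -1, "B": -1, "V": 1}
s = {ab: 1, bv: -1}
ok = consistent(path, x, d, s, t)
```

In an actual integer program these conditions would be linearized as constraints over the decision variables; the function above only verifies a candidate assignment.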

Page 11:

Current objective function in integer programming approach

minimize the number of interfaces

$$\min \sum_{n \in \text{interfaces}} I\!\left( \sum_{p \,:\, n \in \text{nodes}(p)} \sigma_p > 0 \right)$$

[Figure: example pathway labeled with σ_p, x1, (x2, s2, d2), (x3, s3), and t_g]
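
For intuition, the coverage constraint and the interface-minimizing objective can be combined on a toy instance by exhaustive enumeration. This is deliberately not an integer programming solver, and it is not the authors' code: it tries every 0/1 assignment of the pathway indicators σ_p, so it is only usable for a handful of pathways. It also assumes the pathways passed in have already been filtered for consistency. The function name and the toy data are hypothetical.

```python
from itertools import product
from typing import List, Optional, Set, Tuple


def solve_by_enumeration(pathways: List[List[str]],
                         hits: Set[str],
                         interfaces: Set[str]) -> Optional[Tuple[int, Tuple[int, ...]]]:
    """Return (number of used interfaces, sigma) for a best feasible sigma.

    Feasible: every hit lies on at least one active pathway.
    Cost: number of interface nodes that lie on any active pathway.
    Returns None if no assignment covers all hits.
    """
    best: Optional[Tuple[int, Tuple[int, ...]]] = None
    for sigma in product((0, 1), repeat=len(pathways)):
        active = [p for p, on in zip(pathways, sigma) if on]
        covered = set().union(*active)  # all nodes on active pathways
        if not hits <= covered:
            continue  # some hit is unexplained
        cost = len(interfaces & covered)
        if best is None or cost < best[0]:
            best = (cost, sigma)
    return best


# Hypothetical toy instance: two hits, two possible interfaces.
toy_pathways = [["h1", "a", "v1"], ["h1", "b", "v2"], ["h2", "b", "v2"]]
solution = solve_by_enumeration(toy_pathways, {"h1", "h2"}, {"v1", "v2"})
```

Here both hits can be explained through the single interface `v2`, so the minimum cost is 1. A real instance would hand the same objective and constraints to an IP solver instead of enumerating.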

Page 12:

[Figure: the network before inference and after inference]

Page 13:

How to evaluate the IP approach?

• hold a measurement aside

• see if we can correctly predict it using inferred networks
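
The hold-one-aside protocol can be sketched as a small evaluation loop. This is a generic leave-one-out harness under stated assumptions, not the authors' evaluation code: measurements are assumed to be a gene-to-label dictionary with labels coded +1/-1, and `predict` is any function that sees only the remaining measurements. The `majority_label` predictor is a trivial stand-in so the example runs; in the setting of the talk, the predictor would be inference on the network with the held-aside gene's measurement removed.

```python
from typing import Callable, Dict


def leave_one_out_accuracy(measurements: Dict[str, int],
                           predict: Callable[[str, Dict[str, int]], int]) -> float:
    """Hold each measurement aside in turn and score the prediction for it."""
    if not measurements:
        raise ValueError("need at least one measurement")
    correct = 0
    for gene, label in measurements.items():
        # The predictor never sees the held-aside gene's own label.
        training = {g: v for g, v in measurements.items() if g != gene}
        if predict(gene, training) == label:
            correct += 1
    return correct / len(measurements)


def majority_label(gene: str, training: Dict[str, int]) -> int:
    """Placeholder predictor: the most common label among the other genes."""
    return 1 if sum(training.values()) >= 0 else -1


# Hypothetical labels: +1 and -1 stand for the two phenotype classes.
toy_measurements = {"A": 1, "B": 1, "C": 1, "D": -1}
accuracy = leave_one_out_accuracy(toy_measurements, majority_label)
```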

Page 14:

Baseline predictors

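[Figure: two baseline predictors for a held-aside query gene (?)]

• neighbor voting: 1 neighbor votes for one phenotype and 2 neighbors vote for the other; predict the phenotype with more votes

• further neighbor voting: 3 neighbors vote for one phenotype and 2 neighbors vote for the other; predict the phenotype with more votes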
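
A voting baseline of this kind can be sketched as follows. The details are assumptions, since the slide only shows vote counts: plain neighbor voting is taken to mean a majority vote over labeled genes one edge away from the query, and "further" neighbor voting is taken to mean the same vote over genes within two edges. The function name, the `radius` parameter, and the choice to return `None` on ties are all hypothetical.

```python
from collections import Counter, deque
from typing import Dict, Hashable, Iterable, Optional


def neighbor_vote(graph: Dict[str, Iterable[str]],
                  labels: Dict[str, Hashable],
                  query: str,
                  radius: int = 1) -> Optional[Hashable]:
    """Majority label among labeled genes within `radius` edges of `query`.

    Returns None if nobody votes or the top two labels tie.
    """
    seen = {query}  # the query never votes for itself
    frontier = deque([(query, 0)])
    votes: Counter = Counter()
    while frontier:
        node, dist = frontier.popleft()
        if dist == radius:
            continue  # do not look beyond the radius
        for nb in graph.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                if nb in labels:
                    votes[labels[nb]] += 1
                frontier.append((nb, dist + 1))
    ranked = votes.most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical network mirroring the vote counts on the slide.
toy_net = {
    "Q": ["A", "B", "C"],
    "A": ["Q", "D"],
    "B": ["Q"],
    "C": ["Q", "E"],
    "D": ["A"],
    "E": ["C"],
}
toy_labels = {"A": "up", "B": "down", "C": "down", "D": "up", "E": "up"}
near = neighbor_vote(toy_net, toy_labels, "Q", radius=1)  # 1 "up" vs 2 "down"
far = neighbor_vote(toy_net, toy_labels, "Q", radius=2)   # 3 "up" vs 2 "down"
```

In this toy network the two baselines disagree: the direct neighbors favor one phenotype while the larger neighborhood favors the other.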

Page 15:

Baseline predictors

consistency neighbor voting

[Figure: prediction for the query gene (?), with 1 neighbor voting for one phenotype and 2 neighbors voting for the other]

Annotation: this gene votes because it has a repressing interaction with the query gene

Page 16:

Markov network approach

• variables are the same as in the IP

• potential functions represent
  • the constraints
  • uncertainty associated with specific interactions
  • the preference for a small number of interfaces

• inference done using Gibbs sampling
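
As a reminder of how Gibbs sampling works on a Markov network, here is a minimal sampler for a generic pairwise model over +1/-1 variables. It is not the model from the talk, whose variables and potentials are richer. It assumes a distribution proportional to exp(Σ_i h_i x_i + Σ_{ij} J_ij x_i x_j), and the names `unary`, `pairwise`, and `gibbs_sample` are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple


def gibbs_sample(n_vars: int,
                 unary: List[float],
                 pairwise: Dict[Tuple[int, int], float],
                 n_sweeps: int,
                 burn_in: int = 100,
                 seed: int = 0) -> List[Tuple[int, ...]]:
    """Draw `n_sweeps` samples (one per full sweep) after `burn_in` sweeps."""
    rng = random.Random(seed)
    # Adjacency list of (neighbor, coupling weight) for each variable.
    nbrs: List[List[Tuple[int, float]]] = [[] for _ in range(n_vars)]
    for (i, j), w in pairwise.items():
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))
    x = [rng.choice((-1, 1)) for _ in range(n_vars)]
    samples: List[Tuple[int, ...]] = []
    for sweep in range(burn_in + n_sweeps):
        for i in range(n_vars):
            # Resample x_i from its conditional given all other variables.
            field = unary[i] + sum(w * x[j] for j, w in nbrs[i])
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * field))
            x[i] = 1 if rng.random() < p_plus else -1
        if sweep >= burn_in:
            samples.append(tuple(x))
    return samples


# Two variables with a strong positive coupling should usually agree.
samples = gibbs_sample(2, [0.0, 0.0], {(0, 1): 2.0}, n_sweeps=2000)
agree_fraction = sum(a == b for a, b in samples) / len(samples)
```

Marginals, such as the probability that a held-aside gene has a given phenotype, can then be estimated as sample averages.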

Page 17:

Predictive Accuracy (BMV)

Page 18:

Predictive Accuracy (FHV)

Page 19:

Future work

• taking into account additional sources of information
  • quantitative values from assays
  • genetic interactions
  • interactions automatically extracted from the scientific literature

• adapting the approach to RNAi screens in mammalian cells
  • more genes
  • lower density of known interactions
  • more uncertainty in measurements

• devising methods that use these models to determine which follow-up experiments would be most informative