Probabilistic Models that uncover the hidden Information Flow in Signalling Networks


Page 1: Probabilistic  Models  that uncover  the hidden Information Flow  in  Signalling Networks

Probabilistic Models that uncover the hidden Information Flow in Signalling Networks

Achim Tresch

Page 2

Which model?

A model that explains the data: merely finds associations. E.g.: Epidemiology (predict colon cancer risk from SNPs)

A model that explains the mechanism: finds explanations. E.g.: Physics, Systems Biology (predict the signal flow through a cascade of transcription factors)

Page 3

Which model?

1st Idea

Our choice: Graphical Models. Nodes correspond to physical entities, arrows correspond to interactions.

Need for interventional data

Two different types of nodes:

• Observable components

• Perturbed components (signals)

Page 4

How do marionettes walk?

Page 5

How do marionettes walk?

[Figure: two panels, "This is what we observe" and "This is the true model"]

Both models explain the observations perfectly.

What makes the right model (biologically) more plausible?

Page 6

How do marionettes walk?

2nd Idea

[Figure: two panels, "This is what we observe" and "This is the true model"]

Both models explain the observations perfectly.

What makes a model (biologically) more plausible?

Signal transmission is expensive!

Find a consistent model with a most parsimonious effects graph.

Signals: signal graph Γ

Observables: effects graph Θ

Page 7

Nested Effects Models

Signal graph, adjacency matrix (with 1's in the diagonal):

$$\Gamma = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}$$

Effects graph, adjacency matrix (rows: signals, columns: observables):

$$\Theta = \begin{pmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{pmatrix}$$

Parsimony Assumption: Each observable is linked to exactly one action

Definition [Markowetz, Bioinformatics 2005]: A Nested Effects Model (NEM) is a model F for which F = Γ Θ

Predicted effects (rows: observables, columns: signals):

$$F^t = (\Gamma\Theta)^t = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}$$
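
The product F = Γ Θ for the example on this slide can be checked with a few lines of Python. This is a minimal sketch: the matrices are the slide's numbers read as Γ (3×3) and Θ (3×2), and the helper names `mat_mul`, `transpose` and `predicted_effects` are illustrative, not taken from any NEM package.

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(M):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


# Signal graph (signals x signals), ones on the diagonal.
GAMMA = [[1, 0, 0],
         [1, 1, 1],
         [0, 0, 1]]

# Effects graph (signals x observables); each column contains exactly
# one 1, which is the parsimony assumption.
THETA = [[1, 0],
         [0, 0],
         [0, 1]]


def predicted_effects(gamma, theta):
    """F = Gamma * Theta (signals x observables)."""
    return mat_mul(gamma, theta)


F = predicted_effects(GAMMA, THETA)
F_T = transpose(F)  # observables x signals, as displayed on the slide
print(F_T)
```

Because every column of Θ contains a single 1, each column of F is simply a copy of one column of Γ, so F stays a 0/1 matrix without any clipping.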

Page 8

Nested Effects Models

Why "nested"?

If the signal graph is transitively closed, then the observed effects are nested in the sense that

a → b implies effects(a) ⊇ effects(b)

The present formulation of a NEM drops the transitivity requirement.

Predicted effects (rows: observables, columns: signals):

$$F^t = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}$$
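
The nesting property can be tested directly on a pair (Γ, F). The sketch below assumes F is stored as signals × observables and Γ[a][b] = 1 encodes an edge a → b; the function names and the second "chain" example are illustrative additions, not part of the talk.

```python
def is_transitively_closed(gamma):
    """True if a -> b and b -> c always imply a -> c."""
    n = len(gamma)
    return all(gamma[i][k]
               for i in range(n) for j in range(n) for k in range(n)
               if gamma[i][j] and gamma[j][k])


def effects(F, s):
    """Set of observables (column indices) that respond to signal s."""
    return {a for a, f in enumerate(F[s]) if f}


def is_nested(gamma, F):
    """True if every edge a -> b satisfies effects(a) >= effects(b)."""
    n = len(gamma)
    return all(effects(F, a) >= effects(F, b)
               for a in range(n) for b in range(n) if gamma[a][b])


# Example from the slides: transitively closed, hence nested.
GAMMA = [[1, 0, 0], [1, 1, 1], [0, 0, 1]]
F = [[1, 0], [1, 1], [0, 1]]           # F = Gamma * Theta

# Hypothetical chain 1 -> 2 -> 3 without the edge 1 -> 3, with a single
# observable attached to signal 3. F_CHAIN is Gamma * Theta for it.
GAMMA_CHAIN = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
F_CHAIN = [[0], [1], [1]]

print(is_transitively_closed(GAMMA), is_nested(GAMMA, F))
print(is_transitively_closed(GAMMA_CHAIN), is_nested(GAMMA_CHAIN, F_CHAIN))
```

The chain example shows what dropping the transitivity requirement means: the model F = Γ Θ is still well defined, but the effects are no longer guaranteed to be nested.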

Page 9

Nested Effects Models

The final ingredient: A quantification of the measured effect strength

Effect of signal s on observable a:

$$R_{a,s} = \log \frac{P(\mathrm{Data}_{a,s} \mid a \text{ responds if } s \text{ is perturbed})}{P(\mathrm{Data}_{a,s} \mid a \text{ does NOT respond if } s \text{ is perturbed})}$$

$R_{a,s} > 0$ if the data favours an effect of s on a

Predicted effects = $F^t$, Measured effects = $R^t$

$$F^t = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}$$
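
To make the log ratio concrete, here is a sketch under an error model that is purely an assumption for illustration: a measurement x is drawn from N(effect_mean, sd) if the observable responds and from N(0, sd) if it does not. The slides do not specify the distributions actually used; `effect_mean`, `sd` and the function names are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def log_ratio(x, effect_mean=2.0, sd=1.0):
    """R for one measurement x under the ASSUMED two-Gaussian model:
    log P(x | responds) - log P(x | does not respond)."""
    return gaussian_logpdf(x, effect_mean, sd) - gaussian_logpdf(x, 0.0, sd)


# With effect_mean = 2 and sd = 1 the ratio simplifies to 2*x - 2:
# positive when the data favour an effect, negative otherwise.
for x in (0.0, 1.0, 2.0):
    print(x, log_ratio(x))
```

Any other pair of densities (for example ones derived from p-values) could be plugged in the same way; only the sign convention R > 0 for "effect favoured" matters for what follows.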

Page 10

Nested Effects Models

Assuming independent data,

$$P(D \mid F) = \prod_{a \in \text{Observables}} \; \prod_{s \in \text{Signals}} P(D_{a,s} \mid F_{s,a})$$

it follows that

$$\log P(D \mid F) - \log P(D \mid F = 0) = \sum_{(s,a)} \log \frac{P(D_{a,s} \mid F_{s,a})}{P(D_{a,s} \mid F_{s,a} = 0)} = \sum_{(s,a)} \begin{cases} R_{a,s} & \text{if } F_{s,a} = 1 \\ 0 & \text{if } F_{s,a} = 0 \end{cases}$$

$$= \sum_{(s,a)} F_{s,a} R_{a,s} = \sum_{a} (RF)_{a,a} = \mathrm{tr}(RF) = \mathrm{tr}(FR)$$

Note: Missing data is handled easily: set $R_{a,s} = 0$
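
The identity between the entrywise sum and the trace can be verified numerically. The sketch below assumes R is stored as observables × signals and F as signals × observables; the numbers in `R_RAW` are made up, and `None` marks a missing measurement that is replaced by 0 as the note suggests.

```python
def fill_missing(R):
    """Replace missing measurements (None) by 0, i.e. no evidence either way."""
    return [[0.0 if r is None else r for r in row] for row in R]


def score(F, R):
    """Sum over all (s, a) of F[s][a] * R[a][s]."""
    return sum(F[s][a] * R[a][s]
               for s in range(len(F)) for a in range(len(R)))


def trace_FR(F, R):
    """tr(F R), computed through the explicit matrix product."""
    n_signals, n_obs = len(F), len(R)
    FR = [[sum(F[i][a] * R[a][j] for a in range(n_obs))
           for j in range(n_signals)]
          for i in range(n_signals)]
    return sum(FR[s][s] for s in range(n_signals))


F = [[1, 0], [1, 1], [0, 1]]            # predicted effects, signals x observables
R_RAW = [[1.5, 2.0, -1.0],              # made-up log ratios, observables x signals
         [-0.5, None, 3.0]]             # None = missing measurement
R = fill_missing(R_RAW)

print(score(F, R), trace_FR(F, R))
```

A missing entry contributes nothing to the score regardless of whether the model predicts an effect there, which is why setting it to 0 is harmless.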

Page 11

NEM Estimation

There are two ways of finding a high scoring NEM:

Maximum Likelihood:

$$(\hat{\Gamma}, \hat{\Theta}) = \arg\max_{\Gamma, \Theta} \; \mathrm{tr}(\Gamma \Theta R)$$

Bayesian, posterior mode:

$$\hat{\Gamma} = \arg\max_{\Gamma} \int \exp\big(\mathrm{tr}(\Gamma \Theta R)\big) \, P(\Theta) \, d\Theta$$

For n≤5 signals, an exhaustive parameter space search is possible.

For larger n, apply standard optimization strategies:

• Gradient ascent, Simulated annealing

or heuristics tailored to NEMs:

• Module networks [Fröhlich et al., BMC Bioinformatics 2007]

• Triplet search [Markowetz et al., Bioinformatics 2007]

Theorem (Tresch, SAGeMB 2008): For ideal data, $(\hat{\Gamma}, \hat{\Theta})$ is unique up to reversals (Corollary: $\hat{\Gamma}$ is unique if Γ is a DAG).
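
For a handful of signals the maximum likelihood search can be written out in full. The sketch below is an illustration under the parsimony assumption (each observable attached to exactly one signal), not the implementation of any published package. It uses the rearrangement tr(Γ Θ R) = Σ_a Σ_t Θ_{t,a} (R Γ)_{a,t}, so for a fixed Γ the best Θ attaches each observable a to the signal t with the largest (R Γ)_{a,t}. All function names are hypothetical, and `R_IDEAL` is a made-up noise-free data set (+1 where the example model predicts an effect, −1 elsewhere).

```python
from itertools import product


def all_signal_graphs(n):
    """Yield every n x n adjacency matrix with ones on the diagonal."""
    off_diag = [(i, j) for i in range(n) for j in range(n) if i != j]
    for bits in product((0, 1), repeat=len(off_diag)):
        gamma = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
        for (i, j), b in zip(off_diag, bits):
            gamma[i][j] = b
        yield gamma


def best_theta(gamma, R):
    """For fixed Gamma, attach each observable to its best-scoring signal.

    Returns (attach, total) where attach[a] is the chosen signal index
    and total is the resulting value of tr(Gamma Theta R)."""
    n = len(gamma)
    attach, total = [], 0.0
    for row in R:  # one row of R per observable
        col_scores = [sum(row[s] * gamma[s][t] for s in range(n))
                      for t in range(n)]
        t_best = max(range(n), key=col_scores.__getitem__)
        attach.append(t_best)
        total += col_scores[t_best]
    return attach, total


def exhaustive_ml(R, n):
    """Score all 2^(n(n-1)) signal graphs and return the best one found."""
    best = None
    for gamma in all_signal_graphs(n):
        attach, total = best_theta(gamma, R)
        if best is None or total > best[0]:
            best = (total, gamma, attach)
    return best


def predicted_effects(gamma, attach):
    """F[s][a] = Gamma[s][attach[a]] (signals x observables)."""
    return [[gamma[s][t] for t in attach] for s in range(len(gamma))]


R_IDEAL = [[1, 1, -1],
           [-1, 1, 1]]  # observables x signals

best_score, gamma_hat, attach_hat = exhaustive_ml(R_IDEAL, 3)
print(best_score, predicted_effects(gamma_hat, attach_hat))
```

Several signal graphs reach the same top score on this toy input (edges into a signal that carries no observable do not change F), so the sketch reports the first maximizer it encounters; the predicted effects F are the same for all of them.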

Page 12

Simulation

[Figure: three panels showing the true graphs Γ, Θ, the simulated measurements (R), and the ideal measurements (ΓΘ); axes: signals 1 to 4, observables 1 to 30]

R/Bioconductor package: Nessy
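
A simulation of this kind can be sketched as follows. This is not the Nessy code; it is a hypothetical generator that starts from ideal measurements (the predicted effects ΓΘ) and adds Gaussian noise, with `effect`, `noise_sd` and `seed` chosen arbitrarily for illustration.

```python
import random


def simulate_R(F_t, effect=2.0, noise_sd=1.0, seed=0):
    """Simulate log ratios R (observables x signals) from predicted effects.

    Entries where an effect is predicted are centred at +effect, all
    others at -effect; Gaussian noise with standard deviation noise_sd
    is added to every entry. This noise model is an assumption.
    """
    rng = random.Random(seed)  # local generator, reproducible per seed
    return [[(effect if f else -effect) + rng.gauss(0.0, noise_sd)
             for f in row]
            for row in F_t]


# Predicted effects of the small example model (observables x signals).
F_T = [[1, 1, 0],
       [0, 1, 1]]

R_sim = simulate_R(F_T, noise_sd=1.0, seed=42)
for row in R_sim:
    print([round(r, 2) for r in row])
```

Setting `noise_sd=0.0` reproduces the ideal measurements up to the scaling by `effect`, which is a convenient sanity check before running an estimation method on noisy data.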

Page 13

Simulation

[Figure: true graph and estimated graph; signals 1 to 4, observables 1 to 30]

[Figure: histogram of res$posteriors, titled "Distribution of the likelihoods" (x-axis: res$posteriors, y-axis: Frequency)]

12 edges, 2^12 = 4096 signal graphs, ~4 seconds
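
The count of candidate signal graphs follows from the number of possible directed edges between n signals (self-loops are fixed, since Γ has ones on the diagonal). As a worked calculation:

$$\#\{\Gamma\} = 2^{n(n-1)}: \qquad n = 4 \Rightarrow 2^{12} = 4096, \qquad n = 5 \Rightarrow 2^{20} = 1\,048\,576, \qquad n = 6 \Rightarrow 2^{30} \approx 1.07 \times 10^{9}$$

This growth is why an exhaustive search is only practical for a handful of signals.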

Page 14

Application: Synthetic Lethality

[Figure: Pathway I and Pathway II connected by synthetic lethality links; node labels a, b and 1, 2, 3]

Hypotheses:

• Synthetic lethality (SL) between two genes occurs if the genes are located in different pathways

• Genes sharing the same synthetic lethality partners have an increased chance of being located in the same pathway [Ye, Bader et al., Mol. Systems Biology 2005]

Consequence:

• A gene b whose SL partners are nested into the SL partners of another gene a is likely to be located beneath a in the same pathway.
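
The consequence stated on this slide amounts to a subset test on SL partner sets. A minimal sketch, using invented gene and partner names (`geneA`, `p1`, ...) purely for illustration:

```python
def nested_pairs(sl_partners):
    """Return sorted (a, b) pairs where b's SL partners are a non-empty
    proper subset of a's SL partners, i.e. b is a candidate for lying
    beneath a in the same pathway."""
    return sorted(
        (a, b)
        for a in sl_partners
        for b in sl_partners
        if a != b and sl_partners[b] and sl_partners[b] < sl_partners[a]
    )


# Hypothetical SL partner sets, not real data.
SL_PARTNERS = {
    "geneA": {"p1", "p2", "p3"},
    "geneB": {"p1", "p2"},
    "geneC": {"p4"},
}

print(nested_pairs(SL_PARTNERS))
```

Real SL screens are noisy, so a strict subset test like this one is too brittle in practice; the probabilistic score of a NEM replaces it with a soft version of the same idea.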

Page 15

Application: Synthetic Lethality

[Figure: heatmap of synthetic lethality data over yeast genes (YAL021C to YPR164W on both axes), colour scale from -1 to 1]

Pan et al., Cell 2006

Page 16

Application: Synthetic Lethality

7 of 10 Genes directly linked to DNA repair

Tresch, unpublished

Page 17

References:

• Structure Learning in Nested Effects Models. A. Tresch, F. Markowetz, to appear in SAGeMB 2008, available on arXiv

• Nested Effects Models as a Means to learn Signaling Networks from Intervention Effects. H. Fröhlich, A. Tresch, F. Markowetz, M. Fellmann, R. Spang, T. Beissbarth, in preparation

• Computational identification of cellular networks and pathways F. Markowetz, Olga G. Troyanskaya, Dennis Kostka, Rainer Spang. Molecular BioSystems, Bioinformatics 2007

• Non-transcriptional Pathway Features Reconstructed from Secondary Effects of RNA Interference. F. Markowetz, J. Bloch, R. Spang, Bioinformatics 2005

R/Bioconductor packages:

• NEM (Markowetz, Fröhlich, Beissbarth)

• Nessy (Tresch)

Software, References

Page 18

Research & Teaching Activities

Research related to the theory of NEMs

• Integration of multiple data sources

• Time-dependent NEMs

• Allow for arbitrary signalling model

Teaching

• Lectures & Exercises in Bioinformatics, Machine Learning, Statistics for Physicians, Group Theory, Microarray Analysis

• E-learning Core Group of the Faculty

• Bachelor-/Master- and PhD theses

Other Research Topics

• Software for data acquisition, processing & visualization for high-density technologies

• Design and analysis of biological/clinical experiments, consulting

Page 19

Acknowledgements

Florian Markowetz, Lewis-Sigler Institute, Princeton

Tim Beissbarth, Holger Fröhlich, German Cancer Research Center, Heidelberg

Rainer Spang, Computational Diagnostics Group, Regensburg

Page 20

Thank You!

Conclusion

Exercise:

Why is this administration model inefficient?

Construct a model that scores better!

Page 21

Page 22

What I did not show …

[Figure: Estimated Graph (68 'known' Observations); signal labels rel-, key-, tak-, mkk4hep-; observables 1 to 68]

Automatic Feature Selection, without Control experiment: Estimated graph (120 genes selected)

Page 23

The „observed“ graph of the Fellmann estrogen receptor dataset

What I did not show …

Page 24

15 Genes

17 Knockdown Experiments

6 of them double Knockdowns

What I did not show …

Page 25

What I did not show …

Same data, with prior knowledge.

[Figure: estimated graph and data heatmap, colour scale from -1 to 1. Genes: AKT1, AKT2, BCL2, CCNG2, ERK1, ERK2, ESR1, FOXA1, GPR30, HSPB8, LOC120224, STAT3, STAT5B, STC2, XBP1. Knockdown experiments: AKT1, AKT2, BCL2, CCNG2, ESR1, FOXA1, HSPB8, LOC120224, STAT5B, STC2, XBP1, ERK1+STAT5B, ERK2+STAT5B, ESR1+ERK1, ESR1+ERK2, ESR1+GPR30, STAT3+STAT5B]