Dynamic modelling of microarray data. Martino Barenco Institute of Child Health / UCL.

Page 1:

Dynamic modelling of microarray data.

Martino Barenco

Institute of Child Health / UCL

Page 2:

Goal: predict targets of a known transcription factor in a complex response using dynamic models and time-course microarray data.

HVDM: Hidden Variable Dynamic Modelling

Outline

1) Principle + Results (Genome Biology 2006)

2) Techniques (R/Bioconductor implementation: rHVDM)

Page 3:

Gene expression model

Transcript concentration $X_j(t)$:

$$\frac{dX_j(t)}{dt} = B_j + S_j f(t) - D_j X_j(t)$$

- $B_j + S_j f(t)$: transcription rates
- $D_j$: degradation rate
- $f(t)$: transcription factor activity

[Figure: profiles $X_j(t)$ produced by the same activity $f(t)$ for different parameter values: $B_j$ = 0 or 10, $S_j$ = 3 or 6, $D_j$ = 1 or 0.1, with $B_j = 0$, $S_j = 3$, $D_j = 1$ as the reference case.]
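As a minimal sketch of how the model behaves, the equation above can be integrated numerically. The Python below uses a fixed-step fourth-order Runge-Kutta scheme; the function names (`simulate_transcript`, `step_activity`), the step-shaped activity and the step size are illustrative assumptions and are not taken from HVDM.

```python
def simulate_transcript(f, B, S, D, x0, t_end, dt=0.01):
    """Integrate dX/dt = B + S*f(t) - D*X with classical RK4.

    Returns a list of (t, X) pairs. All names here are illustrative.
    """
    def rhs(t, x):
        return B + S * f(t) - D * x

    n_steps = int(round(t_end / dt))
    t, x = 0.0, x0
    trajectory = [(t, x)]
    for _ in range(n_steps):
        k1 = rhs(t, x)
        k2 = rhs(t + dt / 2, x + dt * k1 / 2)
        k3 = rhs(t + dt / 2, x + dt * k2 / 2)
        k4 = rhs(t + dt, x + dt * k3)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
        trajectory.append((t, x))
    return trajectory


def step_activity(t):
    """Hypothetical activity profile: off before t = 1, constant afterwards."""
    return 1.0 if t >= 1.0 else 0.0


# Reference values from the slide: B_j = 0, S_j = 3, D_j = 1.
trajectory = simulate_transcript(step_activity, B=0.0, S=3.0, D=1.0,
                                 x0=0.0, t_end=12.0)
final_time, final_level = trajectory[-1]
```

With a constant activity the transcript level settles at $(B_j + S_j f)/D_j$, so a larger $D_j$ gives both a lower plateau and a faster response.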

Page 4:

Algorithm principle

I) Training step:

Inputs:
- Previous biological knowledge: known targets of the transcription factor
- Expression values of those targets

Outputs:
- Transcription factor activity (the hidden variable)
- Kinetic parameters for the training genes

II) Screening step (for each single gene):

Inputs:
- Transcription factor activity
- Expression profile of the gene

Output:
- Dependency status of the gene: target or not?

Page 5:

Training step (j: training genes): $f(t)$ and the kinetic parameters $B_j$, $S_j$, $D_j$ are estimated together.

$$\frac{dX_j(t)}{dt} = B_j + S_j f(t) - D_j X_j(t)$$

Screening step (j: individual gene being screened): $f(t)$ is taken from the training step, and $B_j$, $S_j$, $D_j$ are estimated for the gene.

$$\frac{dX_j(t)}{dt} = B_j + S_j f(t) - D_j X_j(t)$$

Page 6:

The p53 network

[Figure: diagram of the p53 network. Nodes shown: DNA damage, ATM, active ATM, CHK2, p53, active p53, MDM2, p19Arf, Rb, E2F1, Rb/E2F1, CDK4, p73, 14-3-3, p21, Jun-B, Jun, myb, bcl2, Bax, p53AIF, Puma, Fas, Pidd, DR5. Outcomes shown: cell cycle G1/S arrest, G2/M arrest, survival, death receptor, mitochondrial apoptosis.]

Page 7:

Experimental setup

Human T cells (MOLT4, p53 wild-type) were subjected to 5 Gy irradiation.

mRNA was harvested 2, 4, 6, 8, 10 and 12 hours after irradiation, and just before it (0 h time point).

Affymetrix microarrays (HG-U133) were then run.

The experiment was run in triplicate.

Page 8:

Results of training step: activity profile of p53

Page 9:

Screening

Q: which other genes are activated by p53?

Putative p53 targets must both:
a) Fit the model well
b) Have a sensitivity coefficient $S_j > 0$

$$\frac{dX_j(t)}{dt} = B_j + S_j f(t) - D_j X_j(t)$$
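The two criteria can be sketched as a simple filter. The Python below assumes that fit quality is summarised by a model score (lower is better) and that "$S_j > 0$" is judged through a Z-score, taken here as the fitted sensitivity divided by its standard deviation. The cut-off values and the function names are hypothetical, chosen for illustration only.

```python
def sensitivity_z_score(s_estimate, s_std_dev):
    """Z-score of the sensitivity: estimate over its standard deviation (assumed definition)."""
    if s_std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return s_estimate / s_std_dev


def is_putative_target(model_score, z_score, max_model_score=100.0, min_z_score=2.5):
    """Apply both criteria: a) good fit, b) sensitivity significantly above zero.

    The two thresholds are illustrative placeholders, not values from the slides.
    """
    fits_well = model_score <= max_model_score
    sensitive = z_score >= min_z_score
    return fits_well and sensitive


# A flat gene can fit well (low score) yet have S_j indistinguishable from 0.
flat_z = sensitivity_z_score(0.05, 0.4)
flat_gene = is_putative_target(model_score=12.0, z_score=flat_z)

# Model score and Z-score as listed for DDB2 in the results table.
responsive_gene = is_putative_target(model_score=18.74, z_score=18.24)
```

Requiring both conditions is what separates genuine responders from flat profiles, which satisfy a) trivially but fail b).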

Page 10:

Gene Title | Gene Symbol | Affymetrix Identifier | Model score M | Sensitivity (Z-score)
damage-specific DNA binding protein 2, 48kDa | DDB2 | 203409_at | 18.74 | 18.24
CD38 antigen (p45) | CD38 | 205692_s_at | 36.69 | 14.77
ferredoxin reductase | FDXR | 207813_s_at | 79.82 | 13.19
hypothetical protein FLJ22457 | FLJ22457 | 221081_s_at | 60.45 | 11.01
tripartite motif-containing 22 | TRIM22 | 213293_s_at | 41.36 | 10.99
carnitine O-octanoyltransferase | CROT | 204573_at | 84.4 | 10.98
glutaminase 2 (liver, mitochondrial) | GLS2 | 205531_s_at | 42.83 | 10.28
leucine-rich repeats and death domain containing | LRDD | 219019_at | 78.8 | 9.9
hect domain and RLD 5 | HERC5 | 219863_at | 37.65 | 9.55
cyclin G1 | CCNG1 | 208796_s_at | 17.04 | 9.37
BCL2-interacting killer | BIK | 205780_at | 19.43 | 9.35
activating signal cointegrator 1 complex subunit 3 | ASCC3 | 212815_at | 60.34 | 9.26
sestrin 1 | SESN1 | 218346_s_at | 8.37 | 9.25
p53 target zinc finger protein | WIG1 | 219628_at | 41.33 | 9.19
tumor necrosis factor receptor superfamily, member 10b | TNFRSF10B | 209295_at | 27.34 | 9.05
chromosome 6 open reading frame 4 | C6orf4 | 215411_s_at | 86.45 | 8.81
cyclin-dependent kinase inhibitor 1A (p21) | CDKN1A | 202284_s_at | 24.98 | 8.4
etoposide induced 2.4 mRNA | EI24/PIG8 | 216396_s_at | 88.04 | 8.2
mitogen-activated protein kinase kinase kinase kinase 4 | MAP4K4 | 206571_s_at | 62.88 | 7.54
lymphoid-restricted membrane protein | LRMP | 204674_at | 26.92 | 7.36
xeroderma pigmentosum, group C | XPC | 209375_at | 43.09 | 7.36
TNF (ligand) superfamily, member 4 (Ox40L) | TNFSF4 | 207426_s_at | 34.73 | 7.15
Human cleavage/polyadenylation specificity factor | CPSF1 | 33132_at | 77.75 | 7.09
AMP-activated protein kinase, beta 1 subunit | PRKAB1 | 201834_at | 25.72 | 7.01
transducer of ERBB2, 1 | TOB1 | 202704_at | 92.69 | 6.79
p53-inducible cell-survival factor | P53CSV | 218403_at | 48.33 | 6.5
sortilin-related receptor, L(DLR class) | SORL1 | 203509_at | 15.66 | 6.34
Fas (TNF receptor superfamily, member 6) | FAS | 216252_x_at | 44.31 | 6.23
ribonucleotide reductase M1 polypeptide | RRM1 | 201477_s_at | 46.58 | 6.19
archaemetzincins-2 | AMZ2 | 218167_at | 37.48 | 6.16
galactose-3-O-sulfotransferase 4 | GAL3ST4 | 219815_at | 38.62 | 5.97
growth arrest and DNA-damage-inducible, alpha | GADD45A | 203725_at | 84.23 | 5.89
hypothetical protein FLJ11259 | FLJ11259 | 218627_at | 7.23 | 5.87
major histocompatibility complex, class I, B | HLA-B | 209140_x_at | 89.77 | 5.79
testis specific, 10 | TSGA10 | 220623_s_at | 20.85 | 5.67
hypothetical protein MDS025 | MDS025 | 218288_s_at | 31.35 | 5.66
TP53 activated protein 1 | TP53AP1 | 209917_s_at | 22.22 | 5.65
leukemia inhibitory factor | LIF | 205266_at | 14.86 | 5.62
interferon stimulated exonuclease gene 20kDa-like 1 | ISG20L1 | 219361_s_at | 48.55 | 5.56

Page 11:

p21: part of training set

CD38: uncovered by screening

Page 12:

Verification experiment

siRNA knockdown of p53:

HVDM predictions:

Page 13:

Ingredients needed

1) ODE integrator:

$$\frac{dX_j(t)}{dt} = B_j + S_j f(t) - D_j X_j(t)$$

+ parameter values → $X_{j,\mathrm{MODEL}}(t)$

2) Model fitting:

Find a set of parameter values such that

$$X_{j,\mathrm{MODEL}}(t) \cong X_{j,\mathrm{DATA}}(t)$$

3) Want to take measurement noise in the data into account

4) Specifically for the Bioconductor implementation: be reasonably quick

Page 14:

1) ODE integration

[Figure: example time course sampled at t = 0, 2, 4, 6, 8, 10, 12; vertical axis from 0 to 80.]

- Want to estimate the slope of $X_{j,\mathrm{MODEL}}(t)$ at t=6

- Slope = weighted sum of time points around t=6:

$$\frac{dX_{j,\mathrm{MODEL}}(t)}{dt} \cong A\,X_{j,\mathrm{MODEL}}(t)$$

- i.e. the ODE is turned into a system of linear equations:

$$A\,X_j(t) = B_j + S_j f(t) - D_j X_j(t)$$

Formal solution:

$$X_{j,\mathrm{MODEL}}(t) = (A + D_j I)^{-1}\,(B_j \mathbf{1} + S_j f(t))$$
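The formal solution can be sketched in a few lines of Python. The slides do not specify the weights in $A$, so this sketch swaps in ordinary second-order finite differences on an evenly spaced grid as an assumption. The helper names, the activity $f(t) = t$ and the parameter values are all hypothetical.

```python
def differentiation_matrix(n, h):
    """Second-order finite-difference matrix A such that (A x)[i] ~ dx/dt at t_i."""
    A = [[0.0] * n for _ in range(n)]
    # One-sided three-point stencil at the first time point.
    A[0][0], A[0][1], A[0][2] = -3 / (2 * h), 4 / (2 * h), -1 / (2 * h)
    # Central differences in the interior.
    for i in range(1, n - 1):
        A[i][i - 1], A[i][i + 1] = -1 / (2 * h), 1 / (2 * h)
    # One-sided three-point stencil at the last time point.
    A[n - 1][n - 3], A[n - 1][n - 2], A[n - 1][n - 1] = 1 / (2 * h), -4 / (2 * h), 3 / (2 * h)
    return A


def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def model_profile(f_values, B, S, D, h):
    """Return X = (A + D I)^-1 (B 1 + S f) on an evenly spaced time grid."""
    n = len(f_values)
    A = differentiation_matrix(n, h)
    lhs = [[A[i][k] + (D if i == k else 0.0) for k in range(n)] for i in range(n)]
    rhs = [B + S * f for f in f_values]
    return solve_linear(lhs, rhs)


times = [0, 2, 4, 6, 8, 10, 12]        # sampling times in hours, as in the experiment
f_linear = [float(t) for t in times]    # hypothetical activity f(t) = t
x_model = model_profile(f_linear, B=5.0, S=3.0, D=1.0, h=2.0)
```

For this linear activity the result reproduces $X(t) = 2 + 3t$, which satisfies the ODE, because three-point stencils differentiate linear profiles exactly. The appeal of the approach is that no time stepping is needed: one linear solve per gene yields the model values at all measured time points.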

Page 15:

2) Model fitting

1) Start with a "random" set of parameters:

$$p = \{B_1, \dots, B_m,\ S_1, \dots, S_m,\ D_1, \dots, D_m,\ f\}$$

2) Compute a solution:

$$X_{j,\mathrm{MODEL}}(t) = (A + D_j I)^{-1}\,(B_j \mathbf{1} + S_j f(t)), \qquad j = 1, \dots, m$$

3) Compare with data using a merit function, summed over the n time points (i) and the m genes (j):

$$M(p) = \sum_{j=1}^{m} \sum_{i=1}^{n} \left( \frac{\hat{X}_j(t_i) - X_j(t_i)}{\sigma(X_j(t_i))} \right)^2$$

4) Vary p systematically until a minimum value for M(p) is reached.
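Step 3 can be illustrated with a short Python sketch of the merit function: a sum over genes and time points of squared differences between model and data, each scaled by the measurement standard deviation. The function name, the nested-list layout and all numbers are assumptions made for illustration.

```python
def merit(model, data, sigma):
    """Sum over genes j and time points i of ((model - data) / sigma)^2.

    Each argument is a list of per-gene lists, indexed [gene][time point].
    """
    total = 0.0
    for model_j, data_j, sigma_j in zip(model, data, sigma):
        for x_hat, x_obs, s in zip(model_j, data_j, sigma_j):
            total += ((x_hat - x_obs) / s) ** 2
    return total


# Two genes, three time points; all numbers are made up for illustration.
model = [[1.0, 2.0, 3.0], [10.0, 10.0, 10.0]]
data = [[1.5, 2.0, 2.0], [9.0, 10.0, 12.0]]
sigma = [[0.5, 0.5, 0.5], [1.0, 1.0, 2.0]]
score = merit(model, data, sigma)
```

Dividing by $\sigma$ means that a deviation of 2 units on a noisy measurement ($\sigma = 2$) counts the same as a deviation of 1 unit on a precise one ($\sigma = 1$).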

Page 16:

Fitting algorithms:

Originally used a simplex-based method (Nelder-Mead) (Genome Biology paper)

Followed by an MCMC step to determine confidence intervals (Genome Biology paper)

rHVDM (Bioconductor) uses Levenberg-Marquardt (gradient-based).

A by-product is the Hessian, from which confidence intervals can be computed.
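How a Hessian leads to confidence intervals can be sketched on a toy problem. For a sum-of-squares merit function, the Gauss-Newton Hessian is $2 J^T W J$, with $J$ the Jacobian of the model and $W = \mathrm{diag}(1/\sigma^2)$, and the parameter covariance is approximately $(J^T W J)^{-1}$. The Python below is not rHVDM's code: it uses a hypothetical two-parameter straight-line model and assumes approximately normal estimates.

```python
import math


def covariance_from_jacobian(jacobian, sigma):
    """Covariance of two fitted parameters, (J^T W J)^-1 with W = diag(1/sigma^2).

    For a sum-of-squares merit function this equals twice the inverse of its
    Gauss-Newton Hessian. Restricted to two parameters to keep the inverse explicit.
    """
    a = b = d = 0.0
    for (j0, j1), s in zip(jacobian, sigma):
        w = 1.0 / (s * s)
        a += w * j0 * j0
        b += w * j0 * j1
        d += w * j1 * j1
    det = a * d - b * b
    return [[d / det, -b / det], [-b / det, a / det]]


def confidence_interval(estimate, variance, z=1.96):
    """Symmetric interval, assuming an approximately normal estimate."""
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


# Toy model x(t) = p0 + p1 * t, so the Jacobian row at time t is (1, t).
times = [0.0, 1.0, 2.0]
jacobian = [(1.0, t) for t in times]
cov = covariance_from_jacobian(jacobian, sigma=[1.0, 1.0, 1.0])
interval_p1 = confidence_interval(estimate=2.0, variance=cov[1][1])
```

An interval obtained this way is symmetric by construction, which is one reason it can differ from an interval obtained by MCMC sampling.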

Page 17:

Difference between MCMC and LM confidence intervals.

[Figure: four panels comparing the two sets of confidence intervals. Basal rates, Sensitivity and Degradation are each shown for probe sets 203409_at, 218346_s_at, 209295_at, 202284_s_at and 205780_at; Transcription factor activity (sample 1) is shown at points 1 to 7.]

Page 18:

Importance of confidence intervals

Biological data is inherently noisy. We don't want to assume that measurements are exact.

Example: genes with a flat profile would be a good fit to the equation (with $S_j = 0$):

$$\frac{dX_j(t)}{dt} = B_j + S_j f(t) - D_j X_j(t)$$

It is essential to identify these situations in order to detect targets of the transcription factor.

Page 19:

Parameter count reduction / identifiability

$$\frac{dX_j(t)}{dt} = B_j + S_j f(t) - D_j X_j(t)$$

Replace $f(t)$ with $f(t) = \alpha g(t) + \beta$:

$$\begin{aligned}
\frac{dX_j(t)}{dt} &= B_j + S_j(\alpha g(t) + \beta) - D_j X_j(t) \\
&= (B_j + S_j \beta) + \alpha S_j g(t) - D_j X_j(t) \\
&= \tilde{B}_j + \tilde{S}_j g(t) - D_j X_j(t)
\end{aligned}$$

Solution: let $S_{p21} = 1$ (removes the $\alpha$ ambiguity) and $f(0) = 0$ (removes the $\beta$ ambiguity); the parameter count is reduced by 2.
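The ambiguity can be checked numerically. In the Python sketch below, the right-hand side of the model is evaluated once with the original parameters and $f = \alpha g + \beta$, and once with $\tilde{B}_j = B_j + S_j \beta$, $\tilde{S}_j = \alpha S_j$ and $g$ alone. The function names and all numerical values are arbitrary choices made for illustration.

```python
def rhs(x, f_value, B, S, D):
    """Right-hand side of dX/dt = B + S*f - D*X."""
    return B + S * f_value - D * x


def reparameterise(B, S, alpha, beta):
    """Return (B~, S~) such that B + S*(alpha*g + beta) = B~ + S~*g."""
    return B + S * beta, alpha * S


B, S, D = 2.0, 3.0, 0.5          # arbitrary illustrative kinetic parameters
alpha, beta = 4.0, 1.5           # arbitrary rescaling of the activity
B_tilde, S_tilde = reparameterise(B, S, alpha, beta)

max_gap = 0.0
for g in [0.0, 0.7, 2.0, 5.0]:       # arbitrary values of g(t)
    for x in [0.0, 1.0, 10.0]:       # arbitrary transcript levels
        original = rhs(x, alpha * g + beta, B, S, D)
        rescaled = rhs(x, g, B_tilde, S_tilde, D)
        max_gap = max(max_gap, abs(original - rescaled))
```

Both parameter sets give the same dynamics for every value of $g$ and $X_j$, so the data alone cannot distinguish them; fixing one sensitivity and one value of $f$ picks a single representative.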

Page 20:

Importance of confidence intervals II

Solution: measure one of the kinetic parameters independently and integrate that measurement into the fitting:

Initial fitting:

Page 21:

Measurement error

Algorithmic speed

Parameter identifiability

Parameter count reduction

Confidence intervals

Page 22:

Acknowledgements

Sonia Shah (Bloomsbury Centre for Bioinformatics)

Dan Brewer (Institute of Cancer Research)

Crispin Miller (Paterson Institute for Cancer Research)

Daniela Tomescu (ICH)

Mike Hubank (ICH)

Robin Callard (ICH)

Jaroslav Stark (CISBIC, Imperial College)