Dynamic modelling of microarray data. Martino Barenco Institute of Child Health / UCL.

Post on 13-Jan-2016


Dynamic modelling of microarray data.

Martino Barenco

Institute of Child Health / UCL

Goal: predict targets of a known transcription factor in a complex response using dynamic models and time course microarray data.

HVDM: Hidden Variable Dynamic Modelling

Outline

1) Principle + Results (Genome Biology 2006)

2) Techniques (R/Bioconductor implementation: rHVDM)

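Gene expression model

Transcript concentration X_j(t):

$$\frac{dX_j(t)}{dt} = B_j + S_j f(t) - D_j X_j(t)$$

- B_j + S_j f(t): transcription rates (B_j basal, S_j sensitivity to the transcription factor)
- D_j: degradation rate
- f(t): transcription factor activity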

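[Figure: an example activity profile f(t) and the resulting X_j(t) for the baseline parameters B_j = 0, S_j = 3, D_j = 1, with further panels changing one parameter at a time (B_j: 0 to 10; S_j: 3 to 6; D_j: 1 to 0.1).]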
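The effect of the three kinetic parameters on X_j(t) can be explored with a minimal numerical sketch. It assumes a simple forward Euler scheme and, unless a starting value is supplied, a start at the steady state implied by f(0); the function name `simulate_transcript` and the bell-shaped example activity are hypothetical and are not taken from the talk or from rHVDM.

```python
import numpy as np

def simulate_transcript(B, S, D, f, t_end=12.0, dt=0.001, x0=None):
    """Integrate dX/dt = B + S*f(t) - D*X with forward Euler steps.

    f is a function of time returning the transcription factor activity.
    Assumption: if x0 is not given, start at the steady state (B + S*f(0)) / D.
    """
    n_steps = int(round(t_end / dt))
    t = np.linspace(0.0, t_end, n_steps + 1)
    x = np.empty_like(t)
    x[0] = (B + S * f(0.0)) / D if x0 is None else x0
    for k in range(n_steps):
        x[k + 1] = x[k] + dt * (B + S * f(t[k]) - D * x[k])
    return t, x

# Hypothetical bell-shaped activity profile peaking at t = 4 h.
def activity(t):
    return np.exp(-((t - 4.0) ** 2) / 4.0)

# Baseline parameters from the slide, then a slower degradation rate.
t, x_base = simulate_transcript(B=0.0, S=3.0, D=1.0, f=activity)
_, x_slow = simulate_transcript(B=0.0, S=3.0, D=0.1, f=activity)
```

Lowering D_j makes the transcript respond more slowly and accumulate to higher levels, while raising B_j or S_j shifts or scales the response.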

Algorithm principle

I) Training step

Inputs:
- Previous biological knowledge: known targets of the transcription factor
- Expression values of those targets

Outputs:
- Transcription factor activity (the hidden variable)
- Kinetic parameters for the training genes

II) Screening step (for each single gene)

Inputs:
- Transcription factor activity
- Expression profile of the gene

Output:
- Dependency status of the gene: target or not?

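Training step (j: training genes): X_j(t) is measured; f(t), B_j, S_j and D_j are estimated.

$$\frac{dX_j(t)}{dt} = B_j + S_j f(t) - D_j X_j(t)$$

Screening step (j: individual gene being screened): X_j(t) is measured and f(t) is taken from the training step; B_j, S_j and D_j are estimated.

$$\frac{dX_j(t)}{dt} = B_j + S_j f(t) - D_j X_j(t)$$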

The p53 network

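[Figure: diagram of the p53 network. Nodes shown: DNA damage, ATM, active ATM, CHK2, p53, active p53, MDM2, p19Arf, E2F1, Rb, Rb/E2F1, CDK4, p73, 14-3-3, p21, Jun-B, Jun, myb, bcl2, Bax, p53AIF, Puma, Fas, Pidd, DR5. Outcomes shown: cell cycle, G1/S arrest, G2/M arrest, survival, death receptor, mitochondrial apoptosis.]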

Experimental setup

Human T cells (MOLT4, p53 wild-type) were exposed to 5 Gy irradiation.

mRNA was harvested 2, 4, 6, 8, 10 and 12 hours after irradiation, and just before irradiation (0 h time point).

Affymetrix microarrays (HG-U133) were then run.

The experiment was run in triplicate.

Results of training step: activity profile of p53

Screening

Q: which other genes are activated by p53?

Putative p53 targets must both:
a) Fit the model well
b) Have a sensitivity coefficient S_j > 0

$$\frac{dX_j(t)}{dt} = B_j + S_j f(t) - D_j X_j(t)$$

| Gene Title | Gene Symbol | Affymetrix Identifier | Model score M | Sensitivity (Z-score) |
|---|---|---|---|---|
| damage-specific DNA binding protein 2, 48kDa | DDB2 | 203409_at | 18.74 | 18.24 |
| CD38 antigen (p45) | CD38 | 205692_s_at | 36.69 | 14.77 |
| ferredoxin reductase | FDXR | 207813_s_at | 79.82 | 13.19 |
| hypothetical protein FLJ22457 | FLJ22457 | 221081_s_at | 60.45 | 11.01 |
| tripartite motif-containing 22 | TRIM22 | 213293_s_at | 41.36 | 10.99 |
| carnitine O-octanoyltransferase | CROT | 204573_at | 84.4 | 10.98 |
| glutaminase 2 (liver, mitochondrial) | GLS2 | 205531_s_at | 42.83 | 10.28 |
| leucine-rich repeats and death domain containing | LRDD | 219019_at | 78.8 | 9.9 |
| hect domain and RLD 5 | HERC5 | 219863_at | 37.65 | 9.55 |
| cyclin G1 | CCNG1 | 208796_s_at | 17.04 | 9.37 |
| BCL2-interacting killer | BIK | 205780_at | 19.43 | 9.35 |
| activating signal cointegrator 1 complex subunit 3 | ASCC3 | 212815_at | 60.34 | 9.26 |
| sestrin 1 | SESN1 | 218346_s_at | 8.37 | 9.25 |
| p53 target zinc finger protein | WIG1 | 219628_at | 41.33 | 9.19 |
| tumor necrosis factor receptor superfamily, member 10b | TNFRSF10B | 209295_at | 27.34 | 9.05 |
| chromosome 6 open reading frame 4 | C6orf4 | 215411_s_at | 86.45 | 8.81 |
| cyclin-dependent kinase inhibitor 1A (p21) | CDKN1A | 202284_s_at | 24.98 | 8.4 |
| etoposide induced 2.4 mRNA | EI24/PIG8 | 216396_s_at | 88.04 | 8.2 |
| mitogen-activated protein kinase kinase kinase kinase 4 | MAP4K4 | 206571_s_at | 62.88 | 7.54 |
| lymphoid-restricted membrane protein | LRMP | 204674_at | 26.92 | 7.36 |
| xeroderma pigmentosum, group C | XPC | 209375_at | 43.09 | 7.36 |
| TNF (ligand) superfamily, member 4 (Ox40L) | TNFSF4 | 207426_s_at | 34.73 | 7.15 |
| Human cleavage/polyadenylation specificity factor | CPSF1 | 33132_at | 77.75 | 7.09 |
| AMP-activated protein kinase, beta 1 subunit | PRKAB1 | 201834_at | 25.72 | 7.01 |
| transducer of ERBB2, 1 | TOB1 | 202704_at | 92.69 | 6.79 |
| p53-inducible cell-survival factor | P53CSV | 218403_at | 48.33 | 6.5 |
| sortilin-related receptor, L(DLR class) | SORL1 | 203509_at | 15.66 | 6.34 |
| Fas (TNF receptor superfamily, member 6) | FAS | 216252_x_at | 44.31 | 6.23 |
| ribonucleotide reductase M1 polypeptide | RRM1 | 201477_s_at | 46.58 | 6.19 |
| archaemetzincins-2 | AMZ2 | 218167_at | 37.48 | 6.16 |
| galactose-3-O-sulfotransferase 4 | GAL3ST4 | 219815_at | 38.62 | 5.97 |
| growth arrest and DNA-damage-inducible, alpha | GADD45A | 203725_at | 84.23 | 5.89 |
| hypothetical protein FLJ11259 | FLJ11259 | 218627_at | 7.23 | 5.87 |
| major histocompatibility complex, class I, B | HLA-B | 209140_x_at | 89.77 | 5.79 |
| testis specific, 10 | TSGA10 | 220623_s_at | 20.85 | 5.67 |
| hypothetical protein MDS025 | MDS025 | 218288_s_at | 31.35 | 5.66 |
| TP53 activated protein 1 | TP53AP1 | 209917_s_at | 22.22 | 5.65 |
| leukemia inhibitory factor | LIF | 205266_at | 14.86 | 5.62 |
| interferon stimulated exonuclease gene 20kDa-like 1 | ISG20L1 | 219361_s_at | 48.55 | 5.56 |

p21 (CDKN1A): part of the training set

CD38: uncovered by screening

Verification experiment

siRNA knock-down of p53:

HVDM predictions:

Ingredients needed

1) ODE integrator: given the equation

$$\frac{dX_j(t)}{dt} = B_j + S_j f(t) - D_j X_j(t)$$

and parameter values, compute X_{j,MODEL}(t).

2) Model fitting: find a set of parameter values such that

$$X_{j,MODEL}(t) \cong X_{j,DATA}(t)$$

3) Take measurement noise in the data into account

4) Specifically for the Bioconductor implementation: be reasonably quick

1) ODE integration

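[Figure: example expression time course sampled at t = 0, 2, 4, 6, 8, 10 and 12 hours.]

- Want to estimate the slope of X_{j,MODEL}(t) at t = 6.

- Slope = weighted sum of the values at time points around t = 6. Applied at every time point, this gives

$$\frac{dX_{j,MODEL}(t)}{dt} \cong A\,X_{j,MODEL}(t)$$

where X_{j,MODEL}(t) now denotes the vector of values at the time points and A is the matrix of weights.

- The ODE is thereby turned into a system of linear equations:

$$A\,X_j(t) = B_j + S_j f(t) - D_j X_j(t)$$

Formal solution (with 1 a vector of ones and I the identity matrix):

$$X_{j,MODEL}(t) = (A + D_j I)^{-1}\,(B_j \mathbf{1} + S_j f(t))$$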
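The matrix formulation can be sketched as follows. This is an illustration under stated assumptions, not the rHVDM implementation: the weights in A are standard three-point finite-difference weights on an evenly spaced grid, no special treatment of the initial condition is included, and the function names `differentiation_matrix` and `model_profile` as well as the example activity values are hypothetical.

```python
import numpy as np

def differentiation_matrix(t):
    """Matrix A such that (A @ x)[i] approximates dx/dt at t[i].

    Assumption: evenly spaced time points; central differences in the
    interior and one-sided three-point stencils at both ends.
    """
    t = np.asarray(t, dtype=float)
    n = len(t)
    h = t[1] - t[0]
    A = np.zeros((n, n))
    for i in range(1, n - 1):
        A[i, i - 1] = -0.5 / h
        A[i, i + 1] = 0.5 / h
    A[0, :3] = np.array([-1.5, 2.0, -0.5]) / h
    A[-1, -3:] = np.array([0.5, -2.0, 1.5]) / h
    return A

def model_profile(A, B, S, D, f):
    """Solve (A + D*I) X = B*1 + S*f for the model expression profile X."""
    f = np.asarray(f, dtype=float)
    n = len(f)
    return np.linalg.solve(A + D * np.eye(n), B * np.ones(n) + S * f)

# Time points of the experiment (hours) and a hypothetical activity profile.
times = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
f_example = np.array([0.0, 1.0, 3.0, 4.0, 3.0, 2.0, 1.0])
A = differentiation_matrix(times)
x_model = model_profile(A, B=1.0, S=3.0, D=1.0, f=f_example)
```

Because the model profile is obtained from a single linear solve rather than a step-by-step integration, it is cheap to evaluate repeatedly inside a fitting loop.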

2) Model fitting

1) Start with a "random" set of parameters:

$$p = \{B_1, \ldots, B_m, S_1, \ldots, S_m, D_1, \ldots, D_m, f\}$$

2) Compute a solution:

$$X_{j,MODEL}(t) = (A + D_j I)^{-1}\,(B_j \mathbf{1} + S_j f(t)), \qquad j = 1, \ldots, m$$

3) Compare with data using a merit function, summed over the m genes (j) and the n time points (i):

$$M(p) = \sum_{j=1}^{m} \sum_{i=1}^{n} \left( \frac{\hat{X}_j(t_i) - X_j(t_i)}{\sigma(X_j(t_i))} \right)^2$$

4) Vary p systematically until a minimum value for M(p) is reached.
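The merit function is a sum of squared, noise-weighted differences between model and data. A minimal sketch follows, assuming the model values, the measurements and their standard deviations are stored as arrays of identical shape (genes by time points); the function name `merit` and the example numbers are hypothetical.

```python
import numpy as np

def merit(model, data, sigma):
    """Sum over genes j and time points i of ((model - data) / sigma)**2."""
    model = np.asarray(model, dtype=float)
    data = np.asarray(data, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return float(np.sum(((model - data) / sigma) ** 2))

# Two genes, two time points (made-up numbers for illustration only).
model = np.array([[1.0, 2.0], [3.0, 4.0]])
data = np.array([[1.0, 1.0], [1.0, 4.0]])
score = merit(model, data, sigma=np.ones((2, 2)))
```

Dividing by sigma means that noisier measurements contribute less to the score, which is how measurement error enters the fit.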

Fitting algorithms:

Originally used a simplex-based method (Nelder-Mead) (Genome Biology paper).

This was followed by an MCMC step to determine confidence intervals (Genome Biology paper).

rHVDM (Bioconductor) uses Levenberg-Marquardt (gradient-based).

A by-product is the Hessian, which allows confidence intervals to be computed.
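How a Hessian yields confidence intervals can be sketched on a simplified case. Assuming D_j is held fixed, the model profile is linear in B_j and S_j, so the minimum of M is reached in a single Gauss-Newton step (what Levenberg-Marquardt reduces to with zero damping). With J the Jacobian of the weighted residuals, the Hessian of M is 2 JᵀJ and the covariance of the estimates is approximately (JᵀJ)⁻¹. The function name `fit_with_intervals`, the placeholder design matrix and the 1.96 multiplier (a 95% normal interval) are illustrative assumptions, not rHVDM's API.

```python
import numpy as np

def fit_with_intervals(design, data, sigma, z=1.96):
    """Weighted linear least squares with Hessian-based confidence intervals.

    design: (n_points, n_params) matrix whose columns are the model
            responses to a unit value of each parameter.
    Returns the estimates and the lower and upper interval bounds.
    """
    design = np.asarray(design, dtype=float)
    data = np.asarray(data, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    J = design / sigma[:, None]          # Jacobian of the weighted residuals
    y = data / sigma
    JtJ = J.T @ J                        # half the Hessian of M
    estimate = np.linalg.solve(JtJ, J.T @ y)
    covariance = np.linalg.inv(JtJ)
    std_err = np.sqrt(np.diag(covariance))
    return estimate, estimate - z * std_err, estimate + z * std_err

# Hypothetical example: columns stand in for the responses to B_j and S_j.
f = np.array([0.0, 1.0, 3.0, 4.0, 3.0, 2.0, 1.0])
design = np.column_stack([np.ones_like(f), f])
data = 2.0 + 3.0 * f                     # noise-free data for illustration
sigma = np.full(f.shape, 0.5)
estimate, lower, upper = fit_with_intervals(design, data, sigma)
```

An interval for S_j that includes zero would indicate that the data do not support a dependence on the transcription factor.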

Difference between MCMC and LM confidence intervals.

[Figure: four panels. Basal rates, sensitivity and degradation are plotted for probe sets 203409_at, 218346_s_at, 209295_at, 202284_s_at and 205780_at; transcription factor activity (sample 1) is plotted at points 1 to 7.]

Importance of confidence intervals

Biological data are inherently noisy, so measurements should not be assumed to be exact.

Example: genes with a flat profile would be a good fit to the equation

$$\frac{dX_j(t)}{dt} = B_j + S_j f(t) - D_j X_j(t)$$

with S_j = 0.

It is essential to identify these situations in order to detect targets of the transcription factor.

Parameter count reduction / identifiability

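$$\frac{dX_j(t)}{dt} = B_j + S_j f(t) - D_j X_j(t)$$

Replace f(t) with

$$f(t) = \alpha\, g(t) + \beta$$

Then

$$\frac{dX_j(t)}{dt} = B_j + S_j (\alpha\, g(t) + \beta) - D_j X_j(t)$$

$$= (B_j + S_j \beta) + \alpha S_j\, g(t) - D_j X_j(t)$$

$$= \tilde{B}_j + \tilde{S}_j\, g(t) - D_j X_j(t)$$

so different combinations of parameters and activity profile give exactly the same model.

Solution: let S_p21 = 1 (removes the "α" ambiguity) and f(0) = 0 (removes the "β" ambiguity). The parameter count is reduced by 2.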
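The ambiguity and its removal can be checked numerically. The sketch below assumes the parameters of all genes are held in arrays and that one reference gene (p21 in the talk) has its sensitivity fixed to 1; the function names `rhs` and `normalise` and all numerical values are hypothetical.

```python
import numpy as np

def rhs(B, S, D, f, X):
    """Right-hand side of the model: B + S*f - D*X."""
    return B + S * f - D * X

def normalise(B, S, f, ref):
    """Rescale so that S[ref] becomes 1 and the activity starts at 0.

    Uses f = alpha*g + beta with alpha = 1/S[ref] and beta = f[0], so that
    B~ = B + S*beta, S~ = alpha*S and g = (f - beta)/alpha.
    """
    B = np.asarray(B, dtype=float)
    S = np.asarray(S, dtype=float)
    f = np.asarray(f, dtype=float)
    alpha = 1.0 / S[ref]
    beta = f[0]
    g = (f - beta) / alpha
    return B + S * beta, alpha * S, g

# Two genes (gene 0 is the reference), one time point at a time.
B = np.array([2.0, 1.0])
S = np.array([3.0, 6.0])
D = np.array([0.5, 0.2])
f = np.array([1.0, 2.0, 4.0, 3.0])
B_new, S_new, g = normalise(B, S, f, ref=0)

# The derivative at any state X is unchanged by the reparametrisation.
X = np.array([5.0, 7.0])
same = all(
    np.allclose(rhs(B, S, D, f[i], X), rhs(B_new, S_new, D, g[i], X))
    for i in range(len(f))
)
```

Since both parameter sets produce identical dynamics, the data alone cannot distinguish them, which is why the two constraints are imposed before fitting.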

Confidence intervals importance II

Solution: measure one of the kinetic parameters independently and integrate that measurement into the fitting:

Initial fitting:

Measurement error

Algorithmic speed

Parameter identifiability

Parameter count reduction

Confidence intervals

Acknowledgements

Sonia Shah (Bloomsbury Centre for Bioinformatics)

Dan Brewer (Institute of Cancer Research)

Crispin Miller (Patterson Institute for Cancer Research)

Daniela Tomescu (ICH)

Mike Hubank (ICH)

Robin Callard (ICH)

Jaroslav Stark (CISBIC, Imperial College)