Regulatory element discovery for developmental time series


Page 1: Regulatory element discovery for developmental time series

Regulatory element discovery for developmental time series

Joint work with Xuejing Li, Chris Wiggins, Valerie Reinke

Christina Leslie

Computational Biology Program, Sloan-Kettering Institute

Memorial Sloan-Kettering Cancer Center

http://cbio.mskcc.org

Page 2: Regulatory element discovery for developmental time series

Regulatory networks in development

• Reinke lab: genome-wide expression for C. elegans developmental time series + germ cell/gametogenesis mutants

• Problem: decipher regulatory networks governing germline- and sex-regulated genes

Page 3: Regulatory element discovery for developmental time series

Previous work: MEDUSA in yeast

• Predict up/down expression of target genes from promoter + regulator expression

• Learns from a set of mRNA expression experiments without clustering

• Problem: high correlation of nearby time points, many regulator profiles

Page 4: Regulatory element discovery for developmental time series

Sequence to expression profile

• Can we learn mapping from promoter sequence to full expression trajectory (with some level of statistical significance)?

• Retain some properties of MEDUSA:

– No clustering of expression profiles

– Learn motifs de novo from promoters by building from k-mers

…AGCTATGCCATCGACTGCTCCA…
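The motif vector for a gene can be illustrated with a minimal sketch that counts overlapping k-mers in a promoter fragment. The function names, the choice of k = 3, and the use of the full 4^k vocabulary are assumptions made for illustration; the slides do not say how the counts were actually computed.

```python
from collections import Counter
from itertools import product


def kmer_counts(promoter, k):
    """Count every overlapping k-mer in a promoter sequence."""
    seq = promoter.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def motif_vector(promoter, k, alphabet="ACGT"):
    """Return (vocabulary, counts) in a fixed k-mer order so rows align across genes."""
    counts = kmer_counts(promoter, k)
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    return vocab, [counts.get(kmer, 0) for kmer in vocab]


# Fragment shown on the slide; k = 3 is an illustrative choice.
vocab, x_g = motif_vector("AGCTATGCCATCGACTGCTCCA", k=3)
```

Stacking one such vector per gene gives a genes-by-k-mers count matrix.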

Page 5: Regulatory element discovery for developmental time series

Regression problem

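• Idea: learn latent factors $T = XW$ that “explain” $Y$

• Then regress $X \approx T P^T$, $Y \approx T Q^T$, or equivalently $Y \approx XB$ where $B = W Q^T$

• $X$ is $G \times M$: row $g$ is the motif vector (k-mer counts) for gene $g$

• $Y$ is $G \times E$: row $g$ is the expression profile for gene $g$

• Columns $w_i$ of $W$ = weight vectors; columns of $P$, $Q$ = loadings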

Page 6: Regulatory element discovery for developmental time series

First step: PLS regression

• Sequentially build latent factors $t_i = X w_i$:

– Maximize covariance between factors and $Y$

– Constrain $t_1, \ldots, t_K$ to be uncorrelated

• SIMPLS: for $i = 1, \ldots, K$

$$w_i = \arg\max_w \; w^T X^T Y Y^T X w$$

(in the 1D case, $w_i = \arg\max_w \mathrm{Cov}(Y, Xw)^2$)

subject to

$$w_i^T w_i = 1, \qquad t_i^T t_j = w_i^T X^T X w_j = 0, \quad j = 1, \ldots, i-1$$
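The sequential construction can be sketched in NumPy as follows. This is an illustrative SIMPLS-style implementation, not the code used for the talk: it assumes column-centered data, keeps each weight vector at unit length as in the constraint above, and enforces uncorrelated factors by deflating the cross-product matrix against the X loadings. The function and variable names are hypothetical.

```python
import numpy as np


def simpls(X, Y, n_factors):
    """SIMPLS-style PLS sketch. Returns weights W, factors T, Y loadings Q, and B = W Q^T."""
    X = X - X.mean(axis=0)                  # center predictors (assumption)
    Y = Y - Y.mean(axis=0)                  # center responses (assumption)
    S = X.T @ Y                             # M x E cross-product matrix
    n_motifs = X.shape[1]
    W = np.zeros((n_motifs, n_factors))
    V = np.zeros((n_motifs, n_factors))     # orthonormal basis for the X loadings
    for i in range(n_factors):
        # The leading left singular vector of S maximizes w^T S S^T w with |w| = 1.
        left, _, _ = np.linalg.svd(S, full_matrices=False)
        w = left[:, 0]
        t = X @ w
        p = X.T @ t / (t @ t)               # X loading for this factor
        v = p - V[:, :i] @ (V[:, :i].T @ p) # orthogonalize against earlier loadings
        v /= np.linalg.norm(v)
        S = S - np.outer(v, v @ S)          # deflate so later factors are uncorrelated
        W[:, i], V[:, i] = w, v
    T = X @ W
    Q = Y.T @ T / (T ** 2).sum(axis=0)      # regress Y on the orthogonal factors
    B = W @ Q.T                             # so that Y is approximated by X B
    return W, T, Q, B
```

Because the factors are mutually orthogonal, regressing Y on T reduces to a per-factor division, which is what the `Q` line does.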

Page 7: Regulatory element discovery for developmental time series

Equivalent formulation

• Learn latent factors $t_i = X w_i$ and $u_i = Y c_i$ for both predictor and response variables

– $w_i$ = motif weight vector, $c_i$ = expression weight vector

– $w_i$ and $c_i$ chosen to maximize $\mathrm{Cov}(t_i, u_i)$

– for $i = 1, \ldots, K$

$$w_i, c_i = \arg\max_{w, c} \; w^T X^T Y c$$

subject to

$$w_i^T w_i = c_i^T c_i = 1, \qquad t_i^T t_j = w_i^T X^T X w_j = 0, \quad j = 1, \ldots, i-1$$
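A short derivation, offered as a sketch, of why the two-vector objective agrees with the SIMPLS objective for the first factor. It assumes column-centered X and Y and uses only the unit-norm constraints; the orthogonality constraints for later factors are not treated here.

```latex
\begin{align*}
\max_{\|c\| = 1} \; w^T X^T Y c
  &= \| Y^T X w \|,
  \quad \text{attained at } c = \frac{Y^T X w}{\| Y^T X w \|}
  \quad \text{(Cauchy-Schwarz)} \\
\max_{\|w\| = 1} \; \| Y^T X w \|^2
  &= \max_{\|w\| = 1} \; w^T X^T Y Y^T X w
\end{align*}
```

So the first motif weight vector is the leading left singular vector of $X^T Y$, and the first expression weight vector is the matching right singular vector.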

Page 8: Regulatory element discovery for developmental time series

Next steps: sparsity, graph Laplacian

• For regularization and interpretability of weight vectors, want

– sparsity in $w$: want most components to be 0

$$\sum_k |w_k| \le b_1$$

– smoothness in $w$: define graph on set of k-mers, with edge $k \sim l$ if corresponding k-mers are close in Hamming distance

$$\sum_{k \sim l} (w_k - w_l)^2 \le b_2$$
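A minimal sketch of the two penalties follows. The slide only says k-mers are linked when they are "close" in Hamming distance, so the threshold of one mismatch is an assumption, as are the function names and the toy weights. For an unweighted graph the smoothness term equals the graph Laplacian quadratic form $w^T L w$ with $L = D - A$.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatched positions between two equal-length k-mers."""
    return sum(x != y for x, y in zip(a, b))


def kmer_graph(kmers, max_dist=1):
    """Index pairs (k, l) for k-mers within max_dist mismatches (threshold is assumed)."""
    return [(i, j) for i, j in combinations(range(len(kmers)), 2)
            if len(kmers[i]) == len(kmers[j])
            and hamming(kmers[i], kmers[j]) <= max_dist]


def l1_penalty(w):
    """Sparsity term: sum of absolute weights."""
    return sum(abs(x) for x in w)


def smoothness_penalty(w, edges):
    """Graph term: squared weight differences summed over edges."""
    return sum((w[i] - w[j]) ** 2 for i, j in edges)


# Toy example with made-up weights.
kmers = ["GATAA", "GATAT", "ACGTG", "ACGTC", "TTTTT"]
edges = kmer_graph(kmers)
w = [1.0, 0.8, -0.5, -0.5, 0.0]
```

Weights that vary little between neighboring k-mers give a small smoothness term, which is what encourages related k-mers to be selected together.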

Page 9: Regulatory element discovery for developmental time series

Preliminary results: worm time series

• Reinke data: ~9000 genes, 12 time points (3 replicates), wild type germline development

• Gene sets, from mutant expression data:

– Sperm genes: high expression in spermatogenesis

– Oocyte genes: high expression in oogenesis

• Motif matrix: filter k-mers based on expected counts

Page 10: Regulatory element discovery for developmental time series

Standard PLS

• 10-fold c.v. on held-out genes
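Holding out genes rather than time points can be sketched as below. This is an illustrative harness only: the squared-error score, the `fit` callback interface, and the mean-profile baseline are assumptions, and the slides do not specify how the folds were drawn or scored.

```python
import numpy as np


def gene_folds(n_genes, n_folds=10, seed=0):
    """Randomly split gene indices into n_folds disjoint held-out sets."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_genes), n_folds)


def cv_squared_error(X, Y, fit, n_folds=10, seed=0):
    """Total held-out squared error; fit(X_train, Y_train) returns a predict function."""
    total = 0.0
    all_genes = np.arange(X.shape[0])
    for held_out in gene_folds(X.shape[0], n_folds, seed):
        train = np.setdiff1d(all_genes, held_out)
        predict = fit(X[train], Y[train])
        total += ((Y[held_out] - predict(X[held_out])) ** 2).sum()
    return total


def mean_baseline(X_train, Y_train):
    """Baseline that ignores sequence and predicts the mean training profile."""
    mean_profile = Y_train.mean(axis=0)
    return lambda X_test: np.tile(mean_profile, (X_test.shape[0], 1))
```

Any model that maps motif vectors to expression profiles can be passed as `fit` and compared against the baseline.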

Page 11: Regulatory element discovery for developmental time series

Regularized PLS

• 10-fold c.v. on held-out genes

Page 12: Regulatory element discovery for developmental time series

Regularized PLS

• Sperm/oocyte gene sets: largest chi-square reduction for 3rd/1st latent factor

Page 13: Regulatory element discovery for developmental time series

Interpretation of factor weights

• To infer motifs relevant for an expression pattern:

– Latent factors $t_i = X w_i$ and $u_i = Y c_i$ for both predictors and response variables

– $w_i$ and $c_i$ chosen to maximize $\mathrm{Cov}(t_i, u_i)$

• ci gives weights over time points: interpret as expression pattern

• wi gives weights over motifs: highly weighted motifs relevant for this expression pattern
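Reading motifs off a weight vector can be sketched as a simple ranking. Sorting by absolute weight is an assumption (the slide says only "highly weighted"), and the function name and toy values are hypothetical.

```python
def top_motifs(kmers, weights, n=50):
    """Return the n (k-mer, weight) pairs with the largest absolute weight."""
    ranked = sorted(zip(kmers, weights), key=lambda kw: abs(kw[1]), reverse=True)
    return ranked[:n]


# Toy example with made-up weights.
example = top_motifs(["GATAA", "ACGTG", "TTTTT"], [0.2, -0.9, 0.05], n=2)
```

Keeping the sign in the output makes it possible to separate k-mers associated with high versus low values of the expression pattern.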

Page 14: Regulatory element discovery for developmental time series

Sperm genes

• c3 correlated with sperm gene expression, consistent with drop in chi-square

Page 15: Regulatory element discovery for developmental time series

Motif graph for sperm genes

• Top 50 k-mer graph for w3, clusters around GATAA (ELT-1) and ACGTG (bHLH)

Page 16: Regulatory element discovery for developmental time series

Oocyte genes

• Oocyte genes correlate with c1 pattern

Page 17: Regulatory element discovery for developmental time series

Oocyte motif map

• Top 50 k-mer graph for w1, log(p) vs weight

Page 18: Regulatory element discovery for developmental time series

Some related work

• Zhang et al, 2008: PCA in Y for motif discovery

• Naughton et al, 2006: algorithmic motif search using graph representation

• Beer and Tavazoie, 2004; Segal et al, 2002: sequence to expression via clustering