High-resolution computational models of genome binding events

Yuan (Alan) Qi

Joint work with Gifford and Young labs

Dana-Farber Cancer Institute Jan 2007

ChIP-chip Experiments

ChIP-chip data:

Encode valuable information about protein-DNA binding events.

Goal:

Decode accurate binding information from the noisy data.

Challenges:

• Noise

• Joint influence of multiple binding events

Joint Binding Deconvolution

JBD: generative probabilistic graphical model.

Data Likelihood

Prior Distributions:

Hyper Prior Distributions:

Shear Distribution

(a) The distribution of DNA fragment sizes produced in the ChIP protocol was experimentally measured and statistically modeled.

(b) An influence function is derived from the measured fragment size distribution.
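
One way to picture step (b) is the sketch below. It is not the derivation used in JBD, which the slide does not spell out. It assumes that a fragment of length L is equally likely to sit at any offset that covers the binding site, so the influence at distance d is the length-weighted chance that the same fragment also covers a probe d bases away. The names `influence_function` and `expected_signal`, the additive combination of events, and the toy length distribution are all hypothetical.

```python
import numpy as np

def influence_function(lengths, probs, distances):
    """Relative chance that a fragment covering a binding site also covers
    a probe `distances` bases away (assumed model, not the JBD derivation)."""
    lengths = np.asarray(lengths, dtype=float)
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum()
    d = np.abs(np.asarray(distances, dtype=float))[:, None]
    # A fragment of length L has L placements that cover the site and
    # max(L - d, 0) placements that cover both the site and the probe.
    both = np.clip(lengths[None, :] - d, 0.0, None)
    raw = (both * probs[None, :]).sum(axis=1)
    # Normalise so that the influence at distance 0 equals 1.
    return raw / (lengths * probs).sum()

def expected_signal(probe_pos, event_pos, event_strength, lengths, probs):
    """Sum the influence of every binding event at every probe
    (additivity is an assumption made for this illustration)."""
    probe_pos = np.asarray(probe_pos, dtype=float)
    total = np.zeros_like(probe_pos)
    for pos, strength in zip(event_pos, event_strength):
        total += strength * influence_function(lengths, probs, probe_pos - pos)
    return total

# Toy fragment-length distribution (hypothetical numbers, not measured data).
toy_lengths = np.array([200.0, 400.0, 600.0])
toy_probs = np.array([0.2, 0.5, 0.3])
probes = np.arange(-600.0, 1000.0, 100.0)
signal = expected_signal(probes, [0.0, 300.0], [1.0, 0.5], toy_lengths, toy_probs)
```

With two events 300 bases apart, the probes between them receive contributions from both, which is the "joint influence" that a deconvolution method has to untangle.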

Approximate Bayesian Inference

Exact Bayesian posterior of binding events:

Non-conjugate models, thousands of variables -> Intractable calculations of the exact posterior distribution!

Message passing algorithm (Expectation propagation):

Where and

EP iteratively refines the factor approximations (i.e., messages) to improve the posterior approximation.

EP in a Nutshell

• Approximate a probability distribution by simpler parametric terms:

$p(\mathbf{w} \mid \mathbf{y}) \propto \prod_i f_i(\mathbf{w})$

$q(\mathbf{w}) = \prod_i \tilde{f}_i(\mathbf{w})$

• Each approximation term $\tilde{f}_i(\mathbf{w})$ lives in an exponential family (e.g., Gaussian or Gamma distributions).

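EP in a Nutshell

Three key steps:

• Deletion: Approximate the “leave-one-out” posterior distribution for the ith factor.

$q^{\backslash i}(\mathbf{w}) = q(\mathbf{w}) / \tilde{f}_i(\mathbf{w}) = \prod_{j \neq i} \tilde{f}_j(\mathbf{w})$

• Minimization: Minimize the following KL divergence by moment matching.

$\tilde{f}_i(\mathbf{w}) = \arg\min_{\tilde{f}_i(\mathbf{w})} \mathrm{KL}\big( f_i(\mathbf{w})\, q^{\backslash i}(\mathbf{w}) \,\|\, \tilde{f}_i(\mathbf{w})\, q^{\backslash i}(\mathbf{w}) \big)$

• Inclusion:

$q(\mathbf{w}) = \tilde{f}_i(\mathbf{w})\, q^{\backslash i}(\mathbf{w})$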
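
The deletion, minimization, and inclusion steps above can be sketched on a toy problem. The example below is not the JBD inference code. It assumes a single scalar unknown w with a zero-mean Gaussian prior, Gaussian approximate factors stored as natural parameters, and moment matching done by brute-force summation on a grid. The function `ep_1d`, the logistic factors, and all numeric settings are hypothetical choices made only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ep_1d(factors, prior_var=4.0, n_sweeps=20, n_grid=4001):
    """Toy EP for a scalar w with prior N(0, prior_var) and positive factors.

    Returns the mean and variance of the Gaussian approximation q(w).
    """
    grid = np.linspace(-8.0, 8.0, n_grid)   # standardised quadrature grid
    n = len(factors)
    site_tau = np.zeros(n)   # precision of each approximate factor
    site_nu = np.zeros(n)    # precision times mean of each approximate factor
    tau, nu = 1.0 / prior_var, 0.0           # q(w) starts as the prior
    for _ in range(n_sweeps):
        for i, f in enumerate(factors):
            # Deletion: divide site i out of q to get the leave-one-out cavity.
            tau_c, nu_c = tau - site_tau[i], nu - site_nu[i]
            if tau_c <= 0:
                continue  # skip an update that would give an improper cavity
            m_c, s_c = nu_c / tau_c, np.sqrt(1.0 / tau_c)
            # Minimization: match the mean and variance of f_i(w) * cavity(w).
            w = m_c + s_c * grid
            weights = f(w) * np.exp(-0.5 * grid ** 2)
            weights /= weights.sum()
            mean = (weights * w).sum()
            var = (weights * (w - mean) ** 2).sum()
            # Inclusion: the matched Gaussian becomes the new q, and the
            # updated site is whatever turns the cavity into that q.
            tau, nu = 1.0 / var, mean / var
            site_tau[i], site_nu[i] = tau - tau_c, nu - nu_c
    return nu / tau, 1.0 / tau

# Four logistic factors, three favouring w > 0 and one favouring w < 0.
toy_factors = [lambda w, s=s: sigmoid(s * w) for s in (1.0, 1.0, 1.0, -1.0)]
ep_mean, ep_var = ep_1d(toy_factors)
```

For a Gaussian family, minimizing the KL divergence in the second step reduces to matching the first two moments, which is why the code only needs a mean and a variance. In a model with thousands of variables the grid would be replaced by analytic or low-dimensional moment computations.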

Results

Spatial resolution comparison between JBD and other methods

• The average distance of JBD’s Gcn4 binding predictions to motif sites is smaller than for other methods, and JBD identifies more known Gcn4 targets.

JBD resolves proximal binding events better than other methods do. Shown here is the performance of the JBD, MPeak, and Ratio methods on 200 simulated DNA regions, each containing two binding events.

Using binding posterior to guide motif discovery

Approach:

• Using binding posterior probabilities derived from the ChIP-chip data to weight sequence regions differently for motif discovery.

Results:

• Found the Mig2 motif where a standard motif discovery algorithm (e.g., MEME) failed.

• Note that the correct motif for Mig2 was not recovered when using the Ratio method to analyze the ChIP-chip data.
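
The weighting idea in the approach above can be illustrated with a toy k-mer counter. This is neither MEME nor the motif discovery procedure behind these results. It only assumes that each base carries a binding posterior probability and that a k-mer's contribution is the mean posterior over the bases it covers. The function name `weighted_kmer_counts` and the example sequence and posterior values are hypothetical.

```python
from collections import defaultdict

def weighted_kmer_counts(sequences, posteriors, k):
    """Sum, for each k-mer, the mean binding posterior over the bases it covers."""
    counts = defaultdict(float)
    for seq, post in zip(sequences, posteriors):
        if len(seq) != len(post):
            raise ValueError("each sequence needs one posterior value per base")
        for start in range(len(seq) - k + 1):
            window = post[start:start + k]
            counts[seq[start:start + k]] += sum(window) / k
    return dict(counts)

# Toy input: one short region with the posterior mass in the middle.
toy_sequences = ["ACGTAC"]
toy_posteriors = [[0.0, 0.0, 1.0, 1.0, 0.0, 0.0]]
toy_counts = weighted_kmer_counts(toy_sequences, toy_posteriors, k=2)
top_kmer = max(toy_counts, key=toy_counts.get)
```

Under this scheme, k-mers that sit where binding is unlikely contribute little, so sequence far from the inferred binding position has less chance of dominating the motif search.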

Positional priors for motif discovery improve robustness to false input DNA sequence regions.

Questions?