High-resolution computational models of genome binding events
Yuan (Alan) Qi
Joint work with the Gifford and Young labs
Dana-Farber Cancer Institute Jan 2007
ChIP-chip Experiments
ChIP-chip data:
Encode valuable information about protein-DNA binding events.
Goal:
Decode accurate binding information from the noisy data.
Challenges:
• Noise
• Joint influence of multiple binding events
Joint Binding Deconvolution
JBD: generative probabilistic graphical model.
Data Likelihood
Prior Distributions:
Hyper Prior Distributions:
Shear Distribution
(a) The distribution of DNA fragment sizes produced in the ChIP protocol was experimentally measured and statistically modeled.
(b) An influence function is derived from the measured fragment size distribution.
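A minimal sketch of how an influence function might be derived from a fragment size distribution, and how several nearby binding events jointly contribute to each probe. Everything specific here is an assumption for illustration, not taken from the slides: the Gamma-shaped length distribution and its parameters, the rule that a fragment of length L containing the binding site covers a probe at distance d with probability max(0, 1 - |d|/L), and the function names `fragment_length_pmf`, `influence`, and `expected_signal`.

```python
import numpy as np

def fragment_length_pmf(lengths, shape=6.0, scale=100.0):
    """Gamma-shaped weights over fragment lengths, normalized to sum to 1.
    The shape/scale values are hypothetical, not measured values."""
    log_w = (shape - 1.0) * np.log(lengths) - lengths / scale
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

def influence(distances, lengths, pmf):
    """Expected coverage of a probe at each distance from a binding event,
    averaged over the fragment length distribution (assumed coverage rule)."""
    d = np.abs(np.asarray(distances, dtype=float))[:, None]
    cover = np.clip(1.0 - d / lengths[None, :], 0.0, None)
    return cover @ pmf

def expected_signal(probe_positions, event_positions, event_strengths, lengths, pmf):
    """Noise-free probe signal as a sum of per-event influences."""
    signal = np.zeros(len(probe_positions))
    for pos, strength in zip(event_positions, event_strengths):
        signal += strength * influence(probe_positions - pos, lengths, pmf)
    return signal

lengths = np.arange(1, 2001, dtype=float)   # fragment lengths in bp
pmf = fragment_length_pmf(lengths)
probes = np.arange(-1000, 1001, 100)        # probe positions in bp

# Two proximal events 300 bp apart; their influences overlap at shared probes.
signal = expected_signal(probes, [-150.0, 150.0], [1.0, 0.6], lengths, pmf)
print(np.round(signal, 3))
```

Under these assumptions the influence is 1 at the binding site, symmetric, and decays with distance, so probes between two nearby events receive contributions from both. That overlap is what a deconvolution method has to separate.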
Approximate Bayesian Inference
Exact Bayesian posterior of binding events:
Non-conjugate models, thousands of variables -> Intractable calculations of the exact posterior distribution!
Message passing algorithm (Expectation propagation):
Where and
EP iteratively refines the factor approximations (i.e., messages) to improve the posterior approximation.
EP in a Nutshell
• Approximate a probability distribution by simpler parametric terms:
  $p(w \mid y) \propto \prod_i f_i(w)$
  $q(w) \propto \prod_i \tilde{f}_i(w)$
• Each approximation term $\tilde{f}_i(w)$ lives in an exponential family (e.g., Gaussian or Gamma distributions).
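EP in a Nutshell
Three key steps:
• Deletion: Approximate the “leave-one-out” posterior distribution for the ith factor:
  $q^{\backslash i}(w) \propto q(w) / \tilde{f}_i(w) \propto \prod_{j \neq i} \tilde{f}_j(w)$
• Minimization: Minimize the following KL divergence by moment matching:
  $\tilde{f}_i(w) = \arg\min_{\tilde{f}_i(w)} \mathrm{KL}\big( f_i(w)\, q^{\backslash i}(w) \,\|\, \tilde{f}_i(w)\, q^{\backslash i}(w) \big)$
• Inclusion:
  $q(w) \propto \tilde{f}_i(w)\, q^{\backslash i}(w)$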
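The three steps above can be illustrated on a toy problem. The sketch below is not the JBD model. It assumes a scalar unknown w with a Gaussian prior, a handful of logistic factors (non-conjugate to the prior), Gaussian approximation terms stored as natural parameters, and moment matching done by simple grid quadrature. The names `run_ep`, `gaussian_pdf`, and `slopes` are hypothetical.

```python
import numpy as np

def gaussian_pdf(w, mean, var):
    return np.exp(-0.5 * (w - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def run_ep(factors, prior_mean, prior_var, grid, n_sweeps=20):
    """EP for a scalar w: Gaussian prior (kept exact) times non-Gaussian factors."""
    dw = grid[1] - grid[0]
    n = len(factors)
    tau = np.zeros(n)   # precision of each approximation term
    nu = np.zeros(n)    # precision times mean of each approximation term
    q_tau, q_nu = 1.0 / prior_var, prior_mean / prior_var
    for _ in range(n_sweeps):
        for i, f in enumerate(factors):
            # Deletion: divide the ith approximation term out of q.
            cav_tau, cav_nu = q_tau - tau[i], q_nu - nu[i]
            if cav_tau <= 0.0:
                continue  # skip an improper leave-one-out distribution
            cav_mean, cav_var = cav_nu / cav_tau, 1.0 / cav_tau
            # Minimization: match mean and variance of f_i(w) * q^{\i}(w),
            # which minimizes the KL divergence within the Gaussian family.
            tilted = f(grid) * gaussian_pdf(grid, cav_mean, cav_var)
            z = tilted.sum() * dw
            mean = (grid * tilted).sum() * dw / z
            var = ((grid - mean) ** 2 * tilted).sum() * dw / z
            # Inclusion: q takes the matched moments; the new approximation
            # term is q divided by the leave-one-out distribution.
            q_tau, q_nu = 1.0 / var, mean / var
            tau[i], nu[i] = q_tau - cav_tau, q_nu - cav_nu
    return q_nu / q_tau, 1.0 / q_tau

grid = np.linspace(-10.0, 10.0, 20001)
slopes = [1.0, 2.0, 0.5, -1.5]  # hypothetical data: one logistic factor each
factors = [lambda w, a=a: 1.0 / (1.0 + np.exp(-a * w)) for a in slopes]

ep_mean, ep_var = run_ep(factors, 0.0, 1.0, grid)

# Brute-force posterior on the same grid, feasible only because w is scalar.
post = gaussian_pdf(grid, 0.0, 1.0)
for f in factors:
    post = post * f(grid)
post /= post.sum()
exact_mean = (grid * post).sum()
exact_var = ((grid - exact_mean) ** 2 * post).sum()
print(ep_mean, ep_var, exact_mean, exact_var)
```

In one dimension the brute-force grid posterior is available as a check. With thousands of variables that check is no longer feasible, which is the situation that motivates a message passing approximation.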
Spatial resolution comparison between JBD and other methods
• The average distance of JBD’s Gcn4 binding predictions to motif sites is smaller than for other methods, and JBD identifies more known Gcn4 targets.
JBD better resolves proximal binding events than do other methods. Shown here is performance of the JBD, MPeak and Ratio methods on 200 simulated DNA regions each containing two binding events.
Using binding posterior to guide motif discovery
Approach:
• Use binding posterior probabilities derived from the ChIP-chip data to weight sequence regions differently for motif discovery.
Results:
• Found the Mig2 motif where a standard motif discovery algorithm (e.g., MEME) failed.
• Note that the correct motif for Mig2 was not recovered when using the Ratio method to analyze the ChIP-chip data.
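The weighting idea in the approach can be sketched with a toy k-mer scorer. This is a simplified stand-in, not the motif discovery algorithm used in the work: each region carries a posterior binding probability, and a k-mer's score is the sum of the weights of the regions that contain it. The sequences, weights, and the name `weighted_kmer_scores` are hypothetical, and reverse complements are ignored.

```python
from collections import defaultdict

def weighted_kmer_scores(regions, k):
    """Score each k-mer by the summed weight of the regions containing it.

    regions: iterable of (sequence, weight) pairs, where weight stands in
    for a posterior binding probability.
    """
    scores = defaultdict(float)
    for seq, weight in regions:
        # Count each k-mer once per region so repeats do not dominate.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for kmer in kmers:
            scores[kmer] += weight
    return dict(scores)

# Hypothetical regions: two likely-bound regions share one 6-mer, while
# three unlikely-bound regions share a low-complexity repeat.
regions = [
    ("TTACCCCGCAT", 0.95),
    ("GTCCCCGCTTA", 0.90),
    ("ACACACACACA", 0.10),
    ("TACACACACAG", 0.05),
    ("CACACACATTT", 0.10),
]

weighted = weighted_kmer_scores(regions, 6)
unweighted = weighted_kmer_scores([(seq, 1.0) for seq, _ in regions], 6)

print(max(weighted, key=weighted.get))
print(sorted(unweighted.items(), key=lambda kv: -kv[1])[:2])
```

With uniform weights the repeat found in more regions scores highest. With posterior weights the 6-mer shared by the confidently bound regions comes out on top, which is the effect the approach relies on.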