Chromatin Immunoprecipitation (ChIP)-chip Analysis, 11/07/07
Experimental Protocol
• Step 1: crosslinking – fix protein to DNA
• Step 2: sonication – break DNA into fragments
• Step 3: immunoprecipitation – pull down the target protein with a specific antibody
• Step 4: hybridization – hybridize input and pulled-down DNA on a microarray
Kim and Ren 2007
Intergenic microarray
• Array probes are PCR products of intergenic regions.
• Binding signal is represented by a single probe.
ChIP-array
• Regions consistently enriched across repeated ChIP-arrays are selected as the TF binding targets
• Usually hundreds of targets, each ~1000 bp long
• We want to know the precise binding site (e.g. ~10 bases)
Tiling arrays
• Microarray probes are oligonucleotide sequences, regularly spaced to cover a whole genomic region.
Tiling Array Data
Each TF binding signal is represented by multiple probes.
Need more sophisticated statistical tools.
Kim and Ren 2007
Methods
• Moving average t-test (Keles et al. 2004)
• HMM (Li et al. 2005; Yuan et al. 2005)
• Tilemap (Ji and Wong 2005)
• MAT (Johnson et al. 2006)
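Keles’ method
• Calculate a two-sample t-statistic at each probe i, comparing the ChIP signal Y2 with the input signal Y1:
$$T_{i,n} = \frac{\bar{Y}_{2,i} - \bar{Y}_{1,i}}{\sqrt{\hat{\sigma}^2_{2,i}/n_2 + \hat{\sigma}^2_{1,i}/n_1}}$$
Keles et al. 2004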
Keles’ method
• Moving average scan-statistic over a window of w consecutive probes:
$$T^*_{i,n} = \frac{1}{w}\sum_{h=i}^{i+w-1} T_{h,n}$$
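The two statistics above can be sketched in a few lines of Python. This is a minimal illustration, not Keles et al.'s implementation: the function names and the toy log-intensities are made up, and the window is assumed to start at probe i and run downstream.

```python
import math
from statistics import mean, variance

def welch_t(chip, inp):
    """Two-sample t-statistic for one probe: ChIP replicates vs input replicates."""
    se2 = variance(chip) / len(chip) + variance(inp) / len(inp)
    return (mean(chip) - mean(inp)) / math.sqrt(se2)

def moving_average(t_stats, w):
    """Scan statistic: mean of w consecutive probe-level statistics starting at i."""
    return [sum(t_stats[i:i + w]) / w for i in range(len(t_stats) - w + 1)]

# Toy log-intensities (made-up numbers): one row per probe, one entry per replicate.
chip = [[1.0, 1.2, 0.9], [2.9, 3.1, 3.0], [3.2, 2.8, 3.0], [1.1, 0.9, 1.0]]
inp = [[1.1, 1.0, 0.9], [1.0, 1.2, 0.8], [0.9, 1.1, 1.0], [1.0, 1.2, 0.8]]

t_stats = [welch_t(c, x) for c, x in zip(chip, inp)]
scan = moving_average(t_stats, w=2)  # one value per window of 2 probes
```

In this toy data probes 1 and 2 are enriched, so the window covering both of them gets the largest scan statistic.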
Multiple hypothesis testing
• Multiple hypothesis testing needs to be considered to control false positive error rates.
• What is the null distribution of this statistic?
$$T^*_{i,n} = \frac{1}{w}\sum_{h=i}^{i+w-1} T_{h,n}$$
Multiple hypothesis testing
• Assume $T_{h,n}$ has a t-distribution.
• Approximate $T^*_{i,n} = \frac{1}{w}\sum_{h=i}^{i+w-1} T_{h,n}$ by a normal distribution.
• Alternatively, use a resampling method to estimate the null distribution.
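One possible reading of the resampling option is a label permutation, sketched below. This is an assumption for illustration, not the procedure of Keles et al.: array labels (ChIP vs input) are shuffled jointly for all probes, the scan statistic is recomputed, and the maximum over windows is recorded. All names and the simulated data are hypothetical.

```python
import math
import random
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample t-statistic with unequal variances."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))

def scan_stats(chip, inp, w):
    """Moving average of probe-level t-statistics over windows of w probes."""
    t = [welch_t(c, x) for c, x in zip(chip, inp)]
    return [sum(t[i:i + w]) / w for i in range(len(t) - w + 1)]

def permutation_null_max(chip, inp, w, n_perm, rng):
    """Sorted maxima of the scan statistic under random ChIP/input relabelling."""
    n1 = len(chip[0])
    pooled = [list(c) + list(x) for c, x in zip(chip, inp)]
    idx = list(range(len(pooled[0])))
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(idx)  # one relabelling of arrays, shared by all probes
        perm_chip = [[row[j] for j in idx[:n1]] for row in pooled]
        perm_inp = [[row[j] for j in idx[n1:]] for row in pooled]
        maxima.append(max(scan_stats(perm_chip, perm_inp, w)))
    return sorted(maxima)

# Simulated pure-noise data: 50 probes, 4 ChIP and 4 input replicates each.
rng = random.Random(0)
chip = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(50)]
inp = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(50)]

null_max = permutation_null_max(chip, inp, w=5, n_perm=200, rng=rng)
threshold = null_max[int(0.95 * len(null_max))]  # approximate 95th percentile
```

Windows whose observed scan statistic exceeds `threshold` would be called at roughly a 5% family-wise error rate under this scheme. With only a handful of replicates the number of distinct relabellings is small, which limits the resolution of the estimated null.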
Tilemap
Improves on Keles’ method in the following ways:
• Use a more robust test statistic
• Estimate the null distribution without prior assumptions.
Ji and Wong 2005
Step 1: calculating a t-like test statistic
• Model: log-intensity, indexed by probe, condition, and replicate
• Pooling data
• Two samples:
• Multiple samples:
Step 1: calculating a t-like test statistic
• Want to have a robust estimate of the variance.
• Notation
• Estimation of the variance by variance shrinkage
• Shrinkage factor
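To see why shrinkage makes the statistic more robust, here is a minimal sketch. It assumes a simple linear shrinkage of each probe's pooled variance toward the mean variance across probes, with a fixed shrinkage factor. TileMap estimates its shrinkage factor from the data, so the constant 0.5, the function names, and the toy numbers are illustrative assumptions only.

```python
import math
from statistics import mean, variance

def pooled_variance(chip, inp):
    """Pooled within-group variance for one probe."""
    n1, n2 = len(chip), len(inp)
    return ((n1 - 1) * variance(chip) + (n2 - 1) * variance(inp)) / (n1 + n2 - 2)

def shrink_variances(probe_vars, shrink):
    """Pull each probe's variance toward the mean variance across probes."""
    grand = mean(probe_vars)
    return [(1 - shrink) * v + shrink * grand for v in probe_vars]

def t_like(chip, inp, var):
    """t-like statistic using a supplied variance estimate."""
    return (mean(chip) - mean(inp)) / math.sqrt(var * (1 / len(chip) + 1 / len(inp)))

# Probe 0: tiny difference, tiny variance by chance. Probe 1: large difference.
chip = [[1.00, 1.01], [3.0, 2.0]]
inp = [[0.95, 0.96], [1.0, 0.2]]

raw_vars = [pooled_variance(c, x) for c, x in zip(chip, inp)]
shrunk_vars = shrink_variances(raw_vars, shrink=0.5)  # 0.5 is an arbitrary choice

plain_t = [t_like(c, x, v) for c, x, v in zip(chip, inp, raw_vars)]
shrunk_t = [t_like(c, x, v) for c, x, v in zip(chip, inp, shrunk_vars)]
```

With the raw variances, probe 0 outranks probe 1 only because its variance happens to be very small. After shrinkage the ranking is reversed and the probe with the large difference scores higher.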
Step 2: Merging data
• Moving average
• Alternatively use Hidden Markov Model
Step 3: control FDR
Goal: To find the null and signal distributions
Idea: assume a mixture model. This is unidentifiable!
A clever trick: look for two auxiliary distributions, g0 and g1
How to find g0 and g1
• To get g1, can we select probes with highest t-score?
• Why or why not?
How to find g0 and g1
• Idea: signals at neighboring probes are correlated, whereas noise is not (hopefully!)
• First select the probes that have the highest t-scores $t_i$.
• Use their downstream values $t_{i+1}$ to estimate g1.
• Use the same trick to estimate g0.
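The neighbor trick can be sketched as follows. The slide does not say which probes are selected for g0, so the sketch assumes it is the lowest-scoring ones. The selection fractions, function name, and simulated scores are also assumptions made for illustration.

```python
import random

def neighbor_samples(t, frac, top=True):
    """Select the fraction `frac` of probes with the highest (top=True) or
    lowest (top=False) score t[i], and return the downstream scores t[i + 1]."""
    n = len(t) - 1  # the last probe has no downstream neighbor
    order = sorted(range(n), key=lambda i: t[i], reverse=top)
    k = max(1, int(frac * n))
    return [t[i + 1] for i in order[:k]]

# Simulated t-scores: unit-variance noise plus five bound regions of 20 probes.
rng = random.Random(1)
t = [rng.gauss(0, 1) for _ in range(1000)]
for start in (100, 300, 500, 700, 900):
    for i in range(start, start + 20):
        t[i] += 4.0

g1_sample = neighbor_samples(t, frac=0.05, top=True)   # enriched for signal
g0_sample = neighbor_samples(t, frac=0.50, top=False)  # mostly noise
```

The point of using $t_{i+1}$ rather than $t_i$ is that the selected scores themselves are biased upward by the selection, whereas the neighbor's noise was not selected on. A histogram of `g1_sample` therefore gives a less distorted picture of the signal distribution, provided the noise really is uncorrelated between neighbors.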
Step 3: control FDR
A clever trick: find g0 and g1, under an additional assumption
Step 3: Unbalanced mixture score
[Equations garbled in extraction. Surviving pieces: a relation between $f_0(t)$ and $g_0(t)$, and a quantity "estimated by fitting" that involves integrals over $t$ of $h(t)$, $f_1(t)$, and $g_0(t)$.]
False discovery rate (FDR)
Determine TF binding sites at a chosen FDR cutoff
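Once a null distribution and a null proportion are available, an FDR at a given score cutoff can be estimated as the expected fraction of null probes among those called. The sketch below assumes an empirical null sample and a known null proportion `pi0`. Both are hypothetical inputs here, and TileMap's own FDR computation may differ in detail.

```python
def estimated_fdr(scores, null_scores, pi0, cutoff):
    """Estimate FDR at `cutoff` as pi0 * P0(T >= cutoff) / P(T >= cutoff)."""
    called = sum(s >= cutoff for s in scores) / len(scores)
    if called == 0:
        return 0.0
    null_tail = sum(s >= cutoff for s in null_scores) / len(null_scores)
    return min(1.0, pi0 * null_tail / called)

def smallest_cutoff(scores, null_scores, pi0, target):
    """Smallest observed score whose estimated FDR is at or below `target`."""
    for c in sorted(set(scores)):
        if estimated_fdr(scores, null_scores, pi0, c) <= target:
            return c
    return None

# Toy scores (made-up): ten observed scores and five draws from the null.
scores = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
null_scores = [0, 1, 2, 3, 4]

fdr_at_4 = estimated_fdr(scores, null_scores, pi0=0.5, cutoff=4)
cutoff = smallest_cutoff(scores, null_scores, pi0=0.5, target=0.2)
```

At a cutoff of 4, 60% of all scores and 20% of null scores pass, so the estimate is 0.5 × 0.2 / 0.6 = 1/6, and 4 is the smallest cutoff that meets a target of 0.2.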
How to find g0 and g1
Memory problem!
Example: analysis of cMyc binding data
Comparison of models
Simulation results
MAT
Basic Idea:
• Baseline level correction
• Standardize probe intensity with respect to the expected baseline value
(Johnson et al. 2006)
MAT
• How to estimate the baseline values?
[Figure: estimated nucleotide effect]
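MAT
• Standardization:
$$t_i = \frac{\log(PM_i) - \hat{m}_i}{s_{i,\,\text{affinity bin}}}$$
$$\text{MATscore}(\text{region}) = \sqrt{n_p}\;\mathrm{TM}(t\ \text{values in region})$$
where $\hat{m}_i$ is the estimated baseline of probe $i$, TM is the trimmed mean, and $n_p$ is the number of probes in the region.
(X.S. Liu)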
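The standardization and region score can be sketched as below. The baseline values and affinity-bin standard deviations are taken as given, since estimating them is the model-fitting step. The 10% trim on each side, the function names, and the toy numbers are assumptions, not MAT's actual code.

```python
import math

def standardize(log_pm, baseline, bin_sd):
    """t_i = (log PM_i - m_i) / s, with s the standard deviation of probe i's affinity bin."""
    return [(x - m) / s for x, m, s in zip(log_pm, baseline, bin_sd)]

def trimmed_mean(values, trim=0.1):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    v = sorted(values)
    k = int(trim * len(v))
    kept = v[k:len(v) - k]
    return sum(kept) / len(kept)

def mat_score(t_values, trim=0.1):
    """sqrt(number of probes in the region) times the trimmed mean of their t values."""
    return math.sqrt(len(t_values)) * trimmed_mean(t_values, trim)

# Toy region of 10 probes (made-up t values) with one outlier at each end.
region_t = [0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 50.0]
score = mat_score(region_t)
```

The trimmed mean keeps a single outlying probe from dominating the region, and the square-root factor lets regions with more probes accumulate evidence.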
Reading List
• Keles et al. 2004 – Developed a multiple hypothesis testing method for tiling array analysis
• Ji and Wong 2005 – Tilemap; improved on Keles et al.’s method
• Johnson et al. 2006 – MAT; showed that baseline adjustment improves signal detection