Chromatin Immuno-precipitation (CHIP)-chip Analysis 11/07/07.

Experimental Protocol

• Step 1: crosslink – fix protein to DNA

• Step 2: sonication – break DNA

• Step 3: immuno-precipitation – pull down target protein with a specific antibody

• Step 4: hybridization – hybridize input and pulled-down DNA on a microarray

Kim and Ren 2007

Intergenic microarray

• Array probes are PCR products of intergenic regions.

• Binding signal is represented by a single probe.

ChIP-array

• Regions consistently enriched across repeated ChIP-arrays are selected as the TF binding targets

• Usually hundreds of targets, each ~1000 bases long

• We want to know the precise binding site (e.g. 10 bases)

Tiling arrays

• Microarray probes are oligonucleotide sequences, regularly spaced to cover a whole genomic region.

(Figure labels: TF target, chromosome)

Tiling Array Data

Each TF binding signal is represented by multiple probes.

Need more sophisticated statistical tools.

Kim and Ren 2007

Methods

• Moving average t-test (Keles et al. 2004)

• HMM (Li et al. 2005; Yuan et al. 2005)

• Tilemap (Ji and Wong 2005)

• MAT (Johnson et al. 2006)

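Keles’ method

• Calculate a two-sample t-statistic at each probe i, comparing the ChIP signal Y2 with the input signal Y1:

$$T_{i,n} = \frac{\bar{Y}_{2,i} - \bar{Y}_{1,i}}{\sqrt{\hat{\sigma}_{1,i}^2/n_1 + \hat{\sigma}_{2,i}^2/n_2}}$$

Keles et al. 2004

• Moving average scan-statistic over a window of w probes:

$$T^*_{i,n} = \frac{1}{w} \sum_{h=i}^{i+w-1} T_{h,n}$$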
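The two quantities on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `welch_t` and `moving_average` and the toy intensities are assumptions made for the example, and the window is taken to start at probe i.

```python
import math
from statistics import mean, variance

def welch_t(chip, control):
    """Two-sample t-statistic for one probe: ChIP replicates vs. input replicates."""
    se = math.sqrt(variance(control) / len(control) + variance(chip) / len(chip))
    return (mean(chip) - mean(control)) / se

def moving_average(t_stats, w):
    """Average each run of w consecutive probe statistics (window starts at probe i)."""
    return [mean(t_stats[i:i + w]) for i in range(len(t_stats) - w + 1)]

# Toy log-intensities (made-up numbers): one row per probe, one column per replicate.
chip = [[1.0, 1.2, 0.9], [2.9, 3.1, 3.0], [3.2, 2.8, 3.0], [1.1, 0.9, 1.0]]
control = [[1.0, 1.1, 0.9], [1.0, 0.8, 1.2], [1.1, 0.9, 1.0], [1.0, 1.2, 0.8]]

t_stats = [welch_t(c, k) for c, k in zip(chip, control)]
scan = moving_average(t_stats, w=2)
```

In this toy data the two middle probes are enriched, so the window covering both of them gets the largest scan statistic.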

Multiple hypothesis testing

• Multiple hypothesis testing needs to be considered to control false positive error rates.

• What is the null distribution of this statistic?

$$T^*_{i,n} = \frac{1}{w} \sum_{h=i}^{i+w-1} T_{h,n}$$

Multiple hypothesis testing

• Assume $T_{h,n}$ has a t-distribution.

• Approximate $T^*_{i,n} = \frac{1}{w} \sum_{h=i}^{i+w-1} T_{h,n}$ by a normal distribution.

• Alternatively, a resampling method can be used to estimate the null distribution.
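One way to picture the resampling option is a label permutation: relabel which arrays count as ChIP and which as input, recompute the window statistic, and keep the largest value from each relabelling. The sketch below assumes that scheme (the slides do not say which resampling method is meant); `scan_stats`, the column layout, and the data are all made up for illustration.

```python
import math
from itertools import combinations
from statistics import mean, variance

def welch_t(chip, control):
    """Two-sample t-statistic for one probe."""
    se = math.sqrt(variance(control) / len(control) + variance(chip) / len(chip))
    return (mean(chip) - mean(control)) / se

def scan_stats(rows, chip_cols, w):
    """Window-averaged t-statistics when the columns in chip_cols are labelled ChIP."""
    t = []
    for row in rows:
        chip = [row[j] for j in chip_cols]
        control = [row[j] for j in range(len(row)) if j not in chip_cols]
        t.append(welch_t(chip, control))
    return [mean(t[i:i + w]) for i in range(len(t) - w + 1)]

# Toy data: each row is a probe; columns 0-2 are ChIP, columns 3-5 are input.
rows = [
    [1.0, 1.2, 0.9, 1.0, 1.1, 0.9],
    [2.9, 3.1, 3.0, 1.0, 0.8, 1.2],
    [3.2, 2.8, 3.0, 1.1, 0.9, 1.0],
    [1.1, 0.9, 1.0, 1.0, 1.2, 0.8],
]
observed = scan_stats(rows, (0, 1, 2), w=2)

# Null distribution: the largest window statistic under every possible relabelling.
null_max = [max(scan_stats(rows, cols, w=2)) for cols in combinations(range(6), 3)]

# Resampling p-value for the strongest observed window.
p_value = sum(m >= max(observed) for m in null_max) / len(null_max)
```

Taking the maximum over windows in each relabelling is one simple way to account for testing many windows at once; with only three replicates per condition there are just 20 relabellings, so the smallest attainable p-value is 1/20.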

Tilemap

Improves on Keles’ method in the following ways:

• Use a more robust test statistic

• Estimate the null distribution without prior assumptions.

Ji and Wong 2005

Step 1: calculating a t-like test statistic

• Model: log-intensity, indexed by probe, condition, and replicate

pooling data

• Two samples:

• Multiple samples:


• Want to have a robust estimate of variance.

Notation


Estimation of the variance by shrinkage

Shrinkage factor
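To make the idea of a shrinkage factor concrete, here is a generic sketch in which each probe's sample variance is pulled toward the average variance over all probes. This is an assumed textbook form, not necessarily the estimator used in Tilemap; the function `shrink_variances` is hypothetical, and the factor `b` is simply supplied by hand rather than estimated from the data.

```python
from statistics import mean, variance

def shrink_variances(sample_vars, b):
    """Pull each probe's sample variance toward the average variance.

    b is a shrinkage factor in [0, 1]: 0 keeps the raw variances,
    1 replaces every variance with the average.
    """
    if not 0.0 <= b <= 1.0:
        raise ValueError("shrinkage factor must lie in [0, 1]")
    pooled = mean(sample_vars)
    return [(1.0 - b) * s2 + b * pooled for s2 in sample_vars]

# Toy log-intensities (made-up numbers): one row of replicates per probe.
replicates = [[1.0, 1.1, 0.9], [2.0, 2.0, 2.1], [0.5, 1.5, 1.0]]
raw = [variance(r) for r in replicates]
shrunk = shrink_variances(raw, b=0.5)
```

With few replicates a raw variance can be very small by chance and inflate the t-statistic; shrinking toward the average makes the denominator more stable.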

Step 2: Merging data

• Moving average

• Alternatively use Hidden Markov Model

Step 3: control FDR

Goal: To find null and signal distributions

Idea: assume a mixture model

This is unidentifiable!

Step 3: control FDR



A clever trick: Look for

with

How to find g0 and g1

• To get g1, can we select probes with highest t-score?

• Why or why not?


• Idea: signals at neighboring probes are correlated, whereas noises are not (hopefully!)

• First select probes that have the highest t-score t_i.

• Use their downstream value t_{i+1} to estimate g1.

• Use same trick to estimate g0.
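The neighbor trick can be sketched as follows. The selection rule for g0 is an assumption here (probes with the lowest t-scores), as are the function names and the fraction `frac` of probes selected at each end; the t-scores are made up.

```python
def downstream_scores(t, selected):
    """t-scores of the probes immediately downstream of the selected probes."""
    return [t[i + 1] for i in selected if i + 1 < len(t)]

def neighbor_samples(t, frac):
    """Return (g0 sample, g1 sample) taken from neighbors of extreme probes."""
    k = max(1, int(len(t) * frac))
    order = sorted(range(len(t)), key=lambda i: t[i])  # probe indices, low to high
    low, high = order[:k], order[-k:]
    return downstream_scores(t, low), downstream_scores(t, high)

# Toy t-scores along the chromosome; probes 3-5 sit inside a bound region.
t = [0.1, -0.3, 0.2, 5.0, 6.0, 5.5, 0.0, -0.2, 0.3, 0.1]
g0_sample, g1_sample = neighbor_samples(t, frac=0.2)
```

Here the g1 sample contains one high score and one background score, because the second selected probe sits at the edge of the bound region. Samples built this way are mixtures of signal and noise rather than pure signal.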

Step 3: control FDR



A clever trick: Find

with

Additional assumption:

Step 3: Unbalanced mixture score

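with $f_0(t) \approx g_0(t)$

$\hat{\pi}_0$ is estimated by fitting

$$\hat{\pi}_0 = \frac{\int \left(h(t) - f_1(t)\right)\left(g_0(t) - f_1(t)\right)\,dt}{\int \left(g_0(t) - f_1(t)\right)^2\,dt}$$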

False discovery rate (FDR)

Determine TF binding sites at an FDR cutoff


• Idea: signals at neighboring probes are correlated, whereas noises are not (hopefully!)

• First select probes that have the highest t-score t_i.

• Use their downstream value t_{i+1} to estimate g1.

• Use same trick to estimate g0.

Memory problem!

Example: Analysis of cMyc binding data

Comparison of models

Simulation results

MAT

Basic Idea:

• Baseline level correction

• Standardize probe intensity with respect to the expected baseline value

(Johnson et al. 2006)

MAT

• How to estimate the baseline values?

(Figure: estimated nucleotide effect)

MAT

• Standardization

$$t_i = \frac{\log(PM_i) - \hat{m}_i}{s_{i,\,\text{affinity bin}}}$$

$$\text{MATscore}(\text{region}) = \sqrt{n_p}\; TM(t \text{ values in region})$$

(X.S. Liu)
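A small sketch of the standardization and region score, assuming TM denotes a trimmed mean and n_p the number of probes in the region. The trimming fraction, function names, and numbers below are illustrative assumptions. The baseline values and bin standard deviations are taken as given rather than estimated from a probe sequence model.

```python
import math

def trimmed_mean(values, trim):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    v = sorted(values)
    k = int(len(v) * trim)
    kept = v[k:len(v) - k] if k else v
    return sum(kept) / len(kept)

def standardize(log_pm, baseline, bin_sd):
    """t_i = (log PM_i - baseline_i) / sd of the probe's affinity bin."""
    return [(x - m) / s for x, m, s in zip(log_pm, baseline, bin_sd)]

def mat_score(t_values, trim):
    """sqrt(number of probes) times the trimmed mean of the t values."""
    return math.sqrt(len(t_values)) * trimmed_mean(t_values, trim)

# Toy region of four probes (made-up log-intensities, baselines, and bin SDs).
log_pm = [8.0, 9.5, 10.0, 9.0]
baseline = [8.0, 8.5, 8.0, 8.0]
bin_sd = [0.5, 0.5, 1.0, 0.5]

t_values = standardize(log_pm, baseline, bin_sd)
score = mat_score(t_values, trim=0.25)
```

The trimmed mean keeps a single outlying probe from dominating the region score, and the square-root factor rewards regions where many probes agree.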

Reading List

• Keles et al. 2004 – Developed a multiple hypothesis testing method for tiling array analysis.

• Ji and Wong 2005 – Tilemap; improved over Keles et al.’s method.

• Johnson et al. 2006 – MAT; showed that baseline adjustment improves signal detection.
