ChIP seq - Departments


Page 1: ChIP seq - Departments

ChIP‐seq

Page 2: ChIP seq - Departments

ChIP‐Seq

Hongkai Ji et al. Nature Biotechnology 26: 1293-1300. 2008

Page 3: ChIP seq - Departments

ChIP‐Seq Analysis

Alignment

Peak Detection

Annotation Visualization

Sequence Analysis

Motif Analysis

Page 4: ChIP seq - Departments

Alignment

• ELAND

• Bowtie

• SOAP

• SeqMap

• …

Page 5: ChIP seq - Departments

Peak detection

• FindPeaks

• CHiPSeq

• BS‐Seq

• SISSRs

• QuEST

• MACS

• CisGenome

• …

Page 6: ChIP seq - Departments

Two common designs

• One sample experiment

contains only a ChIP’d sample

• Two sample experiment

contains a ChIP’d sample and a negative control sample

Page 7: ChIP seq - Departments

One sample analysis

A simple way is the sliding window method: count the reads ki falling in each window i.

A Poisson background model is commonly used to estimate the error rate:

ki ~ Poisson(λ0)

Alternatively, Monte Carlo simulations are used.

Both are based on the assumption that the read sampling rate is constant across the genome.

Ji et al. Nat Biotechnol, 26: 1293-1300. 2008
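The sliding window method with a Poisson background can be sketched as follows. This is a minimal illustration, not CisGenome's implementation: the function names, the window and step sizes, the cutoff, and the choice to estimate λ0 from the genome-wide read density are all assumptions made for the example.

```python
import bisect
import math


def window_counts(read_starts, genome_length, window, step):
    """Return (window_start, read_count) for every sliding window."""
    starts = sorted(read_starts)
    counts = []
    for left in range(0, genome_length - window + 1, step):
        lo = bisect.bisect_left(starts, left)
        hi = bisect.bisect_left(starts, left + window)
        counts.append((left, hi - lo))
    return counts


def poisson_tail(k, lam):
    """P(K >= k) for K ~ Poisson(lam)."""
    below = sum(math.exp(-lam) * lam ** j / math.factorial(j) for j in range(k))
    return max(0.0, 1.0 - below)


def call_windows(read_starts, genome_length, window=100, step=25, cutoff=1e-3):
    """Report windows whose read count is unlikely under a constant rate."""
    # Constant-rate assumption: a single lambda0 shared by all windows,
    # estimated here from the overall read density (an assumption).
    lam0 = len(read_starts) * window / genome_length
    hits = []
    for left, k in window_counts(read_starts, genome_length, window, step):
        p = poisson_tail(k, lam0)
        if p < cutoff:
            hits.append((left, k, p))
    return hits


if __name__ == "__main__":
    # Toy data: sparse background reads plus a cluster of 30 reads near 5000.
    reads = list(range(0, 10000, 500)) + [5000 + i for i in range(30)]
    for left, k, p in call_windows(reads, 10000):
        print(left, k, p)
```

Overlapping windows are tested independently here, so neighbouring hits would normally be merged into one region before reporting.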

Page 8: ChIP seq - Departments

The constant rate assumption does not hold!

Negative binomial model fits the data better!

ki | λi ~ Poisson(λi)

λi ~ Gamma(α, β)

Marginally, ki ~ NegBinom(α, β)

Hongkai Ji et al. Nature Biotechnology 26: 1293-1300. 2008
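To see why the Gamma-Poisson mixture gives a heavier tail than a plain Poisson with the same mean, the marginal probabilities can be computed directly. The sketch below assumes β is the rate parameter of the Gamma distribution (the slide does not say whether β is a rate or a scale), and the values of α and β are hypothetical.

```python
import math


def neg_binom_pmf(k, alpha, beta):
    """P(K = k) when K | lam ~ Poisson(lam), lam ~ Gamma(shape=alpha, rate=beta)."""
    log_p = (math.lgamma(k + alpha) - math.lgamma(alpha) - math.lgamma(k + 1)
             + alpha * math.log(beta / (1.0 + beta))
             - k * math.log(1.0 + beta))
    return math.exp(log_p)


def poisson_pmf(k, lam):
    """P(K = k) for K ~ Poisson(lam)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


if __name__ == "__main__":
    alpha, beta = 2.0, 4.0      # hypothetical parameter values
    mean = alpha / beta         # both models share this mean
    print("k  Poisson  NegBinom")
    for k in range(7):
        print(k, poisson_pmf(k, mean), neg_binom_pmf(k, alpha, beta))
```

Under this parameterization the mean is α/β and the variance is α/β + α/β², so the negative binomial is always overdispersed relative to the Poisson.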

Page 9: ChIP seq - Departments

FDR estimation based on Poisson and negative binomial model

Hongkai Ji et al. Nature Biotechnology 26: 1293-1300. 2008

Page 10: ChIP seq - Departments

Read direction provides extra information

Page 11: ChIP seq - Departments

CisGenome procedure

Alignment

Exploration

FDR computation

Negative binomial model

Peak Detection

Post Processing

Use read direction to refine peak boundary and filter low quality peaks
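One plausible reading of the post-processing step is sketched below. It relies on the usual ChIP-seq pattern that forward-strand reads pile up on the left of a binding site and reverse-strand reads on the right. The binning scheme, the function names, and the filtering rule are assumptions for illustration and are not taken from CisGenome's code.

```python
from collections import Counter


def strand_mode(positions, bin_size=10):
    """Center of the most populated bin of 5' read ends (ties: leftmost bin)."""
    bins = Counter(p // bin_size for p in positions)
    best_bin = max(sorted(bins), key=bins.get)
    return best_bin * bin_size + bin_size // 2


def refine_peak(forward_starts, reverse_starts, bin_size=10):
    """Return a refined (left, right) boundary, or None if the peak is rejected."""
    if not forward_starts or not reverse_starts:
        return None                      # reads on one strand only: low quality
    left = strand_mode(forward_starts, bin_size)
    right = strand_mode(reverse_starts, bin_size)
    if left >= right:
        return None                      # strands in the wrong order: low quality
    return left, right


if __name__ == "__main__":
    forward = [100, 102, 105, 108, 150]  # hypothetical 5' ends, forward strand
    reverse = [200, 203, 205, 260]       # hypothetical 5' ends, reverse strand
    print(refine_peak(forward, reverse))
```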

Page 12: ChIP seq - Departments

Two sample analysis

Reason: read sampling rates at the same genomic locus are correlated across different samples.

Hongkai Ji et al. Nature Biotechnology 26: 1293-1300. 2008

Page 13: ChIP seq - Departments

CisGenome two sample analysis

Alignment

Exploration

ni = k1i + k2i

k1i | ni ~ Binom(ni, p0)

FDR computation

Peak Detection

Post Processing
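The conditional binomial model k1i | ni ~ Binom(ni, p0) can be turned into a per-window test as sketched below. Here k1 and k2 are taken to be the window's read counts in the ChIP and control samples, and p0 is the probability that a background read comes from the ChIP sample. How p0 is estimated is not shown on the slide; the value used here is hypothetical.

```python
import math


def binom_tail(k1, n, p0):
    """P(K1 >= k1) for K1 ~ Binom(n, p0)."""
    return sum(math.comb(n, j) * p0 ** j * (1.0 - p0) ** (n - j)
               for j in range(k1, n + 1))


def window_p_value(k1, k2, p0):
    """One-sided p-value for ChIP enrichment in a window, given n = k1 + k2."""
    return binom_tail(k1, k1 + k2, p0)


if __name__ == "__main__":
    p0 = 0.5  # hypothetical: equal sequencing depth in both samples
    for k1, k2 in [(5, 5), (10, 0), (20, 3)]:
        print(k1, k2, window_p_value(k1, k2, p0))
```

Conditioning on ni means a locus that attracts many reads in both samples is not called, which addresses the correlation of sampling rates across samples.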

Page 14: ChIP seq - Departments

A comparative study of ChIP‐chip and ChIP‐seq

• NRSF ChIP‐chip

2 ChIP + 2 Mock IP in Jurkat cells, profiled using Affymetrix Human Tiling 2.0R arrays. 

• NRSF ChIP‐seq

ChIP + Negative Control in Jurkat cells, sequenced with the next generation sequencer made by Illumina/Solexa.

Page 15: ChIP seq - Departments

Intersection

Before post‐processing After post‐processing

Hongkai Ji et al. Nature Biotechnology 26: 1293-1300. 2008

Page 16: ChIP seq - Departments

Signal correlation

Hongkai Ji et al. Nature Biotechnology 26: 1293-1300. 2008

Page 17: ChIP seq - Departments

Visual comparison

Hongkai Ji et al. Nature Biotechnology 26: 1293-1300. 2008

Page 18: ChIP seq - Departments

Comparison of peak detection results

Hongkai Ji et al. Nature Biotechnology 26: 1293-1300. 2008

Page 19: ChIP seq - Departments

Are array specific peaks noise or signal?

Hongkai Ji et al. Nature Biotechnology 26: 1293-1300. 2008

Page 20: ChIP seq - Departments

Effects of read number in ChIP‐seq

Hongkai Ji et al. Nature Biotechnology 26: 1293-1300. 2008

Page 21: ChIP seq - Departments

Motif Analysis

Page 22: ChIP seq - Departments

Sequence motif – a pattern of nucleotide or amino acid sequences

DNA motif:

GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA
TAACATGTGACTCCTATAACCTCTTTGGGTGGTACATGAA
CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCAGATAATCA
TTAGAGGCACAATTGCTTGGGTGGTGCACAAAAAAACAAG
AACAGCCTTGGATTAGCTGCTGGGGGGGTGAGTGGTCCAC
ATCAGAATGGGTGGTCCATATATCCCAAAGAAGAGGGTAG

Transcription Factor Binding Sites (TFBS) bound by the TF, one per sequence above, aligned by position:

123456789
TGGGTGGTC
TGGGTGGTA
TGGGAGGTC
TGGGTGGTG
TGAGTGGTC
TGGGTGGTC

Protein motif:

Page 23: ChIP seq - Departments

Motif representation

Page 24: ChIP seq - Departments

Consensus sequence

Example: CACSTG

Page 25: ChIP seq - Departments

Sequence Logo

Schneider & Stephens, Nucleic Acids Res. 18:6097‐6100 (1990)

Entropy (Shannon) – a measurement of uncertainty

The amount of uncertainty reduced by observing sequences is the amount of information (or information content) we obtained:

This is the height of each position in the logo plot.

Height of each nucleotide is proportional to its frequency 
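The logo heights can be worked out from aligned sites as sketched below. For DNA, the entropy of a position is H = −Σ p_b log2 p_b over the four bases b, and the information content is 2 − H bits. The six sites are the aligned binding sites from the motif example earlier in these slides. The small-sample correction used by Schneider & Stephens is left out, so this is a simplified version.

```python
import math

SITES = [
    "TGGGTGGTC", "TGGGTGGTA", "TGGGAGGTC",
    "TGGGTGGTG", "TGAGTGGTC", "TGGGTGGTC",
]


def frequency_matrix(sites):
    """Per-position base frequencies of equal-length aligned sites."""
    width = len(sites[0])
    return [
        {b: sum(s[j] == b for s in sites) / len(sites) for b in "ACGT"}
        for j in range(width)
    ]


def information_content(column):
    """2 - H(column) in bits, where H is the Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return 2.0 - entropy


if __name__ == "__main__":
    for j, column in enumerate(frequency_matrix(SITES), start=1):
        ic = information_content(column)
        # Each letter's height is its frequency times the position's height.
        heights = {b: p * ic for b, p in column.items() if p > 0}
        print(j, ic, heights)
```

A fully conserved position reaches the maximum of 2 bits, while a position with equal base frequencies has height 0.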

Page 26: ChIP seq - Departments

Two questions in motif analysis

• Known motif mapping

Finding occurrences of a motif in nucleotide or amino acid sequences

• De novo motif discovery

Finding motifs that are previously unknown

Page 27: ChIP seq - Departments

Known motif mapping

• Consensus mapping

STEP 1: provide a motif (e.g. CACSTG = CAC[C,G]TG)

STEP 2: specify number of mismatches allowed (e.g. <=1)

STEP 3: scan the sequence

CGCCGGGACCAGATCAACGCCGAGATCCGGCACATGAAGGAGCTCGCCGGG

m=3, no; m=1, yes (m: number of mismatches in the window)
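The three steps of consensus mapping can be sketched as a simple scan. The code below handles IUPAC degenerate codes such as S = [C,G] and scans the forward strand only; scanning the reverse complement as well would be a natural extension. The function names are illustrative and do not correspond to CisGenome's interface.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(word, consensus):
    """Number of positions where the base is not allowed by the consensus."""
    return sum(base not in IUPAC[code] for base, code in zip(word, consensus))


def scan_consensus(sequence, consensus, max_mismatches=1):
    """Return (start, word, mismatches) for every window within the limit."""
    width = len(consensus)
    hits = []
    for start in range(len(sequence) - width + 1):
        word = sequence[start:start + width]
        m = mismatches(word, consensus)
        if m <= max_mismatches:
            hits.append((start, word, m))
    return hits


if __name__ == "__main__":
    seq = "CGCCGGGACCAGATCAACGCCGAGATCCGGCACATGAAGGAGCTCGCCGGG"
    for hit in scan_consensus(seq, "CACSTG", max_mismatches=1):
        print(hit)
```

In this sequence the window CACATG (0-based start 30) differs from CACSTG at one position and is accepted, while CAGATC has three mismatches and is rejected.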

A useful tool: CisGenome (http://www.biostat.jhsph.edu/~hji/cisgenome)

Page 28: ChIP seq - Departments

Known motif mapping

• Motif matrix mapping (CisGenome)

STEP 1: provide a motif and background model

STEP 2: specify a likelihood ratio cutoff (e.g. LR>=500)

STEP 3: scan the sequence

Background θ0:

     A    C    G    T
A   .3   .2   .2   .3
C   .2   .3   .3   .2
G   .2   .3   .3   .2
T   .3   .2   .2   .3

Motif Θ:

      1     2     3     4     5     6     7     8     9
A  0.00  0.00  0.17  0.00  0.17  0.00  0.00  0.00  0.17
C  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.66
G  0.00  1.00  0.83  1.00  0.00  1.00  1.00  0.00  0.17
T  1.00  0.00  0.00  0.00  0.83  0.00  0.00  1.00  0.00

GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGACTGGGAGGTCCTCGGTTCAGAGTCACAGAGCA

LR>500, yes; LR<500, no
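A likelihood ratio scan with the matrices from this slide might look like the sketch below. Two assumptions are made: the 4×4 background table is read as first-order Markov transition probabilities (row = previous base, column = current base), and the first base of each window gets probability 0.25. With different background conventions the LR values change, so the numbers need not reproduce the slide's figure.

```python
MOTIF = {
    "A": [0.00, 0.00, 0.17, 0.00, 0.17, 0.00, 0.00, 0.00, 0.17],
    "C": [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.66],
    "G": [0.00, 1.00, 0.83, 1.00, 0.00, 1.00, 1.00, 0.00, 0.17],
    "T": [1.00, 0.00, 0.00, 0.00, 0.83, 0.00, 0.00, 1.00, 0.00],
}

# Assumed reading: BACKGROUND[prev][cur] = P(cur | prev).
BACKGROUND = {
    "A": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "C": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "G": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "T": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def motif_likelihood(word):
    """P(word | motif matrix), positions treated as independent."""
    p = 1.0
    for j, base in enumerate(word):
        p *= MOTIF[base][j]
    return p


def background_likelihood(word):
    """P(word | background), first base uniform (an assumption)."""
    p = 0.25
    for prev, cur in zip(word, word[1:]):
        p *= BACKGROUND[prev][cur]
    return p


def scan_matrix(sequence, cutoff=500.0):
    """Return (start, word, LR) for windows with LR >= cutoff."""
    width = len(MOTIF["A"])
    hits = []
    for start in range(len(sequence) - width + 1):
        word = sequence[start:start + width]
        lr = motif_likelihood(word) / background_likelihood(word)
        if lr >= cutoff:
            hits.append((start, word, lr))
    return hits


if __name__ == "__main__":
    seq = ("GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA"
           "CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCA")
    for hit in scan_matrix(seq):
        print(hit)
```

Because the matrix contains zeros, any window with a disallowed base gets LR = 0; in practice pseudocounts are often added to avoid such hard exclusions.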

• Another tool for matrix mapping

MAST (http://meme.sdsc.edu/meme/mast‐intro.html)

Page 29: ChIP seq - Departments

De novo motif discovery

• Two major classes of methods:

1. Word enumeration

2. Matrix updating

Page 30: ChIP seq - Departments

Word enumeration

STEP 1: enumerate possible words;

STEP 2: count word occurrences;

STEP 3: compare observed word count with random expectation.

Example: Sinha & Tompa, Nucleic Acids Res. 30: 5549‐5560 (2002)
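The three steps above can be sketched with a simple over-representation score. This is not the method of Sinha & Tompa: it assumes independent bases with frequencies estimated from the input, and it uses a Poisson-style approximation (variance equal to the expected count) for the z-score. Only words that actually occur are scored, since an absent word cannot be over-represented. Sequences are assumed to contain only A, C, G, T.

```python
import math
from collections import Counter


def count_words(sequences, k):
    """STEP 1 and 2: count every k-mer occurring in the sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def score_words(sequences, k):
    """STEP 3: compare observed counts with the random expectation."""
    total = sum(len(s) for s in sequences)
    base_freq = {b: sum(s.count(b) for s in sequences) / total for b in "ACGT"}
    n_windows = sum(max(len(s) - k + 1, 0) for s in sequences)
    scored = []
    for word, observed in count_words(sequences, k).items():
        expected = n_windows * math.prod(base_freq[b] for b in word)
        z = (observed - expected) / math.sqrt(expected)
        scored.append((z, word, observed, expected))
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    toy = ["ACGTACGT", "TTACGTTT"]   # toy input, not real data
    for z, word, observed, expected in score_words(toy, 4)[:3]:
        print(word, observed, expected, z)
```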

Page 31: ChIP seq - Departments

Matrix updating

• CONSENSUS (Stormo & Hartzell, PNAS, 86: 1183‐1187, 1990)

STEP 1: use all k‐mers in the first sequence as seeds;

STEP 2: find matches (often use best matches) of each seed in the second sequence;

STEP 3: update seed matrices, exclude matrices with low information content;

STEP 4: repeat steps 2 and 3 for the remaining sequences.
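A greedy procedure that loosely follows these four steps is sketched below. It is not the original CONSENSUS program: each alignment is kept as a plain list of sites, the best match is chosen by the information content of the extended alignment, and the number of alignments kept per round (`keep`) is a hypothetical parameter.

```python
import math


def information_content(sites):
    """Total information (bits) of an ungapped alignment of equal-length sites."""
    total = 0.0
    n = len(sites)
    for j in range(len(sites[0])):
        total += 2.0
        for base in "ACGT":
            p = sum(s[j] == base for s in sites) / n
            if p > 0:
                total += p * math.log2(p)
    return total


def kmers(seq, k):
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def greedy_consensus(sequences, k, keep=20):
    """Return the most informative alignment found, one site per sequence."""
    # STEP 1: every k-mer of the first sequence seeds one alignment.
    alignments = [[word] for word in kmers(sequences[0], k)]
    # STEP 4: process the remaining sequences one at a time.
    for seq in sequences[1:]:
        extended = []
        for sites in alignments:
            # STEP 2: add the k-mer giving the most informative alignment.
            best = max(kmers(seq, k),
                       key=lambda w: information_content(sites + [w]))
            extended.append(sites + [best])
        # STEP 3: keep only the alignments with the highest information content.
        extended.sort(key=information_content, reverse=True)
        alignments = extended[:keep]
    return alignments[0]


if __name__ == "__main__":
    toy = ["AAAATGGGTGGTCAAAA", "CCTGGGTGGTCCCCCC", "TTTTTTGGGTGGTCTT"]
    print(greedy_consensus(toy, 9))
```

The result depends on the order of the sequences, which is a known property of this kind of greedy scheme.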

Page 32: ChIP seq - Departments

Motif discovery – a mixture model method

Background: θ0, q0

     A    C    G    T
A   .3   .2   .2   .3
C   .2   .3   .3   .2
G   .2   .3   .3   .2
T   .3   .2   .2   .3

Motif: Θ, W, q1

      1     2     3     4     5     6     7     8     9
A  0.00  0.00  0.17  0.00  0.17  0.00  0.00  0.00  0.17
C  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.66
G  0.00  1.00  0.83  1.00  0.00  1.00  1.00  0.00  0.17
T  1.00  0.00  0.00  0.00  0.83  0.00  0.00  1.00  0.00

q = [q0, q1]

S: GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGACTGGGAGGTCCTCGGTTCAGAGTCACAGAGCA

A: 000000000000001000000000000000000000000001000000000000000000000000000000

f(A, Θ, W, q | S, θ0) ∝ f(S, A | Θ, W, q, θ0) π(Θ, W, q)

Inference by iterative estimation/sampling

EM:

Lawrence and Reilly (1990)

Bailey and Elkan (1994), etc.

Gibbs Sampler: Θ, W, q ↔ A

Lawrence et al. (1993)

Liu (1994), Liu et al. (1995), etc.
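One EM iteration for this mixture model can be sketched as follows. The E-step computes, for each window start i, the posterior probability that A_i = 1 given the current Θ and q; the M-step re-estimates Θ from those weights. Several simplifications are assumed: the width W is fixed, the background is uniform (0.25 per base) rather than the θ0 table on the slide, overlapping windows are treated independently, and q1 = 0.01 is a hypothetical value.

```python
MOTIF = {
    "A": [0.00, 0.00, 0.17, 0.00, 0.17, 0.00, 0.00, 0.00, 0.17],
    "C": [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.66],
    "G": [0.00, 1.00, 0.83, 1.00, 0.00, 1.00, 1.00, 0.00, 0.17],
    "T": [1.00, 0.00, 0.00, 0.00, 0.83, 0.00, 0.00, 1.00, 0.00],
}

S = ("GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA"
     "CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCA")


def site_posteriors(seq, motif, q1, bg=0.25):
    """E-step: P(A_i = 1 | S, motif, q) for every window start i."""
    width = len(motif["A"])
    posteriors = []
    for i in range(len(seq) - width + 1):
        p_motif = q1
        for j, base in enumerate(seq[i:i + width]):
            p_motif *= motif[base][j]
        p_background = (1.0 - q1) * bg ** width
        posteriors.append(p_motif / (p_motif + p_background))
    return posteriors


def update_motif(seq, posteriors, width):
    """M-step: posterior-weighted base frequencies at each motif position."""
    totals = [{b: 0.0 for b in "ACGT"} for _ in range(width)]
    for i, z in enumerate(posteriors):
        for j in range(width):
            totals[j][seq[i + j]] += z
    new_motif = {b: [] for b in "ACGT"}
    for column in totals:
        column_sum = sum(column.values())
        for b in "ACGT":
            new_motif[b].append(column[b] / column_sum)
    return new_motif


if __name__ == "__main__":
    post = site_posteriors(S, MOTIF, q1=0.01)
    print([i for i, z in enumerate(post) if z > 0.5])
    print(update_motif(S, post, 9))
```

With these inputs the only windows with non-zero posterior are the two positions marked 1 in A. A Gibbs sampler would instead draw A from these posteriors and then draw Θ and q given A.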

Page 33: ChIP seq - Departments

Cis-regulatory module discovery (Zhou and Wong, PNAS 2004)

• Module structure: consider co-localization of motif sites.

Hierarchical Mixture modeling

[Figure: model parameters θ0 (background) and Θ1, …, ΘK (Motif 1, Motif 2, Motif 3 shown) with mixture weights q0, q1, …, qK; the sequence S is drawn with segments labelled B and M at positions r−1 and r.]

K: # of motifs

Page 34: ChIP seq - Departments

Phylogenetic Footprinting

For example, exons are conserved due to selection pressure. Introns and intergenic regions are less likely to be conserved.

Page 35: ChIP seq - Departments

Phylogenetic footprinting & motif discovery

• Evolutionary model based approach

EMnEM (Moses et al. 2004)

PhyME (Sinha et al. 2004)

PhyloGibbs (Siddharthan et al. 2005)

Tree Sampler (Li and Wong, 2005)