
Chromatin Immunoprecipitation DNA Sequencing (ChIP-seq)

2nd and 3rd Generation DNA Sequencers and Applications

• Roche 454 (2nd)
• Illumina Solexa (2nd)
• ABI SOLiD (2nd)
• Helicos (3rd)

Applications
• De novo sequencing
• Targeted resequencing
• Digital Gene Expression (DGE)
• RNA-seq
• ChIP-seq

Sequencing Platforms

Why ChIP-seq?

• Protein-DNA interactions
• Chromatin states
• Transcriptional regulation

ChIP experiment in a Nutshell

•Protein cross-linked to DNA in vivo by treating cells with formaldehyde

•Shear chromatin (sonication)

•IP with specific antibody

•Reverse cross-links, purify DNA

•PCR amplification*

•Identify sequences

•Genome-wide association map

* Unless using a single-molecule sequencer

History: From ChIP-chip to ChIP-seq

ChIP-chip (c.2000)

• Resolution (30-100bp)

• Coverage limited by sequences on the array

• Cross-hybridization between probes and non-specific targets creates background noise

ChIP-seq experiment (2007-present)

Sample Prep: Solexa vs. Helicos

ChIP-seq Material: sample preps with in-house protocols

Helicos sample prep

• Normal QC and ChIP steps
• Input material 3-9 ng
• RNaseA/ProteinaseK treatment (2-3h)
• Purification (phenol/precipitation) (1.5h)
• Tailing (1.5h)
• Termination (1.5h)

Amount of library sequenced approx. 1/3

Unique Tags after analysis approx. >12M (based on our limited ERa ChIP-seq libraries)

**Slide borrowed from Thomas Westerling

Solexa sample prep

• Normal QC and ChIP steps
• Input material typically >30 ng
• End-repair (1h)
• Purification (phenol/precipitation) (1.5h)
• A-overhang (1h)
• Purification (phenol/precipitation) (1.5h)
• Adapter oligo ligation (30min)
• Purification (phenol/precipitation) (1.5h)
• Size-selection (30min by E-gel)
• Precipitation (1h)
• Amplification PCR (2h) (12-18 cycles)
• Size-selection (30min by E-gel)
• Precipitation (1h)
• Diagnostic gel (30min)
• QC by direct qPCR (4 hours)

Amount of library sequenced approx. 1/10

Unique Tags after analysis >3M (based on our limited ERa ChIP-seq libraries)

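Comparison: ChIP-chip vs. ChIP-seq (Solexa) vs. ChIP-seq (Helicos)

Max resolution
• ChIP-chip: array-specific, 30-100 bp
• ChIP-seq (Solexa): 1 nt
• ChIP-seq (Helicos): 1 nt

Coverage
• ChIP-chip: limited by sequences on array and non-repetitive
• ChIP-seq (Solexa): limited by alignability of reads to genome; increases with read length; many repetitive regions can be covered
• ChIP-seq (Helicos): limited by alignability of reads to genome; increases with read length; many repetitive regions can be covered

Cost
• ChIP-chip: $400-800 per array (multiple arrays needed for large genomes)
• ChIP-seq (Solexa): $1,000-2,000 per lane
• ChIP-seq (Helicos): $500 at MBCF

Source of platform noise
• ChIP-chip: cross-hybridization between probes and non-specific targets
• ChIP-seq (Solexa): some GC bias due to PCR
• ChIP-seq (Helicos): single molecule (no PCR)

Required amount of ChIP DNA
• ChIP-chip: micrograms
• ChIP-seq (Solexa): 10-50 ng
• ChIP-seq (Helicos): 9-12 ng standard, have done 1.5 ng at MBCF

Dynamic range
• ChIP-chip: lower detection limit; saturation at high signal
• ChIP-seq (Solexa): no limit
• ChIP-seq (Helicos): no limit

Amplification (PCR)
• ChIP-chip: more required
• ChIP-seq (Solexa): less (12-17 cycles)
• ChIP-seq (Helicos): none

Multiplexing
• ChIP-chip: not an option
• ChIP-seq (Solexa): yes
• ChIP-seq (Helicos): sure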

Helicos vs Solexa vs ChIP2

[Figure: Venn diagram of peak overlaps between 1. Solexa, 2. Helicos and 3. ChIP2. Region counts as extracted: 4700; 37445293; 433; 2541; 2900; 1661]

Solexa data (red):
• Unique tags: 4M
• Peaks called: 10 500 (A)
• Negative peaks: 20 000 (B)

ChIP2 (C) data (green):
• Array technology, no tags
• Peaks called: 12 500
• FDR: 20 (D)

Helicos data (blue):
• Unique tags: 13M
• Peaks called: 12 500
• Negative peaks: 1000 (E)

A) More inclusive (10%) ELAND mapping used (compare to Bowtie in library table).

B) MACS performs a sample swap between ChIP and Input (chromatin) samples and calculates a local λ-value to determine the level of background peaks called in control data. This gives an FDR for each positive peak. Due to the nature of deep sequencing combined with PCR, this parameter is extremely high in some samples and not entirely trustworthy.

C) ChIP2 data published in Carroll et al. Nat Genet. 2006 Nov;38(11):1289-97.

D) FDR values of ChIP2 are calculated differently from FDRs by MACS and are not directly comparable.

E) Negative peaks, and thus local FDR values, are at first glance more reliable in Helicos sequencing, at least in part due to the lack of amplification, which removes scientist-introduced artifacts and reduced complexity of the sequenced library.
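The sample-swap idea in note B can be illustrated with a minimal sketch. It is not MACS code: the function name `empirical_fdr` and the toy scores are hypothetical, and the only assumption is that the FDR at a given peak score is estimated as the number of negative (swapped) peaks scoring at least that high, divided by the number of positive peaks scoring at least that high.

```python
from bisect import bisect_left


def empirical_fdr(chip_scores, swapped_scores):
    """Estimate an FDR for each ChIP peak score.

    FDR(s) = (# peaks called with samples swapped, score >= s)
             / (# peaks called on ChIP vs. Input, score >= s),
    capped at 1.0. Hypothetical helper, not the MACS implementation.
    """
    chip_sorted = sorted(chip_scores)
    swap_sorted = sorted(swapped_scores)
    fdrs = []
    for s in chip_scores:
        # Number of positive and negative peaks scoring at least s.
        n_pos = len(chip_sorted) - bisect_left(chip_sorted, s)
        n_neg = len(swap_sorted) - bisect_left(swap_sorted, s)
        fdrs.append(min(1.0, n_neg / n_pos))
    return fdrs


# Toy scores (made up for illustration only).
chip = [50.0, 40.0, 30.0, 20.0, 10.0]
negative = [25.0, 12.0, 5.0]
fdr = empirical_fdr(chip, negative)
print(fdr)
```

With this definition, a library in which negative peaks outnumber positive peaks (as in the Solexa numbers above) saturates at an FDR of 1 for the weakest peaks, which is one way to see why the value may not be informative there.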

ChIP-seq Analysis

ChIP-seq peaks

• Only the 5’ ends of fragments are sequenced
• Tags from both + and - strands are aligned to the reference genome

+/- tag mapping

Types of Analysis

1. Binding site identification and discovery of binding sequence motifs (Non-histone ChIP)

2. Epigenomic gene regulation and chromatin structure (Histone ChIP)

Binding Site Detection: But where does the meat go?

Control: Input DNA

Measuring enrichment

Rozowsky, J. et al. PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nature Biotech. 27, 66-75 (2009)

Input DNA: portion of DNA sample removed before IP

Why we need to sequence Input DNA

• Input DNA does not demonstrate a “flat” or random (Poisson) distribution

• Open chromatin regions tend to be fragmented more easily during shearing

• Amplification bias

• Mapping artifacts: increased coverage of more “mappable” regions (which also tend to be promoter regions) and of repetitive regions, due to inaccuracies in the number of copies in the assembled genome
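Because the Input is not flat, enrichment is better measured against the local Input count than against a genome-wide average. The sketch below shows one simple way to do that; it is an illustration under stated assumptions, not the PeakSeq or MACS method. The names `poisson_sf` and `enrichment`, the pseudocount, and the example counts are all hypothetical.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), via 1 - P(X <= k - 1).

    Plain summation; fine for small lam, not meant for very large values.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def enrichment(chip_count, input_count, chip_total, input_total, pseudocount=1.0):
    """Fold enrichment and Poisson p-value for one region.

    The Input count is scaled to the ChIP library size and used as the
    local expected background (assumed model, with a pseudocount so that
    regions with zero Input tags do not give an expected value of zero).
    """
    scale = chip_total / input_total
    expected = (input_count + pseudocount) * scale
    fold = chip_count / expected
    p_value = poisson_sf(chip_count, expected)
    return fold, p_value


# Made-up region: 60 ChIP tags vs. 9 Input tags, equal library sizes.
fold, p_value = enrichment(60, 9, 13_000_000, 13_000_000)
print(fold, p_value)
```

A region in open chromatin with many Input tags gets a high expected value here, so the same ChIP count scores as less enriched than it would in a region with few Input tags.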

Depth of Sequencing: Are we there yet?

ERa E2 Helicos MACS peaks 12500 (tag30 mfold30) – sequence depth determination by subsampling

Fold-change bin: number of total peaks in each bin
• 0-20: 7687
• 20-40: 2841
• 40-60: 935
• 60-80: 429
• 80-100: 217
• 100-120: 140
• 120-140: 85
• 140-160: 49
• 160-180: 23
• 180-200: 7
• 200-220: 4

[Figure: % of peaks detected (of total peaks per bin) versus % of tags sampled]
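Sequence depth determination by subsampling can be sketched as follows. This is a toy illustration, not the analysis behind the slide: the window-count "peak caller" `call_windows`, the function `saturation_curve`, and all parameter values are hypothetical stand-ins for a real peak caller such as MACS.

```python
import random
from collections import Counter


def call_windows(tags, window=200, min_tags=5):
    """Toy peak caller: windows holding at least min_tags tag positions."""
    counts = Counter(pos // window for pos in tags)
    return {w for w, c in counts.items() if c >= min_tags}


def saturation_curve(tags, fractions, window=200, min_tags=5, seed=0):
    """For each sampled fraction of tags, report the share of
    full-data peaks that is still detected."""
    rng = random.Random(seed)
    full = call_windows(tags, window, min_tags)
    curve = []
    for f in fractions:
        n = round(len(tags) * f)
        subsample = rng.sample(tags, n)  # sampling without replacement
        found = call_windows(subsample, window, min_tags)
        recovered = len(found & full) / len(full) if full else 0.0
        curve.append((f, recovered))
    return curve


# Made-up data: one strong peak (100 tags) and one weak peak (6 tags).
tags = [10] * 100 + [1010] * 6
curve = saturation_curve(tags, [0.1, 0.5, 1.0])
for fraction, recovered in curve:
    print(f"{fraction:.0%} of tags -> {recovered:.0%} of peaks")
```

Strong peaks survive heavy subsampling while weak ones drop out first, which is why the recovered fraction is worth reporting separately per fold-change bin. If the curve for a bin flattens before 100% of tags are sampled, further sequencing is unlikely to add many peaks in that bin.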

Statistical Significance

[Figure panels: Helicos Input, Helicos ChIP, Solexa ChIP, Solexa Input]

MACS shifted tag-count graph, i.e. peak shapes
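Since only the 5’ ends of fragments are sequenced, + strand tags pile up upstream of a binding site and - strand tags downstream of it. A shifted tag-count graph moves each tag toward the fragment center before counting. The sketch below shows the general idea only and is not MACS code; the function name `shifted_tag_counts`, the fragment-size estimate `d`, the bin size, and the toy coordinates are assumptions.

```python
from collections import Counter


def shifted_tag_counts(plus_starts, minus_starts, d, bin_size=10):
    """Count tags per bin after shifting each tag d/2 toward its 3' end.

    plus_starts:  5' end coordinates of + strand tags
    minus_starts: 5' end coordinates of - strand tags
    d:            assumed average fragment length
    """
    shift = d // 2
    counts = Counter()
    for pos in plus_starts:
        counts[(pos + shift) // bin_size] += 1  # + strand: move right
    for pos in minus_starts:
        counts[(pos - shift) // bin_size] += 1  # - strand: move left
    return counts


# Made-up tags around a binding site near position 1000, with d = 200.
plus = [895, 900, 905]
minus = [1095, 1100, 1105]
counts = shifted_tag_counts(plus, minus, d=200)
for b in sorted(counts):
    print(b * 10, counts[b])
```

Before shifting, the two strands form separate piles about `d` apart; after shifting, they merge into a single pile over the binding site.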