Chromatin Immunoprecipitation DNA Sequencing (ChIP-seq)
2nd and 3rd Generation DNA Sequencers and Applications
• Roche 454 (2nd)
• Illumina Solexa (2nd)
• ABI SOLiD (2nd)
• Helicos (3rd)
Applications
• De novo sequencing
• Targeted resequencing
• Digital Gene Expression (DGE)
• RNA-seq
• ChIP-seq
Sequencing Platforms
Why ChIP-seq?
• Protein-DNA interactions
• Chromatin states
• Transcriptional regulation
ChIP Experiment in a Nutshell
• Protein is cross-linked to DNA in vivo by treating cells with formaldehyde
• Shear chromatin (sonication)
• IP with a specific antibody
• Reverse cross-links, purify DNA
• PCR amplification*
• Identify sequences
• Genome-wide association map
* Unless using a single-molecule sequencer
History: From ChIP-chip to ChIP-seq
ChIP-chip (c.2000)
• Resolution (30-100bp)
• Coverage limited by sequences on the array
• Cross-hybridization between probes and non-specific targets creates background noise
ChIP-seq experiment (2007-present)
Sample Prep: Solexa vs. Helicos
ChIP-seq material: sample preps with in-house protocols
Helicos sample prep
• Normal QC and ChIP steps
• Input material: 3-9 ng
• RNase A/Proteinase K treatment (2-3 h)
• Purification (phenol/precipitation) (1.5 h)
• Tailing (1.5 h)
• Termination (1.5 h)
Amount of library sequenced: approx. 1/3
Unique tags after analysis: approx. >12M (based on our limited ERα ChIP-seq libraries)
**Slide borrowed from Thomas Westerling
Solexa sample prep
• Normal QC and ChIP steps
• Input material: typically >30 ng
• End-repair (1 h)
• Purification (phenol/precipitation) (1.5 h)
• A-overhang (1 h)
• Purification (phenol/precipitation) (1.5 h)
• Adapter oligo ligation (30 min)
• Purification (phenol/precipitation) (1.5 h)
• Size selection (30 min by E-gel)
• Precipitation (1 h)
• Amplification PCR (2 h) (12-18 cycles)
• Size selection (30 min by E-gel)
• Precipitation (1 h)
• Diagnostic gel (30 min)
• QC by direct qPCR (4 h)
Amount of library sequenced: approx. 1/10
Unique tags after analysis: >3M (based on our limited ERα ChIP-seq libraries)
| | ChIP-chip | ChIP-seq (Solexa) | ChIP-seq (Helicos) |
|---|---|---|---|
| Max resolution | Array-specific, 30-100 bp | 1 nt | 1 nt |
| Coverage | Limited by sequences on the array; non-repetitive regions only | Limited by alignability of reads to the genome; increases with read length; many repetitive regions can be covered | Limited by alignability of reads to the genome; increases with read length; many repetitive regions can be covered |
| Cost | $400-800 per array (multiple arrays needed for large genomes) | $1,000-2,000 per lane | $500 at MBCF |
| Source of platform noise | Cross-hybridization between probes and non-specific targets | Some GC bias due to PCR | Single molecule (no PCR) |
| Required amount of ChIP DNA | Micrograms | 10-50 ng | 9-12 ng standard; 1.5 ng has been done at MBCF |
| Dynamic range | Lower detection limit; saturation at high signal | No limit | No limit |
| Amplification (PCR) | More required | Less (12-17 cycles) | None |
| Multiplexing | Not an option | Yes | Yes |
Helicos vs Solexa vs ChIP2
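[Figure: three-way Venn diagram of peak overlap between 1. Solexa, 2. Helicos, and 3. ChIP2. Region counts as extracted (assignment to regions not recoverable): 4700, 3744, 5293, 433, 2541, 2900, 1661]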
Solexa data (red):
• Unique tags: 4M
• Peaks called: 10 500 (A)
• Negative peaks: 20 000 (B)
ChIP2 (C) data (green):
• Array technology, no tags
• Peaks called: 12 500
• FDR: 20 (D)
Helicos data (blue):
• Unique tags: 13M
• Peaks called: 12 500
• Negative peaks: 1000 (E)
A) More inclusive (10%) ELAND mapping used (compare with Bowtie in the library table).
B) MACS performs a sample swap between the ChIP and Input (chromatin) samples and calculates a local λ value to determine the level of background peaks called in the control data. This gives an FDR for each positive peak. Because of the nature of deep sequencing combined with PCR, this parameter is extremely high in some samples and not entirely trustworthy.
C) ChIP2 data published in Carroll et al., Nat Genet. 2006 Nov;38(11):1289-97.
D) FDR values for ChIP2 are calculated differently from the MACS FDRs and are not directly comparable.
E) Negative peaks, and thus local FDR values, are at first glance more reliable in Helicos sequencing, at least in part because the lack of amplification removes scientist-introduced artifacts and the reduced library complexity that amplification causes.
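The sample-swap idea in note B can be illustrated with a minimal Python sketch. It assumes the simplest global form of the estimate (negative peaks divided by positive peaks) and uses the peak counts quoted above. The function name `empirical_fdr` is hypothetical, and MACS itself reports the estimate per peak at that peak's significance cutoff, not as a single global number.

```python
def empirical_fdr(negative_peaks: int, positive_peaks: int) -> float:
    """Global empirical FDR: peaks called with ChIP and Input swapped,
    divided by peaks called in the normal orientation (capped at 1.0)."""
    if positive_peaks <= 0:
        raise ValueError("positive_peaks must be > 0")
    return min(1.0, negative_peaks / positive_peaks)


# Peak counts are taken from the slide; the global ratio is an assumption.
counts = {
    "Solexa": {"positive": 10_500, "negative": 20_000},
    "Helicos": {"positive": 12_500, "negative": 1_000},
}

fdr_by_platform = {
    name: empirical_fdr(c["negative"], c["positive"])
    for name, c in counts.items()
}

for name, fdr in fdr_by_platform.items():
    print(f"{name}: empirical FDR ~ {fdr:.0%}")
```

With these counts the Solexa estimate saturates at 100% while the Helicos estimate is 8%, which is the contrast notes B and E describe.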
ChIP-seq Analysis
ChIP-seq peaks
• Only the 5' ends of fragments are sequenced
• Tags from both the + and - strands are aligned to the reference genome
+/- tag mapping
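Because only the 5' end of each fragment is sequenced, + strand tags pile up on one side of the binding site and - strand tags on the other. A minimal Python sketch of the usual correction: estimate the offset d between the two clusters, then shift every tag d/2 toward its 3' end. The function names and toy coordinates are hypothetical, and using medians over a single site is a simplification for brevity, not a description of how MACS builds its shift model.

```python
from statistics import median


def estimate_fragment_offset(plus_starts, minus_starts):
    """Distance d between the centers of the + and - strand tag clusters."""
    return median(minus_starts) - median(plus_starts)


def shift_tags(plus_starts, minus_starts, d):
    """Move every tag d/2 toward its 3' end, i.e. toward the fragment center."""
    half = d // 2
    shifted = [p + half for p in plus_starts] + [m - half for m in minus_starts]
    return sorted(shifted)


# Toy data: one binding site near position 1000, fragments about 200 bp long.
plus_starts = [890, 895, 900, 905, 910]          # 5' ends of + strand tags
minus_starts = [1090, 1095, 1100, 1105, 1110]    # 5' ends of - strand tags

d = estimate_fragment_offset(plus_starts, minus_starts)
shifted = shift_tags(plus_starts, minus_starts, d)
summit = median(shifted)

print(f"estimated d = {d} bp, shifted-tag summit = {summit}")
```

After shifting, the two strand-specific clusters collapse onto a single pile centered on the presumed binding site.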
Types of Analysis
1. Binding site identification and discovery of binding sequence motifs (Non-histone ChIP)
2. Epigenomic gene regulation and chromatin structure (Histone ChIP)
Binding Site Detection
But where does the meat go?
Control: Input DNA
Measuring enrichment
Rozowsky, J. et al. PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nature Biotech. 27, 66-75 (2009)
Input DNA: portion of DNA sample removed before IP
Why we need to sequence Input DNA
• Input DNA does not demonstrate a "flat" or random (Poisson) distribution
• Open chromatin regions tend to be fragmented more easily during shearing
• Amplification bias
• Mapping artifacts: increased coverage of more "mappable" regions (which also tend to be promoter regions) and of repetitive regions, due to inaccuracies in the number of copies in the assembled genome
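Since Input is not flat, scoring a ChIP window against a single genome-wide Poisson rate understates the local background. The Python sketch below scores a window against the larger of a genome-wide rate and rates estimated from Input in wider surrounding windows. All counts, window sizes, the scaling factor, and the function names are hypothetical; taking the maximum of several rates is in the spirit of the local λ that MACS computes, but this is a simplified illustration and not its implementation.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """Upper tail P(X >= k) for X ~ Poisson(lam), summed term by term."""
    if lam <= 0:
        raise ValueError("lam must be > 0")
    if k <= 0:
        return 1.0
    total, i = 0.0, k
    while True:
        # Term computed in log space to avoid overflow in lam**i / i!
        term = math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        total += term
        if i > lam and term <= total * 1e-16:
            return min(1.0, total)
        i += 1


def local_lambda(input_counts_by_span, window_bp, genome_lambda, scale):
    """Expected ChIP tags in a window_bp window: the largest of the
    genome-wide rate and the scaled Input rates from wider spans."""
    rates = [genome_lambda]
    for span_bp, count in input_counts_by_span.items():
        rates.append(scale * count * window_bp / span_bp)
    return max(rates)


# Hypothetical numbers for one 200 bp candidate window.
chip_tags = 8
genome_lambda = 0.74                      # assumed flat genome-wide expectation
input_counts = {1_000: 10, 5_000: 30, 10_000: 40}   # Input tags per span (bp)
scale = 1.0                               # ChIP/Input library-size ratio

lam = local_lambda(input_counts, 200, genome_lambda, scale)
p_flat = poisson_sf(chip_tags, genome_lambda)
p_local = poisson_sf(chip_tags, lam)

print(f"local lambda = {lam:.2f}")
print(f"p (flat background)  = {p_flat:.2e}")
print(f"p (local background) = {p_local:.2e}")
```

In this toy case the window sits in a region where Input itself is elevated, so the same eight ChIP tags look far less significant against the local rate than against the flat one.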
Depth of Sequencing
Are we there yet?
ERα E2 Helicos MACS peaks: 12500 (tag30 mfold30); sequence depth determination by subsampling
| Fold-change bin | Number of total peaks in bin |
|---|---|
| 0-20 | 7687 |
| 20-40 | 2841 |
| 40-60 | 935 |
| 60-80 | 429 |
| 80-100 | 217 |
| 100-120 | 140 |
| 120-140 | 85 |
| 140-160 | 49 |
| 160-180 | 23 |
| 180-200 | 7 |
| 200-220 | 4 |
[Figure: % of peaks detected (of total peaks per bin) plotted against % of tags sampled]
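The subsampling approach can be sketched in Python as follows. The sketch assumes a deliberately crude stand-in for peak calling: a peak counts as detected if at least `min_tags` of its full-depth tags survive random subsampling. The function name, the threshold, and the toy peak list are all hypothetical; a real analysis would rerun the peak caller on each subsample.

```python
import random


def detected_fraction_by_bin(peaks, fraction, min_tags, bin_width, rng):
    """peaks: list of (tag_count, fold_change) at full depth.
    Returns {bin_start: share of that bin's peaks still detected}."""
    totals, hits = {}, {}
    for tag_count, fold in peaks:
        bin_start = int(fold // bin_width) * bin_width
        # Keep each tag independently with probability `fraction`.
        kept = sum(1 for _ in range(tag_count) if rng.random() < fraction)
        totals[bin_start] = totals.get(bin_start, 0) + 1
        if kept >= min_tags:
            hits[bin_start] = hits.get(bin_start, 0) + 1
    return {b: hits.get(b, 0) / totals[b] for b in sorted(totals)}


# Toy peak set: many weak peaks (fold 10) and a few strong ones (fold 150).
rng = random.Random(0)
peaks = [(12, 10.0)] * 200 + [(180, 150.0)] * 20

curve = {
    f: detected_fraction_by_bin(peaks, f, min_tags=10, bin_width=20, rng=rng)
    for f in (0.25, 0.5, 0.75, 1.0)
}

for f, by_bin in curve.items():
    summary = ", ".join(f"bin {b}+: {share:.0%}" for b, share in by_bin.items())
    print(f"{f:.0%} of tags sampled -> {summary}")
```

Strong peaks saturate at a small fraction of the tags, while weak peaks keep appearing as depth increases, which is why the low fold-change bins determine whether sequencing is deep enough.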
Statistical Significance
[Figure: MACS shifted tag-count graphs (i.e., peak shapes) for Helicos Input, Helicos ChIP, Solexa ChIP, and Solexa Input]