Regulomics I: Methods to read out regulatory functions
Identifying regulatory functions in genomes
Chr5: 133,876,119 – 134,876,119
Genes
Transcription
• Regulatory elements are not easily detected by sequence analysis
• Examine biochemical correlates of RE activity in cells/tissues:
  • Chromatin immunoprecipitation (ChIP-seq)
  • DNase-seq and FAIRE
  • Methylated DNA immunoprecipitation (MeDIP)
Noonan and McCallion, Ann Rev Genomics Hum Genet 11:1 (2010)
Identifying regulatory functions in genomes
1. TF binding
Biochemical indicators of regulatory function
2. Histone modification
• H3K27ac
• H3K4me3
3. Chromatin modifiers & coactivators
p300, MLL
4. DNA looping factors
cohesin
Regulatory functions are tissue/cell type/time point-specific
From Visel et al. (2009) Nature 461:199
Identifying regulatory functions in genomes
Chr5: 133,876,119 – 134,876,119
Genes
Transcription
TF binding
Histone mods
Methods
ChIP-seq Chromatin accessibility
TFs Histone mods DNase FAIRE
From Furey (2012) Nat Rev Genet 13:840
ChIP-seq
ChIP
Input
Peak call Signal
Align reads to reference
Use peaks of mapped reads to identify binding events
PCR
ChIP-seq is an enrichment method
Requires a statistical framework for determining the significance of enrichment
ChIP-seq ‘peaks’ are regions of enriched read density relative to an input control
Input = sonicated chromatin collected prior to immunoprecipitation
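Enrichment relative to input can be illustrated with a depth-normalized ratio for a single genomic window. This is a minimal sketch, not the method of any particular peak caller: the function name, the reads-per-million scaling and the pseudocount are assumptions chosen for illustration.

```python
def fold_enrichment(chip_count, input_count, chip_total, input_total,
                    pseudocount=0.1):
    """Depth-normalized ChIP/input ratio for one genomic window.

    chip_count, input_count: reads falling in the window in each library.
    chip_total, input_total: total mapped reads in each library.
    pseudocount: small value (an assumption) that avoids division by zero
    when the input has no reads in the window.
    """
    # Convert raw counts to reads per million so that libraries sequenced
    # to different depths can be compared directly.
    chip_rpm = chip_count / chip_total * 1e6
    input_rpm = input_count / input_total * 1e6
    return (chip_rpm + pseudocount) / (input_rpm + pseudocount)


# Hypothetical window: 50 ChIP reads out of 10 M, 10 input reads out of 20 M.
example = fold_enrichment(50, 10, 10_000_000, 20_000_000)
```

A ratio alone does not say whether the enrichment is significant, which is why a statistical framework is still needed.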
ChIP
Input
Peak call Enrichment relative to control
Calling peaks in ChIP-seq data
Wilbanks and Facciotti PLoS ONE 5:e11471 (2010)
There are many ChIP-seq peak callers available
From Park (2009) Nat Rev Genet 10:669
Generating ChIP-seq peak profiles
Artifacts:
• Repeats
• PCR duplicates
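One common way to limit the PCR duplicate artifact is to keep a single read per start position and strand. The sketch below assumes reads are already aligned and represented as hypothetical `(chrom, position, strand)` tuples; it is an illustration, not the behavior of a specific tool.

```python
def remove_duplicates(reads):
    """Keep one read per (chromosome, 5' position, strand).

    reads: iterable of (chrom, pos, strand) tuples (assumed representation).
    Returns the unique reads in their original order.
    """
    seen = set()
    unique = []
    for chrom, pos, strand in reads:
        key = (chrom, pos, strand)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Hypothetical reads: the first two are identical and count as duplicates.
reads = [
    ("chr5", 1000, "+"),
    ("chr5", 1000, "+"),
    ("chr5", 1000, "-"),
    ("chr5", 1040, "+"),
]
unique_reads = remove_duplicates(reads)
```

Collapsing to one read per position is conservative: at very high sequencing depth, some identical reads are genuine independent fragments.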
Assessing statistical significance
# of reads at a site (S)
Empirical FDR: Call peaks in input (using ChIP as control)
FDR = ratio of # of peaks of given enrichment value called in input vs ChIP
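The empirical FDR can be sketched as a ratio of peak counts from the swapped and the regular peak call. The function below assumes each peak carries an enrichment score and that "of given enrichment value" means "at or above a threshold"; both the representation and the names are illustrative assumptions.

```python
def empirical_fdr(chip_peak_scores, input_peak_scores, threshold):
    """Estimate FDR at a given enrichment threshold.

    chip_peak_scores: scores of peaks called in ChIP with input as control.
    input_peak_scores: scores of peaks called in input with ChIP as control.
    Returns None when no ChIP peaks pass the threshold.
    """
    n_chip = sum(1 for s in chip_peak_scores if s >= threshold)
    n_input = sum(1 for s in input_peak_scores if s >= threshold)
    if n_chip == 0:
        return None
    # Cap at 1.0: more peaks in the swapped call than in the real one
    # simply means nothing at this threshold is trustworthy.
    return min(1.0, n_input / n_chip)


# Hypothetical scores for illustration only.
chip_scores = [5, 8, 12, 20, 30]
input_scores = [5, 6]
fdr_at_5 = empirical_fdr(chip_scores, input_scores, 5)
fdr_at_10 = empirical_fdr(chip_scores, input_scores, 10)
```

Raising the threshold lowers the estimated FDR at the cost of calling fewer peaks.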
Assume read distribution follows a Poisson distribution
Many sites in input data will have some reads by chance
Some sites will have many reads
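Under the Poisson assumption, the chance of seeing S or more reads at a site can be computed from the expected number of reads per window. This is a minimal sketch: the uniform background rate, the window size and the function names are assumptions, and real peak callers typically use more careful local background estimates.

```python
import math


def expected_reads_per_window(total_reads, window_size, genome_size):
    """Expected reads per window if reads were spread uniformly (assumption)."""
    return total_reads * window_size / genome_size


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


# Hypothetical numbers: 10 M reads, 200 bp windows, 3 Gb genome.
lam = expected_reads_per_window(10_000_000, 200, 3_000_000_000)
p_value = poisson_sf(10, lam)
```

Because the result is computed as one minus a sum, very small p-values lose precision; this is acceptable for illustration but not for ranking strong peaks.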
From Pepke et al (2009) Nat Meth 6:S22
Assessing statistical significance
# of reads at a site (S)
From Park (2009) Nat Rev Genet 10:669
Sequencing depth matters:
ChIP-seq signal profiles vary depending on factor
Transcription factors
Pol II
Histone mods
From Park (2009) Nat Rev Genet 10:669
Quantitative analysis of ChIP-seq signal profiles
ChIP-seq signal
Signal at 20,000 bound sites
HeLa K562
Sites strongly marked in HeLa
Sites strongly marked in K562
Clustering
Sites strongly marked in both
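The three groups of sites can be approximated without a full clustering algorithm by thresholding the log ratio of signal between the two cell types. This is a simplified stand-in for clustering, not the method used to produce the figure; the threshold, pseudocount and site names are assumptions.

```python
import math


def classify_sites(signals, min_log2_ratio=1.0, pseudocount=1.0):
    """Partition bound sites by relative signal in HeLa vs K562.

    signals: dict mapping site name -> (HeLa signal, K562 signal).
    Sites whose log2 ratio is within +/- min_log2_ratio go to "both".
    """
    groups = {"HeLa": [], "K562": [], "both": []}
    for site, (hela, k562) in signals.items():
        ratio = math.log2((hela + pseudocount) / (k562 + pseudocount))
        if ratio >= min_log2_ratio:
            groups["HeLa"].append(site)
        elif ratio <= -min_log2_ratio:
            groups["K562"].append(site)
        else:
            groups["both"].append(site)
    return groups


# Hypothetical signal values for three sites.
signals = {
    "site_a": (40, 4),
    "site_b": (3, 30),
    "site_c": (25, 20),
}
groups = classify_sites(signals)
```

With many sites and more than two cell types, an unsupervised method such as k-means or hierarchical clustering on the signal matrix would be the more usual choice.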
ChIP-seq analysis workflow
From Park (2009) Nat Rev Genet 10:669
Interpreting ChIP-seq datasets
Requires some prior knowledge
• TF function
• Histone modification
• Potential target genes
Exploit existing annotation
• Promoter locations
• Known binding sites
• Known histone modification maps
Example from PS1: CTCF and RAD21 (cohesin)
CTCF and cohesin co-occupy many sites
Promoters
Insulators
Enhancers
From Kagey et al (2010) Nature 467:430
CTCF: marks insulators and promoters
RAD21 (cohesin): marks insulators, promoters and enhancers
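Co-occupancy of two factors can be estimated by counting how many peaks from one dataset overlap a peak from the other. The sketch below assumes peaks are half-open `(chrom, start, end)` intervals and uses made-up coordinates; it is a simple quadratic scan per chromosome, adequate for illustration but slower than sorted or indexed approaches.

```python
def count_overlapping(peaks_a, peaks_b):
    """Count peaks in peaks_a that overlap at least one peak in peaks_b.

    Peaks are (chrom, start, end) with half-open coordinates, so two
    peaks that merely touch (end == start) do not overlap.
    """
    by_chrom = {}
    for chrom, start, end in peaks_b:
        by_chrom.setdefault(chrom, []).append((start, end))

    n = 0
    for chrom, start, end in peaks_a:
        others = by_chrom.get(chrom, [])
        if any(start < b_end and b_start < end for b_start, b_end in others):
            n += 1
    return n


# Hypothetical peak coordinates, not real CTCF or RAD21 sites.
ctcf_peaks = [("chr5", 100, 300), ("chr5", 1000, 1200), ("chr7", 50, 150)]
rad21_peaks = [("chr5", 250, 400), ("chr5", 5000, 5200), ("chr7", 150, 300)]
shared = count_overlapping(ctcf_peaks, rad21_peaks)
```

Comparing the shared count against the totals for each factor shows which sites are bound by cohesin without CTCF, the candidates for enhancers in the slide above.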
Promoter Enhancers?
Limb Brain
Discovering regulatory functions specific to a biological state
Function?
Assign enhancers to genes based on proximity (not ideal)
GREAT: bejerano.stanford.edu/great/
Gene ontology annotation assigned to regulatory sequences
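Proximity-based assignment can be sketched as a nearest-TSS lookup on one chromosome. This is a simplification and not GREAT's actual association rule; the gene names, coordinates and function name are hypothetical.

```python
import bisect


def nearest_gene(enhancer_mid, tss_list):
    """Return (gene, distance) for the TSS closest to an enhancer midpoint.

    tss_list: non-empty list of (position, gene) tuples on the same
    chromosome as the enhancer, sorted by position.
    """
    positions = [pos for pos, _ in tss_list]
    i = bisect.bisect_left(positions, enhancer_mid)
    # Only the TSS just before and just after the enhancer can be nearest.
    candidates = tss_list[max(0, i - 1): i + 1]
    pos, gene = min(candidates, key=lambda t: abs(t[0] - enhancer_mid))
    return gene, abs(pos - enhancer_mid)


# Hypothetical TSS positions on one chromosome.
tss = [(1000, "geneA"), (5000, "geneB"), (9000, "geneC")]
assignment = nearest_gene(6500, tss)
```

The "not ideal" caveat applies directly: enhancers can skip the nearest gene and act over long distances, which is what the chromosome conformation methods later in the lecture address.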
TF motif elicitation from ChIP-seq data
CTCF
~20,000 binding sites identified by ChIP:
From Furey (2012) Nat Rev Genet 13:840
MEME suite: http://meme.nbcr.net/meme/
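The output of motif elicitation is usually summarized as a position frequency matrix. The sketch below builds one from sequences that are assumed to be already aligned and of equal length; it does not perform the de novo search that MEME carries out on unaligned peak sequences, and the toy sequences are not real CTCF sites.

```python
from collections import Counter


def position_frequency_matrix(sites):
    """Count bases at each position of pre-aligned, equal-length sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    return [Counter(s[i] for s in sites) for i in range(length)]


def consensus(pfm):
    """Most frequent base per position (ties resolved in A, C, G, T order)."""
    return "".join(max("ACGT", key=lambda base: col[base]) for col in pfm)


# Toy aligned sequences for illustration only.
toy_sites = ["CCAGT", "CCACT", "CCAGA", "CTAGT"]
pfm = position_frequency_matrix(toy_sites)
motif = consensus(pfm)
```

Dividing each column by the number of sites gives per-position base frequencies, the form usually drawn as a sequence logo.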
Enhancer-associated histone modification
Single TF binding events may not indicate regulatory function
• Many TFs are present at high concentrations in the nucleus
• TF motifs are abundant in the genome
• Single TF binding events may be incidental
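The abundance of TF motifs can be made concrete with a back-of-envelope count of exact matches expected by chance. The sketch assumes equal, independent base frequencies and exact matching, neither of which holds in a real genome, so the result is an order-of-magnitude guide only.

```python
def expected_random_matches(genome_size, motif_length, both_strands=True):
    """Expected exact matches of one fixed motif in a random sequence.

    Assumes each base occurs with probability 0.25 independently of its
    neighbors. Counting both strands doubles the expectation, which
    overcounts for palindromic motifs.
    """
    p_match = 0.25 ** motif_length
    n_positions = genome_size - motif_length + 1
    expected = n_positions * p_match
    return 2 * expected if both_strands else expected


# An 8 bp motif in a 3 Gb genome: on the order of 90,000 chance matches.
chance_sites = expected_random_matches(3_000_000_000, 8)
```

Real motifs tolerate mismatches at several positions, so the number of candidate sites is larger still, which is why motif presence or a single binding event is weak evidence of regulatory function.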
DNase I FAIRE
Mapping chromatin accessibility
From Furey (2012) Nat Rev Genet 13:840
DNase I hypersensitivity identifies TF binding events
From Furey (2012) Nat Rev Genet 13:840
Song et al., Genome Res 21:1757 (2011)
DNase I hypersensitivity identifies regulatory elements
DNase I hypersensitive sites
De novo TF motif discovery by DNase I hypersensitivity mapping
In human ES cells:
From Neph (2012) Nature 489:83
De novo TF motif discovery by DNase I hypersensitivity mapping
Across tissue types:
From Neph (2012) Nature 489:83
Capturing long-range regulatory interactions
From Visel et al. (2009) Nature 461:199
Sequence: Hi-C
ChIP for specific factors: ChIA-PET
Sequence
Chromosome Conformation Capture Methods
Sequence
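After sequencing, conformation capture data are commonly summarized as counts of read pairs linking two genomic bins. The sketch below assumes each read pair is already mapped and given as two hypothetical `(chrom, position)` ends; it omits the filtering and normalization that real Hi-C or ChIA-PET pipelines apply.

```python
from collections import Counter


def contact_counts(read_pairs, bin_size):
    """Count read pairs linking each unordered pair of genomic bins.

    read_pairs: iterable of ((chrom1, pos1), (chrom2, pos2)).
    Returns a Counter keyed by a sorted pair of (chrom, bin index).
    """
    counts = Counter()
    for (chrom1, pos1), (chrom2, pos2) in read_pairs:
        bin_a = (chrom1, pos1 // bin_size)
        bin_b = (chrom2, pos2 // bin_size)
        # Sort so that A-B and B-A are counted as the same contact.
        counts[tuple(sorted((bin_a, bin_b)))] += 1
    return counts


# Hypothetical read pairs; the first two link the same pair of 10 kb bins.
pairs = [
    (("chr5", 1500), ("chr5", 98000)),
    (("chr5", 99500), ("chr5", 1200)),
    (("chr5", 1500), ("chr5", 2500)),
]
contacts = contact_counts(pairs, 10_000)
```

Pairs that fall within the same bin, like the third one here, mostly reflect short-range proximity along the chromosome and are typically handled separately from long-range contacts.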
From Kieffer-Kwon et al. (2013) Cell 155:1507
Long-range regulatory interactions mediated by specific factors: RNA Pol II
Int – Intergenic or intronic
Pr – Promoter
Ex – Exonic
Long-range regulatory interactions mediated by specific factors: Cohesin
From DeMare et al. (2013) Genome Res. 23:1224
Summary
• Relevant overview papers on ChIP-seq and DNase-seq posted on class wiki
• Wednesday: Epigenetics and the histone code