Introduction to epigenetics: chromatin modifications, DNA methylation and the CpG Island landscape (part 2)
Héctor Corrada Bravo
CMSC858P, Spring 2012
(many slides courtesy of Rafael Irizarry)
How do we measure DNA methylation?
Microarray Data
One question…
• Where do we measure?
• At least 7 arrays are needed to measure entire genome
• CpGs are depleted
• Remaining CpGs cluster
CpG Islands
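One common way to quantify CpG depletion and clustering is the observed-to-expected CpG ratio of a sequence window. The sketch below assumes the usual definition (number of CG dinucleotides times window length, divided by the product of the C and G counts). The function name `cpg_obs_exp` is illustrative and does not come from the slides.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio for a DNA string.

    Assumed definition: (#CG * length) / (#C * #G).
    Returns 0.0 when the window contains no C or no G.
    """
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    # "CG" cannot overlap with itself, so str.count is exact here.
    n_cg = seq.count("CG")
    return n_cg * len(seq) / (n_c * n_g)
```

Ratios well below 1 indicate depletion. Windows where the ratio stays high are candidate CpG islands.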
But variation seen outside
McRBC
No Methylation
Cuts at AmCG or GmCG
Input
McRBC
Methylation
McRBC after GEL
Methylation
McRBC after GEL
Methylation
Now unmethylated
No Methylation
McRBC after Gel
No Methylation
Gene Expression Normalization does not work well here
We use control probes
There are also waves
Smoothing
McRBC on tiling two channel array
We smooth
Proportion of neighboring CpG also methylated/not methylated
True signal (simulated)
Observed data
Observed data and true signal
What is methylated (above 50%)?
Naïve approach
Many false positives (FP)
Smooth
No FP, but one false negative
Smooth less? No FN, lots of FP
We prefer this!
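The trade-off on these slides can be illustrated with a toy example. This is a minimal sketch that uses a centered moving average as a stand-in smoother; the slides do not specify the smoother at this point. The data values, the window size, and the function names are all made up for illustration.

```python
def running_mean(values, window):
    """Centered moving average; the window is truncated at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def call_methylated(values, threshold=0.5):
    """Call a probe methylated when its value is above the threshold."""
    return [v > threshold for v in values]


# Toy data (hypothetical): an unmethylated region with one noisy spike.
observed = [0.0, 0.0, 1.0, 0.0, 0.0]

naive_calls = call_methylated(observed)                             # spike is called
smooth_calls = call_methylated(running_mean(observed, window=3))   # spike is removed
```

A wider window removes more isolated false positives but can also flatten short truly methylated stretches, which is the false-negative risk the slides point out.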
CHARM
DMR for three tissues (five replicates)
Irizarry et al, Nature Genetics 2009
Some findings
[Irizarry et al., 2009, Nat. Genetics]
Tissue easily distinguished
Cancer DMR
Many regions like this
Note: hypo and hyper methylation
Both hyper and hypo methylated
Cancer and Tissue DMRs coincide
DMR enriched in Shores
Still affects expression
T-DMRs
Still affects expression
C-DMRs
USING SEQUENCING (BS-SEQ)
[Figure: repeated copies of the double-stranded sequence TTCGATTACGA / AAGCTAATGCT with CH3 groups attached, labeled Liver and Brain]
85% Methylation
chr3:44,031,616-44,031,626
Bisulfite Treatment
GGGGAGCAGCATGGAGGAGCCTTCGGCTGACT
GGGGAGCAGTATGGAGGAGTTTTCGGTTGATT
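The two sequences above can be reproduced with a small in silico conversion: every unmethylated C is read as T, while methylated Cs are protected. This is a sketch only. The function name is illustrative, and the methylated positions (0-based indices 6 and 23) are an assumption inferred from which Cs survive in the second sequence.

```python
def bisulfite_convert(seq, methylated_positions):
    """Turn every C into T unless its 0-based index is marked as methylated."""
    protected = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq)
    )


original = "GGGGAGCAGCATGGAGGAGCCTTCGGCTGACT"
# Assumption: the Cs at indices 6 and 23 are methylated, inferred from
# the Cs that remain in the converted sequence shown on the slide.
converted = bisulfite_convert(original, {6, 23})
```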
BS-seq
Reads:
GTCGTAGTATTTGTCT
GTCGTAGTATTTGTNN
TGTCGTAGTATCTGTC
TATGTCGTAGTATTTG
TATATCGTAGTATTTT
TATATCGTAGTATTTG
NATATCGTAGTATNTG
TTTTATATCGCAGTAT
ATATTTTATGTCGTA
ATATTTTATCTCGTA
ATATTTTATGTCGTA
GA-TATTTTATGTCGT
Reference:
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATCGCACCTACGTTCAATATT
Coverage: 13
Methylation Evidence: 13
Methylation Percentage: 100%
BS-seq
Reads:
GTCGTAGTATTTGTCT
GTCGTAGTATTTGTNN
TGTCGTAGTATCTGTC
TATGTCGTAGTATTTG
TATATTGTAGTATTTT
TATATCGTAGTATTTG
NATATTGTAGTATNTG
TTTTATATTGCAGTAT
ATATTTTATGTCGTA
ATATTTTATCTTGTA
ATATTTTATGTCGTA
GA-TATTTTATGTCGT
Reference:
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATCGCACCTACGTTCAATATT
Coverage: 13
Methylation Evidence: 9
Methylation Percentage: 69%
BS-seq
Reads:
GTCGTAGTATTTGTCT
GTCGTAGTATTTGTNN
TGTTGTAGTATCTGTC
TATGTTGTAGTATTTG
TATATTGTAGTATTTT
TATATTGTAGTATTTG
NATATTGTAGTATNTG
TTTTATATTGCAGTAT
ATATTTTATGTCGTA
ATATTTTATCTTGTA
ATATTTTATGTTGTA
GA-TATTTTATGTCGT
Reference:
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATCGCACCTACGTTCAATATT
Coverage: 13
Methylation Evidence: 4
Methylation Percentage: 31%
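The three summaries above follow from simple counting at one CpG: after bisulfite treatment a read showing C counts as methylation evidence and a read showing T does not. The sketch below assumes the per-read base calls at the site have already been extracted from the alignment; `methylation_summary` is an illustrative name.

```python
def methylation_summary(calls):
    """Summarize bisulfite base calls at one CpG cytosine.

    calls: iterable of bases seen in the reads covering the site.
    "C" is read as methylated, "T" as unmethylated; anything else
    (for example "N") is ignored as uninformative.
    Returns (coverage, methylation evidence, rounded percentage).
    """
    informative = [b for b in calls if b in ("C", "T")]
    coverage = len(informative)
    evidence = informative.count("C")
    percentage = round(100 * evidence / coverage) if coverage else None
    return coverage, evidence, percentage


print(methylation_summary("C" * 13))            # (13, 13, 100)
print(methylation_summary("C" * 9 + "T" * 4))   # (13, 9, 69)
print(methylation_summary("C" * 4 + "T" * 9))   # (13, 4, 31)
```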
BS-seq
• Alignment is much trickier:
– Naïve strategy: do nothing, hope not many CpG in a single read
– Smarter strategy: “bisulfite convert” reference: turn all Cs to Ts
  • Also needs to be done on reverse complement reference and reads
– Smartest strategy: be unbiased and try all combinations of methylated/un-methylated CpGs in each read
  • Computationally expensive (see Hansen et al, 2011, for a strategy)
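The "smarter strategy" can be sketched with exact string matching in place of a real aligner. Both the reference and the read are fully C-to-T converted before matching, on the forward strand and on the reverse complement. This is a toy illustration under those assumptions: the function names are hypothetical, mismatches and indels are not handled, and reads that are reverse complements of converted strands (which would need G-to-A conversion) are not covered.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def c_to_t(seq):
    """Fully 'bisulfite convert' a sequence in silico: every C becomes T."""
    return seq.upper().replace("C", "T")


def find_converted(read, reference):
    """Exact-match a read against the C->T converted reference.

    Returns a list of (strand, start) hits. For strand "-", start is
    an offset into the reverse complement of the reference.
    """
    conv_read = c_to_t(read)
    hits = []
    for strand, ref in (("+", reference.upper()), ("-", revcomp(reference.upper()))):
        conv_ref = c_to_t(ref)
        start = conv_ref.find(conv_read)
        while start != -1:
            hits.append((strand, start))
            start = conv_ref.find(conv_read, start + 1)
    return hits


reference = "GGGGAGCAGCATGGAGGAGCCTTCGGCTGACT"
read = "GGAGTTTTCGGTTG"  # a bisulfite-converted fragment of the reference
hits = find_converted(read, reference)
```

Matching the raw read against the raw reference fails here because of the converted Cs, while the converted comparison finds the fragment. The methylation state is then read off afterwards by comparing the original read bases to the original reference at each C.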
BS-seq
• There are similarities to SNP calling (we’ll see this in a couple of weeks)
• EXCEPT: we want to measure percentages
– Use a binomial model to estimate p, percentage of methylation
– Allow for sequencing errors, coverage differences, etc.
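As a minimal sketch of the binomial idea, the code below estimates p at a single CpG as the fraction of methylated reads and attaches a Wilson score interval, so that sites with low coverage get visibly wider uncertainty. The choice of the Wilson interval and the function name are assumptions for illustration; sequencing errors and incomplete bisulfite conversion are not modeled here.

```python
import math


def methylation_estimate(meth, cov, z=1.96):
    """Binomial estimate of the methylation proportion at one site.

    meth: number of reads showing methylation, cov: total reads.
    Returns (p_hat, lower, upper) with a Wilson score interval
    (z=1.96 gives an approximate 95% interval).
    """
    if cov <= 0:
        raise ValueError("coverage must be positive")
    p = meth / cov
    denom = 1 + z * z / cov
    center = (p + z * z / (2 * cov)) / denom
    half = z * math.sqrt(p * (1 - p) / cov + z * z / (4 * cov * cov)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)
```

For example, 9 methylated reads out of 13 and 90 out of 130 give the same point estimate, but the second interval is much narrower, which is one way coverage differences enter the analysis.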
Measuring DNA Methylation
• Estimating percentages
• Use “local-likelihood” method
– Based on loess
(Plot courtesy of Kasper Hansen)
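To give a feel for local-likelihood smoothing, here is a deliberately simplified sketch: a local-constant binomial fit with tricube weights (the kernel loess uses). Maximizing the weighted binomial log-likelihood around a position gives the weighted sum of methylated reads over the weighted sum of coverage. The method referenced on the slide is not spelled out here, so the degree-zero fit, the bandwidth, and the function names are all assumptions.

```python
def tricube(u):
    """Tricube kernel: positive for |u| < 1, zero otherwise."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def local_methylation(positions, meth, cov, x0, bandwidth):
    """Local-constant binomial likelihood estimate of methylation at x0.

    positions: genomic coordinates of CpGs
    meth, cov: methylated read count and coverage at each CpG
    Returns sum(w * meth) / sum(w * cov), or NaN if no CpG with
    coverage falls inside the bandwidth.
    """
    num = 0.0
    den = 0.0
    for x, m, n in zip(positions, meth, cov):
        w = tricube((x - x0) / bandwidth)
        num += w * m
        den += w * n
    return num / den if den > 0 else float("nan")
```

Because the weights multiply counts rather than per-site percentages, CpGs with higher coverage contribute more to the local estimate.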
BS-seq
Lister et al. 2009, Nature
Gene Expression Regulation: DNA methylation in promoter regions
Lister et al. 2009, Nature
DNA methylation patterns within genomic regions
Lister et al. 2009
Putting it together
What were we after?
• The epigenetic progenitor origin of human cancer
  • [Feinberg, et al., Nature Reviews Genetics, 2006]
• Stochastic epigenetic variation as driving force of disease
  • [Feinberg & Irizarry, PNAS, 2009]
• Phenotypic variation, perhaps epigenetically mediated, increases disease susceptibility
• Increased epigenetic and gene expression variability of specific genes/regions is a defining characteristic of cancer
What did we do?
• Custom Illumina methylation microarray
• Confirmed increased epigenetic variability in specific regions across five cancer types
What did we do?
• Custom Illumina methylation microarray
• Confirmed increased epigenetic variability in specific regions across five cancer types
• Confirmed same sites are involved in tissue differentiation
What did we do?
• Custom Illumina methylation microarray
• Whole genome sequencing of bisulfite treated DNA
– Found large blocks of hypo-methylation (sometimes Mbps long) in colon cancer
What did we do?
• Custom Illumina methylation microarray
• Whole genome sequencing of bisulfite treated DNA
– Found large blocks of hypo-methylation (sometimes Mbps long) in colon cancer
– These regions coincide with hyper-variable regions across cancer types
What did we do?
• Custom Illumina methylation microarray
• Whole genome sequencing of bisulfite treated DNA
• Gene Expression Analysis
Gene Expression Data
When using multiple microarray experiments, proper normalization is key
[McCall, et al., Biostatistics 2010]
Normalization is key
• fRMA: a single-chip normalization procedure
• GNUSE: a single-chip quality metric
• Barcode: a single-chip common-scale measurement
What did we do?
• Custom Illumina methylation microarray
• Whole genome sequencing of bisulfite treated DNA
• Gene Expression Analysis
– Genes with hyper-variable gene expression in colon cancer are enriched in hypo-methylation blocks
[Corrada Bravo, et al., under review]
What are we doing next?
• Custom Illumina methylation microarray
• Whole genome sequencing of bisulfite treated DNA
• Gene Expression Analysis
– Genes with hyper-variable gene expression in colon cancer are enriched in hypo-methylation blocks
Bigger gene expression study
• 7,741 HGU133plus2 samples
• 598 normal tissue samples, 4,886 tumor samples
• 176 different tissue types
• 175 different GEO studies
Bigger gene expression study
[Corrada Bravo, et al., under review]
What are we doing next?
• Custom Illumina methylation microarray
• Whole genome sequencing of bisulfite treated DNA
• Gene Expression Analysis
– Genes with hyper-variable gene expression in colon cancer are enriched in hypo-methylation blocks
– Tissue-specific genes have hyper-variable gene expression across cancer types
[Corrada Bravo, et al., under review]