Gavery Fish546
Whole Genome Bisulfite Sequencing
(feasibility trial)
FISH 546 Mackenzie Gavery
Introduction
QUESTION:
Is whole genome bisulfite sequencing (WGBS) a viable option for discovering methylated cytosines in non-model species with limited genomic resources?
HYPOTHESIS:
With limited reference sequence available, it will be very difficult to annotate methylated regions of DNA.
WHO CARES:
DNA methylation is an epigenetic mechanism with important regulatory functions. There is evidence for a regulatory role in oysters; we would like to explore it in different populations / generations, but need to know where to look.
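Background: bisulfite sequencing
genomic DNA (methylated C, marked "m" on the slide, at positions 12 and 17):
CATGTTACGATCGGCTCG
after bisulfite treatment (unmethylated C is converted to U; methylated C is protected):
UATGTTAUGATCGGUTCG
after PCR (U is amplified as T; second line is the complementary strand):
TATGTTATGATCGGTTCG
ATACAATACTAGCCAAGC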
previous work: design primers to amplify specific regions of interest
limitations: challenging to design primers with specificity; limited to known sequences
Bisulfite-PCR
Kismeth
WGBS Challenges:
sequencing issues: sequencers can have problems with low-complexity sequence
non-model species: genomic resources are limited; for C. gigas, most resources are ESTs (coding sequences only)
bioinformatics: assemblies/alignments need to recognize C/T conversion; bisulfite treatment results in 4 unique strands after PCR
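The "4 unique strands" point can be illustrated with a small sketch. It uses the 18-base example sequence from the background slide and assumes 0-based methylated positions 11 and 16 on the top strand; the function names (`revcomp`, `bisulfite_convert`, `four_strands`) are hypothetical, and the bottom strand is treated as unmethylated purely to keep the example short.

```python
def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))


def bisulfite_convert(seq, methylated=frozenset()):
    """Turn every C into T, except at the 0-based positions in `methylated`."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def four_strands(top, methylated_top=frozenset(), methylated_bottom=frozenset()):
    """Return the four distinct strands present after bisulfite treatment + PCR."""
    bottom = revcomp(top)
    top_conv = bisulfite_convert(top, methylated_top)
    bottom_conv = bisulfite_convert(bottom, methylated_bottom)
    return {
        "top_converted": top_conv,
        "top_pcr_complement": revcomp(top_conv),
        "bottom_converted": bottom_conv,
        "bottom_pcr_complement": revcomp(bottom_conv),
    }


# Assumption: positions 11 and 16 (0-based) are the methylated cytosines.
strands = four_strands("CATGTTACGATCGGCTCG", methylated_top={11, 16})
for name, seq in strands.items():
    print(f"{name:>22}  {seq}")
```

Because the two converted strands are no longer complementary to each other, an aligner has to consider all four sequences (or convert the reference in silico) rather than just a read and its reverse complement.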
Approach:
generate mock bisulfite-seq reads using Atlantic salmon GSS sequences as a surrogate for C. gigas
use CLC to assemble the mock bisulfite-treated reads back to the non-treated mock sequences
Approach (details):
Atlantic salmon GSS: 203,387 sequences
after de novo assembly: 128,337 contigs
generate 1 million random, ~40 bp fragments
convert all C to T, with the exception of 'ACG' sequences (259,750 C's remain)
create a similar fragment library that is not converted, to use as reference sequence
use the non-treated library to assemble the bisulfite-treated reads
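The read-simulation steps above can be sketched as follows. This is a minimal illustration, not the procedure actually used for the slides: the function names, the fixed 40 bp read length, and the random seed are assumptions, and the contigs are expected as plain uppercase strings.

```python
import random
import re


def random_fragments(contigs, n_reads, read_len=40, seed=0):
    """Draw n_reads random fixed-length fragments from a list of contig strings."""
    rng = random.Random(seed)
    usable = [c for c in contigs if len(c) >= read_len]
    frags = []
    for _ in range(n_reads):
        contig = rng.choice(usable)
        start = rng.randrange(len(contig) - read_len + 1)
        frags.append(contig[start:start + read_len])
    return frags


def mock_bisulfite(seq):
    """Convert C to T unless the C sits in an 'ACG' context."""
    # A C is converted if it is NOT preceded by A, or NOT followed by G,
    # so only the C in the middle of 'ACG' survives.
    return re.sub(r"(?<!A)C|C(?!G)", "T", seq)


# Toy input standing in for the assembled contigs.
contigs = ["ACGT" * 30, "GGCCATACGTTCAGC" * 5]
reference_frags = random_fragments(contigs, n_reads=5)      # non-treated library
treated_frags = [mock_bisulfite(f) for f in reference_frags]  # mock bisulfite reads
remaining_c = sum(f.count("C") for f in treated_frags)
print(remaining_c, "C's remain after conversion")
```

Treating every 'ACG' cytosine as methylated gives a known set of protected positions, so any downstream assembly can be checked against the truth.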
Assembly 1st try:
non-treated fragments (1 million short reads, 40 million bp)
-> de novo assembly of non-treated fragments: 459 contigs (~300 bp)
-> assemble bisulfite reads to the de novo non-treated contigs: 42 contigs (~46 bp), 1,940 bp total
-> BLAST the non-treated contigs with matches for ID: found hits, but many not annotated
Analysis summary (assembly settings: 'global alignment', 'allow mismatch'):
reference                       | limit | mismatch cost | score limit | contigs generated (total bp) | total SNPs
non-treated reference A*        | 8     | 2             | 8           | 42 (1,940 bp)                | 42
non-treated reference B         | 8     | 3             | 15          | 71 (21,487 bp)               | 42
non-treated reference converted | 8     | 3             | 15          | 11,213 (508,799 bp)          | 473
Other tools:
Nature Reviews Genetics 11, 191-203 | doi:10.1038/nrg2732
Conclusions:
QUESTION:
Is whole genome bisulfite sequencing (WGBS) a viable option for discovering methylated cytosines in non-model species with limited genomic resources?
HYPOTHESIS:
With limited reference sequence available, it will be very difficult to map methylated regions of DNA.
ANSWER:
Yup, it is very difficult.
Next Steps
find a tool to do 'customizable' assembly, e.g. only allow C/T (or G/A) mismatches
new protocol using SOLiD that will only sequence 1 strand (this will make analysis easier)
reduced representation: digest with restriction enzymes and size-select DNA prior to making library
DNA methylation enrichment kit: fractionate DNA by binding to methyl-binding domain proteins (only sequence heavily methylated regions)
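As a rough idea of what a 'customizable' assembly rule from the first next step could look like, the sketch below accepts a read at a reference position only if every mismatch is a reference C read as T. It is a brute-force illustration with hypothetical function names, not an existing tool; the G/A case for the opposite strand would use the same logic with the bases swapped.

```python
def bisulfite_compatible(read, ref_window):
    """True if the read matches, allowing only reference C -> read T mismatches."""
    if len(read) != len(ref_window):
        return False
    return all(r == g or (g == "C" and r == "T") for r, g in zip(read, ref_window))


def find_hits(read, reference):
    """All 0-based reference positions where the read is bisulfite-compatible."""
    n = len(read)
    return [
        i for i in range(len(reference) - n + 1)
        if bisulfite_compatible(read, reference[i:i + n])
    ]


def call_methylation(read, reference, pos):
    """For each reference C under the read: True if still C (protected), False if T."""
    calls = {}
    for offset, (r, g) in enumerate(zip(read, reference[pos:pos + len(read)])):
        if g == "C":
            calls[pos + offset] = (r == "C")
    return calls


reference = "CATGTTACGATCGGCTCG"   # untreated sequence
read = "TATGTTATGATCGGTTCG"        # the same region after bisulfite + PCR
hits = find_hits(read, reference)
calls = call_methylation(read, reference, hits[0])
print(hits, calls)
```

A generic mismatch cost penalizes every converted C, which is one plausible reason so few bisulfite reads assembled against the non-treated reference; restricting the allowed mismatch type removes that penalty without loosening the match elsewhere.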
Thank you