Gavery Fish546

Whole Genome Bisulfite Sequencing

(feasibility trial)

FISH 546 Mackenzie Gavery

Introduction

• QUESTION:

Is whole genome bisulfite sequencing (WGBS) a viable option for discovering methylated cytosines in non-model species with limited genomic resources?

• HYPOTHESIS:

With limited reference sequence available, it will be very difficult to annotate methylated regions of DNA.

• WHO CARES:

DNA methylation is an epigenetic mechanism with important regulatory functions. There is evidence for a regulatory role in oysters; we would like to explore it in different populations and generations, but need to know where to look.

Background: bisulfite sequencing

(m = methylated cytosine; it stays C, while unmethylated C becomes U after bisulfite treatment and T after PCR)

original:         CATGTTACGATCGGCTCG
after bisulfite:  UATGTTAUGATCGGUTCG
after PCR:        TATGTTATGATCGGTTCG
                  ATACAATACTAGCCATGC

• previous work: designed primers to amplify specific regions of interest

• challenging to design primers with specificity; limited to known sequences

Bisulfite-PCR

Kismeth
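
The conversion on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and the methylated positions (0-based 11 and 16) are inferred from which C's survive in the slide's example sequence.

```python
def bisulfite_convert(seq, methylated):
    """Bisulfite treatment: unmethylated C -> U, methylated C is protected."""
    return "".join(
        "U" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def pcr_read(converted):
    """During PCR, uracil is copied as thymine."""
    return converted.replace("U", "T")


def call_methylation(reference, read):
    """Positions where the reference has C and the read still shows C."""
    return {
        i
        for i, (ref_base, read_base) in enumerate(zip(reference, read))
        if ref_base == "C" and read_base == "C"
    }


original = "CATGTTACGATCGGCTCG"   # sequence from the slide
methylated = {11, 16}             # assumption: inferred from the slide
treated = bisulfite_convert(original, methylated)
read = pcr_read(treated)

print(treated)
print(read)
print(sorted(call_methylation(original, read)))
```

Comparing the read back to the untreated sequence is what recovers the methylated positions, which is why a reference is needed at all.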

WGBS Challenges:

• sequencing issues: sequencers can have problems with low-complexity sequence

• genomic resources are limited for non-model species
  - C. gigas: most resources are ESTs (coding sequences only)

• bioinformatics
  - assemblies/alignments need to recognize C/T conversion
  - bisulfite treatment results in 4 unique strands after PCR
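
Why 4 strands: after conversion the two original strands are no longer complementary, and PCR adds a complement of each. A rough sketch, assuming full conversion (no methylation) and using the conventional labels OT/CTOT/OB/CTOB, which are not from these slides:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def convert(seq):
    """Full C -> T conversion (assumes no cytosine is methylated)."""
    return seq.replace("C", "T")


def four_strands(top):
    """Return the four distinct strands present after bisulfite + PCR."""
    bottom = revcomp(top)
    ot = convert(top)        # original top strand, converted
    ob = convert(bottom)     # original bottom strand, converted
    return {
        "OT": ot,
        "CTOT": revcomp(ot),  # PCR complement of converted top
        "OB": ob,
        "CTOB": revcomp(ob),  # PCR complement of converted bottom
    }


strands = four_strands("CATGTTACGATCGG")
for name, seq in strands.items():
    print(name, seq)
```

OT and OB reads look C-depleted, while CTOT and CTOB reads look G-depleted, so an aligner has to consider both C/T and G/A differences.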

Approach:

• generate mock bisulfite-seq reads using Atlantic salmon GSS sequences as a surrogate for C. gigas

• use CLC to assemble the mock bisulfite-treated reads back to the non-treated mock sequences

Approach:

• Atlantic salmon GSS: 203,387 sequences

• after de novo assembly: 128,337 contigs

• generate 1 million random ~40 bp fragments

• convert all C to T, with the exception of 'ACG' sequences (259,750 C's remain)

• create a similar fragment library that is not converted, to use as reference sequence

• use the non-treated library to assemble the bisulfite-treated reads
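
The mock library steps above could look roughly like this in Python. It is a sketch, not the script actually used: the function names, the seed, and the toy contigs are all made up, and it assumes "exception of 'ACG'" means a C is kept only when it sits between an A and a G.

```python
import random
import re


def sample_fragments(contigs, n, length=40, seed=0):
    """Draw n random fixed-length fragments from the contigs."""
    rng = random.Random(seed)
    usable = [c for c in contigs if len(c) >= length]
    fragments = []
    for _ in range(n):
        contig = rng.choice(usable)
        start = rng.randrange(len(contig) - length + 1)
        fragments.append(contig[start:start + length])
    return fragments


def mock_bisulfite(fragment):
    """Convert every C to T unless it is the C of an 'ACG' triplet."""
    # A C is converted if it is not preceded by A, or not followed by G.
    return re.sub(r"(?<!A)C|C(?!G)", "T", fragment)


# Toy stand-in for the assembled GSS contigs (random sequence, not real data).
rng = random.Random(1)
toy_contigs = ["".join(rng.choice("ACGT") for _ in range(200)) for _ in range(5)]

reference_library = sample_fragments(toy_contigs, n=10)           # non-treated
treated_library = [mock_bisulfite(f) for f in reference_library]  # mock reads

remaining_c = sum(f.count("C") for f in treated_library)
print(remaining_c, "C's remain after conversion")
```

One side effect of converting after fragmenting: an 'ACG' cut by a fragment boundary loses its context, so that C is converted.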

Assembly 1st try:

• de novo assembly of the non-treated fragments: 1 million short reads (40 million bp) → 459 contigs (~300 bp)

• assemble bisulfite reads to the de novo non-treated contigs → 42 contigs (~46 bp; 1,940 bp total)

• BLAST non-treated contigs with matches for ID → found hits, but many not annotated

Analysis summary:

                                  non-treated           non-treated           non-treated
                                  reference A*          reference B           reference converted
assembly settings                 limit = 8             limit = 8             limit = 8
('global alignment',              mismatch cost = 2     mismatch cost = 3     mismatch cost = 3
'allow mismatch')                 score limit = 8       score limit = 15      score limit = 15
contigs generated (total bp)      42 (1,940)            71 (21,487)           11,213 (508,799)
total SNPs                        42                    42                    473
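
To compare the three runs more easily, the contig counts and total bp from the summary can be turned into a mean contig length. This is simple arithmetic on the numbers above; the dictionary layout is just for illustration.

```python
# (contigs generated, total bp) for each reference, taken from the summary
results = {
    "non-treated reference A*": (42, 1940),
    "non-treated reference B": (71, 21487),
    "non-treated reference converted": (11213, 508799),
}

mean_length = {
    name: total_bp / contigs for name, (contigs, total_bp) in results.items()
}

for name, value in mean_length.items():
    print(f"{name}: {value:.1f} bp per contig")
```

This gives roughly 46, 303, and 45 bp per contig, respectively.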

Other tools:

Nature Reviews Genetics 11, 191-203 | doi:10.1038/nrg2732

Conclusions:

• QUESTION:

Is whole genome bisulfite sequencing (WGBS) a viable option for discovering methylated cytosines in non-model species with limited genomic resources?

• HYPOTHESIS:

With limited reference sequence available, it will be very difficult to map methylated regions of DNA.

• ANSWER:

Yup

Next Steps

• find a tool to do 'customizable' assembly
  - e.g. only allow C/T (or G/A) mismatches

• new protocol using SOLiD that will only sequence 1 strand (this will make analysis easier)

• reduced representation
  - digest with restriction enzymes and size-select DNA prior to making the library

• DNA methylation enrichment kit: fractionate DNA by binding to methyl-binding domain proteins (only sequence heavily methylated regions)
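
As a rough idea of what 'customizable' matching means, here is a toy Python sketch that accepts a read only if every difference from the reference is of one allowed type (reference C read as T, or reference G read as A). It is a brute-force, ungapped scan with made-up function names, not a replacement for a real aligner.

```python
CT = ("C", "T")   # reads derived from a converted strand
GA = ("G", "A")   # reads derived from the PCR complement


def compatible(reference, read, allowed=CT):
    """True if read equals reference except for the one allowed mismatch type."""
    if len(reference) != len(read):
        return False
    ref_base, read_base = allowed
    return all(
        r == b or (r == ref_base and b == read_base)
        for r, b in zip(reference, read)
    )


def find_hits(reference, read, allowed=CT):
    """Start positions where the read fits the reference (ungapped scan)."""
    n = len(read)
    return [
        i
        for i in range(len(reference) - n + 1)
        if compatible(reference[i:i + n], read, allowed)
    ]


reference = "CATGTTACGATCGGCTCG"
read = "TATGTTATGATCGG"   # bisulfite-style read: some C's now read as T
print(find_hits(reference, read))
```

Any other mismatch (for example a T in the reference read as C) rejects the placement, which is the restriction a generic mismatch cost cannot express.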

Thank you
