Gavery Fish546

Whole Genome Bisulfite Sequencing (feasibility trial) FISH 546 Mackenzie Gavery


Transcript of Gavery Fish546

Page 1: Gavery Fish546

Whole Genome Bisulfite Sequencing

(feasibility trial)

FISH 546 Mackenzie Gavery

Page 2: Gavery Fish546

Introduction

  QUESTION:

Is whole genome bisulfite sequencing (WGBS) a viable option for discovering methylated cytosines in non-model species with limited genomic resources?

  HYPOTHESIS:

With limited reference sequence available, it will be very difficult to annotate methylated regions of DNA.

  WHO CARES:

DNA methylation is an epigenetic mechanism with important regulatory functions. There is evidence for a regulatory role in oysters. We would like to explore it in different populations / generations, but need to know where to look.

Page 4: Gavery Fish546

Background: bisulfite sequencing

(m = methylated cytosine; the two cytosines that stay C after treatment, both in a CG context, are the methylated ones)

original:         CATGTTACGATCGGCTCG

after bisulfite:  UATGTTAUGATCGGUTCG

after PCR:        TATGTTATGATCGGTTCG
                  ATACAATACTAGCCAAGC

Page 5: Gavery Fish546

  previous work: designed primers to amplify specific regions of interest

  challenging to design primers with specificity; limited to known sequences

Bisulfite-PCR

Kismeth

Page 6: Gavery Fish546

WGBS Challenges:

  sequencing issues: sequencers can have problems with low-complexity sequence

  genomic resources for non-model species are limited

    C. gigas: most resources are ESTs (coding sequences only)

  bioinformatics

    assemblies/alignments need to recognize C/T conversion

    bisulfite treatment results in 4 unique strands after PCR
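
The "4 unique strands" point above can be illustrated with a small sketch. It assumes a fully unmethylated fragment (every C converts) and reuses the example sequence from the background slide; the function names are hypothetical and not part of any tool used in the talk.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))


def convert_all(seq):
    """Bisulfite + PCR outcome for one strand, assuming no methylation: every C reads as T."""
    return seq.replace("C", "T")


def four_strands(top):
    """Return the four distinct strands present after bisulfite treatment and PCR."""
    bottom = revcomp(top)                # original bottom strand, written 5'->3'
    conv_top = convert_all(top)          # top and bottom are converted independently,
    conv_bottom = convert_all(bottom)    # so they are no longer complementary
    return {
        "converted top": conv_top,
        "PCR copy of converted top": revcomp(conv_top),
        "converted bottom": conv_bottom,
        "PCR copy of converted bottom": revcomp(conv_bottom),
    }


strands = four_strands("CATGTTACGATCGGCTCG")
for name, seq in strands.items():
    print(f"{name:30s} {seq}")
```

Because the two original strands are converted separately, an aligner has to consider both C-to-T and G-to-A versions of the reference.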

Page 7: Gavery Fish546

Approach:

  generate mock bisulfite-seq reads using Atlantic salmon GSS sequences as a surrogate for C. gigas

  use CLC to assemble the mock bisulfite-treated reads back to the non-treated mock sequences

Page 8: Gavery Fish546

Approach:

Atlantic salmon GSS: 203,387 sequences

after de novo assembly: 128,337 contigs

generate 1 million random ~40 bp fragments

convert all C to T, with the exception of ‘ACG’ sequences (259,750 ‘C’s remain)

create a similar fragment library that is not converted, to use as the reference sequence

use the non-treated library to assemble the bisulfite-treated reads
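
The mock-read steps above could be sketched roughly as follows. This is not the script used for the talk. It assumes fixed 40 bp fragments, a toy contig in place of the salmon assembly, and that the treated and non-treated libraries come from the same fragments; all names are hypothetical.

```python
import random


def mock_bisulfite(seq):
    """Convert C to T, except a C that sits in the middle of 'ACG' (treated as methylated)."""
    out = []
    for i, base in enumerate(seq):
        protected = i >= 1 and seq[i - 1:i + 2] == "ACG"
        out.append("T" if base == "C" and not protected else base)
    return "".join(out)


def sample_fragments(contigs, n, length=40, seed=0):
    """Draw n random fragments of a fixed length from contigs at least that long."""
    rng = random.Random(seed)            # seeded so the mock library is reproducible
    usable = [c for c in contigs if len(c) >= length]
    fragments = []
    for _ in range(n):
        contig = rng.choice(usable)
        start = rng.randrange(len(contig) - length + 1)
        fragments.append(contig[start:start + length])
    return fragments


# Toy stand-in for the assembled contigs (assumption, not real data).
contigs = ["ACGTTCGACGGATCCATGACGTTAGCCGATACGTTCAGGCTAACGTTGCA" * 2]

reference_reads = sample_fragments(contigs, n=5)              # non-treated library
treated_reads = [mock_bisulfite(r) for r in reference_reads]  # mock bisulfite library

for ref, treated in zip(reference_reads, treated_reads):
    print(ref)
    print(treated)
    print()
```

Counting the C's left in the treated library gives the number of mock methylated positions, the analogue of the 259,750 figure above.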

Page 9: Gavery Fish546

Assembly 1st try:

non-treated fragments: 1 million short reads (40 million bp)

de novo assembly of non-treated fragments: 459 contigs (~300 bp)

assemble bisulfite reads to de novo non-treated contigs: 42 contigs (~46 bp; 1,940 bp total)

BLAST non-treated contigs with matches for ID: found hits, but many not annotated

Page 10: Gavery Fish546

Analysis summary:

                           non-treated          non-treated          non-treated
                           reference A*         reference B          reference, converted

assembly settings          limit = 8            limit = 8            limit = 8
(‘global alignment’,       mismatch cost = 2    mismatch cost = 3    mismatch cost = 3
‘allow mismatch’)          score limit = 8      score limit = 15     score limit = 15

contigs generated          42                   71                   11,213
(total bp)                 (1,940)              (21,487)             (508,799)

total SNPs                 42                   42                   473

Page 11: Gavery Fish546

Other tools:

Nature Reviews Genetics 11, 191-203 | doi:10.1038/nrg2732

Page 12: Gavery Fish546

Conclusions:

  QUESTION:

Is whole genome bisulfite sequencing (WGBS) a viable option for discovering methylated cytosines in non-model species with limited genomic resources?

  HYPOTHESIS:

With limited reference sequence available, it will be very difficult to map methylated regions of DNA

  ANSWER:

Yup

Page 14: Gavery Fish546

Next Steps

  find a tool to do ‘customizable’ assembly

    e.g. only allow C/T (or G/A) mismatches

  new protocol using SOLiD that will only sequence 1 strand (this will make analysis easier)

  reduced representation

    digest with restriction enzymes and size-select DNA prior to making the library

  DNA methylation enrichment kit: fractionate DNA by binding to methyl-binding domain proteins (only sequence heavily methylated regions)
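
The ‘customizable’ assembly idea in the list above (only allow C/T or G/A mismatches) could look something like the sketch below. It is a naive, ungapped scan for illustration only, not how any particular aligner works; the function names and the `mode` parameter are hypothetical.

```python
def bisulfite_compatible(read, ref, mode="CT"):
    """True if read equals ref except where ref has C and read has T (mode 'CT'),
    or where ref has G and read has A (mode 'GA')."""
    if len(read) != len(ref):
        return False
    allowed = ("C", "T") if mode == "CT" else ("G", "A")
    for ref_base, read_base in zip(ref, read):
        if ref_base != read_base and (ref_base, read_base) != allowed:
            return False                 # any other mismatch is rejected
    return True


def find_placements(read, reference, mode="CT"):
    """Start positions (0-based) where the read fits the reference under the rule above."""
    k = len(read)
    return [
        i
        for i in range(len(reference) - k + 1)
        if bisulfite_compatible(read, reference[i:i + k], mode)
    ]


print(find_placements("TTG", "ACGTCGA"))             # read with a converted C
print(find_placements("CAA", "ACGTCGA", mode="GA"))  # G/A rule for the opposite strand
```

A read position that still shows C where the reference has C is then a candidate methylated cytosine.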

Page 15: Gavery Fish546

Thank you