Hannah McPherson - Plants Plenary

Do Next Generation Sequencing approaches provide the answer for DNA barcoding of plants? Hannah McPherson, Marlien van der Merwe, Paul Rymer, Mark Edwards, Maurizio Rossetto

Transcript of Hannah McPherson - Plants Plenary

Page 1: Hannah McPherson - Plants Plenary

Do Next Generation Sequencing approaches provide the answer for DNA barcoding of plants?

Hannah McPherson, Marlien van der Merwe, Paul Rymer, Mark Edwards, Maurizio Rossetto

Page 2: Hannah McPherson - Plants Plenary

Landscape-level studies of the Australian flora

Species and population dynamics

Historical and current processes shaping distributions and assemblages of native trees

Using a range of molecular tools, life history traits and modelling

Reproduced from Crisp et al. 2004

Page 3: Hannah McPherson - Plants Plenary

Next generation sequencing

Exploring new molecular tools and approaches

NGS to assemble whole chloroplast genomes

Use of the whole chloroplast genome as a barcode?

Page 4: Hannah McPherson - Plants Plenary

Technical approach

Full genome shotgun sequencing

Illumina (Solexa) platform (7 Gb/lane)

• 8 labelled paired-end libraries multiplexed in one lane

• Sub-sampled data from single lanes

No reference sequence
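
As a rough sense of scale for the multiplexing described above, the sketch below divides the quoted lane yield evenly across the eight libraries. Even division is an assumption (real index balance varies between libraries), and the function name is illustrative only.

```python
def yield_per_library(lane_yield_gb: float, n_libraries: int) -> float:
    """Expected yield per library in Gb, assuming perfectly even multiplexing."""
    if n_libraries <= 0:
        raise ValueError("n_libraries must be positive")
    return lane_yield_gb / n_libraries


# Figures from the slide: 7 Gb per lane, 8 libraries multiplexed in one lane.
per_library_gb = yield_per_library(7.0, 8)
print(f"{per_library_gb:.3f} Gb per library")  # 0.875 Gb per library
```

Under that assumption each pooled sample receives a little under 1 Gb of whole-genome shotgun data, of which only a fraction will be chloroplast sequence.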

Page 5: Hannah McPherson - Plants Plenary

Sampling

2 locations: Nightcap (N) and Sydney (S)

20 rainforest tree species

4 individuals pooled from each species for each site

Page 6: Hannah McPherson - Plants Plenary

reality check: sampling from rainforests

Collecting and identifying samples

Preserving leaf material

DNA extraction

9 of the 20 species successfully sequenced from both North and South

Page 7: Hannah McPherson - Plants Plenary

questions

Can we bioinformatically assemble chloroplast genomes from whole genomic shotgun sequencing without a reference?

What levels of variation do we find across a broad range of species/families?

Can we mine the data for non-chloroplast regions too?

Is whole/partial chloroplast genome sequencing a viable option for barcoding?

Page 8: Hannah McPherson - Plants Plenary

From Angiosperm Phylogeny Website

http://www.mobot.org/MOBOT/Research/APweb/welcome.html

Atherospermataceae

Monimiaceae

Lauraceae

Malvaceae

Pittosporaceae

Sapindaceae

Meliaceae

Euphorbiaceae

Proteaceae

Urticaceae

Angiosperm Phylogeny

Model organism tree

Page 9: Hannah McPherson - Plants Plenary

Malvales

Malvaceae: Gossypium, Theobroma, Brachychiton

Page 10: Hannah McPherson - Plants Plenary

Laurales

Atherospermataceae: Doryphora

Monimiaceae: Wilkiea

Lauraceae: Cinnamomum

Calycanthaceae: Calycanthus

Page 11: Hannah McPherson - Plants Plenary

assembling chloroplast genomes

Map trimmed reads to the whole cp genome of the closest relative available on GenBank (CLC)

• Consensus of N & S

De novo assembly (CLC and Velvet)

• N & S separately

• Local BLAST / cpDNA genome database

Assemble contigs to N & S reference (Geneious Pro)

Page 12: Hannah McPherson - Plants Plenary

Align with annotated

Page 13: Hannah McPherson - Plants Plenary

assembling chloroplast genomes

[Figure: bar chart of sequence length (axis ticks 90000 to 170000) for Brachychiton, Cinnamomum, Claoxylon, Diploglottis, Doryphora, Pittosporum, Synoum, Toona and Wilkiea. Series: length of closest cpDNA reference, length of mapped cpDNA, length of assembled contigs.]

Page 14: Hannah McPherson - Plants Plenary

Diploglottis cunninghamii

Pittosporum multiflorum

Toona ciliata

Synoum glandulosum

Doryphora sassafras

Claoxylon australe

Cinnamomum oliveri

Brachychiton acerifolius

Wilkiea huegelii

NC_008641 Gossypium barbadense

NC_008325 Daucus carota

NC_008334 Citrus sinensis

NC_010433 Manihot esculenta

NC_004993 Calycanthus floridus var. glaucus

Aligned with MAFFT

RAxML tree from the CIPRES Science Gateway

~40Kbp excluding gaps

Page 15: Hannah McPherson - Plants Plenary

quantifying variation

Map trimmed reads to newly constructed references (assembled contigs)

SNP detection (CLC)

SNP verification

• exploring data

• Sanger sequencing

Page 16: Hannah McPherson - Plants Plenary

SNP detection

Synoum glandulosum (~140Kbp)

• SNPs between N and S

• ~1 in 550bp

• SNPs within N and S

• N ~1 in 2800bp

• S ~1 in 4500bp
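
To put the densities above in absolute terms, the following sketch converts "1 SNP per N bp" into an approximate SNP count over the ~140 Kbp Synoum assembly. It assumes SNPs are spread evenly along the sequence, which the slide does not claim; the function name is illustrative.

```python
def expected_snps(region_length_bp: int, bp_per_snp: float) -> float:
    """Approximate SNP count for a region, given an average spacing in bp."""
    return region_length_bp / bp_per_snp


GENOME_BP = 140_000  # approximate assembled length quoted on the slide

# Average spacing (bp per SNP) as quoted on the slide.
densities = {"between N and S": 550, "within N": 2800, "within S": 4500}

for label, bp_per_snp in densities.items():
    print(f"{label}: ~{expected_snps(GENOME_BP, bp_per_snp):.0f} SNPs")
```

This works out to roughly 255 SNPs between the two sites, against about 50 within N and about 31 within S.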

[Figure: four alignment panels, each showing the reference, Synoum N and Synoum S]

Page 17: Hannah McPherson - Plants Plenary

SNP detection

Page 18: Hannah McPherson - Plants Plenary

data mining

Chloroplast barcoding genes

Universal cpSSR markers

Other data BLAST

The question of coverage

Page 19: Hannah McPherson - Plants Plenary

chloroplast barcoding loci

[Table: barcoding primers (columns) by taxon (rows); cell contents were not captured in the transcript]

Primers: rbcL a-f F, rbcL a-r R, rbcL 1F, rbcL 724R, accD 1 F, accD 2 F, accD 3 R, accD 4 R, matK 2.1 F, matK 2.1a F, matK X F, matK 3.2 R, matK 5 R, 390 F, 1326 R, matK_1F, matK_1R, matK_2F, matK_2R, rpoB 1 F, rpoB 2 F, rpoB 3 R, rpoB 4 R, rpoC1 1 F, rpoC1 2 F, rpoC1 3 R, rpoC1 4 R, ycf5 1 F, ycf5 2 F, ycf5 3 R, ycf5 4 R, ndhJ 1 F, ndhJ 2 F, ndhJ 3 R, ndhJ 4 R, trnH2 F, psbAF R, trnH (GUG) F, psbA R, atpF F, atpH R, psbK R, psbI R, trnL-c F, trnL-d R, trnL-e F, trnL-f R, trnL-g F, trnL-h R

Taxa: Brachychiton, Cinnamomum, Claoxylon, Diploglottis, Doryphora, Pittosporum, Synoum, Toona, Wilkiea, Daucus, Gossypium, Calycanthus, Citrus

Vijayan and Tsou 2010
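
One way to check assembled contigs for barcoding primer sites is a simple in silico search: find the forward primer, then the reverse complement of the reverse primer downstream of it. The sketch below is a minimal illustration only. It uses exact matching (no mismatches or degenerate bases), and the contig and primer sequences are made-up toy data, not the published primers or the talk's method.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_amplicon(template: str, forward: str, reverse: str):
    """Return (start, end) of the region bounded by both primers, or None.

    Only the forward strand of the template is searched, with exact matches.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    hit = template.find(reverse_site, start + len(forward))
    if hit == -1:
        return None
    return start, hit + len(reverse_site)


# Hypothetical toy data: flank + forward site + insert + reverse site + flank.
fwd = "ATGTCACC"
rev = "AGCTACGA"  # its reverse complement is TCGTAGCT
contig = "TTGACC" + fwd + "AGGGTTTAAACCCGGGTT" + reverse_complement(rev) + "AACC"

span = find_amplicon(contig, fwd, rev)
print(span)  # (6, 40)
```

A realistic version would need to tolerate mismatches and search both strands, since contigs can be assembled in either orientation.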

Page 20: Hannah McPherson - Plants Plenary

universal cpSSR primers

[Table: universal cpSSR primers (columns) by taxon (rows); cell contents were not captured in the transcript]

Primers: ccmp1F, ccmp1R, ccmp2F, ccmp2R, ccmp3F, ccmp3R, ccmp4F, ccmp4R, ccmp5F, ccmp5R, ccmp6F, ccmp6R, ccmp7F, ccmp7R, ccmp8F, ccmp8R, ccmp9F, ccmp9R, ccmp10F, ccmp10R

Taxa: Brachychiton, Cinnamomum, Claoxylon, Diploglottis, Doryphora, Pittosporum, Synoum, Toona, Wilkiea, Daucus, Gossypium, Calycanthus, Citrus

Weising and Gardner 1999
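
Chloroplast SSRs are typically short A or T mononucleotide runs, so candidate loci can also be located directly in assembled contigs rather than only through primer sites. The sketch below scans a sequence for such runs. The minimum run length of 8 is an arbitrary assumption for illustration, and the input is toy data.

```python
import re


def find_mononucleotide_repeats(seq: str, min_len: int = 8):
    """Return (start, base, length) for each A or T run of at least min_len."""
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_len, min_len))
    return [
        (m.start(), m.group()[0], len(m.group()))
        for m in pattern.finditer(seq.upper())
    ]


# Hypothetical toy sequence containing one A run and one T run.
toy = "GATTACG" + "A" * 10 + "CGC" + "T" * 9 + "GGA"
print(find_mononucleotide_repeats(toy))  # [(7, 'A', 10), (20, 'T', 9)]
```

Comparing run lengths at the same locus between the N and S assemblies would be one way to look for length polymorphism, subject to the usual caution about sequencing errors in homopolymers.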

Page 21: Hannah McPherson - Plants Plenary

data mining

26S coverage ~35-300

Rpb2 only returned when sequence available in same family or sister family; coverage ~3-5

Resistance genes – good return but coverage ~2-10

Leafy – no returns
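
The coverage figures above are mean read depths. As a reminder of how such a number is usually obtained, the sketch below divides the total mapped bases by the target length. The read count, read length and target length in the example are hypothetical and are not taken from the talk.

```python
def mean_coverage(n_mapped_reads: int, read_length_bp: int, target_length_bp: int) -> float:
    """Mean depth = total mapped bases / length of the target region."""
    if target_length_bp <= 0:
        raise ValueError("target_length_bp must be positive")
    return n_mapped_reads * read_length_bp / target_length_bp


# Hypothetical example: 2,000 reads of 75 bp mapped to a 3,000 bp region.
print(mean_coverage(2000, 75, 3000))  # 50.0
```

High-copy regions such as ribosomal DNA attract many more reads per base than single-copy nuclear genes in a shotgun library, which is consistent with the contrast between 26S and the other loci listed here.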

Page 22: Hannah McPherson - Plants Plenary

data mining

Matches were good

Matches seem to fall in the more conserved regions

Single copy nuclear genes present but at low coverage

Some difficulty retrieving regions, depending on the data available for BLAST

Page 23: Hannah McPherson - Plants Plenary

viability for barcoding

Large portion of the chloroplast genome retrieved and easily assembled even without a reference

Potential for retrieving other regions with increased coverage / carefully designed multiplexing

Page 24: Hannah McPherson - Plants Plenary

to sum up the story so far

We can assemble large portions of chloroplast genomes from whole genomic shotgun sequencing even without a reference

Variation is low and differs from family to family

Single copy nuclear genes present but low coverage?

Is whole/partial chloroplast genome sequencing a viable option for barcoding?

Page 25: Hannah McPherson - Plants Plenary

acknowledgements

Friends of the Botanic Gardens Trust

Southern Cross University – Robert Henry, Nicole Rice, Stirling Bowen

Evolutionary Ecology team at the Royal Botanic Gardens Sydney

Emma McIntosh, Alexander Dohms, Juelian Siow, Ashlee Wakefield
