454/Illumina Marker Gene Studies (rRNA)

High-throughput environmental marker gene studies
Holly Bik @Dr_Bik
Photo Credit: J. Baldwin, M. Mundo-Ocampo

Description: Presentation on high-throughput marker gene studies at the UC Davis Bits & Bites lunchtime discussion group (4/19/2012)

Transcript of 454/Illumina Marker Gene Studies (rRNA)

Page 1: 454/Illumina Marker Gene Studies (rRNA)

Page 2: 454/Illumina Marker Gene Studies (rRNA)

High-throughput biodiversity research

• Oceanic sediments (covering >70% of the Earth’s surface) harbor the vast majority of the world’s biodiversity

• Microscopic eukaryotes (e.g. nematode worms, protists, fungi) are diverse and abundant in these environments

• The taxonomy and functional roles of these species (likely to be significant in marine ecosystems) are poorly understood

• Informed mitigation and remediation REQUIRE prior knowledge of biodiversity!

Page 3: 454/Illumina Marker Gene Studies (rRNA)

-Omic Dictionary

• Marker gene studies – amplification of a conserved homologous gene (18S, 16S rRNA) from environmental samples

• Metagenomics – shotgun sequencing of random genomic fragments from environmental DNA

• Metatranscriptomics – sequencing of expressed mRNA transcripts from environmental samples

Page 4: 454/Illumina Marker Gene Studies (rRNA)

Diverse marine community

Extract Environmental DNA – EASY

Amplify rRNA – EASY

High-throughput sequencing – EASY

Community analysis – VERY Difficult!!

Page 5: 454/Illumina Marker Gene Studies (rRNA)

Amplification of 18S rRNA

Primer sequences (Nematodes) with base conservation across Metazoa (% identity at each primer position)

F04/R22 (Region 1): 456 bp
• SSU_F04: 5’-GCTTGTCTCAAAGATTAAGCCC-3’
  % identity: 99 98 98 98 98 98 98 98 99 99 99 100 100 99 99 99 99 99 99 99 99 98
• SSU_R22: 5’-GCCTGCTGCCTTCCTTGGA-3’
  % identity: 100 100 100 100 100 100 100 100 100 100 98 100 100 100 90 100 90 100 100

NF1/18Sr2b (Region 2): ~400 bp
• NF1: 5’-GGTGGTGCATGGCCGTTCTTAGTT-3’
  % identity: 99 100 100 100 99 100 100 100 100 100 99 97 100 99 99 100 100 98 100 100 98 98 100 100
• 18Sr2b: 5’-TACAAAGGGCAGGGACGTAAT-3’
  % identity: 100 88 88 88 88 88 88 88 100 98 98 100 99 99 100 100 100 99 100 99 100
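The slide reports, for each primer, the % identity of every base across Metazoa. A minimal sketch of how such a per-position profile could be computed, assuming the primer-binding sites have already been extracted and aligned to the primer (same length, no gaps). The function name `per_base_identity` and the toy sites are hypothetical illustrations, not the study's data or code.

```python
def per_base_identity(primer: str, sites: list[str]) -> list[float]:
    """Percent of aligned target sites that match the primer at each position.

    Assumes every site is already aligned to the primer (same length, no gaps).
    """
    primer = primer.upper()
    if not sites:
        raise ValueError("need at least one target site")
    for site in sites:
        if len(site) != len(primer):
            raise ValueError("each site must be aligned to the primer length")
    identities = []
    for i, base in enumerate(primer):
        matches = sum(1 for site in sites if site[i].upper() == base)
        identities.append(round(100.0 * matches / len(sites), 1))
    return identities


NF1 = "GGTGGTGCATGGCCGTTCTTAGTT"  # primer sequence from the slide

# Hypothetical aligned binding sites (toy data, NOT real Metazoa sequences)
toy_sites = [
    "GGTGGTGCATGGCCGTTCTTAGTT",
    "GGTGGTGCATGGCCGTTCTTAGTT",
    "GGTGGTGCATGGCCGTTCTTGGTT",  # mismatch at position 21
    "GGTGGTGCACGGCCGTTCTTAGTT",  # mismatch at position 10
]

identity = per_base_identity(NF1, toy_sites)
print(min(identity))  # lowest per-base conservation: 75.0
```

Low-identity positions near a primer's 3’ end are usually the ones to watch, since mismatches there tend to hurt amplification most.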

Page 6: 454/Illumina Marker Gene Studies (rRNA)

Key Questions

1) How diverse are marine communities of microscopic eukaryotes?

2) How structured are these communities in marine sediments?

3) What has been the effect of anthropogenic disturbance on these communities?

Page 7: 454/Illumina Marker Gene Studies (rRNA)

Environmental Taxonomy (18S rRNA)

Deep-sea and shallow-water marine sediments: 1.2 million reads, 454 GS FLX Titanium (Bik et al. 2012, Molecular Ecology)

Page 8: 454/Illumina Marker Gene Studies (rRNA)

Diverse Communities

Page 9: 454/Illumina Marker Gene Studies (rRNA)

Deep sea vs. Shallow communities

[Figure: three-axis ordination plots (PC1, PC2, PC3) of sediment communities at 95% clustering (2000 OCTUs) and 99% clustering (20,000 OCTUs). Samples: ShallowGulf, ShallowCalif, Atlantic22#1, Atlantic25#2, Atlantic29, Atlantic43, Atlantic45, Pacific128, Pacific237, Pacific321, Pacific422, Pacific528. Variance explained: PC1 13.03%, PC2 12.21%, PC3 10.54% in one panel; PC1 14.46%, PC2 13.32%, PC3 12.38% in the other.]

**Same grouping patterns were observed using Region 2 of the 18S gene

Bik et al. 2012, Molecular Ecology
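The slide does not state which distance measure underlies the ordination. As one common option for comparing community composition between samples, here is a minimal Bray-Curtis sketch; the resulting sample-by-sample matrix is the kind of input an ordination such as PCoA takes (the ordination itself is not shown). Sample names and counts are hypothetical, not data from Bik et al. 2012.

```python
def bray_curtis(a: list[int], b: list[int]) -> float:
    """Bray-Curtis dissimilarity between two OTU count vectors (same OTU order)."""
    if len(a) != len(b):
        raise ValueError("samples must share the same OTU order")
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def distance_matrix(samples: dict[str, list[int]]) -> dict[tuple[str, str], float]:
    """All pairwise Bray-Curtis dissimilarities, keyed by (sample, sample)."""
    names = list(samples)
    return {(p, q): bray_curtis(samples[p], samples[q]) for p in names for q in names}


# Hypothetical OTU counts (toy data, not from the study)
samples = {
    "deep_1": [10, 0, 5, 5],
    "deep_2": [8, 2, 5, 5],
    "shallow_1": [0, 20, 0, 0],
}
dist = distance_matrix(samples)
print(dist[("deep_1", "deep_2")])     # 0.1 (similar communities)
print(dist[("deep_1", "shallow_1")])  # 1.0 (no shared OTUs)
```

Values range from 0 (identical composition) to 1 (no OTUs in common), so deep and shallow samples separating in ordination space corresponds to large between-group dissimilarities.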

Page 10: 454/Illumina Marker Gene Studies (rRNA)

OTUs clustered at 95% identity (Bik et al. 2012, Molecular Ecology)

Page 11: 454/Illumina Marker Gene Studies (rRNA)

Introduction of Bias

• Sampling design (replicates, temporal, gear)

• Preservation and Extraction methods

• Primer bias (marker gene studies)

• PCR bias (template composition, inhibitors)

• Sequencing bias (depth of sequencing, platform-specific considerations)
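Sequencing depth differs between samples. One widely used (and debated) way to compare samples at equal depth is rarefaction: random subsampling of reads without replacement down to a common depth. A minimal sketch; the function name `rarefy`, the depth, and the counts are hypothetical, and the slide does not prescribe this particular correction.

```python
import random
from collections import Counter


def rarefy(counts: dict[str, int], depth: int, seed: int = 0) -> Counter:
    """Randomly subsample reads without replacement down to a fixed depth."""
    # Expand per-OTU counts into one entry per read
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)  # seeded so the subsample is reproducible
    return Counter(rng.sample(pool, depth))


# Hypothetical per-OTU read counts for one deeply sequenced sample
sample = {"OTU_1": 500, "OTU_2": 300, "OTU_3": 150, "OTU_4": 50}
subsampled = rarefy(sample, depth=100)
print(sum(subsampled.values()))  # 100
```

Rarefying discards data, so rare OTUs may drop out of the subsample; that trade-off is part of why the approach is debated.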

Page 12: 454/Illumina Marker Gene Studies (rRNA)

Variation in Read Number

[Figure: bar chart of No. of Reads (0 to 400) per nematode species (e.g. Halicephalobus n. sp. 696, B. seani 175, C. elegans, Trichodorus sp., Paractinolaimus sp.) for samples 1r_08, 1r_A_09, 1r_B_09, 1r_C_09, 3r_A_09, 3r_B_09 and 3r_C_09]

Artificial control community – 1 individual per nematode species (Porazinska et al. 2009, Molecular Ecology Resources)

Page 13: 454/Illumina Marker Gene Studies (rRNA)

OTUs as ‘Clouds’

[Figure: OTU clouds at 99% and 97% cutoffs]

How to correlate OTUs with biological species?
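The choice of cutoff changes how many OTUs the same reads produce. The slide does not specify a clustering algorithm; this is a minimal sketch of abundance-sorted greedy centroid clustering (in the spirit of tools such as UCLUST or CD-HIT, but much simplified). It assumes pre-aligned, equal-length reads and measures identity as the fraction of matching positions, whereas real tools align reads first. All names and sequences are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length, pre-aligned reads."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes pre-aligned, equal-length reads")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads: dict[str, int], threshold: float) -> dict[str, list[str]]:
    """Process reads from most to least abundant; each read joins the first
    centroid it matches at >= threshold identity, otherwise it seeds a new OTU."""
    otus: dict[str, list[str]] = {}
    for seq in sorted(reads, key=reads.get, reverse=True):
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid].append(seq)
                break
        else:
            otus[seq] = [seq]
    return otus


def mutate(seq: str, positions: list[int]) -> str:
    """Introduce substitutions at the given positions (toy data generator)."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


base = "ACGT" * 25  # hypothetical 100 bp read
reads = {
    base: 50,
    mutate(base, [0]): 5,                     # 99% identical to base
    mutate(base, [10, 20]): 3,                # 98% identical to base
    mutate(base, [30, 40, 50, 60, 70]): 2,    # 95% identical to base
}
for cutoff in (0.99, 0.97):
    print(cutoff, len(greedy_otus(reads, cutoff)))  # 0.99 -> 3 OTUs, 0.97 -> 2 OTUs
```

The same four sequences give three OTUs at a 99% cutoff but two at 97%, which is why OTUs do not map one-to-one onto biological species.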

Page 14: 454/Illumina Marker Gene Studies (rRNA)

Head-Tail Pattern in Nematode OTUs

Artificial control community containing known nematode species, all with corresponding full-length reference 18S sequences (Porazinska et al. 2010, Zootaxa). BLAST matches of OCTUs to the reference B. seani 175; the high-read top OCTU forms the ‘Head’ and the many low-read OCTUs below it form the ‘Tail’:

OCTU  Reads  OCTU Length  Bit Score  E-Value   Match bp  Total bp  % Similarity  Chimera  DB match
27    63     266          525        e-146     265       265       100           -1       B. seani 175
12    9      265          500        e-138     261       264       98.86         -1       B. seani 175
170   8      264          496        e-137     261       264       98.86         0        B. seani 175
513   1      264          494        e-136     259       262       98.85         -2       B. seani 175
579   2      263          492        e-136     258       261       98.85         -2       B. seani 175
570   1      262          492        e-136     258       261       98.85         -1       B. seani 175
394   1      263          490        e-135     260       264       98.48         1        B. seani 175
19    2      269          488        e-135     264       269       98.14         0        B. seani 175
658   1      266          486        e-134     260       265       98.11         -1       B. seani 175
412   2      264          480        e-132     260       265       98.11         1        B. seani 175
465   9      254          478        e-132     251       254       98.82         0        B. seani 175
1164  1      268          478        e-132     261       267       97.75         -1       B. seani 175
304   1      261          474        e-130     255       260       98.08         -1       B. seani 175
868   1      244          460        e-126     242       245       98.78         1        B. seani 175
514   2      274          458        e-126     263       272       96.69         -2       B. seani 175
683   1      250          426        e-116     241       249       96.79         -1       B. seani 175
627   1      230          422        e-115     223       226       98.67         -4       B. seani 175
171   3      212          400        e-108     209       211       99.05         -1       B. seani 175
1223  1      202          355        5.00E-95  198       204       97.06         2        B. seani 175
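The % Similarity column is consistent with Match bp divided by Total bp (e.g. 261/264 gives 98.86). A minimal sketch that recomputes it and splits hits into a ‘head’ and ‘tail’, using a subset of rows from the Page 14 table. Treating the highest-read OCTU as the head is an assumption of this sketch, and the function names are hypothetical.

```python
def percent_similarity(match_bp: int, total_bp: int) -> float:
    """Matching bases over aligned bases, as a percentage (2 decimal places)."""
    return round(100.0 * match_bp / total_bp, 2)


def split_head_tail(hits):
    """Treat the OCTU with the most reads as the 'head'; the rest form the 'tail'."""
    head = max(hits, key=lambda hit: hit[1])
    tail = [hit for hit in hits if hit is not head]
    return head, tail


# (OCTU, reads, match bp, total bp) for a subset of rows from the Page 14 table
hits = [
    (27, 63, 265, 265),
    (12, 9, 261, 264),
    (170, 8, 261, 264),
    (465, 9, 251, 254),
    (171, 3, 209, 211),
]
head, tail = split_head_tail(hits)
print(head[0], percent_similarity(head[2], head[3]))  # 27 100.0
print(sum(hit[1] for hit in tail))                    # 29
print([percent_similarity(h[2], h[3]) for h in tail]) # [98.86, 98.86, 98.82, 99.05]
```

All tail OCTUs still match the same reference at roughly 97 to 99% similarity, which is what makes a single species look like many OTUs.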

Page 15: 454/Illumina Marker Gene Studies (rRNA)

Assigning Taxonomy to OTUs

• BLAST approaches: accuracy is critically dependent on reference databases

• Eukaryote sequence databases are patchy and sparsely sampled

SILVA 108 Ref rRNA Database (16S/18S)

Bacteria 530,197

Archaea 25,658

Eukaryotes 62,587
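The counts on the slide make the imbalance concrete. A short calculation of each domain's share of the SILVA 108 Ref database, using only the numbers listed above:

```python
# Reference sequence counts per domain, as listed on the slide (SILVA 108 Ref)
silva_108_ref = {"Bacteria": 530_197, "Archaea": 25_658, "Eukaryotes": 62_587}

total = sum(silva_108_ref.values())
shares = {domain: round(100 * n / total, 1) for domain, n in silva_108_ref.items()}
print(total)   # 618442
print(shares)  # {'Bacteria': 85.7, 'Archaea': 4.1, 'Eukaryotes': 10.1}
```

Roughly one reference sequence in ten is eukaryotic, despite the eukaryotic diversity these sediment studies target.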

Page 16: 454/Illumina Marker Gene Studies (rRNA)

Errors vs. Rare Taxa

• Chimeras – hybrid sequences formed during PCR that do not exist in nature

• ‘Jumping off points’ in conserved amplicon regions

• Mostly low-read OTUs restricted to single samples

How do we separate the ‘rare biosphere’ from erroneous sequences?
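The pattern described above (low-read OTUs restricted to single samples) can be turned into a simple filter. A minimal sketch; the `max_reads` threshold and all names are hypothetical, and flagged OTUs are only candidates: some may be genuine rare taxa, which is exactly the ambiguity the slide raises.

```python
def flag_candidate_errors(otu_table, max_reads=2):
    """Return OTUs with total reads <= max_reads that occur in exactly one sample.

    otu_table maps OTU id -> {sample id: read count}.
    """
    flagged = []
    for otu, per_sample in otu_table.items():
        present = [s for s, n in per_sample.items() if n > 0]
        total = sum(per_sample.values())
        if total <= max_reads and len(present) == 1:
            flagged.append(otu)
    return flagged


# Hypothetical OTU table
toy = {
    "OTU_1": {"site_A": 120, "site_B": 95},
    "OTU_2": {"site_A": 1, "site_B": 0},   # low-read, single sample
    "OTU_3": {"site_A": 1, "site_B": 1},   # low-read, but seen in two samples
    "OTU_4": {"site_A": 0, "site_B": 40},  # single sample, but abundant
}
print(flag_candidate_errors(toy))  # ['OTU_2']
```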

Page 17: 454/Illumina Marker Gene Studies (rRNA)

Important Challenges

Phylogenetic – rRNA data need to be interpreted in a phylogenetic context, but eukaryotic guide trees are not comprehensive

– Phylogenetic placement of short sequences can help you identify taxon sampling problems in the reference dataset that would not be obvious from BLAST searches

Page 18: 454/Illumina Marker Gene Studies (rRNA)

BLAST vs. Phylogeny

Page 19: 454/Illumina Marker Gene Studies (rRNA)

Explicitly Phylogenetic Approaches

Aligned OTU sequences

Guide Tree

Evolutionary Placement of short reads

Edge PCoA

Taxonomy assignment, exploiting head-tail patterns

Community ‘fingerprints’

Page 20: 454/Illumina Marker Gene Studies (rRNA)

http://phylosift.wordpress.com @PhyloSift

Page 21: 454/Illumina Marker Gene Studies (rRNA)

Development of new tools

How does OTU picking affect biological interpretations of sequence data?

Shift towards Illumina… processing 10x as much data?!
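The slide does not name a strategy for handling larger Illumina datasets. One common step in amplicon pipelines is dereplication: collapsing identical reads into a single sequence with a count, so that downstream clustering works on unique sequences only. A minimal sketch with hypothetical reads; the function name is hypothetical.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical reads into (sequence, count) pairs, most abundant first."""
    return Counter(reads).most_common()


# Hypothetical reads
reads = ["ACGTAC", "ACGTAC", "ACGTTC", "ACGTAC", "TTGTAC", "ACGTTC"]
unique = dereplicate(reads)
print(unique)  # [('ACGTAC', 3), ('ACGTTC', 2), ('TTGTAC', 1)]
print(f"{len(reads)} reads -> {len(unique)} unique sequences")
```

The saving grows with depth, because deeply sequenced amplicons contain many exact duplicates.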

Page 22: 454/Illumina Marker Gene Studies (rRNA)

Visualization

Visual tools for enabling novel scientific discovery

[Figure: axes OTUs / Species, Sample Sites, and Abundance (vertical)]
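A plot with OTUs, sample sites and abundance as its axes is driven by an OTU-by-sample abundance table. A minimal sketch of building one from per-read (sample, OTU) assignments; the function name and data are hypothetical.

```python
from collections import defaultdict


def build_otu_table(assignments):
    """Count reads per (OTU, sample) from (sample, OTU) read assignments."""
    table = defaultdict(lambda: defaultdict(int))
    for sample, otu in assignments:
        table[otu][sample] += 1
    return table


# Hypothetical read assignments: (sample site, OTU)
assignments = [
    ("site_1", "OTU_A"), ("site_1", "OTU_A"), ("site_1", "OTU_B"),
    ("site_2", "OTU_A"), ("site_2", "OTU_C"), ("site_2", "OTU_C"),
]
table = build_otu_table(assignments)
samples = sorted({s for s, _ in assignments})
for otu in sorted(table):
    print(otu, [table[otu][s] for s in samples])
# OTU_A [2, 1]
# OTU_B [1, 0]
# OTU_C [0, 2]
```

Each printed row is one OTU and each column one site; the counts are the bar heights along the vertical abundance axis.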

Page 23: 454/Illumina Marker Gene Studies (rRNA)

Important Challenges

Metadata – NCBI’s Short Read Archive is not accessible

– MOTUs (Molecular Operational Taxonomic Units) are arbitrary constructions

Pressing need for open-access database resources for metadata analysis and comparative studies

Page 24: 454/Illumina Marker Gene Studies (rRNA)
Page 25: 454/Illumina Marker Gene Studies (rRNA)

Tools for Computational Analysis

QIIME is popular and easy to use, and is available on the Amazon Cloud for researchers who don’t have local bioinformatics facilities

Page 26: 454/Illumina Marker Gene Studies (rRNA)

Acknowledgements

UC Davis
• Jonathan Eisen
• Aaron Darling
• Guillaume Jospin

Former Lab Members
• W. Kelley Thomas (Univ. of New Hampshire)
• Way Sung (Univ. of New Hampshire)
• Feseha Abebe-Akele (Univ. of New Hampshire)

Collaborators
• Simon Creer (Univ. of Wales, Bangor)
• Vera Fonseca (Univ. of Wales, Bangor)
• Dorota Porazinska (Univ. of Florida)
• Robin Giblin-Davis (Univ. of Florida)
• Jyotsna Sharma (University of Texas, San Antonio)
• Ken Halanych (Auburn University)

Holly Bik
Add Logos for Sloan, DHS