454/Illumina Marker Gene Studies (rRNA)
High-throughput environmental marker gene studies Holly Bik @Dr_Bik
Photo Credit: J. Baldwin, M. Mundo-Ocampo
High-throughput biodiversity research
• Oceanic sediments (covering >70% of the earth’s surface) harbor the vast majority of the world’s biodiversity
• Microscopic eukaryotes (e.g. nematode worms, protists, fungi) are diverse and abundant in these environments
• The taxonomy and functional roles of these species (likely to be significant in marine ecosystems) are not understood
• Informed mitigation and remediation REQUIRE prior knowledge of biodiversity!
-Omic Dictionary
• Marker gene studies – amplification of a conserved homologous gene (18S, 16S rRNA) from environmental samples
• Metagenomics – shotgun sequencing of random genomic fragments from environmental DNA
• Metatranscriptomics – sequencing of expressed mRNA transcripts from environmental samples
Diverse marine community
1) Extract environmental DNA (EASY)
2) Amplify rRNA (EASY)
3) High-throughput sequencing (EASY)
4) Community analysis (VERY difficult!!)
Base Conservation across Metazoa
[Figure: per-base % identity at each primer position; row labels "Primer Sequence" and "Nematodes", with four values of 100 shown alongside]
SSU_F04 5’-GCTTGTCTCAAAGATTAAGCCC-3’
% identity: 99 98 98 98 98 98 98 98 99 99 99 100 100 99 99 99 99 99 99 99 99 98
SSU_R22 5’-GCCTGCTGCCTTCCTTGGA-3’
% identity: 100 100 100 100 100 100 100 100 100 100 98 100 100 100 90 100 90 100 100
NF1 5’-GGTGGTGCATGGCCGTTCTTAGTT-3’
% identity: 99 100 100 100 99 100 100 100 100 100 99 97 100 99 99 100 100 98 100 100 98 98 100 100
18Sr2b 5’-TACAAAGGGCAGGGACGTAAT-3’
% identity: 100 88 88 88 88 88 88 88 100 98 98 100 99 99 100 100 100 99 100 99 100
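The per-base values can be summarised to show where each primer is least conserved. This is a minimal sketch: the numbers are transcribed from the slide, and pairing each row of values with a primer by matching row length to primer length is an assumption, as is the helper name `summarize`.

```python
# Values transcribed from the slide. Pairing each row of numbers with a
# primer by matching the row length to the primer length is an assumption.
PRIMERS = {
    "SSU_F04": ("GCTTGTCTCAAAGATTAAGCCC",
                [99, 98, 98, 98, 98, 98, 98, 98, 99, 99, 99,
                 100, 100, 99, 99, 99, 99, 99, 99, 99, 99, 98]),
    "SSU_R22": ("GCCTGCTGCCTTCCTTGGA",
                [100] * 10 + [98, 100, 100, 100, 90, 100, 90, 100, 100]),
    "NF1": ("GGTGGTGCATGGCCGTTCTTAGTT",
            [99, 100, 100, 100, 99, 100, 100, 100, 100, 100, 99, 97,
             100, 99, 99, 100, 100, 98, 100, 100, 98, 98, 100, 100]),
    "18Sr2b": ("TACAAAGGGCAGGGACGTAAT",
               [100] + [88] * 7 + [100, 98, 98, 100, 99, 99,
                                   100, 100, 100, 99, 100, 99, 100]),
}


def summarize(primers):
    """Return mean identity, lowest identity and its 1-based positions per primer."""
    summary = {}
    for name, (seq, identity) in primers.items():
        if len(seq) != len(identity):
            raise ValueError(f"{name}: one identity value is needed per base")
        lowest = min(identity)
        summary[name] = {
            "mean": sum(identity) / len(identity),
            "lowest": lowest,
            "positions": [i + 1 for i, v in enumerate(identity) if v == lowest],
        }
    return summary


summary = summarize(PRIMERS)
for name, stats in summary.items():
    print(name, round(stats["mean"], 2), stats["lowest"], stats["positions"])
```

Positions are counted from the 5’ end of each primer as written on the slide.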
Amplification of 18S rRNA
F04/R22 (Region 1): 456 bp
NF1/18Sr2b (Region 2): ~400 bp
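How a primer pair defines an amplicon can be sketched as a simple in silico PCR. This is an illustration only: it uses exact string matching (real primers tolerate mismatches), the function names are hypothetical, and the toy template is invented, so its product is far shorter than the 456 bp reported for Region 1.

```python
# Primer sequences as given on the slide (5'->3').
SSU_F04 = "GCTTGTCTCAAAGATTAAGCCC"
SSU_R22 = "GCCTGCTGCCTTCCTTGGA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplicon(template, forward, reverse):
    """Return the region a primer pair would amplify (primers included), or None.

    Exact matching only; mismatch tolerance is deliberately ignored.
    """
    template = template.upper()
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so on the template
    # strand we look for its reverse complement downstream of the forward primer.
    rc = reverse_complement(reverse)
    end = template.find(rc, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(rc)]


# Hypothetical toy template: flank + F04 + 30-bp insert + revcomp(R22) + flank.
toy_insert = "ACGTTGCA" * 3 + "ACGTAC"
toy_template = "TTTT" + SSU_F04 + toy_insert + reverse_complement(SSU_R22) + "AAAA"
product = amplicon(toy_template, SSU_F04, SSU_R22)
print(len(product))
```

Run against a full-length 18S reference instead of the toy template, the same function would report the expected amplicon length for that taxon, provided both primers match exactly.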
Key Questions
1) How diverse are marine communities of microscopic eukaryotes?
2) How structured are these communities in marine sediments?
3) What has been the effect of anthropogenic disturbance on these communities?
Environmental Taxonomy (18S rRNA)
Deep sea and shallow water marine sediment
1.2 million reads, 454 GS FLX Titanium
Bik et al. (2012), Molecular Ecology
Diverse Communities
[Figure: two ordination plots of the sediment samples, one for 95% clustering (2000 OCTUs) and one for 99% clustering (20,000 OCTUs). Samples: ShallowGulf, ShallowCalif, Atlantic22#1, Atlantic25#2, Atlantic29, Atlantic43, Atlantic45, Pacific128, Pacific237, Pacific321, Pacific422, Pacific528. Axis labels on one plot: PC1 (13.03%), PC2 (12.21%), PC3 (10.54%); on the other: PC1 (14.46%), PC2 (13.32%), PC3 (12.38%).]
**Same grouping patterns were observed using Region 2 of the 18S gene
Deep sea vs. Shallow communities
OTUs clustered at 95% identity
Bik et al. 2012, Molecular Ecology
Introduction of Bias
• Sampling design (replicates, temporal, gear)
• Preservation and Extraction methods
• Primer bias (marker gene studies)
• PCR bias (template composition, inhibitors)
• Sequencing bias (depth of sequencing, platform-specific considerations)
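One common way to reduce the effect of uneven sequencing depth before comparing samples is to subsample every sample to the same number of reads (rarefaction). The slides do not say whether this was done; the sketch below only illustrates the idea, and the function name, sample data, and depth are hypothetical.

```python
import random


def rarefy(otu_counts, depth, seed=0):
    """Randomly subsample one sample's reads, without replacement, to `depth` reads.

    otu_counts maps an OTU id to its read count in a single sample.
    """
    # Expand the counts into one entry per read, then draw `depth` of them.
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    subsample = {}
    for otu in rng.sample(pool, depth):
        subsample[otu] = subsample.get(otu, 0) + 1
    return subsample


# Hypothetical sample with 100 reads spread over three OTUs.
deep_sample = {"OTU_1": 70, "OTU_2": 25, "OTU_3": 5}
rarefied = rarefy(deep_sample, depth=20)
print(rarefied)
```

Rare OTUs can disappear from the subsample, which is the trade-off of this approach.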
Variation in Read Number
[Figure: bar chart of the number of reads (y-axis "No. of Reads", 0 to 400) recovered for each nematode species (x-axis) in the control community; series: 1r_08, 1r_A_09, 1r_B_09, 1r_C_09, 3r_A_09, 3r_B_09, 3r_C_09]
Artificial control community – 1 individual per nematode species
Porazinska et al. 2009, Molecular Ecology Resources
OTUs as ‘Clouds’
[Figure labels: 99% cutoff, 97% cutoff]
How to correlate OTUs with biological species?
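The effect of the identity cutoff on OTU counts can be illustrated with a toy greedy clustering. This is a minimal sketch, not the OTU-picking method used in the studies cited: identity is computed position by position on equal-length sequences (real tools align reads first), and the reads and function names are hypothetical.

```python
def percent_identity(a, b):
    """Percentage of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, cutoff):
    """Put each sequence into the first cluster whose centroid it matches at >= cutoff %."""
    clusters = []  # each cluster is a list; its first member is the centroid
    for seq in seqs:
        for cluster in clusters:
            if percent_identity(seq, cluster[0]) >= cutoff:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


def mutate(seq, positions):
    """Return seq with a guaranteed base change at each given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    bases = list(seq)
    for p in positions:
        bases[p] = swap[bases[p]]
    return "".join(bases)


# Hypothetical 100-bp reads: one reference plus variants with 1, 2 and 3 changes.
reference = "ACGT" * 25
reads = [reference,
         mutate(reference, [0]),
         mutate(reference, [10, 20]),
         mutate(reference, [30, 40, 50])]

for cutoff in (97, 99):
    print(cutoff, len(greedy_cluster(reads, cutoff)))
```

With these toy reads, a stricter cutoff splits one species-level ‘cloud’ into several OTUs, which is why the number of OTUs alone does not translate directly into a number of species.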
OCTU  Reads  OCTU Length  Bit Score  E-Value   Match bp  Total bp  % Similarity  Chimera  DB match
27    63     266          525        e-146     265       265       100           -1       B. seani 175
12    9      265          500        e-138     261       264       98.86         -1       B. seani 175
170   8      264          496        e-137     261       264       98.86         0        B. seani 175
513   1      264          494        e-136     259       262       98.85         -2       B. seani 175
579   2      263          492        e-136     258       261       98.85         -2       B. seani 175
570   1      262          492        e-136     258       261       98.85         -1       B. seani 175
394   1      263          490        e-135     260       264       98.48         1        B. seani 175
19    2      269          488        e-135     264       269       98.14         0        B. seani 175
658   1      266          486        e-134     260       265       98.11         -1       B. seani 175
412   2      264          480        e-132     260       265       98.11         1        B. seani 175
465   9      254          478        e-132     251       254       98.82         0        B. seani 175
1164  1      268          478        e-132     261       267       97.75         -1       B. seani 175
304   1      261          474        e-130     255       260       98.08         -1       B. seani 175
868   1      244          460        e-126     242       245       98.78         1        B. seani 175
514   2      274          458        e-126     263       272       96.69         -2       B. seani 175
683   1      250          426        e-116     241       249       96.79         -1       B. seani 175
627   1      230          422        e-115     223       226       98.67         -4       B. seani 175
171   3      212          400        e-108     209       211       99.05         -1       B. seani 175
1223  1      202          355        5.00E-95  198       204       97.06         2        B. seani 175
[Row annotations on the slide: Head, Tail]
Porazinska et al. 2010, Zootaxa
Head-Tail Pattern in Nematode OTUs
[Figure labels: Head, Tail]
Artificial control community containing known nematode species, all with corresponding full-length reference 18S sequences
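The head-tail pattern can be made concrete with the B. seani 175 rows from the OCTU table. The sketch below reads ‘head’ as the most-read OCTU for a reference and ‘tail’ as the remaining OCTUs matching the same reference; that reading of the slide labels is an assumption, and the function name is hypothetical.

```python
from collections import defaultdict

# (OCTU id, reads, % similarity, DB match), transcribed from the slide's table.
ROWS = [
    (27, 63, 100.0, "B. seani 175"), (12, 9, 98.86, "B. seani 175"),
    (170, 8, 98.86, "B. seani 175"), (513, 1, 98.85, "B. seani 175"),
    (579, 2, 98.85, "B. seani 175"), (570, 1, 98.85, "B. seani 175"),
    (394, 1, 98.48, "B. seani 175"), (19, 2, 98.14, "B. seani 175"),
    (658, 1, 98.11, "B. seani 175"), (412, 2, 98.11, "B. seani 175"),
    (465, 9, 98.82, "B. seani 175"), (1164, 1, 97.75, "B. seani 175"),
    (304, 1, 98.08, "B. seani 175"), (868, 1, 98.78, "B. seani 175"),
    (514, 2, 96.69, "B. seani 175"), (683, 1, 96.79, "B. seani 175"),
    (627, 1, 98.67, "B. seani 175"), (171, 3, 99.05, "B. seani 175"),
    (1223, 1, 97.06, "B. seani 175"),
]


def head_and_tail(rows):
    """Group OCTUs by reference match; the most-read OCTU is the head, the rest the tail."""
    groups = defaultdict(list)
    for octu, reads, similarity, match in rows:
        groups[match].append((octu, reads, similarity))
    result = {}
    for match, members in groups.items():
        members.sort(key=lambda m: m[1], reverse=True)  # most reads first
        head, tail = members[0], members[1:]
        result[match] = {
            "head_octu": head[0],
            "head_reads": head[1],
            "tail_octus": len(tail),
            "tail_reads": sum(m[1] for m in tail),
        }
    return result


pattern = head_and_tail(ROWS)
print(pattern["B. seani 175"])
```

For this single species, one high-read OCTU with a 100% match is accompanied by many low-read OCTUs at lower similarity, so counting every OCTU as a species would overestimate diversity.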
Assigning Taxonomy to OTUs
• BLAST approaches: accuracy is critically dependent on reference databases
• Eukaryote sequence databases are patchy and sparsely sampled
SILVA 108 Ref rRNA Database (16S/18S)
Bacteria 530,197
Archaea 25,658
Eukaryotes 62,587
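The imbalance in the reference database is easier to see as proportions. A small sketch using only the three counts listed above; the variable names are illustrative.

```python
# Sequence counts for SILVA 108 Ref as listed on the slide.
silva_108 = {"Bacteria": 530197, "Archaea": 25658, "Eukaryotes": 62587}

total = sum(silva_108.values())
shares = {domain: 100.0 * count / total for domain, count in silva_108.items()}

for domain, share in shares.items():
    print(f"{domain}: {share:.1f}% of {total} sequences")
```

Eukaryotes make up roughly 10% of the reference sequences, which is part of why BLAST-based assignment is less reliable for 18S than for 16S data.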
Errors vs. Rare Taxa
• Chimeras – hybrid sequences formed during PCR that do not exist in nature
• ‘Jumping off points’ in conserved amplicon regions
• Mostly low-read OTUs restricted to single samples
How do we separate the ‘rare biosphere’ from erroneous sequences?
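A first pass at this question is often just to flag the OTUs that fit the description above: few reads, found in a single sample. The sketch below does only that; it is a heuristic and not a method from the slides, the threshold `max_reads` and the toy table are hypothetical, and a flagged OTU may still be a genuine rare taxon.

```python
def flag_suspect_otus(otu_table, max_reads=1):
    """Return ids of OTUs with at most `max_reads` reads that occur in one sample only.

    otu_table maps an OTU id to a {sample name: read count} dictionary.
    """
    suspects = []
    for otu, per_sample in otu_table.items():
        present = {s: n for s, n in per_sample.items() if n > 0}
        if len(present) == 1 and sum(present.values()) <= max_reads:
            suspects.append(otu)
    return suspects


# Hypothetical OTU table with three samples.
toy_table = {
    "OTU_1": {"Atlantic": 120, "Pacific": 95, "Gulf": 40},  # abundant, widespread
    "OTU_2": {"Atlantic": 1, "Pacific": 0, "Gulf": 0},      # singleton, one sample
    "OTU_3": {"Atlantic": 0, "Pacific": 1, "Gulf": 1},      # rare but in two samples
    "OTU_4": {"Atlantic": 0, "Pacific": 0, "Gulf": 7},      # one sample, more reads
}
print(flag_suspect_otus(toy_table))
```

Flagged OTUs would then need a separate check, such as chimera detection or phylogenetic placement, before being discarded.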
Important Challenges
Phylogenetic
– rRNA data need to be interpreted in a phylogenetic context, but eukaryotic guide trees are not comprehensive
– Phylogenetic placement of short sequences can help you identify taxon sampling problems in the reference dataset that would not be obvious by BLAST searches
BLAST vs. Phylogeny
Explicitly Phylogenetic Approaches
Aligned OTU sequences
Guide Tree
Evolutionary Placement of short reads
Edge PCoA
Taxonomy assignment, Exploiting head-tail patterns
Community ‘fingerprints’
http://phylosift.wordpress.com @PhyloSift
Development of new tools
How does OTU picking affect biological interpretations of sequence data?
Shift towards Illumina… processing 10x as much data?!
Visualization
Visual tools for enabling novel scientific discovery
[Figure axes: OTUs / Species; Sample Sites; Abundance (vertical)]
Important Challenges
Metadata
– GenBank’s Short Read Archive is not accessible
– MOTUs (Molecular Operational Taxonomic Units) are arbitrary constructions
Pressing need for open access database resources for metadata analysis and comparative studies
Tools for Computational Analysis
QIIME is popular and easy to use – available on Amazon Cloud if researchers don’t have local bioinformatic facilities
Acknowledgements
UC Davis
• Jonathan Eisen
• Aaron Darling
• Guillaume Jospin
Former Lab Members
• W. Kelley Thomas (Univ. of New Hampshire)
• Way Sung (Univ. of New Hampshire)
• Feseha Abebe-Akele (Univ. of New Hampshire)
Collaborators
• Simon Creer (Univ. of Wales, Bangor)
• Vera Fonseca (Univ. of Wales, Bangor)
• Dorota Porazinska (Univ. of Florida)
• Robin Giblin-Davis (Univ. of Florida)
• Jyotsna Sharma (University of Texas, San Antonio)
• Ken Halanych (Auburn University)