Informatics Infrastructure at the start of the Second Decade of DNA Barcoding
SUJEEVAN RATNASINGHAM
BIODIVERSITY INSTITUTE OF ONTARIO, UNIVERSITY OF GUELPH
Building the Library
[Figure: growth of the library, 2004 to 2014; y-axis ticks 10,000 to 10,000,000; series: barcodes, Linnean species, BINs]
Community Benchmarks
Categories: Taxonomic, Thematic, Geographic
Examples shown: Spiders, Birds, Lepidoptera of North America, Birds of Argentina, IUCN Red List, CITES, Fish, Bees, Amphibians, Mammals, Mosquitoes
Collaborative Networks
[Figure: Registered Users (Thousands), 2004 to 2014; y-axis 0 to 16]
Data Sharing
2005 – 102 users from 30 institutions
1000+ Institutions from 94 countries sharing data on BOLD
BOLD User Network - 2015
[Figure: network of data sharing; axes: Across Nations, Within Nations; legend: 100K+, 10K – 100K, 1K – 10K]
Canada, France, USA, Germany, Costa Rica, United Kingdom, Switzerland
Finland, Kenya, Belgium, Madagascar, Norway, Austria, Sweden, Jordan, Argentina, China, Brazil, Spain, Mexico, Panama, Portugal, Pakistan, Egypt, South Africa, India, New Zealand, Netherlands
66 Other Countries from Every Continent
Testing the Library Depth
BBC Tree of Life, 2014
• 400K+ species
• 163K+ with full taxonomy
• 80% of the BOLD library
• 500+ orders
Animals
Test Data: 4000 species from 200 orders, 20 per order
[Figure: top match similarity by year, 2005 to 2015; y-axis 0 to 100; series: >98%, >95%, >92%, >90%]
[Figure: two PCA plots (PC1 vs. PC2), one for K-mers (k=3) and one for Amino Acid Composition; points grouped by order: Coleoptera, Diptera, Ephemeroptera, Hemiptera, Hymenoptera, Lepidoptera, Plecoptera, Thysanoptera, Trichoptera]
Ratnasingham, Ma, Hebert, in prep.
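The k-mer (k=3) representation named on this slide can be illustrated with a minimal sketch. It is not the authors' pipeline (that work is cited as in prep.); the function name `kmer_profile`, the A/C/G/T alphabet, and the handling of ambiguous bases are assumptions made for illustration. Each sequence becomes a 64-value vector of relative 3-mer frequencies, which is the kind of input a PCA could then project onto PC1 and PC2.

```python
from itertools import product


def kmer_profile(sequence, k=3, alphabet="ACGT"):
    """Return relative k-mer frequencies in a fixed, alphabetical k-mer order.

    Hypothetical helper for illustration; not part of BOLD.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    # Slide a window of length k along the sequence.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with gaps or ambiguity codes
            counts[window] += 1
    total = sum(counts.values())
    # All-zero vector if no valid window was found.
    return [counts[km] / total if total else 0.0 for km in kmers]


# Toy input, not real barcode data.
profile = kmer_profile("ACGTACGTNACG")
print(len(profile))
```

Stacking one such vector per barcode gives a matrix with 64 columns; the ordination step itself is left out here to keep the sketch free of third-party dependencies.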
Barcode Index Number (BIN)
Algorithm
• Tuned to the marker (COX1)
• Fixed parameters for balanced OTU generation
• Uses prior threshold but refines for each group
Registry
• Occurrence of DNA Barcode (place and time)
• Aggregation of all associated metadata
• Reusable - works across studies
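The "prior threshold" idea in the algorithm bullets can be sketched as threshold-based single-linkage clustering. This is a simplified illustration only, not the BIN algorithm itself: the per-group refinement step is not shown, and the function names, the p-distance measure, and the threshold value used below are all assumptions.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only sites where both sequences have an unambiguous base.
    sites = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not sites:
        return 1.0
    return sum(x != y for x, y in sites) / len(sites)


def single_linkage_otus(seqs, threshold):
    """Group sequence indices into OTUs: pairs within `threshold` are linked."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if p_distance(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy aligned sequences, not real COX1 data; the threshold is illustrative.
toy = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT", "CCCCCCCCCC"]
print(single_linkage_otus(toy, threshold=0.1))
```

Single linkage chains neighbours together, so the first three toy sequences end up in one OTU even though the first and third differ by more than the threshold. That chaining is one reason a fixed threshold alone is usually followed by a refinement step.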
[Figure: bar chart by group (Mammalia, Birds, Insecta, Fish, Araneae, Mollusca, Plants, Fungi); axis 0 to 300,000]
Importance of registering OTUs
[Figure: counts (0 to 60,000) by latitudinal range; series: Species, Barcode Index Numbers, Unregistered OTUs]
LifeScanner Solution Overview
[Diagram: Sample Collection; Prep, PCR, Sequencing (Partner Sequencing Labs); ID Engine; Species Identification]
Embrace Big Data
[Figure: Impact vs. Complexity (Data volume & Dimensionality)]
• Reporting: What happened?
• Analysis: Why did it happen?
• Monitoring: What is happening?
• Forecasting: What might happen?
Community Curated Libraries
Tier 1
• Purpose generated & reference specimens available
• Barcode compliant & consistent
• Key species (e.g. Dirty 22, Domesticated & Bush meat)
Tier 2
• Curated for consistency in taxonomic assignments
• Barcode compliant & consistent
• CITES/REDLIST (e.g. Endangered & controlled species)
Tier 3
• Mined from BOLD
• Limited verification and only to be used as last resort
• Disease vectors & invasive species
Percentages shown on the slide: 78%, 20%, 25%
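One way to read "only to be used as last resort" is as a lookup that falls back through the tiers in order. The sketch below illustrates that reading under stated assumptions: `identify`, `exact_match`, and the toy libraries are hypothetical, and a real ID engine would use sequence similarity rather than exact matching.

```python
def identify(query, tiers, match):
    """Return (tier_name, hit) from the first tier that yields a hit.

    `tiers` is an ordered list of (name, library) pairs, most trusted first.
    `match` is any function returning a hit or None. Both are hypothetical.
    """
    for name, library in tiers:
        hit = match(query, library)
        if hit is not None:
            return name, hit
    return None, None


def exact_match(query, library):
    """Toy matcher: exact sequence lookup in a dict of sequence -> taxon."""
    return library.get(query)


# Toy libraries with made-up sequences and labels.
tiers = [
    ("Tier 1", {"ACGTAC": "species A"}),
    ("Tier 2", {"TTGACA": "species B"}),
    ("Tier 3", {"GGCCAA": "species C"}),
]

print(identify("TTGACA", tiers, exact_match))
```

Returning the tier name alongside the hit lets a downstream report state how much verification stands behind each identification.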
Community Defined Metadata Extensions
Rougerie R, Smith AM, Fernandez-Triana J, Lopez-Vaamonde C, Ratnasingham S, Hebert PDN. 2011. Molecular analysis of parasitoid linkages (MAPL): gut contents of adult parasitoid wasps reveal larval host. Mol Ecol 20:179-186.
More Analytical Tools
BOLD4 – Some other features
• Checklist Support (synonyms, progress, shopping lists)
• Data portal for core facilities
• Complete record histories
• RESL algorithm on your own datasets
• Storage and analysis of pre-clustered NGS data
Support for Metabarcoding
mBRAVE: Metabarcoding Research And Visualization Environment
Linkages & Partners