Dr David Schindel and Mike Trizna - BOL Data Portal
![Page 1: Dr David Schindel and Mike Trizna - BOL Data Portal](https://reader036.fdocuments.us/reader036/viewer/2022062513/556253d0d8b42a6c368b51c5/html5/thumbnails/1.jpg)
The Barcode of Life Data Portal
(http://bol.uvm.edu)
Dr. David E Schindel, Executive Secretary
Michael Trizna, Database Specialist
Consortium for the Barcode of Life (CBOL)
Smithsonian Institution
Washington, DC
www.barcodeoflife.org
![Page 2: Dr David Schindel and Mike Trizna - BOL Data Portal](https://reader036.fdocuments.us/reader036/viewer/2022062513/556253d0d8b42a6c368b51c5/html5/thumbnails/2.jpg)
Contents of Presentation
Crowd-sourced open source software
How does Data Portal complement BOLD and GenBank?
Data Portal capabilities
Case Study: Smithsonian frozen bird tissue project
![Page 3: Dr David Schindel and Mike Trizna - BOL Data Portal](https://reader036.fdocuments.us/reader036/viewer/2022062513/556253d0d8b42a6c368b51c5/html5/thumbnails/3.jpg)
An Experiment in Museum Tissue Mining and Fast Data Release
Tissue sampling winter/spring
Sequencing completed in September
Sequence quality control in October
Taxonomic checking in early November
– Obvious errors removed
– Minor discrepancies remain
Data released for Adelaide Conference
– Crowd-sourced annotation by community
– Will data be mis-used?
![Page 4: Dr David Schindel and Mike Trizna - BOL Data Portal](https://reader036.fdocuments.us/reader036/viewer/2022062513/556253d0d8b42a6c368b51c5/html5/thumbnails/4.jpg)
Unique Data Portal Capabilities
Creating customized datasets from public and/or your private data
Online library of standard datasets
Support sharing within project teams using Connect IDs, easy link to Working Groups
Running different identification analyses based on different methodologies:
– Standard sequence input using FASTA format
– Use standard or customized datasets
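As an illustration of the "standard sequence input using FASTA format" mentioned above, here is a minimal sketch of reading FASTA text into header/sequence pairs. The function name `parse_fasta` and the sample records are hypothetical and are not part of the Data Portal; the portal's own input handling is not described in the slides.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} mapping.

    Hypothetical helper for illustration only; not the Data Portal's code.
    """
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            records[header] = ""
        elif header is not None:
            # Sequence lines may wrap; join them and normalize the case.
            records[header] += line.upper()
    return records


# Made-up query sequences, used only to exercise the parser.
example = """>sample_001 hypothetical query
ACGTACGT
acgt
>sample_002
TTGACC
""".splitlines()

queries = parse_fasta(example)
for name, seq in queries.items():
    print(name, len(seq))
```

Each parsed query could then be submitted to an identification routine against a standard or customized dataset.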
![Page 5: Dr David Schindel and Mike Trizna - BOL Data Portal](https://reader036.fdocuments.us/reader036/viewer/2022062513/556253d0d8b42a6c368b51c5/html5/thumbnails/5.jpg)
Barcode Aggregator
727,170 public records
![Page 6: Dr David Schindel and Mike Trizna - BOL Data Portal](https://reader036.fdocuments.us/reader036/viewer/2022062513/556253d0d8b42a6c368b51c5/html5/thumbnails/6.jpg)
Summary Statistics per Family
![Page 7: Dr David Schindel and Mike Trizna - BOL Data Portal](https://reader036.fdocuments.us/reader036/viewer/2022062513/556253d0d8b42a6c368b51c5/html5/thumbnails/7.jpg)
Creating Customized Datasets
![Page 8: Dr David Schindel and Mike Trizna - BOL Data Portal](https://reader036.fdocuments.us/reader036/viewer/2022062513/556253d0d8b42a6c368b51c5/html5/thumbnails/8.jpg)
Existing Data Analysis Packages
List of packages
– BLOG
– BRONX
– Kernel
– CAOS
– USEARCH
– BLAST
Output of identification routines as probabilities of assignment
![Page 9: Dr David Schindel and Mike Trizna - BOL Data Portal](https://reader036.fdocuments.us/reader036/viewer/2022062513/556253d0d8b42a6c368b51c5/html5/thumbnails/9.jpg)
Data Analysis Methods Session
New packages presented Friday afternoon:
– Damon Little: Automatic Plants Barcode pipeline (from raw traces to trimmed/edited sequences)
– Ka Hou Chu: Composite Vector Method (profile trees for faster alignment and tree-based analysis)
– Alain Franc: Matching Next Generation results to Sanger-based reference records
![Page 10: Dr David Schindel and Mike Trizna - BOL Data Portal](https://reader036.fdocuments.us/reader036/viewer/2022062513/556253d0d8b42a6c368b51c5/html5/thumbnails/10.jpg)
![Page 11: Dr David Schindel and Mike Trizna - BOL Data Portal](https://reader036.fdocuments.us/reader036/viewer/2022062513/556253d0d8b42a6c368b51c5/html5/thumbnails/11.jpg)
Sample output
![Page 12: Dr David Schindel and Mike Trizna - BOL Data Portal](https://reader036.fdocuments.us/reader036/viewer/2022062513/556253d0d8b42a6c368b51c5/html5/thumbnails/12.jpg)
CONNECT for Data Portal Collaboration
![Page 13: Dr David Schindel and Mike Trizna - BOL Data Portal](https://reader036.fdocuments.us/reader036/viewer/2022062513/556253d0d8b42a6c368b51c5/html5/thumbnails/13.jpg)
![Page 14: Dr David Schindel and Mike Trizna - BOL Data Portal](https://reader036.fdocuments.us/reader036/viewer/2022062513/556253d0d8b42a6c368b51c5/html5/thumbnails/14.jpg)
The USNM Bird Project
USNM Division of Birds frozen tissue collection:
– 21,104 specimens, 2,512 species
Which new ones to sample/barcode?
Public records for birds
– All public bird COI records: 10,967
– All BARCODE records in GenBank: 8,419
– BARCODE with taxonomic names: 7,965
– BARCODE, name and 2 traces: 2,388
![Page 15: Dr David Schindel and Mike Trizna - BOL Data Portal](https://reader036.fdocuments.us/reader036/viewer/2022062513/556253d0d8b42a6c368b51c5/html5/thumbnails/15.jpg)
Moving Data Among BOLD, GenBank, Data Portal
USNM Excel Spreadsheet
(KE-Emu Source)
Local database that holds all fields from the original spreadsheet
Data Portal Aggregator database
BOLD
Split into projects that consist of 2-4 plates
![Page 16: Dr David Schindel and Mike Trizna - BOL Data Portal](https://reader036.fdocuments.us/reader036/viewer/2022062513/556253d0d8b42a6c368b51c5/html5/thumbnails/16.jpg)
Creating a ‘Pick List’
Spreadsheet of tissue samples compared with:
– ITIS taxonomy
– Clements species list in BOLD
– Counts of GenBank and/or public BOLD records
– Geographic information
Screenshot of USNM list side-by-side with BOLD records
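The comparison behind a pick list can be sketched as a count-and-filter step: tally how many public records already exist per species and keep the tissue-collection species that fall at or below a threshold. This is only a sketch under stated assumptions. The function `build_pick_list`, the `max_public` threshold, and the placeholder species names are hypothetical; the actual project also checked names against ITIS and the species list in BOLD, which is not modeled here.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def build_pick_list(
    tissue_species: Iterable[str],
    public_record_species: Iterable[str],
    max_public: int = 0,
) -> List[Tuple[str, int, int]]:
    """Return (species, tissues available, public records) for candidates.

    A species is a candidate when its public record count is at most
    `max_public`. Hypothetical helper for illustration only.
    """
    tissue_counts = Counter(tissue_species)
    public_counts = Counter(public_record_species)
    rows = []
    for species in sorted(tissue_counts):
        n_public = public_counts.get(species, 0)
        if n_public <= max_public:
            rows.append((species, tissue_counts[species], n_public))
    return rows


# Placeholder names, not real taxa or real counts.
tissues = ["Genus alpha", "Genus alpha", "Genus beta", "Genus gamma"]
public = ["Genus beta", "Genus beta", "Genus gamma"]

for row in build_pick_list(tissues, public):
    print(row)
```

Raising `max_public` widens the list to species that are already barcoded but only thinly sampled.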
![Page 17: Dr David Schindel and Mike Trizna - BOL Data Portal](https://reader036.fdocuments.us/reader036/viewer/2022062513/556253d0d8b42a6c368b51c5/html5/thumbnails/17.jpg)
Identifying Samples to be Subsampled
![Page 18: Dr David Schindel and Mike Trizna - BOL Data Portal](https://reader036.fdocuments.us/reader036/viewer/2022062513/556253d0d8b42a6c368b51c5/html5/thumbnails/18.jpg)
Side-by-Side Lists
![Page 19: Dr David Schindel and Mike Trizna - BOL Data Portal](https://reader036.fdocuments.us/reader036/viewer/2022062513/556253d0d8b42a6c368b51c5/html5/thumbnails/19.jpg)
USNM Bird Dataset
3,150 tissues sampled
168 failed sequences
94 problematic sequences
166 clustered badly
2,761 ‘BARCODE-ready’ samples
1,147 ‘first-BARCODE’ species
91% increase over 1,259 barcoded species
(the 3,892 listed in BOLD includes BINs and others)
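The percentages implied by these figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers on the slide. The variable names are illustrative, and treating the three problem categories as non-overlapping exclusions is an assumption that the slide does not confirm.

```python
# Figures copied from the slide above.
tissues_sampled = 3150
failed = 168
problematic = 94
clustered_badly = 166
barcode_ready = 2761
first_barcode_species = 1147
previously_barcoded_species = 1259


def percent(part: float, whole: float) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# New species relative to the species that already had barcodes.
increase_pct = percent(first_barcode_species, previously_barcoded_species)

# Share of sampled tissues that ended up BARCODE-ready.
ready_pct = percent(barcode_ready, tissues_sampled)

# Assumption: the three problem categories do not overlap.
naive_remainder = tissues_sampled - failed - problematic - clustered_badly

print(f"increase over previously barcoded species: {increase_pct}%")
print(f"BARCODE-ready share of sampled tissues: {ready_pct}%")
print(f"remainder if categories do not overlap: {naive_remainder}")
```

The first figure agrees with the 91% stated on the slide. Subtracting the three problem counts from 3,150 gives 2,722 rather than 2,761; the slide does not say how the categories relate, so the sketch only reports the difference.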
![Page 20: Dr David Schindel and Mike Trizna - BOL Data Portal](https://reader036.fdocuments.us/reader036/viewer/2022062513/556253d0d8b42a6c368b51c5/html5/thumbnails/20.jpg)
Two problematic clades, USNM data
Flycatchers: Family Tyrannidae
– Sublegatus arenarum, S. modestus, S. obscurior, S. sp.
– Conopias parvus, C. albovittatus
– Myiarchus ferox, M. swainsoni, M. sp.
Hummingbirds: Family Trochilidae
– Phaethornis longuemareus
Inconsistencies within USNM dataset
Incompatibilities with public, other data
![Page 21: Dr David Schindel and Mike Trizna - BOL Data Portal](https://reader036.fdocuments.us/reader036/viewer/2022062513/556253d0d8b42a6c368b51c5/html5/thumbnails/21.jpg)
![Page 22: Dr David Schindel and Mike Trizna - BOL Data Portal](https://reader036.fdocuments.us/reader036/viewer/2022062513/556253d0d8b42a6c368b51c5/html5/thumbnails/22.jpg)
Resolving Mis-identified Specimens
![Page 23: Dr David Schindel and Mike Trizna - BOL Data Portal](https://reader036.fdocuments.us/reader036/viewer/2022062513/556253d0d8b42a6c368b51c5/html5/thumbnails/23.jpg)
What testing dataset to use?
ID trees and analytical routines could use:
– All public bird COI records: 10,967
– All BARCODE records in GenBank: 8,419
– BARCODE with taxonomic names: 7,965
– BARCODE, name and 2 traces: 2,388
Which ones have reliable taxonomic IDs?
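The four candidate datasets read as successively stricter filters on the same pool of records. A minimal sketch of that tiering follows, assuming a hypothetical `Record` type whose field names are invented for illustration, and assuming "2 traces" means at least two trace files. Neither assumption comes from the slides.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    """Hypothetical public COI record; field names are illustrative."""
    accession: str
    has_barcode_keyword: bool
    species_name: Optional[str]
    trace_count: int


def tier_counts(records: List[Record]) -> Dict[str, int]:
    """Count the records that survive each successively stricter filter."""
    barcode = [r for r in records if r.has_barcode_keyword]
    named = [r for r in barcode if r.species_name]
    # Assumption: "2 traces" is read as two or more trace files.
    traced = [r for r in named if r.trace_count >= 2]
    return {
        "all_coi": len(records),
        "barcode": len(barcode),
        "barcode_named": len(named),
        "barcode_named_2_traces": len(traced),
    }


# Made-up records, used only to exercise the filters.
sample = [
    Record("X0001", False, "Genus alpha", 0),
    Record("X0002", True, None, 2),
    Record("X0003", True, "Genus beta", 1),
    Record("X0004", True, "Genus gamma", 2),
]

print(tier_counts(sample))
```

None of these filters establishes that a taxonomic ID is reliable, which is the open question the slide raises.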
![Page 24: Dr David Schindel and Mike Trizna - BOL Data Portal](https://reader036.fdocuments.us/reader036/viewer/2022062513/556253d0d8b42a6c368b51c5/html5/thumbnails/24.jpg)
Preparing a Data Release Paper
Summary statistics from Data Portal
Figures from BOLD