GBIF Checklist bank and the backbone
Uploaded by markus-doering
Transcript of GBIF Checklist bank and the backbone
![Page 1: GBIF Checklist bank and the backbone](https://reader030.fdocuments.us/reader030/viewer/2022021509/587a58d91a28ab520b8b6179/html5/thumbnails/1.jpg)
GBIF Checklist Bank
Indexing & Backbone
![Page 2: GBIF Checklist bank and the backbone](https://reader030.fdocuments.us/reader030/viewer/2022021509/587a58d91a28ab520b8b6179/html5/thumbnails/2.jpg)
Checklist Scope
• 1,846 datasets registered
• 18 million name records
• Plazi (1,131), Pensoft (178), CoL GSDs (156)
![Page 3: GBIF Checklist bank and the backbone](https://reader030.fdocuments.us/reader030/viewer/2022021509/587a58d91a28ab520b8b6179/html5/thumbnails/3.jpg)
Denormalized Checklist
![Page 4: GBIF Checklist bank and the backbone](https://reader030.fdocuments.us/reader030/viewer/2022021509/587a58d91a28ab520b8b6179/html5/thumbnails/4.jpg)
Normalized Checklist
![Page 5: GBIF Checklist bank and the backbone](https://reader030.fdocuments.us/reader030/viewer/2022021509/587a58d91a28ab520b8b6179/html5/thumbnails/5.jpg)
Checklist Challenges
• Highly relational taxonomic data, almost all records linked in tree & basionym
  • Wrong or missing records destroy dataset integrity, not just a single record!
  • Different to flat, unrelated occurrence records
• Data Quality
  • broken referential integrity
  • bad names or placeholders (e.g. «Unallocated Family»)
  • missing or unused controlled vocabularies, e.g. «art» for rank species
• Name strings can be published in several ways
  • ScientificName
  • ScientificName + Authorship
  • Genus + SpeciesEpitheton + Rank + InfraspecificEpitheton + Authorship
• Classifications can be published in several ways
  • Normalised via parentNameUsageID
  • Normalised via parentNameUsage
  • Denormalised via Kingdom, Phylum, Class, Order, Family, Genus
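The three ways of publishing a name string can be reduced to one full name before any further processing. The following is a minimal Python sketch of that idea. The dictionary keys simply reuse the field names from the slide, and `full_name` is a hypothetical helper, not a ChecklistBank function.

```python
def full_name(record: dict) -> str:
    """Assemble one name string from any of the three published forms."""
    name = record.get("ScientificName", "").strip()
    author = record.get("Authorship", "").strip()
    if not name:
        # Atomized form: rebuild the name from its parts.
        # The rank marker is only used when an infraspecific epithet exists.
        infra = record.get("InfraspecificEpitheton", "")
        parts = [
            record.get("Genus", ""),
            record.get("SpeciesEpitheton", ""),
            record.get("Rank", "") if infra else "",
            infra,
        ]
        name = " ".join(p.strip() for p in parts if p and p.strip())
    # Append the authorship unless the name string already ends with it.
    if author and not name.endswith(author):
        name = f"{name} {author}"
    return name
```

All three forms then yield the same kind of string, for example `full_name({"ScientificName": "Poa pratensis", "Authorship": "L."})` gives `"Poa pratensis L."`.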
![Page 6: GBIF Checklist bank and the backbone](https://reader030.fdocuments.us/reader030/viewer/2022021509/587a58d91a28ab520b8b6179/html5/thumbnails/6.jpg)
Checklist Indexing
• Basic archive validation
  • unique ids
• Checklist Normalizer
  • resolve relations
  • create implicit taxa from denormalised classification
  • interpret controlled vocabularies, e.g. rank
  • match to backbone
  • match to previous version to keep GBIF ids stable
• Checklist Importer
  • inserts data into the Postgres database and the Solr index for searches
• Checklist Analyser
  • generate dataset metrics
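The normalizer step "create implicit taxa from denormalised classification" can be illustrated with a small Python sketch. It assumes every input row describes a species and carries the denormalised columns named earlier in the deck. `normalize` and its return shape are hypothetical and only show the principle.

```python
RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus"]

def normalize(rows):
    """Turn denormalised rows into parent-child taxa.

    Returns a dict mapping (rank, name) to the parent key, or None for roots.
    """
    parents = {}
    for row in rows:
        parent = None
        for rank in RANKS:
            name = row.get(rank)
            if not name:
                continue  # rank not given: keep the last known parent
            key = (rank, name)
            # Create the implicit higher taxon once; the first row seen wins.
            parents.setdefault(key, parent)
            parent = key
        # Assumption: each row is a species-level record.
        parents.setdefault(("Species", row["ScientificName"]), parent)
    return parents

rows = [
    {"Kingdom": "Plantae", "Family": "Asteraceae", "Genus": "Helianthus",
     "ScientificName": "Helianthus annuus L."},
    {"Kingdom": "Plantae", "Family": "Asteraceae", "Genus": "Cichorium",
     "ScientificName": "Cichorium intybus L."},
]
taxa = normalize(rows)
```

Two rows produce six taxa here: the shared kingdom and family are created only once, and each species points to its genus.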
![Page 7: GBIF Checklist bank and the backbone](https://reader030.fdocuments.us/reader030/viewer/2022021509/587a58d91a28ab520b8b6179/html5/thumbnails/7.jpg)
Organizing Occurrences
• GBIF needs a single, consistent taxonomy
  • for metrics, search, maps
  • considerable variation in higher taxa
  • synonymies can be very large
• Catalogue of Life is the largest single source
  • ~90% of GBIF occurrence records (thanks to birds)
  • ~50% of GBIF occurrence names (35% in 2010)
• GBIF needs to assemble a taxonomy
  • originally merged (noisy) names found in occurrences, which resulted in lots of duplicates
  • improved by stitching together checklist datasets
Cronquist classification
  Mimosaceae: 3,200 species
  Caesalpiniaceae: 2,000 species
  Fabaceae: 14,000 species
“Modern” classification
  Fabaceae: 19,200 species
    Mimosoideae: 3,200 species
    Caesalpinioideae: 2,000 species
    Faboideae: 14,000 species
![Page 8: GBIF Checklist bank and the backbone](https://reader030.fdocuments.us/reader030/viewer/2022021509/587a58d91a28ab520b8b6179/html5/thumbnails/8.jpg)
Current Backbone Issues
• Far too many accepted species (acc/syn)
  • Cactaceae: GBIF 12,062 (342 syn), TPL 2,233 (5,422 syn) + 5,500 unknown
  • Genus Weingartia: GBIF 129 (0 syn), TPL 8 (26 syn) + 68 unknown
• Many accepted names based on the same basionym
  • Sulcorebutia breviflora Backeb.
  • Weingartia breviflora (Backeb.) Hentzschel & K.Augustin
• No synonyms with different authors possible
  • Poa pubescens R.Br. synonym of Eragrostis pubescens (R.Br.) Steud.
  • Poa pubescens Lej. synonym of Poa pratensis L.
  • merged all names with exact same canonical name
  • list of known homonym genera (IRMNG) used to disambiguate between larger groups
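The Poa pubescens case shows why merging on the canonical name alone loses information. A small Python sketch of the difference follows. `group_synonyms` is a hypothetical helper, and the tuples only restate the two names from the slide.

```python
from collections import defaultdict

# (canonical name, author, accepted name it is a synonym of)
names = [
    ("Poa pubescens", "R.Br.", "Eragrostis pubescens (R.Br.) Steud."),
    ("Poa pubescens", "Lej.", "Poa pratensis L."),
]

def group_synonyms(names, use_author):
    """Group names by canonical name, optionally also by author."""
    groups = defaultdict(list)
    for canonical, author, accepted in names:
        key = (canonical, author) if use_author else (canonical,)
        groups[key].append(accepted)
    return dict(groups)

merged = group_synonyms(names, use_author=False)  # one entry, two accepted names
split = group_synonyms(names, use_author=True)    # two entries, one accepted name each
```

With the canonical name as the only key, a single "Poa pubescens" would have to point at two different accepted names. Adding the author to the key keeps the two homonyms apart.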
![Page 9: GBIF Checklist bank and the backbone](https://reader030.fdocuments.us/reader030/viewer/2022021509/587a58d91a28ab520b8b6179/html5/thumbnails/9.jpg)
Backbone Building
• Overlay ordered sources
  • Start with Catalogue of Life
  • Primary source defines status
  • Create new name if kingdom, canonical name & authorship do not exist in current nub
• Ignore source name if …
  • not a major Linnean rank (infraspecific ranks are included)
  • higher ranks above family (configurable per source)
  • status conflicts with already existing status
  • hybrid formula, cultivar, candidatus or placeholder names !!!
[Diagram labels: Catalogue of Life; Fauna Europaea; GRIN; MammalSpeciesWorld; Observations; Specimens; 8000 Species Lists; 10s of taxonomic resources; 5M+ names in Primary Data Index; NUB; Merged; Match]
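The overlay of ordered sources can be sketched in a few lines of Python. This is only an illustration of the rules listed on this slide, not the actual nub builder. The record fields, the `ACCEPTED_RANKS` set and `build_backbone` are assumptions made for the example.

```python
# Assumption: ranks the sketch lets through (major Linnean ranks plus infraspecific ones).
ACCEPTED_RANKS = {
    "kingdom", "phylum", "class", "order", "family", "genus",
    "species", "subspecies", "variety", "form",
}

def build_backbone(sources):
    """Overlay sources in priority order; earlier sources win.

    `sources` is a list of record lists. Each record is a dict with the
    keys kingdom, canonical, author, rank and status.
    """
    backbone = {}
    for records in sources:
        for rec in records:
            if rec["rank"] not in ACCEPTED_RANKS:
                continue  # e.g. a tribe is ignored
            key = (rec["kingdom"], rec["canonical"], rec["author"])
            # A name is only created if it does not exist yet, so the
            # status from the earlier (more trusted) source is kept.
            if key not in backbone:
                backbone[key] = rec["status"]
    return backbone

col = [{"kingdom": "Plantae", "canonical": "Cichorium intybus", "author": "L.",
        "rank": "species", "status": "accepted"}]
other = [
    {"kingdom": "Plantae", "canonical": "Cichorium intybus", "author": "L.",
     "rank": "species", "status": "synonym"},
    {"kingdom": "Plantae", "canonical": "Cichorieae", "author": "Lam. & DC.",
     "rank": "tribe", "status": "accepted"},
    {"kingdom": "Plantae", "canonical": "Agoseris maritima", "author": "E. Sheld.",
     "rank": "species", "status": "accepted"},
]
backbone = build_backbone([col, other])
```

In this toy run the conflicting "synonym" status from the second source is ignored, the tribe is skipped, and the new species is added.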
![Page 10: GBIF Checklist bank and the backbone](https://reader030.fdocuments.us/reader030/viewer/2022021509/587a58d91a28ab520b8b6179/html5/thumbnails/10.jpg)
Backbone Assembling
Animalia, Archaea, Bacteria, Chromista, Fungi, Plantae, Protozoa, Viruses, incertae sedis
• Nub build starts with 8 kingdoms
![Page 11: GBIF Checklist bank and the backbone](https://reader030.fdocuments.us/reader030/viewer/2022021509/587a58d91a28ab520b8b6179/html5/thumbnails/11.jpg)
Backbone Assembling
Backbone:
Plantae
  Magnoliophyta
    Magnoliopsida
      Asterales
        Asteraceae
          Helianthus L.
            Helianthus annuus L.
• Catalogue of Life is added
• Defines higher classification
Source:
Plantae
  Magnoliophyta
    Magnoliopsida
      Asterales
        Asteraceae
          Helianthus L.
            Helianthus annuus L.
![Page 12: GBIF Checklist bank and the backbone](https://reader030.fdocuments.us/reader030/viewer/2022021509/587a58d91a28ab520b8b6179/html5/thumbnails/12.jpg)
Backbone Assembling
Backbone:
Plantae
  Magnoliophyta
    Magnoliopsida
      Asterales
        Asteraceae
          Helianthus L.
            Helianthus annuus L.
          Cichorium
            Cichorium intybus L.
• Missing genera are created
• Tribe is ignored
Source:
Asteraceae
  Cichorieae Lam. & DC. [tribe]
    Cichorium intybus L.
![Page 13: GBIF Checklist bank and the backbone](https://reader030.fdocuments.us/reader030/viewer/2022021509/587a58d91a28ab520b8b6179/html5/thumbnails/13.jpg)
Backbone Assembling
Backbone:
Plantae
  Magnoliophyta
    Magnoliopsida
      Asterales
        Asteraceae
          Helianthus L.
            Helianthus annuus L.
          Cichorium Linneaus
            Cichorium intybus L.
              = C. balearicum Porta
              = C. byzantinum Clementi
• Synonyms respect authors
• Author match very loose
• Existing genus author updated
Source:
Plantae
  Asteraceae
    Cichorium Linneaus
      Cichorium intybus Linneaus
        = Cichorium balearicum Porta
        = Cichorium byzantinum Clem.
        = Cichorium byzantinum Clementi
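How a "very loose" author match might treat "L." and "Linneaus", or "Clem." and "Clementi", as the same author can be sketched as follows. This is an assumed prefix-based comparison written for illustration. The slides do not describe the actual comparison rules, and `author_key` and `authors_match` are hypothetical names.

```python
import re

def author_key(author: str) -> str:
    """Lowercase the author string and keep ASCII letters only."""
    return re.sub(r"[^a-z]", "", author.lower())

def authors_match(a: str, b: str) -> bool:
    """Loose match: equal, or one is an abbreviation (prefix) of the other."""
    ka, kb = author_key(a), author_key(b)
    if not ka or not kb:
        # Assumption: a missing author never contradicts an existing one.
        return True
    return ka.startswith(kb) or kb.startswith(ka)
```

Under these assumptions `authors_match("Clem.", "Clementi")` is true and `authors_match("Porta", "Clementi")` is false. A prefix test is deliberately crude: it would not handle author teams or transliteration variants.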
![Page 14: GBIF Checklist bank and the backbone](https://reader030.fdocuments.us/reader030/viewer/2022021509/587a58d91a28ab520b8b6179/html5/thumbnails/14.jpg)
Backbone Assembling
Backbone:
Plantae
  Magnoliophyta
    Magnoliopsida
      Asterales
        Asteraceae
          Helianthus L.
            Helianthus annuus L.
          Cichorium L.
            Cichorium intybus L.
              = C. balearicum Porta
              = C. byzantinum Clem.
• Prefer authors from nomenclators
Source:
Asteraceae
  Cichorium L.
    Cichorium byzantinum Clem.
![Page 15: GBIF Checklist bank and the backbone](https://reader030.fdocuments.us/reader030/viewer/2022021509/587a58d91a28ab520b8b6179/html5/thumbnails/15.jpg)
Backbone Assembling
Backbone:
Asteraceae
  Helianthus L.
    Helianthus annuus L.
  Agoseris
    Agoseris apargioides (Less.) Greene
      = A. maritima Eastw.
      A. a. var. eastwoodiae (Fedde) Munz
      A. a. var. maritima (E. Sheld.) Baird
  Cichorium L.
    Cichorium intybus L.
      = C. balearicum Porta
      = C. byzantinum Clem.
• Infraspecifics are included
Source:
Asteraceae
  Agoseris apargioides (Less.) Greene
    = A. maritima Eastw.
    A. a. var. eastwoodiae (Fedde) Munz
    A. a. var. maritima (E. Sheld.) Baird
![Page 16: GBIF Checklist bank and the backbone](https://reader030.fdocuments.us/reader030/viewer/2022021509/587a58d91a28ab520b8b6179/html5/thumbnails/16.jpg)
Backbone Assembling
Backbone:
Asteraceae
  Helianthus L.
    Helianthus annuus L.
  Agoseris
    Agoseris apargioides (Less.) Greene
      = A. maritima Eastw.
      A. a. var. eastwoodiae (Fedde) Munz
      A. a. var. maritima (E. Sheld.) Baird
    Agoseris eastwoodiae Fedde
    Agoseris maritima E. Sheld.
  Cichorium L.
    Cichorium intybus L.
      = C. balearicum Porta
      = C. byzantinum Clem.
• Other source treats them as species
• Same canonical name (Agoseris maritima) allowed twice because the authors differ
Source:
Asteraceae
  Agoseris eastwoodiae Fedde
  Agoseris maritima E. Sheld.
![Page 17: GBIF Checklist bank and the backbone](https://reader030.fdocuments.us/reader030/viewer/2022021509/587a58d91a28ab520b8b6179/html5/thumbnails/17.jpg)
Final Cleanup - Basionyms
Asteraceae
  Helianthus L.
    Helianthus annuus L.
  Agoseris
    Agoseris apargioides (Less.) Greene
      = A. maritima Eastw.
      A. a. var. eastwoodiae (Fedde) Munz
        = Agoseris eastwoodiae Fedde
      A. a. var. maritima (E. Sheld.) Baird
        = Agoseris maritima E. Sheld.
  Cichorium L.
    Cichorium intybus L.
      = C. balearicum Porta
      = C. byzantinum Clem.
• Finally basionyms are detected
  • by terminal epithet & author within a family
• Only 1 accepted per group
  • the first, most trusted one stays
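The basionym cleanup can be sketched as a grouping step. The example assumes names are already parsed into terminal epithet, author and bracket (original) author, that all names belong to one family, and that a lower `priority` means a more trusted source. `consolidate` and the field names are hypothetical.

```python
from collections import defaultdict

def consolidate(names):
    """Keep one accepted name per (terminal epithet, original author) group.

    Returns a dict mapping each demoted name to the name that stays accepted.
    """
    groups = defaultdict(list)
    for n in names:
        # The original author is the bracket author of a recombination,
        # or the plain author of the original combination.
        original = n["bracket_author"] or n["author"]
        groups[(n["epithet"], original)].append(n)
    synonyms = {}
    for group in groups.values():
        # sorted() is stable, so ties keep their input order.
        keep, *rest = sorted(group, key=lambda n: n["priority"])
        for n in rest:
            synonyms[n["name"]] = keep["name"]
    return synonyms

names = [
    {"name": "Agoseris apargioides var. eastwoodiae (Fedde) Munz",
     "epithet": "eastwoodiae", "author": "Munz", "bracket_author": "Fedde", "priority": 1},
    {"name": "Agoseris eastwoodiae Fedde",
     "epithet": "eastwoodiae", "author": "Fedde", "bracket_author": None, "priority": 2},
    {"name": "Agoseris apargioides var. maritima (E. Sheld.) Baird",
     "epithet": "maritima", "author": "Baird", "bracket_author": "E. Sheld.", "priority": 1},
    {"name": "Agoseris maritima E. Sheld.",
     "epithet": "maritima", "author": "E. Sheld.", "bracket_author": None, "priority": 2},
    {"name": "Agoseris maritima Eastw.",
     "epithet": "maritima", "author": "Eastw.", "bracket_author": None, "priority": 1},
]
synonyms = consolidate(names)
```

Agoseris maritima Eastw. shares the epithet but not the original author, so it forms its own group and is left alone.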
![Page 18: GBIF Checklist bank and the backbone](https://reader030.fdocuments.us/reader030/viewer/2022021509/587a58d91a28ab520b8b6179/html5/thumbnails/18.jpg)
Final Cleanup - Autonyms
Asteraceae
  Helianthus L.
    Helianthus annuus L.
  Agoseris
    Agoseris apargioides (Less.) Greene
      = A. maritima Eastw.
      A. a. var. apargioides
      A. a. var. eastwoodiae (Fedde) Munz
        = Agoseris eastwoodiae Fedde
      A. a. var. maritima (E. Sheld.) Baird
        = Agoseris maritima E. Sheld.
  Cichorium L.
    Cichorium intybus L.
      = C. balearicum Porta
      = C. byzantinum Clem.
• Create missing autonyms
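Creating a missing autonym amounts to checking, for each species with infraspecific taxa, whether a taxon repeating the species epithet already exists at that rank. A minimal Python sketch follows. `missing_autonyms` and the tuple layout are assumptions made for the example.

```python
def missing_autonyms(species, infraspecifics):
    """Return autonyms that should exist but are absent.

    `species` maps a species name to (genus, epithet); `infraspecifics`
    is a list of (species_name, rank_marker, epithet) tuples.
    """
    existing = set(infraspecifics)
    created = []
    for sp_name, marker, _ in infraspecifics:
        genus, sp_epithet = species[sp_name]
        autonym = (sp_name, marker, sp_epithet)
        if autonym not in existing:
            existing.add(autonym)  # avoid creating the same autonym twice
            # Botanical autonyms repeat the species epithet and carry no author.
            created.append(f"{genus} {sp_epithet} {marker} {sp_epithet}")
    return created

species = {"Agoseris apargioides": ("Agoseris", "apargioides")}
infraspecifics = [
    ("Agoseris apargioides", "var.", "eastwoodiae"),
    ("Agoseris apargioides", "var.", "maritima"),
]
created = missing_autonyms(species, infraspecifics)
```

For the two varieties on the slide this yields the single autonym Agoseris apargioides var. apargioides.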
![Page 19: GBIF Checklist bank and the backbone](https://reader030.fdocuments.us/reader030/viewer/2022021509/587a58d91a28ab520b8b6179/html5/thumbnails/19.jpg)
Backbone Building Rules
• Create missing genus or species in classification
  • only for accepted taxa
• Create missing autonyms for infraspecific taxa
• Detect basionyms based on terminal epithet & authorship
  • Assumes epithet & authorship in family is unique
  • Converts all but one accepted to synonyms
• Flag taxa as doubtful
  • genus or higher taxon without any species (IRMNG)
  • species (or infrasp.) with a parent genus (or species) considered to be a synonym
    • moved to newly accepted genus (or species)
    • the case for potential children of synonymised basionym combination
![Page 20: GBIF Checklist bank and the backbone](https://reader030.fdocuments.us/reader030/viewer/2022021509/587a58d91a28ab520b8b6179/html5/thumbnails/20.jpg)
Backbone Sources
• GBIF Backbone Patch
• Catalogue of Life
• World Register of Marine Species
• Dyntaxa - Svensk taxonomisk databas
• GRIN Taxonomy
• Fauna Europaea
• Integrated Taxonomic Information System
• Euro+Med Plantbase
• Interim Register of Marine and Nonmarine Genera
• The Clements Checklist
• IOC World Bird Names
• Mammal Species of the World
• Paleobiology Database
• Nomenclators
• International Plant Names Index
• Index Fungorum
• ZooBank
• Prokaryotic Nomenclature Up-to-date
• ICTV Master Species List
• Organisations
• Species Files
• Biodiversity Data Journal (Pensoft)
• ZooKeys (Pensoft)
• PhytoKeys (Pensoft)
• Plazi ???
![Page 21: GBIF Checklist bank and the backbone](https://reader030.fdocuments.us/reader030/viewer/2022021509/587a58d91a28ab520b8b6179/html5/thumbnails/21.jpg)
Backbone Matching
• Occurrence
  • fuzzy name match
  • classification match
  • allow higher rank matches
• Checklist
  • match kingdom
  • require straight canonical match
  • incl. authorship comparison
  • no webservice yet, only embedded
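The difference between the two matching modes can be shown with a toy backbone. The sketch below uses `difflib` from the Python standard library as a stand-in for the fuzzy matcher. The cutoff value, the `BACKBONE` contents and both function names are assumptions, and the real matcher also uses the classification, which is left out here.

```python
from difflib import get_close_matches

# Toy backbone: (kingdom, canonical name) -> author
BACKBONE = {
    ("Plantae", "Helianthus"): "L.",
    ("Plantae", "Helianthus annuus"): "L.",
    ("Plantae", "Cichorium intybus"): "L.",
}

def match_occurrence(name, cutoff=0.85):
    """Fuzzy match; fall back to the genus (a higher rank) if needed."""
    canonicals = [canonical for _, canonical in BACKBONE]
    hit = get_close_matches(name, canonicals, n=1, cutoff=cutoff)
    if hit:
        return hit[0]
    genus = name.split()[0]
    return genus if genus in canonicals else None

def match_checklist(kingdom, canonical, author):
    """Strict match: same kingdom, exact canonical name, author compared."""
    known = BACKBONE.get((kingdom, canonical))
    if known is None:
        return None
    return canonical if not author or author == known else None
```

A misspelled occurrence name such as "Helianthus anuus" still resolves to Helianthus annuus, and an unknown species falls back to its genus. The checklist mode rejects the same misspelling because it requires an exact canonical match.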
![Page 22: GBIF Checklist bank and the backbone](https://reader030.fdocuments.us/reader030/viewer/2022021509/587a58d91a28ab520b8b6179/html5/thumbnails/22.jpg)
Schema
[Schema diagram labels: NameUsage; Parsed Name; Backbone Match; Citation; Dataset Metrics; Verbatim Record; Metrics; Extensions]
• Checklists & Nub share the same structure
• Parent-child hierarchy
• normalized classification
• flexible ranks
• synonym to accepted relation
• Dataset metrics as timeseries
• Basionym relation
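The relations named on the schema slide (parent-child hierarchy, synonym to accepted, basionym) can be pictured as three optional links on a name usage record. The dataclass below is an illustrative sketch only. Its field names are assumptions and do not reproduce the actual ChecklistBank tables.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NameUsage:
    key: int
    name: str
    rank: str
    parent_key: Optional[int] = None    # parent-child hierarchy
    accepted_key: Optional[int] = None  # set only on synonyms
    basionym_key: Optional[int] = None  # link to the original combination

def classification(usages, key):
    """Walk parent links upwards and return the names from the root down."""
    path = []
    while key is not None:
        usage = usages[key]
        path.append(usage.name)
        key = usage.parent_key
    return path[::-1]

usages = {
    1: NameUsage(1, "Plantae", "kingdom"),
    2: NameUsage(2, "Asteraceae", "family", parent_key=1),
    3: NameUsage(3, "Cichorium L.", "genus", parent_key=2),
}
lineage = classification(usages, 3)
```

Because the hierarchy is stored as parent links rather than fixed rank columns, any rank can appear in the chain, which is what makes flexible ranks possible.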
![Page 23: GBIF Checklist bank and the backbone](https://reader030.fdocuments.us/reader030/viewer/2022021509/587a58d91a28ab520b8b6179/html5/thumbnails/23.jpg)
CLB Supported Extensions
• Description: human paragraphs about some topic
• Distribution: area ranges with statuses
• Identifier: additional identifier for the record
• Multimedia: image, video, sound
• Literature references: bibliography
• Occurrence (indexed via occurrence workflows)
• Species Profile: extinct, marine, freshwater, terrestrial flags
• Types and specimens (overlaps with Occurrence)
• Vernacular names: name with language & region
http://rs.gbif.org/extension/gbif/1.0/
![Page 24: GBIF Checklist bank and the backbone](https://reader030.fdocuments.us/reader030/viewer/2022021509/587a58d91a28ab520b8b6179/html5/thumbnails/24.jpg)
Normalizing Classifications