Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

Global Digital Infrastructure for biological nomenclature and taxonomy
Ellinor Michel (1,2), Richard Pyle (2,3), Robert Guralnick (4), Jon Todd (1,2)
1 The Natural History Museum, London, UK; 2 Int’l Committee on Bionomenclature; 3 Bishop Museum, HI, USA; 4 Univ of Colorado, Boulder, USA

description

Global Digital Infrastructure for Biological Nomenclature and Taxonomy

Ellinor Michel, Dep’t of Life Sciences, The Natural History Museum, London, UK
Richard L. Pyle, Natural Sciences Dep’t, Bishop Museum, Honolulu, HI, USA
Robert P. Guralnick, Dep’t of Ecology & Evolutionary Biology, Univ Colorado, Boulder, CO, USA
Jon Todd, Dep’t of Earth Sciences, The Natural History Museum, London, UK

The future for interoperable scientific information is digital, yet scientific names, the handles for all biodiversity information, still lack an integrated system tied to published descriptions and museum type specimens. Descriptions and type specimens provide standards for the otherwise fluid concepts of biological taxa. We are working to unify the infrastructures for biological nomenclature across nomenclatural codes (including zoological (ICZN - http://iczn.org/), botanical (ICNafp - http://www.iapt-taxon.org/nomen/main.php) and bacterial (ICNB) codes) through the Global Names Architecture (GNA). Our initial focus is on animal names, as these comprise the largest component of metazoan biodiversity and ZooBank (zoobank.org) is the first code-related online nomenclatural registration system. Users include applied scientists in agriculture, medicine, veterinary science and climate change research; biodiversity researchers such as ecologists and physiologists; archives such as museums; and the scientific publishing community: in short, all users of scientific names of organisms based on the work of taxonomists.

Transcript of Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

Page 1: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

Global Digital Infrastructure for biological nomenclature and taxonomy

Ellinor Michel (1,2), Richard Pyle (2,3), Robert Guralnick (4), Jon Todd (1,2)

1 The Natural History Museum, London UK

2 Int’l Committee on Bionomenclature

3 Bishop Museum, HI, USA

4 Univ of Colorado, Boulder, USA

Page 2: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

THE LINNAEAN ENTERPRISE (E.O. Wilson)

The task of identifying all of Earth’s biodiversity

Page 3: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

All accumulated information of a species is tied to a scientific name, a name that serves as a link between what has been learned in the past and what we today add to the body of knowledge.

- Grimaldi & Engel, 2005

Names and the information revolution

Note: they don’t say THE scientific name (i.e., singular)

Page 4: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

Names, meanings and how to manage them

Page 5: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

Image: Biodiversity Heritage Library

Image: Wikimedia Commons

Carolus Linnaeus, 1707-1778

Page 6: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

Images: Biodiversity Heritage Library

Page 7: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

4,398 Species

Estimated 2-6 names for every valid (=currently considered definable and ‘real’) species

Equivalent of 359 volumes of Systema Naturae

Page 8: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

Nomenclature Taxonomy

Type specimen

The type specimen is the objective physical standard that anchors a name.

Page 9: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

Type Principle: stabilizing biological names with a physical standard

From living animals to … the type specimen

Pomatomus saltator (Linnaeus 1766)

From Linnaeus’ Fish Collection held at the Linnean Society of London

Page 10: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

Nomenclatural Codes:

• Zoology

• Botany

• Bacteria

• Viruses

• Cultivated Plants

Page 11: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

Type Specimen = Name

Stabilizing biological names with a physical standard

Gasterosteus saltatrix Linnaeus, 1766

Pomatomus saltator (Linnaeus 1766)

Page 12: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

Types for the Future

Whole Organism ✔

Organism Part ✔

Tissue Sample ✔

DNA Extraction ?

PCR Product ?

DNA Sequence ✗

Page 13: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

Types for the Future

Whole Organism ✔

Organism Part ✔

Tissue Sample ✔

DNA Extraction ?

PCR Product (reproduced) ?

DNA Sequence (derived & interpreted)

Page 14: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

Sequence data is a (usually high fidelity, fine granularity) representation of molecules

DNA, RNA or proteins are single kinds of organismal data

Page 15: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

Types for the Future

Sequencing Type Specimens: data enrichment of existing types

Sequencing ‘Epitypes’ (‘botanical’ term; new specimens from type locality, etc.)

Risk of losing stability?

‘Type Sequence’ ✗

Page 16: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

A name = ‘computer’ readable code that links information

Page 17: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

A name = ‘computer’ readable code

“Pomatomus saltator Linn.”: easy for a human; hard for a computer

FFF7160A-372D-40E9-9611-23AF5D9EAC4C: hard for a human; easy for a computer

Page 18: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

Names connecting information

Where Are Biodiversity Data?

GenBank

Hymenoptera Name Server

BDWB

CalPhotos

ToL

Page 19: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

GenBank

Hymenoptera Name Server

BDWB

CalPhotos

ToL

“Pomatomus saltator”

“Pomatomus saltatrix”

“Pom. salt.”

“P. saltator”

“Pomatomus saltator (Linnaeus 1766)”

“Pomatomus saltatrix (L.)”

“Gasterosteus saltatrix Linn., 1766”

“Gasterosteus saltator Linnaeus”

“G. saltator L.”

“Gaſteroſteus Sallatrix”

“P. saltator”

“G. saltatrix”

“Pomatomus mediterraneus (Rafinesque, 1810)”

“Pomatomus pallasii (Eichwald, 1831)”

“Pomatomus conidens (Castelnau, 1861)”

“P. nalnal (Rochebrune, 1880)”

“Pomatomus tubulus (Saville-Kent, 1893)”

“Pomatomus heptacanthus (Lacépède, 1801)”

“P. skib Lacépède, 1802”

“P. lophar (Suckow)”

A Global Names Architecture

Page 20: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

• Taxon Names are the fundamental link among virtually all biodiversity information

• Biodiversity Information relates to species concepts, but data resources are usually tied to text-string names

• Text-string names are difficult to cross-link due to spelling variations, different genus-species combinations, homonyms, synonyms, etc.

• Linking text-string names to concepts requires source-based (literature-based) approach

• The key challenge is to cross-link thousands of biodiversity datasets through taxon concepts, using only text-string names

A Global Names Architecture: Rationale

Page 21: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

A Global Names Architecture

Funding & Support (2007 to date)

National Science Foundation: BiSciCol (DBI-0956415), GNA (DBI-1062441)
Encyclopedia of Life
GBIF
PBIN / NBII
Various others (e.g., NOAA, other NSF projects)
More NSF & EU Proposals in Process

Partners & Governance

CoL (Species2000 / ITIS), EOL, BHL, GBIF, IPNI, Index Fungorum, ICZN / ZooBank, Landcare Research, MOBOT / Tropicos, Bishop Museum, WHOI, IRMNG, PESI, ALA (and numerous others)

Global Names Architecture Advisory Panel (GNAAP)

Page 22: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

A Global Names Architecture

What GNA is NOT: yet another database

What GNA IS (…intended to be…)

Name or text-string name: “Pomatomus saltator Linn.” (easy for a human; hard for a computer)

Page 23: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

A Global Names Architecture

What GNA is NOT: yet another database

What GNA IS (…intended to be…)

Name or text-string name: “Pomatomus saltator Linn.” (easy for a human; hard for a computer)

UUID or GUID (Universally or Globally Unique Identifier): FFF7160A-372D-40E9-9611-23AF5D9EAC4C (hard for a human; easy for a computer)
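The contrast between text-string names and identifiers can be sketched in a few lines of Python. This is only an illustration, not GNA code: the lookup table `NAME_TO_ID` and the helper `same_protonym` are hypothetical, and only the UUID value and the name strings come from the slides.

```python
import uuid

# Identifier shown on the slide, parsed with the standard-library uuid module.
PROTONYM_ID = uuid.UUID("FFF7160A-372D-40E9-9611-23AF5D9EAC4C")

# Hypothetical lookup table: several text-string names, one identifier.
NAME_TO_ID = {
    "Pomatomus saltator Linn.": PROTONYM_ID,
    "Pomatomus saltatrix (L.)": PROTONYM_ID,
    "Gasterosteus saltatrix Linn., 1766": PROTONYM_ID,
}


def same_protonym(name_a: str, name_b: str) -> bool:
    """True when both strings are known and resolve to the same identifier."""
    id_a = NAME_TO_ID.get(name_a)
    id_b = NAME_TO_ID.get(name_b)
    return id_a is not None and id_a == id_b


# UUID parsing ignores letter case, so the machine comparison is exact
# even though the human-readable strings differ.
assert uuid.UUID("fff7160a-372d-40e9-9611-23af5d9eac4c") == PROTONYM_ID
```

The strings stay useful for people; the identifier is what a program compares.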

Page 24: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

A Global Names Architecture: Data Components

Global Names Index (GNI)

• Database and services optimized for taxon names represented as raw text strings (“Dirty Bucket”)

• ~17+ million text strings

• Parsing Services

• Lexical Grouping

• Links back to sources

• Developed at Woods Hole
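As a rough illustration of what "parsing" and "lexical grouping" of raw name strings could look like, here is a minimal Python sketch. It is not the GNI parser: the regular expression and the functions `parse` and `lexical_groups` are assumptions made for this example, and they only handle simple binomials with a fully spelled genus.

```python
import re
from collections import defaultdict

# Genus (capitalised word), species epithet (lower-case word), optional authorship.
NAME_RE = re.compile(r"^\s*([A-Z][a-z]+)\s+([a-z]+)\s*(.*?)\s*$")


def parse(name_string):
    """Split a binomial into (genus, epithet, authorship); None if it does not match."""
    match = NAME_RE.match(name_string)
    if match is None:
        return None
    return match.groups()


def lexical_groups(name_strings):
    """Group strings that share genus + epithet, ignoring authorship.

    Strings the simple pattern cannot parse (e.g. "P. saltator") land under None.
    """
    groups = defaultdict(list)
    for raw in name_strings:
        parsed = parse(raw)
        key = (parsed[0], parsed[1]) if parsed else None
        groups[key].append(raw)
    return dict(groups)
```

Abbreviated forms such as “P. saltator” fall through this pattern, which is exactly the kind of case that makes text-string names hard to cross-link.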

Page 25: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

A Global Names Architecture: Data Components

Global Names Usage Bank (GNUB)

• Database and services optimized for taxon names represented as “curated” Taxon Name Usages (“Clean Bucket”)

• >70K Agents

• >73K References

• >523K Taxon Name Usages (>186K Protonyms)

• Developed at Bishop Museum

Global Names Index (GNI)

• Database and services optimized for taxon names represented as raw text strings (“Dirty Bucket”)

• ~17+ million text strings

• Parsing Services

• Lexical Grouping

• Links back to sources

• Developed at Woods Hole

Page 26: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

Taxon Name Usage (TNU): A usage of a taxon name within the context of a Reference.

Protonym (≈Basionym): A usage of a taxon name representing the Code-Compliant “creation” of a new name.

Reference: Any static document source (Publication; Specimen Determination Label; Field Notes; Correspondence; etc.).

tnuID | Reference | NameString | Rank | ProtonymID | ValidUsageID | ParentUsageID
123 | Fowler & Bean, 1930:181 | Belonoperca | Genus | 123 | 123 |
234 | Fowler & Bean, 1930:182 | chabanaudi | Species | 234 | 234 | 123

Page 27: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

tnuID | Reference | NameString | Rank | ProtonymID | ValidUsageID | ParentUsageID
123 | Fowler & Bean, 1930:181 | Belonoperca | Genus | 123 | 123 |
234 | Fowler & Bean, 1930:182 | chabanaudi | Species | 234 | 234 | 123
345 | Baldwin & Smith, 1998 | pylei | Species | 345 | 345 | 456
456 | Baldwin & Smith, 1998:325 | Belonoperca | Genus | 123 | 456 | 567
567 | Baldwin & Smith, 1998:325 | Diploprionini | Tribe | 876 | 567 | 678
678 | Baldwin & Smith, 1998:325 | Epinephelinae | Subfamily | 765 | 678 | 789
789 | Baldwin & Smith, 1998:325 | Serranidae | Family | 654 | 789 | 987
987 | Baldwin & Smith, 1998:325 | Teleostei | Order | | |
... | ... | ... | ...
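One way to read the Taxon Name Usage table is as a set of records linked by ParentUsageID. The Python sketch below is an illustration only: the `TNU` class and the `classification` function are hypothetical names, and the ID values follow one reading of the flattened slide table (the ID cells for the Teleostei row are not recoverable and are left as None).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TNU:
    tnu_id: int
    reference: str
    name_string: str
    rank: str
    protonym_id: Optional[int]
    valid_usage_id: Optional[int]
    parent_usage_id: Optional[int]


BS98 = "Baldwin & Smith, 1998:325"
ROWS = [
    TNU(123, "Fowler & Bean, 1930:181", "Belonoperca", "Genus", 123, 123, None),
    TNU(345, "Baldwin & Smith, 1998", "pylei", "Species", 345, 345, 456),
    TNU(456, BS98, "Belonoperca", "Genus", 123, 456, 567),
    TNU(567, BS98, "Diploprionini", "Tribe", 876, 567, 678),
    TNU(678, BS98, "Epinephelinae", "Subfamily", 765, 678, 789),
    TNU(789, BS98, "Serranidae", "Family", 654, 789, 987),
    TNU(987, BS98, "Teleostei", "Order", None, None, None),  # IDs not recoverable
]
BY_ID = {row.tnu_id: row for row in ROWS}


def classification(tnu_id):
    """Follow ParentUsageID links upward and return (rank, name) pairs."""
    path, seen = [], set()
    current = BY_ID.get(tnu_id)
    while current is not None and current.tnu_id not in seen:
        seen.add(current.tnu_id)  # guard against accidental cycles
        path.append((current.rank, current.name_string))
        current = BY_ID.get(current.parent_usage_id)
    return path
```

Under this reading, usage 456 (Belonoperca in the 1998 reference) points back to protonym 123 (Belonoperca as first published in 1930), which is how a later usage stays anchored to the original name.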

Page 28: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

That’s cool…

(But how does it help?)

Page 29: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

“Pomatomus saltator Linn.”

A Global Names Architecture

Page 30: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

“Pomatomus saltator Linn.”

Global Names Architecture

Pomatomus | saltator | Linn.

FFF7160A-372D-40E9-9611-23AF5D9EAC4C

4D4EC609-D241-411E-BA54-DD4E0530E494: Pomatomus Lacépède, 1802

3437CC89-8D7F-4D2C-9813-F64EFA22FDD3: Pomatomus Risso, 1810

A Global Names Architecture

Page 31: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

“Pomatomus saltator Linn.”

Global Names Architecture

Pomatomus | saltator | Linn.

FFF7160A-372D-40E9-9611-23AF5D9EAC4C

Gasterosteus saltatrix Linnaeus, 1766

A Global Names Architecture

Page 32: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

GNIE FFF7160A-372D-40E9-9611-23AF5D9EAC4C

Lopharis mediterraneus Rafinesque, 1810 A3BE1320-806C-4585-B5FE-5CAB2FB5DDF9 8C7E3E5A-3B36-4D23-AA0A-035F726DFBCD 1

Pomatomus mediterraneus (Rafinesque, 1810) 4D4EC609-D241-411E-BA54-DD4E0530E494 8C7E3E5A-3B36-4D23-AA0A-035F726DFBCD 1

Pomatomus pallasii (Eichwald, 1831) 4D4EC609-D241-411E-BA54-DD4E0530E494 969F0B46-2FD1-42BC-A5C9-22D2410076B3 1

Sypterus pallasii Eichwald, 1831 288B17D4-3133-4893-8998-74FD115DBB31 969F0B46-2FD1-42BC-A5C9-22D2410076B3 1

Pomatomus conidens (Castelnau, 1861) 4D4EC609-D241-411E-BA54-DD4E0530E494 64873DE7-934E-4603-8815-2EA90B66482B 1

Temnodon conidens Castelnau, 1861 67D18558-763F-48B2-8F5A-44DBCF827931 64873DE7-934E-4603-8815-2EA90B66482B 1

Gonenion serra Rafinesque, 1810 95CA3EFA-4A11-4C71-9325-5926652E6E84 DC567BAA-2C8B-4065-9A9F-4E91441ACC9A 1

Pomatomus serra (Rafinesque, 1810) 4D4EC609-D241-411E-BA54-DD4E0530E494 DC567BAA-2C8B-4065-9A9F-4E91441ACC9A 1

Pomatomus tubulus (Saville-Kent, 1893) 4D4EC609-D241-411E-BA54-DD4E0530E494 25DE7CC4-E19F-4ADB-A590-582951E60AB1 1

Temnodon tubulus Saville-Kent, 1893 67D18558-763F-48B2-8F5A-44DBCF827931 25DE7CC4-E19F-4ADB-A590-582951E60AB1 1

Pomatomus nalnal (Rochebrune, 1880) 4D4EC609-D241-411E-BA54-DD4E0530E494 C33CA532-4060-4710-9533-7756F66B86C7 1

Sparactodon nalnal Rochebrune, 1880 921EE8B0-5C5C-45EB-BCDB-CD2AA724229D C33CA532-4060-4710-9533-7756F66B86C7 1

Pomatomus pedica Whitley, 1931 4D4EC609-D241-411E-BA54-DD4E0530E494 86ADEBD2-8F52-492E-9792-946B1CCFBFA8 2

Perca lophar Forsskål, 1775 85AA15EB-BF9B-44D6-9DF2-4BE36017768C 00AB9C96-5FCA-4358-98E4-A186D8DE5490 1

Pomatomus lophar (Forsskål, 1775) 4D4EC609-D241-411E-BA54-DD4E0530E494 00AB9C96-5FCA-4358-98E4-A186D8DE5490 1

Cheilodipterus heptacanthus Lacépède, 1801 FBDC898C-F1EA-4768-85B0-521417959096 F848129B-E72A-4C84-918B-B98BEA1FF7AC 1

Pomatomus heptacanthus (Lacépède, 1801) 4D4EC609-D241-411E-BA54-DD4E0530E494 F848129B-E72A-4C84-918B-B98BEA1FF7AC 1

Pomatomus skib Lacépède, 1802 4D4EC609-D241-411E-BA54-DD4E0530E494 EC88117C-C83F-4088-8347-BF9C0491D874 2

Pomatomus sypterus (Pallas, 1814) 4D4EC609-D241-411E-BA54-DD4E0530E494 7CFC2F1F-1B81-4F03-9B31-E4CC81E20185 1

Scomber sypterus Pallas, 1814 86791AEA-32C5-4492-96B8-8273AB440742 7CFC2F1F-1B81-4F03-9B31-E4CC81E20185 1

Chromis epicurorum Gronow in Gray, 1854 EBF02DF4-0D67-4865-AFA1-A71FCB56BBBA 4A4BFEAE-ACC5-4E12-98E1-F89B37896193 1

Pomatomus epicurorum (Gronow in Gray, 1854) 4D4EC609-D241-411E-BA54-DD4E0530E494 4A4BFEAE-ACC5-4E12-98E1-F89B37896193 1

Anthias lophar Suckow, 1799 4E43AEB3-D6EB-478A-9039-46F8370C4F93 E28B3C39-6412-4F30-BED1-FFA67225EDC8 1

Pomatomus lophar (Suckow, 1799) 4D4EC609-D241-411E-BA54-DD4E0530E494 E28B3C39-6412-4F30-BED1-FFA67225EDC8 1

Gasterosteus saltatrix Linnaeus, 1766 97CE20CD-4D6E-4A59-8266-CD161B5521FB FFF7160A-372D-40E9-9611-23AF5D9EAC4C 1

Gaſteroſteus Sallatrix Linnaeus, 1766 97CE20CD-4D6E-4A59-8266-CD161B5521FB FFF7160A-372D-40E9-9611-23AF5D9EAC4C 1

Pomatomus saltator (Linnaeus, 1766) 4D4EC609-D241-411E-BA54-DD4E0530E494 FFF7160A-372D-40E9-9611-23AF5D9EAC4C 16

Pomatomus saltatrix (Linnaeus, 1766) 4D4EC609-D241-411E-BA54-DD4E0530E494 FFF7160A-372D-40E9-9611-23AF5D9EAC4C 35

79 Taxon Name Usages

Homotypic Synonyms

Heterotypic Synonyms

Columns in the table above, left to right: name, Genus Protonym UUID, Species Protonym UUID, Refs.
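The synonym table can be read programmatically by comparing the species protonym UUID of each row. The Python sketch below uses six rows copied from the slide; the function `split_synonyms` and the rule it applies (same species protonym as the accepted name means homotypic, otherwise heterotypic) are an assumed reading of the slide labels, not a documented GNUB procedure.

```python
from collections import defaultdict

# Species protonym UUID shared by Pomatomus saltatrix and its recombinations.
ACCEPTED = "FFF7160A-372D-40E9-9611-23AF5D9EAC4C"
POMATOMUS = "4D4EC609-D241-411E-BA54-DD4E0530E494"

# (name, genus protonym UUID, species protonym UUID, refs), a subset of the slide.
ROWS = [
    ("Gasterosteus saltatrix Linnaeus, 1766",
     "97CE20CD-4D6E-4A59-8266-CD161B5521FB", ACCEPTED, 1),
    ("Pomatomus saltator (Linnaeus, 1766)", POMATOMUS, ACCEPTED, 16),
    ("Pomatomus saltatrix (Linnaeus, 1766)", POMATOMUS, ACCEPTED, 35),
    ("Lopharis mediterraneus Rafinesque, 1810",
     "A3BE1320-806C-4585-B5FE-5CAB2FB5DDF9",
     "8C7E3E5A-3B36-4D23-AA0A-035F726DFBCD", 1),
    ("Pomatomus mediterraneus (Rafinesque, 1810)", POMATOMUS,
     "8C7E3E5A-3B36-4D23-AA0A-035F726DFBCD", 1),
    ("Pomatomus skib Lacépède, 1802", POMATOMUS,
     "EC88117C-C83F-4088-8347-BF9C0491D874", 2),
]


def split_synonyms(rows, accepted_protonym):
    """Return (homotypic names, heterotypic names grouped by species protonym)."""
    homotypic = []
    heterotypic = defaultdict(list)
    for name, _genus_id, species_id, _refs in rows:
        if species_id == accepted_protonym:
            homotypic.append(name)
        else:
            heterotypic[species_id].append(name)
    return homotypic, dict(heterotypic)
```

Grouping on the identifier column sidesteps spelling and combination differences entirely; the name strings are carried along only for display.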

Page 37: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

California Academy of Sciences

AnimalBase

Encyclopedia of Life

Marine Species Identification Portal

Catalog of Life

FishBase

WoRMS

ITIS

IRMNG

GBIF

Page 38: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

California Academy of Sciences

BHL

AnimalBase

Amphibian Species of the World

Hymenoptera Online

FishBase

IPNI

Page 39: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

GenBank

Hymenoptera Name Server

BDWB

CalPhotos

ToL

FFF7160A-372D-40E9-9611-23AF5D9EAC4C

A Global Names Architecture

Page 40: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

ToL

GenBank

Hymenoptera Name Server

BDWB

CalPhotos

FFF7160A-372D-40E9-9611-23AF5D9EAC4C

A Global Names Architecture

Page 41: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

Current Numbers
- 1,236 Contributors
- 70,404 Agents (Authors)
- 73,779 References
  - 8,547 Journals
  - 4,302 Books
  - 2,205 Book Sections
  - 51,205 Articles
  - 7,520 Other
- 523,700 Taxon Name Usages
- 185,772 Protonyms

Scaling Content
- ~2M Species
- ~5-10M Protonyms
- ~50M Name-Strings?
- ~100M’s TNUs???

Bulk Import
- Sherborn’s Index Animalium (7,700+ References, 430K TNUs)
- Hymenoptera Name Server
- Systema Dipterorum (35K References, 130K TNUs)
- Dozen+ other nomenclator databases
- BHL (3,400 Journals, 55K Books, 100’sK Articles)

Publication Workflow (Pensoft, Zootaxa, Others)

Page 42: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

Global Names Usage Bank

BHL scanned text is processed to discover taxon names in Global Names Index.

Taxon names in GNI are anchored to Protonyms in GNUB.

BHL References cross-linked to Protonyms to generate Taxon Name Usages.
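The three steps on this slide (discover names in scanned text, anchor them to Protonyms, generate Taxon Name Usages) can be mocked up as a toy pipeline. Everything in this Python sketch is a stand-in: `PROTONYM_INDEX` is a hypothetical two-entry index, `discover_names` uses plain substring search rather than any real name-finding service, and the reference label in the usage note is invented.

```python
from dataclasses import dataclass

# Hypothetical stand-in for names already anchored to a Protonym identifier.
PROTONYM_INDEX = {
    "Pomatomus saltatrix": "FFF7160A-372D-40E9-9611-23AF5D9EAC4C",
    "Pomatomus saltator": "FFF7160A-372D-40E9-9611-23AF5D9EAC4C",
}


@dataclass(frozen=True)
class TaxonNameUsage:
    reference: str    # the document in which the name was used
    name_string: str  # the name exactly as indexed
    protonym_id: str  # identifier of the anchoring Protonym


def discover_names(text):
    """Step 1: find indexed name strings in the text (naive substring search)."""
    return [name for name in PROTONYM_INDEX if name in text]


def generate_tnus(reference, text):
    """Steps 2-3: anchor each discovered name and emit one usage record per name."""
    return [
        TaxonNameUsage(reference, name, PROTONYM_INDEX[name])
        for name in discover_names(text)
    ]
```

For example, `generate_tnus("Scanned page (hypothetical)", "The bluefish, Pomatomus saltatrix, is widely distributed.")` yields a single usage record tied to the protonym identifier.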

Page 43: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

Nomenclature Taxonomy

Type specimen

Page 44: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

But we all know that some names aren’t simple

• Sometimes name strings have multiple meanings

• In these cases a name string cannot act as a taxon-identifier without knowing how it has been interpreted

Taxonomy

Page 46: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

In birds there are many allopatric subspecies and different authorities interpret inclusiveness of taxa in different ways (= different taxon concepts)

Page 47: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

Different taxon concepts identified with unique IDs (UUIDs)
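A small Python sketch of the idea: the same name string can carry different taxon concepts, and each concept gets its own identifier. The `TaxonConcept` class, the placeholder name "Aus bus", and the two authorities are all hypothetical; minting identifiers with `uuid.uuid4()` is an assumption for illustration, not a statement about how GNA assigns them.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaxonConcept:
    name_string: str   # the shared text-string name
    according_to: str  # the authority whose interpretation this is
    included: frozenset  # subspecies the authority places in the taxon
    concept_id: uuid.UUID = field(default_factory=uuid.uuid4)


# Two authorities use the same species name but include different subspecies.
broad = TaxonConcept("Aus bus", "Authority A",
                     frozenset({"Aus bus bus", "Aus bus cus"}))
narrow = TaxonConcept("Aus bus", "Authority B",
                      frozenset({"Aus bus bus"}))


def same_concept(a, b):
    """Concepts are compared by identifier, never by name string."""
    return a.concept_id == b.concept_id
```

A dataset that records the concept identifier alongside the name string makes explicit which interpretation its data refer to.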

Page 48: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

• Taxon Names are the fundamental link among virtually all biodiversity information

• Biodiversity Information relates to species concepts, but data resources are usually tied to text-string names

• Text-string names are difficult to cross-link due to spelling variations, different genus-species combinations, homonyms, synonyms, etc.

• Linking text-string names to concepts requires source-based (literature-based) approach

• The key challenge is to cross-link thousands of biodiversity datasets through taxon concepts, using only text-string names

A Global Names Architecture: Conclusions

Page 49: Michel digital nomenclature-gna-zoobank-2014-co-namesconfv2

Thanks!

Nomenclature Taxonomy

Questions?