Surfacing the deep data of taxonomy

Surfacing the deep data of taxonomy @rdmpage http://iphylo.blogspot.com

Talk at Linnean Society London, 20 September 2012

Transcript of Surfacing the deep data of taxonomy

Page 1: Surfacing the deep data of taxonomy

Surfacing the deep data of taxonomy

@rdmpage

http://iphylo.blogspot.com

Page 2: Surfacing the deep data of taxonomy

To a first approximation the taxonomy of life is already digital…

Page 3: Surfacing the deep data of taxonomy

doi:10.1126/science.276.5313.734

Page 4: Surfacing the deep data of taxonomy

Data – GenBank

Publications – PubMed

Names – Names4Life

Page 5: Surfacing the deep data of taxonomy

So, we’re done! (aren’t we?)

Page 6: Surfacing the deep data of taxonomy

doi:10.1126/science.276.5313.734

Page 7: Surfacing the deep data of taxonomy
Page 8: Surfacing the deep data of taxonomy

Zoology as microbiology

Microbiology ➔ Zoology

GenBank ➔ DNA barcoding

PubMed ➔ Digital archives (BHL)

Names ➔ ION, ZooBank, uBio, …

Images from http://phylopic.org

Page 9: Surfacing the deep data of taxonomy

Why does having a single database of names matter?

Page 10: Surfacing the deep data of taxonomy

Bacterial names linked to literature

http://dx.doi.org/10.1099/ijs.0.035154-0

Page 11: Surfacing the deep data of taxonomy

Paenibacillus polymyxa

• http://dx.doi.org/10.1601/nm.5110 (name)
• http://dx.doi.org/10.1601/tx.5110 (taxon)

Image from http://dx.doi.org/10.1128/AEM.71.11.7292-7300.2005

Page 12: Surfacing the deep data of taxonomy

…still not convinced?

Page 13: Surfacing the deep data of taxonomy
Page 14: Surfacing the deep data of taxonomy

O Lambert et al. Nature 466, 105-108 (2010) doi:10.1038/nature09067

Skull, mandible and tooth morphology of the holotype of L. melvillei MUSM 1676.

Page 15: Surfacing the deep data of taxonomy

Leviathan melvillei

Page 16: Surfacing the deep data of taxonomy
Page 17: Surfacing the deep data of taxonomy

Bugger…

Page 18: Surfacing the deep data of taxonomy

Livyatan melvillei

Page 19: Surfacing the deep data of taxonomy

Two kinds of #fail

Page 20: Surfacing the deep data of taxonomy

We don’t have a list of all names

Page 21: Surfacing the deep data of taxonomy

Publications containing names often not accessible

Page 22: Surfacing the deep data of taxonomy

Leviathan melvillei

Page 23: Surfacing the deep data of taxonomy

Need more convincing?

Page 24: Surfacing the deep data of taxonomy

Dark taxa

http://iphylo.blogspot.co.uk/2011/04/dark-taxa-genbank-in-post-taxonomic.html

Page 25: Surfacing the deep data of taxonomy

Mammals in GenBank

Proper Linnaean names

Aus sp.

Page 26: Surfacing the deep data of taxonomy

Mammals

Proper Linnaean names

Aus sp.

Page 27: Surfacing the deep data of taxonomy

“Invertebrates”

BOLD

Page 28: Surfacing the deep data of taxonomy

Is this a problem?

Page 29: Surfacing the deep data of taxonomy

It’s the norm for Bacteria

Page 30: Surfacing the deep data of taxonomy

Dark taxa will only increase in number

Page 31: Surfacing the deep data of taxonomy

Roth v. Wikipedia

http://www.newyorker.com/online/blogs/books/2012/09/an-open-letter-to-wikipedia.html

Page 32: Surfacing the deep data of taxonomy

Wikipedia says “no”

Page 33: Surfacing the deep data of taxonomy

“I understand your point that the author is the greatest authority on their own work,” writes the Wikipedia Administrator—“but we require secondary sources.”

Page 34: Surfacing the deep data of taxonomy

@quominus

http://quominus.org/archives/981

One of Wikipedia’s core principles, along with things like neutrality, is verifiability: a reader must be able to look at a statement in a Wikipedia article and find out where it comes from.

Page 35: Surfacing the deep data of taxonomy

Taxonomic statements should be verifiable

Page 36: Surfacing the deep data of taxonomy

Literature is the evidence base for taxonomy

Page 37: Surfacing the deep data of taxonomy

Literature online

Museums, universities, and scientific societies

Digital archives

Commercial publishers

Page 38: Surfacing the deep data of taxonomy

http://iphylo.org/~rpage/itaxon

Page 39: Surfacing the deep data of taxonomy

Animal names per decade

Data from http://www.organismnames.com

Page 40: Surfacing the deep data of taxonomy

Names with a DOI

25%

Page 41: Surfacing the deep data of taxonomy

BioStor (BHL)

25%

@biostor_org

http://biostor.org

Page 42: Surfacing the deep data of taxonomy

Online (DOI, BioStor, JSTOR, DSpace, PDF, …)

50%

Page 43: Surfacing the deep data of taxonomy

Identifiers

Page 44: Surfacing the deep data of taxonomy

Vast majority of names are in the legacy literature

Page 45: Surfacing the deep data of taxonomy

Zootaxa and ZooKeys

XML

Page 46: Surfacing the deep data of taxonomy

My wish list…

Page 47: Surfacing the deep data of taxonomy

Names linked to:

literature

specimens

geography

sequences

phylogeny…

Page 48: Surfacing the deep data of taxonomy
Page 49: Surfacing the deep data of taxonomy

BioNames

(real soon now…)

Computable Data Challenge