Taxonomic databases: The SEEK and VegBank experience


Page 1: Taxonomic databases:  The SEEK and VegBank experience

Taxonomic databases: The SEEK and VegBank experience

R.K. Peet

The University of North Carolina

Ecological Society of America Vegetation Panel

The SEEK development team

Page 2: Taxonomic databases:  The SEEK and VegBank experience

Biodiversity informatics depends on accurate and precise taxonomy

• Accurate identification and labelling of organisms is a critical part of collecting, recording and reporting biological data.

• Increasingly, research in biodiversity and ecology is based on the integration (and re-use) of multiple datasets.

Page 3: Taxonomic databases:  The SEEK and VegBank experience

• What was a minor annoyance for a few tens of records becomes intractable when looking at a million records.

• Some data types, such as organism identifications, are inherently more complex to define, with the consequence that few standards have been adopted.

Page 4: Taxonomic databases:  The SEEK and VegBank experience

Biodiversity data structure

[Diagram]

• Databases: Taxonomic database, Occurrence database, Observation database, Observation type database

• Entities: Bio-Taxon, Specimen or Object, Observation/Collection Event, Locality, Observation or Community Type

Page 5: Taxonomic databases:  The SEEK and VegBank experience

VegBank

• The ESA Vegetation Panel is developing VegBank as a public archive for vegetation plot observations (http://vegbank.org).

• VegBank is expected to function for vegetation plot data in a manner analogous to GenBank.

• Primary data will be deposited for reference, novel synthesis, and reanalysis.

• The database architecture is generalizable to most types of species co-occurrence data.

Page 6: Taxonomic databases:  The SEEK and VegBank experience

www.vegbank.org

Page 7: Taxonomic databases:  The SEEK and VegBank experience

What is SEEK?

Science Environment for Ecological Knowledge

Multidisciplinary project to create:

• Scientific-workflow system (Kepler)
– Design, reuse, and execute scientific analyses

• Distributed data network (EcoGrid)
– Environmental, ecological, and systematics data

• KR & Semantic Mediation
– Discover, integrate, and compose hard-to-relate data and services via ontologies

• Taxonomic concept services
– Resolve taxon ambiguities

Collaborators (the SEEK team)

• NCEAS, UNM, SDSC/UCSD, U Kansas

• Vermont, Napier, ASU, UNC

Page 8: Taxonomic databases:  The SEEK and VegBank experience

SEEK High-Level Approach

[Diagram]

• Ecological data set providers: ecological data sets described in Ecological Metadata Language (EML), containing the collector's taxonomic concept(s), and held in an EML repository with taxon coverage.

• Taxonomic concept providers: Concept Provider 1 (e.g. Fishbase), Concept Provider 2 (e.g. ITIS), Concept Provider 3 (e.g. Prometheus), feeding a Name/Concept Repository through a taxonomy transfer schema (TML).

• Semantic Mediation System: takes the user's taxonomic concept plus a quality measure, performs concept matching/expansion/… with weighted concepts, and returns a list of data sets.

Page 9: Taxonomic databases:  The SEEK and VegBank experience

Taxonomic database challenge: Standardizing organisms and communities

The problem: Integration of data potentially representing different times, places, investigators and taxonomic standards.

The traditional solution: A standard list of organisms / communities.

Page 10: Taxonomic databases:  The SEEK and VegBank experience

Standard lists are available for taxa

Representative examples for higher plants in North America / US:

• USDA Plants: http://plants.usda.gov

• ITIS: http://www.itis.usda.gov

• NatureServe: http://www.natureserve.org

• BONAP

• Flora North America

These are intended to be checklists wherein the taxa recognized perfectly partition all plants. The lists can be dynamic.

Page 11: Taxonomic databases:  The SEEK and VegBank experience

Three concepts of subalpine fir

• Abies lasiocarpa sec. Little, sec. USDA PLANTS

• Abies lasiocarpa sec. Flora North America

• Abies bifolia sec. Flora North America

Splitting one species into two illustrates the ambiguity often associated with scientific names.

Page 12: Taxonomic databases:  The SEEK and VegBank experience

One concept of Abies lasiocarpa

USDA Plants & ITIS

Abies lasiocarpa

– var. lasiocarpa

– var. arizonica

Page 13: Taxonomic databases:  The SEEK and VegBank experience

A narrow concept of Abies lasiocarpa

Flora North America

Abies lasiocarpa

Abies bifolia

Partnership with USDA plants to provide plant concepts for data integration

Page 14: Taxonomic databases:  The SEEK and VegBank experience

Andropogon virginicus complex in the Carolinas

9 elemental units; 17 base concepts

Page 15: Taxonomic databases:  The SEEK and VegBank experience

Standardized taxon lists fail to allow dataset integration

The reasons include:

• Taxonomic concepts are not defined (just lists),

• Relationships among concepts are not defined,

• The user cannot reconstruct the database as viewed at an arbitrary time in the past,

• Multiple party perspectives on taxonomic concepts and names cannot be supported or reconciled.

Page 16: Taxonomic databases:  The SEEK and VegBank experience

Taxonomic theory

[Diagram: Name, Reference, Concept]

A taxon concept represents a unique combination of a name and a reference.

Report as: name sec. reference.
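
The name-plus-reference definition can be sketched as a small data structure. This is only an illustration, not the VegBank or TCS schema: the class name `TaxonConcept` and its fields are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonConcept:
    """A taxon concept: a name as circumscribed in one reference (illustrative)."""
    name: str
    reference: str

    def label(self) -> str:
        # The "name sec. reference" reporting form.
        return f"{self.name} sec. {self.reference}"


usda = TaxonConcept("Abies lasiocarpa", "USDA PLANTS")
fna = TaxonConcept("Abies lasiocarpa", "Flora North America")

# Same name, different reference: two distinct concepts.
print(usda.label())
print(fna.label())
print(usda == fna)
```

Keying on the pair rather than on the name alone is what keeps the broad and narrow circumscriptions of Abies lasiocarpa apart.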

Page 17: Taxonomic databases:  The SEEK and VegBank experience

[Diagram: Name, Usage, Concept]

A usage represents an association of a concept with a name.

• The name used in defining the concept need not be the same name used in your work.

e.g. Carya alba = Carya tomentosa sec. Gleason & Cronquist 1991.

• Usage can be used to apply multiple name systems to a concept.
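
A usage table might look like the following sketch, which attaches more than one name to a single concept. The name-system labels ("name system A", "name system B") and the helper functions are hypothetical; only the Carya example comes from the slide.

```python
from collections import defaultdict

# A concept is identified here by a (name, reference) pair.
Concept = tuple[str, str]

concept: Concept = ("Carya tomentosa", "Gleason & Cronquist 1991")

# usages[concept] lists (name, name_system) pairs applied to that concept.
# The name-system labels are placeholders, not real registries.
usages: dict[Concept, list[tuple[str, str]]] = defaultdict(list)
usages[concept].append(("Carya tomentosa", "name system A"))
usages[concept].append(("Carya alba", "name system B"))


def names_for(c: Concept) -> list[str]:
    """All names that have been applied to concept c."""
    return [name for name, _ in usages[c]]


def concepts_for(name: str) -> list[Concept]:
    """All concepts to which a given name has been applied."""
    return [c for c, pairs in usages.items() if any(n == name for n, _ in pairs)]


print(names_for(concept))
print(concepts_for("Carya alba"))
```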

Page 18: Taxonomic databases:  The SEEK and VegBank experience

Relationships among concepts allow comparisons and conversions

• Congruent, equal (=)

• Includes (>)

• Included in (<)

• Overlaps (><)

• Disjunct (|)

• and others …

Page 19: Taxonomic databases:  The SEEK and VegBank experience

High-elevation fir trees of western US

[Diagram: distribution across AZ, NM, CO, WY, MT, AB, eBC, wBC, WA, OR]

• USDA & ITIS: Abies lasiocarpa, with var. arizonica and var. lasiocarpa

• Flora North America: Abies bifolia and Abies lasiocarpa

A. lasiocarpa sec USDA > A. lasiocarpa sec FNA

A. lasiocarpa sec USDA > A. bifolia sec FNA

A. lasiocarpa v. lasiocarpa sec USDA > A. lasiocarpa sec FNA

A. lasiocarpa v. lasiocarpa sec USDA | A. bifolia sec FNA

A. lasiocarpa v. arizonica sec USDA < A. bifolia sec FNA
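
Relationship statements of this kind can be stored as triples and queried in either direction. The sketch below is a hedged illustration using three of the statements from this slide; the function names and the rule that relabeling is safe only for "=" or "<" are assumptions of the example, not a documented SEEK or VegBank service.

```python
# Converse of each relationship symbol used in the slides.
CONVERSE = {"=": "=", ">": "<", "<": ">", "><": "><", "|": "|"}

# (concept_a, relation, concept_b), copied from three statements on the slide.
RELATIONS = [
    ("A. lasiocarpa sec USDA", ">", "A. lasiocarpa sec FNA"),
    ("A. lasiocarpa sec USDA", ">", "A. bifolia sec FNA"),
    ("A. lasiocarpa v. arizonica sec USDA", "<", "A. bifolia sec FNA"),
]


def relation(a, b):
    """Return the stated relation of a to b, or None if the pair is unmapped."""
    for x, rel, y in RELATIONS:
        if (x, y) == (a, b):
            return rel
        if (x, y) == (b, a):
            return CONVERSE[rel]
    return None


def safely_convertible(a, b):
    """Assumed rule: records labeled a may be relabeled b only if a = b or a < b."""
    return relation(a, b) in ("=", "<")


print(relation("A. bifolia sec FNA", "A. lasiocarpa sec USDA"))
print(safely_convertible("A. lasiocarpa sec FNA", "A. lasiocarpa sec USDA"))
print(safely_convertible("A. lasiocarpa sec USDA", "A. lasiocarpa sec FNA"))
```

Under this rule a record identified to the narrow FNA concept can be lumped into the broad USDA concept, but not the reverse, because the broad concept also covers plants that FNA places in A. bifolia.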

Page 20: Taxonomic databases:  The SEEK and VegBank experience

Party Perspective

The Party Perspective on a Concept includes:

• Status – Standard, Nonstandard, Undetermined

• Correlation with other concepts – Equal, Greater, Lesser, Overlap, Undetermined.

• Start & Stop dates.
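
Start and stop dates are what make it possible to reconstruct a party's view at an arbitrary past date. A minimal sketch follows, assuming a flat list of perspective records; the party name, the dates, and the function `status_on` are hypothetical and chosen only to exercise the lookup.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Perspective:
    party: str
    concept: str
    status: str                    # "Standard", "Nonstandard", or "Undetermined"
    start: date
    stop: Optional[date] = None    # None means the perspective is still in force


def status_on(perspectives, party, concept, when):
    """Return the party's status for a concept on a given date, or None if no record applies."""
    for p in perspectives:
        if (p.party, p.concept) != (party, concept):
            continue
        if p.start <= when and (p.stop is None or when < p.stop):
            return p.status
    return None


# Hypothetical history: nothing is deleted, superseded rows are closed with a stop date.
c = "Abies lasiocarpa sec. Flora North America"
history = [
    Perspective("Example Party", c, "Standard", date(2000, 1, 1), date(2004, 1, 1)),
    Perspective("Example Party", c, "Nonstandard", date(2004, 1, 1)),
]

print(status_on(history, "Example Party", c, date(2002, 6, 1)))
print(status_on(history, "Example Party", c, date(2005, 6, 1)))
```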

Page 21: Taxonomic databases:  The SEEK and VegBank experience

Intended functionality

• Organisms are labeled by reference to concept (name-reference combination),

• Party perspectives on concepts and names can be dynamic, but remain perfectly archived,

• User can select which party perspective to follow, and at which date,

• Different name systems are supported,

• Enhanced stability in recognized concepts by separating name assignment and rank from concept.

Page 22: Taxonomic databases:  The SEEK and VegBank experience

Best practice: Report taxa by reference to concepts.

When reporting the identity of organisms in publications, data, or on specimens, provide the full scientific name of each kind of organism and the reference that provided the taxonomic concept.

e.g., Abies lasiocarpa sec. Flora North America 1997.
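
Labels written in the "name sec. reference" form can be split back into their two parts mechanically. The following is a rough sketch, assuming the label contains a single " sec. " or " sec " separator; `parse_concept` is a hypothetical helper, and it will also strip a trailing period that belongs to the reference itself.

```python
import re

# name, then "sec" or "sec.", then reference; an optional trailing period is dropped.
_SEC = re.compile(r"^(?P<name>.+?)\s+sec\.?\s+(?P<reference>.+?)\.?$")


def parse_concept(label):
    """Split a 'name sec. reference' label into (name, reference)."""
    m = _SEC.match(label.strip())
    if m is None:
        raise ValueError(f"no 'sec.' reference found in: {label!r}")
    return m.group("name"), m.group("reference")


print(parse_concept("Abies lasiocarpa sec. Flora North America 1997."))
print(parse_concept("A. lasiocarpa sec USDA"))
```

A bare name such as "Abies lasiocarpa" raises `ValueError`, which is the point of the best practice: without a reference the label does not identify a concept.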

Page 23: Taxonomic databases:  The SEEK and VegBank experience

Best practice: Choose high-quality concepts

• Reference high-quality sources for taxon concepts such as a major compendium that provides its own defined concepts, or a source that references the concepts of others.

• Avoid checklists as they typically lack true taxonomic descriptions or circumscriptions.

Page 24: Taxonomic databases:  The SEEK and VegBank experience

SEEK & GBIF are working to provide standards for concept data

• Several data models incorporate taxon concepts. The IOPI, VegBank, and Taxonomer models are optimized for different uses.

• SEEK, GBIF, and TDWG developed TCS, which was adopted by TDWG in August 2005 and is being implemented by GBIF and SEEK.

Page 25: Taxonomic databases:  The SEEK and VegBank experience

Concepts and identifications are distinct.

• A name in a publication could be either a concept or an identification.

• An annotation is an identification.

• Identifications should include linkage to at least one concept, but need not be limited to a single concept.

Page 26: Taxonomic databases:  The SEEK and VegBank experience

Documenting identifications

Relationships added for identification:

• = Indicates identification

• ~ (or aff.) Indicates similarity

• ≡ Indicates identity, or defined as

Example of complex identification:

< Potentilla sec. Cronquist 1991 +

~ Potentilla simplex sec Cronquist 1991 +

~ Potentilla canadensis sec Cronquist 1991
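
An identification that links to several concepts, each with its own relationship symbol, could be recorded as in this sketch. The class names `Link` and `Identification` are illustrative assumptions; the example data reproduce the Potentilla identification above, with "sec." normalized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    relation: str   # e.g. "<", "~", "=", "≡"
    concept: str    # a "name sec. reference" label


@dataclass(frozen=True)
class Identification:
    links: tuple[Link, ...]

    def __post_init__(self):
        # An identification should link to at least one concept.
        if not self.links:
            raise ValueError("an identification needs at least one concept")

    def render(self) -> str:
        return " + ".join(f"{link.relation} {link.concept}" for link in self.links)


ident = Identification((
    Link("<", "Potentilla sec. Cronquist 1991"),
    Link("~", "Potentilla simplex sec. Cronquist 1991"),
    Link("~", "Potentilla canadensis sec. Cronquist 1991"),
))

print(ident.render())
```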

Page 27: Taxonomic databases:  The SEEK and VegBank experience

Fuzzy logic qualification

1 = Absolutely wrong

2 = Understandable but wrong

3 = Reasonable or acceptable

4 = Good answer

5 = Absolutely correct
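
One way to use this five-point scale is as an ordered score attached to each candidate concept, so that candidates can be filtered by a threshold. The sketch assumes the score qualifies how well a concept fits an observed organism; the scores in the example are hypothetical, not measured values.

```python
from enum import IntEnum


class Fit(IntEnum):
    ABSOLUTELY_WRONG = 1
    UNDERSTANDABLE_BUT_WRONG = 2
    REASONABLE = 3
    GOOD = 4
    ABSOLUTELY_CORRECT = 5


def acceptable(candidates, minimum=Fit.REASONABLE):
    """Return the concepts whose fit score is at least the chosen threshold."""
    return [concept for concept, fit in candidates.items() if fit >= minimum]


# Hypothetical scores for one specimen.
scores = {
    "Potentilla simplex sec. Cronquist 1991": Fit.GOOD,
    "Potentilla canadensis sec. Cronquist 1991": Fit.REASONABLE,
    "Abies lasiocarpa sec. Flora North America 1997": Fit.ABSOLUTELY_WRONG,
}

print(acceptable(scores))
print(acceptable(scores, minimum=Fit.GOOD))
```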

Page 28: Taxonomic databases:  The SEEK and VegBank experience

Biodiversity informatics depends on standards and connectivity

• Names (Linnean Core)

• Taxonomic concepts (TCS)

• Publications (Alexandrian core, etc)

• Observations (proposed TDWG standard)

• Identifications (proposed EML extension)

• GUIDs (under development by GBIF)

Page 29: Taxonomic databases:  The SEEK and VegBank experience

Tools to develop and map concepts

• Taxonomists need mapping and visualization tools for relating concepts of various authors. SEEK is building prototypes for review and possible adoption.

• Aggregators need tools for mapping relationships among concepts.

• Users need tools for entering legacy concepts. Several are in development.

Page 30: Taxonomic databases:  The SEEK and VegBank experience

Concept mapper

Page 33: Taxonomic databases:  The SEEK and VegBank experience

Demonstration Projects

Concept relationships of Southeastern US plants treated in different floras.

Based on > 50,000 mapped concepts

Page 36: Taxonomic databases:  The SEEK and VegBank experience

Distributed information systems - and the way ahead

Step 1: Adoption of minimum standards and best practices by high-quality journals, funding agencies, and professional organizations.

Page 37: Taxonomic databases:  The SEEK and VegBank experience

Publishers, curators and data managers need to tag taxon interpretations with concepts

• Precedent exists in the tagging of literature citations and GenBank accessions

• Presses are linking scientific names in many ejournals to ITIS (e.g. Evolution, Ecology)

Page 38: Taxonomic databases:  The SEEK and VegBank experience

The way ahead

Step 2: Creation, availability, and maintenance of databases that document core sets of taxonomic concepts and the relationships of these concepts to each other.

Page 39: Taxonomic databases:  The SEEK and VegBank experience

True concept-based checklists

• Equivalent of ITIS but with concept documentation and including how other concepts map onto the concepts accepted by the party.

• Several are operative or in development including EuroMed, IOPI-GPC, Biotics, VegBank. Concept documentation planned for ITIS/USDA.

Page 40: Taxonomic databases:  The SEEK and VegBank experience

Registration system and standard identifiers for names, references, and concepts

• Essential for data exchange

• GBIF is hosting a set of international workshops to design the GUID infrastructure.

Page 41: Taxonomic databases:  The SEEK and VegBank experience

The way ahead

Step 3: Development and provision of tools to facilitate mark-up of data and manuscripts with taxonomic concepts

Step 4: Demonstration projects