Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program...

Virtual Biodiversity Research and Access Network for Taxonomy (ViBRANT) Knowledge Organization System for GBIF Dag Endresen Knowledge Systems Engineer Éamonn Ó Tuama Senior Programme Officer, Inventory, Discovery, Access (IDA) Global Biodiversity Information Facility (GBIF) 31 August 2012

Description

Slides from a presentation on the Knowledge Organization System (KOS) work program for GBIF: KOS developments for biodiversity information resources and input to the emerging Vocabulary Management Task Group (VoMaG).

Links:
- GBIF KOS prototype tools, http://kos.gbif.org/
- Tool: Semantic Wiki prototype, http://terms.gbif.org/wiki/
- Tool: ISOcat prototype demo, http://kos.gbif.org/isocat/
- GBIF concept vocabulary term browser, http://kos.gbif.org/termbrowser/
- GBIF Resources Repository, http://rs.gbif.org/terms/
- GBIF Vocabulary Server, http://vocabularies.gbif.org/
- GBIF Resources Browser, http://tools.gbif.org/resource-browser/

Transcript

Page 1: Knowledge Organization System (KOS) for biodiversity information resources, GBIF KOS work program (Dag and Eamonn, 2012).

   


Page 2

Enabling interoperability for the GBIF network and beyond

“The ability of two or more systems or components to exchange information and to use the information that has been exchanged” (ref: IEEE Standard Computer Dictionary: Compilation of IEEE Standard Computer Glossaries, ISBN:155937079)

Key requirement: common exchange standards and protocols for biodiversity data, which necessitate agreement on the use of common vocabularies for the classes of objects and their properties.

Knowledge Organisation Systems (KOS) can help us manage our vocabularies.

GEO BON, IPBES

Page 3

Knowledge Organisation Systems

- Term lists: glossaries, dictionaries, gazetteers
- Classifications / categorisations: taxonomies (e.g., Dewey Decimal Classification)
- Relationships: thesauri (simple relationships), ontologies (a model of a domain)

... to manage the vocabularies used for sharing biodiversity information.

Page 4

Darwin Core – a glossary of terms

Example terms: taxonRank, higherClassification, taxonConceptID, collectionCode, geodeticDatum, specificEpithet, coordinatePosition

collectionCode: The name, acronym, coden, or initialism identifying the collection or data set from which the record was derived. Examples: "Mammals", "Hildebrandt", "eBird".

Page 5

AgroVoc vocabulary – a thesaurus

bt Resources
nt Natural resources
nt Biological resources
nt Genetic resources
nt Germplasm
uf Genetic material
uf Germplasm resources
rt Protoplasm
rt Genes
rt Gene pools
rt Biodiversity
rt Germplasm collections
rt Gametes

bt = broader term, nt = narrower term, uf = used for, rt = related term

http://aims.fao.org/standards/agrovoc/functionalities/hierarchy
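The four thesaurus relationships can be made concrete with a small data structure. This is a minimal sketch in Python, not AgroVoc's own software: it assumes the slide's nt lines read as one chain (Resources > Natural resources > Biological resources > Genetic resources > Germplasm) and that the uf and rt entries belong to "Germplasm"; both are readings of the figure, not the authoritative AgroVoc record. bt/nt are stored reciprocally, rt symmetrically, and uf lets a non-preferred term lead to its preferred term.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    term: str
    bt: set = field(default_factory=set)  # broader terms
    nt: set = field(default_factory=set)  # narrower terms
    uf: set = field(default_factory=set)  # non-preferred terms this term is "used for"
    rt: set = field(default_factory=set)  # related terms


class Thesaurus:
    def __init__(self):
        self.entries = {}
        self.use = {}  # non-preferred term -> preferred term (the inverse of uf)

    def entry(self, term):
        return self.entries.setdefault(term, Entry(term))

    def add_broader(self, narrower, broader):
        self.entry(narrower).bt.add(broader)  # bt and nt are reciprocal,
        self.entry(broader).nt.add(narrower)  # so both directions are recorded

    def add_used_for(self, preferred, non_preferred):
        self.entry(preferred).uf.add(non_preferred)
        self.use[non_preferred] = preferred

    def add_related(self, a, b):
        self.entry(a).rt.add(b)  # rt is symmetric
        self.entry(b).rt.add(a)

    def preferred(self, term):
        return self.use.get(term, term)

    def all_broader(self, term):
        """Every broader term reachable by following bt links upwards."""
        found, stack = set(), [self.preferred(term)]
        while stack:
            for broader in self.entry(stack.pop()).bt:
                if broader not in found:
                    found.add(broader)
                    stack.append(broader)
        return found


t = Thesaurus()
chain = ["Resources", "Natural resources", "Biological resources", "Genetic resources", "Germplasm"]
for broader, narrower in zip(chain, chain[1:]):
    t.add_broader(narrower, broader)
t.add_used_for("Germplasm", "Genetic material")
t.add_used_for("Germplasm", "Germplasm resources")
for related in ["Genes", "Gene pools", "Germplasm collections"]:
    t.add_related("Germplasm", related)

print(t.preferred("Germplasm resources"))       # Germplasm
print(sorted(t.all_broader("Genetic material")))  # ['Biological resources', 'Genetic resources', 'Natural resources', 'Resources']
```

A search for a non-preferred term ("Genetic material") is thus redirected to the preferred term and can be broadened along the hierarchy.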

Page 6

Ontology – a model of a domain

ontologies = computable dictionaries

- sameAs: William Jefferson Clinton sameAs Bill Clinton
- differentFrom: NHM, Los Angeles County differentFrom NHM, London
- inverseOf: collectors take samples / samples are taken by collectors
- transitiveProperty: A hasAncestor B, B hasAncestor C, therefore A hasAncestor C

Clinton image source: http://www.whitehouse.gov/sites/default/files/first-family/masthead_image/42bc_header_sm.jpg?1250887359
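What makes an ontology "computable" is that statements like those above license new statements. The following is a minimal sketch of naive forward chaining over (subject, predicate, object) triples in plain Python. It is not an OWL reasoner: it only applies the inverseOf, symmetric and transitive rules from the slide, and the property names takes/isTakenBy are hypothetical stand-ins for "collectors take samples".

```python
def infer(triples, inverse_of=None, symmetric=(), transitive=()):
    """Naive forward chaining: apply the rules until no new triple appears (a fixpoint)."""
    inverse_of = inverse_of or {}
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in inverse_of:      # p inverseOf q: (s p o) gives (o q s)
                new.add((o, inverse_of[p], s))
            if p in symmetric:       # (s p o) gives (o p s)
                new.add((o, p, s))
            if p in transitive:      # (s p o) and (o p x) give (s p x)
                new.update((s, p, x) for (s2, p2, x) in facts if p2 == p and s2 == o)
        if new <= facts:
            return facts
        facts |= new


def conflicts(facts):
    """Pairs asserted to be both sameAs and differentFrom: a contradiction."""
    return {(s, o) for (s, p, o) in facts if p == "differentFrom" and (s, "sameAs", o) in facts}


facts = infer(
    {
        ("collector1", "takes", "sample1"),
        ("A", "hasAncestor", "B"),
        ("B", "hasAncestor", "C"),
        ("William Jefferson Clinton", "sameAs", "Bill Clinton"),
        ("NHM, London", "differentFrom", "NHM, Los Angeles County"),
    },
    inverse_of={"takes": "isTakenBy", "isTakenBy": "takes"},  # hypothetical property names
    symmetric={"sameAs"},
    transitive={"hasAncestor", "sameAs"},
)
print(("sample1", "isTakenBy", "collector1") in facts)  # True
print(("A", "hasAncestor", "C") in facts)               # True
```

Real OWL tools implement far richer semantics; the point here is only that the declared property characteristics, not hand-written data, produce the extra statements.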

Page 7

Term versus Concept

Dextre Clarke, S.G. and L. Zeng (2012). From ISO 2788 to ISO 25964: the evolution of thesaurus Standards towards Interoperability and data modeling. ISQ Information Standards Quarterly 24(1): 20-26.

“The SKOS (simple knowledge organization system) format is designed to present KOS data in a format that is suitable for machine inferencing and particularly for use in the Semantic Web (….) The model [ISO 25964] is based on the understanding that thesauri show the relationships between concepts – units of thought – and distinguishes these from the terms that are used to label these concepts. These terms may be in one or more languages, and one term per language is chosen as a preferred term for each concept. One or more additional terms for the same concept may be recorded in the thesaurus as non-preferred terms.” Will, L. (2012). The ISO 25964 Data Model for the Structure of an Information Retrieval Thesaurus. Bulletin of the American Society for Information Science and Technology 38(4): 48-51.
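The term/concept distinction in the quotation maps directly onto a data model. Below is a minimal sketch in Python, loosely following the SKOS labelling rules (at most one preferred label per language, and a label may not be both preferred and non-preferred in the same language). The concept URI is hypothetical, and treating "Genetic material" as a non-preferred term for "Germplasm" is only an illustration taken from the AgroVoc slide.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A unit of thought, identified by a URI and labelled by terms in one or more languages."""
    uri: str
    pref_labels: dict = field(default_factory=dict)  # language tag -> the single preferred term
    alt_labels: dict = field(default_factory=dict)   # language tag -> set of non-preferred terms

    def set_pref_label(self, lang, term):
        if term in self.alt_labels.get(lang, set()):
            raise ValueError(f"{term!r} is already a non-preferred term in {lang!r}")
        self.pref_labels[lang] = term  # replaces any earlier preferred term: one per language

    def add_alt_label(self, lang, term):
        if self.pref_labels.get(lang) == term:
            raise ValueError(f"{term!r} is already the preferred term in {lang!r}")
        self.alt_labels.setdefault(lang, set()).add(term)


def label_index(concepts):
    """Look up a concept from any of its terms (case-insensitive), per language."""
    index = {}
    for concept in concepts:
        for lang, term in concept.pref_labels.items():
            index[(lang, term.casefold())] = concept
        for lang, terms in concept.alt_labels.items():
            for term in terms:
                index[(lang, term.casefold())] = concept
    return index


germplasm = Concept("http://example.org/concept/germplasm")  # hypothetical URI
germplasm.set_pref_label("en", "Germplasm")
germplasm.set_pref_label("es", "Germoplasma")
germplasm.add_alt_label("en", "Genetic material")
index = label_index([germplasm])
print(index[("en", "genetic material")].pref_labels["en"])  # Germplasm
```

Systems exchange the concept URI; the terms are labels that can be added, translated or replaced without changing what is being referred to.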

Page 8

KOS activities in GBIF work programme

Knowledge Organisation Systems

- New dedicated position at GBIF, funded through external projects (ViBRANT, i4Life)
- Review the GBIF Vocabularies Service and develop a vocabulary management system
- Engage with the wider community:
  - participation in Dublin Core workshop, Sept 2011
  - KOS symposium at TDWG 2011 Conference, Oct 2011
  - TDWG Vocabulary Management Task Group, 2012
- Review recommendations in the KOS task group report and develop an implementation roadmap

Key requirement: a platform to support the development, maintenance and governance of vocabularies for the biodiversity community

Page 9

ViBRANT: Task 4.1 Ontology platform (GBIF, JKI) Description of work:

•  “[F]lexible, user-friendly ontology management environment, enabling users to create, define, extent and share their own terms and concepts where needed, providing options for discussions and annotation, while supporting re-use of terms from standardized ontologies wherever possible”.

•  Extend the functionality of existing vocabulary services (such as those of GBIF).

•  Collaborative community interface for users and user-networks, bottom-up, user-friendly and non-technical.

•  Flexibility for biologists to express their knowledge regardless of whether the terminology has been standardized yet or not.

Text from the ViBRANT project summary, page 13 (my highlighting).


Page 10

ViBRANT WP4: GBIF tasks and deliverables

Deliverable 4.2: Ontology tools:
•  “Develop the GBIF ontology tool and produce an equivalent tool based on a semantic wiki. Deliver a single user interface for ontology creation and editing based on user-acceptance of the alternative technologies.”

Text from the ViBRANT project summary, page 14 (my highlighting).

Page 11

Governance structure (TDWG VoMaG)

http://community.gbif.org/pg/groups/21382/

Page 12

Why use a flat vocabulary?

•  Maximize the reuse of terms; focus on the definitions and labels of basic terms.

•  Low threshold for non-technical biologists and biodiversity domain experts to access terms and contribute (compared to richer ontologies).

•  Preferred technology: RDF (Resource Description Framework) and SKOS (Simple Knowledge Organization System).

•  Constructing and maintaining OWL ontologies is demanding with respect to expertise, effort and cost.

•  Maintaining SKOS vocabularies is less demanding.

•  RDF resources are designed to be easily extended.

•  Ontologies (OWL) can be based on (extend) terms declared by an RDF/SKOS vocabulary.

•  SKOS became a W3C Recommendation in 2009.

Page 13

Why use OWL (Web Ontology Language)?

•  OWL DL supports machine reasoning through machine-accessible formal semantics.

•  OWL by default uses a URI as the identifier for classes, properties, relations and instances.

•  By comparison, OBO targets practical solutions in the biomedical / biology domain, while OWL is more generic and provides cross-domain interoperability.

•  OWL 1.0 became a W3C Recommendation in 2004, OWL 2.0 in 2009. http://www.w3.org/2007/OWL/

•  Recommendation:
  •  REUSE terms declared by flat vocabularies…
  •  Start with SKOS, then explore OWL…

Page 14

Vocabulary management

GBIF Vocabularies as a collaborative management tool for Darwin Core Archive extensions and controlled vocabularies.

Figure components: Wiki Vocabulary Management; ISOcat Vocabulary Management; Excel, text, etc… Template for Vocabularies; proposed template processor; Concept Vocabulary (rdf, skos); GBIF Vocabularies (DwC-A extensions & controlled vocabularies); Resources Repository; GBIF Resources Browser. Evaluation of collaborative management tools: http://kos.gbif.org/

1. Mint and maintain concepts and terms in domain-expert working groups.
2. Release the final version as a Concept Vocabulary.
3. REUSE terms from published concept vocabularies and ontologies when designing new DwC-A extensions & controlled value vocabularies.
4. Publish at the GBIF Resources Repository.
5. Browse at the GBIF Resources Browser.

Page 15

GBIF Vocabularies

Darwin Core Archive extensions and controlled value vocabularies

GBIF Vocabularies as a collaborative management tool for Darwin Core Archive extensions and controlled value vocabularies.

Figure components: Wiki Vocabulary Management; ISOcat Vocabulary Management; MS Excel Template for Vocabularies; Concept Vocabulary (rdf, skos); Resources Repository; DwC-A extensions & controlled vocabularies; GBIF IPT; Scratchpads; GBIF Vocabulary Server (Drupal) (?). Evaluation of various tools for collaborative management of concept vocabularies (RDF).

The GBIF Vocabulary Server is based on Drupal 6 / Scratchpads (v1) --> Drupal 7 / Scratchpads2 --> Drupal 8?

Integration with Scratchpads2? Integration with the NPT?

Page 16

Semantic wiki forum for terms

A wiki forum for terms as an open community platform for the description and maintenance of existing terms. Could it also serve as a replacement for the GBIF Vocabulary Server?

Figure components: Wiki Forum for Terms (?); Wiki Vocabulary Management; ISOcat Vocabulary Management; MS Excel Template for Vocabularies; Concept Vocabulary (rdf, skos); Resources Repository; DwC-A extensions & controlled vocabularies; GBIF IPT; Scratchpads. Evaluation of various tools for collaborative management of concept vocabularies (RDF).

Page 17

GBIF Term browser

The GBIF Term Browser allows a user to browse for terms defined in widely used concept vocabularies such as Darwin Core, Dublin Core, FOAF, etc., including translations where available. http://kos.gbif.org/termbrowser/

Figure components: Wiki Vocabulary Management; ISOcat Vocabulary Management; MS Excel Template for Vocabularies; Concept Vocabulary (rdf, skos); Resources Repository. Evaluation of various tools for collaborative management of concept vocabularies (RDF).

Concept vocabularies stored/deposited at http://rs.gbif.org/terms/

Page 18

Biodiversity ontology management

Figure components: Concept Vocabulary (rdf, skos); Wiki tool incl. Ontology Management??; Resources Repository (incl. ontologies?); Ontologies (rdf, owl). REUSE terms from RDF vocabularies.

1. Evaluation of tools for the development of biodiversity ontologies.
2. Evaluation of biodiversity ontology repository solutions.

Page 19

BioPortal ontology repository

http://bioportal.bioontology.org/projects/168

Proposal: establish a biodiversity “slice” at the NCBO BioPortal.

•  Loading biodiversity ontologies into the NCBO BioPortal promotes mapping (and reuse of terms) between biomedical and biodiversity ontologies.

•  A separate instance of the BioPortal software for biodiversity would require a long-term commitment to host and maintain the resource: does GBIF, for example, have the resources to host a BioPortal instance?

Page 20

GBIF KOS resources

Concept vocabularies (skos:ConceptScheme, RDF)

•  Darwin Core, Darwin Core “extensions”, NCD, GNA, Audubon Core (and other vocabularies of concepts)

as a basis and foundation for

Software application schemas (XML, XML Schema)

•  Darwin Core Archive (DwC-A) extensions and controlled value vocabularies.

•  Resources such as the DwC-A extensions and controlled value vocabularies REUSE terms (URIs) from a vocabulary of terms.

Page 21

Biodiversity KOS (based on Darwin Core)

Darwin Core (DwC) is a flat list of terms, expressed using RDF.
-> DwC “extensions” (flat vocabularies for declaring concepts).
-> Reuse concepts from other vocabularies whenever possible.

Darwin Core Archive (DwC-A) has a star schema model.
•  DwC-A core(s), extensions and controlled value vocabularies are declared as XML lists of terms.
•  DwC-A resources should REUSE terms from Darwin Core and other flat concept vocabularies.
•  New DwC-A core types (data types), e.g. sample? Formalize class entities (ontology). [Current types: Taxon & Occurrence]
-> Formalize a governance structure for maintaining KOS resources, based on the principles established for Darwin Core (towards TDWG VoMaG).

Page 22

Darwin Core Archive (DwC-A)
•  A DwC-A publishes DwC records, including terms from DwC-A extensions.
•  Simple text-based format.
•  Zipped single-file archive.

Germplasm.txt
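The slide's example archive (Germplasm.txt) is not reproduced here. As a minimal sketch, assuming a Taxon core with GBIF's VernacularName extension as a stand-in, the Python below builds a single zipped archive whose meta.xml declares one core file and one extension file, then reads it back into the star schema by attaching each extension row to its core record through the coreid column. The file names taxon.txt and vernacular.txt and the sample rows are illustrative; a real archive normally also carries an EML metadata file, and this reader ignores quoting and other meta.xml options.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}

META_XML = """<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Taxon">
    <files><location>taxon.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </core>
  <extension encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
             ignoreHeaderLines="1" rowType="http://rs.gbif.org/terms/1.0/VernacularName">
    <files><location>vernacular.txt</location></files>
    <coreid index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/vernacularName"/>
    <field index="2" term="http://purl.org/dc/terms/language"/>
  </extension>
</archive>
"""


def _tsv(header, rows):
    return "\n".join("\t".join(r) for r in [header, *rows]) + "\n"


def write_archive(target, taxa, vernaculars):
    """Zip meta.xml plus one tab-separated text file per core/extension."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("meta.xml", META_XML)
        zf.writestr("taxon.txt", _tsv(["id", "scientificName"], taxa))
        zf.writestr("vernacular.txt", _tsv(["coreid", "vernacularName", "language"], vernaculars))


def _rows(zf, section):
    # Simplification: assumes tab-separated, unquoted fields as declared above.
    location = section.find("dwc:files/dwc:location", NS).text
    skip = int(section.get("ignoreHeaderLines", "0"))
    lines = zf.read(location).decode("utf-8").splitlines()[skip:]
    return [line.split("\t") for line in lines if line]


def _terms(section):
    return {int(f.get("index")): f.get("term") for f in section.findall("dwc:field", NS)}


def read_star(zf):
    """Rebuild the star schema: each core record with its extension rows attached via coreid."""
    meta = ET.fromstring(zf.read("meta.xml"))
    core = meta.find("dwc:core", NS)
    id_index, core_terms = int(core.find("dwc:id", NS).get("index")), _terms(core)
    records = {}
    for row in _rows(zf, core):
        fields = {core_terms[i]: v for i, v in enumerate(row) if i in core_terms}
        records[row[id_index]] = {"core": fields, "extensions": {}}
    for ext in meta.findall("dwc:extension", NS):
        coreid_index, terms = int(ext.find("dwc:coreid", NS).get("index")), _terms(ext)
        for row in _rows(zf, ext):
            fields = {terms[i]: v for i, v in enumerate(row) if i in terms}
            records[row[coreid_index]]["extensions"].setdefault(ext.get("rowType"), []).append(fields)
    return records


buffer = io.BytesIO()
write_archive(buffer, [("1", "Puma concolor")], [("1", "Cougar", "en"), ("1", "Puma", "es")])
with zipfile.ZipFile(buffer) as zf:
    records = read_star(zf)
print(records["1"]["extensions"])
```

Note that every column is identified by a term URI from Darwin Core or Dublin Core, which is how the archive REUSES terms from published vocabularies.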

Page 23

Darwin Core Archive extension (XML term list)

http://rs.gbif.org/sandbox/extension/audubon.xml
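An extension document like the one linked above lists its properties together with their term URIs, which makes it possible to check how many terms are REUSED from existing concept vocabularies. This is a minimal sketch, assuming a trimmed-down document modeled on the GBIF extension format (an extension root in the http://rs.gbif.org/extension/ namespace whose property children carry a qualName attribute). The inline XML and its example.org term are illustrative and are not copied from audubon.xml; check a real extension file before relying on these attribute names.

```python
import xml.etree.ElementTree as ET

EXT_NS = "http://rs.gbif.org/extension/"

# Trimmed, illustrative extension document (assumed shape, not the real audubon.xml).
EXTENSION_XML = """<extension xmlns="http://rs.gbif.org/extension/"
    name="Example" rowType="http://example.org/terms/Example">
  <property name="identifier" namespace="http://purl.org/dc/terms/"
            qualName="http://purl.org/dc/terms/identifier"/>
  <property name="vernacularName" namespace="http://rs.tdwg.org/dwc/terms/"
            qualName="http://rs.tdwg.org/dwc/terms/vernacularName"/>
  <property name="localThing" namespace="http://example.org/terms/"
            qualName="http://example.org/terms/localThing"/>
</extension>"""


def property_uris(extension_xml):
    """Return the qualName (term URI) of every property in the extension."""
    root = ET.fromstring(extension_xml)
    return [p.get("qualName") for p in root.findall(f"{{{EXT_NS}}}property")]


def reuse_report(uris, vocabulary_namespaces):
    """Split term URIs into those reused from known vocabularies and locally minted ones."""
    reused = [u for u in uris if u.startswith(tuple(vocabulary_namespaces))]
    minted = [u for u in uris if u not in reused]
    return reused, minted


reused, minted = reuse_report(
    property_uris(EXTENSION_XML),
    ["http://rs.tdwg.org/dwc/terms/", "http://purl.org/dc/terms/"],
)
print(minted)  # ['http://example.org/terms/localThing']
```

Locally minted terms flagged this way are candidates for declaring in a concept vocabulary of their own, as the following slides propose for GNA.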

Page 24

Concept vocabulary (RDF/SKOS)

http://rs.gbif.org/terms/geotime/geotimeConcept.rdf

In progress: XSLT -> HTML transformation for a human-readable version.

Page 25

GBIF Vocabulary Server

The GBIF Vocabulary Server can help a user create and manage DwC-A extensions or controlled value vocabularies. However, it is not designed to create RDF/SKOS concept vocabulary resources with reusable concepts: it can export XML, but not RDF. It is based on Scratchpads (v1), i.e. Drupal 6.

Screenshots: edit interface, XML export.

Page 26

Global Names Architecture (GNA)

Many of the GNA term URI identifiers do not resolve (404 Not Found). The rowType identifiers simply resolve to the software application schema (the DwC-A extension). We propose to formalize the GNA concept declarations using RDF/SKOS for improved reusability of the GNA terms and concepts.

Page 27

Global Names Architecture (GNA)


The Global Names Architecture (GNA) terms were originally simply declared by the DwC-A extension. We propose to formalize the GNA concept declarations using RDF/SKOS for improved re-usability of the GNA terms.

RDF/SKOS    

XML  

Page 28

Global Names Architecture (GNA)


We propose to formalize the GNA concept declarations using RDF/SKOS for improved re-usability of the GNA terms.

RDF/SKOS  

Page 29

Darwin Core Archive extensions

•  Global Names Architecture (GNA)
•  Audubon Core (multimedia)
•  Invasive species (GISIN)
•  Genetic Resources (Germplasm)
•  Natural Collections Description (NCD)
•  Metadata profile (EML)
•  EOL species profile
•  Taxonomic Concept Schema (TCS)
•  Genomics Standards Consortium (GSC)
•  Meta-genomics (?)
•  ABCD (?)
•  …

Page 30

Controlled value vocabularies

•  Geological time periods
  •  chronostratigraphy
  •  magnetostratigraphy
•  Species interactions
  •  saproxylic interactions
  •  pollinators
•  Country codes
•  Language
•  Basis of record
•  Taxonomic rank
•  Nomenclatural status
•  Life form
•  Life stage
•  …

Page 31

A proposed workflow / brainstorming

Page 32

Versioning resources

Move outdated vocabularies to a separate folder named “deprecated”? No versions? Will the IPT be aware of this folder? Note that previous DwC-A datasets could be mapped to deprecated vocabulary resources…!

Page 33

Versioning resources

Version the DwC-A vocabularies and extensions using a [_DATE] postfix. Could the IPT be made aware of this postfix? Note that previous DwC-A datasets could be mapped to outdated vocabulary resources…!
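A [_DATE] postfix keeps every version addressable, so a tool can find both the current version and the version an older dataset was mapped to. This is a minimal sketch in Python, assuming a hypothetical naming convention of name_YYYY-MM-DD.ext (ISO 8601 dates); the file names are illustrative and do not describe how the IPT actually behaves.

```python
import re
from datetime import date

# Hypothetical naming convention: <name>_<YYYY-MM-DD>.<extension>
VERSIONED = re.compile(r"^(?P<name>.+)_(?P<date>\d{4}-\d{2}-\d{2})\.(?P<ext>\w+)$")


def parse(filename):
    """Return (name, date) for a dated file name, or None if it has no [_DATE] postfix."""
    m = VERSIONED.match(filename)
    return (m["name"], date.fromisoformat(m["date"])) if m else None


def versions(filenames, name):
    """All dated versions of one vocabulary, oldest first."""
    found = []
    for fn in filenames:
        parsed = parse(fn)
        if parsed and parsed[0] == name:
            found.append((parsed[1], fn))
    return sorted(found)


def latest(filenames, name):
    """The current version: the file with the most recent date."""
    found = versions(filenames, name)
    return found[-1][1] if found else None


def as_of(filenames, name, when):
    """The version in force on a given day, e.g. the one an older DwC-A dataset was mapped to."""
    candidates = [fn for d, fn in versions(filenames, name) if d <= when]
    return candidates[-1] if candidates else None


files = ["lifeStage_2011-05-01.xml", "lifeStage_2012-08-31.xml", "taxonRank_2012-01-15.xml", "README.txt"]
print(latest(files, "lifeStage"))                    # lifeStage_2012-08-31.xml
print(as_of(files, "lifeStage", date(2012, 1, 1)))  # lifeStage_2011-05-01.xml
```

Because the date sorts naturally, no separate version registry is needed; the trade-off is that every consumer must understand the naming rule.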

Page 34

Versioning RDF vocabularies

Move outdated vocabularies to a subfolder named “archive/[DATE]”? Same versioning model for extensions and vocabularies…?

Page 35

Versioning RDF vocabularies

Deprecated and outdated vocabularies and DwC-A resources could declare their status, e.g. using dcterms:isReplacedBy…? Drawback: the XML document must be accessed and parsed to read the resource status.
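The drawback above is concrete: to find the current resource, a client has to parse each status document and follow the replacement links. This is a minimal sketch in Python of doing that, assuming the status is stated in RDF/XML with rdf:Description elements carrying dcterms:isReplacedBy; the example.org resource URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"

STATUS_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://example.org/vocab/lifeStage_2010-03-01.xml">
    <dcterms:isReplacedBy rdf:resource="http://example.org/vocab/lifeStage_2011-05-01.xml"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/vocab/lifeStage_2011-05-01.xml">
    <dcterms:isReplacedBy rdf:resource="http://example.org/vocab/lifeStage_2012-08-31.xml"/>
  </rdf:Description>
</rdf:RDF>"""


def replacement_map(rdf_xml):
    """Parse RDF/XML and collect resource -> replacing resource from dcterms:isReplacedBy."""
    root = ET.fromstring(rdf_xml)
    mapping = {}
    for desc in root.iter(f"{{{RDF}}}Description"):
        about = desc.get(f"{{{RDF}}}about")
        for rep in desc.findall(f"{{{DCTERMS}}}isReplacedBy"):
            mapping[about] = rep.get(f"{{{RDF}}}resource")
    return mapping


def resolve_current(uri, is_replaced_by):
    """Follow isReplacedBy links until reaching a resource that has not been replaced."""
    seen = [uri]
    while uri in is_replaced_by:
        uri = is_replaced_by[uri]
        if uri in seen:
            raise ValueError("cycle in isReplacedBy chain: " + " -> ".join(seen + [uri]))
        seen.append(uri)
    return uri


mapping = replacement_map(STATUS_RDF)
print(resolve_current("http://example.org/vocab/lifeStage_2010-03-01.xml", mapping))
```

The cycle check matters because replacement statements are maintained by hand and a loop would otherwise never terminate.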

Page 36

Versioning vocabulary resources

•  Separate folder named “deprecated”?

•  Postfix using [_DATE]?

•  Subfolder named “archive/[DATE]”?

•  dcterms:isReplacedBy

•  Other ideas, solutions?

Page 37

A proposed workflow

Page 38

Translation of vocabulary term descriptions

Term translations (SKOS/RDF) dwc_translations.rdf

Archive (SKOS/RDF) [DATE]/dwc_translations.rdf

Export a working file from the SKOS file (RDF/SKOS -> CSV).

Expert working groups or a collaborative expert community develop new translations or refine previous translations.

Archive the translations each time the “active” SKOS file is updated.

The expert group provides their output as a CSV file, XML data or as a SKOS/RDF resource.

Translations for a given vocabulary of terms are maintained and published as a SKOS/RDF file at the GBIF Resources Repository (http://rs.gbif.org/terms/).

Page 39

Example: master SKOS/RDF resource

http://rs.gbif.org/terms/dwc/dwc_translations.rdf

Language tags highlighted in the example: en, es, zh, ja

Page 40

Workflow for term translation

Term translations (SKOS/RDF) dwc_translations.rdf

Adding new term translations or updating previous term translations always starts and ends with the “active” SKOS/RDF resource for translations.

XSLT  

dwc_translations_de.csv dwc_translations_es.csv dwc_translations_fr.csv dwc_translations_jp.csv dwc_translations_ru.csv dwc_translations_zh_Hans.csv …

dwc_translations_fr.csv (*) updated

XSLT  

dwc_translations_de.csv dwc_translations_es.csv dwc_translations_fr.csv (*) dwc_translations_jp.csv dwc_translations_ru.csv dwc_translations_zh_Hans.csv … dwc_translations_pt.csv (**)

(*) An updated CSV file simply replaces the previously extracted translations for its language in the XSLT split-and-merge cycle. (**) Translations into a new language are added simply by adding its CSV resource to the XSLT cycle.

XSLT split and merge cycle

expert group
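The split-and-merge cycle can be illustrated without XSLT. The sketch below swaps in plain Python with an assumed in-memory master table of {term URI: {language: label}} instead of the SKOS/RDF file, and hypothetical CSV columns term,label. Splitting produces one working file per language; merging rebuilds the master, so an updated CSV replaces its language and a new CSV adds a language. Re-serializing the master as SKOS/RDF is not shown, and the sample translations are illustrative.

```python
import csv
import io


def split(master):
    """Split the master table {term URI: {language: label}} into one CSV text per language."""
    rows_by_lang = {}
    for term in sorted(master):
        for lang, label in master[term].items():
            rows_by_lang.setdefault(lang, []).append((term, label))
    files = {}
    for lang, rows in rows_by_lang.items():
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(["term", "label"])
        writer.writerows(rows)
        files[lang] = buf.getvalue()
    return files


def merge(files):
    """Rebuild the master table from per-language CSV texts.

    An updated CSV replaces the previous translations for its language, and a CSV
    for a new language simply adds that language.
    """
    master = {}
    for lang, text in files.items():
        for row in csv.DictReader(io.StringIO(text)):
            master.setdefault(row["term"], {})[lang] = row["label"]
    return master


master = {
    "http://rs.tdwg.org/dwc/terms/country": {"en": "Country", "es": "País"},
    "http://rs.tdwg.org/dwc/terms/locality": {"en": "Locality", "es": "Localidad"},
}
files = split(master)  # extracted working files, one per language
files["fr"] = "term,label\nhttp://rs.tdwg.org/dwc/terms/country,Pays\n"  # new language from an expert group
master = merge(files)
print(master["http://rs.tdwg.org/dwc/terms/country"])
```

As on the slide, the cycle always starts and ends with the single master resource, so the per-language CSV files are disposable working copies.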

Page 41

New data types?

- complement, not duplicate work

- GBIF as premier gateway to discovery, access

Genomic level observations

A roadmap developed by Q1 2013:
- genomic data
- ecological data

Ecological measurements associated with observations

Page 42

Metadata

The GBIF metadata catalogue system allows interoperability across distributed metadata repositories http://metadata.gbif.org

Essential for discovery and access to new data types

The challenge ahead ... populating the catalogue with high quality, complete metadata

Page 43

GBIF KOS work program: some suggested next steps

•  GBIF Resources Repository (http://rs.gbif.org/)

•  Further development of new DwC-A extensions and controlled value vocabularies.

•  Workflow for the translation of term descriptions.

•  Continue the evaluation of collaborative tools for the management of flat vocabularies of terms (RDF/SKOS).

•  Semantic Wiki, ISOcat, Protégé (web-protégé), …

•  New semantic Wiki for the description of terms / glossary of terms / community-driven discussion forum (with JKI, Gregor Hagedorn).

•  Discussion, discovery and REUSE of existing terms.

•  NCBO BioPortal as a repository for biodiversity ontologies.

•  Will GBIF contribute to minting new biodiversity ontologies?

•  A BFO-based OWL version of Darwin Core…?

•  KOS governance structure developed and formalized by the (TDWG) Vocabulary Management Task Group (VoMaG).

•  Roadmap for KOS into the GBIF infrastructure, portal, …!

Page 44

Furthermore, I think that we need persistent identifiers!

Cato the Elder ended all his speeches in the senate of Rome with: "Ceterum autem censeo Carthaginem esse delendam" (English: "Furthermore, I think Carthage must be destroyed").