
Review of Research Vocabularies Australia Simon J D Cox and Jonathan Yu

2019-08-14

Revised 2019-09-09

Prepared for Australian Research Data Commons

Adrian Burton, Cel Pilapil

Australia’s National Science Agency


Land and Water

Citation

Cox, S J D and Yu, J (2019) Review of Research Vocabularies Australia

Copyright

© Commonwealth Scientific and Industrial Research Organisation 2019. To the extent permitted by law, all rights are reserved and no part of this publication covered by copyright may be reproduced or copied in any form or by any means except with the written permission of CSIRO.

Important disclaimer

CSIRO advises that the information contained in this publication comprises general statements based on scientific research. The reader is advised and needs to be aware that such information may be incomplete or unable to be used in any specific situation. No reliance or actions must therefore be made on that information without seeking prior expert professional, scientific and technical advice. To the extent permitted by law, CSIRO (including its employees and consultants) excludes all liability to any person for any consequences, including but not limited to all losses, damages, costs, expenses and any other compensation, arising directly or indirectly from using this publication (in part or in whole) and any information or material contained in it.

CSIRO is committed to providing web accessible content wherever possible. If you are having difficulties with accessing this document please contact [email protected].


Contents

Acknowledgments
Executive summary
1 Goals and approach
2 Overview of Research Vocabularies Australia
2.1 Purpose
2.2 Current state
2.3 Usage patterns
2.4 Expressivity
3 Stakeholder survey
3.1 Limitations
3.2 Potential enhancements
4 Workshop
4.1 Workshop participants
4.2 Program
4.3 Workshop process
4.4 Workshop outputs
5 Related work
5.1 Initiatives and services
5.2 Vocabulary tools, libraries and platforms
5.3 Governance patterns
6 Recommendations
6.1 Engagement, training, practice
6.2 Governance
6.3 Tools and Technology
6.4 Content
6.5 Impact and analytics tracking


Acknowledgments

This report was the result of a consultation involving ARDC staff and various stakeholders from the Australian research community, as listed in 4.1 Workshop participants.


Executive summary

This review of Research Vocabularies Australia (RVA) evaluates its current capabilities and provides recommendations concerning its future directions and development. The review includes a stakeholder consultation and expert assessment.

The review found that RVA is meeting a clear need, and provides a suite of capabilities that are valued by the community. Recommendations are made for future improvements to RVA in a number of areas, including content, technology, user-support, governance and community engagement, and performance and analytics.


1 Goals and approach

The review was undertaken as part of a project to identify future directions for the ARDC data publication services in the area of (i) data discovery and (ii) vocabulary services. ARDC wishes to be informed and guided by external and independent research and consultation as to whether (and if so how) future development of the data publication services might be undertaken.

Specific topics relating to vocabularies include

• vocabulary re-use/discovery/tooling/applications layer

• complex knowledge organisation and related research infrastructure scenarios

• persistent URL management

Inputs to the review were

1. stakeholder survey - issued 2019-06-10

2. stakeholder workshop - held 2019-06-25

3. background knowledge and environmental scans by consultants

4. interviews with ARDC staff

5. comments on the draft report from workshop attendees and other stakeholders


2 Overview of Research Vocabularies Australia

2.1 Purpose

Research Vocabularies Australia (RVA) offers a suite of services to allow the Australian research community to publish and access controlled vocabularies used in various disciplines and cross-disciplinary applications via the world wide web.

RVA is one of a number of web-based services providing term definitions for use in research projects and data. RVA is unusual in that its scope is the whole of the research sector, though its content is loaded in response to specific community requests, and reflects varying levels of engagement with different research communities and disciplines. Nevertheless, the technical basis for RVA’s services follows international norms and best practice.

Community vocabulary services enable sharing of technical definitions within and between disciplines. These definitions are important both for tagging projects, initiatives and datasets to assist discovery in data catalogues, and for providing specific definitions of elements within datasets, such as units of measure, experimental procedures, variables, statistical classifications, etc. These were traditionally captured in obscure annotations in ‘column headings’. Hosting precise, accessible, shared definitions in a common service with individual web-identifiers at a fine level of granularity (i.e. per term, not just per vocabulary) allows a project to refer to a shared definition, rather than transcribing it locally or even formulating a new definition for concepts that should be defined once and reused many times. This approach supports interoperability and re-use of data, consistent with the principle that research datasets should be FAIR - Findable, Accessible, Interoperable and Reusable.

2.2 Current state

RVA includes the following components:

- Vocabulary editor (PoolParty - commercial product)

- Vocabulary repository (RDF4J/ELDA/SISSvoc - open source)

- Vocabulary registry (RVA Registry - ARDC developed and maintained)

- Vocabulary portal (RVA Portal - UI over RVA Registry API)

- User support, documentation


Figure 1. ARDC Vocabulary Services - component diagram

The technologies for each function were selected following a requirements-gathering and assessment exercise initiated in 2014.

These components enable several modes of access to vocabularies via the online web system:

- browser user interface - for vocabulary discovery, term search, and vocabulary exploration

- browser user interface - for vocabulary registration, upload and versioning

- registration API - for vocabulary maintainers

- search and download API - for system integrators and application builders

- vocabulary widget - for application builders

217 vocabularies are visible in the RVA Portal (as of 2019-07-30), including

- 80 which are a simple listing, with a link to an externally hosted web-page for the vocabulary

- 137 which are hosted in the repository, enabling an API and interactive UI for exploration, and optional widget, of which

o 100 were loaded from PoolParty

o 37 were prepared using another tool, and then loaded from an RDF representation


Figure 2. Summary of hosting arrangements for vocabularies listed in the RVA Portal

The vocabularies hosted in the RVA repository follow contemporary best practice whereby

- each term is denoted by a web-identifier (URI)

o e.g. http://resource.geosciml.org/classifier/ics/ischart/Jurassic

- a description of each term is available

o as a web-page (for human consumption),

o structured using SKOS, in various RDF encodings (for use in applications)

The RVA API and UI use SISSVoc, in which the description of each term and vocabulary is accessed at a URL which includes the term URI, e.g.

http://vocabs.ands.org.au/repository/api/lda/csiro/international-chronostratigraphic-chart/2018-revised-corrected/resource?uri=http://resource.geosciml.org/classifier/ics/ischart/Jurassic

Here the part up to and including resource?uri= is the URL for the vocabulary service, and the remainder is the URI for a term or vocabulary item.
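
As a minimal sketch of this URL pattern, the following Python builds the address of a term description from a term URI, and recovers the term URI from such an address. The base address is copied from the example above; the function names resource_url and term_uri_from are illustrative and not part of SISSVoc. The sketch percent-encodes the term URI in the query string, which is an assumption about good practice rather than something the report specifies; the example above shows the URI unencoded, and the parser here accepts both forms.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base of the SISSVoc endpoint for one vocabulary version (copied from the report's example).
SISSVOC_BASE = (
    "http://vocabs.ands.org.au/repository/api/lda/csiro/"
    "international-chronostratigraphic-chart/2018-revised-corrected"
)


def resource_url(term_uri, base=SISSVOC_BASE):
    """Return the URL at which the description of term_uri is served."""
    # urlencode percent-encodes ':' and '/' inside the query value.
    return f"{base}/resource?{urlencode({'uri': term_uri})}"


def term_uri_from(url):
    """Recover the term URI from a 'resource' URL (encoded or unencoded)."""
    return parse_qs(urlsplit(url).query)["uri"][0]


jurassic = "http://resource.geosciml.org/classifier/ics/ischart/Jurassic"
print(resource_url(jurassic))
print(term_uri_from(resource_url(jurassic)))
```

Keeping the term URI as a query parameter means the same service address can describe any term in the vocabulary, whatever domain the term URI belongs to.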

2.3 Usage patterns

Vocabulary contributors use RVA in a number of different ways.

2.3.1 Primary host, API and UI

Many contributors use RVA as the primary or sole host for their vocabulary, and suggest that their community use the RVA user-interface and API as-is. This particularly applies to those who also use the RVA-hosted PoolParty for vocabulary creation and maintenance.


In this case it is recommended that the owner of the URI domain for the term or vocabulary (e.g. resource.geosciml.org) re-direct all URI requests to RVA, e.g.

http://resource.geosciml.org/classifier/ics/ischart/Jurassic

-- HTTP 302/303/307 →

http://vocabs.ands.org.au/repository/api/lda/csiro/international-chronostratigraphic-chart/2018-revised-corrected/resource?uri=http://resource.geosciml.org/classifier/ics/ischart/Jurassic
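
The redirect rule can be sketched as a small function that a web server on the URI domain might call for each incoming request. This is an illustration only: the function name redirect_for is hypothetical, the choice of status 303 is one of the three options listed above, and a real deployment would more likely express the rule in web-server configuration.

```python
from http import HTTPStatus
from urllib.parse import quote

# Domain that owns the term URIs, and the RVA address that describes them
# (both copied from the example in the report).
URI_DOMAIN = "http://resource.geosciml.org"
RVA_RESOURCE = (
    "http://vocabs.ands.org.au/repository/api/lda/csiro/"
    "international-chronostratigraphic-chart/2018-revised-corrected/resource"
)


def redirect_for(path):
    """Map a request path on the URI domain to (status code, Location header)."""
    term_uri = URI_DOMAIN + path
    # Leave ':' and '/' unescaped so the target matches the form shown in the report.
    location = f"{RVA_RESOURCE}?uri={quote(term_uri, safe=':/')}"
    return int(HTTPStatus.SEE_OTHER), location


status, location = redirect_for("/classifier/ics/ischart/Jurassic")
print(status, location)
```

Because the term URI itself never changes, the vocabulary owner can later point the redirect at a different host without breaking any data that cites the term.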

2.3.2 Primary host, remote UI

Some contributors use the RVA repository as the primary host for their vocabulary, and then provide their own UI (e.g. Geoscience Australia, TERN). The connection to the content may be either through the SPARQL endpoint or the SISSVoc API.

2.3.3 Secondary (backup) host or cache

The RVA Repository can serve as a reliable alternative host (backup or cache) for any vocabulary maintainer.

2.3.4 Linked-data cache

A common use of RVA is to provide a linked-data representation or view of a vocabulary whose primary or canonical representation is less web-friendly, such as a tabulation in a CSV file or spreadsheet, a PDF or a web-page. For example, several vocabularies from the Australian Bureau of Statistics are ‘re-published’ in RVA.
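
To make the idea of re-publishing concrete, the following sketch converts a small tabular vocabulary into a SKOS representation in Turtle, giving each row its own web identifier. It is not the process used by RVA or its contributors: the column names (code, label, definition), the example.org address and the sample rows are all hypothetical, and only the Python standard library is used.

```python
import csv
import io
from urllib.parse import quote

SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


def turtle_string(text, lang=None):
    """Quote a Python string as a Turtle literal, optionally language-tagged."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def csv_to_skos_turtle(csv_text, scheme_uri):
    """Convert rows with 'code', 'label' and 'definition' columns to SKOS Turtle."""
    out = [
        f"@prefix skos: <{SKOS_NS}> .",
        "",
        f"<{scheme_uri}> a skos:ConceptScheme .",
    ]
    for row in csv.DictReader(io.StringIO(csv_text)):
        # One web identifier per term, derived from the code column.
        concept_uri = f"{scheme_uri}/{quote(row['code'], safe='')}"
        out += [
            "",
            f"<{concept_uri}> a skos:Concept ;",
            f"    skos:inScheme <{scheme_uri}> ;",
            f"    skos:notation {turtle_string(row['code'])} ;",
            f"    skos:prefLabel {turtle_string(row['label'], 'en')} ;",
            f"    skos:definition {turtle_string(row['definition'], 'en')} .",
        ]
    return "\n".join(out) + "\n"


# Hypothetical sample table; not taken from any real classification.
SAMPLE_CSV = (
    "code,label,definition\n"
    "A,Alpha,First example term\n"
    "B,Beta,Second example term\n"
)

turtle = csv_to_skos_turtle(SAMPLE_CSV, "http://example.org/vocab/demo")
print(turtle)
```

The resulting file is the kind of RDF representation that could then be loaded into a repository, while the CSV remains the canonical source.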

2.4 Expressivity

The RVA repository is a standard RDF triple store, so vocabularies may include elements from any RDF vocabulary (or OWL ontology). A small number of vocabularies in RVA do go beyond the basic SKOS properties in order to get more semantic expressivity. However, the higher-level RVA applications (Vocabulary API, RVA catalogue, vocabulary widgets) are optimised for SKOS, and assume a particular pattern for the internal organization and nesting of vocabulary items. Most RVA vocabularies follow this pattern and the basic SKOS representation.
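
One simple way to see whether a vocabulary goes beyond basic SKOS is to list the predicates it uses that fall outside the SKOS namespace. The sketch below does this over triples held as plain Python tuples. It rests on assumptions: the report does not say which properties the RVA applications handle, so treating "rdf:type plus anything in the SKOS namespace" as the basic set is a simplification, and the example.org terms and the use of the OWL-Time hasBeginning property are purely illustrative.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def predicates_beyond_skos(triples):
    """Return the sorted predicates that are neither rdf:type nor in the SKOS namespace."""
    return sorted(
        {p for _, p, _ in triples if p != RDF_TYPE and not p.startswith(SKOS_NS)}
    )


# Hypothetical vocabulary fragment: three SKOS statements and one richer statement.
EX = "http://example.org/vocab/demo/"
TRIPLES = [
    (EX + "A", RDF_TYPE, SKOS_NS + "Concept"),
    (EX + "A", SKOS_NS + "prefLabel", "Alpha"),
    (EX + "A", SKOS_NS + "broader", EX + "B"),
    (EX + "A", "http://www.w3.org/2006/time#hasBeginning", EX + "A-start"),
]

extra = predicates_beyond_skos(TRIPLES)
print(extra)
```

A vocabulary for which this list is empty stays within SKOS; any entries indicate statements that a SKOS-oriented interface may store but not display.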


3 Stakeholder survey

A survey was distributed to RVA stakeholders prior to the workshop. The number of responses was small (8), but they provided valuable qualitative information. A dominant message was that RVA is highly valued by those stakeholders that have used it.

3.1 Limitations

Descriptions of some specific limitations and impediments to greater use of RVA were provided:

- Availability of and preference for a community/domain service (e.g. CESSDA)

- The current interface makes it difficult to search for terms across vocabularies

o “you get a list of sets and NOW you have to look into each one to actually find a potential term. Tedious and useless”

o “Discovering a term across multiple vocab portals with a single query would be nice. It took me 30 minutes to find a term or even if one existed just looking at 3 sites.”

- The vocabulary upload/maintenance API makes it difficult to integrate with enterprise tools and processes

o “ABARES staff would be seeking support on how to manage this update within the service ...”

- The level of community or domain endorsement per vocabulary is unclear

o “We have authoritative datasets but are there authoritative vocabularies (community endorsed vocabularies?), and if so, how are they identified within a vocabulary service?”

- Government agencies are unsure of the scope of RVA/ARDC

o “could the RVA service support more than just the research community?”

3.2 Potential enhancements

Participants rated 10 potential RVA enhancements, as summarized in the following table (there were only 6 responses to the structured questions).


4 Workshop

A workshop of RVA users was held on 2019-06-25 at Black Mountain, Canberra.

4.1 Workshop participants

Attendees | Organizers
Christine Price, BOM | Cel Pilapil, ARDC
Edmond Chuc, TERN (remote) | Joel Benn, ARDC
Siddeswara Guru, TERN | Adrian Burton, ARDC
Natalia Atkins, IMOS | Jonathan Yu, CSIRO
Megan Wong, FedUni | Simon Cox, CSIRO
Robyn Tottenham, ABARES | Rowan Brownlee, ARDC
Tatiana Antsoupova, NAA | Richard Walker, ARDC
Tessa Elieff, NAA | Julia Martin, ARDC
Chantelle Doan, BOM |
Jasmine Howorth, ABARES |
Jenny Wood, AIATSIS |
Joanne Sullivan, ABARES |
John Morrissey, CSIRO |
John Page, CSIRO |
Lara Sedgemen, GA |
Les Kneebone, APO |
Marina McGale, ADA |
Margie Smith, GA |
Negin Moghaddam, GA |


4.2 Program

Time | Topic
9:30-10:00am | Arrival, Tea/Coffee
10:00-10:05am | Welcome and housekeeping
10:05-10:10am | Goals and structure of workshop
10:10-10:20am | Self introduction: your name; your affiliation; why are you here?
10:20-11:10am | Setting the Scene: presentations from three exemplary users of controlled vocabularies, addressing the question "Where is knowledge organisation currently in your enterprise?"; future possibilities based on international examples
11:10-11:20am | Stretch break/Coffee/Morning tea
11:20am-12:30pm | Breakout session: what is working in RVA?; what are the gaps for satisfying community use cases (which may exist at different levels, users, operations); report back from discussion in breakout
12:30-1:15pm | Lunch
1:15-2:30pm | Interactive - Options for improvement and new things
2:30-2:55pm | Priorities for RVA
2:55-3:00pm | Closing remarks
3:00pm | End of workshop

Workshop slides are available from https://docs.google.com/presentation/d/1tFWTuKnslAZAm4h3K8xgO_1UFOjPfph3JrB3qgq7cMo/view


4.3 Workshop process

The workshop was primarily framed around a series of table sessions of small groups, to allow for contributions by all participants.

The first set of breakouts focused on assessing the current state of RVA and identifying candidate opportunities for change or enhancement. The second set of breakouts aimed at prioritizing the proposed changes according to their importance and urgency. The results of the interactive sessions were captured with post-its that were sorted into four categories: Keep, Stop, Change, New.

4.4 Workshop outputs

4.4.1 Keep and stop

There was a general consensus that RVA is offering an important service, with most current features found useful (PoolParty editor, widget, notifications).

The user-support and engagement arrangements are particularly strongly endorsed. One-on-one coaching of new content providers is found to be very useful. However, it is also noted that this may not be scalable or sustainable, and other user-support mechanisms should be pursued.

A few suggestions for stopping or de-emphasizing features were made, though these are not fully consistent with other recommendations.


4.4.2 Enhancements: changes and new features

Promotion and outreach activities are valued, and increases in these are suggested.

Term search is an important feature of RVA, though improvements are suggested.


General

IMPORTANT AND URGENT IMPORTANT BUT NOT SO URGENT

● Governance

● Tools

Usage analytics / feedback

Specific

IMPORTANT AND URGENT IMPORTANT BUT NOT SO URGENT

● Governance workflows (stories?)

● Versioning granularity (gap in tools?)

● Ability for non-vocabulary-owners to create mappings/matches between vocabularies

● Tools to support end user workflows, e.g. integrate vocabs easily into Excel, Databases, Geoserver

● Tutorials and educational communications (webinars, roadshow) - what is a vocab, why organisations should use RVA incl. government agencies

● Enhance vocab search

● Possible consolidation of vocabularies (currently multiple instances of similar scope and cross-domain use cases, e.g. organisation names, units of measure)

● Alerts and notifications when vocabulary of interest is added/created/edited

● Tool to remind vocabulary managers to check and update vocabulary

● Vocabulary recommender tool

● Citations for vocabularies

● Notifications and analytics for understanding users of vocabularies

● Notification of use of vocabs via widget

● Vocabulary publication workflow example stories written up

● Community building

● User interface consistency, customization


4.4.3 Ideal vocabulary service

A subset of the workshop participants were newcomers to RVA, with no current experience of the system and its features. In place of the ‘retrospective’ exercise, these participants were asked to speculate on what an ideal vocabulary service would offer. Their deliberations are summarized in the following figure:


5 Related work

This section provides a scan of some related vocabulary services work internationally. First, a set of vocabulary initiatives is summarized. Second, a review of related tools and platforms is provided. Note that this survey is not exhaustive, but indicates the scope of vocabularies published through online services, and some of the approaches and platforms in use.

5.1 Initiatives and services

Here we list a number of initiatives that are primarily concerned with specific collections of content. In some cases the scope is highly generic (e.g. Library of Congress) though still primarily designed to be used in a specific set of applications (e.g. cataloguing information artefacts). In other cases the scope is specialized, primarily domain or even application specific (e.g. DDI, NVS) though it may have utility more generally.

The table lists some services currently hosted by other institutions for the purposes of publishing and managing vocabularies online. In addition to the existing content services, some new initiatives are likely to generate new vocabularies of widespread significance.

NAME | SCOPE | STATUS | ROLE & USAGE

CURRENT

Library of Congress Linked Data Service https://id.loc.gov/ | Classifications for libraries and archives, e.g. Subject Headings, Classification schemes | Stable, curated | US, global reference

DDI Alliance controlled vocabularies http://www.ddialliance.org/controlled-vocabularies | Social science classifications and definitions | Stable, curated | Official statistics bureaux, World Bank

BioPortal https://bioportal.bioontology.org/ | Hosts biomedical ontologies from OBO and similar | Stable, open | Biomedical, life sciences

OBO Foundry http://www.obofoundry.org/ http://www.ontobee.org/ | Biological, biomedical, and life-sciences | Continuous maintenance | Biomedical research and tagging

Agroportal http://agroportal.lirmm.fr/ | Publishing and managing agricultural vocabularies | Stable, semi-curated |

AGROVOC http://agrovoc.uniroma2.it/agrovoc/agrovoc/en/ | Single-source controlled vocabulary of agriculture terms | Stable, curated |

GACS http://browser.agrisemantics.org/gacs/en/ | Harmonized vocabulary of agriculture terms | Experimental, curated |

NERC Vocabulary Service (NVS) https://www.bodc.ac.uk/resources/products/web_services/vocab/ | Oceanography - point of truth for multiple international initiatives | Stable, curated (Oracle/Jena/homebrew) | Point of truth for multiple international initiatives, e.g. SeaDataNet

UK Centre for Ecology and Hydrology Semantic Web Portal http://vocabs.ceh.ac.uk/edg/tbl/swp?_viewName=home | Environment and ecology, e.g. EnvThes | Experimental, curated |

ODM2 controlled vocabularies http://vocabulary.odm2.org/ | Classifiers for earth and environment science observations | Experimental, curated | Earthcube (US)

SWEET https://github.com/ESIPFed/sweet | Modular ontology suite covering Earth system science | Actively maintained through ESIP Semantic Technologies Committee | Established early in development of semantic web, widely used and cited

QUDT http://www.qudt.org/release2/qudt-catalog.html | Units of measure, quantity-types (observable properties) | In development, curated | Originally sponsored by NASA, now being positioned for general use (but stalled for several years and lost momentum)

EMERGING

InteroperAble Descriptions of Observable Property Terminology (I-ADOPT) https://www.rd-alliance.org/groups/interoperable-descriptions-observable-property-terminology-wg-i-adopt-wg | RDA Working Group to harmonize ‘observable property’ definitions and vocabularies. Input from EnvThes, ENVO, … | Proposed |

Digital Representation of Units of Measure (DRUM) http://www.codata.org/task-groups/drum | CODATA Task Group to address a well known gap in the science reference data systems, i.e. units-of-measure. Input from QUDT, NVS | | Supported by IUPAC, NIST, TopQuadrant


5.2 Vocabulary tools, libraries and platforms

This subsection provides background on the state-of-the-art around tools, libraries and platforms for vocabulary publication, editing and management. For each item considered we provide the following:

● Brief description

● Role - where in the stack the tool is designed to sit

● Who developed the tool

● Software licence

● Usage - where the tool is being used or has been used

● Last updated

● Status of development (in-development, closed, dormant)

The table below lists the set of tools and libraries currently available, some of which are used in RVA.

NAME DESCRIPTION ROLE STATUS USAGE

ELDA Java-based implementation of a Linked Data API library. Configurable way to access RDF data using simple RESTful URLs that are translated into queries to a SPARQL endpoint. https://github.com/epimorphics/elda

Linked Data API software

In-development. Last code commit Oct 2018. Open Source

Gozalez-Toral et al. 2019 National Archives UK UK Defra Data Services Platform SISSVoc and ARDC RDA

SISSVoc SISSVoc provides a set of SKOS-based APIs implemented using ELDA. http://sissvoc.info

Linked Data API software Implemention for SKOS

In-development. Last code commit Feb 2019.

Open Source

Research Vocabularies Australia AuScope

VocPrez Provides a configurable template-based SKOS presenter over the web. https://github.com/CSIRO-enviro-informatics/VocPrez

Presentation layer software

In-development. Alpha/Pre-release. Last code commit July 2019. Open Source

TERN Geosciences Australia

pyLDAPI Python-based Linked Data API library https://github.com/rdflib/pyLDAPI

Linked Data API software

In development. Last code commit June 2019. Open Source

VocPrez clients Loc-I project

SKOSMOS PHP-based web-based SKOS browser and publishing tool. Implements SKOS-based APIs with both human and machine readable views and queries supported. http://skosmos.org/

Linked Data API software implemention for SKOS

Open Source.Last code commit May 2019.

Developed at the National Library of Finland.

Contact: [email protected]

Finto

FAO / AGROVOC

Rhineland-Palatinate spatial data initiative classifications

Page 24: Review of Research Vocabularies Australia · spreadsheet, PDF or web-page. For example, several vocabularies from the Australian Bureau of Statistics are re-published in RVA. 2.4

Review of Research Vocabularies Australia | 17

UNESCO Thesaurus

Controlled vocabularies used to index Luxemburgish legislation

Loterre, a multidisciplinary terminology platform

VocBench Software for deploying a platform for editing ontologies, thesauri and RDF datasets. http://vocbench.uniroma2.it/ https://bitbucket.org/art-uniroma2/vocbench3/downloads/

Ontology/vocabulary editing environment software

Latest version: v3 In-development. Last code commit July 2019. Open Source

FAO / AGROVOC

BioPortal Online services for OWL ontologies; https://bioportal.bioontology.org/

Class and property search; Text annotator; Ontology recommender

Open Bioportal - biomedical- and life-sciences; Agroportal - agriculture

MMI ORR (common codebase with ESIP-COR)

Ontology Repository/Registry software https://mmisw.org/orrdoc/ https://github.com/mmisw/orr

supports creation of concept/class mappings

In development. Last code commit Feb 2019. Latest release: v3.8.2 Open Source

Marine community - appears to be funded via EarthCube projects. Basis of ESIP-COR http://cor.esipfed.org/ which hosts SWEET

Web Protege

Java-based software for deploying a platform for collaborative ontology development. https://github.com/protegeproject/webprotege

Online platform hosted by Stanford here: https://webprotege.stanford.edu/

Ontology/vocabulary editing environment software

In development. Last code commit July 2019. Open Source

Biomedical projects

Topbraid EDG-VM

Topbraid Enterprise Data Governance Vocabulary Management https://www.topquadrant.com/products/topbraid-edg-vocabulary-management/

Enterprise vocabulary editing/management environment software

Proprietary

PoolParty PoolParty Semantic Suite provides a platform for managing vocabularies and semantic definitions. https://www.poolparty.biz/

Vocabulary editing/ management environment software

Proprietary

Linked Data Registry

Platform for registration and maintenance of granular RDF-resources - such as SKOS concepts and collections; https://github.com/UKGovLD/registry-core

transparent fine-grained status and versioning

Open Source. Used by WMO, BRGM, and CSIRO

Re3gistry Platform for registration of data including reference data (definitions) https://joinup.ec.europa.eu/solution/re3gistry

Definitions service

Open Source (EUPL)

Used by INSPIRE, BRGM e.g. http://registre.geocatalogue.fr/codelist

5.3 Governance patterns

The initiatives listed above have a variety of workflows for development and maintenance of the vocabulary content. In many cases they are delivering vocabularies from legacy arrangements, some of which have tightly controlled maintenance by long-term custodians. But there is a general move towards community governance of research vocabularies.

5.3.1 Example implementations

Governance platforms increasingly utilize ticketing systems for proposed changes, hosted on GitHub or similar publicly visible systems. This allows for the development of consensus through threaded conversations, with a good level of transparency. Nevertheless, the exact patterns vary. For example:

SWEET

SWEET uses a GitHub issue-tracker: https://github.com/ESIPFed/sweet/issues. Contributors are encouraged to create pull-requests (PRs) that actually implement updates in the RDF sources. When it is judged that consensus has been reached, PRs are merged by the SWEET editor. Micro-citations for changes are supported using the contributor’s ORCID.

The ESIP-COR service is the ‘live’ system for resolving SWEET URIs, and periodic ‘releases’ of SWEET are announced which are then loaded into COR. The development version of SWEET is always available from GitHub.

ENVO

ENVO uses a GitHub issue-tracker: https://github.com/EnvironmentOntology/envo/issues. Contributors are expected to provide as much information as possible about their proposal, but implementation of new definitions and changes is done only by the ENVO editors. This is primarily because definitions are formally axiomatized using OWL and OBO, and this requires knowledge of the rest of ENVO and OBO.

ENVO URIs resolve to Ontobee http://www.ontobee.org/. ENVO is continuously maintained and does not have any notion of ‘versions’.

NVS

Many of the collections in the NERC Vocabulary Service are just representations of lists and controlled vocabularies maintained by specific agencies and organizations in the oceanography community. However, several key collections are actively maintained within NVS, with hundreds of updates every year.

For most of its history, content maintenance was undertaken privately by NVS staff, principally by harvesting definitions from datasets introduced into the BODC archive, but also in response to individual requests from users, through private channels (e.g. email).

In 2019 NVS established a publicly visible issue tracker on GitHub (https://github.com/nvs-vocabs), with one list per collection (each implemented as a GitHub repository), though the actual data is still maintained in an Oracle database that is not directly accessible outside BODC. Currently more than 90% of the changes are still managed outside the GitHub tracker, but the NVS team is considering strategies to harmonize the workflows.

NVS vocabularies are continuously maintained, with status metadata available for each term. Previous versions of each term definition are available from URIs including a version number. NVS also has a notion of versions on a per-collection (vocabulary) basis, with the older versions also available from URIs including a version number.
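
The per-term versioning pattern described for NVS can be illustrated with a minimal sketch. It is not the NVS implementation: the class names, the status values, the example.org base URI and the way the version number is appended to the URI are all assumptions made for illustration, and per-collection versions are not modelled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TermVersion:
    number: int       # version number that appears in the versioned URI
    definition: str
    status: str       # hypothetical status value, e.g. "accepted"


@dataclass
class Term:
    base_uri: str     # hypothetical URI for the current form of the term
    versions: List[TermVersion] = field(default_factory=list)

    def publish(self, definition: str, status: str = "accepted") -> TermVersion:
        """Append a new version; earlier versions are never overwritten."""
        version = TermVersion(len(self.versions) + 1, definition, status)
        self.versions.append(version)
        return version

    def current(self) -> TermVersion:
        """Return the most recently published version."""
        return self.versions[-1]

    def versioned_uri(self, number: int) -> str:
        """Build a URI that includes the version number (assumed pattern)."""
        if not 1 <= number <= len(self.versions):
            raise KeyError(f"no version {number}")
        return f"{self.base_uri.rstrip('/')}/{number}/"

    def resolve(self, uri: str) -> TermVersion:
        """Base URI gives the current version; a versioned URI gives that version."""
        if uri.rstrip("/") == self.base_uri.rstrip("/"):
            return self.current()
        for version in self.versions:
            if uri == self.versioned_uri(version.number):
                return version
        raise KeyError(uri)


if __name__ == "__main__":
    term = Term("https://example.org/collection/X01/current/TEMP01")
    term.publish("Temperature of the water body")
    term.publish("Temperature of the water body, measured in situ")
    print(term.resolve(term.base_uri).definition)
    print(term.versioned_uri(1))
```

Keeping every version in an append-only list is what lets an older definition remain resolvable after the term has been revised.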

QUDT

The QUDT project is tightly governed through a private GitHub.

--

These stories could inform the development of some technical practices for governance of individual vocabularies. Each of these processes uses the technology platform to support the governance arrangements adopted by the community, though in some cases (NVS) governance arrangements are evolving, partly prompted by the capabilities and expectations provided by the platform.

5.3.2 Role of community

However, the critical element of any governance story is to identify and delineate the community that has interest and expertise relating to a specific vocabulary scope, and to identify or designate gatekeepers for governance, and the method for reaching consensus or endorsement of changes.

The roles in maintaining a general-purpose registration system are nicely summarized in the diagram below from ISO 19135, with the following correspondences:

- The ‘Registry’ corresponds to the RVA vocabulary repository, with ARDC as the ‘Registry Manager’

- The ‘Submitting organization’ and ‘Register Owner’ are a vocabulary publisher, and the unit within that organization that takes responsibility for the content

- The ‘Control Body’ is the person or team delegated by the register owner to review and authorize content changes

- A ‘Register’ corresponds to an individual vocabulary, with the person responsible for loading and maintaining the content in RVA as the ‘Register Manager’
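
The role correspondences listed above can be captured in a small lookup table, for example as a starting point for recording governance roles in vocabulary metadata. This is only a sketch: the names `Role`, `RVA_COUNTERPART` and `may_authorize_change` are hypothetical, the descriptions are paraphrased from the list, and assigning ‘publisher’ and ‘responsible unit’ to the Submitting organization and Register Owner respectively is one reading of the text.

```python
from enum import Enum


class Role(Enum):
    """ISO 19135 role names as they appear in the list above."""
    REGISTRY = "Registry"
    REGISTRY_MANAGER = "Registry Manager"
    SUBMITTING_ORGANIZATION = "Submitting organization"
    REGISTER_OWNER = "Register Owner"
    CONTROL_BODY = "Control Body"
    REGISTER = "Register"
    REGISTER_MANAGER = "Register Manager"


# RVA counterpart of each role, paraphrased from the list above.
# The split between the two middle roles is an assumption (see lead-in).
RVA_COUNTERPART = {
    Role.REGISTRY: "RVA vocabulary repository",
    Role.REGISTRY_MANAGER: "ARDC",
    Role.SUBMITTING_ORGANIZATION: "vocabulary publisher",
    Role.REGISTER_OWNER: "unit within the publisher responsible for the content",
    Role.CONTROL_BODY: "person or team delegated to review and authorize changes",
    Role.REGISTER: "individual vocabulary",
    Role.REGISTER_MANAGER: "person who loads and maintains the content in RVA",
}


def may_authorize_change(role: Role) -> bool:
    """Only the Control Body reviews and authorizes content changes."""
    return role is Role.CONTROL_BODY


if __name__ == "__main__":
    for role in Role:
        print(f"{role.value:<24} -> {RVA_COUNTERPART[role]}")
```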

6 Recommendations

In this section we provide a tabulated summary of recommendations for future action in RVA. The recommendations are sorted into five categories, with three levels of priority. Recommendations arose from each of the different consultations, as indicated. The recommendations must be reviewed by ARDC to evaluate the extent to which they match the ARDC priorities, and the resourcing implications. It is recognised that it may not be feasible to execute all recommendations in the timeframe indicated, and that some adjustments to the prioritization may be necessary.

6.1 Engagement, training, practice

✪ ARDC contribution ✪ Expert feedback ✪ Stakeholder feedback (workshop) ✪ Stakeholder feedback (survey) | High priority/urgent

CONTINUE 1 YEAR 3 YEARS

✪✪ E.1 Vocabulary publication helpdesk (Rowan/Cel)

■ Email notification for vocabulary changes

■ AVSIG meetings, webinars for important and major RVA changes and updates

✪ E.2 Improved user support:

- Review on-boarding checklist for RVA contributors; complement with screen-capture presentations and pointers

- Engage with community to establish ‘Finding Aids’ to direct users to groups of vocabularies recommended in specific domains or applications, e.g. AIHW endorsed vocabularies for health

✪✪ E.3 Run RVA tutorials/roadshow to promote RVA, and create training materials and videos

✪✪ E.4 Develop user-stories of exemplary patterns and workflows

- Conversion of vocabularies to SKOS/SKOS+

- Registration and upload options

- Vocabulary maintenance/revision cycles/versioning

- Governance arrangements, including discipline/community engagement

- Use and adoption of vocabularies by communities – related to T.7

Use real examples like NEII, Soils, CGI (Geology), DDI, ABS

✪ E.6 Build a community of practice and/or specialised set of communities of practice, for specific domains/groups

✪ E.7 Support training for interacting with machine-to-machine interfaces, e.g. SPARQL training, examples/cookbook for ELDA, improving vocab widget for uptake

✪ E.8 Engage with the broader community to develop additional ‘Finding Aids’ to support a wider range of domains or applications of vocabularies. Related to T.11

6.2 Governance

✪ ARDC contribution ✪ Expert feedback ✪ Stakeholder feedback (workshop) ✪ Stakeholder feedback (survey) | High priority/urgent

CONTINUE 1 YEAR 3 YEARS

✪✪ G.1 Ensure that governance-related information (e.g. “legitimacy” of a vocabulary hosted in RVA on behalf of an external organization or community) and arrangements (provenance, last updated, point-of-contact etc.) are obvious/transparent on the landing page for each vocabulary (may require metadata remediation)

- Clean up the publisher/party list - align with ORCID, ROR or similar

- Additional related-party roles (e.g. http://registry.it.csiro.au/def/isotc211/CI_RoleCode)

- Revisit vocabulary metadata scheme - add link to issue-tracker(s) for community input, status description (e.g. authorized by X for use in specified contexts)

✪ G.2 Establish guidelines on how to maintain vocabularies (relates to E.3 and E.4)

✪ G.3 Document additional user stories and patterns of governance as part of RVA (relates to G.1 and E.4)

✪✪ G.4 Explore establishing a Vocabulary Governance Special Interest Group (VG-SIG) and RVA registration requirements (developed with the VG-SIG)

6.3 Tools and Technology

✪ ARDC contribution ✪ Expert feedback ✪ Stakeholder feedback (workshop) ✪ Stakeholder feedback (survey) | High priority/urgent

CONTINUE 1 YEAR 3 YEARS

✪ T.1 Vocabulary editor (PoolParty)

✪ T.2 Vocabulary repository/API (ELDA, SISSVoc)

✪ T.3 Simple interfaces for RVA portal

✪✪ T.4 Refine term search function - term-search function to return concept definitions (not just links to vocabularies containing concept definitions)

✪ T.5 Investigate possibility of a federated term search over the leading vocabulary services (see table)

✪✪ T.6 Provide manual, semi-automated and automated tools for establishing vocabulary mappings

✪ T.7 Undertake a user experience study to document and understand user needs and expectations for RVA services and RVA UI

✪ T.8 Provide a UI for vocabulary details with a look-and-feel more consistent with the ARDC Portal (may be alongside ELDA/SISSVoc, rather than in place of it)

✪ T.9 Provide additional content assimilation pathways (e.g. API for CSV or Excel)

✪ T.10 Provide APIs to support common application stacks (e.g. DBMS, GeoServer)

✪ T.11 Vocabulary recommender - c.f. https://bioportal.bioontology.org/recommender

✪✪✪✪ T.12 Support for cataloguing and hosting vocabulary mappings maintained as separate artefacts (e.g. linksets)

✪ T.13 Re-evaluate options for

● vocabulary-editor (e.g. VocBench)

● vocabulary API/UI (e.g. SKOSMOS, VocPrez)

✪✪ T.14 Additional content assimilation pathways (e.g. cache/sync with external services)

✪✪ T.15 APIs to support additional application stacks (e.g. DBMS, GeoServer)

✪✪ T.16 Expose registry metadata through m2m interfaces that are more “standards compliant” (DCAT, schema.org)

✪✪ T.17 Collaborate with other vocabulary service providers to specify patterns for, or standardize, vocabulary service APIs

6.4 Content

✪ ARDC contribution ✪ Expert feedback ✪ Stakeholder feedback (workshop) ✪ Stakeholder feedback (survey) | High priority/urgent

CONTINUE 1 YEAR 3 YEARS

✪ C.1 Continue access to different RDF-based schemas for definitions: SKOS/schema.org/‘custom schema’

✪✪ C.2 Add more external vocabularies of interest to the Australian research community. Survey specific domain-communities to identify gaps. Candidates might include:

● QUDT, EnvThes, SWEET

● UN, DDI and ABS vocabularies

● ODM2 codelists

● ASLS (soil and landscape)

✪ C.3 Provide a directory of (external) vocabulary services of interest to the Australian research community

✪ C.4 Periodically poll publishers/custodians to verify the status of published vocabularies (c.f. FAIRsharing practice)

These recommendations might require a specific RVA content-curator role to be designated.

✪ C.8 Clean up the registers associated with RVA - esp. related-parties (organizations, publishers, creators), and rationalize duplicates so that the faceted search is better – also see G.1

✪✪✪ C.5 Include vocabulary mappings

✪✪✪ C.6 Support richer vocabulary structures in the standard RVA portal (SKOS, OWL)

✪ C.7 Vocabularies with overlapping scope - in consultation with the relevant communities develop strategy to consolidate these, or to provide trust metrics to support selection of vocabularies – likely to involve engagement with community of practice E.6 and Governance interest group G.4

6.5 Impact and analytics tracking

✪ ARDC contribution ✪ Expert feedback ✪ Stakeholder feedback (workshop) ✪ Stakeholder feedback (survey) | High priority/urgent

CONTINUE 1 YEAR 3 YEARS

✪✪ I.1 Support richer usage analytics / feedback

● Analytics visible to vocabulary managers for providing feedback to their host organisation

● Use of vocabs via widget

✪ I.2 Support richer usage analytics / feedback

● Publicly accessible general usage analytics

● Citations for vocabularies

Analytics for understanding users of vocabularies
