Review of Research Vocabularies Australia
Simon J D Cox and Jonathan Yu
2019-08-14
Revised 2019-09-09
Prepared for Australian Research Data Commons
Adrian Burton, Cel Pilapil
Australia’s National Science Agency
Review of Research Vocabularies Australia | i
Land and Water
Citation
Cox, S J D and Yu, J (2019) Review of Research Vocabularies Australia
Copyright
© Commonwealth Scientific and Industrial Research Organisation 2019. To the extent permitted
by law, all rights are reserved and no part of this publication covered by copyright may be
reproduced or copied in any form or by any means except with the written permission of CSIRO.
Important disclaimer
CSIRO advises that the information contained in this publication comprises general statements
based on scientific research. The reader is advised and needs to be aware that such information
may be incomplete or unable to be used in any specific situation. No reliance or actions must
therefore be made on that information without seeking prior expert professional, scientific and
technical advice. To the extent permitted by law, CSIRO (including its employees and consultants)
excludes all liability to any person for any consequences, including but not limited to all losses,
damages, costs, expenses and any other compensation, arising directly or indirectly from using this
publication (in part or in whole) and any information or material contained in it.
CSIRO is committed to providing web accessible content wherever possible. If you are having
difficulties with accessing this document please contact [email protected].
Contents
Acknowledgments
Executive summary
1 Goals and approach
2 Overview of Research Vocabularies Australia
2.1 Purpose
2.2 Current state
2.3 Usage patterns
2.4 Expressivity
3 Stakeholder survey
3.1 Limitations
3.2 Potential enhancements
4 Workshop
4.1 Workshop participants
4.2 Program
4.3 Workshop process
4.4 Workshop outputs
5 Related work
5.1 Initiatives and services
5.2 Vocabulary tools, libraries and platforms
5.3 Governance patterns
6 Recommendations
6.1 Engagement, training, practice
6.2 Governance
6.3 Tools and Technology
6.4 Content
6.5 Impact and analytics tracking
ii | CSIRO Australia’s National Science Agency
Acknowledgments
This report was the result of a consultation involving ARDC staff and various stakeholders from the
Australian research community, as listed in 4.1 Workshop participants.
Executive summary
This review of Research Vocabularies Australia (RVA) evaluates its current capabilities and
provides recommendations concerning its future directions and development. The
review includes a stakeholder consultation and expert assessment.
The review found that RVA is meeting a clear need, and provides a suite of capabilities that are
valued by the community. Recommendations are made for future improvements to RVA in a
number of areas, including content, technology, user-support, governance and community
engagement, and performance and analytics.
1 Goals and approach
The review was undertaken as part of a project to identify future directions for the ARDC data
publication services in the area of (i) data discovery and (ii) vocabulary services. ARDC wishes to be
informed and guided by external and independent research and consultation as to whether (and if
so how) future development of the data publication services might be undertaken.
Specific topics relating to vocabularies include:
• vocabulary re-use/discovery/tooling/applications layer
• complex knowledge organisation and related research infrastructure scenarios
• persistent URL management
Inputs to the review were:
1. stakeholder survey - issued 2019-06-10
2. stakeholder workshop - held 2019-06-25
3. background knowledge and environmental scans by consultants
4. interviews with ARDC staff
5. comments on the draft report from workshop attendees and other stakeholders
2 Overview of Research Vocabularies Australia
2.1 Purpose
Research Vocabularies Australia (RVA) offers a suite of services to allow the Australian research
community to publish and access controlled-vocabularies used in various disciplines and cross-
disciplinary applications via the world wide web.
RVA is one of a number of web-based services providing term definitions for use in research
projects and data. RVA is unusual in that its scope is the whole of the research sector, though its
content is loaded in response to specific community requests, and reflects varying levels of
engagement with different research communities and disciplines. Nevertheless, the technical basis
for RVA’s services follows international norms and best practice.
Community vocabulary services enable sharing of technical definitions within and between
disciplines. These definitions are important both for tagging projects, initiatives and datasets to
assist discovery in data catalogues, and for providing specific definitions of elements within
datasets, such as units of measure, experimental procedures, variables and statistical classifications.
Such element definitions were traditionally captured in obscure annotations in ‘column headings’. Hosting
precise, accessible, shared definitions in a common service with individual web-identifiers at a fine
level of granularity (i.e. per term, not just per vocabulary) allows a project to refer to a shared
definition, rather than transcribing it locally or even formulating a new definition for concepts that
should be defined once and reused many times. This approach supports interoperability and re-
use of data, consistent with the principle that research datasets should be FAIR - Findable,
Accessible, Interoperable and Reusable.
2.2 Current state
RVA includes the following components:
- Vocabulary editor (PoolParty - commercial product)
- Vocabulary repository (RDF4J/ELDA/SISSvoc - open source)
- Vocabulary registry (RVA Registry - ARDC developed and maintained)
- Vocabulary portal (RVA Portal - UI over RVA Registry API)
- User support, documentation
Figure 1. ARDC Vocabulary Services - component diagram
The technologies for each function were selected following a requirements-gathering and
assessment exercise initiated in 2014.
These components enable several modes of access to vocabularies via the online web system:
- browser user interface - for vocabulary discovery, term search, and vocabulary exploration
- browser user interface - for vocabulary registration, upload and versioning
- registration API - for vocabulary maintainers
- search and download API - for system integrators and application builders
- vocabulary widget - for application builders
217 vocabularies are visible in the RVA Portal (as of 2019-07-30), including
- 80 which are a simple listing, with a link to an externally hosted web-page for the vocabulary
- 137 which are hosted in the repository, enabling an API and interactive UI for exploration, and optional widget, of which
o 100 were loaded from PoolParty
o 37 were prepared using another tool, and then loaded from an RDF representation
Figure 2. Summary of hosting arrangements for vocabularies listed in the RVA Portal
The vocabularies hosted in the RVA repository follow contemporary best practice whereby
- each term is denoted by a web-identifier (URI)
o e.g. http://resource.geosciml.org/classifier/ics/ischart/Jurassic
- a description of each term is available
o as a web-page (for human consumption),
o structured using SKOS, in various RDF encodings (for use in applications)
The RVA API and UI use SISSVoc, in which the description of each term and vocabulary is accessed
at a URL which includes the term URI, e.g.
http://vocabs.ands.org.au/repository/api/lda/csiro/international-chronostratigraphic-chart/2018-revised-corrected/resource?uri=http://resource.geosciml.org/classifier/ics/ischart/Jurassic
Here the part before "?uri=" is the URL for the vocabulary service, and the value after "uri=" is the
URI for a term or vocabulary item.
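The URL pattern above can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names `resource_url` and `term_uri_from` are hypothetical, the service URL is copied from the example, and the sketch percent-encodes the term URI (the example above shows it unescaped), on the assumption that the service decodes query values in the usual way.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Service URL copied from the example above; illustrative only.
SERVICE_URL = (
    "http://vocabs.ands.org.au/repository/api/lda/csiro/"
    "international-chronostratigraphic-chart/2018-revised-corrected/resource"
)


def resource_url(term_uri: str, service_url: str = SERVICE_URL) -> str:
    """Return the service URL describing term_uri (hypothetical helper)."""
    # urlencode percent-escapes the term URI so it is a well-formed query value.
    return f"{service_url}?{urlencode({'uri': term_uri})}"


def term_uri_from(url: str) -> str:
    """Recover the term URI from a resource URL of the form shown above."""
    # parse_qs decodes percent-escapes, so escaped and unescaped forms both work.
    return parse_qs(urlsplit(url).query)["uri"][0]


jurassic = "http://resource.geosciml.org/classifier/ics/ischart/Jurassic"
url = resource_url(jurassic)
```

Reading the term URI back with `term_uri_from(url)` returns the original identifier, which is the property a client relies on when moving between the term URI and its description in the service.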
2.3 Usage patterns
Vocabulary contributors use RVA in a number of different ways.
2.3.1 Primary host, API and UI
Many contributors use RVA as the primary or sole host for their vocabulary, and suggest that their
community use the RVA user-interface and API as-is. This particularly applies to those who also
use the RVA-hosted PoolParty for vocabulary creation and maintenance.
In this case it is recommended that the owner of the URI domain for the term or vocabulary (e.g.
resource.geosciml.org) re-direct all URI requests to RVA, e.g.
http://resource.geosciml.org/classifier/ics/ischart/Jurassic
-- HTTP 302/303/307 →
http://vocabs.ands.org.au/repository/api/lda/csiro/international-chronostratigraphic-chart/2018-revised-corrected/resource?uri=http://resource.geosciml.org/classifier/ics/ischart/Jurassic
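A minimal sketch of the redirect rule a domain owner might configure is given here, assuming that a single RVA-hosted vocabulary sits behind the domain. The function name `redirect_response` and the choice of 303 as the default status are assumptions made for illustration; a real deployment would normally express the same rule in web-server configuration.

```python
from urllib.parse import quote

# Target copied from the example above; assumes one vocabulary per domain.
RVA_RESOURCE = (
    "http://vocabs.ands.org.au/repository/api/lda/csiro/"
    "international-chronostratigraphic-chart/2018-revised-corrected/resource"
)


def redirect_response(host: str, path: str, status: int = 303):
    """Build (status, headers) redirecting a term URI request to RVA.

    Hypothetical helper: host and path are those of the incoming request.
    """
    if status not in (302, 303, 307):
        raise ValueError("expected one of the redirect codes 302, 303 or 307")
    # Reassemble the term URI that the client originally requested.
    term_uri = f"http://{host}{path}"
    # Keep ':' and '/' unescaped so the result matches the form shown above.
    location = f"{RVA_RESOURCE}?uri={quote(term_uri, safe=':/')}"
    return status, [("Location", location)]


status, headers = redirect_response(
    "resource.geosciml.org", "/classifier/ics/ischart/Jurassic"
)
```

The term URI itself stays unchanged as the public identifier. Only the location of its description moves, so the host behind the redirect can be changed later without affecting data that cites the URI.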
2.3.2 Primary host, remote UI
Some contributors use the RVA repository as the primary host for their vocabulary, and then
provide their own UI – e.g. Geoscience Australia, TERN. The connection to the content may be
either through the SPARQL endpoint or the SISSVoc API.
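As a rough illustration of the SPARQL route, the sketch below builds a query for the preferred label and definition of one term and encodes it as a GET request in the style of the SPARQL 1.1 Protocol (`query` parameter). The endpoint address is a placeholder, not the RVA endpoint, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Placeholder endpoint; substitute the SPARQL endpoint of the vocabulary used.
SPARQL_ENDPOINT = "https://example.org/sparql"


def label_query(term_uri: str, lang: str = "en") -> str:
    """Return a SPARQL query for the SKOS label and definition of a term."""
    # Note: term_uri is inserted as-is, so it must come from a trusted source.
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?label ?definition WHERE {\n"
        f"  <{term_uri}> skos:prefLabel ?label .\n"
        f"  OPTIONAL {{ <{term_uri}> skos:definition ?definition }}\n"
        # Labels without a matching language tag are filtered out.
        f'  FILTER (lang(?label) = "{lang}")\n'
        "}"
    )


def request_url(query: str, endpoint: str = SPARQL_ENDPOINT) -> str:
    """Encode the query as a GET request URL (SPARQL 1.1 Protocol style)."""
    return f"{endpoint}?{urlencode({'query': query})}"


query = label_query("http://resource.geosciml.org/classifier/ics/ischart/Jurassic")
get_url = request_url(query)
```

A remote UI would send `get_url` with an `Accept` header for the result format it wants, then render the bindings. The SISSVoc API route avoids writing SPARQL altogether, at the cost of being limited to the queries the API exposes.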
2.3.3 Secondary (backup) host or cache
The RVA Repository can serve as a reliable alternative host (backup or cache) for any vocabulary
maintainer.
2.3.4 Linked-data cache
A common use of RVA is to provide a linked-data representation or view of a vocabulary whose
primary or canonical representation is less web-friendly – for example as a tabulation in a CSV or
spreadsheet, PDF or web-page. For example, several vocabularies from the Australian Bureau of
Statistics are ‘re-published’ in RVA.
2.4 Expressivity
The RVA repository is a standard RDF triple store, so vocabularies may include elements from any
RDF vocabulary (or OWL ontology). A small number of vocabularies in RVA do go beyond the basic
SKOS properties in order to get more semantic expressivity. However, the higher-level RVA
applications (Vocabulary API, RVA catalogue, vocabulary widgets) are optimised for SKOS, and
assume a particular pattern for the internal organization and nesting of vocabulary items, and
most RVA vocabularies follow this pattern and basic SKOS representation.
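The report does not spell out the nesting pattern, so the following sketch assumes a common SKOS arrangement: a concept scheme with top concepts, and concepts linked by skos:broader or skos:narrower. The namespace, the triples and the function names are all illustrative.

```python
from collections import defaultdict

SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/vocab/"  # hypothetical namespace for illustration

# Illustrative triples (subject, predicate, object); not taken from RVA.
TRIPLES = [
    (EX + "scheme", SKOS + "hasTopConcept", EX + "Mesozoic"),
    (EX + "Triassic", SKOS + "broader", EX + "Mesozoic"),
    (EX + "Jurassic", SKOS + "broader", EX + "Mesozoic"),
    (EX + "EarlyJurassic", SKOS + "broader", EX + "Jurassic"),
]


def narrower_index(triples):
    """Map each concept to its direct children, from broader/narrower links."""
    children = defaultdict(list)
    for s, p, o in triples:
        if p == SKOS + "broader":
            children[o].append(s)
        elif p == SKOS + "narrower":
            children[s].append(o)
    return children


def outline(triples, scheme):
    """Return an indented outline of the scheme (assumes no cycles)."""
    children = narrower_index(triples)
    tops = [o for s, p, o in triples
            if s == scheme and p == SKOS + "hasTopConcept"]
    lines = []

    def walk(concept, depth):
        # Use the last path segment of the URI as a display name.
        lines.append("  " * depth + concept.rsplit("/", 1)[-1])
        for child in sorted(children[concept]):
            walk(child, depth + 1)

    for top in sorted(tops):
        walk(top, 0)
    return lines


tree = outline(TRIPLES, EX + "scheme")
```

A vocabulary that uses other RDF or OWL constructs instead of these SKOS links would load into the triple store without error, but a tree view built this way would show it as flat or empty, which is the practical meaning of the applications being optimised for SKOS.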
3 Stakeholder survey
A survey was distributed to RVA stakeholders prior to the workshop. The response was small (8)
but there was valuable qualitative information. A dominant message was that RVA is highly valued
by those stakeholders that have used it.
3.1 Limitations
Descriptions of some specific limitations and impediments to greater use of RVA were provided:
- Availability of and preference for a community/domain service (e.g. CESSDA)
- The current interface makes it difficult to search for terms across vocabs
o “you get a list of sets and NOW you have to look into each one to actually find a
potential term. Tedious and useless”
o “Discovering a term across multiple vocab portals with a single query would be nice.
It took me 30 minutes to find a term or even if one existed just looking at 3 sites.”
- The vocabulary upload/maintenance API makes it difficult to integrate with enterprise
tools and processes
o “ABARES staff would be seeking support on how to manage this update within the
service ...”
- The level of community or domain endorsement per vocabulary is unclear
o “We have authoritative datasets but are there authoritative vocabularies
(community endorsed vocabularies?), and if so, how are they identified within a
vocabulary service?”
- Government agencies are unsure of the scope of RVA/ARDC
o “could the RVA service support more than just the research community?”
3.2 Potential enhancements
Participants rated 10 potential RVA enhancements as summarized in the following table (only 6
responses to the structured questions).
4 Workshop
A workshop of RVA users was held on 2019-06-25 at Black Mountain, Canberra.
4.1 Workshop participants
Attendees
- Christine Price, BOM
- Edmond Chuc, TERN (remote)
- Siddeswara Guru, TERN
- Natalia Atkins, IMOS
- Megan Wong, FedUni
- Robyn Tottenham, ABARES
- Tatiana Antsoupova, NAA
- Tessa Elieff, NAA
- Chantelle Doan, BOM
- Jasmine Howorth, ABARES
- Jenny Wood, AIATSIS
- Joanne Sullivan, ABARES
- John Morrissey, CSIRO
- John Page, CSIRO
- Lara Sedgemen, GA
- Les Kneebone, APO
- Marina McGale, ADA
- Margie Smith, GA
- Negin Moghaddam, GA
Organizers
- Cel Pilapil, ARDC
- Joel Benn, ARDC
- Adrian Burton, ARDC
- Jonathan Yu, CSIRO
- Simon Cox, CSIRO
- Rowan Brownlee, ARDC
- Richard Walker, ARDC
- Julia Martin, ARDC
4.2 Program
Time              Topic
9:30-10:00am      Arrival, Tea/Coffee
10:00-10:05am     Welcome and housekeeping
10:05-10:10am     Goals and structure of workshop
10:10-10:20am     Self introduction:
                  - Your name
                  - Your affiliation
                  - Why are you here?
10:20-11:10am     Setting the Scene
                  Presentations from three exemplary users of controlled vocabularies, addressing the question:
                  Where is knowledge organisation currently in your enterprise?
                  Future possibilities based on international examples
11:10-11:20am     Stretch break/Coffee/Morning tea
11:20am-12:30pm   Breakout session:
                  - What is working in RVA?
                  - What are the gaps for satisfying community use cases (which may exist at different levels, users, operations)
                  - Report back from discussion in breakout
12:30-1:15pm      Lunch
1:15-2:30pm       Interactive - Options for improvement and new things
2:30-2:55pm       Priorities for RVA
2:55-3:00pm       Closing remarks
3:00pm            End of workshop
Workshop slides are available from
https://docs.google.com/presentation/d/1tFWTuKnslAZAm4h3K8xgO_1UFOjPfph3JrB3qgq7cMo/view
4.3 Workshop process
The workshop was primarily framed around a series of table sessions of small groups, to allow for
contributions by all participants.
The first set of breakouts were focused on assessment of the current state of RVA and identifying
candidate opportunities for change or enhancement. The second set of breakouts aimed at
prioritizing the proposed changes according to their importance and urgency. The results of the
interactive sessions were captured with post-its that were sorted into four categories: Keep, Stop,
Change, New.
4.4 Workshop outputs
4.4.1 Keep and stop
There was a general consensus that RVA is offering an important service, with most current
features found useful (PoolParty editor, widget, notifications).
The user support and engagement arrangements are particularly valued. One-on-one
coaching of new content providers is found to be very useful. However, it is also noted that this
may not be scalable or sustainable, and other user-support mechanisms should be pursued.
A few suggestions for stopping or de-emphasizing features were made, though these are not fully
consistent with other recommendations.
4.4.2 Enhancements: changes and new features
Promotion and outreach activities are valued, and increases in these are suggested.
Term search is an important feature of RVA, though improvements are suggested.
General
IMPORTANT AND URGENT IMPORTANT BUT NOT SO URGENT
● Governance
● Tools
Usage analytics / feedback
Specific
IMPORTANT AND URGENT IMPORTANT BUT NOT SO URGENT
● Governance workflows (stories?)
● Versioning granularity (gap in tools?)
● Ability for non-vocabulary-owners to
create mappings/matches between
vocabularies
● Tools to support end user workflows, e.g.
integrate vocabs easily into Excel,
Databases, Geoserver
● Tutorials and educational communications
(webinars, roadshow) - what is a vocab,
why organisations should use RVA incl.
government agencies
● Enhance vocab search
● Possible consolidation of vocabularies
(currently multiple instances of similar
scope and cross-domain use cases, e.g.
organisation names, units of measure)
● Alerts and notifications when vocabulary
of interest is added/created/edited
● Tool to remind vocabulary managers to
check and update vocabulary
● Vocabulary recommender tool
● Citations for vocabularies
● Notifications and analytics for
understanding users of vocabularies
● Notification of use of vocabs via widget
● Vocabulary publication workflow example
stories written up
● Community building
● User interface consistency, customization
4.4.3 Ideal vocabulary service
A subset of the workshop participants were newcomers to RVA, with no current experience of the
system and its features. In place of the ‘retrospective’ exercise, these participants were asked to
speculate on what an ideal vocabulary service would offer. Their deliberations are summarized in
the following figure:
5 Related work
This section provides a scan of some related vocabulary services work internationally. First a set of
vocabulary initiatives are summarized. Second, a review of related tools and platforms is provided.
Note that this survey is not exhaustive, but indicates the scope of vocabularies published through
online services, and some of the approaches and platforms in use.
5.1 Initiatives and services
Here we list a number of initiatives that are primarily concerned with specific collections of
content. In some cases the scope is highly generic (e.g. Library of Congress) though still primarily
designed to be used in a specific set of applications (e.g. cataloguing information artefacts). In
other cases the scope is specialized, primarily domain or even application specific (e.g. DDI, NVS)
though it may have utility more generally.
The table lists some services currently available hosted by other institutions for the purposes of
publishing and managing vocabularies online. In addition to the existing content service, some
new initiatives are likely to generate new vocabularies of widespread significance.
Each entry gives NAME, then SCOPE, STATUS, and ROLE & USAGE where stated.

CURRENT

Library of Congress Linked Data Service
https://id.loc.gov/
- Scope: Classifications for libraries and archives, e.g. Subject Headings, Classification schemes
- Status: Stable, curated
- Role & usage: US, global reference

DDI Alliance controlled vocabularies
http://www.ddialliance.org/controlled-vocabularies
- Scope: Social science classifications and definitions
- Status: Stable, curated
- Role & usage: Official statistics bureaux, World Bank

BioPortal
https://bioportal.bioontology.org/
- Scope: Hosts biomedical ontologies from OBO and similar
- Status: Stable, open
- Role & usage: Biomedical, life sciences

OBO Foundry
http://www.obofoundry.org/ http://www.ontobee.org/
- Scope: Biological, biomedical, and life-sciences
- Status: Continuous maintenance
- Role & usage: Biomedical research and tagging

Agroportal
http://agroportal.lirmm.fr/
- Scope: Publishing and managing agricultural vocabularies
- Status: Stable, semi-curated

AGROVOC
http://agrovoc.uniroma2.it/agrovoc/agrovoc/en/
- Scope: Single-source controlled vocabulary of agriculture terms
- Status: Stable, curated

GACS
http://browser.agrisemantics.org/gacs/en/
- Scope: Harmonized vocabulary of agriculture terms
- Status: Experimental, curated

NERC Vocabulary Service (NVS)
https://www.bodc.ac.uk/resources/products/web_services/vocab/
- Scope: Oceanography - point of truth for multiple international initiatives
- Status: Stable, curated (Oracle/Jena/homebrew)
- Role & usage: point of truth for multiple international initiatives, e.g. SeaDataNet

UK Centre for Ecology and Hydrology Semantic Web Portal
http://vocabs.ceh.ac.uk/edg/tbl/swp?_viewName=home
- Scope: Environment and ecology, e.g. EnvThes
- Status: Experimental, curated

ODM2 controlled vocabularies
http://vocabulary.odm2.org/
- Scope: Classifiers for earth and environment science observations
- Status: Experimental, curated
- Role & usage: Earthcube (US)

SWEET
https://github.com/ESIPFed/sweet
- Scope: Modular ontology suite covering Earth system science
- Status: Actively maintained through ESIP Semantic Technologies Committee
- Role & usage: Established early in development of semantic web, widely used and cited

QUDT
http://www.qudt.org/release2/qudt-catalog.html
- Scope: Units of measure, quantity-types (observable properties)
- Status: In development, curated
- Role & usage: Originally sponsored by NASA, now being positioned for general use (but stalled for several years and lost momentum)

EMERGING

InteroperAble Descriptions of Observable Property Terminology (I-ADOPT)
https://www.rd-alliance.org/groups/interoperable-descriptions-observable-property-terminology-wg-i-adopt-wg
- Scope: RDA Working Group to harmonize ‘observable property’ definitions and vocabularies. Input from EnvThes, ENVO, …
- Status: Proposed

Digital Representation of Units of Measure (DRUM)
http://www.codata.org/task-groups/drum
- Scope: CODATA Task Group to address a well known gap in the science reference data systems, i.e. units-of-measure. Input from QUDT, NVS
- Supported by IUPAC, NIST, TopQuadrant
5.2 Vocabulary tools, libraries and platforms
This subsection provides background on the state-of-the-art around tools, libraries and platforms
for vocabulary publication, editing and management. For each item considered, we provide the
following:
● Brief description
● Role - where in the stack the tool is designed to sit
● Who developed the tool
● Software licence
● Usage - where the tool is being used/has it been used
● Last updated
● Status of development (in-development, closed, dormant)
The table below lists the set of tools and libraries currently available, some of which are used in
RVA.
Each entry gives NAME, then DESCRIPTION, ROLE, STATUS and USAGE where stated.

ELDA
- Description: Java-based implementation of a Linked Data API library. Configurable way to access RDF data using simple RESTful URLs that are translated into queries to a SPARQL endpoint. https://github.com/epimorphics/elda
- Role: Linked Data API software
- Status: In-development. Last code commit Oct 2018. Open Source
- Usage: Gozalez-Toral et al. 2019; National Archives UK; UK Defra Data Services Platform; SISSVoc and ARDC RDA

SISSVoc
- Description: SISSVoc provides a set of SKOS-based APIs implemented using ELDA. http://sissvoc.info
- Role: Linked Data API software implementation for SKOS
- Status: In-development. Last code commit Feb 2019. Open Source
- Usage: Research Vocabularies Australia; AuScope

VocPrez
- Description: Provides a configurable template-based SKOS presenter over the web. https://github.com/CSIRO-enviro-informatics/VocPrez
- Role: Presentation layer software
- Status: In-development. Alpha/Pre-release. Last code commit July 2019. Open Source
- Usage: TERN; Geoscience Australia

pyLDAPI
- Description: Python-based Linked Data API library. https://github.com/rdflib/pyLDAPI
- Role: Linked Data API software
- Status: In development. Last code commit June 2019. Open Source
- Usage: VocPrez clients; Loc-I project

SKOSMOS
- Description: PHP-based web-based SKOS browser and publishing tool. Implements SKOS-based APIs with both human and machine readable views and queries supported. http://skosmos.org/
- Role: Linked Data API software implementation for SKOS
- Status: Open Source. Last code commit May 2019. Developed at the National Library of Finland. Contact: [email protected]
- Usage: Finto; FAO / AGROVOC; Rhineland-Palatinate spatial data initiative classifications; UNESCO Thesaurus; Controlled vocabularies used to index Luxemburgish legislation; Loterre, a multidisciplinary terminology platform

VocBench
- Description: Software for deploying a platform for editing ontologies, thesauri and RDF datasets. http://vocbench.uniroma2.it/ https://bitbucket.org/art-uniroma2/vocbench3/downloads/
- Role: Ontology/vocabulary editing environment software
- Status: Latest version: v3. In-development. Last code commit July 2019. Open Source
- Usage: FAO / AGROVOC

BioPortal
- Description: Online services for OWL ontologies. https://bioportal.bioontology.org/
- Role: Class and property search; Text annotator; Ontology recommender
- Status: Open
- Usage: Bioportal - biomedical- and life-sciences; Agroportal - agriculture

MMI ORR (common codebase with ESIP-COR)
- Description: Ontology Repository/Registry software. https://mmisw.org/orrdoc/ https://github.com/mmisw/orr
- Role: supports creation of concept/class mappings
- Status: In development. Last code commit Feb 2019. Latest release: v3.8.2. Open Source
- Usage: Marine community - appears to be funded via EarthCube projects. Basis of ESIP-COR http://cor.esipfed.org/ which hosts SWEET

Web Protege
- Description: Java-based software for deploying a platform for collaborative ontology development. https://github.com/protegeproject/webprotege Online platform hosted by Stanford here: https://webprotege.stanford.edu/
- Role: Ontology/vocabulary editing environment software
- Status: In development. Last code commit July 2019. Open Source
- Usage: Biomedical projects

Topbraid EDG-VM
- Description: Topbraid Enterprise Data Governance Vocabulary Management. https://www.topquadrant.com/products/topbraid-edg-vocabulary-management/
- Role: Enterprise vocabulary editing/management environment software
- Status: Proprietary

PoolParty
- Description: PoolParty Semantic Suite provides a platform for managing vocabularies and semantic definitions. https://www.poolparty.biz/
- Role: Vocabulary editing/management environment software
- Status: Proprietary

Linked Data Registry
- Description: Platform for registration and maintenance of granular RDF-resources - such as SKOS concepts and collections. https://github.com/UKGovLD/registry-core
- Role: transparent fine-grained status and versioning
- Status: Open source
- Usage: Used by WMO, BRGM, and CSIRO

Re3gistry
- Description: Platform for registration of data including reference data (definitions). https://joinup.ec.europa.eu/solution/re3gistry
- Role: Definitions service
- Status: Open Source (EUPL)
- Usage: Used by INSPIRE, BRGM e.g. http://registre.geocatalogue.fr/codelist
5.3 Governance patterns
The initiatives listed above have a variety of workflows for development and maintenance of the
vocabulary content. In many cases they are delivering vocabularies from legacy arrangements,
some of which have tightly controlled maintenance by long-term custodians. But there is a general
move towards community governance of research vocabularies.
5.3.1 Example implementations
Governance platforms increasingly utilize ticketing systems for proposed changes, hosted on
GitHub or similar publicly visible systems. This allows for the development of consensus through
threaded conversations, with a good level of transparency. Nevertheless, the exact patterns vary.
For example:
SWEET
SWEET uses a GitHub issue-tracker (https://github.com/ESIPFed/sweet/issues). Contributors are
encouraged to create pull-requests that actually implement updates in the RDF sources. When it is
judged that consensus has been reached, PRs are merged by the SWEET editor. Micro-citations for
changes are supported using the contributor’s ORCID.
The ESIP-COR service is the ‘live’ system for resolving SWEET URIs, and periodic ‘releases’ of
SWEET are announced which are then loaded into COR. The development version of SWEET is
always available from GitHub.
ENVO
ENVO uses a GitHub issue-tracker (https://github.com/EnvironmentOntology/envo/issues).
Contributors are expected to provide as much information as possible about their proposal, but
implementation of new definitions and changes is done only by the ENVO editors. This is primarily
because definitions are formally axiomatized using OWL and OBO, and this requires knowledge of
the rest of ENVO and OBO.
ENVO URIs resolve to Ontobee (http://www.ontobee.org/). ENVO is continuously maintained and
does not have any notion of ‘versions’.
NVS
Many of the collections in the NERC Vocabulary Service are just representations of lists and
controlled vocabularies maintained by specific agencies and organizations in the oceanography
community. However, several key collections are actively maintained within NVS, with hundreds of
updates every year.
For most of its history, content maintenance was undertaken privately by NVS staff, principally by
harvesting definitions from datasets introduced into the BODC archive, but also in response to
individual requests from users, through private channels (e.g. email).
Starting in 2019, NVS has established a publicly visible issue tracker on GitHub
(https://github.com/nvs-vocabs), with one list per collection (each implemented as a GitHub
repository), though the actual data is still maintained in an Oracle database that is not directly
accessible outside BODC. Currently more than 90% of the changes are still managed outside of the
GitHub tracker, but the NVS team are considering strategies to harmonize the workflows.
NVS vocabularies are continuously maintained, with status metadata available for each term.
Previous versions of each term definition are available from URIs including a version number. NVS
also has a notion of versions on a per-collection (vocabulary) basis, with the older versions also
available from URIs including a version number.
QUDT
The QUDT project is tightly governed through a private GitHub.
--
These stories could inform the development of some technical practices for governance of
individual vocabularies. Each of these processes uses the technology platform to support the
governance arrangements adopted by the community, though in some cases (NVS) governance
arrangements are evolving partly prompted by the capabilities and expectations provided by the
platform.
5.3.2 Role of community
However, the critical element of any governance story is to identify and delineate the community
that has interest and expertise relating to a specific vocabulary scope, and to identify or designate
gatekeepers for governance, and the method for reaching consensus or endorsement of changes.
The roles in maintaining a general-purpose registration system are nicely summarized in the
diagram below from ISO 19135, with the following correspondences:
- The ‘Registry’ corresponds to the RVA vocabulary repository, with ARDC as the ‘Registry
Manager’
- The ‘Submitting organization’ and ‘Register Owner’ are a vocabulary publisher, and the unit
within that organization that takes responsibility for the content
- The ‘Control Body’ is the person or team delegated by the register owner to review and
authorize content changes
- A ‘Register’ corresponds to an individual vocabulary, with the person responsible for
loading and maintaining the content in RVA as the ‘Register Manager’
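The correspondences listed above can be captured as a small lookup table. This is only an illustrative sketch: the role names and descriptions are taken from the list, the pairing of ‘Submitting organization’ with the publisher and ‘Register Owner’ with the responsible unit is read as a respective mapping (an assumption), and the dictionary and function names are hypothetical.

```python
# ISO 19135 role -> RVA counterpart, as described in the list above.
# The dictionary and helper names are hypothetical, for illustration only.
ISO19135_TO_RVA = {
    "Registry": "RVA vocabulary repository",
    "Registry Manager": "ARDC",
    # Assumption: the source pairs these two roles respectively.
    "Submitting organization": "Vocabulary publisher",
    "Register Owner": "Unit within the publisher that takes responsibility for the content",
    "Control Body": "Person or team delegated by the register owner to review and authorize content changes",
    "Register": "Individual vocabulary",
    "Register Manager": "Person responsible for loading and maintaining the content in RVA",
}


def rva_equivalent(iso_role: str) -> str:
    """Return the RVA counterpart of an ISO 19135 role (case-insensitive)."""
    lookup = {name.lower(): value for name, value in ISO19135_TO_RVA.items()}
    key = iso_role.strip().lower()
    if key not in lookup:
        raise KeyError(f"Unknown ISO 19135 role: {iso_role!r}")
    return lookup[key]
```

For example, `rva_equivalent("registry manager")` returns `"ARDC"`. Note the distinction the table preserves between the ‘Registry Manager’ (one per repository) and the ‘Register Manager’ (one per vocabulary).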
6 Recommendations
In this section we provide a tabulated summary of recommendations for future action in RVA. The recommendations are sorted into five categories, with three levels
of priority. Recommendations arose from each of the different consultations, as indicated. The recommendations must be reviewed by ARDC to evaluate the extent to
which they match the ARDC priorities, and the resourcing implications. It is recognised that it may not be feasible to execute all recommendations in the timeframe
indicated, and that some adjustments to the prioritization may be necessary.
6.1 Engagement, training, practice
✪ ARDC contribution ✪ Expert feedback ✪ Stakeholder feedback (workshop) ✪ Stakeholder feedback (survey) | High priority/urgent
CONTINUE 1 YEAR 3 YEARS
✪✪ E.1 Vocabulary publication helpdesk (Rowan/Cel)
■ Email notification for vocabulary changes
■ AVSIG meetings, webinars for important and major RVA changes and updates
✪ E.2 Improved user support:
- Review on-boarding checklist for RVA contributors; complement with screen-
capture presentations and pointers
- Engage with community to establish ‘Finding Aids’ to direct users to groups of
vocabularies recommended in specific domains or applications, e.g. AIHW
endorsed vocabularies for health
✪✪ E.3 Run RVA tutorials/roadshow to promote RVA, and create training materials and videos
✪✪ E.4 Develop user-stories of exemplary patterns and workflows
- Conversion of vocabularies to SKOS/SKOS+
- Registration and upload options
- Vocabulary maintenance/revision cycles/versioning
- Governance arrangements, including discipline/community engagement
- Use and adoption of vocabularies by communities (related to T.7)
Use real examples like NEII, Soils, CGI (Geology), DDI, ABS
✪ E.6 Build a community of practice and/or specialised set of communities of practice, for specific domains/groups
✪ E.7 Support training for interacting with machine-to-machine interfaces, e.g. SPARQL training, examples/cookbook for ELDA, improving vocab widget for uptake
✪ E.8 Engage with the broader community to
develop additional ‘Finding Aids’ to support a wider
range of domains or applications of vocabularies.
Related to T.11
6.2 Governance
✪ ARDC contribution ✪ Expert feedback ✪ Stakeholder feedback (workshop) ✪ Stakeholder feedback (survey) | High priority/urgent
CONTINUE 1 YEAR 3 YEARS
✪✪ G.1 Ensure that governance-related information (e.g. “legitimacy” of a
vocabulary hosted in RVA on behalf of an external organization or community) and
arrangements (provenance, last updated, point-of-contact etc.) are
obvious/transparent on the landing page for each vocabulary (may require
metadata remediation)
- Clean up the publisher/party list
- align with ORCID, ROR or similar
- additional related-party roles (e.g. http://registry.it.csiro.au/def/isotc211/CI_RoleCode)
- Revisit vocabulary metadata scheme
- add link to issue-tracker(s) for community input, status description (e.g. authorized by X for use in specified contexts)
✪ G.2 Establish guidelines on how to maintain vocabularies (relates to E.3 and E.4)
✪ G.3 Document additional user stories and patterns of governance as part of RVA (relates to G.1 and E.4)
✪✪ G.4 Explore establishing a Vocabulary
Governance Special Interest Group (VG-SIG) and RVA
registration requirements (developed with the VG-SIG)
6.3 Tools and Technology
✪ ARDC contribution ✪ Expert feedback ✪ Stakeholder feedback (workshop) ✪ Stakeholder feedback (survey) | High priority/urgent
CONTINUE 1 YEAR 3 YEARS
✪ T.1 Vocabulary editor (PoolParty)
✪ T.2 Vocabulary repository/API (ELDA, SISSVoc)
✪ T.3 Simple interfaces for
RVA portal
✪✪ T.4 Refine the term-search function to return concept definitions (not just links to vocabularies containing concept definitions)
✪ T.5 Investigate possibility of a federated term search over the leading vocabulary services (see table)
✪✪ T.6 Provide manual, semi-automated and automated tools for establishing vocabulary mappings
✪ T.7 Undertake a user experience study to document and understand user needs and expectations for RVA services and RVA UI
✪ T.8 Provide a UI for vocabulary details with a look-and-feel more consistent with the ARDC Portal (may be alongside ELDA/SISSVoc, rather than in place of it)
✪ T.9 Provide additional content assimilation pathways (e.g. API for CSV or Excel)
✪ T.10 Provide APIs to support common application stacks (e.g. DBMS, GeoServer)
✪ T.11 Vocabulary recommender - c.f. https://bioportal.bioontology.org/recommender
✪✪✪✪ T.12 Support for cataloguing and hosting
vocabulary mappings maintained as separate
artefacts (e.g. linksets)
✪ T.13 Re-evaluate options for
● vocabulary-editor (e.g. VocBench)
● vocabulary API/UI (e.g. SKOSMOS, VocPrez)
✪✪ T.14 Additional content assimilation pathways (e.g. cache/sync with external services)
✪✪ T.15 APIs to support additional application stacks (e.g. DBMS, GeoServer)
✪✪ T.16 Expose registry metadata through m2m
interfaces that are more “standards compliant”
(DCAT, schema.org)
✪✪ T.17 Collaborate with other vocabulary service
providers to specify patterns for, or standardize,
vocabulary service APIs
6.4 Content
✪ ARDC contribution ✪ Expert feedback ✪ Stakeholder feedback (workshop) ✪ Stakeholder feedback (survey) | High priority/urgent
CONTINUE 1 YEAR 3 YEARS
✪ C.1 Continue access to
different RDF-based schemas
for definitions
SKOS/schema.org/’custom
schema’
✪✪ C.2 Add more external vocabularies of interest to the Australian research
community. Survey specific domain-communities to identify gaps. Candidates might
include:
● QUDT, EnvThes, SWEET
● UN, DDI and ABS vocabularies
● ODM2 codelists
● ASLS (soil and landscape)
✪ C.3 Provide a directory of (external) vocabulary services of interest to the Australian research community
✪ C.4 Periodically poll publishers/custodians to verify the status of published vocabularies (c.f. FAIRsharing practice)
These recommendations might require a specific RVA content-curator role to be designated.
✪ C.8 Clean up the registers associated with RVA - esp. related-parties (organizations,
publishers, creators), and rationalize duplicates so that the faceted search is better –
also see G.1
✪✪✪ C.5 Include vocabulary mappings
✪✪✪ C.6 Support richer vocabulary structures in the standard RVA portal (SKOS, OWL)
✪ C.7 Vocabularies with overlapping scope - in
consultation with the relevant communities develop
strategy to consolidate these, or to provide trust
metrics to support selection of vocabularies – likely
to involve engagement with community of practice
E.6 and Governance interest group G.4
6.5 Impact and analytics tracking
✪ ARDC contribution ✪ Expert feedback ✪ Stakeholder feedback (workshop) ✪ Stakeholder feedback (survey) | High priority/urgent
CONTINUE 1 YEAR 3 YEARS
✪✪ I.1 Support richer usage analytics / feedback
● Analytics visible to vocabulary managers for providing feedback to their host organisation
● Use of vocabs via widget
✪ I.2 Support richer usage analytics / feedback
● Publicly accessible general usage analytics
● Citations for vocabularies
● Analytics for understanding users of vocabularies