NERC DataGrid Vocabulary Workshop, RAL, February 25, 2009 NERC DataGrid Vocabulary Server...

NERC DataGrid

Vocabulary Workshop, RAL, February 25, 2009

NERC DataGrid Vocabulary Server Description

Outline

Vocabulary Server:

Data model

Implementation

Content

Usage

Development path

Vocabulary Server Data Model

The fundamental building block of the data model is a term, which is equivalent to a SKOS “concept”

Each term has:

Key: a semantically neutral string that forms the basis of a URN

Label: a human-readable name for the concept

Alternative label: used for abbreviations

Definition: more verbose explanation of the concept
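
As a minimal sketch, the four term attributes listed above could be held in a small Python record. The class name, the field names and the URN pattern are illustrative assumptions, not the server's actual schema or URN syntax, and the example values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    """One vocabulary term, equivalent to a SKOS concept."""

    key: str                  # semantically neutral string, basis of the URN
    label: str                # human-readable name (compare skos:prefLabel)
    alt_label: Optional[str]  # abbreviation, if any (compare skos:altLabel)
    definition: str           # more verbose explanation (compare skos:definition)

    def urn(self, list_id: str) -> str:
        # Hypothetical URN pattern, for illustration only.
        return f"urn:example:vocab:{list_id}:{self.key}"


# Made-up values: the key carries no meaning, the label does.
term = Term(
    key="K0001",
    label="Example parameter",
    alt_label="EXPAR",
    definition="A longer description of the example parameter.",
)
term_urn = term.urn("A011")
print(term_urn)
```

Keeping the key free of meaning lets the label or definition be corrected later without changing the identifier that other systems have stored.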

Vocabulary Server Data Model

The terms are aggregated into lists equivalent to SKOS ‘collections’

Each list is given a semantically neutral identifier (4-byte string)

Lists may be aggregated in ‘Superlists’

Each ‘Superlist’ is given a semantically opaque identifier (bytes 1-3 of the component list identifiers)
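
The identifier rule above (a superlist identifier is bytes 1-3 of its component list identifiers) can be sketched as a simple prefix operation. The function names and the identifiers below are made up for illustration, and the sketch assumes plain ASCII identifiers so that one character is one byte.

```python
from collections import defaultdict


def superlist_id(list_id: str) -> str:
    """Return bytes 1-3 of a 4-byte list identifier."""
    if len(list_id) != 4:
        raise ValueError(f"list identifier must be 4 bytes: {list_id!r}")
    return list_id[:3]


def group_by_superlist(list_ids):
    """Group list identifiers under the superlist identifier they share."""
    groups = defaultdict(list)
    for list_id in list_ids:
        groups[superlist_id(list_id)].append(list_id)
    return dict(groups)


# Made-up identifiers, used only to show the grouping.
groups = group_by_superlist(["A011", "A012", "B021"])
print(groups)
```

Because the superlist identifier is derived rather than assigned, two lists fall into the same superlist exactly when their first three bytes match.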

Vocabulary Server Data Model

The ‘Superlist’ concept was inherited from 1980s BODC infrastructure

It has no parallel in any knowledge representation standard

It has the unpleasant side effect of giving terms alternative possible URNs

Its deprecation is becoming a priority

Vocabulary Server Implementation

Server back end is an Oracle relational database

All terms are stored in a single table

List and superlist aggregations implemented as a 2-level indexing table hierarchy

Heavily defended by constraints and triggers

Fully automated timestamps and update ‘fingerprints’

Fully automated audit trails

Fully automated list and superlist versioning
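
One possible reading of this layout is sketched below, with SQLite (via Python's sqlite3 module) standing in for Oracle. All table, column and trigger names are hypothetical, and the single constraint and single timestamp trigger only hint at the protection described above; audit trails and versioning are not shown.

```python
import sqlite3

# Hypothetical schema: one term table plus a two-level index hierarchy.
SCHEMA = """
CREATE TABLE term (
    term_key   TEXT PRIMARY KEY,
    label      TEXT NOT NULL,
    alt_label  TEXT,
    definition TEXT,
    modified   TEXT
);
CREATE TABLE list_index (
    list_id  TEXT NOT NULL CHECK (length(list_id) = 4),
    term_key TEXT NOT NULL REFERENCES term(term_key),
    PRIMARY KEY (list_id, term_key)
);
CREATE TABLE superlist_index (
    superlist_id TEXT NOT NULL CHECK (length(superlist_id) = 3),
    list_id      TEXT NOT NULL,
    PRIMARY KEY (superlist_id, list_id),
    CHECK (substr(list_id, 1, 3) = superlist_id)
);
CREATE TRIGGER term_stamp_insert AFTER INSERT ON term
BEGIN
    UPDATE term SET modified = datetime('now') WHERE term_key = NEW.term_key;
END;
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

# Made-up rows: one term, indexed into one list and one superlist.
conn.execute("INSERT INTO term (term_key, label) VALUES ('K0001', 'Example term')")
conn.execute("INSERT INTO list_index VALUES ('A011', 'K0001')")
conn.execute("INSERT INTO superlist_index VALUES ('A01', 'A011')")

# The trigger, not the caller, supplies the timestamp.
stamp = conn.execute(
    "SELECT modified FROM term WHERE term_key = 'K0001'"
).fetchone()[0]
print("timestamp set by trigger:", stamp is not None)

# The CHECK constraint rejects a superlist that does not match the list prefix.
rejected = False
try:
    conn.execute("INSERT INTO superlist_index VALUES ('B02', 'A011')")
except sqlite3.IntegrityError:
    rejected = True
print("mismatched superlist rejected:", rejected)
```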

Vocabulary Server Implementation

Term URLs, list URLs and API calls invoke Java applications that submit SQL queries and wrap up the output as XML documents
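
The query-then-wrap pattern can be sketched as follows. The server uses Java applications against Oracle; this illustration uses Python with an in-memory SQLite database instead, and the table layout, function name and XML element names are all assumptions rather than the server's real output format.

```python
import sqlite3
import xml.etree.ElementTree as ET


def list_as_xml(conn, list_id):
    """Run a SQL query for one list and wrap the rows as an XML document."""
    rows = conn.execute(
        "SELECT term_key, label FROM term WHERE list_id = ? ORDER BY term_key",
        (list_id,),
    ).fetchall()
    root = ET.Element("list", {"id": list_id})
    for term_key, label in rows:
        term = ET.SubElement(root, "term", {"key": term_key})
        term.text = label
    return ET.tostring(root, encoding="unicode")


# Made-up data in a simplified, hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE term (list_id TEXT, term_key TEXT, label TEXT)")
conn.executemany(
    "INSERT INTO term VALUES (?, ?, ?)",
    [("A011", "K0002", "Second term"), ("A011", "K0001", "First term")],
)
xml_doc = list_as_xml(conn, "A011")
print(xml_doc)
```

Building the document with an XML library rather than string concatenation means labels containing characters such as `&` or `<` are escaped automatically.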

Vocabulary Server Implementation

Why not XML?

Grew out of an integral part of the BODC Oracle infrastructure

Experiments with XML – particularly OWL – technology did not go well

Maintenance tools seem less effective

Navigation difficulties through very large XML documents

Performance issues with lists containing 20000+ terms

XML has benefits such as access to inference engines, so worth persevering

Answer might be to have operational XML builds from a relational back end

Vocabulary Server Content

Server Contents (2009-02-10)

76 public superlists

125 public lists

124701 public terms

80987 public mappings (RDF triples)

Some of the subject areas covered

Parameters

Platforms

Instruments

Coverage terms

Geographic keywords

Vocabulary Server Usage

Server Usage for 2008 (2009 to 2009-02-10 in brackets)

4793116 (607172) total hits

56232 (7134) vocabulary catalogue downloads

78708 (10233) vocabulary term/list downloads

1367 (433) vocabulary map downloads

2479 (73) term searches

1501 (74) term verifications

Rest of total is robots mining semantic links (getRelatedRecordByTerm method)
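
The size of that remainder follows directly from the figures above: subtract the itemised counts from the total hits. The short calculation below uses only the numbers on this slide; the dictionary keys are shorthand labels chosen for the sketch.

```python
# Figures copied from the slide; keys are shorthand labels.
usage = {
    "2008": {
        "total": 4793116,
        "catalogue": 56232,
        "term_list": 78708,
        "map": 1367,
        "search": 2479,
        "verify": 1501,
    },
    "2009 to 2009-02-10": {
        "total": 607172,
        "catalogue": 7134,
        "term_list": 10233,
        "map": 433,
        "search": 73,
        "verify": 74,
    },
}

remainders = {}
for period, counts in usage.items():
    itemised = sum(v for k, v in counts.items() if k != "total")
    remainders[period] = counts["total"] - itemised
    share = 100 * remainders[period] / counts["total"]
    print(f"{period}: itemised {itemised}, remainder {remainders[period]} ({share:.1f}%)")
```

This gives 4652829 remaining hits for 2008 and 589225 for 2009 to date, about 97% of the total in both periods.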

VS Development Path

Version 1.1 current operational version

Version 1.2 currently under development

Transparent upgrade (no change to WSDL)

Bug fix and activation of versioned list serving

Additional service API providing list content upgrade functionality to authenticated, authorised external users

VS Development Path

Version 2.0 currently being designed

Revisit back end design

Governance labelling

Deprecation support

Introduce more XML technology?

Introduce formally-registered, truly permanent URNs

Single RESTful API giving both read and write access through appropriate HTTP methods

Output document revision to SKOS 2008

VS Development Path

Whatever happens with V2.0 we will not annoy a large and very active user base through change

Both versions will therefore run in parallel until V1.2 calls are no longer logged