LSIDs and RDF in TDWG

Post on 30-Dec-2015


Roger Hyam, TDWG, RBGE
Donald Hobern, GBIF

June 7-9, 2006 - Edinburgh, UK

Paradigm

• Starting assumption is that standards are about sharing data.

• Sharing data also implies sharing data through time.

Archive

What is Shared?

• Sharing raw literals isn’t much use.

• They need to be gathered together into ‘semantic’ units or objects.

[Diagram: the literals ‘1234’, ‘Bellis’ and ‘perennis’ gathered into a single object, TaxonName:1234 (Bellis perennis)]
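As a minimal sketch of the idea on this slide, the loose literals can be gathered into one typed object. The class and field names used here (`TaxonName`, `identifier`, `genus`, `specific_epithet`) are illustrative assumptions and are not taken from any TDWG schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonName:
    """A 'semantic' unit that groups related literals (hypothetical fields)."""

    identifier: str        # e.g. "1234" from the slide
    genus: str             # e.g. "Bellis"
    specific_epithet: str  # e.g. "perennis"

    def full_name(self) -> str:
        # Combine the parts into the binomial shown on the slide.
        return f"{self.genus} {self.specific_epithet}"


name = TaxonName(identifier="1234", genus="Bellis", specific_epithet="perennis")
print(name.full_name())
```

On their own, the three strings say nothing about how they relate; the object makes that relationship explicit.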

Semantics of Objects

• Objects need to be based on some shared semantics.

• There needs to be somewhere to look up what they mean – an ontology.

[Diagram: a TaxonName object (Bellis perennis) whose type, TaxonName, is looked up in an ontology]

Identity of Objects

• How do I refer to this object?

• Who should I credit?

• Who should I send corrections to?

• Is it the same record as I already have or is it a new one?

• What is the official version of this data? Has someone altered it before I received it?

TDWG TAG-1 Meeting

• There was consensus on:
– Architecture is concerned with shared data
– Biodiversity data will be modeled as a graph of identifiable objects
– The semantics of these objects will be encoded in a series of shared ontologies
– Ontologies will be related to each other on the basis of shared Base and Core ontologies as a minimum

• Discussion continues on how this is done
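One way to picture "a graph of identifiable objects" is a dictionary keyed by identifier, where some property values are themselves identifiers of other objects. This is only a sketch: the identifiers, the `Specimen` type and the property names (`typifiedBy`, `heldBy`) are hypothetical and were not agreed at the meeting.

```python
# Hypothetical identifiers, types and property names, for illustration only.
graph = {
    "urn:lsid:example.com:names:1234": {
        "type": "TaxonName",
        "genus": "Bellis",
        "specificEpithet": "perennis",
        "typifiedBy": "urn:lsid:example.com:specimens:42",  # link to another object
    },
    "urn:lsid:example.com:specimens:42": {
        "type": "Specimen",
        "heldBy": "urn:lsid:example.com:institutions:E",
    },
    "urn:lsid:example.com:institutions:E": {
        "type": "Herbarium",
    },
}


def follow(graph, start, *properties):
    """Walk from one object to another by following identifier-valued properties."""
    current = start
    for prop in properties:
        current = graph[current][prop]
    return current


holder = follow(graph, "urn:lsid:example.com:names:1234", "typifiedBy", "heldBy")
print(graph[holder]["type"])
```

Each `type` value is the hook into a shared ontology: a client that does not recognise a type can look it up rather than guess.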

Implications

• We need an ontology to define and relate the objects we exchange.

• Ontology governance/management is paramount.

• We need a system of GUIDs to identify the objects.

• We need a roadmap for the protocols to exchange these objects.

Structure of the Ontology

• Base Ontology: BaseThing, BaseActor

• Core Ontology: CoreTaxonName, CoreInstitution

• Domain Ontology: TaxonName, NomenclaturalType, NomenclaturalNote, Herbarium

• Application Ontologies: ABCD, DarwinCore, ???

Ontology Governance

• Allow people to create Domain sub-ontologies easily – prevent alienation.

• Each ontology construct (concept) has a status.

• Status is increased by passing through explicit gates defined by actual usage.

Experimental → Shared → Recommended
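The status gates could be sketched as follows. The slides do not say what the gates are, so the usage thresholds below are invented purely for illustration, and the third status is spelled `RECOMMENDED` here.

```python
from enum import IntEnum


class Status(IntEnum):
    EXPERIMENTAL = 1
    SHARED = 2
    RECOMMENDED = 3


# Assumed gates: minimum number of independent users required for each status.
# These numbers are placeholders, not TDWG policy.
GATES = {Status.SHARED: 2, Status.RECOMMENDED: 5}


def status_for(user_count: int) -> Status:
    """Return the highest status whose usage gate has been passed."""
    status = Status.EXPERIMENTAL
    for candidate, required in sorted(GATES.items()):
        if user_count >= required:
            status = candidate
    return status


print(status_for(3).name)
```

Tying promotion to measured usage keeps the bar for creating a new concept low while still signalling which concepts are safe to depend on.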

What about RDF?

• The need to share identifiable objects has been established without reference to a technology.

• We are interested in objects not triples.

• Typical use case involves a client consuming semantically heterogeneous data from multiple sources.

• Semantic Web technologies would be ideal – but aren’t part of the TDWG culture and there are ‘unbelievers’.

Current ‘Standards’

• DarwinCore & DiGIR
– Based on Z39.50
– HTTP based XML message / response
– Simple ‘flat’ application schemas (RDF-like)

• ABCD & BioCASe
– Based on DarwinCore & DiGIR
– Complex document structure.

• TAPIR
– Unification of BioCASe and DiGIR

• No RDF, Objects or GUIDs here yet!

Combining Data

• GBIF data portal is the only ‘application’ that does data integration between these formats.

• There is no standard way to include XML fragments from other XSDs other than xs:any.

• There is overlap between the different schemas and no easy way to merge them.
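To illustrate the merging problem, the sketch below maps a flat record and a nested record onto the same shared terms. All field names, paths and target terms are hypothetical stand-ins and are not copied from the DarwinCore or ABCD schemas.

```python
# Hypothetical mappings from source-specific fields to shared ontology terms.
FLAT_MAPPING = {"ScientificName": "taxonName", "InstitutionCode": "institution"}
NESTED_MAPPING = {
    ("Identification", "FullName"): "taxonName",
    ("Owner", "Code"): "institution",
}


def from_flat(record):
    """Translate a flat record, dropping fields with no shared term."""
    return {FLAT_MAPPING[k]: v for k, v in record.items() if k in FLAT_MAPPING}


def from_nested(record):
    """Translate a nested record by walking each mapped path."""
    result = {}
    for path, term in NESTED_MAPPING.items():
        value = record
        for key in path:
            value = value.get(key) if isinstance(value, dict) else None
        if value is not None:
            result[term] = value
    return result


flat = {"ScientificName": "Bellis perennis", "InstitutionCode": "E", "Extra": 1}
nested = {"Identification": {"FullName": "Bellis perennis"}, "Owner": {"Code": "E"}}
print(from_flat(flat) == from_nested(nested))
```

Without a shared ontology, every consuming application has to maintain mapping tables like these itself.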

What about LSIDs?

• The GUID-1 meeting considered several GUID technologies, including LSID, DOI and Handle.

• Life Science Identifiers are being assessed.
– I3C & OMG URNs
– urn:lsid:ncbi.nlm.nih.gov:pubmed:12571434
– getData()
– getMetadata()
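An LSID is a URN of the form urn:lsid:authority:namespace:object, with an optional trailing revision. A minimal parser might look like the sketch below; the `LSID` tuple and `parse_lsid` function are names invented for this example and do not belong to any LSID library.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID URN into authority, namespace, object and optional revision."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"LSID has an empty component: {text!r}")
    # Treat a missing or empty revision as "no revision".
    revision = (parts[5] or None) if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


lsid = parse_lsid("urn:lsid:ncbi.nlm.nih.gov:pubmed:12571434")
print(lsid.authority, lsid.namespace, lsid.object_id)
```

The authority component is what a resolver uses to locate the service that answers getData() and getMetadata() calls for the identifier.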

LSID Permanence

• LSIDs should not be recycled, i.e. used for more than one object.

• LSIDs should always resolve but it is OK for them to resolve to a ‘Gone’ error (HTTP 410; 404 means Not Found).

• No central authority to control these things.

• Even DOIs go away if there isn’t institutional backing!
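The permanence rules can be sketched with a small in-memory registry that refuses to reissue an identifier and answers "gone" for withdrawn objects. `LsidRegistry` and its methods are hypothetical names for this example; a real authority would need persistent storage.

```python
class LsidRegistry:
    """In-memory sketch: identifiers are never reassigned; retired ones leave a tombstone."""

    def __init__(self):
        self._objects = {}     # lsid -> data for live objects
        self._retired = set()  # lsids whose objects have been withdrawn

    def register(self, lsid, data):
        if lsid in self._objects or lsid in self._retired:
            raise ValueError(f"{lsid} has already been issued and cannot be recycled")
        self._objects[lsid] = data

    def retire(self, lsid):
        # Withdraw the object but remember that the identifier was used.
        del self._objects[lsid]
        self._retired.add(lsid)

    def resolve(self, lsid):
        if lsid in self._objects:
            return ("ok", self._objects[lsid])
        if lsid in self._retired:
            return ("gone", None)
        return ("unknown", None)


registry = LsidRegistry()
registry.register("urn:lsid:example.com:names:1234", "Bellis perennis")
registry.retire("urn:lsid:example.com:names:1234")
print(registry.resolve("urn:lsid:example.com:names:1234")[0])
```

Distinguishing "gone" from "unknown" lets a client tell a withdrawn record apart from a mistyped identifier.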

LSIDs for Everything?

• Are there some things for which LSIDs are inappropriate?
– <logo rdf:resource="urn:lsid:example.com:branding:logo.gif" />
– xsi:schemaLocation="urn:lsid:example.com:xsd:taxon.xsd"
– xmlns:tn="urn:lsid:example.com:ontology:taxon/"

• There are definitely places where we will use something else.

• Other people will use their own identifiers e.g. DOI, Handle etc.

So what’s cooking?

• XSD Based Conceptual Schemas

• XML Based Exchange Protocols

• 200+ Data Providers

• 50+ Million Anonymous ‘Records’

• Emergent Semantic Web

• Recognised Need For GUIDs

• Different GUID Technologies

• A TDWG Ontology

• OGC Standards (GML)

• BioMOBY

• Other!

• Clients?

Possible Roadmap

• Build the ontology as a focus for semantics.

• Resolution and Harvest protocols should be relatively easy to plug into or wrap round existing service providers so approach these first.

• Search/Query is more problematic: BioCASe, DiGIR, TAPIR, SPARQL, other?

Thank You

• Gordon and Betty Moore Foundation

• Global Biodiversity Information Facility

• NESC

• TDWG Members