LSIDs and RDF in TDWG

Roger Hyam, TDWG, RBGE; Donald Hobern, GBIF. June 7-9, 2006, Edinburgh, UK


Transcript of LSIDs and RDF in TDWG

Page 1: LSIDs and RDF in TDWG

LSIDs and RDF in TDWG

Roger Hyam, TDWG, RBGE; Donald Hobern, GBIF

June 7-9, 2006 - Edinburgh, UK

Page 2: LSIDs and RDF in TDWG

Paradigm

• Starting assumption is that standards are about sharing data.

• Sharing data also implies sharing data through time.

Archive

Page 3: LSIDs and RDF in TDWG

What is Shared?

• Sharing raw literals isn’t much use.

• They need to be gathered together into ‘semantic’ units or objects.

[Diagram: the literals ‘1234’, ‘Bellis’ and ‘perennis’ gathered into one object, TaxonName:1234 ‘Bellis perennis’]
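The grouping of literals into an object can be sketched in a few lines of Python. This is only an illustration: the class and field names (`identifier`, `genus`, `specific_epithet`) are assumptions made for the example, not names taken from any TDWG schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonName:
    """A 'semantic unit' that groups related literals under one identifier."""

    identifier: str        # local id, "1234" on the slide
    genus: str             # hypothetical field name
    specific_epithet: str  # hypothetical field name

    @property
    def full_name(self) -> str:
        # The combined name is derived from the parts, not stored separately.
        return f"{self.genus} {self.specific_epithet}"


# The three raw literals from the slide, gathered into one object.
name = TaxonName(identifier="1234", genus="Bellis", specific_epithet="perennis")
print(name.full_name)
```

On their own, the strings "1234", "Bellis" and "perennis" say nothing about how they relate; the object records that they belong together.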

Page 4: LSIDs and RDF in TDWG

Semantics of Objects

• Objects need to be based on some shared semantics.

• There needs to be somewhere to look up what they mean – an ontology.

[Diagram: the object TaxonName: Bellis perennis refers to an Ontology to answer the question ‘TaxonName?’]

Page 5: LSIDs and RDF in TDWG

Identity of Objects

• How do I refer to this object?

• Who should I credit?

• Who should I send corrections to?

• Is it the same record as I already have or is it a new one?

• What is the official version of this data? Has someone altered it before I received it?

Page 6: LSIDs and RDF in TDWG

TDWG TAG-1 Meeting

• There was consensus on:

– Architecture is concerned with shared data

– Biodiversity data will be modeled as a graph of identifiable objects

– The semantics of these objects will be encoded in a series of shared ontologies

– Ontologies will be related to each other on the basis of shared Base and Core ontologies as a minimum

• Discussion continues on how this is done
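A "graph of identifiable objects" can be pictured as a dictionary keyed by identifier, where links between objects are stored as identifiers rather than as nested copies. The sketch below assumes made-up identifiers (`name:1234`, `specimen:42`, `inst:1`) and made-up link names (`typifiedBy`, `heldBy`); none of them come from a TDWG ontology.

```python
# Each object is stored under its identifier; references to other
# objects are identifiers too, so the whole structure forms a graph.
graph = {
    "name:1234": {"type": "TaxonName", "fullName": "Bellis perennis",
                  "typifiedBy": "specimen:42"},
    "specimen:42": {"type": "Specimen", "heldBy": "inst:1"},
    "inst:1": {"type": "Institution", "name": "RBGE"},
}


def follow(graph, start, *links):
    """Walk from one object to another by following named links."""
    node_id = start
    for link in links:
        node_id = graph[node_id][link]
    return graph[node_id]


# Which institution holds the specimen that typifies this name?
holder = follow(graph, "name:1234", "typifiedBy", "heldBy")
print(holder["name"])
```

Because every hop goes through an identifier, the three objects could just as well live with three different data providers.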

Page 7: LSIDs and RDF in TDWG

Implications

• We need an ontology to define and relate the objects we exchange.

• Ontology governance/management is paramount.

• We need a system of GUIDs to identify the objects.

• We need a roadmap for the protocols to exchange these objects.

Page 8: LSIDs and RDF in TDWG

Structure of the Ontology

• Base Ontology: BaseThing, BaseActor

• Core Ontology: CoreTaxonName, CoreInstitution

• Domain Ontology: TaxonName, NomenclaturalType, NomenclaturalNote, Herbarium

• Application Ontologies: ABCD, DarwinCore, ???
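One possible reading of the layering is that each layer specialises the one above it. The slide does not say exactly how the layers relate, so the subclass links below are an assumption made purely for illustration; only the class names come from the slide.

```python
# Base layer: the most general concepts.
class BaseThing: ...
class BaseActor: ...


# Core layer: assumed here to specialise the base concepts.
class CoreTaxonName(BaseThing): ...
class CoreInstitution(BaseActor): ...


# Domain layer: assumed here to specialise the core concepts.
class TaxonName(CoreTaxonName): ...
class Herbarium(CoreInstitution): ...


def layers(cls):
    """List the chain of concepts a class inherits from, most specific first."""
    return [c.__name__ for c in cls.__mro__ if c is not object]


print(layers(Herbarium))
```

Under this reading, a client that only understands the Core layer can still treat a Herbarium as a CoreInstitution.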

Page 9: LSIDs and RDF in TDWG

Ontology Governance

• Allow people to create Domain sub-ontologies easily – prevent alienation.

• Each ontology construct (concept) has a status.

• Status is increased by passing through explicit gates defined by actual usage.

Experimental → Shared → Recommended
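The idea of status being raised through usage-based gates might look like the sketch below. The three levels follow the slide; the gate criterion (number of data providers using a concept) and the thresholds are invented for the example.

```python
from enum import IntEnum


class Status(IntEnum):
    EXPERIMENTAL = 1
    SHARED = 2
    RECOMMENDED = 3


# Hypothetical gates: minimum number of providers that must actually
# use a concept before it reaches each status.
GATES = {Status.SHARED: 2, Status.RECOMMENDED: 10}


def status_for(provider_count: int) -> Status:
    """Return the highest status whose gate the observed usage passes."""
    status = Status.EXPERIMENTAL
    for level, minimum in sorted(GATES.items()):
        if provider_count >= minimum:
            status = level
    return status


print(status_for(3).name)
```

Anyone can add a concept at the lowest level; promotion depends on measured use rather than on a committee decision.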

Page 10: LSIDs and RDF in TDWG

What about RDF?

• The need to share identifiable objects has been established without reference to a technology.

• We are interested in objects not triples.

• Typical use case involves a client consuming semantically heterogeneous data from multiple sources.

• Semantic Web technologies would be ideal – but aren’t part of the TDWG culture and there are ‘unbelievers’.
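The relationship between objects and triples can be shown without any RDF library. In this sketch, plain tuples stand in for triples; the subject identifier and property names are assumptions for the example.

```python
from collections import defaultdict


def to_triples(subject, properties):
    """Flatten one object into (subject, predicate, value) triples."""
    return [(subject, predicate, value) for predicate, value in properties.items()]


def to_objects(triples):
    """Regroup triples by subject so a client sees whole objects again."""
    objects = defaultdict(dict)
    for subject, predicate, value in triples:
        objects[subject][predicate] = value
    return dict(objects)


# Two hypothetical sources describing the same object with different properties.
source_a = to_triples("name:1234", {"genus": "Bellis", "specificEpithet": "perennis"})
source_b = to_triples("name:1234", {"authorship": "L."})

# Triples from both sources merge simply because they share a subject.
merged = to_objects(source_a + source_b)
print(merged["name:1234"])
```

The triples are only the carrier. What the client works with is the regrouped object, and the merge succeeds only because both sources use the same identifier.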

Page 11: LSIDs and RDF in TDWG

Current ‘Standards’

• DarwinCore & DiGIR

– Based on Z39.50

– HTTP based XML message / response

– Simple ‘flat’ application schemas (RDF-like)

• ABCD & BioCASe

– Based on DarwinCore & DiGIR

– Complex document structure

• TAPIR

– Unification of BioCASe and DiGIR

• No RDF, Objects or GUIDs here yet!

Page 12: LSIDs and RDF in TDWG

Combining Data

• GBIF data portal is the only ‘application’ that does data integration between these formats.

• There is no standard way to include XML fragments from another XSD, other than xs:any.

• There is overlap between the different schemas and no easy way to merge them.

Page 13: LSIDs and RDF in TDWG

What about LSIDs?

• GUID-1 meeting considered several GUID technologies, including LSID, DOI and Handle.

• Life Science Identifiers are being assessed.

– I3C & OMG URNs

– urn:lsid:ncbi.nlm.nih.gov:pubmed:12571434

– getData()

– getMetadata()
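An LSID is a URN of the form urn:lsid:authority:namespace:object, with an optional trailing revision. A minimal parser for that shape is sketched below; the type and function names are chosen for the example, and the checks are deliberately simple rather than a full validation against the specification.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID into its colon-separated components."""
    parts = text.split(":")
    # "urn" and "lsid" prefixes are matched case-insensitively.
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, revision)


# The example identifier from the slide.
lsid = parse_lsid("urn:lsid:ncbi.nlm.nih.gov:pubmed:12571434")
print(lsid.authority, lsid.namespace, lsid.object_id)
```

The authority part is a domain name, which is what lets a client locate the service offering getData() and getMetadata() for the identifier.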

Page 14: LSIDs and RDF in TDWG

LSID Permanence

• LSIDs should not be recycled, i.e. used for more than one object.

• LSIDs should always resolve, but it is OK for them to resolve to an error such as HTTP 404 (Not Found) or 410 (Gone).

• No central authority to control these things.

• Even DOIs go away if there isn’t institutional backing!
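The two permanence rules (never recycle, always resolve to something) can be sketched as a small in-memory registry. This is an illustration only; the class, its methods and the status strings are assumptions, not part of any LSID resolver API.

```python
class Registry:
    """Toy identifier registry that never reuses an identifier."""

    def __init__(self):
        self._records = {}
        self._retired = set()

    def register(self, lsid, record):
        # An identifier that was ever issued, live or retired, stays taken.
        if lsid in self._records or lsid in self._retired:
            raise ValueError(f"{lsid} already used; identifiers are not recycled")
        self._records[lsid] = record

    def retire(self, lsid):
        # The data goes away, but the identifier is remembered.
        del self._records[lsid]
        self._retired.add(lsid)

    def resolve(self, lsid):
        if lsid in self._records:
            return ("ok", self._records[lsid])
        if lsid in self._retired:
            return ("gone", None)
        return ("unknown", None)


registry = Registry()
registry.register("urn:lsid:example.org:names:1", {"name": "Bellis perennis"})
registry.retire("urn:lsid:example.org:names:1")
print(registry.resolve("urn:lsid:example.org:names:1"))
```

Keeping the retired set is the point of the design: a client gets a definite "gone" answer instead of silently receiving a different object under an old identifier.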

Page 15: LSIDs and RDF in TDWG

LSIDs for Everything?

• Are there some things for which LSIDs are inappropriate?

– <logo rdf:resource="urn:lsid:example.com:branding:logo.gif" />

– xsi:schemaLocation="urn:lsid:example.com:xsd:taxon.xsd"

– xmlns:tn="urn:lsid:example.com:ontology:taxon/"

• Definitely places where we will use something else.

• Other people will use their own identifiers e.g. DOI, Handle etc.

Page 16: LSIDs and RDF in TDWG

So what’s cooking?

• XSD Based Conceptual Schemas

• XML Based Exchange Protocols

• 200+ Data Providers

• 50+ Million Anonymous ‘Records’

• Emergent Semantic Web

• Recognised Need For GUIDs

• Different GUID Technologies

• A TDWG Ontology

• OGC Standards (GML)

• BioMOBY

• Other!

• Clients?

Page 17: LSIDs and RDF in TDWG

Possible Roadmap

• Build the ontology as a focus for semantics.

• Resolution and Harvest protocols should be relatively easy to plug into or wrap round existing service providers so approach these first.

• Search/Query is more problematic: BioCASe, DiGIR, TAPIR, SPARQL, other?

Page 18: LSIDs and RDF in TDWG

Thank You

• Gordon and Betty Moore Foundation

• Global Biodiversity Information Facility

• NESC

• TDWG Members