Terminology Markup Framework and TBX-SKOS Interoperability Gerhard Budin University of Vienna Chair, ISO/TC 37/SC 2 3rd Ecoterm Group Meeting FAO, Rome 17/18 May, 2006

Page 1

Terminology Markup Framework and TBX-SKOS Interoperability

Gerhard Budin
University of Vienna
Chair, ISO/TC 37/SC 2

3rd Ecoterm Group Meeting
FAO, Rome
17/18 May, 2006

Page 2

A Brief History
Problems and Solutions

• Strong diversity of lexico-terminological resources
  – Data models, data structures + data semantics
  – Diversity of semantic, linguistic/cultural complexity and semantic depth/richness
• Diversity of user groups and their requirements
• Sheer quantity of resources
• Data interchange between organizations (within and across domains) as well as (distributed) data integration: early needs asking for immediate solutions
• History of data modeling
• History of interchange standards
• History of semantic interoperability management

Page 3

Need for multi-level modeling architectures

Page 4

Developing the Terminology Markup Framework in order to cope with this complexity-diversity

• Based on empirical studies and practical user-driven requirements analysis
• Markup/representation/modeling: XML, XMLS, RDF, UML
• Open standards strategy (ISO TC 37)
  – ISO 12620 Data categories: meta-model element + semantics registry (RDF)
  – ISO 16642 Terminology Markup Framework (TMF): meta-model architecture and specifications (UML)
  – ISO 12200: Terminology Markup Language (XML)
    • Instance for language industry: TBX Termbase Exchange Format (XML)
    • Instance for lexicography/publishing: LexML ISO 1951
  – Lexical Markup Framework (LMF) (UML)
  – ISO 704 and ISO 1087 (foundational level)
  – ISO 15188 (workflow and collaborative issues)
  – Alignment with ISO 11179, W3C, OASIS, etc.

Page 5

Introduction to TBX

• TBX® stands for TermBase eXchange
• TBX is a Terminological Markup Framework (TMF) markup language
  – TMF is an ISO standard (16642)
• TBX is consistent with ISO 12200 (MARTIF)
• TBX is maintained by OSCAR (www.lisa.org)
• The TBX specification is free

Page 6

Who Should Care about TBX?

• If you don’t care about terminological consistency, then you have no reason to care about TBX
• If you only need a simple bilingual list of terms (source term and target term) with no additional information, then you don’t need TBX; just use a two-column spreadsheet for your list

Page 7

On the other hand…

• If you do care about terminological consistency and you maintain one or more terminology databases (termbases), then you should be interested in TBX, unless you want your termbase to be locked into the terminology management software you are currently using.

• Portability of complex terminological data is the key benefit of TBX

Page 8

What does TBX look like?

• A TBX file is an XML document
• A TBX file consists of:
  – A header that describes the file
  – A set of entries, one per concept in the termbase
  – For each concept, a set of terms, grouped by language, that designate the concept
• A terminological concept entry (termEntry)
  – Can be multilingual
  – Can be monolingual

Page 9

Example of a TBX file

<?xml version='1.0'?> [+ ref to DTD/schema]
<martif type='TBX' xml:lang='en'>
  <martifHeader> [global info] </martifHeader>
  <text>
    <body> [concept entries] </body>
  </text>
</martif>

Page 10

TBX Header

<martifHeader>
  <fileDesc>
    <sourceDesc>
      <p>from Budin Kobe 2006</p>
    </sourceDesc>
  </fileDesc>
  <encodingDesc>
    <p type='DCSName'> SYSTEM "TBXDCSv05c.xml" </p>
  </encodingDesc>
</martifHeader>

Page 11

TBX Body

<body>
  <termEntry id='C171'>
    [concept: a dollop of cream]
  </termEntry>
  <termEntry id='C180'>
    [concept: frog legs]
  </termEntry>
</body>
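The header and body pieces shown so far can be put together and read back with a few lines of Python. This is only a minimal sketch using the standard library: the skeleton document is assembled from the slide fragments, the empty termEntry elements stand in for the bracketed placeholders, and the function name summarize_tbx is an illustrative assumption, not part of TBX.

```python
import xml.etree.ElementTree as ET

# Skeleton TBX document assembled from the header and body slides.
# The termEntry elements are left empty here (assumption for brevity).
TBX_SAMPLE = """<?xml version='1.0'?>
<martif type='TBX' xml:lang='en'>
  <martifHeader>
    <fileDesc>
      <sourceDesc><p>from Budin Kobe 2006</p></sourceDesc>
    </fileDesc>
    <encodingDesc>
      <p type='DCSName'> SYSTEM "TBXDCSv05c.xml" </p>
    </encodingDesc>
  </martifHeader>
  <text>
    <body>
      <termEntry id='C171'/>
      <termEntry id='C180'/>
    </body>
  </text>
</martif>"""


def summarize_tbx(xml_text):
    """Return (source description, list of concept entry ids)."""
    root = ET.fromstring(xml_text)
    source = root.findtext("martifHeader/fileDesc/sourceDesc/p")
    entry_ids = [e.get("id") for e in root.findall("text/body/termEntry")]
    return source, entry_ids


source, entry_ids = summarize_tbx(TBX_SAMPLE)
print(source, entry_ids)
```

The point of the sketch is that the header and the concept entries are reachable by fixed paths, which is what makes a TBX file portable between tools.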

Page 12

TBX and Other Standards

• (1) TBX and ISO 16642 (TMF)
• (2) TBX and ISO 12620 (Data Categories)
• (3) TBX and SKOS

Page 13

1: TBX and ISO 16642

• TBX is a TML (Terminological Markup Language) of TMF (ISO 16642) (see Annex B)
• TBX maps to the TMF meta-model
  – A TBX file is a TDC (terminological data collection)
  – martifHeader provides GI (global information)
  – termEntry: TE (terminological entry)
  – langSet: LS (language section)
  – tig/ntig: TS (term section)
• A TMF DCS (Data Category Selection) in TBX is in XCS (eXtensible Constraint Specification) format
• TBX uses ISO 12200 for its XML style
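The element-to-level mapping listed above can be written down as a lookup table. The following is a rough sketch, not an official mapping tool: treating the martif root element as the TDC is an assumption (the slide only says the file as a whole is the TDC), and the sample entry and the name tmf_levels are hypothetical.

```python
import xml.etree.ElementTree as ET

# TBX element -> TMF meta-model level, as listed on the slide.
# Mapping the root element "martif" to TDC is an assumption.
TBX_TO_TMF = {
    "martif": "TDC",
    "martifHeader": "GI",
    "termEntry": "TE",
    "langSet": "LS",
    "tig": "TS",
    "ntig": "TS",
}


def tmf_levels(xml_text):
    """Return (TMF level, TBX element name) pairs in document order."""
    root = ET.fromstring(xml_text)
    return [(TBX_TO_TMF[el.tag], el.tag) for el in root.iter() if el.tag in TBX_TO_TMF]


# Hypothetical minimal file with one entry, one language and one term.
SAMPLE = (
    "<martif type='TBX'><martifHeader/><text><body>"
    "<termEntry id='C180'><langSet xml:lang='en'>"
    "<tig><term>frog legs</term></tig>"
    "</langSet></termEntry>"
    "</body></text></martif>"
)
levels = tmf_levels(SAMPLE)
print(levels)
```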

Page 14

TMF Metamodel

[Diagram: a Terminological Data Collection (TDC) contains Global Information (GI), Complementary Information (CI), and Terminological (Concept) Entries (TE). Each TE contains Language Sections (LS), each LS contains Term Sections (TS), and each TS contains Term Component Sections (TCS).]

Page 15

TMF and lexical resources

• In general, a terminological resource is organized into concept entries, each of which includes one or more terms designating a particular concept

• In general, a lexical resource is organized into lexical entries, each of which includes one or more senses of a particular lexical item (a word or phrase)

• A concept entry containing multiple terms can be split into multiple lexical entries, one per term, and multiple lexical entries associated with the same concept can be combined into one concept entry
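The split-and-combine step described in the last bullet can be sketched in a few lines. This is an illustration under simplifying assumptions: entries are plain dictionaries, the field names lemma and senses are hypothetical, and the terms are borrowed from the medieval-studies thesaurus sample shown later in this deck.

```python
from collections import defaultdict


def split_concept_entry(concept_id, terms):
    """One concept entry with several terms -> one lexical entry per term."""
    return [{"lemma": term, "senses": [concept_id]} for term in terms]


def merge_lexical_entries(lexical_entries):
    """Group lexical entries by concept -> concept entries listing their terms."""
    concepts = defaultdict(list)
    for entry in lexical_entries:
        for concept_id in entry["senses"]:
            concepts[concept_id].append(entry["lemma"])
    return dict(concepts)


lexical = split_concept_entry("eid-VocCod-211.01", ["copiste", "écrivain", "scribe"])
merged = merge_lexical_entries(lexical)
print(lexical)
print(merged)
```

A real lexical entry would carry several senses per lemma; the round trip still works because each sense points back to its concept.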

Page 16

2: TBX and ISO 12620

• All data categories in the default TBX DCS are taken from ISO 12620

Page 17

3: TBX and SKOS

• A typical concept entry will contain a subject field to specify the domain of the concept.

• However, the subject field is typically some kind of hierarchy that is flattened into a string within TBX

• SKOS makes it possible to represent the subject field hierarchy as a hierarchy and then create a link within TBX
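To make the contrast concrete, here is a small sketch of the two options: a subject field flattened into a string versus a link to a node in the hierarchy. The identifiers s71, s81 and s79 and their labels come from the SKOS sample in this deck; the dictionary layout, the "/" separator and the function name flatten are assumptions for illustration only.

```python
# Subject-field hierarchy: concept id -> (label, id of broader concept).
# The layout of this table is hypothetical.
SUBJECT_FIELDS = {
    "s71": ("Food", None),
    "s81": ("Recipe Ingredient", "s71"),
    "s79": ("Restaurant Menu Item", "s71"),
}


def flatten(concept_id, sep="/"):
    """Collapse the path from the top concept down to concept_id into one string."""
    labels = []
    while concept_id is not None:
        label, concept_id = SUBJECT_FIELDS[concept_id]
        labels.append(label)
    return sep.join(reversed(labels))


flat = flatten("s79")  # flattened string stored directly in the entry
link = "#s79"          # link into the hierarchy kept outside the entry
print(flat, link)
```

With the link, the hierarchy can change (for example, a new level inserted above "Restaurant Menu Item") without touching any concept entry.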

Page 18

Simple Knowledge Organization System (SKOS)

• “SKOS is an area of work developing specifications and standards to support the use of knowledge organisation systems (KOS) such as thesauri, classification schemes, subject heading lists, taxonomies, other types of controlled vocabulary, and perhaps also terminologies and glossaries, within the framework of the Semantic Web.”

- http://www.w3.org/2004/02/skos/ (Accessed on 3/17/06)

Page 19

Sample SKOS

<skos:Concept rdf:about="#s71">
  <skos:prefLabel>Food</skos:prefLabel>
  <skos:narrower rdf:resource="#s81"/>
  <skos:narrower rdf:resource="#s79"/>
</skos:Concept>

<skos:Concept rdf:about="#s81">
  <skos:prefLabel>Recipe Ingredient</skos:prefLabel>
  <skos:broader rdf:resource="#s71"/>
</skos:Concept>

<skos:Concept rdf:about="#s79">
  <skos:prefLabel>Restaurant Menu Item</skos:prefLabel>
  <skos:broader rdf:resource="#s71"/>
</skos:Concept>
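The sample can be read with a standard XML parser once it is wrapped in an rdf:RDF root that declares the namespaces. The sketch below does that and checks that every narrower link has a matching broader link. The wrapper element and the function names are assumptions; a production system would more likely use an RDF library than plain XML parsing.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
SKOS = "{http://www.w3.org/2004/02/skos/core#}"

# The three concepts from the slide, wrapped in an rdf:RDF root (assumed wrapper).
SKOS_SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="#s71">
    <skos:prefLabel>Food</skos:prefLabel>
    <skos:narrower rdf:resource="#s81"/>
    <skos:narrower rdf:resource="#s79"/>
  </skos:Concept>
  <skos:Concept rdf:about="#s81">
    <skos:prefLabel>Recipe Ingredient</skos:prefLabel>
    <skos:broader rdf:resource="#s71"/>
  </skos:Concept>
  <skos:Concept rdf:about="#s79">
    <skos:prefLabel>Restaurant Menu Item</skos:prefLabel>
    <skos:broader rdf:resource="#s71"/>
  </skos:Concept>
</rdf:RDF>"""


def read_concepts(xml_text):
    """Return {concept id: {"label", "broader", "narrower"}}."""
    concepts = {}
    for c in ET.fromstring(xml_text).findall(SKOS + "Concept"):
        concepts[c.get(RDF + "about")] = {
            "label": c.findtext(SKOS + "prefLabel"),
            "broader": [b.get(RDF + "resource") for b in c.findall(SKOS + "broader")],
            "narrower": [n.get(RDF + "resource") for n in c.findall(SKOS + "narrower")],
        }
    return concepts


def is_symmetric(concepts):
    """True if each narrower link is mirrored by a broader link and vice versa."""
    for cid, c in concepts.items():
        if any(cid not in concepts[n]["broader"] for n in c["narrower"]):
            return False
        if any(cid not in concepts[b]["narrower"] for b in c["broader"]):
            return False
    return True


concepts = read_concepts(SKOS_SAMPLE)
print(concepts["#s71"], is_symmetric(concepts))
```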

Page 20

Visual Representation of SKOS

[Diagram: concept tree with "Food" at the top. Second row: Recipe Ingredient, Restaurant Menu Item, Grocery Store Item, Homemade Item. Third row: Appetizer, Entree, Salad, Soup.]

Page 21

GEvTerm Initiative

• The food examples used on the preceding slides are taken from FooNaVar, a project of the GEvTerm Initiative.

• The GEvTerm Initiative is a terminological database that has committed to being fully TBX and SKOS compliant

• Another example of TBX in use is...

Page 22

C: Multilingual Thesaurus for Medieval Studies (MLTMS)

• “Imagine the ability to search across web-resources using your native modern european language and find appropriate primary and secondary sources in Latin, French, Italian, German, Spanish, English, etc., based upon the meaning rather than the form of the search term. Imagine having a tool that would enable you to search for a concept and be able to construct the forms it has taken historically as well as the ability to link outward for both evidence and argument. Imagine a tool that would enable you to study the slippage of concept which is beyond naming. Imagine having a tool that can deconstruct ontological orders asking for different kinds of readings.”

-http://www.mith2.umd.edu/thes/ (Accessed on 3/17/06)

Page 23

Why did MLTMS use TBX?

• integration of terminological data from multiple sources;

• querying multiple termbases through a single user interface by passing data through a common intermediate format on a batch or dynamic basis;

• placing data on an FTP site for download by interested parties;

• peer review by colleagues of tentative entries

- http://www.mith2.umd.edu/thes/ytbx.html (Accessed on 3/17/06)

Page 24

MLTMS Sample

<termEntry id='eid-VocCod-211.01'>
  <descrip type='subjectField'>personnel</descrip>
  <descrip type='definition'>personne qui accomplit un travail copie ou d'&#x00E9;criture</descrip>
  <langSet xml:lang='fr'>
    <ntig>
      <termGrp>
        <term id='tid-voccod-211.01-fr1'>copiste</term>
        <termNote type='termType'>entryTerm</termNote>
      </termGrp>
    </ntig>
    <ntig>
      <termGrp>
        <term id='tid-voccod-211.01-fr3'>&#x00E9;crivain</term>
        <termNote type='termType'>synonym</termNote>
      </termGrp>
    </ntig>
  </langSet>
  <langSet xml:lang='en'>
    <ntig>
      <termGrp>
        <term id='tid-voccod-211.01-en1'>scribe</term>
        <termNote type='termType'>entryTerm</termNote>
      </termGrp>
    </ntig>
  </langSet>
</termEntry>
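As a rough illustration of how such an entry can be consumed, the sketch below pulls out the terms and their termType notes per language using Python's standard library. The entry is a trimmed copy of the sample (the definition is omitted), and the function name terms_by_language is an assumption.

```python
import xml.etree.ElementTree as ET

# Attribute name under which ElementTree exposes xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Trimmed copy of the sample entry (definition omitted for brevity).
ENTRY = """<termEntry id='eid-VocCod-211.01'>
  <descrip type='subjectField'>personnel</descrip>
  <langSet xml:lang='fr'>
    <ntig><termGrp>
      <term id='tid-voccod-211.01-fr1'>copiste</term>
      <termNote type='termType'>entryTerm</termNote>
    </termGrp></ntig>
    <ntig><termGrp>
      <term id='tid-voccod-211.01-fr3'>&#x00E9;crivain</term>
      <termNote type='termType'>synonym</termNote>
    </termGrp></ntig>
  </langSet>
  <langSet xml:lang='en'>
    <ntig><termGrp>
      <term id='tid-voccod-211.01-en1'>scribe</term>
      <termNote type='termType'>entryTerm</termNote>
    </termGrp></ntig>
  </langSet>
</termEntry>"""


def terms_by_language(entry_xml):
    """Return {language code: [(term, termType), ...]} for one termEntry."""
    entry = ET.fromstring(entry_xml)
    result = {}
    for lang_set in entry.findall("langSet"):
        result[lang_set.get(XML_LANG)] = [
            (grp.findtext("term"), grp.findtext("termNote[@type='termType']"))
            for grp in lang_set.findall("ntig/termGrp")
        ]
    return result


terms = terms_by_language(ENTRY)
print(terms)
```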

Page 25

MLTMS Sample (Rendered with XSLT)

Page 26

TBX to HTML

• The last few slides have provided an example of rendering HTML from a TBX file. Here is a brief diagram of the process.

[Diagram: TBX -> (processed by) XSLT -> (results in) HTML]
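The same TBX-to-HTML step can be approximated without an XSLT processor. The sketch below is a swapped-in technique, not the XSLT stylesheet used for the slides: it walks a termEntry with Python's standard library and emits an HTML table. The sample entry, the table layout and the function name render_entry_html are all assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def render_entry_html(entry_xml):
    """Render one termEntry as an HTML table with one row per term."""
    entry = ET.fromstring(entry_xml)
    rows = []
    for lang_set in entry.findall("langSet"):
        lang = lang_set.get(XML_LANG, "")
        # iter() finds term elements under both tig and ntig/termGrp.
        for term in lang_set.iter("term"):
            rows.append(
                "<tr><td>{}</td><td>{}</td></tr>".format(
                    escape(lang), escape(term.text or "")
                )
            )
    entry_id = escape(entry.get("id", ""))
    return '<table id="{}">{}</table>'.format(entry_id, "".join(rows))


# Hypothetical one-term entry for the "frog legs" concept.
SAMPLE_ENTRY = (
    "<termEntry id='C180'>"
    "<langSet xml:lang='en'><tig><term>frog legs</term></tig></langSet>"
    "</termEntry>"
)
html_table = render_entry_html(SAMPLE_ENTRY)
print(html_table)
```

An XSLT stylesheet keeps the presentation rules outside the program, which is why it is the more common choice for publishing termbases.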

Page 27

D: Other Standards

• ISO 11179 and XCS, which defines a flavor of TBX, both provide a list of data element types

• XMDR

Page 28

E: Tasks for TBX

• Encourage translation technology vendors to implement TBX
• Revise the specification
• Compare ISO 11179 to XCS

Page 29

XMDR Prototype Architecture: Initial Implemented Modules

[Diagram: UML-style module diagram. Modules: Ontology Editor, 11179 OWL Ontology, MetadataValidator (defer; schema-driven syntax checker), Authentication Service (defer), MappingEngine (defer), RegistryExternalInterface, RetrievalIndex, FullTextIndex, LogicBasedIndex, RegistryStore, WritableRegistryStore. Technologies named: Protege, Jena, Xerces, Java, Lucene, OWI KS, Racer, Kowari, Subversion. Legend: Generalization; Composition (tight ownership); Aggregation (loose ownership).]

Page 30

OWL, RDF & XML Schema used to specify XMDR as UML used for 11179 Edition 2

[Diagram. Labels: UML 11179 Metamodel; 11179 Relational Schema; Relational Metadata; OWL XMDR Ontology & annotations; XMDR’s Relax NG Schema; TRang; XMDR XML Schema; RDF Spec; XML Schema Language spec; XML Objects. Annotations: "Types & Cardinalities"; "What things go in own files? Which property direction stored? Sequential ordering of properties"; "Triples: binary labeled relationships".]

Page 31

XMDR Prototype Example: dual purpose RDF/XML file: DEALL.1.5394.1.xml

<DataElement rdf:about=""
    xml:base="http://xmdr.lbl.gov/xmdr/data/DEALL.1.5394.1.xml">
  <container rdf:resource="http://oaspub.epa.gov/edr"/>
  <identifier rdf:parseType="Resource">
    <string rdf:datatype="&xsd;string">5394</string>
  </identifier>
  <version rdf:datatype="&xsd;string">1</version>
  <administrationRecord rdf:parseType="Resource">
    <registrationStatus rdf:datatype="&xsd;string">Standard</registrationStatus>
    <administrativeStatus rdf:datatype="&xsd;string">Final</administrativeStatus>
    <creationDate rdf:datatype="&xsd;date">1999-09-09</creationDate>
  </administrationRecord>
  <designation rdf:parseType="Resource">
    <context rdf:resource="CXT-Legacy.xml"/>
    <sign xml:lang="en">Country Name</sign>
  </designation>
  <designation rdf:parseType="Resource">
    <context rdf:resource="CXT-Long Abbreviation.xml"/>
    <context rdf:resource="CXT-Medium Abbreviation.xml"/>
    <context rdf:resource="CXT-Short Abbreviation.xml"/>
    <sign xml:lang="en">Mail Cntry Nm</sign>
  </designation>
  <designation rdf:parseType="Resource">
    <context rdf:resource="CXT-Registry.xml"/>
    <context rdf:resource="CXT-Standard.xml"/>
    <sign xml:lang="en">Mailing Address Country Name</sign>
  </designation>
  <definition rdf:parseType="Resource">
    <context rdf:resource="CXT-Legacy.xml"/>
    <context rdf:resource="CXT-Long Abbreviation.xml"/>
    <context rdf:resource="CXT-Medium Abbreviation.xml"/>
    <context rdf:resource="CXT-Registry.xml"/>
    <context rdf:resource="CXT-Short Abbreviation.xml"/>
    <context rdf:resource="CXT-Standard.xml"/>
    <text xml:lang="en">The name of the country where the addressee is located.</text>
  </definition>
  <type rdf:resource="RCDIS.1.12116.1.xml"/>
  <domain rdf:resource="VDALL.1.15147.1.xml"/>
  <meaning rdf:resource="DCDIS.1.12800.1.xml"/>
  <example rdf:datatype="&xsd;string">United States</example>
</DataElement>

Page 32

XMDR XML schema provides a number of important benefits…

• Schema specifies what is required as well as what is legal

• Divides metadata into files conforming to XML schema

• Normalizes data (à la relational “one fact in one place”)

• Facilitates XSLT transformations by reducing degrees of freedom to a canonical encoding within the RDF standard

• Relax NG used to create and check XMDR-it schema

• RNG validator enforces many OWL ontology constraints

• TRang automatically translates into XML schema syntax

Page 33

From texts and terminologies to ontologies

• Using the Risk scenario
  – Termbase
    • Export XML
    • Domain Models: meta-models -> patterns
  – Text corpus
    • Term extraction: comparative testing ProTerm, MultiTerm Extract, MultiCorpora
    • Aligning with termbase
  – Ontology import -> editor

Page 34
Page 35
Page 36

Bornemisza

Page 37
Page 38
Page 39
Page 40
Page 41
Page 42
Page 43
Page 44
Page 45
Page 46
Page 47
Page 48
Page 49
Page 50

TBX-SKOS interoperability

• Differences
  – XML vs. RDF
  – Inherent flexibility + "open" data modeling for a large variety of resources vs. traditional thesaurus data model as a default for a KOS (diff. scopes)
  – TBX has documented use cases and mapping tools -> language industry standard
  – Different semantics + vocabularies (12620 vs. thesaurus standard)
• Commonalities
  – Conceptual approach
  – W3C
• Vocabulary mapping (RDF)

Page 51

TMF Metamodel

[Diagram: a Terminological Data Collection (TDC) contains Global Information (GI), Complementary Information (CI), and Terminological (Concept) Entries (TE). Each TE contains Language Sections (LS), each LS contains Term Sections (TS), and each TS contains Term Component Sections (TCS).]

Page 52

Term Entry Level (Level 1)

[Diagram: the Terminological Data Collection (TDC) with Global Information (GI), Complementary Information (CI), and Terminological (Concept) Entries (TE). Labels shown at the entry level: Concept-Related Dat-cats; Subject Field; Note; Definition; SourceID; Responsibility; Date; Transaction; Administrative Dat-cats; Notes; Concept System DatCats.]

Page 53

Language Section Level (Level 2)

[Diagram: a Terminological Entry containing Language Section(s) (LS), repeated (LS * n …). Labels shown at the language section level: Concept-Related Dat-cats; Note; Definition; SourceID; Responsibility; Date; Transaction; Language-Related Dat-cats; Notes; Administrative Dat-cats; xml:lang; Transfer-comment; Equivalence; Concept System Dat-cats.]

Page 54

Term-Level Information

[Diagram: Language Section(s) (LS) containing Term Section(s) (TS), repeated (TS * n …). Labels shown at the term level: Term-related DatCats (TRD); Term; Definition; Context; Note; SourceID; Responsibility; Date; Transaction; Notes; Concept-Related Dat-cats; Administrative DatCats; Transfer-comment.]

Page 55

SKOS Vocabulary

• SKOS Core is a model for expressing the structure and content of concept schemes (thesauri, classification schemes, subject heading lists, taxonomies, terminologies, glossaries and other types of controlled vocabulary).

• The SKOS Core Vocabulary is an application of the Resource Description Framework (RDF), that can be used to express a concept scheme as an RDF graph. Using RDF allows data to be linked to and/or merged with other RDF data by semantic web applications.

Page 56

SKOS Graphs

Page 57

SKOS Graphs

Page 58

RDF Representation of SKOS Graph

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <skos:Collection>
    <rdfs:label>milk by source animal</rdfs:label>
    <skos:member rdf:resource="http://www.example.com/concepts#buffalomilk"/>
    <skos:member rdf:resource="http://www.example.com/concepts#cowmilk"/>
    <skos:member rdf:resource="http://www.example.com/concepts#goatmilk"/>
    <skos:member rdf:resource="http://www.example.com/concepts#sheepmilk"/>
  </skos:Collection>
  <skos:Concept rdf:about="http://www.example.com/concepts#buffalomilk">
    <skos:prefLabel>buffalo milk</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="http://www.example.com/concepts#cowmilk">
    <skos:prefLabel>cow milk</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="http://www.example.com/concepts#goatmilk">
    <skos:prefLabel>goat milk</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="http://www.example.com/concepts#sheepmilk">
    <skos:prefLabel>sheep milk</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>
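A short sketch of how the collection above can be resolved to member labels, again with Python's standard XML parser rather than an RDF library (an assumption made to keep the example dependency-free). The sample is trimmed to two members, and the function name collection_labels is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"
SKOS = "{http://www.w3.org/2004/02/skos/core#}"

# Trimmed copy of the slide's RDF/XML (two of the four members).
COLLECTION_SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <skos:Collection>
    <rdfs:label>milk by source animal</rdfs:label>
    <skos:member rdf:resource="http://www.example.com/concepts#buffalomilk"/>
    <skos:member rdf:resource="http://www.example.com/concepts#cowmilk"/>
  </skos:Collection>
  <skos:Concept rdf:about="http://www.example.com/concepts#buffalomilk">
    <skos:prefLabel>buffalo milk</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="http://www.example.com/concepts#cowmilk">
    <skos:prefLabel>cow milk</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""


def collection_labels(xml_text):
    """Return {collection label: [preferred labels of its members]}."""
    root = ET.fromstring(xml_text)
    labels = {
        c.get(RDF + "about"): c.findtext(SKOS + "prefLabel")
        for c in root.findall(SKOS + "Concept")
    }
    return {
        coll.findtext(RDFS + "label"): [
            labels[m.get(RDF + "resource")] for m in coll.findall(SKOS + "member")
        ]
        for coll in root.findall(SKOS + "Collection")
    }


members = collection_labels(COLLECTION_SAMPLE)
print(members)
```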

Page 59

Mapping TBX/12620 DatCats to SKOS Vocabulary

• TBX data categories (data element concepts in the sense of ISO/IEC 11179-3) contain instantiations of information that are expressed in SKOS using SKOS core vocabulary.

• Interoperability (a cross-walk between the two standards) depends on mapping between the two systems

Page 60

Data Collections

• collection
  – We do not have this, although collections can be implied in some cases by the use of the coordinateConceptGeneric or possibly subordinateConceptGeneric markers.
• collectableProperty
  – We do not have this; in SKOS one can assign rules to collections, which makes this useful as an ontology-like property.
• orderedCollection
  – Not available in our set, although many of our conceptual domains are structured as ordered lists.
  – They are ordered by virtue of proximity, but we don't have a mechanism for enforcing order within the metadata structure.

Page 61

Collections, cont.

• memberList
  – An RDF list containing the members of an ordered collection
  – We aren’t sure why this is necessary; why not just use ordered collection?
  – We are assuming the collection by itself embodies an unordered list.
• member
  – Definition: member of a list
  – If indicated at all, this is embodied in TBX as
    • 1) a simple data category listed as a member of a conceptual domain
    • 2) a coordinate concept or subordinate concept associated with a broader concept or topTerm

Page 62

Concept & Concept Schemes

• concept
  – Embodied in TMF/TBX as the entire / termEntry /.
• conceptScheme
  – A concept system; represented via links and notation systems
• properties
  – Defined links and relations
    • TMF/TBX: no open class of properties or edges that can be freely defined
    • Many pre-defined sets of property relations between individual data element types and between attributes and the members of their conceptual domains.

Page 63

Scheme Identification

• inScheme
  – We have pointers to Classification Schemes, but our pointers for thesauri and hierarchical relations do not include a pointer to the name or identifier of a specific scheme.
  – This is a lacuna for us and needs to be added.

Page 64

Subject (Domain) Identification

• isPrimarySubjectOf
  – / subject field level 1 /
  – Definition: the primary subject of a resource
    • 12620 allows for 9 levels of granularity and TBX for 3 in defining the granularity of subject references within a scheme
• isSubjectOf
  – / subject field level 2 /; primarySubject
    • [subject field + a restrictive constraint; 2nd highest level of granularity]
• subject
  – / subject field level 3 / ; / subject fields 3-9 /
• subjectIndicator
  – public subject indicator located using a URI
  – Missing in TBX / 12620

Page 65

Labels (Terms, ConceptNames)

• Missing: label
  – / term /
• prefLabel (preferredLabel)
  – / term termType=preferred term / ; / descriptor /
• prefSymbol
  – / term termType=preferred term termType=symbol /
• altLabel
  – / term termType --> admitted term /
• altSymbol
  – / term termType=admitted term termType=symbol /
• hiddenLabel
  – Generally achieved using a security code reference or an authorization code
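The label crosswalk above is regular enough to express as a small function. This is a sketch of the slide's table only: the string values "preferred term", "admitted term" and "symbol" are taken from the slide, while the function name, the skos: prefix notation and the decision to return None for unmapped cases (including hiddenLabel, which the slide ties to authorization codes rather than to a termType) are assumptions.

```python
def skos_label_property(term_types):
    """Map the termType values of a TBX term to a SKOS Core label property.

    Returns None when the slide lists no direct counterpart.
    """
    types = set(term_types)
    is_symbol = "symbol" in types
    if "preferred term" in types:
        return "skos:prefSymbol" if is_symbol else "skos:prefLabel"
    if "admitted term" in types:
        return "skos:altSymbol" if is_symbol else "skos:altLabel"
    return None


print(skos_label_property(["preferred term"]))
print(skos_label_property(["admitted term", "symbol"]))
```

A crosswalk in the other direction (SKOS to TBX) would need the same table inverted, plus a decision on how to handle labels that TBX cannot express directly.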

Page 66

Hierarchical Relations

• hasTopConcept
  – / topTerm /
  – hasTopConcept points to a URI which contains the top concept; we could choose to use this methodology.
  – topConcept has been deprecated as a vocabulary item.
• broader
  – / broader term / (as a pointer to a thesaurus descriptor)
  – / superordinate term generic / (terminological concept system)
• narrower (hasNarrower)
  – / narrower term / (as a pointer to a thesaurus descriptor)
  – / narrower concept generic / (terminological concept system)

Page 67

General Relations

• related
  – / related term / (thesaurus pointer)
  – / related concept / (terminological concept system)
• semanticRelation
  – Missing example in the Vocabulary document
  – How does a semantic relation differ (if it does) from other conceptual relations?

Page 68

Concept Description

• definition – / definition /

• example – / example /

Page 69

Notes

• changeNote
  – / admin type=modification note /
  – The relation between Note and "change" is determined by the position of the note embedded in an <adminGroup> of type=modification.
  – A note about a modification to a concept, not to an entry.
• editorialNote
  – / adminNote /
  – A note concerning the administration of a KOS resource
• historyNote
  – / termProvenance /
• privateNote
  – / note / + authorization levels
• publicNote
  – / note / + authorization levels
• scopeNote

Page 70

Thank you for your attention

Acknowledgements:
• Slides 5-28: together with Alan Melby, Sue Ellen Wright
• Slides 29-32: Bruce Bargmeyer
• Slide 35: WordNet
• Slides 37-42: diff. sources
• Slide 43: ThesShow Legat/Stallbaumer
• Slide 44: GEMET
• Slide 45: Bandholtz
• Slides 46/47: Gangemi
• Slide 48: Wright
• Slides 56-58: Miles/SKOS
• Slides 60-69: together with Wright/Melby

Gerhard Budin
2006-05-18