In pursuit of interoperability: Can we standardize mapping types?
Stella G Dextre Clarke, Project Leader, ISO NP 25964
Overview
  Compare mapping types used in some well-known projects: MACS; CrissCross; RENARDUS; KoMoHe
  and in Doerr’s well-cited paper on semantic problems of thesaurus mapping
  and in 3 standards: BS 8723-4, SKOS and the forthcoming ISO 25964-2
  Ask how feasible it is to achieve standardization
MACS Project
  Context: enabling multilingual access to collections indexed with different vocabularies
  Vocabularies are all subject heading schemes
  All mappings are considered equivalence
  Equivalence can be simple or compound
  Two types of compound equivalence:
    Heading A = Heading B OR Heading C
    Heading A = Heading B AND Heading C
CrissCross Project
  Context: improving access to vocabularies and heterogeneously indexed collections (in one natural language)
  One-way mappings
  From a subject headings scheme to a classification scheme
  Many mappings from one keyword
  “Degrees of determinacy” rather than distinct mapping types – D1, D2, D3, D4
RENARDUS Project
  Context: search/browse across gateways using different classification schemes
  One-way mappings, from DDC to local schemes
  Five mapping types:
    fully equivalent
    broader or narrower equivalent
    major or minor overlap
GESIS/KoMoHe
  Context: distributed search across systems using 25 different vocabularies (thesauri and classification schemes)
  (Separate) mappings in both directions
  Three basic mapping types:
    Equivalence
    Hierarchical
    Associative
  Also there is an explicit “null relationship”
  Any mapping can be one-to-one or one-to-many
  Every mapping can have a “relevance rating” of high, medium or low.
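The KoMoHe features listed above can be pictured as a single mapping record. The following is only a minimal sketch: the class names, field names and placeholder terms are assumptions made for illustration, not the project's actual data format, and whether a null relationship carries a relevance rating is left open here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class MappingType(Enum):
    EQUIVALENCE = "equivalence"
    HIERARCHICAL = "hierarchical"
    ASSOCIATIVE = "associative"
    NULL = "null"  # explicit statement that no counterpart exists


class Relevance(Enum):
    HIGH = 3
    MEDIUM = 2
    LOW = 1


@dataclass(frozen=True)
class Mapping:
    source: str
    targets: Tuple[str, ...]  # one target, or several for one-to-many
    kind: MappingType
    relevance: Optional[Relevance] = None  # optional rating (assumption)

    def __post_init__(self):
        # A null relationship records the absence of any target term.
        if self.kind is MappingType.NULL and self.targets:
            raise ValueError("a null mapping cannot have targets")
        if self.kind is not MappingType.NULL and not self.targets:
            raise ValueError("a non-null mapping needs at least one target")


def at_least(mappings, threshold):
    """Keep only mappings rated at or above the given relevance."""
    return [
        m for m in mappings
        if m.relevance is not None and m.relevance.value >= threshold.value
    ]


# Placeholder terms, invented for the example.
mappings = [
    Mapping("Term A", ("Term X",), MappingType.EQUIVALENCE, Relevance.HIGH),
    Mapping("Term B", ("Term Y", "Term Z"), MappingType.ASSOCIATIVE, Relevance.LOW),
    Mapping("Term C", (), MappingType.NULL),
]
print([m.source for m in at_least(mappings, Relevance.MEDIUM)])
```

Because mappings are kept separately in each direction, a record like this describes one direction only; the reverse direction would need its own records.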
Doerr’s findings (see http://journals.tdl.org/jodi/article/view/31/32)
  Context: query transformation is assumed to be the main application of mappings
  All the vocabularies discussed are thesauri, applied to documents and/or museum collections
  Basic types of mapping are:
    exact equivalence
    inexact equivalence
    broader equivalence
    narrower equivalence
  Exact, broader and narrower equivalence can be simple or compound
  Compound equivalence means a Boolean expression of target terms using AND, OR or NOT (but in practice no examples are given using NOT).
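To make the idea of a compound equivalence as a Boolean expression concrete, here is a small sketch of query transformation. It is not taken from Doerr's paper: the expression classes, the mapping table and the placeholder term labels are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Term:
    label: str


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Not:
    operand: "Expr"


Expr = Union[Term, And, Or, Not]


def to_query(expr):
    """Render an expression over target terms as a Boolean query string."""
    if isinstance(expr, Term):
        return f'"{expr.label}"'
    if isinstance(expr, And):
        return f"({to_query(expr.left)} AND {to_query(expr.right)})"
    if isinstance(expr, Or):
        return f"({to_query(expr.left)} OR {to_query(expr.right)})"
    if isinstance(expr, Not):
        return f"(NOT {to_query(expr.operand)})"
    raise TypeError(f"unsupported expression: {expr!r}")


def matches(expr, index_terms):
    """Test the expression against the target terms assigned to one item."""
    if isinstance(expr, Term):
        return expr.label in index_terms
    if isinstance(expr, And):
        return matches(expr.left, index_terms) and matches(expr.right, index_terms)
    if isinstance(expr, Or):
        return matches(expr.left, index_terms) or matches(expr.right, index_terms)
    if isinstance(expr, Not):
        return not matches(expr.operand, index_terms)
    raise TypeError(f"unsupported expression: {expr!r}")


# Hypothetical mapping table: source term -> expression over target terms.
mapping_table = {
    "Source term A": And(Term("Target term B"), Term("Target term C")),
    "Source term D": Or(Term("Target term E"), Term("Target term F")),
}


def transform_query(source_term):
    """Rewrite a one-term query in the source vocabulary for the target."""
    return to_query(mapping_table[source_term])


print(transform_query("Source term A"))
```

A simple equivalence is just the degenerate case of a single `Term`, so the same table can hold simple and compound mappings side by side.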
BS 8723-4
  Provides for mapping search terms or index terms
  Emphasis on thesauri, although other vocabulary types are taken into account
  Basic mapping types: equivalence; hierarchical; associative
  Hierarchical subdivides into broader/narrower
  Equivalence subdivides into simple/compound
  Degrees of equivalence (such as exact, inexact, partial) are discussed but not formalised as distinct types other than those described above.
SKOS (Simple Knowledge Organization System) data model
  Context is sharing/linking KOSs via the Web
  SKOS development began with thesauri, but has extended to classification schemes, subject heading schemes, etc.
  Basic mapping “properties” (skos:mappingRelation):
    skos:closeMatch (symmetric)
    skos:exactMatch (symmetric, transitive)
    skos:relatedMatch (symmetric)
    skos:broadMatch (inverse of skos:narrowMatch)
    skos:narrowMatch (inverse of skos:broadMatch)
  No provision for compound mappings
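The property characteristics listed for SKOS (symmetric, transitive, inverse) determine which extra mapping statements follow from the ones that are asserted. The sketch below works on plain Python tuples rather than an RDF library; the `ex:` concept identifiers are invented for the example, and reflexive statements are deliberately left out to keep the output small.

```python
SYMMETRIC = {"skos:closeMatch", "skos:exactMatch", "skos:relatedMatch"}
INVERSE = {
    "skos:broadMatch": "skos:narrowMatch",
    "skos:narrowMatch": "skos:broadMatch",
}
TRANSITIVE = {"skos:exactMatch"}


def infer(triples):
    """Return the asserted triples plus those implied by the rules above."""
    closed = set(triples)
    while True:
        new = set()
        for s, p, o in closed:
            if p in SYMMETRIC:
                new.add((o, p, s))
            if p in INVERSE:
                new.add((o, INVERSE[p], s))
        for s1, p1, o1 in closed:
            if p1 not in TRANSITIVE:
                continue
            for s2, p2, o2 in closed:
                # Chain s1 -> o1 == s2 -> o2, skipping reflexive results.
                if p2 == p1 and s2 == o1 and s1 != o2:
                    new.add((s1, p1, o2))
        if new <= closed:  # nothing further to add: fixed point reached
            return closed
        closed |= new


asserted = {
    ("ex:a", "skos:exactMatch", "ex:b"),
    ("ex:b", "skos:exactMatch", "ex:c"),
    ("ex:streets", "skos:broadMatch", "ex:roads"),
}
for triple in sorted(infer(asserted)):
    print(triple)
```

Note that skos:closeMatch is symmetric but not transitive, so a chain of close matches does not yield a new mapping in this sketch, whereas a chain of exact matches does.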
ISO 25964-2 (still in draft)
  A revision of ISO 2788 and ISO 5964 as well as BS 8723
  Provides for mapping search terms or index terms
  Emphasis on thesauri, although other vocabulary types are taken into account
  Basic mapping types:
    Equivalence: Laptop computers EQ Notebook computers
    Hierarchical: Roads NM Streets; Streets BM Roads
    Associative: Journals RM Magazines
  “Inexact” can apply to any mapping, but most commonly to equivalence:
    Horticulture ~EQ Gardening
ISO 25964-2 mapping types in more detail
  Basic mapping types:
    Equivalence
      Simple
      Compound
        Intersecting compound equivalence
        Cumulative compound equivalence
    Hierarchical
      Broader
      Narrower
    Associative
  “Inexact” can apply to any mapping, but most commonly to equivalence, including compound equivalence
ISO 25964-2 equivalence mappings in more detail
  Simple
    Laptop computers EQ Notebook computers
  Compound
    Intersecting compound equivalence
      Women executives EQ Women + Executives
    Cumulative compound equivalence
      Inland waterways EQ rivers | canals
Intersecting versus cumulative equivalence
  Women executives EQ Women + Executives
  Inland waterways EQ rivers | canals
  [Figure: two diagrams, one relating “executives”, “women” and “women executives”, the other relating “canals”, “inland waterways” and “rivers”]
Some key messages re compound equivalence
  If you use mappings for conversion of index terms, you implement intersecting equivalents quite differently from cumulative equivalents.
  With simple equivalence (exact or inexact) and with hierarchical or associative mappings, two-way conversions are usually OK; but compound equivalence typically works in one direction only.
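One way to picture the difference when converting index terms is sketched below. The handling chosen here is an assumption for illustration, not guidance quoted from the draft standard: an intersecting equivalent is converted by assigning all of its target terms, while a cumulative equivalent is set aside for review, because the source term alone does not say which of the alternatives applies. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CompoundMapping:
    source: str
    targets: Tuple[str, ...]
    kind: str  # "intersecting" (written with +) or "cumulative" (written with |)


def convert_index_terms(source_terms, mappings):
    """Return (converted target terms, source terms needing manual review)."""
    by_source = {m.source: m for m in mappings}
    converted, review = set(), []
    for term in source_terms:
        mapping = by_source.get(term)
        if mapping is None:
            review.append(term)  # no mapping recorded for this term
        elif mapping.kind == "intersecting":
            # The source concept is the overlap, so every target term applies.
            converted.update(mapping.targets)
        elif mapping.kind == "cumulative":
            # The source concept covers several alternatives; which one fits
            # this particular item cannot be decided from the term alone.
            review.append(term)
        else:
            raise ValueError(f"unknown mapping kind: {mapping.kind!r}")
    return converted, review


mappings = [
    CompoundMapping("Women executives", ("Women", "Executives"), "intersecting"),
    CompoundMapping("Inland waterways", ("rivers", "canals"), "cumulative"),
]
converted, review = convert_index_terms(
    ["Women executives", "Inland waterways"], mappings
)
print(sorted(converted), review)
```

The one-directional nature of compound equivalence also shows up here: an item indexed only with “Women” in the target vocabulary gives no grounds for assigning “Women executives” in the source vocabulary, so the table cannot simply be read backwards.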
Inexact: another complication for equivalence mappings
  Simple
    Laptop computers EQ Notebook computers
  Compound
    Intersecting compound equivalence
      Women executives EQ Women + Executives
    Cumulative compound equivalence
      Inland waterways EQ rivers | canals
  Inexact simple equivalence
    Lawns ~EQ Turf
  Inexact compound equivalence
    Women executives ~EQ Females + Managers
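The notation used in these examples (EQ, BM, NM, RM, the ~ prefix for inexact, + for intersecting and | for cumulative) is regular enough to parse mechanically. The parser below is a rough sketch based only on the examples shown in these slides; the grammar, the class name and the field names are assumptions, and it does not attempt to cover whatever further syntax the standard may define.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# <source> [~]<EQ|BM|NM|RM> <target> [(+ or |) <target> ...]
PATTERN = re.compile(
    r"^(?P<source>.+?)\s+(?P<inexact>~)?(?P<rel>EQ|BM|NM|RM)\s+(?P<targets>.+)$"
)


@dataclass(frozen=True)
class ParsedMapping:
    source: str
    relation: str        # "EQ", "BM", "NM" or "RM"
    inexact: bool        # True when the relation is prefixed with ~
    targets: Tuple[str, ...]
    combination: str     # "simple", "intersecting" or "cumulative"


def parse_mapping(text):
    match = PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised mapping statement: {text!r}")
    raw = match.group("targets")
    if "+" in raw and "|" in raw:
        raise ValueError("mixing + and | is not handled by this sketch")
    if "+" in raw:
        combination, separator = "intersecting", "+"
    elif "|" in raw:
        combination, separator = "cumulative", "|"
    else:
        combination, separator = "simple", None
    if separator is None:
        targets = (raw.strip(),)
    else:
        targets = tuple(part.strip() for part in raw.split(separator))
    return ParsedMapping(
        source=match.group("source").strip(),
        relation=match.group("rel"),
        inexact=match.group("inexact") is not None,
        targets=targets,
        combination=combination,
    )


print(parse_mapping("Women executives ~EQ Females + Managers"))
print(parse_mapping("Inland waterways EQ rivers | canals"))
print(parse_mapping("Lawns ~EQ Turf"))
```

Keeping “inexact” as a separate flag rather than a separate relation type mirrors the point made above that inexactness can apply to any mapping, compound equivalence included.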
Major/minor overlap: yet another complication
  Found useful in Renardus project
  Is there a parallel with the KoMoHe “relevance rating”?
  Earlier versions of SKOS allowed “majorMatch” and “minorMatch”; these were subsequently deprecated
  It would apply to inexact equivalence; maybe also to hierarchical and associative mappings?
  How would you judge it in cases of compound equivalence?
  A recent draft of ISO 25964 admits major/minor as an optional attribute of inexact equivalence, in the context of a particular application.
Now we come to the crunch: Can we standardize these mapping types?
  We can certainly write them in a standards document, but can we make them stick?
  Will real users implement them according to the guidance rules in the standard?
To make a standard stick:
  Keep it simple
  Address a real need
  Adopt rules that are already broadly accepted in the user community
  Keep it within the implementation range of available software
  Make the standard available easily and free – or at least at a low price
  Commit to lifelong maintenance
Want a copy of ISO 25964-2?
  A draft is due to appear in January 2011, “ISO DIS 25964-2”, with the hope of attracting comments from potential users
  The official way to get it is through your national standards body (e.g. DIN)
  Distribution policies vary from one country to another; last time round we found a way to make the draft available online free of charge and free of passwords, on the BSI site.
  Send me an email and I’ll alert you when the DIS is released. [email protected]
References (abbreviated)
  MACS: Landry, Patrice. Multilingual subject access: the linking approach of MACS. Cataloging & Classification Quarterly. 2004; 37(3/4):177-191
  CrissCross: http://linux2.fbi.fh-koeln.de/crisscross/swd-ddc-mapping_en.html
  RENARDUS: http://www.mpdl.mpg.de/staff/tkoch/publ/preifla-final.html
  KoMoHe: http://www.gesis.org/en/research/programs-and-projects/knowledge-technologies/project-overview/komohe/
  Doerr: http://journals.tdl.org/jodi/article/view/31/32
  SKOS: http://www.w3.org/TR/skos-reference/
  BS 8723-4:2007 Structured vocabularies for information retrieval - Guide - Interoperability between vocabularies. British Standards Institution
  ISO 25964-2 (still in draft). Thesauri and interoperability with other vocabularies – Part 2: Interoperability with other vocabularies