Advanced Information Systems Laboratory GeoSpatiumLab...

Advanced Information Systems Laboratory GeoSpatiumLab S.L. ThManager University of Zaragoza Computer Science and Systems Engineering Department Advanced Information Systems Laboratory (IA3) GeoSpatiumLab S.L.


  • Advanced Information Systems Laboratory, GeoSpatiumLab S.L.


    University of Zaragoza, Computer Science and Systems Engineering Department, Advanced Information Systems Laboratory (IA3)

    GeoSpatiumLab S.L.

  • Outline


  • Introduction to thesauri

    “A thesaurus is a set of terms that describe the vocabulary of a controlled indexing language, formally organized so that the a priori relationships between concepts (for example synonymous terms, broader terms, narrower terms and related terms) are made explicit” [ISO 2788]

    Used to improve the precision and recall of information retrieval in digital libraries:

    provide a uniform and consistent vocabulary for indexing metadata (“description of the data holdings”)

    supply users with a suitable vocabulary for retrieval

    expand user queries by automatically adding new terms to the query
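    The query expansion mentioned above can be illustrated with a minimal sketch. The toy thesaurus, its terms and the function name expand_query are assumptions made for illustration; the slides do not describe how a particular retrieval system performs the expansion.

```python
# Toy thesaurus: every term and relation here is invented for illustration.
THESAURUS = {
    "river": {
        "synonyms": ["stream"],
        "narrower": ["tributary"],
        "broader": ["watercourse"],
    },
}


def expand_query(terms, thesaurus, relations=("synonyms", "narrower")):
    """Return the query terms plus thesaurus terms reached through `relations`."""
    expanded = []
    seen = set()
    for term in terms:
        key = term.lower()
        candidates = [key]
        entry = thesaurus.get(key, {})
        for relation in relations:
            candidates.extend(entry.get(relation, []))
        # Keep the first occurrence of each term and preserve order.
        for candidate in candidates:
            if candidate not in seen:
                seen.add(candidate)
                expanded.append(candidate)
    return expanded


expanded_terms = expand_query(["River", "map"], THESAURUS)
```

    Terms that are not in the thesaurus pass through unchanged, so the expansion never removes anything from the original query.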

  • ThManager

    ThManager facilitates the management of knowledge organization systems: thesauri and other types of controlled vocabularies, such as taxonomies or classification schemes

    In particular, it facilitates the creation and visualization of SKOS RDF vocabularies

    SKOS is a W3C initiative for the representation of knowledge organization systems using the Resource Description Framework (RDF)

    [Figure: SKOS model. Classes: ConceptScheme, Concept. Properties shown: rdf:label, skos:prefLabel, skos:altLabel, skos:scopeNote, skos:broader, skos:narrower, skos:related, skos:definition, skos:hasTopConcept, skos:example, skos:inScheme, skos:symbol (dcmiType:image), skos:prefSymbol, skos:altSymbol. Dublin Core Model: dc:title, dc:publisher, ...]
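    To make the SKOS model more concrete, the following sketch represents a concept scheme and its concepts as plain Python objects. The class layout, field names and example.org URIs are assumptions for illustration only; this is not ThManager's internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str
    pref_label: dict = field(default_factory=dict)  # language -> skos:prefLabel
    alt_labels: dict = field(default_factory=dict)  # language -> list of skos:altLabel
    broader: set = field(default_factory=set)       # URIs of skos:broader concepts
    narrower: set = field(default_factory=set)      # URIs of skos:narrower concepts
    related: set = field(default_factory=set)       # URIs of skos:related concepts


@dataclass
class ConceptScheme:
    uri: str
    title: str                                      # dc:title
    concepts: dict = field(default_factory=dict)    # URI -> Concept (skos:inScheme)
    top_concepts: set = field(default_factory=set)  # skos:hasTopConcept

    def add(self, concept, top=False):
        """Register a concept in the scheme, optionally as a top concept."""
        self.concepts[concept.uri] = concept
        if top:
            self.top_concepts.add(concept.uri)
        return concept

    def link_broader(self, narrower_uri, broader_uri):
        """Record both directions of a hierarchical relation."""
        self.concepts[narrower_uri].broader.add(broader_uri)
        self.concepts[broader_uri].narrower.add(narrower_uri)


# Hypothetical example data.
scheme = ConceptScheme("http://example.org/scheme", "Example thesaurus")
scheme.add(Concept("http://example.org/water", {"en": "water", "es": "agua"}), top=True)
scheme.add(Concept("http://example.org/river", {"en": "river", "es": "río"}))
scheme.link_broader("http://example.org/river", "http://example.org/water")
```

    Storing skos:broader and skos:narrower together keeps the hierarchy consistent in both directions, which is what a tree view of the concepts relies on.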

  • General features

    Distributed as an Open Source tool

    Developed in Java: multi-platform (Windows, Unix)

    Storage of metadata and thesauri is managed directly through the file system

    Multilingual: Java internationalization methodology

    Currently: Spanish and English (with a procedure to support new languages)

  • Capabilities

    Repository of available thesauri

    Description of thesauri by means of metadata

    Browsing of thesaurus content

    Edition of thesaurus content

    Exchange of thesauri according to SKOS format

    Interconnection of thesauri through WordNet lexical database

  • Repository of available thesauri

    Main window of the application: browser of available thesauri in the local repository

    Allowed operations:

    Selection of thesauri for subsequent operations (browse content, export, delete, …)

    Sorting/filtering of thesauri according to descriptor values (columns)

  • Description of thesauri by means of metadata

    Each thesaurus is described by means of a metadata application profile of Dublin Core

    Metadata can be either visualized in HTML or edited through a form

  • Browsing of thesaurus content

    It allows the browsing of terms with different viewers (language sensitive)

    Hierarchical viewer: a tree showing the hierarchical structure of thesaurus concepts

    Alphabetic viewer: a list of concepts alphabetically ordered in the selected language

    Search tool: the search is based on preferred labels and allows the following criteria: “equals”, “starts with” and “contains”

    For each selected concept:

    It shows all the properties

    It allows navigation to the related concepts by means of hyperlinks

  • Hierarchical viewer

    a tree showing the hierarchical structure of thesaurus concepts

  • Alphabetic viewer

    list of concepts alphabetically ordered in the selected language

  • Search tool

    The search is based on preferred labels and allows the following criteria: “equals”, “starts with” and “contains”
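    The three search criteria can be sketched as follows. The function name, the data layout and the case-insensitive comparison are assumptions for illustration; the slides do not state how ThManager compares labels.

```python
# Hypothetical preferred labels: concept id -> {language: preferred label}.
PREF_LABELS = {
    "ex:water": {"en": "water", "es": "agua"},
    "ex:waterfall": {"en": "waterfall", "es": "cascada"},
    "ex:freshwater": {"en": "fresh water", "es": "agua dulce"},
}

MATCHERS = {
    "equals": lambda label, query: label == query,
    "starts with": lambda label, query: label.startswith(query),
    "contains": lambda label, query: query in label,
}


def search_concepts(pref_labels, text, criterion="contains", language="en"):
    """Return the sorted ids of concepts whose preferred label matches `text`."""
    if criterion not in MATCHERS:
        raise ValueError(f"unknown criterion: {criterion!r}")
    matches = MATCHERS[criterion]
    query = text.casefold()  # assumption: matching ignores case
    hits = []
    for concept_id, labels in pref_labels.items():
        label = labels.get(language)
        if label is not None and matches(label.casefold(), query):
            hits.append(concept_id)
    return sorted(hits)


hits = search_concepts(PREF_LABELS, "water", "starts with")
```

    Because only the label in the selected language is inspected, the same query can return different concepts in Spanish and in English, which matches the language-sensitive behaviour described for the viewers.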

  • Edition of thesaurus content

    The tool provides an editing interface to modify the content of a thesaurus:

    creation of concepts

    deletion of concepts

    editing of properties and relations

    broader and narrower relations to define a hierarchical structure of concepts

    mark concepts as top concepts

    o broader concept of a micro-thesaurus

    o or concepts in a plain list

    preferred label, alternative label, definition and scope note as multilingual properties

    o structure: property type + language + value

    notation properties

    o useful for creating classification schemes that provide multiple coding of terms

    o example: the ISO-639 list of languages has 2-letter and 3-letter codes

    o structure: type (URI) + value
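    The two property structures listed above (property type + language + value, and type URI + value) can be sketched as small record types. The class names and the example.org type URIs are hypothetical; only the ISO 639 codes for Spanish ("es" and "spa") are real values.

```python
from typing import NamedTuple


class MultilingualProperty(NamedTuple):
    prop_type: str  # e.g. "prefLabel", "altLabel", "definition", "scopeNote"
    language: str
    value: str


class Notation(NamedTuple):
    type_uri: str   # identifies the coding system
    value: str


# Hypothetical URIs naming the two ISO-639 codings.
ISO639_2_LETTER = "http://example.org/notation/iso639-2-letter"
ISO639_3_LETTER = "http://example.org/notation/iso639-3-letter"

spanish_properties = [
    MultilingualProperty("prefLabel", "en", "Spanish"),
    MultilingualProperty("prefLabel", "es", "español"),
]
spanish_notations = [
    Notation(ISO639_2_LETTER, "es"),
    Notation(ISO639_3_LETTER, "spa"),
]


def notation_for(notations, type_uri):
    """Return the code of the given type, or None if the concept has none."""
    for notation in notations:
        if notation.type_uri == type_uri:
            return notation.value
    return None
```

    Keying each notation by a type URI lets one concept carry several codes at once without ambiguity about which coding system a value belongs to.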

  • Edition of thesaurus content

  • Exchange of thesauri

    Exchange of thesauri according to SKOS format

    Import/export operations include metadata describing each thesaurus
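    As an illustration of what a SKOS exchange document looks like, the sketch below serializes one concept as RDF/XML using only the Python standard library. The function name and the example.org URIs are assumptions, and the exact serialization that ThManager writes is not described in the slides.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def concept_to_rdfxml(uri, pref_labels, broader=()):
    """Serialize one SKOS concept (labels and broader links) as RDF/XML text."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("skos", SKOS_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    concept = ET.SubElement(
        root, f"{{{SKOS_NS}}}Concept", {f"{{{RDF_NS}}}about": uri}
    )
    # One skos:prefLabel element per language, tagged with xml:lang.
    for language, label in sorted(pref_labels.items()):
        element = ET.SubElement(
            concept, f"{{{SKOS_NS}}}prefLabel", {f"{{{XML_NS}}}lang": language}
        )
        element.text = label
    for broader_uri in broader:
        ET.SubElement(
            concept, f"{{{SKOS_NS}}}broader", {f"{{{RDF_NS}}}resource": broader_uri}
        )
    return ET.tostring(root, encoding="unicode")


# Hypothetical concept used only to exercise the function.
xml_text = concept_to_rdfxml(
    "http://example.org/river",
    {"en": "river", "es": "río"},
    broader=["http://example.org/water"],
)
```

    A full export would also carry the thesaurus metadata (the Dublin Core description) alongside the concepts, as the slide notes.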

  • Interconnection of thesauri through WordNet lexical database

    Thesauri are intended for the homogeneous classification of resources

    They are used to fill metadata keywords

    However, there is still heterogeneity in metadata keywords

    Metadata creators use different thesauri in different application domains

    If metadata catalogs provide access to the general public, queries may not contain the same terms as the keywords in metadata records

    A possible solution to bridge the semantic gap: interconnection of thesauri through a general purpose lexical ontology

  • Extraction of related concepts in Wordnet

    ThManager generates an automatic mapping of thesaurus concepts against the concepts of the WordNet lexical database

    This functionality is activated through the import dialog

    [Figure labels: Other knowledge representation; Thesaurus 1, Thesaurus 2, Thesaurus N; Controlled list 1, Controlled list 2, Controlled list N; WordNet]
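    The idea of interconnecting vocabularies through a shared lexical resource can be sketched with a toy example. Everything here is an assumption: the slides do not describe the mapping algorithm ThManager uses, so this sketch swaps in naive exact label matching, and the synset ids and lemmas are invented stand-ins for WordNet.

```python
# Invented stand-in for WordNet: synset id -> lemmas.
TOY_WORDNET = {
    "synset-river": ["river"],
    "synset-stream": ["stream", "watercourse"],
    "synset-forest": ["forest", "wood", "woods"],
}


def map_to_synsets(concept_labels, wordnet):
    """Map each concept to the synsets that have a lemma equal to one of its labels."""
    lemma_index = {}
    for synset_id, lemmas in wordnet.items():
        for lemma in lemmas:
            lemma_index.setdefault(lemma.casefold(), set()).add(synset_id)
    mapping = {}
    for concept_id, labels in concept_labels.items():
        synsets = set()
        for label in labels:
            synsets |= lemma_index.get(label.casefold(), set())
        mapping[concept_id] = synsets
    return mapping


def linked_concepts(mapping_a, mapping_b):
    """Pair concepts from two vocabularies that share at least one synset."""
    return sorted(
        (a, b)
        for a, synsets_a in mapping_a.items()
        for b, synsets_b in mapping_b.items()
        if synsets_a & synsets_b
    )


# Two hypothetical thesauri that use different terms for the same idea.
thesaurus_1 = {"t1:woodland": ["Woods", "woodland"], "t1:river": ["River"]}
thesaurus_2 = {"t2:forest": ["forest"], "t2:lake": ["lake"]}

links = linked_concepts(
    map_to_synsets(thesaurus_1, TOY_WORDNET),
    map_to_synsets(thesaurus_2, TOY_WORDNET),
)
```

    In this toy data the two thesauri share no label, yet "t1:woodland" and "t2:forest" are connected because both map to the same synset. A real mapping would also have to handle ambiguous words that belong to several synsets.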

  • Extraction of related concepts in Wordnet

  • Conclusions

    ThManager is a flexible tool to manage thesauri

    It provides enhanced functionality for the improvement of classifications

    Tested with well-known thesauri:

    EEA - GEMET (General Multilingual Environmental Thesaurus), FAO - AGROVOC, UNESCO Thesaurus, European Commission - EUROVOC

    This tool can be easily integrated in other tools

    It is integrated within CatMDEdit to select the appropriate terms for metadata elements

    Accessible as a Web Service (Web Ontology Service) for integration within Web applications that require selection of controlled vocabularies
