E-Meld Workshop on Digitization of lexical Information 3-5 August 2002, EMU, Ypsilanti Working Group on Lexicon Macrostructures Chairman’s Report Dafydd Gibbon

E-Meld Workshop on Digitization of lexical Information 3-5 August 2002, EMU, Ypsilanti

Working Group on Lexicon Macrostructures

Chairman’s Report

Dafydd Gibbon

Definitions

The macrostructure of a lexicon is the arrangement of lexical entries in the lexicon (extended meaning includes front matter, mesostructure, …)

Declarative determining factors:
- microstructure (arrangement of types of lexical information)
- mesostructure (arrangement of generalisations)

Procedural/operational determining factors:
- medium: print, electronic, multimodal + multimedia channels
- consultation, navigation: onomasiological, semasiological, general search

Main points discussed

Types of lexicon in OLAC linguistic type vocabulary: dictionary, wordlist, wordnet, thesaurus, terminology, proper nouns, bilingual, etymological, phonetic, frequency, analytical, PLUS concordance, glossary, multilingual, encyclopaedic, help text index, thesaurus index, …

Granularity of linguistic type hierarchy (additional levels beyond Dublin Core) and complexity of lexicon type

Factorization of common subtypes out of hierarchy (not only for lexicon type)

Heterogeneity of types (structural, subject and functional types)

Structure criteria (3rd level subtypes)

Semasiological:
- dictionary (complex microstructure)
- wordlist (glossed; comparative; …)
- glossary (with definitions)
- terminology (ISO (non-)conformant)
- concordance

Onomasiological:
- wordnet
- thesaurus
- encyclopaedia

… index (help, thesaurus…), catalogue?

Formats, media

Format + Medium: Mime-types, modalities, …

- Print format
- Database format
- Word-processor format
- Hypertext
- System component (e.g. for spell checker, dictation)
- Multimedia (digitized signals: audio, photos, video, …)
- XML + stylesheets, XSLT mappings …

Question: are there lexicon-specific formats which are not covered in the OLAC format type?

Subject, content criteria

Specialized lexica based on subject.linguistic types of lexical information:

Domain: fish, work, …

Linguistic levels of description and categories: phonetic/pronunciation, verb, proper name …

Rank: idiom, (un-)inflected word, stem, morpheme, …

Other: frequency, etymological/historical, translation, bilingual, multilingual, …

User criteria (construction and/or consultation)

Non-linguist (L1 speaker, L2 speaker, …)

Research linguist (field, theoretical, …)

Computational linguist (machine learning from corpora, inheritance lexica, …)

Language and/or speech system developer (currently several such projects for minority languages)

Recommendations

Consider revising OLAC linguistic type controlled vocabulary to factor out linguistic levels as a common parameter.

Consider using actual linguistic genres such as “sketch grammar”, “field notes”, “domain lexicon”.

Consider cross-classifying a low granularity type vocabulary with format, content and user types.

Definitely provide improved definitions of lexicon types.

Definitely point to examples of existing lexica of a given lexicon genre to help users.
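
The cross-classification recommendation above can be illustrated with a minimal sketch. It assumes a hypothetical record type, `LexiconDescription`, in which a coarse lexicon type is kept separate from format, linguistic level and user type, so that each parameter can be queried independently. The field names, the sample catalogue and the `select` helper are illustrative assumptions, not part of the OLAC vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconDescription:
    """Hypothetical record: one coarse type plus independent parameters."""
    lexicon_type: str  # low-granularity type, e.g. "dictionary", "wordlist"
    fmt: str           # format/medium, e.g. "print", "database", "XML"
    level: str         # linguistic level or category, e.g. "phonetic"
    user: str          # intended user, e.g. "non-linguist"


def select(descriptions, **criteria):
    """Return the descriptions whose fields equal every given criterion."""
    return [
        d for d in descriptions
        if all(getattr(d, field) == value for field, value in criteria.items())
    ]


# Invented sample data, for illustration only.
catalogue = [
    LexiconDescription("dictionary", "print", "phonetic", "non-linguist"),
    LexiconDescription("wordlist", "database", "phonetic", "research linguist"),
    LexiconDescription("dictionary", "XML", "proper name", "research linguist"),
]

# Because "level" is factored out, one query spans several lexicon types.
phonetic_resources = select(catalogue, level="phonetic")
```

In this arrangement a pronunciation dictionary and a phonetic wordlist are found by the same query on `level`, rather than requiring a separate subtype under each lexicon type.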

Some remaining questions

Specific points to address:

- Is the OLAC list of lexicon types comprehensive enough?
- Is a taxonomy of lexicon types adequate, or must we parametrise?
- Which sub-attributes are needed from the relevant components?

And of course a very basic question:

Can all macrostructures be derived formally, i.e. automatically, from a generic declarative macrostructure (like views/indexings of a database) with appropriate microstructure and mesostructure?
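
The database-view analogy in this question can be made concrete with a small sketch. It does not answer the question; it only shows, under simplifying assumptions, how a semasiological and an onomasiological arrangement could both be computed from one unordered set of entries. The entry fields (`headword`, `pos`, `concept`), the sample data and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical entries: each dict stands for one microstructure.
entries = [
    {"headword": "trout", "pos": "noun", "concept": "fish"},
    {"headword": "salmon", "pos": "noun", "concept": "fish"},
    {"headword": "angle", "pos": "verb", "concept": "work"},
]


def semasiological_view(entries):
    """Form-to-meaning arrangement: entries ordered by headword."""
    return sorted(entries, key=lambda e: e["headword"])


def onomasiological_view(entries):
    """Meaning-to-form arrangement: headwords grouped under each concept."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry["concept"]].append(entry["headword"])
    return {concept: sorted(words) for concept, words in sorted(groups.items())}


alphabetical = [e["headword"] for e in semasiological_view(entries)]
by_concept = onomasiological_view(entries)
```

Both arrangements are pure functions of the same entries, which is the sense in which a macrostructure would be a derived view rather than a stored ordering.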

Working Group participants

Helen Aristar-Dry, Dafydd Gibbon, Veronica Grondona, Michael Maxwell, David Weber, Jeff …

… oops, sorry - I forgot to make a participant list