Towards portability and interoperability for linguistic
annotation and language-specific ontologies
Robert Munro & David Nathan
Endangered Languages Archive, School of Oriental and African Studies
Outline
1. Introduction and motivation
2. Linguistic ontologies and markups
3. Representing knowledge
4. Supporting fieldworkers
5. Supporting speakers
6. Conclusions
1. Introduction and motivation
Introduction
The main goal of this paper: to examine how well GOLD (the General Ontology for Linguistic Description) meets the requirements of portability for language documentation and description (Bird & Simons, 2003)
Road-testing: GOLD's ability to meet the needs of archive users and contributors
Motivation
The Endangered Languages Archive (ELAR) is part of the Hans Rausing Endangered Languages Project (HRELP)
HRELP supports:
- the archive
- grants for documentation projects
- postgraduate programs focussing on language documentation
Motivation
We (ELAR):
- support a digital archive (preserve data and provide access to it)
We also train students and grantees in:
- markup strategies
- data management strategies
- multimedia development
- choice of recording equipment
Motivation
There is concern that:
- cataloguing metadata (IMDI / OLAC) has not yet been sufficiently extended (Nathan and Austin, 2004)
- rich linguistic and contextual information is not being recorded in well-formed portable formats/structures
Common ontologies present a solution to this
How does GOLD meet our needs?
We find GOLD to be the most suitable ontology for supporting data portability
GOLD’s focus has been on ‘datanalysis sets’
Summary
We suggest extending the focus to:
- data acquisition
- data access
Key extensions:
- formalising the definitions of concepts by representing them as a set of formal properties
- explicitly capturing the conventions and constraints for presentation (rendering)
- modelling features that are inherently indeterminate and/or complex structures
2. Linguistic ontologies and markups
Linguistic ontologies and markups
Ontology: strictly, what we agree exists
Markup: strictly, what we are certain about
Ontology and markup converge:
- only with consensus and complete confidence
- but there is rarely full confidence in the classification of new hard-to-classify phenomena in little-studied endangered languages
Indeterminacy
Builders of ontologies outside of linguistics have been reluctant to accept inherent indeterminacy:
In some cases, the incompatibilities [between ontologies] can be smoothed over by tweaking definitions of concepts or formalizations of axioms; in other cases, wholesale theoretical revision may be required. (Niles & Pease, 2001)
If we can identify the incompatibilities, we can model them
Supporting linguistics
A theory-neutral model of linguistics is not possible:
- Theories are poly-centric
- They will change
We need a pan-theory model of linguistics
Formalising definitions
Each concept in GOLD should be represented by a set of properties that describe that concept
Three possible values for a given property: ‘Yes’, ‘No’, or ‘Undefined’ (default)
To accurately represent variance: include enough properties to distinguish terms
For portability: include as many properties as possible
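The three-valued scheme above can be sketched as follows. This is a minimal illustration, not part of GOLD itself: the class names and the property names (`heads_noun_phrase`, `inflects_for_tense`, `takes_classifier`) are hypothetical. The point it shows is that a property nobody has stated defaults to 'Undefined' rather than 'No', and that two terms are distinguished only by properties on which both are defined.

```python
from enum import Enum


class Value(Enum):
    YES = "Yes"
    NO = "No"
    UNDEFINED = "Undefined"  # the default for any property not stated


class Concept:
    """A concept described by a set of named properties."""

    def __init__(self, name, **properties):
        self.name = name
        self.properties = dict(properties)

    def value(self, prop):
        # Anything not explicitly stated is Undefined, never silently "No".
        return self.properties.get(prop, Value.UNDEFINED)


def distinguishing(a, b):
    """Properties on which both concepts are defined and disagree."""
    names = set(a.properties) | set(b.properties)
    return sorted(
        p for p in names
        if Value.UNDEFINED not in (a.value(p), b.value(p))
        and a.value(p) != b.value(p)
    )


# Hypothetical property names, for illustration only.
noun = Concept("Noun", heads_noun_phrase=Value.YES, inflects_for_tense=Value.NO)
verb = Concept("Verb", inflects_for_tense=Value.YES)

print(distinguishing(noun, verb))  # ['inflects_for_tense']
```

Here `noun.value("takes_classifier")` returns `Value.UNDEFINED`, which is what lets a later contributor add that property without contradicting the existing description.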
Formalising definitions
‘Yes’ can potentially be expanded:
- whether the property is mandatory or optional for the concept
- dependencies between properties for a concept
Example
‘Noun’ in GOLD:
Noun Definition: A noun is a broad classification of parts of speech which include substantives and nominals (Crystal 1997:371; Mish et al. 1990:1176). (http://emeld.org/gold-ns/description.html#Noun, last checked 23/05/2003)
How do I know if my definition is the same as Crystal's or Mish et al.'s?
Is it both definitions, or the common ground?
Example
Will future users of GOLD have the same definition?
- the core of ‘noun’ may have longevity
- the boundaries with other concepts will not
COPEs (Community of Practice Extensions) can define extensions in terms of sets of properties, and add those properties to GOLD
Example
GOLD: NOUN
COPEs: GerundNOUN, NomVerbNOUN (each extending NOUN)
Can’t formally identify the similarities
Example
GOLD: NOUN
COPEs: GerundNOUN (+ property: verb suffix), NomVerbNOUN (+ property: verb suffix)
Can formally identify the similarities
Definition of NOUN can grow
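A minimal sketch of the comparison in this example, assuming each concept is stored as a plain mapping from property name to value. The base property `heads_noun_phrase` and the helper names are hypothetical; only `verb_suffix` comes from the slide. It shows how two independently defined extensions can be found to share a property, and how the base definition can grow by adopting the new property name.

```python
def added_by(extension, base):
    """Properties an extension states that the base concept does not."""
    return {p: v for p, v in extension.items() if p not in base}


def shared(a, b):
    """Properties that two concepts state with the same value."""
    return {p: v for p, v in a.items() if b.get(p) == v}


def grow(base, *extensions):
    """Add every property named by an extension to the base, as Undefined."""
    grown = dict(base)
    for ext in extensions:
        for p in ext:
            grown.setdefault(p, "Undefined")
    return grown


NOUN = {"heads_noun_phrase": "Yes"}              # hypothetical GOLD property
GERUND_NOUN = {**NOUN, "verb_suffix": "Yes"}     # extension from one COPE
NOM_VERB_NOUN = {**NOUN, "verb_suffix": "Yes"}   # extension from another COPE

common = shared(added_by(GERUND_NOUN, NOUN), added_by(NOM_VERB_NOUN, NOUN))
print(common)                                    # {'verb_suffix': 'Yes'}
print(grow(NOUN, GERUND_NOUN, NOM_VERB_NOUN))
```

Without the property sets, the two extension names are just different strings; with them, the overlap is something a program can report.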
3. Representing knowledge
Rendering
Separating form from content:
- ideal for flexibility
- not possible for some materials (esp. video)
Rendering conventions / constraints
Some are well known:
- italicize part-of-speech in dictionaries
- align interlinear transcriptions
Some are not:
- representation of language-specific kinship systems, ethnobotanical ontologies etc.
Solution 1
Include a (written) description and/or example of the rendering conventions and constraints:
- hard-code the interface
Solution 2
Include formal representations of the conventions within the data:
- interface takes instructions from the data
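As a rough sketch of Solution 2, the record below carries its own rendering conventions and a generic interface follows them. The record layout, the `STYLES` table and the English sample entry are all assumptions made for illustration; no particular archive format is implied.

```python
# A record that carries its rendering conventions alongside its content.
RECORD = {
    "rendering": {"headword": "bold", "pos": "italic"},
    "order": ["headword", "pos", "gloss"],
    "fields": {"headword": "dog", "pos": "n.", "gloss": "a domesticated canine"},
}

# The interface knows how to realise a style, not which field gets which style.
STYLES = {"bold": "<b>{}</b>", "italic": "<i>{}</i>", "plain": "{}"}


def render(record):
    """Render the fields in the stated order, using the stated conventions."""
    conventions = record.get("rendering", {})
    parts = []
    for name in record["order"]:
        style = conventions.get(name, "plain")
        template = STYLES.get(style, "{}")  # unknown styles fall back to plain
        parts.append(template.format(record["fields"][name]))
    return " ".join(parts)


print(render(RECORD))  # <b>dog</b> <i>n.</i> a domesticated canine
```

The convention "italicize part-of-speech" lives in the data here, so a different interface can honour it without being rewritten for each language.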
Solutions
These are two extremes:
- hard-coded and language specific
- data driven and language independent
Database architectures and linguistic ontologies:
- not designed for navigation
- ‘transparent’ access to such structures – who does it support?
4. Supporting fieldworkers
Supporting indeterminacy
There are two kinds of indeterminacy in linguistics:
- confidence in assigning a category (uncertainty)
- phenomena that are inherently variable, probabilistic, gradient or continuous
The most valuable information
The most valuable information that a field linguist learns may be the least likely to be annotated
Example: 7uhch in Lakandon Maya:
- A temporal-modal deictic expressing participant frames and speaker's footings (Bergqvist 2005)
- This term has been given the most thought by the researcher, but it is still not completely understood
- The uncertainty (or the extent of certainty) should be recorded: all the properties we do know
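One way to picture "recording all the properties we do know" is an annotation whose category is left empty while its properties are filled in. This is only a sketch: the `Annotation` class is hypothetical, the three 'Yes' values paraphrase the description on this slide, and `obligatory` is an invented placeholder for a question the researcher has not settled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Annotation:
    form: str
    category: Optional[str] = None  # None: no category assigned yet
    properties: Dict[str, str] = field(default_factory=dict)

    def known(self):
        """Properties with a definite value ('Yes' or 'No')."""
        return {p: v for p, v in self.properties.items() if v != "Undefined"}

    def open_questions(self):
        """Properties that have been considered but not yet settled."""
        return sorted(p for p, v in self.properties.items() if v == "Undefined")


uhch = Annotation(
    form="7uhch",
    properties={
        "temporal": "Yes",
        "modal": "Yes",
        "deictic": "Yes",
        "obligatory": "Undefined",  # hypothetical open question
    },
)

print(uhch.category)          # None
print(uhch.known())
print(uhch.open_questions())  # ['obligatory']
```

The annotation is still useful without a category label: what is known and what is still open are both explicit.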
5 reasons for modelling uncertainty
1. To record the extent of our knowledge
For example, we want everything known about 7uhch in Lakandon Maya to be recorded, even if we don’t yet have a category for it
5 reasons for modelling uncertainty
2. For searchability
If an archive implementing an ontology with uncertain categories exists, then we can more easily find existing solutions to a problem
If a problem is truly new, then we can allow future researchers to find it
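The searchability argument can be illustrated with a small query over uncategorised annotations. The archive contents below are invented placeholders (apart from the form 7uhch and its language, which come from the talk), and `find` is a hypothetical helper, not an existing archive API.

```python
def find(annotations, **wanted):
    """Return annotations whose properties match every requested value."""
    return [
        a for a in annotations
        if all(a["properties"].get(p) == v for p, v in wanted.items())
    ]


# Placeholder archive: two items have no category yet but can still be found.
ARCHIVE = [
    {"form": "7uhch", "language": "Lakandon Maya", "category": None,
     "properties": {"temporal": "Yes", "deictic": "Yes"}},
    {"form": "item-2", "language": "Language B", "category": None,
     "properties": {"temporal": "No", "deictic": "Yes"}},
    {"form": "item-3", "language": "Language C", "category": "Noun",
     "properties": {"deictic": "No"}},
]

hits = find(ARCHIVE, deictic="Yes")
print([a["form"] for a in hits])  # ['7uhch', 'item-2']
```

A search by category label would have returned nothing for the first two items, since neither has one.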
5 reasons for modelling uncertainty
3. To reach certainty
Even an indeterminate markup can allow a corpus analysis that can inform a decision about assigning the appropriate category
5 reasons for modelling uncertainty
4. To highlight problems with descriptive frameworks
A feature may only appear to belong to multiple (or no) categories because the descriptive framework does not yet account for it
5 reasons for modelling uncertainty
5. Because the concept is inherently indeterminate
The concept may be inherently fuzzy but not previously encountered as a continuous / contiguous phenomenon
Inherently indeterminate features
E.g.: cline, gradience, squish, continuities, contiguities, vague, fuzzy, probabilistic
Many prosodic, semantic and discourse features are inherently continuous
Growing arguments for probabilities to be part of our formal linguistic models for morphological and syntactic structures (Aarts, 2004; Baayen, 2003; Manning, 2003)
Inherently indeterminate features
Representing categories by formal properties meets the current requirements of modelling gradience (Aarts, 2004)
Perhaps the “ContinuousObject” concept of SUMO (Niles & Pease, 2001) could also be used?
The problem is, currently, largely unresolved
Incorporating new categories
How do we know that a given category is not the same as another one identified elsewhere?
Formal properties for concepts give us another means for comparison
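A sketch of such a comparison, assuming two categories described independently as property mappings. The category contents and property names are hypothetical. The function sorts properties into those the two descriptions agree on, those they conflict on, and those that at least one description leaves undefined.

```python
def compare(a, b):
    """Sort the properties of two categories into agree / conflict / unknown."""
    report = {"agree": [], "conflict": [], "unknown": []}
    for p in sorted(set(a) | set(b)):
        va, vb = a.get(p, "Undefined"), b.get(p, "Undefined")
        if "Undefined" in (va, vb):
            report["unknown"].append(p)   # cannot tell yet
        elif va == vb:
            report["agree"].append(p)
        else:
            report["conflict"].append(p)
    return report


# Two hypothetical categories identified in different projects.
CATEGORY_A = {"temporal": "Yes", "deictic": "Yes", "inflects": "No"}
CATEGORY_B = {"temporal": "Yes", "deictic": "Yes", "clause_final": "Yes"}

print(compare(CATEGORY_A, CATEGORY_B))
# {'agree': ['deictic', 'temporal'], 'conflict': [], 'unknown': ['clause_final', 'inflects']}
```

No conflicts plus some agreement suggests the two categories may be the same; the 'unknown' list says which properties each project would need to check before deciding.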
Incorporating structures
As well as inherently discrete phenomena and inherently indeterminate ones, there is a third kind: concepts that are complex structures
- common in syntax and discourse semantics
How do we model a structure in an ontology?
5. Supporting speakers
Users of EL archives
The largest (and growing) user group for endangered languages materials is the speakers of endangered languages
They are rarely interested in linguistic categories, or in navigating a corpus or archive via them
Supporting language-specific ontologies means supporting information-rich structures for both navigation and analysis
Case Study: Yolngu kinship
The Yolngu languages have an extensive kinship terminology called Gurrutu
- 27 terms that identify individuals and sets of individuals in terms of moiety, generation, gender, and patriline or matriline
The terms extend infinitely through cyclicity
Case Study: Yolngu kinship
Speakers draw from the same sets of kinship relations to describe their relationship to the Yolngu lands
We cannot always annotate well-known linguistic concepts independently of language-specific ontologies
6. Conclusions
Conclusions
Ontology building for endangered languages can be very different to other ontology projects:
- The uncertain is often more valuable than the certain
- The local is often more interesting than the universal
- … but will still need interoperability
We suggest extending the focus of GOLD to:
- data acquisition
- data access
Conclusions
Current GOLD does not need to be altered to incorporate our suggestions
- except to remove assumptions of invariability
Key extensions:
- formalising the definitions of concepts by representing them as a set of formal properties
- explicitly capturing the conventions and constraints for presentation (rendering)
- modelling features that are inherently indeterminate and/or complex structures
References
Aarts, B. 2004. Modelling linguistic gradience. Studies in Language 28(1): 1–49.
Baayen, H. 2003. Probabilistic Approaches to Morphology. In Bod, R., Hay, J. and Jannedy, S. (eds). Probabilistic Linguistics. MIT Press.
Bateman, J. 1992. The theoretical status of ontologies in natural language processing. In Text Representation and Domain Modelling – ideas from linguistics and AI, Technische Universität Berlin.
Bergqvist, H. 2005. Semantics of temporal deictics in Lakandon Maya. Presentation given at the ELAP-ELAR seminar series, SOAS, London.
Bird, S. & G. Simons. 2003. Seven Dimensions of Portability for Language Documentation and Description. Language 79/3: 557-582.
Christie, M. & W. Gaykamangu. 2003. Kinship, moiety, land & language in Arnhem Land. In Literacy Link. Australian Council for Adult Literacy, vol 23, no 5, Oct 2003.
Christie, M., W. Gaykamangu & D. Nathan. 2001. Yolngu Languages and Culture: Gupapuyngu. Faculty of Aboriginal and Torres Strait Islander Studies, NTU [Multimedia CD-ROM].
Crystal, D. 1997. A dictionary of linguistics and phonetics. 4th edition. Cambridge, MA: Blackwell.
Cysouw, M., J. Good, M. Albu & H.J. Bibiko. 2005. Can GOLD “cope” with WALS? Retrofitting an ontology onto the World Atlas of Language Structures. Proceedings of the E-MELD 2005.
Farrar, S. & D. T. Langendoen. 2003. A linguistic ontology for the Semantic Web. GLOT International 7(3): 97-100.
Farrar, S. 2003a. Markup and the GOLD ontology. Proceedings of the EMELD 2003.
Farrar, S. 2003b. An ontological account of linguistics: extending SUMO with GOLD. Proceedings of the 2003 IEEE International Conference on Natural Language Processing and Knowledge Engineering. Beijing.
Foley, W. A. 2003. Genre, register and language documentation in literate and preliterate communities. In Peter K. Austin (ed.) Language Documentation and Description, Volume 1.
Grinevald, C. 2003. Speakers and documentation of endangered languages. In Peter K. Austin (ed.) Language Documentation and Description, Volume 1.
Gruber, T. R. 1993. A translation approach to portable ontologies. Knowledge Acquisition 5(2): 199-220.
Himmelmann, N. P. 1998. Documentary and descriptive linguistics. Linguistics 36: 161-195. Berlin: de Gruyter.
Holton, G. 2003. Approaches to digitization and annotation: A survey of language documentation materials in the Alaska Native Language Center Archive. Proceedings of the EMELD 2003.
Manning, C. 2003. Probabilistic Syntax. In Bod, R., Hay, J. and Jannedy, S. (eds). Probabilistic Linguistics. MIT Press.
Nathan, D. (ed). 1996. Australia’s Indigenous Languages. Adelaide: SSABSA.
Nathan, D. & P. K. Austin. 2004. Reconceiving metadata: language documentation through thick and thin. In Peter K. Austin (ed.) Language Documentation and Description, Volume 2.
Niles, I. & A. Pease. 2001. Towards a standard upper ontology. Proceedings of the 2nd International Conference on Formal Ontology in Information Systems (FOIS-2001).
Penton, D., C. Bow, S. Bird & B. Hughes. 2004. Towards a General Model for Linguistic Paradigms. Proceedings of EMELD 2004.