Towards the implementation of a refined data model for a Zulu machine-readable lexicon
TOWARDS THE IMPLEMENTATION OF A REFINED DATA MODEL FOR A ZULU MACHINE-READABLE LEXICON
Ronell van der Merwe, Laurette Pretorius, Sonja Bosch
1
GOAL
... to develop a complete data repository (database)
Bantu languages
Machine-readable (MR) lexicon
applications: finite-state morphological analyser, ...
2
WHAT WE WILL BE DISCUSSING ...
1. Brief overview
2. Lexicon update framework
3. Data model
4. Implementation approaches
5. Conclusion
6. Future work
3
1. INTRODUCTION - COMPREHENSIVE MR LEXICON
ZULU corpora
Paper Dict
Update framework
4
2. LEXICON UPDATE FRAMEWORK
Corpus
Guesser
MR lexicon
apply
new stems/roots
successes
failures
update
rebuild
ZulMorph
Morphological Analyser
5
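The loop in the diagram above (apply the analyser to the corpus, pass the failures to the guesser, update the lexicon, rebuild) can be sketched roughly as follows. This is only an illustration under stated assumptions: `update_cycle`, `toy_analyse` and `toy_guess` are hypothetical names, the input is already hyphen-segmented, and the toy analyser and guesser are trivial stand-ins for ZulMorph and its guesser variant, not their actual behaviour.

```python
from typing import Callable, Iterable, List, Set, Tuple

Analyser = Callable[[str, Set[str]], bool]   # True if the word can be analysed
Guesser = Callable[[str], Set[str]]          # proposes candidate roots for a word


def update_cycle(corpus: Iterable[str], lexicon: Set[str],
                 analyse: Analyser, guess: Guesser) -> Tuple[Set[str], List[str]]:
    """One pass: apply the analyser, send failures to the guesser, update the lexicon."""
    failures = [word for word in corpus if not analyse(word, lexicon)]
    candidates: Set[str] = set()
    for word in failures:
        candidates |= guess(word)
    # "Rebuilding" is modelled here simply as returning the enlarged lexicon.
    return lexicon | candidates, failures


def toy_root(segmented: str) -> str:
    """Hypothetical helper: first morph of a hyphen-segmented form."""
    return [m for m in segmented.split("-") if m][0]


def toy_analyse(segmented: str, lexicon: Set[str]) -> bool:
    # Toy stand-in: a form "analyses" if its root is already in the lexicon.
    return toy_root(segmented) in lexicon


def toy_guess(segmented: str) -> Set[str]:
    # Toy stand-in: propose the first morph as a candidate root.
    return {toy_root(segmented)}


corpus = ["-bon-is-", "-bon-an-", "-bon-w-"]
lexicon, failures = update_cycle(corpus, set(), toy_analyse, toy_guess)
lexicon2, failures2 = update_cycle(corpus, lexicon, toy_analyse, toy_guess)
print(sorted(lexicon), len(failures), len(failures2))
```

In this toy run the first pass fails on every form and adds the root, and the second pass has no failures left. A real cycle would presumably include a check on the guessed candidates before they enter the lexicon.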
PURPOSE OF MR LEXICON
MR lexicon
Repository of all lexical information
Apps
Apps
ZulMorph
Morphological Analyser
used
re-used
6
Morphosyntactic information:
Zulu morphotactics
Morphophonological alternation rules
Embedded stem/root lexicon
new candidate(s)
GUESSER VARIANT
Guesser
Phonologically possible stems/roots
identify
MR lexicon
update / include
7
3. DATA MODEL - DEVELOPMENT
8
* 0 or more
+ 1 or more
| optional leaf node
3.1 VERBAL EXTENSIONS
Suffixing extensions to verb root:
Applied
Causative
Intensive
Neuter
Passive
Reciprocal
9
-bon- ‘see’
-bon-is- -verb.root-caus ‘show, cause to see’
-bon-an- -verb.root-recip ‘see each other’
-bon-w- -verb.root-pass ‘be seen’
-bon-el- -verb.root-appl ‘see for’
-bon-akal- -verb.root-neut ‘be visible’
-bon-is-an- -verb.root-caus-recip ‘show each other’
-bon-akal-is- -verb.root-neut-caus ‘make visible’
10
[Diagram: -bon-is-an-]
11
-bon- ‘see’
-bon-an-w- -verb.root-recip-pass ‘seen by each other’
-bon-w-an- -verb.root-pass-recip ‘seen by each other’
-bon-is-el- -verb.root-caus-appl ‘look after for’
-bon-el-is- -verb.root-appl-caus ‘cause to take care of’
12
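The tag strings in the examples above follow mechanically from the sequence of extensions. A minimal sketch, assuming hyphen-segmented input and covering only the extension forms that appear on the slides (the function name `tag_extended_root` and the table `EXTENSION_TAGS` are hypothetical, and no form is given here for the intensive):

```python
# Extension forms and tags as they appear in the slide examples.
EXTENSION_TAGS = {
    "is": "caus",
    "an": "recip",
    "w": "pass",
    "el": "appl",
    "akal": "neut",
}


def tag_extended_root(segmented: str) -> str:
    """Map a segmented form such as '-bon-is-an-' to '-verb.root-caus-recip'."""
    morphs = [m for m in segmented.split("-") if m]
    if not morphs:
        raise ValueError("expected at least a verb root")
    _root, *extensions = morphs
    # Unknown extensions raise KeyError rather than being guessed.
    tags = [EXTENSION_TAGS[ext] for ext in extensions]
    return "-" + "-".join(["verb.root", *tags])


for form in ["-bon-", "-bon-is-an-", "-bon-an-w-", "-bon-el-is-"]:
    print(form, tag_extended_root(form))
```

The order of the tags simply mirrors the order of the suffixes, which is why -bon-an-w- and -bon-w-an- receive different tag strings.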
3.2 DEVERBATIVES
verb root / extended verb root
deverbative suffix
(-o- | -i- | -a- | -e- | -u-)
noun class prefix
13
optional: nominal suffix
(Aug | Dim | Loc | Fem)
14
-bon- ‘see’
isi-bon-i Class.pref.cl 6/7-verb.root-deverb.suffix “mourner” (lit. one who looks on at a funeral)
um-bon-i Class.pref.cl 1/2-verb.root-deverb.suffix “one who sees”
um-bon-o Class.pref.cl 3/4-verb.root-deverb.suffix “apparition, vision”
isi-bon-is-o Class.pref.cl 6/7-verb.root-caus-deverb.suffix “signpost, signal”
isi-bon-is-el-o-ana Class.pref.cl 6/7-verb.root-caus-appl-deverb.suffix-dim “small example”
[Diagram: isi-bon-is-el-o-ana]
15
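The deverbative pattern (noun class prefix, verb root with optional extensions, deverbative suffix, optional nominal suffix) can be written down as a small record type. This is a sketch under assumptions, not the authors' schema: the class `Deverbative` and its field names are hypothetical, and only the five deverbative suffixes listed on the slide are accepted.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The five deverbative suffixes listed on the slide.
DEVERB_SUFFIXES = {"o", "i", "a", "e", "u"}


@dataclass
class Deverbative:
    class_prefix: str                    # e.g. "isi", "um"
    root: str                            # verb root, e.g. "bon"
    deverb_suffix: str                   # one of DEVERB_SUFFIXES
    extensions: List[str] = field(default_factory=list)  # zero or more
    nominal_suffix: Optional[str] = None                 # optional, e.g. "ana"

    def __post_init__(self) -> None:
        if self.deverb_suffix not in DEVERB_SUFFIXES:
            raise ValueError(f"unknown deverbative suffix: {self.deverb_suffix!r}")

    def segmented(self) -> str:
        """Return the hyphen-segmented noun, e.g. 'isi-bon-is-o'."""
        morphs = [self.class_prefix, self.root, *self.extensions, self.deverb_suffix]
        if self.nominal_suffix is not None:
            morphs.append(self.nominal_suffix)
        return "-".join(morphs)


signpost = Deverbative("isi", "bon", "o", extensions=["is"])
small_example = Deverbative("isi", "bon", "o", extensions=["is", "el"],
                            nominal_suffix="ana")
print(signpost.segmented())
print(small_example.segmented())
```

Keeping the extensions as a list lets one record type cover both plain and extended verb roots.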
4. TOWARDS IMPLEMENTATION
The capturing of the data must be:
rigorous
systematic
appropriately structured
To ensure that:
data exchange is consistent
16
MR lexicon
ZulMorph
Morphological Analyser
CONSIDERATIONS IN THE CHOICE OF DATABASE ...
type of data
different views of data
different types of access to data
17
CONSIDERATION: TYPE OF DATA
semi-structured
allows for recursion
n-depth structures
18
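To make "recursion" and "n-depth structures" concrete, one possible reading is a tree in which each extension may itself be followed by further extensions. The sketch below assumes that reading. The `Node` type and the helper names are hypothetical, and the tree holds only forms taken from the verbal-extension slides.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    label: str                                    # a root or an extension
    children: List["Node"] = field(default_factory=list)


def depth(node: Node) -> int:
    """Depth of the tree; there is no fixed upper bound on nesting."""
    return 1 + max((depth(child) for child in node.children), default=0)


def forms(node: Node, prefix: Tuple[str, ...] = ()) -> Iterator[str]:
    """Yield every root-to-node chain as a segmented form, depth-first."""
    chain = prefix + (node.label,)
    yield "-" + "-".join(chain) + "-"
    for child in node.children:
        yield from forms(child, chain)


tree = Node("bon", [
    Node("is", [Node("an"), Node("el")]),
    Node("an", [Node("w")]),
    Node("w", [Node("an")]),
])
print(depth(tree))
print(list(forms(tree)))
```

A fixed-column table would need one column per nesting level, whereas a recursive structure like this one grows with the data.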
CONSIDERATION: VIEWS AND ACCESS
Rebuilding the Morphological Analyser:
view: morphological structure of all word roots/stems
access: sequential
19
XML-ENABLED DATABASE
20
Relational Database
Object-Oriented Database
<root>bon</root><verbfeatures>.......</verbfeatures>
NATIVE XML DATABASE (NXD)
21
Native XML DB
<root>bon</root><verbfeatures>.......</verbfeatures>
XQuery
XUpdate
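As a rough illustration of the sequential view needed for rebuilding the analyser, the sketch below parses a small XML lexicon and walks its entries in document order. Everything beyond the `<root>` and `<verbfeatures>` elements shown on the slide is an assumption made for the example: the `<lexicon>`, `<entry>` and `<extension>` elements and the `tag` attribute are hypothetical. Python's standard `xml.etree.ElementTree`, with its limited XPath subset, stands in for a native XML database queried with XQuery.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout; only <root> and <verbfeatures> come from the slide.
LEXICON_XML = """
<lexicon>
  <entry>
    <root>bon</root>
    <verbfeatures>
      <extension tag="caus">is</extension>
      <extension tag="recip">an</extension>
    </verbfeatures>
  </entry>
  <entry>
    <root>bon</root>
    <verbfeatures>
      <extension tag="pass">w</extension>
    </verbfeatures>
  </entry>
</lexicon>
"""


def extended_roots(xml_text: str):
    """Yield each entry as a segmented form, in document order."""
    lexicon = ET.fromstring(xml_text)
    for entry in lexicon.iter("entry"):
        root = entry.findtext("root")
        extensions = [e.text for e in entry.findall("./verbfeatures/extension")]
        yield "-" + "-".join([root, *extensions]) + "-"


all_forms = list(extended_roots(LEXICON_XML))
# Attribute predicates are part of ElementTree's XPath subset.
causatives = ET.fromstring(LEXICON_XML).findall(".//extension[@tag='caus']")
print(all_forms, len(causatives))
```

A native XML database would run comparable queries on the server side. The point here is only that a single pass over the entries yields the morphological structure of every stored root.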
XML-ENABLED DATABASE: OBJECT-ORIENTED
22
NXD AND OO DATABASE SATISFY THE IMPLEMENTATION CONSIDERATIONS:
Both ...
are suitable for our type of data: semi-structured, recursive
satisfy the type of access and view: sequential access to morphological information of all word roots/stems
23
CONCLUSION
Refinement of a data model for the Zulu MR lexicon: verbal extensions and deverbatives
MR lexicon embedded in an update framework
Native XML and OO databases as possible approaches to implementation
24
FUTURE WORK
Development and evaluation of prototypes for the MR lexicon (Zulu)
Software for semi-automating the lexicon update framework
Bootstrapping of MR lexicon for other Bantu languages
25