Towards the implementation of a refined data model for a Zulu machine-readable lexicon


© Ronell van der Merwe, Laurette Pretorius, Sonja Bosch


Page 1: Towards the implementation of a refined data model for a Zulu machine-readable lexicon

TOWARDS THE IMPLEMENTATION OF A REFINED DATA MODEL FOR A ZULU MACHINE-READABLE LEXICON

Ronell van der Merwe, Laurette Pretorius, Sonja Bosch

Page 2: Towards the implementation of a refined data model for a Zulu machine-readable lexicon

GOAL ... to develop ... a complete data repository (database)

Bantu languages
Machine Readable (MR) lexicon

applications: finite state morphological analyser, ....

Page 3: Towards the implementation of a refined data model for a Zulu machine-readable lexicon

WHAT WE WILL BE DISCUSSING ....
1. Brief overview
2. Lexicon update framework
3. Data model
4. Implementation approaches
5. Conclusion
6. Future work

Page 4: Towards the implementation of a refined data model for a Zulu machine-readable lexicon

1. INTRODUCTION - COMPREHENSIVE MR LEXICON

[Diagram labels: Zulu corpora; Paper Dict; Update framework]

Page 5: Towards the implementation of a refined data model for a Zulu machine-readable lexicon

2. LEXICON UPDATE FRAMEWORK

[Diagram: lexicon update framework. Labels: Corpus; apply; ZulMorph Morphological Analyser; successes; failures; Guesser; new stems/roots; update; MR lexicon; rebuild]

Page 6: Towards the implementation of a refined data model for a Zulu machine-readable lexicon

PURPOSE OF MR LEXICON

MR lexicon: repository of all lexical information

[Diagram labels: MR lexicon; ZulMorph Morphological Analyser; Apps; used; re-used]

Morphosyntactic information:
Zulu morphotactics
Morphophonological alternation rules
Embedded stem/root lexicon

Page 7: Towards the implementation of a refined data model for a Zulu machine-readable lexicon

GUESSER VARIANT

[Diagram labels: Guesser; identify; phonologically possible stems/roots; new candidate(s); update / include; MR lexicon]
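One pass of the update framework might be sketched as follows. This is only an illustration under stated assumptions: `update_cycle`, `toy_analyse` and `toy_guess` are hypothetical names, the toy functions are stand-ins and not ZulMorph or the actual guesser, and candidates are added to the lexicon directly, whereas the slides leave open how candidates are checked before the "update / include" step.

```python
def update_cycle(corpus, lexicon, analyse, guess):
    """Run one pass: apply the analyser, send failures to the guesser,
    and return (new_lexicon, successes, failures)."""
    successes, failures = [], []
    for word in corpus:
        if analyse(word, lexicon):
            successes.append(word)
        else:
            failures.append(word)

    # Collect candidate stems/roots proposed for the unanalysed words.
    candidates = set()
    for word in failures:
        candidates.update(guess(word))

    # Simplification: every candidate is included in the updated lexicon.
    return lexicon | candidates, successes, failures


# Toy stand-ins for illustration only (hypothetical, not the real components).
def toy_analyse(word, lexicon):
    # "Analysed" here simply means the word contains a known root.
    return any(root in word for root in lexicon)


def toy_guess(word):
    # Strip one leading and one trailing character to propose a root.
    return {word[1:-1]}


new_lexicon, ok, failed = update_cycle(
    ["ngibona", "uthanda"], {"bon"}, toy_analyse, toy_guess
)
```

After the update, the analyser would be rebuilt from the new lexicon and the pass repeated on further corpus material.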

Page 8: Towards the implementation of a refined data model for a Zulu machine-readable lexicon

3. DATA MODEL - DEVELOPMENT

[Data model diagram. Legend:]
* 0 or more
+ 1 or more
| optional leaf node

Page 9: Towards the implementation of a refined data model for a Zulu machine-readable lexicon

3.1 VERBAL EXTENSIONS
Suffixing extensions to verb root:
Applied
Causative
Intensive
Neuter
Passive
Reciprocal

Page 10: Towards the implementation of a refined data model for a Zulu machine-readable lexicon

-bon-            ‘see’

-bon-is-         -verb.root-caus          ‘show, cause to see’
-bon-an-         -verb.root-recip         ‘see each other’
-bon-w-          -verb.root-pass          ‘be seen’
-bon-el-         -verb.root-appl          ‘see for’
-bon-akal-       -verb.root-neut          ‘be visible’
-bon-is-an-      -verb.root-caus-recip    ‘show each other’
-bon-akal-is-    -verb.root-neut-caus     ‘make visible’

Page 11: Towards the implementation of a refined data model for a Zulu machine-readable lexicon


Page 12: Towards the implementation of a refined data model for a Zulu machine-readable lexicon

-bon-            ‘see’

-bon-an-w-       -verb.root-recip-pass    ‘seen by each other’
-bon-w-an-       -verb.root-pass-recip    ‘seen by each other’
-bon-is-el-      -verb.root-caus-appl     ‘look after for’
-bon-el-is-      -verb.root-appl-caus     ‘cause to take care of’
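The examples above suggest that an extended root can be described as a root plus an ordered sequence of extension tags. A minimal sketch, assuming plain concatenation with no morphophonological alternation; the `EXTENSIONS` table and `extend_root` are hypothetical names, and only the extensions that appear in the examples are listed (the slides give no form for the intensive).

```python
# Extension suffixes as they appear in the -bon- examples.
EXTENSIONS = {
    "appl": "el",
    "caus": "is",
    "neut": "akal",
    "pass": "w",
    "recip": "an",
}


def extend_root(root, tags):
    """Return (surface form, analysis string) for a root plus ordered extension tags.

    Plain concatenation only: alternation rules are deliberately ignored.
    """
    morphs = [root] + [EXTENSIONS[tag] for tag in tags]
    surface = "-" + "-".join(morphs) + "-"
    analysis = "-" + "-".join(["verb.root"] + list(tags))
    return surface, analysis


show_each_other = extend_root("bon", ["caus", "recip"])
recip_pass = extend_root("bon", ["recip", "pass"])
pass_recip = extend_root("bon", ["pass", "recip"])
```

Because the tag sequence is ordered, -bon-an-w- and -bon-w-an- stay distinct entries, which is the property the data model has to preserve.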

Page 13: Towards the implementation of a refined data model for a Zulu machine-readable lexicon

3.2 DEVERBATIVES

[Diagram: structure of a deverbative]
noun class prefix
verb / extended verb root
deverbative suffix (-o- | -i- | -a- | -e- | -u-)
optional: nominal suffix (Aug | Dim | Loc | Fem)

Page 14: Towards the implementation of a refined data model for a Zulu machine-readable lexicon

-bon- ‘see’

isi-bon-i
  Class.pref.cl 6/7-verb.root-deverb.suffix
  “mourner” (lit. one who looks on at a funeral)

um-bon-i
  Class.pref.cl 1/2-verb.root-deverb.suffix
  “one who sees”

um-bon-o
  Class.pref.cl 3/4-verb.root-deverb.suffix
  “apparition, vision”

isi-bon-is-o
  Class.pref.cl 6/7-verb.root-caus-deverb.suffix
  “signpost, signal”

isi-bon-is-el-o-ana
  Class.pref.cl 6/7-verb.root-caus-appl-deverb.suffix-dim
  “small example”
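The deverbative structure (noun class prefix, verb or extended verb root, deverbative suffix, optional nominal suffix) can be sketched as a small record type. This is an illustration only: the class name `Deverbative` and its field names are assumptions, not part of the data model presented in the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Deverbative suffixes listed on the slide.
DEVERB_SUFFIXES = {"o", "i", "a", "e", "u"}


@dataclass
class Deverbative:
    class_prefix: str                      # e.g. "isi", "um"
    root: str                              # verb root, e.g. "bon"
    deverb_suffix: str                     # one of DEVERB_SUFFIXES
    extensions: List[str] = field(default_factory=list)  # e.g. ["is", "el"]
    nominal_suffix: Optional[str] = None   # e.g. "ana" (diminutive)

    def __post_init__(self):
        if self.deverb_suffix not in DEVERB_SUFFIXES:
            raise ValueError(f"unknown deverbative suffix: {self.deverb_suffix}")

    def surface(self) -> str:
        """Join the morphemes with hyphens, in the order used on the slides."""
        parts = [self.class_prefix, self.root, *self.extensions, self.deverb_suffix]
        if self.nominal_suffix:
            parts.append(self.nominal_suffix)
        return "-".join(parts)


small_example = Deverbative("isi", "bon", "o", ["is", "el"], "ana")
seer = Deverbative("um", "bon", "i")
```

Here `small_example.surface()` rebuilds the segmented form isi-bon-is-el-o-ana from its parts.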

Page 15: Towards the implementation of a refined data model for a Zulu machine-readable lexicon


Page 16: Towards the implementation of a refined data model for a Zulu machine-readable lexicon

4. TOWARDS IMPLEMENTATION
The capturing of the data must be:
rigorous
systematic
appropriately structured

To ensure that: data exchange is consistent

[Diagram labels: MR lexicon; ZulMorph Morphological Analyser]

Page 17: Towards the implementation of a refined data model for a Zulu machine-readable lexicon

CONSIDERATIONS IN THE CHOICE OF DATABASE...

type of data
different views of data
different types of access to data

Page 18: Towards the implementation of a refined data model for a Zulu machine-readable lexicon

CONSIDERATION: TYPE OF DATA

semi-structured
allows for recursion
n-depth structures
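To make "recursion" and "n-depth structures" concrete, the sketch below nests one element per verbal extension inside the next, so the depth of an entry grows with the number of extensions. The element names `root` and `verbfeatures` follow the XML fragment shown in the slides; `entry`, `extension` and the `type` attribute are hypothetical, chosen only for this illustration.

```python
import xml.etree.ElementTree as ET


def nest_extensions(parent, tags):
    """Recursively nest one <extension> element per tag under parent."""
    if not tags:
        return
    child = ET.SubElement(parent, "extension", {"type": tags[0]})
    nest_extensions(child, tags[1:])


def depth(elem):
    """Number of levels in the tree rooted at elem (a lone element has depth 1)."""
    return 1 + max((depth(child) for child in elem), default=0)


entry = ET.Element("entry")
ET.SubElement(entry, "root").text = "bon"
features = ET.SubElement(entry, "verbfeatures")
nest_extensions(features, ["caus", "appl"])  # as in -bon-is-el-

entry_depth = depth(entry)
```

A fixed-width relational row cannot hold this shape without extra tables, which is why the type of data matters for the choice of database.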

Page 19: Towards the implementation of a refined data model for a Zulu machine-readable lexicon

CONSIDERATION: VIEWS AND ACCESS

Rebuilding the Morphological Analyser:

view: morphological structure of all word roots/stems

access: sequential
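Sequential access for rebuilding the analyser amounts to walking every entry once, in document order. A minimal sketch, assuming a hypothetical `lexicon`/`entry` wrapper around the `root` and `verbfeatures` elements; the second root, thand ('love'), is an illustrative entry and does not come from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical lexicon document; only <root> and <verbfeatures> appear in the slides.
LEXICON_XML = """<lexicon>
  <entry><root>bon</root><verbfeatures/></entry>
  <entry><root>thand</root><verbfeatures/></entry>
</lexicon>"""


def iter_roots(xml_text):
    """Yield the text of each entry's <root>, in document order."""
    for entry in ET.fromstring(xml_text).iter("entry"):
        yield entry.findtext("root")


roots = list(iter_roots(LEXICON_XML))
```

In a native XML database the same traversal would be written as a query rather than application code.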

Page 20: Towards the implementation of a refined data model for a Zulu machine-readable lexicon

XML-ENABLED DATABASE

Relational Database

Object-Oriented Database

<root>bon</root><verbfeatures>.......</verbfeatures>

Page 21: Towards the implementation of a refined data model for a Zulu machine-readable lexicon

NATIVE XML DATABASE (NXD)

Native XML DB

<root>bon</root><verbfeatures>.......</verbfeatures>

XQuery, XUpdate

Page 22: Towards the implementation of a refined data model for a Zulu machine-readable lexicon

XML-ENABLED DATABASE: OBJECT-ORIENTED

Page 23: Towards the implementation of a refined data model for a Zulu machine-readable lexicon

NXD AND OO DATABASE SATISFY THE IMPLEMENTATION CONSIDERATIONS:

Both ...
are suitable for our type of data: semi-structured, recursive
satisfy the type of access and view: sequential access to morphological information of all word roots/stems

Page 24: Towards the implementation of a refined data model for a Zulu machine-readable lexicon

CONCLUSION
Refinement of a data model for the Zulu MR lexicon: verbal extensions and deverbatives

MR Lexicon embedded in an update framework

Native XML and OO databases are possible approaches to implementation

Page 25: Towards the implementation of a refined data model for a Zulu machine-readable lexicon

FUTURE WORK
Development and evaluation of prototypes for the MR Lexicon (Zulu)

Software for semi-automating the lexicon update framework

Bootstrapping of MR Lexicon for other Bantu languages