Arabic Morphology Using Only Finite State Operations -Review
ARABIC MORPHOLOGY USING ONLY FINITE-STATE OPERATIONS
Supervisor: Dr. A. R. Weerasinghe
Sivaneasharajah Lushanthan
2006/CS/154
AUTHOR – KENNETH R BEESLEY
INTRODUCING MORPHOLOGY
Morphology: the structure of words and how words are formed
Morpheme: the smallest linguistic unit within a word that can carry a meaning, such as "un-", "break", and "-able" in the word "unbreakable"
Morphotactics: the restrictions on the order in which morphemes may combine
Orthographic/Variation Rule: models the changes that occur in a word, usually when two morphemes combine (spelling rules)
WHAT IF…?
Lexical form: patch+N+PL
Surface form: patches
Generating: lexical form → surface form
Analysing: surface form → lexical form
WHY MORPHOLOGICAL ANALYZER?
Levels of language analysis: Phonetics, Phonology, Morphology, Syntax, Semantics, Pragmatics
Applications: Grammar Checker, Text Summarizer, Machine Translation, Data Retrieval, TTS
TO DO MORPHOLOGICAL PARSING
Lexicon: list of morphemes (stems + affixes) with POS information for each morpheme
Morphotactics
Orthographic Rules
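The three components listed on this slide can be sketched as a toy analyzer in Python. This is only an illustration: the lexicon entries, the tag names (+N, +V, +SG, +PL and so on), and the single e-insertion spelling rule are assumptions chosen to reproduce the "patches" example, not part of the reviewed system.

```python
# Toy morphological parser: lexicon + morphotactics + one orthographic rule.
# All entries, tags, and the spelling rule are illustrative assumptions.

STEMS = {"patch": "N", "cat": "N", "kick": "V"}  # lexicon: stem -> POS

# Morphotactics: which suffixes may follow a stem of a given POS.
SUFFIXES = {
    "N": {"": "+SG", "s": "+PL"},
    "V": {"": "+Bare", "s": "+Pres3PSg"},
}


def apply_spelling(stem, suffix):
    """Orthographic rule: insert 'e' before -s after s, x, z, ch, sh."""
    if suffix == "s" and stem.endswith(("s", "x", "z", "ch", "sh")):
        return stem + "e" + suffix
    return stem + suffix


def analyze(word):
    """Return every lexical form whose generated surface form equals `word`."""
    results = []
    for stem, pos in STEMS.items():
        for suffix, tag in SUFFIXES[pos].items():
            if apply_spelling(stem, suffix) == word:
                results.append(f"{stem}+{pos}{tag}")
    return results
```

Calling `analyze("patches")` returns `["patch+N+PL"]`, while a form the spelling rule never produces, such as `"patchs"`, returns an empty list.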
A FINITE STATE TRANSDUCER – COLA MACHINE
Alphabet: {F, T}
Words: FFF, FT, TF
Language: {FFF, FT, TF}
States: 0, 5, 10, 15 (each F arc advances the total by 5, each T arc by 10)
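The cola machine can be written down as a small deterministic acceptor. The sketch below assumes that states are the running total, that F adds 5 and T adds 10, and that 15 is the only final state; these readings are inferred from the state labels on the slide.

```python
from itertools import product

# Transition table of the cola machine: (state, symbol) -> next state.
# Assumed reading: F adds 5, T adds 10, and 15 is the only accepting state.
TRANSITIONS = {
    (0, "F"): 5, (0, "T"): 10,
    (5, "F"): 10, (5, "T"): 15,
    (10, "F"): 15,
}
START = 0
FINAL = {15}


def accepts(word):
    """Run the acceptor over `word`; reject as soon as no arc exists."""
    state = START
    for symbol in word:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # note: state 0 is valid, so test for None explicitly
            return False
    return state in FINAL


def language(max_len=3):
    """Enumerate all accepted words over {F, T} up to `max_len` symbols."""
    words = ("".join(p) for n in range(max_len + 1) for p in product("FT", repeat=n))
    return sorted(w for w in words if accepts(w))
```

Enumerating with `language()` yields exactly the three words listed on the slide: FFF, FT, and TF.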
FS LANGUAGES & NATURAL LANGUAGES
A network that accepts a one-word language:
t a b l e
A two-level transducer:
Upper: t a b l e +Noun +Pl
Lower: t a b l e ε s
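A path through a two-level transducer is a sequence of upper:lower symbol pairs. The following Python sketch stores the single path from this slide (with +Noun paired with ε and +Pl paired with s) and reads it in both directions. The function names `generate` and `analyze` are hypothetical helpers, not an existing API.

```python
EPS = ""  # epsilon: the empty string on one side of a pair

# One path through the transducer as (upper, lower) symbol pairs.
PATHS = [
    [("t", "t"), ("a", "a"), ("b", "b"), ("l", "l"), ("e", "e"),
     ("+Noun", EPS), ("+Pl", "s")],
]


def upper(path):
    """Concatenate the upper (lexical) side of a path."""
    return "".join(u for u, _ in path)


def lower(path):
    """Concatenate the lower (surface) side of a path."""
    return "".join(l for _, l in path)


def generate(lexical):
    """Lexical form -> surface forms (apply the transducer downward)."""
    return [lower(p) for p in PATHS if upper(p) == lexical]


def analyze(surface):
    """Surface form -> lexical forms (apply the transducer upward)."""
    return [upper(p) for p in PATHS if lower(p) == surface]
```

The same data serves both directions, which is the property that makes a transducer usable for generation and analysis alike.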
WRITING REGULAR EXPRESSIONS - LEXICON
[ {kick} | {try} | {bore} ] [ %+Verb:0 ] [ %+Bare:0 | %+Pres3PSg:s | %+Past:{ed} ] ;
a:a = a
{kick} = [ k:k i:i c:c k:k ] = [ k i c k ]
Upper: word +Verb +Case
Lower: word ε suffix
Possible words:
kick try bore
kicks trys bores
kicked tryed boreed
Solution? Another layer!
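The lexicon expression on this slide is a plain concatenation of three choices, so its upper/lower pairs can be enumerated directly. The Python sketch below mirrors that structure; the variable and function names are illustrative only.

```python
from itertools import product

# Mirrors: [ {kick} | {try} | {bore} ] [ %+Verb:0 ] [ %+Bare:0 | %+Pres3PSg:s | %+Past:{ed} ]
STEMS = ["kick", "try", "bore"]
VERB = ("+Verb", "")  # upper tag, lower side is empty (0)
ENDINGS = [("+Bare", ""), ("+Pres3PSg", "s"), ("+Past", "ed")]


def lexicon_pairs():
    """Return (lexical form, surface form) for every path in the lexicon."""
    pairs = []
    for stem, (tag, suffix) in product(STEMS, ENDINGS):
        pairs.append((stem + VERB[0] + tag, stem + VERB[1] + suffix))
    return pairs
```

Pure concatenation produces the misspelled surface forms trys, tryed, and boreed alongside the correct ones, which is why a further layer of rules is needed.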
WRITING REGULAR EXPRESSIONS - RULES
α → β || γ _ δ is read as “α is rewritten as β between γ and δ”
[ y -> i e || Cons _ s .#. ,, y -> i || Cons _ e d .#. ]
.o. e -> 0 || Cons _ e d .#. ;
trys → tries, tryed → tried, boreed → bored
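The three rewrite rules can be imitated with Python's `re` module, using lookbehind for the left context and lookahead for the right context. This is a sketch, not xfst itself: the consonant set is an assumed definition of `Cons`, and the two parallel y-rules are applied one after the other, which gives the same result here because their right contexts (s versus ed) cannot both match.

```python
import re

CONS = "bcdfghjklmnpqrstvwxz"  # assumed definition of Cons (not given on the slide)


def apply_rules(word):
    """Apply the spelling rules to one surface candidate."""
    # y -> i e || Cons _ s .#.
    word = re.sub(rf"(?<=[{CONS}])y(?=s$)", "ie", word)
    # y -> i || Cons _ e d .#.
    word = re.sub(rf"(?<=[{CONS}])y(?=ed$)", "i", word)
    # .o. e -> 0 || Cons _ e d .#.   (composed after the y-rules)
    word = re.sub(rf"(?<=[{CONS}])e(?=ed$)", "", word)
    return word
```

Forms that need no repair, such as kicked or bores, pass through unchanged because none of the contexts match.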
IN THE PAPER,
Discontiguous dependencies between morphemes in a word: filtering
Non-concatenative morphotactics: reduplication, Semitic interdigitation
Variation rules
FILTERING OUT OVER-GENERATION
Over-generated pattern: Art+word+Noun+Indef+Case
?* %+ Art %+ ?* %+ Indef ?*
$ [ %+ Art %+ ?* %+ Indef ]
Over-generated pattern: Prep+word+Noun+Def/Indef+Nom/Acc
$ [ %+ Prep %+ ?* [ %+Acc | %+Nom ] ]
Combined filter:
~[ $ [ %+ Art %+ ?* %+ Indef ] | $ [ %+ Prep %+ ?* [ %+Acc | %+Nom ] ] ]
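The filter is the complement of "contains a forbidden tag sequence". A rough Python equivalent uses `re.search` for containment and keeps only the analyses that match none of the forbidden patterns. The tag-string layout follows the patterns on this slide; the +Gen tag in the usage note is an assumption added for illustration.

```python
import re

# Forbidden sub-sequences, mirroring the two $[ ... ] expressions.
BANNED = [
    re.compile(r"Art\+.*\+Indef"),        # article together with an indefinite ending
    re.compile(r"Prep\+.*\+(Acc|Nom)"),   # preposition together with Acc or Nom case
]


def passes_filter(analysis):
    """True if the analysis contains none of the forbidden sequences (the ~[ ... ] part)."""
    return not any(pattern.search(analysis) for pattern in BANNED)


def filter_analyses(analyses):
    """Keep only analyses that survive the filter."""
    return [a for a in analyses if passes_filter(a)]
```

For example, `Art+word+Noun+Indef+Nom` and `Prep+word+Noun+Def+Acc` are removed, while `Art+word+Noun+Def+Nom` and a hypothetical `Prep+word+Noun+Def+Gen` are kept.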
NON- CONCATENATIVE MORPHOTACTICS
Semitic stem interdigitation
Root: ktb, drs
Template: CVCVC
Vocalization: ui, a*
Root tier: k t b
Template tier: C V C V C
Vocalization tier: u i
Stem tier: k u t i b
^[ {ktb} .m>. {CVCVC} .<m. [u* i] ^]
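The merging of the three tiers can be illustrated with a short Python function that fills C slots from the root and V slots from the vocalization. It is a simplified sketch, not the merge algorithm of the paper: it assumes one consonant per C slot and treats a starred vowel (as in a* or u*i) as spreading over whatever V slots remain.

```python
def expand_vocalization(pattern, slots):
    """Expand a pattern such as 'ui', 'a*', or 'u*i' to exactly `slots` vowels."""
    vowels = []  # entries are [vowel, is_starred]
    for ch in pattern:
        if ch == "*":
            if not vowels:
                raise ValueError("'*' must follow a vowel")
            vowels[-1][1] = True
        else:
            vowels.append([ch, False])
    fixed = sum(1 for _, starred in vowels if not starred)
    starred_count = len(vowels) - fixed
    if starred_count > 1:
        raise ValueError("this sketch supports at most one spreading vowel")
    spare = slots - fixed  # slots left over for the spreading vowel
    if spare < 0 or (starred_count == 0 and spare != 0):
        raise ValueError("vocalization does not fit the template")
    out = []
    for vowel, starred in vowels:
        out.extend([vowel] * (spare if starred else 1))
    return out


def interdigitate(root, template, vocalization):
    """Merge root consonants and vowels into a template of C and V slots."""
    if len(root) != template.count("C"):
        raise ValueError("root does not fit the template")
    consonants = iter(root)
    vowels = iter(expand_vocalization(vocalization, template.count("V")))
    stem = []
    for slot in template:
        if slot == "C":
            stem.append(next(consonants))
        elif slot == "V":
            stem.append(next(vowels))
        else:
            stem.append(slot)  # keep any literal symbol in the template
    return "".join(stem)
```

With the values from this slide, `interdigitate("ktb", "CVCVC", "ui")` gives `kutib`, and the a* vocalization applied to drs gives `daras`.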
THE CURRENT SYSTEM
4930 words
72,000,000 abstract fully-voweled words
Sixty-six finite-state variation rules
New words are easily added to the lexical database
DISCUSSION
THOUGHT FOR THE DAY
Never say no to education!