Arabic Morphology Using Only Finite State Operations -Review

ARABIC MORPHOLOGY USING ONLY FINITE-STATE OPERATIONS Supervisor: Dr. A. R. Weerasinghe Sivaneasharajah Lushanthan 2006/CS/154


For my research seminar presentation, I presented the paper "Arabic Morphology Using Only Finite-State Operations" by Kenneth R. Beesley.


Page 1: Arabic Morphology Using Only Finite State Operations -Review

ARABIC MORPHOLOGY USING ONLY FINITE-STATE OPERATIONS

Supervisor: Dr. A. R. Weerasinghe

Sivaneasharajah Lushanthan

2006/CS/154

Page 2: Arabic Morphology Using Only Finite State Operations -Review


Page 3: Arabic Morphology Using Only Finite State Operations -Review


Page 4: Arabic Morphology Using Only Finite State Operations -Review

AUTHOR – KENNETH R BEESLEY

Page 5: Arabic Morphology Using Only Finite State Operations -Review

INTRODUCING MORPHOLOGY

Morphology: the structure of words and how words are formed.

Morpheme: the smallest linguistic unit within a word that can carry a meaning, such as "un-", "break", and "-able" in the word "unbreakable".

Morphotactics: the restrictions on the ordering of morphemes.

Orthographic/Variation Rule: models the changes that occur in a word, usually when two morphemes combine (spelling rules).

Page 6: Arabic Morphology Using Only Finite State Operations -Review

WHAT IF…?

Lexical form: patch +N +PL

Surface form: patches

GENERATING: lexical form to surface form

ANALYSING: surface form to lexical form

Page 7: Arabic Morphology Using Only Finite State Operations -Review

WHY MORPHOLOGICAL ANALYZER?

Phonetics, Phonology, Morphology, Syntax, Semantics, Pragmatics

Grammar Checker, Text Summarizer, Machine Translation, Data Retrieval, TTS

Page 8: Arabic Morphology Using Only Finite State Operations -Review

TO DO A MORPHOLOGICAL PARSING

Lexicon: list of morphemes (stems + affixes) with POS information for each morpheme

Morphotactics

Orthographic Rules

Page 9: Arabic Morphology Using Only Finite State Operations -Review

A FINITE STATE TRANSDUCER – COLA MACHINE

ALPHABET - {F, T}

WORDS - FFF, FT, TF

LANGUAGE - {FFF, FT, TF}

[Diagram: states 0, 5, 10, 15; F arcs 0 → 5, 5 → 10, 10 → 15; T arcs 0 → 10, 5 → 15]
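The cola machine can be sketched as a small acceptor in Python. This is only an illustration: the transition table is read off the diagram (states 0, 5, 10, 15, with 15 assumed to be the single final state), and the names `TRANSITIONS` and `accepts` are made up for this sketch.

```python
from itertools import product

# Transitions assumed from the slide's diagram: the state is the amount
# inserted so far; F moves 5 forward and T moves 10 forward.
TRANSITIONS = {
    (0, "F"): 5, (0, "T"): 10,
    (5, "F"): 10, (5, "T"): 15,
    (10, "F"): 15,
}
START = 0
FINAL = {15}  # assumption: only exact payment is accepted


def accepts(word: str) -> bool:
    """Return True if the machine ends in a final state after reading word."""
    state = START
    for symbol in word:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no arc for this symbol: reject
            return False
    return state in FINAL


# Enumerate every word over {F, T} of length 1 to 3 and keep the accepted ones.
language = sorted(
    "".join(w)
    for n in range(1, 4)
    for w in product("FT", repeat=n)
    if accepts("".join(w))
)
print(language)
```

Enumerating all short words this way recovers exactly the three-word language listed on the slide.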

Page 10: Arabic Morphology Using Only Finite State Operations -Review

FS LANGUAGES & NATURAL LANGUAGES

A network that accepts a one-word language:

t a b l e

A two-level transducer:

Lexical form: t a b l e +Noun +Pl

Surface form: t a b l e ε s

Page 11: Arabic Morphology Using Only Finite State Operations -Review

WRITING REGULAR EXPRESSIONS - LEXICON

[ {kick} | {try} | {bore} ] [ %+Verb:0 ] [ %+Bare:0 | %+Pres3PSg:s | %+Past:{ed} ] ;

a:a = a

{kick} = [ k:k i:i c:c k:k ] = [ k i c k ]

Lexical form: word +Verb +Case

Surface form: word ε suffix

Page 12: Arabic Morphology Using Only Finite State Operations -Review

[ {kick} | {try} | {bore} ] [ %+Verb:0 ] [ %+Bare:0 | %+Pres3PSg:s | %+Past:{ed} ] ;

Possible words:

kick try bore

kicks trys bores

kicked tryed boreed

Solution? Another layer!
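To see where the misspelled forms come from, the lexicon expression can be imitated in plain Python. This is a rough sketch, not the finite-state network itself: `STEMS`, `SUFFIXES`, `lexicon_pairs`, and `analyse` are illustrative names, and the dictionary simply pairs each tag with the surface string it maps to in the expression.

```python
# Stems and tag:suffix pairs copied from the regular expression above.
STEMS = ["kick", "try", "bore"]
SUFFIXES = {"+Bare": "", "+Pres3PSg": "s", "+Past": "ed"}


def lexicon_pairs():
    """Return (lexical form, surface form) pairs; +Verb maps to nothing."""
    return [
        (f"{stem}+Verb{tag}", stem + suffix)
        for stem in STEMS
        for tag, suffix in SUFFIXES.items()
    ]


def analyse(surface: str):
    """Look a surface form up in the relation (the 'analysing' direction)."""
    return [lex for lex, surf in lexicon_pairs() if surf == surface]


for lexical, surface in lexicon_pairs():
    print(f"{lexical:20} {surface}")
```

Plain concatenation yields "trys", "tryed", and "boreed", which is why a further layer of spelling rules is needed on top of the lexicon.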

Page 13: Arabic Morphology Using Only Finite State Operations -Review

WRITING REGULAR EXPRESSIONS - RULES

α → β || γ _ δ is read as “α is rewritten as β between γ and δ”

[ y -> i e || Cons _ s .#. ,, y -> i || Cons _ e d .#. ]

.o.

e -> 0 || Cons _ e d .#. ;

trys → tries, tryed → tried, boreed → bored
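The effect of these rules can be approximated with ordinary Python regular expressions. This is a sketch under stated assumptions, not a transducer: `CONS` is an assumed consonant set (the slide does not define `Cons`), `apply_rules` is an illustrative name, and the two parallel `y` rules are applied one after the other, which gives the same result here because their contexts never overlap.

```python
import re

# Assumed definition of Cons; the slide leaves it undefined.
CONS = "bcdfghjklmnpqrstvwxz"


def apply_rules(word: str) -> str:
    """Apply the spelling rules in the order given by the composition."""
    # y -> i e || Cons _ s .#.
    word = re.sub(rf"(?<=[{CONS}])y(?=s$)", "ie", word)
    # y -> i || Cons _ e d .#.
    word = re.sub(rf"(?<=[{CONS}])y(?=ed$)", "i", word)
    # e -> 0 || Cons _ e d .#.
    word = re.sub(rf"(?<=[{CONS}])e(?=ed$)", "", word)
    return word


for raw in ["kicks", "trys", "tryed", "boreed", "kicked"]:
    print(raw, "->", apply_rules(raw))
```

Forms that are already correct, such as "kicks" and "kicked", pass through unchanged because none of the contexts match.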

Page 14: Arabic Morphology Using Only Finite State Operations -Review

IN THE PAPER,

Discontiguous dependencies between morphemes in a word: filtering

Non-concatenative morphotactics: reduplication, Semitic interdigitation

Variation rules

Page 15: Arabic Morphology Using Only Finite State Operations -Review

FILTERING OUT OVER-GENERATION

Over-generated pattern: Art+word+Noun+Indef+Case

?* %+ Art %+ ?* %+ Indef ?* = $[ %+ Art %+ ?* %+ Indef ]

Over-generated pattern: Prep+word+Noun+Def/Indef+Nom/Acc

$[ %+ Prep %+ ?* [ %+Acc | %+Nom ] ]

Filter:

~[ $[ %+ Art %+ ?* %+ Indef ] | $[ %+ Prep %+ ?* [ %+Acc | %+Nom ] ] ]
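The filtering idea can be illustrated in Python by discarding every lexical string that contains one of the two forbidden patterns. This is only a sketch: in the finite-state setting the filter is a network combined with the lexicon, whereas here it is a list filter, and the candidate strings (including the `+Gen` tag) are hypothetical examples, not forms taken from the paper.

```python
import re

# Python counterparts of the two "contains" expressions on the slide.
ART_INDEF = re.compile(r"\+Art\+.*\+Indef")
PREP_NOM_ACC = re.compile(r"\+Prep\+.*(\+Acc|\+Nom)")


def is_legal(lexical: str) -> bool:
    """True if the string contains neither forbidden pattern."""
    return not (ART_INDEF.search(lexical) or PREP_NOM_ACC.search(lexical))


# Hypothetical over-generated lexical strings, for illustration only.
candidates = [
    "+Art+word+Noun+Def+Nom",
    "+Art+word+Noun+Indef+Nom",
    "+Prep+word+Noun+Def+Gen",
    "+Prep+word+Noun+Def+Acc",
    "word+Noun+Indef+Acc",
]

legal = [c for c in candidates if is_legal(c)]
print(legal)
```

The negated union on the slide corresponds to the `not (... or ...)` test in `is_legal`.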

Page 16: Arabic Morphology Using Only Finite State Operations -Review

NON- CONCATENATIVE MORPHOTACTICS

Semitic stem interdigitation

Root: ktb, drs

Template: CVCVC

Vocalization: ui, a*

Root tier: k t b

Template tier: C V C V C

Vocalization tier: u i

Stem tier: k u t i b

^[ {ktb} .m>. {CVCVC} .<m. [u*i] ^]
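A direct way to picture interdigitation is to fill the C slots of the template with root consonants and the V slots with vowels. The sketch below does this with plain string handling; it does not implement the merge operators of the expression above, and `expand_vocalization` and `interdigitate` are illustrative names. The star is interpreted as "repeat this vowel as often as needed to fill the remaining V slots", and only one starred vowel is supported.

```python
def expand_vocalization(pattern: str, slots: int) -> str:
    """Expand a pattern such as "u*i" or "a*" to exactly `slots` vowels."""
    items = []  # list of (vowel, starred) pairs
    for ch in pattern:
        if ch == "*":
            if not items:
                raise ValueError("star without a preceding vowel")
            items[-1] = (items[-1][0], True)
        else:
            items.append((ch, False))
    fixed = sum(1 for _, starred in items if not starred)
    starred_count = len(items) - fixed
    if starred_count > 1:
        raise ValueError("this sketch supports at most one starred vowel")
    repeat = slots - fixed  # how many slots the starred vowel must fill
    if repeat < 0 or (starred_count == 0 and repeat != 0):
        raise ValueError("vocalization does not fit the template")
    return "".join(v * repeat if starred else v for v, starred in items)


def interdigitate(root: str, template: str, vocalization: str) -> str:
    """Merge root consonants and vowels into the C and V slots of a template."""
    if template.count("C") != len(root):
        raise ValueError("root does not fit the template")
    vowels = iter(expand_vocalization(vocalization, template.count("V")))
    consonants = iter(root)
    return "".join(
        next(consonants) if slot == "C" else next(vowels) for slot in template
    )


print(interdigitate("ktb", "CVCVC", "u*i"))
print(interdigitate("drs", "CVCVC", "a*"))
```

With the root `ktb`, the template `CVCVC`, and the vocalization `u*i`, the result matches the stem tier shown on the slide.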

Page 17: Arabic Morphology Using Only Finite State Operations -Review

THE CURRENT SYSTEM

4930 words

72,000,000 abstract fully-voweled words

Sixty-six finite-state variation rules

New words are added easily to the lexical database

Page 18: Arabic Morphology Using Only Finite State Operations -Review
Page 19: Arabic Morphology Using Only Finite State Operations -Review

DISCUSSION

Page 20: Arabic Morphology Using Only Finite State Operations -Review

THOUGHT FOR THE DAY

Never say No for Education!