Arabic Nominals in HPSG: A Verbal Noun Perspective

Abstract

Semitic languages exhibit rich nonconcatenative morphological operations, which can generate a myriad of derived lexemes. In particular, the feature-rich, root-driven morphology of Arabic gives rise to several kinds of verbal nouns, such as gerunds, active participles, passive participles, locative nouns, etc. Head-driven Phrase Structure Grammar (HPSG) is a strong choice for capturing this rich morphology in natural language processing. It combines the best ideas from its predecessors and integrates all linguistic layers (Phonology, Morphology, Syntax, Semantics, Context, etc.) of natural language processing. Although HPSG is a successful syntactic theory, it lacks a representation of complex nonconcatenative morphology. In this work, we propose a novel HPSG representation that includes the morphological, syntactic and semantic features of Arabic nominals and various verbal nouns. We also present the lexical type hierarchy and derivational rules for generating these verbal nouns in the HPSG framework. Finally, we have implemented the lexical type hierarchy, Attribute Value Matrix (AVM) and construction rules on the TRALE (an extension of the Attribute Logic Engine) platform to validate the proposed HPSG formalism.

Chapter 1

Introduction

Head-driven Phrase Structure Grammar (HPSG) is an attractive tool for capturing complex linguistic constructs. It combines the best ideas from its predecessors: Generalized Phrase Structure Grammar (GPSG) [15], Lexical Functional Grammar (LFG) [6] and Government and Binding theory (GB) [8]. It is well suited to natural language processing because it integrates the essential linguistic layers (Phonology, Morphology, Syntax, Semantics, Context, etc.). It is also flexible enough to be adapted to a specific language.

1.1 Motivation

Semitic languages like Arabic, Amharic and Hebrew exhibit rich nonconcatenative morphological operations for the construction of lexicons. Computational modeling of their morphology can give large vocabulary coverage in these languages. Among the Semitic languages, we have chosen Arabic for nonconcatenative morphological analysis, as it is the most prominent instance of nonconcatenative morphology among living languages. More than two hundred and eighty million people speak Arabic as a first language, and it is an official language of twenty-two countries. It ranks fifth by number of native speakers, and it is also the intellectual and liturgical language of the Islamic world. Despite these facts, the morphological analysis of Arabic is a relatively new area of research.

1.2 Scope of the Work

HPSG analysis of nonconcatenative morphology in general, and of Semitic languages in particular, is relatively new. However, the intricate nature of Arabic morphology has motivated several research projects addressing these issues [1, 7, 40]. HPSG representations of Arabic verbs and morphologically complex predicates are discussed in [2–4]. An in-depth analysis of declensions in Arabic nouns has been presented in [18]. Arabic nominals are more diverse and more important than their counterparts in other languages: modifiers, such as adjectives and adverbs, are treated as nominals in Arabic. Moreover, Arabic nouns can be derived from verbs or from other nouns. Derivation from verbs is one of the primary means of forming Arabic nouns, and no HPSG analysis of it has been conducted yet.

Arabic nouns can be categorized along several dimensions, such as derivation (derived from a verb or a noun), ending type (sound ending or weak ending) and declension (declinable or indeclinable). Based on derivation, Arabic nouns can be divided into two categories:

1. Non-derived nouns: These are not derived from any other noun or verb.

2. Derived nouns: These are derived from other nouns or verbs.

An example of a non-derived, static noun is حِصَانٌ (ḥiṣānun, which means “horse”): it is not derived from any noun or verb, and no verb is generated from this word. On the other hand, كَاتِبٌ (kātibun, which means “writer”) is an example of a derived noun. This word is generated from the verb كَتَبَ (kataba), which means “He wrote” in English. This simple example provides a glimpse of the complexity of the derivational, nonconcatenative morphology for constructing a noun from a verb in Arabic. In this work, we analyze and propose the HPSG constructs required for capturing the syntactic and semantic effects of this rich morphology.

An HPSG formalization of Arabic nominal sentences has been presented in [29]. The formalization covers seven types of simple Arabic nominal sentences while taking care of agreement. In [24], an HPSG analysis of the broken plural and the gerund has been presented. The main assumption in that work revolves around Concrete Lexical Representations (CLRs) located between an HPSG type lexicon and phonological realization. However, the authors of that work did not address other forms of verbal nouns, including participles.

In this work, we analyze all types of verbal nouns generated from strong (or sound) triliteral root verbs, covering their derivation from the verb and their syntactic and semantic information. We do not analyze the derivation of verbal nouns from strong quadriliteral or weak verbs. All eight types of verbal nouns are derived from strong triliteral root verbs, and these derivations follow regular patterns; by contrast, derivation from quadriliteral or weak verbs is much less regular, so analyzing it requires more effort. Moreover, in most cases at most three types of verbal nouns are derived from these types of root verbs.

1.3 Contribution

Our contributions towards the HPSG analysis of Arabic nouns presented in this dissertation are as follows:

• We formulate the structure of the Attribute Value Matrix (AVM) for Arabic nouns and extend the AVM for Arabic verbs proposed in [2]. We make this design robust so that it can handle not only lexeme and word construction but also phrase and sentence construction.

• We capture the syntactic and semantic effects of Arabic morphology.

• We determine the placement of verbal nouns and their subtypes in the lexical type hierarchy, with proper justification.

• Arabic morphology is generally root-pattern morphology: different lexemes can be generated from the same root using different patterns. We use this root-pattern morphology to design lexical rules that avoid the need for exhaustive lexical entries for four types of verbal nouns derived from all strong triliteral root verbs. As a result, hundreds of verbal nouns can be recognized merely by associating the root verbs with the set of lexical rules applicable to them. Thus, the number of lexical entries in the dictionary is greatly reduced.

• We implement the designed AVM, type hierarchy and lexical rules in TRALE (an extension of the Attribute Logic Engine) [34], a freely available system developed in Prolog that integrates phrase structure parsing, semantic-head-driven generation and constraint logic programming with typed feature structures as terms.
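The fourth contribution above can be pictured with a minimal sketch, assuming a toy ASCII transliteration in which long vowels are written as doubled letters (aa, uu). The rule names and templates below are hypothetical illustrations for the two Form I participles of Table 2.3; they are not the TRALE lexical rules of this thesis. The sketch only shows why storing roots plus a small set of rules, rather than every verbal noun, is enough for recognition.

```python
# Hypothetical sketch: the lexicon stores only roots; lexical rules are
# root-pattern templates in which the digits 1, 2, 3 mark root-letter slots.
# Long vowels use doubled ASCII letters (aa, uu) instead of macrons.

RULES = {
    "active_participle_form_I": "1aa2i3un",    # e.g. kaatibun ("writer")
    "passive_participle_form_I": "ma12uu3un",  # e.g. maktuubun ("written")
}

ROOTS = ["ktb", "nsr"]  # triliteral roots, one ASCII letter per radical


def apply_rule(root: str, template: str) -> str:
    """Replace each slot digit in the template with the matching root letter."""
    return "".join(root[int(ch) - 1] if ch in "123" else ch for ch in template)


def build_recognizer(roots, rules):
    """Map every generated surface form back to its (root, rule) analysis."""
    analyses = {}
    for root in roots:
        for name, template in rules.items():
            analyses[apply_rule(root, template)] = (root, name)
    return analyses


recognizer = build_recognizer(ROOTS, RULES)
print(recognizer["kaatibun"])   # ('ktb', 'active_participle_form_I')
print(recognizer["mansuurun"])  # ('nsr', 'passive_participle_form_I')
```

With N roots and R rules, the lexicon holds N + R items while still covering all N × R generated forms.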

1.4 Organization of the Rest of the Dissertation

Chapter 2 provides background by explaining the linguistic concepts and the necessary tools. It discusses several linguistic topics ranging from morphology and syntax to semantics. Then it provides a sketch of Arabic grammar, mainly the morphology associated with word construction. Next, it gives a brief introduction to HPSG, the mathematical theory of language used in our thesis. The last part of the chapter presents a detailed discussion of related work.

Chapter 3 presents our contribution to the development of a generic structure of the Attribute Value Matrix for Arabic nouns. It also describes the type hierarchy of Arabic nouns and their subtypes along the derivation dimension. Next, it discusses the construction rules for four types of Arabic verbal nouns derived from strong triliteral root verbs. It also designs the lexical entries for the other four types of verbal nouns, which do not follow strictly regular patterns.

Chapter 4 gives a brief description of the TRALE lexical compiler. Then, it shows the necessary components of TRALE and how we implement our HPSG formalism using TRALE.

Finally, Chapter 5 gives the conclusion. In this chapter, we state the concrete contribution of our work from a technical point of view. We finish the chapter by giving directions for further research on this topic.

Chapter 2

Background and Related Works

The topics discussed in this chapter serve as background for the rest of the thesis. In Section 2.1, we explain the concepts of theoretical linguistics that are necessary to develop linguistic models. Section 2.2 introduces morphology, more specifically morphology in the Arabic language and its effect on other linguistic layers. Section 2.3 gives an overview of Head-driven Phrase Structure Grammar (HPSG). Finally, in Section 2.4, we present the state of research on HPSG modeling, with emphasis on the Arabic language. In this chapter, we frequently use the Arabic alphabet; its transliteration is presented in Table 5.1.

2.1 Theoretical Linguistics

The scientific study of human language is called linguistics. Among all branches of linguistics, theoretical linguistics is the most important for developing models of linguistic knowledge. The core subjects of theoretical linguistics are phonology, morphology, syntax and semantics. The main areas can be summarized as follows:

• Phonology: is the systematic use of sound to encode meaning in any spoken human

language, or the field of linguistics studying this use. In other words, it is concerned


with the function, behaviour and organization of sounds as linguistic items.

• Morphology: is the study of word formation, that is, of the internal structure of words. It studies the patterns of word formation in a particular language, the description of such patterns, and the behavior and combination of morphemes.

• Syntax: is the study of the principles and rules for constructing phrases or

sentences in natural languages.

• Semantics: is the study of meaning. It typically focuses on the relation between signifiers, such as words, phrases, signs and symbols, and what they stand for.

• Pragmatics: is the study of the ways in which context contributes to meaning. It studies how the transmission of meaning depends not only on the linguistic knowledge (e.g. grammar, lexicon, etc.) of the speaker and listener, but also on the context of the utterance, knowledge about the status of those involved, the inferred intent of the speaker, and so on.

• Discourse: is the study of connected speech. A discourse constitutes sequences of relations to objects, subjects or predicates. Discourse can be observed in multimodal/multimedia forms of communication, including the use of spoken, written and signed language in contexts spanning from oral history to instant message conversations to textbooks.

Although phonology is a significant part of theoretical linguistics, it is beyond the scope of this thesis, because it deals with language sounds while our work begins from word formation, i.e., morphology. As background, we discuss the concepts related to the morphology, syntax and semantics layers. We have taken the linguistic definitions from [25].


2.1.1 Morphology

Morphology is the study of the internal structure of words, that is, the study of the patterns of word formation in a particular language, the description of such patterns, and the behavior and combination of morphemes. It can be thought of as a system of adjustments in the shapes of words that contribute to adjustments in the way speakers intend their utterances to be interpreted. In a hierarchy of grammatical constituents, a word is sometimes placed above the morpheme level and below the phrase level. We discuss the concept of constituents further in Section 2.1.2.

A morpheme is the smallest meaningful unit in the grammar of a language. The word ‘dogs’ consists of two morphemes: ‘dog’, and ‘-s’, a plural marker on nouns. Morphemes can be categorized by how they combine with other morphemes to form a word. Some morpheme types are:

• Bound morpheme: A bound morpheme is a grammatical unit that never occurs by itself, but is always attached to some other morpheme. In the above example, ‘-s’ is a bound morpheme.

• Free morpheme: A free morpheme is a grammatical unit that can occur by itself. However, other morphemes such as affixes can be attached to it. In the above example, ‘dog’ is a free morpheme.

• Affix: An affix is a bound morpheme that is joined before, after, or within a root or stem. In the above example, ‘-s’ is an affix.

• Root: A root is the portion of a word that carries the principal portion of meaning of the words in which it functions. It is common to a set of derived or inflected forms, if any, when all affixes are removed. A root is also a stem. In the above example, ‘dog’ is a root. Another example of a root is ‘speak’, which carries the principal portion of meaning of that word. ‘Speaker’ is not a root; rather, it is derived from a root.


• Stem: A stem is the root or roots of a word, together with any derivational affixes, to which inflectional affixes are added. In the above two examples, ‘dog’ is both a root and a stem, whereas ‘speaker’ is a stem and ‘speak’ is its root.

• Clitic: A clitic is a morpheme that has the syntactic characteristics of a word, but shows evidence of being phonologically bound to another word. Examples of English clitics are the possessive ’s, as in ‘the king of England’s hat’, and the contracted ’ll, as in ‘I’ll go’.

Among these morpheme types, clitics are beyond the scope of this thesis. Root, stem and affix will be discussed after the discussion of morphosyntactic operations.

A morphosyntactic operation is an ordered, dynamic relation between one linguistic form and another. There are two kinds of morphosyntactic operations:

• Derivation - is the formation of a new word or inflectable stem from another word or stem. It typically occurs by the addition of an affix. The derived word is often of a different word class (or category) from the original, and it may thus take the inflectional affixes of the new word class. Example - ‘speaker’ is derived from ‘speak’. ‘Speak’ is both a root and a stem. ‘Speaker’ is a new stem derived from ‘speak’ by a derivational operation that uses the derivational suffix ‘-er’. The derived word ‘speaker’ is a stem but not a root, because it can be further analyzed into the meaningful unit ‘speak’, which is its root. Note also that ‘speak’ is a verb whereas the derived word ‘speaker’ is a noun; thus the word class of the derived word differs from that of its root.

• Inflection - is variation in the form of a word, typically by means of an affix, that expresses a grammatical contrast which is obligatory for the stem’s word class in some given grammatical context. For example, ‘speakers’ is inflected from the stem ‘speaker’. This inflection is necessary if ‘speaker’ is used in the plural. Here the suffix ‘-s’ is used for inflection. The word ‘speakers’ is not a stem, and its category is the same as that of ‘speaker’. Thus inflection differs from derivation in that the syntactic category does not change.
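The contrast between derivation and inflection can be sketched in a few lines, assuming a toy word model that records only a form and a syntactic category; the class and function names are hypothetical. Derivation with ‘-er’ changes the category from verb to noun, while plural inflection with ‘-s’ keeps it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    form: str
    category: str  # syntactic category: "V" for verb, "N" for noun


def derive_agent_noun(verb: Word) -> Word:
    """Derivation: the suffix -er turns a verb stem into a new noun stem."""
    if verb.category != "V":
        raise ValueError("this toy rule applies to verbs only")
    return Word(verb.form + "er", "N")


def inflect_plural(noun: Word) -> Word:
    """Inflection: the suffix -s marks plural; the category is unchanged."""
    if noun.category != "N":
        raise ValueError("this toy rule applies to nouns only")
    return Word(noun.form + "s", noun.category)


speak = Word("speak", "V")
speaker = derive_agent_noun(speak)
speakers = inflect_plural(speaker)
print(speaker)   # Word(form='speaker', category='N')
print(speakers)  # Word(form='speakers', category='N')
```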

Morphology deals with two kinds of information.

• Firstly, what information is encoded by the morpheme. For example, take the Arabic word kataba (he wrote). A variety of information is encoded in this word and in its other inflected or derived forms. Some are listed below:

– Agreement: kataba (كَتَبَ) – he wrote. Person – 3rd, Number – Singular, Gender – Masculine, Mood – Indicative.

– Event structure: kataba (كَتَبَ) – he wrote. Tense – Past, Aspect – Perfect.

– Agency: kutiba (كُتِبَ) – it was written. Voice – Passive.

– Illocutionary force: uktub (اُكْتُبْ) – write. Mode – Command.

– Part-of-Speech: kitābun (كِتَابٌ) – a book: noun. kataba (كَتَبَ) – he wrote: verb.

– Definiteness: al-kitābu (الْكِتَابُ) – the book. Determiner – Definite.

– Complex Predicate: kattaba (كَتَّبَ) – he made (someone) write. Semantic relation – Causation.

There are many more syntactic and semantic phenomena that can be expressed using morphology.

• Secondly, the morphological process, which is a means of changing a stem to adjust its meaning to fit its syntactic and communicational context. It encodes morphosyntactic operations. For example, plural formation is a morphosyntactic operation, whereas suffixation is a morphological process that English uses to encode plural formation. The morphological processes for concatenative and nonconcatenative morphosyntactic operations are shown below:

– Concatenative operations are those where morphemes are linearly concatenated. This process is also called agglutination, and languages that use it extensively are called agglutinative languages. For example:

∗ Prefixation: morphemes concatenated at the front, e.g., clear – unclear

∗ Suffixation: morphemes concatenated at the back, e.g., walk – walked

∗ Circumfixation: morphemes concatenated at both the front and the back, e.g., mind – unmindful

– Nonconcatenative operations are those where morphemes are nonlinearly embedded. Languages that use this process frequently are called fusional languages. For example:

∗ Infixation: root letter morphemes embedded in the middle, e.g., kataba – kattaba

∗ Simulfixation: front morpheme shifted to the back, e.g., eat – ate

∗ Modification: middle vowel changed, e.g., man – men

∗ Suppletion: whole stem changed, e.g., go – went

In this thesis, we focus mainly on nonconcatenative operations, together with concatenative operations, and give a mathematical formalism to capture their rich diversity in Arabic.
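The difference between concatenative and nonconcatenative processes can be made concrete with plain string operations. This is a minimal sketch over ASCII strings; the helper names are hypothetical, and the gemination helper simply follows the kataba/kattaba example listed under Infixation above. Suppletion cannot be computed by any rule, so it is modeled as a lookup table.

```python
def prefixation(stem: str, prefix: str) -> str:
    return prefix + stem                      # clear -> unclear


def suffixation(stem: str, suffix: str) -> str:
    return stem + suffix                      # walk -> walked


def circumfixation(stem: str, prefix: str, suffix: str) -> str:
    return prefix + stem + suffix             # mind -> unmindful


def geminate_middle_radical(root: str, vowel: str = "a") -> str:
    """Nonconcatenative: double the middle radical in a C1aC2C2aC3a template."""
    c1, c2, c3 = root
    return f"{c1}{vowel}{c2}{c2}{vowel}{c3}{vowel}"   # ktb -> kattaba


def modify_vowel(stem: str, old: str, new: str) -> str:
    """Nonconcatenative: replace the first occurrence of an internal vowel."""
    return stem.replace(old, new, 1)          # man -> men


# Suppletion replaces the whole stem, so it can only be listed, not computed.
SUPPLETIVE_PAST = {"go": "went"}

print(prefixation("clear", "un"))             # unclear
print(circumfixation("mind", "un", "ful"))    # unmindful
print(geminate_middle_radical("ktb"))         # kattaba
print(modify_vowel("man", "a", "e"))          # men
print(SUPPLETIVE_PAST["go"])                  # went
```

The concatenative helpers never look inside the stem, whereas the nonconcatenative ones must know where the radicals or vowels sit, which is why the latter need templates.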

2.1.2 Syntax

Syntax is the study of the principles and rules for constructing phrases or sentences in

natural languages. In addition, the term syntax is also used to refer directly to the rules

and principles that govern the sentence structure of any individual language. There are


a number of theoretical approaches to the discipline of syntax. Some popular approaches

among these are -

• Generative grammar,

• Categorial grammar,

• Dependency grammar,

• Stochastic/probabilistic grammars/network theories,

• Functionalist grammars.

Modern research in syntax attempts to describe languages in terms of such rules, which are often called construction rules. These rules are the basis of generative grammar. Our current research also concerns forming these rules, so in the discussion of syntax we put much emphasis on construction.

A construction is an ordered arrangement of grammatical units forming a larger unit.

Different usages of the term construction include or exclude stems and words. There are

several kinds of construction. Some of these are -

• Apposition - is a construction consisting of two or more adjacent units that have

identical referents. Example - My friend John.

• Clause - is a grammatical unit that includes, at minimum, a predicate and an explicit or implied subject, and expresses a proposition. Example - It is cold, although the sun is shining. This sentence contains two clauses: It is cold is the main clause, and although the sun is shining is the subordinate clause.

• Direct speech - is quoted speech that is presented without modification, as it

might have been uttered by the original speaker. Example - Patrick Henry said,

“Give me liberty or give me death”.


• Indirect speech - is reported speech that is presented with grammatical modifications, rather than as it might have been uttered by the original speaker. Example - Patrick Henry said to give him liberty or give him death.

• Phrase - is a syntactic structure that consists of more than one word but lacks the subject-predicate organization of a clause. For example, the house at the end of the street is a phrase; it acts like a noun.

• Sentence - is a grammatical unit that is composed of one or more words or phrases that generally bear minimal syntactic relation to the words or phrases that precede or follow it. Example - I am reading a book. This sentence is a composition of three phrases.

• Stem - is the root or roots of a word, together with any derivational affixes, to

which inflectional affixes are added. It has been discussed in detail in Section 2.1.1.

• Word - is a unit which is a constituent at the phrase level and above. It is

sometimes identifiable according to such criteria as being the minimal possible unit

in a reply.

All these constructions can be classified into two categories: lexical construction and phrasal (or combinatoric) construction. Lexical construction deals with forming lexical items, that is, words and stems. For example, forming speaker from speak is a lexical construction. Phrasal construction, on the other hand, deals with forming units larger than words and stems, so this type of construction forms phrases, clauses and sentences.

Constituent is an important concept in the discussion of construction. A constituent is one of two or more grammatical units that enter syntactically or morphologically into a construction at any level. For example, the sentence I eat bananas every day contains the following constituents:

1. Immediate constituents: I, eat bananas every day

2. Ultimate constituents: I, eat, banana, -s, every, day
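Immediate and ultimate constituents can be read off a constituent tree. The sketch below assumes one particular bracketing of the example sentence (the internal grouping of eat bananas and every day is an illustrative choice, not taken from the text) and represents the tree as nested Python tuples whose leaves are morphemes; the function names are hypothetical.

```python
from typing import List, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]

# "I eat bananas every day": the subject plus the rest of the sentence.
SENTENCE: Tree = ("I", (("eat", ("banana", "-s")), ("every", "day")))


def ultimate_constituents(tree: Tree) -> List[str]:
    """Collect the leaves (morphemes) of the tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    leaves: List[str] = []
    for child in tree:
        leaves.extend(ultimate_constituents(child))
    return leaves


def render(tree: Tree) -> str:
    """Spell out a subtree, attaching bound morphemes such as '-s' to the previous word."""
    words: List[str] = []
    for morpheme in ultimate_constituents(tree):
        if morpheme.startswith("-") and words:
            words[-1] += morpheme[1:]
        else:
            words.append(morpheme)
    return " ".join(words)


def immediate_constituents(tree: Tree) -> List[str]:
    """The top-level daughters of the tree, spelled out as strings."""
    return [render(child) for child in tree] if isinstance(tree, tuple) else [tree]


print(immediate_constituents(SENTENCE))  # ['I', 'eat bananas every day']
print(ultimate_constituents(SENTENCE))   # ['I', 'eat', 'banana', '-s', 'every', 'day']
```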

There are several related, cross-cutting and sometimes confusing concepts related to

constituents. We explain the concepts at syntactic level. Syntactic constituents can

be classified under syntactic category. A syntactic category is a set of words and/or

phrases in a language which share a significant number of common characteristics. The

classification is based on similar structure and sameness of distribution (the structural

relationships between these elements and other items in a larger grammatical structure),

and not on meaning. It is also known as syntactic class. Among the major syntactic

categories there are phrasal syntactic categories like NP (noun phrase), VP (verb phrase),

PP (prepositional phrase) and lexical categories that serve as heads of phrasal syntactic

categories like noun, verb and others. For example a prepositional phrase (PP) is a

phrase that has a preposition as its head. The definition is similar for the noun phrase (NP) and the verb phrase (VP).

Constituents can perform syntactic functions in the construction. A syntactic

function is the grammatical relationship of one constituent to another within a syntactic

construction. There are various kinds of syntactic functions such as subject, predicate,

object, complement, adjunct, modifier and others.

Syntactic functions are significant in categorial grammar. As HPSG is based on the generative paradigm, syntactic functions are not used here for syntax modeling; instead, we model syntax by construction rules.

2.1.3 Semantics

Semantics is the study of meaning. It typically focuses on the relation between signifiers, such as words, phrases, signs and symbols, and what they stand for. In linguistics, it is the study of the interpretation of signs or symbols as used by agents or communities within particular circumstances and contexts. The formal study of semantics intersects with many other fields of inquiry, including lexicology, syntax, pragmatics, etymology and others, and is therefore complex.

Semantics is closely related to reference, and references are used for agreement. HPSG 94 [33] distinguishes several types of agreement, including:

1. Index agreement: It arises when indices are required to be token-identical. That is, the value of the semantic index of one lexical item needs to agree with the value of the semantic index of another.

2. Syntactic agreement: It arises when strictly syntactic objects (e.g. CASE values) are identified. That is, a lexical item has a syntactic requirement that can be fulfilled by another lexical item with a certain syntactic value.

3. Pragmatic agreement: It arises when contextual background assumptions are required to be consistent.

Agreement is not syntactic in most languages. To see this, consider the sentence the beef sandwich at table six is getting restless. The referent of the subject in this sentence is not “the beef sandwich” but the customer who ordered it. As in English, agreement in Arabic is not syntactic; rather, it is semantic. Which properties of referents are encoded by agreement features is subject to cross-linguistic variation, but common choices include person, number and gender. In some languages, gender distinctions correspond to semantic sortal distinctions such as sex, human/nonhuman, animate/inanimate or shape. Arabic is an example of this type of language. So, along with person, number and gender, the human/nonhuman distinction must be preserved for agreement. We discuss this with an example in Section 3.1.3.
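Index agreement can be approximated as compatibility of agreement features. The sketch below is a simplification that uses flat dictionaries instead of HPSG's typed feature structures; the feature names (PER, NUM, GEN, HUM) and their values are illustrative assumptions, with HUM standing in for the human/nonhuman distinction mentioned above. Two descriptions agree if they can be unified without a value clash.

```python
from typing import Dict, Optional

Features = Dict[str, str]


def unify(a: Features, b: Features) -> Optional[Features]:
    """Unify two flat feature descriptions; return None if any shared feature clashes."""
    result = dict(a)
    for feature, value in b.items():
        if feature in result and result[feature] != value:
            return None
        result[feature] = value
    return result


# Hypothetical index of a noun and the agreement demands of two verb forms.
noun_index = {"PER": "3", "NUM": "sg", "GEN": "masc", "HUM": "+"}
verb_masc = {"PER": "3", "NUM": "sg", "GEN": "masc"}
verb_fem = {"PER": "3", "NUM": "sg", "GEN": "fem"}

print(unify(noun_index, verb_masc))  # {'PER': '3', 'NUM': 'sg', 'GEN': 'masc', 'HUM': '+'}
print(unify(noun_index, verb_fem))   # None
```

Real typed feature structures are nested and carry types, so TRALE's unification is considerably richer than this flat check.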

2.2 Arabic Morphology

Arabic is rich in nonconcatenative morphology, which is mainly root-pattern morphology. In this section, we introduce root-pattern morphology and its effect on Arabic verbs and verbal nouns. Then, we discuss the different types of Arabic verbal nouns.

2.2.1 Root-Pattern Morphology

The Arabic verb is an excellent example of nonconcatenative, root-pattern based morphology. A combination of root letters is plugged into a variety of morphological patterns with predetermined fixed letters and a particular vowel melody, which generates verbs of a particular type carrying some syntactic and semantic information [3]. The root of any stem denotes a semantic core, and the vowel pattern bears the syntactic information. Derivations from a common root with different patterns share a common meaning. Similarly, derivations from the same pattern with different roots share common syntactic information. A particular root-pattern combination brings fixed syntactic and semantic meaning. Root and pattern must co-exist, and the combination of root and pattern specifies the semantic meaning.
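The interplay of root and pattern can be sketched as a template-filling function. This assumes a simplified ASCII notation in which each underscore in a pattern is a slot for the next root letter and long vowels are doubled (aa for ā); real Arabic morphology also involves phonological and morphophonemic adjustments that this sketch ignores. The examples reproduce the forms used in Figures 2.1 and 2.2; the function and constant names are hypothetical.

```python
def interdigitate(root: str, pattern: str, slot: str = "_") -> str:
    """Plug the root letters, in order, into the slots of a vowel pattern."""
    if pattern.count(slot) != len(root):
        raise ValueError("pattern needs exactly one slot per root letter")
    letters = iter(root)
    return "".join(next(letters) if ch == slot else ch for ch in pattern)


PERFECT_FORM_I = "_a_a_a"       # 3rd sg. masc. perfect active, Form I
ACTIVE_PARTICIPLE = "_aa_i_un"  # Form I active participle

# Same pattern, different roots: same syntactic information, different meaning.
print(interdigitate("ktb", PERFECT_FORM_I))     # kataba ("he wrote")
print(interdigitate("nsr", PERFECT_FORM_I))     # nasara ("he helped")

# Same root, different patterns: related meaning, different syntactic information.
print(interdigitate("ktb", ACTIVE_PARTICIPLE))  # kaatibun ("writer")
```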

This information can be seen in the following figures. Figure 2.1 shows how different sets of root letters plugged into the same vowel pattern generate different verbs with the same syntactic information. Similarly, Figure 2.2 shows how the same set of root letters plugged into different vowel patterns generates two lexemes with completely different syntactic information. At the same time, these two lexemes share related semantic meaning.

Besides the vowel pattern, a particular verb type depends on the root class. The root class is determined on the basis of the phonological characteristics of the root letters. Root classes can be categorized on the basis of the number of root letters, the position or existence of vowels among these root letters, and the existence of gemination (tashdeed). Most Arabic verbs are generated from triliteral and quadriliteral roots; in Modern Standard Arabic, five-letter roots are obsolete. Phonological and morphophonemic rules can be applied to various kinds of sound and irregular roots. Among these root classes, the sound root class is the simplest, and it is easy to categorize its morphological information. A sound root consists of three consonants, all of which are different [37]. On the other hand, non-sound root classes are categorized into several subtypes depending on the position of weak letters (i.e., vowels) and on gemination or hamza (ء). All these subtypes carry morphological information.

[Figure: Root (k,t,b) + Pattern (_a_a_a) → stem kataba (He wrote); Root (n,s,r) + Pattern (_a_a_a) → stem nasara (He helped)]

Figure 2.1: Root-pattern morphology 1: 3rd person singular masculine sound perfect active Form I verb formation from the same pattern (_a_a_a)
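The root-class distinctions described above can be approximated by a simple classifier over triliteral roots. This is a rough sketch under stated assumptions: radicals are single ASCII letters, the weak letters are taken to be w and y, an apostrophe stands for hamza, and gemination is detected as identical second and third radicals. The labels are descriptive placeholders rather than the terminology used later in this thesis.

```python
from typing import List

WEAK_LETTERS = {"w", "y"}  # assumed ASCII stand-ins for the weak letters
HAMZA = "'"                # assumed ASCII stand-in for hamza
POSITIONS = ("initial", "medial", "final")


def classify_root(root: str) -> List[str]:
    """Return descriptive root-class labels for a triliteral root."""
    if len(root) != 3:
        raise ValueError("this sketch handles triliteral roots only")
    labels = [f"weak-{POSITIONS[i]}" for i, letter in enumerate(root)
              if letter in WEAK_LETTERS]
    if HAMZA in root:
        labels.append("hamzated")
    if root[1] == root[2]:
        labels.append("geminated")
    if labels:
        return labels
    # Sound root: three consonants, all of which are different.
    return ["sound"] if len(set(root)) == 3 else ["unclassified"]


for r in ["ktb", "wjd", "qwl", "rmy", "mdd", "'kl"]:
    print(r, classify_root(r))
```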

2.2.2 Morphology in Arabic Verb and Verbal Noun

From any particular sequence of root letters (i.e., triliteral, quadriliteral, weak or sound), up to fifteen different verb stems may be derived, each with its own template or vowel pattern. These stems carry different semantic information. Western scholars usually refer to these forms as Form I, II, . . . , XV. Forms XI to XV are rare in Classical Arabic and even rarer in Modern Standard Arabic; they are discussed in detail in [37]. Table 2.1 shows the semantic effect and examples of the most commonly used verb forms (i.e., Form I to X). Not every sequence of root letters yields a meaningful word for every verb form. For example, the root sequence k, t, b does not have a meaningful word for Form IX.

[Figure: Root (k,t,b) + Pattern (_a_a_a) → stem kataba (He wrote); Root (k,t,b) + Pattern (_aa_i_un) → kaatibun (Writer)]

Figure 2.2: Root-pattern morphology 2: the same root (k,t,b) carries the same kind of semantic meaning

These morphological verb forms has no relation with the verb form based on events

structure. There are three type of verb form based on event structure - perfect, imperfect

and imperative. Perfect indicates that the event has been completed, imperfect indicates

that the event has not yet been completed, and imperative indicates that the event is a

command. It is worth mentioning that Form I has eight subtypes depending on the vowel

following the middle letter in perfect and imperfect forms. Some types of verbal noun

formation depend on these subtypes. Any combination of root letters for a Form I verb follows one of these eight patterns, which we refer to as Form IA, IB, IC, . . ., IH. These subtypes are shown in Table 2.2 with corresponding examples. For example, the vowels on the middle letter for Form IA (naṣara yanṣuru) are a and u for the perfect and imperfect forms, respectively. Similarly, the other subtypes depend on the combination of vowels in these two positions, although not all combinations exist. In Form IH, the middle letter is a long vowel and carries no short vowel. In summary, we can generate different types of verbal nouns based on these verb forms, the root types (position of a weak letter or gemination) and the number of root letters.

Table 2.1: Arabic Verb Form

Form | Example | Meaning
Form I (Transitive) | kataba | He wrote
Form II (Causative) | kattaba | He caused to write
Form III (Ditransitive) | kātaba | He corresponded
Form IV (Factitive) | aktaba | He dictated
Form V (Reflexive) | takattaba | It was written on its own
Form VI (Reciprocity) | takātaba | They wrote to each other
Form VII (Submissive) | inkataba | He was subscribed
Form VIII (Reciprocity) | iktataba | They wrote to each other
Form IX (Color or bodily defect) | iḥmarra | It turned red
Form X (Control) | istaktaba | He asked to write

Table 2.2: Subtype of Form I Root Verb

Form | Example (perfect, imperfect) | Perfect mid-vowel | Imperfect mid-vowel
Form IA | naṣara, yanṣuru | a | u
Form IB | ḍaraba, yaḍribu | a | i
Form IC | fataḥa, yaftaḥu | a | a
Form ID | samiʿa, yasmaʿu | i | a
Form IE | karuma, yakrumu | u | u
Form IF | ḥasiba, yaḥsibu | i | i
Form IG | faḍula, yafḍilu | u | i
Form IH | kāda, yakādu | long vowel | long vowel
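The vowel patterns in Table 2.2 can be read as a small interdigitation procedure: the three root letters are slotted into a perfect template C1aC2VC3a and an imperfect template yaC1C2VC3u, with the mid-vowel V taken from the subtype. The sketch below is only an illustration under stated assumptions: it uses ASCII transliteration (s for ṣ, d for ḍ, h for ḥ, ' for ʿ), the names FORM_I_SUBTYPES and form_i are hypothetical, and it covers only the strong subtypes IA to IG, since Form IH has a long middle vowel.

```python
from typing import Sequence, Tuple

# Mid-vowels (perfect, imperfect) for the strong Form I subtypes of Table 2.2.
# Form IH (long middle vowel, e.g. kaada / yakaadu) is deliberately left out.
FORM_I_SUBTYPES = {
    "IA": ("a", "u"),  # nasara / yansuru
    "IB": ("a", "i"),  # daraba / yadribu
    "IC": ("a", "a"),  # fataha / yaftahu
    "ID": ("i", "a"),  # sami'a / yasma'u
    "IE": ("u", "u"),  # karuma / yakrumu
    "IF": ("i", "i"),  # hasiba / yahsibu
    "IG": ("u", "i"),  # fadula / yafdilu
}


def form_i(subtype: str, root: Sequence[str]) -> Tuple[str, str]:
    """Return the (perfect, imperfect) citation forms for a strong triliteral root."""
    c1, c2, c3 = root  # exactly three root letters are expected
    perf_v, imperf_v = FORM_I_SUBTYPES[subtype]  # KeyError for unsupported subtypes such as IH
    perfect = f"{c1}a{c2}{perf_v}{c3}a"       # C1 a C2 V C3 a
    imperfect = f"ya{c1}{c2}{imperf_v}{c3}u"  # ya C1 C2 V C3 u
    return perfect, imperfect


if __name__ == "__main__":
    for sub, root in [("IA", "nsr"), ("IB", "drb"), ("ID", ("s", "m", "'"))]:
        print(sub, form_i(sub, root))
```

Keeping the subtype as a table of two vowels mirrors the observation in the text that the subtypes differ only in the vowels of these two positions.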

All these verb stems, derived from a single root verb, have different verbal nouns. Table 2.3 lists the active and passive participles for all verb stems of the root verb kataba. Not every type of verbal noun exists for every form; for example, in Table 2.3 the passive participle does not exist for Form IX.
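The regularity behind Table 2.3 can be made explicit with slot templates in which the digits 1, 2 and 3 stand for the first, second and third root letter. The sketch is an illustration under stated assumptions: the template table and the names PARTICIPLE_TEMPLATES, fill and participles are hypothetical, long vowels are written double in ASCII (kaatibun for kātibun, maktuubun for maktuwbun), and phonological adjustments such as assimilation in Form VIII are ignored.

```python
from typing import Optional, Sequence, Tuple

# (active, passive) slot templates per verb form; digits 1-3 mark root letters.
PARTICIPLE_TEMPLATES = {
    "I": ("1aa2i3un", "ma12uu3un"),
    "II": ("mu1a22i3un", "mu1a22a3un"),
    "III": ("mu1aa2i3un", "mu1aa2a3un"),
    "IV": ("mu12i3un", "mu12a3un"),
    "V": ("muta1a22i3un", "muta1a22a3un"),
    "VI": ("muta1aa2i3un", "muta1aa2a3un"),
    "VII": ("mun1a2i3un", "mun1a2a3un"),
    "VIII": ("mu1ta2i3un", "mu1ta2a3un"),
    "IX": ("mu12a33un", None),  # no passive participle (N/A in Table 2.3)
    "X": ("musta12i3un", "musta12a3un"),
}


def fill(template: str, root: Sequence[str]) -> str:
    """Replace each digit d in the template with the d-th root letter."""
    return "".join(root[int(ch) - 1] if ch.isdigit() else ch for ch in template)


def participles(form: str, root: Sequence[str]) -> Tuple[str, Optional[str]]:
    """Return (active participle, passive participle or None) for a verb form."""
    active, passive = PARTICIPLE_TEMPLATES[form]
    return fill(active, root), (fill(passive, root) if passive else None)


for form in PARTICIPLE_TEMPLATES:
    print(form, participles(form, "ktb"))
```

Apart from Form I, the active and passive templates differ only in the vowel before the third root letter (i versus a), which is the regularity the table displays.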

Table 2.3: Verbal Nouns Derived from Different Forms

Form | Verb Stem | Active Participle | Passive Participle
Form I | kataba | kātibun | maktuwbun
Form II | kattaba | mukattibun | mukattabun
Form III | kātaba | mukātibun | mukātabun
Form IV | aktaba | muktibun | muktabun
Form V | takattaba | mutakattibun | mutakattabun
Form VI | takātaba | mutakātibun | mutakātabun
Form VII | inkataba | munkatibun | munkatabun
Form VIII | iktataba | muktatibun | muktatabun
Form IX | iktabba | muktabbun | N/A
Form X | istaktaba | mustaktibun | mustaktabun

2.2.3 Classification of Arabic Verbal Nouns

In this part, we discuss the eight types of nouns derived from verbs [22]:

1. Gerund (ism maṣdar): names the action denoted by its corresponding verb.

2. Active participle (ism alfāʿil): entity that enacts the base meaning, i.e., the general actor.

3. Hyperbolic participle (ism almubalaġah): entity that enacts the base meaning exaggeratedly. It modifies the actor with the meaning that the actor performs the action excessively.

4. Passive participle (ism almafʿuwl): entity upon which the base meaning is enacted; it corresponds to the object of the verb.

5. Resembling participle (alṣifatu’lmušabbahah): entity enacting (or upon which is enacted) the base meaning intrinsically or inherently. It modifies the actor with the meaning that the actor performs the action inherently.

6. Utilitarian noun (ism alalah): entity used to enact the base meaning, i.e., the instrument used to conduct the action.

7. Locative noun (ism alẓarf): time or place at which the base meaning is enacted.

8. Comparative and superlative (ism altafdil): entity that enacts (or upon whom is enacted) the base meaning the most. In Arabic, this type of word is categorized as a noun, but it is similar to an English adjective.

Examples of these eight types of verbal nouns are presented in Table 2.4. Each of these types can be subcategorized on the basis of the types of verbs. To understand the complete variation of the verb and its morphology, some preliminary knowledge of the Arabic verb is needed [20].

Table 2.4: Different Types of Verbal Nouns

Root verb: ʿalima, meaning “he knew”

Verbal noun | Meaning
Gerund | “Knowing”
Active participle | “One who knows”
Hyperbolic participle | “One who knows a lot”
Passive participle | “That which is known” (maʿluwmun)
Resembling participle | “One who knows intrinsically”
Utilitarian noun | “Through which we know”
Locative noun | “Where/when we know”
Comparative and superlative | “One who knows the most”


2.3 An HPSG Primer

HPSG is a highly lexicalized, non-derivational, constraint-based, surface-oriented grammatical architecture developed by Carl Pollard and Ivan Sag [32, 33]. It combines the best ideas from its predecessors: Generalized Phrase Structure Grammar (GPSG) [15], Lexical Functional Grammar (LFG) [6] and Government and Binding theory (GB) [8]. It integrates the linguistic layers (phonology, morphology, syntax, semantics, context, etc.), and for this reason it is very attractive for natural language processing. Its highly lexicalized nature gives the flexibility to adapt the lexicon to a particular language in order to capture different features. A lexical entry, represented as an AVM (Attribute Value Matrix), may describe the sign only partially. Each lexical entry must have a type, and its subtypes are part of a large structure that forms the type hierarchy. Thus, HPSG can be seen as consisting of an inheritance hierarchy of sorts, with constraints of various kinds on the sorts of linguistic objects in the hierarchy [16]. There is no distinction between terminal and non-terminal nodes in HPSG. This is related to the fact that HPSG is “fractal” (a fractal is a rough or fragmented geometric shape that can be split into parts, each of which is, at least approximately, a reduced-size copy of the whole): every sign, down to the word level, has syntactic, semantic and phonological features encoded in a similar manner [31]. Thus, we can work on a specific level or surface of this hierarchy and use unification to reuse and extend the structure.
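Unification, as mentioned above, merges two partial descriptions into one that carries the information of both, or fails if they conflict. The following minimal sketch assumes untyped feature structures encoded as nested Python dicts with string atoms; it ignores types and structure sharing (reentrancy), which a real HPSG system such as TRALE handles. The function name unify is our own.

```python
from typing import Optional, Union

# An atom is a string; a function feature structure is a dict from feature names
# to feature structures. None is reserved to signal unification failure.
FS = Union[str, dict]


def unify(a: FS, b: FS) -> Optional[FS]:
    """Return the unification of a and b, or None if they are incompatible."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy: the inputs are never modified
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:  # conflicting values for a shared feature
                    return None
                result[feature] = merged
            else:
                result[feature] = value  # information that only b provides
        return result
    if isinstance(a, str) and isinstance(b, str):
        return a if a == b else None  # atoms unify only with themselves
    return None  # an atom cannot unify with a function


noun = {"SYN": {"CAT": {"POS": "noun", "CASE": "nominative"}}}
definite = {"SYN": {"CAT": {"DEF": "yes"}}}
print(unify(noun, definite))
print(unify(noun, {"SYN": {"CAT": {"CASE": "genitive"}}}))  # None: CASE clash
```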

HPSG comprises grammar rules and lexical entries. In many grammatical frameworks, lexical entries are not considered part of the grammar proper; HPSG, by contrast, is centered around the lexicon. This means that the lexicon is more than just a list of entries: it is itself richly structured.

In HPSG terminology, the basic grammatical type is the sign, a formal representation of words, phrases and sentences. All human utterances are captured by signs. A rule that licenses a sign is captured by another object called a construct. Signs and constructs are formalized as typed feature structures, i.e., sets of attribute-value pairs. Attributes are called linguistic objects. The value of an attribute may be either atomic or complex, i.e., a function. Functions are those feature structures that are described using an attribute value matrix (AVM).

The generic structure of a sign is presented in Figure 2.3. The AVM essentially maps features to feature structures. A feature in an AVM can be of two types: (a) a category name, i.e., a sort description, and (b) agreement (or constraints), which is a list of attributes and their values.

Feature  Value
PHON     phon-obj    (Phonology)
MORPH    morph-obj   (Morphology)
SYN      syn-obj     (Syntax)
SEM      sem-obj     (Semantics)
. . .

Figure 2.3: An HPSG Sign.

A construct is represented using a feature structure with a MOTHER (MTR) feature and a DAUGHTERS (DTRS) feature. The value of MTR is a sign and the value of DTRS is a nonempty list of signs. A typical description of a construct is shown in Figure 2.4. The licensing of signs follows the Sign Principle, which states that “Every sign must be lexically or constructionally licensed. A sign is lexically licensed only if it satisfies some lexical entry, and constructionally licensed only if it is the mother of some construct” [39].
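A rough sketch of the Sign Principle, assuming untyped feature structures encoded as nested dicts: a sign is lexically licensed if it contains all the information of some lexical entry (a partial description), and constructionally licensed if it is the MTR of some construct. The helper names (subsumes, make_construct, licensed) and the toy data are hypothetical, and the sketch does not check recursively that the daughters of a construct are themselves licensed.

```python
from typing import Any, Dict, List


def subsumes(desc: Any, fs: Any) -> bool:
    """True if the feature structure fs contains all information in the description desc."""
    if isinstance(desc, dict):
        return isinstance(fs, dict) and all(
            feat in fs and subsumes(val, fs[feat]) for feat, val in desc.items()
        )
    return desc == fs  # atoms must match exactly


def make_construct(mtr: Dict, dtrs: List[Dict]) -> Dict:
    """Build a construct; DTRS must be a nonempty list of signs."""
    if not dtrs:
        raise ValueError("DTRS must be a nonempty list of signs")
    return {"MTR": mtr, "DTRS": list(dtrs)}


def licensed(sign: Dict, lexicon: List[Dict], constructs: List[Dict]) -> bool:
    """Simplified Sign Principle: lexically or constructionally licensed."""
    lexically = any(subsumes(entry, sign) for entry in lexicon)
    constructionally = any(cx["MTR"] == sign for cx in constructs)
    return lexically or constructionally


# Hypothetical toy data: one verb lexical entry and one derivational construct.
lexicon = [{"MORPH": {"ROOT": ("k", "t", "b")}, "SYN": {"CAT": "verb"}}]
kataba = {"MORPH": {"ROOT": ("k", "t", "b"), "STEM": "kataba"}, "SYN": {"CAT": "verb"}}
kaatibun = {"MORPH": {"ROOT": ("k", "t", "b"), "STEM": "kaatibun"}, "SYN": {"CAT": "noun"}}
constructs = [make_construct(kaatibun, [kataba])]

print(licensed(kataba, lexicon, constructs))    # True: satisfies the lexical entry
print(licensed(kaatibun, lexicon, constructs))  # True: mother of a construct
```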

Feature  Value
MTR      sign          (Mother)
DTRS     list(sign)    (List of Daughters)

Figure 2.4: An HPSG Construction.

HPSG modeling of any language starts with building a very detailed type hierarchy that is linguistically motivated and also captures language-independent constraints. From this type hierarchy, the attribute value matrices for linguistic signs can be constructed. In this thesis, we use the Sign-Based Construction Grammar (SBCG) [38] version of HPSG. Unlike standard presentations of HPSG, where the type constraints form part of the signature of a grammar, the type constraints of SBCG are an essential part of the body of the grammar. A standard SBCG type hierarchy is shown in Figure 2.5.

From the type hierarchy, we know that every linguistic object can be modeled as a feature structure. There are two types of feature structures. Atoms are simple feature structures, which indicate the terminal values of linguistic attributes. Functions are complex feature structures, which are expressed using attribute value matrices and can contain other feature structures as their feature values. Signs and cxt (constructs) are both feature structures. The values of the attributes of signs (phon-obj, syn-obj, sem-obj, etc.) are also feature structures. Frames are semantic representations of events. There are two types of constructions, phr-cxt (phrasal) and lex-cxt (lexical), and two types of signs, lex-sign and expression. For details of this type hierarchy, see [38].
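Multiple inheritance in such a hierarchy can be illustrated with a small parent table. The fragment below is our reading of the upper part of the hierarchy in Figure 2.5, following the SBCG convention that word is both a lex-sign and an expression; the table PARENTS and the functions supertypes and is_subtype are hypothetical, and the real hierarchy in [38] is much larger.

```python
from collections import deque
from typing import Dict, List, Set

# Fragment of an SBCG-style type hierarchy (type -> immediate supertypes).
PARENTS: Dict[str, List[str]] = {
    "feature-structure": [],
    "function": ["feature-structure"],
    "atom": ["feature-structure"],
    "sign": ["function"],
    "cxt": ["function"],
    "lex-sign": ["sign"],
    "expression": ["sign"],
    "lexeme": ["lex-sign"],
    "word": ["lex-sign", "expression"],  # multiple inheritance
    "phrase": ["expression"],
    "lex-cxt": ["cxt"],
    "phr-cxt": ["cxt"],
    "infl-cxt": ["lex-cxt"],
    "deriv-cxt": ["lex-cxt"],
}


def supertypes(t: str) -> Set[str]:
    """All types that t inherits from, including t itself (breadth-first search)."""
    seen, queue = {t}, deque([t])
    while queue:
        for parent in PARENTS[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def is_subtype(t: str, s: str) -> bool:
    """True if t equals s or inherits from s along any path."""
    return s in supertypes(t)


print(is_subtype("word", "expression"), is_subtype("lexeme", "expression"))
```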

In HPSG, semantic information is commonly expressed in Minimal Recursion Semantics (MRS), as developed in CSLI’s Linguistic Grammars Online (LinGO) project [10, 11]. Most semantic information in MRS is contained under the feature FRAMES. In this list, a verb has an event frame, event-fr, which contains a Davidsonian event variable and index-valued features such as act(or) and und(ergoer) [12, 13]. These variables also carry information used for agreement. Section 2.1.3 discusses these semantic agreements.


[Figure 2.5: A Standard SBCG type hierarchy, rooted in feature-structure (with subtypes function and atom) and showing, among others, the types sign, lex-sign, expression, word, phrase, lexeme, cxt, phr-cxt, lex-cxt, infl-cxt, deriv-cxt, pos (noun, verb), frame, event-fr, write-fr and cause-fr.]

2.4 Related Works

This section discusses related work on the linguistic modeling of morphology. We begin with an overview of work on the computational modeling of morphology and then focus on the HPSG modeling of morphology. As Semitic languages like Arabic, Amharic and Hebrew are rich in morphology, we give a glimpse of the HPSG modeling of Hebrew, on which a considerable amount of work has been done. At the end of this section, we discuss the HPSG modeling of the Arabic language and its morphology.

2.4.1 HPSG Modeling of Morphology

HPSG is one of the most successful grammars for processing natural languages, especially their syntactic and semantic aspects, but its coverage of morphological construction, especially nonconcatenative morphology, is inadequate. Nonconcatenative morphology is not plentiful in the most widely used languages, but it is abundant in Semitic languages such as Arabic, Amharic and Hebrew. Among these Semitic languages, Arabic is the most widely used and is very rich in nonconcatenative morphology. Its rich morphology has attracted several research projects [1, 7, 40]. These projects mainly focus on developing toolkits for Arabic morphological analysis. They are not aimed at compiler development; rather, they are dedicated to morphological analyzers that design and implement finite-state morphological models. From a linguistic perspective, these models describe rules of lexicon development and derive lexicons.

The morphology of Sierra Miwok and French was modeled in HPSG in terms of phonological realization [5]. The author also showed how nonconcatenative morphology can be captured in his framework, and he outlined how consonantal and vocalic melodies combine to form words in Arabic. However, he did not give construction rules for any language.

Susanne modeled concatenative morphology in German and English in the HPSG formalism in 1998 [35, 36]. In that work, she captured morphological derivation by a special feature called MORPH-B (morphological base). This MORPH-B feature serves the purpose of derivation and can also be used to capture nonconcatenative morphology. The alternative to this mechanism is the lexical construction rule [38], which is also widely used in HPSG modeling.

An HPSG formalism of morphologically complex predicates is outlined in [9]. There, the author mostly focused on the syntax and semantics of causative constructions, using lexical rules with semantic frames to capture the morphological effect. As Japanese, the language treated there, is agglutinative, the morphology involved is concatenative. Thus, HPSG modeling of nonconcatenative morphology remains largely unexplored.

As mentioned earlier, HPSG modeling of nonconcatenative morphology is a relatively new area of research, and there are few notable works on the nonconcatenative morphology of Semitic languages. We discuss them in detail in Section 2.4.2 and in the section that follows it.


2.4.2 HPSG Modeling of Hebrew

Semitic languages exhibit rich morphological operations; both concatenative and nonconcatenative morphology are abundant in these languages. Among them, HPSG modeling of Hebrew is not new, but it lacks coverage of morphology. In 2000, Nathan Vaillette presented a paper on Hebrew relative clauses [41], in which he carefully modeled the phrasal construction rules that capture Hebrew relative clauses. He did not put emphasis on morphological operations.

In 2001, Susanne extended her work on German and English concatenative morphology and, along with German and English, added the nonconcatenative morphology of Hebrew verbal nouns [36]. She proposed an AVM for the Hebrew verbal noun, which is similar to the AVM we propose for verbal nouns with respect to the morphological feature. However, she did not show any syntactic effect of this morphology. She articulated the AVM with placeholders for consonants; by filling in the list of root consonants, the AVM of a verbal noun is generated. She did not ensure that only valid verbal nouns are generated from this AVM. Her solution can be used to automate lexical entry in a dictionary or corpus, but it does not reduce the number of entries. In fact, her treatment of the morphology of Hebrew verbal nouns is only a small part of a much larger work.

A detailed study of verb-initial constructions (also called verbal sentences, as opposed to nominal sentences; in this type of sentence, the verb precedes the subject) was presented in [26]. In that work, the author focused on Modern Hebrew verb-related phrasal constructions. She discussed the agreement of the verb with its subject and complement. She also described the concatenative and nonconcatenative morphology of the Hebrew verb, but did not give a formalism for this morphology like those proposed for German or Japanese [9, 36]. She mainly discussed the syntactic effect of these inflected verb forms. She also presented an implementation framework for HPSG grammars.


In 2007, Nurit presented a comparison of implementation platforms for HPSG [27]. She discussed the advantages and disadvantages of TRALE (an extension of the Attribute Logic Engine) and the Linguistic Knowledge Builder (LKB). This paper is very useful for choosing an implementation platform for HPSG.

2.4.3 HPSG Modeling of Arabic

In 2006, an HPSG analysis of broken plurals and gerunds was presented [24]. The main assumption of that work revolves around Concrete Lexical Representations (CLRs), located between an HPSG type lexicon and phonological realization. There, the HPSG sign was represented using a CLR function rather than an AVM, and this function puts more emphasis on phonology than on morpho-syntactic operations. The main drawbacks of this work are that it does not deal with other types of verbal nouns and that it does not specify any implementation of CLR.

HPSG modeling of Arabic triliteral strong verbs was proposed in 2008 [2–4]. The authors of these papers describe the regular morphology of the Arabic verb. They designed the SBCG AVM of the Arabic verb, as well as several verb lexeme constructions and morphologically complex predicates (MCP). However, they did not address the morphological derivation of verbal nouns, nor did they give any concrete way to implement the constructs proposed in their works. In our work on verbal noun constructs, we also have to work with the SBCG verb lexeme. We adopt the verb lexeme proposed in these papers and modify it to cope with all the cases that we have found. The authors did not propose any treatment of SIT-INDEX and INDEX, and they duplicated the INDEX feature with a ref-fr semantic frame, which, to our knowledge, is not used elsewhere in the HPSG or SBCG literature. The atomic features (person, number and gender) that Pollard and Sag [33] place under the INDEX feature are placed under ref-fr in these papers, while the INDEX feature is still kept without its components being shown. We correct this INDEX and SIT-INDEX related problem, as discussed in Section 3.2.


A well-developed HPSG formalism of the Arabic nominal sentence is presented in [29]. The paper introduces a grammar for Arabic nominal sentences, which the authors implemented using the LKB system. The main limitation of this work is that it deals only with agreement in nominal sentences and does not discuss morphology at all. Another major limitation is its assumption that agreement information in Arabic arises from syntactic rules and obeys grammar rules. In Sections 2.1.3 and 3.1, however, we establish that agreement in Arabic is not always syntactic and that it requires an additional feature, humanness (HUM), which is not mentioned in that work.

A parser for Arabic relative clauses is designed in [17]. It is not an in-depth study but rather a survey of different forms of relative clauses for processing relative sentences. Thus, we conclude that the rich nonconcatenative morphology of Arabic verbal nouns has not yet been explored, and we take the opportunity to do so. Part of this work was published in 2010 [19]; in that paper, we proposed the construction rules but did not describe any implementation.

Chapter 3

HPSG Formalism for Verbal Noun

In this chapter, we model the HPSG categories of verbal nouns and their derivation from different types of verbs through the HPSG formalism. As mentioned in Section 2.3, we adopt SBCG [38] for this analysis. Here, we give an AVM for nouns and extend it for verbal nouns. We also extend the verb AVM proposed by Bhuyan et al. [2–4]. We propose a multiple inheritance hierarchical model for Arabic verbal nouns and show how to obtain a sort description from the type hierarchy. Finally, we propose construction rules for verbal nouns derived from strong triliteral, i.e., Form I, root verbs.

3.1 AVM of Arabic Nouns

We modify the SBCG feature geometry for English and adapt it for Arabic. The SBCG AVMs for nouns in English and in Arabic are shown in Figure 3.1 and Figure 3.2, respectively.

The PHON feature is out of the scope of this work. The three main function features, MORPH, SYN and SEM, are discussed in the following subsections.


noun-lex
  PHON     [ ]
  FORM     [ ]
  ARG-ST   list(sign)
  SYN      CAT    noun
                  CASE    . . .
                  SELECT  . . .
                  XARG    . . .
                  LID     . . .
           VAL    list(sign)
           MRKG   mrk
  SEM      INDEX   i
           FRAMES  list(frame)

Figure 3.1: AVM for English noun

3.1.1 MORPH

The MORPH feature captures the morphological information of signs and replaces the FORM feature of English AVMs. This feature is similar to the MORPH feature used for Hebrew verbal nouns [36]. The value of the feature FORM is a sequence of morphological objects (formatives); these are the elements that will be phonologically realized within the sign’s PHON value [38]. MORPH, on the other hand, is a function feature: it contains not only these phonologically realized elements but also their origins. MORPH contains three features: ROOT, STEM and DEC. The ROOT feature contains root letters in the following cases:

1. The root is characterized as a part of a lexeme and is common to a set of derived or inflected forms.

2. The root cannot be further analyzed into meaningful units when all affixes are removed.

3. The root carries the principal portion of the meaning of the lexeme.


noun-lex
  PHON     [ ]
  MORPH    ROOT   list(letter)
           STEM   list(letter)
           DEC    . . .
  ARG-ST   list(sign)
  SYN      CAT    noun
                  CASE    . . .
                  DEF     . . .
                  SELECT  . . .
                  XARG    . . .
                  LID     . . .
           VAL    list(sign)
           MRKG   mrk
  SEM      INDEX   PERSON  . . .
                   NUMBER  . . .
                   GENDER  . . .
                   HUM     . . .
           FRAMES  list(frame)

Figure 3.2: AVM for Arabic noun


In all other cases, this feature is empty.

The STEM feature contains a list of letters that make up the word, phrase or lexeme. If a root exists in STEM, we can identify the pattern of a lexeme by substituting placeholders for the root letters. For example, the ROOT of the lexeme ‘kataba’ contains ‘k’, ‘t’ and ‘b’, and the pattern of the STEM is (_a_a_a), where _ marks a root-letter placeholder. Without this pattern, the ROOT alone is irrelevant. Thus, a pattern bears the syntactic information and a ROOT bears the semantic information. Lexemes that share a common pattern must also share some common syntactic information; similarly, lexemes that share a common root must also share some common semantic information. If a root exists, STEM is derived from the root letters by morphological operations.
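The relation between ROOT, STEM and pattern described above can be sketched as two inverse operations: extracting the pattern by replacing the root letters in a stem with a placeholder, and filling a pattern with another root. This is an illustration under stated assumptions: letters are single ASCII characters, _ is our placeholder symbol, the function names are hypothetical, and the left-to-right matching can misalign a root when an affix contains a letter identical to an earlier root letter.

```python
from typing import Sequence

SLOT = "_"  # our placeholder for a root letter (hypothetical notation)


def extract_pattern(stem: str, root: Sequence[str], slot: str = SLOT) -> str:
    """Replace the root letters of stem, matched left to right, by slot."""
    out, i = [], 0
    for letter in stem:
        if i < len(root) and letter == root[i]:
            out.append(slot)  # this position belongs to the root
            i += 1
        else:
            out.append(letter)  # this position belongs to the pattern
    if i != len(root):
        raise ValueError(f"root {root!r} not found in stem {stem!r}")
    return "".join(out)


def apply_pattern(pattern: str, root: Sequence[str], slot: str = SLOT) -> str:
    """Fill the slots of a pattern with the root letters, in order."""
    if pattern.count(slot) != len(root):
        raise ValueError("number of slots and root letters differ")
    letters = iter(root)
    return "".join(next(letters) if ch == slot else ch for ch in pattern)


print(extract_pattern("kataba", "ktb"))            # _a_a_a
print(apply_pattern("_aa_i_un", ("d", "r", "s")))  # daarisun
```

The stem kaatibun (as in the STEM of Figure 3.6) yields the pattern _aa_i_un, which can then be reused with any other strong triliteral root.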

The DEC (declension type) feature under MORPH maps to the declension type of the noun. It determines how the final vowel of a noun lexeme changes to reflect its case; this change of the final vowel changes the form of the lexeme. There are nine possible ways in which grammatical case can be represented on an Arabic noun. So, for a declinable noun, the value of the DEC feature can be T1, T2, T3, . . ., T9, corresponding to the nine declension types. The value of DEC can be determined from the type hierarchy of noun lexemes; this needs further research and is beyond the scope of this thesis. In our current research, we do not mention this feature in the following AVMs, but we keep it in our basic design to make the design robust for inflection as well.

3.1.2 SYNTAX

The SYN feature contains the CAT, VAL and MRKG features. We modify the CAT feature of SBCG to adapt it to Arabic. Note that for all kinds of verbal nouns, the sort description of the CAT feature is noun. In Arabic, there are only three parts of speech (POS) for lexemes or words: noun (pronouns are also considered nouns in Arabic), verb and particle. Any verbal noun serving as a modifier is also treated as a noun. In the case of the Arabic noun, the CAT feature consists of the CASE, DEF, SELECT, XARG and LID features. Among these features, we introduce the DEF feature, which is used for syntactic agreement in phrasal constructions and also strengthens our design. As Arabic has three cases for nouns, the value of CASE can be nominative, accusative or genitive.

The DEF feature denotes the definiteness of an Arabic noun. There are eight ways by which a noun word or lexeme becomes definite [21]. Personal pronouns such as “he”, “I” and “you” are inherently definite. Proper nouns are also definite. The word allāhu (“God”) is another instance of a definite lexeme. These examples confirm that definiteness has to be specified at the lexeme level. The article ‘al’ also expresses the definite state of a noun of any gender and number. Thus, if a noun is definite, the noun lexeme has yes as the value of DEF; otherwise, its value is no.

In Arabic, the definiteness (DEF) feature plays a significant role in syntactic agreement: a noun and its modifier must agree on the value of DEF. For example, alkitabu ’l-ʾaḥmaru means “the red book”, where alkitabu means “the book” and ʾaḥmaru means “red”. As “red” is used as a modifier of “the book”, the definiteness prefix ‘al’ has been added to ʾaḥmaru, yielding al-ʾaḥmaru.
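The DEF agreement just illustrated can be sketched as a check on the head noun and its modifier, together with a helper that adds the article. The dictionary encoding, the ASCII forms (the initial hamza of ʾaḥmaru is omitted) and the helper names are assumptions for illustration; Arabic orthography and phonology, such as sun-letter assimilation, are not modeled.

```python
from typing import Dict

Word = Dict[str, str]  # FORM plus the DEF value ("yes" or "no")


def agrees_on_def(noun: Word, modifier: Word) -> bool:
    """A noun and its modifier must share the same DEF value."""
    return noun["DEF"] == modifier["DEF"]


def make_definite(word: Word) -> Word:
    """Prefix the article 'al-' and set DEF to yes (no sun-letter assimilation)."""
    if word["DEF"] == "yes":
        return dict(word)
    return {**word, "FORM": "al-" + word["FORM"], "DEF": "yes"}


the_book = {"FORM": "al-kitabu", "DEF": "yes"}
red = {"FORM": "ahmaru", "DEF": "no"}

print(agrees_on_def(the_book, red))                 # False
print(make_definite(red)["FORM"])                   # al-ahmaru
print(agrees_on_def(the_book, make_definite(red)))  # True
```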

3.1.3 SEMANTICS

As in SBCG for English, the SEM feature in Arabic contains two function features, INDEX and FRAMES. INDEX is used for the index-based semantic agreement mentioned in Section 2.1.3, and FRAMES contains the list of frames that carry semantic information in Minimal Recursion Semantics (MRS).


As mentioned in Section 2.1.3, person, number, gender and humanness must be kept for semantic agreement. So, the INDEX feature is composed of PERSON, NUMBER, GENDER and HUM, and it is contained under SEM. We use index-based agreement [33] rather than putting the agreement features under an AGR feature [23], because index-based agreement is more customary in HPSG and is used by most scholars.

We introduce the HUM feature for Arabic; the other three features are also used for semantic agreement in English [33]. HUM denotes humanness. Depending on the language, agreement may involve gender, human/non-human, animate/inanimate or shape features [33]. In Arabic, humanness is a crucial grammatical factor for predicting certain kinds of plural formation and for agreement with other components of a phrase or clause within a sentence. The grammatical criterion of humanness applies only to nouns in the plural form. As an example, consider “these boys are intelligent” (haʾulāʾ alawlādu ʾaḏkiyāʾ) and “these birds are intelligent” (haḏihi ’ltuyuwru ḏakiyyatun). In both sentences, the subject is plural, but the former refers to human beings whereas the latter refers to non-humans. So the same word “intelligent” (ḏakiyyun) takes two different forms in the two sentences: ʾaḏkiyāʾ and ḏakiyyatun. In the case of boys, it is in the third person masculine plural form (ʾaḏkiyāʾ), whereas in the case of birds, it is in the third person feminine singular form (ḏakiyyatun). Moreover, from the third person feminine singular form (ḏakiyyatun), we cannot readily say that it refers to something feminine; in fact, it may also refer to a plural of nonhuman beings. This is why, along with PERSON, NUMBER and GENDER, we keep HUM as a semantic agreement feature.


If the noun refers to a human being, the value of HUM is yes; otherwise, it is no. The value of PERSON for an Arabic noun can be 1st, 2nd or 3rd. There are three number values in Arabic, so the value of NUMBER can be sg, dual or pl, denoting singular, dual or plural, respectively. The GENDER feature contains either masc or fem, denoting masculine and feminine, respectively.
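The role of HUM in the “intelligent” example can be captured by a small agreement function over these values. This is a simplified sketch that covers only the two cases discussed above (human versus nonhuman plural subjects) and leaves all other combinations to plain feature copying; the function name and the dict encoding are hypothetical, and we assume, for illustration, that the noun for “birds” is grammatically masculine.

```python
from typing import Dict

Index = Dict[str, str]  # PERSON, NUMBER, GENDER and HUM values


def predicate_agreement(subject: Index) -> Dict[str, str]:
    """Agreement features shown by a predicate adjective, in this simplified model."""
    if subject["NUMBER"] == "pl" and subject["HUM"] == "no":
        # Nonhuman plural subject: the adjective is 3rd person feminine singular.
        return {"PERSON": "3rd", "NUMBER": "sg", "GENDER": "fem"}
    # Otherwise the adjective copies PERSON, NUMBER and GENDER from the subject.
    return {f: subject[f] for f in ("PERSON", "NUMBER", "GENDER")}


boys = {"PERSON": "3rd", "NUMBER": "pl", "GENDER": "masc", "HUM": "yes"}
birds = {"PERSON": "3rd", "NUMBER": "pl", "GENDER": "masc", "HUM": "no"}
print(predicate_agreement(boys))   # masculine plural, as in 'adkiyaa'
print(predicate_agreement(birds))  # feminine singular, as in dakiyyatun
```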

3.2 AVM of Arabic Verbs

As we will formulate construction rules that capture the linguistic derivation of nouns from verbs, we need to model the AVM of the verb. We modify the verb AVM proposed by Bhuyan et al. [2] and correct the index-related problem found in that work, which is discussed in detail in Section 2.4.3. We try to align the design of the verb AVM with that of the noun AVM. Figure 3.3 shows the SBCG AVM of the Arabic verb.

verb-lex
  PHON     [ ]
  MORPH    ROOT   list(letter)
           STEM   list(letter)
           VDEC   list(letter)
  ARG-ST   list(sign)
  SYN      CAT    verb
                  VFORM   . . .
                  VOICE   . . .
                  MOOD    . . .
                  SELECT  . . .
                  XARG    . . .
           VAL    list(sign)
           MRKG   mrk
  SEM      SIT-INDEX  SITUATION  . . .
           FRAMES     list(frame)

Figure 3.3: AVM for Arabic verb


The MORPH feature in the verb AVM is similar to MORPH in the noun AVM except for the VDEC feature. VDEC captures the declension type of the verb and replaces the DEC feature, which captures the declension type of the noun. Like DEC, it determines how the final vowel of a lexeme changes; for verbs, the change reflects the mood. This change of the final vowel changes the form of a verb lexeme. There are five possible ways in which grammatical mood can be represented on an Arabic verb. So, for a declinable verb, the value of the VDEC feature can be VT1, VT2, VT3, . . ., VT5, corresponding to the five declension types. The value of VDEC can be determined from the type hierarchy of verb lexemes; this needs further research and is beyond the scope of this thesis. In our current research, we do not mention this feature in the following verb AVMs, but we keep it in our basic design to make the design robust for inflection as well.

SYN in this AVM is the same as in the standard SBCG verb AVM. VFORM contains the verb form type. In Arabic, there are three verb forms, so the value of VFORM can be perfect, imperfect or imperative. Perfect indicates that the event has been completed, imperfect indicates that the event has not yet been completed, and imperative indicates that the event is a command.

There are two voices in Arabic, active and passive, so the value of the VOICE feature can be either active or passive. The value of MOOD can be indicative, subjunctive or jussive.
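Because VFORM, VOICE and MOOD each have a closed set of values, they behave like atomic sorts in the grammar signature. A minimal Python rendering, assuming enumerations as a stand-in for a TRALE type declaration (the class and function names are ours, and the other CAT features are omitted), rejects any value outside these sets:

```python
from dataclasses import dataclass
from enum import Enum


class VForm(Enum):
    PERFECT = "perfect"        # the event has been completed
    IMPERFECT = "imperfect"    # the event has not yet been completed
    IMPERATIVE = "imperative"  # the event is a command


class Voice(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"


class Mood(Enum):
    INDICATIVE = "indicative"
    SUBJUNCTIVE = "subjunctive"
    JUSSIVE = "jussive"


@dataclass(frozen=True)
class VerbCat:
    """The verb CAT values discussed above (SELECT, XARG, etc. omitted)."""
    vform: VForm
    voice: Voice
    mood: Mood


def verb_cat(vform: str, voice: str, mood: str) -> VerbCat:
    """Build a VerbCat from strings; Enum lookup raises ValueError for unknown values."""
    return VerbCat(VForm(vform), Voice(voice), Mood(mood))


print(verb_cat("perfect", "active", "indicative"))
```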

Like SYN, the SEM feature in this AVM is the same as in the SBCG English verb AVM. SIT-INDEX, i.e., the situation index, is used for index-based semantic agreement. SBCG does not make any distinction between INDEX and SIT-INDEX, nor does it give a feature description of SIT-INDEX. We treat SIT-INDEX as a function feature, which currently has only one atomic attribute, SITUATION, containing the name of the verb. SIT-INDEX is used in the event frames of verb and verbal noun lexemes; ultimately, it is therefore very similar to a Davidsonian event variable [12].

As in the noun AVM, FRAMES contains the list of frames that carry semantic information in Minimal Recursion Semantics (MRS). These frames contain indices of both kinds, INDEX and SIT-INDEX.

3.3 Type Hierarchy of Verbal Noun

As mentioned in Section 2.2, the derivation of verbal nouns from verbs depends on the number of root letters, the verb form and the root type. Figure 3.4 gives a type hierarchy of Arabic verbal nouns.

noun-lex
  . . .  DERIVATION  . . .
    non-derived
    derived
      . . .
      verb-derived-noun
        gerund
        active-participle
        hyperbolic-participle
        passive-participle
        resembling-participle
        locative-noun
        utilitarian-noun
        comparative

Figure 3.4: Lexical type hierarchy of Arabic noun lexeme.

We analyze the Arabic noun from a verbal noun perspective, so we classify noun lexemes only along the derivation dimension. Other possible dimensions are the word-ending type or the declension type of noun lexemes. As shown in Figure 3.4, the eight types of verbal nouns are immediate daughters of verb-derived-noun. Each of these eight verbal nouns can be subcategorized on the basis of the properties of the root verb mentioned in Section 2.2. Each verb carries distinct information on these properties, which form the dimensions of classification for verbs. So, the three dimensions for root verbs are the number of root letters, the type of root and the verb form. For lack of space, we discuss in detail only the subtypes of active participles.

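active-participle
  NUMBER OF ROOT LETTER:  triliteral-root-derived
  TYPE OF ROOT:           sound-root-derived
  VERB FORM:              formI-derived, formII-derived
  Cross-classified subtypes:
    formI-triliteral-sound-active-participle
      (triliteral-root-derived, sound-root-derived, formI-derived)
    formII-triliteral-sound-active-participle
      (triliteral-root-derived, sound-root-derived, formII-derived)

Figure 3.5: Lexical type hierarchy of Active participle.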

In Figure 3.5, active-participle is at the root. Categorizing it along the number of letters in the root verb, we get two types of active participles, derived from triliteral and quadriliteral root verbs. Classifying the active participle along the root type, we find several types of roots and thus of verbal nouns. Categorizing along the verb form dimension, we get Form I, . . ., Form X active participles. Categories in one dimension cross-classify with categories in the other dimensions and form subtypes such as form-I-triliteral-sound-active-participle, form-I-triliteral-sound-passive-participle, form-I-triliteral-sound-gerund, etc. Not all forms generate all types of verbal nouns, i.e., some forms lack verbal nouns of certain types. For example, locative nouns are generated from triliteral Form I root verbs only, so for this type of verbal noun, classifying along other forms does not generate any new type.
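The cross-classification in Figure 3.5, together with the gap for locative nouns, can be sketched as a Cartesian product over the three dimensions, filtered by an existence predicate. The function cross_classify is hypothetical and only generates type names with their immediate supertypes; the value lists are the ones named in the text, not a complete inventory.

```python
from itertools import product
from typing import Callable, Dict, List


def cross_classify(
    base: str,
    forms: List[str],
    letters: List[str],
    root_types: List[str],
    exists: Callable[[str, str, str], bool] = lambda f, n, r: True,
) -> Dict[str, List[str]]:
    """Map each cross-classified subtype of base to its immediate supertypes."""
    subtypes = {}
    for form, n, rt in product(forms, letters, root_types):
        if not exists(form, n, rt):
            continue  # this combination yields no verbal noun of this type
        name = f"{form}-{n}-{rt}-{base}"
        subtypes[name] = [f"{n}-root-derived", f"{rt}-root-derived", f"{form}-derived"]
    return subtypes


active = cross_classify("active-participle", ["formI", "formII"], ["triliteral"], ["sound"])
locative = cross_classify(
    "locative-noun", ["formI", "formII"], ["triliteral", "quadriliteral"], ["sound"],
    exists=lambda f, n, r: f == "formI" and n == "triliteral",  # Form I triliteral only
)
print(sorted(active))
print(sorted(locative))
```

Each generated type inherits from exactly one type per dimension, which is what makes the hierarchy a multiple inheritance hierarchy rather than a tree.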


3.4 Construction Rules for Verbal Nouns

As mentioned in Section 2.2.3, there are eight types of verbal nouns: gerund, active participle, hyperbolic participle, passive participle, resembling participle, utilitarian noun, locative noun and comparative participle. We have developed construction rules for the active participle, passive participle, locative noun and comparative participle derived from Form I strong root verbs, i.e. strong triliteral root verbs with no extra letters. For the other categories of verbal nouns, we found that exhaustive lexical entries are required.

All eight types of verbal nouns are derived from strong Form I root verbs. In contrast, only the gerund, active participle and passive participle are derived from quadriliteral root verbs or weak verbs, and these derivation patterns are much less regular. They therefore require further research.

3.4.1 Active Participle

A sample AVM for an active participle is shown in Figure 3.6. All of its features have been discussed before. In this example, the event frame is write-fr, the frame for the event of writing.

Throughout this formalism, we use an event frame for verbs and verbal nouns to capture their semantic content. The event frame takes an event (situational) index variable (SIT) and index-valued features such as actor, undergoer, instrument and location. In the case of write-fr, the event frame contains three indices: one for the action or event (SIT), one for the actor (ACTOR) and one for the undergoer of the action (UNDGR), i.e. the object of the verb.
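To make the shape of an event frame concrete, here is a small Python sketch that models one as a record. It has a situational index and optional index-valued roles. The class name, field names and the use of strings for index variables are assumptions made for illustration. This is not how TRALE represents typed feature structures.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class EventFrame:
    """Illustrative stand-in for an HPSG event frame (not a TRALE structure)."""
    name: str                          # frame type, e.g. "write-fr"
    sit: str                           # situational index (SIT)
    actor: Optional[str] = None        # ACTOR index
    undgr: Optional[str] = None        # UNDGR (undergoer) index
    location: Optional[str] = None     # LOCATION index
    instrument: Optional[str] = None   # INSTRUMENT index

    def roles(self) -> Dict[str, str]:
        """Return only the index-valued roles that are filled."""
        candidates = {
            "ACTOR": self.actor,
            "UNDGR": self.undgr,
            "LOCATION": self.location,
            "INSTRUMENT": self.instrument,
        }
        return {role: idx for role, idx in candidates.items() if idx is not None}


# write-fr for 'kataba': one situational index and two role indices.
write_fr = EventFrame("write-fr", sit="s", actor="i", undgr="j")
print(write_fr.sit, write_fr.roles())  # s {'ACTOR': 'i', 'UNDGR': 'j'}
```

Leaving the unused roles empty mirrors the idea that each frame fills only the indices relevant to the event.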

We do not store this AVM as a lexical entry. Instead, our lexical construction rules recognize it from the AVM of the root verb in Figure 3.7, and the construction rule in Figure 3.8 does this job. Because we use the SBCG version of HPSG, the construction rule contains two parts: MTR, which contains the AVM of the verbal noun, and DTRS, which contains the AVM of the base verb. This rule demonstrates how a Form I triliteral sound active participle is recognized from the lexeme of a Form I triliteral sound root verb.

kaatibun-form-I-triliteral-sound-active-participle-lex
[ MORPH   [ ROOT ⟨k, t, b⟩
            STEM ⟨k, a, a, t, i, b, u, n⟩ ]
  ARG-ST  ⟨ ⟩
  SYN     [ CAT  [ noun, CASE nominative, DEF no,
                   SELECT none, XARG none, LID none ]
            VAL  ⟨ ⟩
            MRKG none ]
  SEM     [ INDEX  i [ PERSON 3rd, NUMBER sg, GENDER masc, HUM yes ]
            FRAMES ⟨ write-fr [ SIT [ SITUATION writing ], ACTOR i ] ⟩ ] ]

Figure 3.6: AVM for active participle

kataba-form-IA-triliteral-sound-active-perfect-3rd-sg-masc-verb-lex
[ PHON    [ ]
  MORPH   [ ROOT ⟨k, t, b⟩
            STEM ⟨k, a, t, a, b, a⟩ ]
  ARG-ST  ⟨ [1] [ SYN [ CAT [ noun, CASE accusative ] ]
                  SEM [ INDEX j ]
                  OPT − ] ⟩
  SYN     [ CAT  [ verb, VFORM perfect, VOICE active, MOOD indicative,
                   SELECT none, XARG none, LID none ]
            VAL  ⟨ [1] ⟩
            MRKG none ]
  SEM     [ SIT-INDEX s [ SITUATION writing ]
            FRAMES ⟨ write-fr [ SIT s
                                ACTOR i [ PERSON 3rd, NUMBER sg, GENDER masc, HUM yes ]
                                UNDGR j
                                LOCATION k
                                INS m ] ⟩ ] ]

Figure 3.7: AVM for a sample root verb

form-I-triliteral-sound-active-participle-lex-cxt
[ MTR   form-I-triliteral-sound-active-participle-lex
        [ MORPH  [ STEM ⟨[1], a, a, [2], i, [3], u, n⟩ ]
          ARG-ST ⟨ ⟩
          SYN    [ CAT [ noun, CASE nominative ], VAL ⟨ ⟩ ]
          SEM    [ INDEX i
                   FRAMES ⟨ event-fr [ SIT s, ACTOR i ] ⟩ ] ]
  DTRS  ⟨ form-I-triliteral-sound-active-perfect-3rd-sg-masc-verb-lex
          [ MORPH [ STEM ⟨[1], a, [2], VOWEL, [3], a⟩ ]
            SYN   [ CAT [ verb, VFORM perfect, VOICE active ] ]
            SEM   [ SIT-INDEX s
                    FRAMES ⟨ event-fr [ SIT s, ACTOR i ] ⟩ ] ] ⟩ ]

Figure 3.8: Lexical rule for active participle construction

The construction rule contains three placeholders for the three root letters. Thus the same rule recognizes an active participle generated from the letters ‘k’, ‘t’ and ‘b’ as well as one generated from ‘n’, ‘s’ and ‘r’. In other words, the active participles ‘kaatibun’, ‘naasirun’, ‘saajidun’ and ‘saamiun’ can all be recognized by this rule.
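Placeholder filling can be illustrated with a short Python sketch. In it, integers stand for the placeholders of the mother STEM in Figure 3.8, and a root is a three-letter string in the document's Latin transliteration. The function name `fill` and this encoding are assumptions for illustration only.

```python
# Active participle stem pattern of Figure 3.8: <1, a, a, 2, i, 3, u, n>.
# Integers are placeholders for the first, second and third root letters.
AP_PATTERN = (1, "a", "a", 2, "i", 3, "u", "n")


def fill(pattern, root):
    """Substitute root letters for the integer placeholders of a stem pattern."""
    if len(root) != 3:
        raise ValueError("this sketch handles triliteral roots only")
    return "".join(root[slot - 1] if isinstance(slot, int) else slot for slot in pattern)


for root in ("ktb", "nsr", "sjd"):
    print(fill(AP_PATTERN, root))  # prints kaatibun, naasirun, saajidun (one per line)
```

A single pattern serves every root, which is why one rule is enough for the whole class of Form I sound active participles.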

There is no difference between constructing an active participle from a sound triliteral Form IB to IF verb and constructing one from a sound triliteral Form IA verb. Figure 3.8 expresses this with the VOWEL variable placed just after the middle placeholder in the daughter AVM. This vowel of the perfect form reflects the distinction between the subtypes of Form I verbs. Because variation of the vowel in this position has no impact on active participle formation, the Form I verb may contain any of the three vowels in this position.
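The effect of the VOWEL variable can be sketched as pattern matching in which the middle vowel slot accepts a, i or u and is then ignored. In this hypothetical Python sketch, `match_form_i_perfect` mimics the daughter STEM ⟨1, a, 2, VOWEL, 3, a⟩. The stem ‘fariha’ is used only to exercise the i slot, and any six-letter stem of this shape would do.

```python
VOWELS = {"a", "i", "u"}


def match_form_i_perfect(stem):
    """Match a stem against <1, a, 2, VOWEL, 3, a> and return the root letters.

    VOWEL may be a, i or u, so perfect stems of every Form I subtype match.
    Returns None when the stem does not fit the pattern.
    """
    if len(stem) != 6:
        return None
    c1, v1, c2, vowel, c3, v3 = stem
    if v1 != "a" or v3 != "a" or vowel not in VOWELS:
        return None
    if VOWELS & {c1, c2, c3}:
        return None  # root positions must hold consonants
    return c1, c2, c3


def active_participle(stem):
    """Form the active participle stem; the matched VOWEL plays no role."""
    root = match_form_i_perfect(stem)
    if root is None:
        return None
    c1, c2, c3 = root
    return f"{c1}aa{c2}i{c3}un"


print(active_participle("kataba"), active_participle("fariha"), active_participle("kaataba"))
# kaatibun faarihun None
```

Because the matched vowel is discarded, one rule covers all Form I subtypes, as the text describes. A non-Form I stem such as ‘kaataba’ simply fails to match.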

The rule in Figure 3.8 also shows how semantic information is propagated from the root verb to the active participle. The content of the event frame is the same in the mother and the daughter, and the only change in the SEM feature is the semantic index. The actor index (i) of the event frame becomes the semantic index of the active participle, whereas in the verb lexeme the event index (s) is the semantic index. Thus, the semantic information of the active participle is derived from the root verb, while its syntactic information is fixed by the vowel pattern. The same derivation process applies to the other verbal nouns. Note that we have derived only the active participle stem. To use it in a sentence, another construction rule is needed to capture noun inflection.
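The SEM side of the rule can be sketched with plain Python dictionaries standing in for feature structures. The frames are copied unchanged and only INDEX is set, to the ACTOR index of the event frame. Copying here stands in for the structure sharing of real unification, and the assumption that the event frame is listed first is made only for this illustration.

```python
import copy


def active_participle_sem(verb_sem):
    """Sketch of SEM propagation in the active participle rule.

    The event frames are copied unchanged from daughter to mother; the only
    change is the semantic index, which becomes the ACTOR index of the event
    frame (the verb's own SIT-INDEX is not carried over as INDEX).
    """
    frames = copy.deepcopy(verb_sem["FRAMES"])
    event = frames[0]  # assumption: the event frame is listed first
    return {"INDEX": event["ACTOR"], "FRAMES": frames}


kataba_sem = {
    "SIT-INDEX": "s",
    "FRAMES": [{"type": "write-fr", "SIT": "s", "ACTOR": "i", "UNDGR": "j"}],
}
kaatibun_sem = active_participle_sem(kataba_sem)
print(kaatibun_sem["INDEX"])  # i
```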

The construction of the active participle from Form I verbs is the most regular. Constructions from other verb forms are complex and their derivation patterns are irregular, so they require further analysis.

3.4.2 Passive Participle

As with the active participle, constructing the passive participle from a Form I triliteral sound root verb is simple. There is just one pattern for this construction, so the construction rule in Figure 3.9 applies to all Form I subtypes. Derivation from other verb forms is complex and irregular, and for some forms this type of participle does not exist at all. These cases require further analysis.

form-I-triliteral-sound-passive-participle-lex-cxt
[ MTR   form-I-triliteral-sound-passive-participle-lex
        [ MORPH  [ STEM ⟨m, a, [1], [2], u, u, [3], u, n⟩ ]
          ARG-ST ⟨ ⟩
          SYN    [ CAT [ noun, CASE nominative ], VAL ⟨ ⟩ ]
          SEM    [ INDEX j
                   FRAMES ⟨ event-fr [ SIT s, UNDGR j ] ⟩ ] ]
  DTRS  ⟨ form-I-triliteral-sound-passive-perfect-3rd-sg-masc-verb-lex
          [ MORPH  [ STEM ⟨[1], a, [2], VOWEL, [3], a⟩ ]
            ARG-ST ⟨ [4] [ SYN [ CAT [ noun, CASE accusative ] ]
                           SEM [ INDEX j ]
                           OPT − ] ⟩
            SYN    [ CAT [ verb, VFORM perfect, VOICE active ], VAL ⟨ [4] ⟩ ]
            SEM    [ SIT-INDEX s
                     FRAMES ⟨ event-fr [ SIT s, UNDGR j ] ⟩ ] ] ⟩ ]

Figure 3.9: Lexical rule for passive participle construction


The verbs from which passive participles are derived must be transitive. For this reason, the ARG-ST feature in the DTR AVM is not empty, and the semantic index (j) of its member is co-indexed with the undergoer index in event-fr. Note that the ARG-ST of the DTR contains only one sign, for the object, and that sign is in accusative case. It contains no sign for the actor, because in Arabic the actor is implicitly encoded in the verb and the verb does not syntactically require it. If a subject is explicitly mentioned in the sentence, a phrasal construction rule can parse it.

As with the active participle, the semantic information is derived from the root verb: the undergoer index (j) in the event frame of the root verb becomes the semantic index of the passive participle. The VOWEL variable in the daughter's STEM works the same way as in the active participle construction.
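The transitivity requirement and the promotion of the undergoer index can be combined in one small Python sketch of a simplified Figure 3.9 rule. Verbs are plain dictionaries whose keys loosely follow the AVM feature names. The rule fires only when ARG-ST holds an object co-indexed with UNDGR. The dictionary layout and the function name are hypothetical.

```python
PP_PATTERN = ("m", "a", 1, 2, "u", "u", 3, "u", "n")  # Figure 3.9 mother STEM


def passive_participle(verb):
    """Apply a simplified Figure 3.9 rule to a verb represented as a dict.

    The rule fires only for transitive verbs: ARG-ST must hold an object whose
    index is the UNDGR of the event frame. That index becomes the mother's INDEX.
    """
    event = verb["FRAMES"][0]
    arg_st = verb["ARG-ST"]
    if not arg_st or arg_st[0]["INDEX"] != event.get("UNDGR"):
        return None  # intransitive, or object not linked to the undergoer
    root = verb["ROOT"]
    stem = "".join(root[p - 1] if isinstance(p, int) else p for p in PP_PATTERN)
    return {"STEM": stem, "ARG-ST": [], "INDEX": event["UNDGR"], "FRAMES": verb["FRAMES"]}


kataba = {
    "ROOT": "ktb",
    "ARG-ST": [{"CASE": "accusative", "INDEX": "j"}],
    "FRAMES": [{"type": "write-fr", "SIT": "s", "ACTOR": "i", "UNDGR": "j"}],
}
print(passive_participle(kataba)["STEM"])           # maktuubun
print(passive_participle({**kataba, "ARG-ST": []}))  # None
```

The mother's empty ARG-ST corresponds to the ⟨ ⟩ value in the MTR of the rule.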

For example, the verb ‘kataba’ shown in Figure 3.7 is transitive, so the passive participle ‘maktuubun’ shown in Figure 3.10 can be recognized from this verb lexeme.

3.4.3 Locative Noun

A locative noun can be generated from triliteral Form I root verbs only. There are two patterns of derivation, and the pattern used is predictable. Locative nouns generated from Form IA, IC, ID, IE and IG root verbs use one pattern, whereas those from Form IB and IF use another. For this reason, there are two types of locative noun: the Form IA locative noun and the Form IB locative noun. Figure 3.7 shows the AVM of the Form IA root verb ‘kataba’, and Figure 3.11 shows the locative noun ‘maktabun’ derived from it.
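Because the choice between the two locative patterns depends only on the Form I subtype, it can be sketched as a table lookup followed by placeholder filling. The patterns below are taken from the mother STEMs of Figures 3.12 and 3.15, and the subtype lists follow the paragraph above. The table and function are illustrative assumptions, not part of the TRALE grammar.

```python
# Stem patterns of the two locative rules (Figures 3.12 and 3.15).
PATTERN_A = ("m", "a", 1, 2, "a", 3, "u", "n")
PATTERN_I = ("m", "a", 1, 2, "i", 3, "u", "n")

# Which Form I subtype selects which pattern, as listed in the text.
SUBTYPE_PATTERN = {
    **{sub: PATTERN_A for sub in ("IA", "IC", "ID", "IE", "IG")},
    **{sub: PATTERN_I for sub in ("IB", "IF")},
}


def locative_noun(root, subtype):
    """Return the locative noun stem, or None outside triliteral Form I."""
    pattern = SUBTYPE_PATTERN.get(subtype)
    if pattern is None or len(root) != 3:
        return None
    return "".join(root[p - 1] if isinstance(p, int) else p for p in pattern)


print(locative_noun("ktb", "IA"), locative_noun("sjd", "IB"), locative_noun("ktb", "II"))
# maktabun masjidun None
```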

Because they share the same pattern, the locative nouns derived from verb Forms IA, IC, ID and IE are captured by the single construction rule shown in Figure 3.12. As in the construction rules of the active and passive participles, the vowel pattern of the lexeme determines the syntactic information, and the semantic information is derived from the root verb. Thus, the location index (j) in the event frame of the root verb becomes the semantic index of the locative noun.

maktuubun-form-I-triliteral-sound-passive-participle-lex
[ MORPH   [ ROOT ⟨k, t, b⟩
            STEM ⟨m, a, k, t, u, u, b, u, n⟩ ]
  ARG-ST  ⟨ ⟩
  SYN     [ CAT  [ noun, CASE nominative, DEF no,
                   SELECT none, XARG none, LID none ]
            VAL  ⟨ ⟩
            MRKG none ]
  SEM     [ INDEX  j [ PERSON 3rd, NUMBER sg, GENDER masc, HUM no ]
            FRAMES ⟨ write-fr [ SIT s [ SITUATION writing ], ACTOR i, UNDGR j ] ⟩ ] ]

Figure 3.10: AVM for passive participle

maktabun-form-IA-triliteral-sound-locative-noun-lex
[ MORPH   [ ROOT ⟨k, t, b⟩
            STEM ⟨m, a, k, t, a, b, u, n⟩ ]
  ARG-ST  ⟨ ⟩
  SYN     [ CAT  [ noun, CASE nominative, DEF no, LID none ]
            VAL  ⟨ ⟩
            MRKG none ]
  SEM     [ INDEX  [ PERSON 3rd, NUMBER sg, GENDER masc, HUM no ]
            FRAMES ⟨ write-fr [ SIT s [ SITUATION writing ], ACTOR i, LOCATION k ] ⟩ ] ]

Figure 3.11: AVM for Form IA locative noun

form-IA-triliteral-sound-locative-noun-lex-cxt
[ MTR   form-IA-triliteral-sound-locative-noun-lex
        [ MORPH  [ STEM ⟨m, a, [1], [2], a, [3], u, n⟩ ]
          ARG-ST ⟨ ⟩
          SYN    [ CAT [ noun, CASE nominative ], VAL ⟨ ⟩ ]
          SEM    [ INDEX j
                   FRAMES ⟨ event-fr [ SIT s, LOCATION j ] ⟩ ] ]
  DTRS  ⟨ form-IA-triliteral-sound-locative-perfect-3rd-sg-masc-verb-lex
          [ MORPH [ STEM ⟨[1], a, [2], a, [3], a⟩ ]
            SYN   [ CAT [ verb, VFORM perfect, VOICE active ] ]
            SEM   [ SIT-INDEX s
                    FRAMES ⟨ event-fr [ SIT s, ACTOR i, LOCATION j ] ⟩ ] ] ⟩ ]

Figure 3.12: Lexical rule for the locative noun construction from Form IA sound root verb

Similarly, Figure 3.13 shows the AVM of the Form IB root verb ‘sajada’, and Figure 3.14 shows the locative noun ‘masjidun’ derived from it.

As mentioned above, locative nouns generated from Form IB and IF verbs share the same pattern. Thus, one construction rule, shown in Figure 3.15, captures the derivation from both types of verb. This rule is the same as the Form IA rule in Figure 3.12 except for two things: the vowel pattern in the mother, i.e. in the locative noun, and the sort description of the daughter.

sajada-form-IB-triliteral-sound-active-perfect-3rd-sg-masc-verb-lex
[ PHON    [ ]
  MORPH   [ ROOT ⟨s, j, d⟩
            STEM ⟨s, a, j, a, d, a⟩ ]
  ARG-ST  ⟨ [1] [ SYN [ CAT [ noun, CASE accusative ] ]
                  SEM [ INDEX j ]
                  OPT − ] ⟩
  SYN     [ CAT  [ verb, VFORM perfect, VOICE active, MOOD indicative,
                   SELECT none, XARG none, LID none ]
            VAL  ⟨ [1] ⟩
            MRKG none ]
  SEM     [ SIT-INDEX s [ SITUATION prostration ]
            FRAMES ⟨ prostrate-fr [ SIT s
                                    ACTOR i [ PERSON 3rd, NUMBER sg, GENDER masc, HUM yes ]
                                    UNDGR j
                                    LOCATION k ] ⟩ ] ]

Figure 3.13: AVM for a Form IB root verb

masjidun-form-IB-triliteral-sound-locative-noun-lex
[ MORPH   [ ROOT ⟨s, j, d⟩
            STEM ⟨m, a, s, j, i, d, u, n⟩ ]
  ARG-ST  ⟨ ⟩
  SYN     [ CAT  [ noun, CASE nominative, DEF no, LID none ]
            VAL  ⟨ ⟩
            MRKG none ]
  SEM     [ INDEX  [ PERSON 3rd, NUMBER sg, GENDER masc, HUM no ]
            FRAMES ⟨ prostrate-fr [ SIT s [ SITUATION prostration ], ACTOR i, LOCATION k ] ⟩ ] ]

Figure 3.14: AVM for Form IB locative noun

form-IB-triliteral-sound-locative-noun-lex-cxt
[ MTR   form-IB-triliteral-sound-locative-noun-lex
        [ MORPH  [ STEM ⟨m, a, [1], [2], i, [3], u, n⟩ ]
          ARG-ST ⟨ ⟩
          SYN    [ CAT [ noun, CASE nominative ], VAL ⟨ ⟩ ]
          SEM    [ INDEX j
                   FRAMES ⟨ event-fr [ SIT s, LOCATION j ] ⟩ ] ]
  DTRS  ⟨ form-IB-triliteral-sound-locative-perfect-3rd-sg-masc-verb-lex
          [ MORPH [ STEM ⟨[1], a, [2], a, [3], a⟩ ]
            SYN   [ CAT [ verb, VFORM perfect, VOICE active ] ]
            SEM   [ SIT-INDEX s
                    FRAMES ⟨ event-fr [ SIT s, ACTOR i, LOCATION j ] ⟩ ] ] ⟩ ]

Figure 3.15: Lexical rule for locative noun construction from Form IB sound root verb

3.4.4 Comparative Participle

Figure 3.16 shows the AVM for the comparative participle ‘aktabu’, derived from the root verb ‘kataba’ shown in Figure 3.7. We have introduced a new semantic frame, compare-fr, inspired by the analysis of Farkas and Kiss [14]. This frame has three features. The first, COMPARED, contains the index of the entity being compared. The second, COMPAREWITH, contains the index of the entity with which it is compared. The last, DIMENSION, is the dimension of comparison. Its value is a SIT-INDEX, and this situational index must be co-indexed with the situational index of the verb lexeme from which the participle is derived.

Figure 3.17 shows the construction rule for comparative participles. This participle has an optional syntactic requirement, contained in the ARG-ST feature. The required sign must be in genitive case, and its semantic index is co-indexed with the COMPAREWITH index in compare-fr. In addition, the situational index of DIMENSION in compare-fr must be co-indexed with the SIT-INDEX of the verb lexeme. The rule thus states that a comparative participle expresses the comparison of two entities along the dimension given by the verb.
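The co-indexing in the comparative rule can be sketched in Python by building the stem from the pattern ⟨a, 1, 2, a, 3, u⟩ and reusing the same index variables across the frames and the argument. DIMENSION reuses the verb's situational index, and the optional genitive argument reuses COMPAREWITH. The dictionary encoding and the function signature are hypothetical.

```python
CP_PATTERN = ("a", 1, 2, "a", 3, "u")  # Figure 3.17 mother STEM


def comparative_participle(root, verb_sit, compared, compare_with):
    """Sketch of the comparative rule over plain dicts.

    compare-fr takes COMPARED, COMPAREWITH and DIMENSION; DIMENSION is
    co-indexed with the daughter verb's SIT-INDEX, and the optional genitive
    argument is co-indexed with COMPAREWITH.
    """
    stem = "".join(root[p - 1] if isinstance(p, int) else p for p in CP_PATTERN)
    event_fr = {"type": "event-fr", "SIT": verb_sit, "ACTOR": compared}
    compare_fr = {"type": "compare-fr", "COMPARED": compared,
                  "COMPAREWITH": compare_with, "DIMENSION": verb_sit}
    genitive_arg = {"CASE": "genitive", "INDEX": compare_with, "OPT": True}
    return {"STEM": stem, "INDEX": compared, "ARG-ST": [genitive_arg],
            "FRAMES": [event_fr, compare_fr]}


aktabu = comparative_participle("ktb", verb_sit="s", compared="i", compare_with="j")
print(aktabu["STEM"], aktabu["FRAMES"][1]["DIMENSION"])  # aktabu s
```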

3.4.5 Other Types of Verbal Noun

The construction of the remaining four types of verbal nouns is complex, and construction rules cannot capture it. We therefore give lexical entries for these verbal nouns individually.

Each verb form has a gerund, and gerunds follow the least predictable patterns. Modeling their construction rules is a vast area of research, so for now we can only list lexical entries for all gerunds individually. Figure 3.18 shows a lexical entry for the gerund ‘kitaabatun’, which means writing.

Hyperbolic participles are generated only from triliteral sound Form I root verbs, but not every verb has a corresponding hyperbolic participle. There are eleven patterns for deriving hyperbolic participles from verbs. However, the root letters predict neither which of these eleven patterns will be used nor whether a hyperbolic participle exists for a given root. We therefore have to list a lexical entry for each hyperbolic participle. Figure 3.19 shows a sample lexical entry for the hyperbolic participle ‘kattaabun’, which means a person who writes a lot. We have used the modifier-fr frame to capture the modification information. The LEVEL feature has the value excessive, which means that the actor performs the action excessively.

aktabu-form-I-triliteral-sound-comparative-participle-lex
[ MORPH   [ ROOT ⟨k, t, b⟩
            STEM ⟨a, k, t, a, b, u⟩ ]
  ARG-ST  ⟨ [1] [ SYN [ CAT [ noun, CASE genitive ] ]
                  SEM [ INDEX j ]
                  OPT + ] ⟩
  SYN     [ CAT  [ noun, CASE nominative, DEF no,
                   SELECT none, XARG none, LID none ]
            VAL  ⟨ [1] ⟩
            MRKG none ]
  SEM     [ INDEX  i [ PERSON 3rd, NUMBER sg, GENDER masc, HUM yes ]
            FRAMES ⟨ write-fr [ SIT s [ SITUATION writing ], ACTOR i ],
                     compare-fr [ COMPARED i, COMPAREWITH j, DIMENSION s ] ⟩ ] ]

Figure 3.16: AVM for Form I comparative participle

form-I-triliteral-sound-comparative-participle-lex-cxt
[ MTR   form-I-triliteral-sound-comparative-participle-lex
        [ MORPH  [ STEM ⟨a, [1], [2], a, [3], u⟩ ]
          ARG-ST ⟨ [4] [ SYN [ CAT [ noun, CASE genitive ] ]
                         SEM [ INDEX j ]
                         OPT + ] ⟩
          SYN    [ CAT [ noun, CASE nominative ], VAL ⟨ [4] ⟩ ]
          SEM    [ INDEX i [ PERSON PERS, NUMBER sg, GENDER masc, HUM HUM ]
                   FRAMES ⟨ event-fr [ SIT s, ACTOR i ],
                            compare-fr [ COMPARED i, COMPAREWITH j, DIMENSION s ] ⟩ ] ]
  DTRS  ⟨ form-I-triliteral-sound-comparative-perfect-3rd-sg-masc-verb-lex
          [ MORPH [ STEM ⟨[1], a, [2], VOWEL, [3], a⟩ ]
            SYN   [ CAT [ verb, VFORM perfect, VOICE active ] ]
            SEM   [ SIT-INDEX s
                    FRAMES ⟨ event-fr [ SIT s ] ⟩ ] ] ⟩ ]

Figure 3.17: Lexical rule for comparative participle construction

kitaabatun-gerund-lex
[ MORPH   [ ROOT ⟨k, t, b⟩
            STEM ⟨k, i, t, a, a, b, a, t, u, n⟩ ]
  ARG-ST  ⟨ ⟩
  SYN     [ CAT  [ noun, CASE nominative, DEF no ]
            VAL  ⟨ ⟩
            MRKG none ]
  SEM     [ INDEX  i [ PERSON 3rd, NUMBER sg, GENDER fem, HUM no ]
            FRAMES ⟨ write-fr [ SIT s [ SITUATION writing ], ACTION i ] ⟩ ] ]

Figure 3.18: Sample lexical entry for ‘kitaabatun’ gerund

kattaabun-hyperbolic-participle-lex
[ MORPH   [ ROOT ⟨k, t, b⟩
            STEM ⟨k, a, t, t, a, a, b, u, n⟩ ]
  ARG-ST  ⟨ ⟩
  SYN     [ CAT  [ noun, CASE nominative, DEF no ]
            VAL  ⟨ ⟩
            MRKG none ]
  SEM     [ INDEX  [ PERSON 3rd, HUM yes ]
            FRAMES ⟨ write-fr [ SIT s [ SITUATION writing ], ACTOR i ],
                     modifier-fr [ ARG i, LEVEL excessive ] ⟩ ] ]

Figure 3.19: Sample lexical entry for ‘kattaabun’ hyperbolic participle

Resembling participles are similar to hyperbolic participles. They are generated only from triliteral sound Form I root verbs, and they have a large number of derivational patterns. It is therefore not feasible to formulate a lexical construction rule for them, and here too we give lexical entries. Figure 3.20 shows the lexical entry for ‘katiibun’, which means a person who always writes. As with the hyperbolic participle, we use the modifier-fr frame to capture its information as a modifier. Unlike the hyperbolic participle, however, the value of the LEVEL feature is intrinsic, which captures the difference between the two.

katiibun-resembling-participle-lex
[ MORPH   [ ROOT ⟨k, t, b⟩
            STEM ⟨k, a, t, i, i, b, u, n⟩ ]
  ARG-ST  ⟨ ⟩
  SYN     [ CAT  [ noun, CASE nominative, DEF no ]
            VAL  ⟨ ⟩
            MRKG none ]
  SEM     [ INDEX  [ PERSON 3rd, HUM yes ]
            FRAMES ⟨ write-fr [ SIT s [ SITUATION writing ], ACTOR i ],
                     modifier-fr [ ARG i, LEVEL intrinsic ] ⟩ ] ]

Figure 3.20: Sample lexical entry for ‘katiibun’ resembling participle

Utilitarian nouns are also generated only from triliteral sound Form I root verbs. There are four patterns of derivation, and for a given set of root letters it is unpredictable which pattern will be used. For this reason, despite the limited number of patterns, we list the lexical entries exhaustively. Figure 3.21 shows a lexical entry for the utilitarian noun ‘miktabun’, which means an instrument for writing.

miktabun-utilitarian-noun-lex
[ MORPH   [ ROOT ⟨k, t, b⟩
            STEM ⟨m, i, k, t, a, b, u, n⟩ ]
  ARG-ST  ⟨ ⟩
  SYN     [ CAT  [ noun, CASE nominative, DEF no ]
            VAL  ⟨ ⟩
            MRKG none ]
  SEM     [ INDEX  k [ PERSON 3rd, NUMBER sg, GENDER masc, HUM no ]
            FRAMES ⟨ write-fr [ SIT s [ SITUATION writing ], ACTOR i, UNDGR j, INSTRUMENT k ] ⟩ ] ]

Figure 3.21: Sample lexical entry for ‘miktabun’ utilitarian noun
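For the four verbal-noun types without construction rules, the grammar falls back on listing. A minimal Python sketch of that fallback is a table keyed by root and type, where a missing key simply means "not listed". The keys, the feature subset and the `lookup` helper are hypothetical. The stems and the values of GENDER and LEVEL come from Figures 3.18 to 3.21.

```python
# Listed entries for verbal nouns without a construction rule (Figures 3.18-3.21).
# Keys are (root, verbal-noun type); values keep only a few illustrative features.
LISTED = {
    ("ktb", "gerund"): {"STEM": "kitaabatun", "GENDER": "fem"},
    ("ktb", "hyperbolic-participle"): {"STEM": "kattaabun", "LEVEL": "excessive"},
    ("ktb", "resembling-participle"): {"STEM": "katiibun", "LEVEL": "intrinsic"},
    ("ktb", "utilitarian-noun"): {"STEM": "miktabun"},
}


def lookup(root, noun_type):
    """Return the listed entry, or None if nothing is listed.

    The root does not predict the pattern of these nouns (nor, for hyperbolic
    participles, whether one exists), so nothing is computed as a fallback.
    """
    return LISTED.get((root, noun_type))


print(lookup("ktb", "hyperbolic-participle"))  # {'STEM': 'kattaabun', 'LEVEL': 'excessive'}
print(lookup("sjd", "hyperbolic-participle"))  # None
```

Unlike the rule-based sketches, this table cannot generalize to a new root. That is exactly the cost of lexical listing that the text describes.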

Chapter 4

TRALE Implementation

We have implemented the HPSG formalism described in the previous chapter in the TRALE (an extension of the Attribute Logic Engine) compiler. Section 4.1 introduces the TRALE system, Section 4.2 discusses the necessary components of the TRALE compiler, and Section 4.3 describes the methodology we followed to implement the HPSG formalism in TRALE.

4.1 Introduction to TRALE

TRALE is a lexical rule compiler. It integrates phrase structure parsing, semantic-head-driven generation and constraint logic programming with typed feature structures as terms, and it compiles feature structure descriptions into Prolog code. It descends from two other compilers, the Attribute Logic Engine (ALE) and ConTroll [30, 34], both of which were designed on the basis of the HPSG 87 formalism [32]. Gerald Penn was the chief developer of TRALE, which inherited the core of the ALE system but specialized the underlying logic to a logic in the tradition of HPSG 94 [33].

We decided to use TRALE for our implementation. The other alternative was LKB, in which both HPSG and LFG grammars can be implemented. TRALE, however, was developed solely to capture HPSG grammars, and it was developed after LKB. For these reasons, we chose TRALE. Before choosing the implementation platform, we also consulted the comparison between TRALE and LKB presented by Melnik [27].

We use TRALE on the Grammix operating system, version of June 01, 2007 [28]. This is the older version of TRALE. A newer version, published in 2008 [34], is not complete but can run stand-alone on the Linux platform. Grammix is designed for grammar development and contains two complete grammar development systems, TRALE and LKB. Its TRALE system was last updated on May 31, 2007.

4.2 Necessary Components for Implementation

TRALE has two major component files, the signature file and the theory file. It also has an I/O console and a graphical interface (GRISU) that displays AVMs and type hierarchies graphically.

4.2.1 Signature File

This file contains the HPSG type hierarchy. It also contains the description of function features, i.e. the features that constitute a function feature. The file has no extension and is loaded from the theory file. The hierarchy in this file is expressed by indentation: a type on the following line indented by three more spaces is an immediate child. Figure 4.1 shows some sample lines of the signature file.

The type bot is always at the top of this hierarchy, because TRALE requires it above all other types. The constituents of a function feature are specified on the same line. Here, sign and all of its children are function features, and the constituents of the sign feature are listed on the same line as sign.

In the signature file, multiple inheritance is marked by &. Writing & before a type means that the type is also declared elsewhere in the file, where it is a child of another type.

type_hierarchy
bot
   sign phon:ne_list morph:morph arg_st:list syn:syn sem:sem
      lexeme
         noun_lex . . .
         . . .
            active_participle_lex
               trilateral_root_derived_ap_lex
                  formI_sound_trilateral_ap_lex
               sound_root_derived_ap_lex
                  &formI_sound_trilateral_ap_lex
               formI_derived_ap_lex
                  &formI_sound_trilateral_ap_lex

Figure 4.1: Signature file
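The indentation convention and the & marker can be made concrete with a small reader that turns such lines into a map from each type to its parents. This Python sketch is not TRALE's own signature parser, whose exact rules may differ. It assumes a fixed three-space step, skips the `type_hierarchy` header and the final `.`, and ignores feature declarations after a type name. The sample lines are a trimmed, hypothetical excerpt in the style of Figure 4.1.

```python
def parse_signature(lines, step=3):
    """Read an indentation-based type hierarchy into a type -> parents map.

    Assumptions of this sketch: each level is indented `step` spaces deeper than
    its parent, a leading '&' marks a type declared elsewhere (an extra parent),
    and any 'feature:type' declarations after the type name are ignored.
    """
    parents = {}
    ancestors = []  # stack of (depth, type name) for the current branch
    for raw in lines:
        if not raw.strip() or raw.strip() in ("type_hierarchy", "."):
            continue
        depth = (len(raw) - len(raw.lstrip(" "))) // step
        name = raw.split()[0].lstrip("&")
        while ancestors and ancestors[-1][0] >= depth:
            ancestors.pop()
        entry = parents.setdefault(name, [])
        if ancestors:
            entry.append(ancestors[-1][1])
        ancestors.append((depth, name))
    return parents


SIGNATURE = """\
type_hierarchy
bot
   sign phon:ne_list morph:morph
      active_participle_lex
         trilateral_root_derived_ap_lex
            formI_sound_trilateral_ap_lex
         formI_derived_ap_lex
            &formI_sound_trilateral_ap_lex
.
""".splitlines()

hierarchy = parse_signature(SIGNATURE)
print(hierarchy["formI_sound_trilateral_ap_lex"])
# ['trilateral_root_derived_ap_lex', 'formI_derived_ap_lex']
```

The second occurrence of formI_sound_trilateral_ap_lex adds a parent instead of creating a new type, which is how & expresses multiple inheritance.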

4.2.2 Theory File

This file is composed of SWI Prolog code and must have a .pl extension. It is the starting point of the TRALE compiler and loads the signature file and any additional Prolog files. Together with these Prolog files, it contains the lexical entries and construction rules. Figure 4.2 shows the lexical entry for the root verb ‘kataba’.

The detailed lexical properties of an entry are given after ~~>. This entry in the TRALE file closely mirrors the AVM of the root verb ‘kataba’ shown in Figure 3.7. Figure 4.3 shows part of a lexical construction rule.

The details of a construction rule start after ##. In a lexical construction rule, the daughter comes first and then the mother, and the two are separated by **>. At the end of the rule, the morphs keyword introduces the morphological operation the rule performs. Clauses of the form morphs DTR becomes MTR capture the change of the STEM feature from daughter to mother.

kataba ~~> (formIA_triliteral_root_verb_lex,
           (morph:(root:[k,t,b]),
            arg_st:[(OBJ_SIGN, (syn:(cat:(case:acc, def:no)),
                                sem:(index:OBJ_INDEX)))],
            syn:(cat:(verb, vform:perf, voice:active, mood:subjunctive),
                 val:[OBJ_SIGN]),
            sem:(sit_index:(SIT_INDEX, (situation:writing)),
                 frames:[(sit:SIT_INDEX,
                          actor:(SUB_INDEX, (pers:third, num:sg, gen:male, hum:y)),
                          undgr:OBJ_INDEX,
                          location:LOC_INDEX)]))).

Figure 4.2: Sample lexical entry in theory file

trilateral-active-lex-cxt ##
   (formI_triliteral_root_verb_lex,
    (morph:(root:ROOTS), syn:(. . .),
     sem:(sit_index:SIT_INDEX,
          frames:[(sit:SIT_INDEX, actor:SUB_INDEX)])))
   **>
   (formI_sound_trilateral_ap_lex,
    (morph:(root:ROOTS), syn:(. . .),
     sem:(index:SUB_INDEX,
          frames:[(sit:SIT_INDEX, actor:SUB_INDEX)])))
   morphs
      (X,a,Y,a,Z,a) becomes (X,a,a,Y,i,Z,u,n),
      (X,a,Y,u,Z,a) becomes (X,a,a,Y,i,Z,u,n),
      (X,a,Y,i,Z,a) becomes (X,a,a,Y,i,Z,u,n).

Figure 4.3: Sample lexical construction rule in theory file

4.2.3 Input and Output System

TRALE has an I/O console in the Emacs editor, and a graphical interface called GRISU displays the AVM and type hierarchy output. The I/O console accepts commands to compile the signature and theory files and shows the compiled output. It also accepts phrases and sentences to be recognized by the grammar defined in the signature and theory files, and it accepts commands that display GRISU output.

GRISU gives a clear graphical rendering of AVMs, construction rules and type hierarchies. The order of features in GRISU AVMs can be configured in the theory.pl file as follows:

>>> phon.
phon <<< morph.

Here, >>> phon places PHON at the top of the feature listing, and phon <<< morph specifies that PHON precedes MORPH in GRISU AVMs. Figure 4.4 shows the GRISU output for the AVM of the root verb ‘kataba’ in Figure 3.7.

Similarly, GRISU output for type hierarchy of active participle in Figure 3.5 is shown

in Figure 4.5.

4.3 Implementation Methodology

Our implementation can be divided into the following steps:

1. We translated the SBCG type hierarchy in Figure 3.4 and Figure 3.5 into the signature file.

2. We gave the description of function features in the signature file.

3. We loaded the signature file from the theory file and checked whether the hierarchy and the function feature descriptions were correct by viewing GRISU outputs such as Figure 4.5.

4. We specified the order of features at the beginning of the theory file using the >>> and <<< operators.

5. We gave lexical entries for all types (Form IA, IB, IC, ..., IG) of Form I triliteral strong root verbs, and checked whether these entries were correct by viewing GRISU outputs such as Figure 4.4.

6. We gave the lexical construction rules, which are at the core of our implementation. Then, in the input console, we entered the verbal nouns derived from the root verbs given as lexical entries. TRALE recognized these verbal nouns from their root verbs and the construction rules. For example, when we entered ‘kaatibun’ to be recognized by the compiler, it produced the GRISU output in Figure 4.6.

7. In this manner, we tested all five construction rules proposed in Section 3.4 for all types of Form I triliteral strong root verbs.

Following these steps, we successfully implemented the HPSG formalism. The full contents of the signature and theory files are provided in the Appendix.

Figure 4.4: GRISU output for AVM of ‘kataba’ root verb.

Figure 4.5: GRISU output for type hierarchy of active participle.

Figure 4.6: GRISU output of AVM for ‘kaatibun’ active participle.

Chapter 5

Conclusion

In this last chapter, we conclude the thesis by describing the major contributions of this research, followed by some directions for future research.

5.1 Summary of Contributions

The contributions that have been made in this thesis can be enumerated as follows:

• We have formulated a concrete AVM for Arabic nouns and verbs. We designed it to handle not only lexical construction but also phrasal construction, implemented it in TRALE, and extended it to capture root-pattern morphology.

• When a verbal noun stem is derived from a root verb, some syntactic and

semantic information is encoded in the derived stem. We have captured this

syntactic and semantic information. We have modified the INDEX feature for

Arabic to reference and incorporate semantic meaning.

• We have given a concrete description of SIT-INDEX and shown how it differs from INDEX. Before this work, no literature had shown its use and its distinction from INDEX.


• We have articulated the type hierarchy of the Arabic noun and placed the verb-derived nouns and their subtypes in the lexical type hierarchy. We have also justified this placement and implemented it on the TRALE platform.

• We have exploited root-pattern morphology, so that only a minimal number of lexical entries is needed. We have developed lexical construction rules for four types of verbal nouns derived from triliteral sound root verbs: the active participle, passive participle, locative noun and comparative participle. The other four types of verbal nouns need an exhaustive lexical entry for each stem, and we have given a sample lexical entry for each of them. Further lexical construction rules could capture the inflection of these stems for gender and number.

• We have implemented all the lexical type hierarchy, lexical entry and lexical

construction rules proposed in this thesis. Then we have verified these

construction rules by recognizing the verbal noun stems from their root verbs.

5.2 Future Directions for Further Research

Modeling a natural language is a large research undertaking, and our work is only its starting point. The following directions should be considered to make the best use of our research.

• Verbal nouns derived from strong quadriliteral verbs and from weak verbs should be modeled. To accomplish this, the quadriliteral and weak verbs themselves must be modeled as well.

• We have not developed construction rules for four types of verbal nouns: gerund, hyperbolic participle, resembling participle and utilitarian noun. However, our investigation suggests that hyperbolic participles and utilitarian nouns can be derived by construction rules based on specific root classes. To do so, Arabic roots should be classified at a more granular level, so that different construction rules can be formulated for those root classes.


• As mentioned in Section 5.1, we have developed construction rules for verbal noun stems. Other construction rules should be developed to capture the inflection of these stems, for example for number, gender and declension. Declension is a distinctive linguistic feature of Arabic: noun and verb lexemes are inflected and take different forms according to case and mood in phrases.

• So far, we have dealt only with lexical construction rules. To parse Arabic nominals in a sentence, phrasal construction rules must also be developed.

Bibliography

[1] Kenneth R. Beesley. Finite-State Morphological Analysis and Generation of Arabic

at Xerox Research: Status and Plans in 2001. In Proceedings of the Workshop on

Arabic Language Processing: Status and Prospects, Association for Computational

Linguistics, 2001.

[2] Md. Shariful Islam Bhuyan and Reaz Ahmed. An HPSG Analysis of Arabic Passive.

In Proceedings of the 11th International Conference on Computer and Information

Technology, 2008.

[3] Md. Shariful Islam Bhuyan and Reaz Ahmed. An HPSG Analysis of Arabic Verb.

In Proceedings of the 8th International Arab Conference on Information Technology,

2008.

[4] Md. Shariful Islam Bhuyan and Reaz Ahmed. Nonconcatenative Morphology: An

HPSG Analysis. In Proceedings of the 5th International Conference on Electrical and

Computer Engineering, 2008.

[5] Steven Bird and Ewan Klein. Phonological Analysis in Typed Feature Systems.

Computational Linguistics, 20:455–491, 1994.

[6] Joan Bresnan. The Mental Representation of Grammatical Relations. Cambridge,

MA, USA: MIT Press, 1982.

[7] Tim Buckwalter. Buckwalter Arabic Morphological Analyzer Version 2.0. In Linguistic Data Consortium, Philadelphia, PA, USA, 2004.


[8] Noam Chomsky. Lectures on Government and Binding, 1981.

[9] Domenic Cipollone. Morphologically complex predicates in Japanese and what they tell us about grammar architecture. In OSU Working Papers in Linguistics 56, pages 1–52. Ohio State University, 2001.

[10] Ann Copestake, Dan Flickinger, Rob Malouf, Susanne Riehemann, and Ivan Sag. Translation using minimal recursion semantics. In Proceedings of the 6th International Conference on Theoretical and Methodological Issues in Machine Translation, Leuven, 1995.

[11] Ann Copestake, Dan Flickinger, Susanne Riehemann, and Ivan Sag. Minimal recursion semantics: An introduction. Research on Language and Computation, 3(4):281–332, 2006.

[12] Anthony R. Davis. Linking and the Hierarchical Lexicon. PhD thesis, Stanford

University, 1996.

[13] Anthony R. Davis. Linking by Types in the Hierarchical Lexicon. Chicago: University

of Chicago Press, 2001.

[14] Donka F. Farkas and Katalin É. Kiss. On the comparative and absolute readings of superlatives. Natural Language and Linguistic Theory, 18(3):417–455, 2000.

[15] Gerald Gazdar, Ewan Klein, Geoffrey Pullum, and Ivan A. Sag. Generalized Phrase

Structure Grammar. Chicago: University of Chicago Press, 1985.

[16] Georgia M. Green. Elementary principles of HPSG. In FIPS PUB, pages 140–1,

1999.

[17] Kais Haddar, Ines Zalila, and Sirine Boukedi. An HPSG parser generation with

the LKB for Arabic relatives. International Journal of Computing and Information

Sciences, 7(5):51–60, 2009.


[18] Md. Sadiqul Islam, Mahmudul Hasan Masum, Md. Shariful Islam Bhuyan, and Reaz Ahmed. An HPSG Analysis of Declension in Arabic Grammar. In Proceedings of the 9th International Arab Conference on Information Technology, 2009.

[19] Md. Sadiqul Islam, Mahmudul Hasan Masum, Md. Shariful Islam Bhuyan, and Reaz Ahmed. Arabic Nominals in HPSG: A Verbal Noun Perspective. In Proceedings of the 17th International HPSG Conference, Université Paris Diderot, pages 158–178. On-line: CSLI Publications, 2010.

[20] Mohtanick Jamil. Arabic verb paradigms. Website, 2003-2011. http://www.learnarabiconline.com/verbal-paradigms.shtml.

[21] Mohtanick Jamil. Definiteness. Website, 2003-2011. http://www.learnarabiconline.com/definiteness.shtml/.

[22] Mohtanick Jamil. Derived nouns. Website, 2003-2011. http://www.learnarabiconline.com/derived-nouns.shtml.

[23] Andreas Kathol. Agreement and the syntax-morphology interface in HPSG. In Studies in contemporary phrase structure grammar, pages 223–274. UC Berkeley, 1999.

[24] Alain Kihm. Nonsegmental Concatenation: A Study of Classical Arabic Broken Plurals and Verbal Nouns. Morphology, 16:69–105, 2006.

[25] Eugene E. Loos, Susan Anderson, Dwight H. Day, Jr., Paul C. Jordan, and J. Douglas Wingate. Glossary of linguistic terms. Website, 2011. http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/.

[26] Nurit Melnik. Verb-Initial Construction in Modern Hebrew. PhD thesis, University of California, Berkeley, 2002.


[27] Nurit Melnik. From “hand-written” to computationally implemented HPSG theories. In Proceedings of the 12th International HPSG Conference, University of Lisbon, pages 311–321. On-line: CSLI Publications, 2005.

[28] Stefan Müller. Grammix. Website, 2007. http://hpsg.fu-berlin.de/Software/Grammix/.

[29] A.M. Mutawa, Salah Alnajem, and Fadi Alzhouri. An HPSG Approach to Arabic Nominal Sentences. Journal of the American Society for Information Science and Technology, 59(3):422–434, 2008.

[30] Gerald Penn and Mohammad Haji-Abdolhosseini. ALE Documentation. Website, 2003. http://www.ale.cs.toronto.edu/docs/.

[31] Carl J. Pollard. Lectures on the foundations of HPSG, 1997.

[32] Carl J. Pollard and Ivan A. Sag. Information-based syntax and semantics. Stanford: Center for the Study of Language and Information (CSLI), 1:262–267, 1987.

[33] Carl J. Pollard and Ivan A. Sag. Head-Driven Phrase Structure Grammar. Chicago: University of Chicago Press, 1994.

[34] Frank Richter. Preliminary TRALE Page. Website, 2008. http://milca.sfs.uni-tuebingen.de/A4/Course/trale/.

[35] Susanne Z. Riehemann. Type-Based Derivational Morphology. Journal of Comparative Germanic Linguistics, 2:49–77, 1998.

[36] Susanne Z. Riehemann. A Constructional Approach to Idioms and Word Formation. PhD thesis, Stanford University, 2001.

[37] Karin C. Ryding. Modern Standard Arabic. Cambridge University Press, UK, 2005.

[38] Ivan A. Sag. Sign-Based Construction Grammar, chapter 2. Stanford University, August 2010.


[39] Ivan A. Sag and Thomas Wasow. Syntactic Theory: A Formal Introduction. Stanford: Center for the Study of Language and Information (CSLI), 1999.

[40] Otakar Smrž. Functional Arabic Morphology. Formal System and Implementation. PhD thesis, Charles University in Prague, 2007.

[41] Nathan Vaillette. Hebrew relative clauses in HPSG. In Proceedings of the 7th International HPSG Conference, UC Berkeley (2223), pages 305–324. On-line: CSLI Publications, 2000.

Glossary of Terms

Affix: a bound morpheme that is joined before, after, or within a root or stem.

Agreement: a formal relationship between elements whereby a form of one word requires a corresponding form of another. It is also known as concord.

Arabic quadriliteral verb: an Arabic verb whose root contains four consonants.

Arabic strong verb: an Arabic verb whose root contains no weak letter, i.e. none of the letters that can also represent a long vowel.

Arabic triliteral verb: an Arabic verb whose root contains three consonants.

Arabic weak verb: an Arabic verb whose root contains at least one weak letter, i.e. a letter that can also represent a long vowel.

Arabic verb form: from any particular sequence of root letters, up to fifteen different verb stems may be derived, each with its own template or vowel pattern and semantic information. These stems are called verb forms.

AVM: Attribute Value Matrix.

Bound morpheme: a morpheme that never occurs by itself but is always attached to some other morpheme.

Concatenative morphology: the process in which bound morphemes are linearly concatenated.

Construct: a formal linguistic representation of a construction rule.

Construction: an ordered arrangement of grammatical units forming a larger unit.

Construction rule: a rule for constructing phrases or sentences.

Declension: the process of disambiguating the grammatical roles of words by changing their final vowels. In Arabic, the final vowel marks case for nominals and mood for verbs.

Derivation: the formation of a new word or inflectable stem from another word or stem. It typically occurs by the addition of an affix. The derived word is often of a different syntactic category from the original.

Free morpheme: a morpheme that can occur by itself, although other morphemes such as affixes can be attached to it.

Gemination: the consecutive double occurrence of a consonant.

HPSG: Head-driven Phrase Structure Grammar, developed by Carl Pollard and Ivan Sag [32, 33].

Inflection: a variation in the form of a word that expresses a grammatical contrast which is obligatory for the stem. It does not change the syntactic category of the word.

Lexeme: the minimal unit of language that has a semantic interpretation and embodies a distinct cultural concept. For example, in English, run, runs, ran and running are forms of the same lexeme, conventionally written as run.

Lexical construction rule: a construction rule that forms lexical items, i.e. words and stems.

Lexical entry: the entry of a lexeme in the lexicon.

Lexicon: the vocabulary of a language, including its words and expressions. The term is sometimes used as a synonym of thesaurus. It includes the lexemes used to actualize words. Grammatical rules are not considered part of the lexicon.

LFG: Lexical Functional Grammar, developed by Joan Bresnan in 1982.

Morpheme: the smallest meaningful unit in the grammar of a language.

Morphological process: a means of changing a stem to adjust its meaning to fit its syntactic and communicational context. There are two types of processes: concatenative and nonconcatenative.

Morphology: the study of word formation, i.e. the internal structure of words.

Morphosyntactic operation: an ordered, dynamic relation between one linguistic form and another. Derivation and inflection are morphosyntactic operations.


Nonconcatenative morphology: the process in which bound morphemes are combined nonlinearly, for example when a vowel pattern is interleaved with the consonants of a root.

Phonology: the systematic use of sound to encode meaning.

Phrasal construction rule: a construction rule that deals with the forming of phrases and sentences.

Root: the portion of a word that carries the principal portion of meaning of the words in which it functions. It is common to a set of derived or inflected forms, if any, when all affixes are removed. A root is also a stem.

Root class: a set of roots that share a common derivational and inflectional paradigm.

SBCG: Sign-Based Construction Grammar, a variant of HPSG proposed by Ivan Sag in 2007.

Semantics: the study of meaning. It typically focuses on the relation between signifiers, such as words, phrases, signs and symbols.

Sign: a formal linguistic representation of words, phrases and sentences. All human utterances are captured by signs.

Stem: the root or roots of a word, together with any derivational affixes, to which inflectional affixes are added.

Syntactic category: a set of words and/or phrases in a language that share a significant number of common characteristics. It is also known as syntactic class.

Syntax: the study of the principles and rules for constructing phrases or sentences.

TRALE: a grammar implementation platform specially developed for HPSG. It is an extension of the Attribute Logic Engine (ALE).
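Several glossary entries (root, stem, nonconcatenative morphology, Arabic verb form) describe an Arabic word as a consonantal root interleaved with a vocalic pattern. The Python sketch below illustrates that idea under our own assumptions: the function `interdigitate` and the `C`-slot template notation are hypothetical and are not part of the TRALE grammar in Appendix B. The templates themselves are transcriptions of patterns used in that appendix's `morphs` clauses; for example, `(X,a,a,Y,i,Z,u,n)` corresponds to `CaaCiCun`.

```python
from typing import List


def interdigitate(root: List[str], template: str) -> str:
    """Fill each 'C' slot of a vocalic template with the next root consonant."""
    slots = template.count("C")
    if slots != len(root):
        raise ValueError(
            f"template {template!r} has {slots} slots but the root has {len(root)} consonants"
        )
    consonants = iter(root)
    # Every other character (vowels, prefixes such as 'm') is copied unchanged.
    return "".join(next(consonants) if ch == "C" else ch for ch in template)


if __name__ == "__main__":
    ktb = ["k", "t", "b"]
    # Form I perfect pattern, followed by derived-noun patterns taken from Appendix B.
    for template in ("CaCaCa", "CaaCiCun", "maCCuuCun", "maCCaCun", "aCCaCu"):
        print(f"{template:10} -> {interdigitate(ktb, template)}")
```

With the root k-t-b, the five templates yield kataba, kaatibun, maktuubun, maktabun and aktabu. The root stays constant while only the pattern varies, which is exactly what a linear affix-concatenation model cannot express directly.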

Appendix A

Table 5.1 gives the romanized transliteration of the Arabic alphabet.

Table 5.1: Transliteration Table of Arabic Alphabet

Arabic Letter | Transliteration | Arabic Letter | Transliteration
ب | b | ط | ṭ
ت | t | ظ | ẓ
ث | ṯ | ع | ʿ
ج | ǧ | غ | ġ
ح | ḥ | ف | f
خ | ḫ | ق | q
د | d | ك | k
ذ | ḏ | ل | l
ر | r | م | m
ز | z | ن | n
س | s | ه | h
ش | š | و | w
ص | ṣ | ي | y
ض | ḍ | |

Appendix B

Content of the signature file:

type_hierarchy
bot
list
ne_list hd:bot tl:list
e_list
char
k
t
b
a
i
u
n
r
s
j
d
m
h
sign phon:ne_list morph:morph arg_st:list syn:syn sem:sem
word dtrs:list dtr:sign
lexeme
noun_lex
derived_noun_lex
verb
derived_noun_lex
gerund_lex
active_participle_lex
trilateral_root_derived_ap_lex
formI_sound_trilateral_ap_lex
sound_root_derived_ap_lex
&formI_sound_trilateral_ap_lex
formI_derived_ap_lex
&formI_sound_trilateral_ap_lex
hyperbolic_participle_lex
passive_participle_lex
trilateral_root_derived_pp_lex
formI_sound_trilateral_pp_lex
sound_root_derived_pp_lex
&formI_sound_trilateral_pp_lex
formI_derived_pp_lex
&formI_sound_trilateral_pp_lex
resembling_participle_lex
locative_noun_lex
trilateral_root_derived_ln_lex
formI_sound_trilateral_ln_lex
sound_root_derived_ln_lex
&formI_sound_trilateral_ln_lex
formI_derived_ap_lex


&formI_sound_trilateral_ln_lex
formIA_sound_trilateral_ln_lex
formIB_sound_trilateral_ln_lex
utilitarian_noun_lex
comparative_lex
trilateral_root_derived_com_lex
formI_sound_trilateral_com_lex
sound_root_derived_com_lex
&formI_sound_trilateral_com_lex
formI_derived_com_lex
&formI_sound_trilateral_com_lex
noun
derived_noun_lex
dual_noun_lex
sg_noun_lex
pl_noun_lex
verb_lex
triliteral_root_verb_lex
formI_triliteral_root_verb_lex
formIA_triliteral_root_verb_lex
formIB_triliteral_root_verb_lex
formIC_triliteral_root_verb_lex
formID_triliteral_root_verb_lex
formIE_triliteral_root_verb_lex
formIF_triliteral_root_verb_lex
formIG_triliteral_root_verb_lex
formIH_triliteral_root_verb_lex
formII_root_verb_lex
quadriliteral_root_verb_lex
morph root:list stem:list
syn cat:cat val:list mrkg:mrkg
cat case:case def:def mood:mood vform:vform voice:voice
noun
verb
case
nom
acc
gen
person
first
second
third
number
sg
dual
plural
situation
writing
prostration
helping
honour
suffice
hearing
drinking
gender
male
female
def
yes
no
hum
y
n
vform
perf
imperf
voice
active
passive


mood
subjunctive
indicative
jussive
mrkg
none
that
lid
select
sem index:index sit_index:sit_index frames:list
index pers:person num:number gen:gender hum:hum
sit_index situation:situation
frame
event_fr sit:sit_index actor:index undgr:index location:index
compare_fr compared:index comparedwith:index dimension:sit_index
ref_fr ref_index:cat
.

Content of theory.pl:

% theory.pl
% Multifile declarations.
%:- multifile '##'/2.
%:- multifile '~~>'/2.
% load phonology and tree output
:- [trale_home(tree_extensions)].
% maximum 4 rules will be used for licensing.
:- lex_rule_depth(4).
% specify signature file
signature(signature).

>>> phon.
phon <<< morph.
morph <<< arg_st.
arg_st <<< syn.
syn <<< sem.
index <<< frames.
sit_index <<< frames.
sit <<< actor.
actor <<< undgr.
undgr <<< location.

% lexical entries
kataba ~~> (formIA_triliteral_root_verb_lex,
            (morph:(root:[k,t,b]),
             arg_st:[(OBJ_SIGN,
                      (syn:(cat:(case:acc,def:no)),
                       sem:(index:OBJ_INDEX)))],
             syn:(cat:(verb, vform:perf, voice:active, mood:subjunctive),
                  val:[OBJ_SIGN]),
             sem:(sit_index:(SIT_INDEX,(situation:writing)),
                  frames:[(sit:SIT_INDEX,
                           actor:(SUB_INDEX,
                                  (pers:third,num:sg,gen:male,hum:y)),
                           undgr:OBJ_INDEX,
                           location:LOC_INDEX)]))).

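nasara ~~> (formIA_triliteral_root_verb_lex,
            (morph:(root:[n,s,r]),
             arg_st:[(OBJ_SIGN,
                      (syn:(cat:(case:acc,def:no)),
                       sem:(index:OBJ_INDEX)))],
             syn:(cat:(verb, vform:perf, voice:active, mood:subjunctive),
                  val:[OBJ_SIGN]),
             sem:(sit_index:(SIT_INDEX,(situation:helping)),
                  frames:[(sit:SIT_INDEX,
                           actor:(SUB_INDEX,
                                  (pers:third,num:sg,gen:male,hum:y)),
                           undgr:OBJ_INDEX,
                           location:LOC_INDEX)]))).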


sajada ~~> (formIB_triliteral_root_verb_lex,
            (morph:(root:[s,j,d]),
             arg_st:[(OBJ_SIGN,
                      (syn:(cat:(case:acc,def:no)),
                       sem:(index:OBJ_INDEX)))],
             syn:(cat:(verb, vform:perf, voice:active, mood:subjunctive),
                  val:[OBJ_SIGN]),
             sem:(sit_index:(SIT_INDEX,(situation:prostration)),
                  frames:[(sit:SIT_INDEX,
                           actor:(SUB_INDEX,
                                  (pers:third,num:sg,gen:male,hum:y)),
                           undgr:OBJ_INDEX,
                           location:LOC_INDEX)]))).

% lexical construction rules
trilateral-active-lex-cxt ## (formI_triliteral_root_verb_lex,
    (morph:(root:ROOTS),
     syn:(cat:verb),
     sem:(sit_index:SIT_INDEX,
          frames:[(sit:SIT_INDEX, actor:SUB_INDEX)])))
  **> (formI_sound_trilateral_ap_lex,
    (morph:(root:ROOTS),
     arg_st:[],
     syn:(cat:(noun, case:nom, def:no), val:[], mrkg:none),
     sem:(index:SUB_INDEX,
          frames:[(sit:SIT_INDEX, actor:SUB_INDEX)])))
  morphs (X,a,Y,a,Z,a) becomes (X,a,a,Y,i,Z,u,n),
         (X,a,Y,u,Z,a) becomes (X,a,a,Y,i,Z,u,n),
         (X,a,Y,i,Z,a) becomes (X,a,a,Y,i,Z,u,n).

trilateral-passive-lex-cxt ## (formI_triliteral_root_verb_lex,
    (morph:(root:ROOTS),
     arg_st:[(OBJ_SIGN,
              (syn:(cat:(case:acc,def:no)),
               sem:(index:OBJ_INDEX)))],
     syn:(cat:verb,
          val:[OBJ_SIGN]),
     sem:(sit_index:SIT_INDEX,
          frames:[(sit:SIT_INDEX, undgr:OBJ_INDEX)])))
  **> (formI_sound_trilateral_pp_lex,
    (morph:(root:ROOTS),
     arg_st:[],
     syn:(cat:noun, val:[], mrkg:none),
     sem:(index:OBJ_INDEX,
          frames:[(sit:SIT_INDEX, undgr:OBJ_INDEX)])))
  morphs (X,a,Y,a,Z,a) becomes (m,a,X,Y,u,u,Z,u,n),
         (X,a,Y,u,Z,a) becomes (m,a,X,Y,u,u,Z,u,n),
         (X,a,Y,i,Z,a) becomes (m,a,X,Y,u,u,Z,u,n).

trilateral-locative-formIA-lex-cxt ## (

    formIA_triliteral_root_verb_lex,
    (morph:(root:ROOTS),
     syn:(cat:verb),
     sem:(sit_index:SIT_INDEX,
          frames:[(sit:SIT_INDEX, location:LOC_INDEX)])))
  **> (formIA_sound_trilateral_ln_lex,
    (morph:(root:ROOTS),
     arg_st:[],
     syn:(cat:noun, val:[], mrkg:none),
     sem:(index:LOC_INDEX,
          frames:[(sit:SIT_INDEX, location:LOC_INDEX)])))
  morphs (X,a,Y,a,Z,a) becomes (m,a,X,Y,a,Z,u,n).

trilateral-locative-formIB-lex-cxt ## (

    formIB_triliteral_root_verb_lex,
    (morph:(root:ROOTS),
     syn:(cat:verb),
     sem:(sit_index:SIT_INDEX,
          frames:[(sit:SIT_INDEX, location:LOC_INDEX)])))
  **> (formIB_sound_trilateral_ln_lex,
    (morph:(root:ROOTS),
     arg_st:[],
     syn:(cat:noun, val:[], mrkg:none),
     sem:(index:LOC_INDEX,
          frames:[(sit:SIT_INDEX, location:LOC_INDEX)])))
  morphs (X,a,Y,a,Z,a) becomes (m,a,X,Y,i,Z,u,n).

trilateral-comparative-formI-lex-cxt ## (formI_triliteral_root_verb_lex,
    (morph:(root:ROOTS),
     syn:(cat:verb),
     sem:(sit_index:SIT_INDEX,
          frames:[(sit:SIT_INDEX)])))
  **> (formI_sound_trilateral_com_lex,
    (morph:(root:ROOTS),
     arg_st:[(OBJ_SIGN,
              (syn:(cat:(case:gen,def:no)),
               sem:(index:OBJ_INDEX)))],
     syn:(cat:noun, val:[], mrkg:none),
     sem:(index:SUB_INDEX,
          frames:[(sit:SIT_INDEX,
                   actor:(SUB_INDEX2,
                          (pers:PERS, num:sg, gen:male, hum:HUM))),
                  (dimension:SIT_INDEX, compared:SUB_INDEX2,
                   comparedwith:OBJ_INDEX)])))
  morphs (X,a,Y,a,Z,a) becomes (a,X,Y,a,Z,u),
         (X,a,Y,u,Z,a) becomes (a,X,Y,a,Z,u),
         (X,a,Y,i,Z,a) becomes (a,X,Y,a,Z,u).
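The `morphs` clauses of the lexical construction rules above rewrite the surface form of a Form I perfect verb into the surface form of the derived noun. The Python sketch below mirrors those rewrite patterns so that they can be checked outside TRALE. It is an approximation under stated assumptions: the names `apply_morphs` and `RULES` are hypothetical; each variable `X`, `Y`, `Z` is assumed to match exactly one consonant (TRALE's own matching of `morphs` variables may be more general); and the lexical-type conditions of each rule (for example, Form IA versus Form IB for the locative noun) are not checked here.

```python
import re
from typing import Dict, List, Optional, Sequence, Tuple

# Symbols read as variables, mirroring X, Y and Z in the morphs clauses.
VARIABLES = {"X", "Y", "Z"}
VOWELS = "aiu"

Template = Sequence[str]
Rule = List[Tuple[Template, Template]]


def _compile(template: Template) -> "re.Pattern[str]":
    """Turn an input template into a regex; each variable matches ONE consonant (simplification)."""
    parts = [
        f"(?P<{sym}>[^{VOWELS}])" if sym in VARIABLES else re.escape(sym)
        for sym in template
    ]
    return re.compile("".join(parts))


def apply_morphs(rule: Rule, word: str) -> Optional[str]:
    """Rewrite `word` with the first clause whose left-hand side matches it, else return None."""
    for lhs, rhs in rule:
        match = _compile(lhs).fullmatch(word)
        if match:
            return "".join(match.group(sym) if sym in VARIABLES else sym for sym in rhs)
    return None


# Form I perfect input patterns: (X,a,Y,a,Z,a), (X,a,Y,u,Z,a), (X,a,Y,i,Z,a).
PERFECT = [("X", "a", "Y", v, "Z", "a") for v in ("a", "u", "i")]

# Output templates copied from the morphs clauses; the rule names are hypothetical labels.
RULES: Dict[str, Rule] = {
    "active_participle": [(p, ("X", "a", "a", "Y", "i", "Z", "u", "n")) for p in PERFECT],
    "passive_participle": [(p, ("m", "a", "X", "Y", "u", "u", "Z", "u", "n")) for p in PERFECT],
    "locative_formIA": [(PERFECT[0], ("m", "a", "X", "Y", "a", "Z", "u", "n"))],
    "locative_formIB": [(PERFECT[0], ("m", "a", "X", "Y", "i", "Z", "u", "n"))],
    "comparative": [(p, ("a", "X", "Y", "a", "Z", "u")) for p in PERFECT],
}

if __name__ == "__main__":
    for rule_name, verb in [
        ("active_participle", "kataba"),
        ("passive_participle", "nasara"),
        ("locative_formIB", "sajada"),
        ("comparative", "kataba"),
    ]:
        print(f"{rule_name}: {verb} -> {apply_morphs(RULES[rule_name], verb)}")
```

For example, kataba yields the active participle kaatibun and the comparative aktabu, nasara yields the passive participle mansuurun, and the Form IB verb sajada yields the locative noun masjidun. A form that fits none of the perfect patterns, such as the passive kutiba, is left unmatched (None), much as a TRALE lexical rule would fail to apply.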
