Arabic Nominals in HPSG: A Verbal Noun Perspective
Abstract
Semitic languages exhibit rich nonconcatenative morphological operations, which can
generate a myriad of derived lexemes. In particular, the feature-rich, root-driven
morphology of Arabic supports the construction of several kinds of verbal nouns, such as
gerunds, active participles, passive participles and locative nouns. Head-driven Phrase
Structure Grammar (HPSG) is a strong candidate for capturing this rich morphology in
natural language processing. It combines the best ideas from its predecessors and
integrates all linguistic layers (phonology, morphology, syntax, semantics, context,
etc.) of natural language processing. Although HPSG is a successful syntactic theory, it
lacks a representation of complex nonconcatenative morphology. In this work, we propose
a novel HPSG representation that includes the morphological, syntactic and semantic
features of Arabic nominals and various verbal nouns. We also present the lexical type
hierarchy and derivational rules for generating these verbal nouns within the HPSG
framework. Finally, we have implemented the lexical type hierarchy, Attribute Value
Matrix (AVM) and construction rules on the TRALE platform (an extension of the Attribute
Logic Engine) to validate the proposed HPSG formalism.
Chapter 1
Introduction
Head-driven Phrase Structure Grammar (HPSG) is an attractive tool for capturing complex
linguistic constructs. It combines the best ideas from its predecessors: Generalized
Phrase Structure Grammar (GPSG) [15], Lexical Functional Grammar (LFG) [6] and Government
and Binding theory (GB) [8]. It is well suited to natural language processing because it
integrates the essential linguistic layers (phonology, morphology, syntax, semantics,
context, etc.). It is also flexible enough to be adapted to a specific language.
1.1 Motivation
Semitic languages such as Arabic, Amharic and Hebrew exhibit rich nonconcatenative
morphological operations for constructing their lexicons. Computational modeling of this
morphology can therefore give large vocabulary coverage in these languages. Among the
Semitic languages, we have chosen Arabic for nonconcatenative morphological analysis. It
is the best instance of nonconcatenative morphology among the living languages. More
than two hundred and eighty million people speak it as a first language, and it is an
official language of twenty-two countries. It ranks fifth by number of native speakers.
It is also the intellectual and liturgical language of the Islamic world. Despite these
facts, the morphological analysis of Arabic is a relatively new area of research.
1.2 Scope of the Work
The HPSG analysis of nonconcatenative morphology in general, and of Semitic languages in
particular, is relatively new. However, the intricate nature of Arabic morphology has
motivated several research projects addressing these issues [1, 7, 40]. HPSG
representations of Arabic verbs and morphologically complex predicates are discussed in
[2–4]. An in-depth analysis of declensions in Arabic nouns is presented in [18]. Arabic
nominals are more diverse and more important than their counterparts in other languages.
Modifiers, such as adjectives and adverbs, are treated as nominals in Arabic. Moreover,
Arabic nouns can be derived from verbs or from other nouns. Derivation from verbs is one
of the primary means of forming Arabic nouns, yet no HPSG analysis of it has been
conducted so far.
Arabic nouns can be categorized based on several dimensions like derivation (derived
from verb or noun), ending type (sound ending or weak ending), declension (declinable or
indeclinable), etc. Based on derivation, Arabic nouns can be divided into two categories
as follows:
1. Non-derived nouns: These are not derived from any other noun or verb.
2. Derived nouns: These are derived from other nouns or verbs.
An example of a non-derived, static noun is حِصَانٌ (ḥiṣānun, which means “horse”): it is
not derived from any noun or verb, and no verb is generated from this word. On the other
hand, كَاتِبٌ (kātibun, which means “writer”) is an example of a derived noun. This word
is generated from the verb كَتَبَ (kataba), which means “he wrote” in English. This simple
example provides a glimpse of the complexity of the derivational, nonconcatenative
morphology that constructs a noun from a verb in Arabic. In this work, we analyze and
propose the HPSG constructs required for capturing the syntactic and semantic effects of
this rich morphology.
An HPSG formalization of Arabic nominal sentences is presented in [29]. The
formalization covers seven types of simple Arabic nominal sentences while taking care of
the agreement aspect. In [24], an HPSG analysis of the broken plural and the gerund is
presented. The main assumption in that work revolves around Concrete Lexical
Representations (CLRs) located between an HPSG type lexicon and phonological
realization. However, the authors of that work do not address other forms of verbal
nouns, including participles.
In this work, we analyze all types of verbal nouns generated from strong (or sound)
triliteral root verbs. We analyze their derivation from verbs and their syntactic and
semantic information. We do not analyze the derivation of verbal nouns from strong
quadriliteral or weak verbs, because all eight types of verbal nouns are derived from
strong triliteral root verbs and these derivations follow regular patterns. The pattern
of derivation from quadriliteral or weak verbs, on the other hand, is less regular, so
analyzing those derivations needs more effort. Moreover, in most cases at most three
types of verbal nouns are derived from these types of root verbs.
1.3 Contribution
Our contributions towards the HPSG analysis of Arabic nouns presented in this
dissertation are as follows:
• We formulate the structure of the Attribute Value Matrix (AVM) for Arabic nouns and
extend the AVM for Arabic verbs proposed in [2]. We make this design robust so that it
can handle not only lexeme and word construction but also phrase and sentence
construction.
• We capture the syntactic and semantic effects of Arabic morphology.
• We determine the placement of verbal nouns and their subtypes in the lexical type
hierarchy, with proper justification.
• Arabic morphology is generally root-pattern morphology: different lexemes can be
generated from the same root using different patterns. We utilize this root-pattern
morphology to design lexical rules that avoid the need for exhaustive lexical entries
for four types of verbal nouns derived from all strong triliteral root verbs. As a
result, hundreds of verbal nouns can be recognized merely by associating each root verb
with the set of lexical rules applicable to it. The number of lexical entries in the
dictionary is thus greatly reduced.
• We implement the designed AVM, type hierarchy and lexical rules in TRALE (an extension
of the Attribute Logic Engine) [34], a freeware system developed in Prolog that
integrates phrase structure parsing, semantic-head-driven generation and constraint
logic programming with typed feature structures as terms.
1.4 Organization of Rest of the Dissertation
Chapter 2 gives the background by explaining the linguistic concepts and necessary
tools. It discusses several linguistic topics ranging from morphology and syntax to
semantics. Then it provides a sketch of Arabic grammar, mainly the morphology associated
with its word construction. Next, it gives a brief introduction to HPSG, the mathematical
theory of language used in our thesis. The last part of the chapter presents a detailed
discussion of the related work done so far.
Chapter 3 presents our contribution to the development of a generic structure for the
Attribute Value Matrix of Arabic nouns. It also describes the type hierarchy of the
Arabic noun and its subtypes along the derivation dimension. Next, it discusses the
construction rules for four types of Arabic verbal nouns derived from strong triliteral
root verbs. It also designs the lexical entries for the other four types of verbal nouns,
which do not follow rigorously regular patterns.
Chapter 4 gives a brief description of the TRALE lexical compiler. Then it shows the
necessary components of TRALE and how we implement our HPSG formalism using TRALE.
Finally, Chapter 5 gives the conclusion. In that chapter, we state the concrete
contribution of our work from a technical point of view. We finish the chapter by giving
directions for further research on this topic.
Chapter 2
Background and Related Works
The topics discussed in this chapter serve as background for the rest of the thesis. In
Section 2.1 we explain the theoretical linguistics that is necessary to develop
linguistic models. Section 2.2 gives an introduction to morphology, and more
specifically to morphology in Arabic and its effect on other linguistic layers. Section
2.3 gives an overview of Head-driven Phrase Structure Grammar (HPSG). Finally, in
Section 2.4, we present the state of research on HPSG modeling, with emphasis on the
Arabic language. In this chapter, we frequently use the Arabic alphabet. We present the
transliteration of the Arabic alphabet in Table 5.1.
2.1 Theoretical Linguistics
The scientific study of human language is called linguistics. Among all branches of
linguistics, theoretical linguistics is the most important for developing models of
linguistic knowledge. The core subjects of theoretical linguistics are phonology,
morphology, syntax and semantics. The parts of theoretical linguistics can be summarized
as follows:
• Phonology: is the systematic use of sound to encode meaning in any spoken human
language, or the field of linguistics studying this use. In other words, it is concerned
with the function, behaviour and organization of sounds as linguistic items.
• Morphology: is the study of word formation. It is the study of the internal struc-
ture of words or in other words it is the study of the patterns of word formation in a
particular language, description of such patterns and the behavior and combination
of morphemes.
• Syntax: is the study of the principles and rules for constructing phrases or
sentences in natural languages.
• Semantics: is the study of meaning. It typically focuses on the relation between
signifiers, such as words, phrases, signs and symbols, and what they denote.
• Pragmatics: is the study of the ways in which context contributes to meaning. It
studies how the transmission of meaning depends not only on the linguistic knowledge
(e.g. grammar, lexicon, etc.) of the speaker and listener, but also on the context of
the utterance, knowledge about the status of those involved, the inferred intent of the
speaker, and so on.
• Discourse: is the study of connected speech. A discourse constitutes sequences
of relations to objects, subjects or predicates. Discourse can be observed in mul-
timodal/multimedia forms of communication including the use of spoken, written
and signed language in contexts spanning from oral history to instant message con-
versations to textbooks.
Although phonology is a significant part of theoretical linguistics, it is beyond the
scope of this thesis, because it deals with the sounds of language, whereas our work
begins at word formation, i.e. morphology. As background, we discuss the concepts related
to the morphological, syntactic and semantic layers. We have taken the linguistic
definitions from [25].
2.1.1 Morphology
Morphology is the study of the internal structure of words; in other words, it is the
study of the patterns of word formation in a particular language, the description of such
patterns, and the behavior and combination of morphemes. It can be thought of as a
system of adjustments in the shapes of words that contribute to adjustments in the way
speakers intend their utterances to be interpreted. A word is sometimes placed, in a
hierarchy of grammatical constituents, above the morpheme level and below the phrase
level. We discuss the concept of constituents further in Section 2.1.2.
A morpheme is the smallest meaningful unit in the grammar of a language. The word
‘dogs’ consists of two morphemes: ‘dog’, and ‘-s’, a plural marker on nouns.
A morpheme can be categorized according to how it combines with other morphemes to form
a word. Some kinds of morphemes are:
• Bound morpheme: A bound morpheme is a grammatical unit that never occurs by itself,
but is always attached to some other morpheme. In the above example, ‘-s’ is a bound
morpheme.
• Free morpheme: A free morpheme is a grammatical unit that can occur by itself.
However, other morphemes such as affixes can be attached to it. In the above example,
‘dog’ is a free morpheme.
• Affix: An affix is a bound morpheme that is joined before, after, or within a root or
stem. In the above example, ‘-s’ is an affix.
• Root: A root is the portion of a word that carries the principal portion of meaning of
the words in which it functions. It is common to a set of derived or inflected forms, if
any, when all affixes are removed. A root is also a stem. In the above example, ‘dog’ is
a root. Another example of a root is ‘speak’. It carries the principal portion of the
meaning of this word. ‘Speaker’ is not a root; rather, it is derived from a root.
• Stem: A stem is the root or roots of a word, together with any derivational affixes,
to which inflectional affixes are added. In the two examples above, ‘dog’ is both a root
and a stem, whereas ‘speaker’ is a stem and ‘speak’ is its root.
• Clitic: A clitic is a morpheme that has the syntactic characteristics of a word, but
shows evidence of being phonologically bound to another word. Examples of clitics in
English are the contracted forms -’m in ‘I’m’ and -’ll in ‘we’ll’.
Among these morphemes, the clitic is beyond the scope of this thesis. Root, stem and
affix will be discussed after the discussion of morphosyntactic operations.
A morphosyntactic operation is an ordered, dynamic relation between one linguistic
form and another. There are two kinds of morphosyntactic operations:
• Derivation - is the formation of a new word or inflectable stem from another word or
stem. It typically occurs by the addition of an affix. The derived word is often of a
different word class (or category) from the original. It may thus take the inflectional
affixes of the new word class. Example - ‘speaker’ is derived from ‘speak’. ‘Speak’ is
both a root and a stem. ‘Speaker’ is a new stem, derived from ‘speak’ by a derivational
operation that uses the derivational affix (suffix) ‘-er’. The derived word ‘speaker’ is
a stem but not a root, because it can be further analyzed into the meaningful unit
‘speak’, which is the root of ‘speaker’. Also notable in this example is that ‘speak’ is
a verb whereas the derived word ‘speaker’ is a noun. Thus the word class of the derived
word differs from that of its root.
• Inflection - is variation in the form of a word, typically by means of an affix, that
expresses a grammatical contrast which is obligatory for the stem’s word class in some
given grammatical context. As an example, ‘speakers’ is inflected from the stem
‘speaker’. This inflection is necessary if ‘speaker’ is used in the plural. Here the
suffix ‘-s’ is used for inflection. The word ‘speakers’ is not a stem. Its category is
the same as the category of ‘speaker’. Thus, inflection differs from derivation in that
the syntactic category does not change.
Morphology deals with two kinds of information.
• Firstly, what information is encoded by the morpheme. For example, take the Arabic
word kataba - he wrote. A variety of information is encoded in this word and its other
inflected or derived forms. Some are listed below:
– Agreement: kataba - كَتَبَ – he wrote. Person – 3rd, Number – Singular, Gender –
Masculine, Mood – Indicative.
– Event structure: kataba - كَتَبَ – he wrote. Tense – Past, Aspect – Perfect.
– Agency: kutiba - كُتِبَ – it was written. Voice – Passive.
– Illocutionary force: uktub - اُكْتُبْ – Write. Mode – Command.
– Part-of-Speech: kitābun - كِتَابٌ – a book – noun. kataba - كَتَبَ – he wrote – verb.
– Definiteness: al-kitābu - اَلْكِتَابُ – the book. Determiner – Definite.
– Complex Predicate: kattaba - كَتَّبَ – he made to write. Semantic relation – Causation.
There are many more syntactic and semantic phenomena that can be expressed using
morphology.
• Secondly, the morphological process, which is a means of changing a stem to adjust
its meaning to fit its syntactic and communicational context. It encodes morphosyntactic
operations. As an example, plural formation is a morphosyntactic operation, whereas
suffixation is a kind of morphological process that English uses to encode plural
formation. The morphological processes for concatenative and nonconcatenative
morphosyntactic operations are shown below:
– Concatenative operations are those where morphemes are linearly concatenated. This
process is also called agglutination, and a language that uses it extensively is called
an agglutinative language. For example:
∗ Prefixation: Morphemes concatenated at the front, e.g., clear – unclear
∗ Suffixation: Morphemes concatenated at the back, e.g., walk – walked
∗ Circumfixation: Morphemes concatenated both at the front and back, e.g., mind –
unmindful
– Nonconcatenative operations are those where morphemes are nonlinearly embedded. A
language that uses this process frequently is called a fusional language. For example:
∗ Infixation: Root letter morphemes embedded at the middle, e.g., kataba – kattaba
∗ Simulfixation: Front morpheme shifted to the back, e.g., eat – ate
∗ Modification: Middle vowel changed, e.g., man – men
∗ Suppletion: Whole stem changed, e.g., go – went
In this thesis, we focus mainly on nonconcatenative operations, while also covering
concatenative operations, and give a mathematical formalism to capture their rich
diversity in Arabic.
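As a rough illustration of the contrast drawn above, the following Python sketch models the three concatenative processes as plain string concatenation and one nonconcatenative process (doubling a consonant inside the stem) as an operation that has to reach into the stem. The function names are hypothetical and chosen only for this illustration; they are not part of the thesis or of TRALE, and the sketch ignores phonology entirely.

```python
# Illustrative sketch only: function names are assumptions, not the thesis's rules.

def prefixation(stem: str, prefix: str) -> str:
    """Concatenate a morpheme at the front of the stem."""
    return prefix + stem


def suffixation(stem: str, suffix: str) -> str:
    """Concatenate a morpheme at the back of the stem."""
    return stem + suffix


def circumfixation(stem: str, front: str, back: str) -> str:
    """Concatenate morphemes at both the front and the back of the stem."""
    return front + stem + back


def geminate(stem: str, consonant: str) -> str:
    """Nonconcatenative example: double the first occurrence of a consonant
    inside the stem, so the new material is embedded rather than appended."""
    position = stem.index(consonant)  # raises ValueError if the consonant is absent
    return stem[:position] + consonant + stem[position:]


print(prefixation("clear", "un"))            # unclear
print(suffixation("walk", "ed"))             # walked
print(circumfixation("mind", "un", "ful"))   # unmindful
print(geminate("kataba", "t"))               # kattaba
```

The concatenative functions never inspect the inside of the stem, whereas `geminate` must locate a position within it; that difference is what makes nonconcatenative morphology harder to express with simple affixation rules.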
2.1.2 Syntax
Syntax is the study of the principles and rules for constructing phrases or sentences in
natural languages. The term syntax is also used to refer directly to the rules and
principles that govern the sentence structure of an individual language. There are a
number of theoretical approaches to the discipline of syntax. Some popular approaches
are:
• Generative grammar,
• Categorial grammar,
• Dependency grammar,
• Stochastic/probabilistic grammars/network theories,
• Functionalist grammars.
Modern research in syntax attempts to describe languages in terms of such rules, which
are often referred to as construction rules. These rules are the basis of generative
grammar. Our current research also concerns the formation of these rules, so in the
discussion of syntax we put much emphasis on construction.
A construction is an ordered arrangement of grammatical units forming a larger unit.
Different usages of the term construction include or exclude stems and words. There are
several kinds of construction. Some of these are -
• Apposition - is a construction consisting of two or more adjacent units that have
identical referents. Example - My friend John.
• Clause - is a grammatical unit that includes, at minimum, a predicate and an explicit
or implied subject, and expresses a proposition. Example - It is cold, although the sun
is shining. This sentence contains two clauses: It is cold is the main clause, and
although the sun is shining is the subordinate clause.
• Direct speech - is quoted speech that is presented without modification, as it
might have been uttered by the original speaker. Example - Patrick Henry said,
“Give me liberty or give me death”.
• Indirect speech - is reported speech that is presented with grammatical modifica-
tions, rather than as it might have been uttered by the original speaker. Example
- Patrick Henry said to give him liberty or give him death.
• Phrase - is a syntactic structure that consists of more than one word but lacks the
subject-predicate organization of a clause. For example, the house at the end of the
street is a phrase. It acts like a noun.
• Sentence - is a grammatical unit that is composed of one or more words or phrases that
generally bear minimal syntactic relation to the words or phrases that precede or follow
it. Example - I am reading a book. This sentence is a composition of three phrases.
• Stem - is the root or roots of a word, together with any derivational affixes, to
which inflectional affixes are added. It has been discussed in detail in Section 2.1.1.
• Word - is a unit which is a constituent at the phrase level and above. It is
sometimes identifiable according to such criteria as being the minimal possible unit
in a reply.
All these constructions can be classified into two categories: lexical construction and
phrasal (or combinatoric) construction. Lexical construction deals with forming the
lexicon, that is, forming words and stems. As an example, forming speaker from speak is
a lexical construction. Phrasal construction, on the other hand, deals with the
formation of units larger than words and stems. This type of construction thus forms
phrases, clauses and sentences.
Constituent is an important concept in the discussion of construction. A constituent is
one of two or more grammatical units that enter syntactically or morphologically into a
construction at any level. For example, the sentence I eat bananas every day contains
the following constituents:
1. Immediate constituents: I, eat bananas every day
2. Ultimate constituents: I, eat, banana, -s, every, day
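The distinction between immediate and ultimate constituents can be sketched with a small tree, where the immediate constituents are the children of the top node and the ultimate constituents are the leaves. The bracketing and category labels below (S, NP, VP, V, N, AdvP), and the choice to split "every day" into two leaves, are assumptions made for this illustration; the text above fixes only the two lists of constituents.

```python
# A node is either a leaf (a plain string) or a (label, children) pair.
# The labels and internal bracketing are assumptions for illustration.
SENTENCE = ("S", [
    ("NP", ["I"]),
    ("VP", [
        ("V", ["eat"]),
        ("N", ["banana", "-s"]),
        ("AdvP", ["every", "day"]),
    ]),
])


def ultimate_constituents(node):
    """Return the leaves (smallest units) under a node, left to right."""
    if isinstance(node, str):
        return [node]
    _label, children = node
    leaves = []
    for child in children:
        leaves.extend(ultimate_constituents(child))
    return leaves


def immediate_constituents(node):
    """Return the direct parts of a non-leaf node, each given as its list of leaves."""
    _label, children = node
    return [ultimate_constituents(child) for child in children]


print(immediate_constituents(SENTENCE))
# [['I'], ['eat', 'banana', '-s', 'every', 'day']]
print(ultimate_constituents(SENTENCE))
# ['I', 'eat', 'banana', '-s', 'every', 'day']
```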
There are several related, cross-cutting and sometimes confusing concepts related to
constituents. We explain the concepts at syntactic level. Syntactic constituents can
be classified under syntactic category. A syntactic category is a set of words and/or
phrases in a language which share a significant number of common characteristics. The
classification is based on similar structure and sameness of distribution (the structural
relationships between these elements and other items in a larger grammatical structure),
and not on meaning. It is also known as syntactic class. Among the major syntactic
categories there are phrasal syntactic categories like NP (noun phrase), VP (verb phrase),
PP (prepositional phrase) and lexical categories that serve as heads of phrasal syntactic
categories like noun, verb and others. For example, a prepositional phrase (PP) is a
phrase that has a preposition as its head. The definitions of noun phrase (NP) and verb
phrase (VP) are similar.
Constituents can perform syntactic functions in the construction. A syntactic
function is the grammatical relationship of one constituent to another within a syntactic
construction. There are various kinds of syntactic functions such as subject, predicate,
object, complement, adjunct, modifier and others.
Syntactic functions are significant in categorial grammar. As HPSG is based on the
generative paradigm, syntactic functions are not used here for syntax modeling. Instead,
we model the syntax by construction rules.
2.1.3 Semantics
Semantics is the study of meaning. It typically focuses on the relation between
signifiers, such as words, phrases, signs and symbols, and what they denote. In
linguistics, it is the study of the interpretation of signs or symbols as used by agents
or communities within particular circumstances and contexts. The formal study of
semantics intersects with many other fields of inquiry, including lexicology, syntax,
pragmatics, etymology and others. The formal study of semantics is therefore complex.
Semantics is closely related to reference, and references are used for agreement.
Several types of agreement are mentioned in HPSG 94 [33]. Some of these are:
1. Index agreement: It arises when indices are required to be token identical. That is,
the value of the semantic index of one lexical item needs to agree with the value of the
semantic index of another lexical item.
2. Syntactic agreement: It arises when strictly syntactic objects (e.g. CASE values) are
identified. That is, a lexical item has a syntactic requirement, and this requirement
can be fulfilled by another lexical item that has a certain syntactic object value.
3. Pragmatic agreement: It arises when contextual background assumptions are required to
be consistent.
Agreement is not syntactic in most languages. To show this, consider the sentence the
beef sandwich at table six is getting restless. The referent of the subject in this
sentence is not “the beef sandwich” but rather the customer who ordered it. As in
English, agreement in Arabic is not syntactic; rather, it is semantic. Which properties
of referents are encoded by agreement features is subject to cross-linguistic variation,
but common choices include person, number and gender. In some languages, gender
distinctions correspond to semantic sortal distinctions such as sex, human/nonhuman,
animate/inanimate or shape. Arabic is an example of this type of language. So, along
with person, number and gender, the human/nonhuman distinction must be preserved for
agreement. We will discuss this with an example in Section 3.1.3.
2.2 Arabic Morphology
Arabic is rich in nonconcatenative morphology, which is mainly root-pattern morphology.
In this section, we introduce root-pattern morphology and its effect on Arabic verbs and
verbal nouns. Then, we discuss the different types of Arabic verbal nouns.
2.2.1 Root-Pattern Morphology
The Arabic verb is an excellent example of nonconcatenative, root-pattern-based
morphology. A combination of root letters is plugged into one of a variety of
morphological patterns, each with a priori fixed letters and a particular vowel melody,
to generate a verb of a particular type that carries some syntactic and semantic
information [3]. The root of a stem denotes a semantic core, and the vowel pattern bears
the syntactic information. Derivations from a common root but different patterns share a
common meaning. Similarly, derivations from the same pattern but different roots share
common syntactic information. A particular combination of root and pattern yields a
fixed syntactic and semantic meaning. Root and pattern must co-exist, and the
combination of root and pattern specifies the semantic meaning.
This is illustrated by the following figures. Figure 2.1 shows how different sets of
root letters plugged into the same vowel pattern generate different verbs with the same
syntactic information. Similarly, Figure 2.2 shows how the same set of root letters
plugged into different vowel patterns generates two lexemes with completely different
syntactic information. At the same time, these two lexemes share related semantic
meaning.
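The interdigitation described for Figures 2.1 and 2.2 can be sketched in a few lines of Python. The patterns `_a_a_a` and `_aa_i_un` and the roots (k,t,b) and (n,s,r) are taken from the figures; the function name `apply_pattern` and the convention that each underscore is a slot for the next root letter are assumptions of this sketch, not the lexical rules developed later in the thesis.

```python
def apply_pattern(root, pattern):
    """Fill each '_' slot of the pattern with the next root letter, in order."""
    if pattern.count("_") != len(root):
        raise ValueError("pattern must have exactly one slot per root letter")
    letters = iter(root)
    return "".join(next(letters) if ch == "_" else ch for ch in pattern)


PERFECT_ACTIVE = "_a_a_a"      # pattern shown in Figures 2.1 and 2.2
ACTIVE_PARTICIPLE = "_aa_i_un"  # second pattern shown in Figure 2.2

# Same pattern, different roots: different verbs, same syntactic information.
print(apply_pattern(("k", "t", "b"), PERFECT_ACTIVE))     # kataba
print(apply_pattern(("n", "s", "r"), PERFECT_ACTIVE))     # nasara

# Same root, different patterns: related meaning, different syntactic information.
print(apply_pattern(("k", "t", "b"), ACTIVE_PARTICIPLE))  # kaatibun
```

Keeping the root and the pattern as separate inputs mirrors the observation above that the root contributes the semantic core while the pattern contributes the syntactic information.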
Besides the vowel pattern, a particular verb type depends on the root class. The root
class is determined on the basis of the phonological characteristics of the root letters.
Root classes can be categorized on the basis of the number of root letters, the position
or existence of vowels among these root letters, and the existence of gemination
(tashdeed). Most Arabic verbs are generated from triliteral and quadriliteral roots. In
Modern Standard Arabic, five-letter roots are obsolete. Phonological and morphophonemic
rules can be applied to various kinds of sound and irregular roots. Among these root
classes, the sound root class is the simplest, and its morphological information is easy
to categorize. A sound root consists of three consonants, all of which are different
[37]. Non-sound root classes, on the other hand, are categorized into several subtypes
depending on the position of weak letters (i.e., vowels) and gemination or hamza (ء).
All these subtypes carry morphological information.
Figure 2.1: Root-pattern morphology 1: 3rd person singular masculine sound perfect
active Form I verb formation from the same pattern (_a_a_a). [Figure: the roots (k,t,b)
and (n,s,r) are plugged into the pattern (_a_a_a), giving the stems kataba (He wrote)
and nasara (He helped).]
2.2.2 Morphology in Arabic Verb and Verbal Noun
From any particular sequence of root letters (i.e., triliteral or quadriliteral, weak
or sound), up to fifteen different verb stems may be derived, each with its own template
or vowel pattern. These stems carry different semantic information. Western scholars
usually refer to these forms as Form I, II, . . . , XV. Forms XI to XV are rare in
Classical Arabic and even rarer in Modern Standard Arabic. These forms are discussed in
detail in [37]. Table 2.1 shows the semantic effect and an example of the most commonly
used verb forms (i.e. Form I to X). A particular sequence of root letters may not have a
meaningful word for every verb form. As an example, the root sequence k, t, b does not
have a meaningful word for Form IX.
Figure 2.2: Root-pattern morphology 2: the same root (k,t,b) carries the same kind of
semantic meaning. [Figure: the root (k,t,b) is plugged into the pattern (_a_a_a), giving
the stem kataba (He wrote), and into the pattern (_aa_i_un), giving kaatibun (Writer).]
These morphological verb forms has no relation with the verb form based on events
structure. There are three type of verb form based on event structure - perfect, imperfect
and imperative. Perfect indicates that the event has been completed, imperfect indicates
that the event has not yet been completed, and imperative indicates that the event is a
command. It is worth mentioning that Form I has eight subtypes depending on the vowel
following the middle letter in perfect and imperfect forms. Some types of verbal noun
formation depend on these subtypes. Any combination of root letters for Form I verb will
follow any one of these eight patterns. We refer these patterns as Form IA, IB, IC, . . .,
IH. These subtypes are shown in Table 2.2 with corresponding examples. For example,
the vowels on the middle letter for Form IA: nasara yansuru are a and u for perfect and
imperfect forms, respectively. Similarly, other forms depend on the combination of vowels
on these two positions. Not all kinds of combinations exist. In Form IH, the middle letter
is a long vowel and there is no short vowel on this letter. In summary, we can generate
different types of verbal nouns based on these verb forms, root types (position of weak
letter or gemination) and number of root letters.

Table 2.1: Arabic Verb Form

Form                                Example      Meaning
Form I (Transitive)                 kataba       He wrote
Form II (Causative)                 kattaba      He caused to write
Form III (Ditransitive)             kātaba       He corresponded
Form IV (Factitive)                 aktaba       He dictated
Form V (Reflexive)                  takattaba    It was written on its own
Form VI (Reciprocity)               takātaba     They wrote to each other
Form VII (Submissive)               inkataba     He was subscribed
Form VIII (Reciprocity)             iktataba     They wrote to each other
Form IX (Color or bodily defect)    iḥmarra      It turned red
Form X (Control)                    istaktaba    He asked to write

Table 2.2: Subtype of Form I Root Verb

Form       Example             Perfect mid-vowel    Imperfect mid-vowel
Form-IA    naṣara yanṣuru      a                    u
Form-IB    ḍaraba yaḍribu      a                    i
Form-IC    fataḥa yaftaḥu      a                    a
Form-ID    samiʿa yasmaʿu      i                    a
Form-IE    karuma yakrumu      u                    u
Form-IF    ḥasiba yaḥsibu      i                    i
Form-IG    faḍula yafḍilu      u                    i
Form-IH    kāda yakādu         (long vowel)         (long vowel)
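The subtype classification in Table 2.2 amounts to a lookup on the pair of mid-vowels. The following is a minimal sketch of that lookup, not part of the thesis implementation; the function name `form_i_subtype` and the `long_middle` flag are hypothetical names chosen for illustration.

```python
# Mid-vowel pairs (perfect, imperfect) as listed in Table 2.2.
FORM_I_SUBTYPES = {
    ("a", "u"): "IA",
    ("a", "i"): "IB",
    ("a", "a"): "IC",
    ("i", "a"): "ID",
    ("u", "u"): "IE",
    ("i", "i"): "IF",
    ("u", "i"): "IG",
}


def form_i_subtype(perfect_vowel, imperfect_vowel, long_middle=False):
    """Return the Form I subtype label, or None if the combination does not exist.

    long_middle (hypothetical flag) marks a middle letter that is a long vowel,
    which the text assigns to Form IH.
    """
    if long_middle:
        return "IH"
    return FORM_I_SUBTYPES.get((perfect_vowel, imperfect_vowel))


if __name__ == "__main__":
    print(form_i_subtype("a", "u"))   # naṣara yanṣuru
    print(form_i_subtype("i", "u"))   # a combination that is not in the table
```

Returning `None` for an unlisted pair reflects the remark that not all vowel combinations exist.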
All these verb stems, derived from a single root verb, have different verbal nouns.
Table 2.3 lists the active participle and passive participle for all verb stems, including
the root verb kataba. Not every type of verbal noun exists for every form. In Table 2.3,
for example, no passive participle exists for Form-IX.

Table 2.3: Verbal Nouns Derived from Different Forms

Form         Verb Stem    Active Participle    Passive Participle
Form-I       kataba       kātibun              maktuwbun
Form-II      kattaba      mukattibun           mukattabun
Form-III     kātaba       mukātibun            mukātabun
Form-IV      aktaba       muktibun             muktabun
Form-V       takattaba    mutakattibun         mutakattabun
Form-VI      takātaba     mutakātibun          mutakātabun
Form-VII     inkataba     munkatibun           munkatabun
Form-VIII    iktataba     muktatibun           muktatabun
Form-IX      iktabba      muktabbun            N/A
Form-X       istaktaba    mustaktibun          mustaktabun

2.2.3 Classification of Arabic Verbal Nouns
In this part, we discuss the eight types of nouns derived from verbs [22]:
1. Gerund (ism maṣdar): names the action denoted by its corresponding verb.
2. Active participle (ism alfāʿil): entity that enacts the base meaning, i.e.
the general actor.
3. Hyperbolic participle (ism almubālaġah): entity that enacts the base
meaning exaggeratedly. It modifies the actor with the meaning that the actor does
the action excessively.
4. Passive participle (ism almafʿuwl): entity upon which the
base meaning is enacted. Corresponds to the object of the verb.
5. Resembling participle (alṣifatu ’lmušabbahah): entity
enacting (or upon which is enacted) the base meaning intrinsically or inherently. Modifies the
actor with the meaning that the actor does the action inherently.
6. Utilitarian noun (ism alālah): entity used to enact the base
meaning, i.e. the instrument used to conduct the action.
7. Locative noun (ism alẓarf): time or place at which the base
meaning is enacted.
8. Comparative and superlative (ism altafḍil): entity that enacts (or
upon whom is enacted) the base meaning the most. In Arabic, this type of word is
categorized as a noun, but it is similar to an English adjective.
Examples of these eight types of verbal nouns are presented in Table 2.4. Each of
these types can be subcategorized on the basis of the type of verb. To understand the complete
variation of the verb and its morphology, some preliminary knowledge of the
Arabic verb is needed [20].
Table 2.4: Different Types of Verbal Nouns
Root verb: ʿalima (“he knew”)

Verbal noun                    Example       Meaning
Gerund                         […]           “Knowing”
Active participle              […]           “One who knows”
Hyperbolic participle          […]           “One who knows a lot”
Passive participle             maʿluwmun     “That which is known”
Resembling participle          […]           “One who knows intrinsically”
Utilitarian noun               […]           “Through which we know”
Locative noun                  […]           “Where/when we know”
Comparative and Superlative    […]           “One who knows the most”
2.3 An HPSG Primer
HPSG is a highly lexicalized, non-derivational, constraint-based, surface-oriented grammatical
architecture developed by Carl Pollard and Ivan Sag [32, 33]. It combines the best
ideas from its predecessors: Generalized phrase structure grammar (GPSG) [15], Lexical
functional grammar (LFG) [6] and Government and binding theory (GB) [8]. It integrates the
linguistic layers (Phonology, Morphology, Syntax, Semantics, Context etc.), which
makes it very attractive for Natural Language Processing. Because it is highly lexicalized,
the lexicon can be modified for a given language to capture different
features. A lexical entry, represented as an AVM (Attribute Value Matrix), may describe
the sign partially. Each lexical entry must have a type, and its subtypes are part of a
larger structure that forms the type hierarchy. Thus, HPSG can be seen as consisting of an
inheritance hierarchy of sorts, with constraints of various kinds on the sorts of linguistic
objects in the hierarchy [16]. There is no distinction between terminal and non-terminal nodes
in HPSG. This is related to the fact that HPSG is “fractal” (a fractal is a rough or fragmented
geometric shape that can be split into parts, each of which is, at least approximately,
a reduced-size copy of the whole): every sign down to the word level has syntactic, semantic
and phonological features encoded in a similar manner [31]. Thus we can work
on a specific level of this hierarchy and use unification to reuse and extend the
structure.
HPSG includes grammar rules and lexical entries. Normally, the latter are not considered
to belong to a grammar. The formalism, however, is centered around the lexicon. This means
that the lexicon is more than just a list of entries; it is in itself richly structured.
In HPSG terminology, the basic grammatical type is the sign, which is a formal representation
of words, phrases and sentences. All human utterances are captured by signs.
A rule that licenses a sign is captured by another object called a construct. Signs and
constructs are formalized as typed feature structures, which are sets of attribute-value pairs.
Attributes are called linguistic objects. The value of an attribute may be either atomic or
complex, i.e. a function. Functions are those feature structures which are described using
an attribute value matrix (AVM).
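The distinction between atomic and complex (function) values can be illustrated with a small data structure. This is only a sketch under our own assumptions: the class names `Atom` and `Function`, the helper `atomic_paths` and the toy feature names are hypothetical and are not taken from TRALE or from any HPSG library.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Atom:
    """A simple feature structure: a terminal value such as 'sg' or '3rd'."""
    value: str


@dataclass
class Function:
    """A complex feature structure: a sort name plus attribute-value pairs."""
    sort: str
    features: Dict[str, "FeatureStructure"] = field(default_factory=dict)


FeatureStructure = Union[Atom, Function]


def atomic_paths(fs, prefix=()):
    """Yield (path, value) for every atomic value reachable from fs."""
    if isinstance(fs, Atom):
        yield prefix, fs.value
    else:
        for name, value in fs.features.items():
            yield from atomic_paths(value, prefix + (name,))


if __name__ == "__main__":
    # A toy sign whose SEM value is itself a function (illustrative values only).
    index = Function("index", {"PERSON": Atom("3rd"), "NUMBER": Atom("sg")})
    sign = Function("sign", {"SEM": Function("sem-obj", {"INDEX": index})})
    for path, value in atomic_paths(sign):
        print("|".join(path), "=", value)
```

Nesting `Function` objects inside one another mirrors how an AVM embeds one matrix as the value of a feature in another.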
The generic structure of a sign is presented in Figure 2.3. The AVM basically maps
features to feature structures. A feature in an AVM can be of two types: (a) a category name,
i.e., a sort description, and (b) agreement (or constraints), which is a list of attributes and
their values.
Feature    Value
[ PHON     phonobj  ]    Phonology
[ MORPH    morphobj ]    Morphology
[ SYN      synobj   ]    Syntax
[ SEM      semobj   ]    Semantics
[ ...      ...      ]
Figure 2.3: An HPSG Sign.
A construct is represented using a feature structure with a MOTHER (MTR) feature
and a DAUGHTERS (DTRS) feature. The value of the MTR feature is a sign and the value
of DTRS is a nonempty list of signs. A typical description of a construct is shown in
Figure 2.4. The licensing of signs follows the Sign Principle, which states that “Every
sign must be lexically or constructionally licensed. A sign is lexically licensed only if it
satisfies some lexical entry, and constructionally licensed only if it is the mother of some
construct” [39].
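The Sign Principle can be read as a simple check over a lexicon and a set of constructs. The sketch below is a deliberately simplified illustration: signs are reduced to a single form string, "satisfying a lexical entry" is approximated by set membership, and the names `Sign`, `Construct` and `is_licensed` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Sign:
    """A heavily simplified sign: only its surface form is kept."""
    form: str


@dataclass(frozen=True)
class Construct:
    """A construct with a mother sign (MTR) and a nonempty tuple of daughters (DTRS)."""
    mtr: Sign
    dtrs: Tuple[Sign, ...]

    def __post_init__(self):
        if len(self.dtrs) == 0:
            raise ValueError("DTRS must be a nonempty list of signs")


def is_licensed(sign, lexicon, constructs):
    """Sign Principle: a sign is licensed lexically or constructionally."""
    lexically = sign in lexicon  # simplification of "satisfies some lexical entry"
    constructionally = any(c.mtr == sign for c in constructs)
    return lexically or constructionally


if __name__ == "__main__":
    kataba = Sign("kataba")
    katibun = Sign("katibun")
    lexicon = {kataba}
    constructs = [Construct(mtr=katibun, dtrs=(kataba,))]
    print(is_licensed(kataba, lexicon, constructs))         # listed in the lexicon
    print(is_licensed(katibun, lexicon, constructs))        # mother of a construct
    print(is_licensed(Sign("xyz"), lexicon, constructs))    # neither
```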
HPSG modeling of any language starts from building a very detailed type hierarchy
which is linguistically motivated and also captures the language-independent constraints.
From this type hierarchy, the attribute value matrix for linguistic signs can be
constructed. In this thesis, we use the Sign-Based Construction Grammar (SBCG) [38]
version of HPSG. Unlike standard presentations of HPSG, where the type constraints form
part of the signature of a grammar, the type constraints of SBCG are an essential part of
the body of the grammar. A standard SBCG type hierarchy is shown in Figure 2.5.

Feature    Value
[ MTR      sign        ]    Mother
[ DTRS     list(sign)  ]    List of Daughters
Figure 2.4: An HPSG Construction.
From the type hierarchy, we know that every linguistic object can be modeled using a
feature structure. There are two types of feature structures. Atoms are simple feature
structures, which indicate the terminal values of various linguistic attributes. Functions
are complex feature structures, which are expressed using an attribute value matrix and can
contain other feature structures as their feature values. Both sign and cxt (construct)
are feature structures. The attributes of signs are also feature structures: phon-obj, syn-obj,
sem-obj, etc. Frames are semantic representations of events. There are two types
of constructions: phr-cxt (phrasal) and lex-cxt (lexical). There are also two types of
signs: lex-sign and expression. For the details of this type hierarchy, see [38].
In HPSG, the semantic information is expressed in Minimal Recursion Semantics
(MRS), as developed in CSLI’s Linguistic Grammars Online (LinGO) project [10, 11].
Most semantic information in MRS is contained under the feature FRAMES. In this
list, a verb has a frame event-fr, which contains a Davidsonian event variable and
index-valued features such as act(or) and und(ergoer) [12, 13]. These variables also
carry information that is used for agreement. These semantic agreements are
discussed in Section 2.1.3.
[Type hierarchy diagram. Node labels, from top to bottom: feature-structure; function, atom;
cat, phon-obj, pos, sign, syn-obj, cxt, sem-obj, frame; noun, verb; lex-sign, expression;
phr-cxt, lex-cxt; word, phrase, infl-cxt, deriv-cxt; lexeme; si-lxm, sc-lxm, trans-lxm, sr-lxm;
event-fr; und-fr, act-fr, soa-fr; act-und-fr, act-soa-fr, und-only-fr, act-und-soa-fr;
try-fr, to-be-split-fr, write-fr, cause-fr.]
Figure 2.5: A Standard SBCG type hierarchy
2.4 Related Works
This section discusses work related to the linguistic modeling of morphology.
At the beginning of this section, we give an overview of work related to the computational
modeling of morphology. Then we put emphasis on HPSG modeling of morphology.
As Semitic languages like Arabic, Amharic and Hebrew are rich in morphology, we give a
glimpse of HPSG modeling of Hebrew, as a considerable amount of work has been done in
this area. At the end of this section, we discuss HPSG modeling of the Arabic language and
its morphology.
2.4.1 HPSG Modeling of Morphology
HPSG is one of the most successful grammars for processing natural languages, especially
their syntactic and semantic aspects, but it has inadequate coverage of morphological
construction, especially nonconcatenative morphology. Nonconcatenative morphology
is not so plentiful in the most widely used languages, but the phenomenon is abundant in
Semitic languages such as Arabic, Amharic and Hebrew. Among these Semitic languages,
Arabic is the most widely used and is very rich in nonconcatenative morphology. Its rich
morphology has attracted several series of research projects [1, 7, 40]. These research
projects are mainly based on the development of toolkits for Arabic morphological analysis.
They are not based on compiler development; rather, they are dedicated to morphological
analyzers that design and implement finite-state morphological models. From a linguistic
perspective, these models describe rules of lexicon development and derive lexicons.
The morphology of Sierra Miwok and French was modeled in HPSG by phonological
realization [5]. The author also showed how nonconcatenative morphology can be captured
by his framework. He further mentioned the idea of how consonant and vowel melodies form
the word in Arabic. But he did not show any construction rule for any language.
Susanne modeled concatenative morphology in German and English in the HPSG formalism
in 1998 [35, 36]. In that paper, she captured morphological derivation by a special
feature called MORPH-B, which means morphological base. This MORPH-B feature
serves the purpose of derivation and can also be used to capture nonconcatenative
morphology. The alternative to this mechanism is the lexical construction
rule [38], which is also widely used in HPSG modeling.
An HPSG formalism of morphologically complex predicates is outlined in [9]. Here the author
mostly focused on the syntax and semantics of the causative construction. He used lexical
rules with semantic frames to capture the morphological effect. As Japanese is an agglutinative
language, the morphology used here is concatenative. Thus HPSG
modeling of nonconcatenative morphology is still untouched.
As mentioned earlier, HPSG modeling of nonconcatenative morphology is a relatively
new area of research. There are few notable works on the nonconcatenative morphology
of Semitic languages. We discuss these in detail in Sections 2.4.2 and 2.4.3.
2.4.2 HPSG Modeling of Hebrew
Semitic languages exhibit rich morphological operations. Both concatenative and nonconcatenative
morphology are abundant in these languages. Among these languages, HPSG
modeling of Hebrew is not new, but it lacks coverage of morphology. In 2000, Nathan
Vaillette presented a paper on Hebrew relative clauses [41]. In this paper, he modeled
the phrasal construction rules to capture Hebrew relative clauses. He did not put
emphasis on morphological operations.
Susanne extended her work on German and English concatenative morphology in
2001 and, along with German and English, added the nonconcatenative morphology
of Hebrew verbal nouns [36]. She proposed an AVM for the Hebrew verbal noun. This AVM
is similar to the AVM we propose for the verbal noun with regard to the morphological
feature. But she did not show any syntactic effect of this morphology. She articulated
the AVM with placeholders for consonants. By filling in the list of root consonants, a
verbal noun AVM is generated from this AVM. She did not ensure that only valid verbal
nouns are generated from this AVM. Her solution can be used to automate lexical
entry in a dictionary or corpus but will not reduce the number of entries. In fact, she only
gave a glimpse of the morphology of the Hebrew verbal noun in her extensive work.
A detailed work on the verb-initial construction (also called the verbal sentence, as
opposed to the nominal sentence; in this type of sentence the verb precedes the subject)
was presented in [26]. In that work, the author put emphasis on Modern Hebrew verb-related
phrasal construction. She discussed the agreement of the verb with its subject and
complement. She also showed the concatenative and nonconcatenative morphology of the Hebrew
verb in that paper but did not give any formalism of this morphology like those
modeled for German or Japanese [9, 36]. She mainly discussed the syntactic effect of these
inflected verb forms. She also presented an implementation framework for HPSG grammar.
In 2007, Nurit presented a comparison of the implementation platforms of HPSG [27].
She discussed the advantages and disadvantages of TRALE (an extension of the Attribute
Logic Engine) and the Linguistic Knowledge Building (LKB) system. This paper is very useful
for choosing the implementation platform of HPSG.
2.4.3 HPSG Modeling of Arabic
In 2006, an HPSG analysis of the broken plural and gerund was presented [24]. The main
assumption in that work revolves around the Concrete Lexical Representations (CLRs)
located between an HPSG type lexicon and phonological realization. There, the HPSG sign
was represented using a CLR function rather than an AVM, and this function put more emphasis
on phonology than on morpho-syntactic operations. The main drawbacks of this work are that it
does not deal with other types of verbal noun and does not specify any implementation
of CLR.
HPSG modeling of the Arabic triliteral strong verb was proposed in 2008 [2–4]. The
authors of these papers show the regular morphology of the Arabic verb. They designed the
SBCG AVM of the Arabic verb. They also designed several verb lexeme constructions and
morphologically complex predicates (MCP). But they did not touch the morphological
derivation of the verbal noun. Also, they did not give any distinct way to implement the
constructs proposed in their works. During our work on verbal noun constructs, we have to
work with the SBCG verb lexeme too. We adopt the verb lexeme proposed in these papers and
modify it to cope with all the cases that we have found. The authors did not propose any
idea about SIT-INDEX and INDEX, and they actually duplicated the INDEX feature with a
ref-fr semantic frame, which is never used in the HPSG or SBCG literature. The atomic
features (person, number and gender) that are used under the INDEX function feature by
Pollard and Sag [33] are used under ref-fr in these papers, while at the same time they
still keep the INDEX feature and do not show its components. We correct this INDEX and
SIT-INDEX related problem. This will be discussed in Section 3.2.
An HPSG formalism of the Arabic nominal sentence is presented in [29]. The paper introduces
a grammar for the Arabic nominal sentence. The authors have implemented their formalization
using the LKB system. The main limitation of this work is that it deals only with agreement in
nominal sentences and does not discuss morphology at all. Another major limitation
of this work is the assumption that agreement information in Arabic arises from syntactic
rules and that it obeys grammar rules. But in Sections 2.1.3 and 3.1, we establish
that agreement in Arabic is not always syntactic and that the agreement feature needs another
feature, humanness (HUM), which is not mentioned in that work.
A parser for Arabic relative clauses is designed in [17]. It is not an in-depth study; rather,
it is a study of the different forms of relative clauses for processing relative sentences. Thus,
we can conclude that the rich nonconcatenative morphology of the Arabic verbal noun has
not yet been explored and we have the opportunity to do so. In 2010, part of this work was
published [19]. In that paper, we proposed the construction rules but did not describe
any implementation.
Chapter 3
HPSG Formalism for Verbal Noun
In this chapter, we model the HPSG categories of verbal nouns and their derivation from
different types of verbs through the HPSG formalism. As mentioned in Section 2.3,
we adopt SBCG [38] for this analysis. Here, we give an AVM for nouns and
extend it for verbal nouns. We extend the verb AVM proposed by Bhuyan et al. [2–4].
We propose a multiple inheritance hierarchical model for Arabic verbal nouns and show how
to get a sort description from the type hierarchy. Finally, we propose construction rules for
verbal nouns derived from strong triliteral, i.e. Form I, root verbs.
3.1 AVM of Arabic Nouns
We modify the SBCG feature geometry for English and adapt it for Arabic. The SBCG
AVMs for nouns in English and in Arabic are shown in Figure 3.1 and Figure 3.2, respectively.
The PHON feature is beyond the scope of this work. The three main function features,
MORPH, SYN and SEM, are discussed in the following subsections.

noun-lex
[ phon    []
  form    []
  arg-st  list(sign)
  syn     [ cat   [ noun
                    case    . . .
                    select  . . .
                    xarg    . . .
                    lid     . . . ]
            val   list(sign)
            mrkg  mrk ]
  sem     [ index   i
            frames  list(frame) ] ]
Figure 3.1: AVM for English noun
3.1.1 MORPH
The MORPH feature captures the morphological information of signs and replaces the
FORM feature of English AVMs. This feature is similar to the MORPH feature used for the
Hebrew verbal noun [36]. The value of the feature FORM is a sequence of morphological
objects (formatives); these are the elements that will be phonologically realized within the
sign’s PHON value [38]. MORPH, on the other hand, is a function feature. It contains not only
these phonologically realized elements but also their origins. MORPH
contains three features: ROOT, STEM and DEC. The ROOT feature contains root letters
in the following cases:
1. The root is characterized as a part of a lexeme, and is common to a set of derived
or inflected forms
2. The root cannot be further analyzed into meaningful units when all affixes are
removed
3. The root carries the principal portion of meaning of the lexeme
noun-lex
[ phon    []
  morph   [ root  list(letter)
            stem  list(letter)
            dec   . . . ]
  arg-st  list(sign)
  syn     [ cat   [ noun
                    case    . . .
                    def     . . .
                    select  . . .
                    xarg    . . .
                    lid     . . . ]
            val   list(sign)
            mrkg  mrk ]
  sem     [ index   [ person  . . .
                      number  . . .
                      gender  . . .
                      hum     . . . ]
            frames  list(frame) ] ]
Figure 3.2: AVM for Arabic noun
In the rest of the cases, the content of this feature is empty.
The STEM feature contains a list of letters, which comprises the word, phrase or
lexeme. We can identify the pattern in a lexeme by substituting placeholders for the root
letters, if any root exists in STEM. As an example, the ROOT of the lexeme ‘kataba’
contains ‘k’, ‘t’ and ‘b’, and the pattern of the STEM is (_a_a_a). Without
this pattern, the ROOT is irrelevant. Thus a pattern bears the syntactic information
and a ROOT bears the semantic information. Lexemes which share a common pattern
must also share some common syntactic information. Similarly, lexemes which share a
common root must also share some common semantic information. STEM is derived from
the root letters by morphology if a root exists.
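The relation between ROOT, STEM and pattern can be made concrete with two small helper functions. This is an illustrative sketch only: the underscore placeholder, the greedy left-to-right matching of root letters and the function names `stem_pattern` and `fill_pattern` are our assumptions, and letters are represented as single characters of a plain string rather than as a typed list.

```python
def stem_pattern(root, stem, placeholder="_"):
    """Replace each root letter in the stem, in order, by a placeholder."""
    remaining = list(root)
    pattern = []
    for letter in stem:
        if remaining and letter == remaining[0]:
            pattern.append(placeholder)
            remaining.pop(0)
        else:
            pattern.append(letter)
    if remaining:
        raise ValueError("stem does not contain all root letters in order")
    return "".join(pattern)


def fill_pattern(pattern, root, placeholder="_"):
    """Insert the root letters, in order, into the placeholders of a pattern."""
    letters = list(root)
    if pattern.count(placeholder) != len(letters):
        raise ValueError("number of placeholders and root letters differ")
    result = []
    for ch in pattern:
        result.append(letters.pop(0) if ch == placeholder else ch)
    return "".join(result)


if __name__ == "__main__":
    pattern = stem_pattern("ktb", "kataba")
    print(pattern)                       # pattern shared by Form I perfect stems
    print(fill_pattern(pattern, "nsr"))  # same pattern, different root
```

Two lexemes built from the same pattern but different roots (here the roots of kataba and nasara) share the pattern string, which is the sense in which a pattern carries shared syntactic information.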
The DEC (declension type) feature under the MORPH feature maps to the declension
type of the noun. It determines how the end vowel of a noun lexeme changes to reflect its case.
The change of end vowel changes the form of a lexeme. There are nine possible ways in
which grammatical cases can be represented on an Arabic noun. So, for a declinable noun, the
value of the DEC feature can be T1, T2, T3, . . . , T9, corresponding to the nine declension
types. The value of the DEC feature can be determined from the type hierarchy of the noun
lexeme. This needs further research and is beyond the scope of this thesis. In our current
research, we do not mention this feature in the following AVMs, but we keep it in our
basic design to make the design robust for inflection as well.
3.1.2 SYNTAX
The SYN feature contains the CAT, VAL and MRKG features. We modify the CAT feature
of SBCG to adapt it for the Arabic language. Note that, for all kinds of verbal nouns, the sort
description of the CAT feature is noun. In Arabic there are only three parts of speech
(POS) for lexemes or words: noun (in Arabic, the pronoun is also considered a noun), verb
and particle. Any verbal noun serving as a modifier is also treated as a noun. In the case
of the Arabic noun, the CAT feature consists of the CASE, DEF, SELECT, XARG and LID
features. Among these features, we introduce the DEF feature, which is used for syntactic
agreement in phrasal construction. This feature also strengthens our design. As Arabic
has three cases for nouns, the value of CASE is nominative, accusative or genitive.
The DEF feature denotes the definiteness of an Arabic noun. There are eight
ways by which a noun word or lexeme becomes definite [21]. Personal pronouns such as
“he”, “I” and “you” are inherently definite. Proper nouns are also definite. ʾallāhu
is another instance of a definite lexeme. These examples confirm that definiteness has to be
specified at the lexeme level. The article ‘al’ also expresses the definite state of a noun of
any gender and number. Thus if the state of a noun is definite, the noun lexeme contains
yes as the value of DEF; otherwise its value is no.
In Arabic, the definiteness (DEF) feature plays a significant role in syntactic
agreement. A noun and its modifier must agree on the DEF feature value. For example,
alkitābu ’l-ʾaḥmaru means “the red book”: alkitābu means “the book” and ʾaḥmaru
means “red”. As “red” is used as a modifier for “the book”,
the definiteness prefix ‘al’ has been added to ʾaḥmaru, yielding al-ʾaḥmaru.
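The DEF agreement requirement can be sketched as a check on two lexeme records. The dictionary layout, the plain ASCII stems and the function names `agrees_on_def` and `make_definite` are assumptions made for illustration; in particular, STEM is written as a string here rather than as a list of letters.

```python
def agrees_on_def(noun, modifier):
    """A noun and its modifier must carry the same DEF value."""
    return noun["DEF"] == modifier["DEF"]


def make_definite(lexeme):
    """Return a definite copy of a lexeme by adding the article 'al'."""
    if lexeme["DEF"] == "yes":
        return dict(lexeme)
    return {**lexeme, "STEM": "al-" + lexeme["STEM"], "DEF": "yes"}


if __name__ == "__main__":
    book = {"STEM": "alkitabu", "DEF": "yes"}   # "the book"
    red = {"STEM": "ahmaru", "DEF": "no"}       # "red"
    print(agrees_on_def(book, red))             # indefinite modifier, definite noun
    red_definite = make_definite(red)
    print(red_definite["STEM"], agrees_on_def(book, red_definite))
```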
3.1.3 SEMANTICS
As in SBCG for English, the SEM feature in Arabic contains two function features: INDEX and
FRAMES. INDEX is used for the index-based semantic agreement mentioned in
Section 2.1.3, and FRAMES contains the list of frames which carry semantic information
in Minimal Recursion Semantics (MRS).
As mentioned in Section 2.1.3, information on person, number, gender and human/nonhuman
must be kept for semantic agreement. So, the INDEX feature is composed
of PERSON, NUMBER, GENDER and HUM and is contained under SEM. We use this
index-based agreement [33] as opposed to putting the agreement features under an AGR
feature [23], because index-based agreement is more customary in HPSG and is used by most
scholars.
The HUM feature is introduced by us for Arabic. The other three features are also used for
semantic agreement in English [33]. The HUM feature denotes humanness. Depending on
the language, agreement may involve gender, human/non-human, animate/inanimate or shape
features [33]. In Arabic, humanness is a crucial grammatical factor for predicting certain
kinds of plural formation and for the purpose of agreement with other components of a
phrase or clause within a sentence. The grammatical criterion of humanness only applies
to nouns in the plural form. As an example, consider “these boys are intelligent”
(haʾulāʾ alawlādu ʾaḏkiyāʾ) and “these birds are intelligent”
(haḏihi ’lṭuyuwru ḏakiyyatun). The subjects of both sentences are plural, but the former refers
to human beings whereas the latter refers to non-humans. So the same word “intelligent”
(ḏakiyyun) takes two different forms in the two sentences: ʾaḏkiyāʾ and ḏakiyyatun.
In the case of boys, it is in the third person masculine plural form (ʾaḏkiyāʾ),
whereas in the case of birds, it is in the third person feminine singular form (ḏakiyyatun).
Also, from the third person feminine singular form (ḏakiyyatun), we cannot
readily say that it refers to a feminine entity. In fact, it may also refer to the plural of
non-human beings. This is why, along with PERSON, NUMBER and GENDER, we keep HUM as a
semantic agreement feature.
If the noun refers to a human being then the value of HUM is yes, otherwise it is no.
The value of PERSON for an Arabic noun can be 1st, 2nd or 3rd. There are three number
values in Arabic. So, the value of NUMBER can be sg, dual or pl, denoting singular,
dual or plural, respectively. The GENDER feature contains either masc or fem, denoting
masculine and feminine, respectively.
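The role of HUM in agreement can be sketched with a small INDEX record. The code below only encodes the pattern shown in the boys/birds example (a non-human plural noun takes a feminine singular modifier); the class name `Index` and the function `expected_modifier_agreement` are hypothetical, and the masculine gender assigned to "birds" is an assumption for the sake of the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Index:
    """INDEX value with the four agreement features discussed above."""
    person: str   # "1st", "2nd" or "3rd"
    number: str   # "sg", "dual" or "pl"
    gender: str   # "masc" or "fem"
    hum: str      # "yes" or "no"


def expected_modifier_agreement(noun):
    """Return the (number, gender) expected on a modifier of the noun.

    Non-human plurals take feminine singular agreement, as in the
    'these birds are intelligent' example; otherwise the modifier
    copies the noun's number and gender.
    """
    if noun.number == "pl" and noun.hum == "no":
        return ("sg", "fem")
    return (noun.number, noun.gender)


if __name__ == "__main__":
    boys = Index(person="3rd", number="pl", gender="masc", hum="yes")
    birds = Index(person="3rd", number="pl", gender="masc", hum="no")
    print(expected_modifier_agreement(boys))
    print(expected_modifier_agreement(birds))
```

Without the HUM value the two nouns would have identical indices, so the difference in the modifier's form could not be predicted from PERSON, NUMBER and GENDER alone.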
3.2 AVM of Arabic Verbs
As we will formulate construction rules which capture the linguistic derivation of nouns
from verbs, we need to model the AVM of the verb. We modify the verb AVM proposed by
Bhuyan et al. [2] and correct the index-related problem found in that work, which is discussed
in detail in Section 2.4.3. We try to align the design of the verb AVM with that
of the noun AVM. Figure 3.3 shows the SBCG AVM of the Arabic verb.

verb-lex
[ phon    []
  morph   [ root  list(letter)
            stem  list(letter)
            vdec  list(letter) ]
  arg-st  list(sign)
  syn     [ cat   [ verb
                    vform   . . .
                    voice   . . .
                    mood    . . .
                    select  . . .
                    xarg    . . . ]
            val   list(sign)
            mrkg  mrk ]
  sem     [ sit-index  [ situation  . . . ]
            frames     list(frame) ] ]
Figure 3.3: AVM for Arabic verb
The MORPH feature in the verb AVM is similar to MORPH in the noun AVM except for the
VDEC feature. VDEC captures the declension type of the verb and replaces the DEC feature,
which captures the declension type of the noun. Like DEC, it determines how the end
vowel changes, here the end vowel of verb lexemes, which changes to reflect the mood of the
verb. The change of end vowel changes the form of a verb lexeme. There are five possible ways
in which grammatical moods can be represented on an Arabic verb. So, for a declinable verb, the
value of the VDEC feature can be VT1, VT2, VT3, . . . , VT5, corresponding to the five
declension types. The value of the VDEC feature can be determined from the type
hierarchy of the verb lexeme. This needs further research and is beyond the scope of this
thesis. In our current research, we do not mention this feature in the following AVMs
of the verb. We keep it in our basic design to make the design robust for inflection as well.
SYN in this AVM is the same as in the standard SBCG verb AVM. VFORM contains the
type of verb form. In Arabic, there are three types of verb form. The feature value
of VFORM can be perfect, imperfect or imperative. Perfect indicates that the event
has been completed, imperfect indicates that the event has not yet been completed, and
imperative indicates that the event is a command.
There are two types of voice in Arabic: active and passive. So, the value of the VOICE
feature can be either active or passive. The value of MOOD can be indicative, subjunctive
or jussive.
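The value ranges of VFORM, VOICE and MOOD can be written down as a small table and used to check a candidate CAT value. This is a sketch for illustration; the dictionary layout and the function name `invalid_features` are assumptions and do not reflect how the features are declared in TRALE.

```python
# Allowed values as listed in the text; the dictionary layout is an assumption.
VERB_CAT_VALUES = {
    "VFORM": {"perfect", "imperfect", "imperative"},
    "VOICE": {"active", "passive"},
    "MOOD": {"indicative", "subjunctive", "jussive"},
}


def invalid_features(cat):
    """Return the names of CAT features whose value is missing or not allowed."""
    return sorted(
        name
        for name, allowed in VERB_CAT_VALUES.items()
        if cat.get(name) not in allowed
    )


if __name__ == "__main__":
    ok = {"VFORM": "perfect", "VOICE": "active", "MOOD": "indicative"}
    bad = {"VFORM": "perfect", "VOICE": "middle"}
    print(invalid_features(ok))
    print(invalid_features(bad))
```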
Like SYN, the SEM feature in this AVM is the same as in the SBCG English verb AVM. SIT-INDEX,
i.e. situation index, is used for index-based semantic agreement. SBCG does not
show any distinction between INDEX and SIT-INDEX. Also, it does not show the feature
description of SIT-INDEX. We treat it as a function feature, but currently it has only one
atomic attribute, SITUATION, which contains the name of the verb.
SIT-INDEX is used in the event frames of verb and verbal noun lexemes. Thus ultimately it
is very similar to the Davidsonian event variable [12].
As in the AVM for the noun, FRAMES contains the list of frames which carry semantic
information in Minimal Recursion Semantics (MRS). These frames contain indices of
both kinds, INDEX and SIT-INDEX.
3.3 Type Hierarchy of Verbal Noun
As mentioned in Section 2.2, the derivation of verbal nouns from verbs depends on the
number of root letters, the verb form and the root type. In Figure 3.4, we give a type
hierarchy of Arabic verbal nouns.
noun-lex
  … DERIVATION …
    non-derived
    derived
      …
      verb-derived
        gerund
        active-participle
        hyperbolic-participle
        passive-participle
        resembling-participle
        locative-noun
        utilitarian-noun
        comparative
Figure 3.4: Lexical type hierarchy of Arabic noun lexeme.
We analyze the Arabic noun from a verbal noun perspective. So, we classify noun
lexemes only on the derivation dimension. Other dimensions could be the ending type
or declension type of the noun lexeme. As shown in Figure 3.4, eight types of verbal nouns
are immediate daughters of verb-derived-noun. Each of these eight verbal nouns
can be subcategorized on the basis of the properties of the root verb, which are mentioned
in Section 2.2. Each verb carries distinct information on these properties, which form the
dimensions of classification for verbs. So, the three dimensions for root verbs are: number
of root letters, type of the root and verb form. For lack of space, we discuss in detail only
the subtypes of active participles.
active-participle
  NUMBER OF ROOT LETTER
    triliteral-root-derived
  TYPE OF ROOT
    sound-root-derived
  VERB FORM
    formI-derived
    formII-derived
Cross-classified subtypes (one supertype from each dimension):
  formI-triliteral-sound-active-participle
  formII-triliteral-sound-active-participle
Figure 3.5: Lexical type hierarchy of Active participle.
In Figure 3.5, active-participle is at the root. Classifying it by the number of letters
in the root verb gives two types of active participles, derived from triliteral and
quadriliteral root verbs. Classifying it by root type gives one type of active participle
for each of the several root types. Classifying it by verb form gives Form I, . . ., Form X
active participles. Categories in one dimension cross-classify with categories in the other
dimensions and form subtypes such as form-I-triliteral-sound-active-participle,
form-I-triliteral-sound-passive-participle, form-I-triliteral-sound-gerund, etc. Not every
combination yields every type of verbal noun. For example, locative nouns are generated
from triliteral Form I root verbs only, so for this type of verbal noun, classifying along
other forms does not generate any new type.
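The cross-classification described above can be illustrated with a small Python sketch. It is not part of the thesis implementation: the dimension values are a reduced, illustrative subset, and the function names and the `attested` filter are hypothetical. The only restriction encoded is the one stated in the text, namely that locative nouns arise from triliteral Form I root verbs only.

```python
from itertools import product

# Reduced, illustrative dimension values (the thesis has more root types
# and verb Forms I to X).
VERB_FORMS = ["formI", "formII"]
ROOT_LETTERS = ["triliteral", "quadriliteral"]
ROOT_TYPES = ["sound"]
VERBAL_NOUNS = ["active-participle", "passive-participle", "locative-noun"]


def attested(form, letters, root_type, noun):
    """Hypothetical filter: keep only combinations the text allows."""
    if noun == "locative-noun":
        # Stated in the text: triliteral Form I root verbs only.
        return form == "formI" and letters == "triliteral"
    return True


def cross_classify():
    """Return the names of all leaf subtypes obtained by cross-classification."""
    names = []
    for form, letters, root_type, noun in product(
        VERB_FORMS, ROOT_LETTERS, ROOT_TYPES, VERBAL_NOUNS
    ):
        if attested(form, letters, root_type, noun):
            names.append(f"{form}-{letters}-{root_type}-{noun}")
    return names


if __name__ == "__main__":
    for name in cross_classify():
        print(name)
```

Each generated name corresponds to one leaf type that inherits from one category in every dimension.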
3.4 Construction Rules for Verbal Nouns
As mentioned in Section 2.2.3, there are eight types of verbal nouns: gerund, active
participle, hyperbolic participle, passive participle, resembling participle, utilitarian
noun, locative noun and comparative participle. We have developed construction rules
for the active participle, passive participle, locative noun and comparative participle
derived from Form I strong root verbs, i.e. strong triliteral root verbs that have no extra
character. For the other categories of verbal nouns, we have found that lexical entries
must be listed exhaustively.
All eight types of verbal noun are derived from strong Form I root verbs. In contrast,
only the gerund, active participle and passive participle are derived from quadriliteral
root verbs or weak verbs, and their derivation pattern is not regular, so they require
further research.
3.4.1 Active Participle
A sample AVM for an active participle is shown in Figure 3.6. All features of this AVM
have been discussed before. In this example, the event frame is write-fr, which denotes
the write frame.
Throughout this formalism, we use the event frame for verbs and verbal nouns to
capture their semantic content efficiently. This event frame takes an event or situational
index variable (SIT) and index-valued features such as actor, undergoer, instrument and
location. In the case of write-fr, the event frame contains three indices: one for the action
or event (SIT), one for the actor (ACTOR) and one for the undergoer of the action
(UNDGR), i.e. the object of the verb.
We do not store this AVM as a lexical entry. Rather, this AVM is recognized from the
AVM in Figure 3.7 by our lexical construction rules.
kaatibun-form-I-triliteral-sound-active-participle-lex
[ MORPH   [ ROOT <k, t, b>
            STEM <k, a, a, t, i, b, u, n> ]
  ARG-ST  < >
  SYN     [ CAT  [ noun
                   CASE nominative, DEF no
                   SELECT none, XARG none, LID none ]
            VAL  < >
            MRKG none ]
  SEM     [ INDEX  i [ PERSON 3rd, NUMBER sg, GENDER masc, HUM yes ]
            FRAMES < write-fr [ SIT s [ SITUATION writing ]
                                ACTOR i ] > ] ]
Figure 3.6: AVM for active participle

kataba-form-IA-triliteral-sound-active-perfect-3rd-sg-masc-verb-lex
[ PHON    < >
  MORPH   [ ROOT <k, t, b>
            STEM <k, a, t, a, b, a> ]
  ARG-ST  [1] < [ SYN [ CAT [ noun, CASE accusative ] ]
                  OPT −
                  SEM [ INDEX j ] ] >
  SYN     [ CAT  [ verb
                   VFORM perfect, VOICE active, MOOD indicative
                   SELECT none, XARG none, LID none ]
            VAL  [1]
            MRKG none ]
  SEM     [ SIT-INDEX s [ SITUATION writing ]
            FRAMES < write-fr [ SIT s
                                ACTOR i [ PERSON 3rd, NUMBER sg,
                                          GENDER masc, HUM yes ]
                                UNDGR j
                                LOCATION k
                                INS m ] > ] ]
Figure 3.7: AVM for a sample root verb

The construction rule in Figure 3.8 does this job. As we use the SBCG version of
HPSG, the construction rule contains two parts: MTR, which contains the AVM of the
verbal noun, and DTRS, which contains the AVM of the base verb. This rule demonstrates
how a Form I triliteral sound active participle is recognized from the lexeme of a Form I
triliteral sound root verb.

form-I-triliteral-sound-active-participle-lex-cxt
[ MTR   form-I-triliteral-sound-active-participle-lex
        [ MORPH  [ STEM <[1], a, a, [2], i, [3], u, n> ]
          ARG-ST < >
          SYN    [ CAT [ noun, CASE nominative ]
                   VAL < > ]
          SEM    [ INDEX i
                   FRAMES < event-fr [ SIT s, ACTOR i ] > ] ]
  DTRS  < form-I-triliteral-sound-active-perfect-3rd-sg-masc-verb-lex
        [ MORPH  [ STEM <[1], a, [2], VOWEL, [3], a> ]
          SYN    [ CAT [ verb, VFORM perfect, VOICE active ] ]
          SEM    [ SIT-INDEX s
                   FRAMES < event-fr [ SIT s, ACTOR i ] > ] ] > ]
Figure 3.8: Lexical rule for active participle construction
The construction rule contains three placeholders for the three root letters. Thus, from
this construction rule, an active participle built on the letters ‘k’, ‘t’ and ‘b’ or ‘n’, ‘s’
and ‘r’ can be recognized. In other words, the active participles ‘kaatibun’, ‘naasirun’,
‘saajidun’ and ‘saamiun’ can all be recognized.
There is no difference between constructing an active participle from a sound triliteral
Form IB−IF verb and from a sound triliteral Form IA verb. This is denoted in Figure 3.8
by the VOWEL variable placed just after the middle placeholder in the daughter AVM.
The subtypes of Form I verbs are distinguished by this vowel in the perfect form. As
variation of the vowel in this position has no impact on active participle formation, the
Form I verb may contain any of the three vowels in this position.
The rule in Figure 3.8 also shows how semantic information is propagated from the
root verb to the active participle. The content of the event frame is the same in mother
and daughter. The only change in the SEM feature is the semantic index: the actor index
(i) of the event frame becomes the semantic index of the active participle, whereas the
event index (s) of the event frame is the semantic index of the verb lexeme. Thus, the
semantic information of the active participle is derived from the root verb. The syntactic
information is fixed by the vowel pattern. This process of derivation is the same for the
other verbal nouns. Note that we have derived only the active participle stem. To use it
in a sentence, we need another construction rule which captures the inflection of the noun.
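As an informal illustration of the rule in Figure 3.8, the following Python sketch treats a sign as a nested dictionary and builds the mother (MTR) from the daughter (DTRS). The dictionary layout, the key names and the function `active_participle_cxt` are assumptions made for this example only; they are not the TRALE encoding used in the thesis.

```python
VOWELS = ("a", "i", "u")


def active_participle_cxt(dtr):
    """Sketch of Figure 3.8: return the mother sign, or None if the
    daughter stem does not match <C1, a, C2, VOWEL, C3, a>."""
    stem = dtr["morph"]["stem"]
    if len(stem) != 6:
        return None
    c1, v1, c2, vowel, c3, v3 = stem
    if v1 != "a" or v3 != "a" or vowel not in VOWELS:
        return None
    frame = dtr["sem"]["frames"][0]
    return {
        "morph": {
            "root": dtr["morph"]["root"],
            # Mother stem pattern: <C1, a, a, C2, i, C3, u, n>
            "stem": [c1, "a", "a", c2, "i", c3, "u", "n"],
        },
        "arg_st": [],
        "syn": {"cat": {"head": "noun", "case": "nominative"}, "val": []},
        # The event frame is shared; only the semantic index changes:
        # the actor index of the frame becomes the index of the noun.
        "sem": {"index": frame["actor"], "frames": [frame]},
    }


# Hypothetical daughter sign for 'kataba', loosely following Figure 3.7.
KATABA = {
    "morph": {"root": ["k", "t", "b"], "stem": list("kataba")},
    "arg_st": [{"case": "accusative", "index": "j"}],
    "sem": {
        "sit_index": "s",
        "frames": [{"sit": "s", "actor": "i", "undgr": "j"}],
    },
}

if __name__ == "__main__":
    mtr = active_participle_cxt(KATABA)
    print("".join(mtr["morph"]["stem"]), mtr["sem"]["index"])
```

Because the middle vowel is only checked for membership in the vowel set, the same function covers every Form I subtype, mirroring the role of the VOWEL variable.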
The construction of the active participle from Form I verbs is the most regular.
Constructions from other verb forms are complex and the derivation pattern is not
regular. Thus, they require further analysis.
3.4.2 Passive Participle
Like that of the active participle, the construction of the passive participle from a Form I
triliteral sound root verb is simple. There is just one pattern for its construction from
Form I triliteral sound root verbs, so the construction rule of Figure 3.9 is applicable
to all Form I subtypes. Derivation from other verb forms is complex and not regular,
and for some forms this type of participle does not exist at all. These cases require
further analysis.
form-I-triliteral-sound-passive-participle-lex-cxt
[ MTR   form-I-triliteral-sound-passive-participle-lex
        [ MORPH  [ STEM <m, a, [1], [2], u, u, [3], u, n> ]
          ARG-ST < >
          SYN    [ CAT [ noun, CASE nominative ]
                   VAL < > ]
          SEM    [ INDEX j
                   FRAMES < event-fr [ SIT s, UNDGR j ] > ] ]
  DTRS  < form-I-triliteral-sound-passive-perfect-3rd-sg-masc-verb-lex
        [ MORPH  [ STEM <[1], a, [2], VOWEL, [3], a> ]
          ARG-ST [4] < [ SYN [ CAT [ noun, CASE accusative ] ]
                         OPT −
                         SEM [ INDEX j ] ] >
          SYN    [ CAT [ verb, VFORM perfect, VOICE active ]
                   VAL [4] ]
          SEM    [ SIT-INDEX s
                   FRAMES < event-fr [ SIT s, UNDGR j ] > ] ] > ]
Figure 3.9: Lexical rule for passive participle construction
The verbs from which passive participles are derived must be transitive. For this
reason, in the AVM of the DTR, the ARG-ST feature is not empty and its semantic index
(j) is co-indexed with the undergoer index in the event-fr. Note that the ARG-ST of the
DTR contains one sign, for the object only, and it is in the accusative case. It does not
contain any sign for the actor. This is because, in Arabic, the actor is implicitly expressed
in the verb and the verb does not syntactically require the actor. If a subject is explicitly
mentioned in the sentence, it can be parsed by a phrasal construction rule.
As with the active participle, the semantic information is derived from the root verb.
The undergoer index (j) in the event frame of the root verb becomes the semantic index
of the passive participle. The VOWEL variable in the STEM of the daughter works in
the same way as in the active participle construction.
As an example, the verb (‘kataba’) shown in Figure 3.7 is a transitive verb. So, from
this verb lexeme, we can recognize the passive participle (‘maktuubun’) shown in Figure
3.10.
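The transitivity requirement and the index change can be sketched in Python as follows. This is only an illustration under assumed names: the dictionary layout, `passive_participle_cxt` and the two sample verb entries are hypothetical and do not reproduce the TRALE code of the thesis. The intransitive entry for ‘jalasa’ (to sit) is added solely to show the rule failing on an empty ARG-ST.

```python
VOWELS = ("a", "i", "u")


def passive_participle_cxt(dtr):
    """Sketch of Figure 3.9: derive <m, a, C1, C2, u, u, C3, u, n> from a
    transitive daughter whose stem is <C1, a, C2, VOWEL, C3, a>."""
    stem = dtr["morph"]["stem"]
    args = dtr["arg_st"]
    if len(stem) != 6 or not args:
        return None  # wrong stem shape, or intransitive verb
    c1, v1, c2, vowel, c3, v3 = stem
    if v1 != "a" or v3 != "a" or vowel not in VOWELS:
        return None
    frame = dtr["sem"]["frames"][0]
    obj = args[0]
    # The object must be accusative and co-indexed with the undergoer.
    if obj["case"] != "accusative" or obj["index"] != frame.get("undgr"):
        return None
    return {
        "morph": {
            "root": dtr["morph"]["root"],
            "stem": ["m", "a", c1, c2, "u", "u", c3, "u", "n"],
        },
        "arg_st": [],
        "syn": {"cat": {"head": "noun", "case": "nominative"}, "val": []},
        # The undergoer index becomes the semantic index of the participle.
        "sem": {"index": frame["undgr"], "frames": [frame]},
    }


# Hypothetical sample entries.
VERB_KATABA = {
    "morph": {"root": ["k", "t", "b"], "stem": list("kataba")},
    "arg_st": [{"case": "accusative", "index": "j"}],
    "sem": {"sit_index": "s",
            "frames": [{"sit": "s", "actor": "i", "undgr": "j"}]},
}
VERB_JALASA = {
    "morph": {"root": ["j", "l", "s"], "stem": list("jalasa")},
    "arg_st": [],
    "sem": {"sit_index": "s", "frames": [{"sit": "s", "actor": "i"}]},
}

if __name__ == "__main__":
    print("".join(passive_participle_cxt(VERB_KATABA)["morph"]["stem"]))
    print(passive_participle_cxt(VERB_JALASA))
```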
3.4.3 Locative Noun
A locative noun can be generated from triliteral Form I root verbs only. There are
two patterns of derivation, and which pattern is used is predictable. Locative nouns
generated from Form IA, IC, ID, IE and IG root verbs use one pattern, whereas those
generated from Form IB and IF use another. For this reason, locative nouns are of two
types: Form IA locative nouns and Form IB locative nouns. Figure 3.7 shows the AVM
of the Form IA root verb ‘kataba’. The locative noun ‘maktabun’ derived from this verb
is shown in Figure 3.11.
maktuubun-form-I-triliteral-sound-passive-participle-lex
[ MORPH   [ ROOT <k, t, b>
            STEM <m, a, k, t, u, u, b, u, n> ]
  ARG-ST  < >
  SYN     [ CAT  [ noun
                   CASE nominative, DEF no
                   SELECT none, XARG none, LID none ]
            VAL  < >
            MRKG none ]
  SEM     [ INDEX  j [ PERSON 3rd, NUMBER sg, GENDER masc, HUM no ]
            FRAMES < write-fr [ SIT s [ SITUATION writing ]
                                ACTOR i
                                UNDGR j ] > ] ]
Figure 3.10: AVM for passive participle

maktabun-form-IA-triliteral-sound-locative-noun-lex
[ MORPH   [ ROOT <k, t, b>
            STEM <m, a, k, t, a, b, u, n> ]
  ARG-ST  < >
  SYN     [ CAT  [ noun
                   CASE nominative, DEF no
                   SELECT none, XARG none, LID none ]
            VAL  < >
            MRKG none ]
  SEM     [ INDEX  k [ PERSON 3rd, NUMBER sg, GENDER masc, HUM no ]
            FRAMES < write-fr [ SIT s [ SITUATION writing ]
                                ACTOR i
                                LOCATION k ] > ] ]
Figure 3.11: AVM for Form IA locative noun

Because they share the same pattern, one construction rule, shown in Figure 3.12,
captures the derivation of locative nouns from verb forms IA, IC, ID and IE. As in the
construction rules for the active and passive participles, the syntactic information is
derived from the vowel pattern of the lexeme and the semantic information is derived
from the root verb. Thus, the location index (j) in the event frame of the root verb
becomes the semantic index of the locative noun.

form-IA-triliteral-sound-locative-noun-lex-cxt
[ MTR   form-IA-triliteral-sound-locative-noun-lex
        [ MORPH  [ STEM <m, a, [1], [2], a, [3], u, n> ]
          ARG-ST < >
          SYN    [ CAT [ noun, CASE nominative ]
                   VAL < > ]
          SEM    [ INDEX j
                   FRAMES < event-fr [ SIT s, LOCATION j ] > ] ]
  DTRS  < form-IA-triliteral-sound-locative-perfect-3rd-sg-masc-verb-lex
        [ MORPH  [ STEM <[1], a, [2], a, [3], a> ]
          SYN    [ CAT [ verb, VFORM perfect, VOICE active ] ]
          SEM    [ SIT-INDEX s
                   FRAMES < event-fr [ SIT s, ACTOR i, LOCATION j ] > ] ] > ]
Figure 3.12: Lexical rule for the locative noun construction from Form IA sound root verb
Similarly, Figure 3.13 shows the AVM of the Form IB root verb ‘sajada’. The locative
noun ‘masjidun’ derived from this verb is shown in Figure 3.14.

sajada-form-IB-triliteral-sound-active-perfect-3rd-sg-masc-verb-lex
[ PHON    < >
  MORPH   [ ROOT <s, j, d>
            STEM <s, a, j, a, d, a> ]
  ARG-ST  [1] < [ SYN [ CAT [ noun, CASE accusative ] ]
                  OPT −
                  SEM [ INDEX j ] ] >
  SYN     [ CAT  [ verb
                   VFORM perfect, VOICE active, MOOD indicative
                   SELECT none, XARG none, LID none ]
            VAL  [1]
            MRKG none ]
  SEM     [ SIT-INDEX s [ SITUATION prostration ]
            FRAMES < prostrate-fr [ SIT s
                                    ACTOR i [ PERSON 3rd, NUMBER sg,
                                              GENDER masc, HUM yes ]
                                    UNDGR j
                                    LOCATION k ] > ] ]
Figure 3.13: AVM for a Form IB root verb

masjidun-form-IB-triliteral-sound-locative-noun-lex
[ MORPH   [ ROOT <s, j, d>
            STEM <m, a, s, j, i, d, u, n> ]
  ARG-ST  < >
  SYN     [ CAT  [ noun
                   CASE nominative, DEF no
                   SELECT none, XARG none, LID none ]
            VAL  < >
            MRKG none ]
  SEM     [ INDEX  k [ PERSON 3rd, NUMBER sg, GENDER masc, HUM no ]
            FRAMES < prostrate-fr [ SIT s [ SITUATION prostration ]
                                    ACTOR i
                                    LOCATION k ] > ] ]
Figure 3.14: AVM for Form IB locative noun

As mentioned above, locative nouns generated from Form IB and IF verbs share the
same pattern. Thus, one construction rule captures the derivation from both of these
verb types. The construction rule is shown in Figure 3.15. It is the same as the
construction rule form-IA-triliteral-sound-locative-noun-lex-cxt except for the vowel
pattern in the mother, i.e. the vowel pattern of the locative noun, and the sort description
of the daughter.

form-IB-triliteral-sound-locative-noun-lex-cxt
[ MTR   form-IB-triliteral-sound-locative-noun-lex
        [ MORPH  [ STEM <m, a, [1], [2], i, [3], u, n> ]
          ARG-ST < >
          SYN    [ CAT [ noun, CASE nominative ]
                   VAL < > ]
          SEM    [ INDEX j
                   FRAMES < event-fr [ SIT s, LOCATION j ] > ] ]
  DTRS  < form-IB-triliteral-sound-locative-perfect-3rd-sg-masc-verb-lex
        [ MORPH  [ STEM <[1], a, [2], a, [3], a> ]
          SYN    [ CAT [ verb, VFORM perfect, VOICE active ] ]
          SEM    [ SIT-INDEX s
                   FRAMES < event-fr [ SIT s, ACTOR i, LOCATION j ] > ] ] > ]
Figure 3.15: Lexical rule for locative noun construction from Form IB sound root verb
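The choice between the two locative patterns depends only on the Form I subtype of the verb, so it can be sketched as a table lookup. The Python below is an illustrative assumption, not the thesis implementation: `LOCATIVE_VOWEL` and `locative_noun_stem` are hypothetical names, and the subtype labels follow the grouping given at the start of this section (IA, IC, ID, IE and IG versus IB and IF).

```python
# Vowel inserted between the second and third root letters of the
# locative noun, keyed by Form I subtype (grouping taken from the text).
LOCATIVE_VOWEL = {
    "IA": "a", "IC": "a", "ID": "a", "IE": "a", "IG": "a",
    "IB": "i", "IF": "i",
}


def locative_noun_stem(root, subtype):
    """Return the stem <m, a, C1, C2, V, C3, u, n> for a triliteral root."""
    if len(root) != 3:
        raise ValueError("locative nouns need a triliteral root")
    c1, c2, c3 = root
    vowel = LOCATIVE_VOWEL[subtype]
    return ["m", "a", c1, c2, vowel, c3, "u", "n"]


if __name__ == "__main__":
    print("".join(locative_noun_stem("ktb", "IA")))
    print("".join(locative_noun_stem("sjd", "IB")))
```

Keeping the vowel in a table makes explicit that the two construction rules differ only in this one position of the mother stem.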
3.4.4 Comparative Participle
Figure 3.16 shows the AVM for the comparative participle ‘aktabu’. It is derived from the
root verb ‘kataba’ shown in Figure 3.7. We have introduced a new semantic frame,
compare-fr, inspired by the analysis of Farkas and É. Kiss [14]. This frame has three
features. The first feature, COMPARED, contains the index of the object that we want
to compare. The second feature, COMPAREWITH, contains the index of the object with
which we want to compare it. The last feature, DIMENSION, is the dimension of
comparison. This dimension is actually a SIT-INDEX, and it must be co-indexed with
the situational index of the verb lexeme from which the participle is derived.

aktabu-form-I-triliteral-sound-comparative-participle-lex
[ MORPH   [ ROOT <k, t, b>
            STEM <a, k, t, a, b, u> ]
  ARG-ST  [1] < [ SYN [ CAT [ noun, CASE genitive ] ]
                  OPT +
                  SEM [ INDEX j ] ] >
  SYN     [ CAT  [ noun
                   CASE nominative, DEF no
                   SELECT none, XARG none, LID none ]
            VAL  [1]
            MRKG none ]
  SEM     [ INDEX  i [ PERSON 3rd, NUMBER sg, GENDER masc, HUM yes ]
            FRAMES < write-fr   [ SIT s [ SITUATION writing ]
                                  ACTOR i ],
                     compare-fr [ COMPARED i
                                  COMPAREWITH j
                                  DIMENSION s ] > ] ]
Figure 3.16: AVM for Form I comparative participle

Figure 3.17 shows the construction rule for comparative participles. This participle
has an optional syntactic requirement, which is contained in the ARG-ST feature. The
case of the required sign must be genitive. Its semantic index is co-indexed with the
index of COMPAREWITH in compare-fr. At the same time, the situational index of
DIMENSION in compare-fr must be co-indexed with the SIT-INDEX of the verb lexeme.
What this rule says is that a comparative participle expresses the comparison of two
things along the dimension given by the verb.

form-I-triliteral-sound-comparative-participle-lex-cxt
[ MTR   form-I-triliteral-sound-comparative-participle-lex
        [ MORPH  [ STEM <a, [1], [2], a, [3], u> ]
          ARG-ST [4] < [ SYN [ CAT [ noun, CASE genitive ] ]
                         OPT +
                         SEM [ INDEX j ] ] >
          SYN    [ CAT [ noun, CASE nominative ]
                   VAL [4] ]
          SEM    [ INDEX i [ PERSON PERS, NUMBER sg, GENDER masc, HUM HUM ]
                   FRAMES < event-fr   [ SIT s, ACTOR i ],
                            compare-fr [ COMPARED i
                                         COMPAREWITH j
                                         DIMENSION s ] > ] ]
  DTRS  < form-I-triliteral-sound-comparative-perfect-3rd-sg-masc-verb-lex
        [ MORPH  [ STEM <[1], a, [2], VOWEL, [3], a> ]
          SYN    [ CAT [ verb, VFORM perfect, VOICE active ] ]
          SEM    [ SIT-INDEX s
                   FRAMES < event-fr [ SIT s ] > ] ] > ]
Figure 3.17: Lexical rule for comparative participle construction

3.4.5 Other Types of Verbal Noun
The constructions of the remaining four types of verbal nouns are complex and we cannot
capture them by construction rules. We have to give the lexical entries for these verbal
nouns individually.
Each verb form has a gerund, which follows the most unpredictable pattern. Modeling
its construction rule is a large area of research in itself. For now we can only list the
lexical entries for all gerunds individually. Figure 3.18 shows a lexical entry for the gerund
‘kitaabatun’, which means writing.

kitaabatun-gerund-lex
[ MORPH   [ ROOT <k, t, b>
            STEM <k, i, t, a, a, b, a, t, u, n> ]
  ARG-ST  < >
  SYN     [ CAT  [ noun, CASE nominative, DEF no ]
            VAL  < >
            MRKG none ]
  SEM     [ INDEX  i [ PERSON 3rd, NUMBER sg, GENDER fem, HUM no ]
            FRAMES < write-fr [ SIT s [ SITUATION writing ]
                                ACTION i ] > ] ]
Figure 3.18: Sample lexical entry for ‘kitaabatun’ gerund

Hyperbolic participles are generated only from triliteral sound Form I root verbs, but
not all verbs possess a corresponding hyperbolic participle. There are eleven patterns
for deriving hyperbolic participles from verbs. However, we cannot predict from the root
letters which of these eleven patterns will be used; neither can we infer the existence
of a hyperbolic participle for the given root letters. So we have to list a lexical entry
for each of these hyperbolic participles. Figure 3.19 shows a sample lexical entry for the
hyperbolic participle ‘kattaabun’, which means a person who writes a lot. We have
used the modifier-fr frame to capture the modification information. The LEVEL feature
has the value excessive, which means that the actor performs this action excessively.

kattaabun-hyperbolic-participle-lex
[ MORPH   [ ROOT <k, t, b>
            STEM <k, a, t, t, a, a, b, u, n> ]
  ARG-ST  < >
  SYN     [ CAT  [ noun, CASE nominative, DEF no ]
            VAL  < >
            MRKG none ]
  SEM     [ INDEX  i [ PERSON 3rd, NUMBER sg, GENDER masc, HUM yes ]
            FRAMES < write-fr    [ SIT s [ SITUATION writing ]
                                   ACTOR i ],
                     modifier-fr [ ARG i
                                   LEVEL excessive ] > ] ]
Figure 3.19: Sample lexical entry for ‘kattaabun’ hyperbolic participle
Resembling participles are similar to hyperbolic participles. They are generated only
from triliteral sound Form I root verbs. There exist a large number of derivational
patterns in this case, so it is not feasible to formulate a lexical construction rule for
these nouns. Thus, in this case too, we need to give the lexical entries. Figure 3.20 shows
the lexical entry for ‘katiibun’, which means a person who always writes. As with the
hyperbolic participle, we have used the modifier-fr frame to capture its information as a
modifier. Unlike the hyperbolic participle, the value of the LEVEL feature is intrinsic,
which captures its difference from the hyperbolic participle.
katiibun-resembling-participle-lex
[ MORPH   [ ROOT <k, t, b>
            STEM <k, a, t, i, i, b, u, n> ]
  ARG-ST  < >
  SYN     [ CAT  [ noun, CASE nominative, DEF no ]
            VAL  < >
            MRKG none ]
  SEM     [ INDEX  i [ PERSON 3rd, NUMBER sg, GENDER masc, HUM yes ]
            FRAMES < write-fr    [ SIT s [ SITUATION writing ]
                                   ACTOR i ],
                     modifier-fr [ ARG i
                                   LEVEL intrinsic ] > ] ]
Figure 3.20: Sample lexical entry for ‘katiibun’ resembling participle
Utilitarian nouns are also generated from triliteral sound Form I root verbs only.
There are four patterns of derivation. For a given set of root letters it is unpredictable
which pattern will be used. For this reason, despite the limited number of patterns, we
have to list the lexical entries exhaustively. Figure 3.21 shows a lexical entry for the
utilitarian noun ‘miktabun’, which means an instrument for writing.

miktabun-utilitarian-noun-lex
[ MORPH   [ ROOT <k, t, b>
            STEM <m, i, k, t, a, b, u, n> ]
  ARG-ST  < >
  SYN     [ CAT  [ noun, CASE nominative, DEF no ]
            VAL  < >
            MRKG none ]
  SEM     [ INDEX  k [ PERSON 3rd, NUMBER sg, GENDER masc, HUM no ]
            FRAMES < write-fr [ SIT s [ SITUATION writing ]
                                ACTOR i
                                UNDGR j
                                INSTRUMENT k ] > ] ]
Figure 3.21: Sample lexical entry for ‘miktabun’ utilitarian noun
Chapter 4
TRALE Implementation
We have implemented the HPSG formalism described in the previous chapter in the
TRALE (an extension of the Attribute Logic Engine) compiler. Section 4.1 gives an
introduction to the TRALE system. Section 4.2 then discusses the necessary components
of the TRALE compiler. Finally, Section 4.3 discusses the methodology that we follow to
implement the HPSG formalism in TRALE.
4.1 Introduction to TRALE
TRALE is a lexical rule compiler. It integrates phrase structure parsing, semantic-head-
driven generation and constraint logic programming with typed feature structures as
terms. It is responsible for compiling feature structure descriptions into Prolog code. It
is a descendant of two other compilers, the Attribute Logic Engine (ALE) and ConTroll
[30, 34]. Both of these compilers were designed on the basis of the HPSG 87 formalism
[32]. With Gerald Penn as its chief developer, TRALE inherited the core of the ALE
system, but with the underlying logic specialized to the case where it becomes a logic in
the tradition of HPSG 94 [33].
We have decided to use TRALE for our implementation. The alternative is LKB, in
which both HPSG and LFG grammars can be implemented. However, TRALE was
developed solely to capture HPSG grammars, and it was developed after LKB. For this
reason, we have decided to use TRALE. Before choosing the implementation platform,
we also read the comparison between TRALE and LKB presented by Nurit [27].
We use TRALE on the Grammix operating system, version of June 01, 2007 [28]. This
is an older version of TRALE. There is a newer version of TRALE which is not
complete but can be run standalone on the Linux platform. This newer version was
published in 2008 [34]. Grammix is built for grammar development. It contains two
complete grammar development systems, TRALE and LKB. Its TRALE system was last
updated on May 31, 2007.
4.2 Necessary Components for Implementation
TRALE has two major component files, the signature file and the theory file, together
with an I/O console and a graphical interface (GRISU) which displays AVMs and type
hierarchies graphically.
4.2.1 Signature File
This file contains the type hierarchy of features in HPSG. It also contains the
description of function features, i.e. the features that constitute a function feature. The
file does not have any extension and is loaded from the theory file. The hierarchy in this
file is expressed by indentation: a feature on the following line, indented by three blank
spaces, is an immediate child. Figure 4.1 shows some sample lines of the signature file.
The bot feature is always at the top of this hierarchy; TRALE requires it to be above
all other features. The constituents of a function feature are specified on the same line
as the feature. Here sign and all of its children are function features, and the constituents
of the sign feature are listed on its line.
In the signature file, multiple inheritance is marked by &. That is, writing & before
a type means that this type is also mentioned in another place in the file and that, in
that place, it is a child of another type.

type_hierarchy
bot
   sign phon:ne_list morph:morph arg_st:list syn:syn sem:sem
      lexeme
         noun_lex
            . . .
               active_participle_lex
                  trilateral_root_derived_ap_lex
                     formI_sound_trilateral_ap_lex
                  sound_root_derived_ap_lex
                     &formI_sound_trilateral_ap_lex
                  formI_derived_ap_lex
                     &formI_sound_trilateral_ap_lex
Figure 4.1: Signature file
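To make the indentation convention and the & marker concrete, the following Python sketch reads a fragment in the style of Figure 4.1 and records the immediate parents of every type. It is a toy reader written for this explanation, not TRALE's own signature parser: `parse_signature` and `SAMPLE` are hypothetical names, feature declarations after the type name are ignored, and a fixed indentation of three spaces per level is assumed as described above.

```python
def parse_signature(text, indent=3):
    """Map each type name to the list of its immediate parents."""
    parents = {}
    stack = []  # (depth, type name) along the current branch
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped == "type_hierarchy":
            continue
        depth = (len(line) - len(line.lstrip(" "))) // indent
        # The first token is the type; '&' marks a repeated mention.
        name = stripped.split()[0].lstrip("&")
        # Drop branch entries that are not ancestors of this line.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        parents.setdefault(name, [])
        if stack:
            parents[name].append(stack[-1][1])
        stack.append((depth, name))
    return parents


SAMPLE = "\n".join([
    "type_hierarchy",
    "bot",
    "   active_participle_lex",
    "      trilateral_root_derived_ap_lex",
    "         formI_sound_trilateral_ap_lex",
    "      sound_root_derived_ap_lex",
    "         &formI_sound_trilateral_ap_lex",
    "      formI_derived_ap_lex",
    "         &formI_sound_trilateral_ap_lex",
])

if __name__ == "__main__":
    for type_name, supertypes in parse_signature(SAMPLE).items():
        print(type_name, "<-", supertypes)
```

In this fragment the leaf type ends up with three parents, one per classification dimension, which is exactly what the repeated & mentions are meant to express.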
4.2.2 Theory File
This file is composed of SWI Prolog code and must have the pl extension. It is the
starting point of the TRALE compiler. It loads the signature file and additional Prolog
files. This file, along with the other Prolog files, contains the lexical entries and
construction rules. Figure 4.2 shows the lexical entry for the root verb ‘kataba’.

kataba ~~> (formIA_triliteral_root_verb_lex,
   (morph:(root:[k,t,b]),
    arg_st:[(OBJ_SIGN, (syn:(cat:(case:acc, def:no)),
                        sem:(index:OBJ_INDEX)))],
    syn:(cat:(verb, vform:perf, voice:active, mood:subjunctive),
         val:[OBJ_SIGN]),
    sem:(sit_index:(SIT_INDEX, (situation:writing)),
         frames:[(sit:SIT_INDEX,
                  actor:(SUB_INDEX, (pers:third, num:sg, gen:male, hum:y)),
                  undgr:OBJ_INDEX,
                  location:LOC_INDEX)]))).
Figure 4.2: Sample lexical entry in theory file

The detailed lexical properties of a lexical entry are entered after ~~>. This entry in
the TRALE file is very similar to the AVM of the root verb ‘kataba’ shown in Figure 3.7.
Figure 4.3 shows part of the lexical construction rule.

trilateral-active-lex-cxt ##
(formI_triliteral_root_verb_lex,
   (morph:(root:ROOTS), syn:(. . .),
    sem:(sit_index:SIT_INDEX,
         frames:[(sit:SIT_INDEX, actor:SUB_INDEX)])))
**>
(formI_sound_trilateral_ap_lex,
   ((morph:(root:ROOTS), syn:(. . .),
     sem:(index:SUB_INDEX,
          frames:[(sit:SIT_INDEX, actor:SUB_INDEX)]))))
morphs
   (X,a,Y,a,Z,a) becomes (X,a,a,Y,i,Z,u,n),
   (X,a,Y,u,Z,a) becomes (X,a,a,Y,i,Z,u,n),
   (X,a,Y,i,Z,a) becomes (X,a,a,Y,i,Z,u,n).
Figure 4.3: Sample lexical construction rule in theory file

The body of a construction rule starts after ##. In a lexical construction rule, the
daughter comes first and then the mother. Mother and daughter are separated by **>.
At the end of the lexical construction rule, the morphs keyword is used to specify the
morphological operation performed by the rule. The change of the STEM feature from
daughter to mother is captured by the command: morphs DTR becomes MTR.
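The effect of the three `becomes` clauses in Figure 4.3 can be imitated outside TRALE with a few lines of Python. This is a simplified stand-in, not a description of how TRALE evaluates `morphs`: the functions `match` and `apply_morphs` and the `RULES` table are hypothetical, uppercase symbols are treated as variables over single letters, and the first matching clause wins.

```python
def match(pattern, stem):
    """Bind uppercase pattern variables to stem letters, or return None."""
    if len(pattern) != len(stem):
        return None
    bindings = {}
    for symbol, letter in zip(pattern, stem):
        if symbol.isupper():
            # A variable must bind consistently if it occurs twice.
            if bindings.setdefault(symbol, letter) != letter:
                return None
        elif symbol != letter:
            return None
    return bindings


def apply_morphs(rules, stem):
    """Apply the first clause whose left-hand side matches the stem."""
    for pattern, result in rules:
        bindings = match(pattern, stem)
        if bindings is not None:
            return [bindings.get(symbol, symbol) for symbol in result]
    return None


# The three clauses of Figure 4.3, one per middle vowel of the Form I verb.
RULES = [
    (list("XaYaZa"), list("XaaYiZun")),
    (list("XaYuZa"), list("XaaYiZun")),
    (list("XaYiZa"), list("XaaYiZun")),
]

if __name__ == "__main__":
    print("".join(apply_morphs(RULES, list("kataba"))))
    print(apply_morphs(RULES, list("kattaba")))
```

Writing one clause per vowel, as the theory file does, plays the same role as the single VOWEL variable in the daughter STEM of Figure 3.8.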
4.2.3 Input and Output System
TRALE has an I/O console in the EMACS editor. The output for AVMs and type
hierarchies is shown graphically in a graphical interface called GRISU. The I/O console
takes the commands to compile the signature and theory files and shows the compiled
output. It also takes the phrases and sentences which are to be recognized by the
grammar defined in the signature and theory files. Commands can be issued in this
console to show GRISU output.
GRISU gives a clear pictorial rendering of AVMs, construction rules and type
hierarchies. The order of features in a GRISU AVM can be configured from the theory.pl
file in the following manner:
   >>> phon.
   phon <<< morph.
Here phon <<< morph denotes that the PHON feature precedes the MORPH feature in
the GRISU AVM, and >>> phon denotes that PHON is at the top position in the feature
listing. The GRISU output for the AVM of the root verb ‘kataba’ in Figure 3.7 is shown
in Figure 4.4. Similarly, the GRISU output for the type hierarchy of the active participle
in Figure 3.5 is shown in Figure 4.5.
4.3 Implementation Methodology
Our implementation can be divided into the following steps:
1. We have translated the SBCG type hierarchy in Figure 3.4 and Figure 3.5 into the
signature file.
2. We have given the description of the function features in the signature file.
3. We have loaded the signature file from the theory file and tested whether the hierarchy
and function feature descriptions are correct by viewing GRISU outputs such as
Figure 4.5.
4. We have specified the order of features at the beginning of the theory file by using
the >>> and <<< operators.
5. We have given the lexical entries for all types (Form IA, IB, IC, ..., IG) of Form I
triliteral strong root verbs. Then we have checked whether our entries are correct
by viewing GRISU outputs such as Figure 4.4.
6. We have given the lexical construction rules, which are at the core of our
implementation. Then, in the input console, we have entered the verbal nouns which
are derived from the root verbs given as lexical entries. We have found that these
verbal nouns are recognized by the TRALE compiler from their root verbs and
construction rules. Thus, when we enter ‘kaatibun’ to be recognized by the compiler,
it gives the GRISU output in Figure 4.6.
7. In the above manner, we have tested all five construction rules proposed in
Section 3.4 for all types of Form I triliteral strong root verbs.
Following these steps, we have successfully implemented the HPSG formalism. The
detailed contents of the signature and theory files are provided in the Appendix.
Figure 4.4: GRISU output for AVM of ‘kataba’ root verb.
Figure 4.5: GRISU output for type hierarchy of active participle.
Figure 4.6: GRISU output of AVM for ‘kaatibun’ active participle.
Chapter 5
Conclusion
In this last chapter, we draw the conclusion of our thesis by describing the major contri-
butions made through this research followed by some directions for future research.
5.1 Summary of Contributions
The contributions that have been made in this thesis can be enumerated as follows:
• We have formulated a concrete AVM for the Arabic noun and verb. We have made
the design robust so that it can handle not only lexical construction but also
phrasal construction. We have implemented it in TRALE and extended it to
capture the root pattern morphology.
• When a verbal noun stem is derived from a root verb, some syntactic and
semantic information is encoded in the derived stem. We have captured this
syntactic and semantic information. We have modified the INDEX feature for
Arabic to reference and incorporate semantic meaning.
• We have given a concrete description of SIT-INDEX and shown its differences from
INDEX. To our knowledge, no earlier literature shows its use and its distinction
from INDEX.
• We have articulated the type hierarchy of the Arabic noun and placed the
verb-derived nouns and their subtypes in the lexical type hierarchy. We have also
provided the justification for this placement and implemented it on the TRALE
platform.
• We have utilized the root pattern morphology, so that only minimal lexical
entries are needed to resolve a lexical item. We have developed lexical construction
rules for four types of verbal noun (active participle, passive participle, locative
noun and comparative participle) derived from triliteral sound root verbs. The other
four types of verbal noun need an exhaustive lexical entry for each verb stem. We
have given a sample lexical entry for each of these stems. Further lexical construction
rules can be written to capture the inflection of these stems for gender and number.
• We have implemented all the lexical type hierarchies, lexical entries and lexical
construction rules proposed in this thesis. We have then verified these construction
rules by recognizing the verbal noun stems from their root verbs.
5.2 Future Directions for Further Research
Modeling a natural language is a large research undertaking, and ours is only the
beginning of this work. The following directions should be considered to make the best
use of our research.
• Verbal nouns derived from strong quadriliteral verbs and weak verbs should be
modeled. To accomplish this, we have to model the quadriliteral and weak verbs
as well.
• We have not developed any construction rules for four types of verbal nouns (gerund,
hyperbolic participle, resembling participle and utilitarian noun). However, our
investigation suggests that hyperbolic participles and utilitarian nouns can be
derived by construction rules based on some specific root classes. For this, Arabic
roots should be classified at a more granular level so that different construction
rules can be generated from those root classes.
• As mentioned in Section 5.1, we have developed construction rules for verbal
noun stems. Other construction rules should be written to capture the inflection
of these verbal noun stems. Some examples of inflection are number, gender and
declension. Declension is a distinctive linguistic feature of Arabic: noun and verb
lexemes are inflected and take different forms based on the case and mood in phrases.
• So far, we have only discussed lexical construction rules. Clearly, to parse an
Arabic noun in a sentence, phrasal construction rules must also be constructed.
Bibliography
[1] Kenneth R. Beesley. Finite-State Morphological Analysis and Generation of Arabic
at Xerox Research: Status and Plans in 2001. In Proceedings of the Workshop on
Arabic Language Processing: Status and Prospects, Association for Computational
Linguistics, 2001.
[2] Md. Shariful Islam Bhuyan and Reaz Ahmed. An HPSG Analysis of Arabic Passive.
In Proceedings of the 11th International Conference on Computer and Information
Technology, 2008.
[3] Md. Shariful Islam Bhuyan and Reaz Ahmed. An HPSG Analysis of Arabic Verb.
In Proceedings of the 8th International Arab Conference on Information Technology,
2008.
[4] Md. Shariful Islam Bhuyan and Reaz Ahmed. Nonconcatenative Morphology: An
HPSG Analysis. In Proceedings of the 5th International Conference on Electrical and
Computer Engineering, 2008.
[5] Steven Bird and Ewan Klein. Phonological Analysis in Typed Feature Systems.
Computational Linguistics, 20:455–491, 1994.
[6] Joan Bresnan. The Mental Representation of Grammatical Relations. Cambridge,
MA, USA: MIT Press, 1982.
[7] Tim Buckwalter. Buckwalter Arabic Morphological Analyzer Version 2.0. In Lin-
guistic Data Consortium, Philadelphia, PA, USA, 2004.
[8] Noam Chomsky. Lectures on Government and Binding, 1981.
[9] Domenic Cipollone. Morphologically complex predicates in Japanese and what they
tell us about grammar architecture. In OSU Working Papers in Linguistics 56, pages
1–52. Ohio State University, 2001.
[10] Ann Copestake, Dan Flickinger, Rob Malouf, Susanne Riehemann, and Ivan Sag.
Translation using minimal recursion semantics. In Proceedings of the 6th International
Conference on Theoretical and Methodological Issues in Machine Translation,
Leuven, 1995.
[11] Ann Copestake, Dan Flickinger, Susanne Riehemann, and Ivan Sag. Minimal recursion
semantics: An introduction. Research on Language and Computation, 3(4):281–332,
2006.
[12] Anthony R. Davis. Linking and the Hierarchical Lexicon. PhD thesis, Stanford
University, 1996.
[13] Anthony R. Davis. Linking by Types in the Hierarchical Lexicon. Chicago: University
of Chicago Press, 2001.
[14] Donka F. Farkas and Katalin É. Kiss. On the comparative and absolute readings
of superlatives. Natural Language and Linguistic Theory, 18(3):417–455, 2000.
[15] Gerald Gazdar, Ewan Klein, Geoffrey Pullum, and Ivan A. Sag. Generalized Phrase
Structure Grammar. Chicago: University of Chicago Press, 1985.
[16] Georgia M. Green. Elementary principles of HPSG. In FIPS PUB, pages 140–1,
1999.
[17] Kais Haddar, Ines Zalila, and Sirine Boukedi. An HPSG parser generation with
the LKB for Arabic relatives. International Journal of Computing and Information
Sciences, 7(5):51–60, 2009.
[18] Md. Sadiqul Islam, Mahmudul Hasan Masum, Md. Shariful Islam Bhuyan, and Reaz
Ahmed. An HPSG Analysis of Declension in Arabic Grammar. In Proceedings of the
9th International Arab Conference on Information Technology, 2009.
[19] Md. Sadiqul Islam, Mahmudul Hasan Masum, Md. Shariful Islam Bhuyan, and Reaz
Ahmed. Arabic Nominals in HPSG: A Verbal Noun Perspective. In Proceedings of
the 17th International HPSG Conference, Université Paris Diderot, pages 158–178.
On-line: CSLI Publications, 2010.
[20] Mohtanick Jamil. Arabic verb paradigms. Website, 2003-2011. http://www.learnarabiconline.com/verbal-paradigms.shtml.
[21] Mohtanick Jamil. Definiteness. Website, 2003-2011. http://www.learnarabiconline.com/definiteness.shtml/.
[22] Mohtanick Jamil. Derived nouns. Website, 2003-2011. http://www.learnarabiconline.com/derived-nouns.shtml.
[23] Andreas Kathol. Agreement and the syntax-morphology interface in HPSG. In
Studies in contemporary phrase structure grammar, pages 223–274. UC Berkeley,
1999.
[24] Alain Kihm. Nonsegmental Concatenation: A Study of Classical Arabic Broken
Plurals and Verbal Nouns . Morphology, 16:69–105, 2006.
[25] Eugene E. Loos, Susan Anderson, Dwight H. Day, Jr., Paul C. Jordan,
and J. Douglas Wingate. Glossary of linguistic terms. Website, 2011.
http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/.
[26] Nurit Melnik. Verb-Initial Construction in Modern Hebrew. PhD thesis, University
of California, Berkeley, 2002.
[27] Nurit Melnik. From “hand-written” to computationally implemented HPSG theories.
In Proceedings of the 12th International HPSG Conference, University of Lisbon,
pages 311–321. On-line: CSLI Publications, 2005.
[28] Stefan Müller. Grammix. Website, 2007. http://hpsg.fu-berlin.de/Software/Grammix/.
[29] A.M. Mutawa, Salah Alnajem, and Fadi Alzhouri. An HPSG Approach to Arabic
Nominal Sentences. Journal of the American Society for Information Science and
Technology, 59(3):422–434, 2008.
[30] Gerald Penn and Mohammad Haji-Abdolhosseini. ALE Documentation. Website, 2003. http://www.ale.cs.toronto.edu/docs/.
[31] Carl J. Pollard. Lectures on the foundations of HPSG, 1997.
[32] Carl J. Pollard and Ivan A. Sag. Information-based syntax and semantics. Stanford:
Center for the Study of Language and Information (CSLI), 1:262–267, 1987.
[33] Carl J. Pollard and Ivan A. Sag. Head-Driven Phrase Structure Grammar. Chicago:
University of Chicago Press, 1994.
[34] Frank Richter. Preliminary TRALE Page. Website, 2008. http://milca.sfs.uni-tuebingen.de/A4/Course/trale/.
[35] Susanne Z. Riehemann. Type-Based Derivational Morphology. Journal of Compar-
ative Germanic Linguistics, 2:49–77, 1998.
[36] Susanne Z. Riehemann. A Constructional Approach to Idioms and Word Formation.
PhD thesis, Stanford University, 2001.
[37] Karin C. Ryding. Modern Standard Arabic. Cambridge University Press, UK, 2005.
[38] Ivan A. Sag. Sign-Based Construction Grammar, chapter 2. Stanford University,
August 2010.
[39] Ivan A. Sag and Thomas Wasow. Syntactic Theory: A Formal Introduction. Stanford:
Center for the Study of Language and Information (CSLI), 1999.
[40] Otakar Smrž. Functional Arabic Morphology. Formal System and Implementation.
PhD thesis, Charles University in Prague, 2007.
[41] Nathan Vaillette. Hebrew relative clauses in HPSG. In Proceedings of the 7th In-
ternational HPSG Conference, UC Berkeley (2223), pages 305–324. On-line: CSLI
Publications, 2000.
Glossary of Terms
Affix: is a bound morpheme that is joined before, after, or within a root or stem.
Agreement: refers to a formal relationship between elements whereby a form of one
word requires a corresponding form of another. It is also known as concord.
Arabic quadriliteral verb: is an Arabic verb whose root contains four consonants.
Arabic strong verb: is an Arabic verb whose root does not contain any long vowel.
Arabic triliteral verb: is an Arabic verb whose root contains three consonants.
Arabic weak verb: is an Arabic verb whose root contains a long vowel.
Arabic verb form: In Arabic, from any particular sequence of root letters, up to fifteen
different verb stems may be derived, each with its own template or vowel pattern and
semantic information. These stems are called verb forms.
AVM: Attribute Value Matrix.
Bound morpheme: is a morpheme that never occurs by itself, but is always attached
to some other morpheme.
Concatenative morphology: is the process where bound morphemes are linearly con-
catenated.
Construct: is a formal linguistic representation of a construction rule.
Construction: is an ordered arrangement of grammatical units forming a larger unit.
Construction rule: is a rule for constructing phrases or sentences.
Declension: is the process of disambiguating the grammatical roles of words by slightly
changing their end vowels. In Arabic, the end vowel indicates grammatical case for nominals
and mood for verbs.
Derivation: is the formation of a new word or inflectable stem from another word or
stem. It typically occurs by the addition of an affix. The derived word is often of a
different syntactic category from the original.
Free morpheme: is a morpheme that can occur by itself. However, other morphemes
such as affixes can be attached to it.
Gemination: is the consecutive double occurrence of a letter.
HPSG: Head-driven Phrase Structure Grammar, developed by Carl Pollard and Ivan Sag
in 1994.
Inflection: is variation in the form of a word that expresses a grammatical contrast
which is obligatory for the stem. It does not change the syntactic category of the word.
Lexeme: is the minimal unit of language which has a semantic interpretation and em-
bodies a distinct cultural concept. For example, in the English language, run, runs, ran
and running are forms of the same lexeme, conventionally written as run.
Lexical construction rule: is a construction rule that deals with the forming of the
lexicon, i.e. the forming of words and stems.
Lexical entry: is the entry of a lexeme in the dictionary.
Lexicon: is the vocabulary of a language, including its words and expressions. A lexicon
is also a synonym of the word thesaurus. It includes the lexemes used to actualize words.
Grammatical rules are not considered part of the lexicon.
LFG: Lexical Functional Grammar, developed by Joan Bresnan in 1982.
Morpheme: is the smallest meaningful unit in the grammar of a language.
Morphological process: is a means of changing a stem to adjust its meaning to fit its
syntactic and communicational context. There are two types of processes - concatenative
and nonconcatenative.
Morphology: is the study of word formation i.e. the internal structure of words.
Morphosyntactic operation: is an ordered, dynamic relation between one linguistic
form and another. Derivation and Inflection are morphosyntactic operations.
Nonconcatenative morphology: is the process where bound morphemes are
nonlinearly concatenated.
Phonology: is the systematic use of sound to encode meaning.
Phrasal construction rule: is a construction rule that deals with the forming of
phrase and sentence.
Root: is the portion of a word that carries the principal portion of meaning of the
words in which it functions. It is common to a set of derived or inflected forms, if
any, when all affixes are removed. A root is also a stem.
Root class: is a set of roots, which share a common derivational and
inflectional paradigm.
SBCG: Sign-Based Construction Grammar. It is a variant of HPSG proposed by
Ivan Sag in 2007.
Semantics: is the study of meaning. It typically focuses on the relation between
signifiers, such as words, phrases, signs and symbols, and what they stand for.
Sign: is a formal linguistic representation of words, phrases and sentences.
All human utterances are captured by signs.
Stem: is the root or roots of a word, together with any derivational affixes, to
which inflectional affixes are added.
Syntactic category: is a set of words and/or phrases in a language which share a
significant number of common characteristics. It is also known as a syntactic class.
Syntax: is the study of the principles and rules for constructing phrases or
sentences.
TRALE: is a lexical rule compiler specially developed for HPSG. It is an
extension of the Attribute Logic Engine.
Appendix A
Table 5.1 gives the romanized transliteration of the Arabic alphabet.
Table 5.1: Transliteration Table of Arabic Alphabet
Arabic Letter   Transliteration   Arabic Letter   Transliteration
ب               b                 ط               ṭ
ت               t                 ظ               ẓ
ث               ṯ                 ع               ʿ
ج               ǧ                 غ               ġ
ح               ḥ                 ف               f
خ               ḫ                 ق               q
د               d                 ك               k
ذ               ḏ                 ل               l
ر               r                 م               m
ز               z                 ن               n
س               s                 و               w
ش               š                                 -
                                  ه               h
                                  ي               y
Appendix B
Content of signature file -
type_hierarchy
bot
  list
    ne_list hd:bot tl:list
    e_list
  char
    k
    t
    b
    a
    i
    u
    n
    r
    s
    j
    d
    m
    h
  sign phon:ne_list morph:morph arg_st:list syn:syn sem:sem
    word dtrs:list dtr:sign
    lexeme
      noun_lex
        derived_noun_lex
          verbderived_noun_lex
            gerund_lex
            active_participle_lex
              trilateral_root_derived_ap_lex
                formI_sound_trilateral_ap_lex
              sound_root_derived_ap_lex
                &formI_sound_trilateral_ap_lex
              formI_derived_ap_lex
                &formI_sound_trilateral_ap_lex
            hyperbolic_participle_lex
            passive_participle_lex
              trilateral_root_derived_pp_lex
                formI_sound_trilateral_pp_lex
              sound_root_derived_pp_lex
                &formI_sound_trilateral_pp_lex
              formI_derived_pp_lex
                &formI_sound_trilateral_pp_lex
            resembling_participle_lex
            locative_noun_lex
              trilateral_root_derived_ln_lex
                formI_sound_trilateral_ln_lex
              sound_root_derived_ln_lex
                &formI_sound_trilateral_ln_lex
              formI_derived_ln_lex
                &formI_sound_trilateral_ln_lex
                  formIA_sound_trilateral_ln_lex
                  formIB_sound_trilateral_ln_lex
            utilitarian_noun_lex
            comparative_lex
              trilateral_root_derived_com_lex
                formI_sound_trilateral_com_lex
              sound_root_derived_com_lex
                &formI_sound_trilateral_com_lex
              formI_derived_com_lex
                &formI_sound_trilateral_com_lex
          nounderived_noun_lex
        dual_noun_lex
        sg_noun_lex
        pl_noun_lex
      verb_lex
        triliteral_root_verb_lex
          formI_triliteral_root_verb_lex
            formIA_triliteral_root_verb_lex
            formIB_triliteral_root_verb_lex
            formIC_triliteral_root_verb_lex
            formID_triliteral_root_verb_lex
            formIE_triliteral_root_verb_lex
            formIF_triliteral_root_verb_lex
            formIG_triliteral_root_verb_lex
            formIH_triliteral_root_verb_lex
          formII_root_verb_lex
        quadriliteral_root_verb_lex
  morph root:list stem:list
  syn cat:cat val:list mrkg:mrkg
  cat case:case def:def mood:mood vform:vform voice:voice
    noun
    verb
  case
    nom
    acc
    gen
  person
    first
    second
    third
  number
    sg
    dual
    plural
  situation
    writing
    prostration
    helping
    honour
    suffice
    hearing
    drinking
  gender
    male
    female
  def
    yes
    no
  hum
    y
    n
  vform
    perf
    imperf
  voice
    active
    passive
  mood
    subjunctive
    indicative
    jussive
  mrkg
    none
    that
  lid
  select
  sem index:index sit_index:sit_index frames:list
  index pers:person num:number gen:gender hum:hum
  sit_index situation:situation
  frame
    event_fr sit:sit_index actor:index undgr:index location:index
    compare_fr compared:index comparedwith:index dimension:sit_index
    ref_fr ref_index:cat
.
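As a rough illustration of what the signature file encodes, the sketch below stores a few parent-to-subtype links and checks subsumption by walking up the hierarchy. It is only a minimal Python sketch, not part of TRALE: the `SUBTYPES` dictionary, the function names, and the assumption that `case`, `number` and `gender` hang directly under `bot` are all hypothetical choices made for the example.

```python
# Hypothetical sketch: a handful of parent -> subtype links in the spirit of
# the signature file. The data layout and names are illustrative assumptions.
SUBTYPES = {
    "bot": ["list", "case", "number", "gender"],
    "list": ["ne_list", "e_list"],
    "case": ["nom", "acc", "gen"],
    "number": ["sg", "dual", "plural"],
    "gender": ["male", "female"],
}

# Invert the table so that every type knows its immediate parent.
PARENT = {
    child: parent
    for parent, children in SUBTYPES.items()
    for child in children
}


def subsumes(general, specific):
    """Return True if `general` is `specific` itself or one of its ancestors."""
    node = specific
    while node is not None:
        if node == general:
            return True
        node = PARENT.get(node)  # None once we step above the root type
    return False


if __name__ == "__main__":
    print(subsumes("case", "acc"))   # True: acc is a subtype of case
    print(subsumes("case", "sg"))    # False: sg belongs to number
```

A single-parent table is enough here; the types marked with `&` in the signature have several parents and would need a list of parents per type instead.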
Content of theory.pl -
%theory.pl
% Multifile declarations.
%:- multifile '##'/2.
%:- multifile '~~>'/2.
% load phonology and tree output
:- [trale_home(tree_extensions)].
% maximum 4 rules will be used for licensing.
:- lex_rule_depth(4).
% specify signature file
signature(signature).
>>> phon.
phon <<< morph.
morph <<< arg_st.
arg_st <<< syn.
syn <<< sem.
index <<< frames.
sit_index <<< frames.
sit <<< actor.
actor <<< undgr.
undgr <<< location.
%lexical entry
kataba ~~> (formIA_triliteral_root_verb_lex,
    (morph:(root:[k,t,b]),
     arg_st:[(OBJ_SIGN,
              (syn:(cat:(case:acc,def:no)),sem:(index:OBJ_INDEX)))],
     syn:(cat:(verb, vform:perf, voice:active, mood:subjunctive),
          val:[OBJ_SIGN]),
     sem:(sit_index:(SIT_INDEX, (situation:writing)),
          frames:[(sit:SIT_INDEX,
                   actor:(SUB_INDEX, (pers:third,num:sg,gen:male,hum:y)),
                   undgr:OBJ_INDEX,
                   location:LOC_INDEX)]))
).
nasara ~~> (formIA_triliteral_root_verb_lex,
    (morph:(root:[n,s,r]),
     arg_st:[(OBJ_SIGN,
              (syn:(cat:(case:acc,def:no)),sem:(index:OBJ_INDEX)))],
     syn:(cat:(verb, vform:perf, voice:active, mood:subjunctive),
          val:[OBJ_SIGN]),
     sem:(sit_index:(SIT_INDEX, (situation:helping)),
          frames:[(sit:SIT_INDEX,
                   actor:(SUB_INDEX, (pers:third,num:sg,gen:male,hum:y)),
                   undgr:OBJ_INDEX,
                   location:LOC_INDEX)]))
).
sajada ~~> (formIB_triliteral_root_verb_lex,
    (morph:(root:[s,j,d]),
     arg_st:[(OBJ_SIGN,
              (syn:(cat:(case:acc,def:no)),sem:(index:OBJ_INDEX)))],
     syn:(cat:(verb, vform:perf, voice:active, mood:subjunctive),
          val:[OBJ_SIGN]),
     sem:(sit_index:(SIT_INDEX, (situation:prostration)),
          frames:[(sit:SIT_INDEX,
                   actor:(SUB_INDEX, (pers:third,num:sg,gen:male,hum:y)),
                   undgr:OBJ_INDEX,
                   location:LOC_INDEX)]))
).
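The tags OBJ_SIGN, OBJ_INDEX and SIT_INDEX in the lexical entries express structure sharing: the same object is reachable along several feature paths. A minimal Python sketch of that idea, assuming a plain nested-dictionary encoding (the function name `make_kataba` and the `"type"` key are hypothetical and not part of TRALE), could look like this:

```python
def make_kataba():
    """Build a dictionary version of the kataba entry with shared substructures."""
    # Each tagged value is created once and then referenced from several paths.
    obj_index = {}                                  # OBJ_INDEX, left unconstrained
    sub_index = {"pers": "third", "num": "sg", "gen": "male", "hum": "y"}
    sit_index = {"situation": "writing"}            # SIT_INDEX
    obj_sign = {                                    # OBJ_SIGN
        "syn": {"cat": {"case": "acc", "def": "no"}},
        "sem": {"index": obj_index},
    }
    return {
        "type": "formIA_triliteral_root_verb_lex",
        "morph": {"root": ["k", "t", "b"]},
        "arg_st": [obj_sign],
        "syn": {
            "cat": {"type": "verb", "vform": "perf",
                    "voice": "active", "mood": "subjunctive"},
            "val": [obj_sign],                      # same object as in arg_st
        },
        "sem": {
            "sit_index": sit_index,
            "frames": [{
                "sit": sit_index,                   # same object as sit_index
                "actor": sub_index,
                "undgr": obj_index,                 # same object as the object's index
                "location": {},
            }],
        },
    }


if __name__ == "__main__":
    kataba = make_kataba()
    # Information added along one path is visible along every shared path.
    kataba["arg_st"][0]["sem"]["index"]["num"] = "sg"
    print(kataba["sem"]["frames"][0]["undgr"])      # {'num': 'sg'}
```

Object identity stands in for token identity of feature structures here; a real unification engine would additionally check type compatibility before merging information.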
%lexical construction rules
trilateral-active-lex-cxt ## (
    formI_triliteral_root_verb_lex,
    (morph:(root:ROOTS),
     syn:(cat:(verb, vform:perf, voice:active, mood:subjunctive)),
     sem:(sit_index:SIT_INDEX,
          frames:[(sit:SIT_INDEX,actor:SUB_INDEX)]))
)**>(
    formI_sound_trilateral_ap_lex,
    (morph:(root:ROOTS),
     arg_st:[],
     syn:(cat:(noun, case:nom, def:no),
          val:[],
          mrkg:none),
     sem:(index:SUB_INDEX,
          frames:[(sit:SIT_INDEX,actor:SUB_INDEX)]))
)morphs
    (X,a,Y,a,Z,a) becomes (X,a,a,Y,i,Z,u,n),
    (X,a,Y,u,Z,a) becomes (X,a,a,Y,i,Z,u,n),
    (X,a,Y,i,Z,a) becomes (X,a,a,Y,i,Z,u,n).
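The `becomes` patterns of the active participle rule can be read as a small rewriting system over segment lists. The following Python sketch mimics that reading under simplifying assumptions: every uppercase symbol binds exactly one segment, and the first matching pattern wins. The helper names `match` and `apply_morphs` are hypothetical and do not correspond to any TRALE predicate.

```python
def match(pattern, segments):
    """Bind uppercase pattern variables to segments; literals must match exactly."""
    if len(pattern) != len(segments):
        return None
    env = {}
    for symbol, segment in zip(pattern, segments):
        if symbol.isupper():
            # A variable must bind consistently if it occurs more than once.
            if env.setdefault(symbol, segment) != segment:
                return None
        elif symbol != segment:
            return None
    return env


def apply_morphs(rules, segments):
    """Rewrite `segments` with the first rule whose left-hand side matches."""
    for lhs, rhs in rules:
        env = match(lhs, segments)
        if env is not None:
            # Variables are replaced by their bindings, literals are copied.
            return [env.get(symbol, symbol) for symbol in rhs]
    return None


# The three patterns of the active participle rule, transcribed as tuples.
ACTIVE_PARTICIPLE = [
    (("X", "a", "Y", "a", "Z", "a"), ("X", "a", "a", "Y", "i", "Z", "u", "n")),
    (("X", "a", "Y", "u", "Z", "a"), ("X", "a", "a", "Y", "i", "Z", "u", "n")),
    (("X", "a", "Y", "i", "Z", "a"), ("X", "a", "a", "Y", "i", "Z", "u", "n")),
]

if __name__ == "__main__":
    print("".join(apply_morphs(ACTIVE_PARTICIPLE, list("kataba"))))  # kaatibun
```

In this sketch the long vowel is written as a doubled segment (`a,a`), following the notation of the rule itself.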
trilateral-passive-lex-cxt ## (
    formI_triliteral_root_verb_lex,
    (morph:(root:ROOTS),
     arg_st:[(OBJ_SIGN,
              (syn:(cat:(case:acc,def:no)),sem:(index:OBJ_INDEX)))],
     syn:(cat:(verb, vform:perf, voice:active, mood:subjunctive),
          val:[OBJ_SIGN]),
     sem:(sit_index:SIT_INDEX,
          frames:[(sit:SIT_INDEX, undgr:OBJ_INDEX)]))
)**>(
    formI_sound_trilateral_pp_lex,
    (morph:(root:ROOTS),
     arg_st:[],
     syn:(cat:(noun, case:nom, def:no),
          val:[],
          mrkg:none),
     sem:(index:OBJ_INDEX,
          frames:[(sit:SIT_INDEX, undgr:OBJ_INDEX)]))
)morphs
    (X,a,Y,a,Z,a) becomes (m,a,X,Y,u,u,Z,u,n),
    (X,a,Y,u,Z,a) becomes (m,a,X,Y,u,u,Z,u,n),
    (X,a,Y,i,Z,a) becomes (m,a,X,Y,u,u,Z,u,n).
trilateral-locative-formIA-lex-cxt ## (
    formIA_triliteral_root_verb_lex,
    (morph:(root:ROOTS),
     syn:(cat:(verb, vform:perf, voice:active, mood:subjunctive)),
     sem:(sit_index:SIT_INDEX,
          frames:[(sit:SIT_INDEX,location:LOC_INDEX)]))
)**>(
    formIA_sound_trilateral_ln_lex,
    (morph:(root:ROOTS),
     arg_st:[],
     syn:(cat:(noun, case:nom, def:no),
          val:[],
          mrkg:none),
     sem:(index:LOC_INDEX,
          frames:[(sit:SIT_INDEX, location:LOC_INDEX)]))
)morphs
    (X,a,Y,a,Z,a) becomes (m,a,X,Y,a,Z,u,n).
trilateral-locative-formIB-lex-cxt ## (
    formIB_triliteral_root_verb_lex,
    (morph:(root:ROOTS),
     syn:(cat:(verb, vform:perf, voice:active, mood:subjunctive)),
     sem:(sit_index:SIT_INDEX,
          frames:[(sit:SIT_INDEX,location:LOC_INDEX)]))
)**>(
    formIB_sound_trilateral_ln_lex,
    (morph:(root:ROOTS),
     arg_st:[],
     syn:(cat:(noun, case:nom, def:no),
          val:[],
          mrkg:none),
     sem:(index:LOC_INDEX,
          frames:[(sit:SIT_INDEX, location:LOC_INDEX)]))
)morphs
    (X,a,Y,a,Z,a) becomes (m,a,X,Y,i,Z,u,n).
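The two locative rules differ only in the verb subtype they accept and in one template vowel, so the choice of output pattern is driven by the lexical type of the input. The Python sketch below illustrates that type-conditioned selection under stated assumptions: the `LEXICON` table copies the types of the three lexical entries above, stems are assumed to have the shape C-a-C-V-C-a, and the template strings and function names are hypothetical shorthand for the `becomes` patterns.

```python
# Lexical types as declared in the three lexical entries (kataba, nasara, sajada).
LEXICON = {
    "kataba": "formIA_triliteral_root_verb_lex",
    "nasara": "formIA_triliteral_root_verb_lex",
    "sajada": "formIB_triliteral_root_verb_lex",
}

# Output templates: X, Y, Z stand for the three root consonants.
LOCATIVE_TEMPLATE = {
    "formIA_triliteral_root_verb_lex": "maXYaZun",
    "formIB_triliteral_root_verb_lex": "maXYiZun",
}
PASSIVE_TEMPLATE = "maXYuuZun"  # shared by all Form I subtypes in the rule


def radicals(stem):
    """Extract the root consonants, assuming a six-segment C-a-C-V-C-a stem."""
    x, _, y, _, z, _ = stem
    return {"X": x, "Y": y, "Z": z}


def fill(template, stem):
    """Substitute the root consonants of `stem` into `template`."""
    env = radicals(stem)
    return "".join(env.get(symbol, symbol) for symbol in template)


def locative(stem):
    """Pick the locative template according to the verb's lexical type."""
    return fill(LOCATIVE_TEMPLATE[LEXICON[stem]], stem)


def passive_participle(stem):
    return fill(PASSIVE_TEMPLATE, stem)


if __name__ == "__main__":
    print(locative("kataba"))            # maktabun
    print(locative("sajada"))            # masjidun
    print(passive_participle("nasara"))  # mansuurun
```

Keying the template on the lexical type mirrors how each rule restricts its input to formIA or formIB; a fuller version would walk the type hierarchy instead of using an exact-match table.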
trilateral-comparative-formI-lex-cxt ## (
    formI_triliteral_root_verb_lex,
    (morph:(root:ROOTS),
     syn:(cat:(verb, vform:perf, voice:active, mood:subjunctive)),
     sem:(sit_index:SIT_INDEX,
          frames:[(sit:SIT_INDEX)]))
)**>(
    formI_sound_trilateral_com_lex,
    (morph:(root:ROOTS),
     arg_st:[(OBJ_SIGN,
              (syn:(cat:(case:gen,def:no)),sem:(index:OBJ_INDEX)))],
     syn:(cat:(noun, case:nom, def:no),
          val:[],
          mrkg:none),
     sem:(index:SUB_INDEX,
          frames:[(sit:SIT_INDEX,
                   actor:(SUB_INDEX2,(pers:PERS, num:sg, gen:male, hum:HUM))),
                  (dimension:SIT_INDEX, compared:SUB_INDEX2, comparedwith:OBJ_INDEX)]))
)morphs
    (X,a,Y,a,Z,a) becomes (a,X,Y,a,Z,u),
    (X,a,Y,u,Z,a) becomes (a,X,Y,a,Z,u),
    (X,a,Y,i,Z,a) becomes (a,X,Y,a,Z,u).