Adpositional Grammars

Adpositional Grammars: A multilingual grammar formalism for NLP. Federico Gobbo, [email protected], Università dell’Insubria, Varese. CC BY-NC-SA. Varese, 9 January 2009. Sections: Introduction, Theory, Application, Implementation, Machine translation, Conclusions.

Transcript of Adpositional Grammars

Page 1: Adpositional Grammars

Adpositional Grammars
A multilingual grammar formalism for NLP

Federico Gobbo
[email protected]

Università dell’Insubria, Varese
CC BY-NC-SA

Varese, 9 January 2009

Page 2: Adpositional Grammars

Two key questions posed by Leibniz

1. How are the laws of human thought made?

2. How can linguistic knowledge be formalised?

Page 3: Adpositional Grammars

Two current approaches for question 2

Since Chomsky 1956: generative-transformational grammars.

Pro: well formalised, good algorithms.

Con: constituents are not linguistically adherent.

Since Tesnière 1959: dependency-based grammars.

Pro: linguistically adherent and rich.

Con: difficult to render computationally.

Page 4: Adpositional Grammars

Pennacchietti’s research

Since 1974, studies on prepositions.

starting from the Tesnièrian school;

unsatisfied by the vague notion of ‘dependency’;

looking for a clear notion;

found in Langacker’s trajector/landmark (tr/lm) dichotomy.

The tr/lm dichotomy is derived from Gestalt psychology.

Page 5: Adpositional Grammars

Pennacchietti’s main results

Prepositions build the structure of natural languages (NL):

expressing the relation between a trajector (yellow)...

...and an active landmark (red);

...or a passive landmark (blue).

A pseudo-formal system is built:

2 directions (tr ↔ lm);

4 configurations in a Cartesian space;

each NL is depicted by a prepositional space.

Page 6: Adpositional Grammars

The prepositional space of English (specimen)

[Figure: the prepositional space of English, a grid with the axes non-dimensional / dimensional and applicative / retroapplicative.]

": to, ...

#: between, ...

$: with, ...

%: from, ...

Page 7: Adpositional Grammars

Trajector/landmark at a glance 1/4

&(A book) is between "(two men).

&(Un libro) è tra "(due uomini).

Page 8: Adpositional Grammars

Trajector/landmark at a glance 2/4

&(Two men) hold ! "(a book).

&(Due uomini) tengono ! "(un libro).

Page 9: Adpositional Grammars

Trajector/landmark at a glance 3/4

&(Two men) are with $(a book).

&(Due uomini) sono con $(un libro).

Page 10: Adpositional Grammars

Trajector/landmark at a glance 4/4

&(A book) is with '(two men).

&(Un libro) è con '(due uomini).

Page 11: Adpositional Grammars

This structure is composable and recursive

&(Mr. C) receives ! "(the book) from %(Mr. A).

&(Il Sig. C) riceve ! "(un libro) da- %(-l Sig. A).

Page 12: Adpositional Grammars

Foundation of adtrees

Adpositional trees (adtrees) represent that structure. Caveats:

adtrees are always Porphyrian, i.e., binary;

prepositions are generalised to adpositions;

each tree has a hook, i.e., the place where the adposition stands;

the basic unit of NLs is the morpheme, not the word;

adpositional morphemes form morphosyntax;

non-adpositional morphemes are lexemes.

Adtrees represent morphosyntax and some pragmatic phenomena.
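
The caveats above (binary trees, one hook per tree, lexemes as leaves) can be sketched as a small recursive data structure. This is only an illustration under stated assumptions: the names `Leaf`, `Adtree`, `lexemes` and `depth` are hypothetical, `None` is used here to stand for an empty adposition, and verbs are left out of the example trees for brevity. It is not the author's implementation.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Union


@dataclass
class Leaf:
    """A lexeme, i.e. a non-adpositional morpheme."""
    lexeme: str


@dataclass
class Adtree:
    """A binary adtree: the hook carries the adposition, the two
    branches carry the trajector (tr) and the landmark (lm)."""
    adposition: Optional[str]  # None = empty adposition (assumption of this sketch)
    trajector: Union["Adtree", Leaf]
    landmark: Union["Adtree", Leaf]

    def lexemes(self) -> Iterator[str]:
        """Yield the lexemes, trajector branch first."""
        for branch in (self.trajector, self.landmark):
            if isinstance(branch, Adtree):
                yield from branch.lexemes()
            else:
                yield branch.lexeme

    def depth(self) -> int:
        """Number of nested hooks on the longest path."""
        nested = [b.depth() for b in (self.trajector, self.landmark)
                  if isinstance(b, Adtree)]
        return 1 + max(nested, default=0)


# "A book is between two men": one hook, two leaves.
simple = Adtree("between", Leaf("a book"), Leaf("two men"))

# Adtrees compose recursively: a whole adtree can fill a branch.
nested = Adtree("from",
                Adtree(None, Leaf("Mr. C"), Leaf("the book")),
                Leaf("Mr. A"))
```

Keeping the adposition on the node rather than among the leaves mirrors the idea that the hook, not a word position, is where the relation is expressed.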

Page 13: Adpositional Grammars

The abstract minimal adtree types

[Figure 2.8: The abstract minimal adtree types. Four binary trees, each with the adposition (adp) at the hook; the leaves are, in turn, lm tr, tr lm, lm tr and tr lm.]

[Figure 2.9: The minimal anonymous adtree. One binary tree with adp at the hook and the leaves tr|lm and lm|tr.]

[Figure 2.10: The relation between adtrees and Ceccato’s translation system. Nested adtrees with the hooks e, in and da over the lexemes carne, ieri, frigorifero and pesce.]

The hook is derived from Ceccato’s work in machine translation.

Page 14: Adpositional Grammars

Semantics

Lexemes form semantics;

there are four grammar characters;

each lexeme has a fundamental grammar character;

lexemes can be transferred to another grammar character;

each lexeme has a valence value.

Adpositional grammar = adtrees + dictionary.

Page 16: Adpositional Grammars

The 4 grammar characters

From Whorf and Tesnière:

stativation (O);

adjunctivation, i.e., what modifies stativation (A);

verbification (I);

circumstantiation, i.e., what modifies verbification (E).

This structure of semantics is cross-lingually valid (Whorf).

Page 17: Adpositional Grammars

How the dictionary is built (example)

transference  English    Italian      French         German
A             long       lung-o       long           lang
A>O           length     lungh-ezz-a  longu-eur      Länge
A>E           long       lung-amente  longu-ement    ent-lang
A>I           length-en  al-lung-are  (r)al-long-er  ver-läng-ern
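
The table above can be read as a lookup keyed by transference and language. The sketch below shows one possible encoding, restricted to the English and Italian columns; the names `DICTIONARY`, `lookup` and `morphemes` are hypothetical, and splitting on hyphens is an assumption about how the slide marks morpheme boundaries.

```python
from typing import Dict, List

# Keys are transferences between grammar characters:
# "A" is the fundamental character, "A>O" reads "A transferred to O".
DICTIONARY: Dict[str, Dict[str, str]] = {
    "A":   {"English": "long",      "Italian": "lung-o"},
    "A>O": {"English": "length",    "Italian": "lungh-ezz-a"},
    "A>E": {"English": "long",      "Italian": "lung-amente"},
    "A>I": {"English": "length-en", "Italian": "al-lung-are"},
}


def lookup(transference: str, language: str) -> str:
    """Return the form recorded for a transference in a language."""
    return DICTIONARY[transference][language]


def morphemes(form: str) -> List[str]:
    """Split a form into morphemes at the hyphens used on the slide."""
    return [m for m in form.split("-") if m]
```

For example, `morphemes(lookup("A>I", "Italian"))` splits al-lung-are into its three morphemes, which is the unit the formalism works with.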

Page 18: Adpositional Grammars

Valence and actants

Valence is derived from Tesnière. Some examples:

nevica, Italian for ‘it snows’ (valence = 0);

Carl smiled (valence = 1);

Liza reads a book (valence = 2);

Liza gives a kiss to Paul (valence = 3).

Actants express pragmatics:

Carl and Liza are Agents (Na);

Paul is an Experiencer (Ne);

a book is a Patient (Np).
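
A minimal sketch of how valence could be checked against the actants of a sentence follows. The names `VALENCE` and `is_saturated` are hypothetical, and labelling "a kiss" as a Patient (Np) is an assumption of this sketch; the slide only labels Carl, Liza, Paul and a book.

```python
# Valence values taken from the examples on the slide.
VALENCE = {"nevica": 0, "smiled": 1, "reads": 2, "gives": 3}


def is_saturated(verb, actants):
    """True when the number of actants equals the valence of the verb."""
    return len(actants) == VALENCE[verb]


EXAMPLES = [
    ("nevica", {}),
    ("smiled", {"Na": "Carl"}),
    ("reads", {"Na": "Liza", "Np": "a book"}),
    # "a kiss" as Np is an assumption, not stated in the source.
    ("gives", {"Na": "Liza", "Np": "a kiss", "Ne": "Paul"}),
]
```

Every pair in `EXAMPLES` is saturated, while `is_saturated("reads", {"Na": "Liza"})` is false because the second valence is left unfilled.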

Page 19: Adpositional Grammars

A man is with a book

• (49-en.) A man is !(with) a book.

• (49-it.) Un uomo è !(con) un libro.

• (49-tu.) O kitap!-lı adam-dır.

Again, in example 49 the English and Italian adtrees are very similar (see Figure 2.55). Adpositions with : con are false, as the bivalent verb is to be with : essere con, a situation already explained in section 2.9.1. In contrast, the Turkish adtree is far simpler (see Figure 2.56). Example 49 is built morphologically as follows:

O     kitap-  -lı   adam-  -dır
That  book    his   man    is

[Figure 2.55: The English and Italian adtrees of A man is with a book (49). English leaves: A man (Na), with, a book (Np), is. Italian leaves: un uom-o (Na), con, un libr-o (Np), è.]

Interestingly, Turkish uses the determiner O as the syntactic subject S, instead of the Agent adam-, which is part of the second valence, unlike English and Italian.

2.11.4 The relation of possession

Let us see how the convention for PhAdS acts in NLs like Italian and English, while in Turkish it is not so heavily needed. In conceptual space 50, the

Page 20: Adpositional Grammars

Why Esperanto?

Structural facts:

linguistic phenomena of English, German, Russian and French all in one;

highly regular, like the quasi-natural languages (QNLs) produced by children;

the morphology is considerably smaller than that of NLs;

grammar characters are always expressed, even redundantly.

Basic sociolinguistic facts:

Launched in 1887, it survived two world wars.

Stable speech community, approx. 50,000 active speakers worldwide.

Free web corpora available (e.g., Le Monde Diplomatique).

Page 21: Adpositional Grammars

Phraseology is flexible

English, German and Italian:

Alice had a shower.

Alice duscht.

Alice fece una doccia.

The same sentence in Esperanto:

Alico havis duŝon. (English-like)

Alico faris duŝon. (Italian-like)

Alico duŝis. (German-like)

Page 22: Adpositional Grammars

What is a QNL?

Esperanto is as regular in its grammar as child language, although it is planned.

Quasi-Natural English vs. English:

Two child-s run-ed towards the mouse-s.

Two children run towards the mice.

Quasi-Natural Italian vs. Italian:

I due uov-i and-ano cuoci-uti più bene.

Le due uova vanno cotte meglio.

Page 23: Adpositional Grammars

Application at a sentence level

%-junctives (example):

&(Alfredo povas pagi), ĉar li estas rica.

&(Al can pay), because he is rich.

&(Alfredo può pagare), poiché è ricco.

$-junctives (example):

Alfredo povas pagi, do &(li estas rica).

Al is rich, therefore &(he can pay).

Alfredo è ricco, dunque &(può pagare).

Page 24: Adpositional Grammars

The architecture

strings → words → tokens;

a token is a set of tags;

the parser builds the adtrees;

adtrees are the data structures.
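
The first two stages of this architecture can be sketched as follows. This is a simplified illustration, not the system described in the talk: it produces one token per word, the tag names (`"acc"`, `"pl"`, `"past"`, ...) and the tiny closed-class list are hypothetical, and only the regular Esperanto finals (-o, -a, -e, -i, -as, -is, -os, plural -j, accusative -n) are covered.

```python
# Hypothetical closed-class entries; a real lexicon would be much larger.
CLOSED_CLASS = {
    "kun": {"adp"},
    "de": {"adp"},
    "al": {"adp"},
    "li": {"O", "pron"},
}
VERB_FINALS = {"as": "present", "is": "past", "os": "future"}
# Finals mapped to the four grammar characters O, A, E, I.
CHARACTER_FINALS = {"o": "O", "a": "A", "e": "E", "i": "I"}


def split_words(string):
    """strings -> words: lower-case and strip punctuation."""
    stripped = (w.strip(".,;:!?").lower() for w in string.split())
    return [w for w in stripped if w]


def tokenize(word):
    """words -> tokens: a token is a (frozen) set of tags."""
    if word in CLOSED_CLASS:
        return frozenset(CLOSED_CLASS[word])
    for final, tense in VERB_FINALS.items():
        if word.endswith(final):
            return frozenset({"I", tense})
    tags = set()
    if word.endswith("n"):      # accusative marker
        tags.add("acc")
        word = word[:-1]
    if word.endswith("j"):      # plural marker
        tags.add("pl")
        word = word[:-1]
    if word and word[-1] in CHARACTER_FINALS:
        tags.add(CHARACTER_FINALS[word[-1]])
    return frozenset(tags)


def analyse(string):
    """Run both stages and return the list of tokens."""
    return [tokenize(w) for w in split_words(string)]
```

Representing a token as a set makes the order of the tags irrelevant, so a parser can test for a tag with a simple membership check.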

Page 25: Adpositional Grammars

How it is implemented

Von Neumann’s virtual machine with enough memory;

C-like pseudo-language for the lexical analyser;

non-determinism simulated through backtracking;

parsing algorithm as a set of derivation rules;

complex logic syntax;

suitable for a logical framework (e.g., Isabelle).

Page 26: Adpositional Grammars

The description of parsing (specimen)

S.  Prefix*  Spec    {-o-|-a-|-e-|-i-|∅}  Atom    Suffix*  final
E.           demand  -o-                  sign             -as
    kun-                                  vojaĝ            -u
                                          frenez  -ig-     -os
             daŭr    -i-                  pov              -is
P.  I5*      I4      I3                   I2      I1*      I0
M.  *        +       +                    ,       *        ,

The parsing of the verbal group in Esperanto.
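
The structure row of this specimen, together with the remark that non-determinism is simulated through backtracking, can be illustrated with a small segmenter. It is a sketch under assumptions: the morpheme inventories are tiny and hypothetical (just enough for the four examples), the function and key names are invented for illustration, and plain generator-based backtracking stands in for the derivation rules of the actual parser.

```python
# Tiny, hypothetical morpheme inventory covering the slide's examples.
PREFIXES = {"kun"}
ROOTS = {"demand", "sign", "vojaĝ", "frenez", "daŭr", "pov"}
LINKS = {"o", "a", "e", "i"}
SUFFIXES = {"ig"}
FINALS = {"as", "is", "os", "u", "i"}


def parse_verbal_group(word):
    """Yield every segmentation of `word` matching
    Prefix* [Spec [link]] Atom Suffix* final, found by backtracking."""

    def take(options, pos):
        # Try each morpheme of the class that matches at `pos`.
        for item in sorted(options):
            if word.startswith(item, pos):
                yield item, pos + len(item)

    def star(options, pos, acc):
        # Zero or more morphemes of the class, shortest choice first.
        yield acc, pos
        for item, nxt in take(options, pos):
            yield from star(options, nxt, acc + [item])

    def specifier(pos):
        # No specifier, a bare specifier, or a specifier plus linking vowel.
        yield None, None, pos
        for spec, after_spec in take(ROOTS, pos):
            yield spec, None, after_spec
            for link, after_link in take(LINKS, after_spec):
                yield spec, link, after_link

    for prefixes, p1 in star(PREFIXES, 0, []):
        for spec, link, p2 in specifier(p1):
            for atom, p3 in take(ROOTS, p2):
                for suffixes, p4 in star(SUFFIXES, p3, []):
                    for final, p5 in take(FINALS, p4):
                        if p5 == len(word):  # the whole word is consumed
                            yield {"prefixes": prefixes, "spec": spec,
                                   "link": link, "atom": atom,
                                   "suffixes": suffixes, "final": final}
```

Each failed branch simply ends its inner loop, so the enclosing generator moves on to the next alternative; this is the backtracking. With the inventory above, `frenezigos` first tries the final `-i` directly after the root, fails to consume the word, and then succeeds with the suffix `-ig-` followed by `-os`.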

Page 27: Adpositional Grammars

The predicate of the verbal group (specimen)

recognising an I-group is as follows.

[Formula (7.7): the predicate I-group(u, v, n, d, c), stated in terms of Rep(d), I-comp-group, Rep(est-), Adj-group, e-group and nom-stative-group.]

Except for extracting the valence from the lexeme acting as the root of the verb, the I-comp-group is a !-group.

[Formula (7.8): the predicate I-comp-group(u, v, n), stated in terms of !-atom-group and !-spec-group.]

7.4.3 O-predicates

A stative group is a noun with a precise function within a phrase. Stative groups can be divided into valence arguments and extra arguments. The valence stative groups are:

1. the first valence argument (subject S);

2. the second valence argument (object O);

3. the third valence argument (dative D).

In Esperanto, extra arguments can be stative groups, where the hook can be a preposition. As this case is deeply relevant in order to write predicates and rules, I call these particular stative groups prepositional clauses: analogously to correlative clauses, prepositional clauses are pseudophrases acting as adjectives or circumstantials (see section 5.3.9 for C-correlatives).

The most fundamental stative group is an O-group, so I call this set of predicates ‘O-predicates’. A stative group is either a simple stative group (O-S-group), or a composition of stative groups. The p parameter represents the main adposition. More precisely, it is the one that determines the adtype: in Esperanto, this is mostly the preposition. From a formal point of view, the main adposition is the one that is used to attach the adtree corresponding to the stative group to the governing adtree. Visually, the main adposition is never ‘pushed down’ by a left triangle (◁) or a right one

Page 28: Adpositional Grammars

The rule for the verb esti

tural rule takes care of concluding the derivation (section 6.3).

[Rule (7.68): a suffix rule over a suffix-range(u, v), building an adtree with the leaves s and T.]

7.5.3 The I-group

The I-group describes a verb. Thus, this set of rules parses an I-range, corresponding to a sequence of tokens satisfying the I-group predicate.

The general I-range gets parsed by the following rule. The extraction of the verbal final is left to the rules that build the adtree for a phrase.

[Rule (7.69), named I1: parses I-range(u, v).]

As said before, the verb esti is an exception (section 5.5.5), so the following rules apply when it is used in conjunction with an adjective, a circumstantial or a nominal stative, respectively.

[Rule (7.70), named I2: parses I-range(u, v) given an Adj-group, building an adtree with the leaves A and est-.]

Page 29: Adpositional Grammars

The machine translation architecture

rule-based, transfer system;

from Esperanto to English/Chinese.

Two steps are performed:

1. the first step: adtree transformation (metataxis);

2. the second step: substitution (i.e., lexeme-by-lexeme).
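
The two steps can be sketched on a toy tree. Everything specific here is an assumption made for illustration: adtrees are encoded as `(adposition, left, right)` tuples, the single metataxis rule (Esperanto `de` rewritten as English `'s` with swapped branches) and the one-entry lexicon are hypothetical, and the function names are not taken from the system described in the talk.

```python
def metataxis(tree, rules):
    """Step 1: transform the adtree structure, bottom-up.
    `rules` maps a source adposition to (target adposition, swap branches?)."""
    if isinstance(tree, str):
        return tree
    adp, left, right = tree
    left, right = metataxis(left, rules), metataxis(right, rules)
    if adp in rules:
        adp, swap = rules[adp]
        if swap:
            left, right = right, left
    return (adp, left, right)


def substitute(tree, lexicon):
    """Step 2: replace lexemes one by one, leaving the structure untouched."""
    if isinstance(tree, str):
        return lexicon.get(tree, tree)
    adp, left, right = tree
    return (adp, substitute(left, lexicon), substitute(right, lexicon))


def linearise(tree):
    """Read the words off the tree: left branch, adposition, right branch."""
    if isinstance(tree, str):
        return [tree]
    adp, left, right = tree
    return linearise(left) + ([adp] if adp else []) + linearise(right)


# Hypothetical rule and lexicon for the Esperanto phrase "libro de Karlo".
RULES = {"de": ("'s", True)}
LEXICON = {"libro": "book"}

source = ("de", "libro", "Karlo")
target = substitute(metataxis(source, RULES), LEXICON)
```

Keeping the structural step separate from the lexical one means the lexicon never has to know about word order, which is the point of a transfer architecture.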

Page 30: Adpositional Grammars

English

[Figure 2.47: The English and Italian adtrees of The library where I... (46). English leaves and adpositions: The, where, often, I, literature, study, library, far away, is.]

Page 31: Adpositional Grammars

Italian

[Figure 2.47: The English and Italian adtrees of The library where I... (46). Italian leaves and adpositions: La, dove, spesso, letteratur-, studi-, bibliotec-, molto lontan-, è.]

Page 32: Adpositional Grammars

From Esperanto to “Chinese-anto”

Page 33: Adpositional Grammars

“Chinese-anto”

Page 34: Adpositional Grammars

Chinese

[Figure 2.48: The Chinese and Japanese adtrees of wǒ jīng... (46). Chinese leaves and adpositions: de, wǒ, jīng cháng, wén xué, xué xí, túshū guǎn, zhèr, lí, hěn yuǎn. Japanese leaves and adpositions: suru, ga, watashi, wo, bun gaku, benkyoo, wa, desu, toshokan, tooi.]

Table 2.1: Pragmatic analysis of the cross-lingual example

Actants       English  Italian  Turkish
Agent (Na)    man      uom-     adam-
Patient (Np)  book     libr-    kitab-
Action (V)    read-    legg-    oku-, ...

Page 35: Adpositional Grammars

Main results of this dissertation

Adgrams are:

cognitively grounded (on the dichotomy trajector/landmark);

cross-linguistically valid (English, Italian, Chinese...);

not domain-specific (full Esperanto implementation: 179 logic formulas, 56 predicates);

formally robust (described in a complex logic syntax);

computationally well founded (suitable for up-to-date logical frameworks).

Adtree macros implemented in LaTeX will be released on CTAN.

Page 36: Adpositional Grammars

Pennacchietti’s essential references

Pennacchietti, Fabrizio A. 2008. The Prepositional System of Classic Syriac and that of Sureth. J/Simtha. forthcoming.

Pennacchietti, Fabrizio A. 2006. Come classificare le preposizioni? Una nuova proposta. Quaderni del Laboratorio di Linguistica. 6. Normale: Pisa.

Pennacchietti, Fabrizio A. 2006. Propono klasifiki la prepoziciojn de Esperanto. IKU 2006. UEA: Firenze.

Pennacchietti, Fabrizio A. 1976. La prepozicia sistemo de Esperanto. Esperantologiaj Kajeroj I. ELTE: Budapest.

Pennacchietti, Fabrizio A. 1974. Appunti per una storia comparata dei sistemi preposizionali semitici. Annali. Istituto Orientale: Napoli.

Page 37: Adpositional Grammars

Gobbo’s essential references

Gobbo, Federico. 2008. Pianificare il lessico scientifico internazionale. Giuseppe Peano and his School. Università degli Studi: Torino.

Gobbo, Federico. 2006. L’esperanto e la traduzione automatica, in Borbone et al. Loquentes linguis. Harrassowitz: Wiesbaden.

Gobbo, Federico. 2005. The digital way to spread conlangs. ICIL 2005. Universitat Jaume I: Castellón (Spain).

Gobbo, Federico. 2005. The European Union’s Need for an International Auxiliary Language. Journal of Universal Language. 6.

Gobbo, Federico. 1998. Il dilemma dell’esperanto. Master’s thesis. Università degli Studi: Torino.