Husse-9 Conference Pécs, 22-24 January, 2009 HunGram vs. EngGram in ParGram On the Comparison of...

Husse-9 Conference, Pécs, 22-24 January, 2009

HunGram vs. EngGram in ParGram

On the Comparison of Hungarian and English in an International Computational Linguistic Project

Tibor Laczkó, György Rákosi & Ágoston Tóth

Department of English Linguistics, University of Debrecen

{laczkot, rakosigy, tagoston}@delfin.unideb.hu

Sponsored by OTKA research grant K 72983

Overview

1. Lexical-Functional Grammar (LFG)
2. The ParGram Project at PARC
3. The HunGram Project in Debrecen
4. A short demonstration: possible ParGram treatments of certain elliptical noun phrases in English and Hungarian

1/1 Stanford and LFG

LFG as a linguistic theory was developed in the late 1970s.

One of the principal aims was to create a framework suitable for large-scale computational applications, and there has been lively co-operation between theory and computational linguistic practice ever since.

The two co-founders:

- Joan Bresnan (Stanford University, SU): mainly linguistic aspects
- Ronald Kaplan (Palo Alto Research Center, PARC, and SU; now at Powerset, Inc.): mainly computational aspects

General information on LFG is available at:

http://www.essex.ac.uk/linguistics/LFG/

1/2 Design Principles of LFG

- Lexicalism
- Modularism
- Parallel architecture
- Generating and parsing structures are equally important
- A rule system that is directly renderable in a mathematical formalism

1/3 Central Modules of LFG

- phonology
- constituent structure (c-structure): word order (language-specific)
- lexicon (powerful)
- functional structure (f-structure): grammatical relations (universal)
- semantics

1/4 Adpositional phrases in LFG

C-structures:

English: [PP [Pr near/in] [NP [Det the] [N box]]]
Hungarian (postposition): [PP [NP [Det a] [N doboz]] [Po mellett]]
Hungarian (case suffix): [NP [Det a] [N doboz-ban]]

Shared f-structure:

PRED  ‘NEAR/IN <(OBJ)>’   (from near/in/mellett/-ban)
OBJ   [ PRED ‘BOX’   (from box/doboz, N)
        DEF  +
        PERS 3
        NUM  sg ]

Lexical entries:

near/in, Pr ‘NEAR/IN <(OBJ)>’
mellett, Po ‘NEAR <(OBJ)>’
-ban, Nsuff ‘IN <(OBJ)>’
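The point of this slide, that quite different c-structures can map onto one and the same f-structure, can be illustrated with a small Python sketch. It is a toy under stated assumptions, not XLE: f-structures are plain dicts, and each word's category alone decides whether its features go to the phrase's own f-structure (^=!) or to its OBJ ((^ OBJ)=!). The lexicon, the helper names `add_features` and `f_structure`, and the PRED strings written without spaces are hypothetical.

```python
# Toy model of the c-structure to f-structure mapping for adpositional phrases.
# Assumption: f-structures are plain dicts; this is not XLE's representation.

# Hypothetical mini-lexicon mirroring the entries on the slide.
LEXICON = {
    "near": ("Pr", {"PRED": "NEAR<(OBJ)>"}),
    "in": ("Pr", {"PRED": "IN<(OBJ)>"}),
    "mellett": ("Po", {"PRED": "NEAR<(OBJ)>"}),
    "-ban": ("Nsuff", {"PRED": "IN<(OBJ)>"}),
    "the": ("Det", {"DEF": "+"}),
    "a": ("Det", {"DEF": "+"}),
    "box": ("N", {"PRED": "BOX", "PERS": 3, "NUM": "sg"}),
    "doboz": ("N", {"PRED": "BOX", "PERS": 3, "NUM": "sg"}),
}

# Categories that head the adpositional relation (annotated ^=! in the toy rules);
# everything else is assumed to be inside the NP annotated (^ OBJ)=!.
HEAD_CATEGORIES = {"Pr", "Po", "Nsuff"}


def add_features(target, features):
    """Add features to an f-structure, rejecting conflicting values (uniqueness)."""
    for attr, value in features.items():
        if attr in target and target[attr] != value:
            raise ValueError(f"feature clash on {attr}: {target[attr]} vs {value}")
        target[attr] = value


def f_structure(words):
    """Build the f-structure of a PP (or case-marked NP) from its words.

    Word order is deliberately ignored: only the category annotations matter,
    which is why prepositional and postpositional orders converge.
    """
    head = {}
    obj = {}
    for word in words:
        category, features = LEXICON[word]
        add_features(head if category in HEAD_CATEGORIES else obj, features)
    head["OBJ"] = obj
    return head


if __name__ == "__main__":
    english = f_structure(["near", "the", "box"])
    hungarian = f_structure(["a", "doboz", "mellett"])
    print(english == hungarian)  # True: different c-structures, same f-structure
```

The same holds for the case-suffix variant: `f_structure(["a", "doboz", "-ban"])` equals `f_structure(["in", "the", "box"])`.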

2/1 ParGram at PARC

The Parallel Grammar (ParGram) project: launched and organized by PARC

- An LFG-based computational program
- Capitalizes on LFG's flexible, general linguistic and computationally implementable architecture
- Parser and generator
- Goal: to analyze more and more languages on a maximally uniform platform, in the spirit of Universal Grammar

2/2 ParGram at PARC

A truly international project:

English, German, French, Norwegian, Japanese, Chinese, Urdu (India), Malagasy (Madagascar), Arabic, Vietnamese, Spanish, Welsh, Indonesian, Turkish, Georgian & Hungarian

Further information: http://www2.parc.com/isl/groups/nltt/default.html

2/3 XLE parser

- a deep, grammar-based parsing system for implementing lexical-functional grammars; constructed as part of the ParGram project
- output: c-structures and f-structures
- supports tokenization and morphological analysis through finite-state transducers (with alternative analyses)
- can select the most probable analysis from the potentially large candidate set using stochastic disambiguation (if implemented)
- has a generator mode
- implemented in C; runs on Solaris, Linux and Mac OS X
- bottom line: a facility for writing syntactic rules and lexical entries, and for testing and editing them
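The stochastic disambiguation step can be pictured with a toy ranking function. This is a minimal sketch, assuming a log-linear (maximum-entropy) scoring model, a model family commonly used for this kind of parse selection; the property names, weights and bracketed analyses are invented for illustration and do not come from XLE or from any ParGram grammar.

```python
import math

# Hypothetical property weights, invented for this illustration.
WEIGHTS = {
    "pp_attached_to_verb": 0.4,
    "instrumental_with": 0.3,
    "pp_attached_to_noun": 0.5,
}

# Two candidate analyses of "I saw the girl with the telescope.",
# each paired with the (hypothetical) properties it exhibits.
CANDIDATES = [
    ("[VP saw [NP the girl] [PP with the telescope]]",
     {"pp_attached_to_verb": 1, "instrumental_with": 1}),
    ("[VP saw [NP the girl [PP with the telescope]]]",
     {"pp_attached_to_noun": 1}),
]


def rank(candidates, weights):
    """Return (analysis, probability) pairs, most probable first.

    Score = sum of weight * property count (log-linear model);
    probabilities are a softmax over the candidate set.
    """
    scores = [
        sum(weights.get(prop, 0.0) * count for prop, count in props.items())
        for _, props in candidates
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    ranked = [(analysis, e / total) for (analysis, _), e in zip(candidates, exps)]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for analysis, prob in rank(CANDIDATES, WEIGHTS):
        print(f"{prob:.3f}  {analysis}")
```

With these made-up weights the verb-attachment reading scores 0.7 against 0.5 and is ranked first; in a real system the weights would be estimated from annotated data.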

toy-eng.lfg (lexical entries)

    the   D * (^ DEF)=+.
    girl  N * (^ PRED) = 'GIRL'.
    walk  V * (^ PRED)='WALK<(^ SUBJ)>';
          N * (^ PRED)='WALK'.

c-structure: a context-free phrase-structure tree encoding constituency and linear order

f-structure: attribute-value matrices that encode predicate-argument relations and other grammatical information (e.g. number, tense, case)
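To make the division of labour between the two structures concrete, the following Python sketch (not XLE code) renders the toy lexicon as data and builds f-structures by unification. The rule annotations (NP as (^ SUBJ)=!, D, N and V as heads) and the helper names `unify` and `parse` are assumptions for illustration, and completeness and coherence checks are left out. Because the toy lexicon has no tense or agreement features, the string it can parse is "the girl walk".

```python
# Hypothetical Python rendering of the toy-eng.lfg entries: each word maps to
# a list of (category, f-structure contribution); "walk" is ambiguous (V or N).
LEXICON = {
    "the": [("D", {"DEF": "+"})],
    "girl": [("N", {"PRED": "GIRL"})],
    "walk": [("V", {"PRED": "WALK<(^ SUBJ)>"}), ("N", {"PRED": "WALK"})],
}


def unify(f, g):
    """Unify two f-structures (nested dicts); return None on a feature clash."""
    result = dict(f)
    for attr, value in g.items():
        if attr not in result:
            result[attr] = value
        elif isinstance(result[attr], dict) and isinstance(value, dict):
            sub = unify(result[attr], value)
            if sub is None:
                return None
            result[attr] = sub
        elif result[attr] != value:
            return None
    return result


def parse(words):
    """Parse D N V with toy rules S -> NP VP, NP -> D N, VP -> V.

    Assumed annotations: NP is (^ SUBJ)=!, while D, N and V are heads (^=!).
    Returns the f-structures of all successful analyses.
    """
    if len(words) != 3:
        return []
    det, noun, verb = words
    analyses = []
    for d_cat, d_f in LEXICON.get(det, []):
        for n_cat, n_f in LEXICON.get(noun, []):
            for v_cat, v_f in LEXICON.get(verb, []):
                # The c-structure rules license only the sequence D N V.
                if (d_cat, n_cat, v_cat) != ("D", "N", "V"):
                    continue
                subj = unify(d_f, n_f)
                if subj is None:
                    continue
                sentence = unify(v_f, {"SUBJ": subj})
                if sentence is not None:
                    analyses.append(sentence)
    return analyses


if __name__ == "__main__":
    print(parse(["the", "girl", "walk"]))
```

Here the c-structure rule filters out the N reading of walk, and the surviving V reading supplies the PRED whose SUBJ is filled by the f-structure of "the girl".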

2/4 Challenging natural language phenomena

Lexical ambiguity
- Homonymy: bank, fluke; ár ('price'; 'flood'), légy ('fly'; 'be!'), ír ('writes'; 'Irish')
- Polysemy: bulb, line; körte ('pear'; 'light bulb'), toll ('feather'; 'pen')

Structural ambiguity
- I saw the girl with the telescope.
- Részegen láttam Jánost. ('I saw János drunk.')
- Egész nap a hajókat néztük a Dunán. ('We watched the ships on the Danube all day.')

Word formation (compounding, derivation, minor processes)
- horror, horrid, horrify; terror, (*terrid), terrify; candor, candid, (*candify)
- student film society committee scandal video…

Anaphoric references
a) We gave the bananas to the monkeys because they were hungry.
b) We gave the bananas to the monkeys because they were ripe.

Ellipsis

2/5 Direct challenges

- A non-toy lexicon of the Hungarian language; empirical techniques
- Tokenization, morphological analysis
  walks: walk +Verb +Pres +3sg
         walk +Noun +Pl
- Named entity recognition
  Types: person, role, location, organization, brand, title, etc.
  This is the website of [the University of Debrecen org]. [The University of Debrecen loc] is not far from us.
- Parsing performance: trade-off between accuracy, usability and speed
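The tokenization and morphological-analysis example on this slide (walks receiving both a verbal and a nominal analysis) can be sketched as follows. XLE performs this step with finite-state transducers; the sketch swaps in a plain lookup table of suffix rules, which is not a finite-state implementation. The stem list, the rule table and the helper names `tokenize` and `analyze` are hypothetical; only the output tags are taken from the slide.

```python
import re

# Hypothetical stem lexicon: stem -> set of categories it can bear.
STEMS = {
    "walk": {"Verb", "Noun"},
    "girl": {"Noun"},
}

# Hypothetical suffix rules: (surface suffix, required stem category, output tags).
SUFFIX_RULES = [
    ("s", "Verb", "+Verb +Pres +3sg"),
    ("s", "Noun", "+Noun +Pl"),
]


def tokenize(text):
    """Split text into word tokens and single punctuation marks.

    \\w is Unicode-aware for str patterns, so accented Hungarian letters
    stay inside their words.
    """
    return re.findall(r"\w+|[^\w\s]", text)


def analyze(token):
    """Return every analysis of the token licensed by STEMS and SUFFIX_RULES."""
    analyses = []
    word = token.lower()
    for suffix, category, tags in SUFFIX_RULES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if category in STEMS.get(stem, set()):
                analyses.append(f"{stem} {tags}")
    return analyses


if __name__ == "__main__":
    for tok in tokenize("She walks."):
        print(tok, analyze(tok))
```

As on the slide, the analyzer deliberately returns all alternatives; choosing between them is left to the syntax (or to stochastic disambiguation).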

3/1 HunGram

Tibor Laczkó, 2005/2006: Fulbright research grant to Stanford University, plus a ParGram invitation to PARC

Research at two host institutions; two goals at PARC:

(i) becoming familiar with the formalism (XLE)
(ii) starting the implementation in XLE of the results of research on the morpho-syntax of Hungarian noun phrases (in an LFG framework)

3/2 HunGram

LFG Research Group (LFGRG) at the Department of English Linguistics, UD:
- Tibor Laczkó
- György Rákosi
- Ágoston Tóth
- 2 PhD students

XLE software licence from PARC

3/3 HunGram

OTKA research grant for 2008-2012 (K 72983) (Hungarian Scientific Research Fund)

Objectives:
1. developing a comprehensive LFG grammar of the Hungarian language (morphology, syntax, lexicon, semantic issues)
2. implementing it in HunGram/ParGram
3. launching an English vs. Hungarian comparative research project on the ParGram platform
4. incorporating the results in various course materials at the English Linguistics Department (1 & 2)

4/1 Demo

Elliptical noun phrases

4/2 “az öt nagy zöldet” (i.e. the five big green ones, in the accusative)

[Screenshots: c-structure + morphology; f-structure]

4/3 “the five large green ones”

4/4 “a három ügyes fiú öt nagy zöldjét” (i.e. the three clever boys' five big green ones, in the accusative)

4/5 “the three boys’ five large green ones”

4/6 Elliptical noun phrases

Differences
1. c-structure
2. morphology (e.g. +case vs. -case)
3. English: ‘pro’ is realized by an overt element; it is present in the lexicon, in c-structure and in f-structure
4. Hungarian: ‘pro’ is covert; it is introduced by a functional annotation in c-structure and is present in f-structure
5. EngGram vs. HunGram with respect to the number of features

Similarities
1. f-structure, except for typological differences (case features etc.)
2. as HunGram gets more and more developed, it shares more and more features with EngGram (ParGram)

Proposal (previous talk): a more lexical solution, with ‘pro’ introduced by (case-marked) adjectival lexical items

Plan: testing its implementability in HunGram (and EngGram?)
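Differences 3 and 4 and the first similarity can be mirrored in a toy Python sketch: English obtains PRED ‘pro’ from the overt word ones, while Hungarian adds it through an annotation-like default when no word supplies a PRED, and the two f-structures coincide once the case feature is set aside. The feature names (SPEC, a flat ADJUNCT list), both mini-lexicons and the helpers `build_np` and `without` are hypothetical simplifications, not EngGram or HunGram code.

```python
# Hypothetical word-level f-structure contributions. SPEC and the flat ADJUNCT
# list are simplifications, not ParGram's actual feature geometry.
ENGLISH = {
    "the": {"DEF": "+"},
    "five": {"SPEC": "FIVE"},
    "large": {"ADJUNCT": "LARGE"},
    "green": {"ADJUNCT": "GREEN"},
    "ones": {"PRED": "pro"},  # overt 'pro': a real lexical item in c-structure
}

HUNGARIAN = {
    "az": {"DEF": "+"},
    "öt": {"SPEC": "FIVE"},
    "nagy": {"ADJUNCT": "LARGE"},
    "zöldet": {"ADJUNCT": "GREEN", "CASE": "acc"},  # case-marked adjective
}


def build_np(words, lexicon, covert_pro):
    """Collect the f-structure of an NP from its words.

    If no word supplies a PRED and covert_pro is True, PRED 'pro' is added,
    mimicking a functional annotation on a c-structure rule.
    """
    f = {"ADJUNCT": []}
    for word in words:
        for attr, value in lexicon[word].items():
            if attr == "ADJUNCT":
                f["ADJUNCT"].append(value)  # a list stands in for LFG's adjunct set
            else:
                f[attr] = value
    if "PRED" not in f and covert_pro:
        f["PRED"] = "pro"
    return f


def without(f, features):
    """Drop the named (typologically specific) features from an f-structure."""
    return {attr: value for attr, value in f.items() if attr not in features}


english_np = build_np(["the", "five", "large", "green", "ones"], ENGLISH, covert_pro=False)
hungarian_np = build_np(["az", "öt", "nagy", "zöldet"], HUNGARIAN, covert_pro=True)

if __name__ == "__main__":
    print(without(hungarian_np, {"CASE"}) == english_np)  # True
```

Without the overt ones and without a covert-pro annotation, the English NP would have no PRED at all, which is why the two grammars must introduce ‘pro’ in different places.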

Hungram1.lfg

    FIRST HUNGARIAN CONFIG (1.0)
      ROOTCAT ROOT.
      FILES common.templates.lfg hun-lex.lfg hun-templates.lfg
            hun-morphconfig.lfg hun-rules.lfg.
      LEXENTRIES (FIRST HUNGARIAN).
      CHARACTERENCODING iso-8859-2.
      MORPHOLOGY (STANDARD HUNGARIAN).
      RULES (FIRST HUNGARIAN).
      TEMPLATES (STANDARD COMMON) (FIRST HUNGARIAN).
      GOVERNABLERELATIONS SUBJ OBJ POSS OBL OBL-? COMP XCOMP PREDLINK.
      SEMANTICFUNCTIONS ADJUNCT TOPIC FOCUS.
    ----