Husse-9 Conference
Pécs, 22-24 January, 2009
HunGram vs. EngGram in ParGram
On the Comparison of Hungarian and English in an International Computational Linguistic Project
Tibor Laczkó, György Rákosi & Ágoston Tóth
Department of English Linguistics
University of Debrecen
{laczkot, rakosigy, tagoston}@delfin.unideb.hu
Sponsored by OTKA research grant K 72983
Overview
1. Lexical-Functional Grammar (LFG)
2. The ParGram Project at PARC
3. The HunGram Project in Debrecen
4. A short demonstration: possible ParGram treatments of certain elliptical noun phrases in English and Hungarian
1/1 Stanford and LFG
LFG as a linguistic theory was developed in the late 1970s.
One of the principal aims was to create a framework suitable for large-scale computational applications, and there has been lively co-operation between theory and computational linguistic practice ever since.
The two co-founders:
  Joan Bresnan (Stanford University, SU): mainly linguistic aspects
  Ronald Kaplan (Palo Alto Research Center, PARC, and SU; now at Powerset, Inc.): mainly computational aspects
General information on LFG is available at:
http://www.essex.ac.uk/linguistics/LFG/
1/2 Design Principles of LFG
Lexicalism
Modularism
Parallel architecture
Generating and parsing structures are equally important
Rule system that is directly renderable in a mathematical formalism
1/3 Central Modules of LFG
constituent structure (language-specific): word order
phonology
lexicon (powerful)
functional structure (universal): grammatical relations
semantics
1/4 Adpositional phrases in LFG
c-structures:
  English (preposition):    [PP [Pr near/in] [NP [Det the] [N box]]]
  Hungarian (postposition): [PP [NP [Det a] [N doboz]] [Po mellett]]
  Hungarian (case suffix):  [NP [Det a] [N doboz-ban]]
f-structure:
  PRED  ‘NEAR/IN <(OBJ)>’
  OBJ   [ PRED ‘BOX’, DEF +, PERS 3, NUM sg ]
lexical entries:
  near/in, Pr ‘NEAR/IN <(OBJ)>’
  mellett, Po ‘NEAR <(OBJ)>’
  -ban, Nsuff ‘IN <(OBJ)>’
  box, N ‘BOX’
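The point of this slide, that a preposition, a postposition and a case suffix differ in c-structure category but can contribute the same kind of PRED, can be illustrated with a short Python sketch. This is not XLE code: the dictionary layout and the `pp_fstructure` helper are assumptions made for the example, and only the category labels and PRED values come from the lexical entries on the slide.

```python
# Toy lexicon: form -> (c-structure category, semantic relation).
# Categories and relations follow the slide; the data layout is hypothetical.
ADPOSITIONS = {
    "near": ("Pr", "NEAR"),
    "in": ("Pr", "IN"),
    "mellett": ("Po", "NEAR"),
    "-ban": ("Nsuff", "IN"),
}

# f-structure of the object noun phrase ('the box' / 'a doboz'), as on the slide.
BOX = {"PRED": "BOX", "DEF": "+", "PERS": 3, "NUM": "sg"}


def pp_fstructure(form, obj):
    """Build the f-structure contributed by an adpositional element."""
    _category, relation = ADPOSITIONS[form]
    return {"PRED": f"{relation} <(OBJ)>", "OBJ": dict(obj)}


english = pp_fstructure("near", BOX)
hungarian = pp_fstructure("mellett", BOX)
print(english == hungarian)  # the c-structure categories differ, the f-structures do not
```

The category is deliberately ignored when the f-structure is built, which mirrors the division of labour between the two levels.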
2/1 ParGram at PARC
The Parallel Grammar (ParGram) project: launched and organized by PARC
LFG-based computational program
Capitalizes on LFG’s flexible general linguistic and computationally implementable architecture
Parser and generator
Goal: to analyze more and more languages on a maximally uniform platform, in the spirit of Universal Grammar
2/2 ParGram at PARC
A truly international project:
English, German, French, Norwegian, Japanese, Chinese, Urdu (India), Malagasy (Madagascar), Arabic, Vietnamese, Spanish, Welsh, Indonesian, Turkish, Georgian, & Hungarian
Further information:
http://www2.parc.com/isl/groups/nltt/default.html
2/3 XLE parser
a deep, grammar-based parsing system for implementing lexical-functional grammars; constructed as part of the ParGram project
output: c-structures and f-structures
supports tokenization and morphological analysis through finite-state transducers (with alternative analyses)
can select the most probable analysis from the potentially large candidate set using stochastic disambiguation (if implemented)
has a generator mode
implemented in C; runs on Solaris, Linux, and Mac OS X
bottom line: a facility for writing syntactic rules and lexical entries, and for testing and editing them
toy-eng.lfg
  the   D * (^ DEF)=+.
  girl  N * (^ PRED)='GIRL'.
  walk  V * (^ PRED)='WALK<(^ SUBJ)>';
        N * (^ PRED)='WALK'.
c-structure: context-free phrase-structure tree encoding constituency and linear order
f-structure: attribute-value matrices that encode predicate-argument relations and other grammatical information (e.g. number, tense, case)
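How lexical entries like those in toy-eng.lfg combine into one attribute-value matrix can be sketched with nested Python dictionaries and a small unification function. This is a simplified illustration, not how XLE is implemented: the dict encoding and the `unify` helper are assumptions, and real LFG treats PRED values as uniquely instantiated, which this sketch ignores.

```python
def unify(f, g):
    """Unify two f-structures given as nested dicts.

    Returns the merged structure, or None if two atomic values clash.
    """
    result = dict(f)
    for attr, value in g.items():
        if attr not in result:
            result[attr] = value
        elif isinstance(result[attr], dict) and isinstance(value, dict):
            merged = unify(result[attr], value)
            if merged is None:
                return None
            result[attr] = merged
        elif result[attr] != value:
            return None
    return result


# Contributions of the toy lexical entries (the dict encoding is hypothetical).
THE = {"DEF": "+"}
GIRL = {"PRED": "GIRL"}
WALK_V = {"PRED": "WALK<(^ SUBJ)>"}

subject = unify(THE, GIRL)                       # 'the girl'
sentence = unify(WALK_V, {"SUBJ": subject})      # verb with its subject
print(sentence)
```

Conflicting values, for example `{"NUM": "sg"}` against `{"NUM": "pl"}`, make `unify` return `None`, which is the sketch's stand-in for an ill-formed f-structure.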
2/4 Challenging natural language phenomena
Lexical ambiguity
  Homonymy: bank, fluke; ár (‘price’ / ‘flood’), légy (‘fly (insect)’ / ‘be!’), ír (‘writes’ / ‘Irish’)
  Polysemy: bulb, line; körte (‘pear’ / ‘light bulb’), toll (‘feather’ / ‘pen’)
Structural ambiguity
  I saw the girl with the telescope.
  Részegen láttam Jánost. (‘I saw János drunk.’)
  Egész nap a hajókat néztük a Dunán. (‘All day we watched the ships on the Danube.’)
Word formation (compounding, derivation, minor processes)
  horror, horrid, horrify; terror, (*terrid), terrify; candor, candid, (*candify)
  student film society committee scandal video…
Anaphoric references
  a) We gave the bananas to the monkeys because they were hungry.
  b) We gave the bananas to the monkeys because they were ripe.
Ellipsis
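The structural ambiguity of "I saw the girl with the telescope" listed above can be made concrete by counting the analyses a small context-free grammar assigns to it. The sketch below uses a CKY-style chart; the grammar, the lexicon and the `count_parses` function are invented for this illustration and are far simpler than an XLE grammar.

```python
from collections import defaultdict

# Hypothetical toy grammar in binary form: (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attached to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP attached to the noun phrase
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "girl": ["N"], "telescope": ["N"], "with": ["P"],
}


def count_parses(words, root="S"):
    """Count the distinct parse trees of `words` with a CKY chart."""
    n = len(words)
    chart = defaultdict(int)  # (start, end, category) -> number of analyses
    for i, word in enumerate(words):
        for cat in LEXICON[word]:
            chart[(i, i + 1, cat)] += 1
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for parent, left, right in BINARY:
                    chart[(start, end, parent)] += (
                        chart[(start, mid, left)] * chart[(mid, end, right)]
                    )
    return chart[(0, n, root)]


print(count_parses("I saw the girl with the telescope".split()))
```

The two analyses differ only in whether the PP attaches to the verb phrase or to the noun phrase; without the PP, the same grammar yields a single analysis.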
2/5 Direct challenges
Non-toy lexicon of the Hungarian language, empirical techniques
Tokenization, morphological analysis
  walks
    walk +Verb +Pres +3sg
    walk +Noun +Pl
Named entity recognition
  Types: person, role, location, organization, brand, title, etc.
  This is the website of [the University of Debrecen org].
  [The University of Debrecen loc] is not far from us.
Parsing performance
  trade-off between accuracy, usability and speed
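The "walks" example above shows a morphological analyzer returning alternative analyses for one token. A minimal Python sketch of that behaviour follows; it is a plain table lookup, not a finite-state transducer, and the `STEMS` and `SUFFIXES` tables and the `analyze` function are assumptions made for the illustration. Only the tag strings are taken from the slide.

```python
# Hypothetical toy tables: which parts of speech a stem has, and which
# tags the suffix "-s" contributes for each part of speech.
STEMS = {"walk": ["Verb", "Noun"], "girl": ["Noun"]}
SUFFIXES = {
    "s": {"Verb": "+Pres +3sg", "Noun": "+Pl"},
}


def analyze(token):
    """Return every stem + tag analysis of a suffixed token.

    Only suffixed forms are handled; bare stems yield no analysis here.
    """
    analyses = []
    for suffix, tags_by_pos in SUFFIXES.items():
        if token.endswith(suffix):
            stem = token[: -len(suffix)]
            for pos in STEMS.get(stem, []):
                if pos in tags_by_pos:
                    analyses.append(f"{stem} +{pos} {tags_by_pos[pos]}")
    return analyses


print(analyze("walks"))
```

Both analyses are passed on to the syntax, which is where the choice between the verb and the noun reading is eventually made.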
3/1 HunGram
Tibor Laczkó, 2005/2006: Fulbright research grant to Stanford University
a ParGram invitation to PARC
research at two host institutions
two goals at PARC:
  (i) familiarity with the formalism (XLE)
  (ii) starting the implementation in XLE of the results of the research on the morpho-syntax of Hungarian noun phrases (in an LFG framework)
3/2 HunGram
LFG Research Group (LFGRG) at the Department of English Linguistics, UD
  Tibor Laczkó
  György Rákosi
  Ágoston Tóth
  2 PhD students
XLE software licence from PARC
3/3 HunGram
OTKA research grant for 2008-2012 (K 72983) (Hungarian Scientific Research Fund)
objectives
  1. developing a comprehensive LFG grammar of the Hungarian language (morphology, syntax, lexicon, semantic issues)
  2. implementing it in HunGram/ParGram
  3. launching an English vs. Hungarian comparative research project on the ParGram platform
  4. incorporating the results in various course materials at the English Linguistics Department
  (1 & 2) → 3 → 4
4/1 Demo
Elliptical noun phrases
4/2 “az öt nagy zöldet” (‘the five large green [ones]’, accusative)
c-structure + morphology
f-structure
4/3 “the five large green ones”
4/4 “a három ügyes fiú öt nagy zöldjét” (‘the three clever boys’ five large green [ones]’, accusative)
4/5 “the three boys’ five large green ones”
4/6 Elliptical noun phrases
differences
  1. c-structure
  2. morphology (e.g. +case vs. -case)
  3. English: ‘pro’ realized by an overt element, in the lexicon, in c-structure, in f-structure
  4. Hungarian: ‘pro’ is covert, introduced by a functional annotation in c-structure, present in f-structure
  5. EngGram vs. HunGram with respect to the number of features
similarities
  1. f-structure, except for typological differences (case etc. features)
  2. as HunGram gets more and more developed, more and more shared EngGram (ParGram) features
proposal
  (previous talk) a more lexical solution: ‘pro’ introduced by (case-marked) adjectival lexical items
plan
  testing its implementability in HunGram (and EngGram?)
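The claim that the two f-structures coincide apart from typological features such as case can be pictured with a small comparison in Python. The two dictionaries below are hypothetical and heavily simplified: the attribute names and values are illustrative assumptions, not the actual EngGram or HunGram output, and the `compare` helper is likewise invented for the example.

```python
# Hypothetical, heavily simplified f-structures for the two elliptical NPs.
ENGLISH = {"PRED": "pro", "DEF": "+"}                    # the five large green ones
HUNGARIAN = {"PRED": "pro", "DEF": "+", "CASE": "acc"}   # az öt nagy zöldet


def compare(f, g):
    """Split attributes into shared ones and ones unique to each structure."""
    shared = {attr for attr in f if attr in g and f[attr] == g[attr]}
    only_f = set(f) - shared
    only_g = set(g) - shared
    return shared, only_f, only_g


shared, only_english, only_hungarian = compare(ENGLISH, HUNGARIAN)
print(sorted(shared), sorted(only_english), sorted(only_hungarian))
```

In this toy setting the ‘pro’ predicate is shared and only the case feature sets the Hungarian structure apart, which is the pattern the similarities list describes.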
Hungram1.lfg
FIRST HUNGARIAN CONFIG (1.0)
  ROOTCAT ROOT.
  FILES common.templates.lfg hun-lex.lfg hun-templates.lfg hun-morphconfig.lfg hun-rules.lfg.
  LEXENTRIES (FIRST HUNGARIAN).
  CHARACTERENCODING iso-8859-2.
  MORPHOLOGY (STANDARD HUNGARIAN).
  RULES (FIRST HUNGARIAN).
  TEMPLATES (STANDARD COMMON) (FIRST HUNGARIAN).
  GOVERNABLERELATIONS SUBJ OBJ POSS OBL OBL-? COMP XCOMP PREDLINK.
  SEMANTICFUNCTIONS ADJUNCT TOPIC FOCUS.
----