dictionary and corpus - DCU School of Computingjwagner/doc/Ludewig_collocationsH04.pdf ·...



Collocations – Mediating between Lexical Abstractions and Textual Concretions

Petra Ludewig
Institute of Cognitive Science
University of Osnabrück
Germany

Joachim Wagner
National Centre for Language Technology
School of Computing
Dublin City University
Ireland

Overview

1. Introduction
– Gap between Abstractions and Concretions
– Collocations

2. LogoTax
– General Objectives
– Three Layer Representation
– Technological Aspects

3. Conclusion

Abstractions vs. Concretions

Linguistic paradigms
• Generative Grammar
• Structuralism

Pedagogical paradigms
• Instructivism
• Constructivism

Gap between Abstractions and Concretions

• Abstract description of a single word
• Authentic example sentences
• How can the two be connected? LogoTax, a system combining dictionary and corpus


Collocations

• Associations of two or more lexemes
– are more or less semantically transparent
– involve an arbitrary choice of at least one lexeme
– usually cannot be translated compositionally
– are often highly frequent
– sometimes show special morpho-syntactic behaviour
• “give a talk”
– German: “einen Vortrag halten” (literally “to hold a talk”)
– French: “faire une conférence” (literally “to make a talk”)

Morpho-syntactic Behaviour of Collocations

• to put an end to something
– *But then I decided to put the end to these unedifying contacts.
– *The end to which I put these unedifying contacts was pleasant.
• Normal behaviour
– to give a talk
– the talk that I give today ...

LogoTax - General Objectives

• A combination of dictionary and corpus
• Tool to build up a personal dictionary
– tailored to individual needs
– reading-based and production-oriented
– learning as knowledge construction
– data-driven entry design
– German verb-noun combinations

LogoTax - Three Layer Representation

• Abstract Layer: canonical form
• Intermediate Layer: morpho-syntactic features and their frequency counts
• Example Layer: full, authentic sentences (shown as the full set or as a subset)
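
The three layers listed above could be modelled roughly as follows. This is only a minimal sketch and not LogoTax's actual data model: the class names `Example` and `CollocationEntry`, the feature labels such as `article=indefinite`, and the English sentences (used here for readability, although LogoTax targets German verb-noun combinations) are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Example:
    """Example layer: one full sentence plus its (hypothetical) features."""
    sentence: str
    features: frozenset


@dataclass
class CollocationEntry:
    """One dictionary entry spanning the three layers (illustrative only)."""
    canonical_form: str                            # abstract layer
    examples: list = field(default_factory=list)   # example layer

    def feature_counts(self) -> Counter:
        """Intermediate layer: how often each feature occurs in the examples."""
        counts = Counter()
        for example in self.examples:
            counts.update(example.features)
        return counts

    def subset(self, feature: str) -> list:
        """Examples showing one feature, i.e. a subset of the full set."""
        return [e for e in self.examples if feature in e.features]


# Invented sample data; the feature labels are assumptions.
entry = CollocationEntry("give a talk")
entry.examples.append(Example(
    "She gave a talk on syntax.",
    frozenset({"article=indefinite", "tense=past"})))
entry.examples.append(Example(
    "The talk that I give today is short.",
    frozenset({"article=definite", "relative_clause"})))
entry.examples.append(Example(
    "He gives a talk every year.",
    frozenset({"article=indefinite", "tense=present"})))

counts = entry.feature_counts()
for feature, n in counts.most_common():
    print(f"{feature}: {n}")
```

Deriving the intermediate layer from the example layer, rather than storing it separately, keeps the counts consistent with the sentences whenever a learner adds or removes examples.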

LogoTax - Three Layer Representation: Abstract Layer

LogoTax - Three Layer Representation: Example Layer

• Screenshot “Examples”


LogoTax - Three Layer Representation: Intermediate Layer

• Screenshot “Variations”

LogoTax - Three Layer Representation: Example Layer – Grouped
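
Grouping the example layer by a morpho-syntactic feature, as this slide's title suggests, might look like the sketch below. The function names `group_by_feature` and `relative_frequencies`, the feature name `article`, and the sample sentences are hypothetical; the slides do not say how LogoTax implements the grouping.

```python
from collections import defaultdict


def group_by_feature(examples, feature_name):
    """Group (sentence, features) pairs by the value of one feature."""
    groups = defaultdict(list)
    for sentence, features in examples:
        # Sentences lacking the feature go into an "unspecified" group.
        value = features.get(feature_name, "unspecified")
        groups[value].append(sentence)
    return dict(groups)


def relative_frequencies(groups):
    """Share of examples in each group; empty input gives an empty dict."""
    total = sum(len(sentences) for sentences in groups.values())
    if total == 0:
        return {}
    return {value: len(sentences) / total
            for value, sentences in groups.items()}


# Invented sample data; the feature names and values are assumptions.
examples = [
    ("She gave a talk on syntax.", {"article": "indefinite"}),
    ("The talk that I give today is short.", {"article": "definite"}),
    ("He gives a talk every year.", {"article": "indefinite"}),
    ("They give talks all over Europe.", {}),
]

groups = group_by_feature(examples, "article")
shares = relative_frequencies(groups)
for value, sentences in groups.items():
    print(f"{value} ({shares[value]:.0%}): {sentences}")
```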

LogoTax - Three Layer Representation: Connecting the Representation Layers

• Lexical abstraction
• Mediating description
• Textual concretion

Connecting the Representation Layers

How is this done? (Diagram: processing pipeline)

• Gepard LFG parser
– not parseable
– parseable: ~ 30%
• Feature extraction
– irrelevant: no/wrong relation
– examples + feature description

Sentences shown in the diagram:
– Light the fire.
– Light the fire!
– The explosion lit a fire at a nearby mobile home park.
– He lit the candle that caused the fire.
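
The three outcomes in this diagram (not parseable, irrelevant, example with feature description) can be illustrated with a toy filter. This is a sketch under strong assumptions: `TOY_ANALYSES` is a hand-written lookup table standing in for parser output, not the real parser's interface or output format, and the feature names are invented. Any sentence missing from the table is treated as not parseable.

```python
# Hand-written stand-in for parser output (an assumption, not a real API):
# sentence -> list of (verb lemma, object lemma, features).
TOY_ANALYSES = {
    "Light the fire.": [
        ("light", "fire", {"article": "definite", "mood": "imperative"}),
    ],
    "The explosion lit a fire at a nearby mobile home park.": [
        ("light", "fire", {"article": "indefinite", "tense": "past"}),
    ],
    "He lit the candle that caused the fire.": [
        ("light", "candle", {"article": "definite", "tense": "past"}),
        ("cause", "fire", {"article": "definite", "tense": "past"}),
    ],
}


def toy_parse(sentence):
    """Return the toy analysis, or None if the sentence has no parse."""
    return TOY_ANALYSES.get(sentence)


def classify(sentence, verb, noun):
    """Sort a retrieved sentence into one of the three diagram outcomes."""
    analysis = toy_parse(sentence)
    if analysis is None:
        return ("not parseable", None)
    for v, n, features in analysis:
        if v == verb and n == noun:
            # Verb and noun stand in the wanted verb-object relation.
            return ("example", features)
    # Both words occur, but not in the wanted relation.
    return ("irrelevant", None)


sentences = list(TOY_ANALYSES) + ["Who lit the fire, and why?"]
results = {s: classify(s, "light", "fire") for s in sentences}
for sentence, (status, features) in results.items():
    print(f"{status:14} {sentence} {features or ''}")
```

In this toy data, “He lit the candle that caused the fire.” contains both words but is rejected because “fire” is not the object of “light”; plain co-occurrence retrieval could not make that distinction.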

LogoTax - Technological Aspects

• Automatic retrieval of examples
– POS Tagger (IMS)
– Der Spiegel 1994
– aligned (en/de) corpus of EU publications
• LFG-based parsing
• Parser coverage:
– approx. 30%
– low recall, high precision
• Chart parser: exponential degradation

Conclusion

LogoTax
• does more than just show examples
• uses parsing
– to automate feature identification
– to distinguish compatible sentences from incompatible ones
• groups examples according to features
• gives relevant statistics of features


Thank you!

Discussion
