Marie-Claude L'Homme, Patrick Leroyer, Benoit Robichaud (Observatoire de linguistique Sens-Texte (OLST), Département de linguistique et de traduction, Université de Montréal; Centlex, Aarhus School of Business, Aarhus University): Advanced Encoding for Multilingual Access in a Terminological Data Base: A Matter of Balance

Page 1:

Marie-Claude L'Homme

Patrick Leroyer

Benoit Robichaud

Observatoire de linguistique Sens-Texte (OLST)

Département de linguistique et de traduction

Université de Montréal

Centlex

Aarhus School of Business

Aarhus University

Advanced Encoding for Multilingual Access in a Terminological Data Base

A Matter of Balance

Page 2:

Outline

• Objectives
  ◦ Access to translations of specialized collocations encoded in a terminological database
• The terminological database: the DiCoInfo
  ◦ Current contents and structure
  ◦ Current functionalities and limitations for translation needs
• A model for accessing specialized collocations
  ◦ The linguistic apparatus
  ◦ The technical apparatus
• Challenges and future work

L’Homme – Leroyer – Robichaud / TKE 2010 Dublin

Page 3:

Objectives

• Implementing new translation functionalities in an existing terminological database:
  ◦ Direct access to data for L1-L2 translation: what is the translation of a specific collocation?
  ◦ In particular, giving users access to translations of collocations: send a file as an attachment -> envoyer, transmettre un fichier en pièce jointe
• Defining a method that allows the database to be enriched without translating collocations one by one
• Access functionalities should not presuppose technical linguistic knowledge on the user's part

Page 4:

The DiCoInfo: a dynamic, polyfunctional tool (1): an overview

• An XML database containing terms related to the fields of computing and the Internet
• Approx. 1,000 entries in French and 400 in English
• Based mainly on the lexical framework of Explanatory Combinatorial Lexicology (ECL, Mel’čuk et al. 1984-1999; 1995)
• Descriptions based on corpora (2 million words in French; 1 million words in English)

Page 5:

The DiCoInfo: a dynamic, polyfunctional tool (2): the term record

Page 6:

The DiCoInfo: a dynamic, polyfunctional tool (3): the user interface

• Search Language
• Search Mode
• Search Precision

Page 7:

The DiCoInfo: a dynamic, polyfunctional tool (4): narrowing searches

• Words starting with the string program (14)
• Terms starting with the word program (6)
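The two result counts suggest that the interface distinguishes a loose string-prefix match from a stricter whole-word match. A minimal Python sketch of that distinction follows; the function names and the sample term list are assumptions made for illustration, not DiCoInfo data or code.

```python
def starts_with_string(terms, query):
    """Loose match: every term whose spelling begins with the query string."""
    q = query.lower()
    return [t for t in terms if t.lower().startswith(q)]


def starts_with_word(terms, query):
    """Strict match: only terms whose first whole word equals the query."""
    q = query.lower()
    return [t for t in terms if t.lower().split()[:1] == [q]]


# Hypothetical sample of headwords (illustrative, not taken from the DiCoInfo).
SAMPLE_TERMS = [
    "program",
    "program file",
    "programmer",
    "programming",
    "programmable",
    "application program",
]

loose = starts_with_string(SAMPLE_TERMS, "program")
strict = starts_with_word(SAMPLE_TERMS, "program")
print(len(loose), len(strict))
```

On this sample the loose search returns more hits than the strict one, which mirrors the narrowing effect shown on the slide.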

Page 8:

The DiCoInfo as a translation aid: current functionalities

For L1-L2 translation:

• L1 reception phase:
  ◦ Comprehensive coverage of the domain
  ◦ Headwords and definitions
  ◦ Access to lists of semantically related items
• L2 production phase:
  ◦ Presentation of grammatical data
  ◦ Actantial structures and linguistic forms of actants
  ◦ Contexts for pragmatic and stylistic information (professional discourse)
• L1 > L2 translation phase:
  ◦ Equivalents of the headwords
  ◦ Equivalents of collocations

Page 9:

One DiCoInfo database = one dictionary

• DiCoInfo database
• Lexicographic team
• Dictionary
• Interface

Page 10:

One DiCoInfo database = several dictionaries

• DiCoInfo database
• Lexicographic team
• Search engine
• L1 & L2 Production Dictionary
• L1<>L2 Translation Dictionary
• LSP-learning Dictionary
• Other dictionary applications
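The idea of one database serving several dictionaries can be pictured as a set of views over the same records. The Python sketch below is only an illustration: the field names, the view names and the sample record are assumptions, loosely based on the reception, production and translation phases listed earlier in the talk.

```python
# One shared record store (hypothetical field names and placeholder content).
RECORDS = [
    {
        "headword": "attachment",
        "lang": "en",
        "definition": "(definition text)",
        "related_terms": ["file"],
        "grammar": "noun",
        "actants": "(actantial structure)",
        "contexts": ["(corpus context)"],
        "equivalents": ["pièce jointe"],
    },
]

# Each "dictionary" is a projection of the same records onto the fields that
# serve one user situation (assumed mapping, not the DiCoInfo configuration).
VIEWS = {
    "reception": ["headword", "definition", "related_terms"],
    "production": ["headword", "grammar", "actants", "contexts"],
    "translation": ["headword", "equivalents"],
}


def dictionary_view(records, view):
    """Return the records reduced to the fields of the requested view."""
    fields = VIEWS[view]
    return [{f: rec[f] for f in fields} for rec in records]


print(dictionary_view(RECORDS, "translation"))
```

The lexicographic team would maintain the record store once, while each view exposes only what a given user situation needs.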


Page 11:

The DiCoInfo as a translation tool: potential improvements

• Extensive lists of collocations
  ◦ Comprehensiveness leads to lists of collocations that are not discriminated according to specific situations or user needs
  ◦ E.g. file has a long list of collocates (create, delete, compress, generate, use, edit a file, etc.); its French equivalent fichier has more than 100 collocates
• Limited multilingual assistance
  ◦ Established at the level of headwords, but not at the level of lexical relationships (this includes collocations)
  ◦ E.g. there is a formal link between attachment and pièce jointe, but not between send something as an attachment and envoyer qqch. en pièce jointe
• For the L2 production phase, the translator needs direct access to translations of collocations

Page 12:

Accessing translations of collocations: the model

Two components:

• A linguistic apparatus based on Explanatory Combinatorial Lexicology: lexical functions (LFs)
  ◦ Encodings and formalization
  ◦ Explanation-based grouping
• A technical apparatus based on advanced search functions
  ◦ Searching for lexical relations and expressions
  ◦ Displaying equivalences

Page 13:

The linguistic apparatus (1)

A lexical function (LF) encodes:

1. The syntactic relationship between the base and the collocate. Space bar:
   • Verb + 1st complement: press the ~
   • Verb + 1st complement: release the ~
   • Verb + 2nd complement: insert ... (a space) with the ~
2. The argument structure of the base. Space bar: ~ used by someone (arg1) to act on something (arg2)
   • 1st argument: press the ~
   • 1st and 2nd arguments: insert something with the ~
3. The general and abstract meaning of the collocate:
   • Typical uses: press, release a space bar; insert something with a space bar
   • Creation: create, define a password; write, develop a program

Page 14:

The linguistic apparatus (2)

Lexical function (LF)

Written f(x) = y, where f = function, x = keyword, y = value:

Real1(space bar) = press the ~

FinReal1(space bar) = release the ~

Labreal12(space bar) = insert … with the ~
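The f(x) = y notation maps naturally onto a lookup keyed by function name and keyword. The Python sketch below reproduces the three space bar examples; the table layout and the helper names `apply_lf` and `realize` are assumptions for illustration, not the DiCoInfo's actual XML encoding.

```python
# Lexical functions stored as a lookup table: (LF name, keyword) -> values.
# The three entries reproduce the examples on the slide; the table layout
# itself is an assumption.
LF_TABLE = {
    ("Real1", "space bar"): ["press the ~"],
    ("FinReal1", "space bar"): ["release the ~"],
    ("Labreal12", "space bar"): ["insert … with the ~"],
}


def apply_lf(name, keyword, table=LF_TABLE):
    """Evaluate f(x): return the values y, or an empty list if none is encoded."""
    return list(table.get((name, keyword), []))


def realize(value, keyword):
    """Replace the ~ placeholder with the keyword to get the full collocation."""
    return value.replace("~", keyword)


for lf in ("Real1", "FinReal1", "Labreal12"):
    for value in apply_lf(lf, "space bar"):
        print(f"{lf}(space bar) = {realize(value, 'space bar')}")
```

Returning a list rather than a single string leaves room for an LF with several values, as in the "create, define a password" example.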


Page 15:

The linguistic apparatus (3)

Page 16:

The technical apparatus: Equivalents of collocates (1)

Searching the database:

• Find the term records that describe the searched term in a lexical relation
• Find the term records of the equivalents
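The two search steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `TermRecord` class, the function name, and the rule of reading the same LF in the equivalent's record are hypothetical, and the French collocates are invented sample data rather than DiCoInfo content.

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    headword: str
    lang: str
    equivalents: list = field(default_factory=list)  # headwords in the other language
    relations: dict = field(default_factory=dict)    # LF name -> list of collocates


# Illustrative records. The English LF values follow the slides;
# the French collocates are assumptions made for this example.
RECORDS = [
    TermRecord("space bar", "en", ["barre d'espacement"],
               {"Real1": ["press"], "FinReal1": ["release"]}),
    TermRecord("barre d'espacement", "fr", ["space bar"],
               {"Real1": ["appuyer sur"], "FinReal1": ["relâcher"]}),
]


def collocate_equivalents(collocate, lang, records):
    """Return (headword, LF, equivalent headword, equivalent collocate) tuples."""
    hits = []
    for rec in records:
        if rec.lang != lang:
            continue
        # Step 1: lexical relations of this record that list the searched word.
        lfs = [lf for lf, cols in rec.relations.items() if collocate in cols]
        if not lfs:
            continue
        # Step 2: records of the equivalents, read under the same LF (assumed rule).
        for target in records:
            if target.lang == lang or target.headword not in rec.equivalents:
                continue
            for lf in lfs:
                for target_collocate in target.relations.get(lf, []):
                    hits.append((rec.headword, lf, target.headword, target_collocate))
    return hits


print(collocate_equivalents("press", "en", RECORDS))
```

Because the link goes through the LF label, no collocation pair has to be translated by hand, which is in line with the objective stated at the start of the talk.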

Page 17:

The technical apparatus: Equivalents of collocates (2)

Linking the equivalents in the interface:

Page 18:

The technical apparatus: Equivalents of collocations (1)

• Find the term records in which:
  (i) a first word appears as the headword
  (ii) a second word appears as a collocate
• Find the equivalent term records
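For a two-word query, the same idea can be sketched with a headword lookup followed by a collocate filter. The Python code below is a hedged illustration: the dictionary layout, the function name `translate_collocation`, and the French patterns are assumptions; only the English space bar values come from the slides.

```python
# Records keyed by (language, headword). Hypothetical layout and French data.
RECORDS = {
    ("en", "space bar"): {
        "equivalents": ["barre d'espacement"],
        "relations": {
            "Real1": ["press the ~"],
            "FinReal1": ["release the ~"],
            "Labreal12": ["insert … with the ~"],
        },
    },
    ("fr", "barre d'espacement"): {
        "equivalents": ["space bar"],
        "relations": {
            "Real1": ["appuyer sur la ~"],
            "FinReal1": ["relâcher la ~"],
            "Labreal12": ["insérer … avec la ~"],
        },
    },
}


def translate_collocation(headword, collocate, src, tgt, records=RECORDS):
    """Return target-language collocations for a (headword, collocate) query."""
    # (i) the first word must be a headword in the source language
    record = records.get((src, headword))
    if record is None:
        return []
    # (ii) keep the LFs whose values contain the second word as a collocate
    matching = [
        lf for lf, patterns in record["relations"].items()
        if any(collocate in pattern.split() for pattern in patterns)
    ]
    # Then follow the headword equivalents and read the same LFs there.
    results = []
    for equivalent in record["equivalents"]:
        target = records.get((tgt, equivalent))
        if target is None:
            continue
        for lf in matching:
            for pattern in target["relations"].get(lf, []):
                results.append(pattern.replace("~", equivalent))
    return results


print(translate_collocation("space bar", "press", "en", "fr"))
```

The user only types two ordinary words; the LF labels stay internal, so no linguistic knowledge is presupposed.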

Page 19:

The technical apparatus: Equivalents of collocations (2)

Linking the equivalents in the interface:

Page 20:

The technical apparatus: Side effects

Page 21:

Challenges (1)

• Different syntactic structures
  ◦ Different LFs according to the syntactic functions of the keyword:
    Real12: Search the Internet for information
    Labreal12: Chercher de l’information dans Internet
• Split actants
  ◦ partition: ~ created by user1 to act on data1 or software1
    Labreal121: Save data on a partition
    Labreal122: Install a program on a partition
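The mismatch can be made concrete with a small Python sketch. It assumes that LF labels are plain strings made of an alphabetic name followed by digits, which is an assumption about the notation rather than a description of the DiCoInfo encoding; the helper names are hypothetical. Under a strict label comparison, the English and French encodings of the same collocation do not match.

```python
import re

# Assumed label shape: alphabetic LF name followed by optional digits.
LF_LABEL = re.compile(r"([A-Za-z]+)(\d*)")


def parse_lf(label):
    """Split a label such as 'Labreal12' into its name and its digit string."""
    match = LF_LABEL.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"unrecognized LF label: {label!r}")
    return match.group(1), match.group(2)


def same_lf(label_a, label_b):
    """Strict comparison: name and digits must both be identical."""
    return parse_lf(label_a) == parse_lf(label_b)


# English and French encodings of the 'search the Internet' example.
print(same_lf("Real12", "Labreal12"))
# Split actants: two distinct labels under the same headword 'partition'.
print(parse_lf("Labreal121"), parse_lf("Labreal122"))
```

Any equivalence search that relies on identical labels therefore needs some additional treatment for such cases; the slides do not specify which one.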


Page 22:

Challenges (2)

Page 23:

Concluding remarks

We proposed a model to retrieve translations of collocations:

• That meets user needs
• That is transparent (does not presuppose special linguistic or technical knowledge)
• That does not require that all collocations be translated on an individual basis

Page 24:

Future work

• Extension of the coverage of English terminology
  ◦ Adding English collocations to the database
• Extension to other languages
  ◦ Namely Spanish, which is currently under development
• Extension to other subject fields
  ◦ Ongoing project in the field of climate change
• Extension of search capabilities
  ◦ To allow users to discover collocates based on an onomasiological search

Page 25:

Go raibh maith agat (Thank you)

References

L’Homme, M.-C. (2008). Le DiCoInfo. Méthodologie pour une nouvelle génération de dictionnaires spécialisés. Traduire 217, pp. 78-103.

L’Homme, M.-C. et al. (2009). Le manuel du DiCoInfo. http://olst.ling.umontreal.ca/dicoinfo/manuel-DiCoInfo.pdf

L’Homme, M.-C. and P. Leroyer (2009). Combining the semantics of collocations with situation-driven search paths in specialized dictionaries. Terminology 15(2), pp. 258-283.

Leroyer, P. (2007). Terminologie et dictionnaires: la porte des utilisateurs. In Quirion, J.: Terminologies, Approches Transdisciplinaires. Actes en ligne. Gatineau: Université du Québec en Outaouais. http://www.uqo.ca/terminologie2007/documents/Leroyer.pdf

Mel’čuk, I., A. Clas and A. Polguère (1995). Introduction à la lexicologie explicative et combinatoire. Louvain-la-Neuve (Belgique): Duculot / Aupelf - UREF.

Mel’čuk, I. et al. (1984-1999). Dictionnaire explicatif et combinatoire du français contemporain. Montréal: Presses de l’Université de Montréal.
