Applicative evaluation of bilingual terminologies
1
Applicative evaluation of bilingual terminologies
Estelle Delpech, NODALIDA, 12th May 2011
2
Outline
1. Context and scope of work
2. Comparable corpora and terminology evaluation
3. Applicative evaluation protocol
4. Experimentation and results
5. Future improvements
2 / 47
4
Context of the work
• Bilingual terminology mining from comparable corpora
• Application to:
  – computer-aided translation
  – computer-aided terminology
5
Scope of the work
• Find a way to show the "added value" of the acquired terminology when it is used for technical translation
  – do translators translate better and/or faster?
• Design and test an "applicative" evaluation protocol for bilingual terminologies
6
Outline
1. Context and scope of work
2. Comparable corpora and terminology evaluation
3. Applicative evaluation protocol
4. Experimentation and results
5. Future improvements
7
Comparable corpora
English texts on breast cancer:
It has been suggested that breast magnetic resonance imaging (MRI) is more accurate in the diagnosis of breast cancer...
Histological evaluation revealed the presence of DCIS...
French texts on breast cancer:
L'imagerie par résonance magnétique avec injection de gadolinium (IRM) est une technique indépendante de la densité mammaire...
Un diagnostic histologique est nécessaire...
9
Comparable corpora
English texts on breast cancer:
It has been suggested that breast magnetic resonance imaging (MRI) is more accurate in the diagnosis of breast cancer...
Histological evaluation revealed the presence of ductal carcinoma in situ.
French texts on breast cancer:
L'imagerie par résonance magnétique avec injection de gadolinium (IRM) est une technique indépendante de la densité mammaire...
Un diagnostic histologique est nécessaire...
10
Advantages of comparable corpora
• Greater availability
  – new domains
  – previously unavailable language pairs
• Quality
  – spontaneous language
  – not influenced by source texts
11
Reference evaluation of bilingual terminologies
• Reference evaluation:
  – the output of the program is compared with a list of reference translations
• Precision:
  – percentage of output translations that are in the reference

$$\text{precision} = \frac{|\text{output} \cap \text{reference}|}{|\text{output}|}$$
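As a rough illustration, the precision defined on this slide can be computed over sets of (source term, translation) pairs. The function name `precision` and the toy term pairs are illustrative assumptions, not data from the experiment.

```python
def precision(output, reference):
    """Share of output translation pairs that also appear in the reference."""
    output, reference = set(output), set(reference)
    if not output:
        # No output at all: report 0.0 rather than dividing by zero.
        return 0.0
    return len(output & reference) / len(output)


# Hypothetical toy data: (source term, candidate translation) pairs.
output_pairs = {
    ("histological", "histologique"),
    ("histological", "diagnostic"),
    ("breast", "sein"),
    ("cancer", "cancer"),
}
reference_pairs = {
    ("histological", "histologique"),
    ("breast", "sein"),
    ("cancer", "cancer"),
}

# 3 of the 4 output pairs are in the reference.
print(precision(output_pairs, reference_pairs))
```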
12
Reference evaluation with comparable corpora
• Output:
  – source term → ordered list of candidate translations
• Example:
  – histological → diagnostic (rank 1), histologie (rank 2), histologique (rank 3), …, nécessaire (rank n)
13
Reference evaluation with comparable corpora
• Precision:
  – percentage of output translations that are in the reference when only the Top 10 or Top 20 candidate translations are taken into account
• State of the art:
  – between 42% and 80% on the Top 20, depending on corpus size, corpus type and the nature of the translated elements [Morin and Daille, 2009]
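The slide does not spell out how Top-N precision is counted. The sketch below assumes one common reading: a source term counts as correctly translated if at least one reference translation appears among its first N ranked candidates. The function name, the second source term and the candidate lists are hypothetical.

```python
def top_n_precision(candidates, reference, n):
    """Share of source terms with a reference translation in their top n candidates.

    candidates: dict mapping a source term to its ranked candidate list.
    reference:  dict mapping a source term to its set of reference translations.
    """
    if not candidates:
        return 0.0
    hits = 0
    for term, ranked in candidates.items():
        # Keep only the n best-ranked candidates and look for any reference match.
        if set(ranked[:n]) & reference.get(term, set()):
            hits += 1
    return hits / len(candidates)


# Hypothetical toy data, loosely modelled on the "histological" example.
candidates = {
    "histological": ["diagnostic", "histologie", "histologique", "nécessaire"],
    "breast": ["sein", "mammaire"],
}
reference = {"histological": {"histologique"}, "breast": {"sein"}}

for n in (1, 3):
    print(n, top_n_precision(candidates, reference, n))
```

Under this reading, widening the cut-off can only keep or raise the score, which is why results are usually reported for a fixed N such as 10 or 20.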
14
Reference vs. Applicative evaluation
• Reference evaluation:
  – fine for testing and developing the alignment program
  – fast, cheap, reproducible, objective
• Applicative evaluation:
  – how much does the alignment program help end users?
  – can the terminologies improve translation quality?
15
Outline
1. Context and scope of work
2. Comparable corpora and terminology evaluation
3. Applicative evaluation protocol
4. Experimentation and results
5. Future improvements
16
Applicative evaluation scenario
20
Questions raised
1) How do you assess translation quality?
2) Evaluate the whole translation or technical terms only?
21
1) How do you assess translation quality?
• Translation studies evaluation grids:
  – SICAL, SAE J2450
  – too complex, poorly documented
• Machine translation objective metrics:
  – BLEU, METEOR
  – not suited to human translation
  – reproducibility is not an advantage in our case
22
1) How do you assess translation quality?
• Machine translation subjective evaluation
  – translations are evaluated by humans:
    • quality judgement: adequacy, fluency...
    • ranking
  – use an inter-annotator agreement measure to check that the judges agree sufficiently
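The slides do not name the agreement measure. The K values reported with the results suggest a kappa statistic, so the sketch below assumes Cohen's kappa for two judges. The function name and the toy judgements are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two judges labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both judges must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items on which the two judges agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each judge's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both judges used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical judgements on six translated terms.
judge_a = ["correct", "correct", "acceptable", "wrong", "correct", "wrong"]
judge_b = ["correct", "acceptable", "acceptable", "wrong", "correct", "correct"]

print(round(cohen_kappa(judge_a, judge_b), 3))
```

Kappa corrects raw agreement for the agreement expected by chance, so it can be much lower than the plain percentage of identical judgements.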
23
2) Evaluate the whole text or just some terms?
• Quality of a text translation = complex interaction of several parameters
• Focus on the elements for which the translator felt he/she needed a linguistic resource:
  – evaluates only the part of the translation on which the terminology has an impact
  – easier and faster
24
Applicative evaluation protocol
• Compare 3 different "situations of translation"
  – one situation = one type of resource
• Translators translate the texts and note down the terms they had to look up
• The quality of the terms' translations is assessed by human judges
25
Situations of translation
29
Translations' assessment
1. Quality judgement:
  – correct: standard term or expression
  – acceptable: meaning is retained
  – wrong: no meaning is retained
2. Ranking:
  – from best to worst
  – ties allowed
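The slides do not say how the rankings are aggregated. One possible way, sketched below as an assumption, is to compare two situations term by term and report how often the first one is ranked better, tied or worse. The function name, the situation keys and the toy ranks are hypothetical.

```python
from collections import Counter


def pairwise_outcomes(rankings, a, b):
    """Share of terms where situation a is ranked better than, tied with or worse than b.

    rankings: list of dicts mapping a situation name to its rank for one term
              (1 = best; tied translations share the same rank).
    """
    if not rankings:
        raise ValueError("at least one ranked term is required")
    counts = Counter()
    for ranks in rankings:
        if ranks[a] < ranks[b]:
            counts["better"] += 1
        elif ranks[a] > ranks[b]:
            counts["worse"] += 1
        else:
            counts["tie"] += 1
    total = len(rankings)
    return {key: counts[key] / total for key in ("better", "tie", "worse")}


# Hypothetical ranks given by one judge to four translated terms.
rankings = [
    {"CC": 1, "GEN": 2, "WEB": 1},
    {"CC": 2, "GEN": 1, "WEB": 1},
    {"CC": 1, "GEN": 1, "WEB": 2},
    {"CC": 3, "GEN": 2, "WEB": 1},
]

print(pairwise_outcomes(rankings, "CC", "GEN"))
print(pairwise_outcomes(rankings, "CC", "WEB"))
```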
30
Outline
1. Context and scope of work
2. Comparable corpora and terminology evaluation
3. Applicative evaluation protocol
4. Experimentation and results
5. Future improvements
31
Data
• Comparable corpora:
  – breast cancer: 400k words/language
  – water science: 2M words/language
• Texts to translate:
  – research paper abstracts: ~500 words/domain
  – lay science texts: ~500 words/domain
32
Translators' feedback
"Globally, 75% of technical words aren't in the glossary, and for the other 25%, 99% have between 10 and 20 candidate translations and none has been validated. So most of the time, you are just partly sure, but you are never totally sure of your translation. And in the worst cases, you translate instinctively."
• Translators were not prepared to use a bilingual terminology with many candidate translations
• The terminology only partially covered the vocabulary of the texts to translate
33
Terminology coverage of texts to translate
• Breast Cancer
  – 94% of the vocabulary of the texts is in the terminology
  – fine-grained topic
• Water Science
  – 14% of the vocabulary of the texts is in the terminology
  – topic is too general
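The slides do not define how these coverage figures were computed. The sketch below assumes coverage is the share of distinct vocabulary items of the text that appear among the source-side entries of the terminology. The function name and the word lists are hypothetical.

```python
def coverage(text_vocabulary, terminology_entries):
    """Share of the text's distinct vocabulary items found in the terminology."""
    vocabulary = {word.lower() for word in text_vocabulary}
    if not vocabulary:
        return 0.0
    entries = {entry.lower() for entry in terminology_entries}
    return len(vocabulary & entries) / len(vocabulary)


# Hypothetical vocabulary of a text and source-side terminology entries.
text_vocabulary = ["mammography", "carcinoma", "biopsy", "aquifer"]
terminology_entries = ["Mammography", "carcinoma", "biopsy", "tumour"]

# 3 of the 4 vocabulary items are covered.
print(coverage(text_vocabulary, terminology_entries))
```

A real measure would probably need lemmatisation and multi-word term matching, neither of which this sketch attempts.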
34
Quality judgement / Breast Cancer
BREAST CANCER (K = 0.25)

              SIT. 0 / GEN. LANG.   SIT. 1 / CC   SIT. 2 / WEB
correct       38%                   43%           47%
acceptable    42%                   38%           35%
wrong         20%                   19%           18%

• The proportion of wrong translations is equivalent across situations
• The Web yields the most correct translations, followed by the comparable corpora
35
Quality judgement / Water Science
• Translations are much better with the Web
• The comparable corpora produce worse translations than the general language resources

WATER SCIENCE (K = 0.42)

              SIT. 0 / GEN. LANG.   SIT. 1 / CC   SIT. 2 / WEB
correct       59%                   56%           77%
acceptable    23%                   23%           16%
wrong         18%                   21%           7%
36
Results seem inconsistent
• Translations produced in situation 1 are worse than translations produced in situation 0 (the baseline)
• But both situations share the same "general language resource" basis
BASELINE: general language resources
SITUATION 1: general language resources + terminology mined from comparable corpora
37
Possible explanation
When translators have a specialized resource, they tend to ignore the general language resource

                             BASELINE   SITUATION 1 (comparable corpora)   SITUATION 2 (Web)
General language resource    43%        14%                                3%
Specialized resource         -          25%                                56%
Intuition                    79%        77%                                44%
38
Possible explanation
If translators of situation 1 had always looked up the general resource first, translations of situation 1 would have been at least as good as translations of situation 0
39
Ranking / Breast Cancer
[Bar chart, BREAST CANCER, K = 0.69. Values as extracted, series legend not available. CC vs. GEN. LANG.: 28%, 47%, 26%. CC vs. WEB: 26%, 42%, 32%.]
40
Ranking / Water science
[Bar chart, WATER SCIENCE, K = 0.63. Values as extracted, series legend not available. CC vs. GEN. LANG.: 18%, 49%, 33%. CC vs. WEB: 16%, 41%, 43%.]
41
Outline
1. Context and scope of work
2. Comparable corpora and terminology evaluation
3. Applicative evaluation protocol
4. Experimentation and results
5. Future improvements
42
Improvement 1: terminology coverage
• The added value of the bilingual terminology depends on its coverage of the texts to translate
• Any added-value measure should also indicate to what extent the terminology contains the vocabulary of the translated texts
43
Improvement 1: terminology coverage
• Perspectives:
  – create a "coverage" measure
  – find the minimum coverage a terminology needs to be "useful" for translating a given text
  – gather smaller but finer-grained corpora
44
Improvement 2: situations of translation
• When translators have several resources at their disposal, they tend to ignore the general language resource
• Consequence: the same resource is used differently depending on the situation
• This seems to be the cause of the inconsistent results
45
Improvement 2: situations of translation
• Perspective: use 0 or 1 resource per situation of translation
  – Situation 0: no resource
  – Situation 1: terminology mined from comparable corpora
  – Situation 2: Web
46
Improvement 3: train translators
• Prepare translators to use "ambiguous", unvalidated terminologies
• Run a first trial evaluation to:
  – train the translators
  – train the judges → results in higher agreement
47
Acknowledgements
This work was funded by:
  – French National Research Agency, grant no. ANR-08-CORD-009
  – Lingua et Machina, www.lingua-et-machina.com
Annotators:
  – Clémence De Baudus
  – Mathieu Delage