Challenges in Predicting Machine Translation Utility for Human Post-Editors
Michael Denkowski and Alon Lavie
Language Technologies Institute, Carnegie Mellon University
October 29, 2012
Source Text → MT System → Fast Translation
Source Text → Translators → Good Translation
Good fast translation?
MT with Human Post-Editing
Source Text → MT System → Fast Translation → Translators → Good Fast Translation
Source Text → MT System → Fast Translation → Translators → Very Slow Re-Translation
Introduction
Utility prediction: We need to reliably predict the usability of automatic translations.
“Referenceless” utility prediction:
• Corresponds to confidence estimation task
• Confidence Estimation for post-editing (Specia, 2011)
• WMT 2012 Shared Quality (for post-editing) Estimation Task (Callison-Burch et al., 2012)
Reference-aided utility prediction
• Corresponds to MT evaluation task
• This work
This Work
Machine translation as a starting point for human translators
• Goal is utility for post-editing
• Compare post-editing to traditional adequacy-driven tasks
Examine results of a post-editing experiment
• Simulate a real-world localization scenario
• Examine challenges in predicting translation usefulness for human translators
Adequacy Tasks
Adequacy: semantic similarity to reference translations
Significant research efforts on improving end quality of machine translation:
• ACL Workshops on Statistical Machine Translation (Callison-Burch et al., 2011)
• NIST Open Machine Translation Evaluations (Przybocki et al., 2009)
Measured by absolute scores or rankings
Motivation: MT for user consumption, input for other NLP tasks
Post-Editing
Human-targeted translation edit rate (HTER, Snover et al., 2006)
1. Human translators correct MT output
2. Automatically calculate number of edits using TER
$$\mathrm{TER} = \frac{\text{\# of edits}}{\text{\# of reference words}}$$
Edits: insertion, deletion, substitution, block shift
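The ratio above can be sketched in a few lines. This is a simplified illustration, not the TER tool itself: it counts only insertions, deletions, and substitutions with a word-level edit distance and ignores block shifts, so it can overestimate true TER when words are reordered. The function names, the whitespace tokenization, and the hypothetical post-edited sentence are assumptions made for the example.

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def simple_ter(hypothesis, reference):
    """Edits divided by reference length; block shifts are NOT modeled."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(hyp, ref) / len(ref)


# For HTER, the "reference" is the human post-edited version of the MT output.
# The post-edit below is a hypothetical example with the period as its own token.
mt = "He had to pay lubosi G. half a million kronor ."
post_edited = "He had to pay Lubos G. half a million kronor ."
print(round(simple_ter(mt, post_edited), 2))
```

Dividing by the length of the post-edited sentence means that one corrected word in an 11-token sentence costs about 0.09, while the same correction in a shorter sentence costs more.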
Translation Example: WMT 2011 Czech–English Track
Ref: He was supposed to pay half a million to Lubos G.
1: He had for Lubosi G. to pay half a million crowns.
0.27
2: He had to pay lubosi G. half a million kronor.
0.09
Translation Example: WMT 2011 Czech–English Track
Ref: He was supposed to pay half a million to Lubos G.
1: He had for Lubosi G. to pay half a million crowns.
Post-edited 1: He had to pay Lubos G. half a million crowns.
0.27
2: He had to pay lubosi G. half a million kronor.
Post-edited 2: He had to pay Lubos G. half a million kronor.
0.09
Translation Example: WMT 2011 Czech–English Track
Ref: The problem is that life of the lines is two to four years.
1: The problem is that life is two lines, up to four years.
0.49 0.29
2: The problem is that the durability of lines is two or four years.
0.34 0.14
Translation Example: WMT 2011 Czech–English Track
Ref: The problem is that life of the lines is two to four years.
1: The problem is that life is two lines, up to four years.
Post-edited 1: The problem is that life of the lines is two to four years.
0.49 0.29
2: The problem is that the durability of lines is two or four years.
Post-edited 2: The problem is that the life of lines is two to four years.
0.34 0.14
MT Post-Editing Experiment
90 sentences from Google Docs documentation
Translated from English to Spanish by two systems:
• Microsoft Translator
• Moses system (Europarl)
180 MT outputs total
Sent to human translators at Kent State Institute for Applied Linguistics for post-editing
Translators never saw the reference translations
MT Post-Editing Experiment
Data collected from professional translators (in training):
Post-edited translations
Expert post-editing ratings:
1: No editing required
2: Minor editing, meaning preserved
3: Major editing, meaning lost
4: Re-translate
From parallel data:
Independent reference translations
MT Post-Editing Experiment
Evaluate post-edited results using standard MT evaluation metrics:
BLEU (Papineni et al., 2002):
• n-gram precision with a brevity penalty
TER (Snover et al., 2006):
• Minimum edit distance
Meteor (Denkowski and Lavie, 2011):
• Tunable alignment-based metric
Task: Reference-assisted utility prediction
MT Post-Editing Results
Average rating: 1.69
Average HTER: 12.4
Automatic metric scores:
               BLEU    TER     Meteor
Post-edited    79.2    12.4    90.0
MT vs Ref      31.7    49.5    58.2
Post vs Ref    34.1    48.3    59.2
MT Post-Editing Results
r          4-pt    BLEU    TER     Meteor
4-point    –       0.32    0.28    0.33
HTER       0.49    0.26    0.24    0.27
Metric correlation with post-editing scores
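A minimal sketch of how such a correlation could be computed, assuming that r in the table denotes Pearson's correlation coefficient over per-sentence scores (the slides do not say which coefficient or sign convention is used). The function name and the toy numbers are invented for illustration and are not data from the experiment.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Invented toy data: expert ratings (1 = best, 4 = worst) and a metric score
# where higher means better, so the raw correlation comes out negative.
ratings = [1, 2, 2, 3, 4]
metric_scores = [0.80, 0.65, 0.70, 0.40, 0.35]
print(pearson_r(ratings, metric_scores))
```

Because ratings and HTER are error-like (lower is better) while BLEU and Meteor are quality-like (higher is better), the sign of r depends on the pairing; comparing magnitudes is the safer reading when the convention is not stated.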
MT Post-Editing Experiment
Oracle experiment: tune Meteor to maximize correlation
How well can we (over)fit expert post-editing ratings?
The Meteor Metric
Flexible alignment:
Scoring features:
• Precision/Recall contribution (insertions, deletions)
• Fragmentation penalty (reordering)
• Content/function word contribution
• Flexible match weights
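The first two scoring features can be sketched as follows. This is a simplified version of the Meteor scoring formula that takes the alignment as given: it combines precision and recall into a weighted harmonic mean and applies a fragmentation penalty based on the number of contiguous matched chunks. It leaves out the content/function word weighting and the per-matcher weights, and the default values for `alpha`, `beta`, and `gamma` are illustrative placeholders, not tuned Meteor parameters.

```python
def meteor_like_score(matches, hyp_len, ref_len, chunks,
                      alpha=0.85, beta=0.2, gamma=0.6):
    """Simplified Meteor-style score from alignment statistics.

    matches: number of aligned words
    hyp_len, ref_len: lengths of hypothesis and reference in words
    chunks: number of contiguous, identically ordered matched runs
    alpha, beta, gamma: tunable parameters (placeholder defaults)
    """
    if matches == 0:
        return 0.0
    precision = matches / hyp_len
    recall = matches / ref_len
    # Weighted harmonic mean; alpha close to 1 favors recall.
    f_mean = (precision * recall) / (alpha * precision + (1 - alpha) * recall)
    # More chunks per matched word means more reordering, hence more penalty.
    fragmentation = chunks / matches
    penalty = gamma * fragmentation ** beta
    return (1 - penalty) * f_mean


# Same number of matches, different amounts of reordering.
print(meteor_like_score(matches=10, hyp_len=12, ref_len=11, chunks=2))
print(meteor_like_score(matches=10, hyp_len=12, ref_len=11, chunks=6))
```

Tuning in this setting means searching over the parameter values to maximize correlation with a set of human judgments, which is what makes an oracle experiment against post-editing ratings possible.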
MT Post-Editing Results
r          4-pt    BLEU    TER     Meteor    Meteor (oracle)
4-point    –       0.32    0.28    0.33      0.35
HTER       0.49    0.26    0.24    0.27      0.34
Metric correlation with post-editing scores
MT Post-Editing Experiment
Additional experiment: translation usability
Divide translations into two groups:
• Suitable for post-editing (1-2)
• Not suitable for post-editing (3-4)
Examine metric score distribution of each group
Assess metric ability to distinguish between usable and non-usable translations
Unfair advantage: reference translations
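The grouping and distribution steps described on this slide might look like the sketch below. The function names, the choice of ten equal-width bins over [0, 1], and the toy data are assumptions for illustration; the slides do not specify how the histograms were binned.

```python
def split_by_usability(ratings, scores):
    """Split metric scores into usable (rating 1-2) and non-usable (3-4)."""
    usable = [s for r, s in zip(ratings, scores) if r in (1, 2)]
    non_usable = [s for r, s in zip(ratings, scores) if r in (3, 4)]
    return usable, non_usable


def histogram(scores, bins=10):
    """Count scores in equal-width bins over [0, 1]; 1.0 goes in the last bin."""
    counts = [0] * bins
    for s in scores:
        index = min(int(s * bins), bins - 1)
        counts[index] += 1
    return counts


# Invented toy data: one expert rating and one metric score per sentence.
ratings = [1, 2, 2, 3, 4, 1]
scores = [0.85, 0.55, 0.35, 0.45, 0.15, 0.65]
usable, non_usable = split_by_usability(ratings, scores)
print("usable:    ", histogram(usable))
print("non-usable:", histogram(non_usable))
```

If a metric separated the two groups well, the two histograms would occupy mostly different bins; heavy overlap means no score threshold can reliably tell usable from non-usable output.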
Usability Experiment Results
[Figure: two histograms of sentence counts for usable vs. non-usable translations, one by BLEU score and one by oracle Meteor score (x-axes from 0.0 to 1.0)]
Larger Data Set
Are our results skewed by the small size of the data (180 sentences)?
WMT12 Quality Estimation Task:
1832 English-to-Spanish MT outputs
HTER scores and 5-point multiple-expert ratings
Run usability experiment with this data
WMT 2012 Quality Estimation Task Data
[Figure: two histograms of sentence counts for usable vs. non-usable translations, one by BLEU score and one by oracle Meteor score (x-axes from 0.0 to 1.0)]
Usability vs HTER
How well do experts and HTER agree?
[Figure: two histograms of sentence counts by HTER (0.0 to 1.0) for usable vs. non-usable translations; panels: Kent State and WMT 2012]
Usability vs HTER (WMT12)
[Figure: expert rating (1 to 5) plotted against HTER (0 to 100)]
Conclusions
MT for post-editing utility is a significantly different task from MT for adequacy
Current MT tools under-perform on predicting post-editing usability
Even metrics that use post-editing information (HTER) don’t match expert assessments
To improve post-editing usability, we need better data, better metrics, better MT systems
Conclusions
www.transcenter.info