Overview of Multidimensional Quality Metrics (QTLaunchPad)

Posted on 20-Jun-2015


Translation Quality Assessment: Five Easy Steps

Using Multidimensional Quality Metrics to improve quality assessment and management

Prepared by the QTLaunchPad project (info@qt21.eu)

Version 1.0 (26 April 2013)

Who does this apply to?

Requesters of translation services looking for relevant quality metrics

Language Service Providers (LSPs) delivering translation services to their clients

The following materials will apply to negotiation between requesters and providers

This description does not apply to individual translators (although they may want to be aware of the contents)

Step 1: Specifications

Basic questions about your project

E.g.,

What languages are you working in?

What is your subject field?

What sort of project is it (e.g., user interface, documentation, advertising)?

What technology are you using (MT, CAT, etc.)?

What register and style are you using?

Step 2: Select Metrics

Based on your specifications…

The MQM recommendation tool will either suggest a pre-defined metric used for similar projects or recommend a custom metric that applies to your project

You are free to modify the metric as needed

Create a metrics specification file that defines the issues to be examined and provides weights (descriptions of how important the issues are)

Metrics specification file can be used by an MQM-compliant tool
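As an illustration only, a metrics specification could be represented as structured data along the following lines. The slides do not show the actual file format, so the JSON layout, the class names, and the field names (`name`, `issues`, `issue_type`, `weight`, `notes`) are all assumptions.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class IssueSpec:
    """One issue type to be examined, with its relative importance."""
    issue_type: str
    weight: str          # "high", "medium", or "low"
    notes: str = ""


@dataclass
class MetricSpec:
    """A hypothetical metrics specification: a named list of issue types."""
    name: str
    issues: list[IssueSpec] = field(default_factory=list)

    def to_json(self) -> str:
        # Serialize so the specification can be exchanged between parties or tools.
        return json.dumps(asdict(self), indent=2)

    def issue_types(self) -> set[str]:
        # The set of issue types that this metric asks evaluators to examine.
        return {issue.issue_type for issue in self.issues}


metric = MetricSpec(
    name="example-metric",
    issues=[
        IssueSpec("Grammar", "high"),
        IssueSpec("Omission", "low", notes="Some information loss is acceptable"),
    ],
)
print(metric.to_json())
```

Keeping the weights as labels rather than numbers mirrors the description above, where weights describe how important the issues are; a scoring tool would still need to map them to numeric values.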

Step 3: Evaluation Method

Three options:

1. Sampling: Examine a portion of the text to determine whether to pass or fail the entire text. Sampling can utilize quality estimation for better results

2. Full error analysis: Review the entire text (needed for critical legal or safety texts)

3. Rubric: Rate the text on a numerical scale (suitable for quick assessment of suitability)
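A minimal sketch of the sampling option, assuming the text is already split into segments and that a hypothetical reviewer callback `count_errors` returns the number of errors found in a segment. The sample fraction and the pass threshold are illustrative values, not figures from MQM.

```python
import random
from typing import Callable, Sequence


def sample_passes(
    segments: Sequence[str],
    count_errors: Callable[[str], int],
    sample_fraction: float = 0.1,
    max_errors_per_segment: float = 0.5,
    seed: int = 0,
) -> bool:
    """Review a random portion of the text and pass or fail the whole text."""
    if not segments:
        raise ValueError("no segments to sample")
    sample_size = max(1, round(len(segments) * sample_fraction))
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn
    sample = rng.sample(list(segments), sample_size)
    errors = sum(count_errors(segment) for segment in sample)
    return errors / sample_size <= max_errors_per_segment


# Toy reviewer for demonstration: every "??" counts as one error.
segments = ["ok"] * 90 + ["bad ??"] * 10
print(sample_passes(segments, lambda s: s.count("??")))
```

A purely random draw is the simplest choice; the quality estimation mentioned above would instead bias the sample toward segments that are more likely to contain problems.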

Automated Metrics

If sampling is used, MQM’s quality estimation tools will help focus sampling on those parts of the text that need attention

Automatic metrics can be used in some cases where human evaluation is too expensive or time-consuming

Step 4: Evaluation

Evaluation…

Can be conducted by the requester or LSP in accordance with the agreement between the parties

Follows the method chosen in Step 3 (evaluation method)

Issues must match the metric chosen in Step 2: issues not found in the metric should not be considered errors
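The rule that only issues found in the chosen metric count as errors can be sketched as a simple filter. The tuple layout `(issue_type, severity)` and the sample annotations are assumptions made for illustration.

```python
def split_by_metric(annotations, metric_issue_types):
    """Separate annotations into those counted as errors and those ignored."""
    counted = [a for a in annotations if a[0] in metric_issue_types]
    ignored = [a for a in annotations if a[0] not in metric_issue_types]
    return counted, ignored


metric_issue_types = {"Grammar", "Mistranslation", "Omission"}
annotations = [
    ("Grammar", "minor"),
    ("Style", "minor"),        # not in the metric, so not an error here
    ("Omission", "major"),
]
counted, ignored = split_by_metric(annotations, metric_issue_types)
print(counted)
print(ignored)
```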

MQM provides capabilities

For human evaluation:

• Inline markup provides an audit trail: it allows independent verification of errors and helps ensure that issues are corrected

• Full reporting functions: see what types of errors are reported and understand where errors come from

For automatic evaluation:

• Integrated use of existing quality metrics to help provide evaluation

translate5

These capabilities are being integrated into an open-source editing tool, translate5 (http://www.translate5.net)

All results are free to implement in additional tools (both open source and proprietary)

Parties interested in development should contact info@qt21.eu

The source matters

Full MQM evaluation includes the source

Source quality evaluation can help identify reasons for problems and resolve them

Translators can be rewarded for addressing source deficiencies (scores over 100% are possible!)

Step 5: Scoring

Scoring Formula

(Q = whichever set of issues is being counted within the larger formula)

Provides consistency with LISA QA Model scoring method

Can be customized to support other legacy systems

Can be applied to individual parts of the overall formula, i.e., subscores for fluency, accuracy, grammar, etc. can be derived

Weights (not shown) can be used to adjust importance of various issue types

Scores help guide decisions

Scores are given on a 100% basis

Scores can be broken down into more fine-grained reports. E.g., a score of 96% could have 100% accuracy but 92% fluency. This helps target actions for quality control.

Example

1. Specifications

Parameter                Value
Language/Locale          Source: English; Target: Japanese
Subject field/domain     Medical
Text type                Narrative
Audience                 Educated readers with an interest in medicine
Purpose                  Education about a new procedure for managing diabetes
Register                 Moderately formal
Style                    no specified style – match source if possible
Content correspondence   Literal translation
Output modality          subtitles (speech to text)
File format              Time-coded XML for dotSub
Production technology    human translation

2. Recommended Metric

Issue type             Weight (high, medium, low)   Notes
Fluency
  Orthography          High
  Grammar              High
Accuracy
  Mistranslation       High
  Omission             Low                          Due to nature as captions, some information loss is expected. Captions should be 60% of spoken dialogue
  Untranslated         High
  Legal requirements   High                         Must make sure that legal claims are admissible under Japanese law

Chosen from…

Issue types are a subset of the full catalog of types

Quality Formula (1)

TQ = (Atr + At - As) + (Ft – Fs)

with respect to specifications

where

TQ = translation quality

Atr = accuracy (transfer)

At = accuracy for the target text

As = accuracy for the source text

Ft = fluency score for target text

Fs = fluency score for source text

Quality Formula (2)

TQ = (Atr + At - As) + (Ft – Fs)

with respect to specifications

Definition: A quality translation demonstrates required accuracy and fluency for the audience and purpose and complies with all other negotiated specifications, taking into account end-user needs.

The gold portion = dimensions (specifications)

3. Evaluation method

In this example, portions of the text are marketing: sampling is an acceptable evaluation method for these parts

Other portions contain legal and regulatory claims: full error analysis is required for those portions

Inline markup can be used via MQM namespace (because text is in XML) to ensure corrections are made.

4. Evaluation

• Evaluation includes subsegment markup with issues in metric

• Issues stored in MQM namespace to allow audit and revision

• Users can select three severity levels:

  • critical: the issue renders the text unusable

  • major: the issue leaves the text usable, but is an obstacle to understanding

  • minor: the issue does not impact usability of the text
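To make the idea of inline issue markup concrete, here is a sketch that wraps a flagged span in a namespaced XML element using Python's standard `xml.etree.ElementTree`. The namespace URI, the element name `issue`, the attribute names `type` and `severity`, and the sample sentence are hypothetical placeholders; the slides do not show the actual MQM markup.

```python
import xml.etree.ElementTree as ET

MQM_NS = "http://example.org/mqm"  # placeholder URI, not the real MQM namespace
ET.register_namespace("mqm", MQM_NS)

SEVERITIES = {"critical", "major", "minor"}  # the three levels listed above


def mark_issue(segment_text, start, end, issue_type, severity):
    """Return a <segment> element with text[start:end] wrapped in an issue tag."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    segment = ET.Element("segment")
    segment.text = segment_text[:start]
    issue = ET.SubElement(
        segment, f"{{{MQM_NS}}}issue", {"type": issue_type, "severity": severity}
    )
    issue.text = segment_text[start:end]
    issue.tail = segment_text[end:]  # text that follows the flagged span
    return segment


element = mark_issue("The pacient was discharged.", 4, 11, "Orthography", "minor")
print(ET.tostring(element, encoding="unicode"))
```

Because the annotation lives in its own namespace, the original text can be recovered by dropping the issue tags, and a reviewer can later check each marked span against the correction that was made.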

screenshot: translate5.net showing MQM markup tool

5. Scoring

Issue type             Weight   Minor   Major   Critical   Penalty   Adjusted   Total
Fluency
  Orthography          1.0      8       2       1          28        28         97.2%
  Grammar              1.0      6       2       0          16        16         98.4%
  Subtotal                                                           44         95.6%
Accuracy
  Mistranslation       1.0      4       0       0          4         4          99.6%
  Omission             0.2      12      4       1          42        8.4        99.2%
  Untranslated         1.0      1       0       0          1         1          99.9%
  Legal requirements   1.0      0       0       1          10        10         99.0%
  Subtotal                                                           23.4       97.7%
Total                                                                67.4       93.3%

Assumes 1000-word sample

Because Omission is considered a low priority in this case, it is given a low weight

Without weighting of Omission, the score would be 89.9%

We can see that the translator has more problems with fluency than with accuracy
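The figures in the scoring table can be reproduced with a few lines of arithmetic. This sketch assumes severity multipliers of 1, 5, and 10 for minor, major, and critical issues, and a score of 100% minus the adjusted penalty per word. Both are inferred from the Penalty and Total columns, since the slides do not state them explicitly.

```python
SEVERITY_POINTS = {"minor": 1, "major": 5, "critical": 10}  # inferred from the table
WORD_COUNT = 1000  # the example assumes a 1000-word sample

# (issue type, dimension, weight, minor, major, critical) from the example table
ROWS = [
    ("Orthography", "Fluency", 1.0, 8, 2, 1),
    ("Grammar", "Fluency", 1.0, 6, 2, 0),
    ("Mistranslation", "Accuracy", 1.0, 4, 0, 0),
    ("Omission", "Accuracy", 0.2, 12, 4, 1),
    ("Untranslated", "Accuracy", 1.0, 1, 0, 0),
    ("Legal requirements", "Accuracy", 1.0, 0, 0, 1),
]


def penalty(minor, major, critical):
    """Raw penalty points for one issue type."""
    return (minor * SEVERITY_POINTS["minor"]
            + major * SEVERITY_POINTS["major"]
            + critical * SEVERITY_POINTS["critical"])


def score(adjusted_penalty, word_count=WORD_COUNT):
    """Percentage score after subtracting penalty points per word."""
    return 100.0 * (1 - adjusted_penalty / word_count)


def total_penalty(rows, dimension=None, weighted=True):
    """Sum the (optionally weighted) penalties, optionally for one dimension."""
    return sum(
        penalty(minor, major, critical) * (weight if weighted else 1.0)
        for _, dim, weight, minor, major, critical in rows
        if dimension is None or dim == dimension
    )


for name in ("Fluency", "Accuracy", None):
    print(name or "Total", round(score(total_penalty(ROWS, name)), 1))
print("Unweighted total", round(score(total_penalty(ROWS, weighted=False)), 1))
```

Under these assumptions the sketch yields 95.6 for Fluency, 97.7 for Accuracy, and 93.3 overall, and 89.9 when the Omission weight is ignored, in line with the table and the unweighted figure quoted above.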

5. Full scoring (including source)

Issue type             Source   Target   Adjusted
Fluency
  Orthography          96.1%    97.2%    101.1%
  Grammar              99.0%    98.4%    99.6%
  Subtotal             95.1%    95.6%    ☞ 100.5%
Accuracy
  Mistranslation       (100%)   99.6%    99.6%
  Omission             (100%)   99.2%    99.2%
  Untranslated         (100%)   99.9%    99.9%
  Legal requirements   (100%)   99.0%    99.0%
  Subtotal             100%     97.7%    97.7%
Total                  95.1%    89.9%    98.2%

Assumes 1000-word sample. Source accuracy set to 100% for computational purposes.

5. Scoring (including source)

In many cases, some problems in a translation are not caused by the translator.

In this case, the translator fixed problems in the source, resulting in better quality for fluency in the target. The translator should be recognized for this work.
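The Adjusted column appears to credit the translator for deficiencies already present in the source. The sketch below assumes adjusted = target + (100% − source), which reproduces the Orthography (101.1%) and Fluency subtotal (100.5%) rows. The slides do not state the rule, and the Grammar row as printed (99.6%) does not follow it, so treat this as an inference rather than the MQM definition.

```python
def adjusted_score(source_pct, target_pct):
    """Target score plus credit for the points the source itself lost."""
    return target_pct + (100.0 - source_pct)


# (source %, target %) pairs taken from the full scoring table
rows = {
    "Orthography": (96.1, 97.2),
    "Fluency subtotal": (95.1, 95.6),
    "Mistranslation": (100.0, 99.6),  # source accuracy is set to 100%
}
for name, (source, target) in rows.items():
    print(f"{name}: {adjusted_score(source, target):.1f}%")
```

With this rule, an adjusted score above 100% arises whenever the target loses fewer points than the source did, which is how a translator who fixes source problems ends up with a fluency score over 100%.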

For more information

Please visit http://www.qt21.eu/launchpad/

Write to info@qt21.eu