Further evidence for a functionalist approach to translation quality evaluation
Sonia Colina
The University of Arizona
Colina (2008) proposes a componential-functionalist approach to translation quality evaluation and reports on the results of a pilot test of a tool designed according to that approach. The results show good inter-rater reliability and justify further testing. The current article presents an experiment designed to test the approach and tool. Data was collected during two rounds of testing. A total of 30 raters, consisting of Spanish, Chinese and Russian translators and teachers, were asked to rate 4–5 translated texts (depending on the language). Results show that the tool exhibits good inter-rater reliability for all language groups and texts except Russian, and suggest that the low reliability of the Russian raters' scores is unrelated to the tool itself. The findings are in line with those of Colina (2008).
Keywords: quality assessment, evaluation, rating, componential, functionalism, errors
0 Introduction
Recent US federal mandates (e.g. White House Executive Order 13166)1 requiring health care providers who are recipients of federal funds to provide language translation and interpretation for patients with limited English proficiency (LEP) have brought the long-standing issue of translation quality to a wider audience of health care professionals (e.g. managers, decision makers, industry stakeholders, private foundations), who generally feel unprepared to address the topic. A striking example of how challenging quality evaluation can be for health care organizations is illustrated by the experience of Hablamos Juntos, an initiative funded by the Robert Wood Johnson Foundation to develop practical solutions to language barriers to health care.
Several healthcare providers (including hospitals) working with the program identified what they believed were "the best" translations available. Eighty-seven
Target 21:2 (2009), 235–264. doi 10.1075/target.21.2.02col. issn 0924–1884 / e-issn 1569–9986. © John Benjamins Publishing Company
documents rated as highly satisfactory and recommended for replication were collected from the providers. Examination of these health education texts by doctorate-level Spanish language specialists resulted in quality being identified as a problem. Many of these texts were cumbersome to read, to the point that readers required the English originals to decipher the intended meanings of some translations. It became clear that these texts were potentially hampering health care quality and outcomes by not providing needed access to intended health care information for patients with limited English proficiency. Furthermore, health care administrators overseeing the translation processes that produced these texts had not identified quality as a problem and needed assistance assessing the quality of non-English written materials. It was this context that prompted the launch of the Translation Quality Assessment (TQA) project, funded as one of various HJ initiatives to improve communication between health providers and patients with limited English proficiency. The TQA project aims to design and test a research-based prototype tool that could be used by health care organizations to assess the quality of translated materials and to identify a wide range of quality levels. Colina (2008) describes the initial version of the tool and the first phase of testing. The results of a pilot experiment, also reported in Colina (2008), reveal good inter-rater reliability and provide justification for further testing. The current article presents a second experiment designed to test the approach and tool.
1 Translation quality revisited
Translation quality evaluation is probably one of the most controversial and intensely debated topics in translation scholarship and practice. Yet progress in this area does not seem to correlate with the intensity of the debate. One may wonder whether the situation is perhaps partly related to the diverse nature of the definitions of translation. In a field such as translation studies, filled with unstated, often culturally-dependent assumptions about the role of translation and translators, equivalence and literalness, translation norms and translation standards, it is not surprising that quality and evaluation have remained elusive to definition or standards. Current reviews of the literature offer support for this hypothesis (Colina 2008; House 2001; Lauscher 2000), as they reveal a multiplicity of views and priorities in the area of translation quality. In one recent overview, Colina (2008) classifies the various approaches into two major groups according to whether their orientation is experiential or theoretical; parts of that overview are reproduced here for ease of reference (see further Colina 2008).
1.1 Experiential approaches
Many methods of translation quality assessment fall within this category. They tend to be ad hoc, anecdotal marking scales developed for the use of a particular professional organization or industry, e.g. the ATA certification exam, the SAE J2450 Translation Quality Metric for the automotive industry, or the LISA QA tool for localization.2 While the scales are often adequate for the particular purposes of the organization that created them, they suffer from limited transferability, precisely due to the absence of theoretical and/or research foundations that would permit their transfer to other environments. For the same reason, it is difficult to assess the replicability and inter-rater reliability of these approaches.
1.2 Theoretical approaches
Recent theoretical, research-based approaches tend to focus on the user of a translation and/or the text. They have also been classified as equivalence-based or functionalist (Lauscher 2000). These approaches arise out of a theoretical framework or stated assumptions about the nature of translation; however, they tend to cover only partial aspects of quality, and they are often difficult to apply in professional or teaching contexts.
1.2.1 Reader-response approaches
Reader-response approaches evaluate the quality of a translation by assessing whether readers of the translation respond to it as readers of the source would respond to the original (Nida 1964; Carroll 1966; Nida and Taber 1969). The reader-response approach must be credited with recognizing the role of the audience in translation, more specifically of translation effects on the reader as a measure of translation quality. This is particularly noteworthy in an era when the dominant notion of 'text' was that of a static object on a page.
Yet the reader-response method is also problematic because, in addition to the difficulties inherent to the process of measuring reader response, the response of a reader may not be equally important for all texts, especially for those that are not reader-oriented (e.g. legal texts). The implication is that reader response will not be equally informative for all types of translation. In addition, this method addresses only one aspect of a translated text (i.e. equivalence of effect on the reader), ignoring others, such as the purpose of the translation, which may justify or even require a slightly different response from the readers of the translation. One also wonders if it is in fact possible to determine whether two responses are equivalent, as even monolingual texts can trigger non-equivalent reactions from slightly different groups of readers. Since in most cases the readership of a translated text is
different than that envisioned by the writer of the original,3 one can imagine the difficulties entailed by equating quality with equivalence of response. Finally, as with many other theoretical approaches, reader-response testing is time-consuming and difficult to apply to actual translations. At a minimum, careful selection of readers is necessary to make sure that they belong to the intended audience for the translation.
1.2.2 Textual and pragmatic approaches
Textual and pragmatic approaches have made a significant contribution to the field of translation evaluation by shifting the focus from counting errors at the word or sentence level to evaluating texts and translation goals, giving the reader and communication a much more prominent role. Yet despite these advances, none of these approaches can be said to have been widely adopted by either professionals or scholars.
Some models have been criticized because they focus too much on the source text (Reiss 1971) or on the target text (Skopos) (Reiss and Vermeer 1984; Nord 1997). Reiss argues that the text type and function of the source text is the most important factor in translation, and quality should be assessed with respect to it. For Skopos Theory, it is the text type and function of the translation that is of paramount importance in determining the quality of the translation.
House's (1997, 2001) functional-pragmatic model relies on an analysis of the linguistic-situational features of the source and target texts, a comparison of the two texts, and the resulting assessment of their match. The basic measure of quality is that the textual profile and function of the translation match those of the original, the goal being functional equivalence between the original and the translation. One objection that has been raised against House's functional model is its dependence on the notion of equivalence, often a vague and controversial term in translation studies (Hönig 1997). This is a problem because translations sometimes are commissioned for a somewhat different function than that of the original; in addition, a different audience and time may require a slightly different function than that of the source text (see Hönig 1997 for more on the problematic notion of equivalence). These scenarios are not contemplated by equivalence-based theories of translation. Furthermore, one can argue that what qualifies as equivalent is as variegated as the notion of quality itself. Other equivalence-based models of evaluation are Gerzymisch-Arbogast (2001), Neubert (1985) and Van den Broeck (1985). In sum, the reliance on an a priori notion of equivalence is problematic and limiting in descriptive as well as explanatory value.
An additional objection against textual and pragmatic approaches is that they are not precise about how evaluation is to proceed after the analysis of the source or the target text is complete, or after the function of the translation has been established
as the guiding criterion for making translation decisions. This obviously affects the ease with which the models can be applied to texts in professional settings. Hönig, for instance, after presenting some strong arguments for a functionalist approach to evaluation, does not offer any concrete instantiation of the model other than in the form of some general advice for translator trainers. He comes to the conclusion that "the speculative element will remain – at least as long as there are no hard and fast empirical data which serve to prove what a 'typical' reader's responses are like" (1997: 32).4 The same criticism regarding the difficulty involved in applying textual and theoretical models to professional contexts is raised by Lauscher (2000). She explores possible ways to bridge the gap between theoretical and practical quality assessment, concluding that "translation criticism could move closer to practical needs by developing a comprehensive translation tool" (2000: 164).
Other textual approaches to quality evaluation are the argumentation-centered approach of Williams (2001, 2004), in which evaluation is based on argumentation and rhetorical structure, and corpus-based approaches (Bowker 2001). The argumentation-centered approach is also equivalence-based, as "a translation must reproduce the argument structure of ST to meet minimum criteria of adequacy" (Williams 2001: 336). Bowker's corpus-based model uses "a comparatively large and carefully selected collection of naturally occurring texts that are stored in machine-readable form" as a benchmark against which to compare and evaluate specialized student translations. Although Bowker (2001) presents a novel, valuable proposal for the evaluation of students' translations, it does not provide specific indications as to how translations should be graded (2001: 346). In sum, argumentation and corpus-based approaches, although presenting crucial aspects of translation evaluation, are also complex and difficult to apply in professional environments (and – one could argue – in the classroom as well).
1.3 The functional-componential approach (Colina 2008)
Colina (2008) argues that current translation quality assessment methods have not achieved a middle ground between theory and applicability: while anecdotal approaches lack a theoretical framework, the theoretical models often do not contain testable hypotheses (i.e. they are non-verifiable) and/or are not developed with a view towards application in professional and/or teaching environments. In addition, she contends that theoretical models usually focus on partial aspects of translation (e.g. reader response, textual aspects, pragmatic aspects, relationship to the source, etc.). Perhaps due to practical limitations and the sheer complexity of the task, some of these approaches overlook the fact that quality in translation is a multifaceted reality and that a general, comprehensive approach to evaluation may need to address multiple components of quality simultaneously.
As a response to the inadequacies identified above, Colina (2008) proposes an approach to translation quality evaluation based on a theoretical approach (functionalist and textual models of translation) that can be applied in professional and educational contexts. In order to show the applicability of the model in practical settings, as well as to develop testable hypotheses and research questions, Colina and her collaborators designed a componential, functionalist, textual tool (henceforth the TQA tool) and pilot-tested it for inter-rater reliability (cf. Colina 2008 for more on the first version of this tool). The tool evaluates components of quality separately, consequently reflecting a componential approach to quality; it is also considered functionalist and textual, given that evaluation is carried out relative to the function and the characteristics of the audience specified for the translated text.
As mentioned above, it seems reasonable to hypothesize that disagreements over the definition of translation quality are rooted in the multiplicity of views of translation itself and in different priorities regarding quality components. It is often the case that a requester's view of quality will not coincide with that of the evaluators; yet without explicit criteria on which to base the evaluation, the evaluator can only rely on his/her own views. In an attempt to introduce flexibility with regard to different conditions influencing quality, the proposed TQA tool allows for a user-defined notion of quality, in which it is the user or requester who decides which aspects of quality are more important for his/her communicative purposes. This can be done either by adjusting customer-defined weights for each component or simply by assigning higher priorities to some components. Custom weighting of components is also important because the effect of a particular component on the whole text may also vary depending on textual type and function. An additional feature of the TQA tool is that it does not rely on a point-deduction system; rather, it tries to match the text under evaluation with one of several descriptors provided for each category/component of evaluation. In order to capture the descriptive, customer-defined notion of quality, the original tool was modified in the second experiment to include a cover sheet (see Appendix 1).
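The descriptor-matching, requester-weighted scoring just described can be sketched in code. The component codes (TL, FTA, MEAN, TERM) are those used later in the article; the per-descriptor point values below are illustrative placeholders, not the values of the actual Scoring Worksheet, with the component maxima playing the role of the requester-defined weights:

```python
# Sketch of a componential, requester-weighted score (illustrative values only).
# Each component is matched to one of four descriptors; each descriptor level
# carries a fixed number of points, and the component maxima act as the
# requester-defined weights, so that a perfect text totals 100.

COMPONENTS = ["TL", "FTA", "MEAN", "TERM"]  # component codes used in the article

# Hypothetical points for descriptor levels 1 (worst) to 4 (best), per component.
# The real worksheet assigns its own values; these merely sum to 100 at the top.
DESCRIPTOR_POINTS = {
    "TL":   [7.5, 15.0, 22.5, 30.0],
    "FTA":  [6.25, 12.5, 18.75, 25.0],
    "MEAN": [6.25, 12.5, 18.75, 25.0],
    "TERM": [5.0, 10.0, 15.0, 20.0],
}

def score(selected_levels):
    """selected_levels maps component code -> chosen descriptor level (1-4)."""
    return sum(DESCRIPTOR_POINTS[c][selected_levels[c] - 1] for c in COMPONENTS)

# A rater who matches the text to the top descriptor everywhere scores 100:
print(score({"TL": 4, "FTA": 4, "MEAN": 4, "TERM": 4}))  # -> 100.0
```

A requester who prioritizes terminology over style would simply raise the TERM point values relative to TL, without changing the descriptor texts themselves.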
The experiment in Colina (2008) sets out to test the functional approach to evaluation by testing the tool's inter-rater reliability. 37 raters and 3 consultants were asked to use the tool to rate three translated texts. The texts selected for evaluation consisted of reader-oriented health education materials. Raters were bilinguals, professional translators and language teachers. Some basic training was provided. Data was collected by means of the tool and a post-rating survey. Some differences in ratings could be ascribed to rater qualifications: teachers' and translators' ratings were more alike than those of bilinguals; bilinguals were found to rate higher and faster than the other groups. Teachers also tended to assign higher ratings than translators. It was shown that different types of raters were able to use
the tool without significant training. Pilot testing results indicate good inter-rater reliability for the tool and the need for further testing. The current paper focuses on a second experiment designed to further test the approach and tool proposed in Colina (2008).
2 Second phase of TQA testing: Methods and Results
2.1 Methods
One of the most important limitations of the experiment in Colina (2008) is in regard to the numbers and groups of participants. Given the project objective of ensuring applicability across languages frequently used in the USA, subject recruitment was done in three languages: Spanish, Russian and Chinese. As a result, resources and time for recruitment had to be shared amongst the languages, with smaller numbers of subjects per language group. The testing described in the current experiment includes more subjects and additional texts. More specifically, the study reported in this paper aims:
I. To test the TQA tool again for inter-rater reliability (i.e. to what degree trained raters use the TQA tool consistently) by answering the following questions:

Question 1: For each text, how consistently do all raters rate the text?
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
Question 3: How consistently do raters in the second session (Reliability) rate the texts?
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?
II. To compare the rating skills/behavior of translators and teachers: Is there a difference in scoring between translators and teachers? (Question 5, Section 2.2)
Data was collected during two rounds of testing: the first, referred to as the Benchmark Testing, included 9 raters; the second session, the Reliability Testing, included 21 raters. Benchmark and Reliability sessions consisted of a short training session followed by a rating session. Raters were asked to rate 4–5 translated texts (depending on the language) and had one afternoon and one night to complete the task. After their evaluation worksheets had been submitted, raters were required to submit a survey on their experience using the tool. They were paid for their participation.
2.1.1 Raters
Raters were drawn from the pool used for the pre-pilot and pilot testing sessions reported in Colina (2008) (see Colina [2008] for selection criteria and additional details). A call was sent via email to all those raters selected for the pre-pilot and pilot testing (including those who were initially selected but did not take part). All raters available participated in this second phase of testing.
As in Colina (2008), it was hypothesized that similar rating results would be obtained within the members of the same group. Therefore, raters were recruited according to membership in one of two groups: professional translators and language teachers (language professionals who are not professional translators). Membership was assigned according to the same criteria as in Colina (2008). All selected raters exhibited linguistic proficiency equivalent to that of a native (or near-native) speaker in the source and in one of the target languages.
Professional translators were defined as language professionals whose income comes primarily from providing translation services. Significant professional experience (5 years minimum; most had 12–20 years of experience), membership in professional organizations, and education in translation and/or a relevant field were also needed for inclusion in this group. Recruitment for these types of individuals was primarily through the American Translators Association (ATA). Although only two applicants were ATA certified, almost all were ATA affiliates (members).
Language teachers were individuals whose main occupation was teaching language courses at a university or other educational institution. They may have had some translation experience, but did not rely on translation as their source of income. A web search of teaching institutions with known foreign language programs was used for this recruitment. We reached out to schools throughout the country at both the community college and university levels. The definition of teacher did not preclude graduate student instructors.
Potential raters were assigned to the above groups on the basis of the information provided in their resume or curriculum vitae and a language background questionnaire included in a rater application.
The bilingual group in Colina (2008) was eliminated from the second experiment, as subjects were only available for one of the languages (Spanish). Translation competence models and research suggest that bilingualism is only one component of translation competence (Bell 1991; Cao 1996; Hatim and Mason 1997; PACTE 2008). Nonetheless, since evaluating translation products is not the same as translating, it is reasonable to hypothesize that other language professionals, such as teachers, may have the competence necessary to evaluate translations; this may be particularly true in cases such as the current project, in which the object of evaluation is not translator competence but translation products. This hypothesis would be borne out if the ratings provided by translators and teachers are similar.
As mentioned above, data was collected during two rounds of testing. The first one, the Benchmark Testing, included 9 raters (3 Russian, 3 Chinese, 3 Spanish); these raters were asked to evaluate 4–5 texts (per language) that had been previously selected as clearly of good or bad quality by expert consultants in each language. The second session, the Reliability Testing, included 21 raters, distributed as follows:
Spanish: 5 teachers, 3 translators (8)
Chinese: 3 teachers, 4 translators (7)
Russian: 3 teachers, 3 translators (6)
Differences across groups reflect general features of that language group in the US. Among the translators, the Russians had degrees in Languages, History and Translating, Engineering, and Nursing from Russian and US universities, and experience ranging from 12 to 22 years; the Chinese translators' experience ranged from 6 to 30 years, and their education included Chinese language and literature, Philosophy (MA), English (PhD), Neuroscience (PhD) and Medicine (MD), with degrees obtained in China and the US. Their Spanish counterparts' experience varied from 5 to 20 years, and their degrees included areas such as Education, Spanish and English Literature, Latin American Studies (MA) and Creative Writing (MA). The Spanish and Russian teachers were perhaps the most uniform groups, including college instructors (PhD students) with MAs in Spanish or Slavic Linguistics, Literature and Communication, and one college professor of Russian. With one exception, they were all native speakers of Spanish or Russian with formal education in the country of origin. Chinese teachers were college instructors (PhD students) with MAs in Chinese, one college professor (PhD in Spanish), and an elementary school teacher and tutor (BA in Chinese). They were all native speakers of Chinese.
2.1.2 Texts
As mentioned above, experienced translators serving as language consultants selected the texts to be used in the rating sessions. Three consultants were instructed to identify health education texts translated from English into their language. Texts were to be publicly available on the Internet. Half were to be very good and the other half were to be considered very poor on reading the text. Those texts were used for the Benchmark session of testing, during which they were rated by the consultants and two additional expert translators. The texts where there was the most agreement in rating were selected for the Reliability Testing. Reliability texts were comprised of five Spanish texts (three good and two bad), four Russian texts and four Chinese texts (two of good quality and two of bad quality for each of these languages), making up a total of thirteen texts.
2.1.3 Tool
The tool tested in Colina (2008) was modified to include a cover sheet consisting of two parts. Part I is to be completed by the person requesting the evaluation (i.e. the Requester) and read by the rater before he/she starts his/her work. It contains the Translation Brief, relative to which the evaluation must always take place, and the Quality Criteria, clarifying requester priorities among components. The TQA Evaluation Tool included in Appendix 1 contains a sample Part I as specified by Hablamos Juntos (the Requester) for the evaluation of a set of health education materials. The Quality Criteria section reflects the weights assigned to the four components in the Scoring Worksheet at the end of the tool. Part II of the Cover Sheet is to be filled in by the raters after the rating is complete. An Assessment Summary and Recommendation section was included to allow raters the opportunity to offer an action recommendation on the basis of their ratings, i.e. "What should the requester do now with this translation? Edit it? Minor or small edits? Redo it entirely?" An additional modification to the tool consisted of eliminating or adding descriptors so that each category would have an equal number of descriptors (four for each component) and revising the scores assigned so that the maximum number of points possible would be 100. Some minor stylistic changes were made in the language of the descriptors.
2.1.4 Rater Training
The Benchmark and Reliability sessions included training and rating sessions. The training provided was substantially the same offered in the pilot testing and described in Colina (2008). It focused on the features and use of the tool, and it consisted of PDF materials (delivered via email), a PowerPoint presentation based on the contents of the PDF materials, and a question-and-answer session delivered online via Internet and phone conferencing system.
Some revisions to the training reflect changes to the tool (including instructions on the new Cover Sheet), a few additional textual examples in Chinese, and a scored, completed sample worksheet for the Spanish group. Samples were not included for the other languages due to time and personnel constraints. The training served as a refresher for those raters who had already participated in the previous pilot training and rating (Colina 2008).5
2.2 Results
The results of the data collection were submitted to statistical analysis to determine to what degree trained raters use the TQA tool consistently.
Table 1 and Figures 1a and 1b show the overall (average) score of each text rated and the standard deviation of the individual rater scores around that overall score.
200-series texts are Spanish texts, 400s are Chinese, and 300s are Russian. The standard deviations range from 8.1 to 19.2 for Spanish, from 5.7 to 21.2 for Chinese, and from 16.1 to 29.0 for Russian.
Question 1: For each text, how consistently do all raters rate the text?
The standard deviations in Table 1 and Figures 1a and 1b offer a good measure of how consistently individual texts are rated. A large standard deviation suggests that there was less rater agreement (or that the raters differed more in their assessment). Figure 1b shows the average standard deviations per language. According to this, the Russian raters were the ones with the highest average standard deviation and the least consistent in their ratings. This is in agreement with the reliability coefficients shown below (Table 5), as the Russian raters have the lowest inter-rater reliability. Table 2 shows average scores, standard deviations and average standard deviations for each component of the tool, per text and per language. Figure 2 represents average standard deviations per component and per language. There does not appear to be an obvious connection between standard deviations and
Table 1. Average score of each text and standard deviation

Text     # of raters   Average Score   Standard Deviation
Spanish
210      11            91.8            8.1
214      11            89.5            11.3
215      11            86.8            15.0
228      11            48.6            19.2
235      11            56.4            18.5
Avg                                    14.42
Chinese
410      10            88.0            10.3
413      10            63.0            21.0
415      10            96.0            5.7
418      10            76.0            21.2
Avg                                    14.55
Russian
312      9             59.4            16.1
314      9             82.8            15.6
315      9             75.6            22.1
316      9             67.8            29.0
Avg                                    20.7
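The per-text figures in Table 1 are simple descriptive statistics over the individual rater scores. As a minimal sketch of the computation (the rater scores below are hypothetical, since the article reports only the aggregates, and the article does not state which SD estimator was used):

```python
import statistics

# Hypothetical scores from 11 raters for one text; only the computation,
# not the data, comes from the article.
rater_scores = [95, 90, 88, 92, 85, 97, 89, 93, 91, 86, 94]

avg = statistics.mean(rater_scores)
sd = statistics.stdev(rater_scores)  # sample SD, an assumption on our part

print(f"Average score: {avg:.1f}, SD: {sd:.1f}")
```

A text on which raters disagree strongly (e.g. scores spread from 40 to 95) would show a much larger SD under the same computation, which is exactly how Table 1 flags the Russian texts.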
[Figure 1a. Average score and standard deviation per text (x-axis: text number, 210–316; y-axis: 0–100).]
[Figure 1b. Average standard deviations per language (Spanish, Chinese, Russian; y-axis: 0–25).]
components. Although generally the components Target Language (TL) and Functional and Textual Adequacy (FTA) have higher standard deviations (i.e. ratings are less consistent), this is not always the case, as seen in the Chinese data (FTA). One would in fact expect the FTA category to exhibit the highest standard deviations, given its more holistic nature; yet the data do not bear out this hypothesis, as the TL component also shows standard deviations that are higher than Non-Specialized Content (MEAN) and Specialized Content and Terminology (TERM).
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
The inter-rater reliability for the Spanish and for the Chinese raters is remarkable; however, the inter-rater reliability for the Russian raters is too low (Table 3).
Table 2. Average scores and standard deviations for four components per text and per language

                     TL            FTA           MEAN          TERM
Text     Raters      Mean   SD     Mean   SD     Mean   SD     Mean   SD
Spanish
210      11          27.7   2.6    23.6   2.3    22.7   2.6    17.7   3.4
214      11          27.3   4.7    20.9   7.0    23.2   2.5    18.2   3.4
215      11          28.6   2.3    22.3   4.7    18.2   6.8    17.7   3.4
228      11          15.0   7.7    11.4   6.0    10.9   6.3    11.4   4.5
235      11          15.9   8.3    12.3   6.5    13.6   6.4    14.5   4.7
Avg SD                      5.12          5.3           4.92          3.88
Chinese
410      10          27.0   4.8    22.0   4.8    21.0   4.6    18.0   2.6
413      10          18.0   9.5    16.5   5.8    14.0   5.2    14.5   3.7
415      10          28.5   2.4    25.0   0.0    23.5   2.4    19.0   2.1
418      10          22.5   6.8    21.0   4.6    16.0   7.7    16.5   4.1
Avg SD                      5.875         3.8           4.975         3.125
Russian
312      9           18.3   7.1    15.0   6.1    13.3   6.6    12.8   4.4
314      9           25.6   6.3    21.7   5.0    19.4   3.9    16.1   4.2
315      9           23.3   9.4    18.3   7.9    17.8   4.4    16.1   4.2
316      9           20.0   10.3   16.7   7.9    17.2   7.1    13.9   6.5
Avg SD                      8.275         6.725         5.5           4.825
Avg SD (all lgs)            6.3           5.3           5.1           3.9
This, in conjunction with the Reliability Testing results, leads us to believe in the presence of other, unknown factors, unrelated to the tool, responsible for the low reliability of the Russian raters.
Question 3: How consistently do raters in the second session (Reliability) rate the texts? How do the reliability coefficients compare for the Benchmark and the Reliability Testing?
The results of the reliability raters mirror those of the benchmark raters: the Spanish raters achieve a very good inter-rater reliability coefficient and the Chinese raters an acceptable one, but the inter-rater reliability for the Russian raters is very low (Table 4).
Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating), but the Spanish raters achieved remarkable inter-rater reliability at both rating sessions. The slight drop among the Russian raters from the first to the second session is negligible; in any case, the inter-rater reliability is too low.
Figure 2 Average standard deviations per tool component and per language
Table 3 Reliability coefficients for benchmark ratings
Reliability coefficient
Spanish  .953
Chinese  .973
Russian  .128
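The article does not name the statistic behind the reliability coefficients in Table 3. A common choice for this texts-by-raters design is Cronbach's alpha with raters treated as "items"; the sketch below works under that assumption, with invented scores:

```python
# Sketch of an inter-rater reliability coefficient, assuming Cronbach's
# alpha with raters as items. The rater scores below are invented and do
# not reproduce the study's data.
from statistics import pvariance

def cronbach_alpha(scores_by_rater):
    """scores_by_rater: one inner list per rater, aligned by text
    (every rater scores the same texts in the same order)."""
    k = len(scores_by_rater)
    rater_vars = [pvariance(r) for r in scores_by_rater]
    totals = [sum(col) for col in zip(*scores_by_rater)]  # per-text totals
    return k / (k - 1) * (1 - sum(rater_vars) / pvariance(totals))

# three hypothetical raters scoring the same four texts (0-100 totals)
raters = [
    [93, 63, 96, 69],
    [90, 63, 95, 92],
    [80, 63, 95, 61],
]
print(round(cronbach_alpha(raters), 3))  # → 0.892
```

Values near 1 indicate that raters rank the texts consistently; values near 0, as with the Russian group, indicate that agreement is barely above chance.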
Further evidence for a functionalist approach to translation quality evaluation 249
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?
The coefficients for the Spanish raters show very good reliability, with excellent coefficients for the first three components; the numbers for the Chinese raters are also very good, but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6).
Table 6 Reliability coefficients for the four components of the tool (all raters per language group)
          TL      FTA     MEAN    TERM
Spanish   .952    .929    .926    .848
Chinese   .844    .844    .864    .783
Russian   .367    .479    .492    .292
In sum, very good reliability was obtained for the Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing), as well as for all components of the tool. Reliability scores for the Russian raters are low. These results are in agreement with the standard deviation data presented in Tables 1–2, Figures 1a and 1b, and Figure 2. All of this leads us to believe that whatever the cause for the Russian coefficients, it was not related to the tool itself.
Question 5: Is there a difference in scoring between translators and teachers?
Table 7a and Table 7b show the scoring, in terms of average scores and standard deviations, for the translators and the teachers for all texts. Figures 3 and 4 show the mean scores and times for Spanish raters, comparing teachers and translators.
Table 4 Reliability coefficients for Reliability Testing
Reliability coefficient
Spanish  .934
Chinese  .780
Russian  .118
Table 5 Inter-rater reliability Benchmark and Reliability Testing
          Benchmark reliability coefficient    Reliability coefficient (for Reliability Testing)
Spanish   .953                                 .934
Chinese   .973                                 .780
Russian   .128                                 .118
Table 7a Average scores and standard deviations for consultants and translators

              Score             Time
Text      Mean     SD       Mean     SD
210       93.3     7.5      75.8     59.4
214       93.3     12.1     94.2     101.4
215       85.0     17.9     36.3     18.3
228       46.7     20.7     37.5     22.3
235       46.7     18.6     49.5     38.9
410       91.4     7.5      46.0     22.1
413       62.9     21.0     40.7     13.7
415       96.4     4.8      26.1     15.4
418       69.3     22.1     52.4     22.2
312       52.5     15.1     26.7     2.6
314       88.3     10.3     22.5     4.2
315       74.2     26.3     28.7     7.8
316       63.3     32.7     25.8     6.6
Table 7b Average scores and standard deviations for teachers

              Score             Time
Text      Mean     SD       Mean     SD
210       90.0     9.4      63.6     39.7
214       85.0     9.4      67.0     41.8
215       89.0     12.4     36.0     30.5
228       51.0     19.5     38.0     31.7
235       68.0     10.4     57.6     40.2
410       80.0     13.2     61.0     27.7
413       63.3     25.7     71.0     24.6
415       95.0     8.7      41.0     11.5
418       91.7     5.8      44.0     6.6
312       73.3     5.8      55.0     56.7
314       71.7     20.8     47.7     62.7
315       78.3     14.4     37.7     45.5
316       76.7     22.5     46.7     63.5
The corresponding data for Chinese appear in Figures 5 and 6, and for Russian in Figures 7 and 8.
Spanish teachers tend to rate somewhat higher than translators (3 out of 5 texts) and spend more time rating than translators (all texts).
As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5). Only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).
Despite the low inter-rater reliability among Russian raters, the same trend found with the Chinese and the Spanish emerges when comparing Russian translators and teachers: Russian teachers rate similarly to or slightly higher than translators, and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).
In order to investigate the irregular behavior of the Russian raters and to try to obtain an explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect a relatively high (negative) correlation, because of the inverse relationship between a high score and a low recommendation. As illustrated in the three sub-tables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar for the Chinese raters, all of whom correlate very highly
Figure 3 Mean scores for Spanish raters
Figure 4 Time for Spanish raters
Figure 5 Mean scores for Chinese raters
Figure 6 Time for Chinese raters
Figure 7 Mean scores for Russian raters
between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.00 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK and RS01NM) do not show a high correlation between their recommendations and their total scores. A closer look, especially at these raters, is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independent of the tool design and tool features, as the scores and overall recommendation do not correlate as highly as expected.
Figure 8 Time for Russian raters
Table 8 (3 sub-tables) Correlation between recommendation and total score

8.1 Spanish raters

SP04AR   SP01JC   SP01VS   SP02JA   SP02LA   SP02PB   SP02AB   SP01PC   SP01CC   SP02MC   SP01PS
−0.923   −0.958   −0.854   −0.938   −0.966   −0.421   −0.942   −0.975   −0.913   −0.981   −0.938

8.2 Chinese raters

CH01RL   CH04YY   CH01AX   CH02AC   CH02JG   CH01KG   CH02AH   CH01BJ   CH01CK   CH01FL
−0.935   −0.980   −0.996   −0.894   −1.000   −0.955   −0.980   −0.867   −0.943   −0.926

8.3 Russian raters

RS01EG   RS01EM   RS04GN   RS02NB   RS02LB   RS02MK   RS01SM   RS01NM   RS01RW
−0.998   −0.115   −0.933   −1.000   n/a      −0.500   −0.982   −0.500   −0.993
3 Conclusions
As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been; further research with Russian teachers and translators may provide insights into the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context, where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g., industry, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.
In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.
Notes
The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including the Technical Reports, Manuals and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.
1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.
2. www.sae.org; www.lisa.org/products/qamodel
3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.
4. Note the reference to reader response within a functionalist framework.
5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected who had not participated in the training and rating sessions of the previous experiment. Given the low number, the researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).
References
Bell, Roger T. 1991. Translation and Translating. London: Longman.
Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation." Meta 46:2. 345–364.
Cao, Deborah. 1996. "A Model of Translation Proficiency." Target 8:2. 325–340.
Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations." Mechanical Translation 9:3–4. 55–66.
Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.
Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach." The Translator 14:1. 97–134.
Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation." Meta 46:2. 227–242.
Hatim, Basil, and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.
Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment." Current Issues in Language and Society 4:1. 6–34.
House, Julianne. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.
House, Julianne. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation." Meta 46:2. 243–257.
Lauscher, S. 2000. "Translation Quality-Assessment: Where Can Theory and Practice Meet?" The Translator 6:2. 149–168.
Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.
Nida, Eugene. 1964. Toward a Science of Translation. Leiden: Brill.
Nida, Eugene, and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.
Nord, Christianne. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.
PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'." John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.
Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hüber.
Reiss, Katharina, and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translations-Theorie. Tübingen: Niemeyer.
Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function." Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.
Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment." Meta 46:2. 326–344.
Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centered Approach. Ottawa: University of Ottawa Press.
Résumé (translated from the French)

Colina (2008) proposes a componential and functional approach to the evaluation of translation quality and reports on the results of a pilot test of a tool designed for that approach. The results show a high rate of inter-rater reliability and justify the continuation of testing. This article presents an experiment designed to test the approach as well as the tool. Data were collected during two testing periods. A group of 30 raters, composed of Spanish, Chinese and Russian translators and teachers, evaluated 4 or 5 translated texts. The results show that the tool ensures a good rate of inter-rater reliability for all language and text groups with the exception of Russian; they also suggest that the low reliability of the scores obtained by the Russian raters is unrelated to the tool itself. These findings confirm those of Colina (2008).

Keywords: quality, testing, evaluation, rating, componential, functionalism, errors
Appendix 1 Tool
Benchmark Rating Session
Time Rating Starts: ______        Time Rating Ends: ______
Translation Quality Assessment ndash Cover Sheet For Health Education Materials
PART I To be completed by Requester
Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text
Requester
TitleDepartment Delivery Date
TRANSLATION BRIEF
Source Language Target Language
Spanish Russian Chinese
Text Type
Text Title
Target Audience
Purpose of Document
PRIORITY OF QUALITY CRITERIA

Rank EACH from 1 to 4 (1 being top priority):
____ Target Language
____ Functional and Textual Adequacy
____ Non-Specialized Content (Meaning)
____ Specialized Content and Terminology
PART II To be completed by TQA Rater
Rater (Name) Date Completed
Contact Information Date Received
Total Score Total Rating Time
ASSESSMENT SUMMARY AND RECOMMENDATION
Publish andor use as is
Minor edits needed before publishing
Major revision needed before publishing
Redo translation
(To be completed after evaluating translated text)
Translation will not be an effective communication strategy for this text Explore other options (eg create new target language materials)
NotesRecommended Edits
RATING INSTRUCTIONS
1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.
2. Check the description that best fits the text given in each one of the categories.
3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.
4. Examples or comments are not required, but they can be useful to help support your decisions or to provide a rationale for your descriptor selection.
1 TARGET LANGUAGE
Category (check one box)  Description

1a  The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that it cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b  The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c  Although the target text is generally readable, there are problems and awkward expressions, resulting in most cases from unnecessary transfer from the source text.

1d  The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.
Examples/Comments:
2 FUNCTIONAL AND TEXTUAL ADEQUACY
Category (check one box)  Description

2a  Disregard for the goals, purpose, function and audience of the text. The text was translated without considering textual units, textual purpose, genre, needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b  The translated text gives some consideration to the intended purpose and audience for the translation but misses some important aspects of it (e.g., level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c  The translated text approximates the goals, purpose (function) and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d  The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to cultural needs and characteristics of the audience. Minor or no edits needed.
Examples/Comments:
3 NON-SPECIALIZED CONTENT-MEANING
Category (check one box)  Description

3a  The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b  There have been some changes in meaning, omissions and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c  Minor alterations in meaning, additions or omissions.

3d  The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions or additions. Slight nuances and shades of meaning have been rendered adequately.
Examples/Comments:
4 SPECIALIZED CONTENT AND TERMINOLOGY
Category (check one box)  Description

4a  Reveals unawareness/ignorance of specialized terminology and/or insufficient knowledge of specialized content.

4b  Serious/frequent mistakes involving terminology and/or specialized content.

4c  A few terminological errors, but the specialized content is not seriously affected.

4d  Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.
Examples/Comments:
TOTAL SCORE
SCORING WORKSHEET

Component                               Category   Value   Score
Target Language                         1a         5
                                        1b         15
                                        1c         25
                                        1d         30
Functional and Textual Adequacy         2a         5
                                        2b         10
                                        2c         20
                                        2d         25
Non-Specialized Content                 3a         5
                                        3b         10
                                        3c         20
                                        3d         25
Specialized Content and Terminology     4a         5
                                        4b         10
                                        4c         15
                                        4d         20
Tally Sheet
Component                               Category Rating    Score Value
Target Language
Functional and Textual Adequacy
Non-Specialized Content
Specialized Content and Terminology
Total Score
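The Scoring Worksheet arithmetic reduces to a lookup and a sum: one checked descriptor per component, each mapped to a point value, with a maximum total of 100. A minimal sketch using the category values listed above:

```python
# Minimal sketch of the Scoring Worksheet: each component's checked
# descriptor (a-d) maps to a point value, and the total score is the sum.
CATEGORY_VALUES = {
    "Target Language":                     {"1a": 5, "1b": 15, "1c": 25, "1d": 30},
    "Functional and Textual Adequacy":     {"2a": 5, "2b": 10, "2c": 20, "2d": 25},
    "Non-Specialized Content":             {"3a": 5, "3b": 10, "3c": 20, "3d": 25},
    "Specialized Content and Terminology": {"4a": 5, "4b": 10, "4c": 15, "4d": 20},
}

def total_score(ratings):
    """ratings: component name -> checked category, e.g. 'Target Language': '1c'."""
    return sum(CATEGORY_VALUES[comp][cat] for comp, cat in ratings.items())

# a rater who checks the top descriptor for every component
best = {"Target Language": "1d", "Functional and Textual Adequacy": "2d",
        "Non-Specialized Content": "3d", "Specialized Content and Terminology": "4d"}
print(total_score(best))  # → 100
```

Note the unequal maxima (30/25/25/20): the components are weighted, with Target Language carrying the most points, which matches the priority ranking requested in the translation brief.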
Appendix 2 Text sample
Authorrsquos address
Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu
documents rated as highly satisfactory and recommended for replication were collected from the providers. Examination of these health education texts by doctorate-level Spanish language specialists resulted in quality being identified as a problem. Many of these texts were cumbersome to read, to the point that readers required the English originals to decipher the intended meanings of some translations. It became clear that these texts were potentially hampering health care quality and outcomes by not providing the needed access to intended health care information for patients with limited English proficiency. Furthermore, health care administrators overseeing the translation processes that produced these texts had not identified quality as a problem and needed assistance assessing the quality of non-English written materials. It was this context that prompted the launch of the Translation Quality Assessment (TQA) project, funded as one of various HJ initiatives to improve communication between health providers and patients with limited English proficiency. The TQA project aims to design and test a research-based prototype tool that health care organizations can use to assess the quality of translated materials and that is able to identify a wide range of quality levels. Colina (2008) describes the initial version of the tool and the first phase of testing. The results of a pilot experiment, also reported in Colina (2008), reveal good inter-rater reliability and provide justification for further testing. The current article presents a second experiment designed to test the approach and tool.
1 Translation quality revisited
Translation quality evaluation is probably one of the most controversial and intensely debated topics in translation scholarship and practice. Yet progress in this area does not seem to correlate with the intensity of the debate. One may wonder whether the situation is perhaps partly related to the diverse nature of the definitions of translation. In a field such as translation studies, filled with unstated, often culturally-dependent assumptions about the role of translation and translators, equivalence and literalness, translation norms and translation standards, it is not surprising that quality and evaluation have remained elusive to definition or standards. Current reviews of the literature offer support for this hypothesis (Colina 2008; House 2001; Lauscher 2000), as they reveal a multiplicity of views and priorities in the area of translation quality. In one recent overview, Colina (2008) classifies the various approaches into two major groups according to whether their orientation is experiential or theoretical; parts of that overview are reproduced here for ease of reference (see further Colina 2008).
1.1 Experiential approaches
Many methods of translation quality assessment fall within this category. They tend to be ad hoc, anecdotal marking scales developed for the use of a particular professional organization or industry, e.g., the ATA certification exam, the SAE J2450 Translation Quality Metric for the automotive industry, or the LISA QA tool for localization.2 While the scales are often adequate for the particular purposes of the organization that created them, they suffer from limited transferability, precisely due to the absence of theoretical and/or research foundations that would permit their transfer to other environments. For the same reason, it is difficult to assess the replicability and inter-rater reliability of these approaches.
1.2 Theoretical approaches
Recent theoretical, research-based approaches tend to focus on the user of a translation and/or the text. They have also been classified as equivalence-based or functionalist (Lauscher 2000). These approaches arise out of a theoretical framework or stated assumptions about the nature of translation; however, they tend to cover only partial aspects of quality, and they are often difficult to apply in professional or teaching contexts.
1.2.1 Reader-response approaches
Reader-response approaches evaluate the quality of a translation by assessing whether readers of the translation respond to it as readers of the source would respond to the original (Nida 1964; Carroll 1966; Nida and Taber 1969). The reader-response approach must be credited with recognizing the role of the audience in translation, more specifically of translation effects on the reader as a measure of translation quality. This is particularly noteworthy in an era when the dominant notion of 'text' was that of a static object on a page.
Yet the reader-response method is also problematic because, in addition to the difficulties inherent to the process of measuring reader response, the response of a reader may not be equally important for all texts, especially for those that are not reader-oriented (e.g., legal texts). The implication is that reader response will not be equally informative for all types of translation. In addition, this method addresses only one aspect of a translated text (i.e., equivalence of effect on the reader), ignoring others, such as the purpose of the translation, which may justify or even require a slightly different response from the readers of the translation. One also wonders if it is in fact possible to determine whether two responses are equivalent, as even monolingual texts can trigger non-equivalent reactions from slightly different groups of readers. Since in most cases the readership of a translated text is
different than that envisioned by the writer of the original,3 one can imagine the difficulties entailed by equating quality with equivalence of response. Finally, as with many other theoretical approaches, reader-response testing is time-consuming and difficult to apply to actual translations. At a minimum, careful selection of readers is necessary to make sure that they belong to the intended audience for the translation.
1.2.2 Textual and pragmatic approaches
Textual and pragmatic approaches have made a significant contribution to the field of translation evaluation by shifting the focus from counting errors at the word or sentence level to evaluating texts and translation goals, giving the reader and communication a much more prominent role. Yet despite these advances, none of these approaches can be said to have been widely adopted by either professionals or scholars.
Some models have been criticized because they focus too much on the source text (Reiss 1971) or on the target text (Skopos) (Reiss and Vermeer 1984; Nord 1997). Reiss argues that the text type and function of the source text are the most important factors in translation, and that quality should be assessed with respect to them. For Skopos Theory, it is the text type and function of the translation that is of paramount importance in determining the quality of the translation.
House's (1997, 2001) functional pragmatic model relies on an analysis of the linguistic-situational features of the source and target texts, a comparison of the two texts, and the resulting assessment of their match. The basic measure of quality is that the textual profile and function of the translation match those of the original, the goal being functional equivalence between the original and the translation. One objection that has been raised against House's functional model is its dependence on the notion of equivalence, often a vague and controversial term in translation studies (Hönig 1997). This is a problem because translations are sometimes commissioned for a somewhat different function than that of the original; in addition, a different audience and time may require a slightly different function than that of the source text (see Hönig 1997 for more on the problematic notion of equivalence). These scenarios are not contemplated by equivalence-based theories of translation. Furthermore, one can argue that what qualifies as equivalent is as variegated as the notion of quality itself. Other equivalence-based models of evaluation are Gerzymisch-Arbogast (2001), Neubert (1985) and Van den Broeck (1985). In sum, the reliance on an a priori notion of equivalence is problematic and limiting in descriptive as well as explanatory value.
An additional objection against textual and pragmatic approaches is that they are not precise about how evaluation is to proceed after the analysis of the source or the target text is complete, or after the function of the translation has been established
as the guiding criteria for making translation decisions This obviously affects the ease with which the models can be applied to texts in professional settings Houmlnig for instance after presenting some strong arguments for a functionalist approach to evaluation does not offer any concrete instantiation of the model other than in the form of some general advice for translator trainers He comes to the conclusion that ldquothe speculative element will remain mdash at least as long as there are no hard and fast empirical data which serve to prove what a lsquotypicalrsquo readerrsquos responses are likerdquo (1997 32)4 The same criticism regarding the difficulty involved in applying textual and theoretical models to professional contexts is raised by Lauscher (2000) She explores possible ways to bridge the gap between theoretical and practical quality assessment concluding that ldquotranslation criticism could move closer to practical needs by developing a comprehensive translation toolrdquo (2000 164)
Other textual approaches to quality evaluation are the argumentation-centered approach of Williams (2001, 2004), in which evaluation is based on argumentation and rhetorical structure, and corpus-based approaches (Bowker 2001). The argumentation-centered approach is also equivalence-based, as "a translation must reproduce the argument structure of ST to meet minimum criteria of adequacy" (Williams 2001: 336). Bowker's corpus-based model uses "a comparatively large and carefully selected collection of naturally occurring texts that are stored in machine-readable form" as a benchmark against which to compare and evaluate specialized student translations. Although Bowker (2001) presents a novel, valuable proposal for the evaluation of students' translations, it does not provide specific indications as to how translations should be graded (2001: 346). In sum, argumentation and corpus-based approaches, although they address crucial aspects of translation evaluation, are also complex and difficult to apply in professional environments (and, one could argue, in the classroom as well).
1.3 The functional-componential approach (Colina 2008)
Colina (2008) argues that current translation quality assessment methods have not achieved a middle ground between theory and applicability: while anecdotal approaches lack a theoretical framework, the theoretical models often do not contain testable hypotheses (i.e., they are non-verifiable) and/or are not developed with a view towards application in professional and/or teaching environments. In addition, she contends that theoretical models usually focus on partial aspects of translation (e.g., reader response, textual aspects, pragmatic aspects, relationship to the source, etc.). Perhaps due to practical limitations and the sheer complexity of the task, some of these approaches overlook the fact that quality in translation is a multifaceted reality and that a general, comprehensive approach to evaluation may need to address multiple components of quality simultaneously.
As a response to the inadequacies identified above, Colina (2008) proposes an approach to translation quality evaluation based on a theoretical approach (functionalist and textual models of translation) that can be applied in professional and educational contexts. In order to show the applicability of the model in practical settings, as well as to develop testable hypotheses and research questions, Colina and her collaborators designed a componential, functionalist, textual tool (henceforth the TQA tool) and pilot-tested it for inter-rater reliability (cf. Colina 2008 for more on the first version of this tool). The tool evaluates components of quality separately, consequently reflecting a componential approach to quality; it is also considered functionalist and textual, given that evaluation is carried out relative to the function and the characteristics of the audience specified for the translated text.
As mentioned above, it seems reasonable to hypothesize that disagreements over the definition of translation quality are rooted in the multiplicity of views of translation itself and in different priorities regarding quality components. It is often the case that a requester's view of quality will not coincide with that of the evaluators, yet without explicit criteria on which to base the evaluation, the evaluator can only rely on his/her own views. In an attempt to introduce flexibility with regard to different conditions influencing quality, the proposed TQA tool allows for a user-defined notion of quality, in which it is the user or requester who decides which aspects of quality are more important for his/her communicative purposes. This can be done either by adjusting customer-defined weights for each component or simply by assigning higher priorities to some components. Custom weighting of components is also important because the effect of a particular component on the whole text may also vary depending on textual type and function. An additional feature of the TQA tool is that it does not rely on a point deduction system; rather, it tries to match the text under evaluation with one of several descriptors provided for each category/component of evaluation. In order to capture the descriptive, customer-defined notion of quality, the original tool was modified in the second experiment to include a cover sheet (see Appendix 1).
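The weighting-plus-descriptor mechanism can be made concrete with a small sketch. The component names follow the article (TL, FTA, MEAN, TERM), but the weights, descriptor point values, and function names below are hypothetical illustrations, not the actual TQA tool's values:

```python
# Hypothetical sketch of componential scoring: the rater matches each
# quality component to one of four descriptors, and requester-defined
# weights determine each component's contribution to the total.
# All numbers here are invented for illustration.

# Requester-defined weights (sum to 100, the tool's maximum score).
WEIGHTS = {"TL": 30, "FTA": 25, "MEAN": 25, "TERM": 20}

# Four descriptors per component, from worst (index 0) to best (index 3),
# expressed as a fraction of that component's weight.
DESCRIPTOR_FRACTIONS = [0.25, 0.50, 0.75, 1.00]

def component_score(component: str, descriptor_index: int) -> float:
    """Score one component from the descriptor the rater selected."""
    return WEIGHTS[component] * DESCRIPTOR_FRACTIONS[descriptor_index]

def total_score(selections: dict) -> float:
    """Sum the weighted component scores; the maximum is 100."""
    return sum(component_score(c, i) for c, i in selections.items())

# A rater who selects the best descriptor everywhere reaches 100.
print(total_score({"TL": 3, "FTA": 3, "MEAN": 3, "TERM": 3}))  # 100.0
```

Changing the requester's weights (e.g., giving TERM more weight for a terminology-critical text) changes the totals without altering the descriptors, which is the flexibility the approach aims for.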
The experiment in Colina (2008) sets out to test the functional approach to evaluation by testing the tool's inter-rater reliability. 37 raters and 3 consultants were asked to use the tool to rate three translated texts. The texts selected for evaluation consisted of reader-oriented health education materials. Raters were bilinguals, professional translators, and language teachers. Some basic training was provided. Data was collected by means of the tool and a post-rating survey. Some differences in ratings could be ascribed to rater qualifications: teachers' and translators' ratings were more alike than those of bilinguals; bilinguals were found to rate higher and faster than the other groups. Teachers also tended to assign higher ratings than translators. It was shown that different types of raters were able to use the tool without significant training. Pilot testing results indicate good inter-rater reliability for the tool and the need for further testing. The current paper focuses on a second experiment designed to further test the approach and tool proposed in Colina (2008).
2 Second phase of TQA testing: Methods and Results
2.1 Methods
One of the most important limitations of the experiment in Colina (2008) concerns the numbers and groups of participants. Given the project objective of ensuring applicability across languages frequently used in the USA, subject recruitment was done in three languages: Spanish, Russian, and Chinese. As a result, resources and time for recruitment had to be shared amongst the languages, with smaller numbers of subjects per language group. The testing described in the current experiment includes more subjects and additional texts. More specifically, the study reported in this paper aims:
I. To test the TQA tool again for inter-rater reliability (i.e., to what degree trained raters use the TQA tool consistently) by answering the following questions:

Question 1: For each text, how consistently do all raters rate the text?
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
Question 3: How consistently do raters in the second session (Reliability) rate the texts?
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?

II. To compare the rating skills/behavior of translators and teachers: Is there a difference in scoring between translators and teachers? (Question 5, Section 2.2)
Data was collected during two rounds of testing: the first, referred to as the Benchmark Testing, included 9 raters; the second session, the Reliability Testing, included 21 raters. Benchmark and Reliability sessions consisted of a short training session followed by a rating session. Raters were asked to rate 4–5 translated texts (depending on the language) and had one afternoon and one night to complete the task. After their evaluation worksheets had been submitted, raters were required to submit a survey on their experience using the tool. They were paid for their participation.
2.1.1 Raters
Raters were drawn from the pool used for the pre-pilot and pilot testing sessions reported in Colina (2008) (see Colina [2008] for selection criteria and additional details). A call was sent via email to all those raters selected for the pre-pilot and pilot testing (including those who were initially selected but did not take part). All raters available participated in this second phase of testing.
As in Colina (2008), it was hypothesized that similar rating results would be obtained within the members of the same group. Therefore, raters were recruited according to membership in one of two groups: professional translators and language teachers (language professionals who are not professional translators). Membership was assigned according to the same criteria as in Colina (2008). All selected raters exhibited linguistic proficiency equivalent to that of a native (or near-native) speaker in the source and in one of the target languages.
Professional translators were defined as language professionals whose income comes primarily from providing translation services. Significant professional experience (5 years minimum; most had 12–20 years of experience), membership in professional organizations, and education in translation and/or a relevant field were also needed for inclusion in this group. Recruitment for these types of individuals was primarily through the American Translators Association (ATA). Although only two applicants were ATA certified, almost all were ATA affiliates (members).
Language teachers were individuals whose main occupation was teaching language courses at a university or other educational institution. They may have had some translation experience but did not rely on translation as their source of income. A web search of teaching institutions with known foreign language programs was used for this recruitment; we reached out to schools throughout the country at both the community college and university levels. The definition of teacher did not preclude graduate student instructors.
Potential raters were assigned to the above groups on the basis of the information provided in their resume or curriculum vitae and a language background questionnaire included in a rater application.
The bilingual group in Colina (2008) was eliminated from the second experiment, as subjects were only available for one of the languages (Spanish). Translation competence models and research suggest that bilingualism is only one component of translation competence (Bell 1991; Cao 1996; Hatim and Mason 1997; PACTE 2008). Nonetheless, since evaluating translation products is not the same as translating, it is reasonable to hypothesize that other language professionals, such as teachers, may have the competence necessary to evaluate translations; this may be particularly true in cases such as the current project, in which the object of evaluation is not translator competence but translation products. This hypothesis would be borne out if the ratings provided by translators and teachers are similar.
As mentioned above, data was collected during two rounds of testing. The first one, the Benchmark Testing, included 9 raters (3 Russian, 3 Chinese, 3 Spanish); these raters were asked to evaluate 4–5 texts (per language) that had been previously selected as clearly of good or bad quality by expert consultants in each language. The second session, the Reliability Testing, included 21 raters, distributed as follows:
Spanish: 5 teachers, 3 translators (8)
Chinese: 3 teachers, 4 translators (7)
Russian: 3 teachers, 3 translators (6)
Differences across groups reflect general features of that language group in the US. Among the translators, the Russians had degrees in Languages, History and Translating, Engineering, and Nursing from Russian and US universities, and experience ranging from 12 to 22 years; the Chinese translators' experience ranged from 6 to 30 years, and their education included Chinese language and literature, Philosophy (MA), English (PhD), Neuroscience (PhD), and Medicine (MD), with degrees obtained in China and the US. Their Spanish counterparts' experience varied from 5 to 20 years, and their degrees included areas such as Education, Spanish and English Literature, Latin American Studies (MA), and Creative Writing (MA). The Spanish and Russian teachers were perhaps the most uniform groups, including college instructors (PhD students) with MAs in Spanish or Slavic Linguistics, Literature, and Communication, and one college professor of Russian. With one exception, they were all native speakers of Spanish or Russian with formal education in the country of origin. Chinese teachers were college instructors (PhD students) with MAs in Chinese, one college professor (PhD in Spanish), and an elementary school teacher and tutor (BA in Chinese). They were all native speakers of Chinese.
2.1.2 Texts
As mentioned above, experienced translators serving as language consultants selected the texts to be used in the rating sessions. Three consultants were instructed to identify health education texts translated from English into their language. Texts were to be publicly available on the Internet. Half were to be very good and the other half were to be considered very poor on reading the text. Those texts were used for the Benchmark session of testing, during which they were rated by the consultants and two additional expert translators. The texts on which there was the most agreement in rating were selected for the Reliability Testing. The Reliability texts comprised five Spanish texts (three good and two bad), four Russian texts, and four Chinese texts (for each of these two languages, two texts of good quality and two of bad quality), making up a total of thirteen texts.
2.1.3 Tool
The tool tested in Colina (2008) was modified to include a cover sheet consisting of two parts. Part I is to be completed by the person requesting the evaluation (i.e., the Requester) and read by the rater before he/she starts his/her work. It contains the Translation Brief, relative to which the evaluation must always take place, and the Quality Criteria, clarifying requester priorities among components. The TQA Evaluation Tool included in Appendix 1 contains a sample Part I as specified by Hablamos Juntos (the Requester) for the evaluation of a set of health education materials. The Quality Criteria section reflects the weights assigned to the four components in the Scoring Worksheet at the end of the tool. Part II of the Cover Sheet is to be filled in by the raters after the rating is complete. An Assessment Summary and Recommendation section was included to allow raters the opportunity to offer an action recommendation on the basis of their ratings, i.e., "What should the requester do now with this translation? Edit it? Minor or small edits? Redo it entirely?" An additional modification to the tool consisted of eliminating or adding descriptors so that each category would have an equal number of descriptors (four for each component), and revising the scores assigned so that the maximum number of points possible would be 100. Some minor stylistic changes were made in the language of the descriptors.
2.1.4 Rater Training
The Benchmark and Reliability sessions included training and rating sessions. The training provided was substantially the same offered in the pilot testing and described in Colina (2008). It focused on the features and use of the tool, and it consisted of PDF materials (delivered via email), a PowerPoint presentation based on the contents of the PDF materials, and a question-and-answer session delivered online via Internet and phone conferencing system.
Some revisions to the training reflect changes to the tool (including instructions on the new Cover Sheet), a few additional textual examples in Chinese, and a scored, completed sample worksheet for the Spanish group. Samples were not included for the other languages due to time and personnel constraints. The training served as a refresher for those raters who had already participated in the previous pilot training and rating (Colina 2008).5
2.2 Results
The results of the data collection were submitted to statistical analysis to determine to what degree trained raters use the TQA tool consistently.
Table 1 and Figures 1a and 1b show the overall score of each text rated and the standard deviation between the overall score and the individual rater scores.
200-series texts are Spanish texts, 400s are Chinese, and 300s are Russian. The standard deviations range from 8.1 to 19.2 for Spanish, from 5.7 to 21.2 for Chinese, and from 16.1 to 29.0 for Russian.
Question 1: For each text, how consistently do all raters rate the text?
The standard deviations in Table 1 and Figures 1a and 1b offer a good measure of how consistently individual texts are rated. A large standard deviation suggests that there was less rater agreement (or that the raters differed more in their assessment). Figure 1b shows the average standard deviations per language. According to this, the Russian raters were the ones with the highest average standard deviation and the least consistent in their ratings. This is in agreement with the reliability coefficients shown below (Table 5), as the Russian raters have the lowest inter-rater reliability. Table 2 shows average scores, standard deviations, and average standard deviations for each component of the tool, per text and per language. Figure 2 represents average standard deviations per component and per language. There does not appear to be an obvious connection between standard deviations and
Table 1. Average score of each text and standard deviation

Text    # of raters   Average score   Standard deviation
Spanish
210     11            91.8             8.1
214     11            89.5            11.3
215     11            86.8            15.0
228     11            48.6            19.2
235     11            56.4            18.5
Avg                                   14.42
Chinese
410     10            88.0            10.3
413     10            63.0            21.0
415     10            96.0             5.7
418     10            76.0            21.2
Avg                                   14.55
Russian
312      9            59.4            16.1
314      9            82.8            15.6
315      9            75.6            22.1
316      9            67.8            29.0
Avg                                   20.7
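The per-text figures in Table 1 can be reproduced from raw rater scores with a short script. The rater scores below are invented for illustration (the article does not publish raw scores, and it does not state whether a population or sample standard deviation was used; the population version is shown):

```python
# Sketch: the overall score of a text is the mean of its individual rater
# scores; the standard deviation measures rater agreement (larger SD =
# less agreement). Scores here are invented, not the study's data.
from statistics import mean, pstdev

ratings = {  # text id -> individual rater scores (illustrative)
    "210": [95, 90, 88, 92, 94],   # raters largely agree
    "228": [30, 55, 48, 70, 40],   # raters disagree widely
}

for text, scores in ratings.items():
    print(f"{text}: avg={mean(scores):.1f}, sd={pstdev(scores):.1f}")
```

On this invented data, the disagreeing raters of text "228" produce a much larger standard deviation than the agreeing raters of "210", which is exactly the pattern Question 1 uses to compare texts and language groups.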
Figure 1a. Average score and standard deviation per text (bar chart; x-axis: text number; y-axis: score, 0–100).
Figure 1b. Average standard deviations per language.
components. Although generally the components Target Language (TL) and Functional and Textual Adequacy (FTA) have higher standard deviations (i.e., ratings are less consistent), this is not always the case, as seen in the Chinese data (FTA). One would in fact expect the FTA category to exhibit the highest standard deviations, given its more holistic nature, yet the data do not bear out this hypothesis, as the TL component also shows standard deviations that are higher than Non-Specialized Content (MEAN) and Specialized Content and Terminology (TERM).
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
The inter-rater reliability for the Spanish and for the Chinese raters is remarkable; however, the inter-rater reliability for the Russian raters is too low (Table 3).
Table 2. Average scores and standard deviations for four components per text and per language

                      TL           FTA          MEAN         TERM
Text     Raters   Mean   SD    Mean   SD    Mean   SD    Mean   SD
Spanish
210      11       27.7   2.6   23.6   2.3   22.7   2.6   17.7   3.4
214      11       27.3   4.7   20.9   7.0   23.2   2.5   18.2   3.4
215      11       28.6   2.3   22.3   4.7   18.2   6.8   17.7   3.4
228      11       15.0   7.7   11.4   6.0   10.9   6.3   11.4   4.5
235      11       15.9   8.3   12.3   6.5   13.6   6.4   14.5   4.7
Avg SD                   5.12          5.3          4.92         3.88
Chinese
410      10       27.0   4.8   22.0   4.8   21.0   4.6   18.0   2.6
413      10       18.0   9.5   16.5   5.8   14.0   5.2   14.5   3.7
415      10       28.5   2.4   25.0   0.0   23.5   2.4   19.0   2.1
418      10       22.5   6.8   21.0   4.6   16.0   7.7   16.5   4.1
Avg SD                   5.875         3.8          4.975        3.125
Russian
312       9       18.3   7.1   15.0   6.1   13.3   6.6   12.8   4.4
314       9       25.6   6.3   21.7   5.0   19.4   3.9   16.1   4.2
315       9       23.3   9.4   18.3   7.9   17.8   4.4   16.1   4.2
316       9       20.0  10.3   16.7   7.9   17.2   7.1   13.9   6.5
Avg SD                   8.275         6.725        5.5          4.825
Avg SD (all lgs)         6.3           5.3          5.1          3.9
This, in conjunction with the Reliability Testing results, leads us to believe in the presence of other, unknown factors, unrelated to the tool, responsible for the low reliability of the Russian raters.
Question 3: How consistently do raters in the second session (Reliability) rate the texts? How do the reliability coefficients compare for the Benchmark and the Reliability Testing?
The results of the reliability raters mirror those of the benchmark raters, whereby the Spanish raters achieve a very good inter-rater reliability coefficient and the Chinese raters have an acceptable inter-rater reliability coefficient, but the inter-rater reliability for the Russian raters is very low (Table 4).
Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating), but the Spanish raters at both rating sessions achieved remarkable inter-rater reliability. The slight drop among the Russian raters from the first to the second session is negligible; in any case, the inter-rater reliability is too low.
Figure 2. Average standard deviations per tool component (TL, FTA, MEAN, TERM) and per language.
Table 3. Reliability coefficients for benchmark ratings

           Reliability coefficient
Spanish    .953
Chinese    .973
Russian    .128
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?
The coefficients for the Spanish raters show very good reliability, with excellent coefficients for the first three components; the numbers for the Chinese raters are also very good, but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6).
Table 6. Reliability coefficients for the four components of the tool (all raters per language group)

           TL      FTA     MEAN    TERM
Spanish    .952    .929    .926    .848
Chinese    .844    .844    .864    .783
Russian    .367    .479    .492    .292
In sum, very good reliability was obtained for the Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing) as well as for all components of the tool. Reliability scores for the Russian raters are low. These results are in agreement with the standard deviation data presented in Tables 1–2, Figures 1a and 1b, and Figure 2. All of this leads us to believe that, whatever the cause for the Russian coefficients, it was not related to the tool itself.
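The article reports inter-rater reliability coefficients without naming the statistic used. One common choice for a panel of raters scoring the same set of texts is Cronbach's alpha over a texts-by-raters score matrix; the sketch below is therefore an assumption about the method, shown on invented data only to make "inter-rater reliability" concrete:

```python
# Cronbach's alpha for a texts-x-raters matrix (an assumed statistic,
# illustrated on invented data). Values near 1 mean raters order and
# score the texts very similarly; values near 0 mean little agreement.
from statistics import variance

def cronbach_alpha(matrix):
    """matrix[i][j] = score given to text i by rater j (sample variances)."""
    n_raters = len(matrix[0])
    rater_vars = [variance([row[j] for row in matrix]) for j in range(n_raters)]
    total_var = variance([sum(row) for row in matrix])
    return (n_raters / (n_raters - 1)) * (1 - sum(rater_vars) / total_var)

scores = [  # 4 texts rated by 3 raters on a 0-100 scale (invented)
    [92, 90, 95],
    [60, 55, 64],
    [85, 88, 90],
    [45, 50, 42],
]
print(round(cronbach_alpha(scores), 3))  # close to 1: high agreement
```

Because these three hypothetical raters rank the four texts the same way and assign similar scores, alpha comes out close to 1; a rater panel like the Russian group in Table 5 would produce a matrix whose columns disagree, driving alpha toward 0.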
Question 5: Is there a difference in scoring between translators and teachers?
Table 7a and Table 7b show the scoring, in terms of average scores and standard deviations, for the translators and the teachers for all texts. Figures 3 and 4 show the mean scores and times for Spanish raters, comparing teachers and translators.
Table 4. Reliability coefficients for Reliability Testing

           Reliability coefficient
Spanish    .934
Chinese    .780
Russian    .118
Table 5. Inter-rater reliability: Benchmark and Reliability Testing

           Benchmark reliability coefficient   Reliability coefficient (Reliability Testing)
Spanish    .953                                .934
Chinese    .973                                .780
Russian    .128                                .118
Table 7a. Average scores and standard deviations for consultants and translators

         Score             Time
Text     Mean    SD        Mean    SD
210      93.3     7.5      75.8    59.4
214      93.3    12.1      94.2   101.4
215      85.0    17.9      36.3    18.3
228      46.7    20.7      37.5    22.3
235      46.7    18.6      49.5    38.9
410      91.4     7.5      46.0    22.1
413      62.9    21.0      40.7    13.7
415      96.4     4.8      26.1    15.4
418      69.3    22.1      52.4    22.2
312      52.5    15.1      26.7     2.6
314      88.3    10.3      22.5     4.2
315      74.2    26.3      28.7     7.8
316      63.3    32.7      25.8     6.6
Table 7b. Average scores and standard deviations for teachers

         Score             Time
Text     Mean    SD        Mean    SD
210      90.0     9.4      63.6    39.7
214      85.0     9.4      67.0    41.8
215      89.0    12.4      36.0    30.5
228      51.0    19.5      38.0    31.7
235      68.0    10.4      57.6    40.2
410      80.0    13.2      61.0    27.7
413      63.3    25.7      71.0    24.6
415      95.0     8.7      41.0    11.5
418      91.7     5.8      44.0     6.6
312      73.3     5.8      55.0    56.7
314      71.7    20.8      47.7    62.7
315      78.3    14.4      37.7    45.5
316      76.7    22.5      46.7    63.5
The corresponding data for Chinese appear in Figures 5 and 6, and in Figures 7 and 8 for Russian.
Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts).
As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5). Only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).
Despite the low inter-rater reliability among Russian raters, the same trend was found when comparing Russian translators and teachers as with the Chinese and the Spanish: Russian teachers rate similarly to or slightly higher than translators, and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).
In order to investigate the irregular behavior of the Russian raters, and to try to obtain an explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect there to be a relatively high (negative) correlation because of the inverse relationship between a high score and a low recommendation. As is illustrated in the three sub-tables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar with the Chinese raters, whereby all raters correlate very highly
Figure 3. Mean scores for Spanish raters (translators vs. teachers).
Figure 4. Time for Spanish raters (translators vs. teachers).
Figure 5. Mean scores for Chinese raters (translators vs. teachers).
Figure 6. Time for Chinese raters (translators vs. teachers).
Figure 7. Mean scores for Russian raters (translators vs. teachers).
between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.00 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK, and RS01NM) do not correlate highly between their recommendations and their total scores. A closer look, especially at these raters, is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independently of the tool design and tool features, as the scores and overall recommendation do not correlate as highly as expected.
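This consistency check can be sketched on invented data. The 1–4 coding of recommendations below is an assumption for illustration (the article only states that high scores should pair with low recommendations); the Pearson coefficient is computed by hand to keep the example self-contained:

```python
# Sketch of the Table 8 sanity check: a consistent rater's total scores
# correlate negatively with his/her action recommendation (assumed coding:
# 1 = use as is ... 4 = redo entirely). All data invented.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

total_scores = [92, 85, 48, 56]   # one rater's totals for four texts (invented)
recommendations = [1, 2, 4, 3]    # same rater's recommendations (invented)

print(round(pearson(total_scores, recommendations), 3))  # strongly negative
```

A coefficient near −1 is the expected pattern; the near-zero values of raters such as RS01EM signal exactly the kind of inconsistency the article flags. A rater with no variability in one variable (like RS02LB) makes the denominator zero, which is why that rater had to be excluded from the analysis.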
Figure 8. Time for Russian raters (translators vs. teachers).
Table 8 (3 sub-tables). Correlation between recommendation and total score

8.1 Spanish raters
SP04AR   SP01JC   SP01VS   SP02JA   SP02LA   SP02PB   SP02AB   SP01PC   SP01CC   SP02MC   SP01PS
−0.923   −0.958   −0.854   −0.938   −0.966   −0.421   −0.942   −0.975   −0.913   −0.981   −0.938

8.2 Chinese raters
CH01RL   CH04YY   CH01AX   CH02AC   CH02JG   CH01KG   CH02AH   CH01BJ   CH01CK   CH01FL
−0.935   −0.980   −0.996   −0.894   −1.000   −0.955   −0.980   −0.867   −0.943   −0.926

8.3 Russian raters
RS01EG   RS01EM   RS04GN   RS02NB   RS02LB   RS02MK   RS01SM   RS01NM   RS01RW
−0.998   −0.115   −0.933   −1.000   n/a      −0.500   −0.982   −0.500   −0.993
3 Conclusions
As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been; yet further research with Russian teachers and translators may provide insights about the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context, where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer a more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g., industry, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.
In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.
Notes
The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions, and revisions in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals, and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.
1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.
2. www.sae.org; www.lisa.org/products/qamodel
3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.
4. Note the reference to reader response within a functionalist framework.
5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected who had not participated in the training and rating sessions of the previous experiment. Given the low number, researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).
References

Bell, Roger T. 1991. Translation and Translating. London: Longman.
Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation". Meta 46:2. 345–364.
Cao, Deborah. 1996. "A Model of Translation Proficiency". Target 8:2. 325–340.
Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations". Mechanical Translation 9:3–4. 55–66.
Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.
Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach". The Translator 14:1. 97–134.
Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation". Meta 46:2. 227–242.
Hatim, Basil and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.
Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment". Current Issues in Language and Society 4:1. 6–34.
House, Juliane. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.
House, Juliane. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation". Meta 46:2. 243–257.
Lauscher, S. 2000. "Translation Quality Assessment: Where Can Theory and Practice Meet?". The Translator 6:2. 149–168.
Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.
Nida, Eugene. 1964. Toward a Science of Translating. Leiden: Brill.
Nida, Eugene and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.
Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.
PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'". John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.
Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hueber.
Reiss, Katharina and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translations-Theorie. Tübingen: Niemeyer.
Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function". Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.
Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment". Meta 46:2. 326–344.
Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centered Approach. Ottawa: University of Ottawa Press.
Résumé (translated from the French)

Colina (2008) proposes a componential and functional approach to the evaluation of translation quality and reports on the results of a pilot test of a tool designed for that approach. The results attest to a high rate of inter-rater reliability and justify continued testing. This article presents an experiment designed to test the approach as well as the tool. Data were collected during two rounds of testing. A group of 30 raters, made up of Spanish, Chinese, and Russian translators and teachers, evaluated 4 or 5 translated texts. The results show that the tool yields good inter-rater reliability for all language and text groups with the exception of Russian; they also suggest that the low reliability of the Russian raters' scores is unrelated to the tool itself. These findings confirm those of Colina (2008).

Keywords: quality, testing, evaluation, rating, componential, functionalism, errors
Appendix 1 Tool
Benchmark Rating Session

Time Rating Starts: ________    Time Rating Ends: ________
Translation Quality Assessment – Cover Sheet for Health Education Materials

PART I: To be completed by Requester

The Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text.

Requester:
Title/Department:    Delivery Date:

TRANSLATION BRIEF
Source Language:    Target Language: Spanish / Russian / Chinese
Text Type:
Text Title:
Target Audience:
Purpose of Document:
PRIORITY OF QUALITY CRITERIA

Rank EACH from 1 to 4 (1 being top priority):
____ Target Language
____ Functional and Textual Adequacy
____ Non-Specialized Content (Meaning)
____ Specialized Content and Terminology
PART II: To be completed by TQA Rater

Rater (Name):    Date Completed:
Contact Information:    Date Received:
Total Score:    Total Rating Time:

ASSESSMENT SUMMARY AND RECOMMENDATION
(To be completed after evaluating the translated text)

[ ] Publish and/or use as is
[ ] Minor edits needed before publishing
[ ] Major revision needed before publishing
[ ] Redo translation
[ ] Translation will not be an effective communication strategy for this text. Explore other options (e.g., create new target language materials).

Notes/Recommended Edits:
RATING INSTRUCTIONS

1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.
2. Check the description that best fits the text in each one of the categories.
3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.
4. Examples or comments are not required, but they can be useful to help support your decisions or to provide a rationale for your descriptor selection.
1. TARGET LANGUAGE
(Check one box.)

1a. The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that the text cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b. The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c. Although the target text is generally readable, there are problems and awkward expressions resulting in most cases from unnecessary transfer from the source text.

1d. The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience, and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments:
2. FUNCTIONAL AND TEXTUAL ADEQUACY
(Check one box.)

2a. Disregard for the goals, purpose, function, and audience of the text. The text was translated without considering textual units, textual purpose, genre, or the needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b. The translated text gives some consideration to the intended purpose and audience for the translation but misses some important aspects of it (e.g., level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c. The translated text approximates the goals, purpose (function), and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d. The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to the cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments:
3. NON-SPECIALIZED CONTENT (MEANING)
(Check one box.)

3a. The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b. There have been some changes in meaning, omissions, and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c. Minor alterations in meaning, additions, or omissions.

3d. The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions, or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:
4. SPECIALIZED CONTENT AND TERMINOLOGY
(Check one box.)

4a. Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b. Serious/frequent mistakes involving terminology and/or specialized content.

4c. A few terminological errors, but the specialized content is not seriously affected.

4d. Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:
TOTAL SCORE
SCORING WORKSHEET

Component: Target Language
  Category   Value   Score
  1a         5
  1b         15
  1c         25
  1d         30

Component: Functional and Textual Adequacy
  Category   Value   Score
  2a         5
  2b         10
  2c         20
  2d         25

Component: Non-Specialized Content
  Category   Value   Score
  3a         5
  3b         10
  3c         20
  3d         25

Component: Specialized Content and Terminology
  Category   Value   Score
  4a         5
  4b         10
  4c         15
  4d         20

(The maximum possible total is 100 points: 30 + 25 + 25 + 20.)
Tally Sheet

  Component                              Category Rating    Score Value
  Target Language
  Functional and Textual Adequacy
  Non-Specialized Content
  Specialized Content and Terminology
  Total Score
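For illustration only (this sketch is not part of the published tool, and the function name is invented), the Scoring Worksheet arithmetic above can be expressed in a few lines of Python: each rater checks one descriptor (a–d) per component, and the total is the sum of the corresponding point values, with a best-possible score of 100.

```python
# Category point values from the Scoring Worksheet (maximum total: 100).
WORKSHEET = {
    "Target Language":                     {"a": 5, "b": 15, "c": 25, "d": 30},
    "Functional and Textual Adequacy":     {"a": 5, "b": 10, "c": 20, "d": 25},
    "Non-Specialized Content":             {"a": 5, "b": 10, "c": 20, "d": 25},
    "Specialized Content and Terminology": {"a": 5, "b": 10, "c": 15, "d": 20},
}

def total_score(ratings):
    """Sum the point values for the descriptor (a-d) checked per component."""
    return sum(WORKSHEET[component][category]
               for component, category in ratings.items())

# A rater who checks boxes 1d, 2c, 3d, and 4c:
print(total_score({
    "Target Language": "d",
    "Functional and Textual Adequacy": "c",
    "Non-Specialized Content": "d",
    "Specialized Content and Terminology": "c",
}))  # 30 + 20 + 25 + 15 = 90
```

Checking the top descriptor in every component yields exactly 100, which matches the revision of the tool described in Section 2.1.3.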
Appendix 2 Text sample
[The sample texts appear as page images in the original and are not reproducible here.]
Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu
1.1 Experiential approaches

Many methods of translation quality assessment fall within this category. They tend to be ad hoc, anecdotal marking scales developed for the use of a particular professional organization or industry, e.g., the ATA certification exam, the SAE J2450 Translation Quality Metric for the automotive industry, or the LISA QA tool for localization.2 While the scales are often adequate for the particular purposes of the organization that created them, they suffer from limited transferability, precisely due to the absence of theoretical and/or research foundations that would permit their transfer to other environments. For the same reason, it is difficult to assess the replicability and inter-rater reliability of these approaches.
1.2 Theoretical approaches

Recent theoretical, research-based approaches tend to focus on the user of a translation and/or the text. They have also been classified as equivalence-based or functionalist (Lauscher 2000). These approaches arise out of a theoretical framework or stated assumptions about the nature of translation; however, they tend to cover only partial aspects of quality, and they are often difficult to apply in professional or teaching contexts.
1.2.1 Reader-response approaches

Reader-response approaches evaluate the quality of a translation by assessing whether readers of the translation respond to it as readers of the source would respond to the original (Nida 1964; Carroll 1966; Nida and Taber 1969). The reader-response approach must be credited with recognizing the role of the audience in translation, more specifically of translation effects on the reader as a measure of translation quality. This is particularly noteworthy in an era when the dominant notion of 'text' was that of a static object on a page.

Yet the reader-response method is also problematic because, in addition to the difficulties inherent to the process of measuring reader response, the response of a reader may not be equally important for all texts, especially for those that are not reader-oriented (e.g., legal texts). The implication is that reader response will not be equally informative for all types of translation. In addition, this method addresses only one aspect of a translated text (i.e., equivalence of effect on the reader), ignoring others, such as the purpose of the translation, which may justify or even require a slightly different response from the readers of the translation. One also wonders if it is in fact possible to determine whether two responses are equivalent, as even monolingual texts can trigger non-equivalent reactions from slightly different groups of readers. Since in most cases the readership of a translated text is different from that envisioned by the writer of the original,3 one can imagine the difficulties entailed by equating quality with equivalence of response. Finally, as with many other theoretical approaches, reader-response testing is time-consuming and difficult to apply to actual translations. At a minimum, careful selection of readers is necessary to make sure that they belong to the intended audience for the translation.
1.2.2 Textual and pragmatic approaches

Textual and pragmatic approaches have made a significant contribution to the field of translation evaluation by shifting the focus from counting errors at the word or sentence level to evaluating texts and translation goals, giving the reader and communication a much more prominent role. Yet despite these advances, none of these approaches can be said to have been widely adopted by either professionals or scholars.

Some models have been criticized because they focus too much on the source text (Reiss 1971) or on the target text (Skopos) (Reiss and Vermeer 1984; Nord 1997). Reiss argues that the text type and function of the source text is the most important factor in translation, and that quality should be assessed with respect to it. For Skopos Theory, it is the text type and function of the translation that is of paramount importance in determining the quality of the translation.

House's (1997, 2001) functional pragmatic model relies on an analysis of the linguistic-situational features of the source and target texts, a comparison of the two texts, and the resulting assessment of their match. The basic measure of quality is that the textual profile and function of the translation match those of the original, the goal being functional equivalence between the original and the translation. One objection that has been raised against House's functional model is its dependence on the notion of equivalence, often a vague and controversial term in translation studies (Hönig 1997). This is a problem because translations are sometimes commissioned for a somewhat different function than that of the original; in addition, a different audience and time may require a slightly different function than that of the source text (see Hönig 1997 for more on the problematic notion of equivalence). These scenarios are not contemplated by equivalence-based theories of translation. Furthermore, one can argue that what qualifies as equivalent is as variegated as the notion of quality itself. Other equivalence-based models of evaluation are Gerzymisch-Arbogast (2001), Neubert (1985), and Van den Broeck (1985). In sum, the reliance on an a priori notion of equivalence is problematic and limiting in descriptive as well as explanatory value.

An additional objection against textual and pragmatic approaches is that they are not precise about how evaluation is to proceed after the analysis of the source or the target text is complete, or after the function of the translation has been established as the guiding criterion for making translation decisions. This obviously affects the ease with which the models can be applied to texts in professional settings. Hönig, for instance, after presenting some strong arguments for a functionalist approach to evaluation, does not offer any concrete instantiation of the model other than in the form of some general advice for translator trainers. He comes to the conclusion that "the speculative element will remain — at least as long as there are no hard and fast empirical data which serve to prove what a 'typical' reader's responses are like" (1997: 32).4 The same criticism regarding the difficulty involved in applying textual and theoretical models to professional contexts is raised by Lauscher (2000). She explores possible ways to bridge the gap between theoretical and practical quality assessment, concluding that "translation criticism could move closer to practical needs by developing a comprehensive translation tool" (2000: 164).
Other textual approaches to quality evaluation are the argumentation-centered approach of Williams (2001, 2004), in which evaluation is based on argumentation and rhetorical structure, and corpus-based approaches (Bowker 2001). The argumentation-centered approach is also equivalence-based, as "a translation must reproduce the argument structure of ST to meet minimum criteria of adequacy" (Williams 2001: 336). Bowker's corpus-based model uses "a comparatively large and carefully selected collection of naturally occurring texts that are stored in machine-readable form" as a benchmark against which to compare and evaluate specialized student translations. Although Bowker (2001) presents a novel, valuable proposal for the evaluation of students' translations, it does not provide specific indications as to how translations should be graded (2001: 346). In sum, argumentation and corpus-based approaches, although presenting crucial aspects of translation evaluation, are also complex and difficult to apply in professional environments (and, one could argue, in the classroom as well).
1.3 The functional-componential approach (Colina 2008)

Colina (2008) argues that current translation quality assessment methods have not achieved a middle ground between theory and applicability: while anecdotal approaches lack a theoretical framework, the theoretical models often do not contain testable hypotheses (i.e., they are non-verifiable) and/or are not developed with a view towards application in professional and/or teaching environments. In addition, she contends that theoretical models usually focus on partial aspects of translation (e.g., reader response, textual aspects, pragmatic aspects, relationship to the source, etc.). Perhaps due to practical limitations and the sheer complexity of the task, some of these approaches overlook the fact that quality in translation is a multifaceted reality and that a general, comprehensive approach to evaluation may need to address multiple components of quality simultaneously.
As a response to the inadequacies identified above, Colina (2008) proposes an approach to translation quality evaluation based on a theoretical foundation (functionalist and textual models of translation) that can be applied in professional and educational contexts. In order to show the applicability of the model in practical settings, as well as to develop testable hypotheses and research questions, Colina and her collaborators designed a componential, functionalist, textual tool (henceforth the TQA tool) and pilot-tested it for inter-rater reliability (cf. Colina 2008 for more on the first version of this tool). The tool evaluates components of quality separately, consequently reflecting a componential approach to quality; it is also considered functionalist and textual, given that evaluation is carried out relative to the function and the characteristics of the audience specified for the translated text.
As mentioned above, it seems reasonable to hypothesize that disagreements over the definition of translation quality are rooted in the multiplicity of views of translation itself and in different priorities regarding quality components. It is often the case that a requester's view of quality will not coincide with that of the evaluators, yet without explicit criteria on which to base the evaluation, the evaluator can only rely on his/her own views. In an attempt to introduce flexibility with regard to different conditions influencing quality, the proposed TQA tool allows for a user-defined notion of quality in which it is the user or requester who decides which aspects of quality are more important for his/her communicative purposes. This can be done either by adjusting customer-defined weights for each component or simply by assigning higher priorities to some components. Custom weighting of components is also important because the effect of a particular component on the whole text may also vary depending on textual type and function. An additional feature of the TQA tool is that it does not rely on a point-deduction system; rather, it tries to match the text under evaluation with one of several descriptors provided for each category/component of evaluation. In order to capture the descriptive, customer-defined notion of quality, the original tool was modified in the second experiment to include a cover sheet (see Appendix 1).
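One way to picture the requester-defined weighting is to convert the cover sheet's priority ranks (1 = top priority) into component maxima that still sum to 100. The linear rank-to-weight rule and the function name below are purely illustrative assumptions; the article does not publish a weighting formula.

```python
def priority_weights(priorities, total=100):
    """Turn requester ranks (1 = top priority, 4 = lowest) into component
    maxima summing to `total`. Linear rule: weight = 5 - rank (assumption)."""
    weights = {component: 5 - rank for component, rank in priorities.items()}
    scale = total / sum(weights.values())
    return {component: round(w * scale) for component, w in weights.items()}

# A requester who ranks functional adequacy first and terminology last:
print(priority_weights({
    "Target Language": 2,
    "Functional and Textual Adequacy": 1,
    "Non-Specialized Content": 3,
    "Specialized Content and Terminology": 4,
}))  # {'Target Language': 30, 'Functional and Textual Adequacy': 40,
     #  'Non-Specialized Content': 20, 'Specialized Content and Terminology': 10}
```

Any monotone rank-to-weight mapping would serve the same purpose; the point is only that requester priorities, not a fixed deduction scheme, determine each component's share of the total.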
The experiment in Colina (2008) set out to test the functional approach to evaluation by testing the tool's inter-rater reliability: 37 raters and 3 consultants were asked to use the tool to rate three translated texts. The texts selected for evaluation consisted of reader-oriented health education materials. Raters were bilinguals, professional translators, and language teachers. Some basic training was provided. Data was collected by means of the tool and a post-rating survey. Some differences in ratings could be ascribed to rater qualifications: teachers' and translators' ratings were more alike than those of bilinguals, and bilinguals were found to rate higher and faster than the other groups. Teachers also tended to assign higher ratings than translators. It was shown that different types of raters were able to use the tool without significant training. Pilot testing results indicate good inter-rater reliability for the tool and the need for further testing. The current paper focuses on a second experiment designed to further test the approach and tool proposed in Colina (2008).
2. Second phase of TQA testing: Methods and results

2.1 Methods
One of the most important limitations of the experiment in Colina (2008) concerns the numbers and groups of participants. Given the project objective of ensuring applicability across languages frequently used in the USA, subject recruitment was done in three languages: Spanish, Russian, and Chinese. As a result, resources and time for recruitment had to be shared amongst the languages, with smaller numbers of subjects per language group. The testing described in the current experiment includes more subjects and additional texts. More specifically, the study reported in this paper aims:

I. To test the TQA tool again for inter-rater reliability (i.e., to what degree trained raters use the TQA tool consistently) by answering the following questions:
Question 1: For each text, how consistently do all raters rate the text?
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
Question 3: How consistently do raters in the second session (Reliability) rate the texts?
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?
II. To compare the rating skills/behavior of translators and teachers: Is there a difference in scoring between translators and teachers? (Question 5, Section 2.2)
Data was collected during two rounds of testing: the first, referred to as the Benchmark Testing, included 9 raters; the second session, the Reliability Testing, included 21 raters. Benchmark and Reliability sessions consisted of a short training session followed by a rating session. Raters were asked to rate 4–5 translated texts (depending on the language) and had one afternoon and one night to complete the task. After their evaluation worksheets had been submitted, raters were required to submit a survey on their experience using the tool. They were paid for their participation.
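Questions 1–4 all concern how consistently raters score the same texts. The article does not state at this point which reliability statistic was used, so the sketch below illustrates one simple consistency measure, the mean pairwise Pearson correlation over raters' total scores; the rater scores are invented for the example.

```python
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def mean_pairwise_r(scores_by_rater):
    """Average correlation over all rater pairs; closer to 1 = more consistent."""
    pairs = list(combinations(scores_by_rater, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

# Three hypothetical raters scoring the same four texts on the 100-point tool:
raters = [[90, 45, 85, 30], [95, 50, 80, 35], [85, 40, 90, 25]]
print(round(mean_pairwise_r(raters), 2))  # 0.98
```

A correlation-based measure captures agreement on the ranking of texts; agreement on absolute score levels (relevant to Question 5, since teachers scored slightly higher) would require an additional statistic such as an intraclass correlation.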
2.1.1 Raters

Raters were drawn from the pool used for the pre-pilot and pilot testing sessions reported in Colina (2008) (see Colina 2008 for selection criteria and additional details). A call was sent via email to all those raters selected for the pre-pilot and pilot testing (including those who were initially selected but did not take part). All raters available participated in this second phase of testing.
As in Colina (2008), it was hypothesized that similar rating results would be obtained within the members of the same group. Therefore, raters were recruited according to membership in one of two groups: professional translators and language teachers (language professionals who are not professional translators). Membership was assigned according to the same criteria as in Colina (2008). All selected raters exhibited linguistic proficiency equivalent to that of a native (or near-native) speaker in the source and in one of the target languages.
Professional translators were defined as language professionals whose income comes primarily from providing translation services. Significant professional experience (5 years minimum; most had 12–20 years of experience), membership in professional organizations, and education in translation and/or a relevant field were also needed for inclusion in this group. Recruitment for these types of individuals was primarily through the American Translators Association (ATA). Although only two applicants were ATA certified, almost all were ATA affiliates (members).
Language teachers were individuals whose main occupation was teaching language courses at a university or other educational institution. They may have had some translation experience but did not rely on translation as their source of income. A web search of teaching institutions with known foreign language programs was used for this recruitment. We reached out to schools throughout the country, at both the community college and university levels. The definition of teacher did not preclude graduate student instructors.
Potential raters were assigned to the above groups on the basis of the information provided in their resume or curriculum vitae and a language background questionnaire included in a rater application.
The bilingual group in Colina (2008) was eliminated from the second experiment, as subjects were only available for one of the languages (Spanish). Translation competence models and research suggest that bilingualism is only one component of translation competence (Bell 1991; Cao 1996; Hatim and Mason 1997; PACTE 2008). Nonetheless, since evaluating translation products is not the same as translating, it is reasonable to hypothesize that other language professionals, such as teachers, may have the competence necessary to evaluate translations; this may be particularly true in cases such as the current project, in which the object of evaluation is not translator competence but translation products. This hypothesis would be borne out if the ratings provided by translators and teachers are similar.
As mentioned above, data was collected during two rounds of testing. The first, the Benchmark Testing, included 9 raters (3 Russian, 3 Chinese, 3 Spanish); these raters were asked to evaluate 4–5 texts (per language) that had been previously selected as clearly of good or bad quality by expert consultants in each language. The second session, the Reliability Testing, included 21 raters, distributed as follows:

Spanish: 5 teachers, 3 translators (8)
Chinese: 3 teachers, 4 translators (7)
Russian: 3 teachers, 3 translators (6)
Differences across groups reflect general features of each language group in the US. Among the translators, the Russians had degrees in Languages, History and Translating, Engineering, and Nursing from Russian and US universities and experience ranging from 12 to 22 years; the Chinese translators' experience ranged from 6 to 30 years, and their education included Chinese Language and Literature, Philosophy (MA), English (PhD), Neuroscience (PhD), and Medicine (MD), with degrees obtained in China and the US. Their Spanish counterparts' experience varied from 5 to 20 years, and their degrees included areas such as Education, Spanish and English Literature, Latin American Studies (MA), and Creative Writing (MA). The Spanish and Russian teachers were perhaps the most uniform groups, including college instructors (PhD students) with MAs in Spanish or Slavic Linguistics, Literature, and Communication, and one college professor of Russian. With one exception, they were all native speakers of Spanish or Russian with formal education in the country of origin. Chinese teachers were college instructors (PhD students) with MAs in Chinese, one college professor (PhD in Spanish), and an elementary school teacher and tutor (BA in Chinese). They were all native speakers of Chinese.
2.1.2 Texts

As mentioned above, experienced translators serving as language consultants selected the texts to be used in the rating sessions. Three consultants were instructed to identify health education texts translated from English into their language. Texts were to be publicly available on the Internet. Half were to be very good and the other half very poor, on reading the text. Those texts were used for the Benchmark session of testing, during which they were rated by the consultants and two additional expert translators. The texts on which there was the most agreement in rating were selected for the Reliability Testing. The Reliability texts comprised five Spanish texts (three good and two bad), four Russian texts, and four Chinese texts (two of good quality and two of bad quality for each of these languages), making up a total of thirteen additional texts.
244 Sonia Colina
2.1.3 Tool
The tool tested in Colina (2008) was modified to include a cover sheet consisting of two parts. Part I is to be completed by the person requesting the evaluation (i.e., the Requester) and read by the rater before he/she started his/her work. It contains the Translation Brief, relative to which the evaluation must always take place, and the Quality Criteria, clarifying requester priorities among components. The TQA Evaluation Tool included in Appendix 1 contains a sample Part I as specified by Hablamos Juntos (the Requester) for the evaluation of a set of health education materials. The Quality Criteria section reflects the weights assigned to the four components in the Scoring Worksheet at the end of the tool. Part II of the Cover Sheet is to be filled in by the raters after the rating is complete. An Assessment Summary and Recommendation section was included to allow raters the opportunity to offer an action recommendation on the basis of their ratings, i.e., "What should the requester do now with this translation? Edit it (minor or small edits)? Redo it entirely?" An additional modification to the tool consisted of eliminating or adding descriptors so that each category would have an equal number of descriptors (four for each component) and revising the scores assigned so that the maximum number of points possible would be 100. Some minor stylistic changes were made in the language of the descriptors.
2.1.4 Rater training
The Benchmark and Reliability sessions included training and rating sessions. The training provided was substantially the same as that offered in the pilot testing and described in Colina (2008). It focused on the features and use of the tool, and it consisted of PDF materials (delivered via email), a PowerPoint presentation based on the contents of the PDF materials, and a question-and-answer session delivered online via an Internet and phone conferencing system.
Some revisions to the training reflect changes to the tool (including instructions on the new Cover Sheet), a few additional textual examples in Chinese, and a scored, completed sample worksheet for the Spanish group. Samples were not included for the other languages due to time and personnel constraints. The training served as a refresher for those raters who had already participated in the previous pilot training and rating (Colina 2008).5
2.2 Results
The results of the data collection were submitted to statistical analysis to determine to what degree trained raters use the TQA tool consistently.

Table 1 and Figures 1a and 1b show the overall score of each text rated and the standard deviation of the individual rater scores around that overall score.
Further evidence for a functionalist approach to translation quality evaluation 245
The 200-series texts are Spanish, the 400-series Chinese, and the 300-series Russian. The standard deviations range from 8.1 to 19.2 for Spanish, from 5.7 to 21.2 for Chinese, and from 16.1 to 29.0 for Russian.
Question 1: For each text, how consistently do all raters rate the text?
The standard deviations in Table 1 and Figures 1a and 1b offer a good measure of how consistently individual texts are rated. A large standard deviation suggests that there was less rater agreement (or that the raters differed more in their assessment). Figure 1b shows the average standard deviations per language. According to this, the Russian raters were the ones with the highest average standard deviation and the least consistent in their ratings. This is in agreement with the reliability coefficients shown below (Table 5), as the Russian raters have the lowest inter-rater reliability. Table 2 shows average scores, standard deviations, and average standard deviations for each component of the tool, per text and per language. Figure 2 represents average standard deviations per component and per language. There does not appear to be an obvious connection between standard deviations and
Table 1. Average score of each text and standard deviation

Text      # of raters   Average score   Standard deviation
Spanish
210       11            91.8             8.1
214       11            89.5            11.3
215       11            86.8            15.0
228       11            48.6            19.2
235       11            56.4            18.5
Avg. SD                                 14.42
Chinese
410       10            88.0            10.3
413       10            63.0            21.0
415       10            96.0             5.7
418       10            76.0            21.2
Avg. SD                                 14.55
Russian
312        9            59.4            16.1
314        9            82.8            15.6
315        9            75.6            22.1
316        9            67.8            29.0
Avg. SD                                 20.7
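The per-text statistics in Table 1 are straightforward to reproduce. The sketch below uses illustrative rater scores (the study's raw scores are not published in the article), and it assumes the reported figures are sample standard deviations; the article does not say which convention was used.

```python
# Sketch of the Table 1 computation: average score and standard deviation
# per text. Rater scores are illustrative, not the study's raw data.
from statistics import mean, stdev

def text_stats(scores):
    """Return (average score, sample standard deviation), rounded to 1 dp."""
    return round(mean(scores), 1), round(stdev(scores), 1)

# Hypothetical scores from 11 raters for one Spanish text
scores_210 = [95, 88, 92, 97, 85, 90, 94, 91, 96, 89, 93]
avg, sd = text_stats(scores_210)
print(f"text 210: average={avg}, SD={sd}")
```

A lower SD (e.g., text 415's 5.7) indicates tighter rater agreement; a higher one (e.g., text 316's 29.0) indicates dispersed judgments.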
Figure 1a. Average score and standard deviation per text (x-axis: text number, 210–316; y-axis: 0–100; series: Average Score, Standard Deviation)
Figure 1b. Average standard deviations per language (Spanish, Chinese, Russian; y-axis: 0–25)
components. Although generally the components Target Language (TL) and Functional and Textual Adequacy (FTA) have higher standard deviations (i.e., ratings are less consistent), this is not always the case, as seen in the Chinese data (FTA). One would in fact expect the FTA category to exhibit the highest standard deviations, given its more holistic nature, yet the data do not bear out this hypothesis, as the TL component also shows standard deviations that are higher than Non-Specialized Content (MEAN) and Specialized Content and Terminology (TERM).
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
The inter-rater reliability for the Spanish and for the Chinese raters is remarkable; however, the inter-rater reliability for the Russian raters is too low (Table 3).
Table 2. Average scores and standard deviations for four components, per text and per language

                     TL             FTA            MEAN           TERM
Text     Raters   Mean    SD     Mean    SD     Mean    SD     Mean    SD
Spanish
210      11       27.7    2.6    23.6    2.3    22.7    2.6    17.7    3.4
214      11       27.3    4.7    20.9    7.0    23.2    2.5    18.2    3.4
215      11       28.6    2.3    22.3    4.7    18.2    6.8    17.7    3.4
228      11       15.0    7.7    11.4    6.0    10.9    6.3    11.4    4.5
235      11       15.9    8.3    12.3    6.5    13.6    6.4    14.5    4.7
Avg. SD                   5.12           5.3            4.92           3.88
Chinese
410      10       27.0    4.8    22.0    4.8    21.0    4.6    18.0    2.6
413      10       18.0    9.5    16.5    5.8    14.0    5.2    14.5    3.7
415      10       28.5    2.4    25.0    0.0    23.5    2.4    19.0    2.1
418      10       22.5    6.8    21.0    4.6    16.0    7.7    16.5    4.1
Avg. SD                   5.875          3.8            4.975          3.125
Russian
312       9       18.3    7.1    15.0    6.1    13.3    6.6    12.8    4.4
314       9       25.6    6.3    21.7    5.0    19.4    3.9    16.1    4.2
315       9       23.3    9.4    18.3    7.9    17.8    4.4    16.1    4.2
316       9       20.0   10.3    16.7    7.9    17.2    7.1    13.9    6.5
Avg. SD                   8.275          6.725          5.5            4.825
Avg. SD (all lgs.)        6.3            5.3            5.1            3.9
This, in conjunction with the Reliability Testing results, leads us to believe in the presence of other, unknown factors, unrelated to the tool, responsible for the low reliability of the Russian raters.
Question 3: How consistently do raters in the second session (Reliability) rate the texts? How do the reliability coefficients compare for the Benchmark and the Reliability Testing?
The results of the reliability raters mirror those of the benchmark raters, whereby the Spanish raters achieve a very good inter-rater reliability coefficient and the Chinese raters an acceptable inter-rater reliability coefficient, but the inter-rater reliability for the Russian raters is very low (Table 4).
Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating), but the Spanish raters at both rating sessions achieved remarkable inter-rater reliability. The slight drop among the Russian raters from the first to the second session is negligible; in any case, the inter-rater reliability is too low.
Figure 2. Average standard deviations per tool component (TL, FTA, MEAN, TERM) and per language (Spanish, Chinese, Russian, all languages; y-axis: 0–9)
Table 3. Reliability coefficients for benchmark ratings

           Reliability coefficient
Spanish    .953
Chinese    .973
Russian    .128
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?
The coefficients for the Spanish raters show very good reliability, with excellent coefficients for the first three components; the numbers for the Chinese raters are also very good, but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6).
Table 6. Reliability coefficients for the four components of the tool (all raters per language group)

           TL      FTA     MEAN    TERM
Spanish    .952    .929    .926    .848
Chinese    .844    .844    .864    .783
Russian    .367    .479    .492    .292
In sum, very good reliability was obtained for the Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing), as well as for all components of the tool. Reliability scores for the Russian raters are low. These results are in agreement with the standard deviation data presented in Tables 1–2, Figures 1a and 1b, and Figure 2. All of this leads us to believe that whatever the cause for the Russian coefficients, it was not related to the tool itself.
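The article reports inter-rater reliability coefficients without naming the statistic used. One common candidate when k raters score the same set of texts is Cronbach's alpha, treating raters as items; the sketch below assumes that statistic (an assumption, not the study's documented method) and uses illustrative scores.

```python
# Cronbach's alpha as one plausible inter-rater reliability coefficient.
# ASSUMPTION: the study's statistic is not named; scores are illustrative.
from statistics import pvariance

def cronbach_alpha(ratings):
    """ratings: one score list per rater, all over the same texts."""
    k = len(ratings)                                 # number of raters
    item_vars = sum(pvariance(r) for r in ratings)   # variance of each rater
    totals = [sum(col) for col in zip(*ratings)]     # summed score per text
    return (k / (k - 1)) * (1 - item_vars / pvariance(totals))

# Three hypothetical raters scoring four texts very similarly:
raters = [[92, 63, 96, 76], [90, 60, 95, 78], [88, 65, 94, 75]]
print(round(cronbach_alpha(raters), 3))  # near 1 for consistent raters
```

Highly consistent raters push alpha toward 1 (cf. the Spanish .953), while raters who disagree on which texts are good push it toward 0 (cf. the Russian .128).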
Question 5: Is there a difference in scoring between translators and teachers?
Table 7a and Table 7b show the scoring, in terms of average scores and standard deviations, for the translators and the teachers for all texts. Figures 3 and 4 show the mean scores and times for the Spanish raters, comparing teachers and translators.
Table 4. Reliability coefficients for Reliability Testing

           Reliability coefficient
Spanish    .934
Chinese    .780
Russian    .118
Table 5. Inter-rater reliability: Benchmark and Reliability Testing

           Benchmark reliability coefficient   Reliability coefficient (Reliability Testing)
Spanish    .953                                .934
Chinese    .973                                .780
Russian    .128                                .118
Table 7a. Average scores and standard deviations for consultants and translators

        Score              Time
Text    Mean     SD        Mean     SD
210     93.3      7.5      75.8     59.4
214     93.3     12.1      94.2    101.4
215     85.0     17.9      36.3     18.3
228     46.7     20.7      37.5     22.3
235     46.7     18.6      49.5     38.9
410     91.4      7.5      46.0     22.1
413     62.9     21.0      40.7     13.7
415     96.4      4.8      26.1     15.4
418     69.3     22.1      52.4     22.2
312     52.5     15.1      26.7      2.6
314     88.3     10.3      22.5      4.2
315     74.2     26.3      28.7      7.8
316     63.3     32.7      25.8      6.6
Table 7b. Average scores and standard deviations for teachers

        Score              Time
Text    Mean     SD        Mean     SD
210     90.0      9.4      63.6     39.7
214     85.0      9.4      67.0     41.8
215     89.0     12.4      36.0     30.5
228     51.0     19.5      38.0     31.7
235     68.0     10.4      57.6     40.2
410     80.0     13.2      61.0     27.7
413     63.3     25.7      71.0     24.6
415     95.0      8.7      41.0     11.5
418     91.7      5.8      44.0      6.6
312     73.3      5.8      55.0     56.7
314     71.7     20.8      47.7     62.7
315     78.3     14.4      37.7     45.5
316     76.7     22.5      46.7     63.5
The corresponding data for Chinese appear in Figures 5 and 6, and for Russian in Figures 7 and 8.

Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts).

As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5). Only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).

Despite the low inter-rater reliability among the Russian raters, the same trend was found when comparing Russian translators and teachers with the Chinese and the Spanish: Russian teachers rate similarly to or slightly higher than translators, and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).
In order to investigate the irregular behavior of the Russian raters and to try to obtain an explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect a relatively high (negative) correlation, because of the inverse relationship between a high score and a low recommendation. As illustrated in the three sub-tables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar for the Chinese raters, whereby all raters correlate very highly
Figure 3. Mean scores for Spanish raters (translators vs. teachers; texts 210–235; y-axis: 0–100)
Figure 4. Time for Spanish raters (translators vs. teachers; texts 210–235)
Figure 5. Mean scores for Chinese raters (translators vs. teachers; texts 410–418)
Figure 6. Time for Chinese raters (translators vs. teachers; texts 410–418)
Figure 7. Mean scores for Russian raters (translators vs. teachers; texts 312–316)
between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.00 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK, and RS01NM) do not correlate highly between their recommendations and their total scores. A closer look especially at these raters is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independently of the tool design and tool features, as the scores and overall recommendation do not correlate highly, as expected.
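The Table 8 check is a Pearson correlation between each rater's total scores and recommendations, with the correlation undefined when the recommendations never vary (the RS02LB situation). A minimal sketch with illustrative numbers (the rater IDs come from the article; the scores below do not):

```python
# Pearson's r between total scores and recommendations for one rater.
# Scores are illustrative; a rater with constant recommendations yields
# an undefined correlation, reported as 'n/a' in Table 8.
from math import sqrt

def pearson(xs, ys):
    """Pearson's r, or None when either variable has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # correlation undefined (no variability)
    return sxy / sqrt(sxx * syy)

scores = [92, 63, 96, 76]             # total scores for four texts
recs = [1, 3, 1, 2]                   # 1 = publish as is ... 4 = redo
print(pearson(scores, recs))          # strongly negative for a consistent rater
print(pearson(scores, [2, 2, 2, 2]))  # None: constant recommendations
```

A strongly negative r means the rater's recommendations track their scores inversely, as expected; values near zero (e.g., RS01EM's −0.115) flag internally inconsistent rating.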
Figure 8. Time for Russian raters (translators vs. teachers; texts 312–316)
Table 8 (three sub-tables). Correlation between recommendation and total score

8.1 Spanish raters
SP04AR   SP01JC   SP01VS   SP02JA   SP02LA   SP02PB   SP02AB   SP01PC   SP01CC   SP02MC   SP01PS
−0.923   −0.958   −0.854   −0.938   −0.966   −0.421   −0.942   −0.975   −0.913   −0.981   −0.938

8.2 Chinese raters
CH01RL   CH04YY   CH01AX   CH02AC   CH02JG   CH01KG   CH02AH   CH01BJ   CH01CK   CH01FL
−0.935   −0.980   −0.996   −0.894   −1.000   −0.955   −0.980   −0.867   −0.943   −0.926

8.3 Russian raters
RS01EG   RS01EM   RS04GN   RS02NB   RS02LB   RS02MK   RS01SM   RS01NM   RS01RW
−0.998   −0.115   −0.933   −1.000   n/a      −0.500   −0.982   −0.500   −0.993
3 Conclusions
As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights into the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer a more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g., industry, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.

In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.
Notes
The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions,
and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals, and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.
1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.

2. www.sae.org; www.lisa.org/products/qamodel

3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.

4. Note the reference to reader response within a functionalist framework.

5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected who had not participated in the training and rating sessions of the previous experiment. Given the low number, the researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).
References
Bell, Roger T. 1991. Translation and Translating. London: Longman.
Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation". Meta 46:2. 345–364.
Cao, Deborah. 1996. "A Model of Translation Proficiency". Target 8:2. 325–340.
Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations". Mechanical Translation 9:3–4. 55–66.
Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.
Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach". The Translator 14:1. 97–134.
Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation". Meta 46:2. 227–242.
Hatim, Basil, and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.
Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment". Current Issues in Language and Society 4:1. 6–34.
House, Julianne. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.
House, Julianne. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation". Meta 46:2. 243–257.
Lauscher, S. 2000. "Translation Quality-Assessment: Where Can Theory and Practice Meet?" The Translator 6:2. 149–168.
Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.
Nida, Eugene. 1964. Toward a Science of Translating. Leiden: Brill.
Nida, Eugene, and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.
Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.
PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'". John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.
Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hüber.
Reiss, Katharina, and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translations-Theorie. Tübingen: Niemeyer.
Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function". Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.
Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment". Meta 46:2. 326–344.
Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centered Approach. Ottawa: University of Ottawa Press.
Résumé

Colina (2008) proposes a componential and functionalist approach to the evaluation of translation quality and reports on the results of a pilot test of a tool designed for that approach. The results attest to a high level of inter-rater reliability and justify further testing. This article presents an experiment designed to test the approach as well as the tool. Data were collected during two rounds of testing. A group of 30 raters, composed of Spanish, Chinese, and Russian translators and teachers, evaluated 4 or 5 translated texts. The results show that the tool yields good inter-rater reliability for all language groups and texts with the exception of Russian; they also suggest that the low reliability of the Russian raters' scores is unrelated to the tool itself. These findings confirm those of Colina (2008).

Keywords: quality, testing, evaluation, rating, componential, functionalism, errors
Appendix 1 Tool
Benchmark Rating Session
Time Rating Starts: ______          Time Rating Ends: ______
Translation Quality Assessment – Cover Sheet for Health Education Materials
PART I: To be completed by Requester

Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text.

Requester:
Title/Department:                    Delivery Date:
TRANSLATION BRIEF
Source Language:                     Target Language: Spanish / Russian / Chinese
Text Type:
Text Title:
Target Audience:
Purpose of Document:
PRIORITY OF QUALITY CRITERIA
Rank EACH from 1 to 4 (1 being top priority):
____ Target Language
____ Functional and Textual Adequacy
____ Non-Specialized Content (Meaning)
____ Specialized Content and Terminology
PART II: To be completed by TQA Rater

Rater (Name):                        Date Completed:
Contact Information:                 Date Received:
Total Score:                         Total Rating Time:
ASSESSMENT SUMMARY AND RECOMMENDATION
(To be completed after evaluating the translated text.)

____ Publish and/or use as is
____ Minor edits needed before publishing
____ Major revision needed before publishing
____ Redo translation
____ Translation will not be an effective communication strategy for this text. Explore other options (e.g., create new target-language materials).

Notes/Recommended Edits:
Further evidence for a functionalist approach to translation quality evaluation 259
RATING INSTRUCTIONS

1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.
2. Check the description that best fits the text in each one of the categories.
3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.
4. Examples or comments are not required, but they can be useful to help support your decisions or to provide a rationale for your descriptor selection.
1. TARGET LANGUAGE (check one box)

Category 1a: The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that the text cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

Category 1b: The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

Category 1c: Although the target text is generally readable, there are problems and awkward expressions resulting in most cases from unnecessary transfer from the source text.

Category 1d: The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience, and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments:
2. FUNCTIONAL AND TEXTUAL ADEQUACY (check one box)

Category 2a: Disregard for the goals, purpose, function, and audience of the text. The text was translated without considering textual units, textual purpose, genre, or needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

Category 2b: The translated text gives some consideration to the intended purpose and audience for the translation, but misses some important aspects of it (e.g., level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

Category 2c: The translated text approximates the goals, purpose (function), and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

Category 2d: The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to the cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments:
3. NON-SPECIALIZED CONTENT (MEANING) (check one box)

Category 3a: The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

Category 3b: There have been some changes in meaning, omissions, and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

Category 3c: Minor alterations in meaning, additions, or omissions.

Category 3d: The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions, or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:

4. SPECIALIZED CONTENT AND TERMINOLOGY (check one box)

Category 4a: Reveals unawareness/ignorance of specialized terminology and/or insufficient knowledge of specialized content.

Category 4b: Serious/frequent mistakes involving terminology and/or specialized content.

Category 4c: A few terminological errors, but the specialized content is not seriously affected.

Category 4d: Accurate and appropriate rendition of the terminology. Reflects a good command of the terms and content specific to the subject.

Examples/Comments:

TOTAL SCORE: ______
SCORING WORKSHEET

Component: Target Language           Component: Functional and Textual Adequacy
Category   Value   Score             Category   Value   Score
1a          5                        2a          5
1b         15                        2b         10
1c         25                        2c         20
1d         30                        2d         25

Component: Non-Specialized Content   Component: Specialized Content and Terminology
Category   Value   Score             Category   Value   Score
3a          5                        4a          5
3b         10                        4b         10
3c         20                        4c         15
3d         25                        4d         20

Tally Sheet
Component                               Category Rating   Score Value
Target Language
Functional and Textual Adequacy
Non-Specialized Content
Specialized Content and Terminology
Total Score
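The worksheet's arithmetic can be sketched in a few lines. The descriptor-to-points mapping is taken from the Scoring Worksheet above; the function name is mine.

```python
# Sketch of the tool's scoring arithmetic: each component's checked
# descriptor maps to a point value, and the four values sum to at most 100.
POINTS = {
    "1a": 5, "1b": 15, "1c": 25, "1d": 30,  # Target Language
    "2a": 5, "2b": 10, "2c": 20, "2d": 25,  # Functional and Textual Adequacy
    "3a": 5, "3b": 10, "3c": 20, "3d": 25,  # Non-Specialized Content
    "4a": 5, "4b": 10, "4c": 15, "4d": 20,  # Specialized Content and Terminology
}

def total_score(checked):
    """checked: one descriptor per component, e.g. ['1d', '2c', '3d', '4c']."""
    return sum(POINTS[c] for c in checked)

print(total_score(["1d", "2d", "3d", "4d"]))  # best descriptors: 100
print(total_score(["1b", "2b", "3b", "4b"]))  # 45
```

The unequal maxima (30, 25, 25, 20) encode the Requester's weighting of the four components, so a weak Target Language rating costs more points than an equally weak Terminology rating.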
Appendix 2. Text sample

[Sample translated text reproduced as images in the original; not recoverable here.]
Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu
different than that envisioned by the writer of the original,3 one can imagine the difficulties entailed by equating quality with equivalence of response. Finally, as with many other theoretical approaches, reader-response testing is time-consuming and difficult to apply to actual translations. At a minimum, careful selection of readers is necessary to make sure that they belong to the intended audience for the translation.
1.2.2 Textual and pragmatic approaches
Textual and pragmatic approaches have made a significant contribution to the field of translation evaluation by shifting the focus from counting errors at the word or sentence level to evaluating texts and translation goals, giving the reader and communication a much more prominent role. Yet despite these advances, none of these approaches can be said to have been widely adopted by either professionals or scholars.
Some models have been criticized because they focus too much on the source text (Reiss 1971) or on the target text (Skopos) (Reiss and Vermeer 1984; Nord 1997). Reiss argues that the text type and function of the source text is the most important factor in translation, and that quality should be assessed with respect to it. For Skopos Theory, it is the text type and function of the translation that is of paramount importance in determining the quality of the translation.
House's (1997, 2001) functional-pragmatic model relies on an analysis of the linguistic-situational features of the source and target texts, a comparison of the two texts, and the resulting assessment of their match. The basic measure of quality is that the textual profile and function of the translation match those of the original, the goal being functional equivalence between the original and the translation. One objection that has been raised against House's functional model is its dependence on the notion of equivalence, often a vague and controversial term in translation studies (Hönig 1997). This is a problem because translations sometimes are commissioned for a somewhat different function than that of the original; in addition, a different audience and time may require a slightly different function than that of the source text (see Hönig 1997 for more on the problematic notion of equivalence). These scenarios are not contemplated by equivalence-based theories of translation. Furthermore, one can argue that what qualifies as equivalent is as variegated as the notion of quality itself. Other equivalence-based models of evaluation are Gerzymisch-Arbogast (2001), Neubert (1985), and Van den Broeck (1985). In sum, the reliance on an a priori notion of equivalence is problematic and limiting in descriptive as well as explanatory value.
An additional objection against textual and pragmatic approaches is that they are not precise about how evaluation is to proceed after the analysis of the source or the target text is complete, or after the function of the translation has been established
as the guiding criteria for making translation decisions. This obviously affects the ease with which the models can be applied to texts in professional settings. Hönig, for instance, after presenting some strong arguments for a functionalist approach to evaluation, does not offer any concrete instantiation of the model other than in the form of some general advice for translator trainers. He comes to the conclusion that "the speculative element will remain — at least as long as there are no hard and fast empirical data which serve to prove what a 'typical' reader's responses are like" (1997: 32).4 The same criticism regarding the difficulty involved in applying textual and theoretical models to professional contexts is raised by Lauscher (2000). She explores possible ways to bridge the gap between theoretical and practical quality assessment, concluding that "translation criticism could move closer to practical needs by developing a comprehensive translation tool" (2000: 164).
Other textual approaches to quality evaluation are the argumentation-centered approach of Williams (2001, 2004), in which evaluation is based on argumentation and rhetorical structure, and corpus-based approaches (Bowker 2001). The argumentation-centered approach is also equivalence-based, as "a translation must reproduce the argument structure of ST to meet minimum criteria of adequacy" (Williams 2001: 336). Bowker's corpus-based model uses "a comparatively large and carefully selected collection of naturally occurring texts that are stored in machine-readable form" as a benchmark against which to compare and evaluate specialized student translations. Although Bowker (2001) presents a novel, valuable proposal for the evaluation of students' translations, it does not provide specific indications as to how translations should be graded (2001: 346). In sum, argumentation and corpus-based approaches, although presenting crucial aspects of translation evaluation, are also complex and difficult to apply in professional environments (and — one could argue — in the classroom as well).
1.3 The functional-componential approach (Colina 2008)
Colina (2008) argues that current translation quality assessment methods have not achieved a middle ground between theory and applicability: while anecdotal approaches lack a theoretical framework, the theoretical models often do not contain testable hypotheses (i.e., they are non-verifiable) and/or are not developed with a view towards application in professional and/or teaching environments. In addition, she contends that theoretical models usually focus on partial aspects of translation (e.g., reader response, textual aspects, pragmatic aspects, relationship to the source, etc.). Perhaps due to practical limitations and the sheer complexity of the task, some of these approaches overlook the fact that quality in translation is a multifaceted reality and that a general, comprehensive approach to evaluation may need to address multiple components of quality simultaneously.
As a response to the inadequacies identified above, Colina (2008) proposes an approach to translation quality evaluation based on a theoretical approach (functionalist and textual models of translation) that can be applied in professional and educational contexts. In order to show the applicability of the model in practical settings, as well as to develop testable hypotheses and research questions, Colina and her collaborators designed a componential, functionalist, textual tool (henceforth the TQA tool) and pilot-tested it for inter-rater reliability (cf. Colina 2008 for more on the first version of this tool). The tool evaluates components of quality separately, consequently reflecting a componential approach to quality; it is also considered functionalist and textual, given that evaluation is carried out relative to the function and the characteristics of the audience specified for the translated text.
As mentioned above, it seems reasonable to hypothesize that disagreements over the definition of translation quality are rooted in the multiplicity of views of translation itself and in different priorities regarding quality components. It is often the case that a requester's view of quality will not coincide with that of the evaluators; yet without explicit criteria on which to base the evaluation, the evaluator can only rely on his/her own views. In an attempt to introduce flexibility with regard to different conditions influencing quality, the proposed TQA tool allows for a user-defined notion of quality, in which it is the user or requester who decides which aspects of quality are more important for his/her communicative purposes. This can be done either by adjusting customer-defined weights for each component or simply by assigning higher priorities to some components. Custom weighting of components is also important because the effect of a particular component on the whole text may also vary depending on textual type and function. An additional feature of the TQA tool is that it does not rely on a point-deduction system; rather, it tries to match the text under evaluation with one of several descriptors provided for each category/component of evaluation. In order to capture the descriptive, customer-defined notion of quality, the original tool was modified in the second experiment to include a cover sheet (see Appendix 1).
The experiment in Colina (2008) sets out to test the functional approach to evaluation by testing the tool's inter-rater reliability: 37 raters and 3 consultants were asked to use the tool to rate three translated texts. The texts selected for evaluation consisted of reader-oriented health education materials. Raters were bilinguals, professional translators, and language teachers. Some basic training was provided. Data was collected by means of the tool and a post-rating survey. Some differences in ratings could be ascribed to rater qualifications: teachers' and translators' ratings were more alike than those of bilinguals; bilinguals were found to rate higher and faster than the other groups. Teachers also tended to assign higher ratings than translators. It was shown that different types of raters were able to use
the tool without significant training. Pilot testing results indicate good inter-rater reliability for the tool and the need for further testing. The current paper focuses on a second experiment designed to further test the approach and tool proposed in Colina (2008).
2 Second phase of TQA testing: Methods and Results
2.1 Methods
One of the most important limitations of the experiment in Colina (2008) is in regard to the numbers and groups of participants. Given the project objective of ensuring applicability across languages frequently used in the USA, subject recruitment was done in three languages: Spanish, Russian, and Chinese. As a result, resources and time for recruitment had to be shared amongst the languages, with smaller numbers of subjects per language group. The testing described in the current experiment includes more subjects and additional texts. More specifically, the study reported in this paper aims:
I. To test the TQA tool again for inter-rater reliability (i.e., to what degree trained raters use the TQA tool consistently) by answering the following questions:

Question 1: For each text, how consistently do all raters rate the text?
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
Question 3: How consistently do raters in the second session (Reliability) rate the texts?
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?
II. To compare the rating skills/behavior of translators and teachers: Is there a difference in scoring between translators and teachers? (Question 5, Section 2.2)
Data was collected during two rounds of testing: the first, referred to as the Benchmark Testing, included 9 raters; the second session, the Reliability Testing, included 21 raters. Benchmark and Reliability sessions consisted of a short training session followed by a rating session. Raters were asked to rate 4–5 translated texts (depending on the language) and had one afternoon and one night to complete the task. After their evaluation worksheets had been submitted, raters were required to submit a survey on their experience using the tool. They were paid for their participation.
2.1.1 Raters

Raters were drawn from the pool used for the pre-pilot and pilot testing sessions reported in Colina (2008) (see Colina [2008] for selection criteria and additional details). A call was sent via email to all those raters selected for the pre-pilot and pilot testing (including those who were initially selected but did not take part). All raters available participated in this second phase of testing.
As in Colina (2008), it was hypothesized that similar rating results would be obtained within the members of the same group. Therefore, raters were recruited according to membership in one of two groups: professional translators and language teachers (language professionals who are not professional translators). Membership was assigned according to the same criteria as in Colina (2008). All selected raters exhibited linguistic proficiency equivalent to that of a native (or near-native) speaker in the source and in one of the target languages.
Professional translators were defined as language professionals whose income comes primarily from providing translation services. Significant professional experience (5 years minimum; most had 12–20 years of experience), membership in professional organizations, and education in translation and/or a relevant field were also needed for inclusion in this group. Recruitment for these types of individuals was primarily through the American Translators Association (ATA). Although only two applicants were ATA certified, almost all were ATA affiliates (members).
Language teachers were individuals whose main occupation was teaching language courses at a university or other educational institution. They may have had some translation experience but did not rely on translation as their source of income. A web search of teaching institutions with known foreign language programs was used for this recruitment. We reached out to schools throughout the country at both the community college and university levels. The definition of teacher did not preclude graduate student instructors.
Potential raters were assigned to the above groups on the basis of the information provided in their resume or curriculum vitae and a language background questionnaire included in a rater application.
The bilingual group in Colina (2008) was eliminated from the second experiment, as subjects were only available for one of the languages (Spanish). Translation competence models and research suggest that bilingualism is only one component of translation competence (Bell 1991; Cao 1996; Hatim and Mason 1997; PACTE 2008). Nonetheless, since evaluating translation products is not the same as translating, it is reasonable to hypothesize that other language professionals, such as teachers, may have the competence necessary to evaluate translations; this may be particularly true in cases such as the current project, in which the object of evaluation is not translator competence but translation products. This hypothesis would be borne out if the ratings provided by translators and teachers are similar.
As mentioned above, data was collected during two rounds of testing. The first one, the Benchmark Testing, included 9 raters (3 Russian, 3 Chinese, 3 Spanish); these raters were asked to evaluate 4–5 texts (per language) that had been previously selected as clearly of good or bad quality by expert consultants in each language. The second session, the Reliability Testing, included 21 raters, distributed as follows:
Spanish: 5 teachers, 3 translators (8)
Chinese: 3 teachers, 4 translators (7)
Russian: 3 teachers, 3 translators (6)
Differences across groups reflect general features of that language group in the US. Among the translators, the Russians had degrees in Languages, History and Translating, Engineering, and Nursing from Russian and US universities, and experience ranging from 12 to 22 years; the Chinese translators' experience ranged from 6 to 30 years, and their education included Chinese Language and Literature, Philosophy (MA), English (PhD), Neuroscience (PhD), and Medicine (MD), with degrees obtained in China and the US. Their Spanish counterparts' experience varied from 5 to 20 years, and their degrees included areas such as Education, Spanish and English Literature, Latin American Studies (MA), and Creative Writing (MA). The Spanish and Russian teachers were perhaps the most uniform groups, including college instructors (PhD students) with MAs in Spanish or Slavic Linguistics, Literature, and Communication, and one college professor of Russian. With one exception, they were all native speakers of Spanish or Russian with formal education in the country of origin. Chinese teachers were college instructors (PhD students) with MAs in Chinese, one college professor (PhD in Spanish), and an elementary school teacher and tutor (BA in Chinese). They were all native speakers of Chinese.
2.1.2 Texts

As mentioned above, experienced translators serving as language consultants selected the texts to be used in the rating sessions. Three consultants were instructed to identify health education texts translated from English into their language. Texts were to be publicly available on the Internet. Half were to be very good, and the other half were to be considered very poor on reading the text. Those texts were used for the Benchmark session of testing, during which they were rated by the consultants and two additional expert translators. The texts where there was the most agreement in rating were selected for the Reliability Testing. The Reliability texts consisted of five Spanish texts (three good and two bad), four Russian texts, and four Chinese texts (two of good quality and two of bad quality for each language), making up a total of thirteen texts.
2.1.3 Tool

The tool tested in Colina (2008) was modified to include a cover sheet consisting of two parts. Part I is to be completed by the person requesting the evaluation (i.e., the Requester) and read by the rater before he/she starts his/her work. It contains the Translation Brief, relative to which the evaluation must always take place, and the Quality Criteria, clarifying requester priorities among components. The TQA Evaluation Tool included in Appendix 1 contains a sample Part I as specified by Hablamos Juntos (the Requester) for the evaluation of a set of health education materials. The Quality Criteria section reflects the weights assigned to the four components in the Scoring Worksheet at the end of the tool. Part II of the Cover Sheet is to be filled in by the raters after the rating is complete. An Assessment Summary and Recommendation section was included to allow raters the opportunity to offer an action recommendation on the basis of their ratings, i.e., "What should the requester do now with this translation? Edit it? Minor or small edits? Redo it entirely?" An additional modification to the tool consisted of eliminating or adding descriptors so that each category would have an equal number of descriptors (four for each component), and revising the scores assigned so that the maximum number of points possible would be 100. Some minor stylistic changes were made in the language of the descriptors.
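As a rough illustration of the worksheet arithmetic just described, the sketch below sums component scores against per-component maxima that total 100 points. The maxima used (TL 30, FTA 25, MEAN 25, TERM 20) are an assumption inferred from the component figures in Table 2, not the tool's documented weighting, and `total_score` is a hypothetical helper, not part of the TQA tool itself.

```python
# Hypothetical sketch of the componential scoring worksheet.
# Component maxima (summing to 100) are assumptions inferred from
# Table 2, not the published weights of the TQA tool.
WEIGHTS = {"TL": 30, "FTA": 25, "MEAN": 25, "TERM": 20}

def total_score(component_points):
    """Sum the points awarded per component, checking each maximum."""
    for name, points in component_points.items():
        if points > WEIGHTS[name]:
            raise ValueError(f"{name} exceeds its {WEIGHTS[name]}-point maximum")
    return sum(component_points.values())

# Mean component scores reported for text 415 (Chinese) in Table 2:
print(total_score({"TL": 28.5, "FTA": 25.0, "MEAN": 23.5, "TERM": 19.0}))  # 96.0
```

With requester-defined weights, only the `WEIGHTS` mapping would change; the descriptor-matching step that produces each component's points is not modeled here.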
2.1.4 Rater training

The Benchmark and Reliability sessions included training and rating sessions. The training provided was substantially the same offered in the pilot testing and described in Colina (2008). It focused on the features and use of the tool, and it consisted of PDF materials (delivered via email), a PowerPoint presentation based on the contents of the PDF materials, and a question-and-answer session delivered online via an Internet and phone conferencing system.
Some revisions to the training reflect changes to the tool (including instructions on the new Cover Sheet), a few additional textual examples in Chinese, and a scored, completed sample worksheet for the Spanish group. Samples were not included for the other languages due to time and personnel constraints. The training served as a refresher for those raters who had already participated in the previous pilot training and rating (Colina 2008).5
2.2 Results
The results of the data collection were submitted to statistical analysis to determine to what degree trained raters use the TQA tool consistently.
Table 1 and Figures 1a and 1b show the overall score of each text rated and the standard deviation between the overall score and the individual rater scores.
200-series texts are Spanish texts, 400s are Chinese, and 300s are Russian. The standard deviations range from 8.1 to 19.2 for Spanish, from 5.7 to 21.2 for Chinese, and from 16.1 to 29.0 for Russian.
Question 1: For each text, how consistently do all raters rate the text?
The standard deviations in Table 1 and Figures 1a and 1b offer a good measure of how consistently individual texts are rated. A large standard deviation suggests that there was less rater agreement (or that the raters differed more in their assessment). Figure 1b shows the average standard deviations per language. According to this, the Russian raters were the ones with the highest average standard deviation and the least consistent in their ratings. This is in agreement with the reliability coefficients shown below (Table 5), as the Russian raters have the lowest inter-rater reliability. Table 2 shows average scores, standard deviations, and average standard deviations for each component of the tool, per text and per language. Figure 2 represents average standard deviations per component and per language. There does not appear to be an obvious connection between standard deviations and
Table 1. Average score of each text and standard deviation

Text    # of raters    Average Score    Standard Deviation

Spanish
210     11             91.8             8.1
214     11             89.5             11.3
215     11             86.8             15.0
228     11             48.6             19.2
235     11             56.4             18.5
Avg.                                    14.42

Chinese
410     10             88.0             10.3
413     10             63.0             21.0
415     10             96.0             5.7
418     10             76.0             21.2
Avg.                                    14.55

Russian
312     9              59.4             16.1
314     9              82.8             15.6
315     9              75.6             22.1
316     9              67.8             29.0
Avg.                                    20.7
Figure 1a. Average score and standard deviation per text [bar chart, 0–100 scale, one pair of bars (average score, standard deviation) per text number; chart not reproduced]
Figure 1b. Average standard deviations per language [chart not reproduced]
components. Although generally the components Target Language (TL) and Functional and Textual Adequacy (FTA) have higher standard deviations (i.e., ratings are less consistent), this is not always the case, as seen in the Chinese data (FTA). One would in fact expect the FTA category to exhibit the highest standard deviations, given its more holistic nature; yet the data do not bear out this hypothesis, as the TL component also shows standard deviations that are higher than Non-Specialized Content (MEAN) and Specialized Content and Terminology (TERM).
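The consistency measures in Tables 1–2 are plain means and standard deviations over the individual rater scores for each text. A minimal sketch, using invented scores (the raw rater data is not published) and assuming the sample standard deviation (the article does not say whether sample or population SD was used):

```python
import statistics

def text_consistency(scores):
    """Mean and sample standard deviation of one text's rater scores.

    The input scores here are hypothetical; the study's raw per-rater
    scores are not published.
    """
    return statistics.mean(scores), statistics.stdev(scores)

# Five hypothetical rater scores for a single text:
mean, sd = text_consistency([92, 88, 95, 90, 85])
print(round(mean, 1), round(sd, 1))  # 90.0 3.8
```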
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
The inter-rater reliability for the Spanish and for the Chinese raters is remarkable; however, the inter-rater reliability for the Russian raters is too low (Table 3).
Table 2. Average scores and standard deviations for four components per text and per language

                   TL             FTA            MEAN           TERM
Text    Raters     Mean    SD     Mean    SD     Mean    SD     Mean    SD

Spanish
210     11         27.7    2.6    23.6    2.3    22.7    2.6    17.7    3.4
214     11         27.3    4.7    20.9    7.0    23.2    2.5    18.2    3.4
215     11         28.6    2.3    22.3    4.7    18.2    6.8    17.7    3.4
228     11         15.0    7.7    11.4    6.0    10.9    6.3    11.4    4.5
235     11         15.9    8.3    12.3    6.5    13.6    6.4    14.5    4.7
Avg. SD                    5.12           5.3            4.92           3.88

Chinese
410     10         27.0    4.8    22.0    4.8    21.0    4.6    18.0    2.6
413     10         18.0    9.5    16.5    5.8    14.0    5.2    14.5    3.7
415     10         28.5    2.4    25.0    0.0    23.5    2.4    19.0    2.1
418     10         22.5    6.8    21.0    4.6    16.0    7.7    16.5    4.1
Avg. SD                    5.875          3.8            4.975          3.125

Russian
312     9          18.3    7.1    15.0    6.1    13.3    6.6    12.8    4.4
314     9          25.6    6.3    21.7    5.0    19.4    3.9    16.1    4.2
315     9          23.3    9.4    18.3    7.9    17.8    4.4    16.1    4.2
316     9          20.0    10.3   16.7    7.9    17.2    7.1    13.9    6.5
Avg. SD                    8.275          6.725          5.5            4.825

Avg. SD (all lgs)          6.3            5.3            5.1            3.9
This, in conjunction with the Reliability Testing results, leads us to believe in the presence of other unknown factors, unrelated to the tool, responsible for the low reliability of the Russian raters.
Question 3: How consistently do raters in the second session (Reliability) rate the texts? How do the reliability coefficients compare for the Benchmark and the Reliability Testing?
The results of the reliability raters mirror those of the benchmark raters, whereby the Spanish raters achieve a very good inter-rater reliability coefficient and the Chinese raters have an acceptable inter-rater reliability coefficient, but the inter-rater reliability for the Russian raters is very low (Table 4).
Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating), but the Spanish raters at both rating sessions achieved remarkable inter-rater reliability. The slight drop among the Russian raters from the first to the second session is negligible; in any case, the inter-rater reliability is too low.
Figure 2. Average standard deviations per tool component and per language [chart not reproduced]
Table 3. Reliability coefficients for benchmark ratings

           Reliability coefficient
Spanish    .953
Chinese    .973
Russian    .128
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?
The coefficients for the Spanish raters show very good reliability, with excellent coefficients for the first three components; the numbers for the Chinese raters are also very good, but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6).
Table 6. Reliability coefficients for the four components of the tool (all raters per language group)

           TL      FTA     MEAN    TERM
Spanish    .952    .929    .926    .848
Chinese    .844    .844    .864    .783
Russian    .367    .479    .492    .292
In sum, very good reliability was obtained for the Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing), as well as for all components of the tool. Reliability scores for the Russian raters are low. These results are in agreement with the standard deviation data presented in Tables 1–2, Figures 1a and 1b, and Figure 2. All of this leads us to believe that, whatever the cause for the Russian coefficients, it was not related to the tool itself.
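The article reports these reliability coefficients without naming the statistic. One common choice for a raters-by-texts design is Cronbach's alpha with raters treated as items; the sketch below makes that assumption and uses hypothetical scores, so it illustrates the kind of computation involved rather than the study's documented procedure.

```python
import statistics

def cronbach_alpha(ratings):
    """Cronbach's alpha over a rater-by-text score matrix.

    `ratings` holds one list of scores per rater, aligned by text.
    Treating raters as "items" is an assumption; the article does not
    specify which reliability coefficient was computed.
    """
    k = len(ratings)                            # number of raters
    totals = [sum(t) for t in zip(*ratings)]    # per-text score totals
    rater_vars = sum(statistics.variance(r) for r in ratings)
    return (k / (k - 1)) * (1 - rater_vars / statistics.variance(totals))

# Three hypothetical raters scoring the same four texts consistently:
print(round(cronbach_alpha([[90, 60, 95, 70],
                            [88, 62, 97, 74],
                            [92, 58, 93, 69]]), 2))  # 0.99
```

Raters who agree closely push the coefficient toward 1; near-zero values like the Russian group's would correspond to rater score profiles that barely covary across texts.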
Question 5: Is there a difference in scoring between translators and teachers?
Tables 7a and 7b show the scoring in terms of average scores and standard deviations for the translators and the teachers for all texts. Figures 3 and 4 show the mean scores and times for Spanish raters, comparing teachers and translators.
Table 4. Reliability coefficients for Reliability Testing

           Reliability coefficient
Spanish    .934
Chinese    .780
Russian    .118
Table 5. Inter-rater reliability: Benchmark and Reliability Testing

           Benchmark reliability coefficient    Reliability coefficient (for Reliability Testing)
Spanish    .953                                 .934
Chinese    .973                                 .780
Russian    .128                                 .118
Table 7a. Average scores and standard deviations for consultants and translators

        Score            Time
Text    Mean     SD      Mean     SD
210     93.3     7.5     75.8     59.4
214     93.3     12.1    94.2     101.4
215     85.0     17.9    36.3     18.3
228     46.7     20.7    37.5     22.3
235     46.7     18.6    49.5     38.9
410     91.4     7.5     46.0     22.1
413     62.9     21.0    40.7     13.7
415     96.4     4.8     26.1     15.4
418     69.3     22.1    52.4     22.2
312     52.5     15.1    26.7     2.6
314     88.3     10.3    22.5     4.2
315     74.2     26.3    28.7     7.8
316     63.3     32.7    25.8     6.6
Table 7b. Average scores and standard deviations for teachers

        Score            Time
Text    Mean     SD      Mean     SD
210     90.0     9.4     63.6     39.7
214     85.0     9.4     67.0     41.8
215     89.0     12.4    36.0     30.5
228     51.0     19.5    38.0     31.7
235     68.0     10.4    57.6     40.2
410     80.0     13.2    61.0     27.7
413     63.3     25.7    71.0     24.6
415     95.0     8.7     41.0     11.5
418     91.7     5.8     44.0     6.6
312     73.3     5.8     55.0     56.7
314     71.7     20.8    47.7     62.7
315     78.3     14.4    37.7     45.5
316     76.7     22.5    46.7     63.5
The corresponding data for Chinese appears in Figures 5 and 6, and in Figures 7 and 8 for Russian.
Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts).
As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5). Only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).
Despite the low inter-rater reliability among Russian raters, the same trend was found when comparing Russian translators and teachers as with the Chinese and the Spanish: Russian teachers rate similarly to or slightly higher than translators, and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).
In order to investigate the irregular behavior of the Russian raters, and to try to obtain an explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect there to be a relatively high (negative) correlation because of the inverse relationship between a high score and a low recommendation. As is illustrated in the three sub-tables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar with the Chinese raters, whereby all raters correlate very highly
Figure 3. Mean scores for Spanish raters, translators vs. teachers [chart not reproduced]
Figure 4. Time for Spanish raters, translators vs. teachers [chart not reproduced]
Figure 5. Mean scores for Chinese raters, translators vs. teachers [chart not reproduced]
Figure 6. Time for Chinese raters, translators vs. teachers [chart not reproduced]
Figure 7. Mean scores for Russian raters, translators vs. teachers [chart not reproduced]
between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.000 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK, and RS01NM) do not correlate highly between their recommendations and their total scores. A closer look, especially at these raters, is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independently of the tool design and tool features, as the scores and overall recommendation do not correlate highly as expected.
Figure 8. Time for Russian raters, translators vs. teachers [chart not reproduced]
Table 8 (3 sub-tables). Correlation between recommendation and total score

8.1 Spanish raters
SP04AR   SP01JC   SP01VS   SP02JA   SP02LA   SP02PB   SP02AB   SP01PC   SP01CC   SP02MC   SP01PS
−0.923   −0.958   −0.854   −0.938   −0.966   −0.421   −0.942   −0.975   −0.913   −0.981   −0.938

8.2 Chinese raters
CH01RL   CH04YY   CH01AX   CH02AC   CH02JG   CH01KG   CH02AH   CH01BJ   CH01CK   CH01FL
−0.935   −0.980   −0.996   −0.894   −1.000   −0.955   −0.980   −0.867   −0.943   −0.926

8.3 Russian raters
RS01EG   RS01EM   RS04GN   RS02NB   RS02LB   RS02MK   RS01SM   RS01NM   RS01RW
−0.998   −0.115   −0.933   −1.000   n/a      −0.500   −0.982   −0.500   −0.993
3 Conclusions
As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights about the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g., industry, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.
In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.
Notes
The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions,
and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals, and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.
1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.
2. www.sae.org; www.lisa.org/products/qamodel
3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.
4. Note the reference to reader response within a functionalist framework.
5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected who had not participated in the training and rating sessions of the previous experiment. Given the low number, researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).
References
Bell, Roger T. 1991. Translation and Translating. London: Longman.
Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation." Meta 46 (2): 345–364.
Cao, Deborah. 1996. "A Model of Translation Proficiency." Target 8 (2): 325–340.
Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations." Mechanical Translation 9 (3–4): 55–66.
Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.
Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach." The Translator 14 (1): 97–134.
Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation." Meta 46 (2): 227–242.
Hatim, Basil, and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.
Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment." Current Issues in Language and Society 4 (1): 6–34.
House, Juliane. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.
House, Juliane. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation." Meta 46 (2): 243–257.
Lauscher, Susanne. 2000. "Translation Quality-Assessment: Where Can Theory and Practice Meet?" The Translator 6 (2): 149–168.
Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.
Nida, Eugene. 1964. Toward a Science of Translating. Leiden: Brill.
Nida, Eugene, and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.
Further evidence for a functionalist approach to translation quality evaluation 257
Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.
PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'." In Translator and Interpreter Training: Issues, Methods and Debates, ed. John Kearns, 104–126. London and New York: Continuum.
Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hueber.
Reiss, Katharina, and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translationstheorie. Tübingen: Niemeyer.
Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function." In The Manipulation of Literature: Studies in Literary Translation, ed. Theo Hermans, 54–62. London and Sydney: Croom Helm.
Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment." Meta 46 (2): 326–344.
Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centered Approach. Ottawa: University of Ottawa Press.
Résumé

Colina (2008) proposes a componential and functional approach to translation quality evaluation and reports on the results of a pilot test of a tool designed for that approach. The results attest to a high degree of inter-rater reliability and justify further testing. This article presents an experiment designed to test the approach as well as the tool. Data were collected during two rounds of testing. A group of 30 raters, made up of Spanish, Chinese, and Russian translators and teachers, evaluated 4 or 5 translated texts. The results show that the tool yields good inter-rater reliability for all language groups and texts, with the exception of Russian; they also suggest that the low reliability of the Russian raters' scores is unrelated to the tool itself. These findings confirm those of Colina (2008).

Keywords: quality, testing, evaluation, rating, componential, functionalism, errors
Appendix 1 Tool
Benchmark Rating Session
Time Rating Starts: ________    Time Rating Ends: ________
Translation Quality Assessment – Cover Sheet for Health Education Materials
PART I To be completed by Requester
Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text
Requester
TitleDepartment Delivery Date
TRANSLATION BRIEF
Source Language Target Language
Spanish Russian Chinese
Text Type
Text Title
Target Audience
Purpose of Document
PRIORITY OF QUALITY CRITERIA
____ Target Language
____ Functional and Textual Adequacy
____ Non-Specialized Content (Meaning)
Rank EACH from 1 to 4
(1 being top priority)
____ Specialized Content and Terminology
PART II To be completed by TQA Rater
Rater (Name) Date Completed
Contact Information Date Received
Total Score Total Rating Time
ASSESSMENT SUMMARY AND RECOMMENDATION
Publish andor use as is
Minor edits needed before publishing
Major revision needed before publishing
Redo translation
(To be completed after evaluating translated text)
Translation will not be an effective communication strategy for this text. Explore other options (e.g., create new target-language materials).
NotesRecommended Edits
RATING INSTRUCTIONS
1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.
2. Check the description that best fits the text in each one of the categories.
3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.
4. Examples or comments are not required, but they can be useful to support your decisions or to provide a rationale for your descriptor selection.
1. TARGET LANGUAGE
(Check the one descriptor that best fits the text.)

1a. The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that the text cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b. The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c. Although the target text is generally readable, there are problems and awkward expressions resulting in most cases from unnecessary transfer from the source text.

1d. The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience, and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments:
2. FUNCTIONAL AND TEXTUAL ADEQUACY
(Check the one descriptor that best fits the text.)

2a. Disregard for the goals, purpose, function, and audience of the text. The text was translated without considering textual units, textual purpose, genre, or the needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b. The translated text gives some consideration to the intended purpose and audience for the translation but misses some important aspects of it (e.g., level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c. The translated text approximates the goals, purpose (function), and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d. The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to the cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments:
3. NON-SPECIALIZED CONTENT (MEANING)
(Check the one descriptor that best fits the text.)

3a. The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b. There have been some changes in meaning, omissions, and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c. Minor alterations in meaning, additions, or omissions.

3d. The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions, or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:
4. SPECIALIZED CONTENT AND TERMINOLOGY
(Check the one descriptor that best fits the text.)

4a. Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b. Serious/frequent mistakes involving terminology and/or specialized content.

4c. A few terminological errors, but the specialized content is not seriously affected.

4d. Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:

TOTAL SCORE: ________
SCORING WORKSHEET
Component: Target Language
  1a = 5    1b = 15    1c = 25    1d = 30

Component: Functional and Textual Adequacy
  2a = 5    2b = 10    2c = 20    2d = 25

Component: Non-Specialized Content
  3a = 5    3b = 10    3c = 20    3d = 25

Component: Specialized Content and Terminology
  4a = 5    4b = 10    4c = 15    4d = 20

(Maximum total score: 100)
Tally Sheet

Component                               Category Rating    Score Value
Target Language                         ________           ________
Functional and Textual Adequacy         ________           ________
Non-Specialized Content                 ________           ________
Specialized Content and Terminology     ________           ________
Total Score                                                ________
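The Scoring Worksheet arithmetic can be sketched in code. This is an illustrative reconstruction only, not part of the published tool; the dictionary keys and the function name are my own assumptions, while the point values are taken from the worksheet above.

```python
# Illustrative sketch of the Scoring Worksheet arithmetic (assumed names).
# Each component's checked descriptor maps to a point value; the total
# score is the sum of the four values, with a maximum of 100 points.

COMPONENT_VALUES = {
    "target_language":    {"1a": 5, "1b": 15, "1c": 25, "1d": 30},
    "functional_textual": {"2a": 5, "2b": 10, "2c": 20, "2d": 25},
    "non_specialized":    {"3a": 5, "3b": 10, "3c": 20, "3d": 25},
    "specialized_term":   {"4a": 5, "4b": 10, "4c": 15, "4d": 20},
}

def total_score(selections):
    """Sum the point value of the descriptor checked for each component."""
    return sum(COMPONENT_VALUES[comp][cat] for comp, cat in selections.items())

# A text matching the top descriptor in every component scores the maximum:
best = {"target_language": "1d", "functional_textual": "2d",
        "non_specialized": "3d", "specialized_term": "4d"}
print(total_score(best))  # 100
```

Note that the component maxima are unequal (30, 25, 25, 20); this is how the requester-defined priority among quality criteria is encoded in the scores.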
Appendix 2 Text sample
[Sample texts not reproduced.]
Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu
as the guiding criteria for making translation decisions. This obviously affects the ease with which the models can be applied to texts in professional settings. Hönig, for instance, after presenting some strong arguments for a functionalist approach to evaluation, does not offer any concrete instantiation of the model other than in the form of some general advice for translator trainers. He comes to the conclusion that "the speculative element will remain – at least as long as there are no hard and fast empirical data which serve to prove what a 'typical' reader's responses are like" (1997: 32).4 The same criticism regarding the difficulty involved in applying textual and theoretical models to professional contexts is raised by Lauscher (2000). She explores possible ways to bridge the gap between theoretical and practical quality assessment, concluding that "translation criticism could move closer to practical needs by developing a comprehensive translation tool" (2000: 164).
Other textual approaches to quality evaluation are the argumentation-centered approach of Williams (2001, 2004), in which evaluation is based on argumentation and rhetorical structure, and corpus-based approaches (Bowker 2001). The argumentation-centered approach is also equivalence-based, as "a translation must reproduce the argument structure of ST to meet minimum criteria of adequacy" (Williams 2001: 336). Bowker's corpus-based model uses "a comparatively large and carefully selected collection of naturally occurring texts that are stored in machine-readable form" as a benchmark against which to compare and evaluate specialized student translations. Although Bowker (2001) presents a novel, valuable proposal for the evaluation of students' translations, it does not provide specific indications as to how translations should be graded (2001: 346). In sum, argumentation and corpus-based approaches, although capturing crucial aspects of translation evaluation, are also complex and difficult to apply in professional environments (and, one could argue, in the classroom as well).
1.3 The functional-componential approach (Colina 2008)
Colina (2008) argues that current translation quality assessment methods have not achieved a middle ground between theory and applicability: while anecdotal approaches lack a theoretical framework, the theoretical models often do not contain testable hypotheses (i.e., they are non-verifiable) and/or are not developed with a view towards application in professional and/or teaching environments. In addition, she contends that theoretical models usually focus on partial aspects of translation (e.g., reader response, textual aspects, pragmatic aspects, relationship to the source, etc.). Perhaps due to practical limitations and the sheer complexity of the task, some of these approaches overlook the fact that quality in translation is a multifaceted reality and that a general, comprehensive approach to evaluation may need to address multiple components of quality simultaneously.
As a response to the inadequacies identified above, Colina (2008) proposes an approach to translation quality evaluation based on a theoretical approach (functionalist and textual models of translation) that can be applied in professional and educational contexts. In order to show the applicability of the model in practical settings, as well as to develop testable hypotheses and research questions, Colina and her collaborators designed a componential, functionalist, textual tool (henceforth the TQA tool) and pilot-tested it for inter-rater reliability (cf. Colina 2008 for more on the first version of this tool). The tool evaluates components of quality separately, consequently reflecting a componential approach to quality; it is also considered functionalist and textual, given that evaluation is carried out relative to the function and the characteristics of the audience specified for the translated text.
As mentioned above, it seems reasonable to hypothesize that disagreements over the definition of translation quality are rooted in the multiplicity of views of translation itself and in different priorities regarding quality components. It is often the case that a requester's view of quality will not coincide with that of the evaluators, yet without explicit criteria on which to base the evaluation, the evaluator can only rely on his/her own views. In an attempt to introduce flexibility with regard to different conditions influencing quality, the proposed TQA tool allows for a user-defined notion of quality, in which it is the user or requester who decides which aspects of quality are more important for his/her communicative purposes. This can be done either by adjusting customer-defined weights for each component or simply by assigning higher priorities to some components. Custom weighting of components is also important because the effect of a particular component on the whole text may vary depending on textual type and function. An additional feature of the TQA tool is that it does not rely on a point-deduction system; rather, it tries to match the text under evaluation with one of several descriptors provided for each category/component of evaluation. In order to capture the descriptive, customer-defined notion of quality, the original tool was modified in the second experiment to include a cover sheet (see Appendix 1).
The experiment in Colina (2008) sets out to test the functional approach to evaluation by testing the tool's inter-rater reliability: 37 raters and 3 consultants were asked to use the tool to rate three translated texts. The texts selected for evaluation consisted of reader-oriented health education materials. Raters were bilinguals, professional translators, and language teachers. Some basic training was provided. Data was collected by means of the tool and a post-rating survey. Some differences in ratings could be ascribed to rater qualifications: teachers' and translators' ratings were more alike than those of bilinguals, and bilinguals were found to rate higher and faster than the other groups. Teachers also tended to assign higher ratings than translators. It was shown that different types of raters were able to use
the tool without significant training. Pilot testing results indicate good inter-rater reliability for the tool and the need for further testing. The current paper focuses on a second experiment designed to further test the approach and tool proposed in Colina (2008).
2 Second phase of TQA testing: Methods and results
2.1 Methods
One of the most important limitations of the experiment in Colina (2008) is in regard to the numbers and groups of participants. Given the project objective of ensuring applicability across languages frequently used in the USA, subject recruitment was done in three languages: Spanish, Russian, and Chinese. As a result, resources and time for recruitment had to be shared amongst the languages, with smaller numbers of subjects per language group. The testing described in the current experiment includes more subjects and additional texts. More specifically, the study reported in this paper aims:
I. To test the TQA tool again for inter-rater reliability (i.e., to what degree trained raters use the TQA tool consistently) by answering the following questions:

Question 1: For each text, how consistently do all raters rate the text?
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
Question 3: How consistently do raters in the second session (Reliability) rate the texts?
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?
II. To compare the rating skills/behavior of translators and teachers: Is there a difference in scoring between translators and teachers? (Question 5, Section 2.2)
Data was collected during two rounds of testing: the first, referred to as the Benchmark Testing, included 9 raters; the second session, the Reliability Testing, included 21 raters. Benchmark and Reliability sessions consisted of a short training session followed by a rating session. Raters were asked to rate 4–5 translated texts (depending on the language) and had one afternoon and one night to complete the task. After their evaluation worksheets had been submitted, raters were required to submit a survey on their experience using the tool. They were paid for their participation.
2.1.1 Raters
Raters were drawn from the pool used for the pre-pilot and pilot testing sessions reported in Colina (2008) (see Colina [2008] for selection criteria and additional details). A call was sent via email to all those raters selected for the pre-pilot and pilot testing (including those who were initially selected but did not take part). All raters available participated in this second phase of testing.
As in Colina (2008), it was hypothesized that similar rating results would be obtained within the members of the same group. Therefore, raters were recruited according to membership in one of two groups: professional translators and language teachers (language professionals who are not professional translators). Membership was assigned according to the same criteria as in Colina (2008). All selected raters exhibited linguistic proficiency equivalent to that of a native (or near-native) speaker in the source and in one of the target languages.
Professional translators were defined as language professionals whose income comes primarily from providing translation services. Significant professional experience (5 years minimum; most had 12–20 years of experience), membership in professional organizations, and education in translation and/or a relevant field were also needed for inclusion in this group. Recruitment for these types of individuals was primarily through the American Translators Association (ATA). Although only two applicants were ATA certified, almost all were ATA affiliates (members).
Language teachers were individuals whose main occupation was teaching language courses at a university or other educational institution. They may have had some translation experience but did not rely on translation as their source of income. A web search of teaching institutions with known foreign language programs was used for this recruitment. We reached out to schools throughout the country at both the community college and university levels. The definition of teacher did not preclude graduate student instructors.
Potential raters were assigned to the above groups on the basis of the information provided in their resume or curriculum vitae and a language background questionnaire included in a rater application.
The bilingual group in Colina (2008) was eliminated from the second experiment, as subjects were only available for one of the languages (Spanish). Translation competence models and research suggest that bilingualism is only one component of translation competence (Bell 1991; Cao 1996; Hatim and Mason 1997; PACTE 2008). Nonetheless, since evaluating translation products is not the same as translating, it is reasonable to hypothesize that other language professionals, such as teachers, may have the competence necessary to evaluate translations; this may be particularly true in cases such as the current project, in which the object of evaluation is not translator competence but translation products. This hypothesis would be borne out if the ratings provided by translators and teachers are similar.
As mentioned above, data was collected during two rounds of testing. The first one, the Benchmark Testing, included 9 raters (3 Russian, 3 Chinese, 3 Spanish); these raters were asked to evaluate 4–5 texts (per language) that had been previously selected as clearly of good or bad quality by expert consultants in each language. The second session, the Reliability Testing, included 21 raters, distributed as follows:
Spanish: 5 teachers, 3 translators (8)
Chinese: 3 teachers, 4 translators (7)
Russian: 3 teachers, 3 translators (6)
Differences across groups reflect general features of that language group in the US. Among the translators, the Russians had degrees in Languages, History and Translating, Engineering, and Nursing from Russian and US universities, with experience ranging from 12 to 22 years; the Chinese translators' experience ranged from 6 to 30 years, and their education included Chinese Language and Literature, Philosophy (MA), English (PhD), Neuroscience (PhD), and Medicine (MD), with degrees obtained in China and the US. Their Spanish counterparts' experience varied from 5 to 20 years, and their degrees included areas such as Education, Spanish and English Literature, Latin American Studies (MA), and Creative Writing (MA). The Spanish and Russian teachers were perhaps the most uniform groups, including college instructors (PhD students) with MAs in Spanish or Slavic Linguistics, Literature, and Communication, and one college professor of Russian. With one exception, they were all native speakers of Spanish or Russian with formal education in the country of origin. Chinese teachers were college instructors (PhD students) with MAs in Chinese, one college professor (PhD in Spanish), and an elementary school teacher and tutor (BA in Chinese). They were all native speakers of Chinese.
2.1.2 Texts
As mentioned above, experienced translators serving as language consultants selected the texts to be used in the rating sessions. Three consultants were instructed to identify health education texts translated from English into their language. Texts were to be publicly available on the Internet. Half were to be very good and the other half very poor on reading the text. Those texts were used for the Benchmark session of testing, during which they were rated by the consultants and two additional expert translators. The texts where there was the most agreement in rating were selected for the Reliability Testing. The Reliability texts comprised five Spanish texts (three good and two bad), four Russian texts, and four Chinese texts (two of good quality and two of bad quality for each of these two languages), making up a total of thirteen texts.
2.1.3 Tool
The tool tested in Colina (2008) was modified to include a cover sheet consisting of two parts. Part I is to be completed by the person requesting the evaluation (i.e., the Requester) and read by the rater before he/she starts his/her work. It contains the Translation Brief, relative to which the evaluation must always take place, and the Quality Criteria, clarifying requester priorities among components. The TQA Evaluation Tool included in Appendix 1 contains a sample Part I as specified by Hablamos Juntos (the Requester) for the evaluation of a set of health education materials. The Quality Criteria section reflects the weights assigned to the four components in the Scoring Worksheet at the end of the tool. Part II of the Cover Sheet is to be filled in by the raters after the rating is complete. An Assessment Summary and Recommendation section was included to allow raters the opportunity to offer an action recommendation on the basis of their ratings, i.e., "What should the requester do now with this translation? Edit it? Minor or small edits? Redo it entirely?" An additional modification to the tool consisted of eliminating or adding descriptors so that each category would have an equal number of descriptors (four for each component) and revising the scores assigned so that the maximum number of points possible would be 100. Some minor stylistic changes were made in the language of the descriptors.
2.1.4 Rater training
The Benchmark and Reliability sessions included training and rating sessions. The training provided was substantially the same as that offered in the pilot testing and described in Colina (2008). It focused on the features and use of the tool, and it consisted of PDF materials (delivered via email), a PowerPoint presentation based on the contents of the PDF materials, and a question-and-answer session delivered online via Internet and phone conferencing system.
Some revisions to the training reflect changes to the tool (including instructions on the new Cover Sheet), a few additional textual examples in Chinese, and a scored, completed sample worksheet for the Spanish group. Samples were not included for the other languages due to time and personnel constraints. The training served as a refresher for those raters who had already participated in the previous pilot training and rating (Colina 2008).5
2.2 Results
The results of the data collection were submitted to statistical analysis to determine to what degree trained raters use the TQA tool consistently.
Table 1 and Figures 1a and 1b show the overall score of each text rated and the standard deviation of the individual rater scores around that overall score.
200-series texts are Spanish texts, 400s are Chinese, and 300s are Russian. The standard deviations range from 8.1 to 19.2 for Spanish, from 5.7 to 21.2 for Chinese, and from 16.1 to 29.0 for Russian.
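The per-text figures in Table 1 are plain means and standard deviations over the raters' overall scores. As a rough sketch (with invented scores, since the raw rater data are not reproduced here, and assuming the sample standard deviation, which the article does not specify):

```python
import statistics

def text_stats(scores):
    """Mean and sample standard deviation of one text's overall rater scores,
    rounded to one decimal as in Table 1."""
    return (round(statistics.mean(scores), 1), round(statistics.stdev(scores), 1))

# Invented scores for 11 hypothetical raters of one text (not the study's data):
mean, sd = text_stats([95, 90, 88, 92, 85, 97, 90, 93, 89, 94, 91])
print(mean, sd)  # 91.3 3.4
```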
Question 1: For each text, how consistently do all raters rate the text?
The standard deviations in Table 1 and Figures 1a and 1b offer a good measure of how consistently individual texts are rated. A large standard deviation suggests that there was less rater agreement (or that the raters differed more in their assessment). Figure 1b shows the average standard deviations per language. According to this, the Russian raters were the ones with the highest average standard deviation and the least consistent in their ratings. This is in agreement with the reliability coefficients shown below (Table 5), as the Russian raters have the lowest inter-rater reliability. Table 2 shows average scores, standard deviations, and average standard deviations for each component of the tool, per text and per language. Figure 2 represents average standard deviations per component and per language. There does not appear to be an obvious connection between standard deviations and
Table 1. Average score of each text and standard deviation

Text    # of raters    Average score    Standard deviation
Spanish
210     11             91.8             8.1
214     11             89.5             11.3
215     11             86.8             15.0
228     11             48.6             19.2
235     11             56.4             18.5
                                        Avg: 14.42
Chinese
410     10             88.0             10.3
413     10             63.0             21.0
415     10             96.0             5.7
418     10             76.0             21.2
                                        Avg: 14.55
Russian
312     9              59.4             16.1
314     9              82.8             15.6
315     9              75.6             22.1
316     9              67.8             29.0
                                        Avg: 20.7
Figure 1a. Average score and standard deviation per text

Figure 1b. Average standard deviations per language
components. Although generally the components Target Language (TL) and Functional and Textual Adequacy (FTA) have higher standard deviations (i.e., ratings are less consistent), this is not always the case, as seen in the Chinese data (FTA). One would in fact expect the FTA category to exhibit the highest standard deviations, given its more holistic nature, yet the data do not bear out this hypothesis, as the TL component also shows standard deviations that are higher than those of Non-Specialized Content (MEAN) and Specialized Content and Terminology (TERM).
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?

The inter-rater reliability for the Spanish and for the Chinese raters is remarkable; however, the inter-rater reliability for the Russian raters is too low (Table 3).
Table 2. Average scores and standard deviations for four components, per text and per language

                 TL            FTA           MEAN          TERM
Text   Raters    Mean   SD     Mean   SD     Mean   SD     Mean   SD
Spanish
210    11        27.7   2.6    23.6   2.3    22.7   2.6    17.7   3.4
214    11        27.3   4.7    20.9   7.0    23.2   2.5    18.2   3.4
215    11        28.6   2.3    22.3   4.7    18.2   6.8    17.7   3.4
228    11        15.0   7.7    11.4   6.0    10.9   6.3    11.4   4.5
235    11        15.9   8.3    12.3   6.5    13.6   6.4    14.5   4.7
Avg. SD                 5.12          5.3           4.92          3.88
Chinese
410    10        27.0   4.8    22.0   4.8    21.0   4.6    18.0   2.6
413    10        18.0   9.5    16.5   5.8    14.0   5.2    14.5   3.7
415    10        28.5   2.4    25.0   0.0    23.5   2.4    19.0   2.1
418    10        22.5   6.8    21.0   4.6    16.0   7.7    16.5   4.1
Avg. SD                 5.875         3.8           4.975         3.125
Russian
312    9         18.3   7.1    15.0   6.1    13.3   6.6    12.8   4.4
314    9         25.6   6.3    21.7   5.0    19.4   3.9    16.1   4.2
315    9         23.3   9.4    18.3   7.9    17.8   4.4    16.1   4.2
316    9         20.0   10.3   16.7   7.9    17.2   7.1    13.9   6.5
Avg. SD                 8.275         6.725         5.5           4.825
Avg. SD (all languages) 6.3           5.3           5.1           3.9
This, in conjunction with the Reliability Testing results, leads us to believe that other, unknown factors, unrelated to the tool, are responsible for the low reliability of the Russian raters.
Question 3: How consistently do raters in the second session (Reliability) rate the texts? How do the reliability coefficients compare for the Benchmark and the Reliability Testing?

The results of the reliability raters mirror those of the benchmark raters: the Spanish raters achieve a very good inter-rater reliability coefficient, the Chinese raters have an acceptable inter-rater reliability coefficient, but the inter-rater reliability for the Russian raters is very low (Table 4).
Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating), but the Spanish raters achieved remarkable inter-rater reliability at both rating sessions. The slight drop among the Russian raters from the first to the second session is negligible; in any case, the inter-rater reliability is too low.
Figure 2. Average standard deviations per tool component and per language
Table 3. Reliability coefficients for benchmark ratings

           Reliability coefficient
Spanish    .953
Chinese    .973
Russian    .128
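The article does not specify which inter-rater reliability statistic was computed. As a hedged illustration, a Cronbach's-alpha-style coefficient, one common choice that treats each rater as an "item", can be sketched as follows, on invented scores:

```python
import statistics

def cronbach_alpha(ratings_by_rater):
    """ratings_by_rater: one list of scores per rater (one score per text)."""
    k = len(ratings_by_rater)                            # number of raters
    item_vars = [statistics.variance(r) for r in ratings_by_rater]
    totals = [sum(col) for col in zip(*ratings_by_rater)]  # per-text totals
    return (k / (k - 1)) * (1 - sum(item_vars) / statistics.variance(totals))

# Three hypothetical raters scoring four texts on a 0-100 scale
# (invented for illustration; not the study's data).
raters = [
    [92, 63, 96, 76],
    [88, 60, 94, 72],
    [90, 66, 98, 80],
]
print(round(cronbach_alpha(raters), 3))
```

Because these invented raters rank the texts almost identically, the coefficient comes out close to 1; widely divergent rankings, as with the Russian group, drive it toward 0.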
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?
The coefficients for the Spanish raters show very good reliability, with excellent coefficients for the first three components; the numbers for the Chinese raters are also very good, but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6).
Table 6. Reliability coefficients for the four components of the tool (all raters per language group)

           TL      FTA     MEAN    TERM
Spanish    .952    .929    .926    .848
Chinese    .844    .844    .864    .783
Russian    .367    .479    .492    .292
In sum, very good reliability was obtained for the Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing), as well as for all components of the tool. Reliability scores for the Russian raters are low. These results are in agreement with the standard deviation data presented in Tables 1–2, Figures 1a and 1b, and Figure 2. All of this leads us to believe that, whatever the cause for the Russian coefficients, it was not related to the tool itself.
Question 5: Is there a difference in scoring between translators and teachers?

Table 7a and Table 7b show the scoring, in terms of average scores and standard deviations, for the translators and the teachers for all texts. Figures 3 and 4 show the mean scores and times for the Spanish raters, comparing teachers and translators.
Table 4. Reliability coefficients for Reliability Testing

           Reliability coefficient
Spanish    .934
Chinese    .780
Russian    .118
Table 5. Inter-rater reliability: Benchmark and Reliability Testing

           Benchmark reliability coefficient    Reliability coefficient (Reliability Testing)
Spanish    .953                                 .934
Chinese    .973                                 .780
Russian    .128                                 .118
Table 7a. Average scores and standard deviations for translators

        Score             Time
Text    Mean    SD        Mean    SD
210     93.3    7.5       75.8    59.4
214     93.3    12.1      94.2    101.4
215     85.0    17.9      36.3    18.3
228     46.7    20.7      37.5    22.3
235     46.7    18.6      49.5    38.9
410     91.4    7.5       46.0    22.1
413     62.9    21.0      40.7    13.7
415     96.4    4.8       26.1    15.4
418     69.3    22.1      52.4    22.2
312     52.5    15.1      26.7    2.6
314     88.3    10.3      22.5    4.2
315     74.2    26.3      28.7    7.8
316     63.3    32.7      25.8    6.6
Table 7b. Average scores and standard deviations for teachers

        Score             Time
Text    Mean    SD        Mean    SD
210     90.0    9.4       63.6    39.7
214     85.0    9.4       67.0    41.8
215     89.0    12.4      36.0    30.5
228     51.0    19.5      38.0    31.7
235     68.0    10.4      57.6    40.2
410     80.0    13.2      61.0    27.7
413     63.3    25.7      71.0    24.6
415     95.0    8.7       41.0    11.5
418     91.7    5.8       44.0    6.6
312     73.3    5.8       55.0    56.7
314     71.7    20.8      47.7    62.7
315     78.3    14.4      37.7    45.5
316     76.7    22.5      46.7    63.5
The corresponding data for Chinese appear in Figures 5 and 6, and for Russian in Figures 7 and 8.

Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts).

As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5). Only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).

Despite the low inter-rater reliability among the Russian raters, the same trend was found when comparing Russian translators and teachers as with the Chinese and the Spanish: Russian teachers rate similarly to or slightly higher than translators, and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).
In order to investigate the irregular behavior of the Russian raters, and to try to obtain an explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect a relatively high (negative) correlation, because of the inverse relationship between a high score and a low recommendation. As is illustrated in the three sub-tables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar for the Chinese raters, whereby all raters correlate very highly
Figure 3. Mean scores for Spanish raters
Figure 4. Time for Spanish raters
Figure 5. Mean scores for Chinese raters
Figure 6. Time for Chinese raters
Figure 7. Mean scores for Russian raters
between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.00 (CH02JG). The results are different for the Russian raters, however. It appears that for three raters (RS01EM, RS02MK, and RS01NM) the recommendations and the total scores do not correlate highly. A closer look at these raters in particular is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independently of the tool design and tool features, as the scores and overall recommendations do not correlate as highly as expected.
Figure 8. Time for Russian raters
Table 8. Correlation between recommendation and total score (three sub-tables)

8.1 Spanish raters
SP04AR   SP01JC   SP01VS   SP02JA   SP02LA   SP02PB   SP02AB   SP01PC   SP01CC   SP02MC   SP01PS
−0.923   −0.958   −0.854   −0.938   −0.966   −0.421   −0.942   −0.975   −0.913   −0.981   −0.938

8.2 Chinese raters
CH01RL   CH04YY   CH01AX   CH02AC   CH02JG   CH01KG   CH02AH   CH01BJ   CH01CK   CH01FL
−0.935   −0.980   −0.996   −0.894   −1.000   −0.955   −0.980   −0.867   −0.943   −0.926

8.3 Russian raters
RS01EG   RS01EM   RS04GN   RS02NB   RS02LB   RS02MK   RS01SM   RS01NM   RS01RW
−0.998   −0.115   −0.933   −1.000   n/a      −0.500   −0.982   −0.500   −0.993
3 Conclusions
As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights into the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context, where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g., industry, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of the evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.
In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.
Notes
The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions,
and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals, and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.
1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.
2. www.sae.org; www.lisa.org/products/qamodel
3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.
4. Note the reference to reader response within a functionalist framework.
5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected who had not participated in the training and rating sessions of the previous experiment. Given the low number, researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).
References

Bell, Roger T. 1991. Translation and Translating. London: Longman.
Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation". Meta 46:2. 345–364.
Cao, Deborah. 1996. "A Model of Translation Proficiency". Target 8:2. 325–340.
Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations". Mechanical Translation 9:3–4. 55–66.
Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.
Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach". The Translator 14:1. 97–134.
Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation". Meta 46:2. 227–242.
Hatim, Basil, and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.
Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment". Current Issues in Language and Society 4:1. 6–34.
House, Julianne. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.
House, Julianne. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation". Meta 46:2. 243–257.
Lauscher, S. 2000. "Translation Quality-Assessment: Where Can Theory and Practice Meet?" The Translator 6:2. 149–168.
Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.
Nida, Eugene. 1964. Toward a Science of Translation. Leiden: Brill.
Nida, Eugene, and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.
Nord, Christianne. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.
PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'". John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.
Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hüber.
Reiss, Katharina, and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translations-Theorie. Tübingen: Niemeyer.
Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function". Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.
Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment". Meta 46:2. 326–344.
Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centered Approach. Ottawa: University of Ottawa Press.
Résumé

Colina (2008) propose une approche componentielle et fonctionnelle de l'évaluation de la qualité des traductions et dresse un rapport sur les résultats d'un test-pilote portant sur un outil conçu pour cette approche. Les résultats attestent un taux élevé de fiabilité entre évaluateurs et justifient la continuation des tests. Cet article présente une expérimentation destinée à tester l'approche ainsi que l'outil. Des données ont été collectées pendant deux périodes de tests. Un groupe de 30 évaluateurs, composé de traducteurs et enseignants espagnols, chinois et russes, ont évalué 4 ou 5 textes traduits. Les résultats montrent que l'outil assure un bon taux de fiabilité entre évaluateurs pour tous les groupes de langues et de textes, à l'exception du russe ; ils suggèrent également que le faible taux de fiabilité des scores obtenus par les évaluateurs russes est sans rapport avec l'outil lui-même. Ces constats confirment ceux de Colina (2008).

Mots-clés : qualité, test, évaluation, notation, componentiel, fonctionnalisme, erreurs
Appendix 1 Tool
Benchmark Rating Session
Time Rating Starts: ________        Time Rating Ends: ________
Translation Quality Assessment – Cover Sheet for Health Education Materials

PART I: To be completed by Requester

The Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text.

Requester:
Title/Department:        Delivery Date:
TRANSLATION BRIEF
Source Language Target Language
Spanish Russian Chinese
Text Type
Text Title
Target Audience
Purpose of Document
PRIORITY OF QUALITY CRITERIA

Rank EACH from 1 to 4 (1 being top priority):

____ Target Language
____ Functional and Textual Adequacy
____ Non-Specialized Content (Meaning)
____ Specialized Content and Terminology
PART II: To be completed by TQA Rater

Rater (Name):        Date Completed:
Contact Information:        Date Received:
Total Score:        Total Rating Time:
ASSESSMENT SUMMARY AND RECOMMENDATION
(To be completed after evaluating the translated text.)

____ Publish and/or use as is
____ Minor edits needed before publishing
____ Major revision needed before publishing
____ Redo translation
____ Translation will not be an effective communication strategy for this text. Explore other options (e.g., create new target language materials).

Notes/Recommended Edits:
RATING INSTRUCTIONS
1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.
2. Check the description that best fits the text given in each one of the categories.
3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.
4. Examples or comments are not required, but they can be useful to help support your decisions or to provide rationale for your descriptor selection.
1 TARGET LANGUAGE
Category Number    Description    Check one box

1a    The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that it cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b    The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c    Although the target text is generally readable, there are problems and awkward expressions resulting in most cases from unnecessary transfer from the source text.

1d    The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience, and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments:
2 FUNCTIONAL AND TEXTUAL ADEQUACY
Category Number    Description    Check one box

2a    Disregard for the goals, purpose, function, and audience of the text. The text was translated without considering textual units, textual purpose, genre, needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b    The translated text gives some consideration to the intended purpose and audience for the translation but misses some important aspects of it (e.g., level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c    The translated text approximates the goals, purpose (function), and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d    The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments:
3 NON-SPECIALIZED CONTENT-MEANING
Category Number    Description    Check one box

3a    The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b    There have been some changes in meaning, omissions, and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c    Minor alterations in meaning, additions, or omissions.

3d    The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions, or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:
4 SPECIALIZED CONTENT AND TERMINOLOGY
Category Number    Description    Check one box

4a    Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b    Serious/frequent mistakes involving terminology and/or specialized content.

4c    A few terminological errors, but the specialized content is not seriously affected.

4d    Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:
TOTAL SCORE
SCORING WORKSHEET
Component: Target Language              Component: Functional and Textual Adequacy
Category  Value  Score                   Category  Value  Score
1a        5                              2a        5
1b        15                             2b        10
1c        25                             2c        20
1d        30                             2d        25

Component: Non-Specialized Content      Component: Specialized Content and Terminology
Category  Value  Score                   Category  Value  Score
3a        5                              4a        5
3b        10                             4b        10
3c        20                             4c        15
3d        25                             4d        20
Tally Sheet

Component                                 Category Rating    Score Value
Target Language
Functional and Textual Adequacy
Non-Specialized Content
Specialized Content and Terminology
Total Score
Appendix 2 Text sample
Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu
As a response to the inadequacies identified above, Colina (2008) proposes an approach to translation quality evaluation based on a theoretical approach (functionalist and textual models of translation) that can be applied in professional and educational contexts. In order to show the applicability of the model in practical settings, as well as to develop testable hypotheses and research questions, Colina and her collaborators designed a componential, functionalist, textual tool (henceforth the TQA tool) and pilot-tested it for inter-rater reliability (cf. Colina 2008 for more on the first version of this tool). The tool evaluates components of quality separately, consequently reflecting a componential approach to quality; it is also considered functionalist and textual, given that evaluation is carried out relative to the function and the characteristics of the audience specified for the translated text.
As mentioned above, it seems reasonable to hypothesize that disagreements over the definition of translation quality are rooted in the multiplicity of views of translation itself and in different priorities regarding quality components. It is often the case that a requester's view of quality will not coincide with that of the evaluators; yet, without explicit criteria on which to base the evaluation, the evaluator can only rely on his/her own views. In an attempt to introduce flexibility with regard to different conditions influencing quality, the proposed TQA tool allows for a user-defined notion of quality, in which it is the user or requester who decides which aspects of quality are more important for his/her communicative purposes. This can be done either by adjusting customer-defined weights for each component or simply by assigning higher priorities to some components. Custom weighting of components is also important because the effect of a particular component on the whole text may also vary depending on textual type and function. An additional feature of the TQA tool is that it does not rely on a point-deduction system; rather, it tries to match the text under evaluation with one of several descriptors provided for each category/component of evaluation. In order to capture the descriptive, customer-defined notion of quality, the original tool was modified in the second experiment to include a cover sheet (see Appendix 1).
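The descriptor-matching scheme can be sketched as a simple lookup, using the point values from the tool's Scoring Worksheet (Appendix 1). The dictionary and function names below are illustrative only, and requester-defined weights or priorities would rescale or reorder these values:

```python
# Point values per descriptor, from the tool's Scoring Worksheet (Appendix 1).
DESCRIPTOR_VALUES = {
    "TL":   {"1a": 5, "1b": 15, "1c": 25, "1d": 30},  # Target Language
    "FTA":  {"2a": 5, "2b": 10, "2c": 20, "2d": 25},  # Functional and Textual Adequacy
    "MEAN": {"3a": 5, "3b": 10, "3c": 20, "3d": 25},  # Non-Specialized Content
    "TERM": {"4a": 5, "4b": 10, "4c": 15, "4d": 20},  # Specialized Content and Terminology
}

def total_score(selections):
    """Sum the point values for the descriptor chosen in each component."""
    return sum(DESCRIPTOR_VALUES[comp][desc] for comp, desc in selections.items())

# A text matched to the top descriptor in every component scores 100.
print(total_score({"TL": "1d", "FTA": "2d", "MEAN": "3d", "TERM": "4d"}))  # 100
```

Note how this differs from a point-deduction system: the rater never subtracts for individual errors but selects, per component, the descriptor that best matches the whole text.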
The experiment in Colina (2008) sets out to test the functional approach to evaluation by testing the tool's inter-rater reliability: 37 raters and 3 consultants were asked to use the tool to rate three translated texts. The texts selected for evaluation consisted of reader-oriented health education materials. Raters were bilinguals, professional translators, and language teachers. Some basic training was provided. Data was collected by means of the tool and a post-rating survey. Some differences in ratings could be ascribed to rater qualifications: teachers' and translators' ratings were more alike than those of bilinguals, and bilinguals were found to rate higher and faster than the other groups. Teachers also tended to assign higher ratings than translators. It was shown that different types of raters were able to use
the tool without significant training. Pilot testing results indicate good inter-rater reliability for the tool and the need for further testing. The current paper focuses on a second experiment designed to further test the approach and tool proposed in Colina (2008).
2 Second phase of TQA testing: Methods and Results
2.1 Methods
One of the most important limitations of the experiment in Colina (2008) is in regard to the numbers and groups of participants. Given the project objective of ensuring applicability across languages frequently used in the USA, subject recruitment was done in three languages: Spanish, Russian, and Chinese. As a result, resources and time for recruitment had to be shared amongst the languages, with smaller numbers of subjects per language group. The testing described in the current experiment includes more subjects and additional texts. More specifically, the study reported in this paper aims:
I. To test the TQA tool again for inter-rater reliability (i.e., to what degree trained raters use the TQA tool consistently) by answering the following questions:

Question 1: For each text, how consistently do all raters rate the text?
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
Question 3: How consistently do raters in the second session (Reliability) rate the texts?
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?

II. To compare the rating skills/behavior of translators and teachers: Is there a difference in scoring between translators and teachers? (Question 5, Section 2.2)
Data was collected during two rounds of testing: the first, referred to as the Benchmark Testing, included 9 raters; the second session, the Reliability Testing, included 21 raters. Benchmark and Reliability sessions consisted of a short training session followed by a rating session. Raters were asked to rate 4–5 translated texts (depending on the language) and had one afternoon and one night to complete the task. After their evaluation worksheets had been submitted, raters were required to submit a survey on their experience using the tool. They were paid for their participation.
2.1.1 Raters

Raters were drawn from the pool used for the pre-pilot and pilot testing sessions reported in Colina (2008) (see Colina [2008] for selection criteria and additional details). A call was sent via email to all those raters selected for the pre-pilot and pilot testing (including those who were initially selected but did not take part). All raters available participated in this second phase of testing.
As in Colina (2008), it was hypothesized that similar rating results would be obtained within the members of the same group. Therefore, raters were recruited according to membership in one of two groups: professional translators and language teachers (language professionals who are not professional translators). Membership was assigned according to the same criteria as in Colina (2008). All selected raters exhibited linguistic proficiency equivalent to that of a native (or near-native) speaker in the source and in one of the target languages.
Professional translators were defined as language professionals whose income comes primarily from providing translation services. Significant professional experience (5 years minimum; most had 12–20 years of experience), membership in professional organizations, and education in translation and/or a relevant field were also needed for inclusion in this group. Recruitment of these types of individuals was primarily through the American Translators Association (ATA). Although only two applicants were ATA certified, almost all were ATA affiliates (members).
Language teachers were individuals whose main occupation was teaching language courses at a university or other educational institution. They may have had some translation experience but did not rely on translation as their source of income. A web search of teaching institutions with known foreign language programs was used for this recruitment; we reached out to schools throughout the country at both the community college and university levels. The definition of teacher did not preclude graduate student instructors.
Potential raters were assigned to the above groups on the basis of the information provided in their resume or curriculum vitae and a language background questionnaire included in a rater application.
The bilingual group in Colina (2008) was eliminated from the second experiment, as subjects were only available for one of the languages (Spanish). Translation competence models and research suggest that bilingualism is only one component of translation competence (Bell 1991; Cao 1996; Hatim and Mason 1997; PACTE 2008). Nonetheless, since evaluating translation products is not the same as translating, it is reasonable to hypothesize that other language professionals, such as teachers, may have the competence necessary to evaluate translations; this may be particularly true in cases such as the current project, in which the object of evaluation is not translator competence but translation products. This hypothesis would be borne out if the ratings provided by translators and teachers are similar.
As mentioned above, data was collected during two rounds of testing. The first one, the Benchmark Testing, included 9 raters (3 Russian, 3 Chinese, 3 Spanish); these raters were asked to evaluate 4–5 texts (per language) that had been previously selected as clearly of good or bad quality by expert consultants in each language. The second session, the Reliability Testing, included 21 raters, distributed as follows:
Spanish: 5 teachers, 3 translators (8); Chinese: 3 teachers, 4 translators (7); Russian: 3 teachers, 3 translators (6)
Differences across groups reflect general features of that language group in the US. Among the translators, the Russians had degrees in Languages, History and Translating, Engineering, and Nursing from Russian and US universities, and experience ranging from 12 to 22 years; the Chinese translators' experience ranged from 6 to 30 years, and their education included Chinese Language and Literature, Philosophy (MA), English (PhD), Neuroscience (PhD), and Medicine (MD), with degrees obtained in China and the US. Their Spanish counterparts' experience varied from 5 to 20 years, and their degrees included areas such as Education, Spanish and English Literature, Latin American Studies (MA), and Creative Writing (MA). The Spanish and Russian teachers were perhaps the most uniform groups, including college instructors (PhD students) with MAs in Spanish or Slavic Linguistics, Literature, and Communication, and one college professor of Russian. With one exception, they were all native speakers of Spanish or Russian with formal education in the country of origin. Chinese teachers were college instructors (PhD students) with MAs in Chinese, one college professor (PhD in Spanish), and an elementary school teacher and tutor (BA in Chinese). They were all native speakers of Chinese.
2.1.2 Texts

As mentioned above, experienced translators serving as language consultants selected the texts to be used in the rating sessions. Three consultants were instructed to identify health education texts translated from English into their language. Texts were to be publicly available on the Internet; half were to be very good and the other half very poor on reading the text. Those texts were used for the Benchmark session of testing, during which they were rated by the consultants and two additional expert translators. The texts on which there was the most agreement in rating were selected for the Reliability Testing. The Reliability texts comprised five Spanish texts (three good and two bad), four Russian texts, and four Chinese texts (two of good quality and two of bad quality for each of these two languages), making up a total of thirteen texts.
2.1.3 Tool

The tool tested in Colina (2008) was modified to include a cover sheet consisting of two parts. Part I is to be completed by the person requesting the evaluation (i.e., the Requester) and read by the rater before he/she starts his/her work. It contains the Translation Brief, relative to which the evaluation must always take place, and the Quality Criteria, clarifying requester priorities among components. The TQA Evaluation Tool included in Appendix 1 contains a sample Part I as specified by Hablamos Juntos (the Requester) for the evaluation of a set of health education materials. The Quality Criteria section reflects the weights assigned to the four components in the Scoring Worksheet at the end of the tool. Part II of the Cover Sheet is to be filled in by the raters after the rating is complete. An Assessment Summary and Recommendation section was included to allow raters the opportunity to offer an action recommendation on the basis of their ratings, i.e., "What should the requester do now with this translation? Edit it? Minor or small edits? Redo it entirely?" An additional modification to the tool consisted of eliminating or adding descriptors so that each category would have an equal number of descriptors (four for each component) and revising the scores assigned so that the maximum number of points possible would be 100. Some minor stylistic changes were made in the language of the descriptors.
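The worksheet arithmetic is simple enough to sketch in code. The fragment below (Python is used purely for illustration; raters worked with paper worksheets) encodes the category weights from the Scoring Worksheet in Appendix 1, under which the four top descriptors sum to exactly 100 points.

```python
# Category weights from the Scoring Worksheet (Appendix 1): each of the
# four components has four descriptors (a-d), and the values of the top
# descriptors (1d + 2d + 3d + 4d) sum to the 100-point maximum.
WEIGHTS = {
    "TL":   {"a": 5, "b": 15, "c": 25, "d": 30},   # Target Language
    "FTA":  {"a": 5, "b": 10, "c": 20, "d": 25},   # Functional and Textual Adequacy
    "MEAN": {"a": 5, "b": 10, "c": 20, "d": 25},   # Non-Specialized Content (Meaning)
    "TERM": {"a": 5, "b": 10, "c": 15, "d": 20},   # Specialized Content and Terminology
}

def total_score(selections):
    """Sum the values of the descriptors a rater checked, one per component."""
    return sum(WEIGHTS[component][descriptor]
               for component, descriptor in selections.items())

# Checking the top descriptor in every category yields the maximum score:
assert total_score({"TL": "d", "FTA": "d", "MEAN": "d", "TERM": "d"}) == 100
```

Note that the weights also express the requester's priorities: Target Language carries up to 30 points, while Specialized Content and Terminology carries at most 20.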
2.1.4 Rater training

The Benchmark and Reliability sessions included training and rating sessions. The training provided was substantially the same offered in the pilot testing and described in Colina (2008). It focused on the features and use of the tool, and it consisted of PDF materials (delivered via email), a PowerPoint presentation based on the contents of the PDF materials, and a question-and-answer session delivered online via an Internet and phone conferencing system.
Some revisions to the training reflect changes to the tool (including instructions on the new Cover Sheet), a few additional textual examples in Chinese, and a scored, completed sample worksheet for the Spanish group. Samples were not included for the other languages due to time and personnel constraints. The training served as a refresher for those raters who had already participated in the previous pilot training and rating (Colina 2008).5
2.2 Results
The results of the data collection were submitted to statistical analysis to determine to what degree trained raters use the TQA tool consistently.
Table 1 and Figures 1a and 1b show the overall score of each text rated and the standard deviation of the individual rater scores around that overall score.
200-series texts are Spanish texts, 400-series are Chinese, and 300-series are Russian. The standard deviations range from 8.1 to 19.2 for Spanish, from 5.7 to 21.2 for Chinese, and from 16.1 to 29.0 for Russian.
Question 1: For each text, how consistently do all raters rate the text?

The standard deviations in Table 1 and Figures 1a and 1b offer a good measure of how consistently individual texts are rated. A large standard deviation suggests that there was less rater agreement (or that the raters differed more in their assessment). Figure 1b shows the average standard deviations per language. According to this, the Russian raters were the ones with the highest average standard deviation and the least consistent in their ratings. This is in agreement with the reliability coefficients shown below (Table 5), as the Russian raters have the lowest inter-rater reliability. Table 2 shows average scores, standard deviations, and average standard deviations for each component of the tool, per text and per language. Figure 2 represents average standard deviations per component and per language. There does not appear to be an obvious connection between standard deviations and
Table 1 Average score of each text and standard deviation

Text   # of raters   Average score   Standard deviation
Spanish
210    11            91.8            8.1
214    11            89.5            11.3
215    11            86.8            15.0
228    11            48.6            19.2
235    11            56.4            18.5
Avg SD                               14.42
Chinese
410    10            88.0            10.3
413    10            63.0            21.0
415    10            96.0            5.7
418    10            76.0            21.2
Avg SD                               14.55
Russian
312    9             59.4            16.1
314    9              82.8            15.6
315    9             75.6            22.1
316    9             67.8            29.0
Avg SD                               20.7
Figure 1a Average score and standard deviation per text
Figure 1b Average standard deviations per language
components. Although generally the components Target Language (TL) and Functional and Textual Adequacy (FTA) have higher standard deviations (i.e., ratings are less consistent), this is not always the case, as seen in the Chinese data (FTA). One would in fact expect the FTA category to exhibit the highest standard deviations, given its more holistic nature, yet the data do not bear out this hypothesis, as the TL component also shows standard deviations that are higher than Non-Specialized Content (MEAN) and Specialized Content and Terminology (TERM).
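The per-text statistics in Table 1 are straightforward to reproduce. The sketch below uses invented rater scores, since the paper does not publish raw per-rater data or state whether the population or the sample standard deviation was computed:

```python
from statistics import mean, pstdev

# Eleven hypothetical rater scores for one text on the tool's 100-point
# scale; Table 1 reports, per text, the mean and the spread of the
# individual rater scores around it.
scores = [95, 90, 88, 100, 92, 85, 96, 89, 93, 91, 94]

avg = mean(scores)
sd = pstdev(scores)  # population SD; swap in stdev() for the sample SD
print(f"average score = {avg:.1f}, standard deviation = {sd:.1f}")
```

Greater disagreement among raters inflates the standard deviation, which is the sense in which Table 1 measures rating consistency.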
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?

The inter-rater reliability for the Spanish and for the Chinese raters is remarkable; however, the inter-rater reliability for the Russian raters is too low (Table 3).
Table 2 Average scores and standard deviations for four components per text and per language

               TL            FTA           MEAN          TERM
Text   Raters  Mean   SD     Mean   SD     Mean   SD     Mean   SD
Spanish
210    11      27.7   2.6    23.6   2.3    22.7   2.6    17.7   3.4
214    11      27.3   4.7    20.9   7.0    23.2   2.5    18.2   3.4
215    11      28.6   2.3    22.3   4.7    18.2   6.8    17.7   3.4
228    11      15.0   7.7    11.4   6.0    10.9   6.3    11.4   4.5
235    11      15.9   8.3    12.3   6.5    13.6   6.4    14.5   4.7
Avg SD                5.12          5.3           4.92          3.88
Chinese
410    10      27.0   4.8    22.0   4.8    21.0   4.6    18.0   2.6
413    10      18.0   9.5    16.5   5.8    14.0   5.2    14.5   3.7
415    10      28.5   2.4    25.0   0.0    23.5   2.4    19.0   2.1
418    10      22.5   6.8    21.0   4.6    16.0   7.7    16.5   4.1
Avg SD                5.875         3.8           4.975         3.125
Russian
312    9       18.3   7.1    15.0   6.1    13.3   6.6    12.8   4.4
314    9       25.6   6.3    21.7   5.0    19.4   3.9    16.1   4.2
315    9       23.3   9.4    18.3   7.9    17.8   4.4    16.1   4.2
316    9       20.0   10.3   16.7   7.9    17.2   7.1    13.9   6.5
Avg SD                8.275         6.725         5.5           4.825
Avg SD (all languages)  6.3           5.3           5.1           3.9
This, in conjunction with the Reliability Testing results, leads us to believe in the presence of other, unknown factors, unrelated to the tool, responsible for the low reliability of the Russian raters.
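The article reports inter-rater reliability coefficients without naming the statistic used. One common choice for this design is Cronbach's alpha computed over raters; the sketch below, with invented scores, shows the calculation under that assumption.

```python
from statistics import variance

def cronbach_alpha(ratings_by_rater):
    """Cronbach's alpha, treating raters as 'items' and texts as 'cases'.

    ratings_by_rater: one list of per-text scores for each rater.
    """
    k = len(ratings_by_rater)                              # number of raters
    rater_vars = sum(variance(r) for r in ratings_by_rater)
    totals = [sum(col) for col in zip(*ratings_by_rater)]  # per-text totals
    return (k / (k - 1)) * (1 - rater_vars / variance(totals))

# Three hypothetical raters scoring four texts in near-agreement:
raters = [[92, 63, 96, 76],
          [88, 60, 95, 74],
          [90, 66, 97, 80]]
print(round(cronbach_alpha(raters), 3))
```

With raters in close agreement, alpha approaches 1, matching the magnitude of the Spanish and Chinese coefficients; disagreement pushes it toward 0, as with the Russian group.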
Question 3: How consistently do raters in the second session (Reliability) rate the texts? How do the reliability coefficients compare for the Benchmark and the Reliability Testing?

The results of the reliability raters mirror those of the benchmark raters, whereby the Spanish raters achieve a very good inter-rater reliability coefficient and the Chinese raters an acceptable inter-rater reliability coefficient, but the inter-rater reliability for the Russian raters is very low (Table 4).
Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating), but the Spanish raters at both rating sessions achieved remarkable inter-rater reliability. The slight drop among the Russian raters from the first to the second session is negligible; in any case, the inter-rater reliability is too low.
Figure 2 Average standard deviations per tool component and per language
Table 3 Reliability coefficients for benchmark ratings

          Reliability coefficient
Spanish   .953
Chinese   .973
Russian   .128
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?
The coefficients for the Spanish raters show very good reliability, with excellent coefficients for the first three components; the numbers for the Chinese raters are also very good, but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6).
Table 6 Reliability coefficients for the four components of the tool (all raters per language group)

          TL     FTA    MEAN   TERM
Spanish   .952   .929   .926   .848
Chinese   .844   .844   .864   .783
Russian   .367   .479   .492   .292
In sum, very good reliability was obtained for Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing) as well as for all components of the tool. Reliability scores for the Russian raters are low. These results are in agreement with the standard deviation data presented in Tables 1–2, Figures 1a and 1b, and Figure 2. All of this leads us to believe that, whatever the cause for the Russian coefficients, it was not related to the tool itself.
Question 5: Is there a difference in scoring between translators and teachers?

Table 7a and Table 7b show the scoring in terms of average scores and standard deviations for the translators and the teachers for all texts. Figures 3 and 4 show the mean scores and times for Spanish raters, comparing teachers and translators.
Table 4 Reliability coefficients for Reliability Testing

          Reliability coefficient
Spanish   .934
Chinese   .780
Russian   .118
Table 5 Inter-rater reliability: Benchmark and Reliability Testing

          Benchmark reliability coefficient   Reliability coefficient (for Reliability Testing)
Spanish   .953                                .934
Chinese   .973                                .780
Russian   .128                                .118
Table 7a Average scores and standard deviations for consultants and translators

        Score           Time
Text    Mean    SD      Mean    SD
210     93.3    7.5     75.8    59.4
214     93.3    12.1    94.2    101.4
215     85.0    17.9    36.3    18.3
228     46.7    20.7    37.5    22.3
235     46.7    18.6    49.5    38.9
410     91.4    7.5     46.0    22.1
413     62.9    21.0    40.7    13.7
415     96.4    4.8     26.1    15.4
418     69.3    22.1    52.4    22.2
312     52.5    15.1    26.7    2.6
314     88.3    10.3    22.5    4.2
315     74.2    26.3    28.7    7.8
316     63.3    32.7    25.8    6.6
Table 7b Average scores and standard deviations for teachers

        Score           Time
Text    Mean    SD      Mean    SD
210     90.0    9.4     63.6    39.7
214     85.0    9.4     67.0    41.8
215     89.0    12.4    36.0    30.5
228     51.0    19.5    38.0    31.7
235     68.0    10.4    57.6    40.2
410     80.0    13.2    61.0    27.7
413     63.3    25.7    71.0    24.6
415     95.0    8.7     41.0    11.5
418     91.7    5.8     44.0    6.6
312     73.3    5.8     55.0    56.7
314     71.7    20.8    47.7    62.7
315     78.3    14.4    37.7    45.5
316     76.7    22.5    46.7    63.5
The corresponding data for Chinese appear in Figures 5 and 6, and in Figures 7 and 8 for Russian.
Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts).
As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5). Only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).
Despite the low inter-rater reliability among Russian raters, the same trend was found when comparing Russian translators and teachers as with the Chinese and the Spanish: Russian teachers rate similarly to or slightly higher than translators, and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).
In order to investigate the irregular behavior of the Russian raters and to try to obtain an explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect there to be a relatively high (negative) correlation because of the inverse relationship between a high score and a low recommendation. As is illustrated in the three sub-tables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar with the Chinese raters, whereby all raters correlate very highly
Figure 3 Mean scores for Spanish raters
Figure 4 Time for Spanish raters
Figure 5 Mean scores for Chinese raters
Figure 6 Time for Chinese raters
Figure 7 Mean scores for Russian raters
between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.00 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK, and RS01NM) do not correlate highly between their recommendations and their total scores. A closer look especially at these raters is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independently of the tool design and tool features, as their scores and overall recommendations do not correlate as highly as expected.
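The coefficient in Table 8 is presumably a Pearson product-moment correlation between each rater's total scores and recommendation codes. A from-scratch sketch, with invented values (higher scores should pair with lower, i.e., more favorable, recommendation codes):

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical rater: total scores and recommendations for five texts
# (1 = publish as is ... 4 = redo translation).
scores = [92, 63, 96, 76, 49]
recs   = [1, 3, 1, 2, 4]
print(round(pearson(scores, recs), 3))
```

A rater whose recommendations track their scores, as above, yields a coefficient near −1; the near-zero values for some Russian raters indicate that their recommendations were detached from their own scores.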
Figure 8 Time for Russian raters
Table 8 (3 sub-tables) Correlation between recommendation and total score

8.1 Spanish raters
SP04AR   SP01JC   SP01VS   SP02JA   SP02LA   SP02PB   SP02AB   SP01PC   SP01CC   SP02MC   SP01PS
−0.923   −0.958   −0.854   −0.938   −0.966   −0.421   −0.942   −0.975   −0.913   −0.981   −0.938

8.2 Chinese raters
CH01RL   CH04YY   CH01AX   CH02AC   CH02JG   CH01KG   CH02AH   CH01BJ   CH01CK   CH01FL
−0.935   −0.980   −0.996   −0.894   −1.000   −0.955   −0.980   −0.867   −0.943   −0.926

8.3 Russian raters
RS01EG   RS01EM   RS04GN   RS02NB   RS02LB   RS02MK   RS01SM   RS01NM   RS01RW
−0.998   −0.115   −0.933   −1.000   n/a      −0.500   −0.982   −0.500   −0.993
3 Conclusions
As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights about the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g., industry, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.
In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.
Notes
The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions, and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals, and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.
1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.
2. www.sae.org; www.lisa.org/products/qamodel
3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.
4. Note the reference to reader response within a functionalist framework.
5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected that had not participated in the training and rating sessions of the previous experiment. Given the low number, researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).
References
Bell, Roger T. 1991. Translation and Translating. London: Longman.
Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation." Meta 46:2. 345–364.
Cao, Deborah. 1996. "A Model of Translation Proficiency." Target 8:2. 325–340.
Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations." Mechanical Translation 9:3–4. 55–66.
Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.
Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach." The Translator 14:1. 97–134.
Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation." Meta 46:2. 227–242.
Hatim, Basil, and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.
Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment." Current Issues in Language and Society 4:1. 6–34.
House, Juliane. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.
House, Juliane. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation." Meta 46:2. 243–257.
Lauscher, Susanne. 2000. "Translation Quality Assessment: Where Can Theory and Practice Meet?" The Translator 6:2. 149–168.
Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.
Nida, Eugene. 1964. Toward a Science of Translating. Leiden: Brill.
Nida, Eugene, and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.
Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.
PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'." In John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.
Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hüber.
Reiss, Katharina, and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translationstheorie. Tübingen: Niemeyer.
Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function." In Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.
Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment." Meta 46:2. 326–344.
Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centered Approach. Ottawa: University of Ottawa Press.
Résumé

Colina (2008) proposes a componential and functionalist approach to the evaluation of translation quality and reports the results of a pilot test of a tool designed under that approach. The results show a high level of inter-rater reliability and justify further testing. This article presents an experiment designed to test the approach as well as the tool. Data were collected during two rounds of testing. A group of 30 raters, composed of Spanish, Chinese, and Russian translators and teachers, evaluated 4 or 5 translated texts. The results show that the tool yields a good level of inter-rater reliability for all language and text groups, with the exception of Russian; they also suggest that the low reliability of the Russian raters' scores is unrelated to the tool itself. These findings confirm those of Colina (2008).

Keywords: quality, testing, evaluation, rating, componential, functionalism, errors
Appendix 1 Tool
Benchmark Rating Session
Time Rating Starts: __________    Time Rating Ends: __________
Translation Quality Assessment – Cover Sheet for Health Education Materials
PART I To be completed by Requester
Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text.
Requester
TitleDepartment Delivery Date
TRANSLATION BRIEF
Source Language Target Language
Spanish Russian Chinese
Text Type
Text Title
Target Audience
Purpose of Document
PRIORITY OF QUALITY CRITERIA
Rank EACH from 1 to 4 (1 being top priority):

____ Target Language
____ Functional and Textual Adequacy
____ Non-Specialized Content (Meaning)
____ Specialized Content and Terminology
PART II To be completed by TQA Rater
Rater (Name) Date Completed
Contact Information Date Received
Total Score Total Rating Time
ASSESSMENT SUMMARY AND RECOMMENDATION
Publish andor use as is
Minor edits needed before publishing
Major revision needed before publishing
Redo translation
(To be completed after evaluating translated text)
Translation will not be an effective communication strategy for this text. Explore other options (e.g., create new target language materials).
Notes/Recommended Edits:
RATING INSTRUCTIONS
1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.
2. Check the description that best fits the text given in each one of the categories.
3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.
4. Examples or comments are not required, but they can be useful to help support your decisions or to provide rationale for your descriptor selection.
1 TARGET LANGUAGE (check one description)

1a The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that it cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c Although the target text is generally readable, there are problems and awkward expressions resulting in most cases from unnecessary transfer from the source text.

1d The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience, and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments:
2 FUNCTIONAL AND TEXTUAL ADEQUACY (check one description)

2a Disregard for the goals, purpose, function, and audience of the text. The text was translated without considering textual units, textual purpose, genre, and needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b The translated text gives some consideration to the intended purpose and audience for the translation but misses some important aspects of it (e.g., level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c The translated text approximates the goals, purpose (function), and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments:
3 NON-SPECIALIZED CONTENT (MEANING) (check one description)

3a The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b There have been some changes in meaning, omissions, and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c Minor alterations in meaning, additions, or omissions.

3d The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions, or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:
4 SPECIALIZED CONTENT AND TERMINOLOGY (check one description)

4a Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b Serious/frequent mistakes involving terminology and/or specialized content.

4c A few terminological errors, but the specialized content is not seriously affected.

4d Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:

TOTAL SCORE
SCORING WORKSHEET

Category values per component:
Target Language: 1a = 5, 1b = 15, 1c = 25, 1d = 30
Functional and Textual Adequacy: 2a = 5, 2b = 10, 2c = 20, 2d = 25
Non-Specialized Content: 3a = 5, 3b = 10, 3c = 20, 3d = 25
Specialized Content and Terminology: 4a = 5, 4b = 10, 4c = 15, 4d = 20
Tally Sheet

Component                             Category Rating   Score Value
Target Language                       ______            ______
Functional and Textual Adequacy       ______            ______
Non-Specialized Content               ______            ______
Specialized Content and Terminology   ______            ______
Total Score                                             ______
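Read as data, the worksheet above is a lookup from component and checked category to a point value; one category is checked per component and the values are summed. A minimal sketch in Python (the variable names are illustrative, not part of the tool):

```python
# Point values from the Scoring Worksheet (component -> category -> value).
SCORE_VALUES = {
    "Target Language":                     {"1a": 5, "1b": 15, "1c": 25, "1d": 30},
    "Functional and Textual Adequacy":     {"2a": 5, "2b": 10, "2c": 20, "2d": 25},
    "Non-Specialized Content":             {"3a": 5, "3b": 10, "3c": 20, "3d": 25},
    "Specialized Content and Terminology": {"4a": 5, "4b": 10, "4c": 15, "4d": 20},
}

def total_score(checked):
    """Total score for one checked category per component."""
    return sum(SCORE_VALUES[comp][cat] for comp, cat in checked.items())

# A translation given the top descriptor in every component reaches
# the 100-point maximum to which the tool's scores were revised.
top = {"Target Language": "1d",
       "Functional and Textual Adequacy": "2d",
       "Non-Specialized Content": "3d",
       "Specialized Content and Terminology": "4d"}
print(total_score(top))  # 100
```

The four component maxima (30 + 25 + 25 + 20) sum to exactly 100, matching the revision of the scoring described in Section 2.1.3.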
Appendix 2. Text sample
Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu
the tool without significant training. Pilot testing results indicate good inter-rater reliability for the tool and the need for further testing. The current paper focuses on a second experiment designed to further test the approach and tool proposed in Colina (2008).
2. Second phase of TQA testing: Methods and Results

2.1 Methods
One of the most important limitations of the experiment in Colina (2008) is in regard to the numbers and groups of participants. Given the project objective of ensuring applicability across languages frequently used in the USA, subject recruitment was done in three languages: Spanish, Russian and Chinese. As a result, resources and time for recruitment had to be shared amongst the languages, with smaller numbers of subjects per language group. The testing described in the current experiment includes more subjects and additional texts. More specifically, the study reported in this paper aims:
I. To test the TQA tool again for inter-rater reliability (i.e., to what degree trained raters use the TQA tool consistently) by answering the following questions:

Question 1: For each text, how consistently do all raters rate the text?
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
Question 3: How consistently do raters in the second session (Reliability) rate the texts?
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?
II. To compare the rating skills/behavior of translators and teachers: Is there a difference in scoring between translators and teachers? (Question 5, Section 2.2)
Data was collected during two rounds of testing: the first, referred to as the Benchmark Testing, included 9 raters; the second session, the Reliability Testing, included 21 raters. Benchmark and Reliability sessions consisted of a short training session followed by a rating session. Raters were asked to rate 4–5 translated texts (depending on the language) and had one afternoon and one night to complete the task. After their evaluation worksheets had been submitted, raters were required to submit a survey on their experience using the tool. They were paid for their participation.
2.1.1 Raters
Raters were drawn from the pool used for the pre-pilot and pilot testing sessions reported in Colina (2008) (see Colina [2008] for selection criteria and additional details). A call was sent via email to all those raters selected for the pre-pilot and pilot testing (including those who were initially selected but did not take part). All raters available participated in this second phase of testing.
As in Colina (2008), it was hypothesized that similar rating results would be obtained within the members of the same group. Therefore, raters were recruited according to membership in one of two groups: professional translators and language teachers (language professionals who are not professional translators). Membership was assigned according to the same criteria as in Colina (2008). All selected raters exhibited linguistic proficiency equivalent to that of a native (or near-native) speaker in the source language and in one of the target languages.

Professional translators were defined as language professionals whose income comes primarily from providing translation services. Significant professional experience (5 years minimum; most had 12–20 years of experience), membership in professional organizations, and education in translation and/or a relevant field were also needed for inclusion in this group. Recruitment for these types of individuals was primarily through the American Translators Association (ATA). Although only two applicants were ATA certified, almost all were ATA affiliates (members).

Language teachers were individuals whose main occupation was teaching language courses at a university or other educational institution. They may have had some translation experience but did not rely on translation as their source of income. A web search of teaching institutions with known foreign language programs was used for this recruitment; we reached out to schools throughout the country, at both the community college and university levels. The definition of teacher did not preclude graduate student instructors.

Potential raters were assigned to the above groups on the basis of the information provided in their resume or curriculum vitae and a language background questionnaire included in a rater application.
The bilingual group in Colina (2008) was eliminated from the second experiment, as subjects were only available for one of the languages (Spanish). Translation competence models and research suggest that bilingualism is only one component of translation competence (Bell 1991; Cao 1996; Hatim and Mason 1997; PACTE 2008). Nonetheless, since evaluating translation products is not the same as translating, it is reasonable to hypothesize that other language professionals, such as teachers, may have the competence necessary to evaluate translations; this may be particularly true in cases such as the current project, in which the object of evaluation is not translator competence but translation products. This hypothesis would be borne out if the ratings provided by translators and teachers are similar.
As mentioned above, data was collected during two rounds of testing. The first one, the Benchmark Testing, included 9 raters (3 Russian, 3 Chinese, 3 Spanish); these raters were asked to evaluate 4–5 texts (per language) that had been previously selected as clearly of good or bad quality by expert consultants in each language. The second session, the Reliability Testing, included 21 raters, distributed as follows:

Spanish: 5 teachers, 3 translators (8)
Chinese: 3 teachers, 4 translators (7)
Russian: 3 teachers, 3 translators (6)

Differences across groups reflect general features of that language group in the US. Among the translators, the Russians had degrees in Languages, History and Translating, Engineering, and Nursing from Russian and US universities, and experience ranging from 12 to 22 years; the Chinese translators' experience ranged from 6 to 30 years, and their education included Chinese Language and Literature, Philosophy (MA), English (PhD), Neuroscience (PhD) and Medicine (MD), with degrees obtained in China and the US. Their Spanish counterparts' experience varied from 5 to 20 years, and their degrees included areas such as Education, Spanish and English Literature, Latin American Studies (MA) and Creative Writing (MA). The Spanish and Russian teachers were perhaps the most uniform groups, including college instructors (PhD students) with MAs in Spanish or Slavic Linguistics, Literature and Communication, and one college professor of Russian. With one exception, they were all native speakers of Spanish or Russian, with formal education in the country of origin. The Chinese teachers were college instructors (PhD students) with MAs in Chinese, one college professor (PhD in Spanish), and an elementary school teacher and tutor (BA in Chinese). They were all native speakers of Chinese.
2.1.2 Texts
As mentioned above, experienced translators serving as language consultants selected the texts to be used in the rating sessions. Three consultants were instructed to identify health education texts translated from English into their language. Texts were to be publicly available on the Internet; half were to be very good and the other half were to be considered very poor on reading the text. Those texts were used for the Benchmark session of testing, during which they were rated by the consultants and two additional expert translators. The texts where there was the most agreement in rating were selected for the Reliability Testing. The Reliability texts comprised five Spanish texts (three good and two bad), four Russian texts and four Chinese texts (two of good quality and two of bad quality for each of these two languages), making up a total of thirteen additional texts.
2.1.3 Tool
The tool tested in Colina (2008) was modified to include a cover sheet consisting of two parts. Part I is to be completed by the person requesting the evaluation (i.e., the Requester) and read by the rater before he/she starts his/her work. It contains the Translation Brief, relative to which the evaluation must always take place, and the Quality Criteria, clarifying requester priorities among components. The TQA Evaluation Tool included in Appendix 1 contains a sample Part I, as specified by Hablamos Juntos (the Requester) for the evaluation of a set of health education materials. The Quality Criteria section reflects the weights assigned to the four components in the Scoring Worksheet at the end of the tool. Part II of the Cover Sheet is to be filled in by the raters after the rating is complete. An Assessment Summary and Recommendation section was included to allow raters the opportunity to offer an action recommendation on the basis of their ratings, i.e., "What should the requester do now with this translation? Edit it? Minor or small edits? Redo it entirely?" An additional modification to the tool consisted of eliminating or adding descriptors, so that each category would have an equal number of descriptors (four for each component), and of revising the scores assigned, so that the maximum number of points possible would be 100. Some minor stylistic changes were made in the language of the descriptors.
2.1.4 Rater Training
The Benchmark and Reliability sessions included training and rating sessions. The training provided was substantially the same as that offered in the pilot testing and described in Colina (2008). It focused on the features and use of the tool, and it consisted of PDF materials (delivered via email), a PowerPoint presentation based on the contents of the PDF materials, and a question-and-answer session delivered online via an Internet and phone conferencing system.

Some revisions to the training reflect changes to the tool (including instructions on the new Cover Sheet), a few additional textual examples in Chinese, and a scored, completed sample worksheet for the Spanish group. Samples were not included for the other languages due to time and personnel constraints. The training served as a refresher for those raters who had already participated in the previous pilot training and rating (Colina 2008).5
2.2 Results

The results of the data collection were submitted to statistical analysis to determine to what degree trained raters use the TQA tool consistently.

Table 1 and Figures 1a and 1b show the overall score of each text rated and the standard deviation between the overall score and the individual rater scores.
The 200-series texts are Spanish, the 400-series Chinese, and the 300-series Russian. The standard deviations range from 8.1 to 19.2 for Spanish, from 5.7 to 21.2 for Chinese, and from 16.1 to 29.0 for Russian.
Question 1: For each text, how consistently do all raters rate the text?
The standard deviations in Table 1 and Figures 1a and 1b offer a good measure of how consistently individual texts are rated. A large standard deviation suggests that there was less rater agreement (or that the raters differed more in their assessment). Figure 1b shows the average standard deviations per language. According to this, the Russian raters were the ones with the highest average standard deviation and the least consistent in their ratings. This is in agreement with the reliability coefficients shown below (Table 5), as the Russian raters have the lowest inter-rater reliability. Table 2 shows average scores, standard deviations and average standard deviations for each component of the tool, per text and per language. Figure 2 represents average standard deviations per component and per language. There does not appear to be an obvious connection between standard deviations and
Table 1. Average score of each text and standard deviation

Text    No. of raters   Average score   Standard deviation
Spanish
210     11              91.8             8.1
214     11              89.5            11.3
215     11              86.8            15.0
228     11              48.6            19.2
235     11              56.4            18.5
Avg.                                    14.42
Chinese
410     10              88.0            10.3
413     10              63.0            21.0
415     10              96.0             5.7
418     10              76.0            21.2
Avg.                                    14.55
Russian
312      9              59.4            16.1
314      9              82.8            15.6
315      9              75.6            22.1
316      9              67.8            29.0
Avg.                                    20.7
Figure 1a. Average score and standard deviation per text
Figure 1b. Average standard deviations per language
components. Although generally the components Target Language (TL) and Functional and Textual Adequacy (FTA) have higher standard deviations (i.e., ratings are less consistent), this is not always the case, as seen in the Chinese data (FTA). One would in fact expect the FTA category to exhibit the highest standard deviations, given its more holistic nature, yet the data do not bear out this hypothesis, as the TL component also shows standard deviations that are higher than those of Non-Specialized Content (MEAN) and Specialized Content and Terminology (TERM).
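The per-text consistency measure used throughout this section is simply the mean and standard deviation of the rater scores for that text. A small sketch in Python (the article does not state whether the sample or population formula was used, so the sample formula is assumed here, and the scores are invented for illustration):

```python
import statistics

def text_summary(scores):
    """Mean total score and standard deviation across raters for one text."""
    return (round(statistics.fmean(scores), 1),
            round(statistics.stdev(scores), 1))  # sample SD assumed

# Hypothetical rater scores for a single text (not the study's raw data,
# which is not published in the article).
scores = [95, 90, 88, 92, 85]
mean, sd = text_summary(scores)
print(mean, sd)  # 90.0 3.8
```

A larger `sd` for the same set of raters indicates less agreement, which is exactly how Table 1 and Figures 1a–1b are read above.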
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
The inter-rater reliability for the Spanish and for the Chinese raters is remarkable; however, the inter-rater reliability for the Russian raters is too low (Table 3).
Table 2. Average scores and standard deviations for four components per text and per language

                 TL             FTA            MEAN           TERM
Text   Raters    Mean    SD     Mean    SD     Mean    SD     Mean    SD
Spanish
210    11        27.7    2.6    23.6    2.3    22.7    2.6    17.7    3.4
214    11        27.3    4.7    20.9    7.0    23.2    2.5    18.2    3.4
215    11        28.6    2.3    22.3    4.7    18.2    6.8    17.7    3.4
228    11        15.0    7.7    11.4    6.0    10.9    6.3    11.4    4.5
235    11        15.9    8.3    12.3    6.5    13.6    6.4    14.5    4.7
Avg. SD                  5.12           5.3            4.92           3.88
Chinese
410    10        27.0    4.8    22.0    4.8    21.0    4.6    18.0    2.6
413    10        18.0    9.5    16.5    5.8    14.0    5.2    14.5    3.7
415    10        28.5    2.4    25.0    0.0    23.5    2.4    19.0    2.1
418    10        22.5    6.8    21.0    4.6    16.0    7.7    16.5    4.1
Avg. SD                  5.875          3.8            4.975          3.125
Russian
312     9        18.3    7.1    15.0    6.1    13.3    6.6    12.8    4.4
314     9        25.6    6.3    21.7    5.0    19.4    3.9    16.1    4.2
315     9        23.3    9.4    18.3    7.9    17.8    4.4    16.1    4.2
316     9        20.0   10.3    16.7    7.9    17.2    7.1    13.9    6.5
Avg. SD                  8.275          6.725          5.5            4.825
Avg. SD (all languages)  6.3            5.3            5.1            3.9
This, in conjunction with the Reliability Testing results, leads us to believe in the presence of other, unknown factors, unrelated to the tool, responsible for the low reliability of the Russian raters.

Question 3: How consistently do raters in the second session (Reliability) rate the texts? How do the reliability coefficients compare for the Benchmark and the Reliability Testing?
The results of the reliability raters mirror those of the benchmark raters, whereby the Spanish raters achieve a very good inter-rater reliability coefficient and the Chinese raters an acceptable one, but the inter-rater reliability for the Russian raters is very low (Table 4).

Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating), but the Spanish raters achieved remarkable inter-rater reliability at both rating sessions. The slight drop among the Russian raters from the first to the second session is negligible; in any case, the inter-rater reliability is too low.
Figure 2. Average standard deviations per tool component and per language
Table 3. Reliability coefficients for benchmark ratings

           Reliability coefficient
Spanish    .953
Chinese    .973
Russian    .128
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?
The coefficients for the Spanish raters show very good reliability, with excellent coefficients for the first three components; the numbers for the Chinese raters are also very good, but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6).
Table 6. Reliability coefficients for the four components of the tool (all raters per language group)

           TL     FTA    MEAN   TERM
Spanish    .952   .929   .926   .848
Chinese    .844   .844   .864   .783
Russian    .367   .479   .492   .292
In sum, very good reliability was obtained for the Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing), as well as for all components of the tool. Reliability scores for the Russian raters are low. These results are in agreement with the standard deviation data presented in Tables 1–2, Figures 1a and 1b, and Figure 2. All of this leads us to believe that, whatever the cause for the Russian coefficients, it was not related to the tool itself.
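The article reports inter-rater reliability coefficients without naming the statistic used. One standard choice for a set of raters scoring the same texts is Cronbach's alpha, with raters in the role of "items"; the sketch below illustrates that calculation and is not a reconstruction of the study's actual analysis:

```python
import statistics

def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of rater score vectors, one vector per
    rater, aligned by text. Raters play the role of 'items'."""
    k = len(ratings)                                     # number of raters
    rater_vars = sum(statistics.pvariance(r) for r in ratings)
    totals = [sum(scores) for scores in zip(*ratings)]   # per-text sums
    return (k / (k - 1)) * (1 - rater_vars / statistics.pvariance(totals))

# Three hypothetical raters in perfect agreement yield alpha = 1.0;
# disagreement pushes alpha down toward (and below) zero.
print(round(cronbach_alpha([[90, 60, 80, 50]] * 3), 3))  # 1.0
```

Whatever formula was actually used, the reported pattern (high coefficients for Spanish and Chinese, very low for Russian) is interpretable in the same way: low values mean the raters' score profiles across texts did not move together.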
Question 5: Is there a difference in scoring between translators and teachers?
Table 7a and Table 7b show the scoring, in terms of average scores and standard deviations, for the translators and the teachers for all texts. Figures 3 and 4 show the mean scores and times for the Spanish raters, comparing teachers and translators.
Table 4. Reliability coefficients for Reliability Testing

           Reliability coefficient
Spanish    .934
Chinese    .780
Russian    .118

Table 5. Inter-rater reliability: Benchmark and Reliability Testing

           Benchmark reliability coefficient   Reliability coefficient (Reliability Testing)
Spanish    .953                                .934
Chinese    .973                                .780
Russian    .128                                .118
Table 7a. Average scores and standard deviations for consultants and translators

        Score            Time
Text    Mean     SD      Mean    SD
210     93.3      7.5    75.8    59.4
214     93.3     12.1    94.2   101.4
215     85.0     17.9    36.3    18.3
228     46.7     20.7    37.5    22.3
235     46.7     18.6    49.5    38.9
410     91.4      7.5    46.0    22.1
413     62.9     21.0    40.7    13.7
415     96.4      4.8    26.1    15.4
418     69.3     22.1    52.4    22.2
312     52.5     15.1    26.7     2.6
314     88.3     10.3    22.5     4.2
315     74.2     26.3    28.7     7.8
316     63.3     32.7    25.8     6.6
Table 7b. Average scores and standard deviations for teachers

        Score            Time
Text    Mean     SD      Mean    SD
210     90.0      9.4    63.6    39.7
214     85.0      9.4    67.0    41.8
215     89.0     12.4    36.0    30.5
228     51.0     19.5    38.0    31.7
235     68.0     10.4    57.6    40.2
410     80.0     13.2    61.0    27.7
413     63.3     25.7    71.0    24.6
415     95.0      8.7    41.0    11.5
418     91.7      5.8    44.0     6.6
312     73.3      5.8    55.0    56.7
314     71.7     20.8    47.7    62.7
315     78.3     14.4    37.7    45.5
316     76.7     22.5    46.7    63.5
The corresponding data for Chinese appear in Figures 5 and 6, and for Russian in Figures 7 and 8.

Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts).

As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5); only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).

Despite the low inter-rater reliability among the Russian raters, the same trend found for the Chinese and the Spanish emerged when comparing Russian translators and teachers: Russian teachers rate similarly to or slightly higher than translators, and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).

In order to investigate the irregular behavior of the Russian raters, and to try to obtain an explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect a relatively high (negative) correlation, because of the inverse relationship between a high score and a low recommendation. As illustrated in the three sub-tables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from -0.854 (SP01VS) to -0.981 (SP02MC). The results are similar with the Chinese raters, whereby all raters correlate very highly
Figure 3. Mean scores for Spanish raters
Figure 4. Time for Spanish raters
Figure 5. Mean scores for Chinese raters
Figure 6. Time for Chinese raters
Figure 7. Mean scores for Russian raters
between the recommendation and the total score, ranging from -0.867 (CH01BJ) to a perfect -1.00 (CH02JG). The results are different for the Russian raters, however. It appears that for three raters (RS01EM, RS02MK and RS01NM) the recommendations and the total scores do not correlate highly. A closer look at these raters is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independent of the tool design and tool features, as the scores and overall recommendation do not correlate as highly as expected.
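The coefficients in Table 8 can be read as a per-rater Pearson correlation between total scores and recommendations, with raters showing no variability excluded (as RS02LB was); the article does not name the statistic, so this is an assumption, and the numbers below are invented for illustration:

```python
import statistics

def rec_score_correlation(scores, recs):
    """Pearson correlation between total scores and recommendations.
    Returns None when either variable is constant (no variability),
    mirroring the exclusion of a rater who gives the same recommendation
    for every text."""
    if len(set(scores)) < 2 or len(set(recs)) < 2:
        return None
    mx, my = statistics.mean(scores), statistics.mean(recs)
    cov = sum((x - mx) * (y - my) for x, y in zip(scores, recs)) / len(scores)
    return cov / (statistics.pstdev(scores) * statistics.pstdev(recs))

# High scores paired with low (i.e. favorable) recommendation numbers
# give a strong negative correlation, as expected of a consistent rater.
print(round(rec_score_correlation([100, 80, 60, 40], [1, 2, 3, 4]), 2))  # -1.0
print(rec_score_correlation([90, 80, 70], [2, 2, 2]))                    # None
```

A rater whose coefficient is far from -1 is assigning recommendations that do not follow from his or her own scores, which is the anomaly observed for three of the Russian raters.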
Figure 8. Time for Russian raters
Table 8 (3 sub-tables). Correlation between recommendation and total score

8.1 Spanish raters
SP04AR   SP01JC   SP01VS   SP02JA   SP02LA   SP02PB   SP02AB   SP01PC   SP01CC   SP02MC   SP01PS
-0.923   -0.958   -0.854   -0.938   -0.966   -0.421   -0.942   -0.975   -0.913   -0.981   -0.938

8.2 Chinese raters
CH01RL   CH04YY   CH01AX   CH02AC   CH02JG   CH01KG   CH02AH   CH01BJ   CH01CK   CH01FL
-0.935   -0.980   -0.996   -0.894   -1.000   -0.955   -0.980   -0.867   -0.943   -0.926

8.3 Russian raters
RS01EG   RS01EM   RS04GN   RS02NB   RS02LB   RS02MK   RS01SM   RS01NM   RS01RW
-0.998   -0.115   -0.933   -1.000   n/a      -0.500   -0.982   -0.500   -0.993
3. Conclusions

As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights into the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context, where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g., industry, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of the evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.

In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.
Notes
The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.
1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.

2. www.sae.org; www.lisa.org/products/qamodel

3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.

4. Note the reference to reader response within a functionalist framework.

5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected who had not participated in the training and rating sessions of the previous experiment. Given the low number, researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).
References
Bell, Roger T. 1991. Translation and Translating. London: Longman.
Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation." Meta 46 (2): 345–364.
Cao, Deborah. 1996. "A Model of Translation Proficiency." Target 8 (2): 325–340.
Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations." Mechanical Translation 9 (3–4): 55–66.
Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.
Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach." The Translator 14 (1): 97–134.
Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation." Meta 46 (2): 227–242.
Hatim, Basil, and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.
Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment." Current Issues in Language and Society 4 (1): 6–34.
House, Juliane. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.
House, Juliane. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation." Meta 46 (2): 243–257.
Lauscher, S. 2000. "Translation Quality Assessment: Where Can Theory and Practice Meet?" The Translator 6 (2): 149–168.
Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.
Nida, Eugene. 1964. Toward a Science of Translating. Leiden: Brill.
Nida, Eugene, and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.
Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.
PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'." In Translator and Interpreter Training: Issues, Methods and Debates, ed. by John Kearns, 104–126. London and New York: Continuum.
Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hüber.
Reiss, Katharina, and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translationstheorie. Tübingen: Niemeyer.
Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function." In The Manipulation of Literature: Studies in Literary Translation, ed. by Theo Hermans, 54–62. London and Sydney: Croom Helm.
Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment." Meta 46 (2): 326–344.
Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centred Approach. Ottawa: University of Ottawa Press.
Reacutesumeacute
Colina (2008) propose une approche componentielle et fonctionnelle de lrsquoeacutevaluation de la qua-liteacute des traductions et dresse un rapport sur les reacutesultats drsquoun test-pilote portant sur un outil conccedilu pour cette approche Les reacutesultats attestent un taux eacuteleveacute de fiabiliteacute entre eacutevaluateurs et justifient la continuation des tests Cet article preacutesente une expeacuterimentation destineacutee agrave tester lrsquoapproche ainsi que lrsquooutil Des donneacutees ont eacuteteacute collecteacutees pendant deux peacuteriodes de tests Un groupe de 30 eacutevaluateurs composeacute de traducteurs et enseignants espagnols chinois et russes ont eacutevalueacute 4 ou 5 textes traduits Les reacutesultats montrent que lrsquooutil assure un bon taux de fiabiliteacute entre eacutevaluateurs pour tous les groupes de langues et de textes agrave lrsquoexception du russe ils suggegrave-rent eacutegalement que le faible taux de fiabiliteacute des scores obtenus par les eacutevaluateurs russes est sans rapport avec lrsquooutil lui-mecircme Ces constats confirment ceux de Colina (2008)
Mots-clés : qualité, test, évaluation, notation, componentiel, fonctionnalisme, erreurs
258 Sonia Colina
Appendix 1 Tool
Benchmark Rating Session
Time Rating Starts: ______    Time Rating Ends: ______
Translation Quality Assessment – Cover Sheet for Health Education Materials

PART I. To be completed by Requester

Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text.
Requester
Title/Department    Delivery Date
TRANSLATION BRIEF
Source Language Target Language
Spanish Russian Chinese
Text Type
Text Title
Target Audience
Purpose of Document
PRIORITY OF QUALITY CRITERIA
Rank EACH from 1 to 4 (1 being top priority):
____ Target Language
____ Functional and Textual Adequacy
____ Non-Specialized Content (Meaning)
____ Specialized Content and Terminology
PART II. To be completed by TQA Rater
Rater (Name) Date Completed
Contact Information Date Received
Total Score Total Rating Time
ASSESSMENT SUMMARY AND RECOMMENDATION
(To be completed after evaluating translated text)
____ Publish and/or use as is
____ Minor edits needed before publishing
____ Major revision needed before publishing
____ Redo translation
____ Translation will not be an effective communication strategy for this text. Explore other options (e.g. create new target-language materials).
Notes/Recommended Edits:
Further evidence for a functionalist approach to translation quality evaluation 259
RATING INSTRUCTIONS
1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.
2. Check the description that best fits the text in each one of the categories.
3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.
4. Examples or comments are not required, but they can be useful to help support your decisions or to provide a rationale for your descriptor selection.
1. TARGET LANGUAGE
(Check one box.)

1a. The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that the text cannot be considered a sample of target-language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b. The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c. Although the target text is generally readable, there are problems and awkward expressions resulting in most cases from unnecessary transfer from the source text.

1d. The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments:
2. FUNCTIONAL AND TEXTUAL ADEQUACY
(Check one box.)

2a. Disregard for the goals, purpose, function and audience of the text. The text was translated without considering textual units, textual purpose, genre, or the needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b. The translated text gives some consideration to the intended purpose and audience for the translation but misses some important aspects of it (e.g. level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c. The translated text approximates the goals, purpose (function) and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d. The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments:
3. NON-SPECIALIZED CONTENT (MEANING)
(Check one box.)

3a. The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b. There have been some changes in meaning, omissions and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c. Minor alterations in meaning, additions or omissions.

3d. The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:
4. SPECIALIZED CONTENT AND TERMINOLOGY
(Check one box.)

4a. Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b. Serious/frequent mistakes involving terminology and/or specialized content.

4c. A few terminological errors, but the specialized content is not seriously affected.

4d. Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:

TOTAL SCORE: ______
SCORING WORKSHEET

Component: Target Language
Category  Value  Score
1a         5
1b        15
1c        25
1d        30

Component: Functional and Textual Adequacy
Category  Value  Score
2a         5
2b        10
2c        20
2d        25

Component: Non-Specialized Content
Category  Value  Score
3a         5
3b        10
3c        20
3d        25

Component: Specialized Content and Terminology
Category  Value  Score
4a         5
4b        10
4c        15
4d        20
Tally Sheet

Component                               Category Rating    Score Value
Target Language
Functional and Textual Adequacy
Non-Specialized Content
Specialized Content and Terminology
Total Score
Appendix 2. Text sample

(The sample texts are reproduced as images in the original publication.)
Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu
2.1.1 Raters

Raters were drawn from the pool used for the pre-pilot and pilot testing sessions reported in Colina (2008) (see Colina [2008] for selection criteria and additional details). A call was sent via email to all those raters selected for the pre-pilot and pilot testing (including those who were initially selected but did not take part). All available raters participated in this second phase of testing.
As in Colina (2008), it was hypothesized that similar rating results would be obtained within the members of the same group. Therefore, raters were recruited according to membership in one of two groups: professional translators and language teachers (language professionals who are not professional translators). Membership was assigned according to the same criteria as in Colina (2008). All selected raters exhibited linguistic proficiency equivalent to that of a native (or near-native) speaker in the source and in one of the target languages.
Professional translators were defined as language professionals whose income comes primarily from providing translation services. Significant professional experience (5 years minimum; most had 12–20 years of experience), membership in professional organizations, and education in translation and/or a relevant field were also required for inclusion in this group. Recruitment of these individuals was primarily through the American Translators Association (ATA). Although only two applicants were ATA-certified, almost all were ATA affiliates (members).
Language teachers were individuals whose main occupation was teaching language courses at a university or other educational institution. They may have had some translation experience but did not rely on translation as their source of income. A web search of teaching institutions with known foreign language programs was used for this recruitment; we reached out to schools throughout the country at both the community college and university levels. The definition of teacher did not preclude graduate student instructors.
Potential raters were assigned to the above groups on the basis of the information provided in their resume or curriculum vitae and a language background questionnaire included in a rater application.
The bilingual group in Colina (2008) was eliminated from the second experiment, as subjects were only available for one of the languages (Spanish). Translation competence models and research suggest that bilingualism is only one component of translation competence (Bell 1991; Cao 1996; Hatim and Mason 1997; PACTE 2008). Nonetheless, since evaluating translation products is not the same as translating, it is reasonable to hypothesize that other language professionals, such as teachers, may have the competence necessary to evaluate translations; this may be particularly true in cases such as the current project, in which the object of evaluation is not translator competence but translation products. This hypothesis would be borne out if the ratings provided by translators and teachers are similar.
As mentioned above, data was collected during two rounds of testing. The first one, the Benchmark Testing, included 9 raters (3 Russian, 3 Chinese, 3 Spanish); these raters were asked to evaluate 4–5 texts (per language) that had been previously selected as clearly of good or bad quality by expert consultants in each language. The second session, the Reliability Testing, included 21 raters distributed as follows:
Spanish: 5 teachers, 3 translators (8); Chinese: 3 teachers, 4 translators (7); Russian: 3 teachers, 3 translators (6)
Differences across groups reflect general features of that language group in the US. Among the translators, the Russians had degrees in Languages, History and Translating, Engineering, and Nursing from Russian and US universities, and experience ranging from 12 to 22 years; the Chinese translators' experience ranged from 6 to 30 years, and their education included Chinese language and literature, Philosophy (MA), English (PhD), Neuroscience (PhD) and Medicine (MD), with degrees obtained in China and the US. Their Spanish counterparts' experience varied from 5 to 20 years, and their degrees included areas such as Education, Spanish and English Literature, Latin American Studies (MA) and Creative Writing (MA). The Spanish and Russian teachers were perhaps the most uniform groups, including college instructors (PhD students) with MAs in Spanish or Slavic Linguistics, Literature and Communication, and one college professor of Russian. With one exception, they were all native speakers of Spanish or Russian with formal education in the country of origin. Chinese teachers were college instructors (PhD students) with MAs in Chinese, one college professor (PhD in Spanish), and an elementary school teacher and tutor (BA in Chinese). They were all native speakers of Chinese.
2.1.2 Texts

As mentioned above, experienced translators serving as language consultants selected the texts to be used in the rating sessions. Three consultants were instructed to identify health education texts translated from English into their language. Texts were to be publicly available on the Internet; half were to be very good and the other half very poor on reading the text. Those texts were used for the Benchmark session of testing, during which they were rated by the consultants and two additional expert translators. The texts on which there was the most agreement in rating were selected for the Reliability Testing. The Reliability texts comprised five Spanish texts (three good and two bad), four Russian texts and four Chinese texts (two good and two bad for each language), making up a total of thirteen additional texts.
2.1.3 Tool

The tool tested in Colina (2008) was modified to include a cover sheet consisting of two parts. Part I is to be completed by the person requesting the evaluation (i.e., the Requester) and read by the rater before he/she starts his/her work. It contains the Translation Brief, relative to which the evaluation must always take place, and the Quality Criteria, clarifying requester priorities among components. The TQA Evaluation Tool included in Appendix 1 contains a sample Part I as specified by Hablamos Juntos (the Requester) for the evaluation of a set of health education materials. The Quality Criteria section reflects the weights assigned to the four components in the Scoring Worksheet at the end of the tool. Part II of the Cover Sheet is to be filled in by the raters after the rating is complete. An Assessment Summary and Recommendation section was included to give raters the opportunity to offer an action recommendation on the basis of their ratings, i.e., "What should the requester do now with this translation? Edit it? Minor or small edits? Redo it entirely?" An additional modification to the tool consisted of eliminating or adding descriptors so that each category would have an equal number of descriptors (four for each component) and revising the scores assigned so that the maximum number of points possible would be 100. Some minor stylistic changes were made in the language of the descriptors.
2.1.4 Rater Training

The Benchmark and Reliability sessions included training and rating sessions. The training provided was substantially the same offered in the pilot testing and described in Colina (2008). It focused on the features and use of the tool, and it consisted of PDF materials (delivered via email), a PowerPoint presentation based on the contents of the PDF materials, and a question-and-answer session delivered online via Internet and phone conferencing system.
Some revisions to the training reflect changes to the tool (including instructions on the new Cover Sheet), a few additional textual examples in Chinese, and a scored, completed sample worksheet for the Spanish group. Samples were not included for the other languages due to time and personnel constraints. The training served as a refresher for those raters who had already participated in the previous pilot training and rating (Colina 2008).5
2.2 Results
The results of the data collection were submitted to statistical analysis to determine to what degree trained raters use the TQA tool consistently.

Table 1 and Figures 1a and 1b show the overall score of each text rated and the standard deviation between the overall score and the individual rater scores.
200-series texts are Spanish texts, 400s are Chinese, and 300s are Russian. The standard deviations range from 8.1 to 19.2 for Spanish, from 5.7 to 21.2 for Chinese, and from 16.1 to 29.0 for Russian.
Question 1. For each text, how consistently do all raters rate the text?

The standard deviations in Table 1 and Figures 1a and 1b offer a good measure of how consistently individual texts are rated. A large standard deviation suggests that there was less rater agreement (or that the raters differed more in their assessment). Figure 1b shows the average standard deviations per language. According to this, the Russian raters were the ones with the highest average standard deviation and the least consistent in their ratings. This is in agreement with the reliability coefficients shown below (Table 5), as the Russian raters have the lowest inter-rater reliability. Table 2 shows average scores, standard deviations, and average standard deviations for each component of the tool, per text and per language. Figure 2 represents average standard deviations per component and per language. There does not appear to be an obvious connection between standard deviations and
Table 1. Average score of each text and standard deviation

Text    # of raters    Average Score    Standard Deviation
Spanish
210     11             91.8             8.1
214     11             89.5             11.3
215     11             86.8             15.0
228     11             48.6             19.2
235     11             56.4             18.5
Avg.                                    14.42
Chinese
410     10             88.0             10.3
413     10             63.0             21.0
415     10             96.0             5.7
418     10             76.0             21.2
Avg.                                    14.55
Russian
312     9              59.4             16.1
314     9              82.8             15.6
315     9              75.6             22.1
316     9              67.8             29.0
Avg.                                    20.7
Figure 1a. Average score and standard deviation per text
Figure 1b. Average standard deviations per language
components. Although generally the components Target Language (TL) and Functional and Textual Adequacy (FTA) have higher standard deviations (i.e., ratings are less consistent), this is not always the case, as seen in the Chinese data (FTA). One would in fact expect the FTA category to exhibit the highest standard deviations, given its more holistic nature, yet the data do not bear out this hypothesis, as the TL component also shows standard deviations that are higher than Non-Specialized Content (MEAN) and Specialized Content and Terminology (TERM).
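The per-text figures reported in Table 1 reduce to a mean and a standard deviation over the raters' total scores for each text. A minimal sketch of that computation in Python, with invented rater scores (the individual rater data are not published here):

```python
import statistics

def text_stats(scores):
    """Mean and sample standard deviation of one text's total scores across raters."""
    return round(statistics.fmean(scores), 1), round(statistics.stdev(scores), 1)

# Hypothetical total scores from five raters for one text.
mean, sd = text_stats([90, 85, 95, 88, 92])
print(mean, sd)  # 90.0 3.8
```

A small standard deviation (as here) indicates close rater agreement on that text; the Russian texts in Table 1 show values two to three times larger.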
Question 2. How consistently do raters in the first session (Benchmark) rate the texts?

The inter-rater reliability for the Spanish and for the Chinese raters is remarkable; however, the inter-rater reliability for the Russian raters is too low (Table 3).
Table 2. Average scores and standard deviations for four components per text and per language

                    TL             FTA            MEAN           TERM
Text    Raters      Mean   SD      Mean   SD      Mean   SD      Mean   SD
Spanish
210     11          27.7   2.6     23.6   2.3     22.7   2.6     17.7   3.4
214     11          27.3   4.7     20.9   7.0     23.2   2.5     18.2   3.4
215     11          28.6   2.3     22.3   4.7     18.2   6.8     17.7   3.4
228     11          15.0   7.7     11.4   6.0     10.9   6.3     11.4   4.5
235     11          15.9   8.3     12.3   6.5     13.6   6.4     14.5   4.7
Avg. SD                    5.12           5.3            4.92           3.88
Chinese
410     10          27.0   4.8     22.0   4.8     21.0   4.6     18.0   2.6
413     10          18.0   9.5     16.5   5.8     14.0   5.2     14.5   3.7
415     10          28.5   2.4     25.0   0.0     23.5   2.4     19.0   2.1
418     10          22.5   6.8     21.0   4.6     16.0   7.7     16.5   4.1
Avg. SD                    5.875          3.8            4.975          3.125
Russian
312     9           18.3   7.1     15.0   6.1     13.3   6.6     12.8   4.4
314     9           25.6   6.3     21.7   5.0     19.4   3.9     16.1   4.2
315     9           23.3   9.4     18.3   7.9     17.8   4.4     16.1   4.2
316     9           20.0   10.3    16.7   7.9     17.2   7.1     13.9   6.5
Avg. SD                    8.275          6.725          5.5            4.825
Avg. SD (all lgs.)         6.3            5.3            5.1            3.9
This, in conjunction with the Reliability Testing results, leads us to believe that other, unknown factors unrelated to the tool are responsible for the low reliability of the Russian raters.
Question 3. How consistently do raters in the second session (Reliability) rate the texts? How do the reliability coefficients compare for the Benchmark and the Reliability Testing?

The results of the reliability raters mirror those of the benchmark raters, whereby the Spanish raters achieve a very good inter-rater reliability coefficient and the Chinese raters have an acceptable inter-rater reliability coefficient, but the inter-rater reliability for the Russian raters is very low (Table 4).
Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating), but the Spanish raters at both rating sessions achieved remarkable inter-rater reliability. The slight drop among the Russian raters from the first to the second session is negligible; in any case, the inter-rater reliability is too low.
Figure 2. Average standard deviations per tool component and per language
Table 3. Reliability coefficients for benchmark ratings

           Reliability coefficient
Spanish    .953
Chinese    .973
Russian    .128
Question 4. How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?

The coefficients for the Spanish raters show very good reliability, with excellent coefficients for the first three components; the numbers for the Chinese raters are also very good, but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6).
Table 6. Reliability coefficients for the four components of the tool (all raters per language group)

           TL      FTA     MEAN    TERM
Spanish    .952    .929    .926    .848
Chinese    .844    .844    .864    .783
Russian    .367    .479    .492    .292
In sum, very good reliability was obtained for the Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing), as well as for all components of the tool. Reliability scores for the Russian raters are low. These results are in agreement with the standard deviation data presented in Tables 1–2, Figures 1a and 1b, and Figure 2. All of this leads us to believe that, whatever the cause for the Russian coefficients, it was not related to the tool itself.
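The article does not specify which inter-rater reliability statistic underlies Tables 3–6. One common choice for a fixed panel of raters scoring the same texts is Cronbach's alpha, treating raters as "items" and texts as cases; the sketch below is an illustration under that assumption, not a reconstruction of the study's actual procedure:

```python
import statistics

def cronbach_alpha(ratings):
    """ratings: one list per rater, each holding that rater's scores for the same texts.
    Returns alpha; values near 1 indicate a highly consistent panel."""
    k = len(ratings)                              # number of raters
    totals = [sum(col) for col in zip(*ratings)]  # per-text sum across raters
    rater_var = sum(statistics.variance(r) for r in ratings)
    return k / (k - 1) * (1 - rater_var / statistics.variance(totals))

# Three hypothetical raters who rank four texts almost identically -> alpha near 1.
raters = [[92, 60, 85, 50],
          [90, 58, 88, 55],
          [95, 62, 83, 48]]
print(round(cronbach_alpha(raters), 3))  # 0.992
```

Had one of the three raters scored the texts in a nearly unrelated order, alpha would collapse toward 0, which is the pattern the Russian coefficients (.118–.128) display.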
Question 5. Is there a difference in scoring between translators and teachers?

Table 7a and Table 7b show the scoring in terms of average scores and standard deviations for the translators and the teachers for all texts. Figures 3 and 4 show the mean scores and times for Spanish raters, comparing teachers and translators.
Table 4. Reliability coefficients for Reliability Testing

           Reliability coefficient
Spanish    .934
Chinese    .780
Russian    .118
Table 5. Inter-rater reliability: Benchmark and Reliability Testing

           Benchmark reliability coefficient    Reliability coefficient (for Reliability Testing)
Spanish    .953                                 .934
Chinese    .973                                 .780
Russian    .128                                 .118
Table 7a. Average scores and standard deviations for consultants and translators

        Score             Time
Text    Mean    SD        Mean    SD
210     93.3    7.5       75.8    59.4
214     93.3    12.1      94.2    101.4
215     85.0    17.9      36.3    18.3
228     46.7    20.7      37.5    22.3
235     46.7    18.6      49.5    38.9
410     91.4    7.5       46.0    22.1
413     62.9    21.0      40.7    13.7
415     96.4    4.8       26.1    15.4
418     69.3    22.1      52.4    22.2
312     52.5    15.1      26.7    2.6
314     88.3    10.3      22.5    4.2
315     74.2    26.3      28.7    7.8
316     63.3    32.7      25.8    6.6
Table 7b. Average scores and standard deviations for teachers

        Score             Time
Text    Mean    SD        Mean    SD
210     90.0    9.4       63.6    39.7
214     85.0    9.4       67.0    41.8
215     89.0    12.4      36.0    30.5
228     51.0    19.5      38.0    31.7
235     68.0    10.4      57.6    40.2
410     80.0    13.2      61.0    27.7
413     63.3    25.7      71.0    24.6
415     95.0    8.7       41.0    11.5
418     91.7    5.8       44.0    6.6
312     73.3    5.8       55.0    56.7
314     71.7    20.8      47.7    62.7
315     78.3    14.4      37.7    45.5
316     76.7    22.5      46.7    63.5
The corresponding data for Chinese appear in Figures 5 and 6, and for Russian in Figures 7 and 8.
Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts).
As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5). Only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).
Despite the low inter-rater reliability among Russian raters, the same trend found for the Chinese and the Spanish emerged when comparing Russian translators and teachers: Russian teachers rate similarly to or slightly higher than translators, and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).
In order to investigate the irregular behavior of the Russian raters and to try to obtain an explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect a relatively high (negative) correlation because of the inverse relationship between a high score and a low recommendation. As illustrated in the three sub-tables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar for the Chinese raters, whereby all raters correlate very highly
Figure 3. Mean scores for Spanish raters
Figure 4. Time for Spanish raters
Figure 5. Mean scores for Chinese raters
Figure 6. Time for Chinese raters
Figure 7. Mean scores for Russian raters
between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.00 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK and RS01NM) do not show a high correlation between their recommendations and their total scores. A closer look, especially at these raters, is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independently of the tool design and tool features, as the scores and overall recommendation do not correlate as highly as expected.
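The score–recommendation check amounts to a plain Pearson correlation per rater: since 1 is the best recommendation ("publish as is") and high scores are good, a strongly negative r is the expected pattern. A minimal sketch with invented ratings (the published data give only the resulting coefficients):

```python
import statistics

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# One hypothetical rater: total scores vs. recommendations (1 = publish as is ... 4 = redo).
scores = [92, 86, 49, 56, 90]
recs = [1, 2, 4, 3, 1]
print(round(pearson(scores, recs), 3))  # strongly negative, as expected
```

A rater like RS02LB, who issues the same recommendation for every text, makes sy zero and the coefficient undefined, which is why that rater had to be excluded from the analysis.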
Figure 8. Time for Russian raters
Table 8 (three sub-tables). Correlation between recommendation and total score

8.1 Spanish raters

SP04AR    SP01JC    SP01VS    SP02JA    SP02LA    SP02PB    SP02AB    SP01PC    SP01CC    SP02MC    SP01PS
−0.923    −0.958    −0.854    −0.938    −0.966    −0.421    −0.942    −0.975    −0.913    −0.981    −0.938

8.2 Chinese raters

CH01RL    CH04YY    CH01AX    CH02AC    CH02JG    CH01KG    CH02AH    CH01BJ    CH01CK    CH01FL
−0.935    −0.980    −0.996    −0.894    −1.000    −0.955    −0.980    −0.867    −0.943    −0.926

8.3 Russian raters

RS01EG    RS01EM    RS04GN    RS02NB    RS02LB    RS02MK    RS01SM    RS01NM    RS01RW
−0.998    −0.115    −0.933    −1.000    n/a       −0.500    −0.982    −0.500    −0.993
3. Conclusions
As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights about the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g., industry, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.
In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.
Notes
The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.
1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.
2. www.sae.org; www.lisa.org/products/qamodel
3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.
4. Note the reference to reader response within a functionalist framework.
5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected who had not participated in the training and rating sessions of the previous experiment. Given the low number, researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).
References
Bell, Roger T. 1991. Translation and Translating. London: Longman.
Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation." Meta 46 (2): 345–364.
Cao, Deborah. 1996. "A Model of Translation Proficiency." Target 8 (2): 325–340.
Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations." Mechanical Translation 9 (3–4): 55–66.
Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.
Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach." The Translator 14 (1): 97–134.
Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation." Meta 46 (2): 227–242.
Hatim, Basil, and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.
Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment." Current Issues in Language and Society 4 (1): 6–34.
House, Juliane. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.
House, Juliane. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation." Meta 46 (2): 243–257.
Lauscher, Susanne. 2000. "Translation Quality Assessment: Where Can Theory and Practice Meet?" The Translator 6 (2): 149–168.
Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.
Nida, Eugene. 1964. Toward a Science of Translating. Leiden: Brill.
Nida, Eugene, and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.
Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.
PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'." In John Kearns (ed.), Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.
Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hueber.
Reiss, Katharina, and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translationstheorie. Tübingen: Niemeyer.
Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function." In Theo Hermans (ed.), The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.
Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment." Meta 46 (2): 326–344.
Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centered Approach. Ottawa: University of Ottawa Press.
Résumé

Colina (2008) propose une approche componentielle et fonctionnelle de l'évaluation de la qualité des traductions et dresse un rapport sur les résultats d'un test-pilote portant sur un outil conçu pour cette approche. Les résultats attestent un taux élevé de fiabilité entre évaluateurs et justifient la continuation des tests. Cet article présente une expérimentation destinée à tester l'approche ainsi que l'outil. Des données ont été collectées pendant deux périodes de tests. Un groupe de 30 évaluateurs, composé de traducteurs et enseignants espagnols, chinois et russes, ont évalué 4 ou 5 textes traduits. Les résultats montrent que l'outil assure un bon taux de fiabilité entre évaluateurs pour tous les groupes de langues et de textes, à l'exception du russe ; ils suggèrent également que le faible taux de fiabilité des scores obtenus par les évaluateurs russes est sans rapport avec l'outil lui-même. Ces constats confirment ceux de Colina (2008).
Mots-clés : qualité, test, évaluation, notation, componentiel, fonctionnalisme, erreurs
Appendix 1 Tool
Benchmark Rating Session
Time Rating Starts: ________    Time Rating Ends: ________
Translation Quality Assessment – Cover Sheet for Health Education Materials
PART I: To be completed by Requester
The Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text.
Requester
Title/Department    Delivery Date
TRANSLATION BRIEF
Source Language Target Language
Spanish Russian Chinese
Text Type
Text Title
Target Audience
Purpose of Document
PRIORITY OF QUALITY CRITERIA
____ Target Language
____ Functional and Textual Adequacy
____ Non-Specialized Content (Meaning)
Rank EACH from 1 to 4
(1 being top priority)
____ Specialized Content and Terminology
PART II: To be completed by TQA Rater
Rater (Name) Date Completed
Contact Information Date Received
Total Score Total Rating Time
ASSESSMENT SUMMARY AND RECOMMENDATION
Publish and/or use as is
Minor edits needed before publishing
Major revision needed before publishing
Redo translation
(To be completed after evaluating translated text)
Translation will not be an effective communication strategy for this text. Explore other options (e.g., create new target language materials).
Notes/Recommended Edits
RATING INSTRUCTIONS
1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.
2. Check the description that best fits the text given in each one of the categories.
3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.
4. Examples or comments are not required, but they can be useful to help support your decisions or to provide a rationale for your descriptor selection.
1 TARGET LANGUAGE
Category Number
Description Check one
box
1a
The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that the text cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.
1b The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.
1c Although the target text is generally readable, there are problems and awkward expressions resulting in most cases from unnecessary transfer from the source text.
1d
The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience, and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.
Examples/Comments
2 FUNCTIONAL AND TEXTUAL ADEQUACY
Category
Number Description
Check one
box
2a Disregard for the goals, purpose, function, and audience of the text. The text was translated without considering textual units, textual purpose, genre, or the needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.
2b The translated text gives some consideration to the intended purpose and audience for the translation, but misses some important aspects of it (e.g., level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.
2c The translated text approximates the goals, purpose (function), and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.
2d The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to cultural needs and characteristics of the audience. Minor or no edits needed.
Examples/Comments
3 NON-SPECIALIZED CONTENT-MEANING
Category Number
Description Check one
box
3a The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.
3b There have been some changes in meaning, omissions, and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.
3c Minor alterations in meaning, additions, or omissions.
3d The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions, or additions. Slight nuances and shades of meaning have been rendered adequately.
Examples/Comments
4 SPECIALIZED CONTENT AND TERMINOLOGY
Category
Number Description
Check one
box
4a Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.
4b Serious/frequent mistakes involving terminology and/or specialized content.
4c A few terminological errors, but the specialized content is not seriously affected.
4d Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.
Examples/Comments
TOTAL SCORE
SCORING WORKSHEET

Component: Target Language              Component: Functional and Textual Adequacy
Category  Value  Score                  Category  Value  Score
1a        5                             2a        5
1b        15                            2b        10
1c        25                            2c        20
1d        30                            2d        25

Component: Non-Specialized Content      Component: Specialized Content and Terminology
Category  Value  Score                  Category  Value  Score
3a        5                             4a        5
3b        10                            4b        10
3c        20                            4c        15
3d        25                            4d        20
Tally Sheet
Component                               Category Rating    Score Value
Target Language
Functional and Textual Adequacy
Non-Specialized Content
Specialized Content and Terminology
Total Score
Appendix 2 Text sample
Authorrsquos address
Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America
scolina@email.arizona.edu
As mentioned above, data was collected during two rounds of testing. The first one, the Benchmark Testing, included 9 raters (3 Russian, 3 Chinese, 3 Spanish); these raters were asked to evaluate 4–5 texts (per language) that had been previously selected as clearly of good or bad quality by expert consultants in each language. The second session, the Reliability Testing, included 21 raters, distributed as follows:
Spanish: 5 teachers, 3 translators (8)
Chinese: 3 teachers, 4 translators (7)
Russian: 3 teachers, 3 translators (6)
Differences across groups reflect general features of that language group in the US. Among the translators, the Russians had degrees in Languages, History and Translating, Engineering, and Nursing from Russian and US universities, and experience ranging from 12 to 22 years; the Chinese translators' experience ranged from 6 to 30 years, and their education included Chinese Language and Literature, Philosophy (MA), English (PhD), Neuroscience (PhD), and Medicine (MD), with degrees obtained in China and the US. Their Spanish counterparts' experience varied from 5 to 20 years, and their degrees included areas such as Education, Spanish and English Literature, Latin American Studies (MA), and Creative Writing (MA). The Spanish and Russian teachers were perhaps the most uniform groups, including college instructors (PhD students) with MAs in Spanish or Slavic Linguistics, Literature, and Communication, and one college professor of Russian. With one exception, they were all native speakers of Spanish or Russian, with formal education in the country of origin. The Chinese teachers were college instructors (PhD students) with MAs in Chinese, one college professor (PhD in Spanish), and an elementary school teacher and tutor (BA in Chinese). They were all native speakers of Chinese.
2.1.2 Texts

As mentioned above, experienced translators serving as language consultants selected the texts to be used in the rating sessions. Three consultants were instructed to identify health education texts translated from English into their language. Texts were to be publicly available on the Internet; half were to be very good and the other half were to be considered very poor on reading the text. Those texts were used for the Benchmark session of testing, during which they were rated by the consultants and two additional expert translators. The texts where there was the most agreement in rating were selected for the Reliability Testing. The Reliability texts comprised five Spanish texts (three good and two bad), four Russian texts, and four Chinese texts (two of each language being of good quality and two of bad quality), making up a total of thirteen additional texts.
2.1.3 Tool

The tool tested in Colina (2008) was modified to include a cover sheet consisting of two parts. Part I is to be completed by the person requesting the evaluation (i.e., the Requester) and read by the rater before he/she starts his/her work. It contains the Translation Brief, relative to which the evaluation must always take place, and the Quality Criteria, clarifying requester priorities among components. The TQA Evaluation Tool included in Appendix 1 contains a sample Part I as specified by Hablamos Juntos (the Requester) for the evaluation of a set of health education materials. The Quality Criteria section reflects the weights assigned to the four components in the Scoring Worksheet at the end of the tool. Part II of the Cover Sheet is to be filled in by the raters after the rating is complete. An Assessment Summary and Recommendation section was included to allow raters the opportunity to offer an action recommendation on the basis of their ratings, i.e., "What should the requester do now with this translation? Edit it? Minor or small edits? Redo it entirely?" An additional modification to the tool consisted of eliminating or adding descriptors so that each category would have an equal number of descriptors (four for each component), and revising the scores assigned so that the maximum number of points possible would be 100. Some minor stylistic changes were made in the language of the descriptors.
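The 100-point weighting just described can be illustrated with a small score calculator. This is a sketch only: the function and dictionary names are hypothetical, while the category values are those of the Scoring Worksheet in Appendix 1 (TL 30 + FTA 25 + MEAN 25 + TERM 20 = 100).

```python
# Category values from the Scoring Worksheet (Appendix 1).
# Component maxima: TL 30, FTA 25, MEAN 25, TERM 20 -> 100 points total.
VALUES = {
    "TL":   {"1a": 5, "1b": 15, "1c": 25, "1d": 30},
    "FTA":  {"2a": 5, "2b": 10, "2c": 20, "2d": 25},
    "MEAN": {"3a": 5, "3b": 10, "3c": 20, "3d": 25},
    "TERM": {"4a": 5, "4b": 10, "4c": 15, "4d": 20},
}

def total_score(ratings):
    """Sum the values of the one descriptor checked per component."""
    return sum(VALUES[component][category]
               for component, category in ratings.items())

# A rater who checks the top descriptor in every component scores 100.
print(total_score({"TL": "1d", "FTA": "2d", "MEAN": "3d", "TERM": "4d"}))
```

A rater checking, say, 1a/2b/3c/4c would score 5 + 10 + 20 + 15 = 50 points.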
2.1.4 Rater Training

The Benchmark and Reliability sessions included training and rating sessions. The training provided was substantially the same offered in the pilot testing and described in Colina (2008). It focused on the features and use of the tool, and it consisted of PDF materials (delivered via email), a PowerPoint presentation based on the contents of the PDF materials, and a question-and-answer session delivered online via an Internet and phone conferencing system.
Some revisions to the training reflect changes to the tool (including instructions on the new Cover Sheet), a few additional textual examples in Chinese, and a scored, completed sample worksheet for the Spanish group. Samples were not included for the other languages due to time and personnel constraints. The training served as a refresher for those raters who had already participated in the previous pilot training and rating (Colina 2008).5
2.2 Results

The results of the data collection were submitted to statistical analysis to determine to what degree trained raters use the TQA tool consistently.
Table 1 and Figures 1a and 1b show the overall score of each text rated, together with the standard deviation of the individual rater scores around the overall score.
The 200-series texts are Spanish, the 400-series Chinese, and the 300-series Russian. The standard deviations range from 8.1 to 19.2 for Spanish, from 5.7 to 21.2 for Chinese, and from 16.1 to 29.0 for Russian.
Question 1: For each text, how consistently do all raters rate the text?

The standard deviations in Table 1 and Figures 1a and 1b offer a good measure of how consistently individual texts are rated. A large standard deviation suggests that there was less rater agreement (or that the raters differed more in their assessment). Figure 1b shows the average standard deviations per language. According to this, the Russian raters were the ones with the highest average standard deviation and the least consistent in their ratings. This is in agreement with the reliability coefficients shown below (Table 5), as the Russian raters have the lowest inter-rater reliability. Table 2 shows average scores, standard deviations, and average standard deviations for each component of the tool, per text and per language. Figure 2 represents average standard deviations per component and per language. There does not appear to be an obvious connection between standard deviations and
Table 1. Average score of each text and standard deviation

Text    No. of raters    Average Score    Standard Deviation
Spanish
210     11               91.8             8.1
214     11               89.5             11.3
215     11               86.8             15.0
228     11               48.6             19.2
235     11               56.4             18.5
Avg                                       14.42
Chinese
410     10               88.0             10.3
413     10               63.0             21.0
415     10               96.0             5.7
418     10               76.0             21.2
Avg                                       14.55
Russian
312     9                59.4             16.1
314     9                82.8             15.6
315     9                75.6             22.1
316     9                67.8             29.0
Avg                                       20.7
Figure 1a. Average score and standard deviation per text
Figure 1b. Average standard deviations per language
components. Although generally the components Target Language (TL) and Functional and Textual Adequacy (FTA) have higher standard deviations (i.e., ratings are less consistent), this is not always the case, as seen in the Chinese data (FTA). One would in fact expect the FTA category to exhibit the highest standard deviations, given its more holistic nature; yet the data do not bear out this hypothesis, as the TL component also shows standard deviations that are higher than Non-Specialized Content (MEAN) and Specialized Content and Terminology (TERM).
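The per-text agreement measure used throughout this section (mean score and standard deviation across raters) can be sketched as follows. The rater scores below are invented for illustration; they are not the study's raw data, and the function name is hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical per-rater total scores for two texts (invented data).
scores_by_text = {
    "T1": [92, 88, 95, 89],   # raters cluster together -> small SD
    "T2": [45, 70, 58, 63],   # raters disagree -> large SD
}

def agreement(scores):
    """Return (mean, population SD) per text, rounded to one decimal.
    A lower SD means the ratings cluster together, i.e. more agreement."""
    return {text: (round(mean(s), 1), round(pstdev(s), 1))
            for text, s in scores.items()}

print(agreement(scores_by_text))
```

On this invented data, T1 yields a mean of 91.0 with SD 2.7, while T2 yields 59.0 with SD 9.1, mirroring how the article reads large standard deviations as low rater agreement.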
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?

The inter-rater reliability for the Spanish and for the Chinese raters is remarkable; however, the inter-rater reliability for the Russian raters is too low (Table 3).
Table 2. Average scores and standard deviations for the four components, per text and per language

                 TL             FTA            MEAN           TERM
Text    Raters   Mean   SD      Mean   SD      Mean   SD      Mean   SD
Spanish
210     11       27.7   2.6     23.6   2.3     22.7   2.6     17.7   3.4
214     11       27.3   4.7     20.9   7.0     23.2   2.5     18.2   3.4
215     11       28.6   2.3     22.3   4.7     18.2   6.8     17.7   3.4
228     11       15.0   7.7     11.4   6.0     10.9   6.3     11.4   4.5
235     11       15.9   8.3     12.3   6.5     13.6   6.4     14.5   4.7
Avg SD                  5.12           5.3            4.92           3.88
Chinese
410     10       27.0   4.8     22.0   4.8     21.0   4.6     18.0   2.6
413     10       18.0   9.5     16.5   5.8     14.0   5.2     14.5   3.7
415     10       28.5   2.4     25.0   0.0     23.5   2.4     19.0   2.1
418     10       22.5   6.8     21.0   4.6     16.0   7.7     16.5   4.1
Avg SD                  5.875          3.8            4.975          3.125
Russian
312     9        18.3   7.1     15.0   6.1     13.3   6.6     12.8   4.4
314     9        25.6   6.3     21.7   5.0     19.4   3.9     16.1   4.2
315     9        23.3   9.4     18.3   7.9     17.8   4.4     16.1   4.2
316     9        20.0   10.3    16.7   7.9     17.2   7.1     13.9   6.5
Avg SD                  8.275          6.725          5.5            4.825
Avg SD (all languages)  6.3            5.3            5.1            3.9
This, in conjunction with the Reliability Testing results, leads us to believe in the presence of other, unknown factors, unrelated to the tool, responsible for the low reliability of the Russian raters.
Question 3: How consistently do raters in the second session (Reliability) rate the texts? How do the reliability coefficients compare for the Benchmark and the Reliability Testing?

The results of the reliability raters mirror those of the benchmark raters, whereby the Spanish raters achieve a very good inter-rater reliability coefficient and the Chinese raters have an acceptable inter-rater reliability coefficient, but the inter-rater reliability for the Russian raters is very low (Table 4).
Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating), but the Spanish raters achieved remarkable inter-rater reliability at both rating sessions. The slight drop among the Russian raters from the first to the second session is negligible; in any case, the inter-rater reliability is too low.
Figure 2. Average standard deviations per tool component (TL, FTA, MEAN, TERM) and per language
Table 3. Reliability coefficients for benchmark ratings

          Reliability coefficient
Spanish   .953
Chinese   .973
Russian   .128
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?

The coefficients for the Spanish raters show very good reliability, with excellent coefficients for the first three components; the numbers for the Chinese raters are also very good; but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6).
Table 6. Reliability coefficients for the four components of the tool (all raters per language group)

          TL      FTA     MEAN    TERM
Spanish   .952    .929    .926    .848
Chinese   .844    .844    .864    .783
Russian   .367    .479    .492    .292
In sum, very good reliability was obtained for the Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing), as well as for all components of the tool. Reliability scores for the Russian raters are low. These results are in agreement with the standard deviation data presented in Tables 1–2, Figures 1a and 1b, and Figure 2. All of this leads us to believe that, whatever the cause for the Russian coefficients, it was not related to the tool itself.
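The article does not state which statistic underlies the reliability coefficients in Tables 3–6. Purely as an illustration of how such a coefficient can be computed for a fixed panel of raters, the sketch below uses Cronbach's alpha with raters treated as "items"; this is an assumption about method, not a claim about the study's actual procedure.

```python
from statistics import pvariance

def cronbach_alpha(ratings):
    """Cronbach's alpha over raters-as-items.
    ratings: one list of scores per rater, aligned by text.
    Values near 1 indicate high internal consistency across raters."""
    k = len(ratings)
    item_variances = sum(pvariance(r) for r in ratings)
    totals = [sum(column) for column in zip(*ratings)]  # per-text sums
    return (k / (k - 1)) * (1 - item_variances / pvariance(totals))

# Three hypothetical raters who broadly agree on five texts (invented data):
raters = [[90, 85, 60, 45, 88],
          [92, 80, 55, 50, 90],
          [88, 84, 58, 47, 85]]
print(round(cronbach_alpha(raters), 3))
```

With these invented scores the panel is highly consistent, so alpha comes out close to 1, comparable in magnitude to the Spanish and Chinese coefficients; raters who rank the texts differently would drive the value down toward the Russian figures.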
Question 5: Is there a difference in scoring between translators and teachers?

Tables 7a and 7b show the scoring in terms of average scores and standard deviations for the translators and the teachers for all texts. Figures 3 and 4 show the mean scores and times for the Spanish raters, comparing teachers and translators.
Table 4. Reliability coefficients for Reliability Testing

          Reliability coefficient
Spanish   .934
Chinese   .780
Russian   .118
Table 5. Inter-rater reliability: Benchmark and Reliability Testing

          Benchmark reliability coefficient    Reliability coefficient (Reliability Testing)
Spanish   .953                                 .934
Chinese   .973                                 .780
Russian   .128                                 .118
Table 7a. Average scores and standard deviations for translators

        Score            Time
Text    Mean    SD       Mean    SD
210     93.3    7.5      75.8    59.4
214     93.3    12.1     94.2    101.4
215     85.0    17.9     36.3    18.3
228     46.7    20.7     37.5    22.3
235     46.7    18.6     49.5    38.9
410     91.4    7.5      46.0    22.1
413     62.9    21.0     40.7    13.7
415     96.4    4.8      26.1    15.4
418     69.3    22.1     52.4    22.2
312     52.5    15.1     26.7    2.6
314     88.3    10.3     22.5    4.2
315     74.2    26.3     28.7    7.8
316     63.3    32.7     25.8    6.6
Table 7b. Average scores and standard deviations for teachers

        Score            Time
Text    Mean    SD       Mean    SD
210     90.0    9.4      63.6    39.7
214     85.0    9.4      67.0    41.8
215     89.0    12.4     36.0    30.5
228     51.0    19.5     38.0    31.7
235     68.0    10.4     57.6    40.2
410     80.0    13.2     61.0    27.7
413     63.3    25.7     71.0    24.6
415     95.0    8.7      41.0    11.5
418     91.7    5.8      44.0    6.6
312     73.3    5.8      55.0    56.7
314     71.7    20.8     47.7    62.7
315     78.3    14.4     37.7    45.5
316     76.7    22.5     46.7    63.5
The corresponding data for the Chinese raters appear in Figures 5 and 6, and in Figures 7 and 8 for the Russian raters.
Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts).
As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5). Only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).
Despite the low inter-rater reliability among the Russian raters, the same trend was found when comparing the Russian translators and teachers as with the Chinese and the Spanish: Russian teachers rate similarly to or slightly higher than translators, and they clearly spend more time on the rating task than the translators (Figures 7 and 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).
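The teacher/translator comparison behind Tables 7a–7b and Figures 3–8 amounts to averaging scores and rating times per rater group. A minimal sketch with invented records (the field layout, names, and numbers are illustrative only, not the study's data):

```python
from statistics import mean

# Hypothetical (group, total score, minutes spent) records for one text.
records = [
    ("translator", 93, 36), ("translator", 90, 41), ("translator", 95, 30),
    ("teacher",    94, 63), ("teacher",    91, 58), ("teacher",    96, 70),
]

def group_summary(rows):
    """Return {group: (mean score, mean minutes)} rounded to one decimal."""
    grouped = {}
    for group, score, minutes in rows:
        grouped.setdefault(group, ([], []))
        grouped[group][0].append(score)
        grouped[group][1].append(minutes)
    return {g: (round(mean(scores), 1), round(mean(times), 1))
            for g, (scores, times) in grouped.items()}

print(group_summary(records))
```

On this invented data the two groups score almost identically while the teachers take markedly longer, the same qualitative pattern the article reports across all three languages.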
In order to investigate the irregular behavior of the Russian raters, and to try to obtain an explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect there to be a relatively high (negative) correlation, because of the inverse relationship between a high score and a low recommendation. As is illustrated in the three sub-tables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar with the Chinese raters, whereby all raters correlate very highly
Figure 3. Mean scores for Spanish raters (translators vs. teachers)
Figure 4. Time for Spanish raters (translators vs. teachers)
Figure 5. Mean scores for Chinese raters (translators vs. teachers)
Figure 6. Time for Chinese raters (translators vs. teachers)
Figure 7. Mean scores for Russian raters (translators vs. teachers)
between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.00 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK, and RS01NM) do not show a high correlation between their recommendations and their total scores. A closer look, especially at these raters, is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independently of the tool design and tool features, as the scores and overall recommendation do not correlate as highly as expected.
Figure 8. Time for Russian raters (translators vs. teachers)
Table 8 (3 sub-tables). Correlation between recommendation and total score

8.1 Spanish raters
SP04AR  SP01JC  SP01VS  SP02JA  SP02LA  SP02PB  SP02AB  SP01PC  SP01CC  SP02MC  SP01PS
−0.923  −0.958  −0.854  −0.938  −0.966  −0.421  −0.942  −0.975  −0.913  −0.981  −0.938

8.2 Chinese raters
CH01RL  CH04YY  CH01AX  CH02AC  CH02JG  CH01KG  CH02AH  CH01BJ  CH01CK  CH01FL
−0.935  −0.980  −0.996  −0.894  −1.000  −0.955  −0.980  −0.867  −0.943  −0.926

8.3 Russian raters
RS01EG  RS01EM  RS04GN  RS02NB  RS02LB  RS02MK  RS01SM  RS01NM  RS01RW
−0.998  −0.115  −0.933  −1.000  n/a     −0.500  −0.982  −0.500  −0.993
The Translator 62 149ndash168Neubert Albrecht 1985 Text und Translation Leipzig EnzyklopaumldieNida Eugene 1964 Toward a Science of Translation Leiden BrillNida Eugene and Charles Taber 1969 The Theory and Practice of Translation Leiden Brill
Further evidence for a functionalist approach to translation quality evaluation 257
Nord Christianne 1997 Translating as a Purposeful Activity Functionalist Approaches Ex-plained Manchester St Jerome
PACTE 2008 ldquoFirst Results of a Translation Competence Experiment lsquoKnowledge of Transla-tionrsquo and lsquoEfficacy of the Translation Processrdquo John Kearns ed Translator and Interpreter Training Issues Methods and Debates London and New York Continuum 2008 104ndash126
Reiss Katharina 1971 Moumlglichkeiten und Grenzen der uumlbersetungskritik Muumlnchen HuumlberReiss Katharina and Vermeer Hans 1984 Grundlegung einer allgemeinen Translations-Theorie
Tuumlbingen NiemayerVan den Broeck Raymond 1985 ldquoSecond Thoughts on Translation Criticism A Model of its
Analytic Functionrdquo Theo Hermans ed The Manipulation of Literature Studies in Literary Translation London and Sydney Croom Helm 1985 54ndash62
Williams Malcolm 2001 ldquoThe Application of Argumentation Theory to Translation Quality Assessmentrdquo Meta 462 326ndash344
Williams Malcolm 2004 Translation Quality Assessment An Argumentation-Centered Ap-proach Ottawa University of Ottawa Press
Reacutesumeacute
Colina (2008) propose une approche componentielle et fonctionnelle de lrsquoeacutevaluation de la qua-liteacute des traductions et dresse un rapport sur les reacutesultats drsquoun test-pilote portant sur un outil conccedilu pour cette approche Les reacutesultats attestent un taux eacuteleveacute de fiabiliteacute entre eacutevaluateurs et justifient la continuation des tests Cet article preacutesente une expeacuterimentation destineacutee agrave tester lrsquoapproche ainsi que lrsquooutil Des donneacutees ont eacuteteacute collecteacutees pendant deux peacuteriodes de tests Un groupe de 30 eacutevaluateurs composeacute de traducteurs et enseignants espagnols chinois et russes ont eacutevalueacute 4 ou 5 textes traduits Les reacutesultats montrent que lrsquooutil assure un bon taux de fiabiliteacute entre eacutevaluateurs pour tous les groupes de langues et de textes agrave lrsquoexception du russe ils suggegrave-rent eacutegalement que le faible taux de fiabiliteacute des scores obtenus par les eacutevaluateurs russes est sans rapport avec lrsquooutil lui-mecircme Ces constats confirment ceux de Colina (2008)
Mots-clefs Mots-cleacutes qualiteacute test eacutevaluation notation componentiel fonctionnalisme erreurs
258 Sonia Colina
Appendix 1 Tool
Benchmark Rating Session
Time Rating Starts: ________    Time Rating Ends: ________
Translation Quality Assessment – Cover Sheet for Health Education Materials
PART I. To be completed by Requester
The Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text.
Requester
TitleDepartment Delivery Date
TRANSLATION BRIEF
Source Language Target Language
Spanish Russian Chinese
Text Type
Text Title
Target Audience
Purpose of Document
PRIORITY OF QUALITY CRITERIA
Rank EACH from 1 to 4 (1 being top priority):
____ Target Language
____ Functional and Textual Adequacy
____ Non-Specialized Content (Meaning)
____ Specialized Content and Terminology
PART II. To be completed by TQA Rater
Rater (Name) Date Completed
Contact Information Date Received
Total Score Total Rating Time
ASSESSMENT SUMMARY AND RECOMMENDATION
(To be completed after evaluating the translated text)
Publish and/or use as is
Minor edits needed before publishing
Major revision needed before publishing
Redo translation
Translation will not be an effective communication strategy for this text. Explore other options (e.g., create new target-language materials).
Notes/Recommended Edits
Further evidence for a functionalist approach to translation quality evaluation 259
RATING INSTRUCTIONS
1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.
2. Check the description that best fits the text given in each one of the categories.
3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.
4. Examples or comments are not required, but they can be useful to help support your decisions or to provide a rationale for your descriptor selection.
1. TARGET LANGUAGE
Category Number    Description    Check one box
1a    The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that it cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.
1b    The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.
1c    Although the target text is generally readable, there are problems and awkward expressions, resulting in most cases from unnecessary transfer from the source text.
1d    The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.
Examples/Comments
2. FUNCTIONAL AND TEXTUAL ADEQUACY
Category Number    Description    Check one box
2a    Disregard for the goals, purpose, function and audience of the text. The text was translated without considering textual units, textual purpose, genre, needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.
2b    The translated text gives some consideration to the intended purpose and audience for the translation but misses some important aspects of it (e.g., level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.
2c    The translated text approximates to the goals, purpose (function) and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.
2d    The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to cultural needs and characteristics of the audience. Minor or no edits needed.
Examples/Comments
260 Sonia Colina
3. NON-SPECIALIZED CONTENT (MEANING)
Category Number    Description    Check one box
3a    The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.
3b    There have been some changes in meaning, omissions and/or additions that cannot be justified by the translation instructions. Translation shows some misunderstanding of the original and/or the translation instructions.
3c    Minor alterations in meaning, additions or omissions.
3d    The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions or additions. Slight nuances and shades of meaning have been rendered adequately.
Examples/Comments
4. SPECIALIZED CONTENT AND TERMINOLOGY
Category Number    Description    Check one box
4a    Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.
4b    Serious/frequent mistakes involving terminology and/or specialized content.
4c    A few terminological errors, but the specialized content is not seriously affected.
4d    Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.
Examples/Comments
TOTAL SCORE
Further evidence for a functionalist approach to translation quality evaluation 261
SCORING WORKSHEET
Component                               Category values
Target Language                         1a = 5, 1b = 15, 1c = 25, 1d = 30
Functional and Textual Adequacy         2a = 5, 2b = 10, 2c = 20, 2d = 25
Non-Specialized Content                 3a = 5, 3b = 10, 3c = 20, 3d = 25
Specialized Content and Terminology     4a = 5, 4b = 10, 4c = 15, 4d = 20
Tally Sheet
Component                               Category rating    Score value
Target Language                         ____               ____
Functional and Textual Adequacy         ____               ____
Non-Specialized Content                 ____               ____
Specialized Content and Terminology     ____               ____
Total Score                                                ____
Appendix 2 Text sample
Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu
2.1.3. Tool
The tool tested in Colina (2008) was modified to include a cover sheet consisting of two parts. Part I is to be completed by the person requesting the evaluation (i.e., the Requester) and read by the rater before he/she starts his/her work. It contains the Translation Brief, relative to which the evaluation must always take place, and the Quality Criteria, clarifying requester priorities among components. The TQA Evaluation Tool included in Appendix 1 contains a sample Part I as specified by Hablamos Juntos (the Requester) for the evaluation of a set of health education materials. The Quality Criteria section reflects the weights assigned to the four components in the Scoring Worksheet at the end of the tool. Part II of the Cover Sheet is to be filled in by the raters after the rating is complete. An Assessment Summary and Recommendation section was included to allow raters the opportunity to offer an action recommendation on the basis of their ratings, i.e., "What should the requester do now with this translation? Edit it? Minor or small edits? Redo it entirely?" An additional modification to the tool consisted of eliminating or adding descriptors so that each category would have an equal number of descriptors (four for each component), and of revising the scores assigned so that the maximum number of points possible would be 100. Some minor stylistic changes were made in the language of the descriptors.
2.1.4. Rater Training
The Benchmark and Reliability sessions included training and rating sessions. The training provided was substantially the same offered in the pilot testing and described in Colina (2008). It focused on the features and use of the tool, and it consisted of PDF materials (delivered via email), a PowerPoint presentation based on the contents of the PDF materials, and a question-and-answer session delivered online via an Internet and phone conferencing system.
Some revisions to the training reflect changes to the tool (including instructions on the new Cover Sheet), a few additional textual examples in Chinese, and a scored, completed sample worksheet for the Spanish group. Samples were not included for the other languages due to time and personnel constraints. The training served as a refresher for those raters who had already participated in the previous pilot training and rating (Colina 2008).5
2.2. Results
The results of the data collection were submitted to statistical analysis to determine to what degree trained raters use the TQA tool consistently.
Table 1 and Figures 1a and 1b show the overall score of each text rated and the standard deviation between the overall score and the individual rater scores.
200-series texts are Spanish texts, 400s are Chinese, and 300s are Russian. The standard deviations range from 8.1 to 19.2 for Spanish, from 5.7 to 21.2 for Chinese, and from 16.1 to 29.0 for Russian.
Question 1: For each text, how consistently do all raters rate the text?
The standard deviations in Table 1 and Figures 1a and 1b offer a good measure of how consistently individual texts are rated. A large standard deviation suggests that there was less rater agreement (or that the raters differed more in their assessment). Figure 1b shows the average standard deviations per language. According to this, the Russian raters were the ones with the highest average standard deviation and the least consistent in their ratings. This is in agreement with the reliability coefficients shown below (Table 5), as the Russian raters have the lowest inter-rater reliability. Table 2 shows average scores, standard deviations, and average standard deviations for each component of the tool, per text and per language. Figure 2 represents average standard deviations per component and per language. There does not appear to be an obvious connection between standard deviations and
Table 1. Average score of each text and standard deviation

Text    No. of raters    Average score    Standard deviation
Spanish
210     11               91.8             8.1
214     11               89.5             11.3
215     11               86.8             15.0
228     11               48.6             19.2
235     11               56.4             18.5
                                          Avg. 14.42
Chinese
410     10               88.0             10.3
413     10               63.0             21.0
415     10               96.0             5.7
418     10               76.0             21.2
                                          Avg. 14.55
Russian
312     9                59.4             16.1
314     9                82.8             15.6
315     9                75.6             22.1
316     9                67.8             29.0
                                          Avg. 20.7
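The per-text statistics reported in Table 1 are plain means and standard deviations across raters. A minimal sketch (the scores below are invented for illustration, and the population form of the standard deviation is an assumption, since the article does not state which estimator was used):

```python
# Average score and standard deviation across raters for one text,
# as reported per text in Table 1.
from statistics import mean, pstdev

def text_summary(scores):
    """Return (average score, standard deviation), rounded to one decimal."""
    return round(mean(scores), 1), round(pstdev(scores), 1)

demo_scores = [95, 90, 88, 93, 92]   # hypothetical ratings for one text
avg, sd = text_summary(demo_scores)
print(avg, sd)  # prints 91.6 2.4
```

A small standard deviation, as for text 415 in Table 1, indicates close rater agreement; a large one, as for text 316, indicates disagreement.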
[Figure 1a. Average score and standard deviation per text]
[Figure 1b. Average standard deviations per language]
components. Although generally the components Target Language (TL) and Functional and Textual Adequacy (FTA) have higher standard deviations (i.e., ratings are less consistent), this is not always the case, as seen in the Chinese data (FTA). One would in fact expect the FTA category to exhibit the highest standard deviations, given its more holistic nature, yet the data do not bear out this hypothesis, as the TL component also shows standard deviations that are higher than Non-Specialized Content (MEAN) and Specialized Content and Terminology (TERM).
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
The inter-rater reliability for the Spanish and for the Chinese raters is remarkable; however, the inter-rater reliability for the Russian raters is too low (Table 3).
Table 2. Average scores and standard deviations for four components, per text and per language

                    TL            FTA           MEAN          TERM
Text    Raters   Mean    SD    Mean    SD    Mean    SD    Mean    SD
Spanish
210     11       27.7    2.6   23.6    2.3   22.7    2.6   17.7    3.4
214     11       27.3    4.7   20.9    7.0   23.2    2.5   18.2    3.4
215     11       28.6    2.3   22.3    4.7   18.2    6.8   17.7    3.4
228     11       15.0    7.7   11.4    6.0   10.9    6.3   11.4    4.5
235     11       15.9    8.3   12.3    6.5   13.6    6.4   14.5    4.7
Avg. SD                  5.12          5.3           4.92          3.88
Chinese
410     10       27.0    4.8   22.0    4.8   21.0    4.6   18.0    2.6
413     10       18.0    9.5   16.5    5.8   14.0    5.2   14.5    3.7
415     10       28.5    2.4   25.0    0.0   23.5    2.4   19.0    2.1
418     10       22.5    6.8   21.0    4.6   16.0    7.7   16.5    4.1
Avg. SD                  5.875         3.8           4.975         3.125
Russian
312     9        18.3    7.1   15.0    6.1   13.3    6.6   12.8    4.4
314     9        25.6    6.3   21.7    5.0   19.4    3.9   16.1    4.2
315     9        23.3    9.4   18.3    7.9   17.8    4.4   16.1    4.2
316     9        20.0   10.3   16.7    7.9   17.2    7.1   13.9    6.5
Avg. SD                  8.275         6.725         5.5           4.825
Avg. SD (all languages)  6.3           5.3           5.1           3.9
This, in conjunction with the Reliability Testing results, leads us to believe in the presence of other, unknown factors, unrelated to the tool, responsible for the low reliability of the Russian raters.
Question 3: How consistently do raters in the second session (Reliability) rate the texts? How do the reliability coefficients compare for the Benchmark and the Reliability Testing?
The results of the reliability raters mirror those of the benchmark raters, whereby the Spanish raters achieve a very good inter-rater reliability coefficient and the Chinese raters have an acceptable inter-rater reliability coefficient, but the inter-rater reliability for the Russian raters is very low (Table 4).
Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating), but the Spanish raters achieved remarkable inter-rater reliability at both rating sessions. The slight drop among the Russian raters from the first to the second session is negligible; in any case, the inter-rater reliability is too low.
[Figure 2. Average standard deviations per tool component and per language]
Table 3. Reliability coefficients for benchmark ratings

           Reliability coefficient
Spanish    0.953
Chinese    0.973
Russian    0.128
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?
The coefficients for the Spanish raters show very good reliability, with excellent coefficients for the first three components; the numbers for the Chinese raters are also very good, but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6).
Table 6. Reliability coefficients for the four components of the tool (all raters per language group)

           TL       FTA      MEAN     TERM
Spanish    0.952    0.929    0.926    0.848
Chinese    0.844    0.844    0.864    0.783
Russian    0.367    0.479    0.492    0.292
In sum, very good reliability was obtained for Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing), as well as for all components of the tool. Reliability scores for the Russian raters are low. These results are in agreement with the standard deviation data presented in Tables 1–2, Figures 1a and 1b, and Figure 2. All of this leads us to believe that, whatever the cause for the Russian coefficients, it was not related to the tool itself.
Question 5: Is there a difference in scoring between translators and teachers?
Table 7a and Table 7b show the scoring, in terms of average scores and standard deviations, for the translators and the teachers for all texts. Figures 3 and 4 show the mean scores and times for Spanish raters, comparing teachers and translators.
Table 4. Reliability coefficients for Reliability Testing

           Reliability coefficient
Spanish    0.934
Chinese    0.780
Russian    0.118
Table 5. Inter-rater reliability: Benchmark and Reliability Testing

           Benchmark reliability coefficient    Reliability coefficient (for Reliability Testing)
Spanish    0.953                                0.934
Chinese    0.973                                0.780
Russian    0.128                                0.118
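The article reports inter-rater reliability coefficients (Tables 3–5) without naming the statistic used. One common choice for this design, treating raters as "items" and texts as cases, is Cronbach's alpha; the sketch below is therefore an assumption about method, not a reconstruction of the study's computation, and the ratings are invented.

```python
# Cronbach's alpha across raters: high values mean the raters' scores
# rise and fall together across texts.
from statistics import pvariance

def cronbach_alpha(ratings):
    """ratings: one list of per-text scores per rater."""
    k = len(ratings)                               # number of raters
    totals = [sum(col) for col in zip(*ratings)]   # per-text score sums
    item_vars = sum(pvariance(r) for r in ratings)
    return k / (k - 1) * (1 - item_vars / pvariance(totals))

# Three hypothetical raters scoring four texts very consistently:
ratings = [
    [92, 63, 96, 76],
    [90, 60, 95, 74],
    [88, 65, 97, 78],
]
print(round(cronbach_alpha(ratings), 3))  # close to 1, i.e. high reliability
```

A coefficient near 1, as for the Spanish and Chinese groups, indicates strong agreement; a value like the Russian groups' 0.128 indicates that raters' scores are nearly unrelated to one another.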
Table 7a. Average scores and standard deviations for consultants and translators

         Score              Time
Text     Mean     SD        Mean     SD
210      93.3     7.5       75.8     59.4
214      93.3     12.1      94.2     101.4
215      85.0     17.9      36.3     18.3
228      46.7     20.7      37.5     22.3
235      46.7     18.6      49.5     38.9
410      91.4     7.5       46.0     22.1
413      62.9     21.0      40.7     13.7
415      96.4     4.8       26.1     15.4
418      69.3     22.1      52.4     22.2
312      52.5     15.1      26.7     2.6
314      88.3     10.3      22.5     4.2
315      74.2     26.3      28.7     7.8
316      63.3     32.7      25.8     6.6
Table 7b. Average scores and standard deviations for teachers

         Score              Time
Text     Mean     SD        Mean     SD
210      90.0     9.4       63.6     39.7
214      85.0     9.4       67.0     41.8
215      89.0     12.4      36.0     30.5
228      51.0     19.5      38.0     31.7
235      68.0     10.4      57.6     40.2
410      80.0     13.2      61.0     27.7
413      63.3     25.7      71.0     24.6
415      95.0     8.7       41.0     11.5
418      91.7     5.8       44.0     6.6
312      73.3     5.8       55.0     56.7
314      71.7     20.8      47.7     62.7
315      78.3     14.4      37.7     45.5
316      76.7     22.5      46.7     63.5
The corresponding data for Chinese appear in Figures 5 and 6, and for Russian in Figures 7 and 8.
Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts).
As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5). Only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).
Despite the low inter-rater reliability among Russian raters, the same trend was found when comparing Russian translators and teachers as with the Chinese and the Spanish: Russian teachers rate similarly to, or slightly higher than, translators, and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).
In order to investigate the irregular behavior of the Russian raters, and to try to obtain an explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect there to be a relatively high (negative) correlation, because of the inverse relationship between a high score and a low recommendation. As is illustrated in the three sub-tables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar with the Chinese raters, whereby all raters correlate very highly
[Figure 3. Mean scores for Spanish raters]
[Figure 4. Time for Spanish raters]
[Figure 5. Mean scores for Chinese raters]
[Figure 6. Time for Chinese raters]
[Figure 7. Mean scores for Russian raters]
between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.00 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK and RS01NM) do not show a high correlation between their recommendations and their total scores. A closer look, especially at these raters, is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independently of the tool design and tool features, as the scores and overall recommendation do not correlate as highly as expected.
[Figure 8. Time for Russian raters]
Table 8 (3 sub-tables). Correlation between recommendation and total score

8.1 Spanish raters
SP04AR   SP01JC   SP01VS   SP02JA   SP02LA   SP02PB   SP02AB   SP01PC   SP01CC   SP02MC   SP01PS
−0.923   −0.958   −0.854   −0.938   −0.966   −0.421   −0.942   −0.975   −0.913   −0.981   −0.938

8.2 Chinese raters
CH01RL   CH04YY   CH01AX   CH02AC   CH02JG   CH01KG   CH02AH   CH01BJ   CH01CK   CH01FL
−0.935   −0.980   −0.996   −0.894   −1.000   −0.955   −0.980   −0.867   −0.943   −0.926

8.3 Russian raters
RS01EG   RS01EM   RS04GN   RS02NB   RS02LB   RS02MK   RS01SM   RS01NM   RS01RW
−0.998   −0.115   −0.933   −1.000   n/a      −0.500   −0.982   −0.500   −0.993
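The check in Table 8 is an ordinary correlation between each rater's total scores and his or her numeric recommendations. A sketch with invented data (the 1–4 coding of the recommendation, from "publish as is" to "redo the translation", is my reading of the cover sheet options, and the scores are not the study's):

```python
# For a consistent rater, higher total scores should go with lower
# (better) recommendations, so the Pearson correlation is strongly negative.
from math import sqrt

def pearson_r(xs, ys):
    """Plain Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

total_scores    = [92, 85, 60, 45]   # hypothetical totals from one rater
recommendations = [1, 2, 3, 4]       # action recommendation per text

print(round(pearson_r(total_scores, recommendations), 3))  # strongly negative
```

A value near −1, as for most raters in Table 8, is what the inverse score–recommendation relationship predicts; values like RS01EM's −0.115 signal that the rater's recommendations are disconnected from the scores assigned.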
3. Conclusions
As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights into the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context, where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer a more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g., industry, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.
In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.
Notes
The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions
and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including the Technical Reports, Manuals and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.
1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.
2. www.sae.org; www.lisa.org/products/qamodel
3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.
4. Note the reference to reader response within a functionalist framework.
5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected that had not participated in the training and rating sessions of the previous experiment. Given the low number, researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).
References
Bell, Roger T. 1991. Translation and Translating. London: Longman.
Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation". Meta 46:2. 345–364.
Cao, Deborah. 1996. "A Model of Translation Proficiency". Target 8:2. 325–340.
Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations". Mechanical Translation 9:3–4. 55–66.
Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.
Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach". The Translator 14:1. 97–134.
Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation". Meta 46:2. 227–242.
Hatim, Basil and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.
Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment". Current Issues in Language and Society 4:1. 6–34.
House, Juliane. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.
House, Juliane. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation". Meta 46:2. 243–257.
Lauscher, S. 2000. "Translation Quality-Assessment: Where Can Theory and Practice Meet?". The Translator 6:2. 149–168.
Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.
Nida, Eugene. 1964. Toward a Science of Translation. Leiden: Brill.
Nida, Eugene and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.
Further evidence for a functionalist approach to translation quality evaluation 257
Appendix 1 Tool
Benchmark Rating Session
Time Rating Starts: ____    Time Rating Ends: ____
Translation Quality Assessment – Cover Sheet for Health Education Materials
PART I To be completed by Requester
Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text
Requester
TitleDepartment Delivery Date
TRANSLATION BRIEF
Source Language Target Language
Spanish Russian Chinese
Text Type
Text Title
Target Audience
Purpose of Document
PRIORITY OF QUALITY CRITERIA
____ Target Language
____ Functional and Textual Adequacy
____ Non-Specialized Content (Meaning)
Rank EACH from 1 to 4
(1 being top priority)
____ Specialized Content and Terminology
PART II To be completed by TQA Rater
Rater (Name) Date Completed
Contact Information Date Received
Total Score Total Rating Time
ASSESSMENT SUMMARY AND RECOMMENDATION
Publish andor use as is
Minor edits needed before publishing
Major revision needed before publishing
Redo translation
(To be completed after evaluating translated text)
Translation will not be an effective communication strategy for this text. Explore other options (e.g., create new target-language materials)
NotesRecommended Edits
Further evidence for a functionalist approach to translation quality evaluation 259
RATING INSTRUCTIONS
1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.

2. Check the description that best fits the text given in each one of the categories.

3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.

4. Examples or comments are not required, but they can be useful to help support your decisions or to provide a rationale for your descriptor selection.
1 TARGET LANGUAGE
Category  Description (check one box)

1a  The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that the text cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b  The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c  Although the target text is generally readable, there are problems and awkward expressions resulting in most cases from unnecessary transfer from the source text.

1d  The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments:
2 FUNCTIONAL AND TEXTUAL ADEQUACY
Category  Description (check one box)

2a  Disregard for the goals, purpose, function and audience of the text. The text was translated without considering textual units, textual purpose, genre, or the needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b  The translated text gives some consideration to the intended purpose and audience for the translation, but misses some important aspects of it (e.g., level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c  The translated text approximates to the goals, purpose (function) and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d  The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments:
260 Sonia Colina
3 NON-SPECIALIZED CONTENT (MEANING)

Category  Description (check one box)

3a  The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b  There have been some changes in meaning, omissions and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c  Minor alterations in meaning, additions or omissions.

3d  The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:
4 SPECIALIZED CONTENT AND TERMINOLOGY
Category  Description (check one box)

4a  Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b  Serious/frequent mistakes involving terminology and/or specialized content.

4c  A few terminological errors, but the specialized content is not seriously affected.

4d  Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:
TOTAL SCORE
SCORING WORKSHEET
Component: Target Language              Component: Functional and Textual Adequacy
Category  Value  Score                  Category  Value  Score
1a        5                             2a        5
1b        15                            2b        10
1c        25                            2c        20
1d        30                            2d        25

Component: Non-Specialized Content      Component: Specialized Content and Terminology
Category  Value  Score                  Category  Value  Score
3a        5                             4a        5
3b        10                            4b        10
3c        20                            4c        15
3d        25                            4d        20
Tally Sheet
Component                               Category Rating    Score Value
Target Language
Functional and Textual Adequacy
Non-Specialized Content
Specialized Content and Terminology
Total Score
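The worksheet and tally sheet above reduce to a simple sum: the point value of the checked category in each component, added across the four components. A minimal sketch of that arithmetic follows; the dictionary layout and function name are illustrative, not part of the tool, but the point values are those of the worksheet.

```python
# Point values per component and checked category (a-d), from the worksheet
VALUES = {
    "TL":   {"a": 5, "b": 15, "c": 25, "d": 30},
    "FTA":  {"a": 5, "b": 10, "c": 20, "d": 25},
    "MEAN": {"a": 5, "b": 10, "c": 20, "d": 25},
    "TERM": {"a": 5, "b": 10, "c": 15, "d": 20},
}

def total_score(checked):
    """checked maps component -> category letter, e.g. {'TL': 'd', ...}."""
    return sum(VALUES[comp][cat] for comp, cat in checked.items())

# The best descriptor in every component gives the maximum score of 100
print(total_score({"TL": "d", "FTA": "d", "MEAN": "d", "TERM": "d"}))  # 100
```

This also explains why the scores reported in Tables 1 and 7a–7b fall on a 0–100 scale.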
Appendix 2 Text sample
Authorrsquos address
Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu
200-series texts are Spanish texts, 400s are Chinese, and 300s are Russian. The standard deviations range from 8.1 to 19.2 for Spanish, from 5.7 to 21.2 for Chinese, and from 16.1 to 29.0 for Russian.
Question 1: For each text, how consistently do all raters rate the text?
The standard deviations in Table 1 and Figures 1a and 1b offer a good measure of how consistently individual texts are rated. A large standard deviation suggests that there was less rater agreement (or that the raters differed more in their assessment). Figure 1b shows the average standard deviations per language. According to this, the Russian raters were the ones with the highest average standard deviation and the least consistent in their ratings. This is in agreement with the reliability coefficients shown below (Table 5), as the Russian raters have the lowest inter-rater reliability. Table 2 shows average scores, standard deviations, and average standard deviations for each component of the tool, per text and per language. Figure 2 represents average standard deviations per component and per language. There does not appear to be an obvious connection between standard deviations and
Table 1. Average score of each text and standard deviation

Text   # of raters   Average Score   Standard Deviation
Spanish
210    11            91.8             8.1
214    11            89.5            11.3
215    11            86.8            15.0
228    11            48.6            19.2
235    11            56.4            18.5
Avg                                  14.42
Chinese
410    10            88.0            10.3
413    10            63.0            21.0
415    10            96.0             5.7
418    10            76.0            21.2
Avg                                  14.55
Russian
312     9            59.4            16.1
314     9            82.8            15.6
315     9            75.6            22.1
316     9            67.8            29.0
Avg                                  20.7
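As an illustration of how the per-text figures in Table 1 are obtained, the sketch below computes a mean and standard deviation over a set of rater scores, and a per-language average SD as in Figure 1b. The rating lists are invented for the example; the real study used 9–11 raters per text.

```python
from statistics import mean, pstdev

# Hypothetical per-rater total scores (0-100 scale) for two texts,
# standing in for the real ratings behind Table 1
scores = {
    "210": [95, 90, 100, 85, 92, 88, 96, 90, 94, 86, 94],
    "228": [30, 55, 70, 40, 60, 25, 65, 45, 50, 55, 40],
}

for text, ratings in scores.items():
    # A larger SD means the raters agreed less on that text
    print(f"text {text}: mean={mean(ratings):.1f}, SD={pstdev(ratings):.1f}")

# Average SD per language group = mean of the per-text SDs (cf. Figure 1b)
avg_sd = mean(pstdev(r) for r in scores.values())
```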
Figure 1a Average score and standard deviation per text
Figure 1b Average standard deviations per language
components. Although generally the components Target Language (TL) and Functional and Textual Adequacy (FTA) have higher standard deviations (i.e., ratings are less consistent), this is not always the case, as seen in the Chinese data (FTA). One would in fact expect the FTA category to exhibit the highest standard deviations, given its more holistic nature, yet the data do not bear out this hypothesis, as the TL component also shows standard deviations that are higher than Non-Specialized Content (MEAN) and Specialized Content and Terminology (TERM).
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
The inter-rater reliability for the Spanish and for the Chinese raters is remarkable; however, the inter-rater reliability for the Russian raters is too low (Table 3).
Table 2. Average scores and standard deviations for the four components, per text and per language

                 TL            FTA           MEAN          TERM
Text   Raters  Mean   SD     Mean   SD     Mean   SD     Mean   SD
Spanish
210    11      27.7   2.6    23.6   2.3    22.7   2.6    17.7   3.4
214    11      27.3   4.7    20.9   7.0    23.2   2.5    18.2   3.4
215    11      28.6   2.3    22.3   4.7    18.2   6.8    17.7   3.4
228    11      15.0   7.7    11.4   6.0    10.9   6.3    11.4   4.5
235    11      15.9   8.3    12.3   6.5    13.6   6.4    14.5   4.7
Avg SD                5.12          5.3           4.92          3.88
Chinese
410    10      27.0   4.8    22.0   4.8    21.0   4.6    18.0   2.6
413    10      18.0   9.5    16.5   5.8    14.0   5.2    14.5   3.7
415    10      28.5   2.4    25.0   0.0    23.5   2.4    19.0   2.1
418    10      22.5   6.8    21.0   4.6    16.0   7.7    16.5   4.1
Avg SD                5.875         3.8           4.975         3.125
Russian
312     9      18.3   7.1    15.0   6.1    13.3   6.6    12.8   4.4
314     9      25.6   6.3    21.7   5.0    19.4   3.9    16.1   4.2
315     9      23.3   9.4    18.3   7.9    17.8   4.4    16.1   4.2
316     9      20.0  10.3    16.7   7.9    17.2   7.1    13.9   6.5
Avg SD                8.275         6.725         5.5           4.825
Avg SD (all lgs)      6.3           5.3           5.1           3.9
This, in conjunction with the Reliability Testing results, leads us to believe in the presence of other, unknown factors unrelated to the tool that are responsible for the low reliability of the Russian raters.
Question 3: How consistently do raters in the second session (Reliability) rate the texts? How do the reliability coefficients compare for the Benchmark and the Reliability Testing?
The results of the reliability raters mirror those of the benchmark raters: the Spanish raters achieve a very good inter-rater reliability coefficient and the Chinese raters an acceptable one, but the inter-rater reliability for the Russian raters is very low (Table 4).
Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating), but the Spanish raters achieved remarkable inter-rater reliability at both rating sessions. The slight drop among the Russian raters from the first to the second session is negligible; in any case, the inter-rater reliability is too low.
Figure 2 Average standard deviations per tool component and per language
Table 3 Reliability coefficients for benchmark ratings
Reliability coefficient
Spanish   .953
Chinese   .973
Russian   .128
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?

The coefficients for the Spanish raters show very good reliability, with excellent coefficients for the first three components; the numbers for the Chinese raters are also very good, but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6).
Table 6 Reliability coefficients for the four components of the tool (all raters per language group)
TL FTA MEAN TERM
Spanish   .952   .929   .926   .848
Chinese   .844   .844   .864   .783
Russian   .367   .479   .492   .292
In sum, very good reliability was obtained for the Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing), as well as for all components of the tool. Reliability scores for the Russian raters are low. These results are in agreement with the standard deviation data presented in Tables 1–2, Figures 1a and 1b, and Figure 2. All of this leads us to believe that, whatever the cause for the Russian coefficients, it was not related to the tool itself.
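The article does not name the reliability statistic behind Tables 3–6. As one plausible illustration only, a Cronbach's-alpha-style coefficient that treats raters as "items" and texts as observations can be computed as in the following sketch; all scores are invented.

```python
from statistics import pvariance

def alpha(ratings_by_rater):
    """Cronbach's-alpha-style coefficient over rater score lists.
    ratings_by_rater: one equal-length list of text scores per rater."""
    k = len(ratings_by_rater)
    rater_vars = sum(pvariance(r) for r in ratings_by_rater)
    # Per-text totals across raters
    totals = [sum(text_scores) for text_scores in zip(*ratings_by_rater)]
    return k / (k - 1) * (1 - rater_vars / pvariance(totals))

# Invented scores from three raters over four texts
agree = [[90, 60, 95, 75], [88, 58, 97, 70], [92, 65, 93, 72]]
disagree = [[90, 60, 95, 75], [55, 80, 60, 90], [70, 70, 70, 70]]

print(alpha(agree))     # close to 1: raters rank the texts the same way
print(alpha(disagree))  # low (can even be negative) when raters disagree
```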
Question 5: Is there a difference in scoring between translators and teachers?
Table 7a and Table 7b show the scoring, in terms of average scores and standard deviations, for the translators and the teachers for all texts. Figures 3 and 4 show the mean scores and times for the Spanish raters, comparing teachers and translators.
Table 4 Reliability coefficients for Reliability Testing
Reliability coefficient
Spanish   .934
Chinese   .780
Russian   .118
Table 5 Inter-rater reliability Benchmark and Reliability Testing
Benchmark reliability coefficient
Reliability coefficient (for Reliability Testing)

Spanish   .953   .934
Chinese   .973   .780
Russian   .128   .118
Table 7a. Average scores and standard deviations for translators

        Score           Time
Text    Mean    SD      Mean    SD
210     93.3    7.5     75.8    59.4
214     93.3   12.1     94.2   101.4
215     85.0   17.9     36.3    18.3
228     46.7   20.7     37.5    22.3
235     46.7   18.6     49.5    38.9
410     91.4    7.5     46.0    22.1
413     62.9   21.0     40.7    13.7
415     96.4    4.8     26.1    15.4
418     69.3   22.1     52.4    22.2
312     52.5   15.1     26.7     2.6
314     88.3   10.3     22.5     4.2
315     74.2   26.3     28.7     7.8
316     63.3   32.7     25.8     6.6
Table 7b. Average scores and standard deviations for teachers

        Score           Time
Text    Mean    SD      Mean    SD
210     90.0    9.4     63.6    39.7
214     85.0    9.4     67.0    41.8
215     89.0   12.4     36.0    30.5
228     51.0   19.5     38.0    31.7
235     68.0   10.4     57.6    40.2
410     80.0   13.2     61.0    27.7
413     63.3   25.7     71.0    24.6
415     95.0    8.7     41.0    11.5
418     91.7    5.8     44.0     6.6
312     73.3    5.8     55.0    56.7
314     71.7   20.8     47.7    62.7
315     78.3   14.4     37.7    45.5
316     76.7   22.5     46.7    63.5
The corresponding data for Chinese appear in Figures 5 and 6, and in Figures 7 and 8 for Russian.

Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts).

As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5). Only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).

Despite the low inter-rater reliability among the Russian raters, the same trend was found when comparing Russian translators and teachers as with the Chinese and the Spanish: Russian teachers rate similarly to or slightly higher than translators, and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).
In order to investigate the irregular behavior of the Russian raters and to try to obtain an explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect a relatively high (negative) correlation, because of the inverse relationship between a high score and a low recommendation. As illustrated in the three sub-tables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar for the Chinese raters, whereby all raters correlate very highly
Figure 3 Mean scores for Spanish raters
Figure 4 Time for Spanish raters
Figure 5 Mean scores for Chinese raters
Figure 6 Time for Chinese raters
Figure 7 Mean scores for Russian raters
between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.000 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK and RS01NM) do not show a high correlation between their recommendations and their total scores. A closer look, especially at these raters, is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independently of the tool design and tool features, as the scores and overall recommendation do not correlate as highly as expected.
Figure 8 Time for Russian raters
Table 8 (three sub-tables). Correlation between recommendation and total score

8.1 Spanish raters

SP04AR  SP01JC  SP01VS  SP02JA  SP02LA  SP02PB  SP02AB  SP01PC  SP01CC  SP02MC  SP01PS
−0.923  −0.958  −0.854  −0.938  −0.966  −0.421  −0.942  −0.975  −0.913  −0.981  −0.938

8.2 Chinese raters

CH01RL  CH04YY  CH01AX  CH02AC  CH02JG  CH01KG  CH02AH  CH01BJ  CH01CK  CH01FL
−0.935  −0.980  −0.996  −0.894  −1.000  −0.955  −0.980  −0.867  −0.943  −0.926

8.3 Russian raters

RS01EG  RS01EM  RS04GN  RS02NB  RS02LB  RS02MK  RS01SM  RS01NM  RS01RW
−0.998  −0.115  −0.933  −1.000  n/a     −0.500  −0.982  −0.500  −0.993
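The score–recommendation correlation, and the exclusion of a zero-variance rater such as RS02LB, can be illustrated with a small Pearson sketch. The rater data below are invented for the example.

```python
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson r between two score lists; returns None when either variable
    has no variability (the reason a rater who recommends the same option
    for every text must be excluded from the analysis)."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return None  # correlation undefined for a constant variable
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)

# Invented rater: high total scores go with recommendation 1 (publish as is),
# low scores with 4 (redo translation) -- hence the expected negative r
totals = [92, 85, 60, 45]
recs = [1, 2, 3, 4]
print(pearson(totals, recs))          # strongly negative
print(pearson(totals, [2, 2, 2, 2]))  # None: no variability (cf. RS02LB)
```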
3 Conclusions
As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights into the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context, where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer a more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g., industry, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.
In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.
Notes
The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions, and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including the Technical Reports, Manuals and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.
1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.

2. www.sae.org; www.lisa.org/products/qamodel

3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.

4. Note the reference to reader response within a functionalist framework.

5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected who had not participated in the training and rating sessions of the previous experiment. Given the low number, the researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).
References
Bell, Roger T. 1991. Translation and Translating. London: Longman.
Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation". Meta 46:2. 345–364.
Cao, Deborah. 1996. "A Model of Translation Proficiency". Target 8:2. 325–340.
Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations". Mechanical Translation 9:3–4. 55–66.
Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.
Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach". The Translator 14:1. 97–134.
Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation". Meta 46:2. 227–242.
Hatim, Basil and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.
Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment". Current Issues in Language and Society 4:1. 6–34.
House, Juliane. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.
House, Juliane. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation". Meta 46:2. 243–257.
Lauscher, S. 2000. "Translation Quality Assessment: Where Can Theory and Practice Meet?". The Translator 6:2. 149–168.
Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.
Nida, Eugene. 1964. Toward a Science of Translating. Leiden: Brill.
Nida, Eugene and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.
Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.
PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'". John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.
Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hueber.
Reiss, Katharina and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translationstheorie. Tübingen: Niemeyer.
Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function". Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.
Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment". Meta 46:2. 326–344.
Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centered Approach. Ottawa: University of Ottawa Press.
Résumé

Colina (2008) propose une approche componentielle et fonctionnelle de l'évaluation de la qualité des traductions et dresse un rapport sur les résultats d'un test-pilote portant sur un outil conçu pour cette approche. Les résultats attestent un taux élevé de fiabilité entre évaluateurs et justifient la continuation des tests. Cet article présente une expérimentation destinée à tester l'approche ainsi que l'outil. Des données ont été collectées pendant deux périodes de tests. Un groupe de 30 évaluateurs, composé de traducteurs et enseignants espagnols, chinois et russes, ont évalué 4 ou 5 textes traduits. Les résultats montrent que l'outil assure un bon taux de fiabilité entre évaluateurs pour tous les groupes de langues et de textes, à l'exception du russe ; ils suggèrent également que le faible taux de fiabilité des scores obtenus par les évaluateurs russes est sans rapport avec l'outil lui-même. Ces constats confirment ceux de Colina (2008).

Mots-clés : qualité, test, évaluation, notation, componentiel, fonctionnalisme, erreurs
2b The translated text gives some consideration to the intended purpose and audience for the translation but misses some important aspects of it (eg level of formality some aspect of its function needs of the audience
cultural considerations etc) Repair requires effort
2c The translated text approximates to the goals purpose (function) and needs of the intended audience but it is
not as efficient as it could be given the restrictions and instructions for the translation Can be repaired with suggested edits
2d The translated text accurately accomplishes the goals purpose (function informative expressive persuasive) set for the translation and intended audience (including level of formality) It also attends to cultural needs and
characteristics of the audience Minor or no edits needed
ExamplesComments
260 Sonia Colina
- 3 -
3 NON-SPECIALIZED CONTENT-MEANING
Category Number
Description Check one
box
3a The translation reflects or contains important unwarranted deviations from the original It contains inaccurate renditions andor important omissions and additions that cannot be justified by the instructions Very defective
comprehension of the original text
3b There have been some changes in meaning omissions orand additions that cannot be justified by the translation instructions Translation shows some misunderstanding of original andor translation instructions
3c Minor alterations in meaning additions or omissions
3d The translation accurately reflects the content contained in the original insofar as it is required by the
instructions without unwarranted alterations omissions or additions Slight nuances and shades of meaning have been rendered adequately
ExamplesComments
4 SPECIALIZED CONTENT AND TERMINOLOGY
Category
Number Description
Check one
box
4a Reveals unawarenessignorance of special terminology andor insufficient knowledge of specialized content
4b Seriousfrequent mistakes involving terminology andor specialized content
4c A few terminological errors but the specialized content is not seriously affected
4d Accurate and appropriate rendition of the terminology It reflects a good command of terms and content specific
to the subject
ExamplesComments
TOTAL SCORE
- 3 -
3 NON-SPECIALIZED CONTENT-MEANING
Category Number
Description Check one
box
3a The translation reflects or contains important unwarranted deviations from the original It contains inaccurate renditions andor important omissions and additions that cannot be justified by the instructions Very defective
comprehension of the original text
3b There have been some changes in meaning omissions orand additions that cannot be justified by the translation instructions Translation shows some misunderstanding of original andor translation instructions
3c Minor alterations in meaning additions or omissions
3d The translation accurately reflects the content contained in the original insofar as it is required by the
instructions without unwarranted alterations omissions or additions Slight nuances and shades of meaning have been rendered adequately
ExamplesComments
4 SPECIALIZED CONTENT AND TERMINOLOGY
Category
Number Description
Check one
box
4a Reveals unawarenessignorance of special terminology andor insufficient knowledge of specialized content
4b Seriousfrequent mistakes involving terminology andor specialized content
4c A few terminological errors but the specialized content is not seriously affected
4d Accurate and appropriate rendition of the terminology It reflects a good command of terms and content specific
to the subject
ExamplesComments
TOTAL SCORE
Further evidence for a functionalist approach to translation quality evaluation 261
- 4 -
S C O R I N G W O R K S H E E T
Component Target Language Component Functional and Textual Adequacy
Category Value Score Category Value Score
1a 5 2a 5 1b 15 2b 10 1c 25 2c 20 1d 30
2d 25
Component Non-Specialized Content Component Specialized Content and
Terminology
Category Value Score Category Value Score
3a 5 4a 5 3b 10 4b 10 3c 20 4c 15 3d 25
4d 20
Tally Sheet
Component Category
Rating Score Value
Target Language
Functional and Textual Adequacy
Non-Specialized Content
Specialized Content and Terminology
Total Score
262 Sonia Colina
Appendix 2 Text sample
bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull
Further evidence for a functionalist approach to translation quality evaluation 263
bull bull
264 Sonia Colina
Authorrsquos address
Sonia ColinaDepartment of Spanish and PortugueseThe University of ArizonaModern Languages 545Tucson AZ 85721-0067United States of America
scolinaemailarizonaedu
246 Sonia Colina
Figure 1a. Average score and standard deviation per text (x-axis: text number, 210–235, 410–418, 312–316; y-axis: average score / standard deviation, 0–100)
Figure 1b. Average standard deviations per language (Spanish, Chinese, Russian; y-axis 0–25)
Further evidence for a functionalist approach to translation quality evaluation 247
components. Although generally the components Target Language (TL) and Functional and Textual Adequacy (FTA) have higher standard deviations (i.e., ratings are less consistent), this is not always the case, as seen in the Chinese data (FTA). One would in fact expect the FTA category to exhibit the highest standard deviations, given its more holistic nature; yet the data do not bear out this hypothesis, as the TL component also shows standard deviations that are higher than those for Non-Specialized Content (MEAN) and Specialized Content and Terminology (TERM).
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?
The inter-rater reliability for the Spanish and for the Chinese raters is remarkable; however, the inter-rater reliability for the Russian raters is too low (Table 3).
Table 2. Average scores and standard deviations for the four components, per text and per language

                  TL            FTA           MEAN          TERM
Text    Raters    Mean   SD     Mean   SD     Mean   SD     Mean   SD

Spanish
210     11        27.7   2.6    23.6   2.3    22.7   2.6    17.7   3.4
214     11        27.3   4.7    20.9   7.0    23.2   2.5    18.2   3.4
215     11        28.6   2.3    22.3   4.7    18.2   6.8    17.7   3.4
228     11        15.0   7.7    11.4   6.0    10.9   6.3    11.4   4.5
235     11        15.9   8.3    12.3   6.5    13.6   6.4    14.5   4.7
Avg SD                   5.12          5.3           4.92          3.88

Chinese
410     10        27.0   4.8    22.0   4.8    21.0   4.6    18.0   2.6
413     10        18.0   9.5    16.5   5.8    14.0   5.2    14.5   3.7
415     10        28.5   2.4    25.0   0.0    23.5   2.4    19.0   2.1
418     10        22.5   6.8    21.0   4.6    16.0   7.7    16.5   4.1
Avg SD                   5.875         3.8           4.975         3.125

Russian
312     9         18.3   7.1    15.0   6.1    13.3   6.6    12.8   4.4
314     9         25.6   6.3    21.7   5.0    19.4   3.9    16.1   4.2
315     9         23.3   9.4    18.3   7.9    17.8   4.4    16.1   4.2
316     9         20.0   10.3   16.7   7.9    17.2   7.1    13.9   6.5
Avg SD                   8.275         6.725         5.5           4.825

Avg SD (all languages)   6.3           5.3           5.1           3.9
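The per-text figures in Table 2 are simple descriptive statistics over rater scores. As an illustration only (the rater scores below are hypothetical, not the study's data), each Mean/SD cell pair can be computed like this:

```python
from statistics import mean, pstdev

# Hypothetical component scores from five raters for one text.
# Components follow Table 2: TL (max 30), FTA (max 25), MEAN (max 25), TERM (max 20).
ratings = {
    "TL":   [30, 25, 30, 25, 30],
    "FTA":  [25, 25, 20, 25, 20],
    "MEAN": [25, 20, 25, 20, 25],
    "TERM": [20, 15, 20, 15, 20],
}

for component, scores in ratings.items():
    # One "Mean"/"SD" cell pair per component, as in Table 2
    print(f"{component}: mean={mean(scores):.1f}, SD={pstdev(scores):.2f}")
```

Whether the study used the population or the sample standard deviation is not stated; `pstdev` is one plausible choice.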
This, in conjunction with the Reliability Testing results, leads us to believe in the presence of other, unknown factors, unrelated to the tool, that are responsible for the low reliability of the Russian raters.
Question 3: How consistently do raters in the second session (Reliability) rate the texts? How do the reliability coefficients compare for the Benchmark and the Reliability Testing?
The results of the reliability raters mirror those of the benchmark raters, whereby the Spanish raters achieve a very good inter-rater reliability coefficient and the Chinese raters have an acceptable inter-rater reliability coefficient, but the inter-rater reliability for the Russian raters is very low (Table 4).
Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating), but the Spanish raters achieved remarkable inter-rater reliability at both rating sessions. The slight drop among the Russian raters from the first to the second session is negligible; in any case, the inter-rater reliability is too low.
Figure 2. Average standard deviations per tool component (TL, FTA, MEAN, TERM) and per language (Spanish, Chinese, Russian, all languages; y-axis 0–9)
Table 3. Reliability coefficients for benchmark ratings

Language    Reliability coefficient
Spanish     .953
Chinese     .973
Russian     .128
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?
The coefficients for the Spanish raters show very good reliability, with excellent coefficients for the first three components; the numbers for the Chinese raters are also very good, but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6).
Table 6. Reliability coefficients for the four components of the tool (all raters, per language group)

            TL      FTA     MEAN    TERM
Spanish     .952    .929    .926    .848
Chinese     .844    .844    .864    .783
Russian     .367    .479    .492    .292
In sum, very good reliability was obtained for the Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing), as well as for all components of the tool. Reliability scores for the Russian raters are low. These results are in agreement with the standard deviation data presented in Tables 1–2, Figures 1a and 1b, and Figure 2. All of this leads us to believe that, whatever the cause for the Russian coefficients, it was not related to the tool itself.
Question 5: Is there a difference in scoring between translators and teachers?
Table 7a and Table 7b show the scoring, in terms of average scores and standard deviations, for the translators and the teachers for all texts. Figures 3 and 4 show the mean scores and times for Spanish raters, comparing teachers and translators.
Table 4. Reliability coefficients for Reliability Testing

Language    Reliability coefficient
Spanish     .934
Chinese     .780
Russian     .118
Table 5. Inter-rater reliability: Benchmark and Reliability Testing

            Benchmark reliability    Reliability coefficient
            coefficient              (for Reliability Testing)
Spanish     .953                     .934
Chinese     .973                     .780
Russian     .128                     .118
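The article does not name the reliability statistic behind Tables 3–5. One common choice when a fixed panel of raters scores the same set of texts is a Cronbach's-alpha-style consistency coefficient with the raters treated as "items"; the sketch below assumes that measure and uses hypothetical scores:

```python
from statistics import pvariance

def cronbach_alpha(ratings):
    """Consistency across raters: ratings is a list of per-rater score lists,
    aligned so that position i in every inner list refers to the same text."""
    k = len(ratings)                                    # number of raters
    rater_vars = sum(pvariance(r) for r in ratings)     # sum of per-rater variances
    totals = [sum(scores) for scores in zip(*ratings)]  # per-text score sums
    return k / (k - 1) * (1 - rater_vars / pvariance(totals))

# Hypothetical total scores from three raters over four texts:
consistent = [[90, 60, 85, 50], [88, 62, 80, 55], [92, 58, 86, 48]]
print(round(cronbach_alpha(consistent), 3))  # close to 1: raters rank the texts alike
```

An intraclass correlation coefficient would be another reasonable reading; the choice of statistic here is an assumption, not the study's reported method.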
Table 7a. Average scores and standard deviations for consultants and translators

        Score             Time
Text    Mean    SD        Mean    SD
210     93.3    7.5       75.8    59.4
214     93.3    12.1      94.2    101.4
215     85.0    17.9      36.3    18.3
228     46.7    20.7      37.5    22.3
235     46.7    18.6      49.5    38.9
410     91.4    7.5       46.0    22.1
413     62.9    21.0      40.7    13.7
415     96.4    4.8       26.1    15.4
418     69.3    22.1      52.4    22.2
312     52.5    15.1      26.7    2.6
314     88.3    10.3      22.5    4.2
315     74.2    26.3      28.7    7.8
316     63.3    32.7      25.8    6.6
Table 7b. Average scores and standard deviations for teachers

        Score             Time
Text    Mean    SD        Mean    SD
210     90.0    9.4       63.6    39.7
214     85.0    9.4       67.0    41.8
215     89.0    12.4      36.0    30.5
228     51.0    19.5      38.0    31.7
235     68.0    10.4      57.6    40.2
410     80.0    13.2      61.0    27.7
413     63.3    25.7      71.0    24.6
415     95.0    8.7       41.0    11.5
418     91.7    5.8       44.0    6.6
312     73.3    5.8       55.0    56.7
314     71.7    20.8      47.7    62.7
315     78.3    14.4      37.7    45.5
316     76.7    22.5      46.7    63.5
The corresponding data for Chinese appear in Figures 5 and 6, and for Russian in Figures 7 and 8.
Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts).
As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5). Only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).
Despite the low inter-rater reliability among the Russian raters, the same trend was found when comparing Russian translators and teachers as with the Chinese and the Spanish: Russian teachers rate similarly to, or slightly higher than, translators, and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).
In order to investigate the irregular behavior of the Russian raters, and to try to obtain an explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect a relatively high (negative) correlation, because of the inverse relationship between a high score and a low recommendation. As illustrated in the three sub-tables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar for the Chinese raters, whereby all raters correlate very highly
Figure 3. Mean scores for Spanish raters (texts 210, 214, 215, 228, 235; translators vs. teachers)
Figure 4. Time for Spanish raters (texts 210, 214, 215, 228, 235; translators vs. teachers)
Figure 5. Mean scores for Chinese raters (texts 410, 413, 415, 418; translators vs. teachers)
Figure 6. Time for Chinese raters (texts 410, 413, 415, 418; translators vs. teachers)
Figure 7. Mean scores for Russian raters (texts 312, 314, 315, 316; translators vs. teachers)
between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect 1.00 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK and RS01NM) do not show a high correlation between their recommendations and their total scores. A closer look, especially at these raters, is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independently of the tool design and tool features, as the scores and overall recommendations do not correlate as highly as expected.
Figure 8. Time for Russian raters (texts 312, 314, 315, 316; translators vs. teachers)
Table 8 (3 sub-tables). Correlation between recommendation and total score

8.1 Spanish raters

SP04AR   SP01JC   SP01VS   SP02JA   SP02LA   SP02PB   SP02AB   SP01PC   SP01CC   SP02MC   SP01PS
−0.923   −0.958   −0.854   −0.938   −0.966   −0.421   −0.942   −0.975   −0.913   −0.981   −0.938

8.2 Chinese raters

CH01RL   CH04YY   CH01AX   CH02AC   CH02JG   CH01KG   CH02AH   CH01BJ   CH01CK   CH01FL
−0.935   −0.980   −0.996   −0.894   −1.000   −0.955   −0.980   −0.867   −0.943   −0.926

8.3 Russian raters

RS01EG   RS01EM   RS04GN   RS02NB   RS02LB   RS02MK   RS01SM   RS01NM   RS01RW
−0.998   −0.115   −0.933   −1.000   n/a      −0.500   −0.982   −0.500   −0.993
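The coefficients in Table 8 are presumably Pearson correlations between each rater's total scores and recommendations, though the article does not say so explicitly. A sketch under that assumption (with hypothetical numbers) also shows why RS02LB had to be excluded: zero variance in the recommendations makes the coefficient undefined.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation; None when either variable has no variability."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return None  # e.g., a rater who recommends '2' for every text
    return cov / (sx * sy)

# Hypothetical rater: high totals should pair with low (good) recommendations,
# hence the expected strong negative correlation.
totals = [93, 85, 47, 62]
recs = [1, 2, 4, 3]  # 1 = publish as is ... 4 = redo translation
print(round(pearson(totals, recs), 3))
```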
3 Conclusions
As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights into the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process, and therefore better able to offer more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g., industry, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.
In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.
Notes
The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions, and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.
1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.
2. www.sae.org; www.lisa.org/products/qamodel
3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.
4. Note the reference to reader response within a functionalist framework.
5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected that had not participated in the training and rating sessions of the previous experiment. Given the low number, researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).
References
Bell, Roger T. 1991. Translation and Translating. London: Longman.
Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation." Meta 46:2. 345–364.
Cao, Deborah. 1996. "A Model of Translation Proficiency." Target 8:2. 325–340.
Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations." Mechanical Translation 9:3–4. 55–66.
Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.
Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach." The Translator 14:1. 97–134.
Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation." Meta 46:2. 227–242.
Hatim, Basil, and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.
Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment." Current Issues in Language and Society 4:1. 6–34.
House, Juliane. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.
House, Juliane. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation." Meta 46:2. 243–257.
Lauscher, S. 2000. "Translation Quality Assessment: Where Can Theory and Practice Meet?" The Translator 6:2. 149–168.
Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.
Nida, Eugene. 1964. Toward a Science of Translating. Leiden: Brill.
Nida, Eugene, and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.
Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.
PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process.'" John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.
Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hueber.
Reiss, Katharina, and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translationstheorie. Tübingen: Niemeyer.
Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function." Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.
Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment." Meta 46:2. 326–344.
Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centered Approach. Ottawa: University of Ottawa Press.
Résumé

Colina (2008) propose une approche componentielle et fonctionnelle de l'évaluation de la qualité des traductions et dresse un rapport sur les résultats d'un test-pilote portant sur un outil conçu pour cette approche. Les résultats attestent un taux élevé de fiabilité entre évaluateurs et justifient la continuation des tests. Cet article présente une expérimentation destinée à tester l'approche ainsi que l'outil. Des données ont été collectées pendant deux périodes de tests. Un groupe de 30 évaluateurs, composé de traducteurs et enseignants espagnols, chinois et russes, ont évalué 4 ou 5 textes traduits. Les résultats montrent que l'outil assure un bon taux de fiabilité entre évaluateurs pour tous les groupes de langues et de textes, à l'exception du russe ; ils suggèrent également que le faible taux de fiabilité des scores obtenus par les évaluateurs russes est sans rapport avec l'outil lui-même. Ces constats confirment ceux de Colina (2008).

Mots-clés : qualité, test, évaluation, notation, componentiel, fonctionnalisme, erreurs
Appendix 1 Tool
Benchmark Rating Session
Time Rating Starts:          Time Rating Ends:
Translation Quality Assessment ndash Cover Sheet For Health Education Materials
PART I To be completed by Requester
Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text.
Requester
Title/Department          Delivery Date
T R A N S L A T I O N B R I E F
Source Language Target Language
Spanish Russian Chinese
Text Type
Text Title
Target Audience
Purpose of Document
P R I O R I T Y O F Q U A L I T Y C R I T E R I A
____ Target Language
____ Functional and Textual Adequacy
____ Non-Specialized Content (Meaning)
Rank EACH from 1 to 4
(1 being top priority)
____ Specialized Content and Terminology
PART II To be completed by TQA Rater
Rater (Name) Date Completed
Contact Information Date Received
Total Score Total Rating Time
A S S E S S M E N T S U M M A R Y A N D R E C O M M E N D A T I O N
Publish and/or use as is
Minor edits needed before publishing
Major revision needed before publishing
Redo translation
(To be completed after evaluating translated text)
Translation will not be an effective communication strategy for this text. Explore other options (e.g., create new target language materials).
Notes/Recommended Edits
RATING INSTRUCTIONS
1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.
2. Check the description that best fits the text given in each one of the categories.
3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.
4. Examples or comments are not required, but they can be useful to help support your decisions or to provide rationale for your descriptor selection.
1 TARGET LANGUAGE

Category Number    Description    Check one box

1a  The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that the text cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b  The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c  Although the target text is generally readable, there are problems and awkward expressions, resulting in most cases from unnecessary transfer from the source text.

1d  The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments
2 FUNCTIONAL AND TEXTUAL ADEQUACY

Category Number    Description    Check one box

2a  Disregard for the goals, purpose, function and audience of the text. The text was translated without considering textual units, textual purpose, genre, needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b  The translated text gives some consideration to the intended purpose and audience for the translation, but misses some important aspects of it (e.g., level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c  The translated text approximates the goals, purpose (function) and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d  The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments
3 NON-SPECIALIZED CONTENT (MEANING)

Category Number    Description    Check one box

3a  The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b  There have been some changes in meaning, omissions and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c  Minor alterations in meaning, additions or omissions.

3d  The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments
4 SPECIALIZED CONTENT AND TERMINOLOGY

Category Number    Description    Check one box

4a  Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b  Serious/frequent mistakes involving terminology and/or specialized content.

4c  A few terminological errors, but the specialized content is not seriously affected.

4d  Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments
TOTAL SCORE
S C O R I N G W O R K S H E E T
Component: Target Language              Component: Functional and Textual Adequacy
Category    Value                       Category    Value
1a          5                           2a          5
1b          15                          2b          10
1c          25                          2c          20
1d          30                          2d          25

Component: Non-Specialized Content      Component: Specialized Content and Terminology
Category    Value                       Category    Value
3a          5                           4a          5
3b          10                          4b          10
3c          20                          4c          15
3d          25                          4d          20
Tally Sheet
Component Category
Rating Score Value
Target Language
Functional and Textual Adequacy
Non-Specialized Content
Specialized Content and Terminology
Total Score
262 Sonia Colina
Appendix 2 Text sample
bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull
Further evidence for a functionalist approach to translation quality evaluation 263
bull bull
264 Sonia Colina
Authorrsquos address
Sonia ColinaDepartment of Spanish and PortugueseThe University of ArizonaModern Languages 545Tucson AZ 85721-0067United States of America
scolinaemailarizonaedu
components. Although the components Target Language (TL) and Functional and Textual Adequacy (FTA) generally have higher standard deviations (i.e., ratings are less consistent), this is not always the case, as seen in the Chinese data (FTA). One would in fact expect the FTA category to exhibit the highest standard deviations, given its more holistic nature, yet the data do not bear out this hypothesis, as the TL component also shows standard deviations that are higher than Non-Specialized Content (MEAN) and Specialized Content and Terminology (TERM).
Question 2: How consistently do raters in the first session (Benchmark) rate the texts?

The inter-rater reliability for the Spanish and for the Chinese raters is remarkable; however, the inter-rater reliability for the Russian raters is too low (Table 3).
Table 2. Average scores and standard deviations for four components, per text and per language

                  TL           FTA          MEAN         TERM
Text    Raters  Mean   SD    Mean   SD    Mean   SD    Mean   SD
Spanish
210     11      2.77   .26   2.36   .23   2.27   .26   1.77   .34
214     11      2.73   .47   2.09   .70   2.32   .25   1.82   .34
215     11      2.86   .23   2.23   .47   1.82   .68   1.77   .34
228     11      1.50   .77   1.14   .60   1.09   .63   1.14   .45
235     11      1.59   .83   1.23   .65   1.36   .64   1.45   .47
Avg SD          .512         .53          .492         .388
Chinese
410     10      2.70   .48   2.20   .48   2.10   .46   1.80   .26
413     10      1.80   .95   1.65   .58   1.40   .52   1.45   .37
415     10      2.85   .24   2.50   .00   2.35   .24   1.90   .21
418     10      2.25   .68   2.10   .46   1.60   .77   1.65   .41
Avg SD          .5875        .38          .4975        .3125
Russian
312     9       1.83   .71   1.50   .61   1.33   .66   1.28   .44
314     9       2.56   .63   2.17   .50   1.94   .39   1.61   .42
315     9       2.33   .94   1.83   .79   1.78   .44   1.61   .42
316     9       2.00   1.03  1.67   .79   1.72   .71   1.39   .65
Avg SD          .8275        .6725        .55          .4825
Avg SD (all languages)  .63  .53          .51          .39
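The "Avg SD" rows in Table 2 are simple means of the per-text standard deviations for each component. A minimal sketch reproducing the Spanish row (values transcribed from the table; the dictionary layout and variable names are my own):

```python
# Per-text standard deviations for the Spanish ratings (Table 2),
# one list per tool component, texts 210, 214, 215, 228, 235.
spanish_sds = {
    "TL":   [0.26, 0.47, 0.23, 0.77, 0.83],
    "FTA":  [0.23, 0.70, 0.47, 0.60, 0.65],
    "MEAN": [0.26, 0.25, 0.68, 0.63, 0.64],
    "TERM": [0.34, 0.34, 0.34, 0.45, 0.47],
}

# Mean SD per component -- the figures reported in the table's "Avg SD" row.
avg_sd = {comp: round(sum(v) / len(v), 4) for comp, v in spanish_sds.items()}
print(avg_sd)  # {'TL': 0.512, 'FTA': 0.53, 'MEAN': 0.492, 'TERM': 0.388}
```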
This, in conjunction with the Reliability Testing results, leads us to believe in the presence of other unknown factors, unrelated to the tool, responsible for the low reliability of the Russian raters.
Question 3: How consistently do raters in the second session (Reliability) rate the texts? How do the reliability coefficients compare for the Benchmark and the Reliability Testing?

The results of the reliability raters mirror those of the benchmark raters: the Spanish raters achieve a very good inter-rater reliability coefficient, the Chinese raters have an acceptable inter-rater reliability coefficient, but the inter-rater reliability for the Russian raters is very low (Table 4).
Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating), but the Spanish raters achieved remarkable inter-rater reliability at both rating sessions. The slight drop among the Russian raters from the first to the second session is negligible; in any case, the inter-rater reliability is too low.
Figure 2. Average standard deviations per tool component and per language (components TL, FTA, MEAN, TERM; series: Spanish, Chinese, Russian, all languages).
Table 3. Reliability coefficients for benchmark ratings

          Reliability coefficient
Spanish   .953
Chinese   .973
Russian   .128
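The article does not state which inter-rater reliability coefficient was computed. Purely as an illustration, one common choice for this kind of design is Cronbach's alpha, treating raters as "items" and texts as cases; the rater scores below are invented for the example:

```python
import statistics as st

def cronbach_alpha(ratings):
    """Cronbach's alpha for inter-rater consistency.
    ratings: one inner list per rater, aligned by text."""
    k = len(ratings)
    # Total score per text, summed across raters.
    totals = [sum(text_scores) for text_scores in zip(*ratings)]
    rater_vars = [st.variance(r) for r in ratings]
    return (k / (k - 1)) * (1 - sum(rater_vars) / st.variance(totals))

# Three hypothetical raters scoring four texts on the tool's 0-100 scale.
raters = [
    [93, 85, 47, 68],
    [90, 88, 51, 64],
    [95, 80, 45, 70],
]
print(round(cronbach_alpha(raters), 3))  # 0.989 -- high agreement
```

Raters who rank the texts consistently relative to one another, as in this invented sample, produce a coefficient near 1, whatever the coefficient actually used in the study.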
Question 4: How consistently do raters rate each component of the tool? Are there some test components where there is higher rater reliability?
The coefficients for the Spanish raters show very good reliability, with excellent coefficients for the first three components; the numbers for the Chinese raters are also very good, but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6).
Table 6. Reliability coefficients for the four components of the tool (all raters per language group)

          TL     FTA    MEAN   TERM
Spanish   .952   .929   .926   .848
Chinese   .844   .844   .864   .783
Russian   .367   .479   .492   .292
In sum, very good reliability was obtained for the Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing), as well as for all components of the tool. Reliability scores for the Russian raters are low. These results are in agreement with the standard deviation data presented in Tables 1–2, Figures 1a and 1b, and Figure 2. All of this leads us to believe that whatever the cause for the Russian coefficients, it was not related to the tool itself.
Question 5: Is there a difference in scoring between translators and teachers?

Table 7a and Table 7b show the scoring in terms of average scores and standard deviations for the translators and the teachers for all texts. Figures 3 and 4 show the mean scores and times for Spanish raters, comparing teachers and translators.
Table 4. Reliability coefficients for Reliability Testing

          Reliability coefficient
Spanish   .934
Chinese   .780
Russian   .118
Table 5. Inter-rater reliability: Benchmark and Reliability Testing

          Benchmark reliability coefficient   Reliability coefficient (Reliability Testing)
Spanish   .953                                .934
Chinese   .973                                .780
Russian   .128                                .118
Table 7a. Average scores and standard deviations for consultants and translators

        Score           Time
Text    Mean    SD      Mean    SD
210     93.3    7.5     75.8    59.4
214     93.3    12.1    94.2    101.4
215     85.0    17.9    36.3    18.3
228     46.7    20.7    37.5    22.3
235     46.7    18.6    49.5    38.9
410     91.4    7.5     46.0    22.1
413     62.9    21.0    40.7    13.7
415     96.4    4.8     26.1    15.4
418     69.3    22.1    52.4    22.2
312     52.5    15.1    26.7    2.6
314     88.3    10.3    22.5    4.2
315     74.2    26.3    28.7    7.8
316     63.3    32.7    25.8    6.6
Table 7b. Average scores and standard deviations for teachers

        Score           Time
Text    Mean    SD      Mean    SD
210     90.0    9.4     63.6    39.7
214     85.0    9.4     67.0    41.8
215     89.0    12.4    36.0    30.5
228     51.0    19.5    38.0    31.7
235     68.0    10.4    57.6    40.2
410     80.0    13.2    61.0    27.7
413     63.3    25.7    71.0    24.6
415     95.0    8.7     41.0    11.5
418     91.7    5.8     44.0    6.6
312     73.3    5.8     55.0    56.7
314     71.7    20.8    47.7    62.7
315     78.3    14.4    37.7    45.5
316     76.7    22.5    46.7    63.5
The corresponding data for Chinese appear in Figures 5 and 6, and in Figures 7 and 8 for Russian.

Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts).

As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5). Only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).

Despite the low inter-rater reliability among Russian raters, the same trend was found when comparing Russian translators and teachers as was found for the Chinese and the Spanish: Russian teachers rate similarly to or slightly higher than translators, and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).
In order to investigate the irregular behavior of the Russian raters, and to try to obtain an explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect a relatively high (negative) correlation, because of the inverse relationship between a high score and a low recommendation. As illustrated in the three sub-tables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar for the Chinese raters, whereby all raters correlate very highly
Figure 3. Mean scores for Spanish raters (texts 210, 214, 215, 228, 235; translators vs. teachers).
Figure 4. Time for Spanish raters (texts 210, 214, 215, 228, 235; translators vs. teachers).
Figure 5. Mean scores for Chinese raters (texts 410, 413, 415, 418; translators vs. teachers).
Figure 6. Time for Chinese raters (texts 410, 413, 415, 418; translators vs. teachers).
Figure 7. Mean scores for Russian raters (texts 312, 314, 315, 316; translators vs. teachers).
between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.00 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK and RS01NM) do not show a high correlation between their recommendations and their total scores. A closer look especially at these raters is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independently of the tool design and tool features, as the scores and overall recommendation do not correlate as highly as expected.
Figure 8. Time for Russian raters (texts 312, 314, 315, 316; translators vs. teachers).
Table 8 (3 sub-tables). Correlation between recommendation and total score

8.1 Spanish raters
SP04AR   SP01JC   SP01VS   SP02JA   SP02LA   SP02PB   SP02AB   SP01PC   SP01CC   SP02MC   SP01PS
−0.923   −0.958   −0.854   −0.938   −0.966   −0.421   −0.942   −0.975   −0.913   −0.981   −0.938

8.2 Chinese raters
CH01RL   CH04YY   CH01AX   CH02AC   CH02JG   CH01KG   CH02AH   CH01BJ   CH01CK   CH01FL
−0.935   −0.980   −0.996   −0.894   −1.000   −0.955   −0.980   −0.867   −0.943   −0.926

8.3 Russian raters
RS01EG   RS01EM   RS04GN   RS02NB   RS02LB   RS02MK   RS01SM   RS01NM   RS01RW
−0.998   −0.115   −0.933   −1.000   n/a      −0.500   −0.982   −0.500   −0.993
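The correlations in Table 8 are presumably Pearson coefficients between each rater's total scores and recommendation codes; a rater with no variance in either variable, such as RS02LB, has an undefined coefficient and must be excluded. A sketch under that assumption (the sample scores are invented):

```python
import statistics as st

def pearson_r(xs, ys):
    """Pearson correlation; returns None when either variable has zero
    variance (the reason rater RS02LB was excluded from the analysis)."""
    if len(set(xs)) < 2 or len(set(ys)) < 2:
        return None
    mx, my = st.mean(xs), st.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# High total scores should pair with low recommendation codes
# (lower codes correspond to better recommendations on the cover sheet),
# hence a strong negative r for a well-behaved rater.
scores = [93, 85, 47, 68]
recs = [1, 2, 4, 3]
print(round(pearson_r(scores, recs), 3))   # -0.982
print(pearson_r([90, 75, 60], [2, 2, 2]))  # None: no variance in recs
```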
3 Conclusions
As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights about the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context, where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g., industry, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.

In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.
Notes
The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.
1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.

2. www.sae.org; www.lisa.org/products/qamodel

3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.

4. Note the reference to reader response within a functionalist framework.

5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected who had not participated in the training and rating sessions of the previous experiment. Given the low number, the researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).
References
Bell, Roger T. 1991. Translation and Translating. London: Longman.
Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation". Meta 46:2. 345–364.
Cao, Deborah. 1996. "A Model of Translation Proficiency". Target 8:2. 325–340.
Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations". Mechanical Translation 9:3–4. 55–66.
Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.
Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach". The Translator 14:1. 97–134.
Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation". Meta 46:2. 227–242.
Hatim, Basil and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.
Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment". Current Issues in Language and Society 4:1. 6–34.
House, Julianne. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.
House, Julianne. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation". Meta 46:2. 243–257.
Lauscher, S. 2000. "Translation Quality Assessment: Where Can Theory and Practice Meet?" The Translator 6:2. 149–168.
Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.
Nida, Eugene. 1964. Toward a Science of Translating. Leiden: Brill.
Nida, Eugene and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.
Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.
PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'". John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.
Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hüber.
Reiss, Katharina and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translations-Theorie. Tübingen: Niemeyer.
Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function". Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.
Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment". Meta 46:2. 326–344.
Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centered Approach. Ottawa: University of Ottawa Press.
Résumé

Colina (2008) proposes a componential and functionalist approach to the evaluation of translation quality and reports on the results of a pilot test of a tool designed for that approach. The results show a high level of inter-rater reliability and justify further testing. This article presents an experiment designed to test the approach as well as the tool. Data were collected during two rounds of testing. A group of 30 raters, made up of Spanish, Chinese and Russian translators and teachers, evaluated 4 or 5 translated texts. The results show that the tool provides good inter-rater reliability for all language groups and texts with the exception of Russian; they also suggest that the low reliability of the scores obtained by the Russian raters is unrelated to the tool itself. These findings confirm those of Colina (2008).

Keywords: quality, testing, evaluation, rating, componential, functionalism, errors
Appendix 1 Tool
Benchmark Rating Session
Time Rating Starts:          Time Rating Ends:

Translation Quality Assessment – Cover Sheet for Health Education Materials
PART I: To be completed by Requester
(The Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text.)

Requester:
Title/Department:            Delivery Date:
TRANSLATION BRIEF
Source Language:             Target Language:
(Spanish / Russian / Chinese)
Text Type:
Text Title:
Target Audience:
Purpose of Document:
PRIORITY OF QUALITY CRITERIA
Rank EACH from 1 to 4 (1 being top priority):
____ Target Language
____ Functional and Textual Adequacy
____ Non-Specialized Content (Meaning)
____ Specialized Content and Terminology
PART II: To be completed by TQA Rater

Rater (Name):                Date Completed:
Contact Information:         Date Received:
Total Score:                 Total Rating Time:
ASSESSMENT SUMMARY AND RECOMMENDATION
(To be completed after evaluating the translated text)
[ ] Publish and/or use as is
[ ] Minor edits needed before publishing
[ ] Major revision needed before publishing
[ ] Redo translation
[ ] Translation will not be an effective communication strategy for this text. Explore other options (e.g., create new target language materials).

Notes/Recommended Edits:
RATING INSTRUCTIONS

1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.
2. Check the description that best fits the text given in each one of the categories.
3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.
4. Examples or comments are not required, but they can be useful to help support your decisions or to provide rationale for your descriptor selection.
1 TARGET LANGUAGE
Category (check one box):
1a. The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that it cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.
1b. The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.
1c. Although the target text is generally readable, there are problems and awkward expressions, resulting in most cases from unnecessary transfer from the source text.
1d. The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.
Examples/Comments:
2 FUNCTIONAL AND TEXTUAL ADEQUACY
Category (check one box):
2a. Disregard for the goals, purpose, function and audience of the text. The text was translated without considering textual units, textual purpose, genre, needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.
2b. The translated text gives some consideration to the intended purpose and audience for the translation, but misses some important aspects of it (e.g., level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.
2c. The translated text approximates the goals, purpose (function) and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.
2d. The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to the cultural needs and characteristics of the audience. Minor or no edits needed.
Examples/Comments:
3 NON-SPECIALIZED CONTENT (MEANING)
Category (check one box):
3a. The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.
3b. There have been some changes in meaning, omissions and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.
3c. Minor alterations in meaning, additions or omissions.
3d. The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions or additions. Slight nuances and shades of meaning have been rendered adequately.
Examples/Comments:
4 SPECIALIZED CONTENT AND TERMINOLOGY
Category (check one box):
4a. Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.
4b. Serious/frequent mistakes involving terminology and/or specialized content.
4c. A few terminological errors, but the specialized content is not seriously affected.
4d. Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.
Examples/Comments:

TOTAL SCORE:
SCORING WORKSHEET
Component: Target Language
Category   Value   Score
1a         5
1b         15
1c         25
1d         30

Component: Functional and Textual Adequacy
Category   Value   Score
2a         5
2b         10
2c         20
2d         25

Component: Non-Specialized Content
Category   Value   Score
3a         5
3b         10
3c         20
3d         25

Component: Specialized Content and Terminology
Category   Value   Score
4a         5
4b         10
4c         15
4d         20
Tally Sheet

Component                               Category Rating   Score Value
Target Language
Functional and Textual Adequacy
Non-Specialized Content
Specialized Content and Terminology
Total Score:
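The worksheet's arithmetic is a lookup and a sum: the checked category in each component contributes its point value, and a top rating in all four components (1d, 2d, 3d, 4d) yields the maximum total of 100. A minimal sketch (the function name is my own):

```python
# Category point values transcribed from the scoring worksheet.
VALUES = {
    "1a": 5, "1b": 15, "1c": 25, "1d": 30,  # Target Language
    "2a": 5, "2b": 10, "2c": 20, "2d": 25,  # Functional and Textual Adequacy
    "3a": 5, "3b": 10, "3c": 20, "3d": 25,  # Non-Specialized Content
    "4a": 5, "4b": 10, "4c": 15, "4d": 20,  # Specialized Content and Terminology
}

def total_score(checked):
    """Sum the point values of the one checked category per component."""
    return sum(VALUES[c] for c in checked)

print(total_score(["1d", "2d", "3d", "4d"]))  # 100 (maximum)
print(total_score(["1c", "2c", "3b", "4c"]))  # 70
```

Note that the components are weighted unequally: Target Language alone is worth up to 30 points, while Specialized Content and Terminology tops out at 20.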
Appendix 2 Text sample
Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu
248 Sonia Colina
This in conjunction with the Reliability Testing results leads us to believe in the presence of other unknown factors unrelated to the tool responsible for the low reliability of the Russian raters
Question 3 How consistently do raters in the second session (Reliability) rate the texts How do the reliability coefficients compare for the Benchmark and the Reli-ability TestingThe results of the reliability raters mirror those of the benchmark raters whereby the Spanish raters achieve a very good inter-rater reliability coefficient the Chi-nese raters have acceptable inter-rater reliability coefficient but the inter-rater reli-ability for the Russian raters is very low (Table 4)
Table 5 (see also Tables 3 and 4) shows that there was a slight drop in inter-rater reliability for the Chinese raters (from the benchmark rating to the reliability rating) but the Spanish raters at both rating sessions achieved remarkable inter-rater reliability The slight drop among the Russian raters from the first to the sec-ond session is negligible in any case the inter-rater reliability is too low
Average SD per tool component
0
1
2
3
4
5
6
7
8
9
TL FTA MEAN TERM
SpanishChineseRussianAll languages
Figure 2 Average standard deviations per tool component and per language
Table 3 Reliability coefficients for benchmark ratings
Reliability coefficient
Spanish 953
Chinese 973
Russian 128
Further evidence for a functionalist approach to translation quality evaluation 249
Question 4 How consistently do raters rate each component of the tool Are there some test components where there is higher rater reliability
The coefficients for the Spanish raters show very good reliability with excel-lent coefficients for the first three components the numbers for the Chinese raters are also very good but the coefficients for the Russian raters are once again low (although some consistency is identified for the FTA and MEAN components) (Table 6)
Table 6 Reliability coefficients for the four components of the tool (all raters per language group)
TL FTA MEAN TERM
Spanish 952 929 926 848
Chinese 844 844 864 783
Russian 367 479 492 292
In sum very good reliability was obtained for Spanish and Chinese raters for the two testing sessions (Benchmark and Reliability Testing) as well as for all compo-nents of the tool Reliability scores for the Russian raters are low These results are in agreement with the standard deviation data presented in Tables 1ndash2 and Fig-ure 1a and 1b and Figure 2 All of this leads us to believe that whatever the cause for the Russian coefficients it was not related to the tool itself
Question 5 Is there a difference in scoring between translators and teachersTable 7a and Table 7b show the scoring in terms of average scores and standard deviations for the translators and the teachers for all texts Figures 3 and 4 show the mean scores and times for Spanish raters comparing teachers and translators
Table 4 Reliability coefficients for Reliability Testing
Reliability coefficient
Spanish 934
Chinese 780
Russian 118
Table 5 Inter-rater reliability Benchmark and Reliability Testing
Benchmark reliability coefficient
Reliability coefficient(for Reliability Testing)
Spanish 953 934
Chinese 973 780
Russian 128 118
250 Sonia Colina
Table 7a Average scores and standard deviations for consultants and translators
Score Time
text Mean SD Mean SD
210 933 75 758 594
214 933 121 942 1014
215 850 179 363 183
228 467 207 375 223
235 467 186 495 389
410 914 75 460 221
413 629 210 407 137
415 964 48 261 154
418 693 221 524 222
312 525 151 267 26
314 883 103 225 42
315 742 263 287 78
316 633 327 258 66
Table 7b Average scores and standard deviations for teachers
Score Time
text Mean SD Mean SD
210 900 94 636 397
214 850 94 670 418
215 890 124 360 305
228 510 195 380 317
235 680 104 576 402
410 800 132 610 277
413 633 257 710 246
415 950 87 410 115
418 917 58 440 66
312 733 58 550 567
314 717 208 477 627
315 783 144 377 455
316 767 225 467 635
Further evidence for a functionalist approach to translation quality evaluation 251
The corresponding data for Chinese appears in Figures 5 and 6 and in Figures 7 and 8 for Russian
Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts)
As with the Spanish raters it is interesting to note that Chinese teachers rate either higher or similarly to translators (Figure 5) Only one text obtained lower ratings from teachers than from translators Timing results also mirror those found for Spanish subjects Teachers take longer to rate than translators (Figure 6)
Despite the low inter-rater reliability among Russian raters the same trend was found when comparing Russian translators and teachers with the Chinese and the Spanish Russian teachers rate similarly or slightly higher than translators and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8) This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008)
In order to investigate the irregular behavior of the Russian raters and to try to obtain an explanation for the low inter-rater reliability the correlation between the total score and at the recommendation (the field lsquorecrsquo) issued by each rater was con-sidered This is explored in Table 8 One would expect there to be a relatively high (negative) correlation because of the inverse relationship between high score and a low recommendation As is illustrated in the three sub tables below all Spanish rat-ers with the exception of SP02PB show a strong correlation between the recommen-dation and the total score ranging from minus0854 (SP01VS) to minus0981 (SP02MC) The results are similar with the Chinese raters whereby all raters correlate very highly
Figure 3. Mean scores for Spanish raters (translators vs. teachers; texts 210, 214, 215, 228, 235)
252 Sonia Colina
Figure 4. Time for Spanish raters (translators vs. teachers; texts 210, 214, 215, 228, 235)
Figure 5. Mean scores for Chinese raters (translators vs. teachers; texts 410, 413, 415, 418)
Figure 6. Time for Chinese raters (translators vs. teachers; texts 410, 413, 415, 418)
Figure 7. Mean scores for Russian raters (translators vs. teachers; texts 312, 314, 315, 316)
between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.000 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK and RS01NM) do not show a high correlation between their recommendations and their total scores. A closer look at these raters is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independently of the tool design and tool features, as the scores and overall recommendations do not correlate as highly as expected.
Figure 8. Time for Russian raters (translators vs. teachers; texts 312, 314, 315, 316)
Table 8 (3 sub-tables). Correlation between recommendation and total score

8.1 Spanish raters
SP04AR  SP01JC  SP01VS  SP02JA  SP02LA  SP02PB  SP02AB  SP01PC  SP01CC  SP02MC  SP01PS
−0.923  −0.958  −0.854  −0.938  −0.966  −0.421  −0.942  −0.975  −0.913  −0.981  −0.938

8.2 Chinese raters
CH01RL  CH04YY  CH01AX  CH02AC  CH02JG  CH01KG  CH02AH  CH01BJ  CH01CK  CH01FL
−0.935  −0.980  −0.996  −0.894  −1.000  −0.955  −0.980  −0.867  −0.943  −0.926

8.3 Russian raters
RS01EG  RS01EM  RS04GN  RS02NB  RS02LB  RS02MK  RS01SM  RS01NM  RS01RW
−0.998  −0.115  −0.933  −1.000  n/a     −0.500  −0.982  −0.500  −0.993
3 Conclusions
As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights into the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context, where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process, and therefore better able to offer a more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g., industry, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of the evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.

In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.
Notes
The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.

1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.

2. www.sae.org; www.lisa.org/products/qamodel

3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.

4. Note the reference to reader response within a functionalist framework.

5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected who had not participated in the training and rating sessions of the previous experiment. Given the low number, researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).
References

Bell, Roger T. 1991. Translation and Translating. London: Longman.
Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation." Meta 46 (2): 345–364.
Cao, Deborah. 1996. "A Model of Translation Proficiency." Target 8 (2): 325–340.
Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations." Mechanical Translation 9 (3–4): 55–66.
Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.
Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach." The Translator 14 (1): 97–134.
Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation." Meta 46 (2): 227–242.
Hatim, Basil, and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.
Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment." Current Issues in Language and Society 4 (1): 6–34.
House, Juliane. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.
House, Juliane. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation." Meta 46 (2): 243–257.
Lauscher, S. 2000. "Translation Quality Assessment: Where Can Theory and Practice Meet?" The Translator 6 (2): 149–168.
Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.
Nida, Eugene. 1964. Toward a Science of Translating. Leiden: Brill.
Nida, Eugene, and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.
Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.
PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'." In Translator and Interpreter Training: Issues, Methods and Debates, John Kearns (ed.), 104–126. London and New York: Continuum.
Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hueber.
Reiss, Katharina, and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translationstheorie. Tübingen: Niemeyer.
Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function." In The Manipulation of Literature: Studies in Literary Translation, Theo Hermans (ed.), 54–62. London and Sydney: Croom Helm.
Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment." Meta 46 (2): 326–344.
Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centred Approach. Ottawa: University of Ottawa Press.
Résumé

Colina (2008) proposes a componential, functionalist approach to translation quality evaluation and reports the results of a pilot test of a tool designed according to that approach. The results show good inter-rater reliability and justify further testing. This article presents an experiment designed to test both the approach and the tool. Data were collected during two rounds of testing. A group of 30 raters, made up of Spanish, Chinese and Russian translators and teachers, evaluated 4 or 5 translated texts. The results show that the tool yields good inter-rater reliability for all language groups and texts, with the exception of Russian; they also suggest that the low reliability of the Russian raters' scores is unrelated to the tool itself. These findings confirm those of Colina (2008).

Keywords: quality, testing, evaluation, rating, componential, functionalism, errors
Appendix 1 Tool
Benchmark Rating Session
Time Rating Starts: ____  Time Rating Ends: ____

Translation Quality Assessment – Cover Sheet for Health Education Materials
PART I: To be completed by Requester

The Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text.
Requester
TitleDepartment Delivery Date
TRANSLATION BRIEF
Source Language Target Language
Spanish Russian Chinese
Text Type
Text Title
Target Audience
Purpose of Document
PRIORITY OF QUALITY CRITERIA

Rank EACH from 1 to 4 (1 being top priority):
____ Target Language
____ Functional and Textual Adequacy
____ Non-Specialized Content (Meaning)
____ Specialized Content and Terminology
PART II: To be completed by TQA Rater
Rater (Name) Date Completed
Contact Information Date Received
Total Score Total Rating Time
ASSESSMENT SUMMARY AND RECOMMENDATION
(To be completed after evaluating the translated text)
____ Publish and/or use as is
____ Minor edits needed before publishing
____ Major revision needed before publishing
____ Redo translation
____ Translation will not be an effective communication strategy for this text. Explore other options (e.g., create new target language materials).

Notes/Recommended Edits:
RATING INSTRUCTIONS
1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.
2. Check the description that best fits the text in each one of the categories.
3. It is recommended that you read the target text without looking at the English, and then score the Target Language and Functional categories.
4. Examples or comments are not required, but they can be useful to support your decisions or to provide a rationale for your descriptor selection.
1 TARGET LANGUAGE (check one)

1a. The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that it cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.
1b. The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.
1c. Although the target text is generally readable, there are problems and awkward expressions, resulting in most cases from unnecessary transfer from the source text.
1d. The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments:
2 FUNCTIONAL AND TEXTUAL ADEQUACY (check one)

2a. Disregard for the goals, purpose, function and audience of the text. The text was translated without considering textual units, textual purpose, genre, or the needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.
2b. The translated text gives some consideration to the intended purpose and audience for the translation, but misses some important aspects of it (e.g., level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.
2c. The translated text approximates the goals, purpose (function) and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.
2d. The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to the cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments:
3 NON-SPECIALIZED CONTENT (MEANING) (check one)

3a. The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.
3b. There have been some changes in meaning, omissions and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.
3c. Minor alterations in meaning, additions or omissions.
3d. The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:

4 SPECIALIZED CONTENT AND TERMINOLOGY (check one)

4a. Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.
4b. Serious/frequent mistakes involving terminology and/or specialized content.
4c. A few terminological errors, but the specialized content is not seriously affected.
4d. Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:

TOTAL SCORE: ____
SCORING WORKSHEET

Category values by component:
Target Language: 1a = 5, 1b = 15, 1c = 25, 1d = 30
Functional and Textual Adequacy: 2a = 5, 2b = 10, 2c = 20, 2d = 25
Non-Specialized Content: 3a = 5, 3b = 10, 3c = 20, 3d = 25
Specialized Content and Terminology: 4a = 5, 4b = 10, 4c = 15, 4d = 20

Tally Sheet

Component | Category Rating | Score Value
Target Language | ____ | ____
Functional and Textual Adequacy | ____ | ____
Non-Specialized Content | ____ | ____
Specialized Content and Terminology | ____ | ____
Total Score | | ____
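The worksheet arithmetic above can be sketched in a few lines. The dictionary reproduces the published point values; the function name and input format are illustrative, not part of the tool itself.

```python
# Category point values from the TQA scoring worksheet.
WORKSHEET = {
    "1a": 5, "1b": 15, "1c": 25, "1d": 30,  # Target Language (max 30)
    "2a": 5, "2b": 10, "2c": 20, "2d": 25,  # Functional and Textual Adequacy (max 25)
    "3a": 5, "3b": 10, "3c": 20, "3d": 25,  # Non-Specialized Content (max 25)
    "4a": 5, "4b": 10, "4c": 15, "4d": 20,  # Specialized Content and Terminology (max 20)
}

def total_score(checked):
    """Tally the values for the one descriptor checked per component."""
    return sum(WORKSHEET[category] for category in checked)

# Best descriptor in every component yields the maximum total of 100;
# the weakest descriptors yield the minimum of 20.
print(total_score(["1d", "2d", "3d", "4d"]))  # 100
print(total_score(["1a", "2a", "3a", "4a"]))  # 20
```

Note how the design weights the components unequally: Target Language can contribute up to 30 points, while Specialized Content and Terminology tops out at 20, so the maximum total is exactly 100.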
Appendix 2 Text sample
Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu
1 TARGET LANGUAGE
Category Number
Description Check one
box
1a
The translation reveals serious language proficiency issues Ungrammatical use of the target language spelling mistakes The translation is written in some sort of lsquothird languagersquo (neither the source nor the target) The
structure of source language dominates to the extent that it cannot be considered a sample of target language text The amount of transfer from the source cannot be justified by the purpose of the translation The text is
extremely difficult to read bordering on being incomprehensible
1b The text contains some unnecessary transfer of elementsstructure from the source text The structure of the
source language shows up in the translation and affects its readability The text is hard to comprehend
1c Although the target text is generally readable there are problems and awkward expressions resulting in most cases from unnecessary transfer from the source text
1d
The translated text reads similarly to texts originally written in the target language that respond to the same purpose audience and text type as those specified for the translation in the brief Problemsawkward
expressions are minimal if existent at all
ExamplesComments
2 FUNCTIONAL AND TEXTUAL ADEQUACY
Category
Number Description
Check one
box
2a Disregard for the goals purpose function and audience of the text The text was translated without considering
textual units textual purpose genre need of the audience (cultural linguistic etc) Can not be repaired with revisions
2b The translated text gives some consideration to the intended purpose and audience for the translation but misses some important aspects of it (eg level of formality some aspect of its function needs of the audience
cultural considerations etc) Repair requires effort
2c The translated text approximates to the goals purpose (function) and needs of the intended audience but it is
not as efficient as it could be given the restrictions and instructions for the translation Can be repaired with suggested edits
2d The translated text accurately accomplishes the goals purpose (function informative expressive persuasive) set for the translation and intended audience (including level of formality) It also attends to cultural needs and
characteristics of the audience Minor or no edits needed
ExamplesComments
260 Sonia Colina
- 3 -
3 NON-SPECIALIZED CONTENT-MEANING
Category Number
Description Check one
box
3a The translation reflects or contains important unwarranted deviations from the original It contains inaccurate renditions andor important omissions and additions that cannot be justified by the instructions Very defective
comprehension of the original text
3b There have been some changes in meaning omissions orand additions that cannot be justified by the translation instructions Translation shows some misunderstanding of original andor translation instructions
3c Minor alterations in meaning additions or omissions
3d The translation accurately reflects the content contained in the original insofar as it is required by the
instructions without unwarranted alterations omissions or additions Slight nuances and shades of meaning have been rendered adequately
ExamplesComments
4 SPECIALIZED CONTENT AND TERMINOLOGY
Category
Number Description
Check one
box
4a Reveals unawarenessignorance of special terminology andor insufficient knowledge of specialized content
4b Seriousfrequent mistakes involving terminology andor specialized content
4c A few terminological errors but the specialized content is not seriously affected
4d Accurate and appropriate rendition of the terminology It reflects a good command of terms and content specific
to the subject
ExamplesComments
TOTAL SCORE
- 3 -
3 NON-SPECIALIZED CONTENT-MEANING
Category Number
Description Check one
box
3a The translation reflects or contains important unwarranted deviations from the original It contains inaccurate renditions andor important omissions and additions that cannot be justified by the instructions Very defective
comprehension of the original text
3b There have been some changes in meaning omissions orand additions that cannot be justified by the translation instructions Translation shows some misunderstanding of original andor translation instructions
3c Minor alterations in meaning additions or omissions
3d The translation accurately reflects the content contained in the original insofar as it is required by the
instructions without unwarranted alterations omissions or additions Slight nuances and shades of meaning have been rendered adequately
ExamplesComments
4 SPECIALIZED CONTENT AND TERMINOLOGY
Category
Number Description
Check one
box
4a Reveals unawarenessignorance of special terminology andor insufficient knowledge of specialized content
4b Seriousfrequent mistakes involving terminology andor specialized content
4c A few terminological errors but the specialized content is not seriously affected
4d Accurate and appropriate rendition of the terminology It reflects a good command of terms and content specific
to the subject
ExamplesComments
TOTAL SCORE
Further evidence for a functionalist approach to translation quality evaluation 261
- 4 -
S C O R I N G W O R K S H E E T
Component Target Language Component Functional and Textual Adequacy
Category Value Score Category Value Score
1a 5 2a 5 1b 15 2b 10 1c 25 2c 20 1d 30
2d 25
Component Non-Specialized Content Component Specialized Content and
Terminology
Category Value Score Category Value Score
3a 5 4a 5 3b 10 4b 10 3c 20 4c 15 3d 25
4d 20
Tally Sheet
Component Category
Rating Score Value
Target Language
Functional and Textual Adequacy
Non-Specialized Content
Specialized Content and Terminology
Total Score
262 Sonia Colina
Appendix 2 Text sample
bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull
Further evidence for a functionalist approach to translation quality evaluation 263
bull bull
264 Sonia Colina
Authorrsquos address
Sonia ColinaDepartment of Spanish and PortugueseThe University of ArizonaModern Languages 545Tucson AZ 85721-0067United States of America
scolinaemailarizonaedu
250 Sonia Colina
Table 7a. Average scores and standard deviations for consultants and translators

            Score               Time
Text     Mean      SD       Mean      SD
210      93.3      7.5      75.8      59.4
214      93.3     12.1      94.2     101.4
215      85.0     17.9      36.3      18.3
228      46.7     20.7      37.5      22.3
235      46.7     18.6      49.5      38.9
410      91.4      7.5      46.0      22.1
413      62.9     21.0      40.7      13.7
415      96.4      4.8      26.1      15.4
418      69.3     22.1      52.4      22.2
312      52.5     15.1      26.7       2.6
314      88.3     10.3      22.5       4.2
315      74.2     26.3      28.7       7.8
316      63.3     32.7      25.8       6.6
Table 7b. Average scores and standard deviations for teachers

            Score               Time
Text     Mean      SD       Mean      SD
210      90.0      9.4      63.6      39.7
214      85.0      9.4      67.0      41.8
215      89.0     12.4      36.0      30.5
228      51.0     19.5      38.0      31.7
235      68.0     10.4      57.6      40.2
410      80.0     13.2      61.0      27.7
413      63.3     25.7      71.0      24.6
415      95.0      8.7      41.0      11.5
418      91.7      5.8      44.0       6.6
312      73.3      5.8      55.0      56.7
314      71.7     20.8      47.7      62.7
315      78.3     14.4      37.7      45.5
316      76.7     22.5      46.7      63.5
Further evidence for a functionalist approach to translation quality evaluation 251
The corresponding data for Chinese appear in Figures 5 and 6, and in Figures 7 and 8 for Russian.

Spanish teachers tend to rate somewhat higher than translators (3 out of 5 texts) and spend more time rating (all texts).

As with the Spanish raters, it is interesting to note that Chinese teachers rate either higher than or similarly to translators (Figure 5). Only one text obtained lower ratings from teachers than from translators. Timing results also mirror those found for the Spanish subjects: teachers take longer to rate than translators (Figure 6).
Despite the low inter-rater reliability among Russian raters, the same trend was found when comparing Russian translators and teachers as with the Chinese and the Spanish: Russian teachers rate similarly to or slightly higher than translators, and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8). This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008).
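This excerpt refers repeatedly to inter-rater reliability without naming the statistic computed. One conventional choice when several raters score the same set of texts is Cronbach's alpha with raters treated as items; the sketch below assumes that statistic and uses invented scores, not the study's data.

```python
from statistics import pvariance

def cronbach_alpha(ratings: list[list[float]]) -> float:
    """Cronbach's alpha with raters as 'items': ratings[r][t] is
    rater r's total score for text t."""
    k = len(ratings)
    rater_vars = sum(pvariance(r) for r in ratings)          # per-rater variance
    text_totals = [sum(col) for col in zip(*ratings)]        # summed score per text
    return k / (k - 1) * (1 - rater_vars / pvariance(text_totals))

# Invented scores (0-100) from three raters over five texts
ratings = [
    [90, 85, 88, 50, 60],
    [95, 80, 85, 45, 65],
    [88, 90, 80, 55, 58],
]
print(f"alpha = {cronbach_alpha(ratings):.2f}")
```

With these invented scores alpha comes out around 0.98, i.e. high agreement; a low value, as reported for the Russian group, would indicate raters ordering or spacing the texts very differently.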
In order to investigate the irregular behavior of the Russian raters, and to try to obtain an explanation for the low inter-rater reliability, the correlation between the total score and the recommendation (the field 'rec') issued by each rater was considered. This is explored in Table 8. One would expect a relatively high (negative) correlation because of the inverse relationship between a high score and a low recommendation. As illustrated in the three subtables below, all Spanish raters, with the exception of SP02PB, show a strong correlation between the recommendation and the total score, ranging from −0.854 (SP01VS) to −0.981 (SP02MC). The results are similar for the Chinese raters, where all raters correlate very highly
Figure 3. Mean scores for Spanish raters (translators vs. teachers; texts 210, 214, 215, 228, 235)
Figure 4. Time for Spanish raters (translators vs. teachers)
Figure 5. Mean scores for Chinese raters (translators vs. teachers; texts 410, 413, 415, 418)
Figure 6. Time for Chinese raters (translators vs. teachers)
Figure 7. Mean scores for Russian raters (translators vs. teachers; texts 312, 314, 315, 316)
between the recommendation and the total score, ranging from −0.867 (CH01BJ) to a perfect −1.00 (CH02JG). The results are different for the Russian raters, however. It appears that three raters (RS01EM, RS02MK and RS01NM) do not show a high correlation between their recommendations and their total scores. A closer look especially at these raters is warranted, as is a closer look at RS02LB, who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recommended a '2' for all texts, regardless of the total score he or she assigned). The other Russian raters exhibited strong correlations. This result suggests some unusual behavior in the Russian raters, independent of the tool design and features, as their scores and overall recommendations do not correlate as highly as expected.
Figure 8. Time for Russian raters (translators vs. teachers)
Table 8 (three subtables). Correlation between recommendation and total score

8.1 Spanish raters

SP04AR −0.923   SP01JC −0.958   SP01VS −0.854   SP02JA −0.938
SP02LA −0.966   SP02PB −0.421   SP02AB −0.942   SP01PC −0.975
SP01CC −0.913   SP02MC −0.981   SP01PS −0.938

8.2 Chinese raters

CH01RL −0.935   CH04YY −0.980   CH01AX −0.996   CH02AC −0.894
CH02JG −1.000   CH01KG −0.955   CH02AH −0.980   CH01BJ −0.867
CH01CK −0.943   CH01FL −0.926

8.3 Russian raters

RS01EG −0.998   RS01EM −0.115   RS04GN −0.933   RS02NB −1.000
RS02LB n/a      RS02MK −0.500   RS01SM −0.982   RS01NM −0.500
RS01RW −0.993
3 Conclusions
As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights into the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer a more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g. industry, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of the evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.
In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.
Notes
The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions and revision of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.
1 The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.

2 www.sae.org; www.lisa.org/products/qamodel

3 One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.

4 Note the reference to reader response within a functionalist framework.

5 Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected who had not participated in the training and rating sessions of the previous experiment. Given the low number, researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).
References
Bell, Roger T. 1991. Translation and Translating. London: Longman.
Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation". Meta 46:2. 345–364.
Cao, Deborah. 1996. "A Model of Translation Proficiency". Target 8:2. 325–340.
Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations". Mechanical Translation 9:3–4. 55–66.
Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.
Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach". The Translator 14:1. 97–134.
Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation". Meta 46:2. 227–242.
Hatim, Basil and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.
Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment". Current Issues in Language and Society 4:1. 6–34.
House, Juliane. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.
House, Juliane. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation". Meta 46:2. 243–257.
Lauscher, S. 2000. "Translation Quality Assessment: Where Can Theory and Practice Meet?". The Translator 6:2. 149–168.
Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.
Nida, Eugene. 1964. Toward a Science of Translating. Leiden: Brill.
Nida, Eugene and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.
Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.
PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'". John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.
Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hueber.
Reiss, Katharina and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translationstheorie. Tübingen: Niemeyer.
Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function". Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.
Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment". Meta 46:2. 326–344.
Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centered Approach. Ottawa: University of Ottawa Press.
Reacutesumeacute
Colina (2008) propose une approche componentielle et fonctionnelle de l'évaluation de la qualité des traductions et dresse un rapport sur les résultats d'un test-pilote portant sur un outil conçu pour cette approche. Les résultats attestent un taux élevé de fiabilité entre évaluateurs et justifient la continuation des tests. Cet article présente une expérimentation destinée à tester l'approche ainsi que l'outil. Des données ont été collectées pendant deux périodes de tests. Un groupe de 30 évaluateurs, composé de traducteurs et enseignants espagnols, chinois et russes, ont évalué 4 ou 5 textes traduits. Les résultats montrent que l'outil assure un bon taux de fiabilité entre évaluateurs pour tous les groupes de langues et de textes, à l'exception du russe ; ils suggèrent également que le faible taux de fiabilité des scores obtenus par les évaluateurs russes est sans rapport avec l'outil lui-même. Ces constats confirment ceux de Colina (2008).
Mots-clés : qualité, test, évaluation, notation, componentiel, fonctionnalisme, erreurs
Appendix 1 Tool
Benchmark Rating Session
Time Rating Starts: ________    Time Rating Ends: ________
Translation Quality Assessment – Cover Sheet for Health Education Materials
PART I To be completed by Requester
Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text
Requester
Title/Department:                Delivery Date:
TRANSLATION BRIEF
Source Language Target Language
Spanish Russian Chinese
Text Type
Text Title
Target Audience
Purpose of Document
PRIORITY OF QUALITY CRITERIA
____ Target Language
____ Functional and Textual Adequacy
____ Non-Specialized Content (Meaning)
Rank EACH from 1 to 4
(1 being top priority)
____ Specialized Content and Terminology
PART II To be completed by TQA Rater
Rater (Name) Date Completed
Contact Information Date Received
Total Score Total Rating Time
ASSESSMENT SUMMARY AND RECOMMENDATION
Publish andor use as is
Minor edits needed before publishing
Major revision needed before publishing
Redo translation
(To be completed after evaluating translated text)
Translation will not be an effective communication strategy for this text. Explore other options (e.g. create new target-language materials).
Notes/Recommended Edits:
RATING INSTRUCTIONS
1 Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.
2 Check the description that best fits the text in each one of the categories.
3 It is recommended that you read the target text without looking at the English and then score the Target Language and Functional categories.
4 Examples or comments are not required, but they can be useful to support your decisions or to provide a rationale for your descriptor selection.
1 TARGET LANGUAGE (check one box)

1a The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that the text cannot be considered a sample of target-language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c Although the target text is generally readable, there are problems and awkward expressions, resulting in most cases from unnecessary transfer from the source text.

1d The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments:
2 FUNCTIONAL AND TEXTUAL ADEQUACY (check one box)

2a Disregard for the goals, purpose, function and audience of the text. The text was translated without considering textual units, textual purpose, genre, or the needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b The translated text gives some consideration to the intended purpose and audience for the translation but misses some important aspects of it (e.g. level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c The translated text approximates the goals, purpose (function) and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments:
3 NON-SPECIALIZED CONTENT (MEANING) (check one box)

3a The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b There have been some changes in meaning, omissions and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c Minor alterations in meaning, additions or omissions.

3d The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:
4 SPECIALIZED CONTENT AND TERMINOLOGY (check one box)

4a Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b Serious/frequent mistakes involving terminology and/or specialized content.

4c A few terminological errors, but the specialized content is not seriously affected.

4d Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:
TOTAL SCORE
SCORING WORKSHEET

Component: Target Language          Component: Functional and Textual Adequacy
Category   Value                    Category   Value
1a          5                       2a          5
1b         15                       2b         10
1c         25                       2c         20
1d         30                       2d         25

Component: Non-Specialized Content  Component: Specialized Content and Terminology
Category   Value                    Category   Value
3a          5                       4a          5
3b         10                       4b         10
3c         20                       4c         15
3d         25                       4d         20
Tally Sheet

Component                              Category Rating    Score Value
Target Language                        ________           ________
Functional and Textual Adequacy        ________           ________
Non-Specialized Content                ________           ________
Specialized Content and Terminology    ________           ________
Total Score                                               ________
Appendix 2 Text sample
Author's address
Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America
scolina@email.arizona.edu
Further evidence for a functionalist approach to translation quality evaluation 251
The corresponding data for Chinese appears in Figures 5 and 6 and in Figures 7 and 8 for Russian
Spanish teachers tend to rate somewhat higher (3 out of 5 texts) and spend more time rating than translators (all texts)
As with the Spanish raters it is interesting to note that Chinese teachers rate either higher or similarly to translators (Figure 5) Only one text obtained lower ratings from teachers than from translators Timing results also mirror those found for Spanish subjects Teachers take longer to rate than translators (Figure 6)
Despite the low inter-rater reliability among Russian raters the same trend was found when comparing Russian translators and teachers with the Chinese and the Spanish Russian teachers rate similarly or slightly higher than translators and they clearly spend more time on the rating task than the translators (Figure 7 and Figure 8) This also mirrors the findings of the pre-pilot and pilot testing (Colina 2008)
In order to investigate the irregular behavior of the Russian raters and to try to obtain an explanation for the low inter-rater reliability the correlation between the total score and at the recommendation (the field lsquorecrsquo) issued by each rater was con-sidered This is explored in Table 8 One would expect there to be a relatively high (negative) correlation because of the inverse relationship between high score and a low recommendation As is illustrated in the three sub tables below all Spanish rat-ers with the exception of SP02PB show a strong correlation between the recommen-dation and the total score ranging from minus0854 (SP01VS) to minus0981 (SP02MC) The results are similar with the Chinese raters whereby all raters correlate very highly
0
10
20
30
40
50
60
70
80
90
100
210 214 215 228 235Mean scores for Spanish raters
TranslatorsTeachers
Figure 3 Mean scores for Spanish raters
252 Sonia Colina
0
10
20
30
40
50
60
70
80
210 214 215 228 235
Time for Spanish raters
TranslatorsTeachers
Figure 4 Time for Spanish raters
0
20
40
60
80
100
120
410 413 415 418
Mean Score for Chinese Raters
TranslatorsTeachers
Figure 5 Mean scores for Chinese raters
Further evidence for a functionalist approach to translation quality evaluation 253
0
10
20
30
40
50
60
70
80
410 413 415 418
Time for Chinese Raters
TranslatorsTeachers
Figure 6 Time for Chinese raters
0
10
20
30
40
50
60
70
80
90
100
312 314 315 316
Mean scores for Russian Raters
TranslatorsTeachers
Figure 7 Mean scores for Russian raters
254 Sonia Colina
between the recommendation and the total score ranging from minus0867 (CH01BJ) to a perfect 100 (CH02JG) The results are different for the Russian raters however It appears that three raters (RS01EM RS02MK and RS01NM) do not correlate highly between their recommendations and their total scores A closer look espe-cially at these raters is warranted as is a closer look at RS02LB who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recom-mended a lsquo2rsquo for all texts regardless of the total score he or she assigned) The other Russian raters exhibited strong correlations This result suggests some unusual be-havior in the Russian raters independently of the tool design and tool features as the scores and overall recommendation do not correlate highly as expected
Figure 8. Time for Russian raters (translators vs. teachers; texts 312, 314, 315, 316)
Table 8. Correlation between recommendation and total score (three sub-tables)

8.1 Spanish raters

SP04AR   SP01JC   SP01VS   SP02JA   SP02LA   SP02PB   SP02AB   SP01PC   SP01CC   SP02MC   SP01PS
−0.923   −0.958   −0.854   −0.938   −0.966   −0.421   −0.942   −0.975   −0.913   −0.981   −0.938

8.2 Chinese raters

CH01RL   CH04YY   CH01AX   CH02AC   CH02JG   CH01KG   CH02AH   CH01BJ   CH01CK   CH01FL
−0.935   −0.980   −0.996   −0.894   −1.000   −0.955   −0.980   −0.867   −0.943   −0.926

8.3 Russian raters

RS01EG   RS01EM   RS04GN   RS02NB   RS02LB   RS02MK   RS01SM   RS01NM   RS01RW
−0.998   −0.115   −0.933   −1.000   n/a      −0.500   −0.982   −0.500   −0.993
3 Conclusions
As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights into the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context, where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer a more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g., industry settings, translator licensing exams, etc.), a more efficient translator rater may be better suited to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is better suited to a specific rating task will probably depend on the purpose and goal of the evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.
In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.
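For reference, one common way to compute the inter-rater reliability discussed throughout is Cronbach's alpha, treating each rater as an "item" and each rated text as an observation. The article does not specify which reliability statistic was used, so this is only a sketch under that assumption, with invented scores:

```python
from statistics import pvariance

def cronbach_alpha(ratings_by_rater):
    """Cronbach's alpha with raters as items: k/(k-1) * (1 - sum of
    per-rater variances / variance of the per-text score totals)."""
    k = len(ratings_by_rater)
    totals = [sum(text_scores) for text_scores in zip(*ratings_by_rater)]
    rater_var = sum(pvariance(r) for r in ratings_by_rater)
    return k / (k - 1) * (1 - rater_var / pvariance(totals))

# Hypothetical total scores from three raters over five texts.
raters = [
    [85, 70, 55, 90, 40],
    [80, 75, 50, 95, 45],
    [90, 65, 60, 85, 35],
]
print(round(cronbach_alpha(raters), 3))  # 0.977: high agreement
```

Values near 1 indicate that raters rank the texts consistently, which is the pattern reported here for the Spanish and Chinese groups but not for the Russian group.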
Notes
The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions, and revision of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals, and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.
1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.
2. www.sae.org; www.lisa.org/products/qamodel
3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.
4. Note the reference to reader response within a functionalist framework.
5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected who had not participated in the training and rating sessions of the previous experiment. Given the low number, the researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).
References
Bell, Roger T. 1991. Translation and Translating. London: Longman.
Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation". Meta 46:2. 345–364.
Cao, Deborah. 1996. "A Model of Translation Proficiency". Target 8:2. 325–340.
Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations". Mechanical Translation 9:3–4. 55–66.
Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.
Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach". The Translator 14:1. 97–134.
Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation". Meta 46:2. 227–242.
Hatim, Basil, and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.
Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment". Current Issues in Language and Society 4:1. 6–34.
House, Juliane. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.
House, Juliane. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation". Meta 46:2. 243–257.
Lauscher, Susanne. 2000. "Translation Quality Assessment: Where Can Theory and Practice Meet?" The Translator 6:2. 149–168.
Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.
Nida, Eugene. 1964. Toward a Science of Translating. Leiden: Brill.
Nida, Eugene, and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.
Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.
PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'". John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.
Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hueber.
Reiss, Katharina, and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translationstheorie. Tübingen: Niemeyer.
Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function". Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.
Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment". Meta 46:2. 326–344.
Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centred Approach. Ottawa: University of Ottawa Press.
Résumé

Colina (2008) proposes a componential and functionalist approach to the evaluation of translation quality and reports on the results of a pilot test of a tool designed for that approach. The results show a high degree of inter-rater reliability and justify continued testing. This article presents an experiment designed to test the approach as well as the tool. Data were collected during two rounds of testing. A group of 30 raters, made up of Spanish, Chinese, and Russian translators and teachers, evaluated 4 or 5 translated texts. The results show that the tool achieves good inter-rater reliability for all language groups and texts, with the exception of Russian; they also suggest that the low reliability of the Russian raters' scores is unrelated to the tool itself. These findings confirm those of Colina (2008).

Keywords: quality, testing, evaluation, rating, componential, functionalism, errors
Appendix 1 Tool
Benchmark Rating Session
Time Rating Starts: __________    Time Rating Ends: __________
Translation Quality Assessment – Cover Sheet for Health Education Materials
PART I: To be completed by Requester
(The Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text.)

Requester:
Title/Department:                          Delivery Date:
TRANSLATION BRIEF
Source Language:                           Target Language: Spanish / Russian / Chinese
Text Type:
Text Title:
Target Audience:
Purpose of Document:
PRIORITY OF QUALITY CRITERIA
Rank EACH from 1 to 4 (1 being top priority):
____ Target Language
____ Functional and Textual Adequacy
____ Non-Specialized Content (Meaning)
____ Specialized Content and Terminology
PART II: To be completed by TQA Rater

Rater (Name):                              Date Completed:
Contact Information:                       Date Received:
Total Score:                               Total Rating Time:
ASSESSMENT SUMMARY AND RECOMMENDATION
(To be completed after evaluating the translated text)
Publish andor use as is
Minor edits needed before publishing
Major revision needed before publishing
Redo translation
Translation will not be an effective communication strategy for this text. Explore other options (e.g., create new target language materials).

Notes/Recommended Edits:
RATING INSTRUCTIONS
1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.
2. Check the description that best fits the text in each one of the categories.
3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.
4. Examples or comments are not required, but they can be useful to help support your decisions or to provide a rationale for your descriptor selection.
1 TARGET LANGUAGE (check one box)

1a. The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that the text cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b. The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c. Although the target text is generally readable, there are problems and awkward expressions, resulting in most cases from unnecessary transfer from the source text.

1d. The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience, and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments:
2 FUNCTIONAL AND TEXTUAL ADEQUACY (check one box)

2a. Disregard for the goals, purpose, function, and audience of the text. The text was translated without considering textual units, textual purpose, genre, or the needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b. The translated text gives some consideration to the intended purpose and audience for the translation but misses some important aspects of it (e.g., level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c. The translated text approximates the goals, purpose (function), and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d. The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to the cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments:
3 NON-SPECIALIZED CONTENT (MEANING) (check one box)

3a. The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b. There have been some changes in meaning, omissions, and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c. Minor alterations in meaning, additions, or omissions.

3d. The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions, or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:
4 SPECIALIZED CONTENT AND TERMINOLOGY (check one box)

4a. Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b. Serious/frequent mistakes involving terminology and/or specialized content.

4c. A few terminological errors, but the specialized content is not seriously affected.

4d. Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:

TOTAL SCORE:
SCORING WORKSHEET
Component: Target Language
1a = 5, 1b = 15, 1c = 25, 1d = 30

Component: Functional and Textual Adequacy
2a = 5, 2b = 10, 2c = 20, 2d = 25

Component: Non-Specialized Content
3a = 5, 3b = 10, 3c = 20, 3d = 25

Component: Specialized Content and Terminology
4a = 5, 4b = 10, 4c = 15, 4d = 20
Tally Sheet

Component                              Category Rating    Score Value
Target Language                        ____               ____
Functional and Textual Adequacy        ____               ____
Non-Specialized Content                ____               ____
Specialized Content and Terminology    ____               ____
Total Score                                               ____
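The arithmetic behind the scoring worksheet and tally sheet is simple enough to express directly. A sketch, with the category values transcribed from the worksheet above (the function name and data layout are my own):

```python
# Category-to-value map transcribed from the scoring worksheet above.
VALUES = {
    "Target Language": {"1a": 5, "1b": 15, "1c": 25, "1d": 30},
    "Functional and Textual Adequacy": {"2a": 5, "2b": 10, "2c": 20, "2d": 25},
    "Non-Specialized Content": {"3a": 5, "3b": 10, "3c": 20, "3d": 25},
    "Specialized Content and Terminology": {"4a": 5, "4b": 10, "4c": 15, "4d": 20},
}

def total_score(ratings):
    """Sum the point value of the one category checked per component."""
    return sum(VALUES[component][category] for component, category in ratings.items())

# A rater who checks the top descriptor in every component reaches the
# 100-point maximum; checking the lowest everywhere yields the 20-point floor.
best = {component: max(vals, key=vals.get) for component, vals in VALUES.items()}
print(total_score(best))  # 100
```

Note that the four maxima differ (30, 25, 25, 20), so the components are not equally weighted in the total: target-language quality contributes the most, specialized terminology the least.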
Appendix 2 Text sample
Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu
252 Sonia Colina
0
10
20
30
40
50
60
70
80
210 214 215 228 235
Time for Spanish raters
TranslatorsTeachers
Figure 4 Time for Spanish raters
0
20
40
60
80
100
120
410 413 415 418
Mean Score for Chinese Raters
TranslatorsTeachers
Figure 5 Mean scores for Chinese raters
Further evidence for a functionalist approach to translation quality evaluation 253
0
10
20
30
40
50
60
70
80
410 413 415 418
Time for Chinese Raters
TranslatorsTeachers
Figure 6 Time for Chinese raters
0
10
20
30
40
50
60
70
80
90
100
312 314 315 316
Mean scores for Russian Raters
TranslatorsTeachers
Figure 7 Mean scores for Russian raters
254 Sonia Colina
between the recommendation and the total score ranging from minus0867 (CH01BJ) to a perfect 100 (CH02JG) The results are different for the Russian raters however It appears that three raters (RS01EM RS02MK and RS01NM) do not correlate highly between their recommendations and their total scores A closer look espe-cially at these raters is warranted as is a closer look at RS02LB who was excluded from the correlation analysis due to a lack of variability (the rater uniformly recom-mended a lsquo2rsquo for all texts regardless of the total score he or she assigned) The other Russian raters exhibited strong correlations This result suggests some unusual be-havior in the Russian raters independently of the tool design and tool features as the scores and overall recommendation do not correlate highly as expected
0
10
20
30
40
50
60
312 314 315 316
Time for Russian Raters
TranslatorsTeachers
Figure 8 Time for Russian raters
Table 8 (3 sub-tables) Correlation between recommendation and total score81 Spanish raters
SP04AR SP01JC SP01VS SP02JA SP02LA SP02PB SP02AB SP01PC SP01CC SP02MC SP01PS
minus0923 minus0958 minus0854 minus0938 minus0966 minus0421 minus0942 minus0975 minus0913 minus0981 minus0938
82 Chinese raters
CH01RL CH04YY CH01AX CH02AC CH02JG CH01KG CH02AH CH01BJ CH01CK CH01FL
minus0935 minus0980 minus0996 minus0894 minus1000 minus0955 minus0980 minus0867 minus0943 minus0926
83 Russian raters
RS01EG RS01EM RS04GN RS02NB RS02LB RS02MK RS01SM RS01NM RS01RW
minus0998 minus0115 minus0933 minus1000 na minus0500 minus0982 minus0500 minus0993
Further evidence for a functionalist approach to translation quality evaluation 255
3 Conclusions
As in Colina (2008) testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts with the exception of Russian It was also shown that the low reliability of the Russian ratersrsquo scores is probably due to factors unrelated to the tool itself At this point it is not possible to determine what these factors may have been yet further research with Russian teachers and translators may provide insights about the reasons for the low inter-rater reliability obtained for this group in the current study In addition the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers Although translators and teachers exhibit similar behavior teachers tend to spend more time rating and their scores are slightly higher than those of trans-lators While in principle it may appear that translators would be more efficient raters one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task Because they spent more time rating (and one as-sumes reflecting on their rating) teachers may be more apt evaluators in a forma-tive context where feedback is expected from the rater Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer more adequate evaluation of a process andor a translator (versus evalu-ation of a product) However when rating involves a product and no feedback is expected (eg industry translator licensing exams etc) a more efficient translator rater may be more suitable to the task In sum the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool Which of the two types of profes-sionals is more adequate for a specific rating task probably will depend on the purpose and goal of evaluation Further research comparing the skills of these two groups in different 
evaluation contexts is necessary to confirm this view
In summary the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research Future research needs to focus on testing on a larger scale with more subjects and various text types
Notes
The research described here was funded by the Robert Wood Johnson Foundation It was part of the Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program I would like to express my gratitude to the Foundation to the Hablamos Juntos Na-tional Program and to the Program Director Yolanda Partida for their support of translation in the USA I owe much gratitude to Yolanda Partida and Felicia Batts for comments suggestions
256 Sonia Colina
and revision in the write-up of the draft documents and on which this paper draws More details and information on the Translation Quality Assessment project including Technical Reports Manuals and Toolkit Series are available on the Hablamos Juntos website (wwwhablamosjuntosorg) I would also like to thank Volker Hegelheimer for his assistance with the statistics
1 The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act At least 43 states have one or more laws addressing lan-guage access in health care settings
2 wwwsaeorg wwwlisaorgproductsqamodel
3 One exception is that of multilingual text generation in which an original is written to be translated into multiple languages
4 Note the reference to reader response within a functionalist framework
5 Due to rater availability 4 raters (1 Spanish 2 Chinese 1 Russian) were selected that had not participated in the training and rating sessions of the previous experiment Given the low number researchers did not investigate the effect of previous experience (experienced vs inex-perienced raters)
References
Bell Roger T 1991 Translation and Translating London LongmanBowker Lynne 2001 ldquoTowards a Methodology for a Corpus-Based Approach to Translation
Evaluationrdquo Meta 462 345ndash364Cao Deborah 1996 ldquoA Model of Translation Proficiencyrdquo Target 82 325ndash340Carroll John B 1966 ldquoAn Experiment in Evaluating the Quality of Translationsrdquo Mechanical
Translation 93ndash4 55ndash66Colina Sonia 2003 Teaching Translation From Research to the Classroom New York McGraw
HillColina Sonia 2008 ldquoTranslation Quality Evaluation Empirical evidence for a Functionalist
Approachrdquo The Translator 141 97ndash134Gerzymisch-Arbogast Heidrun 2001 ldquoEquivalence Parameters and Evaluationrdquo Meta 462
227ndash242Hatim Basil and Ian Mason 1997 The Translator as Communicator London and New York
RoutledgeHoumlnig Hans 1997 ldquoPositions Power and Practice Functionalist Approaches and Translation
Quality Assessmentrdquo Current issues in language and society 41 6ndash34House Julianne 1997 Translation Quality Assessment A Model Revisited Tuumlbingen NarrHouse Julianne 2001 ldquoTranslation Quality Assessment Linguistic Description versus Social
Evaluationrdquo Meta 462 243ndash257Lauscher S 2000 ldquoTranslation Quality-Assessment Where Can Theory and Practice Meetrdquo
The Translator 62 149ndash168Neubert Albrecht 1985 Text und Translation Leipzig EnzyklopaumldieNida Eugene 1964 Toward a Science of Translation Leiden BrillNida Eugene and Charles Taber 1969 The Theory and Practice of Translation Leiden Brill
Further evidence for a functionalist approach to translation quality evaluation 257
Nord Christianne 1997 Translating as a Purposeful Activity Functionalist Approaches Ex-plained Manchester St Jerome
PACTE 2008 ldquoFirst Results of a Translation Competence Experiment lsquoKnowledge of Transla-tionrsquo and lsquoEfficacy of the Translation Processrdquo John Kearns ed Translator and Interpreter Training Issues Methods and Debates London and New York Continuum 2008 104ndash126
Reiss Katharina 1971 Moumlglichkeiten und Grenzen der uumlbersetungskritik Muumlnchen HuumlberReiss Katharina and Vermeer Hans 1984 Grundlegung einer allgemeinen Translations-Theorie
Tuumlbingen NiemayerVan den Broeck Raymond 1985 ldquoSecond Thoughts on Translation Criticism A Model of its
Analytic Functionrdquo Theo Hermans ed The Manipulation of Literature Studies in Literary Translation London and Sydney Croom Helm 1985 54ndash62
Williams Malcolm 2001 ldquoThe Application of Argumentation Theory to Translation Quality Assessmentrdquo Meta 462 326ndash344
Williams Malcolm 2004 Translation Quality Assessment An Argumentation-Centered Ap-proach Ottawa University of Ottawa Press
Reacutesumeacute
Colina (2008) propose une approche componentielle et fonctionnelle de lrsquoeacutevaluation de la qua-liteacute des traductions et dresse un rapport sur les reacutesultats drsquoun test-pilote portant sur un outil conccedilu pour cette approche Les reacutesultats attestent un taux eacuteleveacute de fiabiliteacute entre eacutevaluateurs et justifient la continuation des tests Cet article preacutesente une expeacuterimentation destineacutee agrave tester lrsquoapproche ainsi que lrsquooutil Des donneacutees ont eacuteteacute collecteacutees pendant deux peacuteriodes de tests Un groupe de 30 eacutevaluateurs composeacute de traducteurs et enseignants espagnols chinois et russes ont eacutevalueacute 4 ou 5 textes traduits Les reacutesultats montrent que lrsquooutil assure un bon taux de fiabiliteacute entre eacutevaluateurs pour tous les groupes de langues et de textes agrave lrsquoexception du russe ils suggegrave-rent eacutegalement que le faible taux de fiabiliteacute des scores obtenus par les eacutevaluateurs russes est sans rapport avec lrsquooutil lui-mecircme Ces constats confirment ceux de Colina (2008)
Mots-clefs Mots-cleacutes qualiteacute test eacutevaluation notation componentiel fonctionnalisme erreurs
258 Sonia Colina
Appendix 1 Tool
Benchmark Rating Session
T i m e R a t i n g S t a r t s T i m e R a t i n g E n d s
Translation Quality Assessment ndash Cover Sheet For Health Education Materials
PART I To be completed by Requester
Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text
Requester
TitleDepartment Delivery Date
T R A N S L A T I O N B R I E F
Source Language Target Language
Spanish Russian Chinese
Text Type
Text Title
Target Audience
Purpose of Document
P R I O R I T Y O F Q U A L I T Y C R I T E R I A
____ Target Language
____ Functional and Textual Adequacy
____ Non-Specialized Content (Meaning)
Rank EACH from 1 to 4
(1 being top priority)
____ Specialized Content and Terminology
PART II To be completed by TQA Rater
Rater (Name) Date Completed
Contact Information Date Received
Total Score Total Rating Time
A S S E S S M E N T S U M M A R Y A N D R E C O M M E N D A T I O N
Publish andor use as is
Minor edits needed before publishing
Major revision needed before publishing
Redo translation
(To be completed after evaluating translated text)
Translation will not be an effective communication strategy for this text Explore other options (eg create new target language materials)
NotesRecommended Edits
Further evidence for a functionalist approach to translation quality evaluation 259
- 2 -
RATING INSTRUCTIONS
1 Carefully read the instructions for the review of the translated text Your decisions and evaluation should be based on these instructions only
2 Check the description that best fits the text given in each one of the categories
3 It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories
4 Examples or comments are not required but they can be useful to help support your decisions or to provide rationale for your descriptor selection
1 TARGET LANGUAGE
Category Number
Description Check one
box
1a
The translation reveals serious language proficiency issues Ungrammatical use of the target language spelling mistakes The translation is written in some sort of lsquothird languagersquo (neither the source nor the target) The
structure of source language dominates to the extent that it cannot be considered a sample of target language text The amount of transfer from the source cannot be justified by the purpose of the translation The text is
extremely difficult to read bordering on being incomprehensible
1b The text contains some unnecessary transfer of elementsstructure from the source text The structure of the
source language shows up in the translation and affects its readability The text is hard to comprehend
1c Although the target text is generally readable there are problems and awkward expressions resulting in most cases from unnecessary transfer from the source text
1d
The translated text reads similarly to texts originally written in the target language that respond to the same purpose audience and text type as those specified for the translation in the brief Problemsawkward
expressions are minimal if existent at all
ExamplesComments
2 FUNCTIONAL AND TEXTUAL ADEQUACY
Category
Number Description
Check one
box
2a Disregard for the goals purpose function and audience of the text The text was translated without considering
textual units textual purpose genre need of the audience (cultural linguistic etc) Can not be repaired with revisions
2b The translated text gives some consideration to the intended purpose and audience for the translation but misses some important aspects of it (eg level of formality some aspect of its function needs of the audience
cultural considerations etc) Repair requires effort
2c The translated text approximates to the goals purpose (function) and needs of the intended audience but it is
not as efficient as it could be given the restrictions and instructions for the translation Can be repaired with suggested edits
2d The translated text accurately accomplishes the goals purpose (function informative expressive persuasive) set for the translation and intended audience (including level of formality) It also attends to cultural needs and
characteristics of the audience Minor or no edits needed
ExamplesComments
260 Sonia Colina
3. NON-SPECIALIZED CONTENT (MEANING) (check one box)

3a. The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b. There have been some changes in meaning, omissions, and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c. Minor alterations in meaning, additions, or omissions.

3d. The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions, or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:

4. SPECIALIZED CONTENT AND TERMINOLOGY (check one box)

4a. Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b. Serious/frequent mistakes involving terminology and/or specialized content.

4c. A few terminological errors, but the specialized content is not seriously affected.

4d. Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:

TOTAL SCORE:
Further evidence for a functionalist approach to translation quality evaluation 261
SCORING WORKSHEET

Component: Target Language
Category/Value: 1a = 5, 1b = 15, 1c = 25, 1d = 30

Component: Functional and Textual Adequacy
Category/Value: 2a = 5, 2b = 10, 2c = 20, 2d = 25

Component: Non-Specialized Content
Category/Value: 3a = 5, 3b = 10, 3c = 20, 3d = 25

Component: Specialized Content and Terminology
Category/Value: 4a = 5, 4b = 10, 4c = 15, 4d = 20

Tally Sheet

Component / Category Rating / Score Value
Target Language: ____
Functional and Textual Adequacy: ____
Non-Specialized Content: ____
Specialized Content and Terminology: ____
Total Score: ____
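As a quick illustration of the worksheet's arithmetic, the category-to-value mapping can be sketched in code (a hypothetical helper for illustration only, not part of the published tool). Selecting the top descriptor in every component yields the maximum total score of 100:

```python
# Point values taken from the scoring worksheet above.
# One category (a-d) is checked per component; the total score is their sum.
VALUES = {
    # Component 1: Target Language
    "1a": 5, "1b": 15, "1c": 25, "1d": 30,
    # Component 2: Functional and Textual Adequacy
    "2a": 5, "2b": 10, "2c": 20, "2d": 25,
    # Component 3: Non-Specialized Content
    "3a": 5, "3b": 10, "3c": 20, "3d": 25,
    # Component 4: Specialized Content and Terminology
    "4a": 5, "4b": 10, "4c": 15, "4d": 20,
}

def total_score(ratings):
    """Sum worksheet values for exactly one rating per component,
    e.g. ['1c', '2d', '3d', '4c']."""
    components = {r[0] for r in ratings}
    if components != {"1", "2", "3", "4"} or len(ratings) != 4:
        raise ValueError("exactly one rating per component (1-4) is required")
    return sum(VALUES[r] for r in ratings)

print(total_score(["1d", "2d", "3d", "4d"]))  # 100 (top descriptor in every component)
print(total_score(["1b", "2b", "3b", "4b"]))  # 45
```

Note how the weighting built into the worksheet makes Target Language the heaviest component (maximum 30 points) and Specialized Content and Terminology the lightest (maximum 20 points).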
Appendix 2. Text sample

[The sample texts appear as images in the original.]
Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu
cultural considerations etc) Repair requires effort
2c The translated text approximates to the goals purpose (function) and needs of the intended audience but it is
not as efficient as it could be given the restrictions and instructions for the translation Can be repaired with suggested edits
2d The translated text accurately accomplishes the goals purpose (function informative expressive persuasive) set for the translation and intended audience (including level of formality) It also attends to cultural needs and
characteristics of the audience Minor or no edits needed
ExamplesComments
260 Sonia Colina
- 3 -
3 NON-SPECIALIZED CONTENT-MEANING
Category Number
Description Check one
box
3a The translation reflects or contains important unwarranted deviations from the original It contains inaccurate renditions andor important omissions and additions that cannot be justified by the instructions Very defective
comprehension of the original text
3b There have been some changes in meaning omissions orand additions that cannot be justified by the translation instructions Translation shows some misunderstanding of original andor translation instructions
3c Minor alterations in meaning additions or omissions
3d The translation accurately reflects the content contained in the original insofar as it is required by the
instructions without unwarranted alterations omissions or additions Slight nuances and shades of meaning have been rendered adequately
ExamplesComments
4 SPECIALIZED CONTENT AND TERMINOLOGY
Category
Number Description
Check one
box
4a Reveals unawarenessignorance of special terminology andor insufficient knowledge of specialized content
4b Seriousfrequent mistakes involving terminology andor specialized content
4c A few terminological errors but the specialized content is not seriously affected
4d Accurate and appropriate rendition of the terminology It reflects a good command of terms and content specific
to the subject
ExamplesComments
TOTAL SCORE
- 3 -
3 NON-SPECIALIZED CONTENT-MEANING
Category Number
Description Check one
box
3a The translation reflects or contains important unwarranted deviations from the original It contains inaccurate renditions andor important omissions and additions that cannot be justified by the instructions Very defective
comprehension of the original text
3b There have been some changes in meaning omissions orand additions that cannot be justified by the translation instructions Translation shows some misunderstanding of original andor translation instructions
3c Minor alterations in meaning additions or omissions
3d The translation accurately reflects the content contained in the original insofar as it is required by the
instructions without unwarranted alterations omissions or additions Slight nuances and shades of meaning have been rendered adequately
ExamplesComments
4 SPECIALIZED CONTENT AND TERMINOLOGY
Category
Number Description
Check one
box
4a Reveals unawarenessignorance of special terminology andor insufficient knowledge of specialized content
4b Seriousfrequent mistakes involving terminology andor specialized content
4c A few terminological errors but the specialized content is not seriously affected
4d Accurate and appropriate rendition of the terminology It reflects a good command of terms and content specific
to the subject
ExamplesComments
TOTAL SCORE
Further evidence for a functionalist approach to translation quality evaluation 261
- 4 -
S C O R I N G W O R K S H E E T
Component Target Language Component Functional and Textual Adequacy
Category Value Score Category Value Score
1a 5 2a 5 1b 15 2b 10 1c 25 2c 20 1d 30
2d 25
Component Non-Specialized Content Component Specialized Content and
Terminology
Category Value Score Category Value Score
3a 5 4a 5 3b 10 4b 10 3c 20 4c 15 3d 25
4d 20
Tally Sheet
Component Category
Rating Score Value
Target Language
Functional and Textual Adequacy
Non-Specialized Content
Specialized Content and Terminology
Total Score
262 Sonia Colina
Appendix 2 Text sample
bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull
Further evidence for a functionalist approach to translation quality evaluation 263
bull bull
264 Sonia Colina
Authorrsquos address
Sonia ColinaDepartment of Spanish and PortugueseThe University of ArizonaModern Languages 545Tucson AZ 85721-0067United States of America
scolinaemailarizonaedu
3 Conclusions
As in Colina (2008), testing showed that the TQA tool exhibits good inter-rater reliability for all language groups and texts, with the exception of Russian. It was also shown that the low reliability of the Russian raters' scores is probably due to factors unrelated to the tool itself. At this point it is not possible to determine what these factors may have been, yet further research with Russian teachers and translators may provide insights into the reasons for the low inter-rater reliability obtained for this group in the current study. In addition, the findings are in line with those of Colina (2008) with regard to the rating behavior of translators and teachers. Although translators and teachers exhibit similar behavior, teachers tend to spend more time rating, and their scores are slightly higher than those of translators. While in principle it may appear that translators would be more efficient raters, one would have to consider the context of evaluation to select an ideal rater for a particular evaluation task. Because they spent more time rating (and, one assumes, reflecting on their rating), teachers may be more apt evaluators in a formative context where feedback is expected from the rater. Teachers may also be better at reflecting on the nature of the developmental process and therefore better able to offer a more adequate evaluation of a process and/or a translator (versus evaluation of a product). However, when rating involves a product and no feedback is expected (e.g., industry, translator licensing exams, etc.), a more efficient translator rater may be more suitable to the task. In sum, the current findings suggest that professional translators and language teachers could be similarly qualified to assess translation quality by means of the TQA tool. Which of the two types of professionals is more adequate for a specific rating task will probably depend on the purpose and goal of the evaluation. Further research comparing the skills of these two groups in different evaluation contexts is necessary to confirm this view.
In summary, the results of empirical tests of the functional-componential tool continue to offer evidence for the proposed approach and to warrant additional testing and research. Future research needs to focus on testing on a larger scale, with more subjects and various text types.
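Inter-rater reliability is the key test of the tool throughout, though the specific statistic used in the study is not named in this section. Purely as an illustration — with invented example data, a hypothetical helper name, and Cronbach's alpha standing in as one common measure of inter-rater consistency — such a reliability figure can be computed as follows:

```python
# Illustrative sketch only: Cronbach's alpha as one common inter-rater
# consistency statistic. The study's actual statistic is not named in this
# section; the scores below are invented example data, not study data.
from statistics import pvariance


def cronbach_alpha(scores):
    """scores: list of rows (one per rated text); columns are raters."""
    k = len(scores[0])                                   # number of raters
    col_vars = [pvariance(col) for col in zip(*scores)]  # per-rater variance
    total_var = pvariance([sum(row) for row in scores])  # variance of row sums
    return (k / (k - 1)) * (1 - sum(col_vars) / total_var)


# Two raters in perfect agreement over three texts -> alpha = 1.0
print(cronbach_alpha([[80, 80], [60, 60], [90, 90]]))
```

With perfect agreement the statistic reaches 1.0; lower values reflect the kind of disagreement reported here for the Russian raters.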
Notes
The research described here was funded by the Robert Wood Johnson Foundation. It was part of Phase II of the Translation Quality Assessment project of the Hablamos Juntos National Program. I would like to express my gratitude to the Foundation, to the Hablamos Juntos National Program, and to the Program Director, Yolanda Partida, for their support of translation in the USA. I owe much gratitude to Yolanda Partida and Felicia Batts for comments, suggestions, and revision in the write-up of the draft documents on which this paper draws. More details and information on the Translation Quality Assessment project, including Technical Reports, Manuals, and Toolkit Series, are available on the Hablamos Juntos website (www.hablamosjuntos.org). I would also like to thank Volker Hegelheimer for his assistance with the statistics.
1. The legal basis for most language access legislation in the United States of America lies in Title VI of the 1964 Civil Rights Act. At least 43 states have one or more laws addressing language access in health care settings.
2. www.sae.org; www.lisa.org/products/qamodel
3. One exception is that of multilingual text generation, in which an original is written to be translated into multiple languages.
4. Note the reference to reader response within a functionalist framework.
5. Due to rater availability, 4 raters (1 Spanish, 2 Chinese, 1 Russian) were selected who had not participated in the training and rating sessions of the previous experiment. Given the low number, researchers did not investigate the effect of previous experience (experienced vs. inexperienced raters).
References
Bell, Roger T. 1991. Translation and Translating. London: Longman.
Bowker, Lynne. 2001. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation". Meta 46:2. 345–364.
Cao, Deborah. 1996. "A Model of Translation Proficiency". Target 8:2. 325–340.
Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations". Mechanical Translation 9:3–4. 55–66.
Colina, Sonia. 2003. Teaching Translation: From Research to the Classroom. New York: McGraw Hill.
Colina, Sonia. 2008. "Translation Quality Evaluation: Empirical Evidence for a Functionalist Approach". The Translator 14:1. 97–134.
Gerzymisch-Arbogast, Heidrun. 2001. "Equivalence Parameters and Evaluation". Meta 46:2. 227–242.
Hatim, Basil and Ian Mason. 1997. The Translator as Communicator. London and New York: Routledge.
Hönig, Hans. 1997. "Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment". Current Issues in Language and Society 4:1. 6–34.
House, Julianne. 1997. Translation Quality Assessment: A Model Revisited. Tübingen: Narr.
House, Julianne. 2001. "Translation Quality Assessment: Linguistic Description versus Social Evaluation". Meta 46:2. 243–257.
Lauscher, S. 2000. "Translation Quality Assessment: Where Can Theory and Practice Meet?" The Translator 6:2. 149–168.
Neubert, Albrecht. 1985. Text und Translation. Leipzig: Enzyklopädie.
Nida, Eugene. 1964. Toward a Science of Translating. Leiden: Brill.
Nida, Eugene and Charles Taber. 1969. The Theory and Practice of Translation. Leiden: Brill.
Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.
PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'". John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.
Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hueber.
Reiss, Katharina and Hans Vermeer. 1984. Grundlegung einer allgemeinen Translationstheorie. Tübingen: Niemeyer.
Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function". Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.
Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment". Meta 46:2. 326–344.
Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centred Approach. Ottawa: University of Ottawa Press.
Résumé

Colina (2008) proposes a componential-functionalist approach to translation quality evaluation and reports the results of a pilot test of a tool designed according to that approach. The results show good inter-rater reliability and justify further testing. This article presents an experiment designed to test both the approach and the tool. Data were collected during two rounds of testing. A group of 30 raters, consisting of Spanish, Chinese, and Russian translators and teachers, evaluated 4 or 5 translated texts. The results show that the tool achieves good inter-rater reliability for all language groups and texts, with the exception of Russian; they also suggest that the low reliability of the Russian raters' scores is unrelated to the tool itself. These findings confirm those of Colina (2008).

Keywords: quality, assessment, evaluation, rating, componential, functionalism, errors
Appendix 1 Tool
Benchmark Rating Session
Time Rating Starts: ________    Time Rating Ends: ________

Translation Quality Assessment – Cover Sheet for Health Education Materials

PART I. To be completed by the Requester
The Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text.

Requester: ________
Title/Department: ________    Delivery Date: ________

TRANSLATION BRIEF
Source Language: ________    Target Language: [ ] Spanish  [ ] Russian  [ ] Chinese
Text Type: ________
Text Title: ________
Target Audience: ________
Purpose of Document: ________

PRIORITY OF QUALITY CRITERIA
Rank EACH from 1 to 4 (1 being top priority):
____ Target Language
____ Functional and Textual Adequacy
____ Non-Specialized Content (Meaning)
____ Specialized Content and Terminology

PART II. To be completed by the TQA Rater
Rater (Name): ________    Date Completed: ________
Contact Information: ________    Date Received: ________
Total Score: ________    Total Rating Time: ________

ASSESSMENT SUMMARY AND RECOMMENDATION
(To be completed after evaluating the translated text.)
[ ] Publish and/or use as is
[ ] Minor edits needed before publishing
[ ] Major revision needed before publishing
[ ] Redo translation
[ ] Translation will not be an effective communication strategy for this text. Explore other options (e.g., create new target-language materials).

Notes/Recommended Edits: ________
RATING INSTRUCTIONS
1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.
2. Check the description that best fits the text in each one of the categories.
3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.
4. Examples or comments are not required, but they can be useful to help support your decisions or to provide a rationale for your descriptor selection.

1. TARGET LANGUAGE (check one box)

1a. The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that the text cannot be considered a sample of target-language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

1b. The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

1c. Although the target text is generally readable, there are problems and awkward expressions, resulting in most cases from unnecessary transfer from the source text.

1d. The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience, and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments: ________

2. FUNCTIONAL AND TEXTUAL ADEQUACY (check one box)

2a. Disregard for the goals, purpose, function, and audience of the text. The text was translated without considering textual units, textual purpose, genre, or the needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

2b. The translated text gives some consideration to the intended purpose and audience for the translation but misses some important aspects of it (e.g., level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

2c. The translated text approximates the goals, purpose (function), and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

2d. The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to the cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments: ________
3. NON-SPECIALIZED CONTENT (MEANING) (check one box)

3a. The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

3b. There have been some changes in meaning, omissions, and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

3c. Minor alterations in meaning, additions, or omissions.

3d. The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions, or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments: ________

4. SPECIALIZED CONTENT AND TERMINOLOGY (check one box)

4a. Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

4b. Serious/frequent mistakes involving terminology and/or specialized content.

4c. A few terminological errors, but the specialized content is not seriously affected.

4d. Accurate and appropriate rendition of the terminology. Reflects a good command of terms and content specific to the subject.

Examples/Comments: ________

TOTAL SCORE: ________
SCORING WORKSHEET

Component: Target Language
  1a = 5    1b = 15    1c = 25    1d = 30

Component: Functional and Textual Adequacy
  2a = 5    2b = 10    2c = 20    2d = 25

Component: Non-Specialized Content
  3a = 5    3b = 10    3c = 20    3d = 25

Component: Specialized Content and Terminology
  4a = 5    4b = 10    4c = 15    4d = 20

Tally Sheet

Component                               Category Rating    Score Value
Target Language                         ________           ________
Functional and Textual Adequacy         ________           ________
Non-Specialized Content                 ________           ________
Specialized Content and Terminology     ________           ________
Total Score                                                ________
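As a check on the worksheet's arithmetic, the mapping from checked categories to the total score can be sketched in code. The category values below are transcribed from the Scoring Worksheet; the helper function and its interface are a hypothetical illustration, not part of the published tool. Note that checking the best descriptor in every component yields the maximum total of 100.

```python
# Category values transcribed from the Scoring Worksheet; the helper
# function itself is a hypothetical sketch, not part of the published tool.
CATEGORY_VALUES = {
    # Target Language
    "1a": 5, "1b": 15, "1c": 25, "1d": 30,
    # Functional and Textual Adequacy
    "2a": 5, "2b": 10, "2c": 20, "2d": 25,
    # Non-Specialized Content
    "3a": 5, "3b": 10, "3c": 20, "3d": 25,
    # Specialized Content and Terminology
    "4a": 5, "4b": 10, "4c": 15, "4d": 20,
}


def total_score(checked):
    """Sum the worksheet values for one checked category per component."""
    components = {category[0] for category in checked}
    if len(checked) != 4 or components != {"1", "2", "3", "4"}:
        raise ValueError("check exactly one category per component")
    return sum(CATEGORY_VALUES[category] for category in checked)


print(total_score(["1d", "2d", "3d", "4d"]))  # best descriptors -> 100
print(total_score(["1c", "2b", "3c", "4a"]))  # 25 + 10 + 20 + 5 -> 60
```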
Appendix 2 Text sample
[The text sample appears as an image in the original publication and is not reproduced here.]
Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu
instructions without unwarranted alterations omissions or additions Slight nuances and shades of meaning have been rendered adequately
ExamplesComments
4 SPECIALIZED CONTENT AND TERMINOLOGY
Category
Number Description
Check one
box
4a Reveals unawarenessignorance of special terminology andor insufficient knowledge of specialized content
4b Seriousfrequent mistakes involving terminology andor specialized content
4c A few terminological errors but the specialized content is not seriously affected
4d Accurate and appropriate rendition of the terminology It reflects a good command of terms and content specific
to the subject
ExamplesComments
TOTAL SCORE
Further evidence for a functionalist approach to translation quality evaluation 261
- 4 -
S C O R I N G W O R K S H E E T
Component Target Language Component Functional and Textual Adequacy
Category Value Score Category Value Score
1a 5 2a 5 1b 15 2b 10 1c 25 2c 20 1d 30
2d 25
Component Non-Specialized Content Component Specialized Content and
Terminology
Category Value Score Category Value Score
3a 5 4a 5 3b 10 4b 10 3c 20 4c 15 3d 25
4d 20
Tally Sheet
Component Category
Rating Score Value
Target Language
Functional and Textual Adequacy
Non-Specialized Content
Specialized Content and Terminology
Total Score
262 Sonia Colina
Appendix 2 Text sample
bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull
Further evidence for a functionalist approach to translation quality evaluation 263
bull bull
264 Sonia Colina
Authorrsquos address
Sonia ColinaDepartment of Spanish and PortugueseThe University of ArizonaModern Languages 545Tucson AZ 85721-0067United States of America
scolinaemailarizonaedu
Further evidence for a functionalist approach to translation quality evaluation 257
Nord, Christiane. 1997. Translating as a Purposeful Activity: Functionalist Approaches Explained. Manchester: St. Jerome.
PACTE. 2008. "First Results of a Translation Competence Experiment: 'Knowledge of Translation' and 'Efficacy of the Translation Process'." John Kearns, ed. Translator and Interpreter Training: Issues, Methods and Debates. London and New York: Continuum. 104–126.
Reiss, Katharina. 1971. Möglichkeiten und Grenzen der Übersetzungskritik. München: Hueber.
Reiss, Katharina and Vermeer, Hans. 1984. Grundlegung einer allgemeinen Translationstheorie. Tübingen: Niemeyer.
Van den Broeck, Raymond. 1985. "Second Thoughts on Translation Criticism: A Model of its Analytic Function." Theo Hermans, ed. The Manipulation of Literature: Studies in Literary Translation. London and Sydney: Croom Helm. 54–62.
Williams, Malcolm. 2001. "The Application of Argumentation Theory to Translation Quality Assessment." Meta 46:2. 326–344.
Williams, Malcolm. 2004. Translation Quality Assessment: An Argumentation-Centered Approach. Ottawa: University of Ottawa Press.
Résumé

[Translated from the French.] Colina (2008) proposes a componential and functionalist approach to the evaluation of translation quality and reports the results of a pilot test of a tool designed according to that approach. The results show a high level of inter-rater reliability and justify further testing. This article presents an experiment designed to test the approach and the tool. Data were collected during two rounds of testing. A group of 30 raters, made up of Spanish, Chinese and Russian translators and teachers, evaluated 4 or 5 translated texts. The results show that the tool achieves good inter-rater reliability for all language groups and texts except Russian; they also suggest that the low reliability of the Russian raters' scores is unrelated to the tool itself. These findings confirm those of Colina (2008).

Keywords: quality, testing, evaluation, rating, componential, functionalism, errors
Appendix 1. Tool
Benchmark Rating Session

Time Rating Starts: ____        Time Rating Ends: ____

Translation Quality Assessment – Cover Sheet for Health Education Materials

PART I. To be completed by Requester

The Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text.

Requester:
Title/Department:                       Delivery Date:

TRANSLATION BRIEF

Source Language:                Target Language:  Spanish / Russian / Chinese
Text Type:
Text Title:
Target Audience:
Purpose of Document:

PRIORITY OF QUALITY CRITERIA
Rank EACH from 1 to 4 (1 being top priority):
____ Target Language
____ Functional and Textual Adequacy
____ Non-Specialized Content (Meaning)
____ Specialized Content and Terminology

PART II. To be completed by TQA Rater

Rater (Name):                           Date Completed:
Contact Information:                    Date Received:
Total Score:                            Total Rating Time:

ASSESSMENT SUMMARY AND RECOMMENDATION
(To be completed after evaluating the translated text)

[ ] Publish and/or use as is
[ ] Minor edits needed before publishing
[ ] Major revision needed before publishing
[ ] Redo translation
[ ] Translation will not be an effective communication strategy for this text. Explore other options (e.g., create new target language materials).

Notes/Recommended Edits:
RATING INSTRUCTIONS

1. Carefully read the instructions for the review of the translated text. Your decisions and evaluation should be based on these instructions only.
2. Check the description that best fits the text in each one of the categories.
3. It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories.
4. Examples or comments are not required, but they can be useful to help support your decisions or to provide a rationale for your descriptor selection.
1. TARGET LANGUAGE (check one box)

[ ] 1a. The translation reveals serious language proficiency issues: ungrammatical use of the target language, spelling mistakes. The translation is written in some sort of 'third language' (neither the source nor the target). The structure of the source language dominates to the extent that the text cannot be considered a sample of target language text. The amount of transfer from the source cannot be justified by the purpose of the translation. The text is extremely difficult to read, bordering on being incomprehensible.

[ ] 1b. The text contains some unnecessary transfer of elements/structure from the source text. The structure of the source language shows up in the translation and affects its readability. The text is hard to comprehend.

[ ] 1c. Although the target text is generally readable, there are problems and awkward expressions resulting in most cases from unnecessary transfer from the source text.

[ ] 1d. The translated text reads similarly to texts originally written in the target language that respond to the same purpose, audience and text type as those specified for the translation in the brief. Problems/awkward expressions are minimal, if existent at all.

Examples/Comments:
2. FUNCTIONAL AND TEXTUAL ADEQUACY (check one box)

[ ] 2a. Disregard for the goals, purpose, function and audience of the text. The text was translated without considering textual units, textual purpose, genre, or the needs of the audience (cultural, linguistic, etc.). Cannot be repaired with revisions.

[ ] 2b. The translated text gives some consideration to the intended purpose and audience for the translation but misses some important aspects of it (e.g., level of formality, some aspect of its function, needs of the audience, cultural considerations, etc.). Repair requires effort.

[ ] 2c. The translated text approximates the goals, purpose (function) and needs of the intended audience, but it is not as efficient as it could be, given the restrictions and instructions for the translation. Can be repaired with suggested edits.

[ ] 2d. The translated text accurately accomplishes the goals, purpose (function: informative, expressive, persuasive) set for the translation and intended audience (including level of formality). It also attends to the cultural needs and characteristics of the audience. Minor or no edits needed.

Examples/Comments:
3. NON-SPECIALIZED CONTENT (MEANING) (check one box)

[ ] 3a. The translation reflects or contains important unwarranted deviations from the original. It contains inaccurate renditions and/or important omissions and additions that cannot be justified by the instructions. Very defective comprehension of the original text.

[ ] 3b. There have been some changes in meaning, omissions and/or additions that cannot be justified by the translation instructions. The translation shows some misunderstanding of the original and/or the translation instructions.

[ ] 3c. Minor alterations in meaning, additions or omissions.

[ ] 3d. The translation accurately reflects the content contained in the original, insofar as it is required by the instructions, without unwarranted alterations, omissions or additions. Slight nuances and shades of meaning have been rendered adequately.

Examples/Comments:
4. SPECIALIZED CONTENT AND TERMINOLOGY (check one box)

[ ] 4a. Reveals unawareness/ignorance of special terminology and/or insufficient knowledge of specialized content.

[ ] 4b. Serious/frequent mistakes involving terminology and/or specialized content.

[ ] 4c. A few terminological errors, but the specialized content is not seriously affected.

[ ] 4d. Accurate and appropriate rendition of the terminology. It reflects a good command of terms and content specific to the subject.

Examples/Comments:

TOTAL SCORE:
SCORING WORKSHEET

Component                               Category values
Target Language                         1a = 5     1b = 15    1c = 25    1d = 30
Functional and Textual Adequacy         2a = 5     2b = 10    2c = 20    2d = 25
Non-Specialized Content                 3a = 5     3b = 10    3c = 20    3d = 25
Specialized Content and Terminology     4a = 5     4b = 10    4c = 15    4d = 20

Tally Sheet

Component                               Category Rating     Score Value
Target Language                         ____                ____
Functional and Textual Adequacy         ____                ____
Non-Specialized Content                 ____                ____
Specialized Content and Terminology     ____                ____
Total Score                                                 ____
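The worksheet's arithmetic — one category checked per component, each mapped to a point value, summed into the total score — can be sketched as follows. This is a reading aid, not part of the original tool; the dictionary and function names are illustrative, while the point values are transcribed from the worksheet above.

```python
# Point values per category, transcribed from the scoring worksheet.
COMPONENT_VALUES = {
    "Target Language":                     {"1a": 5, "1b": 15, "1c": 25, "1d": 30},
    "Functional and Textual Adequacy":     {"2a": 5, "2b": 10, "2c": 20, "2d": 25},
    "Non-Specialized Content":             {"3a": 5, "3b": 10, "3c": 20, "3d": 25},
    "Specialized Content and Terminology": {"4a": 5, "4b": 10, "4c": 15, "4d": 20},
}

def total_score(ratings):
    """Sum the value of the single category checked for each component."""
    return sum(COMPONENT_VALUES[component][category]
               for component, category in ratings.items())

# A translation rated at the top descriptor in every component:
best = {"Target Language": "1d",
        "Functional and Textual Adequacy": "2d",
        "Non-Specialized Content": "3d",
        "Specialized Content and Terminology": "4d"}
print(total_score(best))  # 100 (30 + 25 + 25 + 20)
```

Note the deliberately uneven weighting: Target Language tops out at 30 points while Specialized Content and Terminology tops out at 20, so the four components contribute to the 100-point total in the priority order the tool assigns them.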
Appendix 2. Text sample
[The sample texts appear as images in the original and are not recoverable in this text version.]
Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu
258 Sonia Colina
Appendix 1 Tool
Benchmark Rating Session
T i m e R a t i n g S t a r t s T i m e R a t i n g E n d s
Translation Quality Assessment ndash Cover Sheet For Health Education Materials
PART I To be completed by Requester
Requester is the Health Care Decision Maker (HCDM) requesting a quality assessment of an existing translated text
Requester
TitleDepartment Delivery Date
T R A N S L A T I O N B R I E F
Source Language Target Language
Spanish Russian Chinese
Text Type
Text Title
Target Audience
Purpose of Document
P R I O R I T Y O F Q U A L I T Y C R I T E R I A
____ Target Language
____ Functional and Textual Adequacy
____ Non-Specialized Content (Meaning)
Rank EACH from 1 to 4
(1 being top priority)
____ Specialized Content and Terminology
PART II To be completed by TQA Rater
Rater (Name) Date Completed
Contact Information Date Received
Total Score Total Rating Time
A S S E S S M E N T S U M M A R Y A N D R E C O M M E N D A T I O N
Publish andor use as is
Minor edits needed before publishing
Major revision needed before publishing
Redo translation
(To be completed after evaluating translated text)
Translation will not be an effective communication strategy for this text Explore other options (eg create new target language materials)
NotesRecommended Edits
Further evidence for a functionalist approach to translation quality evaluation 259
- 2 -
RATING INSTRUCTIONS
1 Carefully read the instructions for the review of the translated text Your decisions and evaluation should be based on these instructions only
2 Check the description that best fits the text given in each one of the categories
3 It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories
4 Examples or comments are not required but they can be useful to help support your decisions or to provide rationale for your descriptor selection
1 TARGET LANGUAGE
Category Number
Description Check one
box
1a
The translation reveals serious language proficiency issues Ungrammatical use of the target language spelling mistakes The translation is written in some sort of lsquothird languagersquo (neither the source nor the target) The
structure of source language dominates to the extent that it cannot be considered a sample of target language text The amount of transfer from the source cannot be justified by the purpose of the translation The text is
extremely difficult to read bordering on being incomprehensible
1b The text contains some unnecessary transfer of elementsstructure from the source text The structure of the
source language shows up in the translation and affects its readability The text is hard to comprehend
1c Although the target text is generally readable there are problems and awkward expressions resulting in most cases from unnecessary transfer from the source text
1d
The translated text reads similarly to texts originally written in the target language that respond to the same purpose audience and text type as those specified for the translation in the brief Problemsawkward
expressions are minimal if existent at all
ExamplesComments
2 FUNCTIONAL AND TEXTUAL ADEQUACY
Category
Number Description
Check one
box
2a Disregard for the goals purpose function and audience of the text The text was translated without considering
textual units textual purpose genre need of the audience (cultural linguistic etc) Can not be repaired with revisions
2b The translated text gives some consideration to the intended purpose and audience for the translation but misses some important aspects of it (eg level of formality some aspect of its function needs of the audience
cultural considerations etc) Repair requires effort
2c The translated text approximates to the goals purpose (function) and needs of the intended audience but it is
not as efficient as it could be given the restrictions and instructions for the translation Can be repaired with suggested edits
2d The translated text accurately accomplishes the goals purpose (function informative expressive persuasive) set for the translation and intended audience (including level of formality) It also attends to cultural needs and
characteristics of the audience Minor or no edits needed
ExamplesComments
260 Sonia Colina
- 3 -
3 NON-SPECIALIZED CONTENT-MEANING
Category Number
Description Check one
box
3a The translation reflects or contains important unwarranted deviations from the original It contains inaccurate renditions andor important omissions and additions that cannot be justified by the instructions Very defective
comprehension of the original text
3b There have been some changes in meaning omissions orand additions that cannot be justified by the translation instructions Translation shows some misunderstanding of original andor translation instructions
3c Minor alterations in meaning additions or omissions
3d The translation accurately reflects the content contained in the original insofar as it is required by the
instructions without unwarranted alterations omissions or additions Slight nuances and shades of meaning have been rendered adequately
ExamplesComments
4 SPECIALIZED CONTENT AND TERMINOLOGY
Category
Number Description
Check one
box
4a Reveals unawarenessignorance of special terminology andor insufficient knowledge of specialized content
4b Seriousfrequent mistakes involving terminology andor specialized content
4c A few terminological errors but the specialized content is not seriously affected
4d Accurate and appropriate rendition of the terminology It reflects a good command of terms and content specific
to the subject
ExamplesComments
TOTAL SCORE
- 3 -
3 NON-SPECIALIZED CONTENT-MEANING
Category Number
Description Check one
box
3a The translation reflects or contains important unwarranted deviations from the original It contains inaccurate renditions andor important omissions and additions that cannot be justified by the instructions Very defective
comprehension of the original text
3b There have been some changes in meaning omissions orand additions that cannot be justified by the translation instructions Translation shows some misunderstanding of original andor translation instructions
3c Minor alterations in meaning additions or omissions
3d The translation accurately reflects the content contained in the original insofar as it is required by the
instructions without unwarranted alterations omissions or additions Slight nuances and shades of meaning have been rendered adequately
ExamplesComments
4 SPECIALIZED CONTENT AND TERMINOLOGY
Category
Number Description
Check one
box
4a Reveals unawarenessignorance of special terminology andor insufficient knowledge of specialized content
4b Seriousfrequent mistakes involving terminology andor specialized content
4c A few terminological errors but the specialized content is not seriously affected
4d Accurate and appropriate rendition of the terminology It reflects a good command of terms and content specific
to the subject
ExamplesComments
TOTAL SCORE
Further evidence for a functionalist approach to translation quality evaluation 261
- 4 -
S C O R I N G W O R K S H E E T
Component Target Language Component Functional and Textual Adequacy
Category Value Score Category Value Score
1a 5 2a 5 1b 15 2b 10 1c 25 2c 20 1d 30
2d 25
Component Non-Specialized Content Component Specialized Content and
Terminology
Category Value Score Category Value Score
3a 5 4a 5 3b 10 4b 10 3c 20 4c 15 3d 25
4d 20
Tally Sheet
Component Category
Rating Score Value
Target Language
Functional and Textual Adequacy
Non-Specialized Content
Specialized Content and Terminology
Total Score
262 Sonia Colina
Appendix 2 Text sample
bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull
Further evidence for a functionalist approach to translation quality evaluation 263
bull bull
264 Sonia Colina
Authorrsquos address
Sonia ColinaDepartment of Spanish and PortugueseThe University of ArizonaModern Languages 545Tucson AZ 85721-0067United States of America
scolinaemailarizonaedu
Further evidence for a functionalist approach to translation quality evaluation 259
- 2 -
RATING INSTRUCTIONS
1 Carefully read the instructions for the review of the translated text Your decisions and evaluation should be based on these instructions only
2 Check the description that best fits the text given in each one of the categories
3 It is recommended that you read the target text without looking at the English and score the Target Language and Functional categories
4 Examples or comments are not required but they can be useful to help support your decisions or to provide rationale for your descriptor selection
1 TARGET LANGUAGE
Category Number
Description Check one
box
1a
The translation reveals serious language proficiency issues Ungrammatical use of the target language spelling mistakes The translation is written in some sort of lsquothird languagersquo (neither the source nor the target) The
structure of source language dominates to the extent that it cannot be considered a sample of target language text The amount of transfer from the source cannot be justified by the purpose of the translation The text is
extremely difficult to read bordering on being incomprehensible
1b The text contains some unnecessary transfer of elementsstructure from the source text The structure of the
source language shows up in the translation and affects its readability The text is hard to comprehend
1c Although the target text is generally readable there are problems and awkward expressions resulting in most cases from unnecessary transfer from the source text
1d
The translated text reads similarly to texts originally written in the target language that respond to the same purpose audience and text type as those specified for the translation in the brief Problemsawkward
expressions are minimal if existent at all
ExamplesComments
2 FUNCTIONAL AND TEXTUAL ADEQUACY
Category
Number Description
Check one
box
2a Disregard for the goals purpose function and audience of the text The text was translated without considering
textual units textual purpose genre need of the audience (cultural linguistic etc) Can not be repaired with revisions
2b The translated text gives some consideration to the intended purpose and audience for the translation but misses some important aspects of it (eg level of formality some aspect of its function needs of the audience
cultural considerations etc) Repair requires effort
2c The translated text approximates to the goals purpose (function) and needs of the intended audience but it is
not as efficient as it could be given the restrictions and instructions for the translation Can be repaired with suggested edits
2d The translated text accurately accomplishes the goals purpose (function informative expressive persuasive) set for the translation and intended audience (including level of formality) It also attends to cultural needs and
characteristics of the audience Minor or no edits needed
ExamplesComments
260 Sonia Colina
- 3 -
3 NON-SPECIALIZED CONTENT-MEANING
Category Number
Description Check one
box
3a The translation reflects or contains important unwarranted deviations from the original It contains inaccurate renditions andor important omissions and additions that cannot be justified by the instructions Very defective
comprehension of the original text
3b There have been some changes in meaning omissions orand additions that cannot be justified by the translation instructions Translation shows some misunderstanding of original andor translation instructions
3c Minor alterations in meaning additions or omissions
3d The translation accurately reflects the content contained in the original insofar as it is required by the
instructions without unwarranted alterations omissions or additions Slight nuances and shades of meaning have been rendered adequately
ExamplesComments
4 SPECIALIZED CONTENT AND TERMINOLOGY
Category
Number Description
Check one
box
4a Reveals unawarenessignorance of special terminology andor insufficient knowledge of specialized content
4b Seriousfrequent mistakes involving terminology andor specialized content
4c A few terminological errors but the specialized content is not seriously affected
4d Accurate and appropriate rendition of the terminology It reflects a good command of terms and content specific
to the subject
ExamplesComments
TOTAL SCORE
- 3 -
3 NON-SPECIALIZED CONTENT-MEANING
Category Number
Description Check one
box
3a The translation reflects or contains important unwarranted deviations from the original It contains inaccurate renditions andor important omissions and additions that cannot be justified by the instructions Very defective
comprehension of the original text
3b There have been some changes in meaning omissions orand additions that cannot be justified by the translation instructions Translation shows some misunderstanding of original andor translation instructions
3c Minor alterations in meaning additions or omissions
3d The translation accurately reflects the content contained in the original insofar as it is required by the
instructions without unwarranted alterations omissions or additions Slight nuances and shades of meaning have been rendered adequately
ExamplesComments
4 SPECIALIZED CONTENT AND TERMINOLOGY
Category
Number Description
Check one
box
4a Reveals unawarenessignorance of special terminology andor insufficient knowledge of specialized content
4b Seriousfrequent mistakes involving terminology andor specialized content
4c A few terminological errors but the specialized content is not seriously affected
4d Accurate and appropriate rendition of the terminology It reflects a good command of terms and content specific
to the subject
ExamplesComments
TOTAL SCORE
Further evidence for a functionalist approach to translation quality evaluation 261
- 4 -
S C O R I N G W O R K S H E E T
Component Target Language Component Functional and Textual Adequacy
Category Value Score Category Value Score
1a 5 2a 5 1b 15 2b 10 1c 25 2c 20 1d 30
2d 25
Component Non-Specialized Content Component Specialized Content and
Terminology
Category Value Score Category Value Score
3a 5 4a 5 3b 10 4b 10 3c 20 4c 15 3d 25
4d 20
Tally Sheet
Component Category
Rating Score Value
Target Language
Functional and Textual Adequacy
Non-Specialized Content
Specialized Content and Terminology
Total Score
262 Sonia Colina
Appendix 2 Text sample
bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull bull
Further evidence for a functionalist approach to translation quality evaluation 263
bull bull
264 Sonia Colina
Authorrsquos address
Sonia ColinaDepartment of Spanish and PortugueseThe University of ArizonaModern Languages 545Tucson AZ 85721-0067United States of America
scolinaemailarizonaedu
260 Sonia Colina
- 3 -
3 NON-SPECIALIZED CONTENT-MEANING
Category Number
Description Check one
box
3a The translation reflects or contains important unwarranted deviations from the original It contains inaccurate renditions andor important omissions and additions that cannot be justified by the instructions Very defective
comprehension of the original text
3b There have been some changes in meaning omissions orand additions that cannot be justified by the translation instructions Translation shows some misunderstanding of original andor translation instructions
3c Minor alterations in meaning additions or omissions
3d The translation accurately reflects the content contained in the original insofar as it is required by the
instructions without unwarranted alterations omissions or additions Slight nuances and shades of meaning have been rendered adequately
ExamplesComments
4 SPECIALIZED CONTENT AND TERMINOLOGY
Category
Number Description
Check one
box
4a Reveals unawarenessignorance of special terminology andor insufficient knowledge of specialized content
4b Seriousfrequent mistakes involving terminology andor specialized content
4c A few terminological errors but the specialized content is not seriously affected
4d Accurate and appropriate rendition of the terminology It reflects a good command of terms and content specific
to the subject
ExamplesComments
TOTAL SCORE
- 3 -
3 NON-SPECIALIZED CONTENT-MEANING
Category Number
Description Check one
box
3a The translation reflects or contains important unwarranted deviations from the original It contains inaccurate renditions andor important omissions and additions that cannot be justified by the instructions Very defective
comprehension of the original text
3b There have been some changes in meaning omissions orand additions that cannot be justified by the translation instructions Translation shows some misunderstanding of original andor translation instructions
3c Minor alterations in meaning additions or omissions
3d The translation accurately reflects the content contained in the original insofar as it is required by the
instructions without unwarranted alterations omissions or additions Slight nuances and shades of meaning have been rendered adequately
ExamplesComments
4 SPECIALIZED CONTENT AND TERMINOLOGY
Category
Number Description
Check one
box
4a Reveals unawarenessignorance of special terminology andor insufficient knowledge of specialized content
4b Seriousfrequent mistakes involving terminology andor specialized content
4c A few terminological errors but the specialized content is not seriously affected
4d Accurate and appropriate rendition of the terminology It reflects a good command of terms and content specific
to the subject
ExamplesComments
TOTAL SCORE
Further evidence for a functionalist approach to translation quality evaluation 261
- 4 -
SCORING WORKSHEET

Component: Target Language
  Category   Value   Score
  1a           5
  1b          15
  1c          25
  1d          30

Component: Functional and Textual Adequacy
  Category   Value   Score
  2a           5
  2b          10
  2c          20
  2d          25

Component: Non-Specialized Content
  Category   Value   Score
  3a           5
  3b          10
  3c          20
  3d          25

Component: Specialized Content and Terminology
  Category   Value   Score
  4a           5
  4b          10
  4c          15
  4d          20

Tally Sheet

  Component                               Category Rating   Score Value
  Target Language
  Functional and Textual Adequacy
  Non-Specialized Content
  Specialized Content and Terminology
  Total Score
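The arithmetic behind the worksheet is straightforward: the rater checks one category per component, each category maps to a point value, and the total score is the sum of the four component values. A minimal sketch of that tally (not part of Colina's instrument; the function name and data structure are illustrative assumptions):

```python
# Hypothetical sketch of the scoring worksheet's tally, assuming one
# checked category per component. Values are taken from the worksheet.
VALUES = {
    "Target Language": {"1a": 5, "1b": 15, "1c": 25, "1d": 30},
    "Functional and Textual Adequacy": {"2a": 5, "2b": 10, "2c": 20, "2d": 25},
    "Non-Specialized Content": {"3a": 5, "3b": 10, "3c": 20, "3d": 25},
    "Specialized Content and Terminology": {"4a": 5, "4b": 10, "4c": 15, "4d": 20},
}

def total_score(ratings):
    """Sum the point value of the category checked for each component."""
    return sum(VALUES[component][category]
               for component, category in ratings.items())

# A translation rated in the top category on every component:
best = {
    "Target Language": "1d",
    "Functional and Textual Adequacy": "2d",
    "Non-Specialized Content": "3d",
    "Specialized Content and Terminology": "4d",
}
print(total_score(best))  # 30 + 25 + 25 + 20 = 100
```

Note that the components are weighted unequally (Target Language tops out at 30 points, Specialized Content at 20), so the maximum total is 100 and the minimum (all "a" categories) is 20.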
Appendix 2. Text sample

[The sample text appears as an image in the original and could not be reproduced here.]
Author's address

Sonia Colina
Department of Spanish and Portuguese
The University of Arizona
Modern Languages 545
Tucson, AZ 85721-0067
United States of America

scolina@email.arizona.edu