Translation studies: Simplification and Explicitation Universals

Claudiu Mihaila

Faculty of Computer Science, "Al. I. Cuza" University of Iasi, 16, General Berthelot Street, 700483 Iasi, [email protected]

Abstract. The characteristics exhibited by translated texts compared to non-translated texts have always been of great interest in Translation Studies. Two universals, namely simplification and explicitation, are reviewed in this report, together with some of the studies that have been undertaken to confirm or, on the contrary, to disconfirm them. We describe the corpora, the methods, and the results, and analyse the conclusions of several important research papers.

Key words: translationese, translation studies, translation universal, corpus linguistics

1 Introduction

The idea that translation studies should search for regularities and general laws is not new; Gideon Toury is the best-known advocate of general laws of translation [1]. He proposed this as a fundamental task of descriptive translation studies, because translated language is believed to manifest certain universal features as a consequence of the translation process. Translations exhibit their own specific lexico-grammatical and syntactic characteristics [2–4]. These "fingerprints" that the translation process leaves behind were first described by Gellerstam and generically named translationese [5].

More recently, it has been claimed that there are common characteristics which all translations share, regardless of the source and target languages [6]. Although mostly intuitively, Mona Baker defines several such universal laws. Additionally, she notes the power of electronic corpora and automatic natural language processing systems, in comparison to the manual contrastive studies undertaken by previous scholars on small-scale collections of texts. She includes in her list of universals, amongst others, simplification, explicitation, normalisation, and convergence.

However, the issue of the existence of translation universals remains highly controversial. While some scientists report that they have found sufficient proof that such translation laws exist [7], others consider that it is not possible even to hypothesise on universals, since we are not able to capture all translations from all languages and from all times [8].

The field of translation universals has thus been a real subject of debate in translation studies over the last fifteen years, bringing together different perspectives on the language of translation. Perhaps the main reason to investigate these hypotheses is to raise awareness among translators of the conscious or unconscious effects on translated texts, and of the relationship between language and culture [7]. Bringing unconscious tendencies to light will emphasise translators' decisions and strategies, and hence should pave the way to more accurate translations, with "more desired effects and fewer unwanted ones" [9].

The fundamental aim of this line of research is to build a language-independent learning system able to distinguish between translated and non-translated texts. Such a system could be applied to other languages, thus widening the possibilities for studying these universals. Furthermore, it becomes feasible to determine which characteristics influence translated language the most.

From a practical perspective, a system that automatically identifies translationese (improved or not by the inclusion of specific features of the considered universals) may be of great help in the self-assessment of professional translators, or in the assessment of their training process. Moreover, an automatic translationese identifier may significantly improve other NLP applications. For instance, such a system may be integrated in a statistical machine translation framework in order to identify translation direction [10]. Another possible application is multilingual plagiarism detection, a topic that has received increasing attention recently.

The report is structured as follows: section 2 contains brief descriptions of translation universals, whilst in section 3 we review the related work in this domain, focussing on simplification and explicitation. Finally, conclusions are drawn in section 4.

2 Translation universals

The universals have attracted considerable attention from translation experts, but their formulation and initial explanation were based on intuition and introspection, with subsequent corpus research limited to comparatively small corpora, literary or newswire texts, and semi-manual analysis. Moreover, previous research has not provided sufficient guidance as to which features must be accounted for if these universals are to be regarded as valid [11].

Various so-called translation universals, described as universal tendencies of the translation process, laws of translation, or norms of translation, have been suggested in the literature [6, 7, 12, 13].

Toury proposed two laws of translation: the law of standardisation and the law of interference [13]. Baker defined four possible translation universals [6, 14]. These four universals, namely simplification, explicitation, convergence, and normalisation, have been the most intensively studied in recent years. The simplification universal is described as the tendency of translators to produce simpler and easier-to-follow texts, whilst explicitation refers to introducing overt information into the translation that is implicit in the source language [6]. Convergence states that translations become more similar to one another than non-translated texts are, and normalisation represents the conscious or unconscious rendering of idiosyncratic text features in order to make them conform to the typical textual characteristics of the target language.

Laviosa continued this line of research by proposing features for simplification in a corpus-based study [7]. Despite some evidence of the existence of such a phenomenon, there is still a remarkable challenge in defining the features which characterise the simplification universal.

3 Related work

A number of papers report experiments on the universals, though without reaching clear-cut conclusions. Nevertheless, it seems that these problematic claims call for an investigation in two sequential stages: first, the investigation of the proposed translation tendencies, and afterwards the investigation of the universality factor.

On the one hand, the claims themselves, without considering the universality aspect, require adequate practical support in order to be validated as true or false. On the other hand, the universality characteristic is a matter of discussion, as the coverage implied by this term is too wide given the lack of evidence for different languages. For the universality aspect to be widely accepted, it needs to be validated for all languages, or at least for all language families.

In what follows, we survey the current status of two of the hypothesised universals, simplification and explicitation, going through some of the most prominent research undertaken in the field.

3.1 Simplification

Recently, a corpus-based approach which tests the statistical significance of features proposed to investigate the simplification universal has been exploited for Spanish [11, 15].

In [11], Corpas tries to verify the validity of the simplification universal on a Spanish comparable corpus of medical and technical, translated and non-translated texts produced by both professional and semi-professional translators. Simplification seems to be validated for the lexical richness feature. Despite this, it is contradicted in terms of complex sentences, sentence length, depth of syntactic trees, information load, and ambiguity.

Nonetheless, in [15], the authors use the same corpora as in [11] and perform a deeper analysis, exploiting other features as well. The experiments revealed that the translated texts contain a lower level of lexical richness and density, a lower number of discourse markers, and less simple and significantly shorter sentences. However, the simplification traits are more visible only on the technical texts, and to a lesser degree on the professionally translated medical texts.
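Two of the surface features discussed above, lexical richness and sentence length, can be illustrated with a minimal Python sketch. The definitions used here (type/token ratio and mean number of word tokens per sentence) are common operationalisations assumed for illustration only; the exact definitions used in [11, 15] are not given in this report, and the function names and the regular-expression tokeniser are hypothetical.

```python
import re


def tokenize(text):
    """Lower-case word tokens; a crude stand-in for a real tokeniser (assumption)."""
    return re.findall(r"\w+", text.lower())


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? (assumption, not a real segmenter)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def lexical_richness(text):
    """Type/token ratio: distinct word forms divided by total word tokens."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mean_sentence_length(text):
    """Average number of word tokens per sentence."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)
```

Under the simplification hypothesis, one would expect translated texts to score lower on such measures than comparable non-translated texts; the type/token ratio is sensitive to text length, so texts of similar size should be compared.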

Furthermore, Ilisei et al. develop a supervised learning system that is able to distinguish, in some cases with very high accuracy, between translated and non-translated texts, also for the Spanish language [16, 17]. They use three comparable corpora, of which two are related to the medical domain and one contains technical texts, and extract 21 language-independent features for their learning system to exploit.

Table 1 lists the accuracies of the various trained classifiers tested in [17]. The BayesNet, Simple Logistic, SVM, and Meta-classifier reach 97.62% on technical texts, and the SVM result is statistically significantly better than the one obtained without simplification features.

Table 1. Classification accuracy results on medical and technical test datasets with regard to simplification features (SF) [17].

                    Including SF           Excluding SF
Classifier          Medical   Technical    Medical   Technical

Baseline (ZeroR)    64.71%    66.67%       64.71%    66.67%
Naive Bayes         71.57%    95.24%       71.57%    80.95%
BayesNet            73.53%    97.62%       71.57%    92.86%
Jrip                79.42%    95.24%       72.55%    92.86%
Decision Tree       77.45%    92.86%       75.49%    95.24%
Simple Logistic     77.45%    97.62%       79.41%    83.33%
SVM                 75.49%    97.62%       74.51%    69.05%
Meta-classifier     82.35%    97.62%       78.43%    92.86%
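To make the effect of the simplification features easier to read off Table 1, the following sketch computes, for each classifier, the difference in accuracy between the runs including and excluding SF. The figures are transcribed from Table 1; the dictionary and function names are hypothetical and serve only as an illustration.

```python
# Accuracies in percent, transcribed from Table 1 as (medical, technical).
INCLUDING_SF = {
    "Baseline (ZeroR)": (64.71, 66.67),
    "Naive Bayes": (71.57, 95.24),
    "BayesNet": (73.53, 97.62),
    "Jrip": (79.42, 95.24),
    "Decision Tree": (77.45, 92.86),
    "Simple Logistic": (77.45, 97.62),
    "SVM": (75.49, 97.62),
    "Meta-classifier": (82.35, 97.62),
}
EXCLUDING_SF = {
    "Baseline (ZeroR)": (64.71, 66.67),
    "Naive Bayes": (71.57, 80.95),
    "BayesNet": (71.57, 92.86),
    "Jrip": (72.55, 92.86),
    "Decision Tree": (75.49, 95.24),
    "Simple Logistic": (79.41, 83.33),
    "SVM": (74.51, 69.05),
    "Meta-classifier": (78.43, 92.86),
}


def sf_gain(classifier):
    """Accuracy change (percentage points) from adding SF: (medical, technical)."""
    inc_med, inc_tech = INCLUDING_SF[classifier]
    exc_med, exc_tech = EXCLUDING_SF[classifier]
    return round(inc_med - exc_med, 2), round(inc_tech - exc_tech, 2)


gains = {name: sf_gain(name) for name in INCLUDING_SF}

for name, (medical, technical) in gains.items():
    print(f"{name:18s} medical {medical:+6.2f}  technical {technical:+6.2f}")
```

The differences are not uniformly positive: for the Decision Tree on technical texts and for Simple Logistic on medical texts, the run without SF scores higher.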

In order to determine the most salient features that lead to these results, Ilisei et al. analyse the outputs of several classifiers, such as Decision Tree and Jrip, and use attribute evaluators, such as Chi-Square and Information Gain. They conclude that lexical richness influences the classification most, closely followed by sentence length and the proportions of pronouns, conjunctions, grammatical words, and lexical words; other features also influence the classification, but to a smaller extent. Both lexical richness and sentence length are features considered to be indicative of the simplification hypothesis, widely discussed and studied in the past decade. Sentence length is a characteristic whose interpretation posed a certain difficulty in the study undertaken in [15]. The most influential features identified with these evaluators concur with the first-level attributes in the human-readable output of the Decision Tree and Jrip classifiers [18].
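As an illustration of what an Information Gain attribute evaluator measures, the sketch below implements the textbook definition for a discrete feature: the entropy of the class labels minus the weighted entropy remaining after splitting on the feature. This is a generic formulation, not necessarily the implementation used in [17]; the function names, and the idea of discretising a feature such as lexical richness into "low" and "high", are assumptions made for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data (invented for illustration): four documents, two translated, two original.
labels = ["translated", "translated", "original", "original"]
richness_bin = ["low", "low", "high", "high"]  # separates the classes perfectly
length_bin = ["low", "high", "low", "high"]    # carries no class information

print(information_gain(richness_bin, labels))
print(information_gain(length_bin, labels))
```

Ranking features by such a score is one way of identifying which of them contribute most to separating translated from non-translated texts.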

A different perspective on this research topic is taken by Baroni and Bernardini, who report a machine learning approach to the task of classifying Italian texts as translations or originals [19]. Several features have been employed in the feature vector, including unigrams, bigrams, and trigrams of word forms, lemmas, and part-of-speech tags. They thus show that shallow data representations can be sufficient to automatically distinguish professional translations from non-translated texts with an accuracy above the chance level, and hypothesise that this representation captures the distinguishing features of translationese. Additionally, the system's classification quality seems to be much higher than that of human judges faced with the same task. However, it should be noted explicitly that in this study the feature vector is highly dependent on the language the system works on.

The simplification universal is known to be a controversial claim, with different studies bringing evidence both for and against it. In particular, it has been contested by studies on collocations [20], lexical use [21], and syntax [22].

For instance, Jantunen does not manage to establish clear and consistent evidence of a universal untypical lexical-grammatical patterning when operating on a subset of the Corpus of Translated Finnish (CTF) [22]. He tests the hypothesis on three near-synonymous degree modifiers, hyvin, kovin, and oikein, all roughly meaning very, combining a quantitative and a qualitative analysis to provide a comprehensive description. He uses the Three Phase Comparative Analysis (TPCA) on three corpora: one of original Finnish (CNF), one of texts translated from various Indo-European and Finno-Ugric languages (MuCTF), and one of texts translated from English (MoCTF). As shown in Table 2, the modifiers are almost twice as frequent in the translations (after normalisation per 100 000 tokens), and this depends on the source language: the difference is not statistically significant for English, but it is for the MuCTF, according to a χ² test at the 0.05 level of significance.

Table 2. Frequencies of hyvin, kovin, and oikein in the CNF, MuCTF and MoCTF corpora [22].

Modifier    CNF    MuCTF    MoCTF

hyvin        36       66       70
kovin        18       39       38
oikein       12       15       20

Total        66      120      128
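The normalisation per 100 000 tokens and the "almost twice as frequent" observation can be made concrete with a short sketch. The totals are taken from Table 2; the raw count and corpus size passed to the helper in the usage line are invented for illustration, since the report does not give the actual corpus sizes, and the function name is hypothetical.

```python
def per_100k(raw_count, corpus_tokens):
    """Normalise a raw frequency to occurrences per 100 000 tokens."""
    return raw_count / corpus_tokens * 100_000


# Normalised totals from Table 2 (already per 100 000 tokens).
totals = {"CNF": 66, "MuCTF": 120, "MoCTF": 128}

# Frequency of the three modifiers in each translated corpus relative to original Finnish.
ratios = {name: totals[name] / totals["CNF"] for name in ("MuCTF", "MoCTF")}

# Hypothetical usage of the helper: 33 hits in a 50 000-token corpus.
example = per_100k(33, 50_000)

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} times the CNF frequency")
```

Normalised frequencies allow corpora of different sizes to be compared directly, but a significance test still needs the raw counts and corpus sizes behind them.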

Jantunen then extracts the top-ranked collocations for each of the three modifiers from each of the three corpora. In the case of hyvin, the collocations match only to an extremely small degree. However, in the case of the other two modifiers, the collocations overlap to a high degree, which makes it rather difficult to draw conclusions. Furthermore, the colligation analysis for hyvin shows no difference between original Finnish texts and texts translated into Finnish. The conclusion that Jantunen reports is that translations tend to exhibit untypical lexical combinations, due to the source language, and that grammatical combinations tend to be similar in translations and original texts, although the influence of the source languages cannot be excluded.

3.2 Explicitation

Even though the surge of interest in translation universals happened in the last two decades, pointers towards the law of explicitation have existed since the middle of the twentieth century. Vinay and Darbelnet performed a comparative study of French and English in 1958, and defined explicitation as:

"the process of introducing information into the target language which is present only implicitly in the source language, but which can be derived from the context or the situation" [23]

Furthermore, Blum-Kulka notices the tendency of translations to be more explicit than their source texts, regardless of the language-specific explicitness [12]. Later, Baker defines the explicitation universal as the tendency to "spell things out rather than leave them implicit" [14].

Two categories of explicitation are described by Pym: the obligatory one, forced by the specifics of the language, and the voluntary one, where the translator adds optional information to the text to avoid misinterpretations [24]. Vanderauwera proposes the following list of explicitation repertoires: expansion of condensed passages; addition of modifiers, qualifiers and conjunctions to achieve greater transparency; and addition of extra information and insertion of explanations, amongst many others [25].

Another study, which exploits the Translational English Corpus (TEC), indicates a significantly higher use of the optional that with the verbs say and tell in translated texts compared to a comparable sub-corpus of the British National Corpus (BNC) [26]. Tables 3 and 4 contain the results of the analysis, given as both absolute counts and percentages. It is immediately clear that the that-connective is far more frequent in the TEC than in the BNC. By contrast, the zero-connective is more frequent for all forms of both verbs in the BNC. These differences have been shown to be statistically significant. Furthermore, the results of the say and tell study were consistent with findings by Burnett, who reviewed the use of the verbs suggest, admit, claim, think, believe, hope and know in both TEC and BNC [27].

A similar study investigating the verb promise found the same pattern between translated and non-translated English [28]. Table 5 shows that although the number of occurrences of promise followed by the that or zero connective is very close in the two corpora (131 in the TEC and 135 in the BNC), the distributions are almost exact inverses of each other.
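The kind of significance check reported for these counts can be sketched as a Pearson χ² test on a 2×2 contingency table (connective by corpus), with one degree of freedom and the standard critical value of 3.841 at the 0.05 level. The counts are transcribed from Tables 3 to 5; whether [26, 28] used exactly this test, and without a continuity correction, is an assumption, and the function and variable names are hypothetical.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table, without continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def that_share(table):
    """Percentage of 'that' among that + zero, returned as (BNC, TEC)."""
    (that_bnc, that_tec), (zero_bnc, zero_tec) = table
    return (100 * that_bnc / (that_bnc + zero_bnc),
            100 * that_tec / (that_tec + zero_tec))


CRITICAL_0_05_DF1 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05

# Rows: that, zero; columns: BNC, TEC (counts from Tables 3, 4 and 5).
TABLES = {
    "say": ((712, 775), (2289, 768)),
    "tell": ((997, 719), (1408, 427)),
    "promise": ((46, 89), (89, 42)),
}

results = {verb: (that_share(t), chi_square_2x2(t)) for verb, t in TABLES.items()}

for verb, ((bnc, tec), chi2) in results.items():
    verdict = "significant" if chi2 > CRITICAL_0_05_DF1 else "not significant"
    print(f"{verb}: that = {bnc:.2f}% (BNC) vs {tec:.2f}% (TEC), chi2 = {chi2:.1f}, {verdict}")
```

For all three verbs the statistic exceeds the critical value by a wide margin, which is consistent with the statement that the differences between the two corpora are statistically significant.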

The explicitation universal has also been investigated in simultaneous interpreting; Gumul concludes that, to a certain extent, explicitation appears to be dependent on the direction of interpreting [29].

In contrast to simplification, the explicitation universal is perhaps the least controversial hypothesis according to the conclusions of several studies. However, the study of English into Korean translation described by Cheong contradicts this claim [30].

Table 3. Distribution of say + that/zero in the BNC and TEC [26].

Connective    BNC               TEC

that          712  (23.72%)     775 (50.22%)
zero          2289 (76.28%)     768 (49.78%)

Total         3001              1543

Table 4. Distribution of tell + that/zero in the BNC and TEC [26].

Connective    BNC               TEC

that          997  (41.45%)     719 (62.74%)
zero          1408 (58.55%)     427 (37.26%)

Total         2405              1146

Table 5. Distribution of promise + that/zero in the BNC and TEC [28].

Connective    BNC               TEC

that          46 (34.1%)        89 (67.9%)
zero          89 (65.9%)        42 (32.1%)

Total         135               131

Cheong clearly distinguishes between two reverse operations, explicitation and implicitation, and notes that implicitation has been neglected in the study of translation universals. Therefore, by using an English-Korean corpus, the author tries to determine which of the two phenomena is the dominant one, to test whether the direction of the translation has any effect on them, and to identify the factors that influence the phenomena. After applying four different measurement units and a set of newly devised variables, the author concludes that both explicitation and implicitation are present in the target text, and that the direction of the translation influences the behaviour of texts regarding the two phenomena, even in cases where the identical language pair is involved [30].

Although no studies have yet been performed for the Romanian language, it is possible for explicitation and implicitation to manifest themselves in Romanian translations too. For instance, the flexible Romanian grammar allows zero anaphora to occur with a relatively high frequency of 0.32 zero pronominal anaphors per sentence [31]. Therefore, when translating into Romanian from a language with a very low degree of zero pronouns, such as English, French, or German, the explicit information in the source text may become encoded implicitly in some other word in the target text, without this being demanded by grammar rules. Thus, the identification of zero pronouns in Romanian [32] might prove to be a valuable indicator of implicitation. On the other hand, when translating between languages that both have a high degree of zero anaphora (e.g., Spanish, Portuguese, Korean, or Chinese), both explicitation and implicitation might occur, in order to avoid ambiguities or to create a more natural text.

4 Conclusions

This report briefly presents some of the results that have been obtained in the field of translation studies, more specifically on the simplification and explicitation universals. We have described various methodologies of study, and presented the conclusions of the authors regarding the validity of the two universals.

Although intensely studied in the last two decades, simplification is not yet completely and clearly confirmed as a universal. Many studies, using different methodologies, support it, but there are also studies which contradict it. It is still a difficult task to extract the characteristics of this phenomenon. Nonetheless, efforts are continuously being made on different language pairs, and promising results have started to appear in the past few years.

In the case of explicitation, things seem to be clearer than with simplification. It occurs quite often in many translations, mostly in order to avoid misinterpretations in the target text. However, there are cases when explicitation appears combined with its reverse function, implicitation, making it rather complicated to analyse the data and draw conclusions. Nevertheless, most studies confirm this hypothesis, making it one of the most plausible universals.

A successful validation of translation universals could be of great help in many other NLP tasks which rely on translations. For instance, statistical machine translation could be improved by automatically determining the direction of translation, and multilingual plagiarism detection may benefit too. Moreover, human translators would become more conscious of the way they translate, and such universals could help them to self-assess their work. However, given the number of disconfirming experiments, the name translation universal may not be the most felicitous one; a term such as translation trend might be more appropriate.

References

1. Toury, G.: In search of a theory of translation. The Porter Institute for Poetics and Semiotics, Tel Aviv (1980)

2. Borin, L., Prutz, K.: Through a glass darkly: Part-of-speech distribution in original and translated text. In Daelemans, W., Sima'an, K., Veenstra, J., Zavrel, J., eds.: Computational Linguistics in the Netherlands 2000. (2001) pp. 30–44

3. Hansen, S.: The Nature of Translated Text - An Interdisciplinary Methodology for the Investigation of the Specific Properties of Translations. Saarland University, Saarbrucken (2003)

4. Teich, E.: Cross-Linguistic Variation in System and Text. Mouton de Gruyter, Berlin (2003)

5. Gellerstam, M.: Translationese in Swedish novels translated from English. In Wollin, L., Lindquist, H., eds.: Translation studies in Scandinavia. CWK Gleerup (1986) pp. 88–95

6. Baker, M.: Corpus linguistics and translation studies: Implications and applications. In Baker, M., Francis, G., Tognini-Bonelli, E., eds.: Text and Technology: In Honour of John Sinclair. John Benjamins, Amsterdam - Philadelphia (1993)

7. Laviosa, S.: Corpus-based Translation Studies. Theory, Findings, Applications. Rodopi, Amsterdam - New York (2002)

8. Tymoczko, M.: Computerised corpora and translation studies. Meta 43(4) (1998) pp. 652–659

9. Chesterman, A.: A causal model for translation studies. In Olohan, M., ed.: Intercultural Faultlines. Research Models in Translation Studies I: Textual and Cognitive Aspects. St. Jerome, Manchester (2000)

10. Goutte, C., Kurokawa, D., Isabelle, P.: Improving SMT by learning translation direction. In: EAMT 2009 workshop "Statistical Multilingual Analysis for Retrieval and Translation". (2009)

11. Corpas Pastor, G.: Investigar con corpus en traduccion: los retos de un nuevo paradigma. Peter Lang, Berlin & New York (2008)

12. Blum-Kulka, S.: Shifts of cohesion and coherence in translation. In House, J., Blum-Kulka, S., eds.: Interlingual and Intercultural Communication. Discourse and Cognition in Translation and Second Language Acquisition. Narr (1986) pp. 17–35

13. Toury, G.: Descriptive Translation Studies and Beyond. John Benjamins, Amsterdam (1995)

14. Baker, M.: Corpus-based translation studies: The challenges that lie ahead. In Somers, H., ed.: Terminology, LSP and Translation: Studies in Language Engineering in Honour of Juan C. Sager. John Benjamins, Amsterdam - Philadelphia (1996)

15. Corpas Pastor, G., Mitkov, R., Afzal, N., Pekar, V.: Translation universals: Do they exist? A corpus-based NLP study of convergence and simplification. In: Proceedings of the AMTA. (2008)

16. Ilisei, I., Inkpen, D., Corpas Pastor, G., Mitkov, R.: Towards simplification: A supervised learning approach. In: Proceedings of Machine Translation 25 Years On. (November 2009)

17. Ilisei, I., Inkpen, D., Corpas Pastor, G., Mitkov, R.: Identification of translationese: A machine learning approach. In Gelbukh, A., ed.: Proceedings of the 11th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing). (2010) pp. 503–511

18. Quinlan, J.R.: Induction of decision trees. Machine Learning 1(1) (1986) pp. 81–106

19. Baroni, M., Bernardini, S.: A New Approach to the Study of Translationese: Machine-learning the Difference between Original and Translated Text. Lit Linguist Computing 21(3) (2006) pp. 259–274

20. Mauranen, A.: Strange strings in translated language: A study on corpora. In Olohan, M., ed.: Intercultural Faultlines. Research Models in Translation Studies I: Textual and Cognitive Aspects. St. Jerome, Manchester (2000) pp. 119–141

21. Jantunen, J.H.: Synonymity and lexical simplification in translations: A corpus based approach. Across Languages and Cultures 2(1) (2001) pp. 97–112

22. Jantunen, J.H.: Untypical patterns in translations: Issues on corpus methodology and synonymity. In Mauranen, A., Kujamaki, P., eds.: Translation Universals: Do They Exist? Volume 48. John Benjamins (2004) pp. 101–126

23. Vinay, J.P., Darbelnet, J.: Stylistique Comparee du Francais et de l'Anglais. Didier (1958)

24. Pym, A.: Explaining explicitation. In Karoly, K., Foris, A., eds.: New Trends in Translation Studies. In Honour of Kinga Klaudy. Akademiai Kiado, Budapest (2005) pp. 29–34

25. Vanderauwera, R.: Dutch novels translated into English: the transformation of a "Minority" literature. Rodopi, Amsterdam (1985)

26. Olohan, M., Baker, M.: Reporting 'that' in translated English: Evidence for subconscious processes of explicitation? Across Languages and Cultures 1(2) (2000) pp. 141–158

27. Burnett, S.: A corpus-based study of translational English. Master's thesis, University of Manchester (1999)

28. Olohan, M.: Spelling out the optionals in translation: A corpus study. In: UCREL Technical Papers. Volume 13. (2001) pp. 423–432

29. Gumul, E.: Explicitation in simultaneous interpreting: A strategy or a byproduct of language mediation? Across Languages and Cultures 7(2) (2006) pp. 171–190

30. Cheong, H.J.: Target text contraction in English-into-Korean translations: A contradiction of presumed translation universals? Meta 51(2) (2006) pp. 343–367

31. Mihaila, C., Ilisei, I., Inkpen, D.: Romanian Zero Pronoun Distribution: A Comparative Study. In: Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC). (2010)

32. Mihaila, C., Ilisei, I., Inkpen, D.: To Be or Not to Be a Zero Pronoun: A Machine Learning Approach for Romanian. In: Proceedings of the Processing ROmanian in Multilingual, Interoperational and Scalable Environments Workshop (PROMISE). (2010)
