Corpora translation

Posted on 10-May-2015

Semantics coursework - FFLCH evening class - Corpora in translation

Corpora in translation








What is a corpus?

A corpus is a large, structured set of texts or audio transcriptions used in linguistic studies for statistical analysis and hypothesis testing.

A corpus can be used for different purposes, depending on the intended use.
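As a minimal sketch of the kind of statistical analysis a corpus supports, the snippet below counts word frequencies. The three sentences in `toy_corpus` and the helper `word_frequencies` are invented for illustration and are not taken from any real corpus.

```python
from collections import Counter
import re

# Hypothetical toy corpus: three invented sentences standing in for a
# real, much larger collection of texts.
toy_corpus = [
    "The door near the stairs was closed.",
    "She opened the door and entered the club.",
    "Members of the club use the stairs.",
]

def word_frequencies(texts):
    """Count lowercase word tokens across all texts."""
    counts = Counter()
    for text in texts:
        # Keep only runs of letters and apostrophes as tokens.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

frequencies = word_frequencies(toy_corpus)
print(frequencies.most_common(3))
```

A real study would apply the same counting idea to thousands or millions of texts, usually with proper tokenization rather than a simple regular expression.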

What is a corpus?

There are different types of corpus: specialized corpus, general corpus, comparable corpora, parallel corpora, learner corpus, historical or diachronic corpus, and monitor corpus.

Taken together, these varieties are called corpora, the plural of corpus.

Corpora Studies in translation

If language is considered a “social practice and a theory of human knowledge and experience” (Bakhtin), the following must be stated:

Language cannot be translated literally into a second language; the variations and occurrences of contextual usage in a given text, essay, discourse, or utterance must be taken into account.

Corpora Studies in translation

In the translation field, parallel corpora and comparable corpora are useful types of corpora.

A parallel corpus is a set of texts aligned with their translations in one or more other languages; it is used to study how linguistic phenomena occur across two or more languages.

Translation examples

A comparable corpus is a set of texts, essays, conversations, etc. in different languages that are not translations of one another but can be used to identify differences and equivalences in each language.

Automatic translation can be based on grammatical rules or on corpus studies. What difference does this make?

Examples using Google Translate that show the difficulties of automatic translation:

Google Example

Original excerpt: The door near the stairs with the “Members only” sign had tempted Nadia from the moment she first entered the club.

Translation (between September 2009 and April 2011):

A porta perto da escada com o ‘Só’ sinal tinha tentado Nadia a partir do momento que ela entrou pela primeira vez o clube.

Google Example

Translation in May 2014:

A porta perto da escada com o sinal de "membros apenas" tinha tentado Nadia partir do momento em que ela entrou pela primeira vez o clube.

The Google automatic translation has noticeably improved, which can be attributed to an updated corpus database.

Linguee

In the semantic field, a parallel corpus helps avoid ambiguity, as seen in the previous example, because potential equivalents are considered when translating the text.
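The idea of consulting potential equivalents in a parallel corpus can be sketched as a simple lookup over aligned sentence pairs. The pairs in `toy_pairs` and the helper `find_equivalents` are hypothetical and invented for illustration; this is not how Linguee or Google Translate is implemented.

```python
# Hypothetical toy parallel corpus: invented English-Portuguese pairs.
toy_pairs = [
    ("Members only", "Somente sócios"),
    ("This area is for members only", "Esta área é somente para sócios"),
    ("The sign on the door", "A placa na porta"),
    ("He gave me a sign", "Ele me fez um sinal"),
]

def find_equivalents(phrase, pairs):
    """Return the target-language segments whose source contains the phrase.

    Matching is a case-insensitive substring test, so "sign" would also
    match "design"; a real tool would match on word boundaries.
    """
    needle = phrase.lower()
    return [target for source, target in pairs if needle in source.lower()]

# The word "sign" appears with two different Portuguese equivalents,
# which is the kind of ambiguity a translator has to resolve from context.
for target in find_equivalents("sign", toy_pairs):
    print(target)
```

Seeing both "placa" and "sinal" side by side lets the translator pick the one that fits a sign hanging on a door.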

For comparable corpora, a good example is the database-driven tool Linguee, in which a large amount of translation data is cross-referenced and presented to the student or researcher.


Taking a literary text such as “Shall I compare thee to a summer’s day?” (Shakespeare), one can consult a corpus database of translated literary texts and thus choose the best translation option:

Linguee example

Google x COCA

Another example is translation from Portuguese to English, where two different tools can be compared. Consider the following excerpt from a short story by Machado de Assis:

A Segunda Vida

Monsenhor Caldas interrompeu a narração do desconhecido:

— Dá licença? É só um instante.

Google x COCA

Google translation:

The Second Life

Monsignor Caldas interrupted the narration of the unknown:

— Excuse me? It's just a moment.

COCA translation

Using the tool Corpus of Contemporary American English (COCA):

COCA example

Search: “The Second Life”

Search: “A Second Life”

Comparing the search results in COCA leads to a better version of the translation.

Translation using the tool COCA
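The comparison of the two searches can be sketched as a phrase-frequency check. The sentences in `toy_texts` are invented stand-ins and are not COCA data; `phrase_count` is a hypothetical helper, not part of any COCA interface.

```python
# Hypothetical toy corpus: invented sentences, not results from COCA.
toy_texts = [
    "The program gave old furniture a second life.",
    "Recycling offers plastic a second life.",
    "He wanted a second life far from the city.",
    "The second life of the building began in 1990.",
]

def phrase_count(phrase, texts):
    """Count case-insensitive occurrences of a phrase across all texts."""
    needle = phrase.lower()
    return sum(text.lower().count(needle) for text in texts)

candidates = ["the second life", "a second life"]
counts = {candidate: phrase_count(candidate, toy_texts) for candidate in candidates}

# Pick the candidate that occurs most often in the toy corpus.
best = max(candidates, key=counts.get)
print(counts, best)
```

Frequency is only one clue: the translator still has to read the matching lines in context before settling on a title.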

Google translation:

The Second Life

Monsignor Caldas interrupted the narration of the unknown:

— Excuse me? It's just a moment.

With the help of COCA:

A Second Life

Monsignor Caldas interrupted the narration of the stranger: “Excuse me? Just a moment.”

Conclusion

We all know that technology helps a lot when we look up a word’s meaning or try to understand a text in another language. Semantically speaking, however, translation is much more complex than that.

The analysis of the corpus-based translations from Google, Linguee, and COCA shows that, although helpful, these programs are limited, and a proper translation may require further work.

Conclusion

Automatic translation may not be enough in some cases.

Only a human has the ability to understand context, identify irony, ambiguity, and other figures of speech, and decode a neologism.