Building a parallel corpus for translation research and much more. Ana Frankenberg-Garcia.
Building a parallel corpus for translation research
and much more
Ana Frankenberg-Garcia
The study of human translation
Traditionally not a hard science
Difficult to be systematic
With the advances of corpus linguistics, things can change …
What is a corpus?
A large collection of naturally occurring texts, selected according to specific criteria, stored in machine-readable form and searched with text-retrieval software
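As a rough illustration of what text-retrieval software does with a machine-readable corpus, the following Python sketch prints keyword-in-context (KWIC) lines. The function name `kwic`, the window width and the two sample sentences are assumptions made for this example only; they do not come from any particular corpus tool.

```python
import re

def kwic(texts, keyword, width=30):
    """Return keyword-in-context lines for every whole-word match of `keyword`."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for text in texts:
        for m in pattern.finditer(text):
            # Take up to `width` characters of context on each side of the match.
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Hypothetical mini-corpus, invented for this example only.
sample = [
    "She nodded and said nothing.",
    "He nodded twice before he answered.",
]

for line in kwic(sample, "nodded"):
    print(line)
```

Real concordancers index the corpus first so that searches stay fast on millions of words; this sketch simply scans each text in turn.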
Advantages of using corpora to study human translation
An enormous amount of translated texts
Systematic analyses
Quantifiable results
Corpora used in translation practice and research
1. Bilingual comparable corpora: Farmhouse holidays (EN) & Agroturismo (IT)
2. Monolingual comparable corpora: Translational English Corpus (EN)
3. Simple parallel corpora: Tectra (EN-GL)
4. Bidirectional parallel corpora: COMPARA (PT-EN and EN-PT)
Building parallel corpora: text selection
• Genre (scientific, imaginative, technical, etc.)
• Mode (oral? written?)
• Variety (standard? regional?)
• Time (contemporary? older?)
• Languages (which? just two or more?)
• Translations (professional? native speakers? different translators?)
• Simple or bidirectional?
Are there translations?
Building parallel corpora: example of interrelated factors
Languages: PT-EN
Genre: scientific, academic, tourism, literature, politics (EP), oral, popular
Simple (PT-EN or EN-PT) or bidirectional (PT-EN ↔ EN-PT)
Building parallel corpora
Personal use: no copyright hassle
Shared use: copyright permissions, results verifiable, more users and uses
Building parallel corpora: copyright
• Two permissions, double the work
• Publishers, authors and translators generally don’t know what a corpus is
• Protect
• Advertise
Building parallel corpora: alignment
Text?
Paragraph?
Sentence?
Clause?
Word?
Which parts of ST and TT match?
Building parallel corpora: tags
What do we want tags for? More pre-processing, less post-processing
Optional tags, e.g. textual, grammatical, semantic
Alignment tags
<id=EBJT1 1845>Joe watched Robin climb into the trailer and man-handle the calves one by one towards the ramp, their winglike ears pierced with plastic identity tags.
<id=EBJT1 1845>Joe ficou a ver Robin subir para o atrelado e encaminhar as vitelas uma a uma para a rampa, com as suas orelhas, que faziam lembrar asas, furadas e umas etiquetas de plástico a identificá-las.
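The shared `<id=...>` tag in the two lines above is what links a source-text unit to its translation. A minimal Python sketch of that idea follows, assuming each line starts with an `<id=...>` tag exactly as in the example. The function names are hypothetical, and the two sample sentences are shortened versions of the slide's example.

```python
import re

# One alignment unit per line: "<id=TEXT NUMBER>sentence text"
ID_LINE = re.compile(r"^<id=([^>]+)>\s*(.*)$")

def read_units(lines):
    """Map each alignment id to the text that follows its <id=...> tag."""
    units = {}
    for line in lines:
        m = ID_LINE.match(line.strip())
        if m:
            units[m.group(1)] = m.group(2)
    return units

def pair_units(source_lines, target_lines):
    """Pair source and target text that share the same alignment id."""
    src = read_units(source_lines)
    tgt = read_units(target_lines)
    return [(uid, src[uid], tgt[uid]) for uid in src if uid in tgt]

# Shortened versions of the example above, for illustration only.
english = ["<id=EBJT1 1845>Joe watched Robin climb into the trailer."]
portuguese = ["<id=EBJT1 1845>Joe ficou a ver Robin subir para o atrelado."]

for uid, st, tt in pair_units(english, portuguese):
    print(uid, "|", st, "|", tt)
```

Because the pairing relies only on the id, source and translation can be stored in separate files and still be retrieved side by side.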
Our options for COMPARA
A bidirectional parallel corpus of English and Portuguese
Funding: Portuguese Government and European Union (FEDER and FSE), contract ref. POSC/339/1.3/C/NAC
Project leaders: Ana Frankenberg-Garcia & Diana Santos
Research assistants: Pedro Sousa, Rosário Silva & Susana Inácio
Corpus structure
PT source texts ↔ EN translations (parallel)
EN source texts ↔ PT translations (parallel)
Both directions together: bi-directional
ST → TT
PT → EN1, EN2
EN → PT1, PT2
Language varieties
Portuguese: Portugal, Brazil, Angola, Mozambique
English: UK, US, South Africa
Unbalanced distribution!
Publication dates
1837, 2002, 1880, 1997, 1988, 1914
Genre
Published fiction, extensible to other genres
Portuguese authors
Portugal: Camilo Castelo Branco, Eça de Queirós, José Cardoso Pires, José Saramago, Jorge de Sena, Lídia Jorge, Mário de Carvalho, Sá Carneiro
Brazil: Aluísio Azevedo, Autran Dourado, Chico Buarque, Jô Soares, José de Alencar, Machado de Assis, Manuel Antônio de Almeida, Marcos Rey, Patrícia Melo, Paulo Coelho, Rubem Fonseca
Mozambique: Mia Couto
Angola: José Eduardo Agualusa
English authors
British Isles: David Lodge, Ian McEwan, Julian Barnes, Joseph Conrad, Joanna Trollope, Kazuo Ishiguro, Lewis Carroll, Mary Shelley, Oscar Wilde
United States: Henry James, Edgar Allan Poe, Richard Zimler
South Africa: Nadine Gordimer
Portuguese translators
Ana Maria Amador, Ana Falcão Bastos, Ana Luísa Faria, Aníbal Fernandes, Carlos Grifo Babo, Cristina Ferreira de Almeida, Cristina Rodriguez, Eduardo Guerra Carneiro, Fernanda Pinto Rodrigues, Geraldo Galvão Ferraz, Helena Cardoso, Januário Leite, José Viera Lima, J. Teixeira de Aguilar, Lídia Cavalcante-Luther, Lucinda Santos Silva, Luís Lobo, Manuel João Gomes, M. F. Gonçalves de Azevedo, Maria Carlota Pracana, Maria do Carmo Figueira, Mário Martins de Carvalho, Nina Videira, Paula Reis, Yolanda Artiaga.
English translators
Adria Frizzi, Alan Clarke, Alexis Levitin, Alice Clemente, Cliff Landers, David Brookshaw, David Rosenthal, Elizabeth Lowe, Ellen Watson, Helen Caldwell, Giovanni Pontiero, Graeme Mac Nicoll, Gregory Rabassa, Isabel Burton, John Gledson, John Parker, John Byrne, John Vetch, Margaret Jull Costa, Mary Fitton, Natália Costa, Peter Bush, Richard Zenith, Ronald W. Sousa.
Can any text be included in the corpus?
Only published source texts and translations
Only English translated directly from Portuguese, and Portuguese translated directly from English
Only human translations!
Texts
72 source texts (extracts)
75 translations
Size
1,549,551 words in English
1,436,493 words in Portuguese
Possibly the largest existing edited parallel corpus
Interface
Free
Easy to use by people who have never heard of corpora before
Powerful and flexible tool for experienced corpus users
Results good for research and education
www.linguateca.pt/COMPARA/
“nodded”
Distribution of “nodded” in source texts (ST) and translations (TT), per 100 K words (bar chart, vertical axis 0 to 14)
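Comparing ST and TT counts only makes sense once they are normalised for corpus size, which is presumably what the "100 K words" label on the chart refers to. The sketch below shows that arithmetic in Python. The function name `per_100k` is an assumption, and the counts are invented purely to demonstrate the calculation; they are not COMPARA results.

```python
def per_100k(occurrences, corpus_words):
    """Normalise a raw count to a frequency per 100,000 words."""
    if corpus_words <= 0:
        raise ValueError("corpus_words must be positive")
    return occurrences * 100_000 / corpus_words

# Hypothetical counts and sub-corpus sizes, invented to show the arithmetic.
st_count, st_words = 50, 500_000
tt_count, tt_words = 30, 600_000

print(per_100k(st_count, st_words))  # frequency in source texts
print(per_100k(tt_count, tt_words))  # frequency in translations
```

Without this step, a larger sub-corpus would appear to use a word more often simply because it contains more text.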
Users and uses
Language learners and anyone working with PT-EN: bilingual dictionary with examples
Language teachers: exercises and tests
Translators: language equivalents
Translation lecturers: exercises & problems
Translation theorists: test translation hypotheses
Lexicographers: bilingual dictionaries
Computational linguists and language engineers: machine translation and other applications
Backstage options
Text tags
EBJB1.pt ele revelou-me o seu interesse por Gosse <tnote> Edmund William Gosse (1849-1928), crítico inglês </tnote> e pela sociedade literária inglesa dos finais do século passado.
EBDL2T1.en When we sat on the sofa together to watch <title>News at Ten</title>
Text tags
EBDL1T1.pt passou-me uma receita de <named> Valium </named>
EBJB1.en the white bear, <foreign> thalassarctos maritimus </foreign>, is the aristocrat of bears...
EBDL1T1.pt acaba por se esquecer de ter medo, até que acaba por verificar que não há <emph> de que </emph> ter medo.
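Text tags like these let a search tool either target the marked spans or ignore the markup. The following Python sketch shows both operations on one of the example lines, assuming simple, non-nested `<tag> ... </tag>` pairs as shown on the slides. The function names are hypothetical and this is not the corpus's own processing code.

```python
import re

# Matches a simple, non-nested <tag> ... </tag> pair and captures tag name and content.
TAGGED = re.compile(r"<(\w+)>\s*(.*?)\s*</\1>", re.DOTALL)

def tagged_spans(text):
    """Return (tag, content) pairs for every <tag>...</tag> span in `text`."""
    return [(m.group(1), m.group(2)) for m in TAGGED.finditer(text)]

def strip_tags(text):
    """Remove the tags but keep their content, collapsing extra spaces."""
    plain = TAGGED.sub(lambda m: m.group(2), text)
    return re.sub(r"\s+", " ", plain).strip()

line = ("the white bear, <foreign> thalassarctos maritimus </foreign>, "
        "is the aristocrat of bears...")

print(tagged_spans(line))
print(strip_tags(line))
```

A real pipeline would probably treat some tags differently, for instance leaving the content of `<tnote>` out of the running text, since a translator's note is not part of the translation proper.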
Alignment options and tags
1 alignment unit = 1 source-text sentence
ST → TT: a source-text sentence (S) may be aligned with S, S2, S(+S), S½ or Ø in the translation
Grammar tags
Portuguese: PALAVRAS
Petrus/PROP pediu/V_fmc a/DETartd especialidade/N da/PRP+DETartd casa/N --/PU uma/DETarti paella/N valenciana/ADJ --/PU que/SPECrel comemos/V em/PRP silêncio/N ,/PU acompanhados/V apenas/ADV do/PRP+DETartd saboroso/ADJ vinho/N Rioja/PROP ./PU
[pos="V.*"] "silêncio"
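The query above reads as "any token whose part-of-speech tag starts with V, immediately followed by the word silêncio", in what appears to be CQP-style syntax. The Python sketch below emulates that two-token pattern on the PALAVRAS-tagged sentence from the grammar-tags slide. The function names are hypothetical, and the sketch assumes every token has the form `word/TAG`. On this particular sentence the query finds nothing, because the preposition "em" stands between the verb and "silêncio"; changing the tag pattern to `PRP.*` does produce a hit.

```python
import re

def parse_tagged(sentence):
    """Split 'word/TAG word/TAG ...' into (word, tag) pairs."""
    return [tuple(tok.rsplit("/", 1)) for tok in sentence.split()]

def pos_then_word(tokens, pos_pattern, word):
    """Find a token whose tag matches `pos_pattern` directly followed by `word`."""
    hits = []
    for (w1, t1), (w2, _) in zip(tokens, tokens[1:]):
        if re.fullmatch(pos_pattern, t1) and w2 == word:
            hits.append((w1, w2))
    return hits

tagged = ("Petrus/PROP pediu/V_fmc a/DETartd especialidade/N da/PRP+DETartd casa/N "
          "--/PU uma/DETarti paella/N valenciana/ADJ --/PU que/SPECrel comemos/V "
          "em/PRP silêncio/N ,/PU acompanhados/V apenas/ADV do/PRP+DETartd "
          "saboroso/ADJ vinho/N Rioja/PROP ./PU")
tokens = parse_tagged(tagged)

print(pos_then_word(tokens, r"V.*", "silêncio"))    # verb directly before the word
print(pos_then_word(tokens, r"PRP.*", "silêncio"))  # preposition directly before the word
```

This is the kind of search that grammatical tagging makes possible: the pattern is stated in terms of word class, so no list of individual verb forms is needed.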
Grammar tags
English: CLAWS (coming soon)
Petrus/NP1 asked/VVD for/IF the/AT specialty/NN1 of/IO the/AT house/NN1 --a/AT1 Valencia/NP1 paella/NN1 --which/DDQ we/PPIS2 ate/VVD in/II silence/NN1 ./.
Other tags
e.g. semantic tag for colour (Inácio et al. 2007)
I did, too --changed over to the knitted tie at a <sem="cor"> red </sem>light.
People interested in creating specific tags for their research can do so, as long as they do the tag insertion and revision work
Specific tag revision interface underway (Sousa, in preparation)
Research work
1. Observing source texts and translations
2. Contrasting Portuguese and English
3. Comparing translated and untranslated language
4. Examining the characteristics of translated texts
Studies unthinkable before corpora
Many other studies possible!
www.linguateca.pt/COMPARA/ComparaPublications.html