Creating bilingual dictionaries for under-resourced languages
Marzanne Janse van Rensburg and Vivian Marr Oxford University Press
Afrilex, Windhoek, June 2019
Creating dictionaries for under-resourced languages
Timing: 8.30-9.45
1. The Oxford Global Languages programme
2. Content creation methodology
• Principles
• Single lexicographical framework
• Hands-on translation and discussion (50 minutes)
9.45-10.15: Tea break!
Afrilex workshop, June 2019 (C) Oxford University Press
Timing: 10.15-12.00
1. Corpora
• Gap-filling
• Building your own corpus
• Using corpus to build a framework entry
2. Ask questions at any time!
Oxford Global Languages
The vision:
To bring lexical content online for 100 of the world’s languages and make it available to developers, consumers, licensees, and researchers for a wide variety of uses
The mission:
To improve the quality and breadth of global linguistic knowledge and communication, giving voice to all people in a rapidly changing world
Oxford Global Languages
20 languages launched to date – 6 more almost ready to go
https://developer.oxforddictionaries.com https://www.oxforddictionaries.com
Georgian
Greek
Gujarati
Hausa
Hindi
Igbo
Indonesian
isiXhosa
isiZulu
Kiswahili
Latvian
Malay
Marathi
Northern Sotho
Persian
Romanian
Setswana
Southern Quechua
Tajik
Tamil
Tatar
Telugu
Tok Pisin
Turkmen
Urdu
Yoruba
• The principles – applicable to all languages:
– Single neutral, common framework
– Arranged by frequency
– Translate
– Peer review
– Discuss
– Finalize
– Reverse
• Working iteratively, based locally
• Why a single common, neutral framework?
– Reusable as the core for content creation
– Scalable – can be iteratively expanded and enhanced
– Common source for multiple languages allows automatic generation of language pairs
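The idea of generating language pairs from a common source can be sketched as a simple pivot through the shared framework. This is a minimal illustration, not the Oxford Global Languages pipeline: the data layout (a sense key mapped to a list of translations), the function name `pivot_pairs`, and the sample words are all assumptions made for the example, and the sample translations should be checked by a speaker.

```python
from collections import defaultdict

# Hypothetical framework-aligned data: (English headword, sense number)
# mapped to translations in one target language. Illustrative only.
english_to_igbo = {("water", 1): ["mmiri"], ("house", 1): ["ụlọ"]}
english_to_yoruba = {("water", 1): ["omi"], ("house", 1): ["ilé"]}


def pivot_pairs(source, target):
    """Pair up translations that share the same framework sense."""
    pairs = defaultdict(set)
    for sense, source_words in source.items():
        for source_word in source_words:
            # Words attached to the same sense become candidate equivalents.
            pairs[source_word].update(target.get(sense, []))
    return {word: sorted(candidates) for word, candidates in pairs.items()}


igbo_to_yoruba = pivot_pairs(english_to_igbo, english_to_yoruba)
```

Pivoting on the sense rather than on the bare headword keeps different senses of one English word apart, but the output is still only a list of candidates for an editor to review.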
• WordReference automatically generated “virtual dictionaries”
• “Far from perfect” – but useful
• The results:
ELF: English Language Framework
• Frequency order
• Prioritized senses
• Translation help to aid editors
• Available to license in exchange for digital rights
• What’s the minimum we want to offer users?
– Headword
– Part of speech
– Sense division
– Example sentences showing language in use
– Language register shown
• Translations, of course! Not definitions.
• Translations must be:
– Up-to-date
– Accurate
– Matching in register
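The minimum entry content and the translation requirements listed above could be modelled roughly as follows. This is a sketch under assumptions: the class and field names are hypothetical rather than the ELF schema, and the translations are placeholders, not real data.

```python
from dataclasses import dataclass, field


@dataclass
class Translation:
    text: str
    register: str = "neutral"  # e.g. "formal", "informal"


@dataclass
class Sense:
    indicator: str  # short hint that tells this sense apart from the others
    register: str = "neutral"
    examples: list = field(default_factory=list)
    translations: list = field(default_factory=list)


@dataclass
class Entry:
    headword: str
    part_of_speech: str
    senses: list = field(default_factory=list)


def register_mismatches(entry):
    """Return (sense indicator, translation text) pairs whose registers differ."""
    return [
        (sense.indicator, translation.text)
        for sense in entry.senses
        for translation in sense.translations
        if translation.register != sense.register
    ]


entry = Entry(
    headword="quick",
    part_of_speech="adjective",
    senses=[
        Sense(
            indicator="fast",
            examples=["a quick answer"],
            translations=[
                Translation("<translation A>"),
                Translation("<translation B>", register="informal"),
            ],
        )
    ],
)
```

A check like `register_mismatches(entry)` only flags translations for review; whether a flagged translation is acceptable remains an editorial decision.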
• Additional help for licensees
– Translation help
– Training materials
• Arranged by frequency, derived from corpus
– Covers core language first
• Based locally
– Recruitment where the language is used
– Opportunities for local skills development and expansion
• Working iteratively
– Build and release, build and release
– Quicker to market and to generate revenue
Content reversal
• Content reversal
– Translations become headword candidates
– Still experimenting
– Success with English-Igbo, English-Yoruba, English-Marathi – but still small
– Other sources needed to ensure appropriate and sufficient coverage
– Creating framework on the fly
– Challenge to build iteratively
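The first step of content reversal, turning translations into headword candidates, can be sketched as inverting a mapping. The data shape and the function name `reverse_entries` are assumptions for illustration, and the target-language words are placeholders.

```python
from collections import defaultdict

# Hypothetical English-to-target entries: headword -> list of translations.
english_to_target = {
    "quick": ["<word 1>", "<word 2>"],
    "fast": ["<word 1>"],
    "sleep": ["<word 3>"],
}


def reverse_entries(forward):
    """Invert the mapping so each translation lists its English sources."""
    reversed_entries = defaultdict(list)
    for headword, translations in forward.items():
        for translation in translations:
            reversed_entries[translation].append(headword)
    return {
        candidate: sorted(sources)
        for candidate, sources in sorted(reversed_entries.items())
    }


headword_candidates = reverse_entries(english_to_target)
```

Because the reversed list can only contain words that happened to be used as translations, it will have gaps, which is consistent with the point above that other sources are needed for sufficient coverage.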
• Editing tools
– Spreadsheet approach
– Also possible to work in XML, with tools like TshwaneLex
– Moving to JSON format for purely digital product
– Building our own editing tool - DELTA
– Dictionary Creation Package can be supplied as spreadsheet or along with DELTA
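To illustrate the move from a spreadsheet to JSON, here is a small sketch that groups one-row-per-sense spreadsheet data into nested entries. The column names and the JSON layout are hypothetical, not the Dictionary Creation Package or DELTA format, and the translations are placeholders.

```python
import csv
import io
import json

# Hypothetical spreadsheet export: one row per sense.
SHEET = """headword,pos,sense,example,translation
paper,noun,material,a sheet of paper,<translation 1>
paper,noun,newspaper,the morning paper,<translation 2>
"""


def rows_to_entries(csv_text):
    """Group spreadsheet rows into one nested entry per headword and part of speech."""
    entries = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["headword"], row["pos"])
        entry = entries.setdefault(
            key, {"headword": row["headword"], "pos": row["pos"], "senses": []}
        )
        entry["senses"].append(
            {
                "indicator": row["sense"],
                "examples": [row["example"]],
                "translations": [row["translation"]],
            }
        )
    return list(entries.values())


entries = rows_to_entries(SHEET)
# ensure_ascii=False keeps non-Latin scripts readable in the output.
entries_json = json.dumps(entries, ensure_ascii=False, indent=2)
```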
• Translation task: 20 minutes in pairs or groups of three
– Choose an entry or entries from the English Language Framework (ELF) and work through translating into your language
– Followed by 20 minutes of group discussion:
• what went well/could have gone better
• where the help provided was sufficient/could have been improved
Sample ELF entries to choose from:
bear
centre
dark
exercise
happy
measure
oil
paper
quick
sleep
Tea break!
Gap-filling in an Oxford Kiswahili dictionary
Approach
Used all the OUP-published Kiswahili textbooks
and literature for Kenya and Tanzania to build a
corpus
Extracted a wordlist from the textbook corpus
sorted by frequency
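Extracting a frequency-sorted wordlist can be sketched in a few lines. This is a generic illustration rather than the tooling used for the Kiswahili project; the tokenization rule (runs of word characters, lowercased) and the sample sentences are assumptions.

```python
import re
from collections import Counter


def frequency_wordlist(texts):
    """Count word forms across texts and sort them by descending frequency."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"\w+", text.lower()))
    # Ties are broken alphabetically so the output order is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


sample_texts = ["Mtoto anasoma kitabu.", "Kitabu ni kizuri."]
wordlist = frequency_wordlist(sample_texts)
```

Note that this counts inflected word forms, not lemmas, so forms of the same word are listed separately unless a lemmatizer for the language is available.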
Next step: to clean up these lists.
This entails removing any words that shouldn’t be
on the list, e.g. names of places and people, file
extensions like .indd, numerals, etc.
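A first automated pass over such a list might look like the sketch below, with the remaining clean-up done by hand. The list of file extensions and the set of proper names are assumptions that a real project would have to supply.

```python
# Assumed list of file-extension tokens that can leak into a textbook corpus.
KNOWN_EXTENSIONS = {"indd", "pdf", "docx"}


def clean_wordlist(words, proper_names):
    """Drop proper names, file extensions, numerals and other non-word tokens."""
    names = {name.lower() for name in proper_names}
    cleaned = []
    for word in words:
        token = word.lower().lstrip(".")
        if token in names or token in KNOWN_EXTENSIONS:
            continue
        # Allow apostrophes inside words (e.g. "ng'ombe"); reject anything
        # else that is not purely alphabetic, such as "2019".
        if not token.replace("'", "").isalpha():
            continue
        cleaned.append(token)
    return cleaned


raw = ["kitabu", "Nairobi", ".indd", "2019", "ng'ombe", "mtoto"]
cleaned = clean_wordlist(raw, proper_names={"Nairobi"})
```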
To see which words should be considered for
inclusion in the new edition of the dictionary,
the wordlist of the current dictionary was
compared with the wordlist extracted from the
corpus.
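This is easy to do in Excel:
Paste the two lists next to each other (here, columns A and B) and run a
formula that shows which words appear in one
list but not in the other.
Formula (entered in row 1 and filled down):
=IF(ISERROR(VLOOKUP(A1,B:B,1,FALSE)),FALSE,TRUE)
TRUE means the word in column A also appears in column B; FALSE means it is missing from column B.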
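For lists too large to handle comfortably in a spreadsheet, the same comparison can be sketched in Python. The function name and the sample words are assumptions for illustration.

```python
def missing_from_dictionary(corpus_words, dictionary_words):
    """Return corpus words that are absent from the dictionary wordlist."""
    known = set(dictionary_words)
    # Keep the corpus order so a frequency-sorted list stays frequency-sorted.
    return [word for word in corpus_words if word not in known]


corpus_words = ["kitabu", "simu", "mtandao"]
dictionary_words = ["kitabu", "simu"]
candidates = missing_from_dictionary(corpus_words, dictionary_words)
```

Swapping the two arguments gives the opposite view: dictionary headwords that never occur in the corpus.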
Sketch Engine and WebBootCaT technology
Some background and a hands-on exercise
How it works
Sketch Engine has a built-in corpus tool that
enables users to extract information from the
internet to build a corpus.
Sketch Engine’s corpus-building tool, which uses
the WebBootCaT technology, automatically
creates a text corpus from relevant web pages.
Data downloaded from the internet is cleaned and
optionally de-duplicated, and non-text content is
eliminated, to obtain linguistically valuable text material.
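To make the cleaning and de-duplication steps concrete, here is a toy sketch using only the Python standard library. It is not how Sketch Engine or WebBootCaT is implemented; the class and function names are hypothetical, and real web cleaning (boilerplate removal, near-duplicate detection, language filtering) is considerably more involved.

```python
import hashlib
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style blocks."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Strip markup and collapse whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def build_corpus(pages):
    """Keep one copy of each non-empty page text (exact duplicates only)."""
    seen, corpus = set(), []
    for html in pages:
        text = html_to_text(html)
        digest = hashlib.sha256(text.lower().encode("utf-8")).hexdigest()
        if text and digest not in seen:
            seen.add(digest)
            corpus.append(text)
    return corpus


pages = [
    "<html><body><p>Habari za asubuhi.</p><script>var x = 1;</script></body></html>",
    "<html><body><p>Habari  za asubuhi.</p></body></html>",
    "<html><body><style>p {color: red}</style></body></html>",
]
corpus = build_corpus(pages)
```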
Let’s try this technology in Sketch Engine.
Step 1 – Click on “New corpus”
Step 2 – Give your corpus a name and specify the
language
Step 3 – Click on “Find texts on the web”
Step 4 – Click on “Web search” and specify the
maximum URLs and then click “Go”
Once the corpus is compiled, a pop-up will appear
to show you it is done
Now you can start having fun!
Search for concordances, compile wordlists, etc.
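For readers who have not met the term, a concordance lists every occurrence of a word together with its surrounding context. The sketch below shows the idea on a single tokenized sentence; it is a simplified stand-in for what a corpus tool displays, and the function name and sample text are assumptions.

```python
def concordance(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


tokens = "the queue was long so we left the queue early".split()
hits = concordance(tokens, "queue", width=2)
```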
• Using Sketch Engine to identify:
– Common collocates of different types
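As a rough picture of what identifying collocates involves, the sketch below counts the words that occur within a small window around a node word. It uses raw counts over invented sample sentences; corpus tools typically rank collocates with statistical association measures and group them by grammatical relation, which this toy version does not attempt.

```python
from collections import Counter


def collocates(sentences, node, window=2):
    """Count words occurring within `window` tokens of the node word."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token == node:
                neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
                counts.update(neighbours)
    return counts


sentences = [
    "a herbal remedy for colds",
    "a simple remedy for headaches",
    "the remedy failed",
]
counts = collocates(sentences, "remedy")
```

Raw counts favour very frequent function words such as "a" and "for", which is one reason association measures are preferred in practice.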
• Using Sketch Engine to find:
– Good (bland) example phrases, which illustrate how the headword is used in context
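One simple way to shortlist example candidates is to keep only short sentences that contain the headword, as sketched below. This is a toy heuristic with assumed length thresholds and invented sentences, not Sketch Engine's own example-finding method, and the shortlist still needs an editor's judgement.

```python
def candidate_examples(sentences, headword, min_words=4, max_words=10):
    """Keep sentences of moderate length that contain the headword."""
    picked = []
    for sentence in sentences:
        words = sentence.rstrip(".!?").split()
        lowered = [word.lower().strip(",;:") for word in words]
        if headword.lower() in lowered and min_words <= len(words) <= max_words:
            picked.append(sentence)
    # Shortest first: shorter sentences tend to be easier to translate.
    return sorted(picked, key=len)


sentences = [
    "Riot!",
    "The riot started after the match ended.",
    "Police said that the riot, which began shortly after midnight near the old harbour, lasted hours.",
    "They watched the match quietly.",
]
examples = candidate_examples(sentences, "riot")
```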
Gap filling
Word lists, reading programme, corpus research, …
Types of corpora
synchronic corpus: texts from a single period of time
diachronic corpus: texts collected over a long period of time, to show chronological change
– monitor corpus: frequently updated corpus of contemporary language, for identifying emerging language trends
– historical corpus: uses historical texts to show language change over a long period of time
learner corpus: content produced by learners of a language, used to study language learning
multilingual corpus: encompasses text in two or more languages
– parallel corpus: versions of the same texts in two (or more) different languages, aligned with each other to aid in identifying translations
– comparable corpus: a set of corpora in multiple languages on the same topic, but not translations of the same text
• Corpus task: 15 minutes in pairs or groups of three
– Choose an entry or entries from a list of suggestions from the Oxford English Corpus and use Sketch Engine to create a framework entry for a basic dictionary
– It should include: part(s) of speech, sense division, illustrative example(s)
– Followed by 15 minutes of group discussion as to
• what you found interesting about using corpus
• how you might use it in the future
Suggested corpus look-up entries:
aerial
alcoholic
Austrian
curve
insult
manual
queue
riot
remedy
throttle
Tools that may be of interest
• Sketch Engine (https://app.sketchengine.eu/#open): corpus software and corpora in many languages, including WebBootCaT (https://www.sketchengine.eu/guide/create-a-corpus-from-the-web/): corpus builder
• NoSketch Engine (https://www.sketchengine.eu/nosketch-engine/): free version with limited functionality
• TextSTAT (http://neon.niederlandistik.fu-berlin.de/en/textstat/): corpus software
• AntConc (http://www.laurenceanthony.net/software/antconc/): corpus software
• Wordsmith (https://www.lexically.net/wordsmith/): corpus software
• BYU corpora (https://www.english-corpora.org/): corpora in English and Spanish (https://www.corpusdelespanol.org/)
• CQPWeb (https://cqpweb.lancs.ac.uk/): corpora in many languages
• Lexonomy.eu (https://www.lexonomy.eu/): dictionary editing and publishing tool
Thank you
With special thanks to: everyone at Afrilex who made this workshop possible, the team at Lexical Computing Ltd for access to Sketch Engine and the Oxford English Corpus, and the team at OUP, especially Andy Allen, Tressy Arts, Emma Davies, Phillip Louw, Katherine Martin, Judy Pearsall, and Angus Stevenson
Marzanne Janse van Rensburg and Vivian Marr Oxford University Press