Int J Lexicography 2008 Kaalep 369 94


    CREATING SPECIALISED DICTIONARIES FOR FOREIGN LANGUAGE LEARNERS: A CASE STUDY

    Heiki-Jaan Kaalep: Department of General Linguistics, University of Tartu, and Jaan Mikk: Department of General Education, University of Tartu, Ülikooli 18, 50090 Tartu, Estonia ([email protected])

    Abstract

    The paper describes a set of 12 specialised Estonian-Russian dictionaries for Russian schools, motivated by the socio-cultural context in Estonia, which favours Russian-speaking people learning Estonian. The dictionaries are of the L1-L1-L2 type and include terms, their main inflectional forms, explanations and the Russian translation of each term. To make the dictionaries as comprehensible as possible, the rules for clear writing were followed. Natural language processing tools were used to facilitate the work of the dictionary compilers by providing feedback on the vocabulary they used and by automatically generating the inflectional forms and the asterisks that mark references to terms in the explanations. The dictionaries were printed on paper and made available online free of charge.

    1. Introduction

    When creating dictionaries, various theoretical and practical considerations must be taken into account. The first practical factor, which influences all the others, is the amount of planned work, that is, the cost of the whole enterprise. This depends essentially on the size of the dictionary and on whether several dictionaries have to be created simultaneously within a short period, in which case the issue becomes even more complicated.

    Dictionaries have to suit the needs of their future users. For example, depending on the type of language knowledge, a dictionary can be mono-, bi- or multilingual; depending on the type, scope and depth of domain or world knowledge, a dictionary can be general or specialised (terminological). Further distinctions are possible (Swanepoel 2003), but in real life a dictionary often represents a mixed type, seeking to cater to the different needs of its potential users.

    International Journal of Lexicography, Vol. 21 No. 4. Advance access publication 5 June 2008. © 2008 Oxford University Press. All rights reserved. For permissions, please email: [email protected]. doi:10.1093/ijl/ecn017

    Downloaded from http://ijl.oxfordjournals.org/ at Universitatea Transilvania on January 8, 2014

    In what follows, we present the choices we made in the course of the project, the aim of which was to create a set of 12 specialised Estonian-Russian school dictionaries containing over 14,000 terms altogether. The electronic versions of the dictionaries are freely available at http://www.keeleveeb.ee.

    2. The language situation in Estonia

    One third of the Estonian population is of non-Estonian origin. This group speaks Russian, and most of its members are not able to communicate in Estonian. The Russian-speaking minority immigrated to Estonia in the Soviet era, when many enterprises needed a labour force, especially in the capital, Tallinn, and in north-eastern Estonia.

    Nowadays, Estonian is the only official language in Estonia. Knowledge of Estonian makes it easier to find a job and to advance in a career. In 1998–2000, the unemployment rate was 12% among Estonians and 20% among non-Estonians (Pavelson and Luuk 2002: 102). Knowledge of the Estonian language was the most important factor in finding employment in the two largest cities in Estonia (Pavelson 1998: 217). In Tallinn in 1999, the net salary of an employed non-Estonian was only 67% of that of an Estonian employee (Pavelson and Luuk 2002: 99). All this is an important motivation for non-Estonian-speaking people to learn Estonian.

    The Estonian government formulated the State Programme for Integration. The aim of the programme is to help the Russian-speaking minority participate fully in the economic, cultural and political life of Estonia. To reach this aim, different measures have been planned, including teaching Estonian to adults and teaching it in Russian-speaking schools. In these schools, some subjects are now taught in Estonian, and the proportion of such subjects is increasing. Graduates of Russian-speaking compulsory basic schools should be able to continue their studies at higher levels (secondary schools), where some subjects will be taught in Estonian (Development 2004: 43). The programme for adult education stresses that a terminological minimum in Estonian should be elaborated for the minority groups (Riiklik programm 2000). The importance of acquiring terminology in specialist fields is based on the finding that the words of a specialist language are repeated more often in a text than the common words of everyday language (Kownacki 1955). If students know the specialist terms in a language, they are prepared to understand texts in that specialist field.

    3. The basic requirements for the dictionaries

    In 2004, the Integration Foundation for non-Estonians issued a call for tenders to compose specialised dictionaries in Estonian covering twelve curriculum subjects: art, biology, chemistry, geography, history, human studies, handicraft, domestic science and industrial arts, mathematics, music, physical education, physics and social science. The aim of the dictionaries was to help students in Russian-speaking schools learn subjects in Estonian. In some subject areas, specialised Estonian dictionaries and even textbooks were missing altogether at that time.

    The basic model of the dictionaries envisaged by this call for tenders followed the L1-L1-L2 principle used in the Kernerman series of dictionaries. In short, the term and its explanation are given in Estonian (L1-L1), and the term is then also translated into Russian (L2).

    What was the motivation for choosing L1-L1-L2 instead of L1-L2, given that research on general language dictionary use has shown that learners tend to prefer L1-L2 dictionaries (e.g. Piotrowsky 1989, Koren 1997, Hsien-jen 2001, Laufer and Levitzky-Aviad 2006), and that the evidence about the superiority of general bilingualised over bilingual dictionaries in language learning is inconclusive (Pujol et al. 2006)?

    A bilingual dictionary may well do without explanations if the translated words stand for something familiar to the user, for example door or worm. Imagine, however, that after finding the translation in a dictionary, the user discovers that he does not know what the word in his native language means, as may well be the case for corundum (Estonian korund, Russian корунд; Tamm et al. 2005). In this case, the dictionary should provide an explanation: after all, the goal is to help the user understand the meaning.

    One would expect the majority of headwords in a specialised dictionary to be of this last type, at least for lay users. Worse still, an ordinary word may have a slightly different specialised meaning in a subject field, so that a mere translation would be imprecise, possibly even misleading. For example, the translation of laine 'wave' would be волна 'wave', invoking a mental image of a disturbed water surface. In physics, however, the term wave is defined as a moving oscillation (Partel 2005). Thus, the core meaning of the general-language word invoked by a translation would distract the user from the actual concept in the specialised field.

    In fact, L1-L1-L2 is a rather common type of specialised dictionary, especially if the dictionary is multilingual, that is, if L2 stands for the presentation of more than one language.

    The aim of the dictionaries was to give Russian-speaking people a practical tool for reading curriculum-related texts in Estonian, using subject textbooks and discussing topics in Estonian. The subject-related dictionaries had to supplement ordinary bilingual dictionaries by giving definitions or explanations (hereafter, explanations) of the terms in Estonian, together with the translation of the terms into Russian (Development 2004: 42).

    The dictionaries also had to include Russian-Estonian glossaries of terms. These lists were meant to help Russian-speaking students when they write a text in Estonian and cannot recall the Estonian term.

    The dictionaries had to be user-friendly for informal learning. The explanations of the terms had to be easily understandable, and no complicated abbreviations, symbols or codes were to be used (Development 2004: 46). Examples and illustrations were to be added where necessary for a better understanding of the explanations. Estonian is a highly inflective language, and therefore the terms had to be given in each of their main forms: the nominative, genitive and partitive cases in the singular and the partitive in the plural. The terms had to be presented in alphabetical order. The total number of terms in the 12 dictionaries had to be 13,000–15,000.

    4. Elaborated model for the dictionaries

    The prescribed aim of the specialised dictionaries was to support the acquisition of Estonian for special purposes, that is, to support two cognitive tasks of the user: acquiring the concepts of a specialist field, and acquiring the linguistic means for expressing these concepts.

    The basic requirements were not specific enough to serve as a model for the dictionaries, so we had to elaborate on them.

    4.1 User profile

    The profile of the intended user is determined along two dimensions: encyclopaedic competence (i.e. knowledge of the field) and foreign-language competence (Bergenholtz and Tarp 1995: 21). Our users would presumably possess a low level of both encyclopaedic and linguistic competence. In addition, we would not expect them to have sophisticated dictionary skills per se. This user profile requires explanations to be written in simple language. It also means that the conventions and symbols used in the dictionaries should be self-explanatory and that the form of the dictionary entries should be simple.

    4.2 Linguistic functions

    Bilingual dictionaries are divided into reception-oriented and production-oriented dictionaries, depending on whether they emphasise helping the user to grasp the meaning of foreign words (reception) or to find the proper way to express ideas in the foreign language (production).

    Our dictionaries, having Estonian headwords and explanations, would be seen first of all as reception-oriented. However, in order to produce Estonian, it is not enough to know the word in its lemma form: one has to create the proper inflectional form, and this may be difficult even for a native Estonian, especially if the word is rarely used outside the specialised field. By providing inflectional forms for every lemma, the dictionaries thus also become production-oriented. It therefore seems appropriate to provide users with even more information for language production.

    In addition to the inflectional forms of the headword, the dictionaries should also show the inner structure of compound words, for example sulamis_temperatuur 'melting temperature'. Estonian is similar to German in that word compounding by concatenation is very productive, and the proportion of compounds in specialised language is even higher than in general language, where more than 12% of the tokens in a running text are compounds (Kaalep 1997). For a language learner, information on word structure helps in understanding and memorizing words.
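    As a minimal sketch, assuming that the underscore in a headword such as sulamis_temperatuur is the only compound-boundary mark (the pronunciation marks used in the printed dictionaries are not modelled), the constituent parts and the ordinary spelling can be recovered as follows. The function names split_compound and plain_spelling are hypothetical.

```python
def split_compound(headword: str, boundary: str = "_") -> list[str]:
    """Return the constituent parts of a compound headword.

    Assumes `boundary` is the only structural mark in the string;
    pronunciation marks are not handled in this sketch.
    """
    return [part for part in headword.split(boundary) if part]


def plain_spelling(headword: str, boundary: str = "_") -> str:
    """Return the ordinary written form, with boundary marks removed."""
    return headword.replace(boundary, "")


print(split_compound("sulamis_temperatuur"))  # ['sulamis', 'temperatuur']
print(plain_spelling("sulamis_temperatuur"))  # sulamistemperatuur
```

    Keeping the marked form as the stored headword and deriving the plain spelling from it means only one version of each term has to be maintained.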

    The presentation of the headword and its inflectional forms should also reflect pronunciation. Estonian orthography is not entirely phonetic: irregular stress and the duration of syllables are not marked in writing. Special symbols in the headword and its inflectional forms (which often have a different pattern of syllable durations) would help the user acquire the language.

    Neither word structure nor pronunciation is normally made explicit in specialised dictionaries. Moreover, the full presentation of the basic inflectional forms of the headword is new in Estonian specialised lexicography: traditionally, Estonian dictionaries give a number referring to an inflectional paradigm instead. The user has to look this number up in a separate morphology section of the dictionary and use his language competence to infer the inflected forms of the current word by analogy with the example paradigm in that section.

    4.3 Conceptual functions

    A specialised dictionary should not limit itself to merely defining terms and presenting their translations. The terms represent the concepts of a particular field, and they are connected in various ways: part–whole, tool–product, actor–impact etc. In chemistry, for example, we have chemical substances, the reactions they are involved in and the way they are described via chemical formulas, all closely interconnected. The relationships between chemical substances are often expressed in explanations, for example metaan . . . lihtsaim süsivesinik valemiga CH4 . . . 'methane . . . the simplest hydrocarbon, with the formula CH4 . . .' (Tamm et al. 2005: 78).

    A dictionary should make the existence of these relationships explicit so that the user gains a better understanding of the subject. (Note that the conceptual structure is sometimes considered so important that the entries are presented systematically, according to their positions in the subject field, rather than alphabetically, thus sacrificing ease of searching for conceptual clarity.)

    If the entries are sorted alphabetically, then conventionally the connections between them are made explicit via cross-references, which occupy a special position in the dictionary microstructure and are signalled by special phrases like see also or symbols like arrows. However, we need not limit our connections to these cross-references only.

    The specialised dictionaries contain definitions, and if a definition contains a word that is also a term in the subject field, and is thus presented as a headword in the same dictionary, this word should be marked. Such marking is often used in encyclopaedias and terminological dictionaries (see Dancette and Rethore 2000), including ISO standards (e.g. ISO/IEC 2382 Information Technology – Vocabulary). It makes it easier for users to understand the definition and also saves time if they have to look up the meaning of the word: they know whether they will find it in the current dictionary or should look for it elsewhere.

    There are concepts for which synonymous terms are used, for example naatriumkloriid, keedusool 'sodium chloride, common salt' (Tamm et al. 2005: 87). Whatever the reason for this synonymy, and whether it is seen theoretically as a source of misunderstanding and misconceptualisation (a view held by traditional terminologists: Erelt 1982, Kull 2000, ISO 704 2000) or as actually helping to communicate ideas (Saari 1980, Temmerman 1997), the existence of synonymous terms should be clearly communicated to the dictionary user.

    Thus, it was decided that whenever a concept is expressed using synonymous terms, they should all be listed in one entry, together with the explanation of the concept and the Russian term (together with its synonyms). Naturally, the synonyms are also listed as reference entries, without the explanations and Russian terms, in order to simplify searches while still imposing a concept-oriented view of the subject field.

    4.4 Microstructure

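    Based on the considerations above, our dictionaries contain the following information types in the main entries and reference entries. A main entry in our dictionary represents one concept and contains up to 8 fields:

    term [declined or conjugated forms of the term] SUBJECT FIELD (synonymous term) definition or explanation. Examples. See also related term. Russian translation(s) of the term.

    (1) term

    (2) [declined or conjugated forms of the term]

    (3) SUBJECT FIELD

    (4) (synonymous term)

    (5) definition or explanation

    (6) examples

    (7) See also related term

    (8) Russian translation(s) of the term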

    Let us look at an example from a mathematics dictionary (Abel and Lepmann 2005).

    eg.iptuse k.olm_n.urk [-nurga, -n.urka, -n.urki e -n.urkasid] GEOMEETRIA täisnurkne kolmnurk, mille külgede pikkused on 3, 4 ja 5 ühikut. египетский треугольник

    ('Egyptian triangle' [singular genitive, singular partitive and two forms of the plural partitive] GEOMETRY a right-angled triangle whose sides have lengths of 3, 4 and 5 units. Russian term)

    Field 2 contains the following forms: for declinable words, the singular genitive, singular partitive and plural partitive; for verbs, the infinitive and the first person singular of the indicative present. Parallel forms are also given. The forms are needed to show the inflectional pattern of the word. Special marks are used in both fields 1 and 2 to make word structure and pronunciation explicit: they denote extra-long and irregularly stressed syllables and mark the border between the parts of a concatenated compound.

    Fields 3 (subject field label), 4 (synonym), 6 (examples) and 7 (reference to a related term) are optional. Field 3 (subject field label, for example ALGEBRA in mathematics) is used in only some of the dictionaries.

    Field 5 may contain words that are defined as terms in the same dictionary. In that case, the words are marked with an asterisk (*) to show the user that they may be looked up in the same dictionary.
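    The asterisk marking can be sketched as a simple pass over the explanation text. This is a minimal illustration, not the project's actual tool: it assumes single-word headwords, places the asterisk before the word (the exact placement in the printed dictionaries is not specified here), and uses a hypothetical lemma_of dictionary as a stand-in for the morphological analysis that an inflected language like Estonian would require.

```python
from __future__ import annotations

import re


def mark_terms(explanation: str, headwords: set[str],
               lemma_of: dict[str, str] | None = None) -> str:
    """Prefix every word of `explanation` that is a headword with '*'.

    `lemma_of` is a hypothetical stand-in for a morphological analyser:
    it maps a lower-cased inflected form to its lemma. Without it, only
    words that already appear in their headword form are marked.
    """
    lemma_of = lemma_of or {}

    def mark(match: re.Match) -> str:
        word = match.group(0)
        lemma = lemma_of.get(word.lower(), word.lower())
        return "*" + word if lemma in headwords else word

    # \w+ matches runs of Unicode letters/digits, so ä, õ, ö, ü stay inside words.
    return re.sub(r"\w+", mark, explanation)


text = "täisnurkne kolmnurk, mille külgede pikkused on 3, 4 ja 5 ühikut."
print(mark_terms(text, {"kolmnurk"}))
# täisnurkne *kolmnurk, mille külgede pikkused on 3, 4 ja 5 ühikut.
```

    Multiword headwords such as egiptuse kolmnurk would need a phrase-matching step on top of this word-by-word pass.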

    If a term has more than one meaning, then the meanings are differentiated using numbers.

    A reference entry serves to guide the user from a synonymous term to the main entry and contains 3 fields:

    term <declined or conjugated forms of the term> See main term.
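    The two entry types can be modelled as simple records. The sketch below assumes the eight-field numbering given above; the class and function names, the rendering of field 7 as "See also ...", and the omission of pronunciation marks are all illustrative choices rather than the project's actual data model. Reference entries are derived from the synonyms listed in a main entry, mirroring the concept-oriented layout.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MainEntry:
    """One concept; field numbers follow the microstructure described above."""
    term: str                                          # field 1
    forms: list[str]                                   # field 2
    explanation: str                                   # field 5
    russian: list[str]                                 # field 8
    subject_field: str | None = None                   # field 3 (optional)
    synonyms: list[str] = field(default_factory=list)  # field 4 (optional)
    examples: str | None = None                        # field 6 (optional)
    see_also: list[str] = field(default_factory=list)  # field 7 (optional)

    def render(self) -> str:
        parts = [self.term, "[" + ", ".join(self.forms) + "]"]
        if self.subject_field:
            parts.append(self.subject_field.upper())
        if self.synonyms:
            parts.append("(" + ", ".join(self.synonyms) + ")")
        parts.append(self.explanation)
        if self.examples:
            parts.append(self.examples)
        if self.see_also:
            parts.append("See also " + ", ".join(self.see_also) + ".")
        parts.append(", ".join(self.russian))
        return " ".join(parts)


@dataclass
class ReferenceEntry:
    """Three fields: synonymous term, its forms, pointer to the main term."""
    term: str
    forms: list[str]
    main_term: str

    def render(self) -> str:
        return f"{self.term} <{', '.join(self.forms)}> See {self.main_term}."


def reference_entries(entry: MainEntry,
                      synonym_forms: dict[str, list[str]]) -> list[ReferenceEntry]:
    """Create one reference entry for each synonym listed in a main entry."""
    return [ReferenceEntry(s, synonym_forms.get(s, []), entry.term)
            for s in entry.synonyms]


entry = MainEntry(
    term="egiptuse kolm_nurk",
    forms=["-nurga", "-nurka", "-nurki e -nurkasid"],
    explanation="täisnurkne kolmnurk, mille külgede pikkused on 3, 4 ja 5 ühikut.",
    russian=["египетский треугольник"],
    subject_field="geomeetria",
)
print(entry.render())
```

    Deriving reference entries from main entries, instead of storing them separately, keeps the synonym lists and the cross-references from drifting apart.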

    5. Workflow

    The stages in compiling one dictionary and the average effort in man-hours were as follows:

    (1) Compiling the list of terms: 100 hours.

    (2) Reviewing the list: 15 hours.

    (3) Changing the list: 20 hours.

    (4) Writing explanations, examples and Russian equivalents: 500 hours.

    (5) Content editing of the dictionary: 100 hours.

    (6) Computer-based composition of the list of words in the explanations, comparison of the list with the frequency dictionary of the Estonian language, and marking the terms in the explanations with asterisks: 5 hours.

    (7) Rewriting the explanations, excluding difficult words and including some new terms: 40 hours.

    (8) Computer-based composition of the Russian-Estonian glossary and marking the terms in the explanations: 1 hour.

    (9) Reviewing the manuscript of the dictionary: 40 hours.

    (10) Considering the reviewers' remarks and language editing: 80 hours.

    (11) Computer-based composition of the glossary, marking the terms in the explanations and adding the inflectional base forms of the term, together with special marks explicating word structure and pronunciation: 40 hours.

    (12) Approval of the dictionaries by the Textbook Approval Committees of the Ministry of Research and Education; considering the Committee's remarks and, if needed, repeating the computer-based analysis of step 11.

    (13) Adding the illustrations and converting the dictionary to its final layout.

    (14) Printing and distributing the dictionary.

    (15) Making the dictionary available on the Internet: 40 hours.
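    Summing the per-stage figures listed above gives a rough picture of where the effort went. This is a simple worked calculation; stages 12 to 14 have no stated figure and are left out, so the total is a lower bound on the effort per dictionary.

```python
# Average effort per stage (man-hours) as listed above; stages 12-14
# have no stated figure and are therefore left out of the totals.
HOURS = {1: 100, 2: 15, 3: 20, 4: 500, 5: 100, 6: 5, 7: 40,
         8: 1, 9: 40, 10: 80, 11: 40, 15: 40}
IT_STAGES = {6, 8, 11}  # computer-based stages performed by IT specialists

total = sum(HOURS.values())
it_hours = sum(HOURS[s] for s in IT_STAGES)
writing_share = HOURS[4] / total  # stage 4: writing explanations etc.

print(total, it_hours, f"{it_hours / total:.1%}")  # 981 46 4.7%
print(f"{writing_share:.0%}")                      # 51%
```

    On these figures, the computer-based stages account for under 5% of the quantified effort, while writing the explanations alone takes about half.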

    Steps 6, 8 and 11 were performed by IT specialists, so the dictionary compilers did not have to worry about them. In step 8, the Russian-Estonian glossary was compiled, listing the Russian terms in alphabetical order, each with its Estonian translation or translations, for example размножение 1 paljundamine 2 paljunemine 3 sigimine (Toom and Teller 2005: 181). (The Russian term 'replication' had three translations in Estonian.) The intensive use of computers gave the compilers more time to concentrate on stages 1–5, 7 and 10, which were decisive in ensuring the quality of the dictionaries.
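    The glossary of step 8 amounts to inverting the Estonian-to-Russian pairs collected in the main entries. The following is a minimal sketch under that assumption; the function name invert_glossary is hypothetical, and plain code-point sorting is used as a simplification (Russian ё and Estonian collation both differ from code-point order).

```python
from collections import defaultdict


def invert_glossary(est_to_rus: dict[str, list[str]]) -> list[str]:
    """Build Russian-Estonian glossary lines from Estonian-Russian pairs."""
    rus_to_est: dict[str, set[str]] = defaultdict(set)
    for est_term, rus_terms in est_to_rus.items():
        for rus_term in rus_terms:
            rus_to_est[rus_term].add(est_term)

    lines = []
    # sorted() compares code points: adequate for most Russian words, but
    # ё (U+0451) sorts after я, so a real glossary needs proper collation.
    for rus_term in sorted(rus_to_est):
        est_terms = sorted(rus_to_est[rus_term])
        if len(est_terms) == 1:
            lines.append(f"{rus_term} {est_terms[0]}")
        else:
            # Several translations are numbered, as in the example above.
            numbered = " ".join(f"{i} {t}" for i, t in enumerate(est_terms, 1))
            lines.append(f"{rus_term} {numbered}")
    return lines


glossary = {"paljundamine": ["размножение"],
            "paljunemine": ["размножение"],
            "sigimine": ["размножение"]}
print(invert_glossary(glossary))
# ['размножение 1 paljundamine 2 paljunemine 3 sigimine']
```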

    The software used at different stages and by different dictionary compilers was heterogeneous. Some compilers used a customised Microsoft Access relational database made by Arvi Tavast, while others used off-the-shelf consumer products such as word processors and spreadsheets and exported their data to the database at some stage. The language processing tools were run on Unix and Linux. The exchange format between the different stages and programs was XML. This proved convenient for the textual data, but illustrations and some mathematical equations had to be handled ad hoc, using a mechanism of placeholders. Non-textual data was inserted only in the final layout version of the dictionary.
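    The placeholder mechanism might look roughly like the sketch below. The element names (entry, term, explanation, placeholder, russian) and the identifier fig-1 are hypothetical, since the project's actual XML schema is not described here; the point is only that an illustration travels through the pipeline as a reference, not as data.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def entry_to_xml(term: str, explanation: str, russian: str,
                 figure_id: str | None = None) -> str:
    """Serialise one entry; all element names here are hypothetical."""
    entry = ET.Element("entry")
    ET.SubElement(entry, "term").text = term
    ET.SubElement(entry, "explanation").text = explanation
    if figure_id is not None:
        # Only a reference travels through the pipeline; the image itself
        # would be inserted at the final layout stage.
        ET.SubElement(entry, "placeholder", {"type": "figure", "ref": figure_id})
    ET.SubElement(entry, "russian").text = russian
    return ET.tostring(entry, encoding="unicode")


xml_text = entry_to_xml(
    "egiptuse kolmnurk",
    "täisnurkne kolmnurk, mille külgede pikkused on 3, 4 ja 5 ühikut.",
    "египетский треугольник",
    figure_id="fig-1",  # made-up identifier for illustration
)
print(xml_text)
```

    Because every tool only has to pass the placeholder element through unchanged, the text-processing stages need no knowledge of image or equation formats.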

    6. The tasks of the lexicographers

    6.1 Choosing the terms

    Specialised dictionaries differ in many ways from general-purpose dictionaries, which represent the most prototypical field of study in lexicography. When creating a general-purpose dictionary, a lexicographer starts from words and tries to explain or translate their meanings, thus following a form-based, semasiological approach. Terminology theory, however, dictates that when compiling a specialised dictionary, the author (a terminologist) should start from meanings (precisely defined concepts in the subject field) and find the words (terms) that correspond to these concepts, thus following a concept-based, onomasiological approach. If we view dictionary making as similar to creating a database, we may say that the lexicographer regards the word as the primary key, while the terminologist regards the concept as the primary key.
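    The database analogy can be made concrete with a small sketch. Here the store is keyed by concept (the terminologist's view), and the word-keyed view (the lexicographer's) is derived from it. The concept IDs are hypothetical, and the everyday sense of laine is included only to show that one word key can point to several concepts, while one concept key can hold several synonymous terms.

```python
from collections import defaultdict

# Concept-keyed store (terminologist's view): one record per concept,
# listing all of its synonymous terms. The concept IDs are hypothetical.
concept_db = {
    "NaCl": {"terms": ["naatriumkloriid", "keedusool"],
             "gloss": "sodium chloride, common salt"},
    "wave/physics": {"terms": ["laine"], "gloss": "a moving oscillation"},
    "wave/everyday": {"terms": ["laine"], "gloss": "a disturbed water surface"},
}


def word_index(db: dict[str, dict]) -> dict[str, list[str]]:
    """Derive the word-keyed view (lexicographer's): term -> concept IDs."""
    index: dict[str, list[str]] = defaultdict(list)
    for concept_id, record in db.items():
        for term in record["terms"]:
            index[term].append(concept_id)
    return dict(index)


index = word_index(concept_db)
print(index["laine"])      # ['wave/physics', 'wave/everyday']
print(index["keedusool"])  # ['NaCl']
```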

    This difference between the views of the lexicographer and the terminologist is justified by the fact that a word in a human language can have multiple fuzzy meanings, while the terminologist strives for terms that represent precise concepts, that is, unique and unambiguous meanings. The terminologist's task is manageable only as long as it is confined to a limited domain (chemistry, mathematics etc.).

    In reality, the working habits of lexicographers and terminologists are not as different as the theoretical considerations would suggest: both often work in a way that combines elements of the semasiological and onomasiological approaches (Bowker 2003: 155).

    The dictionaries were designed to support the reading of subject texts and the learning of school subjects in Estonian. For this reason, we compiled the lists of terms in accordance with the school textbooks. Terms were sought in all the texts, and the words in bold print in particular were considered for inclusion in the list of terms. Some textbooks contained glossaries of terms; these were used as well. However, for some subjects, for example music and handicraft, the textbooks did not cover all the necessary topics. In such cases, the authors of our dictionaries had to rely on their own experience in choosing the terms. Proper nouns were not included as terms, except in the history dictionary.

    The dictionaries were in Estonian, and the authors were Estonian-speaking university lecturers or schoolteachers. At the same time, the dictionaries were to be used by Russian-speaking students. For this reason, the lists of terms were reviewed anonymously by teachers from Russian-speaking schools, and the authors of the lists took the reviewers' remarks into account. The compiled lists of terms were also put on a website for examination by the authors of the other subject dictionaries.

    6.2 General approach to composing explanations

    In our project, we had to bear in mind, first, that the dictionaries were meant for pupils, and second, that the dictionary compilers were specialists in their subject fields and in pedagogy, but not professional lexicographers. It was therefore important to give them as much instructional help and feedback (through reviews of their work) as possible, and to use natural language processing methods to provide additional feedback.

    The biggest part of composing a dictionary is writing explanations and finding examples or illustrations. In our dictionaries, the definitions had to be scientifically correct but free of the small details that usually make definitions difficult to understand. The explanations had to be short: we were composing dictionaries, not encyclopaedias. Most of the explanations were written following the model of the real definition: they included the general concept and the special characteristics of the concept being defined. For example, the term illegal was defined as 'a person who has no right to be in the country' (Piir 2005: 24). At the same time, the general term person was not explained in the dictionary, because it is among the 1,000 most frequent words in Estonian (Kaalep and Muischnek 2002: 143) and was considered familiar to the readers of the dictionary.

    In some cases, the authors explained terms using illustrations. For example, the term worm gearing was explained using Figure 1.

    Examples were also used to make the concepts more understandable to readers. For example, in the music dictionary, the term stage music was illustrated by concrete examples: opera, operetta, musical and ballet (Leppoja 2005: 40).

    6.3 Rules for clear writing

    The specialised dictionaries were composed for foreign learners, who are less familiar with the range of sentence constructions in Estonian and have a smaller vocabulary than native speakers. Therefore, the authors of the dictionaries were asked to follow the rules for clear writing. The rules were first formulated by Rudolf Flesch (1946) and are widely used in textbooks, newspapers etc. Research has shown that following the rules for clear writing facilitates learning in most cases (Klare 1963, Mikk 2000). There are three groups of rules (Mikk 2000: 157–198):

    (a) avoid complicated sentences,

    (b) prefer familiar words,

    (c) avoid abstract words.

    Figure 1: Explanation of the term worm gearing in the dictionary of handicraft, domestic science, manual training and technology (Peedisson et al. 2005: 141).

    The specific rules of every group were explained to the authors in detail. Complicated sentences can be avoided if unnecessary words are excluded, the passive voice is changed to the active, prepositional phrases are replaced by simple sentences, etc. Words outside the 10,000 most frequent Estonian words, unknown scientific terms and long words were not recommended for use in the explanations. The authors were advised to prefer words denoting objects perceivable by the senses and to add examples to the explanations.

Some numerical thresholds of acceptable complexity were also introduced. For example, sentences should not exceed 11–13 words; there should be no more than three words between grammatically connected words in a sentence; and there should be three to nine concrete words for every abstract word in the explanations.
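The sentence-length and concreteness thresholds lend themselves to a simple automatic check. The sketch below is illustrative only and is not the project's tool: it assumes a naive tokenizer, takes 13 words (the upper end of the 11–13 range) as the limit, and relies on hypothetical concrete and abstract word lists supplied by the caller. The rule about the distance between grammatically connected words would need a syntactic parser and is not checked.

```python
import re

MAX_SENTENCE_WORDS = 13      # upper end of the recommended 11-13 word range
MIN_RATIO, MAX_RATIO = 3, 9  # recommended concrete words per abstract word


def split_sentences(text):
    """Naively split text at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lower-cased word tokens; Python's Unicode regex also matches õ, ä, ö, ü."""
    return re.findall(r"\w+", sentence.lower())


def check_explanation(text, concrete_words, abstract_words):
    """Return warnings about sentence length and the concrete/abstract word ratio."""
    warnings = []
    tokens = []
    for sentence in split_sentences(text):
        words = tokenize(sentence)
        tokens.extend(words)
        if len(words) > MAX_SENTENCE_WORDS:
            warnings.append(f"sentence too long ({len(words)} words): {sentence}")
    n_abstract = sum(w in abstract_words for w in tokens)
    n_concrete = sum(w in concrete_words for w in tokens)
    if n_abstract:  # the ratio is only defined when abstract words occur
        ratio = n_concrete / n_abstract
        if not MIN_RATIO <= ratio <= MAX_RATIO:
            warnings.append(
                f"concrete/abstract ratio {ratio:.1f} outside {MIN_RATIO}-{MAX_RATIO}"
            )
    return warnings


if __name__ == "__main__":
    # Hypothetical word lists; the project's real lists are not reproduced here.
    concrete = {"silm", "käsi"}
    abstract = {"idee"}
    print(check_explanation("Idee on silm ja käsi.", concrete, abstract))
    # ['concrete/abstract ratio 2.0 outside 3-9']
```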

6.4 The process of composing explanations

The initial writing of the dictionaries was carried out in four stages. First, the authors submitted definitions for ten terms; they then sent explanations for the first, the second and the last third of their terms. After the first submissions, the team leader checked the quality of the explanations. The typical mistakes in the first drafts were as follows:

    (1) The general term was at the end of the explanation; there were more than

    three words between the term being explained and the general term.

    (2) The specific characteristics of the explained term were insufficient or

    missing altogether.

    (3) The words in the explanations were difficult to understand.

    (4) There were too many words and details in the explanations.

Ways of writing more understandable explanations were discussed with the authors, who afterwards used them as a guide.

    The dictionaries were then put on a web page and every author had the

    chance to look at how the other authors had explained their terms.

6.5 Reviewing the manuscripts

    The manuscripts were reviewed by experienced teachers in Russian-speaking

    schools. These teachers were best placed to decide if the dictionaries suited their

students' needs. The reviewers were briefly informed about the aims of the

    dictionaries. The reviewers were asked to assess

    (1) whether all the main terms of the subject were explained in the dictionary,

    (2) whether the explanations were scientifically correct, and


(3) whether the explanations were understandable for 7th–9th grade students in Russian-speaking schools.

    The reviewers did not know the names of the authors and the authors did not

    know the names of the reviewers during the review process. The reviewers made

    valuable recommendations about the selection of the terms, the formulation of

    the explanations and the translations of the terms. The authors considered the

    recommendations.

6.6 Examples of explanations

    The entries are given in Estonian and translated into English.

    Example 1.

füüsiline mina ⟨füüsilise mina, füüsilist mina, füüsilisi minasid⟩ inimese ettekujutus oma füüsilisest välimusest ja kehast. физическая картина собственного "я"; картина собственного физического "я" (Kõiv 2005: 15).

(physical ego ⟨genitive and partitive case in the singular and partitive in the plural⟩ a human being's perception of his or her physical looks and body. (two synonyms of the Russian term))

The explanation is short, as are most of the explanations in the dictionaries. The verb on 'is' is omitted after the explained term. The general term ettekujutus 'perception' is not far from the explained term if the inflectional forms placed between them are disregarded.

    Example 2.

l`iit_s`ilm ⟨-silma, -s`ilma, -s`ilmi e -s`ilmasid⟩ *silm, mis koosneb tuhandetest osasilmakestest, neist igaühes on tilluke lääts. Liitsilmad on paljudel lülijalgsetel, millega nad tajuvad esemete kuju, liikumist ja värvust. kiilidel, mesilastel, kärbestel. сложный глаз (Toom and Teller 2005: 70).

(compound eye ⟨genitive and partitive case in the singular and two forms of partitive in the plural⟩: eye consisting of thousands of partial eyes, each of which has a tiny lens. Many arthropods have compound eyes, by which they perceive the shape, movement and colour of objects. Dragonflies, honeybees and flies have compound eyes. (Russian term))

In this example, the general term is silm 'eye'. It is marked with an asterisk, indicating that the general term is also explained as a concept in the dictionary. To aid the understanding of the concept compound eye, an additional sentence states which animals have compound eyes, and concrete examples of familiar animals with compound eyes are named.

Example 3.

president`aalne valitsemis_süst`eem ⟨president`aalse -süsteemi, president`aalset -süst`eemi, president`aalseid -süst`eeme e president`aalseid -süst`eemisid⟩ riigikord, kus valitsus vastutab presidendi ees. Ameerika Ühendriigid. Vastand parlamentarism президентская форма правления (Hallik 2005: 105).

(presidential system of government ⟨genitive and partitive case in the singular and two forms of partitive in the plural⟩ system of government in which the government is accountable to the president. United States of America. Antonym parliamentary government (Russian term))

    We aimed to avoid value judgements in the explanations; however, one can

    find them on carefully reading the dictionary. For example, in the last

    explanation, the positive or negative image of the United States may be

    transferred to presidential system of government.

    7. Using linguistic software

    One reason for the absence of a certain type of information in a dictionary is

    that adding it would be an extra effort for the compiler, especially when done

    manually.

    According to the anticipated linguistic and conceptual functions of the

    dictionaries, presented in section 3, the following three types of information

    were considered worth including:

    (1) The inner structure of the concatenated compound terms and special

    marks for explicating pronunciation.

    (2) Inflectional forms of the terms.

    (3) Asterisks before the terms in the explanations to denote that they can be

    looked up in the same dictionary.

    We could manage the workload resulting from adding this information

    rather easily because we could use linguistic software: a morphological analyser

    and synthesiser (Kaalep and Vaino 2001). The program can be used online at

    http://www.filosoft.ee/. The Estonian spell-checker is a stripped-down version

    of this program, used for example in the Estonian version of Microsoft Office.

The same program is also used as one of the modules in an Estonian text-to-speech synthesiser (on Estonian speech synthesis, see also http://www.esis.ee/ist2000/IT/ioc/speech.html and http://www.dialog-21.ru/materials/archive.asp?id=7047&y=2001&vol=6078). This program was used to automatically generate the basic inflectional forms of the headwords and to add the marks for pronunciation and compound word structure.


This software had been designed to cover general language broadly and even included heuristics for guessing the inflectional forms of words missing from its lexicon.

The process of generating the inflected forms of the headwords, however, was not as smooth as hoped. A dictionary headword is not always a single word in its lemma form. There are multi-word terms, non-Estonian words (e.g. forte), headwords in the plural (e.g. imetajad 'mammalia'), and combinations of all these exceptions. Handling these exceptions accounted for most of the effort spent on the automatic conversion of the dictionaries.

Some words have restrictions on their set of inflectional forms: for example, there are no plural forms of postimpressionism (Hallik 2005). These restrictions are semantic in nature. According to the rules of the language, plural forms are possible for both stiil 'style' (Parmasto et al. 2005) and gooti stiil 'Gothic style' (Hallik 2005). It is up to the lexicographer to decide that the plural partitive forms of gooti stiil should not be given in the dictionary, on the grounds that they would sound unnatural, never being used in this particular subject field.

    The same third-party morphological program was also essential in the

    automatic tagging (with asterisks) of words in the explanations that belong to

    the list of headwords in the same dictionary. Because of the inflective nature of

    Estonian, words in an Estonian text usually occur in forms other than the

    lemma. In order to be able to establish that there is a link between a word-form

    and a headword, one should first lemmatise the inflected word, and this is the

    task of a morphological analyser. Once the word-form is lemmatised, it is a

    trivial task to spot it in the list of headwords and add the asterisk.
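The lemmatise-then-look-up step can be sketched as follows. This is a minimal illustration, not the project's code: the real work was done by the Filosoft morphological analyser, whereas here a tiny hand-written form-to-lemma table (hypothetical, covering only the example words) stands in for it, and tokenisation is naive.

```python
import re

# Hypothetical stand-in for the morphological analyser (word-form -> lemma).
# It only knows the handful of forms needed for the examples below.
TOY_LEMMAS = {
    "silma": "silm",
    "silmad": "silm",
    "silmi": "silm",
    "läätse": "lääts",
    "lülijalgsetel": "lülijalgne",
}


def lemmatise(word):
    """Return the lemma of a word-form, or the lower-cased form if unknown."""
    w = word.lower()
    return TOY_LEMMAS.get(w, w)


def add_asterisks(explanation, headwords):
    """Prefix '*' to each word whose lemma is a headword of the same dictionary.

    Only single-word headwords are handled; multi-word terms would need
    matching over sequences of lemmas.
    """
    heads = {h.lower() for h in headwords}

    def mark(match):
        word = match.group(0)
        return "*" + word if lemmatise(word) in heads else word

    return re.sub(r"\w+", mark, explanation)


if __name__ == "__main__":
    print(add_asterisks("silm, mis koosneb osasilmakestest", {"silm", "lääts"}))
    # *silm, mis koosneb osasilmakestest
```

Note that the compound Liitsilmad is not marked for silm, because its lemma is liitsilm; matching compound parts would require the analyser's compound boundaries.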

    We wanted to keep the complexity of the explanations under control. One

    way of doing this is by controlling the vocabulary that is being used: rare and

complex words should be avoided. Thus, the task of the computer was to produce a list of such unwanted words. Using the same lemmatising tool, it was easy to create a list of all the words used in the explanations. This list was then compared with the list of headwords of the same dictionary, as well as with the list of the 10,000 most frequent words from a frequency dictionary of Estonian (Kaalep and Muischnek 2002). The aim of this comparison was to find words that are in neither of these lists. It is reasonable to assume that such words would be difficult and should either be avoided in the definitions or added as new headwords and explained. The dictionary compilers received these lists during the review of their drafts.
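The comparison described above amounts to a set difference over lemmas. In this sketch, `lemmatise` is any function that maps a word-form to its lemma (a placeholder for the morphological analyser), and the frequency list passed in the example is a tiny hypothetical sample, not the 10,000-word list of Kaalep and Muischnek (2002).

```python
import re
from collections import Counter


def unwanted_words(explanations, headwords, frequent_words, lemmatise):
    """Count lemmas in the explanations that are neither headwords nor frequent words.

    `lemmatise` maps a word-form to its lemma; in the project this role was
    played by a morphological analyser, here it is any function you supply.
    """
    allowed = {w.lower() for w in headwords} | {w.lower() for w in frequent_words}
    counts = Counter()
    for text in explanations:
        # [^\W\d_] matches letters only, so numbers such as "26" are skipped.
        for token in re.findall(r"[^\W\d_]+", text):
            lemma = lemmatise(token)
            if lemma not in allowed:
                counts[lemma] += 1
    return counts


if __name__ == "__main__":
    # Identity lemmatiser and a tiny hypothetical frequency list, for illustration.
    explanations = ["Silm koosneb osasilmakestest.", "Element järjenumbriga 26."]
    print(unwanted_words(explanations, {"silm", "element"}, {"koosneb"}, str.lower))
    # Counter({'osasilmakestest': 1, 'järjenumbriga': 1})
```

Returning counts rather than a bare set lets a compiler see which difficult words recur most often and are therefore the best candidates for new headwords.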

    8. Some characteristics of the dictionaries

    The dictionaries contain altogether 12,373 concepts represented by 14,170

    terms. The dictionary of history is the biggest with 1,590 concepts represented

    by 1,897 terms; and the smallest is the dictionary of human studies with 361


concepts represented by 408 terms. Table 1 gives an overview of the size of the dictionaries in terms of concepts and terms. It also provides information about two problematic issues in specialised lexicography: multi-word terms and terms with multiple meanings, that is, terms used to denote several concepts.

The dictionary of chemistry (Tamm et al. 2005) has the largest number of terms per concept (1.23), but the dictionary of history (Hallik 2005) is the only one to contain a concept with as many as seven synonymous terms: teorent, teotöö, teoorjus, teokoormis, teokohustus, mõisatöö, mõisategu 'corvée, villeinage', the obligation of a peasant to work for the landlord for free. The smallest term-concept ratio is in the dictionary of physical education (Hein 2005): 1.03.
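The terms-per-concept ratios quoted here, and the shares of multi-word terms discussed in section 8.1, can be recomputed directly from the figures in Table 1. The short script below does only that arithmetic; the data are copied from Table 1, and the variable names are our own.

```python
# (concepts, terms, multi-word terms) per dictionary, copied from Table 1.
TABLE_1 = {
    "Art": (777, 950, 88),
    "Biology": (1384, 1523, 183),
    "Chemistry": (1151, 1410, 340),
    "Geography": (1213, 1424, 200),
    "History": (1590, 1897, 564),
    "Human studies": (361, 408, 35),
    "Handicraft, domestic science and industrial arts": (1343, 1455, 194),
    "Mathematics": (1231, 1422, 716),
    "Music": (660, 728, 107),
    "Physical education": (1003, 1031, 156),
    "Physics": (959, 1133, 410),
    "Social science": (701, 789, 184),
}

# Terms per concept: values above 1 mean some concepts have synonymous terms.
ratios = {name: t / c for name, (c, t, _) in TABLE_1.items()}
# Share of multi-word terms among all terms of a dictionary.
mw_share = {name: m / t for name, (_, t, m) in TABLE_1.items()}

if __name__ == "__main__":
    for label, table in (("terms per concept", ratios), ("multi-word share", mw_share)):
        lowest = min(table, key=table.get)
        highest = max(table, key=table.get)
        print(f"{label}: min {lowest} {table[lowest]:.2f}, max {highest} {table[highest]:.2f}")
    # terms per concept: min Physical education 1.03, max Chemistry 1.23
    # multi-word share: min Human studies 0.09, max Mathematics 0.50
```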

    8.1 Multi-word terms

The proportion of multi-word units among terms ranges from 9% in the dictionaries of art (Parmasto et al. 2005) and human studies (Kõiv 2005) to 50% in the dictionary of mathematics (Abel and Lepmann 2005), followed by 36% in physics (Pärtel 2005). This is yet another confirmation of the observation that "another striking characteristic displayed by current specialized dictionaries is the importance they give to complex terms" (L'Homme 2007).

The first thing one notices is the variability of the syntactic patterns of multi-word terms, which poses problems for various tasks of automatic natural language processing (Sag et al. 2002), including the task of generating the main inflectional forms in our dictionaries.

Table 1: Concepts and terms in the dictionaries

Subject | Concepts | Terms (of which multi-word) | Polysemous terms
Art | 777 | 950 (88) | 3
Biology | 1384 | 1523 (183) | 21
Chemistry | 1151 | 1410 (340) | 4
Geography | 1213 | 1424 (200) | 1
History | 1590 | 1897 (564) | 63
Human studies | 361 | 408 (35) | 0
Handicraft, domestic science and industrial arts | 1343 | 1455 (194) | 17
Mathematics | 1231 | 1422 (716) | 6
Music | 660 | 728 (107) | 2
Physical education | 1003 | 1031 (156) | 12
Physics | 959 | 1133 (410) | 12
Social science | 701 | 789 (184) | 1
Total | 12373 | 14170 (3177) | 142

In our dictionaries, all the multi-word terms are noun phrases. The longest contain five words and can be found in the dictionaries of history (Hallik 2005) and social science (Piir 2005), for example Inimese ja kodaniku õiguste deklaratsioon 'Declaration of the Rights of Man and Citizen', the declaration approved by the National Assembly of France in 1789. There are 8 multi-word terms with 5 components, 49 with 4 components, 340 with 3 components and 2,780 with 2 components.

In addition to variations in length, multi-word terms exhibit considerable variety in their linguistic patterns, syntactic structures and rigidity. Almost 50% of the multi-word terms in our dictionaries are adjective-noun pairs in which both components inflect freely, for example kunstlik viljastamine 'artificial insemination' (Toom and Teller 2005). Nearly 40% are noun-noun pairs in which the first noun is frozen in the genitive case and is thus syntactically an attribute, for example hulga element 'element of a set' (Abel and Lepmann 2005). Approximately 7% of the multi-word terms have more than one adjective (which inflects freely) or attribute (which is frozen), for example kangi tasakaalu reegel 'law of the equilibrium of the lever' (Pärtel 2005), läätse optiline peatelg 'optical axis of a lens' (Pärtel 2005) and otsene rahaline toetus 'direct financial support' (Piir 2005). Approximately 3% exhibit a linguistic structure different from the three previous groups, for example parameetriga võrrand 'equation with a parameter' (Abel and Lepmann 2005), inglise stiilis park 'park in the English style' (Hallik 2005) and suurem või võrdne 'greater than or equal' (Abel and Lepmann 2005). Finally, 3% of the multi-word terms are non-Estonian words, for example concerto grosso (Leppoja 2005).
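The difference between freely inflecting and frozen components is exactly what matters when the main inflectional forms of a multi-word term are generated. A minimal sketch, assuming a hypothetical toy paradigm table (genitive and partitive singular only) and a per-component flag saying whether the word inflects; in the project the forms themselves came from the morphological synthesiser, whose interface is not shown here.

```python
# Hypothetical toy paradigms: lemma -> (genitive singular, partitive singular).
PARADIGMS = {
    "kunstlik": ("kunstliku", "kunstlikku"),
    "viljastamine": ("viljastamise", "viljastamist"),
    "element": ("elemendi", "elementi"),
    "stiil": ("stiili", "stiili"),
}

CASES = ("genitive", "partitive")


def inflect_term(components):
    """Genitive and partitive singular of a multi-word term.

    `components` is a list of (word, inflects) pairs. Agreeing adjectives and
    the head noun inflect; frozen genitive attributes (e.g. 'hulga' in
    'hulga element') are copied unchanged.
    """
    forms = {}
    for i, case in enumerate(CASES):
        forms[case] = " ".join(
            PARADIGMS[word][i] if inflects else word for word, inflects in components
        )
    return forms


if __name__ == "__main__":
    print(inflect_term([("kunstlik", True), ("viljastamine", True)]))
    # {'genitive': 'kunstliku viljastamise', 'partitive': 'kunstlikku viljastamist'}
    print(inflect_term([("gooti", False), ("stiil", True)]))
    # {'genitive': 'gooti stiili', 'partitive': 'gooti stiili'}
```

The flag per component is the crucial design choice: without knowing which words are frozen attributes, a generator would wrongly inflect hulga or gooti as well.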

The high proportion of adjective-noun pairs is very different from the term patterns in English and French, where noun-noun (or noun de noun) pairs have been found to be twice as frequent as adjective-noun (or noun-adjective) pairs (Gaussier 2001). This is because Estonian uses German-type concatenative compounding for noun-noun pairs, for example laserprinter 'laser printer'.

The high proportion of multi-word terms in a specialised dictionary may raise doubts about the soundness of the compiling principles: "In specialized dictionaries, a very large portion of complex nouns has a compositional meaning (e.g. a laser printer is a printer that functions with a laser...)" (L'Homme 2007). (This may be an unfortunate example, in the sense that laser printer is also a headword in a general language dictionary (Pearsall 1999), and that Estonian (laserprinter), German (Laserprinter) and Swedish (laserskrivare) all agree that the laser printer really deserves a name of its own.) One may, however, similarly ask whether kolmnurga pindala 'area of a triangle' (Abel and Lepmann 2005) stands for a unique concept or is just a composition of triangle and area.


Mathematics is largely about calculating. In geometry, the conceptual system is based on geometric shapes and forms (triangles, cylinders, etc.), each of which has its own formula for calculating characteristics such as volume or area. So the dictionary contains terms like area of a triangle and volume of a cone, and the definitions of both contain the formulas for calculating their values.

    It is this formula that justifies the inclusion of area of a triangle in the

    dictionary, and the absence of a similar formula that results in not including,

    for instance, area of a polygon as a term.

    However, if one is not concerned about the exact method for calculating the

    areas of geometric shapes and forms, the meaning of the expression area of a

triangle looks truly compositional. We see that the compositionality of a meaning depends on the angle from which, and the closeness with which, we view the concepts involved.

    8.2 Polysemy

    In addition to the 142 instances when a term represents several concepts in one

    dictionary, there are terms that represent concepts in several fields. There are

    1,011 terms that are explained in two or more dictionaries: one in six, two in

    five, 13 in four, 129 in three and 866 in two dictionaries.
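A breakdown of this kind (how many terms occur in exactly two, three, four dictionaries, and so on) is a simple counting task once the headword lists are available. The sketch below shows one way to compute it; the miniature headword lists in the example are hypothetical, not taken from the real dictionaries.

```python
from collections import Counter


def spread_of_terms(dictionaries):
    """Return {k: number of terms explained in exactly k dictionaries}, for k >= 2.

    `dictionaries` maps a dictionary name to an iterable of its headwords.
    """
    occurrences = Counter()
    for headwords in dictionaries.values():
        occurrences.update(set(headwords))  # count each term once per dictionary
    spread = Counter(n for n in occurrences.values() if n >= 2)
    return dict(sorted(spread.items(), reverse=True))


if __name__ == "__main__":
    # Hypothetical miniature headword lists, not the real dictionaries.
    toy = {
        "physics": {"periood", "mass"},
        "chemistry": {"periood", "mass", "aatom"},
        "music": {"periood", "kvartett"},
    }
    print(spread_of_terms(toy))  # {3: 1, 2: 1}
```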

The most widespread term in the dictionaries is periood 'period', explained as a certain time span in physics (Pärtel 2005), human studies (Kõiv 2005) and art (Parmasto et al. 2005), one horizontal row in the Mendeleyev table in chemistry (Tamm et al. 2005), the smallest musical unit of a completed part in music (Leppoja 2005) and a group of repeating digits after the decimal point in a decimal fraction in mathematics (Abel and Lepmann 2005). We see that a term occurring in several dictionaries may denote either a concept common to several subject fields or quite different concepts.

Systematic polysemy is not highlighted in our dictionaries. Examples of systematic polysemy would be a group of musicians and the composition for such a group, for example kvartett 'quartet' (Leppoja 2005), or a type of performance and the composition for this performance, for example kontsert 'concert' (Leppoja 2005). In mathematics, the terms lõik 'line', kõrgus 'height' and so on denote geometric entities as well as their lengths. Both meanings have been noted in the definitions in Abel and Lepmann (2005), for example:

trapetsi k`õrgus [-k`õrguse, -k`õrgust, -k`õrgusi e -k`õrguseid] GEOMEETRIA trapetsi aluste vaheline ristlõik (või selle ristlõigu pikkus). высота трапеции

(height of a trapezoid [genitive and partitive case in the singular and two forms of partitive in the plural]: GEOMETRY perpendicular line between the bases of a trapezoid (or its length). (Russian term))

9. The use of dictionaries

    It is widely acknowledged that using dictionaries facilitates learning a foreign

    language. For example, the experiments carried out by Susan Knight (1994:

    295) revealed that students who used dictionaries learned more words and their

    text comprehension level was higher. A dictionary is especially needed for low

    verbal ability students (Knight 1994: 295), because guessing word meanings is

    not a method to be encouraged for them in foreign language learning. For

students at an advanced level in a foreign language, however, the use of bilingual dictionaries has been found to have no effect on vocabulary acquisition (De Ridder 2002).

Bilingual dictionaries may be used more often than monolingual ones, but monolingual dictionaries can foster the development of more strategies for learning a foreign language (Hsien-jen 2001).

    The dictionaries compiled facilitate learning in different ways. The

    dictionaries

    (1) present the spelling and the pronunciation of terms,

    (2) give the main inflectional forms of the terms,

    (3) introduce the meaning of the term with examples or illustrations if

    needed,

    (4) give the Russian translation of the terms,

    (5) include a Russian-Estonian glossary of terms.

Maie Soll (personal communication in October 2007) asked seven

    principals of Russian-speaking schools about the use of the dictionaries. All of

    them were very positive, except one who did not know that such dictionaries

    were available at the school. The dictionaries are used during the lessons in all

    or many subjects for which the dictionaries are available.

Teachers have found that students consider the dictionaries very useful

    for different reasons. The students are positively surprised that they can find all

    the terms of a subject in one dictionary. They use the dictionaries with pleasure.

    The students are interested in seeking the meaning of new words and they

    readily use the Russian-Estonian glossary of terms.

    Marje Peedisson conducted a survey to find what Russian-speaking

    schoolteachers (20 persons) thought about the dictionaries (Peedisson 2006).

    The teachers of handicraft and domestic science considered the dictionaries

    very useful for students because:

    (1) textbooks for this subject are only in Estonian,

    (2) the knowledge of Estonian terminology helps students integrate into

    Estonian society,

    (3) the subject includes many difficult terms which are missing in general

    dictionaries,

(4) many school subjects have dictionaries in Estonian, but handicraft and

    domestic science had no dictionary until the dictionary in this project was

    published.

    The teachers found the dictionaries very useful for themselves as well, since:

    (1) they have to learn to teach the subject in Estonian,

    (2) the dictionaries are systematic and easy to use,

    (3) the dictionaries prevent the creation of self-made terms.

    The dictionaries were meant for the Russian-speaking minority, but they are

    also used successfully by a group of Spanish-speaking children who live in

Estonia and study at an Estonian-speaking school. The Spanish-speaking children use the dictionaries when they do not understand texts, above all in physics and chemistry textbooks (Soll, personal communication in 2007). The children have no use for the Russian terms and the Russian-Estonian glossary; however, all foreign-speaking students in Estonian-speaking schools can use the main part of the dictionaries: the linguistic information about the terms and the explanations of the terms.

Users have pointed out that there are not enough copies of the dictionaries in the classroom. The number of printed dictionaries was restricted by the project budget. Six dictionaries were printed in 1,000 copies and the other six in 2,000 copies. Only large schools received 20 copies of each dictionary in the latter group, that is, one dictionary for every two students in a classroom. For normal classroom work, a dictionary is needed for every student.

    The compilation of the dictionaries has been a success for the Ministry of

    Education and Research in Estonia, and the ministry is hoping for the funds to

    compose analogous dictionaries for high school level. The dictionaries are

    useful not only for Russian-speaking schools, but for Estonian-speaking

    schools as well because some students from Russian-speaking families are

    studying in Estonian-speaking schools and they need the dictionaries.

    10. Online versions

    The number of copies of the dictionaries was insufficient to satisfy the needs of

    all the interested people. Therefore, the dictionaries were put on a web page for

    use by all people in Estonia free of charge.

The dictionaries are available online at the Estonian dictionary portal http://www.keeleveeb.ee. The portal contains links to dozens of mono-, bi- and multilingual dictionaries.

    It is generally known that online dictionaries differ from their paper

    counterparts in several ways. The advantages of electronic dictionaries have

    been well advertised: an online dictionary may be much larger, without the user

    noticing any associated negative effect (like increased search time); it may


contain multiple external links; and, most importantly, it may offer advanced search options, including (fuzzy) search in definitions and examples.

    The search facilities in an electronic dictionary make it possible to retrieve

    several matching entries at once (e.g. entries containing the search word as a

part of a multi-word expression). This is especially true for specialised terminological dictionaries: a large percentage of the terms are multi-word terms, and retrieving them automatically can be seen as one of the major strengths of electronic dictionaries. However, this increases the amount of text retrieved by a simple dictionary query, and we face the problem of displaying the results in a convenient way.

As a disadvantage, one should note that quickly browsing the dictionary contents becomes clumsy if the search has returned a long entry (or several entries), so that part of the information is hidden off-screen. Two options could be offered to the user in this situation: first, a scroll bar, so that the user can scroll through lengthy documents; or, alternatively, the output can be divided into smaller pieces presented as a list of clickable items, thus eliminating the need for scrolling. According to a user review, users are evenly split between these two choices (50:50) (User needs 2005: 10).

    We decided to present the results from one dictionary as a scrollable page.

The motivation behind our decision was that, as Keeleveeb displays the results from every dictionary under a separate clickable tab, we wanted to save the user from too much clicking; moreover, we expect the search results from every single dictionary to be rather short.

    We also decided to keep the presentation of the dictionary entries as similar

    to the paper originals as possible. We trust the professional decisions made by

    the publishing house about the layout, fonts and extra symbols used. Only a

    few changes were necessary in order to present the electronic version because

    one cannot expect all users to have the same rich set of different fonts on their

    home computers as those used for printing the dictionaries.

    Searching for a headword in a dictionary is actually not a trivial task. It is

    true that once the user asks for a word, standard algorithms can retrieve the

    corresponding entries very effectively. A problem arises if the user does not ask

    for exactly the same form that is used as a dictionary headword. In the case of

    an inflective language, with German-type compounds, it is difficult for a user to

    guess what the exact form of the queried word should be.

For example, a search for the word raud 'iron' would yield, among others, the following results from different dictionaries:

Chemistry:

r`aud [raua, r`auda, r`audu e r`audasid] ANORGAANILINE KEEMIA keemiline element järjenumbriga Z 26; praktikas väga laialdaselt kasutatav keskmise aktiivsusega metall. железо (Tamm et al. 2005).

(iron [genitive and partitive case in the singular and two forms of the partitive in the plural] INORGANIC CHEMISTRY element with sequence number Z 26; a metal with medium chemical activity, very widely used in practical applications. Russian term)

r`audbet`oon [-betooni, -bet`ooni, -bet`oone e -bet`oonisid] RAKENDUSKEEMIA raudarmatuuriga (vardad, võrk) tugevdatud betoon. железобетон (Tamm et al. 2005).

(ferroconcrete [genitive and partitive case in the singular and two forms of the partitive in the plural] APPLIED CHEMISTRY concrete strengthened with iron (bars, net). Russian term)

History:

raua ja vere poliitika [-poliitika, -poliitikat] Otto von Bismarcki seisukoht, et maailma asju otsustatakse raua ja verega, st sõjaga. политика железа и крови (Hallik 2005).

(iron and blood policy [genitive and partitive case in the singular] the view of Otto von Bismarck that the decisions in the world are made by iron and blood, i.e. by war. Russian term)

The main question in a retrieval algorithm is how to make the outcome intuitively sensible for the user. A good search would give the best match first in the list of answers, followed by the less likely ones. Only in the absence of good matches should the user get worse ones; otherwise, they should not be retrieved at all.

Our algorithm first returns the entry whose headword matches the search word exactly (if there is one). It then lists, in alphabetical order, the entries whose headwords contain the search word (in any inflectional form, case-insensitively) as a component of a compound word or a multi-word unit. See Figure 2.

In the previous example, the first result is the definition of raud 'iron', the second result is the compound word raudbetoon 'ferroconcrete, reinforced concrete', and the third result is a multi-word unit containing the word raud in its genitive form raua.
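The ranking rule can be sketched as follows, under stated assumptions: a hypothetical toy form-to-lemma table stands in for the morphological analyser, compound boundaries are assumed to be stored in the headwords as '_' marks (in the project they came from the analyser), and all headwords form one list. Because Keeleveeb shows each dictionary under its own tab, the ordering in the example above spans two dictionaries; within a single alphabetical list, raua ja vere poliitika sorts before raudbetoon.

```python
import re

# Hypothetical stand-in for the morphological analyser (word-form -> lemma).
TOY_LEMMAS = {"raua": "raud", "rauda": "raud", "raudu": "raud", "vere": "veri"}


def lemma(word):
    """Lemma of a word-form (case-insensitive); unknown forms map to themselves."""
    w = word.lower()
    return TOY_LEMMAS.get(w, w)


def plain(headword):
    """Display form for comparison: drop '_' compound marks, ignore case."""
    return headword.replace("_", "").lower()


def search(query, headwords):
    """Exact headword match first, then alphabetically sorted partial matches.

    A partial match is a headword with a component (a word of a multi-word
    unit or a part of a '_'-marked compound) whose lemma equals the query's.
    """
    q = lemma(query)
    exact = [h for h in headwords if plain(h) == q]
    partial = [
        h for h in headwords
        if h not in exact
        and any(lemma(part) == q for part in re.split(r"[\s_]+", h) if part)
    ]
    partial.sort(key=plain)
    return [h.replace("_", "") for h in exact + partial]


if __name__ == "__main__":
    heads = ["vesinik", "raua ja vere poliitika", "raud_betoon", "raud"]
    print(search("raud", heads))
    # ['raud', 'raua ja vere poliitika', 'raudbetoon']
```

Lemmatising the query as well means that a user who types an inflected form such as Rauda gets the same results, which addresses the problem of guessing the headword form described above.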

    We decided not to add the option of querying for words in the definitions or

    examples. We believe that such queries would not give any relevant

    information for the user because while compiling the dictionaries, the authors

    did not foresee this as a possibility; they compiled the dictionaries so that all the

    useful information could be extracted by searching via the headwords. Adding

    useless options would just add unnecessary complexity to the interface.

    11. Conclusion

    The task of creating a dictionary always involves the challenge of getting the

    most out of the available resources.

In specialised dictionaries for schools, it is often difficult to find a compiler

    who is at the same time competent in the subject field, pedagogy and

    lexicography. We concentrated our efforts on helping the dictionary compilers

    with pedagogical and lexicographic issues, assuming that their subject field

    knowledge was good enough. This guidance consisted of the following:

    (1) Providing a well-defined step-by-step procedure for compiling a

    dictionary, and checking that this procedure is indeed followed.

    (2) Providing human evaluation feedback at various steps of the process.

    (3) Providing a concept-oriented database for organising the dictionary

    content.

    (4) Using natural language processing tools for providing feedback on the

    vocabulary of the explanations.

    (5) Separating the content-creation process from the layout-creation process. This included the automatic creation of linguistic information for headwords and of asterisks denoting references in explanations.

    Figure 2: The result for the query raud 'iron, ferrum' from the dictionary of chemistry (Tamm et al. 2005)
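    The vocabulary feedback mentioned in item (4) of the list above could take many forms. The sketch below assumes one simple criterion for illustration: flag explanation words whose lemma is neither among the frequent words of general language (a frequency list such as Kaalep and Muischnek 2002 could supply one) nor a headword of the dictionary itself. The function name, the lemmatiser passed in as lemma, and the toy word lists are all hypothetical.

```python
import re
from typing import Callable


def unfamiliar_words(explanation: str,
                     lemma: Callable[[str], str],
                     frequent: set[str],
                     headwords: set[str]) -> list[str]:
    """Return lemmas in an explanation that are neither frequent
    general-language words nor headwords of the dictionary."""
    flagged: list[str] = []
    # \w is Unicode-aware for str patterns, so õ, ä, ö and ü are kept.
    for token in re.findall(r"\w+", explanation.lower()):
        base = lemma(token)
        if base not in frequent and base not in headwords and base not in flagged:
            flagged.append(base)
    return flagged


# Hypothetical toy data: a one-entry lemma table and tiny word lists.
TOY_LEMMAS = {"on": "olema"}
FREQUENT = {"olema", "metall"}
HEADWORDS = {"raud"}

if __name__ == "__main__":
    print(unfamiliar_words("Raud on hõbehall metall.",
                           lambda w: TOY_LEMMAS.get(w, w),
                           FREQUENT, HEADWORDS))  # ['hõbehall']
```

    Treating headwords as acceptable reflects the assumption that a word defined elsewhere in the same dictionary need not be flagged, since the reader can look it up.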

    The software components that eased the project workflow were already available; we only needed to combine them to suit our needs.

    The resulting dictionaries are rich in information (definitions, examples) and are specifically oriented towards learning Estonian, providing linguistic information on word structure, pronunciation and inflection. Using linguistically oriented software in the process enabled us to add a further feature: automatically generated cross-reference links pointing from words in the definitions to related headwords.
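    The automatic asterisks in the printed dictionaries and the cross-reference links online can be sketched as a single pass over an explanation. This minimal sketch assumes that a word refers to a headword when its lemma equals that headword, that an entry never refers to itself, and that the asterisk follows the referring word; the function name, the #lemma anchor format and the toy data are hypothetical. Multi-word headwords would need extra matching that this sketch omits.

```python
import re
from typing import Callable


def mark_references(text: str, headword: str, headwords: set[str],
                    lemma: Callable[[str], str], link: bool = False) -> str:
    """Mark words whose lemma is another headword: with an asterisk for
    print, or with an HTML anchor for the online version."""
    def repl(match):
        word = match.group(0)
        base = lemma(word.lower())
        if base in headwords and base != headword:
            return f'<a href="#{base}">{word}</a>' if link else word + "*"
        return word

    # Replace each word token; punctuation and spacing are left untouched.
    return re.sub(r"\w+", repl, text)


# Hypothetical toy data.
HEADWORDS = {"raud", "metall"}
TOY_LEMMAS = {"raua": "raud"}

if __name__ == "__main__":
    to_lemma = lambda w: TOY_LEMMAS.get(w, w)
    print(mark_references("Raud on hõbehall metall.", "raud", HEADWORDS, to_lemma))
    print(mark_references("Raud on hõbehall metall.", "raud", HEADWORDS, to_lemma, link=True))
```

    Keeping the layout decision (asterisk or link) in one flag lets the same content feed both the print and the online versions, in line with separating content creation from layout.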

    Acknowledgements

    The compilation, printing and distribution of the dictionaries were financed by the EU PHARE programme.

    The dictionaries were made available online with support from the EU eContent project Eurotermbank and the Estonian national programme for language technology.

    We are also grateful to the authors, editors, reviewers, computer specialists and others (75 specialists in all) who participated in the work.

    We are deeply grateful to the two anonymous reviewers, whose recommendations were very helpful in finalising the manuscript.

    References

    A. Dictionaries

    Abel, E. and Lepmann, L. 2005. Matemaatika mõisted 7.–9. klassile. Eesti-vene-eesti sõnastik. (Concepts of mathematics for grades 7–9. Estonian-Russian-Estonian Dictionary). Tartu: University of Tartu Press (in Estonian).

    Dancette, J. and Réthoré, C. 2000. Dictionnaire Analytique de la Distribution. Analytical Dictionary of Retailing. Montréal: Les Presses de l'Université de Montréal.

    Hallik, T. 2005. Ajaloo mõisted 7.–9. klassile. Eesti-vene-eesti sõnastik. (Concepts of history for grades 7–9. Estonian-Russian-Estonian Dictionary). Tartu: University of Tartu Press (in Estonian).

    Hein, V. 2005. Kehalise kasvatuse mõisted 7.–9. klassile. Eesti-vene-eesti sõnastik. (Concepts of physical education for grades 7–9. Estonian-Russian-Estonian Dictionary). Tartu: University of Tartu Press (in Estonian).

    Kõiv, K. 2005. Inimeseõpetuse mõisted 7.–9. klassile. Eesti-vene-eesti sõnastik. (Concepts of human studies for grades 7–9. Estonian-Russian-Estonian Dictionary). Tartu: University of Tartu Press (in Estonian).

    Leppoja, K. 2005. Muusikaõpetuse mõisted 7.–9. klassile. Eesti-vene-eesti sõnastik. (Concepts of music for grades 7–9. Estonian-Russian-Estonian Dictionary). Tartu: University of Tartu Press (in Estonian).


    Pärtel, E. 2005. Füüsika mõisted 7.–9. klassile. Eesti-vene-eesti sõnastik. (Concepts of physics for grades 7–9. Estonian-Russian-Estonian Dictionary). Tartu: University of Tartu Press (in Estonian).

    Parmasto, A., Laur, K., and Kiidron, K. 2005. Kunstiõpetuse mõisted 7.–9. klassile. Eesti-vene-eesti sõnastik. (Concepts of art for grades 7–9. Estonian-Russian-Estonian Dictionary). Tartu: University of Tartu Press (in Estonian).

    Pearsall, J. (ed.) 1999. The Concise Oxford Dictionary. Oxford: OUP.

    Peedisson, M., Rihvk, E., and Soobik, M. 2005. Käsitöö, kodunduse, töö- ja tehnoloogiaõpetuse mõisted 7.–9. klassile. Eesti-vene-eesti sõnastik. (Concepts of handicraft, domestic science, manual training and technology for grades 7–9. Estonian-Russian-Estonian Dictionary). Tartu: University of Tartu Press (in Estonian).

    Piir, I. 2005. Ühiskonnaõpetuse mõisted 7.–9. klassile. Eesti-vene-eesti sõnastik. (Concepts of civic studies for grades 7–9. Estonian-Russian-Estonian Dictionary). Tartu: University of Tartu Press (in Estonian).

    Tamm, L., Tamm, T., and Tuulmets, A. 2005. Keemia mõisted 7.–9. klassile. (Concepts of chemistry for grades 7–9. Estonian-Russian-Estonian Dictionary). Tartu: University of Tartu Press (in Estonian).

    Toom, M. and Teller, M. 2005. Bioloogia mõisted 7.–9. klassile. Eesti-vene-eesti sõnastik. (Concepts of biology for grades 7–9. Estonian-Russian-Estonian Dictionary). Tartu: University of Tartu Press (in Estonian).

    B. Other literature

    Bergenholtz, H. and Tarp, S. (eds.) 1995. Manual of Specialised Lexicography. Philadelphia/Amsterdam: John Benjamins.

    Bowker, L. 2003. 'Specialized Lexicography and Specialized Dictionaries' in P. van Sterkenburg (ed.), A Practical Guide to Lexicography. Philadelphia/Amsterdam: John Benjamins, 154–164.

    De Ridder, I. 2002. 'Visible or invisible links: does the highlighting of hyperlinks affect incidental vocabulary learning, text comprehension, and the reading process?' Language Learning and Technology 6(1), 123–146. http://llt.msu.edu/vol6num1/pdf/deridder.pdf.

    Development of Additional Estonian-Language Study Materials 2004. Contract between Ministry of Finance of Estonia and Tartu University (manuscript in the University of Tartu).

    Erelt, T. 1982. Eesti oskuskeel (Estonian specialised language). Tallinn: Valgus (in Estonian).

    Flesch, R. 1946. The art of plain talk. New York/London: Harper and Brothers Publishers.

    Gaussier, E. 2001. 'General considerations on bilingual terminology extraction' in D. Bourigault, C. Jacquemin, and M.-C. L'Homme (eds.), Recent Advances in Computational Terminology. Amsterdam/Philadelphia: John Benjamins, 167–183.

    ISO 704, 2000. Terminology work – Principles and methods. Geneva: International Organization for Standardization.

    Hsien-jen, C. 2001. The effects of dictionary use on the vocabulary learning strategies used by language learners of Spanish. ERIC No ED471315.

    Kaalep, H.-J. 1997. 'An Estonian morphological analyser and the impact of a corpus on its development'. Computers and the Humanities 31: 115–133.

    Kaalep, H.-J. and Muischnek, K. 2002. Eesti kirjakeele sagedussõnastik (Frequency Dictionary of Estonian Literary Language). Tartu: Tartu University Press (in Estonian).


    Kaalep, H.-J. and Vaino, T. 2001. 'Complete morphological analysis in the linguist's toolbox'. Congressus Nonus Internationalis Fenno-Ugristarum Pars V, Tartu, 9–16.

    Klare, G. R. 1963. The measurement of readability. Iowa: Iowa State University.

    Knight, S. 1994. 'Dictionary use while reading: The effects on comprehension and vocabulary acquisition for students of different verbal abilities'. The Modern Language Journal 78: 285–299.

    Koren, S. 1997. 'Quality versus Convenience: Comparison of Modern Dictionaries from the Researcher's, Teacher's and Learner's Points of View'. Teaching English as a Second or Foreign Language 2(3).

    Kownacki, S. 1955. 'Teaching Foreign Languages in Specialized Fields'. Modern Language Journal 39(7): 351–353.

    Kull, R. 2000. 'Oskuskeel ja üldkeel: erisused ja samasused' (Special language and general language: differences and similarities). Keel ja Kirjandus 8: 545–557 (in Estonian).

    Laufer, B. and Levitzky-Aviad, T. 2006. 'Examining the Effectiveness of Bilingual Dictionary Plus – A Dictionary for Production in a Foreign Language'. International Journal of Lexicography 19: 135–155.

    L'Homme, M.-C. 2007. 'Using Explanatory and Combinatorial Lexicology to Describe Terms' in L. Wanner (ed.), Selected Lexical and Grammatical Issues in the Meaning-Text Theory. In Honour of Igor Mel'čuk. Amsterdam/Philadelphia: John Benjamins, 163–198.

    Mikk, J. 2000. Textbook: Research and Writing. Frankfurt am Main et al.: Peter Lang.

    Pavelson, M. 1998. 'Vene noorte haridusorientatsioonid' (Educational orientations of Russian youth) in M. Lauristin, S. Vare, T. Pedastsaar and M. Pavelson, Mitmekultuuriline Eesti: Väljakutse haridusele. Tartu, 209–224 (in Estonian).

    Pavelson, M. and Luuk, M. 2002. 'Non-Estonians on the labour market: A change in the economic model and differences in social capital' in M. Lauristin and M. Heidmets (eds.), The Challenge of the Russian Minority: Emerging Multicultural Democracy in Estonia. Tartu: Tartu University Press, 89–116.

    Peedisson, M. 2006. Käsitöö ja kodunduse mõistete sõnastik 7.–9. klassi vene emakeelega õpilasele: koostamise põhimõtteid ja kasutamise võimalusi. (Dictionary of concepts of handicraft and domestic science for the Russian-speaking student in grades 7–9). MA dissertation, University of Tartu (in Estonian).

    Piotrowsky, T. 1989. 'Monolingual and bilingual dictionaries: Fundamental differences' in M. L. Tickoo (ed.), Learners' Dictionaries: State of the Art. Singapore: SEAMEO RELC.

    Pujol, D., Corrius, M., and Masnou, J. 2006. 'Print Deferred Bilingualised Dictionaries and their Implications for Effective Language Learning: A New Approach to Pedagogical Lexicography'. International Journal of Lexicography 19: 197–215.

    Riiklik programm 'Integratsioon Eesti ühiskonnas 2000–2007' (State programme 'Integration in Estonian Society in 2000–2007'). http://www.riik.ee/saks/ikomisjon/programm.htm (in Estonian).

    Saari, H. 1980. 'Omasõna ja võõrsõna paarid eesti oskussõnavaras (1)' (The pairs of native and foreign words in Estonian specialised word stock). Keel ja Kirjandus 11: 654–666 (in Estonian).

    Sag, I. A., Baldwin, T., Bond, F., Copestake, A., and Flickinger, D. 2002. 'Multiword Expressions: A Pain in the Neck for NLP' in Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics (CICLING 2002). Mexico City, 1–15.


    Swanepoel, P. 2003. 'Dictionary typologies: A pragmatic approach' in P. van Sterkenburg (ed.), A Practical Guide to Lexicography. Philadelphia/Amsterdam: John Benjamins, 44–69.

    Temmerman, R. 1997. 'Questioning the Univocity Ideal. The difference between socio-cognitive Terminology and traditional Terminology'. Hermes Journal of Linguistics 18: 51–90.

    User Needs Consolidation – Requirements specification report 2005. http://www.eurotermbank.com/uploads/D3.1%20User%20Needs%20and%20Requirements.pdf.
