COLLOCATION, COLLIGATION AND ENCODING DICTIONARIES. PART II: LEXICOGRAPHICAL ASPECTS
Dirk Siepmann: Universität-GH Siegen, Fachbereich 3, Adolf-Reichwein-Straße, D-57068 Siegen, Germany ([email protected])
Abstract
The present article starts from a broad definition of collocations as holistic
lexico-grammatical or semantic units (see Part I for full details), asking how such
units can be adequately represented in bilingual and monolingual encoding dictionaries.
It is found that an onomasiological approach to dictionary making is better suited to
this task than a semasiological, framework-based methodology whereby individual
lexicographers work on small, alphabetically classified sections of the dictionary.
Typically, semasiological dictionaries and corresponding methodologies have difficulty
in arranging items in a clear and memorable way, give patchy or inadequate coverage to
semantic-pragmatic collocations, cannot provide adequate cross-referencing between
synonymous items and are prone to translation errors. It is shown how onomasiological
dictionaries and methodologies can remedy such deficiencies. The Bilexicon project,
which aims to create thematic learners' dictionaries, is the main source drawn on
to illustrate the suggestions made.
1. Introduction
There is growing recognition that both structurally simple (i.e. (bound)
morphemes, lexemes) and structurally complex units (i.e. collocations or
colligational patterns) are linguistic signs (Feilke 2003). If the dictionary is
meant to be a record of such signs, the task of the lexicographer is to gather
together evidence of both types of sign. So far it has been lexemes, non-
compositional idioms and morphemes that have received the bulk of
lexicographic attention, but the future clearly belongs to collocation and
colligation in the widest possible sense. However, most linguistic models of
collocation are too limited (e.g. Hausmann 1999), too formalist (e.g. Mel'čuk
1998) or too broad (e.g. Kjellmer 1994) to be readily adaptable to lexicographic
practice (see the first part of this article, IJL 18/4).
International Journal of Lexicography, Vol. 19 No. 1. Advance access publication 29 November 2005. © 2005 Oxford University Press. All rights reserved. For permissions, please email: [email protected]
doi:10.1093/ijl/eci051
A viable lexicographic definition of collocation can be based on the notions
of Gebrauchsnorm, or usage norm (Steyer 2000: 108), reflected in concepts
such as minimal recurrence (Kocourek 1991, Siepmann 2003) or statistical
significance (Sinclair 1991), on the one hand, and the notion of inhaltliche
Geschlossenheit or holisticity, on the other hand (Siepmann 2003). Holisticity
here refers to the facts that native speakers can ascribe meaning to general-
language collocations even if these are divorced from context and that such
units are intuitively considered as self-contained wholes. We thus arrive at the
following definition of collocation:
a collocation is any holistic lexical, lexico-grammatical or semantic
unit which exhibits minimal recurrence within a particular discourse
community.
It should also be taken to include colligation with a particular grammatical
category, such as a noun phrase. Thus, the collocations the future belongs to
(die Zukunft gehört, l'avenir appartient à) or l'autoroute file would be felt to
be incomplete by most speakers, requiring as they do a prepositional object.
This variable complement is conceived of as part of the collocation.
With this definition in mind, it becomes possible to suggest a four-way
typology of collocation along the following lines (see Part I):
(a) Colligation (you can stick your NP, far be it from me to INF, ignorer tout de N,
il n'y a qu'à INF, ce/cette N [tradition, etc.] est resté(e), NP dans l'âme,
typisch N, etc.); note that this definition of colligation is different from
Firth's (1957) or Hoey's (1998),¹ since it concerns not only
the grammatical preferences of individual words, but also those of longer
syntagms. Thus, the syntagm tu n'avais qu'à can be said to be in colligation
with an infinitive clause.
(b) Collocation between lexemes or phrasemes (just as clause ... so / in the
same manner clause, levy charges, briser ses chaussures, c'est-à-dire en
l'occurrence, regarde où tu vas, bon ben, à la fin, etc.).
(c) Collocation between lexemes and semantic-pragmatic (contextual)
features (beautifully [result of creative activity], [uncertainty] not so,
[question] eh bien, [expectation] duly, [negative contextual aspect]
(not) detract from s.o.'s enjoyment, help! [on such one-word collocations,
cf. González-Rey 2002: 95, 101]).
(d) Collocation between semantic-pragmatic features (e.g. long-distance
collocations, Siepmann 2005).
This typology and the notational conventions that go with it present two
major advantages with a view to lexicographic applications: they allow us to
capture the full range of collocational phenomena, and they dispense almost
entirely with complicated metalanguage such as that used in Mel'čuk's
lexicologie combinatoire et explicative (Mel'čuk et al. 1995).
In what follows, I shall discuss some of the demands the full-scale integration
of lexico-grammatical units of the type just discussed places upon commercial
monolingual and bilingual encoding dictionaries. My main concern therefore
is with the reference needs of active users, such as the native French speaker
trying to write, speak or translate into English. My thesis is that the bilingual
onomasiological rather than the semasiological dictionary constitutes the ideal
repository for the collocational and colligational units required by active
users. After a brief description of the Bilexicon project aimed at producing
near-comprehensive thematic learners' dictionaries, I shall go on to marshal
various sorts of evidence on the weaknesses of the semasiological and the
strengths of the onomasiological approach. This will lead to the conclusion
that the traditional dictionary-making process should be turned on its head:
rather than starting from an alphabetical framework it should proceed from
a bilingual or multilingual onomasiological research base.
I shall then proceed to discuss coverage of collocations in current bilingual
and monolingual dictionaries, together with suggestions for improvement.
The last two sections will be devoted to types of lemmas and limits on the
translatability of collocations.
2. A brief outline of the Bilexicon project
The Bilexicon project pursues a theoretical as well as a practical aim. On the
theoretical side, the aim is to provide a sound basis for the production of
unabridged onomasiological bilingual learners' dictionaries which focus on
collocation. On the practical side, such dictionaries are to be developed for the
language pairs English/French, English/German and French/German, both in
print and electronic form.
The project can be sketched in rough outline only. What is said here should
not be taken to suggest that the problem of describing the native-speaker
lexicon or specific sections thereof is easily solved (for a fuller account,
see Siepmann, in preparation; for a sample chapter, see the author's website
www.dirk-siepmann.de).
2.1 Rationale
The rationale behind the Bilexicon project proceeds from a paradox about
foreign language learning in higher education: language teaching specialists
have long demanded that university graduates in modern languages should
have a native-like lexical competence in their L2 (e.g. Meißner et al. 2001);
in practice, however, such a competence is seldom attained, and few serious
efforts have been made to improve attainment levels. De Florio-Hansen
(2004: 83f ) sums up the situation at German universities by stating that
students' linguistic competence does not increase significantly between the
beginning of their course of study and its successful completion.
However, to sustain a prolonged learning effort, students must be told how
many and which lexical items they have to learn before they can confidently
claim to be competent users of the foreign languages of their choice (cf. Council
of Europe 2001: 6.4.7.2). Only once this material basis for vocabulary learning
has been laid do methodological factors come into play and can realistic
assimilation targets be set.
2.2 The compilation of a native-like vocabulary
So far little research effort has been expended upon describing the extent
of native-like lexical competence in the L2. There is only one study for the
language pair German-French (Hausmann, forthcoming), whose aim it is to list
a large section of the receptive vocabulary of French which is intransparent
from a German perspective.
What Hausmann has achieved for the receptive side the Bilexicon project
aims to do for the productive side: to draw up a near-comprehensive list of
those collocations (including colligations) which may be considered to make
up a native-like vocabulary. The compilation of the native-like vocabulary
proceeds from two premises:
(a) Any attempt to determine basic and advanced vocabularies must start
from a list of all native-speaker signs (perhaps even including manual and
facial gestures), i.e. the entire lexicon of the language. The approach is thus
essentially top-down.
(b) It is from such a list that a near-native vocabulary can then be constructed.
Thus, rather than asking, as the traditional frequency approach did, 'which
are the most frequent words in the language, and which words do we need
to add to these to obtain a good working vocabulary?', this approach poses
the question 'what are the meaning units that native speakers use, and which
of these have to be mastered to be able to perform at a near-native (or lower)
proficiency level?'. It is based on the simple observation that some adult
learners can pass as native speakers of the L2 because they have perfect
pronunciation and a command of lexico-grammar which is sufficient to express
any communicative need in a correct and natural manner. Nevertheless these
learners have not normally attained the same level of lexical competence as
a native; even for them, the framing of ideas in the foreign language is
conditioned by linguistic proficiency. It is the level of vocabulary knowledge
achieved by such learners that can be described as near-native.
In theory, therefore, it should be fairly easy to establish a procedure that
might be used in compiling a near-native vocabulary. In practice, however,
such a procedure still comes up against considerable, if not insuperable,
difficulties. The procedure might look something like this. In a first step a
full-size lexico-grammar of at least one language would have to be compiled.
The main problem at this stage is to give a definition of multi-word units that is
sophisticated enough to distinguish these from lexical bundles (Biber et al.
1999) or n-grams, i.e. mere strings of word forms which occur more than once
in a corpus. Such a definition has been attempted in Part I of this article. Thus,
for example, 'at the end of the' is an n-gram retrievable from any medium-sized
corpus, but underlying it is the colligation 'at the end of the NP'.
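The difference between an n-gram and a colligation can be made concrete with a minimal sketch. The corpus below is a toy text invented for illustration, and the helper names `recurrent_ngrams` and `slot_fillers` are hypothetical; the second function looks only at the single word form following the string, which is a crude stand-in for a full noun phrase.

```python
from collections import Counter

def recurrent_ngrams(tokens, n, min_count=2):
    """Return the n-grams (tuples of word forms) that occur at least min_count times."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {gram: count for gram, count in grams.items() if count >= min_count}

def slot_fillers(tokens, gram):
    """Return the word form that follows each occurrence of gram (simplified NP slot)."""
    n = len(gram)
    return [tokens[i + n] for i in range(len(tokens) - n)
            if tuple(tokens[i:i + n]) == gram]

# Toy corpus, invented for illustration only.
text = ("we met at the end of the street . "
        "she left at the end of the film . "
        "at the end of the day he agreed .")
tokens = text.split()

bundles = recurrent_ngrams(tokens, n=5)
fillers = slot_fillers(tokens, ("at", "the", "end", "of", "the"))
print(bundles)
print(fillers)
```

The recurring string is incomplete as it stands; what recurs with it is an open slot filled by varying nouns, which is what the colligational notation with NP records.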
The frequency approach is an adaptation, to linguistic units beyond the
word level, of the traditional procedure for determining core vocabularies.
At its simplest, it uses a very large corpus to determine the frequency of
each meaning unit; units whose frequency is below a minimum threshold are
discarded. It is not difficult to see why this approach, if used exclusively,
is more or less unworkable. The main reason is that there is no such thing as
a representative corpus, and there are no very large corpora available which
can provide accurate guidance on spoken usage. Even the Internet, or sections
of it such as google.co.uk with the option 'pages from the UK', is neither
representative nor reliable as a corpus. Apart from being skewed towards the
written language, it contains large amounts of outdated and non-native speaker
material;² it is also uninformative on range and distribution, i.e. the extent to
which an item appears in several different text types.
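A rough sketch of the frequency approach, under simplifying assumptions, may help to show where it breaks down. The three 'text types' and their contents are invented, the function names are hypothetical, and units are counted by plain substring matching rather than by any real corpus tool.

```python
def frequency_and_range(corpus_by_text_type, units):
    """Map each unit to (total frequency, range), where range is the
    number of text types in which the unit occurs at least once."""
    stats = {}
    for unit in units:
        counts = [text.count(unit) for text in corpus_by_text_type.values()]
        stats[unit] = (sum(counts), sum(1 for c in counts if c > 0))
    return stats

def keep_above_threshold(stats, min_frequency):
    """Discard every unit whose total frequency is below the threshold."""
    return [unit for unit, (freq, _range) in stats.items() if freq >= min_frequency]

# Invented mini-corpus; real work would need very large, balanced samples.
corpus = {
    "forum": "i was stuck in a traffic jam again . stuck in a traffic jam for hours",
    "news": "drivers were stuck in a traffic jam on the m25",
    "fiction": "we sat in traffic and said nothing",
}
units = ["stuck in a traffic jam", "sat in traffic"]

stats = frequency_and_range(corpus, units)
kept = keep_above_threshold(stats, min_frequency=2)
print(stats)
print(kept)
```

In this toy sample a perfectly natural expression falls below the threshold simply because of the way the corpus happens to be composed, which is the weakness described above.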
In an alternative approach, each collocational or colligational unit could
be subjected to a test for economy effects. As explained above, foreign-born
speakers who pass as natives have not normally developed the same lexical
competence as native speakers; they succeed in giving a native-like impression
by recycling or creatively recombining items from what is admittedly
a vast repertoire. This repertoire, however, need not contain the hundreds of
thousands of rough formulaic synonyms that native speakers have at their
disposal. In other words, the native-like speaker can achieve considerable
economies in learning effort by acquiring just one expression for each com-
municative need. Siepmann (in preparation) suggests that such economies
manifest themselves in at least eight different economy effects resulting in the
elimination of a collocation or lexeme from the near-native vocabulary.
To take but one example, a native English speaker wishing to describe the
state of being stationary in traffic can choose from among a number of
synonymic expressions, such as be / get caught in a traffic jam, be / get caught
up in a traffic jam, be / get stuck in a traffic jam, sit in traffic, sit in a traffic jam,
be stationary, etc. For the non-native, knowledge of just one of these expres-
sions will do; when it comes to choosing which, the criteria of frequency,
availability and learnability may be invoked.
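The selection of one expression per communicative need might be sketched as follows. The text names frequency, availability and learnability as criteria but does not say how they are to be combined; the weighted sum used here, the `Candidate` class and all the scores are assumptions made purely for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    expression: str
    frequency: float      # hypothetical score between 0 and 1
    availability: float   # hypothetical score between 0 and 1
    learnability: float   # hypothetical score between 0 and 1

def select_one_per_need(candidates_by_need, weights=(1.0, 1.0, 1.0)):
    """Keep the single best-scoring expression for each communicative need."""
    w_freq, w_avail, w_learn = weights

    def score(c):
        return w_freq * c.frequency + w_avail * c.availability + w_learn * c.learnability

    return {need: max(cands, key=score).expression
            for need, cands in candidates_by_need.items()}

# Scores are invented; only the expressions come from the example above.
candidates = {
    "being stationary in traffic": [
        Candidate("be stuck in a traffic jam", 0.9, 0.8, 0.8),
        Candidate("sit in traffic", 0.5, 0.6, 0.7),
        Candidate("be stationary", 0.2, 0.3, 0.6),
    ],
}

selected = select_one_per_need(candidates)
print(selected)
```

Changing the weights changes the outcome, so any real application would have to justify them empirically.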
It should have become clear that, despite its deficiencies, the second
alternative is more promising than an approach based on frequency alone,
especially if the point of departure is a clearly delimited area of the vocabulary,
such as the language of motoring or the vocabulary relating to feelings. First,
a very large corpus of subject-specific material is assembled from Internet and
other sources, such as corpora and published dictionaries. In constructing
such a corpus, it is important to include Internet genres that are lexically close
to real-life speech, such as news forums, e-mail, fan fiction, film and soap
opera scenarios. A further means of reducing the inevitable bias towards
writing in corpus construction is to elicit judgements from native speakers on
the currency of particular words and collocations in speech. It is to be expected,
however, that such tests will produce tangible results in only a few vocabulary
areas, such as proverbs (Arnaud 1992) or idioms. In others, such as motoring,
the sheer size of the lexical material precludes any detailed investigation of
native-speaker judgements.
The third alternative is some sort of combination of the frequency-based
approach and the approach drawing on economy effects, which could,
for example, be applied in succession. Economy effects may also be taken
into consideration in determining proficiency levels below the near-native level.
The subsequent procedure involves three major steps:
(1) Corpora and dictionary sources are tapped to identify all the individual
word-forms and words belonging to the vocabulary area in question. This
involves the making of a corpus-based word list (using, for example,
the WordSmith tool of the same name) and the use of dictionaries which
allow full-text searches or searches by subject area, such as TLF, DO, PR
or CIDE.
(2) In the next step, programs such as WordSmith and Collocate are used
to determine the collocations and patterns entered by the items on the
word list.
(3) The third step is to eliminate redundant collocations on the basis of the
aforementioned economy effects.
In a fourth, optional step various proficiency levels might be distinguished
on the basis of the frequency of collocations and single words or on the
basis of the transparency of items for particular user groups (cf. Hausmann,
forthcoming).
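The steps just listed can be strung together in a toy pipeline. Nothing here reproduces WordSmith or Collocate: the functions are plain-Python stand-ins with hypothetical names, step 2 uses raw co-occurrence counts within a fixed span, step 3 uses frequency alone as a placeholder for the economy effects, and the level labels in step 4 are invented.

```python
from collections import Counter

def make_word_list(sentences, dictionary_words):
    """Step 1: word forms found in the corpus plus words taken from dictionaries."""
    corpus_forms = {word for sentence in sentences for word in sentence.split()}
    return corpus_forms | set(dictionary_words)

def collect_collocates(sentences, word_list, span=3, min_count=2):
    """Step 2 (stand-in): count (node, collocate) pairs within span words of each other."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for i, node in enumerate(words):
            if node not in word_list:
                continue
            for j in range(max(0, i - span), min(len(words), i + span + 1)):
                if j != i:
                    counts[(node, words[j])] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

def drop_redundant(pair_counts, need_of):
    """Step 3 (placeholder): keep the most frequent pair for each labelled need."""
    best = {}
    for pair, n in pair_counts.items():
        need = need_of.get(pair)
        if need is not None and (need not in best or n > pair_counts[best[need]]):
            best[need] = pair
    return best

def assign_levels(pair_counts, kept_pairs, high=3):
    """Optional step 4: two frequency bands as hypothetical proficiency levels."""
    return {pair: ("basic" if pair_counts[pair] >= high else "advanced")
            for pair in kept_pairs}

# Invented motoring sentences and a hand-made need labelling.
sentences = ["he slammed on the brakes", "she slammed on the brakes",
             "i slammed on the brakes", "he hit the brakes", "she hit the brakes"]
need_of = {("brakes", "slammed"): "brake suddenly", ("brakes", "hit"): "brake suddenly"}

word_list = make_word_list(sentences, dictionary_words={"brakes"})
pair_counts = collect_collocates(sentences, word_list)
kept = drop_redundant(pair_counts, need_of)
levels = assign_levels(pair_counts, kept.values())
print(kept, levels)
```

The sketch shows only how the steps feed into one another; the hard part, deciding which pairs express the same communicative need, is supplied by hand in `need_of`.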
2.3 Macrostructure
The project stands in the long tradition of what, borrowing from McArthur
(1986), we might call thematic learner lexicography, a tradition that goes
back almost to the dawn of civilisation. Recent examples of this tradition
include LLCE, VAEA and CW, to name but a few.
As McArthur (1998: 153) believes, it is impossible to find an ultimate true
schema for ordering things and words in the world, and the Bilexicon Project
lays no major claim to innovation in this respect. Its point of departure is
a fairly traditional division of the lexicon into topic areas such as motoring
and sub-areas such as parking. Where it does innovate is in the distinction
between topic areas and situation types and in cross-referencing between
syntactically and semantically similar patterns, which will be available only
in the electronic version.
The distinction between topic areas and situation types is not perfectly
clear-cut and merits a brief explanation. In a sense, every communicative
situation is of course unique, but it seems permissible to generalise across
specific situations to arrive at similar situation-types (Lyne 1985) or text-
types embedded in more general topic areas (McArthur 1981). An exclusive
focus on either of these, as found in the works just cited, seems severely
limiting, as topic areas and situation-types are interdependent. One situation-
type, such as a court hearing, can involve widely varying topics. It may also
be subdivided into any number of sub-types, down to as narrow a discoursal
span as the conversational turn in the case of a simple exchange of greetings
(speaker A: hello, speaker B: hello); conversely, the same topic, such as an
account of an accident, can occur in several different situation-types or text-
types, such as general conversation, court hearings, newspaper reports or
insurance claims letters. Let us consider a few examples to illustrate the
possible categorisation of various types of collocation (see Table 1).
Table 1: Semantic categorization in a conceptually organised dictionary

Collocation                                      Topic Area 1: Situation Type 1                Topic Area 2: Situation Type 2
money/funds/a sum/etc. leave account/bank/etc.   Banking
Tu craches ta valda ?                            Road traffic: Traffic lights (obsolescent)    Emotions: Impatience
regarde où tu vas!                               Movement: Moving with care                    Emotions: Care (or Caution)
make s.o. feel small                             Emotions: Humiliation
I would give anything to INF                     Emotions: Cravings

What distinguishes the Bilexicon from other bilingual thesauri is that
allocation of entries to topic areas is essentially bottom-up, that is, it is
the collocations found in the subject-specific corpora which determine the
setting up and internal structuring of sub-areas and situation types. This stands
in contrast with traditional approaches to thesaurus building, where terms were
inserted into a fully pre-determined ontological structure. There are, of course,
obvious limitations to such an approach in that some words and collocations
have both general and topic-specific uses. A case in point is the vocabulary
relating to damage, which is important in such situation types as car accidents
but may also apply to a wide range of other situations (any kind of accident,
intention to harm, legal terminology, etc.).
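One possible way of modelling this bottom-up allocation is sketched below. The `Entry` class and `build_index` function are hypothetical; the entries and their placements are taken from Table 1, with the usage label '(obsolescent)' left out for brevity. The point of the sketch is that no category exists until an entry calls for it.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Entry:
    collocation: str
    placements: list  # (topic area, situation type) pairs; an entry may have several

def build_index(entries):
    """Derive topic areas and situation types from the entries themselves,
    rather than slotting entries into a pre-determined ontology."""
    index = defaultdict(lambda: defaultdict(list))
    for entry in entries:
        for topic, situation in entry.placements:
            index[topic][situation].append(entry.collocation)
    return index

entries = [
    Entry("Tu craches ta valda ?", [("Road traffic", "Traffic lights"),
                                    ("Emotions", "Impatience")]),
    Entry("regarde où tu vas!", [("Movement", "Moving with care"),
                                 ("Emotions", "Care (or Caution)")]),
    Entry("make s.o. feel small", [("Emotions", "Humiliation")]),
    Entry("I would give anything to INF", [("Emotions", "Cravings")]),
]

index = build_index(entries)
print(sorted(index["Emotions"]))
print(index["Road traffic"]["Traffic lights"])
```

Because an entry may carry several placements, vocabulary with both general and topic-specific uses, such as that relating to damage, can be listed in more than one place.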
Underlying this thematic organization in the electronic version will be a layer
of semantic links inspired by such work as Francis, Hunston and Manning
(1996, 1998), who have shown that words entering similar patterns usually
share an aspect of meaning. This will enable users to extend their vocabulary
along a non-thematic route and will raise their awareness of the close link
between sense and syntax.
3. Semasiological vs. onomasiological dictionaries
As noted in the previous section, the Bilexicon project aims at producing
bilingual onomasiological dictionaries whose main entry type will be of a
collocational nature. This represents a break with the word-based lexicography
still current in both semasiological and onomasiological approaches. Semasio-
logical dictionaries tend to consist of an alphabetical word list leading the user
from the word to its meaning, while onomasiological dictionaries allow the
user to proceed from a particular concept and find the most appropriate
word for it. Both types of dictionary are therefore mainly based on individual
words although, perforce, including phraseology in sub-entries and examples.
This section begins with a brief critique of the notion of word meaning before
discussing the effectiveness of the two types of dictionary in representing
collocation.
3.1 Meaning units beyond the word
The vast majority of today's dictionaries are based on the Saussurean
paradigm that the basic unit of meaning in a language is the word; accordingly,
dictionaries are regarded as word books (cf. German Wörterbücher) which
provide records of the various senses of individual words. So influential
has been this view of the dictionary that the bestsellers among present-day
monolingual and bilingual encoding dictionaries are small to medium-sized,
alphabetically organised pocket or desk dictionaries which list one-to-one
equivalents between words and provide only limited guidance on the syntag-
matics of language. Modern dictionaries thus perpetuate the time-honoured
tradition of recording single words which has existed at least since Babylonian
antiquity.
There is, of course, no denying the fact that speakers can isolate words
from context and thus arrive at a definition of word meanings. However, since
the definition of word meaning requires the speaker to engage in a process of
abstraction, it is at least debatable whether it is word meanings that underlie
the speaker's competence. Even the elicitability of paradigmatic relations
between the meanings of individual words does not allow us to conclude
that word meanings are stored in paradigmatic networks in what is often
called the mental lexicon (cf. Aitchison 1994). It is equally conceivable that
observees in psychological experiments respond with particular paradigmatic
associations because they have repeatedly met the associated items in
syntagmatic strings (cf. Rapp and Wettler 1992, Rapp 1995); as Jones (2002)
has shown, antonyms, for example, tend to co-occur syntagmatically (good or
bad, rich and poor).
The crucial factor in the acquisition of meanings thus seems to be the
primary association between lexical units of varying length³ and their
extra-linguistic and/or intralingual context of occurrence rather than the secondary
paradigmatic connections between two or more words that speakers can
establish when prompted or the word meaning which they can abstract out of
context when asked. Put another way, when unprompted, speakers produce
meanings by syntagmatically associating and/or modifying lexical chunks
which they have encountered before in contexts similar to the current one.
Our own practices of dictionary making have blinded us to the fact that we do
not communicate by stringing together individual words, but rather by means
of semi-prefabricated lexico-grammatical units.
This view, first proposed in outline by Bally (1909), has recently come to the
fore again in the Firthian tradition. Meaning is seen as residing in typical
combinations of lexical choices or collocability on the one hand, and typical
combinations of grammatical choices or colligation on the other (Hunston
2001). A crucial aspect of an item's meaning is its semantic prosody, a term
which reflects the realisation that lexical items become infused with particular
connotations due to their typical linguistic environment (Sinclair 1991, Louw
1993, Stubbs 1995).
The implications of the above for lexicography, especially learner lexicography,
are clear: if a) meaning is considered to be inherent in collocation (under
which term I here subsume colligation) and b) the dictionary is intended to
provide a record of the units of meaning in a language, then future dictionaries
will have to provide a full account of collocational meaning units and their
typical contexts of occurrence.⁴ One of the most obvious desiderata, then, is for
collocations, as defined in the introduction, to be given entry status. Rather
than appear in the exemplificatory material, collocations of this type should
themselves be illustrated with examples as necessary.
3.2 Difficulties of the semasiological dictionary in recording and representing collocation
The foregoing considerations raise questions about the macrostructure, microstructure
and mediostructure (Hartmann 2001: 64-66) of a dictionary which
could adequately represent collocation. There are a variety of systematic
reasons why traditional semasiological print dictionaries, whether mono-
lingual or bilingual, will tend to fall short of this goal. Tersely stated, the main
reasons are:
(1) the difficulty of arranging items in a clear and memorable way;
(2) the inadequate coverage and representation of collocation between lexemes
and semantic-pragmatic features;
(3) insufficient discrimination between collocations and examples.
Let us deal with these in sequence.
3.2.1 Place of entry. Firstly, semasiological dictionaries arrange entries by the alphabet. If collocations are to be given entry or sub-entry status in such
dictionaries, this will pose the age-old question about the word or word-form
under which the multi-word entry should appear. There is a wide range of
possibilities for resolving this question. The policy of many dictionaries is to
indicate some of the collocates of headwords in square brackets or in the
exemplificatory material and to enter (comparatively) fixed expressions such as
idioms at the first notional word. Thus, the idioms all hell breaks loose and
out of a clear blue sky would be found respectively at hell and clear. There are
a number of possible alternatives to this organizing schema (cf. Gates 1988).
For example:
(1) Collocations may be arranged alphabetically by their first components.
(2) Collocations may be entered at the semantically most important
component.
(3) Collocations may be entered at the grammatically most important
component.
(4) Collocations may be entered at the least frequent component if there is a
wide difference in frequency between the constituents (cf. Bogaards 1990).
The second of these possibilities would partially solve the difficulties users
have in locating collocations because of their directionality; two-item
collocations are still normally recorded at the entry for the collocate rather
than for the base (i.e. the semantically most important word). Thus, users will
find meet a criterion under meet rather than criterion, although their
formulation process starts with the noun. One wonders, however, whether
the second and third of these schemas will always lead to an unequivocal
solution, as lexicographers' and users' views on what is semantically and
grammatically most important may differ. The fourth solution reflects user
preferences identified in an empirical study, but seems only to apply to native
(French) dictionary users rather than language learners (Bogaards 1990).
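Of the four schemas, the first and the fourth are mechanical enough to be sketched in a few lines; the second and third depend on judgements of semantic or grammatical importance and are not attempted here. The function names, the frequency figures and the ratio used to operationalise a 'wide difference' are all assumptions.

```python
def entry_by_first_component(components):
    """Schema (1): file the collocation under its first component."""
    return components[0]

def entry_by_least_frequent(components, frequency, min_ratio=10):
    """Schema (4): file under the least frequent component, but only if the
    most frequent component is at least min_ratio times as frequent."""
    ranked = sorted(components, key=lambda word: frequency[word])
    rarest, commonest = ranked[0], ranked[-1]
    if frequency[commonest] >= min_ratio * frequency[rarest]:
        return rarest
    return None  # no wide difference, so this schema gives no answer

# Invented corpus frequencies for the components of 'meet a criterion'.
components = ["meet", "criterion"]
frequency = {"meet": 50000, "criterion": 2000}

print(entry_by_first_component(components))
print(entry_by_least_frequent(components, frequency))
```

With these figures the two schemas disagree about where 'meet a criterion' belongs, which illustrates why a cross-reference from each component is desirable.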
For the sake of user convenience, it is desirable therefore to enter a col-
location under each of its meaning components and to cross-refer the user to
the place where the entry is found. Drawing on this insight, Petermann (1983)
has devised a consistent location policy for traditionally conceived phrasemes
(i.e. fixed expressions) which could also be applied to collocations. He suggests
that each phraseme should appear under each of its notional components
while being assigned only to one main entry. The choice of this entry is to be
determined by the following criteria: if the phraseme contains a noun, this
becomes the main entry; if there are several nouns, main entry is given to the
first. If there is no noun, main entry is given to the first adjective, etc., in the
following order: verb, adverb, pronoun, numeral, interjection. Consistent as
this policy may be in theory, the question is whether the average dictionary user
can be expected to comprehend it. Interestingly, however, it is in keeping with
the results of an empirical study (Bogaards 1990), which found that Dutch
language learners begin their searches with nouns, followed by adjectives
and verbs.
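Petermann's location policy, as summarised above, amounts to a simple priority rule and can be sketched as follows. The part-of-speech tags are assigned by hand, only notional components are listed, and the function names and the wording of the cross-reference are hypothetical.

```python
# Order of priority as given in the text: noun first, then adjective, and so on.
POS_PRIORITY = ["noun", "adjective", "verb", "adverb",
                "pronoun", "numeral", "interjection"]

def main_entry(components):
    """components: (word, part of speech) pairs in phrase order.
    Returns the first word of the highest-priority part of speech present."""
    for pos in POS_PRIORITY:
        for word, word_pos in components:
            if word_pos == pos:
                return word
    return None

def placements(components):
    """List the phraseme under every notional component, with one main entry
    and cross-references from the others."""
    main = main_entry(components)
    return {word: ("main entry" if word == main else f"see {main}")
            for word, _pos in components}

# Hand-tagged notional components of two idioms mentioned earlier.
all_hell = [("hell", "noun"), ("break", "verb"), ("loose", "adjective")]
blue_sky = [("clear", "adjective"), ("blue", "adjective"), ("sky", "noun")]

print(main_entry(all_hell))
print(placements(blue_sky))
```

Under this rule 'out of a clear blue sky' has its main entry at sky, whereas the first-notional-word policy mentioned earlier would place it at clear.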
Another common suggestion consists in recording different types of
phrasemes in different ways (Burger 1989: 595). Fully idiomatic phrasemes
are to be listed under one of their components only, with cross-references at
the entries for other components; the choice of the entry term should not be
governed by semantic considerations, as these require the largest amount of
previous knowledge on the part of the user. Partially idiomatic phrasemes
which are linked to a specific meaning of a headword are to be treated under
the relevant sense division. Non-idiomatic phrasemes have to be discussed
at each of their components, under the relevant senses. Although presenting
the clear advantage of highlighting connections of meaning, this arrangement
is theoretically unsound in that, rather than recognizing the holisticity of
collocations, it presupposes their semantic divisibility and may entail an
etymological re-motivation of what is only a partially motivated or
unmotivated fixed expression (see also Burger 1989: 595).
To compound matters, the nesting of collocations may make retrieval
difficult. A large number of syntactically well-formed collocations (cf. for
example regarde où tu vas or I've got [liquid, crumbs, etc.] all over/on [piece of
clothing, exercise book, etc.]) are made up of highly frequent individual lexemes
such as regarder, aller, have, haben, etc., a factor which contributes to heavily
inflating entries for such words. Current unabridged dictionaries bear ample
testimony to this, although they are still a long way from including the
totality of collocations. Thus, the entry for aller in PR, for example, runs to
three and a half columns.
One way of solving this problem would be to draw items together in blocks
at the end of the entry. Each block would present items exhibiting a particular
type of syntactic relationship, after the manner of OC, for example. But then
again such clustering may be difficult to justify with clearly motivated multi-word
units like there is good reason to INF; there is a strong case here for
treatment under the relevant sense division of reason.
There are, of course, equally good reasons for giving main entry to
collocations as there are for recording them under a sub-entry, whether this be
a separate entry or a sense division of a particular headword (cf. Burger 1998:
172 on multi-word units). However, if we decide to give collocations main
entry status, this will entail an even more complex macrostructure. To take but
one example, multi-word collocations serving a pragmatic or text-structuring
function and beginning with the pronoun it (it behoves us to INF, it is worth
bearing in mind that/wh-clause, etc.) or the preposition to (to give an example,
to this end, to return to NP) would fill dozens of pages, and so would two-item
collocations beginning either with common nodes or common collocates
(such as increase or give).
From all this it seems reasonable to conclude, as most theorists do (cf. for
example Burger 1989: 595 on phrasemes), that there is no ready-made solution
for the positioning of collocational units in semasiological dictionaries.
Each case requires to be considered on its own merits, and the preferences of
particular user groups have to be taken into account (Bogaards 1990, 1991);
there should be neither consistent conflation into end-of-article nests nor
arbitrary allocation to a particular sense division. Rather, as with derivatives
and compounds (which have traditionally been conceived of as distinct from
collocations), it is necessary to steer a middle course between considerations
of semantic relatedness, user convenience and economy of treatment (cf. Cowie
1999: 150 on derivatives and compounds). In any case, collocations should
be highlighted typographically, and, if necessary, attention should be drawn
to their special pragmatic and/or text-structuring functions. However, given
the sheer size of the class of collocations, alphabetical access seems an
unmanageable solution in the long run.
3.2.2 Representation of semantic-pragmatic collocations. If we now ascertain the
relationship between types of collocations and the problems associated with
recording them, it turns out that the semasiological dictionary experiences
the greatest difficulty in adequately representing purely semantic-pragmatic
collocations occurring in specific situation-types or topic areas. A pertinent
example is afforded by semantic-pragmatic collocations based around mordre
sur (overlap into, go over into, cut into, veer off course into/onto), which
occur in three main topic areas, viz. a) geography (e.g. une région mord sur une
autre), b) medicine (une partie du corps mord sur une autre) and c) motoring
(une voiture mord sur une partie de la route).
The bilingual semasiological encoding dictionary has two options to
represent such information: by adapting PGF style: une voiture mord sur qc
(accotement, ligne médiane, etc.), or by adapting CR style: [voiture] mordre sur
[accotement]. Of these, the first would seem to be immediately comprehensible
to the user, since it is very close to a natural language sentence. The monolingual
encoding dictionary could solve the problem by using Cobuild's folk
definition style, which allows the lexicographer to place typical collocates in
the first part of the defining sentence:
lorsqu'une voiture mord sur une partie de la chaussée ou sur le bas-côté,
elle va au-delà de la voie de circulation qui lui est normalement attribuée
Unfortunately, apart from Cobuild, DAFA and, to a lesser extent, CIDE, none
of the available monolingual dictionaries have so far made any use of the above
procedures for representing collocational meaning.
One deficiency of the semasiological encoding dictionary which even Cobuild
has been unable to remedy is the impossibility of representing synonymy
between collocations in a space-saving and user-friendly manner. Let us
consider the following example of a collocation of type 3 and its possible
representation in a semasiological dictionary:
money / funds / a sum + leave + account / bank / fund / country
If we were to record this semantic-pragmatic collocation ([money] +
leave + [place where money is stored]) with a view to enabling the user to
comprehend and encode it in its entirety, we would have to make a minimum of
three entries (at money, funds and sum) and a maximum of eight entries (money,
funds, sum, account, bank, fund, country, leave), not to speak of the amount of
cross-referencing that would be required. Moreover, collocational attraction
between any two of the constituents in this semantic-pragmatic collocation
(e.g. funds leave country) may be too weak to show up in a concordance
based on mutual information (Church and Hanks 1990) or log likelihood
(Dunning 1993), thereby not warranting the inclusion of any specific
collocation. Yet the semantic-pragmatic collocation as a whole is clearly
frequent enough and of interest to language learners, especially since other
languages such as German may have slightly different ways of expressing
the same idea (e.g. money leaves an account → Geld geht von einem Konto ab /
[less commonly:] Geld verläßt ein Konto).
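The point about weak pairwise attraction can be made concrete with a small sketch. The Python fragment below computes pointwise mutual information, the association measure of Church and Hanks (1990), for individual realisations of the pattern and sets the result against the frequency of the pattern as a whole. All counts, the cutoff value and the names SUBJECTS, PLACES and PATTERN are invented for illustration; none of them is taken from a corpus or from the Bilexicon project.

```python
import math

def pmi(pair_count, x_count, y_count, total):
    """Pointwise mutual information in bits: log2(P(x,y) / (P(x) * P(y)))."""
    p_xy = pair_count / total
    p_x = x_count / total
    p_y = y_count / total
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical figures, invented purely for illustration.
TOTAL = 1_000_000  # assumed number of co-occurrence windows
SUBJECTS = {"money": 6000, "funds": 3000, "sum": 2000}
PLACES = {"account": 4000, "bank": 5000, "fund": 2000, "country": 8000}
# Assumed counts of "subject + leave + place", keyed by (subject, place).
PATTERN = {
    ("money", "account"): 60,
    ("money", "bank"): 45,
    ("money", "country"): 70,
    ("funds", "account"): 20,
    ("funds", "country"): 30,
    ("sum", "account"): 15,
}
PMI_CUTOFF = 3.0  # arbitrary threshold chosen for this sketch

# Score every individual realisation of the pattern.
scores = {
    pair: pmi(count, SUBJECTS[pair[0]], PLACES[pair[1]], TOTAL)
    for pair, count in PATTERN.items()
}
# Realisations that an extraction routine with this cutoff would retain.
selected = [pair for pair, score in scores.items() if score >= PMI_CUTOFF]
# Frequency of the semantic-pragmatic collocation taken as a whole.
pattern_frequency = sum(PATTERN.values())

for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{pair[0]} + leave + {pair[1]}: PMI = {score:.2f}")
print("realisations above cutoff:", selected)
print("frequency of the whole pattern:", pattern_frequency)
```

With these assumed figures no single realisation reaches the cutoff, although the pattern taken as a whole is by far the most frequent unit in the toy data. That is the situation described above, in which none of the specific collocations would be selected for inclusion.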
3.2.3 Examples vs. collocations. Another problem with existing semasiological
dictionaries is that they fail to distinguish between examples and collocations,
i.e. they frequently record holistic units within the exemplificatory material
rather than assigning them entry status and exemplifying them in their turn.
This is not usually a serious problem with traditional two-word collocations
in which the collocate assumes a specific meaning (if we disregard for
the moment the fact that such collocations may still be difficult to locate
for users), but it becomes one in the case of collocations which appear to have
been freely put together by the application of general semantic and syntactic
rules. This can be illustrated with two examples, one from an unabridged
monolingual dictionary (GR) and one from a monolingual learner's dictionary
(CCED).
GR, which offers a sprinkling of extended collocations, will serve to
illustrate the haphazard nature of current practice (for further detail, see
Siepmann 2005). Thus, the exemplificatory infinitive clause pour n'en citer
qu'un exemple (a collocation of type 2 common in academic writing) is found
as the second example under sub-entry II.2:
(XIVe). Cas, événement particulier, chose précise qui entre dans
(une catégorie, un genre . . .) et qui sert à confirmer, illustrer, préciser
(un concept). Voici un exemple de sa bêtise. Pour ne (n'en) citer qu'un
(seul) exemple. Aperçu, échantillon, spécimen. Ce cas offre un exemple
typique de telle maladie. ⇒ Type. C'est un bel exemple de présence
d'esprit! Alléguer, apporter des exemples à l'appui d'une assertion, d'une
affirmation. ⇒ Preuve. Exemple concret illustrant une idée abstraite.
Appuyer (cit. 5) d'un exemple. Exemples donnés dans un manuel de physique,
de chimie. Exemple bien, mal choisi. Donnez-moi un exemple de volcan
éteint, de plissement tertiaire. Exemples à l'appui d'un raisonnement,
d'une démonstration. Exemple qui prouve que . . . Il m'a cité l'exemple de
ce chanteur (→ 1. Basse, cit. 7). Puiser ses exemples dans l'histoire
(→ Égoïsme, cit. 1). (GR, s.v. exemple)
The multi-word collocation in question has been entered as an example
sentence followed by a full stop. This implies that the phrase can stand on its
own, thus obscuring its textual function of introducing an example, and
potentially leading at least the foreign-born user astray.
With a collocation such as we (now) turn (now) to, the situation is even less
clear. In CCED it appears in the exemplificatory material at sub-entry 12 for
turn and is not explicitly marked as a collocational unit:
We turn now to the British news.
This example sentence may, however, not be very useful to learners, since it
neglects to highlight that we are dealing with a transitional device that can be
employed in both spoken and written English rather than an ad-hoc formation.
The drawbacks of such practice should by now be obvious. For one thing,
neither the native nor the non-native user will be sensitised to the holistic
nature of multi-word units. For another, the non-native user in particular
will find it difficult to find variants of a particular collocation, such as pour ne
donner qu'un exemple or pour prendre un seul exemple in the case of the example
from GR; this is due to the lack of synonymic links in the mediostructure
already touched upon. One reason for the lack of cross-referencing with
regard to synonyms is what may be termed the alphabetical framework
approach to dictionary making. In the compilation of large-scale dictionaries
one commonly starts by drawing up an alphabetical list, or 'framework',
of the major sense divisions before assigning one small section of the
alphabetical list to the individual lexicographer, who will identify and enter
collocations of individual lexemes without much regard to the findings of his
or her colleagues.
As can also be inferred from the above examples, another serious
disadvantage of current practice is that common collocations tend to be
submerged amid a welter of detail. Thus, in GR, it takes a considerable amount
of searching to locate the concessive discourse marker il faut bien reconnaître
que within one of the sub-entries for reconnaître. The specific pragmatic
function of the marker is not made explicit; rather, it must be inferred from
the general definition given under sense division 4 of reconnaître or from its
synonymy with the evidence marker il faut se rendre à l'évidence, to which the
reader is cross-referred.
4. (XIVe). Admettre pour vrai après avoir nié, ou après avoir douté,
accepter malgré des réticences. ⇒ Admettre, avérer, déclarer . . . On a fini
par reconnaître son innocence. ⇒ Croire (à); → aussi Rendre hommage*
à . . . On est forcé de reconnaître des divergences (cit. 1) entre certains
textes . . . Maintes fois, il le reconnaît lui-même, il manquait de bon sens
(→ Grain, cit. 26). Reconnaître la supériorité de qqn. ⇒ Céder (3.: le
céder à); proclamer . . . Amener qqn à reconnaître. ⇒ Convaincre.
Reconnaître que. ⇒ Admettre, avouer, convenir (de); → Boiteux, cit. 7;
démarche, cit. 4; Dieu, cit. 47; malheur, cit. 39; oracle, cit. 4. Ils ont tous
reconnu qu'il a fait ce qu'il a pu. ⇒ Tomber (d'accord). Vous n'hésiterez
(cit. 14) pas à reconnaître que . . . Je reconnais que . . . ⇒ Accorder; entendre
(j'entends bien). - Quoi qu'on dise, on doit reconnaître que . . . (→ Canaille,
cit. 12). Force (cit. 58) lui était de reconnaître que . . . (→ Exciter, cit. 32).
Il faut bien, on doit reconnaître que . . . ⇒ Évidence (se rendre à l'évidence);
→ Mélodique, cit. 1.
Turning now to colligational patterns, we find that quite a number of these
have found their way into the dictionaries, but that they are usually treated by
way of lexical exemplification. Here are a few examples from PR:
un mécanicien en herbe (PR; underlying colligation: NP [vocation] en herbe)
de la graine de voyou (PR; underlying colligation: de la graine de NP)
être musicien dans l'âme (PR; underlying colligation: NP dans l'âme)
Note that such treatment is doubly limiting. For one thing, it conceals the
generativity of the patterns as well as the limits of such generativity; for
another, it omits to signal typical textual embeddings. Thus, a colligational
pattern such as NP/ADJ à ses heures tends to occur as an appositive (often
clause-initial), and this information must be made available to the dictionary
user. Cf. for example:
Poète à ses heures, Guillaume improvisait des vers.
Nicolas, jardinier à ses heures, dispose d'une plantation qui lui fournit la
matière première de ses pétards.
3.2.4 Other deficiencies resulting from a semasiological methodology. Another point to
note (and one I shall expand upon in the section on translation equivalence
below) is that definitions and sense divisions in monolingual dictionaries
as well as translations in bilingual semasiological encoding dictionaries often
leave something to be desired. Again, this is primarily because bilingual
lexicographers who work on single letters or words often lack contextual,
or more accurately, subject-specific information; even if they have such
information in one language, they may still find it difficult to provide natural
textual equivalents because they fail to avail themselves of the time-honoured
strategy used by professional translators of comparing parallel texts, i.e. texts
which deal with the same or similar subject matter in different languages.
To compound matters, bilingual dictionaries tend to exhibit an empirical
dependency (Kromann 1991: 2714, Hausmann 2002: 1619) on monolingual
dictionaries in the sense that the aforementioned alphabetical framework
is generally grounded on monolingual dictionaries. As a consequence,
interlingual divergences which could emerge from a contrastive analysis are
not normally taken account of.
There is ample evidence from a number of studies of such dependencies.
Hausmann (2002: 1619) shows that OH was the first dictionary to introduce
the notion of tact into its French renderings of the English adjective
insensitive, for the simple reason that its compilers had at their disposal two
new monolingual dictionaries which used tact in their definitions and
provided several examples of its use including several typical collocations.
In similar vein, Cummins and Desjardins (2002) demonstrate that there is
insufficient discrimination in a number of bilingual dictionaries between the
various senses of two English-French pairs (population/population and plus ou
moins/more or less) to enable correct encoding. For example, French population
has an affective use not paralleled by its direct English equivalent which is
better rendered by nouns or collocations such as people or the (general) public.
Again, it is reliance on monolingual dictionaries which appears to be the root
cause of such oversights.
Another example can be seen in GW (German-English), which renders
the German compound noun Bildungsangebot by the clumsily literal word
combination educational offer. As a study of parallel texts will reveal, however,
the intended meaning is idiomatically expressed in British English as
educational provision (see also Laffling 1991) or training provision, as the case
may be.
While such shortcomings could be remedied fairly easily by consulting
parallel texts available from corpora or the Internet or by developing
algorithms for the automatic extraction of traditionally-conceived bipartite
verb-noun or noun-adjective collocations (cf. Laffling 1991; Smadja,
McKeown and Hatzivassiloglou 1996; Fontenelle 2003), the situation is less
straightforward with extended collocations of the type far be it from me
to INF, vieles spricht dafür, dass (see Siepmann 2005), regarde où tu vas or
tout se passe comme si (see Siepmann 2004). These collocations are either
absent from dictionaries or wrongly translated because there are usually no
node words on which either the human lexicographer or extraction software
could base their search for an equivalent (cf. regarde où tu vas → pass auf, wo du
hintrittst).5
Take, for example, the discourse marker far be it from me to . . . , which is
common in academic and journalistic prose. In CG this has been rendered by
es sei mir ferne, zu . . . The German expression is untypical of modern academic
or newspaper style and has a distinctly archaic ring to it. For lack of resources
in which to locate a workable equivalent, the lexicographer must have selected
one from the entry for fern(e) in an outdated monolingual German dictionary.
Greater familiarity with academic and newspaper German or reliance on
parallel texts would have thrown up solutions such as es liegt mir
fern zu INF or nichts liegt mir ferner, als zu INF.
4. Potential benefits of the onomasiological approach
My contention in this section is that the adoption of an onomasiological,
collocation-based approach is likely to make the dictionary compilation process
more reliable and more efficient, thereby ultimately leading to more reliable
final products. So far commercially available onomasiological dictionaries,
like their semasiological counterparts, have focussed on single words or
traditionally-conceived fixed expressions (e.g. RO, DO, WE) but they will
really come into their own when collocation is taken into account.
The principal reason why the onomasiological approach is superior to the
semasiological is not far to seek: as communicators, we do not start from
lists of individual words which we then go on to combine in a suitable fashion.
It is not atomised single units, but concepts and processes (Götze 1999: 11)
that are represented in our brain. The concepts we wish to convey and the communicative
choices we make are normally expressed either by collocations or,
less commonly, by individual words.6 As pointed out above, collocations are
inextricably linked with, and usually restricted to, some particular topic area
and/or situation-type through what may be described as neuronal assemblies,
i.e. the repeated association of lexical units or semantic-pragmatic features with
a situational or syntagmatic context. In the same way, the lexicographer gains
considerable advantage from focussing on collocational choices within a
particular subject area.
Let us now consider the ways in which the onomasiological approach can
resolve the problems noted above for the semasiological approach.
4.1 General lexicographic principles and the onomasiological approach
We may start by looking at a number of lexicographical stringency criteria
proposed by Mel'čuk et al. (1995: 33 ff.). They point out, among other things,
that traditional dictionaries fail to describe semantically related lexemes in
a sufficiently uniform manner (Mel'čuk et al. 1995: 40). As an example they
cite nouns designating nationality. Whereas un Français is defined as une
personne de nationalité française in one dictionary, un Chinois has no
definition, etc. Mel'čuk et al. (1995: 40) therefore posit the principle of
uniformity, which states that the articles representing phrasemes belonging to
one semantic field must be as closely similar as possible. It follows that,
although their idealized dictionary is alphabetical for reasons of ease of use,
it is ultimately onomasiological since the central concept underpinning it is
the semantic field. Only an onomasiological methodology can guarantee
uniformity of treatment.
Another clear advantage of the onomasiological approach lies in its being
explicit in the sense that nothing is left to the user's intuition. As Mel'čuk
et al. (1995: 35–36) point out, a collocation such as magazine féminin cannot be
entered as a mere example because it could theoretically mean either magazine
about women or magazine for women. One wonders, however, whether full
explicitness can ever be achieved when using a monolingual methodology;
as mentioned in Section 2.1 above, many of the nicer sense distinctions in
one language (such as the various meanings of French population) only come to
light against the background of another language. Thus, while monolingual
collocational dictionaries such as OC may well record stream of traffic or flow
of traffic, they do not differentiate between the two senses of the collocation
which become apparent when comparison is made with equivalent German
expressions (in German a distinction is made between fließender Verkehr,
into which the road user merges, and Verkehrsströme or Verkehrsfluten,
visualised as continuous lines of dense traffic).7 Nor do they take note of
triple collocations such as endless stream of traffic, which may, however,
become apparent from a contrastive search for a viable equivalent of the
German compound noun Blechlawine. See the entry from the projected
English-German bilingual thesaurus in Table 2.
To take but one more example, neither the big four monolingual learners'
dictionaries8 nor CR recognize the specific sense that wait assumes in the area
of traffic; a bilingual methodology would reveal this sense since it requires non-literal
renditions such as rester en stationnement in French and stehen or halten
in German (see Table 3). This shows that, in a bilingual thesaurus, explicitness
can be achieved quasi automatically by recording all possible variants of
a collocation along with its topic-specific or situation-specific translations,
e.g. magazine féminin / magazine pour femmes → women's magazine.

Table 2: stream of traffic and its German equivalents
English | German
stream of traffic / flow of traffic / traffic flow | der Verkehrsstrom / die Verkehrsflut
the steady stream of traffic heading to St Sampsons | die kontinuierliche Verkehrsflut in Richtung St. Sampsons (die sich nach St Sampsons ergießende Blechlawine)
look behind early and move into the stream of traffic when safe | schauen Sie sich frühzeitig um und ordnen Sie sich bei einer günstigen Gelegenheit in den fließenden Verkehr ein
endless stream of traffic / solid line of cars / heavy traffic | die Blechlawine*
there is an endless stream of traffic from the Straße des 17. Juni going past the Brandenburg Gate | von der Straße des 17. Juni rollt eine Blechlawine am Brandenburger Tor vorbei
we go around a bend and there ahead of us is a solid line of cars as far as you can see | wir fahren um eine Kurve und vor uns ergießt sich eine Blechlawine soweit das Auge reicht

Table 3: wait and its French and German equivalents
English | French | German
I couldn't wait very long | je ne pouvais pas rester en stationnement très longtemps | ich konnte nicht lange halten / ich konnte nicht lange anhalten

Likewise, the principle of internal coherence (Mel'čuk et al. 1995: 36 ff.) can
be readily adhered to in a bilingual thesaurus based on collocations rather than
lexemes (or lexemes and collocations). This principle states that there should
be perfect correspondence between the definition (i.e., in the case of a bilingual
thesaurus, the translation), the syntactic patterns and the lexical patterns
entered by a lexeme or phraseme; the only problem here is the directionality
of translation, which may lead to a larger number of entries in a bilingual
dictionary, as illustrated by the aforementioned collocation stream of traffic.
When used on its own, this collocation can be translated almost literally into
German in the form of the compound nouns Verkehrsstrom or Verkehrsflut.
When modified by the adjective endless, however, it can be rendered more
elegantly by the colloquial compound Blechlawine.
The problems with the definition of lexemes which arise from the inclusion
of such collocations as célibataire endurci do not occur in bilingual dictionaries
and are in fact purely theoretical, since collocations should be considered as
holistic meaning units. As Mel'čuk et al. (1995: 37) rightly conclude, the lexeme
célibataire on its own can never have the meaning homme en âge d'être marié
qui n'a jamais été marié et qui veut rester tel although the above collocation
would seem to suggest just that.
Two additional principles proposed by Mel'čuk et al. (1995) are the principle
of exhaustiveness and that of compulsory consultation of databases.
As outlined in Section 2, the fulfilment of these principles can be greatly aided
through using a bilingual or multilingual approach which should proceed in an
iterative cycle:
compilation of subject-specific corpora in at least two languages →
compilation of subject-specific word and collocation lists → analysis of the
contextual embedding of collocations with the help of the Internet →
additions to corpora from Internet sources used in context analysis (etc.)
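The cyclical character of this procedure can be sketched in a few lines of Python. The fragment is a deliberately reduced illustration and not the Bilexicon workflow: it handles a single language (the cycle described above would be run for each language in parallel), it treats a collocation as a recurrent pair of adjacent words, and it replaces the Internet search by a lookup in a fixed list of sentences. The function names, the frequency threshold and the sample sentences are all assumptions made for the example.

```python
from collections import Counter

def adjacent_pairs(sentence):
    """Return the adjacent word pairs of a sentence (lower-cased)."""
    words = sentence.lower().split()
    return list(zip(words, words[1:]))

def collocation_list(corpus, min_count=2):
    """Step 2: list the word pairs that recur at least min_count times."""
    counts = Counter()
    for sentence in corpus:
        counts.update(adjacent_pairs(sentence))
    return {pair for pair, n in counts.items() if n >= min_count}

def contexts_for(collocations, pool):
    """Step 3: find contexts of the listed pairs (stand-in for a web search)."""
    return [s for s in pool if any(p in collocations for p in adjacent_pairs(s))]

def run_cycle(seed_corpus, pool, max_rounds=5):
    """Repeat list building, context analysis and corpus extension."""
    corpus = list(seed_corpus)
    for _ in range(max_rounds):
        collocations = collocation_list(corpus)
        additions = [s for s in contexts_for(collocations, pool) if s not in corpus]
        if not additions:  # Step 4 yields nothing new: stop iterating
            break
        corpus.extend(additions)
    return corpus, collocation_list(corpus)

# Invented sample data for the topic area of road traffic.
seed_corpus = [
    "the traffic jam clears slowly",
    "a traffic jam blocks the motorway",
]
web_pool = [
    "avoid the traffic jam near the motorway junction",
    "the motorway junction is closed",
    "bees collect nectar",
]

final_corpus, final_collocations = run_cycle(seed_corpus, web_pool)
print(len(final_corpus), "sentences;", sorted(final_collocations))
```

In this toy run each round feeds the next: the context found for traffic jam brings the motorway above the threshold, which in turn retrieves a sentence that establishes motorway junction, while the unrelated sentence about bees is never added.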
In summary, it could be said that future lexicography should pursue
a methodology which is diametrically opposed to the framework approach
outlined above. Rather than proceeding from alphabetical lists of individual
lexical units based on monolingual dictionaries, it would be grounded in topic-
specific lists of collocations. The methodology of monolingual dictionary
making would thus also be turned on its head, since monolingual dictionaries
would benefit from the more detailed sense divisions established by bilingual
onomasiological lexicography.
4.2 Other potential benefits
An onomasiological methodology allows us to solve the problem of separating
different meaning units which would normally be allocated to the same article
in a semasiological dictionary. An example of this is the French collocation
donner + exemple, which can be used in three different types of situation with
two different meanings (see Siepmann 2003):
(1) a situation where the speaker/writer wishes to cite another author: Miller
(1995) donne un exemple de . . .
(2) a situation where the speaker/writer introduces an example of his or her
own: pour donner un exemple, je vais vous donner un exemple
(3) a situation where the speaker/writer gives an actual example: l'Arabie
Saoudite donne un exemple d'État islamique moderne (= is an example)
The collocation would thus be given at least three entries in different sub-
sections of an onomasiological dictionary. Similar considerations hold true for
English collocations such as avoid an accident (cf. French empêcher un accident
vs. éviter un accident) or leave the road (cf. German von der Straße abfahren
[intentional] vs. von der Straße abkommen [accidental]). It is the contrastive
background of a foreign language that allows the lexicographer to uncover the
polysemy of such items.9
Another problem noted above was the placement of collocations within
the dictionary; this can be resolved quite elegantly in an onomasiological
dictionary (or hybrid electronic dictionaries) such as the projected English-French
Bilingual Thesaurus (Bilexicon), where topic area and situation type
are the decisive factors in determining place of entry.
Likewise, in an onomasiological dictionary semantically related or synonymic
expressions do not need to be cross-referenced, as they will appear at
the same place in the dictionary. Examples are given in Table 4.

Table 4: Synonymic collocations in an onomasiological dictionary
Synonymic or semantically related collocations | Topic Area: Situation Type
encore nommé / autrement appelé / qu'on appelle aussi | Discourse Markers: Reformulation
don't say a word / don't make a sound / be quiet / hush / quiet, please / shut up / wrap up / belt up / put a sock in it | Noise: Telling people to be quiet
Freizeit-N, Gelegenheits-N, Hobby-N | Hobbies: Describing amateurs
when the right moment has come, in due course, at the appropriate juncture, at the appropriate moment, when the time has come | Timing: Right moment
fahren auf / befahren / benutzen / fahren (trans.) (+ Straße) | Driving: Road use

The division of labour among various lexicographers can thus be by topic
area rather than the alphabet. For one thing, this solves the problem of missing
cross-references or missing translations for synonymic items; for another,
it allows an allocation of tasks to lexicographers by areas of real-world
expertise rather than the alphabet. Errors or infelicities such as those discussed
in Section 3 can thus be avoided.
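The effect of topic-based arrangement on cross-referencing can be illustrated with a minimal Python sketch built on two rows of Table 4. The data structure and the function names are hypothetical. The count of cross-references rests on a simplifying assumption, namely that an alphabetical dictionary files every expression separately and links it to each of its synonyms.

```python
# (topic area, situation type) -> expressions, copied from Table 4.
SECTIONS = {
    ("Noise", "Telling people to be quiet"): [
        "don't say a word", "don't make a sound", "be quiet", "hush",
        "quiet, please", "shut up", "wrap up", "belt up", "put a sock in it",
    ],
    ("Timing", "Right moment"): [
        "when the right moment has come", "in due course",
        "at the appropriate juncture", "at the appropriate moment",
        "when the time has come",
    ],
}

def synonyms_of(expression):
    """Onomasiological lookup: synonyms are the neighbours in the same section."""
    for expressions in SECTIONS.values():
        if expression in expressions:
            return [e for e in expressions if e != expression]
    return []

def cross_references_needed():
    """Links an alphabetical layout would need under the stated assumption:
    every expression points to each of its synonyms."""
    return sum(len(e) * (len(e) - 1) for e in SECTIONS.values())

def sections_by_topic():
    """Group situation types by topic area, e.g. for allocating work to lexicographers."""
    topics = {}
    for topic, situation in SECTIONS:
        topics.setdefault(topic, []).append(situation)
    return topics

print(synonyms_of("hush"))
print(cross_references_needed())
print(sections_by_topic())
```

In the topic-based structure the synonyms of an item are retrieved without any stored links, whereas the alphabetical arrangement assumed here would need 92 cross-references for these two sections alone.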
Turning now to the problems involved in adequately representing collocations
(especially of the semantic-pragmatic type), we note that the onomasiological
approach allows us to adapt and further develop PGF style, as already
sketched above. PGF style indicates possible collocates in both subject and
object position; sometimes generalised labels such as s.o. or s.th. are replaced
by more specific labels such as un animal. A few examples from PGF follow:
qn fait un appel du pied à qn → jd gibt jdm einen Wink mit dem Zaunpfahl
qn conduit qn/un animal/qc quelque part → jd bringt jdn/ein Tier/etw
irgendwohin; (à pied) jd führt jdn/ein Tier/etw irgendwohin; (en voiture)
jd fährt jdn/ein Tier/etw irgendwohin
jd schlachtet → qn tue [o abat] un animal/des animaux
un animal butine → ein Tier sammelt Nektar [o Blütenstaub]
This practice can be further refined in onomasiological dictionaries. The
example of Table 5 illustrates the collocations entered by the French verb
butiner; this is a typical case where an individual word in French corresponds to
a collocation in English (for further evidence of interlingual correspondences
across morpho-syntactic levels, see Part I of this article).

Table 5: An entry for butiner
butiner {une abeille, un papillon, une guêpe, . . . butine} | to collect nectar / pollen {a bee, a butterfly, a wasp, . . . collects nectar}
une abeille butine (quelque part: sur les fleurs des artichauts / dans les pissenlits) | a bee gathers / collects / sucks (up) nectar / pollen (from artichoke blossoms / from dandelions); a bee gathers / collects honey11
une abeille butine une plante (pour qqc: pour le nectar) | a bee visits a plant (to collect nectar); collects nectar from a plant; sucks (up) nectar from a plant
une abeille butine le pollen / le nectar / le miel (quelque part) | a bee sucks up nectar / a bee collects pollen (somewhere)

For reasons of space and user convenience, typical subjects of butiner are
shown in the first line of the entry, so that they do not clutter up the following
lines, where the emphasis is on object complementation. In these lines the most
common specific subject abeille is used consistently, where PGF uses a
superordinate term such as animal. In the case of butiner subject and object
complementation could probably be dealt with in the same way for any number
of language pairs. With some verbs, however, the presentation of subject + verb
collocations and object + verb collocations may be determined by the target
language. Consider, for example, the French verb craquer and its German
equivalents in Table 6.
This second example shows that complex colligations of the type qqc craque
de qqc must be illustrated with examples to be comprehensible to the dictionary
user. PGF style can also be adapted to variable idioms. In the example of
Table 7, the core meaning is given as a noun entry, while the sentence entries
illustrate different collocations.

Table 6: An entry for craquer
craquer | knacken / knistern / knarren / krachen / knirschen
une branche / une articulation craque | ein Ast / ein Gelenk knackt
la chaussure / le toit / le fauteuil / le parquet craque | der Schuh / das Dach / der Sessel / das Parkett knarrt
la neige craque | der Schnee knirscht
qqc / qqn craque de qqc {bruits, matériaux de construction, . . .; jointures} | (etwa:) bei j-m knackt es irgendwo / an einem Ort knarrt etw.
il craquait de toutes ses jointures | alle seine Gelenke knackten / bei ihm knackte es in allen Gelenken
la maison craque de bruits de radiateurs et de boiseries | im Haus knackt und knarrt es aus der Heizung und der Holztafelung

Table 7: An entry for un pavé dans la mare
un pavé dans la mare | eine Bombe (die irgendwo einschlägt) (= überraschende und beunruhigende Nachricht)
c'est un pavé dans la mare | das schlägt ein wie eine Bombe
qqn jette un pavé dans la mare / qqn envoie un pavé dans la mare / qqn lance un pavé dans la mare | j-m sorgt für Aufregung / j-m erregt die Gemüter / j-m wirbelt einigen Staub auf / j-m sorgt für Wirbel / j-m läßt die Wellen der Aufregung hoch schlagen

In onomasiological dictionaries, additional economy of treatment may be
achieved by presenting collocations common to a particular semantic field at the
entry for the generic lexeme of the field, a suggestion that has already been
implemented by Mel'čuk and Wanner (1996: 233ff.) for the field of German
nouns denoting emotion. However, Mel'čuk and Wanner also draw attention to
the limitations of such an approach, given that even closely related nouns do not
share all their collocates (cf. Part I on the arbitrariness of collocation). For ease
of use and memorisation, it may in any case be preferable to give the entire set of
collocations for each concept or lexeme at the entry for that concept or lexeme.
5. Coverage
This section is meant to illustrate by example how the onomasiological
approach can close some of the gaps found in current encoding dictionaries.
It will be seen that even the best collocational dictionaries are far from covering
anything like the entire range of collocation described in Part I of this article.
The section is divided into three parts. The first deals with breadth of coverage,
the second with depth, while the third offers suggestions for improvement.
5.1 Breadth of coverage
Within the Bilexicon project, a detailed trilingual investigation was conducted
into general-language items peculiar to one area of the vocabulary familiar to
most native speakers, namely road traffic. It was found that, while offering
a fair number of collocations in this area, OC misses out some very common
ones, such as
an empty parking space, a tight parking spot, a traffic jam clears, double
bend, avoid a traffic jam, the motorway (road) links (Paris) with
(Bordeaux), close a motorway, come off the motorway, open a (new)
motorway, motorway journeys, a clear motorway, a valid driving licence,
take one's driving test, nothing coming (etc.)
Table 8 compares the results for the English noun motorway with the
list of motorway collocations given in OC. The comparison shows that a
large number of collocations which an active user (i.e. a translator or language
learner) might need have been missed out. Numerically best represented in
this example, as in traditional dictionaries generally, are noun + noun,
adjective + noun and noun + verb collocations. Equally well covered in
traditional dictionaries are fully fixed expressions such as proverbs or idioms.
Among the collocations of type 2, three-item collocations or triples
(Hausmann 2003) are patchily covered, probably because both monolingual
Table 8: Coverage of motorway in OC and in an ideal dictionary
Published dictionaries:
N + ADJ: busy, four-lane (etc.), orbital, urban
N + V: join, leave, turn off, build
N + N: driving, traffic, network, system, bridge, junction, service area, service station, crash, pile-up
N + Prep.: along the motorway, down the motorway, off the motorway, onto the motorway, on the motorway, up the motorway, motorway from, motorway to
Additional collocations from trilingual analysis:
N + ADJ: big, large, major (→ Fr. grande autoroute); clear (→ G. frei); clogged; congested; controlled; deserted; elevated; empty; toll-free (→ G. gebührenfrei, mautfrei)
N + V: block, come off, cruise, get onto, go onto, go on, turn off, get off, pull off, open, reopen
N + motorway: toll (→ Fr. à péage, G. gebührenpflichtig, mautpflichtig)
motorway + N: access, bridge, company (→ Fr. société d'autoroute), intersection, journey (→ G. Autobahnfahrt), lay-by, madness, maintenance, miles, project (→ Fr. projet d'autoroute), trip
N + Prep.: (be) beside the motorway (→ Fr. border l'autoroute)
triples: electronic motorway tolls (elektronische Mauterhebung), on a clear motorway, on clear motorway (→ G. auf freier (Auto-)Bahn, auf einer freien Autobahn), excellent motorway access, turn a trunk road into a motorway (enlarge a trunk road into a motorway) (→ G. eine Bundesstraße zur Autobahn ausbauen), widen a motorway to four lanes (→ G. vierspurig ausbauen), to do a lot of motorway driving, the motorway links A with B (→ Fr. relie A à B)
and collocational dictionaries such as OC exclude many common compound
nouns from their alphabetical framework. Thus, OC records parking as
a participial noun but does not accord entry status to parking space, and so
misses common triples such as empty parking space or look for a parking
space. It might be argued that empty parking space is not a collocation at all
but a free combination; this line of reasoning is contradicted by the fact that
the equivalent German collocation is freier Parkplatz (as opposed to leerer
Parkplatz, which corresponds to a deserted / empty car park; see Part I of this
article). This underscores again the importance of an onomasiological
approach, which does not pre-empt decisions on what to include on the basis
of a restricted starting list. To take another example, while all unabridged
French dictionaries enter the expressions c'est-à-dire and en l'occurrence, none
of them mentions the frequent co-occurrence of the two.
This brings us to one of the most severely neglected subsets of collocations,
which have been termed second-level discourse markers (Siepmann 2005).
Second-level discourse markers are fixed expressions, restricted collocations
or colligational patterns usually composed of two or more printed words;
typical examples are it is argued that, the same goes for, strictly speaking, force
est de + INF, d'après ce qui précède or with this in mind. Although ubiquitous
in both academic and journalistic language, they have so far been paid scant
attention in lexicography. In PR, for example, there is no mention at all
of the various collocations based on the colligation force est de + INF
(force est de constater / reconnaître / ajouter / ...). As in the case of c'est-à-dire
en l'occurrence, these collocations in turn form their own collocations, which,
unsurprisingly, also go unrecorded in current semasiological dictionaries.
Some examples:
with this in mind let us turn to NP
turning to NP we find/note that-clause
not clause any more than clause
Patchy coverage is also given to conversational formulae of the type don't
make a sound, do you hear me, I couldn't agree more, look at the time. While
these four examples can all be located in CG or CR, those given in Table 9
are absent from at least one of the two.
5.2 Depth of coverage
Turning to depth of coverage, we find that three areas in particular are in need
of improvement, viz. a) triples, b) collocational synonymy and c) complementation
patterns or semantic-pragmatic collocations. The deficiencies found in each
of these areas will now simply be illustrated with a few examples from the
investigation into motoring vocabulary. The investigation revealed that triples
have been severely underestimated by theoreticians of collocations. Again,
the sheer size of the class, not all of whose members have been reproduced
here, indicates the superiority of an onomasiological, multilingual approach.
Where triples can be used alongside two-item collocations the triples have been
underlined (see Table 10).
Similar observations can be made for colligational patterns. The items in
Table 11 are just a small sample of those which have not been given their fair
share of attention in current dictionaries. Detailed cross-linguistic investigation
also threw up evidence of a general difference in patterning between English
and French which could never have been detected in a monolingual
investigation: in English two prepositions are often used in sequence to
describe movement, whereas French must resort to two clauses and two
different verbs to express the same idea (see Table 12). Finally, it may not be
amiss to illustrate (see Table 13) how the onomasiological approach can reveal
that synonymy, whether perfect or approximate, is not at all rare in natural
languages at the level of complex signs (i.e. collocations).
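The onomasiological arrangement described here can be pictured as a lookup keyed by concept rather than by headword. The following is only a minimal sketch under stated assumptions: the concept labels (DRIVING_BEHAVIOUR, VEHICLE_ROLLS_OVER) and the function names are hypothetical, the expressions are taken from Table 13, and the elliptical English variants there have been expanded with the subject "a car". It is not a description of how the Bilexicon data are actually stored.

```python
# Concept-keyed store: each concept groups the synonymous collocations of
# each language. Concept labels are hypothetical; expressions follow Table 13.
CONCEPTS = {
    "DRIVING_BEHAVIOUR": {
        "en": ["driving standards", "driving practice",
               "driving behaviour", "road manners"],
        "de": ["das Fahrverhalten", "das Verhalten im Straßenverkehr"],
    },
    "VEHICLE_ROLLS_OVER": {
        "en": ["a car turns over three times", "a car rolls three times",
               "a car somersaults three times", "a car overturns three times"],
        "de": ["ein Fahrzeug überschlägt sich dreimal"],
    },
}


def find_concept(expression, lang):
    """Return the concept label under which an expression is filed, or None."""
    for concept, by_lang in CONCEPTS.items():
        if expression in by_lang.get(lang, []):
            return concept
    return None


def synonyms(expression, lang):
    """Return the other collocations filed under the same concept and language."""
    concept = find_concept(expression, lang)
    if concept is None:
        return []
    return [e for e in CONCEPTS[concept][lang] if e != expression]


def equivalents(expression, source_lang, target_lang):
    """Return all target-language collocations filed under the same concept."""
    concept = find_concept(expression, source_lang)
    if concept is None:
        return []
    return CONCEPTS[concept].get(target_lang, [])


print(synonyms("driving standards", "en"))
print(equivalents("road manners", "en", "de"))
```

Because synonyms and translation equivalents hang off the same concept node, cross-referencing between synonymous items comes for free, which an alphabetical arrangement would have to add by hand.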
Table 9: Conversational formulae
English | French | German
there's no discussion | il n'y a rien à discuter | da gibt es nichts zu diskutieren
I wouldn't wish it on anyone | c'est quelque chose que je ne souhaiterais pas à mon pire ennemi | das würde ich niemandem wünschen (wollen) / das würde ich nicht einmal meinem ärgsten Feind wünschen
just being friendly | j'ai seulement voulu être (me montrer) aimable avec (pour) toi/vous | ich meine es ja nur gut
this isn't really about NP | (pour toi) il ne s'agit pas de INF / NP | Dir geht es ja gar nicht um NP
and Bob's your uncle | et le tour est joué / et voilà le travail | und fertig ist die Laube
I wouldn't kick him/her out of the bed. | Je ne coucherais pas dans le porte-savon. | Ich würde ihn/sie nicht von der Bettkante stoßen.
5.3 Improving coverage
How can coverage be improved in future? Since OC was based on a large
general corpus (the BNC), this question is intimately linked to another, namely
whether any corpus can approach the collective linguistic experience of
a language community (Howarth 1996: 72). Clearly, the answer still has to be
in the negative at the moment of writing, especially since most of today's major
corpora are narrowly synchronic, comprising only the last fifteen years or so.
Yet in future very large corpora may well be built which will reflect the
knowledge and experience of language accumulated over several generations.
Everything stands or falls by the size and diversity of the corpora consulted,
so that it would obviously be wrong at the present time to infer the non-
existence of a collocation from its absence from a corpus.
As already pointed out, one way to overcome the limitations of exclusive
reliance on a large general corpus is by using sizeable subject-specific com-
parable corpora (this is the old principle of overall frequency vs. range first
Table 10: Examples of common triples not found in other dictionaries (English-German)
English | German
a busy road / a busy street; a much used road | eine stark befahrene Straße / eine viel befahrene Straße / eine verkehrsreiche Straße
on the open road; on clear roads / on clear motorways (etc.) | auf freier Strecke; auf offener Straße
outside lane hogging / blocking the fast lane / sitting in the outside lane | das Blockieren der Überholspur
winter road clearance | der Winterdienst
s.o. changes into first gear / goes into first gear / engages first gear / puts the car into first gear / gets the car into first gear | j-m legt den ersten Gang ein
a good driving road | eine Straße, auf der es sich gut fährt
s.o. goes along a path / a road | j-m fährt (auf) einem Weg / einer Straße
the cab went along the coast road | das Taxi fuhr über die Küstenstraße (fuhr die Küstenstraße entlang)
s.o. uses a road as a rat-run | j-m nutzt eine Straße als einen Schleichweg
s.o. gets into the correct lane / s.o. selects the correct lane / s.o. moves into the correct lane | j-m ordnet sich ein
Table 11: Examples of common colligational patterns not found in other dictionaries (English-German)
English | German
a car comes (+ verb of motion + -ing) | ein Auto kommt (+ Bewegungsverb + Partizip Perfekt)
another car came careering around the corner | noch ein Wagen kam um die Ecke gerast
a road has a ... mph speed limit | auf einer Straße ist die Geschwindigkeit auf ... km/h begrenzt / auf einer Straße gilt eine Geschwindigkeitsbegrenzung von ... km/h
there is a car somewhere | ein Auto fährt irgendwo
there was hardly a car on the streets | es fuhr kaum ein Auto
shall we go the [place name] way? | sollen wir über [Ortsname] fahren?
a road takes s.o. somewhere / a road takes s.o. [distance] somewhere (through / past / to / into / across s.th.) | eine Straße führt (j-mden) irgendwo hin / eine Straße geht irgendwo hin / über eine Straße erreicht man [(nach) Distanz] [Ort]
a gust of wind / a bend (etc.) forces a car / s.o. (somewhere: off the road, into the crash barrier, into the path of another vehicle, etc.); ... forces a car to swerve (somewhere); causes a car to swerve; {wind, force of the impact} pushes a car somewhere | eine Windböe (usw.) drängt j-mden / ein Fahrzeug (irgendwohin) ab; der Wind drückt ein Fahrzeug aus der Fahrtrichtung; der Wind drückt ein Fahrzeug zur Seite; in einer Kurve wird ein Fahrzeug abgedrängt
Table 12: Cross-linguistic difference in verb patterning
English | French
the car swerved (1) across the road and (2) into the ditch | la voiture (1) a traversé la route et (2) a fini dans le fossé
the car veered (1) off the side of the road and (2) several yards down an embankment | la voiture (1) s'est déportée sur le côté de la route et (2) a dévalé à plusieurs mètres en contrebas
applied by Thorndike 1921); in addition, all such corpora should be compiled
for several languages. This is exactly the procedure followed in the afore-
mentioned investigation of road traffic vocabulary, which used a specialist
trilingual corpus of around 200 million words and three large general corpora
of around 600 million words. Such breadth in corpus selection will usually
enable the lexicographer to fill gaps in the corpora of one language by
translating an item from another language (of course, the translation should
itself be checked against a very large corpus such as the Internet). To give
a simple example, the French collocation heurter de plein fouet is highly
common in newspaper reports on car accidents, but corresponding English
collocations such as hit with full force / at speed are extremely rare in
comparable English corpora.
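The distinction between overall frequency and range invoked above can be made concrete with a small calculation. The sketch below is purely illustrative: the three sub-corpora are invented toy texts, the function name frequency_and_range is hypothetical, and plain substring counting stands in for the lemmatised queries a real investigation would require.

```python
def frequency_and_range(corpora, phrase):
    """Return (overall frequency, range) for a phrase.

    corpora: dict mapping a sub-corpus name to its text (toy data here).
    Overall frequency is the total number of occurrences in all sub-corpora;
    range is the number of sub-corpora in which the phrase occurs at all.
    """
    needle = phrase.lower()
    counts = {name: text.lower().count(needle) for name, text in corpora.items()}
    overall = sum(counts.values())
    spread = sum(1 for c in counts.values() if c > 0)
    return overall, spread


# Invented miniature sub-corpora, for illustration only.
toy_corpora = {
    "general": "The motorway was busy. We left the motorway at dawn.",
    "motoring": ("He came off the motorway. On a clear motorway you can cruise. "
                 "She came off the motorway early."),
    "news": "Police closed the road after the crash.",
}

print(frequency_and_range(toy_corpora, "came off the motorway"))  # (2, 1)
print(frequency_and_range(toy_corpora, "the motorway"))           # (4, 2)
```

An item such as come off the motorway, which here occurs only in the subject-specific sub-corpus, is exactly the kind of collocation that exclusive reliance on a general corpus is liable to miss.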
Such a procedure is also of great interest to contrastivists, since it enables
them to discover lexical gaps and divergences in colligational or clause patterns
(see above). Thus, the aforementioned study of motoring vocabulary showed
that there is no standard English equivalent for German aus der Kurve getragen
werden or French être déporté dans un virage; however, expressions such
as wipe out on the bend or veer off the road on the bend may fill the bill.
Table 13: Collocational synonymy in an onomasiological dictionary
English | German
driving standards / driving practice / driving behaviour / road manners | das Fahrverhalten / das Verhalten im Straßenverkehr
s.o. sticks to the speed limit / s.o. keeps to the speed limit / s.o. observes the speed limit | j-m hält sich an die Geschwindigkeitsbegrenzung / j-m beachtet die Geschwindigkeitsbegrenzung
a car turns over three times / rolls three times / somersaults three times / overturns three times | j-m / ein Fahrzeug überschlägt sich dreimal
s.o. / a car is stopped by the police (*s.o. is pulled by the cops) | j-m / ein Wagen wird von der Polizei angehalten (*wird von den Bullen gestoppt)
a car / a trailer swerves / goes out of control / wipes out / veers off its path | j-m / ein Wagen bricht aus; j-m gerät aus der Spur; j-m kommt von der Fahrtrichtung ab; j-m gerät ins Trudeln
a car gets trapped under another / a car is jammed under another / a car is left wedged under another / a car is left embedded under another | ein Fahrzeug verkeilt sich in einem anderen / ein Fahrzeug ist eingekeilt unter einem anderen
Similarly, monolingual German lexicography might well overlook such
colligational patterns as Geschwindigkeit auf der Autobahn or Straße, auf der
sich gut fahren läßt, whereas combinations such as the compound noun
motorway speed or the adjective-noun collocation a good driving road will be
readily detectable in an English corpus. Of course, such considerations are also
true for the other translation direction (cf. sick note on demand /
Gefälligkeitsattest / certificat de complaisance; accident involving ... / accident
mettant en cause ... / Unfall, an dem ... beteiligt sind).
Finally, it should be noted that, if the aim is to cover collocation as well as
colligation, then it will be impossible to fully automate the dictionary-making
process in the foreseeable future. The reason for this is that such colligational
patterns as NP/ADJ dans l'âme / en herbe (etc.) cannot be located in even the
most sophisticated tagged corpora, since the retrieval software will also come
up with such sequences as NP/ADJ dans la maison / dans la grotte / dans
l'hôtel (etc.). Human intervention will thus remain indispensable.
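The over-generation problem just described can be illustrated with a toy query. What follows is a sketch under explicit assumptions: the four tagged sequences are hand-made illustrations rather than output from any real tagger or corpus, the tag labels are arbitrary, and the set REVIEWED_AS_COLLIGATION merely stands in for the lexicographer's judgement.

```python
# Hand-tagged toy sequences (hypothetical data, not real corpus output).
tagged = [
    [("poète", "NOUN"), ("dans", "PREP"), ("l'", "DET"), ("âme", "NOUN")],
    [("chat", "NOUN"), ("dans", "PREP"), ("la", "DET"), ("maison", "NOUN")],
    [("ours", "NOUN"), ("dans", "PREP"), ("la", "DET"), ("grotte", "NOUN")],
    [("romantique", "ADJ"), ("dans", "PREP"), ("l'", "DET"), ("âme", "NOUN")],
]


def match_pattern(sentence):
    """Yield every four-token window matching NOUN/ADJ + 'dans' + DET + NOUN."""
    for i in range(len(sentence) - 3):
        (w1, t1), (w2, _), (w3, t3), (w4, t4) = sentence[i:i + 4]
        if t1 in {"NOUN", "ADJ"} and w2 == "dans" and t3 == "DET" and t4 == "NOUN":
            yield (w1, w2, w3, w4)


# The tag-based query cannot tell the colligation from free combinations.
hits = [hit for sentence in tagged for hit in match_pattern(sentence)]

# Stand-in for human review: the prepositional phrases judged to be idiomatic.
REVIEWED_AS_COLLIGATION = {("dans", "l'", "âme")}

kept = [h for h in hits if h[1:] in REVIEWED_AS_COLLIGATION]
rejected = [h for h in hits if h[1:] not in REVIEWED_AS_COLLIGATION]

print(len(hits), "hits;", len(kept), "kept after review")
for hit in rejected:
    print("free combination:", " ".join(hit))
```

All four sequences satisfy the formal pattern; only the manually supplied list separates dans l'âme from dans la maison and dans la grotte, which is the step that cannot at present be automated.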
6. Collocation types, lemma types and citation forms
As seen above, a useful distinction can be established between four major types
of collocational relationship. However, the distinction cannot be transferred
as such to the dictionary for a number of reasons:
(1) Firstly, there is no one-to-one correspondence between collocation types
and the three traditional lemma types (one-item lemma, multi-item lemma,
morphematic lemma); long-distance collocations do not fall into any of
these three categories; they also cut across the boundary of categories 2
and 3, as do some two-item collocations.
(2) Any dictionary maker who aims at commercial viability and user
friendliness should at least be wary of representing collocations of type 3
by means of general semantic labels such as [uncertainty] + not so. In such
cases it may be wiser to exemplify rather than abstract away from actual
instances. For maximum user friendliness, the example should exhibit
prototypical features of the collocation to be recorded (cf. Harras 1989: 611
on entry words; on prototype theory, see Aitchison 1994). In learners'
dictionaries, the definition may help to introduce an element of generality
or abstraction that would be missing in other dictionaries, as witness the
example in Cobuild style (see Figure 1; Siepmann 2005: 318).
Note the pioneering use of broken underlines to illustrate the presence of
long-distance collocational attraction based on semantic features. The same
typographical presentation could be used in any bilingual dictionary. Since
bilingual dictionaries do not normally contain definitions, at least two examples
of each collocation should be given for the user to form a correct under-
standing of its use and to be able to use it productively in a new context.
Accordingly, unabridged dictionaries of the future should contain at least
the three major types of lemmas (one-item lemmas, multi-item lemmas
and morphematic lemmas)10; to this we might add separable lemmas as
representations of long-distance collocations and some collocations of type 3
(see Table 14). As seen in Tables 5 and 6, complementation patterns can be
shown using placeholders such as so or sth or typical representatives of the
semantic class which can be inserted into a particular slot, such as abeille
in Table 5.
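For a dictionary database, the lemma types proposed here might be modelled along the following lines. This is no more than a sketch: the class and field names are hypothetical, the citation forms are those given in Table 14, and no separable lemma is instantiated because the article gives no example of one at this point.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class LemmaType(Enum):
    MORPHEMATIC = "morphematic lemma"
    ONE_ITEM = "one-item lemma"
    MULTI_ITEM = "multi-item lemma"
    SEPARABLE = "separable lemma"  # proposed for long-distance collocations


@dataclass(frozen=True)
class Lemma:
    citation_form: str
    lemma_type: LemmaType
    placeholders: tuple = ()  # open slots such as N, so or sth


# Citation forms taken from Table 14.
lemmas = [
    Lemma("un micro-N", LemmaType.MORPHEMATIC, ("N",)),
    Lemma("une pomme", LemmaType.ONE_ITEM),
    Lemma("une pomme de terre", LemmaType.MULTI_ITEM),
    Lemma("N à ses heures", LemmaType.MULTI_ITEM, ("N",)),
]


def group_by_type(items):
    """Group citation forms by lemma type, preserving input order."""
    grouped = defaultdict(list)
    for lemma in items:
        grouped[lemma.lemma_type].append(lemma.citation_form)
    return dict(grouped)


for lemma_type, forms in group_by_type(lemmas).items():
    print(lemma_type.value, "->", forms)
```

Keeping the placeholders in a separate field leaves the citation form readable for the user while still allowing complementation patterns to be searched by slot.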
7. The limits of translatability
Opponents of bilingual dictionaries or vocabulary lists for encoding purposes
have often argued that such learning materials encourage the erroneous
assumption of one-to-one equivalences between items. The argument is
clearly valid if we equate one-word items such as house and maison or
English population and French population, but it falls apart in the case of
so /sou/ (...) 12 You can use not so to say that what you have just stated is untrue although it may have seemed probable at first sight. This use is particularly common in written English. Some might think Volkswagen, which now owns 70 per cent of the Czech company, would have thought the Skoda's identity problematic. Not so. VW sees Skoda as one of the most recognised brand names in advertising.
PHR as sentence; PRAGMATICS
Figure 1: A sample entry for not so in Cobuild style
Table 14: Lemma types
Linguistic Category Lemma type Example
morpheme morphematic lemma un micro-N, ein Hobby-N
lexeme one-item lemma une pomme
collocations of
type 1, 2 and 3
multi-item lemma:
a) colligational
b) collocational
a) N à ses heures
b) une pomme de terre,
tomber dan