
Computational Lexicography:

Mapping Meaning onto Use

Patrick Hanks

Institute of Formal and Applied Linguistics,

Charles University in Prague

Masaryk University, December 2009


Talk Outline

1. How to map word meaning onto word use?

– Contrast American and European linguistic theory

2. Dictionaries before and after corpora

3. Collocations and lexical preferences

4. Practical corpus analysis

• Look first for the pattern, not the meaning

5. Creativity in language: how writers and speakers ‘exploit’ normal patterns of word use


Lexical theory in American linguistics

• Bloomfield; Chomsky and his followers –

– say a lot about syntax, but little about words and meanings,

– even though, according to Chomsky’s projection principle, the syntax of a well-formed sentence is “projected” from the lexical items in the sentence, plus “selectional restrictions”.

– “Meaning is dangerous ground.” – N. Chomsky

• Pustejovsky: The Generative Lexicon

• Fillmore; Goldberg; Jackendoff: Construction Grammar

– Meanings are carried by constructions

– A construction can be a word (e.g. “sleep”) or a phrase (e.g. “she slept her way to the top”)


Foundations of lexical research (in European linguistics)

• Saussure and 20th-century structuralists

– Semantic field theory (Trier, Porzig, Gipper)

– Structure of meaning in language (Coseriu, Pottier)

– Lyons, Ullmann

• Firth, Halliday, Sinclair

– “You shall know a word by the company it keeps”

– To study the lexicon is to study collocations

– By empirical analysis of corpus data


Making lexical analysis possible

• Wittgenstein (1953):

– “Do not ask for the meaning, ask for the use.”

– “What is a game? Do not assume a common property, but look and see.”

• Grice (1957, 1975):

– Conversational co-operation presupposes a set of linguistic conventions that users of a language know and expect others to know

• Rosch (1975, etc.): Prototype theory

• Lakoff and Johnson (1981):

– The fundamentally metaphorical nature of many abstract concepts

– Regular alternation between concrete and abstract (e.g. grasp a handle, grasp an idea)


Dictionaries before corpora

• Based on collections of citations

– Literary language, not everyday language

– “Historical principles”: oldest meaning first

• Or based on introspection

– Introspection distorts data. Example: “total” as a verb

• EXERCISE: Invent a sentence using total as a verb.

– What we think we say and what we actually say are different

– Cognitive salience (ease of recall) vs. social salience (frequency of use)


James Murray (1878) predicts the need for corpus data

• “The editor and his assistants have to spend precious hours searching for examples of common everyday words. Thus, in the slips, we have 50 citations for abusion, but for abuse, not five.” – James Murray, Presidential address to the Philological Society, 1878

• Murray was the first editor of the Oxford English Dictionary


Analysis of Meaning in Language

• Analysis based on predicate logic is doomed to failure:

– Words are NOT building blocks in a ‘Lego set’

– A word does NOT denote ‘all and only’ members of a set

– Word meaning is NOT determined by necessary and sufficient conditions for set membership

• Instead, a prototype-based approach to the lexicon is necessary:

– mapping prototypical interpretations onto prototypical phraseology

– Prototypical phraseology includes collocational preferences

– E.g. what do you hazard? – Typically, you hazard a guess.

– Not a necessary condition, but a collocational preference


Patterns in Corpora

• When you first open a concordance, very often some patterns of use leap out at you.

– Collocations make patterns: one word goes with another

– To see how words make meanings, we need to analyse collocations

• The more you look, the more patterns you see.

• BUT

• When you try to formalize the patterns, you start to see more and more exceptions.

• The boundaries are fuzzy and there are many outlying cases.


John Sinclair (1933-2007): Why is Sinclair important? (1)

Sinclair was editor-in-chief of the Cobuild dictionaries for foreign learners. (I was the managing editor.)

Collocations:

• “Many, if not most meanings, require the presence of more than one word for their normal realization.”

• “Patterns of co-selection among words, which are much stronger than any description has yet allowed for, have a direct connection with meaning.” (Sinclair 1998, The Lexical Item, page 4)


Why is Sinclair important? (2)

The idiom principle (also known as the phraseological tendency) vs. the open-choice principle: “The principle of idiom is that a language user has available to him or her a large number of semi-preconstructed phrases that constitute single choices, even though they might appear to be analysable into segments.” (Sinclair 1991. Corpus, Concordance, Collocation, p. 110)

“Tending towards open choice is what we can dub the terminological tendency, which is the tendency for a word to have a fixed meaning in reference to the world. ... tending towards idiomaticity is the phraseological tendency, where words tend to go together and make meanings by their combinations.” (Sinclair 2004. Trust the Text, p. 29)


Some problems with current dictionaries

• Definitions are not mutually exclusive.

• There is no general agreement on what counts as a word sense.

• No clear criteria are given in dictionaries for distinguishing one sense from another.

• There is very little syntagmatic or collocational information in English dictionaries.

• Some Italian, German, and Greek dictionaries try to give it, but without sufficient evidence.


Dictionary definitions are often not mutually exclusive

• People have tended to assume that dictionary sense distinctions are mutually exclusive. Quite often, they are not. E.g.:

pour, v.t.

1. cause a liquid to flow.

2. serve (tea, coffee, etc.) by putting it into a cup.

These definitions look different, but sense 2 is actually just a subset of sense 1.

Tea, coffee, etc. are liquids, and “putting” is done by “causing to flow”. Only the social context of sense 2 is different (more restricted).


Word sense disambiguation

Lesk (1986): ‘How to tell a pine cone from an ice cream cone’, using OALD definitions:

pine 1. kind of evergreen tree with needle-shaped leaves. 2. waste away through sorrow or illness.

cone 1. a solid object with a round flat base and sides that slope up to a point… 2. something of this shape whether solid or hollow. 3. a piece of thin crisp biscuit shaped like a cone, which you can put ice cream in to eat it. 4. the fruit of certain evergreen trees.

• Lesk here is trying to “disambiguate” by contextual criteria.

• Existing dictionary definitions are his starting point.

• He did not look for patterns of word use in corpora.

• We want to research new definitions based on actual usage.

• Associate the meaning with the PATTERN, not the word.
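For comparison with the pattern-based approach, Lesk's idea of choosing senses by definition overlap can be sketched in a few lines of Python. This is a simplified pairwise variant, not Lesk's own program: the definitions are the OALD texts quoted above (cone sense 1 truncated as on the slide), while the stopword list, the crude plural stripping and the function names are illustrative assumptions. Pine sense 1 and cone sense 4 win because both definitions contain evergreen and tree(s).

```python
import re
from itertools import product

# OALD definitions as quoted on the slide (cone sense 1 is truncated there).
PINE = {
    1: "kind of evergreen tree with needle-shaped leaves",
    2: "waste away through sorrow or illness",
}
CONE = {
    1: "a solid object with a round flat base and sides that slope up to a point",
    2: "something of this shape whether solid or hollow",
    3: "a piece of thin crisp biscuit shaped like a cone, which you can put ice cream in to eat it",
    4: "the fruit of certain evergreen trees",
}

# Small illustrative stopword list (an assumption, not Lesk's).
STOPWORDS = {"a", "an", "the", "of", "or", "and", "with", "that", "this",
             "to", "in", "it", "you", "can", "which", "like", "up"}


def content_words(definition):
    """Lowercased content words, with a crude plural strip so 'trees' matches 'tree'."""
    words = [w for w in re.findall(r"[a-z]+", definition.lower()) if w not in STOPWORDS]
    return {w[:-1] if len(w) > 3 and w.endswith("s") and not w.endswith("ss") else w
            for w in words}


def best_sense_pair(senses_a, senses_b):
    """Simplified Lesk: pick the pair of senses whose definitions share most content words."""
    def overlap(pair):
        a, b = pair
        return len(content_words(senses_a[a]) & content_words(senses_b[b]))

    best = max(product(senses_a, senses_b), key=overlap)
    shared = content_words(senses_a[best[0]]) & content_words(senses_b[best[1]])
    return best, shared


print(best_sense_pair(PINE, CONE))
```

Note that the result depends entirely on the wording of existing definitions, which is the limitation the slide points out.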


Lumping and splitting

Most dictionaries are splitters. E.g. why did OALD 1963 make these two senses (cone)?

• 1. a solid object with a round flat base and sides that slope up to a point… 2. something of this shape whether solid or hollow.

Why not make it:

• a solid or hollow object with a round flat base and sides that slope up to a point

This problem is endlessly multiplied in entry after entry in most English dictionaries.


Dictionaries, collocations, phraseology: a new approach

• Start with verbs

– The verb is the pivot of the clause

• Distinguish normal uses from creative uses and mistakes

– Dictionaries should record norms, not exploitations of norms

• Use the corpus to find the phraseological and collocational patterns associated with each verb


Meaning is context

• What does fire mean? file? treat? land? …

• Words in isolation don’t have meaning!

• They have meaning potential.

– Meaning potential is multiple and vague.

• Put a word in context, and its vagueness is reduced or even (for practical purposes) eliminated.

– Theoretically, there is always the potential for extended meaning.


Context and Collocations

• “You shall know a word by the company it keeps” – J. R. Firth.

• Corpus analysis can show what company our words keep.

• Frequency alone is not enough: “of the” is a frequent collocation – but not interesting!

• “(the) storm abated” is less frequent, but more interesting. Contrasted with “(the) threat abated”, it can give a different meaning to the verb abate.

• We need a way of measuring the statistical significance of collocations. Sketch Engine provides one.


Mutual information

• A way of computing the statistical significance of two words in collocation.

• Compares the actual co-occurrence of two words in a corpus with chance.

• Church and Hanks (1990): ‘Word Association Norms, Mutual Information, and Lexicography’ in Computational Linguistics 16:1.

• Kilgarriff, Rychlý, et al. (2004): “The Sketch Engine”, Proceedings of Euralex 2004. Lorient, France.
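As a rough illustration of the Church and Hanks measure, the Python sketch below computes pointwise mutual information, I(x, y) = log2(P(x, y) / (P(x) P(y))), estimating each probability as a raw count divided by corpus size N, so that I(x, y) = log2(f(x, y) · N / (f(x) f(y))). It is a simplified sketch under stated assumptions: a pair is counted when y follows x within a hypothetical `window` of tokens, no correction is made for window size, and the function name `pmi_scores` and its `min_count` cutoff are illustrative, not the Sketch Engine's implementation.

```python
import math
from collections import Counter


def pmi_scores(tokens, window=5, min_count=1):
    """Pointwise mutual information for ordered pairs (x, y) where y follows x
    within `window` tokens: log2(f(x,y) * N / (f(x) * f(y)))."""
    n = len(tokens)
    word_freq = Counter(tokens)
    pair_freq = Counter()
    for i, x in enumerate(tokens):
        # count each word that follows x inside the window
        for j in range(i + 1, min(i + 1 + window, n)):
            pair_freq[(x, tokens[j])] += 1
    scores = {}
    for (x, y), f_xy in pair_freq.items():
        if f_xy < min_count:
            continue  # very rare pairs give unstable estimates
        scores[(x, y)] = math.log2(f_xy * n / (word_freq[x] * word_freq[y]))
    return scores


tokens = "the storm abated the threat abated the storm raged".split()
for pair, score in sorted(pmi_scores(tokens, window=1).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

In this toy input the repeated pair “the storm” scores lower than the single “threat abated”, echoing the point that frequency alone is not enough; on real data, MI also overrates very rare pairs, which is why a frequency cutoff such as `min_count` is usually applied.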


Deciding relevant context

• Peter treated Mary.

• Peter treated Mary respectfully.

• Peter treated Mary with respect.

• Peter treated Mary with antibiotics.

• Peter treated Mary to lunch.

• Peter treated Mary to his views on George W. Bush

• Peter treated the woodwork with creosote.


Normal implicatures: taking prototypes and domain seriously

• If someone files a lawsuit, they activate a procedure asking a court for justice.

• When a pilot files a flight plan, he or she informs ground control of the intended route and obtains permission to begin flying. …

(12 more such definitions of file, verb.)

Now we have to find out whether any other words have a similar meaning to “lawsuit” or “flight plan” in this context.

Contrast:

• When a group of people file into a room or other place, they walk in one behind the other.


Norms and Exploitations

• Anyone acquiring a language must learn not one, but two, forms of rule-governed behavior:

1. The ability to use words normally.

2. The ability to exploit the norms mentioned in 1 in a creative way, typically in order to say new things or to say old things in a new, interesting way.

• The norms of natural languages have not yet been satisfactorily described (due in part to lack of data until about 10 years ago, and in part to linguists treating exploitations as if they were norms).

• Exploitation rules – second-order rules – cannot be fully described until the norms are known.


Norms

• How words are normally used.

• Patterns of usage.

• Descriptive (not prescriptive).

• Norms are acquired by priming (M. Hoey) and by reinforcement (repeated exposure).

• Norms can be discovered by systematic, empirical Corpus Pattern Analysis (CPA).


Exploitations

• People don’t just say the same thing, using the same words repeatedly.

• They also exploit norms in order to say new things, or in order to say old things in new and interesting ways.

• Exploitations include metaphor, ellipsis, word creation, and other figures of speech.

• Exploitations are the rules that govern linguistic creativity.


The CPA method

• CPA: Corpus Pattern Analysis (based on TNE: the Theory of Norms and Exploitations).

1. Find the significant collocates and tag those lines first.

2. Then create a sample concordance (KWIC index):

– from a ‘balanced’ corpus (i.e. general language): BNC50

– 250 examples of actual uses of the word to start with

3. Classify every line in the sample, by context.

4. Analyse a larger sample if necessary.

5. Use introspection to interpret data, but not to create data.
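Steps 2 and 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the CPA toolset: the input is a plain string rather than a lemmatised, tagged corpus such as BNC50, the node word is found with a hand-written regular expression for the inflected forms of one verb, and the helper names `kwic`, `format_kwic` and `sample_lines` are hypothetical. A fixed random seed keeps the sample reproducible, so another analyst can check the classification.

```python
import random
import re


def kwic(text, node_regex, width=30):
    """Return (left context, node, right context) for every match of node_regex."""
    lines = []
    for m in re.finditer(node_regex, text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines


def format_kwic(line, width=30):
    """Right-align the left context so that the node words line up in one column."""
    left, node, right = line
    return f"{left:>{width}} {node} {right}"


def sample_lines(lines, k=250, seed=0):
    """Reproducible random sample of up to k concordance lines (CPA starts with 250)."""
    if len(lines) <= k:
        return list(lines)
    return random.Random(seed).sample(lines, k)


text = ("After dawn the storm suddenly abated. Ruth was there waiting. "
        "When the threat abated in 1989 the talks resumed. The pain began to abate.")
# Hypothetical regex covering the inflected forms of 'abate'.
forms = r"\babat(?:e|es|ed|ing)\b"
for line in sample_lines(kwic(text, forms), k=250):
    print(format_kwic(line))
```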


In CPA, every line in the sample must be classified

An important principle of statistical analysis.

The classes are:

• Norms

• Exploitations

• Alternations

• Names (Midnight Storm: name of a horse, not a storm)

• Mentions (to mention a word or phrase is not to use it)

• Errors (e.g. learned mistyped as leaned)

• Unassignables
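The requirement that every sampled line receive a class lends itself to a simple check. The Python sketch below is an illustration under assumptions, not published CPA software: it represents the seven classes as an `Enum`, refuses to tally a sample containing an unclassified line, and reports each (class, pattern) combination as a percentage of the sample, giving figures of the kind shown for the patterns of irritate and abate. The names `LineClass` and `tally` are hypothetical; the four sample lines are taken from the abate concordances in this talk.

```python
from collections import Counter
from enum import Enum


class LineClass(Enum):
    NORM = "norm"
    EXPLOITATION = "exploitation"
    ALTERNATION = "alternation"
    NAME = "name"
    MENTION = "mention"
    ERROR = "error"
    UNASSIGNABLE = "unassignable"


def tally(classified):
    """classified: list of (concordance_line, LineClass, pattern_number or None).
    Returns the percentage of the sample for each (class, pattern) pair."""
    if not classified:
        return {}
    for line, cls, _ in classified:
        if not isinstance(cls, LineClass):
            # CPA principle: no line in the sample may be left unclassified
            raise ValueError(f"unclassified line: {line!r}")
    counts = Counter((cls, pattern) for _, cls, pattern in classified)
    total = len(classified)
    return {key: 100 * n / total for key, n in counts.items()}


sample = [
    ("after dawn the storm suddenly abated", LineClass.NORM, 1),
    ("when the threat abated in 1989", LineClass.NORM, 4),
    ("his anxiety abated a little", LineClass.NORM, 5),
    ("his kindlier feelings abated", LineClass.EXPLOITATION, 5),
]
print(tally(sample))
```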


Methodological precepts

• Don’t look for necessary conditions for the meaning of a word. (There aren’t any.)

– “This elephant is a mouse” is an unlikely sentence of English – but not meaningless.

• There are innumerable possible but unlikely sentences.

• Don’t try to account for all possibilities. Very few of them are interesting.

• Corpus linguistics and prototype theory provide a new focus – on actual and probable sentences and meanings.


A typical pattern dictionary entry

• irritate

PATTERN 1 (90%): [[Anything]] irritate [[Human]]

IMPLICATURE: [[Anything]] causes [[Human]] to feel mildly annoyed.

PATTERN 2 (8%): [[Stuff]] irritate [[Body Part]]

IMPLICATURE: [[Stuff]] causes [[Body Part]] to become inflamed and somewhat painful.

• Notes:

Semantic values are assigned to arguments.

Both patterns are transitive (V n), but they have different meanings.

They are distinguished by the semantic types of the nouns.

Getting the right level of semantic generalization for each noun is hard.
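One way to make such an entry machine-usable is to store each pattern with the semantic types of its arguments and select a pattern by checking the types of the actual subject and object nouns. The Python sketch below assumes a tiny, hypothetical noun-to-type lexicon (`SEMANTIC_TYPES`) and treats [[Anything]] as a type that every noun has; a real system would need a curated ontology, and choosing its level of generalization is the hard part noted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    number: int
    subject_type: str
    object_type: str
    implicature: str
    percent: float  # share of the analysed sample


IRRITATE = [
    Pattern(1, "Anything", "Human",
            "[[Anything]] causes [[Human]] to feel mildly annoyed.", 90.0),
    Pattern(2, "Stuff", "Body Part",
            "[[Stuff]] causes [[Body Part]] to become inflamed and somewhat painful.", 8.0),
]

# Hypothetical toy lexicon; every noun is implicitly also of type "Anything".
SEMANTIC_TYPES = {
    "neighbour": {"Human"},
    "customer": {"Human"},
    "smoke": {"Stuff"},
    "soap": {"Stuff"},
    "eyes": {"Body Part"},
    "skin": {"Body Part"},
}


def types_of(noun):
    return SEMANTIC_TYPES.get(noun, set()) | {"Anything"}


def match_pattern(subject, obj, patterns=IRRITATE):
    """Return the first pattern whose argument types fit the two nouns, or None."""
    for p in patterns:
        if p.subject_type in types_of(subject) and p.object_type in types_of(obj):
            return p
    return None


print(match_pattern("smoke", "eyes").implicature)
```

A `None` result flags a line for the analyst: it may be an exploitation, or a sign that the type lexicon is too coarse.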


Another CPA verb norm

abate/V BNC frequency: 185 in 100 million words.

1. [[Event = Storm]] abate [NO OBJ] (11%)

2. [[Event = Flood]] abate [NO OBJ] (4%)

3. [[Event = Fever]] abate [NO OBJ] (2%)

4. [[Event = Problem]] abate [NO OBJ] (44%)

5. [[Emotion = Negative]] abate [NO OBJ] (20%)

6. [[Person | Action]] abate [[State = Nuisance]] (19%) (Domain: Law)

But if you wanted to, you could lump 1-5 together into a single sense.
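The lumping option can be made concrete with the percentages listed above. The Python sketch below stores the six abate patterns as simple tuples and groups them by whether they take a direct object; the tuple layout, the `transitive` flag and the function name `lump_by_valency` are illustrative assumptions. Patterns 1 to 5, which all have the form [[…]] abate [NO OBJ], merge into one intransitive sense.

```python
from collections import defaultdict

# (pattern number, pattern, takes a direct object?, % of analysed sample)
ABATE = [
    (1, "[[Event = Storm]] abate [NO OBJ]", False, 11),
    (2, "[[Event = Flood]] abate [NO OBJ]", False, 4),
    (3, "[[Event = Fever]] abate [NO OBJ]", False, 2),
    (4, "[[Event = Problem]] abate [NO OBJ]", False, 44),
    (5, "[[Emotion = Negative]] abate [NO OBJ]", False, 20),
    (6, "[[Person | Action]] abate [[State = Nuisance]]", True, 19),
]


def lump_by_valency(patterns):
    """Merge patterns that share a valency and add up their percentages."""
    lumped = defaultdict(lambda: {"patterns": [], "percent": 0})
    for number, _, transitive, percent in patterns:
        key = "transitive" if transitive else "intransitive"
        lumped[key]["patterns"].append(number)
        lumped[key]["percent"] += percent
    return dict(lumped)


for sense, info in lump_by_valency(ABATE).items():
    print(sense, info["patterns"], f"{info['percent']}%")
```

The lumped intransitive sense covers 81% of the sample; what it discards is the subject's semantic type, which the split patterns keep.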


Unsorted sample from a concordance

incessant noise and bustle had abated. It seemed everyone was up

after dawn the storm suddenly abated. Ruth was there waiting when

Thankfully, the storm had abated, at least for the moment, and

storm outside was beginning to abate, but the sky was still ominous

Fortunately, much of the fuss has abated, but not before hundreds of

, after the shock had begun to abate, the vision of Benedict's

been arrested and street violence abated, the ruling party stopped

he declared the recession to be abating, only hours before the

‘soft landing’ in which inflation abates but growth continues moderate

the threshold. The fearful noise abated in its intensity, trailed

ability. However, when the threat abated in 1989 with a ceasefire in

bag to the ocean. The storm was abating rapidly, the evening sky

ferocity of sectarian politics abated somewhat between 1931 and

storm. By dawn the weather had abated though the sea was still angry


[[Event = Storm]] abate [NO OBJ]

dry kit and go again.The storm abates a bit, and there is no problem in

ling.Thankfully, the storm had abated, at least for the moment, and the

sting his time until the storm abated but also endangering his life, Ge

storm outside was beginning to abate, but the sky was still ominously o

bag to the ocean.The storm was abating rapidly, the evening sky clearin

after dawn the storm suddenly abated.Ruth was there waiting when the h

t he wait until the rain storm abated.She had her way and Corbett went

storm.By dawn the weather had abated though the sea was still angry, i

lcolm White, and the gales had abated: Yachting World had performed the

he rain, which gave no sign of abating, knowing her options were limite

n became a downpour that never abated all day.My only protection was

ned away, the roar of the wind abating as he drew the hatch closed behi


[[Event = Problem]] abate [NO OBJ]

‘soft landing’ in which inflation abates but growth continues modera

Fortunately, much of the fuss has abated, but not before hundreds of

the threshold. The fearful noise abated in its intensity, trailed

incessant noise and bustle had abated. It seemed everyone was up

ability. However, when the threat abated in 1989 with a ceasefire in

the Intifada shows little sign of abating. It is a cliche to say that

h he declared the recession to be abating, only hours before the pub

he ferocity of sectarian politics abated somewhat between 1931 and 1

been arrested and street violence abated, the ruling party stopped b

the dispute showed no sign of abating yesterday. Crews in


[[Emotion = Negative]] abate [NO OBJ]

(selected lines)

ript on the table and his anxiety abated a little.This talented, if

that her initial awkwardness had abated # for she had never seen a

es if some inner pressure doesn't abate.He wanted to play at the fun

Baker in the foyer and my anxiety abated.He seemed disappointed and

hained at the time.When the agony abated he was prepared to laugh wi

self; the pain gradually began to abate spontaneously, a great relie

ght, after the shock had begun to abate, the vision of Benedict's sn

y calm, control it!) The fear was abating, the trembling beginning t

his dark eyes. That fear did not abate when, briefly, he halted. For

AN EXPLOITATION OF THIS NORM:

isapproval, his kindlier feelings abated, to be replaced by a resurg

(“kindlier feelings” are normally positive, not negative.)


Part of the lexical set [[Event = Problem]] as subject of ‘abate’

From BNC: {fuss, problem, tensions, fighting, price war, hysterical media clap-trap, disruption, slump, inflation, recession, the Mozart frenzy, working-class militancy, hostility, intimidation, ferocity of sectarian politics, diplomatic isolation, dispute, …}

From AP: {threat, crisis, fighting, hijackings, protests, tensions, violence, bloodshed, problem, crime, guerrilla attacks, turmoil, shelling, shooting, artillery duels, fire-code violations, unrest, inflationary pressures, layoffs, bloodletting, revolution, murder of foreigners, public furor, eruptions, bad publicity, outbreak, jeering, criticism, infighting, risk, crisis, …}

(All these are kinds of problem.)


Part of the lexical set [[Emotion = Negative]] as subject of ‘abate’

From BNC: {anxiety, fear, emotion, rage, anger, fury, pain, agony, feelings,…}

From AP: {rage, anger, panic, animosity, concern, …}
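Because these lexical sets are what distinguish patterns 1, 4 and 5, a first-pass classifier can simply look up the head noun of the subject. The Python sketch below is a hypothetical illustration using a few members of each set taken from the lists and concordance lines above; the dictionary `LEXICAL_SETS` and the function `assign_abate_pattern` are assumptions. Since lexical sets are open-ended, a noun missing from every set, or found in more than one, is returned as `None` for the analyst to decide.

```python
# Excerpts of the lexical sets for the subject of 'abate', keyed by pattern number.
LEXICAL_SETS = {
    1: {"storm", "weather", "gales", "rain"},                      # [[Event = Storm]]
    4: {"fuss", "problem", "tensions", "fighting", "inflation",
        "recession", "threat", "crisis", "violence", "dispute"},   # [[Event = Problem]]
    5: {"anxiety", "fear", "rage", "anger", "fury", "pain",
        "agony", "panic"},                                          # [[Emotion = Negative]]
}


def assign_abate_pattern(subject_head):
    """Return the pattern whose lexical set contains the subject's head noun,
    or None if the noun is in no set or in more than one."""
    hits = [n for n, members in LEXICAL_SETS.items() if subject_head.lower() in members]
    return hits[0] if len(hits) == 1 else None


for noun in ["storm", "inflation", "anxiety", "noise"]:
    print(noun, assign_abate_pattern(noun))
```

“noise”, which the analysis above places among the [[Event = Problem]] lines, comes back as `None` simply because it is not in this excerpt: a reminder that a finite list can only approximate an open-ended set.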


A domain-specific norm: [[Person | Action]] abate [[Nuisance]]

(DOMAIN: Law. Register: Jargon)

o undertake further measures to abate the odour, and in Attorney Ge

us methods were contemplated to abate the odour from a maggot farm

s specified are insufficient to abate the odour then in any further

as the inspector is striving to abate the odour, no action will be

t practicable means be taken to abate any existing odour nuisance,

ll equipment to prevent, and or abate odour pollution would probabl

rmation alleging the failure to abate a statutory nuisance without

t I would urge you at least to abate the nuisance of bugles forthw

way that the nuisance could be abated, but the decision is the dec

otherwise the nuisance is to be abated.They have full jurisdiction

ion, or the local authority may abate the nuisance and do whatever


Lexical sets are contrastive in context

• Different lexical sets generate different meanings.

• Lexical sets are not like syntactic structures.

• In principle, lexical sets are open-ended, but most have high-value best examples.

• In practice, a lexical set may have only 1 or 2 members, e.g. take a {look | glance}.

• No certainties in word meaning; only probabilities.

• … but probabilities can be measured.


A more complicated verb: ‘take’

• 61 phrasal verb patterns, e.g.

[[Person]] take [[Garment]] off

[[Plane]] take off

[[Human Group]] take [[Business]] over

• 105 light verb uses (with specific objects), e.g.

[[Event]] take place

[[Person]] take {photograph | photo | snaps | picture}

[[Person]] take {the plunge}

• 18 ‘heavy verb’ uses, e.g. [[Person]] take [[PhysObj]] [Adv[Direction]]

• 13 adverbial patterns, e.g.

[[Person]] take [[TopType]] seriously

[[Human Group]] take [[Child]] {into care}

• TOTAL: 204, and growing (but slowly)


A fine distinction: ‘take + place’

• [[Event]] take {place}: A meeting took place.

• [[Person 1]] take {[[Person 2]]’s place}:

– George took Bill’s place; Bill left and George took his place.

• [[Person]] take {[REFLDET] place}: Wilkinson took his place among the greats of the game.

• [[Person=Competitor]] take {[ORDINAL] place}: The Germans took first place.
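The word immediately before place does much of the work in separating these four patterns. The Python sketch below is a hypothetical heuristic, not a CPA tool: it takes the words from the form of take up to place and inspects the determiner slot. It cannot by itself choose between the reflexive reading (Wilkinson took his place) and the Person 2 reading (George took his place, meaning Bill’s), because that requires coreference resolution, so in that case it returns both candidates. The word lists `ORDINALS` and `POSSESSIVE_PRONOUNS` are illustrative assumptions.

```python
ORDINALS = {"first", "second", "third", "fourth", "fifth"}
POSSESSIVE_PRONOUNS = {"my", "your", "his", "her", "its", "our", "their"}


def take_place_patterns(phrase):
    """phrase: words from the form of 'take' up to and including 'place',
    e.g. 'took first place'. Returns the candidate patterns."""
    words = phrase.lower().split()
    if len(words) < 2 or words[-1] != "place":
        raise ValueError("expected a phrase such as 'took ... place'")
    between = words[1:-1]  # material between the verb and 'place'
    if not between:
        return ["[[Event]] take {place}"]
    det = between[-1]
    if det in ORDINALS:
        return ["[[Person=Competitor]] take {[ORDINAL] place}"]
    if det.endswith(("'s", "\u2019s")):
        return ["[[Person 1]] take {[[Person 2]]'s place}"]
    if det in POSSESSIVE_PRONOUNS:
        # reflexive or not? only coreference with the subject can tell
        return ["[[Person]] take {[REFLDET] place}",
                "[[Person 1]] take {[[Person 2]]'s place}"]
    return []  # none of the four patterns above


for p in ["took place", "took Bill's place", "took his place", "took first place"]:
    print(p, "->", take_place_patterns(p))
```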


Another fine distinction: ‘break + [[Bone]]’

• “John broke his leg”

• Whose leg?

• “John broke his nose”

• Whose nose?

We need to distinguish ‘reflexive determiners’ from other kinds of possessive determiners


Noun norms

• Norms for nouns are different in kind from norms for verbs.

– Adjectives and prepositions are more like verbs.

• A different analytical apparatus is required for nouns.

• Prototype statements for each true noun can be derived from a corpus.

• Examples for the noun ‘storm’ follow.


Storm (literal meaning) (1)

WHAT DO STORMS DO?

• Storms blow.

• Storms rage.

• Storms lash coastlines.

• Storms batter ships and places.

• Storms hit ships and places.

• Storms ravage coastlines and other places. 


Storm (literal meaning) (2)

BEGINNING OF A STORM:

• Before it begins, a storm is brewing, gathering, or impending.

• There is often a calm or a lull before a storm.

• Storms last for a certain period of time.

• Storms break.

END OF A STORM:

• Storms abate.

• Storms subside.

• Storms pass.


Storm (literal meaning) (3)

WHAT HAPPENS TO PEOPLE IN A STORM?

• People can weather, survive, or ride (out) a storm.

• Ships and people may get caught in a storm.


Storm (literal meaning) (4)

WHAT KINDS OF STORMS ARE THERE?

• There are thunder storms, electrical storms, rain storms, hail storms, snow storms, winter storms, dust storms, sand storms, tropical storms…

• Storms are violent, severe, raging, howling, terrible, disastrous, fearful, ferocious…


Storm (literal meaning) (5)

OTHER ASSOCIATIONS OF ‘STORM’:

• Storms, especially snow storms, may be heavy.

• An unexpected storm is a freak storm.

• The centre of a storm is called the eye of the storm.

• A major storm is remembered as the great storm (of [[Year]]).

• STORMS ARE ASSOCIATED WITH rain, wind, hurricanes, gales, and floods.


Pattern Dictionary and FrameNet

CPA investigates syntagmatic criteria for distinguishing different meanings of polysemous words, in a “semantically shallow” way.

FrameNet:

• expresses the deep semantics of situations (frames);

• proceeds frame by frame, not word by word;

• analyses situations in terms of frame elements;

• studies meaning differences and similarities between different words in a frame;

• does not explicitly study meaning differences of polysemous words;

• does not analyse corpus data systematically, but goes fishing in corpora for examples in support of hypotheses;

• has problems grouping words into frames, and misses some;

• has no established inventory of frames;

• has no criteria for completeness of a lexical entry.


Theoretical consequences and practical applications (1)

Pedagogical:

• Anyone acquiring a language must develop competence in two kinds of rule-governed linguistic behaviour:

– How to use words normally

– How to exploit the norms (creative metaphors, ellipsis, etc.)

• A pattern dictionary gives the comparative frequency of patterns.

– A lexical syllabus could select only primary norms.

– “Primary norms” are a) high-frequency norms and b) concrete norms.

• In error analysis: what norm was aimed at?

– If learners are exploiting norms creatively, do you (the teacher) really want them to?


Theoretical consequences and practical applications (2)

For theoretical linguistics:

• Are some grammars better than others for representing how words are used to make meanings?

– ‘S → NP VP’: a confusion of language with predicate logic?

• ARG3 (sometimes called ‘adjunct’, ‘adverbial’):

– CPA shows that a new grammar of adverbials is needed.

• Metaphor analysis:

– CPA distinguishes conventional metaphors from exploitations.

• Ontologies:

– The relationship between a possible ontology of words in use and scientific conceptual ontologies such as WordNet.


Theoretical consequences and practical applications (3)

• For computational linguistics and AI:

• Improving machine translation

– Getting the pattern right is more likely to select the right translation.

• Parsing and word-class tagging:

– CLAWS achieves ~90% accuracy in word-class tagging in the BNC

– CPA reveals some systematic errors in CLAWS tagging.

• Anaphora resolution:

– He found a glass of water on the table and drank it.

– ‘[[Animate]] drink [[Liquid]]’ selects water as a direct object
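The anaphora example can be sketched as a type filter over candidate antecedents. In the Python below, the candidate list and the `SEMANTIC_TYPES` lexicon are hypothetical hand-coded inputs; a real system would obtain candidates from a parser and types from an ontology. Only nouns whose types satisfy the object slot of ‘[[Animate]] drink [[Liquid]]’ survive.

```python
# Hypothetical toy lexicon of semantic types.
SEMANTIC_TYPES = {
    "glass": {"Artifact", "Container"},
    "water": {"Liquid"},
    "table": {"Artifact", "Furniture"},
}


def resolve_pronoun(candidates, required_type):
    """Keep the candidate antecedents, in textual order, whose semantic types
    include the type required by the verb pattern's argument slot."""
    return [noun for noun in candidates
            if required_type in SEMANTIC_TYPES.get(noun, set())]


# "He found a glass of water on the table and drank it."
candidates = ["glass", "water", "table"]
print(resolve_pronoun(candidates, "Liquid"))
```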


Conclusions

• Meanings are associated with patterns – normal contexts, rather than with words in isolation.

• Normal contexts correlate statistically significant collocations in different clause roles.

• The whole language system is probabilistic and preferential.

• The probabilities can be analysed in a new kind of dictionary – a syntagmatic pattern dictionary.