Adam Kilgarriff doesn’t believe in word senses…*

(“I don’t believe in Word Senses”, A. Kilgarriff, in Computers and the Humanities 31 (2), pp 91-113, 1997)

A summary by Peter Clark
Boeing Research

* and Sue Atkins, the source of the original quote.


Word Sense Disambiguation

The basic idea:

“Many words have more than one meaning. When a person understands a sentence with an ambiguous word in it, that understanding is built on the basis of just one of the meanings. So, as some part of the human language understanding process, the appropriate meaning has been chosen from the range of possibilities.”


Thesis

• The early work: toy examples
• More recently: statistical WSD
  – Given the context of a word, identify the sense
  – Approaches:
    • use sense-labeled training data
    • use user guidance (Yarowsky):
      1. build a concordance for a word
      2. user selects discriminating “seed” surrounding words (1 per sense)
      3. classify the word occurrences, using the seeds, as to the word sense
      4. find other words correlated with the word senses → extra seeds
      5. goto 3
    • choose seeds automatically (clustering + thresholding)


Thesis (cont)

For example:

    Treadmills attached to cranes were used to lift heavy
    For supplying power to cranes, hoists, and lifts
    above this height, a tower crane is often used. This
    elaborate courtship rituals cranes build a nest of vegetation
    are most closely related to cranes and rails. They ran
    low trees. At least five crane species are in danger of

Two seeds
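The user-guided loop from the Thesis slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Yarowsky's actual algorithm: the two seeds ("lift" and "nest"), the stop-word list, and the rule for promoting extra seeds (any content word seen only in lines of one sense) are all hypothetical choices made for this example. The concordance lines are the six "crane" lines above.

```python
import re

# The six concordance lines for "crane" from the slide.
LINES = [
    "Treadmills attached to cranes were used to lift heavy",
    "For supplying power to cranes, hoists, and lifts",
    "above this height, a tower crane is often used. This",
    "elaborate courtship rituals cranes build a nest of vegetation",
    "are most closely related to cranes and rails. They ran",
    "low trees. At least five crane species are in danger of",
]

# Assumed for this sketch: a tiny stop-word list and the target word forms.
STOPWORDS = {"a", "and", "are", "at", "for", "in", "is", "of",
             "the", "they", "this", "to", "were"}
TARGET = {"crane", "cranes"}


def content_words(line):
    """Lower-cased words of a line, minus stop words and the target word."""
    words = set(re.findall(r"[a-z]+", line.lower()))
    return words - STOPWORDS - TARGET


def classify(words, seeds):
    """Step 3: pick the sense whose seeds overlap the line most (no ties)."""
    scores = {sense: len(words & seed_set) for sense, seed_set in seeds.items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


def bootstrap(lines, seeds, max_rounds=10):
    """Steps 3-5: classify, promote correlated words to seeds, repeat."""
    seeds = {sense: set(words) for sense, words in seeds.items()}
    bags = [content_words(line) for line in lines]
    labels = [None] * len(lines)
    for _ in range(max_rounds):
        labels = [classify(bag, seeds) for bag in bags]
        new_seeds = {}
        for sense in seeds:
            own = set().union(*(b for b, lab in zip(bags, labels) if lab == sense))
            other = set().union(
                *(b for b, lab in zip(bags, labels) if lab not in (None, sense))
            )
            # Step 4 (assumed rule): words seen only with this sense become seeds.
            new_seeds[sense] = seeds[sense] | (own - other)
        if new_seeds == seeds:  # nothing new was learned, so stop
            break
        seeds = new_seeds
    return labels, seeds


# Hypothetical seeds, one per sense (step 2).
labels, seeds = bootstrap(LINES, {"machine": {"lift"}, "bird": {"nest"}})
for label, line in zip(labels, LINES):
    print(f"{label!s:8} {line}")
```

With these seeds, the first and third lines end up labeled "machine" (the third only after "used" is promoted to a seed), the fourth is labeled "bird", and the remaining three stay unlabeled because they share no content word with any labeled line. On a corpus this small the loop stalls quickly; the point is only to show the shape of the iteration.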


Antithesis

There is a computationally relevant/ useful/ interesting set of word senses in the language, approximating to those stated in a dictionary.


What is a Word Sense?

Exercise:

How many word senses:

    Have you put the money in the bank?
    The rabbit climbed up the bank.

How many word senses:

    Cut the rope.
    Cut down your daily fat intake.
    The car cut to the left at the intersection.
    Cut along the dotted line.
    The coach cut two players from the team.
    We cut through the neighbor’s yard to get home.
    The boat cut the water.
    Cut the engine.
    This cuts into my earnings.
    This soap cuts grease well.


Find senses using ambiguity tests?

• The “word-sensers”: use linguistic tests to find senses
• e.g., the “crossed readings” test:
  – “Mary arrived with a pike and so did Agnes.”
  – “Tom bought some beans, and so did Harry.”
• But:
  – can’t always construct a plausible test sentence:
      “John ate the apple”, “John ate” = two senses?
      “Mary ate, and John, the apple.” ?
  – need to do interpretation to decide on acceptability!
    • anomaly may not be lexical (word sense), but could be parsing or pragmatic interpretation!


Fillmore similarly seemed to be trying this…

For each lexicographically relevant unit we want straightforward ways of asking:

  – What does it mean?
  – In what contexts is it used?
  – What other words belong to those contexts?
  – What are its combinatorial properties?
  – What words are derivationally related to it?

• different meanings in different contexts → different senses
• different complementation patterns → different senses
• different nominalizations → different senses


So how do dictionary authors do it?

• Lexicographers
  – “lexicologists with a deadline” (and page limit)
• is a highly pragmatic enterprise
  – working for a human audience
    • describes the unknown with the known
    • affects what “acceptable” genus and differentiae are
  – they often hedge on where the sense boundaries are

Cŭt n. 1. Act of cutting; stroke or blow with knife etc.; ~ and thrust, use of both edge and point of sword, (fig.) lively interchange of argument. 2. Act of utterance that wounds the feelings; (Cricket, Tennis, etc.) stroke made by cutting. 3. …

(Concise Oxford Dict.)


Alternative: Corpus-Based Methods

• Data-driven, rather than theory-driven
  – Cluster lines in a concordance for a word together
  – Each cluster will be a word sense
  – No reason to expect clear boundaries
• Kilgarriff:
  – Pattern of usage isn’t simply statistical co-occurrence
  – Rather, involves a complex interplay between different knowledge sources
  – For example…


e.g., “Handbag”

• 715 examples in BNC of plain uses of “handbag”
  – put in, take out, look for, lose, steal, find etc.
• But a couple of dozen stretch this
  – unique object
    • “an inimitable rendering of the handbag speech in…”
  – metonymy
    • “She moved from handbags through gifts to the flower shop”
  – metaphor
    • “with bats hanging in the trees like handbags”
  – Mrs Thatcher
    • “A mad cow with a handbag”
    • “sent out Mrs Thatcher with a fully-loaded handbag”


e.g., “Handbag” (cont)

  – handbag as weapon
    • “Meg swung her handbag”
    • “determined women armed with heavy handbags”
  – discos (“dance round your handbag”)
    • “Tensions mounted between regulars and the handbag brigade”


Predictability

• The non-standard usages are predictable:
  – not that you can use any word the way you like;
  – rather, licence for usage comes from various sources
    • standard usage + linguistic + world knowledge + collocation

e.g., “handbags at ten paces” is okay and amusing, while “briefcases at ten paces” and “shoulder-bags at ten paces” do not carry the weapon connotation and so are less understandable

• thus could argue that handbag-as-weapon should be a word sense


Word Sense Hierarchies (Autohyponymy)

Perhaps we have autohyponymy here?

    handbag_1 (purse-like thing)
        handbag_2 (handbag-as-weapon)

Happens all the time in dictionaries…

    knife_1 (bladed object)
        knife_2 (cutlery)
        knife_3 (weapon)


Word Sense Hierarchies (Autohyponymy)

    sanction_1 (control of some kind)
        sanction_2 (imposed punishment)
        sanction_3 (official endorsement)

Interesting side note… This ambiguity never occurs in usage

There are occasions where a “lowest common denominator” will be the appropriate reading → make this a new word sense
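One way to picture the “lowest common denominator” reading is as the most specific sense shared by all the candidate senses in a hierarchy. The sketch below is only an illustration: the `PARENT` table encodes the three small hierarchies from these slides, and the function names are hypothetical, not taken from the paper.

```python
# Each sense points to its more general parent sense (None = top of its tree).
PARENT = {
    "handbag_1": None,          # purse-like thing
    "handbag_2": "handbag_1",   # handbag-as-weapon
    "knife_1": None,            # bladed object
    "knife_2": "knife_1",       # cutlery
    "knife_3": "knife_1",       # weapon
    "sanction_1": None,         # control of some kind
    "sanction_2": "sanction_1", # imposed punishment
    "sanction_3": "sanction_1", # official endorsement
}


def ancestors(sense):
    """The sense itself followed by its ever more general parents."""
    chain = []
    while sense is not None:
        chain.append(sense)
        sense = PARENT[sense]
    return chain


def lowest_common_reading(candidates):
    """Most specific sense that covers every candidate, or None if none does."""
    chains = [ancestors(sense) for sense in candidates]
    shared = set(chains[0]).intersection(*chains[1:])
    for sense in chains[0]:  # chains run from specific to general
        if sense in shared:
            return sense
    return None


print(lowest_common_reading(["knife_2", "knife_3"]))  # knife_1
print(lowest_common_reading(["handbag_2"]))           # handbag_2
```

When a context does not decide between cutlery and weapon, the function falls back to the general “bladed object” sense, which is the situation the slide describes.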


So What?

[Paraphrased] “A task-independent set of word senses is not a coherent concept. Word senses are simply undefined unless there is some underlying rationale for clustering, some context which classifies some distinctions as worth making and others as not worth making. Homonyms like “pike” are a limiting case: in almost any situation it is worth making the fish/weapon distinction.”


So What? (cont)

• Objective notion of “word sense” is not well-defined:
  – linguistic tests require human interpretation
  – clustering methods depend on the corpus, and on some user-defined notion of similarity and sufficient distinctiveness.
• Alternative: basic unit ≠ word sense. Rather:
  – basic unit = occurrences of the word in context
  – word sense = clusters of those units

Word senses are
  – defined relative to a set of interests
  – “abstractions over clusters of word usages”
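The dependence of clusters on a user-defined notion of similarity can be seen in a small Python sketch. Everything here beyond the six “crane” concordance lines from the earlier slide is an assumption for illustration: similarity is taken to be the number of shared words, two lines are linked when they share at least `min_shared` words, and clusters are the connected groups of linked lines. This is a stand-in for the idea, not the method of any particular system.

```python
import re

# The six concordance lines for "crane" from the earlier slide.
LINES = [
    "Treadmills attached to cranes were used to lift heavy",
    "For supplying power to cranes, hoists, and lifts",
    "above this height, a tower crane is often used. This",
    "elaborate courtship rituals cranes build a nest of vegetation",
    "are most closely related to cranes and rails. They ran",
    "low trees. At least five crane species are in danger of",
]

# Assumed stop-word list for the second notion of similarity.
STOPWORDS = {"a", "and", "are", "at", "for", "in", "is", "of",
             "the", "they", "this", "to", "were"}


def raw_tokens(line):
    """Similarity notion 1: every lower-cased word counts."""
    return set(re.findall(r"[a-z]+", line.lower()))


def content_words(line):
    """Similarity notion 2: ignore stop words and the target word itself."""
    return raw_tokens(line) - STOPWORDS - {"crane", "cranes"}


def cluster(lines, featurize, min_shared):
    """Group line indices that are linked by at least min_shared common words."""
    bags = [featurize(line) for line in lines]
    parent = list(range(len(lines)))

    def find(i):
        # Follow parent links to the representative of i's group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(bags)):
        for j in range(i + 1, len(bags)):
            if len(bags[i] & bags[j]) >= min_shared:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(lines)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


print(cluster(LINES, raw_tokens, 1))     # everything in one cluster
print(cluster(LINES, raw_tokens, 2))     # [[0, 1, 4], [2], [3], [5]]
print(cluster(LINES, content_words, 1))  # [[0, 2], [1], [3], [4], [5]]
```

The same six lines yield one, four, or five “senses” depending on the chosen features and threshold, and none of the groupings matches the intuitive machine/bird split.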