Page 1:

Gradience and Similarity in Sound, Word, Phrase and Meaning

Jay McClelland

Stanford University

Page 2:

Collaborators

Dave Rumelhart, Mark Seidenberg, Dave Plaut, Karalyn Patterson, Matt Lambon Ralph, Cathy Harris, Gary Lupyan, Lori Holt, Brent Vander Wyk, Joan Bybee

Page 3:

The Compositional View of Language (Fodor and Pylyshyn, 1988)

Linguistic objects may be atoms or more complex structures, like molecules.

Molecules consist of combinations of atoms that are consistent with structural rules.

Mappings between form and meaning depend on structure-sensitive rules.

This allows languages to be combinatorial, productive, and systematic.

[ John [ hit [the ball] ] ]    [ [w [ ei [t] ] ] [^d] ]

S → NP VP; VP → V NP; NP → …

word → stem + affix; stem → {syl} + syl′ + {syl}; syl → {onset} + rhyme; rhyme → nuc + {coda}

Subj → Agent; Verb → Action; Obj → Patient

V_i + past → stem_i + /^d/

Page 4:

Critique

The number of units present in an expression is not always clear

The number of different categories of units is not at all clear

Real native ‘idiomatic’ language ability involves many subtle patterns not easily captured by rules

There is no generally accepted framework for characterizing how rules work

Page 5:

How many mountains?

Page 6:
Page 7:

There is less discreteness in some cases than others

And more in some domains than in others

Page 8:

Some cases in language where it is hard to decide on the number of units

How many words? Cut out, cut up, cut over; cut it out? Barstool, shipmate; another, a whole nother

How many morphemes? Pretend, prefer, predict, prefabricate; chocoholic, chicketarian; strength, length; health, wealth; dearth, filth

How many syllables? Every, memory, livery; leveling, shoveling; evening…

How many phonemes? Teach, boy, hint, swiftly, softly; memory, different; What happened to you?

Page 9:

Cases in which it is unclear how many types of units are needed

Object types: species (California redwoods, butterflies along a mountain range, types of tomatoes); restaurants (Japanese, Italian, seafood)

Linguistic types: word meanings (ball, run); segment types (fuse, fusion; dirt, dirty (cf. sturdy))

Page 10:

Characterizations of how rules work

Rule or exception (Pinker et al.): V + past → Stem + /^d/; go → went; dig → dug; keep → kept; say → said

General and specific rules (Halle, Marantz): V + past → Stem + /^d/; if stem ends in 'eep': 'ee' → 'eh'; if stem = say: 'ay' → 'eh'

Output-oriented approaches: OT (e.g. 'No Coda'); Bybee's output-oriented past tense schemas: a lax vowel followed by a dental, as in hit, cut, bid, waited; 'ah' or 'uh' followed by a (preferably nasalized) velar, as in sang, flung, dug …

Page 11:

How do the general and the specific work together?

Past tenses: like → liked but keep → kept; pay → paid but say → said

English spelling-sound mapping: mint, hint, … but pint; save, wave, … but have

Meanings of sentences: John saw a dog / John saw a doctor

Page 12:

Can the contexts of application of the more specific patterns be well defined?

For the past tense: Generally, words with more complex rhymes will be more susceptible to reduction (*VV[S]t, where [S] stands for a stop consonant). Item frequency and number of other similar items both appear to contribute.

For spelling to sound: Sources of spelling are lost in history, but item frequency and similar neighbors play important roles.

For constructions: Characterization of constraints is generally relatively vague and seems to be a matter of degree. Subj: Human; V: saw; Obj: Professional → 'paid a visit to' (John saw an accountant; John saw an architect; The baby saw a doctor; The boy saw a doctor). Perhaps similarity to neighbors plays an important role here as well.

Page 13:

Summary

Linguistic objects vary continuously in their degree of compositionality and in their degree of systematicity

While some forms seem highly compositional and some forms seem highly regular/systematic, there is generally a detectable degree of specificity in every familiar form (Goldberg)

Even nonce forms reflect specific effects of specific ‘neighbors’

It may be useful to adopt the notion that language consists of tokens selected from a specified taxonomy of units and that linguistic mappings are determined by systems of rules…

BUT, an exact characterization is not possible in this framework

Units and rules are meta-linguistic constructs which do not play a role in language processing, language use or language acquisition.

These constructs impede understanding of language change

Page 14:

What will the alternative look like?

It will be a system that allows continuous patterns over time -- articulatory gestures and auditory waveforms -- to generate graded and distributed internal representations that capture linguistic structure and mappings in ways that respect both the continuous and discrete aspects of linguistic structure, without enumeration of units or explicit representation of rules.

Page 15:

Using neural network models to capture these ideas

Page 16:

Units in Neural Network Models

Many neural network models rely on distributed internal representations in which there is no discrete representation of linguistic units.

To date most of these models have adopted some sort of concession to units in their inputs and outputs.

We do this because we have not yet achieved the ability to avoid doing so, not because we believe these units exist

Page 17:

A Connectionist Model of Word Reading (Plaut, McC, Seidenberg & Patterson, 1996)

The task is to learn to map spelling to sound, given spelling-sound pairs from a 3000-word corpus.

The network learns gradually from frequency-weighted exposure to pairs in the corpus.

For each presentation of each item: Input units corresponding to the spelling are activated. Processing occurs through propagation of activation from the input units through hidden units to output units, via weighted connections. The output is compared to the item's pronunciation. Small adjustments to the connections are made to reduce the difference.

(Figure: the network maps the spelling M I N T through hidden units to the pronunciation /m/ /I/ /n/ /t/.)
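To make the training procedure concrete, here is a minimal sketch in Python/NumPy of a one-hidden-layer network trained by back-propagation to map letter slots to phoneme slots. This is only an illustration of the procedure described above, not the Plaut et al. (1996) implementation: the toy corpus, the slot coding, the layer sizes, and the frequency-weighted sampling are all assumptions made for the sketch.

```python
# Minimal sketch (not the actual Plaut et al. 1996 network): a one-hidden-layer
# net trained by gradient descent to map letter slots to phoneme slots.
# The corpus, slot coding, and layer sizes below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

LETTERS = list("abcdefghijklmnopqrstuvwxyz")
PHONEMES = ["m", "I", "n", "t", "p", "ai", "h", "_"]   # '_' marks an empty slot
N_SLOTS = 4                                            # onset, vowel, coda1, coda2

def one_hot(symbols, inventory):
    """Concatenate one one-hot code per slot."""
    vec = np.zeros(N_SLOTS * len(inventory))
    for slot, sym in enumerate(symbols):
        vec[slot * len(inventory) + inventory.index(sym)] = 1.0
    return vec

# Toy corpus of (spelling, pronunciation, frequency) triples.
corpus = [
    (["m", "i", "n", "t"], ["m", "I", "n", "t"], 5),
    (["h", "i", "n", "t"], ["h", "I", "n", "t"], 5),
    (["m", "i", "n", "e"], ["m", "ai", "n", "_"], 5),
    (["p", "i", "n", "t"], ["p", "ai", "n", "t"], 1),   # the exception
]

n_in, n_hid, n_out = N_SLOTS * len(LETTERS), 30, N_SLOTS * len(PHONEMES)
W1 = rng.normal(0, 0.1, (n_in, n_hid))    # spelling -> hidden connections
W2 = rng.normal(0, 0.1, (n_hid, n_out))   # hidden -> pronunciation connections
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for presentation in range(5000):
    # Frequency-weighted exposure: frequent items are presented more often.
    spelling, sound, freq = corpus[rng.integers(len(corpus))]
    if rng.random() > freq / 5.0:
        continue
    x = one_hot(spelling, LETTERS)                 # activate spelling units
    target = one_hot(sound, PHONEMES)
    h = sigmoid(x @ W1)                            # propagate to hidden units
    y = sigmoid(h @ W2)                            # ... and on to output units
    err = y - target                               # compare with pronunciation
    # Small adjustments to the connections to reduce the difference.
    d_out = err * y * (1 - y)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * np.outer(h, d_out)
    W1 -= lr * np.outer(x, d_hid)
```

Because the items share connections wherever they share letters, what is learned for MINT and MINE also shapes the treatment of PINT, which is the point developed on the next slides.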

Page 18:

Aspects of the Connectionist Model

Mapping through hidden units forces network to use overlapping internal representations.

-Allows sensitivity to combinations if necessary

-Yet tends to preserve overlap based on similarity

Connections used by different words with shared letters overlap, so what is learned tends to transfer across items.


Page 19:

Processing Regular Items: MINT and MINE

Across the vocabulary, consistent co-occurrence of M with /m/, regardless of other letters, leads to weights linking M to /m/ by way of the hidden units.

The same thing happens with the other consonants, and most consonants in other words.

For the vowel I: if there's a final E, produce /ai/; otherwise produce /I/.

M I N T

/m/ /I/ /n/ /t/

Page 20:

Processing an Exception: PINT

Because PINT overlaps with MINT, there's transfer: positive for N → /n/ and T → /t/; negative for I → /ai/.

Of course P benefits from learning with PINK, PINE, POST, etc.

Knowledge of regular patterns is hard at work in processing this and all other exceptions.

The only special thing the network needs to learn is what to do with the vowel.

Even this will benefit from weights acquired from cases such as MIND, FIND, PINE, etc.

(Figure: the network maps P I N T to /p/ /ai/ /n/ /t/.)

Page 21:

Model captures patterns associated with ‘units’ of different scopes without explicitly representing them.

The model learns basic regular correspondences and generalizes appropriately to non-words: mint, rint; seat, reat; rave, mave…

It learns to produce the correct output for all exceptions in the corpus: pint, bread, have, etc.

It is sensitive to sub-regularities such as special vowels with certain word-final clusters, c-conditioning, final-e conditioning… sold, nold; book, grook; plead, tread, ?klead; bake, dake; rage, dage / rice, bice

Shows graded sensitivity modulated by frequency to item-specific, rhyme-specific, and context-sensitive correspondences.

(Figure: error / settling time as a function of frequency (high vs. low) for pint, bread, hint, and dent.)

Page 22:

How does it work?

Correspondences of different scopes are represented in the connections between the input and the output that depends on them.

Some correspondences, e.g. in the word-initial consonant cluster, are highly compositional, and the model treats them this way.

Others, such as those involving the pronunciation of the vowel, are highly dependent on context, but to a degree that varies with the type of item.

Page 23:

Elman’s Simple Recurrent Network

Finds larger units with coherent internal structure from time series of inputs.

Series are usually discretized at conventional linguistic unit boundaries, but this is just for simplicity.

Uses hidden unit state from processing of previous input as context for next input.
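A minimal sketch of that mechanism in the same Python/NumPy style: the hidden state from the previous time step is copied back and combined with the current input at each step. The toy vocabulary, the single training sentence, and the shortcut of training only the output weights are assumptions made for brevity, not Elman's actual simulations.

```python
# Minimal sketch of an Elman-style simple recurrent network (SRN) doing
# next-word prediction. Vocabulary, sentence, and training shortcuts are toy
# assumptions; a full SRN would also train the input and context weights.
import numpy as np

rng = np.random.default_rng(1)
vocab = ["boy", "girl", "dog", "who", "sees", "chases", "."]
V, H = len(vocab), 20

W_in = rng.normal(0, 0.1, (V, H))    # current input -> hidden
W_ctx = rng.normal(0, 0.1, (H, H))   # previous hidden state (context) -> hidden
W_out = rng.normal(0, 0.1, (H, V))   # hidden -> predicted next word

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def step(word_idx, prev_h):
    """One time step: combine the current word with the copied-back context."""
    x = np.zeros(V)
    x[word_idx] = 1.0
    h = np.tanh(x @ W_in + prev_h @ W_ctx)
    return h, softmax(h @ W_out)

sentence = [vocab.index(w) for w in ["boy", "who", "sees", "dog", "chases", "girl", "."]]
lr = 0.1
for epoch in range(500):
    h = np.zeros(H)
    for t in range(len(sentence) - 1):
        h, p = step(sentence[t], h)
        target = np.zeros(V)
        target[sentence[t + 1]] = 1.0
        # Cross-entropy gradient for the softmax output layer only (for brevity).
        W_out -= lr * np.outer(h, p - target)
```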

Page 24:

Elman networks learn syntactic categories from word sequences

Page 25:
Page 26:

Elman (1991) Explored Long-Distance Dependencies

Page 27:

NV Agreement and Verb Successor Prediction

(Figure: predicted next-word activations for the categories N, Vs (singular verb), Vp (plural verb), and S/who at successive positions in the sentence.)

Page 28:

Prediction with an embedded clause

(Figure: predicted next-word activations for the same categories across the positions of a sentence containing an embedded clause.)

Page 29:

Attractor Neural Networks

Advantages:

Discreteness as well as continuity.

Captures general and specific in a single network, for semantic as well as spelling-sound regularity.

General information is learned faster and is more robust to damage, capturing development and learning.

Adding context would allow context to shade or select meaning.
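As a toy illustration of the first point, that continuous settling dynamics can yield discrete outcomes, here is a small Hopfield-style attractor sketch in the same Python/NumPy style. The stored patterns and the update rule are arbitrary choices for the illustration; this is not one of the semantic or spelling-sound models discussed here.

```python
# Toy attractor dynamics: graded, continuous updates settle into one of a few
# discrete stored states. The two stored patterns below are arbitrary examples.
import numpy as np

patterns = np.array([
    [ 1, -1,  1, -1,  1, -1,  1, -1],
    [ 1,  1, -1, -1,  1,  1, -1, -1],
])
n = patterns.shape[1]
W = sum(np.outer(p, p) for p in patterns) / n   # Hebbian weight matrix
np.fill_diagonal(W, 0)                          # no self-connections

state = np.array([1, -1, 1, -1, -1, 1, 1, -1], dtype=float)  # noisy pattern 1
for _ in range(20):
    state = np.tanh(2.0 * (W @ state))          # graded (continuous) update

print(np.sign(state))        # settles to the nearest stored (discrete) pattern
```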

Page 30:

Can we do without units on the input and the output?

I think it will be crucial to do so, because speech gestures are continuous. They have attractor-like characteristics but also vary continuously in many ways and as a function of a wide range of factors.

It will then be entirely up to the characteristics of the processing system to exhibit the relevant partitioning into units.

Page 31:

Keidel’s model that learns to translate from continuous spoken input to articulatory parameters.

The input to the model is a time series of auditory parameters from actual spoken CV syllables.

Output is the identity of the C and the V, but…

It should be possible to translate from auditory input to the continuous articulatory movements that would 'imitate' the input; this is an important future direction.

Page 32:

Units and Rules as Emergents

In all three example models, units and rules are emergent properties that admit of matters of degree.

We can choose to talk about such things as though they have an independent existence for descriptive convenience but they may have no separate mechanistic role in language processing, language learning, language structure, or language change.

Although many models use ‘units’ in their inputs and outputs, the claim is that this is a simplification that actually limits what the model can explain.

Page 33:

Beyond the Phone and the Phoneme

Some additional problems with the notion of the phonetic segment.

Model of gradual language change exhibiting pressure to be regular and to be brief.

Page 34:

Just a Few of the Problems with Segments in Phonology

Enumeration of segment types is fraught with problems. There is no universal inventory; there are cross-language similarities of segments, but every segment is different in every language (Pierrehumbert, 2001).

When we speak, the articulation of the same "segment" depends on phonetic context, word frequency and familiarity, degree of compositionality (which in turn depends on frequency), number of competitors, and many other aspects of context…

Presence/absence of aspects of articulation is a matter of degree: nasal 'segment', release burst, duration/degree of approximation to closure in l's, d's and t's…

Language change involves a gradual process of reduction/adjustment. Segments disappear gradually, not discretely. What is it half way through the change?

The approach misses out on some of the global structure of spoken language that needs to be taken into account in any theory of phonology.

Page 35:

A model of language change that produces irregular past tenses (with Gary Lupyan)

Our initial interest focused on quasi-regular exceptions: Items that add /d/ or /t/ and reduce the vowel:

Did, made, had, said, kept, heard, fled…

Items already ending in /d/ or /t/ that change (usually reduce) the vowel:

hid, slid, sat, read, bled, fought..

We suggest these items reflect historical change sensitive to: pressure to be brief, contingent on comprehension; and consistency in the mapping between sound and meaning.

Page 36:

Two constraints on communication

The spoken form I produce is constrained:

To allow you to understand

To be as short as possible given that it is understood.

(Figure: communication loop linking My Intended Meaning and Your Intended Meaning through Speech to Your understanding of what I said and My understanding of what you said.)

Page 37:

A simplified version of this was actually explored by Lupyan and McClelland (2003)

The network has a phonological word pattern and a corresponding semantic pattern for the present and past tense forms of 739 verbs.

It is trained with the phonological word form as input, and this is used to produce a semantic pattern.

The error at the output layer is back-propagated, allowing a change in the connection weights.

The error is also back-propagated to the input units, and is used to adjust the phonological word pattern.

There is also pressure on the phonological word form representation to be simpler, depending on how well the utterance was understood (summed error at the output units).

The improved phonological word form is then stored in the list.

(Figure: 'What I say when I want to communicate a particular message' produces 'Your understanding of what I said'.)
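A schematic sketch of one training step of the procedure just described, in the same Python/NumPy style. The layer sizes, learning rates, comprehension measure, and clipping are assumptions made for the sketch; this is one reading of the description above, not the Lupyan and McClelland code.

```python
# Sketch of one training step: error is back-propagated to the weights and to
# the phonological input pattern, and a brevity pressure shrinks the pattern in
# proportion to how well the utterance was understood. All details are toy.
import numpy as np

rng = np.random.default_rng(2)
n_phon, n_hid, n_sem = 12, 10, 8                  # toy sizes, not from the paper
W1 = rng.normal(0, 0.1, (n_phon, n_hid))
W2 = rng.normal(0, 0.1, (n_hid, n_sem))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# One lexical entry: a stored phonological form and its semantic target.
phon = rng.integers(0, 2, n_phon).astype(float)   # binary to start, graded later
sem_target = rng.integers(0, 2, n_sem).astype(float)

lr_w, lr_phon, lr_brevity = 0.5, 0.05, 0.02       # representation learns slowly

for step in range(200):
    h = sigmoid(phon @ W1)                        # phonological form -> hidden
    sem = sigmoid(h @ W2)                         # hidden -> semantic pattern
    err = sem - sem_target
    understanding = 1.0 - (err @ err) / n_sem     # crude comprehension score

    d_out = err * sem * (1 - sem)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    d_phon = d_hid @ W1.T                         # gradient w.r.t. the input

    W2 -= lr_w * np.outer(h, d_out)               # adjust connection weights
    W1 -= lr_w * np.outer(phon, d_hid)
    phon -= lr_phon * d_phon                      # adjust the stored word form
    # Brevity pressure: stronger shrinkage when the utterance was understood.
    phon -= lr_brevity * understanding * phon
    phon = np.clip(phon, 0.0, 1.0)                # the improved form is stored
```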

Page 38:

Model Details: L&M Simulation 2a

Semantic patterns: 'Quasi-componential' representations of tense plus base word meaning are created, based on including tense information in the feature vectors passed through the encoder network. The representation of past tense varies somewhat from word to word.

Phonological patterns have one unit per phoneme, but long vowels or diphthongs have an extra unit, plus a unit for the syllabic 'ed'. They are initialized with binary values (0, 1). Although units still stand for phonemes, presence/absence is a matter of degree.

The learning rate for the representation is slow relative to the learning rate for the weights.

739 monosyllabic verbs, frequency weighted. The training corpus is fully regularized at the start of the simulation.

Page 39:

Simulation of Reductive Irregularization Effects

In English, frequent items are less likely to be regular.

Also, d/t items are less likely to be regular.

The same effects emerge in the simulation.

While the past tense is usually one phoneme longer than present, this is less true for the high frequency past tense items.

Reduction of high-frequency past tenses is to a phoneme other than the word-final /d/ or /t/: regularity, and its role in the mapping to meaning, protects the inflection.

Page 40:

Further Simulations

Simulation 2b showed that when irregulars were present in the training corpus, the network tended to preserve their irregularity.

In ongoing work an extended model shows a tendency to regularize low-frequency exceptions.

Simulation 2c used fully componential semantic representation of past tense, resulting in much less tendency to reduce.

Page 41:

Discussion and Future Directions

The work discussed here is a small example of what needs to be accomplished, even for a model of phonology.

Extending the approach to continuous speech input will be a big challenge

Extending continuous speech to full sentences as input and output will be a bigger challenge still

Neural network approaches are gaining prominence as processing power grows, and these things will be increasingly possible.

It will still be useful to notate specific linguistic units, but machines will not need these to communicate, no more than our minds need them to speak and understand.